# Python - Working with Files

© Advanced Analytics, Amir Ben Haim, 2024

Python uses file objects to interact with external files on your computer.

These file objects can be any sort of file you have on your computer, whether it be an audio file, a text file, emails, Excel documents, etc.
<br>Note: You will probably need to install certain libraries or modules to interact with those various file types, but they are easily available.

Python has a built-in open function that allows us to open and play with basic file types.

First we will need a file though.

<br><b><u>Create a `test.txt` file (manually)</u></b></br>
with the following lines:
<br></br>

Hello Python files
<br>This is a test file

<br></br>

## Python - Opening and Closing a file

<br>
Let's begin by opening the file `test.txt`.
<br><p style="color:red"><b>Make sure that the file is located in the same directory as this notebook.</b></p>

In [1]:
# open() function: Used to open a file and returns a file object
myfile = open('test.txt')
myfile

<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>

In [2]:
# a file object
type(myfile)

_io.TextIOWrapper

<p style="font-size:25px"><b>Does creating a file object load the file's data into the object?</b></p>

<p style="color:blue;font-size:25px"><b>No. Creating a file object does not load the file's contents into memory.
<br>Instead, the file object acts as an interface through which you can read from or write to the associated file on disk.</b></p>
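
To make this concrete, here is a minimal sketch showing that nothing is read until we ask for it. The scratch file `demo.txt` is an assumption made for the demonstration (it is created in a temporary folder and is not part of this notebook).

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('Hello Python files\nThis is a test file')

f = open(path)          # the file is open, but nothing has been read yet
start = f.tell()        # position 0: the start of the file
first_part = f.read(5)  # read only the first 5 characters
rest = f.read()         # read whatever is left
f.close()

print(start)       # 0
print(first_part)  # Hello
print(repr(rest))
```

Only the pieces we explicitly request are pulled from disk into Python strings.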

In [3]:
# close() method: It's important to close files to free up system resources.
myfile.close()

<br>
It is important to close opened files in any programming language, for example to free up system resources and to make sure buffered writes actually reach the disk.<br>
Also, on some operating systems (notably Windows), a file that is open in your program may be locked.<br>
Other programs may then be unable to modify, rename, or delete it (e.g. through the file browser) until Python closes it.
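
One common way to guarantee that `close()` runs, even if something fails in between, is a `try`/`finally` block. This is only a sketch, and the scratch file name is an assumption made for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')
try:
    f.write('some text')
finally:
    f.close()    # runs even if write() raised an error

print(f.closed)  # True
```

The `closed` attribute tells us whether a file object has already been closed.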

<br></br>
To check your notebook location, use **pwd**

In [3]:
pwd

'C:\\Users\\itaig\\Desktop\\pyt'

<br></br>
<b>To grab files from any location on your computer, simply pass in the entire file path.</b>
<br></br>

- For Windows, double each backslash (`\\`) so that Python doesn't treat a single `\` as the start of an escape sequence. A file path has the form:
```python
    myfile = open("C:\\Users\\YourUserName\\Home\\Folder\\test.txt")
```

- On macOS and Linux, paths use forward slashes:
```python
    myfile = open("/Users/YourUserName/Folder/test.txt")
```
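
As a possible alternative to doubling backslashes, Python also offers raw strings and the standard `pathlib` module. The sketch below only builds and compares path strings (no file is opened), and `YourUserName` remains a placeholder.

```python
from pathlib import PureWindowsPath

escaped = "C:\\Users\\YourUserName\\Folder\\test.txt"  # doubled backslashes
raw = r"C:\Users\YourUserName\Folder\test.txt"         # raw string: backslashes are kept as typed
print(escaped == raw)  # True

# pathlib accepts forward slashes and renders them in Windows style
win = PureWindowsPath("C:/Users/YourUserName/Folder/test.txt")
print(win)       # C:\Users\YourUserName\Folder\test.txt
print(win.name)  # test.txt
```
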
<br></br>

## Python - Reading a File

In [4]:
myfile = open('test.txt')

### `myfile.read()`

In [5]:
# We can now read the file
myfile.read()

'Hello Python files\nThis is a test file'

In [6]:
# But what happens if we try to read it again?
myfile.read()

''

This happens because the reading "cursor" is at the end of the file after the first read, so there is nothing left to read.

We can reset the "cursor" like this:

### `myfile.seek()`

In [12]:
# Seek to the start of file (index 0)
myfile.seek(0)

0

In [13]:
# Now read again
myfile.read()

'Hello Python files\nThis is a test file'

### `myfile.readlines()`

<br>

The `readlines` method reads all the remaining lines and returns them as a list of strings.

Use it carefully with large files, since the whole list is held in memory.
<br>

In [14]:
# readlines() returns a list of the lines in the file
myfile.seek(0)
myfile.readlines()

['Hello Python files\n', 'This is a test file']

In [15]:
# readlines() returns a list of the lines in the file
myfile.seek(0)
mylist = myfile.readlines()

print(type(mylist))
print(mylist)

<class 'list'>
['Hello Python files\n', 'This is a test file']


<br>

### `myfile.readline()`

In [16]:
# readline() reads one line per call and returns '' at the end of the file
myfile.seek(0)
myfile.readline()

'Hello Python files\n'

In [17]:
myfile.readline()

'This is a test file'

In [18]:
myfile.readline()

''

<br>

### `myfile.tell()`

In [19]:
# Returns the current position in the file.
# Useful to know where you are in the file, especially after reading or writing certain data.
myfile.tell()

39

In [20]:
myfile.seek(0)
myfile.tell()

0
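
You may have noticed that `tell()` returned 39 above, although the text has only 38 characters. One likely explanation, assuming the notebook ran on Windows (the paths and the `cp1252` encoding above suggest so), is that each line break is stored on disk as two bytes (`\r\n`). The sketch below reproduces this on any operating system by forcing Windows-style line endings; the scratch file is an assumption made for the demonstration.

```python
import os
import tempfile

text = 'Hello Python files\nThis is a test file'
# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# newline='\r\n' forces Windows-style line endings on any operating system
with open(path, 'w', newline='\r\n') as f:
    f.write(text)

with open(path) as f:
    content = f.read()        # '\r\n' is translated back to '\n' while reading

chars = len(content)          # characters Python sees
size = os.path.getsize(path)  # bytes stored on disk
print(chars, size)            # 38 39
```
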

<br>

### `myfile.close()`

Remember: when you have finished using a file, it is always good practice to close it.

In [21]:
myfile.close()

<br>
Does `myfile` still exist?

In [22]:
# Yes
myfile

<_io.TextIOWrapper name='test.txt' mode='r' encoding='cp1252'>

<br>
Can we use it?

In [23]:
# No
# Error
myfile.read()

ValueError: I/O operation on closed file.

<br>

## Python - Writing to a File

By default, the `open()` function will only allow us to read the file.

We need to pass the mode argument `'w'` to write to the file, replacing its contents.

Passing `'w+'` lets us <u>read and write</u> to the file.

In [24]:
# Add a second argument to the function: 'w+' opens the file for writing and reading.

myfile = open('test.txt','w+')

In [25]:
myfile.read()

''

<strong><font color='red'>Use with caution!</font></strong>
<br>
Opening a file with `'w'` or `'w+'` truncates the original, meaning that anything that was in the original file **is deleted**!
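
Here is a small sketch of that behavior, together with one possible safeguard: the `'x'` mode, which creates a new file but refuses to overwrite an existing one. The scratch file is an assumption made for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    f.write('original content')

# Opening with 'w' again empties the file immediately, before anything is written
with open(path, 'w') as f:
    pass
size_after_w = os.path.getsize(path)
print(size_after_w)  # 0

# 'x' mode raises FileExistsError instead of overwriting
try:
    open(path, 'x')
    overwritten = True
except FileExistsError:
    overwritten = False
print(overwritten)   # False
```
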

<br>

<br>

### `myfile.write()`

In [26]:
# Write to the file
myfile.write('This is a new line\n')

19

In [28]:
# Read the file
myfile.seek(0)
myfile.read()

'This is a new line\n'

<br>

### `myfile.writelines()`

In [29]:
# Writes a list of strings to the file
# Useful for writing multiple items to a file without needing to loop and call write() for each item

lines = ['First line\n', 'Second line\n', 'Third line\n']

myfile.writelines(lines)

In [30]:
# Read the file
myfile.seek(0)
myfile.read()

'This is a new line\nFirst line\nSecond line\nThird line\n'

<br>

### `myfile.truncate()`

In [31]:
# Resizes the file to the given number of bytes (or to the current position if no size is given).
# Often used to shorten a file that is open for writing.
myfile.truncate(18)

18

In [32]:
# Read the file
myfile.seek(0)
myfile.read()

'This is a new line'

In [33]:
myfile.close()

<br>

## Python - Appending to a File

Passing the argument `'a'` opens the file and puts the pointer at the end, so anything written is appended.

Like `'w+'`, `'a+'` lets us read and write to a file, but it does not truncate the existing content. If the file does not exist, one will be created.
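
The sketch below contrasts `'a'` and `'a+'` on a scratch file (an assumption made for the demonstration). Note that with `'a+'` the position starts at the end of the file, so we need `seek(0)` before reading.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

with open(path, 'w') as f:
    f.write('first\n')

# 'a' keeps the existing content and adds to the end
with open(path, 'a') as f:
    f.write('second\n')

# 'a+' also allows reading, but the position starts at the end of the file
with open(path, 'a+') as f:
    at_end = f.read()      # '' because we are already at the end
    f.seek(0)
    everything = f.read()

print(repr(at_end))      # ''
print(repr(everything))  # 'first\nsecond\n'
```
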

In [34]:

myfile = open('test.txt','a+')

myfile.write('\nThis is text being appended to test.txt')
myfile.write('\nAnd another line here')

22

In [35]:
myfile.seek(0)
print(myfile.read())

This is a new line
This is text being appended to test.txt
And another line here


In [36]:
myfile.seek(0)
myfile.read()

'This is a new line\nThis is text being appended to test.txt\nAnd another line here'

In [37]:
myfile.close()

<br></br>
## Python - Iterating through a File

Let's get a quick preview of a for loop by iterating over a text file.

In [38]:
myfile = open('test.txt')
for line in myfile:
    print(line)
myfile.close()

This is a new line

This is text being appended to test.txt

And another line here


<br>

Because we iterated over the file object instead of calling `.read()`, the lines were read one at a time and the whole text file was never stored in memory at once.
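
The blank lines in the output above appear because each line keeps its trailing `\n` and `print()` adds another one. A possible way to avoid that is to strip the newline while looping, as in this sketch (the scratch file is an assumption made for the demonstration).

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('This is a new line\nThis is text being appended\nAnd another line here')

lines = []
with open(path) as f:
    for line in f:                 # one line is read per iteration
        clean = line.rstrip('\n')  # drop the trailing newline, if any
        lines.append(clean)
        print(clean)
```
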


<br></br>

## Python - Opening a file using `with` Statement

<br>
The `with open()` statement in Python is particularly useful because it simplifies the management of file resources.

It uses what's known as a <b><u>Context Manager</u></b> to handle files, which has several advantages over using the plain open function by itself.

- Context manager: Automatically handles opening and closing files, which is useful for managing resources efficiently

<br>

<p style="font-size:20px"><b>`with open()` is generally the preferred method for file handling in Python.<br>
    It ensures that resources are managed efficiently and reduces the risk of resource leaks or other file-related errors.</b></p>
    
<p style="color:blue;font-size:25px"><b>From here on, we'll put all file operations inside the `with open()` block.</b></p>

In [98]:
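with open('test.txt', 'a+') as myfile:
    content = myfile.read()          # '' here: 'a+' starts at the end of the file
    myfile.write('\nThe last line')  # appended after the existing content
    myfile.seek(0)                   # go back to the start before reading
    print(myfile.read())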

This is a new line
This is text being appended to test.txt
And another line here
The last line


<br>

Can we use `myfile` ?
<br>
Is it closed ?

In [40]:
#Error


# Can we use `myfile` ?  --->  NO!


# is it closed ?  --->  YES!
# Because we use 'with' statement


myfile.read()

ValueError: I/O operation on closed file.
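
A useful property of `with` is that the file is closed even when an error occurs inside the block. The sketch below simulates a failure to show this; the scratch file and the `RuntimeError` are assumptions made for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as f:
    f.write('Hello')

try:
    with open(path) as myfile:
        data = myfile.read()
        raise RuntimeError('something went wrong')  # simulated failure
except RuntimeError:
    pass

print(data)           # Hello
print(myfile.closed)  # True: the file was closed despite the error
```
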

<br></br>
<br></br>

# Pandas & Files

In [1]:
import pandas as pd

In [2]:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace', 'Helen', 'Ian', 'Jane'],
    'Age': [25, 30, 35, 40, 45, 50, 55, 60, 65, 70],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose']
}

In [3]:
df = pd.DataFrame(data)
df

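| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
| 4 | Eva | 45 | Phoenix |
| 5 | Frank | 50 | Philadelphia |
| 6 | Grace | 55 | San Antonio |
| 7 | Helen | 60 | San Diego |
| 8 | Ian | 65 | Dallas |
| 9 | Jane | 70 | San Jose |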


## Pandas - `pd.read_csv()` and `df.to_csv()`

A comma-separated values (CSV) file is a plaintext file with a .csv extension that holds tabular data.<br>
This is one of the most popular file formats for storing large amounts of data.<br>
Each row of the CSV file represents a single table row.<br>
The values in the same row are by default separated with commas, but you could change the separator to a semicolon, tab, space, or some other character.
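
As a quick illustration of changing the separator, here is a sketch that uses the `sep` parameter of `to_csv()` and `read_csv()`. To keep it self-contained it works on a tiny two-row DataFrame and an in-memory string instead of a file; both are assumptions made for the demonstration.

```python
import io
import pandas as pd

small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_csv() returns the CSV text as a string
text = small.to_csv(sep=';', index=False)
print(text)

# The same separator must be given when reading the data back
small_back = pd.read_csv(io.StringIO(text), sep=';')
print(small_back)
```

If the separator is not passed to `read_csv()`, pandas assumes commas and would load each row as a single column.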

In [49]:
# Write a CSV file
# This creates the file "data.csv" in your current working directory.
# Check what your CSV file looks like (open it with Notepad)
df.to_csv('data.csv')

In [52]:
# This text file contains the data separated with commas.
# The first column contains the row labels. In some cases, you’ll find them irrelevant.
# If you don’t want to keep them, then you can pass index=False as follows:
df.to_csv('data.csv',index=False)
# Check again your CSV file (open with notepad)

In [66]:
# Read a CSV File
# Once the data is saved in a CSV file, we’ll likely want to load and use it from time to time.
# We can do that with the Pandas read_csv() function
df.to_csv('data.csv')
df = pd.read_csv('data.csv')
df

# What happened?
# We got an extra column named 'Unnamed: 0': it holds the row labels that to_csv() wrote as the first column of the file.
# And, as usual with DataFrames, when no index is specified we also get the default index 0, 1, 2, ...

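| | Unnamed: 0 | Name | Age | City |
|---|---|---|---|---|
| 0 | 0 | Alice | 25 | New York |
| 1 | 1 | Bob | 30 | Los Angeles |
| 2 | 2 | Charlie | 35 | Chicago |
| 3 | 3 | David | 40 | Houston |
| 4 | 4 | Eva | 45 | Phoenix |
| 5 | 5 | Frank | 50 | Philadelphia |
| 6 | 6 | Grace | 55 | San Antonio |
| 7 | 7 | Helen | 60 | San Diego |
| 8 | 8 | Ian | 65 | Dallas |
| 9 | 9 | Jane | 70 | San Jose |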


In [57]:
# Back to original df
df = pd.DataFrame(data)
df

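| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
| 4 | Eva | 45 | Phoenix |
| 5 | Frank | 50 | Philadelphia |
| 6 | Grace | 55 | San Antonio |
| 7 | Helen | 60 | San Diego |
| 8 | Ian | 65 | Dallas |
| 9 | Jane | 70 | San Jose |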


In [65]:
df.to_csv('data.csv')
df = pd.read_csv('data.csv', index_col=0)
df
# The parameter index_col specifies the column from the CSV file that contains the row labels.
# You assign a zero-based column index to this parameter.
# Set index_col when the CSV file contains the row labels, to avoid loading them as data.

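| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
| 4 | Eva | 45 | Phoenix |
| 5 | Frank | 50 | Philadelphia |
| 6 | Grace | 55 | San Antonio |
| 7 | Helen | 60 | San Diego |
| 8 | Ian | 65 | Dallas |
| 9 | Jane | 70 | San Jose |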
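
To see the effect of `index_col` side by side, here is a compact sketch. It uses a tiny two-row DataFrame and an in-memory string instead of `data.csv`; both are assumptions made for the demonstration.

```python
import io
import pandas as pd

small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
text = small.to_csv()  # the row labels are written as an unnamed first column

plain = pd.read_csv(io.StringIO(text))                 # labels come back as a data column
indexed = pd.read_csv(io.StringIO(text), index_col=0)  # labels come back as the index

print(list(plain.columns))    # ['Unnamed: 0', 'Name', 'Age']
print(list(indexed.columns))  # ['Name', 'Age']
```
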


In [61]:
# In case we wanted a txt file
df.to_csv(r'data.txt')
# Check your folder for the new file

## Pandas - `pd.read_excel()` and `df.to_excel()`

In [4]:
# Write an xlsx file
# The argument 'data.xlsx' represents the target file and, optionally, its path.
# The statement below should create the file data.xlsx in your current working directory
# Check it out
df.to_excel('data.xlsx')
# The first column of the file contains the labels of the rows, while the other columns store data.

In [5]:
# Read an Excel File
df = pd.read_excel('data.xlsx', index_col=0)
df

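| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
| 4 | Eva | 45 | Phoenix |
| 5 | Frank | 50 | Philadelphia |
| 6 | Grace | 55 | San Antonio |
| 7 | Helen | 60 | San Diego |
| 8 | Ian | 65 | Dallas |
| 9 | Jane | 70 | San Jose |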


## Pandas - `pd.read_json()` and `df.to_json()`

<p style="font-size:25px"><b>JSON File
<br>JavaScript Object Notation</b></p>

An open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other serializable values).
<br>It is a common data format with a diverse range of functionality in data interchange including communication of web applications with servers.
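
To get a feel for those attribute-value pairs and arrays, the sketch below converts a tiny two-row DataFrame (an assumption made for the demonstration) to JSON text in two layouts, using the `orient` parameter of `to_json()`, and parses the text with the standard `json` module.

```python
import json
import pandas as pd

small = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

# With no path argument, to_json() returns the JSON text as a string
by_column = json.loads(small.to_json())               # default layout: column -> {row label: value}
by_row = json.loads(small.to_json(orient='records'))  # one object per row

print(by_column)  # {'Name': {'0': 'Alice', '1': 'Bob'}, 'Age': {'0': 25, '1': 30}}
print(by_row)     # [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
```
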

In [69]:
# Write a JSON file
# The argument 'data.json' represents the target file and, optionally, its path.
# The statement below should create the file data.json in your current working directory
# Check it out
df.to_json('data.json')
# By default, the file maps each column name to an object of {row label: value} pairs.

In [70]:
# Write a JSON file again; to save it elsewhere (e.g. your desktop), pass a full path instead
df.to_json('data.json')

In [71]:
# Read a json File
df = pd.read_json('data.json')
df

| | Name | Age | City |
|---|---|---|---|
| 0 | Alice | 25 | New York |
| 1 | Bob | 30 | Los Angeles |
| 2 | Charlie | 35 | Chicago |
| 3 | David | 40 | Houston |
| 4 | Eva | 45 | Phoenix |
| 5 | Frank | 50 | Philadelphia |
| 6 | Grace | 55 | San Antonio |
| 7 | Helen | 60 | San Diego |
| 8 | Ian | 65 | Dallas |
| 9 | Jane | 70 | San Jose |


<br></br>
# Delete Files

In [93]:
import os

os.remove("test.txt")
os.remove("data.csv")
os.remove("data.json")
os.remove("data.txt")
#os.remove("data.xlsx")


PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'test.txt'
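
The `PermissionError` above is what Windows typically reports when we try to delete a file that is still open somewhere, for example through a file object that was never closed. A minimal sketch of a safer pattern, assuming a scratch file created only for the demonstration: close the file first, then check that it exists before removing it.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

f = open(path, 'w')
f.write('temporary data')
f.close()                    # close first, then delete

if os.path.exists(path):     # avoids FileNotFoundError for a missing file
    os.remove(path)

print(os.path.exists(path))  # False
```
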

In [86]:
import os
os.rename("Files - Questions.txt", "Files - Questions.ipynb")
