# File I/O in Python

## Files

Files are named locations on disk that store related information. They are used to store data permanently in non-volatile memory (e.g. a hard disk).

Random Access Memory (RAM) is volatile: it loses its data when the computer is turned off. We therefore store data in files so that it can be used again later.

When we want to read from or write to a file, we need to open it first. When we are done, it needs to be closed so that the resources tied to the file are freed.

Hence, in Python, a file operation takes place in the following order:

    1. Open a file
    2. Read or write (perform operation)
    3. Close the file
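
The three steps above can be sketched as follows. This is a minimal illustration, assuming a hypothetical file name (example.txt) inside a temporary directory so that nothing in the working directory is touched.

```python
import os
import tempfile

# Hypothetical file inside a throwaway directory, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

# 1. Open the file (here for writing, in text mode).
f = open(path, "w", encoding="utf-8")
# 2. Perform the operation.
written = f.write("hello\n")
# 3. Close the file so that its resources are released.
f.close()

print(written)   # number of characters written
print(f.closed)  # True once close() has been called
```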


## Opening Files in Python

Python has a built-in open() function to open a file. This function returns a file object, also called a handle, which is used to read or modify the file.



In [None]:
f = open("test.txt")    # open file in current directory
f = open("C:/Python38/README.txt")  # specifying full path

We can specify the mode while opening a file. The mode states whether we want to read ('r'), write ('w') or append ('a') to the file. We can also specify whether we want to open the file in text mode ('t') or binary mode ('b').

The default is reading in text mode. In this mode, we get strings when reading from the file.

On the other hand, binary mode returns bytes and this is the mode to be used when dealing with non-text files like images or executable files.

In [None]:
f = open("test.txt")      # equivalent to 'r' or 'rt'
f = open("test.txt",'w')  # write in text mode
f = open("img.bmp",'r+b') # read and write in binary mode
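
To make the difference between text mode and binary mode concrete, here is a small sketch that reads the same file both ways. The file name modes.txt and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("abc")

# Text mode: read() returns a str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode: read() returns bytes and no decoding takes place.
with open(path, "rb") as f:
    raw = f.read()

print(type(text), text)
print(type(raw), raw)
```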

In Python, unlike in some other languages, the character 'a' does not imply the number 97 until it is encoded using ASCII (or an equivalent encoding).

Moreover, the default encoding is platform dependent. On Windows it is commonly cp1252, while on Linux it is usually utf-8.

So we must not rely on the default encoding, or else our code may behave differently on different platforms.

Hence, when working with files in text mode, it is highly recommended to specify the encoding explicitly.

In [None]:
f = open("test.txt", mode='r', encoding='utf-8')
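
The following sketch shows what can go wrong when the encoding used for reading differs from the one used for writing. The file name and the sample text are assumptions chosen for illustration; the non-ASCII character is written as an escape so that the example does not depend on the console encoding.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "encoding.txt")
text = "caf\u00e9"  # 'cafe' with an acute accent on the last letter

with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading with the same encoding gives back the original text.
with open(path, "r", encoding="utf-8") as f:
    right = f.read()

# Reading with a different encoding silently produces the wrong characters.
with open(path, "r", encoding="cp1252") as f:
    wrong = f.read()

print(right == text)         # True
print(len(text), len(wrong)) # 4 5: the accented letter occupies two bytes in UTF-8
print(ascii(wrong))          # 'caf\xc3\xa9'
```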

## Closing Files in Python

When we are done with performing operations on the file, we need to properly close the file.

Closing a file will free up the resources that were tied with the file. It is done using the close() method available in Python.

Python has a garbage collector to clean up unreferenced objects, but we must not rely on it to close the file.

In [None]:
f = open("test.txt", encoding = 'utf-8')
# perform file operations
f.close()

This method is not entirely safe. If an exception occurs when we are performing some operation with the file, the code exits without closing the file.

A safer way is to use a try...finally block.

In [None]:
f = open("test.txt", encoding='utf-8')  # open first, so f is defined in finally
try:
    # perform file operations
    pass
finally:
    f.close()  # runs whether or not an exception occurred
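
A more compact alternative to try...finally is the with statement, which closes the file automatically when the block is left, even if an exception is raised inside it. The sketch below assumes a hypothetical file in a temporary directory and simulates a failure with a ValueError.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# The file is closed as soon as the with block ends.
with open(path, "w", encoding="utf-8") as f:
    f.write("data\n")
closed_after_write = f.closed

# The file is also closed when an exception escapes the block.
try:
    with open(path, encoding="utf-8") as f:
        raise ValueError("simulated failure during file operations")
except ValueError:
    pass
closed_after_error = f.closed

print(closed_after_write, closed_after_error)  # True True
```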

## Writing to Files in Python

In order to write to a file in Python, we need to open it in write ('w'), append ('a') or exclusive creation ('x') mode.

We need to be careful with the 'w' mode, as it truncates the file if it already exists. All the previous data is erased.

Writing a string (or a sequence of bytes for binary files) is done using the write() method. This method returns the number of characters written to the file (or the number of bytes in binary mode).

In [None]:
with open("test.txt", 'w', encoding='utf-8') as f:
    f.write("my first file\n")
    f.write("This file\n\n")
    f.write("contains three lines\n")
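
The three writing modes and the return value of write() can be compared in a short sketch. The file name is hypothetical and lives in a temporary directory.

```python
import os
import tempfile

# Hypothetical file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# 'w' creates the file (or truncates an existing one).
with open(path, "w", encoding="utf-8") as f:
    count = f.write("my first file\n")  # write() returns the character count

# 'a' keeps the existing content and adds to the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("second line\n")

# 'x' refuses to open a file that already exists.
try:
    with open(path, "x", encoding="utf-8") as f:
        f.write("never written\n")
    created = True
except FileExistsError:
    created = False

with open(path, encoding="utf-8") as f:
    content = f.read()

print(count)    # 14
print(created)  # False
print(content)
```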

## Reading Files in Python

To read a file in Python, we must open the file in reading ('r') mode.

There are various methods available for this purpose. We can use the read(size) method to read up to size characters (or bytes in binary mode). If the size parameter is not specified, it reads and returns everything up to the end of the file.

We can read the test.txt file we wrote in the above section in the following way:

In [None]:
f = open("test.txt", 'r', encoding='utf-8')
f.read(4)  # read the first 4 characters
f.read(4)  # read the next 4 characters
f.read()   # read the rest, up to the end of the file
f.read()   # further reading returns an empty string

We can see that the read() method returns a newline as '\n'. Once the end of the file is reached, we get an empty string on further reading.

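We can change our current file cursor (position) using the seek() method. Similarly, the tell() method returns our current position. In binary mode this is the number of bytes from the beginning of the file; in text mode it is an opaque number that is only meaningful when passed back to seek().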

In [None]:
f.tell()    # get the current file position
f.seek(0)   # bring file cursor to initial position
print(f.read())  # read the entire file
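
Here is a self-contained sketch of moving the cursor with seek() and tell(). It assumes a hypothetical copy of the file in a temporary directory with the same content as written earlier.

```python
import os
import tempfile

# Hypothetical copy of the file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my first file\nThis file\n\ncontains three lines\n")

with open(path, "r", encoding="utf-8") as f:
    head = f.read(4)     # first 4 characters
    pos = f.tell()       # remember the position after them
    new_pos = f.seek(0)  # seek() returns the new position
    again = f.read(4)    # the same 4 characters once more
    f.seek(pos)          # a value from tell() can be handed back to seek()
    following = f.read(4)

print(repr(head), repr(again), repr(following))
```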

We can also read a file line by line by iterating over the file object with a for loop. Each line read from the file already ends with a newline character \n, so we pass end='' to the print() function to avoid printing two newlines.
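
A minimal sketch of such a loop is shown below. It assumes a hypothetical copy of the file in a temporary directory; note that the "\n\n" in the written text produces an empty line, so the loop yields four lines in total.

```python
import os
import tempfile

# Hypothetical copy of the file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my first file\nThis file\n\ncontains three lines\n")

lines = []
with open(path, "r", encoding="utf-8") as f:
    for line in f:           # the file object yields one line at a time
        print(line, end="")  # the line already ends with '\n'
        lines.append(line)
```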

Alternatively, we can use the readline() method to read individual lines of a file. This method reads a file till the newline, including the newline character.

In [None]:
f.seek(0)     # go back to the start; the earlier read() left the cursor at the end
f.readline()  # reads and returns the first line
f.readline()  # reads and returns the second line

Lastly, the readlines() method returns a list of remaining lines of the entire file. All these reading methods return empty values when the end of file (EOF) is reached.

In [None]:
f.readlines()  # returns a list of all the remaining lines
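
The behaviour of readline() and readlines() at the end of the file can be checked with a short sketch, again assuming a hypothetical copy of the file in a temporary directory.

```python
import os
import tempfile

# Hypothetical copy of the file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my first file\nThis file\n\ncontains three lines\n")

with open(path, "r", encoding="utf-8") as f:
    first = f.readline()      # one line, including its newline
    rest = f.readlines()      # list of the lines that are left
    eof_line = f.readline()   # empty string at end of file
    eof_lines = f.readlines() # empty list at end of file

print(repr(first))
print(rest)
print(repr(eof_line), eof_lines)
```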

# Pickle and Unpickle in Python

Python comes with a built-in module, pickle, that can be used to perform pickling and unpickling operations.

Pickling is the conversion of a Python object into a byte stream (serialization), and unpickling is the reverse conversion of a byte stream back into an object (deserialization). Both are done with Python's pickle module. Let's take a look at a few examples!

In [2]:
import pickle

athletes = {
    "Name": ["Cristiano Ronaldo", "Lionel Messi", "Eden Hazard", "Luis Suarez", "Neymar"],
    "Club": ["Manchester United", "PSG", "Real Madrid", "Atletico Madrid", "PSG"]
 }

print(athletes)

{'Name': ['Cristiano Ronaldo', 'Lionel Messi', 'Eden Hazard', 'Luis Suarez', 'Neymar'], 'Club': ['Manchester United', 'PSG', 'Real Madrid', 'Atletico Madrid', 'PSG']}


In [3]:
athletes_file = open('athletes.txt', 'wb')  # pickle data is binary, so open with 'wb'
pickle.dump(athletes, athletes_file)        # serialize the dictionary into the file
athletes_file.close()

The load() function reads the contents of a pickled file and returns the object reconstructed from that data. The type of the object as well as its state depend on the contents of the file. Since we saved a dictionary with athlete names, a dictionary with the same entries is reconstructed. Let's read the pickled file we just created back into a Python object and print its contents.

In [4]:
import pickle

athletes_file = open("athletes.txt", "rb")
athletes = pickle.load(athletes_file)
athletes_file.close()
print(athletes)

{'Name': ['Cristiano Ronaldo', 'Lionel Messi', 'Eden Hazard', 'Luis Suarez', 'Neymar'], 'Club': ['Manchester United', 'PSG', 'Real Madrid', 'Atletico Madrid', 'PSG']}
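
The same round trip can be written with the with statement so that the files are closed automatically. This is a sketch under the assumption of a hypothetical file name (athletes.pkl) in a temporary directory; it also checks that the reconstructed object is an equal copy rather than the original object itself.

```python
import os
import pickle
import tempfile

# Hypothetical pickle file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "athletes.pkl")

athletes = {
    "Name": ["Cristiano Ronaldo", "Lionel Messi"],
    "Club": ["Manchester United", "PSG"],
}

with open(path, "wb") as fh:
    pickle.dump(athletes, fh)    # serialize to the binary file

with open(path, "rb") as fh:
    restored = pickle.load(fh)   # rebuild the object from the file

print(type(restored))            # <class 'dict'>
print(restored == athletes)      # True: same type and contents
print(restored is athletes)      # False: a new object was created
```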


## How to read a pickle file in a pandas DataFrame?


In [5]:
import pickle
import pandas as pd

athletes = {
    "Name": ["Cristiano Ronaldo", "Lionel Messi", "Eden Hazard", "Luis Suarez", "Neymar"],        
    "Club": ["Manchester United", "PSG", "Real Madrid", "Atletico Madrid", "PSG"]
}

df = pd.DataFrame(athletes)
print(df)

                Name               Club
0  Cristiano Ronaldo  Manchester United
1       Lionel Messi                PSG
2        Eden Hazard        Real Madrid
3        Luis Suarez    Atletico Madrid
4             Neymar                PSG


As you can see in the output, we get a pandas DataFrame with two columns and five rows, plus the index. After this, the process is similar to how we handled the normal, non-DataFrame objects. We will use file handling along with the dump() and load() functions to first create a pickle file from a pandas DataFrame, and then read the byte stream to get the pandas DataFrame back.

In [6]:
df = pd.DataFrame(athletes)

athletes_df_file = open("athletes_df.txt", "wb")
pickle.dump(df, athletes_df_file)
athletes_df_file.close()

The above code will create a pickle file that will store the Pandas DataFrame as a byte stream in our current directory as athletes_df.txt.

When we want to use this DataFrame again, we can just unpickle this file to get it back.

In [7]:
import pickle  # unpickling a DataFrame also requires pandas to be installed

athletes_df_file = open("athletes_df.txt", "rb")
athletes = pickle.load(athletes_df_file)
athletes_df_file.close()
print(athletes)

                Name               Club
0  Cristiano Ronaldo  Manchester United
1       Lionel Messi                PSG
2        Eden Hazard        Real Madrid
3        Luis Suarez    Atletico Madrid
4             Neymar                PSG


## Pickling into strings and unpickling from strings



In [8]:
import pickle

simple_obj = {1: ['o', 'n', 'e'], "two": (1, 2), 3: "Three"}
pickled_obj = pickle.dumps(simple_obj)
print(pickled_obj)

b'\x80\x04\x95-\x00\x00\x00\x00\x00\x00\x00}\x94(K\x01]\x94(\x8c\x01o\x94\x8c\x01n\x94\x8c\x01e\x94e\x8c\x03two\x94K\x01K\x02\x86\x94K\x03\x8c\x05Three\x94u.'


In [11]:
out = pickle.loads(pickled_obj)
print(out)

{1: ['o', 'n', 'e'], 'two': (1, 2), 3: 'Three'}
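
In Python 3, dumps() returns a bytes object rather than a text string, and loads() accepts such a bytes object. The sketch below checks this and also passes the optional protocol argument of dumps(); the choice of protocol 2 is only an illustration.

```python
import pickle

simple_obj = {1: ['o', 'n', 'e'], "two": (1, 2), 3: "Three"}

pickled_obj = pickle.dumps(simple_obj)   # serialize in memory, no file needed
restored = pickle.loads(pickled_obj)     # deserialize from the bytes object

print(type(pickled_obj))                 # <class 'bytes'>
print(restored == simple_obj)            # True

# An older protocol can be requested explicitly, for example protocol 2.
pickled_v2 = pickle.dumps(simple_obj, protocol=2)
print(pickled_v2[:2])                    # b'\x80\x02': protocol marker and version
print(pickle.loads(pickled_v2) == simple_obj)  # True
```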


# Slicing in Python

Python's built-in slice() function creates a slice object that describes which elements to take from a sequence. It can be called in two ways: slice(stop) takes a single argument, while slice(start, stop, step) takes up to three arguments. The resulting slice object can then be used to get a subsection of a sequence. For example, a slice can be used to get the first two elements of a ten-element list.

In [13]:
# Python slice() function example
# Create slice objects
result = slice(5)         # stop only; start and step are None
result2 = slice(0, 5, 3)  # start=0, stop=5, step=3
# Display the slice objects
print(result)
print(result2)

slice(None, 5, None)
slice(0, 5, 3)
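
A slice object exposes its parts as the attributes start, stop and step, and its indices() method shows how the slice is resolved for a sequence of a given length. The sketch below also covers the case of taking the first two elements of a ten-element list; the variable names are chosen for illustration.

```python
numbers = list(range(10))        # a ten-element list: 0..9

first_two = slice(2)
print(first_two.start, first_two.stop, first_two.step)  # None 2 None
print(numbers[first_two])        # [0, 1]

every_third = slice(0, 5, 3)
print(numbers[every_third])      # [0, 3]

# indices(length) returns the concrete (start, stop, step) for that length.
print(slice(5).indices(10))      # (0, 5, 1)
print(every_third.indices(4))    # (0, 4, 3): the stop is clipped to the length
```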


In [14]:
# Python slice() function example
tup = (45, 68, 955, 1214, 41, 558, 636, 66)
slic = slice(0, 10, 3)    # indices 0, 3, 6 (a stop past the end is clipped)
slic2 = slice(-1, 0, -3)  # indices 7, 4, 1 (a negative step walks backwards)
# Use the slice objects to get elements
str2 = tup[slic]
str3 = tup[slic2]  # elements in reverse order
# Display the results
print(str2)
print(str3)

(45, 1214, 636)
(66, 41, 68)


In [15]:
# Python slice() function example
tup = (45, 68, 955, 1214, 41, 558, 636, 66)
slic = slice(0, 10, 3)  # slice object
slic2 = tup[0:10:3]     # the same elements, using subscript notation
# Use the slice object to get elements
str2 = tup[slic]
# Display the results
print(str2)
print(slic2)

(45, 1214, 636)
(45, 1214, 636)
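
Because a slice object is independent of any particular sequence, the same object can be reused on lists, tuples and strings. A short sketch, with names chosen for illustration:

```python
first_two = slice(2)
reverse = slice(None, None, -1)  # same as the [::-1] subscript

tup = (45, 68, 955, 1214, 41, 558, 636, 66)
word = "python"
nums = [1, 2, 3, 4]

print(tup[first_two])   # (45, 68)
print(word[first_two])  # py
print(nums[first_two])  # [1, 2]

print(word[reverse])    # nohtyp
print(nums[reverse] == nums[::-1])  # True
```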
