### Python File Handling

Python supports file handling: it lets you read and write files and offers many other options for operating on them. Python makes file handling short and simple. It treats a file as either text or binary, and the distinction matters. A text file is a sequence of lines, and each line is a sequence of characters terminated by an end-of-line (EOL) character, usually the newline character `\n`. The EOL character ends the current line and signals that a new one begins. Let's start with reading and writing files.

#### Using the open() function

Before reading or writing a file, we first have to open it with Python's built-in `open()` function. When opening the file we specify the mode, which states the purpose for which the file is opened.

```python
f = open(filename, mode)
```

The following modes are supported:

1. r: open an existing file for reading. This is the default mode, and it raises an error if the file does not exist.
2. w: open a file for writing. If the file already exists its contents are erased; if it does not exist it is created.
3. a: open a file for appending. Existing data is kept and new data is written at the end; the file is created if it does not exist.
4. r+: open an existing file for reading and writing. The file is not truncated, but writes start at the beginning and overwrite existing data in place.
5. w+: open a file for writing and reading. Existing data is erased.
6. a+: open a file for appending and reading. Existing data is kept.
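
The difference between the write, append, and `r+` modes is easiest to see on a small file. The following is a minimal sketch, assuming a throwaway file named `modes_demo.txt` (a hypothetical name) created in a temporary directory:

```python
import os
import tempfile

# Work in a throwaway directory so no real files are touched.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "modes_demo.txt")  # hypothetical file name

# 'w' creates the file, or erases it if it already exists.
with open(path, "w") as f:
    f.write("first\n")

# 'a' keeps the existing content and writes at the end.
with open(path, "a") as f:
    f.write("second\n")

# 'r+' does not truncate: writing starts at position 0 and
# overwrites the existing characters in place.
with open(path, "r+") as f:
    f.write("FIRST")

with open(path, "r") as f:
    content = f.read()

print(repr(content))  # 'FIRST\nsecond\n'
```

Opening a missing file with `r` or `r+` raises `FileNotFoundError`, whereas `w` and `a` create the file.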

In [None]:
# Open the file named "extensivepythonfundamentals.txt" in read mode.
file = open('extensivepythonfundamentals.txt', 'r')
# Print every line in the file, one by one.
# Each line already ends with a newline, so suppress the one print() adds.
for each in file:
    print(each, end='')
file.close()

The open command will open the file in the read mode and the for loop will print each line present in the file. 

#### Using the `read()` method

There is more than one way to read a file in Python. If you need a single string that contains all the characters in the file, use `file.read()`. The full code looks like this:

In [None]:
# Python code to illustrate the read() method
file = open("extensivepythonfundamentals.txt", "r")
print(file.read())
file.close()

#### Creating a file using write mode

Let's see how to create a file and how write mode works. Run the following in your Python environment:

In [None]:
# Python code to create a file
file = open('samplefile.txt', 'w')
# write() does not add a line ending, so include "\n" where a line should end.
file.write("This is the write command\n")
file.write("It allows us to write in a particular file\n")
file.close()

# close() flushes any pending writes and releases the
# system resources held by the open file.

#### Working with append mode

Let us see how append mode works:

In [None]:
# Python code to illustrate append mode
file = open('samplefile.txt', 'a')
file.write("This will add this line\n")
file.close()

Strings also provide methods that are useful when processing the lines read from a file:

- rstrip(): removes trailing characters (characters at the end of a string). With no argument it removes trailing whitespace, including spaces, tabs, and newlines; with an argument it removes any trailing characters that appear in the given set.

Syntax is:

```python
string.rstrip(characters)
```

```python
txt = "     banana     "
x = txt.rstrip()
print("of all fruits", x, "is my favorite")
```
```python
txt = "banana,,,,,ssqqqww....."
x = txt.rstrip(",.qsw")
print(x)
```
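- lstrip(): removes leading characters (characters at the start of a string). With no argument it removes leading whitespace; with an argument it removes any leading characters that appear in the given set.

```python
string.lstrip(characters)
```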

```python
txt = "     banana     "
x = txt.lstrip()
print("of all fruits", x, "is my favorite")
```
```python
# The characters to remove must be at the start of the string for lstrip() to act.
txt = ",,,,,ssqqqww.....banana"
x = txt.lstrip(",.qsw")
print(x)
```
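
In file handling, these methods are mostly used to drop the newline at the end of each line. The following is a small sketch on hypothetical sample lines, written as a list so that it runs without a file on disk:

```python
# Hypothetical lines, shaped like what iterating over a text file returns.
lines = ["  alpha  \n", "beta\t\n", "gamma"]

# rstrip("\n") removes only the trailing newline and keeps other whitespace.
no_newline = [line.rstrip("\n") for line in lines]

# strip() with no argument removes whitespace from both ends.
fully_stripped = [line.strip() for line in lines]

print(no_newline)
print(fully_stripped)
```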

#### Using the `with` statement

The `with` statement provides cleaner syntax and exception handling when you work with resources such as files, which is why it is good practice to use it wherever it applies. Any file opened in a `with` statement is closed automatically when the block ends, even if an error occurs inside it, so cleanup is automatic.

In [None]:
# Python code to illustrate the with statement
with open("samplefile.txt") as file:
    data = file.read()
# The file is closed here; do something with data

In [None]:
# Python code to illustrate the with statement along with write()
with open("samplefile.txt", "w") as f:
    f.write("Hello World. This is an example and this is!!!")
# This will overwrite what was there before
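
The automatic cleanup can be checked through the `closed` attribute of the file object. This is a minimal sketch, assuming a temporary file named `with_demo.txt` (a hypothetical name) and a simulated error:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "with_demo.txt")  # hypothetical file name

with open(path, "w") as f:
    f.write("hello\n")
# Leaving the block closes the file.
closed_after_block = f.closed

try:
    with open(path) as f2:
        # Simulate a failure while the file is still open.
        raise ValueError("simulated failure while processing")
except ValueError:
    pass
# The file is closed even though the block raised an exception.
closed_after_error = f2.closed

print(closed_after_block, closed_after_error)  # True True
```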

#### Using `split()` in file handling

We can also split the lines we read from a file. With no argument, `split()` breaks a string wherever whitespace occurs; you can also pass any other separator you wish. Here is the code:

In [None]:
# Python code to illustrate split() function
with open("samplefile.txt", "r") as file:
    data = file.readlines()
    for line in data:
        word = line.split()
        print (word)
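
Splitting on a separator other than whitespace works the same way. The following sketch uses hypothetical sample strings and also shows the optional `maxsplit` argument:

```python
# Hypothetical sample lines, as they might be read from a file.
spaced_line = "one two   three\n"
comma_line = "2019,CA,act_2019_ca.csv\n"

# With no argument, split() separates on any run of whitespace.
words = spaced_line.split()

# Strip the newline first, then split on the chosen separator.
fields = comma_line.strip().split(",")

# maxsplit limits the number of splits; the rest stays in one piece.
first, rest = comma_line.strip().split(",", maxsplit=1)

print(words)
print(fields)
print(first, rest)
```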

## Finding Files

#### Listing a directory

Before you can manipulate a file or folder, you first need to know what the folder contains. To do that, let's create a function that lists the files within a folder.

In [None]:
import os

# The function takes the name of the folder we want to list as its parameter.
def list_directory(folder):
    # os.listdir returns the names of the entries in the folder.
    # cf is the current entry (file or subfolder) on each pass of the loop.
    for cf in os.listdir(folder):
        # Display the name of each entry in the folder.
        print(cf)

# That is all it takes to list the contents.
# Now call list_directory and pass the folder we want to read. Here the folder
# sits inside the current project directory, hence the leading ./ in the path.
# Otherwise, pass the full path of the folder you want to read.
list_directory('./filestoread/')

#### Using string methods to look for file names within a folder

In [None]:
# Start by importing the os library.
import os

# Create a function called stringsearchendswith that takes
# the folder name and the search criterion.
def stringsearchendswith(folder, searchstring):
    # List the entries in the folder with os.listdir.
    for cf in os.listdir(folder):
        # endswith checks whether the current file name ends with the
        # search string; if it does, print the file name.
        if cf.endswith(searchstring):
            print(cf)

# We can also match at the beginning of the name rather than the end.
# stringsearchstartswith does that.
def stringsearchstartswith(folder, searchstring):
    # List the entries in the folder, as before.
    for cf in os.listdir(folder):
        # startswith checks whether the current file name starts with
        # the search string; if it does, print the file name.
        if cf.startswith(searchstring):
            print(cf)

# To see this in action, invoke the functions one after the other.

#stringsearchendswith('./filestoread/', '.csv')
stringsearchstartswith('./filestoread/', 'act_2019')

#### Using the Python `fnmatch()` function to find files

The basic string methods may not find all the files that you need, so Python offers some alternatives. One of them is the `fnmatch` module, which matches file names against patterns that contain wildcards. Let's take a look at how to use it to find files.

In [None]:
# We need the os and fnmatch modules.
import os, fnmatch

# Create a function called fnmatchsearch.
def fnmatchsearch(folder, searchstring):  # searchstring = search pattern
    # For each entry found within the folder...
    for cf in os.listdir(folder):
        # ...call fnmatch.fnmatch with the current file name and the pattern.
        if fnmatch.fnmatch(cf, searchstring):
            # If it returns True, print the current file name.
            print(cf)

# Test it by passing the folder to inspect and the pattern. In this case,
# all files whose names end with the .csv extension.
#fnmatchsearch('./filestoread/', '*.csv')

# What makes fnmatch powerful is that patterns can contain wildcards.
# For instance, search for any file whose name ends with _data.csv.
#fnmatchsearch('./filestoread/', '*_data.csv')

# Add another wildcard to get the .csv files that include 2 in their name.
#fnmatchsearch('./filestoread/', '*2*.csv')

# More advanced: names that start with any string, contain the _data
# substring, continue with any other substring, and have any extension.
#fnmatchsearch('./filestoread/', '*_data*.*')

# Add an underscore after the word data, so that only file names
# that include _data_ are returned.
#fnmatchsearch('./filestoread/', '*_data_*.*')

# A different pattern: names that contain 2_ followed later by another underscore.
#fnmatchsearch('./filestoread/', '*2_*_*.*')
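
The same patterns can be tried without a folder on disk by using `fnmatch.filter` on a list of names. This is a small sketch; the file names in the list are hypothetical:

```python
import fnmatch

# Hypothetical file names, standing in for the result of os.listdir().
names = [
    "act_2019_ca.csv",
    "sat_2019_data.csv",
    "notes.txt",
    "act_2018_data_v2.csv",
]

# * matches any run of characters.
csv_files = fnmatch.filter(names, "*.csv")

# Names that contain _data somewhere before the extension.
data_files = fnmatch.filter(names, "*_data*.*")

# [89] matches exactly one character, either 8 or 9.
act_files = fnmatch.filter(names, "act_201[89]_*")

print(csv_files)
print(data_files)
print(act_files)
```

`fnmatch.filter` returns the matching names in their original order, which is convenient when the list comes from `os.listdir`.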

#### Using the `glob()` function for pattern matching to search for files

In [None]:
from pathlib import Path

def globpattern_match(folder, searchstring):
    # Build a Path object for the folder we want to inspect,
    # then call its glob method with the pattern.
    path = Path(folder)
    for x in path.glob(searchstring):
        print(x)

# This pattern matches files whose names start with any substring, contain 2,
# and have an extension that starts with the letter c.
#globpattern_match('./filestoread/', '*2*.c*')

# Now let's check in the subfolder.
globpattern_match('./filestoread/subfolder', '*_data_*.c*')

#globpattern_match('./filestoread/subfolder', '*1_t*data_*c*.t*')
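
`Path.glob` only looks in the folder it is called on, while `Path.rglob` also descends into subfolders. The sketch below builds a small temporary folder tree (all names are hypothetical) to show the difference:

```python
import tempfile
from pathlib import Path

# Build a throwaway folder tree with hypothetical file names.
root = Path(tempfile.mkdtemp())
(root / "subfolder").mkdir()
(root / "top_data_1.csv").write_text("a,b\n")
(root / "subfolder" / "nested_data_2.csv").write_text("a,b\n")
(root / "subfolder" / "readme.txt").write_text("notes\n")

# glob() matches only entries directly inside root.
top_level = sorted(p.name for p in root.glob("*.csv"))

# rglob() applies the same pattern to root and every subfolder.
recursive = sorted(p.name for p in root.rglob("*.csv"))

print(top_level)
print(recursive)
```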

## Working with Files and Folders

In this session we will learn how to work with files and folders as part of daily operations. We will learn:

1. How to get file attributes
2. How to traverse and navigate a directory
3. How to use Python to copy files
4. How to move, rename, and delete files

OK, let's jump right into it.

#### Getting File Attributes

In [None]:
import os
from datetime import datetime, timezone

def return_date(timestmp):
    # Convert a POSIX timestamp into a UTC date formatted as day month year.
    return datetime.fromtimestamp(timestmp, tz=timezone.utc).strftime('%d %m %Y')

def display_file_attrs(folder):
    # os.scandir yields one entry per item in the folder.
    with os.scandir(folder) as entries:
        for file_item in entries:
            if file_item.is_file():
                file_attribute = file_item.stat()
                print(f'Modified Date {return_date(file_attribute.st_mtime)} {file_item.name}')

display_file_attrs('./filestoread/subfolderfortraining/')
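
The stat result carries more than the modification time. The following sketch, assuming a temporary file named `attrs_demo.txt` (a hypothetical name), reads the size in bytes as well:

```python
import os
import tempfile
from datetime import datetime, timezone

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "attrs_demo.txt")  # hypothetical file name
with open(path, "w") as f:
    f.write("12345")

info = os.stat(path)

# st_size is the file size in bytes.
size_in_bytes = info.st_size

# st_mtime is a POSIX timestamp; convert it to a timezone-aware UTC datetime.
modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)

print(f"{size_in_bytes} bytes, modified {modified:%d %m %Y}")
```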

#### Traversing/Navigating a Directory

To traverse a directory, we loop over the results of the `os.walk` method for the folder we want to navigate. For every folder in the tree, `os.walk` yields the folder path, the subfolders it contains, and the files it contains.

In [None]:
import os

def traversenavigate(folder):
    for folderpath, dirs, files in os.walk(folder):
        print(f'This is a Folder: {folderpath}')  # print the folder path
        for file in files:  # loop through each file in that folder
            print(f'\t{file}')

traversenavigate('./filestoread/')
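
Because `os.walk` visits every folder in the tree, it is a natural fit for summaries. As an illustration, this sketch counts files by extension in a temporary folder tree; the folder and file names are hypothetical:

```python
import os
import tempfile
from collections import Counter

# Build a throwaway folder tree with hypothetical file names.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "subfolder"))
for rel in ["a.csv", "b.txt", os.path.join("subfolder", "c.csv")]:
    with open(os.path.join(root, rel), "w") as f:
        f.write("x")

counts = Counter()
for folderpath, dirs, files in os.walk(root):
    for name in files:
        # splitext returns (name without extension, extension).
        counts[os.path.splitext(name)[1]] += 1

print(dict(counts))
```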

#### Copying Files

In [None]:
import shutil

def copyfile(source, destination):
    # Copy a single file; destination may be a folder or a full file name.
    shutil.copy(source, destination)

def copyfoldercontent(source, destination):
    # Copy a whole folder tree. By default the destination must not exist yet.
    shutil.copytree(source, destination)

#copyfile('./filestoread/MikeTroutData.csv', './filestoread/subfolderfortraining/')
copyfoldercontent('./filestoread/dataset/', './filestoread/subfolderfortraining/newdatasetfolder')
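
Running the folder copy twice fails, because `shutil.copytree` refuses to write into an existing destination by default. The sketch below works in a temporary directory with hypothetical names and assumes Python 3.8 or later for the `dirs_exist_ok` argument:

```python
import shutil
import tempfile
from pathlib import Path

# Build a throwaway source folder with a hypothetical file.
root = Path(tempfile.mkdtemp())
src_dir = root / "dataset"
src_dir.mkdir()
(src_dir / "data.csv").write_text("a,b\n1,2\n")

# Copying a file into an existing folder keeps the file name.
dest_dir = root / "backup"
dest_dir.mkdir()
copied = shutil.copy(src_dir / "data.csv", dest_dir)

# The first copytree call creates the destination folder.
shutil.copytree(src_dir, root / "dataset_copy")

# A second call succeeds only with dirs_exist_ok=True (Python 3.8+).
shutil.copytree(src_dir, root / "dataset_copy", dirs_exist_ok=True)

print(Path(copied).name)
```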

#### Moving Files (the same as cutting a file)

In [None]:
import shutil

def movefile(source, destination):
    shutil.move(source, destination)

#movefile('./filestoread/MikeTroutData.csv', './filestoread/subfolderfortraining/MikeTroutData.csv')
#movefile('./filestoread/dataset/', './filestoread/subfolderfortraining/') #Move entire folder
#movefile('./filestoread/subfolderfortraining/dataset/', './filestoread/') #Move the folder back

#### Renaming Files

In [None]:
import os
from pathlib import Path

# This is one way of renaming a file
def filerename(source, destination):
    os.rename(source, destination)

# This is another way of renaming a file
def renamefile(source, destination):
    file = Path(source)
    file.rename(destination)

#filerename('./filestoread/MikeTroutData.csv', './filestoread/MikeTroutDataRename.csv')
filerename('./filestoread/MikeTroutDataRename.csv', './filestoread/MikeTroutData.csv')

#### Deleting Files

One important aspect of working with files is the ability to delete them when the need arises.

In [None]:
import os

def deletefile(file):  # pass the name of the file to delete
    if os.path.isfile(file):  # verify that the path you want to delete is actually a file
        # Wrap the deletion in a try ... except statement. We will discuss this later.
        try:
            os.remove(file)
        except OSError as e:
            print(f'Error: {file} : {e.strerror}')
    else:
        print(f'Error: {file} is not a valid file or the file no longer exists')

deletefile('./filestoread/MikeTroutData - Copy.csv')
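
`pathlib` and `shutil` offer alternatives for the same task. This is a sketch with hypothetical names in a temporary directory; `missing_ok` assumes Python 3.8 or later, and `shutil.rmtree` deletes a folder together with everything inside it, so use it with care:

```python
import shutil
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())

# Delete a single file with Path.unlink().
target = root / "old.csv"  # hypothetical file name
target.write_text("x")
target.unlink()

# With missing_ok=True (Python 3.8+), deleting a missing file is not an error.
target.unlink(missing_ok=True)

# os.remove and Path.unlink do not delete folders; shutil.rmtree does.
folder = root / "old_folder"  # hypothetical folder name
folder.mkdir()
(folder / "f.txt").write_text("x")
shutil.rmtree(folder)

print(target.exists(), folder.exists())  # False False
```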

## Working with Archive files in Python

An archive is a single file that bundles a collection of files, usually in compressed form so that they take less space on your computer. ZIP and RAR files are examples, and ZIP is one of the most widely used archive formats.

In this session, we will learn:

1. How to create a zip file
2. How to add files to an existing zip file
3. How to read the contents of a zip file
4. How to extract the contents of a zip file

Let's get started.

#### Creating a Zipped File

In [None]:
import zipfile

filestozip = ['./binarizedcsvfile', 
    './diabetes-data.csv', 
    './normalizedcsvfile', 
    './robustscaledcsvfile']

def createzip(nameofzipfile, filestocompress, opt):
    # Parameters: the name of the zip file, the list of files to compress,
    # and the mode in which to open the archive ('w' creates a new one).
    # ZIP_DEFLATED turns compression on; the default, ZIP_STORED, only stores files.
    with zipfile.ZipFile(nameofzipfile, opt, compression=zipfile.ZIP_DEFLATED, allowZip64=True) as archive:
        for file in filestocompress:
            archive.write(file)

createzip('./filestoread/extensivepythonfundamentals.zip', filestozip, 'w')

#### Adding files to an existing Zipped File

In [None]:
import zipfile

filestoadd = ['./Nwama_Grace_Reference_Letter.docx',
             './Ogunleye Documents PP.docx']

def addtozippedfile(nameofexistingzippedfile, listfilestoadd, opt):
    with zipfile.ZipFile(nameofexistingzippedfile, opt) as archive:
        for file in listfilestoadd:  # for each file to add to the existing zip
            # The archive stores names without a leading './', so compare using
            # the name the file would get inside the archive.
            arcname = zipfile.ZipInfo.from_file(file).filename
            if arcname not in archive.namelist():
                # Not in the archive yet: add it with the write method.
                archive.write(file)
            else:
                # Already in the archive: report it instead of adding a duplicate.
                print(f'File exists in zip: {file}')

addtozippedfile('./filestoread/extensivepythonfundamentals.zip', filestoadd, 'a')

#### Reading a zipped file

In [None]:
import zipfile

def readzippedfile(nameofthezippedfiletoread):  # the name of the zip file to read
    with zipfile.ZipFile(nameofthezippedfiletoread, 'r') as archive:  # open the zip file in read mode
        namelist = archive.namelist()  # the names of the files in the archive
        for filename in namelist:
            zipinfo = archive.getinfo(filename)  # details of the current file
            # Print the original size and the compressed size of the file.
            print(f'{filename}: filesize => {zipinfo.file_size} bytes, compressed size => {zipinfo.compress_size} bytes')

readzippedfile('./filestoread/extensivepythonfundamentals.zip')

#### Extracting the content of a Zipped File

In [None]:
import zipfile

#This will only extract one file
def extractzippedfile(nameofthezippedfiletoextract, nameofthefilewithinthezip, locationtoextractfileto):
    with zipfile.ZipFile(nameofthezippedfiletoextract, 'r') as archive:
        archive.extract(nameofthefilewithinthezip, path=locationtoextractfileto) #Then invoke the extract method

#if we want to extract all the files contained in the zipped file
def extractallzippedfile(nameofthezippedfiletoextract, locationtoextractfileto):
    with zipfile.ZipFile(nameofthezippedfiletoextract, 'r') as archive:
        archive.extractall(path=locationtoextractfileto)

#extractzippedfile('./filestoread/extensivepythonfundamentals.zip', 'diabetes-data.csv', 'filestoread/extractedfolder')
extractallzippedfile('./filestoread/extensivepythonfundamentals.zip', 'filestoread/extractedfolder')
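
The four steps can be tried end to end without touching real data. The following is a minimal sketch in a temporary directory; the file and archive names are hypothetical, and `ZIP_DEFLATED` assumes the `zlib` module is available, which is normally the case:

```python
import tempfile
import zipfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
source = root / "report.txt"      # hypothetical file to archive
source.write_text("hello " * 1000)
archive_path = root / "demo.zip"  # hypothetical archive name

# Create the archive with compression switched on.
with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    # arcname sets the name stored inside the archive.
    archive.write(source, arcname="report.txt")

# Read the listing and the sizes, then extract everything.
with zipfile.ZipFile(archive_path, "r") as archive:
    names = archive.namelist()
    info = archive.getinfo("report.txt")
    archive.extractall(path=root / "extracted")

restored = (root / "extracted" / "report.txt").read_text()

print(names, info.file_size, info.compress_size)
```

The repetitive sample text compresses well, so the compressed size is far smaller than the original size.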

## Reading and Writing Files

In this session, we will explore how to work with text files, comma-separated values (CSV) files, and JavaScript Object Notation (JSON) files. Finally, we will learn how to persist Python objects in binary files with the Python pickle module. Let's get started.

#### Working with Text Files

In [None]:
def readtext(nameofthefiletoread):
    with open(nameofthefiletoread) as file:
        print(file.read())

def readtextlinebyline(nameofthefiletoread):
    with open(nameofthefiletoread) as file:
        # Iterating over the file object yields one line at a time.
        for line in file:
            print(line, end='')

def writenewtext(nameofthefiletowrite, text):
    with open(nameofthefiletowrite, 'w', encoding='utf-8') as file:
        file.write(text)

def appendlinetotext(nameofthefiletowrite, text):
    with open(nameofthefiletowrite, 'a', encoding='utf-8') as file:
        file.write('\n')
        file.write(text)

#readtext('./extensivepythonfundamentals.txt')
#readtextlinebyline('./extensivepythonfundamentals.txt')
#writenewtext('./sample.txt', 'this is a just to test the file...')
#appendlinetotext('./sample.txt', 'this is just to append an extra line')

#### Working with the CSV files

Let's see how to work with comma-separated values files, also known as CSV files.

In [None]:
import csv

def readcsvfile(nameofcsvfile, delimiter):  # the csv file to read and its delimiter
    # newline='' lets the csv module handle line endings itself.
    with open(nameofcsvfile, newline='') as csvfile:
        cnt = -1  # start at -1 so that the header row is not counted as data
        rows = csv.reader(csvfile, delimiter=delimiter)  # read the rows with csv.reader
        for row in rows:  # loop over every row of the csv file
            # The first row (cnt == -1) is the header and the rest are data.
            # Each row is a list of strings, so join works for any number of columns.
            print(' | '.join(row))
            cnt += 1  # count the row
        print(f'{cnt} lines')

def writetocsvfile(nameofcsvfile, header, row):
    with open(nameofcsvfile, mode='w', newline='') as csvfile: #We need to open the csv file in write mode
        writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(header)
        writer.writerow(row)
        
#readcsvfile('./filestoread/act_2019_ca.csv', ',')
#writetocsvfile('./personalinfo.csv', 
#['Firstname', 'lastname', 'Date of Birth', 'Phone Number'], 
#['Olalekan Samuel', 'Ogunleye', '14/07/1976', '08145674532'])
readcsvfile('./personalinfo.csv', ',')
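
When the first row is a header, `csv.DictReader` and `csv.DictWriter` let you address values by column name instead of by index. This is a small sketch with a hypothetical file name and made-up sample rows:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "people.csv")  # hypothetical file name
fieldnames = ["firstname", "lastname"]

# DictWriter takes each row as a dictionary keyed by column name.
with open(path, "w", newline="") as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({"firstname": "Ada", "lastname": "Lovelace"})
    writer.writerow({"firstname": "Grace", "lastname": "Hopper"})

# DictReader uses the header row as keys, so no counter is needed.
with open(path, newline="") as csvfile:
    rows = list(csv.DictReader(csvfile))

for row in rows:
    print(f'{row["firstname"]} | {row["lastname"]}')
```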

#### Working with JSON Files

Let's explore how we can work with JavaScript Object Notation, also known as JSON.

In [None]:
import json

def readjson(nameofjsonfile, printnice, sort):
    # Parameters: the name of the JSON file, whether to pretty-print it,
    # and whether to sort the keys when pretty-printing.
    with open(nameofjsonfile) as jsonfile:
        data = json.load(jsonfile)
        print(json.dumps(data, sort_keys=sort, indent=4) if printnice else data)

#def update_author_json(nameofjsonfile, arr_name, pos, key, value):
#    with open(nameofjsonfile, 'r') as read_file:
#        data = json.load(read_file)
#    data[arr_name][pos][key] = value
#    with open(nameofjsonfile, 'w') as write_file:
#        json.dump(data, write_file)

readjson('./filestoread/esp-abuse.json', True, True)
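
Updating a JSON file follows a read, modify, write cycle. The sketch below is one possible version of that cycle; the function name `update_json_value`, the file name, and the sample data are all hypothetical:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "authors.json")  # hypothetical file name

# Made-up sample data, written out so the example is self-contained.
sample = {"authors": [{"name": "A", "books": 1}, {"name": "B", "books": 2}]}
with open(path, "w") as f:
    json.dump(sample, f, indent=4)

def update_json_value(filename, arr_name, pos, key, value):
    # Read the whole document into memory.
    with open(filename) as read_file:
        data = json.load(read_file)
    # Change one value inside the named array.
    data[arr_name][pos][key] = value
    # Write the whole document back, replacing the old contents.
    with open(filename, "w") as write_file:
        json.dump(data, write_file, indent=4)

update_json_value(path, "authors", 1, "books", 3)

with open(path) as f:
    updated = json.load(f)

print(updated["authors"][1])
```

Closing the file after reading and reopening it for writing keeps the two steps separate, so the file is only truncated once the new data is ready.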


#### Persisting Python Objects into Binary Files

Persisting an object simply means saving its internal state to disk, saving it to a database, or sending it over the network.
The first question you may be asking yourself is why we need to persist Python objects. As a developer, it is important to be able to save the internal state of your application to disk or a database, or to send it over the network. This is where persisting an object comes in handy.

So let's start by importing the pickle module.

We will create an `Employee_Details` class that includes age, name, dependants, previous employers, and previous salaries.

The idea is to use the Python pickle module to serialize an instance of this class and store the data in a binary file.

###### `serialize(obj)`
To do that we create a serialize function to which we pass an object. It invokes the `pickle.dumps` function, which converts the object into a bytes object using the chosen serialization protocol. We then print the serialized object and return it.

###### `deserialize(obj)`
We also do the reverse: take the bytes object and convert it back into a Python object. To do this, we invoke the `loads` function in the pickle module, pass it the bytes object, then print the resulting Python object and return it.

###### `deserialize_prev_employers(obj)`
We can also create a function that deserializes the object and reads one particular property, in this case the previous employers. It uses the same `loads` function in the pickle module and prints the `prev_employers` property of the deserialized object.

###### `obj_to_file(fn, obj)`

We also need a function that saves the object into a binary file. It opens the file in write-binary mode and calls the `dump` function in the pickle module, passing the object, the reference to the file, and the serialization protocol.

###### `file_to_obj`
Finally, we need to do the opposite and turn the binary file back into a Python object. To do this, we open the file in read-binary mode and call the `load` function of the pickle module, passing the reference to the binary file. Then we print the deserialized Python object and return it.

###### Running the code
We can then invoke the serialize function, passing an instance of the `Employee_Details` class as its parameter. Then we call the deserialize function and pass the serialized object into it. We can also invoke the `deserialize_prev_employers` function, again passing the serialized object.

Once we execute the script, we see the serialized object first, followed by the deserialized object and then the deserialized previous employers.

Now we need to test the `obj_to_file` and `file_to_obj` functions. For `obj_to_file`, we pass the name of the binary file to which the instance of the `Employee_Details` class should be written, and the instance itself as the second parameter.

Then we call `file_to_obj` with the name of that binary file; it reads the contents back and rebuilds the object.

In [6]:
import pickle

class Employee_Details:
    age = 44
    name = 'Olalekan Samuel'
    dependants = ['Sandy', 'Mandy', 'Wendy']
    prev_employers = {'NIOH': 2022, 'CSIR': 2014, 'MONASH': 2008}
    prev_salaries = (70000, 85000)

# serialize: convert the object into bytes with pickle.dumps,
# then print the bytes and return them.
def serialize(obj):
    pickled = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    print(f'Serialized object: \n{pickled}\n')
    return pickled

# deserialize: convert the bytes back into a Python object with pickle.loads,
# then print the object and return it.
def deserialize(obj):
    unpickled = pickle.loads(obj)
    print(f'Deserialized: \n{unpickled}\n')
    return unpickled

# deserialize_prev_employers: deserialize the bytes with pickle.loads and
# print only the prev_employers property of the resulting object.
def deserialize_prev_employers(obj):
    unpickled = pickle.loads(obj)
    print(f'Deserialized Previous Employers: \n{unpickled.prev_employers}\n')

# obj_to_file: open the file in write-binary mode and call pickle.dump with
# the object, the file reference, and the serialization protocol.
def obj_to_file(fn, obj):
    with open(fn, 'wb') as pf:
        pickle.dump(obj, pf, protocol=pickle.HIGHEST_PROTOCOL)

# file_to_obj: open the file in read-binary mode and rebuild the object with
# pickle.load, then print it and return it. The second parameter is not needed
# to load the object; it is optional and ignored.
def file_to_obj(fn, obj=None):
    with open(fn, 'rb') as pf:
        loaded = pickle.load(pf)
        print(loaded)
        return loaded

#pickled = serialize(Employee_Details())
#deserialize(pickled)
#deserialize_prev_employers(pickled)

#obj_to_file('./filestoread/employee_details.xyz', Employee_Details())
#employee = file_to_obj('./filestoread/employee_details.xyz')

<__main__.Employee_Details object at 0x0000028BC7BB03D0>

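
The round trip can be verified on built-in types, which keeps the example independent of any class definition. This is a minimal sketch; the dictionary contents and the file name are hypothetical. Note that unpickling can execute arbitrary code, so only load pickle data that comes from a source you trust.

```python
import os
import pickle
import tempfile

# Made-up record using only built-in types.
record = {
    "name": "example",
    "prev_employers": {"A": 2008, "B": 2014},
    "prev_salaries": (70000, 85000),
}

# In-memory round trip: object -> bytes -> object.
blob = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# File round trip: object -> binary file -> object.
path = os.path.join(tempfile.mkdtemp(), "record.pkl")  # hypothetical file name
with open(path, "wb") as pf:
    pickle.dump(record, pf, protocol=pickle.HIGHEST_PROTOCOL)
with open(path, "rb") as pf:
    from_file = pickle.load(pf)

# The restored objects are equal to the original but are separate copies.
print(restored == record, restored is record)
```

Unlike JSON, pickle preserves Python-specific types, which is why the tuple of salaries comes back as a tuple rather than a list.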