## FILE HANDLING

## FILES AND FILE PATHS

A file is used to persist data after a program has finished executing. A file has two key properties:
1. Filename
2. Path (the file's location)


For example, suppose that on my Windows computer I have a file named 'project.docx' stored in the location 'C:\Users\Al\Documents'. Then 'project.docx' is the filename and 'C:\Users\Al\Documents' is the path.  
A filename consists of two parts, the name of the file and its extension, separated by a dot (.). The extension tells us the type of the file. In the example above, the name of the file is project and its extension is docx, which tells us it is a Word document.  
Users, Al and Documents are called folders (or directories). A folder can contain files and other folders. The C:\ in the path is called the root folder, i.e. the folder that contains all other folders.  
On Windows, the root folder is named C:\ and is also called the C: drive. On macOS and Linux, the root folder is /.

## FILE PATH SEPARATORS ON DIFFERENT OPERATING SYSTEMS

On Windows, paths are written using backslashes (`\`) as the separator between folder names, while macOS and Linux use the forward slash (`/`). For our code to be operating-system independent (i.e. to run on any OS), it has to handle both cases. The `pathlib` module takes care of this: a `Path` object built from separate folder names uses the correct separator for the OS the code is running on.

In [5]:
from pathlib import Path
filePath = Path('User', 'TripleA')
print(filePath)

User\TripleA
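Because `pathlib` picks the separator from the operating system the code runs on, the output above shows backslashes on Windows. As a small sketch, the pure path classes `PureWindowsPath` and `PurePosixPath` (both part of `pathlib`) let us see both conventions side by side on any OS, since they never touch the filesystem:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate strings, so both flavours work on any OS.
windows_path = PureWindowsPath('User', 'TripleA')
posix_path = PurePosixPath('User', 'TripleA')

print(windows_path)  # User\TripleA
print(posix_path)    # User/TripleA
```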


In [6]:
Path.cwd()

WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes')

In [7]:
file_path = Path.cwd()
print(file_path)
filenames = ['Assignment.txt', 'Image.jpeg','Review.docx']
for filename in filenames:
    print(Path(file_path, filename))

c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes
c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes\Assignment.txt
c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes\Image.jpeg
c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes\Review.docx


## CURRENT WORKING DIRECTORY

The current working directory (CWD) is the folder a program is currently working in. To get the CWD, we use the `cwd()` method of the `Path` class from the `pathlib` module.

In [8]:
print(Path.cwd())

c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes
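Relative paths are resolved against the CWD, so changing the CWD changes what a relative path points to. The sketch below uses only the standard library: it temporarily switches the CWD with `os.chdir()` to a throwaway folder from `tempfile`, then switches back. The filename `notes.txt` is made up and the file is never created.

```python
import os
import tempfile
from pathlib import Path

original_cwd = Path.cwd()

with tempfile.TemporaryDirectory() as tmp:
    os.chdir(tmp)  # make the temporary folder the CWD
    try:
        inside_cwd = Path.cwd()
        # os.getcwd() returns the same folder as a plain string
        matches_os = Path(os.getcwd()) == inside_cwd
        # A relative filename is interpreted relative to the CWD
        notes_path = Path(os.path.abspath('notes.txt'))
    finally:
        os.chdir(original_cwd)  # leave the folder before it is deleted

print(inside_cwd)
print(notes_path.parent == inside_cwd)  # True
```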


## HOME DIRECTORY
All users have a folder for their own files on the computer, called the home directory or home folder. `Path` has a `home()` method that returns the path of the current user's home folder.



In [9]:
print(Path.home())

C:\Users\fadhl


## CREATING FOLDERS

To create a folder, together with any missing parent folders along the way, we use the `makedirs()` function from the `os` module.

In [10]:
import os
os.makedirs(Path(Path.home(), 'Wednesday', 'File handling'))

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\fadhl\\Wednesday\\File handling'
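The `FileExistsError` above is raised because the folder had already been created by an earlier run of the cell. `os.makedirs()` accepts an `exist_ok=True` argument that skips this error when the folder already exists. A minimal sketch, using a temporary folder so it can be run repeatedly:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp, 'Wednesday', 'File handling')
    os.makedirs(nested)                 # creates Wednesday and File handling
    os.makedirs(nested, exist_ok=True)  # second call: no FileExistsError
    created = nested.is_dir()

print(created)  # True
```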

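We can also use the `mkdir()` method of a `Path` object to create a folder. Unlike `os.makedirs()`, which can create a folder together with any missing parent folders, `mkdir()` by default creates only the last folder in the path and raises a `FileNotFoundError` if a parent folder does not exist.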

In [11]:
from pathlib import Path
Path(Path.home(), 'Myfolder').mkdir()
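`mkdir()` also accepts `parents=True` (create any missing parent folders) and `exist_ok=True` (do not fail if the folder already exists), which makes it behave like `os.makedirs(..., exist_ok=True)`. A sketch in a temporary folder:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, 'Python', 'Assessments', 'Functions')

    try:
        target.mkdir()  # fails: 'Python' and 'Assessments' do not exist yet
    except FileNotFoundError:
        plain_mkdir_failed = True
    else:
        plain_mkdir_failed = False

    target.mkdir(parents=True, exist_ok=True)  # creates all missing folders
    target.mkdir(parents=True, exist_ok=True)  # safe to repeat
    created = target.is_dir()

print(plain_mkdir_failed, created)  # True True
```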

In [12]:
file_path = Path('Python', 'Assessments','Functions')
print(file_path)

Python\Assessments\Functions


In [13]:
new_file_path = Path(Path.home(), file_path)
print(new_file_path)

C:\Users\fadhl\Python\Assessments\Functions


In [14]:
os.makedirs(new_file_path)

FileExistsError: [WinError 183] Cannot create a file when that file already exists: 'C:\\Users\\fadhl\\Python\\Assessments\\Functions'

## ABSOLUTE VS RELATIVE PATH

We can specify a file using its absolute path or its relative path. An absolute path begins with the root folder while a relative path is relative to the program's current working directory. Dot (`.`) is used to refer to **this folder or directory** while dot dot (`..`) is used to refer to the **parent folder**.
We can check whether a path is absolute or relative using the `is_absolute()` method of a `Path` object.

In [15]:
from pathlib import Path
print(Path.cwd())
Path.cwd().is_absolute()

c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes


True

In [16]:
Path('home', 'folder').is_absolute()

False

We can join paths using the `/` operator. With it, we can join the current working directory with a relative path to get an absolute path.

In [None]:
# Both lines build the same absolute path; passing the parts separately keeps it OS independent
print(Path.cwd() / 'home' / 'folder')
print(Path(Path.cwd(), 'home', 'folder'))
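The `/` operator joins path parts but does not simplify `..` by itself. As a platform-independent sketch (using `PurePosixPath` and the standard `posixpath` module so the output is the same on every OS; the folder names are made up), here is how a `.` segment is dropped, `..` is kept until the path is normalized, and `.parent` gives the parent folder:

```python
import posixpath
from pathlib import PurePosixPath

base = PurePosixPath('/home/al/projects')

with_dot = base / '.' / 'notes.txt'               # '.' segments are dropped
with_dotdot = base / 'site' / '..' / 'notes.txt'  # '..' is kept as written
normalized = posixpath.normpath(str(with_dotdot))  # collapses 'site/..'

print(with_dot)     # /home/al/projects/notes.txt
print(with_dotdot)  # /home/al/projects/site/../notes.txt
print(normalized)   # /home/al/projects/notes.txt
print(base.parent)  # /home/al
```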

The `os.path` module has functions to retrieve the absolute path or relative path of a file.

In [17]:
import os
#returns the absolute path of a file
os.path.abspath('.')

'c:\\Users\\fadhl\\Desktop\\RAIN\\python\\fwpythonnotes'

In [18]:
os.path.relpath('.')

'.'

In [19]:
#returns the relative path of a file
file_path = Path(Path.cwd(),'new_folder2','new_sub_folder')
os.makedirs(file_path, exist_ok=True)  # no error if the folder already exists
os.path.relpath(file_path)

'new_folder2\\new_sub_folder'
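`os.path.relpath()` also takes an optional `start` argument (the CWD by default), and `Path` objects offer a similar `relative_to()` method. A sketch using a hypothetical `reports` folder inside the CWD; the paths are only computed, nothing is created:

```python
import os
from pathlib import Path

base = Path.cwd()
target = base / 'reports' / '2024' / 'summary.txt'  # hypothetical path, need not exist

rel_from_cwd = os.path.relpath(target)                              # start defaults to the CWD
rel_from_reports = os.path.relpath(target, start=base / 'reports')  # explicit start folder
rel_as_path = target.relative_to(base)                              # Path equivalent

print(rel_from_cwd)      # reports/2024/summary.txt (backslashes on Windows)
print(rel_from_reports)  # 2024/summary.txt (backslashes on Windows)
print(rel_as_path)
```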

We can also use the `os.path` module to find out whether a path is absolute or relative.

In [13]:
os.path.isabs('.')

False

In [14]:
os.path.isabs(Path.cwd())

True

## GETTING PARTS OF A FILE PATH

In [20]:
file_path = Path.cwd()/Path('Functions2.ipynb')
print(f"File Path: {file_path}")

File Path: c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes\Functions2.ipynb


In [21]:
directory_name = os.path.dirname(file_path)
print(f"Directory name: {directory_name}")

Directory name: c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes


In [22]:
base_name = os.path.basename(file_path)
print(f"Base Name: {base_name}")

Base Name: Functions2.ipynb


We can use `os.path.split()` to get a tuple of the directory name and the base name of a file.

In [None]:
dir_name, base_name = os.path.split(file_path)
print(dir_name)
print(base_name)

In [None]:
# splitext() splits a filename into (name, extension), e.g. ('Functions2', '.ipynb')
filename, ext = os.path.splitext(base_name)
print(ext)
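`Path` objects expose the same pieces as attributes. A sketch using the Windows example path from the start of these notes, built with `PureWindowsPath` so it runs on any OS:

```python
from pathlib import PureWindowsPath

p = PureWindowsPath(r'C:\Users\Al\Documents\project.docx')

print(p.parent)  # C:\Users\Al\Documents  (like os.path.dirname)
print(p.name)    # project.docx           (like os.path.basename)
print(p.stem)    # project                (name without the extension)
print(p.suffix)  # .docx                  (like the second value of os.path.splitext)
print(p.anchor)  # C:\                    (drive plus root)
```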

## EXPLORING FOLDERS
To retrieve the list of folders and files in a directory, we use the `listdir()` function from the `os` module.

In [26]:
os.listdir(Path.cwd())

['.ipynb_checkpoints',
 'CLASSES AND OOP.ipynb',
 'DataStructures.ipynb',
 'FILEHANDLING.ipynb',
 'FUNCTIONS.ipynb',
 'new_folder2',
 'ProgramFlow.ipynb',
 'PythonBasics.ipynb',
 'PYTHONIC STYLE.ipynb',
 'Reading and writing to files.ipynb',
 'REGULAREXPRESSIONS.ipynb',
 'TimeSchedulingTasks_LaunchingPrograms.ipynb',
 'WebScraping.ipynb',
 'WorkingWithCSV_JSONFiles.ipynb',
 'WorkingWithDocuments.ipynb',
 'WorkingWithExcel.ipynb']

We can get the size (in bytes) of any file using the `os.path.getsize()` function.

In [27]:
folder = Path.cwd()
for file in os.listdir(folder):
    # join with the folder so this also works when folder is not the CWD
    print(f"The size of file '{file}' is {os.path.getsize(folder / file)} bytes")

The size of file '.ipynb_checkpoints' is 4096 bytes
The size of file 'CLASSES AND OOP.ipynb' is 30539 bytes
The size of file 'DataStructures.ipynb' is 99965 bytes
The size of file 'FILEHANDLING.ipynb' is 32268 bytes
The size of file 'FUNCTIONS.ipynb' is 26247 bytes
The size of file 'new_folder2' is 0 bytes
The size of file 'ProgramFlow.ipynb' is 24565 bytes
The size of file 'PythonBasics.ipynb' is 58687 bytes
The size of file 'PYTHONIC STYLE.ipynb' is 26327 bytes
The size of file 'Reading and writing to files.ipynb' is 8341 bytes
The size of file 'REGULAREXPRESSIONS.ipynb' is 25043 bytes
The size of file 'TimeSchedulingTasks_LaunchingPrograms.ipynb' is 109316 bytes
The size of file 'WebScraping.ipynb' is 1561984 bytes
The size of file 'WorkingWithCSV_JSONFiles.ipynb' is 67362 bytes
The size of file 'WorkingWithDocuments.ipynb' is 201530 bytes
The size of file 'WorkingWithExcel.ipynb' is 18022 bytes
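Note in the output above that `os.path.getsize()` on a folder (such as `new_folder2`) reports the size of the folder entry itself, not the total of the files inside it. A sketch that totals only the files directly inside a folder, using `Path.iterdir()` and `stat().st_size` on a temporary folder with made-up files:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / 'a.txt').write_bytes(b'hello')  # 5 bytes
    (folder / 'b.txt').write_bytes(b'hi!')    # 3 bytes
    (folder / 'sub').mkdir()                  # folders are skipped below

    # Sum file sizes only; the contents of subfolders are not included here
    total = sum(p.stat().st_size for p in folder.iterdir() if p.is_file())

print(total)  # 8
```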


## GLOB PATTERNS

Glob patterns are a simplified form of regular expressions, often used in command-line commands. The `glob()` method of a `Path` object returns the contents of a folder that match a glob pattern. The example below returns all files with the extension `ipynb` in the current working directory.  

In [28]:
list(Path.cwd().glob('*.ipynb'))

[WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/CLASSES AND OOP.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/DataStructures.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/FILEHANDLING.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/FUNCTIONS.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/ProgramFlow.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/PythonBasics.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/PYTHONIC STYLE.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/Reading and writing to files.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/REGULAREXPRESSIONS.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/TimeSchedulingTasks_LaunchingPrograms.ipynb'),
 WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/WebScraping.ipynb'),
 WindowsPath('c:

In [29]:
#lists all files in the current working directory with an extension that starts with p
p = Path.cwd()
list(p.glob('*.p*'))

[]
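In a glob pattern, `*` matches any run of characters, `?` matches exactly one character, and `[...]` matches one character from a set; the related `rglob()` method applies the pattern in all subfolders too. A sketch on a temporary folder with made-up filenames:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ['notes.txt', 'data1.csv', 'data2.csv', 'photo.png']:
        (root / name).write_text('x')
    (root / 'archive').mkdir()
    (root / 'archive' / 'old.txt').write_text('x')

    star = sorted(p.name for p in root.glob('*.txt'))        # top folder only
    single = sorted(p.name for p in root.glob('data?.csv'))  # one character after 'data'
    char_set = sorted(p.name for p in root.glob('*.[cp]*'))  # extension starts with c or p
    recursive = sorted(p.name for p in root.rglob('*.txt'))  # includes subfolders

print(star, single, char_set, recursive)
```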

## PATH VALIDITY

To prevent our program from crashing due to a path that does not exist, it is best to always check that the path is valid before attempting to use it. `Path` objects have the methods `exists()`, `is_file()` and `is_dir()` to check whether a given path exists and whether it is a file or a folder.

1. `exists()` returns True if the path exists or returns False if it doesn’t exist.
2. `is_file()` returns True if the path exists and is a file, or returns False otherwise.
3. `is_dir()` returns True if the path exists and is a directory, or returns False otherwise.

In [30]:
Path.exists(Path(Path.home(),'Downloads'))

True

In [31]:
Path.exists(Path(Path.home(), 'Downloads', 'NonExistentFolder'))

False

In [32]:
Path.exists(Path(Path.home(), 'Python', 'Assignment', 'Functions'))

False

In [33]:
Path.is_file(Path(Path.cwd(),'anitaBorg.jpg'))

False

In [34]:
dir_name = Path(Path.home(), 'Programs', 'Python2')
if not Path.exists(dir_name):
    os.makedirs(dir_name)

In [38]:
web_design = Path(Path.home(),'HTML_CSS')
if not Path.exists(web_design):
    os.makedirs(web_design)

In [40]:
Path.exists(Path(Path.home(), 'Desktop', 'RAIN', 'HTML_CSS'))

True
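The cells above call these methods as `Path.exists(some_path)`; the more common style is to call them on the path object itself, e.g. `some_path.exists()`. A sketch comparing the three checks on a folder, a file and a missing path, all inside a temporary folder with made-up names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    data_file = folder / 'data.txt'
    data_file.write_text('x')
    missing = folder / 'no_such_thing'

    # (exists, is_file, is_dir) for each kind of path
    results = {
        'folder': (folder.exists(), folder.is_file(), folder.is_dir()),
        'file': (data_file.exists(), data_file.is_file(), data_file.is_dir()),
        'missing': (missing.exists(), missing.is_file(), missing.is_dir()),
    }

for name, flags in results.items():
    print(name, flags)
```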

## READING AND WRITING FILES

Plain text files contain only basic text characters. They do not include other information such as font or color. An example is `.txt` files, which can be opened with Windows Notepad or the macOS TextEdit application.  
Binary files are all other files, e.g. Word documents (.doc or .docx), PDFs and images (.png, .jpeg). If you open a binary file in Notepad or TextEdit, it appears as scrambled gibberish.

The `open()` function is used to open a file. It returns a file object and takes the file path and file mode as arguments.  
The steps involved in reading or writing files in Python are:
1. Call the open() function to return a File object.
2. Call the read() or write() method on the File object.
3. Close the file by calling the close() method on the File object.

**There are four different modes for opening a file:**

"r" - Read - Default value. Opens a file for reading. Raises a `FileNotFoundError` if the file does not exist.

"a" - Append - Opens a file for appending: new text is added to the end. Creates the file if it does not exist.

"w" - Write - Opens a file for writing. Creates the file if it does not exist and erases its existing contents if it does.

"x" - Create - Creates the specified file. Raises a `FileExistsError` if the file already exists.

**In addition to the mode, you can specify if the file should be handled as binary or text mode**

"t" - Text - Default value. Text mode

"b" - Binary - Binary mode (e.g. images)

In [41]:
#writing to a file
from pathlib import Path
f = open(Path.cwd()/'newfile.txt', 'w')
f.write('Welcome to PY 101 class \n')
f.write('I am happy to have you all in my class')
f.close()

In [42]:
#reading from a file
if (Path.exists(Path.cwd()/'newfile.txt')):
    f = open(Path.cwd()/'newfile.txt', 'rt')
    text = f.read()
    print(text)
    f.close()

Welcome to PY 101 class 
I am happy to have you all in my class
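Besides the `'w'` and `'r'` modes used above, the `'x'` and `'a'` modes and binary mode behave differently when the file already exists. A sketch in a temporary folder (the filename `demo.txt` is made up):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'demo.txt'

    with open(path, 'x') as f:  # 'x' creates a brand-new file
        f.write('first version')

    x_failed = False
    try:
        with open(path, 'x'):   # 'x' refuses to touch an existing file
            pass
    except FileExistsError:
        x_failed = True

    with open(path, 'a') as f:  # 'a' adds to the end
        f.write(' + more')
    with open(path, 'rt') as f:
        after_append = f.read()

    with open(path, 'w') as f:  # 'w' erases the old contents first
        f.write('second')
    with open(path, 'rb') as f:  # 'b' returns bytes instead of str
        raw = f.read()

print(x_failed, after_append, raw)  # True first version + more b'second'
```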


The file object also has `readline()` and `readlines()` methods. The `readline()` method reads a single line from the file, while the `readlines()` method returns a list of strings, where each string is one line of the file.

In [43]:
#appending to a file
with open(Path.cwd()/'newfile.txt', 'a') as f:
    f.write('\nHow are you doing today?')

In [46]:
#checking the new contents in the file
with open(Path.cwd()/'newfile.txt') as f:
    text = f.readlines()
    for line in text:
        print(line)

Welcome to PY 101 class 

I am happy to have you all in my class

How are you doing today?
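The blank lines in the output above appear because every string returned by `readlines()` still ends with a newline character, and `print()` adds another one. A sketch showing `readline()`, `readlines()` and how to strip the newline (the temporary file and its text are made up):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'lines.txt'
    path.write_text('first line\nsecond line\nthird line')

    with open(path) as f:
        first = f.readline()  # one line, newline included
        rest = f.readlines()  # the remaining lines as a list

    with open(path) as f:
        # Iterating over a file also yields lines; rstrip removes the '\n'
        clean = [line.rstrip('\n') for line in f]

print(repr(first))  # 'first line\n'
print(rest)         # ['second line\n', 'third line']
print(clean)        # ['first line', 'second line', 'third line']
```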



It is good practice to always close a file when we are done with it. Because of buffering, changes made to a file may not be written to disk until the file is closed. The append and `readlines()` examples above never call `close()`: the `with` statement closes the file automatically when its block ends, even if an error occurs inside the block. The `with` statement simplifies exception handling by encapsulating common preparation and cleanup tasks.
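A sketch of that guarantee: the simulated `ValueError` below interrupts the `with` block, yet the file still ends up closed and the text written before the error is saved. The temporary file name is made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / 'log.txt'
    try:
        with open(path, 'w') as f:
            f.write('partial data')
            raise ValueError('something went wrong')  # simulated failure
    except ValueError:
        pass

    closed_after_error = f.closed  # the with block closed (and flushed) the file
    saved = path.read_text()

print(closed_after_error, saved)  # True partial data
```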

## EXERCISE
Write a program that searches for all .txt files in any given directory, then reads and prints the contents of each file.

In [41]:
file_path = list(Path.cwd().glob('*.txt'))
file_path

[WindowsPath('c:/Users/fadhl/Desktop/RAIN/python/fwpythonnotes/newfile.txt')]

In [47]:
for file in file_path:
    with open(file, 'r') as f:
        text = f.read()
        print(text)

Welcome to PY 101 class 
I am happy to have you all in my class
How are you doing today?
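The exercise asks for any given directory, so the same idea can be wrapped in a function that takes the folder as a parameter. A sketch; `read_txt_files` is a name chosen here, and the demo uses a temporary folder with made-up files:

```python
import tempfile
from pathlib import Path

def read_txt_files(directory):
    """Return a dict mapping each .txt filename in directory to its contents."""
    contents = {}
    for txt_path in sorted(Path(directory).glob('*.txt')):
        if txt_path.is_file():  # skip any folder whose name ends in .txt
            contents[txt_path.name] = txt_path.read_text()
    return contents

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.txt').write_text('alpha')
    Path(tmp, 'b.txt').write_text('beta')
    Path(tmp, 'c.csv').write_text('ignored')
    found = read_txt_files(tmp)

for name, text in found.items():
    print(f'--- {name} ---')
    print(text)
```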


## ORGANISING FILES

We have seen how to write to and read from files. Now we will see how to copy, move, rename and delete files and folders using Python.

## COPYING FILES AND FOLDERS
The `shutil` module allows us to copy, rename, move and delete files and folders in Python.

In [48]:
import os
import shutil
from pathlib import Path
source = Path.cwd()
destination = Path.home()
shutil.copy(source/'newfile.txt',destination/'textfile_copy.txt' )

WindowsPath('C:/Users/fadhl/textfile_copy.txt')

The code above copies `newfile.txt` from the current working directory to the home folder and gives the copy the new name `textfile_copy.txt`. If the destination is an existing folder rather than a file path, the copy keeps the original filename. `shutil.copy()` returns the path of the new file.
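A small sketch of `shutil.copy()` with an existing folder as the destination, run in a temporary folder (the names `report.txt` and `backup` are made up):

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    source = root / 'report.txt'
    source.write_text('quarterly numbers')
    backup = root / 'backup'
    backup.mkdir()

    # The destination is an existing folder, so the copy keeps the name report.txt
    copied_to = Path(shutil.copy(source, backup))
    copy_name = copied_to.name
    copy_text = copied_to.read_text()

print(copy_name, copy_text)  # report.txt quarterly numbers
```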

Unlike `shutil.copy()`, which copies a single file, `shutil.copytree()` copies an entire folder along with every folder and file inside it. By default, the destination folder must not already exist. In the code below, we copy the current working directory to a new folder called `RAIN-CLASSES` in the home folder.

In [49]:
print(f"Current Working Directory: {Path.cwd()}")
shutil.copytree(Path.cwd(), Path.home()/'RAIN-ClASSES')

Current Working Directory: c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes


WindowsPath('C:/Users/fadhl/RAIN-ClASSES')
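`shutil.copytree()` also accepts an `ignore` argument (for example built with `shutil.ignore_patterns()`) to leave out matching names, and, from Python 3.8, `dirs_exist_ok=True` to copy into a folder that already exists instead of raising `FileExistsError`. A sketch on a made-up project in a temporary folder:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'project'
    (src / 'notes').mkdir(parents=True)
    (src / 'notes' / 'todo.txt').write_text('finish chapter')
    (src / '.ipynb_checkpoints').mkdir()
    (src / '.ipynb_checkpoints' / 'old.ipynb').write_text('{}')

    dst = Path(tmp) / 'project_backup'
    skip = shutil.ignore_patterns('.ipynb_checkpoints')  # names to leave out
    shutil.copytree(src, dst, ignore=skip)
    # Copying again would normally raise FileExistsError; dirs_exist_ok allows it
    shutil.copytree(src, dst, ignore=skip, dirs_exist_ok=True)

    copied = sorted(p.relative_to(dst).as_posix() for p in dst.rglob('*'))

print(copied)  # ['notes', 'notes/todo.txt']
```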

## RENAMING AND MOVING FILES AND FOLDERS
`shutil.move()` moves a file or folder from the source path to the destination path. If the destination is an existing folder, the source is moved into that folder and keeps its name. Otherwise, the destination path is used as the new name, so `shutil.move()` can also rename files and folders.

In [None]:
if (Path.is_file(source/'textfile.txt')):
    shutil.move(source/'textfile.txt',destination/'textfile2.txt')
else:
    print('File does not exist.')

The code above moves `textfile.txt` from the current working directory into the home folder and, at the same time, renames it to `textfile2.txt`.

In [None]:
# Move ./new_folder to the home folder, renaming it Andela2 (if Andela2 does not already exist)
if Path('./new_folder').is_dir():
    shutil.move('./new_folder', Path.home() / 'Andela2')
else:
    print('Folder does not exist.')
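A sketch of both behaviours of `shutil.move()` in a temporary folder (made-up names; passing `Path` objects to `shutil.move()` assumes Python 3.9 or later):

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'draft.txt').write_text('v1')
    (root / 'archive').mkdir()

    # 'final.txt' does not exist, so move() renames draft.txt to final.txt
    shutil.move(root / 'draft.txt', root / 'final.txt')
    # 'archive' is an existing folder, so move() puts final.txt inside it
    shutil.move(root / 'final.txt', root / 'archive')

    layout = sorted(p.relative_to(root).as_posix() for p in root.rglob('*'))

print(layout)  # ['archive', 'archive/final.txt']
```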
    

## PERMANENTLY DELETING FILES AND FOLDERS

The `os` module lets us delete a single file with `os.unlink()` and an empty folder with `os.rmdir()`. To delete a folder along with all of its contents, we use `shutil.rmtree()`. These functions are dangerous because they delete files and folders permanently, without sending them to the trash.

In [3]:
from pathlib import Path
Path.cwd()

In [4]:
import os
#deletes the file
if(Path.exists(Path(Path.cwd(),'Untitled.ipynb'))):
    os.unlink(Path(Path.cwd(),'Untitled.ipynb'))

In [None]:
#deletes an empty folder
if(Path.is_dir(Path(Path.cwd(),'Untitled Folder'))):
    os.rmdir(Path(Path.cwd(),'Untitled Folder'))

In [None]:
import shutil
#deletes a folder and everything inside it
if(Path.is_dir(Path(Path.home(),'new_folder2'))):
    shutil.rmtree(Path(Path.home(),'new_folder2'))
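Because these deletions cannot be undone, a common habit is to do a dry run first: collect and print the paths that would be deleted, check them, and only then delete. A sketch in a temporary folder with made-up files, using the `Path` methods `unlink()` and `rmdir()` (counterparts of `os.unlink()` and `os.rmdir()`) alongside `shutil.rmtree()`:

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'old.log').write_text('x')
    (root / 'keep.txt').write_text('y')
    (root / 'empty').mkdir()
    (root / 'full' / 'inner').mkdir(parents=True)

    # Dry run first: collect and inspect the paths before deleting anything
    logs = list(root.glob('*.log'))
    print('Would delete:', [p.name for p in logs])

    for log in logs:
        log.unlink()              # delete a single file
    (root / 'empty').rmdir()      # the folder must be empty
    shutil.rmtree(root / 'full')  # removes the folder and everything in it

    remaining = sorted(p.name for p in root.iterdir())

print(remaining)  # ['keep.txt']
```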

## SEND2TRASH MODULE

The third-party `send2trash` module (installed with `pip install send2trash`) provides a safer way to delete files and folders, because it sends them to your computer's trash or recycle bin instead of permanently deleting them.

In [None]:
from send2trash import send2trash
if(Path.exists(Path('./untitled.txt'))):
    send2trash('./untitled.txt')

## WALKING A DIRECTORY TREE

The `os.walk()` function lets us walk through a directory tree, visiting every file in a folder and every file in every subfolder of that folder. On each iteration of the loop, `os.walk()` returns three values:

1. A string with the path of the current folder
2. A list of strings with the names of the subfolders in the current folder
3. A list of strings with the names of the files in the current folder

In [51]:
import os
from pathlib import Path
for folder_name, sub_folders, filenames in os.walk(Path.cwd()):
    print(f"Current Folder: {folder_name}\n")
    for sub_folder in sub_folders:
        print(f"{sub_folder} is a sub folder of {folder_name}\n")
    for filename in filenames:
        print(f"{filename} is a file in {folder_name}\n")

Current Folder: c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

.ipynb_checkpoints is a sub folder of c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

new_folder2 is a sub folder of c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

CLASSES AND OOP.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

DataStructures.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

FILEHANDLING.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

FUNCTIONS.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

newfile.txt is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

ProgramFlow.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

PythonBasics.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

PYTHONIC STYLE.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

Reading and writing to files.ipynb is a file in c:\Users\fadhl\Desktop\RAIN\python\fwpythonnotes

REGULAREXPRESSIO
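A common use of `os.walk()` is searching a whole tree for files. The sketch below defines a hypothetical helper, `find_files()`, that collects files with a given extension. Because `os.walk()` walks top-down by default, editing the subfolder list in place (here, to drop hidden folders such as `.ipynb_checkpoints`) stops it from visiting those folders. The demo tree is made up:

```python
import os
import tempfile
from pathlib import Path

def find_files(top, extension):
    """Return paths (relative to top) of files ending in extension, skipping hidden folders."""
    matches = []
    for folder_name, sub_folders, filenames in os.walk(top):
        # Editing sub_folders in place stops os.walk from entering those folders
        sub_folders[:] = [d for d in sub_folders if not d.startswith('.')]
        for filename in filenames:
            if filename.endswith(extension):
                full_path = os.path.join(folder_name, filename)
                matches.append(os.path.relpath(full_path, top))
    return sorted(matches)

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'sub').mkdir()
    Path(tmp, '.hidden').mkdir()
    Path(tmp, 'a.txt').write_text('x')
    Path(tmp, 'sub', 'b.txt').write_text('x')
    Path(tmp, '.hidden', 'c.txt').write_text('x')
    found = find_files(tmp, '.txt')

print(found)  # a.txt and sub/b.txt (with \ on Windows); .hidden is skipped
```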

## ZIP FILES
The `zipfile` module allows us to work with ZIP files. To create a `ZipFile` object, we import the `ZipFile` class from the `zipfile` module.

In [None]:
from pathlib import Path
from zipfile import ZipFile
p = Path.cwd()
exampleZip = ZipFile(p/'RAIN_PYTHON_PROGRAMMING.zip')
print(exampleZip)

A `ZipFile` object has a method called `namelist()` that returns the names of all the files and folders in the archive.

In [None]:
exampleZip.namelist()

A `ZipFile` object also has a `getinfo()` method that returns a `ZipInfo` object describing any particular file contained in the archive.

In [None]:
info = exampleZip.getinfo('RAIN-PYTHON PROGRAMMING/RAIN-PYTHON PROGRAMMING/Data_copy.xlsx')
print(info)
print(f"Actual file size: {info.file_size}")
print(f"Compressed file size: {info.compress_size}")
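Since `RAIN_PYTHON_PROGRAMMING.zip` exists only on the author's machine, here is a self-contained sketch that builds a small archive in memory with `io.BytesIO` (a file-like object from the standard library) and then inspects it with `namelist()` and `getinfo()`. `writestr()` is a `ZipFile` method that adds a member from a string instead of a file on disk; the member name is made up:

```python
import io
import zipfile

buffer = io.BytesIO()  # an in-memory stand-in for a .zip file on disk

with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('notes/repeat.txt', 'spam ' * 1000)  # 5000 bytes of text

with zipfile.ZipFile(buffer) as zf:  # reopen for reading
    names = zf.namelist()
    info = zf.getinfo('notes/repeat.txt')

print(names)                                # ['notes/repeat.txt']
print(info.file_size)                       # 5000
print(info.compress_size < info.file_size)  # True: repetitive text compresses well
```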

## EXTRACTING FROM ZIP FILES
To extract contents from a zip file, we use the extractall() method. This method extracts all the files and folders from a ZIP file into the specified directory or into the current working directory if no directory is specified.

In [None]:
exampleZip.extractall()

In [None]:
exampleZip.extractall(Path.home())

The extract() method for ZipFile objects will extract a single file from the ZIP file.

In [None]:
exampleZip.extract('RAIN-PYTHON PROGRAMMING/RAIN-PYTHON PROGRAMMING/Data_copy.xlsx')
exampleZip.extract('RAIN-PYTHON PROGRAMMING/RAIN-PYTHON PROGRAMMING/Data_copy.xlsx', Path.home())
exampleZip.close()
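Because the extraction cells above depend on the author's ZIP file, here is a self-contained sketch with a made-up archive built in a temporary folder. It shows that `extract()` returns the path of the file it wrote, and that both `extract()` and `extractall()` accept a target folder as the `path` argument:

```python
import tempfile
from pathlib import Path
from zipfile import ZipFile

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / 'sample.zip'
    with ZipFile(zip_path, 'w') as zf:
        zf.writestr('docs/readme.txt', 'hello')
        zf.writestr('docs/data.csv', 'a,b\n1,2\n')

    out = Path(tmp) / 'out'
    with ZipFile(zip_path) as zf:
        # extract() returns the path of the file it created
        extracted = Path(zf.extract('docs/readme.txt', path=out))
        readme_text = extracted.read_text()
        zf.extractall(path=out / 'everything')  # every member, folders included

    everything = out / 'everything'
    all_files = sorted(p.relative_to(everything).as_posix()
                       for p in everything.rglob('*') if p.is_file())

print(readme_text)  # hello
print(all_files)    # ['docs/data.csv', 'docs/readme.txt']
```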

## CREATING AND ADDING TO ZIP FILES
To create a ZIP file, we open a `ZipFile` object in write mode by passing 'w' as the second argument.

In [None]:
with ZipFile('new.zip', 'w') as new_zip:
    pass

The `write()` method of the `ZipFile` object takes the path of the file to add as its first argument. An optional second argument, `arcname`, sets the name the file gets inside the archive, and the `compress_type` keyword argument tells the computer which compression algorithm to use on the file. You can set `compress_type` to `zipfile.ZIP_DEFLATED`, the deflate compression algorithm, which works well for most types of data.

In [None]:
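import zipfile
# Create new.zip containing newfile.txt, compressed with deflate
with ZipFile('new.zip', 'w') as new_zip:
    new_zip.write('newfile.txt', compress_type=zipfile.ZIP_DEFLATED)
# Reopen the archive in read mode to extract it into new_folder
with ZipFile('new.zip') as new_zip:
    new_zip.extractall(Path(Path.cwd(), 'new_folder'))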

NOTE: Just like the `open()` function, opening a ZIP file in write mode erases its existing contents. To add to the contents instead, pass 'a' as the mode to open the ZIP file in append mode.
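A sketch of the difference between append and write mode, using a temporary archive with made-up member names; `writestr()` is a `ZipFile` method that adds a member from a string:

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    zip_path = Path(tmp) / 'bundle.zip'

    with zipfile.ZipFile(zip_path, 'w') as zf:
        zf.writestr('a.txt', 'first')
    with zipfile.ZipFile(zip_path, 'a') as zf:  # append keeps a.txt
        zf.writestr('b.txt', 'second')
    with zipfile.ZipFile(zip_path) as zf:
        after_append = zf.namelist()

    with zipfile.ZipFile(zip_path, 'w') as zf:  # 'w' starts a fresh archive
        zf.writestr('c.txt', 'third')
    with zipfile.ZipFile(zip_path) as zf:
        after_overwrite = zf.namelist()

print(after_append)     # ['a.txt', 'b.txt']
print(after_overwrite)  # ['c.txt']
```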

## EXERCISES

1. Write a program that walks through a folder tree, searches for files with a certain file extension (such as .pdf or .jpg) or that begin with a certain prefix (e.g. PY101-PythonBasics, PY101-ProgramFlow), and copies these files from their current location to a new folder.
2. Write a program that walks through a folder tree and searches for files or folders larger than 100MB. Print the absolute path of each of these to the screen.




## EXERCISE 1


In [15]:
import os
import shutil

def copy_files(source_folder, destination_folder, extensions=None, prefixes=None):
    """Copy files under source_folder that match an extension or a name prefix into destination_folder."""
    # Pick a destination outside source_folder, otherwise the walk may revisit the copies
    os.makedirs(destination_folder, exist_ok=True)

    for root, dirs, files in os.walk(source_folder):
        for file in files:
            file_path = os.path.join(root, file)
            if extensions and os.path.splitext(file)[1].lower() in extensions:
                shutil.copy2(file_path, destination_folder)
            elif prefixes and any(file.startswith(prefix) for prefix in prefixes):
                shutil.copy2(file_path, destination_folder)

if __name__ == "__main__":
    # input() takes a prompt string; the user types in the folder paths
    source_folder = input("Enter the source folder: ")
    destination_folder = input("Enter the destination folder: ")

    extensions = ['.pdf', '.jpg']  
    prefixes = ['PY101-', 'CS200-'] 

    if os.path.exists(source_folder):
        copy_files(source_folder, destination_folder, extensions, prefixes)
        print("Files copied successfully.")
    else:
        print("Source folder not found.")


Source folder not found.


## EXERCISE 2

In [7]:
import os

def find_large_files(start_folder, min_size=100 * 1024 * 1024):  # 100MB in bytes
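    for root, dirs, files in os.walk(start_folder):
        for file in files:
            file_path = os.path.join(root, file)
            try:
                size = os.path.getsize(file_path)
            except OSError:  # skip files that cannot be read (e.g. permissions, broken links)
                continue
            if size > min_size:
                print("Large file found:", os.path.abspath(file_path))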

if __name__ == "__main__":
    folder_to_search = input("Enter the folder path to search: ")
    if os.path.exists(folder_to_search):
        find_large_files(folder_to_search)
    else:
        print("Folder not found.")


Folder not found.
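Exercise 2 also mentions folders. One way to handle them, sketched below, is to walk bottom-up with `os.walk(..., topdown=False)` so that each subfolder's total is known before its parent's. `folder_sizes` is a name chosen here, symbolic links to folders are not followed (the `os.walk()` default), and the demo tree is made up:

```python
import os
import tempfile

def folder_sizes(start_folder):
    """Return {folder path: total bytes of all files beneath it}."""
    totals = {}
    # topdown=False visits the deepest folders first
    for root, dirs, files in os.walk(start_folder, topdown=False):
        size = 0
        for file in files:
            try:
                size += os.path.getsize(os.path.join(root, file))
            except OSError:  # unreadable or vanished file
                pass
        for d in dirs:
            size += totals.get(os.path.join(root, d), 0)
        totals[root] = size
    return totals

with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub', 'deeper'))
    for rel_path, data in [('a.bin', b'x' * 10),
                           (os.path.join('sub', 'b.bin'), b'x' * 20),
                           (os.path.join('sub', 'deeper', 'c.bin'), b'x' * 5)]:
        with open(os.path.join(tmp, rel_path), 'wb') as f:
            f.write(data)
    sizes = {os.path.relpath(k, tmp): v for k, v in folder_sizes(tmp).items()}

limit = 100 * 1024 * 1024  # 100MB, as in the exercise
print(sizes)
print([folder for folder, size in sizes.items() if size > limit])  # [] for this tiny demo
```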
