# File Operations

Use the open() function to read (r), append (a), or write (w) to a file. Opening a file returns a file handle, not the data in the file. Once the file is open, you can read from it or write to it through that handle. When you are finished with the file, make sure it is closed. Failing to close a file can leak operating system resources, leave the file locked or inaccessible to other programs, and lose data that was buffered but never written to disk.
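As a minimal sketch of the three modes, the following writes to, appends to, and then reads a scratch file. The file name `modes_demo.txt` and the temporary directory are assumptions chosen for illustration so that nothing in your working directory is touched. The `try`/`finally` blocks are one way to make sure the handle is closed even if a write fails.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary directory (not part of the notebook's data)
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# "w" creates the file, or truncates it if it already exists
f = open(demo_path, "w")
try:
    f.write("first line\n")
finally:
    f.close()  # runs even if write() raises

# "a" keeps the existing contents and adds to the end
f = open(demo_path, "a")
try:
    f.write("second line\n")
finally:
    f.close()

# "r" opens for reading; the handle is an object, and the data comes from read()
f = open(demo_path, "r")
try:
    contents = f.read()
finally:
    f.close()

print(type(f).__name__)  # the handle's type, not the file's text
print(contents)
```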

In [None]:
%cat demofile.txt

In [None]:
# use the os module to access operating system information such as the current working directory (getcwd())
import os

# Create (or overwrite) a file.
# If no path is specified, the file is created in the current working directory.
# Opening with the "w" mode overwrites an existing file of the same name.
# To add to the file instead of overwriting it, use the "a" mode.
f = open("demofile.txt", "w")

f.write("This is the first line of the file.\n")

# Be sure to close your file. Until it is closed, buffered data may not be written to disk.
f.close()

# Get the current directory
print(os.getcwd())

## Using the with statement for opening files
One advantage of the with statement is that a file opened this way is closed automatically when the block ends, even if an error occurs inside the block.
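To see the automatic close in action, the sketch below raises an error on purpose inside a `with` block and then checks the handle's `closed` attribute. The scratch file `with_demo.txt` and the simulated `ValueError` are assumptions made up for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file used only for this demonstration
scratch_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")

try:
    with open(scratch_path, "w") as fh:
        fh.write("partial output\n")
        raise ValueError("simulated failure inside the with block")
except ValueError as err:
    print("caught:", err)

# The handle was closed when the block exited, despite the exception
print(fh.closed)  # True
```

Because the file was closed on the way out of the block, the line written before the error was flushed to disk.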

In [None]:
# Append to the file
with open("demofile.txt", "a") as f:
    f.write("This is the second line.\n")

    # No need to explicitly close the file. close() is called automatically.

In [None]:
%cat demofile.txt

## Reading files
There are several ways to read data from a file, including reading the entire file, reading a specified number of characters, reading line by line, or reading a number of lines.
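The sketch below puts three of these approaches side by side on one handle, including `itertools.islice` as one possible way to read a fixed number of lines. The sample file and its contents are assumptions, created in a temporary directory so the example runs without `files/gettysburg.txt`.

```python
import itertools
import os
import tempfile

# Hypothetical sample file so the sketch runs without files/gettysburg.txt
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample_path, "w") as fh:
    fh.write("line one\nline two\nline three\nline four\n")

with open(sample_path, "r") as fh:
    # A specified number of characters
    first_chars = fh.read(4)
    # Continues from the current position to the end of that line
    rest_of_line = fh.readline()
    # The next two lines only; the rest of the file is not loaded
    next_two = list(itertools.islice(fh, 2))

print(repr(first_chars))
print(repr(rest_of_line))
print(next_two)
```

Each call picks up where the previous one stopped, because the handle keeps track of its position in the file.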

### Reading an entire file

In [None]:
import pathlib
file_path = pathlib.Path('files/gettysburg.txt')
with open(file_path,"r") as fh_getty:
    #read() will access the entire file. Not a good option for large files.
    print(fh_getty.read())

In [None]:
with open(file_path,"r") as fh_getty:
    n = 100
    #read(n) reads at most n characters instead of the entire file.
    print(fh_getty.read(n)) # Read the first n characters

In [None]:
with open(file_path,"r") as fh_getty:    
    #readline() reads one line per call, including its trailing newline character.
    print(fh_getty.readline()) # Read a line
    print(fh_getty.readline()) # Read a line
    print(fh_getty.readline()) # Read a line

In [None]:
with open(file_path,"r") as fh_getty:    
    #readlines() reads the entire file into a list. Not a good option for large files.
    x = fh_getty.readlines() # A list of lines, each still ending with its newline character
    print(x[5]) # The sixth line (list indexes start at 0)

In [None]:
with open(file_path,"r") as fh_getty:
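    # enumerate() numbers the lines as the loop reads them one at a time
    for line_number, x in enumerate(fh_getty): # "x" will represent a line
        print(str(line_number) + ": " + x.rstrip("\n")) # Drop the newline; print() adds its own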

In [None]:
customer_file = pathlib.Path(r'files/fake_customer_list.txt')
with open(customer_file, 'r') as fh_customers:

    for line in fh_customers:
        # Strip the trailing newline, then split the record on the "|" delimiter
        customer_list = line.strip().split("|")
        full_name = customer_list[0]
        email = customer_list[-1]
        print(full_name + " -- " + email)        
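The same parsing could also be sketched with the csv module by setting the delimiter to "|". The rows below are invented stand-ins for `files/fake_customer_list.txt`, and the column layout (name first, email last) is an assumption taken from the indexes used in the cell above.

```python
import csv
import io

# Hypothetical rows standing in for files/fake_customer_list.txt;
# the columns are assumed to be: name | ... | email
sample = io.StringIO(
    "Ada Example|123 Main St|ada@example.com\n"
    "Bo Sample|456 Oak Ave|bo@example.com\n"
)

pairs = []
for fields in csv.reader(sample, delimiter="|"):
    if not fields:  # skip blank lines
        continue
    full_name, email = fields[0], fields[-1]
    pairs.append((full_name, email))
    print(full_name + " -- " + email)
```

With a real file, pass the open file handle to `csv.reader` in place of the `io.StringIO` object.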

In [None]:
# List files in the current directory 
import os
import pprint as pp # 'pretty prints' the output in a column
pp.pprint(os.listdir(os.getcwd()))

In [None]:
# 'Magic' command to list files in current directory
%ls

### Use the CSV module to read a file

In [None]:
import csv
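# The csv module recommends newline='' so newlines inside quoted fields are handled correctly
with open('./files/lyricsonly2M.csv', 'r', newline='') as f: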
    reader = csv.reader(f)
    your_list = list(reader)
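When the first row holds column names, `csv.DictReader` is another option: it returns each row as a dictionary keyed by those names. The two-column data below is hypothetical, since the actual columns of the lyrics file are not shown in this notebook.

```python
import csv
import io

# Hypothetical two-column data; the real lyrics file's columns are not shown here
sample = io.StringIO('title,lyrics\nSong A,la la la\nSong B,"da, da, da"\n')

reader = csv.DictReader(sample)
rows = list(reader)

print(reader.fieldnames)   # column names taken from the first row
print(rows[1]["lyrics"])   # the quoted field keeps its commas
```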

In [None]:
pp.pprint(your_list[:99])

In [None]:
len(your_list)

In [None]:
import pandas as pd
# The first row holds the column names, so exclude it from the data rows
df_lyrics = pd.DataFrame(your_list[1:], columns=your_list[0])

In [None]:
print(df_lyrics.shape)
with pd.option_context('display.max_seq_items', None):
    print(df_lyrics.head(5000))


In [None]:
print(your_list[0])

## Writing to a File

In [None]:
import os
with open('notebook_list.txt','w') as f:
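    # os.walk() does not expand "~", so convert it to the home directory path first
    for root, dirs, files in os.walk(os.path.expanduser("~"), topdown=False):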
        for name in files:
            if name.endswith('.ipynb'):
                print("writing...",os.path.join(root, name))
                f.write(os.path.join(root, name)+"\n")
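A similar search can be sketched with pathlib, where `rglob()` walks the directory tree and matches the file pattern in one step. The directory tree below is hypothetical and is built in a temporary folder so the example is safe to run anywhere.

```python
import pathlib
import tempfile

# Hypothetical directory tree built in a temporary folder for the demonstration
root = pathlib.Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "a.ipynb").write_text("{}")
(root / "sub" / "b.ipynb").write_text("{}")
(root / "notes.txt").write_text("not a notebook")

output_file = root / "notebook_list.txt"
with open(output_file, "w") as f:
    # rglob() searches root and every folder beneath it
    for path in sorted(root.rglob("*.ipynb")):
        f.write(str(path) + "\n")

found = output_file.read_text().splitlines()
print(found)
```

To search your home directory instead, replace `root` with `pathlib.Path.home()`.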

## Use Pandas to Read a File
Although Python's built-in file operations are useful for simple tasks, Pandas and NumPy are much more effective for reading, shaping, and analyzing data. Using these libraries is beyond the scope of this course, but you should be aware of them. See the Pandas notebook for more information.

In [None]:
import pandas as pd

# Forward slashes in the path work on Windows, macOS, and Linux
df = pd.read_csv("files/2017_instacart_products.csv")
df.head()

In [None]:
# Use the describe function to display descriptive statistics for numerical fields (even if that doesn't make sense...)
df.describe()