<h3>FILES</h3>

A <b>file</b> is a named collection of <b>data</b> stored outside your coding environment. To work with a file, you first need to import or upload it here (into the directory where this notebook runs) and then open it from Python.

<u>NOTE</u>: Commands for manipulating files will not appear on a test or exam, so you do not need to memorize them. However, do familiarize yourself with the basic file commands, as you will likely encounter them in courses that use Python (or a similar programming language). Of course, once data is extracted from a file, you need to know how to work with it; the examples below illustrate this point.

EXAMPLE WITH A TEXT FILE

In [1]:
## open a file in read mode, read the entire contents into a string and then print it
## Note: We have to have the file 'recipe.txt' saved in the same directory where this notebook is located!

f=open('recipe.txt','r')   # 'r' for read 
content_string=f.read()   # the entire file is read as one string
print(content_string)  
f.close()


In [None]:
## REPRESENTATION: to see exactly what it looks like, i.e., how Python sees it 
## Note: \n = new line

print(repr(content_string))   

In [None]:
## reading and printing an entire file; short 

f=open('recipe.txt','r')    
print(f.read())  # prints the entire file
f.close()  # closing releases the file; it is good practice, and when writing it ensures all data is saved
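A common alternative to calling `f.close()` yourself is the `with` statement, which closes the file automatically when the indented block ends, even if an error occurs inside it. The sketch below first writes a small stand-in file so that it runs anywhere; the file name `recipe_demo.txt` and its contents are invented for illustration and are not the course's `recipe.txt`.

```python
import os
import tempfile

# Hypothetical stand-in for 'recipe.txt', written to a temporary folder
folder = tempfile.mkdtemp()
path = os.path.join(folder, "recipe_demo.txt")
with open(path, "w") as f:
    f.write("Pancakes\n2 eggs\n1 cup flour\n")

# Read it back; the file is closed automatically when the with-block ends
with open(path, "r") as f:
    demo_string = f.read()

print(demo_string)
print(f.closed)  # True: no explicit f.close() was needed

os.remove(path)  # tidy up the demo file
```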

In [None]:
## reading and printing parts of a file

f=open('recipe.txt','r') 

print("first 13 characters are--> ",f.read(13))  ## python reads the first 13 characters;
                                                 ## the reader is now at position 13 (it is NOT reset to 0)
print("next 10 characters are--> ",f.read(10))  ## python reads the next 10 characters (positions 13 to 22)
                                                ## the reader is now at position 23
print("remaining part--> ",f.read())      ## python reads rest of the file

f.close()
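To confirm that `read(n)` moves the reader forward rather than resetting it, `f.tell()` reports the current position. This sketch uses `io.StringIO`, an in-memory object that behaves like a text file opened for reading, so it runs without `recipe.txt`; the text is made up. (For a real file opened in text mode, the number returned by `tell()` is only guaranteed to be meaningful when passed back to `seek()`.)

```python
import io

# In-memory stand-in for an open text file (hypothetical contents)
f = io.StringIO("Pancakes\n2 eggs\n1 cup flour\n")

first = f.read(4)             # 'Panc'
pos_after_first = f.tell()    # reader has moved forward to position 4
second = f.read(5)            # the next 5 characters: 'akes\n'
pos_after_second = f.tell()   # position 9; nothing was reset
rest = f.read()               # everything that is left
f.close()

print(repr(first), pos_after_first)
print(repr(second), pos_after_second)
print(repr(rest))
```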

In [None]:
## moving/resetting the "head" or the "file reader"

f=open('recipe.txt','r')   
            
print("first 13 characters are--> ", f.read(13))

f.seek(0)  ## moves the reader back to position 0
print("first 3 characters are--> ", f.read(3))

f.seek(3)  ## moves the reader to position 3
print("20 characters starting from character indexed by 3--> ", f.read(20))

f.close()

In [None]:
## files as strings/lines

f=open('recipe.txt','r')

print("FIRST 4 LINES OF THE FILE: \n")

line=f.readline()  ## f is viewed as being made up of lines; first line is assigned to 'line'
print("Line 1:", line)        ## note: python keeps track of how much has been read

line=f.readline()  # second line (first unread line) is now assigned to 'line'
print("Line 2:", line)
line=f.readline()  # third line is now assigned to 'line'
print("Line 3:", line)
line=f.readline()  # 4th line is now assigned to 'line'
print("Line 4:", line)
print()
print()


## printing the first 6 lines of the file

print("FIRST 6 LINES OF THE FILE: \n")

f.seek(0)  
for i in range(6):    
    print(f.readline())

f.close()
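Two details explain the output above. Each line returned by `readline()` still ends in `'\n'`, so `print` adds a second newline and the output looks double-spaced. And once the file is exhausted, `readline()` returns an empty string `''` instead of raising an error, so asking for more lines than the file has just prints blank lines. The sketch below, using an invented in-memory file, reads until that empty string appears and passes `end=""` to `print` to avoid the extra blank lines.

```python
import io

f = io.StringIO("Pancakes\n2 eggs\n1 cup flour\n")  # hypothetical file contents

lines_read = []
while True:
    line = f.readline()
    if line == "":          # readline() returns '' at the end of the file
        break
    lines_read.append(line)
    print(line, end="")     # the line already ends in '\n', so suppress print's own newline
f.close()
```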

In [None]:
## or, like this

f=open('recipe.txt','r')  

ctr=0
for line in f:   
    print(line)
    ctr+=1
print("This file has",ctr,"lines.\n")

f.close()


## treating file as one long string

f=open('recipe.txt','r')  

content_string=f.read(28)  # reads the first 28 characters and assigns to the string 'content_string'

for ch in content_string:
    print(ch)
    
f.close()
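Another way to get all the lines at once is `f.readlines()`, which returns a list of strings, one per line (each still ending in `'\n'`); the number of lines is then just the length of the list. This is convenient for small files, while looping over `f` as above avoids holding a very large file in memory. The in-memory file below is a made-up example.

```python
import io

f = io.StringIO("Pancakes\n2 eggs\n1 cup flour\n")  # hypothetical file contents

all_lines = f.readlines()   # list of lines, each still ending in '\n'
f.close()

print(all_lines)
print("This file has", len(all_lines), "lines.")
```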

WORKING WITH COURSE GRADES

In [None]:
## read a text file of letter grades
## what does it look like?

f=open("coursegrades.txt")


## print the whole file:

#for line in f:
    #print(line)
    

## print only the first 5 lines:

for i in range(0,5):  ## this just controls how many lines are read and printed
    print(f.readline()) 
    
    
## print only some characters at a time

f.seek(0)
print(f.read(4))  ## f is treated as one long string and the first 4 characters are printed
print(f.read(5))  ## the next 5 characters are printed
#print(f.read(9))
#print(f.read(18))

f.close()

In [None]:
## how does python see this file?

f=open("coursegrades.txt")

print(repr(f),"\n")   ## the file object itself (its name, mode and encoding), not its contents


## what it sees when reading the file as one long string

print(repr(f.read()),"\n")


## what python sees when reading the file line by line

f.seek(0)
print(repr(f.readline()))
print(repr(f.readline()))
print(repr(f.readline()),"\n")

f.close()

In [None]:
## read a file of letter grades

f=open("coursegrades.txt")

for i in range(0,5):
    line=f.readline()
    print(repr(line))
    line=line.strip()  ## what does this do?
    print(repr(line))
    print()

f.close()

In [None]:
## exercise: organize course grades as a dictionary, 
## where the student number is the key and the grade is the value

## preparing the items

f=open("coursegrades.txt")

for line in f:
    line=line.strip()
    print(line)
    key=line[0:5]
    value=line[6:8]
    print("id:",key, "  grade:",value,"\n")
    
f.close() 

In [None]:
## building the dictionary

f=open("coursegrades.txt")

letter_grades={}

for line in f:
    line=line.strip()
    key=line[0:5]
    value=line[6:8]
    letter_grades[key]=value
    
f.close() 

print(letter_grades)
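The slices `line[0:5]` and `line[6:8]` depend on every line having exactly the same layout. Assuming each line holds a student number and a letter grade separated by whitespace (the sample data below is invented), `line.split()` extracts the two pieces without counting character positions, so it would keep working if student numbers changed length. Blank lines are skipped rather than producing an empty key.

```python
import io

# Hypothetical sample in the assumed format: student number, space, letter grade
f = io.StringIO("10001 A+\n10002 B\n\n10003 C-\n")

sample_grades = {}
for line in f:
    parts = line.split()      # splits on whitespace and drops the trailing '\n'
    if len(parts) != 2:       # skip blank or malformed lines
        continue
    student_id, grade = parts
    sample_grades[student_id] = grade
f.close()

print(sample_grades)
```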

In [None]:
## exercise: create a new dictionary by replacing letters by numeric grades

numeric_grades={}

for k,v in letter_grades.items():
    if v=='A+':
        numeric_grades[k]=12
    if v=='A':
        numeric_grades[k]=11
    if v=='A-':
        numeric_grades[k]=10
    if v=='B+':
        numeric_grades[k]=9
    if v=='B':
        numeric_grades[k]=8
    if v=='B-':
        numeric_grades[k]=7
    if v=='C+':
        numeric_grades[k]=6
    if v=='C':
        numeric_grades[k]=5
    if v=='C-':
        numeric_grades[k]=4
    if v=='D+':
        numeric_grades[k]=3
    if v=='D':
        numeric_grades[k]=2
    if v=='D-':
        numeric_grades[k]=1
    if v=='F':
        numeric_grades[k]=0

print(numeric_grades)           

## how could we do this more efficiently??
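One possible answer, sketched here: store the conversion table itself in a dictionary and look each grade up, so the thirteen `if` statements become one dictionary comprehension. The sample dictionary is invented and named `sample_letter_grades` so it does not overwrite the notebook's `letter_grades`. Unlike the `if` chain, which silently skips an unrecognized grade, a lookup with `grade_points[v]` raises a `KeyError`; `grade_points.get(v)` would return `None` instead.

```python
# Conversion table: letter grade -> numeric grade (same values as the if-chain above)
grade_points = {
    'A+': 12, 'A': 11, 'A-': 10,
    'B+': 9,  'B': 8,  'B-': 7,
    'C+': 6,  'C': 5,  'C-': 4,
    'D+': 3,  'D': 2,  'D-': 1,
    'F': 0,
}

# Hypothetical sample; in the notebook you would use letter_grades instead
sample_letter_grades = {'10001': 'A+', '10002': 'B', '10003': 'F'}

sample_numeric = {k: grade_points[v] for k, v in sample_letter_grades.items()}
print(sample_numeric)
```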

WORKING WITH A .CSV FILE

In [None]:
import pandas as pd  ## to work with data

**Example:** Estimate the derivative $y'$ where $y=f(x)$ is given by a table of values.

In [None]:
## Note: The file "numerical_function.csv" must be uploaded to the same directory as this notebook!

values=pd.read_csv("numerical_function.csv") 

x=[]
y=[]

for i in range(len(values)):
    x.append(values.iloc[i,0]) ## takes value from row i, column 0 (x) and appends it to list 'x'
    y.append(values.iloc[i,1]) ## takes value from row i, column 1 (y) and appends it to list 'y'
    
print(x)
print(y)
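If pandas is unavailable, Python's built-in `csv` module can read the same kind of two-column table. This sketch assumes the file starts with a header row (which `pd.read_csv` expects by default) and uses a short invented table in place of `numerical_function.csv`; with the real file you would pass `open("numerical_function.csv")` to `csv.reader` instead of the `io.StringIO` object. Within pandas itself, `values.iloc[:, 0].tolist()` would build the whole `x` list in one step.

```python
import csv
import io

# Hypothetical contents of a two-column CSV file: a header row, then x,y pairs
csv_text = "x,y\n0,0\n0.5,0.25\n1,1\n1.5,2.25\n"

x_vals = []
y_vals = []
reader = csv.reader(io.StringIO(csv_text))
next(reader)                        # skip the header row
for row in reader:
    x_vals.append(float(row[0]))    # column 0 holds x
    y_vals.append(float(row[1]))    # column 1 holds y

print(x_vals)
print(y_vals)
```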

<font color="navy"><b>Forward and Backward Difference Quotients</b></font>

The derivative of a function $f(x)$ at a point $(x_1,y_1)$ can be estimated by an average rate of change between the point $(x_1,y_1)$ and some nearby point. 

When we use the nearby point $(x_2,y_2)$, where $x_2>x_1$, we are computing a **forward difference**: 
$\displaystyle{f'(x_1) \approx {f(x_2) - f(x_1) \over x_2-x_1} }$

When we use the nearby point $(x_0,y_0)$, where $x_0<x_1$, we are computing a **backward difference**:
$\displaystyle{f'(x_1) \approx {f(x_0) - f(x_1) \over x_0-x_1} }$
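A quick check on a function whose derivative we know: for $f(x)=x^2$ at $x_1=1$ (exact value $f'(1)=2$), with neighbours $x_0=0.9$ and $x_2=1.1$, the forward difference is $\frac{1.21-1}{0.1}=2.1$ and the backward difference is $\frac{0.81-1}{-0.1}=1.9$. Here one overestimates and the other underestimates, and both approach 2 as the neighbours move closer to $x_1$. The same arithmetic in Python (the function name `square` is just for this example):

```python
def square(x):
    return x**2

x0, x1, x2 = 0.9, 1.0, 1.1   # points to the left of, at, and to the right of x1

forward = (square(x2) - square(x1)) / (x2 - x1)    # (1.21 - 1) / 0.1, about 2.1
backward = (square(x0) - square(x1)) / (x0 - x1)   # (0.81 - 1) / (-0.1), about 1.9

print("forward difference: ", forward)
print("backward difference:", backward)
```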

In [None]:
## estimating the derivative f'(x), when y=f(x) is given numerically (i.e. as a table of values)

y_prime=[] 

for i in range(len(y)-1):
    diff_quotient=(y[i+1]-y[i])/(x[i+1]-x[i])  # use forward differences (slope of secant line)
    y_prime.append(diff_quotient) 
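## the last point has no neighbour to its right, so we reuse the last quotient computed in the loop;
## it is exactly the backward difference at the last point, and it keeps y_prime the same length as x
y_prime.append(diff_quotient)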

print("y'=", y_prime)
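The loop above can be packaged as a reusable function. This is a sketch: the name `estimate_derivative` and the input check are our own additions. A straight line makes a convenient test, since every difference quotient should equal its slope.

```python
def estimate_derivative(x, y):
    """Forward-difference estimates of y' at each x; the last point reuses the
    final quotient (the backward difference there)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    slopes = []
    for i in range(len(y) - 1):
        slopes.append((y[i+1] - y[i]) / (x[i+1] - x[i]))  # forward difference at x[i]
    slopes.append(slopes[-1])  # backward difference at the last point
    return slopes

# For the line y = 3x + 1, every estimate should equal the slope 3
xs = [0, 1, 2, 3]
ys = [3*t + 1 for t in xs]
print(estimate_derivative(xs, ys))
```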

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(x,y,label="f(x)")
plt.plot(x,y_prime,label="f'(x)")
plt.grid()
plt.title("The graphs of f(x) and f'(x)")
plt.legend()
plt.show()

<h3> Reading Files from the Internet or Online Sites </h3>


ONE MILLION DIGITS OF $\pi$

In [None]:
############################ no need to know this #####################################
import requests
target_url="https://ms.mcmaster.ca/lovric/1MP3/files/pi1million.txt"  ## where file is
response=requests.get(target_url)  ## using requests module to 'get' file
pi=response.text  ## digits of pi as a string ("1415...")
#######################################################################################

print(pi[0:50])
print(len(pi))
print()

## what does this code do?

ctr=0
for dig in pi:
    if dig=='7':
        ctr+=1
print(ctr)
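The loop counts the characters equal to `'7'`. Python strings also have a built-in `count` method that does the same job in one call. The sketch compares the two on a short sample, the first 20 decimal digits of $\pi$, rather than the full download.

```python
# Short sample: the first 20 digits of pi after the decimal point
digits = "14159265358979323846"

# Loop version, as in the cell above
ctr_sample = 0
for dig in digits:
    if dig == '7':
        ctr_sample += 1

# Built-in version: str.count returns the number of (non-overlapping) occurrences
print(ctr_sample, digits.count('7'))
```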

## if the digits of pi are truly random, how many times would you expect 7 to
## appear in one million digits??

In [None]:
## index: returns the position (counting from 0) where the first occurrence of the string you're looking for begins

look_for='2653'
print(look_for in pi)
print()

if look_for in pi:
    print(pi.index(look_for)) 
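`pi.index(look_for)` raises a `ValueError` if the substring is absent, which is why the cell checks `look_for in pi` first. The related method `str.find` returns `-1` instead of raising, so it can replace the two-step check. A sketch on a 20-digit sample of $\pi$:

```python
digits = "14159265358979323846"  # first 20 decimal digits of pi (sample)

print(digits.index('2653'))  # 5: '2653' starts at index 5, i.e. the 6th digit after the point
print(digits.find('2653'))   # 5: same answer
print(digits.find('0000'))   # -1: find signals "not found" with -1

try:
    digits.index('0000')
except ValueError:
    print("index raises ValueError when the substring is missing")
```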

THE JUNGLE BOOK

In [None]:
############################ no need to know this #############################
import requests
target_url="https://ms.mcmaster.ca/lovric/1MP3/files/TheJungleBook.txt"
response=requests.get(target_url)
jungle_book=response.text
################################################################################ 

print(jungle_book[0:700])  ## prints the beginning
print()
print(jungle_book[-245:])  ## prints the end

In [None]:
## count how many times a word appears in a text

ctr=0
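for i in range(0,len(jungle_book)-3):  ## stop = len(jungle_book)-len(word)+1, so the last 4-character window is checked;
                                       ## len(jungle_book) also works, since the shorter windows at the end never match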
    if jungle_book[i:i+4]=='wolf' or jungle_book[i:i+4]=='Wolf':
        ctr+=1
print(ctr)
print()

## exercise: write a function which returns the number of times a word appears in a given text 
## i.e. generalize the code in this cell and package as a function
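One possible answer to the exercise, sketched with a hypothetical function name `count_word` and an `ignore_case` option of our own choosing. Lowercasing both strings also catches `'WOLF'`, which the `'wolf' or 'Wolf'` test misses. Like the cell above, it checks every window, so overlapping matches are counted; the built-in `text.count(word)` counts non-overlapping matches only, which makes a difference only for words such as `'aa'` that can overlap themselves.

```python
def count_word(text, word, ignore_case=True):
    """Count occurrences of `word` in `text` with a sliding window.
    Overlapping matches are counted."""
    if ignore_case:
        text = text.lower()
        word = word.lower()
    n = len(word)
    ctr = 0
    for i in range(len(text) - n + 1):   # the last window starts at len(text) - n
        if text[i:i+n] == word:
            ctr += 1
    return ctr

sample = "The Wolf howled; a wolf answered. WOLF! Wolves ran."
print(count_word(sample, "wolf"))                     # 3
print(count_word(sample, "wolf", ignore_case=False))  # 1
print(count_word("aaaa", "aa"), "aaaa".count("aa"))   # 3 2
```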