# Files with Python

Agenda : 

1. Text files

2. Comma Separated Values





# 1. Text files



When we run `fh = open('some_file')`, we create the object `fh`, a file handle: not the file itself but an object used to manage the file. `fh` is sometimes called a file object.

Methods of `fh` include `read`, `write`, `close`, and many others. Note that `open` is a built-in function that returns the file object, not a method of it.


In [25]:
fh = open('my_file','r')   #  'r' is for read

# This fails when my_file does not exist: 'r' mode never creates the file.

FileNotFoundError: [Errno 2] No such file or directory: 'my_file'

- `r` Opens a file for reading only. The file pointer is placed at the beginning of the file. This is the default mode.
- `r+` Opens a file for both reading and writing. The file pointer will be at the beginning of the file.
- `w` Opens a file for writing only. Overwrites the file if the file exists. If the file does not exist, creates a new file for writing.
- `w+` Opens a file for both writing and reading. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing.
- `rb` Opens a file for reading only in binary format. The file pointer is placed at the beginning of the file.
- `rb+` Opens a file for both reading and writing in binary format.
- `wb+` Opens a file for both writing and reading in binary format. Overwrites the existing file if the file exists. If the file does not exist, it creates a new file for reading and writing.
- `a` Opens a file for appending. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
- `ab` Opens a file for appending in binary format. The file pointer is at the end of the file if the file exists. That is, the file is in the append mode. If the file does not exist, it creates a new file for writing.
- `a+` Opens a file for both appending and reading. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
- `ab+` Opens a file for both appending and reading in binary format. The file pointer is at the end of the file if the file exists. The file opens in the append mode. If the file does not exist, it creates a new file for reading and writing.
- `x` Opens a file for exclusive creation, failing if the file already exists (Python 3).


(source : Eyehunts)
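To make the difference between the most common modes concrete, here is a minimal sketch. It works in a temporary directory, and the file name `demo.txt` is only an illustration, not part of the notebook.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    with open(path, "w") as fh:   # 'w' creates the file (or empties it)
        fh.write("first\n")

    with open(path, "a") as fh:   # 'a' keeps the content and writes at the end
        fh.write("second\n")

    with open(path) as fh:        # no mode given: 'r' is the default
        after_append = fh.read()

    with open(path, "w") as fh:   # 'w' again: the previous content is lost
        fh.write("third\n")

    with open(path) as fh:
        after_overwrite = fh.read()

    try:
        open(path, "x")           # 'x' refuses to open an existing file
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'third\n'
print(x_failed)               # True
```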

In [2]:
fh = open('my_file','a') # append mode: creates the file if it does not exist

#fh = open('my_file','w') # write mode: also creates the file if it does not exist,
                         # but overwrites whatever was already in the file!


In [4]:
# 'x' mode creates the file and raises an error if the file already exists.
# It is a safe way to avoid overwriting an existing file.

try:
    fh = open('my_file','r')
except FileNotFoundError:
    print("R : Error file does not exist ! ")

# or

try:
    fh = open('my_file','x')
except FileExistsError:
    print("X : Error file already exists ! ")


# Both tell us whether the file already exists. When it does not,
# only the second one ('x') creates the file.


# Run this cell twice, explain the results !


R : Error file does not exist ! 


In [9]:
fh = open('my_file', 'w')

fh.write("Hello !\n")

8

In [10]:
fh.write("Hello World!")

12

In [11]:
fh.read()

UnsupportedOperation: not readable


You can also check what is in my_file using the Jupyter file browser: click on my_file.

You should find it empty ... 

Why ? 

In [12]:
# close() writes the buffered data to the file on disk and releases the file
fh.close()

In [13]:
fh.read()  # produces another error!

ValueError: I/O operation on closed file.
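The file looked empty before `close()` because writes are buffered: small writes usually stay in memory until the buffer is flushed. The sketch below illustrates this under the default buffering settings, using a temporary file whose name is hypothetical. The exact sizes can vary with the platform, so none are asserted here.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffered.txt")  # hypothetical file name

    fh = open(path, "w")
    fh.write("Hello !\n")
    size_before = os.path.getsize(path)       # data is usually still in the buffer

    fh.flush()                                # push the buffer to the operating system
    size_after_flush = os.path.getsize(path)

    fh.close()                                # close() also flushes, then releases the file
    is_closed = fh.closed

print(size_before, size_after_flush, is_closed)
```

Calling `flush()` is useful when a file must stay open for a long time; otherwise closing the file is enough.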

In [14]:
fh = open('my_file')

In [15]:
fh.read()


'Hello !\nHello World!'

In [16]:
fh = open('my_file')
# read() returns the whole file as one string, so this loop iterates over characters, not lines
for line in fh.read():
    print(line)

H
e
l
l
o
 
!


H
e
l
l
o
 
W
o
r
l
d
!


In [17]:
fh = open('my_file')
for line in fh.readlines():
    print(line)

Hello !

Hello World!


In [18]:
fh = open("my_file", "r")
myline = fh.readline()
while myline:
    print(myline)
    myline = fh.readline()
    
fh.close()   

Hello !

Hello World!
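A file object can also be iterated over directly, which reads one line at a time without calling `readline()` or `readlines()`. Here is a minimal sketch, assuming a temporary file with the same two lines as above; the name `lines.txt` is an illustration. Stripping the trailing `\n` avoids the blank lines seen in the previous outputs.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lines.txt")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write("Hello !\nHello World!")

    lines = []
    with open(path) as fh:
        for line in fh:                      # one line per iteration
            lines.append(line.rstrip("\n"))  # drop the trailing newline

print(lines)  # ['Hello !', 'Hello World!']
```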


In [19]:
import os 
os.remove('my_file')

In [24]:
with open('myfile','w') as fh:
    for i in range(50):
        fh.write("Bonjour\n")


fh.read()  # fails: the file was closed automatically at the end of the with block

ValueError: I/O operation on closed file.
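The error above shows that `with` closes the file when its block ends. To read the content back, the file has to be opened again. A minimal sketch, using a temporary file with a hypothetical name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bonjour.txt")  # hypothetical file name

    with open(path, "w") as fh:
        for i in range(50):
            fh.write("Bonjour\n")

    closed_after_with = fh.closed            # the with block closed the file

    with open(path) as fh:                   # reopen in read mode
        line_count = sum(1 for _ in fh)      # count the lines

print(closed_after_with, line_count)  # True 50
```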

# 2. CSV

CSV files are very popular because they can be used to exchange data with spreadsheets (such as Excel® files).

They can be seen as the "first level" (as in "not sophisticated") of data structures.

The file extension is usually `.csv`. Each line holds one record whose values are separated by a comma (`,`), and the file can start with a "header" line:


`Name, Firstname, City, Age`

`Berger, Alice, Marseille, 34`

`Vernet, Bertrand, Angers, 32`

`Jordes, Charles, Lyon, 29`

`Duchamps, Eve, Bordeaux, 30`




In [1]:
# a CSV file is a text file (with some formatting conventions), so it is managed much like before

# first let's populate our file, as a text file:

fh = open("my_file.csv", "w")
fh.write('Name, Firstname, City, Age\n')
fh.write('Berger, Alice, Marseille, 34\n')
fh.write('Vernet, Bertrand, Angers, 32\n')
fh.write('Jordes, Charles, Lyon, 29\n')
fh.write('Duchamps, Eve, Bordeaux, 30\n')
fh.close()

In [2]:
# to access the data, let us use the csv library:
import csv

## Let's try to get the average age of this group of people

In [3]:
# csv.reader returns an iterator that yields each line of the file as a list of strings
fh = open("my_file.csv")
spamreader = csv.reader(fh, delimiter=',')
for row in spamreader:
    print(row)

['Name', ' Firstname', ' City', ' Age']
['Berger', ' Alice', ' Marseille', ' 34']
['Vernet', ' Bertrand', ' Angers', ' 32']
['Jordes', ' Charles', ' Lyon', ' 29']
['Duchamps', ' Eve', ' Bordeaux', ' 30']
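The leading spaces in the output above come from the space written after each comma in the file. `csv.reader` accepts a `skipinitialspace` option that drops them. A minimal sketch on an in-memory copy of the first two lines:

```python
import csv
import io

# In-memory stand-in for the first two lines of my_file.csv
text = "Name, Firstname, City, Age\nBerger, Alice, Marseille, 34\n"

# skipinitialspace=True ignores whitespace right after each delimiter
rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))

print(rows)
# [['Name', 'Firstname', 'City', 'Age'], ['Berger', 'Alice', 'Marseille', '34']]
```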


So if we want to get the ages : 

In [4]:
# let us get the 4th element of each list 
fh = open("my_file.csv")
spamreader = csv.reader(fh, delimiter=',')
for row in spamreader:
    print(row[3])

 Age
 34
 32
 29
 30


We do not need the first line (header):

In [5]:
# before we iterate : use next to skip the first line
fh = open("my_file.csv")
spamreader = csv.reader(fh, delimiter=',')
next(spamreader)
for row in spamreader:
    print(row[3])

 34
 32
 29
 30


Now we can compute the average

In [None]:
# try it
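One possible approach, given as a sketch rather than the only solution: skip the header, convert the fourth column to `int`, then divide the sum by the number of rows. The data is an in-memory copy of the file written above, so the example runs on its own.

```python
import csv
import io

# In-memory copy of my_file.csv as written earlier
data = io.StringIO(
    "Name, Firstname, City, Age\n"
    "Berger, Alice, Marseille, 34\n"
    "Vernet, Bertrand, Angers, 32\n"
    "Jordes, Charles, Lyon, 29\n"
    "Duchamps, Eve, Bordeaux, 30\n"
)

reader = csv.reader(data, delimiter=',')
next(reader)                            # skip the header line

ages = [int(row[3]) for row in reader]  # int() tolerates the leading space
average_age = sum(ages) / len(ages)

print(average_age)  # 31.25
```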


## Challenge : 

Let us use data.gouv.fr, the French open data platform:
- https://www.data.gouv.fr/fr/
- Health and Covid-19 data (https://www.data.gouv.fr/fr/pages/donnees-sante/)
- Inventory of Covid-related data (https://www.data.gouv.fr/fr/pages/donnees-coronavirus/)
- Sampling sites for COVID tests (https://www.data.gouv.fr/fr/datasets/sites-de-prelevements-pour-les-tests-covid/)

Download the file: sites-prelevements-grand-public.csv (1.2 MB)

How many sites are around Lille (Lille center = 50° 38' 14" North, 3° 03' 48" East)? We consider a site to be around Lille if it lies within a 20 km wide square centered on the city.

- 1 latitude minute (N-S) is approx. 2 km

- 1 longitude minute (W-E) is approx. 1.2 km


In [None]:
# calculate the min and max longitude and latitude around the Lille coordinates, defining a 20 km x 20 km square
# (10 km on each side of the center: about 5 latitude minutes and 8 longitude minutes)

lille_long = 3 + 3/60 + 48/3600
lille_lat  = 50 + 38/60 + 14/3600

lille_max_long = 3 + 11 / 60 + 48 / 3600
lille_min_long = 2 +  55 / 60 + 48 / 3600

lille_max_lat = 50 + 43 / 60 + 14 / 3600
lille_min_lat = 50 + 33 / 60 + 14 / 3600


print(lille_min_long)
print(lille_long)
print(lille_max_long)

print(lille_min_lat)
print(lille_lat)
print(lille_max_lat)
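The same bounds can be computed rather than typed by hand. The sketch below uses the two approximations given above (2 km per latitude minute, 1.2 km per longitude minute). The function name `bounding_box` and its parameters are hypothetical helpers, not part of the notebook.

```python
def bounding_box(lat_deg, lon_deg, half_side_km,
                 km_per_lat_min=2.0, km_per_lon_min=1.2):
    """Return (min_lat, max_lat, min_lon, max_lon) in decimal degrees."""
    # km -> minutes of arc -> degrees (60 minutes per degree)
    lat_delta = half_side_km / km_per_lat_min / 60
    lon_delta = half_side_km / km_per_lon_min / 60
    return (lat_deg - lat_delta, lat_deg + lat_delta,
            lon_deg - lon_delta, lon_deg + lon_delta)


# Lille center, converted from degrees / minutes / seconds
lille_lat = 50 + 38/60 + 14/3600
lille_lon = 3 + 3/60 + 48/3600

# A 20 km wide square extends 10 km on each side of the center
min_lat, max_lat, min_lon, max_lon = bounding_box(lille_lat, lille_lon, half_side_km=10)

print(min_lat, max_lat)
print(min_lon, max_lon)
```

The longitude factor depends on the latitude, so the default of 1.2 km per minute is only reasonable near Lille.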

In [None]:
# browse the .csv file to get the latitudes and longitudes (almost the same code as above)
count = 0
with open("sites-prelevements-grand-public.csv") as fh:
    spamreader = csv.reader(fh, delimiter=',')
    next(spamreader)  # skip the header line
    for row in spamreader:
        try:
            lat = float(row[9])
            lon = float(row[8])
        except (ValueError, IndexError):
            continue  # defective row (missing or non-numeric coordinates): skip it

        if lille_min_lat < lat < lille_max_lat and lille_min_long < lon < lille_max_long:
            print(row[4],'long=',row[8],'lat =', row[9])
            count = count + 1

print('there are ', count ,'locations available')
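The handling of defective rows can be tried out without the real file. The sketch below uses a few made-up rows and column names (they do not come from the data.gouv.fr dataset) together with a hypothetical helper `parse_float`, and the bounds are illustrative values.

```python
import csv
import io


def parse_float(text):
    """Return text converted to float, or None when it is not a number."""
    try:
        return float(text)
    except ValueError:
        return None


# Made-up sample: two valid rows and two defective ones
sample = io.StringIO(
    "name,longitude,latitude\n"
    "Site A,3.06,50.63\n"
    "Site B,,50.70\n"
    "Site C,2.35,48.85\n"
    "Site D,abc,xyz\n"
)

count = 0
skipped = 0
for row in csv.DictReader(sample):
    lon = parse_float(row["longitude"])
    lat = parse_float(row["latitude"])
    if lat is None or lon is None:
        skipped += 1          # defective row: ignore it and move on
        continue
    if 50.5 < lat < 50.8 and 2.9 < lon < 3.2:  # illustrative bounds
        count += 1

print(count, skipped)  # 1 2
```

Counting the skipped rows gives a quick idea of how many defects the file contains.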

In [None]:
# select and count only the ones with the desired coordinates, print their address and coordinates.
# csv reader provides strings, we need to convert to float
# there are defects in the file... !