##### <img src="../SDSS-Logo.png" style="display:inline; width:500px" />


## Learning Objectives

- Understand how to read from and write to files in Python
- Look at the special case of reading from and writing to CSV files


### We will look at a number of packages that are useful for file I/O in Python

### It is helpful to see what version of Python you are running
* The `sys` module reports this; see the command below

In [1]:
import sys
print(sys.version)

3.11.5 | packaged by conda-forge | (main, Aug 27 2023, 03:34:09) [GCC 12.3.0]


### Reading from and writing to files will usually involve dealing with file names and directories.
* In Python, file names and directory names are usually treated as strings.
* The [`os`](https://docs.python.org/3/library/os.html) module, and [`os.path`](https://docs.python.org/3/library/os.path.html#module-os.path) in particular, provide operating-system-independent ways of dealing with files and directories.
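
As a minimal sketch of what "operating-system-independent" means in practice, the snippet below builds and takes apart a path with `os.path` functions. The names `data` and `example.txt` are hypothetical and are not files used elsewhere in this unit.

```python
import os

# Hypothetical directory and file names, used only for illustration
data_dir = "data"
file_name = "example.txt"

# Join path components using the separator of the current operating system
full_path = os.path.join(data_dir, file_name)

# Split a path back into its directory part and its file-name part
dir_part, base_part = os.path.split(full_path)

# Separate the file name from its extension
stem, extension = os.path.splitext(base_part)

print(full_path)
print(dir_part, base_part)
print(stem, extension)
```

The printed separator in `full_path` depends on the operating system, which is why joining with `os.path.join()` is usually preferable to concatenating strings by hand.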

#### For example, you may want to see if a given file exists
* `os.path.isfile()` can be used to do that.

In [2]:
# Check if `Unit-2-4-1-test.txt` is a file in the current directory
import os
print("File exists?", os.path.isfile('Unit-2-4-1-test.txt'))

File exists? True


In [4]:
# Now for a file that does not exist
print("File exists?", os.path.isfile('Not-there.txt'))

File exists? False


### Reading and writing files
* Typically, when dealing with file I/O:
    * You open a file, which yields a file handle
    * Use the file handle to read from or write to the file
    * Close the file
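
The three steps above can be sketched as follows. This is only an illustration: it writes a small throwaway file in a temporary directory (the name `demo.txt` is hypothetical) so that it does not depend on any file that ships with this unit.

```python
import os
import tempfile

# Create a throwaway location so the example does not touch existing files
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

# 1. Open the file for writing, which yields a file handle
out_handle = open(demo_path, "w")
# 2. Use the handle to write to the file
out_handle.write("first line\nsecond line\n")
# 3. Close the file
out_handle.close()

# The same three steps for reading
in_handle = open(demo_path, "r")
contents = in_handle.read()
in_handle.close()

print(contents)
print("Closed?", in_handle.closed)
```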

### Opening a file using `open`

In [10]:
# Open the file `Unit-2-4-1-test.txt` for reading
myFileHdl = open("Unit-2-4-1-test.txt", "r")
print(myFileHdl)

<_io.TextIOWrapper name='Unit-2-4-1-test.txt' mode='r' encoding='UTF-8'>


#### What happens if you try to open a file for reading that does not exist?

In [8]:
anotherHdl = open("Not-there.txt", "r")
print(anotherHdl)

FileNotFoundError: [Errno 2] No such file or directory: 'Not-there.txt'
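
If a missing file should not stop the program, one option is to catch the exception. The sketch below assumes a file name inside a freshly created, empty temporary directory, so the file is guaranteed not to exist; the variable names are illustrative only.

```python
import os
import tempfile

# Build a path inside a fresh, empty temporary directory so the file cannot exist
missing_path = os.path.join(tempfile.mkdtemp(), "Not-there.txt")

try:
    handle = open(missing_path, "r")
except FileNotFoundError as err:
    # open() raises FileNotFoundError when a file opened for reading is missing
    handle = None
    print("Could not open file:", err)
else:
    # Runs only when open() succeeded
    handle.close()
```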

### If the file exists and it is a text file, we can read it in line-by-line using the `readline()` method.

In [11]:
# Read a single line and print it out
lineData = myFileHdl.readline()
print(lineData)
print(f"\n {type(lineData)}")

One bright morning as the Fox was following his sharp nose through the wood in search of a bite to eat, he saw a Crow on the limb of a tree overhead. This was by no means the first Crow the Fox had ever seen. What caught his attention this time and made him stop for a second look, was that the lucky Crow held a bit of cheese in her beak.


 <class 'str'>


### `readlines()` will read all the lines in the file into a list of strings

In [12]:
linesList = myFileHdl.readlines()
print(linesList)

['\n', '"No need to search any farther," thought sly Master Fox. "Here is a dainty bite for my breakfast."\n', '\n', 'Up he trotted to the foot of the tree in which the Crow was sitting, and looking up admiringly, he cried, "Good-morning, beautiful creature!"\n', '\n', 'The Crow, her head cocked on one side, watched the Fox suspiciously. But she kept her beak tightly closed on the cheese and did not return his greeting.\n', '\n', '"What a charming creature she is!" said the Fox. "How her feathers shine! What a beautiful form and what splendid wings! Such a wonderful Bird should have a very lovely voice, since everything else about her is so perfect. Could she sing just one song, I know I should hail her Queen of Birds."\n', '\n', "Listening to these flattering words, the Crow forgot all her suspicion, and also her breakfast. She wanted very much to be called Queen of Birds. So she opened her beak wide to utter her loudest caw, and down fell the cheese straight into the Fox's open mouth

### Note that `readlines()` did not read the first line.
* That is because, as you read from a file, a pointer (the current file position) moves along the file.
* So the first `readline()` read the first line and the pointer is at the end of that line.
* `readlines()` then reads the rest of the file from the current pointer position.
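
The pointer behaviour described above can be reproduced on a small throwaway file. The sketch below is an illustration with a hypothetical file name; it also uses `seek(0)`, which moves the pointer back to the start of the file so that everything can be read again.

```python
import os
import tempfile

# Write a three-line throwaway file (hypothetical name) to read back
demo_path = os.path.join(tempfile.mkdtemp(), "pointer-demo.txt")
out_handle = open(demo_path, "w")
out_handle.write("line one\nline two\nline three\n")
out_handle.close()

handle = open(demo_path, "r")
first = handle.readline()        # reads the first line; the pointer moves past it
rest = handle.readlines()        # reads from the pointer to the end of the file
handle.seek(0)                   # move the pointer back to the start of the file
everything = handle.readlines()  # now all three lines are returned
handle.close()

print(repr(first))
print(rest)
print(everything)
```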

#### Once you have finished reading from or writing to a file, it is best to close it with `close()`

In [13]:
myFileHdl.close()
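
If an error occurs between `open()` and `close()`, the file may be left open. One way to guard against that, shown here as a minimal sketch on a hypothetical throwaway file, is to put the `close()` call in a `finally` clause.

```python
import os
import tempfile

# Hypothetical throwaway file used only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "close-demo.txt")

handle = open(demo_path, "w")
try:
    handle.write("some text\n")
finally:
    # Runs whether or not the write succeeded, so the file is always closed
    handle.close()

print("Closed?", handle.closed)
```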

### CSV (comma separated values) is a very common data format used for tabular data in data science.
* See for example the file `Unit-2-5-polling_place_NC_20201103.csv`

#### Python has the `csv` module and `csv.reader()` to make reading CSV files easy
* The example below also shows how to use the `with` statement with file I/O, a common pattern that closes the file automatically when the block ends

In [17]:
# Example of using `csv.reader` and `with` to read a CSV file
import csv
bigList = []
noLines = 0
# newline='' is what the csv module documentation recommends when opening CSV files
with open('Unit-2-5-polling_place_NC_20201103.csv', mode='r', newline='') as myFile:
    csvFile = csv.reader(myFile)
    for row in csvFile:  # each row is a list of strings
        noLines += 1
        bigList.append(row)
print(f"Number of lines read = {noLines}")
print(f"First 10 list values\n {bigList[:10]}")

Number of lines read = 2663
First 10 list values
 [['election_dt', 'county_name', 'polling_place_id', 'polling_place_name', 'precinct_name', 'house_num', 'street_name', 'city', 'state', 'zip'], ['11/03/2020', 'ALAMANCE', '1', 'ALAMANCE CIVITAN CLUB HOUSE', 'COBLE', '3328', 'DOCTOR PICKETT RD', 'BURLINGTON', 'NC', '27215'], ['11/03/2020', 'ALAMANCE', '10', 'ELMIRA COMMUNITY CENTER', 'BURLINGTON 7', '810', 'WICKER ST', 'BURLINGTON', 'NC', '27217'], ['11/03/2020', 'ALAMANCE', '12', 'FIRST BAPTIST CHURCH OF ELON', 'NORTH BOONE', '621', 'HAGGARD AVE', 'ELON', 'NC', '27244'], ['11/03/2020', 'ALAMANCE', '13', 'LAKEVIEW COMMUNITY CHURCH', 'FAUCETTE', '101', 'BOONE RD', 'BURLINGTON', 'NC', '27217'], ['11/03/2020', 'ALAMANCE', '14', 'FELLOWSHIP BAPTIST CHURCH', 'GRAHAM 4', '2744', 'MAPLE AVE', 'BURLINGTON', 'NC', '27215'], ['11/03/2020', 'ALAMANCE', '15', 'FIRST PRESBYTERIAN CHURCH', 'WEST BURLINGTON', '508', 'DAVIS ST', 'BURLINGTON', 'NC', '27215'], ['11/03/2020', 'ALAMANCE', '16', 'GRAHAM CIVI