# Writing and Reading Files in Python

Python offers several ways to read and write files, and CSV data in particular.

You already know the approach that uses Python's built-in functions. This notebook adds two more, each based on a library: `csv` and pandas.

Let's create some toy data and explore reading and writing it with all three methods.

The data describes students and their scores in 4 courses, so each data sample contains 4 numbers and the student's name.

In [10]:
csvdata = [[2,4,5,12,"fred"],[34,43,21,43,"annie"], [324,3,43,4,"jean"]]

This creates a list of 3 lists, one per student:

In [11]:
csvdata

[[2, 4, 5, 12, 'fred'], [34, 43, 21, 43, 'annie'], [324, 3, 43, 4, 'jean']]

In [12]:
csvdata[1]

[34, 43, 21, 43, 'annie']

In [13]:
csvdata[1][2]

21

To write our data to a file, we first have to convert every value in it to a string, because `.join`, which we use to build each line, only accepts strings.

`.join` concatenates a list of strings into one string, placing the separator between them. For example:

In [14]:
# Remember how join works:
','.join(["fred", "annie", "howard"])  # creates a string separated by ,

'fred,annie,howard'

If we try it on one of our data rows, for example `csvdata[1]`:

In [15]:
','.join(csvdata[1])

TypeError: sequence item 0: expected str instance, int found

It doesn't work because the row contains numbers. To solve this, we have to convert the numbers from integer (`int`) to string (`str`):

In [16]:
# change the type of numbers from int to str 
row = csvdata[1]
','.join([ str(row[0]), str(row[1]), str(row[2]), str(row[3]), str(row[4]) ])

'34,43,21,43,annie'

Or you can convert every item to a string first:

In [17]:
row = [str(item) for item in row]
print(row)
','.join(row)

['34', '43', '21', '43', 'annie']


'34,43,21,43,annie'
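As a minimal sketch, the conversion above can be folded into a small helper. The name `row_to_line` and its `sep` parameter are made up for this illustration and are not part of Python or of the original notebook.

```python
def row_to_line(row, sep=","):
    """Convert every item to str and join the results with `sep` (hypothetical helper)."""
    return sep.join(str(item) for item in row)


# Same toy data as above, repeated so the sketch runs on its own.
csvdata = [[2, 4, 5, 12, "fred"], [34, 43, 21, 43, "annie"], [324, 3, 43, 4, "jean"]]

lines = [row_to_line(row) for row in csvdata]
print(lines)
```

Passing a generator expression to `.join` avoids building an intermediate list, but the list comprehension used earlier works just as well.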

Now that you know how to convert the values in a data list to strings, let's start with Python's basic functions for writing and reading files.

## Using basic functions of Python

To write a string to a file, we open the file with the built-in function `open()` and call the file object's `.write()` method.

**When we open a file for writing, we need to pass "w" as the mode. Reading is the default (but you can also pass "r" explicitly if you prefer).**

*Note*: In Python 3, if you get an error about characters that can't be decoded, you can get around it by passing `errors="ignore"` to `open()`. Be aware that this silently drops the offending characters.

### Writing 

In [18]:
with open("data/myfile.csv", "w", errors="ignore") as handle:
    # up here, print your headers to the file:
    handle.write("Score1,Score2,Score3,Score4,Name\n")
    for row in csvdata:
        # we loop through the data -- but we have to make it a string
        # to write it with the plain file write command.
        # each string has to end in a \n -- new line.
        #handle.write("Some string")
        row = [str(item) for item in row]
        print(row)
        handle.write(','.join(row) + "\n")

['2', '4', '5', '12', 'fred']
['34', '43', '21', '43', 'annie']
['324', '3', '43', '4', 'jean']


**The option "w" overwrites the file if it already exists.**
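To see the difference between overwriting and appending, here is a small sketch that writes to a temporary file (the temporary directory and the file name `demo.csv` are assumptions chosen so that no real file is touched). It compares mode `"w"` with mode `"a"`, which adds to the end of the file instead.

```python
import os
import tempfile

# Hypothetical scratch location, used only for this illustration.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "demo.csv")

with open(path, "w") as handle:
    handle.write("first\n")

with open(path, "w") as handle:   # "w" truncates the file: "first" is gone
    handle.write("second\n")

with open(path, "a") as handle:   # "a" appends after the existing content
    handle.write("third\n")

with open(path) as handle:
    content = handle.read()

print(content)
```

After these three writes the file contains only the lines `second` and `third`.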

#### Wrap It In a Function! 

Let's wrap the above in a function that takes as arguments the file path, the data, and the headers, a string that contains the names of the columns.

In [None]:
def write_csv(filepath, data, headers):
    """Write data (a list of rows) to filepath, preceded by the header string."""
    with open(filepath, "w", errors="ignore") as handle:
        # write the header line first; it must already end in a new line
        handle.write(headers)
        for row in data:
            # the plain file write command only accepts strings,
            # so convert every item, then end each line with a new line
            row = [str(item) for item in row]
            handle.write(','.join(row) + "\n")
    print("wrote file %s" % filepath)

In [None]:
header = "Score1,Score2,Score3,Score4,Name\n"

# call the function with the arguments:
write_csv("data/myfile1.csv", csvdata, header)

### Reading
To read, we use the file object's `.read()` method. It returns the whole file as a single string, which is not convenient for working with the data.

In [None]:
with open("data/myfile.csv","r") as handle:
    data = handle.read()
    print(data)
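To turn that single string back into rows by hand, one option is to split it into lines and then split each line on the comma. The sketch below inlines the text that the writing step produced, so it runs on its own. It assumes that no field contains a comma and that the first four columns are integers.

```python
# Same content as data/myfile.csv, inlined for this illustration.
text = (
    "Score1,Score2,Score3,Score4,Name\n"
    "2,4,5,12,fred\n"
    "34,43,21,43,annie\n"
    "324,3,43,4,jean\n"
)

lines = text.splitlines()          # one string per line, without the "\n"
headers = lines[0].split(",")      # first line holds the column names

rows = []
for line in lines[1:]:
    fields = line.split(",")
    # convert the four scores back to int, keep the name as a string
    rows.append([int(value) for value in fields[:4]] + [fields[4]])

print(headers)
print(rows)
```

This works for the toy data, but it breaks as soon as a field contains a comma or a quote, which is one reason to prefer a dedicated CSV library.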

## Using the CSV Module

docs: https://docs.python.org/3.6/library/csv.html

### Writing

In [23]:
# Writing with it:
import csv

with open('data/myfile3.csv', 'w', newline='') as csvfile:  # newline='' is what the csv docs recommend
    spamwriter = csv.writer(csvfile, delimiter=',')  # initialize a writer with delimiter ","
    for row in csvdata:
        spamwriter.writerow(row)

In [24]:
# cat is a Unix command and probably won't work on Windows; there, go find the file and look at it.
!cat data/myfile3.csv

2,4,5,12,fred
34,43,21,43,annie
324,3,43,4,jean
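One reason to prefer `csv.writer` over joining strings by hand is that it quotes fields that contain the delimiter. The sketch below writes to an in-memory `io.StringIO` buffer instead of a real file. The name "smith, fred" is made-up data for this illustration.

```python
import csv
import io

buffer = io.StringIO()   # in-memory stand-in for an open file
writer = csv.writer(buffer)

writer.writerow(["Score1", "Name"])                        # header row
writer.writerows([[12, "smith, fred"], [43, "annie"]])     # several rows at once

written = buffer.getvalue()
print(written)

# Joining by hand produces an ambiguous line with three comma-separated parts.
naive = ",".join([str(12), "smith, fred"])
print(naive)
```

The writer wraps `smith, fred` in double quotes so a reader can recover it as one field, whereas the hand-joined line would be read back as three fields. Note also that `writerow` converts the numbers to strings for you.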


### Reading
Reading is done with the function `csv.reader()`, which returns each row as a list of strings.

In [22]:
# reading a csv file using csv -- 'r' is the default mode, so it could be omitted:

with open('data/myfile.csv', 'r', errors='ignore') as csvfile:
    data2 = csv.reader(csvfile, delimiter=',')
    for row in data2:
        print("raw row looks like this:", row)
        # make it prettier with:
        print("Prettier", ', '.join(row))

raw row looks like this: ['Score1', 'Score2', 'Score3', 'Score4', 'Name']
Prettier Score1, Score2, Score3, Score4, Name
raw row looks like this: ['2', '4', '5', '12', 'fred']
Prettier 2, 4, 5, 12, fred
raw row looks like this: ['34', '43', '21', '43', 'annie']
Prettier 34, 43, 21, 43, annie
raw row looks like this: ['324', '3', '43', '4', 'jean']
Prettier 324, 3, 43, 4, jean
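Because `csv.reader` returns every field as a string, including the header row, a common follow-up is to pull the header off first and convert the scores back to numbers. This is a sketch under the assumption that the last column is the name and all other columns are integers. The text is inlined so the example does not depend on a file.

```python
import csv
import io

text = (
    "Score1,Score2,Score3,Score4,Name\n"
    "2,4,5,12,fred\n"
    "34,43,21,43,annie\n"
)

reader = csv.reader(io.StringIO(text))
headers = next(reader)   # consume the first row, which holds the column names

records = []
for row in reader:
    # every column except the last is a score; the last is the name
    records.append([int(value) for value in row[:-1]] + [row[-1]])

print(headers)
print(records)
```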


### CSV Dict files
#### DictReader
If you want to read the data as dictionaries, the `csv` module has a class called `DictReader`.
Each row of your data becomes a dictionary.

By default, `DictReader` takes the dictionary keys from the first row of the file, so this works when that row contains the column labels.

In [17]:
import csv

mydata = []
with open('data/myfile.csv', errors='ignore') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print("the raw dictionary", row)
        # accessing certain columns:
        print(row['Score1'], row['Score2'], row['Name'])
        mydata.append(row)

FileNotFoundError: [Errno 2] No such file or directory: 'data/myfile.csv'

Likewise, you can write dictionary data out as a csv using the DictWriter.  Up above, we collected the rows into a list called mydata.

In [None]:
mydata

In [None]:
mydata.keys()  # raises AttributeError: mydata is a list; only the rows inside it are dictionaries

In [None]:
# These are the column headers for the data file, taken from its first row.
mydata[0].keys()
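The column names are also available on the reader itself through its `fieldnames` attribute, without looking at any row. The following sketch uses inlined text rather than a file and sums the scores of the first student as a small usage example.

```python
import csv
import io

text = (
    "Score1,Score2,Score3,Score4,Name\n"
    "2,4,5,12,fred\n"
    "34,43,21,43,annie\n"
)

reader = csv.DictReader(io.StringIO(text))
fieldnames = reader.fieldnames     # read from the first line of the input
rows = list(reader)                # one dictionary per remaining line

names = [row["Name"] for row in rows]

# Values are still strings, so convert before doing arithmetic.
total_first = sum(int(rows[0][key]) for key in fieldnames if key.startswith("Score"))

print(fieldnames)
print(names, total_first)
```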

#### DictWriter
The class `DictWriter` writes dictionaries to a csv file.
In our example, we read the csv data as a list of dictionaries and stored the result in `mydata`.
In the following example, we open a new file called "myfile3.csv" and then use `DictWriter` with its `writeheader()` and `writerow()` methods to write the data into the file.

In `DictWriter`, we can set the delimiter between fields to whatever we want, including tab (`\t`).

Notice that the fields are written in the order of the keys of the first row, because we didn't specify an order ourselves. On recent Python versions dictionaries keep their insertion order, so this matches the column order of the file we read. If you pass the field names explicitly, you control the order.

In [None]:
with open('data/myfile3.csv', 'w', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=list(mydata[0].keys()))
    writer.writeheader()
    for row in mydata:
        # each row is a dictionary; its values are written in fieldnames order
        writer.writerow(row)

Notice what `fieldnames` contains:

In [None]:
list(mydata[0].keys())

`cat` is a Unix command that concatenates files and prints their contents. We use it here to show the content of myfile3.csv:

In [None]:
!cat data/myfile3.csv

Here we specify the field order and it controls how it gets written out.

In [None]:
with open('data/myfile4.csv', 'w', errors='ignore') as csvfile:
    writer = csv.DictWriter(csvfile, delimiter='\t', fieldnames=['Name', 'Score1', 'Score2', 'Score3', 'Score4'])
    writer.writeheader()
    for row in mydata:
        # each row is a dictionary; its values are written in fieldnames order
        writer.writerow(row)

In [None]:
!cat data/myfile4.csv
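`fieldnames` can also be used to write only some of the columns. By default `DictWriter` raises a `ValueError` when a row contains a key that is not in `fieldnames`; passing `extrasaction="ignore"` tells it to skip those keys instead. The sketch below writes to an in-memory buffer, and the two rows are a cut-down copy of the toy data.

```python
import csv
import io

rows = [
    {"Score1": "2", "Score2": "4", "Name": "fred"},
    {"Score1": "34", "Score2": "43", "Name": "annie"},
]

buffer = io.StringIO()   # in-memory stand-in for an open file
writer = csv.DictWriter(
    buffer,
    fieldnames=["Name", "Score1"],   # chosen order; Score2 is left out on purpose
    delimiter="\t",
    extrasaction="ignore",           # skip keys that are not listed in fieldnames
)
writer.writeheader()
writer.writerows(rows)

output = buffer.getvalue()
print(output)
```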

##  Using the pandas library 
### Reading

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

In [1]:
#import the library pandas
import pandas as pd 

`header=0` indicates that the first row contains the column headers. This is the default behaviour when no column names are passed.

If the file has no header row, pass `header=None`.

For the other arguments you can pass to this function, read the documentation linked above.

In [2]:
# Load the CSV file
data_path = "./data/goog.csv"
data = pd.read_csv(data_path, delimiter=',',header = 0)
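To illustrate the `header` argument, here is a small sketch that reads header-less text with `header=None` and supplies the column names through `names`. It uses the student toy data from earlier, inlined through `io.StringIO`, and assumes pandas is installed.

```python
import io

import pandas as pd

# Toy data without a header row, inlined for this illustration.
text = "2,4,5,12,fred\n34,43,21,43,annie\n324,3,43,4,jean\n"
columns = ["Score1", "Score2", "Score3", "Score4", "Name"]

# header=None: the first line is data, not column names; names= supplies them.
scores = pd.read_csv(io.StringIO(text), header=None, names=columns)

print(scores)
print(scores.dtypes)   # the score columns are parsed as integers automatically
```

Unlike the `csv` module, pandas infers the column types, so the scores come back as numbers rather than strings.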

`data` is a pandas DataFrame.

`.head()` returns only the first five rows of the DataFrame.

In [3]:
data.head()

Unnamed: 0,Date,Open,High,Low,Close,Volume
0,2010-01-04,313.16,314.44,311.81,313.06,
1,2010-01-05,313.28,313.61,310.46,311.68,
2,2010-01-06,312.62,312.62,302.88,303.83,
3,2010-01-07,304.4,304.7,296.03,296.75,
4,2010-01-08,295.7,301.32,294.26,300.71,


To access values in a DataFrame by integer position (row, column), we use `iloc`:

In [None]:
print(data.columns)
print('\n')
print(data.iloc[1,1])
print("\n")
print(data.iloc[0,:])
print("\n")
print(data.iloc[1:5,3])
print("\n")
data['Date'].head()
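`iloc` selects by integer position, while its counterpart `loc` selects by label. The sketch below builds a small DataFrame by hand from three of the rows shown above (only the `Date`, `Open` and `Close` columns, as an assumption to keep it short) so that it runs without the goog.csv file.

```python
import pandas as pd

prices = pd.DataFrame(
    {
        "Date": ["2010-01-04", "2010-01-05", "2010-01-06"],
        "Open": [313.16, 313.28, 312.62],
        "Close": [313.06, 311.68, 303.83],
    }
)

by_position = prices.iloc[1, 2]      # second row, third column (positions start at 0)
by_label = prices.loc[1, "Close"]    # row labelled 1, column labelled "Close"
first_two = prices.iloc[0:2, 1]      # rows 0 and 1 of the second column; the end is excluded

print(by_position, by_label)
print(first_two)
```

Both lookups return the same cell here because the default row labels are 0, 1, 2, which coincide with the positions.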

**Week 2 covers pandas DataFrames and how to use them.**

### Writing 

To write data into a csv file, use `to_csv(filename, sep=' ')`:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html

In `read_csv`, `sep` and `delimiter` are equivalent; `to_csv` only accepts `sep`. See the documentation for the other arguments.


In [None]:
data.to_csv('./data/myfile5.csv', sep = ',')


**The function overwrites the file if it already exists.**

In [None]:
!cat ./data/myfile5.csv
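By default `to_csv` also writes the row index as an unnamed first column. When such a file is read back with `read_csv`, that column shows up as `Unnamed: 0`, like the one in the `data.head()` output earlier. Passing `index=False` leaves it out. The sketch below uses a small hand-made DataFrame and writes to strings instead of files (calling `to_csv` without a path returns the CSV text).

```python
import io

import pandas as pd

frame = pd.DataFrame({"Name": ["fred", "annie"], "Score1": [2, 34]})

with_index = frame.to_csv()                 # default: the index is written too
without_index = frame.to_csv(index=False)   # only the actual columns

# Reading the first version back turns the index into a column.
reread = pd.read_csv(io.StringIO(with_index))

print(with_index)
print(without_index)
print(list(reread.columns))
```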