Python CSV 

What is a CSV file?

CSV stands for Comma-Separated Values. At its core, a CSV file is plain text in which values are separated by a delimiter (usually a comma), which adds some structure to the data. So what does this mean? Well, let's take a look.

First, we open the file in question using Python's built-in ```open()``` [function](https://docs.python.org/2/library/functions.html#open).

In [2]:
f = open('Birthdays.csv', 'r')

So what do we have now? This is what's called a file object. We then call its ```read()``` method to read the entire contents of the file into a variable as a single string.

In [3]:
data = f.read()

The file's contents are now stored in ```data```, so we can close the file to free up resources.

In [4]:
f.close()
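
Instead of calling ```close()``` yourself, you can open the file in a ```with``` block, which closes it automatically when the block ends, even if an error occurs while reading. So that the sketch runs on its own, it first writes a small hypothetical sample file (three lines in the same layout as Birthdays.csv) to a temporary directory; the file name ```sample.csv``` is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical sample in the same layout as Birthdays.csv (first three lines only).
sample = (
    '"","state","year","month","day","date","wday","births"\n'
    '"1","AK",1969,1,1,1969-01-01,"Wed",14\n'
    '"2","AL",1969,1,1,1969-01-01,"Wed",174\n'
)

# Write the sample to a temporary file so the example does not need Birthdays.csv.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w') as f:
    f.write(sample)

# The with statement closes the file when the block ends,
# so no explicit f.close() call is needed.
with open(path, 'r') as f:
    data = f.read()

print(f.closed)         # True
print(data == sample)   # True
```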

```data``` is now one giant string. Let's take a look at the first 200 characters.

In [5]:
data[:200]

'"","state","year","month","day","date","wday","births"\n"1","AK",1969,1,1,1969-01-01,"Wed",14\n"2","AL",1969,1,1,1969-01-01,"Wed",174\n"3","AR",1969,1,1,1969-01-01,"Wed",78\n"4","AZ",1969,1,1,1969-01-01,"'

So here we have a bunch of characters that we will turn into structured data. Let's break it down. The first thing we notice is that the values are separated by commas, hence CSV. Text values such as "AK" and "Wed" are also wrapped in double quotes, while numbers are not. Every few values there is a "\n" character; in Python this is the newline character. So we have a set of comma-separated values followed by a newline, over and over. This is just rows and columns: every line is a row, and each comma-separated value in it is a column. If we break the data up on the newlines, we get a list of rows. Let's see this in action.

In [6]:
rows = data.split('\n')

Let's take a look at the first 5 rows. It's already looking a lot more tabular.

In [7]:
rows[:5]

['"","state","year","month","day","date","wday","births"',
 '"1","AK",1969,1,1,1969-01-01,"Wed",14',
 '"2","AL",1969,1,1,1969-01-01,"Wed",174',
 '"3","AR",1969,1,1,1969-01-01,"Wed",78',
 '"4","AZ",1969,1,1,1969-01-01,"Wed",84']
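
One thing to watch for: if the file ends with a newline, as most text files do, ```split('\n')``` leaves an empty string as the last element, which would otherwise turn into an empty row. Here is a small illustration on a hypothetical string, alongside ```str.splitlines()```, which drops that trailing empty entry and also splits on '\r\n' line endings.

```python
# A tiny hypothetical CSV string that, like most files, ends with a newline.
text = 'state,births\nAK,14\nAL,174\n'

# split('\n') leaves an empty string after the final newline.
print(text.split('\n'))   # ['state,births', 'AK,14', 'AL,174', '']

# splitlines() does not produce that trailing empty entry.
print(text.splitlines())  # ['state,births', 'AK,14', 'AL,174']
```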

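The first row holds the column names, so we set it aside and keep the remaining rows as data.

In [8]:
# The first line is the header; the remaining lines are data rows.
header = rows[0]
# Skip blank lines, such as the empty string left after a trailing newline.
rows = [row for row in rows[1:] if row]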

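Next, we split the header on commas, strip the double quotes, and drop the first column name, which is empty because the first column is an unnamed row index.

In [9]:
# Split the header on commas and remove the surrounding double quotes.
head = [name.replace('"', '') for name in header.split(',')]
# Drop the first (unnamed) index column.
head = head[1:]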


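We do the same for every data row: split on commas, strip the quotes, and drop the first value (the row index) so each row lines up with the column names.

In [10]:
row_data = []
for row in rows:
    if not row:  # skip blank lines, e.g. from a trailing newline
        continue
    # Split the line into fields and remove the double quotes around text values.
    fields = [value.replace('"', '') for value in row.split(',')]
    # Drop the first field (the row index) to match the header.
    row_data.append(fields[1:])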



In [11]:
row_data[:10]

[['AK', '1969', '1', '1', '1969-01-01', 'Wed', '14'],
 ['AL', '1969', '1', '1', '1969-01-01', 'Wed', '174'],
 ['AR', '1969', '1', '1', '1969-01-01', 'Wed', '78'],
 ['AZ', '1969', '1', '1', '1969-01-01', 'Wed', '84'],
 ['CA', '1969', '1', '1', '1969-01-01', 'Wed', '824'],
 ['CO', '1969', '1', '1', '1969-01-01', 'Wed', '100'],
 ['CT', '1969', '1', '1', '1969-01-01', 'Wed', '90'],
 ['DC', '1969', '1', '1', '1969-01-01', 'Wed', '88'],
 ['DE', '1969', '1', '1', '1969-01-01', 'Wed', '32'],
 ['FL', '1969', '1', '1', '1969-01-01', 'Wed', '288']]
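
With ```head``` and ```row_data``` in hand, ```zip``` can pair each value with its column name to give one dictionary per row. This sketch hard-codes the header and two of the rows shown above so that it runs on its own. Note that every value is still a string, so numeric columns need an explicit conversion such as ```int()```.

```python
# Column names and two parsed rows, copied from the output above.
head = ['state', 'year', 'month', 'day', 'date', 'wday', 'births']
row_data = [
    ['AK', '1969', '1', '1', '1969-01-01', 'Wed', '14'],
    ['AL', '1969', '1', '1', '1969-01-01', 'Wed', '174'],
]

# zip pairs each column name with the value in the same position.
records = [dict(zip(head, row)) for row in row_data]
print(records[0]['state'], records[0]['births'])  # AK 14

# Values are strings, so convert the births column before doing arithmetic.
total_births = sum(int(record['births']) for record in records)
print(total_births)  # 188
```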

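Python has a built-in ```csv``` module for reading and writing CSV files. Unlike our manual approach, it correctly handles quoted values, including values that themselves contain commas or newlines.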
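
To see why the module is worth using over splitting on commas by hand, here is a comparison on a hypothetical line whose quoted text value contains a comma.

```python
import csv
import io

# A hypothetical line in which a quoted text value contains a comma.
line = '"1","Smith, John",42\n'

# Splitting on commas breaks the quoted value into two pieces.
naive = line.strip().split(',')
print(naive)   # ['"1"', '"Smith', ' John"', '42']

# csv.reader respects the quotes and removes them for us.
parsed = next(csv.reader(io.StringIO(line)))
print(parsed)  # ['1', 'Smith, John', '42']
```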

In [12]:
import csv

In [16]:
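# The csv docs recommend newline='' so the module handles line endings
# (including newlines inside quoted fields) itself.
with open('Birthdays.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    # Each row comes back as a list of strings with the quotes already removed.
    row_data = list(reader)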

In [17]:
row_data[:10]

[['', 'state', 'year', 'month', 'day', 'date', 'wday', 'births'],
 ['1', 'AK', '1969', '1', '1', '1969-01-01', 'Wed', '14'],
 ['2', 'AL', '1969', '1', '1', '1969-01-01', 'Wed', '174'],
 ['3', 'AR', '1969', '1', '1', '1969-01-01', 'Wed', '78'],
 ['4', 'AZ', '1969', '1', '1', '1969-01-01', 'Wed', '84'],
 ['5', 'CA', '1969', '1', '1', '1969-01-01', 'Wed', '824'],
 ['6', 'CO', '1969', '1', '1', '1969-01-01', 'Wed', '100'],
 ['7', 'CT', '1969', '1', '1', '1969-01-01', 'Wed', '90'],
 ['8', 'DC', '1969', '1', '1', '1969-01-01', 'Wed', '88'],
 ['9', 'DE', '1969', '1', '1', '1969-01-01', 'Wed', '32']]
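
Note that ```csv.reader``` returns the header as the first row, along with the unnamed index column. The module handles writing as well: the sketch below writes a few rows with ```csv.writer``` to a temporary file (the file name and values are hypothetical) and reads them back with ```csv.DictReader```, which uses the header row as dictionary keys instead of returning it as data.

```python
import csv
import os
import tempfile

# Hypothetical file path and data, for illustration only.
path = os.path.join(tempfile.mkdtemp(), 'births_subset.csv')
header = ['state', 'date', 'births']
rows = [
    ['AK', '1969-01-01', 14],
    ['AL', '1969-01-01', 174],
]

# Writing: csv.writer inserts the commas and adds quotes only where needed.
with open(path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(header)
    writer.writerows(rows)

# Reading: DictReader turns each data row into a dict keyed by the header.
with open(path, newline='') as csvfile:
    records = list(csv.DictReader(csvfile))

# Numbers come back as strings, just like with csv.reader.
print(records[1]['state'], records[1]['births'])  # AL 174
```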