# How to use and manipulate CSV files in Python

## Part 1 - What is a CSV file

A CSV (comma-separated values) file is a plain-text file whose data is structured into rows and columns: commas separate the columns, and new lines separate the rows. CSV files are commonly used to represent data from a spreadsheet, such as one made in Microsoft Excel or Google Sheets.

For example, if I had this spreadsheet on Google Sheets:

![http://i.imgur.com/9ig18Kt.png](http://i.imgur.com/9ig18Kt.png)

And then I went to File->Download As->Comma-separated values and downloaded the file, it would look like this:

    Item,Quantity,Price per item
    Bag of carrots,3,$3
    Box of cookies,2,$4
    Brie Cheese,1,$4

In this example, the first row has three headers separated by commas. In each of the following rows, the first item corresponds to the first header, the second item to the second header, and so on (so Bag of carrots is an item, 3 is a quantity, and $3 is a price per item).

P.S. If you can't remember which are rows and which are columns, you can think of columns like Roman columns (going up and down), and rows like running (left to right).

This shows probably the biggest reason we care about CSV files. They're structured, so they're easy to read programmatically, but they're also easy to hand to someone who knows nothing about code, who can just open the file in their favorite spreadsheet program. Many large data sets available online are distributed as CSV files for exactly this reason.
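
If you'd like to follow along without a spreadsheet program, here's a minimal sketch that writes the same data to a file by hand. It assumes you're fine with creating `shopping_list.csv` (the file name used in the rest of this tutorial) in your current folder. The `rows` and `contents` names are just mine for illustration.

```python
# Recreate the example spreadsheet as a plain-text CSV file.
rows = [
    "Item,Quantity,Price per item",
    "Bag of carrots,3,$3",
    "Box of cookies,2,$4",
    "Brie Cheese,1,$4",
]

# Join the rows with new lines and write them out as ordinary text
with open("shopping_list.csv", "w") as f:
    f.write("\n".join(rows))

# Read the file back to confirm it's just text with commas and new lines
with open("shopping_list.csv") as f:
    contents = f.read()
print(contents)
```

Nothing here is CSV-specific: the file is ordinary text, which is why any text editor can open it too.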

## Part 2 - Reading CSV Files

### Part 2a - Reading them manually

Remember that CSV files are just regular files with a standard formatting.
Therefore, you can just read them like a regular file, and get the data you want.

For example, if we want to figure out how much we will have to pay in the end, we have to go through the rows, and multiply the price by the quantity, and add them all up.

In [2]:
csv_text = open("shopping_list.csv").read() # How to turn a file into a string
print(csv_text)

Item,Quantity,Price per item
Bag of carrots,3,$3
Box of cookies,2,$4
Brie Cheese,1,$4


In [3]:
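# Now we have to split it on the new lines, so each row becomes its own string in a list
csv_text_split = csv_text.splitlines() # unlike split("\n"), this leaves no empty string at the end if the file ends with a new line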
print(csv_text_split)

['Item,Quantity,Price per item', 'Bag of carrots,3,$3', 'Box of cookies,2,$4', 'Brie Cheese,1,$4']


In [4]:
# We can get rid of the first line, since that's just the headers. We know that quantity is index 1, and price is index 2.
csv_text_split.pop(0) # Delete the line at index 0 - since it's the headers

'Item,Quantity,Price per item'

In [5]:
# And now we just iterate through each line, and take out the information we want. 
total_price = 0
for line in csv_text_split: # For each line
    new_line = line.split(",") # split the line on the comma, so the line is now a list [Item,Quantity,Price]
    quantity = int(new_line[1]) # turn the thing at index 1 into an int
    price_per_item = int(new_line[2][1:]) # take the dollar sign off the thing at index 2, and turn it into an int
    
    total_price += quantity*price_per_item

print("Total Price: ${}".format(total_price)) 

Total Price: $21


### Part 2b - Using the CSV library

That wasn't awful - but there's a lot of code in there that would be repeated in all CSVs. In addition, our code doesn't handle some special cases (What if there are commas in the item, for example?)
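
To make that special case concrete, here's a small sketch using a made-up row (it's not in our shopping list) where the item name contains a comma. Spreadsheet programs typically wrap such a value in double quotes when exporting, and our manual approach doesn't know about that:

```python
# Hypothetical row: the item name itself contains a comma, so it is
# wrapped in double quotes in the CSV text.
line = '"Carrots, organic",3,$3'

fields = line.split(",")
print(fields)       # ['"Carrots', ' organic"', '3', '$3']
print(len(fields))  # 4 pieces instead of the 3 we expected

# Index 1 is no longer the quantity, so the conversion fails
try:
    quantity = int(fields[1])
except ValueError as error:
    quantity = None
    print("Could not parse quantity:", error)
```

Splitting on every comma cuts the item name in half, so every index after it is off by one.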

Because of this, Python comes with a CSV library that makes it extremely easy to turn a CSV file into a list of lists, so that you can parse it more easily. Let me show you how it works:

In [6]:
# First step: import the library
import csv

# Second step: pass the open file into csv.reader
csv_lists = csv.reader(open("shopping_list.csv"))

# Third step: iterate through the file you created:
for line in csv_lists: 
    print(line)

['Item', 'Quantity', 'Price per item']
['Bag of carrots', '3', '$3']
['Box of cookies', '2', '$4']
['Brie Cheese', '1', '$4']
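
This is also where the library handles the special cases for us. As a sketch, here's a made-up row whose item name contains a comma. I'm using `io.StringIO` to stand in for an open file so the example runs without any file on disk. The data and variable names are hypothetical.

```python
import csv
import io

# io.StringIO behaves like an open text file, but lives in memory
fake_file = io.StringIO(
    'Item,Quantity,Price per item\n'
    '"Carrots, organic",3,$3\n'
)

parsed_rows = []
for line in csv.reader(fake_file):
    print(line)
    parsed_rows.append(line)

# The quoted item stays in one piece, and the quotes are removed
print(parsed_rows[1][0])  # Carrots, organic
```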


If we want to rewrite the code for finding the total price:

In [7]:
csv_lists = csv.reader(open("shopping_list.csv"))
total_cost = 0
next(csv_lists) # skip the header row by advancing csv_lists one line

for line in csv_lists:
    quantity = int(line[1])
    price = int(line[2][1:])
    total_cost += quantity*price
print("Total cost is: ${}".format(total_cost))

Total cost is: $21


Something to keep in mind is that what csv.reader gives you isn't actually a list of lists. It's an iterator: it hands you one row at a time, and once it reaches the end of the file it's empty. So you can't read from csv_lists twice - the second time it will just be empty. This also means you can't do indexing on it.

In [8]:
for line in csv_lists: # csv_lists was already used up by the loop above
    print(line) # so this prints nothing

The reason the csv library does this is in case you had a very large CSV file - this way, you don't have to store it all in memory, you can just read it line by line.
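
Here's a small self-contained sketch of both limitations. It uses `io.StringIO` as a stand-in for an open file, and the counter names are just for illustration:

```python
import csv
import io

fake_file = io.StringIO(
    "Item,Quantity,Price per item\n"
    "Bag of carrots,3,$3\n"
    "Box of cookies,2,$4\n"
)
reader = csv.reader(fake_file)

# First pass: the reader hands over every row
first_pass = 0
for line in reader:
    first_pass += 1

# Second pass: the reader is already at the end, so the loop body never runs
second_pass = 0
for line in reader:
    second_pass += 1

print(first_pass, second_pass)  # 3 0

# Indexing isn't supported either
indexing_failed = False
try:
    reader[0]
except TypeError as error:
    indexing_failed = True
    print("Indexing failed:", error)
```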

To read from it more than once, you can convert the reader into a list as soon as you create it. This will store every row in your computer's memory, and allow you to use it like a list of lists.

In [9]:
csv_lists = list(csv.reader(open("shopping_list.csv")))
print(csv_lists)

print(csv_lists[1][2])

[['Item', 'Quantity', 'Price per item'], ['Bag of carrots', '3', '$3'], ['Box of cookies', '2', '$4'], ['Brie Cheese', '1', '$4']]
$3


Now you have a list of lists, which is the data from your CSV.

Try writing code that goes through csv_lists, and prints out the item you're spending the most money on.

In [10]:
# answer - will be blank for students:
def most_expensive(csv_lists):
    max_item = ""
    max_price = -1
    for line in csv_lists:
        total_price = int(line[1])*int(line[2][1:])
        if total_price>max_price:
            max_price = total_price
            max_item = line[0]
    
    return max_item
most_expensive(csv_lists[1:])

'Bag of carrots'
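
If you've met Python's built-in `max` function, the same answer can be written more compactly by giving it a `key` function. This is just one possible alternative, not the "right" answer. The rows are written inline (matching `csv_lists[1:]`) so the sketch runs on its own, and `row_total` is a helper name I made up.

```python
# Same data as csv_lists[1:], written out so this runs without the file
rows = [
    ["Bag of carrots", "3", "$3"],
    ["Box of cookies", "2", "$4"],
    ["Brie Cheese", "1", "$4"],
]

def row_total(row):
    """Quantity times price per item for one row."""
    return int(row[1]) * int(row[2][1:])

# max compares the rows by whatever row_total returns for each one
biggest = max(rows, key=row_total)
print(biggest[0])  # Bag of carrots
```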

You might have noticed a few annoying things while working with this library. For one, you have to drop the first row, since it doesn't contain any data you want. Secondly, you have to refer to the items by index, which means you have to know the index of what you want.

These issues can be solved with the DictReader class of the csv library. Let me show you how that one works, and what it produces:

In [11]:
# csv library is already imported
csv_file = csv.DictReader(open("shopping_list.csv"))

# now let's see what's inside
for line in csv_file:
    print(line)

{'Item': 'Bag of carrots', 'Price per item': '$3', 'Quantity': '3'}
{'Item': 'Box of cookies', 'Price per item': '$4', 'Quantity': '2'}
{'Item': 'Brie Cheese', 'Price per item': '$4', 'Quantity': '1'}


As you can see, the DictReader takes in a CSV file, and gives you one dictionary per row, where each key is a header and each value is that row's entry in the matching column. The header row is used up for the keys, so you don't have to skip it yourself. This makes it easy to write very readable code, as you can use the name of the header to get what you want. For example, to rewrite the "total cost" code:

In [13]:
csv_file = csv.DictReader(open("shopping_list.csv"))
total_cost = 0
for line in csv_file:
    quantity = int(line['Quantity'])
    price = int(line['Price per item'][1:])
    total_cost += quantity*price
print("Total cost is: ${}".format(total_cost))

Total cost is: $21
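
One habit worth picking up once you're comfortable: open the file inside a `with` block so it gets closed for you, and pass `newline=""` to `open`, which is what the csv module's documentation recommends for files handed to its readers. Here's a sketch under those assumptions. It first writes a throwaway copy of the data to a temporary folder so it runs on its own, and `path` and `headers` are names I picked for illustration.

```python
import csv
import os
import tempfile

# Write a throwaway copy of the shopping list so this sketch is self-contained
path = os.path.join(tempfile.mkdtemp(), "shopping_list.csv")
with open(path, "w", newline="") as f:
    f.write(
        "Item,Quantity,Price per item\n"
        "Bag of carrots,3,$3\n"
        "Box of cookies,2,$4\n"
        "Brie Cheese,1,$4\n"
    )

total_cost = 0
with open(path, newline="") as f:   # the file is closed when this block ends
    reader = csv.DictReader(f)
    headers = reader.fieldnames     # column names taken from the first row
    for line in reader:
        total_cost += int(line["Quantity"]) * int(line["Price per item"][1:])

print(headers)
print("Total cost is: ${}".format(total_cost))
```

Note that all the reading has to happen inside the `with` block, because the reader can't pull more rows from a file that has been closed.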


It's up to you which version you want to use - whatever you're more comfortable with and you think looks the best.