As we've often discussed in the course, computers are dumb. I say this, writing weeks before the course actually meets, confident that we will say it at least once. Computers are very bad at inferring things, and your data needs to be clearly structured before you can work with it properly. Because of these difficulties, it is no accident that a number of best practices have emerged for working with data. One of the most common formats for storing data that you will interact with is the Comma-Separated Values (CSV) file. We will spend just a little time working with this file type today, as this kind of data cleaning is the basis for a lot of the work you are likely to do with Python.

A CSV file is, just as it sounds, a series of data fields separated by commas, often with a row of headers at the top. Each line holds one record:
```
id,last_name, first_name, cool?
0,reed,ethan,so cool
1,walsh,brandon,the coolest
```
And so on. By keeping careful track of our data in this way we are essentially creating a spreadsheet. This is good, because computers are quite good at reading spreadsheets.

First, we'll pull in the CSV file that we need:

In [None]:
import csv
with open('csvs/basic.csv', 'r', newline='') as csvfile:
    our_reader = csv.reader(csvfile)
    data = [row for row in our_reader]
data

What I've done above is open the CSV file. Then, using the csv.reader function, we walk over the CSV file to see what is inside. The fourth line above is where we actually construct the data in a way we can work with: we walk over every row in the table and smash those rows into a list called 'data'. At the end, data is a list of lists, and each sublist contains one row of the CSV file, with every value stored as a string. We can use list indexing to explore parts of the data. We might be interested in the second row, which is the first row of actual data because the headers sit at index 0. Indexing once will give you the row you're interested in:

In [7]:
data[1]

['0', 'reed', 'ethan', 'so cool']

Indexing twice will let you select first the row and then the column.

In [8]:
data[1][2]

'ethan'
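
Remembering that column 2 holds the first name gets awkward quickly. As a hedged alternative, the standard library's `csv.DictReader` lets you look values up by header name instead of by position. The sketch below uses an in-memory copy of the sample rows (the `sample` variable is a stand-in for `csvs/basic.csv`, not the real file) and assumes the header has a space after some commas, as in the example at the top of this lesson.

```python
import csv
import io

# Stand-in for csvs/basic.csv, built from the sample rows shown earlier.
sample = io.StringIO(
    "id,last_name, first_name, cool?\n"
    "0,reed,ethan,so cool\n"
    "1,walsh,brandon,the coolest\n"
)

# skipinitialspace=True drops the space that follows a comma, so the keys
# come out as 'first_name' and 'cool?' rather than ' first_name' and ' cool?'.
reader = csv.DictReader(sample, skipinitialspace=True)
records = [dict(row) for row in reader]

print(records[0]["first_name"])  # ethan
print(records[1]["cool?"])       # the coolest
```

Each record is now a dictionary, so `records[0]["last_name"]` reads more clearly than `data[1][1]`, at the cost of a slightly heavier data structure.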

Since these are just lists, we can do anything to them that we might want to do to lists.

In [24]:
print(len(data))  # number of rows, header included

# find the length of each first name, skipping the header row at data[0]
for row in data[1:]:
    print(len(row[2]))

# find the longest first name
longest = ""
for row in data[1:]:
    if len(row[2]) > len(longest):
        longest = row[2]
print(longest)

# construct a new list consisting of only the first names we have here.
first_names = [row[2] for row in data[1:]]
first_names.reverse()
print(first_names)

3
5
7
brandon
['brandon', 'ethan']
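
Explicit loops like the ones above can often be swapped for Python built-ins. Here is a small sketch, assuming the same three rows that `csv.reader` produced (typed out by hand so the example runs on its own): it separates the header from the records and then uses `max` with `key=len` to find the longest first name.

```python
# The rows from the CSV, written out so this example is self-contained.
data = [
    ["id", "last_name", " first_name", " cool?"],
    ["0", "reed", "ethan", "so cool"],
    ["1", "walsh", "brandon", "the coolest"],
]

# Split the header off from the records so it is never counted as data.
header, rows = data[0], data[1:]

first_names = [row[2] for row in rows]
longest = max(first_names, key=len)  # the longest string in the list

print(first_names)  # ['ethan', 'brandon']
print(longest)      # brandon
```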


Since our CSV is just a list of lists, we could add to it by adding another row. And that's as easy as adding a new list:

In [25]:
new_row = [2,'wayne','graham','meh']
data.append(new_row)
data

[['id', 'last_name', ' first_name', ' cool?'],
 ['0', 'reed', 'ethan', 'so cool'],
 ['1', 'walsh', 'brandon', 'the coolest'],
 [2, 'wayne', 'graham', 'meh']]
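
Keep in mind that appending to `data` only changes the list in memory; the file on disk stays as it was. A minimal sketch of writing rows back out with `csv.writer` follows. It writes to an in-memory buffer (`io.StringIO`) rather than a real file, and the filename in the comment is hypothetical.

```python
import csv
import io

data = [
    ["id", "last_name", " first_name", " cool?"],
    ["0", "reed", "ethan", "so cool"],
    ["1", "walsh", "brandon", "the coolest"],
    [2, "wayne", "graham", "meh"],
]

# Stand-in for something like open('some_output.csv', 'w', newline='');
# that filename is only an illustration.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(data)  # one CSV line per inner list

print(buffer.getvalue())
```

Notice that the integer `2` is written out as plain text, so when the file is read back in it will come back as the string `'2'`, just like the other ids.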

We could go on and on, adding to our CSV one row at a time. Let's try something else.

In [29]:
a_row = [3,'fox','eliza','SO COOL']
data + a_row

[['id', 'last_name', ' first_name', ' cool?'],
 ['0', 'reed', 'ethan', 'so cool'],
 ['1', 'walsh', 'brandon', 'the coolest'],
 [2, 'wayne', 'graham', 'meh'],
 3,
 'fox',
 'eliza',
 'SO COOL']

What happened? Take a look and see if you can tell.

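When we used + to add our new list to our list of lists, Python concatenated the two lists: rather than adding a_row as a single new row, it added each of its items to the outer list one at a time. You can tell because the brackets around the new row disappear and we get four loose items at the end instead of one row. Note too that + returns a new list and leaves data itself unchanged; data.append(a_row) or data + [a_row] would keep the row intact.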
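
To make the difference concrete, here is a small sketch with a shortened one-row version of `data` (the variable names `flattened` and `nested` are just for illustration):

```python
data = [["0", "reed", "ethan", "so cool"]]
a_row = [3, "fox", "eliza", "SO COOL"]

flattened = data + a_row    # concatenation: a_row's four items join the outer list
nested = data + [a_row]     # wrap the row in a list first, and it stays whole

print(len(flattened))  # 5
print(len(nested))     # 2
print(len(data))       # 1, because + builds a new list and leaves data alone
```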

We're starting to get to the point where we could use some more sophisticated ways of working with data. You might have noticed that I'm doing a LOT of looping. There are sometimes easier ways to work with your data than this. The way we're interacting with this CSV as a list of lists is really slow. There are other data structures that let you suck in a CSV and, say, quickly get a particular column without having to loop over it first. If you're interested in learning better ways of interacting with data, there's a [great book on using Python for data science](https://github.com/jakevdp/PythonDataScienceHandbook) that I can't recommend enough. We'll just touch on a few elements of this book. Let's import a new library and then read the data in again using the pandas library.

In [31]:
import pandas
our_csv_through_pandas = pandas.read_csv('csvs/basic.csv')
our_csv_through_pandas

   id  last_name   first_name        cool?
0   0       reed        ethan      so cool
1   1      walsh      brandon  the coolest


Whoa, look at that! It spat the table out for us! Pandas is _powerful_. For one, it indexes our data without us having to track individual ids for things. Here are a few things you can do with it.

In [33]:
our_csv_through_pandas.columns

Index(['id', 'last_name', ' first_name', ' cool?'], dtype='object')

In [35]:
our_csv_through_pandas.first_name

AttributeError: 'DataFrame' object has no attribute 'first_name'
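
The error comes from the header line. Because the file has a space after some of its commas, pandas names the column `' first_name'` (with a leading space), as the `.columns` output shows, and attribute access looks for exactly `first_name`. Below is a hedged sketch of three ways around this, using an in-memory copy of the sample rows rather than the real file: bracket notation with the exact name, the `skipinitialspace=True` option of `read_csv`, or stripping the column names after loading.

```python
import io
import pandas

# Stand-in for csvs/basic.csv, built from the sample rows shown earlier.
sample = (
    "id,last_name, first_name, cool?\n"
    "0,reed,ethan,so cool\n"
    "1,walsh,brandon,the coolest\n"
)

# Option 1: use the exact column name, leading space included.
raw = pandas.read_csv(io.StringIO(sample))
print(raw[" first_name"])

# Option 2: ask pandas to skip the space that follows each comma.
clean = pandas.read_csv(io.StringIO(sample), skipinitialspace=True)
print(clean.first_name)

# Option 3: strip whitespace from the column names after loading.
raw.columns = raw.columns.str.strip()
print(list(raw.columns))  # ['id', 'last_name', 'first_name', 'cool?']
```

Even after cleaning, a column such as `cool?` still needs bracket notation (`clean["cool?"]`), because `?` cannot appear in a Python attribute name.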

To do: write to CSV, read in longer data from a CSV as an example, and add some exercises for each lesson.