### Introduction to pandas!

What we will learn in this notebook:
- how to open and read in a CSV spreadsheet
- how to look at the data we have
- how to select columns
- how to do some math with them


First we need to import the pandas library. The `as pd` part tells Python to refer to it by the shorter name `pd`:

In [1]:
import pandas as pd

### Reading spreadsheets

This is how you read a CSV spreadsheet into a DataFrame (the pandas table object) and assign it to a variable:

In [2]:
census_data = pd.read_csv('../data/2016_census_data.csv')
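
As a hedged aside, `pd.read_csv` also accepts a file-like object, so you can try the same call without the file on disk. The sketch below uses a three-row sample with only four of the columns; the values are copied from the census table shown in this notebook, and the names `sample_csv` and `sample` are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical three-row sample (not the real file), with values taken
# from the first rows of the census table in this notebook.
sample_csv = """geoid,name,total_population,black_alone
34003001000,"Census Tract 10, Bergen County, New Jersey",6767,75.0
34003002100,"Census Tract 21, Bergen County, New Jersey",1522,141.0
34003002200,"Census Tract 22, Bergen County, New Jersey",5389,99.0
"""

# read_csv works on file paths and on file-like objects such as StringIO.
# The quotes around each name keep its commas inside a single column.
sample = pd.read_csv(io.StringIO(sample_csv))
print(sample.shape)
```

The first line of the text becomes the column names, and every following line becomes one row.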

### Looking at your data

To look at the data you just read into Python, run a cell that contains only the name of the variable:

In [3]:
census_data

Unnamed: 0,geoid,name,total_population,median_income,median_home_value,educational_attainment,white_alone,black_alone,native,asian,native_hawaiian_pacific_islander,some_other_race_alone,two_or_more,hispanic_or_latino
0,34003001000,"Census Tract 10, Bergen County, New Jersey",6767,151641.0,680000.0,3045.0,5667.0,75.0,0.0,759.0,0.0,0.0,132.0,134.0
1,34003002100,"Census Tract 21, Bergen County, New Jersey",1522,114545.0,2000001.0,836.0,788.0,141.0,0.0,444.0,0.0,0.0,27.0,122.0
2,34003002200,"Census Tract 22, Bergen County, New Jersey",5389,90647.0,453800.0,1791.0,3481.0,99.0,9.0,1247.0,0.0,36.0,19.0,504.0
3,34003002300,"Census Tract 23, Bergen County, New Jersey",5828,112031.0,610000.0,2363.0,3595.0,89.0,37.0,1627.0,0.0,0.0,32.0,448.0
4,34003003100,"Census Tract 31, Bergen County, New Jersey",4946,76906.0,301900.0,1588.0,1803.0,306.0,0.0,1435.0,0.0,13.0,24.0,1365.0
5,34003003200,"Census Tract 32, Bergen County, New Jersey",5044,69531.0,322400.0,1417.0,1342.0,186.0,19.0,1882.0,0.0,6.0,64.0,1564.0
6,34003003300,"Census Tract 33, Bergen County, New Jersey",6638,97957.0,328100.0,1737.0,2437.0,400.0,0.0,2131.0,0.0,0.0,148.0,1522.0
7,34003003401,"Census Tract 34.01, Bergen County, New Jersey",2958,122650.0,385200.0,941.0,1704.0,109.0,0.0,520.0,0.0,0.0,36.0,589.0
8,34003003402,"Census Tract 34.02, Bergen County, New Jersey",3827,105776.0,356100.0,1237.0,1937.0,260.0,0.0,733.0,0.0,4.0,122.0,771.0
9,34003003500,"Census Tract 35, Bergen County, New Jersey",4100,52382.0,340200.0,891.0,886.0,502.0,16.0,1160.0,0.0,0.0,59.0,1493.0


Oops, that's a little long. Maybe we just want to see the first 10 rows:

In [4]:
census_data.head(10)

Unnamed: 0,geoid,name,total_population,median_income,median_home_value,educational_attainment,white_alone,black_alone,native,asian,native_hawaiian_pacific_islander,some_other_race_alone,two_or_more,hispanic_or_latino
0,34003001000,"Census Tract 10, Bergen County, New Jersey",6767,151641.0,680000.0,3045.0,5667.0,75.0,0.0,759.0,0.0,0.0,132.0,134.0
1,34003002100,"Census Tract 21, Bergen County, New Jersey",1522,114545.0,2000001.0,836.0,788.0,141.0,0.0,444.0,0.0,0.0,27.0,122.0
2,34003002200,"Census Tract 22, Bergen County, New Jersey",5389,90647.0,453800.0,1791.0,3481.0,99.0,9.0,1247.0,0.0,36.0,19.0,504.0
3,34003002300,"Census Tract 23, Bergen County, New Jersey",5828,112031.0,610000.0,2363.0,3595.0,89.0,37.0,1627.0,0.0,0.0,32.0,448.0
4,34003003100,"Census Tract 31, Bergen County, New Jersey",4946,76906.0,301900.0,1588.0,1803.0,306.0,0.0,1435.0,0.0,13.0,24.0,1365.0
5,34003003200,"Census Tract 32, Bergen County, New Jersey",5044,69531.0,322400.0,1417.0,1342.0,186.0,19.0,1882.0,0.0,6.0,64.0,1564.0
6,34003003300,"Census Tract 33, Bergen County, New Jersey",6638,97957.0,328100.0,1737.0,2437.0,400.0,0.0,2131.0,0.0,0.0,148.0,1522.0
7,34003003401,"Census Tract 34.01, Bergen County, New Jersey",2958,122650.0,385200.0,941.0,1704.0,109.0,0.0,520.0,0.0,0.0,36.0,589.0
8,34003003402,"Census Tract 34.02, Bergen County, New Jersey",3827,105776.0,356100.0,1237.0,1937.0,260.0,0.0,733.0,0.0,4.0,122.0,771.0
9,34003003500,"Census Tract 35, Bergen County, New Jersey",4100,52382.0,340200.0,891.0,886.0,502.0,16.0,1160.0,0.0,0.0,59.0,1493.0


Or the last four:

In [5]:
census_data.tail(4)

Unnamed: 0,geoid,name,total_population,median_income,median_home_value,educational_attainment,white_alone,black_alone,native,asian,native_hawaiian_pacific_islander,some_other_race_alone,two_or_more,hispanic_or_latino
4696,42103950702,"Census Tract 9507.02, Pike County, Pennsylvania",3119,59239.0,151100.0,405.0,2908.0,44.0,0.0,60.0,0.0,0.0,19.0,88.0
4697,42103950801,"Census Tract 9508.01, Pike County, Pennsylvania",4403,55530.0,120000.0,718.0,2777.0,705.0,0.0,53.0,0.0,0.0,97.0,771.0
4698,42103950802,"Census Tract 9508.02, Pike County, Pennsylvania",6004,50724.0,146700.0,795.0,3072.0,970.0,20.0,44.0,0.0,0.0,11.0,1887.0
4699,42103950900,"Census Tract 9509, Pike County, Pennsylvania",4184,49453.0,146100.0,721.0,3888.0,55.0,29.0,22.0,0.0,0.0,9.0,181.0
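
A small sketch of how the row counts work, assuming a made-up seven-row table called `sample` (the values are the first seven `black_alone` numbers from the table above). When you leave the number out, `head()` and `tail()` both return five rows.

```python
import pandas as pd

# Hypothetical seven-row table for illustration only
sample = pd.DataFrame({'black_alone': [75.0, 141.0, 99.0, 89.0, 306.0, 186.0, 400.0]})

first_five = sample.head()    # no argument: the first 5 rows
first_two = sample.head(2)    # the first 2 rows
last_three = sample.tail(3)   # the last 3 rows

print(len(first_five), len(first_two), len(last_three))
```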


Or maybe we want to know the number of rows in the entire dataset:

In [6]:
len(census_data)

4700
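
If you also want the number of columns, the `shape` attribute gives rows and columns together. This is a minimal sketch on a made-up three-row table; `sample`, `n_rows`, and `n_cols` are illustrative names, not part of the notebook's data.

```python
import pandas as pd

# Hypothetical small table for illustration only
sample = pd.DataFrame({
    'total_population': [6767, 1522, 5389],
    'black_alone': [75.0, 141.0, 99.0],
})

n_rows = len(sample)                 # number of rows only
shape_rows, n_cols = sample.shape    # (rows, columns) as a tuple
column_list = list(sample.columns)   # the column names as a plain list

print(n_rows, n_cols, column_list)
```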

How about selecting columns? This is how you can do that:

In [7]:
census_data['black_alone']

0        75.0
1       141.0
2        99.0
3        89.0
4       306.0
5       186.0
6       400.0
7       109.0
8       260.0
9       502.0
10      145.0
11      462.0
12        0.0
13      124.0
14       68.0
15      145.0
16      297.0
17       28.0
18       86.0
19       64.0
20       11.0
21       36.0
22       95.0
23      146.0
24      271.0
25      208.0
26      471.0
27      150.0
28      274.0
29      282.0
        ...  
4670    206.0
4671     18.0
4672     17.0
4673    151.0
4674      2.0
4675      0.0
4676    103.0
4677    549.0
4678    809.0
4679    465.0
4680     97.0
4681    324.0
4682      7.0
4683     10.0
4684     14.0
4685    362.0
4686    107.0
4687     28.0
4688      0.0
4689     81.0
4690    237.0
4691     33.0
4692     16.0
4693    333.0
4694     71.0
4695     12.0
4696     44.0
4697    705.0
4698    970.0
4699     55.0
Name: black_alone, Length: 4700, dtype: float64

You can also select multiple columns:

In [8]:
column_names = ['black_alone', 'native']
census_data[column_names]

Unnamed: 0,black_alone,native
0,75.0,0.0
1,141.0,0.0
2,99.0,9.0
3,89.0,37.0
4,306.0,0.0
5,186.0,19.0
6,400.0,0.0
7,109.0,0.0
8,260.0,0.0
9,502.0,16.0
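
One detail worth knowing, sketched here on a made-up three-row table named `sample`: selecting with a single column name gives you a Series (one column of values), while selecting with a list of names gives you a DataFrame (a smaller table), even if the list holds only one name.

```python
import pandas as pd

# Hypothetical small table for illustration only
sample = pd.DataFrame({
    'total_population': [6767, 1522, 5389],
    'black_alone': [75.0, 141.0, 99.0],
    'native': [0.0, 0.0, 9.0],
})

one_column = sample['black_alone']                # a Series
two_columns = sample[['black_alone', 'native']]   # a DataFrame
still_a_table = sample[['black_alone']]           # a DataFrame with one column

print(type(one_column).__name__, type(two_columns).__name__)
```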


### Doing math with your data
There are a few nifty functions you can apply to your data columns, such as the sum, the median, and the mean:

In [9]:
census_data['black_alone'].sum()

3155672.0

In [10]:
census_data['black_alone'].median()

198.5

In [11]:
census_data['black_alone'].mean()

671.4195744680851

There is also the nifty `describe()` function, which gives you a quick statistical overview of a column:

In [12]:
census_data['black_alone'].describe()

count     4700.000000
mean       671.419574
std       1041.216791
min          0.000000
25%         47.000000
50%        198.500000
75%        889.500000
max      17123.000000
Name: black_alone, dtype: float64
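
To check these functions by hand, it helps to try them on a few numbers first. The sketch below uses a made-up Series called `values` holding the first five `black_alone` numbers from the table.

```python
import pandas as pd

# Hypothetical five-value column for illustration only
values = pd.Series([75.0, 141.0, 99.0, 89.0, 306.0], name='black_alone')

total = values.sum()        # 75 + 141 + 99 + 89 + 306 = 710.0
middle = values.median()    # middle value once sorted: 99.0
average = values.mean()     # 710 / 5 = 142.0
overview = values.describe()

print(total, middle, average)
print(overview)
```

The median (99) sits well below the mean (142) here because one large value pulls the mean up. The full dataset shows the same pattern: a median of 198.5 against a mean of about 671.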

### Making new data columns
You can make a new column from two existing columns like so:

In [13]:
census_data['black_alone_percentage'] = census_data['black_alone'] / census_data['total_population']

In [14]:
census_data.head()

Unnamed: 0,geoid,name,total_population,median_income,median_home_value,educational_attainment,white_alone,black_alone,native,asian,native_hawaiian_pacific_islander,some_other_race_alone,two_or_more,hispanic_or_latino,black_alone_percentage
0,34003001000,"Census Tract 10, Bergen County, New Jersey",6767,151641.0,680000.0,3045.0,5667.0,75.0,0.0,759.0,0.0,0.0,132.0,134.0,0.011083
1,34003002100,"Census Tract 21, Bergen County, New Jersey",1522,114545.0,2000001.0,836.0,788.0,141.0,0.0,444.0,0.0,0.0,27.0,122.0,0.092641
2,34003002200,"Census Tract 22, Bergen County, New Jersey",5389,90647.0,453800.0,1791.0,3481.0,99.0,9.0,1247.0,0.0,36.0,19.0,504.0,0.018371
3,34003002300,"Census Tract 23, Bergen County, New Jersey",5828,112031.0,610000.0,2363.0,3595.0,89.0,37.0,1627.0,0.0,0.0,32.0,448.0,0.015271
4,34003003100,"Census Tract 31, Bergen County, New Jersey",4946,76906.0,301900.0,1588.0,1803.0,306.0,0.0,1435.0,0.0,13.0,24.0,1365.0,0.061868


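The new column currently holds fractions between 0 and 1 rather than percentages. To overwrite a column, assign new values to it, the way you would with any variable. Here we multiply by 100 to get actual percentages: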

In [15]:
census_data['black_alone_percentage'] = (census_data['black_alone'] / census_data['total_population']) * 100

In [16]:
census_data.head()

Unnamed: 0,geoid,name,total_population,median_income,median_home_value,educational_attainment,white_alone,black_alone,native,asian,native_hawaiian_pacific_islander,some_other_race_alone,two_or_more,hispanic_or_latino,black_alone_percentage
0,34003001000,"Census Tract 10, Bergen County, New Jersey",6767,151641.0,680000.0,3045.0,5667.0,75.0,0.0,759.0,0.0,0.0,132.0,134.0,1.10832
1,34003002100,"Census Tract 21, Bergen County, New Jersey",1522,114545.0,2000001.0,836.0,788.0,141.0,0.0,444.0,0.0,0.0,27.0,122.0,9.264126
2,34003002200,"Census Tract 22, Bergen County, New Jersey",5389,90647.0,453800.0,1791.0,3481.0,99.0,9.0,1247.0,0.0,36.0,19.0,504.0,1.837076
3,34003002300,"Census Tract 23, Bergen County, New Jersey",5828,112031.0,610000.0,2363.0,3595.0,89.0,37.0,1627.0,0.0,0.0,32.0,448.0,1.527111
4,34003003100,"Census Tract 31, Bergen County, New Jersey",4946,76906.0,301900.0,1588.0,1803.0,306.0,0.0,1435.0,0.0,13.0,24.0,1365.0,6.186818
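
The arithmetic happens row by row: each value in one column is divided by the value in the same row of the other column. Here is a minimal sketch on a made-up three-row table named `sample`, using the first three tracts from the table; the rounding step is an optional extra, not something the notebook does.

```python
import pandas as pd

# Hypothetical small table for illustration only
sample = pd.DataFrame({
    'total_population': [6767, 1522, 5389],
    'black_alone': [75.0, 141.0, 99.0],
})

# Row-by-row division, then scaling to a percentage
sample['black_alone_percentage'] = (sample['black_alone'] / sample['total_population']) * 100

# Optional: round to two decimals to make the numbers easier to read
rounded = sample['black_alone_percentage'].round(2)
print(rounded.tolist())
```

For the first tract, that is 75 / 6767 * 100, or about 1.11 percent, which matches the value in the table above.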


A quick sorting function can now help you find the census tracts with the highest or lowest percentage of Black residents (we can go over this again next week):

In [17]:
census_data.sort_values(by='black_alone_percentage', ascending=False)

Unnamed: 0,geoid,name,total_population,median_income,median_home_value,educational_attainment,white_alone,black_alone,native,asian,native_hawaiian_pacific_islander,some_other_race_alone,two_or_more,hispanic_or_latino,black_alone_percentage
2485,36047085200,"Census Tract 852, Kings County, New York",8,-666666666.0,-666666666.0,0.0,0.0,8.0,0.0,0.0,0.0,0.0,0.0,0.0,100.000000
3185,36061029700,"Census Tract 297, New York County, New York",18,-666666666.0,-666666666.0,0.0,0.0,18.0,0.0,0.0,0.0,0.0,0.0,0.0,100.000000
3699,36081061000,"Census Tract 610, Queens County, New York",1354,76818.0,407500.0,308.0,11.0,1324.0,0.0,0.0,0.0,0.0,19.0,0.0,97.784343
216,34013004600,"Census Tract 46, Essex County, New Jersey",2656,26369.0,227200.0,258.0,0.0,2585.0,0.0,0.0,0.0,49.0,22.0,0.0,97.326807
3700,36081061200,"Census Tract 612, Queens County, New York",1747,117361.0,423600.0,410.0,0.0,1682.0,0.0,0.0,0.0,18.0,19.0,28.0,96.279336
215,34013004500,"Census Tract 45, Essex County, New Jersey",3115,38512.0,211700.0,296.0,29.0,2996.0,0.0,33.0,0.0,0.0,14.0,43.0,96.179775
2541,36047098400,"Census Tract 984, Kings County, New York",2143,67961.0,431600.0,433.0,68.0,2061.0,0.0,14.0,0.0,0.0,0.0,0.0,96.173588
217,34013004700,"Census Tract 47, Essex County, New Jersey",5508,45224.0,232000.0,872.0,27.0,5284.0,0.0,72.0,0.0,0.0,64.0,61.0,95.933188
2522,36047093200,"Census Tract 932, Kings County, New York",1256,86563.0,446400.0,296.0,18.0,1204.0,0.0,0.0,0.0,0.0,0.0,34.0,95.859873
2483,36047084800,"Census Tract 848, Kings County, New York",1491,54120.0,432100.0,301.0,2.0,1425.0,0.0,34.0,0.0,0.0,8.0,22.0,95.573441
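
Two things are worth seeing on a smaller scale: `sort_values` returns a new, sorted table and leaves the original in its old order, and you can chain `head()` onto it to keep only the top few rows. The sketch below assumes a made-up five-row table named `sample`, built from the first five tracts in the data.

```python
import pandas as pd

# Hypothetical small table for illustration only
sample = pd.DataFrame({
    'total_population': [6767, 1522, 5389, 5828, 4946],
    'black_alone': [75.0, 141.0, 99.0, 89.0, 306.0],
})
sample['black_alone_percentage'] = (sample['black_alone'] / sample['total_population']) * 100

# Sort from highest to lowest percentage and keep the top three rows
top_three = sample.sort_values(by='black_alone_percentage', ascending=False).head(3)

print(top_three)
print(sample.index.tolist())  # the original table keeps its order
```

The numbers on the left of the sorted output are the original row labels, which is why they appear out of order after sorting.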
