### Introduction to pandas!

What we will learn in this notebook:
- how to open and read in a CSV spreadsheet
- how to look at the data we have
- how to select columns
- how to do some math with them


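First we need to import pandas. We import the library and tell Python to refer to it as `pd`. We also bring in a few other libraries (matplotlib, NumPy, seaborn and `datetime`) that are handy for charts, math and dates: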

In [1]:
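import pandas as pd
import matplotlib.pyplot as plot  # charts
import numpy                      # math on arrays of numbers
import seaborn                    # nicer-looking charts built on matplotlib
import datetime                   # working with dates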

### Reading spreadsheets

This is how you read a CSV file into a DataFrame and assign it to a variable. The voter list comes in three parts, so we read each one:

In [2]:
# Bring in all my spreadsheets 
voters1 = pd.read_csv('SC-List-Asof-10-15-18-Pt1.csv')
voters2 = pd.read_csv('SC-List-Asof-10-15-18-Pt2.csv')
voters3 = pd.read_csv('SC-List-Asof-10-15-18-Pt3.csv')


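Two things often come up with big CSV files like these: pandas can print a `DtypeWarning` when a column mixes numbers and text, and a single blank cell turns a whole column of whole numbers into decimals (one way an ID can end up looking like `80139849.0`). One option, sketched here on a few made-up rows rather than the real files, is to read ID columns as text with `dtype=` and pass `low_memory=False` so pandas looks at the whole file before deciding on types. The column names mirror the voter list; the values are invented.

```python
import io
import pandas as pd

# Made-up rows shaped like the voter list; the blank idnumber is a missing value.
csv_text = """SOS_VoterID,idnumber,voter_status,lastname
2140230000,80139849,S,DOE
1128803000,,A,ROE
"""

# Default read: the blank cell becomes NaN, which forces idnumber to floats.
default_read = pd.read_csv(io.StringIO(csv_text))
print(default_read["idnumber"].tolist())  # [80139849.0, nan]

# dtype=str keeps the ID columns as text; low_memory=False parses the file in one
# pass so pandas doesn't infer different types for different chunks of a big file.
as_text = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"SOS_VoterID": str, "idnumber": str},
    low_memory=False,
)
print(as_text["idnumber"].tolist())  # ['80139849', nan]
```

Reading IDs as text is a judgment call: they're labels rather than quantities, so you never need to do math on them, and text keeps them exactly as written.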

### Looking at your data

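To look at the data you just read into Python, you can run a cell with just the name of the variable. With a table this big, it's easier to peek at a slice: `.head(5)` shows the first five rows and `.tail(5)` shows the last five: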

In [3]:
voters1.head(5)

,SOS_VoterID,idnumber,voter_status,party_code,lastname,firstname,middlename,namesuffix,streetnumber,streetbuilding,...,party_code40,registration_date,party_affiliation_date,last_activity_date,precinct_group,phone_number,ID_Compliant,Absentee_Category,Absentee_Category_Date,Ethnicity
0,2140230000.0,80139849,S,,ADAMS,DOMINIQUE,ALEXIS,,3400,,...,,10/10/17,10/10/17,2/8/18,,,Y,,,
1,1128803000.0,287,A,,ADAMS,WILLIE,RAY,,2801,,...,,1/4/76,,1/28/15,,,Y,,,
2,2140351000.0,80140000,A,,AGUILAR,EDGAR,IVAN,,1121,,...,,10/13/17,10/13/17,7/27/18,,,Y,,,
3,1129249000.0,242762,A,,AGUILAR,MARIA,CRUZ,,1121,,...,,3/22/01,,1/28/15,,,Y,,,
4,1182291000.0,80077967,A,,ALARCON,JAZMIN,,,269,,...,,1/24/12,,10/4/17,,,Y,,,


In [4]:
voters1.tail(5)

,SOS_VoterID,idnumber,voter_status,party_code,lastname,firstname,middlename,namesuffix,streetnumber,streetbuilding,...,party_code40,registration_date,party_affiliation_date,last_activity_date,precinct_group,phone_number,ID_Compliant,Absentee_Category,Absentee_Category_Date,Ethnicity
89191,2148931000.0,80149269,A,,BURGESS,ISIAHA,JAVONE,,12993,,...,,9/27/18,9/27/18,9/28/18,,,Y,,,
89192,1129061000.0,146819,A,,BURKHAM,MARLON,RANDALL,,15075,,...,,8/17/07,,3/10/15,,,Y,,,
89193,1177610000.0,80073294,A,,BURKS-JONES,BESSIE,LEE,,13652,,...,,3/30/11,,8/27/14,,,Y,,,
89194,1128479000.0,237499,A,,BURNETTE,SHERRY,KOTON,,7259,,...,,10/6/00,,1/28/15,,,Y,,,
89195,1129018000.0,150343,A,,BURNS,FAIRY,WARD,,13065,,...,,8/24/88,,11/10/14,,,Y,,,
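`head(n)` and `tail(n)` each return a new, smaller DataFrame holding the first or last `n` rows; with no argument they default to 5. The row labels on the left come along too, which is why the `tail` output above ends at row 89195. A small made-up frame (hypothetical names) shows the difference:

```python
import pandas as pd

# Hypothetical stand-in for a voter list with seven rows.
people = pd.DataFrame({
    "lastname": ["DOE", "ROE", "POE", "LOE", "MOE", "NOE", "KOE"],
    "voter_status": ["A", "S", "A", "A", "S", "A", "A"],
})

print(people.head())   # first 5 rows (5 is the default)
print(people.head(3))  # first 3 rows
print(people.tail(2))  # last 2 rows, still labelled 5 and 6
```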


In [12]:
# Find out how many rows one of the tables has
len(voters2)

18712
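`len()` counts rows, including rows where some cells are blank, and the voter list has plenty of blank cells. If you want to know how many values a single column actually has filled in, `.count()` skips the missing ones, and `.shape` gives rows and columns together. A sketch on a tiny hypothetical table:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row table with one blank middle name.
frame = pd.DataFrame({
    "lastname": ["DOE", "ROE", "POE"],
    "middlename": ["ANN", np.nan, "LEE"],
})

print(len(frame))                   # 3 -> rows, blanks included
print(frame.shape)                  # (3, 2) -> (rows, columns)
print(frame["middlename"].count())  # 2 -> only the filled-in values
```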

In [5]:
# Flip rows and columns, the pandas equivalent of Excel's transpose (.T is short for .transpose())
voters1.T

,0,1,2,3,4,5,6,7,8,9,...,89186,89187,89188,89189,89190,89191,89192,89193,89194,89195
SOS_VoterID,2.14023e+09,1.1288e+09,2.14035e+09,1.12925e+09,1.18229e+09,1.14622e+09,1.21249e+09,2.1444e+09,1.12855e+09,1.21572e+09,...,1.12834e+09,1.12857e+09,1.12857e+09,2.1362e+09,1.12891e+09,2.14893e+09,1.12906e+09,1.17761e+09,1.12848e+09,1.12902e+09
idnumber,80139849,287,80140000,242762,80077967,301628,80104869,80144417,192678,80107559,...,56531,222574,222576,80135295,6332,80149269,146819,80073294,237499,150343
voter_status,S,A,A,A,A,A,A,A,A,S,...,A,A,A,A,A,A,A,A,A,A
party_code,,,,,,,,,,,...,,,,,,,,,,
lastname,ADAMS,ADAMS,AGUILAR,AGUILAR,ALARCON,ALLEN,ALONSO,ALONSO,ALVARADO,ALVAREZ,...,BUESS,BUHRKUHL,BUHRKUHL,BUNTZ,BURCH,BURGESS,BURKHAM,BURKS-JONES,BURNETTE,BURNS
firstname,DOMINIQUE,WILLIE,EDGAR,MARIA,JAZMIN,SCOTT,SILVESTRE,YASMELI,JUAN,ARISTOTELES,...,FRANCES,CATHERINE,DAVID,JEREMY,LAFON,ISIAHA,MARLON,BESSIE,SHERRY,FAIRY
middlename,ALEXIS,RAY,IVAN,CRUZ,,W,,,C,,...,LILLIAN,DEBORAH,ROSSER,JACKSON,HELM,JAVONE,RANDALL,LEE,KOTON,WARD
namesuffix,,,,,,,JR,,,,...,,,,,,,,,,
streetnumber,3400,2801,1121,1121,269,5075,3400,3400,5375,712,...,13124,16218,16218,13127,16051,12993,15075,13652,7259,13065
streetbuilding,,,,,,,,,,,...,,,,,,,,,,
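Transposing turns each column name into a row label and each row into a column, which is handy when a wide table hides columns behind `...`. Here's the idea on a small made-up table:

```python
import pandas as pd

# Hypothetical wide table: 2 rows, 3 columns.
wide = pd.DataFrame({
    "lastname": ["DOE", "ROE"],
    "firstname": ["ANN", "BEN"],
    "voter_status": ["A", "S"],
})

tall = wide.T  # same as wide.transpose()
print(tall)
print(wide.shape, tall.shape)  # (2, 3) (3, 2)
```

On the real data, something like `voters1.head().T` transposes just the first five records, a lighter way to skim column names next to sample values than flipping the whole table.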


In [6]:
# Smooshing together all three parts of the voter list
# This is like copying and pasting the spreadsheets one under another, quickly
frames = [voters1, voters2, voters3]
voters2018 = pd.concat(frames, sort=False)

In [7]:
# Show me the top of my new dataframe
voters2018.head(5)

,SOS_VoterID,idnumber,voter_status,party_code,lastname,firstname,middlename,namesuffix,streetnumber,streetbuilding,...,JP3,Unnamed: 62,REP,16PR,REP.1,14GE,P.1,E.2,7/30/10,8/13/12
0,2140230000.0,80139849.0,S,,ADAMS,DOMINIQUE,ALEXIS,,3400.0,,...,,,,,,,,,,
1,1128803000.0,287.0,A,,ADAMS,WILLIE,RAY,,2801.0,,...,,,,,,,,,,
2,2140351000.0,80140000.0,A,,AGUILAR,EDGAR,IVAN,,1121.0,,...,,,,,,,,,,
3,1129249000.0,242762.0,A,,AGUILAR,MARIA,CRUZ,,1121.0,,...,,,,,,,,,,
4,1182291000.0,80077967.0,A,,ALARCON,JAZMIN,,,269.0,,...,,,,,,,,,,
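The combined table's header above ends with column names that look more like data than field names (`REP`, `16PR`, `7/30/10`), and `idnumber` and `streetnumber` now print as decimals (`287.0`). One possible explanation, which this notebook doesn't confirm, is that the three files don't share exactly the same header row. This sketch uses two small made-up frames (the mismatched `id_number` column is hypothetical) to show how to compare column names before concatenating, and why a mismatch turns whole numbers into floats:

```python
import pandas as pd

# Two hypothetical parts whose headers don't quite match ("idnumber" vs "id_number").
part_a = pd.DataFrame({"idnumber": [101, 102], "lastname": ["DOE", "ROE"]})
part_b = pd.DataFrame({"id_number": [103], "lastname": ["POE"]})

# Compare the column names before stacking.
only_in_a = set(part_a.columns) - set(part_b.columns)
only_in_b = set(part_b.columns) - set(part_a.columns)
print(only_in_a, only_in_b)  # {'idnumber'} {'id_number'}

# concat lines columns up by name; a column missing from one part is filled with NaN,
# which turns an integer column into floats (101 -> 101.0).
# ignore_index=True numbers the rows 0..n-1 instead of repeating each part's own index.
combined = pd.concat([part_a, part_b], ignore_index=True, sort=False)
print(combined)
print(combined["idnumber"].tolist())  # [101.0, 102.0, nan]
```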


In [8]:
# Tell me the length of my new DataFrame
# This is how many people are on the Smith County voter list
len(voters2018)

135626
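Because `pd.concat` only stacks rows, the total here should equal the three parts' lengths added together. If you also want to know which file each row came from, `concat` takes a `keys=` argument that labels each part. A sketch on three tiny hypothetical parts (the `pt1`/`pt2`/`pt3` labels are made up):

```python
import pandas as pd

# Three small hypothetical parts of one list.
pt1 = pd.DataFrame({"lastname": ["DOE", "ROE"]})
pt2 = pd.DataFrame({"lastname": ["POE"]})
pt3 = pd.DataFrame({"lastname": ["LOE", "MOE", "NOE"]})

# keys= labels each part; the labels become the outer level of the row index.
combined = pd.concat([pt1, pt2, pt3], keys=["pt1", "pt2", "pt3"])

print(len(combined))                     # 6 -> total rows, 2 + 1 + 3
print(combined.groupby(level=0).size())  # rows contributed by each part
```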


In [11]:
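df = voters2018  # already a DataFrame, so pd.DataFrame() isn't needed
# Save as one CSV; index=False skips the row numbers, which repeat across the three parts
df.to_csv('SC-Voter-List-FULL-2018.csv', index=False)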
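By default `to_csv` writes the row index as an extra, unnamed first column, and reading that file back gives you a column called `Unnamed: 0`. Passing `index=False` leaves it out. Here's a quick round trip on a tiny made-up frame, using an in-memory `io.StringIO` buffer in place of a real file:

```python
import io
import pandas as pd

frame = pd.DataFrame({"idnumber": [101, 102], "lastname": ["DOE", "ROE"]})

# Default: the row index is written as an unnamed first column...
with_index = io.StringIO()
frame.to_csv(with_index)
with_index.seek(0)
reread_with_index = pd.read_csv(with_index)
print(reread_with_index.columns.tolist())  # ['Unnamed: 0', 'idnumber', 'lastname']

# ...while index=False writes only the data columns.
without_index = io.StringIO()
frame.to_csv(without_index, index=False)
without_index.seek(0)
reread_without_index = pd.read_csv(without_index)
print(reread_without_index.columns.tolist())  # ['idnumber', 'lastname']
```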
