### Introduction to pandas!

What we will learn in this notebook:
- how to open and read in a CSV spreadsheet
- how to look at the data we have
- how to select columns
- how to do some math with them


First we need to import the pandas library. The `as pd` part tells Python to refer to it by the shorter name `pd`:

In [1]:
import pandas as pd
import matplotlib.pyplot as plot
import numpy 
import seaborn 
import datetime

### Reading spreadsheets

This is how you read a spreadsheet and assign it to a variable:

In [2]:
# Bring in all my spreadsheets 
ev_1022 = pd.read_csv('Voted-Early-10222018.csv')
ev_1023 = pd.read_csv('Voted-Early-10232018.csv')
ev_1024 = pd.read_csv('Voted-Early-10242018.csv')
ev_1025 = pd.read_csv('Voted-Early-10252018.csv')
ev_1026 = pd.read_csv('Voted-Early-10262018.csv')
ev_1027 = pd.read_csv('Voted-Early-10272018.csv')
ev_1029 = pd.read_csv('Voted-Early-10292018.csv')
ev_1030 = pd.read_csv('Voted-Early-10302018.csv')
ev_1031 = pd.read_csv('Voted-Early-10312018.csv')
ev_1101 = pd.read_csv('Voted-Early-11012018.csv')
ev_1102 = pd.read_csv('Voted-Early-11022018.csv')
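
Typing one `read_csv` line per file works, but the same job can be done with a loop. Here is a minimal sketch, assuming every daily file follows the `Voted-Early-MMDDYYYY.csv` naming pattern used above. The helper name `read_daily_files` and its `folder` argument are made up for illustration and are not part of pandas:

```python
from pathlib import Path

import pandas as pd


def read_daily_files(folder="."):
    """Read every Voted-Early-*.csv file in `folder` into a dict of dataframes.

    Hypothetical helper: the keys are the date part of each filename,
    e.g. '10222018' for 'Voted-Early-10222018.csv'.
    """
    tables = {}
    for path in sorted(Path(folder).glob("Voted-Early-*.csv")):
        date_part = path.stem.replace("Voted-Early-", "")
        tables[date_part] = pd.read_csv(path)
    return tables
```

With the files in the current folder, `tables = read_daily_files()` followed by `tables['10222018'].head(5)` would play the same role as `ev_1022.head(5)`.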

### Looking at your data

To look at the data you just read into Python, you can run a cell with just the name of the variable. For a big table, `.head(5)` shows only the first five rows and `.tail(5)` shows the last five:

In [3]:
ev_1022.head(5)

Unnamed: 0,ID,CERT,LASTNAME,FIRSTNAME,MIDDLENAME,SUFFIX,ADDRESS,CHECK-IN,PRECINCT,SITE,BALLOT STYLE,PARTY,POLLWORKER,CATEGORY
0,1205658124,1205658124,ABBEY,IAN,ISSARA,,"5872 OLD JACKSONVILLE HWY #921, TYLER 75703",10/22/18 17:03,38.01,MAIN,25,NP,General Pollworker 2,Electronic
1,1057082649,1057082649,ABEL,PAULA,ROSE,,"2051 W CUMBERLAND RD #1103, TYLER 75703",10/22/18 12:52,71.01,HERITAGE BUILDING,25,NP,General Pollworker 3,Electronic
2,1129020391,1129020391,ABRAHAM,CAROL,ANN,,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:47,25.01,HERITAGE BUILDING,20,NP,General Pollworker 1,Electronic
3,1129100532,1129100532,ABRAHAM,EUGENE,,JR,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:49,25.01,HERITAGE BUILDING,20,NP,General Pollworker 1,Electronic
4,1128947343,1128947343,ACKER,RICHARD,BRYAN,,"415 W RIECK RD #220, TYLER 75703",10/22/18 10:07,54.01,MAIN,25,NP,General Pollworker 3,Electronic


In [4]:
ev_1022.tail(5)

Unnamed: 0,ID,CERT,LASTNAME,FIRSTNAME,MIDDLENAME,SUFFIX,ADDRESS,CHECK-IN,PRECINCT,SITE,BALLOT STYLE,PARTY,POLLWORKER,CATEGORY
4958,2002308280,2002308280,ZEMER,CHRISTOPER,FREDRICK,,"1497 FM 724, TYLER 75704",10/22/18 14:27,44.03,HERITAGE BUILDING,25,NP,General Pollworker 1,Electronic
4959,1128395421,1128395421,ZEPPA,PRISCILLA,PRATT,,"3502 MATT LN, TYLER 75701",10/22/18 11:51,50.01,WHITEHOUSE MUNICIPAL COURT,25,NP,General Pollworker 1,Electronic
4960,1016714491,1016714491,ZETTEL,CONNIE,HUBBARD,,"1659 FROSTWOOD DR, TYLER 75703",10/22/18 14:11,40.07,NOONDAY COMMUNITY CENTER,25,NP,General Pollworker 3,Electronic
4961,2145876345,2145876345,ZIEGELBAUER,JUANITA,ELAINE,,"104 LEGENDS CT #C, LINDALE 75771",10/22/18 8:44,5.16,NOONDAY COMMUNITY CENTER,4,NP,General Pollworker 2,Electronic
4962,1128892628,1128892628,ZORN,LAURA,ELLEN,,"1202 RICE RD #110, TYLER 75703",10/22/18 15:59,54.01,MAIN,25,NP,General Pollworker 3,Electronic


In [5]:
# Find out the length of one of the tables (i.e. how many voters that day)
len(ev_1022)

4963
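
`len()` gives the number of rows only. As a small aside, `.shape` reports rows and columns together. The tiny table below is invented sample data, not the real voter file:

```python
import pandas as pd

# Invented stand-in for one day's table
sample = pd.DataFrame({
    "LASTNAME": ["ABBEY", "ABEL", "ABRAHAM"],
    "SITE": ["MAIN", "HERITAGE BUILDING", "HERITAGE BUILDING"],
})

n_rows, n_cols = sample.shape   # (number of rows, number of columns)
print(n_rows, n_cols)           # 3 2
print(len(sample) == n_rows)    # True: len() matches the first number in .shape
```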

In [6]:
# The pandas equivalent of Excel's transpose 
ev_1022.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,4953,4954,4955,4956,4957,4958,4959,4960,4961,4962
ID,1205658124,1057082649,1129020391,1129100532,1128947343,1128228140,1129122969,1128228164,1128110606,1128688676,...,1174368370,1011626073,1130376468,1130344690,1027911363,2002308280,1128395421,1016714491,2145876345,1128892628
CERT,1205658124,1057082649,1129020391,1129100532,1128947343,1128228140,1129122969,1128228164,1128110606,1128688676,...,1174368370,1011626073,1130376468,1130344690,1027911363,2002308280,1128395421,1016714491,2145876345,1128892628
LASTNAME,ABBEY,ABEL,ABRAHAM,ABRAHAM,ACKER,ACY,ACY,ADAIR,ADAIR,ADAMS,...,ZAVALA,ZEID,ZEINERT,ZEINERT,ZELLER,ZEMER,ZEPPA,ZETTEL,ZIEGELBAUER,ZORN
FIRSTNAME,IAN,PAULA,CAROL,EUGENE,RICHARD,KAREY,ROBERT,FAVIAN,MARY,ANDREW,...,CARLOS,YASSER,DEBORAH,NEIL,DELANNE,CHRISTOPER,PRISCILLA,CONNIE,JUANITA,LAURA
MIDDLENAME,ISSARA,ROSE,ANN,,BRYAN,DON,C,MURELL,JOANN,JAMES,...,JAIME,FAHMY,KAY,JACK,,FREDRICK,PRATT,HUBBARD,ELAINE,ELLEN
SUFFIX,,,,JR,,,,JR,,,...,JR,,,,,,,,,
ADDRESS,"5872 OLD JACKSONVILLE HWY #921, TYLER 75703","2051 W CUMBERLAND RD #1103, TYLER 75703","5212 GLEN ABBEY LN, TYLER 75703","5212 GLEN ABBEY LN, TYLER 75703","415 W RIECK RD #220, TYLER 75703","10215 COUNTY ROAD 41, LINDALE 75771","20402 COUNTY ROAD 4114, LINDALE 75771","206 CLEMSON DR, TYLER 75703","206 CLEMSON DR, TYLER 75703","303 HARMONY LN, HIDEAWAY 75771",...,"7373 FLAT ROCK LN, TYLER 75703","763 HAMPTON HILL DR, TYLER 75703","9354 SAINT PATRICK PL, TYLER 75703","9354 SAINT PATRICK PL, TYLER 75703","3729 WOODS BLVD, TYLER 75707","1497 FM 724, TYLER 75704","3502 MATT LN, TYLER 75701","1659 FROSTWOOD DR, TYLER 75703","104 LEGENDS CT #C, LINDALE 75771","1202 RICE RD #110, TYLER 75703"
CHECK-IN,10/22/18 17:03,10/22/18 12:52,10/22/18 11:47,10/22/18 11:49,10/22/18 10:07,10/22/18 8:22,10/22/18 8:49,10/22/18 9:33,10/22/18 9:32,10/22/18 16:56,...,10/22/18 12:48,10/22/18 12:39,10/22/18 11:16,10/22/18 11:16,10/22/18 17:04,10/22/18 14:27,10/22/18 11:51,10/22/18 14:11,10/22/18 8:44,10/22/18 15:59
PRECINCT,38.01,71.01,25.01,25.01,54.01,14.09,5.12,59.01,59.01,15.01,...,71.02,59.01,71.02,71.02,56.01,44.03,50.01,40.07,5.16,54.01
SITE,MAIN,HERITAGE BUILDING,HERITAGE BUILDING,HERITAGE BUILDING,MAIN,NOONDAY COMMUNITY CENTER,NOONDAY COMMUNITY CENTER,WHITEHOUSE MUNICIPAL COURT,WHITEHOUSE MUNICIPAL COURT,EV LINDALE PUBLIC LIBRARY,...,MAIN,MAIN,NOONDAY COMMUNITY CENTER,NOONDAY COMMUNITY CENTER,MAIN,HERITAGE BUILDING,WHITEHOUSE MUNICIPAL COURT,NOONDAY COMMUNITY CENTER,NOONDAY COMMUNITY CENTER,MAIN


In [7]:
# Smooshing together all the early voting data
# This is like copying and pasting them together, quickly 
frames = [ev_1022, ev_1023, ev_1024, ev_1025, ev_1026, ev_1027, ev_1029, ev_1030, ev_1031, ev_1101, ev_1102]
ev_2018 = pd.concat(frames, sort=False)
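
One thing to be aware of: `pd.concat` keeps each table's original row labels, so in the combined table every day starts again at 0. A sketch with two invented mini tables shows the difference that `ignore_index=True` makes:

```python
import pandas as pd

# Two invented "days" of data
day1 = pd.DataFrame({"LASTNAME": ["ABBEY", "ABEL"]})
day2 = pd.DataFrame({"LASTNAME": ["ACKER", "ACY"]})

stacked = pd.concat([day1, day2], sort=False)
renumbered = pd.concat([day1, day2], sort=False, ignore_index=True)

print(list(stacked.index))      # labels repeat: [0, 1, 0, 1]
print(list(renumbered.index))   # labels run straight through: [0, 1, 2, 3]
```

Repeated labels are harmless for counting rows with `len()`, but they can be confusing when looking up a row by its label.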

In [8]:
# Show me the top of my new dataframe
ev_2018.head(5)

Unnamed: 0,ID,CERT,LASTNAME,FIRSTNAME,MIDDLENAME,SUFFIX,ADDRESS,CHECK-IN,PRECINCT,SITE,...,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14
0,1205658000.0,1205658000.0,ABBEY,IAN,ISSARA,,"5872 OLD JACKSONVILLE HWY #921, TYLER 75703",10/22/18 17:03,38.01,MAIN,...,,,,,,,,,,
1,1057083000.0,1057083000.0,ABEL,PAULA,ROSE,,"2051 W CUMBERLAND RD #1103, TYLER 75703",10/22/18 12:52,71.01,HERITAGE BUILDING,...,,,,,,,,,,
2,1129020000.0,1129020000.0,ABRAHAM,CAROL,ANN,,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:47,25.01,HERITAGE BUILDING,...,,,,,,,,,,
3,1129101000.0,1129101000.0,ABRAHAM,EUGENE,,JR,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:49,25.01,HERITAGE BUILDING,...,,,,,,,,,,
4,1128947000.0,1128947000.0,ACKER,RICHARD,BRYAN,,"415 W RIECK RD #220, TYLER 75703",10/22/18 10:07,54.01,MAIN,...,,,,,,,,,,
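
Notice that the ID values now display like `1205658000.0`. A likely explanation (an assumption, since the raw files are not shown here) is that some rows in the combined table have no ID, and pandas stores a number column with missing values as floats. The sketch below reproduces the effect on invented data and converts the column to pandas' nullable integer type `Int64`:

```python
import pandas as pd

# Invented data: the second table has no ID column at all
a = pd.DataFrame({"ID": [1205658124, 1057082649]})
b = pd.DataFrame({"OTHER": ["x"]})

both = pd.concat([a, b], sort=False, ignore_index=True)
print(both["ID"].dtype)   # float64, because the row from b has a missing ID

# "Int64" (capital I) can hold whole numbers and missing values together
both["ID"] = both["ID"].astype("Int64")
print(both["ID"].dtype)   # Int64
```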


In [9]:
# Tell me the length of my new dataframe 
# This is how many people voted early in Smith County 
len(ev_2018)

48758

In [10]:
# Delete the columns we don't need
del ev_2018['CERT']
del ev_2018['PARTY']
del ev_2018['POLLWORKER']
del ev_2018['CATEGORY']

In [11]:
ev_2018.head(5)

Unnamed: 0,ID,LASTNAME,FIRSTNAME,MIDDLENAME,SUFFIX,ADDRESS,CHECK-IN,PRECINCT,SITE,BALLOT STYLE,...,Unnamed: 5,Unnamed: 6,Unnamed: 7,Unnamed: 8,Unnamed: 9,Unnamed: 10,Unnamed: 11,Unnamed: 12,Unnamed: 13,Unnamed: 14
0,1205658000.0,ABBEY,IAN,ISSARA,,"5872 OLD JACKSONVILLE HWY #921, TYLER 75703",10/22/18 17:03,38.01,MAIN,25.0,...,,,,,,,,,,
1,1057083000.0,ABEL,PAULA,ROSE,,"2051 W CUMBERLAND RD #1103, TYLER 75703",10/22/18 12:52,71.01,HERITAGE BUILDING,25.0,...,,,,,,,,,,
2,1129020000.0,ABRAHAM,CAROL,ANN,,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:47,25.01,HERITAGE BUILDING,20.0,...,,,,,,,,,,
3,1129101000.0,ABRAHAM,EUGENE,,JR,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:49,25.01,HERITAGE BUILDING,20.0,...,,,,,,,,,,
4,1128947000.0,ACKER,RICHARD,BRYAN,,"415 W RIECK RD #220, TYLER 75703",10/22/18 10:07,54.01,MAIN,25.0,...,,,,,,,,,,


In [12]:
# Delete the extra 'Unnamed: N' columns that appeared after combining the files
del ev_2018['Unnamed: 1']
del ev_2018['Unnamed: 2']
del ev_2018['Unnamed: 3']
del ev_2018['Unnamed: 4']
del ev_2018['Unnamed: 5']
del ev_2018['Unnamed: 6']
del ev_2018['Unnamed: 7']
del ev_2018['Unnamed: 8']
del ev_2018['Unnamed: 9']
del ev_2018['Unnamed: 10']
del ev_2018['Unnamed: 11']
del ev_2018['Unnamed: 12']
del ev_2018['Unnamed: 13']
del ev_2018['Unnamed: 14']
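
Deleting columns one line at a time gets long. As an alternative sketch, `DataFrame.drop` accepts a whole list of column names, and the list can be built from the names themselves. The mini table is invented. Only the `Unnamed: N` naming comes from the output above:

```python
import pandas as pd

sample = pd.DataFrame({
    "ID": [1, 2],
    "Unnamed: 1": [None, None],
    "Unnamed: 2": [None, None],
    "SITE": ["MAIN", "MAIN"],
})

# Collect every column whose name starts with "Unnamed: "
junk = [col for col in sample.columns if str(col).startswith("Unnamed: ")]

# drop() returns a new dataframe, so assign the result back
sample = sample.drop(columns=junk)
print(list(sample.columns))   # ['ID', 'SITE']
```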

In [13]:
ev_2018.head(5)

Unnamed: 0,ID,LASTNAME,FIRSTNAME,MIDDLENAME,SUFFIX,ADDRESS,CHECK-IN,PRECINCT,SITE,BALLOT STYLE,Voted Early 10/25/2018 Illegal use of Registered Voter Lists - Election Code 18.008: Current law provides that a person commits a Class A misdemeanor if the person uses information obtained from a registration list in connection with advertising or promoting commercial services or products.
0,1205658000.0,ABBEY,IAN,ISSARA,,"5872 OLD JACKSONVILLE HWY #921, TYLER 75703",10/22/18 17:03,38.01,MAIN,25.0,
1,1057083000.0,ABEL,PAULA,ROSE,,"2051 W CUMBERLAND RD #1103, TYLER 75703",10/22/18 12:52,71.01,HERITAGE BUILDING,25.0,
2,1129020000.0,ABRAHAM,CAROL,ANN,,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:47,25.01,HERITAGE BUILDING,20.0,
3,1129101000.0,ABRAHAM,EUGENE,,JR,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:49,25.01,HERITAGE BUILDING,20.0,
4,1128947000.0,ACKER,RICHARD,BRYAN,,"415 W RIECK RD #220, TYLER 75703",10/22/18 10:07,54.01,MAIN,25.0,


In [15]:
# Rename the columns, because one of them has a SUPER lengthy name
# I couldn't figure out how to delete it, so it gets the short name 'misc' instead
ev_2018.columns = ['voterid', 'lastname', 'firstname', 'middle', 'suffix', 'address', 'checkin', 'precinct', 'site', 'ballotstyle', 'misc']
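
For what it's worth, one way to delete a column with an unwieldy name is to refer to it by position instead of typing the name out. This is a sketch on an invented table, assuming the long column is the last one, as it is in the output above:

```python
import pandas as pd

sample = pd.DataFrame({
    "ID": [1, 2],
    "SITE": ["MAIN", "MAIN"],
    "Some very long disclaimer text used as a column name": [None, None],
})

# sample.columns[-1] is the name of the last column, whatever it is
sample = sample.drop(columns=sample.columns[-1])

# rename() changes only the columns listed in the mapping
sample = sample.rename(columns={"ID": "voterid", "SITE": "site"})
print(list(sample.columns))   # ['voterid', 'site']
```

Renaming with a mapping also avoids having to list every column in the right order, which assigning to `.columns` requires.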

In [16]:
ev_2018.head(5)

Unnamed: 0,voterid,lastname,firstname,middle,suffix,address,checkin,precinct,site,ballotstyle,misc
0,1205658000.0,ABBEY,IAN,ISSARA,,"5872 OLD JACKSONVILLE HWY #921, TYLER 75703",10/22/18 17:03,38.01,MAIN,25.0,
1,1057083000.0,ABEL,PAULA,ROSE,,"2051 W CUMBERLAND RD #1103, TYLER 75703",10/22/18 12:52,71.01,HERITAGE BUILDING,25.0,
2,1129020000.0,ABRAHAM,CAROL,ANN,,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:47,25.01,HERITAGE BUILDING,20.0,
3,1129101000.0,ABRAHAM,EUGENE,,JR,"5212 GLEN ABBEY LN, TYLER 75703",10/22/18 11:49,25.01,HERITAGE BUILDING,20.0,
4,1128947000.0,ACKER,RICHARD,BRYAN,,"415 W RIECK RD #220, TYLER 75703",10/22/18 10:07,54.01,MAIN,25.0,


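### Saving your data

To write the combined table out as a new CSV file, use `to_csv`. This cell has to be run in the same session as the cells above. In a fresh kernel, `ev_2018` does not exist yet, which is most likely what produced the `NameError` below (note the cell number `In [2]`):

In [2]:
ev_2018.to_csv(r'EarlyVoters2018.csv')

NameError: name 'ev_2018' is not defined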
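
A note on what `to_csv` writes: by default it also saves the row labels as a first column with no header, and that column comes back as `Unnamed: 0` when the file is read again. Passing `index=False` leaves it out. The sketch below uses an invented two-row table and a temporary folder so it does not touch any real files:

```python
import os
import tempfile

import pandas as pd

sample = pd.DataFrame({"voterid": [1, 2], "site": ["MAIN", "MAIN"]})
folder = tempfile.mkdtemp()

# Default: the row labels are written as an extra, unnamed first column
with_labels = os.path.join(folder, "with_labels.csv")
sample.to_csv(with_labels)
print(list(pd.read_csv(with_labels).columns))     # ['Unnamed: 0', 'voterid', 'site']

# index=False: only the real columns are written
without_labels = os.path.join(folder, "without_labels.csv")
sample.to_csv(without_labels, index=False)
print(list(pd.read_csv(without_labels).columns))  # ['voterid', 'site']
```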