# Intro to Pandas in Python

#### Playing with 538 politics data to get a feel for pandas

In [5]:
import pandas as pd
turnovers = pd.read_csv('https://git.io/fj9Vn')

We have some data, so let's play around with it. 

In [9]:
turnovers.head()

Unnamed: 0,president,position,appointee,start,end,length,days
0,Carter,OMB Director,Bert Lance,1/21/77,9/23/77,245,247
1,Carter,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912
2,Carter,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926
3,Carter,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926
4,Carter,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927


In [11]:
turnovers.tail()

Unnamed: 0,president,position,appointee,start,end,length,days
307,Trump,Secretary of Homeland Security,Kirstjen Nielsen,12/6/17,Still in office,#VALUE!,#VALUE!
308,Trump,Secretary of Health & Human Services,Alex Azar,1/29/18,Still in office,#VALUE!,#VALUE!
309,Trump,Secretary of State,Mike Pompeo,4/26/18,Still in office,#VALUE!,#VALUE!
310,Trump,CIA Director,Gina Haspel,5/21/18,Still in office,#VALUE!,#VALUE!
311,Trump,Secretary of Veterans Affairs,Robert Wilkie,7/30/18,Still in office,#VALUE!,#VALUE!


In [12]:
type(turnovers)

pandas.core.frame.DataFrame

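Cool, we have a dataframe of political appointees: one row per appointee, with the president, the position, start and end dates, and two tenure columns (`length` and `days`).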

In [16]:
turnovers['position']

0                                  OMB Director
1                   Secretary of Transportation
2      Secretary of Health, Education & Welfare
3      Secretary of Housing & Urban Development
4                     Secretary of the Treasury
                         ...                   
307              Secretary of Homeland Security
308        Secretary of Health & Human Services
309                          Secretary of State
310                                CIA Director
311               Secretary of Veterans Affairs
Name: position, Length: 312, dtype: object

In [18]:
turnovers['position'] + ', ' + turnovers['days']

0                                  OMB Director, 247
1                   Secretary of Transportation, 912
2      Secretary of Health, Education & Welfare, 926
3      Secretary of Housing & Urban Development, 926
4                     Secretary of the Treasury, 927
                           ...                      
307          Secretary of Homeland Security, #VALUE!
308    Secretary of Health & Human Services, #VALUE!
309                      Secretary of State, #VALUE!
310                            CIA Director, #VALUE!
311           Secretary of Veterans Affairs, #VALUE!
Length: 312, dtype: object

In [21]:
turnovers['pos and time'] = turnovers['position'] + '-' + turnovers['days']
turnovers.head()

Unnamed: 0,president,position,appointee,start,end,length,days,pos and time
0,Carter,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Carter,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,Carter,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Carter,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Carter,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927


In [22]:
turnovers.describe()

Unnamed: 0,president,position,appointee,start,end,length,days,pos and time
count,312,312,312,312,312,312,312,312
unique,7,28,270,238,175,249,151,248
top,Bush 43,Chief of Staff,James Baker,1/22/93,1/20/17,#VALUE!,2923,Secretary of Energy-2923
freq,58,21,4,10,21,18,63,4


In [23]:
turnovers.shape

(312, 8)

In [24]:
turnovers.dtypes

president       object
position        object
appointee       object
start           object
end             object
length          object
days            object
pos and time    object
dtype: object
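
Every column came back as `object` (strings), because values such as `#VALUE!` and `Still in office` keep pandas from inferring numbers or dates. Here is a minimal sketch of one way to coerce them. It uses a tiny hand-built frame (`sample` is a hypothetical stand-in so the snippet runs without the download), with values copied from the rows shown earlier; the new column names `days_num` and `end_date` are also just illustrative.

```python
import pandas as pd

# Hypothetical stand-in for a few rows of `turnovers`.
sample = pd.DataFrame({
    'appointee': ['Bert Lance', 'Brock Adams', 'Mike Pompeo'],
    'start': ['1/21/77', '1/23/77', '4/26/18'],
    'end': ['9/23/77', '7/20/79', 'Still in office'],
    'days': ['247', '912', '#VALUE!'],
})

# errors='coerce' turns anything unparseable into NaN / NaT instead of raising.
sample['days_num'] = pd.to_numeric(sample['days'], errors='coerce')
sample['end_date'] = pd.to_datetime(sample['end'], format='%m/%d/%y', errors='coerce')

print(sample.dtypes)              # days_num is now float64, end_date is datetime64
print(sample['days_num'].mean())  # NaN is skipped: (247 + 912) / 2 = 579.5
```

Once a column is numeric, `describe()` reports mean, min, max and quartiles for it rather than only `count`/`unique`/`top`/`freq`.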

In [25]:
turnovers.columns

Index(['president', 'position', 'appointee', 'start', 'end', 'length', 'days',
       'pos and time'],
      dtype='object')

In [37]:
turnovers.rename(columns={'pos and time':'length'}) \
         .drop('president', axis=1) \
         .head()

Unnamed: 0,position,appointee,start,end,length,days,length.1
0,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927


In [31]:
turnovers.head()

Unnamed: 0,president,position,appointee,start,end,length,days,pos and time
0,Carter,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Carter,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,Carter,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Carter,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Carter,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927


In [39]:
throwaway = turnovers.rename(columns={'pos and time':'length'}) \
                     .drop('president', axis=1) \
                     .head()
throwaway

Unnamed: 0,position,appointee,start,end,length,days,length.1
0,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927


*Caveat 1*: Unless `inplace=True` is passed (which is discouraged in pandas nowadays), methods such as `rename` and `drop` do not modify the dataframe; they return a new one. When updating a series/dataframe, check whether the method returns a series/dataframe and, if so, assign that return value to a variable.
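
To make the caveat concrete, here is a small sketch on a hand-built frame (`demo` and its values are made up for illustration). `rename` and `drop` hand back new objects, and the original only changes if the result is assigned back.

```python
import pandas as pd

# Hypothetical two-row frame, just for illustration.
demo = pd.DataFrame({'president': ['Carter', 'Trump'], 'days': ['247', '912']})

# The return value is a new DataFrame; `demo` itself is untouched.
renamed = demo.rename(columns={'days': 'length'})
print(list(demo.columns))     # ['president', 'days']
print(list(renamed.columns))  # ['president', 'length']

# To keep a change, catch the return value (here by reassigning the same name).
demo = demo.drop('president', axis=1)
print(list(demo.columns))     # ['days']
```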

In [41]:
# Attributes can be overwritten! The column headers are an attribute of a dataframe.
# Assigning to .columns changes the dataframe in place, whereas methods like rename/drop return a new one.
throwaway.columns = ['We', 'are', 'the', 'crystal', 'gems', 'save', 'day']
throwaway.head()

Unnamed: 0,We,are,the,crystal,gems,save,day
0,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927


In [43]:
# You can also chain the .str accessor to transform every header at once!
throwaway.columns = throwaway.columns.str.capitalize()
throwaway

Unnamed: 0,We,Are,The,Crystal,Gems,Save,Day
0,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927


In [44]:
# You can drop rows too
throwaway.drop([0,2,4], axis=0).head()

Unnamed: 0,We,Are,The,Crystal,Gems,Save,Day
1,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
3,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926


In [50]:
# You can sort in a variety of ways. Again, though, sort_values returns a new
# dataframe, so the original table is not updated
turnovers.sort_values(['position', 'length']).head()

Unnamed: 0,president,position,appointee,start,end,length,days,pos and time
82,Reagan,Attorney General,Dick Thornburgh,8/12/88,Bush admin,1098 combined,#VALUE!,Attorney General-#VALUE!
90,Bush 41,Attorney General,Dick Thornburgh,Reagan admin,8/15/91,1098 combined,938,Attorney General-938
61,Reagan,Attorney General,Ed Meese,2/25/85,8/12/88,1264,2762,Attorney General-2762
185,Bush 43,Attorney General,John Ashcroft,2/2/01,2/3/05,1462,1476,Attorney General-1476
46,Reagan,Attorney General,William French Smith,1/23/81,2/25/85,1494,1498,Attorney General-1498


In [49]:
turnovers.head()

Unnamed: 0,president,position,appointee,start,end,length,days,pos and time
0,Carter,OMB Director,Bert Lance,1/21/77,9/23/77,245,247,OMB Director-247
1,Carter,Secretary of Transportation,Brock Adams,1/23/77,7/20/79,908,912,Secretary of Transportation-912
2,Carter,"Secretary of Health, Education & Welfare",Joseph Califano Jr.,1/25/77,8/3/79,920,926,"Secretary of Health, Education & Welfare-926"
3,Carter,Secretary of Housing & Urban Development,Patricia Harris,1/23/77,8/3/79,922,926,Secretary of Housing & Urban Development-926
4,Carter,Secretary of the Treasury,W. Michael Blumenthal,1/23/77,8/4/79,923,927,Secretary of the Treasury-927
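
Two things are worth spelling out about the sort above. First, `sort_values` returns a new frame, which is why `turnovers.head()` still shows the original order. Second, `length` is stored as strings, so it sorts as text rather than by magnitude (note `1098 combined` landing ahead of `1264`). Here is a hedged sketch on made-up values (`demo` and the appointee labels are hypothetical; the `key=` argument assumes pandas 1.1 or newer).

```python
import pandas as pd

# Hypothetical frame with string-typed lengths, like the real column.
demo = pd.DataFrame({
    'appointee': ['A', 'B', 'C'],
    'length': ['908', '1098 combined', '245'],
})

# Plain sort compares strings character by character.
text_sorted = demo.sort_values('length')
print(list(text_sorted['appointee']))  # ['B', 'C', 'A']

# Sorting with a numeric key; unparseable values become NaN and go last.
num_sorted = demo.sort_values(
    'length', key=lambda s: pd.to_numeric(s, errors='coerce')
)
print(list(num_sorted['appointee']))   # ['C', 'A', 'B']

# The original frame keeps its order either way.
print(list(demo['appointee']))         # ['A', 'B', 'C']
```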
