### Week 2 continued

In [1]:
import pandas as pd
import numpy as np

You can build DataFrames in several ways, including combining Series or combining dictionaries.
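As a minimal sketch of the dictionary route (the records and store labels here are hypothetical, not the notebook's data), a list of dictionaries gives one row per dictionary, while a dictionary of Series gives one column per Series:

```python
import pandas as pd

# Hypothetical records, for illustration only.
records = [
    {'name': 'Ann', 'item purchased': 'Pens', 'cost': 3.50},
    {'name': 'Raj', 'item purchased': 'Paper', 'cost': 7.25},
]

# Each dictionary becomes one row; the keys become the column names.
from_dicts = pd.DataFrame(records, index=['Store A', 'Store B'])

# A dictionary of Series works column-wise: each Series becomes a column.
from_series = pd.DataFrame({
    'cost': pd.Series([3.50, 7.25], index=['Store A', 'Store B']),
    'name': pd.Series(['Ann', 'Raj'], index=['Store A', 'Store B']),
})
```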

In [9]:
s1 = pd.Series({'name': 'Elvis',
                'item purchased': 'Rhinestones',
                'cost': 12.65})
s2 = pd.Series({'name': 'Priscilla',
                'item purchased': 'Hairspray',
                'cost': 1.65})
s3 = pd.Series({'name': 'Michael',
                'item purchased': 'Nose Job',
                'cost': 23456})


Note that when building a DataFrame, both the row and column indices can be non-unique.

In [10]:
purchase = pd.DataFrame([s1, s2, s3], index=['Store 1', 'Store 1', 'Store 2'])

In [11]:
purchase

Unnamed: 0,cost,item purchased,name
Store 1,12.65,Rhinestones,Elvis
Store 1,1.65,Hairspray,Priscilla
Store 2,23456.0,Nose Job,Michael


When you query a DataFrame with a single argument to iloc or loc, you get a Series if only one row matches.

Querying a repeated index label returns a DataFrame containing every row with that label.

In [5]:
purchase.loc['Store 1']

Unnamed: 0,cost,item purchased,name
Store 1,12.65,Rhinestones,Elvis
Store 1,1.65,Hairspray,Priscilla


In [6]:
purchase.loc['Store 2']

cost                 23456
item purchased    Nose Job
name               Michael
Name: Store 2, dtype: object
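The cells above use loc only. As a small sketch on a cut-down stand-in frame (an assumption, not the notebook's `purchase` object), the return types for iloc and loc can be checked directly:

```python
import pandas as pd

# Cut-down stand-in for the purchase frame, with a repeated row label.
df = pd.DataFrame(
    {'cost': [12.65, 1.65, 23456.0],
     'name': ['Elvis', 'Priscilla', 'Michael']},
    index=['Store 1', 'Store 1', 'Store 2'])

one_row = df.iloc[2]                 # single position -> Series
unique_label = df.loc['Store 2']     # label matches one row -> Series
repeated_label = df.loc['Store 1']   # label matches two rows -> DataFrame

print(type(one_row).__name__, type(unique_label).__name__,
      type(repeated_label).__name__)
```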

In [7]:
purchase['item purchased']

Store 1    Rhinestones
Store 1      Hairspray
Store 2       Nose Job
Name: item purchased, dtype: object

Passing two arguments to loc selects by row and column.

In [8]:
purchase.loc['Store 1', 'cost']

Store 1    12.65
Store 1     1.65
Name: cost, dtype: object

What if we want a whole column? There are various options:

    1) transpose and use loc (but this is ugly)
    
    2) index directly by column name, since every column has a name

In [9]:
purchase.T.loc['cost']

Store 1    12.65
Store 1     1.65
Store 2    23456
Name: cost, dtype: object

In [10]:
purchase['cost']

Store 1    12.65
Store 1     1.65
Store 2    23456
Name: cost, dtype: object

You can chain loc/iloc and indexing

In [11]:
purchase.loc['Store 1']['cost']

Store 1    12.65
Store 1     1.65
Name: cost, dtype: object

However, chaining comes at a cost: it tends to produce copies of the DataFrame.

Particularly when changing values, it's better to pass multiple arguments to .loc
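A minimal sketch of the difference, using a cut-down stand-in frame (an assumption for illustration): a single .loc call with a row and column label writes into the frame itself, whereas work done on an intermediate copy never reaches it.

```python
import pandas as pd

df = pd.DataFrame(
    {'cost': [12.65, 1.65, 23456.0],
     'name': ['Elvis', 'Priscilla', 'Michael']},
    index=['Store 1', 'Store 1', 'Store 2'])

# One .loc call with (row label, column label) updates df directly.
df.loc['Store 1', 'cost'] = 0.0

# An explicit copy makes it obvious that df is not affected by later edits.
subset = df.loc['Store 1'].copy()
subset['cost'] = 99.0
```

After this runs, both 'Store 1' rows of `df` have a cost of 0.0, and only `subset` holds 99.0.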


The example below passes : to select all rows; the second argument can be a list of columns.

In [12]:
purchase.loc[:,['cost', 'name']]

Unnamed: 0,cost,name
Store 1,12.65,Elvis
Store 1,1.65,Priscilla
Store 2,23456.0,Michael


In [13]:
purchase.loc['Store 2', 'name'] = 'Lisa Marie'

In [14]:
purchase

Unnamed: 0,cost,item purchased,name
Store 1,12.65,Rhinestones,Elvis
Store 1,1.65,Hairspray,Priscilla
Store 2,23456.0,Nose Job,Lisa Marie


In the pandas world, friends don't let friends chain calls!

Remember, a pandas DataFrame is just a two-axis labelled array.

### Dropping Data

Note: drop returns a new DataFrame with the data removed; it doesn't change the original.

In [15]:
purchase.drop('Store 1')

Unnamed: 0,cost,item purchased,name
Store 2,23456,Nose Job,Lisa Marie


In [16]:
purchase

Unnamed: 0,cost,item purchased,name
Store 1,12.65,Rhinestones,Elvis
Store 1,1.65,Hairspray,Priscilla
Store 2,23456.0,Nose Job,Lisa Marie
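Because drop hands back a new frame, the result has to be kept somewhere if it is wanted. A short sketch on a stand-in frame (hypothetical values):

```python
import pandas as pd

df = pd.DataFrame({'cost': [10.0, 1.5, 200.0]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# drop returns a new frame without the 'Store 1' rows.
dropped = df.drop('Store 1')
rows_before = len(df)   # df itself still has all three rows here

# Reassigning the name is one way to keep the result.
df = df.drop('Store 1')
```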


You can delete columns using del. Note that this modifies the DataFrame it is applied to in place, which is why the cell below works on a copy!

In [18]:
copy_df = purchase.copy()
del copy_df['name']
copy_df

Unnamed: 0,cost,item purchased
Store 1,12.65,Rhinestones
Store 1,1.65,Hairspray
Store 2,23456.0,Nose Job
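For comparison, here is a sketch (on a hypothetical two-column frame) of the non-mutating alternative, `drop(columns=...)`, next to del:

```python
import pandas as pd

df = pd.DataFrame({'cost': [10.0, 1.5], 'name': ['Ann', 'Raj']})

# drop(columns=...) returns a new frame and leaves df as it was.
without_name = df.drop(columns=['name'])
columns_after_drop = list(df.columns)   # still ['cost', 'name']

# del removes the column from df itself.
del df['cost']
```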


### Adding a Column

In [19]:
purchase['location'] = None

In [20]:
purchase

Unnamed: 0,cost,item purchased,name,location
Store 1,12.65,Rhinestones,Elvis,
Store 1,1.65,Hairspray,Priscilla,
Store 2,23456.0,Nose Job,Lisa Marie,


In [21]:
purchase['poulet'] = [1,2,3]

In [22]:
purchase

Unnamed: 0,cost,item purchased,name,location,poulet
Store 1,12.65,Rhinestones,Elvis,,1
Store 1,1.65,Hairspray,Priscilla,,2
Store 2,23456.0,Nose Job,Lisa Marie,,3
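The cells above assign a scalar and a list. A third option, sketched here on a hypothetical frame with unique row labels, is to assign a Series: pandas then aligns on the index labels rather than on position, and rows with no matching label get NaN.

```python
import pandas as pd

df = pd.DataFrame({'cost': [1.0, 2.0, 3.0]}, index=['a', 'b', 'c'])

# The Series is matched to rows by label, not by order; 'b' has no entry.
df['qty'] = pd.Series({'c': 30, 'a': 10})
```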


### Accessing Dataframes

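Remember that when you select a column from a DataFrame you may be working with a view of the underlying data, so changing it can change the original DataFrame, as the cells below show. Whether this happens depends on the pandas version and its settings. If you want to change the data only in a new object, use the copy() method.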

In [12]:
costs = purchase['cost']
costs

Store 1       12.65
Store 1        1.65
Store 2    23456.00
Name: cost, dtype: float64

In [13]:
costs += 2

In [14]:
costs

Store 1       14.65
Store 1        3.65
Store 2    23458.00
Name: cost, dtype: float64

In [15]:
purchase

Unnamed: 0,cost,item purchased,name
Store 1,14.65,Rhinestones,Elvis
Store 1,3.65,Hairspray,Priscilla
Store 2,23458.0,Nose Job,Michael
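To avoid the side effect seen above, the column can be copied before it is modified. A minimal sketch with stand-in values (chosen so the arithmetic is exact):

```python
import pandas as pd

df = pd.DataFrame({'cost': [10.0, 1.5, 200.0]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# copy() gives an independent Series, so the in-place += stays local.
costs = df['cost'].copy()
costs += 2
```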


In [16]:
!cat olympics.csv

'cat' is not recognized as an internal or external command,
operable program or batch file.
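`cat` is not available in the Windows command shell, which explains the error above. One portable option is to peek at the file from Python itself. The sketch below writes a tiny stand-in file first so that it runs anywhere; in the notebook the path would simply be 'olympics.csv'.

```python
import os
import tempfile

# Stand-in file for illustration; its name and contents are made up.
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'w', encoding='utf-8') as f:
    f.write('country,gold\nA,1\nB,0\n')

# Read the first two raw lines, roughly what `cat` or `head` would show.
with open(path, encoding='utf-8') as f:
    first_lines = [next(f).rstrip('\n') for _ in range(2)]
print(first_lines)
```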


In [17]:
df = pd.read_csv('olympics.csv')

In [18]:
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,,№ Summer,01 !,02 !,03 !,Total,№ Winter,01 !,02 !,03 !,Total,№ Games,01 !,02 !,03 !,Combined total
1,Afghanistan (AFG),13,0,0,2,2,0,0,0,0,0,13,0,0,2,2
2,Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
3,Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
4,Armenia (ARM),5,1,2,9,12,6,0,0,0,0,11,1,2,9,12


In [2]:
df = pd.read_csv('olympics.csv', index_col=0, skiprows=1)

In [3]:
df.head()

Unnamed: 0,№ Summer,01 !,02 !,03 !,Total,№ Winter,01 !.1,02 !.1,03 !.1,Total.1,№ Games,01 !.2,02 !.2,03 !.2,Combined total
Afghanistan (AFG),13,0,0,2,2,0,0,0,0,0,13,0,0,2,2
Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
Armenia (ARM),5,1,2,9,12,6,0,0,0,0,11,1,2,9,12
Australasia (ANZ) [ANZ],2,3,4,5,12,0,0,0,0,0,2,3,4,5,12


In [4]:
df.columns

Index(['№ Summer', '01 !', '02 !', '03 !', 'Total', '№ Winter', '01 !.1',
       '02 !.1', '03 !.1', 'Total.1', '№ Games', '01 !.2', '02 !.2', '03 !.2',
       'Combined total'],
      dtype='object')

In [7]:
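for col in df.columns:
    # col[4:] keeps any '.1' / '.2' suffix after the four-character '01 !' prefix
    if col[:2] == '01':
        df.rename(columns={col:'Gold'+col[4:]}, inplace=True)
    if col[:2] == '02':
        df.rename(columns={col:'Silver'+col[4:]}, inplace=True)
    if col[:2] == '03':
        df.rename(columns={col:'Bronze'+col[4:]}, inplace=True)
    if col[:1] == '№':
        # Note: col[4:] also cuts into the name here ('№ Summer' -> '#mmer');
        # col[1:] would keep the full name.
        df.rename(columns={col:'#'+col[4:]}, inplace=True)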
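An alternative sketch, not the notebook's approach: build one mapping and call rename once. It uses a stand-in frame with the same column names and slices the '№' columns with `col[1:]`, so the full name survives. Note that the resulting names ('# Summer', '# Winter') differ from the '#mmer' style names that appear in the outputs further down.

```python
import pandas as pd

# Stand-in frame carrying a subset of the Olympics column names.
cols = ['№ Summer', '01 !', '02 !', '03 !', 'Total',
        '№ Winter', '01 !.1', '02 !.1', '03 !.1', 'Total.1']
df = pd.DataFrame([[0] * len(cols)], columns=cols)

prefixes = {'01': 'Gold', '02': 'Silver', '03': 'Bronze'}
mapping = {}
for col in df.columns:
    if col[:2] in prefixes:
        # Keep whatever follows the four-character '01 !' prefix ('.1', '.2').
        mapping[col] = prefixes[col[:2]] + col[4:]
    elif col.startswith('№'):
        # Replace only the leading '№' character.
        mapping[col] = '#' + col[1:]

renamed = df.rename(columns=mapping)
```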

In [8]:
df.head()

Unnamed: 0,#mmer,Gold,Silver,Bronze,Total,#nter,Gold.1,Silver.1,Bronze.1,Total.1,#mes,Gold.2,Silver.2,Bronze.2,Combined total
Afghanistan (AFG),13,0,0,2,2,0,0,0,0,0,13,0,0,2,2
Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
Armenia (ARM),5,1,2,9,12,6,0,0,0,0,11,1,2,9,12
Australasia (ANZ) [ANZ],2,3,4,5,12,0,0,0,0,0,2,3,4,5,12


In [9]:
df['Gold'] > 0

Afghanistan (AFG)                               False
Algeria (ALG)                                    True
Argentina (ARG)                                  True
Armenia (ARM)                                    True
Australasia (ANZ) [ANZ]                          True
Australia (AUS) [AUS] [Z]                        True
Austria (AUT)                                    True
Azerbaijan (AZE)                                 True
Bahamas (BAH)                                    True
Bahrain (BRN)                                   False
Barbados (BAR) [BAR]                            False
Belarus (BLR)                                    True
Belgium (BEL)                                    True
Bermuda (BER)                                   False
Bohemia (BOH) [BOH] [Z]                         False
Botswana (BOT)                                  False
Brazil (BRA)                                     True
British West Indies (BWI) [BWI]                 False
Bulgaria (BUL) [H]          

In [10]:
only_gold = df.where(df['Gold'] > 0)
only_gold.head()

Unnamed: 0,#mmer,Gold,Silver,Bronze,Total,#nter,Gold.1,Silver.1,Bronze.1,Total.1,#mes,Gold.2,Silver.2,Bronze.2,Combined total
Afghanistan (AFG),,,,,,,,,,,,,,,
Algeria (ALG),12.0,5.0,2.0,8.0,15.0,3.0,0.0,0.0,0.0,0.0,15.0,5.0,2.0,8.0,15.0
Argentina (ARG),23.0,18.0,24.0,28.0,70.0,18.0,0.0,0.0,0.0,0.0,41.0,18.0,24.0,28.0,70.0
Armenia (ARM),5.0,1.0,2.0,9.0,12.0,6.0,0.0,0.0,0.0,0.0,11.0,1.0,2.0,9.0,12.0
Australasia (ANZ) [ANZ],2.0,3.0,4.0,5.0,12.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0,4.0,5.0,12.0


In [11]:
only_gold = only_gold.dropna()

In [12]:
only_gold.head()

Unnamed: 0,#mmer,Gold,Silver,Bronze,Total,#nter,Gold.1,Silver.1,Bronze.1,Total.1,#mes,Gold.2,Silver.2,Bronze.2,Combined total
Algeria (ALG),12.0,5.0,2.0,8.0,15.0,3.0,0.0,0.0,0.0,0.0,15.0,5.0,2.0,8.0,15.0
Argentina (ARG),23.0,18.0,24.0,28.0,70.0,18.0,0.0,0.0,0.0,0.0,41.0,18.0,24.0,28.0,70.0
Armenia (ARM),5.0,1.0,2.0,9.0,12.0,6.0,0.0,0.0,0.0,0.0,11.0,1.0,2.0,9.0,12.0
Australasia (ANZ) [ANZ],2.0,3.0,4.0,5.0,12.0,0.0,0.0,0.0,0.0,0.0,2.0,3.0,4.0,5.0,12.0
Australia (AUS) [AUS] [Z],25.0,139.0,152.0,177.0,468.0,18.0,5.0,3.0,4.0,12.0,43.0,144.0,155.0,181.0,480.0


In [13]:
only_gold = df[df['Gold'] > 0]
only_gold.head()

Unnamed: 0,#mmer,Gold,Silver,Bronze,Total,#nter,Gold.1,Silver.1,Bronze.1,Total.1,#mes,Gold.2,Silver.2,Bronze.2,Combined total
Algeria (ALG),12,5,2,8,15,3,0,0,0,0,15,5,2,8,15
Argentina (ARG),23,18,24,28,70,18,0,0,0,0,41,18,24,28,70
Armenia (ARM),5,1,2,9,12,6,0,0,0,0,11,1,2,9,12
Australasia (ANZ) [ANZ],2,3,4,5,12,0,0,0,0,0,2,3,4,5,12
Australia (AUS) [AUS] [Z],25,139,152,177,468,18,5,3,4,12,43,144,155,181,480
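The two approaches above differ in what they return. A small sketch on a hypothetical three-row frame: where keeps every row and fills the non-matching ones with NaN (which is why the integer counts turned into floats), while boolean indexing keeps only the matching rows and their original dtype.

```python
import pandas as pd

df = pd.DataFrame({'Gold': [0, 5, 18], 'Silver': [0, 2, 24]},
                  index=['AFG', 'ALG', 'ARG'])

masked = df.where(df['Gold'] > 0)   # same shape, NaN where the mask is False
filtered = df[df['Gold'] > 0]       # only rows where the mask is True
```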


In [14]:
len(df[(df['Gold'] > 0) | (df['Gold.1'] > 0)])

101
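In Python, `|` and `&` bind more tightly than comparison operators such as `>`, so each condition in a combined mask should sit in its own parentheses. A short sketch with made-up medal counts:

```python
import pandas as pd

df = pd.DataFrame({'Gold': [0, 5, 0, 2], 'Gold.1': [0, 0, 3, 1]})

# Rows with a gold in either column.
either = df[(df['Gold'] > 0) | (df['Gold.1'] > 0)]

# Rows with a gold in both columns.
both = df[(df['Gold'] > 0) & (df['Gold.1'] > 0)]
```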

### Indices

In [15]:
df['country'] = df.index

In [16]:
df = df.set_index('Gold')

In [17]:
df.head()

Gold,#mmer,Silver,Bronze,Total,#nter,Gold.1,Silver.1,Bronze.1,Total.1,#mes,Gold.2,Silver.2,Bronze.2,Combined total,country
0,13,0,2,2,0,0,0,0,0,13,0,0,2,2,Afghanistan (AFG)
5,12,2,8,15,3,0,0,0,0,15,5,2,8,15,Algeria (ALG)
18,23,24,28,70,18,0,0,0,0,41,18,24,28,70,Argentina (ARG)
1,5,2,9,12,6,0,0,0,0,11,1,2,9,12,Armenia (ARM)
3,2,4,5,12,0,0,0,0,0,2,3,4,5,12,Australasia (ANZ) [ANZ]


In [18]:
df = df.reset_index()

In [19]:
df.head()

Unnamed: 0,Gold,#mmer,Silver,Bronze,Total,#nter,Gold.1,Silver.1,Bronze.1,Total.1,#mes,Gold.2,Silver.2,Bronze.2,Combined total,country
0,0,13,0,2,2,0,0,0,0,0,13,0,0,2,2,Afghanistan (AFG)
1,5,12,2,8,15,3,0,0,0,0,15,5,2,8,15,Algeria (ALG)
2,18,23,24,28,70,18,0,0,0,0,41,18,24,28,70,Argentina (ARG)
3,1,5,2,9,12,6,0,0,0,0,11,1,2,9,12,Armenia (ARM)
4,3,2,4,5,12,0,0,0,0,0,2,3,4,5,12,Australasia (ANZ) [ANZ]


In [20]:
df = pd.read_csv('census.csv')

In [21]:
df.head()

Unnamed: 0,SUMLEV,REGION,DIVISION,STATE,COUNTY,STNAME,CTYNAME,CENSUS2010POP,ESTIMATESBASE2010,POPESTIMATE2010,...,RDOMESTICMIG2011,RDOMESTICMIG2012,RDOMESTICMIG2013,RDOMESTICMIG2014,RDOMESTICMIG2015,RNETMIG2011,RNETMIG2012,RNETMIG2013,RNETMIG2014,RNETMIG2015
0,40,3,6,1,0,Alabama,Alabama,4779736,4780127,4785161,...,0.002295,-0.193196,0.381066,0.582002,-0.467369,1.030015,0.826644,1.383282,1.724718,0.712594
1,50,3,6,1,1,Alabama,Autauga County,54571,54571,54660,...,7.242091,-2.915927,-3.012349,2.265971,-2.530799,7.606016,-2.626146,-2.722002,2.59227,-2.187333
2,50,3,6,1,3,Alabama,Baldwin County,182265,182265,183193,...,14.83296,17.647293,21.845705,19.243287,17.197872,15.844176,18.559627,22.727626,20.317142,18.293499
3,50,3,6,1,5,Alabama,Barbour County,27457,27457,27341,...,-4.728132,-2.50069,-7.056824,-3.904217,-10.543299,-4.874741,-2.758113,-7.167664,-3.978583,-10.543299
4,50,3,6,1,7,Alabama,Bibb County,22915,22919,22861,...,-5.527043,-5.068871,-6.201001,-0.177537,0.177258,-5.088389,-4.363636,-5.403729,0.754533,1.107861


In [22]:
df['SUMLEV'].unique()

array([40, 50])

In [23]:
df.columns

Index(['SUMLEV', 'REGION', 'DIVISION', 'STATE', 'COUNTY', 'STNAME', 'CTYNAME',
       'CENSUS2010POP', 'ESTIMATESBASE2010', 'POPESTIMATE2010',
       'POPESTIMATE2011', 'POPESTIMATE2012', 'POPESTIMATE2013',
       'POPESTIMATE2014', 'POPESTIMATE2015', 'NPOPCHG_2010', 'NPOPCHG_2011',
       'NPOPCHG_2012', 'NPOPCHG_2013', 'NPOPCHG_2014', 'NPOPCHG_2015',
       'BIRTHS2010', 'BIRTHS2011', 'BIRTHS2012', 'BIRTHS2013', 'BIRTHS2014',
       'BIRTHS2015', 'DEATHS2010', 'DEATHS2011', 'DEATHS2012', 'DEATHS2013',
       'DEATHS2014', 'DEATHS2015', 'NATURALINC2010', 'NATURALINC2011',
       'NATURALINC2012', 'NATURALINC2013', 'NATURALINC2014', 'NATURALINC2015',
       'INTERNATIONALMIG2010', 'INTERNATIONALMIG2011', 'INTERNATIONALMIG2012',
       'INTERNATIONALMIG2013', 'INTERNATIONALMIG2014', 'INTERNATIONALMIG2015',
       'DOMESTICMIG2010', 'DOMESTICMIG2011', 'DOMESTICMIG2012',
       'DOMESTICMIG2013', 'DOMESTICMIG2014', 'DOMESTICMIG2015', 'NETMIG2010',
       'NETMIG2011', 'NETMIG2012', 'NETMI

In [25]:
columns_to_keep = ['STNAME',
                   'CTYNAME',
                   'BIRTHS2010', 
                   'BIRTHS2011', 
                   'BIRTHS2012', 
                   'BIRTHS2013', 
                   'BIRTHS2014',
                   'BIRTHS2015',
                   'POPESTIMATE2010',
                   'POPESTIMATE2011', 
                   'POPESTIMATE2012', 
                   'POPESTIMATE2013',
                   'POPESTIMATE2014', 
                   'POPESTIMATE2015']

In [26]:
df = df[columns_to_keep]

In [27]:
df.head()

Unnamed: 0,STNAME,CTYNAME,BIRTHS2010,BIRTHS2011,BIRTHS2012,BIRTHS2013,BIRTHS2014,BIRTHS2015,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015
0,Alabama,Alabama,14226,59689,59062,57938,58334,58305,4785161,4801108,4816089,4830533,4846411,4858979
1,Alabama,Autauga County,151,636,615,574,623,600,54660,55253,55175,55038,55290,55347
2,Alabama,Baldwin County,517,2187,2092,2160,2186,2240,183193,186659,190396,195126,199713,203709
3,Alabama,Barbour County,70,335,300,283,260,269,27341,27226,27159,26973,26815,26489
4,Alabama,Bibb County,44,266,245,259,247,253,22861,22733,22642,22512,22549,22583


In [28]:
df = df.set_index(['STNAME', 'CTYNAME'])

In [29]:
df.head()

STNAME,CTYNAME,BIRTHS2010,BIRTHS2011,BIRTHS2012,BIRTHS2013,BIRTHS2014,BIRTHS2015,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015
Alabama,Alabama,14226,59689,59062,57938,58334,58305,4785161,4801108,4816089,4830533,4846411,4858979
Alabama,Autauga County,151,636,615,574,623,600,54660,55253,55175,55038,55290,55347
Alabama,Baldwin County,517,2187,2092,2160,2186,2240,183193,186659,190396,195126,199713,203709
Alabama,Barbour County,70,335,300,283,260,269,27341,27226,27159,26973,26815,26489
Alabama,Bibb County,44,266,245,259,247,253,22861,22733,22642,22512,22549,22583


In [30]:
df.loc['Michigan']

CTYNAME,BIRTHS2010,BIRTHS2011,BIRTHS2012,BIRTHS2013,BIRTHS2014,BIRTHS2015,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015
Michigan,29102,113984,113080,113008,114148,114521,9877369,9876589,9886879,9900506,9916306,9922576
Alcona County,20,61,59,66,64,58,10890,10775,10607,10572,10448,10349
Alger County,14,53,58,58,71,74,9564,9554,9496,9497,9452,9383
Allegan County,339,1355,1349,1302,1349,1362,111502,111530,111898,112391,113743,114625
Alpena County,65,282,242,254,274,268,29539,29342,29219,29026,28952,28803
Antrim County,41,207,205,157,222,202,23499,23379,23337,23220,23243,23154
Arenac County,19,142,129,109,113,110,15854,15620,15496,15419,15326,15261
Baraga County,22,79,88,73,78,78,8841,8820,8715,8691,8643,8575
Barry County,157,630,640,625,575,583,59080,58970,59070,59140,59239,59314
Bay County,298,1153,1068,1032,1067,1026,107695,107497,107121,106958,106256,105659


In [32]:
df.loc['Michigan', 'Washtenaw County']


STNAME,CTYNAME,BIRTHS2010,BIRTHS2011,BIRTHS2012,BIRTHS2013,BIRTHS2014,BIRTHS2015,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015
Michigan,Washtenaw County,977,3826,3780,3662,3683,3709,345563,349048,351213,354289,357029,358880


In [33]:
df.loc[ [('Michigan', 'Washtenaw County'), ('Michigan', 'Wayne County')]]

STNAME,CTYNAME,BIRTHS2010,BIRTHS2011,BIRTHS2012,BIRTHS2013,BIRTHS2014,BIRTHS2015,POPESTIMATE2010,POPESTIMATE2011,POPESTIMATE2012,POPESTIMATE2013,POPESTIMATE2014,POPESTIMATE2015
Michigan,Washtenaw County,977,3826,3780,3662,3683,3709,345563,349048,351213,354289,357029,358880
Michigan,Wayne County,5918,23819,23270,23377,23607,23586,1815199,1801273,1792514,1775713,1766008,1759335
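The selections above can be summarised on a tiny stand-in frame (the population figures are made up): a single outer label returns the rows beneath it, a full tuple pins down one row, and a list of tuples picks several rows.

```python
import pandas as pd

df = pd.DataFrame(
    {'POP': [100, 40, 60, 80]},
    index=pd.MultiIndex.from_tuples(
        [('Michigan', 'Michigan'), ('Michigan', 'Washtenaw County'),
         ('Michigan', 'Wayne County'), ('Ohio', 'Ohio')],
        names=['STNAME', 'CTYNAME']))

# Outer level only: a DataFrame indexed by the remaining level (CTYNAME).
state = df.loc['Michigan']

# Full (state, county) tuple plus a column label: a single value.
wayne_pop = df.loc[('Michigan', 'Wayne County'), 'POP']

# A list of tuples selects several rows and keeps both index levels.
two = df.loc[[('Michigan', 'Washtenaw County'), ('Michigan', 'Wayne County')]]
```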


In [58]:
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])


# Your answer here
df['Store'] = df.index
df = df.reset_index()
df = df.set_index(['Store', 'Name'])
df = df.drop('index', axis=1)
df.head()

Store,Name,Cost,Item Purchased
Store 1,Chris,22.5,Dog Food
Store 1,Kevyn,2.5,Kitty Litter
Store 2,Vinod,5.0,Bird Seed


In [59]:
# Note: DataFrame.append was removed in pandas 2.0, so this cell needs an older pandas.
df.append(pd.Series({'Cost':3.0, 'Item Purchased': 'Kitty Litter'}, name=('Store 2', 'Kevyn')))

Store,Name,Cost,Item Purchased
Store 1,Chris,22.5,Dog Food
Store 1,Kevyn,2.5,Kitty Litter
Store 2,Vinod,5.0,Bird Seed
Store 2,Kevyn,3.0,Kitty Litter
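On pandas versions where `DataFrame.append` is no longer available, the same row can be added with `pd.concat`. This is a sketch of one way to do it, rebuilding the frame from scratch so it is self-contained:

```python
import pandas as pd

df = pd.DataFrame(
    {'Cost': [22.5, 2.5, 5.0],
     'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed']},
    index=pd.MultiIndex.from_tuples(
        [('Store 1', 'Chris'), ('Store 1', 'Kevyn'), ('Store 2', 'Vinod')],
        names=['Store', 'Name']))

# The new row is a one-row frame with the same columns and index levels.
new_row = pd.DataFrame(
    {'Cost': [3.0], 'Item Purchased': ['Kitty Litter']},
    index=pd.MultiIndex.from_tuples([('Store 2', 'Kevyn')],
                                    names=['Store', 'Name']))

# concat returns a new frame; df itself is left unchanged.
extended = pd.concat([df, new_row])
```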


### Missing Values

In [60]:
df = pd.read_csv('log.csv')

In [62]:
df.head(30)

Unnamed: 0,time,user,video,playback position,paused,volume
0,1469974424,cheryl,intro.html,5,False,10.0
1,1469974454,cheryl,intro.html,6,,
2,1469974544,cheryl,intro.html,9,,
3,1469974574,cheryl,intro.html,10,,
4,1469977514,bob,intro.html,1,,
5,1469977544,bob,intro.html,1,,
6,1469977574,bob,intro.html,1,,
7,1469977604,bob,intro.html,1,,
8,1469974604,cheryl,intro.html,11,,
9,1469974694,cheryl,intro.html,14,,


In [63]:
df = df.set_index('time')
df = df.sort_index()
df

time,user,video,playback position,paused,volume
1469974424,cheryl,intro.html,5,False,10.0
1469974424,sue,advanced.html,23,False,10.0
1469974454,cheryl,intro.html,6,,
1469974454,sue,advanced.html,24,,
1469974484,cheryl,intro.html,7,,
1469974514,cheryl,intro.html,8,,
1469974524,sue,advanced.html,25,,
1469974544,cheryl,intro.html,9,,
1469974554,sue,advanced.html,26,,
1469974574,cheryl,intro.html,10,,


In [64]:
df=df.reset_index()

In [65]:
df = df.set_index(['time', 'user'])
df

time,user,video,playback position,paused,volume
1469974424,cheryl,intro.html,5,False,10.0
1469974424,sue,advanced.html,23,False,10.0
1469974454,cheryl,intro.html,6,,
1469974454,sue,advanced.html,24,,
1469974484,cheryl,intro.html,7,,
1469974514,cheryl,intro.html,8,,
1469974524,sue,advanced.html,25,,
1469974544,cheryl,intro.html,9,,
1469974554,sue,advanced.html,26,,
1469974574,cheryl,intro.html,10,,
