In [10]:
import pandas as pd
import numpy as np

## Querying a Series

In [2]:
sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}

In [8]:
s1 = pd.Series(sports)

In [9]:
s1

Archery           Bhutan
Golf            Scotland
Sumo               Japan
Taekwondo    South Korea
dtype: object
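As a minimal sketch of what querying this Series can look like, the example below looks up entries by position with `iloc` and by label with `loc`. The variable names `by_position`, `by_label`, and `subset` are illustrative and not part of the original notebook.

```python
import pandas as pd

sports = {'Archery': 'Bhutan',
          'Golf': 'Scotland',
          'Sumo': 'Japan',
          'Taekwondo': 'South Korea'}
s1 = pd.Series(sports)

# Positional lookup: the fourth entry (positions start at 0).
by_position = s1.iloc[3]

# Label lookup: the entry whose index label is 'Golf'.
by_label = s1.loc['Golf']

# A list of labels returns a smaller Series rather than a single value.
subset = s1.loc[['Archery', 'Sumo']]

print(by_position)  # South Korea
print(by_label)     # Scotland
print(subset)
```

Spelling out `iloc` or `loc` keeps the intent clear when the index labels are themselves integers, where plain `s[0]` could be read either way.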

In [13]:
s2 = pd.Series(np.random.randint(0,1000,10000))

In [15]:
print(len(s2))
s2.head(5)

10000


0    853
1     88
2    577
3    516
4    289
dtype: int64

In [18]:
%%timeit -n 100 #limit to 100 loops
total = 0
for item in s2:
    total += item

1.05 ms ± 66.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [19]:
%%timeit -n 100 #limit to 100 loops
total = s2.sum() #vectorized operation - way faster!

71.2 µs ± 18.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [24]:
%%timeit -n 10
s2 = pd.Series(np.random.randint(0,1000,10000))
for index, value in s2.items():  # Series.iteritems() was removed in pandas 2.0
    s2.loc[index] = value+2

497 ms ± 5.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [26]:
%%timeit -n 10
s2 = pd.Series(np.random.randint(0,1000,10000))
s2 += 2

441 µs ± 75.4 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
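The timings above compare speed only. The sketch below checks that the looped and vectorized forms give the same results; it makes no timing claims. The seeded generator and the names `loop_total`, `vector_total`, and `shifted` are assumptions added for repeatability and illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for repeatability
s = pd.Series(rng.integers(0, 1000, 10000))

# Explicit Python loop: one addition per element.
loop_total = 0
for item in s:
    loop_total += item

# Vectorized reduction: the iteration happens inside compiled NumPy code.
vector_total = s.sum()

# Broadcasting: the scalar 2 is added to every element without a Python loop.
shifted = s + 2

print(loop_total == vector_total)
print((shifted - s).eq(2).all())
```

Both checks should print `True`, so switching to the vectorized form changes the running time and not the result.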


---
## DataFrame Data Structure

`Loading & Operations` For the purchase records from the pet store, how would you update the DataFrame, applying a discount of 20% across all the values in the 'Cost' column?

In [38]:
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

In [39]:
df

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 1 | Kevyn | Kitty Litter | 2.5 |
| Store 2 | Vinod | Bird Seed | 5.0 |


In [40]:
# Your answer here
df.loc[:,'Cost'] *= (1 - 0.2)  

In [41]:
df

| | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 18.0 |
| Store 1 | Kevyn | Kitty Litter | 2.0 |
| Store 2 | Vinod | Bird Seed | 4.0 |


`Querying` Write a query to return all of the names of people who bought products worth more than $3.00.

In [42]:
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

In [46]:
# Your code here
df.query('Cost > 3.00')['Name']

Store 1    Chris
Store 2    Vinod
Name: Name, dtype: object

In [51]:
df['Name'][df['Cost'] > 3.00]

Store 1    Chris
Store 2    Vinod
Name: Name, dtype: object
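Both queries rely on a boolean mask. As a rough illustration, the sketch below builds the mask as a separate step and then passes it to `.loc` together with the column label. The DataFrame is rebuilt from a dict here for brevity, and the names `mask` and `names` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chris', 'Kevyn', 'Vinod'],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Cost': [22.50, 2.50, 5.00]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# Comparing a column to a scalar yields one True/False per row.
mask = df['Cost'] > 3.00

# .loc takes the row mask and the column label in a single indexing step.
names = df.loc[mask, 'Name']

print(mask.tolist())   # [True, False, True]
print(names.tolist())  # ['Chris', 'Vinod']
```

Selecting rows and column in one `.loc` call avoids chained indexing, which matters mostly when the result is later assigned to.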

`Indexing` Reindex the purchase records DataFrame so that it is indexed hierarchically, first by store, then by person. Name these index levels 'Location' and 'Name'. Then add a new entry with the following values:

Name: 'Kevyn', Item Purchased: 'Kitty Food', Cost: 3.00, Location: 'Store 2'.

In [97]:
purchase_1 = pd.Series({'Name': 'Chris',
                        'Item Purchased': 'Dog Food',
                        'Cost': 22.50})
purchase_2 = pd.Series({'Name': 'Kevyn',
                        'Item Purchased': 'Kitty Litter',
                        'Cost': 2.50})
purchase_3 = pd.Series({'Name': 'Vinod',
                        'Item Purchased': 'Bird Seed',
                        'Cost': 5.00})

df = pd.DataFrame([purchase_1, purchase_2, purchase_3], index=['Store 1', 'Store 1', 'Store 2'])

In [98]:
# Your answer here
df.set_index([df.index, 'Name'], inplace = True)
df.index.names = ['Location', 'Name']
df

| Location | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 1 | Kevyn | Kitty Litter | 2.5 |
| Store 2 | Vinod | Bird Seed | 5.0 |
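One possible alternative that avoids `inplace=True` is sketched below: name the existing index with `rename_axis`, then append the 'Name' column as a second level with `set_index(..., append=True)`. The name `df_multi` is illustrative, and the DataFrame is rebuilt from a dict for brevity.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Chris', 'Kevyn', 'Vinod'],
                   'Item Purchased': ['Dog Food', 'Kitty Litter', 'Bird Seed'],
                   'Cost': [22.50, 2.50, 5.00]},
                  index=['Store 1', 'Store 1', 'Store 2'])

# Name the store index, then add 'Name' as the inner index level.
df_multi = df.rename_axis('Location').set_index('Name', append=True)

print(list(df_multi.index.names))  # ['Location', 'Name']
print(df_multi)
```

Because each call returns a new object, the original `df` is left unchanged, which makes the cell safe to re-run.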


In [99]:
# DataFrame.append was removed in pandas 2.0; concatenate a one-row frame instead.
idx = pd.MultiIndex.from_tuples([('Store 2', 'Kevyn')], names=['Location', 'Name'])
df = pd.concat([df, pd.DataFrame({'Item Purchased': ['Kitty Food'], 'Cost': [3.00]}, index=idx)])
df

| Location | Name | Item Purchased | Cost |
|---|---|---|---|
| Store 1 | Chris | Dog Food | 22.5 |
| Store 1 | Kevyn | Kitty Litter | 2.5 |
| Store 2 | Vinod | Bird Seed | 5.0 |
| Store 2 | Kevyn | Kitty Food | 3.0 |
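With the hierarchical index in place, rows can be looked up by a full or partial key. The sketch below assumes the four-row result of the exercise, with the new entry under 'Store 2' as the prompt specifies, and rebuilds it directly. The names `kevyn_store2`, `store2`, and `all_kevyn` are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('Store 1', 'Chris'), ('Store 1', 'Kevyn'),
     ('Store 2', 'Vinod'), ('Store 2', 'Kevyn')],
    names=['Location', 'Name'])
df = pd.DataFrame({'Item Purchased': ['Dog Food', 'Kitty Litter',
                                      'Bird Seed', 'Kitty Food'],
                   'Cost': [22.50, 2.50, 5.00, 3.00]},
                  index=index)

# A full (Location, Name) tuple selects a single row as a Series.
kevyn_store2 = df.loc[('Store 2', 'Kevyn'), :]

# A partial key selects every row under that Location.
store2 = df.loc['Store 2']

# xs selects on the inner level by its name.
all_kevyn = df.xs('Kevyn', level='Name')

print(kevyn_store2['Item Purchased'])  # Kitty Food
print(store2.index.tolist())           # ['Vinod', 'Kevyn']
print(all_kevyn['Cost'].tolist())      # [2.5, 3.0]
```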
