### Topics covered:
- Taking random samples from a DataFrame.
- The .nsmallest( ) and .nlargest( ) methods.
- Filtering with the .where( ) and .query( ) methods.
- Using the .apply( ) method with a custom function on columns and rows.

In [2]:
import pandas as pd

In [30]:
bond = pd.read_csv('../datasets/jamesbond.csv', index_col='Film')
bond.sort_index(inplace=True)
bond.head(3)

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,


#### Getting a random sample of rows from a dataframe.
- By default, .sample( ) returns a single random row, so pass parameters to get more.
- Rows are sampled by default; to sample random columns instead, pass axis=1.
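The cells below sample rows only. As a minimal sketch of the two parameters not used there, the example builds a small stand-in DataFrame from rows shown in this notebook (the name `films` is hypothetical) and uses `random_state` to make the sample reproducible and `axis=1` to sample columns.

```python
import pandas as pd

# Hypothetical stand-in for the Bond DataFrame, using rows shown in this notebook.
films = pd.DataFrame(
    {
        "Year": [1985, 2006, 1962, 1964, 2012],
        "Actor": ["Roger Moore", "Daniel Craig", "Sean Connery", "Sean Connery", "Daniel Craig"],
        "Budget": [54.5, 145.3, 7.0, 18.6, 170.2],
    },
    index=pd.Index(
        ["A View to a Kill", "Casino Royale", "Dr. No", "Goldfinger", "Skyfall"], name="Film"
    ),
)

# random_state fixes the seed, so the same "random" rows come back on every run.
rows_a = films.sample(n=2, random_state=42)
rows_b = films.sample(n=2, random_state=42)

# axis=1 samples columns instead of rows (every row is kept).
two_columns = films.sample(n=2, axis=1, random_state=0)

print(rows_a)
print(two_columns.columns.tolist())
```

Fixing `random_state` is mostly useful in notebooks and tests, where you want re-running a cell to give the same output.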

In [5]:
# default
bond.sample()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6


In [6]:
# add parameters (press Shift+Tab in Jupyter to view them); n sets the number of rows returned
bond.sample(n=5)

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3
The Living Daylights,1987,Timothy Dalton,John Glen,313.5,68.8,5.2
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9


In [7]:
# return a random fraction of the rows with frac=; here, 10% of the dataset
bond.sample(frac=.10)

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6


### .nsmallest( ) and .nlargest( ) methods.
- These methods return the n rows with the smallest or largest values in the column passed as columns=.
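Ties matter here: Dr. No and On Her Majesty's Secret Service both have a Bond actor salary of 0.6. A minimal sketch using those values (the name `salaries` is hypothetical) shows the `keep` parameter, which controls what happens when several rows share the cutoff value, and compares the result with sorting.

```python
import pandas as pd

# Salaries (in millions) for three films listed in this notebook; note the tie at 0.6.
salaries = pd.DataFrame(
    {"Bond Actor Salary": [0.6, 0.6, 1.6]},
    index=pd.Index(
        ["Dr. No", "On Her Majesty's Secret Service", "From Russia with Love"], name="Film"
    ),
)

# keep="first" (the default) returns exactly n rows, preferring the earlier tied row.
first_only = salaries.nsmallest(1, columns="Bond Actor Salary")

# keep="all" returns every row tied with the cutoff value, so it can exceed n.
with_ties = salaries.nsmallest(1, columns="Bond Actor Salary", keep="all")

# Here, a stable sort followed by head(n) picks the same row as the default.
via_sort = salaries.sort_values("Bond Actor Salary", kind="stable").head(1)

print(first_only)
print(with_ties)
```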

In [9]:
# the 3 films with the largest box office gross
bond.nlargest(3, columns='Box Office')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2


In [10]:
# the 2 films with the smallest box office gross
bond.nsmallest(2, columns='Box Office')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Licence to Kill,1989,Timothy Dalton,John Glen,250.9,56.7,7.9
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1


In [11]:
# get the films with the 3 largest budgets 
bond.nlargest(3, columns='Budget')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Spectre,2015,Daniel Craig,Sam Mendes,726.7,206.3,
Quantum of Solace,2008,Daniel Craig,Marc Forster,514.2,181.4,8.1
Skyfall,2012,Daniel Craig,Sam Mendes,943.5,170.2,14.5


In [12]:
# the 5 films with the lowest Bond actor salaries
bond.nsmallest(5, columns='Bond Actor Salary')

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
On Her Majesty's Secret Service,1969,George Lazenby,Peter R. Hunt,291.5,37.3,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3


In [13]:
# the earliest Bond film (smallest Year)
bond[['Year']].nsmallest(1, columns='Year')

Film,Year
Dr. No,1962
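When only the film's name is needed rather than a one-row DataFrame, `Series.idxmin()` and `Series.idxmax()` return the index label of the smallest or largest value directly. This is a minimal sketch on a hypothetical `years` Series built from years shown in this notebook.

```python
import pandas as pd

# Release years for four films shown in this notebook, indexed by film title.
years = pd.Series(
    [1985, 1962, 1964, 2012],
    index=pd.Index(["A View to a Kill", "Dr. No", "Goldfinger", "Skyfall"], name="Film"),
    name="Year",
)

# idxmin()/idxmax() return the index label of the smallest/largest value.
first_film = years.idxmin()
latest_film = years.idxmax()
print(first_film, latest_film)
```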


### Filtering with the .where( ) method.
- Unlike a boolean-mask filter, .where( ) returns a DataFrame with the same shape as the original: rows that meet the condition keep their values, and rows that don't are filled with NaN (null).
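A minimal sketch on a single Series (the name `box_office` is hypothetical; values are box office figures shown in this notebook). It contrasts `.where( )` with boolean indexing and also shows two related pandas features not used in the cells below: the `other` argument, which replaces non-matching values with something other than NaN, and `.mask( )`, which does the inverse of `.where( )`.

```python
import pandas as pd

# Box office figures (in millions) for four films shown in this notebook.
box_office = pd.Series(
    [275.2, 448.8, 820.4, 943.5],
    index=["A View to a Kill", "Dr. No", "Goldfinger", "Skyfall"],
    name="Box Office",
)
hit = box_office > 800

# where(): keep values where the condition is True, fill the rest with NaN...
print(box_office.where(hit))

# ...or with a replacement value passed as `other`.
print(box_office.where(hit, other=0))

# mask() is the inverse: it blanks the values where the condition is True.
print(box_office.mask(hit))

# Boolean indexing drops the non-matching rows instead of blanking them.
print(box_office[hit])

# An integer column is upcast to float once NaN is introduced.
years = pd.Series([1962, 1964, 2012], index=["Dr. No", "Goldfinger", "Skyfall"])
print(years.where(years > 1963).dtype)
```

The last line explains why Year appears as 1971.0 in the .where( ) output below: NaN is a float value, so the integer column becomes float.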

In [14]:
# example of standard filter
mask = bond['Actor'] == 'Sean Connery'

bond[mask]

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2
Never Say Never Again,1983,Sean Connery,Irvin Kershner,380.0,86.0,
Thunderball,1965,Sean Connery,Terence Young,848.1,41.9,4.7
You Only Live Twice,1967,Sean Connery,Lewis Gilbert,514.2,59.9,4.4


In [16]:
# compare with the filter above: non-matching rows are kept but filled with NaN,
# and Year becomes float (1971.0) because NaN is a float value
bond.where(mask).head()

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,,,,,,
Casino Royale,,,,,,
Casino Royale,,,,,,
Diamonds Are Forever,1971.0,Sean Connery,Guy Hamilton,442.5,34.7,5.8
Die Another Day,,,,,,


In [17]:
# rows where the box office is greater than 800
bond.where(bond['Box Office']>800)

Film,Year,Actor,Director,Box Office,Budget,Bond Actor Salary
A View to a Kill,,,,,,
Casino Royale,,,,,,
Casino Royale,,,,,,
Diamonds Are Forever,,,,,,
Die Another Day,,,,,,
Dr. No,,,,,,
For Your Eyes Only,,,,,,
From Russia with Love,,,,,,
GoldenEye,,,,,,
Goldfinger,1964.0,Sean Connery,Guy Hamilton,820.4,18.6,3.2


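### Replacing spaces in column names with underscores.
- This makes the .query( ) method much easier to use: a column name containing spaces cannot be written as a bare name in a query string (since pandas 0.25 it can instead be wrapped in backticks).
- Inside a query string, boolean operators can be written as the words and, or and not, and the in operator works with a list, much like SQL.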
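A minimal sketch of the query features mentioned above, on a hypothetical four-row `films` DataFrame built from rows shown in this notebook. It shows `in` with a list literal, backtick quoting for a column name that still contains a space, and `@` to reference a Python variable (`threshold` is a hypothetical name).

```python
import pandas as pd

# Hypothetical stand-in with the original, space-containing "Box Office" column name.
films = pd.DataFrame(
    {
        "Actor": ["Roger Moore", "Sean Connery", "Daniel Craig", "Sean Connery"],
        "Director": ["John Glen", "Terence Young", "Sam Mendes", "Guy Hamilton"],
        "Box Office": [275.2, 448.8, 943.5, 820.4],
    },
    index=pd.Index(["A View to a Kill", "Dr. No", "Skyfall", "Goldfinger"], name="Film"),
)

# `in` with a list literal, much like SQL's IN (...).
connery_or_moore = films.query('Actor in ["Sean Connery", "Roger Moore"]')

# Column names containing spaces can be wrapped in backticks (pandas 0.25+).
big_hits = films.query("`Box Office` > 800")

# @name refers to a Python variable in the surrounding scope.
threshold = 400
above_threshold = films.query("`Box Office` > @threshold and Actor != 'Daniel Craig'")

print(connery_or_moore)
print(big_hits)
print(above_threshold)
```

Renaming the columns, as done below, avoids the backticks entirely and keeps query strings shorter.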

In [18]:
bond.columns

Index(['Year', 'Actor', 'Director', 'Box Office', 'Budget',
       'Bond Actor Salary'],
      dtype='object')

In [31]:
bond.columns = [column_name.replace(" ", "_") for column_name in bond.columns]

bond.head(2)

Film,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3


In [32]:
# query with two conditions joined by the word and (operators are spelled out as words)
bond.query(' Actor == "Roger Moore" and Director == "John Glen" ')

Film,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,
Octopussy,1983,Roger Moore,John Glen,373.8,53.9,7.8


### The apply( ) method.
- Works on both Series and DataFrames.
- You can pass it a custom function, for example to transform every value in a column or to build a new column from conditions on each row.
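What the function receives depends on what .apply( ) is called on. A minimal sketch on a hypothetical three-row `films` DataFrame built from values shown in this notebook: on a Series the function gets one value at a time, and on a DataFrame it gets a whole column (axis=0, the default) or a whole row (axis=1).

```python
import pandas as pd

films = pd.DataFrame(
    {"Box Office": [275.2, 448.8, 820.4], "Budget": [54.5, 7.0, 18.6]},
    index=pd.Index(["A View to a Kill", "Dr. No", "Goldfinger"], name="Film"),
)

# Series.apply: the function receives one value at a time.
labels = films["Budget"].apply(lambda budget: "big" if budget > 40 else "small")

# DataFrame.apply with axis=0 (the default): the function receives one column at a time.
column_max = films.apply(lambda column: column.max())

# DataFrame.apply with axis=1: the function receives one row at a time.
row_ratio = films.apply(lambda row: row["Box Office"] / row["Budget"], axis=1)

print(labels)
print(column_max)
print(row_ratio)
```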

#### Converting Box_Office to millions
To label Box_Office in millions, we append an "M" to each number. That requires converting each value to a string, which we can do with a small function passed to .apply( ). The example applies it to one column first, then to several columns at once.

In [33]:
# function for converting to string and adding M
def convert_to_m(num):
    return str(num) + "M"

# apply to the column (this returns a new Series; bond itself is unchanged)
bond['Box_Office'].apply(convert_to_m)

Film
A View to a Kill                   275.2M
Casino Royale                      581.5M
Casino Royale                      315.0M
Diamonds Are Forever               442.5M
Die Another Day                    465.4M
Dr. No                             448.8M
For Your Eyes Only                 449.4M
From Russia with Love              543.8M
GoldenEye                          518.5M
Goldfinger                         820.4M
Licence to Kill                    250.9M
Live and Let Die                   460.3M
Moonraker                          535.0M
Never Say Never Again              380.0M
Octopussy                          373.8M
On Her Majesty's Secret Service    291.5M
Quantum of Solace                  514.2M
Skyfall                            943.5M
Spectre                            726.7M
The Living Daylights               313.5M
The Man with the Golden Gun        334.0M
The Spy Who Loved Me               533.0M
The World Is Not Enough            439.5M
Thunderball                  

In [34]:
# apply() returns a new Series; to keep the result in the DataFrame, assign it back to bond['Box_Office']
bond['Box_Office'] = bond['Box_Office'].apply(convert_to_m)
bond.head(3)

Film,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2M,54.5,9.1
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5M,145.3,3.3
Casino Royale,1967,David Niven,Ken Hughes,315.0M,85.0,


In [35]:
# apply the same function to the salary column (missing values become the string 'nanM')
bond['Bond_Actor_Salary'] = bond['Bond_Actor_Salary'].apply(convert_to_m)
bond.head(3)

Film,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2M,54.5,9.1M
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5M,145.3,3.3M
Casino Royale,1967,David Niven,Ken Hughes,315.0M,85.0,nanM
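The "nanM" in the last row appears because str(nan) is "nan". One way to keep missing values missing, sketched here as an alternative rather than the method used in this notebook, is to swap .apply( ) for `Series.map` with `na_action="ignore"`, which skips NaN entirely. The `salary` Series is a hypothetical stand-in using the three salaries shown above.

```python
import numpy as np
import pandas as pd


def convert_to_m(num):
    return str(num) + "M"


# Salaries from the first three rows above; the 1967 Casino Royale has no salary.
salary = pd.Series(
    [9.1, 3.3, np.nan],
    index=["A View to a Kill", "Casino Royale", "Casino Royale"],
    name="Bond_Actor_Salary",
)

# na_action="ignore" skips missing values, so NaN stays NaN instead of becoming "nanM".
labelled = salary.map(convert_to_m, na_action="ignore")
print(labelled)
```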


In [37]:
# re-import the data so every column starts from its original numeric values
bond = pd.read_csv('../datasets/jamesbond.csv', index_col='Film')
bond.sort_index(inplace=True)
bond.columns = [column_name.replace(" ", "_") for column_name in bond.columns]

In [38]:
# The function can be applied to several columns at once with a for loop.
# Re-run the import above first; otherwise an already converted column gets a second "M" (e.g. "275.2MM").
columns = ['Box_Office', 'Budget', 'Bond_Actor_Salary']

for col in columns:
    bond[col] = bond[col].apply(convert_to_m)

In [39]:
# all three columns are now strings with an "M" appended to represent millions
bond.head()

Film,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary
A View to a Kill,1985,Roger Moore,John Glen,275.2M,54.5M,9.1M
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5M,145.3M,3.3M
Casino Royale,1967,David Niven,Ken Hughes,315.0M,85.0M,nanM
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5M,34.7M,5.8M
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4M,154.2M,17.9M
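Converting in place turns the numeric columns into strings, so arithmetic on them no longer works and re-running the loop stacks up "M"s. A sketch of one way around both problems, assuming you only need the "M" labels for display: convert a copy and leave the numeric DataFrame untouched. The names `bond_numbers` and `bond_display` are hypothetical, and the values are the first two rows shown above.

```python
import pandas as pd


def convert_to_m(num):
    return str(num) + "M"


bond_numbers = pd.DataFrame(
    {"Box_Office": [275.2, 581.5], "Budget": [54.5, 145.3], "Bond_Actor_Salary": [9.1, 3.3]},
    index=pd.Index(["A View to a Kill", "Casino Royale"], name="Film"),
)

columns = ["Box_Office", "Budget", "Bond_Actor_Salary"]

# Work on a copy: the numeric DataFrame stays usable for arithmetic,
# and re-running this cell starts from the numbers again (no double "M").
bond_display = bond_numbers.copy()
for col in columns:
    bond_display[col] = bond_display[col].apply(convert_to_m)

print(bond_display)
```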


### Apply to rows
- With axis=1, .apply( ) passes each row to the function as a Series, so you can derive a new column from the values in that row, including with conditional logic.
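For plain arithmetic between columns, a row-wise .apply( ) and a vectorized column operation give the same result. The vectorized form avoids calling a Python function once per row. A minimal sketch on a hypothetical three-row `films` DataFrame; the column name `Gross_Minus_Budget` is also hypothetical.

```python
import pandas as pd

films = pd.DataFrame(
    {"Box_Office": [275.2, 448.8, 943.5], "Budget": [54.5, 7.0, 170.2]},
    index=pd.Index(["A View to a Kill", "Dr. No", "Skyfall"], name="Film"),
)

# Row-wise apply: each row arrives as a Series, so values are read by column label.
gross_minus_budget = films.apply(lambda row: row["Box_Office"] - row["Budget"], axis=1)

# Vectorized column arithmetic computes the same values without a per-row Python call.
vectorized = films["Box_Office"] - films["Budget"]

films["Gross_Minus_Budget"] = vectorized
print(films)
```

Row-wise .apply( ) is most useful when the logic involves if/elif branches or string checks, as in the classification below.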

In [46]:
# import the df again
bond = pd.read_csv('../datasets/jamesbond.csv', index_col='Film')
bond.sort_index(inplace=True)
bond.columns = [column_name.replace(" ", "_") for column_name in bond.columns]

In [47]:
# define a custom classification using conditional logic
def movie_class(row):
    # with axis=1 each row arrives as a Series, so read values by column label
    actor = row['Actor']
    budget = row['Budget']

    if actor == "Pierce Brosnan":
        return "Regarded as the best."
    elif actor == "Roger Moore" and budget > 40.0:
        return "Regarded as enjoyable."
    elif actor == "Sean Connery":
        return "Regarded as classic."
    else:
        return "other"

# returns a Series
bond.apply(movie_class, axis=1)

Film
A View to a Kill                   Regarded as enjoyable.
Casino Royale                                       other
Casino Royale                                       other
Diamonds Are Forever                 Regarded as classic.
Die Another Day                     Regarded as the best.
Dr. No                               Regarded as classic.
For Your Eyes Only                 Regarded as enjoyable.
From Russia with Love                Regarded as classic.
GoldenEye                           Regarded as the best.
Goldfinger                           Regarded as classic.
Licence to Kill                                     other
Live and Let Die                                    other
Moonraker                          Regarded as enjoyable.
Never Say Never Again                Regarded as classic.
Octopussy                          Regarded as enjoyable.
On Her Majesty's Secret Service                     other
Quantum of Solace                                   other
Skyfall

In [48]:
# create the new column
bond['Class'] = bond.apply(movie_class, axis=1)

In [49]:
bond.head(10)

Film,Year,Actor,Director,Box_Office,Budget,Bond_Actor_Salary,Class
A View to a Kill,1985,Roger Moore,John Glen,275.2,54.5,9.1,Regarded as enjoyable.
Casino Royale,2006,Daniel Craig,Martin Campbell,581.5,145.3,3.3,other
Casino Royale,1967,David Niven,Ken Hughes,315.0,85.0,,other
Diamonds Are Forever,1971,Sean Connery,Guy Hamilton,442.5,34.7,5.8,Regarded as classic.
Die Another Day,2002,Pierce Brosnan,Lee Tamahori,465.4,154.2,17.9,Regarded as the best.
Dr. No,1962,Sean Connery,Terence Young,448.8,7.0,0.6,Regarded as classic.
For Your Eyes Only,1981,Roger Moore,John Glen,449.4,60.2,,Regarded as enjoyable.
From Russia with Love,1963,Sean Connery,Terence Young,543.8,12.6,1.6,Regarded as classic.
GoldenEye,1995,Pierce Brosnan,Martin Campbell,518.5,76.9,5.1,Regarded as the best.
Goldfinger,1964,Sean Connery,Guy Hamilton,820.4,18.6,3.2,Regarded as classic.
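The same intended rules can also be written without a per-row function using NumPy's `np.select`, which returns the choice for the first condition that is True, like an if/elif chain, and falls back to `default`. This is a sketch of an alternative technique rather than the notebook's approach. The four-row `films` DataFrame is a hypothetical stand-in built from rows shown above.

```python
import numpy as np
import pandas as pd

films = pd.DataFrame(
    {
        "Actor": ["Roger Moore", "Daniel Craig", "Pierce Brosnan", "Sean Connery"],
        "Budget": [54.5, 145.3, 154.2, 7.0],
    },
    index=pd.Index(["A View to a Kill", "Casino Royale", "Die Another Day", "Dr. No"], name="Film"),
)

# Conditions are checked in order, mirroring the if/elif branches of the row function.
conditions = [
    films["Actor"] == "Pierce Brosnan",
    (films["Actor"] == "Roger Moore") & (films["Budget"] > 40.0),
    films["Actor"] == "Sean Connery",
]
choices = ["Regarded as the best.", "Regarded as enjoyable.", "Regarded as classic."]

# Rows matching no condition get the default, like the final else branch.
films["Class"] = np.select(conditions, choices, default="other")
print(films)
```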
