# Pandas
* Python package for data analysis
* Built on top of NumPy
* Powerful Series and DataFrame classes
    * A Series object is a column (vector) of fixed size and homogeneous data
    * A DataFrame object is a 2-dimensional table (matrix) of mutable size and heterogeneous data
        * A row is also known as an entry
        * Each entry has an index
        * A column is also known as an attribute
    * Can be used to clean, transform, and analyze the data
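As a minimal sketch of the two classes described above (the fruit data and the names `apples` and `table` are made up for illustration), a Series holds values of one dtype, while each column of a DataFrame can have its own:

```python
import pandas as pd

# A Series: one labeled column of values sharing a single dtype
apples = pd.Series([3, 2, 0, 1], name='apples')
print(apples.dtype)

# A DataFrame: a table whose columns may have different dtypes
table = pd.DataFrame({'buyer': ['June', 'Robert', 'Lily', 'David'],
                      'apples': [3, 2, 0, 1],
                      'price': [0.5, 0.5, 0.6, 0.4]})
print(table.dtypes)

# Each column of a DataFrame is itself a Series
print(type(table['apples']))
```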

## 1. Importing the packages

In [42]:
import numpy as np
import pandas as pd

## 2. DataFrame creation
### 2.1. From a Python dictionary
* By default, row indices run from 0 to n-1, where n is the number of rows
* Indices can be specified in the constructor

In [43]:
data = {'apples': [3, 2, 0, 1],
        'oranges': [0, 3, 7, 2]}
df1 = pd.DataFrame(data)
print(df1)
df2 = pd.DataFrame(data, index = ['June', 'Robert', 'Lily', 'David'])
print(df2)

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2
        apples  oranges
June         3        0
Robert       2        3
Lily         0        7
David        1        2


### 2.2. From a NumPy array
* Provide a name for each column; otherwise the columns are simply numbered 0, 1, ...

In [44]:
A = np.array([[3, 0], [2, 3], [0, 7], [1, 2]])
df3 = pd.DataFrame(data = A, columns = ['apples', 'oranges'])
print(df3)

   apples  oranges
0       3        0
1       2        3
2       0        7
3       1        2


### 2.3. From a .csv or .json file
* Use index_col to indicate which column in the file holds the row indices

In [45]:
df4 = pd.read_csv('datasets/purchases.csv', index_col = 0)
print(df4)

        apples  oranges
June         3        0
Robert       2        3
Lily         0        7
David        1        2
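The same idea can be tried without any file on disk. The sketch below reads CSV and JSON text from in-memory buffers. The inline data is made up to mirror the purchases table, the header name `name` is hypothetical, and `orient='index'` assumes the JSON maps each row label to a record.

```python
import io
import pandas as pd

csv_text = """name,apples,oranges
June,3,0
Robert,2,3
Lily,0,7
David,1,2
"""
# index_col=0 turns the first column of the file into the row labels
from_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)

json_text = """{"June":   {"apples": 3, "oranges": 0},
 "Robert": {"apples": 2, "oranges": 3},
 "Lily":   {"apples": 0, "oranges": 7},
 "David":  {"apples": 1, "oranges": 2}}"""
# orient='index' treats the outer keys as row labels;
# convert_axes=False keeps those labels as plain strings
from_json = pd.read_json(io.StringIO(json_text), orient='index',
                         convert_axes=False)

print(from_csv)
print(from_json)
```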


## 3. Viewing the data
### 3.1. First 5 entries

In [46]:
df = pd.read_csv('datasets/IMDB-Movie-Data.csv', index_col = 'Title')
df.head()

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0


### 3.2. Information about the attributes

In [47]:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, Guardians of the Galaxy to Nine Lives
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Genre               1000 non-null   object 
 2   Description         1000 non-null   object 
 3   Director            1000 non-null   object 
 4   Actors              1000 non-null   object 
 5   Year                1000 non-null   int64  
 6   Runtime (Minutes)   1000 non-null   int64  
 7   Rating              1000 non-null   float64
 8   Votes               1000 non-null   int64  
 9   Revenue (Millions)  872 non-null    float64
 10  Metascore           936 non-null    float64
dtypes: float64(3), int64(4), object(4)
memory usage: 93.8+ KB


## 4. Fixing the data
The summary above shows that although the data has 1000 entries, two of the attributes are incomplete: Revenue (Millions) has only 872 non-null values and Metascore has 936. Missing values will cause problems when the data is used for machine learning.
There are 3 different approaches to fixing the data.
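Before choosing an approach, it can help to count the missing values per attribute. A minimal sketch on a made-up four-row table (not the IMDB file):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Rating': [8.1, 7.0, 7.3, 6.2],
                    'Revenue (Millions)': [333.13, np.nan, 138.12, np.nan],
                    'Metascore': [76.0, 65.0, np.nan, 40.0]},
                   index=['A', 'B', 'C', 'D'])

# isna() marks each missing cell; summing counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Rows that have at least one missing value
incomplete_rows = toy[toy.isna().any(axis=1)]
print(incomplete_rows)
```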
### 4.1. Get rid of the corresponding entries

In [48]:
dfClean1 = df.dropna(subset = ['Revenue (Millions)', 'Metascore'])
dfClean1.info()

<class 'pandas.core.frame.DataFrame'>
Index: 838 entries, Guardians of the Galaxy to Nine Lives
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                838 non-null    int64  
 1   Genre               838 non-null    object 
 2   Description         838 non-null    object 
 3   Director            838 non-null    object 
 4   Actors              838 non-null    object 
 5   Year                838 non-null    int64  
 6   Runtime (Minutes)   838 non-null    int64  
 7   Rating              838 non-null    float64
 8   Votes               838 non-null    int64  
 9   Revenue (Millions)  838 non-null    float64
 10  Metascore           838 non-null    float64
dtypes: float64(3), int64(4), object(4)
memory usage: 78.6+ KB


### 4.2. Get rid of the whole attribute

In [49]:
dfClean2 = df.drop('Revenue (Millions)', axis = 1)
dfClean2 = dfClean2.drop('Metascore', axis = 1)
dfClean2.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, Guardians of the Galaxy to Nine Lives
Data columns (total 9 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Rank               1000 non-null   int64  
 1   Genre              1000 non-null   object 
 2   Description        1000 non-null   object 
 3   Director           1000 non-null   object 
 4   Actors             1000 non-null   object 
 5   Year               1000 non-null   int64  
 6   Runtime (Minutes)  1000 non-null   int64  
 7   Rating             1000 non-null   float64
 8   Votes              1000 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 78.1+ KB


### 4.3. Set the values to some constant (zero, the mean, the median, etc.)

In [50]:
medianRevenue = df["Revenue (Millions)"].median()
medianMetascore = df["Metascore"].median()
dfClean3 = df.fillna({"Revenue (Millions)":medianRevenue, "Metascore":medianMetascore})
dfClean3.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1000 entries, Guardians of the Galaxy to Nine Lives
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Genre               1000 non-null   object 
 2   Description         1000 non-null   object 
 3   Director            1000 non-null   object 
 4   Actors              1000 non-null   object 
 5   Year                1000 non-null   int64  
 6   Runtime (Minutes)   1000 non-null   int64  
 7   Rating              1000 non-null   float64
 8   Votes               1000 non-null   int64  
 9   Revenue (Millions)  1000 non-null   float64
 10  Metascore           1000 non-null   float64
dtypes: float64(3), int64(4), object(4)
memory usage: 93.8+ KB
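
The three approaches trade rows, columns, and accuracy against each other. The sketch below applies each one to a small made-up table (shortened column names, invented values) so the effect on the shape is easy to compare:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Rating': [8.1, 7.0, 7.3, 6.2, 5.5],
                    'Revenue': [333.0, np.nan, 138.0, 325.0, np.nan],
                    'Metascore': [76.0, 65.0, np.nan, 40.0, 46.0]})

# 4.1: drop every row with a missing Revenue or Metascore
rows_dropped = toy.dropna(subset=['Revenue', 'Metascore'])

# 4.2: drop both incomplete columns in a single call
cols_dropped = toy.drop(columns=['Revenue', 'Metascore'])

# 4.3: fill each gap with that column's median (median() skips NaN)
filled = toy.fillna({'Revenue': toy['Revenue'].median(),
                     'Metascore': toy['Metascore'].median()})

print(rows_dropped.shape, cols_dropped.shape, filled.shape)
```

Dropping rows keeps only complete entries, dropping columns keeps every entry but loses two attributes, and filling keeps the full shape at the cost of invented values.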


## 5. Statistics about the numerical data
### 5.1. Summary
Null values are ignored.

In [51]:
df.describe()

,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
count,1000.0,1000.0,1000.0,1000.0,1000.0,872.0,936.0
mean,500.5,2012.783,113.172,6.7232,169808.3,82.956376,58.985043
std,288.819436,3.205962,18.810908,0.945429,188762.6,103.25354,17.194757
min,1.0,2006.0,66.0,1.9,61.0,0.0,11.0
25%,250.75,2010.0,100.0,6.2,36309.0,13.27,47.0
50%,500.5,2014.0,111.0,6.8,110799.0,47.985,59.5
75%,750.25,2016.0,123.0,7.4,239909.8,113.715,72.0
max,1000.0,2016.0,191.0,9.0,1791916.0,936.63,100.0
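
To see what "ignored" means in practice, the sketch below uses a made-up column with one gap: the count and the mean are computed from the non-null values only.

```python
import numpy as np
import pandas as pd

revenue = pd.Series([10.0, 20.0, np.nan, 30.0], name='Revenue')

# count() only counts non-null values, so it differs from len()
print(len(revenue), revenue.count())

# mean() skips NaN by default: (10 + 20 + 30) / 3
print(revenue.mean())

# skipna=False makes the gap propagate instead
print(revenue.mean(skipna=False))
```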


### 5.2. Correlations
The correlation coefficient ranges from -1 to 1. A value close to 1 indicates a strong positive correlation. For example, Revenue and Votes have a coefficient of about 0.64 in the table below, so the revenue tends to go up when the number of votes increases. A value close to -1 indicates a strong negative correlation, and a value close to 0 indicates little or no linear relationship.

In [52]:
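# numeric_only=True skips the text columns (available since pandas 1.5, needed from 2.0 on)
df.corr(numeric_only=True)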

,Rank,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Rank,1.0,-0.261605,-0.221739,-0.219555,-0.283876,-0.271592,-0.191869
Year,-0.261605,1.0,-0.1649,-0.211219,-0.411904,-0.12679,-0.079305
Runtime (Minutes),-0.221739,-0.1649,1.0,0.392214,0.407062,0.267953,0.211978
Rating,-0.219555,-0.211219,0.392214,1.0,0.511537,0.217654,0.631897
Votes,-0.283876,-0.411904,0.407062,0.511537,1.0,0.639661,0.325684
Revenue (Millions),-0.271592,-0.12679,0.267953,0.217654,0.639661,1.0,0.142397
Metascore,-0.191869,-0.079305,0.211978,0.631897,0.325684,0.142397,1.0
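
As a small worked example of the two extremes, the made-up columns below are perfectly linearly related, so the coefficients land at the ends of the range (up to floating-point rounding):

```python
import pandas as pd

toy = pd.DataFrame({'x': [1, 2, 3, 4, 5],
                    'double_x': [2, 4, 6, 8, 10],   # rises with x
                    'minus_x': [10, 8, 6, 4, 2]})   # falls as x rises

# Pearson correlation between every pair of numeric columns
corr = toy.corr()
print(corr.round(2))

# Correlation between two individual columns
print(toy['x'].corr(toy['minus_x']))
```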


## 6. Indexing and slicing
Indexing a column is similar to looking up a key in a Python dictionary, and slicing is similar to slicing a Python list.
### 6.1. Indexing by column name
* Extract one column of the DataFrame into a Series

In [53]:
genre = df['Genre']
genre

Title
Guardians of the Galaxy     Action,Adventure,Sci-Fi
Prometheus                 Adventure,Mystery,Sci-Fi
Split                               Horror,Thriller
Sing                        Animation,Comedy,Family
Suicide Squad              Action,Adventure,Fantasy
                                     ...           
Secret in Their Eyes            Crime,Drama,Mystery
Hostel: Part II                              Horror
Step Up 2: The Streets          Drama,Music,Romance
Search Party                       Adventure,Comedy
Nine Lives                    Comedy,Family,Fantasy
Name: Genre, Length: 1000, dtype: object

* Extract one column of the DataFrame into another DataFrame

In [54]:
genreDF = df[['Genre']]
genreDF

Title,Genre
Guardians of the Galaxy,"Action,Adventure,Sci-Fi"
Prometheus,"Adventure,Mystery,Sci-Fi"
Split,"Horror,Thriller"
Sing,"Animation,Comedy,Family"
Suicide Squad,"Action,Adventure,Fantasy"
...,...
Secret in Their Eyes,"Crime,Drama,Mystery"
Hostel: Part II,Horror
Step Up 2: The Streets,"Drama,Music,Romance"
Search Party,"Adventure,Comedy"


* Extract two columns of the DataFrame into another DataFrame

In [55]:
newDF = df[['Genre', 'Rating']]
newDF

Title,Genre,Rating
Guardians of the Galaxy,"Action,Adventure,Sci-Fi",8.1
Prometheus,"Adventure,Mystery,Sci-Fi",7.0
Split,"Horror,Thriller",7.3
Sing,"Animation,Comedy,Family",7.2
Suicide Squad,"Action,Adventure,Fantasy",6.2
...,...,...
Secret in Their Eyes,"Crime,Drama,Mystery",6.2
Hostel: Part II,Horror,5.5
Step Up 2: The Streets,"Drama,Music,Romance",6.2
Search Party,"Adventure,Comedy",5.6


### 6.2. Indexing by column number
* Extract one column into a Series

In [56]:
genre = df.iloc[:,1]
genre

Title
Guardians of the Galaxy     Action,Adventure,Sci-Fi
Prometheus                 Adventure,Mystery,Sci-Fi
Split                               Horror,Thriller
Sing                        Animation,Comedy,Family
Suicide Squad              Action,Adventure,Fantasy
                                     ...           
Secret in Their Eyes            Crime,Drama,Mystery
Hostel: Part II                              Horror
Step Up 2: The Streets          Drama,Music,Romance
Search Party                       Adventure,Comedy
Nine Lives                    Comedy,Family,Fantasy
Name: Genre, Length: 1000, dtype: object

* Extract multiple columns into a DataFrame using slicing

In [57]:
columns = df.iloc[:,1:4]
columns

Title,Genre,Description,Director
Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn
Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott
Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan
Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet
Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer
...,...,...,...
Secret in Their Eyes,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray
Hostel: Part II,Horror,Three American college students studying abroa...,Eli Roth
Step Up 2: The Streets,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu
Search Party,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong


### 6.3. Indexing by row name
* Extract one row into a Series

In [58]:
movie = df.loc['Prometheus']
movie

Rank                                                                  2
Genre                                          Adventure,Mystery,Sci-Fi
Description           Following clues to the origin of mankind, a te...
Director                                                   Ridley Scott
Actors                Noomi Rapace, Logan Marshall-Green, Michael Fa...
Year                                                               2012
Runtime (Minutes)                                                   124
Rating                                                                7
Votes                                                            485820
Revenue (Millions)                                               126.46
Metascore                                                            65
Name: Prometheus, dtype: object

* Extract multiple rows into a DataFrame using slicing

In [59]:
movies = df.loc['Prometheus':'Sing']
movies

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0


### 6.4. Indexing by row number
* Extract one row into a Series

In [60]:
movie = df.iloc[5]
movie

Rank                                                                  6
Genre                                          Action,Adventure,Fantasy
Description           European mercenaries searching for black powde...
Director                                                    Yimou Zhang
Actors                    Matt Damon, Tian Jing, Willem Dafoe, Andy Lau
Year                                                               2016
Runtime (Minutes)                                                   103
Rating                                                              6.1
Votes                                                             56036
Revenue (Millions)                                                45.13
Metascore                                                            42
Name: The Great Wall, dtype: object

* Extract multiple rows into a DataFrame using slicing

In [61]:
movies = df.iloc[20:25]
movies

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Gold,21,"Adventure,Drama,Thriller","Kenny Wells, a prospector desperate for a luck...",Stephen Gaghan,"Matthew McConaughey, Edgar Ramírez, Bryce Dall...",2016,120,6.7,19053,7.22,49.0
Manchester by the Sea,22,Drama,A depressed uncle is asked to take care of his...,Kenneth Lonergan,"Casey Affleck, Michelle Williams, Kyle Chandle...",2016,137,7.9,134213,47.7,96.0
Hounds of Love,23,"Crime,Drama,Horror",A cold-blooded predatory couple while cruising...,Ben Young,"Emma Booth, Ashleigh Cummings, Stephen Curry,S...",2016,108,6.7,1115,,72.0
Trolls,24,"Animation,Adventure,Comedy","After the Bergens invade Troll Village, Poppy,...",Walt Dohrn,"Anna Kendrick, Justin Timberlake,Zooey Deschan...",2016,92,6.5,38552,153.69,56.0
Independence Day: Resurgence,25,"Action,Adventure,Sci-Fi",Two decades after the first Independence Day i...,Roland Emmerich,"Liam Hemsworth, Jeff Goldblum, Bill Pullman,Ma...",2016,120,5.3,127553,103.14,32.0
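
One difference between the two slicing styles is easy to miss: a label slice with `loc` includes its end label, while a positional slice with `iloc` excludes its end position, as in plain Python. A minimal sketch with a made-up table (shortened titles):

```python
import pandas as pd

toy = pd.DataFrame({'Rank': [1, 2, 3, 4, 5]},
                   index=['Guardians', 'Prometheus', 'Split', 'Sing', 'Squad'])

# Label slice: both endpoints are included
by_label = toy.loc['Prometheus':'Sing']
print(list(by_label.index))

# Position slice: the end position (3) is excluded
by_position = toy.iloc[1:3]
print(list(by_position.index))
```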


### 6.5. Conditional indexing

In [62]:
RS = df[df['Director'] == 'Ridley Scott']
RS

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0
The Martian,103,"Adventure,Drama,Sci-Fi",An astronaut becomes stranded on Mars after hi...,Ridley Scott,"Matt Damon, Jessica Chastain, Kristen Wiig, Ka...",2015,144,8.0,556097,228.43,80.0
Robin Hood,388,"Action,Adventure,Drama","In 12th century England, Robin and his band of...",Ridley Scott,"Russell Crowe, Cate Blanchett, Matthew Macfady...",2010,140,6.7,221117,105.22,53.0
American Gangster,471,"Biography,Crime,Drama","In 1970s America, a detective works to bring d...",Ridley Scott,"Denzel Washington, Russell Crowe, Chiwetel Eji...",2007,157,7.8,337835,130.13,76.0
Exodus: Gods and Kings,517,"Action,Adventure,Drama",The defiant leader Moses rises up against the ...,Ridley Scott,"Christian Bale, Joel Edgerton, Ben Kingsley, S...",2014,150,6.0,137299,65.01,52.0
The Counselor,522,"Crime,Drama,Thriller",A lawyer finds himself in over his head when h...,Ridley Scott,"Michael Fassbender, Penélope Cruz, Cameron Dia...",2013,117,5.3,84927,16.97,48.0
A Good Year,531,"Comedy,Drama,Romance",A British investment broker inherits his uncle...,Ridley Scott,"Russell Crowe, Abbie Cornish, Albert Finney, M...",2006,117,6.9,74674,7.46,47.0
Body of Lies,738,"Action,Drama,Romance",A CIA agent on the ground in Jordan hunts down...,Ridley Scott,"Leonardo DiCaprio, Russell Crowe, Mark Strong,...",2008,128,7.1,182305,39.38,57.0


In [63]:
goodMovies = df[df['Rating'] >= 8.6]
goodMovies

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
Interstellar,37,"Adventure,Drama,Sci-Fi",A team of explorers travel through a wormhole ...,Christopher Nolan,"Matthew McConaughey, Anne Hathaway, Jessica Ch...",2014,169,8.6,1047747,187.99,74.0
The Dark Knight,55,"Action,Crime,Drama",When the menace known as the Joker wreaks havo...,Christopher Nolan,"Christian Bale, Heath Ledger, Aaron Eckhart,Mi...",2008,152,9.0,1791916,533.32,82.0
Inception,81,"Action,Adventure,Sci-Fi","A thief, who steals corporate secrets through ...",Christopher Nolan,"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen...",2010,148,8.8,1583625,292.57,74.0
Kimi no na wa,97,"Animation,Drama,Fantasy",Two strangers find themselves linked in a biza...,Makoto Shinkai,"Ryûnosuke Kamiki, Mone Kamishiraishi, Ryô Nari...",2016,106,8.6,34110,4.68,79.0
Dangal,118,"Action,Biography,Drama",Former wrestler Mahavir Singh Phogat and his t...,Nitesh Tiwari,"Aamir Khan, Sakshi Tanwar, Fatima Sana Shaikh,...",2016,161,8.8,48969,11.15,
The Intouchables,250,"Biography,Comedy,Drama",After he becomes a quadriplegic from a paragli...,Olivier Nakache,"François Cluzet, Omar Sy, Anne Le Ny, Audrey F...",2011,112,8.6,557965,13.18,57.0
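
Conditions can also be combined. In the sketch below (a small table with illustrative values), `&` means "and", `|` means "or", and each condition needs its own parentheses because of operator precedence; `isin` tests membership in a list.

```python
import pandas as pd

toy = pd.DataFrame({'Director': ['Ridley Scott', 'Christopher Nolan',
                                 'Ridley Scott', 'David Ayer'],
                    'Rating': [7.0, 8.8, 8.0, 6.2]},
                   index=['Prometheus', 'Inception',
                          'The Martian', 'Suicide Squad'])

# Both conditions must hold
good_scott = toy[(toy['Director'] == 'Ridley Scott') & (toy['Rating'] >= 8.0)]

# Either condition may hold
scott_or_good = toy[(toy['Director'] == 'Ridley Scott') | (toy['Rating'] >= 8.0)]

# Membership test against several values
selected = toy[toy['Director'].isin(['David Ayer', 'Christopher Nolan'])]

print(good_scott, scott_or_good, selected, sep='\n\n')
```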


## 7. Creating new attributes
* Combine attributes

In [64]:
df['Revenue per second ($)'] = (df['Revenue (Millions)'] * 1000000) / (df['Runtime (Minutes)'] * 60)
df

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,Revenue per second ($)
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76.0,45885.674931
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65.0,16997.311828
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62.0,19675.213675
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59.0,41716.049383
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40.0,44040.650407
...,...,...,...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,996,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...",2015,111,6.2,27585,,45.0,
Hostel: Part II,997,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.54,46.0,3109.929078
Step Up 2: The Streets,998,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.01,50.0,9865.646259
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,,22.0,
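
The unit conversion can be checked by hand on the first row: 333.13 million dollars over 121 minutes is 333,130,000 / 7,260 seconds, or about 45,885.67 dollars per second. The sketch below repeats the calculation on two rows copied from the output above and shows that a missing revenue yields a missing result.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Revenue (Millions)': [333.13, np.nan],
                    'Runtime (Minutes)': [121, 93]},
                   index=['Guardians of the Galaxy', 'Search Party'])

# Column arithmetic is element-wise: dollars divided by seconds, row by row
dollars = toy['Revenue (Millions)'] * 1_000_000
seconds = toy['Runtime (Minutes)'] * 60
toy['Revenue per second ($)'] = dollars / seconds

print(toy)
```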


* Modify or add attributes by applying a function to an existing attribute

In [65]:
def add10(value):
    return value + 10
df['Metascore'] = df['Metascore'].apply(add10)
df

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,Revenue per second ($)
Guardians of the Galaxy,1,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,86.0,45885.674931
Prometheus,2,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,75.0,16997.311828
Split,3,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,72.0,19675.213675
Sing,4,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,69.0,41716.049383
Suicide Squad,5,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,50.0,44040.650407
...,...,...,...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,996,"Crime,Drama,Mystery","A tight-knit team of rising investigators, alo...",Billy Ray,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...",2015,111,6.2,27585,,55.0,
Hostel: Part II,997,Horror,Three American college students studying abroa...,Eli Roth,"Lauren German, Heather Matarazzo, Bijou Philli...",2007,94,5.5,73152,17.54,56.0,3109.929078
Step Up 2: The Streets,998,"Drama,Music,Romance",Romantic sparks occur between two dance studen...,Jon M. Chu,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",2008,98,6.2,70699,58.01,60.0,9865.646259
Search Party,999,"Adventure,Comedy",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",2014,93,5.6,4881,,32.0,


In [66]:
def rating_function(x):
    # Map a numeric rating to a category label
    if x >= 8.0:
        return 'good'
    elif x >= 6.0:
        return 'decent'
    else:
        return 'bad'
df['RatingCategory'] = df['Rating'].apply(rating_function)
df[['Rating', 'RatingCategory']]

Title,Rating,RatingCategory
Guardians of the Galaxy,8.1,good
Prometheus,7.0,decent
Split,7.3,decent
Sing,7.2,decent
Suicide Squad,6.2,decent
...,...,...
Secret in Their Eyes,6.2,decent
Hostel: Part II,5.5,bad
Step Up 2: The Streets,6.2,decent
Search Party,5.6,bad
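
For simple rules, `apply` with a Python function is not the only option. As a sketch of an alternative (not the approach used above), the same results can be obtained with column arithmetic and `pd.cut`. The bin edges below are chosen to mirror `rating_function`, and the data and the column name `Metascore plus 10` are made up.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({'Rating': [8.1, 8.0, 7.0, 6.0, 5.5],
                    'Metascore': [76.0, 80.0, 65.0, np.nan, 46.0]})

# Equivalent to apply(add10): arithmetic on a column is element-wise
toy['Metascore plus 10'] = toy['Metascore'] + 10

# right=False makes each bin include its left edge: [6, 8) and [8, inf)
toy['RatingCategory'] = pd.cut(toy['Rating'],
                               bins=[-np.inf, 6.0, 8.0, np.inf],
                               labels=['bad', 'decent', 'good'],
                               right=False)
print(toy)
```

Note that `pd.cut` returns a categorical column rather than plain strings, which may or may not be what later steps expect.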


## 8. Sorting
* Sort by row labels

In [67]:
sortedDF = df.sort_index()
sortedDF

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,Revenue per second ($),RatingCategory
(500) Days of Summer,508,"Comedy,Drama,Romance",An offbeat romantic comedy about a woman who d...,Marc Webb,"Zooey Deschanel, Joseph Gordon-Levitt, Geoffre...",2009,95,7.7,398972,32.39,86.0,5682.456140,decent
10 Cloverfield Lane,119,"Drama,Horror,Mystery","After getting in a car accident, a woman is he...",Dan Trachtenberg,"John Goodman, Mary Elizabeth Winstead, John Ga...",2016,104,7.2,192968,71.90,86.0,11522.435897,decent
10 Years,697,"Comedy,Drama,Romance","The night before their high school reunion, a ...",Jamie Linden,"Channing Tatum, Rosario Dawson, Chris Pratt, J...",2011,100,6.1,19636,0.20,,33.333333,decent
12 Years a Slave,112,"Biography,Drama,History","In the antebellum United States, Solomon North...",Steve McQueen,"Chiwetel Ejiofor, Michael Kenneth Williams, Mi...",2013,134,8.1,486338,56.67,106.0,7048.507463,good
127 Hours,818,"Adventure,Biography,Drama",An adventurous mountain climber becomes trappe...,Danny Boyle,"James Franco, Amber Tamblyn, Kate Mara, Sean Bott",2010,94,7.6,294010,18.33,92.0,3250.000000,decent
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Zipper,545,"Drama,Thriller",A successful family man with a blossoming poli...,Mora Stephens,"Patrick Wilson, Lena Headey, Ray Winstone,Rich...",2015,103,5.7,4912,,49.0,,bad
Zodiac,278,"Crime,Drama,History","In the late 1960s/early 1970s, a San Francisco...",David Fincher,"Jake Gyllenhaal, Robert Downey Jr., Mark Ruffa...",2007,157,7.7,329683,33.05,88.0,3508.492569,decent
Zombieland,364,"Adventure,Comedy,Horror",A shy student trying to reach his family in Oh...,Ruben Fleischer,"Jesse Eisenberg, Emma Stone, Woody Harrelson,A...",2009,88,7.7,409403,75.59,83.0,14316.287879,decent
Zoolander 2,432,Comedy,Derek and Hansel are lured into modeling again...,Ben Stiller,"Ben Stiller, Owen Wilson, Penélope Cruz, Will ...",2016,102,4.7,48297,28.84,44.0,4712.418301,bad


* Sort the columns by name

In [68]:
sortedDF = df.sort_index(axis=1)
sortedDF

Title,Actors,Description,Director,Genre,Metascore,Rank,Rating,RatingCategory,Revenue (Millions),Revenue per second ($),Runtime (Minutes),Votes,Year
Guardians of the Galaxy,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",A group of intergalactic criminals are forced ...,James Gunn,"Action,Adventure,Sci-Fi",86.0,1,8.1,good,333.13,45885.674931,121,757074,2014
Prometheus,"Noomi Rapace, Logan Marshall-Green, Michael Fa...","Following clues to the origin of mankind, a te...",Ridley Scott,"Adventure,Mystery,Sci-Fi",75.0,2,7.0,decent,126.46,16997.311828,124,485820,2012
Split,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"Horror,Thriller",72.0,3,7.3,decent,138.12,19675.213675,117,157606,2016
Sing,"Matthew McConaughey,Reese Witherspoon, Seth Ma...","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Animation,Comedy,Family",69.0,4,7.2,decent,270.32,41716.049383,108,60545,2016
Suicide Squad,"Will Smith, Jared Leto, Margot Robbie, Viola D...",A secret government agency recruits some of th...,David Ayer,"Action,Adventure,Fantasy",50.0,5,6.2,decent,325.02,44040.650407,123,393727,2016
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Secret in Their Eyes,"Chiwetel Ejiofor, Nicole Kidman, Julia Roberts...","A tight-knit team of rising investigators, alo...",Billy Ray,"Crime,Drama,Mystery",55.0,996,6.2,decent,,,111,27585,2015
Hostel: Part II,"Lauren German, Heather Matarazzo, Bijou Philli...",Three American college students studying abroa...,Eli Roth,Horror,56.0,997,5.5,bad,17.54,3109.929078,94,73152,2007
Step Up 2: The Streets,"Robert Hoffman, Briana Evigan, Cassie Ventura,...",Romantic sparks occur between two dance studen...,Jon M. Chu,"Drama,Music,Romance",60.0,998,6.2,decent,58.01,9865.646259,98,70699,2008
Search Party,"Adam Pally, T.J. Miller, Thomas Middleditch,Sh...",A pair of friends embark on a mission to reuni...,Scot Armstrong,"Adventure,Comedy",32.0,999,5.6,bad,,,93,4881,2014


* Sort by the values of a column (ascending by default)

In [69]:
sortedDF = df.sort_values(by='Rating')
sortedDF

Title,Rank,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore,Revenue per second ($),RatingCategory
Disaster Movie,830,Comedy,"Over the course of one evening, an unsuspectin...",Jason Friedberg,"Carmen Electra, Vanessa Lachey,Nicole Parker, ...",2008,87,1.9,77207,14.17,25.0,2714.559387,bad
Don't Fuck in the Woods,43,Horror,A group of friends are going on a camping trip...,Shawn Burkett,"Brittany Blanton, Ayse Howard, Roman Jossart,N...",2016,73,2.7,496,,,,bad
Dragonball Evolution,872,"Action,Adventure,Fantasy",The young warrior Son Goku sets out on a quest...,James Wong,"Justin Chatwin, James Marsters, Yun-Fat Chow, ...",2009,85,2.7,59512,9.35,55.0,1833.333333,bad
Tall Men,648,"Fantasy,Horror,Thriller",A challenged man is stalked by tall phantoms i...,Jonathan Holbrook,"Dan Crisafulli, Kay Whitney, Richard Garcia, P...",2016,133,3.2,173,,67.0,,bad
Wrecker,969,"Action,Horror,Thriller",Best friends Emily and Lesley go on a road tri...,Micheal Bafaro,"Anna Hutchison, Andrea Whitburn, Jennifer Koen...",2015,83,3.5,1210,,47.0,,bad
...,...,...,...,...,...,...,...,...,...,...,...,...,...
Interstellar,37,"Adventure,Drama,Sci-Fi",A team of explorers travel through a wormhole ...,Christopher Nolan,"Matthew McConaughey, Anne Hathaway, Jessica Ch...",2014,169,8.6,1047747,187.99,84.0,18539.447732,good
The Intouchables,250,"Biography,Comedy,Drama",After he becomes a quadriplegic from a paragli...,Olivier Nakache,"François Cluzet, Omar Sy, Anne Le Ny, Audrey F...",2011,112,8.6,557965,13.18,67.0,1961.309524,good
Dangal,118,"Action,Biography,Drama",Former wrestler Mahavir Singh Phogat and his t...,Nitesh Tiwari,"Aamir Khan, Sakshi Tanwar, Fatima Sana Shaikh,...",2016,161,8.8,48969,11.15,,1154.244306,good
Inception,81,"Action,Adventure,Sci-Fi","A thief, who steals corporate secrets through ...",Christopher Nolan,"Leonardo DiCaprio, Joseph Gordon-Levitt, Ellen...",2010,148,8.8,1583625,292.57,84.0,32947.072072,good
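
Sorting options can be combined. A minimal sketch on a few rows copied from the outputs above: `ascending=False` puts the highest values first, a list of columns breaks ties using the later columns, and `nlargest` is a shortcut for the top rows.

```python
import pandas as pd

toy = pd.DataFrame({'Rating': [8.6, 8.8, 8.6, 9.0],
                    'Votes': [1047747, 1583625, 557965, 1791916]},
                   index=['Interstellar', 'Inception',
                          'The Intouchables', 'The Dark Knight'])

# Highest rating first
best_first = toy.sort_values(by='Rating', ascending=False)

# Sort by rating, then break ties by votes, both descending
tie_broken = toy.sort_values(by=['Rating', 'Votes'], ascending=[False, False])

# Shortcut for the top n rows of one column
top2 = toy.nlargest(2, 'Rating')

print(tie_broken)
```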
