# Everything is an Object in Python (Review)
![][1]

[1]: images/object_diag.png

--------

### All objects have a type
Every object in Python has a specific **type**. In the diagram above, we have three objects, each with its own type. The type of an object is incredibly important because it determines which attributes and methods the object has.

* Everything is an object
* Every object is of a specific type
* Every object has attributes and methods

### Dot Notation: Retrieving an attribute and calling a method
Dot notation is how we retrieve the value of an attribute or call a method.

```
>>> type(mercedes)
Car

>>> mercedes.year
2011

>>> mercedes.drive_forward(miles=10)  # mercedes drives forward 10 miles
```
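
The diagram's `Car` class is not defined anywhere in this notebook, so the following is only a minimal sketch of what such a class might look like. The `odometer` attribute and the body of `drive_forward` are assumptions made for illustration.

```python
class Car:
    """Hypothetical class used only to illustrate dot notation."""

    def __init__(self, year):
        self.year = year      # attribute: data stored on the object
        self.odometer = 0     # assumed attribute, not part of the original diagram

    def drive_forward(self, miles):
        """Method: an action the object can perform."""
        self.odometer += miles
        return self.odometer


mercedes = Car(year=2011)

print(type(mercedes).__name__)           # Car
print(mercedes.year)                     # 2011
print(mercedes.drive_forward(miles=10))  # 10
```

Retrieving an attribute (`mercedes.year`) uses no parentheses, while calling a method (`mercedes.drive_forward(...)`) does.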

# Introduction to Pandas - Selecting Subsets of Data

### Objectives

* The DataFrame is the primary object
* Components of the DataFrame - Index, Columns, Data
* Data types
* Types of missing values

## Pandas analyzes two dimensional, tabular data
The primary purpose of pandas is to do some kind of data analysis on two dimensional, **tabular** data. Tabular simply means data that is in a table with rows and columns.

## One Main Data Structure - The DataFrame
Nearly all of your time in pandas will be spent doing operations with the **DataFrame**. There are a few hundred attributes and methods, but we will only focus on the most common and powerful. The 80/20 rule applies here. 80% of the power is available from 20% of the commands.

## Three Components of the DataFrame - Index, Columns, Data
![][1]

* **Index** - Labels for rows
* **Columns** - Labels for columns
* **Data** - Each column of data is of one specific type
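
As a rough sketch, the three components can be inspected on a small hand-built DataFrame. The values are copied from the first rows of the movie table used later, and the shortened label `Pirates` is just illustrative shorthand.

```python
import pandas as pd

# Tiny stand-in for the movie table (three rows, two columns).
df = pd.DataFrame(
    {"title_year": [2009, 2007, 2015], "imdb_score": [7.9, 7.1, 6.8]},
    index=["Avatar", "Pirates", "Spectre"],
)

print(df.index)       # the row labels
print(df.columns)     # the column labels
print(df.to_numpy())  # the data as a NumPy array
```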

### More details with a real dataset
![][2]

[1]: images/dataframe_color.png
[2]: images/dataframe_anatomy.png

## The DataFrame is Complex
Although the DataFrame may appear fairly innocent, there are quite a few surprises waiting to destroy your day. Pandas has a reputation for being difficult to learn, so it will take time before you feel comfortable with it.

## Data Types
These are the primary data types. Each column must hold a single data type.
* **Boolean**
* **Integer**
* **Float**
* **Object** (mainly strings)
* **DateTime** (specific moment in time)
* **TimeDelta** (amount of time)
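
A small sketch with one column per data type may help make the list concrete. All values and column names here are made up for illustration, and the exact dtype names printed can vary between pandas versions.

```python
import pandas as pd

# Hypothetical two-row table with one column for each primary data type.
df = pd.DataFrame({
    "is_color": [True, False],                                 # Boolean
    "num_votes": [100, 250],                                   # Integer
    "imdb_score": [7.9, 7.1],                                  # Float
    "director_name": ["Director A", "Director B"],             # Object (strings)
    "release_date": pd.to_datetime(["2012-01-01", "2012-06-15"]),  # DateTime
    "duration": pd.to_timedelta([178, 169], unit="min"),       # TimeDelta
})

print(df.dtypes)
```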

## Missing Values
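With the default NumPy-backed data types, boolean and integer columns cannot hold missing values (pandas also offers nullable `boolean` and `Int64` extension types that can).
* **NaN** is the missing value for floats (it can also appear in object columns)
* **None** is a missing value for object columns
* **NaT** is the missing value for DateTime and TimeDelta columns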
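
The effect of these rules can be sketched with a few tiny Series. The variable names and values are illustrative only.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 2, 3])
with_gap = pd.Series([1, 2, np.nan])  # the NaN forces the column to float
print(ints.dtype, with_gap.dtype)

names = pd.Series(["Avatar", None], dtype=object)        # None in an object column
dates = pd.Series(pd.to_datetime(["2012-01-01", None]))  # missing date becomes NaT

print(names.isna().tolist())  # [False, True]
print(dates.isna().tolist())  # [False, True]
```

This is why several columns in the movie data that look like whole numbers (for example `title_year`) show up as `float64`: they contain missing values.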

## Reading in data into a DataFrame
Most data used for teaching beginning data exploration is in CSV format. Always use the **`read_csv`** function to load in the data.
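
As a minimal sketch, `read_csv` also accepts a file-like object, so a tiny in-memory table can stand in for a file on disk. The `csv_text` name is illustrative, and its three columns are a small excerpt of the values in the movie data.

```python
import io

import pandas as pd

csv_text = """movie_title,title_year,imdb_score
Avatar,2009,7.9
Spectre,2015,6.8
"""

# Default: pandas creates an integer index 0, 1, ...
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (2, 3)

# index_col turns one of the columns into the row labels instead.
by_title = pd.read_csv(io.StringIO(csv_text), index_col="movie_title")
print(by_title.loc["Avatar", "imdb_score"])  # 7.9
```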

### Other common data reading functions
* **`read_sql`** - Reads from a database, typically through a **SQLAlchemy** connection
* **`read_json`** - For key-value pair data

# Import pandas and read data
Commands:
* **`df.head(n)`** - Select top n rows
* **`df.shape`** - Number of rows and columns of DataFrame
* **`df.dtypes`** - An attribute to display the data types
* **`df.info()`** - Meta-data on each column

In [1]:
import pandas as pd

In [2]:
movie = pd.read_csv('data/movie.csv')
movie.head()

Unnamed: 0,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
0,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
1,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
2,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
3,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
4,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


In [3]:
movie.shape

(4916, 28)

In [4]:
movie.dtypes

color                         object
director_name                 object
num_critic_for_reviews       float64
duration                     float64
director_facebook_likes      float64
actor_3_facebook_likes       float64
actor_2_name                  object
actor_1_facebook_likes       float64
gross                        float64
genres                        object
actor_1_name                  object
movie_title                   object
num_voted_users                int64
cast_total_facebook_likes      int64
actor_3_name                  object
facenumber_in_poster         float64
plot_keywords                 object
movie_imdb_link               object
num_user_for_reviews         float64
language                      object
country                       object
content_rating                object
budget                       float64
title_year                   float64
actor_2_facebook_likes       float64
imdb_score                   float64
aspect_ratio                 float64
m

In [5]:
movie.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4916 entries, 0 to 4915
Data columns (total 28 columns):
color                        4897 non-null object
director_name                4814 non-null object
num_critic_for_reviews       4867 non-null float64
duration                     4901 non-null float64
director_facebook_likes      4814 non-null float64
actor_3_facebook_likes       4893 non-null float64
actor_2_name                 4903 non-null object
actor_1_facebook_likes       4909 non-null float64
gross                        4054 non-null float64
genres                       4916 non-null object
actor_1_name                 4909 non-null object
movie_title                  4916 non-null object
num_voted_users              4916 non-null int64
cast_total_facebook_likes    4916 non-null int64
actor_3_name                 4893 non-null object
facenumber_in_poster         4903 non-null float64
plot_keywords                4764 non-null object
movie_imdb_link              4916 non-

# Selecting Subsets of Data
The most fundamental action you can perform on a pandas DataFrame is to select particular columns, rows, or a combination of the two.

### Label vs Integer Location: The two (confusing) ways to select subsets
You can refer to particular rows or columns by either their **label** or their **integer location**.
* **Label** - This is the actual value you see labeling the row or column
* **Integer Location** - This is the integer position of the row or column, counting from 0

## Examples of Selections of Subsets of Data

### Selection of columns
![][1]

### Selection of rows
![][2]

### Selection of rows and columns
![][3]

[1]: images/just_cols.png
[2]: images/just_rows.png
[3]: images/rows_cols.png

### The three indexers - `[]`, `.loc`, `.iloc`
An **indexer** is what makes a subset selection. A description followed by some example code:
* `[]`
    * Used primarily to select one or more columns by label. 
    * Cannot be used to select rows and columns simultaneously.
    * I call it "**just the indexing operator**"
> `df[col_selection]`
* `.loc`
    * Must use labels to refer to rows or columns. 
    * Can simultaneously select rows and columns
    * Place a comma between row and column selection
> `df.loc[row_selection, col_selection]`
* `.iloc`
    * Stands for integer location
    * Must only use integers to refer to rows or columns
    * Can simultaneously select rows and columns
    * Place a comma between row and column selection
> `df.iloc[row_selection, col_selection]`   
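
The three indexers can be compared side by side on a small stand-in table. This is only a sketch: the values are copied from the first rows of the movie data and the label `Pirates` is shortened for readability.

```python
import pandas as pd

df = pd.DataFrame(
    {"title_year": [2009, 2007, 2015], "imdb_score": [7.9, 7.1, 6.8]},
    index=["Avatar", "Pirates", "Spectre"],
)

# [] : select a column by label (returns a Series)
scores = df["imdb_score"]

# .loc : rows and columns by label
by_label = df.loc["Spectre", "imdb_score"]   # 6.8

# .iloc : rows and columns by integer location
by_position = df.iloc[2, 1]                  # 6.8

# Lists select several rows/columns at once; both lines pick the same cells.
subset_loc = df.loc[["Avatar", "Spectre"], ["title_year"]]
subset_iloc = df.iloc[[0, 2], [0]]
print(subset_loc.equals(subset_iloc))        # True
```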

# Reread DataFrame with the movie title as the index
A good choice for an index is a set of labels that uniquely identify each row.

In [6]:
movie = pd.read_csv('data/movie.csv', index_col='movie_title')
movie.head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0


# Select a Single Column of Data as a Series with just the indexing operator
Place the exact label for the column you want to select within *just the indexing operator*. Think of brackets `[ ]` as grabbing onto data.

In [7]:
movie['num_critic_for_reviews']

movie_title
Avatar                                         723.0
Pirates of the Caribbean: At World's End       302.0
Spectre                                        602.0
The Dark Knight Rises                          813.0
Star Wars: Episode VII - The Force Awakens       NaN
John Carter                                    462.0
Spider-Man 3                                   392.0
Tangled                                        324.0
Avengers: Age of Ultron                        635.0
Harry Potter and the Half-Blood Prince         375.0
Batman v Superman: Dawn of Justice             673.0
Superman Returns                               434.0
Quantum of Solace                              403.0
Pirates of the Caribbean: Dead Man's Chest     313.0
The Lone Ranger                                450.0
Man of Steel                                   733.0
The Chronicles of Narnia: Prince Caspian       258.0
The Avengers                                   703.0
Pirates of the Caribbean: On Stran

Chain the **`head`** method to reduce the output

In [8]:
movie['num_critic_for_reviews'].head()

movie_title
Avatar                                        723.0
Pirates of the Caribbean: At World's End      302.0
Spectre                                       602.0
The Dark Knight Rises                         813.0
Star Wars: Episode VII - The Force Awakens      NaN
Name: num_critic_for_reviews, dtype: float64

# Examining the Series
A Series is a single column of data with labels only for the rows; it has an index but no column labels. It has many of the same attributes and methods as a DataFrame.

In [None]:
num_reviews = movie['num_critic_for_reviews']
type(num_reviews)

# Select Multiple Columns with just the indexing operator
To select multiple columns, use a list of column names

In [9]:
movie.columns

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name',
       'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link',
       'num_user_for_reviews', 'language', 'country', 'content_rating',
       'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score',
       'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [10]:
movie[['budget', 'title_year']].head()  # NOTICE the list inside the indexing operator

movie_title,budget,title_year
Avatar,237000000.0,2009.0
Pirates of the Caribbean: At World's End,300000000.0,2007.0
Spectre,245000000.0,2015.0
The Dark Knight Rises,250000000.0,2012.0
Star Wars: Episode VII - The Force Awakens,,


# Boolean Selection
Boolean selection, also known as **Boolean Indexing**, is a way to select particular **rows** of your DataFrame based on the actual values of the data.

A True/False (boolean) value is generated for each row, and this determines whether the row remains in the dataset or not. It is semantically equivalent to a **SQL `WHERE` clause**.

### Three steps to boolean selection
1. Select a single column as a Series
1. Create a boolean Series with one of the comparison operators (`<, >, <=, >=, ==, !=`)
1. Put the boolean Series inside of *just the indexing operator*

In [11]:
movie['title_year'] == 2012

movie_title
Avatar                                         False
Pirates of the Caribbean: At World's End       False
Spectre                                        False
The Dark Knight Rises                           True
Star Wars: Episode VII - The Force Awakens     False
John Carter                                     True
Spider-Man 3                                   False
Tangled                                        False
Avengers: Age of Ultron                        False
Harry Potter and the Half-Blood Prince         False
Batman v Superman: Dawn of Justice             False
Superman Returns                               False
Quantum of Solace                              False
Pirates of the Caribbean: Dead Man's Chest     False
The Lone Ranger                                False
Man of Steel                                   False
The Chronicles of Narnia: Prince Caspian       False
The Avengers                                    True
Pirates of the Caribbean: On Stran

In [12]:
# Save boolean Series to a variable
year_2012 = movie['title_year'] == 2012

# place inside
movie[year_2012].head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
John Carter,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
The Avengers,Color,Joss Whedon,703.0,173.0,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,Action|Adventure|Sci-Fi,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000
Men in Black 3,Color,Barry Sonnenfeld,451.0,106.0,188.0,718.0,Michael Stuhlbarg,10000.0,179020854.0,Action|Adventure|Comedy|Family|Fantasy|Sci-Fi,...,341.0,English,USA,PG-13,225000000.0,2012.0,816.0,6.8,1.85,40000
The Amazing Spider-Man,Color,Marc Webb,599.0,153.0,464.0,963.0,Andrew Garfield,15000.0,262030663.0,Action|Adventure|Fantasy,...,1225.0,English,USA,PG-13,230000000.0,2012.0,10000.0,7.0,2.35,56000


In [13]:
# one step
movie[movie['title_year'] == 2012].head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
John Carter,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
The Avengers,Color,Joss Whedon,703.0,173.0,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,Action|Adventure|Sci-Fi,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000
Men in Black 3,Color,Barry Sonnenfeld,451.0,106.0,188.0,718.0,Michael Stuhlbarg,10000.0,179020854.0,Action|Adventure|Comedy|Family|Fantasy|Sci-Fi,...,341.0,English,USA,PG-13,225000000.0,2012.0,816.0,6.8,1.85,40000
The Amazing Spider-Man,Color,Marc Webb,599.0,153.0,464.0,963.0,Andrew Garfield,15000.0,262030663.0,Action|Adventure|Fantasy,...,1225.0,English,USA,PG-13,230000000.0,2012.0,10000.0,7.0,2.35,56000


# Creating Complex criteria with and, or, not
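The keywords `and`, `or`, `not` do NOT work with pandas. They try to reduce an entire Series to a single True/False value, which raises a `ValueError` for a Series with more than one element. To combine boolean Series use the following notation: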
* `&` instead of `and`
* `|` instead of `or`
* `~` instead of `not`
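
A short sketch of all three operators on a three-row stand-in table (values copied from the movie data; the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {"title_year": [2012, 2012, 2009], "imdb_score": [8.5, 6.6, 7.9]},
    index=["The Dark Knight Rises", "John Carter", "Avatar"],
)

is_2012 = df["title_year"] == 2012
high_score = df["imdb_score"] > 8

print(df[is_2012 & high_score].index.tolist())  # ['The Dark Knight Rises']
print(df[is_2012 | high_score].index.tolist())  # ['The Dark Knight Rises', 'John Carter']
print(df[~is_2012].index.tolist())              # ['Avatar']

# The keyword version fails because a Series has no single truth value.
try:
    is_2012 and high_score
except ValueError as err:
    print("ValueError:", err)
```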

#### Let's find all movies made in 2012 with an imdb_score over 8

In [14]:
year_2012 = movie['title_year'] == 2012
imdb_8 = movie['imdb_score'] > 8

criteria = year_2012 & imdb_8
movie[criteria]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
The Avengers,Color,Joss Whedon,703.0,173.0,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,Action|Adventure|Sci-Fi,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000
Django Unchained,Color,Quentin Tarantino,765.0,165.0,16000.0,265.0,Christoph Waltz,29000.0,162804648.0,Drama|Western,...,1193.0,English,USA,R,100000000.0,2012.0,11000.0,8.5,2.35,199000
The Hunt,Color,Thomas Vinterberg,349.0,115.0,346.0,26.0,Alexandra Rapaport,74.0,610968.0,Drama,...,249.0,Danish,Denmark,R,3800000.0,2012.0,69.0,8.3,2.35,60000
The Act of Killing,Color,Joshua Oppenheimer,248.0,96.0,50.0,0.0,Herman Koto,3.0,484221.0,Biography|Crime|Documentary|History,...,107.0,Indonesian,UK,Not Rated,1000000.0,2012.0,3.0,8.2,1.85,20000
The Other Dream Team,Color,Marius A. Markevicius,26.0,89.0,6.0,8.0,Greg Speirs,14.0,133778.0,Documentary|Sport,...,9.0,English,USA,Not Rated,500000.0,2012.0,9.0,8.4,,0
Archaeology of a Woman,Color,Sharon Greytak,3.0,94.0,0.0,178.0,Alex Emanuel,433.0,,Drama,...,3.0,English,USA,,200000.0,2012.0,375.0,8.1,1.78,66


### Use parentheses if doing it in one line
Surround each expression with parentheses
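
The parentheses matter because of operator precedence: `&` binds more tightly than comparison operators such as `==` and `>`. A small sketch with a made-up two-row frame (the `num_votes` column and its values are hypothetical) shows the difference:

```python
import pandas as pd

df = pd.DataFrame({"title_year": [2012, 2009], "num_votes": [10, 3]})

# With parentheses: each comparison is evaluated first, then combined with &.
ok = (df["title_year"] == 2012) & (df["num_votes"] > 5)
print(ok.tolist())  # [True, False]

# Without parentheses Python first evaluates `2012 & df["num_votes"]`, leaving a
# chained comparison that needs a single True/False value from a Series.
try:
    df["title_year"] == 2012 & df["num_votes"] > 5
    raised = False
except ValueError:
    raised = True

print(raised)  # True
```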

In [15]:
criteria = (movie['title_year'] == 2012) & (movie['imdb_score'] > 8)
movie[criteria]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
The Avengers,Color,Joss Whedon,703.0,173.0,0.0,19000.0,Robert Downey Jr.,26000.0,623279547.0,Action|Adventure|Sci-Fi,...,1722.0,English,USA,PG-13,220000000.0,2012.0,21000.0,8.1,1.85,123000
Django Unchained,Color,Quentin Tarantino,765.0,165.0,16000.0,265.0,Christoph Waltz,29000.0,162804648.0,Drama|Western,...,1193.0,English,USA,R,100000000.0,2012.0,11000.0,8.5,2.35,199000
The Hunt,Color,Thomas Vinterberg,349.0,115.0,346.0,26.0,Alexandra Rapaport,74.0,610968.0,Drama,...,249.0,Danish,Denmark,R,3800000.0,2012.0,69.0,8.3,2.35,60000
The Act of Killing,Color,Joshua Oppenheimer,248.0,96.0,50.0,0.0,Herman Koto,3.0,484221.0,Biography|Crime|Documentary|History,...,107.0,Indonesian,UK,Not Rated,1000000.0,2012.0,3.0,8.2,1.85,20000
The Other Dream Team,Color,Marius A. Markevicius,26.0,89.0,6.0,8.0,Greg Speirs,14.0,133778.0,Documentary|Sport,...,9.0,English,USA,Not Rated,500000.0,2012.0,9.0,8.4,,0
Archaeology of a Woman,Color,Sharon Greytak,3.0,94.0,0.0,178.0,Alex Emanuel,433.0,,Drama,...,3.0,English,USA,,200000.0,2012.0,375.0,8.1,1.78,66


In [None]:
criteria = (movie['title_year'] == 2012) & (movie['imdb_score'] > 8)
cols = ['color', 'duration']
movie.loc[criteria, cols]

# Your Turn
Use the movie dataset for the following problems

### Problem 1
<span  style="color:green; font-size:16px">Select the column **`movie_facebook_likes`** as a Series, save it to a variable with the same name, and output its first 10 values.</span>

In [17]:
# your code here
movie_facebook_likes = movie['movie_facebook_likes']

movie_facebook_likes.head(10)

movie_title
Avatar                                         33000
Pirates of the Caribbean: At World's End           0
Spectre                                        85000
The Dark Knight Rises                         164000
Star Wars: Episode VII - The Force Awakens         0
John Carter                                    24000
Spider-Man 3                                       0
Tangled                                        29000
Avengers: Age of Ultron                       118000
Harry Potter and the Half-Blood Prince         10000
Name: movie_facebook_likes, dtype: int64

### Problem 2
<span  style="color:green; font-size:16px">Select three columns and then select those same three in a different order</span>

In [25]:
# your code here

movie.columns

movie[['color','director_name','num_critic_for_reviews']]


movie[['director_name','num_critic_for_reviews','color']]

movie_title,director_name,num_critic_for_reviews,color
Avatar,James Cameron,723.0,Color
Pirates of the Caribbean: At World's End,Gore Verbinski,302.0,Color
Spectre,Sam Mendes,602.0,Color
The Dark Knight Rises,Christopher Nolan,813.0,Color
Star Wars: Episode VII - The Force Awakens,Doug Walker,,
John Carter,Andrew Stanton,462.0,Color
Spider-Man 3,Sam Raimi,392.0,Color
Tangled,Nathan Greno,324.0,Color
Avengers: Age of Ultron,Joss Whedon,635.0,Color
Harry Potter and the Half-Blood Prince,David Yates,375.0,Color


### Problem 3
<span  style="color:green; font-size:16px">Use boolean indexing to select all movies with 0 movie facebook likes.</span>

In [26]:
# your code here

# Save boolean Series to a variable
movie_facebook_likes = movie['movie_facebook_likes'] == 0

# place inside
movie[movie_facebook_likes].head()


movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
Spider-Man 3,Color,Sam Raimi,392.0,156.0,0.0,4000.0,James Franco,24000.0,336530303.0,Action|Adventure|Romance,...,1902.0,English,USA,PG-13,258000000.0,2007.0,11000.0,6.2,2.35,0
Superman Returns,Color,Bryan Singer,434.0,169.0,0.0,903.0,Marlon Brando,18000.0,200069408.0,Action|Adventure|Sci-Fi,...,2367.0,English,USA,PG-13,209000000.0,2006.0,10000.0,6.1,2.35,0
Quantum of Solace,Color,Marc Forster,403.0,106.0,395.0,393.0,Mathieu Amalric,451.0,168368427.0,Action|Adventure,...,1243.0,English,UK,PG-13,200000000.0,2008.0,412.0,6.7,2.35,0


### Problem 4
<span  style="color:green; font-size:16px">Use boolean indexing to select all movies that don't have 0 movie facebook likes.</span>

In [27]:
# your code here
# Save boolean Series to a variable
movie_facebook_likes = movie['movie_facebook_likes'] != 0

# place inside
movie[movie_facebook_likes].head()

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Avatar,Color,James Cameron,723.0,178.0,0.0,855.0,Joel David Moore,1000.0,760505847.0,Action|Adventure|Fantasy|Sci-Fi,...,3054.0,English,USA,PG-13,237000000.0,2009.0,936.0,7.9,1.78,33000
Spectre,Color,Sam Mendes,602.0,148.0,0.0,161.0,Rory Kinnear,11000.0,200074175.0,Action|Adventure|Thriller,...,994.0,English,UK,PG-13,245000000.0,2015.0,393.0,6.8,2.35,85000
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
John Carter,Color,Andrew Stanton,462.0,132.0,475.0,530.0,Samantha Morton,640.0,73058679.0,Action|Adventure|Sci-Fi,...,738.0,English,USA,PG-13,263700000.0,2012.0,632.0,6.6,2.35,24000
Tangled,Color,Nathan Greno,324.0,100.0,15.0,284.0,Donna Murphy,799.0,200807262.0,Adventure|Animation|Comedy|Family|Fantasy|Musi...,...,387.0,English,USA,PG,260000000.0,2010.0,553.0,7.8,1.85,29000


### Problem 5
<span  style="color:green; font-size:16px">Use boolean indexing to select all movies with more than 50,000 likes but less than 100,000</span>

In [None]:
# your code here

# Save boolean Series to a variable
likes = movie['movie_facebook_likes']
criteria = (likes > 50000) & (likes < 100000)

# place inside
movie[criteria].head()

### Problem 6
<span  style="color:green; font-size:16px">Use boolean indexing to select movies with facebook likes less than 1000 or greater than 100,000.</span>

In [30]:
# your code here

criteria = (movie['movie_facebook_likes'] < 1000) | (movie['movie_facebook_likes'] > 100000)
movie[criteria]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
Pirates of the Caribbean: At World's End,Color,Gore Verbinski,302.0,169.0,563.0,1000.0,Orlando Bloom,40000.0,309404152.0,Action|Adventure|Fantasy,...,1238.0,English,USA,PG-13,300000000.0,2007.0,5000.0,7.1,2.35,0
The Dark Knight Rises,Color,Christopher Nolan,813.0,164.0,22000.0,23000.0,Christian Bale,27000.0,448130642.0,Action|Thriller,...,2701.0,English,USA,PG-13,250000000.0,2012.0,23000.0,8.5,2.35,164000
Star Wars: Episode VII - The Force Awakens,,Doug Walker,,,131.0,,Rob Walker,131.0,,Documentary,...,,,,,,,12.0,7.1,,0
Spider-Man 3,Color,Sam Raimi,392.0,156.0,0.0,4000.0,James Franco,24000.0,336530303.0,Action|Adventure|Romance,...,1902.0,English,USA,PG-13,258000000.0,2007.0,11000.0,6.2,2.35,0
Avengers: Age of Ultron,Color,Joss Whedon,635.0,141.0,0.0,19000.0,Robert Downey Jr.,26000.0,458991599.0,Action|Adventure|Sci-Fi,...,1117.0,English,USA,PG-13,250000000.0,2015.0,21000.0,7.5,2.35,118000
Batman v Superman: Dawn of Justice,Color,Zack Snyder,673.0,183.0,0.0,2000.0,Lauren Cohan,15000.0,330249062.0,Action|Adventure|Sci-Fi,...,3018.0,English,USA,PG-13,250000000.0,2016.0,4000.0,6.9,2.35,197000
Superman Returns,Color,Bryan Singer,434.0,169.0,0.0,903.0,Marlon Brando,18000.0,200069408.0,Action|Adventure|Sci-Fi,...,2367.0,English,USA,PG-13,209000000.0,2006.0,10000.0,6.1,2.35,0
Quantum of Solace,Color,Marc Forster,403.0,106.0,395.0,393.0,Mathieu Amalric,451.0,168368427.0,Action|Adventure,...,1243.0,English,UK,PG-13,200000000.0,2008.0,412.0,6.7,2.35,0
Man of Steel,Color,Zack Snyder,733.0,143.0,0.0,748.0,Christopher Meloni,15000.0,291021565.0,Action|Adventure|Fantasy|Sci-Fi,...,2536.0,English,USA,PG-13,225000000.0,2013.0,3000.0,7.2,2.35,118000
The Chronicles of Narnia: Prince Caspian,Color,Andrew Adamson,258.0,150.0,80.0,201.0,Pierfrancesco Favino,22000.0,141614023.0,Action|Adventure|Family|Fantasy,...,438.0,English,USA,PG,225000000.0,2008.0,216.0,6.6,2.35,0


### Problem 7
<span  style="color:green; font-size:16px">Use boolean indexing to select movies with facebook likes less than 1000 but greater than 0 or greater than 100,000.</span>

In [32]:
# your code here
criteria = (movie['movie_facebook_likes'] < 1000) & ((movie['movie_facebook_likes'] > 0) | (movie['movie_facebook_likes'] > 100000))
movie[criteria]

movie_title,color,director_name,num_critic_for_reviews,duration,director_facebook_likes,actor_3_facebook_likes,actor_2_name,actor_1_facebook_likes,gross,genres,...,num_user_for_reviews,language,country,content_rating,budget,title_year,actor_2_facebook_likes,imdb_score,aspect_ratio,movie_facebook_likes
The Lovers,Color,Roland Joffé,10.0,109.0,596.0,283.0,Alice Englert,622.0,,Action|Adventure|Romance|Sci-Fi,...,15.0,English,Belgium,R,,2015.0,525.0,4.5,,677
Harry Potter and the Deathly Hallows: Part II,Color,Matt Birch,1.0,,0.0,159.0,Dave Legeno,10000.0,,Action|Fantasy,...,2.0,English,UK,,,2011.0,570.0,7.5,,40
Harry Potter and the Deathly Hallows: Part I,Color,Matt Birch,4.0,,0.0,1000.0,Toby Jones,10000.0,,Fantasy,...,2.0,English,UK,,,2010.0,2000.0,6.4,,25
Stuart Little 2,Color,Rob Minkoff,71.0,77.0,50.0,537.0,Brad Garrett,886.0,64736114.0,Adventure|Animation|Comedy|Family|Fantasy,...,69.0,English,USA,PG,120000000.0,2002.0,799.0,5.4,1.85,459
Asterix at the Olympic Games,Color,Frédéric Forestier,33.0,116.0,0.0,141.0,Santiago Segura,936.0,,Adventure|Comedy|Family|Fantasy,...,36.0,French,France,,78000000.0,2008.0,276.0,5.1,2.35,291
Home on the Range,Color,Will Finn,104.0,76.0,6.0,421.0,Roseanne Barr,12000.0,50026353.0,Animation|Comedy|Family|Music|Western,...,88.0,English,USA,PG,110000000.0,2004.0,513.0,5.4,1.78,304
Speed 2: Cruise Control,Color,Jan de Bont,79.0,121.0,101.0,202.0,Temuera Morrison,673.0,48068396.0,Action|Crime|Romance|Thriller,...,248.0,English,USA,PG-13,160000000.0,1997.0,368.0,3.7,2.35,894
The Cat in the Hat,Color,Bo Welch,109.0,82.0,34.0,434.0,Kelly Preston,760.0,100446895.0,Adventure|Comedy|Family|Fantasy,...,456.0,English,USA,PG,109000000.0,2003.0,743.0,3.8,1.85,946
Town & Country,Color,Peter Chelsom,62.0,104.0,23.0,591.0,Warren Beatty,752.0,6712451.0,Comedy|Romance,...,89.0,English,New Line,R,90000000.0,2001.0,631.0,4.4,1.85,53
Son of the Mask,Color,Lawrence Guterman,78.0,94.0,6.0,227.0,Traylor Howard,490.0,17010646.0,Comedy|Family|Fantasy,...,239.0,English,USA,PG,84000000.0,2005.0,294.0,2.2,1.85,881


### Problem 8
<span  style="color:green; font-size:16px">How many movies have more than 100,000 facebook likes?</span>

In [40]:
# your code here
criteria = movie['movie_facebook_likes'] > 100000
movie[criteria].shape

(43, 27)

# Solutions

In [36]:
movie = pd.read_csv('data/movie.csv', index_col='movie_title')

### Problem 1
<span  style="color:green; font-size:16px">Select the column **`movie_facebook_likes`** as a Series, save it to a variable with the same name and output its first 10 values.</span>

In [37]:
movie_facebook_likes = movie['movie_facebook_likes']
movie_facebook_likes.head(10)

movie_title
Avatar                                         33000
Pirates of the Caribbean: At World's End           0
Spectre                                        85000
The Dark Knight Rises                         164000
Star Wars: Episode VII - The Force Awakens         0
John Carter                                    24000
Spider-Man 3                                       0
Tangled                                        29000
Avengers: Age of Ultron                       118000
Harry Potter and the Half-Blood Prince         10000
Name: movie_facebook_likes, dtype: int64

### Problem 2
<span  style="color:green; font-size:16px">Select three columns and then select those same three in a different order</span>

In [41]:
movie.columns

Index(['color', 'director_name', 'num_critic_for_reviews', 'duration',
       'director_facebook_likes', 'actor_3_facebook_likes', 'actor_2_name',
       'actor_1_facebook_likes', 'gross', 'genres', 'actor_1_name',
       'num_voted_users', 'cast_total_facebook_likes', 'actor_3_name',
       'facenumber_in_poster', 'plot_keywords', 'movie_imdb_link',
       'num_user_for_reviews', 'language', 'country', 'content_rating',
       'budget', 'title_year', 'actor_2_facebook_likes', 'imdb_score',
       'aspect_ratio', 'movie_facebook_likes'],
      dtype='object')

In [43]:
cols = ['duration', 'director_name', 'num_critic_for_reviews']
movie[cols].head()

movie_title,duration,director_name,num_critic_for_reviews
Avatar,178.0,James Cameron,723.0
Pirates of the Caribbean: At World's End,169.0,Gore Verbinski,302.0
Spectre,148.0,Sam Mendes,602.0
The Dark Knight Rises,164.0,Christopher Nolan,813.0
Star Wars: Episode VII - The Force Awakens,,Doug Walker,


In [44]:
cols = cols[::-1]
movie[cols].head()

movie_title,num_critic_for_reviews,director_name,duration
Avatar,723.0,James Cameron,178.0
Pirates of the Caribbean: At World's End,302.0,Gore Verbinski,169.0
Spectre,602.0,Sam Mendes,148.0
The Dark Knight Rises,813.0,Christopher Nolan,164.0
Star Wars: Episode VII - The Force Awakens,,Doug Walker,
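
The order of the labels in the list passed to the indexing operator determines the column order of the result, and `cols[::-1]` builds a reversed copy of that list. A minimal sketch on a tiny made-up DataFrame (the column names `a`, `b`, `c` are placeholders, not columns of `movie.csv`):

```python
import pandas as pd

# Hypothetical three-column DataFrame, used only for illustration
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

cols = ['c', 'a', 'b']
subset = df[cols]                 # columns come back in the order listed
reversed_subset = df[cols[::-1]]  # same columns, reversed order

print(list(subset.columns))
print(list(reversed_subset.columns))
```

The slice `cols[::-1]` leaves the original list unchanged, so reassigning it to `cols` (as in the cell above) is what actually flips the order used afterwards.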


### Problem 3
<span  style="color:green; font-size:16px">Use boolean indexing to select all movies with 0 movie facebook likes.</span>

In [45]:
movie_facebook_likes[movie_facebook_likes == 0].head()

movie_title
Pirates of the Caribbean: At World's End      0
Star Wars: Episode VII - The Force Awakens    0
Spider-Man 3                                  0
Superman Returns                              0
Quantum of Solace                             0
Name: movie_facebook_likes, dtype: int64

### Problem 4
<span  style="color:green; font-size:16px">Use boolean indexing to select all movies that don't have 0 movie facebook likes.</span>

In [46]:
movie_facebook_likes[movie_facebook_likes != 0].head()

movie_title
Avatar                    33000
Spectre                   85000
The Dark Knight Rises    164000
John Carter               24000
Tangled                   29000
Name: movie_facebook_likes, dtype: int64
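
The same selection can also be written by inverting the Problem 3 mask with the `~` operator. A minimal sketch on a small made-up Series (the labels and counts are illustrative and not taken from `movie.csv`):

```python
import pandas as pd

# Hypothetical like counts, used only for illustration
likes = pd.Series([0, 33000, 0, 85000], index=['A', 'B', 'C', 'D'], name='likes')

is_zero = likes == 0             # boolean mask: True where likes equal 0
not_zero_a = likes[likes != 0]   # direct "not equal" comparison
not_zero_b = likes[~is_zero]     # invert the existing mask with ~

print(not_zero_a.equals(not_zero_b))
```

Both forms should select the same rows; `~` is mostly useful when the mask has already been built and stored in a variable.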

### Problem 5
<span  style="color:green; font-size:16px">Use boolean indexing to select all movies with more than 50,000 likes but fewer than 100,000.</span>

In [47]:
criteria = (movie_facebook_likes > 50000) & (movie_facebook_likes < 100000)
movie_facebook_likes[criteria].head()

movie_title
Spectre                                        85000
Pirates of the Caribbean: On Stranger Tides    58000
The Hobbit: The Battle of the Five Armies      65000
The Amazing Spider-Man                         56000
The Hobbit: The Desolation of Smaug            83000
Name: movie_facebook_likes, dtype: int64
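
A range test like this one can also be expressed with `Series.between`. The sketch below uses a small made-up Series rather than `movie.csv`, and assumes pandas 1.3 or later, where `inclusive` accepts the string `'neither'` for strict bounds:

```python
import pandas as pd

# Hypothetical like counts, used only for illustration
likes = pd.Series([50000, 58000, 85000, 100000, 164000], name='likes')

# Two explicit comparisons combined with &
manual = (likes > 50000) & (likes < 100000)

# Same strict bounds via Series.between (inclusive='neither' assumes pandas >= 1.3)
via_between = likes.between(50000, 100000, inclusive='neither')

print(likes[via_between].tolist())
```

By default `between` includes both endpoints, so the `inclusive` argument matters whenever the problem asks for strictly more or strictly fewer.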

### Problem 6
<span  style="color:green; font-size:16px">Use boolean indexing to select movies with facebook likes less than 10,000 or greater than 100,000.</span>

In [48]:
criteria = (movie_facebook_likes < 10000) | (movie_facebook_likes > 100000)
movie_facebook_likes[criteria].head(10)

movie_title
Pirates of the Caribbean: At World's End           0
The Dark Knight Rises                         164000
Star Wars: Episode VII - The Force Awakens         0
Spider-Man 3                                       0
Avengers: Age of Ultron                       118000
Batman v Superman: Dawn of Justice            197000
Superman Returns                                   0
Quantum of Solace                                  0
Pirates of the Caribbean: Dead Man's Chest      5000
Man of Steel                                  118000
Name: movie_facebook_likes, dtype: int64

### Problem 7
<span  style="color:green; font-size:16px">Use boolean indexing to select movies with facebook likes greater than 0 but less than 10,000, or greater than 100,000.</span>

In [49]:
criteria = ((movie_facebook_likes > 0) & (movie_facebook_likes < 10000)) | (movie_facebook_likes > 100000)
movie_facebook_likes[criteria].head(10)

movie_title
The Dark Knight Rises                                 164000
Avengers: Age of Ultron                               118000
Batman v Superman: Dawn of Justice                    197000
Pirates of the Caribbean: Dead Man's Chest              5000
Man of Steel                                          118000
The Avengers                                          123000
Jurassic World                                        150000
World War Z                                           129000
The Great Gatsby                                      115000
Indiana Jones and the Kingdom of the Crystal Skull      5000
Name: movie_facebook_likes, dtype: int64
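
With three conditions, the placement of the parentheses decides which rows are kept. The sketch below compares two groupings on a small made-up Series (the values are illustrative, not from `movie.csv`), using the same thresholds as the solution code:

```python
import pandas as pd

# Hypothetical like counts, used only for illustration
likes = pd.Series([0, 500, 5000, 50000, 150000], name='likes')

# Grouping used in the solution: (low range) OR (very high)
low = (likes > 0) & (likes < 10000)
high = likes > 100000
grouped = likes[low | high]

# A different grouping: upper bound AND (either lower-bound test)
regrouped = likes[(likes < 10000) & ((likes > 0) | (likes > 100000))]

print(grouped.tolist())
print(regrouped.tolist())
```

In this sketch the second grouping drops the value above 100,000, because every row must first pass the `< 10000` test. Naming each piece (`low`, `high`) before combining them is one way to make the intended grouping explicit.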

### Problem 8
<span  style="color:green; font-size:16px">How many movies have more than 100,000 facebook likes?</span>

In [50]:
len(movie_facebook_likes[movie_facebook_likes > 100000])

43

In [51]:
# alternatively, check the shape; a Series has a one-element shape tuple
movie_facebook_likes[movie_facebook_likes > 100000].shape

(43,)
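
Another way to count matches is to sum the boolean mask directly, since `True` counts as 1 and `False` as 0. A minimal sketch on a made-up Series (the values are illustrative, not from `movie.csv`):

```python
import pandas as pd

# Hypothetical like counts, used only for illustration
likes = pd.Series([0, 164000, 118000, 5000], name='likes')

mask = likes > 100000
count_via_sum = int(mask.sum())   # each True contributes 1
count_via_len = len(likes[mask])  # filter first, then count rows

print(count_via_sum, count_via_len)
```

Summing the mask avoids building the filtered Series, which can be convenient when only the count is needed.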