# Python Pandas Tutorial

The *pandas* package is the most important tool at the disposal of Data Scientists and Analysts working in Python today. The powerful machine learning and glamorous visualization tools may get all the attention, but pandas is the backbone of most data projects.

>\[*pandas*\] is derived from the term "**pan**el **da**ta", an econometrics term for data sets that include observations over multiple time periods for the same individuals. — [Wikipedia](https://en.wikipedia.org/wiki/Pandas_%28software%29)

If you're thinking about data science as a career, then it is imperative that one of the first things you do is learn pandas. We will go over the essential bits of information about pandas, including how to install it, its uses, and how it works with other common Python data analysis packages such as **Matplotlib** and **scikit-learn**.

<img src="assets/the-rise-in-popularity-of-pandas.png" width=500px />

## What's Pandas for?

This tool is essentially your data’s home. Through pandas, you get acquainted with your data by cleaning, transforming, and analyzing it.

For example, say you want to explore a dataset stored in a CSV on your computer. Pandas will extract the data from that CSV into a DataFrame — a table, basically — then let you do things like:

- Calculate statistics and answer questions about the data, like:
    - What's the average, median, max, or min of each column?
    - Does column A correlate with column B?
    - What does the distribution of data in column C look like?
- Clean the data by doing things like removing missing values and filtering rows or columns by some criteria
- Visualize the data with help from Matplotlib. Plot bars, lines, histograms, bubbles, and more.
- Store the cleaned, transformed data back into a CSV, another file, or a database


Before you jump into the modeling or the complex visualizations, you need to have a good understanding of the nature of your dataset, and pandas is the best avenue through which to do that.
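The workflow described above can be sketched in a few lines. This is a minimal illustration, not a recipe: the column names `A`, `B`, and `C` and the CSV contents are made up, and the file is held in memory with `io.StringIO` so the example runs without any file on disk. Plotting is left out to keep the sketch free of extra dependencies.

```python
import io
import pandas as pd

# Hypothetical CSV contents, held in memory so the example is self-contained.
csv_text = """A,B,C
1,2.0,10
2,4.0,
3,6.0,30
4,8.0,40
"""

df = pd.read_csv(io.StringIO(csv_text))   # load the CSV into a DataFrame

summary = df.describe()                   # count, mean, std, min, quartiles, max per column
corr_ab = df["A"].corr(df["B"])           # Pearson correlation between columns A and B

cleaned = df.dropna()                     # drop rows that contain missing values
filtered = cleaned[cleaned["C"] > 10]     # keep only rows where C exceeds 10

out = io.StringIO()
filtered.to_csv(out, index=False)         # write the result back out as CSV text
print(summary)
print(out.getvalue())
```

With a real file you would pass a path such as `'my_data.csv'` to `pd.read_csv()` and `to_csv()` instead of the in-memory buffers.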



## How does pandas fit into the data science toolkit?

Not only is the pandas library a central component of the data science toolkit, but it is also used in conjunction with other libraries in that collection.

Pandas is built on top of the **NumPy** package, meaning a lot of the structure of NumPy is used or replicated in Pandas. Data in pandas is often used to feed statistical analysis in **SciPy**, plotting functions from **Matplotlib**, and machine learning algorithms in **Scikit-learn**.
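As a small illustration of that relationship, the sketch below (with made-up values) pulls the NumPy array out of a Series and passes a Series straight to a NumPy function.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], name="values")  # hypothetical data

arr = s.to_numpy()    # the Series data as a NumPy array
roots = np.sqrt(s)    # NumPy functions accept a Series and return a Series

print(type(arr))
print(roots)
```

Because a Series can be handed to NumPy-based code like this, pandas data tends to flow easily into the other libraries mentioned above.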

Jupyter Notebooks offer a good environment for using pandas to do data exploration and modeling, but pandas can also be used in text editors just as easily.

Jupyter Notebooks give us the ability to execute code in a particular cell as opposed to running the entire file. This saves a lot of time when working with large datasets and complex transformations. Notebooks also provide an easy way to visualize pandas’ DataFrames and plots. As a matter of fact, this article was created entirely in a Jupyter Notebook.

## When should you start using pandas?

If you do not have any experience coding in Python, then you should stay away from learning pandas until you do. You don’t have to be at the level of a software engineer, but you should be adept at the basics, such as lists, tuples, dictionaries, functions, and iterations. I’d also recommend familiarizing yourself with **NumPy** due to the similarities mentioned above.

If you're looking for a good place to learn Python, [Python for Everybody](https://www.learndatasci.com/out/coursera-programming-everybody-getting-started-python/) on Coursera is great (and free).

Moreover, for those of you looking to do a [data science bootcamp](https://www.learndatasci.com/articles/thinkful-data-science-online-bootcamp-review/) or some other accelerated data science education program, it's highly recommended you start learning pandas on your own before you start the program.

Even though accelerated programs teach you pandas, better skills beforehand means you'll be able to maximize time for learning and mastering the more complicated material.

## Core components of pandas: Series and DataFrames

The primary two components of pandas are the `Series` and `DataFrame`.

A `Series` is essentially a column, and a `DataFrame` is a two-dimensional table made up of a collection of Series.

DataFrames and Series are quite similar in that many operations that you can do with one you can do with the other, such as filling in null values and calculating the mean.
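Here is a minimal sketch of that similarity, using a tiny made-up table (the column names `x` and `y` are placeholders). The same `mean()` and `fillna()` calls work on a single column and on the whole table.

```python
import pandas as pd

# Hypothetical data; None becomes a missing value (NaN).
df = pd.DataFrame({
    "x": [3, 2, None, 1],
    "y": [0, 3, 7, 2],
})

col = df["x"]                  # selecting one column gives a Series
print(type(col).__name__)

print(col.mean())              # mean of one column; missing values are skipped by default
print(df.mean())               # mean of every column, returned as a Series

col_filled = col.fillna(0)     # fill missing values in a Series
df_filled = df.fillna(0)       # the same call works on a DataFrame
print(df_filled)
```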

You'll see how these components work when we start working with data below.

## **Pandas First Steps**

### Install and import
Pandas is an easy package to install. Open up your terminal program (for Mac users) or command line (for PC users) and install it using either of the following commands:

`conda install pandas`

OR

`pip install pandas`

Alternatively, if you're currently viewing this in a Jupyter notebook you can run this cell:

In [2]:
#!pip install pandas
#from google.colab import files
#data_to_load = files.upload()

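The `!` at the beginning runs the rest of the line as if it were typed in a terminal. The lines in the cell above are commented out, so remove the leading `#` from the `pip` line if you need to run it.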

To import pandas we usually import it with a shorter name since it's used so much:

In [3]:
import pandas as pd

### How to read in data

It’s quite simple to load data from various file formats into a DataFrame. In the following examples we'll work with two Netflix data sets, each stored in its own CSV file.

### Reading data from CSVs

With CSV files all you need is a single line to load in the data:

In [4]:
netflix_titles = pd.read_csv('netflix_titles.csv')
netflix_views = pd.read_csv('netflix_views.csv', encoding="latin-1")
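The `encoding` argument tells `read_csv` how to turn the bytes in the file into text. The sketch below uses made-up file contents held in memory (not the Netflix files) to show that a mismatched encoding usually raises no error and instead produces garbled characters. It illustrates the parameter only and makes no claim about how the Netflix files were actually encoded.

```python
import io
import pandas as pd

# Hypothetical file contents standing in for a UTF-8 encoded CSV on disk.
raw = "title,year\nCafé Society,2016\nAmélie,2001\n".encode("utf-8")

# read_csv accepts a path or a file-like object; encoding controls how bytes are decoded.
good = pd.read_csv(io.BytesIO(raw), encoding="utf-8")
bad = pd.read_csv(io.BytesIO(raw), encoding="latin-1")  # wrong encoding: no error, garbled text

print(good["title"].tolist())
print(bad["title"].tolist())
print(good.shape)  # (rows, columns)
```

Checking `shape` and a few text values right after loading is a cheap way to catch this kind of problem early.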

The `.iloc[]` indexer in pandas selects data by integer position. It lets you access rows and columns by counting from 0, regardless of their labels.

**Select a single row:** `df.iloc[2]` (returns the third row) <br>
**Select multiple rows:** `df.iloc[1:4]` (returns rows 2 to 4) <br>
**Select a specific value:** `df.iloc[2, 3]` (row index 2, column index 3) <br>
**Select a column:** `df.iloc[:, 1]` (all rows, second column) <br>
**Select multiple rows and columns:** `df.iloc[1:4, 0:2]` (rows 2-4, columns 1-2)
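To try these patterns without the Netflix files, here is a small sketch on a made-up 5x4 table. The row labels `r0` to `r4` and column names `a` to `d` are placeholders, chosen to make it clear that `.iloc[]` ignores labels and only counts positions.

```python
import pandas as pd

# Hypothetical 5x4 table with string row labels.
df = pd.DataFrame(
    {"a": [10, 11, 12, 13, 14],
     "b": [20, 21, 22, 23, 24],
     "c": [30, 31, 32, 33, 34],
     "d": [40, 41, 42, 43, 44]},
    index=["r0", "r1", "r2", "r3", "r4"],
)

third_row = df.iloc[2]        # Series holding the third row
rows_2_to_4 = df.iloc[1:4]    # positions 1, 2, 3 (the end position is excluded)
value = df.iloc[2, 3]         # row position 2, column position 3
second_col = df.iloc[:, 1]    # all rows, second column
block = df.iloc[1:4, 0:2]     # rows 2-4, columns 1-2

print(value)
print(block)
```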

In [5]:
netflix_titles.iloc[:5]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


In [6]:
netflix_views.iloc[:5]

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed
0,The Night Agent: Season 1,Yes,3/23/2023,812100000
1,Ginny & Georgia: Season 2,Yes,1/5/2023,665100000
2,The Glory: Season 1 // ? ???: ?? 1,Yes,12/30/2022,622800000
3,Wednesday: Season 1,Yes,11/23/2022,507700000
4,Queen Charlotte: A Bridgerton Story,Yes,5/4/2023,503000000


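The `.loc[]` indexer in pandas selects data by label rather than by integer position. It lets you access rows and columns using their names. Unlike `.iloc[]`, a label slice such as `0:4` includes both endpoints, which is why the cell below returns five rows.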

**Select a single row:** `df.loc["row_label"]` (returns the row with the given label) <br>
**Select multiple rows:** `df.loc[["row1", "row2"]]` (returns specified rows) <br>
**Select a specific value:** `df.loc["row_label", "column_name"]` (returns the value at the intersection) <br>
**Select a column:** `df.loc[:, "column_name"]` (all rows, specific column) <br>
**Select multiple rows and columns:** `df.loc["row1":"row3", ["col1", "col2"]]` (returns a subset of rows and columns)
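The same patterns can be tried on a small made-up table. The labels `r0` to `r4` and columns `a` to `d` are placeholders. Note the last two lines: a label slice keeps its end label, while a position slice drops its end position.

```python
import pandas as pd

# Hypothetical table with string row labels.
df = pd.DataFrame(
    {"a": [10, 11, 12, 13, 14],
     "b": [20, 21, 22, 23, 24],
     "c": [30, 31, 32, 33, 34],
     "d": [40, 41, 42, 43, 44]},
    index=["r0", "r1", "r2", "r3", "r4"],
)

row = df.loc["r2"]                        # single row, by label
rows = df.loc[["r0", "r3"]]               # several rows, by a list of labels
value = df.loc["r2", "d"]                 # one value at a row/column intersection
col = df.loc[:, "b"]                      # all rows, one column
subset = df.loc["r1":"r3", ["a", "b"]]    # label slice: r1, r2 AND r3

by_label = df.loc["r1":"r3"]              # 3 rows (end label included)
by_position = df.iloc[1:3]                # 2 rows (end position excluded)
print(len(by_label), len(by_position))
```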

In [7]:
netflix_titles.loc[0:4]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description
0,s1,Movie,Dick Johnson Is Dead,Kirsten Johnson,,United States,"September 25, 2021",2020,PG-13,90 min,Documentaries,"As her father nears the end of his life, filmm..."
1,s2,TV Show,Blood & Water,,"Ama Qamata, Khosi Ngema, Gail Mabalane, Thaban...",South Africa,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, TV Dramas, TV Mysteries","After crossing paths at a party, a Cape Town t..."
2,s3,TV Show,Ganglands,Julien Leclercq,"Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabi...",,"September 24, 2021",2021,TV-MA,1 Season,"Crime TV Shows, International TV Shows, TV Act...",To protect his family from a powerful drug lor...
3,s4,TV Show,Jailbirds New Orleans,,,,"September 24, 2021",2021,TV-MA,1 Season,"Docuseries, Reality TV","Feuds, flirtations and toilet talk go down amo..."
4,s5,TV Show,Kota Factory,,"Mayur More, Jitendra Kumar, Ranjan Raj, Alam K...",India,"September 24, 2021",2021,TV-MA,2 Seasons,"International TV Shows, Romantic TV Shows, TV ...",In a city of coaching centers known to train I...


### **Practice Question:**

1. What actors play in the 3097th (aka index 3096) movie/TV show, and what show is it?

In [8]:
# Answer here:


## **Merging**

#### **Understanding Joins in `pd.merge()`**

When merging DataFrames in pandas, `pd.merge()` allows you to combine data based on a common column. The type of join determines which rows are included in the final result. By default, `pd.merge()` performs an inner join.

**Inner Join (Default)** <br>
An inner join keeps only the rows where there is a match in both DataFrames. Rows without a match are excluded. This is the default behavior of `pd.merge()`.

**Left Join** <br>
A left join keeps all rows from the left DataFrame and only the matching rows from the right DataFrame. If there is no match, missing values (NaN) are introduced in the right-side columns.

**Right Join** <br>
A right join keeps all rows from the right DataFrame and only the matching rows from the left DataFrame. If there is no match, missing values (NaN) appear in the left-side columns.

**Outer Join (Full Join)** <br>
An outer join keeps all rows from both DataFrames. If a row does not have a match in one of the DataFrames, missing values (NaN) are assigned in the unmatched columns.

To specify which type of join to use, pass the `how` argument.

For example, if you wanted to perform a right join, you would pass `how='right'`. Easy as that!
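To see how the four join types differ, the sketch below merges two tiny made-up tables that share two of their three keys, and counts the rows each join returns. The tables and their values are hypothetical; only `pd.merge()` and its `how`, `left_on`, and `right_on` arguments come from pandas.

```python
import pandas as pd

# Hypothetical tables: "B" and "C" appear in both, "A" only on the left, "D" only on the right.
left = pd.DataFrame({"title": ["A", "B", "C"], "rating": ["PG", "R", "PG-13"]})
right = pd.DataFrame({"Title": ["B", "C", "D"], "hours": [100, 200, 300]})

sizes = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, left_on="title", right_on="Title", how=how)
    sizes[how] = len(merged)

print(sizes)  # {'inner': 2, 'left': 3, 'right': 3, 'outer': 4}

# In a left join, the unmatched left row ("A") gets NaN in the right-side columns.
left_join = pd.merge(left, right, left_on="title", right_on="Title", how="left")
print(left_join)
```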

In [9]:
netflix = pd.merge(netflix_titles, netflix_views, left_on = "title", right_on = "Title")
netflix.iloc[:5]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Title,Available Globally?,Release Date,Hours Viewed
0,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,My Little Pony: A New Generation,Yes,9/24/2021,15400000
1,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",Sankofa,No,,100000
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,The Starling,Yes,9/24/2021,8200000
3,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,Je Suis Karl,Yes,9/23/2021,300000
4,s19,Movie,Intrusion,Adam Salky,"Freida Pinto, Logan Marshall-Green, Robert Joh...",,"September 22, 2021",2021,TV-14,94 min,Thrillers,After a deadly home invasion at a couple’s new...,Intrusion,Yes,9/22/2021,11700000


But wait!! There's a problem here...

Some of the titles in our viewership data include a foreign-language title alongside the English one. This means that when the two data sets were merged, those titles didn't match exactly, even when they referred to the same show. As a result, some of our data was excluded! For example, take the third row of the viewership data, shown again below.

It has a bunch of question marks because `pd.read_csv()` didn't know how to convert the foreign characters. In our other data set, this title would simply be `The Glory: Season 1` ... they don't match up, so they won't merge.

In [10]:
netflix_views.iloc[:5]

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed
0,The Night Agent: Season 1,Yes,3/23/2023,812100000
1,Ginny & Georgia: Season 2,Yes,1/5/2023,665100000
2,The Glory: Season 1 // ? ???: ?? 1,Yes,12/30/2022,622800000
3,Wednesday: Season 1,Yes,11/23/2022,507700000
4,Queen Charlotte: A Bridgerton Story,Yes,5/4/2023,503000000


The `apply()` method in pandas allows you to apply a function to each element or row of a DataFrame or Series. It is useful for performing transformations, calculations, or custom operations efficiently. When used on a Series, it applies the function element-wise. When used on a DataFrame, it can apply the function along either rows or columns, depending on the axis parameter. For now, we're just using it on a Series, so we don't have to worry about an axis parameter.

In [11]:
def english_only(s):
    languages = s.split(" // ")
    english = languages[0]
    return english

netflix_views['Title'] = netflix_views['Title'].apply(english_only)
netflix_views.iloc[:5]

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed
0,The Night Agent: Season 1,Yes,3/23/2023,812100000
1,Ginny & Georgia: Season 2,Yes,1/5/2023,665100000
2,The Glory: Season 1,Yes,12/30/2022,622800000
3,Wednesday: Season 1,Yes,11/23/2022,507700000
4,Queen Charlotte: A Bridgerton Story,Yes,5/4/2023,503000000


Alternatively, if your transformation or calculation is very simple, as it is in this case, you can use a lambda function! This is my preferred way of using `apply`, because it is much more concise. If you're performing a complex change that takes many steps, you will have to define your own function and use that within `apply`. But if not, I highly recommend lambda! The following code chunk does exactly the same thing as the code chunk above:

In [12]:
netflix_views['Title'] = netflix_views['Title'].apply(lambda x: x.split(" // ")[0])
netflix_views.iloc[:5]

Unnamed: 0,Title,Available Globally?,Release Date,Hours Viewed
0,The Night Agent: Season 1,Yes,3/23/2023,812100000
1,Ginny & Georgia: Season 2,Yes,1/5/2023,665100000
2,The Glory: Season 1,Yes,12/30/2022,622800000
3,Wednesday: Season 1,Yes,11/23/2022,507700000
4,Queen Charlotte: A Bridgerton Story,Yes,5/4/2023,503000000
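Two related patterns are worth knowing, sketched here on made-up data rather than the Netflix files. For simple text operations, the `.str` accessor gives the same result as `apply` without writing a function. And on a DataFrame, passing `axis=1` to `apply` hands each row to the function as a Series. The `label` column built below is purely illustrative.

```python
import pandas as pd

# Hypothetical viewership table.
views = pd.DataFrame({
    "Title": ["Show One: Season 1 // ??? 1", "Show Two"],
    "Hours Viewed": [1200, 300],
})

# Element-wise apply on a Series, as in the cells above.
via_apply = views["Title"].apply(lambda x: x.split(" // ")[0])

# The .str accessor performs the same split on every element.
via_str = views["Title"].str.split(" // ").str[0]

# Row-wise apply on a DataFrame: each row arrives as a Series.
label = views.apply(lambda row: f"{row['Title']} ({row['Hours Viewed']} h)", axis=1)

print(via_str.tolist())
print(label.tolist())
```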


In [13]:
# print the number of movies in the merged data BEFORE updating the title column
print("Movies and shows in merged data before updating:", len(netflix))

# perform the same inner join from before with the new netflix_views data
netflix = pd.merge(netflix_titles, netflix_views, left_on = "title", right_on = "Title")

# print the number of movies in the merged data AFTER updating the title column
print("Movies and shows in merged data after updating:", len(netflix))

Movies and shows in merged data before updating: 1965
Movies and shows in merged data after updating: 2895


That's a big difference! There were nearly 1,000 movies and shows in one of our original data sets that had a foreign title attached to them. This meant that their title didn't match up with our other data set, and therefore they were excluded from our merged data. We don't want that! We want to keep all the records that we possibly can.
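One way to spot this kind of silent loss is the `indicator=True` option of `pd.merge()`, which adds a `_merge` column recording where each row came from. The sketch below uses two tiny made-up tables, not the Netflix data, and an outer join so that unmatched rows from both sides remain visible.

```python
import pandas as pd

# Hypothetical tables: one title carries an extra foreign-language suffix.
titles = pd.DataFrame({"title": ["Show One", "Show Two"]})
views = pd.DataFrame({
    "Title": ["Show One", "Show Two // ???", "Show Three"],
    "Hours Viewed": [100, 200, 300],
})

check = pd.merge(views, titles, left_on="Title", right_on="title",
                 how="outer", indicator=True)

# _merge is "both" for matched rows, "left_only" or "right_only" otherwise.
print(check["_merge"].value_counts())

unmatched = check[check["_merge"] != "both"]
print(unmatched[["Title", "title", "_merge"]])
```

Scanning the unmatched rows usually makes a formatting mismatch like the ` // ` suffix easy to notice.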

### **Practice Questions:**

2. You're interested in viewership data for Netflix shows and movies. What you really need to look at is the `Hours Viewed` column, but it would also be nice to have data about the cast, director, year, and other information on that same line. However, you're most interested in `Hours Viewed`, so you want to keep that data even if it doesn't have matching information from the other data set. What type of join should you use?

3. Create a new column that denotes whether or not a show/movie is `listed_in` the "Thrillers" category.

In [14]:
# Answer here:


## **Filtering**

A mask in pandas is a boolean condition applied to a DataFrame to filter specific rows. It creates a Series of `True` and `False` values, where only True rows are kept.

To filter rows based on a condition, create a mask by applying a logical expression to a column. For example, checking if values in a column are greater than a threshold results in a mask where True represents the rows that meet the condition. Applying this mask to the DataFrame returns only the matching rows.

Multiple conditions can be combined using logical operators. The `&` operator represents an "AND" condition, meaning all conditions must be met for a row to be included. The `|` operator represents an "OR" condition, meaning at least one condition must be met. To negate a condition, use the `~` symbol before the mask. When you write conditions inline, wrap each one in parentheses, because `&` and `|` bind more tightly than comparisons such as `==` and `>=`.

Using masks provides an efficient way to filter and extract relevant data from a DataFrame.
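Here is a minimal sketch of the three operators on a made-up four-row table. The column names mirror the Netflix data, but the values are invented for illustration.

```python
import pandas as pd

# Hypothetical table with a default 0..3 index.
df = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "release_year": [1995, 2010, 2005, 2021],
    "rating": ["PG-13", "TV-MA", "R", "PG-13"],
})

is_movie = df["type"] == "Movie"       # boolean Series: True, False, True, True
recent = df["release_year"] >= 2000    # boolean Series: False, True, True, True

both = df[is_movie & recent]           # AND: rows 2 and 3
either = df[is_movie | recent]         # OR: all four rows
not_movie = df[~is_movie]              # NOT: row 1 only

print(both)
print(not_movie)
```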

In [15]:
mask = (netflix['type'] == "Movie")
netflix[mask].iloc[:5]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Title,Available Globally?,Release Date,Hours Viewed
0,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,My Little Pony: A New Generation,Yes,9/24/2021,15400000
1,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",Sankofa,No,,100000
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,The Starling,Yes,9/24/2021,8200000
3,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,Je Suis Karl,Yes,9/23/2021,300000
4,s14,Movie,Confessions of an Invisible Girl,Bruno Garotti,"Klara Castanho, Lucca Picon, Júlia Gomes, Marc...",,"September 22, 2021",2021,TV-PG,91 min,"Children & Family Movies, Comedies",When the clever but socially-awkward Tetê join...,Confessions of an Invisible Girl,Yes,9/22/2021,5700000


Alternatively, you can perform this all on one line:

In [16]:
netflix[netflix['type'] == "Movie"].iloc[:5]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Title,Available Globally?,Release Date,Hours Viewed
0,s7,Movie,My Little Pony: A New Generation,"Robert Cullen, José Luis Ucha","Vanessa Hudgens, Kimiko Glenn, James Marsden, ...",,"September 24, 2021",2021,PG,91 min,Children & Family Movies,Equestria's divided. But a bright-eyed hero be...,My Little Pony: A New Generation,Yes,9/24/2021,15400000
1,s8,Movie,Sankofa,Haile Gerima,"Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra D...","United States, Ghana, Burkina Faso, United Kin...","September 24, 2021",1993,TV-MA,125 min,"Dramas, Independent Movies, International Movies","On a photo shoot in Ghana, an American model s...",Sankofa,No,,100000
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,The Starling,Yes,9/24/2021,8200000
3,s13,Movie,Je Suis Karl,Christian Schwochow,"Luna Wedler, Jannis Niewöhner, Milan Peschel, ...","Germany, Czech Republic","September 23, 2021",2021,TV-MA,127 min,"Dramas, International Movies",After most of her family is murdered in a terr...,Je Suis Karl,Yes,9/23/2021,300000
4,s14,Movie,Confessions of an Invisible Girl,Bruno Garotti,"Klara Castanho, Lucca Picon, Júlia Gomes, Marc...",,"September 22, 2021",2021,TV-PG,91 min,"Children & Family Movies, Comedies",When the clever but socially-awkward Tetê join...,Confessions of an Invisible Girl,Yes,9/22/2021,5700000


You can also combine multiple masks at once!

In [17]:
year_mask = (netflix['release_year'] >= 2000)
rating_mask = (netflix['rating'] == 'PG-13')
netflix[year_mask & rating_mask].iloc[:5]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Title,Available Globally?,Release Date,Hours Viewed
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,The Starling,Yes,9/24/2021,8200000
13,s39,Movie,Birth of the Dragon,George Nolfi,"Billy Magnussen, Ron Yuan, Qu Jingjing, Terry ...","China, Canada, United States","September 16, 2021",2017,PG-13,96 min,"Action & Adventure, Dramas",A young Bruce Lee angers kung fu traditionalis...,Birth of the Dragon,No,,1400000
42,s89,Movie,Blood Brothers: Malcolm X & Muhammad Ali,Marcus Clarke,"Malcolm X, Muhammad Ali",,"September 9, 2021",2021,PG-13,96 min,"Documentaries, Sports Movies","From a chance meeting to a tragic fallout, Mal...",Blood Brothers: Malcolm X & Muhammad Ali,Yes,9/9/2021,300000
52,s113,Movie,Worth,Sara Colangelo,"Michael Keaton, Stanley Tucci, Amy Ryan, Shuno...",,"September 3, 2021",2021,PG-13,119 min,Dramas,"In the wake of the Sept. 11 attacks, a lawyer ...",Worth,No,9/3/2021,5200000
55,s118,Movie,Final Account,Luke Holland,,"United Kingdom, United States","September 2, 2021",2021,PG-13,94 min,Documentaries,This documentary stitches together never-befor...,Final Account,No,,500000


Again, you can do this all on one line. Note the parentheses around each condition.

In [18]:
netflix[(netflix['release_year'] >= 2000) & (netflix['rating'] == 'PG-13')].iloc[:5]

Unnamed: 0,show_id,type,title,director,cast,country,date_added,release_year,rating,duration,listed_in,description,Title,Available Globally?,Release Date,Hours Viewed
2,s10,Movie,The Starling,Theodore Melfi,"Melissa McCarthy, Chris O'Dowd, Kevin Kline, T...",United States,"September 24, 2021",2021,PG-13,104 min,"Comedies, Dramas",A woman adjusting to life after a loss contend...,The Starling,Yes,9/24/2021,8200000
13,s39,Movie,Birth of the Dragon,George Nolfi,"Billy Magnussen, Ron Yuan, Qu Jingjing, Terry ...","China, Canada, United States","September 16, 2021",2017,PG-13,96 min,"Action & Adventure, Dramas",A young Bruce Lee angers kung fu traditionalis...,Birth of the Dragon,No,,1400000
42,s89,Movie,Blood Brothers: Malcolm X & Muhammad Ali,Marcus Clarke,"Malcolm X, Muhammad Ali",,"September 9, 2021",2021,PG-13,96 min,"Documentaries, Sports Movies","From a chance meeting to a tragic fallout, Mal...",Blood Brothers: Malcolm X & Muhammad Ali,Yes,9/9/2021,300000
52,s113,Movie,Worth,Sara Colangelo,"Michael Keaton, Stanley Tucci, Amy Ryan, Shuno...",,"September 3, 2021",2021,PG-13,119 min,Dramas,"In the wake of the Sept. 11 attacks, a lawyer ...",Worth,No,9/3/2021,5200000
55,s118,Movie,Final Account,Luke Holland,,"United Kingdom, United States","September 2, 2021",2021,PG-13,94 min,Documentaries,This documentary stitches together never-befor...,Final Account,No,,500000


### **Practice Questions:**

4. What show had the most hours viewed in our merged data? (The following code puts hours viewed into a numeric form. Make sure you run it first.)

In [19]:
netflix['Hours Viewed'] = netflix['Hours Viewed'].str.replace(",", "")
netflix['Hours Viewed'] = pd.to_numeric(netflix['Hours Viewed'])

In [20]:
# Answer here:


5. Filter the data to only include movies and shows that were either directed by Quentin Tarantino or Spike Lee.

In [21]:
# Answer here:


## Wrapping up

Exploring, cleaning, transforming, and visualizing data with pandas in Python is an essential skill in data science. Cleaning and wrangling data alone is often said to be 80% of your job as a Data Scientist. After a few projects and some practice, you should be very comfortable with most of the basics.

To keep improving, view the [extensive tutorials](https://pandas.pydata.org/pandas-docs/stable/tutorials.html) offered by the official pandas docs, follow along with a few [Kaggle kernels](https://www.kaggle.com/kernels), and keep working on your own projects!