# Filtering Data With Pandas

This notebook contains exercises on filtering data with Pandas.

**At the end of each exercise there are cells containing assert statements that you can use to check your answers.**

In [None]:
import numpy as np
import pandas as pd
from utils.dataset_loader import load_movies_dataset
%autosave 30

## Exercise 3, Part 1: Filtering With `loc` and `iloc`

---

In this exercise you will write compound queries using `loc` and `iloc` instead of `[]`.
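As a reminder of the syntax, here is a minimal sketch on a toy DataFrame. The data and the column names `name`, `price`, and `stock` are invented for illustration and are unrelated to the movies dataset. `loc` selects by label (or boolean mask), while `iloc` selects by integer position.

```python
import pandas as pd

# Toy data, unrelated to the movies dataset; all column names are made up.
toy_df = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "date"],
        "price": [1.2, 0.5, 3.0, 2.5],
        "stock": [10, 0, 25, 5],
    }
)

# loc: rows come from a boolean mask, the column from its label.
# Each condition needs its own parentheses because & binds tighter than < or >.
cheap_in_stock = toy_df.loc[(toy_df["price"] < 2.0) & (toy_df["stock"] > 0), "name"]

# iloc: purely position-based. First two rows, first column.
first_two_names = toy_df.iloc[:2, 0]

print(cheap_in_stock.tolist())
print(first_two_names.tolist())
```

Passing both the row mask and the column label to a single `loc` call avoids chaining two separate indexing operations.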

In [None]:
movies_df = load_movies_dataset()

### Question 3.1

Select the titles of the movies that are shorter than `45` minutes and have a rating of more than `7.5`.

Use only one call to `loc` for this.

In [None]:
titles_short_high_rated = ...

### Question 3.3

Find out the runtime of the longest Bulgarian movie. Bulgarian movies are those with `original_language` equal to `bg`.

*HINT: You can call `max` on a column to get its maximum value.*
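To illustrate the hint, the sketch below filters a toy DataFrame and then calls `max` on the selected column. The `category` and `price` columns are hypothetical and are not part of the movies dataset.

```python
import pandas as pd

# Toy data; the column names are invented for this illustration.
toy_df = pd.DataFrame(
    {
        "category": ["fruit", "veg", "fruit", "veg"],
        "price": [1.2, 0.5, 3.0, 2.5],
    }
)

# Select the price column for the matching rows, then reduce it to one value.
max_fruit_price = toy_df.loc[toy_df["category"] == "fruit", "price"].max()

print(max_fruit_price)
```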

In [None]:
runtime_longest_bg_movie = ...

#### Run these cells after finishing the exercise questions to check your answers. 

In [None]:
assert titles_short_high_rated.shape == (437,), \
    f"Wrong shape of titles_short_high_rated - {titles_short_high_rated.shape}!"

In [None]:
assert runtime_longest_bg_movie == 115, \
    f"Wrong runtime_longest_bg_movie - {runtime_longest_bg_movie}!"

## Exercise 3, Part 2: Multiple Conditions, Wrong Dimension

---

The runtime of all English movies (those with `original_language` equal to `en`) is recorded in hours instead of minutes.

### Question 3.4

Fix the runtime of English movies by multiplying their runtime by `60`.

Use only a single call to `loc` without chaining filtering operations.
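The general pattern of updating a subset of rows through `loc` might look like the sketch below. It uses a toy DataFrame with made-up `unit` and `weight` columns, so it is an illustration of the technique and not the solution for the movies dataset.

```python
import pandas as pd

# Toy data; the column names are invented for this illustration.
toy_df = pd.DataFrame(
    {
        "unit": ["kg", "g", "kg", "g"],
        "weight": [1.5, 200.0, 0.25, 50.0],
    }
)

# A single loc call names both the rows (mask) and the column to update,
# so the assignment is written back to toy_df itself.
toy_df.loc[toy_df["unit"] == "kg", "weight"] *= 1000

print(toy_df["weight"].tolist())
```

By contrast, a chained form such as `toy_df[mask]["weight"] = ...` may assign to a temporary copy and leave the original DataFrame unchanged, which is why the single `loc` call is preferred.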

In [None]:
# YOUR CODE GOES HERE

#### Run this cell after finishing the exercise questions to check your answers. 

In [None]:
assert (movies_df.loc[movies_df['runtime'] > 0, 'runtime'] > 0.5).all(), "Some movies are still in hours!"

## Exercise 3, Part 3: Putting It All Together

---

You will now use all you've learned to answer some more complex questions about movies with the power of Pandas!

### Question 3.5

* What's the rating of the lowest-rated movie that has a budget of more than `1,000,000` (one million)?
* __[Optional]__ What's the title of the movie? _HINT: You can use `idxmin` on the `vote_average` column._
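The difference between `min` and `idxmin` can be sketched on a toy DataFrame: `min` returns the smallest value, while `idxmin` returns the index label of the row holding it, which can then be passed to `loc`. The data, the index labels, and the column names below are invented for illustration.

```python
import pandas as pd

# Toy data with a non-default index, so that labels and positions differ.
toy_df = pd.DataFrame(
    {
        "name": ["apple", "banana", "cherry", "date"],
        "price": [1.2, 0.5, 0.9, 2.5],
        "stock": [10, 0, 25, 5],
    },
    index=[101, 102, 103, 104],
)

# Price column restricted to the rows that are in stock.
in_stock_prices = toy_df.loc[toy_df["stock"] > 0, "price"]

cheapest_price = in_stock_prices.min()    # the smallest value
cheapest_idx = in_stock_prices.idxmin()   # the index label of that row

# The label can be used with loc to look up another column of the same row.
cheapest_name = toy_df.loc[cheapest_idx, "name"]

print(cheapest_price, cheapest_idx, cheapest_name)
```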

In [None]:
lowest_rated_high_budget = ...

# [Optional]
lowest_rated_high_budget_idx = ...

lowest_rated_name = ...

### Question 3.6

What's the mean rating of all movies with a `runtime` of more than `350` minutes?

*HINT: You can use the `mean` function.*

In [None]:
avg_rating_long_movies = ...

#### Run these cells after finishing the exercise questions to check your answers. 

In [None]:
assert lowest_rated_high_budget == 1.5, f"Wrong lowest_rated_high_budget - {lowest_rated_high_budget}!"

In [None]:
assert np.isclose(avg_rating_long_movies, 7.514285714285715, rtol=1e-5), \
    f"Wrong avg_rating_long_movies - {avg_rating_long_movies:.5f}!"