## Introduction

Now you are ready to get a deeper understanding of your data.

Run the following cell to load your data and some utility functions.

In [None]:
import pandas as pd
pd.set_option("display.max_rows", 5)
reviews = pd.read_csv("/winemag-data-130k-v2.csv", index_col=0)

print("Setup complete.")

Look at an overview of your data by running the following line.

In [None]:
reviews.head()

## Exercises

## 1.

What is the median of the `points` column in the `reviews` DataFrame?

In [None]:
median_points = ___

In [None]:
Hint: Use the `median` method (a built-in pandas method, like `mean` or `unique`).

Solution:

median_points = reviews.points.median()
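As a minimal sketch of what `median` returns, the snippet below uses a small made-up Series (`toy_points` is hypothetical and not taken from the wine dataset). With an even number of values, pandas averages the two middle ones.

```python
import pandas as pd

# Hypothetical toy scores, not taken from the wine dataset.
toy_points = pd.Series([80, 85, 90, 95, 100, 87])

# Sorted, the values are 80, 85, 87, 90, 95, 100.
# With an even count, median() averages the two middle values (87 and 90).
toy_median = toy_points.median()
print(toy_median)
```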

## 2. 
What countries are represented in the dataset? (Your answer should not include any duplicates.)

In [None]:
countries = ___

In [None]:
Hint: Use the `unique` method to get an array of the unique entries in a column.

Solution:

countries = reviews.country.unique()
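One detail worth knowing: `unique` keeps a missing value as one of its entries if the column contains any. The sketch below illustrates this on a made-up Series (`toy_countries` is hypothetical); chaining `dropna` first is one way to exclude missing entries, if that is what you want.

```python
import pandas as pd

# Hypothetical toy column with one missing entry.
toy_countries = pd.Series(["Italy", "US", "Italy", None, "France", "US"])

# unique() returns values in order of first appearance,
# and the missing entry counts as one of them.
with_missing = toy_countries.unique()

# Dropping missing values first leaves only real country names.
without_missing = toy_countries.dropna().unique()

print(len(with_missing), list(without_missing))
```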

## 3.
How often does each country appear in the dataset? Create a Series `reviews_per_country` mapping countries to the count of reviews of wines from that country.

In [None]:
reviews_per_country = ___

In [None]:
Hint: To see a list of unique values and how often they occur in a Series, use the value_counts method.

Solution:

reviews_per_country = reviews.country.value_counts()
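To see the shape of the result, here is a small sketch on made-up data (`toy_countries` is hypothetical). `value_counts` returns a Series indexed by the distinct values and sorted from most to least frequent; passing `normalize=True` gives shares instead of raw counts.

```python
import pandas as pd

# Hypothetical toy column, not taken from the wine dataset.
toy_countries = pd.Series(["US", "Italy", "US", "France", "US", "Italy"])

# Counts per distinct value, most frequent first.
toy_counts = toy_countries.value_counts()

# The same breakdown expressed as fractions of the total.
toy_shares = toy_countries.value_counts(normalize=True)

print(toy_counts)
print(toy_shares)
```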

## 4.
Create a variable `centered_price` containing a version of the `price` column with the mean price subtracted.

(Note: this 'centering' transformation is a common preprocessing step before applying various machine learning algorithms.) 

In [None]:
centered_price = ___

In [None]:
Hint: To get the mean of a column in a pandas DataFrame, use the `mean` method.

Solution:

centered_price = reviews.price - reviews.price.mean()
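The following sketch checks the centering idea on a made-up Series (`toy_price` is hypothetical). It assumes the column may contain missing prices: `mean` skips them by default, and they stay missing after the subtraction, so the centered values average to zero.

```python
import pandas as pd

# Hypothetical toy prices with one missing value.
toy_price = pd.Series([10.0, 20.0, None, 30.0])

# mean() ignores the missing value: (10 + 20 + 30) / 3 = 20.
toy_centered = toy_price - toy_price.mean()

# The missing price stays missing; the rest are centered around zero.
print(toy_centered)
print(toy_centered.mean())
```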

## 5.
I'm an economical wine buyer. Which wine is the "best bargain"? Create a variable `bargain_wine` with the title of the wine with the highest points-to-price ratio in the dataset.

In [None]:
bargain_wine = ___

In [None]:
Hint: The idxmax method may be useful here.

Solution:

bargain_idx = (reviews.points / reviews.price).idxmax()
bargain_wine = reviews.loc[bargain_idx, 'title']
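The sketch below walks through the same two steps on a made-up DataFrame (`toy_reviews`, its titles, and its index labels are hypothetical). It shows that `idxmax` returns an index label rather than a position, and that a missing price produces a missing ratio, which `idxmax` skips by default.

```python
import pandas as pd

# Hypothetical toy reviews with non-default index labels and one missing price.
toy_reviews = pd.DataFrame(
    {
        "title": ["Wine A", "Wine B", "Wine C"],
        "points": [90, 86, 88],
        "price": [45.0, 4.0, None],
    },
    index=[10, 11, 12],
)

# Ratios are 2.0, 21.5, and NaN (because of the missing price).
ratio = toy_reviews.points / toy_reviews.price

# idxmax() returns the index label of the largest non-missing ratio.
best_idx = ratio.idxmax()

# .loc looks the row up by that label.
best_title = toy_reviews.loc[best_idx, "title"]
print(best_idx, best_title)
```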

## 6.
There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series `descriptor_counts` counting how many times each of these two words appears in the `description` column in the dataset. (For simplicity, let's ignore the capitalized versions of these words.)

In [None]:
descriptor_counts = ___

In [None]:
Hint: Use a map to check each description for the string tropical, then count up the number of times this is True. Repeat this for fruity. Finally, create a Series combining the two values.

Solution:

n_trop = reviews.description.map(lambda desc: "tropical" in desc).sum()
n_fruity = reviews.description.map(lambda desc: "fruity" in desc).sum()
descriptor_counts = pd.Series([n_trop, n_fruity], index=['tropical', 'fruity'])
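As an alternative sketch, the same counts can be computed with the vectorized string method `Series.str.contains` instead of `map` with a lambda. The data here is made up (`toy_desc` is hypothetical). Note that either approach counts the number of descriptions containing the word, not the total number of occurrences, and that the match is case-sensitive.

```python
import pandas as pd

# Hypothetical toy descriptions, not taken from the wine dataset.
toy_desc = pd.Series(
    [
        "Tropical fruit aromas with a fruity finish.",  # capital T is not matched
        "Bright tropical notes of mango.",
        "Dry and earthy.",
        "A fruity red.",
    ]
)

# regex=False treats the pattern as a plain substring.
n_trop = toy_desc.str.contains("tropical", regex=False).sum()
n_fruity = toy_desc.str.contains("fruity", regex=False).sum()

toy_counts = pd.Series([n_trop, n_fruity], index=["tropical", "fruity"])
print(toy_counts)
```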

## 7.
We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand, so we'd like to translate the scores into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 counts as 2 stars, and any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a Series `star_ratings` with the number of stars corresponding to each review in the dataset.

In [None]:
star_ratings = ___

In [None]:
Hint: Begin by writing a custom function that accepts a row from the DataFrame as input and returns the star rating corresponding to the row. Then, use DataFrame.apply to apply the custom function to every row in the dataset.

Solution:

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings = reviews.apply(stars, axis='columns')
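Row-wise `apply` is easy to read but can be slow on large DataFrames. As a hedged alternative, the sketch below gets the same ratings with boolean masks on a made-up DataFrame (`toy_reviews` is hypothetical). It starts everyone at 1 star and then overwrites the rows that meet each rule, applying the 3-star rule last so it takes precedence.

```python
import pandas as pd

# Hypothetical toy reviews, not taken from the wine dataset.
toy_reviews = pd.DataFrame(
    {
        "country": ["Canada", "US", "Italy", "France"],
        "points": [82, 96, 85, 84],
    }
)

# Default rating: 1 star for every review.
toy_stars = pd.Series(1, index=toy_reviews.index)

# At least 85 points: 2 stars.
toy_stars.loc[toy_reviews.points >= 85] = 2

# 95 points or more, or any wine from Canada: 3 stars.
is_three = (toy_reviews.points >= 95) | (toy_reviews.country == "Canada")
toy_stars.loc[is_three] = 3

print(toy_stars.tolist())
```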