**This notebook is an exercise in the [Pandas](https://www.kaggle.com/learn/pandas) course.  You can reference the tutorial at [this link](https://www.kaggle.com/residentmario/summary-functions-and-maps).**

---


# Introduction

Now you are ready to get a deeper understanding of your data.

Run the following cell to load your data and some utility functions (including code to check your answers).

In [1]:
import pandas as pd
pd.set_option("display.max_rows", 5)
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

from learntools.core import binder; binder.bind(globals())
from learntools.pandas.summary_functions_and_maps import *
print("Setup complete.")

reviews.head()

Setup complete.


| | country | description | designation | points | price | province | region_1 | region_2 | taster_name | taster_twitter_handle | title | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Italy | Aromas include tropical fruit, broom, brimston... | Vulkà Bianco | 87 | | Sicily & Sardinia | Etna | | Kerin O’Keefe | @kerinokeefe | Nicosia 2013 Vulkà Bianco (Etna) | White Blend | Nicosia |
| 1 | Portugal | This is ripe and fruity, a wine that is smooth... | Avidagos | 87 | 15.0 | Douro | | | Roger Voss | @vossroger | Quinta dos Avidagos 2011 Avidagos Red (Douro) | Portuguese Red | Quinta dos Avidagos |
| 2 | US | Tart and snappy, the flavors of lime flesh and... | | 87 | 14.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Rainstorm 2013 Pinot Gris (Willamette Valley) | Pinot Gris | Rainstorm |
| 3 | US | Pineapple rind, lemon pith and orange blossom ... | Reserve Late Harvest | 87 | 13.0 | Michigan | Lake Michigan Shore | | Alexander Peartree | | St. Julian 2013 Reserve Late Harvest Riesling ... | Riesling | St. Julian |
| 4 | US | Much like the regular bottling from 2012, this... | Vintner's Reserve Wild Child Block | 87 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Paul Gregutt | @paulgwine | Sweet Cheeks 2012 Vintner's Reserve Wild Child... | Pinot Noir | Sweet Cheeks |


# Exercises

## 1.

What is the median of the `points` column in the `reviews` DataFrame?

In [2]:
median_points = reviews['points'].median()

# Check your answer
q1.check()

<span style="color:#33cc33">Correct</span>

In [3]:
#q1.hint()
#q1.solution()
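
As a minimal sketch of what `median()` computes, the snippet below uses a small made-up Series (`toy_points` is a hypothetical stand-in, not data from the wine reviews) and compares the median with the mean.

```python
import pandas as pd

# Hypothetical stand-in for reviews['points']; the values are made up
toy_points = pd.Series([87, 87, 90, 92, 85])

# median() takes the middle value of the sorted data: 85, 87, 87, 90, 92
toy_median = toy_points.median()

# mean() is the arithmetic average and reacts more strongly to outliers
toy_mean = toy_points.mean()

print(toy_median, toy_mean)
```

With an even number of values, `median()` would instead average the two middle values.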

## 2. 
What countries are represented in the dataset? (Your answer should not include any duplicates.)

In [4]:
countries = reviews['country'].unique()

# Check your answer
q2.check()

<span style="color:#33cc33">Correct</span>

In [5]:
#q2.hint()
#q2.solution()
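
One detail worth checking on a small example: `unique()` keeps a missing value as one of its entries, while `nunique()` skips missing values by default. The sketch below assumes a made-up Series (`toy_country` is hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for reviews['country'], including one missing value
toy_country = pd.Series(['Italy', 'US', 'US', None, 'Italy'])

# unique() returns distinct values in order of first appearance,
# and the missing value counts as one of them
toy_unique = toy_country.unique()

# nunique() counts distinct values and excludes missing ones by default
n_distinct = toy_country.nunique()

print(toy_unique, n_distinct)
```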

## 3.
How often does each country appear in the dataset? Create a Series `reviews_per_country` mapping countries to the count of reviews of wines from that country.

In [12]:
reviews_per_country = reviews['country'].value_counts()

# Check your answer
q3.check()

<span style="color:#33cc33">Correct</span>

In [13]:
#q3.hint()
#q3.solution()
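
To see the shape of the result, here is a small sketch on made-up data (`toy_country` is hypothetical). `value_counts()` returns a Series indexed by the distinct values, sorted from most to least frequent, and `normalize=True` turns the counts into shares.

```python
import pandas as pd

# Hypothetical stand-in for reviews['country']
toy_country = pd.Series(['US', 'Italy', 'US', 'France', 'US', 'Italy'])

# Counts per distinct value, most frequent first
toy_counts = toy_country.value_counts()

# Same breakdown expressed as a fraction of all rows
toy_share = toy_country.value_counts(normalize=True)

print(toy_counts)
print(toy_share)
```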

## 4.
Create a variable `centered_price` containing a version of the `price` column with the mean price subtracted.

(Note: this 'centering' transformation is a common preprocessing step before applying various machine learning algorithms.) 

In [15]:
reviews.columns

Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'taster_name', 'taster_twitter_handle', 'title',
       'variety', 'winery'],
      dtype='object')

In [16]:
centered_price = reviews['price'] - reviews['price'].mean()

# Check your answer
q4.check()

<span style="color:#33cc33">Correct</span>

In [17]:
#q4.hint()
#q4.solution()
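
The sketch below walks through centering on a made-up price column (`toy_price` is hypothetical). It illustrates two things: `mean()` skips missing values by default, and the centered values average to roughly zero while missing entries stay missing.

```python
import pandas as pd

# Hypothetical stand-in for reviews['price'], with one missing price
toy_price = pd.Series([float('nan'), 15.0, 14.0, 13.0, 65.0])

# mean() ignores NaN by default: (15 + 14 + 13 + 65) / 4
toy_mean = toy_price.mean()

# Subtracting a scalar from a Series applies it to every element;
# NaN minus a number is still NaN
toy_centered = toy_price - toy_mean

print(toy_mean)
print(toy_centered)
```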

## 5.
I'm an economical wine buyer. Which wine is the "best bargain"? Create a variable `bargain_wine` with the title of the wine with the highest points-to-price ratio in the dataset.

In [18]:
reviews['points']

0         87
1         87
          ..
129969    90
129970    90
Name: points, Length: 129971, dtype: int64

In [19]:
reviews['price']

0          NaN
1         15.0
          ... 
129969    32.0
129970    21.0
Name: price, Length: 129971, dtype: float64

In [20]:
reviews['points']/reviews['price']

0              NaN
1         5.800000
            ...   
129969    2.812500
129970    4.285714
Length: 129971, dtype: float64

In [30]:
help((reviews['points']/reviews['price']).idxmax)

Help on method idxmax in module pandas.core.series:

idxmax(axis: 'Axis' = 0, skipna: 'bool' = True, *args, **kwargs) -> 'Hashable' method of pandas.core.series.Series instance
    Return the row label of the maximum value.
    
    If multiple values equal the maximum, the first row label with that
    value is returned.
    
    Parameters
    ----------
    axis : {0 or 'index'}
        Unused. Parameter needed for compatibility with DataFrame.
    skipna : bool, default True
        Exclude NA/null values. If the entire Series is NA, the result
        will be NA.
    *args, **kwargs
        Additional arguments and keywords have no effect but might be
        accepted for compatibility with NumPy.
    
    Returns
    -------
    Index
        Label of the maximum value.
    
    Raises
    ------
    ValueError
        If the Series is empty.
    
    See Also
    --------
    numpy.argmax : Return indices of the maximum values
        along the given axis.
    DataFrame.idxmax :

In [31]:
# Find the index label of the wine with the highest points-to-price ratio.
# idxmax() returns the index label of the maximum value in the object.
bargain_idx = (reviews['points']/reviews['price']).idxmax()

# Extract the 'title' value from the row with the highest ratio found above
bargain_wine = reviews.loc[bargain_idx, 'title']

# Check your answer
q5.check()

<span style="color:#33cc33">Correct</span>

In [32]:
#q5.hint()
#q5.solution()
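
As the `idxmax` help text notes, only the first label is returned when several rows share the maximum. The sketch below, on a made-up frame (`toy_reviews` and its wine titles are hypothetical), shows one possible way to list every tied row with a boolean mask.

```python
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame
toy_reviews = pd.DataFrame({
    'title': ['Wine A', 'Wine B', 'Wine C', 'Wine D'],
    'points': [87, 88, 88, 86],
    'price': [float('nan'), 4.0, 4.0, 20.0],
})

# Rows with a missing price produce NaN, which idxmax() skips by default
toy_ratio = toy_reviews['points'] / toy_reviews['price']

# First row label that reaches the maximum ratio
first_best = toy_reviews.loc[toy_ratio.idxmax(), 'title']

# Every row that reaches the maximum ratio (NaN never compares equal)
all_best = toy_reviews.loc[toy_ratio == toy_ratio.max(), 'title']

print(first_best)
print(all_best.tolist())
```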

## 6.
There are only so many words you can use when describing a bottle of wine. Is a wine more likely to be "tropical" or "fruity"? Create a Series `descriptor_counts` counting how many times each of these two words appears in the `description` column in the dataset. (For simplicity, let's ignore the capitalized versions of these words.)

In [38]:
reviews['description']

0         Aromas include tropical fruit, broom, brimston...
1         This is ripe and fruity, a wine that is smooth...
                                ...                        
129969    A dry style of Pinot Gris, this is crisp with ...
129970    Big, rich and off-dry, this is powered by inte...
Name: description, Length: 129971, dtype: object

In [None]:
# Loop-based count of descriptions that mention 'tropical'
n_trop_loop = 0
for sentence in reviews['description']:
    if 'tropical' in sentence:
        n_trop_loop += 1

In [47]:
n_trop = reviews['description'].map(lambda x : 'tropical' in x).sum()
n_trop

3607

In [52]:
n_trop = reviews['description'].map(lambda x : 'tropical' in x).sum()
n_fru = reviews['description'].map(lambda x : 'fruity' in x).sum()
descriptor_counts = pd.Series([n_trop, n_fru], index=['tropical', 'fruity'])

# Check your answer
q6.check()

<span style="color:#33cc33">Correct</span>

In [53]:
#q6.hint()
#q6.solution()
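
Note that `'word' in x` counts the descriptions that contain the word, not the total number of occurrences. As an alternative sketch (not the course's solution), the same per-description count can be written with `Series.str.contains`, and `Series.str.count` gives total mentions. The data below is made up (`toy_desc` is hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for reviews['description']
toy_desc = pd.Series([
    'Aromas include tropical fruit',
    'This is ripe and fruity',
    'Tropical and fruity, very fruity',
    'Crisp and dry',
])

words = ['tropical', 'fruity']

# Number of descriptions containing each word (case-sensitive, literal match)
toy_descriptor_counts = pd.Series(
    {word: toy_desc.str.contains(word, regex=False).sum() for word in words}
)

# Total number of times 'fruity' appears, counting repeats within a description
toy_total_mentions = toy_desc.str.count('fruity').sum()

print(toy_descriptor_counts)
print(toy_total_mentions)
```

Both approaches match substrings, so a search for 'fruit' would also match 'fruity'.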

## 7.
We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a series `star_ratings` with the number of stars corresponding to each review in the dataset.

In [66]:
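star = []

# Walk the two columns in parallel instead of indexing row by row
for country, points in zip(reviews['country'], reviews['points']):
    if country == 'Canada':
        star.append(3)  # Canadian wines always get 3 stars
    elif points >= 95:
        star.append(3)
    elif points >= 85:
        star.append(2)
    else:
        star.append(1)  # every remaining score is 1 star

# Reuse the original index so the result lines up with `reviews`
star_ratings = pd.Series(star, index=reviews.index)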

# Check your answer
q7.check()

<span style="color:#33cc33">Correct</span>

In [67]:
# Model answer
def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings = reviews.apply(stars, axis='columns')

In [69]:
#q7.hint()
#q7.solution()
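
Row-wise `apply` and explicit loops can be slow on a dataset of this size. As one possible vectorized alternative (an assumption on my part, not the course's solution), the same rules can be expressed with `numpy.select`, which picks the choice for the first condition that holds. The frame below is made up (`toy_reviews` is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reviews DataFrame
toy_reviews = pd.DataFrame({
    'country': ['Canada', 'US', 'Italy', 'US'],
    'points': [82, 96, 88, 80],
})

# Conditions are checked in order, so the Canada rule takes priority
conditions = [
    toy_reviews['country'] == 'Canada',
    toy_reviews['points'] >= 95,
    toy_reviews['points'] >= 85,
]
choices = [3, 3, 2]

# Rows matching no condition fall back to 1 star
toy_stars = pd.Series(
    np.select(conditions, choices, default=1),
    index=toy_reviews.index,
)

print(toy_stars.tolist())
```

Passing `index=toy_reviews.index` keeps the result aligned with the source rows even when the frame does not use a default 0..n-1 index.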

# Keep going
Continue to **[grouping and sorting](https://www.kaggle.com/residentmario/grouping-and-sorting)**.

---




*Have questions or comments? Visit the [course discussion forum](https://www.kaggle.com/learn/pandas/discussion) to chat with other learners.*