
# 1. Creating Data
- There are two core objects in pandas:
 1. the DataFrame
 2. the Series

 ## 1. DataFrame
 - A table containing an array of individual entries, each of which has a certain value and corresponds to a row (or record) and a column. One way to declare a new one is to pass `pd.DataFrame()` a dictionary whose keys are the column names and whose values are lists of entries.

 - The list of row labels used in a DataFrame is known as an Index. It can be set in the constructor with the `index` parameter.

 ## 2. Series
 - In essence, a single column of a DataFrame. You can assign row labels to a Series the same way as for a DataFrame, using the `index` parameter. A Series has no column names; instead it has one overall `name`.


 - Series and DataFrame are intimately related. It's helpful to think of a DataFrame as just a bunch of Series "glued together".




In [None]:
import pandas as pd
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Unnamed: 0,Yes,No
0,50,131
1,21,2


In [None]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']}, index=['Product A', 'Product B'])


Unnamed: 0,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.


In [None]:
pd.Series([1, 2, 3, 4, 5])

Unnamed: 0,0
0,1
1,2
2,3
3,4
4,5


In [None]:
pd.Series([30, 35, 40], index=['2015 Sales', '2016 Sales', '2017 Sales'], name='Product A')

Unnamed: 0,Product A
2015 Sales,30
2016 Sales,35
2017 Sales,40
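
The "glued together" picture can be made concrete with a small sketch. The two Series below hold made-up sales figures (the `Product B` numbers are purely illustrative), and `pd.concat` is used here as one possible way to place them side by side.

```python
import pandas as pd

# Two Series sharing the same row labels (the Product B figures are invented).
labels = ['2015 Sales', '2016 Sales', '2017 Sales']
product_a = pd.Series([30, 35, 40], index=labels, name='Product A')
product_b = pd.Series([12, 18, 25], index=labels, name='Product B')

# Placing them side by side gives a DataFrame; each Series name becomes a column name.
sales = pd.concat([product_a, product_b], axis='columns')

# Selecting a single column hands back a Series again.
column = sales['Product A']
print(type(column).__name__)
print(column.equals(product_a))
```

Going in both directions like this (Series to DataFrame and back) is what makes the two objects feel interchangeable in day-to-day use.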


## Reading data files

 - Creating a DataFrame or Series by hand is handy, but most of the time we will work with data that already exists.

 - Data can be stored in many different forms and formats. By far the most basic is the humble CSV file.

 - We can use the `shape` attribute to check how large the resulting DataFrame is. For example, the DataFrame loaded below has 129971 records split across 13 columns.

 - The `head()` method grabs the first five rows.

 - The `pd.read_csv()` function has over 30 optional parameters. For example, the CSV file read below has a built-in index, which pandas does not pick up automatically. To make pandas use that column as the index instead of creating a new one, we can specify `index_col`.
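
To see what `index_col` changes without downloading anything, here is a minimal sketch that reads a tiny in-memory CSV. The three rows are a hand-written stand-in for the real file, not an excerpt of it.

```python
import io
import pandas as pd

# A tiny CSV whose first (unnamed) column is a saved index.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n2,US,87\n"

# Without index_col, the saved index shows up as an extra data column.
without_index = pd.read_csv(io.StringIO(csv_text))
print(without_index.columns.tolist())

# With index_col=0, the first column becomes the row labels instead.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(with_index.columns.tolist())
```

The first frame has three columns (including one that pandas names `Unnamed: 0`), while the second has only `country` and `points`.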

In [None]:
import kagglehub

# Download latest version
wine_path = kagglehub.dataset_download("zynicide/wine-reviews")

print("Path to dataset files:", wine_path)

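# The first CSV column holds a saved index, so use it as the row labels.
wine_reviews = pd.read_csv(wine_path + "/winemag-data-130k-v2.csv", index_col=0)

# Only the last expression of a cell is displayed, so this line shows nothing here.
wine_reviews.shape

wine_reviews.head()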


Path to dataset files: /root/.cache/kagglehub/datasets/zynicide/wine-reviews/versions/4


Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


# Indexing, Selecting & Assigning

- Selecting specific values of a pandas DataFrame or Series to work on is an implicit step in almost any data operation you'll run, so one of the first things you need to learn in working with data in Python is how to go about selecting the data points relevant to you quickly and effectively.


## Native accessors
- Native Python objects provide good ways of indexing data. Pandas carries all of these over, which helps make it easy to start with.

- In Python, we can access the property of an object by accessing it as an attribute. A `book` object, for example, might have a title property, which we can access by calling `book.title`. Columns in a pandas DataFrame work in much the same way.

In [None]:
wine_reviews

Unnamed: 0,country
0,Italy
1,Portugal
...,...
129969,France
129970,France


In [None]:
wine_reviews.country

Unnamed: 0,country
0,Italy
1,Portugal
...,...
129969,France
129970,France


In [None]:
wine_reviews['country']

Unnamed: 0,country
0,Italy
1,Portugal
...,...
129969,France
129970,France


- Above are the two ways of selecting a Series out of a DataFrame. Neither is more or less syntactically valid than the other.

- The indexing operator `[]` has the advantage that it can handle column names with reserved characters in them, such as spaces. To drill down to a specific value, we need to use the indexing operator more than once.

In [None]:
wine_reviews['country'][12001]

'US'
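
As a small illustration of the reserved-character point above, the sketch below uses a hypothetical column called `unit price`, which is not part of the wine dataset. Attribute access cannot spell a name with a space in it, while `[]` handles it without trouble.

```python
import pandas as pd

# Hypothetical frame: one column name contains a space.
wines = pd.DataFrame({'country': ['Italy', 'US'], 'unit price': [15.0, 14.0]})

# Attribute access works for a simple name.
first_country = wines.country[0]

# The indexing operator also handles the name with a space.
second_price = wines['unit price'][1]

print(first_country, second_price)
```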

## Indexing in Pandas
- The indexing operator and attribute selection are nice because they work just like they do in the rest of the Python ecosystem. As a novice, this makes them easy to pick up and use. However, pandas has its own accessor operators, `loc` and `iloc`. For more advanced operations, these are the ones you're supposed to be using.



### Index-based Selection
- Pandas indexing works in one of two paradigms. The first is `index-based selection`: selecting data based on its numerical position in the data. `iloc` follows this paradigm.

- The line below selects the first row of data in a DataFrame. Both `loc` and `iloc` are row-first, column-second. This is the opposite of the native accessors shown earlier, which are column-first, row-second, meaning it's marginally easier to retrieve rows and marginally harder to retrieve columns.


In [None]:
wine_reviews.iloc[0]

Unnamed: 0,0
country,Italy
description,"Aromas include tropical fruit, broom, brimston..."
...,...
variety,White Blend
winery,Nicosia


- To get a column with `iloc`, we can do the following:

In [None]:
wine_reviews.iloc[:, 0]

Unnamed: 0,country
0,Italy
1,Portugal
...,...
129969,France
129970,France


- On its own, the `:` operator, which also comes from native Python, means "everything". When combined with other selectors, however, it can be used to indicate a range of values. For example, to select the country column from just the first, second, and third rows, we would do:

In [None]:
wine_reviews.iloc[:3, 0]

Unnamed: 0,country
0,Italy
1,Portugal
2,US


- Or, to select just the second and third entries, we would do:

In [None]:
wine_reviews.iloc[1:3, 0]

Unnamed: 0,country
1,Portugal
2,US


- It is also possible to pass a list:

In [None]:
wine_reviews.iloc[[0, 1, 34, 1209, 10101], 0]

Unnamed: 0,country
0,Italy
1,Portugal
34,US
1209,US
10101,France


- It's worth knowing that negative numbers can be used in selection. They count backwards from the end of the values. For example, here are the last five rows of the dataset.

In [None]:
wine_reviews.iloc[-5:]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit


### Label-based selection
- The second paradigm for attribute selection is the one followed by the `loc` operator: **label-based selection**. In this paradigm it is the data's index value that matters, not its position.

- E.g., to get the first entry in `wine_reviews` we would do the following:


In [None]:
wine_reviews.loc[0, 'country']

'Italy'

- `iloc` is conceptually simpler than `loc` because it ignores the dataset's indices. By contrast, `loc` uses the information in the indices to do its work.

- Since your dataset usually has meaningful indices, it's usually easier to do things using `loc` instead.

In [None]:
wine_reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]

Unnamed: 0,taster_name,taster_twitter_handle,points
0,Kerin O’Keefe,@kerinokeefe,87
1,Roger Voss,@vossroger,87
...,...,...,...
129969,Roger Voss,@vossroger,90
129970,Roger Voss,@vossroger,90
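
With the default `0, 1, 2, ...` index, positions and labels happen to coincide, which can hide the difference between the two paradigms. The sketch below uses a small made-up Series whose integer labels deliberately do not match their positions.

```python
import pandas as pd

# Made-up scores, labelled with integers that differ from their positions.
scores = pd.Series([87, 90, 92], index=[10, 20, 30], name='points')

by_position = scores.iloc[0]  # first element, whatever its label is
by_label = scores.loc[10]     # element whose label is 10
print(by_position, by_label)

# There is no row labelled 0, so label-based lookup raises a KeyError.
try:
    scores.loc[0]
except KeyError:
    print('no row labelled 0')
```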


### Choosing between `iloc` and `loc`

- `iloc` uses the Python stdlib indexing scheme, where the first element of the range is included and the last one excluded. So `0:10` will select entries `0,...,9`. `loc`, meanwhile, indexes inclusively. So `0:10` will select entries `0,...,10`.

- This is particularly confusing when the DataFrame index is a simple numerical list, e.g. `0,...,1000`. In this case `df.iloc[0:1000]` will return 1000 entries, while `df.loc[0:1000]` returns 1001 of them! To get 1000 elements using `loc`, you will need to go one lower and ask for `df.loc[0:999]`.
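
The off-by-one difference is easy to check on a toy frame. This is a minimal sketch with an invented 20-row DataFrame and the default integer index.

```python
import pandas as pd

# Toy frame with the default index 0..19.
df = pd.DataFrame({'value': range(20)})

by_position = df.iloc[0:10]  # end of the range excluded
by_label = df.loc[0:10]      # end of the range included

print(len(by_position), len(by_label))
```

The positional slice has 10 rows and the label slice has 11, because `loc` includes the row labelled `10`.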

### Manipulating the index
- The index is not immutable. We can manipulate it any way we see fit.
- This is useful when we can come up with an index for the dataset that is better than the current one.
- The `set_index()` method does the job. E.g., below is what happens when we call `set_index()` with the `title` field.

In [None]:
wine_reviews.set_index("title")

Unnamed: 0_level_0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,variety,winery
title,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
Nicosia 2013 Vulkà Bianco (Etna),Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,White Blend,Nicosia
Quinta dos Avidagos 2011 Avidagos Red (Douro),Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Portuguese Red,Quinta dos Avidagos
...,...,...,...,...,...,...,...,...,...,...,...,...
Domaine Marcel Deiss 2012 Pinot Gris (Alsace),France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Pinot Gris,Domaine Marcel Deiss
Domaine Schoffit 2012 Lieu-dit Harth Cuvée Caroline Gewurztraminer (Alsace),France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Gewürztraminer,Domaine Schoffit
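
Two consequences of `set_index()` are worth seeing on a small example: the result can be queried by the new labels with `loc`, and the original DataFrame is left unchanged unless the result is assigned back. The frame and wine titles below are invented for illustration.

```python
import pandas as pd

# Invented stand-in for the wine data.
wines = pd.DataFrame({
    'title': ['Wine A 2013', 'Wine B 2011'],
    'country': ['Italy', 'Portugal'],
    'points': [87, 88],
})

by_title = wines.set_index('title')

# Rows can now be looked up by title.
print(by_title.loc['Wine B 2011', 'country'])

# set_index() returned a new frame; the original still has its title column.
print('title' in wines.columns, 'title' in by_title.columns)
```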


## Conditional selection

- So far we've been indexing various strides of data using structural properties of the DataFrame itself. To do interesting things with the data, however, we often need to ask questions based on conditions.

- E.g., suppose we are interested in better-than-average wines produced in Italy. We start by checking whether each wine is from Italy or not.

In [None]:
wine_reviews.country == 'Italy'

Unnamed: 0,country
0,True
1,False
...,...
129969,False
129970,False


- The above operation produced a Series of `True`/`False` booleans based on the `country` of each record. This result can then be used inside of `loc` to select the relevant data:

In [None]:
wine_reviews.loc[wine_reviews.country == 'Italy']

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS
129962,Italy,"Blackberry, cassis, grilled herb and toasted a...",Sàgana Tenuta San Giacomo,90,40.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Cusumano 2012 Sàgana Tenuta San Giacomo Nero d...,Nero d'Avola,Cusumano


- We also wanted to know which ones are better than average. Wines are reviewed on an 80-to-100 point scale, so this could mean wines that accrued at least 90 points.

We can use the ampersand (&) to bring the two questions together:

In [None]:
wine_reviews.loc[(wine_reviews.country == 'Italy') & (wine_reviews.points >= 90)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
120,Italy,"Slightly backward, particularly given the vint...",Bricco Rocche Prapó,92,70.0,Piedmont,Barolo,,,,Ceretto 2003 Bricco Rocche Prapó (Barolo),Nebbiolo,Ceretto
130,Italy,"At the first it was quite muted and subdued, b...",Bricco Rocche Brunate,91,70.0,Piedmont,Barolo,,,,Ceretto 2003 Bricco Rocche Brunate (Barolo),Nebbiolo,Ceretto
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS
129962,Italy,"Blackberry, cassis, grilled herb and toasted a...",Sàgana Tenuta San Giacomo,90,40.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Cusumano 2012 Sàgana Tenuta San Giacomo Nero d...,Nero d'Avola,Cusumano


Suppose we'll buy any wine that is made in Italy or that is rated above average. For this we use a pipe (`|`):

In [None]:
wine_reviews.loc[(wine_reviews.country == 'Italy') | (wine_reviews.points >= 90)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit
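
The parentheses around each condition in the cells above are not decoration: in Python, `&` and `|` bind more tightly than comparisons such as `==` and `>=`. One way to sidestep the issue is to name each condition first, as in this sketch on an invented four-row frame.

```python
import pandas as pd

# Invented stand-in for the wine data.
wines = pd.DataFrame({
    'country': ['Italy', 'Italy', 'France', 'US'],
    'points': [87, 92, 91, 85],
})

# Each condition is a boolean Series with one entry per row.
is_italian = wines.country == 'Italy'
is_top_rated = wines.points >= 90

both = wines.loc[is_italian & is_top_rated]    # Italian AND at least 90 points
either = wines.loc[is_italian | is_top_rated]  # Italian OR at least 90 points

print(len(both), len(either))
```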


- Pandas comes with a few built-in conditional selectors, two of which we will highlight here.

- The first is `isin`. It lets you select data whose value "is in" a list of values. E.g., the first cell below uses it to select wines only from Italy or France.

- The second is `isnull` and its companion, `notnull`. These methods let you highlight values which are (or are not) empty (`NaN`). E.g., the second cell below uses `notnull` to keep only the Italian and French wines that have a price.

In [None]:
wine_reviews.loc[wine_reviews.country.isin(['Italy', 'France'])]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit


In [None]:
wine_reviews.loc[(wine_reviews.price.notnull()) & (wine_reviews.country.isin(['Italy', 'France']))]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
7,France,This dry and restrained wine offers spice in p...,,87,24.0,Alsace,Alsace,,Roger Voss,@vossroger,Trimbach 2012 Gewurztraminer (Alsace),Gewürztraminer,Trimbach
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit
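
Because `isnull` and `notnull` return boolean Series, they can be counted as well as used for filtering. The sketch below shows both on an invented three-row frame in which one price is missing.

```python
import pandas as pd

# Invented stand-in: the first wine has no price.
wines = pd.DataFrame({
    'country': ['Italy', 'France', 'US'],
    'price': [None, 24.0, 14.0],
})

# True counts as 1 when summed, so this is the number of missing prices.
missing_count = wines.price.isnull().sum()

# Keep only the rows that do have a price.
priced = wines.loc[wines.price.notnull()]

print(missing_count, len(priced))
```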


## Assigning data

- Assigning data to a DataFrame is easy. You can assign either a constant value:


In [None]:
wine_reviews['critic'] = 'everyone'
wine_reviews['critic']

Unnamed: 0,critic
0,everyone
1,everyone
...,...
129969,everyone
129970,everyone


- Or an iterable of values:

In [None]:
wine_reviews['index_backwards'] = range(len(wine_reviews), 0, -1)
wine_reviews['index_backwards']

Unnamed: 0,index_backwards
0,129971
1,129970
...,...
129969,2
129970,1
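
One detail the two cases above imply: a constant is broadcast to every row, whereas an iterable has to supply exactly one value per row. A minimal sketch on an invented three-row frame (the `too_short` column name is made up for the demonstration):

```python
import pandas as pd

wines = pd.DataFrame({'country': ['Italy', 'Portugal', 'US']})

# A constant is repeated for every row.
wines['critic'] = 'everyone'

# An iterable must have the same length as the DataFrame.
wines['index_backwards'] = range(len(wines), 0, -1)

# Two values for three rows is rejected with a ValueError.
try:
    wines['too_short'] = [1, 2]
except ValueError as err:
    print('assignment rejected:', err)

print(wines.columns.tolist())
```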


# Summary Functions and Maps

- Data does not always come out of memory in the format we want it in right off the bat. Sometimes we have to reformat it for the task at hand.


## Summary functions
- Pandas provides many simple "summary functions" which restructure the data in some useful way. For example, consider `describe()`.
- This method generates a high-level summary of the attributes of the given column.


In [None]:
wine_reviews.points.describe()

Unnamed: 0,points
count,129971.000000
mean,88.447138
...,...
75%,91.000000
max,100.000000


 - It is type-aware, meaning that its output changes based on the data type of the input. The output above only makes sense for numerical data; for string data we get the following:

In [None]:
wine_reviews.taster_name.describe()

Unnamed: 0,taster_name
count,103727
unique,19
top,Roger Voss
freq,25514


- If you want to get some particular simple summary statistic about a column in a DataFrame or a Series, there is usually a helpful pandas function that makes it happen.

- For example:
1. to see the mean of the points allotted, we can use `mean()`
2. to see a list of unique values, we can use `unique()`
3. to see a list of unique values and how often they occur in the dataset, we can use `value_counts()`

In [None]:
wine_reviews.points.mean()

88.44713820775404

In [None]:
wine_reviews.taster_name.unique()



In [None]:
wine_reviews.taster_name.value_counts()

Unnamed: 0_level_0,count
taster_name,Unnamed: 1_level_1
Roger Voss,25514
Michael Schachner,15134
...,...
Fiona Adams,27
Christina Pickard,6


## Maps
- A **map** is a function that takes one set of values and "maps" them to another set of values.

- In data science, we often have a need for creating new representations from existing data OR for transforming data from the format it is in now to the format that we want it to be in later.

- There are two mapping methods that you will use often.

- `map()` is the first, and slightly simpler, one. For example, suppose that we wanted to remean the scores the wines received to 0. We can do this as follows:




In [None]:
wine_review_points_mean = wine_reviews.points.mean()
wine_reviews.points.map(lambda p: p - wine_review_points_mean)

Unnamed: 0,points
0,-1.447138
1,-1.447138
...,...
129969,1.552862
129970,1.552862


- `apply()` is the equivalent method if we want to transform a whole DataFrame by calling a custom method on each row.

In [None]:
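def remean_points(row):
    # Shift this row's score so that the dataset mean becomes 0.
    row.points = row.points - wine_review_points_mean
    return row

# axis='columns' passes one row at a time to the function.
wine_reviews.apply(remean_points, axis='columns')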


Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,-1.447138,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,everyone,129971
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,-1.447138,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos,everyone,129970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,1.552862,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss,everyone,2
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,1.552862,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit,everyone,1


- If we had called `wine_reviews.apply()` with `axis='index'`, then instead of passing a function to transform each row, we would need to give a function to transform each column.

- Note that `map()` and `apply()` return new, transformed Series and DataFrames, respectively. They don't modify the original data they're called on. If we look at the first row of `wine_reviews`, we can see that it still has its original `points` value.

In [None]:
wine_reviews.head(1)

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,everyone,129971
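
The `axis='index'` case mentioned above is not shown in the notebook, so here is a minimal sketch of it on an invented numeric frame. The function receives one whole column at a time, and, as with the row-wise version, the original data is left untouched.

```python
import pandas as pd

# Invented numeric frame; both columns are centred around their own mean.
scores = pd.DataFrame({'points': [87, 90, 93], 'price': [15.0, 30.0, 45.0]})

# With axis='index' the function is called once per column (a Series).
centered = scores.apply(lambda column: column - column.mean(), axis='index')

print(centered.points.tolist())
print(centered.price.tolist())

# apply() returned a new DataFrame; the original values are unchanged.
print(scores.points.tolist())
```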
