In [19]:
import pandas as pd

# **Creating, Reading and Writing**

## Creating data

There are two core objects in pandas: the DataFrame and the Series.

### 1. DataFrame

A DataFrame is a table. It contains an array of individual entries, each of which has a certain value. Each entry corresponds to a row (or record) and a column.

For example, consider the following simple DataFrame:


In [20]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Unnamed: 0,Yes,No
0,50,131
1,21,2


In [21]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 'Sue': ['Pretty good.', 'Bland.']})

Unnamed: 0,Bob,Sue
0,I liked it.,Pretty good.
1,It was awful.,Bland.


The list of row labels used in a DataFrame is known as an Index. We can assign values to it by using an index parameter in our constructor:

In [22]:
pd.DataFrame({'Bob': ['I liked it.', 'It was awful.'], 
              'Sue': ['Pretty good.', 'Bland.']},
             index=['Product A', 'Product B'])

Unnamed: 0,Bob,Sue
Product A,I liked it.,Pretty good.
Product B,It was awful.,Bland.


### 2. Series

A Series, by contrast, is a sequence of data values. If a DataFrame is a table, a Series is a list. And in fact you can create one with nothing more than a list:

In [23]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

## Reading data files

Being able to create a DataFrame or Series by hand is handy. But most of the time we won't be creating our own data by hand. Instead, we'll be working with data that already exists, most commonly in a CSV file, which pd.read_csv() reads into a DataFrame.

In [24]:
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv")

In [25]:
reviews.shape

(129971, 14)

In [26]:
reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


### Index Column

If a CSV file already has a meaningful index column, pass index_col to read_csv() so that pandas uses that column as the index instead of creating a new one.

In [27]:
reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)
reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


## Selecting Data in Pandas

### Introduction
Selecting specific rows and columns is a key step in almost every pandas task.  
Pandas gives you fast ways to select the exact data you need from a **DataFrame** or **Series**.


### Setup
- Import pandas using the common alias `pd`
- Load the dataset using `read_csv()`
- Set display options if needed (optional)

### Native Accessors (Basic Column Selection)
Pandas supports Python-style column access:

- **Attribute access**
  - Example: `df.country`
  - Works only when the column name is a valid Python identifier (no spaces/special chars) and does not clash with an existing DataFrame attribute or method

- **Bracket access**
  - Example: `df['country']`
  - Works for all column names (recommended)

Both methods return a **Series**.
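
As a minimal sketch of the difference between the two accessors, the toy DataFrame below (its column names and values are made up for illustration) has one column whose name contains a space, so only bracket access can reach it:

```python
import pandas as pd

# Toy data; column names and values are illustrative only.
df = pd.DataFrame({
    'country': ['Italy', 'US'],
    'taster name': ['Kerin', 'Paul'],   # name contains a space
})

by_attribute = df.country        # works: 'country' is a valid identifier
by_bracket = df['country']       # same Series as above
with_space = df['taster name']   # attribute access is not possible here

print(by_attribute.equals(by_bracket))
print(type(with_space).__name__)
```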



### Accessing a Single Value
To get one value:
- Select a column (Series)
- Then select a row

Example idea:
- `df['country'][0]` → value at row 0 in the `country` column

## Pandas Indexing Tools

### Two Main Accessors
- **`iloc`** : index-based selection (by position)
- **`loc`**  : label-based selection (by index label)

**Rule:** Both are **row-first, column-second**.

### 1. `iloc` (Index-Based / Position-Based)
Use `iloc` when selecting by numerical position.

- First row: `df.iloc[0]`
- First column: `df.iloc[:, 0]`
- First 3 rows of first column: `df.iloc[:3, 0]`
- Specific rows (list): `df.iloc[[0, 1, 2], 0]`
- Last 5 rows: `df.iloc[-5:]`

**Slicing rule (Python style):**
- `0:3` means `0,1,2` (end is excluded)

### 2. `loc` (Label-Based)
Use `loc` when selecting by index labels and column names.

- Single value: `df.loc[0, 'country']`
- Multiple columns: `df.loc[:, ['taster_name', 'points']]`

**Slicing rule (inclusive):**
- `0:3` means `0,1,2,3` (end is included)

### `loc` vs `iloc` (Important Difference)
- `df.iloc[0:3]` → rows `0,1,2`
- `df.loc[0:3]`  → rows `0,1,2,3`

⚠️ This is a common exam trap.
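
A small sketch of the slicing difference, using a toy DataFrame with the default integer index (the values are made up). The same `0:3` slice returns three rows with `iloc` and four with `loc`:

```python
import pandas as pd

# Toy data with the default index 0..4.
df = pd.DataFrame({'points': [87, 87, 90, 91, 85]})

by_position = df.iloc[0:3]   # positions 0, 1, 2 (end excluded)
by_label = df.loc[0:3]       # labels 0, 1, 2, 3 (end included)

print(len(by_position), len(by_label))  # 3 4
```

The inclusive rule is easier to appreciate with non-numeric labels: a label slice such as `'a':'c'` would otherwise need the label that comes after `'c'` to be spelled out.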

## Index Manipulation

### Changing the Index
The DataFrame index is not fixed. You can change it to a better column using:

- `set_index('title')`

This can make label-based selection clearer and more meaningful.
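
For illustration, a minimal sketch with made-up titles shows how `set_index()` returns a new DataFrame whose rows can then be looked up by title with `loc`:

```python
import pandas as pd

# Toy data; titles and scores are illustrative only.
df = pd.DataFrame({
    'title': ['Wine A', 'Wine B', 'Wine C'],
    'points': [87, 90, 92],
})

by_title = df.set_index('title')          # df itself is left unchanged
score = by_title.loc['Wine B', 'points']  # label-based lookup by title
print(score)  # 90
```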

## Conditional Selection (Filtering)

Conditional selection keeps only the rows that satisfy a test, such as the country of each record being Italy. The test itself produces a Series of True/False booleans, one per row.

### Creating a Condition
A condition produces a Boolean Series (`True/False`), e.g.:
- `df.country == 'Italy'`

### Filtering with `loc`
Use the condition inside `loc`:

- `df.loc[df.country == 'Italy']`

### Combining Conditions
Use:
- `&` for AND
- `|` for OR

⚠️ Always use parentheses around each condition.

Examples:
- Italy AND points >= 90  
  `(df.country == 'Italy') & (df.points >= 90)`

- Italy OR points >= 90  
  `(df.country == 'Italy') | (df.points >= 90)`


### Useful Conditional Methods
- **`isin([...])`**  
  Select values inside a list  
  Example idea: `df.country.isin(['Italy', 'France'])`

- **`isnull()` / `notnull()`**  
  Find missing values (NaN) or non-missing values  
  Example idea: `df.price.notnull()`
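
The filters described above can be sketched on a four-row toy DataFrame (all values are made up), which makes it easy to check by eye which rows each condition keeps:

```python
import pandas as pd
import numpy as np

# Toy data; values are illustrative only.
df = pd.DataFrame({
    'country': ['Italy', 'France', 'US', 'Italy'],
    'points': [92, 88, 95, 85],
    'price': [30.0, np.nan, 65.0, 12.0],
})

# Each condition is wrapped in parentheses before combining.
italy_and_90 = df.loc[(df.country == 'Italy') & (df.points >= 90)]
italy_or_90 = df.loc[(df.country == 'Italy') | (df.points >= 90)]

# Membership test and missing-value test.
italy_or_france = df.loc[df.country.isin(['Italy', 'France'])]
priced = df.loc[df.price.notnull()]

print(len(italy_and_90), len(italy_or_90), len(italy_or_france), len(priced))  # 1 3 3 3
```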


## Assigning Data (Creating New Columns)

### Assign a Constant
- Add the same value to every row in a new column

Example idea:
- `df['critic'] = 'everyone'`

### Assign a Sequence
- Add changing values using an iterable like `range()`

Example idea:
- `df['index_backwards'] = range(len(df), 0, -1)`
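
Both kinds of assignment can be tried on a tiny DataFrame first. The sketch below uses made-up rows; the column names mirror the examples above:

```python
import pandas as pd

# Toy data; values are illustrative only.
df = pd.DataFrame({'country': ['Italy', 'France', 'US']})

df['critic'] = 'everyone'                      # constant repeated on every row
df['index_backwards'] = range(len(df), 0, -1)  # one value per row: 3, 2, 1

print(df)
```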

## Quick Tip
After selecting or changing data, preview results using:
- `df.head()`


In [28]:
# Set display options for better readability
pd.set_option('display.max_rows', 5)

# Display the dataset
reviews

# ===== NATIVE ACCESSORS =====
# Access country column using attribute
reviews.country

# Access country column using bracket notation (recommended)
reviews['country']

# Get a single value from the Series
reviews['country'][0]

# ===== INDEX-BASED SELECTION (iloc) =====
# Select first row
reviews.iloc[0]

# Select first column (all rows)
reviews.iloc[:, 0]

# Select first 3 rows of first column
reviews.iloc[:3, 0]

# Select rows 1-2 (note: end is excluded)
reviews.iloc[1:3, 0]

# Select specific rows using a list
reviews.iloc[[0, 1, 2], 0]

# Select last 5 rows
reviews.iloc[-5:]

# ===== LABEL-BASED SELECTION (loc) =====
# Get single value by label
reviews.loc[0, 'country']

# Select multiple columns
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]

# ===== CONDITIONAL SELECTION =====
# Check which wines are from Italy
reviews.country == 'Italy'

# Filter wines from Italy
reviews.loc[reviews.country == 'Italy']

# Filter wines from Italy AND rated >= 90 points (AND operator: &)
reviews.loc[(reviews.country == 'Italy') & (reviews.points >= 90)]

# Filter wines from Italy OR rated >= 90 points (OR operator: |)
reviews.loc[(reviews.country == 'Italy') | (reviews.points >= 90)]

# Filter wines from Italy or France using isin()
reviews.loc[reviews.country.isin(['Italy', 'France'])]

# Filter wines with prices (remove NaN values)
reviews.loc[reviews.price.notnull()]

# Filter wines without prices (find NaN values)
reviews.loc[reviews.price.isnull()]

# ===== INDEX MANIPULATION =====
# Set the title as the index
reviews_by_title = reviews.set_index('title')
reviews_by_title.head()

# ===== ASSIGNING DATA =====
# Assign constant value to new column
reviews['critic'] = 'everyone'

# Assign sequence values (reverse index)
reviews['index_backwards'] = range(len(reviews), 0, -1)

# Preview the new columns
reviews[['critic', 'index_backwards']].head()

# ===== PRACTICAL EXAMPLES =====
# 1. Count wines by country
reviews['country'].value_counts()

# 2. Find average points by country
reviews.groupby('country')['points'].mean().sort_values(ascending=False).head()

# 3. Find most expensive wines
reviews.nlargest(5, 'price')[['title', 'price', 'country']]

# 4. Find wines with highest points
reviews.nlargest(5, 'points')[['title', 'points', 'country']]

# 5. Find missing prices per country
reviews[reviews['price'].isnull()].groupby('country').size().sort_values(ascending=False).head()

country
France      4317
Italy       2626
Portugal     816
Austria      546
US           239
dtype: int64

---


# **Summary Functions**

## Intro

After selecting data, we often need to reformat or transform it before analysis.
This lesson covers common operations to make your data “fit” the task.

## Summary Functions

Pandas provides quick functions to summarize a Series/column.

Common ones:

- `describe()` → overall summary (changes based on data type)
- `mean()` → average (numeric)
- `unique()` → distinct values
- `value_counts()` → distinct values + frequency
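
A minimal sketch of the four functions on two short, made-up Series (one numeric, one text) shows how `describe()` changes with the data type:

```python
import pandas as pd

# Toy data; names and scores are illustrative only.
tasters = pd.Series(['Roger', 'Paul', 'Roger', 'Kerin', 'Roger'])
points = pd.Series([87, 90, 88, 91, 84])

print(points.describe())       # count, mean, std, min, quartiles, max
print(tasters.describe())      # count, unique, top, freq
print(points.mean())           # 88.0
print(tasters.unique())        # distinct names
print(tasters.value_counts())  # how often each name appears
```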

In [29]:
# 1. Describe() - Get overall statistics
# reviews['points'].describe()

reviews['taster_name'].describe()

# 2. Mean - Average points
# reviews['points'].mean()

# # 3. Unique - Distinct values
# reviews['taster_name'].unique()

# # 4. Value counts - Frequency of each value
# reviews['taster_name'].value_counts()

count         103727
unique            19
top       Roger Voss
freq           25514
Name: taster_name, dtype: object

## Maps

A map transforms a set of values into another set of values, one value at a time.
Maps are used to create new representations of the data or to transform existing columns. They return a new object and do not modify the original data unless you assign the result back.

Two key methods:

- `map()` → transforms each value in a Series
- `apply()` → transforms rows or columns in a DataFrame

### map()

Applies a function to each element in a Series and returns a new Series.
Best for simple, element-wise changes.

In [30]:
# Example 1: Re-center points around mean
review_points_mean = reviews['points'].mean()
reviews['points'].map(lambda p: p - review_points_mean).head()

# Example 2: Create price category using map
# Note: a missing price (NaN) fails both comparisons, so it is labeled 'Budget'
price_category = reviews['price'].map(lambda p: 'Expensive' if p > 50 else 'Affordable' if p > 20 else 'Budget')
price_category.head(100)

0         Budget
1         Budget
         ...    
98    Affordable
99     Expensive
Name: price, Length: 100, dtype: str

### apply()

Applies a function to each row or column in a DataFrame.

- axis='columns' → apply to each row
- axis='index' → apply to each column

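Returns a new Series when the function returns a single value per row or column (as in the example below), or a new DataFrame when it returns a Series.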

In [31]:
# Example: Apply function to each row
def get_wine_info(row):
    return f"{row['title']} from {row['country']} - {row['points']} points"

reviews.apply(get_wine_info, axis='columns').head()

0    Nicosia 2013 Vulkà Bianco  (Etna) from Italy -...
1    Quinta dos Avidagos 2011 Avidagos Red (Douro) ...
2    Rainstorm 2013 Pinot Gris (Willamette Valley) ...
3    St. Julian 2013 Reserve Late Harvest Riesling ...
4    Sweet Cheeks 2012 Vintner's Reserve Wild Child...
dtype: str

## Vectorized Operations

Many transformations can be done faster using direct operations like:

- Series - number
- Series + Series

Pandas automatically applies the operation across all rows (broadcasting).

In [32]:
# Example 1: Subtract mean from all points
reviews['points'] - review_points_mean

# Example 2: Concatenate country and region
reviews['country'] + " - " + reviews['region_1']

# Example 3: Create quality rating based on points
# (apply() with a lambda runs element by element; it is not a vectorized operation)
quality = reviews['points'].apply(lambda x: 
    'Outstanding' if x >= 95 else
    'Excellent' if x >= 90 else
    'Very Good' if x >= 85 else
    'Good' if x >= 80 else
    'Average'
)
quality.head(100)

0     Very Good
1     Very Good
        ...    
98    Very Good
99    Very Good
Name: points, Length: 100, dtype: str
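
To confirm that the direct operation and the `map()` version agree, here is a minimal sketch on a three-value toy Series (the values are made up):

```python
import pandas as pd

# Toy data; values are illustrative only.
points = pd.Series([87, 90, 93])
mean = points.mean()  # 90.0

via_map = points.map(lambda p: p - mean)  # calls the lambda once per element
via_vector = points - mean                # one operation over the whole Series

print(via_map.equals(via_vector))  # True
```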

# **Grouping and Sorting**

When simple column-wise transformations aren’t enough, we group data and apply operations per group.
This lesson covers groupby, multi-index, and sorting.

## Groupwise Analysis - groupby

groupby() splits data into groups based on column values, then applies summary functions to each group.

Common uses:

- Count per group
- Min/Max per group
- Aggregations per category

### groupby() + Summary Functions

After grouping, you can apply functions like:

- count()
- min(), max()
- mean()
- Custom logic with apply()

value_counts() is a shortcut for a simple groupby().count() pattern.
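
As a rough check of that equivalence, the sketch below counts made-up taster names both ways. The counts match; the ordering differs, because `groupby()` sorts by group key while `value_counts()` sorts by frequency:

```python
import pandas as pd

# Toy data; names are illustrative only.
df = pd.DataFrame({'taster_name': ['Roger', 'Paul', 'Roger', 'Kerin', 'Roger']})

via_groupby = df.groupby('taster_name').taster_name.count()
via_value_counts = df.taster_name.value_counts()

print(via_groupby)       # ordered by name
print(via_value_counts)  # ordered by count, largest first
```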

### Grouping by Multiple Columns

You can group by more than one column to get more detailed summaries (e.g., by country and province).

### Aggregation with agg()

agg() lets you compute multiple statistics at once for each group (e.g., count, min, max).
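
As an alternative sketch (not the form used in this notebook's cells), `agg()` also accepts function names as strings. The toy data below is made up and includes one missing price to show that `'size'` counts every row while `'count'` skips NaN:

```python
import pandas as pd
import numpy as np

# Toy data; values are illustrative only.
df = pd.DataFrame({
    'country': ['Italy', 'Italy', 'US', 'US', 'US'],
    'price': [15.0, 30.0, 14.0, np.nan, 65.0],
})

# One row per country, one column per statistic.
stats = df.groupby('country').price.agg(['size', 'count', 'min', 'max'])
print(stats)
```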


In [33]:
reviews.groupby(['country']).price.agg([len, min, max])

Unnamed: 0_level_0,len,min,max
country,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Argentina,3800,4.0,230.0
Armenia,2,14.0,15.0
...,...,...,...
Ukraine,14,6.0,13.0
Uruguay,109,10.0,130.0


In [34]:
reviews.groupby('points').points.count()

points
80     397
81     692
      ... 
99      33
100     19
Name: points, Length: 21, dtype: int64

In [35]:
reviews.groupby('winery').apply(lambda df: df.title.iloc[0])

winery
1+1=3                          1+1=3 NV Rosé Sparkling (Cava)
10 Knots                 10 Knots 2010 Viognier (Paso Robles)
                                  ...                        
àMaurice    àMaurice 2013 Fred Estate Syrah (Walla Walla V...
Štoka                         Štoka 2009 Izbrani Teran (Kras)
Length: 16757, dtype: str

In [36]:
reviews.groupby(['country', 'province']).apply(lambda df: df.loc[df.points.idxmax()])

Unnamed: 0_level_0,Unnamed: 1_level_0,description,designation,points,price,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
country,province,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
Argentina,Mendoza Province,"If the color doesn't tell the full story, the ...",Nicasia Vineyard,97,120.0,Mendoza,,Michael Schachner,@wineschach,Bodega Catena Zapata 2006 Nicasia Vineyard Mal...,Malbec,Bodega Catena Zapata,everyone,47217
Argentina,Other,"Take note, this could be the best wine Colomé ...",Reserva,95,90.0,Salta,,Michael Schachner,@wineschach,Colomé 2010 Reserva Malbec (Salta),Malbec,Colomé,everyone,51668
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Uruguay,San Jose,"Baked, sweet, heavy aromas turn earthy with ti...",El Preciado Gran Reserva,87,50.0,,,Michael Schachner,@wineschach,Castillo Viejo 2005 El Preciado Gran Reserva R...,Red Blend,Castillo Viejo,everyone,90073
Uruguay,Uruguay,"Cherry and berry aromas are ripe, healthy and ...",Blend 002 Limited Edition,91,22.0,,,Michael Schachner,@wineschach,Narbona NV Blend 002 Limited Edition Tannat-Ca...,Tannat-Cabernet Franc,Narbona,everyone,90610



## Multi-Index

Some groupby() results create a multi-level index (multiple index labels).
This is normal but can be confusing for beginners.

### Resetting the Index

Use reset_index() to convert a multi-index back to normal columns for easier use and display.

In [38]:
countries_reviewed = reviews.groupby(['country', 'province']).description.agg([len])
countries_reviewed

Unnamed: 0_level_0,Unnamed: 1_level_0,len
country,province,Unnamed: 2_level_1
Argentina,Mendoza Province,3264
Argentina,Other,536
...,...,...
Uruguay,San Jose,3
Uruguay,Uruguay,24


In [39]:
mi = countries_reviewed.index
type(mi)

pandas.MultiIndex

In [40]:
countries_reviewed.reset_index()

Unnamed: 0,country,province,len
0,Argentina,Mendoza Province,3264
1,Argentina,Other,536
...,...,...,...
423,Uruguay,San Jose,3
424,Uruguay,Uruguay,24


## Sorting

Grouped results are ordered by index by default, not by values.
Use:

- sort_values() → sort by column values
- sort_index() → sort by index labels

### Sorting Options

- Ascending (default) or descending
- Sort by multiple columns for fine control

In [41]:
countries_reviewed = countries_reviewed.reset_index()
countries_reviewed.sort_values(by='len')

Unnamed: 0,country,province,len
386,Turkey,Elazığ-Diyarbakir,1
389,Turkey,Urla-Thrace,1
...,...,...,...
415,US,Washington,8639
392,US,California,36247


In [42]:
countries_reviewed.sort_values(by='len', ascending=False)

Unnamed: 0,country,province,len
392,US,California,36247
415,US,Washington,8639
...,...,...,...
395,US,Hawaii,1
399,US,Kentucky,1


In [43]:
countries_reviewed.sort_index()

Unnamed: 0,country,province,len
0,Argentina,Mendoza Province,3264
1,Argentina,Other,536
...,...,...,...
423,Uruguay,San Jose,3
424,Uruguay,Uruguay,24


In [44]:
countries_reviewed.sort_values(by=['country', 'len'])

Unnamed: 0,country,province,len
1,Argentina,Other,536
0,Argentina,Mendoza Province,3264
...,...,...,...
424,Uruguay,Uruguay,24
419,Uruguay,Canelones,43


# **Data Types and Missing Values**

This lesson covers how to inspect data types (dtypes) in pandas and how to handle missing values in DataFrames and Series.

## Dtypes (Data Types)

Each column in pandas has a dtype that shows how data is stored.

Common dtypes:

- int64 → integers
- float64 → decimal numbers
- object → text (strings) in older pandas versions; the outputs in this notebook report text columns as str

Use:

- Series.dtype → dtype of one column
- DataFrame.dtypes → dtypes of all columns

In [45]:
reviews.price.dtype

dtype('float64')

In [46]:
reviews.dtypes

country              str
description          str
                   ...  
critic               str
index_backwards    int64
Length: 15, dtype: object

### Type Conversion

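Use astype() to convert a column to another dtype when it makes sense (e.g., int → float). astype() returns a new Series; the original column keeps its dtype unless you assign the result back.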

In [51]:
reviews.points.astype('float64')

0         87.0
1         87.0
          ... 
129969    90.0
129970    90.0
Name: points, Length: 129971, dtype: float64

In [52]:
reviews.points.dtype

dtype('int64')

## Missing data

Missing values are represented as **NaN**.
NaN is a floating-point value, so by default a numeric column that contains NaN is stored as float64.
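
A short sketch of how a missing value affects the dtype, using two made-up Series that differ only in one missing entry:

```python
import pandas as pd
import numpy as np

# Toy data; values are illustrative only.
whole = pd.Series([1, 2, 3])          # integers only
with_gap = pd.Series([1, np.nan, 3])  # one missing value

print(whole.dtype)              # an integer dtype
print(with_gap.dtype)           # a floating-point dtype
print(with_gap.isnull().sum())  # number of missing entries
```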

### Detecting Missing Values

Use:

- pd.isnull() → find missing values
- pd.notnull() → find non-missing values

These are commonly used for filtering rows with missing data.

In [55]:
reviews[pd.isnull(reviews.country)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
913,,"Amber in color, this wine has aromas of peach ...",Asureti Valley,87,30.0,,,,Mike DeSimone,@worldwineguys,Gotsa Family Wines 2014 Asureti Valley Chinuri,Chinuri,Gotsa Family Wines,everyone,129058
3131,,"Soft, fruity and juicy, this is a pleasant, si...",Partager,83,,,,,Roger Voss,@vossroger,Barton & Guestier NV Partager Red,Red Blend,Barton & Guestier,everyone,126840
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129590,,"A blend of 60% Syrah, 30% Cabernet Sauvignon a...",Shah,90,30.0,,,,Mike DeSimone,@worldwineguys,Büyülübağ 2012 Shah Red,Red Blend,Büyülübağ,everyone,381
129900,,This wine offers a delightful bouquet of black...,,91,32.0,,,,Mike DeSimone,@worldwineguys,Psagot 2014 Merlot,Merlot,Psagot,everyone,71


### Filling Missing Values

Use fillna() to replace NaN values with:

- A constant (e.g., "Unknown")
- A strategy (e.g., backfill/forward fill, available as the bfill() and ffill() methods)

In [54]:
reviews.region_2.fillna("Unknown")

0         Unknown
1         Unknown
           ...   
129969    Unknown
129970    Unknown
Name: region_2, Length: 129971, dtype: str
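
The constant and strategy-based fills can be compared side by side. This is a minimal sketch on a made-up Series; `ffill()` copies the previous valid value forward and `bfill()` copies the next valid value backward:

```python
import pandas as pd
import numpy as np

# Toy data; region names are illustrative only.
region = pd.Series(['Etna', np.nan, 'Napa', np.nan])

constant = region.fillna('Unknown')  # every gap gets the same label
forward = region.ffill()             # Etna, Etna, Napa, Napa
backward = region.bfill()            # last gap has no later value to copy

print(constant.tolist())
print(forward.tolist())
print(backward.tolist())
```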

### Replacing Values

Use replace() to change specific values in a column (e.g., updating labels or correcting entries).
Also useful for replacing placeholder values like "Unknown" or "Invalid".

In [56]:
reviews.taster_twitter_handle.replace("@kerinokeefe", "@kerino")

0            @kerino
1         @vossroger
             ...    
129969    @vossroger
129970    @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: str

# **Renaming and Combining**

This lesson shows how to rename columns/indexes and how to combine multiple DataFrames/Series into one dataset.

## Renaming Columns & Index

Use rename() to change column names or index labels.

- Rename columns → rename(columns={...})
- Rename index values → rename(index={...})
- Change index/column names → rename_axis()

Renaming columns is common; renaming index values is rare.

In [58]:
reviews.rename(columns={'points': 'score'})

Unnamed: 0,country,description,designation,score,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,everyone,129971
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos,everyone,129970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss,everyone,2
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit,everyone,1


In [57]:
reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
firstEntry,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,everyone,129971
secondEntry,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos,everyone,129970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss,everyone,2
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit,everyone,1


## Setting Index

Use set_index() to make a column the row index (more useful than renaming index values).

Both the row index and the column index can also carry their own name, which rename_axis() sets:

In [59]:
reviews.rename_axis("wines", axis='rows').rename_axis("fields", axis='columns')

fields,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
wines,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,everyone,129971
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos,everyone,129970
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss,everyone,2
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline,90,21.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...,Gewürztraminer,Domaine Schoffit,everyone,1


## Combining Data (Overview)

Pandas provides methods to combine datasets:

- concat() → stack datasets with same columns
- join() → combine on a common index
- (merge() exists, but join() often suffices)

### concat()

Used to append rows from multiple DataFrames with the same columns into one DataFrame.
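
A minimal sketch of `concat()` with two made-up DataFrames that share the same columns. The `ignore_index=True` variant is shown as an optional extra for renumbering the rows:

```python
import pandas as pd

# Toy data; the frame names and values are illustrative only.
canada = pd.DataFrame({'title': ['Video A', 'Video B'], 'views': [100, 250]})
britain = pd.DataFrame({'title': ['Video C'], 'views': [75]})

combined = pd.concat([canada, britain])                           # keeps original row labels
combined_reset = pd.concat([canada, britain], ignore_index=True)  # relabels rows 0..n-1

print(combined.index.tolist())        # [0, 1, 0]
print(combined_reset.index.tolist())  # [0, 1, 2]
```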

### join()

Used to combine DataFrames by a shared index.
Suffixes are needed when both tables have columns with the same names.
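
A sketch of `join()` under the assumption of two made-up tables that share the same index (`title`) and the same column name (`views`). The suffix strings are arbitrary labels chosen for this example:

```python
import pandas as pd

# Toy data; values and suffix labels are illustrative only.
left = pd.DataFrame({'title': ['Video A', 'Video B'], 'views': [100, 250]}).set_index('title')
right = pd.DataFrame({'title': ['Video A', 'Video B'], 'views': [40, 90]}).set_index('title')

# Both tables have a 'views' column, so each side needs a suffix.
joined = left.join(right, lsuffix='_CAN', rsuffix='_UK')

print(joined.columns.tolist())  # ['views_CAN', 'views_UK']
```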