# Working with pivot tables

In this lesson, the focus is on **performing calculations and subsetting pivot tables** efficiently.

### Creating Pivot Tables

To make a pivot table, use the `.pivot_table()` method:

* **`values` argument** → the column to aggregate.
* **`index` argument** → the column(s) to group by and display as rows.
* **`columns` argument** → the column(s) to group by and display as columns.
* By default, the aggregation function (the `aggfunc` argument) is `mean`.
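A minimal sketch of these arguments follows. It uses a small hypothetical `dogs` DataFrame; the breeds, colors, and heights are made up for illustration. The second call replaces the default mean through `aggfunc`:

```python
import pandas as pd

# Hypothetical toy data: one row per dog (values are made up)
dogs = pd.DataFrame({
    "breed": ["Labrador", "Labrador", "Poodle", "Poodle", "Chow Chow"],
    "color": ["Black", "Brown", "Black", "Black", "Brown"],
    "height_cm": [56, 58, 60, 50, 46],
})

# values -> column to aggregate, index -> row groups, columns -> column groups
mean_heights = dogs.pivot_table(values="height_cm", index="breed", columns="color")
print(mean_heights)  # Poodle/Black averages 60 and 50 -> 55.0

# aggfunc replaces the default mean with another aggregation
max_heights = dogs.pivot_table(
    values="height_cm", index="breed", columns="color", aggfunc="max"
)
print(max_heights)
```

Combinations with no matching rows, such as a black Chow Chow here, show up as `NaN`. If you want a different placeholder, `pivot_table()` also accepts a `fill_value` argument.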

### Using `.loc[]` with Slicing

Pivot tables are essentially DataFrames with sorted indexes. This means you can use **all the techniques learned earlier**, including **`.loc[]` with slicing**, to subset rows and columns efficiently.
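For example, with a hypothetical pivot table of dog heights (made-up data), a single `.loc[]` call can slice both rows and columns by label. Label slices in `.loc[]` include both endpoints, which is different from positional slicing:

```python
import pandas as pd

# Hypothetical toy data (made-up heights)
dogs = pd.DataFrame({
    "breed": ["Poodle", "Labrador", "Chow Chow", "Beagle", "Labrador"],
    "color": ["Black", "Brown", "Brown", "Tan", "Black"],
    "height_cm": [55, 58, 46, 38, 56],
})
heights = dogs.pivot_table(values="height_cm", index="breed", columns="color")

# pivot_table sorts both row and column labels, so label ranges are well defined.
# Label slices in .loc include both the start and the end label.
subset = heights.loc["Chow Chow":"Labrador", "Black":"Brown"]
print(subset)
```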

### The `axis` Argument

Many DataFrame summary methods (like `.mean()`) have an **axis** parameter:

* `axis="index"` (default) → aggregate **down the rows**, producing one statistic per column.
  * Example: in a pivot table of dog heights with breeds as rows and colors as columns, this gives the mean height for each color across all breeds.
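The following sketch uses a small hypothetical pivot table with breeds as rows and colors as columns. All values are invented. Because `axis="index"` is the default, it produces the same result as calling `.mean()` with no arguments:

```python
import pandas as pd

# Hypothetical pivot table: mean height (cm) with breeds as rows, colors as columns
heights = pd.DataFrame(
    {"Black": [56.0, 55.0], "Brown": [58.0, 51.0]},
    index=pd.Index(["Labrador", "Poodle"], name="breed"),
)
heights.columns.name = "color"

# axis="index" (the default) aggregates down the rows: one mean per color
mean_by_color = heights.mean(axis="index")
print(mean_by_color)  # Black 55.5, Brown 54.5
```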

### Calculating Statistics Across Columns

To calculate a statistic **across columns**, set `axis="columns"`:

* Example: computing the mean height for each breed **across all colors**.
* This is mainly useful for pivot tables, where every column typically holds the same kind of value. In general DataFrames with mixed data types, a statistic computed across columns is usually not meaningful.
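The sketch below covers both cases, using a hypothetical breed-by-color table of heights. It also includes a second DataFrame, again made up, to show why row-wise statistics on general DataFrames need care. Even after `numeric_only=True` skips the text column, the result averages centimeters with kilograms, which is not a meaningful quantity:

```python
import pandas as pd

# Hypothetical pivot table: mean height (cm), breeds as rows, colors as columns
heights = pd.DataFrame(
    {"Black": [56.0, 55.0], "Brown": [58.0, 51.0]},
    index=pd.Index(["Labrador", "Poodle"], name="breed"),
)

# axis="columns" aggregates across each row: one mean per breed
mean_by_breed = heights.mean(axis="columns")
print(mean_by_breed)  # Labrador 57.0, Poodle 53.0

# Hypothetical mixed-type DataFrame: a row-wise mean has to skip the text
# column, and it still averages centimeters with kilograms
dogs = pd.DataFrame({
    "name": ["Bella", "Max"],
    "height_cm": [56, 60],
    "weight_kg": [25, 30],
})
row_means = dogs.mean(axis="columns", numeric_only=True)
print(row_means)  # 40.5 and 45.0: valid arithmetic, meaningless result
```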

## Preparing Data

```python
import pandas as pd
temperatures = pd.read_csv("datasets/temperatures.csv")
```

## Exercise: Pivot temperature by city and year

Tracking temperature changes by **year** can be easier than looking at every month. By aggregating data by **city and year**, we can create a more digestible summary.

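You can extract components of a date using the `.dt` accessor. It only works on columns with a datetime dtype, so convert text dates with `pd.to_datetime()` first: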

* `dataframe["column"].dt.year` → year
* `dataframe["column"].dt.month` → month
* `dataframe["column"].dt.day` → day
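Here is a minimal sketch of the conversion and extraction steps. It uses a hypothetical DataFrame `df` with three made-up dates, stored as strings the way they often arrive from a CSV file:

```python
import pandas as pd

# Hypothetical sample: dates read from a CSV usually arrive as text
df = pd.DataFrame({"date": ["2000-01-01", "2005-07-15", "2013-12-31"]})

# .dt requires a datetime column, so convert the strings first
df["date"] = pd.to_datetime(df["date"])

# Extract each date component into its own column
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day
print(df)
```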

Once we have the year column, a **pivot table** can summarize temperatures by city and year.

### Instructions

1. Create a new `year` column from the `date` column.
2. Pivot the `avg_temp_c` column, using `country` and `city` as row indexes and `year` as columns.
3. Store the pivot table in `temp_by_country_city_vs_year` and inspect it.


```python
# Convert the 'date' column to datetime
temperatures["date"] = pd.to_datetime(temperatures["date"])

# Step 1: Extract the year from the date column
temperatures["year"] = temperatures["date"].dt.year

# Step 2: Pivot avg_temp_c by country and city vs year
temp_by_country_city_vs_year = temperatures.pivot_table(
    values="avg_temp_c",
    index=["country", "city"],
    columns="year"
)

# Step 3: View the pivot table
print(temp_by_country_city_vs_year)
```

```text
year                                 2000       2001       2002       2003  \
country       city                                                           
Afghanistan   Kabul             15.822667  15.847917  15.714583  15.132583   
Angola        Luanda            24.410333  24.427083  24.790917  24.867167   
Australia     Melbourne         14.320083  14.180000  14.075833  13.985583   
              Sydney            17.567417  17.854500  17.733833  17.592333   
Bangladesh    Dhaka             25.905250  25.931250  26.095000  25.927417   
...                                   ...        ...        ...        ...   
United States Chicago           11.089667  11.703083  11.532083  10.481583   
              Los Angeles       16.643333  16.466250  16.430250  16.944667   
              New York           9.969083  10.931000  11.252167   9.836000   
Vietnam       Ho Chi Minh City  27.588917  27.831750  28.064750  27.827667   
Zimbabwe      Harare            20.283667  20.861000  21.079333 
```

## Exercise: Subsetting Pivot Tables

Pivot tables are essentially DataFrames with sorted indexes, so all the familiar subsetting techniques apply. In particular, combining **`.loc[]` with slicing** is very powerful for selecting ranges of rows and columns.
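The following is a small sketch with made-up data shaped like `temp_by_country_city_vs_year`. It contrasts slicing on the outer index level with slicing by full `(country, city)` tuples. In both cases the end label is included:

```python
import pandas as pd

# Hypothetical toy data shaped like the lesson's temperatures (values are made up)
temps = pd.DataFrame({
    "country": ["India", "Egypt", "France", "Egypt", "India", "Angola", "Vietnam"],
    "city": ["Delhi", "Cairo", "Paris", "Alexandria", "Bombay", "Luanda", "Ho Chi Minh City"],
    "year": [2000] * 7,
    "avg_temp_c": [26.0, 21.0, 11.0, 20.0, 27.0, 24.0, 28.0],
})
pivot = temps.pivot_table(values="avg_temp_c", index=["country", "city"], columns="year")

# Slicing the outer level keeps every city of both endpoint countries
by_country = pivot.loc["Egypt":"India"]

# Slicing with (country, city) tuples starts and stops at specific rows
by_city = pivot.loc[("Egypt", "Cairo"):("India", "Bombay")]
print(by_country)
print(by_city)
```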

The pivot table `temp_by_country_city_vs_year` is available.

### Instructions

1. Subset the pivot table for countries **from Egypt to India**.
2. Subset for the range **from Egypt, Cairo to India, Delhi**.
3. Subset for the same range of countries and cities **for the years 2005 to 2010**.

```python
# Step 1: Select countries from Egypt to India
print(temp_by_country_city_vs_year.loc["Egypt":"India"])
```

```text
year                       2000       2001       2002       2003       2004  \
country  city                                                                 
Egypt    Alexandria   20.744500  21.454583  21.456167  21.221417  21.064167   
         Cairo        21.486167  22.330833  22.414083  22.170500  22.081917   
         Gizeh        21.486167  22.330833  22.414083  22.170500  22.081917   
Ethiopia Addis Abeba  18.241250  18.296417  18.469750  18.320917  18.292750   
France   Paris        11.739667  11.371250  11.871333  11.909500  11.338833   
Germany  Berlin       10.963667   9.690250  10.264417  10.065750   9.822583   
India    Ahmadabad    27.436000  27.198083  27.719083  27.403833  27.628333   
         Bangalore    25.337917  25.528167  25.755333  25.924750  25.252083   
         Bombay       27.203667  27.243667  27.628667  27.578417  27.318750   
         Calcutta     26.491333  26.515167  26.703917  26.561333  26.634333   
         Delhi        26.048333  25.862917  26.63433
```

```python
# Step 2: Select from Egypt, Cairo to India, Delhi
print(temp_by_country_city_vs_year.loc[("Egypt", "Cairo"):("India", "Delhi")])
```

```text
year                       2000       2001       2002       2003       2004  \
country  city                                                                 
Egypt    Cairo        21.486167  22.330833  22.414083  22.170500  22.081917   
         Gizeh        21.486167  22.330833  22.414083  22.170500  22.081917   
Ethiopia Addis Abeba  18.241250  18.296417  18.469750  18.320917  18.292750   
France   Paris        11.739667  11.371250  11.871333  11.909500  11.338833   
Germany  Berlin       10.963667   9.690250  10.264417  10.065750   9.822583   
India    Ahmadabad    27.436000  27.198083  27.719083  27.403833  27.628333   
         Bangalore    25.337917  25.528167  25.755333  25.924750  25.252083   
         Bombay       27.203667  27.243667  27.628667  27.578417  27.318750   
         Calcutta     26.491333  26.515167  26.703917  26.561333  26.634333   
         Delhi        26.048333  25.862917  26.634333  25.721083  26.239917   

year                       2005       2006       20
```

```python
# Step 3: Select from Egypt, Cairo to India, Delhi, and years 2005 to 2010
print(temp_by_country_city_vs_year.loc[("Egypt", "Cairo"):("India", "Delhi"), 2005:2010])
```

```text
year                       2005       2006       2007       2008       2009  \
country  city                                                                 
Egypt    Cairo        22.006500  22.050000  22.361000  22.644500  22.625000   
         Gizeh        22.006500  22.050000  22.361000  22.644500  22.625000   
Ethiopia Addis Abeba  18.312833  18.427083  18.142583  18.165000  18.765333   
France   Paris        11.552917  11.788500  11.750833  11.278250  11.464083   
Germany  Berlin        9.919083  10.545333  10.883167  10.657750  10.062500   
India    Ahmadabad    26.828083  27.282833  27.511167  27.048500  28.095833   
         Bangalore    25.476500  25.418250  25.464333  25.352583  25.725750   
         Bombay       27.035750  27.381500  27.634667  27.177750  27.844500   
         Calcutta     26.729167  26.986250  26.584583  26.522333  27.153250   
         Delhi        25.716083  26.365917  26.145667  25.675000  26.554250   

year                       2010  
country  city    
```

## Exercise: Calculating on a pivot table

Pivot tables often show summarized values, but that’s usually just the starting point. To get real insights, we might need to calculate more from those summaries. A common task is to figure out which row or column contains the maximum or minimum values.

Remember from Chapter 1 that you can filter a Series or DataFrame by using conditions inside square brackets. For example: `series[series > value]`.
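Here is a quick refresher on that pattern, using a hypothetical Series of yearly averages with made-up numbers. The second filter compares every value against the maximum, so it keeps the entry (or tied entries) holding the largest value:

```python
import pandas as pd

# Hypothetical yearly averages (made-up numbers)
avg_by_year = pd.Series(
    [19.5, 19.9, 20.3], index=pd.Index([2000, 2001, 2002], name="year")
)

# A condition inside square brackets keeps only the matching entries
warm_years = avg_by_year[avg_by_year > 19.8]
print(warm_years)  # 2001 and 2002

# Comparing against the maximum keeps the entry with the largest value
warmest = avg_by_year[avg_by_year == avg_by_year.max()]
print(warmest)  # 2002
```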

You have a DataFrame called **`temp_by_country_city_vs_year`** that shows average temperatures by country, city, and year. A preview looks like this (only some years shown):

| country     | city      | 2000   | 2001   | 2002   | … | 2013   |
| ----------- | --------- | ------ | ------ | ------ | - | ------ |
| Afghanistan | Kabul     | 15.823 | 15.848 | 15.715 | … | 16.206 |
| Angola      | Luanda    | 24.410 | 24.427 | 24.791 | … | 24.554 |
| Australia   | Melbourne | 14.320 | 14.180 | 14.076 | … | 14.742 |
|             | Sydney    | 17.567 | 17.854 | 17.734 | … | 18.090 |
| Bangladesh  | Dhaka     | 25.905 | 25.931 | 26.095 | … | 26.587 |

### Instructions

1. Find the average temperature for each year and store it in `avg_temp_by_year`.
2. From that, identify which year was the warmest.
3. Find the average temperature for each city (across all years) and save it in `avg_temp_by_city`.
4. From that, identify which city was the coldest.

```python
# Step 1: Average temp for each year (columns = years)
avg_temp_by_year = temp_by_country_city_vs_year.mean(axis="index")
print(avg_temp_by_year)
```

```text
year
2000    19.506243
2001    19.679352
2002    19.855685
2003    19.630197
2004    19.672204
2005    19.607239
2006    19.793993
2007    19.854270
2008    19.608778
2009    19.833752
2010    19.911734
2011    19.549197
2012    19.668239
2013    20.312285
dtype: float64
```


```python
# Step 2: Year with maximum average temp
warmest_year = avg_temp_by_year.loc[avg_temp_by_year == avg_temp_by_year.max()]
print(warmest_year)
```

```text
year
2013    20.312285
dtype: float64
```


```python
# Step 3: Average temp for each city (rows = cities)
avg_temp_by_city = temp_by_country_city_vs_year.mean(axis="columns")
print(avg_temp_by_city)
```

```text
country        city            
Afghanistan    Kabul               15.541955
Angola         Luanda              24.391616
Australia      Melbourne           14.275411
               Sydney              17.799250
Bangladesh     Dhaka               26.174440
                                     ...    
United States  Chicago             11.330825
               Los Angeles         16.675399
               New York            10.911034
Vietnam        Ho Chi Minh City    27.922857
Zimbabwe       Harare              20.699000
Length: 100, dtype: float64
```


```python
# Step 4: City with minimum average temp
coldest_city = avg_temp_by_city.loc[avg_temp_by_city == avg_temp_by_city.min()]
print(coldest_city)
```

```text
country  city  
China    Harbin    4.876551
dtype: float64
```
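Instead of the boolean filter, you can use `Series.idxmin()` and `Series.idxmax()`, which return the index label of the smallest or largest value directly. With a `(country, city)` MultiIndex, that label is a tuple. Here is a minimal sketch using three of the city averages printed above:

```python
import pandas as pd

# Three of the city averages printed above, in a (country, city) MultiIndex
avg_by_city = pd.Series(
    [15.541955, 4.876551, 27.922857],
    index=pd.MultiIndex.from_tuples(
        [("Afghanistan", "Kabul"), ("China", "Harbin"), ("Vietnam", "Ho Chi Minh City")],
        names=["country", "city"],
    ),
)

# idxmin()/idxmax() return the label (here a tuple) of the first min/max value
print(avg_by_city.idxmin())  # ('China', 'Harbin')
print(avg_by_city.idxmax())  # ('Vietnam', 'Ho Chi Minh City')
```

The two approaches differ in one respect. `idxmin()` returns only the first matching label. The boolean filter returns a Series that keeps every tied row together with its value.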
