<a href="https://colab.research.google.com/github/krauseannelize/nb-py-ms-exercises/blob/sprint04/notebooks/s04_pandas_data_wrangling/44_exercises_aggregating_information.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 44 | Exercises - Aggregating Information & Applying

## Grouping Data with `.groupby()`

We use the `.groupby()` method to split a **DataFrame** into groups based on one or more columns. This allows us to apply aggregation functions like `sum()`, `mean()`, `count()`, etc., to each group independently.

```python
# basic syntax
df.groupby('column_name')
df.groupby(['column_1', 'column_2'])
```
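This is a minimal sketch of the "split" step, on a small made-up DataFrame (not one of the exercise datasets). `groupby()` by itself only builds a grouping object. Iterating over that object shows the sub-DataFrames, and an aggregation then produces one result per group.

```python
import pandas as pd

# Small illustrative DataFrame (made-up values)
df = pd.DataFrame({
    'Region': ['East', 'West', 'East', 'West'],
    'Sales': [100, 50, 200, 70],
})

grouped = df.groupby('Region')  # a GroupBy object; no aggregation has happened yet

# Split: iterating yields (group key, sub-DataFrame) pairs, sorted by key by default
for region, group in grouped:
    print(region, group['Sales'].tolist())
# East [100, 200]
# West [50, 70]

# Apply + combine: aggregate each group and collect one value per group
totals = grouped['Sales'].sum()
print(totals)
```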

In [9]:
import pandas as pd

# Define the data
data = {
    'Region': ['East', 'East', 'East', 'East',
               'North', 'North', 'North', 'North',
               'South', 'South', 'South',
               'West', 'West', 'West', 'West'],
    'Year': [2022, 2023, 2024, 2025,
             2022, 2023, 2024, 2025,
             2022, 2023, 2024,
             2022, 2023, 2024, 2025],
    'Sales': [350, 450, 350, 350,
              300, 300, 450, 650,
              310, 350, 400,
              270, 80, 100, 200]
}

# Create the DataFrame
df1 = pd.DataFrame(data)

# Display the DataFrame
print(df1)

   Region  Year  Sales
0    East  2022    350
1    East  2023    450
2    East  2024    350
3    East  2025    350
4   North  2022    300
5   North  2023    300
6   North  2024    450
7   North  2025    650
8   South  2022    310
9   South  2023    350
10  South  2024    400
11   West  2022    270
12   West  2023     80
13   West  2024    100
14   West  2025    200


In [10]:
# Group by one column: total sales by year
print(df1.groupby('Year')['Sales'].sum().reset_index())

   Year  Sales
0  2022   1230
1  2023   1180
2  2024   1300
3  2025   1200
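Here the `.reset_index()` call turns the group keys, which become the index after `groupby()`, back into an ordinary column. A minimal sketch on a few made-up rows shows that passing `as_index=False` to `groupby()` should produce the same table directly:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Year': [2022, 2023, 2022, 2023],
    'Sales': [350, 450, 300, 300],
})

# Without reset_index(), the group keys become the index of the result
by_index = df1.groupby('Year')['Sales'].sum()

# Two ways to get 'Year' back as a regular column
via_reset = df1.groupby('Year')['Sales'].sum().reset_index()
via_as_index = df1.groupby('Year', as_index=False)['Sales'].sum()

print(by_index.index.name)  # Year
print(via_reset.to_dict('list') == via_as_index.to_dict('list'))  # True
```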


In [11]:
import pandas as pd

# Define the data
data = {
    'Region': ['North', 'South', 'East', 'West',
               'North', 'East', 'South', 'West',
               'North', 'South'],
    'Product_Type': ['Electronics', 'Clothing', 'Electronics', 'Clothing',
                     'Groceries', 'Groceries', 'Electronics', 'Groceries',
                     'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100,
              250, 400, 130, 200,
              150, 160]
}

# Create the DataFrame
df2 = pd.DataFrame(data)

# Display the DataFrame
print(df2)

  Region Product_Type  Sales
0  North  Electronics    200
1  South     Clothing    150
2   East  Electronics    300
3   West     Clothing    100
4  North    Groceries    250
5   East    Groceries    400
6  South  Electronics    130
7   West    Groceries    200
8  North     Clothing    150
9  South  Electronics    160


In [12]:
# Group by multiple columns: total sales by region and product type
df2.groupby(['Region', 'Product_Type'])['Sales'].sum().reset_index()

  Region Product_Type  Sales
0   East  Electronics    300
1   East    Groceries    400
2  North     Clothing    150
3  North  Electronics    200
4  North    Groceries    250
5  South     Clothing    150
6  South  Electronics    290
7   West     Clothing    100
8   West    Groceries    200
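Before `.reset_index()`, grouping by two columns returns a Series whose index has two levels. This sketch uses a three-row subset of `df2` and shows how to read a single (Region, Product_Type) combination from that result with a tuple:

```python
import pandas as pd

# Three rows taken from df2 above
df2 = pd.DataFrame({
    'Region': ['South', 'South', 'North'],
    'Product_Type': ['Electronics', 'Electronics', 'Clothing'],
    'Sales': [130, 160, 150],
})

# Grouping by two columns gives a Series with a two-level MultiIndex
totals = df2.groupby(['Region', 'Product_Type'])['Sales'].sum()
print(totals.index.nlevels)  # 2

# Look up one combination with a (Region, Product_Type) tuple
print(totals.loc[('South', 'Electronics')])  # 290
```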


## Grouping Data in Pivot Tables

Pivot tables let us summarize and reshape data by grouping and aggregating values across multiple dimensions.

### Key Features

- Group data by one or more fields
- Apply aggregation functions like `sum`, `mean`, `count`, etc.
- Rearrange data into hierarchical or multi-level formats
- Slice data into focused views for specific segments
- Handle missing values with `fill_value`

### Syntax

```python
pd.pivot_table(df,
               values='data_column',
               index='row_group',
               columns='column_group',
               aggfunc='sum',
               fill_value=0)
```

You can also pass lists to `index` or `columns`, or apply several aggregation functions at once with `aggfunc=['sum', 'mean']`.
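A short sketch of both options on a few made-up rows. A list passed to `index` should give multi-level row labels. A list passed to `aggfunc` should give one block of columns per function, labelled `('sum', 'Sales')` and `('mean', 'Sales')`:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'Product': ['Electronics', 'Groceries', 'Electronics', 'Electronics'],
    'Sales': [200, 250, 130, 160],
})

# List in index: hierarchical (multi-level) row labels
multi = pd.pivot_table(df, values='Sales', index=['Region', 'Product'], aggfunc='sum')

# List in aggfunc: one group of columns per aggregation function
stats = pd.pivot_table(df, values='Sales', index='Region', aggfunc=['sum', 'mean'])

print(multi)
print(stats)
```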

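⚠️ **Key difference**:
- `.groupby()` summarizes data in a _long_ layout, with one row per group (or per combination of grouping columns).
- `.pivot_table()` _rearranges the data_ into a wide table, with one grouping column along the rows and another across the columns, which makes it easier to compare groups or categories side by side.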
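A sketch of how the two layouts are related, using the eight rows that the next cell uses for `df3`. The `groupby()` result has one row per (Region, Product) combination. Calling `.unstack()` moves the `Product` level into columns, and the result should then match the `pivot_table` output cell for cell:

```python
import pandas as pd

df3 = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing',
                'Groceries', 'Groceries', 'Electronics', 'Groceries'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200],
})

# groupby: one row per (Region, Product) combination -> long layout
long = df3.groupby(['Region', 'Product'])['Sales'].sum()

# unstack() moves the Product level into the columns -> wide layout
wide_from_groupby = long.unstack(fill_value=0)

# pivot_table builds the same wide table in one call
wide_from_pivot = pd.pivot_table(df3, values='Sales', index='Region',
                                 columns='Product', aggfunc='sum', fill_value=0)

print(len(long), wide_from_groupby.shape)  # 8 (4, 3)
print((wide_from_groupby == wide_from_pivot).all().all())  # True
```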


In [21]:
import pandas as pd

# Define the data
data = {
    'Region': ['North', 'South', 'East', 'West',
               'North', 'East', 'South', 'West'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing',
                'Groceries', 'Groceries', 'Electronics', 'Groceries'],
    'Sales': [200, 150, 300, 100,
              250, 400, 130, 200]
}

# Create the DataFrame
df3 = pd.DataFrame(data)

# Display the DataFrame
print(df3)

  Region      Product  Sales
0  North  Electronics    200
1  South     Clothing    150
2   East  Electronics    300
3   West     Clothing    100
4  North    Groceries    250
5   East    Groceries    400
6  South  Electronics    130
7   West    Groceries    200


In [22]:
pd.pivot_table(df3, # DataFrame to summarize
               values='Sales', # Column to aggregate
               index='Region',  # Rows grouped by Region
               columns='Product',  # Columns grouped by Product
               aggfunc='sum',  # Aggregation function to apply
               fill_value=0) # Replace missing values with 0

Product  Clothing  Electronics  Groceries
Region
East            0          300        400
North           0          200        250
South         150          130          0
West          100            0        200


## Exercise 1

Imagine your boss gives you the following DataFrame and asks you to calculate the total sales by region and the mean sales by region. Replace the "___".

```python
import pandas as pd
import numpy as np

data = {
    'product': ['Apple', 'Banana', 'Orange', 'Grapes', 'Apple',
'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange', 'Grapes'],
    'region': ['North America', 'North America', 'North America', 'North America', 'South America', 'South America', 'South America', 'South America', 'Europe', 'Europe', 'Europe', 'Europe', 'Asia', 'Asia', 'Asia', 'Asia'],
    'sales': [100, 150, 200, 120, 80, 90, 130, 110, 140, 160, 120, 130, 180, 190, 210, 220]
}

sales_df = pd.DataFrame(data)

# Group by region and calculate the total sales
# Replace the "___" below
print(sales_df.groupby('___')['sales'].sum().reset_index())  

# Calculate the mean sales by region
# Replace the "___" below
print(sales_df.groupby('region')['___'].___().reset_index())
```

In [3]:
import pandas as pd
import numpy as np

data = {
    'product': ['Apple', 'Banana', 'Orange', 'Grapes', 'Apple',
'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange', 'Grapes'],
    'region': ['North America', 'North America', 'North America', 'North America', 'South America', 'South America', 'South America', 'South America', 'Europe', 'Europe', 'Europe', 'Europe', 'Asia', 'Asia', 'Asia', 'Asia'],
    'sales': [100, 150, 200, 120, 80, 90, 130, 110, 140, 160, 120, 130, 180, 190, 210, 220]
}

sales_df = pd.DataFrame(data)

# Group by region and calculate the total sales
print("Total sales by Region:")
print(sales_df.groupby('region')['sales'].sum().reset_index())

# Calculate the mean sales by region
print("\nAverage sales by Region:")
print(sales_df.groupby('region')['sales'].mean().reset_index())

Total sales by Region:
          region  sales
0           Asia    800
1         Europe    550
2  North America    570
3  South America    410

Average sales by Region:
          region  sales
0           Asia  200.0
1         Europe  137.5
2  North America  142.5
3  South America  102.5


## Exercise 2

You have a dataset with columns for `Year`, `Region`, and `Sales`. Group the data by `Year` and calculate the total sales for each year.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['East', 'East', 'East', 'East', 'North', 'North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'West'],
    'Year': [2022, 2023, 2024, 2025, 2022, 2023, 2024, 2025, 2022, 2023, 2024, 2022, 2023, 2024, 2025],
    'Sales': [350, 450, 350, 350, 300, 300, 450, 650, 310, 350, 400, 270, 80, 100, 200]
}

df = pd.DataFrame(data)

# Group by 'Year' and calculate total sales
result = df.___('___')['___'].___().reset_index()  # Replace "___" with the correct method
print(result)
```

In [13]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['East', 'East', 'East', 'East', 'North', 'North', 'North', 'North', 'South', 'South', 'South', 'West', 'West', 'West', 'West'],
    'Year': [2022, 2023, 2024, 2025, 2022, 2023, 2024, 2025, 2022, 2023, 2024, 2022, 2023, 2024, 2025],
    'Sales': [350, 450, 350, 350, 300, 300, 450, 650, 310, 350, 400, 270, 80, 100, 200]
}

df = pd.DataFrame(data)

# Group by 'Year' and calculate total sales
result = df.groupby('Year')['Sales'].sum().reset_index()
print(result)

   Year  Sales
0  2022   1230
1  2023   1180
2  2024   1300
3  2025   1200


## Exercise 3

You have a dataset with sales information. Write code that groups the data by both `Region` and `Year` to calculate the total sales for each combination of region and year.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Year': [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and 'Year' and calculate total sales
result = ___.___(['___', '___'])['___'].___().reset_index()  # Replace "___" with the correct method
print(result)
```

In [14]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Year': [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and 'Year' and calculate total sales
result = df.groupby(['Region', 'Year'])['Sales'].sum().reset_index()
print(result)

   Region  Year  Sales
0    East  2022    300
1    East  2023    400
2    East  2024    120
3   North  2022    200
4   North  2023    250
5   North  2024    150
6   South  2022    150
7   South  2023    130
8   South  2024    160
9    West  2022    100
10   West  2023    200
11   West  2024    130


## Exercise 4

You have a dataset with sales information by region. Use `groupby()` to calculate the average sales for each region.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and calculate mean sales
result = df.____  # Replace "___" with the correct method
print(result)
```

In [16]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and calculate mean sales
result = df.groupby('Region')['Sales'].mean().reset_index()
print(result)

  Region       Sales
0   East  273.333333
1  North  200.000000
2  South  146.666667
3   West  143.333333


## Exercise 5

You have a dataset with regions and sales. Use `groupby()` to count the number of entries for each region.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and count the number of entries
result = df.groupby('___')['___'].___().reset_index()  # Replace "___" with the correct method
print(result)
```

In [18]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and count the number of entries
result = df.groupby('Region')['Sales'].count().reset_index()
print(result)

  Region  Sales
0   East      3
1  North      3
2  South      3
3   West      3
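`count()` counts only the non-missing values in the selected column, so it can differ from the number of rows in a group. This sketch uses hypothetical data with one missing `Sales` value and compares `count()` with `size()`, which counts rows whether or not they contain missing values:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one North row has no Sales value
df = pd.DataFrame({
    'Region': ['North', 'North', 'South'],
    'Sales': [200, np.nan, 150],
})

# count(): non-missing Sales values per region
counts = df.groupby('Region')['Sales'].count()

# size(): number of rows per region, missing values included
sizes = df.groupby('Region').size()

print(counts.tolist())  # [1, 1]
print(sizes.tolist())   # [2, 1]
```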


## Exercise 6

You want to calculate both the total and the average sales by region. Use the `agg()` method to apply multiple aggregation functions at once.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and apply multiple aggregation functions
___ = df.groupby('___')['___'].agg(['___', 'mean']).reset_index()  # Replace "___"
print(result)
```

In [None]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and apply multiple aggregation functions
result = df.groupby('Region')['Sales'].agg(['sum', 'mean']).reset_index()
print(result)
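By default, `agg(['sum', 'mean'])` names the output columns after the functions. To choose the column names yourself, pandas supports "named aggregation", where each keyword becomes a column name and its value is the function to apply. A sketch on a few made-up rows (the names `total_sales` and `average_sales` are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South'],
    'Sales': [200, 150, 250, 130],
})

# Named aggregation: keyword = output column name, value = aggregation function
summary = (
    df.groupby('Region')['Sales']
      .agg(total_sales='sum', average_sales='mean')
      .reset_index()
)
print(summary)
```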

## Exercise 7

You have a dataset with sales data by region and year. Use `groupby()` to find the maximum sales for each region.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Year': [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and find the maximum sales
___ = ___  # Replace "___"
print(result)
```

In [None]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South', 'East', 'West'],
    'Year': [2022, 2022, 2022, 2022, 2023, 2023, 2023, 2023, 2024, 2024, 2024, 2024],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160, 120, 130]
}

df = pd.DataFrame(data)

# Group by 'Region' and find the maximum sales
result = df.groupby('Region')['Sales'].max().reset_index()
print(result)
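`max()` returns only the highest value, not the year in which it occurred. One way to recover the full row is `idxmax()`, which returns the row label of the maximum in each group. This sketch uses the North and South rows of the Exercise 7 data:

```python
import pandas as pd

# North and South rows from the Exercise 7 dataset
df = pd.DataFrame({
    'Region': ['North', 'South', 'North', 'South', 'North', 'South'],
    'Year': [2022, 2022, 2023, 2023, 2024, 2024],
    'Sales': [200, 150, 250, 130, 150, 160],
})

# idxmax(): for each region, the row label where Sales is largest
best_rows = df.groupby('Region')['Sales'].idxmax()

# Select those rows to see the Year of each regional peak
peaks = df.loc[best_rows].reset_index(drop=True)
print(peaks)
```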

## Exercise 8

Create a pivot table for total sales by region and another one with the mean sales by region.

```python
import pandas as pd

data = {'product': ['Apple', 'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange', 'Grapes', 'Apple',
                    'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange'],
        'region': ['North America', 'North America', 'North America', 'North America', 'South America',
                   'South America', 'South America', 'South America', 'Europe', 'Europe', 'Europe', 'Europe',
                   'Asia', 'Asia', 'Asia'],
        'sales': [100, 150, 200, 120, 80, 90, 130, 110, 140, 160, 120, 130, 180, 190, 210]}
sales_df = pd.DataFrame(data)

# Create a pivot table for total sales by region
total_sales_pivot = sales_df.pivot_table(values='___', index='___',
        aggfunc='___').reset_index()  # Replace the "___"
print('Total Sales Pivot', total_sales_pivot)

# Create a pivot table for mean sales by region
mean_sales_pivot = sales_df.pivot_table(values='___', index='___',
        aggfunc='___').reset_index()  # Replace the "___"
print('\n Mean Sales Pivot', mean_sales_pivot)
```

In [None]:
import pandas as pd

data = {'product': ['Apple', 'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange', 'Grapes', 'Apple',
                    'Banana', 'Orange', 'Grapes', 'Apple', 'Banana', 'Orange'],
        'region': ['North America', 'North America', 'North America', 'North America', 'South America',
                   'South America', 'South America', 'South America', 'Europe', 'Europe', 'Europe', 'Europe',
                   'Asia', 'Asia', 'Asia'],
        'sales': [100, 150, 200, 120, 80, 90, 130, 110, 140, 160, 120, 130, 180, 190, 210]}
sales_df = pd.DataFrame(data)

# Create a pivot table for total sales by region
total_sales_pivot = sales_df.pivot_table(values='sales', index='region',
        aggfunc='sum').reset_index()
print(f"Total sales by region:\n{total_sales_pivot}")

# Create a pivot table for mean sales by region
mean_sales_pivot = sales_df.pivot_table(values='sales', index='region',
        aggfunc='mean').reset_index()
print(f"\nMean sales by region:\n{mean_sales_pivot}")

## Exercise 9

You have a dataset with sales information by region and product. Create a pivot table that shows the total sales by region and product.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table for total sales by region and product
pivot_table = pd.___(df,
                     values='___',  # Replace "___" with the correct column for values
                     index='___',  # Replace "___" with the correct column for rows (index)
                     columns='___',  # Replace "___" with the correct column for columns
                     aggfunc='___',  # Replace "___" with the correct aggregation function
                     fill_value=___)  # Replace "___" with the value to fill missing data
print(pivot_table)
```

In [23]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table for total sales by region and product
pivot_table = pd.pivot_table(df,
                     values='Sales',
                     index='Region',
                     columns='Product',
                     aggfunc='sum',
                     fill_value=0)
print(pivot_table)

Product  Clothing  Electronics  Groceries
Region                                   
East            0          300        400
North         150          200        250
South         150          290          0
West          100            0        200


## Exercise 10

You have the same dataset and now need to create a pivot table showing the average sales by region.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table for mean sales by region
pivot_table = pd.pivot_table(df,
                             ___='___',  # Replace "___"
                             ___='___',  # Replace "___"
                             ___='___')  # Replace "___"
print(pivot_table)
```

In [24]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table for mean sales by region
pivot_table = pd.pivot_table(df,
                             values='Sales',
                             index='Region',
                             aggfunc='mean',
                             fill_value=0)
print(pivot_table)

             Sales
Region            
East    350.000000
North   200.000000
South   146.666667
West    150.000000


## Exercise 11

Create a pivot table that shows both the total and average sales by region and product type.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table with both total and average sales by region and product
pivot_table = pd.pivot_table(___)  # Replace "___"
print(pivot_table)
```

In [25]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table with both total and average sales by region and product
pivot_table = pd.pivot_table(df,
                             values='Sales',
                             index='Region',
                             columns='Product',
                             aggfunc=['sum', 'mean'],
                             fill_value=0)
print(pivot_table)

             sum                           mean                      
Product Clothing Electronics Groceries Clothing Electronics Groceries
Region                                                               
East           0         300       400      0.0       300.0     400.0
North        150         200       250    150.0       200.0     250.0
South        150         290         0    150.0       145.0       0.0
West         100           0       200    100.0         0.0     200.0
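With `aggfunc='mean'`, `fill_value=0` deserves some care. A region/product combination with no rows has no average at all, yet it is displayed as 0, for example East/Clothing above. A sketch on three rows taken from the dataset compares the table with and without `fill_value`:

```python
import pandas as pd

# Three rows taken from the exercise dataset
df = pd.DataFrame({
    'Region': ['East', 'East', 'South'],
    'Product': ['Electronics', 'Groceries', 'Clothing'],
    'Sales': [300, 400, 150],
})

# Without fill_value: combinations with no rows stay NaN (no data)
raw = pd.pivot_table(df, values='Sales', index='Region',
                     columns='Product', aggfunc='mean')

# With fill_value=0: the same empty combinations are displayed as 0
filled = pd.pivot_table(df, values='Sales', index='Region',
                        columns='Product', aggfunc='mean', fill_value=0)

print(pd.isna(raw.loc['East', 'Clothing']))  # True: East had no clothing rows
print(filled.loc['East', 'Clothing'] == 0)   # True: shown as 0, not a real average
```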


## Exercise 12

You want to create a pivot table showing total sales by region, but only for "Electronics" products. Filter the data before creating the pivot table.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)
print(df.head())

# Filter for 'Electronics' products and then create a pivot table for total sales by region
filtered_df = df[___]  # Replace "___" with the correct filter
pivot_table = ___  # Replace "___" with a pd.pivot_table(...) call
print(pivot_table)
```

In [28]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)
print(df.head(), "\n")

# Filter for 'Electronics' products and then create a pivot table for total sales by region
filtered_df = df[df['Product'] == 'Electronics']
pivot_table = pd.pivot_table(filtered_df,
                             values='Sales',
                             index='Region',
                             aggfunc='sum',
                             fill_value=0)
print(f"Total Electronic Sales per Region:\n{pivot_table}")

  Region      Product  Sales
0  North  Electronics    200
1  South     Clothing    150
2   East  Electronics    300
3   West     Clothing    100
4  North    Groceries    250 

Total Electronic Sales per Region:
        Sales
Region       
East      300
North     200
South     290
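When you filter before pivoting, regions with no Electronics rows are dropped entirely, which is why `fill_value=0` has no visible effect and West does not appear above. A sketch of the alternative, using the same dataset: pivot over all products first and then select the `Electronics` column. This keeps every region and shows West as 0:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries',
                'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160],
})

# Filter first: only regions that have Electronics rows remain
filtered = pd.pivot_table(df[df['Product'] == 'Electronics'],
                          values='Sales', index='Region', aggfunc='sum')

# Pivot first, then select the column: all regions kept, missing ones filled with 0
full = pd.pivot_table(df, values='Sales', index='Region', columns='Product',
                      aggfunc='sum', fill_value=0)['Electronics']

print(filtered.index.tolist())  # ['East', 'North', 'South']
print(full)
```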


## Exercise 13

Create a pivot table to show the count of sales entries by region.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)
print(df.head())


# Create a pivot table showing the count of sales entries by region
___ = ___  # Replace "___"
print(___)
```

In [31]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)
print(df.head(), '\n')

# Create a pivot table showing the count of sales entries by region
pivot_table = pd.pivot_table(df,
                             values='Sales',
                             index='Region',
                             aggfunc='count',
                             fill_value=0)
print(f"Number of Sales Per Region:\n{pivot_table}")

  Region      Product  Sales
0  North  Electronics    200
1  South     Clothing    150
2   East  Electronics    300
3   West     Clothing    100
4  North    Groceries    250 

Number of Sales Per Region:
        Sales
Region       
East        2
North       3
South       3
West        2
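As a cross-check, `value_counts()` on the `Region` column should give the same numbers. This sketch uses the same Region and Sales values. The two methods agree here only because `Sales` has no missing values: `aggfunc='count'` counts non-missing `Sales` values, while `value_counts()` counts rows per region.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160],
})

# Entries per region via pivot_table (counts non-missing Sales values)
pivot_counts = pd.pivot_table(df, values='Sales', index='Region', aggfunc='count')['Sales']

# Rows per region via value_counts(), sorted by region name to match the pivot order
vc_counts = df['Region'].value_counts().sort_index()

print(pivot_counts.tolist(), vc_counts.tolist())  # [2, 3, 3, 2] [2, 3, 3, 2]
```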


## Exercise 14

Create a pivot table that shows both the total and average sales by region and product type.

```python
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table with both total and average sales by region and product
pivot_table = pd.pivot_table(___)  # Replace "___"
print(pivot_table)
```

In [32]:
import pandas as pd

# Sample dataset
data = {
    'Region': ['North', 'South', 'East', 'West', 'North', 'East', 'South', 'West', 'North', 'South'],
    'Product': ['Electronics', 'Clothing', 'Electronics', 'Clothing', 'Groceries', 'Groceries', 'Electronics', 'Groceries', 'Clothing', 'Electronics'],
    'Sales': [200, 150, 300, 100, 250, 400, 130, 200, 150, 160]
}

df = pd.DataFrame(data)

# Create a pivot table with both total and average sales by region and product
pivot_table = pd.pivot_table(df,
                             values='Sales',
                             index='Region',
                             columns='Product',
                             aggfunc=['sum', 'mean'],
                             fill_value=0)
print(pivot_table)

             sum                           mean                      
Product Clothing Electronics Groceries Clothing Electronics Groceries
Region                                                               
East           0         300       400      0.0       300.0     400.0
North        150         200       250    150.0       200.0     250.0
South        150         290         0    150.0       145.0       0.0
West         100           0       200    100.0         0.0     200.0
