# Grouping with a CASE Statement

In this section, we’re learning how to categorize and summarize data using the `CASE` statement in SQL.

By the end of this section, we will be able to:
- Categorize data using `CASE` statements.
- Combine `CASE` statements with aggregate functions for advanced summarization.
- Use the `GROUP BY` clause effectively with `CASE` statements.

We’ll be using our **Access_to_Basic_Services** table from the **united_nations** MySQL database to analyze how access to managed drinking water services varies across Africa’s Regional Economic Communities (RECs).


## Connecting to our MySQL Database

We run SQL queries directly in this notebook through the `%sql` magic command, which connects to our MySQL server using `pymysql`. The first step is to load the SQL extension.


In [1]:
%load_ext sql

## 1. Identify Regions in Africa

We’ll start by listing the distinct `Region` values that contain the word “Africa”.  
This tells us which regions cover African countries before we classify those countries by REC.


In [3]:
%%sql

SELECT DISTINCT Region
FROM united_nations.Access_to_Basic_Services
WHERE Region LIKE '%Africa%';


 * mysql+pymysql://root:***@localhost:3306/united_nations
2 rows affected.


| Region |
| --- |
| Northern Africa and Western Asia |
| Sub-Saharan Africa |
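
To make the `LIKE '%Africa%'` filter concrete, here is a minimal sketch that runs the same query shape against a tiny in-memory SQLite table from Python. SQLite is only a stand-in for our MySQL server, and the four rows (including the `Central and Southern Asia` region name) are made-up placeholders, not values from the real table.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Access_to_Basic_Services (Region TEXT, Country_name TEXT)")

# Made-up placeholder rows, for illustration only.
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?)",
    [
        ("Sub-Saharan Africa", "Angola"),
        ("Sub-Saharan Africa", "Ghana"),
        ("Northern Africa and Western Asia", "Algeria"),
        ("Central and Southern Asia", "Kazakhstan"),
    ],
)

# '%' matches any run of characters, so '%Africa%' means "contains Africa".
# DISTINCT collapses repeated region names into a single row each.
rows = conn.execute(
    "SELECT DISTINCT Region FROM Access_to_Basic_Services "
    "WHERE Region LIKE '%Africa%' ORDER BY Region"
).fetchall()

regions = [row[0] for row in rows]
print(regions)  # ['Northern Africa and Western Asia', 'Sub-Saharan Africa']
```

Note that a name match is only an approximation: a region such as `Northern Africa and Western Asia` also contains non-African countries.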


## 2. Classify SADC Countries

Next, we’ll classify whether a country belongs to the **SADC** (Southern African Development Community).  
If the country is part of SADC, we’ll label it as `SADC`; otherwise, it will be `Not Classified`.


In [4]:
%%sql

SELECT
  CASE
    WHEN Country_name IN (
      'Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini',
      'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique',
      'Namibia', 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe'
    ) THEN 'SADC'
    ELSE 'Not Classified'
  END AS Regional_economic_community,
  Country_name,
  Pct_managed_drinking_water_services
FROM united_nations.Access_to_Basic_Services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


| Regional_economic_community | Country_name | Pct_managed_drinking_water_services |
| --- | --- | --- |
| Not Classified | Kazakhstan | 94.67 |
| Not Classified | Kazakhstan | 94.67 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kyrgyzstan | 89.67 |
| Not Classified | Kyrgyzstan | 90.33 |
| Not Classified | Kyrgyzstan | 91.0 |
| Not Classified | Kyrgyzstan | 91.33 |
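
The output above contains non-African rows such as Kazakhstan because `CASE` labels rows rather than filtering them. The sketch below illustrates this on a three-row in-memory SQLite table (a stand-in for MySQL, with made-up values and a shortened member list), and also shows what happens when the `ELSE` branch is left out.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Country_name TEXT, Pct_managed_drinking_water_services REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?)",
    [("Angola", 10.0), ("Zambia", 20.0), ("Kazakhstan", 30.0)],
)

# Every row gets a label; no row is dropped.
with_else = conn.execute("""
    SELECT
      CASE
        WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
        ELSE 'Not Classified'
      END AS Regional_economic_community,
      Country_name
    FROM Access_to_Basic_Services
    ORDER BY Country_name
""").fetchall()
print(with_else)
# [('SADC', 'Angola'), ('Not Classified', 'Kazakhstan'), ('SADC', 'Zambia')]

# Without ELSE, rows that match no WHEN branch get NULL (None in Python).
without_else = conn.execute("""
    SELECT
      CASE WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC' END,
      Country_name
    FROM Access_to_Basic_Services
    ORDER BY Country_name
""").fetchall()
print(without_else)
# [('SADC', 'Angola'), (None, 'Kazakhstan'), ('SADC', 'Zambia')]
```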


## 3. Classify UMA and ECOWAS Countries

We’ll expand the CASE statement to include **UMA** (Union du Maghreb Arabe) and **ECOWAS** (Economic Community of West African States) members as well.


In [5]:
%%sql

SELECT
  CASE
    WHEN Country_name IN (
      'Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini',
      'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique',
      'Namibia', 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe'
    ) THEN 'SADC'
    WHEN Country_name IN (
      'Algeria', 'Libya', 'Mauritania', 'Morocco', 'Tunisia'
    ) THEN 'UMA'
    WHEN Country_name IN (
      'Benin', 'Burkina Faso', 'Cabo Verde', 'Cote d’Ivoire', 'Gambia', 'Ghana', 'Guinea',
      'Guinea-Bissau', 'Liberia', 'Mali', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 'Togo'
    ) THEN 'ECOWAS'
    ELSE 'Not Classified'
  END AS Regional_economic_community,
  Country_name,
  Pct_managed_drinking_water_services
FROM united_nations.Access_to_Basic_Services;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


| Regional_economic_community | Country_name | Pct_managed_drinking_water_services |
| --- | --- | --- |
| Not Classified | Kazakhstan | 94.67 |
| Not Classified | Kazakhstan | 94.67 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kazakhstan | 95.0 |
| Not Classified | Kyrgyzstan | 89.67 |
| Not Classified | Kyrgyzstan | 90.33 |
| Not Classified | Kyrgyzstan | 91.0 |
| Not Classified | Kyrgyzstan | 91.33 |
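
As the membership lists grow, a misspelled country name becomes easy to miss: it matches no row, so that country silently falls through to `Not Classified`. One possible sanity check, sketched here with an in-memory SQLite stand-in, is to compare each list against the distinct names in the table. The `rec_members` dictionary, its shortened lists, and the deliberate typo `'Gana'` are all hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Access_to_Basic_Services (Country_name TEXT)")
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?)",
    [("Angola",), ("Algeria",), ("Ghana",), ("Kazakhstan",)],
)

# Hypothetical, shortened membership lists; 'Gana' is a deliberate typo.
rec_members = {
    "SADC": ["Angola"],
    "UMA": ["Algeria"],
    "ECOWAS": ["Gana"],
}

# Names that actually occur in the table.
in_table = {
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT Country_name FROM Access_to_Basic_Services"
    )
}

# For each REC, keep only the listed names that never appear in the data.
unmatched = {}
for rec, names in rec_members.items():
    missing = sorted(set(names) - in_table)
    if missing:
        unmatched[rec] = missing

print(unmatched)  # {'ECOWAS': ['Gana']}
```

An empty dictionary would mean every listed name has at least one matching row.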


## 4. Calculate Minimum, Average, and Maximum Drinking Water Access per REC

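Finally, let’s summarize the data by REC, calculating the **minimum**, **average**, and **maximum** percentage of managed drinking water services for each group. MySQL lets `GROUP BY` refer to the `CASE` column by its alias, `Regional_economic_community`, so we don’t have to repeat the whole expression.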

This will help us understand disparities in access to basic services across different African regions.


In [6]:
%%sql

SELECT
  CASE
    WHEN Country_name IN (
      'Angola', 'Botswana', 'Comoros', 'Democratic Republic of Congo', 'Eswatini',
      'Lesotho', 'Madagascar', 'Malawi', 'Mauritius', 'Mozambique',
      'Namibia', 'Seychelles', 'South Africa', 'United Republic Tanzania', 'Zambia', 'Zimbabwe'
    ) THEN 'SADC'
    WHEN Country_name IN (
      'Algeria', 'Libya', 'Mauritania', 'Morocco', 'Tunisia'
    ) THEN 'UMA'
    WHEN Country_name IN (
      'Benin', 'Burkina Faso', 'Cabo Verde', 'Cote d’Ivoire', 'Gambia', 'Ghana', 'Guinea',
      'Guinea-Bissau', 'Liberia', 'Mali', 'Niger', 'Nigeria', 'Senegal', 'Sierra Leone', 'Togo'
    ) THEN 'ECOWAS'
    ELSE 'Not Classified'
  END AS Regional_economic_community,
  MIN(Pct_managed_drinking_water_services) AS Min_Water_Access,
  AVG(Pct_managed_drinking_water_services) AS Avg_Water_Access,
  MAX(Pct_managed_drinking_water_services) AS Max_Water_Access
FROM united_nations.Access_to_Basic_Services
GROUP BY Regional_economic_community;


 * mysql+pymysql://root:***@localhost:3306/united_nations
4 rows affected.


| Regional_economic_community | Min_Water_Access | Avg_Water_Access | Max_Water_Access |
| --- | --- | --- | --- |
| Not Classified | 38.33 | 89.864108 | 100.0 |
| UMA | 66.67 | 88.233 | 100.0 |
| SADC | 50.33 | 75.813049 | 100.0 |
| ECOWAS | 53.33 | 70.789286 | 87.33 |
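
To see how the grouping works on numbers small enough to check by hand, here is a minimal sketch of the same `CASE` plus `GROUP BY` pattern on an in-memory SQLite table. SQLite is a stand-in for MySQL (it also accepts the alias in `GROUP BY`), and the five rows and shortened member lists are made up.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the values are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Country_name TEXT, Pct_managed_drinking_water_services REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?)",
    [
        ("Angola", 10.0), ("Zambia", 30.0),     # labelled SADC
        ("Algeria", 40.0), ("Tunisia", 60.0),   # labelled UMA
        ("Kazakhstan", 90.0),                   # falls through to ELSE
    ],
)

# The CASE label is computed per row; GROUP BY then collapses rows that
# share a label, and MIN/AVG/MAX are evaluated within each group.
summary = conn.execute("""
    SELECT
      CASE
        WHEN Country_name IN ('Angola', 'Zambia') THEN 'SADC'
        WHEN Country_name IN ('Algeria', 'Tunisia') THEN 'UMA'
        ELSE 'Not Classified'
      END AS Regional_economic_community,
      MIN(Pct_managed_drinking_water_services) AS Min_Water_Access,
      AVG(Pct_managed_drinking_water_services) AS Avg_Water_Access,
      MAX(Pct_managed_drinking_water_services) AS Max_Water_Access
    FROM Access_to_Basic_Services
    GROUP BY Regional_economic_community
    ORDER BY Regional_economic_community
""").fetchall()

for row in summary:
    print(row)
# ('Not Classified', 90.0, 90.0, 90.0)
# ('SADC', 10.0, 20.0, 30.0)
# ('UMA', 40.0, 50.0, 60.0)
```

The `ORDER BY` is added here only to make the row order predictable; the notebook query above does not sort its result.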


## Summary

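We now have a **summarized report** by **Regional Economic Community (REC)**, showing the **minimum**, **average**, and **maximum** percentage of *managed drinking water services* for each group.

The results show how access to managed drinking water varies across the three communities: **UMA** has the highest average (about 88.2%), followed by **SADC** (about 75.8%) and **ECOWAS** (about 70.8%). Because the query has no `WHERE` filter on `Region`, the **Not Classified** group (average about 89.9%) covers every other country in the table, including non-African ones such as Kazakhstan, not only African countries outside these three communities.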

By combining the `CASE` statement with `GROUP BY` and aggregate functions, we’ve learned how to:
- Categorize data dynamically based on conditions.
- Compute summary statistics for each category.
- Generate regional insights that can inform broader development discussions.

This skill is particularly valuable in **data analysis and policymaking**, where grouping and aggregation reveal trends that raw data alone cannot show.
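
As a possible follow-up, the region filter from step 1 can be combined with the `CASE` grouping so that `Not Classified` only covers countries in African regions. The sketch below assumes an in-memory SQLite stand-in for MySQL, made-up rows, shortened member lists, and a placeholder `Central and Southern Asia` region name.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services ("
    "Region TEXT, Country_name TEXT, Pct_managed_drinking_water_services REAL)"
)
conn.executemany(
    "INSERT INTO Access_to_Basic_Services VALUES (?, ?, ?)",
    [
        ("Sub-Saharan Africa", "Angola", 10.0),
        ("Sub-Saharan Africa", "Kenya", 20.0),
        ("Northern Africa and Western Asia", "Algeria", 30.0),
        ("Central and Southern Asia", "Kazakhstan", 90.0),
    ],
)

# WHERE runs before grouping, so Kazakhstan is removed before any row
# is labelled and 'Not Classified' no longer includes it.
african_summary = conn.execute("""
    SELECT
      CASE
        WHEN Country_name IN ('Angola') THEN 'SADC'
        WHEN Country_name IN ('Algeria') THEN 'UMA'
        ELSE 'Not Classified'
      END AS Regional_economic_community,
      COUNT(DISTINCT Country_name) AS Countries,
      AVG(Pct_managed_drinking_water_services) AS Avg_Water_Access
    FROM Access_to_Basic_Services
    WHERE Region LIKE '%Africa%'
    GROUP BY Regional_economic_community
    ORDER BY Regional_economic_community
""").fetchall()

print(african_summary)
# [('Not Classified', 1, 20.0), ('SADC', 1, 10.0), ('UMA', 1, 30.0)]
```

The filter is still approximate, since `Northern Africa and Western Asia` also includes Western Asian countries.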
