# Analyzing the Effects of COPD: A Comparative Study of the United States and Uganda

## Introduction

Chronic Obstructive Pulmonary Disease (COPD) is a progressive respiratory condition with a significant global health burden. Understanding its impact across countries is crucial for informing public health policies. This analysis compares the burden of COPD in the United States and Uganda by calculating crude and age-standardized death rates.

The crude death rate is a basic measure of mortality: the total number of COPD deaths per 100,000 population. However, it does not account for differences in population age structure. Because age is a primary COPD risk factor, a comparison between a young population and an older one can be distorted.

Age-standardization addresses this limitation by calculating age-specific mortality rates and taking a weighted average using the World Health Organization's (WHO) standard population. This approach removes the influence of age distribution, enabling a fair comparison of COPD mortality risk between countries.
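
The two measures can be written compactly. The notation here is introduced for illustration and is not taken from the source tables: for age group $i$, let $d_i$ be the number of COPD deaths, $N_i$ the population, $r_i = d_i / N_i$ the age-specific death rate, and $w_i$ the standard population weight.

$$
\text{Crude rate} = \frac{\sum_i d_i}{\sum_i N_i} \times 100{,}000 = \frac{\sum_i r_i N_i}{\sum_i N_i} \times 100{,}000
$$

$$
\text{Age-standardized rate} = \frac{\sum_i r_i w_i}{\sum_i w_i} \times 100{,}000
$$

Both are weighted averages of the same age-specific rates. The crude rate weights them by the country's own population, while the age-standardized rate weights them by a common standard population.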

By utilizing age-standardization, this analysis aims to assess whether the United States or Uganda exhibits a higher COPD mortality risk when age is not a confounding factor. This study leverages data from the UN World Population Prospects, WHO Standard Population, and age-specific COPD death rates for 2019.

The objective is to calculate and compare the crude and age-standardized COPD death rates for the United States and Uganda, providing insights into the relative burden of COPD in each country while highlighting the importance of age-standardization in epidemiological analyses.

In [1]:
import pandas as pd

import os


In [2]:
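# Paths to the local data files (built with os.path.join so that backslashes
# are not misread as escape sequences and the paths work on any OS)

UN_POPULATION_PROSPECTS = os.path.join('input', 'WPP2022_POPULATION_5-YEAR_AGE_GROUPS_BOTH_SEXES.xlsx')
WHO_STANDARD_POPULATION_TABLE = os.path.join('input', 'Table 1.xlsx')
AGE_SPECIFIC_COPD_DEATH_RATES = os.path.join('input', 'COPD_death_rates.xlsx')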


In [3]:
# Load UN World Population Prospects data
un_population = pd.read_excel(UN_POPULATION_PROSPECTS)
print("TABLE: UN World Population Prospects:")
print(un_population.head())  # Display the first few rows


# Load WHO Standard Population data
who_standard = pd.read_excel(WHO_STANDARD_POPULATION_TABLE)
print("\nTABLE: WHO Standard Population:")
print(who_standard.head())

# Load age-specific COPD death rates data
copd_rates = pd.read_excel(AGE_SPECIFIC_COPD_DEATH_RATES)
print("\nTABLE: Age-specific COPD Death Rates:")
print(copd_rates.head())


TABLE: UN World Population Prospects:
   Index    Variant Region, subregion, country or area * Notes  Location code  \
0      1  Estimates                                WORLD   NaN            900   
1      2  Estimates                                WORLD   NaN            900   
2      3  Estimates                                WORLD   NaN            900   
3      4  Estimates                                WORLD   NaN            900   
4      5  Estimates                                WORLD   NaN            900   

  ISO3 Alpha-code ISO2 Alpha-code  SDMX code**   Type  Parent code  ...  \
0             NaN             NaN          1.0  World            0  ...   
1             NaN             NaN          1.0  World            0  ...   
2             NaN             NaN          1.0  World            0  ...   
3             NaN             NaN          1.0  World            0  ...   
4             NaN             NaN          1.0  World            0  ...   

        55-59       60-6

The UN World Population Prospects data contains population estimates for multiple countries and regions.

On exploring the Excel file, we find that the estimates data is grouped by continent and sub-regions within continents.

Under each continent's subregions, there are entries for each country, and each country has a unique identifier - the location code.

From exploring the spreadsheet, we identify the location code for Uganda to be 800 and for the United States to be 840.

Additionally, we find that all Uganda entries are from index 2885 to index 2956, and all United States entries are from index 18437 to index 18508.

To narrow down our data, we filter for Uganda and the United States specifically.

In [4]:
# Filter for Uganda and the United States by location code into separate DataFrames, then print them

uganda_population = un_population[un_population['Location code'] == 800]
us_population = un_population[un_population['Location code'] == 840]

print("Uganda Population:")
print(uganda_population)

print("\nUS Population:")
print(us_population)

Uganda Population:
      Index    Variant Region, subregion, country or area * Notes  \
2884   2885  Estimates                               Uganda   NaN   
2885   2886  Estimates                               Uganda   NaN   
2886   2887  Estimates                               Uganda   NaN   
2887   2888  Estimates                               Uganda   NaN   
2888   2889  Estimates                               Uganda   NaN   
...     ...        ...                                  ...   ...   
2951   2952  Estimates                               Uganda   NaN   
2952   2953  Estimates                               Uganda   NaN   
2953   2954  Estimates                               Uganda   NaN   
2954   2955  Estimates                               Uganda   NaN   
2955   2956  Estimates                               Uganda   NaN   

      Location code ISO3 Alpha-code ISO2 Alpha-code  SDMX code**  \
2884            800             UGA              UG        800.0   
2885            

### Filtering Population Data

We first filter the UN World Population Prospects data for Uganda and the United States using their respective location codes (800 for Uganda and 840 for the United States). We then filter this data further to include only the entries for the year 2019, which matches our analysis period.

In [5]:


# Keep only the 2019 row for each country (the dataset covers multiple years)
uganda_population_2019 = uganda_population[uganda_population['Year'] == 2019]
us_population_2019 = us_population[us_population['Year'] == 2019]

In [6]:

print("Uganda Population:")
print(uganda_population_2019)

print("\nUS Population:")
print(us_population_2019)

Uganda Population:
      Index    Variant Region, subregion, country or area * Notes  \
2953   2954  Estimates                               Uganda   NaN   

      Location code ISO3 Alpha-code ISO2 Alpha-code  SDMX code**  \
2953            800             UGA              UG        800.0   

              Type  Parent code  ...    55-59     60-64     65-69     70-74  \
2953  Country/Area          910  ...  687.315  500.2975  353.2215  197.1705   

       75-79   80-84    85-89   90-94   95-99   100+  
2953  92.682  43.893  15.4255  3.5645  0.6165  0.096  

[1 rows x 32 columns]

US Population:
       Index    Variant Region, subregion, country or area * Notes  \
18505  18506  Estimates             United States of America    30   

       Location code ISO3 Alpha-code ISO2 Alpha-code  SDMX code**  \
18505            840             USA              US        840.0   

               Type  Parent code  ...       55-59       60-64       65-69  \
18505  Country/Area          905  ...  2

At this point all the data is loaded, and the population data is filtered to Uganda and the United States for 2019. The two countries' populations are still in separate DataFrames, and the steps below keep them that way. Alongside them we have the table of age-specific COPD death rates for 2019 for Uganda and the United States, and the WHO standard population table ("Age standardization of rates: a new WHO standard").

As a reminder, these are the DataFrames we have, each printed in the cells that follow:

1. copd_rates
2. who_standard
3. uganda_population_2019
4. us_population_2019
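
The two population frames can stay separate, but if a single table is more convenient, one option is to stack them and transpose so that each age group becomes a row. The sketch below uses made-up numbers and only two age columns. The frames `uganda_row` and `us_row` are hypothetical stand-ins for `uganda_population_2019` and `us_population_2019`.

```python
import pandas as pd

# Hypothetical single-row frames standing in for the filtered 2019 data
# (values are made up for illustration)
uganda_row = pd.DataFrame({"Location code": [800], "0-4": [10.0], "5-9": [8.0]})
us_row = pd.DataFrame({"Location code": [840], "0-4": [20.0], "5-9": [21.0]})

age_columns = ["0-4", "5-9"]

# Stack the two rows, keep only the age columns, and transpose so that
# each age group is a row and each country is a column
population_by_age = (
    pd.concat([uganda_row, us_row])
    .set_index("Location code")[age_columns]
    .T.rename(columns={800: "Uganda", 840: "United States"})
)
population_by_age.index.name = "Age group"

print(population_by_age)
```

A layout like this has the same shape as the rate table (one row per age group), which may make row-wise multiplication easier later on.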

In [7]:
print(copd_rates.head())


  Age group (years)  Death rate, United States, 2019  Death rate, Uganda, 2019
0               0-4                             0.04                      0.40
1               5-9                             0.02                      0.17
2             10-14                             0.02                      0.07
3             15-19                             0.02                      0.23
4             20-24                             0.06                      0.38


In [8]:
print(who_standard.head())

  Age Group  Segi ("world") Standard  Scandinavian ("European") Standard  \
0       0-4                     12.0                                   8   
1       5-9                     10.0                                   7   
2     10-14                      9.0                                   7   
3     15-19                      9.0                                   7   
4     20-24                      8.0                                   7   

   WHO World Standard*  
0                 8.86  
1                 8.69  
2                 8.60  
3                 8.47  
4                 8.22  


In [9]:
print(uganda_population_2019)

      Index    Variant Region, subregion, country or area * Notes  \
2953   2954  Estimates                               Uganda   NaN   

      Location code ISO3 Alpha-code ISO2 Alpha-code  SDMX code**  \
2953            800             UGA              UG        800.0   

              Type  Parent code  ...    55-59     60-64     65-69     70-74  \
2953  Country/Area          910  ...  687.315  500.2975  353.2215  197.1705   

       75-79   80-84    85-89   90-94   95-99   100+  
2953  92.682  43.893  15.4255  3.5645  0.6165  0.096  

[1 rows x 32 columns]


In [10]:
print(us_population_2019)

       Index    Variant Region, subregion, country or area * Notes  \
18505  18506  Estimates             United States of America    30   

       Location code ISO3 Alpha-code ISO2 Alpha-code  SDMX code**  \
18505            840             USA              US        840.0   

               Type  Parent code  ...       55-59       60-64       65-69  \
18505  Country/Area          905  ...  22347.5005  20941.0635  17500.8715   

            70-74     75-79      80-84      85-89     90-94     95-99    100+  
18505  13688.5955  9272.809  6118.8665  3977.1775  1656.067  501.7545  78.955  

[1 rows x 32 columns]


Next, we calculate the crude and age-standardized COPD death rates for Uganda and the United States for the year 2019.

We use the filtered population data for 2019 (uganda_population_2019 and us_population_2019), the age-specific COPD death rates (copd_rates), and the WHO standard population (who_standard).


---

In short, the four inputs are:

1. uganda_population_2019
2. us_population_2019
3. copd_rates
4. who_standard

First, we cross-check that copd_rates contains the 2019 COPD death rates for both countries. It does: the table has one row per age group and one rate column per country.

The age groups in copd_rates and who_standard match exactly, and both stop at the "85+" entry. The Ugandan and US population data for 2019, however, continues in five-year groups up to "95-99" and then "100+". We therefore need to adjust the population data to align with the age groups used in the COPD death rates and the WHO standard population, so that the crude and age-standardized death rates are calculated on comparable age groupings.
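
The label mismatch described above can be checked programmatically. This is a minimal sketch with hypothetical, shortened label lists. In the notebook they would presumably come from `copd_rates['Age group (years)']` and from the age columns of the population frame.

```python
# Hypothetical label lists, shortened for illustration
rate_age_groups = ["75-79", "80-84", "85+"]
population_age_groups = ["75-79", "80-84", "85-89", "90-94", "95-99", "100+"]

# Labels that appear in one source but not the other
only_in_population = [g for g in population_age_groups if g not in rate_age_groups]
only_in_rates = [g for g in rate_age_groups if g not in population_age_groups]

print("Only in population data:", only_in_population)
print("Only in rate data:", only_in_rates)
```

Any label listed under "only in population data" has to be folded into a category that exists in the rate table before the two can be multiplied together.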

To align the population data with the COPD death rates and the WHO standard, we sum the population counts of the "85-89", "90-94", "95-99", and "100+" age groups in the Ugandan and US data into a single "85+" category. With this aggregation, the population data can be used directly with the WHO standard population and with copd_rates, both of which end at "85+".

In [13]:
 
def adjust_population_data(population_data):
    """
    Collapses the "85-89", "90-94", "95-99", and "100+" age groups into a
    single "85+" category.

    Args:
        population_data (pd.DataFrame): The population data to be adjusted.

    Returns:
        pd.DataFrame: A new DataFrame with a single "85+" column in place of
        the four oldest age-group columns. The input is left unchanged.
    """
    oldest_groups = ['85-89', '90-94', '95-99', '100+']

    # Work on a copy so the filtered slice passed in is not modified
    # (assigning to it directly can raise SettingWithCopyWarning)
    population_data_adjusted = population_data.copy()
    population_data_adjusted['85+'] = population_data_adjusted[oldest_groups].sum(axis=1)

    # Drop the now-redundant columns to clean up the DataFrame
    return population_data_adjusted.drop(columns=oldest_groups)

Apply it to the Uganda and US data:

In [14]:
uganda_population_2019_adjusted = adjust_population_data(uganda_population_2019)
us_population_2019_adjusted = adjust_population_data(us_population_2019)


In [None]:

# The adjusted results are DataFrames, not functions, so print them directly
print(uganda_population_2019_adjusted)
print(us_population_2019_adjusted)
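
A quick sanity check for the aggregation step is that the total population is unchanged after collapsing the oldest groups into "85+". The sketch below applies the same idea to a made-up single-row frame. The values and the variable names `population` and `adjusted` are illustrative, not taken from the WPP data.

```python
import pandas as pd

# Made-up single-row frame with the same open-ended age columns as the WPP data
population = pd.DataFrame(
    {"80-84": [40.0], "85-89": [15.0], "90-94": [4.0], "95-99": [0.5], "100+": [0.5]}
)
oldest_groups = ["85-89", "90-94", "95-99", "100+"]

# Same aggregation idea as adjust_population_data, applied to a copy
adjusted = population.copy()
adjusted["85+"] = adjusted[oldest_groups].sum(axis=1)
adjusted = adjusted.drop(columns=oldest_groups)

# The total across age groups should be identical before and after
total_before = population.sum(axis=1).iloc[0]
total_after = adjusted.sum(axis=1).iloc[0]
print(total_before, total_after)
```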


#### For Uganda:

In [None]:
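# copd_rates holds death rates (assumed per 100,000), not death counts, so weight each
# rate by its age group's population (WPP values are in thousands; rows assumed ordered 0-4 to 85+)
uganda_pop_by_age = uganda_population_2019_adjusted.iloc[0, -18:].to_numpy(dtype=float) * 1000
total_deaths_uganda = (copd_rates['Death rate, Uganda, 2019'].to_numpy() * uganda_pop_by_age / 100_000).sum()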

In [None]:

# Sum all 18 age-group columns (0-4 through 85+) of the adjusted data; the last 16 columns would omit the youngest groups
total_population_uganda = uganda_population_2019_adjusted.iloc[:, -18:].sum().sum()
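
The remaining calculation for both measures can be sketched on a small made-up table. This is an illustration under stated assumptions, not the notebook's actual result: the three age groups, the rates, the populations, and the weights are all invented, and the column names (`rate_per_100k`, `population`, `standard_weight`) are hypothetical.

```python
import pandas as pd

# Illustrative numbers only, not the values from the source tables
table = pd.DataFrame(
    {
        "age_group": ["0-39", "40-64", "65+"],
        "rate_per_100k": [1.0, 20.0, 200.0],     # age-specific death rates
        "population": [6000.0, 3000.0, 1000.0],  # country population per age group
        "standard_weight": [50.0, 30.0, 20.0],   # standard population weights
    }
)

# Crude rate: age-specific rates weighted by the country's own population
crude_rate = (
    (table["rate_per_100k"] * table["population"]).sum() / table["population"].sum()
)

# Age-standardized rate: the same rates weighted by the standard population
standardized_rate = (
    (table["rate_per_100k"] * table["standard_weight"]).sum()
    / table["standard_weight"].sum()
)

print(f"Crude rate per 100,000: {crude_rate:.1f}")
print(f"Age-standardized rate per 100,000: {standardized_rate:.1f}")
```

Because the input rates are already expressed per 100,000, both weighted averages are per 100,000 as well. In this toy example the standard population is older than the country's population, so the standardized rate comes out higher than the crude rate.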

