# Project 2: GDP and Unemployment
## **1. Introduction** 
It is well understood that a country’s GDP and unemployment rate are closely linked. In this project, I examine how GDP growth and unemployment relate to each other in China and the United States from 1991 to 2021. These two countries were chosen because they are the world’s largest economies and, in many ways, serve as representatives of the Eastern and Western economic systems. Comparing them offers a broad and meaningful perspective.
Before generating the visualizations, my baseline expectation was that both countries would show an inverse relationship between GDP growth and unemployment.

## **2. Data Processing**
### a. Data Acquisition of GDP Growth
Read the GDP growth dataset from a CSV file.

In [10]:
import pandas as pd

gdp = pd.read_csv("world_gdp_data.csv", encoding="latin1")

### b. Data Cleaning of GDP Growth
The dataset has a column called indicator_name, which is not relevant to this project and is therefore removed.
It covers more than 100 countries, but this project focuses only on China and the United States, so the other countries are filtered out.
The two datasets are later merged on country and year, so the data are reshaped from wide format (one column per year) to long format (one row per country and year). This turns Year into a column that can serve as a merge key.
Finally, only the years of interest, 1991 to 2021, are selected for analysis.

In [11]:

gdp = gdp.drop(columns=["indicator_name"])

countries_to_keep = [
    "United States",
    "China, People's Republic of"
]

gdp_filtered = gdp[gdp["country_name"].isin(countries_to_keep)]

gdp_long = gdp_filtered.melt(
    id_vars=["country_name"],
    var_name="Year",
    value_name="gdp_growth"
)

gdp_long["Year"] = gdp_long["Year"].astype(int)
gdp_long = gdp_long[
    (gdp_long["Year"] >= 1991) & (gdp_long["Year"] <= 2021)
]
print(gdp_long.head())

                   country_name  Year  gdp_growth
22  China, People's Republic of  1991         9.0
23                United States  1991        -0.1
24  China, People's Republic of  1992        14.3
25                United States  1992         3.5
26  China, People's Republic of  1993        13.9
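
The wide-to-long reshaping described above can be tried on a small table. The sketch below is illustrative only: the 1991 and 1992 values are copied from the printed output, the "Unnamed: 3" column is a hypothetical stray column added for the demonstration, and pd.to_numeric with errors="coerce" is an assumed, more defensive alternative to astype(int), not what the notebook itself uses.

```python
import pandas as pd

# Toy wide-format table; the 1991 and 1992 values are copied from the output above.
# "Unnamed: 3" is a hypothetical stray column, the kind a trailing comma in a CSV can produce.
toy_wide = pd.DataFrame({
    "country_name": ["China", "United States"],
    "1991": [9.0, -0.1],
    "1992": [14.3, 3.5],
    "Unnamed: 3": [float("nan"), float("nan")],
})

# Wide -> long: one row per (country, year column).
toy_long = toy_wide.melt(
    id_vars=["country_name"],
    var_name="Year",
    value_name="gdp_growth",
)

# Assumed defensive variant of astype(int): column names that are not years
# become NaN and are dropped before the integer conversion.
years = pd.to_numeric(toy_long["Year"], errors="coerce")
toy_long = toy_long[years.notna()].assign(Year=lambda d: d["Year"].astype(int))

# Keep only the years of interest (here 1992 onward, to show the filter).
toy_long = toy_long[toy_long["Year"] >= 1992].reset_index(drop=True)
print(toy_long)
```

With plain astype(int), the stray column name would raise a ValueError instead of being skipped, which may be preferable when the input is expected to be clean.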


### c. Data Acquisition of Unemployment Rate
Read the unemployment rate dataset from a CSV file.
Dataset source: https://www.kaggle.com/datasets/pantanjali/unemployment-dataset

In [12]:
unemp_rate = pd.read_csv("unemployment analysis.csv")

### d. Data Cleaning of Unemployment Rate
The cleaning process for the unemployment rate dataset mirrors that of the GDP growth dataset.
The only differences are that the irrelevant column here is Country Code and that the country column is named Country Name, in which China appears simply as “China.”

In [13]:
unemp_rate = unemp_rate.drop(columns=["Country Code"])

countries_to_keep = [
    "United States",
    "China"
]

unemp_rate_filtered = unemp_rate[unemp_rate["Country Name"].isin(countries_to_keep)]

unemp_rate_long = unemp_rate_filtered.melt(
    id_vars=["Country Name"],
    var_name="Year",
    value_name="unemp_rate"
)

unemp_rate_long["Year"] = unemp_rate_long["Year"].astype(int)
unemp_rate_long = unemp_rate_long[
    (unemp_rate_long["Year"] >= 1991) & (unemp_rate_long["Year"] <= 2021)
]
print(unemp_rate_long.head())

    Country Name  Year  unemp_rate
0          China  1991        2.37
1  United States  1991        6.80
2          China  1992        2.37
3  United States  1992        7.50
4          China  1993        2.69


### e. Data Merge
Before merging the datasets, inconsistent entries must be standardized.
The country column of the unemployment dataset, Country Name, is renamed to country_name to match the GDP dataset.
China appears in two different forms, “China” and “China, People’s Republic of,” which are consolidated into the single standardized name “China.” The datasets are then joined on their common fields, country_name and Year, using an inner merge.

In [14]:
unemp_rate_long = unemp_rate_long.rename(columns={
    "Country Name": "country_name"
})
gdp_long["country_name"] = gdp_long["country_name"].replace(
    "China, People's Republic of",
    "China"
)

final = pd.merge(
    gdp_long,
    unemp_rate_long,
    on=["country_name", "Year"],
    how="inner"
)

print(final.head())
print(final.tail())

    country_name  Year  gdp_growth  unemp_rate
0          China  1991         9.0        2.37
1  United States  1991        -0.1        6.80
2          China  1992        14.3        2.37
3  United States  1992         3.5        7.50
4          China  1993        13.9        2.69
     country_name  Year  gdp_growth  unemp_rate
57  United States  2019         2.3        3.67
58          China  2020         2.2        5.00
59  United States  2020        -2.8        8.05
60          China  2021         8.4        4.82
61  United States  2021         5.9        5.46
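
The effect of standardizing the country names can be checked before trusting the inner merge, which silently drops unmatched rows. The following is a minimal sketch on toy rows copied from the outputs above; check_merge is a hypothetical helper, not part of the notebook. It relies on the indicator and validate parameters of pd.merge to list rows that have no partner and to confirm that each (country_name, Year) pair occurs once per table.

```python
import pandas as pd

# Toy long-format tables; values are copied from the outputs above.
gdp_toy = pd.DataFrame({
    "country_name": ["China, People's Republic of", "United States"],
    "Year": [1991, 1991],
    "gdp_growth": [9.0, -0.1],
})
unemp_toy = pd.DataFrame({
    "country_name": ["China", "United States"],
    "Year": [1991, 1991],
    "unemp_rate": [2.37, 6.80],
})

def check_merge(left, right):
    """Outer-merge on the shared keys and return the rows without a partner."""
    merged = pd.merge(
        left,
        right,
        on=["country_name", "Year"],
        how="outer",
        validate="one_to_one",  # raises if a key pair is duplicated in either table
        indicator=True,         # adds a _merge column: left_only / right_only / both
    )
    return merged[merged["_merge"] != "both"]

# Without standardizing the name, the two China rows fail to match.
unmatched_before = check_merge(gdp_toy, unemp_toy)

# After replacing the long form with "China", every row finds its partner.
gdp_fixed = gdp_toy.assign(
    country_name=gdp_toy["country_name"].replace(
        "China, People's Republic of", "China"
    )
)
unmatched_after = check_merge(gdp_fixed, unemp_toy)

print(len(unmatched_before), len(unmatched_after))
```

On these toy rows the check reports two unmatched rows before the rename and none after. In the merged table above, 62 rows (31 years for each of two countries) is the count one would expect if nothing was dropped.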


## **3. Visualization**
Before creating the charts, the merged table is reshaped once more so that both indicators share a single value column (percent), with an indicator column naming the series. The visualizations then track how GDP growth and unemployment change over time. Each country is shown in its own line chart, with years on the x-axis and percentage values on the y-axis. Each chart contains two lines, one for GDP growth and one for the unemployment rate, so every year has two data points.

In [15]:
import plotly.express as px

final_long = final.melt(
    id_vars=["country_name", "Year"],
    value_vars=["gdp_growth", "unemp_rate"],
    var_name="indicator",
    value_name="percent"
)

def plot_country_line(country, display_name=None):
    """Plot GDP growth and unemployment over time for one country.

    country is the value matched against country_name; display_name is the
    label used in the chart title and defaults to country.
    """
    display_name = display_name or country
    df_c = final_long[final_long["country_name"] == country].copy()

    fig = px.line(
        df_c,
        x="Year",
        y="percent",
        color="indicator",
        markers=True,
        hover_data={"Year": True, "percent": True},
        title=f"{display_name}: GDP Growth and Unemployment Rate (1991–2021)",
        labels={
            "percent": "Percent (%)",
            "indicator": "Indicator"
        }
    )
    fig.update_layout(legend_title_text='')
    fig.show()

plot_country_line("United States")

plot_country_line("China")

## **4. Conclusion**
The visualizations suggest an inverse relationship between GDP growth and unemployment. In most years, both indicators move only modestly, so the relationship is hard to see. During major economic shocks such as the 2008 financial crisis and the 2020 COVID-19 pandemic, however, the pattern becomes much clearer: GDP growth drops sharply while unemployment rises.
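
One way to put a number on this visual impression is the Pearson correlation between the two indicators within each country. The sketch below is an assumption about how that could be done, not part of the original analysis. correlation_by_country is a hypothetical helper, and the sample rows are only the handful printed earlier, so its output says nothing reliable about the full 1991 to 2021 data. In the notebook it would be called on final instead.

```python
import pandas as pd

def correlation_by_country(df):
    """Pearson correlation between gdp_growth and unemp_rate, per country."""
    return pd.Series({
        name: grp["gdp_growth"].corr(grp["unemp_rate"])
        for name, grp in df.groupby("country_name")
    })

# Sample rows copied from the head/tail output of the merged table above.
# This is an illustrative subset, not the full dataset.
sample = pd.DataFrame({
    "country_name": [
        "China", "United States", "China", "United States", "China",
        "United States", "China", "United States", "China", "United States",
    ],
    "Year": [1991, 1991, 1992, 1992, 1993, 2019, 2020, 2020, 2021, 2021],
    "gdp_growth": [9.0, -0.1, 14.3, 3.5, 13.9, 2.3, 2.2, -2.8, 8.4, 5.9],
    "unemp_rate": [2.37, 6.80, 2.37, 7.50, 2.69, 3.67, 5.00, 8.05, 4.82, 5.46],
})

corr = correlation_by_country(sample)
print(corr.round(2))
```

A negative value is consistent with an inverse relationship, while a value near zero would mean the two series do not move together linearly over the period.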
Comparing the two countries reveals meaningful structural differences. The United States experiences larger year-to-year fluctuations in both GDP growth and unemployment, reflecting the dynamics of a mature, market-driven economy. China, by contrast, shows more stable trends, with consistently higher GDP growth and generally lower unemployment rates over the period. These contrasting patterns highlight the countries’ different stages of economic development: slower growth in the mature U.S. economy versus China’s sustained high-growth trajectory during its rapid development phase.