In [40]:
import plotly.io as pio

pio.renderers.default = "vscode+jupyterlab+notebook_connected"

# Project 2: Analyzing the Relationship between GDP and Population in Japan (1960-2023)

## **Project Overview**
This project investigates the relationship between Japan's GDP (Gross Domestic Product) and its population trends from 1960 to 2023. By analyzing these data, I aim to understand how population changes, especially during the recent period of population decline, correlate with economic performance. The analysis includes visualizations like line graphs for GDP and population, and scatter plots to explore potential correlations.

## **Data Sources**
The datasets used in this analysis were sourced from the **World Bank's World Development Indicators** repository:

1. **GDP Data**:
   - Contains GDP values in current US dollars for all countries, plus regional aggregates such as "Africa Eastern and Southern", from 1960 to 2023.
   - Downloaded as a CSV file from the World Bank's [Open Data Portal](https://data.worldbank.org/indicator/NY.GDP.MKTP.CD).

2. **Population Data**:
   - Includes population estimates for all countries, plus regional aggregates, from 1960 to 2023.
   - Downloaded as a CSV file from the World Bank's [Open Data Portal](https://data.worldbank.org/indicator/SP.POP.TOTL).

## **Data Preprocessing Steps**
### **Step 1: Loading Data**
The CSV files were read into a Python environment using the `pandas` library.

In [41]:
import pandas as pd

gdp_data = pd.read_csv('gdp_by_countries.csv', header=2)
population_data = pd.read_csv('population_by_countries.csv', header=2)
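World Bank CSV downloads begin with a few metadata lines before the real column header. Because `pd.read_csv` skips blank lines by default (`skip_blank_lines=True`) and counts `header` among the remaining lines, `header=2` lands on the column header. The sketch below checks this on a small in-memory file. Its layout (two metadata lines separated by blank lines, a trailing comma on every row) is an assumption that mimics the World Bank export; the Japan values are copied from this notebook's outputs.

```python
import io

import pandas as pd

# Assumed layout of a World Bank export: metadata lines, blank lines, then the
# real header. Every row ends with a trailing comma, as in the real files.
raw_csv = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Last Updated Date","<date>",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","1960","1961",\n'
    '"Japan","JPN","GDP (current US$)","NY.GDP.MKTP.CD","47419240000.0","57266760000.0",\n'
)

# Blank lines are ignored, so header=2 selects the third non-blank line.
df = pd.read_csv(io.StringIO(raw_csv), header=2)
print(df.columns.tolist())
# The trailing comma yields an extra, empty "Unnamed: ..." column.
```

The same trailing comma explains the empty `Unnamed: 68` column that appears in the real data.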

### **Step 2: Exploring the Dataset**
Before extracting Japan's data, I first explore the dataset to understand its structure, column names, and sample data.

In [42]:
gdp_data.info()
gdp_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 69 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    object 
 1   Country Code    266 non-null    object 
 2   Indicator Name  266 non-null    object 
 3   Indicator Code  266 non-null    object 
 4   1960            138 non-null    float64
 5   1961            142 non-null    float64
 6   1962            144 non-null    float64
 7   1963            144 non-null    float64
 8   1964            144 non-null    float64
 9   1965            154 non-null    float64
 10  1966            155 non-null    float64
 11  1967            158 non-null    float64
 12  1968            159 non-null    float64
 13  1969            159 non-null    float64
 14  1970            181 non-null    float64
 15  1971            182 non-null    float64
 16  1972            182 non-null    float64
 17  1973            182 non-null    float64
...

| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | Unnamed: 68 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 2962907000.0 | 2983635000.0 | 3092429000.0 | 3276184000.0 | 3395799000.0 | 2558906000.0 | 3103184000.0 | 3544708000.0 | NaN | NaN |
| 1 | Africa Eastern and Southern | AFE | GDP (current US$) | NY.GDP.MKTP.CD | 21216960000.0 | 22307470000.0 | 23702470000.0 | 25779380000.0 | 28049540000.0 | 30374910000.0 | ... | 899295700000.0 | 829830000000.0 | 940105500000.0 | 1012719000000.0 | 1006527000000.0 | 929074100000.0 | 1086772000000.0 | 1183962000000.0 | 1236163000000.0 | NaN |
| 2 | Afghanistan | AFG | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 19134220000.0 | 18116570000.0 | 18753460000.0 | 18053220000.0 | 18799440000.0 | 19955930000.0 | 14266500000.0 | 14502160000.0 | NaN | NaN |
| 3 | Africa Western and Central | AFW | GDP (current US$) | NY.GDP.MKTP.CD | 11884130000.0 | 12685660000.0 | 13606830000.0 | 14439980000.0 | 15769110000.0 | 16934480000.0 | ... | 769367300000.0 | 692181100000.0 | 685750200000.0 | 768189600000.0 | 823933600000.0 | 787146700000.0 | 845993000000.0 | 877140800000.0 | 796586200000.0 | NaN |
| 4 | Angola | AGO | GDP (current US$) | NY.GDP.MKTP.CD | NaN | NaN | NaN | NaN | NaN | NaN | ... | 90496420000.0 | 52761620000.0 | 73690150000.0 | 79450690000.0 | 70897960000.0 | 48501560000.0 | 66505130000.0 | 104399700000.0 | 84722960000.0 | NaN |


In [43]:
population_data.info()
population_data.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Data columns (total 69 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Country Name    266 non-null    object 
 1   Country Code    266 non-null    object 
 2   Indicator Name  266 non-null    object 
 3   Indicator Code  266 non-null    object 
 4   1960            264 non-null    float64
 5   1961            264 non-null    float64
 6   1962            264 non-null    float64
 7   1963            264 non-null    float64
 8   1964            264 non-null    float64
 9   1965            264 non-null    float64
 10  1966            264 non-null    float64
 11  1967            264 non-null    float64
 12  1968            264 non-null    float64
 13  1969            264 non-null    float64
 14  1970            264 non-null    float64
 15  1971            264 non-null    float64
 16  1972            264 non-null    float64
 17  1973            264 non-null    float64
...

| | Country Name | Country Code | Indicator Name | Indicator Code | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | ... | 2015 | 2016 | 2017 | 2018 | 2019 | 2020 | 2021 | 2022 | 2023 | Unnamed: 68 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Aruba | ABW | Population, total | SP.POP.TOTL | 54608.0 | 55811.0 | 56682.0 | 57475.0 | 58178.0 | 58782.0 | ... | 104257.0 | 104874.0 | 105439.0 | 105962.0 | 106442.0 | 106585.0 | 106537.0 | 106445.0 | 106277.0 | NaN |
| 1 | Africa Eastern and Southern | AFE | Population, total | SP.POP.TOTL | 130692579.0 | 134169237.0 | 137835590.0 | 141630546.0 | 145605995.0 | 149742351.0 | ... | 600008424.0 | 616377605.0 | 632746570.0 | 649757148.0 | 667242986.0 | 685112979.0 | 702977106.0 | 720859132.0 | 739108306.0 | NaN |
| 2 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 8622466.0 | 8790140.0 | 8969047.0 | 9157465.0 | 9355514.0 | 9565147.0 | ... | 33753499.0 | 34636207.0 | 35643418.0 | 36686784.0 | 37769499.0 | 38972230.0 | 40099462.0 | 41128771.0 | 42239854.0 | NaN |
| 3 | Africa Western and Central | AFW | Population, total | SP.POP.TOTL | 97256290.0 | 99314028.0 | 101445032.0 | 103667517.0 | 105959979.0 | 108336203.0 | ... | 408690375.0 | 419778384.0 | 431138704.0 | 442646825.0 | 454306063.0 | 466189102.0 | 478185907.0 | 490330870.0 | 502789511.0 | NaN |
| 4 | Angola | AGO | Population, total | SP.POP.TOTL | 5357195.0 | 5441333.0 | 5521400.0 | 5599827.0 | 5673199.0 | 5736582.0 | ... | 28127721.0 | 29154746.0 | 30208628.0 | 31273533.0 | 32353588.0 | 33428486.0 | 34503774.0 | 35588987.0 | 36684202.0 | NaN |
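The `info()` output shows that many rows have gaps in the early years (only 138 of 266 rows have a 1960 GDP value). Before reshaping, a per-row count of missing year values is a quick way to confirm that Japan's series is complete. The sketch below runs on a tiny stand-in frame built from values in this notebook's outputs (Aruba's 1960 and 1961 GDP are missing, Japan's are present); on the real data the same expression would be applied to `gdp_data` and `population_data`.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for gdp_data, using values that appear in this notebook's outputs.
sample_gdp = pd.DataFrame(
    {
        "Country Name": ["Aruba", "Japan"],
        "1960": [np.nan, 47419240000.0],
        "1961": [np.nan, 57266760000.0],
    }
)

# Year columns are the ones whose names are all digits ("1960", "1961", ...),
# which also excludes the empty trailing "Unnamed: 68" column in the real data.
year_columns = [col for col in sample_gdp.columns if col.isdigit()]

# Number of missing year values for each country.
missing_per_country = (
    sample_gdp.set_index("Country Name")[year_columns].isna().sum(axis=1)
)
print(missing_per_country)
```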


### **Step 3: Extracting Japan's Data**
Both datasets include a `"Country Name"` column that identifies the country in each row. This column was used to keep only Japan's rows; the data were then reshaped so that each row holds one year (`"Year"`) and its value (`GDP` or `Population`).

#### **Step 3.1: Remove unnecessary columns**
Columns not needed for the analysis, namely `"Country Code"`, `"Indicator Name"`, `"Indicator Code"`, and the empty trailing `"Unnamed: 68"` column, were removed, leaving only `"Country Name"` and the year columns (1960–2023).

In [44]:
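# Keep "Country Name" plus one column per year (1960-2023); this drops the
# code/indicator columns and the empty trailing "Unnamed: 68" column.
year_columns = [str(year) for year in range(1960, 2024)]
japan_gdp_data = gdp_data[['Country Name'] + year_columns]
japan_population_data = population_data[['Country Name'] + year_columns]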

#### **Step 3.2: Filter only Japan's data**
Using the `"Country Name"` column, the rows corresponding to Japan were extracted from both the GDP and Population datasets.

In [45]:
japan_gdp_data = japan_gdp_data[japan_gdp_data['Country Name'] == 'Japan']
japan_population_data = japan_population_data[japan_population_data['Country Name'] == 'Japan']
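Matching on the display name works here, but World Bank country names can carry punctuation or qualifiers (for example "Korea, Rep."), whereas the three-letter `"Country Code"` (Japan's is `JPN`) is less likely to change between downloads. A sketch of the same filter keyed on the code, shown on a hypothetical two-row frame; note that in this variant the filter has to run before `"Country Code"` is dropped.

```python
import pandas as pd

# Hypothetical two-row stand-in with the identifier columns of the World Bank files.
sample = pd.DataFrame(
    {
        "Country Name": ["Aruba", "Japan"],
        "Country Code": ["ABW", "JPN"],
        "1960": [None, 47419240000.0],
    }
)

# Filter on the ISO3 code first, then keep only the columns needed later.
japan_rows = sample.loc[sample["Country Code"] == "JPN", ["Country Name", "1960"]]
print(japan_rows)
```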

#### **Step 3.3: Reshape the data using `melt`**
The wide-format data was transformed into a long-format structure using the `melt` function. This allowed the year columns to be combined into a single `"Year"` column with corresponding values for `GDP` or `Population`.

In [46]:
japan_gdp_cleaned = japan_gdp_data.melt(id_vars=['Country Name'], var_name='Year', value_name='GDP')
japan_population_cleaned = japan_population_data.melt(id_vars=['Country Name'], var_name='Year', value_name='Population')
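To see what `melt` does, here is the same call on a two-year toy frame built from Japan's 1960 and 1961 GDP values. Each year column becomes one row, and the former column name becomes the `"Year"` value, which is still a string at this point.

```python
import pandas as pd

# Wide format: one row per country, one column per year.
wide = pd.DataFrame(
    {"Country Name": ["Japan"], "1960": [47419240000.0], "1961": [57266760000.0]}
)

# Long format: one row per (country, year) pair.
long = wide.melt(id_vars=["Country Name"], var_name="Year", value_name="GDP")
print(long)
```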

#### **Step 3.4: Convert Columns to Proper Data Types**

To ensure consistent data types, I convert the `"Year"` column (which holds strings after `melt`) to integers, and convert the `"Population"` column from floating point to integers, since it holds head counts. GDP is kept as a float.

In [47]:
japan_gdp_cleaned['Year'] = japan_gdp_cleaned['Year'].astype(int)
japan_population_cleaned['Year'] = japan_population_cleaned['Year'].astype(int)

japan_population_cleaned['Population'] = japan_population_cleaned['Population'].astype(int)
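`astype(int)` works for Japan because its population series has no gaps; on a column containing `NaN` it raises an error instead. A hypothetical helper that falls back to pandas' nullable `"Int64"` dtype when values are missing might look like this:

```python
import numpy as np
import pandas as pd


def to_integer(series: pd.Series) -> pd.Series:
    """Hypothetical helper: cast a float column of whole numbers to integers.

    Uses plain int64 when there are no gaps, and the nullable "Int64" dtype
    (which stores gaps as <NA>) when some values are missing.
    """
    if series.isna().any():
        return series.astype("Int64")
    return series.astype("int64")


complete = to_integer(pd.Series([93216000.0, 94055000.0]))
with_gap = to_integer(pd.Series([93216000.0, np.nan]))
print(complete.dtype, with_gap.dtype)
```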

#### **Step 3.5: Cleaned Data Preview**
The cleaned data for GDP and Population is displayed below:

In [48]:
japan_gdp_cleaned.head()

| | Country Name | Year | GDP |
|---|---|---|---|
| 0 | Japan | 1960 | 47419240000.0 |
| 1 | Japan | 1961 | 57266760000.0 |
| 2 | Japan | 1962 | 64987860000.0 |
| 3 | Japan | 1963 | 74379280000.0 |
| 4 | Japan | 1964 | 87490590000.0 |


In [49]:
japan_population_cleaned.head()

| | Country Name | Year | Population |
|---|---|---|---|
| 0 | Japan | 1960 | 93216000 |
| 1 | Japan | 1961 | 94055000 |
| 2 | Japan | 1962 | 94933000 |
| 3 | Japan | 1963 | 95900000 |
| 4 | Japan | 1964 | 96903000 |


### **Step 4: Merge GDP and Population Data in Japan**
To combine the GDP and population data for analysis, I merge the cleaned datasets (`japan_gdp_cleaned` and `japan_population_cleaned`) on the `"Year"` column. This creates a single dataframe, `merged_data`, containing GDP and population values for each year.

In [50]:
merged_data = pd.merge(japan_gdp_cleaned, japan_population_cleaned, on='Year', how='inner')

merged_data.head()

| | Country Name_x | Year | GDP | Country Name_y | Population |
|---|---|---|---|---|---|
| 0 | Japan | 1960 | 47419240000.0 | Japan | 93216000 |
| 1 | Japan | 1961 | 57266760000.0 | Japan | 94055000 |
| 2 | Japan | 1962 | 64987860000.0 | Japan | 94933000 |
| 3 | Japan | 1963 | 74379280000.0 | Japan | 95900000 |
| 4 | Japan | 1964 | 87490590000.0 | Japan | 96903000 |


After merging the GDP and Population datasets, the resulting dataframe included redundant columns (`Country Name_x` and `Country Name_y`). Both datasets contain a `"Country Name"` column that was not used as a merge key, so pandas keeps both copies and appends its default suffixes `_x` and `_y` to tell them apart.

To ensure that the dataframe remains concise and focused on the relevant data (`Year`, `GDP`, and `Population`), the unnecessary columns were removed. Cleaning the merged dataset makes it easier to read, process, and visualize the data in subsequent steps.

In [51]:
merged_data = merged_data.drop(columns=['Country Name_x', 'Country Name_y'])

merged_data.head()

| | Year | GDP | Population |
|---|---|---|---|
| 0 | 1960 | 47419240000.0 | 93216000 |
| 1 | 1961 | 57266760000.0 | 94055000 |
| 2 | 1962 | 64987860000.0 | 94933000 |
| 3 | 1963 | 74379280000.0 | 95900000 |
| 4 | 1964 | 87490590000.0 | 96903000 |
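An alternative that avoids the `_x`/`_y` suffixes altogether is to merge on both `"Country Name"` and `"Year"`, so the shared column acts as a key instead of being duplicated. Passing `validate="one_to_one"` makes pandas raise an error if either side unexpectedly has repeated keys. A sketch on two-year toy frames using values from the tables in this step:

```python
import pandas as pd

gdp = pd.DataFrame(
    {
        "Country Name": ["Japan", "Japan"],
        "Year": [1960, 1961],
        "GDP": [47419240000.0, 57266760000.0],
    }
)
population = pd.DataFrame(
    {
        "Country Name": ["Japan", "Japan"],
        "Year": [1960, 1961],
        "Population": [93216000, 94055000],
    }
)

# Merging on both shared columns leaves a single "Country Name" column,
# and validate= guards against duplicate (country, year) keys.
merged = pd.merge(
    gdp, population, on=["Country Name", "Year"], how="inner", validate="one_to_one"
)
print(merged.columns.tolist())
```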


## **Analysis and Visualizations**
### **Step 5: GDP Trend Line Graph**

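I use `plotly` to create an interactive line graph of Japan's GDP from 1960 to 2023. The graph makes it possible to trace key economic periods, such as the post-war high-growth era, the bubble economy, and the stagnation that followed. Because the series is in current US dollars, it also reflects inflation and yen/dollar exchange-rate movements, not only changes in real output.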

In [52]:
import plotly.express as px

In [53]:
fig = px.line(
    merged_data,
    x='Year',
    y='GDP',
    title='GDP Trend in Japan (1960-2023)',
    labels={'GDP': 'GDP (current US$)', 'Year': 'Year'},
    markers=True
)

fig.update_layout(
    xaxis_title="Year",
    yaxis_title="GDP (current US$)",
    template="plotly_white",
    title_font_size=18,
    title_x=0.5,
    font=dict(size=12)
)

fig.show()
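Because GDP grows by roughly two orders of magnitude over the period, year-over-year growth rates can be easier to compare across decades than raw levels. A minimal sketch using `pct_change` on the first five rows of `merged_data` (nominal growth in current US dollars; the same call would apply to the full table):

```python
import pandas as pd

# First five rows of merged_data, copied from the output above.
gdp_sample = pd.DataFrame(
    {
        "Year": [1960, 1961, 1962, 1963, 1964],
        "GDP": [47419240000.0, 57266760000.0, 64987860000.0,
                74379280000.0, 87490590000.0],
    }
)

# Fractional change from the previous year; the first year has no predecessor (NaN).
gdp_sample["GDP growth"] = gdp_sample["GDP"].pct_change()
print(gdp_sample)
```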

### **Step 6: Population Trend Line Graph**
Next, I visualize Japan's population over the same period. The graph shows steady growth that gradually slows, a peak around 2010, and a decline since then.

In [54]:
fig_population = px.line(
    merged_data,
    x='Year',
    y='Population',
    title='Population Trend in Japan (1960-2023)',
    labels={'Population': 'Population', 'Year': 'Year'},
    markers=True
)

fig_population.update_layout(
    xaxis_title="Year",
    yaxis_title="Population",
    template="plotly_white",
    title_font_size=18,
    title_x=0.5,
    font=dict(size=12)
)

fig_population.show()
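The turning point in the population line can be located programmatically with `idxmax`, which returns the index label of the largest value. A sketch on a made-up series (placeholder years 1 to 6, not World Bank data); applied to `merged_data`, the same lines would return the actual peak year:

```python
import pandas as pd

# Made-up numbers for illustration only: growth, a peak, then decline.
toy = pd.DataFrame(
    {"Year": [1, 2, 3, 4, 5, 6],
     "Population": [100, 102, 103, 105, 104, 101]}
)

# Row holding the maximum population.
peak_row = toy.loc[toy["Population"].idxmax()]
print(int(peak_row["Year"]), int(peak_row["Population"]))
```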

### **Step 7: Analyze the Correlation Between GDP and Population**

To explore the relationship between GDP and population, I use scatter plots. These visualizations help me identify trends and correlations over different periods. Specifically, I examine:

1. **The entire period (1960–2023)**:
   - This scatter plot reveals the long-term correlation between GDP and population, reflecting historical economic trends.

2. **The recent period of population decline (2000–2023)**:
   - By focusing on this period, I can assess how the relationship between GDP and population has evolved during the population decline phase.
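The scatter plots show correlation visually; the Pearson coefficient puts a number on it. A minimal sketch of a hypothetical helper, `correlation_for_period`, that computes r for an inclusive range of years, demonstrated on the first five rows of `merged_data`. On the full table it would be called once for 1960–2023 and once for 2000–2023.

```python
import pandas as pd


def correlation_for_period(df: pd.DataFrame, start: int, end: int) -> float:
    """Hypothetical helper: Pearson r between GDP and Population for start <= Year <= end."""
    window = df[df["Year"].between(start, end)]
    return window["GDP"].corr(window["Population"])  # Pearson is the default method


# First five rows of merged_data, copied from the output above.
sample = pd.DataFrame(
    {
        "Year": [1960, 1961, 1962, 1963, 1964],
        "GDP": [47419240000.0, 57266760000.0, 64987860000.0,
                74379280000.0, 87490590000.0],
        "Population": [93216000, 94055000, 94933000, 95900000, 96903000],
    }
)

r = correlation_for_period(sample, 1960, 1964)
print(round(r, 3))
```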
  
#### **Step 7.1: The entire period (1960–2023)**
In this step, a scatter plot is created to visualize the relationship between GDP and population for the entire period from 1960 to 2023.

In [55]:
fig_scatter = px.scatter(
    merged_data,
    x='Population',
    y='GDP',
    title='Correlation between GDP and Population in Japan (1960-2023)',
    labels={'Population': 'Population', 'GDP': 'GDP (current US$)'},
    trendline="ols"  # ordinary least squares fit; requires the statsmodels package
)

fig_scatter.update_layout(
    xaxis_title="Population",
    yaxis_title="GDP (current US$)",
    template="plotly_white",
    title_font_size=18,
    title_x=0.5,
    font=dict(size=12)
)

fig_scatter.show()
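With `trendline="ols"`, Plotly Express fits a straight line, GDP = intercept + slope × Population, by ordinary least squares. The same two coefficients can be computed directly with `numpy.polyfit` at degree 1, which is convenient for reporting the slope as a number. A sketch on the first five rows of `merged_data`:

```python
import numpy as np
import pandas as pd

# First five rows of merged_data, copied from the output above.
sample = pd.DataFrame(
    {
        "GDP": [47419240000.0, 57266760000.0, 64987860000.0,
                74379280000.0, 87490590000.0],
        "Population": [93216000, 94055000, 94933000, 95900000, 96903000],
    }
)

# Degree-1 least-squares fit; coefficients come back highest power first.
slope, intercept = np.polyfit(sample["Population"], sample["GDP"], deg=1)

# Slope = change in GDP (current US$) per additional person within this window.
print(f"slope={slope:.1f} US$/person, intercept={intercept:.3e} US$")
```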

#### **Step 7.2: The Recent Period of Population Decline (2000–2023)**
This step focuses on the population decline phase by filtering the data for the years 2000 to 2023. A scatter plot is then created to examine how the relationship between GDP and population has evolved during this period.

In [56]:
post_2000_data = merged_data[merged_data['Year'] >= 2000]

fig_scatter_2000 = px.scatter(
    post_2000_data,
    x='Population',
    y='GDP',
    title='Correlation between GDP and Population in Japan (2000-2023)',
    labels={'Population': 'Population', 'GDP': 'GDP (current US$)'},
    trendline="ols"
)

fig_scatter_2000.update_layout(
    xaxis_title="Population",
    yaxis_title="GDP (current US$)",
    template="plotly_white",
    title_font_size=18,
    title_x=0.5,
    font=dict(size=12)
)

fig_scatter_2000.show()
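Two series that both drift over time can show a high correlation in levels even without a close year-to-year link. One common check, sketched here as a complement to the scatter plots rather than part of the original analysis, is to correlate year-over-year percentage changes instead. On the real data this would run on `post_2000_data`; the demo below uses the first five rows of `merged_data`, so its numbers say nothing about 2000–2023.

```python
import pandas as pd


def growth_correlation(df: pd.DataFrame) -> float:
    """Pearson r between year-over-year % changes in GDP and Population.

    Assumes rows are sorted by Year with no missing years.
    """
    growth = df[["GDP", "Population"]].pct_change().dropna()
    return growth["GDP"].corr(growth["Population"])


# First five rows of merged_data, copied from the output above.
sample = pd.DataFrame(
    {
        "Year": [1960, 1961, 1962, 1963, 1964],
        "GDP": [47419240000.0, 57266760000.0, 64987860000.0,
                74379280000.0, 87490590000.0],
        "Population": [93216000, 94055000, 94933000, 95900000, 96903000],
    }
)

level_r = sample["GDP"].corr(sample["Population"])
growth_r = growth_correlation(sample)
print(f"levels: {level_r:.3f}, growth rates: {growth_r:.3f}")
```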

## **Conclusion**

The analysis highlights notable differences in the relationship between GDP and population in Japan when comparing the full historical period (1960–2023) and the more recent period of population decline (2000–2023). The findings can be summarized as follows:

#### **1960–2023: A Strong Positive Correlation**
In the scatter plot covering the entire period, there is a clear positive correlation between GDP and population. This likely reflects Japan's post-war economic boom, when steady population growth coincided with industrial expansion and rising GDP: a growing workforce contributed to output, and urbanization supported both population growth and GDP increases. Because both series rise over most of these decades, part of this correlation also reflects their shared upward trend over time rather than a direct link between them.

#### **2000–2023: Diverging Trends**
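In contrast, the scatter plot for 2000–2023 shows a weaker and less consistent correlation. Over this period the population first leveled off and then declined, yet GDP did not track it: it rose and fell over these years largely independently of the demographic trend. Factors that plausibly contributed include: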

- **Increased Productivity**: Advances in technology and efficiency improvements helped maintain economic output.
- **Globalization**: Japanese companies expanded internationally, contributing to GDP without relying solely on domestic population growth.
- **Policy Interventions**: Economic policies and stimulus measures may have mitigated the impact of population decline on GDP.

#### **Implications**
1. **Economic Resilience**:
   - The weakening of the GDP-population correlation in recent years highlights Japan's ability to maintain economic stability despite demographic challenges.
   - However, this trend may not be sustainable in the long term, as the aging population could impose greater fiscal burdens.

2. **Future Challenges**:
   - Continued population decline might lead to reduced consumer demand, labor shortages, and increased social security costs.
   - Innovative strategies, such as automation, immigration policy reform, and economic diversification, will be crucial to offset these challenges.

3. **Policy Recommendations**:
   - Policymakers should focus on fostering productivity and innovation while addressing demographic issues through comprehensive reforms in education, healthcare, and labor markets.

By contrasting these two periods, the analysis underscores the evolving dynamics of Japan’s economy and the critical need to adapt to demographic shifts.