# Final Project: Trends in Immigration amid Demographic Decline in Japan (Report and Code)

**Author:** Kevin Jin

**Course:** CB&B 634 Computational Methods for Informatics

**Instructor:** Robert McDougal

**Term:** Fall 2023

# Introduction
In July 2023, the Japanese government released data indicating that in 2022 the population fell in all 47 of the country's prefectures for the first time since the government began tracking the data in 1968.[^1] The year 2022 marked the 14th consecutive year of decline in Japan's population, and for the first time the decline included Okinawa Prefecture, which has historically had a high birthrate.[^2]

Population decline almost always spells economic trouble for a nation, and Japan's economic prospects are correspondingly grim. The Japanese government has invested enormous resources into addressing the issue, yet heated debate continues over the causes of the decline and over immigration as a potential solution, along with its implications.

This report will present Japan's ongoing demographic crash in graphic detail and attempt to identify causes. It will also spotlight immigration and provide spatial insights into the state of immigration in Japan.

[^1]: https://english.kyodonews.net/news/2023/07/c6b8e75dc7a9-japanese-population-falls-in-all-47-prefectures-for-1st-time.html
[^2]: https://www.bloomberg.com/news/articles/2023-07-26/japanese-population-falls-in-all-47-prefectures-for-first-time

# Part 1: Demographic crash 

## 1a: How does Japan's fertility rate compare to peer nations?

### Background: the fertility rate
To sustain a population, each woman within it needs to have somewhere around two children during her lifetime. The **total fertility rate** of a population is the average number of children a woman would have if she experienced the current age-specific fertility rates throughout her lifetime and lived from birth until the end of her reproductive life.[^1] The total fertility rate needed to sustain a population, known as **replacement-level fertility**, varies from country to country depending on mortality rates and is commonly cited as around 2.1.[^2] If the total fertility rate falls below replacement level, each new generation will be less populous than the previous one, a phenomenon known as **sub-replacement fertility**.[^3] Japan's total fertility rate was around 1.26 in 2022, a record low that is well below replacement level. The rate has now declined for seven consecutive years, placing Japan firmly among the countries with the lowest fertility rates.[^4]

[^1]: https://en.wikipedia.org/w/index.php?title=Total_fertility_rate
[^2]: https://doi.org/10.1023/B:POPU.0000020882.29684.8e
[^3]: https://en.wikipedia.org/w/index.php?title=Sub-replacement_fertility
[^4]: https://www.reuters.com/world/asia-pacific/japan-demographic-woes-deepen-birth-rate-hits-record-low-2023-06-02/
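As a rough illustration of how a total fertility rate is computed, the sketch below sums age-specific fertility rates over five-year age groups. The rates in `asfr_per_1000` are hypothetical values chosen for illustration, not Japan's actual figures, and the 2.1 threshold is the replacement level cited above.

```python
# Hypothetical age-specific fertility rates (births per 1,000 women per year).
# These are illustrative placeholders, NOT real data for any country.
asfr_per_1000 = {
    "15-19": 3,
    "20-24": 22,
    "25-29": 70,
    "30-34": 95,
    "35-39": 55,
    "40-44": 13,
    "45-49": 2,
}

AGE_GROUP_WIDTH = 5          # each group spans five years of a woman's life
REPLACEMENT_LEVEL = 2.1      # replacement-level fertility cited in the text

# A woman spends AGE_GROUP_WIDTH years in each group, so the expected number
# of lifetime births is the sum of (annual rate x years spent in the group).
tfr = AGE_GROUP_WIDTH * sum(asfr_per_1000.values()) / 1000
below_replacement = tfr < REPLACEMENT_LEVEL

print(f"Total fertility rate: {tfr:.2f}")
print(f"Below replacement level: {below_replacement}")
```

With these placeholder rates, the result falls well under the replacement level, which is the situation described as sub-replacement fertility.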

### Dataset 1: Total fertility rates by country from 1960 - 2022
Data is sourced from [The World Bank](https://data.worldbank.org/indicator/SP.DYN.TFRT.IN) and includes data up through 2022 for countries of interest. The World Bank provides its data under the [Creative Commons Attribution 4.0 International license](https://creativecommons.org/licenses/by/4.0).

#### Data cleaning
The metadata rows at the top of the CSV (rows 1-4) were removed manually, and 2022 total fertility rates were added by hand for the countries of interest. The 2022 values were taken from the following sources:
* Japan: [Reuters](https://www.reuters.com/world/asia-pacific/japan-demographic-woes-deepen-birth-rate-hits-record-low-2023-06-02/)
* South Korea: [CNN](https://www.cnn.com/2023/12/15/asia/south-korea-to-see-population-plummet-intl-hnk/index.html)
* China: [The Guardian](https://www.theguardian.com/world/2023/aug/16/china-fertility-rate-dropped-to-record-low-in-2022-estimates-show)
* United States: [Centers for Disease Control and Prevention](https://www.cdc.gov/nchs/data/vsrr/vsrr028.pdf)
* Europe: [OECD](https://data.oecd.org/pop/fertility-rates.htm)
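The manual cleaning described above could also be done in code. The following is a minimal sketch under stated assumptions: the in-memory `raw_csv` is a hypothetical stand-in for the downloaded file (its metadata lines and its 2020 and 2021 values are placeholders), and `manual_2022` holds only the Japan figure of 1.26 reported by Reuters.

```python
import io
import pandas as pd

# Hypothetical stand-in for the downloaded file: four metadata lines followed
# by the real header layout. The 2020/2021 numbers are placeholders.
raw_csv = """metadata line 1 (placeholder)
metadata line 2 (placeholder)
metadata line 3 (placeholder)
metadata line 4 (placeholder)
Country Name,Country Code,Indicator Name,Indicator Code,2020,2021,2022
Japan,JPN,"Fertility rate, total (births per woman)",SP.DYN.TFRT.IN,1.0,1.0,
"""

# skiprows=4 drops the four metadata lines so the fifth line becomes the header
cleaned = pd.read_csv(io.StringIO(raw_csv), skiprows=4)

# 2022 values gathered by hand from the sources listed above (only Japan shown)
manual_2022 = {"Japan": 1.26}
for country, rate in manual_2022.items():
    cleaned.loc[cleaned["Country Name"] == country, "2022"] = rate

print(cleaned[["Country Name", "2022"]])
```

Keeping the hand-entered values in a dictionary leaves a record of which numbers did not come from the original download.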

In [99]:
# Load data for total fertility rates by year and country
import pandas as pd

fertility_rates = pd.read_csv("datasets/1_demographics/total_fertility_rates_by_year.csv")
fertility_rates.drop(columns=["Country Code", "Indicator Name", "Indicator Code"], inplace=True) # Drop unnecessary columns
fertility_rates = fertility_rates.transpose() # Transpose so that each row is a year and each column is a country
fertility_rates.rename(columns=fertility_rates.iloc[0], inplace=True) # Rename columns to country names
fertility_rates.drop(fertility_rates.index[0], inplace=True) # Drop the first row (country names, now used as column labels)
fertility_rates = fertility_rates.apply(pd.to_numeric, errors="coerce") # Transposing leaves object dtype; convert the rates to numbers
fertility_rates.insert(0, 'Year', pd.to_numeric(fertility_rates.index)) # Add a numeric column for the year
fertility_rates.reset_index(drop=True, inplace=True) # Replace the year-label index with a default integer index

# Plot fertility rates for Japan and peer nations over 1960 - 2022
import plotly.express as px

fig = px.line(fertility_rates, x="Year", y=["Japan", "China", "Korea, Rep.", "United States"])
fig.update_xaxes(tickmode='linear', tick0=1960, dtick=5, tickangle=45) # Only show x-axis labels every 5 years and rotate them 45 degrees
fig.update_traces(line=dict(color='crimson'), selector=dict(name='Japan')) # Change the color of the Japan line
fig.update_traces(line=dict(color='orange'), selector=dict(name='China')) # Change the color of the China line
fig.update_traces(line=dict(color='blue'), selector=dict(name='Korea, Rep.')) # Change the color of the Korea line
fig.update_traces(line=dict(color='green'), selector=dict(name='United States')) # Change the color of the United States line
fig.update_layout(title_text='Total Fertility Rates of Japan and Selected Peer Nations (1960 - 2022)', xaxis_title='Year', yaxis_title='Total Fertility Rate') # Plot and axis titles
fig.update_layout(legend_title_text='Country', legend=dict(yanchor="top", y=0.99, xanchor="right", x=0.99)) # Move the legend to the top right corner
fig.add_hline(y=2.1, line_dash="dash", line_color="black", annotation_text="Global Replacement Fertility Rate", annotation_position="top right") # Indicate global average fertility rate of 2.1 with horizontal line
fig.show()

In [100]:
# Predict fertility rates for Japan and peer nations over 2023 - 2050 using ARIMA
from statsmodels.tsa.arima.model import ARIMA
import numpy as np

# Create separate dataframe for predictions


### Dataset 2: Trends in Japan's population over time


## 1b: What is causing the population decline?

### Dataset 3: Japanese salaries over time

Data is sourced from [OECD](https://data.oecd.org/earnwage/average-wages.htm) and provided under a Creative Commons license.

In [115]:
import pandas as pd

wages = pd.read_csv("datasets/1_demographics/average_wages.csv")
wages = wages[wages['LOCATION'] == "JPN"] # Take only Japan rows
wages = wages[['TIME', 'Value']] # Grab year and value columns
wages.rename(columns={"TIME": "Year", "Value": "Average Wage (USD)"}, inplace=True) # Rename columns
wages.reset_index(drop=True, inplace=True) # Reset the index

# Grab fertility rates for Japan from 1991 - 2022
fertility_rates_japan = fertility_rates[["Year", "Japan"]].apply(pd.to_numeric) # Convert columns to numeric
fertility_rates_japan.rename(columns={"Japan": "Total Fertility Rate"}, inplace=True) # Rename column
fertility_rates_japan = fertility_rates_japan[fertility_rates_japan['Year'] >= 1991] # Take only rows from 1991 onwards

# Combine the two dataframes
wages = wages.join(fertility_rates_japan.set_index('Year'), on='Year') # Join the two dataframes on the year column
wages

| | Year | Average Wage (USD) | Total Fertility Rate |
|---|---|---|---|
| 0 | 1991 | 40378.559106 | 1.53 |
| 1 | 1992 | 40433.759448 | 1.5 |
| 2 | 1993 | 40123.212265 | 1.46 |
| 3 | 1994 | 40523.159403 | 1.5 |
| 4 | 1995 | 41013.0 | 1.42 |
| 5 | 1996 | 41183.899248 | 1.43 |
| 6 | 1997 | 41509.997917 | 1.39 |
| 7 | 1998 | 41335.46599 | 1.38 |
| 8 | 1999 | 41196.883172 | 1.34 |
| 9 | 2000 | 41428.236874 | 1.36 |


In [122]:
# Plot average wages against fertility rates for Japan over 1991 - 2022
import plotly.express as px

fig = px.scatter(wages, x="Average Wage (USD)", y="Total Fertility Rate", trendline="ols", trendline_color_override="red") # Plot scatter plot with linear trendline
fig.update_layout(title_text='Average Wage vs. Total Fertility Rate in Japan (1991 - 2022)', xaxis_title='Average Wage (USD)', yaxis_title='Total Fertility Rate') # Plot and axis titles
fig.show()
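The strength of the relationship drawn by the trendline can be summarized with a Pearson correlation coefficient. The sketch below computes it in plain Python for the ten rows (1991 - 2000) shown in the joined table above. It covers only that excerpt, not the full 1991 - 2022 range, and `pearson_r` is a helper written for this illustration.

```python
import math

# The ten rows (1991 - 2000) displayed in the joined table above
average_wage_usd = [40378.559106, 40433.759448, 40123.212265, 40523.159403,
                    41013.0, 41183.899248, 41509.997917, 41335.46599,
                    41196.883172, 41428.236874]
total_fertility_rate = [1.53, 1.5, 1.46, 1.5, 1.42, 1.43, 1.39, 1.38, 1.34, 1.36]

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

r = pearson_r(average_wage_usd, total_fertility_rate)
print(f"Pearson r for the 1991 - 2000 rows: {r:.2f}")
```

A negative value of `r` means that wages and fertility moved in opposite directions over these years. On its own, this does not show that one caused the other.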

# Part 2: Immigration to Japan

## 2a: What kinds of people are coming to Japan?

### Dataset 4: Visa application by nationality in Japan
[Visa Application by Nationality in Japan](https://www.kaggle.com/datasets/yutodennou/visa-issuance-by-nationality-and-region-in-japan/data) is a multivariate dataset compiled by Waticson on Kaggle. It primarily contains the number of visas issued and the purpose of visit for each country from 2006 to 2017. The dataset is licensed under the [Database Contents License (DbCL) v1.0](https://opendatacommons.org/licenses/dbcl/1-0/) from Open Data Commons, which explicitly permits reuse.

In [None]:
# Load data
import pandas as pd
visas = pd.read_csv("datasets/2_immigration/visa_number_in_japan.csv")
visas.head()

In [None]:
import plotly.express as px

top_countries = visas[visas['Country'] != 'total'].groupby('Country')['Number of issued'].sum().nlargest(10).index
top_countries_data = visas[(visas['Country'].isin(top_countries)) & (visas['Country'] != 'total')]

fig = px.line(top_countries_data, x='Year', y='Number of issued', color='Country')
fig.update_layout(title='Number of Visas Issued Over Time for Top Ten Countries of Issuance',
                  xaxis_title='Year',
                  yaxis_title='Number of Visas Issued')
fig.show()

In [None]:
# Exploratory plots
import seaborn as sns
import matplotlib.pyplot as plt

# Total visas issued from 2006-2017
# Pull the rows where Country is "total"
total = visas[visas["Country"] == "total"]

# Keep only the per-country rows in visas
visas = visas[visas["Country"] != "total"].reset_index(drop=True)

fig = plt.figure(figsize=(10, 7))
sns.barplot(x=total["Year"], y=total["Number of issued"])
fig.autofmt_xdate() # Rotate the year labels so they do not overlap
plt.title("Total visas issued by Japan from 2006-2017")
plt.show()

**Commentary:** The number of visas issued by Japan increased steadily from 2006 to 2017, and China remained the largest source of visa applicants throughout.
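Both claims in the commentary could be checked directly rather than read off the plots. The sketch below shows one way to do so with pandas. The `visas_example` frame is hypothetical toy data that only reuses the dataset's column names (`Country`, `Year`, `Number of issued`), so its output says nothing about the real figures.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the visa dataset
visas_example = pd.DataFrame({
    "Country": ["Country A", "Country B", "Country A", "Country B"],
    "Year": [2006, 2006, 2007, 2007],
    "Number of issued": [100, 80, 120, 90],
})

# Claim 1: yearly totals rise from each year to the next
yearly_totals = visas_example.groupby("Year")["Number of issued"].sum()
steadily_increasing = yearly_totals.is_monotonic_increasing

# Claim 2: the same country tops the list every year
top_rows = visas_example.loc[visas_example.groupby("Year")["Number of issued"].idxmax()]
top_per_year = top_rows.set_index("Year")["Country"]
same_leader_every_year = top_per_year.nunique() == 1

print(yearly_totals)
print(top_per_year)
print(steadily_increasing, same_leader_every_year)
```

Running the same steps on the real `visas` dataframe, after the "total" rows are removed, would confirm or refute the commentary for 2006 to 2017.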

## 2b: Where are they going?

### Dataset 5: Spatial distribution of migration within Japan

# Conclusion: Recommendations

