# COVID-19: Spread situation by prefecture in Japan

In this kernel I zoom in on Japan to visualize how Coronavirus has spread in each prefecture.

Three external datasets are used:
 - [Japan COVID-19 by prefecture](https://www.kaggle.com/corochann/japan-covid19-by-prefecture): Detailed information on each individual confirmed COVID-19 case in Japan. The data comes from [kaz-ogiwara/covid19](https://github.com/kaz-ogiwara/covid19/).
 - [Japan Prefecture Latitude Longitude](https://www.kaggle.com/corochann/japan-prefecture-latitude-longitude): The latitude and longitude of each prefecture in Japan.
 - [Japan population by age and sex in 2020](https://www.kaggle.com/corochann/japan-population-by-age-and-sex-in-2020): Population of Japan broken down by age and sex.

**Please upvote both kernel & dataset** if you find it useful :)

In [None]:
import gc
import os
from pathlib import Path
import random
import sys

from tqdm.notebook import tqdm
import numpy as np
import pandas as pd
import scipy as sp


import matplotlib.pyplot as plt
import seaborn as sns

from IPython.display import display, HTML

# --- plotly ---
from plotly import tools, subplots
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.express as px
import plotly.figure_factory as ff
import plotly.io as pio
pio.templates.default = "plotly_dark"

# --- models ---
from sklearn import preprocessing
from sklearn.model_selection import KFold
import lightgbm as lgb
import xgboost as xgb
import catboost as cb

# --- setup ---
pd.set_option('display.max_columns', 50)

# Load Data

Load the data and translate the Japanese column names into English so that everyone can follow along :).

In [None]:
# Input data files are available in the "../input/" directory.
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    filenames.sort()
    for filename in filenames:
        print(os.path.join(dirname, filename))


COVID-19 data in Japan.

In [None]:
individual_df = pd.read_csv('/kaggle/input/japan-covid19-by-prefecture/individuals.csv')
individual_df = individual_df.rename({
    '新No.': 'new_number',
    '旧No.': 'old_number', 
    '確定年': 'year', 
    '確定月': 'month', 
    '確定日': 'day', 
    '年代': 'age', 
    '性別': 'sex',
    '居住地1': 'place1',
    '居住地2': 'place2',
    '備考': 'remark'}, axis=1)
# Create "date" column, and convert to datetime type
individual_df['date'] = individual_df['year'].apply(str) + '/' + individual_df['month'].apply(str) + '/' + individual_df['day'].apply(str)
individual_df['date'] = pd.to_datetime(individual_df['date'])
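
As a side note, `pd.to_datetime` can also assemble a date directly from columns named `year`, `month` and `day`, which avoids the string concatenation. Here is a minimal sketch on a toy frame; `toy_df` and its values are made up for illustration and are not the real `individuals.csv` data.

```python
import pandas as pd

# Toy rows standing in for the renamed year/month/day columns (values are made up).
toy_df = pd.DataFrame({
    'year': [2020, 2020, 2020],
    'month': [1, 2, 3],
    'day': [15, 29, 1],
})

# pd.to_datetime builds one datetime per row from the year/month/day columns.
toy_df['date'] = pd.to_datetime(toy_df[['year', 'month', 'day']])
print(toy_df['date'].tolist())
```

The same call should work on `individual_df[['year', 'month', 'day']]`, assuming all three columns are numeric and have no missing values.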

In [None]:
# Convert sex to English
individual_df['sex'] = individual_df['sex'].map({'男': 'male', '女': 'female', '不明': 'unknown'})
# Convert age to English
individual_df['age'] = individual_df['age'].map({
    '10歳未満': '0-9',
    '10代': '10-19',
    '20代': '20-29',
    '30代': '30-39',
    '40代': '40-49',
    '50代': '50-59',
    '60代': '60-69',
    '70代': '70-79',
    '80代': '80-89',
    '90代': '90-99',
    '不明': 'unknown',
})
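
One thing to keep in mind: `Series.map` with a dict turns every label that is not a key of the dict into `NaN`, so a label missing from the mapping silently disappears from later counts. The sketch below shows one way to list such labels. The series `raw` is toy data, and the label `'調査中'` is a hypothetical example of a value not covered by the mapping.

```python
import pandas as pd

sex_map = {'男': 'male', '女': 'female', '不明': 'unknown'}

# Toy input; the last label is hypothetical and deliberately absent from sex_map.
raw = pd.Series(['男', '女', '不明', '調査中'])

mapped = raw.map(sex_map)

# Labels that the mapping did not cover end up as NaN after map().
unmapped_labels = raw[mapped.isna()].unique().tolist()
print(unmapped_labels)
```

Running the same check on the `age` column before overwriting it would show whether any age label is lost.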

Latitude and longitude of each prefecture in Japan

In [None]:
japan_latlng = pd.read_csv('/kaggle/input/japan-prefecture-latitude-longitude/japan_prefecture_latlng.csv')

Population information in Japan

In [None]:
japan_pop_df = pd.read_csv('/kaggle/input/japan-population-by-age-and-sex-in-2020/japan_population.csv')

In [None]:
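# Re-group the population table into 10-year age bins to match the case data.
# This assumes the rows are consecutive 5-year bins followed by one final open-ended bin.
tmp_list = []
for i in range(10):
    if i == 9:
        # The last group also absorbs the final open-ended row.
        tmp = japan_pop_df.iloc[2*i] + japan_pop_df.iloc[2*i+1] + japan_pop_df.iloc[2*i+2]
    else:
        tmp = japan_pop_df.iloc[2*i] + japan_pop_df.iloc[2*i+1]
    # Overwrite the age label with the new 10-year bin.
    if i == 0:
        tmp['age'] = '0-9'
    elif i == 9:
        tmp['age'] = '90-'
    else:
        tmp['age'] = f'{i}0-{i}9'
    tmp_list.append(tmp)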
population_df = pd.DataFrame(tmp_list)
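
The same re-grouping can be sketched with `groupby`, which avoids adding rows by hand. This is only an illustration on toy data: `toy_pop` is a hypothetical table with 21 rows (twenty 5-year bins plus one open-ended bin), which is the layout I assume for `japan_population.csv`, and the numbers are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the population table: 21 rows (0-4, 5-9, ..., 95-99, 100+).
toy_pop = pd.DataFrame({
    'male': np.arange(1, 22),
    'female': np.arange(1, 22) * 2,
})
toy_pop['total'] = toy_pop['male'] + toy_pop['female']

# Row i belongs to decade i // 2; the cap at 9 puts the open-ended row into the last group.
decade = np.minimum(np.arange(len(toy_pop)) // 2, 9)

# Same labels as the loop above: '0-9', '10-19', ..., '80-89', '90-'.
labels = ['0-9'] + [f'{d}0-{d}9' for d in range(1, 9)] + ['90-']

toy_grouped = toy_pop.groupby(decade).sum()
toy_grouped.insert(0, 'age', labels)
print(toy_grouped)
```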

In [None]:
population_df

# Visualize spread of Coronavirus in Japan

In [None]:
place1_list = individual_df['place1'].unique()
data_list = []

for place1 in tqdm(place1_list):
    date = individual_df['date'].min()
    confirmed_total = 0
    while date <= individual_df['date'].max():
        confirmed = len(individual_df.query('(place1 == @place1) & (date == @date)'))
        confirmed_total += confirmed
        data_list.append({
            'date': date,
            'place1': place1,
            'confirmed': confirmed,
            'confirmed_total': confirmed_total})

        #print(date)
        date += pd.Timedelta(1, unit='d')
    
tmpdf = pd.DataFrame(data_list)
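
The loop above runs one query per prefecture per day, which gets slow as the data grows. A vectorized version of the same idea could look like the sketch below. It uses toy data (`toy_cases` with made-up prefectures `'A'` and `'B'`), so treat it as an illustration of the approach rather than a drop-in replacement.

```python
import pandas as pd

# Toy case records: one row per confirmed case (values are made up).
toy_cases = pd.DataFrame({
    'place1': ['A', 'A', 'B', 'A'],
    'date': pd.to_datetime(['2020-03-01', '2020-03-01', '2020-03-02', '2020-03-04']),
})

# Every calendar day between the first and last case, including days without cases.
all_dates = pd.date_range(toy_cases['date'].min(), toy_cases['date'].max(), freq='D')

# Daily counts as a date x prefecture table, with zeros for missing days.
daily = (
    toy_cases.groupby(['place1', 'date']).size()
    .unstack('place1', fill_value=0)
    .reindex(all_dates, fill_value=0)
    .rename_axis('date')
)

# Back to long format: one row per (date, prefecture), then cumulative sum per prefecture.
long_daily = daily.reset_index().melt(
    id_vars='date', var_name='place1', value_name='confirmed')
long_daily['confirmed_total'] = long_daily.groupby('place1')['confirmed'].cumsum()
print(long_daily)
```

The result has the same four columns as `tmpdf` (`date`, `place1`, `confirmed`, `confirmed_total`).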

In [None]:
tmpdf2 = pd.merge(tmpdf.rename({'place1': 'prefecture'}, axis=1), japan_latlng, on='prefecture')
tmpdf2['datestr'] = tmpdf2['date'].apply(str)

The marker size shows the cumulative number of confirmed cases and the color shows the daily number of confirmed cases for each prefecture in Japan.

 - Hokkaido (north): Cases spread suddenly at the end of February. However, Hokkaido declared a state of emergency as a quick countermeasure, and the spread now seems to be slowing down.
 - Osaka, Hyogo, Aichi (central): These prefectures contain the 2nd and 3rd largest metropolitan areas in Japan. Coronavirus spread there at the beginning of March, and the spread seems to be slowing down.
 - Around Tokyo (east): This is the largest and most densely populated urban area in Japan. The number of cases grew gradually in Tokyo and, sadly, has been increasing rapidly recently...
 - Other areas: Although some cases have been found in rural prefectures, the numbers are not large so far.

The good news is that the spread is not so fast outside Tokyo. We need to watch carefully how the number grows in Tokyo for the time being (written at the end of March).

In [None]:
fig = px.scatter_geo(
    tmpdf2, lat='lat', lon='long', color='confirmed', size='confirmed_total', scope='asia',
    animation_frame='datestr', range_color=[0, 17], hover_name='prefecture_en',
    center={'lat': 37, 'lon': 136.5})
fig.update_layout(margin={"r": 0,"t": 0, "l": 0,"b": 0})
fig.layout.geo.projection = go.layout.geo.Projection(scale=3.)
fig.show()

# Confirmed cases by Age

Let's look at confirmed cases by age group.

In [None]:
# Name the index and the counts explicitly so the column names do not depend on the pandas version.
agedf = individual_df['age'].value_counts().rename_axis('age').reset_index(name='count')

fig = px.pie(agedf, values='count', names='age', title='Ratio of age')
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

People around 50 years old have the most cases.

In [None]:
fig = px.bar(agedf.sort_values('age', ascending=False), 
             x='count', y='age',
             title='Confirmed case by age', text='count', orientation='h')
fig.show()

The figure above shows the total number of confirmed cases.

Now let's check the confirmed case rate relative to population. The bar graph looks almost the same as the one above. People around 50 years old have the highest rate, but even so it is only about 1 person per 100,000 people.

In [None]:
age_df = pd.merge(agedf, population_df)
age_df['confirmed_rate'] = age_df['count'] / age_df['total'] * 10000

fig = px.bar(age_df.sort_values('age', ascending=False), 
             x='confirmed_rate', y='age',
             title='Confirmed case per 10000 people by age', text='confirmed_rate', orientation='h')
fig.show()
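
Note that `pd.merge` does an inner join by default, so any age label that exists in only one table is dropped from the rate plot. In this notebook the case data is labeled `'90-99'` and `'unknown'`, while the re-grouped population table uses `'90-'`, so those rows would not match. A minimal sketch for spotting such rows, using toy tables with made-up numbers:

```python
import pandas as pd

# Toy tables mimicking the label mismatch (counts and totals are made up).
toy_counts = pd.DataFrame({'age': ['80-89', '90-99', 'unknown'], 'count': [5, 2, 1]})
toy_population = pd.DataFrame({'age': ['80-89', '90-'], 'total': [1000, 200]})

# A left join with indicator=True marks rows that found no partner in the population table.
checked = pd.merge(toy_counts, toy_population, on='age', how='left', indicator=True)
dropped = checked.loc[checked['_merge'] == 'left_only', 'age'].tolist()
print(dropped)
```

Whether to relabel one side so the oldest group matches is a judgment call, since the population bin is open-ended while the case label is not.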

As a side note, here is the population of Japan.

In [None]:
population_df

In [None]:
fig = px.bar(population_df.sort_values('age', ascending=False), 
             x='total', y='age', 
             title='Population in Japan', text='total', orientation='h')
fig.show()

# Confirmed cases by sex

Males seem to have slightly more cases, but the difference is not large.

In [None]:
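# Name the index and the counts explicitly so the column names do not depend on the pandas version.
fig = px.pie(individual_df['sex'].value_counts().rename_axis('sex').reset_index(name='count'),
             values='count', names='sex', title='Ratio of sex')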
fig.update_traces(textposition='inside', textinfo='percent+label')
fig.show()

Note: in the total population of Japan, there are slightly more females than males.

In [None]:
fig = px.bar(population_df.sum()[['male', 'female']].reset_index().rename({0: 'count', 'index': 'sex'}, axis=1), 
             x='count', y='sex', 
             title='Population in Japan by sex', text='count', orientation='h')
fig.show()


That's all for the visualization. Thank you for reading.

Tokyo is one of the most densely populated cities in the world, but so far Coronavirus has not spread there as fast as in European countries and the US.<br/>
Why? Of course the precise reason is not yet known, but some articles offer hypotheses that try to explain it. Please see the further reading for details.<br/>


Further reading:

 - [Total confirmed cases of COVID-19](https://ourworldindata.org/grapher/covid-confirmed-cases-since-100th-case?time=63)
 - [If I were North American/West European/Australian, I will take BCG vaccination now against the novel coronavirus pandemic.](https://www.jsatonotes.com/2020/03/if-i-were-north-americaneuropeanaustral.html)
 - [Why is Japan still a coronavirus outlier?](https://www.japantimes.co.jp/opinion/2020/03/21/commentary/japan-commentary/japan-still-coronavirus-outlier/#comment-4843977551)