# Assignment: Data Extraction
Extract Thailand's COVID-19 case data from the DDC open-data API at https://covid19.th-stat.com/api/open/cases and answer the following questions:
- How many people were confirmed as infected on New Year's Day 2021?
- Which country **is not** among the nationalities of the people confirmed as infected on New Year's Day 2021?
- What is the average age, by gender, of the people confirmed as infected on New Year's Day 2021?

## How many people were confirmed as infected on New Year's Day 2021?

```python
import requests
import json
import pandas as pd
```

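```python
api_url = 'https://covid19.th-stat.com/api/open/cases'
# A timeout stops the cell from hanging if the server never answers (30 s is an arbitrary choice)
data_info = requests.get(api_url, timeout=30)
# Raise on an HTTP error status instead of failing later while parsing the body
data_info.raise_for_status()

covid_info = json.loads(data_info.text)
```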
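The cell above assumes the endpoint answers with a JSON object whose `Data` field holds the case records, which is what the loop in the next cell relies on. A minimal sketch that makes those assumptions explicit wraps the download in a function. `fetch_case_records`, its `get` parameter (used only to inject a stub in place of `requests.get`), and the 30-second timeout are hypothetical choices for illustration, not part of the DDC API.

```python
from typing import Any, Callable, List, Optional

API_URL = "https://covid19.th-stat.com/api/open/cases"


def fetch_case_records(url: str = API_URL,
                       get: Optional[Callable[..., Any]] = None,
                       timeout: float = 30.0) -> List[dict]:
    """Download the payload and return the list stored under 'Data' (hypothetical helper)."""
    if get is None:
        import requests  # imported lazily so a stub can be injected without network access
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raise on HTTP 4xx/5xx status codes
    payload = response.json()    # parse the JSON body
    # Assumption: the records live under a top-level 'Data' key, as the notebook's loop expects
    if not isinstance(payload, dict) or "Data" not in payload:
        raise ValueError("unexpected payload: no 'Data' field")
    return payload["Data"]
```

In the notebook, `fetch_case_records()` would return the same list as `covid_info['Data']`; the injectable `get` exists only so the function can be exercised offline.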

```python
# Collect the fields of each case record as a tuple, in the column order used for the DataFrame below
data_list = []
for d in covid_info['Data']:
    data_list.append((d['ConfirmDate'], d['No'], d['Age'], d['Gender'], d['GenderEn'],
                      d['Nation'], d['NationEn'], d['Province'], d['ProvinceId'],
                      d['District'], d['ProvinceEn'], d['Detail'], d['StatQuarantine']))
data_list
```

```python
# Exploratory: view the raw response as a DataFrame; the case records sit in its 'Data' column
pd.DataFrame.from_dict(covid_info)
```

```python
df = pd.DataFrame(data_list, columns=['ConfirmDate', 'No', 'Age', 'Gender', 'GenderEn',
                                      'Nation', 'NationEn', 'Province', 'ProvinceId',
                                      'District', 'ProvinceEn', 'Detail', 'StatQuarantine'])
```
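Because each record is a dict, pandas can also select and order the fields directly, without the intermediate tuples. A sketch, assuming `covid_info['Data']` is a list of dicts as the loop above implies; `COLUMNS` and `records_to_frame` are hypothetical names. One behavioural difference: a record that lacks a field yields `NaN` in that column, whereas the loop would raise `KeyError`.

```python
import pandas as pd

COLUMNS = ['ConfirmDate', 'No', 'Age', 'Gender', 'GenderEn', 'Nation', 'NationEn',
           'Province', 'ProvinceId', 'District', 'ProvinceEn', 'Detail', 'StatQuarantine']


def records_to_frame(records: list) -> pd.DataFrame:
    """Build one row per record, keeping only (and ordering by) COLUMNS."""
    # With a list of dicts, `columns` picks which keys to keep; keys absent from a record become NaN
    return pd.DataFrame(records, columns=COLUMNS)
```

Usage in the notebook would be `df = records_to_frame(covid_info['Data'])`.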

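```python
from datetime import datetime, timezone

new_year = datetime(2021, 1, 1, tzinfo=timezone.utc)
# utc=True yields timezone-aware timestamps (naive strings are read as UTC), comparable with new_year
df['ConfirmDate'] = pd.to_datetime(df.ConfirmDate, utc=True)
# Exact equality matches only records timestamped at 00:00 on 2021-01-01
df[df.ConfirmDate == new_year].shape[0]
```

```
279
```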
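Matching with `==` against midnight counts only timestamps that fall exactly at 00:00. If `ConfirmDate` ever carried a time of day, comparing calendar days would be safer. The sketch below uses a hypothetical `count_on_day` helper and made-up timestamps; it normalises each timestamp to midnight before comparing and treats the parsed values as naive, which is an assumption about the feed.

```python
import pandas as pd


def count_on_day(confirm_dates: pd.Series, day: str) -> int:
    """Count entries whose calendar date equals `day`, ignoring the time of day."""
    timestamps = pd.to_datetime(confirm_dates)
    # normalize() sets every time to 00:00 while keeping the date
    return int((timestamps.dt.normalize() == pd.Timestamp(day)).sum())


# Illustrative data, not taken from the API
sample = pd.Series(['2021-01-01 00:00:00', '2021-01-01 13:45:00',
                    '2020-12-31 23:59:00', '2021-01-02 08:00:00'])
print(count_on_day(sample, '2021-01-01'))  # 2; exact equality with midnight would give 1
```

Applied to the notebook's UTC-aware column, the target should be built as `pd.Timestamp(day, tz='UTC')` so that both sides of the comparison carry a timezone.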

## Which country is not among the nationalities of the people confirmed as infected on New Year's Day 2021?

```python
# Rows confirmed on New Year's Day 2021, reused in the remaining answers
df_new_year = df[df.ConfirmDate == new_year]
```

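```python
# Nationalities that appear among the New Year's Day cases (missing values dropped)
nations_present = set(df_new_year.Nation.dropna())
# Multiple-choice options from the assignment
mcv_choices = ['Thailand', 'Cambodia', 'Burma', 'USA', 'China', 'Germany', 'Japan', 'Slovenia', 'Vietnam', 'Italy']
# Keep the options that never appear, in their original order
ans = [e for e in mcv_choices if e not in nations_present]
ans
```

```
['USA', 'Japan']
```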
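The question is a set difference: the answer options minus the nationalities present. If raw `Nation` strings differed from the options only in letter case or stray spaces (an assumption, not something observed in this data), a plain membership test would wrongly report them as absent. A small sketch with a hypothetical `missing_choices` helper that normalises both sides and skips missing values:

```python
from typing import Iterable, List, Optional


def _normalise(name: str) -> str:
    # Collapse runs of whitespace and ignore letter case
    return " ".join(name.split()).casefold()


def missing_choices(choices: Iterable[str], observed: Iterable[Optional[object]]) -> List[str]:
    """Return the choices that do not occur in `observed`, preserving choice order."""
    # Non-string entries (None, or NaN coming from pandas) are skipped
    present = {_normalise(n) for n in observed if isinstance(n, str)}
    return [c for c in choices if _normalise(c) not in present]


print(missing_choices(['Thailand', 'USA', 'Japan', 'Burma'],
                      ['thailand ', 'Burma', None, float('nan')]))
# ['USA', 'Japan']
```

With the notebook's data this would be called as `missing_choices(mcv_choices, df_new_year.Nation)`.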

## What is the average age, by gender, of the people confirmed as infected on New Year's Day 2021?

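```python
# Mean age per gender; mean() skips missing ages, so a group with no recorded age gets NaN
df_group = df_new_year.groupby('GenderEn')
df_group.Age.mean()
```

```
GenderEn
Female     38.675676
Male       39.417910
Unknown          NaN
Name: Age, dtype: float64
```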
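A mean alone hides how many ages stand behind it, and the `Unknown` row is `NaN` because none of those records has a usable age. The sketch below reports the count next to the mean and coerces `Age` to numbers first, in case the field ever arrives as text (an assumption). `age_summary` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd


def age_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Mean and number of usable ages per GenderEn value."""
    # Non-numeric or missing ages become NaN and are skipped by both mean() and count()
    ages = pd.to_numeric(df['Age'], errors='coerce')
    return ages.groupby(df['GenderEn']).agg(['mean', 'count'])


# Illustrative rows, not taken from the API
sample = pd.DataFrame({
    'GenderEn': ['Male', 'Male', 'Female', 'Unknown'],
    'Age': [30, '40', 25, None],
})
print(age_summary(sample))
```

In the notebook, `age_summary(df_new_year)` would show the same means as above together with the number of ages each one is based on.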