## Dashboarding Covid Development with Python Scraping and Pyecharts
----


### 0. Preface
* Tencent's website has changed its JSON encoding since the beginning of Feb 2020.
* The dashboard is built with pyecharts, a Python wrapper for Baidu's ECharts, which makes charting much easier.
* Data source: [Tencent Live Tracking of 2019-nCoV](https://news.qq.com/zt2020/page/feiyan.htm)
* The contents have been updated because Tencent has changed much of the data structure.


### Part 1. Web page analysis

Data source: https://news.qq.com/zt2020/page/feiyan.htm?from=timeline&isappinstalled=0

The Inspector tool in Chrome or Firefox will help you find the following data interfaces:

* China Domestic Data Interface:
  
    https://api.inews.qq.com/newsqa/v1/query/inner/publish/modules/list?modules=

* China Provincial Historical Data Interface:
  
    https://api.inews.qq.com/newsqa/v1/query/pubished/daily/list?adCode=

* International Data Interface:
  
    https://api.inews.qq.com/newsqa/v1/automation/modules/list?modules=
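
The three interfaces above differ only in their path and query parameter, so they can be assembled from one small lookup table. The following is a minimal sketch: the helper name `build_url` and the dictionary keys are illustrative labels, not part of Tencent's API, and the URL templates are copied verbatim from the list above (including the `pubished` spelling).

```python
from urllib.parse import quote

BASE = "https://api.inews.qq.com/newsqa/v1"

# Endpoint templates copied from the list above; the keys are hypothetical labels.
ENDPOINTS = {
    "domestic": BASE + "/query/inner/publish/modules/list?modules=",
    "province_history": BASE + "/query/pubished/daily/list?adCode=",
    "international": BASE + "/automation/modules/list?modules=",
}


def build_url(kind, value):
    """Return the full request URL for one interface and one parameter value."""
    return ENDPOINTS[kind] + quote(str(value))


print(build_url("domestic", "chinaDayList"))
print(build_url("province_history", 420000))
```

Keeping the templates in one place means a future change to Tencent's paths needs only one edit.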


### Part 2. Fetch Data

2.1 Import modules

In [15]:
import time 
import json
import requests
from datetime import datetime
import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt 

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None) 
plt.rcParams['font.sans-serif']=['SimHei'] # font that can display Chinese text
plt.rcParams['axes.unicode_minus']=False # render minus signs correctly with this font
plt.style.use('ggplot')

2.2 Catch the Data Stream

Steps: 
* Define a function that requests one module from the domestic interface.
* Fetch the data with that function, then convert it to a DataFrame with pandas.

In [16]:
def catch_data(api_name):
    # Request one module from the domestic interface and decode the JSON body
    url = 'https://api.inews.qq.com/newsqa/v1/query/inner/publish/modules/list?modules=' + api_name
    response = requests.get(url=url, timeout=10).json()
    return response
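
A slightly more defensive variant can check the shape of the payload before it is used, and accept the transport as a parameter so the logic can be exercised offline. The sketch below is an assumption-laden illustration rather than the author's method: it uses the standard library's `urllib` in place of `requests` so it runs without extra packages, and the names `fetch_data` and `fake_opener` are hypothetical.

```python
import io
import json
from urllib.request import urlopen


def fetch_data(url, opener=urlopen, timeout=10):
    """Fetch a URL and return the 'data' part of the decoded JSON payload."""
    with opener(url, timeout=timeout) as resp:
        payload = json.load(resp)
    if not isinstance(payload, dict) or "data" not in payload:
        raise ValueError("unexpected response shape: no 'data' key")
    return payload["data"]


# Offline demonstration with a fake opener, so no network access is needed.
def fake_opener(url, timeout):
    return io.BytesIO(b'{"data": {"chinaDayList": [{"y": "2022", "date": "11.28"}]}}')


sample = fetch_data("https://example.invalid/", opener=fake_opener)
print(sample["chinaDayList"][0]["date"])
```

With `requests`, the same idea would be a `timeout` argument on `requests.get` plus a check for the `data` key before indexing into the result.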

* China Domestic Cumulative Data Interface - chinaDayList

In [17]:
# China recent 60-day data
chinadaylist = catch_data('chinaDayList')
chinadaylist = pd.DataFrame(chinadaylist['data']['chinaDayList'])
chinadaylist['date'] = pd.to_datetime(chinadaylist['y'].astype('str') + '.' + chinadaylist['date'])
chinadaylist = chinadaylist[['date','confirm','heal','dead','importedCase','nowConfirm','nowSevere','localConfirm']]
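# Columns: date, cumulative confirmed, cumulative recovered, cumulative deaths, cumulative imported, active confirmed, active severe, active local confirmed
chinadaylist.columns = ['日期','累计确诊','累计治愈','累计死亡','累计境外输入','现有确诊','现有重症','本土现有确诊']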
chinadaylist.tail()

| | 日期 | 累计确诊 | 累计治愈 | 累计死亡 | 累计境外输入 | 现有确诊 | 现有重症 | 本土现有确诊 |
|---|---|---|---|---|---|---|---|---|
| 55 | 2022-11-24 | 8981987 | 379092 | 30010 | 27227 | 8572885 | 0 | 27429 |
| 56 | 2022-11-25 | 9000592 | 381286 | 30082 | 27296 | 8589224 | 0 | 28985 |
| 57 | 2022-11-26 | 9018455 | 383678 | 30121 | 27357 | 8604656 | 0 | 30646 |
| 58 | 2022-11-27 | 8998177 | 386020 | 30166 | 27431 | 8581991 | 0 | 32348 |
| 59 | 2022-11-28 | 9051741 | 389050 | 30233 | 27494 | 8632458 | 0 | 33190 |
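
The date column is rebuilt because the interface splits it into a year field `y` and a separate `date` field, and the join with `'.'` in the cell above implies the latter is a `MM.DD` string. Here is a small standard-library sketch of the same join, with an explicit format so that a malformed value fails loudly. The sample records are hypothetical and carry only the two date fields.

```python
from datetime import datetime


def parse_record_date(record):
    """Combine the year field 'y' and the 'MM.DD' field 'date' into a datetime."""
    return datetime.strptime(f"{record['y']}.{record['date']}", "%Y.%m.%d")


# Hypothetical records shaped like chinaDayList entries (date fields only).
records = [{"y": "2022", "date": "11.27"}, {"y": "2022", "date": "11.28"}]
dates = [parse_record_date(r) for r in records]
print(dates[-1].date())
```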


* China Domestic Daily Added (New Cases) Data Interface - chinaDayAddListNew

In [18]:
# China Daily Added New Cases
chinanewadd = catch_data('chinaDayAddListNew')
chinanewadd = pd.DataFrame(chinanewadd['data']['chinaDayAddListNew'])
chinanewadd['date'] = pd.to_datetime(chinanewadd['y'].astype('str') + '.' + chinanewadd['date'])
chinanewadd = chinanewadd[['date','confirm','dead','heal','infect','importedCase','localConfirmadd','localinfectionadd']]
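# Columns: date, new confirmed, new deaths, new recovered, new asymptomatic, new imported, new local confirmed, new local asymptomatic
chinanewadd.columns = ['日期','新增确诊','新增死亡','新增治愈','新增无症状','新增境外','本土新增确诊','本土新增无症状']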
chinanewadd.tail()


| | 日期 | 新增确诊 | 新增死亡 | 新增治愈 | 新增无症状 | 新增境外 | 本土新增确诊 | 本土新增无症状 |
|---|---|---|---|---|---|---|---|---|
| 360 | 2022-11-24 | 20237 | 78 | 2028 | 29840 | 62 | 3041 | 29654 |
| 361 | 2022-11-25 | 18605 | 72 | 2150 | 31709 | 69 | 3405 | 31504 |
| 362 | 2022-11-26 | 17863 | 39 | 2335 | 36082 | 61 | 3648 | 35858 |
| 363 | 2022-11-27 | 18084 | 45 | 2285 | 36525 | 74 | 3748 | 36304 |
| 364 | 2022-11-28 | 15202 | 67 | 2963 | 35021 | 63 | 3561 | 34860 |


2.3 China Domestic City Data Interface - diseaseh5Shelf

Provincial and city data processing method:

* Inspecting the web page shows that provincial data is served by the diseaseh5Shelf interface.
* diseaseh5Shelf returns a dictionary whose data sits under areaTree, which is a list. The children of its first element hold the list of provincial data.
* That children list has 34 elements, one dict per province, with the keys name, adcode, total, today, and children. The first four describe the province as a whole, and children holds the city details for that province.
* City data has the same structure as province data. The number of cities differs per province, so it is taken from the length of `province_catch_data[i]['children']`.
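
The nesting described above can be walked in one pass that yields a flat row per province and per city. This is a minimal sketch on a made-up miniature payload: the function names `flatten_area_tree` and `make_row` are hypothetical, the numbers are invented, and only the keys named in the list above are assumed to exist.

```python
def flatten_area_tree(area_tree):
    """Yield one flat dict per province and per city from an areaTree-like list."""
    for province in area_tree[0]["children"]:
        yield make_row(province, level="province", parent=None)
        for city in province.get("children", []):
            yield make_row(city, level="city", parent=province["name"])


def make_row(node, level, parent):
    """Merge a node's 'total' and 'today' dicts, renaming today's 'confirm'."""
    row = {
        "level": level,
        "province_name": parent or node["name"],
        "name": node["name"],
        "adcode": node.get("adcode", ""),
    }
    row.update(node.get("total", {}))
    for key, value in node.get("today", {}).items():
        # 'confirm' inside 'today' is the daily increase, so give it its own name
        row["confirm_add" if key == "confirm" else key] = value
    return row


# Hypothetical miniature payload; the numbers are made up for illustration.
sample_tree = [{
    "name": "中国",
    "children": [{
        "name": "湖北", "adcode": "420000",
        "total": {"confirm": 100, "heal": 90, "dead": 5},
        "today": {"confirm": 3},
        "children": [{
            "name": "武汉", "adcode": "420100",
            "total": {"confirm": 80, "heal": 72, "dead": 4},
            "today": {"confirm": 2},
            "children": [],
        }],
    }],
}]

rows = list(flatten_area_tree(sample_tree))
for row in rows:
    print(row["level"], row["province_name"], row["name"], row["confirm"], row["confirm_add"])
```

Because each row is built as a new dict, the fetched payload is left unmodified, and the resulting list can be handed to `pd.DataFrame` in a single call.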

In [19]:
# Province data processing
province_data = pd.DataFrame()
# Catch the data - first to process province data
province_catch_data = catch_data('diseaseh5Shelf')['data']['diseaseh5Shelf']['areaTree'][0]['children']
for i in range(len(province_catch_data)):
    province_total = province_catch_data[i]['total'] # Provincial Total
    province_total['name'] = province_catch_data[i]['name'] # province name
    province_total['adcode'] = province_catch_data[i]['adcode'] # province code
    province_total['date'] = province_catch_data[i]['date'] # update date
    province_today = province_catch_data[i]['today'] # provincial today data
    province_today['name'] = province_catch_data[i]['name'] # province name
    province_total = pd.DataFrame(province_total,index=[i])
    province_today = pd.DataFrame(province_today,index=[i])
    province_today.rename({'confirm':'confirm_add'},inplace=True,axis=1) # "confirm" of today is actually the daily added
    merge_data = province_total.merge(province_today,how='left',on='name') # merge province total and daily new cases
    province_data = pd.concat([province_data,merge_data]) # merging Provincial data
province_data = province_data[['name','adcode','date','confirm','provinceLocalConfirm','heal','dead','nowConfirm','confirm_add','local_confirm_add',
                               'wzz_add','abroad_confirm_add','dead_add','mediumRiskAreaNum','highRiskAreaNum','isUpdated']]
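# Columns: province, code, date, cumulative confirmed, cumulative local, cumulative recovered, cumulative deaths, active confirmed,
# new today, new local, new asymptomatic, new imported, new deaths, medium-risk area count, high-risk area count, updated flag
province_data.columns = ['省份','代码','日期','累计确诊','本土累计','累计治愈','累计死亡','现有确诊','当日新增','新增本土','新增无症状',
                         '新增境外','新增死亡','中风险数量','高风险数量','是否更新']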
province_data = province_data.sort_values(by='累计确诊',ascending=False,ignore_index=True)
province_data.head()



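| | 省份 | 代码 | 日期 | 累计确诊 | 本土累计 | 累计治愈 | 累计死亡 | 现有确诊 | 当日新增 | 新增本土 | 新增无症状 | 新增境外 | 新增死亡 | 中风险数量 | 高风险数量 | 是否更新 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 台湾 | | 2022/11/29 | 8278371 | 0 | 13742 | 14374 | 8250255 | 10651 | 10651 | 0 | 0 | 41 | 0 | 0 | True |
| 1 | 香港 | | 2022/11/29 | 457323 | 0 | 98471 | 10744 | 348108 | 925 | 925 | 0 | 0 | 26 | 0 | 0 | True |
| 2 | 湖北 | 420000.0 | 2022/11/29 | 68588 | 68445 | 63945 | 4512 | 131 | 27 | 26 | 623 | 1 | 0 | 0 | 1196 | True |
| 3 | 上海 | 310000.0 | 2022/11/29 | 64529 | 58861 | 63760 | 595 | 174 | 26 | 20 | 158 | 6 | 0 | 0 | 97 | True |
| 4 | 吉林 | 220000.0 | 2022/11/29 | 40410 | 40321 | 40309 | 5 | 96 | 23 | 23 | 809 | 0 | 0 | 0 | 1456 | True |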


In [20]:
df_city_data_total = pd.DataFrame()
for x in range(len(province_catch_data)):
    province_dict = province_catch_data[x]['children']
    province_name = province_catch_data[x]['name']
    df_city_data = pd.DataFrame()
    for i in range(len(province_dict)):
        city_total = province_dict[i]['total']
        city_total['province_name'] = province_name # province name
        city_total['name'] = province_dict[i]['name'] # city or district name
        city_total['adcode'] = province_dict[i]['adcode'] # city or district code
        city_total['date'] = province_dict[i]['date'] # update date
        city_today = province_dict[i]['today'] # today's data
        city_today['province_name'] = province_name # province name
        city_today['name'] = province_dict[i]['name'] # city or district name
        city_total = pd.DataFrame(city_total,index=[i])
        city_today = pd.DataFrame(city_today,index=[i])
        city_today.rename({'confirm':'confirm_add'},inplace=True,axis=1) # "confirm" inside today is actually the daily increase
        merge_city = city_total.merge(city_today,how='left',on=['province_name','name'])
        df_city_data = pd.concat([df_city_data,merge_city])
    df_city_data_total = pd.concat([df_city_data_total,df_city_data])
df_city_data_total = df_city_data_total[['province_name','name','adcode','date','confirm','provinceLocalConfirm','heal','dead','nowConfirm','confirm_add','local_confirm_add',
                               'wzz_add','mediumRiskAreaNum','highRiskAreaNum']]
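# Columns: province, city, code, date, cumulative confirmed, cumulative local, cumulative recovered, cumulative deaths, active confirmed, new today, new local, new asymptomatic, medium-risk area count, high-risk area count
df_city_data_total.columns = ['省份','城市','代码','日期','累计确诊','本土累计','累计治愈','累计死亡','现有确诊','当日新增','新增本土','新增无症状','中风险数量','高风险数量']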
df_city_data_total =df_city_data_total.sort_values(by='累计确诊',ascending=False,ignore_index=True)
df_city_data_total.head()



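| | 省份 | 城市 | 代码 | 日期 | 累计确诊 | 本土累计 | 累计治愈 | 累计死亡 | 现有确诊 | 当日新增 | 新增本土 | 新增无症状 | 中风险数量 | 高风险数量 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 台湾 | 地区待确认 | | 2022/11/29 | 8274116 | 0 | 0 | 0 | 8274116 | 10651 | 10651 | | 0 | 0 |
| 1 | 香港 | 地区待确认 | | 2022/11/29 | 457465 | 0 | 0 | 0 | 457465 | 925 | 925 | | 0 | 0 |
| 2 | 湖北 | 武汉 | 420100.0 | 2022/11/29 | 50571 | 0 | 0 | 0 | 50571 | 26 | 26 | 339.0 | 0 | 1063 |
| 3 | 吉林 | 长春 | 220100.0 | 2022/11/29 | 25197 | 0 | 0 | 0 | 25197 | 0 | 0 | 388.0 | 0 | 498 |
| 4 | 广东 | 广州 | 440100.0 | 2022/11/29 | 20351 | 0 | 0 | 0 | 20351 | 959 | 959 | 6993.0 | 0 | 480 |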


2.4 Provincial Historic Data Details

In [21]:
# Provincial historical data; Hong Kong and Macau are absent. City-level history can be retrieved the same way by passing a city code.
province_history_data = pd.DataFrame()
for code in province_data['代码']:
    if code != '':
        history_data = requests.get('https://api.inews.qq.com/newsqa/v1/query/pubished/daily/list?adCode=' + str(code)).json()['data']
        history_df = pd.DataFrame(history_data)
        history_df['date'] = pd.to_datetime(history_df['year'].astype('str') + '.' + history_df['date'])
        history_df_use = history_df[['date','province','confirm','dead','heal','wzz','newConfirm','newHeal','newDead','wzz_add']]
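        # Columns: date, province, cumulative confirmed, cumulative deaths, cumulative recovered, asymptomatic, new confirmed, new recovered, new deaths, new asymptomatic
        history_df_use.columns = ['日期','省份','累计确诊','累计死亡','累计治愈','无症状','新增确诊','新增治愈','新增死亡','新增无症状']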
        province_history_data = pd.concat([province_history_data,history_df_use])
        
province_history_data.shape



(32156, 10)
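
The loop above skips empty codes before building each request URL. Since the province table shows some codes blank and others rendered as floats such as 420000.0, the column may hold mixed types, depending on how the frame was produced. The sketch below normalizes such values under that assumption. The helper name `normalize_adcode` is hypothetical, and the sample values are chosen only to cover each case.

```python
import math


def normalize_adcode(value):
    """Return the adcode as a digit string, or None when it is missing."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        value = int(value)  # 310000.0 -> 310000
    text = str(value).strip()
    return text if text.isdigit() else None


codes = ["", "420000", 310000.0, float("nan"), None]
usable = [c for c in map(normalize_adcode, codes) if c is not None]
print(usable)
```

Filtering through a helper like this keeps blank or missing codes from ever reaching the request URL.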

2.5 International Dataset Processing

In [22]:
# International Data Update
aboard_data = requests.get('https://api.inews.qq.com/newsqa/v1/automation/modules/list?modules=WomAboard').json()['data']['WomAboard']
aboard_data = pd.DataFrame(aboard_data)
aboard_data_use = aboard_data[['pub_date','continent','name','confirm','dead','heal','nowConfirm','confirmAdd']]
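# Columns: date, continent, country, cumulative confirmed, cumulative deaths, cumulative recovered, active confirmed, new confirmed
aboard_data_use.columns = ['日期','大洲','国家','累计确诊','累计死亡','累计治愈','现有确诊','新增确诊']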
aboard_data_use.head()



| | 日期 | 大洲 | 国家 | 累计确诊 | 累计死亡 | 累计治愈 | 现有确诊 | 新增确诊 |
|---|---|---|---|---|---|---|---|---|
| 0 | 20221129 | 北美洲 | 美国 | 100507928 | 1104879 | 98060568 | 1342481 | 13836 |
| 1 | 20221129 | 亚洲 | 印度 | 44673170 | 530615 | 44136471 | 6084 | 343 |
| 2 | 20221129 | 欧洲 | 法国 | 37686603 | 158771 | 36853642 | 674190 | 9304 |
| 3 | 20221129 | 欧洲 | 德国 | 36419717 | 157657 | 35698200 | 563860 | 46553 |
| 4 | 20221129 | 南美洲 | 巴西 | 35229745 | 689601 | 34201418 | 338726 | 17710 |
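
In this interface the publication date arrives as a compact value such as 20221129 rather than a date, and each row carries its continent, which makes per-continent totals a natural next step. The sketch below uses only the five rows printed above. The helper name `parse_pub_date` is hypothetical, and the totals cover just these five countries, not whole continents.

```python
from collections import defaultdict
from datetime import datetime

# Rows copied from the output above: (pub_date, continent, country, cumulative confirmed).
rows = [
    ("20221129", "北美洲", "美国", 100507928),
    ("20221129", "亚洲", "印度", 44673170),
    ("20221129", "欧洲", "法国", 37686603),
    ("20221129", "欧洲", "德国", 36419717),
    ("20221129", "南美洲", "巴西", 35229745),
]


def parse_pub_date(text):
    """Turn a compact YYYYMMDD value into a date object."""
    return datetime.strptime(str(text), "%Y%m%d").date()


# Sum cumulative confirmed cases per continent.
totals = defaultdict(int)
for pub_date, continent, country, confirmed in rows:
    totals[continent] += confirmed

print(parse_pub_date(rows[0][0]))
print(totals["欧洲"])
```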


### Part 3. Visualization

3.1 Import pyecharts library

In [23]:
# Import all chart types from the pyecharts library
from pyecharts.charts import * 
from pyecharts import options as opts

# Import pyecharts themes if needed (this line can be dropped if unused)
from pyecharts.globals import ThemeType
from pyecharts.commons.utils import JsCode
from pyecharts.globals import CurrentConfig, NotebookType
CurrentConfig.NOTEBOOK_TYPE = NotebookType.JUPYTER_NOTEBOOK

3.2 Data Table

In [24]:
from pyecharts.components import Table
from pyecharts.options import ComponentTitleOpts
table = Table()
headers = list(chinadaylist.columns)
rows = chinadaylist.sort_values(by='日期',ascending=False).head(1).values
table.add(headers=headers,rows=rows)
table.set_global_opts(title_opts=ComponentTitleOpts(title="国内最新数据", 
                                                    subtitle="更新日期：" + chinadaylist['日期'].astype('str').max()))
table.render_notebook()


| 日期 | 累计确诊 | 累计治愈 | 累计死亡 | 累计境外输入 | 现有确诊 | 现有重症 | 本土现有确诊 |
|---|---|---|---|---|---|---|---|
| 2022-11-28 00:00:00 | 9051741 | 389050 | 30233 | 27494 | 8632458 | 0 | 33190 |


3.3 Bar/Line Charts

In [26]:
# Bar Chart and Line Chart
bar = Bar()
bar.add_xaxis(list(chinadaylist["日期"].astype('str')))
bar.add_yaxis(series_name ='累计确诊',y_axis=list(chinadaylist["累计确诊"]))
bar.add_yaxis(series_name ="现有确诊",y_axis=list(chinadaylist['现有确诊']))
bar.extend_axis(yaxis=opts.AxisOpts(name='治愈率',axislabel_opts=opts.LabelOpts(formatter="{value}%")))
bar.set_series_opts(label_opts=opts.LabelOpts(is_show=False))  # hide data labels
bar.set_global_opts(title_opts=opts.TitleOpts(title="国内累计确诊趋势",
                                              subtitle="数据来自腾讯疫情数据（含港澳台）", # subtitle: data from Tencent, including Hong Kong, Macau and Taiwan
                                              pos_left="center", # title position
                                              pos_top="top"),
                    legend_opts=opts.LegendOpts(pos_left="left"), # legend on the left
                    xaxis_opts=opts.AxisOpts(type_="category",
                                             axislabel_opts=opts.AxisTickOpts()),
                    yaxis_opts=opts.AxisOpts(name="人数")
                    
                   )
line = Line()
line.add_xaxis(list(chinadaylist["日期"].astype('str')))
line.add_yaxis(series_name="治愈率（%）",
               y_axis=(chinadaylist['累计治愈']/chinadaylist['累计确诊']*100).round(decimals=1).tolist(),  # percent, rounded after scaling, as a plain list
               yaxis_index=1,
               symbol_size=3,
               is_smooth=True,
               label_opts=opts.LabelOpts(is_show=False),
               tooltip_opts=opts.TooltipOpts(formatter=JsCode("function (params) {return params.value+ '%'}"),
            is_show_content = True)
              )

bar.overlap(line)  # overlay the line chart on the bar chart
bar.render_notebook()
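
The recovery-rate series on the secondary axis is cumulative recovered divided by cumulative confirmed, expressed as a percentage. The following is a minimal standalone sketch of that calculation. It scales to percent before rounding, so the rounding applies to the value that is displayed, and it guards against a zero denominator. The helper name `recovery_rate_percent` is hypothetical.

```python
def recovery_rate_percent(healed, confirmed, digits=1):
    """Healed / confirmed as a percentage, rounded after scaling; None if undefined."""
    if not confirmed:
        return None
    return round(healed / confirmed * 100, digits)


# Cumulative recovered and confirmed for 2022-11-28, taken from the table in section 2.2.
print(recovery_rate_percent(389050, 9051741))
print(recovery_rate_percent(0, 0))
```

pyecharts' documentation asks for native Python types, so when the values come from a pandas Series, converting them with `.tolist()` before passing them to `add_yaxis` is the safer choice.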



3.4 Line Chart - Beautifier

In [27]:
# Styled line chart for daily new cases
background_color_js = ("new echarts.graphic.LinearGradient(0, 0, 0,1, "
                       "[{offset: 0, color: '#99cccc'}, {offset: 1, color: '#00bfff'}], false)")

line1 = Line(init_opts=opts.InitOpts(theme=ThemeType.ROMA,bg_color=JsCode(background_color_js))) # set theme and background color

line1.add_xaxis(list(chinanewadd["日期"].astype('str')))  # x-axis data

line1.add_yaxis(series_name = "新增确诊",
                y_axis = list(chinanewadd["新增确诊"]), # y-axis data
                is_smooth=True, # smooth the curve
                areastyle_opts=opts.AreaStyleOpts(opacity=0.3), # opacity of the shaded area
                is_symbol_show = True,
                label_opts=opts.LabelOpts(is_show=False), # hide data labels
                yaxis_index = 0 # index of the y-axis this series uses
               )

line1.add_yaxis(series_name = "新增本土",
                y_axis = list(chinanewadd["本土新增确诊"]),
                is_smooth=True,
                areastyle_opts=opts.AreaStyleOpts(opacity=0.3),
                is_symbol_show = True, # whether to show point markers
#                 symbol = 'circle' # marker type: 'circle', 'rect', 'roundRect', 'triangle', 'diamond', 'pin', 'arrow', 'none'
                label_opts=opts.LabelOpts(is_show=False),
                yaxis_index = 1
               )
# Add a secondary y-axis
line1.extend_axis(yaxis=opts.AxisOpts(
                        name="新增本土(人)",
                        name_location="end", # position of the axis title
                        type_="value", # axis type
                        is_inverse=False, # whether to reverse the scale
                        axistick_opts=opts.AxisTickOpts(is_show=True),
                        splitline_opts=opts.SplitLineOpts(is_show=True)
                        )
                )
# Configure the chart layout
line1.set_global_opts(title_opts=opts.TitleOpts(title="国内每日新增趋势", # main title
                                                subtitle="数据来自腾讯疫情数据（含港澳台）", # subtitle
                                                subtitle_textstyle_opts = opts.TextStyleOpts(color='#000000'),
                                                pos_left="center", # title position
                                                pos_top="top"),
                    legend_opts=opts.LegendOpts(pos_left="40%",
                                                pos_top='10%'), # legend position
                    xaxis_opts=opts.AxisOpts(type_="category",
                                             axislabel_opts=opts.AxisTickOpts()),
                    yaxis_opts=opts.AxisOpts(name="新增确诊（人）", 
                                             type_="value", 
#                                              max_=100000
                                            ),
                    datazoom_opts=opts.DataZoomOpts(type_= 'slider',
                                                    range_start=80 , # start of the visible x-axis range, in percent
                                                    range_end=100) , # end of the visible x-axis range, in percent
                    toolbox_opts=opts.ToolboxOpts(is_show=True,  # show the toolbox
                                                  orient='vertical', # stack the toolbox icons vertically
                                                  pos_left='95%',
                                                  pos_top='middle')
                     )

line1.render_notebook()



3.5 China Heat Map - Multi-tab Carousel

In [28]:
map1= Map(init_opts=opts.InitOpts(width="900px",height="500px",bg_color=None))
map1.add(series_name = "累计确诊",
         data_pair = [list(z) for z in zip(province_data['省份'],province_data['累计确诊'])],
         maptype = "china",
         is_map_symbol_show=False)

map1.set_global_opts(title_opts=opts.TitleOpts(title="全国疫情地图-累计确诊",
                                               subtitle="更新日期：" + province_data['日期'].astype('str').max(),
                                               subtitle_textstyle_opts = opts.TextStyleOpts(color='#ffffff'),
                                               pos_left="center"),
                legend_opts=opts.LegendOpts(is_show=True, pos_top="40px", pos_left="30px"),
                visualmap_opts=opts.VisualMapOpts(is_piecewise=True,
                                                  range_text=['高', '低'],
                                                  pieces=[
                                                    {"min": 50000, "color": "#751d0d"},
                                                    {"min": 10000, "max": 49999, "color": "#ae2a23"},
                                                    {"min": 5000, "max": 9999, "color": "#d6564c"},
                                                    {"min": 1000, "max": 4999, "color": "#f19178"},
                                                    {"min": 500, "max": 999, "color": "#f7d3a6"},
                                                    {"min": 100, "max": 499, "color": "#fdf2d3"},
                                                    {"min": 0, "max": 99, "color": "#FFFFFF"}]),
                     toolbox_opts=opts.ToolboxOpts(is_show=True,  # show the toolbox
                                                  orient='vertical', # stack the toolbox icons vertically
                                                  pos_left='95%',
                                                  pos_top='middle'),
            )


map2= Map(init_opts=opts.InitOpts(width="900px",height="500px",bg_color=None))
map2.add(series_name = "现有确诊",
         data_pair = [list(z) for z in zip(province_data['省份'],province_data['现有确诊'])],
         maptype = "china",
         is_map_symbol_show=False)

map2.set_global_opts(title_opts=opts.TitleOpts(title="全国疫情地图-现有确诊",
                                               subtitle="更新日期：" + province_data['日期'].astype('str').max(),
                                               subtitle_textstyle_opts = opts.TextStyleOpts(color='#ffffff'),
                                               pos_left="center"),
                legend_opts=opts.LegendOpts(is_show=True, pos_top="40px", pos_left="30px"),
                visualmap_opts=opts.VisualMapOpts(is_piecewise=True,
                                                  range_text=['高', '低'],
                                                  pieces=[
                                                    {"min": 10000, "color": "#751d0d"},
                                                    {"min": 1000, "max": 9999, "color": "#ae2a23"},
                                                    {"min": 500, "max": 999, "color": "#d6564c"},
                                                    {"min": 100, "max": 499, "color": "#f19178"},
                                                    {"min": 10, "max": 99, "color": "#f7d3a6"},
                                                    {"min": 1, "max": 9, "color": "#fdf2d3"},
                                                    {"min": 0, "max": 0, "color": "#FFFFFF"}]),
                     toolbox_opts=opts.ToolboxOpts(is_show=True,  # show the toolbox
                                                  orient='vertical', # stack the toolbox icons vertically
                                                  pos_left='95%',
                                                  pos_top='middle'),
            )
# Add one tab per map
tab = Tab()
tab.add(map1, "累计确诊地图")
tab.add(map2, "现有确诊地图")

        
tab.render_notebook()
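
Both maps spell out their `pieces` buckets by hand, and each bucket's `max` is simply the next higher bucket's `min` minus one. The sketch below derives the same lists from descending lower bounds and a color per bucket. The helper name `make_pieces` is hypothetical, and it assumes integer counts, as in the data above.

```python
def make_pieces(lower_bounds, colors):
    """Build a piecewise visual-map list from descending lower bounds and colors."""
    if len(lower_bounds) != len(colors):
        raise ValueError("need exactly one color per bucket")
    pieces = []
    upper = None  # the top bucket is open-ended
    for low, color in zip(lower_bounds, colors):
        piece = {"min": low}
        if upper is not None:
            piece["max"] = upper
        piece["color"] = color
        pieces.append(piece)
        upper = low - 1  # integer counts: the next bucket ends just below this one
    return pieces


palette = ["#751d0d", "#ae2a23", "#d6564c", "#f19178", "#f7d3a6", "#fdf2d3", "#FFFFFF"]
cumulative_pieces = make_pieces([50000, 10000, 5000, 1000, 500, 100, 0], palette)
active_pieces = make_pieces([10000, 1000, 500, 100, 10, 1, 0], palette)
for piece in cumulative_pieces:
    print(piece)
```

The resulting lists can be passed as `pieces=` to `opts.VisualMapOpts`, and changing a threshold then requires editing one number instead of two.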


3.6 China Heat Map - Timeline Carousel

In [30]:

# Rank the dates in ascending order
province_history_data['date_rank'] = province_history_data['日期'].rank(method='dense',ascending=True)
df_list = []
# Take the first 15 days of data; change the range as needed
for i in range(1,16):
    df_list.append(province_history_data.loc[province_history_data['date_rank']==i])

tl = Timeline(init_opts=opts.InitOpts(theme=ThemeType.CHALK,width="900px", height="600px")) # timeline container

for idx in range(len(df_list)): # add one map per date to the timeline
    provinces = []
    confirm_value = []
    date = df_list[idx]['日期'].astype('str').unique()[0]
    for item_pv in df_list[idx]['省份']:
        provinces.append(item_pv)
    for item_pc in df_list[idx]['累计确诊']:
        confirm_value.append(item_pc)
    zipped = zip(provinces, confirm_value)
    f_map = Map(init_opts=opts.InitOpts(width="800px",height="500px"))
    f_map.add(series_name="确诊数量",
                  data_pair=[list(z) for z in zipped],
                  maptype="china",
                  is_map_symbol_show=False)
    f_map.set_global_opts(title_opts=opts.TitleOpts(title="全国疫情地图-累计确诊",
                                               subtitle="更新日期：" + date,
                                               subtitle_textstyle_opts = opts.TextStyleOpts(color='#ffffff'),
                                               pos_left="center"),
                legend_opts=opts.LegendOpts(is_show=False, pos_top="40px", pos_left="30px"),
                visualmap_opts=opts.VisualMapOpts(is_piecewise=True,
                                                  range_text=['高', '低'],
                                                  pieces=[
                                                    {"min": 1000, "color": "#CC0033"},
                                                    {"min": 200, "max": 999, "color": "#FF4500"},
                                                    {"min": 50, "max": 199, "color": "#FF8C00"},
                                                    {"min": 1, "max": 49, "color": "#FFDAB9"},
                                                    {"min": 0, "max": 0, "color": "#F5F5F5"}],
                                                  textstyle_opts = opts.TextStyleOpts(color='#ffffff'),
                                                  pos_bottom='15%',
                                                  pos_left='5%'
                                                 )
            )
        
    tl.add(f_map, "{}".format(date)) # add the map to the timeline
tl.add_schema(is_timeline_show=True,  # show the timeline control
              play_interval=1200,  # playback interval in milliseconds
              symbol=None,  # marker symbol
              is_loop_play=True , # loop playback
              is_auto_play = True
             )
tl.render_notebook()
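
Ranking the dates and then filtering by rank amounts to grouping the records by date and keeping the earliest few dates, with one map frame per date. Here is a minimal standard-library sketch of that grouping. The function name `frames_by_date` is hypothetical and the records are made-up values used only to show the shape.

```python
from collections import defaultdict


def frames_by_date(records, n_days):
    """Group (date, province, value) records by date and keep the earliest n_days."""
    grouped = defaultdict(list)
    for date, province, value in records:
        grouped[date].append([province, value])
    first_dates = sorted(grouped)[:n_days]  # ISO date strings sort chronologically
    return [(date, grouped[date]) for date in first_dates]


# Made-up records for illustration; they are deliberately out of order.
records = [
    ("2020-01-21", "湖北", 30), ("2020-01-20", "湖北", 20),
    ("2020-01-20", "广东", 2), ("2020-01-21", "广东", 3),
    ("2020-01-22", "湖北", 40),
]
frames = frames_by_date(records, n_days=2)
for date, data_pair in frames:
    print(date, data_pair)
```

Each `data_pair` already has the `[name, value]` layout that `Map.add` expects, so one frame maps directly onto one `tl.add(...)` call.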



----
## Courtesy of Hakuna_Matata_001


* Copyright reserved by the CSDN blogger Hakuna_Matata_001.
* Licensed under the CC 4.0 BY-SA copyright agreement.
* Please cite the original article when quoting.

  Code of Origin: https://blog.csdn.net/weixin_43130164/article/details/104113559