# Analysis of COVID-19 in India using Real Time Data

### Data Scraping - API, Requests, JSON
### Data Manipulation - Pandas
### Data visualization - Plotly

#### Author: Jyothish Kumar C G

This project analyses COVID-19 in India using real-time data. The data is fetched from the API published by https://www.covid19india.org/. This website is not owned or published by the Government of India, but it updates its statistics based on state press bulletins, official (CM, Health M) handles, PBI, Press Trust of India, and ANI reports.

In [1]:
import requests
import json

**API Details are published under https://api.covid19india.org/**

**Patient Level: Raw Data Partition 6 (from Jun 5th onwards): https://api.covid19india.org/raw_data6.json**

Using Requests to fetch the data

In [2]:
response_patient_level = requests.get("https://api.covid19india.org/raw_data6.json")

status_code = response_patient_level.status_code
print("\nStatus Code: ", status_code)

response_type = type(response_patient_level)
print("\nResponse Type : ",response_type)

headers = response_patient_level.headers
print("\nResponse Headers: ",headers)

content_type = headers["Content-Type"]
print("\nContent Type: ",content_type)


Status Code:  200

Response Type :  <class 'requests.models.Response'>

Response Headers:  {'Connection': 'keep-alive', 'Content-Length': '475101', 'Server': 'GitHub.com', 'Content-Type': 'application/json; charset=utf-8', 'Strict-Transport-Security': 'max-age=31556952', 'Last-Modified': 'Sat, 31 Oct 2020 07:58:20 GMT', 'ETag': 'W/"5f9d191c-d406d0"', 'Access-Control-Allow-Origin': '*', 'Expires': 'Sat, 31 Oct 2020 09:14:59 GMT', 'Cache-Control': 'max-age=600', 'Content-Encoding': 'gzip', 'X-Proxy-Cache': 'MISS', 'X-GitHub-Request-Id': '1D86:631C:6228:7862:5F9D28B9', 'Accept-Ranges': 'bytes', 'Date': 'Sat, 31 Oct 2020 09:04:59 GMT', 'Via': '1.1 varnish', 'Age': '0', 'X-Served-By': 'cache-wdc5576-WDC', 'X-Cache': 'MISS', 'X-Cache-Hits': '0', 'X-Timer': 'S1604135100.511164,VS0,VE97', 'Vary': 'Accept-Encoding', 'X-Fastly-Request-ID': 'b4a05b600832fcb8e510cb01a7590eeecfa18e52'}

Content Type:  application/json; charset=utf-8
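
The `Content-Type` header printed above can be checked before the body is decoded. The snippet below is a minimal sketch of such a check and is not part of the original notebook; the helper name `is_json_content_type` is hypothetical.

```python
def is_json_content_type(content_type: str) -> bool:
    """Return True when a Content-Type header value describes a JSON body."""
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json" or media_type.endswith("+json")


print(is_json_content_type("application/json; charset=utf-8"))
print(is_json_content_type("text/html; charset=utf-8"))
```

With the header value shown above, the first call reports a JSON body, so calling `.json()` on the response is reasonable.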


**Converting the Data to dictionary type using JSON**

In [3]:
json_data = response_patient_level.json()

print("\nType of Data Retrieved ", type(json_data))

print("\nData Retrieved: ",json_data['raw_data'][0:5])


Type of Data Retrieved  <class 'dict'>

Data Retrieved:  [{'agebracket': '', 'contractedfromwhichpatientsuspected': '', 'currentstatus': 'Hospitalized', 'dateannounced': '05/06/2020', 'detectedcity': '', 'detecteddistrict': 'Lunglei', 'detectedstate': 'Mizoram', 'entryid': '48613', 'gender': '', 'nationality': '', 'notes': 'Institutional quarantine at Lunglei', 'numcases': '1', 'patientnumber': '78000', 'source1': 'https://twitter.com/dipr_mizoram/status/1268736125104910336', 'source2': '', 'source3': '', 'statecode': 'MZ', 'statepatientnumber': '', 'statuschangedate': '', 'typeoftransmission': ''}, {'agebracket': '', 'contractedfromwhichpatientsuspected': '', 'currentstatus': 'Hospitalized', 'dateannounced': '05/06/2020', 'detectedcity': 'Falkawn', 'detecteddistrict': 'Aizawl', 'detectedstate': 'Mizoram', 'entryid': '48614', 'gender': '', 'nationality': '', 'notes': 'ZMC', 'numcases': '4', 'patientnumber': '78001', 'source1': 'https://twitter.com/dipr_mizoram/status/12687361251049103
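
`response.json()` runs the response body through a JSON parser, which is why the result is a `dict`. The sketch below shows the same conversion offline with the standard `json` module. The `body` string is a shortened, hand-written stand-in for the real response text, using values from the output above.

```python
import json

# Shortened stand-in for the text returned by the API (assumption: same shape).
body = '{"raw_data": [{"detectedstate": "Mizoram", "numcases": "1", "statecode": "MZ"}]}'

parsed = json.loads(body)
print(type(parsed))
print(parsed["raw_data"][0]["detectedstate"])

# Every value arrives as a string, so numeric fields need an explicit conversion.
num_cases = int(parsed["raw_data"][0]["numcases"])
print(num_cases + 1)
```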

**The data above is granular, patient-level detail, so I skipped it and fetched the nationwide count statistics instead.**

**National Level: Time series, state-wise stats and test counts: https://api.covid19india.org/data.json**

In [4]:
response_count_level = requests.get("https://api.covid19india.org/data.json")

status_code = response_count_level.status_code
print("\nStatus Code: ", status_code)

json_data_count = response_count_level.json()

print("\nType of Data Retrieved ", type(json_data_count))

print("\nData Retrieved: ",json_data_count['cases_time_series'][0:5])

print("\nKeys of the JSON : ",json_data_count.keys())



Status Code:  200

Type of Data Retrieved  <class 'dict'>

Data Retrieved:  [{'dailyconfirmed': '1', 'dailydeceased': '0', 'dailyrecovered': '0', 'date': '30 January ', 'dateymd': '2020-01-30', 'totalconfirmed': '1', 'totaldeceased': '0', 'totalrecovered': '0'}, {'dailyconfirmed': '0', 'dailydeceased': '0', 'dailyrecovered': '0', 'date': '31 January ', 'dateymd': '2020-01-31', 'totalconfirmed': '1', 'totaldeceased': '0', 'totalrecovered': '0'}, {'dailyconfirmed': '0', 'dailydeceased': '0', 'dailyrecovered': '0', 'date': '01 February ', 'dateymd': '2020-02-01', 'totalconfirmed': '1', 'totaldeceased': '0', 'totalrecovered': '0'}, {'dailyconfirmed': '1', 'dailydeceased': '0', 'dailyrecovered': '0', 'date': '02 February ', 'dateymd': '2020-02-02', 'totalconfirmed': '2', 'totaldeceased': '0', 'totalrecovered': '0'}, {'dailyconfirmed': '1', 'dailydeceased': '0', 'dailyrecovered': '0', 'date': '03 February ', 'dateymd': '2020-02-03', 'totalconfirmed': '3', 'totaldeceased': '0', 
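
Both requests so far repeat the same steps: fetch, check the status, decode the JSON. A possible way to wrap them is sketched below. The helper name `fetch_json` and its `get` parameter are hypothetical; `requests.get(..., timeout=...)`, `raise_for_status()` and `json()` are the only library calls used.

```python
def fetch_json(url, get=None, timeout=10):
    """Fetch a URL and return its decoded JSON body, raising on HTTP errors."""
    if get is None:
        # Imported lazily so the helper can be exercised without network access.
        import requests
        get = requests.get
    response = get(url, timeout=timeout)
    response.raise_for_status()  # raises for 4xx/5xx instead of failing later
    return response.json()


class FakeResponse:
    """Offline stand-in for a successful response, used only for illustration."""

    def raise_for_status(self):
        return None

    def json(self):
        return {"cases_time_series": [], "statewise": [], "tested": []}


payload = fetch_json("https://example.invalid/data.json",
                     get=lambda url, timeout: FakeResponse())
print(sorted(payload))
```

In the notebook, `fetch_json("https://api.covid19india.org/data.json")` would replace the separate `requests.get` and `.json()` calls.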

**Extracting the different sets of data using their keys**

*Time Series - Nation wide count on a daily basis*

*State Wise - Latest cumulative count for each state*

*Tested - Samples tested on a daily basis*
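
The three data sets can also be pulled out in one step, with a check that the keys are present. This is a small sketch, assuming the payload has the shape shown above; `split_payload` and `EXPECTED_KEYS` are hypothetical names, and the sample payload is trimmed to a few fields from the real output.

```python
EXPECTED_KEYS = ("cases_time_series", "statewise", "tested")


def split_payload(payload):
    """Return (time_series, state_wise, tested), failing loudly on a missing key."""
    missing = [key for key in EXPECTED_KEYS if key not in payload]
    if missing:
        raise KeyError(f"payload is missing keys: {missing}")
    return tuple(payload[key] for key in EXPECTED_KEYS)


# Trimmed sample with the same top-level keys as the API response.
sample_payload = {
    "cases_time_series": [{"dateymd": "2020-01-30", "dailyconfirmed": "1"}],
    "statewise": [{"state": "Total", "statecode": "TT"}],
    "tested": [{"testedasof": "13/03/2020", "totalsamplestested": "6500"}],
}

time_series, state_wise, tested = split_payload(sample_payload)
print(len(time_series), len(state_wise), len(tested))
```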

In [5]:
time_series = json_data_count["cases_time_series"]
time_series[0:2]

[{'dailyconfirmed': '1',
  'dailydeceased': '0',
  'dailyrecovered': '0',
  'date': '30 January ',
  'dateymd': '2020-01-30',
  'totalconfirmed': '1',
  'totaldeceased': '0',
  'totalrecovered': '0'},
 {'dailyconfirmed': '0',
  'dailydeceased': '0',
  'dailyrecovered': '0',
  'date': '31 January ',
  'dateymd': '2020-01-31',
  'totalconfirmed': '1',
  'totaldeceased': '0',
  'totalrecovered': '0'}]

In [6]:
state_wise = json_data_count["statewise"]
state_wise[0:2]

[{'active': '583571',
  'confirmed': '8139081',
  'deaths': '121699',
  'deltaconfirmed': '2915',
  'deltadeaths': '18',
  'deltarecovered': '1486',
  'lastupdatedtime': '31/10/2020 13:18:08',
  'migratedother': '1414',
  'recovered': '7432397',
  'state': 'Total',
  'statecode': 'TT',
  'statenotes': ''},
 {'active': '125418',
  'confirmed': '1672858',
  'deaths': '43837',
  'deltaconfirmed': '0',
  'deltadeaths': '0',
  'deltarecovered': '0',
  'lastupdatedtime': '31/10/2020 00:06:10',
  'migratedother': '553',
  'recovered': '1503050',
  'state': 'Maharashtra',
  'statecode': 'MH',
  'statenotes': "[Sep 9] :239 cases have been removed from the hospitalized figures owing to the removal of duplicates and change of addresses as per the original residence,\n[Aug 15] : MH bulletin has reduced 819 confirmed cases in Mumbai and 72 confirmed cases from 'Other States' from the tally\n[Jun 16] : 1328 deceased cases have been retroactively added to MH bulletin.\n[Jun 20] : 69 deceased cases ha

In [7]:
tested = json_data_count["tested"]
tested[0:2]

[{'dailyrtpcrsamplescollectedicmrapplication': '',
  'individualstestedperconfirmedcase': '75.64',
  'positivecasesfromsamplesreported': '',
  'samplereportedtoday': '',
  'source': 'Press_Release_ICMR_13March2020.pdf',
  'source1': '',
  'source3': '',
  'testedasof': '13/03/2020',
  'testpositivitylast7days': '',
  'testpositivityrate': '1.20%',
  'testsconductedbyprivatelabs': '',
  'testsperconfirmedcase': '83.33',
  'testspermillion': '5',
  'totalindividualstested': '5900',
  'totalpositivecases': '78',
  'totalrtpcrsamplescollectedicmrapplication': '',
  'totalsamplestested': '6500',
  'updatetimestamp': '13/03/2020 00:00:00'},
 {'dailyrtpcrsamplescollectedicmrapplication': '',
  'individualstestedperconfirmedcase': '81.57',
  'positivecasesfromsamplesreported': '',
  'samplereportedtoday': '',
  'source': 'ICMR_website_update_18March_6PM_IST.pdf',
  'source1': '',
  'source3': '',
  'testedasof': '18/03/2020',
  'testpositivitylast7days': '',
  'testpositivityrate': '1.14%',
  

**The states listed in state_wise. 'Total' refers to the nationwide figures.**

In [8]:
# Collect the name of every entry in state_wise ('Total' is the national row)
states = [record['state'] for record in state_wise]

states

['Total',
 'Maharashtra',
 'Andhra Pradesh',
 'Karnataka',
 'Tamil Nadu',
 'Uttar Pradesh',
 'Delhi',
 'Kerala',
 'West Bengal',
 'Odisha',
 'Telangana',
 'Bihar',
 'Assam',
 'Rajasthan',
 'Gujarat',
 'Madhya Pradesh',
 'Chhattisgarh',
 'Haryana',
 'Punjab',
 'Jharkhand',
 'Jammu and Kashmir',
 'Uttarakhand',
 'Goa',
 'Puducherry',
 'Tripura',
 'Himachal Pradesh',
 'Manipur',
 'Chandigarh',
 'Arunachal Pradesh',
 'Meghalaya',
 'Nagaland',
 'Ladakh',
 'Andaman and Nicobar Islands',
 'Sikkim',
 'Dadra and Nagar Haveli and Daman and Diu',
 'Mizoram',
 'State Unassigned',
 'Lakshadweep']
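
Because `state_wise` mixes the national row and a 'State Unassigned' bucket with the real states, it can help to separate them before any per-state comparison. A minimal sketch, using a trimmed sample with names from the list above; `NON_STATE_ROWS` is a hypothetical name.

```python
# Rows in state_wise that are not individual states or union territories.
NON_STATE_ROWS = {"Total", "State Unassigned"}

# Trimmed sample: only the 'state' field of a few real entries.
state_wise = [
    {"state": "Total"},
    {"state": "Maharashtra"},
    {"state": "State Unassigned"},
    {"state": "Lakshadweep"},
]

national = next(record for record in state_wise if record["state"] == "Total")
states_only = [record["state"] for record in state_wise
               if record["state"] not in NON_STATE_ROWS]

print(national["state"])
print(states_only)
```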

**Latest Status of the Country**

In [9]:

total_cases = {}
for records in state_wise:
    if records['state'] == 'Total':
        total_cases['Country'] = "India"
        total_cases['Total Confirmed Cases'] = int(records['confirmed'])
        total_cases['Total Active Cases'] = int(records['active']) 
        # Use the reported figure: confirmed minus active would also count deaths and migrated cases
        total_cases['Total Recovered Cases'] = int(records['recovered'])
        total_cases['Total Deaths Reported'] = int(records['deaths'])
        total_cases['Date'] = records['lastupdatedtime']
        
total_cases    

{'Country': 'India',
 'Date': '31/10/2020 13:18:08',
 'Total Active Cases': 583571,
 'Total Confirmed Cases': 8139081,
 'Total Deaths Reported': 121699,
 'Total Recovered Cases': 7432397}
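
For the 'Total' record shown earlier, the reported fields add up as confirmed = active + recovered + deaths + migratedother. The sketch below checks that arithmetic on the snapshot values and derives two rates from it. It is a worked example for this snapshot only and assumes nothing about other dates.

```python
# Field values copied from the 'Total' record in state_wise.
total = {
    "active": "583571",
    "confirmed": "8139081",
    "deaths": "121699",
    "migratedother": "1414",
    "recovered": "7432397",
}
counts = {key: int(value) for key, value in total.items()}

accounted_for = (counts["active"] + counts["recovered"]
                 + counts["deaths"] + counts["migratedother"])
print("Fields add up to confirmed:", accounted_for == counts["confirmed"])

# confirmed - active overstates recoveries by deaths + migratedother.
overstatement = (counts["confirmed"] - counts["active"]) - counts["recovered"]
print("Overstatement:", overstatement)

recovery_rate = 100 * counts["recovered"] / counts["confirmed"]
fatality_rate = 100 * counts["deaths"] / counts["confirmed"]
print(f"Recovery rate: {recovery_rate:.2f}%")
print(f"Fatality rate: {fatality_rate:.2f}%")
```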

**Today's Status (based on the data published in the API)**

In [10]:
today_status = {}

today_status["Date"] = time_series[-1]['date']   
today_status["Confirmed Cases"] =  time_series[-1]['dailyconfirmed']
today_status["Recovered Cases"] =  time_series[-1]['dailyrecovered']
today_status["Deceased Cases"] =  time_series[-1]['dailydeceased']

today_status

{'Confirmed Cases': '48117',
 'Date': '30 October ',
 'Deceased Cases': '550',
 'Recovered Cases': '59005'}
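
The values above are still strings, and the `date` field has a trailing space and no year. A possible clean-up is sketched below, assuming each record also carries the `dateymd` field seen in the earlier output; `parse_daily_record` is a hypothetical helper, and the sample is the first record of the time series.

```python
from datetime import datetime


def parse_daily_record(record):
    """Convert one cases_time_series record into typed values."""
    return {
        "Date": datetime.strptime(record["dateymd"], "%Y-%m-%d").date(),
        "Confirmed Cases": int(record["dailyconfirmed"]),
        "Recovered Cases": int(record["dailyrecovered"]),
        "Deceased Cases": int(record["dailydeceased"]),
    }


# First record of the time series, as printed earlier.
first_record = {
    "dailyconfirmed": "1", "dailydeceased": "0", "dailyrecovered": "0",
    "date": "30 January ", "dateymd": "2020-01-30",
    "totalconfirmed": "1", "totaldeceased": "0", "totalrecovered": "0",
}

parsed_status = parse_daily_record(first_record)
print(parsed_status)
```

In the notebook, `parse_daily_record(time_series[-1])` would give the latest day with numeric counts.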

**Number of Samples Tested to Date**

In [11]:
total_samples_tested = tested[-1]['totalsamplestested']
total_samples_tested

'108796064'
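
Many fields in the `tested` records are empty strings, and rates carry a `%` suffix, so a plain `int()` or `float()` call would fail on some of them. A minimal sketch of tolerant converters follows; `to_int` and `parse_percent` are hypothetical helper names.

```python
def to_int(value, default=None):
    """Convert a numeric string to int, returning default for empty strings."""
    value = value.strip()
    return int(value) if value else default


def parse_percent(value, default=None):
    """Convert a string such as '1.20%' to a float, returning default if empty."""
    value = value.strip().rstrip("%")
    return float(value) if value else default


total_samples = to_int("108796064")
print(f"{total_samples:,}")           # thousands separators for readability
print(to_int(""))                     # empty field, e.g. 'samplereportedtoday'
print(parse_percent("1.20%"))
```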

**Moving the day-wise counts to a separate dictionary**

In [12]:
timeseries_plot_data = {}

for records in time_series:
    date = records['date']
    dailyconfirmed = records['dailyconfirmed']
    dailydeceased = records['dailydeceased']
    dailyrecovered = records['dailyrecovered']

    daily_status = {'Confirmed': dailyconfirmed, 'Deceased': dailydeceased, 'Recovered': dailyrecovered }
    timeseries_plot_data[date] = daily_status 
   
timeseries_plot_data

{'01 April ': {'Confirmed': '424', 'Deceased': '6', 'Recovered': '19'},
 '01 August ': {'Confirmed': '55117', 'Deceased': '854', 'Recovered': '51368'},
 '01 February ': {'Confirmed': '0', 'Deceased': '0', 'Recovered': '0'},
 '01 July ': {'Confirmed': '19429', 'Deceased': '438', 'Recovered': '12064'},
 '01 June ': {'Confirmed': '7723', 'Deceased': '201', 'Recovered': '3882'},
 '01 March ': {'Confirmed': '0', 'Deceased': '0', 'Recovered': '0'},
 '01 May ': {'Confirmed': '2396', 'Deceased': '77', 'Recovered': '962'},
 '01 October ': {'Confirmed': '81784',
  'Deceased': '1099',
  'Recovered': '78731'},
 '01 September ': {'Confirmed': '78168',
  'Deceased': '892',
  'Recovered': '62145'},
 '02 April ': {'Confirmed': '486', 'Deceased': '16', 'Recovered': '22'},
 '02 August ': {'Confirmed': '52672', 'Deceased': '760', 'Recovered': '40355'},
 '02 February ': {'Confirmed': '1', 'Deceased': '0', 'Recovered': '0'},
 '02 July ': {'Confirmed': '21947', 'Deceased': '378', 'Recovered': '19999'},
 '02
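
The dictionary above is keyed by the `date` field, which has no year and a trailing space. If the series ever spans more than twelve months, later entries would overwrite earlier ones under the same key. One hedged alternative is to key by `dateymd` and store integers, as sketched here with the first two records of the series.

```python
# First two records of cases_time_series, trimmed to the fields used here.
time_series = [
    {"date": "30 January ", "dateymd": "2020-01-30",
     "dailyconfirmed": "1", "dailydeceased": "0", "dailyrecovered": "0"},
    {"date": "31 January ", "dateymd": "2020-01-31",
     "dailyconfirmed": "0", "dailydeceased": "0", "dailyrecovered": "0"},
]

# ISO dates are unique per day and sort chronologically as plain strings.
timeseries_by_day = {
    record["dateymd"]: {
        "Confirmed": int(record["dailyconfirmed"]),
        "Deceased": int(record["dailydeceased"]),
        "Recovered": int(record["dailyrecovered"]),
    }
    for record in time_series
}

print(timeseries_by_day)
```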

**Using Pandas to create separate DataFrames for Confirmed, Deceased and Recovered cases on a daily basis**

In [13]:
import pandas as pd


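# Flatten {date: {case type: count}} into one row per (date, case type)
df = pd.concat({k: pd.Series(v) for k, v in timeseries_plot_data.items()}).reset_index()
df.columns = ['Date','Cases','Count']
# The API returns counts as strings; convert them so they are plotted as numbers
df['Count'] = df['Count'].astype(int)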

df.head(5)

         Date      Cases  Count
0  30 January  Confirmed      1
1  30 January   Deceased      0
2  30 January  Recovered      0
3  31 January  Confirmed      0
4  31 January   Deceased      0
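
The same long-format table can also be built straight from the list of records with `DataFrame.melt`, without the intermediate dictionary. This is an alternative sketch, not the approach used in this notebook; it assumes the `dateymd` field is present and uses the first two records as sample data.

```python
import pandas as pd

# First two records of cases_time_series, trimmed to the fields used here.
time_series = [
    {"dateymd": "2020-01-30", "dailyconfirmed": "1",
     "dailydeceased": "0", "dailyrecovered": "0"},
    {"dateymd": "2020-01-31", "dailyconfirmed": "0",
     "dailydeceased": "0", "dailyrecovered": "0"},
]

wide = pd.DataFrame(time_series).rename(columns={
    "dailyconfirmed": "Confirmed",
    "dailydeceased": "Deceased",
    "dailyrecovered": "Recovered",
})
wide["Date"] = pd.to_datetime(wide["dateymd"])  # real dates sort and plot correctly

# One row per (Date, Cases) pair, matching the Date/Cases/Count layout above.
long_df = wide.melt(id_vars="Date",
                    value_vars=["Confirmed", "Deceased", "Recovered"],
                    var_name="Cases", value_name="Count")
long_df["Count"] = long_df["Count"].astype(int)
long_df = long_df.sort_values(["Date", "Cases"]).reset_index(drop=True)

print(long_df)
```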


In [14]:
df_confirmed_cases = df[df['Cases'] == 'Confirmed']
df_confirmed_cases.head(5)


           Date      Cases  Count
0    30 January  Confirmed      1
3    31 January  Confirmed      0
6   01 February  Confirmed      0
9   02 February  Confirmed      1
12  03 February  Confirmed      1


In [15]:
df_deceased_cases = df[df['Cases'] == 'Deceased']
df_deceased_cases.head(5)


           Date     Cases  Count
1    30 January  Deceased      0
4    31 January  Deceased      0
7   01 February  Deceased      0
10  02 February  Deceased      0
13  03 February  Deceased      0


In [16]:
df_recovered_cases = df[df['Cases'] == 'Recovered']
df_recovered_cases.head(5)


           Date      Cases  Count
2    30 January  Recovered      0
5    31 January  Recovered      0
8   01 February  Recovered      0
11  02 February  Recovered      0
14  03 February  Recovered      0
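
The three filters above follow the same pattern, so they could also be written as a single `groupby` over the `Cases` column. A small sketch on the first six rows of `df`; the `frames` dictionary is a hypothetical name.

```python
import pandas as pd

# First six rows of df, as shown in the head() outputs above.
df = pd.DataFrame({
    "Date": ["30 January", "30 January", "30 January",
             "31 January", "31 January", "31 January"],
    "Cases": ["Confirmed", "Deceased", "Recovered"] * 2,
    "Count": [1, 0, 0, 0, 0, 0],
})

# One DataFrame per case type, keyed by the value in the Cases column.
frames = {case: group.reset_index(drop=True)
          for case, group in df.groupby("Cases")}

df_confirmed_cases = frames["Confirmed"]
print(df_confirmed_cases)
```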


**Using Plotly to plot the daily confirmed cases according to dates based on real time data**

In [17]:
import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import download_plotlyjs,init_notebook_mode,plot

fig = px.line(df_confirmed_cases, x='Date', y='Count')
fig.update_layout(title_text="Daily Confirmed Cases of COVID-19 in India")
fig.show()

**Using Plotly to plot the daily confirmed, deceased, recovered cases according to dates in a single graph based on real time data**

In [18]:
fig = go.Figure()

fig.add_trace( go.Scatter(x = df_confirmed_cases.Date, y = df_confirmed_cases.Count, mode='lines',
                    name='Confirmed Cases', line=dict(color='royalblue', width=4) ) )

fig.add_trace( go.Scatter(x = df_deceased_cases.Date, y = df_deceased_cases.Count,mode='lines',
                    name='Deceased Cases',  line=dict(color='darkred', width=4) ) )

fig.add_trace( go.Scatter(x = df_recovered_cases.Date, y = df_recovered_cases.Count,mode='lines',
                    name='Recovered Cases',  line=dict(color='green', width=4) ) )

fig.update_layout(title_text="COVID-19 Daily Status India", xaxis_title='Date',
                   yaxis_title='Count')

fig.show()
plot(fig)

'temp-plot.html'