# Python for Data Science - Group Project

## By:
> * **S.Hariharan 18113069 CSE-4-A**
> * **Sai Pratyush 18113026 CSE-4-A**
> * **Srobonti Sarkar 18113009 CSE-4-A**
> * **Riya Satija 18113029 CSE-4-A**

# Datasets: **[`COVID-19 in India`](https://www.kaggle.com/sudalairajkumar/covid19-in-india)** and **[`COVID-19 in Italy`](https://www.kaggle.com/sudalairajkumar/covid19-in-italy)** from Kaggle

# Contents of this Notebook
1. [Formulate questions for Analysis](#1.-Formulate-questions-for-Analysis)
2. [Import Necessary Libraries](#2.-Import-Necessary-Libraries)
3. [Read Input and Explore/Analyse the Data](#3.-Read-Input-and-Explore/Analyse-the-Data)
4. [Descriptive Analytics and Visualization](#4.)
5. Final Summary and Results
6. Model, Predict and Solve (If possible)
7. Model Evaluation (If possible)



* **Details of all the files used are given in the [`File Details`](#File-Details) section.**

# 1. Formulate questions for Analysis

The main analysis focuses on answering the questions below. Note that the findings in this analysis are based on a sample of the data and are not definitive.

1. ***[No. of Cases for each State](#No.-of-Cases-for-each-State) - [Bar Chart (State vs. Count)](#State-vs.-Count)***
2. ***[State with the Most and Least No. of Cases](#State-with-the-Most-and-Least-no.of-Cases)***
3. ***[Count the No. of Patients for each unique current_status](#Count-the-No.of-Patients-for-unique-current_status)***
4. ***[Count the No. of Patients for each unique current_status Statewise](#Count-the-No.of-Patients-for-unique-current_status-Statewise) - [Bar Chart (Statewise Count vs. Status)](#Statewise-Count-vs.-Status)***
5. ***[Line Chart (Day vs. Count)](#Day-vs.-Count)***
6. ***[Monthwise Status Count (Bar Chart)](#Monthwise-Status-Count)***
7. ***[Statewise Multiple-Line Chart (Day vs. Count)](#Statewise-Multiple-Line-Chart-{Day-vs.-Count})***
8. ***[Statewise Status Count for Each Month](#Statewise-Status-Count-for-Each-Month)***

# 2. Import Necessary Libraries
First, we import the **`Python libraries`** we need: **`numpy`** and **`pandas`** for data analysis, and **`matplotlib`** and **`seaborn`** for visualising the data.

In [None]:
#data analysis libraries 
import numpy as np
import pandas as pd

#visualization libraries
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
#ignore warnings
import warnings
warnings.filterwarnings('ignore')

from IPython.display import Markdown as md

# 3. Read Input and Explore/Analyse the Data

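It's time to read in our data using [`pd.read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) and take a first look at its dimensions and contents. Note that **`shape`** is an attribute (not a function), while **`head()`** and **`info()`** are methods; below we display the individual-level DataFrame and then summarise it with **`info()`**.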

In [None]:
# Import and read files

# First import India dataset
agedetails = pd.read_csv("../input/covid19-in-india/AgeGroupDetails.csv")
individualdetails = pd.read_csv("../input/covid19-in-india/IndividualDetails.csv")
covid_india = pd.read_csv("../input/covid19-in-india/covid_19_india.csv")
population = pd.read_csv("../input/covid19-in-india/population_india_census2011.csv")

# Then import Italy dataset
province = pd.read_csv("../input/covid19-in-italy/covid19_italy_province.csv")
region = pd.read_csv("../input/covid19-in-italy/covid19_italy_region.csv")
individualdetails

In [None]:
# Get a brief summary of the individual-level data: columns, non-null counts and dtypes
individualdetails.info()
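To make the attribute/method distinction concrete, here is a minimal sketch on a small hypothetical DataFrame (the values are made up, not taken from `IndividualDetails.csv`) showing what `shape`, `head()` and a per-column missing-value count return:

```python
import pandas as pd

# Small hypothetical stand-in for individualdetails (made-up values)
sample = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "detected_state": ["Kerala", "Kerala", "Delhi", "Maharashtra"],
    "age": [20, None, None, 45],
})

rows, cols = sample.shape      # attribute, not a method: (n_rows, n_columns)
print(rows, cols)              # 4 3
print(sample.head(2))          # first 2 rows
print(sample.isna().sum())     # number of missing values in each column
```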

# 4.

* As we can see, the ***`government_id`, `age`, `gender`, `detected_city`, `nationality`*** columns have many NaN values, so these columns can be **dropped**.

In [None]:
individualdetails.drop(columns=['government_id','age','gender','detected_city','nationality'],inplace = True)
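The columns above were chosen by inspecting the `info()` output. As a sketch of a more systematic alternative, the share of missing values per column can be computed and compared with a cut-off. The helper name `columns_mostly_missing`, the 0.5 threshold and the sample data are all hypothetical choices for illustration, not part of the original analysis:

```python
import pandas as pd

def columns_mostly_missing(df: pd.DataFrame, threshold: float = 0.5) -> list:
    """Hypothetical helper: columns whose share of missing values exceeds `threshold`."""
    missing_share = df.isna().mean()          # fraction of NaN values per column
    return missing_share[missing_share > threshold].index.tolist()

# Hypothetical data: 'age' is 75% missing, 'gender' is 25% missing
sample = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "age": [None, None, None, 30],
    "gender": ["M", None, "F", "F"],
})
to_drop = columns_mostly_missing(sample, threshold=0.5)
print(to_drop)                                # ['age']
cleaned = sample.drop(columns=to_drop)
```

Dropping columns loses information, so the threshold is a judgment call that depends on which questions the analysis asks.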

**First, let's group the data by the state in which each patient was diagnosed and count the number of cases.**
### *No. of Cases for each State*

In [None]:
a = individualdetails.groupby('detected_state',as_index=False).id.count()
df = pd.DataFrame(a)
df
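The same per-state totals can also be obtained with `value_counts()`. A minimal sketch on hypothetical rows: `groupby(...).id.count()` counts non-null `id` values per state in alphabetical order, while `value_counts()` counts rows per state and sorts them by frequency:

```python
import pandas as pd

# Hypothetical rows, one per patient
sample = pd.DataFrame({
    "id": [0, 1, 2, 3, 4],
    "detected_state": ["Kerala", "Delhi", "Kerala", "Delhi", "Kerala"],
})

# groupby + count, as in the cell above (non-null 'id' values per state)
by_groupby = sample.groupby("detected_state", as_index=False).id.count()

# value_counts counts rows per state, sorted by frequency (descending)
by_value_counts = sample["detected_state"].value_counts()
print(by_groupby)
print(by_value_counts)
```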

### *State vs. Count*

In [None]:
plt.figure(figsize=(20,8))
chart = sns.barplot(x = 'detected_state',y = 'id' , data = a)
chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
for index, row in a.iterrows():
    chart.text(row.name,row.id, round(row.id,0), ha="center")
plt.xlabel('States')
plt.ylabel('Count')
plt.title('State vs. Count')
plt.show()
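The loop over `iterrows()` places each count label by hand. Matplotlib 3.4 and later provide `Axes.bar_label`, which labels every bar in a container at once. A minimal sketch with plain matplotlib and hypothetical counts (with a seaborn bar plot, the bars should likewise be reachable as `chart.containers[0]`, though that is untested here):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

counts = pd.Series({"Delhi": 2, "Kerala": 3, "Maharashtra": 5})  # hypothetical counts

fig, ax = plt.subplots(figsize=(6, 3))
bars = ax.bar(counts.index, counts.values)
labels = ax.bar_label(bars)   # one text label per bar, placed at the bar's top edge
ax.set_xlabel("States")
ax.set_ylabel("Count")
plt.close(fig)
```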

### *State with the Most and Least no.of Cases*

In [None]:
most = a.loc[a['id'].idxmax()]    # first state with the highest case count
least = a.loc[a['id'].idxmin()]   # first state with the lowest case count
md("***`%s`*** has the most cases with `%i` and ***`%s`*** has the fewest cases with `%i`" % (most['detected_state'], most['id'], least['detected_state'], least['id']))
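The cell above reports only the first state holding the maximum or minimum, so a tie is silently hidden. A minimal sketch on hypothetical counts shows the difference between `idxmin()` and a boolean mask that returns every tied state:

```python
import pandas as pd

# Hypothetical per-state counts with a tie for the minimum
a_sample = pd.DataFrame({
    "detected_state": ["Delhi", "Kerala", "Goa", "Sikkim"],
    "id": [5, 9, 1, 1],
})

# idxmax/idxmin return only the FIRST row holding the extreme value
most = a_sample.loc[a_sample["id"].idxmax(), "detected_state"]
least_first = a_sample.loc[a_sample["id"].idxmin(), "detected_state"]

# A boolean mask keeps every state that ties for the minimum
least_all = a_sample.loc[a_sample["id"] == a_sample["id"].min(), "detected_state"].tolist()
print(most, least_first, least_all)   # Kerala Goa ['Goa', 'Sikkim']
```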

### *Count the No.of Patients for unique current_status*

In [None]:
a = individualdetails.groupby('current_status',as_index=False).id.count()
df = pd.DataFrame(a)
df
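Raw counts can be easier to compare as proportions. A minimal sketch with hypothetical status values (the labels are assumptions for illustration) uses `value_counts(normalize=True)` to get the fraction of patients in each status:

```python
import pandas as pd

# Hypothetical status values, one per patient
status = pd.Series(["Hospitalized", "Hospitalized", "Recovered", "Deceased"])

shares = status.value_counts(normalize=True)   # fraction of patients per status
print(shares)
```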

### *Count the No.of Patients for unique current_status Statewise*

In [None]:
a = individualdetails.groupby(['detected_state','current_status']).id.count()
df = pd.DataFrame(a)
df.head(60)

In [None]:
df.reset_index(inplace=True)
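The grouped result above is in long format, and a state/status combination with no patients simply has no row. As a sketch of an alternative, `pd.crosstab` builds a wide table with one row per state and one column per status, filling absent combinations with 0 (the data below is hypothetical):

```python
import pandas as pd

# Hypothetical patient rows
sample = pd.DataFrame({
    "detected_state": ["Kerala", "Kerala", "Delhi", "Delhi", "Delhi"],
    "current_status": ["Recovered", "Hospitalized", "Hospitalized", "Hospitalized", "Deceased"],
})

# Rows = states, columns = statuses, values = patient counts (missing pairs -> 0)
table = pd.crosstab(sample["detected_state"], sample["current_status"])
print(table)
```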

### *Statewise Count vs. Status*

In [None]:
plt.figure(figsize=(15,25))
chart = sns.barplot(x = 'detected_state',y = 'id' , hue = 'current_status' , data = df)
chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
plt.xlabel('States')
plt.ylabel('Count')
plt.title('Statewise Count vs. Status')
plt.show()

### *Day vs. Count*

In [None]:
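# diagnosed_date looks like 'day/month/year'; split it once into separate columns.
# Vectorised assignment avoids the chained assignment (df['col'][i] = ...),
# which is unreliable and stops working under pandas copy-on-write.
date_parts = individualdetails['diagnosed_date'].str.split('/', expand=True)
individualdetails['Month'] = date_parts[1].astype(int)  # month number
individualdetails['Date'] = date_parts[0].astype(int)   # day of the month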
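Instead of splitting strings, the dates can be parsed into real datetimes, which also keeps the year and validates the values. A minimal sketch, assuming the day/month/year layout that the cell above relies on (the sample dates are hypothetical); passing `errors="coerce"` to `pd.to_datetime` would turn malformed entries into `NaT` instead of raising:

```python
import pandas as pd

# Hypothetical dates in day/month/year layout
dates = pd.Series(["30/01/2020", "02/02/2020", "15/03/2020"])
parsed = pd.to_datetime(dates, format="%d/%m/%Y")

day = parsed.dt.day
month = parsed.dt.month
print(day.tolist(), month.tolist())   # [30, 2, 15] [1, 2, 3]
```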

In [None]:
a = individualdetails.groupby(['Month','Date'],as_index=False).id.count()
# Label each day as 'day/month' (e.g. '30/1'); groupby already returns the rows sorted by month, then day
a['DDate'] = a['Date'].astype(str) + '/' + a['Month'].astype(str)
plt.figure(figsize=(30,15))
chart = sns.lineplot(x = 'DDate' ,y = 'id' ,sort = False ,data = a)
plt.xlabel('Date')
plt.ylabel('Count')
plt.title('Day vs. Count')
plt.show()
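With the 'day/month' string labels, the x-axis is categorical: days with no new diagnoses are skipped and neighbouring points are drawn equally spaced. A minimal sketch on hypothetical dates reindexes the daily counts onto a continuous calendar range (zero-case days included) and adds a cumulative total:

```python
import pandas as pd

# Hypothetical diagnosis dates, one per patient
diagnosed = pd.to_datetime(pd.Series(["30/01/2020", "30/01/2020", "02/02/2020"]),
                           format="%d/%m/%Y")

# Cases per calendar day, then fill in the days with no cases as 0
daily = diagnosed.value_counts().sort_index()
full_range = pd.date_range(daily.index.min(), daily.index.max(), freq="D")
daily = daily.reindex(full_range, fill_value=0)
cumulative = daily.cumsum()
print(daily.tolist(), cumulative.tolist())   # [2, 0, 0, 1] [2, 2, 2, 3]
```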

### *Monthwise Status Count*

In [None]:
a = individualdetails.groupby(['Month','current_status'],as_index=False).id.count()
plt.figure(figsize=(20,5))
sns.barplot(x='Month',y='id',hue='current_status',data=a)
plt.xlabel('Month')
plt.ylabel('Count')
plt.title('Monthwise Status vs. Count')
plt.show()
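Because the number of cases differs greatly between months, absolute bar heights are hard to compare. A minimal sketch on hypothetical rows uses `pd.crosstab(..., normalize="index")` to get the share of each status within each month, so every row sums to 1:

```python
import pandas as pd

# Hypothetical patient rows with an already-extracted month number
sample = pd.DataFrame({
    "Month": [1, 3, 3, 3, 3],
    "current_status": ["Recovered", "Hospitalized", "Hospitalized", "Recovered", "Deceased"],
})

# Share of each status within each month (rows sum to 1)
monthly_share = pd.crosstab(sample["Month"], sample["current_status"], normalize="index")
print(monthly_share)
```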

### *Statewise Multiple Line Chart {Day vs. Count}*

In [None]:
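a = individualdetails.groupby(['detected_state','Month','Date'],as_index=False).id.count()
# Label each row as 'day/month' (e.g. '30/1') without chained assignment
a['DDate'] = a['Date'].astype(str) + '/' + a['Month'].astype(str)
# Sort chronologically so the x-axis categories appear in date order
# (the original a.reset_index() discarded its result, so it had no effect)
a = a.sort_values(['Month','Date']).reset_index(drop=True)
plt.figure(figsize=(25,10))
chart = sns.lineplot(x = 'DDate' ,y = 'id' ,hue = 'detected_state' ,sort = False ,data = a)
plt.xlabel('Date')
plt.ylabel('Count')
plt.title('Statewise Day vs. Count')
plt.show()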
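In the chart above, a state only gets a point on days when it recorded a case, so its line jumps over the days in between. As a sketch of an alternative, the counts can be pivoted into a table with one row per day and one column per state, with 0 for days without cases, and then accumulated; `wide.plot()` or `cumulative.plot()` from pandas would then draw one line per state on a date axis. The sample rows and the day/month/year date layout are assumptions:

```python
import pandas as pd

# Hypothetical patient rows
sample = pd.DataFrame({
    "detected_state": ["Kerala", "Kerala", "Delhi", "Kerala"],
    "diagnosed_date": ["30/01/2020", "30/01/2020", "02/02/2020", "02/02/2020"],
})
sample["day"] = pd.to_datetime(sample["diagnosed_date"], format="%d/%m/%Y")

# Rows = days, columns = states, values = new cases (absent pairs -> 0)
wide = sample.groupby(["day", "detected_state"]).size().unstack(fill_value=0)
cumulative = wide.cumsum()   # running total of cases per state
print(wide)
print(cumulative)
```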

### *Statewise Status Count for Each Month*

In [None]:
a = individualdetails.groupby(['Month','detected_state','current_status'],as_index=False).id.count()
for i in a.Month.unique():
    plt.figure(figsize=(15,4))
    chart = sns.barplot(x='detected_state',y='id',hue='current_status',data=a[a.Month == i])
    chart.set_xticklabels(chart.get_xticklabels(), rotation=90)
    plt.xlabel('States')
    plt.ylabel('Count')
    plt.title('Statewise Status Count for Month {}'.format(i))
    plt.show()

# File Details

# For [India Dataset](https://www.kaggle.com/sudalairajkumar/covid19-in-india):

## Context
> * Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - World Health Organization
> * The number of new cases is increasing day by day around the world. This dataset has daily-level information from the states and union territories of India.
> * State-level data comes from [Ministry of Health & Family Welfare](https://www.mohfw.gov.in/)
> * Individual-level data comes from [covid19india](https://www.covid19india.org/)

## Content

> * **COVID-19 cases at the daily level** are present in the `covid_19_india.csv` file
> * **Individual-level details** are present in the `IndividualDetails.csv` file and were obtained from [this link](http://portal.covid19india.org/)
> * **Population at the state level** is present in the `population_india_census2011.csv` file
> * **Number of COVID-19 tests** at the daily level is present in the `ICMRTestingDetails.csv` file
> * **Number of hospital beds in each state** is present in the `HospitalBedsIndia.csv` file and was extracted from [this link](https://pib.gov.in/PressReleasePage.aspx?PRID=1539877)
> * **Travel history dataset** by [@dheerajmpai](https://www.kaggle.com/dheerajmpai/covidindiatravelhistory)

### The files in the dataset, with their descriptions, are listed below.
---------------------------------------------------------------------------------------------------------------


* **AgeGroupDetails.csv**
 > Age group details of affected cases. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2FAgeGroupDetails.csv?)

* **covid_19_india.csv**
 > Number of covid-19 cases in India at daily level. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2Fcovid_19_india.csv?)

* **HospitalBedsIndia.csv**
 > Number of hospital beds in each state in India. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2FHospitalBedsIndia.csv?)

* **ICMRTestingDetails.csv**
 > Number of COVID-19 tests at the daily level from ICMR. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2FICMRTestingDetails.csv?)
   
* **ICMRTestingLabs.csv**
 > List of ICMR testing labs that test samples for covid-19. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2FICMRTestingLabs.csv?)

* **IndividualDetails.csv**
 > Individual case level details. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2FIndividualDetails.csv?)

* **population_india_census2011.csv**
 > Population of different states in India. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2Fpopulation_india_census2011.csv?)
 
* **StatewiseTestingDetails.csv**
 > Testing details at state level. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-india/download/LBD0rBLGHtjmKAPpMkyk%2Fversions%2Fvbrr05D7usvGb7nRb4py%2Ffiles%2FStatewiseTestingDetails.csv?)

# For [Italy Dataset](https://www.kaggle.com/sudalairajkumar/covid19-in-italy):

## Context
> * Coronaviruses are a large family of viruses which may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19 - [WHO](https://www.who.int/news-room/q-a-detail/q-a-coronaviruses)
> * People can catch COVID-19 from others who have the virus. It has been spreading rapidly around the world, and Italy is one of the most affected countries.
> * On March 8, 2020 - Italy’s prime minister announced a sweeping coronavirus quarantine early Sunday, restricting the movements of about a quarter of the country’s population in a bid to limit contagions at the epicenter of Europe’s outbreak. - [TIME](https://time.com/5799107/italy-coronavirus-quarantine/)

## Content
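This dataset comes from the [`COVID-19`](https://github.com/pcm-dpc/COVID-19) repository, collected by *Sito del Dipartimento della Protezione Civile - Emergenza Coronavirus: la risposta nazionale* (website of the Civil Protection Department, "Coronavirus Emergency: the national response").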

This dataset has two files:

> * **covid19_italy_province.csv**
> * **covid19_italy_region.csv**

### The files used, with their descriptions, are listed below.
---------------------------------------------------------------------------------------------------------------


* **covid19_italy_province.csv**
 > Province level data on COVID-19 cases in Italy. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-italy/download/PHlCX07R5B3JmvFOj2c5%2Fversions%2FdKPsOQUBI2uB0X2H4USR%2Ffiles%2Fcovid19_italy_province.csv?)

* **covid19_italy_region.csv**
 > Region level data on COVID-19 cases in Italy. To Download the file click [Here](https://www.kaggle.com/sudalairajkumar/covid19-in-italy/download/PHlCX07R5B3JmvFOj2c5%2Fversions%2FdKPsOQUBI2uB0X2H4USR%2Ffiles%2Fcovid19_italy_region.csv?)