# COVID-19 IN TURKEY: EXPLORATORY DATA ANALYSIS

## What will you learn from this project?
* We'll look at monthly rates such as patients/cases, pneumonia/patients, seriously ill patients/patients, and patients/recovered.

* Seaborn visualization techniques (bar, box, swarm, point, heatmap, clustermap), plus Plotly scatter plots



## Introduction
* During the pandemic, countries share figures such as daily patient and case numbers with the public.

* This dataset contains data officially shared by the Ministry of Health of the Republic of Turkey between May 2020 and July 2021.


## Analysis Content
1. [Python Libraries](#1)
1. [Data content](#2)
1. [Read and Analyse Data](#3)
1. [Data info](#4)
1. [Cleaning Data](#5)
1. [Data Distributions](#6)
1. [Relationship Between Features](#7)
1. [Conclusion](#8)

<a id='1'></a>     
## Python Libraries
* In this section, we import the libraries used throughout this notebook.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns

import plotly.express as px
import plotly.graph_objs as go
from plotly.offline import init_notebook_mode, iplot
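# "seaborn-notebook" was renamed "seaborn-v0_8-notebook" in matplotlib 3.6, so use whichever exists
style_name = "seaborn-notebook" if "seaborn-notebook" in plt.style.available else "seaborn-v0_8-notebook"
plt.style.use(style_name)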

sns.set_style("whitegrid")


# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

import warnings
warnings.filterwarnings("ignore")

<a id='2'></a>     
## Data content

* **Total Number of Tests:** Total number of tests performed
* **Total Number of Patiens:** Total number of known patients (the column name is spelled this way in the dataset)
* **Total Number of Deaths:** Total number of known deaths
* **Pneumonia Rate in Patients (%):** Rate of pneumonia found in patients
* **The Number of Seriously Ill Patients:** Total number of seriously ill patients
* **Total Number of Recovered Patients:** Total number of known patients recovering
* **Number of Cases Today:** Number of known cases today
* **Number of Patients for Today:** Number of known patients today
* **Number of Tests for Today:** Number of known tests today
* **Number of Deaths for Today:** Number of known deaths today
* **Number of Recovered Patients for Today:** Number of known recovered patients today

<a id='3'></a>  
## Read and Analyse Data


In [None]:
#read data
df = pd.read_csv("/kaggle/input/latest-covid19-turkey-status/data.csv")

In [None]:
# show the last five rows of the data
df.tail()

In [None]:
# information about data
df.info()

<a id='4'></a>  
## Data info 
* As the output above shows, this dataset from the Ministry of Health has many missing values.
* In particular, daily case numbers are missing before November 25, 2020.
* The seriously ill patient counts and pneumonia rates are missing before June 28, 2020.
* From June 4, 2021 onward, the total figures are missing.
* Therefore, the Exploratory Data Analysis is limited to the window in which every column is available, November 25, 2020 to June 3, 2021. The cells below use the six full months inside that window, December 1, 2020 to May 31, 2021.

* In future work, I will try to fill in the missing data using machine learning methods.
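
The missing-data diagnosis described in the list above can be sketched in a few lines. This is a minimal illustration, not the notebook's own method: the `toy` frame, its column names and its values are made up, and stand in for the real `df` once its date column has been set as the index.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset (values are made up)
dates = pd.date_range("2021-01-01", periods=6, freq="D")
toy = pd.DataFrame(
    {
        "cases": [np.nan, np.nan, 10, 12, 11, 9],
        "total": [100, 110, 120, 132, np.nan, np.nan],
    },
    index=dates,
)

# Number of missing values in each column
missing_counts = toy.isna().sum()

# First and last date on which each column has a value
coverage = pd.DataFrame(
    {
        "first_valid": {col: toy[col].first_valid_index() for col in toy.columns},
        "last_valid": {col: toy[col].last_valid_index() for col in toy.columns},
    }
)

# The window in which every column is populated
complete_rows = toy.dropna()
window = (complete_rows.index.min(), complete_rows.index.max())

print(missing_counts)
print(coverage)
print(window)
```

Reading `coverage` column by column shows where each feature starts and stops being reported, which is how a common analysis window can be chosen.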

<a id='5'></a>  
## Cleaning Data

In [None]:
# Restrict the data to the period December 1, 2020 to May 31, 2021.
# First, work on a copy so the original DataFrame stays untouched
data1 = df.copy()

# Parse the "Date" column with pandas' to_datetime method and store it as "date"
data1["date"] = pd.to_datetime(data1["Date"])

# Make the date the index, sorted so that date slicing and resampling behave predictably
data1 = data1.set_index("date").sort_index()

# Delete the now-unnecessary "Date" column
del data1["Date"]

# Finally, keep only the range that we will use
data1 = data1.loc["2020-12-01":"2021-05-31"].copy()

print(data1.notnull().all())                       # True for every column without missing values



### As you can see, the column names are not convenient to work with
* **They contain spaces between words**
* **Each word starts with an uppercase letter, which we don't want**
* **We also need to change the data types**

* **Now let's make the data types usable**
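
One way to avoid converting types after loading is to let `pd.read_csv` parse the numbers directly through its `thousands` and `decimal` parameters. The sketch below is only an illustration under assumptions: the CSV text, its `;` separator and its values are made up, and it presumes the counts use "." as a thousands separator and the rate uses "," as a decimal mark.

```python
import io

import pandas as pd

# Made-up CSV text in the assumed number format (not the real file)
raw = io.StringIO(
    "Date;Total Number of Tests;Pneumonia Rate in Patients (%)\n"
    "2020-12-01;18.982.207;3,5\n"
    "2020-12-02;19.169.914;3,4\n"
)

# thousands="." strips the grouping dots, decimal="," reads "3,5" as 3.5
toy = pd.read_csv(raw, sep=";", thousands=".", decimal=",")

print(toy.dtypes)
print(toy)
```

Whether this works on the real file depends on how its columns are actually formatted, so the result should be checked with `info()` before relying on it.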

In [None]:
# First, convert the values to the correct types.
# The code treats "." as a thousands separator in the counts and "," as the decimal mark in the rate.
rate_col = 'Pneumonia Rate in Patients (%)'

for col in data1.columns:
    values = data1[col].astype(str)
    if col == rate_col:
        # decimal comma -> decimal point, then float
        data1[col] = values.str.replace(',', '.', regex=False).astype(float)
    else:
        # drop the thousands separator, then int
        # (assumes the column was not read as float: a value like 253.0 would become 2530)
        data1[col] = values.str.replace('.', '', regex=False).astype(int)

# And finally, fix the column names: lowercase, with underscores instead of spaces
data1.columns = data1.columns.str.lower().str.replace(' ', '_', regex=False)

In [None]:
data1.info()

In [None]:
data1.tail()


## **As you can see, our data is now clean**

**In this section we have:**

* Diagnosed the data for cleaning
* Used the datetime method of pandas
* Sliced the data to the date range we need
* Fixed the data types
* Fixed the column names with several string methods
* Checked for missing data
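
The checks in this list can also be gathered into one function that fails loudly if something was missed. This is a minimal sketch: `check_clean` and the `toy` frame are hypothetical names introduced here, not part of the notebook.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def check_clean(frame: pd.DataFrame) -> None:
    """Raise AssertionError if the frame still needs cleaning."""
    assert isinstance(frame.index, pd.DatetimeIndex), "index is not a DatetimeIndex"
    assert frame.index.is_monotonic_increasing, "dates are not sorted"
    assert frame.notnull().all().all(), "missing values remain"

    non_numeric = [c for c in frame.columns if not is_numeric_dtype(frame[c])]
    assert not non_numeric, f"non-numeric columns: {non_numeric}"

    bad_names = [c for c in frame.columns if c != c.lower() or " " in c]
    assert not bad_names, f"column names not normalised: {bad_names}"


# Made-up example frame that satisfies every check
toy = pd.DataFrame(
    {"number_of_cases_today": [10, 12]},
    index=pd.to_datetime(["2020-12-01", "2020-12-02"]),
)
check_clean(toy)  # passes silently
```

Calling `check_clean(data1)` after the cleaning cell would play the same role as the manual `info()` and `tail()` inspections.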

<a id='6'></a> 
## Data Distributions

In [None]:
# distribution of total number features
list_features = ["total_number_of_tests", "total_number_of_patiens",'total_number_of_recovered_patients']
sns.boxplot(data = data1.loc[:, list_features], orient = "h", palette = "Set1")
plt.show()

In [None]:
# distribution of today number features
list_features = ["number_of_patients_for_today","number_of_cases_today","number_of_recovered_patients_for_today"]
sns.boxplot(data = data1.loc[:, list_features], orient = "h", palette = "Set2")
plt.show()

In [None]:
# distribution of daily deaths, seriously ill patients and daily patients
list_features = ["number_of_deaths_for_today","the_number_of_seriously_ill_patients",'number_of_patients_for_today']
sns.boxplot(data = data1.loc[:, list_features], orient = "h", palette = "Set3")
plt.show()



## Data Distributions by month
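
The cells in this section aggregate daily values per month with `resample('M')`. As a small illustration of what that aggregation does, here is a sketch of an equivalent approach that groups by calendar month with `to_period("M")`; it is an alternative, not the notebook's method, and the `daily` series and its values are made up.

```python
import pandas as pd

# Made-up daily series spanning two months
dates = pd.date_range("2020-12-30", "2021-01-02", freq="D")
daily = pd.Series([10, 20, 30, 50], index=dates, name="number_of_cases_today")

# Group the daily values by calendar month and compute mean and sum together
monthly = daily.groupby(daily.index.to_period("M")).agg(["mean", "sum"])

# Short "YYYY-MM" labels are convenient as axis tick labels
monthly.index = monthly.index.astype(str)

print(monthly)
```

The mean answers "what did a typical day look like in this month", while the sum gives the monthly total, which is why the cells below pick one or the other depending on the chart.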

In [None]:
# Monthly mean of daily cases

data1_num_cases_Mmean = data1.number_of_cases_today.resample('M').mean()

sns.swarmplot(x = data1_num_cases_Mmean.index ,y = data1_num_cases_Mmean , size= 15)
plt.xticks(rotation = 60)
plt.title("Monthly mean of daily cases")
plt.show()


In [None]:
# Monthly Total number of patients 
data1_tot_num_patients_sum = data1.number_of_patients_for_today.resample('M').sum()

plt.figure(figsize=(8,8))
ax= sns.barplot(x=data1_tot_num_patients_sum.index, y=data1_tot_num_patients_sum ,palette = sns.cubehelix_palette(len(data1_tot_num_patients_sum.index)))
plt.xticks(rotation = 60)

plt.xlabel('Months')
plt.ylabel('Total number')
plt.title('Monthly Total number of patients ')
plt.show()

In [None]:
# Monthly mean of daily deaths

data1_num_deaths_Mmean = data1.number_of_deaths_for_today.resample('M').mean()
sns.barplot(x = data1_num_deaths_Mmean, y = data1_num_deaths_Mmean.index, palette = "bone_r")
plt.title("Monthly mean of daily deaths")
plt.show()

# This graph suggests that deaths rise as the summer months approach, possibly because people spend more time outside.

In [None]:
# Mean number of daily recovered patients by month (hover shows the monthly total)

data1_num_recovered_patients_mean = data1.number_of_recovered_patients_for_today.resample('M').mean()
data1_num_recovered_patients_sum = data1.number_of_recovered_patients_for_today.resample('M').sum()


fig = px.scatter(
    y = data1_num_recovered_patients_mean,
    x = data1_num_recovered_patients_mean.index,
    hover_name = data1_num_recovered_patients_sum,   # monthly total shown on hover
    template = "plotly_dark",
)
fig.update_layout(title = "Mean number of daily recovered patients by month")
fig.show()

In [None]:
# Monthly mean number of seriously ill patients

data1_tot_num_seriously_ill_patients_mean = data1.the_number_of_seriously_ill_patients.resample('M').mean()

f,ax1 = plt.subplots(figsize =(6,6))
sns.pointplot(y=data1_tot_num_seriously_ill_patients_mean, x=data1_tot_num_seriously_ill_patients_mean.index,color='lime',alpha=0.8)
plt.xticks(rotation = 60)
plt.title('The number of seriously ill patients',fontsize = 20,color='blue')
plt.show()

In [None]:
# Mean number of daily tests by month (hover shows the monthly total)

data1_num_tests_for_today_mean = data1.number_of_tests_for_today.resample('M').mean()
data1_num_tests_for_today_sum = data1.number_of_tests_for_today.resample('M').sum()


fig = px.scatter(
    y = data1_num_tests_for_today_mean,
    x = data1_num_tests_for_today_mean.index,
    hover_name = data1_num_tests_for_today_sum,   # monthly total shown on hover
    template = "plotly_white",
)
fig.update_layout(title = "Mean number of daily tests by month")
fig.show()

In [None]:
# Monthly mean pneumonia rate in patients
data1_tot_num_pneumonia_rate_Mmean = data1['pneumonia_rate_in_patients_(%)'].resample('M').mean()

f,ax = plt.subplots(figsize = (6,6))
sns.barplot(x=data1_tot_num_pneumonia_rate_Mmean ,y=data1_tot_num_pneumonia_rate_Mmean.index ,color ='red',alpha = 0.6,label='pneumonia')
ax.set(xlabel='Pneumonia rate (%)', ylabel='Date',title = "Monthly mean pneumonia rate in patients")
plt.show()

<a id='7'></a> 
## Relationship Between Features
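
A correlation matrix lists every pair of features twice and includes each feature against itself, so it can help to flatten it into a ranked list of unique pairs. The sketch below shows one way to do that; the `toy` frame and its values are made up, and with the real data `toy` would be replaced by `data1`.

```python
import numpy as np
import pandas as pd

# Made-up numeric frame standing in for the cleaned data
toy = pd.DataFrame(
    {
        "cases": [1, 2, 3, 4, 5],
        "tests": [2, 4, 6, 8, 10],
        "deaths": [5, 3, 4, 1, 2],
    }
)

corr = toy.corr()  # Pearson correlation by default

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)

# Flatten to (feature, feature) -> correlation, strongest absolute value first
pairs = corr.where(upper).stack().dropna().sort_values(key=abs, ascending=False)

print(pairs)
```

Sorting by absolute value keeps strong negative relationships near the top, next to the strong positive ones.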

In [None]:
data1.corr()

In [None]:
sns.heatmap(data1.corr(), annot = True, fmt = ".2f", linewidths = .7)   # heatmap's parameter is "linewidths"
plt.title("Relationship Between Features ")
plt.show()

In [None]:
sns.clustermap(data1.corr(), center = 0, cmap = "vlag", dendrogram_ratio = (0.1, 0.2), annot = True, linewidths = .7, figsize=(10,10))
plt.show()

<a id='8'></a> 
## Conclusion
* **Looking at the six months of charts for Turkey:**
* There were sharp declines in January and February, following the New Year's measures.
* Unfortunately, as the rules were relaxed in the following months, the number of cases and patients increased again.
* However, the number of seriously ill patients and the pneumonia rate fell sharply, probably due to the effect of vaccines.
* We will see this more clearly as more data comes in.

+ **I will fill in the missing data and make updates using machine learning in the future.**
+ **This is my first post, so please share your feedback. Take care of yourselves.**


### Finally, I would like to thank the @DataI team for giving us their excellent course and knowledge.
* **References:** 
* https://www.kaggle.com/kanncaa1/seaborn-tutorial-for-beginners?scriptVersionId=27768785
* https://www.udemy.com/course/python-ile-veri-bilimi-makine-ogrenmesi-projeleri-a-ztm/