# COVID-19: Exploratory data analysis for Argentina

**<font color="red">Note: Some datasets are no longer being updated, which is why some of the plots may drop to zero or even turn negative from a certain point onward.</font>**

**Table of contents**

1. [Basic data preparation](#s1)

    1.1. [Imports](#s1p1)

    1.2. [Data selection](#s1p2)

2. [Filtering useful data](#s2)

    2.1. [Confirmed, deaths, recovered (CDR) over time](#s2p1)

      2.1.1. [Argentina](#s2p1p1)

      2.1.2. [Singapore](#s2p1p2)

      2.1.3. [Germany](#s2p1p3)

      2.1.4. [Japan](#s2p1p4)

      2.1.5. [South Korea](#s2p1p5)

    2.2. [Peak CDR rates](#s2p2)

    2.3. [Rates and comparisons](#s2p3)

     2.3.1. [Total infected by country](#s2p3p1)

     2.3.2. [Total infected by country, adjusted for population](#s2p3p2)

     2.3.3. [Mortality rate among the infected](#s2p3p3)

     2.3.4. [Recovery rate among the infected](#s2p3p4)

3. [References](#s3)

# 1. Basic data preparation <a id="s1"></a>
___

## 1.1. Imports <a id="s1p1"></a>

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
pd.plotting.register_matplotlib_converters()
%matplotlib inline
import seaborn as sns
from scipy.integrate import odeint




## 1.2. Data selection <a id="s1p2"></a>
---

In [None]:
# Full dataset: every country, with Confirmed, Deaths and Recovered counts per observation date
cdr_logs = pd.read_csv("../input/novel-corona-virus-2019-dataset/covid_19_data.csv", index_col="ObservationDate")
cdr_logs["Last Update"] = pd.to_datetime(cdr_logs["Last Update"])  # convert to datetime64
cdr_logs["Province/State"] = cdr_logs["Province/State"].fillna("No data")  # replace missing provinces with "No data"

# Per-country subsets (same columns as cdr_logs)
cdr_logs_argentina = cdr_logs[cdr_logs["Country/Region"] == "Argentina"]
cdr_logs_singapore = cdr_logs[cdr_logs["Country/Region"] == "Singapore"]
cdr_logs_germany = cdr_logs[cdr_logs["Country/Region"] == "Germany"]
cdr_logs_japan = cdr_logs[cdr_logs["Country/Region"] == "Japan"]
cdr_logs_skorea = cdr_logs[cdr_logs["Country/Region"] == "South Korea"]
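
Some countries may appear in this dataset as several rows per date, one per `Province/State`, so a plain country filter can return more than one row for the same day. The following is a minimal sketch of collapsing those rows into one national total per date. The helper name `national_totals` and the toy data are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def national_totals(df, country):
    """Sum Confirmed, Deaths and Recovered over all provinces for each date.

    Assumes `df` is indexed by observation date and has a "Country/Region"
    column, as `cdr_logs` does. `national_totals` is a hypothetical helper.
    """
    rows = df[df["Country/Region"] == country]
    # Group by the index (the date) and add up the province-level rows.
    return rows.groupby(level=0, sort=False)[["Confirmed", "Deaths", "Recovered"]].sum()

# Toy data standing in for cdr_logs: country X reports two provinces per date.
toy = pd.DataFrame(
    {
        "Province/State": ["A", "B", "A", "B", "No data"],
        "Country/Region": ["X", "X", "X", "X", "Y"],
        "Confirmed": [10.0, 5.0, 12.0, 9.0, 3.0],
        "Deaths": [1.0, 0.0, 1.0, 1.0, 0.0],
        "Recovered": [2.0, 1.0, 4.0, 2.0, 0.0],
    },
    index=pd.Index(
        ["06/01/2020", "06/01/2020", "06/02/2020", "06/02/2020", "06/01/2020"],
        name="ObservationDate",
    ),
)

totals_x = national_totals(toy, "X")
print(totals_x)
```

With one row per date, `tail(1)` and `tail(7)` refer to the last day and the last seven days rather than to the last few provinces.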


# 2. Filtering useful data <a id="s2"></a>
___

### 2.1. Confirmed, deaths, recovered (CDR) over time <a id="s2p1"></a>
***

#### 2.1.1. Argentina <a id="s2p1p1"></a> 
---

In [None]:
plt.figure(figsize=(40,20))
sns.lineplot(data = cdr_logs_argentina.loc[:,["Confirmed","Deaths","Recovered"]])

#### 2.1.2. Singapore <a id="s2p1p2"></a>
___

In [None]:
plt.figure(figsize=(40,20))
sns.lineplot(data = cdr_logs_singapore.loc[:,["Confirmed","Deaths","Recovered"]])

#### 2.1.3. Germany <a id="s2p1p3"></a>
* * * 

In [None]:
plt.figure(figsize=(40,20))
sns.lineplot(data = cdr_logs_germany.loc[:,["Confirmed","Deaths","Recovered"]])


#### 2.1.4. Japan <a id="s2p1p4"></a>

In [None]:
plt.figure(figsize=(40,20))
sns.lineplot(data = cdr_logs_japan.loc[:,["Confirmed","Deaths","Recovered"]])
cdr_logs_japan.tail(1)

#### 2.1.5. South Korea <a id="s2p1p5"></a>

In [None]:
plt.figure(figsize=(40,20))
sns.lineplot(data = cdr_logs_skorea.loc[:,["Confirmed","Deaths","Recovered"]])

## 2.2. Peak CDR rates <a id="s2p2"></a>
To gauge the current state of the outbreak in each country, I compute the average daily rates of infection, death, and recovery over the last week of data.

In [None]:
def daily_rate(df):
    """Average daily change in Confirmed, Deaths and Recovered over the last 7 rows."""
    last_week = df.tail(7).iloc[:, 4:]  # columns 4 onward: Confirmed, Deaths, Recovered
    # 7 rows span 6 day-to-day intervals, so divide by 6 rather than 7
    return (last_week.iloc[-1] - last_week.iloc[0]) / (len(last_week) - 1)

speed_per_week = pd.DataFrame(
    [daily_rate(df) for df in (cdr_logs_argentina, cdr_logs_singapore, cdr_logs_germany,
                               cdr_logs_japan, cdr_logs_skorea)],
    index=["Argentina", "Singapore", "Germany", "Japan", "South Korea"])
sns.barplot(y=speed_per_week.iloc[:, 0], x=speed_per_week.index)  # confirmed cases per day




In [None]:
sns.barplot(y=speed_per_week.iloc[:,1],x=speed_per_week.index)

In [None]:
sns.barplot(y=speed_per_week.iloc[:,2],x=speed_per_week.index)
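
The weekly rate above is taken from the first and last of the final seven rows. An equivalent way to express an average daily change is through day-to-day differences, sketched below. This assumes a frame with exactly one row per date and cumulative `Confirmed`, `Deaths` and `Recovered` columns. The name `mean_daily_change` and the toy numbers are hypothetical.

```python
import pandas as pd

def mean_daily_change(df, days=7):
    """Mean of the last `days` day-to-day increments of the cumulative counts."""
    cols = ["Confirmed", "Deaths", "Recovered"]
    # diff() turns cumulative totals into daily increments (first row becomes NaN);
    # averaging the last `days` increments gives the average change per day.
    return df[cols].diff().tail(days).mean()

# Toy cumulative series (NOT real data): 8 rows, hence 7 daily increments.
toy = pd.DataFrame({
    "Confirmed": [0, 10, 20, 30, 40, 50, 60, 70],
    "Deaths": [0, 1, 2, 3, 4, 5, 6, 7],
    "Recovered": [0, 0, 5, 5, 10, 10, 15, 15],
})

rates = mean_daily_change(toy)
print(rates)
```

Seven increments require eight rows, so this version divides a full seven-day change by seven.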

## 2.3. Rates and comparisons <a id="s2p3"></a>

### 2.3.1. Total infected by country <a id="s2p3p1"></a>

In [None]:
def lastData(df, column):
    """Return the most recent value of `column`, i.e. the one in the last row of `df`."""
    return df[column].iloc[-1]

sns.barplot(y=pd.Series([lastData(cdr_logs_argentina, "Confirmed"),
                         lastData(cdr_logs_singapore, "Confirmed"),
                         lastData(cdr_logs_germany, "Confirmed"),
                         lastData(cdr_logs_japan, "Confirmed"),
                         lastData(cdr_logs_skorea, "Confirmed")]),
            x=["Argentina", "Singapore", "Germany", "Japan", "South Korea"])

plt.ylabel("Infected")

### 2.3.2. Total infected by country, adjusted for population <a id="s2p3p2"></a>

In [None]:
# Population of each country, in the same order as the x-axis labels below
population_ordered = [45195777, 5850343, 83783945, 126476458, 51269183]

sns.barplot(y=pd.Series([lastData(cdr_logs_argentina, "Confirmed"),
                         lastData(cdr_logs_singapore, "Confirmed"),
                         lastData(cdr_logs_germany, "Confirmed"),
                         lastData(cdr_logs_japan, "Confirmed"),
                         lastData(cdr_logs_skorea, "Confirmed")]) * 100 / population_ordered,
            x=["Argentina", "Singapore", "Germany", "Japan", "South Korea"])
plt.ylabel("% of population infected")
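
Dividing by a plain list relies on the populations being listed in the same order as the countries. Below is a sketch that aligns by country name instead. The population figures are the ones from the cell above, while the confirmed counts are made-up placeholders.

```python
import pandas as pd

# Populations keyed by country name (values taken from population_ordered).
population = pd.Series({
    "Argentina": 45195777,
    "Singapore": 5850343,
    "Germany": 83783945,
    "Japan": 126476458,
    "South Korea": 51269183,
})

# Placeholder confirmed totals (NOT real data), deliberately in a different order.
confirmed = pd.Series({"Japan": 1000, "Argentina": 500, "Germany": 2000})

# Series arithmetic aligns on index labels, so the order does not matter;
# countries missing from `confirmed` come out as NaN and are dropped.
percent_infected = (confirmed * 100 / population).dropna()
print(percent_infected)
```

Because the division matches labels rather than positions, adding or reordering countries cannot silently pair a case count with the wrong population.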

### 2.3.3. Mortality rate among the infected <a id="s2p3p3"></a>

In [None]:
# Latest deaths divided by latest confirmed cases, per country
sns.barplot(y=pd.Series([lastData(cdr_logs_argentina, "Deaths"),
                         lastData(cdr_logs_singapore, "Deaths"),
                         lastData(cdr_logs_germany, "Deaths"),
                         lastData(cdr_logs_japan, "Deaths"),
                         lastData(cdr_logs_skorea, "Deaths")])
              / pd.Series([lastData(cdr_logs_argentina, "Confirmed"),
                           lastData(cdr_logs_singapore, "Confirmed"),
                           lastData(cdr_logs_germany, "Confirmed"),
                           lastData(cdr_logs_japan, "Confirmed"),
                           lastData(cdr_logs_skorea, "Confirmed")]),
            x=["Argentina", "Singapore", "Germany", "Japan", "South Korea"])
plt.ylabel("Mortality rate among the infected")

### 2.3.4. Recovery rate among the infected <a id="s2p3p4"></a>

In [None]:
# Latest recoveries divided by latest confirmed cases, per country
sns.barplot(y=pd.Series([lastData(cdr_logs_argentina, "Recovered"),
                         lastData(cdr_logs_singapore, "Recovered"),
                         lastData(cdr_logs_germany, "Recovered"),
                         lastData(cdr_logs_japan, "Recovered"),
                         lastData(cdr_logs_skorea, "Recovered")])
              / pd.Series([lastData(cdr_logs_argentina, "Confirmed"),
                           lastData(cdr_logs_singapore, "Confirmed"),
                           lastData(cdr_logs_germany, "Confirmed"),
                           lastData(cdr_logs_japan, "Confirmed"),
                           lastData(cdr_logs_skorea, "Confirmed")]),
            x=["Argentina", "Singapore", "Germany", "Japan", "South Korea"])
plt.ylabel("Recovery rate among the infected")
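
The mortality and recovery ratios share the same denominator, so both can be computed in one step from a table of latest totals. This is a sketch with placeholder numbers (not real data). The column names `death_rate` and `recovery_rate` are illustrative choices.

```python
import pandas as pd

# Placeholder latest totals per country (NOT real data).
latest = pd.DataFrame(
    {"Confirmed": [1000, 200, 0], "Deaths": [20, 1, 0], "Recovered": [700, 150, 0]},
    index=["Country A", "Country B", "Country C"],
)

# Divide each outcome column by Confirmed, row by row.
rates = latest[["Deaths", "Recovered"]].div(latest["Confirmed"], axis=0)
rates.columns = ["death_rate", "recovery_rate"]

# A country with zero confirmed cases gives 0/0, which pandas reports as NaN.
print(rates)
```

Keeping the totals in one frame also makes it easy to pass a single column, such as `rates["death_rate"]`, to a bar plot.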

# 3. References <a id="s3"></a>

<ul>
    <li><a href="https://scipython.com/about/the-book/">Learning Scientific Programming with Python</a></li>
    <li><a href="https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset">Novel Corona Virus 2019 Dataset</a></li>
    
</ul>