# Data Cleaning and Maintenance - DATASETS: ureia.csv and creatinina.csv

## Index

- [Link to return to the main notebook](#return-to-the-main-notebook---mainipynb)
- [Importing libraries and packages](#importing-libraries-and-packages)
- [Importing datasets](#importing-datasets)
- [Brief overview of the data](#brief-overview-of-the-data)
- [Performing the exploratory analysis](#analise-exploratoria-dos-dados)

## Return to the main notebook - main.ipynb

[Link to the main notebook](./main.ipynb)

## Importing libraries and packages

In [1]:
# Project-local helper, used later to insert the missing days for each patient
from utils.alterar_dataset import preencher_dias_faltantes
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

## Importing datasets

In [2]:
# Both raw files are indexed by patient identifier (subject_id)
df_ureia = pd.read_csv("../databases/raw/ureia.csv", sep=",", index_col="subject_id")
df_creatinina = pd.read_csv("../databases/raw/creatinina.csv", sep=",", index_col="subject_id")

## Brief overview of the data

### DATASET ureia

In [3]:
df_ureia.head(10)

| subject_id | day | UreiaMaxDia |
|---|---|---|
| 15158531 | 1 | 107.0 |
| 15158531 | 2 | 100.0 |
| 15158531 | 3 | 78.0 |
| 15158531 | 4 | 59.0 |
| 15158531 | 15 | 59.0 |
| 15158531 | 16 | 59.0 |
| 15158531 | 17 | 59.0 |
| 15158531 | 18 | 64.0 |
| 14800685 | 2 | 3.0 |
| 13697731 | 3 | 4.0 |


In [4]:
df_ureia.info()

<class 'pandas.core.frame.DataFrame'>
Index: 42392 entries, 15158531 to 15274195
Data columns (total 2 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   day          42392 non-null  int64  
 1   UreiaMaxDia  42392 non-null  float64
dtypes: float64(1), int64(1)
memory usage: 993.6 KB


### DATASET creatinina

In [5]:
df_creatinina.head(10)

| subject_id | day | CreatininaMaxDia |
|---|---|---|
| 15158531 | 15 | 3.8 |
| 15158531 | 16 | 3.9 |
| 15158531 | 17 | 3.9 |
| 15158531 | 18 | 3.7 |
| 13697731 | 66 | 0.1 |
| 13697731 | 68 | 0.1 |
| 15796335 | 5 | 4.2 |
| 15796335 | 6 | 5.4 |
| 15796335 | 7 | 4.2 |
| 15796335 | 12 | 4.5 |


In [6]:
df_creatinina.info()

<class 'pandas.core.frame.DataFrame'>
Index: 42383 entries, 15158531 to 12953561
Data columns (total 2 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   day               42383 non-null  int64  
 1   CreatininaMaxDia  42383 non-null  float64
dtypes: float64(1), int64(1)
memory usage: 993.4 KB


## Cleaning

### Removing all rows whose day (column *day*) is greater than `7`

#### DATASET ureia

In [7]:
# Keep days 1 through 7 (day 7 included, despite the "menor_7" in the variable name)
df_ureia_dia_menor_7 = df_ureia[df_ureia["day"] <= 7]
df_ureia_dia_menor_7["day"].unique()

array([1, 2, 3, 4, 5, 6, 7])

In [8]:
df_ureia_dia_menor_7["day"].value_counts()

day
1    3297
3    3281
2    3277
4    3267
5    3262
6    3242
7    3082
Name: count, dtype: int64

#### DATASET creatinina

In [9]:
# Keep days 1 through 7 (day 7 included, despite the "menor_7" in the variable name)
df_creatinina_dia_menor_7 = df_creatinina[df_creatinina["day"] <= 7]
df_creatinina_dia_menor_7["day"].unique()

array([5, 6, 7, 4, 3, 2, 1])

In [10]:
df_creatinina_dia_menor_7["day"].value_counts()

day
1    3298
3    3282
2    3275
4    3267
5    3262
6    3241
7    3081
Name: count, dtype: int64
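
The per-day counts above are not equal (for ureia, 3297 rows on day 1 versus 3082 on day 7), which suggests that some patients lack a record for at least one of the seven days. A minimal sketch of how that could be checked is shown here. The toy frame, its `value` column, and the helper name `patients_with_missing_days` are assumptions made for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy data standing in for a filtered frame after reset_index();
# the values are illustrative only.
toy = pd.DataFrame(
    {
        "subject_id": [1, 1, 1, 2, 2, 2, 2, 2, 2, 2],
        "day": [1, 3, 7, 1, 2, 3, 4, 5, 6, 7],
        "value": [1.1, 1.3, 0.5, 0.7, 0.7, 0.7, 0.8, 0.9, 0.9, 1.0],
    }
)


def patients_with_missing_days(df, n_days=7):
    """Return the number of distinct days per patient, for patients with fewer than n_days."""
    days_per_patient = df.groupby("subject_id")["day"].nunique()
    return days_per_patient[days_per_patient < n_days]


incomplete = patients_with_missing_days(toy)
print(incomplete)
```

On the real data, the same helper could be applied to `df_ureia_dia_menor_7.reset_index()`, since `subject_id` is the index of that frame at this point.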

### Inserting new days until every patient has 7 days of observation

#### DATASET ureia

In [11]:
df_ureia_dia_menor_7 = df_ureia_dia_menor_7.reset_index()
df_ureia_processed = preencher_dias_faltantes(df_ureia_dia_menor_7, ["subject_id", "day"], "UreiaMaxDia", 8)
df_ureia_processed.head(14)

| subject_id | day | UreiaMaxDia |
|---|---|---|
| 10001884 | 1 | 30.0 |
| 10001884 | 2 | NaN |
| 10001884 | 3 | 38.0 |
| 10001884 | 4 | 31.0 |
| 10001884 | 5 | 26.0 |
| 10001884 | 6 | 22.0 |
| 10001884 | 7 | 14.0 |
| 10004422 | 1 | 21.0 |
| 10004422 | 2 | 13.0 |
| 10004422 | 3 | 17.0 |


#### DATASET creatinina

In [12]:
df_creatinina_dia_menor_7 = df_creatinina_dia_menor_7.reset_index()
df_creatinina_processed = preencher_dias_faltantes(df_creatinina_dia_menor_7, ["subject_id", "day"], "CreatininaMaxDia", 8)
df_creatinina_processed.head(14)

| subject_id | day | CreatininaMaxDia |
|---|---|---|
| 10001884 | 1 | 1.1 |
| 10001884 | 2 | NaN |
| 10001884 | 3 | 1.3 |
| 10001884 | 4 | 0.9 |
| 10001884 | 5 | 0.8 |
| 10001884 | 6 | 0.6 |
| 10001884 | 7 | 0.5 |
| 10004422 | 1 | 0.7 |
| 10004422 | 2 | 0.7 |
| 10004422 | 3 | 0.7 |
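
The source of `preencher_dias_faltantes` is not shown in this notebook, so the following is only a sketch of one way to get the same shape of output: every patient gets a row for each day from 1 to 7, and days without a measurement are left empty (NaN). The function name `fill_missing_days_sketch` is hypothetical. The sketch also assumes that the last argument (`8` in the calls above) is an exclusive upper bound on the day, and that each `(subject_id, day)` pair appears at most once. The toy values are the ureia readings of patient 10001884 from the output above, with day 2 absent.

```python
import pandas as pd


def fill_missing_days_sketch(df, keys, value_col, day_stop):
    """Hypothetical stand-in for preencher_dias_faltantes (not the project's code).

    Reindexes df so that every subject has one row per day in
    range(1, day_stop); days that were absent get NaN in value_col.
    Assumes each (subject, day) pair is unique in df.
    """
    subject_key, day_key = keys
    # Cartesian product of all subjects and all expected days
    full_index = pd.MultiIndex.from_product(
        [sorted(df[subject_key].unique()), range(1, day_stop)],
        names=[subject_key, day_key],
    )
    return df.set_index(keys)[[value_col]].reindex(full_index)


# Patient 10001884 has no ureia measurement on day 2
toy = pd.DataFrame(
    {
        "subject_id": [10001884] * 6,
        "day": [1, 3, 4, 5, 6, 7],
        "UreiaMaxDia": [30.0, 38.0, 31.0, 26.0, 22.0, 14.0],
    }
)

filled = fill_missing_days_sketch(toy, ["subject_id", "day"], "UreiaMaxDia", 8)
print(filled)
```

Reindexing against the full product of patients and days keeps the existing measurements untouched and makes the gaps explicit, so how to impute them can be decided in a later step.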
