### Dataset
This dataset contains anonymized data from patients seen at the Hospital Israelita Albert Einstein in São Paulo, Brazil, who had samples collected for the SARS-CoV-2 RT-PCR test and additional laboratory tests during a hospital visit.

All data were anonymized following international best practices and recommendations. All clinical data were standardized to have zero mean and unit standard deviation.
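
Standardization here refers to the usual z-score transform: each value has its column mean subtracted and is then divided by the column standard deviation. A minimal sketch on made-up hemoglobin values follows. The raw measurements are not part of the published dataset, and the source does not say whether the population or the sample standard deviation was used, so `ddof=0` is an assumption.

```python
import pandas as pd

# Hypothetical raw lab values, for illustration only.
raw = pd.DataFrame({"Hemoglobin": [11.0, 13.0, 15.0, 17.0]})

# z-score: subtract the column mean, divide by the column standard deviation.
# ddof=0 (population standard deviation) is an assumption.
standardized = (raw - raw.mean()) / raw.std(ddof=0)

print(standardized["Hemoglobin"].mean())       # approximately 0
print(standardized["Hemoglobin"].std(ddof=0))  # approximately 1
```

After this transform a value of 0 corresponds to the column average, and the sign shows whether a measurement lies above or below it.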

Task Details
### TASK 1
• Predict confirmed COVID-19 cases among suspected cases.
Based on the results of laboratory tests commonly collected for a suspected COVID-19 case during a visit to the emergency room, is it possible to predict the SARS-CoV-2 test result (positive/negative)?

### TASK 2
• Predict admission to general ward, semi-intensive unit or intensive care unit among confirmed COVID-19 cases.
Based on the results of laboratory tests commonly collected among confirmed COVID-19 cases during a visit to the emergency room, would it be possible to predict which patients will need to be admitted to a general ward, semi-intensive unit or intensive care unit?
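
One possible way to frame Task 2 is as a single multi-class target built from the three admission columns. The sketch below assumes the column names used in this dataset and a result column already encoded as 0/1. The helper `admission_level`, the label numbering, and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

WARD = "Patient addmited to regular ward (1=yes, 0=no)"
SEMI = "Patient addmited to semi-intensive unit (1=yes, 0=no)"
ICU = "Patient addmited to intensive care unit (1=yes, 0=no)"
RESULT = "SARS-Cov-2 exam result"

# Toy rows, for illustration only (not taken from the dataset).
toy = pd.DataFrame({
    RESULT: [1, 1, 0, 1, 1],
    WARD:   [0, 1, 1, 0, 0],
    SEMI:   [0, 0, 0, 1, 0],
    ICU:    [0, 0, 0, 0, 1],
})

def admission_level(df):
    """Collapse the three admission flags into one label per patient.

    Hypothetical encoding: 0 = not admitted, 1 = regular ward,
    2 = semi-intensive unit, 3 = intensive care unit. If several
    flags are set, the highest level of care wins.
    """
    level = pd.Series(0, index=df.index)
    level[df[WARD] == 1] = 1
    level[df[SEMI] == 1] = 2
    level[df[ICU] == 1] = 3
    return level

# Task 2 only concerns confirmed cases, so filter on a positive result first.
confirmed = toy[toy[RESULT] == 1]
target = admission_level(confirmed)
print(target.tolist())  # [0, 1, 2, 3]
```

Treating the three flags as separate binary targets is an equally valid alternative; the single-label form is simply convenient for classifiers that expect one target column.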

In [12]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

DATA_PATH = './covid_sp_dataset.xlsx'

data = pd.read_excel(DATA_PATH)


# Remove exact duplicate rows, then keep only rows that have a Hemoglobin value.
data.drop_duplicates(inplace=True)
data.dropna(inplace=True, subset=['Hemoglobin'])

# With axis=1, `subset` refers to row labels: drop every column that is NaN
# in the row labelled 1. This relies on that row surviving the filters above.
data.dropna(inplace=True, axis=1, subset=[1])

# Encode the categorical test results as 0/1.
data.replace(
    {'not_detected': 0, 'detected': 1, 'negative': 0, 'positive': 1},
    inplace=True,
)
data.drop(['Influenza A, rapid test', 'Influenza B, rapid test', 'Serum Glucose'], axis=1, inplace=True)

# Optional: impute the remaining NaNs with the column means (numeric columns only).
# data.fillna(data.mean(numeric_only=True), inplace=True)




print(f"The data has {data.shape[0]} rows and {data.shape[1]} columns, the columns are:")

for i, col in enumerate(data.columns):
    print(f"{i}: {col}")

The data has 603 rows and 43 columns, the columns are:
0: Patient ID
1: Patient age quantile
2: SARS-Cov-2 exam result
3: Patient addmited to regular ward (1=yes, 0=no)
4: Patient addmited to semi-intensive unit (1=yes, 0=no)
5: Patient addmited to intensive care unit (1=yes, 0=no)
6: Hematocrit
7: Hemoglobin
8: Platelets
9: Mean platelet volume 
10: Red blood Cells
11: Lymphocytes
12: Mean corpuscular hemoglobin concentration (MCHC)
13: Leukocytes
14: Basophils
15: Mean corpuscular hemoglobin (MCH)
16: Eosinophils
17: Mean corpuscular volume (MCV)
18: Monocytes
19: Red blood cell distribution width (RDW)
20: Respiratory Syncytial Virus
21: Influenza A
22: Influenza B
23: Parainfluenza 1
24: CoronavirusNL63
25: Rhinovirus/Enterovirus
26: Coronavirus HKU1
27: Parainfluenza 3
28: Chlamydophila pneumoniae
29: Adenovirus
30: Parainfluenza 4
31: Coronavirus229E
32: CoronavirusOC43
33: Inf A H1N1 2009
34: Bordetella pertussis
35: Metapneumovirus
36: Parainfluenza 2
37: Neutrophils
38: Urea
3

In [13]:
print("The data have the following format:")
data.to_csv('./covid_sp_dataset_treated.csv', index=False)
data.head()

The data have the following format:


| | Patient ID | Patient age quantile | SARS-Cov-2 exam result | Patient addmited to regular ward (1=yes, 0=no) | Patient addmited to semi-intensive unit (1=yes, 0=no) | Patient addmited to intensive care unit (1=yes, 0=no) | Hematocrit | Hemoglobin | Platelets | Mean platelet volume | ... | Inf A H1N1 2009 | Bordetella pertussis | Metapneumovirus | Parainfluenza 2 | Neutrophils | Urea | Proteina C reativa mg/dL | Creatinine | Potassium | Sodium |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 126e9dd13932f68 | 17 | 0 | 0 | 0 | 0 | 0.236515 | -0.02234 | -0.517413 | 0.010677 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.619086 | 1.198059 | -0.147895 | 2.089928 | -0.305787 | 0.862512 |
| 8 | 8bb9d64f0215244 | 1 | 0 | 0 | 1 | 0 | -1.571682 | -0.774212 | 1.429667 | -1.672222 | ... | 0.0 | 0.0 | 0.0 | 0.0 | -0.127395 | -0.067309 | -0.286986 | -1.838623 | 0.93002 | 0.503132 |
| 15 | 6c9d3323975b082 | 9 | 0 | 0 | 0 | 0 | -0.747693 | -0.586244 | -0.42948 | -0.213711 | ... | 1.0 | 0.0 | 0.0 | 0.0 | 0.88057 | -0.811643 | NaN | -0.908177 | 0.435697 | -0.215628 |
| 18 | d3ea751f3db9de9 | 11 | 0 | 0 | 0 | 0 | 0.991838 | 0.792188 | 0.072992 | -0.55029 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.265957 | NaN | -0.487674 | NaN | NaN | NaN |
| 22 | 2c2eae16c12a18a | 9 | 0 | 0 | 0 | 0 | 0.190738 | -0.147652 | -0.668155 | 1.020415 | ... | NaN | NaN | NaN | NaN | -0.42241 | -1.332677 | NaN | -0.908177 | -0.552949 | -0.575008 |


Organize the data

In [None]:
##For KNN
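
KNN cannot work with missing values, and the treated data still contains NaNs, so some form of imputation has to come before the classifier. The following is a minimal sketch, assuming scikit-learn is available, that runs on synthetic data standing in for the standardized lab features. Median imputation, `n_neighbors=5`, the 75/25 split, and the synthetic label are all assumptions for illustration, not choices made in this notebook.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the standardized lab features (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # hypothetical binary label

# Knock out roughly 10% of the values to mimic the NaNs left in the treated data.
X[rng.random(X.shape) < 0.1] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Imputation lives inside the pipeline so the medians come from the training split only.
model = make_pipeline(
    SimpleImputer(strategy="median"),
    KNeighborsClassifier(n_neighbors=5),
)
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```

Because the lab features are already standardized, no additional scaling step is included; with unscaled features, distance-based methods such as KNN would usually need one.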

