## Description

This notebook performs an exploratory analysis of the data from one patient in the I-CARE dataset (CinC 2023 Challenge, phase 1).

DATASET: https://physionet.org/content/i-care/1.0/

In [1]:
# Import libraries
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [3]:
# Path to the patient metadata file
patient_file = r'C:/Users/estel/Documents/Python_Codes/datasets/i-care-international-cardiac-arrest-research-consortium-database-1.0/training/ICARE_0284/ICARE_0284.txt'


patient_metadata = []
record_quality = []

In [4]:
# Read the .txt file and convert it to a dictionary
patient_metadata.extend(
    pd.read_csv(patient_file, delimiter=": ", header=None, index_col=0, engine='python').T.to_dict(orient='records')
)

print(patient_metadata)

[{'Patient': 'ICARE_0284', 'Age': '53', 'Sex': 'Male', 'ROSC': nan, 'OHCA': 'True', 'VFib': 'True', 'TTM': '33', 'Outcome': 'Good', 'CPC': '1'}]
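
The metadata file is a list of `key: value` lines, which is why it can be read with `pd.read_csv` and a `": "` delimiter. As a minimal sketch of the same idea without pandas, the hypothetical helper `parse_metadata` below (not part of the notebook) splits each line at the first `": "`. The sample text is an assumption reconstructed from the printed dictionary, not a copy of the real file.

```python
def parse_metadata(text):
    """Parse 'key: value' lines into a dict of strings (hypothetical helper)."""
    metadata = {}
    for line in text.splitlines():
        if ": " not in line:
            continue  # skip blank or malformed lines
        key, value = line.split(": ", 1)  # split only at the first separator
        metadata[key.strip()] = value.strip()
    return metadata


# Sample text assumed from the printed output, not copied from the real file
sample = "Patient: ICARE_0284\nAge: 53\nSex: Male\nOutcome: Good\nCPC: 1\n"
meta = parse_metadata(sample)
print(meta["Patient"], meta["Age"], meta["Outcome"])
```

Unlike `pd.read_csv`, this sketch keeps every value as a string and does not convert a literal `nan` into a missing value.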


In [5]:
# Read the .tsv file with the signal quality of each window
record_quality.append(
    np.array(pd.read_csv(patient_file[:-3] + "tsv", delimiter='\t', engine='python').Quality)
)
print(record_quality)
print('--> LEN: ',len(record_quality))

[array([  nan,   nan,   nan,   nan, 1.   , 1.   , 1.   , 1.   , 0.983,
       1.   , 1.   , 0.817, 0.933, 1.   , 1.   , 1.   , 1.   , 1.   ,
       0.983, 0.95 , 1.   , 1.   , 1.   , 1.   , 1.   ,   nan,   nan,
         nan, 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   ,
       1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   ,
       1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   ,
       1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   ,
       1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   , 1.   ])]
--> LEN:  1
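
The quality file sits next to the metadata file and differs only in its extension, which the cell derives by slicing the last three characters off the path. A possible alternative, sketched here with a shortened hypothetical path, is `pathlib.Path.with_suffix`, which does not depend on the extension length.

```python
from pathlib import Path

# Hypothetical relative path, shortened for illustration
patient_file = Path("training/ICARE_0284/ICARE_0284.txt")

# with_suffix swaps the extension regardless of how long it is
quality_file = patient_file.with_suffix(".tsv")
print(quality_file.name)
```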


In [6]:
# Signal quality for each hour of the 72-hour recording (columns h00 to h71)
df_quality = pd.DataFrame(np.vstack(record_quality), columns=[f"h{i:02}" for i in range(72)])
df_quality

Unnamed: 0,h00,h01,h02,h03,h04,h05,h06,h07,h08,h09,...,h62,h63,h64,h65,h66,h67,h68,h69,h70,h71
0,,,,,1.0,1.0,1.0,1.0,0.983,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [7]:
valid_hours = np.sum(~np.isnan(np.vstack(record_quality)), axis=1)  # NaN = no recording/signal for that hour
mean_quality = np.nanmean(np.vstack(record_quality), axis=1)

print(valid_hours)
print(mean_quality)

[65]
[0.99486154]
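
Here 65 of the 72 hours have a recording, and the mean is taken over those hours only. The toy example below, with made-up values for two hypothetical patients, shows how the NaN mask and `np.nanmean` behave row by row.

```python
import numpy as np

# Toy quality matrix: one row per patient, one column per hour (made-up values)
quality = np.array([[np.nan, 1.0, 0.5, np.nan],
                    [1.0, 1.0, np.nan, 1.0]])

valid_hours = np.sum(~np.isnan(quality), axis=1)  # count of hours with a recording
mean_quality = np.nanmean(quality, axis=1)        # mean over recorded hours only

print(valid_hours)
print(mean_quality)
```

The first row has two recorded hours with a mean of 0.75, and the second has three with a mean of 1.0. Missing hours do not pull the average down.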


In [8]:
df_patient = pd.DataFrame(patient_metadata)
df_patient

Unnamed: 0,Patient,Age,Sex,ROSC,OHCA,VFib,TTM,Outcome,CPC
0,ICARE_0284,53,Male,,True,True,33,Good,1


In [9]:
df_patient['valid_hours'] = valid_hours
df_patient['mean_quality'] = mean_quality

df_patient

Unnamed: 0,Patient,Age,Sex,ROSC,OHCA,VFib,TTM,Outcome,CPC,valid_hours,mean_quality
0,ICARE_0284,53,Male,,True,True,33,Good,1,65,0.994862


In [10]:
df_patient["Age"] = df_patient["Age"].astype(float)
df_patient["ROSC"] = df_patient["ROSC"].astype(float)
# OHCA and VFib are read as text; astype(bool) can map any non-empty string
# (including 'False') to True, so compare against 'True' instead
df_patient["OHCA"] = df_patient["OHCA"] == 'True'
df_patient["TTM"] = df_patient["TTM"].astype(float)
df_patient['Poor_out'] = df_patient["Outcome"] == 'Poor'
df_patient['male'] = df_patient["Sex"] == 'Male'
df_patient['female'] = df_patient["Sex"] == 'Female'
df_patient["VFib"] = df_patient["VFib"] == 'True'
df_patient['pindex'] = df_patient["Patient"].str.replace("ICARE_", "")
df_patient['CPC'] = df_patient['CPC'].astype(int)
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df_patient["Sex"] = df_patient["Sex"].fillna("Other").astype(str)


In [11]:
df_patient

Unnamed: 0,Patient,Age,Sex,ROSC,OHCA,VFib,TTM,Outcome,CPC,valid_hours,mean_quality,Poor_out,male,female,pindex
0,ICARE_0284,53.0,Male,,True,True,33.0,Good,1,65,0.994862,False,True,False,284
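
The `OHCA` and `VFib` columns come from the text values `'True'` and `'False'` in the metadata file. Python treats any non-empty string as truthy, so casting such text straight to a boolean can mark `'False'` as `True`. A minimal sketch of an explicit mapping, using made-up values:

```python
import pandas as pd

# bool() on any non-empty string is True, including the text 'False'
print(bool("False"))  # True

# Toy column of boolean-like strings (made-up values)
flags = pd.Series(["True", "False", "True"])

# Explicit mapping; unexpected values become NaN instead of silently turning True
parsed = flags.map({"True": True, "False": False})
print(parsed.tolist())
```

For the single patient in this notebook both flags are `'True'`, so the difference only becomes visible once patients with `'False'` are loaded.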


In [12]:
df = pd.concat([df_patient, df_quality], axis=1)
df

Unnamed: 0,Patient,Age,Sex,ROSC,OHCA,VFib,TTM,Outcome,CPC,valid_hours,...,h62,h63,h64,h65,h66,h67,h68,h69,h70,h71
0,ICARE_0284,53.0,Male,,True,True,33.0,Good,1,65,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
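
`pd.concat` with `axis=1` places the frames side by side and aligns their rows on the index, so both frames need matching index labels (here the default `0`). A small sketch with made-up frames and a hypothetical patient ID:

```python
import pandas as pd

# Toy frames sharing the default index 0 (made-up values)
left = pd.DataFrame({"Patient": ["ICARE_0000"], "valid_hours": [2]})
right = pd.DataFrame({"h00": [float("nan")], "h01": [1.0]})

# axis=1 joins columns; rows are matched by index label
combined = pd.concat([left, right], axis=1)
print(combined.shape)
print(list(combined.columns))
```

If the indexes did not match, the result would contain extra rows filled with NaN rather than raising an error, so it is worth checking the shape after concatenating.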


In [13]:
# plt.imsave("valid_data.png", np.isnan(df_quality), cmap='gray')

In [14]:
#df_patient.describe(include='all', percentiles=[])