## Classification of three-dimensional cardiac models according to the presence or absence of cardiomyopathy

Diego Dedize <br />
Luiz Gustavo Silva <br />
Johnny Demetrius <br />
Vagner Mendonça Gonçalves <br />

version 0.1 - 03/04/2020

**Objective:** apply classification models and evaluate the results obtained on a dataset composed of feature vectors extracted from 3D models reconstructed from Cardiac Magnetic Resonance (CMR) exams.

**Cardiomyopathy:** a clinical condition characterized by an abnormality of the myocardium (the muscular tissue of the heart) (KUMAR et al., 2010 apud BERGAMASCO, 2018).

**Data source:** dataset provided by the Laboratório de Aplicações de Informática em Saúde (LApIS) - EACH/USP, coordinated by Prof. Dr. Fátima L. S. Nunes.

**References:** <br />
BERGAMASCO, Leila Cristina Carneiro. Recuperação de imagens cardíacas tridimensionais por conteúdo. 2013. 134 f. Dissertação (Mestrado em Ciências) - Programa de Pós-graduação em Sistemas de Informação, Escola de Artes, Ciências e Humanidades, Universidade de São Paulo, São Paulo, 2013. 

BERGAMASCO, Leila Cristina Carneiro. Recuperação de objetos médicos 3D utilizando harmônicos esféricos e redes de fluxo. 2018. 181 f. Tese (Doutorado em Ciências) - Escola Politécnica, Departamento de Engenharia da Computação e Sistemas Digitais, Universidade de São Paulo, São Paulo, 2018.

KUMAR, V.; ABBAS, A. K.; FAUSTO, N.; ASTER, J. C. Robbins & Cotran – Patologia: Bases Patológicas das Doenças. 8 ed. Rio de Janeiro: Elsevier, 2010.


### Importing the required libraries

In [1]:
import pandas as pd
import numpy as np

### Loading the file and inspecting the first rows

In [2]:
df = pd.read_csv("Total_SPHARM_20200326.csv",header=None)
df.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,707,708,709,710,711,712,713,714,715,716
0,id001,29,M,-7881.480247,-5759.969698,-24465.608592,-15275.106756,-22974.37807,-22686.138288,-6131.488463,...,,,,,,,,,,0
1,id002,31,M,-567.772697,-33.292309,-465.179132,-525.98101,-469.5469,-305.182676,-89.243364,...,,,,,,,,,,0
2,id003,27,M,-135372.767326,-115124.114646,-772665.053883,-292331.423079,-58059.25501,-118464.329594,-585147.837935,...,,,,,,,,,,0
3,id004,52,M,-582.939571,-366.425893,-281.022452,-437.739821,-206.814933,-230.346112,-126.535277,...,,,,,,,,,,0
4,id005,56,M,-913.082501,-334.221895,-449.102108,-113.637478,-50.065343,-127.180338,-640.351614,...,,,,,,,,,,0


### Checking the columns and adding a header

In [3]:
df.shape

(400, 717)

In [4]:
df.columns

Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9,
            ...
            707, 708, 709, 710, 711, 712, 713, 714, 715, 716],
           dtype='int64', length=717)

In [5]:
colunas_1 =['id','age','sex']
colunas_1

['id', 'age', 'sex']

In [6]:
# Feature columns are named '1' through '713'.
colunas_2 = [str(i) for i in range(1, 714)]

In [7]:
coluna_3 = ['class_id']

In [8]:
colunas_total = colunas_1 + colunas_2 + coluna_3

In [9]:
df.columns = colunas_total

In [10]:
df.head()

Unnamed: 0,id,age,sex,1,2,3,4,5,6,7,...,705,706,707,708,709,710,711,712,713,class_id
0,id001,29,M,-7881.480247,-5759.969698,-24465.608592,-15275.106756,-22974.37807,-22686.138288,-6131.488463,...,,,,,,,,,,0
1,id002,31,M,-567.772697,-33.292309,-465.179132,-525.98101,-469.5469,-305.182676,-89.243364,...,,,,,,,,,,0
2,id003,27,M,-135372.767326,-115124.114646,-772665.053883,-292331.423079,-58059.25501,-118464.329594,-585147.837935,...,,,,,,,,,,0
3,id004,52,M,-582.939571,-366.425893,-281.022452,-437.739821,-206.814933,-230.346112,-126.535277,...,,,,,,,,,,0
4,id005,56,M,-913.082501,-334.221895,-449.102108,-113.637478,-50.065343,-127.180338,-640.351614,...,,,,,,,,,,0


### Inspecting the features

In [11]:
df.columns

Index(['id', 'age', 'sex', '1', '2', '3', '4', '5', '6', '7',
       ...
       '705', '706', '707', '708', '709', '710', '711', '712', '713',
       'class_id'],
      dtype='object', length=717)

In [12]:
# Map each class id to a descriptive label; ids outside the mapping stay NaN.
df['class_desc'] = df['class_id'].map({0: 'normal', 1: 'CMD', 2: 'CMH'})

In [13]:
df.head()

Unnamed: 0,id,age,sex,1,2,3,4,5,6,7,...,706,707,708,709,710,711,712,713,class_id,class_desc
0,id001,29,M,-7881.480247,-5759.969698,-24465.608592,-15275.106756,-22974.37807,-22686.138288,-6131.488463,...,,,,,,,,,0,normal
1,id002,31,M,-567.772697,-33.292309,-465.179132,-525.98101,-469.5469,-305.182676,-89.243364,...,,,,,,,,,0,normal
2,id003,27,M,-135372.767326,-115124.114646,-772665.053883,-292331.423079,-58059.25501,-118464.329594,-585147.837935,...,,,,,,,,,0,normal
3,id004,52,M,-582.939571,-366.425893,-281.022452,-437.739821,-206.814933,-230.346112,-126.535277,...,,,,,,,,,0,normal
4,id005,56,M,-913.082501,-334.221895,-449.102108,-113.637478,-50.065343,-127.180338,-640.351614,...,,,,,,,,,0,normal


In [14]:
df['class_desc'].value_counts()

CMD       183
CMH       116
normal    101
Name: class_desc, dtype: int64

### Defining the binarized classes (0 = no abnormality | 1 = with abnormality)

In [15]:
# Binarize the label: '0' for class_id 0 (normal), '1' for every other class.
df['target'] = np.where(df['class_id'] == 0, '0', '1')

In [16]:
df.head()

Unnamed: 0,id,age,sex,1,2,3,4,5,6,7,...,707,708,709,710,711,712,713,class_id,class_desc,target
0,id001,29,M,-7881.480247,-5759.969698,-24465.608592,-15275.106756,-22974.37807,-22686.138288,-6131.488463,...,,,,,,,,0,normal,0
1,id002,31,M,-567.772697,-33.292309,-465.179132,-525.98101,-469.5469,-305.182676,-89.243364,...,,,,,,,,0,normal,0
2,id003,27,M,-135372.767326,-115124.114646,-772665.053883,-292331.423079,-58059.25501,-118464.329594,-585147.837935,...,,,,,,,,0,normal,0
3,id004,52,M,-582.939571,-366.425893,-281.022452,-437.739821,-206.814933,-230.346112,-126.535277,...,,,,,,,,0,normal,0
4,id005,56,M,-913.082501,-334.221895,-449.102108,-113.637478,-50.065343,-127.180338,-640.351614,...,,,,,,,,0,normal,0


In [17]:
df['target'].value_counts()

1    299
0    101
Name: target, dtype: int64
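
The binary target is imbalanced (299 against 101). As a quick check, the relative frequencies can be computed with `value_counts(normalize=True)`. The sketch below rebuilds the counts shown above in a toy series instead of reading the dataset, so the series itself is an assumption made for illustration.

```python
import pandas as pd

# Toy series reproducing the class counts reported above (299 vs. 101).
target = pd.Series(["1"] * 299 + ["0"] * 101, name="target")

# Relative frequency of each class.
proportions = target.value_counts(normalize=True)
print(proportions)
```

Roughly three quarters of the samples fall in the abnormal class, which is worth keeping in mind when reading accuracy figures later on.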

### Dropping irrelevant features

In [18]:
# Keep an independent backup; plain assignment would only alias the same DataFrame.
df_bkp = df.copy()

In [19]:
df.head()

Unnamed: 0,id,age,sex,1,2,3,4,5,6,7,...,707,708,709,710,711,712,713,class_id,class_desc,target
0,id001,29,M,-7881.480247,-5759.969698,-24465.608592,-15275.106756,-22974.37807,-22686.138288,-6131.488463,...,,,,,,,,0,normal,0
1,id002,31,M,-567.772697,-33.292309,-465.179132,-525.98101,-469.5469,-305.182676,-89.243364,...,,,,,,,,0,normal,0
2,id003,27,M,-135372.767326,-115124.114646,-772665.053883,-292331.423079,-58059.25501,-118464.329594,-585147.837935,...,,,,,,,,0,normal,0
3,id004,52,M,-582.939571,-366.425893,-281.022452,-437.739821,-206.814933,-230.346112,-126.535277,...,,,,,,,,0,normal,0
4,id005,56,M,-913.082501,-334.221895,-449.102108,-113.637478,-50.065343,-127.180338,-640.351614,...,,,,,,,,0,normal,0


In [20]:
df.drop(columns=['class_id', 'class_desc'], inplace=True)
df.head()

Unnamed: 0,id,age,sex,1,2,3,4,5,6,7,...,705,706,707,708,709,710,711,712,713,target
0,id001,29,M,-7881.480247,-5759.969698,-24465.608592,-15275.106756,-22974.37807,-22686.138288,-6131.488463,...,,,,,,,,,,0
1,id002,31,M,-567.772697,-33.292309,-465.179132,-525.98101,-469.5469,-305.182676,-89.243364,...,,,,,,,,,,0
2,id003,27,M,-135372.767326,-115124.114646,-772665.053883,-292331.423079,-58059.25501,-118464.329594,-585147.837935,...,,,,,,,,,,0
3,id004,52,M,-582.939571,-366.425893,-281.022452,-437.739821,-206.814933,-230.346112,-126.535277,...,,,,,,,,,,0
4,id005,56,M,-913.082501,-334.221895,-449.102108,-113.637478,-50.065343,-127.180338,-640.351614,...,,,,,,,,,,0


### Checking feature types and statistics

In [21]:
df.dtypes

id         object
age        object
sex        object
1         float64
2         float64
           ...   
710       float64
711       float64
712       float64
713       float64
target     object
Length: 717, dtype: object
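
The `age` column is reported as `object` rather than a numeric type, which usually means at least one entry could not be parsed as a number. A minimal sketch of how such entries might be located, using a small made-up frame (the values are illustrative and not taken from the dataset):

```python
import pandas as pd

# Toy data standing in for the real `age` column (values are made up).
toy = pd.DataFrame({"id": ["a", "b", "c"], "age": ["29", "n/a", "52"]})

# Coerce to numeric; anything unparseable becomes NaN instead of raising.
age_num = pd.to_numeric(toy["age"], errors="coerce")

# Rows whose original value was present but not numeric.
bad_rows = toy[age_num.isna() & toy["age"].notna()]
print(bad_rows)
```

Once the offending rows are understood, the coerced series could replace the original column so that `age` can be used as a numeric feature.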

In [22]:
df.describe()

Unnamed: 0,1,2,3,4,5,6,7,8,9,10,...,704,705,706,707,708,709,710,711,712,713
count,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,400.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
mean,-418477.7,-238191.7,-294377.2,-279254.1,-395259.1,-386545.1,-320496.0,-321291.9,-317617.4,-300557.8,...,-58.335621,-1.960383,-149.222355,-233.562542,-126.6011,-154.984096,-155.468962,-124.529409,-148.211614,-86.011145
std,3669986.0,1532231.0,2742143.0,1962290.0,3851215.0,3769669.0,2666231.0,3078381.0,2617692.0,2140763.0,...,,,,,,,,,,
min,-63514060.0,-19249410.0,-49471150.0,-27563150.0,-71267950.0,-69142510.0,-43449590.0,-56406980.0,-39712550.0,-26185660.0,...,-58.335621,-1.960383,-149.222355,-233.562542,-126.6011,-154.984096,-155.468962,-124.529409,-148.211614,-86.011145
25%,-9671.564,-9623.35,-7915.87,-8164.928,-9045.309,-8973.423,-8167.788,-9698.177,-8553.08,-8869.544,...,-58.335621,-1.960383,-149.222355,-233.562542,-126.6011,-154.984096,-155.468962,-124.529409,-148.211614,-86.011145
50%,-1078.799,-1191.435,-1096.427,-1036.713,-1191.743,-1296.3,-1281.311,-1168.526,-1230.535,-1011.175,...,-58.335621,-1.960383,-149.222355,-233.562542,-126.6011,-154.984096,-155.468962,-124.529409,-148.211614,-86.011145
75%,-289.6726,-275.2981,-230.2181,-265.259,-248.4256,-252.0838,-260.421,-301.5197,-256.7557,-213.3616,...,-58.335621,-1.960383,-149.222355,-233.562542,-126.6011,-154.984096,-155.468962,-124.529409,-148.211614,-86.011145
max,-1.679736,-0.8179066,-0.8525168,-0.7131343,-0.6057804,-0.7625684,-1.031201,-0.03125699,-0.3058401,-0.03762626,...,-58.335621,-1.960383,-149.222355,-233.562542,-126.6011,-154.984096,-155.468962,-124.529409,-148.211614,-86.011145


### Checking missing values
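
The `count` row of `df.describe()` above drops to 1 for the last columns, so most rows have no value there. One possible way to quantify this is sketched below on a toy frame; the column names and values are illustrative only.

```python
import numpy as np
import pandas as pd

# Toy frame with increasingly sparse columns (values are made up).
toy = pd.DataFrame({
    "1": [-1.0, -2.0, -3.0, -4.0],
    "2": [-1.5, np.nan, -2.5, -3.5],
    "3": [np.nan, np.nan, np.nan, -9.0],
})

# Fraction of missing values per column.
missing_ratio = toy.isna().mean()

# Columns that are complete in every row.
complete_cols = missing_ratio[missing_ratio == 0].index.tolist()

# Number of non-missing features per row (the effective vector length).
row_lengths = toy.notna().sum(axis=1)
print(missing_ratio, complete_cols, row_lengths.tolist())
```

On the real data, the same calls applied to the feature columns of `df` would show how many columns are usable without imputation.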

### Applying PCA
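
A minimal sketch of this step, assuming the features are first restricted to columns without missing values and then standardized; neither choice is stated in the notebook. It uses plain NumPy (SVD of the standardized data) on synthetic data, whereas `sklearn.decomposition.PCA` would be the usual library alternative. The function name `pca_project` is hypothetical.

```python
import numpy as np
import pandas as pd

def pca_project(features: pd.DataFrame, n_components: int = 2):
    """Project complete, standardized columns onto the top principal components."""
    # Assumption: keep only columns with no missing values.
    X = features.dropna(axis=1).to_numpy(dtype=float)
    # Standardize each column (zero mean, unit variance).
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized matrix; rows of Vt are the principal axes.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt[:n_components].T
    # Share of total variance carried by each component.
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

# Synthetic data: 50 samples, 5 features, one of them incomplete.
rng = np.random.default_rng(0)
toy = pd.DataFrame(rng.normal(size=(50, 5)), columns=list("abcde"))
toy.loc[0, "e"] = np.nan  # column "e" is incomplete and gets dropped

scores, explained = pca_project(toy, n_components=2)
print(scores.shape, explained)
```

Dropping incomplete columns is only one option; imputing missing values before the projection would keep more features at the cost of introducing estimated data.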