# Undergraduate Thesis Project (TCC)
**Author**: Igor Sousa dos Santos Santana

**Date**: 2024-07-26

## Table of Contents

- [Importing Packages and Libraries](#importing-packages-and-libraries)
- [Importing the Datasets](#importing-the-datasets)
- [Brief Overview of the Datasets](#brief-overview-of-the-datasets)
    - [Iris](#iris-dataset)
    - [Titanic](#titanic-dataset)
    - [Heart Disease](#heart-disease-dataset)
- [Cleaning the Datasets](#cleaning-the-datasets)
- [Exploratory Data Analysis](#exploratory-data-analysis)
- [KMeans Algorithm](#using-the-kmeans-algorithm)
- [MeanShift Algorithm](#using-the-meanshift-algorithm)
- [FuzzyCMeans Algorithm](#using-the-fuzzycmeans-algorithm)
- [Bibliography](#bibliography)

## Importing Packages and Libraries

In [75]:
import pandas as pd
import numpy as np

## Importing the Datasets

In [76]:
df_iris_raw = pd.read_csv("./databases/raw/Iris.csv", sep=",", index_col="Id")
df_titanic_processed = pd.read_pickle("./databases/processed/titanic_processado.pkl")
df_coracao_processed = pd.read_pickle("./databases/processed/coracao_processado.pkl")

## Brief Overview of the Datasets

### Iris Dataset

In [77]:
df_iris_raw.sample(10)

| Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 126 | 7.2 | 3.2 | 6.0 | 1.8 | Iris-virginica |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 49 | 5.3 | 3.7 | 1.5 | 0.2 | Iris-setosa |
| 91 | 5.5 | 2.6 | 4.4 | 1.2 | Iris-versicolor |
| 78 | 6.7 | 3.0 | 5.0 | 1.7 | Iris-versicolor |
| 72 | 6.1 | 2.8 | 4.0 | 1.3 | Iris-versicolor |
| 67 | 5.6 | 3.0 | 4.5 | 1.5 | Iris-versicolor |
| 124 | 6.3 | 2.7 | 4.9 | 1.8 | Iris-virginica |
| 11 | 5.4 | 3.7 | 1.5 | 0.2 | Iris-setosa |
| 57 | 6.3 | 3.3 | 4.7 | 1.6 | Iris-versicolor |


In [80]:
df_iris_raw.info()

<class 'pandas.core.frame.DataFrame'>
Index: 150 entries, 1 to 150
Data columns (total 5 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   SepalLengthCm  150 non-null    float64
 1   SepalWidthCm   150 non-null    float64
 2   PetalLengthCm  150 non-null    float64
 3   PetalWidthCm   150 non-null    float64
 4   Species        150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 7.0+ KB


### Titanic Dataset

In [78]:
df_titanic_processed.sample(10)

| PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Fare | Embarked |
|---|---|---|---|---|---|---|---|---|
| 670 | 1 | 1 | Taylor, Mrs. Elmer Zebley (Juliet Cummins Wright) | 0 | 28 | 1 | 52.0 | 2 |
| 671 | 1 | 2 | Brown, Mrs. Thomas William Solomon (Elizabeth ... | 0 | 40 | 1 | 39.0 | 2 |
| 789 | 1 | 3 | Dean, Master. Bertram Vere | 1 | 1 | 1 | 20.575001 | 2 |
| 593 | 0 | 3 | Elsbury, Mr. William James | 1 | 47 | 0 | 7.25 | 2 |
| 499 | 0 | 1 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | 0 | 25 | 1 | 151.550003 | 2 |
| 444 | 1 | 2 | Reynaldo, Ms. Encarnacion | 0 | 28 | 0 | 13.0 | 2 |
| 298 | 0 | 1 | Allison, Miss. Helen Loraine | 0 | 2 | 1 | 151.550003 | 2 |
| 293 | 0 | 2 | Levy, Mr. Rene Jacques | 1 | 36 | 0 | 12.875 | 0 |
| 149 | 0 | 2 | Navratil, Mr. Michel ("Louis M Hoffman") | 1 | 36 | 0 | 26.0 | 2 |
| 34 | 0 | 2 | Wheadon, Mr. Edward H | 1 | 66 | 0 | 10.5 | 2 |


In [81]:
df_titanic_processed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 1473 entries, 1 to 724
Data columns (total 8 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   Survived  1473 non-null   category
 1   Pclass    1473 non-null   category
 2   Name      1473 non-null   string  
 3   Sex       1473 non-null   category
 4   Age       1473 non-null   uint8   
 5   SibSp     1473 non-null   uint8   
 6   Fare      1473 non-null   float32 
 7   Embarked  1473 non-null   category
dtypes: category(4), float32(1), string(1), uint8(2)
memory usage: 37.6 KB
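
The compact dtypes reported above (`category`, `uint8`, `float32`) are available right after loading because the processed frames are stored as pickles, which keep pandas dtypes intact. The sketch below illustrates that point on a small made-up frame; the column values, the file name `toy_processed.pkl`, and the comparison against a CSV round trip are assumptions for illustration, not part of the project's cleaning code.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Toy frame (hypothetical values) using the same kinds of dtypes as above.
df = pd.DataFrame(
    {"Survived": [1, 0, 1], "Age": [28, 47, 2], "Fare": [52.0, 7.25, 151.55]}
).astype({"Survived": "category", "Age": "uint8", "Fare": "float32"})

# Pickle round trip: dtypes are stored together with the data.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_processed.pkl"  # hypothetical file name
    df.to_pickle(path)
    from_pickle = pd.read_pickle(path)

# CSV round trip: only text is stored, so dtypes are inferred again on read.
from_csv = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(from_pickle.dtypes)
print(from_csv.dtypes)
```

With a CSV, the compact dtypes would have to be re-applied (for example with `astype`) every time the file is read.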


### Heart Disease Dataset

In [79]:
df_coracao_processed.sample(10)

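|   | age | sex | cp | trestbps | chol | fbs | restecg | thalach | exang | oldpeak | slope | ca | thal | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 56 | 1 | 1 | 120 | 236 | 0 | 1 | 178 | 0 | 0.8 | 2 | 0 | 2 | 1 |
| 14 | 58 | 0 | 3 | 150 | 283 | 1 | 0 | 162 | 0 | 1.0 | 2 | 0 | 2 | 1 |
| 87 | 46 | 1 | 1 | 101 | 197 | 1 | 1 | 156 | 0 | 0.0 | 2 | 0 | 3 | 1 |
| 144 | 76 | 0 | 2 | 140 | 197 | 0 | 2 | 116 | 0 | 1.1 | 1 | 0 | 2 | 1 |
| 125 | 34 | 0 | 1 | 118 | 210 | 0 | 1 | 192 | 0 | 0.7 | 2 | 0 | 2 | 1 |
| 216 | 62 | 0 | 2 | 130 | 263 | 0 | 1 | 97 | 0 | 1.2 | 1 | 1 | 3 | 0 |
| 106 | 69 | 1 | 3 | 160 | 234 | 1 | 0 | 131 | 0 | 0.1 | 1 | 1 | 2 | 1 |
| 266 | 55 | 0 | 0 | 180 | 327 | 0 | 2 | 117 | 1 | 3.4 | 1 | 0 | 2 | 0 |
| 280 | 42 | 1 | 0 | 136 | 315 | 0 | 1 | 125 | 1 | 1.8 | 1 | 0 | 1 | 0 |
| 91 | 57 | 1 | 0 | 132 | 207 | 0 | 1 | 168 | 1 | 0.0 | 2 | 0 | 3 | 1 |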


In [82]:
df_coracao_processed.info()

<class 'pandas.core.frame.DataFrame'>
Index: 456 entries, 1 to 289
Data columns (total 14 columns):
 #   Column    Non-Null Count  Dtype   
---  ------    --------------  -----   
 0   age       456 non-null    uint8   
 1   sex       456 non-null    category
 2   cp        456 non-null    category
 3   trestbps  456 non-null    uint8   
 4   chol      456 non-null    uint16  
 5   fbs       456 non-null    category
 6   restecg   456 non-null    category
 7   thalach   456 non-null    uint8   
 8   exang     456 non-null    category
 9   oldpeak   456 non-null    float32 
 10  slope     456 non-null    category
 11  ca        456 non-null    category
 12  thal      456 non-null    category
 13  target    456 non-null    category
dtypes: category(9), float32(1), uint16(1), uint8(3)
memory usage: 11.8 KB


## Cleaning the Datasets

[Data cleaning notebook](./limpeza_dados.ipynb)

## Exploratory Data Analysis

The exploratory data analysis (EDA) is kept in a separate notebook:

[Exploratory data analysis notebook](./analise_exploratoria.ipynb)

## Using the KMeans Algorithm

[KMeans algorithm notebook](./kmeans.ipynb)
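
The clustering itself lives in the linked notebook, which is not reproduced here. As a rough, illustrative sketch only (not the notebook's code), scikit-learn's `KMeans` could be applied to the four Iris measurements as follows. The use of `load_iris` (scikit-learn's bundled copy, standing in for `Iris.csv`), the standardization step, and the values `n_clusters=3`, `n_init=10` and `random_state=42` are all assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
X = load_iris().data  # shape (150, 4): sepal/petal length and width

# KMeans relies on Euclidean distance, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# Three clusters is an assumption (one per species); n_init restarts
# the algorithm from several initializations and keeps the best run.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_scaled)

print("cluster sizes:", np.bincount(labels))
print("centers shape:", kmeans.cluster_centers_.shape)
print("inertia:", kmeans.inertia_)
```

The cluster indices are arbitrary, so comparing them with the `Species` column requires a contingency table or a permutation-invariant score rather than a direct equality check.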

## Using the MeanShift Algorithm

[MeanShift algorithm notebook](./meanshift.ipynb)
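
Unlike KMeans, MeanShift does not take the number of clusters as input; it is driven by a bandwidth parameter instead. The following is a minimal sketch under stated assumptions, not the linked notebook's code: it uses scikit-learn's bundled Iris data in place of `Iris.csv`, and the `quantile=0.2` and `bin_seeding=True` settings are illustrative choices.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv.
X = load_iris().data

# Estimate a kernel bandwidth from pairwise distances; the quantile
# value is an illustrative choice and strongly affects the result.
bandwidth = estimate_bandwidth(X, quantile=0.2)

# bin_seeding=True starts from a coarse grid of points instead of
# every sample, which speeds up the search for density modes.
ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
labels = ms.fit_predict(X)

n_clusters = len(ms.cluster_centers_)
print("estimated bandwidth:", bandwidth)
print("number of clusters found:", n_clusters)
```

Because the number of clusters is an outcome rather than an input, it is worth trying a few `quantile` values and checking how stable the result is.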

## Using the FuzzyCMeans Algorithm

[Fuzzy C-Means algorithm notebook](./fcmeans.ipynb)
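
Fuzzy C-Means assigns each sample a degree of membership in every cluster instead of a single hard label. To show the idea without depending on a particular package, here is a small NumPy-only sketch of the standard update rules. It is not the linked notebook's code and not the API of any of the libraries listed in the bibliography; the function name `fuzzy_c_means`, its parameters, and the synthetic two-blob data are all hypothetical.

```python
import numpy as np


def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=150, tol=1e-5, seed=0):
    """Hypothetical helper: plain Fuzzy C-Means with fuzzifier m > 1."""
    rng = np.random.default_rng(seed)
    # Random initial membership matrix; each row sums to 1.
    u = rng.random((X.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centers are membership-weighted means of the samples.
        um = u ** m
        centers = (um.T @ X) / um.sum(axis=0)[:, None]
        # Euclidean distance from every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u


# Synthetic data (assumption): two well-separated 2-D blobs.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),
    rng.normal(5.0, 0.5, size=(50, 2)),
])

centers, u = fuzzy_c_means(X, n_clusters=2)
hard_labels = u.argmax(axis=1)  # hard assignment, if one is needed
print(centers)
print(u[:3])
```

The fuzzifier `m` controls how soft the memberships are: values close to 1 approach hard KMeans-like assignments, while larger values spread membership more evenly across clusters.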

## Bibliography

### SCIKIT-LEARN
- https://scikit-learn.org/stable/

### SCIKIT-LEARN.KMEANS
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

### SCIKIT-LEARN.MEANSHIFT
- https://scikit-learn.org/stable/modules/generated/sklearn.cluster.MeanShift.html
- https://scikit-learn.org/stable/auto_examples/cluster/plot_mean_shift.html

### FUZZY_C_MEANS
- https://pypi.org/project/fuzzy-c-means/
    - https://github.com/omadson/fuzzy-c-means
- https://github.com/ShristiK/Fuzzy-C-Means-Clustering
- https://pythonhosted.org/scikit-fuzzy/auto_examples/plot_cmeans.html
- https://www.kaggle.com/code/prateekk94/fuzzy-c-means-clustering-on-iris-dataset
- https://fda.readthedocs.io/en/latest/index.html
    - https://fda.readthedocs.io/en/latest/modules/ml/autosummary/skfda.ml.clustering.FuzzyCMeans.html
