[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/repos-especializacion-UdeA/data-raw/blob/main/notebooks/features_extraction.ipynb)

# Feature extraction

This notebook performs a simple exploration of a MATLAB file that stores the data recorded by a sensor.

In [None]:
# Run this only the first time, if these packages are not installed yet
import sys
!{sys.executable} -m pip install -U "ydata-profiling[notebook]"
!{sys.executable} -m pip install jupyter-contrib-nbextensions


In [3]:
!jupyter nbextension enable --py widgetsnbextension

Enabling notebook extension jupyter-js-widgets/extension...
      - Validating: ok


## 1. Libraries and initial setup

In [4]:
import sys
import os
import zipfile

# Local path of the dataset, and the remote URL used when running on Colab
data_set = "./datasets/features_data_set.csv"
url_data_set = 'https://raw.githubusercontent.com/repos-especializacion-UdeA/data-raw/refs/heads/main/notebooks/datasets/features_data_set.csv'
try:
    # This import only succeeds inside Google Colab
    import google.colab
    try:
        import scipy.io
    except ImportError:
        !pip install scipy
    # On Colab the local file is not available, so read the CSV from the URL
    data_set = url_data_set
except ImportError:
    # Local run: keep the local dataset path
    ruta_base = './'
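
The environment check above can also be written as a small helper. The following is a minimal sketch, not the notebook's own method: the function name `resolve_dataset_source` and its `in_colab` parameter are hypothetical, the check `"google.colab" in sys.modules` assumes the Colab runtime has already imported that package, and the fallback to the URL when the local file is missing is an added assumption.

```python
import os
import sys


def resolve_dataset_source(local_path, remote_url, in_colab=None):
    """Return the location the dataset should be read from.

    Hypothetical helper: prefers the remote URL on Colab, or when the
    local file does not exist; otherwise returns the local path.
    """
    if in_colab is None:
        # Assumption: on Colab, google.colab is already loaded in sys.modules
        in_colab = "google.colab" in sys.modules
    if in_colab or not os.path.exists(local_path):
        return remote_url
    return local_path


# Example call with the two locations used in this notebook
source = resolve_dataset_source(
    "./datasets/features_data_set.csv",
    "https://raw.githubusercontent.com/repos-especializacion-UdeA/data-raw/refs/heads/main/notebooks/datasets/features_data_set.csv",
)
print(source)
```

Passing `in_colab` explicitly makes the helper easy to exercise outside Colab.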


In [14]:
# command to view figures in Jupyter notebook
# %matplotlib inline 

# Data handling
# ==============================================================================
import pandas as pd
import numpy as np
import scipy as sc
from ydata_profiling import ProfileReport

# Caching function results on disk
# ==============================================================================
import joblib


# Library management
# ==============================================================================
from importlib import reload

# Mathematics and statistics
# ==============================================================================
import math

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from matplotlib import style
import seaborn as sns


# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('ignore')

# Formatting and style
# ==============================================================================
from IPython.display import Markdown, display

# scipy library and components
# ==============================================================================
import scipy.io
from scipy import signal


## 2. Functions

In [6]:
# Utility functions
# ==============================================================================

# To Do...
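
The utility functions are still pending in this notebook. As an illustration only, the sketch below shows what window-based feature helpers for an EMG channel might look like. The function names (`sliding_windows`, `rms`, `mav`), the window size and the step are all hypothetical and are not taken from the original notebook.

```python
import numpy as np


def sliding_windows(x, size, step):
    """Split a 1-D signal into (possibly overlapping) windows of `size` samples."""
    x = np.asarray(x, dtype=float)
    if x.size < size:
        # Signal shorter than one window: no windows at all
        return np.empty((0, size))
    starts = np.arange(0, x.size - size + 1, step)
    return np.stack([x[s:s + size] for s in starts])


def rms(windows):
    """Root mean square of each window (one value per row)."""
    return np.sqrt(np.mean(windows ** 2, axis=1))


def mav(windows):
    """Mean absolute value of each window (one value per row)."""
    return np.mean(np.abs(windows), axis=1)


# Toy signal standing in for one EMG channel
toy_signal = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
toy_windows = sliding_windows(toy_signal, size=4, step=2)
print(toy_windows.shape, rms(toy_windows), mav(toy_windows))
```

In the notebook, a column such as `df['emg_1'].to_numpy()` could be passed in place of the toy signal, ideally one subject and repetition at a time so that windows do not span different recordings.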


## 3. Loading the dataset

Next, the complete dataset is loaded.

In [7]:
# Load the dataset
df = pd.read_csv(data_set)

Next, we check that the dataset was loaded correctly:

In [9]:
df.head()

| Unnamed: 0 | s | emg_1 | emg_2 | emg_3 | emg_4 | emg_5 | emg_6 | emg_7 | emg_8 | emg_9 | emg_10 | rep | label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0.05251 | 0.002414 | 0.002445 | 0.002417 | 0.0024 | 0.006204 | 0.0024 | 0.041218 | 0.0024 | 0.019526 | 0 | 0 |
| 1 | 1 | 0.038543 | 0.00244 | 0.002513 | 0.002443 | 0.002426 | 0.002803 | 0.0024 | 0.029789 | 0.0024 | 0.005035 | 0 | 0 |
| 2 | 1 | 0.035662 | 0.002448 | 0.002564 | 0.002446 | 0.002478 | 0.001975 | 0.0024 | 0.025287 | 0.0024 | 0.000813 | 0 | 0 |
| 3 | 1 | 0.037038 | 0.002425 | 0.002542 | 0.00242 | 0.002526 | 0.002129 | 0.0024 | 0.026216 | 0.0024 | 0.001485 | 0 | 0 |
| 4 | 1 | 0.035718 | 0.002404 | 0.002478 | 0.002401 | 0.002542 | 0.002346 | 0.0024 | 0.026433 | 0.0024 | 0.002234 | 0 | 0 |


In [25]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2731393 entries, 0 to 2731392
Data columns (total 13 columns):
 #   Column  Dtype  
---  ------  -----  
 0   s       int64  
 1   emg_1   float64
 2   emg_2   float64
 3   emg_3   float64
 4   emg_4   float64
 5   emg_5   float64
 6   emg_6   float64
 7   emg_7   float64
 8   emg_8   float64
 9   emg_9   float64
 10  emg_10  float64
 11  rep     int64  
 12  label   int64  
dtypes: float64(10), int64(3)
memory usage: 270.9 MB


There are 13 columns in total and none of them has missing values, so no data imputation is needed. The dataset is large, though: 2,731,393 rows and about 270.9 MB in memory.
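
The `df.info()` output shown above does not list non-null counts, so the absence of missing values is best confirmed explicitly. The sketch below shows one way to do that. It uses a small stand-in frame named `demo` (hypothetical, with one missing value inserted on purpose); in the notebook the same calls would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; in the notebook this would be `df`
demo = pd.DataFrame({
    "emg_1": [0.05, np.nan, 0.03],   # one missing value, on purpose
    "emg_2": [0.002, 0.002, 0.002],
    "label": [0, 0, 1],
})

# Number of missing values in each column
missing_per_column = demo.isna().sum()
# True if at least one column contains a missing value
has_missing = bool(missing_per_column.any())
# Total memory footprint in MB, including the index
memory_mb = demo.memory_usage(deep=True).sum() / 1024 ** 2

print(missing_per_column)
print(f"Any missing values: {has_missing}")
print(f"Memory usage: {memory_mb:.4f} MB")
```

On the real dataset, `df.isna().sum()` should return zero for all 13 columns if the statement above holds. Recent pandas versions also accept `df.info(show_counts=True)` to force the non-null counts to be displayed.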

In [10]:
len(df.columns)

13

In [11]:
# Convert to categorical
df['s'] = pd.Categorical(df['s'])
df['rep'] = pd.Categorical(df['rep'])
df['label'] = pd.Categorical(df['label'])
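
With this many rows, converting the identifier columns to categorical is also likely to reduce memory use. The sketch below measures that effect on a small stand-in frame named `demo` (hypothetical data). It uses `DataFrame.astype` with a dictionary, which is an alternative to the three `pd.Categorical` assignments above and not the notebook's own code.

```python
import pandas as pd

# Stand-in frame with the same kind of columns as the dataset
demo = pd.DataFrame({
    "s": [1, 1, 2, 2] * 250,
    "rep": [0, 1, 2, 3] * 250,
    "label": [0, 0, 1, 1] * 250,
    "emg_1": [0.05, 0.04, 0.03, 0.02] * 250,
})
cat_cols = ["s", "rep", "label"]

before = demo.memory_usage(deep=True).sum()
# Convert several columns in a single call
demo = demo.astype({col: "category" for col in cat_cols})
after = demo.memory_usage(deep=True).sum()

print(demo.dtypes)
print(f"Memory before: {before} bytes, after: {after} bytes")
```

The saving comes from storing small integer codes instead of 64-bit integers, so it is largest when a column has few distinct values.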

We verify that the changes to the DataFrame took effect.

In [12]:
# Lists of categorical and numerical variables
catCols = df.select_dtypes(include = ['object', 'category']).columns.tolist()
print(f"Categorical variables: {catCols}")
numCols = df.select_dtypes(include = ['float64','int32','int64']).columns.tolist()
print(f"Numerical variables: {numCols}")

Categorical variables: ['s', 'rep', 'label']
Numerical variables: ['emg_1', 'emg_2', 'emg_3', 'emg_4', 'emg_5', 'emg_6', 'emg_7', 'emg_8', 'emg_9', 'emg_10']


## EDA

In [None]:
profile = ProfileReport(df, title="Pandas Profiling Report")

In [None]:
profile.to_notebook_iframe()
# profile.to_widgets()  # This froze the machine
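
Profiling all 2,731,393 rows is expensive, which may be why the widget view froze the machine. One possible mitigation, sketched below under stated assumptions, is to profile a stratified sample instead of the full frame. The frame `demo`, the 5% fraction and the random seed are hypothetical choices; `ProfileReport` also accepts `minimal=True` to skip the costliest computations, shown here only as a comment.

```python
import numpy as np
import pandas as pd

# Stand-in data; in the notebook this would be `df`
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "emg_1": rng.random(10_000),
    "label": rng.integers(0, 4, size=10_000),
})

# Draw 5% of the rows from each label so every class stays represented
sample = demo.groupby("label", observed=True).sample(frac=0.05, random_state=0)

print(len(demo), len(sample))
print(sample["label"].value_counts().sort_index())

# The sample could then be profiled instead of the full frame, for example:
# profile = ProfileReport(sample, title="Pandas Profiling Report", minimal=True)
```

A report built from a sample only approximates the statistics of the full dataset, so any finding that matters should be rechecked on `df` directly.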

### Saving the EDA report

In [None]:
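# Create the output directories in case they do not exist yet
os.makedirs("./html_report", exist_ok=True)
os.makedirs("./json_report", exist_ok=True)

# Export to HTML
profile.to_file("./html_report/report_EDA.html")

# As a JSON string
json_data = profile.to_json()

# As a file
profile.to_file("./json_report/report_EDA.json")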
