# Data Type Conversion Analysis

Preprocessing focused on data type conversion for the columns originally stored as `float64`. In addition, the following encoded categorical variables were removed because the dataset already contains their decoded versions:

- `e18_oferecida`: How food was offered to the child
- `e21a_pao`: Type of bread consumed
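Before dropping an encoded variable, it can help to confirm that its decoded columns are actually present. The sketch below is illustrative only: the helper `drop_encoded` is hypothetical, and the mapping from `e18_oferecida` to `e181_pedacos` and `e182_amassada` is an assumption based on the column names, not something the dataset documentation confirms.

```python
import pandas as pd


def drop_encoded(df: pd.DataFrame, encoded_to_decoded: dict) -> pd.DataFrame:
    """Drop each encoded column only if all of its decoded columns exist.

    Hypothetical helper: `encoded_to_decoded` maps an encoded column name
    to the list of decoded column names expected to replace it.
    """
    to_drop = []
    for encoded, decoded in encoded_to_decoded.items():
        missing = [c for c in decoded if c not in df.columns]
        if missing:
            raise KeyError(f"{encoded}: decoded columns missing: {missing}")
        if encoded in df.columns:
            to_drop.append(encoded)
    return df.drop(columns=to_drop)


# Toy data; the encoded/decoded relationship shown here is assumed.
toy = pd.DataFrame({
    "e18_oferecida": [1, 2],
    "e181_pedacos": [1, 0],
    "e182_amassada": [0, 1],
})
result = drop_encoded(toy, {"e18_oferecida": ["e181_pedacos", "e182_amassada"]})
print(list(result.columns))
```

Raising on a missing decoded column keeps the information in the encoded variable from being lost silently.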

The following columns were converted from `float64` to `int64`:

- `e04_agua_com_acucar`: Sugar water consumption
- `e13_fruta_vezes`: Fruit consumption frequency
- `e17_sal_vezes`: Frequency of salty food consumption
- `e181_pedacos`: Food offered in pieces
- `e182_amassada`: Food offered mashed
- `e183_peneira`: Food passed through a sieve
- `e184_liquidificada`: Liquefied food
- `e185_caldo`: Only food broth
- `e189_nao_sabe`: Doesn't know how food was offered

All of these variables represent discrete measurements, so `int64` describes them more directly than `float64`. Both dtypes use 8 bytes per value, so the conversion by itself does not reduce storage.
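A cast from `float64` to `int64` truncates fractional values without warning, so a check before casting may be worthwhile. The following is a minimal sketch, assuming the float columns are expected to hold only finite whole numbers; `to_int64_checked` is a hypothetical helper and the toy values are made up.

```python
import numpy as np
import pandas as pd


def to_int64_checked(df: pd.DataFrame, columns) -> pd.DataFrame:
    """Cast the given columns to int64, refusing lossy conversions.

    Hypothetical helper: raises ValueError if a column holds NaN/inf
    or any value with a fractional part.
    """
    out = df.copy()
    for col in columns:
        values = out[col].to_numpy(dtype="float64")
        if not np.isfinite(values).all():
            raise ValueError(f"{col}: contains NaN or infinite values")
        if not (np.mod(values, 1) == 0).all():
            raise ValueError(f"{col}: contains fractional values")
        out[col] = values.astype("int64")
    return out


# Toy data (made up) using one of the column names listed above.
toy = pd.DataFrame({"e13_fruta_vezes": [0.0, 2.0, 3.0]})
converted = to_int64_checked(toy, ["e13_fruta_vezes"])
print(converted.dtypes)

# A fractional value is rejected instead of being truncated.
try:
    to_int64_checked(pd.DataFrame({"e13_fruta_vezes": [1.5]}), ["e13_fruta_vezes"])
    rejected = False
except ValueError:
    rejected = True
```

If missing values were present and had to be kept, pandas' nullable `Int64` dtype would be one alternative to a plain `int64` cast.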

```python
import pandas as pd

# Load the dataset
df = pd.read_csv('/Users/marcelosilva/Desktop/clustering(0-4)/3-E-Aval/DSBIV.CSV')

# Drop the original encoded variables
df_final = df.drop(columns=['e18_oferecida', 'e21a_pao'])

# Convert ALL float64 columns to int64 (id_anon is already int64 and is unaffected)
float_columns = df_final.select_dtypes(include=['float64']).columns
for col in float_columns:
    df_final[col] = df_final[col].astype('int64')

# Save the final dataset
df_final.to_csv('/Users/marcelosilva/Desktop/clustering(0-4)/3-E-Aval/DSFinal.csv', index=False)

print("✅ Final dataset created!")
print(f"   Rows: {df_final.shape[0]:,}")
print(f"   Columns: {df_final.shape[1]}")
print(f"   Data types: {df_final.dtypes.value_counts().to_dict()}")
print(f"   Float columns converted to int64: {list(float_columns)}")
```

```
✅ Final dataset created!
   Rows: 14,558
   Columns: 47
   Data types: {dtype('int64'): 47}
   Float columns converted to int64: ['e04_agua_com_acucar', 'e13_fruta_vezes', 'e17_sal_vezes', 'e181_pedacos', 'e182_amassada', 'e183_peneira', 'e184_liquidificada', 'e185_caldo', 'e189_nao_sabe']
```
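To see what the conversion does to memory use, the footprint of a column can be measured before and after the cast. This is a small sketch on made-up values, not a measurement of the real dataset; the downcast step with `pd.to_numeric` is an optional extra that the notebook does not perform.

```python
import pandas as pd

# Toy column (values made up) named after one of the converted variables.
as_float = pd.Series([0.0, 1.0, 3.0, 2.0], name="e13_fruta_vezes")
as_int64 = as_float.astype("int64")

# Optional: let pandas pick the smallest signed integer dtype that fits.
as_small = pd.to_numeric(as_int64, downcast="integer")

# Bytes used by the values only (index excluded).
float_bytes = as_float.memory_usage(index=False)
int64_bytes = as_int64.memory_usage(index=False)
small_bytes = as_small.memory_usage(index=False)

print(f"float64: {float_bytes} bytes")
print(f"int64:   {int64_bytes} bytes")
print(f"{as_small.dtype}: {small_bytes} bytes")
```

On this toy column `float64` and `int64` occupy the same space, while the downcast column is smaller. Whether downcasting suits the real data depends on the value range of each column and on what downstream tools expect.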
