# Feature Removal Process

This notebook cell removes eight named variables from the dataset and writes the result to a new CSV file. The process is as follows:

## Function Overview
The `remove_specified_variables()` function performs the following operations:

1. **Path Configuration**
    - Input: `FeaturedDataset_Corrected.csv`
    - Output: `DatasetRemoval.csv`, written to the `C-Feature Removal` directory, which is created if it does not already exist

2. **Variables Targeted for Removal**
    - Age-related variables: `b04_idade`, `bb04_idade_da_mae`
    - Time measurements: `k12_tempo`, `k13_tempo_medida`
    - Specific measurements: `k18_somente`, `k19_somente_medida`
    - Height measurements: `t06_altura_medida2`, `t05_altura_medida1`

3. **Process Steps**
    - Loads the original dataset
    - Checks which variables in the removal list are actually present in the dataset
    - Drops only those variables; names that are absent are skipped without a warning
    - Saves the cleaned dataset to a new file

4. **Output Information**
    - Displays the dimensions of the dataset before and after cleaning
    - Confirms the save location
    - Returns the cleaned DataFrame, or `None` if an error occurs
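
The steps above can be sketched on a small in-memory DataFrame. This is a minimal illustration rather than the notebook's own function: `drop_listed_columns` is a hypothetical helper, and the column `x01_example` is invented for the example. It also reports which requested names were absent, which the steps above do not do.

```python
import pandas as pd


def drop_listed_columns(df, to_remove):
    """Hypothetical helper: drop the listed columns that exist in df.

    Returns the cleaned DataFrame plus the names that were found and
    the names that were missing, so skipped names become visible.
    """
    found = [name for name in to_remove if name in df.columns]
    missing = [name for name in to_remove if name not in df.columns]
    cleaned = df.drop(columns=found)
    return cleaned, found, missing


# Toy data: two columns from the removal list and one invented column.
toy = pd.DataFrame({
    "b04_idade": [31, 45],
    "k12_tempo": [10, 20],
    "x01_example": [1.5, 2.5],  # invented for illustration
})

cleaned, found, missing = drop_listed_columns(
    toy, ["b04_idade", "k12_tempo", "t05_altura_medida1"]
)
print(f"Before: {toy.shape}, after: {cleaned.shape}")
print(f"Removed: {found}")
print(f"Not present: {missing}")
```

Filtering the list before calling `DataFrame.drop` avoids the `KeyError` that `drop` raises by default when a requested column does not exist.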

## Error Handling
The load, drop, and save steps run inside a single `try`/`except Exception` block. On any failure the function prints the error message and returns `None` instead of raising.
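
A broad `except Exception` reports every failure in the same way. As a hedged alternative, the loading step could distinguish the most common failure modes. The sketch below assumes a hypothetical helper named `load_csv_or_none` and a deliberately nonexistent file name; it is not part of the notebook.

```python
import pandas as pd


def load_csv_or_none(path):
    """Hypothetical loader that reports specific failure modes."""
    try:
        return pd.read_csv(path)
    except FileNotFoundError:
        print(f"Input file not found: {path}")
    except pd.errors.EmptyDataError:
        print(f"Input file is empty: {path}")
    except pd.errors.ParserError as exc:
        print(f"Could not parse {path}: {exc}")
    return None


# The file name below is an assumption chosen so that loading fails.
result = load_csv_or_none("this_file_does_not_exist.csv")
```

Exceptions that are not listed, such as a permissions error, would propagate to the caller here, which is a design choice rather than a requirement.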

In [1]:
import pandas as pd
import os

def remove_specified_variables():
    """
    Remove specific variables from the dataset
    """

    # Path configuration
    input_path = '/Users/marcelosilva/Desktop/projectOne/4/B-Intern Feature Engeneering/FeaturedDataset_Corrected.csv'
    output_dir = '/Users/marcelosilva/Desktop/projectOne/4/C-Feature Removal'
    output_path = os.path.join(output_dir, 'DatasetRemoval.csv')

    # Create the output directory if it does not exist
    os.makedirs(output_dir, exist_ok=True)

    try:
        # Load the dataset
        df = pd.read_csv(input_path)
        print(f"Dataset loaded: {df.shape[0]} rows × {df.shape[1]} columns")

        # List of variables to remove
        variables_to_remove = [
            'b04_idade',
            'bb04_idade_da_mae',
            'k12_tempo',
            'k13_tempo_medida',
            'k18_somente',
            'k19_somente_medida',
            't06_altura_medida2',
            't05_altura_medida1'
        ]

        # Check which of them exist in the dataset
        variables_found = [var for var in variables_to_remove if var in df.columns]

        print(f"Removing {len(variables_found)} variables...")

        # Remove the variables
        df_clean = df.drop(columns=variables_found)

        # Save the cleaned dataset
        df_clean.to_csv(output_path, index=False)

        print(f"Dataset saved: {df_clean.shape[0]} rows × {df_clean.shape[1]} columns")
        print(f"Location: {output_path}")

        return df_clean

    except Exception as e:
        # Report any failure and signal it to the caller with None
        print(f"Error: {e}")
        return None

if __name__ == "__main__":
    remove_specified_variables()

Dataset loaded: 4287 rows × 46 columns
Removing 8 variables...
Dataset saved: 4287 rows × 38 columns
Location: /Users/marcelosilva/Desktop/projectOne/4/C-Feature Removal/DatasetRemoval.csv
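
The reported shapes are consistent with each other: 46 columns minus the 8 removed gives 38, and the row count stays at 4287. A check of this kind could be automated. The sketch below uses a hypothetical function `check_removal` and toy frames with an invented column `x01_example`; it does not read the real files.

```python
import pandas as pd


def check_removal(before, after, removed):
    """Hypothetical check: True if `after` equals `before` minus `removed`."""
    same_rows = len(before) == len(after)
    none_left = not any(name in after.columns for name in removed)
    expected_cols = [c for c in before.columns if c not in removed]
    return same_rows and none_left and list(after.columns) == expected_cols


# Toy frames standing in for the dataset before and after cleaning.
before = pd.DataFrame({"b04_idade": [31, 45], "x01_example": [1.5, 2.5]})
after = before.drop(columns=["b04_idade"])

ok = check_removal(before, after, ["b04_idade"])
print(ok)
```

In the notebook, the same function could be called with the original DataFrame, the returned `df_clean`, and the removal list.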
