Loading the JSON file
• Load the file pokemonDB_dataset.json using the json or pandas library.
• Explore its initial structure (keys(), items()) to understand its hierarchy.
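
The task allows either library. As a minimal sketch of the pandas route, assuming the file is a dictionary keyed by Pokémon name (the in-memory `sample` below is a hypothetical stand-in for the real file), `pd.read_json` with `orient="index"` produces the same table as `json.load` followed by `DataFrame.from_dict`:

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for pokemonDB_dataset.json: a dict keyed by Pokémon name.
sample = {
    "Abra": {"Type": "Psychic", "HP Base": "25"},
    "Absol": {"Type": "Dark", "HP Base": "65"},
}
raw = json.dumps(sample)

# Route 1: json library, then build the DataFrame from the dictionary.
data = json.loads(raw)
df_json = pd.DataFrame.from_dict(data, orient="index")

# Route 2: pandas reads the JSON directly; orient="index" makes each key a row.
# dtype=False asks pandas not to infer dtypes, so values stay as stored.
df_pandas = pd.read_json(io.StringIO(raw), orient="index", dtype=False)

print(df_json.shape, df_pandas.shape)
```

With the real file, the path would be passed to `pd.read_json` in place of the `StringIO` object. The json route has the advantage of exposing the raw dictionary for `keys()` and `items()` exploration before any table is built.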

In [1]:
import pandas as pd
import json

# ==========================================
# 1. LOADING THE JSON FILE
# ==========================================

# Load the file using the json library
with open('pokemonDB_dataset.json', 'r', encoding='utf-8') as file:
    data = json.load(file)

# ==========================================
# 2. EXPLORE THE STRUCTURE WITH keys() AND items()
# ==========================================

# Check the type of the root structure (list or dictionary)
print("Structure type:", type(data))

# If it is a dictionary, explore its keys
if isinstance(data, dict):
    print("\nNumber of Pokémon:", len(data))
    print("\nFirst 5 keys:", list(data.keys())[:5])

    # Take the first Pokémon to explore
    primer_key = list(data.keys())[0]
    primer_pokemon = data[primer_key]

# If it is a list, take the first element
elif isinstance(data, list):
    print("\nNumber of Pokémon:", len(data))
    primer_pokemon = data[0]

# Any other root type cannot be explored by the code below
else:
    raise TypeError(f"Unexpected root type: {type(data).__name__}")

# Explore the hierarchy of the first Pokémon using items()
print("\nPokémon fields (keys):")
print(list(primer_pokemon.keys()))

print("\nDetailed structure (items):")
for key, value in primer_pokemon.items():
    # Show each field with its type to understand the hierarchy
    print(f"  {key}: {type(value).__name__} = {value}")

# ==========================================
# 3. TRANSFORM INTO A DATAFRAME
# ==========================================

# Convert the hierarchical structure into a DataFrame
if isinstance(data, dict):
    # Each top-level key (Pokémon name) becomes a row label
    df = pd.DataFrame.from_dict(data, orient='index')
else:
    df = pd.DataFrame(data)

print("\n" + "="*50)
print("DataFrame created:")
print(f"Shape: {df.shape} (rows x columns)")
print("\nFirst rows:")
print(df.head())

Structure type: <class 'dict'>

Number of Pokémon: 1215

First 5 keys: ['Abomasnow', 'Mega Abomasnow', 'Abra', 'Absol', 'Mega Absol']

Pokémon fields (keys):
['Type', 'Species', 'Height', 'Weight', 'Abilities', 'EV Yield', 'Catch Rate', 'Base Friendship', 'Base Exp', 'Growth Rate', 'Egg Groups', 'Gender', 'Egg Cycles', 'HP Base', 'HP Min', 'HP Max', 'Attack Base', 'Attack Min', 'Attack Max', 'Defense Base', 'Defense Min', 'Defense Max', 'Special Attack Base', 'Special Attack Min', 'Special Attack Max', 'Special Defense Base', 'Special Defense Min', 'Special Defense Max', 'Speed Base', 'Speed Min', 'Speed Max']

Detailed structure (items):
  Type: str = Grass, Ice
  Species: str = Frost Tree Pokémon
  Height: str = 2.2 m (7′03″)
  Weight: str = 135.5 kg (298.7 lbs)
  EV Yield: str = 1 Attack, 1 Sp. Atk
  Catch Rate: str = 60 (7.8% with PokéBall, full HP)
  Base Friendship: str = 50 (normal)
  Base Exp: str = 173
  Growth Rate: str = Slow
  Egg Groups: str = Grass, 
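
The item listing above shows that every field is a string, with measurements carrying their units (`2.2 m (7′03″)`) and types packed into one comma-separated value (`Grass, Ice`). As a rough sketch of how such fields can be taken apart with pandas string methods, the block below uses a tiny hand-built frame; the Abomasnow row copies the values shown above, while the second row and the variable names are illustrative assumptions.

```python
import pandas as pd

# Tiny stand-in for the real DataFrame; the second row is illustrative.
sample = pd.DataFrame(
    {
        "Type": ["Grass, Ice", "Psychic"],
        "Height": ["2.2 m (7′03″)", "0.9 m (2′11″)"],
        "Weight": ["135.5 kg (298.7 lbs)", "19.5 kg (43.0 lbs)"],
    },
    index=["Abomasnow", "Abra"],
)

# Capture the first integer or decimal number in each string.
leading_number = r"(\d+(?:\.\d+)?)"
height_m = sample["Height"].str.extract(leading_number, expand=False).astype(float)
weight_kg = sample["Weight"].str.extract(leading_number, expand=False).astype(float)

# Split on the first comma; single-type rows get a missing second type.
types = sample["Type"].str.split(",", n=1, expand=True).reindex(columns=[0, 1])
type1 = types[0].str.strip()
type2 = types[1].str.strip()

print(pd.DataFrame({"Type1": type1, "Type2": type2,
                    "Height_m": height_m, "Weight_kg": weight_kg}))
```

Only the first number is captured, so the metric value is kept and the imperial value in parentheses is ignored.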

In [4]:
# ==========================================
# 2. REVIEW OF STRUCTURE AND TYPES
# ==========================================

print("="*60)
print("REVIEW OF STRUCTURE AND DATA TYPES")
print("="*60)

# Use df.info() to identify the data types
print("\n📊 df.info() - General information about the DataFrame:\n")
df.info()

# Use df.describe(include='all') for statistics
print("\n" + "="*60)
print("\n📈 df.describe(include='all') - Descriptive statistics:\n")
print(df.describe(include='all'))

# Determine how many variables are numeric and how many are categorical
print("\n" + "="*60)
print("VARIABLE CLASSIFICATION")
print("="*60)

# Count numeric variables
num_numericas = df.select_dtypes(include=['int64', 'float64']).shape[1]
columnas_numericas = df.select_dtypes(include=['int64', 'float64']).columns.tolist()

# Count categorical variables (object)
num_categoricas = df.select_dtypes(include=['object']).shape[1]
columnas_categoricas = df.select_dtypes(include=['object']).columns.tolist()

# Show the results
print(f"\n✅ Total NUMERIC variables: {num_numericas}")
print("   Columns:")
for col in columnas_numericas:
    print(f"   • {col}")

print(f"\n✅ Total CATEGORICAL variables: {num_categoricas}")
print("   Columns:")
for col in columnas_categoricas:
    print(f"   • {col}")

# Final summary
print("\n" + "="*60)
print(f"SUMMARY: {num_numericas} numeric + {num_categoricas} categorical = {df.shape[1]} total variables")
print("="*60)

REVIEW OF STRUCTURE AND DATA TYPES

📊 df.info() - General information about the DataFrame:

<class 'pandas.core.frame.DataFrame'>
Index: 1215 entries, Abomasnow to Zygarde Complete Forme
Data columns (total 31 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Type                  1215 non-null   object
 1   Species               1215 non-null   object
 2   Height                1215 non-null   object
 3   Weight                1215 non-null   object
 4   Abilities             1215 non-null   object
 5   EV Yield              1215 non-null   object
 6   Catch Rate            1215 non-null   object
 7   Base Friendship       1215 non-null   object
 8   Base Exp              1215 non-null   object
 9   Growth Rate           1215 non-null   object
 10  Egg Groups            1215 non-null   object
 11  Gender                1215 non-null   object
 12  Egg Cycles            1215 non-null   object
 13  HP Base              

The next cells list all the columns and their types in order to decide which ones need to be converted to numeric values.

In [5]:
# List all the columns
print("DataFrame columns:")
print(df.columns.tolist())


DataFrame columns:
['Type', 'Species', 'Height', 'Weight', 'Abilities', 'EV Yield', 'Catch Rate', 'Base Friendship', 'Base Exp', 'Growth Rate', 'Egg Groups', 'Gender', 'Egg Cycles', 'HP Base', 'HP Min', 'HP Max', 'Attack Base', 'Attack Min', 'Attack Max', 'Defense Base', 'Defense Min', 'Defense Max', 'Special Attack Base', 'Special Attack Min', 'Special Attack Max', 'Special Defense Base', 'Special Defense Min', 'Special Defense Max', 'Speed Base', 'Speed Min', 'Speed Max']


In [6]:
# Show the columns and their types
print("Columns and data types:")
print(df.dtypes)


Columns and data types:
Type                    object
Species                 object
Height                  object
Weight                  object
Abilities               object
EV Yield                object
Catch Rate              object
Base Friendship         object
Base Exp                object
Growth Rate             object
Egg Groups              object
Gender                  object
Egg Cycles              object
HP Base                 object
HP Min                  object
HP Max                  object
Attack Base             object
Attack Min              object
Attack Max              object
Defense Base            object
Defense Min             object
Defense Max             object
Special Attack Base     object
Special Attack Min      object
Special Attack Max      object
Special Defense Base    object
Special Defense Min     object
Special Defense Max     object
Speed Base              object
Speed Min               object
Speed Max               object
dtype: objec
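
Because every column is reported as `object`, the dtype alone does not say which columns hold numbers. One possible way to decide, sketched below on a hand-built sample, is to test how each column behaves under `pd.to_numeric(..., errors="coerce")`. The helper `classify` and its three labels are hypothetical names introduced for this illustration, and the digit check is only a heuristic.

```python
import pandas as pd

# Hand-built sample in the style of the dataset; the second row is illustrative.
sample = pd.DataFrame(
    {
        "Species": ["Frost Tree Pokémon", "Psi Pokémon"],
        "Weight": ["135.5 kg (298.7 lbs)", "19.5 kg (43.0 lbs)"],
        "HP Base": ["90", "25"],
    },
    index=["Abomasnow", "Abra"],
)

def classify(column):
    """Label a string column by how easily it converts to numbers."""
    parsed = pd.to_numeric(column, errors="coerce")  # unparseable values become NaN
    if parsed.notna().all():
        return "numeric as-is"       # e.g. "90" parses directly
    if column.astype(str).str.contains(r"\d").any():
        return "needs cleaning"      # digits present but mixed with units or text
    return "text"                    # no digits at all

summary = sample.apply(classify)
print(summary)
```

Columns labelled "needs cleaning" still require a manual look, since free-text fields that merely contain digits would receive the same label.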

In [8]:
# ==========================================
# 3. CLEANING AND CONVERSION OF FIELDS
# ==========================================

import re

print("="*60)
print("CLEANING AND CONVERSION OF FIELDS")
print("="*60)

# Work on a copy of the DataFrame
df_clean = df.copy()

# ==========================================
# STEP 1: Convert numeric columns
# ==========================================

print("\n🔧 Converting columns to float...\n")

# Helper to clean numeric values
def limpiar_numerico(valor):
    """Strip units (m, kg, lbs, %) and return the first number found as a float, or None."""
    if pd.isna(valor):
        return None
    # Find every integer or decimal number in the text; only the first one is used
    numeros = re.findall(r'\d+\.?\d*', str(valor))
    if numeros:
        return float(numeros[0])
    return None

# Identify the columns that should be numeric (substring match on the lowercased name).
# Note: 'sp' is also a substring of 'species', so the text column Species is matched
# as well and ends up entirely null after conversion.
palabras_clave = ['hp', 'attack', 'defense', 'speed', 'sp', 'height', 'weight', 'base', 'exp']
columnas_numericas = [col for col in df_clean.columns 
                      if any(palabra in col.lower() for palabra in palabras_clave)]

# Convert each column
for col in columnas_numericas:
    df_clean[col] = df_clean[col].apply(limpiar_numerico)
    print(f"   ✓ {col} converted to float")

# Rename Height and Weight so the names carry the unit
if 'Height' in df_clean.columns:
    df_clean.rename(columns={'Height': 'Height_m'}, inplace=True)
    print("   ✓ 'Height' renamed to 'Height_m'")

if 'Weight' in df_clean.columns:
    df_clean.rename(columns={'Weight': 'Weight_kg'}, inplace=True)
    print("   ✓ 'Weight' renamed to 'Weight_kg'")

# ==========================================
# STEP 2: Split compound columns
# ==========================================

print("\n🔧 Splitting compound columns...\n")

# Split Type → Type1, Type2 (Type2 is None for single-type Pokémon)
if 'Type' in df_clean.columns:
    df_clean['Type1'] = df_clean['Type'].apply(
        lambda x: str(x).split(',')[0].strip() if pd.notna(x) else None
    )
    df_clean['Type2'] = df_clean['Type'].apply(
        lambda x: str(x).split(',')[1].strip() if pd.notna(x) and ',' in str(x) else None
    )
    df_clean.drop('Type', axis=1, inplace=True)
    print("   ✓ 'Type' split into Type1 and Type2")

# Split Gender → Male (%), Female (%) (both None when the value has no comma)
if 'Gender' in df_clean.columns:
    df_clean['Male (%)'] = df_clean['Gender'].apply(
        lambda x: limpiar_numerico(str(x).split(',')[0]) if pd.notna(x) and ',' in str(x) else None
    )
    df_clean['Female (%)'] = df_clean['Gender'].apply(
        lambda x: limpiar_numerico(str(x).split(',')[1]) if pd.notna(x) and ',' in str(x) else None
    )
    df_clean.drop('Gender', axis=1, inplace=True)
    print("   ✓ 'Gender' split into Male (%) and Female (%)")

# ==========================================
# STEP 3: Final verification
# ==========================================

print("\n" + "="*60)
print("FINAL VERIFICATION")
print("="*60)

# Count null values
valores_nulos = df_clean.isnull().sum().sum()
print(f"\n🔍 Total null values: {valores_nulos}")

# Count numeric and categorical variables
num_numericas = df_clean.select_dtypes(include=['int64', 'float64']).shape[1]
num_categoricas = df_clean.select_dtypes(include=['object']).shape[1]

print("\n📊 Variables after cleaning:")
print(f"   • Numeric: {num_numericas}")
print(f"   • Categorical: {num_categoricas}")
print(f"   • Total: {df_clean.shape[1]}")

print("\n✅ Clean DataFrame created: df_clean")
print(f"   Shape: {df_clean.shape} (rows x columns)")

# Show the first rows
print("\n📋 First 3 rows of the clean DataFrame:")
print(df_clean.head(3))

CLEANING AND CONVERSION OF FIELDS

🔧 Converting columns to float...

   ✓ Species converted to float
   ✓ Height converted to float
   ✓ Weight converted to float
   ✓ Base Friendship converted to float
   ✓ Base Exp converted to float
   ✓ HP Base converted to float
   ✓ HP Min converted to float
   ✓ HP Max converted to float
   ✓ Attack Base converted to float
   ✓ Attack Min converted to float
   ✓ Attack Max converted to float
   ✓ Defense Base converted to float
   ✓ Defense Min converted to float
   ✓ Defense Max converted to float
   ✓ Special Attack Base converted to float
   ✓ Special Attack Min converted to float
   ✓ Special Attack Max converted to float
   ✓ Special Defense Base converted to float
   ✓ Special Defense Min converted to float
   ✓ Special Defense Max converted to float
   ✓ Speed Base converted to float
   ✓ Speed Min converted to float
   ✓ Speed Max converted to float
   ✓ 'Height' renamed to 'Heig
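
The log above lists Species among the columns converted to float. That happens because the keyword `'sp'` is tested as a substring of the lowercased column name, and `'species'` contains it. A small sketch of the difference between substring matching and whole-word matching follows; the shortened column list and the names `substring_hits`, `pattern` and `word_hits` are assumptions made for this illustration.

```python
import re

# A few column names taken from the DataFrame.
columns = ["Type", "Species", "Height", "Weight", "Base Exp",
           "HP Base", "Special Attack Base", "Speed Base"]

# Substring test: 'sp' occurs inside 'species', 'special' and 'speed'.
substring_hits = [c for c in columns if "sp" in c.lower()]

# Whole-word test: a keyword must match a complete word of the column name.
# 'sp' is dropped because 'attack', 'defense' and 'speed' already cover the stat columns.
keywords = ["hp", "attack", "defense", "speed", "height", "weight", "base", "exp"]
pattern = re.compile(r"\b(?:" + "|".join(keywords) + r")\b", flags=re.IGNORECASE)
word_hits = [c for c in columns if pattern.search(c)]

print("Substring 'sp' matches:", substring_hits)
print("Whole-word matches:   ", word_hits)
```

An explicit list of column names would be an even simpler alternative when the schema is fixed, as it is here.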

In [9]:
# ==========================================
# 4. NORMALIZATION AND FINAL VERIFICATION
# ==========================================

print("="*60)
print("NORMALIZATION AND FINAL VERIFICATION")
print("="*60)

# ==========================================
# STEP 1: Check for null values and duplicates
# ==========================================

print("\n🔍 Checking for null values...\n")

# Count null values per column
valores_nulos = df_clean.isnull().sum()

# Show only the columns that contain null values
if valores_nulos.sum() > 0:
    print("Columns with null values:")
    for col, nulos in valores_nulos[valores_nulos > 0].items():
        print(f"   • {col}: {nulos} nulls")
    print(f"\n   Total null values: {valores_nulos.sum()}")
else:
    print("   ✅ No null values")

# Check for duplicates (rows are compared by column values; the index is ignored)
print("\n🔍 Checking for duplicated rows...\n")
duplicados = df_clean.duplicated().sum()
print(f"   Duplicated rows: {duplicados}")

if duplicados > 0:
    print(f"   ⚠️ Found {duplicados} duplicated rows")
else:
    print("   ✅ No duplicated rows")

# ==========================================
# STEP 2: Build the final DataFrame with specific fields
# ==========================================

print("\n" + "="*60)
print("CREATING FINAL DATAFRAME")
print("="*60)

# Fields required by the assignment
campos_requeridos = [
    "Type1", "Type2", "HP Base", "Attack Base", "Defense Base",
    "Speed Base", "Height_m", "Weight_kg", "Base Exp"
]

# Check which fields exist
campos_disponibles = [col for col in campos_requeridos if col in df_clean.columns]
campos_faltantes = [col for col in campos_requeridos if col not in df_clean.columns]

print(f"\n📋 Available fields: {len(campos_disponibles)}/{len(campos_requeridos)}")

if campos_faltantes:
    print(f"\n⚠️ Missing fields: {campos_faltantes}")

# Keep only the required fields; .copy() makes df_clean an independent DataFrame
df_clean = df_clean[campos_disponibles].copy()

print(f"\n✅ Final DataFrame created with {len(campos_disponibles)} fields")
print(f"   Shape: {df_clean.shape} (rows x columns)")

# ==========================================
# STEP 3: Final verification of the DataFrame
# ==========================================

print("\n" + "="*60)
print("FINAL DATAFRAME INFORMATION")
print("="*60)

# Show general information
print("\n📊 df_clean.info():\n")
df_clean.info()

# Show descriptive statistics
print("\n📈 df_clean.describe():\n")
print(df_clean.describe())

# Show the first rows
print("\n📋 First 5 rows of the final DataFrame:\n")
print(df_clean.head())

# Final summary
print("\n" + "="*60)
print("FINAL SUMMARY")
print("="*60)

num_numericas_final = df_clean.select_dtypes(include=['int64', 'float64']).shape[1]
num_categoricas_final = df_clean.select_dtypes(include=['object']).shape[1]

print("\n✅ DataFrame df_clean ready for analysis")
print(f"   • Total rows (Pokémon): {df_clean.shape[0]}")
print(f"   • Total columns: {df_clean.shape[1]}")
print(f"   • Numeric variables: {num_numericas_final}")
print(f"   • Categorical variables: {num_categoricas_final}")
print(f"   • Null values: {df_clean.isnull().sum().sum()}")
print(f"\n📁 Final columns: {list(df_clean.columns)}")

print("\n" + "="*60)
print("✅ STAGE 1 COMPLETED")
print("="*60)

NORMALIZATION AND FINAL VERIFICATION

🔍 Checking for null values...

Columns with null values:
   • Species: 1215 nulls
   • Weight_kg: 1 nulls
   • Base Friendship: 23 nulls
   • Base Exp: 23 nulls
   • Type2: 546 nulls
   • Male (%): 197 nulls
   • Female (%): 197 nulls

   Total null values: 2202

🔍 Checking for duplicated rows...

   Duplicated rows: 7
   ⚠️ Found 7 duplicated rows

CREATING FINAL DATAFRAME

📋 Available fields: 9/9

✅ Final DataFrame created with 9 fields
   Shape: (1215, 9) (rows x columns)

FINAL DATAFRAME INFORMATION

📊 df_clean.info():

<class 'pandas.core.frame.DataFrame'>
Index: 1215 entries, Abomasnow to Zygarde Complete Forme
Data columns (total 9 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Type1         1215 non-null   object 
 1   Type2         669 non-null    object 
 2   HP Base       1215 non-null   float64
 3   Attack Base   1215 non-null 
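
The report above finds 7 duplicated rows, and the final shape of (1215, 9) shows that they are kept. `DataFrame.duplicated()` compares column values only and ignores the index, so two differently named entries with identical values are flagged as duplicates of each other. The sketch below, on made-up rows with hypothetical names, shows one way to inspect such groups before deciding whether to drop them.

```python
import pandas as pd

# Made-up rows: two differently named entries share identical column values.
sample = pd.DataFrame(
    {"Type1": ["Fire", "Fire", "Water"], "HP Base": [78.0, 78.0, 44.0]},
    index=["Form A", "Form B", "Other"],
)

# keep=False marks every member of a duplicated group, not only the repeats.
dup_mask = sample.duplicated(keep=False)
print("Rows involved in duplication:")
print(sample[dup_mask])

# Dropping keeps the first occurrence of each group; the index label survives.
deduped = sample.drop_duplicates(keep="first")
print("\nAfter drop_duplicates:", list(deduped.index))
```

Whether to drop is a modelling decision: if the duplicated rows are distinct Pokémon forms that happen to share stats, keeping them may be the right choice. The count is also worth repeating after the column subset is taken, since fewer columns can only make more rows coincide.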