# 📊 02 - Generate Sweetviz Reports (Dynamic)

**TFM (Master's thesis): Predicting University Dropout**

This notebook automatically generates:
- A Sweetviz report for EVERY table it finds in data/02_interim/
- A DYNAMIC transformations HTML page (it adapts to whichever tables are present)

**Author:** María José Morte (morte@uji.es)

## 1. Initial Configuration

In [1]:
# =============================================================================
# INITIAL CONFIGURATION
# =============================================================================

import pandas as pd
import numpy as np
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

# Make ../src importable so the project's config module can be loaded
import sys
sys.path.append('../src')

try:
    from config import DATA_INTERIM, DOCS, info_entorno
    info_entorno()
except ImportError:
    # Fallback when src/config.py is unavailable: paths relative to the working directory
    DATA_INTERIM = Path('../data/02_interim')
    DOCS = Path('../docs')
    print("Usando rutas por defecto")

# Create the docs folder if it does not exist
DOCS.mkdir(parents=True, exist_ok=True)

print("\n✅ Configuración cargada")

INFORMACIÓN DEL ENTORNO
Entorno: local
Raíz proyecto: C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_
Data RAW: C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\data\01_raw
Data INTERIM: C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\data\02_interim
Data PROCESSED: C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\data\03_processed
Docs: C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs

✅ Configuración cargada
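When `config` cannot be imported, the fallback builds `../data/02_interim` and `../docs` relative to the kernel's working directory, so it only behaves as intended when the notebook is started from a folder one level below the project root. A minimal sketch, assuming the layout shown in the output above (a `src/` folder at the project root, next to `data/` and `docs/`); `buscar_raiz` and `rutas_por_defecto` are hypothetical helpers, not part of the project's `config` module:

```python
from pathlib import Path


def buscar_raiz(inicio: Path, marcador: str = "src") -> Path:
    """Hypothetical helper: walk up from `inicio` to the first folder containing `marcador`."""
    inicio = Path(inicio).resolve()
    for carpeta in (inicio, *inicio.parents):
        if (carpeta / marcador).is_dir():
            return carpeta
    raise FileNotFoundError(f"No '{marcador}' folder found above {inicio}")


def rutas_por_defecto(raiz: Path) -> dict[str, Path]:
    """Hypothetical helper: default paths anchored on an explicit project root.

    Anchoring on the root (instead of '..') makes the result independent
    of the directory the notebook is launched from.
    """
    raiz = Path(raiz).resolve()
    return {
        "DATA_INTERIM": raiz / "data" / "02_interim",
        "DOCS": raiz / "docs",
    }
```

In the `except ImportError` branch this could replace the two hard-coded assignments, e.g. `rutas = rutas_por_defecto(buscar_raiz(Path.cwd()))`.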


## 2. Install/Patch Sweetviz

In [2]:
# =============================================================================
# INSTALL AND PATCH SWEETVIZ
# =============================================================================

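# Compatibility patch: NumPy 2.x removed the top-level alias
# np.VisibleDeprecationWarning (it now lives in np.exceptions),
# while Sweetviz still references it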
import numpy as np
if not hasattr(np, 'VisibleDeprecationWarning'):
    np.VisibleDeprecationWarning = np.exceptions.VisibleDeprecationWarning

try:
    import sweetviz as sv
    print(f"✅ sweetviz ya instalado (version {sv.__version__})")
except ImportError:
    # %pip installs into the environment of the running kernel
    print("Instalando sweetviz...")
    %pip install sweetviz -q
    import sweetviz as sv
    print(f"✅ sweetviz instalado (version {sv.__version__})")

✅ sweetviz ya instalado (version 2.3.1)
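The patch in cell 2 restores a top-level alias that NumPy 2.x moved into `np.exceptions`. A small reusable version of the same idea, sketched with stand-in namespaces so it can be tried without NumPy; `asegurar_alias` is a hypothetical name, and the `fake_*` objects only imitate `numpy` and `numpy.exceptions`:

```python
import types


def asegurar_alias(modulo, nombre: str, origen):
    """Hypothetical helper: expose `origen.nombre` as `modulo.nombre` if it is missing.

    It never overwrites an existing attribute, so it is a no-op on library
    versions that still provide the alias. `origen` may be None.
    """
    if not hasattr(modulo, nombre) and hasattr(origen, nombre):
        setattr(modulo, nombre, getattr(origen, nombre))
    return getattr(modulo, nombre, None)


# Stand-ins for `numpy` and `numpy.exceptions` (illustration only)
class VisibleDeprecationWarning(UserWarning):
    pass


fake_exceptions = types.SimpleNamespace(VisibleDeprecationWarning=VisibleDeprecationWarning)
fake_np = types.SimpleNamespace(exceptions=fake_exceptions)

asegurar_alias(fake_np, "VisibleDeprecationWarning", fake_np.exceptions)

# With real NumPy, guarding against old versions that have no np.exceptions:
# asegurar_alias(np, "VisibleDeprecationWarning", getattr(np, "exceptions", None))
```

Passing `getattr(np, "exceptions", None)` keeps the call safe on NumPy releases older than the `np.exceptions` namespace, where the alias still exists and nothing needs patching.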


## 3. Detect Available Tables

In [3]:
# =============================================================================
# DETECT AVAILABLE TABLES
# =============================================================================

print("="*60)
print("DETECTANDO TABLAS EN data/02_interim/")
print("="*60)

# Find every .parquet file in data/02_interim/
ficheros_parquet = sorted(DATA_INTERIM.glob('*.parquet'))

# Metadata per table, keyed by the file stem without the '_limpio' suffix
TABLAS_INFO = {}

for f in ficheros_parquet:
    nombre = f.stem.replace('_limpio', '')
    df = pd.read_parquet(f)
    
    TABLAS_INFO[nombre] = {
        'fichero': f.name,
        'registros': len(df),
        'columnas': len(df.columns),
        'columnas_lista': list(df.columns),
        'tipos': df.dtypes.to_dict(),
        'nulos': df.isnull().sum().to_dict()
    }
    
    print(f"  {nombre}: {len(df):,} registros, {len(df.columns)} columnas")

print(f"\n📊 Total tablas encontradas: {len(TABLAS_INFO)}")

DETECTANDO TABLAS EN data/02_interim/
  becas: 70,524 registros, 3 columnas
  domicilios: 109,206 registros, 6 columnas
  expedientes: 109,575 registros, 14 columnas
  nac_sexo: 30,873 registros, 4 columnas
  notas: 107,908 registros, 4 columnas
  preinscripcion: 210,996 registros, 10 columnas
  recibos: 114,447 registros, 4 columnas
  titulaciones: 45 registros, 4 columnas
  trabajo: 195,524 registros, 3 columnas

📊 Total tablas encontradas: 9
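Cell 3 keys `TABLAS_INFO` on `f.stem.replace('_limpio', '')`, so if both `becas.parquet` and `becas_limpio.parquet` were ever present in data/02_interim/, the second would silently overwrite the first. A sketch, assuming the `_limpio` suffix always comes last in the file name, that fails loudly instead; `nombres_tablas` is a hypothetical helper:

```python
from pathlib import Path


def nombres_tablas(ficheros, sufijo: str = "_limpio") -> dict[str, Path]:
    """Hypothetical helper: map table name -> parquet file, refusing duplicate names.

    `str.removesuffix` (Python 3.9+) only strips the suffix at the end of the
    stem, whereas `str.replace` would remove it anywhere in the name.
    """
    tablas: dict[str, Path] = {}
    for f in sorted(Path(p) for p in ficheros):
        nombre = f.stem.removesuffix(sufijo)
        if nombre in tablas:
            raise ValueError(
                f"'{f.name}' and '{tablas[nombre].name}' both map to table '{nombre}'"
            )
        tablas[nombre] = f
    return tablas


# In the notebook: nombres_tablas(DATA_INTERIM.glob('*.parquet'))
```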


## 4. Generate Sweetviz Reports

In [4]:
# =============================================================================
# GENERATE A SWEETVIZ REPORT FOR EACH TABLE
# =============================================================================

print("="*60)
print("GENERANDO REPORTES SWEETVIZ")
print("="*60)

total = len(TABLAS_INFO)

for i, (nombre, info) in enumerate(TABLAS_INFO.items(), 1):
    # Text progress bar (Sweetviz's own console output interrupts it)
    progreso = "=" * (i * 30 // total)
    espacios = " " * (30 - len(progreso))
    print(f"\r[{progreso}{espacios}] {i}/{total} - {nombre}...", end="", flush=True)
    
    # Load the table
    df = pd.read_parquet(DATA_INTERIM / info['fichero'])
    
    # Build the report; pairwise_analysis='off' skips the pairwise association analysis
    reporte = sv.analyze(df, pairwise_analysis='off')
    
    # Write the report to docs/ without opening a browser
    html_path = DOCS / f'reporte_{nombre}.html'
    reporte.show_html(str(html_path), open_browser=False)

print(f"\r[{'='*30}] {total}/{total} - COMPLETADO!")

print("\n📁 Reportes generados:")
for f in sorted(DOCS.glob('reporte_*.html')):
    print(f"  {f.name}")

GENERANDO REPORTES SWEETVIZ
[===                           ] 1/9 - becas...

Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_becas.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_domicilios.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_expedientes.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_nac_sexo.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_notas.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_preinscripcion.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_recibos.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_titulaciones.html was generated.
Report C:\Users\mjmor\0.-TFM\TFM_abandono_fase1_\docs\reporte_trabajo.html was generated.

📁 Reportes generados:
  reporte_becas.html
  reporte_domicilios.html
  reporte_expedientes.html
  reporte_nac_sexo.html
  reporte_notas.html
  reporte_preinscripcion.html
  reporte_recibos.html
  reporte_titulaciones.html
  reporte_trabajo.html
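In the captured output only the first step of the custom bar survives: Sweetviz prints its own progress line and a "Report ... was generated." message for each table, which breaks the `\r` overwrite. A sketch that isolates the bar arithmetic of cell 4 into a pure function (hypothetical name `barra_progreso`) so it can be printed on its own line after each report:

```python
def barra_progreso(i: int, total: int, ancho: int = 30) -> str:
    """Hypothetical helper: render the cell-4 text bar as '[===   ] i/total'."""
    if total <= 0:
        raise ValueError("total must be a positive number of tables")
    llenos = i * ancho // total  # same integer arithmetic as in cell 4
    return f"[{'=' * llenos}{' ' * (ancho - llenos)}] {i}/{total}"


# Possible use inside the loop of cell 4, after each report is saved:
# reporte.show_html(str(html_path), open_browser=False)
# print(barra_progreso(i, total), "-", nombre)
```

Printing a full line per table (instead of `\r` with `end=""`) trades a compact display for output that stays readable when another library writes to the console in between.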


## 5. Generate the Transformations HTML (Dynamic)

In [5]:
# =============================================================================
# GENERATE THE DYNAMIC TRANSFORMATIONS HTML
# =============================================================================

print("="*60)
print("GENERANDO transformaciones_dinamico.html")
print("="*60)

# Tailwind colour per table (the palette cycles when there are more than 9 tables)
COLORES = ['blue', 'green', 'purple', 'pink', 'orange', 'teal', 'indigo', 'cyan', 'rose']

# Compute totals
total_tablas = len(TABLAS_INFO)
total_columnas = sum(info['columnas'] for info in TABLAS_INFO.values())

# Build the cards, modals and summary rows for every table
tarjetas_html = ""
modales_html = ""
filas_resumen_html = ""

for i, (nombre, info) in enumerate(TABLAS_INFO.items()):
    color = COLORES[i % len(COLORES)]
    
    # Clickable card in the flow diagram
    tarjetas_html += f'''
                <div class="tabla-card bg-white rounded-xl shadow-md p-4 border-l-4 border-{color}-500" onclick="openModal('{nombre}')">
                    <h3 class="font-bold text-{color}-700">📋 {nombre.title()}</h3>
                    <div class="text-sm text-gray-500 mt-1">{info['registros']:,} registros | {info['columnas']} columnas</div>
                    <div class="text-xs text-gray-400 mt-1">Clic para ver detalle</div>
                </div>'''
    
    # Modal listing every column with its dtype and % of nulls
    columnas_html = ""
    for j, col in enumerate(info['columnas_lista']):
        tipo = str(info['tipos'][col])
        nulos = info['nulos'][col]
        pct_nulos = (nulos / info['registros'] * 100) if info['registros'] > 0 else 0
        bg_class = 'bg-gray-50' if j % 2 == 1 else ''
        columnas_html += f'''<tr class="border-b {bg_class}"><td class="px-3 py-2">{col}</td><td class="px-3 py-2">{tipo}</td><td class="px-3 py-2 text-right">{pct_nulos:.1f}%</td></tr>\n'''
    
    modales_html += f'''
    <div id="modal-{nombre}" class="modal" onclick="closeModal('{nombre}')">
        <div class="modal-content" onclick="event.stopPropagation()">
            <div class="flex justify-between items-center mb-4">
                <h2 class="text-2xl font-bold text-{color}-700">📋 {nombre.title()}</h2>
                <button onclick="closeModal('{nombre}')" class="text-gray-500 hover:text-gray-700 text-2xl">&times;</button>
            </div>
            <p class="mb-4">
                <a href="reporte_{nombre}.html" target="_blank" class="bg-blue-500 hover:bg-blue-600 text-white px-4 py-2 rounded-lg text-sm">📊 Ver Reporte Sweetviz</a>
            </p>
            <div class="bg-gray-50 p-3 rounded-lg mb-4">
                <strong>Registros:</strong> {info['registros']:,} | <strong>Columnas:</strong> {info['columnas']}
            </div>
            <table class="w-full text-sm">
                <thead class="bg-{color}-100"><tr><th class="px-3 py-2 text-left">Columna</th><th class="px-3 py-2 text-left">Tipo</th><th class="px-3 py-2 text-right">% Nulos</th></tr></thead>
                <tbody>
                    {columnas_html}
                </tbody>
            </table>
        </div>
    </div>'''
    
    # Row of the summary table
    bg_class = 'bg-gray-50' if i % 2 == 1 else ''
    filas_resumen_html += f'''<tr class="border-b {bg_class}"><td class="px-4 py-2"><a href="reporte_{nombre}.html" class="enlace-reporte">📊 {nombre.title()}</a></td><td class="text-center">{info['registros']:,}</td><td class="text-center text-green-600 font-bold">{info['columnas']}</td></tr>\n'''

# Full page; literal CSS/JS braces are doubled ({{ }}) because the template is an f-string
html_completo = f'''<!DOCTYPE html>
<html lang="es">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>TFM Abandono - Transformaciones de Datos (Dinámico)</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <style>
        .tabla-card {{ transition: all 0.3s ease; cursor: pointer; }}
        .tabla-card:hover {{ transform: translateY(-5px); box-shadow: 0 10px 25px rgba(0,0,0,0.15); }}
        .modal {{ display: none; position: fixed; top: 0; left: 0; width: 100%; height: 100%; background: rgba(0,0,0,0.5); z-index: 1000; }}
        .modal.active {{ display: flex; justify-content: center; align-items: center; }}
        .modal-content {{ background: white; border-radius: 12px; max-width: 900px; max-height: 80vh; overflow-y: auto; padding: 24px; margin: 20px; }}
        .arrow {{ animation: pulse 2s infinite; }}
        @keyframes pulse {{ 0%, 100% {{ opacity: 1; }} 50% {{ opacity: 0.5; }} }}
        .enlace-reporte {{ color: #2563eb; text-decoration: underline; cursor: pointer; }}
        .enlace-reporte:hover {{ color: #1d4ed8; }}
        .df-alumno-link:hover {{ transform: scale(1.02); }}
    </style>
</head>
<body class="bg-gray-100 min-h-screen">
    <header class="bg-gradient-to-r from-blue-600 to-purple-600 text-white py-8">
        <div class="container mx-auto px-4">
            <h1 class="text-4xl font-bold mb-2">🎓 TFM: Predicción de Abandono Universitario</h1>
            <p class="text-xl opacity-90">Diagrama de Transformaciones de Datos - Fase 1 (Dinámico)</p>
            <p class="text-sm opacity-75 mt-2">Autora: María José Morte (morte@uji.es) | GitHub: mortemj</p>
        </div>
    </header>

    <main class="container mx-auto px-4 py-8">
        <!-- Summary -->
        <section class="bg-white rounded-xl shadow-lg p-6 mb-8">
            <h2 class="text-2xl font-bold text-gray-800 mb-4">📊 Resumen del Proceso</h2>
            <div class="grid grid-cols-1 md:grid-cols-3 gap-4">
                <div class="bg-blue-50 rounded-lg p-4 text-center">
                    <div class="text-3xl font-bold text-blue-600">{total_tablas}</div>
                    <div class="text-gray-600">Tablas procesadas</div>
                </div>
                <div class="bg-green-50 rounded-lg p-4 text-center">
                    <div class="text-3xl font-bold text-green-600">{total_columnas}</div>
                    <div class="text-gray-600">Columnas totales</div>
                </div>
                <div class="bg-purple-50 rounded-lg p-4 text-center">
                    <div class="text-3xl font-bold text-purple-600">✅</div>
                    <div class="text-gray-600">Generado dinámicamente</div>
                </div>
            </div>
        </section>

        <!-- Flow diagram -->
        <section class="mb-8">
            <h2 class="text-2xl font-bold text-gray-800 mb-4">🔄 Flujo de Transformación</h2>
            <p class="text-gray-600 mb-4">Haz clic en cada tabla para ver el detalle:</p>
            
            <div class="flex flex-wrap justify-center gap-4 mb-6">
                <div class="bg-yellow-100 border-2 border-yellow-400 rounded-xl p-4 text-center">
                    <div class="text-lg font-bold text-yellow-700">📁 Excel Originales</div>
                    <div class="text-sm text-yellow-600">data/01_raw/</div>
                </div>
            </div>

            <div class="text-center text-4xl text-gray-400 arrow mb-6">⬇️</div>

            <div class="grid grid-cols-1 md:grid-cols-3 lg:grid-cols-3 gap-4 mb-6">
                {tarjetas_html}
            </div>

            <div class="text-center text-4xl text-gray-400 arrow mb-6">⬇️</div>

            <!-- df_alumno (added in notebook 03) -->
            <div class="flex justify-center">
                <div id="df-alumno-container" class="bg-gray-300 rounded-xl p-6 text-gray-600 text-center">
                    <div class="text-2xl font-bold">🎯 df_alumno.parquet</div>
                    <div class="text-lg opacity-90">Pendiente de generar</div>
                    <div class="text-sm opacity-75 mt-2">Ejecuta 03_union_dataset_dinamico.ipynb</div>
                </div>
            </div>
        </section>

        <!-- SUMMARY TABLE -->
        <section class="bg-white rounded-xl shadow-lg p-6">
            <h2 class="text-2xl font-bold text-gray-800 mb-4">📋 Resumen por Tabla</h2>
            <p class="text-gray-600 mb-4">Haz clic en el nombre para ver el <strong>Reporte Sweetviz</strong>:</p>
            <div class="overflow-x-auto">
                <table class="w-full text-sm">
                    <thead class="bg-gray-100">
                        <tr>
                            <th class="px-4 py-2 text-left">Tabla</th>
                            <th class="px-4 py-2 text-center">Registros</th>
                            <th class="px-4 py-2 text-center">Columnas</th>
                        </tr>
                    </thead>
                    <tbody>
                        {filas_resumen_html}
                    </tbody>
                </table>
            </div>
        </section>
    </main>

    <!-- MODALS -->
    {modales_html}

    <footer class="bg-gray-800 text-white py-6 mt-8">
        <div class="container mx-auto px-4 text-center">
            <p>TFM: Predicción de Abandono Universitario</p>
            <p class="text-sm text-gray-400 mt-2">María José Morte | UJI | 2024</p>
            <p class="text-xs text-gray-500 mt-1">HTML generado dinámicamente desde Python</p>
        </div>
    </footer>

    <script>
        function openModal(tabla) {{ document.getElementById('modal-' + tabla).classList.add('active'); }}
        function closeModal(tabla) {{ document.getElementById('modal-' + tabla).classList.remove('active'); }}
        document.addEventListener('keydown', function(e) {{ if (e.key === 'Escape') {{ document.querySelectorAll('.modal').forEach(m => m.classList.remove('active')); }} }});
    </script>
</body>
</html>'''

# Save the page as UTF-8 (matching the meta charset)
html_path = DOCS / 'transformaciones_dinamico.html'
with open(html_path, 'w', encoding='utf-8') as f:
    f.write(html_completo)

print(f"✅ Guardado: {html_path.name}")
print(f"   Tablas incluidas: {total_tablas}")
print(f"   Modales generados: {total_tablas}")

GENERANDO transformaciones_dinamico.html
✅ Guardado: transformaciones_dinamico.html
   Tablas incluidas: 9
   Modales generados: 9
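The modal rows in cell 5 interpolate column names and dtypes straight into the markup. That works for the nine tables above, but a column name containing `<` or `&` would corrupt the generated page. A sketch of the row builder with escaping via the standard-library `html.escape`; `fila_columna` is a hypothetical name and the CSS classes are copied from the cell:

```python
from html import escape


def fila_columna(col, tipo, nulos: int, registros: int, impar: bool) -> str:
    """Hypothetical helper: one <tr> of the column table shown in each modal.

    Column names and dtypes are HTML-escaped; the null percentage uses the
    same zero-row guard as cell 5.
    """
    pct_nulos = (nulos / registros * 100) if registros > 0 else 0.0
    bg_class = "bg-gray-50" if impar else ""
    return (
        f'<tr class="border-b {bg_class}">'
        f'<td class="px-3 py-2">{escape(str(col))}</td>'
        f'<td class="px-3 py-2">{escape(str(tipo))}</td>'
        f'<td class="px-3 py-2 text-right">{pct_nulos:.1f}%</td></tr>'
    )


# In the cell-5 loop: columnas_html += fila_columna(col, info['tipos'][col],
#     info['nulos'][col], info['registros'], j % 2 == 1) + "\n"
```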


## 6. Final Summary

In [6]:
# =============================================================================
# FINAL SUMMARY
# =============================================================================

print("\n" + "="*60)
print("RESUMEN FINAL - REPORTES GENERADOS")
print("="*60)

print("\n📁 FICHEROS EN docs/:")
for f in sorted(DOCS.glob('*.html')):
    size_kb = f.stat().st_size / 1024
    print(f"  {f.name} ({size_kb:.1f} KB)")

print("\n" + "="*60)
print("✅ NOTEBOOK 02 COMPLETADO")
print("Siguiente paso: Ejecutar 03_union_dataset_dinamico.ipynb")
print("="*60)


RESUMEN FINAL - REPORTES GENERADOS

📁 FICHEROS EN docs/:
  reporte_becas.html (591.1 KB)
  reporte_domicilios.html (683.0 KB)
  reporte_expedientes.html (1185.9 KB)
  reporte_nac_sexo.html (592.1 KB)
  reporte_notas.html (676.4 KB)
  reporte_preinscripcion.html (906.1 KB)
  reporte_recibos.html (568.7 KB)
  reporte_titulaciones.html (580.0 KB)
  reporte_trabajo.html (585.9 KB)
  transformaciones_dinamico.html (29.5 KB)

✅ NOTEBOOK 02 COMPLETADO
Siguiente paso: Ejecutar 03_union_dataset_dinamico.ipynb
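The final listing globs `docs/*.html`, so a report left over from a table that has since been removed from data/02_interim/ would still be listed as if it were current. A sketch of a consistency check between the reports on disk and `TABLAS_INFO`, assuming the `reporte_{nombre}.html` naming used in cell 4; `comprobar_reportes` is a hypothetical name:

```python
from pathlib import Path


def comprobar_reportes(carpeta: Path, tablas) -> tuple[list[str], list[str]]:
    """Hypothetical check: compare docs/reporte_*.html against the detected tables.

    Returns (missing, stale): tables without a report, and reports whose table
    is no longer present (for example, left over from an earlier run).
    """
    esperados = {f"reporte_{t}.html" for t in tablas}
    presentes = {f.name for f in Path(carpeta).glob("reporte_*.html")}
    return sorted(esperados - presentes), sorted(presentes - esperados)


# In the notebook (iterating TABLAS_INFO yields the table names):
# faltan, sobran = comprobar_reportes(DOCS, TABLAS_INFO)
```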
