# ⚽ AI Football Analyst - Setup & Configuration

**Phase 0: Initial Setup - Community Edition Compatible**
- Install the required libraries
- Configure the workspace paths
- Validate Delta Lake
- Create utility functions

---

## 1. Library Installation

In [0]:
# Install libraries (MLflow is not listed because it already ships with Databricks)
%pip install plotly scikit-learn xgboost

Note: you may need to restart the kernel using %restart_python or dbutils.library.restartPython() to use updated packages.
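
After the install cell (and any Python restart it asks for), it can help to confirm that the packages are actually importable before moving on. The following is a minimal sketch, not part of the original notebook: `find_missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list uses import names rather than pip names (scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec
from typing import Iterable, List


def find_missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located for import."""
    return [name for name in module_names if find_spec(name) is None]


# Import names, not pip names: scikit-learn is imported as `sklearn`.
REQUIRED_MODULES = ["plotly", "sklearn", "xgboost"]

missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All required modules are importable")
```

If the list is not empty, rerunning the `%pip install` cell and restarting Python is the usual next step.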


## 2. Path Configuration

In [0]:
import os
from pyspark.sql import DataFrame
from pyspark.sql.functions import col
from typing import List
import pandas as pd

# Current user
user_email = spark.sql("SELECT current_user()").collect()[0][0]
print(f"👤 User: {user_email}")

# Base path
BASE_PATH = f"/Workspace/Users/{user_email}/football_analyst"

print("\n" + "=" * 60)
print("📂 PROJECT PATHS")
print("=" * 60)
print(f"Base: {BASE_PATH}")
print("=" * 60)

👤 User: randryan308@gmail.com

📂 PROJECT PATHS
Base: /Workspace/Users/randryan308@gmail.com/football_analyst
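
If later phases need more than one path, the base path can be derived in a single helper so every notebook builds it the same way. This is only a sketch under assumptions: `build_project_paths` is a hypothetical function, and the `data` subfolder is illustrative and not defined anywhere in this notebook (`notebooks` mirrors the folder named under the next steps at the end of this notebook).

```python
import posixpath
from typing import Dict


def build_project_paths(user_email: str, project: str = "football_analyst") -> Dict[str, str]:
    """Build workspace paths for the project from the current user's email."""
    base = posixpath.join("/Workspace/Users", user_email, project)
    return {
        "base": base,
        "notebooks": posixpath.join(base, "notebooks"),
        # Assumption: a "data" subfolder; the notebook does not define one.
        "data": posixpath.join(base, "data"),
    }


paths = build_project_paths("someone@example.com")
for name, path in paths.items():
    print(f"{name}: {path}")
```

Using `posixpath` rather than `os.path` keeps the separators as `/` no matter where the code runs.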


## 3. Planned Delta Tables

In [0]:
DELTA_TABLES = {
    "bronze": ["football_players_raw", "football_teams_raw", "football_matches_raw"],
    "silver": ["football_players_features", "football_teams_features", "football_matches_features"],
    "gold": ["football_predictions"]
}

print("=" * 60)
print("📊 PLANNED DELTA TABLES")
print("=" * 60)
for layer, tables in DELTA_TABLES.items():
    print(f"\n{layer.upper()}:")
    for table in tables:
        print(f"  - {table}")
print("=" * 60)

📊 PLANNED DELTA TABLES

BRONZE:
  - football_players_raw
  - football_teams_raw
  - football_matches_raw

SILVER:
  - football_players_features
  - football_teams_features
  - football_matches_features

GOLD:
  - football_predictions
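
The plan above is a plain dictionary keyed by medallion layer, so two small lookups cover most later needs: a flat list of every table, and the layer a given table belongs to. The sketch below repeats the dictionary so it runs on its own; `all_table_names` and `layer_of` are hypothetical helper names, not functions from the notebook.

```python
from typing import Dict, List

DELTA_TABLES: Dict[str, List[str]] = {
    "bronze": ["football_players_raw", "football_teams_raw", "football_matches_raw"],
    "silver": ["football_players_features", "football_teams_features", "football_matches_features"],
    "gold": ["football_predictions"],
}


def all_table_names(tables_by_layer: Dict[str, List[str]]) -> List[str]:
    """Flatten the layer -> tables mapping into one list, preserving order."""
    return [table for tables in tables_by_layer.values() for table in tables]


def layer_of(table_name: str, tables_by_layer: Dict[str, List[str]]) -> str:
    """Return the layer that a planned table belongs to."""
    for layer, tables in tables_by_layer.items():
        if table_name in tables:
            return layer
    raise KeyError(f"{table_name!r} is not a planned table")


print(len(all_table_names(DELTA_TABLES)))
print(layer_of("football_teams_features", DELTA_TABLES))
```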


## 4. Utility Functions

In [0]:
def save_to_delta(df: DataFrame, table_name: str, mode: str = "overwrite") -> None:
    """Save a DataFrame as a Delta table and report the table's row count."""
    print(f"💾 Saving: {table_name}")
    df.write.format("delta").mode(mode).saveAsTable(table_name)
    # Counts the whole table, so with mode="append" this includes earlier rows
    count = spark.table(table_name).count()
    print(f"✅ {count:,} records in table")


def load_from_delta(table_name: str) -> DataFrame:
    """Load a DataFrame from a Delta table and report its row count."""
    df = spark.table(table_name)
    # Note: count() runs a Spark job just to report the size
    print(f"📖 {table_name}: {df.count():,} records")
    return df


print("✅ Functions ready")

✅ Functions ready
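
`save_to_delta` passes `mode` straight to the Spark writer, so a typo only surfaces when the write runs. One option is to validate the string first. The sketch below is an assumption about how that could look, not part of the notebook: `normalize_save_mode` is a hypothetical helper, and the accepted values are the save modes accepted by PySpark's `DataFrameWriter.mode`.

```python
# Save modes accepted by PySpark's DataFrameWriter.mode
VALID_SAVE_MODES = frozenset({"append", "overwrite", "error", "errorifexists", "ignore"})


def normalize_save_mode(mode: str) -> str:
    """Lower-case and validate a save mode before handing it to the writer."""
    normalized = mode.strip().lower()
    if normalized not in VALID_SAVE_MODES:
        raise ValueError(
            f"Unsupported save mode {mode!r}; expected one of {sorted(VALID_SAVE_MODES)}"
        )
    return normalized


print(normalize_save_mode("Overwrite"))
```

Inside `save_to_delta`, the call would then read `.mode(normalize_save_mode(mode))`.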


## 5. Validate Delta Lake

In [0]:
print("🔍 Validating Delta Lake...")

# Smoke test: write a one-row Delta table, count it, then drop it
test_df = spark.createDataFrame([(1, "Test")], ["id", "name"])
test_df.write.format("delta").mode("overwrite").saveAsTable("test_table")
result = spark.table("test_table").count()
spark.sql("DROP TABLE IF EXISTS test_table")

print(f"✅ Delta Lake OK ({result} record)")

🔍 Validating Delta Lake...
✅ Delta Lake OK (1 record)
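
In the validation cell above, a failure in the write or the count would leave `test_table` behind, because the `DROP TABLE` line would never run. A `try`/`finally` block makes the cleanup unconditional. This is a sketch only: `delta_smoke_test` is a hypothetical name, and it takes the session as a parameter instead of relying on the notebook's global `spark`.

```python
def delta_smoke_test(spark, table_name: str = "test_table") -> int:
    """Write a one-row Delta table, return its row count, and always drop it."""
    try:
        test_df = spark.createDataFrame([(1, "Test")], ["id", "name"])
        test_df.write.format("delta").mode("overwrite").saveAsTable(table_name)
        return spark.table(table_name).count()
    finally:
        # Runs on success and on failure, so no test table is left behind
        spark.sql(f"DROP TABLE IF EXISTS {table_name}")


# In the notebook, where `spark` is predefined:
# result = delta_smoke_test(spark)
```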


## 6. Verify MLflow (optional in Community Edition)

In [0]:
import mlflow

print("🔍 Checking MLflow...")

try:
    # Try to set the experiment (this can fail in Community Edition)
    experiment_name = f"/Users/{user_email}/football_analyst"
    mlflow.set_experiment(experiment_name)
    print(f"✅ MLflow configured: {experiment_name}")
except Exception as e:
    print("⚠️ MLflow is limited in Community Edition")
    print("   (Not critical, we can continue)")
    print(f"   Error: {str(e)[:100]}")

  from google.protobuf import service as _service


🔍 Checking MLflow...


2025-11-13 19:42:59,485 25906 ERROR _handle_rpc_error GRPC Error received
Traceback (most recent call last):
  File "/databricks/python/lib/python3.11/site-packages/pyspark/sql/connect/client/core.py", line 1862, in config
    resp = self._stub.Config(req, metadata=self.metadata())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.11/site-packages/grpc/_interceptor.py", line 277, in __call__
    response, ignored_call = self._with_call(
                             ^^^^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.11/site-packages/grpc/_interceptor.py", line 332, in _with_call
    return call.result(), call
           ^^^^^^^^^^^^^
  File "/databricks/python/lib/python3.11/site-packages/grpc/_channel.py", line 440, in result
    raise self
  File "/databricks/python/lib/python3.11/site-packages/grpc/_interceptor.py", line 315, in continuation
    response, call = self._thunk(new_method).with_call(
                     ^^^^^^^^^^^^^

⚠️ MLflow is limited in Community Edition
   (Not critical, we can continue)
   Error: [CONFIG_NOT_AVAILABLE] Configuration spark.mlflow.modelRegistryUri is not available. SQLSTATE: 42K0I
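
Because the experiment setup can fail as shown above, later notebooks may want a flag that records whether tracking is usable, rather than repeating the `try`/`except`. The sketch below is one way to do that. `try_configure_experiment` and `MLFLOW_AVAILABLE` are hypothetical names, and the setter is passed in as a callable so the helper has no hard dependency on `mlflow`.

```python
from typing import Callable


def try_configure_experiment(set_experiment: Callable[[str], object], experiment_name: str) -> bool:
    """Try to select an experiment; return False instead of raising on failure."""
    try:
        set_experiment(experiment_name)
    except Exception as exc:
        # Same truncation as the notebook: keep the message to 100 characters
        print(f"MLflow unavailable, continuing without tracking: {str(exc)[:100]}")
        return False
    return True


# In the notebook, with `mlflow` imported and `user_email` set by an earlier cell:
# MLFLOW_AVAILABLE = try_configure_experiment(
#     mlflow.set_experiment, f"/Users/{user_email}/football_analyst"
# )
```

Later cells could then guard any tracking calls with `if MLFLOW_AVAILABLE:`.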


## 7. Summary

In [0]:
print("=" * 80)
print("🎉 SETUP COMPLETE")
print("=" * 80)
print(f"\n👤 User: {user_email}")
print(f"📂 Workspace: {BASE_PATH}")
print(f"📊 Delta Tables: {sum(len(v) for v in DELTA_TABLES.values())} planned")
print("\n✅ Databricks stack:")
print("   - Delta Lake ✅")
print("   - Spark ✅")
print("   - MLflow (limited in Community Edition)")
print("\n🚀 Ready for PHASE 1")
print("=" * 80)

🎉 SETUP COMPLETE

👤 User: randryan308@gmail.com
📂 Workspace: /Workspace/Users/randryan308@gmail.com/football_analyst
📊 Delta Tables: 7 planned

✅ Databricks stack:
   - Delta Lake ✅
   - Spark ✅
   - MLflow (limited in Community Edition)

🚀 Ready for PHASE 1


## Next Steps

**PHASE 1:** Run `notebooks/01_data_ingestion/simple_download`
- Downloads the EPL dataset with curl
- Loads the data into Delta tables