# 08. Customer Segmentation and Business Recommendations

**TFM: Predicting Customers with Loyalty Potential in E-commerce**  
**Author:** Magda Monroy Jiménez  
**University:** Complutense de Madrid

This notebook applies the trained model to segment the full customer base and generate segment-specific business recommendations.

In [None]:
import pandas as pd
import numpy as np
import pickle
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import sys
from pathlib import Path

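# Add the project's src directory to the import path.
# Path().parent resolves to "." itself, so use the working directory's parent,
# consistent with the "../data" and "../results" paths used below.
sys.path.append(str(Path.cwd().parent / "src"))
from utils.business_segmentation import generate_customer_insights

print("✅ Libraries loaded successfully")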

## 1. Load Data and Model

In [None]:
# Load customer data
df_customers = pd.read_csv('../data/processed/customer_features_with_trends.csv')

# Load the trained model and its preprocessing objects.
# Context managers ensure the file handles are closed after loading.
with open('../results/models/best_loyalty_model.pkl', 'rb') as f:
    model = pickle.load(f)
with open('../results/models/feature_scaler.pkl', 'rb') as f:
    scaler = pickle.load(f)
with open('../results/models/feature_selector.pkl', 'rb') as f:
    selector = pickle.load(f)

print(f"📊 Data loaded: {df_customers.shape[0]:,} customers")
print(f"🤖 Model loaded: {type(model).__name__}")

df_customers.head()

## 2. Generate Predictions for All Customers

In [None]:
# Prepare features for prediction
from sklearn.preprocessing import LabelEncoder

# Encode country if the encoded column does not exist yet.
# NOTE: a freshly fitted LabelEncoder only reproduces the training codes if the
# set of countries is identical; prefer the encoder fitted at training time if it was saved.
if 'Country_encoded' not in df_customers.columns:
    le = LabelEncoder()
    df_customers['Country_encoded'] = le.fit_transform(df_customers['Country'].fillna('Unknown'))

# Model features
feature_cols = [
    'Recency', 'Frequency', 'Monetary', 'TotalQuantity', 'AvgQuantity',
    'AvgUnitPrice', 'AvgRevenue', 'UniqueProducts', 'CustomerLifespan',
    'avg_trends_online_shopping', 'std_trends_online_shopping', 'max_trends_online_shopping',
    'avg_trends_retail_therapy', 'std_trends_retail_therapy', 'max_trends_retail_therapy',
    'avg_trends_gift_shopping', 'std_trends_gift_shopping', 'max_trends_gift_shopping',
    'Country_encoded'
]

# Prepare data
X = df_customers[feature_cols].fillna(0)

# Apply the same preprocessing used during training
X_scaled = scaler.transform(X)
X_selected = selector.transform(X_scaled)

# Get loyalty probabilities (positive class) and class predictions
probabilities = model.predict_proba(X_selected)[:, 1]
predictions = model.predict(X_selected)

# Add to the DataFrame
df_customers['loyalty_probability'] = probabilities
df_customers['loyalty_prediction'] = predictions

print(f"✅ Predictions generated for {len(df_customers):,} customers")
print(f"📈 Mean probability: {probabilities.mean():.1%}")
print(f"🎯 Predicted loyalty-prone customers: {predictions.sum():,} ({predictions.mean():.1%})")
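
The country encoding step above deserves care: `LabelEncoder` assigns codes in sorted order of the labels it sees, so refitting on a different set of countries can shift every code relative to training. The following is a minimal sketch of one alternative, assuming the label-to-code mapping was saved at training time. The helper `encode_with_training_mapping`, the mapping contents, and the `-1` code for unseen labels are all hypothetical and not part of the original project.

```python
import pandas as pd

def encode_with_training_mapping(values, mapping, unknown_code=-1):
    """Map category labels to the integer codes fixed at training time.

    Labels absent from the mapping receive unknown_code instead of
    silently shifting the codes of every other label.
    """
    return values.fillna("Unknown").map(mapping).fillna(unknown_code).astype(int)

# Hypothetical mapping, as it might have been saved during training
training_mapping = {"France": 0, "Germany": 1, "United Kingdom": 2, "Unknown": 3}

countries = pd.Series(["United Kingdom", "France", None, "Brazil"])
encoded = encode_with_training_mapping(countries, training_mapping)
print(encoded.tolist())  # [2, 0, 3, -1]
```

Whether `-1` is an acceptable input depends on how the model was trained; the point of the sketch is only that codes stay stable across datasets.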

## 3. Customer Segmentation

In [None]:
# Apply segmentation to all customers
segments = []
value_scores = []
risk_levels = []
suggested_budgets = []

for idx, row in df_customers.iterrows():
    customer_data = {
        'Recency': row['Recency'],
        'Frequency': row['Frequency'],
        'Monetary': row['Monetary'],
        'probability': row['loyalty_probability']
    }
    
    insights = generate_customer_insights(customer_data)
    
    segments.append(insights['segment'])
    value_scores.append(insights['value_score'])
    risk_levels.append(insights['risk_level'])
    suggested_budgets.append(insights['suggested_budget'])

# Add to the DataFrame
df_customers['segment'] = segments
df_customers['value_score'] = value_scores
df_customers['risk_level'] = risk_levels
df_customers['suggested_budget'] = suggested_budgets

print("✅ Segmentation complete")

# Segment summary
segment_summary = df_customers.groupby('segment').agg({
    'CustomerID': 'count',
    'Monetary': 'sum',
    'loyalty_probability': 'mean',
    'value_score': 'mean',
    'suggested_budget': 'sum'
}).round(2)

segment_summary.columns = ['Customers', 'Total_Revenue', 'Avg_Probability', 'Avg_Value_Score', 'Total_Budget']
segment_summary['Revenue_Share'] = (segment_summary['Total_Revenue'] / segment_summary['Total_Revenue'].sum() * 100).round(1)
segment_summary['Customer_Share'] = (segment_summary['Customers'] / segment_summary['Customers'].sum() * 100).round(1)

print("\n📊 SEGMENT SUMMARY:")
segment_summary
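
The notebook imports `generate_customer_insights` from the project's `src` directory, so its logic is not shown here. To make the four returned keys concrete, the following is a purely hypothetical stand-in: every threshold, weight, and the budget rule are illustrative assumptions, and the real function may work quite differently. Only the input keys, output keys, and segment names come from this notebook.

```python
def sketch_customer_insights(customer):
    """Hypothetical stand-in for generate_customer_insights (illustrative rules only)."""
    recency = customer["Recency"]
    frequency = customer["Frequency"]
    monetary = customer["Monetary"]
    probability = customer["probability"]

    # Assumed value score on a 0-100 scale: 50 points from the model
    # probability, 30 from capped spend, 20 from capped purchase frequency.
    spend_part = min(monetary / 5000.0, 1.0) * 30
    freq_part = min(frequency / 10.0, 1.0) * 20
    value_score = round(probability * 50 + spend_part + freq_part, 1)

    # Assumed rule-based segment assignment
    if recency > 180:
        segment = "Lost" if probability < 0.3 else "At Risk"
    elif probability >= 0.7 and frequency >= 5:
        segment = "Champions"
    elif probability >= 0.5:
        segment = "Loyal Customers"
    else:
        segment = "Potential Loyalists"

    # Assumed risk bands based on days since last purchase
    risk_level = "High" if recency > 180 else "Medium" if recency > 90 else "Low"

    # Assumed budget rule: 0.50 currency units per value-score point
    suggested_budget = round(value_score * 0.5, 2)

    return {
        "segment": segment,
        "value_score": value_score,
        "risk_level": risk_level,
        "suggested_budget": suggested_budget,
    }

example = sketch_customer_insights(
    {"Recency": 12, "Frequency": 8, "Monetary": 2500.0, "probability": 0.82}
)
print(example)
```

A function with this shape can also be applied row by row with `DataFrame.apply(..., axis=1)` instead of an explicit `iterrows` loop.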

## 4. Segmentation Visualizations

In [None]:
# Pie chart of customer distribution by segment
fig_segments = px.pie(
    values=segment_summary['Customers'],
    names=segment_summary.index,
    title="Customer Distribution by Segment",
    color_discrete_sequence=px.colors.qualitative.Set3
)

fig_segments.update_traces(textposition='inside', textinfo='percent+label')
fig_segments.show()

# Bar chart of revenue by segment
fig_revenue = px.bar(
    x=segment_summary.index,
    y=segment_summary['Total_Revenue'],
    title="Total Revenue by Segment",
    labels={'x': 'Segment', 'y': 'Total Revenue (£)'},
    color=segment_summary['Total_Revenue'],
    color_continuous_scale='Viridis'
)

fig_revenue.show()

In [None]:
# Scatter plot: loyalty probability vs. value score, by segment
fig_scatter = px.scatter(
    df_customers,
    x='loyalty_probability',
    y='value_score',
    color='segment',
    size='Monetary',
    hover_data=['CustomerID', 'Recency', 'Frequency'],
    title="Segmentation: Loyalty Probability vs. Customer Value",
    labels={
        'loyalty_probability': 'Loyalty Probability',
        'value_score': 'Customer Value Score'
    }
)

fig_scatter.show()

## 5. High-Value Customer Analysis

In [None]:
# Identify high-value customers (top 10% by value score)
high_value_threshold = df_customers['value_score'].quantile(0.9)
high_value_customers = df_customers[df_customers['value_score'] >= high_value_threshold].copy()

print("🏆 HIGH-VALUE CUSTOMERS (Top 10%):")
print(f"Threshold: {high_value_threshold:.1f}/100")
print(f"Count: {len(high_value_customers):,} customers")
print(f"Total revenue: £{high_value_customers['Monetary'].sum():,.2f}")
print(f"% of total revenue: {(high_value_customers['Monetary'].sum() / df_customers['Monetary'].sum() * 100):.1f}%")

# Top 20 customers by value score
top_customers = high_value_customers.nlargest(20, 'value_score')[[
    'CustomerID', 'segment', 'value_score', 'loyalty_probability', 
    'Recency', 'Frequency', 'Monetary', 'suggested_budget'
]].round(2)

print("\n🎯 TOP 20 CUSTOMERS BY VALUE:")
top_customers

## 6. Campaign Recommendations by Segment

In [None]:
# Campaign budget analysis
campaign_analysis = df_customers.groupby('segment').agg({
    'CustomerID': 'count',
    'suggested_budget': ['sum', 'mean'],
    'loyalty_probability': 'mean',
    'Monetary': 'mean'
}).round(2)

campaign_analysis.columns = ['Customers', 'Total_Budget', 'Avg_Budget_Per_Customer', 'Avg_Probability', 'Avg_Monetary']

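# Estimate expected return (simplified): assume a conservative 10% uplift on average spend.
# Despite its name, Expected_ROI_Min holds the expected incremental revenue per customer (£).
campaign_analysis['Expected_ROI_Min'] = campaign_analysis['Avg_Monetary'] * 0.1
# Expected incremental revenue per £ of budget; a zero budget yields NaN instead of inf.
campaign_analysis['ROI_Ratio'] = (campaign_analysis['Expected_ROI_Min'] / campaign_analysis['Avg_Budget_Per_Customer'].replace(0, np.nan)).round(2)

print("💰 CAMPAIGN BUDGET ANALYSIS BY SEGMENT:")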
campaign_analysis
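
The ROI ratio in this cell is the expected incremental revenue per customer (10% of average spend) divided by the average budget per customer, so a value above 1 means the assumed uplift more than covers the campaign cost. The sketch below works through that arithmetic on made-up figures; the numbers are illustrative only and are not results from the dataset. The `Expected_Uplift` and `Above_Break_Even` columns and the zero-budget guard are assumptions added for the example.

```python
import numpy as np
import pandas as pd

UPLIFT = 0.10  # the notebook's conservative 10% uplift assumption

# Illustrative figures only; not results from the dataset
toy = pd.DataFrame(
    {
        "Avg_Monetary": [4000.0, 800.0, 300.0],
        "Avg_Budget_Per_Customer": [100.0, 100.0, 0.0],
    },
    index=["Champions", "At Risk", "Lost"],
)

# Expected incremental revenue per customer
toy["Expected_Uplift"] = toy["Avg_Monetary"] * UPLIFT

# Incremental revenue per unit of budget; a zero budget gives NaN, not inf
toy["ROI_Ratio"] = (
    toy["Expected_Uplift"] / toy["Avg_Budget_Per_Customer"].replace(0, np.nan)
).round(2)

# Segments whose assumed uplift exceeds the campaign cost
toy["Above_Break_Even"] = toy["ROI_Ratio"] > 1
print(toy)
```

With these inputs the first segment returns 4.0 per unit spent, the second 0.8 (below break-even), and the third has no defined ratio because no budget is assigned.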

In [None]:
# Visualize budget vs. expected ROI
fig_budget = px.scatter(
    x=campaign_analysis['Avg_Budget_Per_Customer'],
    y=campaign_analysis['ROI_Ratio'],
    size=campaign_analysis['Customers'],
    color=campaign_analysis.index,
    title="Budget vs. Expected ROI by Segment",
    labels={
        'x': 'Average Budget per Customer (£)',
        'y': 'Expected ROI Ratio',
        'color': 'Segment'
    },
    hover_name=campaign_analysis.index
)

# Add break-even line (ROI = 1)
fig_budget.add_hline(y=1, line_dash="dash", line_color="red",
                    annotation_text="Break-even (ROI = 1)")

fig_budget.show()

## 7. At-Risk Customers - Immediate Action

In [None]:
# Identify at-risk customers who need immediate attention
at_risk_segments = ['At Risk', 'Need Attention', 'Lost']
at_risk_customers = df_customers[df_customers['segment'].isin(at_risk_segments)].copy()

# Prioritize by historical value
at_risk_priority = at_risk_customers[
    (at_risk_customers['Monetary'] > df_customers['Monetary'].median()) &
    (at_risk_customers['Frequency'] >= 2)
].copy()

print("🚨 AT-RISK CUSTOMERS - IMMEDIATE ACTION:")
print(f"Total at risk: {len(at_risk_customers):,} customers")
print(f"High priority: {len(at_risk_priority):,} customers")
print(f"Revenue at risk: £{at_risk_priority['Monetary'].sum():,.2f}")

# Top at-risk customers by value
top_at_risk = at_risk_priority.nlargest(15, 'Monetary')[[
    'CustomerID', 'segment', 'Recency', 'Frequency', 'Monetary', 
    'loyalty_probability', 'suggested_budget'
]].round(2)

print("\n⚠️ TOP 15 AT-RISK CUSTOMERS (by historical value):")
top_at_risk

## 8. Export Results for Tableau

In [None]:
# Prepare data for Tableau
tableau_export = df_customers[[
    'CustomerID', 'Country', 'segment', 'loyalty_probability', 'loyalty_prediction',
    'value_score', 'risk_level', 'suggested_budget',
    'Recency', 'Frequency', 'Monetary', 'UniqueProducts', 'CustomerLifespan'
]].copy()

# Add categorical bands.
# include_lowest=True keeps values exactly equal to 0 in the first bin
# (by default pd.cut excludes the left edge and would return NaN for them).
tableau_export['probability_category'] = pd.cut(
    tableau_export['loyalty_probability'],
    bins=[0, 0.2, 0.4, 0.6, 0.8, 1.0],
    labels=['Very Low', 'Low', 'Medium', 'High', 'Very High'],
    include_lowest=True
)

tableau_export['value_category'] = pd.cut(
    tableau_export['value_score'],
    bins=[0, 20, 40, 60, 80, 100],
    labels=['Low Value', 'Medium-Low', 'Medium', 'Medium-High', 'High Value'],
    include_lowest=True
)

# Save for Tableau (create the output directory if it does not exist)
output_path = Path('../results/tableau/customer_segmentation_analysis.csv')
output_path.parent.mkdir(parents=True, exist_ok=True)
tableau_export.to_csv(output_path, index=False)

print("✅ Data exported for Tableau")
print(f"📁 File: {output_path.name}")
print(f"📊 Records: {len(tableau_export):,}")

# Final summary
print("\n🎯 EXECUTIVE SUMMARY:")
print(f"• Total customers analyzed: {len(df_customers):,}")
print(f"• Predicted loyalty-prone customers: {df_customers['loyalty_prediction'].sum():,} ({df_customers['loyalty_prediction'].mean():.1%})")
print(f"• Total revenue: £{df_customers['Monetary'].sum():,.2f}")
print(f"• Total suggested budget: £{df_customers['suggested_budget'].sum():,.2f}")
print(f"• Average expected ROI: {(df_customers['Monetary'].sum() * 0.1 / df_customers['suggested_budget'].sum()):.1f}x")
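
One detail of the binning step is easy to miss: with its default settings, `pd.cut` builds right-closed intervals and excludes the left edge of the first bin, so a probability or score of exactly 0 ends up without a category. The short sketch below shows the difference on a few made-up values; the series `probs` is illustrative only.

```python
import pandas as pd

# Made-up probabilities, including both bin extremes
probs = pd.Series([0.0, 0.2, 0.55, 1.0])
bins = [0, 0.2, 0.4, 0.6, 0.8, 1.0]
labels = ['Very Low', 'Low', 'Medium', 'High', 'Very High']

# Default: intervals are (0, 0.2], (0.2, 0.4], ... so 0.0 falls outside
default_cut = pd.cut(probs, bins=bins, labels=labels)

# include_lowest=True closes the first interval on the left: [0, 0.2]
inclusive_cut = pd.cut(probs, bins=bins, labels=labels, include_lowest=True)

print(default_cut.tolist())    # first element is NaN
print(inclusive_cut.tolist())  # first element is 'Very Low'
```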

## 9. Conclusions and Recommendations

### Key Findings:

1. **Effective Segmentation**: Combining the model's loyalty probability with RFM features separates customers into segments with distinct characteristics and needs.

2. **Value Concentration**: "Champions" and "Loyal Customers" account for a significant share of total revenue (see `Revenue_Share` in the segment summary).

3. **Retention Opportunities**: "At Risk" customers with high historical value require immediate attention.

### Strategic Recommendations:

1. **Prioritize Champions and Loyal Customers**: Invest in VIP and premium loyalty programs.

2. **Develop Potential Loyalists**: Implement onboarding and engagement programs.

3. **Win Back At Risk Customers**: Run urgent retention campaigns for high-value customers.

4. **Optimize Budget**: Allocate resources according to the expected ROI of each segment.

### Next Steps:

1. Implement differentiated campaigns for each segment
2. Monitor conversion and ROI metrics
3. Retrain the model with new data every quarter
4. Build a real-time dashboard for tracking
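
The fourth recommendation can be made concrete in many ways. One simple option, sketched below, is to split a fixed budget across segments in proportion to their expected ROI ratio and to skip segments below break-even. The function `allocate_budget`, the `min_roi` cutoff, and the ROI figures are all hypothetical; this is one possible allocation rule, not the one used in the project.

```python
def allocate_budget(total_budget, roi_by_segment, min_roi=1.0):
    """Split total_budget across segments in proportion to expected ROI.

    Segments with an ROI ratio below min_roi receive nothing.
    """
    eligible = {seg: roi for seg, roi in roi_by_segment.items() if roi >= min_roi}
    total_roi = sum(eligible.values())
    if total_roi == 0:
        # No segment clears the cutoff: allocate nothing
        return {seg: 0.0 for seg in roi_by_segment}
    return {
        seg: round(total_budget * eligible.get(seg, 0.0) / total_roi, 2)
        for seg in roi_by_segment
    }

# Illustrative ROI ratios, not results from the notebook
roi = {
    "Champions": 4.0,
    "Loyal Customers": 3.0,
    "Potential Loyalists": 1.0,
    "Lost": 0.5,
}
allocation = allocate_budget(10_000, roi)
print(allocation)
```

A proportional rule like this ignores segment size and diminishing returns, so it is best read as a starting point for discussion rather than a final allocation.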