# SEED Algorithm: Spatial Economic and Environmental Distribution Algorithm

**Objective:** To select and distribute 1000 optimal locations for senior living facilities in Spain, maximizing territorial coverage, economic viability, and social demand.

---

### Algorithm architecture (4 layers):

1. **Layer 1 - Territorial Base**: Decision space (36,333 census sections)
2. **Layer 2 - Residential Demand**: Number of Households + Dependency + Density (weight 0.45)
3. **Layer 3 - Economic Viability**: Average Household Income (weight 0.40)
4. **Layer 4 - Saturation**: Territorial Correction Factor (weight 0.15)

### SEED formula:

```
SEED = 0.45*(0.65*number of households + 0.1*dependency + 0.25*density) + 0.4*income + 0.15*saturation
```
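As a worked example, the formula can be evaluated for a single section. This is only a sketch: `seed_score` is an illustrative helper name that does not appear in the notebook, and the input values are hypothetical numbers assumed to be already normalized to [0, 1].

```python
def seed_score(households, dependency, density, income, saturation):
    """Combine normalized inputs (each in [0, 1]) into one SEED score."""
    # Layer 2: weighted demand sub-score
    demand = 0.65 * households + 0.10 * dependency + 0.25 * density
    # Layers 2-4 combined with the top-level weights
    return 0.45 * demand + 0.40 * income + 0.15 * saturation

# Hypothetical, already-normalized values for one census section.
example = seed_score(households=0.8, dependency=0.5, density=0.6,
                     income=0.7, saturation=0.9)
print(round(example, 4))  # 0.739
```

Both sets of weights sum to 1 (0.65 + 0.10 + 0.25 and 0.45 + 0.40 + 0.15), so inputs in [0, 1] always yield a score in [0, 1].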

### Spatial constraint: Clustering
- Minimum distance between selected residences: **1 km**
- Greedy iterative selection: take the highest-scoring section, discard every candidate closer than 1 km, and repeat

---

## Install dependencies

In [1]:
try:
    import pandas as pd
    import numpy as np
    import folium
    from scipy.spatial.distance import cdist
    from sklearn.preprocessing import MinMaxScaler
except ImportError:
    !pip install pandas numpy folium scipy scikit-learn openpyxl -q

✅ All libraries are installed


## Upload and prepare data

In [2]:
import pandas as pd
import numpy as np
from pathlib import Path

ARCHIVO_DATOS = Path("/home/lgarbayo/Escritorio/mvp-residence/data/VARIABLES SEED.xlsx")
DISTANCIA_MINIMA_KM = 1.0  
NUM_RESIDENCIAS = 1000

if not Path(ARCHIVO_DATOS).exists():
    raise FileNotFoundError(f"File not found: {ARCHIVO_DATOS}")

df = pd.read_excel(ARCHIVO_DATOS, sheet_name=0)
df = df.rename(columns={
    'id_seccion': 'seccion_censal',
    'length': 'longitud',
    'latitude': 'latitud'
})

🚀 SEED ALGORITHM - FULL IMPLEMENTATION

📂 Loading data from '/home/lgarbayo/Escritorio/mvp-residence/data/VARIABLES SEED.xlsx'...
✅ 36,333 census sections loaded

📊 Available variables:
   • seccion_censal
   • f_of_m
   • density
   • rent
   • saturation
   • dependence
   • longitud
   • latitud

🔍 Null values per column:
   • density: 322 (0.89%)
   • rent: 1483 (4.08%)


## Layer no.1: territorial base

In [3]:
# filter sections with valid coordinates
df_valid = df.dropna(subset=['latitud', 'longitud']).copy()

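# extract province from census section code: INE section codes have 10 digits,
# so pad to 10 (not 9) to restore a leading zero lost when the code is read as a number
df_valid['provincia'] = df_valid['seccion_censal'].astype(str).str.zfill(10).str[:2]

# INE province codes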
PROVINCIAS = {
    '01': 'Álava', '02': 'Albacete', '03': 'Alicante', '04': 'Almería',
    '05': 'Ávila', '06': 'Badajoz', '07': 'Baleares', '08': 'Barcelona',
    '09': 'Burgos', '10': 'Cáceres', '11': 'Cádiz', '12': 'Castellón',
    '13': 'Ciudad Real', '14': 'Córdoba', '15': 'A Coruña', '16': 'Cuenca',
    '17': 'Girona', '18': 'Granada', '19': 'Guadalajara', '20': 'Gipuzkoa',
    '21': 'Huelva', '22': 'Huesca', '23': 'Jaén', '24': 'León',
    '25': 'Lleida', '26': 'La Rioja', '27': 'Lugo', '28': 'Madrid',
    '29': 'Málaga', '30': 'Murcia', '31': 'Navarra', '32': 'Ourense',
    '33': 'Asturias', '34': 'Palencia', '35': 'Las Palmas', '36': 'Pontevedra',
    '37': 'Salamanca', '38': 'S.C. Tenerife', '39': 'Cantabria', '40': 'Segovia',
    '41': 'Sevilla', '42': 'Soria', '43': 'Tarragona', '44': 'Teruel',
    '45': 'Toledo', '46': 'Valencia', '47': 'Valladolid', '48': 'Bizkaia',
    '49': 'Zamora', '50': 'Zaragoza', '51': 'Ceuta', '52': 'Melilla'
}

df_valid['nombre_provincia'] = df_valid['provincia'].map(PROVINCIAS).fillna('Desconocida')


📍 LAYER 1: TERRITORIAL BASE

✅ Sections with valid coordinates: 36,333
✗  Sections discarded: 0

🗺️  Provinces covered: 44
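INE census section codes have ten digits, and the first two identify the province. When the codes are read as numbers, provinces `01` to `09` lose their leading zero, so the padding width matters. The sketch below illustrates this with three codes: `1503003001` appears in the selection output of this notebook, while the other two are hypothetical values chosen for illustration.

```python
import pandas as pd

# Section codes as integers; the first one has lost its leading zero.
codes = pd.Series([100101001, 1503003001, 2807901001])

# Padding to the full 10 digits restores the province prefix.
province = codes.astype(str).str.zfill(10).str[:2]

# Padding to only 9 digits leaves the first code untouched,
# so its prefix is read as '10' instead of '01'.
province_short_pad = codes.astype(str).str.zfill(9).str[:2]

print(province.tolist())            # ['01', '15', '28']
print(province_short_pad.tolist())  # ['10', '15', '28']
```

Comparing the number of distinct provinces against the 52 codes in `PROVINCIAS` is a quick way to detect this kind of mismatch.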


## Layer no.2: residential demand

### Normalization with scikit-learn

We use `sklearn.preprocessing.MinMaxScaler` to rescale each demand variable to the range [0, 1].

**MinMaxScaler Formula:**
```
X_scaled = (X - X_min) / (X_max - X_min)
```

**Inversion for "smaller is better" variables:**
- F-of-M: smaller is better → invert (1 - X_scaled)
- Saturation: smaller is better → invert (1 - X_scaled)
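The rescaling and the inversion step can also be written out by hand, which makes it easier to see what happens to missing values. The sketch below uses plain pandas rather than `MinMaxScaler`; the helper name `min_max` and the toy values are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

def min_max(series):
    """Rescale a Series to [0, 1]; pandas skips NaN when taking min and max."""
    return (series - series.min()) / (series.max() - series.min())

# Toy data: invented values; np.nan stands in for a missing density.
toy = pd.DataFrame({
    "saturation": [0.05, 0.10, 0.30],
    "density": [10.0, np.nan, 1000.0],
})

# "Smaller is better": rescale, then flip so that 1 means best.
toy["saturation_norm"] = 1 - min_max(toy["saturation"])

# "Larger is better": rescale only; the missing value stays NaN.
toy["density_norm"] = min_max(toy["density"])

print(toy.round(3))
```

`MinMaxScaler` is documented to treat NaN the same way (ignored when fitting, preserved in the output), so a section with a missing input ends up with a NaN score rather than a misleading zero.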

In [4]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

# F-of-M: normalize, then invert (smaller is better)
f_of_m_normalized = scaler.fit_transform(df_valid[['f_of_m']])
df_valid['f_of_m_norm'] = 1 - f_of_m_normalized.flatten()

# dependence: normalize only (no inversion)
dependence_normalized = scaler.fit_transform(df_valid[['dependence']])
df_valid['dependence_norm'] = dependence_normalized.flatten()

# density: normalize only; MinMaxScaler keeps missing values as NaN
density_normalized = scaler.fit_transform(df_valid[['density']])
df_valid['density_norm'] = density_normalized.flatten()

# calculate demand score
df_valid['score_demanda'] = (
    0.65 * df_valid['f_of_m_norm'] +
    0.10 * df_valid['dependence_norm'] +
    0.25 * df_valid['density_norm']
)


📊 LAYER 2: RESIDENTIAL DEMAND

🔧 Using sklearn.preprocessing.MinMaxScaler for normalization
   Normalization range: [0, 1]

1️⃣ F-of-M (ideal population pyramid)
   • Original F-of-M - Min: 0.0698, Max: 0.7384
   ✓ F-of-M normalized and inverted (0=worst, 1=best)
   • Normalized - Min: 0.0000, Max: 1.0000

2️⃣ Degree of dependency
   • Original dependency - Min: 0.0077, Max: 0.0255
   ✓ Dependency normalized
   • Normalized - Min: 0.0000, Max: 1.0000

3️⃣ Population density (inhab/km²)
   • Original density - Min: 0.01, Max: 1025.45 inhab/km²
   ✓ Density normalized
   • Normalized - Min: 0.0000, Max: 1.0000

✅ Demand score calculated
   Formula: 0.65*F-of-M + 0.10*Dependency + 0.25*Density
   • Mean: 0.6516
   • Min:  0.0683
   • Max:  0.9423

🔍 Range check:
   ✓ f_of_m_norm         : [0.0000, 1.0000]
   ✓ dependence_norm     : [0.0000, 1.0000]
   ✗ density_norm        : [0.0000, 1.0000]


## Layer no.3: economic viability

In [None]:
def calcular_score_renta(renta):
    """
    Calculate the economic viability score from average household income.

    Asymmetric curve with an optimum at €72,000:
        stronger penalty below the optimum
        softer penalty above the optimum
    Reference points:
        €40,000 → 0.00
        €72,000 → 1.00 (optimum)
    """
    RENTA_OPTIMA = 72000
    RENTA_MIN = 40000

    if renta <= RENTA_MIN:
        return 0.0
    elif renta <= RENTA_OPTIMA:
        # rising branch: power curve with exponent < 1 (concave),
        # so the score climbs quickly just above the minimum
        x = (renta - RENTA_MIN) / (RENTA_OPTIMA - RENTA_MIN)
        return x ** 0.7
    else:
        # falling branch: smooth, milder penalty above the optimum
        x = (renta - RENTA_OPTIMA) / (RENTA_OPTIMA - RENTA_MIN)
        return 1.0 / (1 + 0.5 * x ** 1.5)

df_valid['score_renta'] = df_valid['rent'].apply(calcular_score_renta)
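
To make the asymmetry of the curve concrete, it can be evaluated at a few incomes. The function below is a stand-alone copy of the same logic under a hypothetical name, `income_score`, so that the snippet runs on its own; the sample incomes are arbitrary.

```python
def income_score(income, optimum=72_000, minimum=40_000):
    """Stand-alone copy of the asymmetric income curve, for illustration only."""
    if income <= minimum:
        return 0.0
    span = optimum - minimum
    if income <= optimum:
        # rising branch: concave power curve
        return ((income - minimum) / span) ** 0.7
    # falling branch: gentle decay above the optimum
    return 1.0 / (1 + 0.5 * ((income - optimum) / span) ** 1.5)

# Arbitrary sample incomes, including a missing value.
for income in (40_000, 56_000, 72_000, 104_000, float("nan")):
    print(income, round(income_score(income), 4))
```

An income €32,000 below the optimum scores 0, while one €32,000 above it still scores about 0.67. A missing income fails both comparisons, falls through to the last branch, and comes back as NaN, so sections without income data carry a NaN score forward.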

## Layer no.4: territorial saturation

In [6]:
from sklearn.preprocessing import MinMaxScaler

scaler_sat = MinMaxScaler()

# normalize and invert
saturation_normalized = scaler_sat.fit_transform(df_valid[['saturation']])
df_valid['saturation_norm'] = 1 - saturation_normalized.flatten()


🌍 LAYER 4: TERRITORIAL SATURATION

📊 Original saturation - Min: 0.0573, Max: 0.2960

✅ Saturation score calculated (using sklearn.preprocessing.MinMaxScaler)
   • Low saturation  → High score (better)
   • High saturation → Low score (worse)

📊 Statistics:
   • Mean score:    0.6519
   • Minimum score: 0.0000
   • Maximum score: 1.0000

🔍 Range check: ✓ [0.0000, 1.0000]


## Final SEED score

In [7]:
# SEED = 0.45*(score_demanda) + 0.40*(score_renta) + 0.15*(score_saturacion)

df_valid['SEED_score'] = (
    0.45 * df_valid['score_demanda'] +
    0.40 * df_valid['score_renta'] +
    0.15 * df_valid['saturation_norm']
)


🎯 FINAL SEED SCORE

🧮 Formula applied:
   SEED = 0.45*Demand + 0.40*Income + 0.15*Saturation

   Where Demand = 0.65*F-of-M + 0.10*Dependency + 0.25*Density

✅ SEED score calculated for 36,333 sections

📊 SEED score statistics:
   • Mean:        0.4101
   • Median:      0.3995
   • Std. dev.:   0.0656
   • Minimum:     0.2547
   • Maximum:     0.8379

📐 Percentiles:
   • P10: 0.3619
   • P25: 0.3795
   • P50: 0.3995
   • P75: 0.4216
   • P90: 0.4400
   • P95: 0.5277
   • P99: 0.7325


## Selection with spatial constraint (clustering)

In [8]:
def calcular_distancia_haversine(lat1, lon1, lat2, lon2):
    """
    Calculate the great-circle distance in km between two points using the Haversine formula.
    Inputs are in decimal degrees; lat2 and lon2 may be NumPy arrays.
    """
    # mean Earth radius in km
    R = 6371.0

    # convert degrees to radians
    lat1_rad = np.radians(lat1)
    lon1_rad = np.radians(lon1)
    lat2_rad = np.radians(lat2)
    lon2_rad = np.radians(lon2)

    # coordinate differences
    dlat = lat2_rad - lat1_rad
    dlon = lon2_rad - lon1_rad

    # haversine formula
    a = np.sin(dlat/2)**2 + np.cos(lat1_rad) * np.cos(lat2_rad) * np.sin(dlon/2)**2
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))

    return R * c


def seleccionar_con_clustering(df, num_residencias, distancia_min_km):
    """
    Greedily select locations while avoiding territorial overlap:
    1. sort by descending SEED score
    2. select the best remaining section
    3. remove all sections closer than distancia_min_km to it
    4. repeat until num_residencias is reached or no candidates remain
    """

    # sort by descending score (sections with a NaN score sort last)
    df_sorted = df.sort_values('SEED_score', ascending=False).reset_index(drop=True)

    seleccionadas = []
    coords_seleccionadas = []

    candidatos = df_sorted.copy()

    iteracion = 0

    while len(seleccionadas) < num_residencias and len(candidatos) > 0:
        iteracion += 1

        # select the best remaining section
        mejor = candidatos.iloc[0]
        seleccionadas.append(mejor)
        coords_seleccionadas.append((mejor['latitud'], mejor['longitud']))

        # log the first 10 iterations, then every 100th
        if iteracion % 100 == 0 or iteracion <= 10:
            print(f"   • Iteration {iteracion:4d}: Selected section {int(mejor['seccion_censal'])}, "
                  f"Score: {mejor['SEED_score']:.4f}, Candidates remaining: {len(candidatos):,}")

        # remove the selected section from the candidates
        candidatos = candidatos.iloc[1:].copy()

        if len(candidatos) == 0:
            break

        # distances from the selected section to every remaining candidate
        distancias = calcular_distancia_haversine(
            mejor['latitud'],
            mejor['longitud'],
            candidatos['latitud'].values,
            candidatos['longitud'].values
        )

        # keep only candidates at least distancia_min_km away
        mask_validas = distancias >= distancia_min_km
        candidatos = candidatos[mask_validas].reset_index(drop=True)

    return pd.DataFrame(seleccionadas)


# run the selection
df_seed_1000 = seleccionar_con_clustering(
    df_valid,
    num_residencias=NUM_RESIDENCIAS,
    distancia_min_km=DISTANCIA_MINIMA_KM
)

# add the ranking (1 = highest score)
df_seed_1000['ranking'] = range(1, len(df_seed_1000) + 1)


🗺️  SELECTION WITH SPATIAL CONSTRAINT

🔄 Starting iterative selection...
   • Minimum distance: 1.0 km
   • Target: 1000 residences
   • Iteration    1: Selected section 1503003001, Score: 0.8379, Candidates remaining: 36,333
   • Iteration    2: Selected section 3501602037, Score: 0.8318, Candidates remaining: 36,296
   • Iteration    3: Selected section 3605709011, Score: 0.8277, Candidates remaining: 36,274
   • Iteration    4: Selected section 2006906001, Score: 0.8184, Candidates remaining: 36,271
   • Iteration    5: Selected section 4625001009, Score: 0.8183, Candidates remaining: 36,248
   • Iteration    6: Selected section 5100101001, Score: 0.8180, Candidates remaining: 36,207
   • Iteration    7: Selected section 2906702017, Score: 0.8150, Candidates remaining: 36,190
   • Iteration    8: Selected section 2006906009, Score: 0.8109, Candidates remaining: 36,188
   • Iteration    9: Selec
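
One way to check that a selection respects the minimum-distance rule is to compute all pairwise Haversine distances and look at the smallest one. The sketch below is an assumption about how such a check could be written, not part of the original notebook: the helper names `pairwise_haversine_km` and `min_separation_km` are hypothetical, and the three test coordinates are made up (Madrid, Barcelona, and a point 0.01° north of Madrid).

```python
import numpy as np

def pairwise_haversine_km(lat, lon, radius_km=6371.0):
    """Return the full matrix of great-circle distances in km."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # broadcasting builds every (i, j) difference at once
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # clip guards against rounding slightly outside [0, 1]
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def min_separation_km(lat, lon):
    """Smallest distance between any two distinct points."""
    d = pairwise_haversine_km(lat, lon)
    np.fill_diagonal(d, np.inf)  # ignore each point's distance to itself
    return d.min()

# Hypothetical coordinates: Madrid, Barcelona, and a point 0.01 degrees north of Madrid.
lat = [40.4168, 41.3874, 40.4268]
lon = [-3.7038, 2.1686, -3.7038]
print(round(min_separation_km(lat, lon), 2))  # 1.11
```

Applied to `df_seed_1000['latitud']` and `df_seed_1000['longitud']`, the result should be at least `DISTANCIA_MINIMA_KM`. For 1,000 points the distance matrix has one million entries, which fits comfortably in memory.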

## Export CSV

In [9]:
# output directory
output_dir = Path('outputs')
output_dir.mkdir(exist_ok=True)

df_top50 = df_seed_1000.head(50).copy()

columnas_export = [
    'ranking', 'seccion_censal', 'SEED_score',
    'latitud', 'longitud', 'nombre_provincia',
    'f_of_m', 'density', 'dependence', 'rent', 'saturation',
    'score_demanda', 'score_renta', 'saturation_norm'
]

output_top50 = output_dir / 'SEED_top50_ubicaciones.csv'
df_top50[columnas_export].to_csv(output_top50, index=False, encoding='utf-8')

output_top1000 = output_dir / 'SEED_top1000_ubicaciones.csv'
df_seed_1000[columnas_export].to_csv(output_top1000, index=False, encoding='utf-8')


💾 EXPORTING RESULTS

✅ Top 50 exported: outputs/SEED_top50_ubicaciones.csv
   Size: 9.3 KB

✅ Top 1000 exported: outputs/SEED_top1000_ubicaciones.csv
   Size: 183.2 KB


## Map no.1: top 50

In [10]:
import folium

mapa_top50 = folium.Map(
    location=[40.4168, -3.7038],
    zoom_start=6,
    tiles='OpenStreetMap'
)

titulo_html = '''
             <div style="position: fixed; 
                         top: 10px; left: 50px; width: 550px; height: 100px; 
                         background-color: white; border:2px solid #2C3E50; z-index:9999; 
                         font-size:14px; padding: 15px; border-radius: 10px;">
             <h3 style="margin:0; color: #2C3E50;">🏆 Top 50 SEED Locations</h3>
             <p style="margin:5px 0; font-size: 13px;">Territorial optimization algorithm for senior living facilities</p>
             <p style="margin:0; font-size:12px; color:#666;">🥇 Top 10 | 🔵 Top 11-25 | 🟠 Top 26-50</p>
             </div>
             '''
mapa_top50.get_root().html.add_child(folium.Element(titulo_html))

for _, row in df_top50.iterrows():
    rank = int(row['ranking'])
    
    # color and style by ranking tier
    if rank <= 10:
        color = 'green'
        icon_type = 'star'
        emoji = '🥇'
    elif rank <= 25:
        color = 'blue'
        icon_type = 'home'
        emoji = '🔵'
    else:
        color = 'orange'
        icon_type = 'home'
        emoji = '🟠'
    
    popup_html = f"""
    <div style='font-family: Arial; font-size: 13px; min-width: 300px;'>
        <h3 style='margin:0 0 10px 0; color: #2C3E50;'>{emoji} Ranking #{rank}</h3>
        <hr style='margin: 10px 0; border: 1px solid #ddd;'>
        <table style='width:100%; border-collapse: collapse;'>
            <tr style='background-color: #f8f9fa;'>
                <td style='padding: 8px; font-weight: bold;'>Census section:</td>
                <td style='padding: 8px;'>{int(row['seccion_censal'])}</td>
            </tr>
            <tr>
                <td style='padding: 8px; font-weight: bold;'>SEED score:</td>
                <td style='padding: 8px;'><strong>{row['SEED_score']:.4f}</strong></td>
            </tr>
            <tr style='background-color: #f8f9fa;'>
                <td style='padding: 8px; font-weight: bold;'>Province:</td>
                <td style='padding: 8px;'>{row['nombre_provincia']}</td>
            </tr>
            <tr>
                <td style='padding: 8px; font-weight: bold;'>F-of-M:</td>
                <td style='padding: 8px;'>{row['f_of_m']:.4f}</td>
            </tr>
            <tr style='background-color: #f8f9fa;'>
                <td style='padding: 8px; font-weight: bold;'>Average income:</td>
                <td style='padding: 8px;'>{row['rent']:,.0f}€</td>
            </tr>
            <tr>
                <td style='padding: 8px; font-weight: bold;'>Coordinates:</td>
                <td style='padding: 8px;'>{row['latitud']:.4f}, {row['longitud']:.4f}</td>
            </tr>
        </table>
    </div>
    """
    
    folium.Marker(
        location=[row['latitud'], row['longitud']],
        popup=folium.Popup(popup_html, max_width=400),
        tooltip=f"{emoji} #{rank}: {row['nombre_provincia']} (SEED: {row['SEED_score']:.4f})",
        icon=folium.Icon(color=color, icon=icon_type, prefix='fa')
    ).add_to(mapa_top50)

output_mapa_top50 = output_dir / 'SEED_mapa_top50.html'
mapa_top50.save(str(output_mapa_top50))
mapa_top50


🗺️  GENERATING TOP 50 MAP

✅ Top 50 map generated: outputs/SEED_mapa_top50.html
   Size: 139.9 KB


## Map no.2: top 1000

In [11]:
mapa_top1000 = folium.Map(
    location=[40.4168, -3.7038],
    zoom_start=6,
    tiles='OpenStreetMap'
)

titulo_html_1000 = '''
             <div style="position: fixed; 
                         top: 10px; left: 50px; width: 500px; height: 90px; 
                         background-color: white; border:2px solid #2C3E50; z-index:9999; 
                         font-size:14px; padding: 15px; border-radius: 10px;">
             <h3 style="margin:0; color: #2C3E50;">🎯 1000 Optimal SEED Locations</h3>
             <p style="margin:5px 0; font-size: 13px;">Territorial distribution with spatial constraint (≥1km)</p>
             <p style="margin:0; font-size:12px; color:#666;">🔴 Each point = 1 optimal residence</p>
             </div>
             '''
mapa_top1000.get_root().html.add_child(folium.Element(titulo_html_1000))

for _, row in df_seed_1000.iterrows():
    # simplified popup
    popup_text = f"""
    <div style='font-family: Arial; font-size: 12px;'>
        <strong>Ranking #{int(row['ranking'])}</strong><br>
        Section: {int(row['seccion_censal'])}<br>
        Score: {row['SEED_score']:.4f}<br>
        Province: {row['nombre_provincia']}
    </div>
    """
    
    folium.CircleMarker(
        location=[row['latitud'], row['longitud']],
        radius=4,
        popup=folium.Popup(popup_text, max_width=250),
        tooltip=f"#{int(row['ranking'])}: {row['nombre_provincia']}",
        color='darkred',
        fill=True,
        fillColor='red',
        fillOpacity=0.7,
        weight=2
    ).add_to(mapa_top1000)

output_mapa_top1000 = output_dir / 'SEED_mapa_top1000.html'
mapa_top1000.save(str(output_mapa_top1000))
mapa_top1000


🗺️  GENERATING TOP 1000 MAP

✅ Top 1000 map generated: outputs/SEED_mapa_top1000.html
   Size: 1404.3 KB
