<a href="https://colab.research.google.com/github/franciscogarate/cdiae/blob/main/notebooks/5_Concentracion_500m_California.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Computing the central point with the highest geographic concentration

In [1]:
!git clone https://github.com/franciscogarate/cdiae

Cloning into 'cdiae'...
remote: Enumerating objects: 94, done.
remote: Counting objects: 100% (94/94), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 94 (delta 41), reused 60 (delta 19), pack-reused 0 (from 0)
Receiving objects: 100% (94/94), 3.46 MiB | 10.57 MiB/s, done.
Resolving deltas: 100% (41/41), done.


In [2]:
import pandas as pd
import numpy as np
from scipy.spatial.distance import cdist

Load the data:

In [3]:
df = pd.read_feather('cdiae/data/03_model_input/california_housing_clean.ftr')
df.head()

Unnamed: 0,MedInc,HouseAge,AveRooms,AveBedrms,Population,AveOccup,Latitude,Longitude,target
0,7.2574,52.0,8.288136,1.073446,496.0,2.80226,37.85,-122.24,3.521
1,5.6431,52.0,5.817352,1.073059,558.0,2.547945,37.85,-122.25,3.413
2,3.8462,52.0,6.281853,1.081081,565.0,2.181467,37.85,-122.25,3.422
3,4.0368,52.0,4.761658,1.103627,413.0,2.139896,37.85,-122.25,2.697
4,3.6591,52.0,4.931907,0.951362,1094.0,2.128405,37.84,-122.25,2.992


Build a coordinate matrix (lat, lon):

In [4]:
coords = df[['Latitude', 'Longitude']].to_numpy()

Compute the pairwise distance matrix, in degrees:

In [5]:
distancias = cdist(coords, coords, metric='euclidean')
distancias

array([[0.        , 0.01      , 0.01      , ..., 1.88063819, 1.82833257,
        1.81945047],
       [0.01      , 0.        , 0.        , ..., 1.88608059, 1.83338485,
        1.82496575],
       [0.01      , 0.        , 0.        , ..., 1.88608059, 1.83338485,
        1.82496575],
       ...,
       [1.88063819, 1.88608059, 1.88608059, ..., 0.        , 0.1       ,
        0.06324555],
       [1.82833257, 1.83338485, 1.83338485, ..., 0.1       , 0.        ,
        0.1       ],
       [1.81945047, 1.82496575, 1.82496575, ..., 0.06324555, 0.1       ,
        0.        ]])
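Euclidean distance on raw (lat, lon) pairs treats a degree of longitude as if it were as long as a degree of latitude, which is not the case away from the equator. As a point of comparison, the following is a minimal sketch of a great-circle (haversine) distance matrix in metres. The function name `haversine_matrix`, the constant `EARTH_RADIUS_M`, and the spherical-Earth assumption are illustrative choices and not part of the original notebook.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical-Earth assumption)

def haversine_matrix(coords_deg):
    """Pairwise great-circle distances in metres.

    coords_deg: array of shape (n, 2) holding (latitude, longitude) in degrees.
    """
    lat = np.radians(coords_deg[:, 0])[:, np.newaxis]  # column vector of latitudes
    lon = np.radians(coords_deg[:, 1])[:, np.newaxis]  # column vector of longitudes
    dlat = lat - lat.T  # all pairwise latitude differences
    dlon = lon - lon.T  # all pairwise longitude differences
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point excursions outside [0, 1]
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Three coordinate pairs taken from the first rows of the table shown earlier
sample = np.array([[37.85, -122.24], [37.85, -122.25], [37.84, -122.25]])
dist_m = haversine_matrix(sample)
```

At this latitude, 0.01 degrees of longitude comes out at roughly 880 m, while 0.01 degrees of latitude is roughly 1,110 m, so the same difference in degrees can mean noticeably different ground distances. Like `cdist`, this sketch builds a dense n × n matrix.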

Flag every pair of districts that lie less than 500 metres apart, then sum the `target` value over each district's neighbours:

In [6]:
matrix_cumulos_500m = distancias < 500 / 100000  # Approximate 500 metres as 0.005 degrees (assumes 1 degree ≈ 100 km)
matrix_cumulos_500m

array([[ True, False, False, ..., False, False, False],
       [False,  True,  True, ..., False, False, False],
       [False,  True,  True, ..., False, False, False],
       ...,
       [False, False, False, ...,  True, False, False],
       [False, False, False, ..., False,  True, False],
       [False, False, False, ..., False, False,  True]])
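The threshold `500 / 100000` treats one degree as 100 km in every direction. The sketch below shows how the same 500 m radius could be converted to degrees separately for latitude and longitude. The helper `radius_in_degrees` and the spherical-Earth radius are assumptions made for illustration, not something the notebook defines.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical-Earth assumption)
METERS_PER_DEG_LAT = math.pi * EARTH_RADIUS_M / 180  # about 111.2 km per degree

def radius_in_degrees(radius_m, latitude_deg):
    """Return (delta_lat, delta_lon) in degrees spanning radius_m at latitude_deg."""
    delta_lat = radius_m / METERS_PER_DEG_LAT
    # A degree of longitude shrinks with the cosine of the latitude
    delta_lon = radius_m / (METERS_PER_DEG_LAT * math.cos(math.radians(latitude_deg)))
    return delta_lat, delta_lon

# 37.78 is the latitude of the point reported at the end of this notebook
delta_lat, delta_lon = radius_in_degrees(500, 37.78)
```

Under these assumptions 500 m corresponds to about 0.0045 degrees of latitude and about 0.0057 degrees of longitude near San Francisco. The rows displayed earlier have coordinates with two decimals. If that holds for the whole file, any two distinct locations are at least 0.01 degrees apart, so a 0.005 threshold effectively groups only districts that share exactly the same coordinates.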

In [7]:
matrix_capitales = matrix_cumulos_500m * df['target'].values[np.newaxis, :]
matrix_capitales

array([[3.521, 0.   , 0.   , ..., 0.   , 0.   , 0.   ],
       [0.   , 3.413, 3.422, ..., 0.   , 0.   , 0.   ],
       [0.   , 3.413, 3.422, ..., 0.   , 0.   , 0.   ],
       ...,
       [0.   , 0.   , 0.   , ..., 0.923, 0.   , 0.   ],
       [0.   , 0.   , 0.   , ..., 0.   , 0.847, 0.   ],
       [0.   , 0.   , 0.   , ..., 0.   , 0.   , 0.894]])

In [8]:
capital_concentrado = matrix_capitales.sum(axis=1)
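To make the two previous steps easier to follow, here is a toy version with three districts. Multiplying the boolean mask by the values (broadcast along the rows) and summing each row gives the same result as a single matrix-vector product, which avoids building the intermediate n × n float matrix. The arrays `mask` and `values` are made-up stand-ins that mimic the first rows of the outputs shown above.

```python
import numpy as np

# Toy neighbourhood mask: districts 1 and 2 share a location, district 0 is alone
mask = np.array([[True, False, False],
                 [False, True, True],
                 [False, True, True]])
values = np.array([3.521, 3.413, 3.422])  # one value per district

# Step used in the notebook: broadcast the values across rows, then sum each row
weighted = mask * values[np.newaxis, :]
row_totals = weighted.sum(axis=1)

# Equivalent formulation: boolean mask times value vector
row_totals_matmul = mask @ values
```

Both `row_totals` and `row_totals_matmul` hold one total per district: the district's own value plus the values of every neighbour inside the radius.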

Find the index of the point with the highest concentration:

In [9]:
indice_max = np.argmax(capital_concentrado)
punto_central = df.iloc[indice_max]
punto_central

Unnamed: 0,12334
MedInc,2.8569
HouseAge,52.0
AveRooms,4.703316
AveBedrms,1.146597
Population,1243.0
AveOccup,2.169284
Latitude,37.78
Longitude,-122.44
target,3.722


Coordinates, total capital, and district count for the point with the highest concentration:

In [10]:
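# Double quotes outside, single quotes inside: nesting the same quote type in an f-string fails before Python 3.12
print(f"Latitude: {punto_central['Latitude']}, Longitude: {punto_central['Longitude']}")
# If `target` follows the scikit-learn convention (units of $100,000), multiply by 100_000 to get dollars
print(f"Total capital within a 500 m radius: ${capital_concentrado[indice_max]:,.0f}")
print(f"Number of districts in the cluster: {matrix_cumulos_500m[indice_max].sum()}")

Latitude: 37.78, Longitude: -122.44
Total capital within a 500 m radius: $39
Number of districts in the cluster: 11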
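The approach above stores a dense n × n distance matrix, which grows quadratically with the number of districts. As a possible alternative, the sketch below uses `scipy.spatial.KDTree` to look up the neighbours of each point directly. The function `concentration_by_point` and the toy arrays are hypothetical and only illustrate the idea. Note one difference: `query_ball_point` includes points at distance exactly equal to the radius, whereas the notebook uses a strict `<` comparison.

```python
import numpy as np
from scipy.spatial import KDTree

def concentration_by_point(coords, values, radius):
    """Sum of values and neighbour count within `radius` of each point.

    coords: (n, 2) array; distances are Euclidean in the units of coords.
    values: (n,) array with one value per point.
    """
    tree = KDTree(coords)
    # One list of neighbour indices per point (each point is its own neighbour)
    neighbours = tree.query_ball_point(coords, r=radius)
    totals = np.array([values[idx].sum() for idx in neighbours])
    counts = np.array([len(idx) for idx in neighbours])
    return totals, counts

# Toy data modelled on the first rows of the table shown earlier
toy_coords = np.array([[37.85, -122.24], [37.85, -122.25],
                       [37.85, -122.25], [37.84, -122.25]])
toy_values = np.array([3.521, 3.413, 3.422, 2.697])

totals, counts = concentration_by_point(toy_coords, toy_values, radius=0.005)
best = int(np.argmax(totals))  # index of the point with the largest total
```

With the notebook's data, the call would presumably look like `concentration_by_point(coords, df['target'].to_numpy(), 0.005)`, after which `np.argmax` plays the same role as before.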
