<a href="https://colab.research.google.com/github/mark76jx17/BIG-DATA-project-/blob/main/BIG_DATA_project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Task
Download Points of Interest (POIs) for Health, Education, Food, Security, Public Services, and Sports in Pavia and Cagliari using `osmnx`, calculate H3 indices for these locations to aggregate service counts per hexagonal cell, and create an interactive KeplerGL map to visualize the density of basic services in both cities.

In [1]:
from google.colab import output
output.enable_custom_widget_manager()

## Install Libraries
Install the required Python libraries (`osmnx`, `h3`, `geopandas`, and `keplergl`) to handle OpenStreetMap data, spatial indexing, and visualization.


In [None]:
!pip install osmnx h3 geopandas keplergl

Collecting osmnx
  Downloading osmnx-2.0.7-py3-none-any.whl.metadata (4.9 kB)
Collecting h3
  Downloading h3-4.4.1-cp312-cp312-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl.metadata (18 kB)
Collecting keplergl
  Downloading keplergl-0.3.7.tar.gz (18.4 MB)

## Download OSM Data


Download Points of Interest (POIs) for Pavia and Cagliari using `osmnx`, selecting features by their OSM `amenity` and `leisure` tags.


In [None]:
import osmnx as ox
import pandas as pd
import geopandas as gpd
import warnings

# Suppress OSMnx UserWarnings regarding query area size to prevent stderr output
warnings.filterwarnings("ignore", category=UserWarning, module="osmnx")

# Define locations
locations = ['Pavia, Italy', 'Cagliari, Italy']

# Define tags
tags = {
    'amenity': [
        'hospital', 'clinic', 'doctors', 'pharmacy',
        'school', 'university', 'kindergarten', 'college', 'library',
        'restaurant', 'cafe', 'fast_food', 'bar', 'pub',
        'police', 'fire_station', 'post_office', 'townhall', 'courthouse'
    ],
    'leisure': [
        'sports_centre', 'pitch', 'stadium', 'swimming_pool'
    ]
}

# Download POIs
gdf_pois = ox.features.features_from_place(locations, tags)

# Verify the data
print(f"Shape of GeoDataFrame: {gdf_pois.shape}")
display(gdf_pois.head())

## Process Data and Calculate H3 Indices
OSM features can be points or polygons (for example, building outlines), so reduce each POI geometry to its centroid and then calculate the H3 index of that point at resolution 9.


In [None]:
import h3
import warnings

# Suppress the GeoPandas UserWarning about centroids on a geographic CRS.
# Centroids are computed directly in lat/lng here; reprojecting first would be more exact.
warnings.filterwarnings("ignore", message="Geometry is in a geographic CRS")

# Create a copy to avoid modifying the original dataframe
gdf_h3 = gdf_pois.copy()

# Convert geometries to centroids
gdf_h3['geometry'] = gdf_h3.geometry.centroid

# Extract latitude and longitude
gdf_h3['lat'] = gdf_h3.geometry.y
gdf_h3['lng'] = gdf_h3.geometry.x

# Calculate H3 index at resolution 9
resolution = 9
gdf_h3['h3_index'] = gdf_h3.apply(lambda row: h3.latlng_to_cell(row['lat'], row['lng'], resolution), axis=1)

# Verify the calculation
display(gdf_h3[['geometry', 'lat', 'lng', 'h3_index']].head())

## Aggregate Service Counts

Group the POI data by their H3 cell index to calculate the total number of services available in each hexagonal unit.


In [None]:
# Group by H3 index and count services
df_counts = gdf_h3.groupby('h3_index').size().reset_index(name='service_count')

# Verify the aggregated data
print(f"Number of unique H3 cells: {df_counts.shape[0]}")
display(df_counts.head())
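
The counts can also be read as an approximate density. The sketch below repeats the same `groupby(...).size()` pattern on a toy frame and divides by the average area of a resolution-9 cell, taken here as roughly 0.105 km². The cell IDs, the `services_per_km2` column name, and the rounded area constant are assumptions for illustration; real cells vary slightly in area.

```python
import pandas as pd

# Approximate average area of a resolution-9 H3 cell in km^2 (rounded assumption).
RES9_AREA_KM2 = 0.105

# Toy stand-in for gdf_h3: one row per POI with its cell ID (IDs are made up).
pois = pd.DataFrame({"h3_index": ["cell_a", "cell_a", "cell_b", "cell_a", "cell_c"]})

# Same aggregation as above: one row per cell with the number of POIs in it.
counts = pois.groupby("h3_index").size().reset_index(name="service_count")

# Convert the raw count into services per square kilometre.
counts["services_per_km2"] = (counts["service_count"] / RES9_AREA_KM2).round(1)
print(counts)
```

Because all cells at one resolution have nearly the same area, the ranking of cells is unchanged; the density column mainly helps when comparing against figures computed at a different resolution.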

## Generate Interactive Map

Create an interactive KeplerGL map to visualize the density of basic services per H3 cell for both cities.


In [None]:
from keplergl import KeplerGl

# Initialize KeplerGl map
map_1 = KeplerGl(height=600)

# Add data to the map
map_1.add_data(data=df_counts, name='Service Density')

# Save the map to an HTML file
map_1.save_to_html(file_name='services_map.html')

# Display the map
map_1

In [None]:
from google.colab import files
files.download('services_map.html')


### Data Analysis Key Findings

*   **Data Retrieval**: Downloaded Points of Interest (POIs) for the Health, Education, Food, Security, Public Services, and Sports categories in Pavia and Cagliari. The resulting dataset contained **6,848** POIs.
*   **Spatial Indexing**: Converted POI geometries to centroids and calculated H3 spatial indices at **resolution 9** for every location.
*   **Aggregation**: Aggregated the service data based on spatial indices, resulting in **2,673** unique hexagonal cells containing at least one service.
*   **Visualization Output**: Generated an interactive KeplerGL map (`services_map.html`) visualizing the density of basic services (`service_count`) across the hexagonal grid.

### Insights or Next Steps

*   **Urban Density Analysis**: The map shows the number of services in each hexagonal cell, which makes it easy to spot clusters of essential services and underserved zones in both Pavia and Cagliari. This gives a first, rough indication of "15-minute city" potential, since it counts services per cell and does not measure walking distance or travel time.
*   **Granular Filtering**: A recommended next step is to aggregate separately by category (e.g., "Health" versus "Sports") to analyze the distribution of specific infrastructure types rather than only the total service density.
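
The per-category aggregation suggested above could be sketched as follows. The `CATEGORY_BY_TAG` mapping is hypothetical: it assigns each tag value from the download step to one of the six categories named in the task, and some assignments (for example `library` under Education or `fire_station` under Security) are judgment calls. The toy frame stands in for `gdf_h3`, and its cell IDs are made up.

```python
import pandas as pd

# Hypothetical mapping from OSM tag values to the six task categories.
CATEGORY_BY_TAG = {
    "hospital": "Health", "clinic": "Health", "doctors": "Health", "pharmacy": "Health",
    "school": "Education", "university": "Education", "kindergarten": "Education",
    "college": "Education", "library": "Education",
    "restaurant": "Food", "cafe": "Food", "fast_food": "Food", "bar": "Food", "pub": "Food",
    "police": "Security", "fire_station": "Security",
    "post_office": "Public Services", "townhall": "Public Services",
    "courthouse": "Public Services",
    "sports_centre": "Sports", "pitch": "Sports", "stadium": "Sports",
    "swimming_pool": "Sports",
}

# Toy stand-in for gdf_h3: the tag column that did not match is missing for that row.
pois = pd.DataFrame({
    "h3_index": ["cell_a", "cell_a", "cell_a", "cell_b", "cell_b"],
    "amenity": ["pharmacy", "cafe", None, "school", "bar"],
    "leisure": [None, None, "pitch", None, None],
})

# Prefer the amenity value and fall back to leisure when amenity is missing.
tag_value = pois["amenity"].fillna(pois["leisure"])
pois["category"] = tag_value.map(CATEGORY_BY_TAG)

# One row per cell, one column per category, values are POI counts.
per_category = pd.crosstab(pois["h3_index"], pois["category"])
print(per_category)
```

A POI carrying both an `amenity` and a `leisure` tag is counted once under its `amenity` category in this sketch; a different tie-breaking rule may suit the analysis better.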
