# Clustering algorithm

Import the packages needed to read the datasets from disk, and set the base path of the datasets from the `SAMPLES_DIR` environment variable.

In [1]:
import pandas as pd
import pathlib
import os

base_path = pathlib.Path(os.environ.get("SAMPLES_DIR", ""))
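Because the default for `SAMPLES_DIR` is an empty string, `base_path` falls back to the current working directory when the variable is unset, so dataset paths are then resolved relative to wherever the notebook runs. A minimal sketch of that behaviour follows; the helper `resolve_dataset_path` and the directory `/data/samples` are hypothetical and appear only for illustration.

```python
import os
import pathlib


def resolve_dataset_path(env=None):
    """Build the dataset path the same way the notebook does.

    `env` is a hypothetical parameter that lets us pass a plain dict
    instead of the real process environment.
    """
    env = os.environ if env is None else env
    base_path = pathlib.Path(env.get("SAMPLES_DIR", ""))
    return base_path / "clustering" / "G_140_AMELIA_zoning.csv"


# Variable unset: the path is relative to the current working directory.
unset_path = resolve_dataset_path({})
# Variable set (made-up directory): the path is anchored at that directory.
set_path = resolve_dataset_path({"SAMPLES_DIR": "/data/samples"})

print(unset_path)
print(set_path)
```

If the notebook is started from a different directory, setting `SAMPLES_DIR` explicitly avoids a `FileNotFoundError` when the CSV is read.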

Read the input dataset, which contains all the data for the clustering problem. The file uses `;` as the field separator.

In [9]:
clustering_df = pd.read_csv(
    base_path / "clustering" / "G_140_AMELIA_zoning.csv",
    sep=';')
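The `Unnamed: 0` column that shows up in the previews is what pandas produces when the first header field of a CSV is empty, which typically happens when a DataFrame was saved together with its index. The sketch below reproduces this on a small made-up sample (the rows and the names `TOWN_A`, `TOWN_B` are not from the real file) and shows that `index_col=0` is one way to read that column as the index instead. Whether the real file was written this way is an assumption.

```python
import io

import pandas as pd

# Made-up sample: ';' as separator and an empty first header field.
sample_csv = io.StringIO(
    ";ID;popolazion;density\n"
    "0;TOWN_A;10.5;0.05\n"
    "1;TOWN_B;;0.02\n"
)

# Default read: the empty header field becomes a column named "Unnamed: 0".
with_unnamed = pd.read_csv(sample_csv, sep=";")

# Rewind the buffer and read again, using the first column as the index.
sample_csv.seek(0)
with_index = pd.read_csv(sample_csv, sep=";", index_col=0)

print(list(with_unnamed.columns))
print(list(with_index.columns))
```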

In [10]:
clustering_df[clustering_df['popolazion'].notna()].head()

Unnamed: 0,ID,popolazion,geometry,usable,beds,personnel,density
0,ARCONATE,68.24,"POLYGON ((487010.785388 5044903.977341, 487010...",,,,0.081721
1,MAGNAGO,95.08,"POLYGON ((484328.912806 5049867.38367, 484328....",,,,0.083897
2,INVERUNO,84.44,"POLYGON ((485634.28175 5041541.041653, 485634....",,,,0.06929
3,BUSCATE,46.8,"POLYGON ((485631.5327 5041538.040956, 485631.5...",,,,0.060276
4,BUSTO GAROLFO,140.42,"POLYGON ((490165.729427 5045231.428402, 490165...",,,,0.109699


In [11]:
clustering_df[clustering_df['personnel'].notna()].head()

Unnamed: 0,ID,popolazion,geometry,usable,beds,personnel,density
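The two previews above filter on one column at a time. The second one returns only the header, so no row in the dataset has a `personnel` value. Counting non-missing values per column gives the same information for every column at once. The sketch below uses a small made-up frame with some of the same column names, since the real file is not reproduced here.

```python
import pandas as pd

# Made-up rows; only the column names mirror the dataset.
toy_df = pd.DataFrame(
    {
        "ID": ["TOWN_A", "TOWN_B", "TOWN_C"],
        "popolazion": [10.5, None, 7.0],
        "personnel": [None, None, None],
    }
)

# Number of non-missing values in each column.
non_missing = toy_df.notna().sum()
print(non_missing)

# Columns that contain no data at all.
empty_columns = non_missing[non_missing == 0].index.tolist()
print(empty_columns)
```

On the real data, the same two lines can be applied to `clustering_df` directly.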


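Run the clustering algorithm on the dataset with `k_min=4` and `k_max=6`, using three distances defined on the `popolazion`, `density` and `geometry` columns, then preview the dataframe prepared for export.

In [2]: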
from libadalina_core.sedona_utils import EPSGFormats
from clustering.algorithms import clustering_algorithm
from clustering.models.adalina_zoning_distance import ClusteringDistance

clustering_solution = clustering_algorithm(
    clustering_df,
    epsg=EPSGFormats.EPSG32632,
    k_min=4,
    k_max=6,
    distances=[
        ClusteringDistance(name="popolazion", weight=2, function="chebyshev"),
        ClusteringDistance(name="density", function="euclidean"),
        ClusteringDistance(name="geometry"),
    ],
)

clustering_solution.get_dataframe_to_export().head()

Running HiGHS 1.11.0 (git hash: n/a): Copyright (c) 2025 HiGHS under MIT licence terms
MIP  has 4331 rows; 590 cols; 13367 nonzeros; 590 integer variables (590 binary)
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [2e-02, 4e-01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 6e+00]
Coefficient ranges:
  Matrix [1e+00, 1e+00]
  Cost   [2e-02, 4e-01]
  Bound  [1e+00, 1e+00]
  RHS    [1e+00, 6e+00]
Presolving model
Presolve: Infeasible

Src: B => Branching; C => Central rounding; F => Feasibility pump; J => Feasibility jump;
     H => Heuristic; L => Sub-MIP; P => Empty MIP; R => Randomized rounding; Z => ZI Round;
     I => Shifting; S => Solve LP; T => Evaluate node; U => Unbounded; X => User solution;
     z => Trivial zero; l => Trivial lower; u => Trivial upper; p => Trivial point

        Nodes      |    B&B Tree     |            Objective Bounds              |  Dynamic Constraints |       Work      
Src  Proc. InQueue |  Leaves   Expl. | BestBound       BestSol              Gap | 

Unnamed: 0,ID,popolazion,geometry,usable,beds,personnel,density,zoning_Adalina
0,ROBECCHETTO CON INDUNO,47.95,"POLYGON ((480748.825 5043027.556, 480748.825 5...",,,,0.034325,6
1,NOSATE,6.44,"POLYGON ((477387.977 5043229.078, 477387.977 5...",,,,0.013081,18
2,BERNATE TICINO,29.41,"POLYGON ((487458.22 5036059.092, 487458.22 503...",,,,0.024243,10
3,CUGGIONO,80.79,"POLYGON ((486991.876 5038856.062, 486991.876 5...",,,,0.054359,6
4,TURBIGO,71.06,"POLYGON ((480748.826 5043027.557, 480748.826 5...",,,,0.083216,6
