# modDR-Framework example: UCI-485 Travel Review Ratings

In this example, the moddr framework is demonstrated using the ‘UCI-485 Travel Review Ratings’ dataset from the UCI Machine Learning Repository. Of the datasets examined, this one has the largest number of data points and a clearly recognisable global structure in the UMAP visualisation. With respect to the selected features, however, the data points in the UMAP visualisation are not arranged according to their neighbourhoods. The dataset is therefore suited to examining the applicability of the framework to larger datasets.


The example is intended solely to demonstrate the functions of the framework. The features used to model the similarity measure were chosen on visual criteria alone, not on content-related grounds.

Details about the dataset can be found at https://archive.ics.uci.edu/dataset/485/tarvel+review+ratings.

The moddr package is required for execution (installed locally from https://github.com/kohaupt/modDR or as a PyPI package). Installation and usage instructions can be found in the README.

## Imports

In [None]:
# may require additional installations of the packages pandas, scikit-learn, ucimlrepo
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler
from ucimlrepo import fetch_ucirepo

import moddr

## Data exploration

The following steps show the initial visualisation of the dataset using the DR method (here, UMAP). The modelled property is represented by the colour of the data points, which encodes the first principal component of a PCA over the selected features. The selection of features has no content-related significance; it merely illustrates how the neighbourhood structure of these features differs from the positioning produced by the DR method.

In [4]:
# fetch dataset 485 (Travel Review Ratings) from the UCI repository
travel_review_ratings = fetch_ucirepo(id=485)
X = travel_review_ratings.data.features.copy()

# preprocess the "local services" column as it contains whitespace and non-numeric values
X["local services"] = X["local services"].str.replace(r"\s+", "", regex=True)
X["local services"] = X["local services"].astype("float")
X.dropna(inplace=True)

pd.set_option("display.max_columns", None)
X.describe()

Unnamed: 0,churches,resorts,beaches,parks,theatres,museums,malls,zoos,restaurants,pubs/bars,local services,burger/pizza shops,hotels/other lodgings,juice bars,art galleries,dance clubs,swimming pools,gyms,bakeries,beauty & spas,cafes,view points,monuments,gardens
count,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0,5454.0
mean,1.455746,2.320048,2.489059,2.797103,2.958904,2.893423,3.351476,2.541177,3.126542,2.832695,2.549622,2.078401,2.12582,2.190429,2.20614,1.19271,0.949349,0.822525,0.96925,0.999626,0.965275,1.749345,1.531051,1.56057
std,0.827732,1.421576,1.247503,1.309188,1.338785,1.282101,1.413291,1.111398,1.356774,1.307299,1.381498,1.249315,1.406682,1.576505,1.715848,1.107176,0.973628,0.948015,1.202883,1.193129,0.928326,1.597816,1.31618,1.171784
min,0.0,0.0,0.0,0.83,1.12,1.11,1.12,0.86,0.84,0.81,0.78,0.78,0.77,0.76,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.92,1.36,1.54,1.73,1.77,1.79,1.93,1.62,1.8,1.64,1.58,1.29,1.19,1.03,0.86,0.69,0.58,0.53,0.52,0.54,0.57,0.74,0.79,0.88
50%,1.34,1.91,2.06,2.46,2.67,2.68,3.23,2.17,2.8,2.68,2.0,1.69,1.61,1.49,1.33,0.8,0.74,0.69,0.69,0.69,0.76,1.03,1.07,1.29
75%,1.81,2.6875,2.74,4.0975,4.31,3.8375,5.0,3.19,5.0,3.5275,3.2175,2.2875,2.36,2.74,4.44,1.16,0.91,0.84,0.86,0.86,1.0,2.07,1.56,1.66
max,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0,5.0


In [5]:
# chosen features
sim_features = ["beaches", "cafes", "local services", "dance clubs"]

In [6]:
reference_embedding = moddr.processing.dimensionality_reduction_umap(X, n_neighbors=15)

In [7]:
# scale the chosen features to [0, 1] so that no single feature dominates;
# MinMaxScaler works column-wise, so all features can be scaled in one call
scaler = MinMaxScaler()
X_sim_scaled = pd.DataFrame(
    scaler.fit_transform(X[sim_features]), columns=sim_features, index=X.index
)

sim_features_reduced = PCA(n_components=1).fit_transform(X_sim_scaled)
feat_labels = {i: sim_features_reduced[i] for i in range(len(sim_features_reduced))}
reference_embedding.labels = feat_labels
reference_embedding.obj_id = 0
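
To make the scaling step concrete, the following minimal sketch applies the min-max formula `(x - min) / (max - min)` by hand to the minimum, median and maximum values reported by `X.describe()` above. The helper name `min_max_scale` is hypothetical; it only mirrors what `MinMaxScaler` computes per column with its default target range of [0, 1].

```python
# Summary statistics copied from the X.describe() output above: (min, median, max).
stats = {
    "beaches": (0.0, 2.06, 5.0),
    "cafes": (0.0, 0.76, 5.0),
    "local services": (0.78, 2.0, 5.0),
    "dance clubs": (0.0, 0.8, 5.0),
}


def min_max_scale(value, lo, hi):
    """Hypothetical helper: map a value from [lo, hi] to [0, 1]."""
    return (value - lo) / (hi - lo)


# Scale the median of every chosen feature into the common [0, 1] range.
scaled_medians = {
    name: min_max_scale(median, lo, hi) for name, (lo, median, hi) in stats.items()
}

for name, value in scaled_medians.items():
    print(f"{name}: {value:.3f}")
```

Because "local services" starts at 0.78 rather than 0, its scaled values are shifted as well as compressed, which is why the scaling is applied before the PCA.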

## Modification via moddr-framework

The initial positioning is now modified using the framework. The Fruchterman-Reingold (FR), Kamada-Kawai (KK) and MDS layout methods are applied in succession. Apart from the layout method and its layout-specific parameters (iteration counts for FR, balance factors for KK and MDS), the same parameters are used for all test series (see below). Due to space constraints, outputs are omitted in the example notebooks and can be generated by local execution.

In [6]:
community_resolutions = [0.0001, 0.001, 0.005, 0.05]
n_neighbors = 15
dr_method = "UMAP"
graph_method = "DR"
boundary_neighbors = False
layout_params_fr = [10, 100, 500, 1000]
layout_params_kk_mds = [0.2, 0.4, 0.6, 0.8, 1.0]
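
Judging by the report tables in the sections that follow, each modified embedding appears to combine one community resolution with one layout parameter, with the resolution varying slowest. Under that assumption (inferred from the reports, not taken from the moddr documentation), the size of each test series can be sketched as follows; the names `grid_fr` and `grid_kk_mds` are illustrative only.

```python
from itertools import product

community_resolutions = [0.0001, 0.001, 0.005, 0.05]
layout_params_fr = [10, 100, 500, 1000]
layout_params_kk_mds = [0.2, 0.4, 0.6, 0.8, 1.0]

# Assumed ordering: resolution-major, layout parameter varies fastest.
grid_fr = list(product(community_resolutions, layout_params_fr))
grid_kk_mds = list(product(community_resolutions, layout_params_kk_mds))

print(len(grid_fr))      # number of FR combinations
print(len(grid_kk_mds))  # number of KK (or MDS) combinations
print(grid_fr[4])        # fifth FR combination (resolution, iterations)
```

Counting the unmodified reference embedding (`obj_id` 0) separately, the index of a combination in the grid plus one would then correspond to the `obj_id` shown in the reports.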

### Modification via Fruchterman-Reingold

In [10]:
mod_embeddings_fr = moddr.processing.run_pipeline(
    data=X,
    sim_features=sim_features,
    dr_method=dr_method,
    dr_param_n_neighbors=n_neighbors,
    graph_method=graph_method,
    community_resolutions=community_resolutions,
    layout_method="FR",
    boundary_neighbors=boundary_neighbors,
    layout_params=layout_params_fr,
    compute_metrics=True,
    verbose=True,
)

------------------------------------------------------------
Start moddr pipeline with the following parameters:
Similarity Features: ['beaches', 'cafes', 'local services', 'dance clubs']
Dimensionality Reduction Method: UMAP with 15 neighbors
Graph Construction Method: DR
Community Detection Resolutions: [0.0001, 0.001, 0.005, 0.05]
Layout Method: FR
Boundary Neighbors: False
Layout Parameters: [10, 100, 500, 1000]
Compute Metrics: True

INFO: Inverting distances via 1 - distances, as normalization is applied.
------------------------------------------------------------
Computing communities via Leiden detection for embedding 0: `UMAP (n_neigbors: 15, min_dist: 1.0)' with resolution '0.0001'.
Computation finished after 0.42 seconds.
Found 14 communities.
------------------------------------------------------------
------------------------------------------------------------
Compute new positions for embedding: `UMAP (n_neigbors: 15, min_dist: 1.0), Leiden (resolution: 0.0001)'.
Start 

Detailed overview of all iterations

In [11]:
moddr.evaluation.create_report(mod_embeddings_fr, metadata=True, metrics=True)

Unnamed: 0,obj_id,dr_method,dr_params,k_neighbors,com_detection,com_detection_params,layout_method,layout_params,trustworthiness,continuity,rnx,sim_stress,sim_stress_com,sim_stress_com_diff,rank_score,distance_score,total_score
0,0,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,,{},,{},0.971482,0.985217,0.576057,0.565257,0.565257,0.0,0.844252,0.467371,0.655812
1,1,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},FR,"{'iterations': 10, 'boundary_neighbors': False}",0.961251,0.983067,0.525978,0.592935,0.61231,-0.033002,0.823432,0.461783,0.642608
2,2,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},FR,"{'iterations': 100, 'boundary_neighbors': False}",0.978842,0.983474,0.582114,0.594252,0.602322,-0.042989,0.848143,0.463622,0.655882
3,3,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},FR,"{'iterations': 500, 'boundary_neighbors': False}",0.980541,0.983457,0.590389,0.594183,0.6049,-0.040411,0.851462,0.463012,0.657237
4,4,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},FR,"{'iterations': 1000, 'boundary_neighbors': False}",0.980563,0.983469,0.591222,0.594169,0.604891,-0.04042,0.851752,0.46302,0.657386
5,5,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},FR,"{'iterations': 10, 'boundary_neighbors': False}",0.988014,0.98291,0.622459,0.567121,0.609479,-0.021107,0.864461,0.471716,0.668089
6,6,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},FR,"{'iterations': 100, 'boundary_neighbors': False}",0.992871,0.982239,0.677798,0.567328,0.617733,-0.012853,0.884303,0.469549,0.676926
7,7,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},FR,"{'iterations': 500, 'boundary_neighbors': False}",0.993308,0.982185,0.681243,0.567376,0.616024,-0.014562,0.885579,0.469953,0.677766
8,8,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},FR,"{'iterations': 1000, 'boundary_neighbors': False}",0.993417,0.982196,0.68193,0.567403,0.616344,-0.014242,0.885848,0.469859,0.677853
9,9,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.005},FR,"{'iterations': 10, 'boundary_neighbors': False}",0.981187,0.983901,0.619247,0.564289,0.610432,-0.021214,0.861445,0.473159,0.667302
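
The composite columns of the report can be cross-checked against the individual metrics. The sketch below reproduces `rank_score`, `distance_score` and `total_score` for rows 0 and 8 of the table above. The formulas are an assumption inferred from the displayed numbers, not taken from the moddr documentation: `rank_score` as the mean of trustworthiness, continuity and rnx; `distance_score` as the mean of `1 - sim_stress` and `(1 - sim_stress_com_diff) / 2`; and `total_score` as the mean of the two. The function name `composite_scores` is hypothetical.

```python
from statistics import mean

# Values copied from rows 0 and 8 of the FR report above.
rows = {
    0: {
        "trustworthiness": 0.971482, "continuity": 0.985217, "rnx": 0.576057,
        "sim_stress": 0.565257, "sim_stress_com_diff": 0.0,
        "rank_score": 0.844252, "distance_score": 0.467371, "total_score": 0.655812,
    },
    8: {
        "trustworthiness": 0.993417, "continuity": 0.982196, "rnx": 0.68193,
        "sim_stress": 0.567403, "sim_stress_com_diff": -0.014242,
        "rank_score": 0.885848, "distance_score": 0.469859, "total_score": 0.677853,
    },
}


def composite_scores(row):
    """Assumed aggregation, inferred from the displayed report values."""
    rank = mean([row["trustworthiness"], row["continuity"], row["rnx"]])
    distance = mean([1 - row["sim_stress"], (1 - row["sim_stress_com_diff"]) / 2])
    total = mean([rank, distance])
    return rank, distance, total


for obj_id, row in rows.items():
    rank, distance, total = composite_scores(row)
    print(obj_id, round(rank, 6), round(distance, 6), round(total, 6))
```

If the assumed formulas hold, the recomputed values agree with the reported columns up to rounding of the displayed digits.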


In [None]:
metrics_report_fr = moddr.evaluation.create_report(
    mod_embeddings_fr, metadata=False, metrics=True
)
metrics_plot_fr = moddr.visualization.plot_metrics_report(metrics_report_fr)

In [None]:
embedding_graphs_fr = moddr.visualization.display_embeddings(
    mod_embeddings_fr, figsize_columns=3, show_edges=False
)

### Modification via Kamada-Kawai

In [8]:
mod_embeddings_kk = moddr.processing.run_pipeline(
    data=X,
    sim_features=sim_features,
    dr_method=dr_method,
    dr_param_n_neighbors=n_neighbors,
    graph_method=graph_method,
    community_resolutions=community_resolutions,
    layout_method="KK",
    boundary_neighbors=boundary_neighbors,
    layout_params=layout_params_kk_mds,
    compute_metrics=True,
    verbose=True,
)

------------------------------------------------------------
Start moddr pipeline with the following parameters:
Similarity Features: ['beaches', 'cafes', 'local services', 'dance clubs']
Dimensionality Reduction Method: UMAP with 15 neighbors
Graph Construction Method: DR
Community Detection Resolutions: [0.0001, 0.001, 0.005, 0.05]
Layout Method: KK
Boundary Neighbors: False
Layout Parameters: [0.2, 0.4, 0.6, 0.8, 1.0]
Compute Metrics: True

------------------------------------------------------------
Computing communities via Leiden detection for embedding 0: `UMAP (n_neigbors: 15, min_dist: 1.0)' with resolution '0.0001'.
Computation finished after 0.35 seconds.
Found 15 communities.
------------------------------------------------------------
------------------------------------------------------------
Compute new positions for embedding: `UMAP (n_neigbors: 15, min_dist: 1.0), Leiden (resolution: 0.0001)'.
Start computation with Kamada Kawai-algorithm.
Computation of new positions

Detailed overview of all iterations

In [9]:
moddr.evaluation.create_report(mod_embeddings_kk, metadata=True, metrics=True)

Unnamed: 0,obj_id,dr_method,dr_params,k_neighbors,com_detection,com_detection_params,layout_method,layout_params,trustworthiness,continuity,rnx,sim_stress,sim_stress_com,sim_stress_com_diff,rank_score,distance_score,total_score
0,0,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,,{},,{},0.971482,0.985217,0.576057,0.565257,0.565257,0.0,0.844252,0.467371,0.655812
1,1,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},KK,"{'balance_factor': 0.2, 'boundary_neighbors': ...",0.965205,0.98533,0.564031,0.567062,0.643234,-0.0136,0.838189,0.469869,0.654029
2,2,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},KK,"{'balance_factor': 0.4, 'boundary_neighbors': ...",0.956184,0.985353,0.550778,0.569339,0.634292,-0.022541,0.830772,0.470966,0.650869
3,3,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},KK,"{'balance_factor': 0.6, 'boundary_neighbors': ...",0.949486,0.985301,0.532794,0.571884,0.626469,-0.030364,0.822527,0.471649,0.647088
4,4,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},KK,"{'balance_factor': 0.8, 'boundary_neighbors': ...",0.946028,0.985353,0.523084,0.574441,0.620099,-0.036734,0.818155,0.471963,0.645059
5,5,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},KK,"{'balance_factor': 1.0, 'boundary_neighbors': ...",0.945166,0.985208,0.516906,0.577092,0.61802,-0.038813,0.81576,0.471157,0.643459
6,6,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},KK,"{'balance_factor': 0.2, 'boundary_neighbors': ...",0.942298,0.984843,0.497278,0.565274,0.620572,-0.014671,0.80814,0.471031,0.639585
7,7,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},KK,"{'balance_factor': 0.4, 'boundary_neighbors': ...",0.908175,0.983573,0.395047,0.565416,0.612863,-0.022379,0.762265,0.472887,0.617576
8,8,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},KK,"{'balance_factor': 0.6, 'boundary_neighbors': ...",0.880614,0.981782,0.326505,0.565654,0.608573,-0.02667,0.729634,0.473841,0.601737
9,9,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},KK,"{'balance_factor': 0.8, 'boundary_neighbors': ...",0.856996,0.979453,0.277578,0.566047,0.607199,-0.028044,0.704675,0.473988,0.589331


In [None]:
metrics_report_kk = moddr.evaluation.create_report(
    mod_embeddings_kk, metadata=False, metrics=True
)
metrics_plot_kk = moddr.visualization.plot_metrics_report(metrics_report_kk)

In [None]:
embedding_graphs_kk = moddr.visualization.display_embeddings(
    mod_embeddings_kk, figsize_columns=3, show_edges=False
)

### Modification via MDS

In [7]:
mod_embeddings_mds = moddr.processing.run_pipeline(
    data=X,
    sim_features=sim_features,
    dr_method=dr_method,
    dr_param_n_neighbors=n_neighbors,
    graph_method=graph_method,
    community_resolutions=community_resolutions,
    layout_method="MDS",
    boundary_neighbors=boundary_neighbors,
    layout_params=layout_params_kk_mds,
    compute_metrics=True,
    verbose=True,
)

------------------------------------------------------------
Start moddr pipeline with the following parameters:
Similarity Features: ['beaches', 'cafes', 'local services', 'dance clubs']
Dimensionality Reduction Method: UMAP with 15 neighbors
Graph Construction Method: DR
Community Detection Resolutions: [0.0001, 0.001, 0.005, 0.05]
Layout Method: MDS
Boundary Neighbors: False
Layout Parameters: [0.2, 0.4, 0.6, 0.8, 1.0]
Compute Metrics: True

------------------------------------------------------------
Computing communities via Leiden detection for embedding 0: `UMAP (n_neigbors: 15, min_dist: 1.0)' with resolution '0.0001'.
Computation finished after 0.33 seconds.
Found 14 communities.
------------------------------------------------------------
------------------------------------------------------------
Compute new positions for embedding: `UMAP (n_neigbors: 15, min_dist: 1.0), Leiden (resolution: 0.0001)'.
Start computation with MDS-algorithm.
Computation of new positions finishe

Detailed overview of all iterations

In [8]:
moddr.evaluation.create_report(mod_embeddings_mds, metadata=True, metrics=True)

Unnamed: 0,obj_id,dr_method,dr_params,k_neighbors,com_detection,com_detection_params,layout_method,layout_params,trustworthiness,continuity,rnx,sim_stress,sim_stress_com,sim_stress_com_diff,rank_score,distance_score,total_score
0,0,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,,{},,{},0.971482,0.985217,0.576057,0.565257,0.565257,0.0,0.844252,0.467371,0.655812
1,1,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},MDS,"{'balance_factor': 0.2, 'boundary_neighbors': ...",0.931012,0.982475,0.47005,0.552547,0.557607,-0.102168,0.794513,0.499269,0.646891
2,2,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},MDS,"{'balance_factor': 0.4, 'boundary_neighbors': ...",0.916312,0.975436,0.413632,0.533582,0.423477,-0.236298,0.76846,0.542284,0.655372
3,3,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},MDS,"{'balance_factor': 0.6, 'boundary_neighbors': ...",0.912169,0.966877,0.386355,0.510853,0.291889,-0.367886,0.755134,0.586545,0.670839
4,4,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},MDS,"{'balance_factor': 0.8, 'boundary_neighbors': ...",0.911488,0.957308,0.346144,0.488635,0.181201,-0.478575,0.738313,0.625326,0.68182
5,5,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.0001},MDS,"{'balance_factor': 1.0, 'boundary_neighbors': ...",0.90882,0.947676,0.272367,0.470006,0.098001,-0.561774,0.709621,0.655441,0.682531
6,6,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},MDS,"{'balance_factor': 0.2, 'boundary_neighbors': ...",0.937816,0.982465,0.481022,0.557502,0.460899,-0.181492,0.800434,0.516622,0.658528
7,7,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},MDS,"{'balance_factor': 0.4, 'boundary_neighbors': ...",0.910282,0.975146,0.417678,0.547973,0.308712,-0.333678,0.767702,0.559433,0.663567
8,8,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},MDS,"{'balance_factor': 0.6, 'boundary_neighbors': ...",0.892092,0.965142,0.380863,0.537807,0.200028,-0.442362,0.746032,0.591687,0.66886
9,9,UMAP,"{'n_neighbors': 15, 'min_dist': 1.0, 'random_s...",15,Leiden,{'resolution': 0.001},MDS,"{'balance_factor': 0.8, 'boundary_neighbors': ...",0.888197,0.954681,0.353598,0.527845,0.1268,-0.51559,0.732158,0.614975,0.673567
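
In the displayed MDS rows, the rank-based and distance-based scores move in opposite directions. As an illustration, the following sketch computes the change of each row at resolution 0.0001 relative to the unmodified embedding (row 0). All values are copied from the report above, and the keys are the `balance_factor` values from the `layout_params` column; the name `deltas` is illustrative.

```python
# (rank_score, distance_score, total_score) of the unmodified embedding, row 0.
baseline = (0.844252, 0.467371, 0.655812)

# Rows 1 to 5 of the MDS report (resolution 0.0001), keyed by balance_factor.
mds_rows = {
    0.2: (0.794513, 0.499269, 0.646891),
    0.4: (0.76846, 0.542284, 0.655372),
    0.6: (0.755134, 0.586545, 0.670839),
    0.8: (0.738313, 0.625326, 0.68182),
    1.0: (0.709621, 0.655441, 0.682531),
}

# Difference of every score to the baseline, per balance factor.
deltas = {
    factor: tuple(round(value - ref, 6) for value, ref in zip(scores, baseline))
    for factor, scores in mds_rows.items()
}

for factor, (d_rank, d_distance, d_total) in deltas.items():
    print(f"balance_factor={factor}: rank {d_rank:+.6f}, "
          f"distance {d_distance:+.6f}, total {d_total:+.6f}")
```

For these five rows, a larger balance factor lowers `rank_score` and raises `distance_score`; whether this holds for the remaining resolutions has to be checked against the full report generated locally.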


In [None]:
metrics_report_mds = moddr.evaluation.create_report(
    mod_embeddings_mds, metadata=False, metrics=True
)
metrics_plot_mds = moddr.visualization.plot_metrics_report(metrics_report_mds)

In [None]:
embedding_graphs_mds = moddr.visualization.display_embeddings(
    mod_embeddings_mds, figsize_columns=3, show_edges=False
)