# Similar Regions examples



## Comments
Reported distance values are in a multi-dimensional space. They should be used only to compare regions relative to one another and should not be interpreted in any other way. In particular, no linearity of the underlying space should be assumed. For example, a pair of regions at double the distance of some other pair should not be interpreted as "half as similar" in any meaningful way.
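
As a small illustration of why only the ordering carries meaning, the sketch below takes three distances from the Example 1 output and applies an arbitrary monotone rescaling (a square root, chosen purely for illustration and not something SimilarRegion does). The ranking is unchanged while the ratio between two distances is not, so conclusions should rest on the ranking alone.

```python
import math

# Distances copied from the Example 1 output (Santa Clara, Sonoma, Lake).
dist = {
    "Santa Clara": 0.3627891454950808,
    "Sonoma": 0.3851102577241669,
    "Lake": 0.4473442349369505,
}


def ranking(d):
    """Return region names ordered from most to least similar."""
    return sorted(d, key=d.get)


# Arbitrary monotone rescaling (illustrative assumption, not part of SimilarRegion).
rescaled = {name: math.sqrt(value) for name, value in dist.items()}

same_order = ranking(dist) == ranking(rescaled)
ratio_before = dist["Lake"] / dist["Santa Clara"]
ratio_after = rescaled["Lake"] / rescaled["Santa Clara"]
print(same_order, ratio_before, ratio_after)
```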

Changing the 'compare_to' argument of the similar_to method triggers a rebuild of the SimilarRegion object, which can be expensive.
It is therefore better to complete all queries for a given search region before changing it.
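
One way to follow this advice is to order the queries by search region before running them. The sketch below is only an illustration: the query list and the helper name `order_by_search_region` are hypothetical, and the actual call to similar_to is left as a comment because it needs a configured SimilarRegion object.

```python
from itertools import groupby

# Hypothetical query list: (seed_region_id, compare_to) pairs using ids from the examples.
queries = [(136969, 0), (13066, 14), (136969, 1065), (13066, 0)]


def order_by_search_region(queries):
    """Group queries so that each search region (compare_to) is visited once."""
    ordered = sorted(queries, key=lambda q: q[1])
    return [
        (region, [seed for seed, _ in group])
        for region, group in groupby(ordered, key=lambda q: q[1])
    ]


batches = order_by_search_region(queries)
for compare_to, seeds in batches:
    # With a configured SimilarRegion object `sim`, each seed would be queried here,
    # e.g. sim.similar_to(seed, compare_to=compare_to, ...), so that the object
    # is rebuilt once per batch instead of once per change of search region.
    print(compare_to, seeds)
```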

If the seed region is outside the search region (see Ex.2,3), the most recent data for the seed region will always be downloaded from the Gro database, even if the seed region is part of the local cache. If this most recent data differs from the cache, results of such runs can differ somewhat from those that use only cached info (for example, when using the local cache and searching over the entire world).

Some region names contain commas, so be careful when writing results to .csv files. Consider using a different field separator (see Ex.4).
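
A minimal sketch of writing a result row with a '|' separator using Python's standard csv module. The row is copied from the Example 2 output; the `flatten` helper is hypothetical, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# One result row from the Example 2 output; the parent name contains commas.
row = {'#': 5, 'id': 142822, 'name': 'Alaba', 'dist': 1.532333338470649,
       'parent': (10927, 'Southern Nations, Nationalities and Peoples', 1065, 'Ethiopia')}


def flatten(result):
    """Flatten a result dict (including its parent tuple) into a list of fields."""
    return [result['#'], result['id'], result['name'], result['dist'], *result['parent']]


buf = io.StringIO()  # stands in for an open file
writer = csv.writer(buf, delimiter='|')
writer.writerow(flatten(row))

# Reading back with the same delimiter recovers the name with its commas intact.
fields = next(csv.reader(io.StringIO(buf.getvalue()), delimiter='|'))
print(fields[5])
```

If a comma separator is required, csv.writer with its default settings quotes any field that contains the delimiter, which also keeps such names in a single column.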

## Initial Imports

In [1]:
import pandas as pd
import numpy as np
import logging

# SimilarRegion class definition
from api.client.samples.similar_regions_Frechet.sr import SimilarRegion

# Detailed API info (such as metric_id, item_id, etc.) on properties which can be used in a metric
# Currently contains land surface temperature, rainfall, and soil moisture time series
# as well as seven soil properties
from api.client.samples.similar_regions_Frechet.metric import metric_properties

# Allow nested IO loops (required for jupyter notebooks to work with batch API, not needed in stand-alone python scripts)
import nest_asyncio
nest_asyncio.apply()

## Initialization

In [2]:
# Metric to actually use for this run. Property names should exactly match those used in metric_properties
# Here soil properties are down-weighted so that
# the seven of them together have the same weight as one of the time series
metric = {
    'soil_moisture':1.0,
    'rainfall':1.0,
    'land_surface_temperature':1.0,
    'cation_exchange_30cm':1.0/7,
    'ph_h2o_30cm':1.0/7,
    'sand_30cm':1.0/7,
    'silt_30cm':1.0/7,
    'clay_30cm':1.0/7,
    'organic_carbon_content_fine_earth_30cm':1.0/7,
    'soil_water_capacity_100cm':1.0/7
}

# Configure SR object (no real work will be done yet - it will start on the first call to similar_to)
sim = SimilarRegion(metric_properties,
                    update_mode="no",
                    metric_instance=metric)

# To configure for initial download with default parameters (or to reconstruct cache from scratch)
# use the following constructor for SR object instead
#sim = SimilarRegion(metric_properties,update_mode="reload")

# Show extensive informational messages during object build/region search
# (recommended for non-expert users, especially for initial download).
# Comment out the next line to silence them
sim._logger.setLevel(logging.INFO)
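
The down-weighting described in the cell above can be checked with a few lines of plain Python. This is a sanity-check sketch rather than part of the SimilarRegion API; the `time_series` list is simply the three time-series property names from the metric.

```python
import math

# Same weights as the metric defined in the cell above.
metric = {
    'soil_moisture': 1.0,
    'rainfall': 1.0,
    'land_surface_temperature': 1.0,
    'cation_exchange_30cm': 1.0 / 7,
    'ph_h2o_30cm': 1.0 / 7,
    'sand_30cm': 1.0 / 7,
    'silt_30cm': 1.0 / 7,
    'clay_30cm': 1.0 / 7,
    'organic_carbon_content_fine_earth_30cm': 1.0 / 7,
    'soil_water_capacity_100cm': 1.0 / 7,
}

time_series = ['soil_moisture', 'rainfall', 'land_surface_temperature']
soil_weights = [w for name, w in metric.items() if name not in time_series]

# The seven soil properties together should weigh as much as one time series.
soil_total = sum(soil_weights)
print(len(soil_weights), math.isclose(soil_total, metric['rainfall']))
```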

## Examples

**Example 1**: Find the 10 districts across the world most similar to Napa County, California, USA (region_id 136969). Note that many of them will likely be other California districts.

In [3]:
%%time
# First call to similar_to method will trigger data loading from either local cache or Gro database
# depending on SimilarRegion configuration requested during its construction above.
# If the local cache is used, this should take under 5 minutes.
# *****CAUTION******: The first download from the Gro database for a large region (default is the entire world!) will take many hours.

print("Similar to Napa:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5):
    print(i)

*********(Re)building similar region object with *******
	region 0
	update_mode no
	data_dir sr_cache
	t_int_per_year 52

Reading region info from cache sr_cache/region_info


Similar to Napa:


Search region is present in cache of size 46184
Trimmed region info to 46184 items
Reading property cation_exchange_30cm from cache sr_cache/cation_exchange_30cm_52.npz ...
Merging in cached info ...
done
Property cation_exchange_30cm is missing 125 regions out of desired 46184 and no updates requested
Reading property clay_30cm from cache sr_cache/clay_30cm_52.npz ...
Merging in cached info ...
done
Property clay_30cm is missing 125 regions out of desired 46184 and no updates requested
Reading property land_surface_temperature from cache sr_cache/land_surface_temperature_52.npz ...
Merging in cached info ...
done
Property land_surface_temperature is missing 200 regions out of desired 46184 and no updates requested
Reading property organic_carbon_content_fine_earth_30cm from cache sr_cache/organic_carbon_content_fine_earth_30cm_52.npz ...
Merging in cached info ...
done
Property organic_carbon_content_fine_earth_30cm is missing 125 regions out of desired 46184 and no updates requested


{'#': 0, 'id': 136969, 'name': 'Napa', 'dist': 4.712160915387242e-08, 'parent': (13055, 'California', 1215, 'United States')}
{'#': 1, 'id': 136984, 'name': 'Santa Clara', 'dist': 0.3627891454950808, 'parent': (13055, 'California', 1215, 'United States')}
{'#': 2, 'id': 136990, 'name': 'Sonoma', 'dist': 0.3851102577241669, 'parent': (13055, 'California', 1215, 'United States')}
{'#': 3, 'id': 136958, 'name': 'Lake', 'dist': 0.4473442349369505, 'parent': (13055, 'California', 1215, 'United States')}
{'#': 4, 'id': 136942, 'name': 'Alameda', 'dist': 0.46884362854509165, 'parent': (13055, 'California', 1215, 'United States')}
{'#': 5, 'id': 136968, 'name': 'Monterey', 'dist': 0.518806365122292, 'parent': (13055, 'California', 1215, 'United States')}
{'#': 6, 'id': 135463, 'name': 'Bahçe', 'dist': 0.5701445282889737, 'parent': (12890, 'Osmaniye', 1205, 'Turkey')}
{'#': 7, 'id': 136946, 'name': 'Calaveras', 'dist': 0.5773979168285837, 'parent': (13055, 'California', 1215, 'United States')}
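
The seed region itself appears as entry 0 with a near-zero distance. A short sketch of dropping it and counting the remaining matches by parent province follows; the three rows are copied from the output above, and the reading of the 'parent' tuple as (province id, province name, country id, country name) is inferred from that output rather than from the API documentation.

```python
from collections import Counter

# Three rows copied from the output above.
results = [
    {'#': 0, 'id': 136969, 'name': 'Napa', 'dist': 4.712160915387242e-08,
     'parent': (13055, 'California', 1215, 'United States')},
    {'#': 5, 'id': 136968, 'name': 'Monterey', 'dist': 0.518806365122292,
     'parent': (13055, 'California', 1215, 'United States')},
    {'#': 6, 'id': 135463, 'name': 'Bahçe', 'dist': 0.5701445282889737,
     'parent': (12890, 'Osmaniye', 1205, 'Turkey')},
]

seed_id = 136969

# Drop the seed region, then count matches per parent province name.
matches = [r for r in results if r['id'] != seed_id]
by_province = Counter(r['parent'][1] for r in matches)
print(by_province)
```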


**Example 2**: Same as Ex.1 but restricted to Ethiopia (region_id 1065). This will trigger a rebuild of the SimilarRegion object and, since the seed region is outside the search region, a download of the seed region data directly from Gro.

In [4]:
%%time
# Use the compare_to argument to restrict the search to districts within a particular country

print("Similar to Napa in Ethiopia:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=1065):
    print(i)

*********(Re)building similar region object with *******
	region 1065
	update_mode no
	data_dir sr_cache
	t_int_per_year 52

Reading region info from cache sr_cache/region_info
Search region is present in cache of size 46184
Trimmed region info to 88 items
Reading property cation_exchange_30cm from cache sr_cache/cation_exchange_30cm_52.npz ...


Similar to Napa in Ethiopia:


Merging in cached info ...
done
Property cation_exchange_30cm is missing 0 regions out of desired 88 and no updates requested
Reading property clay_30cm from cache sr_cache/clay_30cm_52.npz ...
Merging in cached info ...
done
Property clay_30cm is missing 0 regions out of desired 88 and no updates requested
Reading property land_surface_temperature from cache sr_cache/land_surface_temperature_52.npz ...
Merging in cached info ...
done
Property land_surface_temperature is missing 0 regions out of desired 88 and no updates requested
Reading property organic_carbon_content_fine_earth_30cm from cache sr_cache/organic_carbon_content_fine_earth_30cm_52.npz ...
Merging in cached info ...
done
Property organic_carbon_content_fine_earth_30cm is missing 0 regions out of desired 88 and no updates requested
Reading property ph_h2o_30cm from cache sr_cache/ph_h2o_30cm_52.npz ...
Merging in cached info ...
done
Property ph_h2o_30cm is missing 0 regions out of desired 88 and no updates requested
Read

{'#': 0, 'id': 142811, 'name': 'Guji', 'dist': 1.2112411947814918, 'parent': (10925, 'Oromia', 1065, 'Ethiopia')}
{'#': 1, 'id': 115015, 'name': 'Eastern', 'dist': 1.3901961856105938, 'parent': (10928, 'Tigray', 1065, 'Ethiopia')}
{'#': 2, 'id': 114979, 'name': 'Borena', 'dist': 1.465905074173043, 'parent': (10925, 'Oromia', 1065, 'Ethiopia')}
{'#': 3, 'id': 114991, 'name': 'Liben', 'dist': 1.5078767844235907, 'parent': (10926, 'Somali', 1065, 'Ethiopia')}
{'#': 4, 'id': 114978, 'name': 'Bale', 'dist': 1.5277636057283164, 'parent': (10925, 'Oromia', 1065, 'Ethiopia')}
{'#': 5, 'id': 142822, 'name': 'Alaba', 'dist': 1.532333338470649, 'parent': (10927, 'Southern Nations, Nationalities and Peoples', 1065, 'Ethiopia')}
{'#': 6, 'id': 142824, 'name': 'South East', 'dist': 1.5772465179793431, 'parent': (10928, 'Tigray', 1065, 'Ethiopia')}
{'#': 7, 'id': 142823, 'name': "Segen Peoples'", 'dist': 1.6218393021873385, 'parent': (10927, 'Southern Nations, Nationalities and Peoples', 1065, 'Ethio

**Example 3**: Find the 10 provinces in Europe (region_id 14) most similar to the US state of Iowa (region_id 13066) and provide a detailed distance report.

In [5]:
print("Similar to Iowa in Europe:")
for i in sim.similar_to(13066, number_of_regions=10, requested_level=4, detailed_distance=True, compare_to=14):
    print(i)

*********(Re)building similar region object with *******
	region 14
	update_mode no
	data_dir sr_cache
	t_int_per_year 52

Reading region info from cache sr_cache/region_info
Search region is present in cache of size 46184
Trimmed region info to 9318 items


Similar to Iowa in Europe:


Reading property cation_exchange_30cm from cache sr_cache/cation_exchange_30cm_52.npz ...
Merging in cached info ...
done
Property cation_exchange_30cm is missing 14 regions out of desired 9318 and no updates requested
Reading property clay_30cm from cache sr_cache/clay_30cm_52.npz ...
Merging in cached info ...
done
Property clay_30cm is missing 14 regions out of desired 9318 and no updates requested
Reading property land_surface_temperature from cache sr_cache/land_surface_temperature_52.npz ...
Merging in cached info ...
done
Property land_surface_temperature is missing 12 regions out of desired 9318 and no updates requested
Reading property organic_carbon_content_fine_earth_30cm from cache sr_cache/organic_carbon_content_fine_earth_30cm_52.npz ...
Merging in cached info ...
done
Property organic_carbon_content_fine_earth_30cm is missing 14 regions out of desired 9318 and no updates requested
Reading property ph_h2o_30cm from cache sr_cache/ph_h2o_30cm_52.npz ...
Merging in cached i

{'#': 0, 'id': 11714, 'name': 'Edinet', 'dist': {'total': 0.7624865571290543, 'covar': 0.43147480179914516, 'cation_exchange_30cm': 0.029827030581420555, 'clay_30cm': 0.05072413701545009, 'land_surface_temperature': 0.0017538538565327055, 'organic_carbon_content_fine_earth_30cm': 0.10505897239232384, 'ph_h2o_30cm': 0.2234878843299568, 'rainfall': 0.3925933490497422, 'sand_30cm': 0.0864973996480457, 'silt_30cm': 0.14938558114455347, 'soil_moisture': 0.38240262646327383, 'soil_water_capacity_100cm': 0.024621105093112217}, 'parent': (1134, 'Moldova', '', '')}
{'#': 1, 'id': 12277, 'name': 'Botosani', 'dist': {'total': 0.7716150323099088, 'covar': 0.40986169376647474, 'cation_exchange_30cm': 0.04328607750164215, 'clay_30cm': 0.04588192848041994, 'land_surface_temperature': 0.05866884178687681, 'organic_carbon_content_fine_earth_30cm': 0.0730783746412905, 'ph_h2o_30cm': 0.19615102059413836, 'rainfall': 0.36299699167162136, 'sand_30cm': 0.15194932548426976, 'silt_30cm': 0.22762773245179413, 
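
With detailed_distance=True the 'dist' field is a dict, so the per-property entries can be ranked to see which properties separate two regions most. The sketch below uses the values from the first row above (Edinet). Treating 'total' and 'covar' as summary entries is an assumption based on their names; how 'total' is derived from the other entries is not stated here and is not assumed.

```python
# 'dist' dict copied from the first row of the output above (Edinet, Moldova).
dist = {
    'total': 0.7624865571290543,
    'covar': 0.43147480179914516,
    'cation_exchange_30cm': 0.029827030581420555,
    'clay_30cm': 0.05072413701545009,
    'land_surface_temperature': 0.0017538538565327055,
    'organic_carbon_content_fine_earth_30cm': 0.10505897239232384,
    'ph_h2o_30cm': 0.2234878843299568,
    'rainfall': 0.3925933490497422,
    'sand_30cm': 0.0864973996480457,
    'silt_30cm': 0.14938558114455347,
    'soil_moisture': 0.38240262646327383,
    'soil_water_capacity_100cm': 0.024621105093112217,
}

# Assumption: 'total' and 'covar' are summary entries, not per-property terms.
summary_keys = {'total', 'covar'}
per_property = {k: v for k, v in dist.items() if k not in summary_keys}

# Properties with the largest reported distance entries come first.
top3 = sorted(per_property, key=per_property.get, reverse=True)[:3]
print(top3)
```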

**Example 4**: A larger number of provinces similar to Iowa (across the entire world), printed with a | separator (note that region names can have commas in them). Note that many of the provinces from Ex.3 are near the top of the list.

In [6]:
for i in sim.similar_to(13066, number_of_regions=200, compare_to=0, requested_level=4, detailed_distance=False):
    lv = list(i.values())
    print("|".join([str(v) for v in lv[:-1]]),"|","|".join([str(v) for v in lv[-1]]))

*********(Re)building similar region object with *******
	region 0
	update_mode no
	data_dir sr_cache
	t_int_per_year 52

Reading region info from cache sr_cache/region_info
Search region is present in cache of size 46184
Trimmed region info to 46184 items
Reading property cation_exchange_30cm from cache sr_cache/cation_exchange_30cm_52.npz ...
Merging in cached info ...
done
Property cation_exchange_30cm is missing 125 regions out of desired 46184 and no updates requested
Reading property clay_30cm from cache sr_cache/clay_30cm_52.npz ...
Merging in cached info ...
done
Property clay_30cm is missing 125 regions out of desired 46184 and no updates requested
Reading property land_surface_temperature from cache sr_cache/land_surface_temperature_52.npz ...
Merging in cached info ...
done
Property land_surface_temperature is missing 200 regions out of desired 46184 and no updates requested
Reading property organic_carbon_content_fine_earth_30cm from cache sr_cache/organic_carbon_content_fi

0|13066|Iowa|0.0 | 1215|United States|0|World
1|12284|Cluj|0.7563396221025158 | 1167|Romania|0|World
2|11714|Edinet|0.7630836727953505 | 1134|Moldova|0|World
3|12277|Botosani|0.7665263284634406 | 1167|Romania|0|World
4|13064|Illinois|0.7666456128269242 | 1215|United States|0|World
5|12298|Mures|0.7801263187947612 | 1167|Romania|0|World
6|12295|Iasi|0.7874493813708706 | 1167|Romania|0|World
7|12301|Prahova|0.7926076096219729 | 1167|Romania|0|World
8|11703|Briceni|0.8173926669444727 | 1134|Moldova|0|World
9|11711|Donduseni|0.8229894445085254 | 1134|Moldova|0|World
10|11733|Ungheni|0.8232082927443589 | 1134|Moldova|0|World
11|11722|Nisporeni|0.8322950542898244 | 1134|Moldova|0|World
12|11716|Floresti|0.8360130360681447 | 1134|Moldova|0|World
13|12350|Krasnodar|0.8365697512629309 | 1168|Russia|0|World
14|12280|Bucharest|0.8373727360831215 | 1167|Romania|0|World
15|12287|Dâmbovita|0.8391799277419617 | 1167|Romania|0|World
16|11719|Hîncesti|0.839321397060479 | 1134|Moldova|0|World
17|11725|R