# Similar Regions examples



## Preliminaries

### Distance metric
Reported distance values are computed in a multi-dimensional space. They should only be used to compare regions relative to one another and should not be interpreted in any other way. In particular, no linearity of the underlying space should be assumed: a pair of regions at double the distance of some other pair should not be read as "half as similar" in any meaningful way.
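
To illustrate why only the ordering carries meaning: applying any strictly increasing transform to the distances leaves the ranking unchanged while changing the ratios between them. Squaring is used here as an arbitrary example; it is not something SimilarRegion does. The distance values are copied from the Example 1 output.

```python
# Distances copied from the Example 1 output (seed region: Napa, search area: Oceania)
dists = {
    "Upper Hunter Shire": 1.0173006028514207,
    "Wangaratta": 1.0385392829218,
    "Dungog": 1.0801804882453856,
}


def ranking(d):
    """Return region names from most to least similar (smallest distance first)."""
    return sorted(d, key=d.get)


# An arbitrary strictly increasing rescaling, used only for illustration
squared = {name: dist ** 2 for name, dist in dists.items()}

# The order of regions survives the rescaling...
print(ranking(dists) == ranking(squared))  # True

# ...but the ratio between two distances does not, so ratios are not meaningful
print(dists["Dungog"] / dists["Upper Hunter Shire"])
print(squared["Dungog"] / squared["Upper Hunter Shire"])
```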

### Data volume
Whenever the similar_to method is called with a different compare_to argument, the SimilarRegion object is rebuilt, which can be time-consuming if the data has not already been cached. For example, with compare_to=0 (the whole world) and requested_level=5 (districts), we need several daily data series spanning 20 years for each of about 45,000 regions, i.e. close to 1 billion data points.
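
The "close to 1 billion" figure can be checked with a back-of-the-envelope calculation. This sketch assumes three daily series (land surface temperature, rainfall and soil moisture, the time series in the default metric_properties) and 365 days per year; estimate_data_points is a hypothetical helper, not part of the SimilarRegion API.

```python
def estimate_data_points(n_regions, n_daily_series=3, years=20, days_per_year=365):
    """Rough count of daily values needed: regions x series x days (illustrative only)."""
    return n_regions * n_daily_series * years * days_per_year


# About 45,000 districts worldwide (compare_to=0, requested_level=5)
districts = estimate_data_points(45_000)
print(f"{districts:,}")  # 985,500,000, i.e. close to 1 billion
```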

Subsequent calls with the same compare_to take advantage of the cache and do not trigger data downloads from the Gro API. The same applies to the seed region (the region_id argument): if it is not present in the cache, its data will be downloaded from the Gro API.
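
Since a change of compare_to triggers a rebuild, it can pay to group queries that share a search area. The sketch below only counts how many rebuilds a given call order would cause, assuming a rebuild happens whenever compare_to differs from the previous call; it does not call the Gro API, and the query list is a hypothetical batch reusing region ids from the examples. Whether a rebuild is actually slow still depends on what is already cached in data_dir.

```python
# Hypothetical batch of (seed region_id, compare_to) queries
queries = [
    (136969, 13),    # Napa vs Oceania
    (13066, 14),     # Iowa vs Europe
    (136969, 1065),  # Napa vs Ethiopia
    (13066, 13),     # Iowa vs Oceania
]


def count_rebuilds(ordered_queries):
    """Count how often compare_to changes between consecutive calls (first call included)."""
    rebuilds = 0
    previous = None
    for _, compare_to in ordered_queries:
        if compare_to != previous:
            rebuilds += 1
            previous = compare_to
    return rebuilds


print(count_rebuilds(queries))  # 4

# Grouping queries by search area means each area is built only once
grouped = sorted(queries, key=lambda q: q[1])
print(count_rebuilds(grouped))  # 3
```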

### Output formatting
Some region names contain commas, so when writing results to a CSV file, use a csv writer object rather than joining fields with commas yourself.


## Initial Imports

SimilarRegion class definition and specification of the data to use. *metric_properties* specifies the Gro entities (such as metric_id, item_id, etc.) that will be used to get data from the Gro API. The default metric_properties used here contain land surface temperature, rainfall, and soil moisture time series as well as seven soil properties.

In [1]:
import pandas as pd
import numpy as np
import logging

from api.client.samples.similar_regions.similar_region import SimilarRegion
from api.client.samples.similar_regions.metric import metric_properties, metric_weights

# Allow nested asyncio event loops (required for the batch API to work in Jupyter notebooks; not needed in stand-alone Python scripts)
import nest_asyncio

nest_asyncio.apply()

## Initialization

Configure the SimilarRegion object. No real work is done yet; it starts on the first call to similar_to.
Any previously downloaded data found in data_dir will be reused.

In [2]:
sim = SimilarRegion(metric_properties,
                    data_dir='/tmp/similar_regions_cache',
                    metric_weights=metric_weights)

# Uncomment this to see extensive informational messages during object build/region search
# sim._logger.setLevel(logging.DEBUG)

# Examples

Note that in each example, the size of the search area (a country, a continent or the whole world) and the requested region level (district, province or country) significantly affect the amount of data that will be needed. There are about 45,000 districts and about 5,000 provinces in the whole world.

## Example 1

Find 10 districts (region level 5) in Oceania (region_id 13) most similar to Napa County, California, USA (region_id 136969).

In [3]:
%%time

print("Districts similar to Napa in Oceania:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=13):
    print(i)

Districts similar to Napa in Oceania:


Getting data series for 12 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 12/12 [00:00<00:00, 31.74it/s]
Getting data series for 12 regions in 1 batch(es) for property clay_30cm
100%|██████████| 12/12 [00:00<00:00, 36.11it/s]
Getting data series for 25 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 25/25 [00:00<00:00, 125.34it/s]
Getting data series for 12 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 12/12 [00:00<00:00, 29.65it/s]
Getting data series for 12 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 12/12 [00:00<00:00, 34.70it/s]
Getting data series for 2 regions in 1 batch(es) for property rainfall
100%|██████████| 2/2 [00:00<00:00, 11.72it/s]
Getting data series for 12 regions in 1 batch(es) for property sand_30cm
100%|██████████| 12/12 [00:00<00:00, 39.07it/s]
Getting data series for 12 regions in 1 batch(es) for property silt_30cm
100%|██████████| 12/12 [00

{'rank': 0, 'id': 102852, 'name': 'Upper Hunter Shire', 'dist': 1.0173006028514207, 'parent': {'id': 10174, 'name': 'New South Wales'}, 'grand_parent': {'id': 1013, 'name': 'Australia'}}
{'rank': 1, 'id': 100022568, 'name': 'Wangaratta', 'dist': 1.0385392829218, 'parent': {'id': 10180, 'name': 'Victoria'}, 'grand_parent': {'id': 1013, 'name': 'Australia'}}
{'rank': 2, 'id': 102739, 'name': 'Dungog', 'dist': 1.0801804882453856, 'parent': {'id': 10174, 'name': 'New South Wales'}, 'grand_parent': {'id': 1013, 'name': 'Australia'}}
{'rank': 3, 'id': 100022537, 'name': 'Indigo', 'dist': 1.0809641142826236, 'parent': {'id': 10180, 'name': 'Victoria'}, 'grand_parent': {'id': 1013, 'name': 'Australia'}}
{'rank': 4, 'id': 100022487, 'name': 'Mid-Western Regional', 'dist': 1.1117052041108755, 'parent': {'id': 10174, 'name': 'New South Wales'}, 'grand_parent': {'id': 1013, 'name': 'Australia'}}
{'rank': 5, 'id': 100022557, 'name': 'Murrindindi', 'dist': 1.1135118505171602, 'parent': {'id': 10180,
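
Each result is a dict whose parent and grand_parent entries are themselves dicts. A small helper such as the hypothetical flatten_result below turns one result into a single-level row, which is handy for tables or CSV output. The record is copied from the Example 1 output; with detailed_distance=True the dist value is itself a dict and is left untouched by this sketch.

```python
def flatten_result(result):
    """Flatten one similar_to result dict into a single-level dict (illustrative helper)."""
    flat = {key: value for key, value in result.items() if key not in ("parent", "grand_parent")}
    for level in ("parent", "grand_parent"):
        info = result.get(level) or {}  # tolerate a missing ancestor
        flat[f"{level}_id"] = info.get("id")
        flat[f"{level}_name"] = info.get("name")
    return flat


# First result of Example 1
result = {'rank': 0, 'id': 102852, 'name': 'Upper Hunter Shire', 'dist': 1.0173006028514207,
          'parent': {'id': 10174, 'name': 'New South Wales'},
          'grand_parent': {'id': 1013, 'name': 'Australia'}}

print(flatten_result(result))
```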

## Example 2

Same as Example 1 but restricted to Ethiopia (region_id 1065). This will trigger a rebuild of the SimilarRegion object, since we are using a new search region.

In [4]:
%%time
# Use the compare_to argument to restrict the search to districts within a particular country

print("Districts similar to Napa in Ethiopia:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=1065):
    print(i)

Districts similar to Napa in Ethiopia:


Getting data series for 1 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 1/1 [00:00<00:00,  6.06it/s]
Getting data series for 1 regions in 1 batch(es) for property clay_30cm
100%|██████████| 1/1 [00:00<00:00,  6.50it/s]
Getting data series for 1 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 1/1 [00:00<00:00,  2.00it/s]
Getting data series for 1 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 1/1 [00:00<00:00,  6.46it/s]
Getting data series for 1 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 1/1 [00:00<00:00,  6.69it/s]
Getting data series for 1 regions in 1 batch(es) for property rainfall
100%|██████████| 1/1 [00:00<00:00,  1.88it/s]
Getting data series for 1 regions in 1 batch(es) for property sand_30cm
100%|██████████| 1/1 [00:00<00:00,  6.48it/s]
Getting data series for 1 regions in 1 batch(es) for property silt_30cm
100%|██████████| 1/1 [00:00<00:00,  6.76it/s]


{'rank': 0, 'id': 142811, 'name': 'Guji', 'dist': 1.2337430873605335, 'parent': {'id': 10925, 'name': 'Oromia'}, 'grand_parent': {'id': 1065, 'name': 'Ethiopia'}}
{'rank': 1, 'id': 115015, 'name': 'Eastern', 'dist': 1.396813048061249, 'parent': {'id': 10928, 'name': 'Tigray'}, 'grand_parent': {'id': 1065, 'name': 'Ethiopia'}}
{'rank': 2, 'id': 114979, 'name': 'Borena', 'dist': 1.4942763264739025, 'parent': {'id': 10925, 'name': 'Oromia'}, 'grand_parent': {'id': 1065, 'name': 'Ethiopia'}}
{'rank': 3, 'id': 114978, 'name': 'Bale', 'dist': 1.54995939691503, 'parent': {'id': 10925, 'name': 'Oromia'}, 'grand_parent': {'id': 1065, 'name': 'Ethiopia'}}
{'rank': 4, 'id': 114991, 'name': 'Liben', 'dist': 1.5602031219727102, 'parent': {'id': 10926, 'name': 'Somali'}, 'grand_parent': {'id': 1065, 'name': 'Ethiopia'}}
{'rank': 5, 'id': 142822, 'name': 'Alaba', 'dist': 1.5613585615460281, 'parent': {'id': 10927, 'name': 'Southern Nations, Nationalities and Peoples'}, 'grand_parent': {'id': 1065, 'n

## Example 3

Find 10 provinces (region level 4) in Europe (region_id 14) most similar to the US state of Iowa (region_id 13066), and request a detailed distance report.

In [5]:
print("Provinces similar to Iowa in Europe:")
for i in sim.similar_to(13066, number_of_regions=10, requested_level=4, detailed_distance=True, compare_to=14):
    print(i)

Provinces similar to Iowa in Europe:


Getting data series for 1 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 1/1 [00:00<00:00,  5.77it/s]
Getting data series for 1 regions in 1 batch(es) for property clay_30cm
100%|██████████| 1/1 [00:00<00:00,  6.00it/s]
Getting data series for 1 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 1/1 [00:00<00:00,  5.61it/s]
Getting data series for 1 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 1/1 [00:00<00:00,  6.73it/s]
Getting data series for 1 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 1/1 [00:00<00:00,  6.26it/s]
Getting data series for 1 regions in 1 batch(es) for property rainfall
100%|██████████| 1/1 [00:00<00:00,  5.10it/s]
Getting data series for 1 regions in 1 batch(es) for property sand_30cm
100%|██████████| 1/1 [00:00<00:00,  5.47it/s]
Getting data series for 1 regions in 1 batch(es) for property silt_30cm
100%|██████████| 1/1 [00:00<00:00,  5.69it/s]


{'rank': 0, 'id': 12284, 'name': 'Cluj', 'dist': {'total': 0.7569942835175087, 'covar': 0.43339237683334836, 'cation_exchange_30cm': 0.25067659587868785, 'clay_30cm': 0.041566176229737684, 'land_surface_temperature': 0.11604313658278209, 'organic_carbon_content_fine_earth_30cm': 0.14872593597649922, 'ph_h2o_30cm': 0.01464601966557133, 'rainfall': 0.2128178623321425, 'sand_30cm': 0.30008835181434, 'silt_30cm': 0.3360045979194928, 'soil_moisture': 0.1405871378131267, 'soil_water_capacity_100cm': 0.12975621841492435}, 'parent': {'id': 1167, 'name': 'Romania'}, 'grand_parent': {'id': 0, 'name': 'World'}}
{'rank': 1, 'id': 11714, 'name': 'Edinet', 'dist': {'total': 0.7639846487721322, 'covar': 0.428691574350047, 'cation_exchange_30cm': 0.029827030581420555, 'clay_30cm': 0.05072413701545009, 'land_surface_temperature': 0.06843908553742839, 'organic_carbon_content_fine_earth_30cm': 0.10505897239232384, 'ph_h2o_30cm': 0.2234878843299568, 'rainfall': 0.3925933490497422, 'sand_30cm': 0.086497399
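
With detailed_distance=True, dist is a dict holding a total, a covar entry and one entry per property. One way to see where a candidate differs most from the seed region is to sort the per-property entries, assuming a larger component means a larger difference on that property. How total is derived from the components is not documented here, so this sketch only compares the components with each other. Values are copied from the Cluj result above.

```python
# Per-property distances for Cluj (rank 0 in Example 3), copied from the output above
cluj_dist = {
    'total': 0.7569942835175087, 'covar': 0.43339237683334836,
    'cation_exchange_30cm': 0.25067659587868785, 'clay_30cm': 0.041566176229737684,
    'land_surface_temperature': 0.11604313658278209,
    'organic_carbon_content_fine_earth_30cm': 0.14872593597649922,
    'ph_h2o_30cm': 0.01464601966557133, 'rainfall': 0.2128178623321425,
    'sand_30cm': 0.30008835181434, 'silt_30cm': 0.3360045979194928,
    'soil_moisture': 0.1405871378131267, 'soil_water_capacity_100cm': 0.12975621841492435,
}

# Keep only per-property entries ('total' and 'covar' are not single properties)
components = {k: v for k, v in cluj_dist.items() if k not in ('total', 'covar')}

# Largest components first: the properties where Cluj differs most from Iowa
top3 = sorted(components.items(), key=lambda kv: kv[1], reverse=True)[:3]
for prop, value in top3:
    print(f"{prop:<25} {value:.3f}")
```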

## Example 4
A larger number (200) of provinces similar to Iowa, searched across the entire world. Note that many of the provinces from Example 3 appear near the top of the list.

In [6]:
import csv
import sys

writer = csv.writer(sys.stdout, delimiter='\t')

for i in sim.similar_to(13066, number_of_regions=200, compare_to=0, requested_level=4, detailed_distance=False):
    writer.writerow(i.values())

Getting data series for 16 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 16/16 [00:00<00:00, 73.31it/s]
Getting data series for 16 regions in 1 batch(es) for property clay_30cm
100%|██████████| 16/16 [00:00<00:00, 68.05it/s]
Getting data series for 31 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 31/31 [00:00<00:00, 48.43it/s]
Getting data series for 16 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 16/16 [00:00<00:00, 72.43it/s]
Getting data series for 16 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 16/16 [00:00<00:00, 72.19it/s]
Getting data series for 2 regions in 1 batch(es) for property rainfall
100%|██████████| 2/2 [00:00<00:00, 13.02it/s]
Getting data series for 16 regions in 1 batch(es) for property sand_30cm
100%|██████████| 16/16 [00:00<00:00, 70.87it/s]
Getting data series for 16 regions in 1 batch(es) for property silt_30cm
100%|██████████| 16/16 [00:

0	13066	Iowa	7.146345858741878e-08	{'id': 1215, 'name': 'United States'}	{'id': 0, 'name': 'World'}
1	12284	Cluj	0.7569978480749663	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
2	11714	Edinet	0.7639849432992937	{'id': 1134, 'name': 'Moldova'}	{'id': 0, 'name': 'World'}
3	13064	Illinois	0.7655045070880194	{'id': 1215, 'name': 'United States'}	{'id': 0, 'name': 'World'}
4	12277	Botosani	0.7694642157119207	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
5	12298	Mures	0.7830550120462715	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
6	12295	Iasi	0.7925580859257505	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
7	12301	Prahova	0.8069739506366441	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
8	11697	Soldanesti	0.814937962642843	{'id': 1134, 'name': 'Moldova'}	{'id': 0, 'name': 'World'}
9	11703	Briceni	0.8200932930395138	{'id': 1134, 'name': 'Moldova'}	{'id': 0, 'name': 'World'}
10	11733	Ungheni	0.8241366575715298	{'id': 1134, 'nam

86	10459	Vratsa	0.9743783859854216	{'id': 1032, 'name': 'Bulgaria'}	{'id': 0, 'name': 'World'}
87	11709	Cimislia	0.9790093335517946	{'id': 1134, 'name': 'Moldova'}	{'id': 0, 'name': 'World'}
88	12273	Arges	0.9815583071864143	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
89	12294	Ialomita	0.9844157856270307	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
90	12278	Braila	0.9847316219838171	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
91	13260	Podunavski	0.9858547012982423	{'id': 1228, 'name': 'Serbia'}	{'id': 0, 'name': 'World'}
92	11732	Transnistria	0.9873136442105426	{'id': 1134, 'name': 'Moldova'}	{'id': 0, 'name': 'World'}
93	12297	Mehedinti	0.9899684656008257	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
94	11707	Causeni	0.9899757688939426	{'id': 1134, 'name': 'Moldova'}	{'id': 0, 'name': 'World'}
95	12306	Teleorman	0.9907722801301277	{'id': 1167, 'name': 'Romania'}	{'id': 0, 'name': 'World'}
96	10445	Pleven	0.9912571961815236	{'id'

170	10743	Pardubický	1.120212509514393	{'id': 1051, 'name': 'Czech Republic'}	{'id': 0, 'name': 'World'}
171	12561	Comunidad Foral de Navarra	1.1208859078532138	{'id': 1188, 'name': 'Spain'}	{'id': 0, 'name': 'World'}
172	10946	Champagne-Ardenne	1.1216663113224579	{'id': 1070, 'name': 'France'}	{'id': 0, 'name': 'World'}
173	12986	Crimea	1.1230270275644216	{'id': 1210, 'name': 'Ukraine'}	{'id': 0, 'name': 'World'}
174	10694	Brodsko-Posavska	1.1231998892084154	{'id': 1048, 'name': 'Croatia'}	{'id': 0, 'name': 'World'}
175	12463	Fiorentino	1.1256992797695062	{'id': 1176, 'name': 'San Marino'}	{'id': 0, 'name': 'World'}
176	10776	P'yongan-bukto	1.1261831273623981	{'id': 1053, 'name': 'North Korea'}	{'id': 0, 'name': 'World'}
177	12989	Ivano-Frankivs'k	1.1284479957008022	{'id': 1210, 'name': 'Ukraine'}	{'id': 0, 'name': 'World'}
178	10773	Kaesong	1.1288652191053243	{'id': 1053, 'name': 'North Korea'}	{'id': 0, 'name': 'World'}
179	10169	Tavush	1.1295296978844664	{'id': 1011, 'name': 'Armen
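
Because Iowa lies inside the search area here (compare_to=0, the whole world), the seed region itself comes back at rank 0 with a distance of essentially zero. If only other regions are wanted, the results can be filtered by id. A minimal sketch, using a few rows copied from the output above in place of a live similar_to call (parent and grand_parent omitted for brevity):

```python
seed_id = 13066  # Iowa

# A few results from Example 4
results = [
    {'rank': 0, 'id': 13066, 'name': 'Iowa', 'dist': 7.146345858741878e-08},
    {'rank': 1, 'id': 12284, 'name': 'Cluj', 'dist': 0.7569978480749663},
    {'rank': 2, 'id': 11714, 'name': 'Edinet', 'dist': 0.7639849432992937},
    {'rank': 3, 'id': 13064, 'name': 'Illinois', 'dist': 0.7655045070880194},
]

# Drop the seed region; the same comprehension works on any iterable of results
others = [r for r in results if r['id'] != seed_id]
print([r['name'] for r in others])  # ['Cluj', 'Edinet', 'Illinois']
```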