# Similar Regions examples



## Preliminaries

### Distance metric
Reported distance values are measured in a multi-dimensional space. They should only be used for relative comparison of regions and not be interpreted in any other way. In particular, no linearity of the underlying space should be assumed. For example, a pair of regions at double the distance of some other pair should not be interpreted as "half as similar" in any meaningful way.
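
As a minimal sketch of the "relative comparison only" rule, the snippet below ranks a few result dictionaries by their `dist` value and deliberately avoids ratios or differences. The dictionaries are trimmed copies of entries printed in Example 1 of this document. The `ranking` variable is just an illustrative name, not part of the SimilarRegion API.

```python
# Result dictionaries shaped like the ones printed by the examples in this
# document (trimmed to the fields needed here, listed in arbitrary order).
results = [
    {"id": 102739, "name": "Dungog", "dist": 1.08021988488311},
    {"id": 102852, "name": "Upper Hunter Shire", "dist": 1.0173388430612706},
    {"id": 100022568, "name": "Wangaratta", "dist": 1.0385239475854269},
]

# Ordering by distance is meaningful: a smaller value means "more similar
# to the seed region". Ratios such as dist_a / dist_b are not meaningful.
ranked = sorted(results, key=lambda r: r["dist"])
ranking = [(rank, r["name"]) for rank, r in enumerate(ranked, start=1)]

for rank, name in ranking:
    print(rank, name)
```

Only the order of the regions is used here; the numeric gaps between consecutive distances are intentionally ignored.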

### Data volume
Whenever the similar_to method is called with a different compare_to argument, the SimilarRegion object is rebuilt, which can be time-consuming if the data has not already been cached. If compare_to=0 (whole world) and requested_level=5 (district), we need several daily data series spanning 20 years for each of about 45,000 regions, i.e. close to 1 billion data points.

Subsequent calls with the same compare_to take advantage of the cache and do not trigger data downloads from the Gro API. The same applies to the seed region (the region_id argument): if the region is not present in the cache, its data will be downloaded from the Gro API.
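
The "close to 1 billion data points" estimate for a whole-world district search can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes exactly three daily series per region and 365 days per year; both numbers are assumptions made for illustration, since the text only says "several daily data series".

```python
# Rough size estimate for compare_to=0 (whole world), requested_level=5 (district).
regions = 45_000       # approximate number of districts worldwide (from the text)
years = 20             # length of each series (from the text)
days_per_year = 365    # assumption: leap days ignored
daily_series = 3       # assumption: three daily series per region

points_per_series = regions * years * days_per_year
total_points = points_per_series * daily_series

print(f"{points_per_series:,} points per daily series")
print(f"{total_points:,} points in total")
```

Under these assumptions the total comes to 985,500,000 points, which is why the first uncached whole-world build can take a long time.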

### Output formatting
Some region names contain commas, so when writing results to a CSV file you should use a csv writer object rather than joining fields by hand.
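
A short sketch of why a csv writer matters: the standard-library `csv` module quotes any field that contains the delimiter, so a name with commas survives a round trip. The region names below appear in the outputs later in this document. The `rows`, `buffer` and `parsed` names are illustrative only, and an in-memory buffer stands in for a real file.

```python
import csv
import io

# Two (id, name) pairs taken from outputs in this document; the first
# name contains a comma.
rows = [
    {"id": 100023983, "name": "Hunter Valley, excl Newcastle - SA4 area"},
    {"id": 10174, "name": "New South Wales"},
]

# Write to an in-memory buffer; a real script would pass an open file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name"])
for row in rows:
    writer.writerow([row["id"], row["name"]])

csv_text = buffer.getvalue()
print(csv_text)

# Reading the text back recovers the original names, commas included.
parsed = list(csv.reader(io.StringIO(csv_text)))
```

Joining the fields with `",".join(...)` instead would split the first name into two columns when the file is read back.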


## Initial Imports

Import the SimilarRegion class definition and the specification of what data to use. *metric_properties* specifies the Gro entities (such as metric_id, item_id, etc.) which will be used to get data from the Gro API. The default metric_properties used here contain land surface temperature, rainfall, and soil moisture time series as well as seven soil properties.

In [1]:
import pandas as pd
import numpy as np
import logging

from api.client.samples.similar_regions_Frechet.sr import SimilarRegion
from api.client.samples.similar_regions_Frechet.metric import metric_properties, metric_weights

# Allow nested IO loops (required for jupyter notebooks to work with batch API, not needed in stand-alone python scripts)
import nest_asyncio

nest_asyncio.apply()

## Initialization

Configure the SimilarRegion object. No real work is done yet; it starts on the first call to similar_to.
Any previously downloaded data found in data_dir will be reused.

In [2]:
sim = SimilarRegion(metric_properties,
                    data_dir='/tmp/similar_regions_cache',
                    metric_weights=metric_weights)

# Uncomment this to see extensive informational messages during object build/region search
# sim._logger.setLevel(logging.DEBUG)

# Examples

Note that in each example, the size of the search area to compare to (a country, a continent or the whole world) and the requested region level (district, province or country) significantly affect the amount of data that will be needed. There are about 45,000 districts and about 5,000 provinces in the whole world.

## Example 1

Find the 10 districts (region level 5) in Oceania (region_id 13) most similar to Napa County, California, USA (region_id 136969).

In [3]:
%%time

print("Districts similar to Napa in Oceania:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=13):
    print(i)

Districts similar to Napa in Oceania:


Getting data series for 12 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 12/12 [00:00<00:00, 28.09it/s]
Getting data series for 12 regions in 1 batch(es) for property clay_30cm
100%|██████████| 12/12 [00:00<00:00, 30.46it/s]
Getting data series for 25 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 25/25 [00:00<00:00, 117.71it/s]
Getting data series for 12 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 12/12 [00:00<00:00, 37.16it/s]
Getting data series for 12 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 12/12 [00:00<00:00, 35.38it/s]
Getting data series for 2 regions in 1 batch(es) for property rainfall
100%|██████████| 2/2 [00:00<00:00, 10.38it/s]
Getting data series for 12 regions in 1 batch(es) for property sand_30cm
100%|██████████| 12/12 [00:00<00:00, 41.23it/s]
Getting data series for 12 regions in 1 batch(es) for property silt_30cm
100%|██████████| 12/12 [00

{'#': 0, 'id': 102852, 'name': 'Upper Hunter Shire', 'dist': 1.0173388430612706, 'parents': [(10174, 'New South Wales'), (100023983, 'Hunter Valley, excl Newcastle - SA4 area')], 'grand_parents': [(1013, 'Australia'), (10174, 'New South Wales'), (100024065, 'Australia Statistical Area')]}
{'#': 1, 'id': 100022568, 'name': 'Wangaratta', 'dist': 1.0385239475854269, 'parents': [(10180, 'Victoria'), (100024048, 'Hume - SA4 area')], 'grand_parents': [(1013, 'Australia'), (10180, 'Victoria'), (100024065, 'Australia Statistical Area')]}
{'#': 2, 'id': 102739, 'name': 'Dungog', 'dist': 1.08021988488311, 'parents': [(10174, 'New South Wales'), (100023983, 'Hunter Valley, excl Newcastle - SA4 area')], 'grand_parents': [(1013, 'Australia'), (10174, 'New South Wales'), (100024065, 'Australia Statistical Area')]}
{'#': 3, 'id': 100022537, 'name': 'Indigo', 'dist': 1.0809597805693687, 'parents': [(10180, 'Victoria'), (100024048, 'Hume - SA4 area')], 'grand_parents': [(1013, 'Australia'), (10180, 'Vi

## Example 2

Same as Example 1, but restricted to Ethiopia (region_id 1065). This will trigger a rebuild of the SimilarRegion object, since we are using a new search region.

In [4]:
%%time
# Use compare_to argument to restrict search to districts within particular country

print("Districts similar to Napa in Ethiopia:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=1065):
    print(i)

Districts similar to Napa in Ethiopia:


Getting data series for 1 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 1/1 [00:00<00:00,  6.11it/s]
Getting data series for 1 regions in 1 batch(es) for property clay_30cm
100%|██████████| 1/1 [00:00<00:00,  5.86it/s]
Getting data series for 1 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 1/1 [00:00<00:00,  1.85it/s]
Getting data series for 1 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 1/1 [00:00<00:00,  6.01it/s]
Getting data series for 1 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 1/1 [00:00<00:00,  6.89it/s]
Getting data series for 1 regions in 1 batch(es) for property rainfall
100%|██████████| 1/1 [00:00<00:00,  1.64it/s]
Getting data series for 1 regions in 1 batch(es) for property sand_30cm
100%|██████████| 1/1 [00:00<00:00,  4.64it/s]
Getting data series for 1 regions in 1 batch(es) for property silt_30cm
100%|██████████| 1/1 [00:00<00:00,  6.05it/s]


{'#': 0, 'id': 142811, 'name': 'Guji', 'dist': 1.2338421995896156, 'parents': [(10925, 'Oromia'), (100023816, 'Jubba-Shebelle River basin')], 'grand_parents': [(1065, 'Ethiopia'), (11, 'Africa')]}
{'#': 1, 'id': 115015, 'name': 'Eastern', 'dist': 1.39693348320392, 'parents': [(10928, 'Tigray'), (100023870, 'Nile River basin')], 'grand_parents': [(1065, 'Ethiopia'), (11, 'Africa')]}
{'#': 2, 'id': 114979, 'name': 'Borena', 'dist': 1.4944076194977822, 'parents': [(10925, 'Oromia')], 'grand_parents': [(1065, 'Ethiopia')]}
{'#': 3, 'id': 114978, 'name': 'Bale', 'dist': 1.5500765471563827, 'parents': [(10925, 'Oromia'), (100023816, 'Jubba-Shebelle River basin')], 'grand_parents': [(1065, 'Ethiopia'), (11, 'Africa')]}
{'#': 4, 'id': 114991, 'name': 'Liben', 'dist': 1.5603356045152479, 'parents': [(10926, 'Somali'), (100023816, 'Jubba-Shebelle River basin')], 'grand_parents': [(1065, 'Ethiopia'), (11, 'Africa')]}
{'#': 5, 'id': 142822, 'name': 'Alaba', 'dist': 1.561470946006365, 'parents': [(

## Example 3

Find the 10 provinces (region level 4) in Europe (region_id 14) most similar to the US state of Iowa (region_id 13066), and provide a detailed distance report.

In [5]:
print("Provinces similar to Iowa in Europe:")
for i in sim.similar_to(13066, number_of_regions=10, requested_level=4, detailed_distance=True, compare_to=14):
    print(i)

Provinces similar to Iowa in Europe:


Getting data series for 1 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 1/1 [00:00<00:00,  5.26it/s]
Getting data series for 1 regions in 1 batch(es) for property clay_30cm
100%|██████████| 1/1 [00:00<00:00,  6.05it/s]
Getting data series for 1 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 1/1 [00:00<00:00,  5.89it/s]
Getting data series for 1 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 1/1 [00:00<00:00,  5.40it/s]
Getting data series for 1 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 1/1 [00:00<00:00,  5.71it/s]
Getting data series for 1 regions in 1 batch(es) for property rainfall
100%|██████████| 1/1 [00:00<00:00,  5.58it/s]
Getting data series for 1 regions in 1 batch(es) for property sand_30cm
100%|██████████| 1/1 [00:00<00:00,  4.72it/s]
Getting data series for 1 regions in 1 batch(es) for property silt_30cm
100%|██████████| 1/1 [00:00<00:00,  5.91it/s]


{'#': 0, 'id': 12284, 'name': 'Cluj', 'dist': {'total': 0.7569978480749656, 'covar': 0.43341876634072, 'cation_exchange_30cm': 0.25067659587868785, 'clay_30cm': 0.041566176229737684, 'land_surface_temperature': 0.11596780389771966, 'organic_carbon_content_fine_earth_30cm': 0.14872593597649922, 'ph_h2o_30cm': 0.01464601966557133, 'rainfall': 0.2128178623321425, 'sand_30cm': 0.30008835181434, 'silt_30cm': 0.3360045979194928, 'soil_moisture': 0.1405871378131267, 'soil_water_capacity_100cm': 0.12975621841492435}, 'parents': [(1167, 'Romania'), (100000615, 'Nord-Vest')], 'grand_parents': [(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]}
{'#': 1, 'id': 11714, 'name': 'Edinet', 'dist': {'total': 0.7639849432992943, 'covar': 0.4286800658674282, 'cation_exchange_30cm': 0.029827030581420555, 'clay_30cm': 0.05072413701545009, 'land_s

## Example 4
A larger number of provinces similar to Iowa (across the entire world). Note that many of the provinces from Example 3 are near the top of the list.

In [6]:
import csv
import sys

writer = csv.writer(sys.stdout, delimiter='\t')

for i in sim.similar_to(13066, number_of_regions=200, compare_to=0, requested_level=4, detailed_distance=False):
    writer.writerow(i.values())

Getting data series for 16 regions in 1 batch(es) for property cation_exchange_30cm
100%|██████████| 16/16 [00:00<00:00, 52.69it/s]
Getting data series for 16 regions in 1 batch(es) for property clay_30cm
100%|██████████| 16/16 [00:00<00:00, 38.21it/s]
Getting data series for 31 regions in 1 batch(es) for property land_surface_temperature
100%|██████████| 31/31 [00:00<00:00, 32.26it/s]
Getting data series for 16 regions in 1 batch(es) for property organic_carbon_content_fine_earth_30cm
100%|██████████| 16/16 [00:00<00:00, 38.68it/s]
Getting data series for 16 regions in 1 batch(es) for property ph_h2o_30cm
100%|██████████| 16/16 [00:00<00:00, 58.53it/s]
Getting data series for 2 regions in 1 batch(es) for property rainfall
100%|██████████| 2/2 [00:00<00:00, 11.68it/s]
Getting data series for 16 regions in 1 batch(es) for property sand_30cm
100%|██████████| 16/16 [00:00<00:00, 52.70it/s]
Getting data series for 16 regions in 1 batch(es) for property silt_30cm
100%|██████████| 16/16 [00:

0	13066	Iowa	7.146345858741878e-08	[(1215, 'United States'), (100000100, 'US Corn Belt States'), (100000101, 'US Soybean Belt States'), (100000108, 'PADD II (Midwest)'), (100017952, 'Midwest U.S. - AMS Retail'), (100022934, 'Central U.S. - AMS Poultry'), (100023362, 'Iowa and Minnesota'), (100023363, 'US Western Corn Belt States'), (100023403, 'Contiguous United States')]	[(0, 'World'), (15, 'North America'), (100000016, 'Northern America'), (100017849, 'World, excl China'), (1215, 'United States'), (1215, 'United States'), (1215, 'United States'), (1215, 'United States'), (100000100, 'US Corn Belt States'), (1215, 'United States')]
1	12284	Cluj	0.7569978480749663	[(1167, 'Romania'), (100000615, 'Nord-Vest')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
2	11714	Edinet	0.7639849432992937	[(1134, 'Moldova')]	[(0, 'World'

23	12304	Sibiu	0.8553177815811243	[(1167, 'Romania'), (100000535, 'Centru'), (100000595, 'Macroregiunea unu')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
24	10778	P'yongyang	0.8553686385074424	[(1053, 'North Korea')]	[(0, 'World'), (12, 'Asia'), (100000018, 'Eastern Asia'), (100017849, 'World, excl China')]
25	11723	Ocnita	0.8564968515577026	[(1134, 'Moldova')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (100000022, 'Eastern Europe'), (100000164, 'FSU-12'), (100017849, 'World, excl China'), (100022669, 'Commonwealth of Independent States')]
26	11729	Straseni	0.8577676540608519	[(1134, 'Moldova')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (100000022, 'Eastern Europe'), (100000164, 'FSU-12'), (100017849, 'World, excl China'), (100022669, 'Commonwealth of Independent States')]
27	11705	

50	12290	Giurgiu	0.9079114164444555	[(1167, 'Romania'), (100000594, 'Macroregiunea trei'), (100000676, 'Sud-Muntenia')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
51	12271	Alba	0.9134300562713099	[(1167, 'Romania'), (100000535, 'Centru'), (100000595, 'Macroregiunea unu')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
52	11700	Balti	0.9149056209523124	[(1134, 'Moldova')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (100000022, 'Eastern Europe'), (100000164, 'FSU-12'), (100017849, 'World, excl China'), (100022669, 'Commonwealth of Independent States')]
53	11712	Drochia	0.9150864751422564	[(1134, 'Moldova')]	[(0, 'World'), (14, 'Europe'), 

73	12274	Bacau	0.964874015012636	[(1167, 'Romania'), (100000592, 'Macroregiunea doi'), (100000708, 'Nord-Est (RO)')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
74	10781	Sinuiju	0.9669710675065613	[(1053, 'North Korea')]	[(0, 'World'), (12, 'Asia'), (100000018, 'Eastern Asia'), (100017849, 'World, excl China')]
75	12288	Dolj	0.9676009284551631	[(1167, 'Romania'), (100000593, 'Macroregiunea patru'), (100000677, 'Sud-Vest Oltenia')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
76	13065	Indiana	0.9684299194683439	[(1215, 'United States'), (100000100, 'US Corn Belt States'), (100000101, 'US Soybean Belt States'), (100000108, 'PADD II (Midwest)'), (1000179

97	10435	Dobrich	0.9935518538217587	[(1032, 'Bulgaria'), (100000653, 'Severna Bulgaria'), (100000654, 'Severna i yugoiztochna Bulgaria'), (100000655, 'Severoiztochen')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
98	10458	Vidin	0.9993953248282849	[(1032, 'Bulgaria'), (100000653, 'Severna Bulgaria'), (100000654, 'Severna i yugoiztochna Bulgaria'), (100017850, 'Severozapaden')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
99	13000	Poltava	1.0035975481113357	[(1210, 'Ukraine')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (100000022, 'Eastern Europe'), (100000164, 'FSU-12'), (100000165, 'Major Wheat Exporters - WASDE'), (100000170, 'Major 

120	12272	Arad	1.03540164618295	[(1167, 'Romania'), (100000593, 'Macroregiunea patru'), (100000686, 'Vest')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
121	12368	Orel	1.038305928309519	[(1168, 'Russia'), (100000135, 'Central Black Earth'), (100000136, 'Central'), (100000137, 'Central Federal')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (100000022, 'Eastern Europe'), (100000164, 'FSU-12'), (100000165, 'Major Wheat Exporters - WASDE'), (100000170, 'Major Coarse Grain Exporters - WASDE'), (100000174, 'Major Corn Exporters - WASDE'), (100000185, 'Major Cotton Importers - WASDE'), (100000196, 'Black Sea'), (100017849, 'World, excl China'), (100022669, 'Commonwealth of Independent States')]
122	12512	Podravska	1.0404480229755873	[(1184, 'Slovenia'), (100000691, 'Vzhodna Slovenija')]	[(0, 'World'), (14, 'E

142	13253	Južno-Banatski	1.0847098395044397	[(1228, 'Serbia'), (100000647, 'Region Vojvodine'), (100000673, 'Srbija-sever')]	[(0, 'World'), (14, 'Europe'), (1262, 'Serbia and Kosovo (historical)'), (1263, 'Serbia Montenegro and Kosovo (historical)'), (1276, 'Yugoslav SFR (historical)'), (1280, 'Serbia and Montenegro (historical)'), (1286, 'Yugoslavia (historical)'), (100000024, 'Southern Europe'), (100017849, 'World, excl China')]
143	13226	Uroševac	1.0852521927149539	[(1226, 'Kosovo')]	[(0, 'World'), (14, 'Europe'), (1262, 'Serbia and Kosovo (historical)'), (1263, 'Serbia Montenegro and Kosovo (historical)'), (1276, 'Yugoslav SFR (historical)'), (1286, 'Yugoslavia (historical)'), (100000024, 'Southern Europe'), (100017849, 'World, excl China')]
144	12334	Kabardin-Balkar	1.0877959136757525	[(1168, 'Russia'), (100000130, 'North Caucasus'), (100000131, 'North Caucasus Federal')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (100000022, 'Eastern Europe'), (100000164, 'FSU-12

164	12634	Appenzell Ausserrhoden	1.1150569405786295	[(1194, 'Switzerland'), (100000634, 'Ostschweiz')]	[(0, 'World'), (14, 'Europe'), (100000025, 'Western Europe'), (100000562, 'European Free Trade Association'), (100017849, 'World, excl China')]
165	12291	Gorj	1.115278351826398	[(1167, 'Romania'), (100000593, 'Macroregiunea patru'), (100000677, 'Sud-Vest Oltenia')]	[(0, 'World'), (14, 'Europe'), (1290, 'EU-28'), (100000022, 'Eastern Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100000560, 'EU-11'), (100000561, 'EU-13'), (100017849, 'World, excl China')]
166	10185	Niederösterreich	1.1153318431309598	[(1014, 'Austria'), (100000631, 'Ostösterreich')]	[(0, 'World'), (14, 'Europe'), (1288, 'EU-15'), (1289, 'EU-25'), (1290, 'EU-28'), (100000025, 'Western Europe'), (100000035, 'EU (European Union)'), (100000112, 'EU-27'), (100017849, 'World, excl China')]
167	13003	Sumy	1.1169994405715546	[(1210, 'Ukraine')]	[(0, 'World'), (14, 'Europe'), (1281, 'USSR (historical)'), (

187	12460	Chiesanuova	1.1433739701104675	[(1176, 'San Marino')]	[(0, 'World'), (14, 'Europe'), (100000024, 'Southern Europe'), (100017849, 'World, excl China')]
188	10951	Bourgogne	1.1437978362396302	[(1070, 'France'), (100000521, 'Bassin Parisien'), (100000528, 'Bourgogne-Franche-Comté')]	[(0, 'World'), (14, 'Europe'), (1288, 'EU-15'), (1289, 'EU-25'), (1290, 'EU-28'), (100000025, 'Western Europe'), (100000035, 'EU (European Union)'), (100000103, 'EU-12'), (100000112, 'EU-27'), (100000649, 'RUP FR-Régions ultrapériphériques françaises'), (100017849, 'World, excl China')]
189	13266	Srednje-Banatski	1.1453526174577493	[(1228, 'Serbia'), (100000647, 'Region Vojvodine'), (100000673, 'Srbija-sever')]	[(0, 'World'), (14, 'Europe'), (1262, 'Serbia and Kosovo (historical)'), (1263, 'Serbia Montenegro and Kosovo (historical)'), (1276, 'Yugoslav SFR (historical)'), (1280, 'Serbia and Montenegro (historical)'), (1286, 'Yugoslavia (historical)'), (100000024, 'Southern Europe'), (100017849, 'World