# Similar Regions examples



## Comments
Reported distance values are in a multi-dimensional space. They should be used only for relative comparison of regions and not be interpreted in any other way. In particular, no linearity of the underlying space should be assumed. For example, a pair of regions at double the distance of some other pair should not be interpreted as "half as similar" in any meaningful way.
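
As a minimal sketch of using distances only for ordering, the snippet below ranks a few result dictionaries whose values are copied from the Example 1 output (with the 'parent' field trimmed). The helper name `closer_region` is hypothetical and not part of the SimilarRegion code.

```python
# Result dictionaries in the shape printed by similar_to
# (values copied from the Example 1 output, 'parent' field omitted)
results = [
    {'#': 2, 'id': 102739, 'name': 'Dungog', 'dist': 1.08021988488311},
    {'#': 0, 'id': 102852, 'name': 'Upper Hunter Shire', 'dist': 1.0173388430612706},
    {'#': 1, 'id': 100022568, 'name': 'Wangaratta', 'dist': 1.0385239475854269},
]


def closer_region(a, b):
    """Hypothetical helper: return whichever result has the smaller distance to the seed."""
    return a if a['dist'] <= b['dist'] else b


# Ordering by distance is meaningful; ratios or differences of distances are not.
ranked = sorted(results, key=lambda r: r['dist'])
ranked_names = [r['name'] for r in ranked]
print(ranked_names)
```

The ranking is the only quantity used here; no ratio or difference of 'dist' values is computed.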

Changing the 'compare_to' argument of the similar_to method triggers a rebuild of the SimilarRegion object, which can be expensive.
It is therefore better to complete all queries for a given search region before changing it.
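
One way to follow this advice is to group pending queries by search region before running them. The sketch below is only an illustration: `batch_by_search_region` and the query tuple layout are hypothetical, and the actual `similar_to` call is left as a comment because it needs a configured SimilarRegion object.

```python
from itertools import groupby

# Each query is (seed_region_id, compare_to, requested_level).
# The region IDs are the ones used in the examples in this notebook.
queries = [
    (136969, 13, 5),    # Napa vs. districts in Oceania
    (13066, 14, 4),     # Iowa vs. provinces in Europe
    (136969, 1065, 5),  # Napa vs. districts in Ethiopia
    (13066, 13, 4),     # Iowa vs. provinces in Oceania
]


def batch_by_search_region(queries):
    """Hypothetical helper: group queries so each search region is visited once."""
    ordered = sorted(queries, key=lambda q: q[1])  # stable sort on compare_to
    return [(region, list(group))
            for region, group in groupby(ordered, key=lambda q: q[1])]


batches = batch_by_search_region(queries)
for compare_to, group in batches:
    for seed, _, level in group:
        # With a configured SimilarRegion object `sim`, the call would be:
        # sim.similar_to(seed, number_of_regions=10,
        #                requested_level=level, compare_to=compare_to)
        print(f"compare_to={compare_to}: seed={seed}, level={level}")
```

Run in this order, the four queries change 'compare_to' three times instead of four.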

If the seed region is outside the search region (see Ex.2,3), the most recent data for the seed region is always downloaded from the Gro database, even if the seed region is part of the local cache. If this most recent data differs from the cached data, results for such runs can differ from those that use only cached information (for example, when using the local cache and searching over the entire world).

Some region names contain commas, so be careful when writing results to .csv files. Consider using other field separators (see Ex.4).
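
The sketch below shows two ways to write such rows safely with Python's standard `csv` module: quoting fields that contain commas, or switching to a `|` delimiter. The rows are copied from the Example 2 output; `flatten` is a hypothetical helper, and the in-memory `io.StringIO` buffers stand in for real files.

```python
import csv
import io

# Rows in the shape printed by similar_to (values copied from the Example 2 output)
rows = [
    {'#': 0, 'id': 142811, 'name': 'Guji', 'dist': 1.2338421995896156,
     'parent': (10925, 'Oromia', 1065, 'Ethiopia')},
    {'#': 5, 'id': 142822, 'name': 'Alaba', 'dist': 1.561470946006365,
     'parent': (10927, 'Southern Nations, Nationalities and Peoples', 1065, 'Ethiopia')},
]


def flatten(row):
    """Hypothetical helper: expand the 'parent' tuple into separate fields."""
    return [row['#'], row['id'], row['name'], row['dist'], *row['parent']]


# Option 1: keep commas as separators; csv.writer quotes fields that contain commas.
quoted = io.StringIO()
csv.writer(quoted).writerows(flatten(r) for r in rows)

# Option 2: use '|' as the separator so commas in names need no quoting.
piped = io.StringIO()
csv.writer(piped, delimiter='|').writerows(flatten(r) for r in rows)

print(quoted.getvalue())
print(piped.getvalue())
```

Either output can be read back with `csv.reader` using the same delimiter. Joining the fields with a plain `",".join(...)` would instead split the province name in the second row into two columns.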

## Initial Imports

In [1]:
import pandas as pd
import numpy as np
import logging

# SimilarRegion class definition
from api.client.samples.similar_regions_Frechet.sr import SimilarRegion

# Detailed API info (such as metric_id, item_id, etc.) on properties which can be used in a metric
# Currently contains land surface temperature, rainfall, and soil moisture time series
# as well as seven soil properties
from api.client.samples.similar_regions_Frechet.metric import metric_properties, metric_weights

# Allow nested IO loops (required for jupyter notebooks to work with batch API, not needed in stand-alone python scripts)
import nest_asyncio

nest_asyncio.apply()

## Initialization

In [5]:
# Configure SR object (no real work will be done yet - it will start on the first call to similar_to)
# To reconstruct from scratch, make sure data_dir is empty.

sim = SimilarRegion(metric_properties,
                    data_dir='/tmp/similar_regions_cache',
                    metric_weights=metric_weights)

# Show extensive informational messages during object build/region search
# (comment this line out to silence them)
sim._logger.setLevel(logging.INFO)

# Examples

Note that in each example, the size of the search area to compare to (a country, a continent or the whole world) and the requested region level (district, province or country) significantly affect the amount of data that will be needed. There are about 45,000 districts and about 5,000 provinces in the whole world.

## Example 1

Find 10 districts (region level 5) in Oceania (region_id 13) most similar to Napa county, California, USA (region_id 136969).

In [6]:
%%time
# The first call to the similar_to method triggers data loading from either the local cache or the Gro database,
# depending on the SimilarRegion configuration requested during its construction above.
# If the local cache is used, this should take under 5 minutes.
# *****CAUTION*****: The first download from the Gro database for a large region (default is the entire world!) will take many hours.

print("Districts similar to Napa in Oceania:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=13):
    print(i)

Similar to Napa:


100%|██████████| 22/22 [00:00<00:00, 36.21it/s]
  valid_data = valid_data/data_counters # division by zero where we do not have data
100%|██████████| 22/22 [00:00<00:00, 43.72it/s]
100%|██████████| 52/52 [00:08<00:00,  5.82it/s]
100%|██████████| 22/22 [00:00<00:00, 39.80it/s]
100%|██████████| 22/22 [00:00<00:00, 42.84it/s]
 91%|█████████ | 1038/1138 [03:37<00:11,  9.05it/s]Connection closed
OK
 91%|█████████ | 1038/1138 [03:37<00:20,  4.78it/s]
100%|██████████| 106/106 [00:44<00:00,  2.38it/s]
100%|██████████| 1138/1138 [00:00<00:00, 1598.75it/s]
100%|██████████| 1138/1138 [00:00<00:00, 1535.15it/s]
100%|██████████| 1138/1138 [01:09<00:00, 16.45it/s]
100%|██████████| 1138/1138 [00:00<00:00, 1307.34it/s]
100%|██████████| 1/1 [00:00<00:00,  5.73it/s]
100%|██████████| 1/1 [00:00<00:00,  5.63it/s]
100%|██████████| 1/1 [00:00<00:00,  1.79it/s]
100%|██████████| 1/1 [00:00<00:00,  5.46it/s]
100%|██████████| 1/1 [00:00<00:00,  5.69it/s]
100%|██████████| 1/1 [00:03<00:00,  3.54s/it]
100%|██████

{'#': 0, 'id': 102852, 'name': 'Upper Hunter Shire', 'dist': 1.0173388430612706, 'parent': (10174, 'New South Wales', 1013, 'Australia')}
{'#': 1, 'id': 100022568, 'name': 'Wangaratta', 'dist': 1.0385239475854269, 'parent': (10180, 'Victoria', 1013, 'Australia')}
{'#': 2, 'id': 102739, 'name': 'Dungog', 'dist': 1.08021988488311, 'parent': (10174, 'New South Wales', 1013, 'Australia')}
{'#': 3, 'id': 100022537, 'name': 'Indigo', 'dist': 1.0809597805693687, 'parent': (10180, 'Victoria', 1013, 'Australia')}
{'#': 4, 'id': 100022487, 'name': 'Mid-Western Regional', 'dist': 1.1117281793754827, 'parent': (10174, 'New South Wales', 1013, 'Australia')}
{'#': 5, 'id': 100022557, 'name': 'Murrindindi', 'dist': 1.1134716292216345, 'parent': (10180, 'Victoria', 1013, 'Australia')}
{'#': 6, 'id': 103807, 'name': 'Wodonga', 'dist': 1.1182413463837215, 'parent': (10180, 'Victoria', 1013, 'Australia')}
{'#': 7, 'id': 100022478, 'name': 'Greater Hume Shire', 'dist': 1.1264317762109375, 'parent': (10174

## Example 2

Same as Ex.1 but restricted to Ethiopia (region_id 1065). This will trigger rebuilding of the SimilarRegion object and, since the seed region is outside the search region, a download of the seed region data directly from Gro.

In [7]:
%%time
# Use the compare_to argument to restrict the search to districts within a particular country

print("Districts similar to Napa in Ethiopia:")
for i in sim.similar_to(136969, number_of_regions=10, requested_level=5, compare_to=1065):
    print(i)

Similar to Napa in Ethiopia:


100%|██████████| 1/1 [00:00<00:00,  5.47it/s]
100%|██████████| 1/1 [00:00<00:00,  6.16it/s]
100%|██████████| 1/1 [00:00<00:00,  1.99it/s]
100%|██████████| 1/1 [00:00<00:00,  6.07it/s]
100%|██████████| 1/1 [00:00<00:00,  5.99it/s]
100%|██████████| 1/1 [00:00<00:00,  1.67it/s]
100%|██████████| 1/1 [00:00<00:00,  6.40it/s]
100%|██████████| 1/1 [00:00<00:00,  6.67it/s]
100%|██████████| 1/1 [00:00<00:00,  2.88it/s]
100%|██████████| 1/1 [00:00<00:00,  6.41it/s]


{'#': 0, 'id': 142811, 'name': 'Guji', 'dist': 1.2338421995896156, 'parent': (10925, 'Oromia', 1065, 'Ethiopia')}
{'#': 1, 'id': 115015, 'name': 'Eastern', 'dist': 1.39693348320392, 'parent': (10928, 'Tigray', 1065, 'Ethiopia')}
{'#': 2, 'id': 114979, 'name': 'Borena', 'dist': 1.4944076194977822, 'parent': (10925, 'Oromia', 1065, 'Ethiopia')}
{'#': 3, 'id': 114978, 'name': 'Bale', 'dist': 1.5500765471563827, 'parent': (10925, 'Oromia', 1065, 'Ethiopia')}
{'#': 4, 'id': 114991, 'name': 'Liben', 'dist': 1.5603356045152479, 'parent': (10926, 'Somali', 1065, 'Ethiopia')}
{'#': 5, 'id': 142822, 'name': 'Alaba', 'dist': 1.561470946006365, 'parent': (10927, 'Southern Nations, Nationalities and Peoples', 1065, 'Ethiopia')}
{'#': 6, 'id': 142824, 'name': 'South East', 'dist': 1.5920198683128988, 'parent': (10928, 'Tigray', 1065, 'Ethiopia')}
{'#': 7, 'id': 142823, 'name': "Segen Peoples'", 'dist': 1.6524844247816282, 'parent': (10927, 'Southern Nations, Nationalities and Peoples', 1065, 'Ethiop

## Example 3

Find 10 provinces (region level 4) in Europe (region_id 14) most similar to the US state of Iowa (region_id 13066), and provide a detailed distance report.

In [None]:
print("Provinces similar to Iowa in Europe:")
for i in sim.similar_to(13066, number_of_regions=10, requested_level=4, detailed_distance=True, compare_to=14):
    print(i)

Provinces similar to Iowa in Europe:


100%|██████████| 14/14 [00:00<00:00, 29.94it/s]
  valid_data = valid_data/data_counters # division by zero where we do not have data
100%|██████████| 14/14 [00:00<00:00, 35.00it/s]
100%|██████████| 11/11 [00:10<00:00,  1.04it/s]
100%|██████████| 14/14 [00:00<00:00, 26.95it/s]
100%|██████████| 14/14 [00:00<00:00, 27.62it/s]
 10%|▉         | 900/9318 [03:14<21:05,  6.65it/s]   Connection closed
OK
Connection closed
OK
 19%|█▉        | 1800/9318 [06:45<19:36,  6.39it/s]   Connection closed
OK
Connection closed
OK
Connection closed
OK
 36%|███▋      | 3400/9318 [12:38<16:34,  5.95it/s]   

## Example 4
A larger number of provinces similar to Iowa (across the entire world), printed with the | separator (note that region names can have commas in them). Note that many of the provinces from Ex.3 are near the top of the list.

In [None]:
for i in sim.similar_to(13066, number_of_regions=200, compare_to=0, requested_level=4, detailed_distance=False):
    lv = list(i.values())
    # The last value is the 'parent' tuple: expand it so that every field is separated by '|'
    print("|".join(str(v) for v in lv[:-1] + list(lv[-1])))