<a href="https://colab.research.google.com/github/joysaikat/Data_Science/blob/master/analogous_years_la_nina_brazil_argentina.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Analgous Years Example
The Analogous Years application enables users to compare events from a set period of time, to those of the same date range in other years. The application will compute ranks of similarity between the specified period, and the same period from previous or future years.

# -1. Optional Cell To Add Gro-access-token (Read before running)
You should run the following cell (after appropriately modifying) ONLY if you have saved the `GROAPI_TOKEN` in their google drive. You will have the option to manually add the access token, later.

In [1]:
from google.colab import drive
drive.mount('/content/drive')
!pip install ConfigParser #Package for parsing configuration file holding GROAPI_TOKEN
import configparser
config = configparser.RawConfigParser()
config.read('/content/drive/My Drive/Colab Notebooks/Properties/gro.properties')#

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


['/content/drive/My Drive/Colab Notebooks/Properties/gro.properties']

## 0. Install Gro API client and other Analogous Years packages.
To get started with `Analogous Years`, you have to install `Gro API Client` and some of the necessary libraries (which do not come installed in Colab) for running the `analogous_years` package.

In [2]:
!pip install git+https://github.com/gro-intelligence/api-client.git #Install Gro api client
!pip install dtw # dynamic time warping package for running analogous_years
!pip install tsfresh # time series feature extraction package for running analogous_years

# from groclient import GroClient
from api.client.samples.analogous_years.lib import final_ranks_computation
import pandas as pd
pd.options.mode.chained_assignment = None # This is to avoid seeing a lot of warnings from pandas

Collecting git+https://github.com/gro-intelligence/api-client.git
  Cloning https://github.com/gro-intelligence/api-client.git to /tmp/pip-req-build-wy0s2dly
  Running command git clone -q https://github.com/gro-intelligence/api-client.git /tmp/pip-req-build-wy0s2dly
Building wheels for collected packages: groclient
  Building wheel for groclient (setup.py) ... [?25l[?25hdone
  Created wheel for groclient: filename=groclient-1.83.0-cp36-none-any.whl size=79903 sha256=29c3328847f227d1394ebf0485ae568da86c569b3c03e3eee83ddadc710f92dd
  Stored in directory: /tmp/pip-ephem-wheel-cache-c39s97db/wheels/22/03/97/10896ebca874c083ebb2c0e99ef60f0224b1c4a0a063dec9d3
Successfully built groclient




    You are importing modules from deprecated `api` module to access Gro
    Intelligence's API.  Please update your code to import from the `groclient`
    module instead.  The `api` module will be removed by 2021-03-31.

    Replace: from api.client.gro_client import GroClient
       with: from groclient import GroClient

    And replace any other imports from `api.client.*` with imports from
    `groclient.*` instead.

    Please reach out to api-support@gro-intelligence.com if you need any help!

  import pandas.util.testing as tm


## 1. Fill in the API access token (User input required)

You have to fill in the variable `GROAPI_TOKEN` below, with your personal access token (assuming you have not done so in the first cell).

In [3]:
GROAPI_TOKEN = None
if not GROAPI_TOKEN:
  GROAPI_TOKEN = config['DEFAULT']['GROAPI_TOKEN']

## 2B. Adding data series and dates (User input required)
# Brazil

We are interested in 4 regions in Brazil

1. Mato Grosso


In [4]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Mato Grosso (Gro Derived Geospatial)
mato_grosso_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': [10418, 10417, 10406, 10424],
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Mato Grosso (NASA GPM 3IMERGDL)
mato_grosso_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10418, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Mato Grosso (NASA MODIS MOD11 LST)
mato_grosso_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10418, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


mato_grosso_series_list = [
               mato_grosso_soil_moisture,
               mato_grosso_rainfall,
               mato_grosso_temperature
               ]

2. Mato Grosso do Sul

In [5]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Mato Grosso do Sul (Gro Derived Geospatial)
mato_grosso_do_sul_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': 10417,
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Mato Grosso do Sul (NASA GPM 3IMERGDL)
mato_grosso_do_sul_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10417, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Mato Grosso do Sul 
# (NASA MODIS MOD11 LST)
mato_grosso_do_sul_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10417, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


mato_grosso_do_sul_series_list = [
               mato_grosso_do_sul_soil_moisture,
               mato_grosso_do_sul_rainfall,
               mato_grosso_do_sul_temperature
               ]


3. Parana

In [6]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Parana (Gro Derived Geospatial)
parana_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': 10406,
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Parana (NASA GPM 3IMERGDL)
parana_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10406, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Parana (NASA MODIS MOD11 LST)
parana_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10406, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


parana_series_list = [
               parana_soil_moisture,
               parana_rainfall,
               parana_temperature
               ]


4. Rio Grande do Sul

In [7]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Rio Grande do Sul (Gro Derived Geospatial)
rio_grande_do_sul_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': 10424,
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Rio Grande do Sul (NASA GPM 3IMERGDL)
rio_grande_do_sul_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10424, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Rio Grande do Sul (NASA MODIS MOD11 LST)
rio_grande_do_sul_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10424, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


rio_grande_do_sul_series_list = [
               rio_grande_do_sul_soil_moisture,
               rio_grande_do_sul_rainfall,
               rio_grande_do_sul_temperature
               ]


## Dates for Brazil


In [8]:
initial_date_brazil = '2020-06-01'
final_date_brazil = '2020-11-02'

## 3B. Vanilla Analogous Years Rank Brazil
The output will be a pandas dataframe containg the ranks computed by ensembling several distance calculation methods.

Mato Grosso

In [9]:
result_mato_grosso = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, mato_grosso_series_list, initial_date_brazil, final_date_brazil, enso=True, all_ranks=True)
result_mato_grosso

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.26it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.55it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.45it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.52it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,4,4,3,4
2011-06-01 to 2011-11-02,11,11,11,11
2012-06-01 to 2012-11-02,5,5,5,5
2013-06-01 to 2013-11-02,10,9,7,10
2014-06-01 to 2014-11-02,9,7,8,8
2015-06-01 to 2015-11-02,6,6,9,6
2016-06-01 to 2016-11-02,8,10,10,9
2017-06-01 to 2017-11-02,3,3,4,3
2018-06-01 to 2018-11-02,7,8,6,7
2019-06-01 to 2019-11-02,2,2,2,2


Mato Grosso do Sul

In [10]:
result_mato_grosso_do_sul = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, mato_grosso_do_sul_series_list, initial_date_brazil, final_date_brazil, enso=True, all_ranks=True)
result_mato_grosso_do_sul

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00, 10.17it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.43it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.64it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.84it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,2,7,6,4
2011-06-01 to 2011-11-02,6,5,5,6
2012-06-01 to 2012-11-02,4,6,4,5
2013-06-01 to 2013-11-02,8,9,9,8
2014-06-01 to 2014-11-02,10,11,11,10
2015-06-01 to 2015-11-02,11,10,10,11
2016-06-01 to 2016-11-02,5,2,3,3
2017-06-01 to 2017-11-02,7,4,7,7
2018-06-01 to 2018-11-02,9,8,8,9
2019-06-01 to 2019-11-02,3,3,2,2


Parana

In [11]:
result_parana = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, parana_series_list, initial_date_brazil, final_date_brazil, enso=True, all_ranks=True)
result_parana

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.81it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.62it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.65it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.63it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,3,8,8,3
2011-06-01 to 2011-11-02,10,3,3,7
2012-06-01 to 2012-11-02,2,4,2,2
2013-06-01 to 2013-11-02,5,7,4,4
2014-06-01 to 2014-11-02,9,9,6,9
2015-06-01 to 2015-11-02,11,11,11,11
2016-06-01 to 2016-11-02,8,5,7,8
2017-06-01 to 2017-11-02,6,10,10,10
2018-06-01 to 2018-11-02,4,6,9,5
2019-06-01 to 2019-11-02,7,2,5,6


Rio Grande do Sul

In [12]:
result_rio_grande_do_sul = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, rio_grande_do_sul_series_list, initial_date_brazil, final_date_brazil, enso=True, all_ranks=True)
result_rio_grande_do_sul

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.75it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.55it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.55it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.48it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,7,3,3,3
2011-06-01 to 2011-11-02,8,5,2,4
2012-06-01 to 2012-11-02,3,7,8,6
2013-06-01 to 2013-11-02,9,4,4,5
2014-06-01 to 2014-11-02,10,6,6,10
2015-06-01 to 2015-11-02,11,8,7,11
2016-06-01 to 2016-11-02,4,11,10,8
2017-06-01 to 2017-11-02,6,10,11,9
2018-06-01 to 2018-11-02,5,2,5,2
2019-06-01 to 2019-11-02,2,9,9,7


## 2A. Adding data series and dates (User input required)
# Argentina

We are interested in 4 regions in Argentina

1. Cordoba


In [13]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Cordoba (Gro Derived Geospatial)
cordoba_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': 10141,
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Cordoba (NASA GPM 3IMERGDL)
cordoba_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10141, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Cordoba 
# (NASA MODIS MOD11 LST)
cordoba_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10141, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


cordoba_series_list = [
               cordoba_soil_moisture,
               cordoba_rainfall,
               cordoba_temperature
               ]


2. Santa Fe

In [14]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Santa Fe (Gro Derived Geospatial)
santa_fe_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': 10156,
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Santa Fe (NASA GPM 3IMERGDL)
santa_fe_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10156, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Santa Fe
# (NASA MODIS MOD11 LST)
santa_fe_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10156, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


santa_fe_series_list = [
               santa_fe_soil_moisture,
               santa_fe_rainfall,
               santa_fe_temperature
               ]


3. Buenos Aires

In [15]:
# Soil moisture on cropland - Availability in soil (volume/volume) - 
# Buenos Aires (Gro Derived Geospatial)
buenos_aires_soil_moisture = {'metric_id': 15531082, 
	'item_id': 8938,
	'region_id': 10136,
	'partner_region_id': 0, 
	'source_id': 82, 
	'frequency_id': 1}

# Rainfall - Precipitation Quantity - Buenos Aires (NASA GPM 3IMERGDL)
buenos_aires_rainfall = {
	'metric_id': 2100031, 
	'item_id': 2039, 
	'region_id': 10136, 
	'partner_region_id': 0, 
	'source_id': 126, 
	'frequency_id': 6, 
	'unit_id': 2, 
}

# Land temperature (daytime) - Temperature - Buenos Aires
# (NASA MODIS MOD11 LST)
buenos_aires_temperature = {
	'metric_id': 2540047, 
	'item_id': 3457, 
	'region_id': 10136, 
	'partner_region_id': 0, 
	'source_id': 26, 
	'frequency_id': 6, 
	'unit_id': 36, 
}


buenos_aires_series_list = [
               buenos_aires_soil_moisture,
               buenos_aires_rainfall,
               buenos_aires_temperature
               ]



## Dates for Argentina




In [16]:
initial_date_argentina = '2020-06-01'
final_date_argentina = '2020-11-02'

## 3B. Vanilla Analogous Years Rank Argentina
The output will be a pandas dataframe containg the ranks computed by ensembling several distance calculation methods.

Cordoba

In [17]:
result_cordoba = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, cordoba_series_list, initial_date_argentina, final_date_argentina, enso=True, all_ranks=True)
result_cordoba

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.66it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.65it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.75it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.62it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,4,5,5,5
2011-06-01 to 2011-11-02,2,2,2,2
2012-06-01 to 2012-11-02,9,9,9,9
2013-06-01 to 2013-11-02,3,3,3,3
2014-06-01 to 2014-11-02,7,7,7,7
2015-06-01 to 2015-11-02,8,8,6,8
2016-06-01 to 2016-11-02,10,10,11,10
2017-06-01 to 2017-11-02,11,11,10,11
2018-06-01 to 2018-11-02,5,4,4,4
2019-06-01 to 2019-11-02,6,6,8,6


Santa Fe

In [18]:
result_santa_fe = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, santa_fe_series_list, initial_date_argentina, final_date_argentina, enso=True, all_ranks=True)
result_santa_fe

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.51it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.90it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.76it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.83it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,9,3,6,7
2011-06-01 to 2011-11-02,3,5,3,4
2012-06-01 to 2012-11-02,6,9,11,8
2013-06-01 to 2013-11-02,8,6,4,6
2014-06-01 to 2014-11-02,4,7,7,5
2015-06-01 to 2015-11-02,10,11,8,9
2016-06-01 to 2016-11-02,7,10,10,10
2017-06-01 to 2017-11-02,11,8,9,11
2018-06-01 to 2018-11-02,2,2,5,2
2019-06-01 to 2019-11-02,5,4,2,3


Buenos Aires

In [19]:
result_buenos_aires = final_ranks_computation.analogous_years(
    GROAPI_TOKEN, buenos_aires_series_list, initial_date_argentina, final_date_argentina, enso=True, all_ranks=True)
result_buenos_aires

Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.93it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.99it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.35it/s]
Feature Extraction: 100%|██████████| 11/11 [00:01<00:00,  9.74it/s]


Unnamed: 0_level_0,cumulative_rank,euclidean_rank,ts-features_rank,composite_rank
period,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
2010-06-01 to 2010-11-02,5,4,3,3
2011-06-01 to 2011-11-02,9,6,6,7
2012-06-01 to 2012-11-02,7,8,9,8
2013-06-01 to 2013-11-02,6,5,5,5
2014-06-01 to 2014-11-02,11,11,10,11
2015-06-01 to 2015-11-02,8,10,8,9
2016-06-01 to 2016-11-02,4,3,4,4
2017-06-01 to 2017-11-02,10,9,11,10
2018-06-01 to 2018-11-02,3,7,7,6
2019-06-01 to 2019-11-02,2,2,2,2
