# At-time query examples

## Use case: "I am backtesting and need to avoid lookahead bias. What data would Gro have had available on each date I am testing?"

Data does not usually appear immediately at the end of its period. If a source
is reporting a "total exports" number for January 1st to December 31st,
that data point may not be reported by the source until February of the
following year, for example. This is referred to as "source lag." You can
inspect a given source's worst-case expected lag using the `lookup()` function,
like so: `client.lookup('sources', source_id)['sourceLag']`

Gro records the date on which each point was reported. Using the `at_time`
parameter demonstrated below, the `get_data_points()` function can filter out
points that had not yet been reported as of the given date.

Data also may be revised after it has been published, as is the case most often
with forecasts like the Gro Yield Model that get closer to the true number as
the season progresses. The default mode for `get_data_points()` is to give the
latest point for each period, since the one reported most recently is the most
up-to-date, and presumably most accurate, value. You may, however, want to
analyze the historical accuracy of a forecast as the season progressed, and for
that you need to know what the latest forecast was at each point you're
interested in. For that, you may also use the at_time query demonstrated below.
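
To make the two ideas above concrete (hiding points that were not yet reported, and picking the latest revision that was), here is a minimal sketch in plain Python. It is not Gro's implementation: the `as_of` helper, the `reporting_date` field name, the sample records, and the choice to treat a point reported exactly on `at_time` as visible are all assumptions made for illustration.

```python
from datetime import date


def as_of(records, at_time):
    """Keep, for each period, the most recently reported record that was
    reported on or before at_time (hypothetical helper)."""
    latest = {}
    for rec in records:
        if rec["reporting_date"] > at_time:
            continue  # not yet reported as of at_time, so it must be hidden
        key = (rec["start_date"], rec["end_date"])
        if key not in latest or rec["reporting_date"] > latest[key]["reporting_date"]:
            latest[key] = rec
    return sorted(latest.values(), key=lambda r: r["end_date"])


# Made-up revisions of a single annual value, for illustration only
period = {"start_date": date(2017, 1, 1), "end_date": date(2017, 12, 31)}
records = [
    dict(period, reporting_date=date(2017, 5, 10), value=10.5),
    dict(period, reporting_date=date(2017, 8, 20), value=10.7),
    dict(period, reporting_date=date(2018, 1, 15), value=10.9),
]

print(as_of(records, date(2017, 1, 1)))                        # []
print([r["value"] for r in as_of(records, date(2017, 9, 1))])  # [10.7]
```

A backtest that only ever sees the result of a filter like this for its simulation date cannot use a value before it was reported.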

See Also
--------
- `api.client.lib.get_data_points()`
- `api.client.lib.lookup()`
- https://github.com/gro-intelligence/api-client/wiki/FAQ#q-what-does-sourcelag-mean-when-i-use-clientlookup-to-inspect-a-sources-details
- https://github.com/gro-intelligence/api-client/wiki/FAQ#q-how-do-i-see-previous-values-for-a-time-series-point-to-see-how-the-value-changed-over-time

In [1]:
import os
from api.client.gro_client import GroClient

API_HOST = 'api.gro-intelligence.com'
ACCESS_TOKEN = os.environ['GROAPI_TOKEN']

client = GroClient(API_HOST, ACCESS_TOKEN)

## Example #1: LST and sporadic lag

LST has a defined worst-case lag of 7 days for its daily data:

In [2]:
LST = 26
print(client.lookup('sources', LST)['sourceLag'])

{'daily': '7d', 'weekly': '8d'}


This means that any given daily data point can typically be expected to become available at some time between its end_date and 7 days after its end_date.
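
As a rough sketch, the worst-case availability date for a point can be computed from the lag string. The helpers `parse_lag_days` and `worst_case_available_by` are hypothetical and not part of the Gro client, and the parsing assumes that lag strings always look like the `'7d'` and `'8d'` values shown above; any other format raises an error rather than being guessed at.

```python
import re
from datetime import date, timedelta


def parse_lag_days(lag):
    """Convert a lag string such as '7d' into a timedelta.
    Only the day suffix seen in the lookup output is handled (assumption)."""
    match = re.fullmatch(r"(\d+)d", lag)
    if match is None:
        raise ValueError("unsupported lag format: {!r}".format(lag))
    return timedelta(days=int(match.group(1)))


def worst_case_available_by(end_date, lag):
    """Latest date by which a point ending on end_date is expected to appear."""
    return end_date + parse_lag_days(lag)


source_lag = {'daily': '7d', 'weekly': '8d'}  # copied from the lookup output
print(worst_case_available_by(date(2018, 12, 13), source_lag['daily']))  # 2018-12-20
```

This date is only an expected upper bound; the point may be available much sooner.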

Let's look at a particular series and see how that may vary.

In [3]:
TEMP = 2540047    # metric_id
LAND_TEMP = 3457  # item_id
IOWA = 13066      # region_id
DAILY = 1         # frequency_id
LST = 26          # source_id

LST data is typically not published on the same day it covers. We can see that by requesting the 2018-12-13 point with an "at_time" of 2018-12-13:

In [4]:
client.get_data_points(
    metric_id=TEMP,
    item_id=LAND_TEMP,
    region_id=IOWA,
    frequency_id=DAILY,
    source_id=LST,
    start_date='2018-12-13',
    end_date='2018-12-13',
    at_time='2018-12-13'
)

[]

An empty response is expected, since we are simulating what would have been available on 2018-12-13, and the data for that day had not been published yet.

On 2018-12-14, however, the previous day's point is available:

In [5]:
client.get_data_points(
    metric_id=TEMP,
    item_id=LAND_TEMP,
    region_id=IOWA,
    frequency_id=DAILY,
    source_id=LST,
    start_date='2018-12-13',
    end_date='2018-12-13',
    at_time='2018-12-14'
)

[{'start_date': '2018-12-13T00:00:00.000Z',
  'end_date': '2018-12-13T00:00:00.000Z',
  'value': -7.43692102699087,
  'input_unit_id': 36,
  'input_unit_scale': 1,
  'metric_id': 2540047,
  'item_id': 3457,
  'region_id': 13066,
  'frequency_id': 1,
  'unit_id': 36}]

The case above is an example of a 1-day source lag. Recall that the source lookup reported a lag of 7 days for daily data. That is the *worst-case* lag, not the typical one, so a 1-day lag may be quite common.

---

Let's look at a case where the lag was more than 1 day:

From December of 2018 into January and February of 2019, LST's data was updated sporadically due to the government shutdown.

If we expect the 2019-01-06 data point to be available on 2019-01-07, assuming a 1-day lag like we saw above, we might be surprised to find that it *still* isn't available, even on 2019-01-10:

In [6]:
client.get_data_points(
    metric_id=TEMP,
    item_id=LAND_TEMP,
    region_id=IOWA,
    frequency_id=DAILY,
    source_id=LST,
    start_date='2019-01-06',
    end_date='2019-01-06',
    at_time='2019-01-10'
)

[]

The 2019-01-06 data point was not published until 2019-01-11:

In [7]:
client.get_data_points(
    metric_id=TEMP,
    item_id=LAND_TEMP,
    region_id=IOWA,
    frequency_id=DAILY,
    source_id=LST,
    start_date='2019-01-06',
    end_date='2019-01-06',
    at_time='2019-01-11'
)

[{'start_date': '2019-01-06T00:00:00.000Z',
  'end_date': '2019-01-06T00:00:00.000Z',
  'value': None,
  'input_unit_id': 36,
  'input_unit_scale': 1,
  'metric_id': 2540047,
  'item_id': 3457,
  'region_id': 13066,
  'frequency_id': 1,
  'unit_id': 36}]
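
Rather than probing dates by hand, the search for the first date on which a point became visible can be automated. The sketch below is one possible approach, not a client feature: `first_available_at` is a hypothetical helper, `fetch` is any callable that takes an `at_time` string and returns a list of points, and `fake_fetch` is a stand-in that mimics the 2019-01-06 behavior above so the example runs without API access.

```python
from datetime import date, timedelta


def first_available_at(fetch, period_end, max_days=14):
    """Return the first at_time (a date) for which fetch returns any points,
    trying period_end, period_end + 1 day, ... up to max_days later.
    Returns None if nothing shows up in that window."""
    for offset in range(max_days + 1):
        at_time = period_end + timedelta(days=offset)
        if fetch(at_time.isoformat()):
            return at_time
    return None


def fake_fetch(at_time):
    """Stand-in for the API: the point becomes visible on 2019-01-11."""
    return [{'value': None}] if at_time >= '2019-01-11' else []


print(first_available_at(fake_fetch, date(2019, 1, 6)))  # 2019-01-11
```

With the real client, `fetch` could be a small function that calls `client.get_data_points()` with the same selection as above and passes its argument as `at_time`. Each probed date is then a separate request.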

## Example #2: Gro's Yield Model Intra-season

Another common use-case for the at-time query is for predictive models:

Using the at-time query, one can input a date to see what the latest prediction up to that point in time was.

Note that source lag, as described above, does not apply to forecasts, since forecasts are made before the period's end date. Additionally, there can be many forecasts of the same value, which a single "lag" value would not represent well.

For example, below we simulate at three points in 2017 what the Gro Yield Model forecasted the ultimate 2017 yield to be:

In [8]:
dates_of_interest = ['2017-01-01', '2017-05-18', '2017-09-01']
for date in dates_of_interest:
    data_points = client.get_data_points(
        metric_id=170037,
        item_id=274, # Corn
        region_id=1215, # United States
        frequency_id=9, # Annual
        source_id=32, # Gro Yield Model
        start_date='2017-01-01',
        end_date='2017-12-31',
        at_time=date
    )
    if not data_points:
        print("On {}, there was no Gro Yield Model prediction yet.".format(date))
    else:
        print("On {}, the latest Gro Yield Model prediction was: {}".format(
            date, data_points[-1]['value']))

On 2017-01-01, there was no Gro Yield Model prediction yet.
On 2017-05-18, the latest Gro Yield Model prediction was: 10.582933064570682
On 2017-09-01, the latest Gro Yield Model prediction was: 10.769001568304368
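
Once the as-of values are collected, it is straightforward to see how the forecast moved between simulation dates. The following is a minimal sketch using only the values printed above; `forecast_revisions` is a hypothetical helper, and `None` marks a date on which no prediction existed yet.

```python
def forecast_revisions(forecasts):
    """Return (as_of_date, value, change_since_previous_forecast) tuples,
    in date order, skipping dates that had no forecast."""
    rows = []
    previous = None
    for as_of_date in sorted(forecasts):  # ISO date strings sort chronologically
        value = forecasts[as_of_date]
        if value is None:
            continue
        change = None if previous is None else value - previous
        rows.append((as_of_date, value, change))
        previous = value
    return rows


# Values copied from the output above
forecasts = {
    '2017-01-01': None,
    '2017-05-18': 10.582933064570682,
    '2017-09-01': 10.769001568304368,
}

for as_of_date, value, change in forecast_revisions(forecasts):
    print(as_of_date, value, change)
```

The same loop could subtract the eventually reported yield from each value to measure accuracy over the season; that figure is not shown here, so it is left out of the sketch.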
