# 4.1 - Macrobond Data API for Python - Revision History

*Using the Macrobond Data API for Python to query one or more time series in a point-in-time fashion. Revision history is the feature that lets you request the original values of a time series, as they were before the source revised them. The vintageTimeStamp is the point-in-time stamp as of which the series was known to have the retrieved values. This time stamp corresponds to when Macrobond made the data available in its database.*

This notebook aims to provide examples of how to use methods from Macrobond Data API for Python to work with Revision History.

The examples range from checking whether a time series carries Revision History to requesting the full array of historical revisions.

You can find a full description of all methods and parameters used in the examples in the [documentation of the API](https://macrobond.github.io/macrobond-data-api/common/api.html).

*The examples use the common functions of the Macrobond API. Full error handling is omitted for brevity.*

***

## Importing packages

In [0]:
import macrobond_data_api as mda
from macrobond_data_api.common.types import RevisionHistoryRequest

import pandas as pd
from datetime import datetime, timezone

***

## Get the data - get_revision_info

get_revision_info provides high-level information on whether a time series stores Revision History and whether it has been revised yet. For series carrying Revision History, the output also lists the vintageTimeStamps.

In the example below, we are using time series uslama4760:

> `United States, Employment, Payroll, Nonfarm, Payroll, Total (1-Month Net Change), SA`

In [None]:
data_frame = mda.get_revision_info("uslama4760")[0].to_pd_data_frame()
data_frame.groupby(data_frame.columns.to_list()[:-2])["vintage_time_stamps"].agg(
    list
).reset_index()

This time series stores Revision History and has already been revised.
Let's now look at the available vintages. Note that we print only the last 5 values here.

In [None]:
data_frame[["vintage_time_stamps"]].tail()
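
To make the role of the vintage time stamps concrete, here is a minimal sketch of how one might pick the vintage that was in effect at a given moment. The time stamps are made up for illustration, and `vintage_in_effect` is a hypothetical helper, not part of the Macrobond API.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Made-up vintage time stamps, sorted ascending (illustrative only).
vintages = [
    datetime(2022, 8, 5, 12, 30, tzinfo=timezone.utc),
    datetime(2022, 9, 2, 12, 30, tzinfo=timezone.utc),
    datetime(2022, 10, 7, 12, 30, tzinfo=timezone.utc),
]


def vintage_in_effect(sorted_vintages, as_of):
    """Return the latest vintage time stamp at or before `as_of`, or None."""
    position = bisect_right(sorted_vintages, as_of)
    return sorted_vintages[position - 1] if position else None


as_of = datetime(2022, 9, 15, tzinfo=timezone.utc)
print(vintage_in_effect(vintages, as_of))
```

The helper returns `None` when `as_of` precedes the first recorded vintage, which corresponds to a date for which no point-in-time view exists yet.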

***

## Get the data - get_observation_history

get_observation_history returns the successive values that a single observation, identified by its observationDate, has taken over time. Each historical value comes with the time stamp (timeStamps) at which it was integrated.

In [None]:
history_observation = mda.get_observation_history(
    "uslama4760", pd.to_datetime("2022-07-01")
)[0].to_pd_data_frame()
history_observation

In [None]:
print(
    "Observation date:",
    mda.get_observation_history("uslama4760", pd.to_datetime("2022-07-01"))[
        0
    ].to_dict()["ObservationDate"],
)

Let's measure the time difference from the first publication to the last revision:

In [None]:
history_observation.index = pd.to_datetime(history_observation.index)
history_observation.index.max() - history_observation.index.min()
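
Beyond the elapsed time, the same history can be condensed into a few summary figures. The sketch below works on made-up (time stamp, value) pairs rather than on the API response, so the numbers are purely illustrative and the structure is an assumption.

```python
from datetime import datetime, timezone

# Made-up revision history of one observation: (time stamp, value).
history = [
    (datetime(2022, 8, 5, 12, 30, tzinfo=timezone.utc), 100.0),
    (datetime(2022, 9, 2, 12, 30, tzinfo=timezone.utc), 104.0),
    (datetime(2022, 10, 7, 12, 30, tzinfo=timezone.utc), 101.5),
]

# Order by time stamp so the first entry is the initial publication.
history.sort(key=lambda item: item[0])
first_stamp, first_value = history[0]
last_stamp, last_value = history[-1]

summary = {
    "number_of_revisions": len(history) - 1,  # entries after the initial release
    "elapsed": last_stamp - first_stamp,  # first publication to last revision
    "cumulative_revision": last_value - first_value,  # net change in value
}
print(summary)
```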

***

## Get the data - get_nth_release

get_nth_release retrieves the nth change of each observation of the requested time series. This can be particularly useful for observing revision patterns from a single source, or for a single concept across various regions. Setting n to 0 returns the unrevised data, i.e. the initial releases.

In [None]:
# Here we fetch the first revision (n=1) of each observation of our time series.
# Set n=0 for unrevised data, i.e. initial releases.
nth_release = pd.DataFrame(
    mda.get_nth_release(1, ["uslama4760"], include_times_of_change=True)[0].to_dict(),
    columns=["Dates", "Values"],
)
nth_release

We can add timesOfChange to get the time at which each value was modified. Since we are fetching the first revision of each observation, bear in mind that the very first vintage recorded for this series dates from 2016-03-31. No revision was recorded before that, so we can expect neither timesOfChange nor values prior to this date.

In [None]:
time_of_change = pd.DataFrame(
    mda.get_nth_release(1, ["uslama4760"], include_times_of_change=True)[
        0
    ].values_metadata
)  # values_metadata holds the time of change (revision time stamp) of each underlying value
nth_release = pd.merge(nth_release, time_of_change, left_index=True, right_index=True)
nth_release
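
The idea behind an nth release can be illustrated without the API. The sketch below is a conceptual model only: the data are made up, `nth_release_of` is a hypothetical helper, and `None` stands in for a missing value. It does not claim to reproduce the exact behaviour of get_nth_release.

```python
# Made-up data: each observation date maps to the values published for it,
# in order of release (index 0 = initial release, index 1 = first revision).
releases = {
    "2022-05-01": [10.0, 11.0, 11.5],
    "2022-06-01": [20.0, 19.0],
    "2022-07-01": [30.0],
}


def nth_release_of(releases_by_observation, n):
    """Return the n-th release per observation, or None if it does not exist."""
    return {
        date: values[n] if n < len(values) else None
        for date, values in releases_by_observation.items()
    }


print(nth_release_of(releases, 0))  # initial releases
print(nth_release_of(releases, 1))  # first revisions; None where not yet revised
```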

***

## Get the data - get_vintage_series

get_vintage_series retrieves a time series as it was known at a given point in time.

In [None]:
from dateutil.tz import tzutc

vintage_data = mda.get_vintage_series(
    datetime(2023, 1, 6, 13, 30, tzinfo=tzutc()),
    ["uslama4760"],
    include_times_of_change=True,
)[0]

vintage_series = pd.DataFrame(vintage_data.to_dict(), columns=["Dates", "Values"])
time_of_change = pd.DataFrame(vintage_data.values_metadata)
vintage_series = pd.merge(
    vintage_series, time_of_change, left_index=True, right_index=True
)
vintage_series

In [None]:
print(
    "Observation as of",
    mda.get_vintage_series(
        datetime(2023, 1, 6, 13, 30, tzinfo=tzutc()),
        ["uslama4760"],
        include_times_of_change=True,
    )[0].revision_time_stamp,
)
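
Conceptually, a vintage series is the set of values that had been published at or before the requested time. The following sketch rebuilds such a point-in-time view from made-up revision records; the record layout and the helper `series_as_of` are assumptions for illustration, not the API's internal representation.

```python
from datetime import datetime, timezone

# Made-up revision records: (observation date, time of change, value).
records = [
    ("2022-11-01", datetime(2022, 12, 2, 13, 30, tzinfo=timezone.utc), 1.0),
    ("2022-11-01", datetime(2023, 1, 6, 13, 30, tzinfo=timezone.utc), 1.2),
    ("2022-12-01", datetime(2023, 1, 6, 13, 30, tzinfo=timezone.utc), 2.0),
    ("2022-12-01", datetime(2023, 2, 3, 13, 30, tzinfo=timezone.utc), 2.3),
]


def series_as_of(revision_records, as_of):
    """Keep, per observation, the latest value published at or before `as_of`."""
    latest = {}
    # Process records in chronological order so later changes overwrite earlier ones.
    for observation, changed_at, value in sorted(revision_records, key=lambda r: r[1]):
        if changed_at <= as_of:
            latest[observation] = value
    return latest


print(series_as_of(records, datetime(2023, 1, 6, 13, 30, tzinfo=timezone.utc)))
```

Moving `as_of` earlier drops both later revisions and observations that had not been published yet.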

***

## Get the data - get_all_vintage_series

get_all_vintage_series retrieves the full array of revisions over time for the requested time series.
We can build a dataframe of such revisions. 

In [None]:
mda.get_all_vintage_series("gbcpi")
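
Once all vintages are available, a natural next step is to compare two of them and see which observations changed. The sketch below does this on made-up data. The field names `vintage_time_stamp`, `dates` and `values` mirror the column names used later in this notebook, but the dictionaries themselves are an assumed, simplified stand-in for the API objects.

```python
# Made-up vintages, oldest first (illustrative values only).
vintages = [
    {
        "vintage_time_stamp": "2022-12-02",
        "dates": ["2022-10-01", "2022-11-01"],
        "values": [5.0, 6.0],
    },
    {
        "vintage_time_stamp": "2023-01-06",
        "dates": ["2022-10-01", "2022-11-01", "2022-12-01"],
        "values": [5.0, 6.4, 7.0],
    },
]


def revised_observations(older, newer):
    """Map each observation revised between two vintages to (old, new)."""
    old = dict(zip(older["dates"], older["values"]))
    new = dict(zip(newer["dates"], newer["values"]))
    return {
        date: (old[date], new[date])
        for date in old
        if date in new and old[date] != new[date]
    }


print(revised_observations(vintages[0], vintages[1]))
```

Observations that appear only in the newer vintage are new releases rather than revisions, so the helper leaves them out.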

***

## Get the data - get_many_series_with_revisions

With get_many_series_with_revisions, we can request incremental updates and keep in sync with Macrobond's database.

A typical workflow starts with downloading the full array of revisions for your universe. Once you have built the history, you can keep your database in sync with ours by requesting only incremental updates, i.e. new vintages, using the time stamps returned by the previous download.

Let's take the example of time series `depric1312` where:

> `if_modified_since` = 2021-11-15T00:00:50+00:00

> `last_revision` = 2021-10-28T15:53:57.1091028Z

In [None]:
for i in mda.get_many_series_with_revisions(
    [
        RevisionHistoryRequest(
            "depric1312",
            if_modified_since=datetime(2021, 11, 15, 0, 0, 50, tzinfo=timezone.utc),
            last_revision=datetime(2021, 10, 28, 15, 53, 57, tzinfo=timezone.utc),
        )
    ]
):
    if i.status_code:
        print(
            "PARTIAL_CONTENT will be returned if last_revision is being specified and earlier than current LastRevisionTimeStamp:"
        )
        print(i.status_code, i.error_text)
        dataframe_all = pd.DataFrame(i.vintages)

In [None]:
dataframe_all

In [None]:
dataframe_all.explode(["dates", "values"]).pivot_table(
    index="dates", columns="vintage_time_stamp", values="values"
)

Please note that if the underlying time series has LastRevisionAdjustmentTimeStamp metadata, the parameter last_revision_adjustment must be set to the LastRevisionAdjustmentTimeStamp from the previous response, and last_revision should not be earlier than LastRevisionAdjustmentTimeStamp, in order to get a partial response.
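
One way to apply this rule is to store the time stamps from the previous download and derive the next request's parameters from them. The sketch below is a minimal illustration: the `state` dictionary, its keys and the helper `incremental_request_kwargs` are hypothetical, and only the parameter names `if_modified_since`, `last_revision` and `last_revision_adjustment` come from this notebook.

```python
from datetime import datetime, timezone


def incremental_request_kwargs(state):
    """Build keyword arguments for the next incremental request from stored state."""
    kwargs = {
        "if_modified_since": state["last_modified"],
        "last_revision": state["last_revision"],
    }
    adjustment = state.get("last_revision_adjustment")
    if adjustment is not None:
        # last_revision should not be earlier than the adjustment time stamp.
        if state["last_revision"] < adjustment:
            raise ValueError("last_revision is earlier than last_revision_adjustment")
        kwargs["last_revision_adjustment"] = adjustment
    return kwargs


# Time stamps saved from a previous download (hypothetical storage layout).
state = {
    "last_modified": datetime(2021, 11, 15, 0, 0, 50, tzinfo=timezone.utc),
    "last_revision": datetime(2021, 10, 28, 15, 53, 57, tzinfo=timezone.utc),
}
print(incremental_request_kwargs(state))
```

The resulting dictionary could then be unpacked into the request, for example `RevisionHistoryRequest("depric1312", **kwargs)`, assuming the constructor accepts `last_revision_adjustment` as a keyword in the same way it accepts the other two.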