# 4.1 - Macrobond Data API for Python - Revision History

*This notebook uses the Macrobond Data API for Python to query one or more time series in a point-in-time fashion. Revision history is the feature that lets you request the values a time series had before the source revised them. The vintageTimeStamp is the point-in-time stamp as of which the series was known to have the retrieved values; it corresponds to the moment Macrobond made the data available in its database.*

This notebook aims to provide examples of how to use methods from Macrobond Data API for Python to work with Revision History.

We will cover a range of calls, from checking whether a time series carries Revision History to requesting the full array of historical revisions.

You can find a full description of all methods and parameters used in the examples in the [documentation of the API](https://macrobond.github.io/macrobond-data-api/common/api.html).

*The examples use the common functions of the Macrobond API. Full error handling is omitted for brevity.*

***

## Importing packages

In [0]:
import macrobond_data_api as mda
from macrobond_data_api.web import WebClient
from macrobond_data_api.common.types import RevisionHistoryRequest

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

***

## Get the data - get_revision_info

get_revision_info provides high-level information on whether a time series stores Revision History and whether it has been revised yet. For series carrying Revision History, the output also lists the vintageTimeStamps.

In the example below, we are using time series uslama4760:

> `United States, Employment, Payroll, Nonfarm, Payroll, Total (1-Month Net Change), SA`

In [0]:
data_frame = mda.get_revision_info("uslama4760")[0].to_pd_data_frame()
data_frame.groupby(data_frame.columns.to_list()[:-2])["vintage_time_stamps"].agg(list).reset_index()

Our time series is enabled to store Revision History and has already been revised.
Let's now look at the available vintages. Note that we only display the last 5 here.

In [0]:
data_frame[["vintage_time_stamps"]].tail()
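Once the vintage time stamps are in a pandas column, the spacing between releases is a short computation. The sketch below uses hypothetical time stamps in place of the real `vintage_time_stamps` column, so it runs without an API connection; the values are illustrative only.

```python
import pandas as pd

# Hypothetical vintage time stamps standing in for the "vintage_time_stamps" column.
vintages = pd.Series(
    pd.to_datetime(
        ["2022-10-07 12:30", "2022-11-04 12:30", "2022-12-02 13:30", "2023-01-06 13:30"],
        utc=True,
    ),
    name="vintage_time_stamps",
)

# Time elapsed between consecutive vintages; the first vintage has no predecessor.
gaps = vintages.sort_values().diff().dropna()
print(gaps.dt.days.tolist())
```

The same two lines should apply to the real column, assuming it holds datetime values.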

***

## Get the data - get_observation_history

get_observation_history returns the successive values that a single observation, identified by its 'observationDate', has had over time. Each historical revision comes with the date it was integrated, its 'timeStamp'.

In [0]:
history_observation = mda.get_observation_history("uslama4760", pd.to_datetime("2022-07-01"))[0].to_pd_data_frame()
history_observation

In [0]:
print(
    "Observation date:",
    mda.get_observation_history("uslama4760", pd.to_datetime("2022-07-01"))[0].to_dict()["ObservationDate"],
)

Let's measure the time difference from the first publication to the last revision:

In [0]:
history_observation.index = pd.to_datetime(history_observation.index)
history_observation.index.max() - history_observation.index.min()
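Besides the elapsed time, the size of the revisions is often of interest. The following is a minimal sketch on hypothetical numbers: the frame mimics an observation history indexed by time stamp, and the column name `value` is an assumption that may differ from what `to_pd_data_frame()` returns.

```python
import pandas as pd

# Hypothetical history of one observation: index = time stamps, "value" = assumed column name.
history = pd.DataFrame(
    {"value": [528.0, 526.0, 537.0, 568.0]},
    index=pd.to_datetime(["2022-08-05", "2022-09-02", "2022-10-07", "2023-02-03"]),
).sort_index()

# Net revision: latest known value minus the first published value.
net_revision = history["value"].iloc[-1] - history["value"].iloc[0]

# Revision introduced by each successive vintage.
step_revisions = history["value"].diff().dropna()

print(net_revision)
print(step_revisions)
```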

***

## Get the data - get_nth_release

get_nth_release retrieves the nth change of value of the requested time series; the example below uses its single-series variant, get_one_nth_release.
Now that we have pulled the revisions of a time series for a fixed observation with the get_observation_history method, we can isolate the nth revision for every observation. This is particularly useful to observe patterns of revisions from a single source or for a single concept across regions.

In [0]:
# We are fetching here the first revision (n=1) for each observation of our time series. You can set n=0 for unrevised data i.e. initial releases.
nth_release = pd.DataFrame(
    mda.get_one_nth_release(1, "uslama4760", include_times_of_change=True).to_dict(), columns=["Dates", "Values"]
)
nth_release

We can add timesOfChange to get the time each value was last modified. As we are fetching the first revision of each observation, bear in mind that the very first vintage recorded for this series dates from 2016-03-31. No revision was recorded before then, so we can expect neither timesOfChange nor values prior to this date.

In [0]:
time_of_change = pd.DataFrame(mda.get_one_nth_release(1, "uslama4760", include_times_of_change=True).values_metadata)
nth_release = pd.merge(nth_release, time_of_change, left_index=True, right_index=True)
nth_release
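To study revision patterns, one option is to fetch both the initial release (n=0) and the first revision (n=1) and compare them per observation. The sketch below shows only the comparison step, on two small hypothetical frames that reuse the `Dates` and `Values` column names from above; the numbers are made up for illustration.

```python
import pandas as pd

dates = pd.to_datetime(["2022-05-01", "2022-06-01", "2022-07-01"])

# Hypothetical stand-ins for the frames built from n=0 and n=1.
initial_release = pd.DataFrame({"Dates": dates, "Values": [390.0, 372.0, 528.0]})
first_revision = pd.DataFrame({"Dates": dates, "Values": [384.0, 398.0, 526.0]})

# Align both releases on the observation date and compute the revision.
comparison = initial_release.merge(
    first_revision, on="Dates", suffixes=("_initial", "_first_revision")
)
comparison["Revision"] = comparison["Values_first_revision"] - comparison["Values_initial"]

# Average absolute size of the first revision.
mean_abs_revision = comparison["Revision"].abs().mean()
print(comparison)
```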

***

## Get the data - get_one_vintage_series

In [0]:
from dateutil.tz import tzutc
from datetime import datetime

vintage_data = mda.get_one_vintage_series(
    datetime(2023, 1, 6, 13, 30, tzinfo=tzutc()), "uslama4760", include_times_of_change=True
)

vintage_series = pd.DataFrame(vintage_data.to_dict(), columns=["Dates", "Values"])
time_of_change = pd.DataFrame(vintage_data.values_metadata)
vintage_series = pd.merge(vintage_series, time_of_change, left_index=True, right_index=True)
vintage_series

In [0]:
# Reuse the object fetched above instead of requesting the same vintage again.
print("Observation as of", vintage_data.revision_time_stamp)
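A point-in-time lookup generally means "the latest vintage known at or before a given moment". The sketch below illustrates that idea on a local, hypothetical list of vintage time stamps, such as the one get_revision_info returns. It is a conceptual illustration only, not a description of how the API resolves the requested time; `vintage_as_of` is a helper name introduced here.

```python
import pandas as pd

# Hypothetical, sorted vintage time stamps.
vintage_stamps = pd.to_datetime(
    ["2022-11-04 12:30", "2022-12-02 13:30", "2023-01-06 13:30", "2023-02-03 13:30"],
    utc=True,
)


def vintage_as_of(stamps, as_of):
    """Return the latest stamp that is <= as_of, or None if there is none."""
    # side="right" places as_of after any equal stamp, so an exact match is kept.
    position = stamps.searchsorted(as_of, side="right")
    return stamps[position - 1] if position > 0 else None


selected = vintage_as_of(vintage_stamps, pd.Timestamp("2023-01-20", tz="UTC"))
print(selected)
```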

***

## Get the data - get_all_vintage_series

get_all_vintage_series retrieves the full array of revisions over time for the requested time series.
We can build a dataframe of such revisions. 

In [0]:
mda.get_all_vintage_series("gbcpi")

***

## Get the data - get_many_series_with_revisions

With get_many_series_with_revisions, we can request partial updates and stay in sync with Macrobond's database.

A typical workflow starts with downloading the full array of revisions for your universe. Once you have built the history, you can keep your database in sync with ours by requesting incremental updates only, i.e. new vintages, using the time stamps returned by the previous download.

Let's take the example of time series `depric1312` where:

> `LastModifiedTimeStamp` = 2021-11-15T00:00:50+00:00

> `LastRevisionTimeStamp` = 2021-10-28T15:53:57.1091028Z
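Rather than typing the date components by hand, the two time stamps can be parsed from their string form. This is a minimal sketch using pandas; whether the request expects timezone-aware or naive datetime objects is not covered here, so treat the choice as an assumption. The second stamp is floored to microseconds because a Python datetime holds no finer precision.

```python
import pandas as pd

# Time stamps as shown above, parsed into timezone-aware datetime objects.
last_modified = pd.Timestamp("2021-11-15T00:00:50+00:00").to_pydatetime()

# Seven fractional digits exceed datetime's microsecond precision, so floor first.
last_revision = pd.Timestamp("2021-10-28T15:53:57.1091028Z").floor("us").to_pydatetime()

print(last_modified.isoformat())
print(last_revision.isoformat())
```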

In [0]:
for i in mda.get_many_series_with_revisions(
    [
        RevisionHistoryRequest(
            "depric1312",
            if_modified_since=datetime(2021, 11, 15, 00, 00, 50),
            last_revision=datetime(2021, 10, 28, 15, 53, 57),
        )
    ]
):
    if i.status_code:
        print(
            "Let's first check the type of response received. If you have inserted time stamps in the payload, a partial update with status PARTIAL_CONTENT will most likely be returned:"
        )
        print(i.status_code, i.error_text)
        dataframe_all = pd.DataFrame(i.vintages)

In [0]:
dataframe_all

In [0]:
dataframe_all.apply(pd.Series.explode).pivot_table(index="dates", columns="vintage_time_stamp", values="values")
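The pivoted table above has one row per observation date and one column per vintage. From that layout, the first published value of each observation is the first non-missing entry in its row. The sketch below rebuilds a small table of the same shape from hypothetical data, reusing the column names `dates`, `vintage_time_stamp` and `values`, and extracts the first release next to the latest vintage.

```python
import pandas as pd

# Hypothetical long-format revisions: one row per (observation date, vintage).
long_format = pd.DataFrame(
    {
        "dates": pd.to_datetime(
            ["2021-08-01", "2021-08-01", "2021-08-01", "2021-09-01", "2021-09-01", "2021-10-01"]
        ),
        "vintage_time_stamp": pd.to_datetime(
            ["2021-08-30", "2021-09-29", "2021-10-28", "2021-09-29", "2021-10-28", "2021-10-28"]
        ),
        "values": [3.9, 3.9, 3.8, 4.1, 4.1, 4.5],
    }
)

# Rows = observation dates, columns = vintages (sorted chronologically).
triangle = long_format.pivot_table(index="dates", columns="vintage_time_stamp", values="values")

# First release: back-fill along each row, then take the first column.
first_release = triangle.bfill(axis=1).iloc[:, 0]

# Latest vintage: the last column holds the most recent view of every observation.
latest = triangle.iloc[:, -1]

print(pd.DataFrame({"first_release": first_release, "latest": latest}))
```

Back-filling along the row works here because an observation is missing only from vintages that predate its first publication.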