# 3.1 - Macrobond Data API for Python - Aligning multiple Time Series

*Using Macrobond's web API features to align various time series on a single calendar, frequency or currency and deal with missing values when observations do not all carry the same frequency.*

This notebook provides examples of how to use the Macrobond Data API for Python, along with insights into the methodologies used to align time series for analysis.

We will focus on the get_unified_series method, which helps you do the necessary preparatory work before running your analysis or model.

You can find a full description of all methods and parameters used in the examples in the [documentation of the API](https://macrobond.github.io/macrobond-data-api/common/api.html).

*The examples use the common functions of the Macrobond API. Full error handling is omitted for brevity.*

***

## Importing packages

In [1]:
import macrobond_data_api as mda
from macrobond_data_api.common.types import StartOrEndPoint
from macrobond_data_api.common.types import SeriesEntry
from macrobond_data_api.common.enums import SeriesFrequency
from macrobond_data_api.common.enums import CalendarMergeMode
from macrobond_data_api.common.enums import SeriesMissingValueMethod

import statsmodels.api as statsmodels_api
from sklearn import linear_model

***

## Get the data - fetchunifiedseries

This example uses the following time series:
* cyinea0001 - Cyprus, Earnings, Wage Growth, Nominal
* cypric0014 - Cyprus, Consumer Price Index, Miscellaneous Goods & Services, Index
* cytour0076 - Cyprus, Income, Revenue, Total, EUR
* un_myos_cy_total - Cyprus, Human Development, Education, Mean Years of Schooling

We want to look at data from Cyprus and conduct a multiple regression analysis further down the notebook. Our dataset has the following features:

* Our dependent variable is nominal wage growth, which has an inception date of 1960 and is collected from the Cyprus Statistical Service (CYSTAT). Its frequency is annual.
* Our first independent variable is the Consumer Price Index for Miscellaneous Goods & Services, which has an inception date of 2000 and is also collected from CYSTAT. Its frequency is monthly.
* Our second independent variable is Income, Total Revenue from foreign tourism (EUR), which has an inception date of 2001 and is collected from CYSTAT. Its frequency is monthly.
* Our final independent variable is Education, Mean Years of Schooling, which is collected from the United Nations Development Programme (UNDP) and has an inception date of 1990. Its frequency is annual.

We can see immediately that the series in this data set cover different time spans and have different frequencies, and that the tourism revenue series is denominated in EUR. To make the data comparable, we will use the 'fetchunifiedseries' endpoint (a POST request), which is exposed in Python as the get_unified_series method. Let's see what each of its parameters means and how it shapes our data.
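To make the frequency and time-span mismatch concrete, here is a minimal pure-Python sketch of the two steps involved: converting a monthly series to annual, and keeping only the years present in every series. The helper names `to_annual` and `align_on_common_years` are hypothetical, the numbers are made up, and averaging the monthly values is an assumption chosen for illustration; it is not a description of how Macrobond converts frequencies.

```python
from collections import defaultdict
from statistics import mean


def to_annual(monthly):
    """Average (year, month) -> value observations into year -> value.

    Averaging is an illustrative assumption, not Macrobond's conversion rule.
    """
    by_year = defaultdict(list)
    for (year, _month), value in monthly.items():
        by_year[year].append(value)
    return {year: mean(values) for year, values in by_year.items()}


def align_on_common_years(*series):
    """Keep only the years that appear in every annual series."""
    common = set(series[0])
    for s in series[1:]:
        common &= set(s)
    return {year: [s[year] for s in series] for year in sorted(common)}


# Made-up illustrative numbers, not Macrobond data.
annual_wages = {1999: 3.0, 2000: 4.0, 2001: 5.0, 2002: 6.0}
monthly_cpi = {(2000, m): 100.0 + m for m in range(1, 13)}
monthly_cpi.update({(2001, m): 110.0 + m for m in range(1, 13)})

aligned = align_on_common_years(annual_wages, to_annual(monthly_cpi))
print(aligned)
```

Only 2000 and 2001 survive, because those are the only years covered by both series. The unified series request performs this kind of alignment on the server, so none of this code is needed when using the API.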

In the code below, we harmonise the frequency to annual, convert the currency to USD, and set the start and end points to the range in which all series have data.

In [2]:
data_frame = mda.get_unified_series(
    SeriesEntry(missing_value_method=SeriesMissingValueMethod.NONE, name="cyinea0001"),
    SeriesEntry(missing_value_method=SeriesMissingValueMethod.NONE, name="cypric0014"),
    SeriesEntry(missing_value_method=SeriesMissingValueMethod.NONE, name="cytour0076"),
    SeriesEntry(missing_value_method=SeriesMissingValueMethod.NONE, name="un_myos_cy_total"),
    frequency=SeriesFrequency.ANNUAL,
    currency="USD",
    calendar_merge_mode=CalendarMergeMode.AVAILABLE_IN_ALL,
    start_point=StartOrEndPoint.data_in_all_series(),
    end_point=StartOrEndPoint.data_in_all_series(),
).to_pd_data_frame()
data_frame.columns = [
    "Date",
    "Wage Growth",
    "CPI",
    "Income from Foreign Tourism",
    "Mean Years of Schooling",
]
data_frame
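The call above passes SeriesMissingValueMethod.NONE for every series, which, as the name suggests, leaves gaps unfilled. For comparison, the sketch below shows two generic gap-filling strategies, carrying the previous value forward and linear interpolation, in plain Python. The function names `fill_previous` and `fill_linear` are hypothetical and the data is made up; this illustrates the general techniques only and is not the library's implementation.

```python
def fill_previous(values):
    """Replace each None with the most recent earlier observation.

    Leading gaps stay None because there is nothing to carry forward.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def fill_linear(values):
    """Linearly interpolate runs of None between two known observations.

    Leading and trailing gaps stay None because they have only one neighbour.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Made-up observations with two interior gaps and one trailing gap.
observations = [10.0, None, None, 16.0, None]
print(fill_previous(observations))  # [10.0, 10.0, 10.0, 16.0, 16.0]
print(fill_linear(observations))    # [10.0, 12.0, 14.0, 16.0, None]
```

Which strategy is appropriate depends on the series: carrying a value forward suits levels that change rarely, while interpolation assumes a smooth path between observations.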

***

## Multiple Regression Analysis

Now that we have all the variables in a single data frame, we will use the linear_model module from the sklearn package to build our model. Let us start by defining our variables.

In [3]:
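# Independent variables (features) and dependent variable (target)
x = data_frame[["CPI", "Income from Foreign Tourism", "Mean Years of Schooling"]].values
y = data_frame["Wage Growth"]

# Fit the model with scikit-learn; this fitted model is used for the forecast
regr = linear_model.LinearRegression()
regr.fit(x, y)

# Refit with statsmodels to get a full regression summary;
# add_constant adds the intercept column that OLS does not include by default
x_with_const = statsmodels_api.add_constant(x)
ols_results = statsmodels_api.OLS(y, x_with_const).fit()
ols_results.summary()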
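Both fits above estimate the same ordinary least squares model: a column of ones is added to the features so that an intercept is estimated alongside the slopes. As a rough illustration of what happens underneath, the sketch below solves the normal equations (X'X)b = X'y in plain Python. The helpers `solve` and `ols` are hypothetical names and the data is made up; production libraries use more numerically robust methods than this.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Return [intercept, b1, b2, ...] minimising the sum of squared residuals."""
    design = [[1.0] + list(row) for row in rows]  # prepend the constant column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Made-up data generated from y = 1 + 2*x1 + 3*x2, so the fit should recover
# an intercept close to 1 and slopes close to 2 and 3.
rows = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y_values = [1.0 + 2.0 * x1 + 3.0 * x2 for x1, x2 in rows]
coefficients = ols(rows, y_values)
print([round(c, 6) for c in coefficients])
```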

In [4]:
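# Feature order must match the columns used for fitting:
# CPI, income from foreign tourism, mean years of schooling
CYP_Wage_Growth = regr.predict([[100.010000, 2.994805e09, 12.1712]])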
print("Cyprus Wage Growth Forecast")
print(CYP_Wage_Growth)
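For a linear model, the forecast is simply the intercept plus the dot product of the fitted coefficients and the feature values. The sketch below shows that arithmetic with a hypothetical `predict` helper. The intercept and coefficients are invented for illustration and are not the values fitted in this notebook; only the feature values are taken from the cell above.

```python
def predict(intercept, coefficients, features):
    """Linear prediction: intercept plus the dot product of coefficients and features."""
    return intercept + sum(c * f for c, f in zip(coefficients, features))


# Hypothetical coefficients for illustration only; they are NOT the fitted
# values from the regression in this notebook.
intercept = 0.5
coefficients = [0.02, 1.0e-9, 0.1]

# Same feature order as in the fit: CPI, tourism income, years of schooling.
features = [100.01, 2.994805e09, 12.1712]

forecast = predict(intercept, coefficients, features)
print(forecast)
```

Note how the tourism revenue value is many orders of magnitude larger than the other two features, so its coefficient has to be correspondingly small for its contribution to be comparable.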

***

## Conclusion

Here we can see how the get_unified_series method eases workflows: you simply query the data needed for the model, apply the transformations and visualise the results, rather than writing one-off mathematical transformations from scratch. Not only does this feature save a lot of time in the preparatory work, it also increases consistency across the various time series and models built on Macrobond data.