# 3.1 - Macrobond web API - Aligning multiple Time Series

*Using Macrobond's web API features to align various time series on a single calendar, frequency or currency and deal with missing values when observations do not all carry the same frequency.*

This notebook aims to provide examples of how to use Macrobond's web API call methods as well as insights on the various methodologies used to align our time series for analysis.

We will focus here on the FetchUnifiedSeries POST call, which helps you do the necessary pre-work before running your analysis or model.

*Full error handling is omitted for brevity*

***

## Importing packages

In [2]:
import statsmodels.api as statsmodels_api
from sklearn import linear_model

from macrobond_financial.common import Credentials
from macrobond_financial.common.enums import SeriesFrequency
from macrobond_financial.common.typs import StartOrEndPoint
from macrobond_financial.web import WebClient

***

## Authentication

If you have a Macrobond web API account, enter your *client_id* and *client_secret* below.

In [3]:
credentials = Credentials()

***

## Get the data - FetchUnifiedSeries

This example uses the following time series:
* cyinea0001 - Cyprus, Earnings, Wage Growth, Nominal
* cypric0014 - Cyprus, Consumer Price Index, Miscellaneous Goods & Services, Index
* cytour0076 - Cyprus, Income, Revenue, Total, EUR
* un_myos_cy_total - Cyprus, Human Development, Education, Mean Years of Schooling

Feel free to refer to https://api.macrobondfinancial.com/swagger/index.html to get the comprehensive list of web API endpoints and parameters used.

We want to look at data from Cyprus and conduct multiple regression analysis further down the notebook. Our dataset has the following features:

* Our dependent variable is nominal wage growth, which has an inception date of 1960 and is collected from the Cyprus Statistical Service (CYSTAT). The frequency is annual.
* Our first independent variable is the Consumer Price Index for Miscellaneous Goods & Services, which has an inception date of 2000 and is also collected from CYSTAT. The frequency is monthly.
* Our second independent variable is Income, Total Revenue from foreign tourism (EUR), which has an inception date of 2001 and is collected from CYSTAT. The frequency is monthly.
* Our final independent variable is Education, Mean Years of Schooling, which is collected from the United Nations Development Programme (UNDP) and has an inception date of 1990. The frequency is annual.

We can see immediately that the series in this data set have different time spans, frequencies and currencies. To make the data comparable, we will use the FetchUnifiedSeries endpoint, which takes a POST request of the form described below. Let's see what each parameter means and how it can transform our data.

***

## A few explanations on our request - Schema definitions
You can refer to the schema definition of this request in Swagger:

![](2021-08-02-10-49-27.png)

The full schema is also detailed below. Default values are in *italics*.


description:	
Request of a list of series converted to the same calendar

**frequency**	integer($int32)
nullable: true
The frequency to convert all series to. The default is to convert to the highest frequency of the series in the request.

1 = Annual

2 = SemiAnnual

3 = QuadMonthly

4 = Quarterly

5 = BiMonthly

6 = Monthly

7 = Weekly

8 = Daily

100 = Lowest

*101 = Highest*

Enum:
[ 1, 2, 3, 4, 5, 6, 7, 8, 100, 101 ]
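
As an illustration of how the *Highest* and *Lowest* codes could resolve for the four series used in this notebook, the sketch below mirrors the enum above in a Python `IntEnum`. The class `Frequency` and the helper `resolve_frequency` are hypothetical names for this example and are not part of the Macrobond library. The sketch also assumes that, among the codes 1 to 8, a larger code means a higher frequency, as the list above suggests.

```python
from enum import IntEnum


class Frequency(IntEnum):
    """Hypothetical mirror of the frequency codes listed in the schema."""
    ANNUAL = 1
    SEMI_ANNUAL = 2
    QUAD_MONTHLY = 3
    QUARTERLY = 4
    BI_MONTHLY = 5
    MONTHLY = 6
    WEEKLY = 7
    DAILY = 8
    LOWEST = 100
    HIGHEST = 101


def resolve_frequency(requested, series_frequencies):
    """Turn LOWEST/HIGHEST into a concrete frequency (illustrative only)."""
    if requested == Frequency.HIGHEST:
        return max(series_frequencies)
    if requested == Frequency.LOWEST:
        return min(series_frequencies)
    return requested


# Native frequencies of the four Cyprus series described in this notebook.
native = [Frequency.ANNUAL, Frequency.MONTHLY, Frequency.MONTHLY, Frequency.ANNUAL]

highest = resolve_frequency(Frequency.HIGHEST, native)
lowest = resolve_frequency(Frequency.LOWEST, native)
print(highest.name, lowest.name)  # MONTHLY ANNUAL
```

With the default of 101, a request for these four series would therefore be aligned on a monthly calendar unless an explicit frequency is given.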

**weekdays**	integer($int32)
nullable: true
The days of the week used for daily series. The default is Monday to Friday.

1 = Sunday (Represents a Sunday)

2 = Monday (Represents a Monday)

4 = Tuesday (Represents a Tuesday)

8 = Wednesday (Represents a Wednesday)

16 = Thursday (Represents a Thursday)

31 = SundayToThursday (Sunday to Thursday daymask, weekend on Friday and Saturday)

32 = Friday (Represents a Friday)

*62 = MondayToFriday (Standard five day week)*

64 = Saturday (Represents a Saturday)

79 = SaturdayToWednesday (Saturday to Wednesday daymask, weekend on Thursday and Friday)

94 = MondayToThursdayAndSaturday (Monday to Thursday and Saturday, weekend on Friday and Sunday)

95 = SaturdayToThursday (Saturday to Thursday, weekend on Friday)

127 = FullWeek (All days of the week)

Enum:
[ 1, 2, 4, 8, 16, 31, 32, 62, 64, 79, 94, 95, 127 ]
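
The weekday codes behave like a bitmask: each single day is a power of two and the named combinations are the sums of their days. The sketch below checks this with a Python `IntFlag`. The class name `Weekdays` is a hypothetical mirror of the schema and is not taken from the Macrobond library.

```python
from enum import IntFlag


class Weekdays(IntFlag):
    """Hypothetical mirror of the single-day codes in the schema."""
    SUNDAY = 1
    MONDAY = 2
    TUESDAY = 4
    WEDNESDAY = 8
    THURSDAY = 16
    FRIDAY = 32
    SATURDAY = 64


monday_to_friday = (
    Weekdays.MONDAY | Weekdays.TUESDAY | Weekdays.WEDNESDAY
    | Weekdays.THURSDAY | Weekdays.FRIDAY
)
sunday_to_thursday = (
    Weekdays.SUNDAY | Weekdays.MONDAY | Weekdays.TUESDAY
    | Weekdays.WEDNESDAY | Weekdays.THURSDAY
)

print(int(monday_to_friday))    # 62
print(int(sunday_to_thursday))  # 31

# Test whether a given day is part of a mask.
saturday_included = bool(monday_to_friday & Weekdays.SATURDAY)
print(saturday_included)  # False
```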

**calendarMergeMode**	integer($int32)
nullable: true
The merge mode determines how the series calendars are used when forming the new shared calendar. The default is to use all observations that are in any calendar.

0 = FullCalendar (Include the full range implied by the frequency and weekday settings)

1 = AvailableInAll (Use points in time that are available in all calendars)

*2 = AvailableInAny (Use points in time that are available in any calendar)*

Enum:
[ 0, 1, 2 ]
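
To make the difference between the merge modes concrete, the sketch below treats each series calendar as a set of dates. This only illustrates the semantics described above, using made-up dates. It is not how the Macrobond server builds its calendars.

```python
from datetime import date

# Two toy calendars: the dates on which each series has an observation.
calendar_a = {date(2021, 1, 4), date(2021, 1, 5), date(2021, 1, 6)}
calendar_b = {date(2021, 1, 5), date(2021, 1, 6), date(2021, 1, 7)}

# AvailableInAll (1): keep only dates present in every calendar.
available_in_all = sorted(calendar_a & calendar_b)

# AvailableInAny (2): keep dates present in at least one calendar.
available_in_any = sorted(calendar_a | calendar_b)

print(len(available_in_all), len(available_in_any))  # 2 4
```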

**currency**	string
nullable: true
The currency to use for currency conversion or omitted for no conversion.

**startDateMode**	integer($int32)
nullable: true
The start date mode determines how the start date is calculated. By default the mode is to start when there is data in any series.

*0 = DataInAnySeries (All the series start or end when there is data in any series)*

1 = DataInAllSeries (All the series start or end when there is data in all series)

Enum:
[ 0, 1 ]
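
Applied to the inception dates quoted for the four series in this notebook, the two start date modes would give very different ranges. The sketch below works at year granularity only, which is a simplification made for the example.

```python
# Inception years of the four series, as stated in this notebook.
inception_year = {
    "cyinea0001": 1960,
    "cypric0014": 2000,
    "cytour0076": 2001,
    "un_myos_cy_total": 1990,
}

# DataInAnySeries (0): start as soon as at least one series has data.
start_any = min(inception_year.values())

# DataInAllSeries (1): start only once every series has data.
start_all = max(inception_year.values())

print(start_any, start_all)  # 1960 2001
```

The DataInAllSeries result is consistent with the unified data frame in this notebook, whose first row is dated 2001.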

**startPoint**	string
nullable: true
The start point. By default, this is determined by the startDateMode. It can be a date in the format yyyy-mm-dd or a number of observations relative to the end of the series.

**endDateMode**	integer($int32)
nullable: true
The end date mode determines how the end date is calculated. By default the mode is to end when there is no data in any series.

*0 = DataInAnySeries (All the series start or end when there is data in any series)*

1 = DataInAllSeries (All the series start or end when there is data in all series)

Enum:
[ 0, 1 ]

**endPoint**	string
nullable: true
The end point. By default, this is determined by the endDateMode. It can be a date in the format yyyy-mm-dd or a number of observations relative to the end of the series.

**seriesEntries**	
nullable: true
The list of series entries that defines the series to request.

UnifiedSeriesEntry
description:	
Request of a list of series converted to the same calendar

**name**	string
nullable: true
The name of the series.

**missingValueMethod**	integer($int32)
nullable: true
The method for filling in missing values. The default is the automatic method.

0 = None (Do not fill in missing values)

*1 = Auto (Determine the method based on the series classification)*

2 = PreviousValue (Use the previous non-missing value)

3 = ZeroValue (Use the value zero)

4 = LinearInterpolation (Do a linear interpolation)

Enum:
[ 0, 1, 2, 3, 4 ]
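
The three explicit fill methods can be pictured with a few lines of plain Python. The functions below are hypothetical helpers written for this illustration, with `None` standing in for a missing observation. The actual server-side behaviour, in particular at the edges of a series, may differ.

```python
def fill_previous(values):
    """PreviousValue: carry the last non-missing value forward."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # leading gaps stay None in this sketch
    return filled


def fill_zero(values):
    """ZeroValue: replace missing values with zero."""
    return [0.0 if value is None else value for value in values]


def fill_linear(values):
    """LinearInterpolation: interpolate between the surrounding known points."""
    filled = list(values)
    known = [i for i, value in enumerate(values) if value is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled  # gaps before the first or after the last known point stay None


series = [1.0, None, None, 4.0, None]  # None marks a missing observation

print(fill_previous(series))  # [1.0, 1.0, 1.0, 4.0, 4.0]
print(fill_zero(series))      # [1.0, 0.0, 0.0, 4.0, 0.0]
print(fill_linear(series))    # [1.0, 2.0, 3.0, 4.0, None]
```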

**toLowerFrequencyMethod**	integer($int32)
nullable: true
*0 = Auto (Determine the method based on the series classification)*

1 = Last (Use last observation in higher frequency when converting to lower frequency)

2 = First (Use first observation in higher frequency when converting to lower frequency)

3 = Flow (Use aggregate of observations in higher frequency when converting to lower frequency)

4 = PercentageChange (Use recalculated percentage changes when converting pp100 series to lower frequency)

5 = Highest (Use highest observation in higher frequency when converting to lower frequency)

6 = Lowest (Use lowest observation in higher frequency when converting to lower frequency)

7 = Average (Use average of observations in higher frequency when converting to lower frequency)

8 = ConditionalPercentageChange (Use recalculated percentage changes when converting pp100 series to lower frequency, but only if it actually has the pp100 attribute)

Enum:
[ 0, 1, 2, 3, 4, 5, 6, 7, 8 ]
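
As a rough picture of what the simpler methods do when twelve monthly observations collapse into one annual value, consider the sketch below. The monthly numbers are invented for the example, and the percentage change methods are left out because their exact calculation is not described here.

```python
from statistics import mean

# Twelve hypothetical monthly observations for one year.
monthly = [10.0, 12.0, 11.0, 13.0, 14.0, 15.0, 13.0, 12.0, 16.0, 17.0, 18.0, 19.0]

# One annual value per method, keyed by the method name from the schema.
annual = {
    "Last": monthly[-1],
    "First": monthly[0],
    "Flow": sum(monthly),
    "Highest": max(monthly),
    "Lowest": min(monthly),
    "Average": mean(monthly),
}

for method, value in annual.items():
    print(f"{method}: {value}")
```

The choice matters for the series in this notebook: a flow such as tourism revenue is naturally summed over the year, whereas an index such as the CPI is more sensibly averaged or taken at period end.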

**toHigherFrequencyMethod**	integer($int32)
nullable: true
*0 = Auto (Determine the method based on the series classification)*

1 = Same (Duplicate the lower frequency value for each of the higher frequency series positions)

2 = Distribute (Distribute the lower frequency value into equal sized parts for each of the higher frequency series positions)

3 = PercentageChange (Distribute the percentage change so that the product of the higher frequency observations - 100, is the same)

4 = LinearInterpolation (Use a linear interpolation between each pair of lower frequency values to fill in each of the higher frequency values)

5 = Pulse (Sets the value for the first observation in the period range and the other values to 'missing')

6 = QuadraticDistribution (Use a quadratic interpolation that optimize the area under the lower frequency values to fill in the higher frequency values)

7 = CubicInterpolation (Use a cubic interpolation that optimize the area under the lower frequency values to fill in the higher frequency values)

8 = ConditionalPercentageChange (Distribute the percentage change so that the product of the higher frequency observations - 100, is the same, but only if pp100 is set on the series)

Enum:
[ 0, 1, 2, 3, 4, 5, 6, 7, 8 ]
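
Going the other way, from one annual value to four quarterly values, the three simplest methods can be sketched as follows. The helper functions and the annual value are hypothetical, and `None` stands in for a missing observation.

```python
def same(value, periods):
    """Same: repeat the low-frequency value in every sub-period."""
    return [value] * periods


def distribute(value, periods):
    """Distribute: split the value into equal parts that sum to the original."""
    return [value / periods] * periods


def pulse(value, periods):
    """Pulse: keep the value in the first sub-period, mark the rest missing."""
    return [value] + [None] * (periods - 1)


annual_value = 120.0  # hypothetical annual observation
quarters = 4

print(same(annual_value, quarters))        # [120.0, 120.0, 120.0, 120.0]
print(distribute(annual_value, quarters))  # [30.0, 30.0, 30.0, 30.0]
print(pulse(annual_value, quarters))       # [120.0, None, None, None]
```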

**partialPeriodsMethod**	integer($int32)
nullable: true
0 = None (Type of partial period method when converting to lower frequency)

*1 = Auto (Determine the method based on the series meta data)*

2 = RepeatLastValue (Fill up the partial period by repeating the last value)

3 = FlowCurrentSum (Fill up the partial period with the average of the incomplete period)

4 = PastRateOfChange (Use the rate of change from the previous year to extend the partial period)

5 = Zero (Fill up the partial period with zeroes)

Enum:
[ 0, 1, 2, 3, 4, 5 ]
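
Putting the schema together, a request body for the four series in this notebook might look like the sketch below. The field names and codes are taken from the schema above. Whether the Python client used in this notebook sends exactly this body is an assumption, and the endpoint URL, authentication and the HTTP call itself are deliberately left out.

```python
import json

# Field names follow the schema above; the values are chosen to match the
# annual, USD, data-in-all-series request used in this notebook.
payload = {
    "frequency": 1,          # Annual
    "currency": "USD",
    "calendarMergeMode": 2,  # AvailableInAny (the default)
    "startDateMode": 1,      # DataInAllSeries
    "endDateMode": 1,        # DataInAllSeries
    "seriesEntries": [
        {"name": name, "missingValueMethod": 1}  # Auto (the default)
        for name in (
            "cyinea0001",
            "cypric0014",
            "cytour0076",
            "un_myos_cy_total",
        )
    ],
}

# Serialise to the JSON text that would form the body of the POST request.
body = json.dumps(payload, indent=2)
print(body)
```

Fields that are omitted, such as weekdays or the frequency conversion methods, would fall back to the defaults shown in italics above.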

***

## Visualising the data
We flatten the columns we want to chart (dates and values) into a single data frame, ready to be used further down the notebook.

In [4]:
with WebClient(credentials.client_id, credentials.client_secret) as api:
    data_frame = api.get_unified_series(
        "cyinea0001",
        "cypric0014",
        "cytour0076",
        "un_myos_cy_total",
        frequency=SeriesFrequency.ANNUAL,
        currency="USD",
        start_point=StartOrEndPoint.data_in_all_series(),
        end_point=StartOrEndPoint.data_in_all_series(),
    ).data_frame(
        [
            "dates",
            "Wage Growth",
            "CPI",
            "Income from Foreign Tourism",
            "Mean Years of Schooling",
        ]
    )
data_frame

| | dates | Wage Growth | CPI | Income from Foreign Tourism | Mean Years of Schooling |
|---|---|---|---|---|---|
| 0 | 2001-01-01 00:00:00+00:00 | 5.1 | 74.092784 | 1928952000.0 | 10.1 |
| 1 | 2002-01-01 00:00:00+00:00 | 5.6 | 79.499212 | 1853059000.0 | 10.2 |
| 2 | 2003-01-01 00:00:00+00:00 | 6.3 | 83.221561 | 1970487000.0 | 10.4 |
| 3 | 2004-01-01 00:00:00+00:00 | 4.3 | 87.258003 | 2063894000.0 | 10.5 |
| 4 | 2005-01-01 00:00:00+00:00 | 5.4 | 90.819964 | 2118079000.0 | 10.7 |
| 5 | 2006-01-01 00:00:00+00:00 | 5.4 | 92.985767 | 2220008000.0 | 10.9 |
| 6 | 2007-01-01 00:00:00+00:00 | 5.2 | 95.475094 | 2550603000.0 | 11.2 |
| 7 | 2008-01-01 00:00:00+00:00 | 6.4 | 97.811645 | 2667023000.0 | 11.3 |
| 8 | 2009-01-01 00:00:00+00:00 | 3.0 | 100.489693 | 2104405000.0 | 11.3 |
| 9 | 2010-01-01 00:00:00+00:00 | 2.4 | 102.628536 | 2021286000.0 | 11.5 |


***

## Multiple Regression Analysis

Now that we have all the variables in one data frame, we will use the linear_model module from the sklearn package to build our model. Let us first define our variables.

In [5]:
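# Independent variables (features) and dependent variable (target).
x = data_frame[["CPI", "Income from Foreign Tourism", "Mean Years of Schooling"]]
y = data_frame["Wage Growth"]

# Fit an ordinary least squares model with scikit-learn; it is used for the forecast below.
regr = linear_model.LinearRegression()
regr.fit(x, y)

# Refit with statsmodels, which needs an explicit intercept column, to get a full statistical summary.
x = statsmodels_api.add_constant(x)
results = statsmodels_api.OLS(y, x).fit()
results.summary()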


| | | | |
|---|---|---|---|
| Dep. Variable: | Wage Growth | R-squared: | 0.748 |
| Model: | OLS | Adj. R-squared: | 0.698 |
| Method: | Least Squares | F-statistic: | 14.84 |
| Date: | Fri, 18 Mar 2022 | Prob (F-statistic): | 9.26e-05 |
| Time: | 17:10:27 | Log-Likelihood: | -33.05 |
| No. Observations: | 19 | AIC: | 74.1 |
| Df Residuals: | 15 | BIC: | 77.88 |
| Df Model: | 3 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| const | 60.3246 | 9.562 | 6.309 | 0.000 | 39.943 | 80.706 |
| CPI | 0.2537 | 0.112 | 2.265 | 0.039 | 0.015 | 0.492 |
| Income from Foreign Tourism | 5.536e-09 | 1.96e-09 | 2.824 | 0.013 | 1.36e-09 | 9.72e-09 |
| Mean Years of Schooling | -8.3693 | 1.894 | -4.418 | 0.000 | -12.407 | -4.332 |

| | | | |
|---|---|---|---|
| Omnibus: | 11.255 | Durbin-Watson: | 1.3 |
| Prob(Omnibus): | 0.004 | Jarque-Bera (JB): | 8.495 |
| Skew: | -1.444 | Prob(JB): | 0.0143 |
| Kurtosis: | 4.546 | Cond. No. | 67000000000.0 |


In [6]:
CYP_Wage_Growth = regr.predict([[100.010000, 2.994805e09, 12.1712]])
print("Cyprus Wage Growth Forecast")
print(CYP_Wage_Growth)

Cyprus Wage Growth Forecast
[0.41567341]
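
The forecast can be sanity-checked by hand, since a linear model's prediction is just the intercept plus the coefficient-weighted inputs. The sketch below uses the coefficients as printed in the regression summary, so it assumes that the scikit-learn and statsmodels fits agree, and it only matches the forecast up to the rounding of those printed values.

```python
# Coefficients copied from the regression summary (rounded as printed).
const = 60.3246
coefficients = {
    "CPI": 0.2537,
    "Income from Foreign Tourism": 5.536e-09,
    "Mean Years of Schooling": -8.3693,
}

# The same inputs that were passed to regr.predict.
inputs = {
    "CPI": 100.01,
    "Income from Foreign Tourism": 2.994805e09,
    "Mean Years of Schooling": 12.1712,
}

# Intercept plus the sum of coefficient * input over all features.
forecast = const + sum(coefficients[name] * inputs[name] for name in inputs)
print(round(forecast, 2))  # 0.41, close to the 0.4157 returned by regr.predict
```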


***

## Conclusion

Here we can see how the FetchUnifiedSeries endpoint, called with a POST request, eases workflows: you simply query the data needed in the model, apply the transformations and visualise the results, rather than coding each transformation from scratch. Not only does this feature save a lot of time in the necessary preparatory work, it also increases consistency across the various time series and models built on Macrobond data.