# 2.1 - Macrobond web API - Categories Exploration

*Performing coverage checks based on Macrobond's Categories*

This notebook aims to provide examples of how to use Macrobond's web API call methods as well as insights on the key attributes used to display the output in an understandable format.

Here we focus on using the Search method with a **Category** input.
Our data is arranged as a logical hierarchy of categories to help you find or narrow down related datasets quickly.

*Full error handling is omitted for brevity*

***

## Importing packages

In [2]:
from macrobond_financial.web import WebClient

***

## Get the data
Feel free to refer to https://api.macrobondfinancial.com/swagger/index.html to get the comprehensive list of web API endpoints and parameters used.

In the example below, we use the Search endpoint with filters on Category `inea` and Region `gb`:
> **Income & Earnings - United Kingdom**

Feel free to use the notebook **1.1 - Macrobond web API - Metadata Navigation** to pull out a list of all available categories and regions.

***

## Visualising the data
Let's evaluate Macrobond's coverage for Income & Earnings-related time series in the United Kingdom.

In [3]:
with WebClient() as api:
    data_frame = api.entity_search(
        entity_types="TimeSeries",
        must_have_values={"Region": "gb", "category": "inea"},
    ).to_pd_data_frame()[[
            "Name",
            "FullDescription",
            "Region",
            "Frequency",
            "Source",
            "FirstRevisionTimeStamp",
    ]]
data_frame.head(10)

Unnamed: 0,Name,FullDescription,Region,Frequency,Source,FirstRevisionTimeStamp
0,oecd_tim_00081601,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
1,oecd_tim_00094171,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
2,oecd_tim_00057862,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
3,oecd_tim_00007285,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
4,oecd_tim_00007348,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
5,oecd_tim_00038359,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
6,oecd_tim_00092980,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
7,oecd_tim_00114370,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
8,oecd_tim_00101956,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
9,oecd_tim_00137155,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,
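
The first rows above have an empty `FirstRevisionTimeStamp`, so it can help to check how many series carry a revision history at all before going further. The sketch below is a minimal illustration on a hypothetical sample frame: the series names and timestamp strings are made up, and the ISO-style timestamp format is an assumption rather than something confirmed by the API output.

```python
import pandas as pd

# Hypothetical sample mimicking the search result; names and timestamps are illustrative only.
sample = pd.DataFrame(
    {
        "Name": ["series_a", "series_b", "series_c", "series_d"],
        "FirstRevisionTimeStamp": [
            None,
            "2020-03-02T09:30:00Z",
            None,
            "2016-07-13T08:30:00Z",
        ],
    }
)

# Treat a series as Point-in-Time when its first revision timestamp is populated.
has_pit = sample["FirstRevisionTimeStamp"].notna()

pit_count = int(has_pit.sum())
pit_share = float(has_pit.mean())
print(f"{pit_count} of {len(sample)} series carry a revision history ({pit_share:.0%})")
```

The same two lines (`notna()` followed by `sum()` or `mean()`) should apply unchanged to `data_frame` from the cell above.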


### We will now focus on the Point-in-Time (PiT) series in this coverage check
Let's flatten the Region attribute, which is returned as a list, into a single comma-separated string. While most time series carry one region only, some carry several, so the resulting value can be for instance "gb" or "gb, city_[xxx]".

In [4]:
data_frame["RegionString"] = data_frame["Region"].apply(
    lambda x: ", ".join(map(str, x))
)
data_frame.head(10)

Unnamed: 0,Name,FullDescription,Region,Frequency,Source,FirstRevisionTimeStamp,RegionString
0,oecd_tim_00081601,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
1,oecd_tim_00094171,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
2,oecd_tim_00057862,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
3,oecd_tim_00007285,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
4,oecd_tim_00007348,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
5,oecd_tim_00038359,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
6,oecd_tim_00092980,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
7,oecd_tim_00114370,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
8,oecd_tim_00101956,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
9,oecd_tim_00137155,"United Kingdom, OECD TiM, Trade in Employment ...",[gb],annual,src_oecd,,gb
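
To make the effect on multi-region series visible, here is a small sketch on a hypothetical frame. The code `city_example` is a placeholder, not a real Macrobond region code. It also shows an alternative that keeps only the first region of each list, in case a single code per series is preferred.

```python
import pandas as pd

# Hypothetical rows; "city_example" is a placeholder, not a real region code.
sample = pd.DataFrame({"Region": [["gb"], ["gb", "city_example"]]})

# Join every region code into one string, as in the cell above.
sample["RegionString"] = sample["Region"].apply(
    lambda codes: ", ".join(map(str, codes))
)

# Alternative: keep only the first region code of each list (None for an empty list).
sample["FirstRegion"] = sample["Region"].apply(
    lambda codes: codes[0] if codes else None
)

print(sample)
```

Joining keeps all the information in a hashable column that can be grouped on, whereas taking the first element gives one code per series but silently drops the others.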


### Let's convert the date-time to years only

In [5]:
data_frame["FirstRevisionYear"] = data_frame["FirstRevisionTimeStamp"].str[:4]
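
Slicing the first four characters keeps the year as text and relies on the timestamp starting with a four-digit year. As a hedged alternative, the sketch below parses the value with `pd.to_datetime` and extracts an integer year. The sample timestamps are hypothetical and assume an ISO 8601 string format.

```python
import pandas as pd

# Hypothetical timestamps; the ISO 8601 format is an assumption.
sample = pd.Series(
    ["2020-03-02T09:30:00Z", None, "2016-07-13T08:30:00Z"],
    name="FirstRevisionTimeStamp",
)

# Approach used in the cell above: keep the first four characters as text.
years_as_text = sample.str[:4]

# Alternative: parse to datetime, then take the year as a nullable integer.
parsed = pd.to_datetime(sample, errors="coerce", utc=True)
years_as_int = parsed.dt.year.astype("Int64")

print(years_as_text.tolist())
print(years_as_int.tolist())
```

Both approaches leave missing timestamps as missing values, so a later `dropna` behaves the same way.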

### Displaying the new DataFrame
Let's see how our transformations have been applied by selecting a few columns by position: `df.iloc[rows, [columns]]`. Note that we also drop rows with NaN values in the FirstRevisionYear column: `df.dropna(subset=['FirstRevisionYear'])`

In [6]:
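# Keep PiT series only, then the first 1000 rows and these columns by position:
# 0 Name, 1 FullDescription, 6 RegionString, 3 Frequency, 4 Source, 7 FirstRevisionYear
data_frame_final = data_frame.dropna(subset=["FirstRevisionYear"]).iloc[
    0:1000, [0, 1, 6, 3, 4, 7]
]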
data_frame_final

Unnamed: 0,Name,FullDescription,RegionString,Frequency,Source,FirstRevisionYear
30,oecd_stan_00157265,"United Kingdom, OECD STAN, Structural Analysis...",gb,annual,src_oecd,2020
31,oecd_stan_00155753,"United Kingdom, OECD STAN, Structural Analysis...",gb,annual,src_oecd,2020
35,oecd_stan_00156143,"United Kingdom, OECD STAN, Structural Analysis...",gb,annual,src_oecd,2020
36,oecd_stan_00155852,"United Kingdom, OECD STAN, Structural Analysis...",gb,annual,src_oecd,2020
37,oecd_stan_00156947,"United Kingdom, OECD STAN, Structural Analysis...",gb,annual,src_oecd,2020
...,...,...,...,...,...,...
3842,oecd_mei_00437266,"United Kingdom, OECD MEI, Labour Compensation,...",gb,quarterly,src_oecd,2019
3843,gbinea0015,"United Kingdom, Income, Tax Payer, Haringey, GBP",gb,annual,src_gbhmrc,2020
3844,gbinea00851,"United Kingdom, Average Weekly Earnings, Publi...",gb,monthly,src_gbons,2016
3846,gbinea0006,"United Kingdom, Income, Tax Payer, Brent, GBP",gb,annual,src_gbhmrc,2020
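
Positional indexes depend on the column order, which changes if a column is added earlier in the notebook. A possible alternative, sketched below on a hypothetical two-row frame with the same column layout, is to select the columns by name and check that the result matches the positional version.

```python
import pandas as pd

# Hypothetical frame with the same column order as data_frame; values are illustrative only.
sample = pd.DataFrame(
    {
        "Name": ["series_a", "series_b"],
        "FullDescription": ["Description A", "Description B"],
        "Region": [["gb"], ["gb"]],
        "Frequency": ["annual", "monthly"],
        "Source": ["src_a", "src_b"],
        "FirstRevisionTimeStamp": [None, "2020-03-02T09:30:00Z"],
        "RegionString": ["gb", "gb"],
        "FirstRevisionYear": [None, "2020"],
    }
)

columns = [
    "Name",
    "FullDescription",
    "RegionString",
    "Frequency",
    "Source",
    "FirstRevisionYear",
]

# Select by name, then cap the number of rows.
by_name = sample.dropna(subset=["FirstRevisionYear"]).loc[:, columns].head(1000)

# Positional version, as in the cell above.
by_position = sample.dropna(subset=["FirstRevisionYear"]).iloc[
    0:1000, [0, 1, 6, 3, 4, 7]
]

print(by_name.equals(by_position))
```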


### Group the results by FirstRevisionYear and Frequency
Note that Macrobond started to systematically collect PiT data in 2018. 
PiT coverage prior to 2018 has been backfilled by leveraging the source or internal collection logs.

In [7]:
df_group = (
    data_frame_final.groupby(["FirstRevisionYear", "Frequency"])["Name"]
    .count()
    .reset_index(name="Count")
)
df_group

Unnamed: 0,FirstRevisionYear,Frequency,Count
0,2015,monthly,1
1,2016,monthly,51
2,2018,annual,8
3,2018,monthly,4
4,2018,quarterly,6
5,2019,annual,1
6,2019,monthly,2
7,2019,quarterly,1
8,2020,annual,231
9,2020,monthly,326
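
For a coverage check, a year-by-frequency matrix can be easier to scan than the long format. The sketch below rebuilds `df_group` from the rows visible in the output above (any rows not shown there are not included) and pivots it. The `Total` column is an addition for illustration.

```python
import pandas as pd

# Rows copied from the df_group output shown above.
df_group = pd.DataFrame(
    {
        "FirstRevisionYear": [
            "2015", "2016", "2018", "2018", "2018",
            "2019", "2019", "2019", "2020", "2020",
        ],
        "Frequency": [
            "monthly", "monthly", "annual", "monthly", "quarterly",
            "annual", "monthly", "quarterly", "annual", "monthly",
        ],
        "Count": [1, 51, 8, 4, 6, 1, 2, 1, 231, 326],
    }
)

# One row per year, one column per frequency; missing combinations become 0.
coverage = (
    df_group.pivot(index="FirstRevisionYear", columns="Frequency", values="Count")
    .fillna(0)
    .astype(int)
)

# Row total across frequencies (illustrative addition).
coverage["Total"] = coverage.sum(axis=1)

print(coverage)
```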
