# Tutorial for creating your own dataset from the OECD API

After following this tutorial, you should be able to:

1. Test the API connection
2. Create and update your own recipe for your data
3. Run the data builder and store the data for analysis

## Importing the module and checking the API connection

Remember that the OECD imposes a limit of 20 queries per minute and 20 downloads per hour.

In [None]:
#!pip install oecddatabuilder

In [1]:
import oecddatabuilder as OECD_data

OECD_data.utils.test_api_connection()

INFO:oecddatabuilder.utils:API connection successful.


## RECIPE

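- First, you need to prepare the recipe for your data. A recipe is a nested dictionary that stores the information needed for your query.

- `RecipeLoader` is simple. Its two core operations are loading a recipe (`load`) and updating one (`update_recipe_from_url`); `show` and `save` are also used below.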

In [4]:
recipe_loader = OECD_data.RecipeLoader()

ERROR:oecddatabuilder.recipe_loader:Error loading recipe file: Expecting value: line 1 column 1 (char 0)
INFO:oecddatabuilder.recipe_loader:Atomic write successful to /Users/minkeychang/oecddatabuilder/config/recipe.json
INFO:oecddatabuilder.recipe_loader:Entire recipe configuration saved successfully to /Users/minkeychang/oecddatabuilder/config/recipe.json.


In [5]:
recipe_loader.show() # Only default recipe

{'DEFAULT': {'C': {'ACTIVITY': '',
                   'ADJUSTMENT': '',
                   'COUNTERPART_SECTOR': '',
                   'EXPENDITURE': '',
                   'FREQ': 'Q',
                   'INSTR_ASSET': '',
                   'PRICE_BASE': 'LR',
                   'REF_AREA': 'KOR+CAN+USA+CHN+GBR+DEU+FRA+JPN+ITA+IND+MEX+IRL',
                   'SECTOR': 'S1M',
                   'TABLE_IDENTIFIER': '',
                   'TRANSACTION': 'P3',
                   'TRANSFORMATION': '',
                   'UNIT_MEASURE': 'USD_PPP'},
             'EX': {'ACTIVITY': '',
                    'ADJUSTMENT': '',
                    'COUNTERPART_SECTOR': '',
                    'EXPENDITURE': '',
                    'FREQ': 'Q',
                    'INSTR_ASSET': '',
                    'PRICE_BASE': 'LR',
                    'REF_AREA': 'KOR+CAN+USA+CHN+GBR+DEU+FRA+JPN+ITA+IND+MEX+IRL',
                    'SECTOR': 'S1',
                    'TABLE_IDENTIFIER': '',
             

##### When you load a recipe in the next cell, the loader returns a nested dictionary containing information about your dataset.

- The loader also creates a `recipe.json` file under the `config/` directory. It is fine, and even recommended, to modify, add, or delete recipes in `recipe.json` for your own blueprint.
- However, modifying the default recipe is NOT recommended.
- Last but not least, be aware of the recipe's nested dictionary format: recipe name, then column name, then filters. I know it's confusing, but that was the best I could do.

In [6]:
default_recipe = recipe_loader.load(recipe_name="DEFAULT")

default_recipe

INFO:oecddatabuilder.recipe_loader:User configuration merged for group 'DEFAULT'.


{'Y': {'FREQ': 'Q',
  'ADJUSTMENT': '',
  'REF_AREA': 'KOR+CAN+USA+CHN+GBR+DEU+FRA+JPN+ITA+IND+MEX+IRL',
  'SECTOR': 'S1',
  'COUNTERPART_SECTOR': '',
  'TRANSACTION': 'B1GQ',
  'INSTR_ASSET': '',
  'ACTIVITY': '',
  'EXPENDITURE': '',
  'UNIT_MEASURE': 'USD_PPP',
  'PRICE_BASE': 'LR',
  'TRANSFORMATION': '',
  'TABLE_IDENTIFIER': ''},
 'C': {'FREQ': 'Q',
  'ADJUSTMENT': '',
  'REF_AREA': 'KOR+CAN+USA+CHN+GBR+DEU+FRA+JPN+ITA+IND+MEX+IRL',
  'SECTOR': 'S1M',
  'COUNTERPART_SECTOR': '',
  'TRANSACTION': 'P3',
  'INSTR_ASSET': '',
  'ACTIVITY': '',
  'EXPENDITURE': '',
  'UNIT_MEASURE': 'USD_PPP',
  'PRICE_BASE': 'LR',
  'TRANSFORMATION': '',
  'TABLE_IDENTIFIER': ''},
 'G': {'FREQ': 'Q',
  'ADJUSTMENT': '',
  'REF_AREA': 'KOR+CAN+USA+CHN+GBR+DEU+FRA+JPN+ITA+IND+MEX+IRL',
  'SECTOR': 'S13',
  'COUNTERPART_SECTOR': '',
  'TRANSACTION': 'P3',
  'INSTR_ASSET': '',
  'ACTIVITY': '',
  'EXPENDITURE': '',
  'UNIT_MEASURE': 'USD_PPP',
  'PRICE_BASE': 'LR',
  'TRANSFORMATION': '',
  'TABLE_IDENTI
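
The nesting shown in the outputs above (recipe name, then column name, then filter values) can be illustrated with a small standalone sketch. The group name `MY_RECIPE` and the helper `check_nesting` are hypothetical and not part of `oecddatabuilder`. The entry is abbreviated, whereas the DEFAULT output lists every filter and uses `''` for unused ones. Whether the library accepts an abbreviated entry is not confirmed here.

```python
import json

# Hypothetical custom group; filter keys are copied from the DEFAULT output.
recipes = {
    "MY_RECIPE": {              # recipe (group) name
        "Y": {                  # column name in the final dataset
            "FREQ": "Q",
            "REF_AREA": "KOR+USA",
            "SECTOR": "S1",
            "TRANSACTION": "B1GQ",
            "UNIT_MEASURE": "USD_PPP",
            "PRICE_BASE": "LR",
        },
    },
}


def check_nesting(recipes):
    """Raise ValueError unless the shape is recipe -> column -> {filter: str}."""
    for recipe_name, columns in recipes.items():
        if not isinstance(columns, dict):
            raise ValueError(f"{recipe_name}: expected a dict of columns")
        for column, filters in columns.items():
            if not isinstance(filters, dict):
                raise ValueError(f"{recipe_name}/{column}: expected a dict of filters")
            for key, value in filters.items():
                if not isinstance(value, str):
                    raise ValueError(f"{recipe_name}/{column}/{key}: expected a string")


check_nesting(recipes)

# A JSON round trip preserves the nesting, so hand edits keep the same shape.
text = json.dumps(recipes, indent=2)
restored = json.loads(text)
print(restored == recipes)
```

Running a check like this after editing `recipe.json` by hand can catch a misplaced brace or a column that was accidentally written as a plain string.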

##### You can also update the recipe with the following function.

If you open the OECD Data Explorer [webpage](https://data-explorer.oecd.org/) and search for the data you need, you will find a **Developer API** section on the right.

![OECD_API_DEVELOPER](../docs/image/API_demo.png)

### WARNING: OECD API LIMITS

##### Running the function ```update_recipe_from_url``` sends as many API requests as there are columns in the recipe.
- This means that if there are more than 20 columns, you cannot run it all at once.
- It also means that running it every time would eat into your access quota.
- I strongly recommend updating the recipe once at the beginning and later modifying it manually through the ```recipe.json``` file.
- You can generate your updated ```recipe.json``` with the ```save()``` function.
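
One way to stay under the per-minute limit with a large recipe is to split the column-to-URL dictionary into batches of at most 20 entries and submit them at least a minute apart. The sketch below only does the splitting. `split_columns` is a hypothetical helper, the URLs are placeholders, and whether separate `update_recipe_from_url` calls merge into the same recipe group is not confirmed here.

```python
from itertools import islice

QUERIES_PER_MINUTE = 20  # limit quoted at the top of this tutorial


def split_columns(column_urls, batch_size=QUERIES_PER_MINUTE):
    """Yield dicts holding at most batch_size column -> URL pairs, in order."""
    items = iter(column_urls.items())
    while True:
        batch = dict(islice(items, batch_size))
        if not batch:
            return
        yield batch


# 45 placeholder columns (not real OECD links).
column_urls = {f"COL_{i}": f"https://example.org/{i}" for i in range(45)}
batches = list(split_columns(column_urls))
print([len(batch) for batch in batches])  # [20, 20, 5]
```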


##### Here, you can paste the copied links in the following format.

- Pass the recipe name, followed by a dictionary that maps each column name (key) to the URL you copied (value).

- Each link **MUST** contain ONLY ONE transaction, because multiple transactions may need different combinations of filters, which will throw an error.

- For instance, if you query multiple transactions in one link in the form P3+D1+P5, there is a good chance that P3, D1, and P5 require different sets of filters. This is why each variable must correspond to exactly one time series.

- FYI, I suggest referring to the [OECD API Documentation](../docs/OECD_API_documentation.pdf) for more information on the API structure.

In [None]:
recipe_loader.update_recipe_from_url("TUTORIAL",
                            {"A": "https://sdmx.oecd.org/public/rest/data/OECD.SDD.NAD,DSD_NASEC1@DF_QSA,1.1/Q..AUT....P3.......?startPeriod=2023-Q3",
                             "B": "https://sdmx.oecd.org/public/rest/data/OECD.SDD.NAD,DSD_NASEC1@DF_QSA,1.1/Q..AUT....D1.......?startPeriod=2023-Q3",
                             "C": "https://sdmx.oecd.org/public/rest/data/OECD.SDD.NAD,DSD_NASEC1@DF_QSA,1.1/Q..AUT....P5.......?startPeriod=2023-Q3"
                             }
                            )
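
To see what a link like the ones above contains, it can be split into its base URL, its dot-separated key, and its query parameters using only the standard library. This is a minimal sketch: `split_sdmx_url` and `multi_valued` are hypothetical helpers, not functions of `oecddatabuilder`, and the positions are reported by index because the dimension names depend on the dataset.

```python
from urllib.parse import parse_qs, urlsplit


def split_sdmx_url(url):
    """Return (base_url, list of key dimensions, dict of query parameters)."""
    parts = urlsplit(url)
    head, _, key = parts.path.rpartition("/")
    base_url = f"{parts.scheme}://{parts.netloc}{head}/"
    dimensions = key.split(".")  # empty string = dimension left unfiltered
    params = {name: values[0] for name, values in parse_qs(parts.query).items()}
    return base_url, dimensions, params


def multi_valued(dimensions):
    """Map position -> codes for every dimension holding several '+'-joined codes."""
    return {i: d.split("+") for i, d in enumerate(dimensions) if "+" in d}


url = (
    "https://sdmx.oecd.org/public/rest/data/"
    "OECD.SDD.NAD,DSD_NASEC1@DF_QSA,1.1/Q..AUT....P3.......?startPeriod=2023-Q3"
)
base_url, dimensions, params = split_sdmx_url(url)
print(base_url)
print({i: d for i, d in enumerate(dimensions) if d})  # non-empty positions only
print(params)

# A link that packs three transactions into one position:
combined = url.replace("P3", "P3+D1+P5")
print(multi_valued(split_sdmx_url(combined)[1]))
```

A multi-valued position is not always a problem (a country list such as `KOR+USA` is normal), so a check like this is most useful when applied to the position that holds the transaction code.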

##### You can see that we now have another blueprint for the data, **TUTORIAL**.

In [None]:
new_recipe = recipe_loader.load(recipe_name="TUTORIAL")

new_recipe

##### You can save the new recipe so that it is available next time.

In [None]:
recipe_loader.save()

## Build Data

##### Now you are almost there! You can build the data based on your recipe.

- The current DEFAULT recipe uses the OECD QNA (Quarterly National Accounts) dataset to construct the data for the famous identity in economics:
$$
Y = C + I + G + EX - IM
$$

- First, load the recipe by calling the load function of the recipe loader with your preferred recipe name.

In [None]:
default_recipe = recipe_loader.load("DEFAULT")

##### Before actually building the dataframe, you can check the recipe with the function ```test_recipe(recipe, base_url=...)```.

- The catch is that you must also provide the base URL when testing. Each table has its own identifier in the ```OECD.SDD.NAD,DSD_NASEC1@DF_QSA,1.1/``` part of the URL, so different transactions may need different base URLs.
- The [API explainer page](https://www.oecd.org/en/data/insights/data-explainers/2024/09/api.html) shows the structure below.
- You need to provide the correct base URL both to test and to build the dataset.

![url_structure](../docs/image/url_structure.png)

In [None]:
OECD_data.utils.test_recipe(new_recipe, base_url="https://sdmx.oecd.org/public/rest/data/OECD.SDD.NAD,DSD_NASEC1@DF_QSA,1.1/")
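
To make the role of the base URL concrete, the sketch below joins one column's filter dictionary into a dot-separated key and appends it to a base URL. It assumes the filter order printed by `load()` matches the dataset's dimension order, and it is not necessarily how `oecddatabuilder` builds its requests. `build_query_url` is a hypothetical helper, the country list is shortened, and `endPeriod` is the standard SDMX counterpart of the `startPeriod` parameter seen in the copied links.

```python
def build_query_url(base_url, filters, start=None, end=None):
    """Join filter values with '.' and append optional period parameters."""
    key = ".".join(filters.values())  # relies on dict insertion order
    params = []
    if start:
        params.append(f"startPeriod={start}")
    if end:
        params.append(f"endPeriod={end}")
    query = ("?" + "&".join(params)) if params else ""
    return f"{base_url}{key}{query}"


# Filters for column 'Y', in the order printed by load() (REF_AREA shortened).
y_filters = {
    "FREQ": "Q",
    "ADJUSTMENT": "",
    "REF_AREA": "KOR+USA",
    "SECTOR": "S1",
    "COUNTERPART_SECTOR": "",
    "TRANSACTION": "B1GQ",
    "INSTR_ASSET": "",
    "ACTIVITY": "",
    "EXPENDITURE": "",
    "UNIT_MEASURE": "USD_PPP",
    "PRICE_BASE": "LR",
    "TRANSFORMATION": "",
    "TABLE_IDENTIFIER": "",
}

qna_base = "https://sdmx.oecd.org/public/rest/data/OECD.SDD.NAD,DSD_NAMAIN1@DF_QNA,1.1/"
url = build_query_url(qna_base, y_filters, start="1990-Q1", end="2024-Q4")
print(url)
```

If the base URL pointed at a different table, the same key would be interpreted against a different set of dimensions, which is why the base URL and the recipe have to match.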

##### Here, you are encouraged to use a large request interval and a small chunk size because of the API limits.

The code below took 18 min 49.3 s to execute with an interval of one request per minute (`request_interval=60`).

In [None]:
API = OECD_data.OECDAPI_Databuilder(config=default_recipe, start="1990-Q1", end="2024-Q4", freq="Q", response_format="csv",
                                    dbpath="../datasets/OECD",
                                    base_url="https://sdmx.oecd.org/public/rest/data/OECD.SDD.NAD,DSD_NAMAIN1@DF_QNA,1.1/", request_interval=60)

In [None]:
API.fetch_data(chunk_size=50) # This takes long time
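
The effect of a request interval can be sketched independently of the library: pause a fixed number of seconds between consecutive calls. `run_throttled` and `total_wait_seconds` are hypothetical helpers, and the sketch assumes the interval is expressed in seconds. The `sleep` function is injectable so the example runs instantly.

```python
import time


def run_throttled(tasks, interval_seconds, sleep=time.sleep):
    """Call each task in order, pausing interval_seconds between consecutive calls."""
    results = []
    for i, task in enumerate(tasks):
        if i > 0:
            sleep(interval_seconds)  # no pause before the first call
        results.append(task())
    return results


def total_wait_seconds(n_requests, interval_seconds):
    """Time spent pausing, excluding the time the requests themselves take."""
    return max(n_requests - 1, 0) * interval_seconds


# Record the pauses instead of actually sleeping.
pauses = []
results = run_throttled([lambda: "a", lambda: "b", lambda: "c"], 60, sleep=pauses.append)
print(results, pauses)             # ['a', 'b', 'c'] [60, 60]
print(total_wait_seconds(20, 60))  # 1140
```

With a 60-second interval, the pauses alone add up to 19 minutes for 20 requests, so long run times are expected.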

## Finally, we have fetched all the data!

- We can now combine these data into the single dataframe that we designed from the beginning in the recipe.
- The data consist of 'date', 'country', and one value column for each indicator in the recipe.

In [None]:
df = API.create_dataframe()

In [None]:
df.head()

In [None]:
df.to_csv("keynesian.csv")

In [None]:
df.describe()
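
Once the data are built, the identity Y = C + I + G + EX - IM offers a quick sanity check: compute the residual for each observation. The sketch below uses made-up numbers and plain dictionaries. `identity_residual` is a hypothetical helper, and the column names `I` and `IM` are assumed from the identity, since only some DEFAULT columns appear in the truncated output above. A nonzero residual on real data is not necessarily an error.

```python
def identity_residual(row):
    """Return Y - (C + I + G + EX - IM) for one observation."""
    return row["Y"] - (row["C"] + row["I"] + row["G"] + row["EX"] - row["IM"])


# Made-up values for illustration only.
balanced = {"Y": 100.0, "C": 60.0, "I": 20.0, "G": 15.0, "EX": 30.0, "IM": 25.0}
unbalanced = dict(balanced, Y=101.0)

print(identity_residual(balanced))    # 0.0
print(identity_residual(unbalanced))  # 1.0
```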

# Congratulations! You have created your own dataset for economic analysis from the OECD API

There's always room for improvement, so I am open to any pull requests, forks, and suggestions!

Happy downloading from the OECD API! (Sounds like an LLM, but it was all written by me.)