# Explore the IMF API
This notebook documents my exploration of the IMF API and should help record decisions about how to use it.

We are going to use [this documentation](http://datahelp.imf.org/knowledgebase/articles/667681-using-json-restful-web-service) from the IMF to explore the db.

In [1]:
import requests
import random
import pandas as pd
from flatten_dict import flatten
from pprint import pprint
IMF_BASE_API = "http://dataservices.imf.org/REST/SDMX_JSON.svc"

In [2]:
def get_data(url):
    response = requests.get(url)
    response.raise_for_status()
    return response.json()

In [3]:
dataflow_data = get_data(f"{IMF_BASE_API}/Dataflow") # TODO: maybe use the python path lib for urls?
pprint(dataflow_data, depth=1)
print()
pprint(dataflow_data, depth=2)

{'Structure': {...}}

{'Structure': {'@xmlns': 'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message',
               '@xmlns:xsd': 'http://www.w3.org/2001/XMLSchema',
               '@xmlns:xsi': 'http://www.w3.org/2001/XMLSchema-instance',
               '@xsi:schemaLocation': 'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/message '
                                      'https://registry.sdmx.org/schemas/v2_0/SDMXMessage.xsd',
               'Dataflows': {...},
               'Header': {...}}}
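
On the TODO in cell 3: `pathlib` is built for filesystem paths and collapses the `//` after `http:`, so it is probably not the right tool for URLs. Below is a minimal sketch of an alternative. It assumes a hypothetical helper named `build_url` (not part of the original notebook) that joins path segments onto the base URL and percent-encodes each one with the standard library's `urllib.parse.quote`.

```python
from urllib.parse import quote

IMF_BASE_API = "http://dataservices.imf.org/REST/SDMX_JSON.svc"


def build_url(*segments, base=IMF_BASE_API):
    """Join path segments onto the base URL, percent-encoding each one.

    `build_url` is a hypothetical helper for illustration only.
    """
    # Strip stray slashes so callers can pass "Dataflow" or "/Dataflow/".
    cleaned = [quote(str(segment).strip("/"), safe="") for segment in segments]
    return "/".join([base.rstrip("/"), *cleaned])


print(build_url("Dataflow"))
```

With this in place, the call in cell 3 could be written as `get_data(build_url("Dataflow"))`.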


Let's explore the `Header` and `Dataflows` keys.

In [4]:
pprint(dataflow_data["Structure"]["Header"], depth=3)

{'ID': '47599286-3bb0-40ca-ad86-3df375e23154',
 'Prepared': '2020-06-08T04:16:49',
 'Receiver': {'@id': 'ZZZ'},
 'Sender': {'@id': '1C0',
            'Contact': {'Telephone': '+ 1 (202) 623-6220',
                        'URI': 'http://www.imf.org'},
            'Name': {'#text': 'IMF', '@xml:lang': 'en'}},
 'Test': 'false'}


`Header` just gives us some metadata.

In [5]:
pprint(dataflow_data["Structure"]["Dataflows"], depth=2)

{'Dataflow': [{...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
              {...},
             

Let's have a look at how many dataflows the list contains.

In [6]:
list_of_dataflows = dataflow_data["Structure"]["Dataflows"]["Dataflow"]
print(len(list_of_dataflows))

232


Now let's see if the blobs in this list, which we can reasonably conclude are dataflows, have a similar structure.

In [7]:
for i in range(3):
    random_int = random.randrange(len(list_of_dataflows))
    print(list_of_dataflows[random_int].keys())

dict_keys(['@id', '@version', '@agencyID', '@isFinal', '@xmlns', 'Name', 'KeyFamilyRef'])
dict_keys(['@id', '@version', '@agencyID', '@isFinal', '@xmlns', 'Name', 'KeyFamilyRef'])
dict_keys(['@id', '@version', '@agencyID', '@isFinal', '@xmlns', 'Name', 'KeyFamilyRef'])


In [8]:
for i in range(3):
    random_int = random.randrange(len(list_of_dataflows))
    pprint(list_of_dataflows[random_int])

{'@agencyID': 'IMF',
 '@id': 'DS-FDI',
 '@isFinal': 'true',
 '@version': '1.0',
 '@xmlns': 'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure',
 'KeyFamilyRef': {'KeyFamilyAgencyID': 'IMF', 'KeyFamilyID': 'FDI'},
 'Name': {'#text': 'Financial Development Index', '@xml:lang': 'en'}}
{'@agencyID': 'IMF',
 '@id': 'DS-AFRREO201904',
 '@isFinal': 'true',
 '@version': '1.0',
 '@xmlns': 'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure',
 'KeyFamilyRef': {'KeyFamilyAgencyID': 'IMF', 'KeyFamilyID': 'AFRREO201904'},
 'Name': {'#text': 'Sub-Saharan Africa Regional Economic Outlook (AFRREO) '
                   'April 2019',
          '@xml:lang': 'en'}}
{'@agencyID': 'IMF',
 '@id': 'DS-GFSCOFOG',
 '@isFinal': 'true',
 '@version': '1.0',
 '@xmlns': 'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure',
 'KeyFamilyRef': {'KeyFamilyAgencyID': 'IMF', 'KeyFamilyID': 'GFSCOFOG'},
 'Name': {'#text': 'Government Finance Statistics (GFS), Expenditure by '
                   '

We can reasonably conclude that the entries share the same initial structure.
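
Three random draws can miss rare variants. For example, the DataFrame built in the next cell has a `Description` column that none of the sampled entries showed. Below is a minimal sketch of an exhaustive check that counts every distinct set of top-level keys. The function name `key_census` and the `sample` records are made up for illustration; in the notebook the input would be `list_of_dataflows`.

```python
from collections import Counter


def key_census(records):
    """Count how many records share each distinct set of top-level keys."""
    return Counter(frozenset(record) for record in records)


# Hypothetical stand-in for list_of_dataflows; only the key names matter here.
sample = [
    {"@id": "DS-A", "Name": {}, "KeyFamilyRef": {}},
    {"@id": "DS-B", "Name": {}, "KeyFamilyRef": {}},
    {"@id": "DS-C", "Name": {}, "KeyFamilyRef": {}, "Description": {}},
]

# Most common key set first, so the outliers end up at the bottom.
for keys, count in key_census(sample).most_common():
    print(count, sorted(keys))
```

If every entry has the same structure, the census contains a single key set.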

Let's convert this to a dataframe and take a look at what this data looks like.

In [9]:
pd.DataFrame(data=list_of_dataflows).head()

Unnamed: 0,@id,@version,@agencyID,@isFinal,@xmlns,Name,KeyFamilyRef,Description
0,DS-FAS,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,"{'@xml:lang': 'en', '#text': 'Financial Access...","{'KeyFamilyID': 'FAS', 'KeyFamilyAgencyID': 'I...",
1,DS-MCDREO,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,"{'@xml:lang': 'en', '#text': 'Middle East and ...","{'KeyFamilyID': 'MCDREO', 'KeyFamilyAgencyID':...",
2,DS-DOT,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,"{'@xml:lang': 'en', '#text': 'Direction of Tra...","{'KeyFamilyID': 'DOT', 'KeyFamilyAgencyID': 'I...",
3,DS-CDIS,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,"{'@xml:lang': 'en', '#text': 'Coordinated Dire...","{'KeyFamilyID': 'CDIS', 'KeyFamilyAgencyID': '...",
4,DS-GFS01,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,"{'@xml:lang': 'en', '#text': 'Government Finan...","{'KeyFamilyID': 'GFS01', 'KeyFamilyAgencyID': ...",


We need to flatten the dicts that we are encountering.

In [10]:
def flatten_if_dict(thing, reducer_type="underscore"):
    if isinstance(thing, dict):
        return flatten(thing, reducer=reducer_type)
    return thing
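
To show what flattening with an underscore reducer produces, here is a minimal pure-Python sketch. It is not the `flatten_dict` implementation, only a hand-written stand-in with the hypothetical name `flatten_with_underscore`. The example record reuses values from the `DS-FDI` entry printed earlier.

```python
def flatten_with_underscore(nested, parent_key=""):
    """Flatten nested dicts, joining the key path with underscores."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{parent_key}_{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested dict, carrying the key path along.
            flat.update(flatten_with_underscore(value, full_key))
        else:
            flat[full_key] = value
    return flat


example = {
    "@id": "DS-FDI",
    "Name": {"#text": "Financial Development Index", "@xml:lang": "en"},
    "KeyFamilyRef": {"KeyFamilyAgencyID": "IMF", "KeyFamilyID": "FDI"},
}
print(flatten_with_underscore(example))
```

Each nested key becomes a single column name such as `Name_#text` or `KeyFamilyRef_KeyFamilyID`.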
    

In [11]:
dataflows = pd.DataFrame(data = [
    flatten_if_dict(blob)
    for blob
    in list_of_dataflows
])

dataflows.head()

Unnamed: 0,@id,@version,@agencyID,@isFinal,@xmlns,Name_@xml:lang,Name_#text,KeyFamilyRef_KeyFamilyID,KeyFamilyRef_KeyFamilyAgencyID,Description_@xml:lang,Description_#text
0,DS-FAS,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,en,Financial Access Survey (FAS),FAS,IMF,,
1,DS-MCDREO,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,en,Middle East and Central Asia Regional Economic...,MCDREO,IMF,,
2,DS-DOT,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,en,Direction of Trade Statistics (DOTS),DOT,IMF,,
3,DS-CDIS,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,en,Coordinated Direct Investment Survey (CDIS),CDIS,IMF,,
4,DS-GFS01,1.0,IMF,True,http://www.SDMX.org/resources/SDMXML/schemas/v...,en,Government Finance Statistics (GFS 2001),GFS01,IMF,,


Let's do a bit of work:
 - checking what the data looks like
 - getting rid of unnecessary columns
 - seeing if we can get a tidier layout for our data

In [12]:
dataflows.dtypes

@id                               object
@version                          object
@agencyID                         object
@isFinal                          object
@xmlns                            object
Name_@xml:lang                    object
Name_#text                        object
KeyFamilyRef_KeyFamilyID          object
KeyFamilyRef_KeyFamilyAgencyID    object
Description_@xml:lang             object
Description_#text                 object
dtype: object

In [13]:
for column_name in dataflows.columns:
    print(f"==> {column_name}")
    column_set = set(dataflows[column_name])
    if len(column_set) <= 5:
        pprint(column_set)
    else:
        print(f"length: {len(column_set)}")
    print()

==> @id
length: 232

==> @version
{'1.0'}

==> @agencyID
{'IMF'}

==> @isFinal
{'true'}

==> @xmlns
{'http://www.SDMX.org/resources/SDMXML/schemas/v2_0/structure'}

==> Name_@xml:lang
{'en'}

==> Name_#text
length: 232

==> KeyFamilyRef_KeyFamilyID
length: 232

==> KeyFamilyRef_KeyFamilyAgencyID
{'IMF'}

==> Description_@xml:lang
{nan, 'en'}

==> Description_#text
length: 12



Okay, so this tells us that we can throw away a lot of the columns.
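
The columns holding a single value carry no information, and they can also be found programmatically. Below is a minimal sketch that works on a list of flattened records rather than on the DataFrame. The function name `constant_columns` and the shortened `sample` records are assumptions for illustration.

```python
def constant_columns(records):
    """Return the keys whose value is identical in every record."""
    columns = {key for record in records for key in record}
    constant = []
    for column in sorted(columns):
        # A key that is missing from a record shows up as None,
        # so a partly filled column is never reported as constant.
        values = {record.get(column) for record in records}
        if len(values) == 1:
            constant.append(column)
    return constant


# Shortened, hypothetical records modelled on the flattened dataflows.
sample = [
    {"@id": "DS-FAS", "@agencyID": "IMF", "@version": "1.0"},
    {"@id": "DS-DOT", "@agencyID": "IMF", "@version": "1.0"},
    {"@id": "DS-RAFIT2AGG", "@agencyID": "IMF", "@version": "1.0",
     "Description_#text": "Summary Statistics"},
]
print(constant_columns(sample))
```

This only flags columns that are strictly constant. A column such as `Description_@xml:lang`, which is either `en` or missing, still needs a judgment call.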

In [14]:
dataflows_clean = dataflows[["@id", "Name_#text", "KeyFamilyRef_KeyFamilyID", "Description_#text"]]
dataflows_clean.head()

Unnamed: 0,@id,Name_#text,KeyFamilyRef_KeyFamilyID,Description_#text
0,DS-FAS,Financial Access Survey (FAS),FAS,
1,DS-MCDREO,Middle East and Central Asia Regional Economic...,MCDREO,
2,DS-DOT,Direction of Trade Statistics (DOTS),DOT,
3,DS-CDIS,Coordinated Direct Investment Survey (CDIS),CDIS,
4,DS-GFS01,Government Finance Statistics (GFS 2001),GFS01,


It's also interesting that the description column has only 12 distinct values, a count that includes the missing value `nan`. Let's check those out.

In [15]:
dataflows_clean[dataflows_clean["Description_#text"].notnull()][["@id", "Description_#text"]]

Unnamed: 0,@id,Description_#text
36,DS-RAFIT2AGG,Summary Statistics
87,DS-RAFIT3P,RA FIT Round 3 Completion and Participation Rates
88,DS-ISORA2016AGG,This Dataset will be used under Data Tab for I...
140,DS-MFS,MFS data from IFS External Publication - for E...
151,DS-BOP_2018M11,archive (10/25/2018 7:46:45 AM)
152,DS-BOPAGG_2018,archive (11/20/2018 9:25:21 AM)
184,DS-FAS_2018,including data from deleted indicators. Archvi...
185,DS-ISOCA2018PR,[Country.Code][Indicator.Code][Report.Code][St...
189,DS-ISORA2016OECD,[Country.Code][Indicator.Code] -- CERTIFY DATA...
190,DS-ISORA4OECDPUBLIC,CERTIFY DATA -- EDD PUBLIC USERS TAB 2016-2017...


Let's explore the IDs next. Do they all start with DS?

In [16]:
[
    _id
    for _id
    in dataflows_clean["@id"]
    if not _id.startswith("DS")
]

[]

We might get some additional info from the key family ID.

In [17]:
dataflows_clean["KeyFamilyRef_KeyFamilyID"].describe()

count             232
unique            232
top       IFS_2018M03
freq                1
Name: KeyFamilyRef_KeyFamilyID, dtype: object

In fact, looking at the following, this could be another repetitive column...

In [18]:
dataflows_clean[["@id", "KeyFamilyRef_KeyFamilyID"]]

Unnamed: 0,@id,KeyFamilyRef_KeyFamilyID
0,DS-FAS,FAS
1,DS-MCDREO,MCDREO
2,DS-DOT,DOT
3,DS-CDIS,CDIS
4,DS-GFS01,GFS01
...,...,...
227,DS-GFSSSUC2019,GFSSSUC2019
228,DS-APDREO201910,APDREO201910
229,DS-IFS_2020M05,IFS_2020M05
230,DS-IFS_2020M06,IFS_2020M06


In [19]:
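# Print any row where the ID, minus its three-character "DS-" prefix, differs from the key family ID.
for dataflow_id, key_family_id in dataflows_clean[["@id", "KeyFamilyRef_KeyFamilyID"]].itertuples(index=False):
    if dataflow_id[3:] != key_family_id:
        print(dataflow_id, key_family_id)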
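
The loop above slices off the first three characters without checking that they are actually `DS-`. Below is a minimal sketch of a slightly stricter check that compares the whole ID against the prefix plus the key family ID and collects the mismatches instead of printing them. The function name `mismatched_ids` is hypothetical, and the pairs are copied from the table above.

```python
def mismatched_ids(pairs, prefix="DS-"):
    """Return the (dataflow_id, key_family_id) pairs that break the pattern."""
    return [
        (dataflow_id, key_family_id)
        for dataflow_id, key_family_id in pairs
        if dataflow_id != prefix + key_family_id
    ]


pairs = [
    ("DS-FAS", "FAS"),
    ("DS-DOT", "DOT"),
    ("DS-IFS_2020M05", "IFS_2020M05"),
]
print(mismatched_ids(pairs))
```

With the DataFrame, the pairs could come from `zip(dataflows_clean["@id"], dataflows_clean["KeyFamilyRef_KeyFamilyID"])`. An empty result would mean the `@id` column is fully determined by the key family ID.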
