# unesco_reader tutorial

Pythonic access to UNESCO data

`unesco_reader` is a Python package providing a simple interface to access UNESCO Institute for Statistics (UIS)
data. UIS currently does not offer API access to its data. Users must download zipped files and extract the data.
This process requires several manual steps, explained in the UIS [Python tutorial](https://apiportal.uis.unesco.org/bdds-tutorial). This package simplifies the process by providing a simple
interface to access, explore, and analyze the data using pandas DataFrames. It also
lets users view dataset documentation and other information, such as the latest update date of each dataset and
the list of all datasets available from UIS.

### Basic Usage

Start by installing and importing `unesco_reader`.

In [27]:
!pip install unesco-reader > /dev/null



In [22]:
import unesco_reader as uis

Retrieve information about all the available datasets from UIS.

In [2]:
uis.info()

name                                                               latest_update    theme
-----------------------------------------------------------------  ---------------  ---------
SDG Global and Thematic Indicators                                 February 2024    Education
Other Policy Relevant Indicators (OPRI)                            February 2024    Education
Research and Development (R&D) SDG 9.5                             February 2024    Science
Research and Development (R&D) – Other Policy Relevant Indicators  February 2024    Science
SDG 11.4                                                           February 2024    Culture
Demographic and Socio-economic Indicators                          February 2024    External
Education Non Core Archive                                         February 2020    Archive
Innovation Archive                                                 April 2017       Archive
Cultural trade Archive                                             June 202

Retrieve a list of all available datasets from UIS.

In [3]:
uis.available_datasets()

['SDG Global and Thematic Indicators',
 'Other Policy Relevant Indicators (OPRI)',
 'Research and Development (R&D) SDG 9.5',
 'Research and Development (R&D) – Other Policy Relevant Indicators',
 'SDG 11.4',
 'Demographic and Socio-economic Indicators',
 'Education Non Core Archive',
 'Innovation Archive',
 'Cultural trade Archive',
 'Research and Development (R&D) Archive',
 'Cultural employment Archive',
 'Feature Film Archive']

Optionally, you can specify a theme to filter the datasets.

In [4]:
uis.available_datasets(theme='Education')

['SDG Global and Thematic Indicators',
 'Other Policy Relevant Indicators (OPRI)']
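
Conceptually, the `theme` argument narrows the list to the datasets whose theme matches. The snippet below is a minimal sketch of that idea in plain Python, not the package's implementation: `DATASETS` is a hand-copied subset of the `uis.info()` table shown earlier, and `names_for_theme` is a hypothetical helper introduced only for illustration.

```python
# Stand-in rows copied from the `uis.info()` output above (not fetched live).
DATASETS = [
    {"name": "SDG Global and Thematic Indicators", "theme": "Education"},
    {"name": "Other Policy Relevant Indicators (OPRI)", "theme": "Education"},
    {"name": "Research and Development (R&D) SDG 9.5", "theme": "Science"},
    {"name": "SDG 11.4", "theme": "Culture"},
]


def names_for_theme(datasets, theme=None):
    """Hypothetical helper: list dataset names, optionally limited to one theme."""
    return [d["name"] for d in datasets if theme is None or d["theme"] == theme]


education = names_for_theme(DATASETS, theme="Education")
everything = names_for_theme(DATASETS)
print(education)
```

Leaving `theme` unset returns every name, which mirrors calling `uis.available_datasets()` without arguments.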

To access the data for a particular dataset, use the `UIS` class, passing the name of the dataset.
A `UIS` object allows a user to easily access, explore, and analyze the data.
On instantiation, the data is extracted from the UIS website or, if it has already been
extracted, read from the cache (more on caching below).

In [5]:
sdg = uis.UIS("SDG Global and Thematic Indicators")
sdg

INFO: Dataset loaded successfully.


UIS dataset: SDG Global and Thematic Indicators

Basic information about the dataset can be accessed using the `info` method.

In [6]:
sdg.info()

-------------  ----------------------------------
name           SDG Global and Thematic Indicators
latest update  February 2024
theme          Education
-------------  ----------------------------------


Information is also accessible through the attributes of the object.

In [7]:
name = sdg.name
update = sdg.latest_update
theme = sdg.theme

print(f"Name: {name}\nUpdate: {update}\nTheme: {theme}")

Name: SDG Global and Thematic Indicators
Update: February 2024
Theme: Education


The `readme` attribute contains the dataset documentation. To display the documentation, use the `display_readme` method.

In [8]:
sdg.display_readme()

Dataset: Sustainable Development Goal 4
Release date: 2024-02
Data extracted on: 2024-02-16 11:52:41

- [Introduction](#introduction)
- [Intended Audience](#intended-audience)
- [Contact information](#contact-information)
- [Archive Content](#archive-content)
- [Data Model](#data-model)
- [Metadata](#metadata)
- [License](#license)
- [CHANGELOG all annotated changes for previous year](#changelog-all-annotated-changes-for-previous-year)

# Introduction
This archive consists of the latest official data disseminated by the UNESCO Institute of Statistics (UIS) for a specific dataset. This dataset was compiled using the latest data as of the date appearing at the beginning of this file. The UIS periodically updates this dataset. To find out when the next update will occur please visit http://uis.unesco.org.

# Intended Audience 
This archive is a result of a rigorous data production activity that ensures a high level of data quality. In order to expose the dataset to the la

Several methods provide access to the data.
To access country data, use the `get_country_data` method:

In [9]:
sdg.get_country_data()

,country_id,country_name,indicator_id,indicator_label,year,value
0,VAT,Holy See,ROFST.2.GPIA.CP,Out-of-school rate for adolescents of lower se...,1970,0.000000
1,THA,Thailand,EA.4T8.AG25T99.GPIA,"Educational attainment rate, completed post-se...",1970,0.459320
2,VAT,Holy See,TRTP.2T3.GPIA,Proportion of teachers with the minimum requir...,1970,0.000000
3,LBR,Liberia,GER.5T8,"Gross enrolment ratio for tertiary education, ...",1970,0.891370
4,PAN,Panama,GER.5T8.M,"Gross enrolment ratio for tertiary education, ...",1970,7.298850
...,...,...,...,...,...,...
1113029,BGD,Bangladesh,CR.2.F,"Completion rate, lower secondary education, fe...",2024,81.015709
1113030,CHE,Switzerland,CR.2.F,"Completion rate, lower secondary education, fe...",2024,99.218857
1113031,MLT,Malta,CR.2.GPIA,"Completion rate, lower secondary education, ad...",2024,1.000714
1113032,BOL,Bolivia (Plurinational State of),CR.3.F,"Completion rate, upper secondary education, fe...",2024,76.721634


This returns a pandas DataFrame with the country data in a structured, predictable format.
By default, the DataFrame does not contain metadata. To include metadata in the output, set the `include_metadata` parameter to `True`.
Countries can also be filtered to a specific region by passing the region's ID in the `region` parameter.
To see the available regions, use the `get_regions` method.

In [10]:
sdg.get_country_data(include_metadata=True, region='WB: World')

,country_id,country_name,indicator_id,indicator_label,year,value,magnitude,qualifier,Change:Data reporting,Source:Data sources,Under Coverage:Students or individuals
0,THA,Thailand,EA.4T8.AG25T99.GPIA,"Educational attainment rate, completed post-se...",1970,0.459320,,,,Source: Census,
1,LBR,Liberia,GER.5T8,"Gross enrolment ratio for tertiary education, ...",1970,0.891370,,,,,
2,PAN,Panama,GER.5T8.M,"Gross enrolment ratio for tertiary education, ...",1970,7.298850,,,,,
3,LBN,Lebanon,EA.S1T8.AG25T99,Educational attainment: at least some primary ...,1970,54.400002,,,,Source: Census,
4,USA,United States of America,EA.2T8.AG25T99,"Educational attainment rate, completed lower s...",1970,65.680931,,,,Source: Census,
...,...,...,...,...,...,...,...,...,...,...,...
1095690,BGD,Bangladesh,CR.2.F,"Completion rate, lower secondary education, fe...",2024,81.015709,,,,https://education-estimates.org/completion/dat...,
1095691,CHE,Switzerland,CR.2.F,"Completion rate, lower secondary education, fe...",2024,99.218857,,,,https://education-estimates.org/completion/dat...,
1095692,MLT,Malta,CR.2.GPIA,"Completion rate, lower secondary education, ad...",2024,1.000714,,,,https://education-estimates.org/completion/dat...,
1095693,BOL,Bolivia (Plurinational State of),CR.3.F,"Completion rate, upper secondary education, fe...",2024,76.721634,,,,https://education-estimates.org/completion/dat...,
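
The effect of the `region` argument can be pictured as a membership check against the table returned by `get_regions`. The sketch below assumes that behaviour rather than documenting the package's internals. The rows are stand-ins copied from the outputs in this tutorial, `REGION_MEMBERS` and `filter_by_region` are hypothetical names, and treating the Holy See (`VAT`) as outside `WB: World` is an inference from its rows disappearing in the filtered output above.

```python
# Stand-in observations: (country_id, indicator_id, year, value)
COUNTRY_ROWS = [
    ("VAT", "ROFST.2.GPIA.CP", 1970, 0.0),
    ("THA", "EA.4T8.AG25T99.GPIA", 1970, 0.45932),
    ("LBR", "GER.5T8", 1970, 0.89137),
    ("PAN", "GER.5T8.M", 1970, 7.29885),
]

# Stand-in region membership, keyed by region_id as in `get_regions()`.
# Only a few members are listed; the real table has thousands of rows.
REGION_MEMBERS = {
    "WB: World": {"THA", "LBR", "PAN", "YEM", "WSM"},
    "AIMS: Asia and the Pacific": {"AFG", "AUS", "BGD"},
}


def filter_by_region(rows, region_id):
    """Hypothetical helper: keep rows whose country belongs to the region."""
    members = REGION_MEMBERS[region_id]  # raises KeyError for unknown regions
    return [row for row in rows if row[0] in members]


world_rows = filter_by_region(COUNTRY_ROWS, "WB: World")
for row in world_rows:
    print(row)
```

The original row order is preserved and only the `VAT` observation is dropped, which matches the pattern visible when comparing the two outputs above.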


Metadata, available countries, available regions, and variables are also accessible through methods of the object.


In [11]:
sdg.get_metadata() # metadata

,country_id,country_name,indicator_id,indicator_label,year,type,metadata
0,ALB,Albania,ADMI.ENDOFLOWERSEC.MAT,Administration of a nationally-representative...,2014,Source:Data sources,Programme for International Student Assessment...
1,ALB,Albania,ADMI.ENDOFLOWERSEC.MAT,Administration of a nationally-representative...,2015,Source:Data sources,Programme for International Student Assessment...
2,ALB,Albania,ADMI.ENDOFLOWERSEC.MAT,Administration of a nationally-representative...,2016,Source:Data sources,Programme for International Student Assessment...
3,ALB,Albania,ADMI.ENDOFLOWERSEC.MAT,Administration of a nationally-representative...,2017,Source:Data sources,Programme for International Student Assessment...
4,ALB,Albania,ADMI.ENDOFLOWERSEC.MAT,Administration of a nationally-representative...,2018,Source:Data sources,Programme for International Student Assessment...
...,...,...,...,...,...,...,...
859003,SVN,Slovenia,YADULT.PROFINUMERACY.WPIA,Proportion of population achieving at least a ...,2015,Source:Data sources,Programme for the International Assessment of ...
859004,SWE,Sweden,YADULT.PROFINUMERACY.WPIA,Proportion of population achieving at least a ...,2012,Source:Data sources,Programme for the International Assessment of ...
859005,TUR,Turkey,YADULT.PROFINUMERACY.WPIA,Proportion of population achieving at least a ...,2015,Source:Data sources,Programme for the International Assessment of ...
859006,USA,United States of America,YADULT.PROFINUMERACY.WPIA,Proportion of population achieving at least a ...,2014,Source:Data sources,Programme for the International Assessment of ...
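
The table returned by `get_metadata` is long, with one row per metadata entry, whereas `get_country_data(include_metadata=True)` shows one column per metadata type. The following is a minimal sketch of how a long table could be folded into per-observation columns. It is an assumption about the shape of the transformation, not the package's code, and the rows are stand-ins taken from the outputs in this tutorial.

```python
from collections import defaultdict

# Stand-in observations: (country_id, indicator_id, year, value)
observations = [
    ("THA", "EA.4T8.AG25T99.GPIA", 1970, 0.45932),
    ("LBR", "GER.5T8", 1970, 0.89137),
]

# Stand-in long metadata: (country_id, indicator_id, year, type, metadata)
metadata_long = [
    ("THA", "EA.4T8.AG25T99.GPIA", 1970, "Source:Data sources", "Source: Census"),
]

# Fold the long rows into {observation key: {metadata type: text}}.
metadata_by_key = defaultdict(dict)
for country, indicator, year, kind, text in metadata_long:
    metadata_by_key[(country, indicator, year)][kind] = text

# One extra column per metadata type; observations without an entry get None.
metadata_types = sorted({row[3] for row in metadata_long})
wide_rows = []
for country, indicator, year, value in observations:
    extra = metadata_by_key.get((country, indicator, year), {})
    row = {
        "country_id": country,
        "indicator_id": indicator,
        "year": year,
        "value": value,
    }
    for kind in metadata_types:
        row[kind] = extra.get(kind)
    wide_rows.append(row)

for row in wide_rows:
    print(row)
```

With pandas DataFrames, the same reshaping is usually expressed as a pivot followed by a merge on the country, indicator, and year columns.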


In [12]:
sdg.get_countries() # available countries

,country_id,country_name
0,AFG,Afghanistan
1,ALB,Albania
2,DZA,Algeria
3,ASM,American Samoa
4,AND,Andorra
...,...,...
236,WLF,Wallis and Futuna Islands
237,WSM,Samoa
238,YEM,Yemen
239,COD,Democratic Republic of the Congo


In [13]:
sdg.get_regions() # available regions

,region_id,country_id,country_name,grouping_entity,region_name
0,AIMS: Asia and the Pacific,AFG,Afghanistan,AIMS,Asia and the Pacific
1,AIMS: Asia and the Pacific,AUS,Australia,AIMS,Asia and the Pacific
2,AIMS: Asia and the Pacific,BGD,Bangladesh,AIMS,Asia and the Pacific
3,AIMS: Asia and the Pacific,BTN,Bhutan,AIMS,Asia and the Pacific
4,AIMS: Asia and the Pacific,SLB,Solomon Islands,AIMS,Asia and the Pacific
...,...,...,...,...,...
7297,WB: World,VIR,United States Virgin Islands,WB,World
7298,WB: World,WSM,Samoa,WB,World
7299,WB: World,YEM,Yemen,WB,World
7300,WB: World,COD,Democratic Republic of the Congo,WB,World


In [14]:
sdg.get_variables() # available variables

,indicator_id,indicator_label
0,ADMI.ENDOFLOWERSEC.MAT,Administration of a nationally-representative...
1,ADMI.ENDOFLOWERSEC.READ,Administration of a nationally-representative...
2,ADMI.ENDOFPRIM.MAT,Administration of a nationally-representative...
3,ADMI.ENDOFPRIM.READ,Administration of a nationally-representative...
4,ADMI.GRADE2OR3PRIM.MAT,Administration of a nationally representative...
...,...,...
2132,YADULT.PROFINUMERACY.WPIA,Proportion of population achieving at least a ...
2133,YEARS.FC.COMP.02,Number of years of compulsory pre-primary educ...
2134,YEARS.FC.COMP.1T3,Number of years of compulsory primary and seco...
2135,YEARS.FC.FREE.02,Number of years of free pre-primary education ...
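
With more than two thousand indicators listed, a small lookup can help locate the IDs you need before filtering the country data. The sketch below works on stand-in rows copied from the output above (labels are truncated there and kept as-is). `find_indicators` is a hypothetical helper, not part of `unesco_reader`.

```python
# Stand-in rows: (indicator_id, indicator_label), copied from the output above.
variables = [
    ("ADMI.ENDOFPRIM.MAT", "Administration of a nationally-representative..."),
    ("YEARS.FC.COMP.02", "Number of years of compulsory pre-primary educ..."),
    ("YEARS.FC.COMP.1T3", "Number of years of compulsory primary and seco..."),
    ("YEARS.FC.FREE.02", "Number of years of free pre-primary education ..."),
]


def find_indicators(rows, id_prefix="", label_contains=""):
    """Hypothetical helper: match on ID prefix and case-insensitive label text."""
    needle = label_contains.lower()
    return [
        indicator_id
        for indicator_id, label in rows
        if indicator_id.startswith(id_prefix) and needle in label.lower()
    ]


compulsory = find_indicators(variables, id_prefix="YEARS.FC", label_contains="compulsory")
print(compulsory)
```

On the DataFrame itself, the pandas string accessors `Series.str.startswith` and `Series.str.contains` offer the equivalent filters.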


To refresh the data and extract the latest data from the UIS website, use the `refresh` method.

In [15]:
sdg.refresh()

INFO: Data refreshed successfully.


### Caching

Caching is used to prevent unnecessary requests to the UIS website and enhance performance.
To refresh the data returned by the module-level functions, set the `refresh` parameter to `True`.

In [16]:
uis.info(refresh=True)

name                                                               latest_update    theme
-----------------------------------------------------------------  ---------------  ---------
SDG Global and Thematic Indicators                                 February 2024    Education
Other Policy Relevant Indicators (OPRI)                            February 2024    Education
Research and Development (R&D) SDG 9.5                             February 2024    Science
Research and Development (R&D) – Other Policy Relevant Indicators  February 2024    Science
SDG 11.4                                                           February 2024    Culture
Demographic and Socio-economic Indicators                          February 2024    External
Education Non Core Archive                                         February 2020    Archive
Innovation Archive                                                 April 2017       Archive
Cultural trade Archive                                             June 202

In [17]:
uis.available_datasets(refresh=True)

['SDG Global and Thematic Indicators',
 'Other Policy Relevant Indicators (OPRI)',
 'Research and Development (R&D) SDG 9.5',
 'Research and Development (R&D) – Other Policy Relevant Indicators',
 'SDG 11.4',
 'Demographic and Socio-economic Indicators',
 'Education Non Core Archive',
 'Innovation Archive',
 'Cultural trade Archive',
 'Research and Development (R&D) Archive',
 'Cultural employment Archive',
 'Feature Film Archive']

`refresh=True` will clear the cache and force extraction of the data and information from the UIS website.
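
The cache-then-refresh behaviour described here follows a common memoization pattern. The sketch below illustrates that pattern with the standard library's `functools.lru_cache`. It is an assumption-laden illustration, not a description of how `unesco_reader` implements its cache: `fetch_dataset`, `fetch`, and the simulated download counter are all hypothetical.

```python
from functools import lru_cache

download_count = 0  # counts simulated downloads


@lru_cache(maxsize=None)
def fetch_dataset(name):
    """Hypothetical stand-in for an expensive download from a website."""
    global download_count
    download_count += 1
    return (name, f"payload #{download_count}")


def fetch(name, refresh=False):
    """Return cached data unless `refresh` is set, in which case clear the cache first."""
    if refresh:
        fetch_dataset.cache_clear()
    return fetch_dataset(name)


first = fetch("SDG 11.4")                 # triggers a simulated download
second = fetch("SDG 11.4")                # served from the cache
third = fetch("SDG 11.4", refresh=True)   # cache cleared, downloaded again
print(download_count)
```

Repeated calls reuse the stored result, and only an explicit refresh pays the download cost again.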

For the `UIS` class, the `refresh` method will clear the cache and extract the latest data from the UIS website.

In [18]:
sdg.refresh()

INFO: Data refreshed successfully.


To clear all cached data, use the `clear_all_caches` function.

In [19]:
uis.clear_all_caches()

INFO: All caches cleared.
