# unesco_reader tutorial

`unesco_reader` can be used to extract, interact with, and explore UNESCO data. Current functionality is limited to UNESCO Institute for Statistics (UIS) data, which can be accessed through their [bulk download service](https://apiportal.uis.unesco.org/bdds). `unesco_reader` removes the need to download the data manually and offers some simple tools to interact with and explore the datasets. More analytical functionality is coming soon.

To explore UIS data, import the `uis` module from `unesco_reader`:

In [9]:
from unesco_reader import uis

Explore the available datasets and get information about a particular dataset:

In [3]:
uis.available_datasets() # see available datasets - returns a list of dataset codes

['SDG', 'OPRI', 'SCI', 'SDG11', 'DEM']

In [4]:
uis.available_datasets(as_names=True) # get the full names for available datasets

['SDG Global and Thematic Indicators',
 'Other Policy Relevant Indicators',
 'Research and Development (R&D) SDG 9.5',
 'SDG 11.4',
 'Demographic and Socio-economic Indicators']

In [6]:
uis.available_datasets(category='education')  # list only the datasets in a given category

['SDG', 'OPRI']

In [2]:
uis.dataset_info('SDG')  # get information about the SDG dataset

----------------  ----------------------------------
dataset_name      SDG Global and Thematic Indicators
dataset_code      SDG
dataset_category  education
----------------  ----------------------------------


To access and explore the data for a particular UIS dataset, use the `UIS` class:

In [4]:
# First, instantiate a `UIS` object, passing the code or name of the dataset you want to explore
# Here we create an object for the `SDG` dataset

sdg = uis.UIS('SDG') # you can also pass the dataset name `SDG Global and Thematic Indicators`
sdg

<unesco_reader.uis.UIS at 0x1f13e4ed3c0>

You can get information about the dataset, such as its name, code, category, and the link to download the zipped data file:


In [15]:
sdg.dataset_code

'SDG'

In [14]:
sdg.dataset_name

'SDG Global and Thematic Indicators'

In [16]:
sdg.dataset_category

'education'

In [17]:
sdg.link

'https://apimgmtstzgjpfeq2u763lag.blob.core.windows.net/content/MediaLibrary/bdds/SDG.zip'

To explore the data, use the `load_data()` method, which loads the data into the object by downloading it from UNESCO, cleaning it, and formatting it as a pandas DataFrame.

If you have already downloaded the zipped file, you can pass its path, and the data will be read from the local file instead of being downloaded.

In [5]:
sdg.load_data() # optionally pass `local_path = "path to zipped file..."` to use a locally downloaded file

INFO 2023-01-29 14:32:11,791 [uis.py:load_data:372] Data loaded for dataset: SDG


<unesco_reader.uis.UIS at 0x1f13e4ed3c0>
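If you reload the same dataset often, one option is to download the zipped file once and point `load_data()` at it through `local_path`. The helper below is a minimal standard-library sketch, not part of `unesco_reader`: `download_zip` is a hypothetical name, and it only assumes that the URL (for example the one returned by `sdg.link`) serves a zip archive.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_zip(url: str, destination: str) -> Path:
    """Download a zip archive from `url` to `destination` and check that it is a valid zip.

    Hypothetical helper for illustration; it is not part of unesco_reader.
    """
    dest = Path(destination)
    # create the target folder if it does not exist yet
    dest.parent.mkdir(parents=True, exist_ok=True)
    # urlretrieve copies the resource to `dest`; it also accepts file:// URLs
    urllib.request.urlretrieve(url, dest)
    if not zipfile.is_zipfile(dest):
        dest.unlink()  # remove the invalid download so it is not reused by mistake
        raise ValueError(f"{url} did not return a zip archive")
    return dest
```

With a loaded `UIS` object, usage would look something like `sdg.load_data(local_path=str(download_zip(sdg.link, "data/SDG.zip")))`; later sessions can then call `load_data(local_path="data/SDG.zip")` without downloading again.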

Now that the data is loaded to the object you can start exploring it!

To get general information about the dataset, use the `info()` method:

In [6]:
sdg.info()

--------------------  ----------------------------------
dataset_name          SDG Global and Thematic Indicators
dataset_code          SDG
dataset_category      education
available indicators  1609
available countries   241
time range            1950 - 2022
available regions     179
--------------------  ----------------------------------


You can take a look at the available indicators

In [7]:
# return a list of available indicator codes
indicators = sdg.available_indicators()
indicators[0:5]  # these are only the first 5 indicators

['ADMI.ENDOFLOWERSEC.MAT',
 'ADMI.ENDOFLOWERSEC.READ',
 'ADMI.ENDOFPRIM.MAT',
 'ADMI.ENDOFPRIM.READ',
 'ADMI.GRADE2OR3PRIM.MAT']

In [8]:
# get the names of indicators
indicator_names = sdg.available_indicators(as_names=True)
indicator_names[0:5] # these are the first 5 indicators

[' Administration of a nationally-representative learning assessment at the end of lower secondary education in mathematics (number)',
 ' Administration of a nationally-representative learning assessment at the end of lower secondary education in reading (number)',
 ' Administration of a nationally-representative learning assessment at the end of primary in mathematics (number)',
 ' Administration of a nationally-representative learning assessment at the end of primary in reading (number)',
 ' Administration of a nationally representative learning assessment in Grade 2 or 3 in mathematics (number)']
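Because codes and names come back as two separate lists, it can be handy to pair them and search by keyword. The sketch below assumes both calls return indicators in the same order (the first five entries above suggest this, but it is an assumption) and strips the leading space visible in the names. `build_lookup` and `search` are hypothetical helpers, and the sample lists are copied from the output above.

```python
from __future__ import annotations


def build_lookup(codes: list[str], names: list[str]) -> dict[str, str]:
    """Map each code to its name, assuming both lists share the same order."""
    if len(codes) != len(names):
        raise ValueError("codes and names must have the same length")
    # strip() removes the stray leading whitespace seen in the names
    return {code: name.strip() for code, name in zip(codes, names)}


def search(lookup: dict[str, str], keyword: str) -> list[str]:
    """Return the codes whose name contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return [code for code, name in lookup.items() if keyword in name.lower()]


# Sample values copied from the notebook output above
codes = [
    'ADMI.ENDOFLOWERSEC.MAT',
    'ADMI.ENDOFLOWERSEC.READ',
    'ADMI.ENDOFPRIM.MAT',
    'ADMI.ENDOFPRIM.READ',
    'ADMI.GRADE2OR3PRIM.MAT',
]
names = [
    ' Administration of a nationally-representative learning assessment at the end of lower secondary education in mathematics (number)',
    ' Administration of a nationally-representative learning assessment at the end of lower secondary education in reading (number)',
    ' Administration of a nationally-representative learning assessment at the end of primary in mathematics (number)',
    ' Administration of a nationally-representative learning assessment at the end of primary in reading (number)',
    ' Administration of a nationally representative learning assessment in Grade 2 or 3 in mathematics (number)',
]

lookup = build_lookup(codes, names)
print(search(lookup, "reading"))  # → ['ADMI.ENDOFLOWERSEC.READ', 'ADMI.ENDOFPRIM.READ']
```

With the real object, the lookup would be built from the full lists, e.g. `build_lookup(sdg.available_indicators(), sdg.available_indicators(as_names=True))`.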

You can explore the countries that are available

In [8]:
# get a list of available countries
countries = sdg.available_countries()
countries[0:10] # these are only the first 10 countries

['AFG', 'ALB', 'DZA', 'ASM', 'AND', 'AGO', 'AIA', 'ATG', 'ARG', 'ARM']

In [9]:
# get the list of countries as country names
country_names = sdg.available_countries(as_names=True)
country_names[0:5]

['Afghanistan', 'Albania', 'Algeria', 'American Samoa', 'Andorra']

In [11]:
# You can also see which countries belong to a particular region
# here we will see which countries belong to the World Bank's country grouping for MENA
mena = sdg.available_countries(as_names=True, region='WB: Middle East & North Africa')
mena

['Algeria',
 'Bahrain',
 'Egypt',
 'Djibouti',
 'Iran (Islamic Republic of)',
 'Iraq',
 'Israel',
 'Jordan',
 'Kuwait',
 'Lebanon',
 'Libya',
 'Malta',
 'Morocco',
 'Oman',
 'Palestine',
 'Qatar',
 'Saudi Arabia',
 'Syrian Arab Republic',
 'Tunisia',
 'United Arab Emirates',
 'Yemen']

In [13]:
# You can also see which regions are available.
# Note that some datasets may not have regional data, in which case calling this method may raise an error explaining that regional data is not available

# Additional functionality to explore regional groupings by source (e.g. AIMS, WB) is coming soon

regions = sdg.available_regions()
regions[0:5] # these are only the first 5 regions

['AIMS: Asia and the Pacific',
 'AIMS: Central Asia',
 'AIMS: East Asia',
 'AIMS: East Asia and the Pacific',
 'AIMS: Pacific']
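Region names appear to carry their source as a prefix before the colon (for example `AIMS:` or `WB:`). Until grouping by source is available in the package, a minimal sketch like the one below can split the list returned by `available_regions()` by that prefix. It assumes the `SOURCE: name` pattern holds for every region; `group_regions_by_source` is a hypothetical helper, and names without a colon go under a catch-all `"other"` key. The sample values come from the outputs in this tutorial.

```python
from __future__ import annotations

from collections import defaultdict


def group_regions_by_source(regions: list[str]) -> dict[str, list[str]]:
    """Group region names by the source prefix before the first colon."""
    groups: dict[str, list[str]] = defaultdict(list)
    for region in regions:
        # partition splits on the first ":" only; sep is "" when there is no colon
        source, sep, name = region.partition(":")
        if sep:
            groups[source.strip()].append(name.strip())
        else:
            # no "SOURCE:" prefix, so keep the full name under a catch-all key
            groups["other"].append(region.strip())
    return dict(groups)


# Sample values copied from the notebook outputs above
regions = [
    'AIMS: Asia and the Pacific',
    'AIMS: Central Asia',
    'AIMS: East Asia',
    'AIMS: East Asia and the Pacific',
    'AIMS: Pacific',
    'WB: Middle East & North Africa',
]
print(group_regions_by_source(regions))
```

Applied to the full output of `sdg.available_regions()`, the keys of the resulting dictionary give a quick overview of which sources provide regional groupings.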

To get the data, use the `get_data()` method:

In [6]:
df = sdg.get_data()

In [7]:
# if you are interested in regional data, you can specify the grouping
df = sdg.get_data(grouping='regional')

In [8]:
# You can also include metadata in the output DataFrame
df = sdg.get_data(include_metadata=True)

Much more functionality is coming soon! If you have suggestions to improve or add to the package, please contribute by opening an issue!