# unesco_reader tutorial

Pythonic access to UNESCO data

`unesco_reader` is a Python package providing a simple interface to UNESCO Institute for Statistics (UIS)
data. UIS currently does not offer API access to its data. Users must download zipped files and extract the data.
This process requires several manual steps, explained in their [Python tutorial](https://apiportal.uis.unesco.org/bdds-tutorial).
This package simplifies the process by providing a simple interface to access, explore, and analyze the data
using pandas DataFrames. It also lets users view dataset documentation and other information, such as the
latest update date of a dataset and the list of all datasets available from UIS.
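
To give a feel for the manual steps this package replaces, here is a minimal sketch of reading a CSV file out of a zipped archive using only the standard library. The archive is built in memory so the example runs offline; the file name `EXAMPLE_DATA.csv` and the column names are made up for illustration and do not reflect the actual layout of UIS bulk files.

```python
import csv
import io
import zipfile

# Build a tiny in-memory archive standing in for a downloaded bulk file.
# The member name and columns are hypothetical, not the real UIS layout.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "EXAMPLE_DATA.csv",
        "indicator_id,country_id,year,value\nX.1,KEN,2020,1.5\n",
    )


def read_csv_from_zip(zip_bytes, member):
    """Return the rows of one CSV member of a zip archive as dictionaries."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open(member) as raw:
            # Zip members are opened as bytes; wrap them to decode as text
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


rows = read_csv_from_zip(buffer.getvalue(), "EXAMPLE_DATA.csv")
print(rows)
```

With a real download, the same steps would have to be repeated for each data, metadata, and label file in the archive, which is the bookkeeping the package takes care of.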

### Basic Usage

Start by installing and importing `unesco_reader`.

In [27]:
!pip install unesco-reader > /dev/null

In [22]:
import unesco_reader as uis

Retrieve information about all the available datasets from UIS.

In [2]:
uis.info()

Retrieve a list of all available datasets from UIS.

In [3]:
uis.available_datasets()

Optionally, specify a theme to filter the datasets.

In [4]:
uis.available_datasets(theme='Education')

To access data for a particular dataset, use the `UIS` class, passing the name of the dataset.
A `UIS` object allows a user to easily access, explore, and analyze the data.
On instantiation, the data is extracted from the UIS website or, if it has already been
extracted, read from the cache (more on caching below).

In [5]:
sdg = uis.UIS("SDG Global and Thematic Indicators")
sdg

Basic information about the dataset can be accessed using the `info` method.

In [6]:
sdg.info()

Information is also accessible through the attributes of the object.

In [7]:
name = sdg.name
update = sdg.latest_update
theme = sdg.theme

print(f"Name: {name}\nUpdate: {update}\nTheme: {theme}")

The `readme` attribute contains the dataset documentation. To display the documentation, use the `display_readme` method.

In [8]:
sdg.display_readme()

Various methods exist to access the data.
To access country data:

In [9]:
sdg.get_country_data()

This returns a pandas DataFrame with the country data in a consistent, structured format.
By default, the DataFrame does not contain metadata. To include metadata in the output, set the `include_metadata` parameter to `True`.
Countries may also be filtered to a specific region by passing the region's ID to the `region` parameter.
To see the available regions, use the `get_regions` method.

In [10]:
sdg.get_country_data(include_metadata=True, region='WB: World')
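
As a rough illustration of what filtering by region amounts to, the sketch below applies the same idea to a made-up table with plain pandas. It is not the package's implementation: the column names (`country_id`, `region_id`, `year`, `value`), the helper `filter_by_region`, and the region ID `EXAMPLE: Subset` are all hypothetical.

```python
import pandas as pd

# Hypothetical country-level data; the column names are illustrative only
data = pd.DataFrame(
    {
        "country_id": ["KEN", "FRA", "BRA", "KEN"],
        "year": [2020, 2020, 2020, 2021],
        "value": [1.0, 2.0, 3.0, 4.0],
    }
)

# Hypothetical region membership table: one row per (region, country) pair
regions = pd.DataFrame(
    {
        "region_id": ["WB: World", "WB: World", "WB: World", "EXAMPLE: Subset"],
        "country_id": ["KEN", "FRA", "BRA", "KEN"],
    }
)


def filter_by_region(df, regions, region_id):
    """Keep only the rows whose country belongs to the given region."""
    members = regions.loc[regions["region_id"] == region_id, "country_id"]
    return df[df["country_id"].isin(members)].reset_index(drop=True)


subset = filter_by_region(data, regions, "EXAMPLE: Subset")
print(subset)
```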

Metadata, available countries, available regions, and variables are also accessible through methods of the object.

In [11]:
sdg.get_metadata() # metadata

In [12]:
sdg.get_countries() # available countries

In [13]:
sdg.get_regions() # available regions

In [14]:
sdg.get_variables() # available variables

To refresh the data and extract the latest data from the UIS website, use the `refresh` method.

In [15]:
sdg.refresh()

### Caching

Caching is used to prevent unnecessary requests to the UIS website and enhance performance.
To refresh data returned by functions, use the `refresh` parameter.

In [16]:
uis.info(refresh=True)

In [17]:
uis.available_datasets(refresh=True)

`refresh=True` will clear the cache and force extraction of the data and information from the UIS website.
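
The general pattern behind a `refresh` parameter can be sketched with the standard library's `functools.lru_cache`. This is only an illustration of the idea under simple assumptions, not how `unesco_reader` is implemented; the names `_fetch_dataset_list` and `available_datasets_sketch` are hypothetical, and the "request" is simulated with a counter.

```python
from functools import lru_cache

calls = {"n": 0}  # counts simulated requests to the remote site


@lru_cache(maxsize=None)
def _fetch_dataset_list():
    """Stand-in for an expensive request; the result is cached after the first call."""
    calls["n"] += 1
    return ("dataset A", "dataset B")


def available_datasets_sketch(refresh=False):
    """Return the cached list, clearing the cache first when refresh is True."""
    if refresh:
        _fetch_dataset_list.cache_clear()
    return list(_fetch_dataset_list())


available_datasets_sketch()              # first call performs the "request"
available_datasets_sketch()              # second call is served from the cache
requests_before_refresh = calls["n"]

available_datasets_sketch(refresh=True)  # cache cleared, "request" repeated
requests_after_refresh = calls["n"]

print(requests_before_refresh, requests_after_refresh)
```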

For a `UIS` object, the `refresh` method clears the cache and extracts the latest data from the UIS website.

In [18]:
sdg.refresh()

To clear all cached data, use the `clear_all_caches` function.

In [19]:
uis.clear_all_caches()