Pythonic access to UNESCO data
unesco_reader
is a Python package that provides a simple interface to access UNESCO Institute of Statistics (UIS)
data. UIS currently does not offer API access to its data. Users must download zipped files and extract the data.
This process requires several manual steps explained in their python tutorial. This package simplifies the process by providing a simple
interface to access, explore, and analyze the data, already structured and formatted through pandas DataFrames. This package also
allows users to view dataset documentation and other information such as the date of last update, as well as retrieve
information about all available datasets from UIS.
UIS data is expected to be accessible through the DataCommons API in the future and should be the preferred method to access the data. Future versions of this package may include support for the API, or may be deprecated and remain as a legacy package.
This package is designed to scrape data from the UIS website. As a result of this approach the package may be subject to breakage if the website structure or data file formats change without notice. Please report any unexpected errors or issues you encounter. All feedback, suggestions, and contributions are welcome!
$ pip install unesco-reader
Importing the package
import unesco_reader as uis
Retrieve information about all the available datasets from UIS.
uis.info()
This function will display all available datasets and relevant information about them.
>>>
name latest_update theme
----------------------------------------------------------------- --------------- ---------
SDG Global and Thematic Indicators February 2024 Education
Other Policy Relevant Indicators (OPRI) February 2024 Education
Research and Development (R&D) SDG 9.5 February 2024 Science
Research and Development (R&D) – Other Policy Relevant Indicators February 2024 Science
...
Retrieve a list of all available datasets from UIS.
uis.available_datasets()
>>> ['SDG Global and Thematic Indicators',
'Other Policy Relevant Indicators (OPRI)',
'Research and Development (R&D) SDG 9.5',
...]
Optionally you can specify a theme to filter the datasets.
uis.available_datasets(theme='Education')
To access data for a particular dataset, use the UIS
class passing the name of the dataset.
A UIS
object allows a user to easily access, explore, and analyse the data.
On instantiation, the data will be extracted from the UIS website, or if it has already been
extracted, it will be read from the cache (more on caching below)
from unesco_reader import UIS
sdg = UIS("SDG Global and Thematic Indicators")
Basic information about the dataset can be accessed using the info
method.
sdg.info()
This will display information about the dataset, such as the name, and the latest update, and theme
>>>
------------- ----------------------------------
name SDG Global and Thematic Indicators
latest update February 2024
theme Education
------------- ----------------------------------
Information is also accessible through the attributes of the object.
name = sdg.name
update = sdg.latest_update
theme = sdg.theme
documentation = sdg.readme
The readme
attribute contains the dataset documentation. To display the documentation, use the display_readme
method.
sdg.display_readme()
Various methods exist to access the data. To access country data:
df = sdg.get_country_data()
This will return a pandas DataFrame with the country data, in a structured and expected format.
By default the dataframe will not contain metadata. To include metadata in the output, set the include_metadata
parameter to True
.
Countries may also be filtered for a specific region by specifying the region's ID in the region
parameter.
To see available regions use the get_regions
method.
df = sdg.get_country_data(include_metadata=True, region='WB: World')
To access regional data:
df = sdg.get_region_data()
This will return a pandas DataFrame with the regional data, in a structured and expected format. Note that not all datasets contain regional data.
If the dataset does not contain regional data, an error will be raised. This is the same for any other data that is not available for the particular dataset.
By default the dataframe will not contain metadata. To include metadata in the output, set the include_metadata
parameter to True
.
Metadata, available countries, available regions, and variables are also accessible through class objects.
metadata_df = sdg.get_metadata()
countries_df = sdg.get_countries()
regions_df = sdg.get_regions()
variables_df = sdg.get_variables()
To refresh the data and extract the latest data from the UIS website, use the refresh
method.
sdg.refresh()
Caching is used to prevent unnecessary requests to the UIS website and enhance performance.
To refresh data returned by functions, use the refresh
parameter. Caching using the LRU
(Least Recently Used) algorithm approach and stores data in RAM. The cache is cleared when the
program is terminated.
uis.info(refresh=True)
uis.available_datasets(refresh=True)
refresh=True
will clear the cache and force extraction of the data and information from the UIS website.
For the UIS
class, the refresh
method will clear the cache and extract the latest data from the UIS website.
sdg.refresh()
To clear all cached data, use the clear_all_caches
method.
uis.clear_all_caches()
All contributions are welcome! If you find a bug, or have a suggestion for a new feature, or an improvement on the documentation please open an issue. Since this project is under current development, please check open issues and make sure the issue has not been raised already.
A detailed overview of the contribution process can be found here. By contributing to this project, you agree to abide by its terms.
unesco_reader
was created by Luca Picci. It is licensed under the terms of the MIT license.
unesco_reader
was created with cookiecutter
and the
py-pkgs-cookiecutter
template.