# Introduction to `geodatasets`

`geodatasets` provides an API on top of a JSON file with metadata about externally hosted datasets. These datasets contain geospatial information useful for illustrative and educational purposes.

In [1]:
import geodatasets

## What is the `geodatasets.data` object?

The `geodatasets.data` object is a dictionary whose entries can also be reached as attributes. We can explore it in plain Python, with tab completion in an interactive session, or as a collapsible inventory in Jupyter:

In [2]:
geodatasets.data

Datasets are normally grouped by provider, so each provider is an entry of its own:

In [3]:
geodatasets.data.geoda

Looking at a single dataset, for example “atlanta”, we can see that it is a `Dataset` object, which behaves like a dict:

In [4]:
geodatasets.data.geoda.atlanta

A `Dataset` is a dictionary with a few required entries, such as the `url`, plus optional metadata such as the attribution, size, license or description.
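To make the "dictionary with attribute access" idea concrete, here is a minimal sketch in plain Python. It is not the `geodatasets` implementation: the `AttrDict` class is hypothetical, and the `description` value is a placeholder. Only the URL is taken from this page.

```python
class AttrDict(dict):
    """Hypothetical dict subclass whose keys can also be read as attributes."""

    def __getattr__(self, key):
        # __getattr__ is only called when normal attribute lookup fails,
        # so dict methods such as .keys() keep working as usual.
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err


# Illustrative nesting: provider -> dataset -> metadata entries.
data = AttrDict(
    geoda=AttrDict(
        atlanta=AttrDict(
            url="https://geodacenter.github.io/data-and-lab//data/atlanta_hom.zip",
            description="placeholder text, not the real description",
        )
    )
)

# Attribute access and key access reach the same value.
print(data.geoda.atlanta.url)
print(data["geoda"]["atlanta"]["url"] == data.geoda.atlanta.url)
```

Because the object is still a `dict`, the usual tools such as `.keys()`, `.items()` and `in` remain available alongside the dotted access.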

## Downloading and caching

You can use `geodatasets` to download and cache individual datasets, or just retrieve their URL. The top-level API accepts a string and returns the matching `Dataset` from the `Bunch`. A query matches an item if it contains the same letters in the same order as the item’s name, irrespective of letter case, spaces, dashes and other characters.

In [5]:
geodatasets.get_url("geoda atlanta")

'https://geodacenter.github.io/data-and-lab//data/atlanta_hom.zip'
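The matching rule described above can be sketched as a normalization step followed by a comparison. This is an illustration of the idea, not the library's actual code: `normalize`, `find_dataset` and the `known` list are hypothetical names, and the sketch assumes that "other characters" means anything that is not a letter or a digit.

```python
import re


def normalize(name):
    """Lowercase a name and drop everything that is not a letter or a digit."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_dataset(query, names):
    """Return the first name whose normalized form equals the normalized query."""
    wanted = normalize(query)
    for name in names:
        if normalize(name) == wanted:
            return name
    raise ValueError(f"No matching item found for the query {query!r}.")


# Hypothetical inventory of dotted names; only "geoda.atlanta" appears on this page.
known = ["example.other", "geoda.atlanta"]

print(find_dataset("geoda atlanta", known))
print(find_dataset("GEODA-Atlanta", known))
```

Both queries resolve to the same entry, because spaces, dashes, dots and letter case are removed before the comparison.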

We can also ask for a local path. If the file is already in the cache, its path is returned straight away; otherwise the file is downloaded first.

In [6]:
geodatasets.get_path("geoda atlanta")

Downloading file 'atlanta_hom.zip' from 'https://geodacenter.github.io/data-and-lab//data/atlanta_hom.zip' to '/Users/martin/Library/Caches/geodatasets'.


'/Users/martin/Library/Caches/geodatasets/atlanta_hom.zip'
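The "return from cache, otherwise download first" behaviour can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not how `geodatasets` implements it: `get_cached_path` and `fake_download` are hypothetical, and the download is replaced by a stand-in that writes placeholder bytes so that the example runs offline.

```python
import tempfile
from pathlib import Path


def get_cached_path(filename, cache_dir, download):
    """Return the local path of `filename`, calling `download` only on a cache miss."""
    target = Path(cache_dir) / filename
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        download(target)  # hypothetical callable that writes the file to `target`
    return target


calls = []


def fake_download(target):
    # Stand-in for a real HTTP download; writes placeholder bytes instead.
    calls.append(target.name)
    target.write_bytes(b"placeholder")


with tempfile.TemporaryDirectory() as tmp:
    first = get_cached_path("atlanta_hom.zip", tmp, fake_download)   # cache miss
    second = get_cached_path("atlanta_hom.zip", tmp, fake_download)  # cache hit
    same_path = first == second

print(len(calls))
print(same_path)
```

The second call finds the file already on disk, so the download stand-in runs only once and both calls return the same path.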