# Fetch files from remote resources using pooch

Including (large) input data files in repositories can lead to problems with version control performance or licensing.

One way to tackle this issue is to fetch remote resources on demand. 

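The [pooch](https://www.fatiando.org/pooch/latest/) library makes this easy: it downloads a file only when it is not already available locally, and it can verify the download against a known hash.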
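To make "on demand" concrete, here is a minimal standard-library sketch of the core idea that pooch automates: reuse a local copy when it exists and matches an expected MD5 hash, and download it otherwise. `fetch_on_demand` and `md5_of` are hypothetical helpers for illustration only; they are not part of pooch's API.

```python
import hashlib
import urllib.request
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_on_demand(url: str, target: Path, expected_md5: str) -> Path:
    """Download `url` to `target` only if no valid local copy exists (hypothetical helper)."""
    if target.exists() and md5_of(target) == expected_md5:
        return target  # cache hit: reuse the local file without touching the network
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, target)  # missing or corrupted file: download it
    if md5_of(target) != expected_md5:
        target.unlink()  # do not keep a file that fails the integrity check
        raise ValueError(f"MD5 mismatch for {target}")
    return target
```

pooch builds a lot on top of this idea, such as DOI resolution, progress bars and post-download processors like unzipping.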

## Example usage of pooch

* We want to fetch the files from the Zenodo entry https://zenodo.org/record/5727072/.
* This entry is published under an open license, so no user credentials are necessary. pooch can also work with user credentials; for details, see the [pooch docs](https://www.fatiando.org/pooch/latest/authentication.html).
* By default, pooch saves the downloaded file to the operating system's default cache folder. This can be changed by providing a `path` argument.
* Similarly, pooch names the file by combining the last part of the URL with a hash of the URL (see the [`pooch.retrieve` docs](https://www.fatiando.org/pooch/latest/api/generated/pooch.retrieve.html#pooch.retrieve) for details). This name can be customized by providing the `fname` argument.
* Since we are downloading a zip file, we want to unzip it to access its contents. The `processor` argument of pooch accepts post-download actions; for unzipping we use the predefined [`pooch.Unzip()`](https://www.fatiando.org/pooch/latest/api/generated/pooch.Unzip.html).
* In the following example we use `pooch.retrieve` to download our files. When a processor is given, `pooch.retrieve` returns the processor's output, which for `pooch.Unzip` is a list of paths to the extracted files, so we can work with them directly.
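To get a feel for the default file name mentioned above, the sketch below approximates it with the standard library: an MD5 hash of the URL, a dash, then the last part of the URL. `approx_pooch_name` is a hypothetical helper, and the exact format (pooch also crops very long names) may differ between pooch versions, so treat the output as illustrative only.

```python
import hashlib
from urllib.parse import urlparse


def approx_pooch_name(url: str) -> str:
    """Approximate pooch's default file name: MD5 of the URL + '-' + last URL component."""
    url_hash = hashlib.md5(url.encode()).hexdigest()  # 32 hex characters
    last_part = urlparse(url).path.split("/")[-1]  # e.g. "COMMIT_data_v1.1.zip"
    return f"{url_hash}-{last_part}"


print(approx_pooch_name("doi:10.5281/zenodo.5727072/COMMIT_data_v1.1.zip"))
```

To see where such files land when no `path` is given, `pooch.os_cache("pooch")` returns the default cache folder for your operating system.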

```python
from pathlib import Path

import pooch

commit_data_name = "COMMIT_data_v1.1.zip"

files = pooch.retrieve(
    url="doi:10.5281/zenodo.5727072/COMMIT_data_v1.1.zip",  # download from a DOI
    known_hash="md5:62cd6e12fa21d12d7ce8e6b21739e440",  # known hash of the zip file from Zenodo
    fname=commit_data_name,  # keep the same file name as on Zenodo
    path=".",  # save the download to the current working directory
    progressbar=True,  # show a progress bar while downloading (requires tqdm)
    # Unzip into a dedicated folder ("COMMIT_data" is our choice), resolved relative to `path`.
    # Unzip returns every file found in extract_dir, so extracting into "." would also
    # list unrelated files from the working directory.
    processor=pooch.Unzip(extract_dir="COMMIT_data"),
)

# The zip archive is no longer needed once unpacked. Note that re-running this cell
# downloads it again, because pooch can no longer find a local copy to check.
Path(commit_data_name).unlink()
```

```python
print(files)
```

```python
import pyam

# Now that the data file is available locally, we can start our analysis.
# files[0] assumes the archive contains a single data file.
data = pyam.IamDataFrame(files[0])
```
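Indexing with `files[0]` relies on the archive containing exactly one file. A slightly more defensive sketch picks the data file by extension and fails loudly otherwise. `pick_data_file` is a hypothetical helper, and the default suffixes `.xlsx` and `.csv` are an assumption (common input formats for pyam), since the archive's contents are not listed here.

```python
from pathlib import Path


def pick_data_file(paths, suffixes=(".xlsx", ".csv")):
    """Return the single path whose suffix is in `suffixes` (hypothetical helper)."""
    matches = [Path(p) for p in paths if Path(p).suffix.lower() in suffixes]
    if len(matches) != 1:
        # zero or several candidates: refuse to guess which file to analyse
        raise ValueError(f"expected exactly one data file, found {len(matches)}: {matches}")
    return matches[0]
```

With this helper, the last cell would read `data = pyam.IamDataFrame(str(pick_data_file(files)))`.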