# Get the latest resource from a given dataset on data.public.lu

Suppose that you need to download the latest resource from a dataset every month, for example to import it into your database. With the API available on data.public.lu, you can automate this download.

First, we import the required modules and read the base URL of the API from the environment (here loaded from a `.env` file):

In [39]:
import os
import re
import requests
from dotenv import load_dotenv

load_dotenv()  # read the variables defined in the .env file
API = os.environ.get('API')  # base URL of the API
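
If the `API` variable is missing, `os.environ.get('API')` silently returns `None` and the error only shows up later, when the request URL is built. The following is a minimal sketch of a stricter lookup; the helper name `get_api_base` and the example URL are hypothetical and not part of the original notebook.

```python
import os

def get_api_base(env=None):
    """Return the API base URL from the environment, without a trailing slash.

    Hypothetical helper: raises early instead of returning None.
    """
    env = os.environ if env is None else env
    base = env.get('API')
    if not base:
        raise RuntimeError('The API variable is not set (environment or .env file)')
    # the notebook builds URLs as '{}/datasets/{}/', so drop any trailing slash
    return base.rstrip('/')

# example.org is a placeholder, not the real API address
print(get_api_base({'API': 'https://example.org/api/1/'}))
```

Passing a plain dictionary makes the helper easy to try out without touching the real environment.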

## Parameters

Here we set the identifier of the dataset. You can find it, for example, in the URL of the dataset's page.
Data producers can publish different kinds of files in a dataset, and some even add the documentation as a resource. In this case, it helps to filter the resources with a regular expression.

In [40]:
# the id of the Dataset you want to get
dataset = 'parc-automobile-du-luxembourg'

# the pattern matching the names of the files that can be downloaded
fPattern = '^Parc_Automobile_\\d{6}.xlsx$'
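
To see what the pattern accepts, it can be tried on a few titles before calling the API. In the sketch below the titles are made up for illustration, and `strictPattern` is a hypothetical variant that escapes the dot; in `fPattern` the unescaped `.` matches any single character, not only a literal dot.

```python
import re

fPattern = '^Parc_Automobile_\\d{6}.xlsx$'        # pattern used in this notebook
strictPattern = r'^Parc_Automobile_\d{6}\.xlsx$'  # hypothetical variant with an escaped dot

# made-up titles, for illustration only
names = [
    'Parc_Automobile_202206.xlsx',
    'Parc_Automobile_202206_xlsx',
    'Documentation.pdf',
]

for name in names:
    # print whether each pattern matches the title
    print(name, bool(re.search(fPattern, name)), bool(re.search(strictPattern, name)))
```

Only the second title is treated differently by the two patterns, so for regularly named files either one selects the same resources.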

## Find the right resource

Here we call the API to get all the information about the dataset, which includes its list of resources. We keep only the resources whose title matches the previously defined pattern and sort them by publication date, newest first.
The first resource of this list is thus the latest one and we can proceed to download it.

In [41]:
r = requests.get('{}/datasets/{}/'.format(API, dataset))
r.raise_for_status()

# keep only the resources whose title matches the pattern
resources = [x for x in r.json()['resources'] if re.search(fPattern, x['title'])]
# sort by publication date, newest first
resources.sort(key=lambda x: x['published'], reverse=True)
print('files available in the dataset with id "{}" and matching the pattern /{}/:'.format(dataset, fPattern))
print([x['title'] for x in resources])

files available in the dataset with id "parc-automobile-du-luxembourg" and matching the pattern /^Parc_Automobile_\d{6}.xlsx$/:
['Parc_Automobile_202206.xlsx', 'Parc_Automobile_202205.xlsx', 'Parc_Automobile_202204.xlsx', 'Parc_Automobile_202203.xlsx', 'Parc_Automobile_202202.xlsx', 'Parc_Automobile_202201.xlsx', 'Parc_Automobile_202112.xlsx', 'Parc_Automobile_202111.xlsx', 'Parc_Automobile_202110.xlsx', 'Parc_Automobile_202109.xlsx', 'Parc_Automobile_202108.xlsx', 'Parc_Automobile_202107.xlsx', 'Parc_Automobile_202106.xlsx', 'Parc_Automobile_202105.xlsx', 'Parc_Automobile_202104.xlsx', 'Parc_Automobile_202103.xlsx', 'Parc_Automobile_202102.xlsx', 'Parc_Automobile_202101.xlsx', 'Parc_Automobile_202012.xlsx', 'Parc_Automobile_202011.xlsx', 'Parc_Automobile_202010.xlsx', 'Parc_Automobile_202009.xlsx', 'Parc_Automobile_202008.xlsx', 'Parc_Automobile_202007.xlsx', 'Parc_Automobile_202006.xlsx', 'Parc_Automobile_202005.xlsx', 'Parc_Automobile_202004.xlsx', 'Parc_Automobile_202003.xlsx', 'Pa
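
The filter-and-sort step can also be tried without calling the API. The sketch below wraps it in a hypothetical helper, `latest_matching`, and runs it on made-up resource records that only carry the two fields used above (`title` and `published`). It assumes the publication dates are timestamp strings in one consistent ISO 8601 format, so that comparing them as strings gives chronological order.

```python
import re

def latest_matching(resources, pattern):
    """Return the most recently published resource whose title matches, or None."""
    matching = [res for res in resources if re.search(pattern, res['title'])]
    # assumption: 'published' values share one ISO 8601 format and compare as strings
    return max(matching, key=lambda res: res['published'], default=None)

# made-up records, for illustration only
sample = [
    {'title': 'Parc_Automobile_202205.xlsx', 'published': '2022-05-03T10:00:00'},
    {'title': 'Documentation.pdf', 'published': '2022-07-01T09:00:00'},
    {'title': 'Parc_Automobile_202206.xlsx', 'published': '2022-06-02T10:00:00'},
]

latest = latest_matching(sample, '^Parc_Automobile_\\d{6}.xlsx$')
print(latest['title'])
```

Using `max` with `default=None` avoids sorting the whole list when only the newest resource is needed, and returns `None` when nothing matches.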

## Download the file

Once the resource has been found, we know its URL on data.public.lu. We download the file and store it in the current folder, using the resource title as the file name.

In [38]:
if resources:
    print('Downloading: '+ resources[0]['title'])
    s = requests.get(resources[0]['url'])
    s.raise_for_status()
    with open(resources[0]['title'], 'wb') as f:
        f.write(s.content)
    print('Downloaded!')



Downloading: Parc_Automobile_202206.xlsx
Downloaded!
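
For large files, or when the download runs unattended every month, it can help to write the file in chunks and to rename it only once it is complete, so that an interrupted run does not leave a partial file behind. The following is a sketch of that idea and not the method used in this notebook; `write_chunks` is a hypothetical helper, and the commented lines show how it could be fed from a streamed `requests` response.

```python
import os
import tempfile

def write_chunks(chunks, dest):
    """Write an iterable of byte chunks to dest, going through a temporary file."""
    folder = os.path.dirname(os.path.abspath(dest))
    fd, tmp = tempfile.mkstemp(dir=folder)
    try:
        with os.fdopen(fd, 'wb') as f:
            for chunk in chunks:
                f.write(chunk)
        os.replace(tmp, dest)  # rename only once the file is complete
    except BaseException:
        os.remove(tmp)  # do not leave a partial file behind
        raise

# With requests, the chunks could come from a streamed response:
#   with requests.get(resources[0]['url'], stream=True) as s:
#       s.raise_for_status()
#       write_chunks(s.iter_content(chunk_size=8192), resources[0]['title'])

# small local demonstration with in-memory chunks
demo = os.path.join(tempfile.mkdtemp(), 'demo.bin')
write_chunks([b'abc', b'def'], demo)
print(os.path.getsize(demo))
```

The temporary file is created in the destination folder so that the final rename stays on the same file system.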
