This repo will guide you through challenge 3, in which you will implement a data elaboration that reads data from and writes data to the Open Data Hub.
The goal of this bootcamp is to foster our community around the Open Data Hub, to get to know each other, the technology, but also for you to just learn stuff and make friends. This is not a competition, so don't be afraid to make mistakes and try out new stuff. Our Open Data Hub team is there to help and support you.
While the result of this bootcamp will be a prototype, if the result is convincing, we will consider integrating it into the Open Data Hub permanently, so your effort could become a permanent part of the project!
You will also have an opportunity to present the results to the public at our upcoming event, the Open Data Hub Day.
An elaboration is a small application that periodically queries time series data from the Open Data Hub, performs some aggregation or calculation, and then writes the result to the Open Data Hub.
We have elaborations ranging from something as simple as calculating the number of free parking spaces (by subtracting occupied from total capacity) to machine learning models predicting pollution metrics using traffic flow data.
Your challenge will be to implement an elaboration using data from E-charging stations. You can pick as many items as you like from this list, which is ordered by increasing difficulty. We are also open to suggestions if you think you have a cool idea that's not in the list.
- Calculate the average availability percentage of echarging stations, aggregated per hour
- Determine how many times a charging station has been used in a day (number of charging events)
- Calculate the average charging duration (aggregated per day)
- Predict the availability of an echarging station within the next hour (free choice of prediction model)
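As a starting point for the second and third items, here is a minimal sketch of how charging events and their average duration could be derived from a plug's status history. It assumes that a status of 0 means "in use" and 1 means "free" (check this against the real data), and that the samples are already sorted by timestamp. The function names are made up for this example.

```python
from datetime import datetime, timedelta

# Assumption: 0 means the plug is in use, 1 means it is free.
IN_USE = 0


def charging_sessions(samples):
    """Return (start, end) pairs for every completed charging session.

    `samples` is a list of (timestamp, status) tuples sorted by timestamp.
    A session starts at the first sample with status IN_USE and ends at
    the next sample with any other status. A session that is still open
    at the end of the list is not counted.
    """
    sessions = []
    start = None
    for timestamp, status in samples:
        if status == IN_USE and start is None:
            start = timestamp
        elif status != IN_USE and start is not None:
            sessions.append((start, timestamp))
            start = None
    return sessions


def average_duration(sessions):
    """Return the mean session length as a timedelta, or None if empty."""
    if not sessions:
        return None
    total = sum((end - start for start, end in sessions), timedelta())
    return total / len(sessions)


day = datetime(2024, 1, 1)
samples = [
    (day.replace(hour=8, minute=0), 1),
    (day.replace(hour=8, minute=10), 0),
    (day.replace(hour=8, minute=40), 1),
    (day.replace(hour=9, minute=0), 0),
    (day.replace(hour=9, minute=10), 0),
    (day.replace(hour=10, minute=0), 1),
]
sessions = charging_sessions(samples)
print(len(sessions))               # 2
print(average_duration(sessions))  # 0:45:00
```

The precision of the durations depends on how often the plug reports its status, so treat the result as an approximation.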
You are free to use whatever programming language best fits your group. All you need to do is make a couple of REST calls and wrangle some JSON. Our own elaborations are written in Java, Python and Go.
We strongly suggest you make your app run in a docker container. This way all group members can develop in a similar environment. It's also how we package our own applications on the Open Data Hub. If you don't know docker, this is a good opportunity to get to know it, and we are happy to help you if you run into any issues.
There are three APIs you will interact with.
Note that for this challenge you will interact with the Time Series / Mobility APIs, and not its sibling, the Content / Tourism domain.
In this challenge you will
- Ask Keycloak for an authorization token
- Query the Ninja API for e-charging stations and their measurements of data type 'echarging-plug-status'
- Elaborate and aggregate the data
- Push the result to the BDP API, along with a new data type and a unique provenance
Keycloak is an Open Source Identity and Access management server (keycloak.org).
We use it to authenticate and authorize our services via the OAuth2 standard.
For you, this boils down to making a REST call supplying your credentials (which we will provide to you), and you get back an authorization token.
You then have to pass this token as an Authorization: Bearer <token> HTTP header on every call to our Open Data Hub APIs.
Ninja is the name of the API you use to request time series data from the Open Data Hub. This is where you get the base data for your elaboration. The production URL is mobility.api.opendatahub.com
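To give an idea of what a Ninja request might look like, the sketch below builds a URL for historical measurements of one data type. The path layout (`/v2/flat/<station type>/<data type>/<from>/<to>`) and the `where` and `limit` query parameters are assumptions based on our reading of the API, so verify them against the Swagger documentation before relying on them. The `ninja_url` helper itself is hypothetical.

```python
from urllib.parse import quote, urlencode


def ninja_url(base_url, station_type, data_type, origin, start, end, limit=200):
    """Build a history request URL for the Ninja API.

    Assumed layout (verify against the Swagger docs):
    <base>/v2/flat/<station type>/<data type>/<from>/<to>?where=...&limit=...
    """
    segments = ("v2", "flat", station_type, data_type, start, end)
    path = "/".join(quote(segment, safe="") for segment in segments)
    # Assumption: `sorigin.eq.<origin>` filters stations by data provider.
    query = urlencode({"where": f"sorigin.eq.{origin}", "limit": limit})
    return f"{base_url.rstrip('/')}/{path}?{query}"


print(ninja_url(
    "https://mobility.api.opendatahub.com",
    "EChargingPlug",
    "echarging-plug-status",
    "DRIVE",
    "2024-01-01",
    "2024-01-02",
))
```

Keeping the base URL as a parameter makes it easy to switch between the production host and a local instance.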
BDP (Big Data Platform) is the API you use to write time series data to the Open Data Hub. This API is not intended for public use and can only be accessed with authorized credentials.
Time series data takes the form of Measurements.
A measurement is a data point with a timestamp.
Each measurement has exactly one
- Station, a geographical point with a name, ID and some additional information. It's the location where measurements are made. Think of a physical e-charging station somewhere on a parking lot, or a thermometer somewhere in a field. Stations can have a parent station, e.g. a thermometer that is part of a greater weather station
- Data type, which identifies what type of measurement it actually is. Is it a temperature? Is it the number of available cars? Is it the current occupation state of an echarging column? Is it an average, or a discrete value?
- Provenance, a unique identifier and version number of the app that provided the measurement.
- Period, the timeframe (in seconds) that the measurement references, and the periodicity with which it is updated, e.g. a temperature sensor that sends us its data every 60 seconds has a period of 60 seconds.
A Station might have measurements of 0-n Data types, for example a weather station could have both temperature and humidity measurements.
An e-charging station that we know exists, but doesn't provide any real time data, probably has no measurements at all.
Stations exist independently of measurements.
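If it helps to see these relationships as code, here is a rough sketch of the model using Python dataclasses. All class and field names are chosen for illustration only and do not mirror the actual API payloads or database schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Station:
    id: str
    name: str
    station_type: str
    latitude: float
    longitude: float
    parent_id: Optional[str] = None  # set when the station has a parent


@dataclass(frozen=True)
class DataType:
    name: str
    unit: str = ""


@dataclass(frozen=True)
class Provenance:
    collector: str  # identifier of the app that provided the measurement
    version: str


@dataclass(frozen=True)
class Measurement:
    timestamp: datetime
    value: float
    period: int  # seconds
    station: Station
    data_type: DataType
    provenance: Provenance


def data_types_of(station, measurements):
    """Return the names of all data types measured at `station`."""
    return {m.data_type.name for m in measurements if m.station.id == station.id}


plug = Station("p1", "Plug 1", "EChargingPlug", 46.5, 11.3, parent_id="s1")
silent = Station("s2", "Station without live data", "EChargingStation", 46.6, 11.4)
reading = Measurement(
    datetime(2024, 1, 1, 8, 0), 1.0, 300,
    plug, DataType("echarging-plug-status"), Provenance("example-app", "0.1.0"),
)
print(data_types_of(plug, [reading]))    # {'echarging-plug-status'}
print(data_types_of(silent, [reading]))  # set()
```

The second station shows that a station can exist without any measurements attached to it.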
For this challenge you will use the E-charging datasets, which are part of the Open Data Hub's mobility domain.
To make things simpler, we will limit ourselves to data from only one data provider that has a modest number of stations, origin: DRIVE.
If you are feeling adventurous, you can extend it to also include ALPERIA and route220.
E-charging stations are organized in two levels:
- station type EChargingStation represents a location where one or more EV chargers are located.
- station type EChargingPlug represents an individual e-charging column that is always part of an EChargingStation (its parent)
The stations have measurements of data type number-available, which indicates how many columns are currently available at the location.
The plugs have measurements of data type echarging-plug-status, which is 0 or 1, indicating if the column is in use or not.
For most challenges it makes sense to work at the plug level, and then aggregate your result up to station. Note that many of our data consumers will want to query on a station level, so you should also provide that data.
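A minimal sketch of this plug-then-station approach for the hourly availability percentage could look like the following. It assumes that a status of 1 means "available" (check this against the real data) and that plugs report at roughly regular intervals, so that the share of available samples approximates the share of available time. The function names and the `parent_of` mapping are made up for this example.

```python
from collections import defaultdict
from datetime import datetime

# Assumption: 1 means the plug is available, 0 means it is in use.
AVAILABLE = 1


def hourly_plug_availability(samples):
    """Return {(plug_id, hour): percent available} from status samples.

    `samples` is an iterable of (plug_id, timestamp, status) tuples.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [available, total]
    for plug_id, timestamp, status in samples:
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        bucket = counts[(plug_id, hour)]
        bucket[0] += 1 if status == AVAILABLE else 0
        bucket[1] += 1
    return {key: 100.0 * avail / total for key, (avail, total) in counts.items()}


def station_availability(plug_percentages, parent_of):
    """Average plug percentages per (station_id, hour).

    `parent_of` maps each plug ID to the ID of its parent station.
    """
    grouped = defaultdict(list)
    for (plug_id, hour), percent in plug_percentages.items():
        grouped[(parent_of[plug_id], hour)].append(percent)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


samples = [
    ("p1", datetime(2024, 1, 1, 8, 0), 1),
    ("p1", datetime(2024, 1, 1, 8, 30), 0),
    ("p2", datetime(2024, 1, 1, 8, 0), 1),
    ("p2", datetime(2024, 1, 1, 8, 30), 1),
]
per_plug = hourly_plug_availability(samples)
per_station = station_availability(per_plug, {"p1": "s1", "p2": "s1"})
print(per_station[("s1", datetime(2024, 1, 1, 8))])  # 75.0
```

If the reporting intervals turn out to be irregular, weighting each sample by the time until the next one would give a more accurate figure.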
For the challenge you will work with a local instance of the Time Series services.
- Install docker and docker compose (make sure compose is a recent version 2+)
- In this directory, run docker compose up
- Wait for the services to start
If everything is running correctly, you should now have a basic Open Data Hub core running:
Service | Note | Protocol | Port |
---|---|---|---|
BDP | time series writer API | http | 8081 |
Ninja | time series request API | http | 8082 |
Analytics | visual frontend | http | 8999 |
Postgis | Postgres database | postgres | 5555 |
curl.sh contains some basic calls so you know it's working.
Refer to the API documentation and wiki entries linked below for further details.
We use OAuth2 for authentication and authorization. You will need a valid bearer token to gain write access on the API. You will probably not need a token to read the data you've written from the API.
For this challenge, when using the local instance of the APIs, you can use these credentials:
client_id: odh-mobility-datacollector-development
client_secret: 7bd46f8f-c296-416d-a13d-dc81e68d0830
With these credentials, using the client_credentials flow (which effectively means just passing the credentials above in the request), obtain an authentication token from Keycloak and add it as an Authorization: Bearer header to your requests.
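The client_credentials flow can be sketched with nothing but the Python standard library, as shown below. The token URL is not given in this document, so it is left as a parameter here. Keycloak usually exposes it under `/realms/<realm>/protocol/openid-connect/token`, but the host and realm name for your setup are something you need to look up. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret):
    """Prepare the OAuth2 client_credentials request (nothing is sent yet)."""
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_token(request):
    """Send the request and return the access token from the JSON reply."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["access_token"]


def auth_header(token):
    """Header to attach to every call that needs authorization."""
    return {"Authorization": f"Bearer {token}"}


# The URL below is a placeholder; replace host and realm with the real ones.
request = build_token_request(
    "http://localhost:8080/realms/example/protocol/openid-connect/token",
    "odh-mobility-datacollector-development",
    "7bd46f8f-c296-416d-a13d-dc81e68d0830",
)
print(request.get_method(), request.full_url)
# token = fetch_token(request)    # uncomment once the URL is correct
# headers = auth_header(token)
```

Tokens expire after a while, so a long-running elaboration should be prepared to request a fresh one.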
Swagger Ninja API
Ninja API repo
BDP API swagger
BDP writer repo
Developing data collectors
Some (partly outdated) howtos
In the examples directory you can find a basic prototype of how to push data in Python and curl.
The repo with our elaborations can be found here
parking-free-slot-calculation is by far the simplest one
traffic-a22-data-quality is written in Go and contains libraries for BDP and Ninja already