rufuspollock/ckan-import

A stand-alone webapp to automate importing data from various sources into CKAN and its DataStore.

Install and Deployment

Get it from GitHub:

git clone ...

We use foreman (as provided by Heroku) to start the app:

foreman start -f Procfile-dev

Implementation

The focus is on importing Data Packages stored on GitHub.

Import of a Generic Data Package

Given the URL to a datapackage.json
- optionally, the specific file(s) that have changed
- plus the CKAN API key (our bearer token)
Identify the target CKAN dataset
- Use the DataPackage.name attribute as the dataset name
- Identify the target resources (by name)
Start the import process for each data file
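The steps above can be sketched as a small planning function. This is a minimal sketch, not the app's actual code: the function name `planImport` and the shape of its return value are hypothetical, and it assumes the datapackage.json has already been fetched and parsed.

```javascript
// Hypothetical helper: work out what to import from a parsed datapackage.json.
// `changedFiles` is an optional list of resource paths that have changed.
function planImport(dataPackage, changedFiles) {
  if (!dataPackage || !dataPackage.name) {
    throw new Error('datapackage.json must have a "name" attribute');
  }
  const resources = dataPackage.resources || [];
  // If a list of changed files is given, only import those resources.
  const selected = changedFiles && changedFiles.length
    ? resources.filter((r) => changedFiles.includes(r.path))
    : resources;
  return {
    // DataPackage.name is used as the CKAN dataset name.
    dataset: dataPackage.name,
    // Each resource name identifies the target CKAN resource.
    resources: selected.map((r) => ({
      name: r.name,
      path: r.path,
      schema: r.schema || null,
    })),
  };
}
```

The CKAN API key is deliberately left out of the plan so that it is only handled at the point where requests are actually sent.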

Import of a Single Tabular Data Package resource

Requires:

CSV file URL
Schema (optional?)
Target resource id (or dataset name + resource name?)
CKAN API key

Steps:

Load the CSV file (into memory, or do we stream?)
(?) Check that the schema is valid
Convert the schema to a DataStore schema
Send the data to the DataStore
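The "convert schema" and "send data" steps might look roughly like the sketch below. The type mapping is an assumption (one plausible mapping from Table Schema types to the PostgreSQL types the DataStore accepts), and `toDataStoreFields` and `buildDataStoreCreate` are hypothetical names. The sketch only builds the request for CKAN's `datastore_create` action; it does not send it.

```javascript
// Assumed mapping from Table Schema field types to DataStore column types.
const TYPE_MAP = {
  string: 'text',
  number: 'numeric',
  integer: 'int',
  boolean: 'bool',
  date: 'date',
  time: 'time',
  datetime: 'timestamp',
  object: 'json',
  array: 'json',
};

// Convert a Table Schema ({fields: [{name, type}]}) to DataStore fields.
// Unknown or missing types fall back to 'text'.
function toDataStoreFields(schema) {
  return (schema.fields || []).map((f) => ({
    id: f.name,
    type: TYPE_MAP[f.type] || 'text',
  }));
}

// Build (but do not send) a datastore_create request.
// `records` is an array of row objects keyed by field name.
function buildDataStoreCreate(ckanUrl, apiKey, resourceId, schema, records) {
  return {
    url: ckanUrl.replace(/\/+$/, '') + '/api/3/action/datastore_create',
    method: 'POST',
    headers: { 'Content-Type': 'application/json', Authorization: apiKey },
    body: JSON.stringify({
      resource_id: resourceId,
      fields: toDataStoreFields(schema),
      records: records,
    }),
  };
}
```

Keeping request construction separate from sending makes the schema conversion easy to test without a running CKAN instance.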

GitHub hook

Receive the webhook payload
Determine whether any action is needed
Start the Data Package import process
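A minimal sketch of the "determine if any action is needed" step, assuming a GitHub push event payload whose `commits` entries carry `added` and `modified` path lists. The function name `filesToImport` and the rule "only datapackage.json and .csv files matter" are assumptions made for illustration.

```javascript
// Hypothetical helper: inspect a GitHub push payload and list the files
// that should trigger an import.
function filesToImport(payload) {
  const changed = new Set();
  for (const commit of payload.commits || []) {
    const paths = [...(commit.added || []), ...(commit.modified || [])];
    for (const path of paths) {
      changed.add(path); // the Set de-duplicates files touched by several commits
    }
  }
  // Assumption: only the package descriptor and CSV data files are relevant.
  const relevant = [...changed].filter(
    (p) => p.endsWith('datapackage.json') || p.endsWith('.csv')
  );
  return { actionNeeded: relevant.length > 0, files: relevant };
}
```

Removed files are ignored here; handling deletions and renames is one of the open questions listed under Extras.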

Extras / questions

Handle data file renames
How do we deal with a schema change (e.g. a change in type)?
- Ans: if we drop the resource's DataStore content and recreate it we should be OK ...
Do we try to be smart with updates and only push changed rows? (Probably not.)
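The drop-and-recreate answer above could be expressed as an ordered pair of CKAN action calls: `datastore_delete` with only a `resource_id` (no filters) to drop the table, followed by `datastore_create` with the new fields and rows. This is a sketch under those assumptions; `recreateActions` is a hypothetical helper that only describes the calls and leaves sending them to the caller.

```javascript
// Hypothetical helper: describe the two action API calls needed to replace
// a resource's DataStore content after a schema change.
function recreateActions(resourceId, fields, records) {
  return [
    // 1. Drop the existing table (no filters, so everything goes).
    { action: 'datastore_delete', body: { resource_id: resourceId } },
    // 2. Recreate it with the new schema and push all rows again.
    {
      action: 'datastore_create',
      body: { resource_id: resourceId, fields: fields, records: records },
    },
  ];
}
```

Because every import rewrites the whole table, this approach also sidesteps the question of pushing only changed rows, at the cost of re-sending all data each time.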

User Stories

Personas:

Data User - less sophisticated (uses Excel but may not know what an API is)
Data Wrangler - more sophisticated (knows what an API is)

Import File and get Data API

As a Data Wrangler I want to provide my file and have it imported into CKAN so that I get a Data API

What kind of file?

CSV file
Excel file
GeoJSON file
...

How do I provide it?

Web interface
API (POST/GET URL string or POST file content)

Questions:

Do we validate the file?
Do we have some process for e.g. tweaking the field types?
What is the mapping between file and Dataset / Resource?

Implementation

DataPusher already does most of this
- What's missing is any kind of edit metadata step
- No user interface

As a XXX I want to push my data file to GitHub and have it automatically create/update the CKAN DataStore so that my Data API is up to date

This is very similar to the file import story above; the only difference is that we get push notifications (GitHub webhooks), so merge this with that example.

GitHub import

As a XXX I want to push my tabular data package to GitHub and have it automatically create/update the CKAN DataStore so that my Data API is up to date

As it is already a data package, importing should be very simple.
If the file is large we may need to worry about queues etc., but we should probably keep it simple for the present.
How do we determine which dataset to associate this with in CKAN?

One-Click Create a Dataset

As a XXX I want to provide my file and have it imported into CKAN so that I get a nice Dataset

What distinguishes this from the existing system? Ans: its one-click nature

Automated regular import

As a Data Wrangler I want to have my data file automatically re-imported at regular intervals so that the DataStore (and Data API) stays up to date with my data.
