Skip to content

rufuspollock/ckan-import

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

A stand-alone webapp to automate importing data from various sources into CKAN and its DataStore.

Install and Deployment

Get it from github:

git clone ...

We use foreman (as provided by Heroku)

foreman start -f Procfile-dev

Implementation

Focus on import of Data Package stored in Github

Import of a Generic Data Package

  • Given URL to datapackage.json
    • optionally a specific file(s) that have changed
    • plus the CKAN API Key (our bearer token)
  • Identify target CKAN dataset
    • Use DataPackage.name attribute for Dataset name
    • Identify target resources (by name)
  • Start import process for each data file

Import of a Single Tabular Data Package resource

Require

  • CSV file URL
  • schema (optional?)
  • Target resource id (or dataset name + resource name?)
  • CKAN API Key

Steps:

  • Load CSV file (into memory or do we stream?)
  • (?) Check Schema is valid
  • Convert Schema to DataStore schema
  • Send data to DataStore

Github hook

  • Receive webhook payload
  • Determine if any action needed
  • Boot up Data package import process

Extras / questions

  • Handle data file renames
  • How do we deal with a schema change (e.g. a change in type)
    • Ans: if we drop data resource content and recreate we should be ok ...
  • Do we try to be smart with updates and only push changed rows (probably not)

User Stories

Persona:

  • Data User - less sophisticated (uses Excel but may not know what an API is)
  • Data Wrangler - more sophisticated (knows what an API is)

Import File and get Data API

As a Data Wrangler I want to provide my file and have it imported into CKAN so that I get a Data API

What kind of file?

  • CSV file
  • Excel file
  • GeoJSON file
  • ...

How do I provide

  • web interface
  • API (POST/GET url string or POST file content)

Questions:

  • Do we validate the file?
  • Do we have some process for e.g. tweaking the field types
  • What is the mapping between file and Dataset / Resource

Implementation

  • DataPusher already does most of this
    • What's missing is any kind of edit metadata step
    • No user interface

As a XXX I want to push my data file to github and have it automatically create/update the CKAN DataStore so that my Data API is up to date

  • This is very similar to import file - only difference is we get push notifications (github webhooks). so merge this with that example.

Github import

As a XXX I want to push my tabular data package to github and have it automatically create/update the CKAN DataStore so that my Data API is up to date

  • As it is already a data package importing should be very simple
  • If file is large we may need to worry about queues etc but probably keep it simple for present
  • How do we determine dataset to associate this with in CKAN?

One-Click Create a Dataset

As a XXX I want to provide my file and have it imported into CKAN so that I get a nice Dataset

  • what distinguishes from existing system? Ans: one-click nature

Automated regular import

As a Data Wrangler I want to have my data file automatically re-imported at regular intervals so that the DataStore (and Data API) stays up to date with my data.

About

Webapp to automate data import into CKAN and its DataStore.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published