add support for CKAN integration #95

Open
tomkralidis opened this issue Nov 21, 2012 · 22 comments

Comments

@tomkralidis
Member

Similar to GeoNode and Open Data Catalog, CKAN carries its own data model.

Create a similar binding so that pycsw can act as the CSW server for CKAN.

(FYI closing #73 given the possible duplication messages due to the svn->git changeover)

@kalxas
Member

kalxas commented Apr 29, 2013

Since this involves tight integration with CKAN, and we currently support CKAN through the DB store, I propose deferring this issue to the next release. There is no time for 1.6.0.
Thoughts?

@kalxas
Member

kalxas commented May 5, 2014

Current status:
CKAN uses mainly 2 tables to store records: package and package_extra.
The first is a normal relational table:
https://github.com/ckan/ckan/blob/master/ckan/model/package.py#L32
The second one is a key-value pair:
https://github.com/ckan/ckan/blob/master/ckan/model/package_extra.py#L17
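
To illustrate the split between a relational table and a key-value table, here is a minimal sketch that folds key-value rows into one flat record. The row shapes (`id`, `name`, `title` columns and `(package_id, key, value)` tuples) are simplified assumptions for illustration, not the full CKAN model, and `merge_extras` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple


def merge_extras(package_row: Dict[str, str],
                 extra_rows: Iterable[Tuple[str, str, str]]) -> Dict[str, str]:
    """Fold key-value extras belonging to one package into a flat dict.

    package_row: column -> value for one row of the relational table.
    extra_rows:  (package_id, key, value) tuples from the key-value table.
    """
    merged = dict(package_row)
    for package_id, key, value in extra_rows:
        if package_id != package_row["id"]:
            continue  # extra belongs to a different package
        # Assumption of this sketch: relational columns win on a name clash.
        merged.setdefault(key, value)
    return merged


package = {"id": "p1", "name": "rivers", "title": "Rivers"}
extras = [("p1", "spatial", "{}"), ("p2", "spatial", "x"), ("p1", "title", "Other")]
record = merge_extras(package, extras)
```

Any backend that reads both tables directly (option 1 below) would need some variant of this merge on every query.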

In order to be able to integrate pycsw to CKAN there are 3 options:

  1. Create a pycsw plugin to read-write to those 2 tables.
  2. Create a pycsw table in the CKAN database and hook the logical update, delete and insert CKAN actions to sync the pycsw table (and vice versa in the case of pycsw harvesting).
  3. Create a new pycsw CKAN backend that uses the CKAN API to interact with the database instead of SQL (this means abstracting fes.py and repository.py and making pycsw a CKAN client).

I am starting a prototype for option 2 as an intermediate solution and will report back.
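
A rough sketch of what option 2 could look like, using an in-memory SQLite table in place of the real database. The table name `csw_sync`, its three columns, and the hook functions are hypothetical; the actual prototype would hook into CKAN's own create, update and delete actions.

```python
import sqlite3


def init_sync_table(conn):
    # Hypothetical, heavily reduced stand-in for a pycsw records table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS csw_sync ("
        "identifier TEXT PRIMARY KEY, title TEXT, abstract TEXT)"
    )


def on_package_upsert(conn, package):
    """Hypothetical hook called after a dataset is created or updated."""
    conn.execute(
        "INSERT OR REPLACE INTO csw_sync (identifier, title, abstract) "
        "VALUES (?, ?, ?)",
        (package["id"], package.get("title", ""), package.get("notes", "")),
    )
    conn.commit()


def on_package_delete(conn, package_id):
    """Hypothetical hook called after a dataset is deleted."""
    conn.execute("DELETE FROM csw_sync WHERE identifier = ?", (package_id,))
    conn.commit()


conn = sqlite3.connect(":memory:")
init_sync_table(conn)
on_package_upsert(conn, {"id": "p1", "title": "Rivers", "notes": "River network"})
on_package_upsert(conn, {"id": "p1", "title": "Rivers v2"})  # update replaces the row
rows_after_update = conn.execute(
    "SELECT identifier, title, abstract FROM csw_sync"
).fetchall()
on_package_delete(conn, "p1")
rows_after_delete = conn.execute("SELECT COUNT(*) FROM csw_sync").fetchone()[0]
```

The trade-off of this design is duplication: every dataset exists twice, so any write path that bypasses the hooks leaves the two copies out of step.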

@tomkralidis
Member Author

cc @smrazgs @rclark

@rclark

rclark commented May 5, 2014

We pursued something like your second option in our NGDS project. You can have a look here: https://github.com/ngds/ckanext-ngds/tree/master/ckanext/ngds/csw

A lot of the mapping from a CKAN package to the pycsw table is done here: https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/logic/pycsw.py#L31

We needed ISO support so creating the full-text of the XML doc was also tricky:
https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/templates/package_to_iso.xml

@kalxas
Member

kalxas commented May 5, 2014

Thanks @rclark
Did you use the action API to catch the package updates or another method?
I see that there is a commented block here:
https://github.com/ngds/ckanext-ngds/blob/master/ckanext/ngds/csw/plugin.py#L100

@rclark

rclark commented May 5, 2014

That was the plan, but it was not thoroughly tested. @asonnenschein may know more about recent progress.

@kalxas
Member

kalxas commented May 5, 2014

I would like to pursue option no 3 in the future.
This will need a backend refactoring and will open the road for NoSQL backends for pycsw.
This should probably happen for pycsw 2.x, since I expect breakage to happen :)

@kalxas
Member

kalxas commented Jun 8, 2014

@kalxas
Member

kalxas commented Dec 10, 2014

Full CKAN integration is now complete:
PublicaMundi/ckanext-publicamundi#70

@tomkralidis
Member Author

cc @amercader

@kalxas great work here! Does this have any implications against master or is this all downstream in CKAN plugins?

Can you outline the approach? If this work is integrated into ckanext-spatial, then would this eliminate the need for doing the CKAN<->pycsw sync in favour of binding direct?

@kalxas
Member

kalxas commented Dec 10, 2014

No implication against pycsw master, all done downstream, within a CKAN plugin called publicamundi_package.

The approach is this:

  1. ckanext-publicamundi has a plugin to define metadata schemas through zope-interface and zope-schema: https://github.com/PublicaMundi/ckanext-publicamundi/tree/master/ckanext/publicamundi/lib/metadata
  2. This plugin helps developers define things like ISO-19115, DC, etc. and creates the schema in the package extras of CKAN packages. At the same time, it automatically generates a metadata editor UI for the CKAN dataset.
  3. publicamundi_package plugin defines the csw_record table:
    https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/model/csw_record.py
    so the table is created when the plugin is initialized (there is no need to install pycsw separately; it is in the pip requirements)
  4. Once the user adds a new dataset or updates a current one, we have defined actions to catch that:
    https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/plugins.py#L643
    https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/plugins.py#L661
  5. The pycsw synchronization runs every time a CKAN dataset changes:
    https://github.com/PublicaMundi/ckanext-publicamundi/blob/master/ckanext/publicamundi/lib/pycsw_sync.py

This work is part of geodata.gov.gr and is now in beta: http://labs.geodata.gov.gr/
If you want to try this out, there is an ansible script to install the demo in Debian 7:
https://github.com/PublicaMundi/labs.geodata.gov.gr/tree/master/deployment/common-debian

I also have a dev setup here:
http://83.212.104.89
http://83.212.104.89/csw?service=CSW&version=2.0.2&request=GetRecords&typenames=csw:Record&elementsetname=brief
Please send me an e-mail to create a login for you ;)
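
For anyone trying the endpoint, the GetRecords request above is a plain key-value GET request and can be assembled from its parameters. This is a small sketch; the endpoint `https://example.org/csw` is a placeholder and `getrecords_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode


def getrecords_url(endpoint, elementsetname="brief"):
    """Build a CSW 2.0.2 GetRecords URL with the parameters used above."""
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typenames": "csw:Record",
        "elementsetname": elementsetname,
    }
    # safe=":" keeps "csw:Record" readable instead of percent-encoding the colon.
    return endpoint + "?" + urlencode(params, safe=":")


url = getrecords_url("https://example.org/csw")  # placeholder endpoint
```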

@kalxas
Member

kalxas commented Dec 10, 2014

@amercader we are also planning to separate the schema plugin we created into a separate extension because we feel it is very useful as standalone. Any advice?

@amercader
Contributor

@kalxas This is looking great! I had a quick look and I'm impressed by the amount of work you guys have done, well done.

Separating the schema plugin sounds useful, I wonder if it has some overlap with @wardi's https://github.com/open-data/ckanext-scheming.

The general approach looks fine. Once we get the next CKAN release out of the way and I have more time, it'd be great to catch up and learn in more detail what has been done and what your plans are.

In the meantime, feel free to drop by one of the weekly CKAN dev meetings to present this.

Again, great work!

@kalxas
Member

kalxas commented Dec 10, 2014

@amercader I would be happy to attend a dev meeting.
I have seen the work from @wardi recently (from the dev mailing list). I have not seen all the details yet but it would be interesting if we could merge into one big schema extension...

@kalxas
Member

kalxas commented Dec 10, 2014

also @amercader @tomkralidis thanks for your nice words :)

@wardi

wardi commented Dec 10, 2014

@kalxas I'd love to talk about how you're extending the dataset metadata. Dev meeting might be good. IIUC your plugin supports arbitrary nested data as well as a flat version of the same for form updates, is that right?

@drmalex07

@wardi, @amercader, @kalxas,

Yes, indeed, we support arbitrary schemata expressed as zope.schema interfaces. Some core functionality (like flattening/unflattening) is shared across all metadata objects.
We have chosen to use zope.schema as a declarative means, mainly for the following reasons:

  • zope.interface libraries are well designed (and tested), and CKAN already makes heavy use of them as part of its plugin mechanism
  • no need to invent a new declarative "language"
  • it's pure python, so any logic (like field-level or object-level constraints) can be implemented directly there.

Note that there is much to be done before we consider this a ready-to-distribute extension. Of course, we are willing to join the conversation and exchange ideas at CKAN's dev meetings!
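
As a rough illustration of the flattening/unflattening mentioned above, the sketch below converts between a nested dict and a flat dict with dotted keys. It is not the ckanext-publicamundi implementation; the `.` separator and the function names are assumptions, and empty nested dicts are not preserved on the round trip.

```python
def flatten(nested, prefix="", sep="."):
    """Turn {"a": {"b": 1}} into {"a.b": 1}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def unflatten(flat, sep="."):
    """Inverse of flatten: rebuild the nested dict from dotted keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


metadata = {"title": "Rivers", "contact": {"name": "A", "address": {"city": "Athens"}}}
flat = flatten(metadata)
```

A flat form like this maps naturally onto form fields or key-value extras, while the nested form is easier to validate against a schema.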

@kalxas
Member

kalxas commented Dec 10, 2014

@wardi cool, is tomorrow's dev meeting at 16 UTC ok for you to discuss this?
me and @drmalex07 can make it.

@wardi

wardi commented Dec 10, 2014

@kalxas @drmalex07 yes, I'll be there.

@wardi

wardi commented Dec 11, 2014

@kalxas @tomkralidis for multilingual metadata and labels I strongly suggest an approach like https://github.com/open-data/ckanext-fluent data fields or https://github.com/open-data/ckanext-scheming/#label where you accept and produce dicts of BCP-47 language keys with string values
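
A minimal sketch of consuming such a dict of BCP-47 language keys: look up the requested tag, fall back to shorter prefixes (`fr-CA` to `fr`), then to a default language. `pick_label` and its fallback order are illustrative assumptions, not the API of ckanext-fluent or ckanext-scheming.

```python
def pick_label(translations, preferred, fallback="en"):
    """Return the best string from a dict keyed by BCP-47 language tags."""
    tag = preferred
    while tag:
        if tag in translations:
            return translations[tag]
        # Drop the last subtag: "fr-CA" -> "fr" -> "" (which ends the loop).
        tag = tag.rpartition("-")[0]
    if fallback in translations:
        return translations[fallback]
    # Last resort: any available value, or an empty string.
    return next(iter(translations.values()), "")


title = {"en": "Rivers", "fr": "Rivières"}
label = pick_label(title, "fr-CA")
```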

@kalxas
Member

kalxas commented Dec 11, 2014

@wardi thanks! We are looking into this right now :)

@frafra
Contributor

frafra commented Mar 31, 2023

There has been no working pycsw support in CKAN for many years, and the pycsw documentation should be updated accordingly: ckan/ckanext-spatial#297.

I added a pycsw endpoint to CKAN by creating a tool that harvests from the CKAN API and adds the datasets to pycsw: https://github.com/COATnor/coat2pycsw.
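
The harvesting side of such a tool can be sketched as a paging loop over a `package_search`-style response (`result.count` and `result.results`, with `rows` and `start` parameters). This is not the coat2pycsw code; `iter_datasets` and the `fetch` callable are hypothetical, and the HTTP call is stubbed so the sketch runs offline.

```python
from typing import Callable, Dict, Iterator


def iter_datasets(fetch: Callable[[Dict[str, int]], dict],
                  rows: int = 100) -> Iterator[dict]:
    """Yield every dataset from a paged package_search-style response.

    fetch is a hypothetical callable that takes {"rows": ..., "start": ...}
    and returns the parsed JSON response.
    """
    start = 0
    while True:
        response = fetch({"rows": rows, "start": start})
        results = response["result"]["results"]
        if not results:
            break
        yield from results
        start += len(results)
        if start >= response["result"]["count"]:
            break


# Stub standing in for an HTTP request to the CKAN API.
_catalog = [{"id": str(i)} for i in range(5)]


def fake_fetch(params):
    page = _catalog[params["start"]:params["start"] + params["rows"]]
    return {"success": True, "result": {"count": len(_catalog), "results": page}}


harvested_ids = [d["id"] for d in iter_datasets(fake_fetch, rows=2)]
```

Each yielded dataset would then be converted to a metadata record and written to the pycsw repository, a step that depends on the target schema and is left out here.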
