GetRecords handling should not filter records based on typenames value #105

Closed
tomkralidis opened this issue Feb 11, 2013 · 4 comments

@tomkralidis
Member

The current behaviour for handling GetRecords.typename is to filter records based on typename before applying any OGC filters to the query. Example:

  • catalogue A has 20 records
    • 10 Dublin Core (csw:Record)
    • 10 ISO (gmd:MD_Metadata)
  • when a client does a GetRecords request with typenames=csw:Record, catalogue A returns the 10 DC records
  • when a client does a GetRecords request with typenames=gmd:MD_Metadata, catalogue A returns the 10 ISO records
  • when a client does a GetRecords request with typenames=csw:Record,gmd:MD_Metadata, catalogue A returns the 10 DC and 10 ISO records for a total of 20 records

As it happens, GetRecords queries (with or without a filter) should always query against all metadata, based on the typeNames schema, and return all results encoded using the outputSchema (which we already do). This is confirmed by @smrAzGS's comments as well as by the CSW spec authors.

So in the codebase, we need to remove the part of the repository query which initially filters by typename so that the entire repository is searched and not filtered by typenames.
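To make the proposed change concrete, here is a minimal sketch of the old and new behaviour against a toy in-memory repository. The names `RECORDS`, `query_old` and `query_new` are hypothetical and are not pycsw's actual repository API; the point is only that `typenames` stops being used to select records.

```python
# Hypothetical in-memory repository (not pycsw's real storage): 10 Dublin Core
# and 10 ISO records, each remembering the schema it was ingested with.
RECORDS = (
    [{"id": f"dc-{i}", "typename": "csw:Record", "title": f"DC record {i}"}
     for i in range(10)]
    + [{"id": f"iso-{i}", "typename": "gmd:MD_Metadata", "title": f"ISO record {i}"}
       for i in range(10)]
)


def query_old(typenames, predicate=None):
    """Old behaviour: narrow by typename first, then apply the filter."""
    candidates = [r for r in RECORDS if r["typename"] in typenames]
    return [r for r in candidates if predicate is None or predicate(r)]


def query_new(typenames, predicate=None):
    """Proposed behaviour: typenames only names the query schema.

    It is deliberately not used to select records, so the whole
    repository is searched and only the filter narrows the result.
    """
    return [r for r in RECORDS if predicate is None or predicate(r)]
```

With this sketch, `query_old(["csw:Record"])` returns the 10 DC records, while `query_new(["csw:Record"])` returns all 20; a filter then narrows the result in both cases.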

@rclark / @smrAzGS: does this make sense?

@ghost ghost assigned tomkralidis Feb 11, 2013
@smrgeoinfo

Tom, makes sense to me. I’d adjust the ‘As it happens,…’ paragraph to say ‘query against all metadata, based on the typeNames schema, and return all results encoded using the outputSchema’. I’m pretty sure most implementations don’t behave this way, because to actually implement it, one has to map the query elements from the typeName schema to each metadata schema used in the catalog, and has to transform records from whatever schema is stored in the dB to the output schema.

To address this issue, GeoPortal maps incoming metadata record elements to Lucene index elements and builds Lucene indexes for each queryable property (I think GeoNetwork does the same). GeoPortal doesn’t actually honor the outputSchema parameter. GeoNetwork provides output XSLTs to transform the XML blob from the dB into the requested outputSchema if it differs from the schema of the stored XML blob.

Another solution to the problem is used by Deegree: marshal harvested metadata to a relational dB schema, map incoming requests from whatever typeName schema is supported to SQL against the relational dB, and have routines to build XML output in any supported outputSchema.
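The relational approach described above can be sketched roughly as follows. This is an illustration under assumptions, not Deegree's or pycsw's actual code: the `records` table layout, the `QUERYABLES` mapping and the `search` helper are all hypothetical. It shows queryables from two typeName schemas resolving to the same columns, so one SQL query covers every stored record.

```python
import sqlite3

# Hypothetical mapping from queryable names in each typeName schema to the
# shared relational columns that every harvested record is marshalled into.
QUERYABLES = {
    "csw:Record": {"dc:title": "title", "dc:subject": "keywords"},
    "gmd:MD_Metadata": {"apiso:Title": "title", "apiso:Subject": "keywords"},
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records "
    "(identifier TEXT, stored_schema TEXT, title TEXT, keywords TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    [
        ("dc-1", "csw:Record", "Geology of Arizona", "geology"),
        ("iso-1", "gmd:MD_Metadata", "Arizona wells", "hydrology"),
    ],
)


def search(typename, queryable, value):
    """Translate one 'property contains value' constraint into SQL."""
    # The column name comes from the fixed mapping above, never from the
    # request itself; the search value is passed as a bound parameter.
    column = QUERYABLES[typename][queryable]
    sql = (
        f"SELECT identifier FROM records WHERE {column} LIKE ? "
        "ORDER BY identifier"
    )
    return [row[0] for row in conn.execute(sql, (f"%{value}%",))]
```

Note that there is no `WHERE stored_schema = ...` clause: a query phrased in either schema's vocabulary searches records that were ingested in any schema.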

As I interpret the CSW spec, if the capabilities list an outputSchema, then the server needs to be able to provide any record in its metadata store in that schema.

steve

@tomkralidis
Member Author

Thanks @smrAzGS. Updated. Will have this implemented by end of week.

@tomkralidis
Member Author

Hi @smrAzGS, thanks for the additional implementation comments. FYI pycsw does it the deegree way, and we write to any outputschema in the same way (in Python, we refuse XSLT). We shred the XML into DB columns and also keep the actual XML representation on hand. As an early out, the stored XML is returned directly when the requested outputschema matches the schema of the stored XML and elementsetname=full.
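A rough sketch of that output path, assuming a stored row is a dict with hypothetical keys `schema`, `xml`, `identifier` and `title` (this is not pycsw's real data model, and `write_record` is an invented name). The stored XML is returned untouched on the early-out path; otherwise the record is built from the shredded columns.

```python
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
DC_NS = "http://purl.org/dc/elements/1.1/"
ISO_NS = "http://www.isotc211.org/2005/gmd"


def write_record(row, outputschema, elementsetname):
    """Return XML text for one stored row (a dict with hypothetical keys)."""
    # Early out: the stored XML is already in the requested schema and the
    # client asked for the full element set, so return it as-is.
    if row["schema"] == outputschema and elementsetname == "full":
        return row["xml"]
    if outputschema == CSW_NS:
        # Otherwise build a Dublin Core record from the shredded columns.
        record = ET.Element(f"{{{CSW_NS}}}Record")
        ET.SubElement(record, f"{{{DC_NS}}}identifier").text = row["identifier"]
        ET.SubElement(record, f"{{{DC_NS}}}title").text = row["title"]
        return ET.tostring(record, encoding="unicode")
    raise ValueError(f"unsupported outputschema: {outputschema}")


# Example row: an ISO record stored with its original XML and shredded columns.
iso_row = {
    "schema": ISO_NS,
    "xml": f'<gmd:MD_Metadata xmlns:gmd="{ISO_NS}"/>',
    "identifier": "iso-1",
    "title": "Arizona wells",
}
```

Calling `write_record(iso_row, ISO_NS, "full")` takes the early-out path, while `write_record(iso_row, CSW_NS, "full")` builds a `csw:Record` from the columns. Only the Dublin Core writer is sketched here; other output schemas would need their own builders.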

@tomkralidis
Member Author

FYI fixed in master and 1.4 branch.
