GetRecords handling should not filter records based on typenames value #105

tomkralidis · 2013-02-11T17:50:50Z

The current behaviour for handling the GetRecords typenames parameter is to filter records by typename before applying any OGC filters to the query. Example:

catalogue A has 20 records:
- 10 Dublin Core (csw:Record)
- 10 ISO (gmd:MD_Metadata)

- when a client does a GetRecords request with typenames=csw:Record, catalogue A returns the 10 DC records
- when a client does a GetRecords request with typenames=gmd:MD_Metadata, catalogue A returns the 10 ISO records
- when a client does a GetRecords request with typenames=csw:Record,gmd:MD_Metadata, catalogue A returns the 10 DC and 10 ISO records, for a total of 20 records
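A toy model of the pre-fix behaviour, purely illustrative: records are plain Python objects, the OGC filter is stood in for by a Python predicate, and the names `Record`, `CATALOGUE_A` and `get_records_prefiltered` are hypothetical (this is not pycsw's repository code). It reproduces the counts listed above:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Record:
    identifier: str
    typename: str  # schema the record was stored in, e.g. "csw:Record"
    title: str


# Hypothetical catalogue A: 10 Dublin Core records and 10 ISO records.
CATALOGUE_A: List[Record] = [
    Record(f"dc-{i}", "csw:Record", f"DC record {i}") for i in range(10)
] + [
    Record(f"iso-{i}", "gmd:MD_Metadata", f"ISO record {i}") for i in range(10)
]


def get_records_prefiltered(
    records: Iterable[Record],
    typenames: List[str],
    constraint: Callable[[Record], bool] = lambda r: True,
) -> List[Record]:
    """Pre-fix behaviour: drop records whose stored typename was not requested,
    then apply the constraint to whatever is left."""
    candidates = [r for r in records if r.typename in typenames]
    return [r for r in candidates if constraint(r)]


for requested in (["csw:Record"], ["gmd:MD_Metadata"], ["csw:Record", "gmd:MD_Metadata"]):
    print(requested, len(get_records_prefiltered(CATALOGUE_A, requested)))
```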

As it happens, GetRecords queries (with or without a filter) should always query against all metadata, based on the typeNames schema, and return all results encoded using the outputSchema (which we already do). This is confirmed by @smrAzGS' comments as well as by the CSW spec authors.

So in the codebase, we need to remove the part of the repository query that initially filters by typename, so that the entire repository is searched rather than filtered by typenames.
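A sketch of the intended post-fix behaviour, again on a hypothetical in-memory catalogue rather than pycsw's actual query code: `typenames` is only validated and used to say which schema the constraint's queryables are expressed in, and the stored schema of a record no longer decides whether it is searched. `SUPPORTED_TYPENAMES` and `get_records` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Record:
    identifier: str
    stored_schema: str  # schema the record was harvested in
    title: str


# Hypothetical catalogue A again: 10 Dublin Core and 10 ISO records.
CATALOGUE_A: List[Record] = [
    Record(f"dc-{i}", "csw:Record", f"DC record {i}") for i in range(10)
] + [
    Record(f"iso-{i}", "gmd:MD_Metadata", f"ISO record {i}") for i in range(10)
]

# Typenames whose queryables this toy server understands (assumed).
SUPPORTED_TYPENAMES = {"csw:Record", "gmd:MD_Metadata"}


def get_records(
    records: List[Record],
    typenames: List[str],
    constraint: Callable[[Record], bool] = lambda r: True,
) -> List[Record]:
    """Post-fix behaviour: typenames only says which schema the constraint's
    queryables are expressed in; every record in the repository is searched."""
    unknown = set(typenames) - SUPPORTED_TYPENAMES
    if unknown:
        raise ValueError(f"unsupported typenames: {sorted(unknown)}")
    # No pre-filter on stored_schema: the constraint sees all records.
    return [r for r in records if constraint(r)]
```

With this change each of the three requests in the example returns all 20 records when no filter is given, and a constraint such as `lambda r: r.title.startswith("ISO")` narrows the result to the 10 ISO records whichever typenames value was sent.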

@rclark / @smrAzGS: does this make sense?


smrgeoinfo · 2013-02-12T14:49:21Z

Tom, makes sense to me. I’d adjust the ‘As it happens,…’ paragraph to say ‘query against all metadata, based on the typeNames schema, and return all results encoded using the outputSchema’. I’m pretty sure most implementations don’t behave this way, because implementing it requires mapping the queryable elements of the typeName schema to the corresponding elements of each metadata schema used in the catalog, and transforming records stored in the DB, whatever their schema, into the output schema.
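The mapping described above can be sketched as a crosswalk table: for each queryable in the requested typeName schema, the element path that holds the same information in every stored schema. The table below is hypothetical and covers a single queryable, and records are modelled as flat dicts of element path to text, which is a simplification of real XML.

```python
from typing import Callable, Dict, List, Tuple

ISO_TITLE = (
    "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
    "gmd:CI_Citation/gmd:title/gco:CharacterString"
)

# Hypothetical crosswalk: typeName -> queryable -> stored schema -> element path.
CROSSWALK: Dict[str, Dict[str, Dict[str, str]]] = {
    "csw:Record": {
        "dc:title": {"csw:Record": "dc:title", "gmd:MD_Metadata": ISO_TITLE},
    },
}

# (stored schema, {element path: text}) pairs standing in for the repository.
Stored = Tuple[str, Dict[str, str]]


def matches(record: Stored, typename: str, queryable: str,
            predicate: Callable[[str], bool]) -> bool:
    """Evaluate one queryable against a record, whatever schema it is stored in."""
    stored_schema, elements = record
    path = CROSSWALK[typename][queryable].get(stored_schema)
    if path is None:
        return False  # this stored schema has no equivalent element
    value = elements.get(path)
    return value is not None and predicate(value)


repository: List[Stored] = [
    ("csw:Record", {"dc:title": "Arizona geology"}),
    ("gmd:MD_Metadata", {ISO_TITLE: "Geology of Arizona"}),
]
hits = [r for r in repository
        if matches(r, "csw:Record", "dc:title", lambda v: "arizona" in v.lower())]
```

A dc:title query issued against the csw:Record typeName thus reaches the ISO record through its own title element.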

To address this issue, GeoPortal maps incoming metadata record elements to Lucene index elements and builds Lucene indexes for each queryable property (I think GeoNetwork does the same). GeoPortal doesn’t actually honor the outputSchema parameter. GeoNetwork provides output XSLTs to transform the XML blob from the DB into the requested outputSchema if it differs from the schema of the XML blob in the DB.
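A rough sketch of the index-based approach, with plain Python dictionaries standing in for Lucene (this is not GeoPortal or GeoNetwork code): each incoming record's elements are mapped to a shared index field per queryable, so one lookup covers records harvested in either schema. `FIELD_MAP`, the simplified element names and the field names are hypothetical, and the tokenizer is a bare whitespace split.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical map from each incoming schema's (simplified) element names to index fields.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "csw:Record": {"dc:title": "title", "dc:subject": "keyword"},
    "gmd:MD_Metadata": {"gmd:title": "title", "gmd:keyword": "keyword"},
}


def build_indexes(records: Iterable[Tuple[str, str, Dict[str, str]]]):
    """records: (identifier, schema, {element: text}).
    Returns {index field: {token: set of identifiers}}."""
    indexes = defaultdict(lambda: defaultdict(set))
    for identifier, schema, elements in records:
        for element, text in elements.items():
            field = FIELD_MAP.get(schema, {}).get(element)
            if field is None:
                continue  # element does not feed any queryable
            for token in text.lower().split():
                indexes[field][token].add(identifier)
    return indexes


def search(indexes, field: str, term: str) -> Set[str]:
    """Look up one term in one queryable's index (exact token match, no scoring)."""
    return set(indexes.get(field, {}).get(term.lower(), set()))


idx = build_indexes([
    ("dc-1", "csw:Record", {"dc:title": "Arizona geology", "dc:subject": "faults"}),
    ("iso-1", "gmd:MD_Metadata", {"gmd:title": "Geology of Arizona"}),
])
```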

Another solution to the problem is used by deegree: marshal harvested metadata to a relational DB schema, map incoming requests from whatever typeName schema is supported to SQL against the relational DB, and have routines to build XML output in any supported outputSchema.
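The request-to-SQL step of that approach might look like the following sketch, using Python's built-in `sqlite3` with an in-memory table. The `records` table layout and the `QUERYABLE_COLUMNS` mapping are assumptions for illustration, not deegree's actual schema; the point is that a queryable from either typeName schema resolves to the same column, so the query runs against every shredded record.

```python
import sqlite3
from typing import Dict, List, Tuple

# Hypothetical mapping: (typeName, queryable) -> column of the shredded table.
QUERYABLE_COLUMNS: Dict[Tuple[str, str], str] = {
    ("csw:Record", "dc:title"): "title",
    ("gmd:MD_Metadata", "apiso:Title"): "title",
}


def to_sql(typename: str, queryable: str, value: str) -> Tuple[str, Tuple[str, ...]]:
    """Translate a 'queryable LIKE value' constraint into parameterized SQL."""
    column = QUERYABLE_COLUMNS[(typename, queryable)]  # KeyError if unsupported
    # The column name comes from the fixed mapping above, never from the request;
    # the user-supplied value is passed as a bound parameter.
    return f"SELECT identifier FROM records WHERE {column} LIKE ?", (f"%{value}%",)


def run(conn: sqlite3.Connection, typename: str, queryable: str, value: str) -> List[str]:
    sql, params = to_sql(typename, queryable, value)
    return sorted(row[0] for row in conn.execute(sql, params))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (identifier TEXT PRIMARY KEY, stored_schema TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("dc-1", "csw:Record", "Arizona geology"),
        ("iso-1", "gmd:MD_Metadata", "Geology of Arizona"),
    ],
)
```

Both `run(conn, "csw:Record", "dc:title", "geology")` and `run(conn, "gmd:MD_Metadata", "apiso:Title", "geology")` find both records, since SQLite's `LIKE` is case-insensitive for ASCII characters by default.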

As I interpret the CSW spec, if the capabilities list an outputSchema, then the server needs to be able to provide any record in its metadata store in that schema.
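One way to read that requirement in code: keep a serializer for every advertised outputSchema and make each one accept any stored record, regardless of the schema it was harvested in. The namespace URIs are the CSW 2.0.2, Dublin Core and ISO 19139 (gmd/gco) ones; the neutral record dict and the two serializer functions are hypothetical and deliberately minimal.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
DC_NS = "http://purl.org/dc/elements/1.1/"
GMD_NS = "http://www.isotc211.org/2005/gmd"
GCO_NS = "http://www.isotc211.org/2005/gco"


def to_csw_record(rec: Dict[str, str]) -> ET.Element:
    root = ET.Element(f"{{{CSW_NS}}}Record")
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = rec["identifier"]
    ET.SubElement(root, f"{{{DC_NS}}}title").text = rec["title"]
    return root


def to_iso(rec: Dict[str, str]) -> ET.Element:
    root = ET.Element(f"{{{GMD_NS}}}MD_Metadata")
    file_id = ET.SubElement(root, f"{{{GMD_NS}}}fileIdentifier")
    ET.SubElement(file_id, f"{{{GCO_NS}}}CharacterString").text = rec["identifier"]
    # The title would sit under gmd:identificationInfo; omitted to keep the sketch short.
    return root


# Advertised outputSchema -> serializer. Every entry must handle every stored record.
SERIALIZERS: Dict[str, Callable[[Dict[str, str]], ET.Element]] = {
    CSW_NS: to_csw_record,
    GMD_NS: to_iso,
}


def render(rec: Dict[str, str], output_schema: str) -> str:
    if output_schema not in SERIALIZERS:
        raise ValueError(f"outputSchema not advertised: {output_schema}")
    return ET.tostring(SERIALIZERS[output_schema](rec), encoding="unicode")


# Records held in a neutral form, whichever schema they were harvested in.
store = [
    {"identifier": "dc-1", "title": "Arizona geology", "harvested_as": "csw:Record"},
    {"identifier": "iso-1", "title": "Geology of Arizona", "harvested_as": "gmd:MD_Metadata"},
]
```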

steve


tomkralidis · 2013-02-12T15:13:08Z

Thanks @smrAzGS. Updated. Will have this implemented by end of week.

tomkralidis · 2013-02-12T23:11:24Z

Hi @smrAzGS, thanks for the additional implementation comments. FYI, pycsw does it the deegree way, and we write to any outputschema the same way (in Python; we avoid XSLT). We shred the XML into DB columns and also keep the original XML representation on hand; that stored XML is returned directly (as an early out) when the requested outputschema is the same as the schema of the stored XML and elementsetname=full.
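A minimal sketch of that strategy, not pycsw's actual code: each record is shredded into columns and the original XML is kept alongside them; the stored XML is returned as-is only when the requested outputschema matches the stored schema and elementsetname=full, and otherwise the response is rebuilt from the columns. The table layout, the function names and the single csw:Record rebuild path are assumptions.

```python
import sqlite3
import xml.etree.ElementTree as ET

CSW_NS = "http://www.opengis.net/cat/csw/2.0.2"
DC_NS = "http://purl.org/dc/elements/1.1/"
GMD_NS = "http://www.isotc211.org/2005/gmd"

RAW_ISO = '<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"/>'

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (identifier TEXT PRIMARY KEY, stored_schema TEXT, title TEXT, xml TEXT)"
)
# Shredded column (title) plus the original XML document, kept verbatim.
conn.execute(
    "INSERT INTO records VALUES (?, ?, ?, ?)",
    ("iso-1", GMD_NS, "Geology of Arizona", RAW_ISO),
)


def build_csw_record(identifier: str, title: str) -> str:
    """Rebuild a csw:Record from shredded columns (element-set trimming omitted)."""
    root = ET.Element(f"{{{CSW_NS}}}Record")
    ET.SubElement(root, f"{{{DC_NS}}}identifier").text = identifier
    ET.SubElement(root, f"{{{DC_NS}}}title").text = title
    return ET.tostring(root, encoding="unicode")


def write_record(conn: sqlite3.Connection, identifier: str,
                 output_schema: str, elementsetname: str) -> str:
    stored_schema, title, raw_xml = conn.execute(
        "SELECT stored_schema, title, xml FROM records WHERE identifier = ?",
        (identifier,),
    ).fetchone()  # assumes the identifier exists
    if output_schema == stored_schema and elementsetname == "full":
        return raw_xml  # early out: the stored document is already the answer
    if output_schema == CSW_NS:
        return build_csw_record(identifier, title)
    raise ValueError(f"no writer for {output_schema} in this sketch")
```

Keeping the raw XML costs storage but avoids re-serializing (and possibly losing detail from) full records in their native schema.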

tomkralidis · 2013-02-14T20:31:03Z

FYI: fixed in master and the 1.4 branch.

ghost assigned tomkralidis Feb 11, 2013

tomkralidis added a commit that referenced this issue Feb 14, 2013

fix GetRecords typenames parameter handling (#105)

750d910

tomkralidis added a commit that referenced this issue Feb 14, 2013

fix GetRecords typenames parameter handling (#105)

0419738

tomkralidis closed this as completed Feb 14, 2013

tomkralidis mentioned this issue Mar 26, 2013

remove typenames and queryables for outputschemas without an official profile #118

Closed
