Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nextRecord value when maxRecords="0" #22

Closed
tomkralidis opened this issue Dec 14, 2017 · 23 comments
Closed

nextRecord value when maxRecords="0" #22

tomkralidis opened this issue Dec 14, 2017 · 23 comments
Assignees
Labels

Comments

@tomkralidis
Copy link
Contributor

As suggested in the OGC CITE thread, opening this issue to address maxRecords=0 behaviour. Associated pycsw ticket at geopython/pycsw#557 (comment) (CITE trace at geopython/pycsw#557 (comment))

@dstenger
Copy link
Contributor

dstenger commented Dec 15, 2017

Further information from mailing list:

The issue arises when maxRecords="0", which should force the same
behaviour as if resultType="hits" [v07-006r1 "OpenGIS Catalogue
Services Specification specification v2.0.2" 10.8.4.7 p149].  Looking
at what that behaviour should be [10.8.4.3 p147]:

"[...] the catalogue service shall return a < GetRecordsResponse >
element containing an empty < SearchResults > element that indicates
the estimated size of the result set. Optional attributes may or may
not be set accordingly."


When I was making my fix for the other issue, the value of nextRecord
was discussed on the PyCSW IRC channel (freenode #pycsw).  The answer
to the question, what should the nextRecord attribute be when no
records have been returned, but all records have been reported, was
nextRecord=0.  This also is the value nextRecord would take if the
last of the records had been returned. 


However, it is true that the specification does not dictate what this
value should be.  The question to the CITE forum is: is nextRecord=1
expected for some reason that I haven't found yet, or is this test
being over unnecessarily strict on this?
Dirk: the failing test is indeed 23.1.  I've posted the full trace at [1]
and opened a associated GitHub issue at [2]

Thanks

..Tom

[1] https://github.com/geopython/pycsw/pull/557#issuecomment-351815650
[2] https://github.com/opengeospatial/ets-csw202/issues/22

@dstenger
Copy link
Contributor

I also do not see the reason why "csw:SearchResults/@nextRecord = 1" is expected.

@dstenger
Copy link
Contributor

@lorebiga Can you please take a look at the reported behaviour?

@rjaduthie
Copy link

Any further on this? I'm not sure of the lifecycle of an issue in this ecosystem, but it's been 3 months since the last activity.

@dstenger
Copy link
Contributor

@lorebiga What do you think about this?

@lorebiga
Copy link
Member

I've finally been able to investigate this issue. The test query defines a result set of 11 records, of which none is returned (since maxRecords=0). Hence, the response should specify:

  • numberOfRecordsMatched="11"
  • numberOfRecordsReturned="0"

As clarified at p. 159, "the nextRecord attribute will indicate to the client where to begin the next query" (to iterate the result set), so:

  • nextRecord="1"

I hope this is clear. I'm closing this issue.

By the way, I've noticed that the documentation of test tc23.1 includes the condition that numberOfRecordsMatched > 10, but the test implementation actually tests for numberOfRecordsMatched > 1. Both could be amended to align with the other. I've decided to amend the test implementation.

I've also noticed a typo in 07-006r1, §10.8.4.7, p. 149: the mention to "subclause 10.8.4.2" should instead refer to "subclause 10.8.4.3". I've reported it to the OGC staff.

@rjaduthie
Copy link

rjaduthie commented Mar 13, 2018

If you think that this is the definite answer to the question, I can patch my code to comply. To me, it still doesn't make complete sense that this operation, which returns information on all the records, would then assume that the next record to harvest is the first one in the catalogue; it would probably be safer to report next_record=0, not to then potentially mislead a client into potentially re-harvesting the whole catalogue from the start.

@lorebiga
Copy link
Member

I'm not sure I understand. This operation doesn't return information on all the records, the response refers to the result set identified by the query, which may be a subset of the entire catalogue.
The meaning of nextRecord=0 is clearly defined by the specification (p. 159): "a value of 0 means all records have been returned". In this case, no record has been returned, so 0 would be wrong.

@rjaduthie
Copy link

rjaduthie commented Mar 13, 2018 via email

@lorebiga
Copy link
Member

The nextRecord parameter is not applicable to requests, but to responses.
A client may of course repeat a query with a resultType=hits parameter. Every time, it will get no records in the response, and nextRecord=1.
In general, it works like this:

  • the client sends a request with a query
  • the server computes the query result (a list of 1... n records)
  • the server sends a response with some of the records (possibly 0), specifying the index of the next in the list (i.e. the first record which is not returned) in the nextRecord parameter

This protocol was designed by the authors of the CSW 2.0.2 specification, based on the requirements collected. Afaik, it was retained as is also in CSW 3.0.

@rjaduthie
Copy link

rjaduthie commented Mar 13, 2018 via email

@lorebiga
Copy link
Member

lorebiga commented Mar 13, 2018

I contributed to CSW 3.0, but I don't remember a specific discussion around this very point. In general, I think that comments on approved specification should be formulated as Change Requests, which are directed to the appropriate Working Group, or may trigger the formation of a new Revision Working Group.

In fact, arguing about the rationale of CSW specification is out of scope, in this context. We are only concerned abut the correctness of existing compliance tests. This said, I will try to answer your questions :-)

- Is there a situation where there would be a further repeated resultType="hits" query [...] to complete the client's task?
Perhaps a polling use-case? Anyway, we should not make assumptions about client's tasks, in this context.

- Does startPosition have an effect on resultType="hits"? What about "maxRecords"? If so, but I don't think it does in the PyCSW implementation, in the case where the catalogue size is 100,
startPosition=10 and maxRecords=50, then would nextRecord=61 be returned with the response? What about startPosition=90 and maxRecords=50?

The specification seems vague on this, and I don't think there's any specific test. However, note that maxRecords is the maximum number of records that should be returned. A server may return less. The nextRecord parameter may be used by the client to continue fetching the result set from the first record it hasn't got in that response.

On the syntax and semantics of nextRecord: it is mandatory, it is a non-negative integer, and the specification is very clear on the meaning of the expression nextRecord=0: "a value of 0 means all records have been returned".

So, in the specific case of the tc23.1 test, how should a server behave?
It must not return any of the 11 records in the result set, since the client required maxRecords=0.
But it must return a nextRecord with some value >= 0.
That cannot be 0, because it would imply that all the 11 records be in the response, which instead must be empty.
Could that be 2... 11? The specification does not state it clearly, but it is logical to imply that some records would be in the response, in that case, whereas it must be empty.
By setting nextRecord=1, the server indicates to the client that it can continue fetching the result set from the first of the 11 records in the result set.

@rjaduthie
Copy link

rjaduthie commented Mar 14, 2018 via email

@lorebiga
Copy link
Member

It's not a matter of faith: in this context, we have to stick to the wording of the specification and to the existing scripts. Tc23.1 needs no change, as far as nextRecord is concerned.

Your examples are out of scope, though meaningful and interesting. I agree with you on the meaning of nextRecord. In my opinion, the current wording is clear enough: all the parameters in a response refer to the result set related to the request.
However, the change you propose is acceptable. You may submit a formal Change Request.

@rjaduthie
Copy link

rjaduthie commented Mar 14, 2018 via email

@rjaduthie
Copy link

rjaduthie commented Mar 14, 2018 via email

@lorebiga
Copy link
Member

Not that I know of. Change Requests are the OGC mechanism to address cases like this.
You can file it here:
http://ogc.standardstracker.org/

Sorry I can't help you more on this, I'm already overcommitted with maintaining the tests... :-)

@rjaduthie
Copy link

rjaduthie commented Mar 15, 2018

I'm already overcommitted with maintaining the tests

Are you the only person working on this? Do you maintain the whole test suite or just the CSW tests?

Not that I know of.

Well, apart from fixing the test with the current wording! ;¬)

Given that the Change Request is for the benefit of making sure you're happy with what the spec says, can I run this by you. If I were to suggest the simple change to the description on table 66 for nextRecord to read:
"nextRecord | Start position of next record, if the response does not contain the complete result set of the request's query; otherwise, nextRecord should be set to 0."

This covers the case where the result set naturally has no records, plus (I hope you agree) the cases where {resultType='results', maxRecords=0} and {resultType='hits'}. If you think it is still not clear enough, please suggest what would be. I have a feeling that this issue will not be as protracted if well thought out solutions are presented up front.

@lorebiga
Copy link
Member

lorebiga commented Mar 15, 2018

I'm the only maintainer of the CSW 2.0.2 ETS (and I maintain only this ETS).

As far as I'm concerned (i.e. these tests), I don't think the spec needs any change: I'm already happy with what it says :-)
Go ahead with the Change Request as you prefer, sorry I can't help you more on this (I don't think I will be reviewing the Change Request anyway: I think it's the OAB?)

@rjaduthie
Copy link

rjaduthie commented Mar 15, 2018 via email

@lorebiga
Copy link
Member

Roger,

I think we are repeating ourselves here.

I am the maintainer of the CSW 2.0.2 ETS. You can report and refer to me on issues related to the CSW 2.0.2 ETS implementation. I am not the right recipient for arguing the rationale and the design of the CSW specification in general, nor to improve the CSW specification.

Please, don't refer to me on Change Requests to the CSW specification.

I think test tc23.1 is correctly implemented. I quote from above: the test query defines a result set of 11 records, of which none is returned (the request specifies maxRecords=0). Hence, the response should specify:

numberOfRecordsMatched="11"
numberOfRecordsReturned="0"

As clarified at p. 159, "the nextRecord attribute will indicate to the client where to begin the next query" (to iterate the result set), hence:

nextRecord="1"

The CSW specification has been perused many times, this ETS has been executed hundreds, maybe thousand times, and no one has raised this issue before. I respect your opinion, but I disagree with it: I think that test tc23.1 is correctly implemented wrt nextRecord value, and there is nothing to fix.

I hope you can accept my decision and agree on closing this discussion, for the time being.
@dstenger is the manager of the OGC validation tools, you may contact him for his valuable opinion on the matter.

@rjaduthie
Copy link

rjaduthie commented Mar 15, 2018

Thank you for you patience while I try to make my point clear. If I submit a Change Request, I will definitely refer to this discussion in my description, as this is the whole reason for the Change Request. I've already written a draft for the request, in fact. It doesn't have your name in it, but refers to 'test maintainers', etc.

That this issue hasn't been raised does not mean the behaviour is correct. The case in hand would have little exposure, I would expect, because in the use cases for GetRecords requests with resultType='hits' or equivalent, you would normally ignore the nextRecord attribute (... because you have all the records you expect to have!!! ;¬) ). So, the only people who are going to be concerned about these things are the implementation devs, who are not numerous. I expect a lot of little issues are ignored because of barriers to resolution, reading the spec, understanding it, etc. It's probably easier to change your code to match the test suite. You should definitely not do things the way they have been done before, if they have been done incorrectly.

Yes, at danger of repeating ourselves:

As clarified at p. 159, "the nextRecord attribute will indicate to the client where to begin the next
query" (to iterate the result set), hence:
nextRecord="1"

To me, the above statement is contradictory to your point: the response doesn't need to iterate the result set, it has all the records it asked for (i.e 0).

It's maybe too late to convince you, but it's worth noting for the discussion that the [currently worded] spec also states in the "maxRecords attribute" section:
If [the maxRecords attribute] value is set to zero, then the behaviour is identical to setting "resultType=hits" as described in subclause 10.8.4.2.
i.e. the request is made expecting zero records returned and is only concerned with the "numberOfRecordsMatched" attribute to get the number of records that the query matches.

You don't seem be compromising on this, so the next step would be to create the Change Request. The purpose of the Change Request would be to simply to clarify that in the case where no records were requested (or even that no records are present in the query result) and no records are returned, the nextRecord value doesn't tell the client to start the next query at record number 1. Instead, it should be crystal clear that the nextRecord attribute should be set to 0, to signify that the records included in the response (i.e. 0 records) fulfils the entirety of the request.

I am still certain that this is what the current wording of the spec actually says. Maybe @dstenger has an outside opinion to offer?

rjaduthie pushed a commit to rjaduthie/pycsw that referenced this issue Mar 20, 2018
This is to comply with the current ETS-CSW test suite - cf.,
opengeospatial/ets-csw202#22
tomkralidis pushed a commit to geopython/pycsw that referenced this issue Mar 21, 2018
* Code now considers startposition when checking returned records

Previously, the code would check if maxrecords was greater than the
matched records, but did not consider that the startposition might be
non-zero.  This is now addressed for different cases of startposition,
maxrecords and matched.

Code change in csw2.py only, to be tested. The code in csw3.py also
needs modified in a similar way.

* Functional test post_GetRecord-end expected XML corrected

The numberOfRecordsReturned in the expected XML for the test
default/post/GetRecords-end.xml was inconsistent with the number of
records in the expected XML.  This test failed when the code generating
the numberOfRecordsReturned attribute was updated.

Both the code and the expected XML for this test now agree and reflect
the correct value.

* csw3 code uses startposition in calculation of numberOfRecordsReturned

* numberOfRecordsReturned and nextRecord attributes good for maxRecords=0

The case where maxRecords=0 means that the nextRecord parameter should
be set to zero, to indicate that information on all records has been
returned.

* An amend to last commit, without the 'amend': fix to introduced problem
- a

* CITE test with maxRecords=0 now returns nextRecord=0

Although resultType=results, with maxRecords=0, the response should have
nextRecord=0, signifying that information on all the records have been
returned.

The expected XML for this CITE test has been modified to check against a
response with nextRecord=0 (instead of 1).

* If 'maxrecords' = 0, but matched records > 0, 'nextRecord' = 1

This is to comply with the current ETS-CSW test suite - cf.,
opengeospatial/ets-csw202#22

* Rolled back change to pycsw functionaltests to match ets

* CSW3 functionaltests dictate that nextrecord='0' for maxrecords='0'

Although the CSW2 behaviour is for nextrecord='1', because of the policy
of the ets-csw202 test suite, the CSW3 functionaltests in the pycsw repo
prescribe that nextrecord='0'.  This has been so for over a year, so I'm
not going to change the test.  The CSW2 behaviour might still be brought
into line with CSW2, as there is no reason for it not to (apart from the
assertion of the ets-csw202 test maintainer that the standard disallows
it).
tomkralidis pushed a commit to tomkralidis/pycsw that referenced this issue Mar 21, 2018
* Code now considers startposition when checking returned records

Previously, the code would check if maxrecords was greater than the
matched records, but did not consider that the startposition might be
non-zero.  This is now addressed for different cases of startposition,
maxrecords and matched.

Code change in csw2.py only, to be tested. The code in csw3.py also
needs modified in a similar way.

* Functional test post_GetRecord-end expected XML corrected

The numberOfRecordsReturned in the expected XML for the test
default/post/GetRecords-end.xml was inconsistent with the number of
records in the expected XML.  This test failed when the code generating
the numberOfRecordsReturned attribute was updated.

Both the code and the expected XML for this test now agree and reflect
the correct value.

* csw3 code uses startposition in calculation of numberOfRecordsReturned

* numberOfRecordsReturned and nextRecord attributes good for maxRecords=0

The case where maxRecords=0 means that the nextRecord parameter should
be set to zero, to indicate that information on all records has been
returned.

* An amend to last commit, without the 'amend': fix to introduced problem
- a

* CITE test with maxRecords=0 now returns nextRecord=0

Although resultType=results, with maxRecords=0, the response should have
nextRecord=0, signifying that information on all the records have been
returned.

The expected XML for this CITE test has been modified to check against a
response with nextRecord=0 (instead of 1).

* If 'maxrecords' = 0, but matched records > 0, 'nextRecord' = 1

This is to comply with the current ETS-CSW test suite - cf.,
opengeospatial/ets-csw202#22

* Rolled back change to pycsw functionaltests to match ets

* CSW3 functionaltests dictate that nextrecord='0' for maxrecords='0'

Although the CSW2 behaviour is for nextrecord='1', because of the policy
of the ets-csw202 test suite, the CSW3 functionaltests in the pycsw repo
prescribe that nextrecord='0'.  This has been so for over a year, so I'm
not going to change the test.  The CSW2 behaviour might still be brought
into line with CSW2, as there is no reason for it not to (apart from the
assertion of the ets-csw202 test maintainer that the standard disallows
it).
@rjaduthie
Copy link

I have now filled out a change request for clarification on this issue in the CSW spec: http://ogc.standardstracker.org/show_request.cgi?id=547

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants