fix: download linked objects in a WFS using "resolvedepth" #1180

Open · wants to merge 4 commits into master from fix/ING-4128
Conversation

@emanuelaepure10 (Contributor) commented May 21, 2024

When downloading data via WFS with the "resolvedepth" parameter, the linked objects are now downloaded as well. Since the hits requested from the WFS are best set to UNKNOWN_SIZE in this case, the size always ends up as UNKNOWN_SIZE too; the code related to hits and size has therefore been removed and the related methods have been changed accordingly.

ING-4128
Closes #1084
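
For illustration, a WFS 2.0 GetFeature request using this parameter could look like the following (the endpoint and type name are made up; RESOLVEDEPTH is the standard WFS 2.0 KVP parameter for resolving nested references, with * meaning unlimited depth):

    https://example.org/wfs?SERVICE=WFS&VERSION=2.0.0&REQUEST=GetFeature&TYPENAMES=xplan:BP_Plan&RESOLVEDEPTH=*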

@emanuelaepure10 force-pushed the fix/ING-4128 branch 2 times, most recently from fc243b7 to ef11840 on May 22, 2024 11:56
@emanuelaepure10 (Contributor Author):

@stempler Any idea why my PRs always fail with the same error in the "Comment with links to the artifacts" step?
Thank you

@stempler force-pushed the fix/ING-4128 branch 2 times, most recently from cadb5a6 to 5519274 on May 22, 2024 14:41
    -        && totalFeaturesProcessed >= maxNumberOfFeatures)
    -        || (size != UNKNOWN_SIZE && totalFeaturesProcessed >= size);
    +    return maxNumberOfFeatures != UNLIMITED
    +            && totalFeaturesProcessed >= maxNumberOfFeatures;
Member:

Won't we still get problems here because the additional objects are counted as features processed?

Contributor Author:

If we just set hits == UNKNOWN_SIZE, that is the code that is left. It didn't make sense to me to keep the rest, as size would then always be UNKNOWN_SIZE as well.
I have used just 2 projects to test, which is probably not enough, but I haven't encountered any problems.

Member:

> If we just set hits == UNKNOWN_SIZE

> size would then always be UNKNOWN_SIZE as well

Can you explain why size should always be UNKNOWN_SIZE? In case the request does not contain a RESOLVEDEPTH parameter, the old logic can still be used IMO.

Contributor Author:

From this code:

    switch (wfsVersion) {
    case "1.1.0":
        // The "numberOfFeatures" reported by a 1.1.0 WFS may be smaller
        // than the actual number of features matched by the query if the
        // number of features returned per query is limited on the server
        // side. Therefore do not rely on it as a size information here.
        this.size = UNKNOWN_SIZE;
        break;
    case "2.0.0":
    case "2.0.2":
        // The "numberMatched" reported by a 2.0.0/2.0.2 WFS should be the
        // number of features matched by the query. If hits equals
        // UNKNOWN_SIZE then size is also set to that value
        this.size = isLimited() ? Math.min(maxNumberOfFeatures, hits) : hits;
        break;
    default:
        this.size = UNKNOWN_SIZE;
    }

having hits = UNKNOWN_SIZE, the size also comes out as UNKNOWN_SIZE for the 2.0.0/2.0.2 case, independently of whether isLimited() is true or false.
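
To spell out why (assuming UNKNOWN_SIZE is a negative sentinel value such as -1, as is common for "unknown" sizes):

    // Illustration only, assuming UNKNOWN_SIZE == -1:
    int hits = UNKNOWN_SIZE; // -1
    // isLimited() == true:  Math.min(maxNumberOfFeatures, -1) == -1 for any limit >= 0
    // isLimited() == false: size = hits = -1
    int size = isLimited() ? Math.min(maxNumberOfFeatures, hits) : hits;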

@@ -312,7 +279,18 @@ public WfsBackedGmlInstanceIterator iterator() {
         */
        @Override
        public boolean isEmpty() {
    -       return size == 0;
    +       if (!emptyInitialized) {
    +           ResourceIterator<Instance> it = iterator();
Member:

I think a more effective way could be to use the hits query that was previously used; it could even be issued with maxFeatures / count set to 1, as we are only interested in whether there are matches at all.
With that we would avoid the server having to load and transfer a potentially large amount of data.
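
A rough sketch of that idea, reusing the requestHits(...) method from the removed code; the withParams(...) helper and the exact parameter handling are illustrative, not the actual hale API:

    // Sketch: answer isEmpty() with a hits request limited to a single match,
    // instead of materializing instances via the iterator.
    @Override
    public boolean isEmpty() {
        try {
            // withParams is a hypothetical helper that appends/overrides
            // KVP parameters on the primordial request URI
            URI hitsUri = withParams(primordialUri, "RESULTTYPE", "hits", "COUNT", "1");
            return requestHits(hitsUri) == 0;
        } catch (WFSException e) {
            // fall back to peeking at the iterator if the hits request fails
            ResourceIterator<Instance> it = iterator();
            try {
                return !it.hasNext();
            } finally {
                it.close();
            }
        }
    }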

Comment on lines 228 to 262
    // Use primordial URI and issue "hits" request to check if the WFS will
    // return anything at all
    int hits;
    if (ignoreNumberMatched) {
        hits = UNKNOWN_SIZE;
    }
    else {
        try {
            hits = requestHits(primordialUri);
        } catch (WFSException e) {
            log.debug(MessageFormat.format(
                    "Failed to perform hits query (REQUESTTYPE=hits): {0}", e.getMessage()), e);
            hits = UNKNOWN_SIZE;
        }
    }

    switch (wfsVersion) {
    case "1.1.0":
        // The "numberOfFeatures" reported by a 1.1.0 WFS may be smaller
        // than the actual number of features matched by the query if the
        // number of features returned per query is limited on the server
        // side. Therefore do not rely on it as a size information here.
        this.size = UNKNOWN_SIZE;
        break;
    case "2.0.0":
    case "2.0.2":
        // The "numberMatched" reported by a 2.0.0/2.0.2 WFS should be the
        // number of features matched by the query. If hits equals
        // UNKNOWN_SIZE then size is also set to that value
        this.size = isLimited() ? Math.min(maxNumberOfFeatures, hits) : hits;
        break;
    default:
        this.size = UNKNOWN_SIZE;
    }

Member:

I'm not sure why this must be removed. I think this can still be used in general (as before) when the query does not contain a RESOLVEDEPTH parameter, or am I missing something?

@florianesser (Member) left a comment:

Commit message and PR should reference the related GitHub issue #1084

@emanuelaepure10 (Contributor Author):

Or if we just set the hits to UNKNOWN_SIZE, then the ignoreNumberMatched checkbox doesn't make any sense, because it won't be used.

@emanuelaepure10 force-pushed the fix/ING-4128 branch 6 times, most recently from 70b72af to 91bac9e on May 28, 2024 15:18
@emanuelaepure10 (Contributor Author):

@florianesser I have squashed the previous commit and this one, just to make the change easier to see.

Comment on lines 231 to 239
    boolean isUsedresolvedepth = false;
    for (NameValuePair nameValuePair : params) {
        if ("resolvedepth".equalsIgnoreCase(nameValuePair.getName())) {
            isUsedresolvedepth = true;
            break;
        }
    }

    if (ignoreNumberMatched || isUsedresolvedepth) {
Member:

@emanuelaepure10 I think this can be reduced to a single line as primordialQueryParams contains all query parameters in upper case (see here)

Suggested change:

    -   boolean isUsedresolvedepth = false;
    -   for (NameValuePair nameValuePair : params) {
    -       if ("resolvedepth".equalsIgnoreCase(nameValuePair.getName())) {
    -           isUsedresolvedepth = true;
    -           break;
    -       }
    -   }
    -   if (ignoreNumberMatched || isUsedresolvedepth) {
    +   if (ignoreNumberMatched || primordialQueryParams.containsKey("RESOLVEDEPTH")) {

Contributor Author:

Thanks! I overlooked primordialQueryParams.

Using "resolvedepth" when downloading data via WFS now also downloads the linked objects.
The hits requested from a WFS are set to UNKNOWN_SIZE if the request contains a "resolvedepth" parameter.

ING-4128
Closes halestudio#1084
@florianesser (Member) left a comment:

LGTM

@@ -228,7 +228,8 @@ public WfsBackedGmlInstanceCollection(LocatableInputSupplier<? extends InputStre
        // Use primordial URI and issue "hits" request to check if the WFS will
        // return anything at all
        int hits;
    -   if (ignoreNumberMatched) {
    +
    +   if (ignoreNumberMatched || primordialQueryParams.containsKey("RESOLVEDEPTH")) {
Member:

Maybe add a comment explaining why this is done in case RESOLVEDEPTH is included?

Did you test if the paging works as expected when additionalObjects are returned?
I could imagine that one possible impact is that the request for the second page uses a wrong offset.

Contributor Author:

> Did you test if the paging works as expected when additionalObjects are returned?
> I could imagine that one possible impact is that the request for the second page uses a wrong offset.

For this I would probably need a bigger project, I believe. I will ask the service team.

@@ -225,10 +225,16 @@ public WfsBackedGmlInstanceCollection(LocatableInputSupplier<? extends InputStre
            }
        }

    +   // for a query containing RESOLVEDEPTH we disable the pagination
    +   if (primordialQueryParams.containsKey("RESOLVEDEPTH")) {
Contributor Author:

After discussing the issue with @florianesser, we identified a potential solution: disable pagination if the parameters contain "RESOLVEDEPTH".

Another approach could be to disable pagination directly from the UI. I attempted this but haven't found a viable solution yet.

@stempler, your input and ideas would be greatly appreciated to help us find the best solution.

Member:

The problem with that would be that in many cases it would then not be possible to load all data of a service.

  • Was your intent to inform the user about this restriction in the UI?
  • It might be unexpected when used from another context, e.g. Gradle or CLI.
  • Did you explore whether you see some way to determine if an instance originated from additionalObjects or not? (e.g. see if it is feasible to detect here, once entering additionalObjects, and add the information to the created instances in the instance metadata; a rough sketch follows below)
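
A minimal, self-contained sketch of that idea; the types are hypothetical stand-ins, not the actual hale Instance API:

    // Illustration only: tag each parsed object with its origin so downstream
    // logic (paging, counting) can tell main features and additional objects apart.
    final class ParsedFeature {

        private final String gmlId;
        // true if the object was read inside wfs:additionalObjects
        private final boolean additionalObject;

        ParsedFeature(String gmlId, boolean additionalObject) {
            this.gmlId = gmlId;
            this.additionalObject = additionalObject;
        }

        String getGmlId() {
            return gmlId;
        }

        boolean isAdditionalObject() {
            return additionalObject;
        }
    }

In the iterator, a boolean flag toggled when the parser enters and leaves wfs:additionalObjects would decide which value is passed to the constructor (or written to the instance metadata).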

Contributor Author:

> Was your intent to inform the user about this restriction in the UI?

Initially I wanted to disable the checkbox, but I didn't find a way to do it; if you have any suggestion how to achieve that, I would be very happy to hear it.

  1. I have tried to understand, in the GmlReaderSettingsPage, whether there is any way to get the link and thereby check if the "resolvedepth" parameter is present.
  2. I tried to extend GmlReaderSettingsPage in the WFS UI package, but without success.
  3. A less ideal solution would be to add a note in the actual UI informing the user that the paging parameter won't be used in case "resolvedepth" has been used.

> It might be unexpected when used from another context, e.g. Gradle or CLI.

True.

> Did you explore whether you see some way to determine if an instance originated from additionalObjects or not?

I have tried, but again without getting the information about where the instance comes from. I will definitely try your suggestion.

@@ -578,7 +597,31 @@ public Instance next() {
        }

        Instance instance = iterator.next();
    -   return new StreamGmlInstance(instance, totalFeaturesProcessed++);

Contributor Author:

@stempler Could you help me understand how it is possible that, with totalFeaturesProcessed: 754 (which is the BP_Plan features) and additionalFeatureProcessed: 1844 (processed here), i.e. a sum of 2598, HS shows the following numbers:

[screenshot: instance counts shown in hale studio]

Thank you

Member:

I would suggest adding a unit test where a WFS FeatureCollection file with additional objects is read. That doesn't allow testing the pagination, but if you include at least one object with a duplicate GML ID, you can use it to test:

  1. the correct detection of the "additional objects"
  2. whether your mechanism for detecting GML ID duplicates works as expected

Once it has been ruled out that a problem is hidden there, the paging behavior can be analysed (see the sketch below).
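
A minimal, self-contained sketch of the duplicate-ID part of such a test; the class, the helper, and the seen-ID Set as detection mechanism are illustrative, not taken from the hale codebase:

    import static org.junit.Assert.assertFalse;
    import static org.junit.Assert.assertTrue;

    import java.util.HashSet;
    import java.util.Set;

    import org.junit.Test;

    public class DuplicateGmlIdCheckTest {

        // Returns true only for the first occurrence of a GML ID.
        static boolean firstOccurrence(Set<String> seen, String gmlId) {
            return seen.add(gmlId); // Set.add returns false for duplicates
        }

        @Test
        public void duplicateIdsAreDetected() {
            Set<String> seen = new HashSet<>();
            assertTrue(firstOccurrence(seen, "BP_Plan.754"));
            assertTrue(firstOccurrence(seen, "linkedObject.1"));
            // the same object delivered again inside wfs:additionalObjects
            assertFalse(firstOccurrence(seen, "BP_Plan.754"));
        }
    }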


    Iterable<QName> mapPropertiess = instance.getPropertyNames();
    for (QName qName : mapPropertiess) {
        if (qName.getLocalPart().equals("id") && qName.getPrefix().equals("gml")) {
Member:

The namespace prefix is not something that should be checked, because it can be an arbitrary value.
Here is an example where we check for a GML namespace.

Since the GML namespace used should be the same within the same data, I would suggest determining the property name only once, and after that using a more efficient approach than iterating all property names, by getting the property directly.
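
A rough sketch of that suggestion: match on the namespace URI instead of the prefix, and cache the resolved name so it is determined only once (the class and method names are illustrative):

    import javax.xml.namespace.QName;

    // Illustration only: resolve the gml:id property name once, by namespace URI.
    final class GmlIdNameResolver {

        private QName cached;

        QName resolve(Iterable<QName> propertyNames) {
            if (cached == null) {
                for (QName name : propertyNames) {
                    if ("id".equals(name.getLocalPart())
                            && name.getNamespaceURI().startsWith("http://www.opengis.net/gml")) {
                        // covers both GML up to 3.1 (…/gml) and GML 3.2 (…/gml/3.2)
                        cached = name;
                        break;
                    }
                }
            }
            return cached;
        }
    }

With the cached name, the ID could then be read directly via something like instance.getProperty(gmlIdName), instead of iterating all property names for every instance.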

Contributor Author:

There is something I don't understand.
In the GmlInstanceCollection the TypeDefinition def actually has a namespace URI starting with http://www.opengis.net/wfs,
but by the time we arrive in the WfsBackedGmlInstanceCollection, the instance no longer carries a namespace URI starting with http://www.opengis.net/wfs, only the QName {http://www.opengis.net/gml}id with prefix gml, so it cannot even be compared against a namespace such as http://www.opengis.net/wfs.

    boolean condition = (maxNumberOfFeatures != UNLIMITED
            && totalFeaturesProcessed >= maxNumberOfFeatures)
        || (size != UNKNOWN_SIZE && totalFeaturesProcessed >= size)
        || (totalFeaturesProcessed == size && !iterator.hasNext());
Member:

Why is the change to the condition with this additional term needed?

Contributor Author:

Because otherwise the additional objects found after the last main instance would be lost.

Member:

Thank you. I think it would be very valuable to have that as a comment on that term.
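
For instance, the term could carry the explanation directly; a sketch based on the diff above (not the final wording):

    boolean condition = (maxNumberOfFeatures != UNLIMITED
            && totalFeaturesProcessed >= maxNumberOfFeatures)
        || (size != UNKNOWN_SIZE && totalFeaturesProcessed >= size)
        // without this term, additional objects delivered after the last
        // main instance would be lost (see discussion above)
        || (totalFeaturesProcessed == size && !iterator.hasNext());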

@emanuelaepure10 (Contributor Author):

@stempler do you happen to remember if there is a test example somewhere that contains "additionalObjects"? Thanks

Successfully merging this pull request may close these issues.

[BUG]: If downloading data via WFS with parameter "resolvedepth" linked objects are not loaded