Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

signed EncodedValues #2473

Merged
merged 14 commits into from
Dec 6, 2021
Merged

signed EncodedValues #2473

merged 14 commits into from
Dec 6, 2021

Conversation

karussell
Copy link
Member

@karussell karussell commented Nov 19, 2021

This change makes it possible to store negative values too: SignedDecimalEncodedValue or SignedIntEncodedValue. E.g. for average_slope #2456.

We'll see if this causes a slow down.

The default value turns out to be problematic because we can map it from "0 to default" but if 0 is accidentally set the default value would be returned. And so we now only accept 0 or infinity as default. And we allow "0 as default" only if no negative values. Maybe we should remove it entirely and rely on explicitly setting the value in the TagParser, but this is probably a different PR.

@karussell karussell added this to the 5.0 milestone Nov 19, 2021
@karussell karussell changed the title initial version of signed EncodedValues SignedEncodedValues Nov 19, 2021
@karussell karussell changed the title SignedEncodedValues signed EncodedValues Nov 19, 2021
@easbar
Copy link
Member

easbar commented Nov 19, 2021

Thanks for this! Does 'signed encoded value' mean that the sign flips automatically when we read an edge in reverse? This is what is needed for average_slope: b6539cf, but maybe not for others?! And btw does this also mean there is no longer a (somewhat) easy option to extend an int encoded value to use 32bits?

@karussell
Copy link
Member Author

karussell commented Nov 19, 2021

Does 'signed encoded value' mean that the sign flips automatically when we read an edge in reverse? This is what is needed for average_slope

Ah, no. But will make it an option. I forgot this when I rewrote it more clean.

And btw does this also mean there is no longer a (somewhat) easy option to extend an int encoded value to use 32bits?

We could add this as an option too or do you mean this is possible with the current master? (when using minValue=0 this should behave identical to current master)

@easbar
Copy link
Member

easbar commented Nov 19, 2021

Ah, no. But will make it an option. I forgot this when I rewrote it more clean.

Ok, yes this would be useful, if it is possible without making this class too complex. Otherwise using a separate implementation for this feature would also be fine I guess.

We could add this as an option too or do you mean this is possible with the current master? (when using minValue=0 this should behave identical to current master)

I think the problem is already that the setInt method takes an int, so it is not possible to set anything larger than 2^31-1 in the first place. If this was a long we could set higher values (for the full 32 unsigned int range) and check the range internally (depending on the number of available bits). I got this info from @michaz recently when he wanted to implement an encoded value for OSM Ids.

@easbar
Copy link
Member

easbar commented Nov 19, 2021

But then also getInt would have to return a long and we would need to cast it back to an int in all kinds of places. So maybe this idea of a 32 bit int does not really work and if anything we should rather implement a LongEncodedValue... Or maybe just add set32IntFromLong/get32IntAsLong for those who want to use 32 bits and will care about casting to an int themselves?

@karussell
Copy link
Member Author

For the purpose of storing OSM IDs I would indeed prefer a separate SignedLongEncodedValue. And keep the normal Java "int" behaviour (Although the user would have always the choice to interpret negative int values as values > Integer.MAX_VALUE. Maybe this is now possible - have not tried it.)

@easbar
Copy link
Member

easbar commented Nov 19, 2021

Actually the maximum (way) OSM ID currently is only around 1 billion, so we do not even need 32 bits for this :) But a LongEncodedValue could be useful anyway.

@karussell
Copy link
Member Author

But then also getInt would have to return a long and we would need to cast it back to an int in all kinds of places

The usual usage via getInt would stay unchanged. Only if the user would need 32 bits he could in theory try minValue=Integer.MIN_VALUE or something and then convert to a long before using the stored int.

@easbar
Copy link
Member

easbar commented Nov 19, 2021

Can you explain why there even is/was this 31 bit restriction?

@karussell
Copy link
Member Author

karussell commented Nov 19, 2021

We could now lift the restriction and use 32 bits I guess (as we have now a signed int). But using more than 31 bits with an unsigned int as in master would not make much sense with int in Java. And using a long setter+getter would fix this problem but then it would make more sense to have a LongEncodedValue instead with actual 64 bit support.

@easbar
Copy link
Member

easbar commented Nov 19, 2021

But using more than 31 bits with an unsigned int as in master would not make much sense with int in Java

Oh ok I thought the whole idea of 'unsigned int' was that we use the 32nd bit to increase the (positive) value range instead of using it for the sign.

And using a long setter+getter would fix this problem but then it would make more sense to have a LongEncodedValue instead with actual 64 bit support.

I agree, especially since we can also use only 32 bits of a 64 bit long encoded value (just like we can only use 3 bits of an int encoded value currently).

*/
public final class SignedDecimalEncodedValue extends SignedIntEncodedValue implements DecimalEncodedValue {
private final double factor;
private final boolean defaultIsInfinity;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not look at this in detail yet, but would it be possible to just set an arbitrary value as default, not just choose between 0 and infinity?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we do this in master. But we do not use this flexibility (we only use 0 and infinity). And setting the default value will then also be possible via setting the value to 0, which is not intuitive as you expect that getInt(ref) returns 0 if you set it to 0 before. Even with infinity we have a problem and reject values with 0 (see last two commits) because there seems to be osm tags with e.g. width=0.

Probably we should try remove the default handing and set this in the TagParser only. But probably in a later commit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should try remove the default handing and set this in the TagParser only. But probably in a later commit.

Ok, yes. I'd like to assume that each TagParser sets a value for every edge, and if it does not this is an error and we must expect a 'garbage' value. The only other reasonable expectation would be that it is zero when nothing was set, but only because initially all bits of the underlying int[] or byte[] are zero. However, this would no longer work if we implemented a 'mapped' encoded value where 0 is mapped to something else, so let's just say not setting a value for an edge is an error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, yes. I'd like to assume that each TagParser sets a value for every edge

I thought a bit about this and currently for some EncodedValues in FlagEncoders we might skip the set call due to getAccess returning "SKIP". So a proper default was/is required in general and I'll keep the default mechanics for now. But we should probably work towards a default-free version of EncodedValues, i.e. ensuring every EncodedValue is called for every edge on import.

Copy link
Member

@easbar easbar Nov 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I meant we should aim for this, but currently this would require further changes that are beyond the scope of this PR. Also #2471 will help with this I think.

@easbar
Copy link
Member

easbar commented Nov 24, 2021

I wonder if there is a better name than SignedEncodedValue... We probably will never add an unsigned version of this again, so 'signed' might be a bit misleading. IntEncodedValue would be the best name maybe because this is exactly what it is (an encoded value that stores ints), but we use this name for the interface already. All the other implementations do not implement the interface directly but extend (Un)signedIntEncodedValue. So maybe call it IntEncodedValueImpl? Not sure...

@karussell
Copy link
Member Author

We probably will never add an unsigned version of this again, so 'signed' might be a bit misleading.

Yes, or maybe FactorIntEncodedValue because we might add a MappedIntEncodedValue? I don't like XYImpl that much :)

@easbar
Copy link
Member

easbar commented Nov 24, 2021

I don't like XYImpl that much :)

I like it when it makes sense, i.e. when there is or can be only one plausible implementation. But yes MappedIntEncodedValue would be another possibility. But probably this could be done using the same class? The current one is also a kind of 'mapped' int encoded value that maps the values linearly. Otherwise we could call the present one ScalableIntEncodedValue? FactorIntEncodedValue also works for me. Either way not crazy important :)

@karussell karussell merged commit 96a8919 into master Dec 6, 2021
@karussell karussell deleted the signed_ev branch December 6, 2021 20:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants