Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

index geo payload #366

Merged
merged 14 commits into from
Mar 20, 2022
Merged

index geo payload #366

merged 14 commits into from
Mar 20, 2022

Conversation

agourlay
Copy link
Member

@agourlay agourlay commented Mar 4, 2022

This PR adds a new indexing scheme for geo payloads #362

Notes:

  • the initial geo_index file has been renamed to geo_hash to make space for the new PersistedGeoMapIndex

@generall
Copy link
Member

generall commented Mar 9, 2022

WDYT degarding splitting of geo_index.rs into several files? 800 lines looks too much for a single file

@agourlay
Copy link
Member Author

agourlay commented Mar 9, 2022

@generall I agree with you sentiment, I will split the geohash functions from the geo indexing 👍

lib/segment/src/index/field_index/geo_index.rs Outdated Show resolved Hide resolved
lib/segment/src/index/field_index/geo_index.rs Outdated Show resolved Hide resolved
lib/segment/src/index/field_index/geo_index.rs Outdated Show resolved Hide resolved
lib/segment/src/index/field_index/geo_index.rs Outdated Show resolved Hide resolved
@agourlay agourlay force-pushed the index-geo-payload branch 2 times, most recently from 8634f4a to fec7d38 Compare March 10, 2022 15:00
values_count
} else {
// between 0 and 1 (there is at least one payload per point)
let payload_per_point = total_points_count as f64 / self.payload_count as f64;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be called point_per_payload, isn't it?

} else {
// between 0 and 1 (there is at least one payload per point)
let payload_per_point = total_points_count as f64 / self.payload_count as f64;
values_count.saturating_sub((values_count as f64 * payload_per_point) as usize)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be used as some approximation to expected value, but not min

primary_clauses: vec![],
min,
exp: expected_count,
max: expected_count,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But max != expected in this case. For max we should guarantee, that it is impossible to match more than this amount of points with this query

values_count.saturating_sub((values_count as f64 * payload_per_point) as usize)
};
// don't overflow max number of points
let expected_count = total_points_count.min(values_count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why?

encode((lon, lat).into(), GEOHASH_MAX_LENGTH)
}

pub fn geo_hash_to_box(geo_hash: &GeoHash) -> GeoBoundingBox {
let rectangle = decode_bbox(geo_hash).unwrap();
let top_left = GeoPoint {
lat: rectangle.max().x,
lon: rectangle.min().y,
lat: rectangle.max().y,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mapping x/y to lat/lon is so error prone 😞

@generall generall marked this pull request as ready for review March 17, 2022 16:48
@generall generall requested a review from gvelo March 17, 2022 16:59
@agourlay agourlay linked an issue Mar 17, 2022 that may be closed by this pull request
if n < 1.0 {
return 1.0; // By definition
}
(2. * PI * n).sqrt().ln() + n * (n / E).ln()
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the std lib defines also TAU that we could use here

    /// The full circle constant (τ)
    ///
    /// Equal to 2π.
    #[stable(feature = "tau_constant", since = "1.47.0")]
    pub const TAU: f64 = 6.28318530717958647692528676655900577_f64;

Copy link
Member Author

@agourlay agourlay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@generall thank you very much for pushing this to the end line!

The PR is good to go from my perspective, there are only minor comments remaining.

I can't approve officially this PR in Github as I have created it 🙃

@generall generall merged commit 34048c6 into master Mar 20, 2022
@generall generall deleted the index-geo-payload branch April 9, 2022 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[geo indexing] Geo payload indexes
2 participants