CRS comparison is extremely expensive #3022

Closed
metasim opened this issue Jul 15, 2019 · 6 comments · Fixed by #3039

Comments

@metasim
Member

metasim commented Jul 15, 2019

In RasterFrames we are having major performance issues with comparisons between CRSs, and are almost out of hacks to work around them. I suspect our usage patterns have broken past assumptions on this. Here's some more context, including some profiling results: locationtech/rasterframes#134.

In RasterFrames we use a type called ProjectedRasterTile, which is what gets pushed around the most through the API. This results in one or more CRSs being implicitly included in every row. When joins happen, CRSs are invariably involved and CRS.equals is usually called, which is extremely expensive. Hence the use of LazyCRS. But determining != between CRSs is still very expensive.
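
To illustrate why a lazy wrapper helps with `==` but not with `!=`, here is a minimal sketch. It is not the actual LazyCRS or GeoTrellis code: `ParsedCrs`, `ParseStats`, and `LazyCrsSketch` are hypothetical names, and the "expensive" comparison is simulated by parsing a proj4-style string into a parameter map and counting the parses.

```scala
// Hypothetical counter so the cost of parsing is observable.
object ParseStats {
  var count: Int = 0
}

// Hypothetical stand-in for a CRS whose equality needs a costly parse.
// This is NOT the GeoTrellis CRS type.
final class ParsedCrs(val proj4: String) {
  // Parsed lazily, at most once per instance.
  lazy val params: Map[String, String] = {
    ParseStats.count += 1
    proj4.trim.split("\\s+").map { token =>
      val kv = token.stripPrefix("+").split("=", 2)
      kv(0) -> (if (kv.length > 1) kv(1) else "")
    }.toMap
  }

  override def equals(other: Any): Boolean = other match {
    case that: ParsedCrs => params == that.params
    case _               => false
  }

  override def hashCode: Int = params.hashCode
}

// Wrapper that compares encoded strings first and parses only as a fallback.
final class LazyCrsSketch(val encoded: String) {
  lazy val delegate: ParsedCrs = new ParsedCrs(encoded)

  override def equals(other: Any): Boolean = other match {
    case that: LazyCrsSketch =>
      // Cheap path: identical strings are equal without any parsing.
      // Expensive path: different strings may still describe the same CRS,
      // so inequality can only be decided by the full comparison.
      encoded == that.encoded || delegate == that.delegate
    case _ => false
  }

  // Must agree with equals, so it cannot be derived from the raw string alone.
  override def hashCode: Int = delegate.hashCode
}
```

In this sketch, two wrappers built from the same string compare equal without parsing anything, while two wrappers built from different strings always force a parse on both sides, whether or not they turn out to be equal. That asymmetry is the `!=` cost described above.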

I've thought about hitting the GT codebase to construct a fix, but that's pretty delicate code and I'm wary about doing it on my own. Interested in discussion on this.

@pomadchin
Member

@metasim is it related to #2890? Do you also have some benchmarks somewhere? They would really help us get this done by the 3.0 release, as it sounds very critical.

@metasim
Member Author

metasim commented Jul 15, 2019

@pomadchin #2890 is a separate use case. Reading is indeed slow, but this ticket is about equals, which is done after reading. #2890 is ameliorated somewhat by the caching that goes on.

Here are the benchmarks from RasterFrames:

[Image: benchmark results, standard GeoTrellis]

[Image: benchmark results, with RasterFrames hacks]

(Sorry, I changed the number of benchmarks during the process.)

pomadchin added this to the GT 3.0 milestone Jul 15, 2019

@pomadchin
Member

@metasim appreciate it! Do you know whether it has been in the codebase forever, or is it some sort of regression?

@metasim
Member Author

metasim commented Jul 15, 2019

@pomadchin In the codebase forever. We broke it with DataFrames containing a million CRSs 😈.

@metasim
Member Author

metasim commented Jul 15, 2019

Again, see here for the hacks we've been using:

https://github.com/locationtech/rasterframes/blob/develop/core/src/main/scala/org/locationtech/rasterframes/model/LazyCRS.scala
