CellType reification is overly slow and memory intensive #343

Closed
metasim opened this issue Sep 12, 2019 · 4 comments · Fixed by #344
Labels: bug (When it really isn't a "feature".), performance (We can't wait any faster)

Comments

metasim commented Sep 12, 2019

The true source of the issue is in GeoTrellis, but until a GeoTrellis release with a fix is published we need a workaround here.

metasim self-assigned this Sep 12, 2019
metasim added the bug label Sep 12, 2019

metasim commented Sep 12, 2019

Backtrace from profile:

Stack Trace	TLABs	Total TLAB Size(bytes)	Pressure(%)
java.util.Arrays.copyOfRange(char[], int, int)	24,402	192,152,047,392	92.372
   java.lang.String.<init>(char[], int, int)	24,374	191,878,530,344	92.24
      java.lang.String.substring(int, int)	23,174	190,519,159,040	91.587
         scala.collection.immutable.StringLike$class.split(StringLike, char)	23,113	189,923,312,952	91.3
            scala.collection.immutable.StringOps.split(char)	23,113	189,923,312,952	91.3
               geotrellis.raster.CellTypeEncoding$class.name(CellTypeEncoding)	23,058	189,299,881,744	91.001
                  geotrellis.raster.CellTypeEncoding$uint16raw$.name()	4,243	34,774,393,120	16.717
                     geotrellis.raster.FixedNoDataEncoding$class.unapplySeq(FixedNoDataEncoding, String)	4,242	34,768,360,496	16.714
                        geotrellis.raster.CellTypeEncoding$uint16raw$.unapplySeq(String)	4,242	34,768,360,496	16.714
                           geotrellis.raster.CellType$.fromName(String)	4,242	34,768,360,496	16.714
                              org.locationtech.rasterframes.encoders.StandardSerializers$$anon$6.from(Object, CatalystSerializer$CatalystIO)	4,242	34,768,360,496	16.714
                                 org.locationtech.rasterframes.encoders.StandardSerializers$$anon$6.from(Object, CatalystSerializer$CatalystIO)	4,242	34,768,360,496	16.714
                                    org.locationtech.rasterframes.encoders.CatalystSerializer$class.fromInternalRow(CatalystSerializer, InternalRow)	4,242	34,768,360,496	16.714
                                       org.locationtech.rasterframes.encoders.StandardSerializers$$anon$6.fromInternalRow(InternalRow)	4,242	34,768,360,496	16.714
                                          org.locationtech.rasterframes.encoders.CatalystSerializer$WithFromInternalRow$.to$extension(InternalRow, CatalystSerializer)	4,242	34,768,360,496	16.714
                                             org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(InternalRow, int, CatalystSerializer)	4,242	34,768,360,496	16.714
                                                org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(Object, int, CatalystSerializer)	4,242	34,768,360,496	16.714
                                                   org.locationtech.rasterframes.model.TileDataContext$$anon$1.from(Object, CatalystSerializer$CatalystIO)	4,242	34,768,360,496	16.714
                                                      org.locationtech.rasterframes.model.TileDataContext$$anon$1.from(Object, CatalystSerializer$CatalystIO)	4,242	34,768,360,496	16.714
                                                         org.locationtech.rasterframes.encoders.CatalystSerializer$class.fromInternalRow(CatalystSerializer, InternalRow)	4,242	34,768,360,496	16.714
                                                            org.locationtech.rasterframes.model.TileDataContext$$anon$1.fromInternalRow(InternalRow)	4,242	34,768,360,496	16.714
                                                               org.locationtech.rasterframes.encoders.CatalystSerializer$WithFromInternalRow$.to$extension(InternalRow, CatalystSerializer)	4,242	34,768,360,496	16.714
                                                                  org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(InternalRow, int, CatalystSerializer)	4,242	34,768,360,496	16.714
                                                                     org.locationtech.rasterframes.encoders.CatalystSerializer$CatalystIO$$anon$2.get(Object, int, CatalystSerializer)	4,242	34,768,360,496	16.714
                                                                        org.locationtech.rasterframes.tiles.InternalRowTile.cellContext()	4,242	34,768,360,496	16.714
                                                                           org.locationtech.rasterframes.tiles.InternalRowTile.cols()	4,228	34,658,454,496	16.661
                                                                           org.locationtech.rasterframes.tiles.InternalRowTile.rows()	13	103,990,960	0.05
                                                                           org.locationtech.rasterframes.tiles.InternalRowTile.realizedTile$lzycompute()	1	5,915,040	0.003
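
Reading the trace from the bottom up: each call to InternalRowTile.cols() deserializes the tile's cell context, which reaches CellType.fromName, which in turn appears to recompute encoding names via String.split, allocating fresh substrings on every lookup. One common way to sidestep this kind of repeated cost is to memoize the parse result per distinct name. The sketch below only illustrates that idea: SketchCellType, SlowParser, and CachedParser are hypothetical stand-ins, not GeoTrellis or RasterFrames APIs, and this is not necessarily the change made in #344.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

// Hypothetical stand-in for a cell type; NOT the GeoTrellis class.
final case class SketchCellType(name: String, bits: Int)

object SlowParser {
  // Counts how often the expensive path runs (for demonstration only).
  val calls = new AtomicLong(0)

  // Assumed toy registry of known names and bit widths.
  private val known = Seq("uint8raw" -> 8, "uint16raw" -> 16, "int32raw" -> 32)

  // Mimics the shape of the hot path in the profile: every lookup re-derives
  // each candidate's name with a string split, allocating new substrings.
  def fromName(name: String): SketchCellType = {
    calls.incrementAndGet()
    known.collectFirst {
      case (n, bits) if n.split('$').last == name => SketchCellType(n, bits)
    }.getOrElse(throw new IllegalArgumentException(s"Unknown cell type: $name"))
  }
}

object CachedParser {
  // Thread-safe map from name to the already-parsed value.
  private val cache = TrieMap.empty[String, SketchCellType]

  // Only the first lookup of a given name pays for the slow parse.
  // Under contention the parse may run more than once, but a single
  // value is stored and returned afterwards.
  def fromName(name: String): SketchCellType =
    cache.getOrElseUpdate(name, SlowParser.fromName(name))
}
```

A dataset typically contains only a handful of distinct cell type names, so a cache like this stays tiny while removing the per-row parsing from the hot path.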

metasim commented Sep 12, 2019

Before: [image: Screen Shot 2019-09-12 at 1 29 16 PM]

After: [image: Screen Shot 2019-09-12 at 1 42 22 PM]

vpipkt added the performance label Sep 12, 2019
metasim added a commit to s22s/rasterframes that referenced this issue Sep 13, 2019

metasim commented Sep 13, 2019

@pomadchin FYI: I'm planning on creating a PR for GeoTrellis (GT).

metasim closed this as completed Sep 13, 2019

vpipkt commented Sep 13, 2019

metasim added a commit to s22s/rasterframes that referenced this issue Sep 13, 2019
* develop: (254 commits)
  Incorporated PR feedback.
  Make python RasterSourceTest.test_list_of_list_of_str clearer, more stable
  Propagate errors encountered in RasterSourceToRasterRefs. Closes locationtech#267.

  Updated release notes.
  Switched Explode tiles to use UnsafeRow for slight improvement on memory pressure. Reworked TileExplodeBench
  Changed CatalystSerialize implementations to store schemas as fields rather than methods.
  Benchmark and fix for CellType reification issue. Closes locationtech#343
  PR feedback edits.
  Fleshed out details on using Scala. Closes locationtech#324
  Fixes locationtech#338.
  Tweaked parquet I/O tests to trigger UDT issue.
  Normalize RasterSourceDataSource param names between python and SQL
  PR feedback
  Run python tile exploder test for projected raster
  Fix for locationtech#333 and additional tests in that vein.
  Add failing unit test for issue 333, error in rf_agg_local_mean
  Updated ExplodeTiles to work with proj_raster type.
  Ignoring RGB composite tests until next round of improvements.
  IT test build fix.
  Incremental work on refactoring aggregate raster creation.
  ...