New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't constrain GridBounds size to IntMax x IntMax #2292

Merged
merged 4 commits into from Jul 25, 2017

Conversation

Projects
None yet
4 participants
@fosskers
Contributor

fosskers commented Jul 20, 2017

Issues

  • size returning a Long instead of Int is likely backwards incompatible.
  • I wanted to make coords return an Iterator instead of a Stream, but one single function (RDDKernelDensity.pointFeatureToSpatialKey) in GT requires the output of a for over coords
    to be a Seq, which Array can be cast to and Iterator cannot.
  • I ingested a 60k-by-60k GeoTiff after making this change, and while the ingest itself was very strange (took forever to process, was all single core, no I didn't mess up the Spark config), it did succeed.

Closes #2265 .

def coords: Stream[(Int, Int)] = for {
row <- Stream.range(0, height)
col <- Stream.range(0, width)
} yield (row, col)

This comment has been minimized.

@fosskers

fosskers Jul 20, 2017

Contributor

I'd really love to salvage the ability to use Iterator here.

@lossyrob

This comment has been minimized.

Member

lossyrob commented Jul 20, 2017

The problem here of course is that this breaks the API. Such a minor break, but still breaks it.
If we were to release this as a bugfix release, because it's fixing a bug, the semver gods will frown down upon us.

Client code like:

val gb: GridBounds = ???
val gridSize = gb.size

would be fine, but code like

val gb: GridBounds = ???
val gridSize:Int = gb.size

would fail.

It might be the case that exactly 0 codebases have code that would break on this change. But then again, maybe a million codebases. Or like 2.

Now that we switched to semver how do we handle this? Any thoughts from others? @hectcastro? @echeipesh? @metasim?

@fosskers

This comment has been minimized.

Contributor

fosskers commented Jul 20, 2017

Regarding appeasing the SemVer gods, I suppose we'd have to wait to merge until we're ready to push more 2.0 features through?

@fosskers

This comment has been minimized.

Contributor

fosskers commented Jul 20, 2017

The Array => Stream change is also breaking. They're both collections, but they don't support exactly the same operations. GridBounds.coords is never used this way in GT itself, but technically there was nothing stopping users from doing:

val gbs: GridBounds = ???
val coords: Array[(Int, Int)] = dbs.coords

println(coords(5)) // caution to the wind. Also a senseless operation.

Open question: do we have a right to overlook API breaks when we have sufficient belief that a certain use case (like this one) would never occur in reality? More of me wants to say "no" than "yes", but the "yes" is there nonetheless.

@fosskers fosskers changed the title from Don't constrain GridBounds size to IntMax x IntMax to [WIP] Don't constrain GridBounds size to IntMax x IntMax Jul 20, 2017

@echeipesh

This comment has been minimized.

Contributor

echeipesh commented Jul 20, 2017

GridBounds floats to the surface of the API way too often for us to pretend that its somehow a private class. I don't think I've ever written even a small module that didn't and up holding GridBounds in some way.

I'd be tempted to add: .sizeLong: Long and .cordsIterator: Iterator[(Int, Int)] and try to switch the breaking code to use the new methods. This could spider out but then again it could be tidy.

A cleaner version of above is to make a new class LongGridBounds or some such and convert to it to perform the above operations.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Jul 20, 2017

... to pretend that its somehow a private class.

Luckily I wasn't suggesting that. Can you think of a reason someone would honestly index the coords Array as in my example?

Alarm bells going off regarding LongGridBounds. The bug is occurring in the original class as used in vanilla ingests, so I don't think LGB would solve that.

The .*Long varieties might work, but they birth another hidden bug: the original coords Array being too big and throwing. Okay, so we bring in the invariant back in to protect against that... and we're back to square one.

@echeipesh

This comment has been minimized.

Contributor

echeipesh commented Jul 20, 2017

Luckily I wasn't suggesting that. Can you think of a reason someone would honestly index the coords Array as in my example?

I didn't not mean that as an accusation, just not an option.

I would disagree that original .coords throwing when its too big is a hidden bug, thats pretty much a standard data-structure choice of sequence vs. an iterator. And we're free to @depricate it with a meaningful warning.

Edit: And I agree there is totally no reason to index into array of coords, the discussion is purely around being good software carpenters as prescribed by semver.

@fosskers

This comment has been minimized.

Contributor

fosskers commented Jul 20, 2017

thats pretty much a standard data-structure choice of sequence vs. an iterator.

Yeah true. So if it blows, we say "told you so, here's the Iterator variety for that". @deprecated feels right.

The Stream/Iterator issue is still outstanding though.

@fosskers fosskers force-pushed the fix/cgw/big-gridbounds branch from 1a886a0 to 49e5f63 Jul 20, 2017

def size = width * height
def isEmpty = size == 0
def size: Int = width * height

This comment has been minimized.

@fosskers

fosskers Jul 20, 2017

Contributor

I tried to use the @migration annotation which is mentioned in the official Scaladoc docs, but it doesn't compile. There's no mention of it in the main scala repo.

@@ -50,7 +50,7 @@ object CutTiles {
val extent = inKey.extent
logger.debug(s"Cutting $inKey of ${tile.dimensions} cells covering $extent")
mapTransform(extent)
.coords.toIterator

This comment has been minimized.

@fosskers

fosskers Jul 20, 2017

Contributor

Naughty!

Provide `GridBounds.coordsIter` and use it where possible
- The original `coords` has been marked deprecated, and will be switched with
  `coordsIter` in 2.0.

@fosskers fosskers force-pushed the fix/cgw/big-gridbounds branch from 49e5f63 to 2ecceff Jul 21, 2017

@hectcastro

This comment has been minimized.

Contributor

hectcastro commented Jul 21, 2017

I agree with Eugene's instincts around sizeLong and LongGridBounds. It would be great if we could come up with replacement function or class names that we'd want to keep moving forward (e.g., coords -> coordinates).

size seems a lot trickier. I wasn't able to come up with a good alternative name that didn't lead to confusion. In this case, we may have no choice but to go with something like sizeLong, mark size for deprecation (but note that its type will change vs. actual function removal), rewrite size to return type Long in 2.0, and mark sizeLong for deprecation in 3.0.

As far as the high level plan moving forward, I think the process would be:

  • Publish the new functions with deprecation warnings on the old functions in a minor release
  • Highlight the above change in the CHANGELOG, or anywhere else the 1.2 release Is discussed
  • Schedule removal of the deprecated functions in a major release
@fosskers

This comment has been minimized.

Contributor

fosskers commented Jul 21, 2017

I can add a deprecation warning to size (already done so for sizeLong), and update the CHANGELOG to include these warnings.

@fosskers fosskers changed the title from [WIP] Don't constrain GridBounds size to IntMax x IntMax to Don't constrain GridBounds size to IntMax x IntMax Jul 21, 2017

@fosskers

This comment has been minimized.

Contributor

fosskers commented Jul 21, 2017

Figuring out test failures...

@fosskers fosskers force-pushed the fix/cgw/big-gridbounds branch from 3d987f2 to d1b9b39 Jul 21, 2017

@echeipesh echeipesh added this to the 1.2 milestone Jul 25, 2017

@echeipesh echeipesh merged commit c9db248 into master Jul 25, 2017

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
ip-validation
Details

@fosskers fosskers deleted the fix/cgw/big-gridbounds branch Aug 17, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment