Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rio-calc, take 1 #272

Merged
merged 20 commits into from Mar 20, 2015
Merged

rio-calc, take 1 #272

merged 20 commits into from Mar 20, 2015

Conversation

sgillies
Copy link
Member

@sgillies sgillies commented Feb 7, 2015

Towards closing #175.

Usage, for now, like this:

$ rio calc "0.10*{1} + 125" tests/data/shade.tif out.tif

Results in a new hillshade with shade values of 125 instead of 0.

Like this:

  $ rio calc "0.10*{1} + 125" tests/data/shade.tif out.tif

Results in a new hillshade with shade values of 125 instead of 0.
@ljbade
Copy link

ljbade commented Feb 7, 2015

Is the {} indicating a band? I'd prefer an letter instead as it makes it clearer it is a variable. Currently looks like you are just doing 0.1 * 1 + 125

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.07%) to 96.06% when pulling e49822e on TI-35 into dc0307c on master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.07%) to 96.06% when pulling e49822e on TI-35 into dc0307c on master.

@ljbade
Copy link

ljbade commented Feb 7, 2015

Also how you specify which band(s) the result is assigned to? perhaps a = in there like {1}=..., but make it optional incase you just want the same calculation all all the bands.

@sgillies
Copy link
Member Author

sgillies commented Feb 8, 2015

@ljbade I'm experimenting with curly braces because I like GNU Parallel's positional replacement syntax. To specify a specific raster band (a 2-D numpy ndarray), hybridization of Parallel and Numpy indexing syntax might work: {1,1} as the first band of the first input dataset.

Not sure about output assignment yet.

I'm keeping different functions per band in mind, that's very important. I've seen good results using Python's multithreading module with Rasterio and we should use it here, definitely.

@ljbade
Copy link

ljbade commented Feb 8, 2015

Interesting. I look forward to use this for my normal map stuff. Save me ~600GB of intermediate data.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.03%) to 96.1% when pulling 7405a99 on TI-35 into dc0307c on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.38%) to 95.75% when pulling d8f3d9d on TI-35 into dc0307c on master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.38%) to 95.75% when pulling d8f3d9d on TI-35 into dc0307c on master.

Also test reducing a 3-band file to 1.
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.38%) to 95.75% when pulling 66ae338 on TI-35 into dc0307c on master.

@coveralls
Copy link

Coverage Status

Coverage decreased (-0.38%) to 95.76% when pulling dfae6af on TI-35 into 99e9689 on master.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage decreased (-0.38%) to 95.76% when pulling dfae6af on TI-35 into 99e9689 on master.

@sgillies
Copy link
Member Author

sgillies commented Feb 8, 2015

Here's an example of command syntax for making a fake panchromatic image from the RGB.byte.tif file in rasterio/tests. One command line (ended by ;) results in one output band.

$ rio calc "({1,1} + {1,2} + {1,3})/3;" tests/data/RGB.byte.tif /tmp/faux-pan.tif --dtype uint8

faux-pan

rio-calc could be used to copy a dataset like this:

$ rio calc "{1}" tests/data/RGB.byte.tif /tmp/copy.tif --dtype uint8

Here's an example of making an RGB image from our greyscale hillshade. Three command lines, one for each output band.

$ rio calc "{1,1} + 125; {1,1}; {1,1};" tests/data/shade.tif /tmp/red-shade.tif --dtype uint8

red-shade

Yes? No? As I commented in the code, writing an actual parser for the commands needs to be done.

Other TODOs:

  • Consider other representations of datasets and bands. Let me come straight out and say it: I don't like gdal_calc's and think we can do better. I also don't think what I'm doing above is the only better syntax.
  • Syntax for using other Numpy functions?
  • Syntax for new arrays (like numpy.zeros_like() and numpy.ones_like())?
  • Import of other modules using entry points?

@coveralls
Copy link

Coverage Status

Coverage increased (+0.18%) to 96.32% when pulling aa5c81f on TI-35 into 99e9689 on master.

@ljbade
Copy link

ljbade commented Feb 10, 2015

hmm for some reason i still dont get the parallel syntax, just very confusing in complicated expressions with several input variables, i still prefer letters as it is closer to real math syntax

@ljbade
Copy link

ljbade commented Feb 10, 2015

the qgis thing might help with understandability tho

@coveralls
Copy link

Coverage Status

Coverage increased (+0.47%) to 96.61% when pulling c6d8bfe on TI-35 into 99e9689 on master.

Sean Gillies and others added 5 commits February 12, 2015 13:52
Old syntax traded in for an easy to parse lisp-like syntax.
No need for a temporary variable and much more secure.
Conflicts:
	CHANGES.txt
	requirements-dev.txt
@coveralls
Copy link

Coverage Status

Coverage increased (+0.15%) to 96.54% when pulling 82dec1c on TI-35 into 3f4d5ed on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.15%) to 96.54% when pulling 89ed798 on TI-35 into 3f4d5ed on master.

First I enhanced fillnodata() so that it could use direct band
references. After that, I extended snuggs, giving rio-calc access
to band() and fillnodata() functions.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.17%) to 96.57% when pulling a47904c on TI-35 into 3f4d5ed on master.

@sgillies
Copy link
Member Author

I've patched @snorfalorpagus's fillnodata() function into rio-calc by extending snuggs. This also required extending snuggs to add rasterio.band() (explained below).

Reminder, our test dataset is a RGB trapezoid surrounded by nodata pixels:

rgb byte

Running the command

$ rio calc "(asarray (fillnodata (band 1 1)) (fillnodata (band 1 2)) (fillnodata (band 1 3)))" \
> tests/data/RGB.byte.tif --dtype uint8 /tmp/out.fill.tif && \
> open /tmp/out.fill.tif

Results in:

fillnodata

Crazy, but exactly the crazy result you'd expect from GDAL's nodata filling algorithm in this case.

Unlike (read 1 1), which makes a Numpy array, (band 1 1) gets a direct reference to a GDAL band and bypasses Numpy array creation. That is to say that the command above fairly rips and is just as fast as using gdal_fillnodata.py.

@brendan-ward @perrygeo what do you think? This is ready to be merged if you agree.

@snorfalorpagus
Copy link
Contributor

@sgillies This looks really good. I've not used lisp before, but I like the look of the syntax.

If you add support for importing functions from other modules (e.g. scipy) it might be worth considering some sort of .rc file, so that a user can define a set of commonly used macros.

@sgillies
Copy link
Member Author

@snorfalorpagus yes, extension from the command line or .rc would be great.

I also plan to add map and band functions, letting us replace the command above with (asarray (map (fillnodata (bands 1)))), as soon as this lands.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.18%) to 96.58% when pulling 361f4f5 on TI-35 into 3f4d5ed on master.

Sean Gillies added 2 commits February 18, 2015 09:47
Sieving of images read directly from GDAL was broken, but is now
fixed, with tests.
@coveralls
Copy link

Coverage Status

Coverage increased (+0.22%) to 96.61% when pulling 0fc110b on TI-35 into 3f4d5ed on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.22%) to 96.61% when pulling fcb6153 on TI-35 into 3f4d5ed on master.


def read_array(ix, subix=None, dtype=None):
"""Change the type of a read array"""
arr = snuggs._ctx.lookup(ix, subix)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sgillies 2 comments here:

  • looks like you are using a internal API from the signature ._ctx; if you are using it outside snuggs, it probably should be a public API, e.g., snuggs.ctx.lookup() or similar. (going here w/ the convention that leading underscores denote internal functions / objects)
  • some extra validation would be nice here, with simpler error messages. I can make this produce a long exception using:

$ rio calc "(read 2)" tests/data/RGB.byte.tif /tmp/test.tif

read: invalid band index would be a fine message in this case.

@brendan-ward
Copy link
Contributor

@sgillies Great work here! A few overall comments:

  • I think you've got nearly the right level of detail in cli.rst for overviews of all the available rio commands, and shouldn't add a ton more detail there for calc, but linking from there to a dedicated documentation file for calc commands would be super helpful, and give room for many more examples. For example, it would be good to have an example using band (e.g., above).
  • a cross-reference to the syntax in snuggs would be good for the documentation, with caveats about what commands should and should not (if any) be used in this context.
  • better validation of commands from snuggs - so that simple warnings are thrown when commands are not valid or known (mostly a snuggs issue but also relevant here since you've extended its commands here). For example, $ rio calc "(bogus)" tests/data/RGB.byte.tif /tmp/test.tif gives a long exception, when a simple invalid command: bogus would probably be appropriate.

I find myself belaboring validation here because given the complexity of the commands you can issue, it is easy to get things wrong, and simple error messages would make it a whole lot easier to find simple mistakes.

I'm also struggling with take vs read and band (which is it using behind the scenes: read or band?). I would almost prefer that instead of a new command like take that read take either a aliased dataset or index: (read first_raster 1) and (read 1 1) could be synonymous if we use --name first_raster=tests/data/RGB.byte.tif.

@sgillies
Copy link
Member Author

@brendan-ward thanks for the comments. I'm working on more documentation this evening. I totally agree about more user-friendly error messages.

@sgillies
Copy link
Member Author

Using the new friendly errors from snuggs (1.2 out soon), we get:

$ rio calc "band 1 1) 42)" tests/data/shade.tif --dtype uint8 /tmp/out.sieved.tif && open /tmp/out.sieved.tif
Expression Error:
  band 1 1) 42)
  ^
Expected "("

and

$ rio calc "(steve (band 1 1) 42)" tests/data/shade.tif --dtype uint8 /tmp/out.sieved.tif && open /tmp/out.sieved.tif
Expression Error:
  (steve (band 1 1) 42)
   ^
'steve' is not a function or operator

Tests for these coming up.

@brendan-ward Enhancing read... maybe. I want to reserve the option of making dataset reading lazier in the future and I think that means keeping read and take separate. take operates on ndarrays (not files) and is the only means of indexing we have here and allows reordering of bands, etc.

band is the shortcut that is only possible to use with the GDAL algorithm-backed functions like sieve and fillnodata. I'll look for a way to hide this without using too much magic.

@brendan-ward
Copy link
Contributor

@sgillies Ok on read, band, and take; points are well taken about lazy reading and performance - but I think documentation to explain them well will go a long way! I certainly was confused and others could well become so in the future.

Thanks for the snuggs fixes. Sort of begs the question of why steve isn't yet implemented though. :)

@sgillies sgillies merged commit fcb6153 into master Mar 20, 2015
@brendan-ward brendan-ward deleted the TI-35 branch May 28, 2015 04:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants