Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

user-friendly stack python function #1273

Open
choldgraf opened this issue Feb 7, 2018 · 11 comments
Open

user-friendly stack python function #1273

choldgraf opened this issue Feb 7, 2018 · 11 comments

Comments

@choldgraf
Copy link
Contributor

choldgraf commented Feb 7, 2018

We're using rasterio to teach students mapping analytic pipelines, and have found that we consistently use some kind of python "stack" function. I know that rasterio has a stack function already, but this doesn't feel user-facing to me. There's no docstring and the parameters seem quite complicated. Has the community discussed having a helper function that only takes a list of rasters and attempts to stack them into a single raster file? E.g., something like:

def stack(band_paths, out_path, return_raster=True):
    """Stack a set of bands into a single file.
    
    Parameters
    ----------
    bands : list of file paths
        A list with paths to the bands you wish to stack. Bands
        will be stacked in the order given in this list.
    out_path : string
        A path for the output file.
    return_raster : bool
        Whether to return a refernce to the opened output
        raster file.
    """
    # Read in metadata
    first_band = rio.open(band_paths[0], 'r')
    meta = first_band.meta.copy()

    # Replace metadata with new count and create a new file
    counts = 0
    for ifile in band_paths:
        with rio.open(ifile, 'r') as ff:
            counts += ff.meta['count']
    meta.update(count=counts)
    with rio.open(out_path, 'w', **meta) as ff:
        for ii, ifile in tqdm(enumerate(band_paths)):
            bands = rio.open(ifile, 'r').read()
            if bands.ndim != 3:
                bands = bands[np.newaxis, ...]
            for band in bands:
                ff.write(band, ii+1)

    if return_raster is True:
        return rio.open(out_path)

I'm sure there are edge cases I'm not thinking about above, but wanted to mention this in case others have the same feeling.

@lwasser
Copy link
Contributor

lwasser commented Mar 6, 2018

funny enough i'm working with @choldgraf and was searching for raster stack functionality in rasterio given the code we are working on for our courses. Is there any chance that something like the above or something outward facing would be considered for this package. i'm going to rope in @carsonfarmer as he may also have some ideas on this. we stack image data over and over in our field and it would be very useful to be able to stack data with the ultimate goal of plotting and even cropping / plotting. i'd love everyone's advice on how to do this best. i thought the above suggestion was clever as it maintains spatial information vs numpy based.

@sgillies
Copy link
Member

sgillies commented Mar 6, 2018

Please understand that rasterio.rio.stack() is meant to be used as a command line function and in that context has pretty good usage help.

$ rio stack --help
Usage: rio stack [OPTIONS] INPUTS... OUTPUT

  Stack a number of bands from one or more input files into a multiband
  dataset.

  Input datasets must be of a kind: same data type, dimensions, etc. The
  output is cloned from the first input.

  By default, rio-stack will take all bands from each input and write them
  in same order to the output. Optionally, bands for each input may be
  specified using a simple syntax:

    --bidx N takes the Nth band from the input (first band is 1).

    --bidx M,N,0 takes bands M, N, and O.

    --bidx M..O takes bands M-O, inclusive.

    --bidx ..N takes all bands up to and including N.

    --bidx N.. takes all bands from N to the end.

  Examples, using the Rasterio testing dataset, which produce a copy.

    rio stack RGB.byte.tif -o stacked.tif

    rio stack RGB.byte.tif --bidx 1,2,3 -o stacked.tif

    rio stack RGB.byte.tif --bidx 1..3 -o stacked.tif

    rio stack RGB.byte.tif --bidx ..2 RGB.byte.tif --bidx 3.. -o stacked.tif

Options:
  -o, --output PATH            Path to output file (optional alternative to a
                               positional arg).
  -f, --format, --driver TEXT  Output format driver
  -b, --bidx TEXT              Indexes of input file bands.
  --rgb                        Set RGB photometric interpretation.
  --force-overwrite            Always overwrite an existing output file.
  --co, --profile NAME=VALUE   Driver specific creation options.See the
                               documentation for the selected output driver
                               for more information.
  --help                       Show this message and exit.

@lwasser @choldgraf there's a precedent for extracting the core of CLI commands like rio-stack and making new rasterio functions. The dataset_features() function is the newest one: https://github.com/mapbox/rasterio/blob/master/rasterio/features.py#L517 and the model for how these interfaces should look: these functions should take dataset objects as arguments, not paths, for the reasons explained at http://python-patterns.guide/gang-of-four/factory-method/#dodge-use-dependency-injection (I think this example is very good). A stack() function in Rasterio should, in my opinion, look like this.

def stack(sources, dest):
    """Stack bands from a number of sources
    
    Parameters
    ----------
    sources : sequence of dataset objects
        Opened in 'r' mode.
    dest : dataset object
        Opened in 'w' mode.
    """
    ...

I realize that there is a problem here for a stack function in that the number of output bands, and therefore the arguments used to open the output file, depends on the inputs. The good news is that it's only a few lines of user code to sort this out.

import contextlib

with contextlib.ExitStack() as stack:
    sources = [stack.enter_context(rasterio.open(path, **kwds)) for path in band_paths]

    dest_kwargs = sources[0].meta
    dest_count = sum(src.count for src in sources)
    dest_kwargs['count'] = dest_count

    with rasterio.open(out_path, 'w', **dest_kwargs) as dest:
        stack(sources, dest) 

In a nutshell, what I'm saying is that, yes, the stack() you've proposed looks useful, but that Rasterio's command line interface (under rasterio.rio) is intended to be the only part of the project that interacts with files, all of the other Rasterio API functions should deal with dataset objects (opened files), and that I'd like to see users take responsibility for opening those files. Does that make sense?

@choldgraf
Copy link
Contributor Author

Please understand that rasterio.rio.stack() is meant to be used as a command line function and in that context has pretty good usage help.

For sure, I think that's clear from the docs. The challenge is that more and more, classes are being taught in interactive python (e.g. jupyter notebooks/lab) rather than with shell scripts, which provides limited functionality for this kind of thing :-/

I appreciate you giving some suggested code for figuring out the count etc. It sounds like a stack function similar to what I showed above is out of scope for rasterio...if I have a moment I'll try to hack together something that works at the DataSet object level and make a PR if the project is friendly to it

@sgillies
Copy link
Member

sgillies commented Mar 6, 2018

choldgraf thanks for the reply. I wrote mine late last night and was worried that it wasn't very coherent. I'm a bit concerned to read that Rasterio is out of touch with the way that data science or GIS is being taught and that its API isn't considered user-friendly or user-facing. It's because we're working at different levels of abstraction, yes? I can see the utility of a library of functions that operates on GIS data files (like https://github.com/perrygeo/python-rasterstats), for bash-like scripting in a notebook, but that's not what Rasterio is meant to be.

@lwasser
Copy link
Contributor

lwasser commented Mar 6, 2018

Hey @choldgraf @sgillies - i really appreciate this discussion and the expertise that you both bring to it! @sgillies i'd REALLY love your input and direction here with respect to teaching data science. I teach many students and am working hard to make the entrance into python more user friendly for them! It's a fantastic language but sometimes there is a barrier (imagine those new to programming). if i reduce that barrier more students will use python :)

We are using rasterio in our course lessons now because it's a wonderful and useful package. THANK YOU!! However, we would like a bit more "front-end" functionality for interactive environments like jupyter notebooks and ipython. Given we found ourselves repeating tasks like raster stacks and clipping often, it made sense to build helper functions such as presented above by @choldgraf It's overwhelming at the beginning to dive into CLI coding for many students. We want early wins!

EXAMPLE: https://github.com/earthlab/earthlabpy/blob/master/earthlabpy/utils.py

To give you a sense of what i'm working on - my R course is here (lesson with lots of raster stacks linked): https://earthdatascience.org/courses/earth-analytics/multispectral-remote-sensing-modis/normalized-burn-index-dNBR/ . I'm rewriting the course in python and using rasterio as it's great. And ofcourse that course when completed will be published as an open resource, online. yay python! i'm happy to add you to the repo to view the notebooks (it's private given answers to student homeworks!

What do you suggest that we do? We could build helper functions on our end to help students along like what Chris started and maintain them in a package in our organization. OR if it makes sense (and you please tell me if i'm not making any sense!) we could TRY our best to integrate with your efforts and contribute a bit more front end / interactive envt interface capability . I want to do what's best for the community in the end :) And my only goal is to help students get started with spatial python approaches. I want to reduce the barriers when possible!

Again i appreciate this discussion and have no intent to redirect the rasterio efforts in a way that defeats it's true intent! I truly appreciate efforts like yours that build up the open source python community of tools!

@sgillies
Copy link
Member

sgillies commented Mar 8, 2018

@lwasser thanks for taking the time to explain how you're using the package! I'm eager to help and your comments are something I can take to my manager as an argument for getting more time to spend on Rasterio :)

I've got a colleague with a background in ArcGIS who has reminded me of some programming patterns we're using at work that could help us make a user-friendly and maintainable stack implementation. Can I ping you on a sketchy pull release that I'll make next week?

@lwasser
Copy link
Contributor

lwasser commented Mar 8, 2018

hey @sgillies TOTALLY -- ping me on a sketchy PR . I'd love to help where I can.

Also - i wrote this using @choldgraf 's code as a starting place. is this a bit closer to something the project might support? If not no worries! I'm trying to follow the guidelines that you outlined wrt keeping file i/o outside of functions where I can. And i'm thinking about this as I build my raster data lessons and write helper functions to reduce redundant code!

def stack(sources, dest):
    """Stack a set of bands into a single file.

    Parameters
    ----------
    sources : list of rasterio dataset objects
        A list with paths to the bands you wish to stack. Objects
        will be stacked in the order provided in this list.
    dest : a rio.open writable object that will store raster data.
    """

    if not type(sources[0]) == rio._io.RasterReader:
        raise ValueError("The sources object should be of type: rasterio.RasterReader")

    for ii, ifile in enumerate(sources):
            bands = sources[ii].read()
            if bands.ndim != 3:
                bands = bands[np.newaxis, ...]
            for band in bands:
                dest.write(band, ii+1)

@choldgraf
Copy link
Contributor Author

choldgraf commented Mar 8, 2018 via email

@sgillies
Copy link
Member

@choldgraf @lwasser sorry for the delay, been tied up with some other stuff. Instead of a PR, I've put up an RFC based on our stack utility discussion here: #1300. I'd really like to get some feedback from other contributors on this before I start writing code.

@lwasser
Copy link
Contributor

lwasser commented Mar 26, 2018

no prob @sgillies i saw the discussion and will have a closer look. THANK YOU for being willing to spend energy on this!!

I also have a general question about performing several data manipulation tasks on a data set like

  1. stack a set of tif tiles
  2. crop the stacked set of tifs
  3. reproject the stack of tifs

This may be outside the scope of rasterio but i wondered if I have to open and close a new file for each of the tasks above? OR, is there a clever way to combine them before writing out an object. i'm just looking for your guidance here. Also please tell me if this is not the appropriate spot to ask such a question!

@sgillies
Copy link
Member

sgillies commented Apr 11, 2018

@lwasser sorry for the delay! I'm happy to answer here. Rasterio's dataset objects are intended to support operations like the above without necessarily needing to write to a file on disk between each step. For example, if you have functions that stack and crop and take dataset objects, you should be able to do this in Rasterio 1.0:

with rasterio.open("file1.tif", "r") as first, rasterio.open("file2.tif", "w+") as stacked:

    stack(first, stacked)

    with rasterio.open("file3.tif", "w") as cropped:

        crop(stacked, cropped)

I say 1.0 because the "w+" (write and read) mode is very new. Currently, you can get the same effect with "w" mode, but that behavior is going away in 1.0. I think a workaround for now is to reopen the "stacked" file between operations.

with rasterio.open("file1.tif", "r") as first, rasterio.open("file2.tif", "w") as stacked:

    stack(first, stacked)

with rasterio.open("file2.tif", "r") as stacked, rasterio.open("file3.tif", "w") as cropped:

    crop(stacked, cropped)

Does this make sense?

@sgillies sgillies modified the milestones: 1.0, post 1.0 Jun 21, 2018
@snowman2 snowman2 modified the milestones: post 1.0, post-1.3 Nov 18, 2022
@sgillies sgillies removed this from the post-1.3 milestone Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants