Add rf_clip and rf_rescale and rf_normalize #218

Closed
metasim opened this issue Jul 29, 2019 · 2 comments · Fixed by #449
Labels
enhancement New feature or request

Comments

@metasim
Member

metasim commented Jul 29, 2019

@vpipkt Interested in your thoughts on this.

@metasim metasim changed the title Add rf_clip and rf_rescale Add rf_clip and rf_rescale and rf_normalize Jul 29, 2019
@vpipkt
Member

vpipkt commented Jul 29, 2019

I have looked into such things a little in the past but never got very far. GeoTrellis's IfCell seems like it could be useful, and maybe we can expose something like it along the way.

rf_rescale is interesting because it can be done on a per-tile basis, but I could also imagine a case where the user wants to consider the column min and max. See the optional parameters below. Similar sentiment for normalize: easy to do locally, more powerful to consider the entire column's mean and standard deviation, but less clear how to implement.

I think there may be some functionality for rescaling in geotrellis, but not exactly sure.

Possible function signatures / doc

rf_ifcell

 Tile rf_ifcell(test_tile, true_tile, false_tile)
 Tile rf_ifcell(test_tile, true_scalar, false_scalar) 

(possibly with (tile, tile, scalar) and (tile, scalar, tile) signatures also)

Return a tile full of true_tile or true_scalar values where test_tile is true, else false_tile or false_scalar. When evaluating test_tile, non-zero data values are true; zeros and NoData are interpreted as false.

An alternate name may be rf_where (after numpy).
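
To make the proposed semantics concrete, here is a minimal sketch in plain Python. It is not RasterFrames or GeoTrellis code: a tile is modelled as a flat list of cells, `None` stands in for NoData, and the name `ifcell_sketch` is hypothetical.

```python
from typing import List, Optional, Sequence, Union

Cell = Optional[float]                    # None stands in for NoData (assumption)
Operand = Union[Sequence[Cell], float]    # either a tile or a scalar


def _cell_at(operand: Operand, i: int) -> Cell:
    """Return the i-th cell of a tile, or the scalar itself."""
    if isinstance(operand, (int, float)):
        return operand
    return operand[i]


def ifcell_sketch(test_tile: Sequence[Cell], if_true: Operand, if_false: Operand) -> List[Cell]:
    """Pick from if_true where the test cell is non-zero data, else from if_false."""
    result = []
    for i, t in enumerate(test_tile):
        truthy = t is not None and t != 0   # zeros and NoData count as false
        result.append(_cell_at(if_true if truthy else if_false, i))
    return result


test = [1, 0, None, 5]
print(ifcell_sketch(test, 10, -1))            # scalar / scalar form
print(ifcell_sketch(test, [1, 2, 3, 4], 0))   # tile / scalar form
```

Because the helper accepts either a tile or a scalar in each position, the same function covers all four signature combinations mentioned above.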

rf_clip

 Tile rf_clip(tile, min, max)

Return a tile where values below min are set to min and values above max are set to max.
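
A rough sketch of that behaviour in plain Python, again with a flat list standing in for a tile. Passing NoData (`None`) through unchanged and rejecting `min > max` are assumptions; the proposal above does not say what should happen in either case. The name `clip_sketch` is hypothetical.

```python
from typing import List, Optional, Sequence

Cell = Optional[float]  # None stands in for NoData (assumption)


def clip_sketch(tile: Sequence[Cell], lo: float, hi: float) -> List[Cell]:
    """Clamp every data cell into [lo, hi]; NoData cells are left as NoData."""
    if lo > hi:
        raise ValueError("min must not exceed max")
    return [None if c is None else min(max(c, lo), hi) for c in tile]


print(clip_sketch([-5, 0, 3, 12, None], 0, 10))
```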

rf_rescale

Tile rf_rescale(Tile tile, [Double min, Double max] )

Rescale cell values such that the minimum is zero and the maximum is one. Other values will be linearly interpolated into the range.

If specified, the min parameter will become the zero value and the max parameter will become the one value. Values outside that range will be clipped to 0 or 1.
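
The per-tile and fixed-range behaviours can be sketched together in plain Python. This is only an illustration of the description above, with a flat list as the tile and `None` as NoData. Mapping every cell to 0.0 when min equals max is an assumption made here to avoid dividing by zero, and `rescale_sketch` is a hypothetical name.

```python
from typing import List, Optional, Sequence

Cell = Optional[float]  # None stands in for NoData (assumption)


def rescale_sketch(tile: Sequence[Cell],
                   lo: Optional[float] = None,
                   hi: Optional[float] = None) -> List[Cell]:
    """Linearly map [lo, hi] onto [0, 1], clipping values outside the range."""
    data = [c for c in tile if c is not None]
    if not data:
        return list(tile)            # nothing to rescale
    if lo is None:
        lo = min(data)               # fall back to this tile's own minimum
    if hi is None:
        hi = max(data)               # fall back to this tile's own maximum
    span = hi - lo
    out: List[Cell] = []
    for c in tile:
        if c is None:
            out.append(None)
        elif span == 0:
            out.append(0.0)          # degenerate range (assumption)
        else:
            out.append(min(max((c - lo) / span, 0.0), 1.0))
    return out


print(rescale_sketch([0, 5, 10]))             # per-tile min and max
print(rescale_sketch([-5, 5, 20], 0, 10))     # fixed range with clipping
```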

rf_standardize

Tile rf_standardize(Tile tile, [Double mean, Double stddev])

Standardize cell values such that the mean is zero and the standard deviation is one. If specified the mean and stddev are applied to all tiles in the column. If not specified, each tile will be standardized according to the statistics of its cell values.
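
For illustration, a plain-Python sketch of both modes: supplied statistics versus per-tile statistics. Using the population standard deviation (dividing by n) and returning 0.0 when the standard deviation is zero are assumptions; the proposal does not pin either down. `standardize_sketch` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence

Cell = Optional[float]  # None stands in for NoData (assumption)


def standardize_sketch(tile: Sequence[Cell],
                       mean: Optional[float] = None,
                       stddev: Optional[float] = None) -> List[Cell]:
    """Return (cell - mean) / stddev, using the tile's own stats when not supplied."""
    data = [c for c in tile if c is not None]
    if not data:
        return list(tile)
    if mean is None:
        mean = sum(data) / len(data)
    if stddev is None:
        # Population standard deviation (assumption)
        stddev = math.sqrt(sum((c - mean) ** 2 for c in data) / len(data))
    return [
        None if c is None else (0.0 if stddev == 0 else (c - mean) / stddev)
        for c in tile
    ]


print(standardize_sketch([2, 4, 4, 4, 5, 5, 7, 9]))       # per-tile statistics
print(standardize_sketch([10, 20], mean=10, stddev=5))    # supplied statistics
```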

@metasim
Member Author

metasim commented Jul 30, 2019

One thing to work out: when you don't provide the scaling parameters and expect the system to compute them for you, a columnar function (a regular, non-aggregate expression) can't actually invoke an aggregation first. Either the user would have to compute the summary statistics first and pass the resulting structure in, or pass in the DataFrame so the code can run the aggregation first.
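
The first option (aggregate, then pass the result in) looks roughly like the following two-step flow. This is a plain-Python stand-in, not Spark code: a list of flat lists plays the role of a tile column, and `column_min_max` and `rescale_with` are hypothetical helpers.

```python
from typing import List, Optional, Sequence, Tuple

Cell = Optional[float]  # None stands in for NoData (assumption)


def column_min_max(tiles: Sequence[Sequence[Cell]]) -> Tuple[float, float]:
    """Step 1: an aggregation over every data cell in the column."""
    cells = [c for tile in tiles for c in tile if c is not None]
    return min(cells), max(cells)


def rescale_with(tile: Sequence[Cell], lo: float, hi: float) -> List[Cell]:
    """Step 2: a per-row function that only uses the precomputed statistics."""
    span = hi - lo
    return [
        None if c is None else (0.0 if span == 0 else min(max((c - lo) / span, 0.0), 1.0))
        for c in tile
    ]


column = [[0, 5], [10, 20]]
lo, hi = column_min_max(column)                        # aggregate first
rescaled = [rescale_with(t, lo, hi) for t in column]   # then apply row by row
print(rescaled)
```

The per-row step never needs to see other rows, which is what lets it stay an ordinary columnar function.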

@metasim metasim added the enhancement New feature or request label Aug 6, 2019
@metasim metasim added this to the 0.8.2 milestone Aug 23, 2019
@metasim metasim modified the milestones: 0.8.2, 0.8.3 Sep 23, 2019
@metasim metasim modified the milestones: 0.8.3, 0.8.4 Oct 9, 2019