-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Holes and bad data in topographical variable grids #19
Comments
Problem seems to be in input grids: Look at ldist_tovalley! library(raster)
library(sp)
source("./load_grids.R")
makeMap <- function(topovar) {
map <- rasterToPoints(subset(topostack, topovar))
map <- data.frame(map)
names(map) <- c("x", "y", "var")
ggplot(map, aes(x,y, fill=var)) +
geom_raster()
}
makeMap("ldist_tovalley") |
I'm finding weird artefacts in other input layers as well. These need to be better checked before we use them. Some are ok, but many have problems: makeMap("radiation")
makeMap("relelev_l")
makeMap("ruggedness")
makeMap("specific_catchment.area") # few missing values, but some
makeMap("upflow") # also few missing pts
makeMap("zdist_valley") # Same huge artefacts as in ldist_tovalley |
And more problems with the input layers. For example, the MSD layer is useless (is it calculated correctly?) Note quantiles for values:
We are wasting our time running models on these inputs until they are debugged. |
Looks like we both noticed this independently and I did not see your issue which is why it shows up as a duplicate issue. I checked many of these but yes, some are messed up. The MSD grid calls we srtange from the start, which I commented to you months ago when I saw tise same weird values after following call steps we both agreed on. I will spend some time checking this. |
Ok, here is the full list of tests for all 14 variables. Every input layer has artefacts. For two of those, this appears to be only the streams themselves (elev and wetness). But all others have problems affecting more of the landscape -- although these may all result from the stream burn in. # MSD is clearly wrong, so no need to map it, see above.
makeMap("elev") # Ok except for streams with neg values
makeMap("ldist_tovalley") # Ack, missing huge portions of the region
makeMap("zdist_valley") # Same huge artefacts as in ldist_tovalley. Terrible
makeMap("relelev_l") # same as above
makeMap("ruggedness") # ditto
makeMap("relelev_z") # Steams visible as artefacts and a few small holes that are puzzling
makeMap("radiation") # Vertical artefacts -- high value streaks across landscape that ignore topography
makeMap("specific_catchment.area") # Some artefects, looks related to streams
makeMap("upflow") # same as above, but more artefacts in western portion -- streaking
makeMap("relelev_watershed_minmax") # weird "boundaries" showing watershed max. But one cross a drainage!
makeMap("wetness") # streams are only issue. Looks good otherwise
makeMap("slope_degrees") # no holes, but big artefacts. Streams?
makeMap("ldist_ridge") # artefacts in west, distribution seemas completely wrong. |
All files have been remade, are hole-free (minus wetness), and the old code (before your last changes) runs with the new .asc files. Now that code runs, I will check on wetness and close issue once I inspect. The formula for wetness calc/s is: Ln("fac"/Tan("slope")). I just found a discussion board and will try: Ln("fac" / Tan("slope" / 180/3)) instead in raster calculator. |
All grids including wetness recalculated and hole free, but model performance drops significantly when wetness included so wetness is being left out of model for DM. Will reevaluate for other two ranges the utility of including wetness as a predictor variable. |
Let's leave this open and close with the merge as best practice. That keeps the solution at hand when reviewing past issues. Since this is only solved in theory, not in the repo. |
- Move ascii grids to folders by mountain range and remove from git. We should add code to download these from dropbox or elsewhere but for now they must simply exist locally under ./topo_grids/DM, etc. - Extract topo variables from grids, remove necessity of having sensors-topo.csv -- now these values are extracted as points from the raster:stack objects - Clean up PCA code to use slightly modified sensor file formats now Fixes #15 (all grids have same extent now) Addresses #19 but we still need to add other mountain ranges
- Move ascii grids to folders by mountain range and remove from git. We should add code to download these from dropbox or elsewhere but for now they must simply exist locally under ./topo_grids/DM, etc. - Extract topo variables from grids, remove necessity of having sensors-topo.csv -- now these values are extracted as points from the raster:stack objects - Clean up PCA code to use slightly modified sensor file formats now Fixes #15 (all grids have same extent now) Addresses #19 but we still need to add other mountain ranges
Ok, as of new data (Oct 30, 2015) I see the following issues. Can you check these and explain, @hpoulos ? I've just checked the DM layers. Actual holes seem limited to wetness layer. I'd really like to see the code for all of these. At the least we need to walk through them together as there is something going on with discrete polygons that makes no sense to me.
The map code with notes as comments: makeMap("elev")
makeMap("flow_accum")
makeMap("ldist_ridge") # good
makeMap("ldist_valley") # good
makeMap("msd") #
makeMap("radiation") # ok
makeMap("relelev_l") # good
makeMap("relelev_shed") # Not sure waht this one means, how can there be a relative elevation in a watershed unless bottom is set to sea level?
makeMap("relelev_z") # not sure what is going on
makeMap("slope") # degrees, not radians? Check code.
makeMap("wetness") # HOLES!
makeMap("zdist_ridge") # looks ok
makeMap("zdist_valley") # Weird polygon artefacts |
There are still problems with the topo variable grids. The main issues are radiation, flow_accum, and maybe wetness. radiation is really important -- I know in california what a good predictor insolation is of microhabitat species distribution. The problems appear consistent across mtn ranges, so I will walk through the issues for the CM below
> quantile(temp$flow_accum, c(0.05, 0.25, 0.5, 0.75, 0.95, 0.99, 1))
5% 25% 50% 75% 95% 99% 100%
0 1 2 5 17 45 854 And for the DM there are NA values. Those need to be filled. |
Script used to evaluate grids per #19
What software are you looking at these grids in? It is a bit clearer for some of these in ArcGIS to understand the distributions. As for radiation, you are likely right. I did these calculations in the native WGS1984 projection, and although I specified a z-factor for the appropriate latitude as indicated is neccessary here http://resources.arcgis.com/en/help/main/10.2/index.html#/Area_Solar_Radiation/009z000000t5000000/ It is correct that flow accumulation has skewed values. Only valley bottoms have high values. I do not see no data values (holes) in the DM flow accumulation grid. Is the code you used on git so I can run it and see what you mean? As for MSD, outliers are fine because there are few topographically complex locations on the landscape relative to the total area of the landscape. Wetness: Higher wetness values represent drainage depressions, lower values represent crests and ridges: Yes there are some strange properties at low values. I have always been skeptical of this value because it was the main culprit of the holes in the grids in the past. |
Ok, I misspoke re zdist_valley It was just some discrete value issues which is expected. So radiation is the main problem. The other stuff is probably all correct, just some strange distributions. I'll check missing values again to make sure I was right about that. All the code is in the repo. See script added in c5b2b1c |
Re missing flow_accum data in DM see https://github.com/schwilklab/skyisland-climate/blob/master/scripts/test-gis-grids.R#L25 Or simply run: source("./load_grids.R")
topodf <- data.frame(rasterToPoints(DM.topostack))
any(is.na(topodf$flow_accum))
# [1] TRUE
sum(is.na(topodf$flow_accum))
# [1] 1316 So the reason for this is that there are missing elevation values for the DM: any(is.na(topodf$elev))
# [1] TRUE And here they are:
R. ARC GIS seems to (automatically? or probably you are just skilled with it) do some color scaling to make things pretty, and that is fine, but I need to know what the values actually are. I can get the colors pretty in R with log scales, etc which arc is doing as well. The data are the data so unless there are mistakes in my code, it does not matter what software I'm using. If ARC disagrees with my R code results, I believe my R code or I find the error in my code :). Other possibility would be errors on reading the data from the grids, but we would need to solve that in any case and I don't think there is any problem there. |
OK. so re zdist_valley: I deleted the last post. The error does not appear to be in the raw ascii grid but it is happening in reading the ascii data in for this layer. I opened the file by hand and see the normal range values in meters and I manually read the grid and get a map just like you show above. So this error is in the R code. I'm tracking it down now. |
So, the problem with that one grid was in reading the variable names when there were reading failures because of bad extents. I have downloaded the data again from @hpoulos's dropbox and am assuming that is the canonical correct data. I will close this issue and open separate issues for those topo variable problems still remaining since there are quite different issues for different layers (eg radiation appears to be actual errors, whereas there is something funny going on with ridge definitions that may not be an error. |
There is missing data and weird tearing in the raster predictions from random forest models. Is this a problem with NAs in the input grids or in that there are prediction failures for some topo variable combinations?
Example showing this. If you add the following code the end of the random-forest-tmin.R script to make a pretty map:
You see the empty NA values in the predPC2 raster
Do we have some issue with undefined values?
The text was updated successfully, but these errors were encountered: