Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Holes and bad data in topographical variable grids #19

Closed
dschwilk opened this issue Oct 16, 2015 · 16 comments
Closed

Holes and bad data in topographical variable grids #19

dschwilk opened this issue Oct 16, 2015 · 16 comments
Assignees

Comments

@dschwilk
Copy link
Contributor

There is missing data and weird tearing in the raster predictions from random forest models. Is this a problem with NAs in the input grids or in that there are prediction failures for some topo variable combinations?

Example showing this. If you add the following code the end of the random-forest-tmin.R script to make a pretty map:

tminmap <- rasterToPoints(predPC2)
tminmap <- data.frame(tminmap)
names(tminmap) <- c("x", "y", "PC2")
ggplot(tminmap, aes(x,y, fill=PC2)) +
    geom_raster()

You see the empty NA values in the predPC2 raster

Do we have some issue with undefined values?

@dschwilk
Copy link
Contributor Author

Problem seems to be in input grids: Look at ldist_tovalley!

library(raster)
library(sp)
source("./load_grids.R")

makeMap <- function(topovar) {
    map <- rasterToPoints(subset(topostack, topovar))
    map <- data.frame(map)
    names(map) <- c("x", "y", "var")
    ggplot(map, aes(x,y, fill=var)) +
        geom_raster()
}

makeMap("ldist_tovalley")

@dschwilk
Copy link
Contributor Author

I'm finding weird artefacts in other input layers as well. These need to be better checked before we use them.

Some are ok, but many have problems:
eg

makeMap("radiation")
makeMap("relelev_l")
makeMap("ruggedness")
makeMap("specific_catchment.area") # few missing values, but some
makeMap("upflow") # also few missing pts
makeMap("zdist_valley") # Same huge artefacts as in ldist_tovalley

@dschwilk
Copy link
Contributor Author

And more problems with the input layers. For example, the MSD layer is useless (is it calculated correctly?)

Note quantiles for values:

 quantile(map$MSD)
  0%  25%  50%  75% 100% 
   0    0    0    0    7 

# and all integer values?  Every single one a "7"!
 quantile(map$MSD[map$MSD>6])
 0%  25%  50%  75% 100% 
   7    7    7    7    7 

We are wasting our time running models on these inputs until they are debugged.

@hpoulos
Copy link
Collaborator

hpoulos commented Oct 18, 2015

Looks like we both noticed this independently and I did not see your issue which is why it shows up as a duplicate issue. I checked many of these but yes, some are messed up. The MSD grid calls we srtange from the start, which I commented to you months ago when I saw tise same weird values after following call steps we both agreed on. I will spend some time checking this.

@dschwilk
Copy link
Contributor Author

Ok, here is the full list of tests for all 14 variables. Every input layer has artefacts. For two of those, this appears to be only the streams themselves (elev and wetness). But all others have problems affecting more of the landscape -- although these may all result from the stream burn in.

# MSD is clearly wrong, so no need to map it, see above.
makeMap("elev")  # Ok except for streams with neg values
makeMap("ldist_tovalley") # Ack, missing huge portions of the region
makeMap("zdist_valley") # Same huge artefacts as in ldist_tovalley. Terrible
makeMap("relelev_l")    # same as above
makeMap("ruggedness")   # ditto
makeMap("relelev_z")  # Steams visible as artefacts and a few small holes that are puzzling
makeMap("radiation")  # Vertical artefacts -- high value streaks across landscape that ignore topography
makeMap("specific_catchment.area") # Some artefects, looks related to streams
makeMap("upflow") # same as above, but more artefacts in western portion -- streaking
makeMap("relelev_watershed_minmax") # weird "boundaries"  showing watershed max.  But one cross a drainage!
makeMap("wetness") # streams are only issue.  Looks good otherwise
makeMap("slope_degrees")  # no holes, but big artefacts. Streams?
makeMap("ldist_ridge") # artefacts in west, distribution seemas completely wrong.

@dschwilk dschwilk changed the title predicted PC surfaces have interior NA values in random-forest.R and tmin / tmax versions Holes and bad data in topographical variable grids Oct 22, 2015
@hpoulos
Copy link
Collaborator

hpoulos commented Oct 28, 2015

All files have been remade, are hole-free (minus wetness), and the old code (before your last changes) runs with the new .asc files. Now that code runs, I will check on wetness and close issue once I inspect. The formula for wetness calc/s is: Ln("fac"/Tan("slope")). I just found a discussion board and will try: Ln("fac" / Tan("slope" / 180/3)) instead in raster calculator.

@hpoulos
Copy link
Collaborator

hpoulos commented Oct 28, 2015

All grids including wetness recalculated and hole free, but model performance drops significantly when wetness included so wetness is being left out of model for DM. Will reevaluate for other two ranges the utility of including wetness as a predictor variable.

@hpoulos hpoulos closed this as completed Oct 28, 2015
@dschwilk dschwilk reopened this Oct 28, 2015
@dschwilk
Copy link
Contributor Author

Let's leave this open and close with the merge as best practice. That keeps the solution at hand when reviewing past issues. Since this is only solved in theory, not in the repo.

dschwilk added a commit that referenced this issue Oct 29, 2015
- Move ascii grids to folders by mountain range and remove from git.  We
should add code to download these from dropbox or elsewhere but for now
they must simply exist locally under ./topo_grids/DM, etc.

- Extract topo variables from grids, remove necessity of having
sensors-topo.csv -- now these values are extracted as points from the
raster:stack objects

- Clean up PCA code to use slightly modified sensor file formats now

Fixes #15 (all grids have same extent now)
Addresses #19 but we still need to add other mountain ranges
dschwilk added a commit that referenced this issue Oct 29, 2015
- Move ascii grids to folders by mountain range and remove from git.  We
should add code to download these from dropbox or elsewhere but for now
they must simply exist locally under ./topo_grids/DM, etc.

- Extract topo variables from grids, remove necessity of having
sensors-topo.csv -- now these values are extracted as points from the
raster:stack objects

- Clean up PCA code to use slightly modified sensor file formats now

Fixes #15 (all grids have same extent now)
Addresses #19 but we still need to add other mountain ranges
@dschwilk
Copy link
Contributor Author

Ok, as of new data (Oct 30, 2015) I see the following issues. Can you check these and explain, @hpoulos ? I've just checked the DM layers. Actual holes seem limited to wetness layer. I'd really like to see the code for all of these. At the least we need to walk through them together as there is something going on with discrete polygons that makes no sense to me.

  1. Is that western portion of DM correct? Flat plain?
  2. zdist_valley has strange discrete steps that show up as irregular polygons. I don't believe this one.
  3. relev_shed: This is a weird variable anyway but transitions are strange. But that is because watersheds have arbitrary lowest elevation? I don't think we should use this. We care about what is above a point, not below. How can one define the low point in a watershed which is an idea that only applies above, not below?
  4. relev_z: hard to see, may just be a funny really non-linear distribution
  5. slope: this looks like degrees not radians. Just checking to make sure this is used correctly in later calculations. I can't image degrees being useful but I guess you code converts?
  6. wetness: this appears to be the cause of the holes in the output: CHECK! this layer has scattered holes

The map code with notes as comments:

makeMap("elev")  
makeMap("flow_accum")
makeMap("ldist_ridge")  # good
makeMap("ldist_valley") # good
makeMap("msd")  # 
makeMap("radiation") # ok
makeMap("relelev_l") # good
makeMap("relelev_shed") # Not sure waht this one means, how can there be a relative elevation in a watershed unless bottom is set to sea level?
makeMap("relelev_z") # not sure what is going on
makeMap("slope") # degrees, not radians? Check code.
makeMap("wetness") # HOLES!
makeMap("zdist_ridge") # looks ok
makeMap("zdist_valley") # Weird polygon artefacts

@dschwilk
Copy link
Contributor Author

There are still problems with the topo variable grids. The main issues are radiation, flow_accum, and maybe wetness. radiation is really important -- I know in california what a good predictor insolation is of microhabitat species distribution.

The problems appear consistent across mtn ranges, so I will walk through the issues for the CM below

  1. radiation: the data appears truncated at max values of 2727980, then there is a gap, and a large number of outliers. The "outliers" are all on very flat slopes. This is error. There is no way to have a real distribution like that for insolation/radiation.

radiation

  1. flow_accum is also wonky with crazy outliers. Here are the quantiles for the CM:
> quantile(temp$flow_accum, c(0.05, 0.25, 0.5, 0.75, 0.95, 0.99, 1))
  5%  25%  50%  75%  95%  99% 100% 
   0    1    2    5   17   45  854 

And for the DM there are NA values. Those need to be filled.

  1. msd. This may be ok. But there are some really high outliers again. Not as crazy as some of the others, however
  2. wetness. Probably ok. Some strange discontinuities:
    wetness
  3. zdist_valley: I'm surprised by polygons. These should be continuous variables. But distribution looks ok.

dschwilk added a commit that referenced this issue Mar 22, 2016
Script used to evaluate grids per #19
@hpoulos
Copy link
Collaborator

hpoulos commented Mar 23, 2016

What software are you looking at these grids in? It is a bit clearer for some of these in ArcGIS to understand the distributions. As for radiation, you are likely right. I did these calculations in the native WGS1984 projection, and although I specified a z-factor for the appropriate latitude as indicated is neccessary here http://resources.arcgis.com/en/help/main/10.2/index.html#/Area_Solar_Radiation/009z000000t5000000/
reprojecting the DEM prior to calculating area radiation provides a normally distributed range of values, at least for BB which I just redid. I am redoing these now for each mountain range in NAD83 which is a projection in m.

It is correct that flow accumulation has skewed values. Only valley bottoms have high values.
image

I do not see no data values (holes) in the DM flow accumulation grid. Is the code you used on git so I can run it and see what you mean?

As for MSD, outliers are fine because there are few topographically complex locations on the landscape relative to the total area of the landscape.
image

Wetness: Higher wetness values represent drainage depressions, lower values represent crests and ridges:
image

Yes there are some strange properties at low values. I have always been skeptical of this value because it was the main culprit of the holes in the grids in the past.

zdist_valley: I do not see polygons. These are raster data.
image

@dschwilk
Copy link
Contributor Author

Ok, I misspoke re zdist_valley It was just some discrete value issues which is expected. So radiation is the main problem. The other stuff is probably all correct, just some strange distributions. I'll check missing values again to make sure I was right about that. All the code is in the repo. See script added in c5b2b1c

@dschwilk
Copy link
Contributor Author

Re missing flow_accum data in DM see https://github.com/schwilklab/skyisland-climate/blob/master/scripts/test-gis-grids.R#L25

Or simply run:

source("./load_grids.R")
topodf <- data.frame(rasterToPoints(DM.topostack))
any(is.na(topodf$flow_accum))
# [1] TRUE
sum(is.na(topodf$flow_accum))
# [1] 1316

So the reason for this is that there are missing elevation values for the DM:

any(is.na(topodf$elev))
# [1] TRUE

And here they are:

dm-missing-elevations

What software are you looking at these grids in? It is a bit clearer for some of these in ArcGIS to
understand the distributions.

R. ARC GIS seems to (automatically? or probably you are just skilled with it) do some color scaling to make things pretty, and that is fine, but I need to know what the values actually are. I can get the colors pretty in R with log scales, etc which arc is doing as well. The data are the data so unless there are mistakes in my code, it does not matter what software I'm using. If ARC disagrees with my R code results, I believe my R code or I find the error in my code :). Other possibility would be errors on reading the data from the grids, but we would need to solve that in any case and I don't think there is any problem there.

@dschwilk
Copy link
Contributor Author

OK. so re zdist_valley:

I deleted the last post. The error does not appear to be in the raw ascii grid but it is happening in reading the ascii data in for this layer. I opened the file by hand and see the normal range values in meters and I manually read the grid and get a map just like you show above. So this error is in the R code. I'm tracking it down now.

@dschwilk
Copy link
Contributor Author

Ok, so zdist_valley is fine!
zdist_valley-cm

Now I need to track down where something is going wrong in reading that one grid.

@dschwilk
Copy link
Contributor Author

So, the problem with that one grid was in reading the variable names when there were reading failures because of bad extents. I have downloaded the data again from @hpoulos's dropbox and am assuming that is the canonical correct data. I will close this issue and open separate issues for those topo variable problems still remaining since there are quite different issues for different layers (eg radiation appears to be actual errors, whereas there is something funny going on with ridge definitions that may not be an error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants