
Review of ch12 #782

Merged: 8 commits merged on Apr 24, 2022

Conversation

@Nowosad (Member) commented Apr 21, 2022:

Hi @jannes-m -- I read the whole chapter (before the most recent changes) and learned a lot!

I also added a number of rather small changes and have a few comments:

  1. I would suggest cleaning up most of the commented-out lines in the chapter.
  2. Line 145 -- contains a link to DataCamp materials. Given the problematic history of the company, I suggest replacing it.
  3. Line 463 -- You say "...as expected...". I think it would not be expected by many people unfamiliar with this topic. I would suggest adding a sentence or two explaining why this is expected.
  4. Line 631 -- I would suggest adding an example runtime (e.g., "on a modern laptop it took XX seconds").

Merge remote-tracking branch 'origin/main' into rev12

# Conflicts:
#	12-spatial-cv.Rmd
@jannes-m (Collaborator):

Jakub, thanks a lot for taking the time to review and improve the chapter, very much appreciated! I like your comments, and will address them in this PR - hopefully within the next week.

@Robinlovelace (Collaborator):

On a different but related (see CI checks) note, any idea what is causing this message?

Quitting from lines 403-408 (15-eco.Rmd) 
Error in sc[, 1] : incorrect number of dimensions
Calls: local ... eval_with_user_handlers -> eval -> eval -> data.frame
In addition: Warning message:
In `$.crs`(attr(geom, "crs"), "wkt") :
  CRS uses proj4string, which is deprecated.

Merge branch 'main' into rev12

# Conflicts:
#	12-spatial-cv.Rmd
Merge branch 'rev12' of github.com:Robinlovelace/geocompr into rev12

# Conflicts:
#	12-spatial-cv.Rmd
@jannes-m (Collaborator):

On a different but related (see CI checks) note, any idea what is causing this message?

Quitting from lines 403-408 (15-eco.Rmd) 
Error in sc[, 1] : incorrect number of dimensions
Calls: local ... eval_with_user_handlers -> eval -> eval -> data.frame
In addition: Warning message:
In `$.crs`(attr(geom, "crs"), "wkt") :
  CRS uses proj4string, which is deprecated.

I have just merged the main branch into this PR branch, and I guess this should resolve the build issue.

@jannes-m (Collaborator):

Mmh, Jakub had already merged the main branch and I should have pulled that earlier. In any case, I can build chapter 15 locally, so I am not quite sure what the problem is...

@Robinlovelace (Collaborator):

No problem, we can merge this and fix any issues later if there are any.

@jannes-m (Collaborator):

I couldn't really solve the problem and have no idea why `sc[, 1]` did not work in the build process. But at least I could work around it by saving the response-predictor matrix as an .rds file and reading it back in again.
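For reference, the workaround boils down to something like the following sketch (the object name and file path here are illustrative, not necessarily the ones used in the chapter):

# save the response-predictor matrix once, outside the problematic chunk ...
saveRDS(rp, "extdata/15-rp.rds")
# ... and read it back in during the book build instead of recomputing it
rp = readRDS("extdata/15-rp.rds")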

@Robinlovelace (Collaborator) left a review:

These are good changes. I suggest making any final tweaks and merging, or we can merge now and do post-merge fixes. Sound reasonable, @jannes-m? Thanks, the chapter is looking really good.

@@ -30,9 +30,9 @@ Required data will be attached in due course.

Statistical learning\index{statistical learning} is concerned with the use of statistical and computational models for identifying patterns in data and predicting from these patterns.
Due to its origins, statistical learning\index{statistical learning} is one of R's\index{R} great strengths (see Section \@ref(software-for-geocomputation)).^[
-Applying statistical techniques to geographic data has been an active topic of research for many decades in the fields of Geostatistics, Spatial Statistics and point pattern analysis [@diggle_modelbased_2007; @gelfand_handbook_2010; @baddeley_spatial_2015].
+Applying statistical techniques to geographic data has been an active topic of research for many decades in the fields of geostatistics, spatial statistics and point pattern analysis [@diggle_modelbased_2007; @gelfand_handbook_2010; @baddeley_spatial_2015].
Collaborator:

👍

]
-Statistical learning\index{statistical learning} combines methods from statistics\index{statistics} and machine learning\index{machine learning} and its methods can be categorized into supervised and unsupervised techniques.
+Statistical learning\index{statistical learning} combines methods from statistics\index{statistics} and machine learning\index{machine learning} and can be categorized into supervised and unsupervised techniques.
Collaborator:

One question: could we rename the chapter Geostatistical learning?

Collaborator:

No, I wouldn't do that; geostatistics is basically a field of its own and we are not doing geostatistics here.

@@ -79,7 +79,7 @@ data("lsl", "study_mask", package = "spDataLarge")
ta = terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))
```

-This should load three objects: a `data.frame` named `lsl`, an `sf` object named `study_mask` and a `SpatRaster` (see Section \@ref(raster-classes)) named `ta` containing terrain attribute rasters.
+The above code loads three objects: a `data.frame` named `lsl`, an `sf` object named `study_mask` and a `SpatRaster` (see Section \@ref(raster-classes)) named `ta` containing terrain attribute rasters.
Collaborator:

👍
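As a quick illustration of the sentence above, one could check the classes of the loaded objects (a sketch, assuming the chunk shown in the diff has been run):

class(lsl)        # should report a plain data.frame
class(study_mask) # should report an sf object ("sf" "data.frame")
class(ta)         # should report a terra SpatRaster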

@@ -90,28 +90,26 @@ The 175 non-landslide points were sampled randomly from the study area, with the
# library(tmap)
# data("lsl", package = "spDataLarge")
# ta = terra::rast(system.file("raster/ta.tif", package = "spDataLarge"))
# lsl_sf = sf::st_as_sf(lsl, coords = c("x", "y"), crs = 32717)
# lsl_sf = sf::st_as_sf(lsl, coords = c("x", "y"), crs = "EPSG:32717")
Collaborator:

👍
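The change from the numeric EPSG code to the "EPSG:32717" string does not alter the CRS itself; both spellings resolve to the same definition, as this small check (not from the chapter) illustrates:

sf::st_crs(32717) == sf::st_crs("EPSG:32717")  # TRUE -- WGS 84 / UTM zone 17S in both cases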

- `cprof`: profile curvature (rad m^-1^) as a measure of flow acceleration, also known as downslope change in slope angle.
- `elev`: elevation (m a.s.l.) as the representation of different altitudinal zones of vegetation and precipitation in the study area.
- `log10_carea`: the decadic logarithm of the catchment area (log10 m^2^) representing the amount of water flowing towards a location.
- `slope` - slope angle (°)
Collaborator:

I prefer the previous style here, but without the full stops. So I would make it:

- `slope`: slope angle (°)

Member Author:

@Robinlovelace The new style is consistent with the style in chapters 1-8

Collaborator:

@Robinlovelace The new style is consistent with the style in chapters 1-8

Which parts, @Nowosad? I had a quick look at the output below.

rg ' - ' *.Rmd
README.Rmd
5:<!-- README.md is generated from README.Rmd. Please edit that file - rmarkdown::render('README.Rmd', output_format = 'github_document', output_file = 'README.md') -->
55:<!-- - Next issue  -->

index.Rmd
11:  - geocompr.bib
12:  - packages.bib

_15-ex.Rmd
145:  mtry = paradox::p_int(lower = 1, upper = ncol(task$data()) - 1),

14-location.Rmd
332:    iter = iter - 1

15-eco.Rmd
65:For this, we will make use of a random forest model\index{random forest} - a very popular machine learning\index{machine learning} algorithm [@breiman_random_2001].
520:<!-- (`r # ncol(rp) - 1`), -->
526:  mtry = paradox::p_int(lower = 1, upper = ncol(task$data()) - 1),
650:all(values(pred - pred_2) == 0)

13-transport.Rmd
66:# code that generated the input data - see also ?bristol_ways
576:# online figure - backup
609:    - Bonus: find two ways of arriving at the same answer.
617:    - Bonus: what proportion of trips cross the proposed routes?
618:    - Advanced: write code that would increase this proportion.
633:    - Bonus: develop a raster layer that divides the Bristol region into 100 cells (10 by 10) and provide a metric related to transport policy, such as number of people trips that pass through each cell by walking or the average speed limit of roads, from the `bristol_ways` dataset (the approach taken in Chapter \@ref(location)).

_12-ex.Rmd
22:    - Slope
23:    - Plan curvature
24:    - Profile curvature
25:    - Catchment area

11-algorithms.Rmd
193:abs(T1[1, 1] * (T1[2, 2] - T1[3, 2]) +
194:  T1[2, 1] * (T1[3, 2] - T1[1, 2]) +
195:  T1[3, 1] * (T1[1, 2] - T1[2, 2]) ) / 2
213:i = 2:(nrow(poly_mat) - 2)
222:  abs(x[1, 1] * (x[2, 2] - x[3, 2]) +
223:        x[2, 1] * (x[3, 2] - x[1, 2]) +
224:        x[3, 1] * (x[1, 2] - x[2, 2]) ) / 2
294:    x[1, 1] * (x[2, 2] - x[3, 2]) +
295:    x[2, 1] * (x[3, 2] - x[1, 2]) +
296:    x[3, 1] * (x[1, 2] - x[2, 2])
327:  i = 2:(nrow(poly_mat) - 2)
343:  i = 2:(nrow(poly_mat) - 2)
412:    - Which of the best practices covered in Section \@ref(scripts) does it follow?
413:    - Create a version of the script on your computer in an IDE\index{IDE} such as RStudio\index{RStudio} (preferably by typing-out the script line-by-line, in your own coding style and with your own comments, rather than copy-pasting --- this will help you learn how to type scripts). Using the example of a square polygon (e.g., created with `poly_mat = cbind(x = c(0, 0, 9, 9, 0), y = c(0, 9, 9, 0, 0))`) execute the script line-by-line.
414:    - What changes could be made to the script to make it more reproducible?
415:    <!-- - Answer: The script could state that it needs a an object called `poly_mat` to be present and, if none is present, create an example dataset at the outset for testing. -->
417:<!--     - Try to reproduce the results: how many significant earthquakes were there last month? -->
418:<!--     - Modify the script so that it provides a map with all earthquakes that happened in the past hour. -->
421:    - How could the documentation be improved?
422:  <!-- It could document the source of the data better - e.g. with `data from https://earthquake.usgs.gov/earthquakes/feed/v1.0/geojson.php` -->
424:    - Reproduce the results on your own computer with reference to the script `10-centroid-alg.R`, an implementation of this algorithm (bonus: type out the commands - try to avoid copy-pasting).
426:    - Are the results correct? Verify them by converting `poly_mat` into an `sfc` object (named `poly_sfc`) with `st_polygon()` (hint: this function takes objects of class `list()`) and then using `st_area()` and `st_centroid()`.
434:     - Bonus 1: Think about why the method only works for convex hulls and note changes that would need to be made to the algorithm to make it work for other types of polygon.
436:     - Bonus 2: Building on the contents of `10-centroid-alg.R`, write an algorithm\index{algorithm} only using base R functions that can find the total length of linestrings represented in matrix form.
439:    - Verify it works by running `poly_centroid_sf(sf::st_sf(sf::st_sfc(poly_sfc)))`
440:    - What error message do you get when you try to run `poly_centroid_sf(poly_mat)`?

10-gis.Rmd
223:In our case, three arguments seem important - `INPUT`, `OVERLAY`, and `OUTPUT`.
426:The U.S. Army - Construction Engineering Research Laboratory (USA-CERL) created the core of the Geographical Resources Analysis Support System (GRASS)\index{GRASS} [Table \@ref(tab:gis-comp); @neteler_open_2008] from 1982 to 1995. 
430:Here, we introduce **rgrass**\index{rgrass (package)} with one of the most interesting problems in GIScience - the traveling salesman problem\index{traveling salesman}.
434:In our case, the number of possible solutions correspond to `(25 - 1)! / 2`, i.e., the factorial of 24 divided by 2 (since we do not differentiate between forward or backward direction).
435:Even if one iteration can be done in a nanosecond, this still corresponds to `r format(factorial(25 - 1) / (2 * 10^9 * 3600 * 24 * 365))` years.
627:<!-- - We could have used GRASS's spatial database\index{spatial database} (based on SQLite) which allows faster processing.  -->
631:<!-- - We could have also accessed an already existing GRASS spatial database from within R. -->
635:<!-- - You can also start R from within a running GRASS\index{GRASS} session [for more information please refer to @bivand_applied_2013 and this [wiki](https://grasswiki.osgeo.org/wiki/R_statistics/rgrass7)]. -->
636:<!-- - Refer to the excellent [GRASS online help](https://grass.osgeo.org/grass77/manuals/) or `execGRASS("g.manual", flags = "i")` for more information on each available GRASS geoalgorithm\index{geoalgorithm}. -->
637:<!-- - If you would like to use GRASS 6 from within R, use the R package **spgrass6**. -->
651:<!-- source code (or docker) - https://github.com/jblindsay/whitebox-tools -->
734:#> Extent: (-180.000000, -89.900000) - (179.999990, 83.645130)
896:As a final note, if your data is getting too big for PostgreSQL/PostGIS and you require massive spatial data management and query performance, then the next logical step is to use large-scale geographic querying on distributed computing systems, as for example, provided by GeoMesa (http://www.geomesa.org/) or Apache Sedona [https://sedona.apache.org/; formermly known as GeoSpark - @huang_geospark_2017].
908:    - **RQGIS**, **RSAGA** and **rgrass7**
909:    - **sf**

08-read-write-plot.Rmd
97:<!-- - elevatr - https://github.com/jhollist/elevatr/issues/64 -->
101:<!-- - https://github.com/ErikKusch/KrigR -->
105:<!-- - https://github.com/ropensci/MODIStsp -->
213:<!-- rgee - see https://github.com/loreabad6/30DayMapChallenge/blob/main/scripts/day08_blue.R -->
223:<!-- potentially useful package - https://github.com/eblondel/geosapi -->
224:<!-- rstac - https://gist.github.com/h-a-graham/420434c158c139180f5eb82859099082, -->
413:<!-- - KEA - https://gdal.org/drivers/raster/kea.html -->
414:<!-- - sfarrow & geoparquet/pandas/GeoFeather -->
604:A KML file stores geographic information in XML format - a data format for the creation of web pages and the transfer of data in an application-independent way [@nolan_xml_2014].

09-mapping.Rmd
314:    rect(xleft = 0:(n - 1), ybottom = i - 1, xright = 1:n, ytop = i - 0.2,
317:  text(rep(-0.1, n_colors), (1: n_colors) - 0.6, labels = titles, xpd = TRUE, adj = 1)
344:```{r na-sb, message=FALSE, fig.cap="Map with additional elements - a north arrow and scale bar.", out.width="50%", fig.asp=1, fig.scap="Map with a north arrow and scale bar."}
481:```{r insetmap1, message=FALSE, fig.cap="Inset map providing a context - location of the central part of the Southern Alps in New Zealand.", fig.scap="Inset map providing a context."}
487:Inset map can be saved to file either by using a graphic device (see Section \@ref(visual-outputs)) or the `tmap_save()` function and its arguments - `insets_tm` and `insets_vp`.
809:  # abort old way of including - mixed content issues
1011:Additionally, it is possible to modify the `intermax` argument - maximum number of iterations for the cartogram transformation.
1080:    - Name two advantages of each based on the experience.
1081:    - Name three other mapping packages and an advantage of each.
1082:    - Bonus: create three more maps of Africa using these three packages.
1084:    - Bonus: improve the map aesthetics, for example by changing the legend title, class labels and color palette.
1089:    - Change the default colors to match your perception of the land cover categories
1090:    - Add a scale bar and north arrow and change the position of both to improve the map's aesthetic appeal
1091:    - Bonus: Add an inset map of Zion National Park's location in the context of the Utah state. (Hint: an object representing Utah can be subset from the `us_states` dataset.) 
1093:    - With one facet showing HDI and the other representing population growth (hint: using variables `HDI` and `pop_growth`, respectively)
1094:    - With a 'small multiple' per country
1097:    - Showing first the spatial distribution of HDI scores then population growth
1098:    - Showing each country in order
1100:    - With **tmap**
1101:    - With **mapview**
1102:    - With **leaflet**
1103:    - Bonus: For each approach, add a legend (if not automatically provided) and a scale bar
1105:    - In the city you live, for a couple of users per day
1106:    - In the country you live, for dozens of users per day
1107:    - Worldwide for hundreds of users per day and large data serving requirements
1109:    - Using `textInput()`
1110:    - Using `selectInput()`

_05-ex.Rmd
50:nrow(nz_height_near_cant) # 75 - 5 more
146:plot(srtm_resampl_all - srtm_resampl1, range = c(-300, 300))
147:plot(srtm_resampl_all - srtm_resampl2, range = c(-300, 300))
148:plot(srtm_resampl_all - srtm_resampl3, range = c(-300, 300))
149:plot(srtm_resampl_all - srtm_resampl4, range = c(-300, 300))
150:plot(srtm_resampl_all - srtm_resampl5, range = c(-300, 300))

05-geometry-operations.Rmd
235:To achieve that, each object is firstly shifted in a way that its center has coordinates of `0, 0` (`(nz_sfc - nz_centroid_sfc)`). 
241:nz_scale = (nz_sfc - nz_centroid_sfc) * 0.5 + nz_centroid_sfc
264:The `rotation` function accepts one argument `a` - a rotation angle in degrees.
269:nz_rotate = (nz_sfc - nz_centroid_sfc) * rotation(30) + nz_centroid_sfc
289:nz_scale_rotate = (nz_sfc - nz_centroid_sfc) * 0.25 * rotation(90) + nz_centroid_sfc
295:nz_shear = (nz_sfc - nz_centroid_sfc) * shearing(1.1, 0) + nz_centroid_sfc
776:- Nearest neighbor - assigns the value of the nearest cell of the original raster to the cell of the target one.
778:- Bilinear interpolation - assigns a weighted average of the four nearest cells from the original raster to the cell of the target one (Figure \@ref(fig:bilinear)). The fastest method for continuous rasters
779:- Cubic interpolation - uses values of 16 nearest cells of the original raster to determine the output cell value, applying third-order polynomial functions. Used for continuous rasters. It results in a more smoothed surface than the bilinear interpolation, but is also more computationally demanding
780:- Cubic spline interpolation - also uses values of 16 nearest cells of the original raster to determine the output cell value, but applies cubic splines (piecewise third-order polynomial functions) to derive the results. Used for continuous rasters
781:- Lanczos windowed sinc resampling - uses values of 36 nearest cells of the original raster to determine the output cell value. Used for continuous rasters^[More detailed explanation of this method can be found at https://gis.stackexchange.com/a/14361/20955.]
840:<!-- gdalUtils - https://cran.r-project.org/web/packages/gdalUtils/index.html - we mentioned it in geocompr 1; however it seems abandoned -->
841:<!-- gdalUtilities - https://cran.r-project.org/web/packages/gdalUtilities/index.html -->
842:<!-- also - add some reference to GDAL functions! -->
851:- `gdalinfo` - lists various information about a raster file, including its resolution, CRS, bounding box, and more
852:- `gdal_translate` - converts raster data between different file formats
853:- `gdal_rasterize` - converts vector data into raster files
854:- `gdalwarp` - allows for raster mosaicing, resampling, cropping, and reprojecting

_04-ex.Rmd
52:# Calculate n. points in each region - this contains the result
169:E7. Calculate the Normalized Difference Water Index	(NDWI; `(green - nir)/(green + nir)`) of a Landsat image. 
178:  (nir - red) / (nir + red)
184:    (green - nir) / (green + nir)
233:plot(distance_to_coast_km - distance_to_coast_km2)

04-spatial-operations.Rmd
1003:NDVI&= \frac{\text{NIR} - \text{Red}}{\text{NIR} + \text{Red}}\\
1014:The raster object has four satellite bands - blue, green, red, and near-infrared (NIR).
1019:  (nir - red) / (nir + red)
1064:```{r focal-example, echo = FALSE, fig.cap = "Input raster (left) and resulting output raster (right) due to a focal operation - finding the minimum value in 3-by-3 moving windows.", fig.scap="Illustration of a focal operation."}

03-attribute-operations.Rmd
550:Alternatively, we can use one of **dplyr** functions - `mutate()` or `transmute()`.

_02-ex.Rmd
16:# - Its geometry type?
18:# - The number of countries?
20:# - Its coordinate reference system (CRS)?
36:# - What does the `cex` argument do (see `?plot`)?
38:# - Why was `cex` set to the `sqrt(world$pop) / 10000`?
40:# - Bonus: experiment with different ways to visualize the global population.

01-introduction.Rmd
375:<!-- add info about specialized packages - sfnetworks, landscapemetrics, gdalcubes, rgee, etc. -->

02-spatial-data.Rmd
117:<!-- - [liblwgeom](https://github.com/postgis/postgis/tree/master/liblwgeom), a geometry engine used by PostGIS, via the [**lwgeom**](https://r-spatial.github.io/lwgeom/) package -->
423:# not printed - enough of these figures already (RL)
450:# Plotted - it is referenced in ch5 (st_cast)
1042:The degree of compression is often referred to as *flattening*, defined in terms of the equatorial radius ($a$) and polar radius ($b$) as follows: $f = (a - b) / a$. The terms *ellipticity* and *compression* can also be used.
1055:Both datums in Figure \@ref(fig:datum-fig) are put on top of a geoid - a model of global mean sea level.^[Please note that the geoid on the Figure exaggerates the bumpy surface of the geoid by a factor of 10,000 to highlight the irregular shape of the planet.]
1075:There are three main groups of projection types - conic, cylindrical, and planar (azimuthal).

_03-ex.Rmd
131:  mutate(pop_dens_diff_10_15 = pop_dens_15 - pop_dens_10,
136:E11. Change the columns' names in `us_states` to lowercase. (Hint: helper functions - `tolower()` and `colnames()` may help.)
144:The new object should have only two variables - `median_income_15` and `geometry`.
159:  mutate(pov_change = poverty_level_15 - poverty_level_10)
165:  mutate(pov_pct_change = pov_pct_15 - pov_pct_10)
196:r[c(1, 9, 81 - 9 + 1, 81)]

Member Author:

I based it on the example from ch5.

Collaborator:

I think the style is fine, but the dash in

- `slope` - slope angle (°)

does not seem standard. Same with

- Nearest neighbor - assigns the value of the nearest cell of the original raster to the cell of the target one. It is fast and usually suitable for categorical rasters

Also, if there are full stops (periods) in the bullet points, there should be a full stop at the end: https://www.instructionalsolutions.com/blog/bulleted-list-punctuation

Regarding colons vs dashes, I think both

- `slope` --- slope angle (°)

and

- `slope`: slope angle (°)

would be right, with the former being an 'em dash'.

Member Author:

@Robinlovelace I am fine with colons -- we just need to use them consistently.

@jannes-m (Collaborator):

These are good changes. I suggest making any final tweaks and merging, or we can merge now and do post-merge fixes. Sound reasonable, @jannes-m? Thanks, the chapter is looking really good.

Good plan!

@jannes-m (Collaborator):

Hey @Nowosad,
first of all, thanks again for reviewing and improving the chapter. I am finally addressing your comments -- better late than never.

I would suggest cleaning up most of the commented-out lines in the chapter.

I have deleted most of the comments.

Line 145 -- contains a link to DataCamp materials. Given the problematic history of the company, I suggest replacing it.

I have deleted the link.

Line 463 -- You say "...as expected...". I think it would not be expected by many people unfamiliar with this topic. I would suggest adding a sentence or two explaining why this is expected.

I have pointed to the corresponding section where we explain in detail why it is to be expected that a non-spatial CV will yield higher AUROC values than a spatial CV: ignoring spatial autocorrelation basically amounts to overfitting.
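To sketch the point (a minimal, assumed setup: the chapter's `task` and `learner` objects -- a spatial classification task and a learner with `predict_type = "prob"` -- plus the resampling identifiers provided by mlr3 and mlr3spatiotempcv):

library(mlr3)
library(mlr3spatiotempcv)
# non-spatial CV: training and test points can be close neighbours,
# so spatial autocorrelation leaks information into the test folds
rsmp_nsp = rsmp("repeated_cv", folds = 5, repeats = 100)
# spatial CV: folds are built from spatially separated groups of observations
rsmp_sp = rsmp("repeated_spcv_coords", folds = 5, repeats = 100)
rr_nsp = resample(task, learner, rsmp_nsp)
rr_sp = resample(task, learner, rsmp_sp)
rr_nsp$aggregate(mlr3::msr("classif.auc"))  # typically higher -- over-optimistic
rr_sp$aggregate(mlr3::msr("classif.auc"))   # typically lower -- more realistic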

Line 631 -- I would suggest adding an example runtime (e.g., "on a modern laptop it took XX seconds").

OK, I have added that the code can easily run for half a day on a modern laptop.

Finally, I have deleted your to-dos.

And I think you are referring to this pipe here:

# compute the AUROC as a data.table
score_spcv_glm = rr_spcv_glm$score(measure = mlr3::msr("classif.auc")) %>%
  # keep only the columns you need
  .[, .(task_id, learner_id, resampling_id, classif.auc)]

You probably have to use %>% here, since |> does not support the dot notation -- or rather it does, but only via an anonymous (lambda) function, which I think makes the syntax overly complicated (if it works at all here in combination with data.table).
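For comparison, the base-pipe version would need an anonymous function in place of the dot -- a sketch using the same objects as above, which to me looks clumsier:

score_spcv_glm = rr_spcv_glm$score(measure = mlr3::msr("classif.auc")) |>
  # anonymous function stands in for the dot of the magrittr pipe
  (\(x) x[, .(task_id, learner_id, resampling_id, classif.auc)])()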

@Nowosad (Member Author) commented Oct 21, 2022:

Thanks @jannes-m. The only thing I would suggest changing is related to your last answer. This is the only place in the book where we use %>%. Thus, I just think it would be better to remove the pipe entirely and just split the code into two lines here...
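For reference, the pipe-free version being suggested would look something like this (a sketch using the object names from the snippet above):

score_spcv_glm = rr_spcv_glm$score(measure = mlr3::msr("classif.auc"))
# keep only the columns you need (data.table syntax)
score_spcv_glm = score_spcv_glm[, .(task_id, learner_id, resampling_id, classif.auc)]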

@jannes-m (Collaborator):

OK, fair enough, will do so!
