reviewing chapter_3 #70

jannes-m · 2017-08-10T09:02:28Z

I have also started to review chapter_3. For more information, see specific comments.

jannes-m · 2017-08-10T09:03:47Z

03-attribute-operations.Rmd

-The subsetting functions `[` from base R and `filter()` from the **tidyverse**, for example, can also be used for spatial subsetting: the skills are cross-transferable.
-This chapter therefore provides the foundation for Chapter \@ref(spatial-data-operations), in terms of structure and input data.
+The subsetting functions `[` from base R and `filter()` from the **tidyverse**, for example, are also applicable to spatial data: the skills are cross-transferable.
+This chapter, therefore, provides the foundation for Chapter \@ref(spatial-data-operations) in terms of structure and input data.


What exactly do you mean by foundation in terms of structure?

I mean the structure of c4 mirrors that of c3. Does this sound any better?

This chapter therefore provides the basis for Chapter \@ref(spatial-data-operations).

You could also say something about it mirroring the structure if you can find the right form of words.

Ok, I see. I have just clarified this.
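To illustrate the cross-transferable subsetting skills mentioned in the diff above, here is a minimal base R sketch on a plain data frame. The `world_df` table and its values are invented for illustration and stand in for `world`. On an `sf` object the same `[` calls work, and the geometry column is kept in the result.

```r
# Toy stand-in for the world data (invented values, not spData's world)
world_df = data.frame(
  name_long = c("A", "B", "C"),
  continent = c("Africa", "Europe", "Europe"),
  pop = c(10, 20, 5),
  stringsAsFactors = FALSE
)

rows_by_position = world_df[1:2, ]                    # rows by position
rows_by_condition = world_df[world_df$pop > 8, ]      # rows by logical condition (what dplyr::filter() does)
cols_by_name = world_df[, c("name_long", "pop")]      # columns by name
rows_by_condition
```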

jannes-m · 2017-08-10T09:04:20Z

03-attribute-operations.Rmd

-The reason for this is that simple features have their own class, which behave simultaneously as geographic data objects (e.g. plotting as maps) and square tables (e.g. with attribute columns referred to with the `$` operator).
+As outlined in Chapter \@ref(spatial-class), **sf** provided the support for simple features in R.
+Additionally, **sf** added methods to generic R functions such as `plot()` and `summary()` to work with simple features. To convince yourself run for example `methods("summary")` and/or `methods("plot")`.
+<!--The reason for this is that simple features have their own class, which behave simultaneously as geographic data objects (e.g., plotting as maps) and square tables (e.g., with attribute columns referred to with the `$` operator).-->


I am not sure what this means, can you please clarify (The reason for this is that simple features...)?

I think that commented bit can safely be deleted: we discuss the fact that sf objects are also data frames at some length. I'm not sure "To convince yourself" is the best form of words. Maybe this would be a more appropriate sentence to replace lines 35:37:

As outlined in Chapter \@ref(spatial-class), **sf** provided support for simple features in R and made them work with generic R functions such as `plot()` and `summary()` (as can be seen by executing `methods("summary")` and/or `methods("plot")`).

Ok, I deleted the commented part and also adopted your wording. Thanks.
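`methods("summary")` reveals ordinary S3 dispatch. A package defines functions named `summary.<class>`, and the generic then picks them up for objects of that class. Below is a minimal sketch with a hypothetical `bus_stops` class. The class, stop names and coordinates are invented for illustration, and this is not how **sf** defines its own classes.

```r
# Hypothetical class "bus_stops": a plain data frame with an extra class
# attribute, loosely mimicking how sf objects extend data frames
stops = data.frame(
  name = c("Central", "Park Road"),
  lon = c(11.58, 11.60),
  lat = c(50.93, 50.92),
  stringsAsFactors = FALSE
)
class(stops) = c("bus_stops", class(stops))

# Defining summary.<class> provides an S3 method for the summary() generic
summary.bus_stops = function(object, ...) {
  cat(sprintf("Bus stops: %d\n", nrow(object)))
  invisible(object)
}

summary(stops)  # dispatches to summary.bus_stops()
# The new method is now listed next to the built-in ones
"summary.bus_stops" %in% methods("summary")
```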

jannes-m · 2017-08-10T09:07:15Z

03-attribute-operations.Rmd

 section.^[
-Unlike objects of class `Spatial` defined by the **sp** package, `sf` objects are also compatible with **dplyr** and **data.table** packages, which provide fast and powerful functions for data manipulation (see [Section 6.7](https://csgillespie.github.io/efficientR/data-carpentry.html#data-processing-with-data.table) of @gillespie_efficient_2016).
+Unlike objects of class `Spatial` of the **sp** package, `sf` objects are also compatible with the packages **dplyr** and **data.table** (at least in theory). Both packages provide fast and powerful functions for data manipulation (see [Section 6.7](https://csgillespie.github.io/efficientR/data-carpentry.html#data-processing-with-data.table) of @gillespie_efficient_2016).


I am not sure if sf objects really work well with data.table. I guess they sometimes do and sometimes not. Edzer also said at the useR! conference that if somebody would like to see sf working with data.table, he is happy to include corresponding pull requests (he did the same with the tidyverse).

Good point - maybe just delete the bit about data.table: there is no point mentioning it as we do not use it in the book and it could cause confusion. Suggest:

Unlike objects of class `Spatial` of the **sp** package, `sf` objects are also compatible with the **tidyverse** packages **dplyr** and **ggplot2**. The former provides fast and powerful functions for data manipulation (see [Section 6.7](https://csgillespie.github.io/efficientR/data-carpentry.html#data-processing-with-data.table) of @gillespie_efficient_2016) and the latter provides powerful plotting capabilities.

Perfect. I have incorporated that. Thanks again.

jannes-m · 2017-08-10T09:09:44Z

03-attribute-operations.Rmd


+<!--


I was unsure if the subsequent two pipe examples are really needed.

Remove them then ; )

I deleted the two pipe examples.

Robinlovelace · 2017-08-10T09:21:13Z

Great you've started this - please polish anything else in c1 first though so we can merge that and reduce the number of PRs and increase my headspace.

jannes-m · 2017-08-12T17:08:09Z

03-attribute-operations.Rmd


 ## Attribute data aggregation 

 <!-- https://github.com/ropenscilabs/skimr ?? -->

-As demonstrated in chapter \@ref(spatial-class), `summary()` provides a quick summary of the spatial and non-spatial components of spatial objects.
-Enter the following command to for an overview of the `world` object and all its variables (result not shown):
+<!-- As demonstrated in chapter \@ref(spatial-class), `summary()` provides a quick summary of the spatial and non-spatial components of spatial objects.


The comparison is a bit unfair. The `summary()` function is more generic in nature and can be applied to a multitude of classes for summary statistics. `dplyr::summarize()` is basically an aggregation function, so a comparison with `tapply()`, `aggregate()` or `by()` would be fairer. Admittedly, you can also use `dplyr::summarize()` for summary statistics, but you can do the same with `aggregate()` etc. So I suggest either dropping the summary comparison or comparing with a base R aggregation function.

Ok, then I'll delete the summary part. And since we explain `aggregate()` later on, there is no need to mention it here, ok?
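The distinction drawn above can be shown with a minimal base R sketch on a toy attribute table. The values are invented and are not the `world` data. `summary()` describes a column as a whole. `aggregate()` and `tapply()` compute group-wise results, which is what `group_by()` followed by `summarise()` does in **dplyr**.

```r
# Toy attribute table with invented values (not spData's world data)
countries = data.frame(
  name_long = c("A", "B", "C", "D"),
  continent = c("Africa", "Africa", "Europe", "Europe"),
  pop = c(10, 20, 5, 15),
  stringsAsFactors = FALSE
)

# summary(): generic descriptive statistics for the whole column
summary(countries$pop)

# aggregate() and tapply(): group-wise aggregation by continent
pop_agg = aggregate(pop ~ continent, data = countries, FUN = sum)
pop_tap = tapply(countries$pop, countries$continent, sum)
pop_agg
pop_tap
```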

jannes-m · 2017-08-12T17:25:23Z

03-attribute-operations.Rmd

@@ -275,15 +255,15 @@ world_continents = world %>%
 world_continents
 ```

-`sf` objects are well-integrated with the **tidyverse**, as illustrated by the fact that the aggregated objects preserve the geometry of the original `world` object.
+`sf` objects are well-integrated with the **tidyverse**, as illustrated by the fact that the aggregated objects preserve the geometry of the original `world` object.^[Such a spatial aggregation of polygon data is known as "to dissolve polygons" in the GIS world.]


Any idea why some borders are still preserved? The same happens when using a GIS in the background:

```r
library(RQGIS)
library(spData)
test <- run_qgis("saga:polygondissolvebyattribute", POLYGONS = world,
                 FIELD_1 = "continent", DISSOLVED = "out.shp",
                 BND_KEEP = "False", load_output = TRUE)
```

Hence, I guess the input geometry is somewhat unclean...

And though we perform attribute aggregation here, it is also a spatial operation (dissolving). Borrowing from this blog:

```r
library(sf)
library(dplyr)
library(ggplot2)

nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)
# add an arbitrary grouping variable
nc_groups <- nc %>%
  mutate(group = sample(LETTERS[1:3], nrow(.), replace = TRUE))
# average area by group
nc_mean_area <- nc_groups %>%
  group_by(group) %>%
  summarise(area_mean = mean(AREA))
# plot
ggplot(nc_mean_area) +
  geom_sf(aes(fill = area_mean)) +
  scale_fill_distiller("Area", palette = "Greens") +
  ggtitle("Mean area by group") +
  theme_bw()
```

Notice that in addition to the attribute data being aggregated, the geometries have been aggregated as well. All geometries in each group have been combined together and the boundaries between adjacent geometries dissolved. Internally, the function st_union() is used to achieve this.

So I suggest either pointing this out clearly or moving the entire aggregation subsection to chapter 4.

Yes that's a good plan - it's important to note that it does a spatial data operation 'under the hood' which is clever.

Ok, so what about adding this sentence:
What is more, under the hood sf is already doing a spatial aggregation of polygon data, which is known as 'dissolving polygons' in the GIS world, an operation we will explain in more detail in the next chapter.

Robinlovelace

This is a great set of changes @jannes-m - thanks for the attention-to-detail. I suggest that after a few changes, based on my comments below, we merge this PR later today. Let me know when you think it's 'done' (as with c1 we can always revisit contents).

Robinlovelace · 2017-08-14T08:20:26Z

03-attribute-operations.Rmd

@@ -2,78 +2,88 @@

 ## Prerequisites {-}

- This chapter requires **tidyverse** and **sf**:
+- This chapter requires the packages **tidyverse** and **sf**:


Great to be explicit, thanks for clarifying that for readers.

Robinlovelace · 2017-08-14T08:21:22Z

03-attribute-operations.Rmd


 ```{r, message=FALSE}
 library(sf)
 library(tidyverse)
 ```

- You must have loaded the `world` and `worldbank_df` data which are loaded automatically by the **spData** package:
+- We will also make use of the `world` and `worldbank_df` data sets. Note that loading the **spData** package automatically attaches these data sets to your global environment:


Again, great attention to detail in the description, thanks for that.

Robinlovelace · 2017-08-14T08:26:45Z

03-attribute-operations.Rmd


 ```{r, results='hide'}
 library(spData)
 ```

 ## Introduction

-Attribute data is non-spatial information associated with geographic data.
-In the context of simple features, introduced in the previous chapter, this means a data frame with a column for each variable and one row per geographic feature stored in the `geom` list-column of `sf` objects.
+Attribute data is non-spatial information, e.g., the name of a bus station, associated with geographic data, e.g. the coordinate of this bus station.


Not keen on using abbreviations such as i.e. or e.g. mid-text, especially when they are surrounded by enclosing commas: it does not flow well. I propose the line is changed to the following:

Attribute data is non-spatial information associated with geographic (geometry) data. A bus station, for example, could be represented by a field containing its name (attribute data), associated with its latitude and longitude position (geometry data).

Ok, I'll remember that! Changed as requested.

Robinlovelace · 2017-08-14T08:29:20Z

03-attribute-operations.Rmd

-Attribute data is non-spatial information associated with geographic data.
-In the context of simple features, introduced in the previous chapter, this means a data frame with a column for each variable and one row per geographic feature stored in the `geom` list-column of `sf` objects.
+Attribute data is non-spatial information, e.g., the name of a bus station, associated with geographic data, e.g. the coordinate of this bus station.
+Simple features (see previous chapter) store attribute data in a dataframe with each column corresponding to a variable and each row to one observation, e.g., a bus station. 


2 e.g.s in quick succession! I suggest a small change:

Simple features, described in the previous chapter, store attribute data in a data frame, with each column corresponding to a variable (such as 'name') and each row to one observation (such as an individual bus station).

Thanks, incorporated that.

Robinlovelace · 2017-08-14T08:31:42Z

03-attribute-operations.Rmd

-In the context of simple features, introduced in the previous chapter, this means a data frame with a column for each variable and one row per geographic feature stored in the `geom` list-column of `sf` objects.
+Attribute data is non-spatial information, e.g., the name of a bus station, associated with geographic data, e.g. the coordinate of this bus station.
+Simple features (see previous chapter) store attribute data in a dataframe with each column corresponding to a variable and each row to one observation, e.g., a bus station. 
+In addition, a special column, mostly named `geom` or `geometry`, stores the spatial information of an **sf**-object, e.g., the coordinate of the bus station.


A third e.g.! Suggestion (we've already said that the geometry contains the coordinates but it should link to the next sentence):

In addition, a special column, usually named `geom` or `geometry`, stores the geometry data of **sf** objects. For a bus station, that would likely be a single point representing its centroid.

Ok, I will make sure to avoid using e.g. and i.e. :-). Changed that.
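The structure discussed in this exchange can be sketched with a plain data frame: one row per observation, one column per variable, plus a geometry column. The station names and coordinates are hypothetical. The list-column of coordinate pairs is only a rough stand-in. In **sf**, the geometry column is an `sfc` object, created for example with `st_sfc()`.

```r
# Hypothetical bus stations: one row per observation, one column per variable
stations = data.frame(
  name = c("Central", "Park Road"),
  stringsAsFactors = FALSE
)
# A list-column holding one (longitude, latitude) pair per row,
# a rough stand-in for the geometry column of an sf object
stations$geometry = list(c(11.58, 50.93), c(11.60, 50.92))
stations
```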

Robinlovelace · 2017-08-14T08:35:17Z

03-attribute-operations.Rmd

-The trusty `data.frame` (and extensions to it such as the `tibble` class used in the tidyverse) is a workhorse for data analysis in R.
-Extending this system to work with spatial data has many advantages,
-meaning that all the accumulated know-how in the R community for handling data frames to be applied to geographic data which contain attributes.
+The reliable `data.frame` (and modifications of it such as the `tibble` class used in the tidyverse) is the basis for data analysis in R.


I would say "modifications to it" rather than "modifications of it", as the class is modified by an external force (the programmer). Otherwise I think this adjustment to the text is an improvement, thanks for that.

Good point!

Robinlovelace · 2017-08-14T08:37:12Z

03-attribute-operations.Rmd

-meaning that all the accumulated know-how in the R community for handling data frames to be applied to geographic data which contain attributes.
+The reliable `data.frame` (and modifications of it such as the `tibble` class used in the tidyverse) is the basis for data analysis in R.
+Extending this system to work with spatial data has many advantages. 
+The most important one is that the accumulated know-how in the R community for handling data frames is transferable to geographic attribute data.


I would replace "is transferable" with "can be transferred" because it's still contingent on knowing how to program with data frames, hence the importance of learning about attribute data operations and reading this chapter.

Another good point! This was just my habit of avoiding the passive voice. Good example when that's no good...

Robinlovelace · 2017-08-14T08:38:48Z

03-attribute-operations.Rmd

-Before proceeding to perform various attribute operations of a dataset, it is worth taking time to think about its basic parameters.
-In this case, the  `world` object contains 10 non-geographical columns (and one geometry list-column) with data for almost 200 countries.
-This can be be checked using base R functions for working with tabular data such as `nrow()` and `ncol()`:
+Before proceeding to perform various attribute operations on a dataset, it is advisable to explore its structure.


Suggestion (more concise, informal and hopefully friendly):

Before proceeding to perform various attribute operations on a dataset, let's explore its structure.

Changed that.
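Below is a quick look at a dataset's structure with base R functions for tabular data. It is sketched on a small invented stand-in for `world`. On an `sf` object, `ncol()` also counts the geometry list-column.

```r
# Small invented stand-in for the world data (the real object has almost
# 200 rows plus a geometry list-column)
world_df = data.frame(
  name_long = c("A", "B", "C"),
  continent = c("Africa", "Europe", "Europe"),
  pop = c(10, 20, 5),
  stringsAsFactors = FALSE
)
nrow(world_df)   # number of rows (one per observation)
ncol(world_df)   # number of columns (one per variable)
dim(world_df)    # rows and columns at once
names(world_df)  # column names
```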

Robinlovelace · 2017-08-14T08:40:06Z

03-attribute-operations.Rmd


 ```{r, eval=FALSE}
-world[1:6,] # subset rows by position
+world[1:6, ] # subset rows by position


Great formatting fix. I only realised recently that this is good practice.

Glad you like it. And it's good that we agree on a consistent coding style. Though again, I am always happy to adopt yours as well, such as `=` instead of `<-`. Consistency is the important thing.

jannes-m · 2017-08-14T18:14:30Z

03-attribute-operations.Rmd

@@ -554,10 +536,11 @@ world %>%
 ```

 ## Removing spatial information
+<!-- Shouldn't that be part of chapter 2? -->


Shouldn't removing spatial information be part of chapter 2?

Since we are dealing with attribute operations in chapter 3, removing spatial information (which is a column) is very well placed here. I made this comment when I was in a rush (not a good idea), because I had in the back of my mind that there was something on the geometry column in chapter 2. But we can just put a reference to chapter 3 there.

OK great, cheers for the feedback.

jannes-m · 2017-08-14T18:17:26Z

03-attribute-operations.Rmd

-A new `sf` object will be a result of these joins. 
-However, the reverse order is also possible and will result in a `data.frame` object.
+Most of the following join examples will have a `sf` object as the first argument and a `data.frame` object as the second argument which results in a new `sf` object.
+However, the reverse order is also possible and will give you back a `data.frame` object.
 This is mostly beyond the scope of this book, but we encourage you to try it.

 ### Left joins



One could present just the two most important join types, the inner and the left join (both supported by `st_join()`), and leave the rest as an exercise for the reader, again pointing to the join chapter in the R for Data Science book.

Yes I think that is a good plan - the inner and left are indeed the ones I use most. I think that would be a great exercise for the reader. @Nowosad sound like a plan?

Yes. It makes perfect sense. I'm going to adjust it after the raster part in the second chapter will be completed.
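Here is a minimal sketch of the two join types kept in the chapter. It uses base R `merge()` so that it runs without extra packages, while the chapter itself uses **dplyr**'s `left_join()` and `inner_join()`. The table names and values are invented for illustration.

```r
# Hypothetical attribute tables (invented values, not world/worldbank_df)
countries = data.frame(name = c("A", "B", "C"), pop = c(10, 20, 5),
                       stringsAsFactors = FALSE)
indicators = data.frame(name = c("A", "C", "D"), hdi = c(0.5, 0.7, 0.9),
                        stringsAsFactors = FALSE)

# Inner join: keeps only keys present in both tables (A and C)
inner = merge(countries, indicators, by = "name")
# Left join: keeps every row of the first table; unmatched rows get NA (B)
left = merge(countries, indicators, by = "name", all.x = TRUE)
inner
left
```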

jannes-m · 2017-08-14T18:18:46Z

Yes, it would be good to merge the PR to make sure that our branches do not diverge that much. I can still open a new PR when I am adding the raster stuff.

Merge remote-tracking branch 'upstream/master' into chapter_3 # Conflicts: # 03-attribute-operations.Rmd

reviewing intro and subsetting

68d3215

jannes-m commented Aug 10, 2017


jannes-m added 3 commits August 10, 2017 12:28

pipe operator

1dc2787

Merge remote-tracking branch 'upstream/master' into chapter_3

629afd5

reviewing summary subsection

223fe36

jannes-m commented Aug 12, 2017


starting with join subsection

3acca1a

Robinlovelace reviewed Aug 14, 2017


still tiny changes to the join subsection

8b34bb3

jannes-m commented Aug 14, 2017


incorporating Robin's comments and harmonizing with master

aaf764f

Merge remote-tracking branch 'upstream/master' into chapter_3 # Conflicts: # 03-attribute-operations.Rmd

Robinlovelace merged commit dc12197 into geocompx:master Aug 15, 2017

Nowosad mentioned this pull request Aug 15, 2017

Only left and inner joins in the third chapter #73

Closed
