S4SS - Statistics for Soil Survey Part 1 Revisions 2021 #23

Open
3 of 8 tasks
brownag opened this issue Jan 21, 2021 · 5 comments

Comments

@brownag
Member

brownag commented Jan 21, 2021

2021/01/26 TODOs and suggestions from @smroecker

Core R data manipulation functional themes:

  • loading/fetching data
  • filtering
  • transforming
  • aggregating
  • iterating
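As a rough illustration of these themes, here is a minimal base R sketch. The `h` data.frame and its values are invented for the example; in the course the data would come from a fetch function rather than being typed in.

```r
# "loading": hypothetical horizon data, two pedons with three horizons each
h <- data.frame(
  pedon  = c("A", "A", "A", "B", "B", "B"),
  top    = c(0, 10, 30, 0, 15, 40),
  bottom = c(10, 30, 60, 15, 40, 80),
  clay   = c(12, 18, 25, 20, 28, 35)
)

# filtering: keep horizons that start within the upper 25 cm
shallow <- subset(h, top < 25)

# transforming: add a horizon thickness column
h <- transform(h, thickness = bottom - top)

# aggregating: thickness-weighted mean clay per pedon
wt_clay <- sapply(split(h, h$pedon), function(p) {
  weighted.mean(p$clay, w = p$thickness)
})

# iterating: apply a function to each pedon in turn
max_depth <- vapply(split(h, h$pedon), function(p) max(p$bottom), numeric(1))
```

The same five steps map onto dplyr verbs or aqp functions once the data are in a data.frame.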

TODO

Top priorities:

  • 1. Move mapView and sf examples to Spatial Chapter
  • 2. Add a simple example comparing the diagnostic slot to information parsed from the horizon slot

Demo reproducible examples for functions:

  • "filter" (subset)
  • "transform" (mutate, slice, segment, spc2mpspline)
  • "aggregate" (slab)
  • "iterate" (profileApply)
  • soilDB
    • get_extended_data_from_NASIS_db, get_vegplot_from_NASIS_db
    • fetch functions: fetchOSD, fetchSDA, fetchNASISWebReport, fetchHenry
  • New taxonomic functions
    • (maybe; these interfaces are a bit fluid right now, but students are typically very interested in taxonomic information...)
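A possible starting point for the "filter", "iterate" and "aggregate" demos, sketched with the `sp1` sample data that ships with aqp. This assumes aqp is installed; the grouping column and the choice of `prop` are just what `sp1` happens to contain, not a proposal for the final examples.

```r
library(aqp)

# sp1 is a sample dataset shipped with aqp
data(sp1)
depths(sp1) <- id ~ top + bottom   # promote data.frame to SoilProfileCollection
site(sp1) <- ~ group               # move 'group' to the site level

# "filter": keep profiles in group 1
g1 <- subset(sp1, group == 1)

# "iterate": deepest horizon bottom depth, one value per profile
max_depth <- profileApply(sp1, function(p) max(horizons(p)$bottom))

# "aggregate": depth-wise summaries of 'prop' using the slab() defaults
agg <- slab(sp1, fm = ~ prop)
head(agg)
```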


The changes described below are from the original PR: #21

Re-organization of the Data chapter, and the Part 1 chapters, into bookdown format. Most of the changes move portions of the Data chapter elsewhere or, in a few cases, move material from other chapters into it.

This is the proposed order of sections:

Precourse, Intro, Data, EDA, Spatial, Sampling

Intro

Mostly same content.

There is currently no exposition on the materials in the "appendix" for the Data chapter -- but we need to spend some time focusing on that type of material. I think it could be part of chapter 1 as more of a "basic R syntax and concepts" section.

Data

"Data" contains the same essential elements/content; really motivating the discussion around pedon data specifically. I would like to see some more references to ecosite data in this chapter eventually.

The example code still includes exercises like simple plots of point locations, but no detailed exposition on spatial data types. That now comes after EDA. I left some stubs that allude to future Spatial sections (and also EDA with dplyr?), but my thought is we do not get too prescriptive -- just say that there are many ways of doing these things once you have access to the data in a data.frame.

Data now features Soil Reports at the end (previously at the end of EDA). My thought is these are, or can be, a fun exercise that motivates the need for understanding distributions, descriptive statistics etc. and maybe gets people "excited" about what they can do with existing R tools.

EDA

EDA is mostly unchanged except for moving the Soil Reports stuff into Data.

I think we can have them running reports and looking at the output before we really get into the details of the stats. It is nice to have a hard example of something in front of you when learning something new -- that way they can really get a jump start thinking about how they can apply it to their own data / final project etc.

Spatial

Spatial data comes after that. Emphasize the data.frame skills covered in Precourse/external AgLearn courses/Intro/Data by focusing on sf data.frame objects first. Then cover their interop/conversion to sp objects. This will better prepare the students for the sf code they will encounter in the wild, as sf is the only interface to many packages. They still need essential sp context, links and demos for examples, existing code etc., but sf objects extend data.frame, so that fits some central themes of the class.
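A minimal sketch of the "sf first, then sp" idea, assuming the sf and sp packages are installed. The point coordinates and `site_id` values are invented for the example.

```r
library(sf)

# Hypothetical point data in an ordinary data.frame (coordinates invented)
pts <- data.frame(
  site_id = c("p1", "p2", "p3"),
  x = c(-120.5, -120.4, -120.3),
  y = c(37.9, 38.0, 38.1)
)

# promote to sf: the result is still a data.frame, now with a geometry column
pts_sf <- st_as_sf(pts, coords = c("x", "y"), crs = 4326)
inherits(pts_sf, "data.frame")

# familiar data.frame operations keep working
pts_sub <- pts_sf[pts_sf$site_id != "p2", ]

# convert to sp and back again (requires the sp package)
pts_sp   <- as(pts_sf, "Spatial")
pts_back <- st_as_sf(pts_sp)
```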

Interactive maps, along with the new exactextractr example, are our shiny R spatial examples, both featuring sf interfaces. The Spatial chapter now really tries to draw parallels to data.frames, and between the methods used for reading/writing, setting CRS etc. across sf, sp and raster objects.

Sampling

Finally, the Sampling chapter has examples of using sp objects for spatial sampling. I think this section could be enhanced significantly as a resource, not so much as something covered in detail in class. I would like to provide subsections with matching sf st_sample() and sp spsample() examples to draw parallels. We have the sampling presentation and other materials, so we can spend as much time on applying the code examples as is interesting to the group -- but the thought is this chapter should mostly be a self-contained set of reference examples of different sampling strategies applied to simple, but realistic, data. I consider the specific details in this chapter to be more like end matter for Part 1, something that fits well after discussing details of data, describing data, and how to describe data in space.
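One way the paired subsections might look, as a sketch only. It assumes sf and sp are installed; the square study area is hypothetical, in planar coordinates with no CRS assigned to keep the example small.

```r
library(sf)
library(sp)

set.seed(1)

# Hypothetical 1000 x 1000 unit square study area (planar, no CRS assigned)
sq <- st_sfc(st_polygon(list(rbind(
  c(0, 0), c(1000, 0), c(1000, 1000), c(0, 1000), c(0, 0)
))))

# sf: simple random and regular samples
sf_random  <- st_sample(sq, size = 20, type = "random")
sf_regular <- st_sample(sq, size = 20, type = "regular")

# sp: the same two strategies on the converted polygon
sq_sp      <- as(sq, "Spatial")
sp_random  <- spsample(sq_sp, n = 20, type = "random")
sp_regular <- spsample(sq_sp, n = 20, type = "regular")
```

For the regular designs, both packages treat the requested number of points as approximate, so the returned counts may differ slightly from 20.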

@brownag
Member Author

brownag commented Jan 23, 2021

The root directory "README.md" file has been updated.

This will replace the index used in /chapters to direct to the various course materials.

http://ncss-tech.github.io/stats_for_soil_survey/

Now the markdown document is knitted from a "README.Rmd" file. It has basic links to the chapters, and the presentations and exercises, following my proposed layout.

I opted for bigger text and fewer links. I would still like to pre-generate a set of .R files for students to use in class with purl, like last year, but perhaps we can deliver them as a single ZIP file rather than as many links alongside the chapters. Thoughts?
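A rough sketch of how the single-ZIP idea could work, assuming knitr is installed and a `zip` utility is available on the system. The function name, directory arguments and ZIP file name are placeholders, not existing project conventions.

```r
# Sketch: extract the R code from each chapter and bundle it in one ZIP.
# 'rmd_dir', 'out_dir' and 'zipfile' are placeholder names.
purl_chapters <- function(rmd_dir,
                          out_dir = tempfile("chapter_code"),
                          zipfile = "chapter_code.zip") {
  dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)
  rmd <- list.files(rmd_dir, pattern = "\\.Rmd$", full.names = TRUE)
  stopifnot(length(rmd) > 0)

  # one .R file per chapter; documentation = 0 keeps code only
  r_files <- vapply(rmd, function(f) {
    out <- file.path(out_dir, sub("\\.Rmd$", ".R", basename(f)))
    knitr::purl(f, output = out, documentation = 0, quiet = TRUE)
  }, character(1))

  # -j stores the files without their directory paths
  utils::zip(zipfile, files = r_files, flags = "-j9X")
  invisible(r_files)
}
```

Calling `purl_chapters()` on the folder that holds the chapter .Rmd files would then leave one ZIP in the working directory to link from the README.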

@brownag
Member Author

brownag commented Jan 26, 2021

Updated main post with suggestions from @smroecker

@brownag
Member Author

brownag commented Jan 28, 2021

Bookdown "tip" from re-learning some things after @dylanbeaudette's questions about keeping links intact

Can control HTML file output names and other tags using

# Header Text {#shortname}

So, for instance

# Introduction to R and RStudio {#intro}

Creates "intro.html". Otherwise it would be "introduction-to-r-and-rstudio.html"

And subsections can be relatively linked e.g. as

[Chapter 1 - Course Overview](intro.html#course-overview)

where "course-overview" is the autogenerated tag (for ## Course Overview); an explicit {#foo} tag on the header would override it
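To predict the autogenerated tag for a header, something like the following works for plain ASCII headers. It is only an approximation of Pandoc's automatic identifier rules written for this note (the `auto_id` name is made up), so check unusual headers against the built book.

```r
# Approximation of Pandoc's automatic header identifiers (ASCII text only)
auto_id <- function(header) {
  id <- tolower(header)
  id <- gsub("[^a-z0-9_. -]", "", id)   # drop punctuation except _ . -
  id <- gsub(" +", "-", trimws(id))     # spaces become hyphens
  id <- sub("^[^a-z]+", "", id)         # drop everything before the first letter
  ifelse(nzchar(id), id, "section")     # Pandoc falls back to "section"
}

auto_id("Introduction to R and RStudio")
auto_id("Course Overview")
```

The first call reproduces the long "introduction-to-r-and-rstudio" name mentioned above, and the second gives the "course-overview" tag used in the relative link.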

@brownag
Member Author

brownag commented Feb 10, 2021

Syntax for unlisted/unnumbered elements in the table of contents: https://bookdown.org/yihui/rmarkdown-cookbook/toc-unlisted.html -- requires at least Pandoc 2.10. In hindsight, this could probably be used in several places in the Part 1 book.

Dylan reminded me to post a few things that can happen when converting old RMarkdown to bookdown-style chapters -- for anyone's information who may be working on Part 2 or new stuff for Part 1:

  • Header levels generally need +1 # added to get TOC to work right -- the level 1 header is essentially the whole book
  • Restart your R session / build book in a fresh session. There is a lot more going on in the background, and weird things can happen if you do not re-start your R session regularly.
  • I would suggest using an RStudio Project with an .Rproj file in e.g. a /newbook folder used specifically for building the book. Note this may be different from a general purpose project used for the /stats_for_soil_survey root directory
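For the header-level bullet, a small helper like this could do the bulk of the conversion. It is a sketch, not a tested migration tool: it adds one `#` to every Markdown header and skips lines inside fenced code chunks so that R comments are left alone. The function name is made up for this note.

```r
# Sketch: add one "#" to each Markdown header outside fenced code chunks
demote_headers <- function(lines) {
  fence_rx <- paste0("^\\s*", strrep("`", 3))  # a line that opens or closes a chunk
  in_chunk <- FALSE
  for (i in seq_along(lines)) {
    if (grepl(fence_rx, lines[i])) {
      in_chunk <- !in_chunk                    # toggle at every fence line
    } else if (!in_chunk && grepl("^#+ ", lines[i])) {
      lines[i] <- paste0("#", lines[i])
    }
  }
  lines
}

# small in-memory example; a real file would go through readLines()/writeLines()
fence <- strrep("`", 3)
old <- c("# Data", "", paste0(fence, "{r}"), "# a code comment", fence, "## Filtering")
new <- demote_headers(old)
```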

@brownag
Member Author

brownag commented Feb 10, 2021

AH, and the big one -- you cannot re-use header names or the Table of Contents will (likely) not link them correctly. This appears to need fixing since everyone's most recent edits to Part 1. I had previously corrected any duplicates.

For instance, if you use the header ### References in chapter 1 and chapter 3, when you click "References" in the chapter 3 section of the table of contents it will direct you to the wrong tag (e.g. intro.html#references not eda.html#references). The dumb fix for this is to add something that distinguishes the header such that they are unique e.g. ### References (Exploratory Data Analysis) becomes eda.html#references-exploratory-data-analysis as a link.
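A quick way to spot repeated header text across chapters, sketched in base R. The chapter file names and contents below are invented; with real files each list element would come from `readLines()`. The check looks at header text only, and it does not skip code chunks, so R comments that start with `# ` can show up as false positives.

```r
# Sketch: report header text that appears more than once across chapters
find_duplicate_headers <- function(chapters) {
  # chapters: named list of character vectors, one element per chapter file
  hdr <- do.call(rbind, lapply(names(chapters), function(f) {
    x <- chapters[[f]]
    h <- x[grepl("^#+ ", x)]
    data.frame(file = rep(f, length(h)), header = sub("^#+ +", "", h))
  }))
  hdr[hdr$header %in% hdr$header[duplicated(hdr$header)], ]
}

# hypothetical chapter contents
chapters <- list(
  "01-intro.Rmd" = c("# Introduction {#intro}", "## Course Overview", "### References"),
  "03-eda.Rmd"   = c("# Exploratory Data Analysis {#eda}", "### References")
)
dups <- find_duplicate_headers(chapters)
```

Here `dups` lists the "References" header once for each of the two files, which are the headers that would need a distinguishing suffix.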
