
Introduce tidyverse earlier #365

Open
jcoliver opened this Issue Apr 4, 2018 · 12 comments

6 participants
@jcoliver
Collaborator

jcoliver commented Apr 4, 2018

Continuing from #364
Briefly: there is some discussion about introducing at least some of the material from later episodes (e.g. 13-dplyr and 14-tidyr) earlier in the schedule. A number of functions, such as read_csv(), may impose a lower cognitive load on new learners. Thanks to @maurolepore for raising this point.

I am inclined to continue with base R in the early episodes and to leave tidyverse exploration for later in the workshop, acknowledging that many workshops may not even get to that material. While the tidyverse offers considerable functionality and adoption of its packages is growing, I think many novices would benefit from sticking to base R. Should a specific workshop want a focus on more tidy-style offerings, the first lesson plan in the instructor notes provides one way to do this.

But as @naupaka says, there are other opinions. It would be great to hear them, too.

@mawds

Collaborator

mawds commented Apr 5, 2018

I'm torn on this issue; I think a lot of it comes down to what you expect learners to do with R. If it's "pure" data-analysis then I'd be keener on a more tidyverse based approach.

If you're expecting them to do more programming, then base R makes a lot more sense (and I'd argue the tidyverse could actually be a negative in that case, since learners will be more familiar with non-standard evaluation than standard evaluation, which will need to be unpicked at some point).

Having said that, I teach the tidyverse on our carpentry-based R for data-analysis course here at the University of Manchester. But I do feel a little guilty about storing up problems for the learners who want to use it for more than "standard" data-analysis problems.

I found this blog post (and its comments) particularly useful when deciding between base R and the tidyverse for our course: http://varianceexplained.org/r/teach-tidyverse/

I agree with @naupaka (in #364) that any shift towards a tidyverse-first approach would need broad consensus among instructors. In my opinion, which approach is better comes down to what we expect the learners to do with R after the course.

@ateucher

Contributor

ateucher commented Apr 5, 2018

I think this issue is also tied up with the question of what level of experience is expected of the learners. My understanding is that SWC was originally targeted at scientists who already write quite a lot of code but want to learn better programming techniques (iteration and flow control; small, testable, composable functions; etc.). If that is the case, then I think the focus on base R is appropriate. However, my sense is that SWC is increasingly teaching R/Python to complete novices, and in that case I think the tidyverse-first approach is superior.

@philippbayer

philippbayer commented Oct 31, 2018

FWIW, I'm planning to teach a short course later this month based on the gapminder materials, but we wanted to teach the tidyverse first.

Based on the suggestions here https://swcarpentry.github.io/r-novice-gapminder/guide/, we came up with this order:

  • 01 Introduction to R and RStudio
  • 02 Project Management With RStudio (super short)
  • 03 Seeking Help (super short - use Google essentially)
  • 04 Data Structures (only creating vectors with c(); no tidyverse here since it's just c(), but explain the tidyverse's basic data structure, the tibble, instead)
  • 05 Exploring Data Frames (“Realistic example” section onwards) and 06 Subsetting Data (excluding factor, matrix and list subsetting)
    -- This is the biggest change: 05 and 06 can be replaced with 13, Dataframe Manipulation with dplyr, but also needs an intro to readr to download and read the data in the first place. A lot of what happens in 05 and 06 is relatively 'simple' in the tidyverse: 05/06 spend a bit of time on subsetting and the different ways of doing it, whereas in the tidyverse it's a combination of select() and filter().
  • 08 Creating Publication-Quality Graphics with ggplot2

Hope this helps someone?

@maurolepore

maurolepore commented Oct 31, 2018

@philippbayer that sounds like a great plan!

RE:

but also needs an intro to readr to download and read the data in the first place.

Have you thought of using the point-and-click interface for now?

@philippbayer

philippbayer commented Nov 1, 2018

@maurolepore Thanks for the tip, I had never used that button; that will reduce some of the cognitive load!

@philippbayer

philippbayer commented Nov 16, 2018

I've made some changes in my fork, maybe someone else can use this:

Intro to basic R (basic maths, exp(), comparisons) - 20 min
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/01-rstudio-intro.md

RStudio - 10 min
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/02-project-intro.md

Finding help - 10 min
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/03-seeking-help.md

Data structures in tidyverse (tibbles) - 1 hour?
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/04-data-structures-part1.md

Manipulating data structures the old-school way (deleting, adding columns, etc.) - 1 hour?
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/05-data-structures-part2.md

Effectively using tidyverse (mostly dplyr) - 1 hour?
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/08-dplyr.md

Plotting in ggplot2 and connecting ggplot2 with dplyr - 30 min
https://github.com/philippbayer/r-novice-gapminder/blob/gh-pages/_episodes/13-plot-ggplot2.md

Originally, ggplot2 was lesson 8 and dplyr was lesson 13; I switched their places. At the end of the dplyr lesson there was a 'connect ggplot2 and dplyr' piece, which I've now moved to the end of lesson 13.

@naupaka

Member

naupaka commented Nov 19, 2018

@philippbayer thanks for adding this here! You may also want to contribute this order and these timings to _extras/guide.md as a PR.

Relatedly, I think it is time for a serious reconsideration of the order and flow of the lessons to bring them up to date with modern best practices. @jcoliver @mawds maybe for the next semi-annual release? We should potentially have the tidyverse vs. base R discussion in the curriculum committee.

@maurolepore

maurolepore commented Nov 19, 2018

@naupaka, these slides support your suggestion:
https://speakerdeck.com/minecr/let-them-eat-cake-first

@philippbayer

philippbayer commented Nov 19, 2018

I'm in favor of switching to the tidyverse. If you look at my lesson 04-data-structures-part1.md, I changed the data frames to tibbles and removed matrices etc., which allowed me to delete much of the material that was just annoying to discuss (for example, I could delete all references to stringsAsFactors).

@mawds

Collaborator

mawds commented Nov 23, 2018

@naupaka - I agree. Probably also worth speaking with the curriculum committee about how this lesson fits with the other R lesson (http://swcarpentry.github.io/r-novice-inflammation)?

As the other lesson focuses more on "regular" programming and this one more on data analysis, I think having this lesson tidyverse-focussed and that one base-R-focussed makes a lot of sense.

@mawds

Collaborator

mawds commented Nov 26, 2018

See also issue #442

@jcoliver

Collaborator

jcoliver commented Nov 26, 2018

I would be interested in seeing a plan for (1) what tidyverse functions are included and (2) where they would end up. I worry a little bit about providing an introduction to R that skips some pretty fundamental things [1]. But having some rearrangement so the amazing tools of dplyr and ggplot are covered earlier makes sense to me.

[1] Recognizing that we don't teach things just because they are "fundamental", but because they are important for early wins and later successes. For example, I think the repeated practice with data.frames is important, but I may be a Luddite.
