Remove plyr dependency #540

Closed
AmeliaMN opened this issue Jul 19, 2019 · 22 comments · Fixed by #885
Labels
status:need more info More information needed type:discussion Discussion or feedback about the lesson

Comments

@AmeliaMN

From the plyr GitHub repository: "plyr is retired: this means only changes necessary to keep it on CRAN will be made. We recommend using dplyr (for data frames) or purrr (for lists) instead."

I think it would be good practice to remove plyr from this lesson. Looking at the files in this repo, I can see that the main place plyr is shown is in 12-plyr.Rmd, with a few references in other places. The tasks in the plyr episode are similar to the ones done in 13-dplyr.Rmd, and I think the tasks in 13-dplyr.Rmd are actually simpler than the ones in 12-plyr.Rmd.

With that in mind, I propose the following changes:

  • move 13-dplyr.Rmd up to be earlier in the sequence
  • do something with 12-plyr.Rmd. I could imagine a few different possibilities:
    • remove this lesson altogether, to not have more than one episode that feels too similar
    • re-write this lesson to use dplyr, updating the tasks to a more modern syntax
    • replace this lesson with an episode on purrr

I am happy to work on a pull request that does any of those things, but I am also very new to the community (I'm submitting this issue as part of the instructor training check-out process!), so if this doesn't sound like a useful contribution, please let me know.

@jcoliver
Contributor

A good point, @AmeliaMN . This switch to dplyr is reflected in the suggested lesson plans included in the Instructor Notes. None of the plans include the plyr episode. I'm not sure that wholesale removal of the lesson is necessarily the direction to take, but updating the dplyr lesson with critical elements of the plyr lesson could be a useful endeavor. @naupaka ?

@jcoliver added the labels "status:need more info" and "type:discussion" on Jul 31, 2019

@naupaka
Member

naupaka commented Aug 2, 2019

This is a great issue submission and some good ideas about how to resolve it. Thanks, @AmeliaMN!

I would love to see option 3, replacing the plyr lesson with a purrr lesson. That would be a really nice contribution, and something that is currently not available in SWC or DC lessons as far as I know. I agree it is time to retire the plyr lesson. Perhaps there is a pasture where we can let it roam free until the end of its days?

@jcoliver what say you?

@jcoliver
Contributor

jcoliver commented Aug 2, 2019

A purrr lesson would be great (really, a "functional programming with purrr" lesson would be great).

A logistical question then: can we replace episode 12 with a purrr episode without breaking links to the plyr lesson (i.e. http://swcarpentry.github.io/r-novice-gapminder/12-plyr/index.html), orphaned as they may be?

Additionally, given the evolution of packages, future developments may warrant that episodes are problem-based rather than package-based. Who knows, maybe in 4 years we'll all be using plot.ly for publication-quality graphics...

@jpiaskowski
Contributor

I agree with the purrr suggestion (replace plyr with a purrr lesson). The downside to keeping the plyr lesson in the gapminder lesson set is that when people use this lesson set for their SWC checkout, they have to be prepared to teach the plyr section as well.

@AmeliaMN
Author

@jpiaskowski I'll admit I was a little terrified of the plyr episode being chosen for my checkout! Although, I think I could do 5 minutes of the lesson reasonably smoothly.

@jpiaskowski
Contributor

Challenge 5 of section 1 asks learners to install plyr. This is another place we should perhaps change to dplyr, although it is only an exercise in how to install packages.

@jcoliver
Contributor

Great points @jpiaskowski and @AmeliaMN . @naupaka and I are working on a strategy for retiring the plyr materials without destroying them entirely. I've submitted a PR to the training materials repo to indicate instructors in training should not be asked to teach the plyr lesson.

@kelseygonzalez

I was looking through the r-novice-gapminder lesson and also noticed the oddity of plyr still being included in the lessons. Glad to see that @AmeliaMN already brought this issue up! I agree that the logical transition would be to turn "12. Splitting and Combining Data Frames with plyr" into a "12. Functional programming with purrr" lesson.

I look forward to having this lesson available in the future, since it's a great tool but quite hard to teach. A great place to start the lesson development would be the R4DS chapters 21.5 and 21.7.

@jcoliver
Contributor

@kelseygonzalez indeed. Note that because there is so much material in these lessons, we have three different recommended combinations of episodes (http://swcarpentry.github.io/r-novice-gapminder/guide/) and none of them include the plyr lesson. It's been a year, so I suppose now would be a good time to get serious about plyr's retirement. :D

@blongworth
Contributor

Is a functional programming lesson being worked on somewhere? I agree that this would be a great addition. I very much like the approach that R4DS takes to this: start from looping, move through *apply(), and finish with purrr.
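
The progression described above (loop, then an *apply() function, then purrr) could be sketched roughly as follows. This is a minimal illustration, not material from R4DS or the lesson; the list `values` and the result names are made up for the example.

```r
library(purrr)

# Hypothetical input: a named list of numeric vectors
values <- list(a = 1:3, b = 4:6, c = 7:9)

# Step 1: an explicit for loop that fills a pre-allocated vector
means_loop <- vector("double", length(values))
for (i in seq_along(values)) {
  means_loop[[i]] <- mean(values[[i]])
}

# Step 2: the same computation with a base R apply function
means_vapply <- vapply(values, mean, numeric(1))

# Step 3: the same computation with purrr
means_map <- map_dbl(values, mean)
```

All three approaches compute the same means; the apply and purrr versions also keep the list names on the result.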

@jcoliver
Contributor

Not yet, @blongworth . We are working on the logistics of retiring the plyr lesson, but would be keen on a purrr replacement.

@mlell
Contributor

mlell commented Nov 6, 2020

While thinking about how to structure a possible lesson about purrr, the new features of dplyr 1.0.0 kept crossing my mind. Most importantly,

  • rowwise() is no longer discouraged, although its use is no longer with do(), but with ...
  • standard verbs, most importantly summarise(), which can now also handle tibbles and vectors of length > 1 as results.

At least for me, this eliminates the main use for purrr, which was to create models inside a map call, like in this R4DS chapter:

by_country <- by_country %>% 
  mutate(model = map(data, country_model))

I think that the new features make the following more straightforward:

by_country <- by_country %>%
  rowwise() %>%
  mutate(model = list(country_model(data)))

or even directly returning the model results as data frame:

by_country <- by_country %>%
  rowwise() %>%
  summarise(broom::tidy(country_model(data)))

... so I have the feeling that tabular data structures might not be a good example of purrr usage anymore. What do you think about that? If you agree, do you have any idea which data set to use to show functional programming with purrr?

P.S. On the other hand, purrr::map, lapply and so on are still very useful when parallelizing (-> future.apply, -> foreach), though I'm not sure whether this is in the scope of this tutorial.
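
To make the comparison concrete, here is a minimal self-contained sketch of the two styles, assuming dplyr 1.0.0 or later. It uses the built-in mtcars data rather than gapminder, and `fit_model` and `by_cyl` are hypothetical stand-ins for `country_model` and `by_country`.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for country_model(): one linear model per subset
fit_model <- function(df) lm(mpg ~ wt, data = df)

# One row per cylinder count, with the matching rows in a list-column `data`
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  nest() %>%
  ungroup()

# purrr style: map over the list-column
with_map <- by_cyl %>%
  mutate(model = map(data, fit_model))

# dplyr >= 1.0.0 style: rowwise() exposes each list element directly,
# so the result only needs to be wrapped in list()
with_rowwise <- by_cyl %>%
  rowwise() %>%
  mutate(model = list(fit_model(data))) %>%
  ungroup()
```

Both versions end up with one fitted model per row; which one reads better to novices is the open question raised above.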

@jcoliver
Contributor

jcoliver commented Nov 9, 2020

These are great points, @mlell . I think the place to start would be with a standard reverse instructional design: what do we want learners to be able to do at the end of the episode? We want to avoid teaching purrr for the sake of teaching purrr, so clearly defined learning objectives should guide the episode development process (as well as what data to use).

@blongworth
Contributor

In keeping with focussing on lesson goals/concepts rather than tools, one approach would be to include functional programming in the lesson on looping and add some of the dplyr 1.0 functional stuff to the dplyr chapter.

It would also be useful to tie looping, functional programming, and dataframe operations together with a thread on "behind the scenes" looping across various lessons. It's debatable whether novices need to know that any vectorized operation has looping happening down in some C code somewhere, but for some, knowing how this stuff works and how it's connected is useful for learning.
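
As a small illustration of the "behind the scenes" looping idea, the sketch below writes the same doubling operation once as an explicit loop and once as a vectorized expression. The vector `x` is made up for the example.

```r
# Hypothetical input vector
x <- c(1.5, 2.5, 4)

# Explicit loop: visit each element in turn
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorized: the element-by-element work still happens,
# but inside R's compiled internals rather than in R code
doubled_vec <- x * 2
```

The two results are identical; the vectorized form simply hides the iteration.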

@jcoliver
Contributor

For functional programming lesson(s), the material at https://github.com/dlab-berkeley/R-functional-programming provides some nice ideas (kudos to @kelseygonzalez for pointing out this resource).

zkamvar added a commit that referenced this issue Mar 9, 2021

@StoianAndrei

Hello everyone, I am new to the Carpentry community, and from what I have seen so far, I love it. Thank you. Now I would like as a check-out to give purrr a go tonight and tomorrow. I believe that purrr is a must as most of the things in life come in lists. I took down the repo and would try and see how to incorporate some Jenny C lego lessons into it. Or what do you think of Hadley's pepper jar example? Anyhow here goes nothing. Talk soon. Have a superb day.

@blongworth
Contributor

Hi Andrei, I'm partial to the pepper jar, but Jenny's stuff is great too. I really like the way that the purrr chapter in R4DS starts with iteration using loops and builds from there. Especially for novices, this would be a good way to segue into mapping functions, and I think the lesson that it's OK to use for loops is an important one that's sometimes lost with R. If you're working in a fork and could use a hand, let me know.

@StoianAndrei

@blongworth Thank you. Yes, I forked the repo to my account and am now trying to see how to introduce functional programming so that we can then use your_list <- purrr::map(.x = list_of_items, .f = ~your_function_that_is_to_be_applied_to_the_list_of_items(items_to_apply_on = .x)). This, combined with other things like nesting, will make your head hurt, but once grasped your work is so much more organized, optimized, all of the zed's. For this intro lesson I was thinking of just saying: you have your input, and you know, or have an idea, how your output should look. The bit in the middle is where your function lies. That is what you will build. I will try a first pass this week and submit it to be tendered nicely :) I love this type of collaboration, so yes please.
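
A runnable version of the input, function, output idea might look like the sketch below. The function `describe_items` and the contents of `list_of_items` are hypothetical placeholders chosen for the example.

```r
library(purrr)

# Input: a named list (hypothetical example data)
list_of_items <- list(first = c(1, 2, 3), second = c(10, 20))

# The bit in the middle: a function applied to each list element
describe_items <- function(items_to_apply_on) {
  c(n = length(items_to_apply_on), total = sum(items_to_apply_on))
}

# Output: a list with the same names, one result per input element
your_list <- map(
  .x = list_of_items,
  .f = ~ describe_items(items_to_apply_on = .x)
)
```

Because `map()` always returns a list of the same length as its input, learners can reason about the shape of the output before writing the function itself.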

@mlell
Contributor

mlell commented Nov 22, 2021

> Is a functional programming lesson being worked on somewhere?

@blongworth @StoianAndrei I just saw this R lesson in the making which takes a dplyr-first approach and also introduces purrr.

https://carpentries-incubator.github.io/open-science-with-r/07-making-functions-r/index.html

@StoianAndrei

Good morning all, that lesson looks so good. I did submit a lesson some time ago and it must have gotten lost. I never did follow up from there. (https://carpentries-incubator.github.io/open-science-with-r/07-making-functions-r/index.html) But this one does it justice. It is so good. I just followed how the plyr lesson was built. I can give you a copy. Let me know if you want to follow up. Thank you for picking the topic up. Real life got really busy for me. Best Regards.

@tobyhodges
Member

Bumping this old thread, as it's now five years since plyr was retired: was a consensus ever reached about whether this episode could be removed and, if not, what could be used to replace plyr to teach the relevant skills/concepts?

@milanmlft
Contributor

> Bumping this old thread, as it's now five years since plyr was retired: was a consensus ever reached about whether this episode could be removed and, if not, what could be used to replace plyr to teach the relevant skills/concepts?

Ran into this today while reviewing the material for a workshop. I think the plyr episode can simply be removed, as the concepts it discusses are covered by the group_by()/summarise() examples in the dplyr episode.
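
For reference, the kind of stratified summary in question can be sketched with group_by()/summarise() as below. The data frame `toy` is a made-up stand-in for the gapminder data, and the comment about plyr describes the general ddply() pattern rather than quoting the episode.

```r
library(dplyr)

# Hypothetical miniature of the gapminder data
toy <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  gdpPercap = c(1000, 3000, 20000, 30000)
)

# Split by continent, apply mean(), combine into one row per group.
# With plyr this would be a ddply() call grouped on continent.
mean_gdp <- toy %>%
  group_by(continent) %>%
  summarise(mean_gdpPercap = mean(gdpPercap))
```

The result has one row per continent, which is the split-apply-combine pattern expressed with dplyr verbs.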

Also, I think the title "Splitting and Combining Data Frames" of the plyr episode is misleading, because the episode is actually about calculating stratified summary statistics on a data set. I assume the reference to splitting and combining comes from the "split-apply-combine" problem, but it might be confused with actual splitting (as in separating) and combining (as in uniting) data frames.
