Remove plyr dependency #540

Open
AmeliaMN opened this issue Jul 19, 2019 · 18 comments

@AmeliaMN AmeliaMN commented Jul 19, 2019

From the plyr GitHub repository: "plyr is retired: this means only changes necessary to keep it on CRAN will be made. We recommend using dplyr (for data frames) or purrr (for lists) instead."

I think it would be good practice to remove plyr from this lesson. Looking at the files in this repo, I can see that the main place plyr is shown is in 12-plyr.Rmd, with a few references in other places. The tasks in the plyr episode are similar to the ones done in 13-dplyr.Rmd, and I think the tasks in 13-dplyr.Rmd are actually simpler than the ones in 12-plyr.Rmd.

With that in mind, I propose the following changes:

  • move 13-dplyr.Rmd up to be earlier in the sequence
  • do something with 12-plyr.Rmd. I could imagine a few different possibilities:
    • remove this lesson altogether, to not have more than one episode that feels too similar
    • re-write this lesson to use dplyr, updating the tasks to a more modern syntax
    • replace this lesson with an episode on purrr

I am happy to work on a pull request that does any of those things, but I am also very new to the community (I'm submitting this issue as part of the instructor training check-out process!), so if this doesn't sound like a useful contribution, please let me know.

Contributor

@jcoliver jcoliver commented Jul 31, 2019

A good point, @AmeliaMN . This switch to dplyr is reflected in the suggested lesson plans included in the Instructor Notes. None of the plans include the plyr episode. I'm not sure that wholesale removal of the lesson is necessarily the direction to take, but updating the dplyr lesson with critical elements of the plyr lesson could be a useful endeavor. @naupaka ?

Member

@naupaka naupaka commented Aug 2, 2019

This is a great issue submission and some good ideas about how to resolve it. Thanks, @AmeliaMN!

I would love to see option 3, replacing the plyr lesson with a purrr lesson. That would be a really nice contribution, and something that is currently not available in SWC or DC lessons as far as I know. I agree it is time to retire the plyr lesson. Perhaps there is a pasture where we can let it roam free until the end of its days?

@jcoliver what say you?

Contributor

@jcoliver jcoliver commented Aug 2, 2019

A purrr lesson would be great (really, a "functional programming with purrr" lesson would be great).

A logistical question then: can we replace episode 12 with a purrr episode without breaking links to the plyr lesson (i.e. http://swcarpentry.github.io/r-novice-gapminder/12-plyr/index.html), orphaned as they may be?

Additionally, given the evolution of packages, future developments may warrant that episodes are problem-based rather than package-based. Who knows, maybe in 4 years we'll all be using plot.ly for publication-quality graphics...

Contributor

@jpiaskowski jpiaskowski commented Aug 12, 2019

I agree with the purrr suggestion (replace plyr with a purrr lesson). The downside of keeping the plyr lesson in the gapminder lesson set is that people who use this lesson set for their SWC checkout have to be prepared to teach the plyr section as well.

Author

@AmeliaMN AmeliaMN commented Aug 12, 2019

@jpiaskowski I'll admit I was a little terrified of the plyr episode being chosen for my checkout! Although, I think I could do 5 minutes of the lesson reasonably smoothly.

Contributor

@jpiaskowski jpiaskowski commented Aug 12, 2019

Challenge 5 of section 1 asks learners to install plyr - another area maybe we should change to "dplyr", although this is only an exercise in how to install packages.

Contributor

@jcoliver jcoliver commented Aug 14, 2019

Great points @jpiaskowski and @AmeliaMN . @naupaka and I are working on a strategy for retiring the plyr materials without destroying them entirely. I've submitted a PR to the training materials repo to indicate instructors in training should not be asked to teach the plyr lesson.


@kelseygonzalez kelseygonzalez commented Sep 17, 2020

I was looking through the r-novice-gapminder lesson and was also surprised that plyr is still included. Glad to see that @AmeliaMN already brought this issue up! I agree that the logical transition would be to turn "12. Splitting and Combining Data Frames with plyr" into "12. Functional Programming with purrr".

I look forward to having this lesson available in the future, since it's a great tool but quite hard to teach. A great place to start the lesson development would be the R4DS chapters 21.5 and 21.7.

Contributor

@jcoliver jcoliver commented Sep 17, 2020

@kelseygonzalez indeed. Note that because there is so much material in these lessons, we have three different recommended combinations of episodes (http://swcarpentry.github.io/r-novice-gapminder/guide/) and none of them include the plyr lesson. It's been a year, so I suppose now would be a good time to get serious about plyr's retirement. :D

Contributor

@blongworth blongworth commented Oct 19, 2020

Is a functional programming lesson being worked on somewhere? I agree that this would be a great addition. I very much like the approach that R4DS takes to this: start from looping, move through *apply(), and finish with purrr.
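
For anyone following the thread, here is a minimal sketch of that progression (loop, then the apply family, then purrr). It is only an illustration, not lesson material: the choice of `mtcars` columns and the variable names are assumptions, and it needs the purrr package installed.

```r
# Three ways to compute the mean of each column.
# mtcars ships with base R; the column selection is arbitrary.
df <- mtcars[, c("mpg", "hp", "wt")]

# 1. An explicit for loop with a pre-allocated, named result vector.
means_loop <- numeric(ncol(df))
names(means_loop) <- names(df)
for (i in seq_along(df)) {
  means_loop[i] <- mean(df[[i]])
}

# 2. Base R apply family: vapply() also checks that each result
#    is a single number.
means_vapply <- vapply(df, mean, FUN.VALUE = numeric(1))

# 3. purrr: map_dbl() returns a named double vector.
means_purrr <- purrr::map_dbl(df, mean)
```

All three produce the same named numeric vector, which makes it easy to show that a mapping function is a loop with the bookkeeping removed.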

Contributor

@jcoliver jcoliver commented Oct 19, 2020

Not yet, @blongworth . We are working on the logistics of retiring the plyr lesson, but would be keen on a purrr replacement.

Contributor

@mlell mlell commented Nov 6, 2020

While thinking about how to structure a possible lesson about purrr, the new features of dplyr 1.0.0 kept crossing my mind. Most importantly,

  • rowwise() is no longer discouraged, although it is no longer used with do(), but with ...
  • standard verbs, most importantly summarise(), which can now also return tibbles and vectors of length > 1 as results.

At least for me, this eliminates the main use for purrr, which was to create models inside a map call, like in this R4DS chapter:

by_country <- by_country %>% 
  mutate(model = map(data, country_model))

I think the new features make the following more straightforward:

by_country <- by_country %>%
  rowwise() %>%
  mutate(model = list(country_model(data)))

or even directly returning the model results as a data frame:

by_country <- by_country %>%
  rowwise() %>%
  summarise(broom::tidy(country_model(data)))

... so I have the feeling that tabular data structures might not be a good example of purrr usage anymore. What do you think about that? If you agree, do you have any idea which data set to use to show functional programming with purrr?

P.S. On the other hand, purrr::map, lapply and so on are still very useful when parallelizing (see future.apply and foreach), though I'm not sure whether this is in the scope of this tutorial.
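
The comparison between map() and rowwise() above can be tried on built-in data. This is a rough sketch under stated assumptions: `cyl_model()` is a hypothetical stand-in for R4DS's `country_model()`, `mtcars` replaces the gapminder data, and it requires dplyr (1.0.0 or later), tidyr (1.0.0 or later) and purrr.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical stand-in for country_model(): one linear model per group.
cyl_model <- function(df) lm(mpg ~ wt, data = df)

# One row per cylinder count, with the remaining columns in a list-column.
by_cyl <- mtcars %>%
  nest(data = -cyl)

# purrr style: map() over the list-column.
with_map <- by_cyl %>%
  mutate(model = map(data, cyl_model))

# dplyr >= 1.0.0 style: rowwise() makes `data` refer to one data frame
# at a time, so the model only needs wrapping in list().
with_rowwise <- by_cyl %>%
  rowwise() %>%
  mutate(model = list(cyl_model(data)))
```

Both versions end up with a `model` list-column holding the same fitted models; the difference is mainly in which idiom learners have to read.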

Contributor

@jcoliver jcoliver commented Nov 9, 2020

These are great points, @mlell . I think the place to start would be with a standard reverse instructional design: what do we want learners to be able to do at the end of the episode? We want to avoid teaching purrr for the sake of teaching purrr, so clearly defined learning objectives should guide the episode development process (as well as what data to use).

Contributor

@blongworth blongworth commented Nov 10, 2020

In keeping with focussing on lesson goals/concepts rather than tools, one approach would be to include functional programming in the lesson on looping and add some of the dplyr 1.0 functional stuff to the dplyr chapter.

It would also be useful to tie looping, functional programming, and dataframe operations together with a thread on "behind the scenes" looping across various lessons. It's debatable whether novices need to know that any vectorized operation has looping happening down in some C code somewhere, but for some, knowing how this stuff works and how it's connected is useful for learning.
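
A small base R sketch of that "behind the scenes" idea, in case it helps the discussion. The vector and variable names are made up for illustration.

```r
x <- c(1.5, 2.0, 3.25)

# Explicit loop: visit each element and store the result.
doubled_loop <- numeric(length(x))
for (i in seq_along(x)) {
  doubled_loop[i] <- x[i] * 2
}

# Vectorized form: the same element-by-element work, but the
# iteration happens inside R's compiled code rather than in R.
doubled_vec <- x * 2

same_result <- identical(doubled_loop, doubled_vec)
```

Showing the two side by side lets learners see that the vectorized call is the same operation with the iteration hidden.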

Contributor

@jcoliver jcoliver commented Nov 10, 2020

For functional programming lesson(s), the material at https://github.com/dlab-berkeley/R-functional-programming provides some nice ideas (kudos to @kelseygonzalez for pointing out this resource).

zkamvar added a commit that referenced this issue Mar 9, 2021
Co-authored-by: Zhian N. Kamvar <zkamvar@gmail.com>

@StoianAndrei StoianAndrei commented Apr 17, 2021

Hello everyone, I am new to the Carpentry community, and from what I have seen so far, I love it. Thank you. For my check-out, I would like to give purrr a go tonight and tomorrow. I believe that purrr is a must, as most of the things in life come in lists. I have cloned the repo and will try to see how to incorporate some Jenny C lego lessons into it. Or what do you think of Hadley's pepper jar example? Anyhow, here goes nothing. Talk soon. Have a superb day.

Contributor

@blongworth blongworth commented Apr 19, 2021

Hi Andrei, I'm partial to the pepper jar, but Jenny's stuff is great too. I really like the way that the purrr chapter in R4DS starts with iteration using loops and builds from there. Especially for novices, this would be a good way to segue into mapping functions, and I think the lesson that it's OK to use for loops is an important one that's sometimes lost with R. If you're working in a fork and could use a hand, let me know.


@StoianAndrei StoianAndrei commented Apr 19, 2021

@blongworth Thank you. Yes, I made a fork in my account and am now trying to see how to introduce functional programming so that we can then use your_list <- purrr::map(.x = list_of_items, .f = ~your_function_that_is_to_be_applied_to_the_list_of_items(items_to_apply_on = .x)). This, combined with other things like nesting, will make your head hurt, but once grasped your work is so much more organized, optimized, all of the zed's. For this intro lesson I was thinking of just saying that you have your input and you know, or have an idea, how your output should look. The bit in the middle is where your function lies. That is what you will build. I will try a first pass this week and submit it to be tendered nicely :) I love this type of collaboration, so yes please.
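
A runnable version of that input / function / output pattern might look like the sketch below. The list contents and `summarise_items()` are hypothetical placeholders, and purrr needs to be installed.

```r
# Input: a named list (hypothetical example data).
list_of_items <- list(a = 1:3, b = 4:6)

# The bit in the middle: the function you build (hypothetical helper).
summarise_items <- function(items_to_apply_on) {
  sum(items_to_apply_on)
}

# Output: map() applies the function to each element and always
# returns a list with the same names as the input.
your_list <- purrr::map(
  .x = list_of_items,
  .f = ~ summarise_items(items_to_apply_on = .x)
)
```

Here `your_list` is a list with elements `a` and `b`, one result per input element, which matches the "know what your output should look like" framing.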
