Consider dplyr::count? #2

Closed
apreshill opened this issue Sep 22, 2020 · 5 comments
Labels
enhancement New feature or request

Comments

@apreshill

Hi @spcanelon!

I love these slides so much, and thank you for your wonderful treatment of our dear penguins 🐧

As I was working through your slides, I noticed a great opportunity for teaching dplyr::count() and wanted to suggest it to you as a possible addition for your slides on data wrangling. Specifically, where you show:

penguins %>% 
  group_by(species, sex) %>%
  summarize(count = n())

You could instead show:

penguins %>% 
  count(species, sex)

Then, to show the group/summarize pattern, you could group by species and sex and use min(flipper_length_mm), for example, or max()/mean()/median(), etc., instead of n().
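A minimal sketch of that group/summarize pattern, assuming dplyr 1.0 or later and the palmerpenguins `penguins` data (where the flipper column is `flipper_length_mm`). The summary column names are illustrative, not from the slides:

```r
library(dplyr)
library(palmerpenguins)

# Summarize a measurement per group instead of just counting rows.
# na.rm = TRUE skips penguins whose flipper length was not recorded.
flipper_summary <- penguins %>%
  group_by(species, sex) %>%
  summarize(
    min_flipper    = min(flipper_length_mm, na.rm = TRUE),
    median_flipper = median(flipper_length_mm, na.rm = TRUE),
    mean_flipper   = mean(flipper_length_mm, na.rm = TRUE),
    .groups = "drop"  # return an ungrouped tibble
  )

flipper_summary
```

This gives one row per species/sex combination, the same groups count(species, sex) produces, but with a statistic other than the row count.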

For mutate, you currently have this:

penguins %>% 
  group_by(species) %>%
  mutate(count_species = n()) %>%
  ungroup() %>%
  group_by(species, sex, count_species) %>%
  summarize(count = n()) %>%
  mutate(prop = count/count_species*100)

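which would be another great opportunity to simplify with count(), this time using add_count() with the wt argument! (Because the counts already live in a column called n, add_count() stores the per-species totals in a new column called nn.)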

penguins %>% 
  count(species, sex) %>%
  add_count(species, wt = n) %>%
  mutate(prop = n/nn*100)
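To see what add_count(species, wt = n) is doing, here is a sketch (assuming dplyr 1.0 or later and the palmerpenguins data) that compares it with the longer group_by() + mutate(sum()) form. The column name "denom" is illustrative and is passed via the name argument, so the result doesn't depend on the automatic nn naming:

```r
library(dplyr)
library(palmerpenguins)

species_sex <- penguins %>% count(species, sex)

# add_count() with wt = n sums the existing per-(species, sex) counts
# within each species, keeping every row.
via_add_count <- species_sex %>%
  add_count(species, wt = n, name = "denom")

# The same denominator written out by hand.
via_group_sum <- species_sex %>%
  group_by(species) %>%
  mutate(denom = sum(n)) %>%
  ungroup()

# Both approaches yield the same per-species totals.
stopifnot(identical(via_add_count$denom, via_group_sum$denom))

with_prop <- via_add_count %>% mutate(prop = n / denom * 100)
with_prop
```

Within each species, the prop values should add up to 100.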

Anyway, hope you don't mind the suggestion. I think count() and friends can help make series of pipes like this more readable. But again, great job and thank you for sharing 🤩
Alison

@spcanelon
Owner

Hi @apreshill ! Thank you for your kind and encouraging words.

I love these suggestions, thank you for taking the time to write them up!

I absolutely agree that count() and friends can help make the code more readable -- it's very satisfying to see the pipeline collapse to half the size!

Reading these thoughtful suggestions was a nice learning opportunity for me and I'm sure others would appreciate having more options as well. I'll be updating the tutorial and the slides to include these additions! If it sounds good to you I'll link back to you and this issue in both 😃

@spcanelon spcanelon self-assigned this Sep 22, 2020
@spcanelon spcanelon added the enhancement New feature or request label Sep 22, 2020
@apreshill
Author

Yay, thanks for being so open to this idea! I'm excited to see v2 🥰

For what it's worth, I typically teach count() after group_by()/summarize(), with a transition like "many times you just want to count things, like the number of rows per group, such as species. This is so common that there is a special function for it." I also suggest learners think of it as count_by(), which I wish had been the function name 😉, since it counts the number of rows BY some other variable.
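The "count_by()" reading can be checked directly. The dplyr documentation describes count() as roughly equivalent to grouping and then summarizing with n(). Here is a minimal sketch, assuming dplyr 1.0 or later and the palmerpenguins data:

```r
library(dplyr)
library(palmerpenguins)

# count() reads as "count rows BY species and sex"...
by_count <- penguins %>%
  count(species, sex)

# ...and corresponds to grouping, counting rows with n(), and ungrouping.
by_summarize <- penguins %>%
  group_by(species, sex) %>%
  summarize(n = n(), .groups = "drop")

# Compare the contents as plain data frames.
stopifnot(isTRUE(all.equal(as.data.frame(by_count), as.data.frame(by_summarize))))

by_count
```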

@apreshill
Author

Ooh also (sorry, I'm excited!), they added a nice name argument a while back, so you can do:

penguins %>%
  count(species, sex) %>%
  add_count(species, wt = n, name = "denom") %>%
  mutate(prop = n/denom*100)

For example. Anyway, thanks for letting me nerd out on counting all the things with you! 🤓

@spcanelon
Owner

Haha, I'm here for it 🤓

Thanks for that teaching tip! I think your transition is a nice way to highlight how count() can simplify your process in a subset of cases. And a heavy +1 for count_by() 😄, I find it more intuitive.

Bonus points for sharing the name argument! That's neat and useful.

I'm so used to the group_by() + summarize() combo because I usually use n_distinct() to count two different kinds of observations within the same grouping. In my case, I count distinct patient identifiers to get the number of patients, and distinct pregnancy identifiers to get the number of deliveries. All that to say, this conversation has been such a great reminder to step out of my routine!
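That n_distinct() pattern can be sketched as follows, assuming dplyr 1.0 or later. The data frame, its column names (hospital, patient_id, pregnancy_id) and its values are all hypothetical, made up only to illustrate the idea. Here each row is one infant, so twins share a pregnancy_id:

```r
library(dplyr)

# Hypothetical delivery records: one row per infant, so twins share a
# pregnancy_id. Column names and values are invented for illustration.
deliveries <- data.frame(
  hospital     = c("A", "A", "A", "B", "B"),
  patient_id   = c(1, 1, 2, 3, 3),
  pregnancy_id = c(10, 11, 12, 13, 13)
)

# Two different distinct counts within the same grouping. count() alone
# only counts rows, so it can't produce both of these at once.
hospital_summary <- deliveries %>%
  group_by(hospital) %>%
  summarize(
    n_rows       = n(),
    n_patients   = n_distinct(patient_id),
    n_deliveries = n_distinct(pregnancy_id),
    .groups = "drop"
  )

hospital_summary
```

In this toy data, hospital A has 3 rows, 2 patients and 3 deliveries. Hospital B's twins give 2 rows but only 1 patient and 1 delivery, which is why n() and n_distinct() answer different questions.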

@spcanelon
Owner

Done!

  • Updated slides can be viewed here, and corresponding parent and child docs can be found here
  • An updated tutorial Rmd file can be found here

Thanks again for the suggestions and for giving me the opportunity to make my first reference to an issue 🚀

spcanelon pushed a commit that referenced this issue Sep 23, 2020
spcanelon pushed a commit that referenced this issue Sep 23, 2020
Projects
None yet
Development

No branches or pull requests

2 participants