Add blog post on for vs apply #125

Merged
merged 15 commits into from Nov 2, 2023

Bisaloo
Member

@Bisaloo Bisaloo commented Oct 23, 2023

Fix #23

  • The post specifies a license if you don't want to use the default CC BY
  • All authors have an ORCID iD
  • Relevant keywords / tags have been added. In particular, if you want your post to be shared on R-bloggers, you must tag it with R
  • Images or other external resources have been committed and pushed
  • The post uses pure quarto syntax, rather than HTML or R code, unless necessary

Right before merging:

  • The date field has been updated
  • A PR has been opened in the blueprints to link to this post
  • The post has been re-rendered and content of the _freeze/ folder is up-to-date

@jpavlich
Member

@Bisaloo I agree with the arguments, which apply not only to apply() but also to most classic higher-order functions in any language with functional characteristics.

The only inconvenience I have found in other languages is that debugger support tended to be poorer than for other constructs of the language. For instance, debugging functional code in Java used to be a little harder, since the stack traces weren't that informative and the debugger sometimes had trouble diving into the lambdas/closures of the code.

Those problems have been solved in recent years, though. I wonder if something equivalent happens in R. If so, it would be important to mention it in the article.

Comment on lines 2 to 3
title: "Why use `apply()` instead of `for` loops?"
subtitle: "Going beyond the debunked performance argument."
Member

This was a really concise read and conveyed the message quite quickly.

My only comment is that the current title might undersell the article, as many people might have read articles with a similar title but different content. I would title it "Beyond the need for speed: Lesser-known reasons to prefer apply() functions over for loops" or something along those lines to draw out the content of the article. I was trying to play on words with "need for speed".

@TimTaylor

TimTaylor commented Oct 23, 2023

@Bisaloo - Would it be useful to touch upon where users should be cautious with the apply family? The main one I've run into in the past has to do with how the call gets evaluated (this is touched upon in the docs). As an example, if you start playing around with call-type things within lapply(), you could be surprised:

rincewind <- function(x) match.call()
rincewind(1L)
#> rincewind(x = 1L)
lapply(1L, rincewind)
#> [[1]]
#> FUN(x = X[[i]])
lapply(1L, function(x) rincewind(i))
#> [[1]]
#> rincewind(x = i)

This means lapply() cannot always be a drop in replacement for a for-loop without a little caution. Granted, it's not common, so not sure if you want to flag or not.

The other thing I wondered is whether it would be worth combining points (1) and (2) as they are both to do with "grokability" of the code. I could be persuaded either way here.

Small comment specific to the example in (1) - is it worth using a different example where there's no vectorised alternative (lengths())? I'm aware the example is there to highlight the point you are making, but it's common to come across code that uses loops / apply functions when there are vectorised alternatives, so I think it's best not to use an example that does this.
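
To make the concern about vectorised alternatives concrete, here is a minimal sketch. The list `x` is made-up data for illustration and is not taken from the post.

```r
# Hypothetical input: a list of vectors of varying length
x <- list(a = 1:3, b = letters, c = numeric(0))

# apply-family version: call length() on each element
n_apply <- vapply(x, length, integer(1))

# Vectorised alternative from base R: no explicit iteration needed
n_vec <- lengths(x)

# Both approaches give the same named integer vector
identical(n_apply, n_vec)
```

When a vectorised function such as lengths() exists, neither a for loop nor an apply call is needed, which is why an example built on it can distract from the loop-versus-apply comparison.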

Member

@pratikunterwegs pratikunterwegs left a comment

Thanks @Bisaloo for writing this, it's a comprehensive but concise read.

Perhaps some points that could be added since we are on the topic in this post already:

  1. Iterating over multiple list-like objects, as I find mapply(x_list, y_list, FUN = fun) easier to read with less visual noise than the equivalent for loop. Caveats about return types of course.
  2. Related to above, the case of nested for loops and nested functionals - I find the latter easier to parse as well.
  3. A bit more about using functional programming to replace difficult to read, error-prone or inefficient for loops; e.g. replacing loop based data merging with Reduce(), especially when intermediate products are also required. I find it's easy to make mistakes in indexing in such cases.
  4. A caveat on not getting carried away looking for a 'clever' functional-based solution where a for loop would suffice - I find myself spending extra time thinking about these and recommending them to others even when the case for them is weak.
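
Points 1 and 3 above could be sketched as follows. This is only an illustration: `x_list`, `y_list`, `fun` and `tables` are hypothetical names and data, not code from the post.

```r
# Hypothetical inputs for illustration
x_list <- list(1, 2, 3)
y_list <- list(10, 20, 30)
fun <- function(x, y) x + y

# for loop: pre-allocate the output and index both lists by hand
res_loop <- vector("list", length(x_list))
for (i in seq_along(x_list)) {
  res_loop[[i]] <- fun(x_list[[i]], y_list[[i]])
}

# Functional version: Map() always returns a list
res_map <- Map(fun, x_list, y_list)

# mapply() simplifies its result by default (the return type caveat)
res_mapply <- mapply(x_list, y_list, FUN = fun)

# Merging several data frames without manual indexing
tables <- list(
  data.frame(id = 1:3, a = c(1, 2, 3)),
  data.frame(id = 1:3, b = c(4, 5, 6)),
  data.frame(id = 1:3, c = c(7, 8, 9))
)
merge_by_id <- function(x, y) merge(x, y, by = "id")

# Final merged table only
merged <- Reduce(merge_by_id, tables)

# accumulate = TRUE also keeps every intermediate product
steps <- Reduce(merge_by_id, tables, accumulate = TRUE)
```

Here `res_mapply` is a numeric vector while `res_map` is a list, and `steps` holds one data frame per merging stage, so the intermediate products are available without any index bookkeeping.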

@Bisaloo
Member Author

Bisaloo commented Oct 26, 2023

Thanks all for the great comments!

@jpavlich

also for most classic higher-order functions in any language with functional characteristics.

I considered adding a note about other languages but opted not to because I'm not familiar enough with them to make claims with assurance (except for Python). Would you like to propose some text for Java?
I have added a note about Python in 9cb16d4.

The only inconvenience I have found in other languages is that debugger support tended to be poorer than for other constructs of the language.

Yes, I think this is true in R, for reasons related to what @TimTaylor mentioned. This is already visible in error messages, even before using debugging features:

f <- function(x) {
  if (x == 10) {
    stop("x cannot be 10")
  }
  return(TRUE)
}

lapply(1:20, f)
#> Error in FUN(X[[i]], ...): x cannot be 10

Created on 2023-10-26 with reprex v2.0.2
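
For comparison, here is a rough sketch that captures the call recorded in the error for both styles. It reuses `f` from above in a shortened form; `err_lapply` and `err_loop` are names introduced only for this illustration.

```r
f <- function(x) {
  if (x == 10) {
    stop("x cannot be 10")
  }
  TRUE
}

# Capture the error condition instead of stopping
err_lapply <- tryCatch(lapply(1:20, f), error = function(e) e)
err_loop <- tryCatch(
  for (i in 1:20) f(i),
  error = function(e) e
)

# The message is identical, but the recorded call differs:
# lapply() reports its internal FUN(X[[i]], ...) placeholder,
# while the loop reports the call as written by the user
call_lapply <- deparse(conditionCall(err_lapply))
call_loop <- deparse(conditionCall(err_loop))
```

In the loop version the recorded call is `f(i)`, which names the user's function directly, whereas the lapply() version only shows the generic `FUN(X[[i]], ...)`.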

@jamesmbaazam

Thanks, I like that idea! I am proposing to split it into title + subtitle to keep it short. I see two options:

title: "Beyond the need for speed: `apply()` vs for loops"
subtitle: "Lesser-known reasons to prefer apply() functions over for loops"

or

title: "Lesser-known reasons to prefer `apply()` over for loops"
subtitle: "Beyond the need for speed 🏎️"

Which one is better in your opinion?

@TimTaylor

Would it be useful to touch upon where users should be cautious with the apply family? The main one I've run into in the past has to do with how the call gets evaluated (this is touched upon in the docs).

This is a good point but I wonder if that wouldn't be too much info to add for a quite niche & advanced case. I would propose to keep it either as a footnote, or as a comment on a post after publication. How does this sound?

The other thing I wondered is whether it would be worth combining points (1) and (2) as they are both to do with "grokability" of the code. I could be persuaded either way here.

Yes, they both relate to grokability but I believe they are conceptually quite different. The first relates to the comprehension of programming concepts, while the second has to do with the practical aspect of mental load. I really want to drive the point home for both and believe they carry more weight as distinct categories.

Small comment specific to the example in (1) - is it worth using a different example where there's no vectorised alternative (lengths())? I'm aware the example is there to highlight the point you are making, but it's common to come across code that uses loops / apply functions when there are vectorised alternatives, so I think it's best not to use an example that does this.

I have split it into two examples in 46a4b53: one more realistic without a vectorized alternative, and just a quick note that lintr can suggest vectorized alternatives for some apply() calls.

@pratikunterwegs

  1. Iterating over multiple list-like objects, as I find mapply(x_list, y_list, FUN = fun) easier to read with less visual noise than the equivalent for loop. Caveats about return types of course.
  2. Related to above, the case of nested for loops and nested functionals - I find the latter easier to parse as well.

Thanks for the note. I have added examples in 563b4ed to illustrate better the benefit of apply() in complex cases.

  3. A bit more about using functional programming to replace difficult to read, error-prone or inefficient for loops; e.g. replacing loop based data merging with Reduce(), especially when intermediate products are also required. I find it's easy to make mistakes in indexing in such cases.

This is a good suggestion but outside the scope of this post IMO, since Reduce() is recursion. I have added a link to the Advanced R book, which covers all of this in more detail, in 9739b3c.

  4. A caveat on not getting carried away looking for a 'clever' functional-based solution where a for loop would suffice - I find myself spending extra time thinking about these and recommending them to others even when the case for them is weak.

I disagree with this one, especially in the case of package development. While it may indeed take slightly longer sometimes (especially if one is not used to using these tools), the improvement in maintainability over the years quickly compensates for this initial investment.

@TimTaylor

This is a good point but I wonder if that wouldn't be too much info to add for a quite niche & advanced case. I would propose to keep it either as a footnote, or as a comment on a post after publication. How does this sound?

A footnote sounds good and could refer to the lapply() documentation. This way it's clearly acknowledged as a niche concern rather than an afterthought in comments and encourages further reading.

One other thing I was thinking about was functions that may (or are more likely to) error. This can be important for longer running functions with lapply() being all or nothing. Things like purrr::safely() and other condition handling approaches can be of use here so could be worth mentioning??
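
As a rough base R sketch of that idea, each call can be wrapped in tryCatch() so that one failure does not discard the other results. The functions `risky` and `safe_risky` are hypothetical and exist only for this illustration; the returned result/error pair mirrors the shape that purrr::safely() produces.

```r
# Hypothetical function that fails for one input
risky <- function(x) {
  if (x == 3) {
    stop("x cannot be 3")
  }
  x * 2
}

# Wrap each call: return the result on success, the message on failure
safe_risky <- function(x) {
  tryCatch(
    list(result = risky(x), error = NULL),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}

# lapply() now completes even though one element errors
out <- lapply(1:5, safe_risky)

# Identify which inputs failed
failed <- !vapply(out, function(o) is.null(o$error), logical(1))
which(failed)
```

The successful results remain available in `out`, and only the failing inputs need to be rerun or investigated.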

@jamesmbaazam
Member

Which one is better in your opinion?

Hmmm I'm torn. I really like the first but it seems too long when you combine the title and subtitle. The second is more concise and makes me wonder if we need the subtitle.

@Bisaloo
Member Author

Bisaloo commented Nov 2, 2023

One other thing I was thinking about was functions that may (or are more likely to) error. This can be important for longer running functions with lapply() being all or nothing. Things like purrr::safely() and other condition handling approaches can be of use here so could be worth mentioning??

This sounds like a great topic for a follow-up post 😉

Hmmm I'm torn. I really like the first but it seems too long when you combine the title and subtitle. The second is more concise and makes me wonder if we need the subtitle.

Thanks for the suggestions! Following your feedback, I have removed the subtitle. Puns and references are nice but in this specific case, I was afraid it would give the impression that speed is indeed a component in the equation.

@Bisaloo Bisaloo merged commit 9028eef into main Nov 2, 2023
3 checks passed
@Bisaloo Bisaloo deleted the for-vs-apply branch November 2, 2023 15:21
Development

Successfully merging this pull request may close these issues.

for loops vs apply()
5 participants