Add blog post on for vs apply #125

Bisaloo · 2023-10-23T14:17:41Z

The post specifies a license if you don't want to use the default CC BY
All authors have an ORCID iD
Relevant keywords / tags have been added. In particular, if you want your post to be shared on R-bloggers, you must tag it with R
Images or other external resources have been committed and pushed
The post uses pure quarto syntax, rather than HTML or R code, unless necessary

Right before merging:

The date field has been updated
A PR has been opened in the blueprints to link to this post
The post has been re-rendered and content of the _freeze/ folder is up-to-date

jpavlich · 2023-10-23T14:33:45Z

@Bisaloo I agree with the arguments, which apply not only to apply(), but also to most classic higher-order functions in any language with functional characteristics.

The only inconvenience I have found in other languages is that debugger support tended to be poorer than for other constructs of the language. For instance, debugging functional code in Java used to be a little harder, since the stack traces weren't that informative and the debugger sometimes had trouble diving into the lambdas/closures of the code.

Those problems have been solved in recent years, though. I wonder whether something equivalent happens in R. If so, it would be important to mention it in the article.

jamesmbaazam · 2023-10-23T14:57:46Z

posts/for-vs-apply/index.qmd

+title: "Why use `apply()` instead of `for` loops?"
+subtitle: "Going beyond the debunked performance argument."


This was a really concise read and conveyed the message quite quickly.

My only comment is that the current title might undersell the article, as many people might have read articles with a similar title but different content. I would title it "Beyond the need for speed: Lesser-known reasons to prefer apply() functions over for loops" or something along those lines to draw out the content of the article. I was trying to play on words with "need for speed".

TimTaylor · 2023-10-23T20:00:26Z

@Bisaloo - Would it be useful to touch upon where users should be cautious with the apply family? The main one I've run into in the past has to do with how the call gets evaluated (this is touched upon in the docs). As an example, if you start trying to play around with call type things within lapply() you could be surprised:

rincewind <- function(x) match.call()
rincewind(1L)
#> rincewind(x = 1L)
lapply(1L, rincewind)
#> [[1]]
#> FUN(x = X[[i]])
lapply(1L, function(x) rincewind(i))
#> [[1]]
#> rincewind(x = i)

This means lapply() cannot always be a drop in replacement for a for-loop without a little caution. Granted, it's not common, so not sure if you want to flag or not.

The other thing I wondered is whether it would be worth combining points (1) and (2) as they are both to do with "grokability" of the code. I could be persuaded either way here.

Small comment specific to the example in (1) - is it worth doing a different example where there's not a vectorised alternative (lengths())? I'm aware the example is there to highlight the point you are making, but it's common to come across code that uses loops / apply functions when there are vectorised alternatives, so I think it's best not to use an example that does this.

pratikunterwegs

Thanks @Bisaloo for writing this, it's a comprehensive but concise read.

Perhaps some points that could be added since we are on the topic in this post already:

Iterating over multiple list-like objects, as I find mapply(x_list, y_list, FUN = fun) easier to read with less visual noise than the equivalent for loop. Caveats about return types of course.
Related to above, the case of nested for loops and nested functionals - I find the latter easier to parse as well.
A bit more about using functional programming to replace difficult to read, error-prone or inefficient for loops; e.g. replacing loop based data merging with Reduce(), especially when intermediate products are also required. I find it's easy to make mistakes in indexing in such cases.
A caveat on not getting carried away looking for a 'clever' functional-based solution where a for loop would suffice - I find myself spending extra time thinking about these and recommending them to others even when the case for them is weak.
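
To make the first and third points concrete, here is a minimal sketch. The inputs (`x_list`, `y_list`, `tables`) are hypothetical and not taken from the post. It compares a for loop with mapply() over two lists, then uses Reduce() with accumulate = TRUE to keep the intermediate merges.

```r
# Hypothetical inputs, for illustration only
x_list <- list(1:3, 4:6)
y_list <- list(10, 100)

# for loop: preallocate, index both lists, assign
res_loop <- vector("list", length(x_list))
for (i in seq_along(x_list)) {
  res_loop[[i]] <- x_list[[i]] * y_list[[i]]
}

# mapply(): SIMPLIFY = FALSE keeps the return type a list (the return type caveat)
res_mapply <- mapply(x_list, y_list, FUN = function(x, y) x * y, SIMPLIFY = FALSE)

identical(res_loop, res_mapply)

# Successive merges where the intermediate products are also needed
tables <- list(
  data.frame(id = 1:3, a = c("x", "y", "z")),
  data.frame(id = 2:3, b = c(TRUE, FALSE)),
  data.frame(id = 3L, c = 0.5)
)
steps <- Reduce(
  function(left, right) merge(left, right, by = "id"),
  tables,
  accumulate = TRUE,
  simplify = FALSE
)

# One data frame per merge step, with no manual indexing
step_rows <- vapply(steps, nrow, integer(1))
```

steps[[1]] is the first table unchanged, and each later element is the merge of everything up to that point.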

Bisaloo · 2023-10-26T09:58:46Z

Thanks all for the great comments!

@jpavlich

> also for most classic higher-order functions in any language with functional characteristics.

I considered adding a note about other languages but opted not to because I'm not familiar enough with them to claim much with assurance (except for Python). Would you like to propose some text for Java?
I have added a note about Python in 9cb16d4.

> The only inconvenience I have found in other languages is that debugger support tended to be poorer than for other constructs of the language.

Yes, I think this is true in R, for reasons related to what @TimTaylor mentioned. This is already visible in error messages, even before using debugging features:

f <- function(x) {
  if (x == 10) {
    stop("x cannot be 10")
  }
  return(TRUE)
}

lapply(1:20, f)
#> Error in FUN(X[[i]], ...): x cannot be 10

Created on 2023-10-26 with reprex v2.0.2
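
For comparison, a minimal sketch of the call attached to the error in each case. The `call_of()` helper is hypothetical, introduced here only to capture that call. It suggests that the equivalent for loop, or an lapply() call wrapped in an anonymous function, reports `f(i)` rather than `FUN(X[[i]], ...)`.

```r
f <- function(x) {
  if (x == 10) {
    stop("x cannot be 10")
  }
  return(TRUE)
}

# Hypothetical helper: evaluate an expression and return the call
# attached to the error it raises, as a string
call_of <- function(expr) {
  tryCatch(expr, error = function(e) deparse(conditionCall(e)))
}

lapply_call <- call_of(lapply(1:20, f))
loop_call <- call_of(for (i in 1:20) f(i))
wrapped_call <- call_of(lapply(1:20, function(i) f(i)))

lapply_call
#> [1] "FUN(X[[i]], ...)"
loop_call
#> [1] "f(i)"
wrapped_call
#> [1] "f(i)"
```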

@jamesmbaazam

Thanks, I like that idea! I am proposing to split it into title + subtitle to keep it short. I see two options:

title: "Beyond the need for speed: `apply()` vs for loops"
subtitle: "Lesser-known reasons to prefer apply() functions over for loops"

or

title: "Lesser-known reasons to prefer `apply()` over for loops"
subtitle: "Beyond the need for speed 🏎️"

Which one is better in your opinion?

@TimTaylor

> Would it be useful to touch upon where users should be cautious with apply family? The main one I've run in to in the past is something to do with how the call gets evaluated (this is touched upon in the docs).

This is a good point but I wonder if that wouldn't be too much info to add for a quite niche & advanced case. I would propose to keep it either as a footnote, or as a comment on a post after publication. How does this sound?

> The other thing I wondered is whether it would be worth combining points (1) and (2) as they are both to do with "grokability" of the code. I could be persuaded either way here.

Yes, they both relate to grokability but I believe they are conceptually quite different. The first relates to the comprehension of programming concepts, while the second has to do with the practical aspect of mental load. I really want to drive the point home for both and believe they each carry more weight as distinct categories.

> Small comment specific to the example in (1) - is it worth doing a different example where there's not a vectorised alternative (lengths())? I'm aware the example is there to highlight the point you are making, but it's common to come across code that uses loops / apply functions when there are vectorised alternatives, so I think it's best not to use an example that does this.

I have split it into two examples in 46a4b53: one more realistic without a vectorized alternative, and just a quick note that lintr can suggest vectorized alternatives for some apply() calls.
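
As a quick illustration of the kind of replacement meant here, a sketch with a made-up list `l` (not the example from the post). The lintr call in the comment is an assumption about how such a check would be run, and is left commented out so the snippet only needs base R.

```r
# Made-up input, for illustration only
l <- list(a = 1:3, b = letters, c = numeric(0))

# apply-style call with a vectorized alternative
n_sapply <- sapply(l, length)

# vectorized alternative from base R
n_lengths <- lengths(l)

identical(n_sapply, n_lengths)

# Assumed usage, requires the lintr package:
# lintr::lint(text = "sapply(l, length)", linters = lintr::lengths_linter())
```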

@pratikunterwegs

> Iterating over multiple list-like objects, as I find mapply(x_list, y_list, FUN = fun) easier to read with less visual noise than the equivalent for loop. Caveats about return types of course.

> Related to above, the case of nested for loops and nested functionals - I find the latter easier to parse as well.

Thanks for the note. I have added examples in 563b4ed to better illustrate the benefit of apply() in complex cases.

> A bit more about using functional programming to replace difficult to read, error-prone or inefficient for loops; e.g. replacing loop based data merging with Reduce(), especially when intermediate products are also required. I find it's easy to make mistakes in indexing in such cases.

This is a good suggestion but outside the scope of this post IMO, since Reduce() is recursion. I have added a link to the Advanced R book, which covers all of this in more detail, in 9739b3c.

> A caveat on not getting carried away looking for a 'clever' functional-based solution where a for loop would suffice - I find myself spending extra time thinking about these and recommending them to others even when the case for them is weak.

I disagree with this one, especially in the case of package development. While it may indeed take slightly longer sometimes (especially if one is not used to using these tools), the improvement in maintainability over the years quickly compensates for this initial investment.

TimTaylor · 2023-10-26T10:13:54Z

> This is a good point but I wonder if that wouldn't be too much info to add for a quite niche & advanced case. I would propose to keep it either as a footnote, or as a comment on a post after publication. How does this sound?

A footnote sounds good and could refer to the lapply() documentation. This way it's clearly acknowledged as a niche concern rather than an afterthought in comments and encourages further reading.

One other thing I was thinking about was functions that may (or are more likely to) error. This can be important for longer running functions with lapply() being all or nothing. Things like purrr::safely() and other condition handling approaches can be of use here so could be worth mentioning??
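
A minimal base R sketch of what this could look like. The `risky()` and `safe_risky()` functions are hypothetical. The wrapper returns a result/error pair per element, loosely mirroring the shape that purrr::safely() returns, so one failure does not discard the other results.

```r
# Hypothetical function that fails for one input
risky <- function(x) {
  if (x == 3) stop("x cannot be 3")
  x^2
}

# All or nothing: a single failure loses every result
all_or_nothing <- tryCatch(lapply(1:5, risky), error = function(e) NULL)

# Per-element handling: capture the error and carry on
safe_risky <- function(x) {
  tryCatch(
    list(result = risky(x), error = NULL),
    error = function(e) list(result = NULL, error = conditionMessage(e))
  )
}

out <- lapply(1:5, safe_risky)

# Which elements failed, and the results of those that did not
failed <- vapply(out, function(o) !is.null(o$error), logical(1))
results <- unlist(lapply(out[!failed], `[[`, "result"))

which(failed)
#> [1] 3
results
#> [1]  1  4 16 25
```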

jamesmbaazam · 2023-10-26T15:22:43Z

> Which one is better in your opinion?

Hmmm I'm torn. I really like the first but it seems too long when you combine the title and subtitle. The second is more concise and makes me wonder if we need the subtitle.

Bisaloo · 2023-11-02T15:12:09Z

> One other thing I was thinking about was functions that may (or are more likely to) error. This can be important for longer running functions with lapply() being all or nothing. Things like purrr::safely() and other condition handling approaches can be of use here so could be worth mentioning??

This sounds like a great topic for a follow-up post 😉

> Hmmm I'm torn. I really like the first but it seems too long when you combine the title and subtitle. The second is more concise and makes me wonder if we need the subtitle.

Thanks for the suggestions! Following your feedback, I have removed the subtitle. Puns and references are nice but in this specific case, I was afraid it would give the impression that speed is indeed a component in the equation.

jamesmbaazam approved these changes Oct 23, 2023

pratikunterwegs mentioned this pull request Oct 25, 2023

Benchmarking dependency adoption #36

Merged

pratikunterwegs reviewed Oct 25, 2023

Bisaloo force-pushed the for-vs-apply branch from 60b796e to f349d84 Compare October 26, 2023 09:59

Bisaloo and others added 12 commits November 2, 2023 13:55

First complete draft

6ef12dc

Update date

278c1a9

Update subtitle

7251a59

Fix typos

fe79e2a

Add callout on other languages

ff6179b

Split points about human readability and lintr

8cc61be

and use an example without vectorized alternative

Add link to advanced R chapter for more functionals

a7ffbbf

Add complex example with double iteration

a23282b

Fix lints

f258a32

Update title based on James' suggestion

8181ad9

Co-authored-by: James Azam <jamesmbaazam@users.noreply.github.com>

Add lintr to renv

02dbb2c

Add caveat from Tim

0f15009

Bisaloo force-pushed the for-vs-apply branch from f349d84 to 0f15009 Compare November 2, 2023 15:10

Bisaloo added 3 commits November 2, 2023 16:18

Add acknowledgements

0c943a9

Update date

dd32732

Add _freeze

8b27cbe

Bisaloo force-pushed the for-vs-apply branch from 995dcad to 8b27cbe Compare November 2, 2023 15:18

Bisaloo merged commit 9028eef into main Nov 2, 2023

Bisaloo deleted the for-vs-apply branch November 2, 2023 15:21
