episode 9 (vectorizing): compare with for loop output? #523

Closed
monicathieu opened this issue Jun 14, 2019 · 4 comments

@monicathieu
commented Jun 14, 2019

Hi Carpentries folks,

The listed key point in episode 9 about vectorizing is to

"Use vectorized operations instead of loops."

which is great! However, in the episode, there are no side-by-side examples of accomplishing the same task using both a vectorized call and a for loop. The first piece of explanatory text in the lesson plan explicitly compares vectorizing with for looping:

... without needing to loop through
and act on each element one at a time. This makes writing code more
concise, easy to read, and less error prone.

This would pair well with a worked example showing a multi-line for loop that loops over the indices of a vector, contrasted with a single-line vectorized call that produces equivalent output. That would make it concrete that vectorized functions are often less verbose than equivalent for loops. Learners coding along would likely find that typing the vectorized call is much faster and less typo-prone than typing out the for loop, driving the point home.
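
A minimal sketch of the kind of side-by-side example described above. The vector `v` and the names `doubled_loop` and `doubled_vec` are hypothetical and not taken from the lesson; the doubling operation is just one possible choice.

```r
# Hypothetical vector, not from the lesson
v <- c(1, 2, 3, 4)

# for loop: visit each index and double the element at that position
doubled_loop <- numeric(length(v))
for (i in seq_along(v)) {
  doubled_loop[i] <- v[i] * 2
}

# Vectorized: a single expression acts on every element
doubled_vec <- v * 2

doubled_loop
doubled_vec
```

Both objects should hold the same values; the vectorized version simply gets there in one line.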

Since this episode comes after the control flow episode, presumably learners will have seen a for loop before, so including a for loop in the worked code hopefully wouldn't take too much time for explanation.

I know adding worked code examples to episodes is tough since they're already so info-packed, but this one could replace the chunk explaining %*%. (People coming from languages like Matlab, where * performs matrix multiplication by default, probably benefit from being shown that matrix multiplication is not the default in R, but this may not be a common enough use case to need a worked code example in this intro-level lesson.)
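
For context, a short sketch of the distinction that chunk covers, using a hypothetical 2 x 2 matrix `m` that is not from the lesson: in R, `*` multiplies element by element, while `%*%` performs matrix multiplication.

```r
# Hypothetical 2 x 2 matrix, filled column by column
m <- matrix(1:4, nrow = 2)

# Element-wise product: each entry is multiplied by itself
elementwise <- m * m

# Matrix product: rows of m combined with columns of m
matrix_product <- m %*% m

elementwise
matrix_product
```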

Happy to discuss, and maybe to try suggesting edits in a PR, but I figured opening an issue was a good place to start.

thanks folks!

(PS: I am doing this as part of instructor training checkout)

@jcoliver

Contributor

commented Jun 24, 2019

Thanks for this suggestion, @monicathieu. I think it raises a great point about having side-by-side comparisons that accomplish the same task via a loop and via the built-in vectorization. Perhaps Challenge 1 could be revised to ask for two solutions: a for loop and a vectorized approach (the former might seem a bit artificial at this point of the lesson, but would help reinforce loop syntax). But other means of accomplishing this are worth investigating.

@fjuniorr

Contributor

commented Jun 27, 2019

I think a side-by-side comparison is the best way to let learners see that vectorized code is more concise, easy to read, and less error prone as claimed in the lesson.

As noted by @monicathieu, since the for-loop syntax was already covered in the control flow episode, I don't think we would even be adding a new concept, just making the claims about vectorized code more concrete for learners.

My suggestion would be to add something like the following right before Challenge 1.


Here is how we would achieve the same result of adding two vectors together using a for loop:

```r
# x and y are the vectors defined earlier in the episode
output_vector <- c()
for (i in 1:4) {
  output_vector[i] <- x[i] + y[i]
}

output_vector
#> [1]  7  9 11 13
identical(output_vector, x + y) # checking that the results are the same
#> [1] TRUE
```

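
As a possible variation on the loop above (not part of the proposed lesson text), the result can be preallocated and the indices taken from `seq_along(x)` rather than hard-coded as `1:4`. The definitions of `x` and `y` below are an assumption, chosen to be consistent with the printed output of 7 9 11 13.

```r
# Assumed definitions, consistent with the output shown above
x <- 1:4
y <- 6:9

# Preallocate the result, then loop over the indices of x
output_vector <- integer(length(x))
for (i in seq_along(x)) {
  output_vector[i] <- x[i] + y[i]
}

output_vector
```

Whether this extra detail belongs in an intro-level episode is a judgment call; the simpler loop may be easier for learners to type along with.
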
@jcoliver

Contributor

commented Jul 1, 2019

Thanks, @fjuniorr. One minor suggestion would be to skip identical and instead assign the output of x + y to a variable, then print that variable to the screen, just as was done with output_vector. For example:

```r
sum_xy <- x + y

output_vector
#> [1]  7  9 11 13

sum_xy
#> [1]  7  9 11 13
```

@fjuniorr

Contributor

commented Jul 14, 2019

@jcoliver I can see how identical can be unnecessary in this case. But since the result of x + y was just printed above in the lesson, maybe we can just print output_vector to the screen.

I've submitted PR #538 with this approach. Please let me know if you and @monicathieu think it achieves the intended purpose.
