Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

02-numpy: clarify numpy.diff() part #777

Merged
merged 3 commits into from
Jan 31, 2020
Merged

02-numpy: clarify numpy.diff() part #777

merged 3 commits into from
Jan 31, 2020

Conversation

elcorto
Copy link
Contributor

@elcorto elcorto commented Jan 24, 2020

Using a variable called npdiff to hold an array is highly confusing,
given that the function to be learned here is called {numpy,np}.diff().

Also, stress that the array returned by diff() is shorter by one.

Using a variable called `npdiff` to hold an array is highly confusing,
given that the function to be learned here is called {numpy,np}.diff().

Also, stress that the array returned by `diff()` is shorter by one.
@maxim-belkin
Copy link
Contributor

Thanks, Steve. I agree with your reasoning and like the changes you make to the text. I wonder, though, if a is a good name for a variable. Shall we use a full word like array or something else? My co-maintainers, Anne and Lauren, might have some ideas.

Anyway, thanks for the pull request!

@ldko
Copy link
Contributor

ldko commented Jan 24, 2020

Hi Steve, I agree with Maxim--I like the clarifications your edits contribute to the text but would prefer to avoid the short variable name a and try for something slightly more meaningful (realizing we are giving a generic example which makes it difficult to be very specific). I would, however, also prefer to avoid calling the variable array because of numpy having an array function and the output in the exercise appearing as array([2, 3, 4, 5]) which could confuse some learners into making a connection between the variable name array and the function. I would be okay with something like values or data or value_array etc.

Rename variable `a` -> `row_start` and add text relating that to the
`data` array encountered earlier.

The text part about "patient data is _longitudinal_" is redundant since
that answers the question posed later regarding the usage of the `axis`
keyword. It intoduces the usage of `axis`, while the following intro to
diff() first uses a 1d array.
@elcorto
Copy link
Contributor Author

elcorto commented Jan 29, 2020

Valid points regarding a as variable name. I was thinking in numpy docstring terms where short variable names are often used, but of course in this context another naming is needed. I just pushed an update.

@maxim-belkin
Copy link
Contributor

Thanks, Steve. I had the same concern as Lauren so I'm glad you reverted that change.

In any case, a question to both of you: what if we name our variable patient3_week1 (or something like that)? I know this is not the best name for a variable but... it makes it easy to understand what information the variable contains.

@elcorto
Copy link
Contributor Author

elcorto commented Jan 29, 2020

I didn't do that since the original example uses an artificial 1d array of length 5 (enough for explaining what diff() does) instead of pulling a row (length 40) from the data array read in earlier in the episode, probably by design. My compromise was adding row_start = data[i,:5] in the text to make a connection from the following 1d example to the 2d array data, which wasn't present before.

@ldko
Copy link
Contributor

ldko commented Jan 29, 2020

I am ok with using row_start, since the data doesn't actually map to a patient but hints at the possibility.

@maxim-belkin
Copy link
Contributor

maxim-belkin commented Jan 30, 2020

row_start is OK... What I was trying to say is that this part of the lesson is currently independent of the lesson and I suggest to blend it in a little bit. For example, we can say that to see if there are any peculiarities in the behavior of inflammation of a particular patient over time, we can use numpy.diff() function. But we can certainly do it in a separate PR.

@ldko
Copy link
Contributor

ldko commented Jan 30, 2020

If @elcorto wants to modify the exercise to reflect the actual patient data and further blend the exercise into the lesson as you suggest @maxim-belkin, I think that would be great, otherwise, I think it is ok to merge as-is and further blend it in a separate PR.

@maxim-belkin
Copy link
Contributor

Agreed.

@elcorto
Copy link
Contributor Author

elcorto commented Jan 31, 2020

Unfortunately, I won't have time to improve this episode more for now in the way you outlined. However, I found other issues that I hope to fix in later PRs.

As for this PR, I don't have any pending changes to add.

@ldko ldko merged commit b69e0cc into swcarpentry:gh-pages Jan 31, 2020
@elcorto elcorto deleted the feature-02-numpy-clarify-diff branch January 31, 2020 23:19
zkamvar pushed a commit that referenced this pull request Apr 21, 2023
* 02-numpy: clarify numpy.diff() part

Using a variable called `npdiff` to hold an array is highly confusing,
given that the function to be learned here is called {numpy,np}.diff().

Also, stress that the array returned by `diff()` is shorter by one.

* 02-numpy: diff(): rename variable, streamline text

Rename variable `a` -> `row_start` and add text relating that to the
`data` array encountered earlier.

The text part about "patient data is _longitudinal_" is redundant since
that answers the question posed later regarding the usage of the `axis`
keyword. It intoduces the usage of `axis`, while the following intro to
diff() first uses a 1d array.

* 02-numpy: diff(): Re-add introduction lines
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants