# Episode 2 Contribution: Better explaining the axis parameter of numpy array functions #906

Open
opened this issue Jan 30, 2021 · 4 comments

### asheikh97 commented Jan 30, 2021

Hi Team,

Episode 2: Analyzing Patient Data introduces the axis parameter of array functions. Whilst teaching this section I realized that many learners begin to question the logic of the axis and its purpose. I believe a slightly more detailed clarification would improve learners' ability to grasp the usefulness of the axis parameter and then apply it on their own with a few more examples. I have typed up the following explanation to go along with the existing diagram in this episode.

Better explaining the axis parameter of numpy array functions:

Every numpy 2D array has two axes we are able to operate over, namely axis 0 and axis 1. We can specify this axis parameter with most array functions, such as mean(), max(), min(), and std(). Axis 0 allows us to operate over the values vertically, in other words iterating over the rows, which yields one result per column. Axis 1 allows us to operate over the values horizontally, iterating over the columns, which yields one result per row. By specifying the axis, we can compute a result for every row or every column at once without having to slice each one out.

For example, in the case of our inflammation data set, if we wanted to compute a value for each individual patient over all days of the study, we would specify `axis=1`. If we wanted to compute a value for each individual day over all the patients, as opposed to each patient over all days, we would specify `axis=0`.
[insert diagram shown in ep2 for axis 0 & axis 1 here]

Let's find the max inflammation for each patient over all days:

`max_inflam_per_patient = np.max(data, axis=1)`

`print(max_inflam_per_patient)`

Or we can find the min inflammation for each patient over all days:

`min_inflam_per_patient = np.min(data, axis=1)`

`print(min_inflam_per_patient)`

Now let's find the average inflammation for each day across all patients:

`avg_inflam_per_day = np.mean(data, axis=0)`

`print(avg_inflam_per_day)`

Or we can find the max inflammation for each day across all patients:

`max_inflam_per_day = np.max(data, axis=0)`

`print(max_inflam_per_day)`
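As a minimal sketch of the calls above, here they are on a small made-up array, assuming rows are patients and columns are days as in the episode's data. The values are illustrative only, and `data` here is a hypothetical stand-in for the array loaded from the lesson's CSV file.

```python
import numpy as np

# Hypothetical stand-in for the inflammation data:
# 3 patients (rows) x 4 days (columns). Values are made up.
data = np.array([
    [0, 1, 3, 2],
    [0, 2, 4, 1],
    [3, 3, 5, 3],
])

# axis=1 collapses the columns (days): one value per patient (row).
max_inflam_per_patient = np.max(data, axis=1)
print(max_inflam_per_patient)  # [3 4 5]

# axis=0 collapses the rows (patients): one value per day (column).
avg_inflam_per_day = np.mean(data, axis=0)
print(avg_inflam_per_day)  # [1. 2. 4. 2.]
```

Three patients give three maxima, and four days give four averages, which may be an easier check for learners than remembering "vertical" versus "horizontal".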

### ldko commented Feb 1, 2021

Hi @asheikh97,

Thank you for raising this issue. When teaching this lesson I have also noticed some confusion about axis 0 and axis 1 and which to specify. I could see your text suggestions being incorporated into the episode where it currently says:

To support this functionality, most array functions allow us to specify the axis we want to work on. If we ask for the average across axis 0 (rows in our 2D example), we get:

I could see your wording turning it into:

To support this functionality, most array functions, such as mean(), max(), min(), and std(), allow us to specify the axis we want to work on. A NumPy 2D array has two axes we are able to operate over, namely axis 0 and axis 1. Specifying axis 0 allows us to operate over the values vertically, in other words iterating over each row. Axis 1 allows us to operate over the values horizontally, therefore iterating over each column. Thus, if we want to compute a value for each individual patient over all days of the study, we would specify `axis=1`. Alternatively, if we want to compute a value for each individual day over all the patients, we specify `axis=0`. If we ask for the average across axis 0, we get:

We welcome you to open a PR to propose changes. Thanks again!
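One way to check which axis to specify is to look at the shape of the result: the axis that is named is the one that disappears. The sketch below uses random numbers as a stand-in for the lesson data and assumes a shape of 60 patients by 40 days; both the values and that shape are assumptions for illustration.

```python
import numpy as np

# Synthetic stand-in for the lesson data
# (assumed shape: 60 patients x 40 days; values are random).
rng = np.random.default_rng(seed=0)
data = rng.integers(low=0, high=20, size=(60, 40))

per_day = np.mean(data, axis=0)      # axis 0 (patients) is collapsed
per_patient = np.mean(data, axis=1)  # axis 1 (days) is collapsed

print(data.shape)         # (60, 40)
print(per_day.shape)      # (40,)
print(per_patient.shape)  # (60,)
```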

### ChrisHandley commented Feb 17, 2021

I was reading over the suggestions here, and I think something that is missed is that we need to reinforce the link between the axis index and the element index, without resorting to the ideas of "vertical" and "horizontal".

The above diagram (remade for the lesson) could be shown next to the diagram we currently have to explain the axis. But the text should contain something like the following:

We have already shown that accessing an array element relies on indices. The first index, or "index 0", is the first index we can potentially iterate over. Therefore, using `axis=0` with an operation such as `mean()` in effect means we are finding the mean for each combination of indices along the other axes, having iterated over the values of "index 0", i.e. along `axis=0`. The effect is akin to taking a slice and performing the mathematical operation upon the contents of that slice.

It could also be beneficial to show axis with respect to a 3D array, displayed as a cube, and thus show how axis is in effect taking slices of data through the cube-like array. Perhaps this can then be reinforced with a further 3D array such as:

I quite like this last diagram because it really hammers home the idea of axis, how it relates to array index, and thus how using axis is in effect taking slices.

I hope that is helpful.
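The link between the axis number and the index position can be sketched in code. The example below uses a made-up 3D array (the name `cube` and its shape are assumptions, not taken from the diagrams) and compares `mean(axis=0)` with an explicit loop over "index 0".

```python
import numpy as np

# Made-up 3D array of shape (2, 3, 4), indexed as cube[i, j, k].
cube = np.arange(24).reshape(2, 3, 4)

# Reducing along axis=0 removes index i and keeps one value per (j, k).
mean_axis0 = cube.mean(axis=0)
print(mean_axis0.shape)  # (3, 4)

# The same result written as an explicit loop over "index 0":
manual = np.zeros((3, 4))
for i in range(cube.shape[0]):
    manual += cube[i, :, :]  # the slice at a fixed value of index 0
manual /= cube.shape[0]

print(np.array_equal(mean_axis0, manual))  # True

# The other axes behave the same way: the named axis is removed.
print(cube.mean(axis=1).shape)  # (2, 4)
print(cube.mean(axis=2).shape)  # (2, 3)
```

In each case the operation combines the slices taken at every value of the named index, so the result keeps only the remaining indices.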

### ldko commented Feb 26, 2021

Hi @ChrisHandley,

Thanks very much for providing feedback on this and taking the time to provide additional images here. Personally, I do not find that the language about a relationship between element index and axis index helps me understand how the `axis` parameter in the `numpy` functions we are using relates to our data. Given the limited time in the workshop format of this lesson, I would also be concerned about bringing in a 3D array, which I could see contributing to cognitive overload. Other opinions are welcome on the issue!

### danbromberg commented Jul 27, 2023

 I wish the cube analogy (by Chris) had a tad more detail as I didn't follow the diagrams or where the cube fit into whatever the original array was.