
# Arithmetic on NumPy

## Prep

Let's recall how to access data in lists. For that we will use the dataset for this course, which contains the results of an actual behavioural experiment conducted by Universidad de la Matanza (UNLaM).

In [1]:
%%writefile get_data.sh
if [ ! -f dataset.csv ]; then
  wget -O dataset.csv https://www.dropbox.com/s/9t5lc04vxwvjvo6/dataset.csv?dl=0
fi

Writing get_data.sh


In [2]:
!bash get_data.sh


--2025-05-06 21:40:20--  https://www.dropbox.com/s/9t5lc04vxwvjvo6/dataset.csv?dl=0
Resolving www.dropbox.com (www.dropbox.com)... 162.125.65.18, 2620:100:6021:18::a27d:4112
Connecting to www.dropbox.com (www.dropbox.com)|162.125.65.18|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://www.dropbox.com/scl/fi/vdpumrhhu5yhwhgmzwday/dataset.csv?rlkey=ezm9dpl2wbkd1tzxqrn2rzrrn&dl=0 [following]
--2025-05-06 21:40:21--  https://www.dropbox.com/scl/fi/vdpumrhhu5yhwhgmzwday/dataset.csv?rlkey=ezm9dpl2wbkd1tzxqrn2rzrrn&dl=0
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc1c8d1a53559d8b38429adbcb44.dl.dropboxusercontent.com/cd/0/inline/CpNBwNCo2nYxz-2aLtjQQlEGjGUwEX2TNPx1afK8OlhxxOLtv2TOR2u92wovF3iTVQImQ4vjKfWUmqyHKceJAiMyzeo9JmOHgcaTvIvuqEzWtTJvnd-w_xpSKCx41b1H1v8flBvA2Y9YC3t_Ny2cHeRS/file# [following]
--2025-05-06 21:40:21--  https://uc1c8d1a53559d8b38429adbcb44.dl.dropboxusercontent.co

In [3]:
import numpy as np

In [4]:
numpy_arr = np.genfromtxt('dataset.csv', delimiter=',')

## Basic Arithmetic between arrays

We can easily do the usual operations, such as addition or multiplication, between NumPy arrays.

In [5]:
a = numpy_arr[2:4, :5]
b = numpy_arr[5:7, 5:10]

In [6]:
a.shape

(2, 5)

In [7]:
b.shape

(2, 5)

In [8]:
a+b

array([[3.33, 5.  , 6.  , 4.  , 6.  ],
       [5.33, 3.  , 4.  , 2.  , 4.33]])

In [9]:
a*b

array([[2.66, 0.  , 9.  , 0.  , 8.  ],
       [6.99, 0.  , 4.  , 0.  , 4.66]])

As you can see, and as we saw in the demo on broadcasting, the operation is done elementwise.
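
To make the elementwise behaviour easier to check by hand, here is a minimal sketch with two small made-up arrays (`x` and `y` are hypothetical stand-ins for the dataset slices, not part of the course data):

```python
import numpy as np

# Hypothetical small arrays standing in for the slices `a` and `b` above.
x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y = np.array([[10.0, 20.0, 30.0],
              [40.0, 50.0, 60.0]])

total = x + y     # each element of x is added to the element of y at the same position
product = x * y   # elementwise product, not a matrix product

print(total)
print(product)
```

Note that `*` multiplies position by position; a matrix product would need `@` or `np.matmul` instead.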

If the shapes don't match and cannot be broadcast together, we get an exception.

In [11]:
c = numpy_arr[3:6, 5:10]
print(f'a shape is {a.shape} and c shape is {c.shape}')

a shape is (2, 5) and c shape is (3, 5)


In [12]:
a*c

ValueError: operands could not be broadcast together with shapes (2,5) (3,5) 
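
The same failure can be reproduced without the dataset. The sketch below uses made-up arrays (`p`, `q` and `row` are hypothetical names) to contrast a pair of shapes that broadcasts with one that does not:

```python
import numpy as np

p = np.ones((2, 5))
q = np.ones((3, 5))
row = np.arange(5.0)   # shape (5,)

# Shapes (2, 5) and (5,) are compatible: `row` is stretched across both rows of `p`.
ok = p * row

# Shapes (2, 5) and (3, 5) are not: the leading dimensions differ and neither is 1.
try:
    p * q
    failed = False
except ValueError as err:
    failed = True
    print(f"Broadcast failed: {err}")
```

Catching the `ValueError` is only done here so the cell keeps running; in a notebook you would normally just let the exception surface.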

## Arithmetic within the array

However, sometimes we want to do operations within the array, for example: *What is the sum of the second column? Or the mean?*

In [13]:
a = numpy_arr[:,1]
a.sum()

np.float64(235.0)

In [14]:
np.sum(a)

np.float64(235.0)

Both work! Notice that this case was easy because the array has only one dimension, so there is only one axis to reduce over.

We can also calculate the *mean* and *std*, take the *abs*, apply any trigonometric function, and more...

In [15]:
np.abs(a).mean()

np.float64(1.1809045226130652)

In [16]:
d = np.cos(a)   # Returns an array: the function is applied elementwise

In [17]:
d.mean()

np.float64(0.2849004620475214)
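
There are two kinds of operations at play here: functions such as `np.abs` and `np.cos` return an array of the same shape, while `mean` or `sum` collapse the array to a single value. A minimal sketch on a made-up vector (`v` is hypothetical, not taken from the dataset):

```python
import numpy as np

v = np.array([-2.0, 0.0, 2.0])

abs_v = np.abs(v)        # elementwise: same shape as v
cos_v = np.cos(v)        # elementwise: same shape as v
mean_abs = abs_v.mean()  # reduction: a single number

print(abs_v.shape, cos_v.shape)
print(mean_abs)
```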

Let's see what happens if we want to handle other columns

In [18]:
e = numpy_arr[:, 2:8]
e.shape

(199, 6)

In [19]:
e.sum()

np.float64(nan)

**What happened?**

When dealing with `NaN` the following rules apply:



1.   `Scalar + NaN = NaN`
2.   `Scalar * NaN = NaN`

So we need to tackle the `NaN`s first. Luckily, we have already done that in a previous demo.
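
The propagation rule can be seen on a tiny made-up array (`vals` is hypothetical). The sketch also shows NumPy's NaN-aware reductions, `np.nansum` and `np.nanmean`, which skip `NaN` values; they are an alternative to the row-dropping approach this notebook follows, not what the notebook itself uses:

```python
import numpy as np

vals = np.array([1.0, np.nan, 3.0])

plain_sum = vals.sum()             # NaN propagates through the sum
nan_aware_sum = np.nansum(vals)    # ignores NaN: 1.0 + 3.0
nan_aware_mean = np.nanmean(vals)  # mean of the non-NaN values only

print(plain_sum, nan_aware_sum, nan_aware_mean)
```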


## Handling NaN

In [20]:
~np.isnan(numpy_arr)

array([[ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       [ True,  True,  True, ...,  True,  True,  True],
       ...,
       [ True,  True, False, ..., False, False, False],
       [ True,  True, False, ..., False, False, False],
       [ True,  True, False, ..., False, False, False]])

Here we have a boolean mask that is `True` wherever the value is not `NaN`.

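Now we need a `True` for a row only if all of its columns are `True`, which means reducing along `axis=1`. Equivalently, we can check whether any value in the row is `NaN` and negate the result.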

In [21]:
~np.isnan(numpy_arr).any(axis=1)

array([ True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True, False, False, False, False, False,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,

The rows marked `False` contain `NaN` values, and we will drop them. We will assign the result to a new array.

In [22]:
cleaned_numpy_arr = numpy_arr[~np.isnan(numpy_arr).any(axis=1)]

In [23]:
cleaned_numpy_arr.shape

(151, 11)

In [24]:
cleaned_numpy_arr.sum()

np.float64(16658.97)

It worked!
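
The row-filtering steps can be traced on a small made-up array (`data` is hypothetical). Note that in `~np.isnan(arr).any(axis=1)` the `.any(...)` call is evaluated before `~`, so the expression means "not (any NaN in the row)"; the sketch splits it into two steps to make that explicit:

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, np.nan, 6.0],
                 [7.0, 8.0, 9.0]])

# True for every row that contains at least one NaN.
has_nan = np.isnan(data).any(axis=1)

# Keep only the rows without NaN by negating the mask.
cleaned = data[~has_nan]

print(has_nan)
print(cleaned.shape)
print(cleaned.sum())
```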


## Working over an axis


Now, how can we get the sum and average of each column? By setting the axis to zero! With `axis=0` the operation runs along the rows, so we get one result per column.

In [25]:
cleaned_numpy_arr.sum(axis=0)

array([13215.  ,   174.  ,   478.  ,   203.  ,   294.  ,   324.99,
         441.  ,   491.  ,   445.  ,   458.98,   134.  ])

In [26]:
cleaned_numpy_arr.mean(axis=0)

array([87.51655629,  1.15231788,  3.16556291,  1.34437086,  1.94701987,
        2.15225166,  2.9205298 ,  3.25165563,  2.94701987,  3.03960265,
        0.88741722])
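
A small made-up matrix (`m` is hypothetical) makes it easier to see which direction each axis collapses:

```python
import numpy as np

m = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

col_sums = m.sum(axis=0)    # collapses the rows: one value per column
row_sums = m.sum(axis=1)    # collapses the columns: one value per row
col_means = m.mean(axis=0)  # per-column average

print(col_sums)
print(row_sums)
print(col_means)
```

A handy way to remember it: the axis you pass is the one that disappears from the shape, so a `(2, 3)` array reduced with `axis=0` gives a result of shape `(3,)`.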

**To finalise, can you get the mean and standard deviation of the 8th column (index 7)?**

In [27]:
f = cleaned_numpy_arr[:,7]
mean = f.mean(axis=0)
std = f.std(axis=0)
print(f'The mean is {mean} and the std is {std}')

The mean is 3.251655629139073 and the std is 1.0991781580483504
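
One detail worth knowing when reading this result: `std` computes the population standard deviation by default (`ddof=0`). If the sample standard deviation is what you need, pass `ddof=1`. A minimal sketch on made-up numbers (`sample` is hypothetical, not taken from the dataset):

```python
import numpy as np

sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_std = sample.std()           # default ddof=0: divides by N
sample_std = sample.std(ddof=1)  # divides by N - 1

print(f'Population std: {pop_std}, sample std: {sample_std}')
```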
