# Numpy Introduction

**Week02, Numpy Introduction**

ISM6136

&copy; 2023 Dr. Tim Smith


<a target="_blank" href="https://colab.research.google.com/github/prof-tcsmith/dm-f23/blob/main/W02/W02b-Numpy.ipynb#offline=1">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

---

## Introduction

Numpy is a Python library for working with arrays. It is particularly useful for its linear algebra, Fourier transform, and random number capabilities. Numpy also has a C API, which means that it is possible to write fast code in C and use it in Python through Numpy.

## 1.0 Installing and importing numpy

The numpy package is installed as part of the Anaconda distribution (you will not need to install it, only import it and use it).

In [255]:
import numpy # this is one way to import

In [256]:
numpy.asarray([1,2,3])

array([1, 2, 3])

In [257]:
# but, you can also rename a package (to a shorter name) when importing

import numpy as np # this is a very common way to import numpy

np.asarray([2,3,4])

array([2, 3, 4])

## 2.0 Numpy basics

### 2.1 Lists versus numpy array

In [258]:
# Here are some regular lists...
height = [1.75, 1.34, 1.56, 1.54, 1.48]
weight = [65.4, 56.7, 73.8, 65.4, 49.8]

Numpy is a package that is commonly used in data science. It lets you carry out vector and matrix calculations easily and, more generally, perform fast operations over large arrays of data.

In [259]:
height_np = np.asarray(height)
weight_np = np.asarray(weight)

In [260]:
bmi = weight_np / height_np ** 2
print(bmi)

[21.35510204 31.57718868 30.32544379 27.57631978 22.73557341]


In [261]:
# notice that we can't calculate bmi as easily if we only have the lists
# note that the code in this cell raises an error (to demonstrate how lists and numpy arrays differ)
#bmi = weight / height ** 2
#print(bmi)
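
To make the contrast concrete, the sketch below first shows the error raised by the list version, then computes the same BMI values from plain lists with `zip` and a list comprehension, and finally compares the result with the numpy version. The names `bmi_list` and `bmi_np` are introduced here only for illustration.

```python
import numpy as np

height = [1.75, 1.34, 1.56, 1.54, 1.48]
weight = [65.4, 56.7, 73.8, 65.4, 49.8]

# Plain lists do not support element-wise arithmetic, so this raises a TypeError
try:
    weight / height ** 2
except TypeError as err:
    print("lists do not support this:", err)

# With plain lists, the calculation has to be spelled out element by element
bmi_list = [w / h ** 2 for w, h in zip(weight, height)]

# With numpy arrays, the same arithmetic is applied to every element at once
bmi_np = np.asarray(weight) / np.asarray(height) ** 2

# Both approaches give the same values (up to floating-point rounding)
print(np.allclose(bmi_list, bmi_np))
```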

### 2.2 Operations on numpy arrays

In [262]:
np_ra1 = np.array([1,2,3,4])

In [263]:
np_ra1 * 2 # multiply all elements of the array by 2

array([2, 4, 6, 8])

In [264]:
np_ra1 ** 2 # take each element in the array and square it

array([ 1,  4,  9, 16])

In [265]:
np_ra2 = np.array([3,4,5, 8])

In [266]:
np_ra1 * np_ra2 # we can even multiply (and subtract, add, or divide) two arrays

array([ 3,  8, 15, 32])

In [267]:
np_ra1 / np_ra2 

array([0.33333333, 0.5       , 0.6       , 0.5       ])

In [268]:
np_ra1 + np_ra2 

array([ 4,  6,  8, 12])

In [269]:
np_ra1 - np_ra2 

array([-2, -2, -2, -4])

### 2.3 Numpy arrays only contain one type

In [270]:
np.array([1.0, "is", True]) # np will turn all of these into a string

array(['1.0', 'is', 'True'], dtype='<U32')

In [271]:
np.array([1.0, 2, 3.9]) # these will all become float

array([1. , 2. , 3.9])

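Notice that Numpy converts all elements to a single common datatype that can represent every value (above, the integer 2 is stored as a float, and in the first example everything becomes a string).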
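
One way to check which type numpy settled on is the array's `dtype` attribute. The following is a small sketch; the variable names `ints`, `mixed`, and `texty` are made up for this example, and the exact integer width (for example `int64` versus `int32`) can depend on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])            # all integers -> an integer dtype
mixed = np.array([1.0, 2, 3.9])       # an int among floats -> a float dtype
texty = np.array([1.0, "is", True])   # a string forces every element to text

for arr in (ints, mixed, texty):
    # dtype.kind is a one-letter code: 'i' integer, 'f' float, 'U' unicode string
    print(arr.dtype, arr.dtype.kind)
```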

### 2.4 NumPy Subsetting

In [272]:
bmi

array([21.35510204, 31.57718868, 30.32544379, 27.57631978, 22.73557341])

In [273]:
bmi > 23

array([False,  True,  True,  True, False])

In [274]:
bmi[bmi > 23]

array([31.57718868, 30.32544379, 27.57631978])
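
The boolean array produced by a comparison can be stored and reused. The sketch below, using the same height and weight data, counts the matching elements and applies the mask to a different array of the same length. The names `mask`, `n_over`, and `heights_over` are illustrative only.

```python
import numpy as np

height_np = np.asarray([1.75, 1.34, 1.56, 1.54, 1.48])
weight_np = np.asarray([65.4, 56.7, 73.8, 65.4, 49.8])
bmi = weight_np / height_np ** 2

mask = bmi > 23                 # boolean array, one True/False per element
n_over = mask.sum()             # True counts as 1, so the sum is a count
heights_over = height_np[mask]  # the same mask can subset another array

print(mask)
print(n_over)
print(heights_over)
```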

## 3.0 Multidimensional Numpy arrays

In [275]:
np_2d = np.array([np.asarray(height), np.asarray(weight)])
np_2d

array([[ 1.75,  1.34,  1.56,  1.54,  1.48],
       [65.4 , 56.7 , 73.8 , 65.4 , 49.8 ]])

In [276]:
np_2d.shape  # NOTICE that this isn't a method; rather, it's an attribute of the np_2d object

(2, 5)

In [277]:
# selecting first row, 3rd element (this is the same as lists!)
np_2d[0][2]

1.56

In [278]:
# BUT, with numpy arrays we can also do...
np_2d[0,2] # once you get used to this approach, it's more intuitive and easier than the list approach

1.56

In [279]:
np_2d[:,1:3] # select the columns at index 1 and 2 for all rows

array([[ 1.34,  1.56],
       [56.7 , 73.8 ]])

In [280]:
np_2d[1,:] # select all the weights

array([65.4, 56.7, 73.8, 65.4, 49.8])
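
A few more `[row, column]` selections on the same data are sketched below, including a negative slice and the transpose attribute `.T`. The names `first_person`, `last_two_heights`, and `people` are introduced just for this example.

```python
import numpy as np

height = [1.75, 1.34, 1.56, 1.54, 1.48]
weight = [65.4, 56.7, 73.8, 65.4, 49.8]
np_2d = np.array([height, weight])  # shape (2, 5): row 0 heights, row 1 weights

first_person = np_2d[:, 0]        # all rows, column 0 -> height and weight of person 0
last_two_heights = np_2d[0, -2:]  # row 0, last two columns
people = np_2d.T                  # transpose -> shape (5, 2), one row per person

print(first_person)
print(last_two_heights)
print(people.shape)
```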

### 3.1 Numpy Basic Stats

In [281]:
np_height = np.array(height)
np_weight = np.array(weight)

In [282]:
np_height.mean()

1.534

In [283]:
np_height.std()

0.13260467563400619

In [284]:
# np_height.median() # note that there isn't a median method... 

The same statistics are also available as np functions (not methods); the median is only available this way.

In [285]:
np.median(np_height)

1.54

In [286]:
np.mean(np_height)

1.534

In [287]:
np.corrcoef(np_height, np_weight)

array([[1.       , 0.5034277],
       [0.5034277, 1.       ]])

In [288]:
np.std(np_height)

0.13260467563400619

In [289]:
np.var(np_height)

0.017583999999999995

In [290]:
np.sum(np_height)

7.67

In [291]:
# here, we will generate 20 heights and weights by randomly sampling from normal distributions
np_height = np.round(np.random.normal(1.75, 0.2, 20), 2)
np_weight = np.round(np.random.normal(60.32, 15, 20), 2)
np_people = np.column_stack((np_height, np_weight))
np_people

array([[ 2.12, 48.84],
       [ 1.84, 35.2 ],
       [ 1.72, 53.89],
       [ 1.78, 79.8 ],
       [ 1.61, 66.26],
       [ 1.95, 51.2 ],
       [ 1.85, 51.44],
       [ 1.64, 45.29],
       [ 1.62, 44.77],
       [ 1.77, 53.68],
       [ 1.74, 56.99],
       [ 1.73, 55.24],
       [ 1.84, 72.78],
       [ 1.7 , 81.47],
       [ 1.64, 61.05],
       [ 1.5 , 53.54],
       [ 2.03, 51.99],
       [ 1.94, 60.81],
       [ 1.49, 61.01],
       [ 1.74, 54.32]])
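
Because the values above are random, they change on every run. The sketch below assumes a seeded generator from `np.random.default_rng` (the seed 42 is an arbitrary choice, not part of the original notebook) so the draws are repeatable, and then uses the `axis` argument to compute one statistic per column of the 2D array.

```python
import numpy as np

# A fixed seed makes the random draws repeatable; 42 is an arbitrary choice
rng = np.random.default_rng(42)
np_height = np.round(rng.normal(1.75, 0.2, 20), 2)
np_weight = np.round(rng.normal(60.32, 15, 20), 2)
np_people = np.column_stack((np_height, np_weight))

# axis=0 collapses the rows, giving one statistic per column
col_means = np_people.mean(axis=0)
col_medians = np.median(np_people, axis=0)

print(np_people.shape)
print(col_means)     # [mean height, mean weight]
print(col_medians)   # [median height, median weight]
```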

## 4.0 Comparison operators and logical operations on numpy arrays

### 4.1 Comparison Operators

In [293]:
bmi > 25

array([False,  True,  True,  True, False])

In [294]:
bmi[bmi > 25]

array([31.57718868, 30.32544379, 27.57631978])

In [295]:
# this will cause an error...
# bmi > 21 and bmi < 25

### 4.2 Logical Operations

We can't directly use Python's logical operators (`and`, `or`, `not`) on numpy arrays, but numpy has a few functions that can be used instead:

- `np.logical_and()`
- `np.logical_or()`
- `np.logical_not()`
- `np.logical_xor()`

In [296]:
np.logical_and(bmi > 21, bmi < 25)

array([ True, False, False, False,  True])

In [297]:
a = np.array([True, True, False, False])
b = np.array([True, False, True, False])
print(np.logical_and(a, b))
print(np.logical_or(a, b))
print(np.logical_not(a))
print(np.logical_xor(a, b))

[ True False False False]
[ True  True  True False]
[False False  True  True]
[False  True  True False]
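
On boolean arrays, the element-wise operators `&`, `|`, and `~` give the same results as `np.logical_and`, `np.logical_or`, and `np.logical_not`. The following sketch reuses the BMI data; the names `in_range`, `out_of_range`, and `either` are illustrative. Note that each comparison needs its own parentheses.

```python
import numpy as np

height_np = np.asarray([1.75, 1.34, 1.56, 1.54, 1.48])
weight_np = np.asarray([65.4, 56.7, 73.8, 65.4, 49.8])
bmi = weight_np / height_np ** 2

# Parentheses are required because & and | bind more tightly than < and >
in_range = (bmi > 21) & (bmi < 25)
out_of_range = ~in_range
either = (bmi < 22) | (bmi > 30)

print(in_range)
print(out_of_range)
print(either)

# Same result as the function form
print(np.array_equal(in_range, np.logical_and(bmi > 21, bmi < 25)))
```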


## 5.0 Looping over NumPy array

In [298]:
heights = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
weights = np.array([65.4, 56.7, 73.8, 65.4, 49.8])

In [299]:
for height in heights:
    print(height)

1.75
1.34
1.56
1.54
1.48


In [300]:
meas = np.array([heights, weights])
for val in meas:
    print(val)

[1.75 1.34 1.56 1.54 1.48]
[65.4 56.7 73.8 65.4 49.8]


In [301]:
meas = np.array([heights, weights])
for val in np.nditer(meas):
    print(val)

1.75
1.34
1.56
1.54
1.48
65.4
56.7
73.8
65.4
49.8
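
When the position of each value matters, `np.ndenumerate` yields the index along with the value. The sketch below also loops over the transpose so that each iteration handles one person's height and weight together. The loop variable names are chosen only for illustration.

```python
import numpy as np

heights = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
weights = np.array([65.4, 56.7, 73.8, 65.4, 49.8])
meas = np.array([heights, weights])

# np.ndenumerate yields ((row, col), value) pairs in row-major order
for (row, col), val in np.ndenumerate(meas):
    if col == 0:  # only print the first column to keep the output short
        print(row, col, val)

# Looping over the transpose visits one (height, weight) pair at a time
for h, w in meas.T:
    print(h, w)
```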


## 6.0 List comprehensions with numpy

A list comprehension is often an elegant way to handle loop-like functionality on numpy arrays, although vectorized operations are usually faster on large arrays.

In the cell below we create a numpy array called n_ra that contains 10 random numbers drawn from a normal distribution with mean 0 and standard deviation 1. 

In [302]:
n_ra = np.array([x for x in np.random.normal(0,1,10)])

n_ra

array([-1.17665209, -0.68632243,  1.40027683, -2.433345  , -0.29070195,
        0.14993095,  0.3853368 ,  0.26373253, -1.96660534,  0.32058926])

Often, we do not need this many decimal places. In such cases, we can apply the array's `round` method.

In [303]:
n_ra = np.array([x for x in np.random.normal(0,1,10).round(3)])

n_ra

array([ 0.973, -0.222,  0.338,  0.433,  0.137,  0.081,  0.087,  1.476,
       -1.215,  0.489])

In the cell below, we create a numpy array called n_ra that selects all values from the original n_ra that are greater than 0 and squares them. It then rounds these results to 2 decimal places. 

In [304]:

n_ra = np.array([x**2 for x in n_ra if x > 0]).round(2)

n_ra

array([0.95, 0.11, 0.19, 0.02, 0.01, 0.01, 2.18, 0.24])
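
For comparison, the same filter-and-square step can be written without a Python-level loop by combining a boolean mask with element-wise arithmetic. This is a sketch that hard-codes the rounded values shown earlier so the result is repeatable; `squares_loop` and `squares_vec` are names made up for the example.

```python
import numpy as np

# The rounded sample values printed earlier, fixed here for repeatability
n_ra = np.array([0.973, -0.222, 0.338, 0.433, 0.137,
                 0.081, 0.087, 1.476, -1.215, 0.489])

# List-comprehension version: filter, square, then round
squares_loop = np.array([x**2 for x in n_ra if x > 0]).round(2)

# Vectorized version: mask selects positives, ** squares element-wise
squares_vec = (n_ra[n_ra > 0] ** 2).round(2)

print(squares_vec)
print(np.allclose(squares_loop, squares_vec))
```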

## 7.0 Arithmetic operations on numpy arrays

Let's create two numpy arrays, one called `height` and one called `width`.

In [305]:
height = np.array([1.75, 1.34, 1.56, 1.54, 1.48])
width = np.array([65.4, 56.7, 73.8, 65.4, 49.8])

We can calculate an array of areas by using the following vector operation.

In [306]:
area = height * width

area


array([114.45 ,  75.978, 115.128, 100.716,  73.704])

In the following cells, we will see that numpy supports all the common arithmetic operators (+, -, *, /, **, etc.) and that they are applied elementwise to arrays.

In [307]:
a = np.array([1,2,3])
b = np.array([4,5,6])

In [308]:
# subtract two numpy arrays
a - b

array([-3, -3, -3])

In [309]:
# multiply two numpy arrays
a * b


array([ 4, 10, 18])

In [310]:
# divide two numpy arrays
a / b

array([0.25, 0.4 , 0.5 ])

In [311]:
# Remainder of two numpy arrays

a % b

array([1, 2, 3])

In [312]:
# Integer division of numpy arrays
a // b

array([0, 0, 0])

In [313]:
# Exponentiation of numpy arrays
b ** a

array([  4,  25, 216])
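
Integer division and remainder are related: for each element, `(a // b) * b + a % b` gives back `a`. The sketch below checks this with values chosen here so that the quotients are non-zero (they differ from the `a` and `b` used above), and also shows that a scalar is applied to every element.

```python
import numpy as np

# Illustrative values, picked so that the quotients are not all zero
a = np.array([7, 8, 9])
b = np.array([2, 3, 4])

quotient = a // b    # element-wise integer division
remainder = a % b    # element-wise remainder

print(quotient)
print(remainder)

# Recombining quotient and remainder recovers the original array
print(np.array_equal(quotient * b + remainder, a))

# A scalar operand is applied to every element of the array
print(a + 10)
```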