# Intro to NumPy

- Short for Numerical Python
- It is a fundamental library in the data science (DS) ecosystem
- It provides powerful tools for working with multidimensional arrays and matrices:
    - N-dimensional arrays (`ndarray`): the core data structure. They offer efficient storage for various data types: strings, integers, floats, booleans, etc.
    - Array operations: a rich set of mathematical and statistical operations (functions)
    - Indexing, slicing, and merging of arrays
    - Linear algebra routines
    - Random array/matrix generation
- Advantages:
    - Performance and speed: vectorized operations run in compiled code rather than in Python loops
    - Integration: it integrates easily with other DS modules, such as pandas, TensorFlow, scikit-learn, Matplotlib, etc.
    - It's the foundation of many DS libraries

## Basics

`pip install numpy`

> Make sure you restart the kernel after installation.

In [2]:
import numpy as np

In [3]:
pip freeze


In [4]:
# let's build our first array
arr = np.array([5,6,7])
arr

array([5, 6, 7])

In [5]:
type(arr)

numpy.ndarray

In [6]:
#convert a list into an array

my_list = [4,5,6,7,8,9]

arr = np.array(my_list)

print(type(my_list))
print(type(arr))

<class 'list'>
<class 'numpy.ndarray'>


`arange()` function

In [7]:
list(range(0,21))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

In [8]:
np.arange(0,21) #sequence from 0 to 20

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20])

In [9]:
# fractional step: from 0 up to (but not including) 5.5 in steps of 0.5
arr2 = np.arange(0,5.5,0.5)
arr2

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5, 5. ])

In [10]:
arr2.dtype

dtype('float64')

In [11]:
print('Num of dim:', arr2.ndim)
print('Array datatype:', arr2.dtype)

Num of dim: 1
Array datatype: float64


### Speed test between `ndarray` and `list`

In [12]:
import time

# Define the size of the data
size = 10**6

# Create a list and a NumPy array holding the same sequential data
list_data = list(range(size))
numpy_array = np.arange(size)

# Perform element-wise multiplication using a loop (for list)
start_time = time.time()
for i in range(size):
    list_data[i] *= 2
end_time = time.time()
list_time = end_time - start_time

# Perform element-wise multiplication using NumPy
start_time = time.time()
numpy_array *= 2
end_time = time.time()
numpy_time = end_time - start_time

print(f"Time taken for list: {list_time} seconds")
print(f"Time taken for NumPy array: {numpy_time} seconds")


Time taken for list: 0.05801892280578613 seconds
Time taken for NumPy array: 0.0016629695892333984 seconds
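
A single `time.time()` measurement can be noisy. As a rough sketch, the same comparison can be repeated with the standard library's `timeit` module, taking the best of a few runs. The helper names `double_list` and `double_array` are made up for this example, and the exact timings will differ from machine to machine.

```python
import timeit

import numpy as np

size = 10**6

def double_list():
    # Build the list inside the function so every run starts from the same data
    data = list(range(size))
    return [x * 2 for x in data]

def double_array():
    # Same work, expressed as one vectorized NumPy operation
    data = np.arange(size)
    return data * 2

# Take the best of 3 runs to reduce noise from other processes
list_time = min(timeit.repeat(double_list, number=1, repeat=3))
numpy_time = min(timeit.repeat(double_array, number=1, repeat=3))

print(f"List:  {list_time:.4f} s")
print(f"NumPy: {numpy_time:.4f} s")
print(f"Speed-up: {list_time / numpy_time:.1f}x")
```

Note that both functions include the cost of creating the data, so the ratio is not directly comparable to the cell above.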


## Array Dimensions

In [13]:
#0d array (scalar)
a0 = np.array(40)
a0.ndim

0

In [14]:
#1d array - vector/line
a1 = np.array([5,6,8])
a1.ndim

1

In [15]:
#2d array - matrix or table
a2 = np.array([[10,5,7],
               [12,7,15]])
a2.ndim

2

In [16]:
# shape returns (num of rows, num of cols)
a2.shape

(2, 3)

In [17]:
a2.size #num of elements

6

In [18]:
a2 = np.array([[10,5,7],
               [12,7,np.nan]])

![im](https://miro.medium.com/v2/resize:fit:931/1*XIOuiEjfXAXOFa0-w2_pTw.jpeg)

In [19]:
a3 = np.array([[[5,6,7,8],    # layer 1
                [4,7,2,3],
                [5,8,0,3]],
               [[5,0,7,8],    # layer 2
                [1,8,2,1],
                [3,8,8,3]],
               [[5,6,7,8],    # layer 3
                [4,7,2,3],
                [4,7,2,3]]])

a3

array([[[5, 6, 7, 8],
        [4, 7, 2, 3],
        [5, 8, 0, 3]],

       [[5, 0, 7, 8],
        [1, 8, 2, 1],
        [3, 8, 8, 3]],

       [[5, 6, 7, 8],
        [4, 7, 2, 3],
        [4, 7, 2, 3]]])

In [20]:
a3.ndim

3

In [21]:
a3.shape #(layer, row, col)

(3, 3, 4)

In [22]:
print(f'This array has {a3.shape[0]} layers, {a3.shape[1]} rows, and {a3.shape[2]} columns')

This array has 3 layers, 3 rows, and 4 columns


## Arithmetic Operations in NumPy

In [23]:
arr1 = np.array([2,0,8])
arr2 = np.array([9,7,4])

In [24]:
arr1 + arr2

array([11,  7, 12])

In [25]:
arr1 * arr2

array([18,  0, 32])

#### Broadcasting 

The term broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations.

In [26]:
arr1 = np.array([1,2,3]) #1d
arr2 = np.array(2) #0d

arr1 * arr2

array([2, 4, 6])

![bc](https://numpy.org/doc/stable/_images/broadcasting_1.png)

In [27]:
import numpy as np
a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])
a + b


array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

![bc](https://numpy.org/doc/stable/_images/broadcasting_2.png)

In [28]:
b = np.array([1.0, 2.0, 3.0, 4.0])
a + b 

ValueError: operands could not be broadcast together with shapes (4,3) (4,) 

![m](https://numpy.org/doc/stable/_images/broadcasting_3.png)

In [29]:
b = np.array([[1.0], [2.0], [3.0], [4.0]]) #vertical 4-row array
a + b 

array([[ 1.,  1.,  1.],
       [12., 12., 12.],
       [23., 23., 23.],
       [34., 34., 34.]])
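
The three cases above follow one rule: shapes are compared from the right, and two dimensions are compatible when they are equal or one of them is 1. The sketch below reproduces the failing `(4,3)` + `(4,)` case and fixes it by adding an axis with `np.newaxis`. It also uses `np.broadcast_shapes`, which assumes NumPy 1.20 or newer.

```python
import numpy as np

a = np.zeros((4, 3))
b = np.array([1.0, 2.0, 3.0, 4.0])    # shape (4,)

# Shapes are compared from the right: (4, 3) vs (4,) -> 3 and 4 clash
try:
    a + b
except ValueError as err:
    print("Broadcast failed:", err)

# Adding a trailing axis turns b into a (4, 1) column, which is compatible
b_col = b[:, np.newaxis]
print(b_col.shape)          # (4, 1)
print((a + b_col).shape)    # (4, 3)

# np.broadcast_shapes reports the resulting shape without doing the arithmetic
print(np.broadcast_shapes((4, 3), (4, 1)))   # (4, 3)
print(np.broadcast_shapes((4, 3), (3,)))     # (4, 3)
```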

### Additional NumPy Functions

`reshape()`

In [30]:
arr = np.arange(1,7)
arr

array([1, 2, 3, 4, 5, 6])

In [31]:
#convert to a 2d array with 2x3 shape
arr_reshaped = arr.reshape(2,3)
arr_reshaped

array([[1, 2, 3],
       [4, 5, 6]])

In [32]:
arr_reshaped.ndim

2

In [33]:
arr = np.arange(1,13)
arr.shape

(12,)

In [34]:
arr_3d = arr.reshape(2,2,3)
arr_3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [35]:
arr_3d.reshape(-1) #flatten the array regardless of size (back to 1 d)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

`np.resize()` forces the array into the specified shape: it truncates the data when the new shape is smaller and repeats it when the new shape is larger

In [36]:
np.resize(arr,(2,3))

array([[1, 2, 3],
       [4, 5, 6]])
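
A short sketch contrasting the two functions: `reshape()` must keep the number of elements (with `-1` meaning "infer this dimension"), while `np.resize()` truncates or repeats the data to fit. The small 3-element array is just an illustrative input.

```python
import numpy as np

arr = np.arange(1, 13)          # 12 elements

# -1 asks NumPy to infer that dimension from the total size
print(arr.reshape(3, -1).shape)   # (3, 4)
print(arr.reshape(-1, 6).shape)   # (2, 6)

# reshape must keep the element count, so 12 elements cannot become 2x3
try:
    arr.reshape(2, 3)
except ValueError as err:
    print("reshape failed:", err)

# np.resize truncates when the target is smaller ...
small = np.resize(arr, (2, 3))
print(small)
# ... and repeats the data from the start when the target is larger
large = np.resize(np.array([1, 2, 3]), (2, 4))
print(large)
```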

## Arrays with String Data Type

In [37]:
a = np.array(['a','b','c']) #U means unicode string
a

array(['a', 'b', 'c'], dtype='<U1')

In [38]:
greetings = np.array(['Hello', 'Welcome'])
subjects = np.array([' Learners', ' to the Class!'])

In [39]:
#to apply functions on strings we need to use char sub-module
np.char.add(greetings, subjects)

array(['Hello Learners', 'Welcome to the Class!'], dtype='<U21')

In [40]:
# concatenate joins the 2 arrays end to end, keeping every element separate
np.concatenate((greetings,subjects))

array(['Hello', 'Welcome', ' Learners', ' to the Class!'], dtype='<U14')

In [72]:
#text formatting
a = np.array(['Hello', 'hello', 'heLLo','HELLO', 'welCome','WELcome']) 
np.unique(a)

array(['HELLO', 'Hello', 'WELcome', 'heLLo', 'hello', 'welCome'],
      dtype='<U7')

In [73]:
a = np.char.upper(a)
a

array(['HELLO', 'HELLO', 'HELLO', 'HELLO', 'WELCOME', 'WELCOME'],
      dtype='<U7')

In [74]:
np.unique(a)

array(['HELLO', 'WELCOME'], dtype='<U7')

In [75]:
np.unique(a, return_counts=True)

(array(['HELLO', 'WELCOME'], dtype='<U7'), array([4, 2]))

**Exercise** Clean the array below (remove extra spaces) and make it consistent.

In [59]:
a = np.array([' Model A ', ' Model B ', ' Model C ', ' Concept D ', ' Concept E '])

In [60]:
a = np.char.replace(a, 'Model', 'Concept')
a

array([' Concept A ', ' Concept B ', ' Concept C ', ' Concept D ',
       ' Concept E '], dtype='<U11')

In [61]:
a = np.char.strip(a)
a

array(['Concept A', 'Concept B', 'Concept C', 'Concept D', 'Concept E'],
      dtype='<U11')

> For additional string functions, see https://numpy.org/devdocs/reference/generated/numpy.char.chararray.html

In [None]:
# concatenate can glue 2 arrays together; axis=0 stacks them vertically

arr1 = np.array([[10,20],
                 [5,8]])

arr2 = np.array([[6,0],
                 [1,9]])

np.concatenate((arr1,arr2), axis=0)

array([[10, 20],
       [ 5,  8],
       [ 6,  0],
       [ 1,  9]])

In [50]:
# to stitch them together side by side, use axis=1
np.concatenate((arr1,arr2), axis=1)

array([[10, 20,  6,  0],
       [ 5,  8,  1,  9]])

In [43]:
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [46]:
arr1 = np.array([[10,20],
                 [5,8]])
arr1

array([[10, 20],
       [ 5,  8]])

In [45]:
np.pad(arr1, pad_width=1)

array([[ 0,  0,  0,  0],
       [ 0, 10, 20,  0],
       [ 0,  5,  8,  0],
       [ 0,  0,  0,  0]])

In [48]:
np.resize(arr1, (4,4))

array([[10, 20,  5,  8],
       [10, 20,  5,  8],
       [10, 20,  5,  8],
       [10, 20,  5,  8]])

In [51]:
#transpose function
arr4 = np.array([[10,20,8],
                 [5,8,9]])

arr4.T

array([[10,  5],
       [20,  8],
       [ 8,  9]])

In [52]:
a3.T

array([[[5, 5, 5],
        [4, 1, 4],
        [5, 3, 4]],

       [[6, 0, 6],
        [7, 8, 7],
        [8, 8, 7]],

       [[7, 7, 7],
        [2, 2, 2],
        [0, 8, 2]],

       [[8, 8, 8],
        [3, 1, 3],
        [3, 3, 3]]])

### Statistical Functions 

In [62]:
# generate random array
arr = np.random.rand(1,10) #2d array with shape (1, 10): 1 row of 10 random values in [0, 1)
arr

array([[0.60595497, 0.15186761, 0.33890448, 0.98481206, 0.34582998,
        0.07005466, 0.59527585, 0.28616717, 0.69132383, 0.42187157]])

In [64]:
print(f'Array average: {np.mean(arr):.2f}')

Array average: 0.45


In [66]:
print(f'Array median: {np.median(arr)}')
print(f'Array standard deviation: {np.std(arr):.2f}')
print(f'Array maximum: {np.max(arr)}')

Array median: 0.383850777954323
Array standard deviation: 0.26
Array maximum: 0.9848120626086528


In [67]:
a = np.array([[ 2.0,  7.0,  0.0],
              [12.0, 10.0, 19.0],
              [21.0, 25.0, 20.0],
              [40.0, 30.0, 32.0]])

np.mean(a) #a.mean()

18.166666666666668

![ax](https://www.sharpsightlabs.com/wp-content/uploads/2018/12/numpy-arrays-have-axes_updated_v2.png)

In [68]:
np.mean(a, axis=0)

array([18.75, 18.  , 17.75])

In [71]:
np.mean(a, axis=1)

array([ 3.        , 13.66666667, 22.        , 34.        ])
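
To make the `axis` argument more concrete, here is a small sketch that computes column and row means and then uses broadcasting to center the data. `keepdims=True` keeps the reduced axis with length 1 so the row means line up with the rows; the variable names are chosen for this example only.

```python
import numpy as np

a = np.array([[ 2.0,  7.0,  0.0],
              [12.0, 10.0, 19.0],
              [21.0, 25.0, 20.0],
              [40.0, 30.0, 32.0]])

col_means = a.mean(axis=0)                  # collapses the rows -> shape (3,)
row_means = a.mean(axis=1, keepdims=True)   # keeps a length-1 axis -> shape (4, 1)

print(col_means.shape, row_means.shape)

# Broadcasting subtracts each mean from its own column or row
centered_cols = a - col_means
centered_rows = a - row_means

print(centered_cols.mean(axis=0))   # approximately 0 for every column
print(centered_rows.mean(axis=1))   # approximately 0 for every row
```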

### Filtering in NumPy

In [78]:
arr = np.array([1,2,3,5,6,7,8,9,10,12,13,22,30,8,9,18])


In [79]:
# get the elements in the array that are less than 13
arr_filt = arr[arr<13]
arr_filt

array([ 1,  2,  3,  5,  6,  7,  8,  9, 10, 12,  8,  9])

In [80]:
#multiple conditions
arr_filt = arr[(arr<13)&(arr>4)]
arr_filt

array([ 5,  6,  7,  8,  9, 10, 12,  8,  9])
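
The expression inside the brackets is itself a boolean array (a mask), which is why `&` is used instead of `and`. A minimal sketch of working with the mask directly, including `|` (or) and `~` (not):

```python
import numpy as np

arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 22, 30, 8, 9, 18])

mask = arr < 13            # a boolean array with the same shape as arr
print(mask.dtype)          # bool
print(mask.sum())          # number of elements that pass the filter: 12

# | is element-wise OR and ~ is element-wise NOT; the parentheses are required
print(arr[(arr < 3) | (arr > 20)])   # [ 1  2 22 30]
print(arr[~mask])                    # [13 22 30 18]
```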

Using `where()`

In [81]:
arr_filt = arr[np.where(arr>5)]
arr_filt

array([ 6,  7,  8,  9, 10, 12, 13, 22, 30,  8,  9, 18])

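`np.where()` can work like the `IF()` function in Excel: it takes a condition, the value to use where the condition is true, and the value to use where it is false.

`IF(condition, then, else)`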

In [83]:
arr

array([ 1,  2,  3,  5,  6,  7,  8,  9, 10, 12, 13, 22, 30,  8,  9, 18])

In [82]:
np.where(arr>4, True, False)

array([False, False, False,  True,  True,  True,  True,  True,  True,
        True,  True,  True,  True,  True,  True,  True])

In [98]:
np.where(arr>4, 'more than 4', 'less than 4')

array(['less than 4', 'less than 4', 'less than 4', 'more than 4',
       'more than 4', 'more than 4', 'more than 4', 'more than 4',
       'more than 4', 'more than 4', 'more than 4', 'more than 4',
       'more than 4', 'more than 4', 'more than 4', 'more than 4'],
      dtype='<U11')

In [91]:
np.where(arr>4,
         np.where(arr<20, 'less than 20', arr),
         'less than 4')

array(['less than 4', 'less than 4', 'less than 4', 'less than 20',
       'less than 20', 'less than 20', 'less than 20', 'less than 20',
       'less than 20', 'less than 20', 'less than 20', '22', '30',
       'less than 20', 'less than 20', 'less than 20'], dtype='<U21')
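
Nested `where()` calls get hard to read once there are more than two branches. One possible alternative is `np.select`, which takes a list of conditions and a matching list of choices. The labels below are made up for this sketch and differ slightly from the cell above.

```python
import numpy as np

arr = np.array([1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 22, 30, 8, 9, 18])

# Conditions are checked in order; the first one that matches wins
conditions = [arr <= 4, arr < 20]
labels = ['4 or less', 'less than 20']

# default is used where none of the conditions match
result = np.select(conditions, labels, default='20 or more')
print(result)
```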

In [92]:
def square_val(x):
    return x ** 2

def double_val(x):
    return x * 2

In [93]:
np.where(arr>=10, double_val(arr), square_val(arr))

array([ 1,  4,  9, 25, 36, 49, 64, 81, 20, 24, 26, 44, 60, 64, 81, 36])

In [None]:
np.where(arr>=10, (lambda x: x-5)(arr), square_val(arr))

array([ 1,  4,  9, 25, 36, 49, 64, 81,  5,  7,  8, 17, 25, 64, 81, 13])

In [95]:
np.where((lambda x: x %2 == 0)(arr), 0, arr)

array([ 1,  0,  3,  5,  0,  7,  0,  9,  0,  0, 13,  0,  0,  0,  9,  0])

In [104]:
a = np.array([4,4,5,2,3,4])

org, dup_cnts = np.unique(a, return_counts=True)
dup = org[dup_cnts>1]
dup

array([4])

In [105]:
np.where(np.isin(a, dup), np.nan,a)

array([nan, nan,  5.,  2.,  3., nan])

In [None]:
a = np.array([[ 2.0,  7.0,  0.0],
              [12.0, 10.0, 19.0],
              [21.0, 25.0, 20.0],
              [40.0, 30.0, 32.0]])


**Exercise** If a number is greater than 5, subtract 3; otherwise, leave it as is.

In [89]:
np.where(a>5,a-3,a)

array([[ 2.,  4.,  0.],
       [ 9.,  7., 16.],
       [18., 22., 17.],
       [37., 27., 29.]])

### Slicing and Dicing Using Indices

![ind](https://www.oreilly.com/api/v2/epubs/9781449323592/files/httpatomoreillycomsourceoreillyimages2172112.png)

In [107]:
a = np.array([[ 2.0,  7.0,  0.0],
              [12.0, 10.0, 19.0],
              [21.0, 25.0, 20.0]])


[row pos, col pos]

In [108]:
a[0,0]

2.0

In [109]:
a[2,1]

25.0

In [None]:
a[2] #3rd row

array([21., 25., 20.])

In [None]:
a[:,1] #2nd col

array([ 7., 10., 25.])

In [None]:
#ranges
a[1:] #last 2 rows

array([[12., 10., 19.],
       [21., 25., 20.]])

[row range, col range]

In [113]:
a[1:, 1:]

array([[10., 19.],
       [25., 20.]])

In [115]:
a[:2,1:]

array([[ 7.,  0.],
       [10., 19.]])

In [116]:
a = np.array([[ 2.0,  7.0,  0.0],
              [12.0, 10.0, 19.0],
              [21.0, 25.0, 20.0],
              [4   , 5   , 8   ]])


In [117]:
a[1:,1:]

array([[10., 19.],
       [25., 20.],
       [ 5.,  8.]])

In [119]:
a[:3,1]

array([ 7., 10., 25.])

In [121]:
a[1:3,1:]

array([[10., 19.],
       [25., 20.]])

In [122]:
a[:,1] = np.where(a[:,1]>7,8,a[:,1])
a

array([[ 2.,  7.,  0.],
       [12.,  8., 19.],
       [21.,  8., 20.],
       [ 4.,  5.,  8.]])

In [144]:
a[-1]

array([4., 5., 8.])
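
One thing to keep in mind when slicing: a basic slice is a view on the original data, not a copy, so writing to it changes the original array. A short sketch (the names `view`, `independent`, and `filtered` are only for illustration):

```python
import numpy as np

a = np.array([[ 2.0,  7.0,  0.0],
              [12.0, 10.0, 19.0],
              [21.0, 25.0, 20.0]])

view = a[1:, 1:]          # basic slicing returns a view, not new data
view[0, 0] = -1.0         # ... so this also changes a[1, 1]
print(a[1, 1])                        # -1.0
print(np.shares_memory(a, view))      # True

independent = a[1:, 1:].copy()        # .copy() allocates separate storage
independent[0, 0] = 99.0
print(a[1, 1])                        # still -1.0
print(np.shares_memory(a, independent))   # False

# Boolean (mask) indexing returns a copy
filtered = a[a > 10]
print(np.shares_memory(a, filtered))      # False
```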

### Data Generation and Importing in NumPy

In [123]:
np.random.randint(low=1,high=55, size=40)

array([43,  6,  8,  2,  2,  3, 37, 36,  3, 10, 29, 44, 42, 35, 12, 50,  5,
       35, 14, 45, 28, 26, 50, 25, 32, 45, 32, 31, 12,  4, 22, 17, 54, 36,
       54, 10, 39, 39, 14, 51])

In [134]:
np.random.randint(low=1,high=55, size=(10,10))

array([[17, 17, 41, 23, 20, 46, 30, 37, 34, 30],
       [41, 52, 28,  5, 49, 10, 43,  5, 54,  9],
       [46, 22, 39, 47, 30, 41, 21,  3, 17, 32],
       [10, 36, 15,  4, 49, 17, 10, 33, 32, 49],
       [ 4, 38, 37, 18, 30, 24,  4, 13, 51, 35],
       [12, 28, 50, 25, 41, 28,  6, 26, 37, 26],
       [16,  7, 40, 39, 24, 13,  1,  4, 32, 16],
       [46, 25, 25, 29, 22, 16, 26, 29,  4, 37],
       [45,  6, 43, 35, 14, 28, 43,  1,  4, 18],
       [ 9, 44, 44, 40, 49, 12, 45, 15, 43, 39]])
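
The random cells above give different numbers on every run. If reproducible results are needed, one option is a seeded generator created with `np.random.default_rng`. This is a sketch; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded Generator produces the same sequence on every run
rng = np.random.default_rng(seed=42)   # 42 is an arbitrary example seed

ints = rng.integers(low=1, high=55, size=(3, 4))   # high is exclusive, like randint
floats = rng.random(size=5)                        # uniform values in [0, 1)

print(ints)
print(floats)

# Re-creating the generator with the same seed reproduces the draws
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(ints, rng_again.integers(low=1, high=55, size=(3, 4)))
print(same)   # True
```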

In [None]:
# build a sequence between 2 numbers
# start, stop, number of points (stop is included)
np.linspace(2,8,4)

array([2., 4., 6., 8.])
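
`linspace()` and `arange()` both build sequences, but they are controlled differently: `arange()` takes a step and excludes the stop value, while `linspace()` takes the number of points and includes it by default. A small side-by-side sketch:

```python
import numpy as np

# arange: you choose the step; the stop value is excluded
stepped = np.arange(2, 8, 2)
print(stepped)                   # [2 4 6]

# linspace: you choose the number of points; the stop value is included
spaced = np.linspace(2, 8, 4)
print(spaced)                    # [2. 4. 6. 8.]

# endpoint=False drops the stop value, and retstep=True also returns the spacing
points, step = np.linspace(0, 1, 5, endpoint=False, retstep=True)
print(points)
print(step)
```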

In [139]:
np.linspace(0,1,500)

array([0.        , 0.00200401, 0.00400802, 0.00601202, 0.00801603,
       0.01002004, 0.01202405, 0.01402806, 0.01603206, 0.01803607,
       0.02004008, 0.02204409, 0.0240481 , 0.0260521 , 0.02805611,
       0.03006012, 0.03206413, 0.03406814, 0.03607214, 0.03807615,
       0.04008016, 0.04208417, 0.04408818, 0.04609218, 0.04809619,
       0.0501002 , 0.05210421, 0.05410822, 0.05611222, 0.05811623,
       0.06012024, 0.06212425, 0.06412826, 0.06613226, 0.06813627,
       0.07014028, 0.07214429, 0.0741483 , 0.0761523 , 0.07815631,
       0.08016032, 0.08216433, 0.08416834, 0.08617234, 0.08817635,
       0.09018036, 0.09218437, 0.09418838, 0.09619238, 0.09819639,
       0.1002004 , 0.10220441, 0.10420842, 0.10621242, 0.10821643,
       0.11022044, 0.11222445, 0.11422846, 0.11623246, 0.11823647,
       0.12024048, 0.12224449, 0.1242485 , 0.12625251, 0.12825651,
       0.13026052, 0.13226453, 0.13426854, 0.13627255, 0.13827655,
       0.14028056, 0.14228457, 0.14428858, 0.14629259, 0.14829

### Importing text/csv files in NumPy

In [142]:
data = np.loadtxt('dummy_data.txt', delimiter=',', dtype=int)
data

array([[ 5,  1],
       [ 5,  3],
       [ 9,  5],
       [ 5,  7],
       [ 8,  8],
       [ 5,  9],
       [ 6,  0],
       [ 5, 11],
       [ 4,  5]])

> The syntax above works because the text file is in the same location as the notebook. If it is not, you need to specify the full path.

In [None]:
data = np.loadtxt('/bassel/document/dummy_data.txt', delimiter=',', dtype=int)
data

In [143]:
data = np.genfromtxt('dummy_data.txt', delimiter=',', dtype=int)
data

array([[ 5,  1],
       [ 5,  3],
       [ 9,  5],
       [ 5,  7],
       [ 8,  8],
       [ 5,  9],
       [ 6,  0],
       [ 5, 11],
       [ 4,  5]])
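
On a clean file both functions return the same array. Where `genfromtxt()` differs is in its handling of missing values. The sketch below uses an in-memory `io.StringIO` object as a stand-in for a file (the contents are invented for this example, not taken from `dummy_data.txt`).

```python
import io

import numpy as np

# Invented sample data; the second row is missing its last value
raw = "5,1\n5,\n9,5\n"

# By default a missing numeric field becomes nan (so the dtype is float)
data = np.genfromtxt(io.StringIO(raw), delimiter=',')
print(data)

# filling_values replaces missing entries with a value of your choice
filled = np.genfromtxt(io.StringIO(raw), delimiter=',', filling_values=0)
print(filled)
```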