# Intro to NumPy

- Short for Numerical Python 
- It is a free, open-source library used mainly for numerical computing in scientific and engineering applications.
- It is a fundamental library in data science (DS) and machine learning (ML).
- It is a Python library for working with arrays.
- It consists of a multidimensional array object and a collection of functions for manipulating it.
- It performs mathematical and logical operations on arrays.

**Note:** The array object in NumPy is called `ndarray`.

Advantages of NumPy
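- It provides an array object that is typically much faster than Python lists for element-wise numerical operations.
- It provides supporting functions for math, statistics, reshaping, and more.
- Arrays are frequently used in data science.
- NumPy arrays store their elements in one contiguous block of memory, unlike lists, whose elements are references to separately stored objects.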
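The memory point in the last bullet can be checked directly. The sketch below is only an illustration: the `int64` dtype is chosen here so that the byte counts are predictable, and it is not something the notes above require.

```python
import numpy as np

# Fix the dtype so every element occupies exactly 8 bytes
arr = np.arange(5, dtype=np.int64)

print(arr.flags['C_CONTIGUOUS'])  # True: elements sit in one block of memory
print(arr.itemsize)               # 8 bytes per element
print(arr.nbytes)                 # 40 bytes for the whole array (5 * 8)
print(arr.strides)                # (8,): step 8 bytes to reach the next element
```

Because every element has the same size and sits next to its neighbour, NumPy can loop over the data in compiled code instead of following one reference per element.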

- To install numpy, run: `pip install numpy`
- Then, restart your kernel

In [2]:
import numpy as np  # give numpy the conventional alias np

In [2]:
# let's build our first array

arr = np.array([1,2,3])
arr

array([1, 2, 3])

In [3]:
type(arr)

numpy.ndarray

In [4]:
my_list = [3,4,5,6,8,8]

my_arr = np.array(my_list)

print(type(my_list))
print(type(my_arr))

<class 'list'>
<class 'numpy.ndarray'>


In [5]:
list(range(0,20))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]

NumPy has its own sequence-generation function, `arange()`.

In [6]:
arr = np.arange(2,20)
arr

array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
       19])

In [7]:
#using decimal steps
arr2 = np.arange(0, 5, 0.5)
arr2

array([0. , 0.5, 1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])

In [8]:
arr2.dtype

dtype('float64')

In [9]:
# negative sequence
arr2_neg = np.arange(0, -5, -0.5)
arr2_neg

array([ 0. , -0.5, -1. , -1.5, -2. , -2.5, -3. , -3.5, -4. , -4.5])

### Speed Test Between Lists and Arrays

In [10]:
import time

# Define the size of the data - 1,000,000
size = 10**6

# defining list sequence and array sequence
list_data = list(range(0,size))
numpy_array = np.arange(0,size)


# Perform element-wise multiplication using a loop (for list)
start_time = time.time()
for i in range(size):
    list_data[i] *= 2
end_time = time.time()
list_time = end_time - start_time

# Perform element-wise multiplication using NumPy
start_time = time.time()
numpy_array *= 2
end_time = time.time()
numpy_time = end_time - start_time

print(f"Time taken for list: {list_time} seconds")
print(f"Time taken for NumPy array: {numpy_time} seconds")


Time taken for list: 0.06223106384277344 seconds
Time taken for NumPy array: 0.0019981861114501953 seconds
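A single `time.time()` measurement can be noisy, so the comparison is worth repeating. The sketch below uses the standard-library `timeit` module instead. The helper names `double_list` and `double_array` and the smaller size of `10**5` are assumptions made for this illustration, and the printed timings will differ from machine to machine.

```python
import timeit
import numpy as np

size = 10**5  # smaller than the 10**6 used above so the repeats finish quickly

def double_list():
    # Build a list and double every element in Python
    data = list(range(size))
    return [x * 2 for x in data]

def double_array():
    # Build an array and double every element in one vectorized step
    data = np.arange(size)
    return data * 2

# Run each function 10 times per measurement, take the best of 3 measurements
list_best = min(timeit.repeat(double_list, number=10, repeat=3))
array_best = min(timeit.repeat(double_array, number=10, repeat=3))

print(f"list:  {list_best:.4f} s for 10 runs")
print(f"array: {array_best:.4f} s for 10 runs")
```

Taking the minimum of several repeats reduces the influence of other processes running at the same time.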


In [14]:
numpy_array

array([      0,       2,       4, ..., 1999994, 1999996, 1999998])

## Array Dimensions

In [3]:
# 0d array (scalar) - point
a0 = np.array(50)
a0.ndim #get the number of dimensions

0

In [4]:
#1d array - line or vector
a1 = np.array([4,5,1,3])
a1.ndim

1

In [7]:
a1.shape

(4,)

In [5]:
#2d array - square/matrix
a2 = np.array([[4,5,6],
               [9,7,2]])

a2.ndim

2

In [8]:
a2.shape #(row count, column count)

(2, 3)

In [13]:
#3d array - cube
a3 = np.array([[[1,2,3],
                [4,5,6],
                [10,12,3]],
                [[4,5,3],
                [9,9,2],
                [1,13,6]]
                ])

a3.ndim

3

In [10]:
type(a3)

numpy.ndarray

In [9]:
a3.shape #layer, row, column

(2, 3, 3)

In [15]:
type(a3.shape)

tuple

In [18]:
print(f'This array has {a3.shape[0]} layers, {a3.shape[1]} rows, and {a3.shape[2]} columns')

This array has 2 layers, 3 rows, and 3 columns


## Arithmetic Operations in NumPy

In [19]:
arr1 = np.array([3,3,6])
arr2 = np.array([7,9,10])

In [20]:
arr1 + arr2

array([10, 12, 16])

In [21]:
arr1 * arr2

array([21, 27, 60])

In [22]:
arr1 / arr2

array([0.42857143, 0.33333333, 0.6       ])

### Broadcasting

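Broadcasting is how NumPy performs arithmetic on arrays with different shapes: when the shapes are compatible, the smaller array is virtually stretched to match the larger one.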

In [23]:
a = np.array([4,7,5])

b = np.array(10) 

a*b

array([40, 70, 50])

![a](https://numpy.org/doc/stable/_images/broadcasting_1.png)

In [24]:

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0])
print(a.shape, b.shape)

(4, 3) (3,)


In [28]:
a + b

array([[ 1.,  2.,  3.],
       [11., 12., 13.],
       [21., 22., 23.],
       [31., 32., 33.]])

![ap](https://numpy.org/doc/stable/_images/broadcasting_2.png)

In [29]:

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])

b = np.array([1.0, 2.0, 3.0, 4.0])
print(a.shape, b.shape)

(4, 3) (4,)


In [30]:
a + b

ValueError: operands could not be broadcast together with shapes (4,3) (4,) 

![b](https://numpy.org/doc/stable/_images/broadcasting_3.png)
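NumPy compares shapes starting from the last dimension, so `(4, 3)` and `(4,)` fail because 3 and 4 do not match. One possible fix, sketched below, is to give the 1-D array a trailing axis so that its shape becomes `(4, 1)`. The variable name `b_column` is made up for this illustration.

```python
import numpy as np

a = np.array([[ 0.0,  0.0,  0.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])
b = np.array([1.0, 2.0, 3.0, 4.0])

# Add a trailing axis: shape (4,) becomes (4, 1)
b_column = b[:, np.newaxis]
print(b_column.shape)  # (4, 1)

# (4, 3) and (4, 1) are compatible: each value of b is added to a whole row
result = a + b_column
print(result)
```

With this shape, the first value of `b` is added to the first row of `a`, the second value to the second row, and so on.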

## NumPy Functions

In [31]:
arr = np.arange(1,10)
arr

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [32]:
arr.size

9

In [33]:
arr.reshape(3,3)

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [34]:
arr.reshape(3,3).ndim

2

In [39]:
arr = np.arange(1,13)
arr

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

In [36]:
arr.reshape(3,3)

ValueError: cannot reshape array of size 12 into shape (3,3)

In [37]:
arr.reshape(4,3)

array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [41]:
#convert to 3d array
arr_3d = arr.reshape(2,2,3)
arr_3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [42]:
# shortcut to flatten an array

arr_flattened = arr_3d.reshape(-1)
arr_flattened

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])
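The `-1` is not limited to flattening. It can stand in for any single dimension, and NumPy works out its size from the total number of elements. The sketch below also contrasts `flatten()` and `ravel()`, two related methods that are not used elsewhere in these notes.

```python
import numpy as np

arr = np.arange(1, 13)  # 12 elements

# Let NumPy infer one dimension from the total size
print(arr.reshape(3, -1).shape)     # (3, 4)
print(arr.reshape(-1, 2, 3).shape)  # (2, 2, 3)

grid = arr.reshape(3, 4)
flat_copy = grid.flatten()  # always returns a new copy of the data
flat_view = grid.ravel()    # returns a view of the same data when possible

print(np.shares_memory(arr, flat_copy))  # False
print(np.shares_memory(arr, flat_view))  # True
```

Only one dimension can be `-1` per call, since NumPy needs the others to solve for it.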

## Strings in NumPy

In [43]:
arr = np.array(['a', 'b', 'c'])
arr

array(['a', 'b', 'c'], dtype='<U1')

In [44]:
a = np.array(['Hello', 'Welcome'])
b = np.array([' learners', ' to the class'])

In [45]:
# to use functions on string based arrays, you need to use char submodule

np.char.add(a,b)

array(['Hello learners', 'Welcome to the class'], dtype='<U20')

In [46]:
# to join 2 arrays together
a = np.array(['Apples', 'Oranges'])
b = np.array(['Kiwi', 'Mango'])
np.concatenate((a,b))

array(['Apples', 'Oranges', 'Kiwi', 'Mango'], dtype='<U7')

Using Concatenate with Numbers

In [47]:
arr1 = np.array([[10,20],
                 [11,4],
                 [15,9]])

arr2 = np.array([[2,7],
                 [8,8]])

In [48]:
# stack the 2 arrays on top of each other (similar to UNION ALL in SQL)
np.concatenate((arr1,arr2), axis=0)

array([[10, 20],
       [11,  4],
       [15,  9],
       [ 2,  7],
       [ 8,  8]])

In [50]:
arr1 = np.array([[10,20],
                 [11,4]])

np.concatenate((arr1,arr2), axis=1)

array([[10, 20,  2,  7],
       [11,  4,  8,  8]])

In [51]:
a = np.array(['Hello', 'Welcome'])

np.char.upper(a)

array(['HELLO', 'WELCOME'], dtype='<U7')

In [53]:
a = np.array(['Model B', 'Model C', 'Model B'])

np.char.replace(a, 'B', 'X')

array(['Model X', 'Model C', 'Model X'], dtype='<U7')

## Statistical Functions in NumPy

In [54]:
arr = np.arange(4,30)
arr

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
       21, 22, 23, 24, 25, 26, 27, 28, 29])

In [55]:
np.median(arr)

16.5

In [56]:
np.mean(arr)

16.5

In [57]:
np.std(arr)

7.5
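By default, `np.std` computes the population standard deviation (it divides by `n`). For a sample standard deviation (dividing by `n - 1`), pass `ddof=1`. A short sketch using the same range of values as above:

```python
import numpy as np

arr = np.arange(4, 30)

population_std = np.std(arr)      # divides by n (ddof=0 is the default)
sample_std = np.std(arr, ddof=1)  # divides by n - 1

print(population_std)  # 7.5
print(sample_std)      # slightly larger than the population value
```

This matters when comparing against other tools, because some of them default to the sample version.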

Statistics with Axes

In [58]:
a = np.array([[ 13,  0.0,  7.0],
              [10.0, 10.0, 10.0],
              [20.0, 20.0, 20.0],
              [30.0, 30.0, 30.0]])

np.mean(a)

16.666666666666668

![ax](https://www.sharpsightlabs.com/wp-content/uploads/2018/12/numpy-arrays-have-axes_updated_v2.png)

In [61]:
np.mean(a, axis=0)

array([18.25, 15.  , 16.75])

In [60]:
np.mean(a, axis=1)

array([ 6.66666667, 10.        , 20.        , 30.        ])

In [64]:
a3 = np.array([[[1,2,3],
                [4,5,6],
                [10,12,3]],
                [[4,5,3],
                [9,9,2],
                [1,13,6]]
                ])
np.mean(a3, axis=2)

array([[2.        , 5.        , 8.33333333],
       [4.        , 6.66666667, 6.66666667]])
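One way to read the `axis` argument is that the axis you name is the one that disappears from the result. The sketch below checks the result shapes for the same 3-D array. `keepdims=True` is an optional argument, not used above, that keeps the collapsed axis with length 1.

```python
import numpy as np

a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [10, 12, 3]],
               [[4, 5, 3],
                [9, 9, 2],
                [1, 13, 6]]])

print(a3.shape)                   # (2, 3, 3)
print(np.mean(a3, axis=0).shape)  # (3, 3): layers collapsed
print(np.mean(a3, axis=1).shape)  # (2, 3): rows collapsed
print(np.mean(a3, axis=2).shape)  # (2, 3): columns collapsed

# keepdims=True keeps the collapsed axis with length 1
print(np.mean(a3, axis=2, keepdims=True).shape)  # (2, 3, 1)
```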

### Filtering in NumPy

In [65]:
arr = np.arange(4,30)
arr

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
       21, 22, 23, 24, 25, 26, 27, 28, 29])

In [66]:
filtered_array = arr[arr<13]
filtered_array

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12])

In [67]:
filtered_array = arr[(arr<13) & (arr>4)]
filtered_array

array([ 5,  6,  7,  8,  9, 10, 11, 12])
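The expression inside the brackets is itself an array of `True`/`False` values, called a boolean mask. The sketch below shows the mask on its own, along with the `|` (or) and `~` (not) operators, which complement the `&` used above. The variable names are chosen for this illustration.

```python
import numpy as np

arr = np.arange(4, 30)

mask = arr < 13
print(mask.dtype)  # bool
print(mask.sum())  # 9: True counts as 1, so this is the number of matches

# | means "or": keep values below 6 or above 27
either_end = arr[(arr < 6) | (arr > 27)]
print(either_end)  # [ 4  5 28 29]

# ~ negates a mask: keep values that are NOT below 27
not_below_27 = arr[~(arr < 27)]
print(not_below_27)  # [27 28 29]
```

The parentheses around each condition are required, because `&`, `|`, and `~` bind more tightly than the comparison operators.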

**Exercise:** Build an array that indicates whether each person is over 18 or not.

In [69]:
np.where(arr>18, 'Above 18', 'Below 18') #similar to IF() in Excel
#(condition, do the following if it's true, do the following if it's false)

array(['Below 18', 'Below 18', 'Below 18', 'Below 18', 'Below 18',
       'Below 18', 'Below 18', 'Below 18', 'Below 18', 'Below 18',
       'Below 18', 'Below 18', 'Below 18', 'Below 18', 'Below 18',
       'Above 18', 'Above 18', 'Above 18', 'Above 18', 'Above 18',
       'Above 18', 'Above 18', 'Above 18', 'Above 18', 'Above 18',
       'Above 18'], dtype='<U8')

In [70]:
# add 5 years to each age that's over 18
np.where(arr>18, arr+5, arr)

array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 24, 25,
       26, 27, 28, 29, 30, 31, 32, 33, 34])

In [72]:
def my_sq(x):
    return x*x

np.where(arr<8, my_sq(arr), arr+2)

array([16, 25, 36, 49, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
       23, 24, 25, 26, 27, 28, 29, 30, 31])
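`np.where` handles one condition with two outcomes. For more than two outcomes, one option is `np.select`, a different NumPy function that is not used above. It takes a list of conditions and a matching list of values. The ages and labels below are made up for this illustration.

```python
import numpy as np

ages = np.array([4, 13, 18, 29])  # hypothetical ages for illustration

conditions = [ages < 13, ages < 18]  # checked in order, first match wins
labels = ['Child', 'Teen']

# Any age that matches no condition gets the default label
groups = np.select(conditions, labels, default='Adult')
print(groups)  # ['Child' 'Teen' 'Adult' 'Adult']
```

Because the first matching condition wins, the order of the conditions matters: 4 satisfies both `ages < 13` and `ages < 18`, but it is labelled `'Child'`.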

## Slicing and Dicing Arrays in NumPy

In [74]:
a = np.array([3,5,7,8,9,1,2])

a[:4]

array([3, 5, 7, 8])

![tag](https://www.oreilly.com/api/v2/epubs/9781449323592/files/httpatomoreillycomsourceoreillyimages2172112.png)

In [75]:
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

In [76]:
#[row position/range, col position/range]
arr[2,0]

7

In [77]:
arr[1,2]

6

In [79]:
arr[1:,0:2]

array([[4, 5],
       [7, 8]])

In [80]:
# a single index returns the full row at that position
arr[2]

array([7, 8, 9])

In [81]:
# to get the last row, use a negative index
arr[-1]

array([7, 8, 9])

In [82]:
arr[0]+arr[2]

array([ 8, 10, 12])

In [83]:
a3 = np.array([[[1,2,3],
                [4,5,6],
                [10,12,3]],
                [[4,5,3],
                [9,9,2],
                [1,13,6]]
                ])


In [84]:
a3[0,1,2] #first layer, second row, 3rd column

6

In [85]:
a3[1, 1:,0:2] #second layer (index 1), rows 1 to 2, columns 0 to 1

array([[ 9,  9],
       [ 1, 13]])
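One thing to keep in mind is that a basic slice is a view of the original array, not a copy, so changing the slice changes the original. A minimal sketch, with variable names chosen for this illustration:

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])

view = arr[1:, 0:2]                 # basic slicing returns a view
print(np.shares_memory(arr, view))  # True

view[0, 0] = 99   # writing through the view changes the original array
print(arr[1, 0])  # 99

independent = arr[1:, 0:2].copy()   # .copy() makes a separate array
independent[0, 0] = -1
print(arr[1, 0])  # still 99
```

Use `.copy()` whenever the slice should be modified without touching the source array.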

## Generating Data

In [100]:
# generate an array of a given size filled with random integers from low (inclusive) to high (exclusive)

np.random.randint(low=1, high=101, size=40)

array([95, 64, 10, 21, 42, 84, 59, 60, 27, 60, 88, 15,  3, 36, 42, 81, 38,
       43,  5, 18, 27, 63, 80, 26, 99, 46, 91, 98, 98, 30, 99, 38, 38, 50,
       91, 88, 53, 29,  2,  8])

In [110]:
#specify the shape of the array
np.random.randint(1, 101, (10,10))

array([[ 52,  23,  13,  13,  44,  14,  34,  16,  37,  69],
       [ 92,   9, 100,  91,  68,  59,  15,  48,  30,   5],
       [ 20,  81,  28,  25,  36,  13,   9,  31,  18,  63],
       [ 44,  12, 100,  47,  72,  25,  68,  92,  78,  51],
       [ 67,  54,   1,  23,  58,  38,  12,  15,  14,   3],
       [ 40,  27,   2,  10,  18,   5,  69,   1,  52,   7],
       [ 98,  11,  69,  58,  57,  78,  41,  11,  49,  53],
       [ 12,  45,  83,   4,   1,  29,  82,  65,  39,  76],
       [ 82,  43,  25,  86,  49,  25,  30,  70,  95,  64],
       [ 66,  24,  99,  42,  34,  24,  13,  49,  30,  31]])
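The numbers above change on every run. When results need to be reproducible, one option is NumPy's `Generator` API with a fixed seed, sketched below. The seed value `42` is arbitrary, and this API is an alternative to the `np.random.randint` calls used above, not what they do internally.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# integers() also excludes the upper bound by default, like randint()
sample_a = rng_a.integers(1, 101, size=5)
sample_b = rng_b.integers(1, 101, size=5)

print(sample_a)
print(np.array_equal(sample_a, sample_b))  # True
```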

In [112]:
np.linspace(0,1,20, retstep=True)

(array([0.        , 0.05263158, 0.10526316, 0.15789474, 0.21052632,
        0.26315789, 0.31578947, 0.36842105, 0.42105263, 0.47368421,
        0.52631579, 0.57894737, 0.63157895, 0.68421053, 0.73684211,
        0.78947368, 0.84210526, 0.89473684, 0.94736842, 1.        ]),
 0.05263157894736842)
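With `retstep=True`, `linspace` returns a tuple of the array and the step size. Because the endpoint is included by default, the step is `(stop - start) / (num - 1)`, which explains the `0.0526...` value above. A short sketch:

```python
import numpy as np

# Unpack the (array, step) tuple returned when retstep=True
values, step = np.linspace(0, 1, 20, retstep=True)
print(len(values))            # 20
print(step)                   # (1 - 0) / (20 - 1)
print(values[0], values[-1])  # 0.0 1.0 (both ends included)

# With endpoint=False the stop value is left out and the step is 1 / 20
values_open, step_open = np.linspace(0, 1, 20, endpoint=False, retstep=True)
print(step_open)              # 0.05
```

In short, `arange` takes a step and derives the number of points, while `linspace` takes the number of points and derives the step.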

### Load Data in Python

In [113]:
data = np.loadtxt('/Users/bassel_instructor/Documents/Datasets/dummy_data.txt', delimiter=',', dtype=int)
data

array([[ 5,  1],
       [ 5,  3],
       [ 9,  5],
       [ 5,  7],
       [ 8,  8],
       [ 5,  9],
       [ 6,  0],
       [ 5, 11],
       [ 4,  5]])
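The path above only exists on the instructor's machine. To try `np.loadtxt` without that file, the sketch below passes it an in-memory text buffer instead. The three rows are a small stand-in copied from the output above, not the actual contents of `dummy_data.txt`.

```python
import io
import numpy as np

# In-memory stand-in for a comma-separated text file
text = io.StringIO("5,1\n5,3\n9,5\n")

# loadtxt accepts file-like objects as well as file paths
data = np.loadtxt(text, delimiter=',', dtype=int)
print(data)
print(data.shape)  # (3, 2)
```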