# Introduction to NumPy
© Saefulloh Maslul - 2024

## What is NumPy?
NumPy stands for Numerical Python. It's the backbone of scientific and numerical computing in Python.

And since machine learning is all about turning data into numbers and then figuring out the patterns, NumPy often comes into play.

## Why NumPy?
NumPy has several advantages that make it a powerful tool for data manipulation in Python:
- **Speed** - It is much faster than Python lists for numerical work, which matters when dealing with large datasets.
- **Fixed types** - Unlike a Python list, every element in a NumPy array shares the same data type, which lets NumPy store and process the data efficiently.
- **Contiguous memory** - Array elements are stored in a contiguous block of memory, which improves performance.
- **Linear algebra** - It is optimized for linear algebra operations, a common requirement in data analysis.
- **Ecosystem** - NumPy is the foundation for many other Python libraries, making it a crucial part of the Python data ecosystem.
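To make the fixed-type point concrete, here is a minimal sketch with made-up values: a Python list keeps each element's own type, while `np.array()` picks one dtype that can hold every element (here `float64`) and converts everything to it.

```python
import numpy as np

# A Python list stores each element as its own object with its own type
py_list = [1, 2.5, 3]
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'int']

# A NumPy array stores every element with one shared dtype
arr = np.array(py_list)
print(arr.dtype)  # float64
```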

## Importing NumPy
To use NumPy, you need to import the numpy package. Most people import it as np.

In [1]:
import numpy as np
print(np.__version__)

1.26.4


## Data Types and Attributes
NumPy's main data type is ndarray, which is a multidimensional array of numbers. You can create an ndarray using the np.array() function. For example, the following code creates a one-dimensional array of numbers:

In [8]:
a = np.array([1, 2, 3, 4, 5])
print(a)

[1 2 3 4 5]


To create a two-dimensional array, you can pass a list of lists to the np.array() function. For example, the following code creates a two-dimensional array:

In [9]:
b = np.array([
    [1, 2, 3], 
    [4, 5, 6]
])
print(b)

[[1 2 3]
 [4 5 6]]


And you can create a three-dimensional array by passing a list of lists of lists to the np.array() function. For example, the following code creates a three-dimensional array:

In [10]:
c = np.array([
    [[1, 2], [3, 4]], 
    [[5, 6], [7, 8]]
])
print(c)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]


Check the type of each array:

In [11]:
print(type(a))
print(type(b))
print(type(c))

<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>

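Check the data type (`dtype`) of the elements in each array. The default integer type can depend on the platform (for example, NumPy 1.x on Windows uses `int32`):

In [12]:
print(a.dtype)
print(b.dtype)
print(c.dtype)

int64
int64
int64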

Check the shape of each array (its length along each dimension):

In [13]:
print(a.shape)
print(b.shape)
print(c.shape)

(5,)
(2, 3)
(2, 2, 2)

Check the number of dimensions (axes) of each array:

In [14]:
print(a.ndim)
print(b.ndim)
print(c.ndim)

1
2
3

Check the size of each array (the total number of elements):

In [16]:
print(a.size)
print(b.size)
print(c.size)

5
6
8
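These attributes are related to each other: `ndim` is the number of entries in `shape`, and `size` is the product of those entries. A short check on the 3D array `c` from above, redefined here so the snippet runs on its own:

```python
import numpy as np

c = np.array([
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]]
])

# ndim is the number of entries in shape
print(c.ndim == len(c.shape))  # True

# size is the product of the entries in shape: 2 * 2 * 2 = 8
print(c.size == np.prod(c.shape))  # True
```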

### Anatomy of an Array
<img src="./anatomy-of-a-numpy-array.png" height="1000">

**Key items:**
- **Array** - A list of numbers, can be multi-dimensional.
- **Scalar** - A single number (e.g. 7).
- **Vector** - A list of numbers with 1-dimension (e.g. np.array([1, 2, 3])).
- **Matrix** - A two-dimensional array of numbers (e.g. np.array([[1, 2, 3], [4, 5, 6]])).
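In terms of the attributes above, these key items map onto `ndim`: a scalar has 0 dimensions, a vector 1, and a matrix 2. A small sketch using the example values from the list:

```python
import numpy as np

scalar = np.array(7)
vector = np.array([1, 2, 3])
matrix = np.array([[1, 2, 3], [4, 5, 6]])

print(scalar.ndim, scalar.shape)  # 0 ()
print(vector.ndim, vector.shape)  # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```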

## Creating Arrays
There are many ways to create arrays in NumPy. Here are some common ways:
- `np.array()`
- `np.ones()`
- `np.zeros()`
- `np.arange()`
- `np.random.rand(5, 3)`
- `np.random.randint(10, size=5)`
- `np.random.seed()` - pseudo random numbers
- Searching the documentation (for example, finding `np.unique()` and using it)
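As an example of searching the documentation: looking for a way to get the distinct values of an array leads to `np.unique()`, which returns the sorted unique elements, and with `return_counts=True` also how often each one appears. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([3, 1, 2, 3, 1, 3])  # illustrative values

# Sorted unique values
print(np.unique(data))  # [1 2 3]

# Unique values plus how many times each one occurs
values, counts = np.unique(data, return_counts=True)
print(values)  # [1 2 3]
print(counts)  # [2 1 3]
```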

In [19]:
# Create a simple array
simple_array = np.array([1, 2, 3, 4, 5])

print(simple_array)
print(simple_array.dtype)

[1 2 3 4 5]
int64


In [22]:
# Create an array of ones
ones_array = np.ones((3, 3))

print(ones_array)
print(ones_array.dtype) # The default datatype is 'float64'

# You can change the datatype by passing the 'dtype' argument
ones_array_int = np.ones((3, 3), dtype=int)

print(ones_array_int)
print(ones_array_int.dtype)

# Or you can change the datatype with .astype()
ones_array_int = ones_array.astype(int)

print(ones_array_int)
print(ones_array_int.dtype)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
float64
[[1 1 1]
 [1 1 1]
 [1 1 1]]
int64
[[1 1 1]
 [1 1 1]
 [1 1 1]]
int64
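One detail worth knowing about `.astype(int)`: converting floats to integers truncates toward zero rather than rounding. A quick check with illustrative values:

```python
import numpy as np

floats = np.array([1.7, 2.2, -1.7])

# astype(int) drops the fractional part (truncation toward zero)
print(floats.astype(int))  # [ 1  2 -1]

# Round first if you want the nearest integer
print(np.round(floats).astype(int))  # [ 2  2 -2]
```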


In [23]:
# Create an array of zeros
zeros_array = np.zeros((3, 3))

print(zeros_array)
print(zeros_array.dtype) # The default datatype is 'float64'

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
float64


In [24]:
# Create an array within a range of values
range_array = np.arange(0, 10, 2) # (start, stop, step)

print(range_array)

[0 2 4 6 8]


In [26]:
# Create an array of random integers from 0 (inclusive) up to 10 (exclusive)
random_array = np.random.randint(0, 10, size=(3, 3))

print(random_array)

# Random array of floats in the interval [0, 1)
random_array_float = np.random.rand(3, 3)

print(random_array_float)

[[4 9 3]
 [7 7 7]
 [2 7 0]]
[[0.78407769 0.2381956  0.01392624]
 [0.15913637 0.67138439 0.71556289]
 [0.10694238 0.94984581 0.57808624]]
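Note the ranges: `np.random.randint(low, high)` draws from the half-open interval [low, high), so `high` itself never appears, and `np.random.rand()` draws floats from [0, 1). A quick sanity check; the sample size of 1000 is an arbitrary choice to make the bounds visible:

```python
import numpy as np

ints = np.random.randint(0, 10, size=1000)
floats = np.random.rand(1000)

# randint excludes the upper bound: values are 0..9
print(ints.min() >= 0, ints.max() <= 9)  # True True

# rand returns floats in [0, 1)
print(floats.min() >= 0.0, floats.max() < 1.0)  # True True
```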


NumPy uses pseudo-random numbers: they look random, but they are produced by a deterministic algorithm, so the whole sequence is determined by its starting point (the seed). For consistency, you might want the random numbers you generate to stay the same throughout your experiments. To do this, you can use `np.random.seed()`.

Setting the seed tells NumPy, "Generate random numbers, but always start from this same point in the sequence."

Let's see it.

In [33]:
# Set random seed
np.random.seed(0)

random_array_1 = np.random.randint(10, size=(3, 3))
print(random_array_1)

# If you run this cell again, random_array_1 will be the same

[[5 0 3]
 [3 7 9]
 [3 5 2]]


Try setting the seed to a different number and see how the output changes.

Because the seed here is 0, this cell always produces the same numbers, and so does any other cell that calls `np.random.seed(0)` before making the same draw.
Setting `np.random.seed()` is not strictly necessary, but it is helpful for keeping numbers the same throughout your experiments.
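To see the effect of the seed directly, you can reset it to the same value before two draws and compare them with `np.array_equal()`. With seed 0, the draw matches the output shown above:

```python
import numpy as np

np.random.seed(0)
first = np.random.randint(10, size=(3, 3))

np.random.seed(0)  # reset to the same seed
second = np.random.randint(10, size=(3, 3))

# Same seed, same call: identical arrays
print(np.array_equal(first, second))  # True
```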

## Viewing arrays and matrices (indexing)

Remember the anatomy of an array? It's a grid of numbers. To view a whole array, you can use the `print()` function or simply evaluate the array in a cell. To view part of it, use indexing (`array[i]`) and slicing (`array[start:stop]`). For example:

In [39]:
print(random_array)

print(random_array[0]) # First row of the array
print(random_array[1]) # Second row of the array
print(random_array[2]) # Third row of the array

print(random_array[0][0]) # First element of the first row
print(random_array[1][1]) # Second element of the second row
print(random_array[2][2]) # Third element of the third row

# Get the first 2 columns of the first 2 rows
print(random_array[:2, :2])

[[4 9 3]
 [7 7 7]
 [2 7 0]]
[4 9 3]
[7 7 7]
[2 7 0]
4
7
0
[[4 9]
 [7 7]]
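A few more indexing patterns, shown on a fixed array with the same values as `random_array` above (hard-coded so the example does not depend on the random state): comma-separated indices, negative indices that count from the end, and `:` to select a whole column.

```python
import numpy as np

arr = np.array([[4, 9, 3],
                [7, 7, 7],
                [2, 7, 0]])

# arr[row, col] gives the same element as arr[row][col] in a single step
print(arr[1, 2])    # 7

# Negative indices count from the end
print(arr[-1])      # [2 7 0]
print(arr[-1, -1])  # 0

# ':' selects everything along an axis, so this is the second column
print(arr[:, 1])    # [9 7 7]
```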


## Manipulating & comparing arrays
- Arithmetic
    - `+`, `-`, `*`, `/`, `//`, `**`
    - `np.exp()`, `np.log()`, `np.sqrt()`
    - `np.dot()` - Dot product
    - Broadcasting
- Aggregation
    - `np.sum()`
    - `np.mean()`
    - `np.std()`
    - `np.var()`
    - `np.min()`
    - `np.max()` 
    - `np.argmin()` - Find the index of the minimum value
    - `np.argmax()` - Find the index of the maximum value
- Reshaping
    - `np.reshape()`
- Transposing
    - `array.T` 
- Comparison
    - `>`, `<`, `>=`, `<=`, `==`, `!=`
- Sorting
    - `np.sort()` 
    - `np.argsort()` - Returns the indices that would sort an array
    - `np.argmax()` - Returns the indices of the maximum values along an axis
    - `np.argmin()` - Returns the indices of the minimum values along an axis

### Arithmetic

In [50]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Element-wise addition
print(a + b)

[5 7 9]


In [51]:
# Element-wise subtraction
print(a - b)

[-3 -3 -3]


In [52]:
# Element-wise multiplication
print(a * b)

[ 4 10 18]


In [53]:
# Element-wise division
print(a / b)

[0.25 0.4  0.5 ]


In [54]:
# Floor division (divides, then rounds down to the nearest whole number)
print(a // b)

[0 0 0]
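Floor division rounds down (toward negative infinity), which differs from simply removing the decimals only when negative numbers are involved. For example:

```python
import numpy as np

x = np.array([7, -7])

print(x / 2)   # [ 3.5 -3.5]
print(x // 2)  # [ 3 -4]: -3.5 rounds down to -4, not -3
```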


In [55]:
# Element-wise power
print(a ** b)

[  1  32 729]


In [56]:
# Exponential
print(np.exp(a))

[ 2.71828183  7.3890561  20.08553692]


In [57]:
# Logarithm
print(np.log(a))

[0.         0.69314718 1.09861229]


In [58]:
# Square root
print(np.sqrt(a))

[1.         1.41421356 1.73205081]


In [60]:
# Broadcasting lets NumPy perform operations on arrays of different shapes.
# Here the scalar is applied to every element of a.
print(a + 10)
print(a * 10)
print(a / 10)

[11 12 13]
[10 20 30]
[0.1 0.2 0.3]
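Broadcasting also works between two arrays, not just an array and a scalar. NumPy compares shapes starting from the last dimension; two dimensions are compatible when they are equal or one of them is 1 (a missing dimension counts as 1). A small sketch with illustrative values, adding a `(3,)` row and then a `(2, 1)` column to a `(2, 3)` matrix:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])     # shape (3,)

# row is stretched across both rows of matrix
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# A column of shape (2, 1) is stretched across the columns instead
col = np.array([[100], [200]])
print(matrix + col)
# [[101 102 103]
#  [204 205 206]]
```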


### Aggregation

In [61]:
a = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]])

print(np.sum(a)) # Sum of all elements
print(np.mean(a)) # Mean of all elements
print(np.std(a)) # Standard deviation
print(np.var(a)) # Variance
print(np.min(a)) # Minimum value
print(np.max(a)) # Maximum value
print(np.argmin(a)) # Index of the minimum value (in the flattened array)
print(np.argmax(a)) # Index of the maximum value (in the flattened array)


120
8.0
4.320493798938574
18.666666666666668
1
15
0
14
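By default these functions reduce over every element, which is why `np.argmin(a)` and `np.argmax(a)` return positions in the flattened array (0 and 14). Passing `axis` aggregates along one dimension instead, and `np.unravel_index()` converts a flat index back into a (row, column) pair. A sketch on the same array:

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5],
              [6, 7, 8, 9, 10],
              [11, 12, 13, 14, 15]])

print(np.sum(a, axis=0))  # column sums: [18 21 24 27 30]
print(np.sum(a, axis=1))  # row sums: [15 40 65]
print(np.max(a, axis=1))  # max of each row: [ 5 10 15]

# Convert the flat index of the maximum into (row, column)
flat_idx = np.argmax(a)  # 14
row, col = np.unravel_index(flat_idx, a.shape)
print(row, col)  # 2 4
```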


**What's the mean?**
The mean is the same as the average. You find it by adding the numbers up and dividing by how many there are.

**What's standard deviation?**
Standard deviation is a measure of how spread out numbers are. It is the square root of the variance.

**What's variance?**
The variance is the average of the squared differences from the mean.

To work it out, you:

1. Work out the mean
2. For each number, subtract the mean and square the result
3. Find the average of the squared differences
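The three steps translate directly into NumPy code. This sketch uses the `low_var_array` values from the variance demo and checks the result against `np.var()`:

```python
import numpy as np

low_var_array = np.array([2, 4, 6, 8, 10])

# 1. Work out the mean
mean = low_var_array.mean()  # 6.0

# 2. Subtract the mean from each number and square the result
squared_diffs = (low_var_array - mean) ** 2  # 16, 4, 0, 4, 16

# 3. Average the squared differences
variance = squared_diffs.mean()

print(variance, np.var(low_var_array))  # 8.0 8.0
print(np.sqrt(variance))  # the standard deviation, about 2.83
```

By default `np.var()` and `np.std()` divide by the number of elements (the population variance); passing `ddof=1` divides by n - 1 instead, which gives the sample variance.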

In [63]:
# Demo of variance
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])

np.var(high_var_array), np.var(low_var_array)

(4296133.472222221, 8.0)

In [65]:
# The standard deviation is the square root of the variance
print(np.sqrt(np.var(high_var_array)))
print(np.std(high_var_array))

2072.711623024829
2072.711623024829


### Reshaping

In [70]:
a = np.array([
    [1, 2, 3], 
    [4, 5, 6]
])
b = np.array([
    [7, 8], 
    [9, 10], 
    [11, 12]
])

print(a.shape)
print(b.shape)

(2, 3)
(3, 2)


In [71]:
# Adding a and b directly raises a ValueError because their shapes,
# (2, 3) and (3, 2), are not compatible for broadcasting.
# Reshape a to the shape of b first.

a_reshaped = a.reshape(3, 2)
print(a_reshaped)
print(a_reshaped.shape)

# Now we can add a and b
print(a_reshaped + b)

[[1 2]
 [3 4]
 [5 6]]
(3, 2)
[[ 8 10]
 [12 14]
 [16 18]]
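`reshape()` only works when the new shape holds exactly the same number of elements, and you can pass `-1` for one dimension to let NumPy work it out. A short sketch on the same `a`:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])  # 6 elements

# -1 means "infer this dimension": 6 elements / 2 columns = 3 rows
print(a.reshape(-1, 2).shape)  # (3, 2)

# Flatten to 1D
print(a.reshape(-1))  # [1 2 3 4 5 6]

# 6 elements cannot fill a 4 x 2 array, so this raises a ValueError
try:
    a.reshape(4, 2)
except ValueError as err:
    print("ValueError:", err)
```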


### Transposing

In [76]:
# Transposing is a common operation in linear algebra where the rows and columns of a matrix are swapped.
a = np.array([
    [1, 2, 3], 
    [4, 5, 6]
])

print(a)
print(a.T)

[[1 2 3]
 [4 5 6]]
[[1 4]
 [2 5]
 [3 6]]


In [78]:
# Transposing a 1D array does nothing: it has only one axis, so there is nothing to swap
b = np.array([1, 2, 3])

print(b)
print(b.T)

[1 2 3]
[1 2 3]


In [79]:
# Transposing a 3D array: .T reverses the order of the axes,
# so the element at c[i, j, k] moves to position [k, j, i]
c = np.array([
    [[1, 2], [3, 4]], 
    [[5, 6], [7, 8]]
])

print(c)
print(c.T)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
[[[1 5]
  [3 7]]

 [[2 6]
  [4 8]]]
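Because every axis of `c` has length 2, its shape does not change under `.T`. With unequal axis lengths it is easier to see that the axis order is reversed. It is also worth contrasting `.T` with `reshape()`: both can turn a `(2, 3)` array into a `(3, 2)` one, but `reshape()` keeps the elements in row order while `.T` swaps rows and columns.

```python
import numpy as np

d = np.zeros((2, 3, 4))
print(d.T.shape)  # (4, 3, 2): the axis order is reversed

a = np.array([[1, 2, 3],
              [4, 5, 6]])

print(a.reshape(3, 2))  # rows refilled in order: [[1 2] [3 4] [5 6]]
print(a.T)              # rows and columns swapped: [[1 4] [2 5] [3 6]]
```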


### Comparison

In [80]:
a = np.array([1, 5, 7, 4])
b = np.array([2, 3, 9, 4])

print(a > b)
print(a < b)
print(a >= b)
print(a <= b)
print(a == b)
print(a != b)

[False  True False False]
[ True False  True False]
[False  True False  True]
[ True False  True  True]
[False False False  True]
[ True  True  True False]


In [82]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[1, 2, 3], [4, 5, 6]])

print(a == b)
print(a != b)

[[ True  True  True]
 [ True  True  True]]
[[False False False]
 [False False False]]
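Comparisons return boolean arrays, which are useful beyond printing: `.all()` and `.any()` collapse them into a single answer, `np.array_equal()` checks whether two arrays have the same shape and elements, and a boolean array can be used as a mask to select elements. A sketch using the arrays from the first comparison example:

```python
import numpy as np

a = np.array([1, 5, 7, 4])
b = np.array([2, 3, 9, 4])

print((a == b).all())        # False: not every element matches
print((a == b).any())        # True: at least one does (the last one)
print(np.array_equal(a, a))  # True

# Boolean masking: keep only the elements of a that are greater than b
print(a[a > b])              # [5]
```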


### Sorting

In [83]:
a = np.array([3, 2, 0, 1])
print(np.sort(a))

[0 1 2 3]


In [87]:
a = np.array([[3, 2, 0], [1, 5, 4]])
# By default, np.sort() sorts along the last axis (each row separately)
print(np.sort(a))

[[0 2 3]
 [1 4 5]]


In [88]:
a = np.array([3, 2, 0, 1])
print(np.argsort(a)) # Returns the indices that would sort an array

[2 3 1 0]


In [89]:
a = np.array([3, 2, 0, 1])
print(np.argmax(a)) # Returns the indices of the maximum values along an axis

0


In [90]:
a = np.array([3, 2, 0, 1])
print(np.argmin(a)) # Returns the indices of the minimum values along an axis

2
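`np.sort()` sorts along the last axis by default, which is why each row of the 2D example was sorted independently. Passing `axis=0` sorts each column instead, and reversing a sorted array gives descending order. The indices from `np.argsort()` can also reorder a second array; the `labels` below are made up for illustration.

```python
import numpy as np

a = np.array([[3, 2, 0],
              [1, 5, 4]])

# Sort each column instead of each row
print(np.sort(a, axis=0))
# [[1 2 0]
#  [3 5 4]]

v = np.array([3, 2, 0, 1])
print(np.sort(v)[::-1])  # descending: [3 2 1 0]

# Use the sorting indices of v to reorder a parallel array
labels = np.array(["c", "b", "z", "a"])  # hypothetical labels
print(labels[np.argsort(v)])  # ['z' 'a' 'b' 'c']
```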


### Dot product

So what's the dot product?

The dot product is where we multiply matching members, then sum up:

```
[1, 2, 3] • [4, 5, 6] = 1*4 + 2*5 + 3*6 = 32
```

You can compute the dot product with `np.dot()`. For two 2D arrays, `np.dot()` performs matrix multiplication.

In [85]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

np.dot(a, b)

32

In [86]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

np.dot(a, b)

array([[19, 22],
       [43, 50]])

## Conclusion

NumPy is a powerful library for numerical computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays. NumPy is the foundation for many other Python libraries, including pandas, scikit-learn, and TensorFlow. By learning NumPy, you will be well-equipped to work with data in Python and to build machine learning models.

In this notebook, you learned about the anatomy of an array, its data types and attributes, and how to create, view, manipulate, and compare arrays using NumPy.
