
# **Introduction to NumPy**

Objective: In this tutorial, we will take a closer look at the NumPy library and its powerful data manipulation and analysis functionality.

Duration: Approximately 1 hour

## **Introduction to NumPy**

**a. Introducing NumPy for Numerical Computing:**

NumPy (Numerical Python) is a powerful library in Python that provides support for large, multi-dimensional arrays and matrices, along with a wide range of mathematical functions to operate on these arrays efficiently. NumPy is widely used in scientific computing, data analysis, machine learning, and other numerical tasks due to its speed and ease of use.

**b. The Concept of Arrays and Their Advantages over Lists:**

Arrays in NumPy are similar to Python lists, but they offer several advantages for numerical computing:

1. **Efficient Element-Wise Operations:** NumPy allows efficient element-wise operations on arrays, making it faster and more convenient to perform mathematical operations on large datasets compared to using Python lists.

2. **Memory Efficiency:** NumPy arrays are more memory-efficient than Python lists because they store elements of the same data type contiguously in memory. This results in reduced memory overhead and better performance.

3. **Broadcasting:** NumPy supports broadcasting, which enables element-wise operations between arrays of different shapes as long as those shapes are compatible (for example, adding a scalar to every element of an array).

4. **Support for Mathematical Functions:** NumPy provides a wide range of mathematical functions, such as trigonometric, logarithmic, statistical, and linear algebra operations, making it a versatile library for numerical computations.

5. **Array Slicing and Indexing:** NumPy offers powerful array slicing and indexing capabilities, allowing easy extraction and manipulation of data.
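
To make the list-versus-array difference concrete, here is a minimal sketch comparing how the same operators behave on a Python list and on a NumPy array. The variable names are illustrative only.

```python
import numpy as np

values_list = [1, 2, 3]
values_array = np.array([1, 2, 3])

# On a list, * repeats the sequence and + concatenates two lists
list_times_two = values_list * 2                 # [1, 2, 3, 1, 2, 3]
list_plus_list = values_list + values_list       # [1, 2, 3, 1, 2, 3]

# On an array, the same operators act element by element
array_times_two = values_array * 2               # array([2, 4, 6])
array_plus_array = values_array + values_array   # array([2, 4, 6])

# Element-wise math on a list needs an explicit Python loop
list_squared = [v ** 2 for v in values_list]
array_squared = values_array ** 2                # no loop needed

# Every element of an array shares one fixed-size data type
print(values_array.dtype, values_array.itemsize)
```

The printed dtype depends on the platform, but it is the same for every element, which is what lets NumPy store the values contiguously.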

**c. Creating Arrays and Performing Basic Array Operations:**

To use NumPy, you need to install it first. You can install NumPy using `pip`:

```
pip install numpy
```

Once installed, you can import NumPy in your Python script as follows:

```python
import numpy as np
```

Now, let's see how to create arrays and perform basic array operations:

```python
import numpy as np

# Create a 1D array from a Python list
arr1d = np.array([1, 2, 3, 4, 5])

# Create a 2D array (matrix) from a list of lists
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Get the dimensions of the arrays
print("Shape of arr1d:", arr1d.shape)
print("Shape of arr2d:", arr2d.shape)

# Access elements in the array using indexing
print("Element at index 2 in arr1d:", arr1d[2])
print("Element at row 1, column 2 in arr2d:", arr2d[1, 2])

# Perform basic operations on arrays
arr_sum = arr1d + arr1d
arr_product = arr1d * 2
arr_squared = arr1d ** 2

# Broadcasting: Operations between arrays of different shapes
arr_broadcast = arr1d + 10

# Statistical operations
arr_mean = np.mean(arr1d)
arr_max = np.max(arr1d)

print("arr1d + arr1d:", arr_sum)
print("arr1d * 2:", arr_product)
print("arr1d squared:", arr_squared)
print("Broadcasted arr1d:", arr_broadcast)
print("Mean of arr1d:", arr_mean)
print("Max value of arr1d:", arr_max)
```

In this example, we create 1D and 2D arrays using NumPy's `array()` function. We then perform basic array operations, such as addition, multiplication, and broadcasting, as well as statistical operations, such as mean and maximum value.

NumPy's capabilities extend beyond these basic operations, providing a wealth of functionalities for numerical computing. You can explore more advanced topics like array broadcasting, array slicing, and matrix operations to unlock the full potential of NumPy for scientific and numerical computing tasks.

## **NumPy Array Operations**

**a. Element-Wise Operations on Arrays:**

NumPy allows for efficient element-wise operations on arrays, where the corresponding elements of two arrays are combined using mathematical operators. Each operation is applied to every pair of elements independently, and when the two arrays have the same shape, the result has that shape as well.

```python
import numpy as np

# Create two 1D arrays
arr1 = np.array([1, 2, 3, 4, 5])
arr2 = np.array([10, 20, 30, 40, 50])

# Element-wise addition
result_add = arr1 + arr2

# Element-wise subtraction
result_subtract = arr1 - arr2

# Element-wise multiplication
result_multiply = arr1 * arr2

# Element-wise division
result_divide = arr1 / arr2

print("Array 1:", arr1)
print("Array 2:", arr2)
print("Addition:", result_add)
print("Subtraction:", result_subtract)
print("Multiplication:", result_multiply)
print("Division:", result_divide)
```
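
The same operators work on 2D arrays. The sketch below, using illustrative arrays `a` and `b`, shows that `*` is element-wise rather than matrix multiplication (which uses `@`), and that arrays whose shapes cannot be broadcast together raise a `ValueError`.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[10, 20], [30, 40]])

# Element-wise operators keep the (2, 2) shape
product = a * b            # [[10, 40], [90, 160]]
power = a ** 2             # [[1, 4], [9, 16]]

# Matrix multiplication is a different operation and uses @
matrix_product = a @ b     # [[70, 100], [150, 220]]

# Shapes (3,) and (2,) are incompatible, so NumPy raises ValueError
try:
    np.array([1, 2, 3]) + np.array([1, 2])
except ValueError as err:
    print("Shape mismatch:", err)
    mismatch_raised = True
else:
    mismatch_raised = False
```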

**b. Broadcasting in Array Operations:**

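Broadcasting is a powerful feature in NumPy that allows element-wise operations between arrays of different shapes. NumPy compares the two shapes starting from the trailing (rightmost) dimension. Two dimensions are compatible when they are equal or when one of them is 1. Dimensions of size 1, as well as missing leading dimensions, are stretched to match the other array without copying the data. If any pair of dimensions is incompatible, NumPy raises a `ValueError`.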

```python
import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])

# Broadcasting: Add a scalar to each element of the array
result_broadcast = arr + 10

print("Original Array:", arr)
print("Broadcasted Array:", result_broadcast)
```

In this example, the scalar value 10 is broadcast to every element of the 1D array `arr`.
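
Broadcasting is not limited to scalars. The following sketch, with illustrative array names, stretches a 1D row and a column vector across a 2D array, following the trailing-dimension rule.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])       # shape (2, 3)
row = np.array([10, 20, 30])         # shape (3,)
column = np.array([[100], [200]])    # shape (2, 1)

# (2, 3) + (3,) -> (2, 3): the row is repeated for each matrix row
row_added = matrix + row             # [[11, 22, 33], [14, 25, 36]]

# (2, 3) + (2, 1) -> (2, 3): the column is repeated for each matrix column
column_added = matrix + column       # [[101, 102, 103], [204, 205, 206]]

# (3,) + (2, 1) -> (2, 3): both inputs are stretched into a full grid
grid = row + column                  # [[110, 120, 130], [210, 220, 230]]
```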

**c. Array Slicing and Indexing Techniques:**

NumPy allows for powerful array slicing and indexing, making it easy to extract and manipulate subsets of arrays.

```python
import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])

# Slicing: extract elements at indices 1 through 3 (the stop index 4 is exclusive)
sliced_array = arr[1:4]

print("Original Array:", arr)
print("Sliced Array:", sliced_array)
```

In this example, the slice `arr[1:4]` extracts the elements at indices 1 through 3 (the stop index 4 is excluded), resulting in the array `[2, 3, 4]`.
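
Slices also accept a step, negative indices, and one slice per dimension. The sketch below uses illustrative arrays to show these features. It also shows that a basic slice is a *view* that shares memory with the original array, so writing through it changes the original unless you call `.copy()`.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

every_other = arr[::2]      # step of 2 -> [10, 30, 50]
last_two = arr[-2:]         # negative start counts from the end -> [40, 50]
reversed_arr = arr[::-1]    # negative step reverses -> [50, 40, 30, 20, 10]

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
second_column = grid[:, 1]  # all rows, column 1 -> [2, 5, 8]
top_left = grid[:2, :2]     # first two rows and columns -> [[1, 2], [4, 5]]

# A basic slice is a view: writing through it modifies the original array
data = np.array([1, 2, 3, 4, 5])
view = data[1:3]
view[0] = 99                # data is now [1, 99, 3, 4, 5]

# .copy() produces an independent array
independent = data[1:3].copy()
independent[0] = -1         # data is unchanged
```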

**d. Array Operations like Broadcasting and Aggregation:**

NumPy provides a variety of array operations beyond element-wise operations, including broadcasting and aggregation functions.

```python
import numpy as np

# Create a 2D array
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Broadcasting: Add a scalar to the entire array
broadcasted_array = arr + 10

# Aggregation: Compute the sum and mean of all elements
sum_of_elements = np.sum(arr)
mean_of_elements = np.mean(arr)

print("Original Array:")
print(arr)
print("Broadcasted Array:")
print(broadcasted_array)
print("Sum of Elements:", sum_of_elements)
print("Mean of Elements:", mean_of_elements)
```

In this example, broadcasting is used to add the scalar value 10 to the entire 2D array `arr`. Additionally, aggregation functions like `np.sum()` and `np.mean()` are used to calculate the sum and mean of all elements in the array.

NumPy's array operations, including broadcasting and aggregation, make it a powerful library for performing numerical computations efficiently and conveniently. Students can explore more advanced operations and functions available in NumPy for scientific computing and data analysis tasks.

## Aggregations: Min, Max, and Everything In Between

Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question.
Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the "typical" values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).

NumPy has fast built-in aggregation functions for working on arrays; we'll discuss and demonstrate some of them here.

### Summing the Values in an Array

As a quick example, consider computing the sum of all values in an array.
Python itself can do this using the built-in ``sum`` function:

In [None]:
import numpy as np
L = np.random.random(100)
sum(L)

48.43306925758312

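The syntax is quite similar to that of NumPy's ``sum`` function, and the result is the same in the simplest case. The two values may differ in the last digit because the floating-point additions are performed in a different order: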

In [None]:
np.sum(L)

48.43306925758311

In [None]:
%timeit sum(L)
%timeit np.sum(L)

12.9 µs ± 3.95 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
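3.85 µs ± 126 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Because NumPy's ``sum`` runs in compiled code, it was roughly three times faster than Python's built-in ``sum`` in the run above (exact timings depend on the machine).
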
### Minimum and Maximum

Similarly, Python has built-in ``min`` and ``max`` functions, which find the minimum and maximum value of any iterable, including an array:

In [None]:
big_array = np.random.rand(1000000)
min(big_array), max(big_array)

(8.924255656683755e-08, 0.9999992303732037)

NumPy's corresponding functions have similar syntax, and again operate much more quickly:

In [None]:
np.min(big_array), np.max(big_array)

(8.924255656683755e-08, 0.9999992303732037)

Whenever possible, make sure that you are using the NumPy version of these aggregates when operating on NumPy arrays!

In [None]:
%timeit min(big_array)
%timeit np.min(big_array)

71.4 ms ± 1.55 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
384 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


### Multidimensional aggregates

One common type of aggregation operation is an aggregate along a row or column.
Say you have some data stored in a two-dimensional array:

In [None]:
M = np.random.random((3, 4))
print(M)

[[0.70068266 0.57310636 0.49072334 0.50066492]
 [0.5549067  0.43787666 0.82076915 0.22311522]
 [0.33520357 0.69326758 0.39145078 0.65617501]]


By default, each NumPy aggregation function will return the aggregate over the entire array:

In [None]:
np.sum(M)

6.377941951065398

Aggregation functions take an additional argument specifying the *axis* along which the aggregate is computed. For example, we can find the minimum value within each column by specifying ``axis=0``:

In [None]:
M.min(axis=0)

array([0.33520357, 0.43787666, 0.39145078, 0.22311522])

The function returns four values, one for each of the four columns. ``axis=0`` collapses the rows, so the minimum is taken down each column.

Similarly, we can find the maximum value within each row by specifying ``axis=1``:

In [None]:
M.max(axis=1)

array([0.70068266, 0.82076915, 0.69326758])
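
Because `M` is random, it can be hard to check these results by eye. Here is a small deterministic sketch, using an illustrative `scores` array, that makes the meaning of `axis` easy to verify. The axis you pass is the one that gets collapsed.

```python
import numpy as np

scores = np.array([[1, 5, 3],
                   [4, 2, 6]])      # shape (2, 3): 2 rows, 3 columns

# axis=0 collapses the rows, leaving one result per column
column_sums = scores.sum(axis=0)    # [5, 7, 9]
column_max = scores.max(axis=0)     # [4, 5, 6]

# axis=1 collapses the columns, leaving one result per row
row_sums = scores.sum(axis=1)       # [9, 12]
row_min = scores.min(axis=1)        # [1, 2]

# The function form np.mean(scores, axis=1) is equivalent to the method form
row_means = np.mean(scores, axis=1)  # [3., 4.]
```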

### Other aggregation functions

The following table provides a list of useful aggregation functions available in NumPy:

|Function Name      |   NaN-safe Version  | Description                                   |
|-------------------|---------------------|-----------------------------------------------|
| ``np.sum``        | ``np.nansum``       | Compute sum of elements                       |
| ``np.prod``       | ``np.nanprod``      | Compute product of elements                   |
| ``np.mean``       | ``np.nanmean``      | Compute mean of elements                      |
| ``np.std``        | ``np.nanstd``       | Compute standard deviation                    |
| ``np.var``        | ``np.nanvar``       | Compute variance                              |
| ``np.min``        | ``np.nanmin``       | Find minimum value                            |
| ``np.max``        | ``np.nanmax``       | Find maximum value                            |
| ``np.argmin``     | ``np.nanargmin``    | Find index of minimum value                   |
| ``np.argmax``     | ``np.nanargmax``    | Find index of maximum value                   |
| ``np.median``     | ``np.nanmedian``    | Compute median of elements                    |
| ``np.percentile`` | ``np.nanpercentile``| Compute rank-based statistics of elements     |
| ``np.any``        | N/A                 | Evaluate whether any elements are true        |
| ``np.all``        | N/A                 | Evaluate whether all elements are true        |
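
The NaN-safe versions matter as soon as data contains missing values, which are commonly encoded as `np.nan` in floating-point arrays. The sketch below uses an illustrative `readings` array to compare a plain aggregate with its NaN-safe counterpart.

```python
import numpy as np

readings = np.array([1.0, np.nan, 3.0])

plain_sum = np.sum(readings)           # nan: a single NaN propagates to the result
safe_sum = np.nansum(readings)         # 4.0: NaN values are ignored
safe_mean = np.nanmean(readings)       # 2.0: mean of the two valid values
safe_argmax = np.nanargmax(readings)   # 2: index of 3.0

# np.any reduces a boolean array; here it detects missing values
has_missing = np.any(np.isnan(readings))  # True
```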


## **Advanced NumPy Operations**

**a. More Complex Array Manipulations using Boolean Indexing and Fancy Indexing:**

NumPy offers advanced array manipulation techniques, including boolean indexing and fancy indexing, to efficiently extract, modify, and manipulate array elements based on specific conditions or predefined index arrays.

**Boolean Indexing:**

Boolean indexing allows you to use boolean masks to select elements from an array that satisfy certain conditions.

```python
import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5])

# Boolean mask to select elements greater than 2
mask = arr > 2

# Select elements using the mask
selected_elements = arr[mask]

print("Original Array:", arr)
print("Boolean Mask:", mask)
print("Selected Elements:", selected_elements)
```

In this example, we create a boolean mask `mask` to select elements from the array `arr` that are greater than 2. The result is `[3, 4, 5]`.
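
Masks can be combined and can also appear on the left-hand side of an assignment. In the sketch below (illustrative data), note that Python's `and`/`or` do not work element-wise on arrays. Instead, use `&`, `|`, and `~`, and put each comparison in parentheses because these operators bind more tightly than `>` and `<`.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Combine conditions element-wise: & (and), | (or), ~ (not)
between = arr[(arr > 2) & (arr < 6)]   # [3, 4, 5]
outside = arr[(arr < 2) | (arr > 5)]   # [1, 6]
odd = arr[~(arr % 2 == 0)]             # [1, 3, 5]

# A mask on the left of = modifies only the selected elements
clipped = arr.copy()
clipped[clipped > 4] = 4               # [1, 2, 3, 4, 4, 4]

# Summing a mask counts the matches, because True counts as 1
count_above_3 = np.sum(arr > 3)        # 3
```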

**Fancy Indexing:**

Fancy indexing allows you to use integer arrays to access specific elements from an array.

```python
import numpy as np

# Create a 1D array
arr = np.array([10, 20, 30, 40, 50])

# Integer array with indices of elements to select
indices = np.array([1, 3])

# Select elements using fancy indexing
selected_elements = arr[indices]

print("Original Array:", arr)
print("Indices:", indices)
print("Selected Elements:", selected_elements)
```

In this example, we create an integer array `indices` with the indices of elements we want to select from the array `arr`. The result is `[20, 40]`.
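
Fancy indexing goes further than picking a list of positions. In the sketch below (illustrative arrays), the result takes the shape of the index array and indices may repeat. On a 2D array, paired row and column index arrays select individual elements, while a single list selects whole rows. Fancy indexing also works for assignment.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# The result has the shape of the index array; repeated indices are allowed
picked = arr[np.array([[0, 4], [2, 2]])]   # [[10, 50], [30, 30]]

grid = np.arange(12).reshape(3, 4)         # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
rows = np.array([0, 2])
cols = np.array([1, 3])

# Paired indices select grid[0, 1] and grid[2, 3]
pairs = grid[rows, cols]                   # [1, 11]

# A single list of row indices selects whole rows, in the given order
selected_rows = grid[[2, 0]]               # [[8, 9, 10, 11], [0, 1, 2, 3]]

# Fancy indexing can also be used to assign to selected positions
updated = arr.copy()
updated[[1, 3]] = 0                        # [10, 0, 30, 0, 50]
```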

**b. Array Reshaping and Concatenation:**

**Array Reshaping:**

NumPy allows you to reshape arrays, converting them into different shapes without changing the data.

```python
import numpy as np

# Create a 1D array
arr = np.array([1, 2, 3, 4, 5, 6])

# Reshape the array to a 2D array with 2 rows and 3 columns
reshaped_array = arr.reshape(2, 3)

print("Original Array:")
print(arr)
print("Reshaped Array:")
print(reshaped_array)
```

In this example, we reshape the 1D array `arr` into a 2D array with 2 rows and 3 columns. The result is:

```
Original Array:
[1 2 3 4 5 6]
Reshaped Array:
[[1 2 3]
 [4 5 6]]
```
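
Reshaping only works when the total number of elements stays the same. The sketch below (illustrative arrays) shows `-1` used to let NumPy infer one dimension, `ravel()` used to flatten back to 1D, and the `ValueError` raised for an incompatible shape.

```python
import numpy as np

arr = np.arange(1, 13)            # 12 elements: 1..12

# -1 asks NumPy to infer that dimension from the total size
three_rows = arr.reshape(3, -1)   # shape (3, 4)
column = arr.reshape(-1, 1)       # shape (12, 1)

# ravel() flattens back to a 1D array
flat = three_rows.ravel()         # [1, 2, ..., 12]

# 5 * 3 = 15 elements cannot hold 12 values, so this raises ValueError
try:
    arr.reshape(5, 3)
except ValueError as err:
    print("Cannot reshape:", err)
    reshape_failed = True
else:
    reshape_failed = False
```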

**Array Concatenation:**

NumPy allows you to concatenate multiple arrays along a specified axis.

```python
import numpy as np

# Create two 1D arrays
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenate the arrays end to end (a 1D array has only axis 0)
concatenated_array = np.concatenate((arr1, arr2))

print("Array 1:", arr1)
print("Array 2:", arr2)
print("Concatenated Array:", concatenated_array)
```

In this example, we concatenate `arr1` and `arr2` end to end along `axis=0` (the only axis of a 1D array), resulting in the array `[1, 2, 3, 4, 5, 6]`.
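
For 2D arrays, the `axis` argument chooses the direction: `axis=0` stacks rows vertically and `axis=1` joins columns horizontally. The other dimensions must match. The sketch below (illustrative arrays) also uses `np.vstack` and `np.hstack`, which are convenience wrappers for the same idea.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])

# axis=0 stacks rows vertically; the number of columns must match
stacked_rows = np.concatenate((top, bottom), axis=0)   # [[1, 2], [3, 4], [5, 6]]

right = np.array([[7], [8]])
# axis=1 joins columns horizontally; the number of rows must match
stacked_cols = np.concatenate((top, right), axis=1)    # [[1, 2, 7], [3, 4, 8]]

# vstack treats 1D inputs as rows; hstack joins 1D inputs end to end
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
as_rows = np.vstack((a, b))   # [[1, 2, 3], [4, 5, 6]]
as_line = np.hstack((a, b))   # [1, 2, 3, 4, 5, 6]
```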

Students can use these advanced techniques to efficiently manipulate arrays, perform data filtering, and combine arrays for various data analysis and scientific computing tasks. Understanding boolean indexing, fancy indexing, array reshaping, and concatenation expands the possibilities of working with NumPy arrays effectively.

## Sorting Arrays

### Fast Sorting in NumPy: ``np.sort`` and ``np.argsort``

Although Python has built-in ``sort`` and ``sorted`` functions to work with lists, we won't discuss them here because NumPy's ``np.sort`` function turns out to be much more efficient and useful for our purposes.
By default, ``np.sort`` uses an $\mathcal{O}(N\log N)$ *quicksort*-based algorithm (introsort in current NumPy versions). *mergesort*, *heapsort*, and *stable* can also be selected through the ``kind`` argument. For most applications, the default is more than sufficient.

To return a sorted version of the array without modifying the input, you can use ``np.sort``:

In [None]:
x = np.array([2, 1, 4, 3, 5])
np.sort(x)

array([1, 2, 3, 4, 5])

If you prefer to sort the array in-place, you can instead use the ``sort`` method of arrays:

In [None]:
x.sort()
print(x)

[1 2 3 4 5]


A related function is ``argsort``, which instead returns the *indices* of the sorted elements:

In [None]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

[1 0 3 2 4]


The **first element** of this result gives the **index of the smallest element**, the **second value** gives the **index of the second smallest**, and so on.


In [None]:
x[i]
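
One common idiom for a descending sort is to reverse an ascending one with `[::-1]`. The indices returned by `argsort` can also reorder a second array that is aligned with the first. Both are shown in the sketch below; the `names` array is purely illustrative.

```python
import numpy as np

x = np.array([2, 1, 4, 3, 5])

# Reverse an ascending sort to get descending order
descending = np.sort(x)[::-1]      # [5, 4, 3, 2, 1]

# argsort indices reorder a second array that is aligned with x
names = np.array(["b", "a", "d", "c", "e"])
order = np.argsort(x)              # [1, 0, 3, 2, 4]
names_by_x = names[order]          # ['a', 'b', 'c', 'd', 'e']
```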

### Sorting along rows or columns

A useful feature of NumPy's sorting algorithms is the ability to sort along specific rows or columns of a multidimensional array using the ``axis`` argument. For example:

In [None]:
rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))
print(X)

[[6 3 7 4 6 9]
 [2 6 7 4 3 7]
 [7 2 5 4 1 7]
 [5 1 4 0 9 5]]


In [None]:
# sort each column of X
np.sort(X, axis=0)

array([[2, 1, 4, 0, 1, 5],
       [5, 2, 5, 4, 3, 7],
       [6, 3, 7, 4, 6, 7],
       [7, 6, 7, 4, 9, 9]])

In [None]:
# sort each row of X
np.sort(X, axis=1)

array([[3, 4, 6, 6, 7, 9],
       [2, 3, 4, 6, 7, 7],
       [1, 2, 4, 5, 7, 7],
       [0, 1, 4, 5, 5, 9]])

Keep in mind that this treats each row or column as an independent array, and any relationships between the row or column values will be lost!
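
When the rows must stay intact, a common approach is to compute an ordering from one column with `argsort` and then apply that ordering to the whole array with fancy indexing. The sketch below regenerates the same `X` as above and sorts its rows by the first column. Using `kind="stable"` is an optional choice that keeps tied rows in their original order.

```python
import numpy as np

rand = np.random.RandomState(42)
X = rand.randint(0, 10, (4, 6))

# Row order that sorts the first column
order = np.argsort(X[:, 0], kind="stable")

# Reorder whole rows, so each row's values stay together
X_by_first_column = X[order]
print(X_by_first_column)
```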

### Partial Sorts: Partitioning

Sometimes we're not interested in sorting the entire array, but simply want to find the *K* smallest values in the array. NumPy provides this in the ``np.partition`` function. ``np.partition`` takes an array and a number *K* and returns a new array in which the element at position *K* is in its final sorted place. The smallest *K* values lie to its left and the remaining values to its right, in arbitrary order:

In [None]:
x = np.array([71, 25, 33, 18, 63, 51, 42])
np.partition(x, 3)

Note that the first three values in the resulting array are the three smallest in the array, and the remaining array positions contain the remaining values.
Within the two partitions, the elements have arbitrary order.
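
Because the two partitions are unordered, getting the *K* smallest values in sorted order takes one more step: sort only the first *K* slots, which is cheaper than sorting the whole array. The sketch below does this for the same `x`. It also uses `np.argpartition`, which returns the indices of the partitioned elements instead of their values.

```python
import numpy as np

x = np.array([71, 25, 33, 18, 63, 51, 42])
k = 3

# The first k slots hold the k smallest values, in arbitrary order
smallest_k = np.partition(x, k)[:k]
smallest_k_sorted = np.sort(smallest_k)    # [18, 25, 33]

# argpartition gives the positions of those values in x
idx = np.argpartition(x, k)[:k]
values_at_idx = x[idx]                     # 18, 25, 33 in some order
```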

Similarly to sorting, we can partition along an arbitrary axis of a multidimensional array:

In [None]:
X

In [None]:
np.partition(X, 2, axis=1)

The result is an array where the first two slots in each row contain the smallest values from that row, with the remaining values filling the remaining slots.




## **Practice Exercises**

**Exercise 1: Boolean Indexing**

Given an array of temperatures (in Celsius), use boolean indexing to find and print temperatures above 30 degrees.

**Exercise 2: Fancy Indexing**

Given an array of integers, use fancy indexing to extract all elements at even indices and store them in a new array.

**Exercise 3: Array Reshaping**

Create a 1D array of numbers from 1 to 20. Reshape the array into a 2D array with 4 rows and 5 columns.

**Exercise 4: Array Concatenation**

Create two 1D arrays of equal length. Concatenate these arrays horizontally and vertically to create two new arrays.

**Exercise 5: Element-Wise Operations**

Create two 2D arrays of the same shape. Perform element-wise addition, subtraction, multiplication, and division on these arrays and store the results in separate arrays.

**Exercise 6: Array Broadcasting**

Create a 1D array of numbers from 1 to 5. Add a constant value of 10 to the array using broadcasting.

**Exercise 7: Array Aggregation**

Create a 2D array of random integers. Calculate and print the sum, mean, maximum, and minimum of all elements in the array.

**Exercise 8: Array Sorting**

Create a 1D array of random integers. Sort the array in ascending and descending order.

**Exercise 9: Transposing Arrays**

Create a 2D array and then transpose it (swap rows and columns) to obtain a new array.

**Exercise 10: Universal Functions**

Create a 1D array of numbers. Use NumPy's universal functions (`np.sin`, `np.cos`, `np.exp`, etc.) to calculate the corresponding trigonometric and exponential values for each element.

**Exercise 11: Random Number Generation**

Use NumPy's random number generation functions to create arrays of random integers and random floating-point numbers.

**Exercise 12: Linear Algebra Operations**

Create two square 2D arrays and compute their matrix product. Then compute the inverse of each matrix (choose matrices that are invertible).