## Numpy through Problem solving
#### Author - Harshwardhan Fartale

In [None]:
import warnings
warnings.filterwarnings('ignore', category=RuntimeWarning)
import numpy as np

# Creating a large list and numpy array
l = list(range(100000))
a = np.arange(100000)

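# Timing the numpy operation
# Note: the sum of cubes here (about 2.5e19) exceeds the int64 range, so NumPy's
# integer result wraps around silently; the timing comparison is still meaningful
%time np.sum(a ** 3)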

# Timing the list comprehension operation
%time sum([x ** 3 for x in l])


### What are Arrays and Matrices in NumPy?

> In NumPy, there is only one data structure: NumPy arrays. A NumPy array can be one-dimensional, two-dimensional, or 1000-dimensional. It’s one concept to rule them all.

The NumPy array is the core object of the whole NumPy library. You have to know it by heart before you can go on
and understand the operations provided by the NumPy library.

![image.png](attachment:image.png)

In [None]:
import numpy as np

a=np.array([1,2,3],dtype=np.int16)

print(a) # [1 2 3]
print(a.dtype) # int16

In [None]:
b=np.array([1,2,3],dtype=np.float64)
print(b) # [1. 2. 3.]
print(b.dtype) # float64
# Observation - even though we pass a list of integers, NumPy converts the elements to np.float64 because we requested that dtype explicitly
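A NumPy array always has a single dtype for all of its elements. The short sketch below (the variable names are illustrative only) shows how NumPy picks a common type on its own when the input list mixes integers and floats, or numbers and strings.

```python
import numpy as np

# Integers mixed with a float: everything is stored as float64
mixed_numbers = np.array([1, 2.5, 3])
print(mixed_numbers.dtype)  # float64

# Numbers mixed with a string: everything is stored as a Unicode string
mixed_text = np.array([1, "two", 3])
print(mixed_text.dtype.kind)  # U
```

If the automatic choice is not what you want, pass dtype explicitly as in the cells above.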



## Axes & the Shape of an Array
NumPy arrays come with two restrictions. The first concerns data types.
> NumPy does not simply store a bunch of arbitrarily typed data values; for that purpose you can use lists. Instead, NumPy imposes a strict homogeneous data type.

**The second is that NumPy, in contrast to lists, organizes data
in fixed-size axes. Axes represent the different dimensions
of NumPy arrays.**

You may or may not represent your high-dimensional data, say n-dimensional, with n axes.
For example, **it is possible to represent a 3-dimensional point
using a one-dimensional (single-axis) array, e.g., [1, 2, 3]**.

In [None]:
array_1d = np.array([1, 2, 3])
print(array_1d.shape)

In [None]:
array_2d = np.array([[1, 2, 3]])
print(array_2d.shape)

In [None]:
array_3d = np.array([[[1, 2, 3]]])
print(array_3d.shape)

### Pro Tip
> If you want to know the number of axes of a NumPy array, count the number of opening brackets “[” until reaching the first numerical value.

In [None]:
a = np.array([1, 2, 3])
print(a.ndim)
# 1
b = np.array([[1, 2], [2, 3], [3, 4]])
print(b.ndim)
# 2
c = np.array([[[1, 2], [2, 3], [3, 4]],
[[1, 2], [2, 3], [3, 4]]])
print(c.ndim)
# 3

But there is another important piece of information you
will often need to know about a NumPy array: the shape.
The shape gives you not only the number of axes but also
the number of elements along each axis, that is, the size of each dimension.
In [None]:
a = np.array([1, 2, 3])
print(a.shape)
# (3,)
b = np.array([[1, 2], [2, 3], [3, 4]])
print(b.shape)
# (3, 2)
c = np.array([[[1, 2], [2, 3], [3, 4]],[[1, 2], [2, 3], [3, 4]]])
print(c.shape)
# (2, 3, 2)


The axes are ordered from the outermost to the innermost nesting level. The number of axes is stored in the ndim property. The shape property represents the number of elements in each axis.
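As a quick check of how these attributes relate, the sketch below builds a small array (the shape is picked only for illustration) and compares ndim, shape, and size.

```python
import numpy as np

c = np.zeros((2, 3, 2))

print(c.ndim)   # 3
print(c.shape)  # (2, 3, 2)
print(c.size)   # 12

# ndim is the length of the shape tuple
print(c.ndim == len(c.shape))  # True
# size is the product of the entries of shape
print(c.size == int(np.prod(c.shape)))  # True
```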


In [None]:
#Quiz
# 2D numpy array
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.shape)
# 3D numpy array
b = np.array([[[1, 2], [3, 4], [5, 6]],
[[1, 2], [3, 4], [5, 6]]])
print(b.shape)

##### But what if you want to create huge arrays with thousands of values?

In [None]:
a = np.zeros((10, 10, 10, 10, 10))
print(a.shape)
# (10, 10, 10, 10, 10)
b = np.zeros((2,3))
print(b)
# [[0. 0. 0.]
# [0. 0. 0.]]
c = np.ones((3, 2, 2))
print(c)
# [[[1. 1.]
# [1. 1.]]
#
# [[1. 1.]
# [1. 1.]]
#
# [[1. 1.]
# [1. 1.]]]
print(c.dtype)
# float64


Note that the values are created as floats: np.float64 is the default data type for zeros() and ones(). But what if you want to create a NumPy array of integer values? Just pass the desired data type through the dtype argument of the ones() or zeros() functions.

In [None]:
a = np.zeros((2,3), dtype=np.int16)
print(a)
# [[0 0 0]
# [0 0 0]]
print(a.dtype)
# int16

### The Numpy arange function
The NumPy function np.arange([start, ]stop[, step]) creates a new NumPy array with evenly spaced numbers between start (inclusive, default 0) and stop (exclusive) with the given step size (default 1). For example, np.arange(1, 6, 2) creates the NumPy array [1, 3, 5].

In [None]:
a = np.arange(2, 10)
print(a)
# [2 3 4 5 6 7 8 9]
b = np.arange(2, 10, 2)
print(b)
# [2 4 6 8]
c = np.arange(2, 10, 2, dtype=np.float64)
print(c)
# [2. 4. 6. 8.]

> If you want to create an evenly spaced sequence of float values in a specific interval, don’t use the NumPy arange function. The documentation discourages this because of how arange handles the interval boundaries with floating-point steps. Instead, the official NumPy documentation
recommends using the NumPy linspace() function.

The NumPy linspace function works much like the NumPy arange function. But there is one important difference: instead of defining the step size, you define the number of elements in the interval between the start and stop
values. By default, the stop value is included in the result.

In [None]:
a = np.linspace(0.5, 9.5, 10)
print(a)
# [0.5 1.5 2.5 3.5 4.5 5.5 6.5 7.5 8.5 9.5]
b = np.linspace(0.5, 9.5, 5)
print(b)
# [0.5 2.75 5. 7.25 9.5 ]
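To see why float steps are discouraged with arange, the sketch below compares both functions on the same interval (the values 1.0, 1.3, and 0.1 are picked only for illustration). With a float step, the length of the arange result depends on floating-point rounding, so it may contain one element more than you expect and may even reach the stop value. linspace always returns exactly the number of elements you ask for.

```python
import numpy as np

# Float step: the number of elements depends on floating-point rounding
stepped = np.arange(1.0, 1.3, 0.1)
print(len(stepped), stepped)

# Fixed element count: linspace returns exactly num values and includes the stop value
spaced = np.linspace(1.0, 1.3, 4)
print(len(spaced), spaced)
```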

### How Does Indexing and Slicing Work in NumPy?
In NumPy, you have to differentiate between one-dimensional arrays and multi-dimensional arrays because slicing works differently for both.

In [None]:
## Slicing in Python Examples
a = np.arange(0, 10)
print(a)
# [0 1 2 3 4 5 6 7 8 9]
print(a[:])
# [0 1 2 3 4 5 6 7 8 9]
print(a[1:])
# [1 2 3 4 5 6 7 8 9]

print(a[1:3])
# [1 2]
print(a[1:-1])
# [1 2 3 4 5 6 7 8]
print(a[::2])
# [0 2 4 6 8]
print(a[1::2])
# [1 3 5 7 9]
print(a[::-1])
# [9 8 7 6 5 4 3 2 1 0]
print(a[:1:-2])
# [9 7 5 3]
print(a[-1:1:-2])
# [9 7 5 3]



> ### Question - Why is a[-1:1:-2] exactly the same as a[:1:-2]?

<details>
<summary>Click for Answer</summary>
If you have studied Python’s slicing thoroughly, you may remember that the default start index for negative step sizes is -1, the last element.
</details>
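One way to check this answer yourself is Python's built-in slice.indices() method, which resolves omitted or negative bounds for a sequence of a given length. A minimal sketch:

```python
import numpy as np

a = np.arange(10)

# slice.indices(length) resolves omitted or negative bounds for a sequence of that length
print(slice(None, 1, -2).indices(len(a)))  # (9, 1, -2)
print(slice(-1, 1, -2).indices(len(a)))    # (9, 1, -2)

# Both slices therefore select the same elements
print(np.array_equal(a[:1:-2], a[-1:1:-2]))  # True
```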

### Important Point
But in contrast to regular slicing, NumPy is a bit more powerful. See the next example of how NumPy handles the assignment of a value to an extended slice.

In [None]:
# l = list(range(10))
# l[::2] = 999  # Attempts to assign 999 to every element at an even index
# Throws TypeError --> must assign iterable to extended slice
a = np.arange(10)
a[::2] = 999
print(a)
# [999   1 999   3 999   5 999   7 999   9]
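The difference comes from broadcasting: NumPy repeats a scalar across every selected position, while a plain list insists on an iterable with one value per position. The following sketch (the name numbers is illustrative) puts both behaviors side by side.

```python
import numpy as np

# A plain list needs an iterable with one value per selected position
numbers = list(range(10))
numbers[::2] = [999] * 5
print(numbers)  # [999, 1, 999, 3, 999, 5, 999, 7, 999, 9]

# Assigning a scalar to an extended slice of a list raises a TypeError
try:
    numbers[::2] = 0
except TypeError as error:
    print("TypeError:", error)

# A NumPy array accepts either a scalar (broadcast) or one value per position
a = np.arange(10)
a[::2] = [10, 20, 30, 40, 50]
print(a)  # [10  1 20  3 30  5 40  7 50  9]
```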


### Slicing in Multi-dimensional Arrays
For multi-dimensional slices, you can use one-dimensional slicing for each axis separately. You define the slices for each axis, separated by a comma. 

In [None]:
a = np.arange(16)
a = a.reshape((4,4))
a

In [None]:
print(a[:, 1])
#  Second column:

In [None]:
print(a[1, :])
# Second row


In [None]:
print(a[1, ::2])
#  Second row, every other element


In [None]:
print(a[:, :-1])
#  All columns except last one

In [None]:
print(a[:-1])
# Same as a[:-1, :]

As you can see in the above examples, slicing multidimensional NumPy arrays is easy - if you know NumPy arrays and how to slice one-dimensional arrays. The most important information to remember is that you can slice each axis separately. If you don’t specify the slice notation for a specific axis, the interpreter applies the default slicing (i.e., the colon :).
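The default slicing of unspecified axes is easiest to see on an array with three axes. The sketch below (the 2 x 3 x 4 shape is arbitrary) also uses the Ellipsis object (...), which stands in for as many full slices as are needed.

```python
import numpy as np

c = np.arange(24).reshape((2, 3, 4))

# Omitted trailing axes default to a full slice
print(np.array_equal(c[1], c[1, :, :]))  # True

# Ellipsis (...) stands for as many full slices as needed
print(np.array_equal(c[..., 0], c[:, :, 0]))  # True
print(c[..., 0].shape)  # (2, 3)
```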

Instead of defining the slice to carve out a sequence of elements from an axis, you can select an arbitrary combination of elements from the NumPy array. How? Simply specify a boolean array with exactly the same shape. If the boolean value at position (i,j) is True, the element will be selected, otherwise not. As simple as that. Here is an example.

In [None]:
a = np.arange(9)
a = a.reshape((3,3))
print(a)
# [[0 1 2]
# [3 4 5]
# [6 7 8]]
b = np.array(
[[ True, False, False],
[ False, True, False],
[ False, False, True]])
print(a[b])
# Flattened array with selected values from a
# [0 4 8]
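In practice, you rarely type out the boolean array by hand. A comparison on the array produces one with the right shape, as this small sketch shows (the conditions are picked only for illustration).

```python
import numpy as np

a = np.arange(9).reshape((3, 3))

# A comparison returns a boolean array with the same shape as a
mask = a > 4
print(mask.shape)  # (3, 3)

# Selecting with the mask returns a flattened array of the matching values
print(a[mask])  # [5 6 7 8]

# Boolean masks also work on the left-hand side of an assignment
a[a % 2 == 0] = -1
print(a)
# [[-1  1 -1]
#  [ 3 -1  5]
#  [-1  7 -1]]
```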

### Learning Numpy through problem solving

#### Problem 1: Descriptive Statistics Calculator
Question - Write a Python function to calculate various descriptive statistics metrics for a given dataset. The function should take a list or NumPy array of numerical values and return a dictionary containing mean, median, mode, variance, standard deviation, percentiles (25th, 50th, 75th), and interquartile range (IQR).

Example:

Input:
```python
[10, 20, 30, 40, 50]
```
Output:
```python
{
    'mean': 30.0,
    'median': 30.0,
    'mode': 10,
    'variance': 200.0,
    'standard_deviation': 14.1421,
    '25th_percentile': 20.0,
    '50th_percentile': 30.0,
    '75th_percentile': 40.0,
    'interquartile_range': 20.0
}
```

<details>
<summary>Click for Answer</summary>

```python

import numpy as np 
def descriptive_statistics(data):
    mean = np.mean(data)
    median = np.median(data)
    vals, counts = np.unique(data, return_counts=True)
    mode = vals[counts == np.max(counts)][0] 
    variance = np.var(data)
    std_dev = np.std(data)
    percentiles = [np.percentile(data, 25), np.percentile(data, 50), np.percentile(data, 75)]
    iqr = percentiles[2] - percentiles[0]
    
    stats_dict = {
        "mean": mean,
        "median": median,
        "mode": mode,
        "variance": np.round(variance, 4),
        "standard_deviation": np.round(std_dev, 4),
        "25th_percentile": percentiles[0],
        "50th_percentile": percentiles[1],
        "75th_percentile": percentiles[2],
        "interquartile_range": iqr
    }
    return stats_dict
```
</details>

In [19]:
# Starter code
import numpy as np
def descriptive_statistics(data):
    # Your code here: compute mean, median, mode, variance, std_dev,
    # percentiles (a list holding the 25th, 50th and 75th percentiles) and iqr
    stats_dict = {
        "mean": mean,
        "median": median,
        "mode": mode,
        "variance": np.round(variance, 4),
        "standard_deviation": np.round(std_dev, 4),
        "25th_percentile": percentiles[0],
        "50th_percentile": percentiles[1],
        "75th_percentile": percentiles[2],
        "interquartile_range": iqr
    }
    return stats_dict

In [None]:
def test_descriptive_statistics():
    # Test case 1: Basic test with the example data
    test_data = [10, 20, 30, 40, 50]
    expected_output = {
        'mean': 30.0,
        'median': 30.0,
        'mode': 10,
        'variance': 200.0,
        'standard_deviation': 14.1421,
        '25th_percentile': 20.0,
        '50th_percentile': 30.0,
        '75th_percentile': 40.0,
        'interquartile_range': 20.0
    }
    
    result = descriptive_statistics(test_data)
    
    # Check each statistic
    assert np.isclose(result['mean'], expected_output['mean']), "Mean calculation failed"
    assert np.isclose(result['median'], expected_output['median']), "Median calculation failed"
    assert np.isclose(result['mode'], expected_output['mode']), "Mode calculation failed"
    assert np.isclose(result['variance'], expected_output['variance']), "Variance calculation failed"
    assert np.isclose(result['standard_deviation'], expected_output['standard_deviation']), "Standard deviation calculation failed"
    assert np.isclose(result['25th_percentile'], expected_output['25th_percentile']), "25th percentile calculation failed"
    assert np.isclose(result['50th_percentile'], expected_output['50th_percentile']), "50th percentile calculation failed"
    assert np.isclose(result['75th_percentile'], expected_output['75th_percentile']), "75th percentile calculation failed"
    assert np.isclose(result['interquartile_range'], expected_output['interquartile_range']), "IQR calculation failed"

if __name__ == "__main__":
    try:
        test_descriptive_statistics()
        print("All test cases passed!")
    except AssertionError as e:
        print(f"Test failed: {str(e)}")
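A detail worth knowing about the expected values: np.var and np.std divide by n by default (ddof=0, the population formulas), which is why the variance in this problem is 200.0. The sketch below shows how the ddof parameter switches to the sample formulas that divide by n - 1.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Default ddof=0: divide by n (population variance), as the expected output assumes
print(np.var(data))          # 200.0
# ddof=1: divide by n - 1 (sample variance)
print(np.var(data, ddof=1))  # 250.0
# The standard deviation accepts the same parameter
print(np.std(data, ddof=1))
```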

### Linear Algebra Operations using numpy


In [None]:
A = np.random.rand(4, 4)

print("Determinant of A:", np.linalg.det(A))

# Eigenvalues E and normalised eigenvectors (the columns of W)
E, W = np.linalg.eig(A)


## SVD
# svd returns U, the singular values S, and V already transposed (A = U @ diag(S) @ V)
U, S, V = np.linalg.svd(A)


## Norm of a vector
a = np.array([1, 2, 3, 4, 5])

print("Norm of vector a:", np.linalg.norm(a, 2))

## Norm of a matrix (ord=2 gives the spectral norm, the largest singular value)
print("Norm of matrix A:", np.linalg.norm(A, 2))

# Transpose of a matrix
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.transpose(a)
# This also works
# b = a.T
print(b)
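The decompositions above are easier to trust once you verify them numerically. The following sketch uses a seeded random matrix (the seed is an arbitrary choice for reproducibility) and checks the defining identities of the eigen-decomposition and the SVD with np.allclose.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
A = rng.random((4, 4))

# Eigen-decomposition: each column W[:, i] satisfies A @ W[:, i] == E[i] * W[:, i]
E, W = np.linalg.eig(A)
print(np.allclose(A @ W, W * E))  # True

# SVD: A is recovered from U, the singular values S and the transposed factor Vh
U, S, Vh = np.linalg.svd(A)
print(np.allclose(U @ np.diag(S) @ Vh, A))  # True

# The matrix 2-norm equals the largest singular value
print(np.isclose(np.linalg.norm(A, 2), S[0]))  # True
```

Note that the eigenvalues of a general real matrix can be complex; np.allclose handles that case as well.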

### Calculate Covariance Matrix

# Covariance Matrix Calculation

## 1. Input Data Structure
- Input: A matrix $X$ where each row represents a feature vector
- Shape: $p$ features × $n$ samples
- Example: $X = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$

## 2. Mean Calculation
- For each feature $i$, calculate mean $\mu_i$:
- $\mu_i = \frac{1}{n}\sum_{j=1}^n x_{ij}$
- Example: $\mu_1 = \frac{1+2+3}{3} = 2$ and $\mu_2 = \frac{4+5+6}{3} = 5$

## 3. Data Centering
- Subtract mean from each feature:
- $X_{centered} = X - \mu$
- Example: $X_{centered} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}$

## 4. Covariance Matrix Formula
The covariance between features $i$ and $j$ is:

$cov(X_i, X_j) = \frac{1}{n-1}\sum_{k=1}^n (x_{ik} - \mu_i)(x_{jk} - \mu_j)$

In matrix form:
$Cov(X) = \frac{1}{n-1}X_{centered}X_{centered}^T$

Where:
- $n$ is number of samples
- $X_{centered}$ is the centered data matrix
- $X_{centered}^T$ is its transpose

## 5. Properties of Covariance Matrix
- Symmetric: $cov(X_i, X_j) = cov(X_j, X_i)$
- Diagonal elements are variances: $cov(X_i, X_i)$ is variance of feature $i$
- Size: $p \times p$ matrix where $p$ is number of features

## 6. Example Result
For input $[[1, 2, 3], [4, 5, 6]]$:
$Cov(X) = \begin{bmatrix} 1.0 & 1.0 \\ 1.0 & 1.0 \end{bmatrix}$

This matrix shows:
- Variance of first feature: 1.0 (top-left)
- Variance of second feature: 1.0 (bottom-right)
- Covariance between features: 1.0 (off-diagonal elements)
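The properties listed in section 5 can be checked numerically. The sketch below uses NumPy's built-in np.cov on a small matrix with three features and three samples (the data is illustrative) and tests symmetry, the diagonal, and the size.

```python
import numpy as np

# Rows are features, columns are samples (np.cov's default layout)
X = np.array([[1, 5, 6], [2, 3, 4], [7, 8, 9]], dtype=float)
cov = np.cov(X)
print(cov)

# Symmetric: cov(X_i, X_j) == cov(X_j, X_i)
print(np.allclose(cov, cov.T))  # True
# Diagonal holds the per-feature sample variances (ddof=1 matches the 1/(n-1) factor)
print(np.allclose(np.diag(cov), np.var(X, axis=1, ddof=1)))  # True
# One row and one column per feature
print(cov.shape)  # (3, 3)
```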

#### Without using the Numpy method

<details>
<summary>Click for Answer</summary>

```python

import numpy as np
def calculate_covariance_matrix(vectors: list[list[float]]) -> list[list[float]]:
   
    X = np.array(vectors)
    
    # Calculate means for each feature
    means = np.mean(X, axis=1)
    
    # Center the data
    X_centered = X - means.reshape(-1, 1)
    n = X.shape[1]
    cov_matrix = (X_centered @ X_centered.T) / (n - 1)
    return cov_matrix.tolist()
```
</details>

In [36]:
import numpy as np
def calculate_covariance_matrix(vectors: list[list[float]]) -> list[list[float]]:
   
    X = np.array(vectors)
    
    # Calculate means for each feature
    means = np.mean(X, axis=1)
    
    # Center the data
    X_centered = X - means.reshape(-1, 1)
    n = X.shape[1]
    cov_matrix = (X_centered @ X_centered.T) / (n - 1)
    return cov_matrix.tolist()

#### Using the Numpy cov method

In [41]:
def calculate_covariance_matrix(vectors: list[list[float]]) -> list[list[float]]:
    X = np.array(vectors, dtype=float)
    
    # Calculate covariance matrix using numpy's cov function
    cov_matrix = np.cov(X)
    return cov_matrix.tolist()

In [None]:

def test_covariance_matrix():
    # Test case 1: 2x3 matrix
    test_data_1 = [[1, 2, 3], [4, 5, 6]]
    expected_output_1 = [[1.0, 1.0], [1.0, 1.0]]
    
    result_1 = calculate_covariance_matrix(test_data_1)
    
    # Check each element of the matrix
    for i in range(len(expected_output_1)):
        for j in range(len(expected_output_1[0])):
            assert np.isclose(result_1[i][j], expected_output_1[i][j]), \
                f"Test case 1 failed at position [{i}][{j}]: expected {expected_output_1[i][j]}, got {result_1[i][j]}"
    
    # Test case 2: 3x3 matrix
    test_data_2 = [[1, 5, 6], [2, 3, 4], [7, 8, 9]]
    expected_output_2 = [[7.0, 2.5, 2.5], [2.5, 1.0, 1.0], [2.5, 1.0, 1.0]]
    
    result_2 = calculate_covariance_matrix(test_data_2)
    
    # Check each element of the matrix
    for i in range(len(expected_output_2)):
        for j in range(len(expected_output_2[0])):
            assert np.isclose(result_2[i][j], expected_output_2[i][j]), \
                f"Test case 2 failed at position [{i}][{j}]: expected {expected_output_2[i][j]}, got {result_2[i][j]}"

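    # Additional test case 3: Empty matrix
    # Note: the bare `except` also catches the AssertionError raised by `assert False`,
    # so this check passes whether or not the function raises an error
    test_data_3 = [[]]
    try:
        calculate_covariance_matrix(test_data_3)
        assert False, "Test case 3 should raise an error for empty matrix"
    except:
        pass

    # Additional test case 4: Single element
    # Note: the same caveat about the bare `except` applies here
    test_data_4 = [[1]]
    try:
        calculate_covariance_matrix(test_data_4)
        assert False, "Test case 4 should raise an error for single element (n-1=0 in denominator)"
    except:
        pass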

if __name__ == "__main__":
    try:
        test_covariance_matrix()
        print("All test cases passed!")
    except AssertionError as e:
        print(f"Test failed: {str(e)}")

## Numpy references are not Deep Copies

### NumPy Array Copying: Deep vs Shallow
#### Understanding Array Assignment vs Copying
When working with NumPy arrays, it's crucial to understand the difference between creating a new reference to an array and making a true copy of it.

### Shallow Copy (Reference Assignment)
When you assign a NumPy array to a new variable using =, no data is copied: you create a second name that refers to the same array in memory. Any change made through either variable is visible through the other.


In [None]:
import numpy as np
from pprint import pprint

# Create original array
original = np.ones((3, 3))
print("Original array:")
pprint(original)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]

# Create a reference (no data is copied)
reference = original

# Modify through the reference
reference[1, 1] = 100

print("\nAfter modification:")
print("Reference array:")
pprint(reference)
print("\nOriginal array (also changed):")
pprint(original)
# Both arrays show:
# [[1.   1.   1.  ]
#  [1.   100. 1.  ]
#  [1.   1.   1.  ]]

### Deep Copy
To create an independent copy of an array where modifications won't affect the original, use the .copy() method. This creates a new array with its own data in memory.

In [None]:
import numpy as np
from pprint import pprint

# Create original array
original = np.ones((3, 3))
print("Original array:")
pprint(original)
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]

# Create a deep copy
independent_copy = original.copy()

# Modify the copy
independent_copy[1, 1] = 100

print("\nAfter modification:")
print("Independent copy:")
pprint(independent_copy)
print("\nOriginal array (unchanged):")
pprint(original)
# Independent copy shows:
# [[1.   1.   1.  ]
#  [1.   100. 1.  ]
#  [1.   1.   1.  ]]
# 
# Original remains:
# [[1. 1. 1.]
#  [1. 1. 1.]
#  [1. 1. 1.]]
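Between plain assignment and .copy() there is a third case worth knowing: basic slicing returns a view, a new array object that still shares its data with the original. The sketch below (variable names are illustrative) uses np.shares_memory to tell a view from an independent copy.

```python
import numpy as np

original = np.ones((3, 3))

# Basic slicing returns a view: a new array object that shares the same data
first_row = original[0]
first_row[0] = 100
print(original[0, 0])  # 100.0
print(np.shares_memory(original, first_row))  # True

# .copy() allocates separate data, so later changes do not propagate
row_copy = original[0].copy()
row_copy[1] = -1
print(original[0, 1])  # 1.0
print(np.shares_memory(original, row_copy))  # False
```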

### More Understanding through Puzzles

### Sorting an Array 

In [None]:
#Puzzle 1
import numpy as np
# Quiz scores for different students
scores = np.array([7, 6, 8, 5, 9])
scores = np.sort(scores)
print(scores)
scores = scores[::-1]
print(scores)
print(scores[-2])


In [None]:
#Puzzle 2
import numpy as np
# Sensor IDs and the corresponding
# sensor values
ids = [56, 61, 33, 17, 82]
values = np.array([10, 6, 8, 7, 9])
indices = np.argsort(values)
print(ids[indices[2]])


In [None]:
#Puzzle 3
import numpy as np
bool_array = np.array([True, False, True])
print(sum(bool_array) > sum([1, 0, 1]))

In [None]:
#Puzzle 4
import numpy as np
# Students' grades for an exam
grades = np.array(['A', 'B', 'D', 'A', 'E'])
# Filter students who did not pass
grades_filter = grades > 'D'
print(grades_filter[1])
print(grades_filter[-1])


In [None]:
#Puzzle 5
import numpy as np
# Book ratings
ratings = np.array([('Numpy Book', 4.6),
('Harry Potter', 4.3),
('Winnie Pooh', 3.4),
('Python Book', 4.7)])
bestseller_filter = [float(x[1]) > 4.5 for x in ratings]
number_of_bestsellers = len(ratings[bestseller_filter])
print(number_of_bestsellers)

In [None]:
#Puzzle 6
import numpy as np
# Filter for 3x3 matrices
filter_3x3 = np.array([[False, True, False],
[True, False, True],
[False, True, False]])
inverted_filter = np.invert(filter_3x3)
numbers_matrix = np.arange(1, 10).reshape(3, 3)
print(sum(numbers_matrix[inverted_filter]))