# Linear Algebra

In this notebook, you will:

* Perform linear algebra operations using NumPy.
* Compare the efficiency of different approaches (for loops, pandas apply, NumPy vectorized primitives) for element-wise operations.
* Use timing functions to measure the performance of each approach.

In [1]:
import numpy as np
import pandas as pd
import time
import seaborn as sns
import matplotlib.pyplot as plt

## Addition, scalar multiplication, and multiplication

In [2]:
arr = np.arange(1, 13)

reshaped_matrix = arr.reshape(3, 4)
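The heading names three operations, but this cell only builds and reshapes an array. Here is a minimal sketch of all three on the same 3 x 4 matrix. The all-ones matrix `N` is a hypothetical second operand chosen only for illustration.

```python
import numpy as np

M = np.arange(1, 13).reshape(3, 4)  # same values as reshaped_matrix above
N = np.ones((3, 4), dtype=int)      # hypothetical second matrix with the same shape

added = M + N        # element-wise addition: shapes must match (or broadcast)
scaled = 2 * M       # scalar multiplication: every entry is doubled
product = M @ M.T    # matrix multiplication: (3 x 4) @ (4 x 3) -> (3 x 3)

print(added)
print(scaled)
print(product)
```

Matrix multiplication needs the inner dimensions to agree, which is why `M` is paired with its transpose here rather than with itself.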

In [3]:
int_array = np.array([1, 0, 3, 5])

float_array = int_array.astype(float)

bool_array = int_array.astype(bool)

print("Original array (int):")
print(int_array)

print("\nArray cast to float:")
print(float_array)


print("\nArray cast to boolean:")
print(bool_array)

Original array (int):
[1 0 3 5]

Array cast to float:
[1. 0. 3. 5.]

Array cast to boolean:
[ True False  True  True]


In [4]:
int_array = np.array([1, 0, 3, 5]).astype(np.int16)
int_array.nbytes

8
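The 8 above is just arithmetic: 4 elements times 2 bytes per `int16` element. The following sketch repeats the calculation for a few dtypes, showing that `nbytes` equals `size * itemsize`.

```python
import numpy as np

values = [1, 0, 3, 5]
sizes = {}
for dtype in [np.int8, np.int16, np.int32, np.int64, np.float64]:
    arr = np.array(values, dtype=dtype)
    # nbytes = (number of elements) x (bytes per element)
    sizes[np.dtype(dtype).name] = arr.nbytes
    print(f"{np.dtype(dtype).name:>8}: itemsize = {arr.itemsize}, nbytes = {arr.nbytes}")
```

Choosing the smallest dtype that can hold your values is a simple way to cut memory use on large arrays.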

In [5]:
A = np.array([[1, 2, 3], 
              [4, 5, 6], 
              [7, 8, 9]])

B = np.array([[1], [2], [3]])

# Matrix multiplication using NumPy
result = np.dot(A, B)
print("Matrix multiplication result (A x B):")
print(result)

# Transpose of a matrix
A_transpose = A.T
print("\nTranspose of matrix A:")
print(A_transpose)

Matrix multiplication result (A x B):
[[14]
 [32]
 [50]]

Transpose of matrix A:
[[1 4 7]
 [2 5 8]
 [3 6 9]]
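For 2-D arrays, `np.dot`, `np.matmul`, and the `@` operator all compute the same matrix product. The sketch below checks that equivalence on `A` and `B`, the shape rule for the product, and the transpose identity $(AB)^\top = B^\top A^\top$.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[1], [2], [3]])

# Three spellings of the same 2-D matrix product
assert np.array_equal(np.dot(A, B), A @ B)
assert np.array_equal(np.matmul(A, B), A @ B)

# Inner dimensions must match: (3 x 3) @ (3 x 1) -> (3 x 1)
print((A @ B).shape)

# Transpose of a product reverses the order of the factors
assert np.array_equal((A @ B).T, B.T @ A.T)
```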


## A collection of useful expressions
(We won't go over these in class; test them on your own time.)

In [10]:
# 1. Create NumPy Arrays
# Create a 1D array
array_1d = np.array([1, 2, 3, 4, 5])
print("1D array:", array_1d)

# Create a 2D array (matrix)
array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("2D array:\n", array_2d)

1D array: [1 2 3 4 5]
2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]


In [11]:
# 2. Basic Array Operations
# Element-wise arithmetic with a scalar (subtraction and division work the same way)
array_sum = array_1d + 2  # Add 2 to each element
array_mul = array_1d * 3  # Multiply each element by 3
print("Array after addition:", array_sum)
print("Array after multiplication:", array_mul)

Array after addition: [3 4 5 6 7]
Array after multiplication: [ 3  6  9 12 15]


In [8]:
# 3. Matrix Multiplication
matrix_a = np.array([[1, 2], [3, 4]])
matrix_b = np.array([[5, 6], [7, 8]])

matrix_mul = np.dot(matrix_a, matrix_b)  # Matrix multiplication using dot product
print("Matrix A:\n", matrix_a)
print("Matrix B:\n", matrix_b)
print("Matrix multiplication (A dot B):\n", matrix_mul)

Matrix A:
 [[1 2]
 [3 4]]
Matrix B:
 [[5 6]
 [7 8]]
Matrix multiplication (A dot B):
 [[19 22]
 [43 50]]


In [9]:
# 4. Array Slicing and Indexing
# Access specific rows, columns, or elements
print("First row of 2D array:", array_2d[0])       # First row
print("First column of 2D array:", array_2d[:, 0]) # First column
print("Element at position (1,1):", array_2d[1, 1])  # Element at (1,1)

First row of 2D array: [1 2 3]
First column of 2D array: [1 4 7]
Element at position (1,1): 5


In [10]:
# 5. Broadcasting 
# Apply operations on arrays of different shapes
array_1d_broad = np.array([1, 2, 3])
array_2d_broad = np.array([[4], [5], [6]])

broadcast_result = array_1d_broad + array_2d_broad
print("Broadcasting result:\n", broadcast_result)

Broadcasting result:
 [[5 6 7]
 [6 7 8]
 [7 8 9]]
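Broadcasting aligns shapes from the right. A shape `(3,)` array is treated as `(1, 3)`, and every size-1 axis is then stretched to match, so `(3,)` combined with `(3, 1)` gives `(3, 3)`. The sketch below checks this with `np.broadcast_shapes`, which assumes NumPy 1.20 or later, and rebuilds the same result by explicitly repeating the data with `np.tile`.

```python
import numpy as np

row = np.array([1, 2, 3])          # shape (3,)
col = np.array([[4], [5], [6]])    # shape (3, 1)

# Resulting shape after broadcasting (requires NumPy >= 1.20)
print(np.broadcast_shapes(row.shape, col.shape))

# Same result without broadcasting: copy the row down and the column across
explicit = np.tile(row, (3, 1)) + np.tile(col, (1, 3))
assert np.array_equal(explicit, row + col)
print(explicit)
```

Broadcasting produces this result without materializing the tiled copies, which is why it is preferred on large arrays.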


# Sampling
(We won't go over this in class; look it over on your own time.)

In [12]:
# Uniform Distribution
random_uniform = np.random.rand(5)  # 5 random numbers in [0, 1)
print("Random numbers from uniform distribution:", random_uniform)

# Normal distribution (mean=0, std=1)
random_normal = np.random.randn(5)  # 5 random numbers from a standard normal distribution
print("Random numbers from normal distribution:", random_normal)

# Random integers with range
random_integers = np.random.randint(10, 100, size=5)  # 5 random integers from 10 up to, but not including, 100
print("Random integers between 10 and 100:", random_integers)

# Random sampling from a 1D array
array = np.array([10, 20, 30, 40, 50])
random_sample = np.random.choice(array, size=3, replace=False)  # Random sample of 3 elements without replacement
print("Random sample from array:", random_sample)

# Random permutation of the array (returns a shuffled copy; the original is unchanged)
random_permutation = np.random.permutation(array)  # Generate a random permutation of the array
print("Random permutation of the array:", random_permutation)

Random numbers from uniform distribution: [0.43532323 0.20035114 0.85744449 0.94537654 0.76921452]
Random numbers from normal distribution: [ 0.69936219 -1.07754021  1.57889053 -1.74498079  1.2860253 ]
Random integers between 10 and 100: [15 26 96 76 95]
Random sample from array: [20 50 30]
Random permutation of the array: [10 50 20 30 40]
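The cell above draws from NumPy's legacy global random state, so every run prints different numbers. Here is a sketch of the same draws using a seeded `Generator` from `np.random.default_rng`, which makes results reproducible. The seed 42 is arbitrary. Like `randint`, `Generator.integers` excludes the upper bound by default.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # seeded Generator: same numbers on every run

uniform = rng.random(5)                        # floats in [0, 1)
normal = rng.normal(0, 1, size=5)              # standard normal (mean 0, std 1)
integers = rng.integers(10, 100, size=5)       # 10 <= value < 100
array = np.array([10, 20, 30, 40, 50])
sample = rng.choice(array, size=3, replace=False)  # 3 distinct elements
permutation = rng.permutation(array)               # shuffled copy of array

print(uniform, normal, integers, sample, permutation, sep="\n")
```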


# Benchmarking Time

In [13]:
data = np.random.rand(1000000)

start_time = time.time()

squared_for_loop = []
for x in data:
    squared_for_loop.append(x ** 2)

end_time = time.time()
for_loop_time = end_time - start_time

print(f"Time taken using a for loop: {for_loop_time:.4f} seconds")

Time taken using a for loop: 0.2098 seconds


In [18]:
df = pd.DataFrame(data, columns=['Values'])

start_time = time.time()

squared_apply = df['Values'].apply(lambda x: x ** 2)

end_time = time.time()
apply_time = end_time - start_time

print(f"Time taken using Pandas apply(): {apply_time:.4f} seconds")

Time taken using Pandas apply(): 0.3184 seconds


In [30]:
start_time = time.time()

# Calculate the square of each element using NumPy vectorized operation
squared_numpy = np.square(data)

end_time = time.time()
numpy_time = end_time - start_time

print(f"Time taken using NumPy vectorization: {numpy_time:.4f} seconds")

Time taken using NumPy vectorization: 0.0054 seconds
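Single `time.time()` measurements like the ones above can vary noticeably from run to run. This sketch uses the standard-library `timeit` module instead, repeating each measurement and keeping the fastest run. The smaller array (100,000 values) and the helper names `square_loop` and `square_numpy` are illustrative choices, not part of the original cells.

```python
import timeit

import numpy as np

data = np.random.rand(100_000)


def square_loop(values):
    # Python-level loop: one interpreted operation per element
    return [v ** 2 for v in values]


def square_numpy(values):
    # Vectorized: the loop runs inside NumPy's compiled code
    return np.square(values)


# Time each function 5 times (1 call per run) and keep the fastest run,
# which is the one least affected by other processes competing for the CPU
loop_best = min(timeit.repeat(lambda: square_loop(data), number=1, repeat=5))
numpy_best = min(timeit.repeat(lambda: square_numpy(data), number=1, repeat=5))

print(f"for loop: {loop_best:.4f} s, NumPy: {numpy_best:.4f} s")
print(f"NumPy is roughly {loop_best / numpy_best:.0f}x faster here")
```

Exact numbers depend on the machine, but the ordering (NumPy far ahead of the Python loop) should be stable.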


# Your turn | Part 1

#### Using np.random.rand, create a 1000 x 1000 matrix and square every element using a for loop, pandas, and NumPy. Benchmark the speed of each approach.

In [48]:
matrix = np.random.rand(1000, 1000)

start_time = time.time()

# Square every element with plain Python iteration (a nested list comprehension)
squared_matrix = np.array([[val ** 2 for val in row] for row in matrix])

end_time = time.time()
for_loop_time = end_time - start_time

# Wrap the result in a DataFrame for display (not included in the timing)
squared_matrix = pd.DataFrame(squared_matrix)

print(f"Time taken using a for loop: {for_loop_time:.4f} seconds")


Time taken using a for loop: 0.4723 seconds


In [49]:
squared_matrix

,0,1,2,3,4,5,6,7,8,9,...,990,991,992,993,994,995,996,997,998,999
0,0.702738,0.263695,0.000073,0.163405,0.420013,0.017558,0.182860,0.206532,0.026974,0.022638,...,0.553697,0.541373,0.054893,0.005803,0.998997,0.078294,0.070987,0.450162,0.490978,0.518671
1,0.182547,0.068819,0.290485,0.000564,0.447577,0.227485,0.587796,0.050988,0.001394,0.545704,...,0.095277,0.519542,0.872052,0.400914,0.025225,0.124691,0.018122,0.328552,0.960085,0.093793
2,0.590529,0.032066,0.168663,0.026405,0.877437,0.009878,0.428883,0.528965,0.027687,0.024143,...,0.526008,0.745926,0.925255,0.042025,0.140906,0.001593,0.101882,0.103125,0.004332,0.847354
3,0.266225,0.027990,0.018614,0.273727,0.585266,0.335884,0.000136,0.863548,0.357473,0.006497,...,0.329193,0.000053,0.265218,0.474862,0.003389,0.901688,0.541307,0.673355,0.254380,0.438243
4,0.012510,0.216450,0.175609,0.027478,0.223422,0.146684,0.027475,0.735977,0.099811,0.695720,...,0.000020,0.590954,0.355227,0.007538,0.000421,0.737024,0.007477,0.329881,0.042046,0.158994
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
995,0.433006,0.878671,0.281746,0.525957,0.811328,0.085448,0.011764,0.364288,0.537720,0.583321,...,0.234649,0.012782,0.013824,0.553846,0.061096,0.598210,0.541455,0.326384,0.109317,0.133882
996,0.258995,0.399316,0.062369,0.001199,0.175682,0.398680,0.138865,0.570466,0.080284,0.108639,...,0.426254,0.179807,0.062793,0.068589,0.059469,0.968701,0.253140,0.640614,0.125996,0.873858
997,0.002231,0.628184,0.000066,0.442441,0.455095,0.362633,0.405864,0.699863,0.639539,0.225824,...,0.263449,0.387783,0.560546,0.533201,0.198184,0.274832,0.065974,0.004802,0.804189,0.447884
998,0.374148,0.043393,0.058970,0.002308,0.010706,0.135996,0.196277,0.998735,0.074724,0.645682,...,0.448996,0.947833,0.589411,0.703693,0.307943,0.649183,0.544285,0.257590,0.760747,0.005636
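The solution above only times the for-loop version. Here is a sketch that covers all three approaches on the same matrix. It assumes `Series.map` for an element-wise pandas version, analogous to the `apply()` benchmark earlier, and uses `time.perf_counter` for timing. The helper names are hypothetical.

```python
import time

import numpy as np
import pandas as pd


def time_call(func, *args):
    """Run func(*args) once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def square_loop(matrix):
    # Plain Python: visit every element one at a time
    out = np.empty_like(matrix)
    for i in range(matrix.shape[0]):
        for j in range(matrix.shape[1]):
            out[i, j] = matrix[i, j] ** 2
    return out


def square_pandas(matrix):
    # pandas: call a Python lambda on every element, one column at a time
    df = pd.DataFrame(matrix)
    return df.apply(lambda col: col.map(lambda v: v ** 2)).to_numpy()


def square_numpy(matrix):
    # NumPy: a single vectorized call over the whole array
    return np.square(matrix)


matrix = np.random.rand(1000, 1000)
results, timings = {}, {}
for name, func in [("for loop", square_loop), ("pandas", square_pandas), ("NumPy", square_numpy)]:
    results[name], timings[name] = time_call(func, matrix)
    print(f"Time taken using {name}: {timings[name]:.4f} seconds")
```

All three produce the same values. Only the NumPy version avoids calling Python code once per element.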


#### Using the code above, write a function that creates two random matrices (A and B) of size S x S and multiplies them.

In [47]:
import numpy as np

In [50]:
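def mult_rand_matrix(S):
    """Create two random S x S matrices and return them along with their matrix product."""
    matrix1 = np.random.rand(S, S)
    matrix2 = np.random.rand(S, S)
    res = np.matmul(matrix1, matrix2)  # same as matrix1 @ matrix2

    return matrix1, matrix2, res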

What happens when you make the size 1000? 1,000,000? 100,000,000? 

In [54]:
ex_1 = mult_rand_matrix(1000)
ex_2 = mult_rand_matrix(1000000)
ex_3 = mult_rand_matrix(100000000)
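A back-of-the-envelope estimate explains what to expect. Each `np.random.rand(S, S)` matrix holds S² float64 values at 8 bytes each. That works out to 8 MB per matrix for S = 1,000, 8 TB for S = 1,000,000, and 80 PB for S = 100,000,000. On typical hardware the last two calls are therefore likely to fail with a `MemoryError` before any multiplication happens. Even when memory suffices, a dense S x S product costs on the order of 2S³ floating-point operations. The sketch below only does the memory arithmetic; `matrix_memory_bytes` is a hypothetical helper.

```python
import numpy as np

FLOAT64_BYTES = np.dtype(np.float64).itemsize  # 8 bytes per element


def matrix_memory_bytes(S):
    """Bytes needed to store one S x S float64 matrix such as np.random.rand(S, S)."""
    return S * S * FLOAT64_BYTES


for S in [1_000, 1_000_000, 100_000_000]:
    per_matrix = matrix_memory_bytes(S)
    total = 3 * per_matrix  # mult_rand_matrix keeps matrix1, matrix2 and res
    print(f"S = {S:>11,}: {per_matrix / 1e9:>16,.3f} GB per matrix, {total / 1e9:>16,.3f} GB in total")
```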

### Computing Distances
Write a NumPy-based function that calculates a) the Euclidean distance and b) the Manhattan distance between two points. No cheating by using built-in distance functions.

In [6]:
# Use these points to test
point1 = np.array([1, 2])
point2 = np.array([4, 6])

In [9]:
# your code for Euclidean and Manhattan distances
def distances(a, b):
    a, b = np.array(a), np.array(b)

    diff = a - b
    euclidean = np.sqrt(np.sum(diff ** 2))
    manhattan = np.sum(np.abs(diff))

    return euclidean, manhattan

In [10]:
distances([3,4], [4,2])

(2.23606797749979, 3)
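The suggested test points form a 3-4-5 right triangle, so the expected answers are easy to check by hand: a Euclidean distance of 5 and a Manhattan distance of 3 + 4 = 7. The sketch below computes reference values with `np.linalg.norm`, only to verify a from-scratch function and not as the solution itself.

```python
import numpy as np

point1 = np.array([1, 2])
point2 = np.array([4, 6])

# Built-in reference values, used only to check a hand-written function
euclidean_ref = np.linalg.norm(point1 - point2)          # L2 norm of the difference
manhattan_ref = np.linalg.norm(point1 - point2, ord=1)   # L1 norm of the difference

print(euclidean_ref, manhattan_ref)
```

Calling `distances(point1, point2)` should agree with these values, giving 5.0 and 7.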

In [None]:
# your code for cosine similarity
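Cosine similarity is the dot product of two vectors divided by the product of their lengths, $\cos\theta = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. Here is a from-scratch sketch in the same style as `distances`, using only sums, products, and square roots. It assumes neither vector is all zeros; otherwise the denominator is 0 and NumPy returns `nan` with a warning.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b, built from elementary operations."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    dot = np.sum(a * b)                 # a . b
    norm_a = np.sqrt(np.sum(a ** 2))    # length of a
    norm_b = np.sqrt(np.sum(b ** 2))    # length of b
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2], [4, 6]))
```

The result ranges from -1 (opposite directions) through 0 (perpendicular) to 1 (same direction). For `point1` and `point2` it is about 0.992.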

# Part Two | Understanding regression
In this section we will simulate regression data and then recover the betas from it (this may stretch into the next class). The goal is to get you thinking in matrices instead of individual numbers.

$y = \beta_0 + \beta_1 x + \epsilon$ where $\epsilon \sim \mathcal{N}(0, 10)$

Generate a thousand points based on this model, then make a scatter plot using plt.scatter or sns.regplot.

In [None]:
# True parameters of the linear model
beta_0 = 5   # Intercept
beta_1 = 3   # Slope

In [None]:
n = 1000
x = np.random.rand(n) * 10


#### Generalize this to a function that takes in n (the number of points), the slope (b1), and intercept (b0). It should return x and y.
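Here is a minimal sketch of such a function. The notation $\mathcal{N}(0, 10)$ does not say whether 10 is the variance or the standard deviation. The `noise_sd` parameter assumes it is the standard deviation, which is the `scale` argument of NumPy's `normal`. x is drawn uniformly from [0, 10) to match the cell above. The function name and the optional `seed` parameter are hypothetical additions.

```python
import numpy as np


def simulate_linear_data(n, b1, b0, noise_sd=10.0, seed=None):
    """Simulate y = b0 + b1 * x + eps, with eps ~ Normal(0, noise_sd) (noise_sd assumed to be a std)."""
    rng = np.random.default_rng(seed)
    x = rng.random(n) * 10                     # x uniform on [0, 10)
    eps = rng.normal(0.0, noise_sd, size=n)    # Gaussian noise
    y = b0 + b1 * x + eps
    return x, y


x, y = simulate_linear_data(1000, b1=3, b0=5, seed=0)
```

`plt.scatter(x, y, s=5)` or `sns.regplot(x=x, y=y)` will then draw the scatter plot.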

Inverse computation: because the sum of squared errors is differentiable in $\beta_0$ and $\beta_1$, setting its derivatives to zero gives an exact, closed-form solution for the $\beta$ values. Use the following equations to calculate them:

$\beta_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

$\beta_0 = \bar{y} - \beta_1 \bar{x}$
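These formulas come from minimizing the sum of squared errors. Setting both partial derivatives to zero gives two "normal equations," and the first rearranges directly into the formula for $\beta_0$. Writing the same system with a design matrix $X$, whose $i$-th row is $(1, x_i)$, turns it into a single matrix equation:

$\frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0$

$\frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = -2 \sum_{i=1}^{n} x_i (y_i - \beta_0 - \beta_1 x_i) = 0$

$X^\top X \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = X^\top y \quad\Rightarrow\quad \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix} = (X^\top X)^{-1} X^\top y$

Solving this 2 x 2 system by hand gives exactly the two formulas for $\beta_1$ and $\beta_0$.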


In [None]:
# your code
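Here is a sketch of both routes. `estimate_betas` transcribes the two formulas for $\beta_1$ and $\beta_0$. `estimate_betas_matrix` solves the normal equations $X^\top X \beta = X^\top y$ with `np.linalg.solve`, which is generally preferred over explicitly forming the inverse. Both names are hypothetical. The data are simulated with an assumed noise standard deviation of 10.

```python
import numpy as np


def estimate_betas(x, y):
    """Closed-form least-squares estimates (b0_hat, b1_hat) from the scalar formulas."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


def estimate_betas_matrix(x, y):
    """Same estimates from the normal equations (X^T X) beta = X^T y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # design matrix: one row (1, x_i) per point
    return np.linalg.solve(X.T @ X, X.T @ y)   # solve the 2 x 2 system instead of inverting


# Simulated data from the model (noise std of 10 is an assumption)
rng = np.random.default_rng(0)
x = rng.random(1000) * 10
y = 5 + 3 * x + rng.normal(0.0, 10.0, size=1000)

print(estimate_betas(x, y))
print(estimate_betas_matrix(x, y))
```

The two routes should agree to floating-point precision and land roughly near the true intercept 5 and slope 3. The matrix version generalizes directly to more predictors by adding columns to `X`.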

### Challenge: What happens when you increase the sample size? Calculate the error of your estimated $\beta_1$ relative to the true $\beta_1$, and plot the error as the sample size increases.
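Here is a sketch of the challenge. For each sample size, it simulates fresh data, estimates $\beta_1$ with the slope formula, and records the absolute error $|\hat\beta_1 - \beta_1|$. The sizes (10 to 100,000 on a log scale), the seed, the noise standard deviation of 10, and the function name are all assumptions.

```python
import numpy as np


def beta1_error_by_sample_size(sizes, b0=5.0, b1=3.0, noise_sd=10.0, seed=0):
    """Absolute error of the OLS slope estimate for each sample size (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    errors = []
    for n in sizes:
        x = rng.random(n) * 10
        y = b0 + b1 * x + rng.normal(0.0, noise_sd, size=n)
        x_bar, y_bar = x.mean(), y.mean()
        b1_hat = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
        errors.append(abs(b1_hat - b1))
    return np.array(errors)


sizes = np.logspace(1, 5, num=9).astype(int)   # 10 ... 100,000
errors = beta1_error_by_sample_size(sizes)
for n, err in zip(sizes, errors):
    print(f"n = {int(n):>6}: |b1_hat - b1| = {err:.4f}")
```

To plot the result, use `plt.loglog(sizes, errors, marker="o")`. A single run per size is noisy, but the error should shrink roughly in proportion to $1/\sqrt{n}$, which appears as a downward trend with slope about -1/2 on log-log axes.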