# Data Structures for Data Analysis

## NumPy


In [None]:
import numpy as np

### Vector

In [3]:
a = np.array([1, 2, 3])

In [5]:
a[0]

np.int64(1)

In [6]:
a[:2]

array([1, 2])

In [17]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

#### Operators

In [9]:
a + b

array([5, 7, 9])

In [12]:
a * b

array([ 4, 10, 18])

In [11]:
a.dot(b)

np.int64(32)

In [18]:
a @ b

np.int64(32)

In [19]:
len(a)

3

Two vectors are orthogonal (perpendicular) if their dot product is 0. Of the two pairs below, the first is orthogonal and the second is not.

In [21]:
v1 = np.array([1, 2])
v2 = np.array([2, -1])
v1 @ v2

np.int64(0)

In [23]:
v1 = np.array([1, -2, 1])
v2 = np.array([1, 1, 2])
v1 @ v2

np.int64(1)
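
With floating-point vectors the dot product may come out as a tiny nonzero number instead of exactly 0, so a tolerance-based check is often safer. The following is a minimal sketch; the helper name `is_orthogonal` and the tolerance `1e-9` are assumptions made for illustration, not part of NumPy.

```python
import numpy as np


def is_orthogonal(u, v, tol=1e-9):
    """Hypothetical helper: True if the dot product is within tol of zero."""
    return bool(abs(np.dot(u, v)) <= tol)


# A unit vector at 30 degrees and the same vector rotated by 90 degrees
theta = np.pi / 6
u = np.array([np.cos(theta), np.sin(theta)])
w = np.array([-np.sin(theta), np.cos(theta)])

rotated_pair = is_orthogonal(u, w)
integer_pair = is_orthogonal(np.array([1, -2, 1]), np.array([1, 1, 2]))
print(rotated_pair, integer_pair)
```

The second pair is the non-orthogonal example from the cell above, whose dot product is 1.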

To normalize a vector means to scale it to length 1 without changing its direction. This is done by dividing the vector by its norm (its Euclidean length).

In [26]:
v = np.array([1, 2])
norm = np.linalg.norm(v)  # compute sqrt(1^2 + 2^2)
v_unit = v / norm
v_unit

array([0.4472136 , 0.89442719])
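
As a quick sanity check, the norm of a normalized vector should be 1 up to rounding error. The sketch below also guards against the zero vector, which has no direction and cannot be normalized; the helper name `normalize` is an assumption made for illustration.

```python
import numpy as np


def normalize(v):
    """Hypothetical helper: return v scaled to unit length."""
    length = np.linalg.norm(v)
    if length == 0:
        raise ValueError("cannot normalize the zero vector")
    return v / length


v = np.array([3.0, 4.0])      # length is sqrt(3^2 + 4^2) = 5
v_unit = normalize(v)
unit_length = np.linalg.norm(v_unit)

print(v_unit)
print(np.isclose(unit_length, 1.0))
```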

### Complex Number

In [14]:
a = np.array([1 + 2j, 3 + 4j, 5 + 6j])
b = np.array([1 + 2j, 3 + 4j, 5 + 6j])

In [15]:
a + b

array([ 2. +4.j,  6. +8.j, 10.+12.j])

In [16]:
a * b

array([ -3. +4.j,  -7.+24.j, -11.+60.j])

### Matrix

In [27]:
A = np.array([[1, 2, 3], [4, 5, 6]])
print(A)

[[1 2 3]
 [4 5 6]]


In [28]:
print("shape =", A.shape)

shape = (2, 3)


In [31]:
np.arange(6).reshape(2, 3)

array([[0, 1, 2],
       [3, 4, 5]])

In [30]:
np.ones((2, 2))

array([[1., 1.],
       [1., 1.]])

In [29]:
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [40]:
A = np.array([[1, 2, 3], [4, 5, 6]])
A.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [41]:
A.transpose()

array([[1, 4],
       [2, 5],
       [3, 6]])

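Vectors are linearly dependent if at least one of them can be written as a linear combination of the others. When the vectors are stacked as the rows of a matrix, they are dependent exactly when the matrix rank is smaller than the number of vectors.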

In [32]:
A = np.array([[1, 2], [2, 4]])  # v2 = 2 * v1

rank = np.linalg.matrix_rank(A)
print("Matrix rank:", rank)
print("Shape:", A.shape)

Matrix rank: 1
Shape: (2, 2)
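
For contrast, a small sketch comparing the dependent rows above with a pair of independent rows. The variable names are chosen here for illustration only.

```python
import numpy as np

dependent = np.array([[1, 2], [2, 4]])    # second row = 2 * first row
independent = np.array([[1, 2], [3, 4]])  # neither row is a multiple of the other

rank_dep = np.linalg.matrix_rank(dependent)
rank_ind = np.linalg.matrix_rank(independent)

# Rows are independent only when the rank equals the number of rows
print("dependent rows, rank:", rank_dep)
print("independent rows, rank:", rank_ind)
```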


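When a matrix acts on a vector, it usually changes both the vector's direction and its length. For some special nonzero vectors, called eigenvectors, the matrix only scales the vector: $A v = \lambda v$. The scaling factor $\lambda$ is the eigenvalue; the vector stays on the same line through the origin, although a negative $\lambda$ flips it. `np.linalg.eig` returns the eigenvectors as the columns of its second result.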

In [34]:
A = np.array([[2, 0], [0, 3]])

vals, vecs = np.linalg.eig(A)

print("Eigenvalues:", vals)
print("Eigenvectors:\n", vecs)

Eigenvalues: [2. 3.]
Eigenvectors:
 [[1. 0.]
 [0. 1.]]
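
The defining relation $A v = \lambda v$ can be checked numerically. The sketch below uses a non-diagonal matrix chosen here for illustration, and compares with `np.allclose` because the results are floating-point.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])
vals, vecs = np.linalg.eig(A)

# Column i of vecs is the eigenvector that belongs to vals[i]
checks = []
for i in range(len(vals)):
    lam = vals[i]
    v = vecs[:, i]
    checks.append(np.allclose(A @ v, lam * v))

print("Eigenvalues:", vals)
print("A v == lambda v for every pair:", all(checks))
```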


The inverse of a square matrix $A$ (when it exists) is the matrix $A^{-1}$ such that $A A^{-1} = A^{-1} A = I$, where $I$ is the identity matrix of the same size.

In [33]:
A = np.array([[1, 2], [3, 4]])

A_inv = np.linalg.inv(A)
print("A inverse:\n", A_inv)

# Check that A @ A_inv = Identity
print("Check:\n", np.round(A @ A_inv, 3))

A inverse:
 [[-2.   1. ]
 [ 1.5 -0.5]]
Check:
 [[1. 0.]
 [0. 1.]]
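
Not every square matrix has an inverse. As a minimal sketch, the matrix with linearly dependent rows used earlier is singular, and `np.linalg.inv` raises `np.linalg.LinAlgError` for it. Note that a matrix that is only nearly singular may not raise and can instead return a numerically unreliable result.

```python
import numpy as np

singular = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row = 2 * first row

try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError as err:
    invertible = False
    print("Cannot invert:", err)

print("invertible:", invertible)
```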


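For a diagonalizable square matrix $A$, we can write $A = V \, \Lambda \, V^{-1}$, where the columns of $V$ are eigenvectors of $A$ and $\Lambda$ is a diagonal matrix holding the corresponding eigenvalues. This “diagonalizes” $A$ and turns matrix powers and exponentials into easy operations on $\Lambda$; for example, $A^k = V \, \Lambda^k \, V^{-1}$, where $\Lambda^k$ just raises each diagonal entry to the power $k$.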

In [36]:
A = np.array([[4, 1], [2, 3]])

vals, vecs = np.linalg.eig(A)

# Rebuild A from its decomposition
A_reconstructed = vecs @ np.diag(vals) @ np.linalg.inv(vecs)

print("Eigenvalues:", np.round(vals, 3))
print("Eigenvectors:\n", np.round(vecs, 3))
print("Reconstructed A:\n", np.round(A_reconstructed, 3))

Eigenvalues: [5. 2.]
Eigenvectors:
 [[ 0.707 -0.447]
 [ 0.707  0.894]]
Reconstructed A:
 [[4. 1.]
 [2. 3.]]
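
To illustrate why diagonalization simplifies matrix powers, the sketch below computes $A^5$ as $V \Lambda^5 V^{-1}$ and compares it with `np.linalg.matrix_power`. The exponent `k = 5` is an arbitrary choice made for this example.

```python
import numpy as np

A = np.array([[4, 1], [2, 3]])
k = 5

vals, vecs = np.linalg.eig(A)

# Powers of a diagonal matrix only need elementwise powers of its diagonal
A_pow_eig = vecs @ np.diag(vals ** k) @ np.linalg.inv(vecs)

# Reference result from repeated matrix multiplication
A_pow_ref = np.linalg.matrix_power(A, k)

print(np.round(A_pow_eig, 3))
print("matches matrix_power:", np.allclose(A_pow_eig, A_pow_ref))
```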


### Further Reading
[Making sense of principal component analysis, eigenvectors & eigenvalues](https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues)  
[Eigenfaces, for Facial Recognition](https://jeremykun.com/2011/07/27/eigenfaces/)  

## Pandas


In [42]:
import pandas as pd

## Polars


In [43]:
import polars as pl


## Memory, Disks and Databases
