# NumPy Notes (Basics to Advanced)

## Table of contents
- What is NumPy and why it matters
- Creating arrays
- Data types (dtype)
- Shape, dimensions, and axes
- Indexing and slicing
- Boolean and fancy indexing
- Vectorized operations and ufuncs
- Broadcasting rules
- Aggregations and statistics
- Reshaping and stacking
- Views vs copies
- Linear algebra
- Random numbers
- File I/O
- Performance tips and common pitfalls

---

## What is NumPy and why it matters
NumPy is the core library for numerical computing in Python. Its main object is the `ndarray`, a fast, fixed-type, n-dimensional array.

Key benefits:
- Fast vectorized operations in compiled C
- Contiguous memory layout (better cache efficiency)
- Rich math and linear algebra functionality
- Interoperability with pandas, SciPy, scikit-learn, and more

List vs array (conceptual):
- A Python list can hold mixed types and stores each element as a separate boxed object
- A NumPy array has a single dtype, enabling fast, SIMD-friendly operations
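
The contrast above can be sketched as follows. This is a minimal illustration, not a benchmark; the name `values` and the sample numbers are assumptions made up for the example.

```python
import numpy as np

values = [1.5, 2.0, 3.5, 4.0]   # hypothetical sample data

# Pure-Python approach: visit each boxed float object one at a time.
doubled_list = [v * 2 for v in values]

# NumPy approach: one vectorized operation over a typed buffer.
arr = np.array(values)
doubled_arr = arr * 2

# The same operator means different things for the two containers:
repeated = values * 2            # list repetition, length 8
print(doubled_list)
print(doubled_arr)
print(len(repeated), arr.dtype)
```

Note that `*` on a list repeats the list, while `*` on an array multiplies each element.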

---

## Creating arrays
Common creation functions:
```python
import numpy as np

np.array([1, 2, 3])
np.array([[1, 2], [3, 4]])
np.arange(0, 10, 2)      # 0,2,4,6,8
np.linspace(0, 1, 5)     # 0.0..1.0 inclusive
np.zeros((2, 3))
np.ones((2, 3))
np.full((2, 3), 7)
np.eye(3)                # identity
```

Random creation with the legacy functions (the Random numbers section covers the preferred `default_rng` API):
```python
np.random.rand(2, 3)      # uniform [0,1)
np.random.randn(2, 3)     # standard normal
```

---

## Data types (dtype)
Every `ndarray` has a fixed dtype.
```python
a = np.array([1, 2, 3])
a.dtype           # int64 on most systems

b = np.array([1, 2, 3], dtype=np.float32)
b.dtype

c = a.astype(np.float64)
```

Type promotion happens automatically:
```python
np.array([1, 2.5, 3]).dtype   # float64
```

Useful attributes:
- `dtype` type of elements
- `itemsize` bytes per element
- `nbytes` total bytes
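
A short sketch of how these attributes relate, assuming a hypothetical 1000 x 1000 array; `nbytes` is the element count times `itemsize`, so halving the element width halves the memory.

```python
import numpy as np

a32 = np.zeros((1000, 1000), dtype=np.float32)
a64 = a32.astype(np.float64)   # astype returns a new array with the new dtype

print(a32.dtype, a32.itemsize, a32.nbytes)   # float32 4 4000000
print(a64.dtype, a64.itemsize, a64.nbytes)   # float64 8 8000000

# nbytes is always size * itemsize
same = a32.nbytes == a32.size * a32.itemsize
```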

---

## Shape, dimensions, and axes
```python
a = np.array([[1, 2, 3], [4, 5, 6]])

a.shape   # (2, 3)
a.ndim    # 2
len(a)    # size of axis 0 (rows)
```

Axis meaning for 2D:
- `axis=0` runs down the rows, so a reduction over it gives one result per column
- `axis=1` runs across the columns, so a reduction over it gives one result per row
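
One way to remember the rule: the axis you pass is the one that disappears from the shape. A minimal sketch, with `scores` as an illustrative name:

```python
import numpy as np

scores = np.array([[1, 2, 3],
                   [4, 5, 6]])                # shape (2, 3)

col_totals = scores.sum(axis=0)               # axis 0 collapses -> shape (3,)
row_totals = scores.sum(axis=1)               # axis 1 collapses -> shape (2,)
kept = scores.sum(axis=1, keepdims=True)      # collapsed axis kept as size 1 -> shape (2, 1)

print(col_totals)   # [5 7 9]
print(row_totals)   # [ 6 15]
print(kept.shape)   # (2, 1)
```

`keepdims=True` is handy when the result has to broadcast back against the original array.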

---

## Indexing and slicing
1D slicing:
```python
a = np.array([10, 20, 30, 40, 50])
a[0]       # 10
a[1:4]     # 20,30,40
```

2D indexing:
```python
m = np.array([[1, 2, 3], [4, 5, 6]])
m[0, 1]    # 2
m[:, 1]    # column 1
m[1, :]    # row 1
```

Fancy indexing:
```python
idx = [0, 2, 4]
a[idx]     # elements at 0,2,4
```

Boolean indexing:
```python
mask = a > 25
a[mask]
```
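
Masks can be combined before indexing. The sketch below assumes the same `a` as above and uses the element-wise operators `&`, `|`, and `~` (the Python keywords `and`/`or` do not work on arrays).

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

# Parentheses are required: & binds more tightly than the comparisons.
in_range = (a > 15) & (a < 45)
selected = a[in_range]              # [20 30 40]
outside = a[~in_range]              # [10 50]

# np.where picks element-wise between two alternatives based on a mask.
capped = np.where(a > 25, 25, a)    # [10 20 25 25 25]
```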

---

## Vectorized operations and ufuncs
NumPy operates element-wise without explicit Python loops.
```python
x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

x + y
x * y
np.sqrt(x)
np.exp(x)
```

Ufuncs (universal functions) accept an `out` argument that writes the result into an existing array instead of allocating a new one:
```python
np.add(x, y, out=x)
```
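
One caveat worth sketching: the `out` array must be able to hold the result type. Under NumPy's default casting rule, writing a float result into an integer array is refused rather than silently truncated. The variable names below are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([10, 20, 30])

result = np.add(x, y, out=x)   # writes into x and returns that same array
print(result is x, x)          # True [11 22 33]

# int * float produces float64, which cannot be written into an int array.
try:
    np.multiply(x, 0.5, out=x)
    cast_failed = False
except TypeError:
    cast_failed = True
print(cast_failed)             # True
```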

---

## Broadcasting rules
Broadcasting lets arrays of different shapes work together.
Rules (simplified):
1) Compare shapes from right to left (trailing dimensions first).
2) Two dimensions are compatible if they are equal or one of them is 1.
3) Missing leading dimensions are treated as 1.

Example:
```python
A = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2,3)
b = np.array([10, 20, 30])            # shape (3,)
A + b  # b is broadcast to (2,3)
```
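
The rules can be checked without doing any arithmetic. The sketch below assumes NumPy 1.20 or newer, where `np.broadcast_shapes` is available, and also shows a pair of shapes that fails rule 2.

```python
import numpy as np

A = np.ones((2, 3))
col = np.array([[10], [20]])                  # shape (2, 1)

print((A + col).shape)                        # (2, 3): the size-1 axis is stretched
print(np.broadcast_shapes((2, 3), (3,)))      # (2, 3): missing leading dim treated as 1
print(np.broadcast_shapes((2, 1), (1, 3)))    # (2, 3)

# Trailing dimensions 3 and 2 are neither equal nor 1, so this fails.
try:
    A + np.array([1, 2])
    incompatible = False
except ValueError:
    incompatible = True
print(incompatible)                           # True
```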

---

## Aggregations and statistics
Common reductions:
```python
a = np.array([[1, 2, 3], [4, 5, 6]])

a.sum()           # 21

a.sum(axis=0)     # column-wise

a.sum(axis=1)     # row-wise

np.mean(a)
np.min(a)
np.max(a)
np.std(a)
```

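Missing values (the `nan*` reductions skip `np.nan` entries):
```python
f = np.array([1.0, np.nan, 3.0])
np.mean(f)        # nan, because NaN propagates
np.nanmean(f)     # 2.0
np.nanstd(f)      # 1.0
```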

---

## Reshaping and stacking
```python
a = np.arange(6)
a.reshape(2, 3)

b = a.reshape(3, 2)
b.T                # transpose
```

Stacking and concatenation:
```python
x = np.array([[1, 2]])
y = np.array([[3, 4]])

np.concatenate([x, y], axis=0)
np.vstack([x, y])
np.hstack([x, y])
```

Flattening:
```python
a.ravel()   # view when possible

a.flatten() # copy
```
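
What "view when possible" means can be sketched with `np.shares_memory`. The name `grid` is illustrative; the transposed case shows a layout where `ravel` has to copy.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

flat_view = grid.ravel()      # contiguous input: no copy needed
flat_copy = grid.flatten()    # always a copy
t_flat = grid.T.ravel()       # transposed data is not C-contiguous, so ravel copies

print(np.shares_memory(grid, flat_view))   # True
print(np.shares_memory(grid, flat_copy))   # False
print(np.shares_memory(grid, t_flat))      # False
```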

---

## Views vs copies
Basic slicing returns a view that shares memory with the original; fancy and boolean indexing return copies.
```python
a = np.arange(5)
view = a[1:4]
view[0] = 999
# a is now [0, 999, 2, 3, 4] because the view shares its memory
```

Make an explicit copy when needed:
```python
copy = a[1:4].copy()
```
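
A quick way to tell the two apart is the `base` attribute. The sketch below contrasts a basic slice with fancy indexing over the same positions; the variable names are illustrative.

```python
import numpy as np

a = np.arange(5)

sl = a[1:4]             # basic slice: a view onto a
fancy = a[[1, 2, 3]]    # fancy indexing: an independent copy

print(sl.base is a)     # True: the view refers back to a
fancy[0] = -1           # does not touch a
sl[0] = 999             # does modify a
print(a)                # [  0 999   2   3   4]
```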

---

## Linear algebra
```python
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

A @ B               # matrix multiply
np.dot(A, B)

np.linalg.det(A)
np.linalg.inv(A)    # use solve instead when possible
np.linalg.solve(A, np.array([1, 0]))
```
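
To make the "use solve" advice concrete, the sketch below solves the same system both ways and checks the answer by substituting it back. Both routes agree here; `solve` is generally preferred because it avoids forming the inverse explicitly. The name `rhs` is illustrative.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
rhs = np.array([1.0, 0.0])

x_solve = np.linalg.solve(A, rhs)     # solves A @ x = rhs directly
x_inv = np.linalg.inv(A) @ rhs        # same answer via the explicit inverse

print(x_solve)                         # approximately [-2.   1.5]
print(np.allclose(A @ x_solve, rhs))   # True: the solution satisfies the system
print(np.allclose(x_solve, x_inv))     # True
```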

---

## Random numbers (best practice)
Prefer the new Generator API:
```python
rng = np.random.default_rng(42)

rng.random((2, 3))       # uniform [0,1)

rng.normal(0, 1, (2, 3)) # mean 0, std 1
```
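
A brief sketch of why seeding matters: two generators created with the same seed produce the same stream, which makes experiments repeatable. The `integers` and `choice` calls are further `Generator` methods; the dice and label values are made up for illustration.

```python
import numpy as np

rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

first = rng_a.random(3)
second = rng_b.random(3)
print(np.array_equal(first, second))     # True: same seed, same numbers

dice = rng_a.integers(1, 7, size=10)     # integers from 1 to 6 (upper bound excluded)
picks = rng_a.choice(['a', 'b', 'c'], size=2, replace=False)   # two distinct labels
```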

---

## File I/O
```python
np.save('arr.npy', a)
loaded = np.load('arr.npy')

np.savetxt('arr.csv', a, delimiter=',')
loaded_txt = np.loadtxt('arr.csv', delimiter=',')
```
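
Two details are worth seeing in a round trip: `np.savez` bundles several named arrays into one `.npz` file, and a text round trip does not preserve the dtype because `np.loadtxt` returns floats by default. The sketch writes to a temporary directory; the file and key names are illustrative.

```python
import os
import tempfile

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.linspace(0, 1, 4)

with tempfile.TemporaryDirectory() as tmp:
    # Binary bundle: several arrays in one file, dtypes preserved.
    npz_path = os.path.join(tmp, "bundle.npz")
    np.savez(npz_path, ints=a, floats=b)
    with np.load(npz_path) as data:
        ints_back = data["ints"]
        floats_back = data["floats"]

    # Text round trip: values survive, the integer dtype does not.
    csv_path = os.path.join(tmp, "arr.csv")
    np.savetxt(csv_path, a, delimiter=",")
    txt_back = np.loadtxt(csv_path, delimiter=",")

print(ints_back.dtype == a.dtype)   # True
print(txt_back.dtype)               # float64
```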

---

## Performance tips and common pitfalls
- Prefer vectorized operations over Python loops
- Pre-allocate arrays if size is known
- Use appropriate dtype to reduce memory
- Avoid unnecessary copies; be mindful of views
- Watch out for integer division (`//`) vs float division (`/`)
- Check shapes early to avoid silent broadcasting bugs
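
Three of these tips can be sketched in a few lines. The sizes and names are illustrative and chosen only to keep the example small.

```python
import numpy as np

n = 5

# Pre-allocate once and fill, instead of growing an array inside a loop.
squares = np.empty(n, dtype=np.int64)
for i in range(n):
    squares[i] = i * i

# The vectorized form avoids the Python loop entirely.
squares_vec = np.arange(n) ** 2

# // keeps integers (floor division); / always produces floats.
ints = np.array([7, 8, 9])
floor_div = ints // 2      # [3 4 4]
true_div = ints / 2        # [3.5 4.  4.5]

# Silent broadcasting: (3,) combined with (3, 1) gives (3, 3), not (3,).
col = ints.reshape(3, 1)
surprise = ints + col
print(surprise.shape)      # (3, 3)
```

Printing or asserting shapes right after such an operation catches the last kind of mistake early.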


In [12]:
import numpy as np

# Very basic: a Python list accepts mixed types

marks = [85, '90', True, 92, 88]
type(marks)  # list

total = 0
for i in marks:
    total += i  # fails at '90': an int cannot be added to a str
print(total)


TypeError: unsupported operand type(s) for +=: 'int' and 'str'

In [21]:
a = np.array([85, 90, '78', '92', 88], dtype=np.float64)  # numeric strings are converted to float64
type(a)  # numpy.ndarray
a.ndim   # 1
a.shape  # (5,)
a.dtype  # float64
print(a)

total = 0  # reset; otherwise the value left over from earlier cell runs is carried along
for i in a:
    total += i
print(total)

# Conclusion:
# A NumPy array is more efficient than a list for numerical operations.
# With an explicit dtype it converts compatible values (such as numeric strings) to that type, and it exposes useful attributes such as ndim, shape, and dtype.
# It stores data in a contiguous block of memory, which allows faster access and manipulation.

[85. 90. 78. 92. 88.]
433.0


In [27]:
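# Next topic: element-wise arithmetic
# Operators: add (+), subtract (-), multiply (*), divide (/), remainder (%)
a = np.array([1, 2, 3, 4]) # ndim 1
b = np.array([5, 6, 7, 8]) # ndim 1
c = a % b  # element-wise remainder; every element of a is smaller than its partner in b, so c equals a
c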

array([1, 2, 3, 4])

In [None]:
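# 2D data
a = np.array([[1, 2, 3], [4, 5, 6]]) # ndim 2
b = np.array([[7, 8, 9], [10, 11]]) # rows of length 3 and 2: not a valid 2D array
c = a + b
c

# The error is raised while creating b, before the addition runs: the rows have different lengths, so NumPy cannot build a rectangular array.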

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

In [40]:
# Daily sales of 3 products: each row is a day, each column is a product
import pandas as pd
sales = np.array([[100, 200, 300], [400, 500, 600], [500, 600, 700]]) # ndim 2

# df = pd.DataFrame(sales, columns=["Product A", "Product B", "Product C"])
# df.describe()

sales.mean(axis=0)  # mean sales per product
# axis=0 averages down the rows (days), giving one mean per column (product)
# axis=1 would average across the columns instead, giving the mean sales for each day

array([333.33333333, 433.33333333, 533.33333333])

In [41]:
sales

array([[100, 200, 300],
       [400, 500, 600],
       [500, 600, 700]])

In [42]:
df = pd.DataFrame(sales, columns=["Product A", "Product B", "Product C"])
df.describe()

| | Product A | Product B | Product C |
|---|---|---|---|
| count | 3.0 | 3.0 | 3.0 |
| mean | 333.333333 | 433.333333 | 533.333333 |
| std | 208.1666 | 208.1666 | 208.1666 |
| min | 100.0 | 200.0 | 300.0 |
| 25% | 250.0 | 350.0 | 450.0 |
| 50% | 400.0 | 500.0 | 600.0 |
| 75% | 450.0 | 550.0 | 650.0 |
| max | 500.0 | 600.0 | 700.0 |


In [44]:
# matrix operations
mat_a = np.array([[1, 2], [3, 4]])
print('shape of mat_a:', mat_a.shape)
print(' mat_a:', mat_a)
mat_b = np.array([[5, 6], [7, 8]])
print(' mat_b:', mat_b)
print(' shape of mat_b:', mat_b.shape)
mat_c = mat_a + mat_b
print(' mat_c:', mat_c)

shape of mat_a: (2, 2)
 mat_a: [[1 2]
 [3 4]]
 mat_b: [[5 6]
 [7 8]]
 shape of mat_b: (2, 2)
 mat_c: [[ 6  8]
 [10 12]]


In [74]:
mat_a = np.array([[1, 2], [4, 5]])
print('shape of mat_a:', mat_a.shape)

mat_b = np.array([[2], [3]])
print('shape of mat_b:', mat_b.shape)

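# Multiply element-wise (this is not a matrix product)
mat_d = mat_a * mat_b
print('shape of mat_d:', mat_d.shape)
mat_d

# Broadcasting stretches mat_b from (2, 1) to (2, 2), so row 0 is multiplied by 2 and row 1 by 3.
# Reference: https://numpy.org/doc/stable/user/basics.broadcasting.html
# Tutorial: https://blog.finxter.com/numpy-broadcasting-a-simple-tutorial/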

shape of mat_a: (2, 2)
shape of mat_b: (2, 1)
shape of mat_d: (2, 2)


array([[ 2,  4],
       [12, 15]])

In [56]:
np.linspace(0, 10, num=10)

array([ 0.        ,  1.11111111,  2.22222222,  3.33333333,  4.44444444,
        5.55555556,  6.66666667,  7.77777778,  8.88888889, 10.        ])

In [65]:
a = np.arange(6)
a.ndim
b = a.reshape(3, 3) # fails: 3 x 3 needs 9 elements but a has 6; a.reshape(2, 3) would give 2 rows and 3 columns
b

ValueError: cannot reshape array of size 6 into shape (3,3)

In [68]:
a

array([0, 1, 2, 3, 4, 5])

In [72]:
a[1:3]  # a is 1D here, so this selects the elements at indices 1 and 2

array([1, 2])

In [73]:
a = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
a[a > 5]  # all elements greater than 5

array([ 6,  7,  8,  9, 10, 11, 12])

In [None]:
# Assignment: work through the official beginner guide:
# https://numpy.org/doc/stable/user/absolute_beginners.html
