# NumPy Tutorial: From Basic to Advanced

## Introduction to NumPy

NumPy (Numerical Python) is the fundamental package for scientific computing in Python. It provides:

- A powerful N-dimensional array object
- Sophisticated (broadcasting) functions
- Tools for integrating C/C++ and Fortran code
- Useful linear algebra, Fourier transform, and random number capabilities

### Installation

To install NumPy, run the following command in a notebook cell (in a terminal, drop the leading `!`):

```python
!pip install numpy
```

## 1. Basic NumPy Operations

### Importing NumPy
- NumPy is conventionally imported under the `np` alias:

In [3]:
import numpy as np

### Creating Arrays
- NumPy's main object is the homogeneous multidimensional array (called ndarray).

In [4]:
# 1D array
arr1d = np.array([1, 2, 3, 4, 5])
print("1D Array:")
print(arr1d)

# 2D array (matrix)
arr2d = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D Array:")
print(arr2d)

# 3D array
arr3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
print("\n3D Array:")
print(arr3d)

1D Array:
[1 2 3 4 5]

2D Array:
[[1 2 3]
 [4 5 6]]

3D Array:
[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
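
Because an ndarray is homogeneous, inputs of mixed types are coerced to one common dtype. A minimal sketch of that behaviour; the variable names `mixed` and `as_int` are illustrative and not part of the original notebook:

```python
import numpy as np

# Mixing integers and a float promotes every element to float64
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64

# Converting floats to an integer dtype truncates toward zero
as_int = np.array([1.9, 2.1, -3.7]).astype(np.int64)
print(as_int.tolist())  # [1, 2, -3]
```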


### Array Attributes
- NumPy arrays have several important attributes:

In [5]:
a = np.array([[1, 2, 3], [4, 5, 6]])

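print("Shape:", a.shape)                 # Length of each dimension
print("Number of dimensions:", a.ndim)   # Number of axes
print("Total elements:", a.size)         # Product of the shape
print("Data type:", a.dtype)             # Default integer type is platform-dependent
print("Item size (bytes):", a.itemsize)  # Bytes per element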

Shape: (2, 3)
Number of dimensions: 2
Total elements: 6
Data type: int32
Item size (bytes): 4
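
The `int32` and 4-byte item size reported above reflect the machine the notebook ran on; the default integer type varies by platform and NumPy version. If a fixed width matters, one option is to request the dtype explicitly, as in this small sketch:

```python
import numpy as np

# Request a 64-bit integer type regardless of the platform default
a = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)

print(a.dtype)     # int64
print(a.itemsize)  # 8 bytes per element
print(a.nbytes)    # 48 = 6 elements * 8 bytes
```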


### Special Arrays
- NumPy provides functions to create special arrays:

In [6]:
# Array of zeros
zeros = np.zeros((2, 3))
print("Zeros array:")
print(zeros)

# Array of ones
ones = np.ones((3, 2))
print("\nOnes array:")
print(ones)

# Identity matrix
identity = np.eye(3)
print("\nIdentity matrix:")
print(identity)

# Array with a range
range_arr = np.arange(0, 10, 2)  # start, stop, step
print("\nRange array:")
print(range_arr)

# Linearly spaced array
linspace_arr = np.linspace(0, 1, 5)  # start, stop, num_points
print("\nLinspace array:")
print(linspace_arr)

# Random array
random_arr = np.random.random((2, 2))
print("\nRandom array:")
print(random_arr)

Zeros array:
[[0. 0. 0.]
 [0. 0. 0.]]

Ones array:
[[1. 1.]
 [1. 1.]
 [1. 1.]]

Identity matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]

Range array:
[0 2 4 6 8]

Linspace array:
[0.   0.25 0.5  0.75 1.  ]

Random array:
[[0.35960502 0.49775543]
 [0.30204535 0.17711659]]
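
One difference that is easy to miss in the cell above: `np.arange` excludes the stop value, while `np.linspace` includes it unless `endpoint=False` is passed. A short sketch (variable names are illustrative):

```python
import numpy as np

# arange: half-open interval, so 1.0 is not included
half_open = np.arange(0, 1, 0.25)

# linspace: closed interval by default, so 1.0 is the last point
closed = np.linspace(0, 1, 5)

# endpoint=False makes linspace behave like the half-open case
no_endpoint = np.linspace(0, 1, 4, endpoint=False)

print(half_open)
print(closed)
print(no_endpoint)
```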


## 2. Array Operations


### Basic Arithmetic
- NumPy arrays support element-wise operations:

In [7]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

print("Addition:", a + b)
print("Subtraction:", a - b)
print("Element-wise multiplication:", a * b)
print("Division:", a / b)
print("Exponentiation:", a ** 2)

Addition: [5 7 9]
Subtraction: [-3 -3 -3]
Element-wise multiplication: [ 4 10 18]
Division: [0.25 0.4  0.5 ]
Exponentiation: [1 4 9]


### Dot Product
- The dot product can be calculated using np.dot() or the @ operator:

In [8]:
print("Dot product:", np.dot(a, b))
print("Alternative syntax:", a @ b)

Dot product: 32
Alternative syntax: 32
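
For 1D arrays, the dot product is the sum of the element-wise products. The following sketch checks that against `np.dot` for the same `a` and `b`:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# 1*4 + 2*5 + 3*6
manual = np.sum(a * b)

print(manual)                  # 32
print(manual == np.dot(a, b))  # True
```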


### Matrix Multiplication
- For 2D arrays, the @ operator performs matrix multiplication rather than element-wise multiplication:

In [9]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print("Matrix multiplication:")
print(A @ B)

Matrix multiplication:
[[19 22]
 [43 50]]
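
To make the contrast with element-wise multiplication concrete, here is a small sketch using the same `A` and `B`. The extra arrays `M` and `N` are illustrative and only show the shape rule `(m, k) @ (k, n) -> (m, n)`:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

elementwise = A * B  # multiplies matching positions
matmul = A @ B       # rows of A times columns of B

print(elementwise.tolist())  # [[5, 12], [21, 32]]
print(matmul.tolist())       # [[19, 22], [43, 50]]

# Inner dimensions must match: (2, 3) @ (3, 2) gives (2, 2)
M = np.ones((2, 3))
N = np.ones((3, 2))
print((M @ N).shape)  # (2, 2)
```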


### Universal Functions (ufuncs)
- NumPy provides many mathematical functions that operate element-wise:

In [10]:
print("Square roots:", np.sqrt(a))
print("Exponentials:", np.exp(a))
print("Sines:", np.sin(a))
print("Natural logs:", np.log(a))

Square roots: [1.         1.41421356 1.73205081]
Exponentials: [ 2.71828183  7.3890561  20.08553692]
Sines: [0.84147098 0.90929743 0.14112001]
Natural logs: [0.         0.69314718 1.09861229]
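
Ufuncs are not limited to one input or to 1D arrays. As a brief sketch, `np.maximum` takes two arrays, `np.add.reduce` collapses an array with the ufunc, and `np.sqrt` works on any shape (the arrays here are illustrative):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([3, 2, 1])

# Binary ufunc: element-wise maximum of two arrays
larger = np.maximum(a, b)
print(larger.tolist())  # [3, 2, 3]

# Every binary ufunc has a reduce method; for np.add this is a sum
total = np.add.reduce(a)
print(total)  # 6

# Unary ufuncs keep the input shape
m = np.array([[1.0, 4.0], [9.0, 16.0]])
roots = np.sqrt(m)
print(roots.tolist())  # [[1.0, 2.0], [3.0, 4.0]]
```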


## 3. Indexing and Slicing

### 1D Arrays
- Indexing works similarly to Python lists:

In [11]:
arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

print("Element at index 3:", arr[3])
print("Slice from index 2 to 5:", arr[2:5])
print("First 5 elements:", arr[:5])
print("Elements from index 5 onward:", arr[5:])
print("Every second element:", arr[::2])
print("Reversed array:", arr[::-1])

Element at index 3: 3
Slice from index 2 to 5: [2 3 4]
First 5 elements: [0 1 2 3 4]
Elements from index 5 onward: [5 6 7 8 9]
Every second element: [0 2 4 6 8]
Reversed array: [9 8 7 6 5 4 3 2 1 0]
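
One place where arrays differ from lists: a basic slice of an array is a view onto the same memory, whereas a list slice is a copy. A minimal sketch of the difference (variable names are illustrative):

```python
import numpy as np

arr = np.arange(10)

# Writing through a slice changes the original array
view = arr[2:5]
view[0] = 99
print(arr[2])  # 99

# An explicit copy is independent of the original
independent = arr[2:5].copy()
independent[0] = -1
print(arr[2])  # still 99

# A list slice is always a copy
py_list = list(range(10))
sub = py_list[2:5]
sub[0] = 99
print(py_list[2])  # 2
```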


### 2D Arrays
- For multidimensional arrays, use comma-separated indices:

In [12]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print("Element at row 0, column 1:", arr[0, 1])
print("All rows, column 1:", arr[:, 1])
print("Rows from 1 onward, first two columns:", arr[1:, :2])

Element at row 0, column 1: 2
All rows, column 1: [2 5 8]
Rows from 1 onward, first two columns: [[4 5]
 [7 8]]
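
Negative indices and lists of indices work per axis as well. A short sketch on the same 3x3 array; the particular selections are illustrative:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

last_row = arr[-1]      # negative index counts from the end
last_col = arr[:, -1]   # last column of every row
some_rows = arr[[0, 2]] # a list of indices selects rows 0 and 2

# Paired index lists pick individual elements: (0, 1) and (2, 0)
picked = arr[[0, 2], [1, 0]]

print(last_row.tolist())   # [7, 8, 9]
print(last_col.tolist())   # [3, 6, 9]
print(some_rows.tolist())  # [[1, 2, 3], [7, 8, 9]]
print(picked.tolist())     # [2, 7]
```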


### Boolean Indexing
- You can use boolean conditions to index arrays:

In [14]:
arr = np.array([1, 2, 3, 4, 5, 6])

print("Elements greater than 3:", arr[arr > 3])
print("Elements between 2 and 5:", arr[(arr > 2) & (arr < 5)])

Elements greater than 3: [4 5 6]
Elements between 2 and 5: [3 4]
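
Boolean masks can also be used for assignment and combined with `|`, and `np.where` selects between two values per element. A minimal sketch, with illustrative variable names:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# Assign through a mask: cap every value above 4
capped = arr.copy()
capped[capped > 4] = 4
print(capped.tolist())  # [1, 2, 3, 4, 4, 4]

# Combine conditions with | (or); each condition needs its own parentheses
ends = arr[(arr < 2) | (arr > 5)]
print(ends.tolist())  # [1, 6]

# np.where picks from two alternatives according to the mask
labels = np.where(arr % 2 == 0, "even", "odd")
print(labels.tolist())  # ['odd', 'even', 'odd', 'even', 'odd', 'even']
```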


## 4. Array Manipulation


### Reshaping
- Change the shape of an array without changing its data:

In [15]:
arr = np.arange(12)

print("Reshaped to 3x4:")
print(arr.reshape(3, 4))

print("\nReshaped to 2x3x2:")
print(arr.reshape(2, 3, 2))

Reshaped to 3x4:
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Reshaped to 2x3x2:
[[[ 0  1]
  [ 2  3]
  [ 4  5]]

 [[ 6  7]
  [ 8  9]
  [10 11]]]
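
A reshape must preserve the total number of elements. Passing `-1` for one dimension asks NumPy to infer it, and an incompatible shape raises `ValueError`. A short sketch:

```python
import numpy as np

arr = np.arange(12)

# -1 lets NumPy work out the missing dimension
three_rows = arr.reshape(3, -1)
six_cols = arr.reshape(-1, 6)
print(three_rows.shape)  # (3, 4)
print(six_cols.shape)    # (2, 6)

# 12 elements cannot fill a 5x3 array
try:
    arr.reshape(5, 3)
    failed = False
except ValueError as exc:
    failed = True
    print("Cannot reshape:", exc)
```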


### Flattening
- Convert multidimensional arrays to 1D:

In [16]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

print("Flattened array (copy):", arr.flatten())
print("Raveled array (view if possible):", arr.ravel())

Flattened array (copy): [1 2 3 4 5 6]
Raveled array (view if possible): [1 2 3 4 5 6]
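
The printed results are identical, so the copy-versus-view distinction only shows up when memory is inspected or the result is modified. A sketch using `np.shares_memory`:

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

flat = arr.flatten()  # always a new array
rav = arr.ravel()     # a view here, because arr is contiguous

print(np.shares_memory(arr, flat))  # False
print(np.shares_memory(arr, rav))   # True

# Writing through the raveled view changes the original
rav[0] = 100
print(arr[0, 0])  # 100

# A transposed array is not C-contiguous, so ravel has to copy
transposed = arr.T
print(np.shares_memory(transposed, transposed.ravel()))  # False
```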


### Concatenation
- Combine arrays along existing axes:

In [17]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

print("Vertical stack:")
print(np.vstack((a, b)))

c = np.array([[7], [8]])
print("\nHorizontal stack:")
print(np.hstack((a, c)))

Vertical stack:
[[1 2]
 [3 4]
 [5 6]]

Horizontal stack:
[[1 2 7]
 [3 4 8]]
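
For 2D inputs, `np.vstack` and `np.hstack` correspond to `np.concatenate` along axis 0 and axis 1. By contrast, `np.stack` joins arrays along a new axis. A brief sketch with the same `a`, `b`, and `c`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
c = np.array([[7], [8]])

rows = np.concatenate((a, b), axis=0)  # same result as np.vstack((a, b))
cols = np.concatenate((a, c), axis=1)  # same result as np.hstack((a, c))

# np.stack adds a new leading axis instead of extending an existing one
stacked = np.stack((a, a))

print(rows.shape)     # (3, 2)
print(cols.shape)     # (2, 3)
print(stacked.shape)  # (2, 2, 2)
```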


### Splitting
- Divide arrays into multiple sub-arrays:

In [18]:
arr = np.arange(12).reshape(3, 4)

print("Split vertically:")
print(np.split(arr, 3))

print("\nSplit horizontally:")
print(np.hsplit(arr, 2))

Split vertically:
[array([[0, 1, 2, 3]]), array([[4, 5, 6, 7]]), array([[ 8,  9, 10, 11]])]

Split horizontally:
[array([[0, 1],
       [4, 5],
       [8, 9]]), array([[ 2,  3],
       [ 6,  7],
       [10, 11]])]
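
`np.split` also accepts a list of split positions, and `np.array_split` tolerates sizes that do not divide evenly. A minimal sketch; the chosen split points are illustrative:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Split the columns at positions 1 and 3: column 0 | columns 1-2 | column 3
left, middle, right = np.split(arr, [1, 3], axis=1)
print(left.shape, middle.shape, right.shape)  # (3, 1) (3, 2) (3, 1)

# np.split would raise for 10 elements in 3 parts; array_split does not
parts = np.array_split(np.arange(10), 3)
sizes = [len(p) for p in parts]
print(sizes)  # [4, 3, 3]
```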


## 5. Advanced NumPy Operations


### Broadcasting
- Broadcasting lets NumPy combine arrays of different but compatible shapes, here by applying the 1D array `b` to every row of `a`:

In [19]:
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([10, 20, 30])

print("Broadcasted addition:")
print(a + b)

Broadcasted addition:
[[11 22 33]
 [14 25 36]]
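
Shapes are compared from the trailing dimension backwards, and each pair of dimensions must be equal or contain a 1. The sketch below broadcasts a column vector across columns and shows an incompatible case; the array `col` is illustrative:

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
col = np.array([[100], [200]])        # shape (2, 1)

# The size-1 axis of col is stretched across the three columns of a
result = a + col
print(result.tolist())  # [[101, 102, 103], [204, 205, 206]]

# Shapes (2, 3) and (2,) do not match on the trailing axis (3 vs 2)
try:
    a + np.array([1, 2])
    raised = False
except ValueError as exc:
    raised = True
    print("Broadcast failed:", exc)
```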


### Vectorization
- Vectorized operations are usually far faster than Python loops because the iteration runs in compiled code:

In [20]:
# Non-vectorized approach
def slow_add(a, b):
    result = np.empty(len(a))
    for i in range(len(a)):
        result[i] = a[i] + b[i]
    return result

# Vectorized approach
def fast_add(a, b):
    return a + b

# Compare performance
large_a = np.random.rand(1000000)
large_b = np.random.rand(1000000)

%timeit slow_add(large_a, large_b)
%timeit fast_add(large_a, large_b)

348 ms ± 23.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.75 ms ± 20.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
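
`%timeit` is an IPython magic and is not available in a plain Python script. A rough equivalent using the standard-library `timeit` module is sketched below; the array size and repeat count are arbitrary choices, and the measured times will differ by machine:

```python
import timeit

import numpy as np


def slow_add(a, b):
    """Add two arrays element by element in a Python loop."""
    result = np.empty(len(a))
    for i in range(len(a)):
        result[i] = a[i] + b[i]
    return result


x = np.random.rand(100_000)
y = np.random.rand(100_000)

# Average the wall-clock time over a few runs of each approach
runs = 3
loop_time = timeit.timeit(lambda: slow_add(x, y), number=runs) / runs
vec_time = timeit.timeit(lambda: x + y, number=runs) / runs

print(f"loop:       {loop_time * 1e3:.2f} ms")
print(f"vectorized: {vec_time * 1e3:.3f} ms")
```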


### Saving and Loading Arrays
- NumPy provides I/O functions for its binary `.npy` format and for plain text:

In [21]:
arr = np.arange(10)

# Save to binary file
np.save('my_array.npy', arr)

# Load from binary file
loaded_arr = np.load('my_array.npy')

# Save to text file
np.savetxt('my_array.txt', arr)

# Load from text file (values are read back as floats by default)
loaded_txt = np.loadtxt('my_array.txt')

print("Original array:", arr)
print("Loaded from .npy:", loaded_arr)
print("Loaded from .txt:", loaded_txt)

Original array: [0 1 2 3 4 5 6 7 8 9]
Loaded from .npy: [0 1 2 3 4 5 6 7 8 9]
Loaded from .txt: [0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
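
Two related options, sketched under the assumption that a temporary directory is acceptable for the files: `np.savez` stores several named arrays in one `.npz` file, and passing `fmt` and `dtype` keeps integers as integers in the text round trip. The file names are hypothetical:

```python
import os
import tempfile

import numpy as np

ints = np.arange(5)
floats = np.linspace(0, 1, 3)

with tempfile.TemporaryDirectory() as tmp:
    # Several arrays in one archive, retrieved by keyword name
    bundle_path = os.path.join(tmp, "bundle.npz")
    np.savez(bundle_path, ints=ints, floats=floats)
    with np.load(bundle_path) as bundle:
        ints_back = bundle["ints"]
        floats_back = bundle["floats"]

    # Write integers without a decimal part and read them back as ints
    txt_path = os.path.join(tmp, "ints.txt")
    np.savetxt(txt_path, ints, fmt="%d")
    txt_back = np.loadtxt(txt_path, dtype=int)

print(ints_back)
print(floats_back)
print(txt_back)
```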


## 6. Linear Algebra with NumPy
- NumPy's `linalg` module provides common linear algebra operations:

In [22]:
from numpy import linalg

A = np.array([[1, 2], [3, 4]])

print("Determinant:", linalg.det(A))
print("\nInverse:")
print(linalg.inv(A))

print("\nEigenvalues and eigenvectors:")
eigenvalues, eigenvectors = linalg.eig(A)
print("Eigenvalues:", eigenvalues)
print("Eigenvectors:")
print(eigenvectors)

# Solve the linear system A x = B
B = np.array([5, 6])
print("\nSolution to linear equations:")
print(linalg.solve(A, B))

Determinant: -2.0000000000000004

Inverse:
[[-2.   1. ]
 [ 1.5 -0.5]]

Eigenvalues and eigenvectors:
Eigenvalues: [-0.37228132  5.37228132]
Eigenvectors:
[[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]

Solution to linear equations:
[-4.   4.5]
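
Floating-point results such as the determinant above are rarely exact, so it can help to verify them with `np.allclose` rather than `==`. A sketch that checks the solution, the inverse, and the eigen-decomposition for the same `A` and `B`:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([5.0, 6.0])

# The solution x should satisfy A @ x = B
x = np.linalg.solve(A, B)
solve_ok = np.allclose(A @ x, B)

# A times its inverse should be the identity matrix
inverse_ok = np.allclose(A @ np.linalg.inv(A), np.eye(2))

# Each eigenvector column v_i should satisfy A @ v_i = w_i * v_i
w, v = np.linalg.eig(A)
eig_ok = np.allclose(A @ v, v * w)

# The determinant equals the product of the eigenvalues
det_ok = np.isclose(np.prod(w), np.linalg.det(A))

print(solve_ok, inverse_ok, eig_ok, det_ok)
```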


## 7. Random Module
- NumPy's random module draws samples from various distributions:

In [23]:
# Uniform random numbers in [0, 1)
print("Random 3x2 array:")
print(np.random.rand(3, 2))

# Random integers
print("\nRandom integers (0-9):")
print(np.random.randint(0, 10, size=(2, 3)))

# Normal distribution
print("\nNormal distribution samples:")
print(np.random.normal(0, 1, 5))

# Shuffle array
arr = np.arange(10)
np.random.shuffle(arr)
print("\nShuffled array:")
print(arr)

Random 3x2 array:
[[0.45668589 0.88793494]
 [0.80682843 0.56566332]
 [0.62203752 0.85076104]]

Random integers (0-9):
[[8 8 2]
 [5 7 3]]

Normal distribution samples:
[ 0.50650435  1.14021489 -0.50356883  0.48890505 -0.40008466]

Shuffled array:
[6 3 1 0 4 2 9 5 7 8]
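
The outputs above change on every run. For reproducible results, NumPy's documentation recommends the `Generator` interface created by `np.random.default_rng` for new code. The sketch below repeats the same four operations with a seed; the seed value 42 is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

uniform = rng.random((3, 2))                # uniform samples in [0, 1)
integers = rng.integers(0, 10, size=(2, 3)) # integers 0-9; upper bound excluded
normal = rng.normal(0, 1, 5)                # mean 0, standard deviation 1

arr = np.arange(10)
rng.shuffle(arr)                            # shuffles in place

# A second generator with the same seed reproduces the same stream
rng_again = np.random.default_rng(seed=42)
same = np.array_equal(rng_again.random((3, 2)), uniform)
print(same)  # True
```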


## 8. Performance Tips

### Pre-allocation
- Pre-allocate arrays instead of growing them; `np.append` copies the entire array on every call:

In [24]:
# Bad approach (growing array)
result = np.array([])
for i in range(1000):
    result = np.append(result, i)

# Good approach (pre-allocated)
result = np.empty(1000)
for i in range(1000):
    result[i] = i
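
Two further alternatives, offered as a sketch rather than a rule: when the values follow a pattern, build the array in one call, and when they arrive one at a time and the count is unknown, collect them in a Python list and convert once. The `i * 0.5` values are purely illustrative:

```python
import numpy as np

# Same values as the loops above, created without a Python loop
result = np.arange(1000, dtype=float)

# Unknown length: list.append is cheap, and the conversion happens once
values = []
for i in range(1000):
    values.append(i * 0.5)
from_list = np.array(values)

print(result[:3])     # [0. 1. 2.]
print(from_list[:3])  # [0.  0.5 1. ]
```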

### Vectorized Operations
- Avoid explicit loops when possible:

In [25]:
# Bad (explicit loop)
def slow_square(arr):
    result = np.empty(len(arr))
    for i in range(len(arr)):
        result[i] = arr[i] ** 2
    return result

# Good (vectorized)
def fast_square(arr):
    return arr ** 2

# Compare
large_arr = np.random.rand(1000000)
%timeit slow_square(large_arr)
%timeit fast_square(large_arr)

291 ms ± 17.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
3.31 ms ± 62.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## 9. Advanced Techniques


### Structured Arrays
- Create arrays with compound data types:

In [27]:
# Define data type
dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f8')]

# Create structured array
data = np.array([('Alice', 25, 1.65), ('Bob', 30, 1.8)], dtype=dtype)

# Access fields
print("Names:", data['name'])
print("\nFirst record's age:", data[0]['age'])

Names: ['Alice' 'Bob']

First record's age: 25
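
Fields can also drive sorting and filtering. In the sketch below, the third record (`'Carol'`) is a hypothetical addition so that the ordering is visible:

```python
import numpy as np

dtype = [('name', 'U10'), ('age', 'i4'), ('height', 'f8')]
data = np.array(
    [('Alice', 25, 1.65), ('Bob', 30, 1.8), ('Carol', 22, 1.7)],
    dtype=dtype,
)

# Sort whole records by one field
by_age = np.sort(data, order='age')
print(by_age['name'].tolist())  # ['Carol', 'Alice', 'Bob']

# Filter records with a boolean mask built from a field
tall = data[data['height'] > 1.68]
print(tall['name'].tolist())  # ['Bob', 'Carol']
```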


### Masked Arrays
- Handle missing or invalid data:

In [28]:
import numpy.ma as ma

arr = np.array([1, 2, 3, -1, 5])
masked_arr = ma.masked_where(arr < 0, arr)

print("Masked array:")
print(masked_arr)
print("\nMean (ignoring masked values):", masked_arr.mean())

Masked array:
[1 2 3 -- 5]

Mean (ignoring masked values): 2.75
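
When missing data is encoded as NaN or infinity rather than a sentinel such as -1, `ma.masked_invalid` builds the mask directly. A minimal sketch with illustrative readings:

```python
import numpy as np
import numpy.ma as ma

readings = np.array([1.0, np.nan, 3.0, np.inf, 5.0])

# Mask every NaN and infinite entry
masked = ma.masked_invalid(readings)

valid_count = masked.count()  # number of unmasked values
valid_mean = masked.mean()    # mean of 1.0, 3.0, 5.0
filled = masked.filled(0.0)   # plain ndarray with masked slots set to 0.0

print(valid_count)      # 3
print(valid_mean)       # 3.0
print(filled.tolist())  # [1.0, 0.0, 3.0, 0.0, 5.0]
```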
