# NumPy

NumPy (Numerical Python) is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays efficiently.

**Key advantages of NumPy include:**
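- Performance: NumPy's array operations run in compiled C code, so vectorized operations are typically much faster than equivalent Python loops over lists
- Memory efficiency: NumPy arrays store elements of a single type in one contiguous buffer, which is more compact than a Python list of separate objects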
- Convenience: NumPy provides a wide range of mathematical functions and tools
- Integration: NumPy integrates well with other scientific Python libraries
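
The memory claim can be checked roughly with a small sketch. The names `n`, `py_list`, and `np_arr` are illustrative, and the list estimate (the list object plus the integer objects it references) assumes CPython's object model, so treat the numbers as indicative only:

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# Bytes used by the array's data buffer: n elements * 8 bytes each.
array_bytes = np_arr.nbytes

# Rough estimate for the list: the list object itself (its pointer table)
# plus every int object it references.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

print("NumPy data buffer:", array_bytes, "bytes")
print("Python list (approx.):", list_bytes, "bytes")
```

The array figure excludes the small fixed overhead of the array object itself, which does not grow with the number of elements.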

In [None]:
# Importing NumPy
import numpy as np

## Creating NumPy Arrays

NumPy's main object is the homogeneous multidimensional array (ndarray): a grid of elements that all share one data type. Let's create some arrays:

In [None]:
# Creating a basic array from a list
arr1 = np.array([1, 2, 3, 4, 5])
print("1D array:", arr1)

# Creating a 2D array (matrix)
arr2 = np.array([[1, 2, 3], [4, 5, 6]])
print("\n2D array:")
print(arr2)

# Creating arrays with specific values
zeros = np.zeros((3, 4))  # 3x4 array of zeros
print("\nArray of zeros:")
print(zeros)

ones = np.ones((2, 3))  # 2x3 array of ones
print("\nArray of ones:")
print(ones)

# Creating an array with a range of values
range_arr = np.arange(0, 10, 2)  # Start, stop, step
print("\nRange array:", range_arr)

# Creating an array with evenly spaced values
lin_space = np.linspace(0, 1, 5)  # Start, stop, number of points
print("\nLinear space array:", lin_space)

# Creating an identity matrix
identity = np.eye(3)
print("\nIdentity matrix:")
print(identity)

1D array: [1 2 3 4 5]

2D array:
[[1 2 3]
 [4 5 6]]

Array of zeros:
[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]

Array of ones:
[[1. 1. 1.]
 [1. 1. 1.]]

Range array: [0 2 4 6 8]

Linear space array: [0.   0.25 0.5  0.75 1.  ]

Identity matrix:
[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]
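
One detail worth seeing side by side is how `np.arange` and `np.linspace` treat the stop value. The sketch below is a small illustration with made-up variable names; it also shows `np.full`, which fills an array with an arbitrary constant:

```python
import numpy as np

# arange excludes the stop value: 0, 0.25, 0.5, 0.75
stepped = np.arange(0, 1, 0.25)

# linspace includes the stop value by default: 0, 0.25, 0.5, 0.75, 1.0
spaced = np.linspace(0, 1, 5)

# endpoint=False makes linspace exclude the stop value as well
half_open = np.linspace(0, 1, 4, endpoint=False)

# A 2x3 array filled with a constant
filled = np.full((2, 3), 7)

print(stepped)
print(spaced)
print(half_open)
print(filled)
```

For non-integer steps, `np.linspace` is usually the safer choice because the number of points is stated explicitly instead of being derived from a floating-point step.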


## NumPy Array Attributes
NumPy arrays have several attributes that provide information about their structure:
- shape: A tuple with the size of the array along each dimension; for a 2D array this is (rows, columns)
- ndim: The number of dimensions
- size: The total number of elements
- dtype: The data type of the elements

In [None]:
# Create a sample array
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Display array attributes
print("Array:")
print(arr)
print("\nShape:", arr.shape)
print("Dimensions:", arr.ndim)
print("Size:", arr.size)
print("Data type:", arr.dtype)

# Arrays with different data types
float_arr = np.array([1.0, 2.0, 3.0])
print("\nFloat array:")
print(float_arr)
print("\nFloat array data type:", float_arr.dtype)

complex_arr = np.array([1+2j, 3+4j])
print("\nComplex array:")
print(complex_arr)
print("Complex array data type:", complex_arr.dtype)

Array:
[[1 2 3]
 [4 5 6]]

Shape: (2, 3)
Dimensions: 2
Size: 6
Data type: int64

Float array:
[1. 2. 3.]

Float array data type: float64

Complex array:
[1.+2.j 3.+4.j]
Complex array data type: complex128
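
The default integer type shown above (int64) can differ by platform and NumPy version, so it can help to set the data type explicitly. The following sketch, with illustrative variable names, shows how `dtype`, `astype`, `itemsize`, and `nbytes` relate:

```python
import numpy as np

# Request a specific element type instead of relying on the default
arr = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int32)

# astype returns a converted copy; the original array is unchanged
as_float = arr.astype(np.float64)

print("dtype:", arr.dtype)            # int32
print("itemsize:", arr.itemsize)      # bytes per element
print("nbytes:", arr.nbytes)          # size * itemsize
print("converted dtype:", as_float.dtype)

# Attributes generalize beyond 2D arrays
cube = np.zeros((2, 3, 4))
print(cube.shape, cube.ndim, cube.size)
```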


## Indexing and Slicing NumPy Arrays
NumPy provides powerful indexing and slicing capabilities for accessing and modifying array elements:
- Individual elements can be accessed using indices
- Subarrays can be extracted using slicing
- Boolean indexing can be used to select elements based on conditions

In [None]:
# Create a 2D array
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print("Original array:")
print(arr)

# Indexing
print("\nElement at position (1,2):", arr[1, 2])  # Row 1, column 2 (indices are zero-based)

# Slicing
print("\nFirst two rows:")
print(arr[0:2])

print("\nAll rows, columns 1 to 3:")
print(arr[:, 1:3])  # The stop index 3 is exclusive, so this selects columns 1 and 2

# Fancy indexing
print("\nFirst and third rows:")
print(arr[[0, 2]])

# Boolean indexing
print("\nElements greater than 6:")
print(arr[arr > 6])

# Reshaping the array
reshaped = arr.reshape((2, 6))  # Reshape to 2x6 array
print("\nReshaped array (2x6):")
print(reshaped)


Original array:
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]

Element at position (1,2): 7

First two rows:
[[1 2 3 4]
 [5 6 7 8]]

All rows, columns 1 to 3:
[[ 2  3]
 [ 6  7]
 [10 11]]

First and third rows:
[[ 1  2  3  4]
 [ 9 10 11 12]]

Elements greater than 6:
[ 7  8  9 10 11 12]

Reshaped array (2x6):
[[ 1  2  3  4  5  6]
 [ 7  8  9 10 11 12]]
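
A point the cell above does not show is that basic slices return views of the original data, while fancy and boolean indexing return copies. The sketch below illustrates the difference; the variable names are only for demonstration:

```python
import numpy as np

arr = np.arange(1, 13).reshape(3, 4)

# Basic slicing returns a view: writing to it changes arr
view = arr[0:2]
view[0, 0] = -5

# Fancy indexing returns a copy: writing to it leaves arr alone
copy = arr[[0, 2]]
copy[0, 1] = -1

# Boolean masks can also be used on the left side of an assignment
arr[arr > 10] = 0

print(arr)
print("view shares memory:", np.shares_memory(arr, view))
print("copy shares memory:", np.shares_memory(arr, copy))
```

When an independent array is needed from a slice, calling `.copy()` on the slice avoids accidental changes to the original.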


## Basic Array Operations

NumPy provides a wide range of operations that can be performed on arrays:
- Arithmetic operations (addition, subtraction, multiplication, division)
- Element-wise operations
- Matrix operations
- Comparison operations

In [None]:
# Create two arrays
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Arithmetic operations
print("a + b =", a + b)  # Addition
print("a - b =", a - b)  # Subtraction
print("a * b =", a * b)  # Element-wise multiplication
print("a / b =", a / b)  # Element-wise division

# Matrix multiplication
a_2d = np.array([[1, 2], [3, 4]])
b_2d = np.array([[5, 6], [7, 8]])
print("\nMatrix product (a_2d @ b_2d):")
print(a_2d @ b_2d)  # or np.matmul(a_2d, b_2d)

# Comparison operations
print("\nComparison (a < b):", a < b)
print("All elements in a < b?", np.all(a < b))
print("Any element in a > 2?", np.any(a > 2))


a + b = [5 7 9]
a - b = [-3 -3 -3]
a * b = [ 4 10 18]
a / b = [0.25 0.4  0.5 ]

Matrix product (a_2d @ b_2d):
[[19 22]
 [43 50]]

Comparison (a < b): [ True  True  True]
All elements in a < b? True
Any element in a > 2? True


## Broadcasting

Broadcasting is a powerful mechanism that allows NumPy to work with arrays of different shapes during arithmetic operations. The smaller array is "broadcast" to match the shape of the larger array.

### Broadcasting rules:
- If the arrays don't have the same rank (number of dimensions), prepend 1s to the shape of the lower-rank array until both shapes have the same length
- If the sizes differ in a dimension and one of them is 1, the array with size 1 in that dimension is stretched (virtually repeated) to match the other array
- If the sizes differ in a dimension and neither is 1, NumPy raises an error

In [None]:
# Create arrays with different shapes
a = np.array([[1, 2, 3], [4, 5, 6]])  # 2x3 array
b = np.array([10, 20, 30])  # 1D array with 3 elements

# Broadcasting
print("Array a:")
print(a)
print("\nArray b:", b)
print("\na + b (broadcasting):")
print(a + b)  # b is broadcast to shape (2,3)

# Another broadcasting example
c = np.array([[1], [2]])  # 2x1 array
print("\nArray c:")
print(c)
print("\na + c (broadcasting):")
print(a + c)  # c is broadcast to shape (2,3)


Array a:
[[1 2 3]
 [4 5 6]]

Array b: [10 20 30]

a + b (broadcasting):
[[11 22 33]
 [14 25 36]]

Array c:
[[1]
 [2]]

a + c (broadcasting):
[[2 3 4]
 [6 7 8]]
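
The third rule (incompatible shapes) is not exercised above. The sketch below checks shapes with `np.broadcast_shapes`, which assumes NumPy 1.20 or newer, and shows one way to fix a mismatch by adding an axis:

```python
import numpy as np

a = np.ones((2, 3))

# Ask NumPy what shape a broadcast would produce, without computing anything
shape_row = np.broadcast_shapes((2, 3), (3,))
shape_col = np.broadcast_shapes((2, 3), (2, 1))
print(shape_row, shape_col)

# Shapes (2, 3) and (2,) disagree in the last dimension (3 vs 2), so this fails
failed = False
try:
    a + np.ones(2)
except ValueError as exc:
    failed = True
    print("Broadcast failed:", exc)

# Adding a trailing axis turns shape (2,) into (2, 1), which is compatible
fixed = a + np.ones(2)[:, np.newaxis]
print(fixed.shape)
```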


## Mathematical Functions in NumPy
NumPy provides a comprehensive set of mathematical functions that operate element-wise on arrays:
- Trigonometric functions (sin, cos, tan)
- Exponential and logarithmic functions
- Rounding functions (floor, ceil, round)
- And many more

In [None]:
# Create an array
a = np.array([0, 30, 45, 60, 90])
a_rad = np.radians(a)  # Convert to radians

# Trigonometric functions
print("Array (degrees):", a)
print("Array (radians):", a_rad)
print("\nSine:", np.sin(a_rad))
print("Cosine:", np.cos(a_rad))

# Exponential and logarithmic functions
b = np.array([1, 2, 3, 4])
print("\nArray b:", b)
print("Exponential (e^x):", np.exp(b))
print("Natural logarithm (ln):", np.log(b))
print("Base-10 logarithm:", np.log10(b))

# Rounding functions
c = np.array([1.2, 3.7, 5.5, -2.8])
print("\nArray c:", c)
print("Rounded:", np.round(c))  # Halves round to the nearest even value, so 5.5 -> 6
print("Ceiling:", np.ceil(c))
print("Floor:", np.floor(c))

Array (degrees): [ 0 30 45 60 90]
Array (radians): [0.         0.52359878 0.78539816 1.04719755 1.57079633]

Sine: [0.         0.5        0.70710678 0.8660254  1.        ]
Cosine: [1.00000000e+00 8.66025404e-01 7.07106781e-01 5.00000000e-01
 6.12323400e-17]

Array b: [1 2 3 4]
Exponential (e^x): [ 2.71828183  7.3890561  20.08553692 54.59815003]
Natural logarithm (ln): [0.         0.69314718 1.09861229 1.38629436]
Base-10 logarithm: [0.         0.30103    0.47712125 0.60205999]

Array c: [ 1.2  3.7  5.5 -2.8]
Rounded: [ 1.  4.  6. -3.]
Ceiling: [ 2.  4.  6. -2.]
Floor: [ 1.  3.  5. -3.]
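
Two details in the output above deserve a closer look: the cosine of 90 degrees prints as about 6.1e-17 instead of 0, and `np.round` rounds halves to the nearest even value. A minimal sketch of both, with illustrative names:

```python
import numpy as np

# Floating-point error: cos(pi/2) is tiny but not exactly zero
cos_90 = np.cos(np.radians(90))
print("Exactly zero?", cos_90 == 0.0)
print("Close to zero?", np.isclose(cos_90, 0.0))

# np.round uses round-half-to-even, unlike the "always round half up" rule
halves = np.round(np.array([0.5, 1.5, 2.5, 3.5]))
print("Rounded halves:", halves)
```

Comparing floating-point results with `np.isclose` or `np.allclose` is generally more robust than using `==`.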


## Statistical Functions in NumPy
NumPy provides various functions for statistical analysis of arrays:
- Measures of central tendency (mean, median; NumPy has no built-in mode function)
- Measures of dispersion (variance, standard deviation)
- Min, max, and percentiles
- Correlation and covariance

In [None]:
# Create a sample array
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print("Array:", arr)

# Basic statistics
print("\nMean:", np.mean(arr))
print("Median:", np.median(arr))
print("Standard deviation:", np.std(arr))  # Population standard deviation (ddof=0 by default)
print("Variance:", np.var(arr))  # Population variance (ddof=0 by default)
print("Min:", np.min(arr))
print("Max:", np.max(arr))
print("Sum:", np.sum(arr))

# Working with 2D arrays
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\n2D Array:")
print(arr_2d)
print("\nColumn means:", np.mean(arr_2d, axis=0))
print("Row means:", np.mean(arr_2d, axis=1))


Array: [ 1  2  3  4  5  6  7  8  9 10]

Mean: 5.5
Median: 5.5
Standard deviation: 2.8722813232690143
Variance: 8.25
Min: 1
Max: 10
Sum: 55

2D Array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]

Column means: [4. 5. 6.]
Row means: [2. 5. 8.]
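
The cell above leaves out a few items from the list: sample versus population dispersion, the mode, and correlation. The sketch below covers them under some assumptions: the mode is computed with `np.unique` as one possible workaround, and the data in `sample`, `x`, and `y` is made up for illustration:

```python
import numpy as np

arr = np.arange(1, 11)

# ddof=0 (default) divides by N; ddof=1 divides by N - 1 (sample estimate)
pop_std = np.std(arr)
sample_std = np.std(arr, ddof=1)
print("Population std:", pop_std)
print("Sample std:", sample_std)

# One way to get the mode: count unique values and take the most frequent
sample = np.array([2, 3, 3, 5, 3, 2])
values, counts = np.unique(sample, return_counts=True)
mode = values[np.argmax(counts)]
print("Mode:", mode)

# Correlation and covariance between two variables
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2 * x + 1
r = np.corrcoef(x, y)[0, 1]
cov_xy = np.cov(x, y)[0, 1]
print("Correlation:", r)
print("Covariance:", cov_xy)
```

If several values tie for the highest count, `np.argmax` returns the first one, which here means the smallest tied value.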


## Linear Algebra with NumPy
NumPy provides the linalg module (numpy.linalg), which includes functions for linear algebra operations:
- Matrix and vector products
- Decompositions
- Eigenvalues and eigenvectors
- Norms and other matrix operations

In [None]:
# Create matrices
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
print("Matrix A:")
print(A)
print("\nMatrix B:")
print(B)

# Matrix operations
print("\nMatrix multiplication (A @ B):")
print(np.matmul(A, B))  # or A @ B

# Matrix determinant
print("\nDeterminant of A:", np.linalg.det(A))

# Matrix inverse
print("\nInverse of A:")
print(np.linalg.inv(A))

# Eigenvalues and eigenvectors
eigenvalues, eigenvectors = np.linalg.eig(A)
print("\nEigenvalues of A:", eigenvalues)
print("Eigenvectors of A:")
print(eigenvectors)

# Solving linear system Ax = b
b = np.array([1, 2])
x = np.linalg.solve(A, b)
print("\nSolution to Ax = b:", x)
print("Verification A @ x =", A @ x)


Matrix A:
[[1 2]
 [3 4]]

Matrix B:
[[5 6]
 [7 8]]

Matrix multiplication (A @ B):
[[19 22]
 [43 50]]

Determinant of A: -2.0000000000000004

Inverse of A:
[[-2.   1. ]
 [ 1.5 -0.5]]

Eigenvalues of A: [-0.37228132  5.37228132]
Eigenvectors of A:
[[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]

Solution to Ax = b: [0.  0.5]
Verification A @ x = [1. 2.]
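
The eigenvectors returned by `np.linalg.eig` are stored as columns, which is easy to get wrong. As a small check (a sketch, not part of the original cell), each column `v` should satisfy `A @ v = lambda * v`:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

# Column i of eigenvectors pairs with eigenvalues[i]
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    print(f"lambda={eigenvalues[i]:.4f}:", np.allclose(A @ v, eigenvalues[i] * v))

# Each eigenvector column is normalized to unit length
norms = np.linalg.norm(eigenvectors, axis=0)
print("Column norms:", norms)

# The inverse can be verified the same way: A @ inv(A) should be the identity
identity_check = np.allclose(A @ np.linalg.inv(A), np.eye(2))
print("A @ inv(A) is identity:", identity_check)
```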


## Random Number Generation in NumPy
NumPy has a comprehensive random module (numpy.random) for generating random numbers with various distributions.

### Random number generation is essential for:
- Simulations
- Bootstrapping and resampling
- Initializing weights in machine learning models
- Generating synthetic data

In [None]:
# Set a seed for reproducibility
np.random.seed(42)

# Generate random numbers from a uniform distribution [0, 1)
uniform = np.random.rand(5)
print("Uniform random numbers:", uniform)

# Generate random integers
integers = np.random.randint(1, 100, 5)  # 5 random integers from 1 to 99 (the upper bound 100 is exclusive)
print("\nRandom integers:", integers)

# Generate random numbers from a normal distribution
normal = np.random.normal(loc=0, scale=1, size=5)  # mean=0, std=1
print("\nNormal distribution samples:", normal)

# Random shuffling
arr = np.arange(10)
np.random.shuffle(arr)
print("\nShuffled array:", arr)

# Random sample from an array
sample = np.random.choice(arr, size=5, replace=False)
print("\nRandom sample without replacement:", sample)

Uniform random numbers: [0.37454012 0.95071431 0.73199394 0.59865848 0.15601864]

Random integers: [83 87 75 75 88]

Normal distribution samples: [-0.88523035 -0.41218848 -0.4826188   0.16416482  0.23309524]

Shuffled array: [5 2 7 6 8 9 3 0 4 1]

Random sample without replacement: [2 9 8 4 5]
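
The functions above belong to NumPy's legacy global-state interface. Newer code often uses the `Generator` interface created with `np.random.default_rng`. The sketch below mirrors the same steps with that interface; its numbers will differ from the output above because the underlying generator is different:

```python
import numpy as np

# A Generator carries its own state instead of relying on a global seed
rng = np.random.default_rng(42)

uniform = rng.random(5)                        # uniform on [0, 1)
integers = rng.integers(1, 100, size=5)        # 1 to 99; upper bound exclusive
normal = rng.normal(loc=0, scale=1, size=5)    # mean 0, std 1
shuffled = rng.permutation(10)                 # shuffled copy of 0..9
sample = rng.choice(shuffled, size=5, replace=False)

print("Uniform:", uniform)
print("Integers:", integers)
print("Normal:", normal)
print("Shuffled:", shuffled)
print("Sample without replacement:", sample)
```

Passing the `rng` object to functions that need randomness keeps results reproducible without touching global state.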


## Example: Data Analysis with NumPy

Let's demonstrate how NumPy can be used for a simple data analysis task. We'll generate synthetic data for heights (in cm) of 1000 people and perform some basic analysis:

In [None]:
# Set seed for reproducibility
np.random.seed(0)

# Generate height data for 1000 people (normal distribution with mean=170cm, std=7cm)
heights = np.random.normal(170, 7, 1000)

# Basic statistics
print("Height Statistics:")
print(f"Mean: {np.mean(heights):.2f} cm")
print(f"Median: {np.median(heights):.2f} cm")
print(f"Minimum: {np.min(heights):.2f} cm")
print(f"Maximum: {np.max(heights):.2f} cm")
print(f"Standard Deviation: {np.std(heights):.2f} cm")

# Percentiles
percentiles = np.percentile(heights, [25, 50, 75])
print(f"\n25th Percentile: {percentiles[0]:.2f} cm")
print(f"50th Percentile: {percentiles[1]:.2f} cm (same as median)")
print(f"75th Percentile: {percentiles[2]:.2f} cm")

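# Count people in different height ranges
# (summing a boolean mask counts its True values)
short = np.sum(heights < 160)
medium = np.sum((heights >= 160) & (heights < 180))
tall = np.sum(heights >= 180)  # includes anyone at exactly 180 cm

# Compute percentages from the sample size instead of hard-coding 1000
print("\nHeight Distribution:")
print(f"Short (<160cm): {short} people ({100 * short / heights.size:.1f}%)")
print(f"Medium (160-180cm): {medium} people ({100 * medium / heights.size:.1f}%)")
print(f"Tall (>180cm): {tall} people ({100 * tall / heights.size:.1f}%)")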

# Normalize the heights (z-score)
normalized_heights = (heights - np.mean(heights)) / np.std(heights)
print("\nNormalized Heights (first 5):")
print(normalized_heights[:5])
print(f"Mean of normalized heights: {np.mean(normalized_heights):.10f}")
print(f"Std of normalized heights: {np.std(normalized_heights):.10f}")


Height Statistics:
Mean: 169.68 cm
Median: 169.59 cm
Minimum: 148.68 cm
Maximum: 189.32 cm
Standard Deviation: 6.91 cm

25th Percentile: 165.11 cm
50th Percentile: 169.59 cm (same as median)
75th Percentile: 174.25 cm

Height Distribution:
Short (<160cm): 78 people (7.8%)
Medium (160-180cm): 850 people (85.0%)
Tall (>180cm): 72 people (7.2%)

Normalized Heights (first 5):
[1.83307829 0.4512654  1.0374471  2.31618349 1.93794371]
Mean of normalized heights: 0.0000000000
Std of normalized heights: 1.0000000000
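
The three height ranges can also be counted in one pass. The sketch below is an alternative to the boolean masks above, assuming the same seed and distribution; `edges`, `group`, and `labels` are illustrative names:

```python
import numpy as np

np.random.seed(0)  # same legacy seeding as the cell above
heights = np.random.normal(170, 7, 1000)

# digitize maps each height to a bin index:
# 0 for < 160, 1 for 160 <= h < 180, 2 for >= 180
edges = np.array([160, 180])
group = np.digitize(heights, edges)

# bincount tallies how many heights fall in each bin
counts = np.bincount(group, minlength=3)

labels = ["short (<160)", "medium (160 to <180)", "tall (>=180)"]
for label, count in zip(labels, counts):
    print(f"{label}: {count} people ({100 * count / heights.size:.1f}%)")
```

This approach scales more easily to many ranges, since adding a range only means adding an edge.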
