# Introduction to NumPy
## A Comprehensive Guide for Data Science Students

## What is NumPy?

NumPy, short for Numerical Python, is a foundational package for numerical computations in Python. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

## Why NumPy?

- **Performance**: NumPy's core operations are implemented in C and run over whole arrays at once, which is typically much faster than looping over elements in pure Python.
- **Memory Efficiency**: NumPy arrays store elements of a single type in one contiguous block of memory, so they are more compact than Python lists and faster to read and write.
- **Convenience**: Provides a large collection of mathematical functions and array operations out of the box.
- **Interoperability**: Many other libraries and tools in the Python ecosystem build on or accept NumPy arrays.
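
The memory point above can be checked with a rough comparison. This is a minimal sketch, not a benchmark: the names `py_list` and `np_arr` are illustrative, and the list figure depends on the Python implementation (the numbers here assume CPython).

```python
import sys
import numpy as np

n = 1_000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores pointers to separate Python int objects;
# the array stores the raw 8-byte integers in one contiguous buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
array_bytes = np_arr.nbytes  # 1000 elements * 8 bytes each

print('Approximate list size (bytes):', list_bytes)
print('Array data size (bytes):', array_bytes)
```

The list total counts both the list's pointer table and the integer objects it refers to, which is why it comes out several times larger than the array's data buffer.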

## Installing NumPy

To install NumPy, you can use pip:
```
pip install numpy
```
Once installed, you can import it in your Python script or Jupyter notebook as follows:
```python
import numpy as np
```

In [None]:
import numpy as np

## Basics of NumPy Arrays

At the core of NumPy is the `ndarray` object, which encapsulates n-dimensional arrays of homogeneous data types. Let's explore some basic operations and properties of NumPy arrays.

In [None]:
# Creating a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
arr_1d

In [None]:
# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d
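
Because an `ndarray` holds a single data type, mixed inputs are converted to a common type when the array is created. A small sketch of this, with illustrative variable names:

```python
import numpy as np

# Integers and a float together: every element is stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed, mixed.dtype)

# The dtype can also be requested explicitly
explicit = np.array([1, 2, 3], dtype=np.float32)
print(explicit, explicit.dtype)
```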

### Array Attributes

NumPy arrays have several attributes that give information about the array's size, shape, and data type:

In [None]:
# Attributes of the 1D array
print('Shape:', arr_1d.shape)
print('Size:', arr_1d.size)
print('Number of dimensions:', arr_1d.ndim)
print('Data type:', arr_1d.dtype)

In [None]:
# Attributes of the 2D array
print('Shape:', arr_2d.shape)
print('Size:', arr_2d.size)
print('Number of dimensions:', arr_2d.ndim)
print('Data type:', arr_2d.dtype)

### Array Indexing and Slicing

Just like Python lists, NumPy arrays can be indexed and sliced. This allows for efficient access to and modification of the array's contents.

In [None]:
# Indexing a 1D array
print('First element:', arr_1d[0])
print('Second element:', arr_1d[1])
print('Last element:', arr_1d[-1])

In [None]:
# Slicing a 1D array
print('First three elements:', arr_1d[:3])
print('Elements from index 2 to 4:', arr_1d[2:5])
print('Every second element:', arr_1d[::2])

In [None]:
# Indexing a 2D array
print('Element at (0,0):', arr_2d[0, 0])
print('Element at (1,2):', arr_2d[1, 2])
print('Second row:', arr_2d[1])

In [None]:
# Slicing a 2D array
print('First two rows and first two columns:\n', arr_2d[:2, :2])
print('All rows, every other column:\n', arr_2d[:, ::2])
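
One difference from Python lists is worth seeing in code: a basic slice of a NumPy array is a view onto the same memory, not a copy. The sketch below uses its own small array (`base`) so that it does not alter the arrays defined earlier.

```python
import numpy as np

base = np.array([10, 20, 30, 40, 50])

# A basic slice shares memory with the original array
view = base[1:4]
view[0] = 99
print(base)  # the write through the view is visible in base

# .copy() gives an independent array
independent = base[1:4].copy()
independent[0] = -1
print(base)  # unaffected by the write to the copy
```

Use `.copy()` whenever a slice will be modified and the original must stay intact.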

### Array Operations

NumPy arrays support unary operations such as negation (`-arr`) as well as binary operations, either between an array and a scalar or between two arrays. These operations are performed element-wise, which means they are applied to each element of the array separately.

In [None]:
# Operations between an array and a scalar
print('Original array:\n', arr_1d)
print('Array + 5:\n', arr_1d + 5)
print('Array squared:\n', arr_1d**2)

In [None]:
# Binary operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print('arr1:', arr1)
print('arr2:', arr2)
print('arr1 + arr2:', arr1 + arr2)
print('arr1 * arr2:', arr1 * arr2)

### Mathematical Functions

NumPy provides a comprehensive set of mathematical functions that can be applied element-wise to arrays. These include trigonometric, logarithmic, exponential, and statistical functions, among others.

In [None]:
# Some mathematical functions
print('Sin values:', np.sin(arr_1d))
print('Natural logarithm:', np.log(arr_1d))
print('Exponential:', np.exp(arr_1d))

### Aggregation Functions

NumPy provides functions to compute aggregated values like sum, mean, median, etc. These can be applied to the entire array or along a specified axis in case of multi-dimensional arrays.

In [None]:
# Aggregation functions on 1D array
print('Sum:', np.sum(arr_1d))
print('Mean:', np.mean(arr_1d))
print('Standard Deviation:', np.std(arr_1d))

In [None]:
# Aggregation functions on 2D array
print('Total Sum:', np.sum(arr_2d))
print('Sum along columns:', np.sum(arr_2d, axis=0))
print('Sum along rows:', np.sum(arr_2d, axis=1))

### Broadcasting

Broadcasting is a powerful feature in NumPy that allows for operations between arrays of different shapes. It does this by 'stretching' the smaller array to match the shape of the larger array, without actually copying any data.

In [None]:
# Broadcasting in action
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('Original array:\n', arr)

# Adding a scalar to a 2D array
print('\nArray after adding 5:\n', arr + 5)

# Adding a 1D array to a 2D array
vec = np.array([1, 0, -1])
print('\nArray after adding [1, 0, -1]:\n', arr + vec)
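
Broadcasting compares shapes from the trailing dimension backwards: two dimensions are compatible when they are equal or when one of them is 1. The following sketch, with illustrative names `col` and `row`, shows a case where both operands are stretched and a case where the shapes cannot be matched.

```python
import numpy as np

col = np.array([[10], [20], [30]])   # shape (3, 1)
row = np.array([1, 2, 3])            # shape (3,)

# (3, 1) and (3,) broadcast to (3, 3)
grid = col + row
print(grid.shape)
print(grid)

# Shapes (3,) and (4,) are incompatible: the sizes differ and neither is 1
try:
    np.array([1, 2, 3]) + np.array([1, 2, 3, 4])
except ValueError as err:
    print('Broadcast error:', err)
```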

### Advanced Indexing

Apart from basic indexing and slicing, NumPy offers more advanced indexing techniques. This includes integer array indexing and boolean indexing.

In [None]:
# Integer array indexing
print('Original array:\n', arr_1d)
indices = np.array([1, 3, 4])
print('Elements at indices 1, 3, and 4:', arr_1d[indices])

In [None]:
# Boolean indexing
print('Original array:\n', arr_1d)
mask = arr_1d > 3
print('Mask of elements greater than 3:', mask)
print('Elements greater than 3:', arr_1d[mask])
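
Boolean masks can also be combined and used on the left-hand side of an assignment. A short sketch, using a separate array `values` so the earlier arrays are left unchanged:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Combine conditions with & (and) or | (or); the parentheses are required
in_range = (values > 1) & (values < 5)
print(values[in_range])

# Assigning through a mask modifies only the selected elements
clipped = values.copy()
clipped[clipped > 3] = 3
print(clipped)
```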

### Reshaping and Transposing

NumPy provides functionalities to change the shape of arrays without changing their data. This is particularly useful when you need to prepare data for certain libraries or operations that expect data in a particular shape.

In [None]:
# Reshaping an array
print('Original 2D array:\n', arr_2d)
reshaped = arr_2d.reshape(1, 9)
print('\nReshaped to 1x9 array:\n', reshaped)

In [None]:
# Transposing an array
print('Original 2D array:\n', arr_2d)
transposed = arr_2d.T
print('\nTransposed array:\n', transposed)

### Conclusion

This was a brief introduction to NumPy, one of the most fundamental packages for numerical computations in Python. We've covered basic array operations, mathematical functions, indexing techniques, and reshaping methods. There's a lot more to explore in NumPy, and practicing with real-world data will solidify your understanding. Happy computing!

In [None]:
# Aggregation functions on 1D array
print('Sum:', np.sum(arr_1d))
print('Mean:', np.mean(arr_1d))
print('Median:', np.median(arr_1d))
print('Standard Deviation:', np.std(arr_1d))

In [None]:
# Aggregation functions on 2D array
print('Sum of all elements:', np.sum(arr_2d))
print('Mean of all elements:', np.mean(arr_2d))
print('Sum along columns:', np.sum(arr_2d, axis=0))
print('Mean along rows:', np.mean(arr_2d, axis=1))

### Broadcasting

Broadcasting is a powerful feature in NumPy that allows you to perform operations on arrays of different shapes. It does this by automatically expanding the dimensions of the smaller array to match the larger array, without making copies of the data.

In [None]:
# Demonstrating broadcasting
arr3 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr4 = np.array([1, 0, 1])

# Broadcasting arr4 onto arr3
result = arr3 + arr4
result

### Advanced Indexing

NumPy provides more advanced indexing mechanisms than regular Python lists. This includes integer array indexing and boolean indexing.

In [None]:
# Integer array indexing
indices = np.array([0, 2])
print('Selecting elements at 0th and 2nd index:', arr_1d[indices])

In [None]:
# Boolean indexing
mask = arr_1d > 3
print('Elements greater than 3:', arr_1d[mask])

### Reshaping and Transposing

NumPy provides functionalities to change the shape of arrays without changing their data. This is particularly useful when you need to prepare data for certain types of operations or visualizations.

In [None]:
# Reshaping an array
reshaped_arr = arr_1d.reshape(5, 1)
reshaped_arr

In [None]:
# Transposing an array
transposed_arr = arr_2d.T
transposed_arr

### Conclusion

This introduction provides a glimpse into the capabilities of NumPy. The library offers a wide range of functionalities for numerical computing, making it an essential tool for data scientists, researchers, and engineers working with Python. Practice and exploration are key to mastering NumPy and leveraging its full potential in your projects.

### Applying User-Defined Functions to Arrays

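Often, you might want to apply a custom function to each element of a NumPy array. NumPy provides `np.vectorize()` for this. It wraps a function that accepts and returns scalars so that it can be called on arrays. Note that `np.vectorize()` is a convenience rather than an optimization: it is implemented essentially as a Python loop, so it does not match the speed of NumPy's built-in array operations.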

Let's explore how to use `np.vectorize()` with both standard functions (defined using `def`) and lambda functions.

In [None]:
# Using a standard function defined with 'def'
def square_and_add_five(x):
    return x**2 + 5

vectorized_function = np.vectorize(square_and_add_five)

result_def = vectorized_function(arr_1d)
result_def

In [None]:
# Using a lambda function
lambda_function = lambda x: x**2 - 3

vectorized_lambda = np.vectorize(lambda_function)

result_lambda = vectorized_lambda(arr_1d)
result_lambda
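
When a function is built only from operations that NumPy already applies element-wise, it can usually be called on the array directly, without `np.vectorize()`. The sketch below compares the two approaches on a small illustrative array, `data`:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

def square_and_add_five(x):
    return x**2 + 5

# Wrapped version: the function is called once per element
via_vectorize = np.vectorize(square_and_add_five)(data)

# Direct version: ** and + already operate element-wise on arrays
direct = square_and_add_five(data)

print(via_vectorize)
print(direct)
print(np.array_equal(via_vectorize, direct))
```

`np.vectorize()` remains useful for functions that contain scalar-only logic, such as `if` statements on the input value.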

### Understanding Array Shapes: (n, 1) vs. (n,)

In NumPy, the shape of an array is represented as a tuple indicating the size of each dimension. The shapes `(n, 1)` and `(n,)` might seem similar, but they represent different structures:

- `(n,)`: Represents a one-dimensional array with `n` elements.
- `(n, 1)`: Represents a two-dimensional array with `n` rows and 1 column.

Let's explore these shapes in more detail and see how to convert between them.

In [None]:
# Creating an array with shape (n,)
one_d_array = np.array([1, 2, 3, 4, 5])
print('1D array:', one_d_array)
print('Shape:', one_d_array.shape)

# Reshaping to (n, 1)
two_d_array = one_d_array.reshape(-1, 1)
print('\n2D array:\n', two_d_array)
print('Shape:', two_d_array.shape)

In [None]:
# Converting back to shape (n,)
converted_one_d_array = two_d_array.reshape(-1)
print('Converted 1D array:', converted_one_d_array)
print('Shape:', converted_one_d_array.shape)

### The `flatten` Method

Another way to convert a multi-dimensional array into a one-dimensional array is by using the `flatten` method. This method returns a copy of the original array, flattened to one dimension.

In [None]:
# Using the flatten method
flattened_array = two_d_array.flatten()
print('Flattened array:', flattened_array)
print('Shape:', flattened_array.shape)

### The `ravel` Method

The `ravel` method is another way to flatten multi-dimensional arrays into one dimension. It functions similarly to the `flatten` method but with a key difference:

- `flatten` always returns a copy of the data.
- `ravel` returns a flattened view of the original array whenever possible. This means that if no copy is required, the original data is used, making `ravel` more memory efficient. However, if a copy is necessary (for example, due to non-contiguous memory layout), then `ravel` will return a copy just like `flatten`.

Because of this behavior, modifications to the array returned by `ravel` might affect the original array, whereas modifications to the array returned by `flatten` will never affect the original array.

In [None]:
# Using the ravel method
raveled_array = two_d_array.ravel()
print('Raveled array:', raveled_array)
print('Shape:', raveled_array.shape)
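
The copy-versus-view difference can be observed with `np.shares_memory`. This is a small sketch on a fresh contiguous array (`original`), for which `ravel` is able to return a view:

```python
import numpy as np

original = np.array([[1, 2, 3], [4, 5, 6]])

flat_copy = original.flatten()
flat_view = original.ravel()

print(np.shares_memory(original, flat_copy))  # flatten uses a separate buffer
print(np.shares_memory(original, flat_view))  # ravel reuses the buffer here

# Writing through the raveled array changes the original
flat_view[0] = 100
print(original[0, 0])
print(flat_copy[0])
```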

### Inner Products and Matrix Multiplication

Matrix operations are fundamental in linear algebra and have extensive applications in data science, especially in areas like machine learning.

#### Inner Product (Dot Product)
The inner product, also known as the dot product, between two vectors is a single number obtained by multiplying corresponding entries and then summing those products. For two vectors `a` and `b`, the dot product is given by:

$$a \cdot b = a_1 b_1 + a_2 b_2 + \dots + a_n b_n$$

#### Matrix Multiplication
Matrix multiplication, on the other hand, is a way to combine two matrices to produce a new matrix. It's defined such that the number in the i-th row and j-th column of the resulting matrix is the dot product of the i-th row of the first matrix and the j-th column of the second matrix.

In Python, with NumPy, the `@` operator is used as a convenient way to perform matrix multiplication. It's more readable and concise than using the `np.dot()` function.

In [None]:
# Demonstrating the inner product (dot product)
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

dot_product = vector_a @ vector_b
dot_product

In [None]:
# Demonstrating matrix multiplication using @
matrix_A = np.array([[1, 2], [3, 4]])
matrix_B = np.array([[2, 0], [1, 3]])

result_matrix = matrix_A @ matrix_B
result_matrix
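
The row-by-column definition given above can be checked by rebuilding the product one entry at a time. This sketch is for illustration only; in practice `@` does all of this in a single call.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 3]])
product = A @ B

# Rebuild each entry as (row i of A) . (column j of B)
manual = np.empty_like(product)
for i in range(A.shape[0]):
    for j in range(B.shape[1]):
        manual[i, j] = A[i, :] @ B[:, j]

print(product)
print(np.array_equal(product, manual))
```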

### Aggregation Functions Along Axes

In multi-dimensional arrays (like matrices), you often need to apply aggregation functions not just to the entire array but along a specific axis (like rows or columns) or multiple axes. The `axis` parameter in NumPy's aggregation functions allows you to specify which axis or axes the function should operate over.

- `axis=None` (default): The function aggregates over all the elements of the array.
- `axis=0`: The function aggregates over each column (i.e., along the rows).
- `axis=1`: The function aggregates over each row (i.e., along the columns).

You can also pass a tuple of axes to aggregate over several axes at once, which is most useful for arrays with more than two dimensions.

Let's explore this with examples.

In [None]:
# Creating a 3x3 matrix for demonstration
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
matrix

In [None]:
# Sum over the entire matrix (default behavior)
total_sum = np.sum(matrix)
total_sum

In [None]:
# Sum along axis=0 (sum over each column)
column_sum = np.sum(matrix, axis=0)
column_sum

In [None]:
# Sum along axis=1 (sum over each row)
row_sum = np.sum(matrix, axis=1)
row_sum
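
Two further uses of `axis` are sketched below: a tuple of axes on a 3-D array, and the `keepdims` argument, which keeps the reduced axis with length 1 so the result broadcasts against the original. The array names (`cube`, `m`) are illustrative.

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)   # shape (2, 3, 4)

# Collapse axes 0 and 2, leaving one value per index along axis 1
per_middle = cube.sum(axis=(0, 2))
print(per_middle.shape, per_middle)

# keepdims=True keeps the summed axis as length 1: shape (3, 1)
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_totals = m.sum(axis=1, keepdims=True)

# Dividing by the (3, 1) totals normalizes each row to sum to 1
normalized = m / row_totals
print(normalized)
```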

### The `np.random` Module

The `np.random` module in NumPy provides a suite of functions to generate random numbers for various distributions. It's a crucial tool for simulations, statistical sampling, and many other tasks in data science and scientific computing. Here are some of the important elements of the `np.random` module:

1. **Random Number Generation**:
   - `rand()`: Generates random numbers between 0 and 1 in a given shape.
   - `randn()`: Generates random numbers from a standard normal distribution (mean 0 and variance 1).
   - `randint()`: Generates random integers from `low` (inclusive) to `high` (exclusive).

2. **Random Sampling**:
   - `choice()`: Generates a random sample from a given 1-D array.
   - `shuffle()`: Modifies a sequence in-place by shuffling its contents.
   - `permutation()`: Returns a shuffled version of a sequence or returns a permuted range.

3. **Distributions**:
   - `binomial()`: Draws samples from a binomial distribution.
   - `normal()`: Draws samples from a normal (Gaussian) distribution.
   - `poisson()`: Draws samples from a Poisson distribution.
   - ... and many more for other distributions.

4. **Random Seed**:
   - `seed()`: Sets the random seed, which allows for reproducibility of random numbers generated.

Let's explore some of these functions with examples.

In [None]:
# Random Number Generation

# Generating random numbers between 0 and 1
random_numbers = np.random.rand(5)
print('Random numbers between 0 and 1:', random_numbers)

# Generating random numbers from a standard normal distribution
normal_numbers = np.random.randn(5)
print('\nRandom numbers from a standard normal distribution:', normal_numbers)

# Generating random integers from 1 to 9 (the upper bound, 10, is exclusive)
random_integers = np.random.randint(1, 10, size=5)
print('\nRandom integers from 1 to 9:', random_integers)

In [None]:
# Random Sampling

# Generating a random sample from a given 1-D array
sample_array = np.array([10, 20, 30, 40, 50])
random_choice = np.random.choice(sample_array, size=3)
print('Random sample from given array:', random_choice)

# Shuffling a sequence in-place
sequence_to_shuffle = np.array([1, 2, 3, 4, 5])
np.random.shuffle(sequence_to_shuffle)
print('\nShuffled sequence:', sequence_to_shuffle)

# Getting a permuted range
permuted_range = np.random.permutation(5)
print('\nPermuted range:', permuted_range)

In [None]:
# Distributions

# Drawing samples from a binomial distribution
binomial_samples = np.random.binomial(n=10, p=0.5, size=5)
print('Samples from a binomial distribution:', binomial_samples)

# Drawing samples from a normal distribution with mean 0 and standard deviation 1
normal_samples = np.random.normal(loc=0, scale=1, size=5)
print('\nSamples from a normal distribution:', normal_samples)

# Drawing samples from a Poisson distribution with lambda=3
poisson_samples = np.random.poisson(lam=3, size=5)
print('\nSamples from a Poisson distribution:', poisson_samples)
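
The examples above produce different numbers on each run. Setting a seed makes the draws repeatable, as sketched below. The seed values are arbitrary. The second half uses `np.random.default_rng`, NumPy's newer generator-based interface, where each generator carries its own state instead of relying on a global seed.

```python
import numpy as np

# Global seeding: resetting the seed replays the same sequence
np.random.seed(0)
first = np.random.rand(3)
np.random.seed(0)
second = np.random.rand(3)
print(np.array_equal(first, second))

# Generator-based API: two generators with the same seed agree
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
draws_a = rng_a.integers(1, 10, size=5)   # upper bound exclusive by default
draws_b = rng_b.integers(1, 10, size=5)
print(np.array_equal(draws_a, draws_b))
```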

### The `np.linalg` Module

The `np.linalg` module in NumPy provides a collection of linear algebra functions. Linear algebra is a foundational field in mathematics and has vast applications in data science, engineering, physics, and more. Here are some of the important elements of the `np.linalg` module:

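1. **Matrix and Vector Products** (available in the top-level namespace, e.g. `np.dot()`, and documented with the linear algebra routines):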
   - `dot()`: Computes the dot product of two arrays.
   - `matmul()`: Performs matrix multiplication.
   - `inner()`: Computes the inner product of two arrays.
   - `outer()`: Computes the outer product of two arrays.

2. **Decompositions**:
   - `cholesky()`: Computes the Cholesky decomposition.
   - `qr()`: Computes the QR decomposition.
   - `svd()`: Computes the singular value decomposition.

3. **Matrix Eigenvalues**:
   - `eig()`: Computes the eigenvalues and right eigenvectors of a square array.
   - `eigh()`: Computes the eigenvalues and eigenvectors of a Hermitian or symmetric matrix.
   - `eigvals()`: Computes the eigenvalues of a square array.

4. **Norms and Other Numbers**:
   - `norm()`: Computes the norm of a matrix or vector.
   - `det()`: Computes the determinant of an array.
   - `matrix_rank()`: Computes the numerical rank of a matrix.

5. **Solving Equations and Inverting Matrices**:
   - `solve()`: Solves a linear matrix equation.
   - `inv()`: Computes the multiplicative inverse of a matrix.

6. **Tensor Operations**:
   - `tensorsolve()`: Solves the tensor equation `a x = b` for `x`.
   - `tensorinv()`: Computes the inverse of a tensor.

Let's explore some of these functions with examples.

In [None]:
# Matrix and Vector Products

# Creating two matrices for demonstration
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 3]])

# Matrix multiplication using matmul()
matrix_product = np.matmul(A, B)
print('Matrix product using matmul():\n', matrix_product)

# Dot product of two vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
dot_product = np.dot(v1, v2)
print('\nDot product of v1 and v2:', dot_product)

In [None]:
# Decompositions

# Singular Value Decomposition (SVD) of matrix A
U, S, VT = np.linalg.svd(A)
print('U from SVD:\n', U)
print('\nS (singular values, 1-D array) from SVD:', S)
print('\nVT from SVD:\n', VT)

In [None]:
# Matrix Eigenvalues

# Eigenvalues and eigenvectors of matrix A
eigenvalues, eigenvectors = np.linalg.eig(A)
print('Eigenvalues of matrix A:', eigenvalues)
print('\nEigenvectors of matrix A:\n', eigenvectors)

In [None]:
# Norms and Other Numbers

# Norm of vector v1
vector_norm = np.linalg.norm(v1)
print('Norm of vector v1:', vector_norm)

# Determinant of matrix A
matrix_determinant = np.linalg.det(A)
print('\nDeterminant of matrix A:', matrix_determinant)

# Rank of matrix A
matrix_rank = np.linalg.matrix_rank(A)
print('\nRank of matrix A:', matrix_rank)

In [None]:
# Solving Equations and Inverting Matrices

# Solving a linear matrix equation Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print('Solution x for Ax = b:', x)

# Inverse of matrix A
inverse_A = np.linalg.inv(A)
print('\nInverse of matrix A:\n', inverse_A)
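
Results from `np.linalg` are floating-point, so they are best checked with `np.allclose` rather than exact equality. The sketch below re-creates `A` and `b` from the cells above and verifies the solution, the SVD factors, and the inverse.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
b = np.array([5, 11])

# The solution should satisfy A x = b
x = np.linalg.solve(A, b)
print(np.allclose(A @ x, b))

# svd returns the singular values as a 1-D array; np.diag rebuilds the matrix
U, S, VT = np.linalg.svd(A)
reconstructed = U @ np.diag(S) @ VT
print(np.allclose(reconstructed, A))

# A times its inverse should be the identity, up to rounding
print(np.allclose(A @ np.linalg.inv(A), np.eye(2)))
```

For solving `Ax = b`, `np.linalg.solve` is generally preferable to computing the inverse explicitly and multiplying.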

### Utility Functions for Creating Arrays

NumPy provides several utility functions to create specific types of arrays with ease. Four commonly used ones are described below:

1. **`np.ones`**:
   - This function creates an array filled with ones.
   - The desired shape of the array is provided as an argument.
   - Example: `np.ones((2, 3))` creates a 2x3 matrix filled with ones.

2. **`np.zeros`**:
   - This function creates an array filled with zeros.
   - The desired shape of the array is provided as an argument.
   - Example: `np.zeros((3, 3))` creates a 3x3 matrix filled with zeros.

3. **`np.diag`**:
   - This function has two main uses:
     - When provided with a 1-D array, it returns a 2-D array with the entries of the 1-D array as its diagonal and zeros elsewhere.
     - When provided with a 2-D array, it extracts the diagonal elements and returns them as a 1-D array.
   - Example: `np.diag([1, 2, 3])` creates a 3x3 diagonal matrix with the diagonal [1, 2, 3].

4. **`np.identity`**:
   - This function creates a square identity matrix of a specified size.
   - An identity matrix is a square matrix with ones on its main diagonal and zeros elsewhere.
   - Example: `np.identity(3)` creates a 3x3 identity matrix.

Let's demonstrate each of these functions with examples.

In [None]:
# Demonstrating np.ones
ones_array = np.ones((2, 3))
print('Array filled with ones:\n', ones_array)

# Demonstrating np.zeros
zeros_array = np.zeros((3, 3))
print('\nArray filled with zeros:\n', zeros_array)

# Demonstrating np.diag
diagonal_matrix = np.diag([1, 2, 3])
print('\nDiagonal matrix:\n', diagonal_matrix)
extracted_diagonal = np.diag(diagonal_matrix)
print('\nExtracted diagonal from matrix:', extracted_diagonal)

# Demonstrating np.identity
identity_matrix = np.identity(3)
print('\nIdentity matrix:\n', identity_matrix)

### `np.concatenate`

The `np.concatenate` function is used to join two or more arrays along an existing axis. It's a versatile function that allows for the combination of arrays in various ways.

**Parameters**:
- `a1, a2, ...` : Arrays to be concatenated. They must have the same shape, except in the dimension corresponding to the specified axis.
- `axis` : The axis along which the arrays will be joined. Default is 0.
- `out` : If provided, the destination to place the result. The shape must be correct, matching that of what `concatenate` would have returned if no `out` argument were specified.

**Usage**:
```python
np.concatenate((a1, a2, ...), axis=0, out=None)
```

Let's look at some examples to understand how `np.concatenate` works.

In [None]:
# Creating two 1-D arrays for demonstration
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Concatenating along axis 0 (default)
concatenated_1d = np.concatenate((array1, array2))
print('Concatenated 1-D array:', concatenated_1d)

# Creating two 2-D arrays for demonstration
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Concatenating along axis 0 (rows)
concatenated_rows = np.concatenate((matrix1, matrix2), axis=0)
print('\nConcatenated along rows:\n', concatenated_rows)

# Concatenating along axis 1 (columns)
concatenated_columns = np.concatenate((matrix1, matrix2), axis=1)
print('\nConcatenated along columns:\n', concatenated_columns)
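
The shape requirement described under **Parameters** is enforced at call time. The sketch below, with illustrative arrays `a` and `b`, shows one axis where the shapes line up and one where they do not.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])   # shape (2, 2)
b = np.array([[5, 6]])           # shape (1, 2)

# Along axis 0 only the column counts need to match
stacked = np.concatenate((a, b), axis=0)
print(stacked.shape)

# Along axis 1 the row counts must match: 2 vs 1 fails
try:
    np.concatenate((a, b), axis=1)
except ValueError as err:
    print('Cannot concatenate:', err)

# Transposing b to shape (2, 1) makes the row counts agree
side_by_side = np.concatenate((a, b.T), axis=1)
print(side_by_side)
```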

### `np.arange`

The `np.arange` function is used to generate sequences of numbers within a specified range. It's similar to Python's built-in `range` function but returns an array.

**Syntax**:
```python
np.arange([start, ]stop, [step, ]dtype=None)
```

**Parameters**:
- `start` : Start of the interval (inclusive). Default is 0.
- `stop` : End of the interval (exclusive).
- `step` : Spacing between values. Default is 1.
- `dtype` : Data type of the output array. If not specified, the data type is inferred from the input values.

The function returns evenly spaced values within a given interval. The values are generated based on the `start`, `stop`, and `step` values provided.

Let's look at some examples to understand how `np.arange` works.

In [None]:
# Demonstrating np.arange

# Generating numbers from 0 to 4
sequence1 = np.arange(5)
print('Numbers from 0 to 4:', sequence1)

# Generating numbers from 2 to 8 with a step of 2
sequence2 = np.arange(2, 9, 2)
print('\nNumbers from 2 to 8 with a step of 2:', sequence2)

# Generating numbers from 1 to 5 with a float step
sequence3 = np.arange(1, 5.5, 0.5)
print('\nNumbers from 1 to 5 with a float step:', sequence3)
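
With a non-integer `step`, floating-point rounding can make it hard to predict whether the last value is included. When the number of points is known, `np.linspace` is a common alternative: it takes the count instead of the step and includes both endpoints by default. A brief comparison:

```python
import numpy as np

# Step-based: stop (5.5) is exclusive
by_step = np.arange(1, 5.5, 0.5)

# Count-based: 9 evenly spaced points from 1 to 5, both included
by_count = np.linspace(1, 5, 9)

print(by_step)
print(by_count)
print(np.allclose(by_step, by_count))
```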