# Detailed Introduction to NumPy for Data Science

NumPy (Numerical Python) is the foundational package for numerical computing in Python. Its core data structure is the N-dimensional array (`ndarray`), supported by a large set of functions that operate on whole arrays efficiently. Understanding NumPy is crucial for data science: libraries such as pandas and scikit-learn build on it, and frameworks such as TensorFlow interoperate with its arrays.

## 1. Importing NumPy

We start by importing the NumPy library. It is standard practice to import NumPy as `np` to keep your code concise and clear.

In [None]:
import numpy as np

## 2. Creating NumPy Arrays

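NumPy arrays are the core of the NumPy library. They resemble Python lists, but every element shares a single data type and the values are stored in one block of memory. This lets NumPy run operations over entire arrays in compiled code, which is much faster than looping over a list in Python and scales to large datasets.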

### 2.1. Creating Arrays from Lists
You can create a NumPy array from a Python list using the `np.array()` function.

In [None]:
# Creating a NumPy array from a Python list
array_from_list = np.array([1, 2, 3, 4, 5])
print("Array from list:", array_from_list)

### 2.2. Arrays with Specific Values
NumPy provides functions to create arrays filled with zeros, ones, a range of values, or random numbers. These are useful for initializing data structures.

In [None]:
# Creating an array of zeros
zeros_array = np.zeros((3, 3))  # 3x3 matrix of zeros
print("Zeros Array:\n", zeros_array)

# Creating an array of ones
ones_array = np.ones((2, 4))  # 2x4 matrix of ones
print("Ones Array:\n", ones_array)

# Creating an array with a specific range
range_array = np.arange(0, 10, 2)  # Array with values [0, 2, 4, 6, 8]
print("Range Array:", range_array)

# Creating an array with random numbers
random_array = np.random.random((2, 2))  # 2x2 matrix of random floats in the half-open interval [0, 1)
print("Random Array:\n", random_array)
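
For reproducible results, random arrays can also be drawn from a seeded generator. The sketch below uses `np.random.default_rng`, which is not used elsewhere in this notebook; the seed value `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Seeded generator: the seed 42 is an arbitrary value chosen for this sketch
rng = np.random.default_rng(42)

# Floats drawn uniformly from the half-open interval [0, 1)
uniform_sample = rng.random((2, 2))

# Five integers drawn from 0 (inclusive) to 10 (exclusive)
integer_sample = rng.integers(0, 10, size=5)

print("Uniform sample:\n", uniform_sample)
print("Integer sample:", integer_sample)
```

Re-running the cell with the same seed should reproduce the same values, which helps when sharing results.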

### 2.3. Array of Evenly Spaced Numbers
The `np.linspace` function creates an array of evenly spaced numbers over a specified interval, including both endpoints by default. This is particularly useful for generating values for plotting graphs.

In [None]:
# Creating an array of 10 evenly spaced numbers between 0 and 1
linspace_array = np.linspace(0, 1, 10)
print("Linspace Array:", linspace_array)
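
The difference between including and excluding the endpoint can be seen in a small sketch. The variable names here are illustrative only; `endpoint` and `retstep` are optional parameters of `np.linspace`.

```python
import numpy as np

# Both endpoints included (the default): step is 1 / (5 - 1) = 0.25
with_endpoint = np.linspace(0, 1, 5)
print(with_endpoint)  # [0.   0.25 0.5  0.75 1.  ]

# Stop value excluded: step is 1 / 5 = 0.2
without_endpoint = np.linspace(0, 1, 5, endpoint=False)
print(without_endpoint)  # [0.  0.2 0.4 0.6 0.8]

# retstep=True also returns the spacing between samples
samples, step = np.linspace(0, 1, 5, retstep=True)
print("Step:", step)  # 0.25
```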

## 3. Array Attributes

NumPy arrays have several attributes that are useful for understanding the properties of the array.

### 3.1. Array Shape
The shape of an array is a tuple that gives the size of the array along each dimension.

In [None]:
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("Shape of the array:", array.shape)

### 3.2. Number of Dimensions
The `ndim` attribute returns the number of dimensions (axes) of the array.

In [None]:
print("Number of dimensions:", array.ndim)

### 3.3. Array Size
The `size` attribute returns the total number of elements in the array.

In [None]:
print("Size of the array:", array.size)
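
The three attributes are related: `ndim` is the length of `shape`, and `size` is the product of the entries in `shape`. A minimal sketch with a hypothetical 3D array makes this visible:

```python
import numpy as np

# A hypothetical 3D array: 2 blocks, each with 3 rows and 4 columns
example = np.zeros((2, 3, 4))

print("Shape:", example.shape)  # (2, 3, 4)
print("ndim:", example.ndim)    # 3, the length of the shape tuple
print("Size:", example.size)    # 24, which is 2 * 3 * 4
```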

## 4. Reshaping Arrays

Reshaping arrays is a common operation, especially when preparing data for machine learning models. The `reshape` method allows you to change the shape of an array without changing its data, provided the new shape holds the same total number of elements.

In [None]:
# Reshaping a 1D array to a 2D array
reshaped_array = np.arange(12).reshape((3, 4))  # 3x4 matrix
print("Reshaped Array (3x4):\n", reshaped_array)
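
Two details of `reshape` are worth sketching: one dimension can be given as `-1` so that NumPy infers it, and a shape with the wrong number of elements raises a `ValueError`. The variable names below are illustrative.

```python
import numpy as np

values = np.arange(12)

# -1 asks NumPy to infer that dimension: 12 elements / 3 rows = 4 columns
inferred = values.reshape((3, -1))
print("Inferred shape:", inferred.shape)  # (3, 4)

# 5 * 3 = 15 does not match the 12 available elements, so this fails
try:
    values.reshape((5, 3))
except ValueError as error:
    print("Cannot reshape:", error)
```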

### 4.1. Reshaping for Machine Learning
Reshaping is particularly useful in machine learning where you need to transform data into a shape suitable for training models, such as converting 1D arrays into 2D feature matrices.

In [None]:
# Example: Reshaping data for machine learning
data = np.arange(24).reshape((6, 4))  # 6 samples, 4 features each
print("Data reshaped for machine learning:\n", data)
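
Converting a 1D array into a 2D feature matrix with a single column might look like the following sketch. The `measurements` values are made up for illustration.

```python
import numpy as np

# Hypothetical 1D array holding one feature measured on four samples
measurements = np.array([1.5, 2.0, 3.5, 4.0])
print("Original shape:", measurements.shape)  # (4,)

# One row per sample, one column per feature; -1 infers the sample count
feature_matrix = measurements.reshape(-1, 1)
print("Feature matrix shape:", feature_matrix.shape)  # (4, 1)
print(feature_matrix)
```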

## 5. Indexing and Slicing Arrays

Indexing and slicing are powerful tools for accessing specific elements or subarrays within a larger array. Note that basic slices are views onto the original data, so modifying a slice also modifies the original array; call `.copy()` on the slice when you need an independent subarray.

### 5.1. Basic Indexing
You can access individual elements in a NumPy array by specifying their indices.

In [None]:
# Accessing elements
element = array[1, 2]  # Element at row index 1, column index 2 (value 6)
print("Element at index [1, 2]:", element)

### 5.2. Slicing
Slicing allows you to extract subarrays from an array by specifying a range of indices. This is useful for working with subsets of data.

In [None]:
# Slicing to get a subarray
subarray = array[0:2, 1:3]  # Subarray from rows 0 to 1 and columns 1 to 2
print("Subarray:\n", subarray)

# Another example: selecting all rows and specific columns
col_slice = array[:, 1:3]  # All rows, columns 1 to 2
print("Column Slice (all rows, cols 1 to 2):\n", col_slice)
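
Because a basic slice is a view of the original array, writing to the slice also changes the original. The sketch below uses its own small array (named `grid` here for illustration) to show this, and how `.copy()` avoids it.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A basic slice is a view: it shares memory with grid
window = grid[0:2, 1:3]
window[0, 0] = 99
print("grid[0, 1] after editing the slice:", grid[0, 1])  # 99

# An explicit copy is independent of grid
independent = grid[0:2, 1:3].copy()
independent[0, 0] = -1
print("grid[0, 1] after editing the copy:", grid[0, 1])  # still 99
```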

### 5.3. Advanced Indexing
Advanced indexing allows you to use arrays of indices to access multiple elements at once. This is useful for more complex data selections.

In [None]:
# Using a list of indices to access specific elements
indexed_elements = array[[0, 1], [1, 2]]  # Elements (0, 1) and (1, 2)
print("Indexed Elements:", indexed_elements)
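
Index arrays can also select whole rows, and unlike basic slicing the result is a copy rather than a view. A small sketch, using an illustrative array named `grid`:

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# A single index list selects entire rows: here rows 0 and 2
picked_rows = grid[[0, 2]]
print("Picked rows:\n", picked_rows)  # [[1 2 3], [7 8 9]]

# Paired index lists select individual elements: here the main diagonal
diagonal = grid[[0, 1, 2], [0, 1, 2]]
print("Diagonal:", diagonal)  # [1 5 9]

# Advanced indexing returns a copy, so the original is unaffected
picked_rows[0, 0] = 100
print("grid[0, 0] is still:", grid[0, 0])  # 1
```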

## 6. Array Operations

NumPy supports a wide range of operations on arrays, including element-wise operations, aggregation functions, and more. These operations are highly optimized for performance.

### 6.1. Element-wise Operations
Element-wise operations perform arithmetic or mathematical operations on each element of the array individually.

In [None]:
# Element-wise addition
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])
sum_array = array1 + array2  # [5, 7, 9]
print("Element-wise Sum:", sum_array)

# Element-wise multiplication
prod_array = array1 * array2  # [4, 10, 18]
print("Element-wise Product:", prod_array)

# Using mathematical functions
sqrt_array = np.sqrt(array1)  # [1.0, 1.414, 1.732]
print("Square Root of array1:", sqrt_array)

### 6.2. Aggregation Functions
Aggregation functions summarize data across different dimensions of the array, such as computing sums, means, or maximum values.

In [None]:
array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Summing all elements
total_sum = array.sum()  # 45
print("Total Sum:", total_sum)

# Finding the maximum value
max_value = array.max()  # 9
print("Maximum Value:", max_value)

# Computing the mean
mean_value = array.mean()  # 5.0
print("Mean Value:", mean_value)

### 6.3. Axis-Based Operations
Aggregation functions can also be applied along specific axes, which is useful for operations like summing rows or columns.

In [None]:
# Sum along the rows (axis 1)
row_sum = array.sum(axis=1)  # [6, 15, 24]
print("Sum of Rows:", row_sum)

# Mean along the columns (axis 0)
col_mean = array.mean(axis=0)  # [4.0, 5.0, 6.0]
print("Mean of Columns:", col_mean)
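
With a square array it is hard to tell which axis was collapsed. A non-square example, sketched below with an illustrative `table` array, shows that the `axis` argument names the dimension that disappears from the result.

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# axis=0 collapses the rows, leaving one value per column
column_totals = table.sum(axis=0)
print("Column totals:", column_totals, column_totals.shape)  # [5 7 9] (3,)

# axis=1 collapses the columns, leaving one value per row
row_totals = table.sum(axis=1)
print("Row totals:", row_totals, row_totals.shape)  # [ 6 15] (2,)
```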

## 7. Boolean Indexing

Boolean indexing allows you to filter arrays based on conditions, making it easy to select data that meets certain criteria.

In [None]:
# Boolean indexing example
array = np.array([1, 2, 3, 4, 5, 6])

# Selecting elements greater than 3
greater_than_three = array[array > 3]  # [4, 5, 6]
print("Elements greater than 3:", greater_than_three)
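
Several conditions can be combined into one mask. The sketch below uses the element-wise operators `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. The variable names are illustrative.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# The mask itself is a boolean array with the same shape as values
mask = (values > 2) & (values % 2 == 0)
print("Mask:", mask)  # [False False False  True False  True]
print("Even and greater than 2:", values[mask])  # [4 6]

# Either condition may hold
outside = values[(values < 2) | (values > 5)]
print("Less than 2 or greater than 5:", outside)  # [1 6]
```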

## 8. Combining and Splitting Arrays

Combining and splitting arrays are useful operations for data manipulation. You can concatenate multiple arrays into one or split a single array into multiple smaller arrays.

### 8.1. Concatenating Arrays
Concatenating arrays allows you to join them along a specific axis using functions like `np.concatenate`, `np.vstack`, and `np.hstack`.

In [None]:
# Concatenating arrays vertically (axis 0)
array1 = np.array([[1, 2], [3, 4]])
array2 = np.array([[5, 6]])
vertical_stack = np.vstack((array1, array2))
print("Vertical Stack:\n", vertical_stack)

# Concatenating arrays horizontally (axis 1)
horizontal_stack = np.hstack((array1, array2.T))  # Transpose array2 for horizontal stack
print("Horizontal Stack:\n", horizontal_stack)
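
The same results can be obtained with `np.concatenate` by naming the axis explicitly. This is a minimal sketch with illustrative variable names, comparing its output with `np.vstack` and `np.hstack`.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])

# Join along axis 0 (add rows); column counts must match
rows_joined = np.concatenate((top, bottom), axis=0)
print("Joined along axis 0:\n", rows_joined)

# Join along axis 1 (add columns); row counts must match, hence the transpose
cols_joined = np.concatenate((top, bottom.T), axis=1)
print("Joined along axis 1:\n", cols_joined)

print(np.array_equal(rows_joined, np.vstack((top, bottom))))    # True
print(np.array_equal(cols_joined, np.hstack((top, bottom.T))))  # True
```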

### 8.2. Splitting Arrays
Splitting an array into multiple smaller arrays can be done using `np.split`, `np.vsplit`, and `np.hsplit`.

In [None]:
# Splitting an array into 3 sub-arrays along axis 1
array = np.arange(9).reshape(3, 3)
split_arrays = np.split(array, 3, axis=1)
print("Original Array:\n", array)
print("Split Arrays:")
for i, sub_array in enumerate(split_arrays):
    print(f"Sub-array {i+1}:\n", sub_array)
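
`np.hsplit` and `np.vsplit` are shorthand for splitting along columns and rows. The second argument can be a number of equal parts or a list of indices at which to cut. The following sketch uses an illustrative 3x4 array.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

# Split the columns into two equal halves, each of shape (3, 2)
left, right = np.hsplit(grid, 2)
print("Left half:\n", left)
print("Right half:\n", right)

# Cut the rows at index 1: the first row, then the remaining two rows
first_row, remaining_rows = np.vsplit(grid, [1])
print("First row:\n", first_row)
print("Remaining rows:\n", remaining_rows)
```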

## 9. Array Broadcasting

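Broadcasting allows you to perform element-wise operations on arrays of different but compatible shapes. NumPy treats the smaller operand as if it were stretched to match the larger one, without actually copying its data. The simplest case is combining an array with a scalar.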

In [None]:
# Broadcasting example
array = np.array([1, 2, 3])
scalar = 2

# Adding a scalar to an array
broadcasted_array = array + scalar  # [3, 4, 5]
print("Broadcasted Array:", broadcasted_array)

# Broadcasting with a 2D array
array_2d = np.array([[1, 2, 3], [4, 5, 6]])
scalar_2d = 10
broadcasted_array_2d = array_2d + scalar_2d  # Adding 10 to each element
print("Broadcasted 2D Array:\n", broadcasted_array_2d)
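
Broadcasting also applies between two arrays, not only an array and a scalar. In the sketch below (variable names are illustrative), a 1D row is applied to every row of a matrix, and a column of shape `(2, 1)` is applied to every column.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)

# Shape (3,) is stretched across both rows of the matrix
row = np.array([10, 20, 30])
shifted = matrix + row
print("Matrix + row:\n", shifted)  # [[11 22 33], [14 25 36]]

# Shape (2, 1) is stretched across all three columns
column = np.array([[100], [200]])
offset = matrix + column
print("Matrix + column:\n", offset)  # [[101 102 103], [204 205 206]]
```

Shapes are compared from the last dimension backwards; two dimensions are compatible when they are equal or one of them is 1.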

## 10. Array Copying and Views

In NumPy, arrays can be copied or viewed. A copy creates a new array with its own data, while a view creates a new array object that references the original data. Modifications to a view will affect the original array, while modifications to a copy will not.

In [None]:
# Creating an original array
original_array = np.array([1, 2, 3, 4])

# Creating a copy
array_copy = original_array.copy()
array_copy[0] = 100  # Modify copy
print("Original Array:", original_array)
print("Modified Copy:", array_copy)

# Creating a view
array_view = original_array.view()
array_view[0] = 200  # Modify view
print("Original Array after modifying view:", original_array)
print("View:", array_view)
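
Whether two arrays share data can be checked directly. This sketch uses `np.shares_memory` and the `base` attribute; the variable names are chosen for illustration.

```python
import numpy as np

original = np.array([1, 2, 3, 4])
duplicate = original.copy()
alias = original.view()

# A copy owns its data; a view borrows it from the original
print("Copy shares memory:", np.shares_memory(original, duplicate))  # False
print("View shares memory:", np.shares_memory(original, alias))      # True

# base points at the array that owns the data, or is None for an owner
print("duplicate.base is None:", duplicate.base is None)  # True
print("alias.base is original:", alias.base is original)  # True
```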

## 11. Mathematical Functions

NumPy provides a wide range of mathematical functions that can be applied element-wise to arrays. These functions include trigonometric functions, exponentiation, logarithms, and more.

In [None]:
# Applying mathematical functions
array = np.array([1, 4, 9, 16])

# Square root
sqrt_array = np.sqrt(array)  # [1.0, 2.0, 3.0, 4.0]
print("Square Root:", sqrt_array)

# Exponential
exp_array = np.exp(array)  # approximately [2.718, 54.598, 8103.084, 8886110.521]
print("Exponential:", exp_array)

# Natural logarithm (base e)
log_array = np.log(array)  # [0.0, 1.386, 2.197, 2.773]
print("Logarithm:", log_array)
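
Trigonometric functions work the same way, taking angles in radians. The sketch below uses illustrative angles and rounds the output, since floating-point results such as `cos(pi/2)` come out as tiny non-zero numbers rather than exactly zero.

```python
import numpy as np

# Angles in radians: 0, 30 and 90 degrees
angles = np.array([0, np.pi / 6, np.pi / 2])

sine_values = np.sin(angles)
cosine_values = np.cos(angles)

# Rounded for display because of floating-point error
print("Sine:", np.round(sine_values, 3))      # approximately [0, 0.5, 1]
print("Cosine:", np.round(cosine_values, 3))  # approximately [1, 0.866, 0]
```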

## 12. Linear Algebra Operations

NumPy includes several functions for performing linear algebra operations such as matrix multiplication, determinants, and inverses. These are essential for many applications in data science and machine learning.

In [None]:
# Linear algebra operations
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Matrix multiplication
matrix_product = np.dot(matrix1, matrix2)
print("Matrix Product:\n", matrix_product)

# Determinant
det_matrix1 = np.linalg.det(matrix1)
print("Determinant of matrix1:", det_matrix1)

# Inverse
inverse_matrix1 = np.linalg.inv(matrix1)
print("Inverse of matrix1:\n", inverse_matrix1)
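
For 2D arrays, the `@` operator gives the same matrix product as `np.dot`. The sketch below also checks an inverse by multiplying it back against the original matrix. The variable names are illustrative, and `np.allclose` is used because floating-point results are rarely exact.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
other = np.array([[5, 6], [7, 8]])

# The @ operator performs matrix multiplication
product = matrix @ other
print("Product with @:\n", product)  # [[19 22], [43 50]]
print("Same as np.dot:", np.array_equal(product, np.dot(matrix, other)))  # True

# A matrix times its inverse should be close to the identity matrix
inverse = np.linalg.inv(matrix)
is_identity = np.allclose(matrix @ inverse, np.eye(2))
print("matrix @ inverse is the identity:", is_identity)  # True
```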

## Conclusion

This notebook introduced the core NumPy features used for data manipulation: array creation, array attributes, reshaping, indexing and slicing, element-wise and aggregate operations, boolean indexing, combining and splitting, broadcasting, copies and views, mathematical functions, and linear algebra. Mastery of these concepts is fundamental for effective data manipulation and analysis in Python.