<img src="./images/banner.png" width="800">

# Introduction to NumPy

**NumPy** (Numerical Python) is a fundamental package for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently.

At its core, NumPy offers the `ndarray` object, which is a powerful n-dimensional array data structure. These arrays enable efficient storage and manipulation of large datasets, making NumPy an essential tool for data science, machine learning, and computational physics applications.


<img src="./images/numpy-arrays.png" width="800">

Some key features of NumPy include:

- **Efficient memory usage:** NumPy arrays are stored in contiguous memory blocks, allowing for faster access and computation compared to traditional Python lists.
- **Vectorized operations:** NumPy provides a wide range of mathematical functions that operate on entire arrays, eliminating the need for explicit loops and resulting in concise and efficient code.
- **Broadcasting:** NumPy allows arrays with different but compatible shapes to be combined in arithmetic operations, without manually copying data to make the shapes match.


Here's a simple example showcasing the creation of a NumPy array:

In [1]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr

array([1, 2, 3, 4, 5])

NumPy was created in 2005 by Travis Oliphant, who built on the earlier Numeric library and incorporated features from the competing Numarray package. Its development was driven by the need for a single numerical computing tool in Python that could efficiently handle large arrays and matrices.


Over the years, NumPy has become an integral part of the scientific Python ecosystem. It serves as the foundation for many other popular libraries, such as:

- **SciPy:** A library for scientific computing that builds upon NumPy, providing additional functionality for optimization, signal processing, and more.
- **Pandas:** A data manipulation library that uses NumPy arrays as its default underlying data storage.
- **Matplotlib:** A plotting library that relies on NumPy for numerical computations and data representation.


NumPy's development is actively maintained by a dedicated team of contributors, ensuring its continued improvement and compatibility with the latest advancements in the Python ecosystem.


The combination of NumPy's efficient array operations, extensive mathematical functions, and seamless integration with other libraries has made it a fundamental tool for data scientists, researchers, and developers working with numerical data in Python.


In the following sections, we will explore the advantages of using NumPy and dive deeper into its key features and capabilities.

**Table of contents**<a id='toc0_'></a>    
- [Advantages of using NumPy for Numerical Computing](#toc1_)    
  - [Efficiency and Performance](#toc1_1_)    
  - [Ease of Use and Readability](#toc1_2_)    
  - [Integration with Other Libraries](#toc1_3_)    
- [Key Features and Capabilities of NumPy](#toc2_)    
  - [Multi-dimensional Arrays](#toc2_1_)    
  - [Mathematical Functions](#toc2_2_)    
  - [Broadcasting](#toc2_3_)    
  - [Indexing and Slicing](#toc2_4_)    
  - [Random Number Generation](#toc2_5_)    
  - [Saving and Loading Data](#toc2_6_)    
- [NumPy in the Data Science Ecosystem](#toc3_)    
  - [Relation to Other Data Science Libraries](#toc3_1_)    
  - [Real-world Applications](#toc3_2_)    
- [Getting Started with NumPy](#toc4_)    
  - [Installation](#toc4_1_)    
  - [Importing NumPy](#toc4_2_)    
  - [Basic Usage Examples](#toc4_3_)    
- [Summary and Conclusion](#toc5_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=2
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

## <a id='toc1_'></a>[Advantages of using NumPy for Numerical Computing](#toc0_)

NumPy offers several significant advantages that make it a powerful tool for numerical computing in Python. Let's explore these advantages in detail.


### <a id='toc1_1_'></a>[Efficiency and Performance](#toc0_)


One of the primary advantages of using NumPy is its efficiency and performance. NumPy is designed to handle large arrays and matrices, and for element-wise numerical work it is typically much faster than equivalent code written with Python lists and loops.


Here are a few reasons why NumPy is more efficient:

1. **Contiguous memory allocation:** NumPy arrays are stored in contiguous memory blocks, allowing for faster access and manipulation of elements. This contiguous memory layout enables CPU vectorization and cache optimization, resulting in improved performance.

2. **Vectorized operations:** NumPy provides a wide range of mathematical functions that operate on entire arrays, eliminating the need for explicit loops. These vectorized operations are implemented in optimized C code, which executes much faster than pure Python code.
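The contiguous layout described above can be inspected directly. The following is a minimal sketch, assuming a small `int64` array (named `grid` here purely for illustration); it relies only on the standard `flags`, `itemsize`, `strides`, and `nbytes` attributes of `ndarray`.

```python
import numpy as np

# Illustrative array; the explicit dtype keeps the byte counts predictable.
grid = np.arange(6, dtype=np.int64).reshape(2, 3)

print(grid.flags['C_CONTIGUOUS'])  # True: rows are stored one after another
print(grid.itemsize)               # 8 bytes per int64 element
print(grid.strides)                # (24, 8): bytes to step to the next row / next column
print(grid.nbytes)                 # 48 bytes of element data in total
```

Because every element has the same size and sits at a predictable offset, NumPy can locate any element with simple arithmetic rather than following a pointer per element, as a Python list must.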


<img src="./images/array-list.png" width="600">

For example, let's compare the performance of adding two arrays using Python lists and NumPy arrays:


In [2]:
import numpy as np
import time

# Using Python lists
list1 = list(range(1000000))
list2 = list(range(1000000))

# time.perf_counter() is a high-resolution clock intended for measuring short durations
start_time = time.perf_counter()
result_list = [x + y for x, y in zip(list1, list2)]
end_time = time.perf_counter()
print(f"Python list addition time: {end_time - start_time:.3f} seconds")

# Using NumPy arrays
arr1 = np.arange(1000000)
arr2 = np.arange(1000000)

start_time = time.perf_counter()
result_array = arr1 + arr2
end_time = time.perf_counter()
print(f"NumPy array addition time: {end_time - start_time:.3f} seconds")

Python list addition time: 0.047 seconds
NumPy array addition time: 0.001 seconds


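In this run, NumPy array addition was more than an order of magnitude faster than the list comprehension. Exact timings vary with hardware, Python version, and NumPy version, so treat the numbers as indicative rather than exact.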
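A single timed run can be noisy. As a rough sketch of a more repeatable measurement, the standard library's `timeit` module can run each version several times and report the best trial. The array size, the repeat counts, and the helper names `add_lists` and `add_arrays` are arbitrary choices made for this illustration.

```python
import timeit

import numpy as np

n = 100_000  # illustrative size, kept small so the repeated runs finish quickly
list1, list2 = list(range(n)), list(range(n))
arr1, arr2 = np.arange(n), np.arange(n)

def add_lists():
    return [x + y for x, y in zip(list1, list2)]

def add_arrays():
    return arr1 + arr2

# Run each function 10 times per trial, over 5 trials, and keep the fastest trial.
list_time = min(timeit.repeat(add_lists, number=10, repeat=5)) / 10
array_time = min(timeit.repeat(add_arrays, number=10, repeat=5)) / 10

print(f"List comprehension: {list_time:.6f} s per call")
print(f"NumPy addition:     {array_time:.6f} s per call")

# Both approaches compute the same values.
print(add_lists() == add_arrays().tolist())  # True
```

Taking the minimum over several trials reduces the influence of background activity on the machine, which is why it is a common convention for micro-benchmarks.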


### <a id='toc1_2_'></a>[Ease of Use and Readability](#toc0_)


Another advantage of using NumPy is its ease of use and readability. NumPy provides a clean and intuitive syntax for performing complex numerical computations, making your code more concise and easier to understand.


NumPy's vectorized operations allow you to express mathematical computations in a more natural and readable way, without the need for explicit loops. This not only improves code readability but also reduces the chances of introducing bugs.


For example, consider the following code that computes the element-wise square of an array:


In [3]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
squared_arr = np.square(arr)
print(squared_arr)  # Output: [ 1  4  9 16 25]

[ 1  4  9 16 25]


With plain Python lists, the same computation needs an explicit loop or list comprehension over the elements. With NumPy, a single call to `np.square()` (or, equivalently, `arr ** 2`) does the job, making your code more concise and expressive.
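To make the contrast concrete, here is a small side-by-side sketch of the pure-Python and NumPy versions. The variable names are illustrative only.

```python
import numpy as np

values = [1, 2, 3, 4, 5]

# Pure Python: visit each element explicitly.
squared_list = [v * v for v in values]

# NumPy: one vectorized call (arr ** 2 is an equivalent spelling).
arr = np.array(values)
squared_arr = np.square(arr)

print(squared_list)           # [1, 4, 9, 16, 25]
print(squared_arr.tolist())   # [1, 4, 9, 16, 25]
```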


### <a id='toc1_3_'></a>[Integration with Other Libraries](#toc0_)


NumPy seamlessly integrates with other popular libraries in the scientific Python ecosystem, serving as the foundation for many of them. This integration allows you to leverage the capabilities of multiple libraries to build powerful and efficient data processing pipelines.


Some notable libraries that build upon NumPy include:

- **SciPy:** SciPy extends NumPy's capabilities by providing additional functions for optimization, linear algebra, integration, interpolation, signal and image processing, and more. It enables you to perform advanced scientific computations efficiently.

- **Pandas:** Pandas is a powerful data manipulation library that uses NumPy arrays as its underlying data structure. It provides high-level data structures like DataFrames and Series, along with functions for data cleaning, transformation, and analysis. NumPy's efficient array operations are crucial for Pandas' performance.

- **Matplotlib:** Matplotlib is a popular plotting library that allows you to create a wide range of static, animated, and interactive visualizations. It relies on NumPy arrays for numerical computations and data representation, enabling you to visualize your NumPy data easily.

- **Scikit-learn:** Scikit-learn is a machine learning library that provides a wide range of algorithms for classification, regression, clustering, and more. It uses NumPy arrays as the primary data structure for representing features and targets, allowing seamless integration with NumPy-based data preprocessing pipelines.


The integration of NumPy with these libraries creates a powerful ecosystem for scientific computing, data analysis, and machine learning in Python. By leveraging NumPy's efficient array operations and combining them with the specialized functionalities provided by other libraries, you can tackle complex computational problems with ease.

## <a id='toc2_'></a>[Key Features and Capabilities of NumPy](#toc0_)


NumPy offers a wide range of features and capabilities that make it a powerful tool for numerical computing. Let's explore some of its key features in detail.


### <a id='toc2_1_'></a>[Multi-dimensional Arrays](#toc0_)


The foundation of NumPy is the `ndarray` object, which represents a multi-dimensional array. These arrays can have one or more dimensions, allowing you to store and manipulate data efficiently.


Creating a NumPy array is simple:


In [4]:
import numpy as np

# Creating a 1-dimensional array
arr_1d = np.array([1, 2, 3, 4, 5])

# Creating a 2-dimensional array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Creating a 3-dimensional array
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

NumPy arrays have attributes like `shape`, `size`, and `dtype` that provide information about the array's dimensions, total number of elements, and data type, respectively.
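As a quick illustration, the sketch below prints these attributes for the three arrays created above, along with `ndim` (the number of dimensions), which is an additional attribute not mentioned in the text. The `dtype` is set explicitly here as an assumption for reproducibility, because the default integer type can differ between platforms.

```python
import numpy as np

# dtype is set explicitly because the default integer type can vary by platform.
arr_1d = np.array([1, 2, 3, 4, 5], dtype=np.int64)
arr_2d = np.array([[1, 2, 3], [4, 5, 6]], dtype=np.int64)
arr_3d = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]], dtype=np.int64)

for name, arr in [("arr_1d", arr_1d), ("arr_2d", arr_2d), ("arr_3d", arr_3d)]:
    print(name, arr.ndim, arr.shape, arr.size, arr.dtype)
# arr_1d 1 (5,) 5 int64
# arr_2d 2 (2, 3) 6 int64
# arr_3d 3 (2, 2, 2) 8 int64
```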


### <a id='toc2_2_'></a>[Mathematical Functions](#toc0_)


NumPy provides a vast collection of mathematical functions that operate on arrays, most of them element-wise. These functions allow you to perform various mathematical operations efficiently without the need for explicit loops.


Some commonly used mathematical functions in NumPy include:

- Arithmetic operations: `np.add()`, `np.subtract()`, `np.multiply()`, `np.divide()`
- Trigonometric functions: `np.sin()`, `np.cos()`, `np.tan()`
- Exponential and logarithmic functions: `np.exp()`, `np.log()`, `np.log10()`
- Statistical functions (these summarize an array rather than acting element-wise): `np.mean()`, `np.median()`, `np.std()`, `np.var()`


Here's an example that demonstrates the use of mathematical functions:


In [5]:
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
sqrt_arr = np.sqrt(arr)
sqrt_arr


array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798])
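The other function families listed above follow the same pattern. The sketch below, using the same small array, shows a few element-wise functions next to statistical functions that reduce the array to a single number.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Element-wise functions return an array of the same shape.
print(np.add(arr, 10))        # [11 12 13 14 15]
print(np.multiply(arr, arr))  # [ 1  4  9 16 25]
print(np.exp(np.log(arr)))    # approximately [1. 2. 3. 4. 5.]

# Statistical functions reduce the array to a single value.
print(np.mean(arr))    # 3.0
print(np.median(arr))  # 3.0
print(np.var(arr))     # 2.0
```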

### <a id='toc2_3_'></a>[Broadcasting](#toc0_)


Broadcasting is a powerful feature in NumPy that allows arrays with different but compatible shapes to be used together in arithmetic operations. It lets you combine arrays of different sizes without manually tiling or copying the smaller one to match the larger.

NumPy follows a set of broadcasting rules to decide whether two shapes are compatible. Shapes are compared dimension by dimension, starting from the trailing (rightmost) dimension; two dimensions are compatible when they are equal or when one of them is 1. These rules cover scalar-array operations as well as operations between arrays of different dimensionality, such as adding a vector to every row of a matrix.


<img src="./images/broadcasting.png" width="800">

Here's an example that demonstrates broadcasting:


In [6]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])
scalar = 10
result = arr + scalar
result

array([[11, 12, 13],
       [14, 15, 16]])

In this example, the scalar value `10` is broadcast to match the shape of the array `arr`, and the addition is performed element-wise.
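Broadcasting also applies between arrays, not just between an array and a scalar. The following sketch, with illustrative names `matrix`, `row`, and `column`, shows two compatible combinations and one incompatible one, for which NumPy raises a `ValueError`.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
row = np.array([10, 20, 30])                # shape (3,)
column = np.array([[100], [200]])           # shape (2, 1)

# (2, 3) + (3,): the row is stretched across both rows of the matrix.
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# (2, 3) + (2, 1): the column is stretched across all three columns.
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# (2, 3) + (2,): trailing dimensions 3 and 2 do not match, so NumPy raises an error.
try:
    matrix + np.array([1, 2])
except ValueError as exc:
    print("Incompatible shapes:", exc)
```

When shapes do not line up, reshaping one operand (for example, turning a length-2 vector into a `(2, 1)` column) is the usual way to make them compatible.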


### <a id='toc2_4_'></a>[Indexing and Slicing](#toc0_)


NumPy provides powerful indexing and slicing capabilities that allow you to access and manipulate specific elements or subsets of an array.

- Indexing: You can access individual elements of an array using square brackets and the corresponding indices.
- Slicing: You can extract sub-arrays by specifying a range of indices using the colon (`:`) notation.


Here's an example that demonstrates indexing and slicing:


In [7]:
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr[0, 1]

2

In [8]:
arr[1:, :2]

array([[4, 5],
       [7, 8]])

In the first example, we access the element at row index 0 and column index 1. In the second example, we extract a sub-array consisting of rows from index 1 onwards and columns up to index 2 (exclusive).
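A few further patterns are worth knowing. The sketch below reuses the same 3x3 array to show negative indices, stepped slices, and the fact that a basic slice is a view onto the original data rather than a copy. The names `top_left` and `safe` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(arr[-1, -1])    # 9: negative indices count from the end
print(arr[:, 0])      # [1 4 7]: every row, first column
print(arr[::2, ::2])  # corner elements, using a step of 2
# [[1 3]
#  [7 9]]

# Basic slices are views: writing through the slice changes the original array.
top_left = arr[:2, :2]
top_left[0, 0] = 99
print(arr[0, 0])      # 99

# Call .copy() when an independent sub-array is needed.
safe = arr[:2, :2].copy()
safe[0, 0] = -1
print(arr[0, 0])      # still 99
```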


### <a id='toc2_5_'></a>[Random Number Generation](#toc0_)


NumPy provides a random number generation module called `numpy.random` that allows you to generate various types of random numbers and arrays. This module is useful for simulations, statistical sampling, and machine learning tasks.


Some commonly used functions for random number generation include:

- `np.random.rand()`: Generate random floats in the half-open interval [0, 1).
- `np.random.randint()`: Generate random integers from `low` (inclusive) to `high` (exclusive).
- `np.random.normal()`: Generate random numbers from a normal (Gaussian) distribution.
- `np.random.shuffle()`: Randomly shuffle the elements of an array in place.


Here's an example that demonstrates random number generation:


In [9]:
import numpy as np

# Generate a random float between 0 and 1
random_float = np.random.rand()
random_float

0.7531757324577198

In [10]:
# Generate a random integer between 1 and 10 (inclusive)
random_int = np.random.randint(1, 11)
random_int


1

In [11]:
# Generate a 2x3 array of random floats between 0 and 1
random_array = np.random.rand(2, 3)
random_array

array([[0.55696043, 0.66659632, 0.50743261],
       [0.11649023, 0.70903422, 0.51930541]])
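The functions above belong to NumPy's legacy random interface. For new code, the NumPy documentation recommends creating a `Generator` with `np.random.default_rng()`. The sketch below shows rough equivalents of the calls listed earlier; the seed value `42` and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Seeding makes the sequence repeatable; 42 is an arbitrary choice.
rng = np.random.default_rng(42)

random_float = rng.random()                       # float in [0, 1)
random_int = rng.integers(1, 11)                  # integer from 1 to 10 (upper bound excluded)
samples = rng.normal(loc=0.0, scale=1.0, size=3)  # three draws from a standard normal

deck = np.arange(5)
rng.shuffle(deck)                                 # shuffles in place

print(random_float, random_int, samples, deck)

# A second generator with the same seed reproduces the same first value.
print(np.random.default_rng(42).random() == random_float)  # True
```

Passing a seed is useful when results need to be reproducible, for example in tests or when sharing an analysis; omit it to get different values on each run.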

### <a id='toc2_6_'></a>[Saving and Loading Data](#toc0_)


NumPy provides functions to save and load arrays to and from files, allowing you to store and retrieve data efficiently. This is particularly useful when working with large datasets or when you need to persist data across different sessions.


Some commonly used functions for saving and loading data include:

- `np.save()`: Save a single array to a binary file with a `.npy` extension.
- `np.savez()`: Save multiple arrays to a single file with a `.npz` extension.
- `np.load()`: Load a single array from a `.npy` file, or a dictionary-like collection of arrays from a `.npz` file.


Here's an example that demonstrates saving and loading an array:


```python
import numpy as np

# Create an array
arr = np.array([1, 2, 3, 4, 5])

# Save the array to a file
np.save('my_array.npy', arr)

# Load the array from the file
loaded_arr = np.load('my_array.npy')
print(loaded_arr)  # Output: [1 2 3 4 5]
```


In this example, we create an array `arr`, save it to a file named `my_array.npy` using `np.save()`, and then load it back from the file using `np.load()`.
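Saving several arrays at once works similarly with `np.savez()`. The following is a minimal sketch; the file name `dataset.npz` and the array names `features` and `labels` are hypothetical, and a temporary directory is used so the example leaves no files behind.

```python
import os
import tempfile

import numpy as np

features = np.array([[1.0, 2.0], [3.0, 4.0]])
labels = np.array([0, 1])

# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dataset.npz")  # hypothetical file name

    # Keyword names become the keys used to retrieve each array.
    np.savez(path, features=features, labels=labels)

    # np.load returns a dictionary-like object for .npz files.
    with np.load(path) as data:
        stored_names = sorted(data.files)
        loaded_features = data["features"]
        loaded_labels = data["labels"]

print(stored_names)                                # ['features', 'labels']
print(np.array_equal(loaded_features, features))   # True
print(np.array_equal(loaded_labels, labels))       # True
```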


These are just a few of the key features and capabilities of NumPy. NumPy provides many more functions and tools for advanced numerical computing, including linear algebra, Fourier transforms, and more. As you explore NumPy further, you'll discover its full potential and how it can streamline your numerical computing tasks.

## <a id='toc3_'></a>[NumPy in the Data Science Ecosystem](#toc0_)

NumPy is a fundamental library in the Python data science ecosystem. It plays a crucial role in providing the necessary tools and data structures for efficient numerical computing. Let's explore how NumPy relates to other data science libraries and its real-world applications.


### <a id='toc3_1_'></a>[Relation to Other Data Science Libraries](#toc0_)


NumPy serves as the foundation for many other popular data science libraries in Python. These libraries build upon NumPy's functionality and provide higher-level abstractions and specialized tools for various data science tasks.


Some of the key libraries that rely on NumPy include:

1. **Pandas**: Pandas is a powerful data manipulation library that provides data structures like DataFrames and Series for handling structured data. It uses NumPy arrays as the underlying data storage and leverages NumPy's efficient numerical computations.

2. **Matplotlib**: Matplotlib is a widely used plotting library that allows you to create a variety of visualizations, including line plots, scatter plots, bar plots, and more. It uses NumPy arrays to represent the data being plotted and relies on NumPy's numerical operations for data transformations.

3. **SciPy**: SciPy is a scientific computing library that builds upon NumPy and provides additional functionality for optimization, linear algebra, integration, signal and image processing, and more. It extends NumPy's capabilities to handle more advanced numerical computations.

4. **Scikit-learn**: Scikit-learn is a popular machine learning library that provides a wide range of algorithms for classification, regression, clustering, dimensionality reduction, and model evaluation. It uses NumPy arrays as the primary data structure for representing features and targets.


These libraries leverage NumPy's efficient array operations and numerical computing capabilities to provide high-performance data manipulation, visualization, scientific computing, and machine learning functionalities.


By integrating with these libraries, NumPy enables a seamless data science workflow, allowing you to efficiently process, analyze, and model data using a unified ecosystem of tools.


### <a id='toc3_2_'></a>[Real-world Applications](#toc0_)


NumPy finds applications in various domains and industries where numerical computing and data analysis are essential. Some real-world applications of NumPy include:

1. **Scientific Computing**: NumPy is extensively used in scientific computing applications, such as physics simulations, numerical modeling, and computational biology. Its efficient array operations and mathematical functions make it suitable for handling large datasets and performing complex calculations.

2. **Data Analysis**: NumPy is a fundamental tool for data analysis tasks. It provides the necessary data structures and functions for cleaning, transforming, and analyzing data. NumPy's ability to handle large arrays efficiently makes it valuable for processing and manipulating datasets in fields like finance, marketing, and social sciences.

3. **Machine Learning**: NumPy is a crucial component in machine learning pipelines. It is used for data preprocessing, feature extraction, and model training. NumPy's array operations enable efficient computation of gradients and optimization algorithms, which are essential for training machine learning models.

4. **Image and Signal Processing**: NumPy's multi-dimensional arrays and mathematical functions make it suitable for image and signal processing tasks. It provides capabilities for image manipulation, filtering, and transformations. NumPy is commonly used in applications like computer vision, audio processing, and medical imaging.

5. **Finance**: NumPy is used in financial applications for tasks like risk analysis, portfolio optimization, and quantitative trading. Its efficient numerical computations and integration with libraries like Pandas make it valuable for handling financial data and performing complex financial calculations.

6. **Geospatial Analysis**: NumPy is used in geospatial analysis and geographic information systems (GIS) for tasks like spatial data processing, raster analysis, and terrain modeling. Its ability to handle large arrays and perform mathematical operations on geospatial data makes it suitable for this domain.


These are just a few examples of the real-world applications of NumPy. Its versatility and efficiency make it a valuable tool across various industries and research fields where numerical computing and data analysis are essential.


As you can see, NumPy is not only a powerful library on its own but also serves as the foundation for the broader data science ecosystem in Python. Its seamless integration with other libraries and its wide range of real-world applications make it an indispensable tool for data scientists, researchers, and developers working with numerical data.

## <a id='toc4_'></a>[Getting Started with NumPy](#toc0_)

Now that you have a good understanding of what NumPy is and its key features, let's dive into how to get started with using NumPy in your Python environment.


### <a id='toc4_1_'></a>[Installation](#toc0_)


To use NumPy, you first need to install it in your Python environment. There are several ways to install NumPy, depending on your operating system and preferred installation method.

1. **Using pip**: pip is the package installer for Python. You can install NumPy using pip by running the following command in your terminal or command prompt:

   ```
   pip install numpy
   ```

2. **Using conda**: If you are using the Anaconda distribution of Python, you can install NumPy using the conda package manager. Open the Anaconda Prompt and run the following command:

   ```
   conda install numpy
   ```

3. **From source**: If you prefer to install NumPy from source, you can download the source code from the official NumPy website (https://numpy.org) and follow the installation instructions provided in the documentation.


Once the installation is complete, you can verify that NumPy is installed correctly by running the following command in your Python interpreter or a Jupyter Notebook:

In [12]:
import numpy as np
np.__version__

'1.26.0'

If NumPy is installed correctly, it will display the version number of NumPy.


### <a id='toc4_2_'></a>[Importing NumPy](#toc0_)


To use NumPy in your Python scripts or notebooks, you need to import it. The convention is to import NumPy with the alias `np`. This allows you to refer to NumPy functions and objects using the `np.` prefix.


Here's how you can import NumPy in your Python code:

In [13]:
import numpy as np

By importing NumPy with the alias `np`, you can access its functions and objects conveniently. For example, you can create a NumPy array using `np.array()`, compute the mean of an array using `np.mean()`, and so on.


### <a id='toc4_3_'></a>[Basic Usage Examples](#toc0_)


Now that you have NumPy installed and imported, let's explore some basic usage examples to get you started.


1. **Creating NumPy Arrays**:
   You can create NumPy arrays using various methods, such as `np.array()`, `np.zeros()`, `np.ones()`, and `np.arange()`.


In [14]:
# Creating a 1-dimensional array
arr1 = np.array([1, 2, 3, 4, 5])

# Creating a 2-dimensional array
arr2 = np.array([[1, 2, 3], [4, 5, 6]])

In [15]:
# Creating an array of zeros
zeros_arr = np.zeros((3, 4))
zeros_arr

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [16]:
# Creating an array of ones
ones_arr = np.ones((2, 2))
ones_arr

array([[1., 1.],
       [1., 1.]])

In [17]:
# Creating an array with a range of values
range_arr = np.arange(0, 10, 2)
range_arr

array([0, 2, 4, 6, 8])

2. **Array Attributes**:
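   NumPy arrays have various attributes that provide information about the array, such as its shape, size, and data type. The default integer `dtype` depends on the platform and NumPy version, so the `dtype` output may differ on your machine.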


In [18]:
arr = np.array([[1, 2, 3], [4, 5, 6]])

In [19]:
arr.shape

(2, 3)

In [20]:
arr.size

6

In [21]:
arr.dtype

dtype('int64')

3. **Array Operations**:
   NumPy provides a wide range of mathematical functions and operations that can be applied to arrays, either element-wise (such as addition) or as reductions that summarize the array (such as the mean).


In [22]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

In [23]:
# Element-wise addition
arr1 + arr2

array([5, 7, 9])

In [24]:
# Element-wise multiplication
arr1 * arr2

array([ 4, 10, 18])

In [25]:
# Compute the mean of an array
np.mean(arr1)

2.0

In [26]:
# Compute the standard deviation of an array
np.std(arr2)

0.816496580927726
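Two optional arguments come up often with these statistical functions. The sketch below shows `axis`, which selects the dimension to reduce over, and `ddof`, which switches `np.std()` from the population formula (the default, `ddof=0`) to the sample formula (`ddof=1`). The arrays are the same small examples used above.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(np.mean(arr))          # 3.5: mean of all six elements
print(np.mean(arr, axis=0))  # [2.5 3.5 4.5]: one mean per column
print(np.mean(arr, axis=1))  # [2. 5.]: one mean per row

arr2 = np.array([4, 5, 6])
print(np.std(arr2))          # population standard deviation (ddof=0), about 0.816
print(np.std(arr2, ddof=1))  # 1.0: sample standard deviation
```

The default `ddof=0` explains the `0.816...` result above; statistical packages that default to the sample formula would report `1.0` for the same data.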

These are just a few basic examples to get you started with NumPy. NumPy provides a vast array of functions and operations for numerical computing, and you can explore the NumPy documentation to learn more about its capabilities.


As you become more familiar with NumPy, you'll discover its power and flexibility in handling numerical data efficiently. NumPy's array operations, mathematical functions, and integration with other libraries make it an essential tool for data manipulation, analysis, and scientific computing in Python.

## <a id='toc5_'></a>[Summary and Conclusion](#toc0_)

In this lecture, we introduced NumPy, a powerful library for numerical computing in Python. Here are the key points we covered:

- **What is NumPy?**
  - NumPy is a fundamental package for scientific computing in Python.
  - It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently.

- **Advantages of using NumPy:**
  - Efficiency and performance through contiguous memory allocation and vectorized operations.
  - Ease of use and readability with a clean and intuitive syntax.
  - Seamless integration with other popular libraries in the scientific Python ecosystem.

- **Key features and capabilities of NumPy:**
  - Multi-dimensional arrays (`ndarray`) for efficient storage and manipulation of large datasets.
  - Wide range of mathematical functions for array operations.
  - Broadcasting for performing arithmetic operations between arrays of different shapes.
  - Indexing and slicing for accessing and manipulating specific elements or subsets of an array.
  - Random number generation for simulations and sampling.
  - Saving and loading data to and from files.

- **NumPy in the data science ecosystem:**
  - NumPy serves as the foundation for many other data science libraries, such as Pandas, Matplotlib, SciPy, and Scikit-learn.
  - It provides the necessary data structures and numerical computing capabilities for these libraries.

- **Getting started with NumPy:**
  - Installation using pip, conda, or from source.
  - Importing NumPy in Python scripts using the `import numpy as np` convention.
  - Basic usage examples demonstrating array creation, attributes, and operations.


In conclusion, NumPy is an essential tool for anyone working with numerical data in Python. Its efficient array operations, extensive mathematical functions, and seamless integration with other libraries make it a cornerstone of the scientific Python ecosystem.


As you progress through the course and delve deeper into NumPy's capabilities, you'll discover its full potential and learn how to leverage its power for your specific needs in data manipulation, analysis, and scientific computing.