<center>

# **Fall 2025 &mdash; CIS 3803<br>Introduction to Data Science**
### Week 2: Python for Data Science

</center>

**Date:** 08 September 2025  
**Time:** 6:00–9:00 PM  
**Instructor:** Dr. Patrick T. Marsh  
**Course Verse:** “It is the glory of God to conceal a matter; to search out a matter is the glory of kings.” &mdash; *Proverbs 25:2 (NIV)*

This notebook provides a crash course in Python syntax and NumPy basics. Next week we'll focus on Pandas as part of data collection and cleaning.

****

## **Learning goals**

- Refresh Python programming (functions, loops, conditionals)
- Learn how to use **NumPy** for arrays, vectorization, and efficient numerical computation

## **Today's Outline**
1. [Opening Devotional and Reflection](#opening-devotional-and-reflection)
2. [Python Crash Course](#python-crash-course)
3. [NumPy](#numpy)


****

## **Opening Devotional and Reflection**

*"and I have filled him with the Spirit of God, with wisdom, with understanding, with knowledge and with all kinds of skills— to make artistic designs for work in gold, silver and bronze, to cut and set stones, to work in wood, and to engage in all kinds of craftsmanship."*

**&mdash; Exodus 31:3-5 (NIV)**

#### **Faith Reflection:** 
This passage describes God giving Bezalel the skills to build the tabernacle. It highlights that God equips people with specific talents and knowledge—what we might call "tools of the trade"—for a purpose. As data scientists, we are being equipped with wisdom, understanding, and skills in a modern craft. How can you use the tools you are learning not just for personal gain, but for a greater purpose that honors God and serves others?

****

## **Python Crash Course**
As I mentioned in our first class, we will leverage open source material to augment our learning in this course. For this week the following books/chapters will be of use as supplemental material.

* *[A Whirlwind Tour of Python](https://jakevdp.github.io/WhirlwindTourOfPython/)* by Jake VanderPlas (Jupyter [Notebooks](https://github.com/jakevdp/WhirlwindTourOfPython/tree/master) can be found on GitHub)
    * [Introduction](https://jakevdp.github.io/WhirlwindTourOfPython/00-introduction.html)
    * [How to Run Python Code](https://jakevdp.github.io/WhirlwindTourOfPython/01-how-to-run-python-code.html)
    * [Basic Python Syntax](https://jakevdp.github.io/WhirlwindTourOfPython/02-basic-python-syntax.html)
    * [Python Semantics: Variables](https://jakevdp.github.io/WhirlwindTourOfPython/03-semantics-variables.html)
    * [Python Semantics: Operators](https://jakevdp.github.io/WhirlwindTourOfPython/04-semantics-operators.html)
    * [Built-in Scalar Types](https://jakevdp.github.io/WhirlwindTourOfPython/05-built-in-scalar-types.html)
    * [Built-in Data Structures](https://jakevdp.github.io/WhirlwindTourOfPython/06-built-in-data-structures.html)
    * [Control Flow Statements](https://jakevdp.github.io/WhirlwindTourOfPython/07-control-flow-statements.html)
    * [Defining Functions](https://jakevdp.github.io/WhirlwindTourOfPython/08-defining-functions.html)
    * [Errors and Exceptions](https://jakevdp.github.io/WhirlwindTourOfPython/09-errors-and-exceptions.html)
    * [Iterators](https://jakevdp.github.io/WhirlwindTourOfPython/10-iterators.html)
    * [List Comprehensions](https://jakevdp.github.io/WhirlwindTourOfPython/11-list-comprehensions.html)
    * [Generators and Generator Expressions](https://jakevdp.github.io/WhirlwindTourOfPython/12-generators.html)
    * [Modules and Packages](https://jakevdp.github.io/WhirlwindTourOfPython/13-modules-and-packages.html)
    * [Strings and Regular Expressions](https://jakevdp.github.io/WhirlwindTourOfPython/14-strings-and-regular-expressions.html)
* *[Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/index.html)* by Jake VanderPlas (Jupyter [Notebooks](https://github.com/jakevdp/PythonDataScienceHandbook/tree/master/notebooks) can be found on GitHub)
    * [Chapter 1: IPython: Beyond Normal Python](https://jakevdp.github.io/PythonDataScienceHandbook/01.00-ipython-beyond-normal-python.html)
    * [Chapter 2: Introduction to NumPy](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)
    * [Chapter 3: Data Manipulation with Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) (portions of)
* *[Python for Data Analysis](https://wesmckinney.com/book/)* by Wes McKinney (Jupyter [Notebooks](https://github.com/wesm/pydata-book/tree/3rd-edition) of all code examples can be found on GitHub)
    * [Chapter 2: Python Language Basics, IPython, and Jupyter Notebooks](https://wesmckinney.com/book/python-basics)
    * [Chapter 3: Built-in Data Structures, Functions, and Files](https://wesmckinney.com/book/python-builtin)
    * [Chapter 4: NumPy Basics: Arrays and Vectorized Computing](https://wesmckinney.com/book/numpy-basics)
    * [Chapter 5: Getting Started with Pandas](https://wesmckinney.com/book/pandas-basics)

Additionally, here are some free "courses" you might find helpful.

* [Python for Programmers](https://www.codecademy.com/learn/python-for-programmers)
* [Getting Started with Python for Data Science](https://www.codecademy.com/learn/getting-started-with-python-for-data-science)

### **Python Essentials**

#### Basic Syntax

In [None]:
# Variables and data types
x = 10
name = "Alice"
pi = 3.14
print(type(x))
print(type(name))
print(type(pi))

# Loop
for s in name:
    print(s)
# The loop variable persists after the loop ends; s still holds the last character
print(s)

# Enumerated Loop
for i, s in enumerate(name):
    print(i, s)

# Control flow
if x > 5:
    print("x is greater than 5")
elif x == 5:
    print("x is equal to 5")
else:
    print("x is less than 5")

# Functions
def greet(person):
    return f"Hello, {person}!"

greet(name)


#### Data Structures

In [None]:
# Lists
fruits = ["apple", "banana", "cherry"]

# Tuples
point = (2, 3)

# Dictionaries
person = {"name": "Alice", "age": 30}

# List comprehension
squares = [x**2 for x in range(5)]
squares


### **Understanding Python Namespaces**

In Python, a ***namespace*** is a container that maps names (identifiers) to objects. Each name is unique within its namespace, and the same name can exist in different namespaces without conflict. Python has several types of namespaces, each with a different scope:

1. **Built-in Namespace**
   - Contains names like `len()`, `print()`, and `int`.
   - Created when the Python interpreter starts.

2. **Global Namespace**
   - Contains names defined at the top-level of a module or script.
   - Created when a module is loaded.

3. **Local Namespace**
   - Contains names defined inside a function.
   - Created when the function is called and destroyed when the function exits.

#### **LEGB Rule**

Python resolves names using the **LEGB** rule, which stands for:

- **L**ocal — Names assigned within a function.
- **E**nclosing — Names in enclosing functions (nested functions).
- **G**lobal — Names at the module level.
- **B**uilt-in — Names in the built-in namespace.
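
The lookup order can be traced with a small sketch. All of the names below (`label`, `outer`, `inner`, `read_global`, `use_builtin`) are made up for illustration; each function reads a name that it does not define itself, so Python has to search outward.

```python
# Hypothetical names chosen only for this illustration.
label = "global"          # lives in the global (module) namespace


def outer():
    label = "enclosing"   # lives in outer()'s local namespace

    def inner():
        # inner() assigns no `label`, so the lookup skips Local
        # and finds the name in the Enclosing scope.
        return label

    return inner()


def read_global():
    # No local or enclosing `label`, so the Global name is used.
    return label


def use_builtin(values):
    # `len` is not defined locally, in an enclosing scope, or globally,
    # so Python falls through to the Built-in namespace.
    return len(values)


from_enclosing = outer()
from_global = read_global()
from_builtin = use_builtin([1, 2, 3])
print(from_enclosing, from_global, from_builtin)
```

Each call stops at the first scope where the name is found, which is why `inner()` never sees the global `label`.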

#### **Why Namespaces Matter**
Namespaces help avoid collisions between identifiers and make code more modular and readable. They are fundamental to understanding scope and variable lifetime in Python.

#### **Namespace Example:**

In [None]:
x = "global"


def outer():
    x = "enclosing"

    def inner():
        x = "local"
        print("Called from inside inner():", x)

    inner()
    print("Called from inside outer() but outside inner():", x)

outer()
print("Called from outside inner() and outer():", x)


****

## **NumPy**

### [**What is NumPy?**](https://numpy.org/doc/stable/user/whatisnumpy.html)

NumPy is ***the*** foundational open-source Python library for scientific computing, providing support for large, multi-dimensional arrays and matrices, along with a collection of high-level mathematical functions. It enables fast, optimized numerical operations, used extensively in fields like physics, engineering, finance, and data science. Key features include its powerful `ndarray` object, which allows for efficient storage and manipulation of homogeneous data, and its support for vectorized operations that replace slow Python loops with optimized C code.

#### **Key Features and Functionality:**

- **N-dimensional Arrays (`ndarray`):** This is the core data structure in NumPy, representing a grid of values that all share the same type.
- **Mathematical Functions:** A wide range of mathematical functions, including trigonometric, statistical, linear algebra, and Fourier transform operations.
- **Vectorization:** NumPy is capable of performing operations on entire arrays without explicit Python loops, leading to faster, more concise code.
- **Broadcasting:** A mechanism that allows operations on arrays of different shapes.
- **Interoperability:** NumPy serves as an interoperability layer, allowing different array computation libraries to work together.
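
As a quick preview of vectorization and broadcasting, here is a minimal sketch. The array `data` and the other variable names are invented for this illustration; the idea is that a `(4,)` array of column means can be subtracted from a `(3, 4)` array without writing a loop.

```python
import numpy as np

# Hypothetical data: 3 rows of measurements, 4 columns each.
data = np.arange(12, dtype=np.float64).reshape(3, 4)

# Vectorization: one expression replaces a nested Python loop.
doubled = data * 2

# Broadcasting: the (4,) array of column means is stretched across
# all 3 rows of the (3, 4) array.
col_means = data.mean(axis=0)      # shape (4,)
centered = data - col_means        # shape (3, 4)

print("Centered shape:", centered.shape)
print("Column means after centering:", centered.mean(axis=0))
```

After centering, every column mean is zero, which is a common preprocessing step.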

### **Using NumPy**

The widely accepted way to import NumPy into a Python script is:

```python
import numpy as np
```

This is how I will use it throughout the course. Anytime you see the prefix `np`, it is referring to the top-level namespace of the NumPy package.

#### **Creating Arrays**

There are several NumPy methods/functions to create NumPy arrays rather than using the Python built-in containers. Some of the more common ways include:

- `np.array()`
- `np.zeros()`
- `np.ones()`
- `np.arange()`
- `np.linspace()`

Examples:

In [None]:
import numpy as np

# From a Python list
my_list = [1, 2, 3, 4, 5]
arr_from_list = np.array(my_list)
print("Array from list:", arr_from_list)

# Using built-in functions
zeros_arr = np.zeros((2, 3)) # 2x3 array of zeros
print("\nArray of zeros:\n", zeros_arr)

ones_arr = np.ones(4) # 1D array of ones
print("\nArray of ones:", ones_arr)

range_arr = np.arange(0, 10, 2) # Start, stop, step
print("\nArray with arange:", range_arr)

lin_arr = np.linspace(0, 1, 5) # 5 evenly spaced numbers between 0 and 1
print("\nArray with linspace:", lin_arr)

#### **Converting Datatypes**

Every element of a NumPy array shares a single data type (its `dtype`), which is what lets NumPy store the data compactly and process it with compiled C code under the hood. If you need to change the datatype, you can use the `.astype()` method.

Example:

Let's say you have an array of integers and you want to convert it to floating-point numbers.

In [None]:
# Create an array of integers
int_array = np.array([1, 2, 3, 4, 5])
print("Original array (integer type):", int_array, int_array.dtype)

# Convert to float
float_array = int_array.astype(np.float64) # You can also use 'float' or 'f8'
print("Converted array (float type):", float_array, float_array.dtype)

# Convert to complex numbers
complex_array = int_array.astype(np.complex128) # Or 'complex' or 'c16'
print("Converted array (complex type):", complex_array, complex_array.dtype)

Common Data Types in NumPy:

- Signed Integers: `np.int8` | `np.int16` | `np.int32` | `np.int64`
- Unsigned Integers: `np.uint8` | `np.uint16` | `np.uint32` | `np.uint64`
- Floating-Point Numbers: `np.float16` | `np.float32` | `np.float64`
- Complex Numbers: `np.complex64` | `np.complex128`
- Boolean Values (True/False): `np.bool_`
- Python Objects (Won't Be Covered): `np.object_`

##### **Type Promotion**

NumPy also handles type promotion automatically when operations involve different data types. If you add an integer array to a float array, the result will be a float array.

Example:

In [None]:
arr1 = np.array([1, 2, 3]) # Integer array
arr2 = np.array([1.5, 2.5, 3.5]) # Float array

result = arr1 + arr2
print("\nResult of adding integer and float arrays:", result, result.dtype)

When converting, be mindful of potential data loss or changes in precision. For instance, converting a float to an integer will truncate the decimal part.
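
A short sketch of that truncation, using made-up values. Note that `.astype()` drops the fractional part (it truncates toward zero) rather than rounding, so rounding has to be requested explicitly if that is what you want.

```python
import numpy as np

# Hypothetical values chosen to show the difference clearly.
floats = np.array([1.9, -1.9, 2.7, 0.99])

# astype() simply discards everything after the decimal point.
truncated = floats.astype(np.int64)

# Round to the nearest whole number first, then convert.
rounded = np.round(floats).astype(np.int64)

print("Truncated:", truncated)
print("Rounded:  ", rounded)
```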

#### **Array Attributes**

Array attributes can be thought of as an array's metadata: they describe the array itself rather than the values it holds.

- `array.shape`
- `array.ndim`
- `array.size`
- `array.itemsize`
- `array.dtype`

Examples:

In [None]:
my_array = np.array([[1, 2, 3], [4, 5, 6]])

print("Array shape:", my_array.shape)
print("Number of dimensions:", my_array.ndim)
print("Array size (total number of elements):", my_array.size)
print("Size of each element in bytes:", my_array.itemsize)
print("Data type of elements:", my_array.dtype)

#### **Indexing and Slicing**

Indexing refers to accessing a single element within a sequence. You use square brackets `[]` with an integer inside to specify the position of the element you want.

- **Zero-based indexing:** Python sequences are zero-indexed, meaning the first element is at index 0, the second at index 1, and so on.
- **Negative indexing:** You can also use negative indices. Index -1 refers to the last element, -2 to the second-to-last, and so forth.

Examples:

In [None]:
my_list = [10, 20, 30, 40, 50]
my_string = "Python"

# Accessing elements using positive indices
print(f"First element of list: {my_list[0]}")
print(f"Third element of list: {my_list[2]}")
print(f"First character of string: {my_string[0]}")

# Accessing elements using negative indices
print(f"Last element of list: {my_list[-1]}")
print(f"Second-to-last element of list: {my_list[-2]}")
print(f"Last character of string: {my_string[-1]}")

Slicing allows you to extract a range of elements from a sequence, creating a new subsequence. The syntax for slicing is `[start:stop:step]`.

- **start:** The index where the slice begins (inclusive). If omitted, it defaults to the beginning of the sequence (index 0).
- **stop:** The index where the slice ends (exclusive). The element at the stop index is not included in the slice. If omitted, it defaults to the end of the sequence.
- **step:** The interval between elements to include in the slice. If omitted, it defaults to 1 (i.e., consecutive elements).

Examples:

In [None]:
my_list = [10, 20, 30, 40, 50, 60, 70]
my_string = "Programming"

# Basic slicing
print(f"Elements from index 2 up to (but not including) index 5: {my_list[2:5]}")
print(f"Elements from the beginning up to index 3: {my_list[:3]}")
print(f"Elements from index 4 to the end: {my_list[4:]}")
print(f"All elements: {my_list[:]}")

# Slicing with a step
print(f"Every second element: {my_list[::2]}")
print(f"Elements from index 1 to 6, taking every third: {my_list[1:7:3]}")

# Reversing a sequence using slicing
print(f"Reversed list: {my_list[::-1]}")
print(f"Reversed string: {my_string[::-1]}")
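
The list and string examples above carry over to NumPy arrays, with one slice per axis separated by commas. The sketch below uses a made-up array named `grid`. One difference worth knowing: a slice of a list is a new list, whereas a basic slice of a NumPy array is a view onto the same data.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

first_two_rows = grid[:2]       # rows 0 and 1, all columns
last_column = grid[:, -1]       # every row, last column only
inner_block = grid[1:, 1:3]     # rows 1-2, columns 1-2

print("First two rows:\n", first_two_rows)
print("Last column:", last_column)

# Writing through the slice changes the original array,
# because inner_block is a view and not a copy.
inner_block[0, 0] = 99
print("grid[1, 1] after editing the slice:", grid[1, 1])
```

If you need an independent copy of a slice, call `.copy()` on it.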

#### **Fancy Indexing**

Fancy indexing is a powerful and flexible way to select and reshape NumPy arrays using arrays of integers or booleans. It allows you to access and manipulate non-contiguous or specific elements, which is a key feature that distinguishes NumPy arrays from standard Python lists. There are two main types of fancy indexing: integer array indexing and boolean array indexing.

##### **Integer Array Indexing**
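This method uses a separate NumPy array or a Python list of integers to specify the indices you want to access. When you index a 1-D array this way, the result has the same shape as the index array; when you index the rows of a multidimensional array, each selected index returns a whole row, so the remaining dimensions are kept. This is especially useful for selecting specific rows, columns, or elements in a multidimensional array.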

Example:

In [None]:
# Create a sample 2D array
arr = np.array([[10, 20, 30, 40],
                [50, 60, 70, 80],
                [90, 100, 110, 120]])

# Use an integer array to select specific rows
# We want to get the rows at index 0 and 2
# The result will be a new 2x4 array
selected_rows = arr[[0, 2]]
print("Original array:\n", arr)
print("\nSelected rows (0 and 2):\n", selected_rows)

# You can also select specific elements by providing a list of row and column indices.
# The pairs (0, 1), (1, 2), and (2, 0) will be selected.
selected_elements = arr[[0, 1, 2], [1, 2, 0]]
print("\nSelected elements based on paired indices:", selected_elements)
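
How the index array affects the shape of the result can be seen in a small sketch. The arrays `values`, `idx_2d`, and `table` are hypothetical and exist only for this illustration.

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# A 2-D index array applied to a 1-D array gives a 2-D result
# with the same shape as the index array.
idx_2d = np.array([[0, 4],
                   [2, 2]])
picked = values[idx_2d]
print("picked:\n", picked, "\nshape:", picked.shape)

# Indexing the rows of a 2-D array keeps the column dimension:
# 2 selected rows x 4 columns.
table = np.arange(12).reshape(3, 4)
rows = table[[0, 2]]
print("rows shape:", rows.shape)
```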

##### **Boolean Array Indexing**
This method uses a boolean array of the same shape as the original array to select elements. The boolean array acts as a mask: it selects only the elements where the corresponding value in the mask is True. This is the most common and intuitive way to filter data based on conditions.

Example:

In [None]:
# Use the same array as before
arr = np.array([[10, 20, 30, 40],
                [50, 60, 70, 80],
                [90, 100, 110, 120]])

# Create a boolean mask to find all elements greater than 50
mask = arr > 50
print("Boolean mask:\n", mask)

# Use the mask to get a new 1D array of the filtered elements
filtered_elements = arr[mask]
print("\nElements greater than 50:", filtered_elements)

Boolean indexing is extremely useful for data cleaning and analysis, as it allows you to easily isolate data that meets specific criteria without writing complex loops. For example, you can filter for all values in a dataset that are positive or within a certain range.
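
Here is a minimal sketch of those two filters, using a made-up array called `readings`. Conditions are combined with `&` (and) and `|` (or), and each condition needs its own parentheses because of operator precedence.

```python
import numpy as np

# Hypothetical sensor readings, including some invalid negatives.
readings = np.array([-5, 12, 0, 47, 103, -1, 60])

positive = readings[readings > 0]
in_range = readings[(readings >= 10) & (readings <= 60)]
outliers = readings[(readings < 0) | (readings > 100)]

print("Positive values:", positive)
print("Between 10 and 60:", in_range)
print("Negative or above 100:", outliers)

# A mask can also be used on the left-hand side of an assignment,
# here replacing negative readings with 0 in a copy.
cleaned = readings.copy()
cleaned[cleaned < 0] = 0
print("Cleaned:", cleaned)
```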

#### **Array Operations**

Element-wise operations in NumPy mean that an operation is performed on each individual element of an array, independently of the other elements. This is in contrast to operations that might treat an entire array as a single entity.

When you perform an arithmetic operation (like addition, subtraction, multiplication, or division) between two NumPy arrays of the **same shape**, or between a NumPy array and a **scalar** (a single number), NumPy applies the operation to corresponding elements.

##### How it Works

- **Array vs. Array:** If you have two arrays, `A` and `B`, both of shape `(m, n)`, then `A + B` will result in a new array `C` where `C[i, j] = A[i, j] + B[i, j]` for all `i` from `0 to m-1` and `j` from `0 to n-1`.
- **Array vs. Scalar (Broadcasting):** If you have an array `A` and a scalar `s`, then `A + s` results in a new array where `s` is added to every element of `A`. NumPy achieves this by "broadcasting" the scalar to match the shape of the array.

Examples:

In [None]:
# --- Array vs. Array Operations ---

# Two arrays of the same shape
array1 = np.array([1, 2, 3, 4])
array2 = np.array([5, 6, 7, 8])

# Element-wise addition
sum_array = array1 + array2
print("Element-wise addition:", sum_array) # Output: [ 6  8 10 12]

# Element-wise multiplication
product_array = array1 * array2
print("Element-wise multiplication:", product_array) # Output: [ 5 12 21 32]

# --- Array vs. Scalar Operation (Broadcasting) ---

# Adding a scalar to an array
scalar_addition = array1 + 10
print("Scalar addition:", scalar_addition) # Output: [11 12 13 14]

# Multiplying an array by a scalar
scalar_multiplication = array1 * 2
print("Scalar multiplication:", scalar_multiplication) # Output: [2 4 6 8]

##### **Why is this Important?**

Element-wise operations are fundamental to numerical computing because they allow for efficient and concise manipulation of data. Instead of writing explicit loops (which are much slower in Python), NumPy's underlying C implementation handles these operations at a much higher speed. This is a core reason why NumPy is so powerful for tasks involving large datasets and complex mathematical computations.

#### **Array Aggregators**
NumPy's aggregation functions and methods allow you to perform calculations across an array to summarize its data into a single value. These are incredibly useful for understanding the overall characteristics of your numerical data.

##### Key Aggregation Functions | Methods

- `np.sum(array)` | `array.sum()` &ndash; Calculates the sum of all elements in an array.
- `np.mean(array)` | `array.mean()` &ndash; Computes the average (arithmetic mean) of all elements.
- `np.std(array)` | `array.std()` &ndash; Calculates the standard deviation, which measures the amount of variation or dispersion of a set of values.
- `np.var(array)` | `array.var()` &ndash; Calculates the variance, which is the average of the squared differences from the mean.
- `np.min(array)` | `array.min()` &ndash; Finds the smallest element in the array.
- `np.max(array)` | `array.max()` &ndash; Finds the largest element in the array.
- `np.argmin(array)` | `array.argmin()` &ndash; Returns the index of the minimum value.
- `np.argmax(array)` | `array.argmax()` &ndash; Returns the index of the maximum value.

##### The `axis` Parameter

A crucial concept when working with multidimensional arrays (like matrices) is the axis parameter. This parameter specifies along which dimension the aggregation should be performed.

- `axis=0`: Aggregates down the rows. Applied to a 2D array, it collapses the rows and gives one result per column.
- `axis=1`: Aggregates across the columns. Applied to a 2D array, it collapses the columns and gives one result per row.

If `axis` is not specified, the aggregation is performed over the entire array, collapsing all dimensions into a single value.

Examples:

In [None]:
# --- 1D Array Example ---
arr_1d = np.array([1, 2, 3, 4, 5, 6])

print("1D Array:", arr_1d)
print("Sum:", arr_1d.sum())
print("Mean:", arr_1d.mean())
print("Min:", arr_1d.min())
print("Max:", arr_1d.max())
print("Std Dev:", arr_1d.std())
print("Argmin (index of min):", arr_1d.argmin())
print("Argmax (index of max):", arr_1d.argmax())

# --- 2D Array Example ---
arr_2d = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [7, 8, 9]])

print("\n2D Array:\n", arr_2d)

# Aggregations over the entire array (default)
print("Sum of all elements:", arr_2d.sum())

# Aggregations along columns (axis=0)
print("\nSum along columns (axis=0):", arr_2d.sum(axis=0))
print("Mean along columns (axis=0):", arr_2d.mean(axis=0))

# Aggregations along rows (axis=1)
print("\nSum along rows (axis=1):", arr_2d.sum(axis=1))
print("Max along rows (axis=1):", arr_2d.max(axis=1))

These functions are fundamental for data analysis, enabling you to quickly get a sense of the central tendency, spread, and range of your data.
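
One detail that is easy to miss: on a multidimensional array, `argmin()` and `argmax()` without an `axis` return a position in the flattened array. The sketch below, with a made-up `scores` array, converts that position back to a row and column with `np.unravel_index`, and also shows `keepdims=True`, which keeps the collapsed axis as length 1.

```python
import numpy as np

scores = np.array([[3, 9, 1],
                   [7, 2, 8]])

# Position of the maximum in the flattened array [3, 9, 1, 7, 2, 8]
flat_index = scores.argmax()

# Convert the flat position back into (row, column) coordinates
row, col = np.unravel_index(flat_index, scores.shape)
print("Max is at row", row, "column", col)

# With an axis, argmax returns one index per column (axis=0)
per_column = scores.argmax(axis=0)
print("Row index of the max in each column:", per_column)

# keepdims=True keeps the result 2-D, which helps with broadcasting
row_totals = scores.sum(axis=1, keepdims=True)
print("Row totals with shape", row_totals.shape, ":\n", row_totals)
```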

#### **Reshaping and Manipulating Arrays**

Reshaping is changing an array's dimensions (the shape) without changing its data. It's a fundamental operation for preparing data for different algorithms, especially in machine learning. The most common way to do this is with the `.reshape()` method.

The key rule is that the new shape must have the same number of elements as the original array. For example, you can reshape a 1-D array with 12 elements into a 2x6, 3x4, or 4x3 2-D array, but you can't reshape it into a 3x3 array because that only has 9 elements.

A useful trick with `.reshape()` is using `-1` as one of the dimension sizes. NumPy will automatically calculate the correct size for that dimension based on the total number of elements.

Example:

In [None]:
import numpy as np

# A 1-D array with 12 elements
arr = np.arange(12)
print("Original 1-D array:", arr)

# Reshape to a 3x4 array
arr_3x4 = arr.reshape(3, 4)
print("\nReshaped to 3x4:\n", arr_3x4)

# Reshape to a 2x6 array
arr_2x6 = arr.reshape(2, 6)
print("\nReshaped to 2x6:\n", arr_2x6)

# Use -1 to let NumPy calculate the dimension
arr_auto = arr.reshape(3, -1) # Will become 3x4
print("\nReshaped using -1:\n", arr_auto)
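
To see the element-count rule enforced, the sketch below attempts an incompatible reshape and catches the resulting `ValueError`. The variable names are made up for illustration. It also shows two common uses of `-1`: turning a 1-D array into a single column and back again.

```python
import numpy as np

arr = np.arange(12)

# 3x3 has only 9 slots for 12 elements, so NumPy raises ValueError.
try:
    arr.reshape(3, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Reshape failed:", err)

# -1 lets NumPy infer the missing dimension.
column = arr.reshape(-1, 1)       # 12 rows, 1 column
flat_again = column.reshape(-1)   # back to 1-D with 12 elements
print("Column shape:", column.shape)
print("Flat shape:", flat_again.shape)
```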

Manipulating arrays involves a variety of operations to combine, split, or modify them. These are crucial for structuring your data for analysis and modeling.

- **Stacking:** This combines arrays along a new or existing axis.
    - `np.vstack()` &ndash; Stacks arrays vertically (as rows).
    - `np.hstack()` &ndash; Stacks arrays horizontally (as columns).
    - `np.concatenate()` &ndash; The most general stacking function. You can specify the axis along which to join the arrays.
- **Splitting:** This divides a single array into multiple smaller arrays.
    - `np.vsplit()` &ndash; Splits an array vertically.
    - `np.hsplit()` &ndash; Splits an array horizontally.
    - `np.split()` &ndash; The general function for splitting along a specified axis.
- **Transposing:** This rearranges the data by swapping its axes, like flipping a 2-D array over its diagonal. It's done using the `.T` attribute. A 2x3 array becomes a 3x2 array after transposing.

Example:

In [None]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
matrix1 = np.array([[10, 20], [30, 40]])
matrix2 = np.array([[50, 60], [70, 80]])

# Stacking arrays
v_stack = np.vstack([arr1, arr2])
print("Vertical Stack:\n", v_stack)

h_stack = np.hstack([arr1, arr2])
print("\nHorizontal Stack:", h_stack)

# Splitting arrays
split_arr = np.arange(12).reshape(3, 4)
h_split = np.hsplit(split_arr, 2) # Split into 2 arrays horizontally
print("\nSplit horizontally into 2:\n", h_split[0], "\n", h_split[1])

# Transposing a matrix
transposed_matrix = matrix1.T
print("\nOriginal matrix:\n", matrix1)
print("\nTransposed matrix:\n", transposed_matrix)
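
The general-purpose `np.concatenate()` and `np.split()` take an `axis` argument. Here is a brief sketch with two made-up 2x2 arrays, `a` and `b`; passing an integer to `np.split()` asks for that many equal pieces, while passing a list gives the positions at which to cut.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

# axis=0 joins along rows (like vstack); axis=1 joins along columns (like hstack)
rows_joined = np.concatenate([a, b], axis=0)   # shape (4, 2)
cols_joined = np.concatenate([a, b], axis=1)   # shape (2, 4)
print("Joined along axis 0:\n", rows_joined)
print("Joined along axis 1:\n", cols_joined)

# Split into 2 equal pieces along axis 0, which recovers a and b
top, bottom = np.split(rows_joined, 2, axis=0)

# Split at row index 1: the first row, then everything else
first, rest = np.split(rows_joined, [1], axis=0)
print("first shape:", first.shape, "rest shape:", rest.shape)
```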

Two additional array manipulation functions/methods that are important in data science are `np.swapaxes()` and `.ravel()` (or `.flatten()`).

- `np.swapaxes(array, axis1, axis2)` returns a view of `array` with the two specified axes swapped. This is particularly useful for reorienting multi-dimensional data without copying the data itself, which is a key performance benefit.
- `.ravel()` flattens a multi-dimensional array into a one-dimensional array. It also returns a view of the original array whenever possible. This means that if you modify the new 1-D array, the original array might also change. If a view is not possible, a copy is returned. ***The primary use of `.ravel()` is to transform data from a complex, high-dimensional structure into a simple, linear sequence. This is a common requirement for input data in many machine learning models.*** 


The difference between `.ravel()` and `.flatten()` is that `.flatten()` always returns a copy of the data, while `.ravel()` tries to return a view. For most use cases, this distinction isn't critical, but for performance-sensitive applications, using `.ravel()` is generally preferred.

Examples:

In [None]:
# Create a 3-D array with shape (1, 2, 3)
arr = np.arange(6).reshape(1, 2, 3)
print("Original array with shape", arr.shape, ":\n", arr)

# Swap the first (axis 0) and third (axis 2) axes
swapped_arr = np.swapaxes(arr, 0, 2)
print("\nSwapped array with shape", swapped_arr.shape, ":\n", swapped_arr)

# Create a 2-D array
arr = np.array([[1, 2, 3],
                [4, 5, 6]])
print("Original 2-D array:\n", arr)

# Flatten the array
flattened_arr = arr.ravel()
print("\nFlattened 1-D array:\n", flattened_arr)
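
The view-versus-copy behavior of `.ravel()` and `.flatten()` can be checked directly. This is a small sketch with a made-up array named `original`; `np.shares_memory()` reports whether two arrays use the same underlying data.

```python
import numpy as np

original = np.array([[1, 2, 3],
                     [4, 5, 6]])

raveled = original.ravel()      # a view here, since the data is contiguous
flattened = original.flatten()  # always an independent copy

raveled[0] = 100     # writes through to `original`
flattened[1] = -1    # changes only the copy

print("original after edits:\n", original)
print("ravel shares memory:  ", np.shares_memory(original, raveled))
print("flatten shares memory:", np.shares_memory(original, flattened))

# When a view is not possible (for example, after a transpose),
# ravel() falls back to returning a copy.
transposed_flat = original.T.ravel()
print("transposed ravel shares memory:",
      np.shares_memory(original, transposed_flat))
```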

#### **The NumPy Random Module**

The NumPy random module can be used to generate random numbers as well as randomly sample data from an array.

- `np.random.rand(d0, d1, ...)` &ndash; Returns an array of random floats in [0.0,1.0). The arguments are the dimensions of the output array.
- `np.random.randint(low, high=None, size=None)` &ndash; Returns random integers from low (inclusive) to high (exclusive).
- `np.random.randn(d0, d1, ...)` &ndash; Returns random numbers from a ***standard normal distribution*** (mean = 0, std = 1).
- `np.random.normal(loc=0.0, scale=1.0, size=None)` &ndash; Returns random numbers from a normal distribution with a specified mean (loc) and standard deviation (scale).

Example:

In [None]:
# Generate a 3x3 array of random floats
float_array = np.random.rand(3, 3)
print("3x3 array of random floats:\n", float_array)

# Generate a 2x4 array of random integers from 0 to 10 (high=11 is exclusive)
int_array = np.random.randint(low=0, high=11, size=(2, 4))
print("\n2x4 array of random integers:\n", int_array)

# Generate 5 random numbers from a standard normal distribution
normal_array = np.random.randn(5)
print("\n5 random numbers from a standard normal distribution:\n", normal_array)

NumPy's random module also includes powerful functions for shuffling and sampling.

- `np.random.shuffle(x)` &ndash; **Shuffles** the array `x` **in place**. For a multi-dimensional array, it shuffles along the first axis.
- `np.random.choice(a, size=None, replace=True)` &ndash; Randomly selects elements from the array `a`. You can specify the size of the output and whether to **sample with or without replacement**.
- `np.random.permutation(x)` &ndash; Randomly permutes a sequence or returns a permuted range. This returns a new array and does not modify the original.

Examples:

In [None]:
# Create a 1-D array
arr = np.array([1, 2, 3, 4, 5])
print("Original array:", arr)

# Shuffle the array in place
np.random.shuffle(arr)
print("Shuffled array:", arr)

# Sample 3 elements from the array without replacement
sample = np.random.choice(arr, size=3, replace=False)
print("\nSample without replacement:", sample)
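
For completeness, here is a sketch of `np.random.permutation()` and of sampling with replacement. The array `deck` is made up for illustration. Because the output is random, the printed values will differ from run to run.

```python
import numpy as np

deck = np.array([10, 20, 30, 40, 50])

# permutation() returns a shuffled copy and leaves `deck` untouched
shuffled_copy = np.random.permutation(deck)
print("Original:", deck)
print("Shuffled copy:", shuffled_copy)

# Given an integer n, permutation() shuffles np.arange(n)
order = np.random.permutation(5)
print("Random ordering of 0-4:", order)

# With replacement, the sample can be larger than the source
# and the same element can appear more than once.
with_replacement = np.random.choice(deck, size=8, replace=True)
print("Sample with replacement:", with_replacement)
```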

To get reproducible results with the legacy module, you seed the global state using `np.random.seed()`. All subsequent calls to `np.random` functions will produce the same sequence of numbers as long as the seed is the same.

Because the legacy random state (a single, global `RandomState` object) is shared, `np.random.seed()` affects every subsequent random operation, which can sometimes lead to unexpected behavior if multiple parts of your code rely on different streams of random numbers. NumPy now recommends a newer interface based on `Generator` objects, created with `np.random.default_rng()`. However, the legacy approach should work for our purposes.

In [None]:
# Use the same seed for both runs
np.random.seed(42)
print(f"First run with seed 42: {np.random.randint(1, 100)}")

np.random.seed(42)
print(f"Second run with seed 42: {np.random.randint(1, 100)}")

# Without re-seeding, the generator continues its sequence, so this draw will generally differ
print(f"Result without seeding: {np.random.randint(1, 100)}")
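
If you are curious about the Generator-based interface, a minimal sketch follows. We will not rely on it in this course, and the variable names are made up. Each generator created by `np.random.default_rng()` carries its own state, so seeding one does not affect any other.

```python
import numpy as np

# Two independent generators created with the same seed
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

# Same seed, same sequence
ints_a = rng_a.integers(low=1, high=100, size=5)
ints_b = rng_b.integers(low=1, high=100, size=5)
print("Generator A:", ints_a)
print("Generator B:", ints_b)

# Generator methods that mirror the legacy functions
uniform = rng_a.random((2, 3))                        # floats in [0.0, 1.0)
normal = rng_a.normal(loc=0.0, scale=1.0, size=4)     # normal distribution
sample = rng_a.choice(np.arange(10), size=3, replace=False)
print("Uniform floats:\n", uniform)
print("Normal draws:", normal)
print("Sample without replacement:", sample)
```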

#### **NumPy Performance**

In [None]:
import time
import numpy as np

# Python list (loop)
start = time.time()
list_of_vals = []
for i in range(1000000):
    list_of_vals.append(i**2)
print(sum(list_of_vals))
end = time.time()
print("Python list time:", end - start)

# Python list (list comprehension)
start = time.time()
print(sum([i**2 for i in range(1000000)]))
end = time.time()
print("Python list comprehension time:", end - start)

# NumPy array
start = time.time()
print(np.sum(np.arange(1000000)**2))
end = time.time()
print("NumPy array time:", end - start)


In [None]:
import timeit
import numpy as np

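# Python list with loop (uses the same pre-built input as the comprehension
# below and skips the final sum, so all three statements time only the squaring)
setup_list_loop = "numbers = list(range(1000000))"
stmt_list_loop = """
list_of_vals = []
for x in numbers:
    list_of_vals.append(x**2)
"""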

# Python list with list comprehension
setup_list_comprehension = "numbers = list(range(1000000))"
stmt_list_comprehension = "[x**2 for x in numbers]"

# NumPy array
setup_numpy = "import numpy as np; numbers = np.arange(1000000)"
stmt_numpy = "numbers**2"

# Number of executions for timing
number_of_executions = 100

# Measure execution time
time_list_loop = timeit.timeit(stmt=stmt_list_loop, setup=setup_list_loop,
                               number=number_of_executions)
time_list_comprehension = timeit.timeit(stmt=stmt_list_comprehension,
                                        setup=setup_list_comprehension,
                                        number=number_of_executions)
time_numpy = timeit.timeit(stmt=stmt_numpy, setup=setup_numpy,
                           number=number_of_executions)

# Display results
print(f"Python list (loop) squaring time ({number_of_executions} runs): "
      f"{time_list_loop:.4f} seconds")
print(f"Python list (comprehension) squaring time ({number_of_executions} runs): "
      f"{time_list_comprehension:.4f} seconds")
print(f"NumPy array squaring time ({number_of_executions} runs): "
      f"{time_numpy:.4f} seconds")

# Calculate and display speedups
speedup_comprehension_over_loop = time_list_loop / time_list_comprehension
speedup_loop = time_list_loop / time_numpy
speedup_comprehension = time_list_comprehension / time_numpy

print(f"\nSpeedup of List Comprehension over loop-based list: "
      f"{speedup_comprehension_over_loop:.2f}x")
print(f"\nSpeedup of NumPy over loop-based list: {speedup_loop:.2f}x")
print(f"Speedup of NumPy over list comprehension: {speedup_comprehension:.2f}x")
