# Introduction to NumPy
### BUSI 520: Python for Business Research
### Kerry Back, JGSB, Rice University

## What is NumPy?

NumPy, short for Numerical Python, is a foundational package for numerical computations in Python. It provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays.

## Why NumPy?

- **Performance**: NumPy's core operations run in compiled C code, so whole-array computations are typically much faster than equivalent Python loops.
- **Memory Efficiency**: NumPy arrays store elements of a single data type compactly, so they use less memory than Python lists and are faster to read and write.
- **Convenience**: Offers a large collection of mathematical functions and operations out of the box.
- **Interoperability**: Integrates with many other libraries and tools in the Python ecosystem, such as pandas, SciPy, and Matplotlib.

## Installing NumPy

To install NumPy, you can use pip:
```
pip install numpy
```
Once installed, you can import it in your Python script or Jupyter notebook as follows:
```python
import numpy as np
```

In [35]:
import numpy as np

## Basics of NumPy Arrays

At the core of NumPy is the `ndarray` object, which encapsulates n-dimensional arrays of homogeneous data types. Let's explore some basic operations and properties of NumPy arrays.

In [36]:
# Creating a 1D array
arr_1d = np.array([1, 2, 3, 4, 5])
arr_1d

array([1, 2, 3, 4, 5])

In [37]:
# Creating a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
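
Because an `ndarray` holds a single data type, NumPy converts mixed inputs to a common type when the array is created. The following minimal sketch (the variable names are illustrative) shows this conversion, along with the optional `dtype` argument of `np.array`:

```python
import numpy as np

# Mixing integers and a float: every element is stored as a float
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# Requesting a specific data type at creation time
as_float = np.array([1, 2, 3], dtype=np.float64)
print(as_float)      # [1. 2. 3.]
```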

### Array Attributes

NumPy arrays have several attributes that give information about the array's size, shape, and data type:

In [38]:
# Attributes of the 1D array
print('Shape:', arr_1d.shape)
print('Size:', arr_1d.size)
print('Number of dimensions:', arr_1d.ndim)
print('Data type:', arr_1d.dtype)

Shape: (5,)
Size: 5
Number of dimensions: 1
Data type: int32


In [39]:
# Attributes of the 2D array
print('Shape:', arr_2d.shape)
print('Size:', arr_2d.size)
print('Number of dimensions:', arr_2d.ndim)
print('Data type:', arr_2d.dtype)

Shape: (3, 3)
Size: 9
Number of dimensions: 2
Data type: int32


### Array Indexing and Slicing

Just like Python lists, NumPy arrays can be indexed and sliced. This allows for efficient access to and modification of the array's contents.

In [40]:
# Indexing a 1D array
print('First element:', arr_1d[0])
print('Second element:', arr_1d[1])
print('Last element:', arr_1d[-1])

First element: 1
Second element: 2
Last element: 5


In [41]:
# Slicing a 1D array
print('First three elements:', arr_1d[:3])
print('Elements from index 2 to 4:', arr_1d[2:5])
print('Every second element:', arr_1d[::2])

First three elements: [1 2 3]
Elements from index 2 to 4: [3 4 5]
Every second element: [1 3 5]


In [42]:
# Indexing a 2D array
print('Element at (0,0):', arr_2d[0, 0])
print('Element at (1,2):', arr_2d[1, 2])
print('Second row:', arr_2d[1])

Element at (0,0): 1
Element at (1,2): 6
Second row: [4 5 6]


In [43]:
# Slicing a 2D array
print('First two rows and first two columns:\n', arr_2d[:2, :2])
print('All rows, every other column:\n', arr_2d[:, ::2])

First two rows and first two columns:
 [[1 2]
 [4 5]]
All rows, every other column:
 [[1 3]
 [4 6]
 [7 9]]
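
The examples above only read from the arrays. As a small sketch of modification (the names `data`, `window`, and `independent` are illustrative), note that a basic slice of a NumPy array is a view onto the same data, whereas a slice of a Python list is a copy:

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Assigning through an index or a slice changes the array in place
data[0] = 10
data[1:3] = 0
print(data)            # [10  0  0  4  5]

# A basic slice is a view: writing to it also changes `data`
window = data[3:]
window[0] = 99
print(data)            # [10  0  0 99  5]

# Use .copy() when an independent array is needed
independent = data[3:].copy()
independent[0] = -1
print(data)            # [10  0  0 99  5]
```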


### Array Operations

NumPy arrays support arithmetic with a scalar (such as `arr + 5`) and binary operations between two arrays of the same shape (such as `arr1 + arr2`). These operations are performed element-wise, which means they are applied to each element of the array separately.

In [44]:
# Operations between an array and a scalar
print('Original array:\n', arr_1d)
print('Array + 5:\n', arr_1d + 5)
print('Array squared:\n', arr_1d**2)

Original array:
 [1 2 3 4 5]
Array + 5:
 [ 6  7  8  9 10]
Array squared:
 [ 1  4  9 16 25]


In [45]:
# Binary operations
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
print('arr1:', arr1)
print('arr2:', arr2)
print('arr1 + arr2:', arr1 + arr2)
print('arr1 * arr2:', arr1 * arr2)

arr1: [1 2 3]
arr2: [4 5 6]
arr1 + arr2: [5 7 9]
arr1 * arr2: [ 4 10 18]


### Mathematical Functions

NumPy provides a comprehensive set of mathematical functions that can be applied element-wise to arrays. These include trigonometric, logarithmic, exponential, and statistical functions, among others.

In [46]:
# Some mathematical functions
print('Sin values:', np.sin(arr_1d))
print('Natural logarithm:', np.log(arr_1d))
print('Exponential:', np.exp(arr_1d))

Sin values: [ 0.84147098  0.90929743  0.14112001 -0.7568025  -0.95892427]
Natural logarithm: [0.         0.69314718 1.09861229 1.38629436 1.60943791]
Exponential: [  2.71828183   7.3890561   20.08553692  54.59815003 148.4131591 ]


### Aggregation Functions

NumPy provides functions to compute aggregated values such as the sum, mean, and median. These can be applied to the entire array or, for multi-dimensional arrays, along a specified axis.

In [47]:
# Aggregation functions on 1D array
print('Sum:', np.sum(arr_1d))
print('Mean:', np.mean(arr_1d))
print('Standard Deviation:', np.std(arr_1d))

Sum: 15
Mean: 3.0
Standard Deviation: 1.4142135623730951


In [48]:
# Aggregation functions on 2D array
print('Total Sum:', np.sum(arr_2d))
print('Sum along columns:', np.sum(arr_2d, axis=0))
print('Sum along rows:', np.sum(arr_2d, axis=1))

Total Sum: 45
Sum along columns: [12 15 18]
Sum along rows: [ 6 15 24]
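
The `axis` argument works the same way for other aggregations. The sketch below (with an illustrative array `m`) applies `np.mean` and `np.median`: `axis=0` collapses the rows and returns one value per column, while `axis=1` collapses the columns and returns one value per row.

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# axis=0 collapses the rows, giving one value per column
col_means = np.mean(m, axis=0)
print(col_means)       # [4. 5. 6.]

# axis=1 collapses the columns, giving one value per row
row_medians = np.median(m, axis=1)
print(row_medians)     # [2. 5. 8.]

# Many aggregations are also available as array methods
col_max = m.max(axis=0)
print(col_max)         # [7 8 9]
```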


### Broadcasting

Broadcasting is a powerful feature in NumPy that allows for operations between arrays of different shapes. It does this by 'stretching' the smaller array to match the shape of the larger array, without actually copying any data.

In [49]:
# Broadcasting in action
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('Original array:\n', arr)

# Adding a scalar to a 2D array
print('\nArray after adding 5:\n', arr + 5)

# Adding a 1D array to a 2D array
vec = np.array([1, 0, -1])
print('\nArray after adding [1, 0, -1]:\n', arr + vec)

Original array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Array after adding 5:
 [[ 6  7  8]
 [ 9 10 11]
 [12 13 14]]

Array after adding [1, 0, -1]:
 [[2 2 2]
 [5 5 5]
 [8 8 8]]
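
NumPy decides whether two shapes are compatible by comparing them from the last dimension backwards: two dimensions match when they are equal or when one of them is 1. A brief sketch of this rule (the array names are illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # shape (3, 3)
row = np.array([10, 20, 30])                         # shape (3,)
col = np.array([[100], [200], [300]])                # shape (3, 1)

# (3, 3) + (3,): the row is added to every row of grid
print(grid + row)

# (3, 3) + (3, 1): the column is added to every column of grid
print(grid + col)

# (3,) + (3, 1): both operands are stretched, giving a (3, 3) result
print((row + col).shape)   # (3, 3)

# Incompatible shapes raise a ValueError
try:
    grid + np.array([1, 2])
except ValueError as err:
    print('Cannot broadcast:', err)
```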


### Advanced Indexing

Apart from basic indexing and slicing, NumPy offers more advanced indexing techniques. This includes integer array indexing and boolean indexing.

In [50]:
# Integer array indexing
print('Original array:\n', arr_1d)
indices = np.array([1, 3, 4])
print('Elements at indices 1, 3, and 4:', arr_1d[indices])

Original array:
 [1 2 3 4 5]
Elements at indices 1, 3, and 4: [2 4 5]


In [51]:
# Boolean indexing
print('Original array:\n', arr_1d)
mask = arr_1d > 3
print('Mask of elements greater than 3:', mask)
print('Elements greater than 3:', arr_1d[mask])

Original array:
 [1 2 3 4 5]
Mask of elements greater than 3: [False False False  True  True]
Elements greater than 3: [4 5]
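
Boolean masks can also be combined and used for assignment. The following is a minimal sketch with illustrative names (`values`, `middle`, `capped`, `picked`):

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5])

# Combine conditions with & (and) and | (or); the parentheses are required
middle = values[(values > 1) & (values < 5)]
print(middle)          # [2 3 4]

# A mask can also be used on the left-hand side of an assignment
capped = values.copy()
capped[capped > 3] = 3
print(capped)          # [1 2 3 3 3]

# Integer array indexing may repeat or reorder indices
picked = values[[4, 0, 0]]
print(picked)          # [5 1 1]
```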


### Reshaping and Transposing

NumPy provides ways to change the shape of an array without changing its data. This is particularly useful when you need to prepare data for libraries or operations that expect a particular shape.

In [52]:
# Reshaping an array
print('Original 2D array:\n', arr_2d)
reshaped = arr_2d.reshape(1, 9)
print('\nReshaped to 1x9 array:\n', reshaped)

Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Reshaped to 1x9 array:
 [[1 2 3 4 5 6 7 8 9]]


In [53]:
# Transposing an array
print('Original 2D array:\n', arr_2d)
transposed = arr_2d.T
print('\nTransposed array:\n', transposed)

Original 2D array:
 [[1 2 3]
 [4 5 6]
 [7 8 9]]

Transposed array:
 [[1 4 7]
 [2 5 8]
 [3 6 9]]


The shapes `(n, 1)` and `(n,)` might seem similar, but they represent different structures:

- `(n,)`: Represents a one-dimensional array with `n` elements.
- `(n, 1)`: Represents a two-dimensional array with `n` rows and 1 column.

Let's explore these shapes in more detail and see how to convert between them.

In [54]:
# Creating an array with shape (n,)
one_d_array = np.array([1, 2, 3, 4, 5])
print('1D array:', one_d_array)
print('Shape:', one_d_array.shape)

# Reshaping to (n, 1)
two_d_array = one_d_array.reshape(-1, 1)
print('\n2D array:\n', two_d_array)
print('Shape:', two_d_array.shape)

1D array: [1 2 3 4 5]
Shape: (5,)

2D array:
 [[1]
 [2]
 [3]
 [4]
 [5]]
Shape: (5, 1)


In [55]:
# Converting back to shape (n,)
converted_one_d_array = two_d_array.reshape(-1)
print('Converted 1D array:', converted_one_d_array)
print('Shape:', converted_one_d_array.shape)

Converted 1D array: [1 2 3 4 5]
Shape: (5,)


Another way to convert a multi-dimensional array into a one-dimensional array is by using the `flatten` method. This method returns a copy of the original array, flattened to one dimension.

The `ravel` method is another way to flatten multi-dimensional arrays into one dimension. It functions similarly to the `flatten` method but with a key difference:

- `flatten` always returns a copy of the data.
- `ravel` returns a flattened view of the original array whenever possible. 

Because of this behavior, modifications to the array returned by `ravel` might affect the original array, whereas modifications to the array returned by `flatten` will never affect the original array.

In [56]:
# Using the flatten method
flattened_array = two_d_array.flatten()
print('Flattened array:', flattened_array)
print('Shape:', flattened_array.shape)

# Using the ravel method
raveled_array = two_d_array.ravel()
print('Raveled array:', raveled_array)
print('Shape:', raveled_array.shape)

Flattened array: [1 2 3 4 5]
Shape: (5,)
Raveled array: [1 2 3 4 5]
Shape: (5,)
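
The printed results of `flatten` and `ravel` look identical, so the difference only shows up when the result is modified. A short sketch of the copy versus view behavior described above (the array names are illustrative):

```python
import numpy as np

original = np.array([[1, 2], [3, 4]])

flat_copy = original.flatten()   # always a copy
flat_view = original.ravel()     # a view here, because the data is contiguous

flat_copy[0] = 100
print(original[0, 0])    # 1: the copy is independent

flat_view[0] = 100
print(original[0, 0])    # 100: the view shares memory with original

print(np.shares_memory(original, flat_view))   # True
print(np.shares_memory(original, flat_copy))   # False
```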


### Inner Products and Matrix Multiplication

Matrix operations are fundamental in linear algebra and have extensive applications in data science, especially in areas like machine learning.

#### Inner Product (Dot Product)
The inner product, also known as the dot product, between two vectors is a single number obtained by multiplying corresponding entries and then summing those products. For two vectors `a` and `b`, the dot product is given by:

$$a \cdot b = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

#### Matrix Multiplication
Matrix multiplication, on the other hand, is a way to combine two matrices to produce a new matrix. It's defined such that the number in the i-th row and j-th column of the resulting matrix is the dot product of the i-th row of the first matrix and the j-th column of the second matrix.

In Python, with NumPy, the `@` operator is used as a convenient way to perform matrix multiplication. It's more readable and concise than using the `np.dot()` function.

In [57]:
# Demonstrating the inner product (dot product)
vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

dot_product = vector_a @ vector_b
dot_product

32

In [58]:
# Demonstrating matrix multiplication using @
matrix_A = np.array([[1, 2], [3, 4]])
matrix_B = np.array([[2, 0], [1, 3]])

result_matrix = matrix_A @ matrix_B
result_matrix

array([[ 4,  6],
       [10, 12]])

In [59]:
# Matrix and Vector Products

# Creating two matrices for demonstration
A = np.array([[1, 2], [3, 4]])
B = np.array([[2, 0], [1, 3]])

# Matrix multiplication using matmul()
matrix_product = np.matmul(A, B)
print('Matrix product using matmul():\n', matrix_product)

# Dot product of two vectors
v1 = np.array([1, 2, 3])
v2 = np.array([4, 5, 6])
dot_product = np.dot(v1, v2)
print('\nDot product of v1 and v2:', dot_product)

Matrix product using matmul():
 [[ 4  6]
 [10 12]]

Dot product of v1 and v2: 32
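
To connect the definition of matrix multiplication to the code, the sketch below checks one entry of a product against the dot product of the corresponding row and column. The names `P`, `Q`, and `product` are illustrative. It also contrasts `@` with `*`, which multiplies element-wise.

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[2, 0], [1, 3]])
product = P @ Q

# Entry (i, j) is the dot product of row i of P and column j of Q
i, j = 1, 0
entry = P[i, :] @ Q[:, j]
print(entry, product[i, j])      # 10 10

# @ is matrix multiplication; * is element-wise multiplication
elementwise = P * Q
print(elementwise)
```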


### Utility Functions for Creating Arrays

NumPy provides several utility functions to create specific types of arrays with ease. Four commonly used ones are described below:

1. **`np.ones`**:
   - This function creates an array filled with ones.
   - The desired shape of the array is provided as an argument.
   - Example: `np.ones((2, 3))` creates a 2x3 matrix filled with ones.

2. **`np.zeros`**:
   - This function creates an array filled with zeros.
   - The desired shape of the array is provided as an argument.
   - Example: `np.zeros((3, 3))` creates a 3x3 matrix filled with zeros.



3. **`np.diag`**:
   - This function has two main uses:
     - When provided with a 1-D array, it returns a 2-D array with the entries of the 1-D array as its diagonal and zeros elsewhere.
     - When provided with a 2-D array, it extracts the diagonal elements and returns them as a 1-D array.
   - Example: `np.diag([1, 2, 3])` creates a 3x3 diagonal matrix with the diagonal [1, 2, 3].

4. **`np.identity`**:
   - This function creates a square identity matrix of a specified size.
   - An identity matrix is a square matrix with ones on its main diagonal and zeros elsewhere.
   - Example: `np.identity(3)` creates a 3x3 identity matrix.

Let's demonstrate each of these functions with examples.

In [60]:
# Demonstrating np.ones
ones_array = np.ones((2, 3))
print('Array filled with ones:\n', ones_array)

# Demonstrating np.zeros
zeros_array = np.zeros((3, 3))
print('\nArray filled with zeros:\n', zeros_array)



Array filled with ones:
 [[1. 1. 1.]
 [1. 1. 1.]]

Array filled with zeros:
 [[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [61]:
# Demonstrating np.diag
diagonal_matrix = np.diag([1, 2, 3])
print('\nDiagonal matrix:\n', diagonal_matrix)
extracted_diagonal = np.diag(diagonal_matrix)
print('\nExtracted diagonal from matrix:', extracted_diagonal)

# Demonstrating np.identity
identity_matrix = np.identity(3)
print('\nIdentity matrix:\n', identity_matrix)


Diagonal matrix:
 [[1 0 0]
 [0 2 0]
 [0 0 3]]

Extracted diagonal from matrix: [1 2 3]

Identity matrix:
 [[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


### `np.concatenate`

The `np.concatenate` function is used to join two or more arrays along an existing axis. It's a versatile function that allows for the combination of arrays in various ways.

**Parameters**:
- `a1, a2, ...` : Arrays to be concatenated. They must have the same shape, except in the dimension corresponding to the specified axis.
- `axis` : The axis along which the arrays will be joined. Default is 0.
- `out` : If provided, the destination to place the result. The shape must be correct, matching that of what `concatenate` would have returned if no `out` argument were specified.

**Usage**:
```python
np.concatenate((a1, a2, ...), axis=0, out=None)
```

Let's look at some examples to understand how `np.concatenate` works.

In [62]:
# Creating two 1-D arrays for demonstration
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Concatenating along axis 0 (default)
concatenated_1d = np.concatenate((array1, array2))
print('Concatenated 1-D array:', concatenated_1d)



Concatenated 1-D array: [1 2 3 4 5 6]


In [63]:
# Creating two 2-D arrays for demonstration
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])

# Concatenating along axis 0 (rows)
concatenated_rows = np.concatenate((matrix1, matrix2), axis=0)
print('\nConcatenated along rows:\n', concatenated_rows)

# Concatenating along axis 1 (columns)
concatenated_columns = np.concatenate((matrix1, matrix2), axis=1)
print('\nConcatenated along columns:\n', concatenated_columns)


Concatenated along rows:
 [[1 2]
 [3 4]
 [5 6]
 [7 8]]

Concatenated along columns:
 [[1 2 5 6]
 [3 4 7 8]]
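
The shape requirement can be illustrated with arrays of different sizes. In this minimal sketch (the names `base`, `new_row`, and `new_col` are illustrative), the shapes may differ only along the axis being joined:

```python
import numpy as np

base = np.array([[1, 2], [3, 4]])     # shape (2, 2)
new_row = np.array([[5, 6]])          # shape (1, 2)

# Shapes may differ only along the concatenation axis
stacked = np.concatenate((base, new_row), axis=0)
print(stacked.shape)    # (3, 2)

# Along axis 1 the row counts (2 and 1) differ, so this fails
try:
    np.concatenate((base, new_row), axis=1)
except ValueError as err:
    print('Cannot concatenate:', err)

# A 1-D array must be reshaped to 2-D before joining it to a matrix
new_col = np.array([7, 8]).reshape(-1, 1)   # shape (2, 1)
widened = np.concatenate((base, new_col), axis=1)
print(widened)
```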


### `np.arange` and `np.linspace`

The `np.arange` and `np.linspace` functions create grids of evenly spaced points between start and stop values. For `np.arange`, you specify the spacing between points (the step size), and the stop value itself is excluded. For `np.linspace`, you specify the number of points, and both start and stop are included by default.


In [64]:
# Generating numbers from 0 to 4
sequence1 = np.arange(5)
print('Numbers from 0 to 4:', sequence1)

# Generating numbers from 2 to 8 with a step of 2
sequence2 = np.arange(2, 9, 2)
print('\nNumbers from 2 to 8 with a step of 2:', sequence2)

# Generating numbers from 0 to 1 with a float step
sequence3 = np.arange(0, 1.1, 0.1)
print('\nNumbers from 0 to 1 with step = 0.1:', sequence3)

# Demonstrating np.linspace
sequence4 = np.linspace(0, 1, 11)
print('\nNumbers from 0 to 1 with step = 0.1:', sequence4)


Numbers from 0 to 4: [0 1 2 3 4]

Numbers from 2 to 8 with a step of 2: [2 4 6 8]

Numbers from 0 to 1 with step = 0.1: [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]

Numbers from 0 to 1 with step = 0.1: [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
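
The main practical difference between the two functions is how they treat the stop value. A short sketch, using the `endpoint` and `retstep` keyword arguments of `np.linspace`:

```python
import numpy as np

# arange excludes the stop value
evens = np.arange(0, 10, 2)
print(evens)                      # [0 2 4 6 8]

# linspace includes the stop value by default
inclusive = np.linspace(0, 10, 6)
print(inclusive)                  # [ 0.  2.  4.  6.  8. 10.]

# endpoint=False makes linspace exclude the stop value as well
exclusive = np.linspace(0, 10, 5, endpoint=False)
print(exclusive)                  # [0. 2. 4. 6. 8.]

# retstep=True also returns the spacing between points
points, step = np.linspace(0, 1, 11, retstep=True)
print(len(points), step)
```

With a non-integer step, rounding can make the length of an `np.arange` result hard to predict, so `np.linspace` is usually the safer choice when the number of points matters.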


### The `np.random` Module

The `np.random` module in NumPy provides a suite of functions to generate random numbers for various distributions. It's a crucial tool for simulations, statistical sampling, and many other tasks in data science and scientific computing. Here are some of the important elements of the `np.random` module:

1. **Random Number Generation**:
   - `rand()`: Generates uniformly distributed random numbers in the interval [0, 1) in a given shape.
   - `randn()`: Generates random numbers from a standard normal distribution (mean 0 and variance 1).
   - `randint()`: Generates random integers from `low` (inclusive) to `high` (exclusive).

2. **Random Sampling**:
   - `choice()`: Generates a random sample from a given 1-D array.
   - `shuffle()`: Modifies a sequence in-place by shuffling its contents.
   - `permutation()`: Returns a shuffled version of a sequence or returns a permuted range.



3. **Sampling from Distributions**:
   - `binomial()`: Draws samples from a binomial distribution.
   - `normal()`: Draws samples from a normal (Gaussian) distribution.
   - `poisson()`: Draws samples from a Poisson distribution.
   - ... and many more for other distributions.

4. **Random Seed**:
   - `seed()`: Sets the random seed, which allows for reproducibility of random numbers generated.

Let's explore some of these functions with examples.

In [65]:
# Set a seed so the output is reproducible (optional)
np.random.seed(0)

# Random Number Generation

# Generating random numbers between 0 and 1
random_numbers = np.random.rand(5)
print('Random numbers between 0 and 1:', random_numbers)

# Generating random numbers from a standard normal distribution
normal_numbers = np.random.randn(5)
print('\nRandom numbers from a standard normal distribution:', normal_numbers)

# Generating random integers from 1 to 9 (the upper bound, 10, is excluded)
random_integers = np.random.randint(1, 10, size=5)
print('\nRandom integers between 1 and 10:', random_integers)

Random numbers between 0 and 1: [0.5488135  0.71518937 0.60276338 0.54488318 0.4236548 ]

Random numbers from a standard normal distribution: [-0.84272405  1.96992445  1.26611853 -0.50587654  2.54520078]

Random integers between 1 and 10: [6 9 5 4 1]


In [66]:
# Random Sampling

# Generating a random sample from a given 1-D array
sample_array = np.array([10, 20, 30, 40, 50])
random_choice = np.random.choice(sample_array, size=3)
print('Random sample from given array:', random_choice)

# Shuffling a sequence in-place
sequence_to_shuffle = np.array([1, 2, 3, 4, 5])
np.random.shuffle(sequence_to_shuffle)
print('\nShuffled sequence:', sequence_to_shuffle)

# Getting a permuted range
permuted_range = np.random.permutation(5)
print('\nPermuted range:', permuted_range)

Random sample from given array: [40 10 30]

Shuffled sequence: [5 3 2 1 4]

Permuted range: [0 1 2 4 3]


In [67]:
# Samples from Distributions

# Drawing samples from a binomial distribution
binomial_samples = np.random.binomial(n=10, p=0.5, size=5)
print('Samples from a binomial distribution:', binomial_samples)

# Drawing samples from a normal distribution with mean 0 and standard deviation 1
normal_samples = np.random.normal(loc=0, scale=1, size=5)
print('\nSamples from a normal distribution:', normal_samples)

# Drawing samples from a Poisson distribution with lambda=3
poisson_samples = np.random.poisson(lam=3, size=5)
print('\nSamples from a Poisson distribution:', poisson_samples)

Samples from a binomial distribution: [6 3 7 5 5]

Samples from a normal distribution: [ 1.08081191  0.8644362  -0.74216502  2.26975462 -1.45436567]

Samples from a Poisson distribution: [0 6 1 3 3]
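
To see what the seed does, the sketch below resets it and draws again. The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Reseeding with the same value reproduces the same draws
np.random.seed(42)
first = np.random.rand(3)

np.random.seed(42)
second = np.random.rand(3)

print(np.array_equal(first, second))   # True

# Without reseeding, the next call continues the sequence
third = np.random.rand(3)
print(np.array_equal(first, third))    # False
```

Newer NumPy code often uses `np.random.default_rng(seed)` instead, which returns a generator object with its own state rather than relying on the global seed.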


### The `np.linalg` Module

The `np.linalg` module in NumPy provides a collection of linear algebra functions. Here are some of the most important linear algebra functions, most of which live in `np.linalg`:

1. **Matrix and Vector Products** (these are called from the top-level namespace, for example `np.dot`, rather than through `np.linalg`):
   - `dot()`: Computes the dot product of two arrays.
   - `matmul()`: Performs matrix multiplication.
   - `inner()`: Computes the inner product of two arrays.
   - `outer()`: Computes the outer product of two arrays.

2. **Matrix Eigenvalues**:
   - `eig()`: Computes the eigenvalues and right eigenvectors of a square array.
   - `eigh()`: Computes the eigenvalues and eigenvectors of a Hermitian or symmetric matrix.
   - `eigvals()`: Computes the eigenvalues of a square array.


3. **Norms and Other Numbers**:
   - `norm()`: Computes the norm of a matrix or vector.
   - `det()`: Computes the determinant of an array.
   - `matrix_rank()`: Computes the numerical rank of a matrix.

4. **Solving Equations and Inverting Matrices**:
   - `solve()`: Solves a linear matrix equation.
   - `inv()`: Computes the multiplicative inverse of a matrix.

Let's explore some of these functions with examples.

In [68]:
# Matrix Eigenvalues

# Eigenvalues and eigenvectors of matrix A
eigenvalues, eigenvectors = np.linalg.eig(A)
print('Eigenvalues of matrix A:', eigenvalues)
print('\nEigenvectors of matrix A:\n', eigenvectors)

Eigenvalues of matrix A: [-0.37228132  5.37228132]

Eigenvectors of matrix A:
 [[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]


In [69]:
# Norms and Other Numbers

# Norm of vector v1
vector_norm = np.linalg.norm(v1)
print('Norm of vector v1:', vector_norm)

# Determinant of matrix A
matrix_determinant = np.linalg.det(A)
print('\nDeterminant of matrix A:', matrix_determinant)

# Rank of matrix A
matrix_rank = np.linalg.matrix_rank(A)
print('\nRank of matrix A:', matrix_rank)

Norm of vector v1: 3.7416573867739413

Determinant of matrix A: -2.0000000000000004

Rank of matrix A: 2


In [70]:
# Solving Equations and Inverting Matrices

# Solving a linear matrix equation Ax = b
b = np.array([5, 11])
x = np.linalg.solve(A, b)
print('Solution x for Ax = b:', x)

# Inverse of matrix A
inverse_A = np.linalg.inv(A)
print('\nInverse of matrix A:\n', inverse_A)

Solution x for Ax = b: [1. 2.]

Inverse of matrix A:
 [[-2.   1. ]
 [ 1.5 -0.5]]
