# NumPy Basics
<center><img src="../images/stock/pexels-waffle-truck.jpg"  alt="Person At a Food Truck" width="300"></center>

NumPy, the Numerical Python library, is the foundation for working with numerical arrays in Python.

## Arrays - Overview
Arrays are data structures that hold collections of the same data type. Many Python libraries used for number-based calculations rely on NumPy.

Here's what makes NumPy arrays special:

* **Core Component:** The NumPy array is the main part of the NumPy library. It's like a grid of elements, all of the same type.
* **Indexing:** You access elements in a NumPy array by integer position, starting at 0.
* **Memory and Speed:** NumPy arrays are like Python lists, but they use less memory and are often faster. This is because they rely on optimized, pre-compiled C code.
* **Element-wise Operations:** NumPy arrays let you do math on whole arrays at once, with simple, easy-to-read code.

## Installing NumPy

NumPy is a third-party library, meaning it's not included in Python's standard library. 

The easiest way to install it is with the command:

```bash
$ pip install numpy
```

However, since we're using JupyterLab and JupyterHub, NumPy is already installed, so we only need to import it into our programs:

```python
import numpy as np
```

In [None]:
# 👇 Import NumPy 👇
import numpy as np

## The NumPy ndarray

NumPy's N-dimensional array (ndarray) is a key feature that makes it a powerful tool for data analysis in Python. 

It offers a fast and flexible structure for managing large datasets, and its ability to apply mathematical operations to entire arrays with scalar-like syntax significantly simplifies and speeds up data manipulation.

### Mathematical Operation Preview

NumPy excels at batch computations by extending the intuitive syntax of scalar operations to entire arrays. This contrasts with Python's built-in objects, which generally require explicit iteration to achieve similar element-wise results.

Let's create an arbitrary 2-D array called `data` from two lists:

In [None]:
# Arbitrary 2-D Array
data = np.array([[2, -1.5, 3],  [0, -16,  7.2]])
print(data)



How do you think we'd go about multiplying each and every element within `data` by the number 10? 

In [None]:
# Perform Mathematical Operation 🧠
data *= 10
data



It's quite different from how you might handle this with standard Python lists, isn't it? With NumPy, the beauty is that we can often express these kinds of array-wide operations using the same straightforward notation we're already familiar with from basic mathematics.
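
For comparison, here is a minimal sketch of the same multiplication done with plain nested lists. The names `rows`, `scaled_list`, `repeated`, and `scaled_array` are illustrative and not part of the notebook.

```python
import numpy as np

# Same values as `data`, kept as plain nested Python lists
rows = [[2, -1.5, 3], [0, -16, 7.2]]

# With lists, element-wise multiplication needs explicit iteration
scaled_list = [[value * 10 for value in row] for row in rows]

# Note: `rows * 2` does not multiply values; it repeats the outer list
repeated = rows * 2

# With NumPy, the scalar syntax applies to every element
scaled_array = np.array(rows) * 10

print(scaled_list)
print(len(repeated))  # 4 rows: the two original rows, twice
print(scaled_array)
```

The list version works, but the intent is buried in the loop structure, and the `*` operator on a list means repetition rather than arithmetic.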

### Efficiency

* **Efficiency with Large Arrays:** NumPy is specifically designed for high performance when working with large datasets.

* **Contiguous Memory Storage:**
    * NumPy stores array data in a single, contiguous block of memory.
    * This is unlike Python's built-in objects, where elements can be scattered in memory.

* **C-Based Algorithms:**
    * NumPy's core algorithms are implemented in the C language.
    * These algorithms can operate directly on the contiguous memory block.
    * This avoids the overhead of Python's type checking and other interpreter-level operations for each element.

* **Reduced Memory Footprint:** NumPy arrays generally consume less memory compared to equivalent Python sequences (like lists) for large numerical datasets.

* **Vectorized Operations (No Python Loops):**
    * NumPy allows you to perform complex computations on entire arrays at once.
    * This eliminates the need for explicit and often slow Python `for` loops.

* **Speed Advantage:**
    * Due to its C-based algorithms and efficient memory management, NumPy operations are significantly faster than equivalent pure Python code.
    * Performance gains can often be in the range of **10 to 100 times faster or more** for large arrays.

* **Significant Memory Savings:** Along with speed, NumPy's efficient data storage leads to substantial reductions in memory usage compared to Python lists for numerical data.

**In essence:** NumPy's design prioritizes speed and memory efficiency for numerical computations on large arrays, making it an indispensable tool for data science, machine learning, and numerical computing in Python.
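
The memory claim can be checked roughly with the sketch below. It assumes CPython, where a list holds pointers to separate integer objects; the exact byte counts vary by Python version and platform, and `nbytes` counts only the array's data buffer, not the small ndarray object around it. The variable names are illustrative.

```python
import sys
import numpy as np

n = 1_000
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects, so count both parts
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(x) for x in python_list)

# An ndarray stores raw 8-byte integers in one contiguous buffer
array_bytes = numpy_array.nbytes

print(f"Python list (container + int objects): {list_bytes} bytes")
print(f"NumPy array data buffer: {array_bytes} bytes")
```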


### Performance Test
To truly appreciate the performance benefits we've been discussing, let's put it to the test. 

We're going to create two data structures, each containing one million integer values: 
* one as a NumPy array 
* one as a standard Python list

Get ready to see the difference when we perform a simple element-wise operation on both...

In [None]:
# Performance Test
numpy_array = np.arange(1_000_000)
python_list = list(range(1_000_000))

%timeit numpy_array * 2
%timeit [x * 2 for x in python_list]

Expect a dramatic improvement when using NumPy: its algorithms generally outperform pure Python by a factor of 10 to 100 (or more) in speed and require substantially less memory for large array operations.

## Creating an ndarray
<center><img src="../images/stock/pexels-azteca-food-truck.jpg" width="500" alt="Azteca Food Truck"></center>

### `numpy.array()`

Using the `np.array()` function, you can create a NumPy ndarray from data stored in one or more Python lists.

#### Sequence to ndarray

When you convert a Python sequence into a NumPy ndarray, the result is a brand-new array.

Furthermore, if the original list is simple (not nested), this resulting NumPy array will be one-dimensional:

In [None]:
# List to ndarray

# Simple list
example_list = list(range(10,110,10))
print(f"Original Python List: {example_list}\n" +
      f"Type: {type(example_list)}\n\n")

# Convert to NumPy array
numpy_array = np.array(example_list)
print(f"Numpy Array: {numpy_array}")
print(f"Type: {type(numpy_array)}")


__The Result is a 1D Array with 1 axis:__

* __Axis 0:__ Runs along the array's elements.

#### `.ndim` and `.shape`

__`.ndim` (Number of Dimensions)__: This attribute of a NumPy array tells you the number of axes (dimensions) the array has.

* A 1D array (like [1, 2, 3]) has `.ndim` equal to 1.
* A 2D array (like [[1, 2], [3, 4]]) has `.ndim` equal to 2.
* A 3D array would have `.ndim` equal to 3, and so on.

__`.shape`__ (Shape of the Array): This attribute returns a tuple that indicates the size of the array along each dimension.

* For a 1D array with 5 elements, `.shape` would be `(5,)`. The trailing comma indicates it's a tuple with one element.
* For a 2D array with 3 rows and 4 columns, `.shape` would be `(3, 4)`.
* For a 3D array with dimensions 2x3x4, `.shape` would be `(2, 3, 4)`.
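
The three cases above can be checked with a short sketch. It uses `np.zeros()` (covered later in this notebook) only as a convenient way to build arrays of a given shape; the variable names are illustrative.

```python
import numpy as np

one_d = np.array([1, 2, 3, 4, 5])   # 5 elements along a single axis
two_d = np.zeros((3, 4))            # 3 rows, 4 columns
three_d = np.zeros((2, 3, 4))       # 2 blocks of 3 rows and 4 columns

for name, arr in [("one_d", one_d), ("two_d", two_d), ("three_d", three_d)]:
    print(f"{name}: ndim={arr.ndim}, shape={arr.shape}")
```

Note that `.ndim` always equals the length of the `.shape` tuple.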

#### Nested Sequences to ndarray

If you provide `np.array()` with a nested Python list where the inner lists have the same length, NumPy will interpret this structure to create a multi-dimensional array. The arrangement of the inner lists becomes the rows (or higher-order dimensions) of the NumPy array.

In [None]:
# A nested Python list
nested_list = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(f"Original Nested Python List:\n{nested_list}\n" +
      f"Type: {type(nested_list)}\n\n")

# Convert the nested list to a NumPy ND array
numpy_array = np.array(nested_list)
print(numpy_array)
print(numpy_array.ndim)
print(numpy_array.shape)









The Result is a 2D array with 2 axes:

* __Axis 0:__ Runs vertically downward across the array's rows.
* __Axis 1:__ Runs horizontally across the array's columns.

### `np.zeros()`, `np.ones()`, `np.empty()`

Beyond `np.array()`, NumPy provides several convenient functions for generating new arrays directly. 

* `np.zeros()` and `np.ones()` create arrays filled with zeros or ones, respectively.
* `np.empty()` creates an array without initializing its elements to any specific value, which can be faster when you intend to fill the array immediately.
* When using these functions, you specify the desired shape: an integer for a 1-D array, or a tuple for an array with more dimensions.

In [None]:
# 1-D array of zeros
zeros_1d = np.zeros(5)
print(f"1D array of zeros:\n{zeros_1d}")
print(f"Shape: {zeros_1d.shape}")

# 2-D array of ones (3 rows, 4 columns)
ones_2d = np.ones((3, 4))
print(f"\n2D array of ones:\n{ones_2d}")
print(f"Shape: {ones_2d.shape}")

# 3-D array of uninitialized values (2 blocks, 3 rows, 2 columns)
empty_3d = np.empty((2, 3, 2))
print(f"\n3D array (uninitialized):\n{empty_3d}")
print(f"Shape: {empty_3d.shape}")

## Data Types for ndarrays
If you don't specify a data type during array creation, NumPy will automatically infer one from your data. This inferred type is stored in the array's `dtype` attribute.

NumPy's **data type (dtype)** is a special object defining the metadata of array elements, instructing NumPy how to interpret memory. 

This flexibility enables seamless interaction with diverse data sources by directly mapping to underlying binary representations, facilitating efficient data handling and interoperability with low-level languages. 

Numerical dtypes are named with a base type (like `int` or `float`) followed by the number of bits per element; for instance, Python's 64-bit double-precision float corresponds to NumPy's `float64`.
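
As a small illustration of dtype inference and of setting a dtype explicitly, consider the sketch below. It sticks to floating-point inputs because the integer dtype NumPy infers can differ by platform and NumPy version. The variable names are illustrative.

```python
import numpy as np

floats = np.array([1.0, 2.0, 3.0])              # inferred as float64
mixed = np.array([1, 2.5])                      # the integer is upcast to float64
small = np.array([1, 2, 3], dtype=np.float32)   # dtype requested explicitly

print(floats.dtype, mixed.dtype, small.dtype)

# itemsize is the number of bytes per element: 64 bits = 8 bytes, 32 bits = 4 bytes
print(floats.itemsize, small.itemsize)
```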

### Data Type Conversion

You can explicitly convert a NumPy array to a different data type using the `.astype()` method of the ndarray object.
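
One detail worth knowing before converting: casting floats to integers with `.astype(int)` discards the fractional part instead of rounding. The sketch below, with illustrative names, contrasts that with rounding first via `np.round()`.

```python
import numpy as np

values = np.array([1.2, 2.7, -1.7])

truncated = values.astype(int)            # fractional part dropped: 1, 2, -1
rounded = np.round(values).astype(int)    # rounded first, then converted: 1, 3, -2

print(truncated)
print(rounded)
```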

In [None]:
# Data Type Conversion
float_array = np.arange(0.5,10.0)
print(f"Original array: {float_array}, dtype: {float_array.dtype}")

int_array = float_array.astype(int)
print(f"Converted array (to int): {int_array}, dtype: {int_array.dtype}")

float16_array = float_array.astype(np.float16)
print(f"Converted array (to float16): {float16_array}, dtype: {float16_array.dtype}")

### __`numpy.empty()` warning__:

Be aware that `numpy.empty()` does not provide a clean slate. 

It returns an array with uninitialized memory, which can contain unexpected (garbage) values. 

Only use this function if your next step is to immediately assign values to all elements of the array.
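
A minimal sketch of that allocate-then-fill pattern, with an illustrative array name:

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point
squares = np.empty(5)

# Overwrite every element before reading any of them
squares[:] = np.arange(5) ** 2

print(squares)
```

If any element might be read before it is assigned, `np.zeros()` is the safer choice.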

## Operations on Arrays

<center><img src="../images/stock/pexels-donut-truck.jpg" width="500" alt="Donut Food Truck"></center>

Suppose each food truck at a food hall has a list containing its revenue over the past three months. You can use the following code to combine all of the revenue information into a single NumPy ndarray:

In [None]:
# Past 3 months of revenue (dollars) for 3 food trucks
the_ramen_rover = [25000, 27500, 26000]  
taco_time_machine = [32000, 33000, 34000]
the_waffle_wagon = [21000, 22000, 23000]

# Create a NumPy array to store the revenue data
food_truck_revenue = np.array([the_ramen_rover,
                              taco_time_machine,
                              the_waffle_wagon]
                             )

1. NumPy was already imported as `np` earlier in the notebook.
2. We create three Python lists, each representing the past three months of revenue (dollars) for a specific food truck.
    * `the_ramen_rover`: Revenue for "The Ramen Rover."
    * `taco_time_machine`: Revenue for "Taco Time Machine."
    * `the_waffle_wagon`: Revenue for "The Waffle Wagon."
3. We use `np.array()` to create a NumPy array. We pass a list of lists to `np.array()`, where each inner list represents the revenue data for one food truck.
4. We then output the array:

In [None]:
# Output the Array
print(food_truck_revenue)
print(food_truck_revenue.ndim)
print(food_truck_revenue.shape)




### Element-Wise Operations

**What is an Element-Wise Operation?**

An element-wise operation means that you perform a calculation on corresponding elements of two or more arrays. The result is a new array where each element is the outcome of the calculation performed on the elements at the same position in the original arrays.

For example, if you add two arrays together using an element-wise addition, the element at position [i, j] in the resulting array will be the sum of the elements at position [i, j] in the original two arrays.

It's straightforward to perform element-wise operations on multiple NumPy arrays that have the same dimensions.

Suppose we also had an array of each food truck's tips. We could add the monthly revenue and tips together to get the total income:


In [None]:
# Tip data (in dollars)
the_ramen_rover_tips = [2000, 2500, 2200]
taco_time_machine_tips = [3000, 3200, 3500]
the_waffle_wagon_tips = [1800, 2000, 2100]

# Create NumPy array of tips
food_truck_tips = np.array([the_ramen_rover_tips,
                            taco_time_machine_tips,
                            the_waffle_wagon_tips]
                          )
print(food_truck_tips)
print(food_truck_tips.ndim)
print(food_truck_tips.shape)

# Perform element-wise addition to get total income
total_income = food_truck_revenue + food_truck_tips
print(f"Total Income Array: {total_income}")
print(total_income.ndim)
print(total_income.shape)





__Explanation__

1. We have the `food_truck_revenue` array, as before, containing 3 months of revenue for each food truck.
2. We create a new array, `food_truck_tips`, that stores 3 months of tip amounts for each food truck.
3. We use the `+` operator to perform element-wise addition on the revenue ndarray and the tips ndarray.
4. Step 3 creates a new ndarray for the total income, where each element is the sum of the corresponding revenue and tip elements.
5. We print the total income to show the results.

In [None]:
# Output the Total Income



As you can see, the addition operation is a single line of code. The result is also a NumPy ndarray, where each element is the sum of the corresponding elements from the other two ndarrays.
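
The other arithmetic operators work the same way. As a sketch, the block below reuses the revenue and tip figures for one truck from the cells above to compute tips as a percentage of revenue; the names `revenue`, `tips`, `tip_percent`, and `revenue_in_thousands` are illustrative.

```python
import numpy as np

# Three months of revenue and tips for one truck (values from the cells above)
revenue = np.array([25000, 27500, 26000])
tips = np.array([2000, 2500, 2200])

# Element-wise division pairs up matching positions, then scales by 100
tip_percent = tips / revenue * 100

# An operation with a single number applies that number to every element
revenue_in_thousands = revenue / 1000

print(tip_percent)
print(revenue_in_thousands)
```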

## Indexing NumPy Arrays
NumPy arrays are indexed using square brackets `[]`. You can access individual elements or slices of the array using these brackets.

### Indexing 1D Arrays

For a 1D array, you use a single index to access an element. The index starts at 0 for the first element.

Given the `food_truck_names` array, print the elements at index `0` and index `4`:

In [None]:
food_truck_names = np.array([
    "The Ramen Rover",
    "Taco Time Machine",
    "The Waffle Wagon",
    "Burger Brigade",
    "Pizza Patrol",
    "Sweet Street Treats",
    "Curry Cruiser",
    "Falafel Fleet",
    "Noodle Nation",
    "Donut Dynasty"
])

print(f"Food Truck Names Array:\n{food_truck_names}")
print(f"Dimensions: {food_truck_names.ndim}")
print(f"Shape: {food_truck_names.shape}")

# Output the Specified Elements
print(food_truck_names[0])
print(food_truck_names[4])


### Indexing 2D Arrays

For a 2D array, you use two indices, separated by a comma, to access an element. The first index refers to the row, and the second index refers to the column.

Given the `food_truck_locations` array below, output the elements located at (Row 0, Column 0) and (Row 2, Column 1):

In [None]:
food_truck_locations = np.array([
    ["Downtown Portland", "Food Cart Pod NW 23rd", "Waterfront Park"],
    ["SE Division Street", "Pearl District", "Alberta Arts District"],
    ["Beaverton Farmers Market", "PSU Campus", "Mississippi Avenue"],
])
print(f"Food Truck Location Array:\n{food_truck_locations}")
print(f"Dimensions: {food_truck_locations.ndim}")
print(f"Shape: {food_truck_locations.shape}")

# Output the Requested Elements
print(food_truck_locations[0,0])
print(food_truck_locations[2,1])





## Slicing NumPy Arrays

You can also use slicing to access a portion of an array. Slicing uses the colon `:` to specify a range of indices.

Explanation of Slicing:

* __`[start:end]`__ - elements from start index to end index - 1.
* __`[start:]`__ - elements from start index to the end.
* __`[:end]`__ - elements from the beginning to end index - 1.
* __`[:]`__ - all elements.
* For 2D arrays, you can use slicing for both rows and columns, separated by a comma.

### Slicing 1D Arrays

Slicing a one-dimensional NumPy array behaves very much like slicing a standard Python list.

In [None]:
# 1D Slicing
array_1d = np.arange(20)
print(f"Array: {array_1d}")
print(f"Dimensions: {array_1d.ndim}")
print(f"Shape: {array_1d.shape}")

# Basic Slicing ([start:stop])
print(array_1d[4:9])
# Slicing from the beginning up to (but not including) stop ([:stop])
print(array_1d[:11])
# Slicing to the end ([start:])
print(array_1d[7:])
# Slicing with a step ([start:stop:step])
print(array_1d[2:10:2])




### Slicing 2D Arrays

Slicing with two-dimensional NumPy arrays extends the one-dimensional slicing concept to both axes (rows and columns). 

The syntax is `array[row_slice, column_slice]`.

In [None]:
# Slicing 2D Array
array_2d = np.arange(1, 17).reshape((4, 4))
print(f"Array:\n{array_2d}")

# Slice a sub-array: rows 0 and 1, columns 1 and 2
print(array_2d[0:2, 1:3])
# Slice the first two rows and all columns
print(array_2d[0:2, :])
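# Slice the third column and all rows (an integer index returns a 1-D array)
print(array_2d[:, 2])
# The same column selected with a slice stays 2-D, with shape (4, 1)
print(array_2d[:, 2:3])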


#### `reshape()`

The `reshape()` method in NumPy allows you to change the **shape** (the number of dimensions and the size along each dimension) of an array **without changing its data**. 

It returns an array with the specified shape. When possible, this is a view of the same data and not a copy.

* The total number of elements in the original array must be the same as the total number of elements in the new shape.
* For example, a 4x4 array (16 elements) can be reshaped into an 8x2 array (16 elements) or a 2x8 array (16 elements), but not a 3x5 array (15 elements).
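
The element-count rule can be seen in a short sketch. The variable names are illustrative; the failing case is wrapped in `try`/`except` so the cell runs to completion.

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))   # 16 elements

tall = grid.reshape((8, 2))            # 8 * 2 = 16, allowed
wide = grid.reshape((2, 8))            # 2 * 8 = 16, allowed
print(tall.shape, wide.shape)

# 3 * 5 = 15 does not match 16 elements, so NumPy raises a ValueError
try:
    grid.reshape((3, 5))
except ValueError as error:
    reshape_failed = True
    print(f"ValueError: {error}")
else:
    reshape_failed = False
```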


### Slicing ND Arrays

For multi-dimensional arrays, you provide a slice for each axis, separated by commas. 
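
As a sketch, here is a 3-D array with 2 blocks, 3 rows, and 4 columns, sliced along each axis. The array `cube` and the result names are illustrative.

```python
import numpy as np

cube = np.arange(24).reshape((2, 3, 4))
print(f"Array:\n{cube}")

# Block 0, all rows, columns 1 and 2
first_block_middle_cols = cube[0, :, 1:3]
print(first_block_middle_cols)

# Row 1 from every block, all columns
second_row_each_block = cube[:, 1, :]
print(second_row_each_block)
```

Each integer index removes one axis from the result, while each slice keeps its axis, so both results here are 2-D.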

### Views in NumPy (Slicing and Reshaping)

In NumPy, operations like **slicing** and often **reshaping** do not create independent copies of the array data by default. 

Instead, they produce **views**. 

* A view is essentially a different way of looking at the same underlying data buffer. This means that if you modify a view, you are also modifying the original array, and vice versa. 

* This behavior is designed for memory efficiency, especially when dealing with large datasets, as it avoids unnecessary copying of data. 

* However, it's crucial to be aware of this behavior to prevent unintended side effects in your code. 

* If you need an independent copy, explicitly use the `.copy()` method.
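
One way to check whether two arrays share the same underlying data is `np.shares_memory()`. The sketch below, with illustrative names, applies it to a reshaped array, a slice, and an explicit copy.

```python
import numpy as np

base = np.arange(6)
reshaped = base.reshape((2, 3))   # a view for a contiguous array like this one
sliced = base[1:4]                # a view
copied = base[1:4].copy()         # an independent copy

print(np.shares_memory(base, reshaped))  # True
print(np.shares_memory(base, sliced))    # True
print(np.shares_memory(base, copied))    # False

# Writing through the reshaped view changes the original as well
reshaped[0, 0] = 99
print(base[0])  # 99
```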

In [None]:
# Original array
original_array = np.arange(6)
print(f"Original Array: {original_array}")

# Creating a slice (a view)
slice_view = original_array[1:4]
print(f"Slice View: {slice_view}")

# Modifying the view
slice_view[0] = 99
print(f"Modified Slice View: {slice_view}")

# Observing the change in the original array
print(f"Original Array After View Modification: {original_array}")

# Creating a copy explicitly
copied_array = original_array[1:4].copy()
copied_array[0] = -1
print(f"\nCopied Array: {copied_array}")

# Original array remains unchanged after modifying the copy
print(f"Original Array After Copy Modification: {original_array}")

## NumPy Statistical Functions

NumPy's statistical methods are your tools for understanding array data. You can easily compute the maximum value of the entire array or along specific axes.

### `numpy.max()`

You can quickly determine the overall maximum value in an array, or even find the maximum value along a specific row or column (or higher-dimensional axis). Remember the NumPy array we created in the previous section? 


Let's see how we can find its largest element using the `.max()` method.

In [None]:
# 👇 Use the max, min, and mean methods on a previous array
maximum_value = original_array.max()
minimum_value = original_array.min()
mean_value = original_array.mean()

print(f"{maximum_value}, {minimum_value}, {mean_value}")









### `numpy.amax()`
You can also find the maximum value along a specific direction (axis) of your array using NumPy's `np.amax()` function, which is an alias of `np.max()`.

For our earlier multi-dimensional array, to find the biggest number in each row, you'd use `np.amax(your_array, axis=1)`. 

Setting `axis=1` tells NumPy to look horizontally across the columns of each row and find the maximum value there.
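
To make the axis argument concrete, here is a sketch on a small made-up array (`sample` is illustrative and is not one of the notebook's arrays):

```python
import numpy as np

sample = np.array([[3, 7, 5],
                   [9, 1, 4]])

row_max = np.amax(sample, axis=1)   # one maximum per row: 7 and 9
col_max = np.amax(sample, axis=0)   # one maximum per column: 9, 7, 5
overall_max = np.amax(sample)       # no axis given: maximum of all elements

print(row_max, col_max, overall_max)
```

The axis you pass is the one that gets collapsed: `axis=1` reduces across the columns and leaves one value per row.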


In [None]:
# 👇 Use the amax function on a previous multi-dimensional array










#### Other Statistical Methods

Here are a few useful statistical methods provided by NumPy:

* **`np.mean(array, axis=None)`**: Calculates the arithmetic mean (average) of array elements. You can compute the mean of the entire array (`axis=None`) or along a specific axis (e.g., `axis=0` for columns, `axis=1` for rows).

* **`np.median(array, axis=None)`**: Computes the median (the middle value) of array elements. Similar to `np.mean`, you can calculate the median of the whole array or along a specified axis.

* **`np.std(array, axis=None)`**: Calculates the standard deviation, a measure of the amount of dispersion or spread of a set of values. You can compute it for the entire array or along an axis.

* **`np.min(array, axis=None)`**: Finds the minimum value in an array, similar to `np.max()`. You can find the global minimum or the minimum along a particular axis.

* **`np.sum(array, axis=None)`**: Calculates the sum of array elements. You can sum all elements (`axis=None`) or sum along a specific axis.
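
The functions above can be tried on the food truck revenue figures from earlier, as in the sketch below. Rows are trucks and columns are months; the result names are illustrative. Note that `np.std()` computes the population standard deviation by default.

```python
import numpy as np

# Rows: trucks, columns: months (values from the revenue example)
revenue = np.array([[25000, 27500, 26000],
                    [32000, 33000, 34000],
                    [21000, 22000, 23000]])

mean_per_truck = np.mean(revenue, axis=1)   # average across each row
total_per_month = np.sum(revenue, axis=0)   # sum down each column
overall_median = np.median(revenue)         # middle value of all nine entries
overall_min = np.min(revenue)               # smallest entry overall
std_per_truck = np.std(revenue, axis=1)     # spread of each truck's revenue

print(f"Mean per truck: {mean_per_truck}")
print(f"Total per month: {total_per_month}")
print(f"Median: {overall_median}, Min: {overall_min}")
print(f"Std per truck: {std_per_truck}")
```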

For a comprehensive list and detailed explanations of all of NumPy's statistical functions, you can refer to the official NumPy documentation:

[NumPy Statistics](https://numpy.org/doc/stable/reference/routines.statistics.html)

## Conclusion

This notebook has provided a foundational introduction to the NumPy library, a cornerstone of numerical computing in Python. We've explored why NumPy is essential, particularly for handling large datasets efficiently.

Key takeaways include:

* **Efficiency:** NumPy's ndarray structure and C-based algorithms offer significant speed and memory advantages over standard Python lists for numerical operations.
* **Creating ndarrays:** We learned various ways to create NumPy arrays, including converting Python lists using `np.array()` and generating arrays with specific values using functions like `np.zeros()`, `np.ones()`, and `np.empty()`.
* **Data Types (`dtype`):** Understanding NumPy's data types is crucial for controlling memory usage and ensuring compatibility with different data sources. NumPy automatically infers data types but allows for explicit casting with `.astype()`.
* **Element-wise Operations:** NumPy enables concise and efficient mathematical operations across entire arrays, mirroring scalar syntax and eliminating the need for explicit Python loops.
* **Indexing and Slicing:** NumPy's powerful indexing and slicing capabilities allow for flexible access and manipulation of array subsets, extending beyond the basic indexing of Python sequences.
* **NumPy Statistical Functions:** We touched upon NumPy's built-in statistical functions like `np.max()`, `np.amax()`, `np.mean()`, `np.median()`, `np.std()`, `np.min()`, and `np.sum()`, which provide convenient tools for analyzing array data along specific axes or across the entire array.

NumPy's capabilities extend far beyond what we've covered here, providing a rich ecosystem for advanced numerical tasks, linear algebra, Fourier analysis, random number generation, and more. This introduction serves as a stepping stone for further exploration and application of NumPy in various scientific and data-intensive domains within Python.

For more information on this vast library, check out the [NumPy Documentation](https://numpy.org/doc/stable/).