### [EXERCISE 1: Creation & Casting]
1. Create a 3x4 array of all ones using `np.ones()`.
2. Cast this array to `float32`.
3. Create an array of strings representing numbers: `['1.25', '-9.6', '42']`. Cast it to `float`.

In [3]:
import numpy as np

# 1. Create a 3x4 array of all ones using np.ones()
ones_array = np.ones((3, 4))
print("1. Original 3x4 array of ones:")
print(ones_array)

# 2. Cast this array to float32
ones_array_float32 = ones_array.astype(np.float32)
print("\n2. Array casted to float32:")
print(ones_array_float32)
print(f"   Dtype of ones_array_float32: {ones_array_float32.dtype}")

# 3. Create an array of strings representing numbers: ['1.25', '-9.6', '42']. Cast it to float.
string_numbers_array = np.array(['1.25', '-9.6', '42'])
print("\n3. Original array of strings:")
print(string_numbers_array)
string_numbers_array_float = string_numbers_array.astype(float)
print("   String array casted to float:")
print(string_numbers_array_float)
print(f"   Dtype of string_numbers_array_float: {string_numbers_array_float.dtype}")

1. Original 3x4 array of ones:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]

2. Array casted to float32:
[[1. 1. 1. 1.]
 [1. 1. 1. 1.]
 [1. 1. 1. 1.]]
   Dtype of ones_array_float32: float32

3. Original array of strings:
['1.25' '-9.6' '42']
   String array casted to float:
[ 1.25 -9.6  42.  ]
   Dtype of string_numbers_array_float: float64


# Lesson 1.6: Introduction to NumPy

## Introduction
**NumPy**, short for Numerical Python, is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Many computational and data science packages use NumPy as the main building block. It is a fundamental library for scientific computing in Python.

### Key Features of NumPy:
* **ndarray**: An efficient multidimensional array providing fast array-oriented arithmetic operations and flexible broadcasting capabilities.
* **Vectorization**: Mathematical functions for fast operations on entire arrays of data without having to write loops.
* **Linear Algebra and More**: Linear algebra routines, plus tools for random number generation and Fourier transforms.
* **C API**: For connecting NumPy with libraries written in C, C++, or FORTRAN.

### Advantages over Python Lists:
1. **Contiguous Memory**: NumPy internally stores data in a contiguous block of memory, independent of other built-in Python objects. This allows for significantly faster access and manipulation.
2. **Vectorized Operations**: NumPy algorithms written in C can operate on this memory without type checking or other Python overhead, performing complex computations without slow `for` loops.

![numpy_vs_list](https://github.com/juliustiew/6m-data-1.6-intro-numpy/blob/main/assets/numpy_vs_python_list.png?raw=1)
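The memory layout described above can be inspected directly. The sketch below is illustrative only: the names `py_list` and `np_arr` and the size `n` are assumptions chosen for this example, and it fixes the dtype to `int64` so the byte counts are predictable.

```python
import sys
import numpy as np

n = 1_000  # illustrative size, not taken from the lesson
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# The array keeps n fixed-size items back to back in a single buffer.
print("Bytes per element:", np_arr.itemsize)
print("Bytes in data buffer:", np_arr.nbytes)
print("C-contiguous:", np_arr.flags["C_CONTIGUOUS"])

# The list holds references; each int lives elsewhere as its own Python object.
# sys.getsizeof reports the list container only, not the int objects it points to.
print("List container bytes:", sys.getsizeof(py_list))
```

Because every element has the same size, NumPy can compute the address of any item from its index, which is part of why array-wide operations avoid per-element overhead.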

## Part 1: Performance Benchmark
To give you an idea of the performance difference, consider a NumPy array of one million integers and an equivalent Python list. We use the IPython `%timeit` magic command to measure execution time; exact timings vary from machine to machine.

In [1]:
import numpy as np
my_arr = np.arange(1_000_000)
my_list = list(range(1_000_000))

print("NumPy Vectorized Multiplication (my_arr * 2):")
%timeit my_arr2 = my_arr * 2

print("\nPython List Comprehension ([x * 2 for x in my_list]):")
%timeit my_list2 = [x * 2 for x in my_list]

NumPy Vectorized Multiplication (my_arr * 2):
696 µs ± 199 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Python List Comprehension ([x * 2 for x in my_list]):
30.8 ms ± 508 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
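`%timeit` is only available inside IPython or Jupyter. As a rough equivalent for a plain Python script, the standard-library `timeit` module can be used. This is a minimal sketch, assuming a smaller array (`n = 100_000`) and 20 repetitions so it finishes quickly; both values are arbitrary choices for illustration.

```python
import timeit
import numpy as np

n = 100_000   # smaller than the lesson's one million, to keep the run short
runs = 20     # arbitrary repetition count for this sketch
my_arr = np.arange(n)
my_list = list(range(n))

# timeit.timeit returns the total time in seconds for `number` calls.
numpy_time = timeit.timeit(lambda: my_arr * 2, number=runs)
list_time = timeit.timeit(lambda: [x * 2 for x in my_list], number=runs)

print(f"NumPy: {numpy_time / runs * 1e3:.3f} ms per run")
print(f"List:  {list_time / runs * 1e3:.3f} ms per run")

# Both approaches should produce the same values.
same_result = (my_arr * 2).tolist() == [x * 2 for x in my_list]
print("Same result:", same_result)
```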


## Part 2: The ndarray (N-dimensional array)
The `ndarray` is a fast, flexible container for large datasets. It is a multidimensional array of fixed size with **homogeneous** elements (all elements must be of the same type).

Every array has:
* **shape**: A tuple indicating the size of each dimension.
* **dtype**: An object describing the data type of the array.
* **ndim**: The number of dimensions (axes).

### ndarray illustration
![ndarray](https://github.com/juliustiew/6m-data-1.6-intro-numpy/blob/main/assets/numpy_ndarray.png?raw=1)

In [None]:
# [DEMO] Creating arrays from sequences
data1 = [6, 7.5, 8, 0, 1]
arr1 = np.array(data1)

data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

print(f"Array 2:\n{arr2}")
print(f"Shape: {arr2.shape}, Dtype: {arr2.dtype}, Dimensions: {arr2.ndim}")
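When no `dtype` is given, `np.array` picks one that can hold every element. The sketch below reuses the demo's values under new, illustrative names (`ints`, `mixed`, `nested`) to show that a single float in the input is enough to make the whole array floating-point.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([6, 7.5, 8, 0, 1])               # same values as data1 above
nested = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])   # same values as data2 above

# Integer input gives a signed integer dtype; the exact width depends on the platform.
print("ints kind:", ints.dtype.kind)
# One float forces every element to be stored as float64.
print("mixed dtype:", mixed.dtype)
# A list of equal-length lists becomes a 2D array.
print("nested shape and ndim:", nested.shape, nested.ndim)
```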

### Data Types and Casting
NumPy supports specific numerical types like `int32`, `float64`, etc. You can explicitly convert an array from one `dtype` to another using the `astype` method.

**Note:** If you cast floating-point numbers to an integer `dtype`, the fractional part is discarded (truncated toward zero), not rounded. For example, `3.7` becomes `3` and `-1.2` becomes `-1`.

In [None]:
# [DEMO] Casting arrays
arr = np.array([3.7, -1.2, 0.5, 12.9])
print("Original:", arr)
print("Cast to int32:", arr.astype(np.int32))
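If truncation is not what you want, round explicitly before casting. The following sketch compares three options on the same values as the demo; the variable names `truncated`, `floored`, and `rounded` are illustrative.

```python
import numpy as np

arr = np.array([3.7, -1.2, 0.5, 12.9])

# astype drops the fractional part, moving each value toward zero.
truncated = arr.astype(np.int32)
# np.floor rounds toward negative infinity, so -1.2 becomes -2.
floored = np.floor(arr).astype(np.int32)
# np.rint rounds to the nearest integer; exact ties such as 0.5 go to the even value.
rounded = np.rint(arr).astype(np.int32)

print("Truncated:", truncated)   # [ 3 -1  0 12]
print("Floored:  ", floored)     # [ 3 -2  0 12]
print("Rounded:  ", rounded)     # [ 4 -1  0 13]
```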

### [EXERCISE 1: Creation & Casting]
1. Create a 3x4 array of all ones using `np.ones()`.
2. Cast this array to `float32`.
3. Create an array of strings representing numbers: `['1.25', '-9.6', '42']`. Cast it to `float`.

In [None]:
# Your code here


## Part 3: Arithmetic & Broadcasting
Arithmetic operations on arrays are applied element by element in a single batch, without writing `for` loops. **Broadcasting** describes how arithmetic works between arrays of different shapes.

![vectorization](https://github.com/juliustiew/6m-data-1.6-intro-numpy/blob/main/assets/vectorization.png?raw=1)

Example: A scalar value being replicated (broadcast) to match the shape of a larger array.

In [4]:
# [DEMO] Arithmetic & Broadcasting
arr = np.array([[1., 2., 3.], [4., 5., 6.]])
print("Element-wise multiplication (arr * arr):\n", arr * arr)
print("\nBroadcasting scalar (1 / arr):\n", 1 / arr)

Element-wise multiplication (arr * arr):
 [[ 1.  4.  9.]
 [16. 25. 36.]]

Broadcasting scalar (1 / arr):
 [[1.         0.5        0.33333333]
 [0.25       0.2        0.16666667]]
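Broadcasting also applies between two arrays, not only between an array and a scalar. The sketch below is a small illustration with made-up arrays `row` and `col`: shapes are compared from the last dimension backwards, and each pair of dimensions must either match or contain a 1.

```python
import numpy as np

arr = np.array([[1., 2., 3.], [4., 5., 6.]])   # shape (2, 3)
row = np.array([10., 20., 30.])                 # shape (3,)
col = np.array([[100.], [200.]])                # shape (2, 1)

plus_row = arr + row   # row is applied to each of the two rows
plus_col = arr + col   # col is applied to each of the three columns

print("arr + row:\n", plus_row)
print("arr + col:\n", plus_col)

# Shapes (2, 3) and (2,) are incompatible: the trailing dimensions are 3 and 2.
try:
    arr + np.array([1., 2.])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print("Mismatch raised ValueError:", mismatch_raised)
```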


## Part 4: Indexing and Slicing
One-dimensional arrays act similarly to Python lists. In 2D arrays, indexing can be done with `[row, column]` syntax.

### 2D Array Indexing Syntax
![2d_array_indexing](https://github.com/juliustiew/6m-data-1.6-intro-numpy/blob/main/assets/ndarray_axis_index.png?raw=1)

**Important:** Array slices are **views** on the original array. This means data is not copied, and modifications to the slice will be reflected in the source array.

In [None]:
# [DEMO] Slicing views
arr = np.arange(10)
arr_slice = arr[5:8]
arr_slice[1] = 12345
print("Original array modified via slice:", arr)

# [DEMO] 2D Slicing
arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("\nFirst two rows, columns 1 onwards:\n", arr2d[:2, 1:])
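When you need an independent array rather than a view, call `.copy()` on the slice. The following sketch uses `np.shares_memory` to confirm which arrays share a buffer; the names `view` and `independent` are illustrative.

```python
import numpy as np

arr = np.arange(10)
view = arr[5:8]                  # a view: shares memory with arr
independent = arr[5:8].copy()    # a copy: owns its own data

independent[0] = -1   # leaves arr untouched
view[0] = 99          # writes through to arr[5]

print("view shares memory:", np.shares_memory(arr, view))
print("copy shares memory:", np.shares_memory(arr, independent))
print("arr after both writes:", arr)   # [ 0  1  2  3  4 99  6  7  8  9]
```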

### [EXERCISE 2: The Logic of Slicing]
1. Select the first column of `arr2d` using a slice.
2. Set all values in the second row to 0.
3. **Socratic Prompt:** How does `arr2d[1]` differ from `arr2d[1, :]`? (Hint: check shapes)

In [None]:
# Your code here


## Part 5: Boolean Indexing
Like arithmetic operations, comparisons (such as `==`) with arrays are vectorized. This yields a boolean array which can be used to filter data.

In [5]:
# [DEMO] Filtering scores
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
scores = np.array([[75, 80], [85, 90], [95, 100], [100, 77], [85, 92], [95, 80], [72, 80]])

bob_mask = (names == 'Bob')
print("Mask:", bob_mask)
print("Bob's scores:\n", scores[bob_mask])

Mask: [ True False False  True False False False]
Bob's scores:
 [[ 75  80]
 [100  77]]
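A boolean mask can also be counted, combined with a column index, or combined with another mask. This sketch reuses the demo's `names` and `scores` data; the `joe_*` variable names and the 90-point threshold are chosen only for illustration.

```python
import numpy as np

names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
scores = np.array([[75, 80], [85, 90], [95, 100], [100, 77], [85, 92], [95, 80], [72, 80]])

joe_mask = names == 'Joe'

# True counts as 1, so summing a mask counts the matches.
joe_count = joe_mask.sum()
# A mask on the rows can be paired with an ordinary column index.
joe_second = scores[joe_mask, 1]
# & combines two masks element-wise; each comparison needs its own parentheses.
joe_high = scores[joe_mask & (scores[:, 0] >= 90)]

print("Rows for Joe:", joe_count)              # 3
print("Joe's second scores:", joe_second)      # [90 80 80]
print("Joe's rows with first score >= 90:\n", joe_high)
```

Note that the Python keywords `and` and `or` do not work element-wise on arrays; use `&` and `|` instead.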


### [EXERCISE 3: Complex Filtering]
1. Select all scores where the name is NOT 'Bob'.
2. Select scores for 'Bob' or 'Will' using the `|` operator.
3. Find all scores less than 80 and set them to 0.

In [6]:
# Your code here

# 1. Select all scores where the name is NOT 'Bob'.
not_bob_mask = (names != 'Bob')
print("1. Scores where name is NOT 'Bob':")
print(scores[not_bob_mask])

# 2. Select scores for 'Bob' or 'Will' using the | operator.
bob_or_will_mask = (names == 'Bob') | (names == 'Will')
print("\n2. Scores for 'Bob' or 'Will':")
print(scores[bob_or_will_mask])

# 3. Find all scores less than 80 and set them to 0.
# Create a copy to avoid modifying the original 'scores' array for demonstration purposes
modified_scores = scores.copy()
modified_scores[modified_scores < 80] = 0
print("\n3. Scores with values less than 80 set to 0:")
print(modified_scores)

1. Scores where name is NOT 'Bob':
[[ 85  90]
 [ 95 100]
 [ 85  92]
 [ 95  80]
 [ 72  80]]

2. Scores for 'Bob' or 'Will':
[[ 75  80]
 [ 95 100]
 [100  77]
 [ 85  92]]

3. Scores with values less than 80 set to 0:
[[  0  80]
 [ 85  90]
 [ 95 100]
 [100   0]
 [ 85  92]
 [ 95  80]
 [  0  80]]


## Part 6: Universal Functions (ufuncs) and Methods
A **ufunc** is a function that performs element-wise operations on data in ndarrays.

* **Unary ufuncs**: Take one array (e.g., `sqrt`, `exp`).
* **Binary ufuncs**: Take two arrays (e.g., `add`, `maximum`).
* **Statistical Methods**: `mean`, `sum`, `std` can be computed over the entire array or along an axis.
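The categories above can be illustrated with a few calls. This is a minimal sketch with two made-up arrays, `a` and `b`, picked so the results are easy to check by hand.

```python
import numpy as np

a = np.array([1., 4., 9., 16.])
b = np.array([2., 3., 10., 5.])

roots = np.sqrt(a)          # unary ufunc: one input array
total = np.add(a, b)        # binary ufunc: equivalent to a + b
larger = np.maximum(a, b)   # binary ufunc: element-wise maximum
average = a.mean()          # statistical method over the whole array

print("sqrt:", roots)        # [1. 2. 3. 4.]
print("add:", total)         # [ 3.  7. 19. 21.]
print("maximum:", larger)    # [ 2.  4. 10. 16.]
print("mean of a:", average) # 7.5
```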

In [None]:
# [DEMO] Statistical Methods
arr = np.random.randn(3, 4)
print("Random Array:\n", arr)
print("\nMean down rows (axis=0):", arr.mean(axis=0))
print("Sum across columns (axis=1):", arr.sum(axis=1))
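Because the demo above uses random values, its output changes on every run. The following sketch applies the same `axis` arguments to a small fixed array (an assumption made purely so the results can be verified by hand).

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# axis=0 collapses the rows, giving one result per column.
col_means = arr.mean(axis=0)
# axis=1 collapses the columns, giving one result per row.
row_sums = arr.sum(axis=1)
# With no axis, the statistic covers every element.
overall_mean = arr.mean()

print("Mean per column (axis=0):", col_means)   # [1.5 2.5 3.5]
print("Sum per row (axis=1):", row_sums)        # [ 3 12]
print("Overall mean:", overall_mean)            # 2.5
```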

## Part 7: Linear Algebra
Linear algebra operations, like matrix multiplication, are crucial for many data science algorithms. Multiplying two arrays with `*` is an element-wise product; for matrix multiplication, use `.dot()` or the `@` operator.

![matrix_multiplication](https://github.com/juliustiew/6m-data-1.6-intro-numpy/blob/main/assets/matrix_multiplication.png?raw=1)

In [7]:
# [DEMO] Matrix Multiplication
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])

print("Matrix product (x @ y):\n", x @ y)

Matrix product (x @ y):
 [[ 28  64]
 [ 67 181]]
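To make the difference between `*` and `@` concrete, the sketch below applies both to the same pair of 2x2 matrices. The arrays `a` and `b` are made up for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b    # multiplies matching positions
matrix_prod = a @ b    # rows of a times columns of b
dot_prod = a.dot(b)    # same result as a @ b for 2D arrays

print("a * b:\n", elementwise)   # [[ 5 12] [21 32]]
print("a @ b:\n", matrix_prod)   # [[19 22] [43 50]]

# For a @ b, the inner dimensions must match: (2, 3) @ (3, 2) gives (2, 2).
x = np.array([[1, 2, 3], [4, 5, 6]])
y = np.array([[6, 23], [-1, 7], [8, 9]])
print("Shape of x @ y:", (x @ y).shape)
```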


### [EXERCISE 4: Reshaping & Statistics]
1. Create an array of 15 integers using `arange(15)` and reshape it to `(3, 5)`.
2. Calculate the average value of each row.
3. Use `np.unique()` to find distinct elements in an array of your choice.
4. Transpose the reshaped array using `.T` and check the new shape.

In [8]:
# Your code here
import numpy as np

# 1. Create an array of 15 integers using arange(15) and reshape it to (3, 5).
reshaped_array = np.arange(15).reshape(3, 5)
print("1. Reshaped array (3x5):")
print(reshaped_array)

# 2. Calculate the average value of each row.
row_averages = reshaped_array.mean(axis=1)
print("\n2. Average value of each row:")
print(row_averages)

# 3. Use np.unique() to find distinct elements in an array of your choice.
# Using the reshaped_array for this purpose
unique_elements = np.unique(reshaped_array)
print("\n3. Distinct elements in the reshaped array:")
print(unique_elements)

# 4. Transpose the reshaped array using .T and check the new shape.
transposed_array = reshaped_array.T
print("\n4. Transposed array:")
print(transposed_array)
print(f"   New shape after transpose: {transposed_array.shape}")

1. Reshaped array (3x5):
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

2. Average value of each row:
[ 2.  7. 12.]

3. Distinct elements in the reshaped array:
[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]

4. Transposed array:
[[ 0  5 10]
 [ 1  6 11]
 [ 2  7 12]
 [ 3  8 13]
 [ 4  9 14]]
   New shape after transpose: (5, 3)
