# NumPy

## NumPy Array

Using a **NumPy array** instead of a Python list is advantageous in many cases, especially when working with large datasets, numerical computations, or when performance and memory efficiency are important.

Here are key reasons and scenarios when you should use **NumPy arrays** over Python lists:

### 1. **Performance and Speed**
- **Why**: NumPy arrays are much faster than Python lists for numerical operations because the work runs in compiled C code over contiguous, homogeneously typed memory. Lists can hold heterogeneous objects, which makes them more flexible, but each element is a reference to a separate Python object, and that indirection adds significant overhead.
- **When to Use**: When performing operations that involve large datasets or repeated numerical computations (e.g., element-wise addition, multiplication, etc.), NumPy arrays are significantly faster.
  
    **Example**: Adding two arrays
    ```python
    import numpy as np
    arr1 = np.array([1, 2, 3])
    arr2 = np.array([4, 5, 6])
    result = arr1 + arr2  # Element-wise addition
    ```
    The same operation using lists would involve looping through the elements, making it slower:
    ```python
    list1 = [1, 2, 3]
    list2 = [4, 5, 6]
    result = [x + y for x, y in zip(list1, list2)]
    ```
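
The speed difference can be checked directly with the standard-library `timeit` module. The following is a minimal sketch: the array size `n` and the repetition count are illustrative choices rather than values from the text, and the measured times will vary by machine.

```python
import timeit

import numpy as np

n = 100_000  # illustrative size (an assumption, not from the text above)
list1 = list(range(n))
list2 = list(range(n))
arr1 = np.arange(n)
arr2 = np.arange(n)

# Time 20 repetitions of each approach
list_time = timeit.timeit(lambda: [x + y for x, y in zip(list1, list2)], number=20)
numpy_time = timeit.timeit(lambda: arr1 + arr2, number=20)

print(f"list comprehension: {list_time:.4f} s")
print(f"NumPy addition:     {numpy_time:.4f} s")

# Both approaches compute the same values
list_result = [x + y for x, y in zip(list1, list2)]
numpy_result = arr1 + arr2
```

On small inputs such as the three-element arrays above, the gap is negligible and array creation can even dominate; the advantage shows up as the data grows.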

### 2. **Memory Efficiency**
- **Why**: NumPy arrays use much less memory than Python lists because they store elements of the <u>same data type</u> in contiguous memory locations. Python lists are <u>dynamic and store references to objects</u>, which leads to overhead in memory usage.
- **When to Use**: When you need to store large datasets in memory, especially numerical data (e.g., matrices, time series data), NumPy arrays are more efficient in terms of memory.

    **Example**: A list with integers has to store not only the integers but also the references to them, while NumPy arrays store the integers directly.
    ```python
    list_data = [1, 2, 3, 4, 5]
    numpy_data = np.array([1, 2, 3, 4, 5])
    ```
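
The two lines above only create the containers. To see the memory difference, one option is to compare `sys.getsizeof` with the array's `nbytes` attribute. This is a rough sketch: the size `n` is an illustrative assumption, the exact byte counts for Python objects depend on the interpreter, and summing per-element sizes overestimates slightly because CPython shares small integer objects.

```python
import sys

import numpy as np

n = 1000  # illustrative size (an assumption)
list_data = list(range(n))
numpy_data = np.arange(n, dtype=np.int64)

# getsizeof on a list counts only the list object and its array of references
list_container_bytes = sys.getsizeof(list_data)
# Each integer is a separate Python object with its own header
list_element_bytes = sum(sys.getsizeof(x) for x in list_data)

# nbytes is the size of the raw data buffer: n elements * 8 bytes each
array_bytes = numpy_data.nbytes

print("list (container + elements):", list_container_bytes + list_element_bytes)
print("NumPy array data buffer:    ", array_bytes)
```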

### 3. **Vectorized Operations**
- **Why**: NumPy arrays support **vectorized operations**, which allow for applying mathematical operations directly on arrays without the need for explicit loops. This results in concise and faster code execution.
- **When to Use**: When you need to perform operations on entire arrays or matrices at once (e.g., matrix multiplication, element-wise addition, etc.).

    **Example**: Multiplying every element of an array by a scalar.
    ```python
    arr = np.array([1, 2, 3, 4, 5])
    result = arr * 2  # Element-wise multiplication
    ```

    In a Python list, you’d have to use a loop:
    ```python
    list_data = [1, 2, 3, 4, 5]
    result = [x * 2 for x in list_data]
    ```
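
Vectorization is not limited to arithmetic operators. NumPy's universal functions such as `np.sqrt` also operate element-wise, and whole expressions can be written without a loop. A small sketch with made-up input values:

```python
import math

import numpy as np

arr = np.array([1.0, 4.0, 9.0, 16.0])  # illustrative values

# A universal function applied to every element at once
roots = np.sqrt(arr)                 # [1. 2. 3. 4.]

# A whole expression evaluated element-wise
combined = arr ** 2 + 3 * arr - 1    # [3. 27. 107. 303.]

# The list equivalent needs an explicit loop or comprehension
list_roots = [math.sqrt(x) for x in arr.tolist()]
```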

### 4. **Multidimensional Arrays**
- **Why**: NumPy supports multidimensional arrays (e.g., 2D matrices, 3D tensors), making it much more powerful for working with complex datasets like matrices, image data, and time series data. Python lists can be used to simulate multi-dimensional arrays, but they are less efficient and more cumbersome.
- **When to Use**: When working with higher-dimensional data, such as matrices or tensors (e.g., in machine learning, scientific computing, or image processing).
  
    **Example**: Creating a 2D matrix (a list of lists in Python) and multiplying matrices:
    ```python
    matrix1 = np.array([[1, 2], [3, 4]])
    matrix2 = np.array([[5, 6], [7, 8]])
    result = np.dot(matrix1, matrix2)  # Matrix multiplication
    ```

    In Python lists, this would require manually implementing the multiplication.
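
For comparison, here is one way the manual list-based version might look. `matmul_lists` is a hypothetical helper written for this illustration, not a library function. The `@` operator is NumPy's matrix-multiplication operator and gives the same result as `np.dot` for 2D arrays.

```python
import numpy as np


def matmul_lists(a, b):
    """Multiply two matrices stored as lists of lists (hypothetical helper)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


list_result = matmul_lists([[1, 2], [3, 4]], [[5, 6], [7, 8]])
numpy_result = np.array([[1, 2], [3, 4]]) @ np.array([[5, 6], [7, 8]])

print(list_result)   # [[19, 22], [43, 50]]
print(numpy_result)
```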

### 5. **Broadcasting**
- **Why**: NumPy arrays support **broadcasting**, which allows operations between arrays of different but compatible shapes. NumPy compares the shapes from the last dimension backward, and each pair of dimensions must either match or include a 1. This makes operations like adding a scalar to an array, or adding a row vector to every row of a matrix, possible without explicit looping.
- **When to Use**: When you need to perform operations between arrays of different, compatible shapes without writing explicit loops.

    **Example**: Adding a scalar to all elements in an array:
    ```python
    arr = np.array([1, 2, 3])
    result = arr + 10  # Adds 10 to each element of the array
    ```

    Broadcasting also works between arrays of different shapes:
    ```python
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    vector = np.array([1, 2, 3])
    result = arr + vector  # Broadcasts the vector across each row of the matrix
    ```
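
Broadcasting only works when the shapes are compatible. The sketch below contrasts a row vector, a column vector, and an incompatible vector; the array values are made up for illustration.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([1, 2, 3])               # shape (3,)
col = np.array([[10], [20]])            # shape (2, 1)

row_sum = arr + row  # (2, 3) + (3,)   -> row added to every row
col_sum = arr + col  # (2, 3) + (2, 1) -> column added to every column

# Shapes (2, 3) and (2,) do not line up on the last dimension (3 vs 2)
error_message = None
try:
    arr + np.array([1, 2])
except ValueError as exc:
    error_message = str(exc)

print(row_sum)
print(col_sum)
print(error_message)
```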

### 6. **Advanced Indexing and Slicing**
- **Why**: NumPy arrays provide powerful **indexing and slicing capabilities**, including boolean indexing, fancy indexing (indexing with a list or array of positions), and multidimensional slicing. Python lists, in contrast, support only single-integer indexing and one-dimensional slices.
- **When to Use**: When you need to manipulate or extract subsets of data using complex indexing and slicing operations.

    **Example**: Boolean indexing:
    ```python
    arr = np.array([1, 2, 3, 4, 5])
    result = arr[arr > 3]  # Extracts elements greater than 3
    ```
    Multidimensional slicing:
    ```python
    arr = np.array([[1, 2, 3], [4, 5, 6]])
    result = arr[:, 1]  # Extracts the second column
    ```
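
Fancy indexing is mentioned above but not shown. A short sketch with illustrative values, which also combines two boolean conditions:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

# Fancy indexing: select several positions at once with a list of indices
picked = arr[[0, 2, 4]]               # [10 30 50]

matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Row and column index lists are paired up: elements (0, 2) and (2, 0)
pairs = matrix[[0, 2], [2, 0]]        # [3 7]

# Boolean masks combine with & (and) and | (or); the parentheses are required
masked = arr[(arr > 10) & (arr < 50)]  # [20 30 40]
```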

### 7. **Mathematical and Statistical Functions**
- **Why**: NumPy provides a vast library of built-in **mathematical, statistical, and linear algebra functions**. These functions are highly optimized and designed for efficient numerical computations on large arrays.
- **When to Use**: When you need to perform complex numerical operations (e.g., mean, median, standard deviation, matrix multiplication, etc.) on large datasets.

    **Example**: Calculating mean, sum, and standard deviation of an array:
    ```python
    arr = np.array([1, 2, 3, 4, 5])
    mean = np.mean(arr)
    total_sum = np.sum(arr)
    std_dev = np.std(arr)
    ```
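
Most of these functions also accept an `axis` argument for multidimensional arrays. The sketch below adds that and the median; the data is illustrative. Note that `np.std` computes the population standard deviation by default, and `ddof=1` gives the sample version.

```python
import numpy as np

data = np.array([[1, 2, 3], [4, 5, 6]])

col_means = np.mean(data, axis=0)  # mean of each column: [2.5 3.5 4.5]
row_sums = np.sum(data, axis=1)    # sum of each row:     [ 6 15]
median = np.median(data)           # median of all values: 3.5

arr = np.array([1, 2, 3, 4, 5])
pop_std = np.std(arr)              # divides by n     (sqrt(2))
sample_std = np.std(arr, ddof=1)   # divides by n - 1 (sqrt(2.5))
```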

### 8. **Interoperability with Other Libraries**
- **Why**: NumPy arrays are the common array type across many **scientific libraries**. pandas and SciPy are built directly on them, and TensorFlow converts to and from them. Python lists, while more general-purpose, are not optimized for such tasks.
- **When to Use**: When you're working with machine learning, data science, or numerical computing libraries, it's best to use NumPy arrays for seamless integration.
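
As a small illustration of this interoperability, the sketch below moves data between NumPy and pandas. The column names and values are made up for the example.

```python
import numpy as np
import pandas as pd

values = np.array([[85, 88], [90, 87]])  # illustrative data

# A DataFrame can be built directly from a NumPy array
df = pd.DataFrame(values, columns=["math", "science"])

# ...and converted back to a NumPy array
back = df.to_numpy()

# NumPy functions accept pandas objects; the result here is still a Series
col_sqrt = np.sqrt(df["math"])
```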

---

### Summary: When to Use NumPy Arrays Over Lists

| **Reason**                             | **Use NumPy Arrays When**                                                     |
|----------------------------------------|-------------------------------------------------------------------------------|
| **Performance and Speed**              | You need faster computation on large datasets.                                |
| **Memory Efficiency**                  | You need to store large amounts of data and optimize memory usage.            |
| **Vectorized Operations**              | You want to perform element-wise operations efficiently without loops.        |
| **Multidimensional Arrays**            | You work with matrices or higher-dimensional data (e.g., images, time series).|
| **Broadcasting**                       | You need to perform operations on arrays of different but compatible shapes.  |
| **Advanced Indexing and Slicing**      | You need powerful, complex slicing and indexing capabilities.                 |
| **Mathematical and Statistical Functions** | You need to perform efficient numerical computations (e.g., mean, standard deviation). |
| **Interoperability with Libraries**    | You work with libraries like Pandas, SciPy, TensorFlow, etc. that use NumPy arrays. |

In summary, NumPy arrays are more suitable when working with **large numerical datasets** and **performance-critical applications**, while Python lists are more general-purpose and flexible but less efficient for numerical operations.


In [3]:
import numpy as np
matrix1 = np.array([[1, 2], [3, 4]])
matrix2 = np.array([[5, 6], [7, 8]])
result = np.dot(matrix1, matrix2)
print(matrix1)
print(matrix2)
print(result)

[[1 2]
 [3 4]]
[[5 6]
 [7 8]]
[[19 22]
 [43 50]]


In [4]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
vector = np.array([1, 2, 3])
result = arr + vector  # Broadcasts the vector across each row of the matrix
print(result)

[[2 4 6]
 [5 7 9]]


# Pandas DataFrame's apply() method
**Syntax**: 

```df.apply(function)```

```df.apply(function, axis= )```
- axis=0 (default) -> function is applied to each column
- axis=1 -> function is applied to each row

```df.apply(function, result_type= )``` 
- result_type=None (default) -> the return type depends on what the function returns
- result_type='expand' -> list-like results are turned into columns, so the result is a DataFrame
- result_type='reduce' -> returns a Series if possible instead of expanding list-like results (the opposite of 'expand')
- result_type='broadcast' -> results are broadcast to the original shape of the DataFrame; the original index and columns are retained

```df.apply(function, args= ) ```
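- the function is called as function(pd.Series, arg1, arg2, ..., kwarg1=val1, kwarg2=val2, ...)
- args -> tuple of extra positional arguments, e.g. (arg1, arg2, ...)
- additional keyword arguments passed to apply() are forwarded to the function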
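
A minimal sketch of passing extra arguments through `apply()`. The helper `scale` and its parameters `factor` and `offset` are hypothetical names invented for this example.

```python
import pandas as pd

scores_df = pd.DataFrame({
    'math': [85, 90, 78, 92, 88],
    'science': [88, 87, 91, 89, 90],
    'english': [92, 91, 85, 88, 95]
})


def scale(column, factor, offset=0):
    """Hypothetical helper: multiply a column by factor, then add offset."""
    return column * factor + offset


# factor is passed positionally through args; offset is forwarded as a keyword
scaled = scores_df.apply(scale, args=(0.5,), offset=10)
print(scaled)
```

Note that `args` must be a tuple, so a single extra argument needs the trailing comma: `(0.5,)`.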

In [6]:
import pandas as pd

scores_df = pd.DataFrame({
    'math': [85, 90, 78, 92, 88],
    'science': [88, 87, 91, 89, 90],
    'english': [92, 91, 85, 88, 95]
})
scores_df


   math  science  english
0    85       88       92
1    90       87       91
2    78       91       85
3    92       89       88
4    88       90       95


In [8]:
import numpy as np

scores_df.apply(np.sqrt)

       math   science   english
0  9.219544  9.380832  9.591663
1  9.486833  9.327379  9.539392
2  8.831761  9.539392  9.219544
3  9.591663  9.433981  9.380832
4  9.380832  9.486833  9.746794


In [9]:
scores_df.apply(np.mean)

math       86.6
science    89.0
english    90.2
dtype: float64

In [11]:
def divide_by_two(x):
    return x / 2

scores_df.apply(divide_by_two)

   math  science  english
0  42.5     44.0     46.0
1  45.0     43.5     45.5
2  39.0     45.5     42.5
3  46.0     44.5     44.0
4  44.0     45.0     47.5


In [10]:
scores_df.apply(lambda x: x / 2)

   math  science  english
0  42.5     44.0     46.0
1  45.0     43.5     45.5
2  39.0     45.5     42.5
3  46.0     44.5     44.0
4  44.0     45.0     47.5


In [13]:
scores_df.apply(np.mean, axis=0) # apply function over columns

math       86.6
science    89.0
english    90.2
dtype: float64

In [14]:
scores_df.apply(np.mean, axis=1) # apply function over rows

0    88.333333
1    89.333333
2    84.666667
3    89.666667
4    91.000000
dtype: float64

In [20]:
def span(x):
    return [np.min(x), np.max(x)]
print(scores_df.apply(span))
print(scores_df.apply(span, result_type='expand'))
print(scores_df.apply(span, result_type='expand', axis=1))


   math  science  english
0    78       87       85
1    92       91       95
   math  science  english
0    78       87       85
1    92       91       95
    0   1
0  85  92
1  87  91
2  78  91
3  88  92
4  88  95
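
The effect of `result_type='expand'` is easiest to see next to the default row-wise behavior. In the sketch below, the same `span` function applied with `axis=1` and no `result_type` gives a Series whose values are lists. Returning a `pd.Series` from the function is another option that yields labeled columns; the labels `'min'` and `'max'` are choices made for this illustration.

```python
import numpy as np
import pandas as pd

scores_df = pd.DataFrame({
    'math': [85, 90, 78, 92, 88],
    'science': [88, 87, 91, 89, 90],
    'english': [92, 91, 85, 88, 95]
})


def span(x):
    return [np.min(x), np.max(x)]


# Default: each row's list is kept as a single value in a Series
as_lists = scores_df.apply(span, axis=1)

# 'expand': each list is spread across columns 0 and 1 of a DataFrame
expanded = scores_df.apply(span, axis=1, result_type='expand')

# Returning a Series gives the resulting columns meaningful labels
labeled = scores_df.apply(
    lambda row: pd.Series({'min': row.min(), 'max': row.max()}), axis=1
)
```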


In [21]:
print(scores_df.apply(np.mean))
print(scores_df.apply(np.mean, result_type='broadcast'))
print(scores_df.apply(np.mean, axis=1))


math       86.6
science    89.0
english    90.2
dtype: float64
   math  science  english
0    86       89       90
1    86       89       90
2    86       89       90
3    86       89       90
4    86       89       90
0    88.333333
1    89.333333
2    84.666667
3    89.666667
4    91.000000
dtype: float64
