# NumPy - Numerical Python

- **Foundational package for numerical computing in Python**  
- It is widely used for the following capabilities:

  - `ndarray` is an **efficient multidimensional array** for storing and manipulating numerical data.
  - Enables **faster mathematical operations** on entire arrays without writing explicit loops.
  - Supports **linear algebra, random number generation, and Fourier transform capabilities**.
  - Provides a **C API** for connecting NumPy with libraries written in C, C++, or FORTRAN.


## Why NumPy is Efficient for Large Array Data

- **NumPy is used mainly for its efficient handling of large amounts of array data.**
- It provides this efficiency because:
  - NumPy stores data internally in **contiguous memory blocks**, unlike Python's built-in lists, which store pointers to separately allocated objects.
  - NumPy's **core algorithms are written in C** and operate directly on this memory, without per-element type checks or other interpreter overhead.
  - Its operations perform **complex computations on entire arrays without the need for Python `for` loops.**
  - It **uses less memory** compared to other Python sequences.
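
To make the memory claims above concrete, here is a rough sketch comparing a NumPy array with a list of the same integers. It assumes CPython and an explicit `int64` dtype. `sys.getsizeof` reports only the list's own pointer array, so the element objects are added separately, and the exact byte counts depend on the Python version.

```python
import sys

import numpy as np

n = 1_000_000
arr = np.arange(n, dtype=np.int64)  # one contiguous buffer of 8-byte integers
lst = list(range(n))                # an array of pointers to separate int objects

print(arr.flags["C_CONTIGUOUS"])    # True: elements sit side by side in memory
print(arr.itemsize, arr.nbytes)     # 8 bytes per element, 8000000 bytes in total

# Rough list total: the pointer array plus every int object it refers to
# (small cached ints are counted once per occurrence, so this is approximate).
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
print(list_bytes > arr.nbytes)      # True
```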
    
The following example compares how long it takes to **multiply every element of a one-million-element array and of an equivalent Python list by 2**.


In [2]:
import numpy as np

my_arr = np.arange(1000000)
my_list = list(range(1000000))

print("Time taken to multiply the entire array by 2:")
%time for _ in range(10): my_arr2 = my_arr * 2

print("Time taken to multiply the entire list by 2:")
%time for _ in range(10): my_list2 = [x * 2 for x in my_list]

Time taken to multiply the entire array by 2:
CPU times: user 32.1 ms, sys: 12.7 ms, total: 44.8 ms
Wall time: 46.2 ms
Time taken to multiply the entire list by 2:
CPU times: user 527 ms, sys: 363 ms, total: 890 ms
Wall time: 912 ms


## The NumPy `ndarray`: A Multidimensional Array Object

One of the **key features of NumPy** is its **N-dimensional array object (`ndarray`)**, which is a **fast and flexible container for large datasets in Python**.

- An `ndarray` is a **generic multidimensional container for homogeneous data**:
  - All elements in the array must be of the **same data type**.
  
- **Every array has:**
  - **`shape`**: A tuple indicating the **size of each dimension** of the array.
  - **`dtype`**: An object describing the **data type of the array**.


In [3]:
#Simple demonstration of numpy

import numpy as np

data = np.random.randn(2,3) #This will create a random array of dimension 2*3
print(data)

#Print the shape
print(f"Shape: {data.shape}")

#Print the datatype
print(f"Datatype: {data.dtype}")

[[-0.8630654   0.47843727 -0.09398477]
 [-0.95014735  2.24061672 -1.80952752]]
Shape: (2, 3)
Datatype: float64


## Creating `ndarray`s

There are many ways to create `ndarray`s in NumPy:

1. **Using the `array` function**  
   - Pass a **list or any sequence-like object** (including another array) to generate a new `ndarray`.
   - **Note:** Nested sequences (like a list of equal-length lists) will be converted into a **multidimensional array**.

2. **Arrays of zeros and ones**  
   - Use `np.zeros(shape)` to create an array filled with **zeros**.
   - Use `np.ones(shape)` to create an array filled with **ones**.

3. **Creating an empty array**  
   - Use `np.empty(shape)` to allocate an array **without initializing it**; its contents are whatever happened to be in that memory (possibly zeros, possibly arbitrary leftover values).

4. **Using `arange`**  
   - `np.arange` is an **array-valued version of the built-in Python `range` function**.

**Note:**  
The **number of dimensions** of an array can be found using the **`ndim` attribute**.


In [4]:
#Method 1
list_data = [1,2,3,4,5]
arr1 = np.array(list_data)
print(f"Array 1: {arr1}")

#Multidimensional array
multidim_arr = np.array([[3,4,5],[2,3,4]])
print(f"Multidimensional array: {multidim_arr}")
print(f"Number of dimensions: {multidim_arr.ndim}")

#Method 2
arr2 = np.zeros((2,3))    # The shape must be passed as a tuple
print(f"Array 2: {arr2}")
arr2 = np.ones((2,3))
print(f"Array 2: {arr2}")

#Method 3
arr3 = np.empty((2,3))    # Uninitialized: values are whatever was left in memory
print(f"Array 3: {arr3}")

#Method 4
arr4 = np.arange(15)
print(f"Array 4: {arr4}")

Array 1: [1 2 3 4 5]
Multidimensional array: [[3 4 5]
 [2 3 4]]
Number of dimensions: 2
Array 2: [[0. 0. 0.]
 [0. 0. 0.]]
Array 2: [[1. 1. 1.]
 [1. 1. 1.]]
Array 3: [[1. 1. 1.]
 [1. 1. 1.]]
Array 4: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14]


## Array Creation Functions in NumPy

| **Name**        | **Syntax**                                | **Description**                                                                                                                                 |
|-----------------|-------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------|
| `array`         | `np.array(data, dtype=None)`             | Converts input data (list, tuple, array, or other sequence) to an `ndarray` by inferring or explicitly specifying `dtype`; copies data by default. |
| `asarray`       | `np.asarray(data, dtype=None)`           | Converts input to `ndarray`, **does not copy** if the input is already an `ndarray`.                                                            |
| `arange`        | `np.arange(start, stop, step)`           | Like the built-in `range` but returns an `ndarray` instead of a list.                                                                          |
| `ones`, `ones_like` | `np.ones(shape, dtype=None)`, `np.ones_like(a)` | Produces an array of all 1s with the given shape and dtype; `ones_like` creates a ones array of the **same shape and dtype as another array**.  |
| `zeros`, `zeros_like` | `np.zeros(shape, dtype=None)`, `np.zeros_like(a)` | Like `ones` and `ones_like` but producing arrays of 0s instead.                                                                                |
| `empty`, `empty_like` | `np.empty(shape, dtype=None)`, `np.empty_like(a)` | Creates new arrays by allocating memory without populating values (may contain garbage values).                                                |
| `full`, `full_like`   | `np.full(shape, fill_value, dtype=None)`, `np.full_like(a, fill_value)` | Produces an array of the given shape and dtype with all values set to the indicated “fill value”; `full_like` uses another array’s shape and dtype. |
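| `eye`, `identity`     | `np.eye(N, M=None, k=0)`, `np.identity(N)`       | `identity` creates a square N × N **identity matrix** (1s on the diagonal, 0s elsewhere); `eye` does the same by default but also accepts a column count `M` and a diagonal offset `k`. |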
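
A short sketch exercising a few of the functions in the table above; the array values are arbitrary illustrations.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.asarray(a)        # a is already a float64 ndarray, so no copy is made
c = np.array(a)          # np.array copies by default
print(b is a, c is a)    # True False

filled = np.full((2, 3), 7)      # 2x3 array with every element set to 7
zeros = np.zeros_like(filled)    # zeros with the same shape and dtype as filled
ident = np.eye(3)                # 3x3 identity matrix of float64
stepped = np.arange(0, 10, 3)    # start, stop (exclusive), step
print(stepped)                   # [0 3 6 9]
```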


## Data Types of NumPy `ndarray`s

NumPy provides a **rich set of data types (`dtype`)** for efficient and precise storage of numerical and other data within `ndarray`s. Choosing the right `dtype` helps in:

- **Memory optimization** when working with large datasets  
- **Ensuring compatibility** with C and FORTRAN libraries  
- **Precision control** in numerical computations

---

### Table: Common NumPy Data Types

| **Type** | **Type Code** | **Description** |
|----------|---------------|-----------------|
| `int8`, `uint8` | `i1`, `u1` | Signed and unsigned 8-bit (1 byte) integer types |
| `int16`, `uint16` | `i2`, `u2` | Signed and unsigned 16-bit integer types |
| `int32`, `uint32` | `i4`, `u4` | Signed and unsigned 32-bit integer types |
| `int64`, `uint64` | `i8`, `u8` | Signed and unsigned 64-bit integer types |
| `float16` | `f2` | Half-precision floating point |
| `float32` | `f4` or `f` | Standard single-precision floating point (compatible with C `float`) |
| `float64` | `f8` or `d` | Standard double-precision floating point (compatible with C `double` and Python `float`) |
| `float128` | `f16` or `g` | Extended-precision floating point (platform-dependent; not available on every system) |
| `complex64`, `complex128`, `complex256` | `c8`, `c16`, `c32` | Complex numbers represented by two 32, 64, or 128-bit floats, respectively (`complex256` is platform-dependent) |
| `bool` | `?` | Boolean type storing `True` and `False` values |
| `object` | `O` | Python object type; elements can be any Python object |
| `bytes_` (formerly `string_`) | `S` | Fixed-length byte string type (1 byte per character); e.g., `'S10'` for 10-character strings. The `np.string_` alias was removed in NumPy 2.0 |
| `str_` (formerly `unicode_`) | `U` | Fixed-length Unicode string type (4 bytes per character); e.g., `'U10'` for 10-character Unicode strings. The `np.unicode_` alias was removed in NumPy 2.0 |

---
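
The type codes in the table can be passed anywhere a dtype is expected. This sketch checks a few of them and shows the silent truncation of fixed-width string types; the values are illustrative.

```python
import numpy as np

# Type codes and named types describe the same dtypes
print(np.dtype("i4") == np.int32, np.dtype("f8") == np.float64)  # True True
print(np.dtype("u1").itemsize, np.dtype("c16").itemsize)         # 1 16

# 'S' arrays store bytes; longer inputs are silently truncated to the width
s = np.array(["abc", "defgh"], dtype="S3")
print(s)            # [b'abc' b'def']

# 'U' arrays use 4 bytes per character, so 'U10' occupies 40 bytes per element
u = np.array(["abc"], dtype="U10")
print(u.itemsize)   # 40
```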

### Additional Notes

✅ The **`dtype` attribute** of an `ndarray` shows the data type of its elements:
```python
import numpy as np
arr = np.array([1, 2, 3], dtype=np.int32)
print(arr.dtype)  # Output: int32
```


In [5]:
arr1 = np.array([1,2,3,4], dtype= np.float64)
print(arr1.dtype)
arr2 = np.array([1,2,3,4], dtype= np.int32)
print(arr2.dtype)

float64
int32


## Type Casting with `astype` in NumPy

You can **explicitly convert (cast) an array from one `dtype` to another** using the **`astype` method** of `ndarray`.

### Key Points:

- If you **cast floating-point numbers to an integer `dtype`**, the **decimal part is truncated toward zero** (not rounded).  
- If you have an **array of strings representing numbers**, you can use `astype` to convert them to **numeric form** for computations.

### Note:
- **Caution with fixed-width string types:** `np.bytes_` (called `np.string_` before NumPy 2.0) has a fixed size and may truncate input strings without warning.
- For handling non-numeric or text data, pandas is often a better choice because it handles variable-length strings more intuitively.

**By default (`copy=True`), `astype` always creates a new array (a copy of the data), even if the new dtype is the same as the old dtype.**

In [6]:
import numpy as np

# Floating-point to integer (decimal part truncated)
arr_float = np.array([3.7, 1.2, -2.8])
arr_int = arr_float.astype(np.int32)
print(arr_int)

# Strings representing numbers to numeric
arr_str = np.array(['1.5', '2.3', '3.7'])
arr_float_from_str = arr_str.astype(np.float64)
print(arr_float_from_str)

[ 3  1 -2]
[1.5 2.3 3.7]
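
If rounding rather than truncation is wanted, one option is to round before casting. The sketch below, with arbitrary values, also checks the copy behavior described above and uses the `copy=False` argument of `astype` to skip the copy when no conversion is needed.

```python
import numpy as np

arr = np.array([1.5, -1.5, 2.7])
print(arr.astype(np.int64))            # [ 1 -1  2]: truncation toward zero
print(np.rint(arr).astype(np.int64))   # [ 2 -2  3]: round first (halves go to even)

same = arr.astype(np.float64)          # same dtype, but still a new array by default
print(same is arr, np.shares_memory(same, arr))  # False False

no_copy = arr.astype(np.float64, copy=False)  # may return arr itself
print(no_copy is arr)                  # True
```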


## Vectorization and Broadcasting in NumPy

Arrays are important in NumPy because they **enable you to express batch operations on data without writing explicit `for` loops.** This makes your code:

- Faster  
- Cleaner  
- Closer to mathematical notation

---

### Vectorization

**Vectorization** refers to **performing operations on entire arrays without explicit loops**.

Arithmetic operations between **equal-size arrays are applied element-wise**.

### Broadcasting

**Broadcasting** refers to **performing element-wise operations on arrays of different but compatible shapes by conceptually stretching the smaller array across the larger one**, without actually copying its data.

This allows:

- Operations between differently shaped arrays without making unnecessary copies.

- Efficient memory and computation for operations like adding a scalar or row/column to an entire matrix.


In [7]:
arr = np.array([[1,2,3], [4,5,6]])
arr

array([[1, 2, 3],
       [4, 5, 6]])

In [8]:
arr * arr

array([[ 1,  4,  9],
       [16, 25, 36]])

In [9]:
arr + arr

array([[ 2,  4,  6],
       [ 8, 10, 12]])

In [10]:
arr - arr

array([[0, 0, 0],
       [0, 0, 0]])

In [11]:
1/ arr

array([[1.        , 0.5       , 0.33333333],
       [0.25      , 0.2       , 0.16666667]])

In [12]:
arr ** 0.5

array([[1.        , 1.41421356, 1.73205081],
       [2.        , 2.23606798, 2.44948974]])

In [13]:
arr > 3

array([[False, False, False],
       [ True,  True,  True]])
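
The cells above combine arrays of the same shape, or an array with a scalar. The sketch below, with arbitrary values, broadcasts a 1-D row and a 2-D column against a 2×3 array and shows the error raised when shapes are incompatible.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])             # shape (3,): stretched across both rows
col = np.array([[100], [200]])           # shape (2, 1): stretched across all columns

print(arr + row)   # [[11 22 33], [14 25 36]]
print(arr + col)   # [[101 102 103], [204 205 206]]

# Subtracting the column means (shape (3,)) centers every column at zero
centered = arr - arr.mean(axis=0)
print(centered.mean(axis=0))   # [0. 0. 0.]

# Incompatible trailing dimensions raise an error instead of broadcasting
try:
    arr + np.array([1, 2])
except ValueError as exc:
    print("ValueError:", exc)
```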

## Basic Indexing and Slicing in NumPy

NumPy provides **powerful and intuitive indexing and slicing capabilities** for working with arrays efficiently.

---

### Assigning Values to Slices

- If you assign a **scalar value to a slice**, the value is **propagated (broadcasted) to the entire selection**.

**Example:**
```python
arr[5:8] = 12
```


In [14]:
arr = np.arange(10)
arr

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [15]:
print(arr[5])

5


In [16]:
#slice indexing
print(arr[4:7])

[4 5 6]


In [17]:
# Assigning a scalar to a slice
arr[3:6] = 12
print(arr)

[ 0  1  2 12 12 12  6  7  8  9]


## Difference Between NumPy Views and Python List Copies

Understanding how **slicing behaves in NumPy arrays vs. Python lists** is essential for efficient data manipulation.

---

### Python Lists: Slicing Creates Copies

- When you slice a Python list, it **creates a new independent copy** of the data.

- Modifying the slice **does *not* affect the original list.**



In [18]:
lst = [1, 2, 3, 4, 5]
sub_lst = lst[1:4]
sub_lst[0] = 99

print(lst)       # [1, 2, 3, 4, 5]
print(sub_lst)   # [99, 3, 4]

[1, 2, 3, 4, 5]
[99, 3, 4]


### NumPy Arrays: Slicing Creates Views

- When you slice a NumPy array, it **creates a view on the same data**, not a copy.

- Modifying the slice ***also modifies*** the original array.

- This behavior is more memory-efficient when working with large datasets, because subarrays can be modified without duplicating data.

**Caution:** Changes to a NumPy slice affect the original array unless you explicitly create a copy, e.g. `arr[1:4].copy()`.

In [19]:
arr = np.array([1, 2, 3, 4, 5])
sub_arr = arr[1:4]
sub_arr[0] = 99

print(arr)       
print(sub_arr) 

[ 1 99  3  4  5]
[99  3  4]


In [20]:
# Creating an explicit copy in NumPy
arr = np.array([1,2,3,4,5])
sub_arr = arr[1:4].copy()
sub_arr[0] = 99

print(arr)       # [1 2 3 4 5]: unchanged
print(sub_arr)   # [99  3  4]

[1 2 3 4 5]
[99  3  4]
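
To check whether an array is a view or an independent copy, the `base` attribute and `np.shares_memory` can be used. A small sketch with arbitrary data:

```python
import numpy as np

arr = np.arange(10)
view = arr[2:5]          # basic slice: a view on arr's data
copy = arr[2:5].copy()   # explicit copy with its own buffer

print(view.base is arr, np.shares_memory(view, arr))    # True True
print(copy.base is None, np.shares_memory(copy, arr))   # True False

arr[:] = 0      # the bare slice [:] assigns to every element of arr in place
print(view)     # [0 0 0]: the view sees the change
print(copy)     # [2 3 4]: the copy does not
```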


### Indexing in higher dimensions

- With higher-dimensional arrays, you have many more options. In a two-dimensional array, the element at each index is no longer a scalar but a one-dimensional array.
- Individual elements can therefore be accessed recursively (e.g. `arr[2][3]`), but that is a bit too much work, so you can instead pass a comma-separated list of indices (e.g. `arr[2, 3]`) to select individual elements.
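
A minimal sketch, using an arbitrary 3×3 array, of the two indexing forms and of how integer indices and slices affect the dimensions of the result:

```python
import numpy as np

arr2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(arr2d[2][0], arr2d[2, 0])   # 7 7: recursive and comma-separated forms agree

# An integer index drops that axis; a length-1 slice keeps it
print(arr2d[1].shape, arr2d[1:2].shape)   # (3,) (1, 3)

# A slice on the first axis with an integer on the second selects a column
print(arr2d[:, 1])                        # [2 5 8]
```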

In [21]:
arr = np.array([[1,2,3,4],[2,3,4,5],[2,3,4,5]])
arr

array([[1, 2, 3, 4],
       [2, 3, 4, 5],
       [2, 3, 4, 5]])

In [22]:
arr[2]

array([2, 3, 4, 5])

In [23]:
arr[2][3]    # Recursive access; arr[2, 3] selects the same element

np.int64(5)

In [24]:
arr3d = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])
arr3d

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [25]:
# arr3d[0][1] produces the same result, but arr3d[0, 1] is more efficient because it does not create an intermediate array
arr3d[0, 1]

array([4, 5, 6])

In [26]:
# Slicing along the first axis selects rows
arr = np.array([[1,2,3],[2,3,4],[7,8,9],[10,22,12]])
# Select the first two rows
print(arr[:2])
# Select the rows from index 2 onward (here, the last two rows)
print(arr[2:])

[[1 2 3]
 [2 3 4]]
[[ 7  8  9]
 [10 22 12]]


In [27]:
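# Slicing along both axes: the first two rows, columns from index 1 onward
print(arr[:2, 1:])

[[2 3]
 [3 4]]


In [28]:
# Mixing an integer index with a slice gives a lower-dimensional result: column 2 of the first two rows
print(arr[:2, 2])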

[3 4]


## Boolean Indexing

In [29]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
names

array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'], dtype='<U4')

In [30]:
data = np.random.randn(7, 4)
data

array([[-0.28273789, -1.11208525, -0.37231548,  0.31306062],
       [-0.44194162, -1.54063316, -1.05036211,  0.58080929],
       [ 1.11222845,  0.27647194,  1.48104476,  1.56377178],
       [ 0.35253815, -0.59100801, -0.42571688, -0.52159654],
       [ 0.77240543,  1.2324345 ,  0.70002945, -0.00208388],
       [ 0.85497889,  1.11829505,  0.47427978,  0.76630675],
       [ 1.14187451, -0.425524  , -0.72642112,  0.20043184]])

In [31]:
names == 'Bob'

array([ True, False, False,  True, False, False, False])

In [32]:
data[names=='Bob']

array([[-0.28273789, -1.11208525, -0.37231548,  0.31306062],
       [ 0.35253815, -0.59100801, -0.42571688, -0.52159654]])

In [33]:
data[~(names=='Bob')]

array([[-0.44194162, -1.54063316, -1.05036211,  0.58080929],
       [ 1.11222845,  0.27647194,  1.48104476,  1.56377178],
       [ 0.77240543,  1.2324345 ,  0.70002945, -0.00208388],
       [ 0.85497889,  1.11829505,  0.47427978,  0.76630675],
       [ 1.14187451, -0.425524  , -0.72642112,  0.20043184]])

In [34]:
# Combine conditions with & (and) and | (or); Python's and/or keywords do not work with boolean arrays
mask = (names == 'Bob') | (names == 'Will')
data[mask]

array([[-0.28273789, -1.11208525, -0.37231548,  0.31306062],
       [ 1.11222845,  0.27647194,  1.48104476,  1.56377178],
       [ 0.35253815, -0.59100801, -0.42571688, -0.52159654],
       [ 0.77240543,  1.2324345 ,  0.70002945, -0.00208388]])

In [35]:
#To set all of the negative values in data to 0
data[data<0] = 0
data

array([[0.        , 0.        , 0.        , 0.31306062],
       [0.        , 0.        , 0.        , 0.58080929],
       [1.11222845, 0.27647194, 1.48104476, 1.56377178],
       [0.35253815, 0.        , 0.        , 0.        ],
       [0.77240543, 1.2324345 , 0.70002945, 0.        ],
       [0.85497889, 1.11829505, 0.47427978, 0.76630675],
       [1.14187451, 0.        , 0.        , 0.20043184]])
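
Two details are worth checking in isolation: `&` and `|` need parentheses because they bind more tightly than `==`, and selecting data with a boolean array always creates a copy. A small sketch with made-up data:

```python
import numpy as np

names = np.array(["Bob", "Joe", "Will", "Bob"])
data = np.arange(8).reshape(4, 2)

# Parentheses are required: & and | bind more tightly than == and >
mask = (names == "Bob") & (data[:, 0] > 0)
print(mask)            # [False False False  True]
print(data[mask])      # [[6 7]]

# Boolean indexing returns a copy, so editing the result leaves data alone
picked = data[names == "Bob"]
picked[:] = -1
print(data[0])         # [0 1]
```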

## Fancy Indexing

Fancy indexing is a term adopted by NumPy to describe indexing using integer arrays.

In [36]:
arr = np.empty((8, 4))
for i in range(8):
    arr[i] = i
arr

array([[0., 0., 0., 0.],
       [1., 1., 1., 1.],
       [2., 2., 2., 2.],
       [3., 3., 3., 3.],
       [4., 4., 4., 4.],
       [5., 5., 5., 5.],
       [6., 6., 6., 6.],
       [7., 7., 7., 7.]])

In [37]:
#To select out a subset of the rows in a particular order, you can simply pass a list or ndarray of integers specifying the desired order
arr[[4, 3, 0, 6]]

array([[4., 4., 4., 4.],
       [3., 3., 3., 3.],
       [0., 0., 0., 0.],
       [6., 6., 6., 6.]])

In [38]:
#Using negative indices selects rows from the end
arr[[-3, -5, -7]]

array([[5., 5., 5., 5.],
       [3., 3., 3., 3.],
       [1., 1., 1., 1.]])

In [39]:
#Passing multiple index arrays does something slightly different
#it selects a one-dimensional array of elements corresponding to each tuple of indices
arr = np.arange(32).reshape((8, 4))
arr

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23],
       [24, 25, 26, 27],
       [28, 29, 30, 31]])

In [40]:
arr[[1, 5, 7, 2], [0, 3, 1, 2]]
#Here the elements (1, 0), (5, 3), (7, 1), and (2, 2) were selected

array([ 4, 23, 29, 10])

In [41]:
# To select a rectangular region (these rows, then these columns, each in the given order), index the rows first and then the columns
arr[[1, 5, 7, 2]][:, [0, 3, 1, 2]]

array([[ 4,  7,  5,  6],
       [20, 23, 21, 22],
       [28, 31, 29, 30],
       [ 8, 11,  9, 10]])
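
`np.ix_` builds the same rectangular selection in a single indexing step. The sketch below checks that the two forms agree and that fancy indexing, unlike slicing, copies the data.

```python
import numpy as np

arr = np.arange(32).reshape((8, 4))
rows, cols = [1, 5, 7, 2], [0, 3, 1, 2]

block = arr[np.ix_(rows, cols)]   # 4x4 region: every listed row with every listed column
print(np.array_equal(block, arr[rows][:, cols]))   # True

# Fancy indexing copies the selected data
picked = arr[[0, 1]]
picked[:] = -1
print(arr[0])   # [0 1 2 3]: the original is unchanged
```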

## Transposing Arrays and Swapping Axes

### Transposing Arrays

- **Transposing** is a **special form of reshaping** in NumPy that **returns a *view* on the underlying data without copying anything**.

- This allows you to **rearrange the axes of an array efficiently for numerical computations** without additional memory overhead.

---

### Methods to Transpose

1. **Using the `transpose` method**
2. **Using the special `.T` attribute**

- Both methods swap the axes of the array to produce the transposed view.

**Transposing and swapping axes are essential when**:

- Performing matrix operations.

- Adjusting axes to match expected input shapes for functions.

- Rearranging data layouts without copying, ensuring efficient memory usage.

In [42]:
arr = np.arange(15).reshape((3, 5))
arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

In [43]:
arr.T

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [44]:
arr.transpose()

array([[ 0,  5, 10],
       [ 1,  6, 11],
       [ 2,  7, 12],
       [ 3,  8, 13],
       [ 4,  9, 14]])

In [45]:
# For higher-dimensional arrays, transpose accepts a tuple of axis numbers to permute the axes
arr = np.arange(16).reshape((2, 2, 4))
arr.transpose((1, 0, 2))
# Here the axes have been reordered with the second axis first, the first axis second, and the last axis unchanged.

array([[[ 0,  1,  2,  3],
        [ 8,  9, 10, 11]],

       [[ 4,  5,  6,  7],
        [12, 13, 14, 15]]])

In [46]:
# Simple transposing with .T is a special case of swapping axes
# ndarray has the method swapaxes, which takes a pair of axis numbers and switches the indicated axes to rearrange the data
arr.swapaxes(1, 2)
#swapaxes similarly returns a view on the data without making a copy.

array([[[ 0,  4],
        [ 1,  5],
        [ 2,  6],
        [ 3,  7]],

       [[ 8, 12],
        [ 9, 13],
        [10, 14],
        [11, 15]]])
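
Because `.T` returns a view, writing through it changes the original array. The sketch below demonstrates that on an arbitrary 2×3 array and also computes `arr.T` times `arr`, a common reason to transpose in matrix work.

```python
import numpy as np

arr = np.arange(6).reshape((2, 3))
t = arr.T
print(t.shape, np.shares_memory(t, arr))   # (3, 2) True: a view, not a copy

t[0, 1] = 100        # element (0, 1) of the transpose is element (1, 0) of arr
print(arr[1, 0])     # 100

gram = np.dot(arr.T, arr)   # inner product of arr with itself, shape (3, 3)
print(gram.shape)
```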

## Universal Functions (ufuncs): Fast Element-Wise Array Functions

A **universal function (ufunc)** in NumPy is a **function that performs fast, element-wise operations on `ndarray` data.**

You can think of ufuncs as **fast, vectorized wrappers around functions that take one or more scalar inputs and produce one or more scalar outputs.**

Ufuncs help in:
- Writing cleaner, loop-free numerical code  
- Using an optimized, compiled C backend for speed  
- Expressing element-wise operations naturally

---

## Table: Unary Universal Functions

| **Function** | **Description** |
|--------------|-----------------|
| `abs`, `fabs` | Compute the absolute value element-wise for integer, float, or complex values (`fabs` works only on non-complex values and always returns floats) |
| `sqrt` | Compute the square root of each element (`arr ** 0.5`) |
| `square` | Compute the square of each element (`arr ** 2`) |
| `exp` | Compute the exponent *e^x* of each element |
| `log`, `log10`, `log2`, `log1p` | Natural log, log base 10, log base 2, log(1 + x) |
| `sign` | Compute the sign of each element: 1 (positive), 0 (zero), -1 (negative) |
| `ceil` | Compute the ceiling (smallest integer >= element) |
| `floor` | Compute the floor (largest integer <= element) |
| `rint` | Round to nearest integer, preserving dtype |
| `modf` | Return fractional and integral parts of each element as separate arrays |
| `isnan` | Boolean array indicating whether each value is NaN |
| `isfinite`, `isinf` | Boolean array indicating whether each value is finite or infinite |
| `cos`, `cosh`, `sin`, `sinh`, `tan`, `tanh` | Regular and hyperbolic trigonometric functions |
| `arccos`, `arccosh`, `arcsin`, `arcsinh`, `arctan`, `arctanh` | Inverse trigonometric functions |
| `logical_not` | Compute truth value of `not x` element-wise (equivalent to `~arr` for boolean arrays) |

---
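
A few of the unary functions applied to an arbitrary sample array, chosen to include negatives, zero, and a tie at .5:

```python
import numpy as np

arr = np.array([-2.5, -0.5, 0.0, 1.5])
print(np.sign(arr))    # values -1, -1, 0, 1
print(np.ceil(arr))    # values -2, -0, 0, 2
print(np.floor(arr))   # values -3, -1, 0, 1
print(np.rint(arr))    # values -2, -0, 0, 2: ties round to the nearest even integer

# log of a negative number is nan and log(0) is -inf; suppress the warnings
with np.errstate(invalid="ignore", divide="ignore"):
    logs = np.log(arr)
print(np.isnan(logs), np.isinf(logs))
```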

## Table: Binary Universal Functions

| **Function** | **Description** |
|--------------|-----------------|
| `add` | Element-wise addition |
| `subtract` | Element-wise subtraction (first - second) |
| `multiply` | Element-wise multiplication |
| `divide`, `floor_divide` | Element-wise division, or floor division (rounding down and discarding the remainder) |
| `power` | Raise elements in first array to powers in second array |
| `maximum`, `fmax` | Element-wise maximum (`fmax` ignores NaN) |
| `minimum`, `fmin` | Element-wise minimum (`fmin` ignores NaN) |
| `mod` | Element-wise modulus (remainder after division) |
| `copysign` | Copy sign of values in second array to first array |
| `greater`, `greater_equal`, `less`, `less_equal`, `equal`, `not_equal` | Element-wise comparison, returning boolean arrays (equivalent to `>`, `>=`, `<`, `<=`, `==`, `!=`) |
| `logical_and`, `logical_or`, `logical_xor` | Compute element-wise logical operations |

---
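
The sketch below, with made-up inputs containing NaN, contrasts `maximum` with `fmax`, writes a result into a preallocated array through the optional `out` argument that ufuncs accept, and shows the sign convention of `mod`.

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([2.0, 2.0, np.nan])

print(np.maximum(x, y))   # 2, nan, nan: NaN propagates
print(np.fmax(x, y))      # 2, 2, 3: NaN is ignored if the other value is a number

out = np.empty(3)
np.add(x, 1, out=out)     # the result is written into the existing array `out`
print(out)                # 2, nan, 4
print(np.mod(np.array([7, -7]), 3))   # 1, 2: the result takes the sign of the divisor
```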


In [47]:
arr = np.arange(10)
np.sqrt(arr)

array([0.        , 1.        , 1.41421356, 1.73205081, 2.        ,
       2.23606798, 2.44948974, 2.64575131, 2.82842712, 3.        ])

In [48]:
np.exp(arr)

array([1.00000000e+00, 2.71828183e+00, 7.38905610e+00, 2.00855369e+01,
       5.45981500e+01, 1.48413159e+02, 4.03428793e+02, 1.09663316e+03,
       2.98095799e+03, 8.10308393e+03])

In [49]:
x = np.random.randn(8)
y = np.random.randn(8)
np.maximum(x, y)

array([ 0.12849997,  0.83476973,  0.31425327, -0.98986966,  0.09767792,
        0.08326504, -0.54368871,  1.27982753])

In [50]:
# While not common, a ufunc can return multiple arrays. modf is one example: a vectorized version of Python's math.modf, it returns the fractional and integral parts of a floating-point array
arr = np.random.randn(7) * 5
print(arr)
fractional_part, whole_part = np.modf(arr)
print(fractional_part)
print(whole_part)

[ -7.91468106  -5.06372924  -3.60479243   4.37455909   2.92383102
 -12.76334452  -3.47932152]
[-0.91468106 -0.06372924 -0.60479243  0.37455909  0.92383102 -0.76334452
 -0.47932152]
[ -7.  -5.  -3.   4.   2. -12.  -3.]


In [51]:
# Taking the square root of negative values yields nan (NumPy also emits a RuntimeWarning)
print(arr)
np.sqrt(arr)

[ -7.91468106  -5.06372924  -3.60479243   4.37455909   2.92383102
 -12.76334452  -3.47932152]


array([       nan,        nan,        nan, 2.09154467, 1.70992135,
              nan,        nan])

In [52]:
# Ufuncs accept an optional out argument that lets them operate in place on arrays
np.sqrt(arr, out=arr)


array([       nan,        nan,        nan, 2.09154467, 1.70992135,
              nan,        nan])

In [53]:
arr

array([       nan,        nan,        nan, 2.09154467, 1.70992135,
              nan,        nan])

## Mathematical and Statistical Methods in NumPy

NumPy provides a **set of mathematical functions that compute statistics on an entire array or along a specified axis.**

- These are available as **methods on the `ndarray` object** or as **top-level NumPy functions**.

- Such operations are often called **aggregations (reductions)** as they reduce an array to a single value or summary values along an axis.

---

## Table: Common Mathematical and Statistical Methods

| **Method** | **Description** |
|------------|-----------------|
| `sum` | Sum of all elements in the array or along an axis; zero-length arrays return 0 |
| `mean` | Arithmetic mean; zero-length arrays return NaN |
| `std`, `var` | Standard deviation and variance, respectively, with optional degrees of freedom adjustment (`ddof`, default denominator `n`) |
| `min`, `max` | Minimum and maximum values in the array |
| `argmin`, `argmax` | Indices of the minimum and maximum elements, respectively |
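| `cumsum` | Cumulative sum of elements (each entry is the sum of all elements up to and including that position) |
| `cumprod` | Cumulative product of elements (each entry is the product of all elements up to and including that position) |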

---
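
A small sketch of the `axis` and `ddof` arguments and of what `argmax` and `cumsum` return, on arbitrary data:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5]])
print(arr.sum(axis=0))        # [3 5 7]: sum down each column
print(arr.mean(axis=1))       # [1. 4.]: mean across each row
print(arr.argmax())           # 5: index into the flattened array
print(np.cumsum([1, 2, 3]))   # [1 3 6]: the first entry is the first element

data = np.array([1.0, 2.0, 3.0, 4.0])
print(data.std())             # population std (denominator n), about 1.118
print(data.std(ddof=1))       # sample std (denominator n - 1), about 1.291
```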



In [54]:
arr = np.random.randn(5, 4)
arr

array([[ 0.56169134, -1.91419319, -1.28497082,  0.57616855],
       [ 0.84118541, -0.49722529,  0.94496615, -0.94069129],
       [ 0.49032824,  1.38487464,  0.88954773,  1.8982164 ],
       [-2.20457978, -0.1887958 ,  0.30625018, -1.13017121],
       [-1.54906699, -0.38085592,  0.1943239 , -0.10847915]])

In [55]:
print(f"mean: {arr.mean()}")
print(f"sum: {arr.sum()}")


mean: -0.1055738435785311
sum: -2.111476871570622


In [56]:
#Functions like mean and sum take an optional axis argument that computes the statistic over the given axis, resulting in an array with one fewer dimension
arr.mean(axis=1)
# arr.mean(axis=1) computes the mean across the columns (one value per row); arr.sum(axis=0) computes the sum down the rows (one value per column)

array([-0.51532603,  0.08705875,  1.16574175, -0.80432415, -0.46101954])

In [57]:
#In multidimensional arrays, accumulation functions like cumsum return an array of the same size, but with the partial aggregates computed along the indicated axis according to each lower dimensional slice
arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])
print(arr)
print(arr.cumsum(axis=0))

[[0 1 2]
 [3 4 5]
 [6 7 8]]
[[ 0  1  2]
 [ 3  5  7]
 [ 9 12 15]]


## Methods for Boolean Arrays

Boolean values are coerced to 1 (`True`) and 0 (`False`) in the preceding methods. Thus, `sum` is often used to count the `True` values in a boolean array.

In [58]:
arr = np.random.randn(100)
(arr > 0).sum() # Number of positive values

np.int64(52)

There are two additional methods, `any` and `all`, that are especially useful for boolean arrays. `any` tests whether one or more values in an array is `True`, while `all` checks whether every value is `True`.

In [59]:
bools = np.array([False, False, True, False])
print(bools.any())
print(bools.all())

True
False


## Sorting

Like Python's built-in list type, NumPy arrays can be sorted in place with the `sort` method.

In [60]:
arr = np.random.randn(6)
arr

array([ 0.22808228, -0.41467204, -0.96388404, -0.07894934,  0.84011554,
        0.35265082])

In [61]:
arr.sort()
arr

array([-0.96388404, -0.41467204, -0.07894934,  0.22808228,  0.35265082,
        0.84011554])

You can sort each one-dimensional section of values in a multidimensional array in place along an axis by passing the axis number to `sort`.

In [62]:
arr = np.random.randn(5, 3)
arr

array([[ 1.0929133 , -0.67774606,  0.74963256],
       [ 0.80389836,  1.14316611, -1.42635281],
       [ 0.4945161 , -0.0798549 ,  0.44507322],
       [-0.55522912,  0.08177985,  1.98555252],
       [-1.40746421,  1.04776429,  1.27223235]])

In [63]:
arr.sort(1)    # Sort each row in place (axis=1)

The top-level function `np.sort` returns a sorted copy of an array instead of modifying the array in place.
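
The difference between the method and the function, plus sorting along an axis, in a short sketch with arbitrary values. `np.sort` has no descending option, so reversing the result with `[::-1]` is one common workaround.

```python
import numpy as np

arr = np.array([3, 1, 2])
sorted_copy = np.sort(arr)   # new array; arr is left unchanged
print(arr, sorted_copy)      # [3 1 2] [1 2 3]
print(sorted_copy[::-1])     # [3 2 1]: reversed for descending order

grid = np.array([[5, 2, 9], [1, 7, 3]])
grid.sort(axis=1)            # in place, each row sorted independently
print(grid)                  # rows become [2 5 9] and [1 3 7]
```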

## Unique and Other Set Logic in NumPy

NumPy provides **basic set operations for one-dimensional `ndarray`s**, enabling you to efficiently perform set-based computations during data analysis and preprocessing.

---

- A **commonly used function is `np.unique`**, which returns the **sorted unique values in an array**.
- Another useful function is `np.isin` (the replacement for `np.in1d`, which is deprecated as of NumPy 2.0); it **tests membership of the values in one array within another**, returning a boolean array.

---

## Table: Array Set Operations

| **Function** | **Description** |
|--------------|-----------------|
| `unique(x)` | Compute the sorted, unique elements in `x`. |
| `intersect1d(x, y)` | Compute the sorted, common elements in `x` and `y`. |
| `union1d(x, y)` | Compute the sorted union of elements in `x` and `y`. |
| `isin(x, y)` (formerly `in1d`) | Return a boolean array indicating whether each element of `x` is contained in `y`. |
| `setdiff1d(x, y)` | Compute the set difference: elements in `x` that are not in `y`. |
| `setxor1d(x, y)` | Compute the set symmetric difference: elements that are in either `x` or `y`, but not both. |

---

### Why These Are Useful

These functions enable **fast set operations on numerical or categorical data** while leveraging NumPy's vectorization.  
Useful for:
- Removing duplicates (`unique`)
- Finding common elements (`intersect1d`)
- Checking membership (`isin`)
- Filtering or preparing datasets before merging or joining operations.

Apart from `isin`, which keeps the shape and order of its input, these functions return **sorted results for consistency in downstream processing**.



In [64]:
names = np.array(['Bob', 'Joe', 'Will', 'Bob', 'Will', 'Joe', 'Joe'])
np.unique(names)

array(['Bob', 'Joe', 'Will'], dtype='<U4')

In [65]:
values = np.array([6, 0, 0, 3, 2, 5, 6])
np.isin(values, [2, 3, 6])
# np.isin is the recommended replacement for the deprecated np.in1d

array([ True, False, False,  True,  True, False,  True])
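
The remaining functions in the table, applied to two small made-up arrays:

```python
import numpy as np

x = np.array([3, 1, 2, 3, 5])
y = np.array([5, 4, 3])

print(np.unique(x))          # [1 2 3 5]
print(np.intersect1d(x, y))  # [3 5]
print(np.union1d(x, y))      # [1 2 3 4 5]
print(np.setdiff1d(x, y))    # [1 2]
print(np.setxor1d(x, y))     # [1 2 4]
print(np.isin(x, y))         # [ True False False  True  True]
```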

## Linear Algebra in NumPy

**Linear algebra** operations such as **matrix multiplication, decompositions, determinants, and square matrix operations** are essential in scientific computing, data analysis, and machine learning workflows.

---

### Key Points

- In **NumPy**, multiplying two 2D arrays with `*` **performs element-wise multiplication**, *not* matrix multiplication (unlike MATLAB).
- For **matrix multiplication**, use the `dot` function or the `@` operator. `dot` is available as:
  - An **array method**: `arr.dot()`
  - A **top-level function**: `np.dot()`

- The `numpy.linalg` module provides **standard matrix decompositions and linear algebra operations** like:
  - Inverses
  - Determinants
  - Eigenvalues and eigenvectors

These are implemented under the hood using **industry-standard linear algebra libraries** (BLAS, LAPACK, Intel MKL depending on your NumPy build), ensuring performance comparable to MATLAB or R.

---

## Table: Commonly Used `numpy.linalg` Functions

| **Function** | **Description** |
|--------------|-----------------|
| `diag` | (Top-level `np.diag`) Return the diagonal (or off-diagonal) elements of a square matrix as a 1D array, or convert a 1D array into a square matrix with zeros on the off-diagonal |
| `dot` | (Top-level `np.dot`) Matrix multiplication |
| `trace` | (Top-level `np.trace`) Compute the sum of the diagonal elements of a matrix |
| `det` | Compute the determinant of a matrix |
| `eig` | Compute the eigenvalues and eigenvectors of a square matrix |
| `inv` | Compute the inverse of a square matrix |
| `pinv` | Compute the Moore-Penrose pseudo-inverse of a matrix |
| `qr` | Compute the QR decomposition of a matrix |
| `svd` | Compute the Singular Value Decomposition (SVD) of a matrix |
| `solve` | Solve the linear system `Ax = b` for `x`, where `A` is a square matrix |
| `lstsq` | Compute the least-squares solution to `Ax = b` |

---

### Why This Is Useful

Essential for:
- Solving systems of linear equations
- Computing projections in ML models
- Dimensionality reduction (SVD, PCA)
- Feature engineering and data transformations

Allows leveraging **high-performance, vectorized, and compiled operations** within Python workflows without leaving the `NumPy` ecosystem.



In [66]:
x = np.array([[1., 2., 3.], [4., 5., 6.]])
y = np.array([[6., 23.], [-1, 7], [8, 9]])

In [67]:
x.dot(y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [68]:
# x.dot(y) is equivalent to np.dot(x, y):
np.dot(x, y)

array([[ 28.,  64.],
       [ 67., 181.]])

In [69]:
# A matrix product between a two-dimensional array and a suitably sized one-dimensional array results in a one-dimensional array:
np.dot(x, np.ones(3))

array([ 6., 15.])

In [70]:
from numpy.linalg import inv, qr

X = np.random.randn(5, 5)
mat = X.T.dot(X) 
mat

array([[ 0.3652337 , -0.03393802, -1.28833521,  0.93636007, -1.32456083],
       [-0.03393802,  3.96343405, -0.96820831, -0.9257532 , -0.4289053 ],
       [-1.28833521, -0.96820831,  9.98836986, -2.74068956,  2.72406284],
       [ 0.93636007, -0.9257532 , -2.74068956,  2.97376586, -3.24689362],
       [-1.32456083, -0.4289053 ,  2.72406284, -3.24689362,  6.79066668]])

In [71]:
inv(mat)

array([[ 66.72574866,  -0.15961615,   3.99718804, -10.30882862,
          6.47264744],
       [ -0.15961615,   0.3955531 ,   0.09349932,   0.44335906,
          0.16833034],
       [  3.99718804,   0.09349932,   0.40073207,  -0.37234818,
          0.44679365],
       [-10.30882862,   0.44335906,  -0.37234818,   2.87749999,
         -0.4575803 ],
       [  6.47264744,   0.16833034,   0.44679365,  -0.4575803 ,
          1.02240395]])
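Multiplying a matrix by its inverse should give the identity, but only up to floating-point round-off. A minimal sketch of that check; it assumes a seeded `np.random.default_rng(0)` generator (instead of `np.random.randn`) so the result is reproducible:

```python
import numpy as np
from numpy.linalg import inv

rng = np.random.default_rng(0)       # seeded so the check is reproducible
X = rng.standard_normal((5, 5))
mat = X.T @ X                        # symmetric, and invertible for almost every X

identity = mat @ inv(mat)
# Round-off means the product is only approximately the identity matrix
print(np.allclose(identity, np.eye(5)))
```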

In [72]:
q, r = qr(mat)
print(f"q: {q}")
print(f"r: {r}")

q: [[-0.17361427  0.02390002 -0.05170341  0.02816967  0.98276147]
 [ 0.01613248 -0.94774469 -0.1569106  -0.27612606  0.0255581 ]
 [ 0.61241166  0.17543772 -0.75673821 -0.13007246  0.06783802]
 [-0.44509986  0.26152509 -0.16174047 -0.83815504 -0.06947579]
 [ 0.62963155  0.04511134  0.61146801 -0.45115202  0.15523466]]
r: [[-2.10370783 -0.38110962  9.26008446 -5.22389987  7.61210114]
 [ 0.         -4.18845112  2.04528698  1.050179    0.30993103]
 [ 0.          0.         -5.23108994 -0.29551787  2.75181119]
 [ 0.          0.          0.         -0.38914445 -0.61542882]
 [ 0.          0.          0.          0.          0.151833  ]]
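The QR decomposition splits a matrix into an orthogonal factor `q` and an upper-triangular factor `r`. A small sketch verifying those properties (the seeded `default_rng(42)` matrix is an illustrative assumption):

```python
import numpy as np
from numpy.linalg import qr

rng = np.random.default_rng(42)
X = rng.standard_normal((5, 5))
mat = X.T @ X

q, r = qr(mat)
print(np.allclose(q.T @ q, np.eye(5)))   # columns of q are orthonormal
print(np.allclose(r, np.triu(r)))        # r is upper triangular
print(np.allclose(q @ r, mat))           # q @ r reconstructs mat
```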


## Pseudorandom Number Generation in NumPy

The `numpy.random` module **supplements Python’s built-in `random` module** with functions for **efficiently generating entire arrays of sample values from various probability distributions.**

---

### Key Points

- **Pseudorandom numbers** are generated **algorithmically with deterministic behavior** based on the **seed** of the random number generator.  
- You can control reproducibility in experiments using `np.random.seed` to set the global random seed.

- By default, **NumPy uses a global random state**. To **avoid global state** and ensure isolated reproducible streams of random numbers, you can use `numpy.random.RandomState` to create independent random number generators.
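NumPy 1.17+ also offers the newer `Generator` API through `np.random.default_rng`, which the NumPy documentation recommends for new code. A minimal sketch showing that each generator carries its own state, unaffected by the global seed (the seed values are arbitrary):

```python
import numpy as np

# Two generators built from the same seed produce identical streams
rng1 = np.random.default_rng(1234)
rng2 = np.random.default_rng(1234)
a = rng1.normal(size=5)
b = rng2.normal(size=5)

# Reseeding the global legacy state does not touch either generator
np.random.seed(0)
c = rng1.normal(size=5)   # continues rng1's own stream
d = rng2.normal(size=5)   # continues rng2's stream in lockstep

print(np.array_equal(a, b), np.array_equal(c, d))
```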

---

## Table: Commonly Used `numpy.random` Functions

| **Function** | **Description** |
|--------------|-----------------|
| `seed` | Seed the random number generator for reproducibility |
| `permutation` | Return a random permutation of a sequence, or a permuted range |
| `shuffle` | Randomly permute a sequence in place |
| `rand` | Draw samples from a uniform distribution over `[0, 1)` |
| `randint` | Draw random integers from a given range, `low` inclusive and `high` exclusive |
| `randn` | Draw samples from a normal distribution with mean 0 and standard deviation 1 (MATLAB-like interface) |
| `binomial` | Draw samples from a binomial distribution |
| `normal` | Draw samples from a normal (Gaussian) distribution with specified mean and standard deviation |
| `beta` | Draw samples from a beta distribution |
| `chisquare` | Draw samples from a chi-square distribution |
| `gamma` | Draw samples from a gamma distribution |
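A small sketch checking the bounds of `randint` and `rand`, and the difference between `permutation` and `shuffle`. It uses a legacy `RandomState` with an arbitrarily chosen seed:

```python
import numpy as np

rng = np.random.RandomState(0)   # legacy API, matching the examples in this section

ints = rng.randint(0, 10, size=1000)   # high is exclusive: values in 0..9
u = rng.rand(1000)                     # uniform samples in [0, 1)

arr = np.arange(10)
p = rng.permutation(arr)   # returns a shuffled copy; arr is left untouched here
rng.shuffle(arr)           # shuffles arr in place and returns None

print(ints.min(), ints.max())
```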

---

### Why This Is Useful

Allows **vectorized, efficient generation of random samples** for:
- Simulations
- Testing algorithms
- Bootstrapping
- Random sampling in ML and statistics workflows

Supports **drawing samples from a wide range of distributions** necessary for scientific and data analysis tasks.

Ensures **reproducibility in experiments and projects** when seeds are controlled.


In [73]:
samples = np.random.normal(size=(4, 4))
samples

array([[-4.79007831e-04,  3.58353175e-01,  1.35439207e+00,
         6.03070657e-01],
       [-3.98074224e-01, -1.27204097e+00,  1.04298365e-01,
        -5.07912527e-01],
       [-2.89084317e-01,  3.23228723e-01,  7.90831058e-01,
         4.57049291e-02],
       [-5.96354459e-02, -1.20295636e-01, -1.40315206e+00,
        -1.65799435e+00]])

In [74]:
#change NumPy’s random number generation seed using np.random.seed
np.random.seed(1234)

In [75]:
#To avoid global state, you can use numpy.random.RandomState to create a random number generator isolated from others
rng = np.random.RandomState(1234)
rng.randn(10)

array([ 0.47143516, -1.19097569,  1.43270697, -0.3126519 , -0.72058873,
        0.88716294,  0.85958841, -0.6365235 ,  0.01569637, -2.24268495])

Python’s built-in `random` module, by contrast, samples only one value at a time. As the following benchmark shows, `numpy.random` is well over an order of magnitude faster for generating very large samples.

In [76]:
from random import normalvariate

N = 1000000

%timeit samples = [normalvariate(0, 1) for _ in range(N)]

875 ms ± 25.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [77]:
%timeit np.random.normal(size=N)

37.5 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


# Tricky Questions on NumPy

## 1. Flatten without Copying Data

**Question:**  
How can you flatten a multi-dimensional NumPy array without copying data?

**Answer:**  
Use `ravel()`, which returns a flattened **view** if possible (while `flatten()` always returns a copy).

In [79]:
import numpy as np

a = np.arange(9).reshape(3, 3)
b = a.ravel()

print("Original Array:\n", a)
print("Flattened View:\n", b)

# Modifying view
b[0] = 999
print("After modifying view:\n", a)

Original Array:
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Flattened View:
 [0 1 2 3 4 5 6 7 8]
After modifying view:
 [[999   1   2]
 [  3   4   5]
 [  6   7   8]]
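`ravel` returns a view only when it can. When the requested order does not match the memory layout (for example, raveling a transposed array in the default C order), it has to copy. A sketch using `np.shares_memory` to tell the cases apart:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

view = a.ravel()         # C-contiguous input: ravel returns a view
flat = a.flatten()       # flatten always copies
t_flat = a.T.ravel()     # transpose is not C-contiguous: ravel must copy

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, flat))    # False
print(np.shares_memory(a, t_flat))  # False
```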


## 2. C_CONTIGUOUS vs F_CONTIGUOUS

**Question:**  
What is the difference between `C_CONTIGUOUS` and `F_CONTIGUOUS` in NumPy?

**Answer:**  
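- `C_CONTIGUOUS` → row-major (C-style): elements of a row are adjacent in memory, so the last index varies fastest.
- `F_CONTIGUOUS` → column-major (Fortran-style): elements of a column are adjacent, so the first index varies fastest.
- The layout affects performance: traversing along the contiguous axis is cache-friendly, and some compiled libraries expect one specific order.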


In [80]:
a = np.asfortranarray(np.arange(9).reshape(3, 3))

print("Is C_CONTIGUOUS?", a.flags['C_CONTIGUOUS'])
print("Is F_CONTIGUOUS?", a.flags['F_CONTIGUOUS'])

Is C_CONTIGUOUS? False
Is F_CONTIGUOUS? True
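The two layouts are visible in the `strides` attribute, the number of bytes to step along each axis. A sketch on a small `int64` array (the explicit dtype is an assumption to keep the byte counts platform-independent):

```python
import numpy as np

c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)  # C (row-major) order
f_arr = np.asfortranarray(c_arr)                    # same values, column-major

# int64 items are 8 bytes each
print(c_arr.strides)  # (24, 8): next row is 3 items away, next column 1 item
print(f_arr.strides)  # (8, 16): next row is 1 item away, next column 2 items
print(np.array_equal(c_arr, f_arr))  # True: only the memory layout differs
```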


## 3. Broadcasting Shapes

**Question:**  
How does broadcasting work for shapes `(3,1)` and `(1,4)`?

**Answer:**  
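Shapes are compared from the trailing dimension; each pair of dimensions must be equal or contain a 1, and size-1 dimensions are stretched to match:
- `(3,1)` → broadcast to `(3,4)`
- `(1,4)` → broadcast to `(3,4)`
- Resulting shape is `(3,4)`.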

In [81]:
a = np.ones((3, 1))
b = np.ones((1, 4))

print("Broadcasted shape:", (a + b).shape)
print(a + b)

Broadcasted shape: (3, 4)
[[2. 2. 2. 2.]
 [2. 2. 2. 2.]
 [2. 2. 2. 2.]]
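The rule can be checked without allocating arrays using `np.broadcast_shapes` (NumPy 1.20+). This sketch also shows an incompatible pair, where the trailing dimensions differ and neither is 1:

```python
import numpy as np

s1 = np.broadcast_shapes((3, 1), (1, 4))  # (3, 4)
s2 = np.broadcast_shapes((5, 3), (3,))    # (5, 3): (3,) is treated as (1, 3)

try:
    np.ones((3, 2)) + np.ones((3,))       # trailing dims 2 vs 3: incompatible
    incompatible_raised = False
except ValueError:
    incompatible_raised = True

print(s1, s2, incompatible_raised)
```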


## 4. View vs Copy with Slicing

**Question:**  
Does slicing return a view or a copy in NumPy?

**Answer:**  
Basic slicing (`start:stop:step`) returns a **view**, so modifying it affects the original array. Call `.copy()` when you need independent data.


In [82]:
a = np.array([1, 2, 3, 4])
b = a[1:3]
b[0] = 99

print("Modified Original Array:\n", a)

Modified Original Array:
 [ 1 99  3  4]
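A short sketch contrasting a slice view with an explicit copy, using `np.shares_memory` and the view's `base` attribute:

```python
import numpy as np

a = np.array([1, 2, 3, 4])
view = a[1:3]           # basic slice: a view onto a's data
copy = a[1:3].copy()    # explicit copy: independent data

copy[0] = -1
print(a)                          # [1 2 3 4]: writing to the copy leaves a alone
print(np.shares_memory(a, view))  # True
print(np.shares_memory(a, copy))  # False
print(view.base is a)             # True: a view references the array it came from
```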


## 5. Masked Indexing Assignment

**Question:**  
Does `arr[mask] = value` modify the original array in-place?

**Answer:**  
Yes, it modifies the original array **in-place** at the masked locations.

In [83]:
a = np.array([1, 2, 3, 4, 5])
mask = a > 3
a[mask] = 99
print(a)

[ 1  2  3 99 99]
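The in-place behavior applies only to assignment. Reading `a[mask]` builds a new array, so modifying that result does not touch the original. A sketch of both cases:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
mask = a > 3

selected = a[mask]   # reading with a mask creates a copy
selected[:] = 0      # changes only the copy
print(a)             # [1 2 3 4 5]

a[mask] = 0          # assigning through the mask writes into a itself
print(a)             # [1 2 3 0 0]
```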


## 6. Stride Tricks and Sliding Window View

**Question:**  
What are strides, and how to use `sliding_window_view`?

**Answer:**  
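Strides are the number of bytes NumPy steps in memory to reach the next element along each axis.  
`sliding_window_view` (NumPy 1.20+) builds sliding windows by adjusting strides, **without copying data**; the returned view is read-only by default.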


In [84]:
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10)
windows = sliding_window_view(a, window_shape=3)
print(windows)

[[0 1 2]
 [1 2 3]
 [2 3 4]
 [3 4 5]
 [4 5 6]
 [5 6 7]
 [6 7 8]
 [7 8 9]]
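A sketch showing that the windows really are a strided view of the original buffer, plus a typical use, a 3-point moving average (the `float64` dtype is chosen here so the stride values are predictable):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

a = np.arange(10, dtype=np.float64)
windows = sliding_window_view(a, window_shape=3)

print(windows.shape)                 # (8, 3): 10 - 3 + 1 windows
print(windows.strides)               # (8, 8): both axes step one float64 at a time
print(np.shares_memory(a, windows))  # True: no data was copied
print(windows.mean(axis=1))          # 3-point moving average: [1. 2. ... 8.]
```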

## 7. Structured Arrays

**Question:**  
What is a structured array in NumPy?

**Answer:**  
An array with fields of different data types, useful for tabular data.

In [85]:
dt = np.dtype([('name', 'S10'), ('age', 'i4')])
arr = np.array([('Alice', 25), ('Bob', 30)], dtype=dt)
print(arr)
print(arr['name'])

[(b'Alice', 25) (b'Bob', 30)]
[b'Alice' b'Bob']
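The `b'...'` prefixes come from the `'S10'` (bytes) field type. A sketch using `'U10'` (Unicode strings) plus an extra float field; the `score` field and its values are illustrative:

```python
import numpy as np

# 'U10' stores Python str values instead of bytes
dt = np.dtype([('name', 'U10'), ('age', 'i4'), ('score', 'f8')])
arr = np.array([('Alice', 25, 88.5), ('Bob', 30, 92.0)], dtype=dt)

print(arr['name'])            # ['Alice' 'Bob']
print(arr['age'].mean())      # 27.5
print(arr[arr['age'] > 26])   # rows can be filtered with a boolean mask on a field
```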


## 8. np.dot vs np.matmul vs @

**Question:**  
Difference between `np.dot`, `np.matmul`, and `@`?

**Answer:**  
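- 1D arrays: all three compute the inner product (a scalar).
- 2D arrays: all three perform matrix multiplication.
- Scalars: `np.dot` simply multiplies, while `matmul` and `@` raise an error.
- Higher dims: `matmul` and `@` treat the inputs as stacks of matrices and broadcast over the leading dimensions; `dot` instead takes a sum product over the last axis of the first array and the second-to-last axis of the second.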


In [87]:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print(np.dot(a, b))
print(np.matmul(a, b))
print(a @ b)

[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
[[19 22]
 [43 50]]
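A sketch of the cases where the three differ: 1D inputs, scalar operands, and 3D stacks (all arrays are illustrative):

```python
import numpy as np

v = np.array([1, 2, 3])
w = np.array([4, 5, 6])
inner_dot = np.dot(v, w)   # 32
inner_at = v @ w           # 32: matmul also gives the inner product for 1D inputs

scaled = np.dot(v, 2)      # dot accepts a scalar and just multiplies: [2 4 6]
try:
    v @ 2                  # matmul / @ reject scalar operands
    scalar_error = False
except ValueError:
    scalar_error = True

a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))
print(np.matmul(a, b).shape)  # (2, 3, 5): a stack of two matrix products
print(np.dot(a, b).shape)     # (2, 3, 2, 5): every pairing of a's rows with b's matrices
```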


## 9. Zero-Dimensional Arrays

**Question:**  
How to access the value in a 0-D NumPy array?

**Answer:**  
Use `.item()` or `arr[()]`.

In [88]:
a = np.array(42)
print(a.shape)
print(a.item())
print(a[()])


()
42
42


## 10. Reverse Array with Negative Strides

**Question:**  
What does `a[::-1]` do in NumPy?

**Answer:**  
Reverses the array using **negative strides without copying**.

In [89]:
a = np.arange(5)
print(a[::-1])

[4 3 2 1 0]
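A sketch confirming that the reversed array is a view with a negative stride, so writes through it land in the original (the explicit `int64` dtype keeps the stride values platform-independent):

```python
import numpy as np

a = np.arange(5, dtype=np.int64)
r = a[::-1]

print(a.strides, r.strides)    # (8,) (-8,): same buffer, walked backwards
print(np.shares_memory(a, r))  # True: no copy was made
r[0] = 100                     # writes to the last element of a
print(a)                       # [  0   1   2   3 100]
```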


## 11. Boolean Indexing Shape Mismatch

**Question:**  
What happens if the boolean mask does not match the shape of the array?

**Answer:**  
Raises an `IndexError`.

In [90]:
a = np.arange(4)
mask = np.array([True, False])
try:
    print(a[mask])
except IndexError as e:
    print(e)

boolean index did not match indexed array along axis 0; size of axis is 4 but size of corresponding boolean axis is 2


## 12. NaN Comparisons

**Question:**  
Why does `np.nan == np.nan` return `False`?

**Answer:**  
Under the IEEE 754 floating-point standard, `NaN` compares unequal to every value, including itself. Use `np.isnan` to detect it.

In [91]:
import numpy as np

print(np.nan == np.nan)
print(np.isnan(np.nan))

False
True
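The same rule affects whole-array comparisons. A sketch showing how `NaN` breaks `==` and `np.array_equal`, and how `equal_nan=True` (NumPy 1.19+) treats NaNs in matching positions as equal:

```python
import numpy as np

x = np.array([1.0, np.nan, 3.0])
y = np.array([1.0, np.nan, 3.0])

print(x == y)                                # [ True False  True]
print(np.array_equal(x, y))                  # False: the NaNs compare unequal
print(np.array_equal(x, y, equal_nan=True))  # True
print(np.isnan(x))                           # [False  True False]
```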


## 13. Memory Usage per Element

**Question:**  
How to check memory usage per element?

**Answer:**  
Using `.itemsize`.


In [93]:
a = np.array([1, 2, 3], dtype=np.int32)
print(a.itemsize, "bytes per element")

4 bytes per element
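For the whole array, `.nbytes` gives `size * itemsize`, the size of the data buffer (not counting the Python object overhead). A sketch comparing two dtypes:

```python
import numpy as np

a32 = np.zeros((1000, 1000), dtype=np.int32)
a64 = np.zeros((1000, 1000), dtype=np.float64)

print(a32.itemsize, a32.nbytes)   # 4 4000000
print(a64.itemsize, a64.nbytes)   # 8 8000000
```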


## 14. Fancy Indexing vs Slicing

**Question:**  
Does fancy indexing return a view or copy?

**Answer:**  
Fancy indexing returns a **copy**, slicing returns a **view**.

In [94]:
a = np.arange(5)
b = a[[0, 2, 4]]
b[0] = 99
print("Original array (unchanged):", a)

Original array (unchanged): [0 1 2 3 4]
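Like masks, fancy indexing copies only when *reading*. Assigning through an index array writes into the original. A related trap: with repeated indices, `+=` does not accumulate, while `np.add.at` applies the update once per occurrence:

```python
import numpy as np

a = np.arange(5)
a[[0, 2, 4]] = 99        # assignment through an index array writes into a
print(a)                 # [99  1 99  3 99]

b = np.arange(5)
b[[0, 0]] += 1           # repeated index is not accumulated: b[0] becomes 1, not 2
np.add.at(b, [0, 0], 1)  # unbuffered: adds 1 for each occurrence, so b[0] becomes 3
print(b)                 # [3 1 2 3 4]
```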


## 15. Conditional Replacement

**Question:**  
How to replace all negative values with zero?

**Answer:**  
Using:
```python
arr[arr < 0] = 0
```
or:
```python
np.clip(arr, 0, None, out=arr)
```


In [95]:
a = np.array([-3, 5, -1, 7])
a[a < 0] = 0
print(a)

[0 5 0 7]
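The `np.clip` form from the answer, plus two non-destructive alternatives that return a new array and leave the input unchanged:

```python
import numpy as np

a = np.array([-3, 5, -1, 7])

clipped = np.clip(a, 0, None)   # new array; a is unchanged at this point
maxed = np.maximum(a, 0)        # element-wise max with 0 gives the same result
np.clip(a, 0, None, out=a)      # in-place variant: writes the result back into a

print(clipped, maxed, a)
```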


## 16. np.argmax with Axis

**Question:**  
How does `np.argmax` work with `axis`?

**Answer:**  
Returns indices of max values along the specified axis.

In [97]:
a = np.array([[1, 5, 2], [7, 3, 9]])
print(np.argmax(a, axis=0))  # Max indices along each column
print(np.argmax(a, axis=1))  # Max indices along each row

[1 0 1]
[1 2]
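Without `axis`, `np.argmax` returns an index into the *flattened* array. `np.unravel_index` converts it back to a row/column position:

```python
import numpy as np

a = np.array([[1, 5, 2], [7, 3, 9]])

flat_idx = np.argmax(a)                          # index into the flattened array: 5
row, col = np.unravel_index(flat_idx, a.shape)   # back to (row, col): (1, 2)
print(flat_idx, row, col, a[row, col])           # 5 1 2 9
```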


## 17. Transpose and Memory

**Question:**  
Does `arr.T` create a copy?

**Answer:**  
No, it returns a **view** by modifying strides.

In [98]:
a = np.array([[1, 2], [3, 4]])
b = a.T
b[0, 0] = 99
print(a)

[[99  2]
 [ 3  4]]
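Under the hood, the transpose simply swaps the strides, which also turns a C-contiguous array into an F-contiguous view. A sketch (the explicit `int64` dtype keeps the byte counts platform-independent):

```python
import numpy as np

a = np.arange(6, dtype=np.int64).reshape(2, 3)
t = a.T

print(a.strides, t.strides)    # (24, 8) (8, 24): the strides are swapped
print(np.shares_memory(a, t))  # True: same buffer, no copy
print(t.flags['C_CONTIGUOUS'], t.flags['F_CONTIGUOUS'])  # False True
```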


## 18. Check if All Elements are Non-zero

**Question:**  
How to check if all elements in an array are non-zero?

**Answer:**  
Using `np.all(arr)`.

In [99]:
a = np.array([1, 2, 3])
print(np.all(a))

b = np.array([1, 0, 3])
print(np.all(b))

True
False


## 19. astype Copies

**Question:**  
Does `astype` create a copy?

**Answer:**  
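By default (`copy=True`), yes: `astype` returns a new array **even if the dtype is unchanged**.

In [100]:
a = np.array([1, 2, 3])
b = a.astype(a.dtype)  # same dtype, but still a new array
print(a is b)
print(np.shares_memory(a, b))

False
False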
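Passing `copy=False` lets `astype` skip the copy when no conversion is needed. In that case it returns the input array itself. If the dtype does change, a new array is created regardless:

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

same = a.astype(np.int64, copy=False)         # no conversion needed: returns a itself
converted = a.astype(np.float64, copy=False)  # conversion needed: new array anyway

print(same is a)        # True
print(converted is a)   # False
```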


## 20. Using np.where for Conditional Replacement

**Question:**  
How to use `np.where` for conditional replacement?

**Answer:**  
`np.where(condition, x, y)` returns `x` where condition is `True`, else `y`.

In [101]:
a = np.array([-1, 2, -3, 4])
b = np.where(a < 0, 0, a)
print(b)

[0 2 0 4]
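Called with only a condition, `np.where` behaves like `np.nonzero` and returns a tuple of index arrays. With three arguments, both branches can be scalars or arrays:

```python
import numpy as np

a = np.array([-1, 2, -3, 4])

idx = np.where(a < 0)            # one argument: tuple of index arrays
print(idx)                       # (array([0, 2]),)
signs = np.where(a < 0, -1, 1)   # scalar branches are broadcast to a's shape
print(signs)                     # [-1  1 -1  1]
```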
