### What is NumPy?
NumPy, short for "Numerical Python," is a fundamental library for scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions to operate on these arrays efficiently.

Key features of NumPy include:

Multi-dimensional arrays: NumPy introduces the ndarray (n-dimensional array) object, which allows you to store and manipulate large arrays of homogeneous data types efficiently.

Broadcasting: NumPy enables arrays with different shapes to work together in arithmetic operations, making it easier to perform complex computations.

Vectorized operations: NumPy provides a wide range of mathematical functions that operate on entire arrays, eliminating the need for explicit loops and resulting in faster execution.

Integration with other libraries: NumPy serves as the foundation for many other Python libraries in the data science ecosystem, such as Pandas, Matplotlib, and Scikit-learn.
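
As a small illustration of the vectorized operations and broadcasting described above, the sketch below uses made-up values (`prices`, `offsets`) and assumes only that NumPy is installed.

```python
import numpy as np

# Vectorized operation: the multiplication is applied to every element, no explicit loop
prices = np.array([10.0, 20.0, 30.0])
scaled = prices * 1.5

# Broadcasting: a (2, 3) array combined with a (3,) array.
# The 1-D array is applied to each row of the 2-D array.
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
offsets = np.array([10, 20, 30])
shifted = matrix + offsets

print(scaled)    # [15. 30. 45.]
print(shifted)   # [[11 22 33]
                 #  [14 25 36]]
```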

### What is Pandas?
Pandas is a powerful, open-source Python library designed for data manipulation and analysis. It provides high-performance, easy-to-use data structures and data analysis tools that make working with structured data a breeze.

Key features of Pandas include:

Data structures: Pandas introduces two primary data structures: Series (one-dimensional) and DataFrame (two-dimensional), which allow you to store and manipulate labeled and heterogeneous data efficiently.

Data manipulation: Pandas provides a wide range of functions and methods for data manipulation, such as filtering, sorting, grouping, merging, and reshaping data.

Handling missing data: Pandas offers various techniques for handling missing data, such as filling, dropping, or interpolating missing values.

Time series functionality: Pandas has extensive support for working with time series data, including date range generation, frequency conversion, and rolling window operations.

Integration with other libraries: Pandas seamlessly integrates with other Python libraries in the data science ecosystem, such as NumPy, Matplotlib, and Scikit-learn.

While NumPy and Pandas are closely related, they serve different purposes and have some key differences:

Data structures:

NumPy provides the ndarray for storing homogeneous data in multi-dimensional arrays.
Pandas introduces Series for one-dimensional labeled data and DataFrame for two-dimensional labeled data, which can hold heterogeneous data types.

Labeled data:

NumPy arrays do not have built-in support for labeled axes.
Pandas Series and DataFrame have labeled axes (index and columns), making it easier to work with structured and labeled data.

Functionality:

NumPy focuses on numerical computing and provides a wide range of mathematical functions for array operations.

Pandas builds on top of NumPy and offers additional functionality for data manipulation, cleaning, and analysis, such as handling missing data, merging datasets, and time series operations.

Use cases:

NumPy is primarily used for numerical computing, scientific computing, and performing mathematical operations on arrays.

Pandas is mainly used for data manipulation, data analysis, and data preprocessing tasks in data science and machine learning workflows.

Despite these differences, NumPy and Pandas share some similarities:

Performance: Both libraries are designed for high-performance operations on large datasets, leveraging the efficiency of C extensions under the hood.

Interoperability: Pandas is built on top of NumPy, and they can be used together seamlessly. NumPy arrays can be easily converted to Pandas Series or DataFrame, and vice versa.

Python ecosystem: Both NumPy and Pandas are essential components of the Python data science ecosystem and are widely used in conjunction with other libraries like Matplotlib and Scikit-learn.

Some key features of NumPy include:

Efficient memory usage: NumPy arrays are stored in contiguous memory blocks, allowing for faster access and computation compared to traditional Python lists.
Vectorized operations: NumPy provides a wide range of mathematical functions that operate on entire arrays, eliminating the need for explicit loops and resulting in concise and efficient code.

Broadcasting: NumPy allows arrays with different shapes to be used in arithmetic operations, enabling efficient and intuitive computations.

Over the years, NumPy has become an integral part of the scientific Python ecosystem. It serves as the foundation for many other popular libraries, such as:

SciPy: A library for scientific computing that builds upon NumPy, providing additional functionality for optimization, signal processing, and more.

Pandas: A data manipulation library that uses NumPy arrays as its underlying data structure.

Matplotlib: A plotting library that relies on NumPy for numerical computations and data representation.

In [1]:
import numpy as np
np.__version__

'2.3.3'

Here are some key characteristics of NumPy arrays:

Homogeneous: All elements in a NumPy array must be of the same data type.

Multidimensional: NumPy arrays can have one or more dimensions, allowing you to represent vectors, matrices, and higher-dimensional tensors.

Fixed Size: The size of a NumPy array is <b>fixed</b> at creation time and cannot be changed afterward.

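NumPy arrays are represented by the ndarray object. The dimensions of an array are called axes, and the number of axes is available through the `ndim` attribute. Older texts refer to this number as the rank of the array, which should not be confused with the linear-algebra rank of a matrix.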

For example, a 1-dimensional array (rank 1) is a vector, a 2-dimensional array (rank 2) is a matrix, and arrays with rank 3 or higher are called higher-dimensional arrays or tensors.
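
The number of axes and the length along each axis can be inspected with the `ndim` and `shape` attributes. The arrays below are arbitrary examples chosen for illustration.

```python
import numpy as np

scalar = np.array(7)                       # 0 axes
vector = np.array([1, 2, 3])               # 1 axis
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2 axes
tensor = np.zeros((2, 3, 4))               # 3 axes

for name, a in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("tensor", tensor)]:
    print(name, a.ndim, a.shape)
# scalar 0 ()
# vector 1 (3,)
# matrix 2 (2, 3)
# tensor 3 (2, 3, 4)
```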

<div style="text-align: center">
<img src="../image/scalar-vector-matrix-tensor.png" alt="From Pytopia GitHub - Dimension and Rank for Arrays in Numpy" width="400"/>
</div>

NumPy arrays offer several advantages over traditional Python lists:

Efficiency: NumPy arrays are stored in contiguous memory locations, allowing for faster access and manipulation compared to Python lists. This is especially beneficial when working with large datasets.

Vectorized Operations: NumPy provides a wide range of mathematical functions that operate element-wise on arrays, eliminating the need for explicit loops. This vectorization leads to more concise and efficient code.

Broadcasting: NumPy arrays support broadcasting, which allows arrays with different shapes to be used in arithmetic operations without the need for explicit reshaping. Broadcasting enables you to write more expressive and readable code.

Memory Efficiency: NumPy arrays are more memory-efficient than Python lists, especially for large datasets. NumPy uses fixed-size data types, which reduces memory overhead and allows for optimized memory allocation.

Interoperability: NumPy arrays seamlessly integrate with other scientific computing libraries in Python, such as SciPy, Pandas, and Matplotlib. This interoperability enables you to leverage the full power of the scientific Python ecosystem.
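
The memory-efficiency point can be checked roughly with the sketch below. It compares the raw data size of an `int64` array with an estimate of a list's footprint. The exact list size depends on the Python build, so only the comparison is shown, not a specific number.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

print(arr.itemsize)  # 8 (bytes per int64 element)
print(arr.nbytes)    # 8000 (n * itemsize bytes of raw data)

# sys.getsizeof(list) counts only the list's own pointer table,
# so the size of each int object is added separately
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)
print(list_bytes > arr.nbytes)  # True
```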


In [2]:
# Create NumPy arrays
# Creating 1-dimensional arrays (vectors) from various Python iterables
# A list is used first

import numpy as np
arr0 = np.array([1, 2, 3, 4, 5,6]) # from list
arr0

arr1 = np.array((1, 2, 3, 4, 5,6)) # from tuple
arr1

arr2 = np.array(range(1,6,1)) # from a range object
arr2

array([1, 2, 3, 4, 5])

In [None]:
# np.arange() works like Python's range() but also accepts floats - Arguments: start, stop (exclusive) and step
arr3 = np.arange(1,30, 1.5)
arr3

array([ 1. ,  2.5,  4. ,  5.5,  7. ,  8.5, 10. , 11.5, 13. , 14.5, 16. ,
       17.5, 19. , 20.5, 22. , 23.5, 25. , 26.5, 28. , 29.5])

In [None]:
# np.linspace(start, stop, num=50, endpoint=True): num evenly spaced values from start to stop; stop is included by default

arr4 = np.linspace(10, 40)
arr4

array([10.        , 10.6122449 , 11.2244898 , 11.83673469, 12.44897959,
       13.06122449, 13.67346939, 14.28571429, 14.89795918, 15.51020408,
       16.12244898, 16.73469388, 17.34693878, 17.95918367, 18.57142857,
       19.18367347, 19.79591837, 20.40816327, 21.02040816, 21.63265306,
       22.24489796, 22.85714286, 23.46938776, 24.08163265, 24.69387755,
       25.30612245, 25.91836735, 26.53061224, 27.14285714, 27.75510204,
       28.36734694, 28.97959184, 29.59183673, 30.20408163, 30.81632653,
       31.42857143, 32.04081633, 32.65306122, 33.26530612, 33.87755102,
       34.48979592, 35.10204082, 35.71428571, 36.32653061, 36.93877551,
       37.55102041, 38.16326531, 38.7755102 , 39.3877551 , 40.        ])

In [8]:
arr5 = np.linspace(10, 40, 20)
arr5

array([10.        , 11.57894737, 13.15789474, 14.73684211, 16.31578947,
       17.89473684, 19.47368421, 21.05263158, 22.63157895, 24.21052632,
       25.78947368, 27.36842105, 28.94736842, 30.52631579, 32.10526316,
       33.68421053, 35.26315789, 36.84210526, 38.42105263, 40.        ])

In [11]:
arr6 = np.linspace(10, 40, 10, False) # Do not include the stop!
arr6

array([10., 13., 16., 19., 22., 25., 28., 31., 34., 37.])
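
The two functions above are closely related: `np.arange` fixes the step, while `np.linspace` fixes the number of values. The sketch below reuses the same interval as the cells above. The `retstep=True` argument is an addition not used elsewhere in this notebook. It makes `linspace` also return the spacing it computed.

```python
import numpy as np

# Same ten values, two ways: fixed step with arange, fixed count with linspace
by_step = np.arange(10, 40, 3.0)
by_count = np.linspace(10, 40, 10, endpoint=False)
print(np.allclose(by_step, by_count))  # True

# retstep=True returns the values together with the spacing between them
values, step = np.linspace(10, 40, 20, retstep=True)
print(len(values), step)  # 20 values, spacing of 30/19
```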

In [2]:
# Creating 2-Dimensional Arrays (Matrices)
arr7 = np.array([[1,2,3], [4,5,6], [7,8,9]])
arr7
arr7[1,2]



np.int64(6)

In [None]:
# np.zeros() creates an array filled with zeros. The shape is passed as a tuple or list: (number of rows, number of columns)
arr8 = np.zeros([2,3])
arr8
arr9 = np.zeros((3, 4))
arr9

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [None]:
# using np.ones() - All elements are 1

arr10 = np.ones((3,5))
arr10

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [None]:
# np.eye() creates a 2-D identity-style matrix (ones on the main diagonal, zeros elsewhere). np.eye(N) gives a square N x N matrix; np.eye(N, M) can also be rectangular

arr11 = np.eye(8,8)
arr11
arr12 = np.eye(5)
arr12

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [23]:
# np.diag() creates a 2-D diagonal matrix from a 1-D array or list: the given values go on the main diagonal, all other elements are zero

arr13 = np.diag([3,4,5,6,7,10])
arr13

array([[ 3,  0,  0,  0,  0,  0],
       [ 0,  4,  0,  0,  0,  0],
       [ 0,  0,  5,  0,  0,  0],
       [ 0,  0,  0,  6,  0,  0],
       [ 0,  0,  0,  0,  7,  0],
       [ 0,  0,  0,  0,  0, 10]])

In [None]:
# nD Array - Tensors
arrt0 = np.array([[[1,2], [3,4], [5,6]], [[7,8], [9,10],[11,12]], [[13,14],[15,16],[17,18]]])
arrt0

# Pay attention to the number of opening SQUARE BRACKETS after the opening parenthesis: it hints at the number of dimensions of the array
# We are creating 3 matrices, each with 3 rows and 2 columns, so the shape is (3, 3, 2) - use .shape to check
arrt0.shape


(3, 3, 2)

In [3]:
arrt1 = np.zeros((4,2,5))
arrt1

array([[[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]]])

In [4]:
arrt2 = np.ones((4, 2,5))
arrt2

array([[[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]],

       [[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.]]])

In [None]:
# More than 3 Dimensions - Check the number of brackets
arrt3 = np.array([
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]
])
arrt3 # Two blocks, each containing 2 matrices with 2 rows and 2 columns

arrt3.shape

(2, 2, 2, 2)

In [32]:
arrt4 = np.zeros((3, 4, 3,5))
arrt4

array([[[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]],


       [[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]]],


       [[[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.],
         [0., 0., 0., 0., 0.]],

        [[0., 0., 0., 0., 0.],
         [0., 0

In [33]:
arrt5 = np.ones((4, 3, 3,4))
arrt5

array([[[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]],


       [[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]],


       [[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]],


       [[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]]]

In [None]:
# To see the data type - arr.dtype

np.array([2, 5, 7, 8]).dtype

np.array(["ali", "hassan", "hossein"]).dtype

dtype('<U7')

In [35]:
np.array([2.2, 5.5, 7.8, 8.12]).dtype

dtype('float64')

In [36]:
my_arr = np.array([2, 5, 7 ,8], dtype=np.float64)
my_arr

array([2., 5., 7., 8.])


### Common NumPy Data Types
NumPy provides a wide range of data types to suit different numerical requirements. Here are some commonly used NumPy data types:

Integer Types:

* np.int8: 8-bit signed integer
* np.int16: 16-bit signed integer
* np.int32: 32-bit signed integer
* np.int64: 64-bit signed integer (default integer type on most 64-bit platforms)
* np.uint8: 8-bit unsigned integer
* np.uint16: 16-bit unsigned integer
* np.uint32: 32-bit unsigned integer
* np.uint64: 64-bit unsigned integer

Floating-Point Types:

+ np.float16: 16-bit half-precision floating-point
+ np.float32: 32-bit single-precision floating-point
+ np.float64: 64-bit double-precision floating-point (default float type)

Complex Types:

- np.complex64: Complex number represented by two 32-bit floats
- np.complex128: Complex number represented by two 64-bit floats

Boolean Type:

* np.bool: Boolean (True or False); spelled np.bool_ in NumPy 1.x

String Type:

* np.str_: Unicode string (fixed-length)

Object Type:

* np.object_: Python object

When creating arrays, if you don't specify the dtype parameter, NumPy will infer the data type based on the input data. However, it's good practice to explicitly specify the data type when you have specific requirements or when you want to ensure consistent behavior across different platforms.
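
The inference rules can be seen in a short sketch. The variable names are illustrative, and the width of the inferred integer type is left unchecked because it depends on the platform.

```python
import numpy as np

ints = np.array([1, 2, 3])                   # all integers: an integer dtype is inferred
mixed = np.array([1, 2.5, 3])                # an int next to a float: upcast to float64
small = np.array([1, 2, 3], dtype=np.int8)   # explicit dtype overrides inference
as_float = small.astype(np.float32)          # astype returns a converted copy

print(ints.dtype.kind)              # i (signed integer)
print(mixed.dtype)                  # float64
print(small.dtype, small.itemsize)  # int8 1
print(as_float.dtype)               # float32
```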

In [None]:
# Random number generation
# np.random.random(size) and np.random.rand(d0, d1, ...) both generate random floats uniformly distributed in the half-open interval [0, 1). random() takes the shape as a single size argument, while rand() takes each dimension as a separate argument.

arrand0 = np.random.random(10)

arrand0

array([0.36924015, 0.61859129, 0.98858654, 0.65650603, 0.24574872,
       0.10036479, 0.76893618, 0.87396358, 0.18327967, 0.03276587])

In [38]:
arrand1 = np.random.rand(2,3,4) # 2 blocks, each a matrix of 3 rows and 4 columns
arrand1

array([[[0.35049708, 0.40017932, 0.29296896, 0.19249129],
        [0.69575575, 0.08154046, 0.08315761, 0.0099896 ],
        [0.80747606, 0.56199767, 0.95566372, 0.78410407]],

       [[0.97209423, 0.15382216, 0.25250389, 0.82005074],
        [0.39342485, 0.89977645, 0.55029551, 0.76885331],
        [0.03706422, 0.18187076, 0.28648809, 0.81932571]]])

In [39]:
arrand2 = np.random.rand(3,4) # matrix of 3 rows and 4 columns 
arrand2

array([[0.35784636, 0.7808763 , 0.93183109, 0.68378367],
       [0.20684325, 0.91264596, 0.6503696 , 0.16942818],
       [0.71453807, 0.69050288, 0.75338713, 0.41788351]])

In [41]:
# The np.random.randn() function generates an array of random numbers from the standard normal distribution (mean = 0, variance = 1). Similar to np.random.rand(), you can specify the shape of the array as arguments to the function.

arrand3 = np.random.randn(2,3,6) # Standard normal distribution (mean = 0, variance = 1)
arrand3

array([[[-0.10844484,  0.49144747, -1.16797756, -0.72813071,
          0.28242127,  0.26469591],
        [ 0.93047278,  0.07218503, -1.28921101, -1.44600511,
          0.97302299,  0.25398034],
        [-0.67803446,  0.33030851,  1.47095992, -0.52747128,
         -0.47031022,  0.28595902]],

       [[-1.12135124,  1.27913517,  0.64581597,  0.55671454,
         -1.73116276,  1.03269621],
        [ 0.75858518, -1.39347783,  0.93137542, -0.65373263,
          0.98880201,  1.99263788],
        [ 0.61069263,  0.42795112, -0.51924434,  1.69097501,
         -0.10687682, -0.56678422]]])

In [5]:
# The np.random.randint() function generates an array of random integers drawn uniformly from a specified range. The lower bound is inclusive and the upper bound is exclusive: randint(low, high, size=...), where size is the desired shape, such as (3, 4, 6)

arrand4 = np.random.randint(2,3,size=6) # only 2 is possible, since high=3 is excluded
arrand4

arrand41 = np.random.randint(2,4,size=6) # values are 2 or 3
arrand41

array([3, 3, 3, 2, 2, 2])

In [None]:
arrand5 = np.random.randint(low=2,high=70,size=(4, 5))
arrand5

array([[44, 50, 58, 13,  6],
       [ 5, 34, 29, 39, 65],
       [37, 16, 33, 62,  5],
       [63, 40, 17, 61,  5]])
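
The functions above belong to NumPy's legacy random interface. Current NumPy documentation recommends the `Generator` interface obtained from `np.random.default_rng()` for new code. A minimal sketch of the equivalent calls follows. The seed value 42 is an arbitrary choice made here so that runs are reproducible.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed, chosen only for reproducibility

uniform = rng.random((2, 3))                             # floats in [0, 1)
normal = rng.standard_normal((2, 3, 6))                  # standard normal distribution
integers = rng.integers(low=2, high=70, size=(2, 4, 5))  # high is exclusive by default

print(uniform.shape, normal.shape, integers.shape)  # (2, 3) (2, 3, 6) (2, 4, 5)
print(integers.min() >= 2, integers.max() < 70)     # True True
```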

## Indexing NumPy Arrays
Indexing is a fundamental operation in NumPy that allows you to access individual elements or subsets of an array. In this section, we'll explore different ways to index NumPy arrays using square bracket notation, access elements in multi-dimensional arrays, and use negative indices.


### Accessing Elements using Square Bracket Notation
To access elements in a NumPy array, you use square bracket notation [] followed by the index or indices of the desired element. Let's consider an example:

In [6]:
arr = np.array([10, 20, 30, 40, 50])
arr

array([10, 20, 30, 40, 50])

In [7]:
arr[0]

np.int64(10)

In [None]:
arr[2]

np.int64(30)

In [9]:
index = 1
arr[index]

np.int64(20)

## Accessing Elements in Multi-dimensional Arrays

In multi-dimensional arrays, you need to provide multiple indices to access elements. Each index corresponds to a specific dimension of the array. Let's consider a 2-dimensional array:

In [None]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [11]:
arr_2d[1, 2]

np.int64(6)

In [12]:
arr_2d[1][2]  # Same result as arr_2d[1, 2], but done in two steps: select row 1, then element 2 of that row

np.int64(6)

For higher-dimensional arrays, you simply provide more indices, one for each dimension.
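
For instance, a 3-dimensional array takes up to three indices. The array below is built with `np.arange(...).reshape(...)` purely to get distinct, easy-to-check values.

```python
import numpy as np

arr_3d = np.arange(24).reshape(2, 3, 4)  # 2 blocks, 3 rows, 4 columns

print(arr_3d[1, 2, 3])  # 23 (block 1, row 2, column 3)
print(arr_3d[0, 1])     # [4 5 6 7] (fewer indices return a sub-array)
print(arr_3d[1].shape)  # (3, 4)
```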

## Using Negative Indices
In NumPy, you can use negative indices to access elements from the end of an array. The last element of an array has an index of -1, the second-to-last element has an index of -2, and so on. Let's consider an example:

In [13]:
arr = np.array([10, 20, 30, 40, 50])
arr

array([10, 20, 30, 40, 50])

In [14]:
arr[-1]

np.int64(50)

In [15]:
arr[-2]

np.int64(40)

Negative indexing allows you to access elements from the end of an array without knowing its exact length.

You can also use negative indices in multi-dimensional arrays:

In [10]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [11]:
arr_2d[1, -1]

np.int64(6)

## Slicing NumPy Arrays
Slicing is a powerful feature in NumPy that allows you to extract subsets of elements from an array. It provides a concise and efficient way to access contiguous sections of an array. In this section, we'll explore the basic slicing syntax and how to slice single-dimensional and multi-dimensional arrays, as well as how to use step size in slicing.

### Basic Slicing Syntax
The basic syntax for slicing a NumPy array is as follows:

`arr[start:end:step]`

start: The starting index of the slice (inclusive). If omitted, it defaults to 0.

end: The ending index of the slice <b>(exclusive)</b>. If omitted, it defaults to the length of the array.

step: The step size or stride of the slice. It determines the increment between each element in the slice. If omitted, it defaults to 1.
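
The three parts of the syntax can be tried on a simple array of consecutive integers, where each value equals its own index:

```python
import numpy as np

arr = np.arange(10)  # [0 1 2 3 4 5 6 7 8 9]

print(arr[2:8:2])   # [2 4 6]    start=2, end=8 (exclusive), step=2
print(arr[:4])      # [0 1 2 3]  start omitted, defaults to 0
print(arr[7:])      # [7 8 9]    end omitted, runs to the end
print(arr[::3])     # [0 3 6 9]  only the step given
print(arr[8:2:-2])  # [8 6 4]    negative step walks backwards
```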

### Slicing Single-dimensional Arrays
Let's consider an example of slicing a single-dimensional array:

In [18]:
arr = np.array([1, 2, 3, 4, 5])
arr[1:4]

array([2, 3, 4])

In [19]:
arr[:3]

array([1, 2, 3])

In [20]:
arr[2:]

array([3, 4, 5])

In [21]:
arr[-3:]

array([3, 4, 5])

## Slicing Multi-dimensional Arrays

In [24]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

arr_2d[:2, :2]

array([[1, 2],
       [4, 5]])

In [25]:
# Extract a 1-D vector from the matrix: rows 1 onward, column 1
arr_2d[1:, 1]

array([5, 8])

## Slicing with Step Size

You can specify a step size in slicing to skip elements in the slice. The step size determines the increment between each element in the slice. Let's consider an example:

In [26]:
arr = np.array([1, 2, 3, 4, 5])
arr[::2]

array([1, 3, 5])

In [27]:
arr[::-1]

array([5, 4, 3, 2, 1])

In [28]:
arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d[::2, ::2]

array([[1, 3],
       [7, 9]])

It selects every other row and every other column from the array.

Slicing provides a flexible and efficient way to extract subsets of elements from NumPy arrays. It allows you to select contiguous sections of an array, skip elements using step size, and even reverse the order of elements.

In the next section, we'll explore advanced indexing techniques, such as Boolean indexing and fancy indexing, which offer even more powerful ways to select elements from arrays based on conditions and arbitrary index arrays.

## Advanced Indexing Techniques
In addition to basic indexing and slicing, NumPy provides advanced indexing techniques that allow you to select elements from arrays based on conditions and arbitrary index arrays. In this section, we'll explore Boolean indexing, fancy indexing, and how to combine indexing and slicing.

### Boolean Indexing
Boolean indexing allows you to select elements from an array based on a Boolean condition. You create a Boolean mask array of the same shape as the original array, where each element is either True or False. When you use this mask to index the array, only the elements at positions where the mask is True are selected.

In [None]:
arr = np.array([1, 2, 3, 4, 5])
mask = np.array([True, False, False, False, True])
result = arr[mask]
result

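# An integer array of 0s and 1s is NOT a Boolean mask. It is treated as fancy indexing,
# so the values are used as positions:
# arr0 = np.array([1, 2, 3, 4, 5])
# mask0 = np.array([1,0,0,1,1])
# result0 = arr0[mask0]
# result0   # array([2, 1, 1, 2, 2])

array([1, 5])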

In [15]:
arr = np.array([1, 2, 3, 4, 5])
arr > 3

# result = arr[arr > 3]
# result

array([False, False, False,  True,  True])
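
A comparison such as `arr > 3` can be used directly as the index, and several conditions can be combined with `&` (and), `|` (or) and `~` (not). Each condition needs its own parentheses because these operators bind more tightly than comparisons. A short sketch:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

greater = arr[arr > 3]               # [4 5]
middle = arr[(arr > 1) & (arr < 5)]  # [2 3 4]
outer = arr[(arr < 2) | (arr > 4)]   # [1 5]
odd = arr[~(arr % 2 == 0)]           # [1 3 5]

print(greater, middle, outer, odd)
```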

## Fancy Indexing
Fancy indexing allows you to select elements from an array using an array of indices. You can provide an array of integers specifying the indices you want to select.

In [45]:
arr = np.array([10, 20, 30, 40, 50])
indices = np.array([1, 0, 4])
result = arr[indices]
result

array([20, 10, 50])

In [16]:
# A distinctive way of addressing elements in an ndarray: each (row, column) pair selects one element

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
row_indices = np.array([0, 1, 2])
col_indices = np.array([1, 2, 0])
result = arr_2d[row_indices, col_indices]
result

array([2, 6, 7])

## Combining Indexing and Slicing
You can combine indexing and slicing techniques to select specific subsets of an array. This allows you to use basic indexing, slicing, boolean indexing, and fancy indexing together to extract the desired elements.

In [19]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result1 = arr[1:, [0, 2]]
result1

array([[4, 6],
       [7, 9]])

In [32]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
result2 = arr[arr[:, 1] > 3, 1:]
result2

array([[5, 6],
       [8, 9]])

Here, `arr[:, 1] > 3` creates a boolean mask that selects rows where the second column is greater than 3, and `1:` selects the second and third columns.

Combining indexing and slicing techniques provides flexibility in selecting specific subsets of an array based on various conditions and criteria.

Advanced indexing techniques, such as boolean indexing and fancy indexing, along with the ability to combine them with basic indexing and slicing, offer powerful ways to select and manipulate elements in NumPy arrays.

In the next section, we'll explore how to modify arrays using indexing and slicing, allowing you to update specific elements or subsets of an array.

## Modifying Arrays using Indexing and Slicing
Indexing and slicing not only allow you to access elements from arrays but also provide a way to modify specific elements or subsets of an array. In this section, we'll explore how to assign values to array elements and modify slices of an array.

### Assigning Values to Array Elements
You can use indexing to assign new values to specific elements of an array. By specifying the index or indices of the elements you want to modify, you can update their values.

In [33]:
arr = np.array([1, 2, 3, 4, 5])
arr[2] = 10
arr

array([ 1,  2, 10,  4,  5])

In [21]:
arr = np.array([1, 2, 3, 4, 5])
indices = np.array([0, 2, 4])
arr[indices]
arr[indices] = 0
arr

array([0, 2, 0, 4, 0])
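
Assignment also works with a Boolean mask and with multi-dimensional indices. The sketch below is one way to combine them. The replacement values (0 and 99) are arbitrary.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr[arr > 3] = 0          # every element greater than 3 is replaced
print(arr)                # [1 2 3 0 0]

arr_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
arr_2d[0, -1] = 99            # single element: row 0, last column
arr_2d[arr_2d % 2 == 0] = 0   # every even element is set to 0
print(arr_2d)
# [[ 1  0 99]
#  [ 0  5  0]
#  [ 7  0  9]]
```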

## Modifying Slices of an Array
Slicing allows you to modify entire subsets of an array at once. You can assign new values to a slice of an array, and the changes will be reflected in the original array.

In [35]:
arr = np.array([1, 2, 3, 4, 5])
arr[1:4] = 0
arr

array([1, 0, 0, 0, 5])

In [36]:
arr = np.array([1, 2, 3, 4, 5])
arr[1:4] = [10, 20, 30]
arr

array([ 1, 10, 20, 30,  5])

Here, we assign the array [10, 20, 30] to the slice arr[1:4], replacing the corresponding elements in the original array.

When modifying slices of an array, keep in mind that the changes affect the original array. If you want to modify a slice without changing the original array, you need to create a copy of the slice before making the modifications.

Modifying arrays using indexing and slicing provides a convenient way to update specific elements or subsets of an array. It allows you to change values, replace sections of an array, and perform in-place modifications efficiently. We will now explore the concept of views and copies in NumPy arrays when using slicing.

## Views vs. Copies in Slicing
When you slice or index a NumPy array, it's important to understand whether the operation returns a view or a copy of the original array. The behavior depends on the kind of indexing used.

### Views
Basic slicing (`arr[start:end:step]`) always returns a view of the original array. A view is a new array object that shares the same underlying data as the original array. Any modifications made to the view will affect the original array, and vice versa. Views are useful when you want to work with a subset of the array without copying the data.

In [22]:
arr = np.array([1, 2, 3, 4, 5])
view = arr[1:4]
view[0] = 10
view

array([10,  3,  4])

In [38]:
arr

array([ 1, 10,  3,  4,  5])

### Copies
Other indexing operations return a copy instead. A copy is a new array object with its own separate copy of the data. Modifying the copy does not affect the original array, and vice versa. Copies are useful when you want to work with a subset of the array independently, without modifying the original data.

Here are the scenarios where you get a copy:

+ When you use an advanced indexing operation, such as Boolean indexing or fancy indexing (covered in the Advanced Indexing Techniques section above), the result is always a copy.

+ When you explicitly request a copy using the `copy()` method, like `arr[1:4].copy()`.

Note that slices with a step, such as `arr[::2]` or `arr[::-1]`, are still basic slices: they return views with a different stride, not copies.

In [39]:
arr = np.array([1, 2, 3, 4, 5])
copy = arr[::2].copy()
copy[0] = 10
copy

array([10,  3,  5])

In [40]:
arr

array([1, 2, 3, 4, 5])

## Memory Sharing Check
NumPy provides two functions to check if two arrays share the same memory:

`np.may_share_memory(a, b)`: This function returns True if the arrays a and b might share memory, and False if they definitely do not share memory. By default it performs a quick check that only compares the memory bounds of the two arrays, so it can return True even when no element is actually shared.

`np.shares_memory(a, b)`: This function returns True if the arrays a and b actually share at least one memory location, and False otherwise. It performs an exact check, which can be much slower for some inputs.

In [41]:
a = np.array([1, 2, 3, 4, 5])
b = a[2:4]
c = a.copy()
np.may_share_memory(a, b)

True

In [42]:
np.shares_memory(a, b)

True

In [43]:
np.may_share_memory(a, c)

False

In [44]:
np.shares_memory(a, c)

False

In this example, b is a view of a, so they share the same memory. Both np.may_share_memory(a, b) and np.shares_memory(a, b) return True.

On the other hand, c is a copy of a, so they do not share memory. Both np.may_share_memory(a, c) and np.shares_memory(a, c) return False.

Using these functions can be helpful when you want to determine whether modifying one array will affect another array that shares the same memory.
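
The two functions can disagree. In the sketch below, two interleaved slices of the same array occupy overlapping memory ranges but have no element in common. The names `evens` and `odds` are illustrative.

```python
import numpy as np

a = np.arange(6)
evens = a[::2]   # elements at positions 0, 2, 4
odds = a[1::2]   # elements at positions 1, 3, 5

print(np.may_share_memory(evens, odds))  # True: their memory ranges overlap
print(np.shares_memory(evens, odds))     # False: no element is actually shared
print(np.shares_memory(evens, a))        # True: evens is a view of a
```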

## When to Use Views vs. Copies
The choice between using views or copies depends on your specific requirements:

Use views when you want to work with a subset of the array and any modifications should be reflected in the original array. Views are more memory-efficient since they don't create a new copy of the data.

Use copies when you want to work with a subset of the array independently, without modifying the original data. Copies ensure that the original array remains unchanged.

It's important to be aware of the difference between views and copies to avoid unintended modifications to the original array. If you're unsure whether an indexing operation returns a view or a copy, you can use the base attribute to check. If view.base is None, the array owns its data, which is what you get from a copy. If view.base is not None, the array is a view onto the memory of another array.
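
The `base` check can be tried on the main kinds of indexing. This is a sketch for 1-dimensional arrays. For more complex cases, `np.shares_memory` is the more direct test of whether two arrays overlap.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

basic = arr[1:4]        # basic slice
strided = arr[::2]      # slice with a step
fancy = arr[[0, 2, 4]]  # fancy indexing
masked = arr[arr > 2]   # Boolean indexing

print(basic.base is arr)    # True: a view of arr
print(strided.base is arr)  # True: also a view, with a different stride
print(fancy.base is None)   # True: owns its data
print(masked.base is None)  # True: owns its data

strided[0] = 100            # writing through a view changes the original
print(arr)                  # [100   2   3   4   5]
```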

Understanding the behavior of views and copies in slicing helps you write more precise and predictable code when working with NumPy arrays.

## Best Practices and Common Pitfalls
When working with indexing and slicing in NumPy, there are certain best practices to follow and common pitfalls to avoid. In this section, we'll discuss how to avoid out-of-bounds errors, understand the behavior of views vs. copies, and consider performance implications.

### Avoiding Out-of-Bounds Errors
One common pitfall when using indexing and slicing is trying to access elements that are outside the bounds of the array. This can lead to IndexError exceptions. To avoid out-of-bounds errors, make sure to:

- Use valid indices within the range of the array dimensions.
- Be cautious when using negative indices, ensuring they are within the valid range.
- Be mindful of the array shape when slicing, especially when using multi-dimensional arrays. Slice bounds that fall outside the array are silently clipped rather than raising an error.

Here's an example of an out-of-bounds error:

In [46]:
arr = np.array([1, 2, 3, 4, 5])
arr[10]  # Raises IndexError: index 10 is out of bounds for axis 0 with size 5

IndexError: index 10 is out of bounds for axis 0 with size 5

To avoid such errors, you can use techniques like:

- Checking the array shape using the shape attribute before accessing elements.
- Using conditional statements to ensure indices are within valid ranges.
- Handling exceptions using try-except blocks when necessary.
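
These techniques might look like the sketch below. `safe_get` is a hypothetical helper written for this example, not a NumPy function, and it handles 1-dimensional arrays only.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

def safe_get(a, index, default=None):
    """Return a[index] for a 1-D array, or default if index is out of range."""
    if -len(a) <= index < len(a):
        return a[index]
    return default

print(safe_get(arr, 2))   # 3
print(safe_get(arr, 10))  # None

# Alternatively, let NumPy raise and handle the exception
try:
    arr[10]
except IndexError as exc:
    print("caught:", exc)

# Slices do not raise for out-of-range bounds; they are clipped
print(arr[3:100])  # [4 5]
print(arr[10:])    # []
```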

## Understanding View vs. Copy Behavior
As discussed earlier, slicing can return either a view or a copy of the original array, depending on how the slicing is performed. It's important to understand this behavior to avoid unintended modifications to the original array.

Here are some best practices:

+ Be aware that basic slicing (e.g., arr[start:end]) typically returns a view, while advanced indexing (e.g., boolean indexing, fancy indexing) returns a copy.
+ If you want to ensure that modifications to the sliced array do not affect the original array, explicitly create a copy using the copy() method.
+ Use views when you want to work with a subset of the array and have modifications reflected in the original array, as views are more memory-efficient.

In [47]:
arr = np.array([1, 2, 3, 4, 5])
sliced_arr = arr[1:4].copy()
sliced_arr[0] = 10
arr  # Original array remains unchanged

array([1, 2, 3, 4, 5])

By creating an explicit copy using copy(), modifications to sliced_arr do not affect the original array arr.
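
For contrast, here is a minimal sketch of what happens without copy(): writing through a basic slice changes the original array, while writing to the result of fancy indexing does not. The variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

view = arr[1:4]         # basic slice: shares memory with arr
view[0] = 10            # also changes arr[1]

picked = arr[[0, 2]]    # fancy indexing: a new, independent array
picked[0] = 99          # arr is not affected

# arr is now [1, 10, 3, 4, 5]; picked is [99, 3]
```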

### Performance Considerations
Indexing and slicing can have performance implications, especially when working with large arrays. Here are a few performance considerations to keep in mind:

+ Accessing elements using basic indexing and slicing is generally fast, because a basic slice is a view that reuses the original array's memory instead of copying it.
+ Advanced indexing techniques, such as boolean indexing and fancy indexing, can be slower compared to basic indexing and slicing because they create new arrays.
+ When possible, use basic slicing instead of advanced indexing for better performance.
+ Be mindful of the size of the arrays you are working with. Basic slices are views and allocate no new data, but advanced indexing and copy() on large arrays create new large arrays, consuming memory.
+ If you need to perform repeated operations on a subset of an array, slice it once and reuse the result instead of repeating the slicing expression.

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
subset = arr[1:, 1:]  # Slice once and reuse it; this is a view of arr, not a separate copy
result = subset ** 2  # Perform operations on the subset

## NumPy Data Types
NumPy supports several categories of data types:

- Numeric types: integers, floating-point numbers, and complex numbers.
- Boolean type: represents either True or False.
- String type: fixed-length strings.
- Structured types: allows for creating compound data types similar to C structures.

### Integer Types
NumPy offers several integer data types with different sizes and ranges. The most commonly used integer types are:

+ int8: 8-bit signed integer (-128 to 127)
+ int16: 16-bit signed integer (-32,768 to 32,767)
+ int32: 32-bit signed integer (-2,147,483,648 to 2,147,483,647)
+ int64: 64-bit signed integer (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807)

NumPy also provides unsigned integer types, which have the same sizes as their signed counterparts but can only represent <b>non-negative</b> values:

+ uint8: 8-bit unsigned integer (0 to 255)
+ uint16: 16-bit unsigned integer (0 to 65,535)
+ uint32: 32-bit unsigned integer (0 to 4,294,967,295)
+ uint64: 64-bit unsigned integer (0 to 18,446,744,073,709,551,615)

In [23]:
arr_int32 = np.array([1, 2, 3], dtype=np.int32)  # 'i4' and 'int32' are equivalent
arr_int32

array([1, 2, 3], dtype=int32)

In [24]:
arr_uint8 = np.array([1, 2, 3], dtype=np.uint8)  # 'u1' and 'uint8' are equivalent
arr_uint8

array([1, 2, 3], dtype=uint8)
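
The ranges listed above can be queried with `np.iinfo()`. The sketch below also shows what happens when an array operation exceeds the range of a fixed-size integer: the result wraps around silently instead of raising an error. Variable names are illustrative.

```python
import numpy as np

int8_range = (np.iinfo(np.int8).min, np.iinfo(np.int8).max)   # (-128, 127)
uint16_max = np.iinfo(np.uint16).max                            # 65535

a = np.array([250], dtype=np.uint8)
b = np.array([10], dtype=np.uint8)

# 250 + 10 = 260 does not fit in uint8, so the result wraps to 260 - 256 = 4
wrapped = a + b
```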

### Floating-Point Types
Floating-point types are used to represent real numbers with decimal points. NumPy provides two main floating-point types:

+ float32: 32-bit single-precision floating-point number
+ float64: 64-bit double-precision floating-point number (default)

The float32 type has a precision of about 7 decimal digits, while float64 has a precision of about 15 decimal digits. The choice between float32 and float64 depends on the required precision and the memory constraints of your application.

It's important to note that floating-point arithmetic can introduce small errors due to the finite precision of the representation. These errors can accumulate over multiple operations, leading to numerical instabilities or inaccuracies in some cases.

In [25]:
arr_float32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)
arr_float32

array([1., 2., 3.], dtype=float32)

In [26]:
arr_float64 = np.array([1.0, 2.0, 3.0], dtype=np.float64)
arr_float64

array([1., 2., 3.])
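
To make the rounding issue concrete, here is a short sketch using the classic 0.1 + 0.2 example. It compares with `np.isclose()` instead of `==` and reads the machine epsilon of each type from `np.finfo()`.

```python
import numpy as np

total = np.float64(0.1) + np.float64(0.2)

exactly_equal = bool(total == 0.3)            # False: tiny rounding error
close_enough = bool(np.isclose(total, 0.3))   # True: equal within tolerance

# Machine epsilon: the gap between 1.0 and the next representable number
eps32 = np.finfo(np.float32).eps   # about 1.19e-07
eps64 = np.finfo(np.float64).eps   # about 2.22e-16
```

When comparing floating-point results, tolerance-based checks such as `np.isclose()` or `np.allclose()` are usually the safer choice.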

### Complex Types

NumPy provides two complex number types:

+ complex64: 64-bit complex number, consisting of two 32-bit floating-point numbers (real and imaginary parts)
+ complex128: 128-bit complex number, consisting of two 64-bit floating-point numbers (real and imaginary parts)

Complex numbers are useful in various scientific and engineering applications, such as signal processing, quantum mechanics, and Fourier analysis.

In [27]:
arr_complex64 = np.array([1+2j, 3+4j], dtype=np.complex64)
arr_complex64

array([1.+2.j, 3.+4.j], dtype=complex64)

In [28]:
arr_complex128 = np.array([1+2j, 3+4j], dtype=np.complex128)
arr_complex128

array([1.+2.j, 3.+4.j])

### Boolean Data Type

NumPy provides a boolean data type that represents logical values of either True or False. Boolean arrays are commonly used for conditional operations, filtering, and masking in NumPy.

Boolean arrays in NumPy are arrays that contain only boolean values (True or False). They are typically the result of comparison operations or logical operations on numeric arrays.

To create a boolean array, you can use comparison operators such as ==, !=, <, >, <=, >=, or logical operators such as & (and), | (or), and ~ (not).

In [29]:
np.array([True, False, True, True])

array([ True, False,  True,  True])

In [30]:
np.array([1, 2, 3, 0, -1], dtype=np.bool_)

array([ True,  True,  True, False,  True])

In [31]:
arr = np.array([1, 2, 3, 4, 5])

In [32]:
arr > 3

array([False, False, False,  True,  True])

In [33]:
arr == 2

array([False,  True, False, False, False])

In [34]:
(arr > 1) & (arr < 5)

array([False,  True,  True,  True, False])

#### Boolean Indexing and Masking

Boolean indexing and masking are powerful techniques in NumPy that allow you to select elements from an array based on boolean conditions. This is particularly useful when you want to filter or extract specific elements that satisfy certain criteria.

To perform boolean indexing, you can use a boolean array as an index to select elements from the original array. The resulting array will contain only the elements corresponding to the True values in the boolean array.

In [35]:
arr = np.array([1, 2, 3, 4, 5])
bool_arr = arr > 3
bool_arr

array([False, False, False,  True,  True])

In [36]:
arr[bool_arr]

array([4, 5])

Boolean masking is similar to boolean indexing but is used to assign values to specific elements of an array based on a boolean condition.

In [37]:
arr = np.array([1, 2, 3, 4, 5])
bool_mask = arr < 4
bool_mask

array([ True,  True,  True, False, False])

In [38]:
arr[bool_mask] = 0

In [39]:
arr

array([0, 0, 0, 4, 5])

<b>Boolean indexing and masking are incredibly useful for data manipulation and analysis tasks, such as filtering outliers, selecting specific subsets of data, or applying conditional transformations to arrays.</b>
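
As one possible illustration of the outlier-filtering use case, the sketch below flags values far from the median, then either drops them or replaces them. The data and the threshold of 20 are arbitrary assumptions chosen for the example.

```python
import numpy as np

data = np.array([10.0, 12.0, 11.0, 95.0, 9.0])
center = np.median(data)                    # 11.0

# Flag values more than 20 away from the median (threshold is an assumption)
is_outlier = np.abs(data - center) > 20

kept = data[~is_outlier]                        # drop the outliers
replaced = np.where(is_outlier, center, data)   # or replace them with the median
```

Boolean indexing returns a shorter array, while `np.where()` keeps the original shape, which matters when the positions of the values are meaningful.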

### String Data Type
NumPy provides a string data type to represent fixed-length strings in arrays. Although NumPy is primarily used for numerical computations, string arrays can be useful in certain scenarios, such as data preprocessing or working with categorical data.

In NumPy, there are two string data types: 'S' and 'U'. While both are used to represent string data, they have some important differences.

1. 'S' Data Type (Byte Strings):
   
+ The 'S' data type represents fixed-length byte strings.
+ It is specified as 'SN' (for example 'S10'), where N is the number of bytes per string element.
+ Byte strings are stored as raw bytes and do not have a specific encoding.
+ They can contain arbitrary binary data, including non-printable characters.
+ Byte strings are more memory-efficient compared to Unicode strings.

In [40]:
arr = np.array(['Hello', 'World'], dtype='S10')
arr

array([b'Hello', b'World'], dtype='|S10')

2. 'U' Data Type (Unicode Strings):
   
+ The 'U' data type represents fixed-length Unicode strings.
+ It is specified as 'UN' (for example 'U10'), where N is the number of characters (not bytes) per string element.
+ Unicode strings are stored as UTF-32 (UCS-4), using 4 bytes per character.
+ They can represent a wide range of characters from various scripts and languages.
+ Unicode strings require more memory compared to byte strings.

In [41]:
arr = np.array(['Hello', 'World'], dtype='U10')
arr

array(['Hello', 'World'], dtype='<U10')

The main differences between 'S' and 'U' data types are:

+ 'S' represents byte strings, while 'U' represents Unicode strings.
+ 'S' is specified in terms of bytes, while 'U' is specified in terms of characters.
+ 'S' can contain arbitrary binary data, while 'U' is designed to store text data.
+ 'S' is more memory-efficient, while 'U' can handle a wider range of characters.

#### When choosing between 'S' and 'U', consider the following:

+ If you are working with ASCII text or raw binary data, use the 'S' data type.
+ If you need to handle non-ASCII characters or multilingual text, use the 'U' data type.
+ Be aware of the memory implications, as Unicode strings require more memory than byte strings.

In [42]:
arr = np.array(['apple', 'banana', 'cherry'], dtype='S10')
arr

array([b'apple', b'banana', b'cherry'], dtype='|S10')
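
The memory difference between the two types can be checked with the itemsize attribute. This sketch also shows that strings longer than the declared length are truncated silently. Variable names are illustrative.

```python
import numpy as np

bytes_arr = np.array(['Hello', 'World'], dtype='S10')
unicode_arr = np.array(['Hello', 'World'], dtype='U10')

bytes_size = bytes_arr.itemsize       # 10 bytes per element
unicode_size = unicode_arr.itemsize   # 40 bytes per element (10 characters * 4 bytes)

# Values longer than the declared length are cut off without an error
truncated = np.array(['pineapple'], dtype='U4')   # stores 'pine'
```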

#### String Operations
NumPy provides several functions and methods to perform operations on string arrays. Here are a few commonly used string operations:

1. Comparison operations: You can use comparison operators such as ==, !=, <, >, <=, >= to compare string arrays element-wise.

In [43]:
arr = np.array(['apple', 'banana', 'cherry'])
arr == 'banana'

array([False,  True, False])

In [44]:
arr < 'cherry'

array([ True,  True, False])

2. Concatenation: You can concatenate string arrays element-wise using the np.char.add() function.

In [45]:
arr1 = np.array(['apple', 'banana'])
arr2 = np.array(['pie', 'split'])

In [46]:
np.char.add(arr1, arr2)

array(['applepie', 'bananasplit'], dtype='<U11')

3. Substring search: You can check if a substring exists in each element of a string array using the `np.char.find()` function.

In [47]:
arr = np.array(['apple', 'banana', 'cherry'])

np.char.find(arr, 'a')

array([ 0,  1, -1])

The `np.char.find()` function returns the index of the first occurrence of the substring in each element. If the substring is not found, it returns -1.

4. String manipulation: NumPy provides various functions for string manipulation, such as `np.char.upper()`, `np.char.lower()`, `np.char.title()`, `np.char.strip()`, etc.

In [3]:
arr = np.array(['apple', 'BANANA', 'Cherry'])

np.char.upper(arr)

array(['APPLE', 'BANANA', 'CHERRY'], dtype='<U6')

In [4]:
np.char.title(arr)

array(['Apple', 'Banana', 'Cherry'], dtype='<U6')
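
These string functions combine naturally with boolean indexing. As a brief sketch, the result of `np.char.find()` can be turned into a mask that selects the elements containing a substring. The substring 'an' is just an example.

```python
import numpy as np

arr = np.array(['apple', 'banana', 'cherry'])

# find() returns -1 where the substring is absent, so ">= 0" means "contains"
has_an = np.char.find(arr, 'an') >= 0
matches = arr[has_an]
```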

### Object Data Type

In NumPy, the object data type (np.object_, or the built-in object passed as dtype) allows arrays to hold objects of arbitrary Python types. The older alias np.object was deprecated and then removed in NumPy 1.24, so it should not be used. The object data type is particularly useful when dealing with heterogeneous data, where the elements of the array are not of the same type or when the elements are themselves Python objects.

Key Points:

1. Flexibility:

+ Object arrays can store any type of Python object, including lists, tuples, dictionaries, custom objects, etc.
+ This is in contrast to other NumPy data types (like np.int32, np.float64, etc.) which are designed to hold elements of a specific type for efficient numerical computation.

2. Performance:

+ While the object data type provides flexibility, it comes at the cost of performance. Operations on object arrays are generally slower compared to arrays of numerical types because the elements are Python objects and not raw numbers.

3. Memory Usage:

+ Object arrays may use more memory since each element is a reference to a Python object, which includes additional overhead.

4. Usage:

+ The object data type is often used when you need to store a mix of data types or when you need to include complex objects within a NumPy array.

In [5]:
# Creating an array with the object data type (np.object_)
obj_array = np.array([1, "apple", [3, 4, 5], {"key": "value"}], dtype=np.object_)
obj_array

array([1, 'apple', list([3, 4, 5]), {'key': 'value'}], dtype=object)

In [6]:
obj_array[2]

[3, 4, 5]

#### When to Use:

+ Use the object data type (dtype=object) when you need to store non-uniform data.
+ It's useful for quick prototyping or when dealing with data that inherently contains mixed types.
+ For numerical computations, prefer using fixed-type arrays for better performance and efficiency.

## Structured Data Types
NumPy allows you to create structured arrays, which are arrays with multiple fields of different data types. Structured arrays are similar to C structures or SQL database tables, where each element of the array can contain multiple named fields with different data types.

### Creating Structured Arrays
To create a structured array in NumPy, you need to define a <b>data type that specifies the names and data types of the fields</b>. You can define the data type using a list of tuples, where each tuple contains the field name and its corresponding data type.

Structured arrays are useful for organizing and manipulating complex data sets in NumPy. They provide a way to store and access data in a tabular format, similar to a database table or a spreadsheet. For example, consider the Titanic dataset, where each row represents a passenger and the columns represent different attributes like age, sex, fare, etc. In such cases, structured arrays can be used to represent the data in a structured format.

In [7]:
data_type = [('name', 'S10'), ('age', 'i4'), ('height', 'f8')]
arr = np.array([('John', 25, 1.8), ('Alice', 30, 1.6), ('Bob', 20, 1.7)], dtype=data_type)
arr

array([(b'John', 25, 1.8), (b'Alice', 30, 1.6), (b'Bob', 20, 1.7)],
      dtype=[('name', 'S10'), ('age', '<i4'), ('height', '<f8')])

In this example, we define a data type data_type that consists of three fields: 'name' (a byte string of at most 10 bytes), 'age' (a 32-bit integer), and 'height' (a 64-bit floating-point number).

We then create a structured array arr using np.array() and specify the dtype parameter as data_type. The array is initialized with three elements, each containing the values for the respective fields.

In [8]:
np.array([(1, 2.5, 'hello'), (2, 3.7, 'world')], dtype=[('id', 'i4'), ('value', 'f4'), ('label', 'S10')])


array([(1, 2.5, b'hello'), (2, 3.7, b'world')],
      dtype=[('id', '<i4'), ('value', '<f4'), ('label', 'S10')])

#### Accessing Fields in Structured Arrays
Once you have created a structured array, you can access individual fields by name and individual records by integer index.

To access a field by its name, use bracket notation with the field name, much like a dictionary lookup in regular Python:

In [9]:
data_type = [('name', 'S10'), ('age', 'i4'), ('height', 'f8')]
arr = np.array([('John', 25, 1.8), ('Alice', 30, 1.6), ('Bob', 20, 1.7)], dtype=data_type)

In [10]:
arr['name']

array([b'John', b'Alice', b'Bob'], dtype='|S10')

In [11]:
arr['age']

array([25, 30, 20], dtype=int32)

##### Accessing a Record (Row) by Index

In [13]:
arr[1]

np.void((b'Alice', 30, 1.6), dtype=[('name', 'S10'), ('age', '<i4'), ('height', '<f8')])
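
Field access combines with the usual NumPy tools. The following sketch, which reuses the same field layout under the illustrative name `people`, filters records with a boolean mask on one field, sorts by a field with the `order` parameter of `np.sort()`, and aggregates a numeric field.

```python
import numpy as np

data_type = [('name', 'S10'), ('age', 'i4'), ('height', 'f8')]
people = np.array(
    [('John', 25, 1.8), ('Alice', 30, 1.6), ('Bob', 20, 1.7)],
    dtype=data_type,
)

older = people[people['age'] > 21]        # records for John and Alice
by_age = np.sort(people, order='age')     # Bob, John, Alice
mean_height = people['height'].mean()     # average of the height field
```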

## Casting Data Types
In NumPy, you can cast arrays from one data type to another. Casting data types is useful when you need to convert an array to a different data type to perform certain operations or to save memory.

NumPy provides two main ways to cast data types: implicit type casting and explicit type casting.

### Implicit Type Casting
Implicit type casting, also known as upcasting, occurs automatically when NumPy performs an operation between arrays with different data types. NumPy promotes the data type of the resulting array to a type that can accommodate all possible values without losing precision.

Here are the rules for implicit type casting in NumPy:

1. When an operation involves arrays of the same data type, the resulting array maintains the same data type.
2. When an operation involves arrays of different data types, NumPy promotes the data type to a higher precision or a more general type that can represent all values.

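The type promotion hierarchy for numeric types in NumPy is as follows:

`bool -> int -> float -> complex`

Strings are not part of arithmetic promotion. np.array() will coerce a list that mixes numbers and strings into a string array, but arithmetic between numeric and string arrays raises an error.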

In [14]:
arr_int = np.array([1, 2, 3])
arr_float = np.array([1.5, 2.5, 3.5])

In [15]:
arr_int + arr_float


array([2.5, 4.5, 6.5])
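
The result type of an operation can be inspected without performing it by calling `np.result_type()`. The sketch below checks a few steps of the promotion hierarchy and then confirms one of them with real arrays. Variable names are illustrative.

```python
import numpy as np

bool_int = np.result_type(np.bool_, np.int8)                # int8
int_float = np.result_type(np.int32, np.float64)            # float64
float_complex = np.result_type(np.float64, np.complex128)   # complex128

flags = np.array([True, False, True])
counts = np.array([1, 2, 3], dtype=np.int8)

# bool is promoted to int8, so True counts as 1
mixed = flags + counts
```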

### Explicit Type Casting

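Explicit type casting, also known as type conversion, allows you to explicitly convert an array from one data type to another using the astype() method or by specifying the desired data type during array creation. When the target type is smaller or less general (downcasting), information can be lost. For example, casting floats to integers discards the fractional part instead of rounding.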

To cast an array to a different data type using the astype() method, you can pass the desired data type as an argument:

In [16]:
arr_float = np.array([1.5, 2.5, 3.5])

In [19]:
arr_float.astype(np.int32)

array([1, 2, 3], dtype=int32)

In [20]:
arr_float.astype(str)

array(['1.5', '2.5', '3.5'], dtype='<U32')

You can also specify the desired data type during array creation using the dtype parameter:



In [21]:
np.array([1.5, 2.5, 3.5], dtype=int)

array([1, 2, 3])
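
Because a float-to-integer cast truncates toward zero, it is worth deciding explicitly whether truncation or rounding is wanted. The sketch below shows both, and uses the `casting` parameter of astype() to refuse a lossy conversion. Variable names are illustrative.

```python
import numpy as np

values = np.array([-1.7, 1.7, 2.5])

truncated = values.astype(np.int32)          # fractional part is dropped
rounded = np.rint(values).astype(np.int32)   # round first (halves go to the even integer)

# casting='safe' only allows conversions that cannot lose information
try:
    values.astype(np.int32, casting='safe')
    safe_cast_allowed = True
except TypeError:
    safe_cast_allowed = False
```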

## Memory Considerations

Each data type in NumPy has a specific size in bytes. The memory usage of an array depends on the number of elements in the array and the size of each element's data type.

Here are the sizes of some common NumPy data types:

+ bool: 1 byte
+ int8, uint8: 1 byte
+ int16, uint16: 2 bytes
+ int32, uint32, float32: 4 bytes
+ int64, uint64, float64 (default): 8 bytes
+ complex64: 8 bytes (4 bytes for real part, 4 bytes for imaginary part)
+ complex128: 16 bytes (8 bytes for real part, 8 bytes for imaginary part)

To calculate the memory usage of an array, you can multiply the number of elements in the array by the size of the data type. For example, an array of 1 million float64 elements would consume approximately 8 MB of memory (1,000,000 * 8 bytes).

You can check the memory usage of an array using the `nbytes` attribute:

In [22]:
arr = np.zeros((1000, 1000), dtype=np.float64)  # 8000000 (8 MB)
arr.nbytes

8000000

### Choosing the Right Data Type
Choosing the appropriate data type is crucial for optimizing memory usage and performance in NumPy. Here are some guidelines to help you choose the right data type:

1. Precision: Consider the required precision for your data. If you don't need high precision, you can use smaller data types like float32 instead of float64, or int16 instead of int32. Smaller data types consume less memory.

2. Range: Ensure that the data type you choose can accommodate the range of values in your data. For example, if your data contains integer values between -128 and 127, you can use int8 instead of int32 to save memory.

3. Memory constraints: If you are working with large datasets and have limited memory resources, consider using smaller data types or even boolean arrays when applicable. For example, if you have a large array of integer values that can be represented as 0 or 1, using a boolean array (bool) instead of an integer array can significantly reduce memory usage.

4. Compatibility: Consider the data types required by the libraries or functions you are using. Some libraries may expect specific data types, and using incompatible data types can lead to errors or unexpected behavior.

5. Performance: Smaller data types are not always faster. They reduce memory traffic, which often helps, but some types (for example float16 on CPUs without native half-precision arithmetic) can be slower than float32 or float64. This depends on the specific hardware and the nature of the computations being performed, so measure before deciding.
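
The range and memory guidelines above can be sketched as follows: check the value range against `np.iinfo()` before downcasting, and compare nbytes before and after. The sample arrays are assumptions made up for the example.

```python
import numpy as np

# 0/1 values stored as int64 use 8 bytes each; as bool they use 1 byte each
flags = np.array([0, 1, 1, 0, 1], dtype=np.int64)
as_bool = flags.astype(np.bool_)
bytes_before, bytes_after = flags.nbytes, as_bool.nbytes   # 40 and 5

# Downcast to int8 only if every value fits in its range
readings = np.array([-100, 0, 120], dtype=np.int64)
info = np.iinfo(np.int8)
fits = readings.min() >= info.min and readings.max() <= info.max
small = readings.astype(np.int8) if fits else readings
```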

## Best Practices and Tips
When working with NumPy data types, following best practices and tips can help you write more efficient and maintainable code. In this section, we'll discuss two important best practices: specifying data types explicitly and avoiding unnecessary type casting.

### Specifying Data Types Explicitly
One of the best practices when creating NumPy arrays is to specify the data type explicitly using the dtype parameter. By explicitly specifying the data type, you can ensure that your arrays have the desired type from the beginning, avoiding potential issues and unexpected behavior.

Here are a few reasons why specifying data types explicitly is beneficial:

1. Memory usage: By specifying the data type explicitly, you have control over the memory usage of your arrays. You can choose the appropriate data type that fits your data and memory constraints, avoiding unnecessary memory consumption.

2. Performance: Specifying the data type explicitly can lead to better performance in certain cases. Creating the array with the right type up front avoids a later conversion with astype(), which would allocate and fill a second array.

3. Consistency: When you specify the data type explicitly, you ensure that your arrays have a consistent data type throughout your codebase. This can prevent subtle bugs and make your code more maintainable.

In [None]:
# Specifying data type explicitly
arr1 = np.array([1, 2, 3], dtype=np.int32)
arr2 = np.array([1.5, 2.5, 3.5], dtype=np.float64)

## Rules of Broadcasting
NumPy follows a set of rules to determine if broadcasting is possible between two arrays. These rules define how arrays with different shapes can be used in arithmetic operations. Let's explore each rule in detail.

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimension and works its way left. Two dimensions are compatible when:

+ They are equal
+ One of them is 1

If these conditions are not met, a ValueError: operands could not be broadcast together exception is thrown, indicating that the arrays have incompatible shapes.

Input arrays do not need to have the same number of dimensions. The resulting array will have the same number of dimensions as the input array with the greatest number of dimensions, where the size of each dimension is the largest size of the corresponding dimension among the input arrays. Note that missing dimensions are assumed to have size one.

- The first rule of broadcasting states that if the arrays have different numbers of dimensions, the shape of the array with fewer dimensions is padded with ones on its leading (left) side.

For example, consider an array A with shape (3, 4) and an array B with shape (4,):

`A.shape = (3, 4)`
`B.shape = (4,)`

To perform an operation between A and B, NumPy will pad the shape of B with a leading dimension of size 1:

`B.shape = (1, 4)`

After padding, the shapes of A and B are compatible for broadcasting.

In [23]:
a = np.random.rand(2, 3, 3)
b = np.random.rand(3)  # shape (3,) is treated as (1, 1, 3)

(a * b).shape

(2, 3, 3)

In [24]:
a = np.random.rand(2, 3, 3)
b = np.random.rand(3, 3)

(a * b).shape

(2, 3, 3)

In [25]:
# Raises ValueError: the trailing dimension of a (3) does not match the size of b (4)
a = np.random.rand(4, 3)
b = np.random.rand(4)

(a * b).shape

ValueError: operands could not be broadcast together with shapes (4,3) (4,) 
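
One common way to make the failing example work is to give b an explicit trailing axis of size 1, so that its shape becomes (4, 1) and lines up with the rows of a. The sketch below also uses `np.broadcast_shapes()` (available in NumPy 1.20 and later) to check compatibility without creating any data.

```python
import numpy as np

a = np.random.rand(4, 3)
b = np.random.rand(4)

# Shapes can be checked up front without doing any arithmetic
compatible_shape = np.broadcast_shapes((3, 4), (4,))   # (3, 4)

b_col = b[:, np.newaxis]    # shape (4, 1): one value per row of a
result = a * b_col          # shape (4, 3): each row of a is scaled by its b value
```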

- The second rule of broadcasting states that if one of the arrays has a dimension size of 1, it can be stretched to match the size of the corresponding dimension in the other array.

Let's consider an example where we have an array A with shape (3, 4) and a scalar value s:

`A.shape = (3, 4)`
`s = 10`

When performing an operation between A and s, NumPy will stretch the scalar value s to match the shape of A. The scalar value will be broadcast to all elements of A.

Mathematically, taking addition as the example operation, this can be represented as:

$$C_{ij} = A_{ij} + s$$

where i ranges from 0 to 2 and j ranges from 0 to 3.
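
A minimal sketch of the scalar case, using addition as the example operation: `np.broadcast_to()` makes the stretching visible by expanding the scalar to the shape of A. The stretched array is a read-only view and no data is actually duplicated.

```python
import numpy as np

A = np.arange(12).reshape(3, 4)
s = 10

C = A + s                                 # s is added to every element of A

stretched = np.broadcast_to(s, A.shape)   # what s looks like after stretching
same = np.array_equal(C, A + stretched)   # True: both give the same result
```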

- The third rule of broadcasting states that if the arrays have the same number of dimensions and the size of any dimension is 1, that dimension can be stretched to match the size of the corresponding dimension in the other array.

Consider an example where we have an array A with shape (3, 4) and an array B with shape (3, 1):

`A.shape = (3, 4)`
`B.shape = (3, 1)`

When performing an operation between A and B, NumPy will stretch the second dimension of B to match the size of the second dimension of A.

In [None]:
a = np.array([1, 3, 5])
a.shape  # (3,); treated as (1, 3) when broadcast against a 2-D array

(3,)

In [27]:
b = np.array([[1], [3], [5]])
b.shape

(3, 1)
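
Combining the two arrays above shows the rules working together: the shape (3,) is first padded to (1, 3), and then both size-1 dimensions are stretched, giving a (3, 3) result. The sketch also includes the (3, 4) and (3, 1) case from the text, with an array of ones standing in for A.

```python
import numpy as np

a = np.array([1, 3, 5])          # shape (3,)
b = np.array([[1], [3], [5]])    # shape (3, 1)

# (3,) -> (1, 3); then (1, 3) and (3, 1) both stretch to (3, 3)
total = a + b                    # total[i, j] == b[i, 0] + a[j]

A = np.ones((3, 4))
stretched_sum = A + b            # (3, 4) + (3, 1) -> (3, 4)
```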

### Vectorization

Vectorization is a fundamental concept in NumPy that allows you to perform operations on arrays efficiently without the need for explicit loops. It is a powerful technique that can significantly improve the performance and readability of your code when working with large datasets.

<b>Vectorization</b> refers to the process of converting scalar operations into vector operations. In other words, it involves performing operations on entire arrays or vectors instead of individual elements.

The secret behind NumPy's efficient vectorization lies in its optimized C implementation. When you perform a vectorized operation in NumPy, the actual computation is carried out by pre-compiled C code that is highly optimized for performance.

NumPy's C implementation takes advantage of several low-level optimizations:

1. Contiguous Memory Layout: NumPy arrays are stored in contiguous memory blocks, allowing for efficient memory access and cache utilization. This contiguous layout enables faster read and write operations compared to non-contiguous data structures like Python lists.

2. SIMD Instructions: Modern CPUs support Single Instruction, Multiple Data (SIMD) instructions, which allow multiple data elements to be processed simultaneously. NumPy's C implementation leverages SIMD instructions to perform operations on multiple array elements in parallel, significantly speeding up computations.

3. Loop Unrolling: Loop unrolling is a technique where the compiler optimizes loops by duplicating the loop body multiple times, reducing the overhead of loop control statements. NumPy's C implementation employs loop unrolling to minimize the overhead of iterating over array elements.

4. Caching and Memory Management: NumPy's C implementation is designed to efficiently utilize CPU caches and minimize memory transfers. It employs techniques like data locality optimization and cache-friendly algorithms to ensure optimal performance.

By leveraging these low-level optimizations, NumPy's vectorized operations can achieve significant speedups compared to pure Python implementations. The C code is pre-compiled and optimized for the specific hardware architecture, allowing NumPy to take full advantage of the available computing resources.

It's important to note that while vectorization is highly efficient, it may not always be the best approach for every problem. In some cases, especially when dealing with complex or non-vectorizable operations, explicit loops or other techniques may be more appropriate. However, for most common numerical computations, vectorization provides a powerful and efficient solution.
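
To make the idea concrete, here is a small sketch that computes the same result with an explicit Python loop and with a single vectorized expression. The formula is arbitrary. On large arrays the vectorized form is typically much faster, which can be measured with a tool such as %timeit.

```python
import numpy as np

values = np.arange(1, 6, dtype=np.float64)   # [1, 2, 3, 4, 5]

# Scalar approach: one Python-level operation per element
loop_result = np.empty_like(values)
for i in range(values.shape[0]):
    loop_result[i] = values[i] ** 2 + 1.0

# Vectorized approach: the same arithmetic applied to the whole array at once
vector_result = values ** 2 + 1.0
```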

### Writing Vectorized Code

When working with NumPy, writing vectorized code is essential for achieving optimal performance and readability. In this section, we'll explore some tips and best practices for writing vectorized code and walk through examples of vectorizing a function and using `np.vectorize()`.

#### Tips and Best Practices
Here are some tips and best practices to keep in mind when writing vectorized code in NumPy:

1. Think in terms of arrays: Approach problems from an array-oriented perspective. Consider how you can express operations on entire arrays rather than individual elements.
   
2. Leverage NumPy functions and operators: NumPy provides a wide range of functions and operators that operate on arrays. Familiarize yourself with these functions and use them instead of writing explicit loops.
   
3. Avoid Python loops: Whenever possible, try to replace Python loops with vectorized operations. Vectorized code is usually more concise and faster than loop-based code.
   
4. Use broadcasting: Broadcasting allows arrays with different shapes to be used in arithmetic operations. Leverage broadcasting to perform operations between arrays of different sizes without the need for explicit reshaping.
   
5. Use `np.vectorize()` for convenience, not speed: If you have a custom function that operates on scalar values, `np.vectorize()` wraps it so that it can be applied to arrays and supports broadcasting. It is implemented essentially as a Python loop, so it does not provide the performance of true vectorized operations.
   
6. Profile and optimize: When working with large datasets or complex operations, profile your code to identify performance bottlenecks. Use profiling tools like %timeit in Jupyter Notebook or cProfile to measure the execution time of different parts of your code and optimize accordingly.
 
7. Continuously learn and explore: NumPy is a vast library with many features and optimizations. Keep learning and exploring new techniques and functions to improve your vectorized code.
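
As a brief sketch of tips 2 and 5, the example below applies a scalar function to an array with `np.vectorize()` and then expresses the same logic with `np.where()`. The function `clip_negative` is a hypothetical example. Passing `otypes` fixes the output type, which otherwise would be inferred from the first result.

```python
import numpy as np

def clip_negative(x):
    """Scalar function (hypothetical example): replace non-positive inputs with 0."""
    return x if x > 0 else 0.0

arr = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Convenient, but still calls the Python function once per element
via_vectorize = np.vectorize(clip_negative, otypes=[float])(arr)

# True vectorized equivalent built from NumPy operations
via_where = np.where(arr > 0, arr, 0.0)
```

Both give the same values, but only the `np.where()` version runs the element-wise work in compiled code, so it is the better choice when the logic can be expressed that way.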