<img src="LaeCodes.png" 
     align="center" 
     width="100" />

# Introduction to NumPy 

NumPy, short for Numerical Python, is a fundamental library for numerical computing in Python. It is widely used in data science for its powerful array-processing capabilities, which allow for efficient manipulation and operations on large datasets. NumPy provides support for large multi-dimensional arrays and matrices, along with a collection of mathematical functions to operate on these arrays. 

**Key Features of NumPy**

- **Ndarray (N-Dimensional Array):** 
    - The core of NumPy is the ndarray object, which is a multi-dimensional array of homogeneous data types (elements are of the same type). 
    - It supports arrays of arbitrary dimensions (1D, 2D, 3D, etc.), making it highly versatile for various types of data. 

- **Mathematical Functions:**
    - NumPy includes a wide range of mathematical functions for operations on arrays, including basic arithmetic operations, statistical functions, linear algebra operations, and more. 

- **Vectorization:**
    - NumPy allows vectorized operations, which means you can perform element-wise operations on entire arrays without the need for explicit loops, leading to more concise and faster code. 

- **Broadcasting:**
    - Broadcasting is a powerful mechanism that allows NumPy to perform operations on arrays of different shapes by automatically expanding their dimensions to be compatible. 

- **Integration with Other Libraries:** 
    - NumPy integrates seamlessly with other scientific and data analysis libraries such as SciPy, pandas, matplotlib, and scikit-learn. 

**Common Uses of NumPy in Data Science**

- **Data Manipulation:**
    - Creating, reshaping, and slicing arrays for data manipulation. 
    - Efficiently handling large datasets due to its optimized performance. 

- **Mathematical Computations:** 
    - Performing complex mathematical operations on arrays, such as matrix multiplication, dot product, and element-wise operations. 

- **Statistical Analysis:** 
    - Computing statistical measures like mean, median, variance, and standard deviation on datasets. 

- **Data Preparation:**
    - Preprocessing data, including normalization, scaling, and transforming data into a suitable format for machine learning algorithms. 

- **Integration with Machine Learning:** 
    - Serving as a foundational library for machine learning frameworks and tools, aiding in the implementation of algorithms and models. 

- **NumPy vs. Python Lists:** 
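    - NumPy arrays store elements of a single type in a contiguous block of memory, so they use less memory and run numerical operations faster than Python lists. They also support element-wise arithmetic, broadcasting, and multi-dimensional indexing, which lists do not.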
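
The difference from lists shows up in ordinary arithmetic. The short sketch below contrasts the two; the variable names are illustrative and not part of the original notebook.

```python
import numpy as np

values = [1, 2, 3]
arr = np.array(values)

# Multiplying a list repeats it; multiplying an array scales each element.
repeated = values * 2          # [1, 2, 3, 1, 2, 3]
scaled = arr * 2               # array([2, 4, 6])

# Element-wise arithmetic on a list needs an explicit loop or comprehension.
scaled_list = [v * 2 for v in values]

print(repeated)
print(scaled)
print(scaled_list == scaled.tolist())  # True
```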

**Creating NumPy Arrays:** 

- From Lists:

In [1]:
import numpy as np
arr = np.array([1, 2, 3, 4, 5])

- Using NumPy Functions:

In [2]:
np.zeros((3, 3))  # Creates a 3x3 array of zeros
np.ones((2, 4))   # Creates a 2x4 array of ones
np.arange(10)     # Creates an array with values from 0 to 9
np.linspace(1, 5, 10)  # Creates an array of 10 evenly spaced values from 1 to 5 (both endpoints included)

array([1.        , 1.44444444, 1.88888889, 2.33333333, 2.77777778,
       3.22222222, 3.66666667, 4.11111111, 4.55555556, 5.        ])
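
A few other constructors are often used alongside the ones above. The following is a minimal sketch, assuming you want to control the shape or fill value directly; the variable names are placeholders.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # values 0..5 arranged as 2 rows x 3 columns
filled = np.full((2, 2), 7)         # 2x2 array where every element is 7
identity = np.eye(3)                # 3x3 identity matrix (float64)
points = np.linspace(0, 1, 5)       # 5 evenly spaced values, both endpoints included

print(grid)
print(points)   # [0.   0.25 0.5  0.75 1.  ]
```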

- **Array Attributes:**

    - shape: Returns a tuple giving the size of the array along each dimension.
    - dtype: Returns the data type of the array elements. 
    - ndim: Returns the number of dimensions. 
    - size: Returns the total number of elements. 
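
These attributes can be inspected directly on any array. The example below is a small illustration on a hypothetical 2x3 array of floats.

```python
import numpy as np

matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(matrix.shape)  # (2, 3): 2 rows, 3 columns
print(matrix.dtype)  # float64
print(matrix.ndim)   # 2
print(matrix.size)   # 6
```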

- **Array Indexing and Slicing:**
    - Accessing elements using indices:

In [3]:
arr = np.array([1, 2, 3, 4, 5])
arr[0]
arr[-1]

5

- Slicing arrays:

In [4]:
arr[1:3]
arr[:3]
arr[::2]

array([1, 3, 5])

**Indexing with Arrays and Boolean Arrays:**
- Using arrays to index:

In [5]:
arr = np.array([1, 2, 3, 4, 5])
idx = np.array([0, 2, 4])
arr[idx]

array([1, 3, 5])

- Using Boolean arrays:

In [6]:
arr[arr > 2] #Returns array ([3, 4, 5])
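
Boolean masks can also be combined and used for assignment. The sketch below uses illustrative names (`readings`, `clipped`) that do not appear in the original notebook. Note that conditions are combined with `&` and `|`, and each condition needs its own parentheses.

```python
import numpy as np

readings = np.array([1, 2, 3, 4, 5])

mask = readings > 2                                 # [False, False, True, True, True]
above_two = readings[mask]                          # array([3, 4, 5])
middle = readings[(readings > 1) & (readings < 5)]  # array([2, 3, 4])

# A mask can also select the positions to overwrite.
clipped = readings.copy()
clipped[clipped > 3] = 3                            # array([1, 2, 3, 3, 3])

print(above_two, middle, clipped)
```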

**Fancy Indexing:**
- Using arrays of indices to access multiple array elements:

In [7]:
arr = np.array([10, 20, 30, 40, 50])
idx = [0, 2, 3]
arr[idx]

array([10, 30, 40])

**Modifying Array Values through Indexing:**
- Assigning values to specific positions:

In [8]:
arr = np.array([1, 2, 3, 4, 5])
arr[0] = 10
arr[1:3] = [20, 30]
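
One detail worth knowing when modifying arrays: a basic slice is a view onto the original data, while indexing with a list or array of positions returns a copy. The sketch below repeats the assignments above and then illustrates that difference; `view` and `picked` are illustrative names.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr[0] = 10
arr[1:3] = [20, 30]
print(arr)            # [10 20 30  4  5]

# A basic slice is a view: writing to it changes the original array.
view = arr[3:]
view[0] = 40
print(arr)            # [10 20 30 40  5]

# Indexing with a list of positions returns a copy instead.
picked = arr[[0, 1]]
picked[0] = -1
print(arr[0])         # 10 (unchanged)
```

If an independent array is needed from a slice, calling `.copy()` on it avoids accidental changes to the original.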

**Basic Arithmetic Operations:**
- Element-wise operations

In [9]:
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr1 + arr2 #Returns array ([5, 7, 9])
arr1 * arr2 #Returns array ([4, 10, 18])

array([ 4, 10, 18])

**Universal Functions (ufuncs):**
- Applying functions element-wise:

In [10]:
np.sqrt(arr1) #Returns array([1.0, 1.41, 1.73])
np.exp(arr1) #Returns array([2.72, 7.39, 20.09])

array([ 2.71828183,  7.3890561 , 20.08553692])

**Aggregation Functions:**
- Computing summary statistics:

In [11]:
arr = np.array([1, 2, 3, 4, 5])
arr.sum()    #Returns 15
arr.mean()   #Returns 3.0
arr.std()    #Returns 1.41
arr.min()    #Returns 1
arr.max()    #Returns 5

5

**Axis-wise Operations:**
- Applying functions along specific axes:

In [12]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr.sum(axis=0)  #Returns array ([5, 7, 9])
arr.sum(axis=1)  #Returns array ([6, 15])

array([ 6, 15])
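
Axis-wise results are often combined with the original array, for example to center columns or to turn rows into proportions. The following is a minimal sketch on a hypothetical 2x3 array; it relies on broadcasting, which is described in the next section, and on the `keepdims` argument to keep the reduced axis with length 1.

```python
import numpy as np

data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_means = data.mean(axis=0)                # shape (3,): one mean per column
row_sums = data.sum(axis=1, keepdims=True)   # shape (2, 1): one sum per row

centered = data - col_means                  # each column now has mean 0
row_share = data / row_sums                  # each row now sums to 1

print(col_means)   # [2.5 3.5 4.5]
print(row_sums.shape)
```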

**Broadcasting and Vectorized Operations**

**What is Broadcasting?** 
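- Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations: the smaller array is virtually stretched across the larger one so that the shapes match, without copying the data.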

**Rules of Broadcasting:**
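- Rule 1: If the arrays do not have the same number of dimensions, prepend ones to the shape of the lower-dimensional array until both shapes have the same length.
- Rule 2: The two arrays are compatible in a dimension if they have the same size in that dimension or if one of them has size 1 in that dimension. 
- Rule 3: Where one array has size 1 and the other has a larger size, the size-1 array behaves as if it were copied along that dimension. If the arrays are incompatible in any dimension, NumPy raises a ValueError.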
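
The rules can be traced on a concrete pair of shapes. The sketch below uses illustrative arrays (`column`, `row`) to show one compatible case and one incompatible case.

```python
import numpy as np

column = np.arange(3).reshape(3, 1)   # shape (3, 1)
row = np.arange(4)                    # shape (4,), treated as (1, 4) by Rule 1

# Sizes 3 vs 1 and 1 vs 4 are both compatible under Rule 2, so the result is (3, 4).
table = column + row
print(table.shape)   # (3, 4)

# Shapes (2, 3) and (2,) are aligned from the right: 3 vs 2 is not compatible.
try:
    np.ones((2, 3)) + np.ones(2)
except ValueError as exc:
    print("incompatible shapes:", exc)
```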

**Examples of Broadcasting:**
- Adding a scalar to an array:

In [13]:
arr = np.array([1, 2, 3])
arr + 5 #Returns array ([6,7,8])

array([6, 7, 8])

- Adding two arrays of different shapes:

In [22]:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([1, 2, 3])
arr1 + arr2 #Returns array ([[2, 4, 6], [5, 7, 9]])

array([[2, 4, 6],
       [5, 7, 9]])

**Vectorized Operations**

**Benefits of Vectorization:**
- Vectorization allows operations to be applied element-wise to arrays, leading to more concise and readable code. 
- It also improves performance because the loops run in NumPy's compiled C code (and, for linear algebra, in optimized BLAS/LAPACK libraries) rather than in the Python interpreter.
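
To make the contrast concrete, the sketch below computes the same result twice, once with an explicit Python loop and once as a single array expression. The names `looped` and `vectorized` are illustrative.

```python
import math
import numpy as np

values = np.arange(1, 6, dtype=float)        # [1., 2., 3., 4., 5.]

# Loop version: one Python-level call per element.
looped = np.array([math.sqrt(v) + 1.0 for v in values])

# Vectorized version: the same computation as a single array expression.
vectorized = np.sqrt(values) + 1.0

print(np.allclose(looped, vectorized))       # True
```

Both versions give the same numbers; the vectorized form is shorter and, on large arrays, typically much faster.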

**Applying Vectorized Operations to Arrays:** 
- Example of vectorized operations:

In [23]:
arr = np.array([1, 2, 3, 4, 5])
np.exp(arr) #Returns array ([2.72, 7.39, 20.09, 54.60, 148.41])
np.log(arr) #Returns array ([0.0, 0.69, 1.10, 1.39, 1.61])

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791])

**I/O with NumPy**
- **Reading Data from Files:**
- **Using np.loadtxt():**

In [17]:
import numpy as np
data = np.loadtxt('data.txt')
print(data)

[[1. 2. 3.]
 [4. 5. 6.]
 [7. 8. 9.]]


- **Using np.genfromtxt():** <br>
Handles missing values and different delimiters:

In [18]:
data = np.genfromtxt('data.csv', delimiter=';', dtype=float, skip_header=1)
print(data)

[[16.99  1.01   nan   nan   nan   nan  2.  ]
 [10.34  1.66   nan   nan   nan   nan  3.  ]
 [21.01  3.5    nan   nan   nan   nan  3.  ]
 [23.68  3.31   nan   nan   nan   nan  2.  ]
 [24.59  3.61   nan   nan   nan   nan  4.  ]
 [  nan   nan   nan   nan   nan   nan   nan]]
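
The handling of missing values can be tried without a file on disk, since `np.genfromtxt()` also accepts file-like objects. The sketch below assumes a small semicolon-delimited table made up for illustration (the column names and values are hypothetical); empty fields are read as `nan`.

```python
import io
import numpy as np

# Hypothetical file contents; the empty field in the second data row is a missing value.
text = "bill;tip;size\n16.99;1.01;2\n10.34;;3\n21.01;3.50;3\n"

data = np.genfromtxt(io.StringIO(text), delimiter=";", dtype=float, skip_header=1)
print(data)

# nan-aware aggregations skip the missing entries.
col_means = np.nanmean(data, axis=0)
print(col_means)
```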


- **Writing Data to Files:** <br>
Using np.savetxt():

In [19]:
np.savetxt('output.txt', data)
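
`np.savetxt()` also accepts formatting options. The following is a minimal round-trip sketch, assuming a temporary directory and a hypothetical file name; `fmt`, `delimiter`, and `header` are existing keyword arguments of `np.savetxt()`.

```python
import os
import tempfile
import numpy as np

data = np.array([[1.5, 2.0], [3.25, 4.0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")   # hypothetical file name
    # fmt controls how each number is written; delimiter and header are optional.
    np.savetxt(path, data, fmt="%.2f", delimiter=",", header="a,b")
    # The header line is written with a leading "#", which loadtxt skips as a comment.
    restored = np.loadtxt(path, delimiter=",")

print(restored)
```

Text formats round values to the precision given in `fmt`, so values that need more digits will not be restored exactly.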

- **Using np.save() and np.load():** <br>
For binary format (more efficient storage):

In [20]:
np.save('data.npy', data)
loaded_data = np.load('data.npy')

- **Using np.savez():** <br>
Saving multiple arrays in a single .npz archive (uncompressed; np.savez_compressed() takes the same arguments and compresses the archive):

In [29]:
# Defining two arrays
data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

# Saving multiple arrays in one .npz archive
np.savez('data.npz', arr1=data1, arr2=data2)

# Loading and accessing saved arrays
loaded = np.load('data.npz')
print(loaded['arr1'])  # Outputs: [1 2 3]
print(loaded['arr2'])  # Outputs: [4 5 6]

[1 2 3]
[4 5 6]
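
When file size matters, `np.savez_compressed()` can be used with the same keyword arguments as `np.savez()`. The sketch below assumes a temporary directory and a hypothetical file name, and also shows the `files` attribute, which lists the names stored in the archive.

```python
import os
import tempfile
import numpy as np

data1 = np.array([1, 2, 3])
data2 = np.array([4, 5, 6])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data_compressed.npz")   # hypothetical file name
    np.savez_compressed(path, arr1=data1, arr2=data2)

    # np.load returns an archive object; using it as a context manager closes the file.
    with np.load(path) as archive:
        names = sorted(archive.files)
        restored1 = archive["arr1"]
        restored2 = archive["arr2"]

print(names, restored1, restored2)
```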
