# Numpy and Pandas
Numpy and Pandas are among the most widely used Python libraries for data manipulation and analysis. They provide data structures and functions for working with large datasets efficiently. Numpy is primarily used for numerical computation, while Pandas is built on top of Numpy and provides labeled data structures such as DataFrames for handling tabular data.
## Quick comparison
| Feature | Numpy | Pandas |
| --- | --- | --- |
| Data Structure | N-dimensional arrays | DataFrames (2D labeled data) |
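| Indexing | Integer positions, slices, boolean masks | Labels (`.loc`) and integer positions (`.iloc`) |
| Data Types | Homogeneous (one dtype per array) | Heterogeneous (one dtype per column) |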
| Operations | Element-wise operations, linear algebra | Data manipulation, aggregation, merging |
| Performance | Fast for numerical computations | Optimized for data manipulation |
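
The indexing and data-type rows of the table can be illustrated with a minimal sketch. The values, the column names `count` and `name`, and the row labels `a`, `b`, `c` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Numpy: one dtype for the whole array, and positions are the only index.
values = np.array([1, 2.5, 3])   # the integers are upcast to float64
print(values.dtype)              # float64
print(values[1])                 # 2.5 (integer position)

# Pandas: each column keeps its own dtype, and rows and columns carry labels.
table = pd.DataFrame(
    {"count": [1, 2, 3], "name": ["x", "y", "z"]},
    index=["a", "b", "c"],       # illustrative row labels
)
print(table.dtypes)              # one dtype per column
print(table.loc["b", "count"])   # 2 (label-based lookup)
print(table.iloc[1, 0])          # 2 (position-based lookup)
```
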


## Numpy Operations

To use Numpy, first import it, conventionally under the alias `np`. Numpy provides large, multi-dimensional arrays and matrices, along with a collection of mathematical functions that operate on them.

### Basic Numpy Operations
To create a Numpy array, use the `np.array()` function, which takes a list or tuple and converts it into an array:

`some_array = np.array(some_list)`

Passing a nested list or tuple creates a multi-dimensional array. For example:

`multi_dimensional_array = np.array([[1, 2, 3], [4, 5, 6]])`

This creates a 2D Numpy array with two rows and three columns, so its shape is `(2, 3)`.

Here are some basic operations you can perform with Numpy arrays:

```python
import numpy as np

# Create a 1D array; arrays are Numpy's core data structure
arr = np.array([1, 2, 3, 4, 5])
print(arr)  # Output: [1 2 3 4 5]

# Create a 2D array with two rows and three columns
multi_dimensional_arr = np.array([[1, 2, 3], [4, 5, 6]])
print(multi_dimensional_arr)
# Output:
# [[1 2 3]
#  [4 5 6]]
```

**Broadcasting** lets Numpy apply element-wise operations to arrays of different but compatible shapes. Shapes are compared from the trailing dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The smaller array then behaves as if it were repeated to match the larger one, with no explicit loop and no actual copy of the data.

```python
# Perform element-wise operations
arr_squared = arr ** 2  # Square each element; the scalar 2 is broadcast across the whole array
print(arr_squared)  # Output: [ 1  4  9 16 25]
```
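Broadcasting between arrays of different shapes, rather than an array and a scalar, can be sketched as follows. The array names and values are illustrative, not taken from the notebook.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])        # shape (2, 3)
row = np.array([10, 20, 30])          # shape (3,)
column = np.array([[100], [200]])     # shape (2, 1)

# The row is stretched across both rows of the matrix.
print(matrix + row)
# [[11 22 33]
#  [14 25 36]]

# The column is stretched across all three columns.
print(matrix + column)
# [[101 102 103]
#  [204 205 206]]

# Shapes that cannot be aligned raise a ValueError.
try:
    matrix + np.array([1, 2])         # shape (2,) does not match the trailing 3
except ValueError as exc:
    print("broadcast failed:", exc)
```
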
Arrays can be reshaped without copying their data, and individual elements can be edited in place. Reshaping is useful for preparing data for operations such as matrix multiplication or statistical analysis. Note that `reshape` returns a view of the same data whenever possible, so editing `reshaped_arr` below also changes `arr`.
```python
# Reshape the array
reshaped_arr = arr.reshape(5, 1)  # Reshaping to a 2D array with 5 rows and 1 column
print(reshaped_arr)  # Output: [[1] [2] [3] [4] [5]]
# Editing an element
reshaped_arr[0, 0] = 10  # Changing the first element to 10
print(reshaped_arr)  # Output: [[10] [ 2] [ 3] [ 4] [ 5]]
# Slicing the array
sliced_arr = reshaped_arr[1:4, 0]  # Rows 1 to 3 of column 0 (the stop index 4 is excluded)
print(sliced_arr)  # Output: [2 3 4]
```
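One consequence of reshaping without copying is that the reshaped array usually shares memory with the original. The sketch below uses fresh illustrative names to show the effect and how an explicit `copy()` avoids it.

```python
import numpy as np

original = np.array([1, 2, 3, 4, 5])
view = original.reshape(5, 1)              # a view onto the same buffer here
print(np.shares_memory(original, view))    # True

view[0, 0] = 10                            # writes through to the original
print(original)                            # [10  2  3  4  5]

independent = original.reshape(5, 1).copy()  # an explicit copy owns its data
independent[1, 0] = 99
print(original)                            # [10  2  3  4  5] (unchanged)
```
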


In [1]:
%pip install numpy


In [2]:
import numpy as np
# Create a Numpy array
arr = np.array([1, 2, 3, 4, 5])  # Arrays are Numpy's core data structure for storing and manipulating numerical data
print(arr)  # Output: [1 2 3 4 5]
# Create a multi-dimensional array
multi_dimensional_arr = np.array([[1, 2, 3], [4, 5, 6]])  # Creating a 2D array with two rows and three columns
print('multi_dimensional_arr:', multi_dimensional_arr)  # Output: [[1 2 3] [4 5 6]]

# Create a Numpy array with a specific data type
arr_with_dtype = np.array([1, 2, 3], dtype=np.float64)  # Specifying the data type as float64
print('arr_with_dtype:', arr_with_dtype)  # Output: [1. 2. 3.]
# Create a Numpy array with a specific shape
arr_with_shape = np.zeros((2, 3))  # Creating a 2D array filled with zeros, with shape (2, 3)
print('arr_with_shape:\n', arr_with_shape)  # Output: [[0. 0. 0.] [0. 0. 0.]]
print('shape of arr_with_shape:', arr_with_shape.shape)  # Output: (2, 3)

# Create a Numpy array with a specific range of values
arr_with_range = np.arange(0, 10, 2)  # Creating an array with values from 0 up to (but not including) 10, with a step of 2
print('arr_with_range:\n', arr_with_range)  # Output: [0 2 4 6 8]

# Create a Numpy array with random values
arr_with_random = np.random.rand(3, 2)  # Creating a 2D array with random values, with shape (3, 2)
print('arr_with_random:\n', arr_with_random)
print('shape of arr_with_random:', arr_with_random.shape)  # Output: (3, 2)

[1 2 3 4 5]
multi_dimensional_arr: [[1 2 3]
 [4 5 6]]
arr_with_dtype: [1. 2. 3.]
arr_with_shape:
 [[0. 0. 0.]
 [0. 0. 0.]]
shape of arr_with_shape: (2, 3)
arr_with_range:
 [0 2 4 6 8]
arr_with_random:
 [[0.72285723 0.67970159]
 [0.42166589 0.96170304]
 [0.87073293 0.19762868]]
shape of arr_with_random: (3, 2)


### Numpy Operations in Machine Learning
Numpy is widely used in machine learning for various tasks, such as data preprocessing, feature extraction, and model training. 

The advantages of using Numpy in machine learning include:
- **Performance**: Numpy operations run in compiled code over contiguous memory, so numerical computations are typically much faster than equivalent loops over Python lists.

- **Memory Efficiency**: An array stores fixed-size values of a single dtype in one contiguous buffer, which generally takes less memory than a Python list of separate objects and suits large datasets.

- **Vectorization**: Numpy supports vectorized operations, which compute on entire arrays at once without explicit Python loops.

- **Broadcasting**: Numpy supports broadcasting, which allows operations on arrays of different but compatible shapes, making it easier to apply mathematical operations across datasets.

  
- **Integration with Other Libraries**: Numpy is the foundation for many other libraries in the Python ecosystem, such as Pandas, Scikit-learn, and TensorFlow, making it a crucial component for machine learning workflows.
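
The vectorization and memory points above can be sketched with a small comparison. The array size `n` and the names are illustrative; timings are deliberately left out because they vary by machine, and a tool such as `timeit` could be used to measure them.

```python
import numpy as np

n = 1_000
python_list = list(range(n))
array = np.arange(n, dtype=np.int64)

# Explicit loop over a Python list.
squares_loop = [x * x for x in python_list]

# Vectorized: one expression over the whole array, no Python-level loop.
squares_vec = array * array

print(np.array_equal(squares_vec, squares_loop))  # True

# Each int64 element takes 8 bytes in one contiguous buffer.
print(array.nbytes)                               # 8000
```
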




#### Boolean Indexing in Numpy
Numpy supports boolean operations for filtering and manipulating data based on conditions. Applying a comparison to an array produces a boolean array, which can then be used to index the original array and select the matching elements.
```python
# Create a boolean array based on a condition
arr = np.array([1, 2, 3, 4, 5])  # Creating a Numpy array   
bool_arr = arr > 2  # Creating a boolean array where elements greater than 2 are marked as True
print(bool_arr)  # Output: [False False  True  True  True]
```
Note that the comparison returns a boolean array, in which each element records whether the condition holds for the corresponding element of the original array. Here, elements greater than 2 are marked `True` and the others `False`. This is different from the filtered array, which contains only the elements that meet the condition.
```python
# Use the boolean array for indexing
filtered_arr = arr[bool_arr]  # Filtering the original array using the boolean array
print(filtered_arr)  # Output: [3 4 5]
```
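Boolean arrays can also be combined and used for assignment. The following sketch reuses the same example values; note that the element-wise operators `&`, `|`, and `~` are used instead of `and`, `or`, and `not`, and each condition needs its own parentheses.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

# Combine conditions element-wise; the parentheses are required.
middle = arr[(arr > 1) & (arr < 5)]
print(middle)                         # [2 3 4]

outside = arr[(arr < 2) | (arr > 4)]
print(outside)                        # [1 5]

# np.where keeps the shape and picks a value per element.
clipped = np.where(arr > 2, arr, 0)
print(clipped)                        # [0 0 3 4 5]

# Assigning through a mask edits only the matching elements.
arr[arr > 3] = -1
print(arr)                            # [ 1  2  3 -1 -1]
```
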


## Pandas 

Pandas is a library for data manipulation and analysis in Python. Its data structures, DataFrames and Series, make it easy to work with structured data. Pandas is built on top of Numpy and is widely used in data science and machine learning.

Pandas can read, edit, and write data in various formats, including CSV, Excel, SQL databases, and more. It also provides powerful tools for data cleaning, transformation, and analysis.
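
As a minimal sketch of the read, edit, and write cycle, the example below round-trips CSV data through in-memory buffers. The CSV text is made up for illustration; in practice `pd.read_csv` and `DataFrame.to_csv` would usually be given file paths.

```python
import io
import pandas as pd

# Illustrative CSV text standing in for a file on disk.
csv_text = "Name,Age\nAlice,25\nBob,30\n"

people = pd.read_csv(io.StringIO(csv_text))   # read
people["Age"] = people["Age"] + 1             # edit a column

buffer = io.StringIO()
people.to_csv(buffer, index=False)            # write; index=False omits row labels
print(buffer.getvalue())
```
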

The main data structures in Pandas are:
- `Series`: A one-dimensional labeled array that can hold any data type (similar to a column in a spreadsheet).

- `DataFrame`: A two-dimensional labeled data structure with columns of potentially different types (similar to a table in a database or a spreadsheet).
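
The relationship between the two structures can be sketched as follows. The names and ages are illustrative values.

```python
import pandas as pd

# A Series is a 1-D array of values plus an index of labels.
ages = pd.Series([25, 30, 35], index=["Alice", "Bob", "Charlie"], name="Age")

print(ages["Bob"])        # 30 (lookup by label)
print(ages.iloc[0])       # 25 (lookup by position)
print(ages.mean())        # 30.0

# A DataFrame column is itself a Series.
frame = pd.DataFrame({"Age": ages})
print(type(frame["Age"]).__name__)   # Series
```
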




### Basic Pandas Operations

To create a Pandas DataFrame, use the `pd.DataFrame()` constructor, where `pd` is the conventional alias for the Pandas library.

The constructor takes a dictionary, a list of lists, or another data structure such as a Numpy array and converts it into a DataFrame. The syntax is as follows:
- `pd.DataFrame(data, index=None, columns=None)`

This creates a DataFrame from the provided data. The `columns` and `index` parameters set the column names and row labels, respectively. To access a column, use `df['column_name']`, where `df` is the DataFrame and `column_name` is the name of the column; the result is a Series.
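
As a small sketch of the `columns` and `index` parameters, the example below builds a DataFrame from a list of lists. The row labels `r1` and `r2` are hypothetical and chosen only to show label-based access.

```python
import pandas as pd

rows = [
    ["Alice", 25],
    ["Bob", 30],
]
# columns names the fields; index labels the rows (hypothetical labels).
df = pd.DataFrame(rows, columns=["Name", "Age"], index=["r1", "r2"])

print(df.shape)                 # (2, 2)
print(df["Age"].tolist())       # [25, 30]
print(df.loc["r2", "Name"])     # Bob
```
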



In [4]:
%pip install pandas



In [1]:
import pandas as pd  # Importing the Pandas library
# Create a Pandas DataFrame from a dictionary
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Los Angeles', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
print("=" * 20 + " DataFrame Example " + "=" * 20)
print(df['Name'])  # Accessing a specific column

      Name  Age         City
0    Alice   25     New York
1      Bob   30  Los Angeles
2  Charlie   35      Chicago
==================== DataFrame Example ====================
0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
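
The boolean filtering shown earlier for Numpy arrays carries over to DataFrames. This sketch rebuilds the same example data; the threshold of 26 is arbitrary.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Age": [25, 30, 35],
    "City": ["New York", "Los Angeles", "Chicago"],
})

mask = df["Age"] > 26            # boolean Series aligned to the rows
older = df[mask]                 # keeps only the rows where mask is True
print(older["Name"].tolist())    # ['Bob', 'Charlie']

print(df["Age"].mean())          # 30.0
```
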
