# Module 0: Introduction to Scikit-Learn

## Part 4: Pandas

In this part, we will explore the powerful data manipulation library Pandas.

### 4.1 Introduction to Pandas

Pandas is a popular Python library that provides data structures and functions for loading, cleaning, and transforming tabular data efficiently. It is widely used for data analysis and for preparing data before applying machine learning algorithms. Let's look at how Pandas can be used for data preprocessing.

Pandas introduces two essential data structures: Series and DataFrame.

- Series: A one-dimensional array-like object containing data and an associated array of labels, called the index.

- DataFrame: A two-dimensional tabular data structure representing data as rows and columns, similar to a spreadsheet or SQL table.
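Here is a small sketch that builds both structures by hand. The names, ages, and cities are made-up illustration data. It shows that a Series pairs values with index labels, and that selecting a single column of a DataFrame returns a Series.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
ages = pd.Series([25, 32, 47], index=["alice", "bob", "carol"], name="age")

# A DataFrame: a two-dimensional table of rows and columns sharing one index
people = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "city": ["Paris", "Berlin", "Madrid"],
    },
    index=["alice", "bob", "carol"],
)

print(ages["bob"])          # label-based access into the Series
print(people["age"])        # selecting one column gives back a Series
print(type(people["age"]))  # <class 'pandas.core.series.Series'>
```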

### 4.2 Loading data with Pandas

Pandas provides various functions to load data from different sources, such as CSV files, Excel files, databases, and more. Here's an example of loading data from a CSV file:

```python
import pandas as pd

# Load data from CSV file
df = pd.read_csv('data.csv')
```
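The example above assumes a file named `data.csv` exists on disk. To experiment without a file, `pd.read_csv()` also accepts any file-like object. The sketch below wraps hypothetical CSV text in `io.StringIO`. Note that the empty field in the `age` column is read as a missing value (`NaN`).

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real 'data.csv' file
csv_text = """name,age,city
alice,25,Paris
bob,,Berlin
carol,47,Madrid
"""

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```

Other loaders such as `pd.read_excel()` and `pd.read_sql()` follow the same pattern and return a DataFrame.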

### 4.3 Exploring data with Pandas

Pandas offers a wide range of functions to explore and understand the data. Some common functions include:

- `df.head()`: Display the first few rows of the DataFrame (five by default).
- `df.info()`: Print a summary of the DataFrame, including column data types, non-null counts, and memory usage.
- `df.describe()`: Generate descriptive statistics (count, mean, standard deviation, min, quartiles, max) for numerical columns.
- `df.shape`: Get the dimensions of the DataFrame as a `(rows, columns)` tuple. This is an attribute, not a method.
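The sketch below applies these functions to a small made-up dataset. It also adds `df.isna().sum()`, which counts the missing values in each column directly. `df.info()` prints its summary itself rather than returning it.

```python
import io

import pandas as pd

# Hypothetical data with one missing age and one missing income
csv_text = """name,age,income
alice,25,40000
bob,,52000
carol,47,61000
dave,33,
"""
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))       # first two rows (default is 5)
df.info()               # dtypes, non-null counts, memory usage; prints directly
print(df.describe())    # summary statistics for the numeric columns
print(df.shape)         # (rows, columns) -> (4, 3)
print(df.isna().sum())  # number of missing values per column
```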

### 4.4 Data preprocessing with Pandas

Pandas allows you to perform various data preprocessing tasks efficiently, such as:

- Handling missing data: Use `df.dropna()` to drop rows with missing values, or `df.fillna()` to fill them with a constant or a computed value such as the column mean.
- Encoding categorical variables: Use `pd.get_dummies()` for one-hot encoding, or `df['column'].map()` with a dictionary for label encoding.
- Feature scaling: Scale numerical features with vectorized arithmetic on a column, or with `df['column'].apply()` or `df['column'].transform()`.
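A minimal sketch of these three tasks on hypothetical data follows. It assumes mean imputation for `age`, median imputation for `income`, and an arbitrary city-to-integer mapping. All three are illustrative choices, not recommendations. Standardization uses plain vectorized arithmetic, which is usually faster than `apply()`. Min-max scaling uses `apply()` with a per-element function.

```python
import io

import pandas as pd

# Hypothetical data with missing values and a categorical column
csv_text = """name,age,city,income
alice,25,Paris,40000
bob,,Berlin,52000
carol,47,Madrid,61000
dave,33,Paris,
"""
df = pd.read_csv(io.StringIO(csv_text))

# Missing data: drop incomplete rows, or fill each column with a statistic
dropped = df.dropna()  # keeps only alice and carol
filled = df.fillna({"age": df["age"].mean(), "income": df["income"].median()})

# One-hot encoding: one indicator column per city, replacing "city"
one_hot = pd.get_dummies(filled, columns=["city"])

# Label encoding with map(); this mapping is a hypothetical example
city_codes = {"Berlin": 0, "Madrid": 1, "Paris": 2}
filled["city_code"] = filled["city"].map(city_codes)

# Feature scaling: standardize age to zero mean and unit (sample) variance
age = filled["age"]
filled["age_std"] = (age - age.mean()) / age.std()

# Min-max scaling with apply(): per-element function, min and max precomputed
lo, hi = filled["income"].min(), filled["income"].max()
filled["income_minmax"] = filled["income"].apply(lambda x: (x - lo) / (hi - lo))

print(filled)
```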

We will dive into data analysis and data preprocessing in the next module.

### 4.5 Summary

Pandas is a powerful data manipulation library that provides essential data structures and functions to handle and preprocess data efficiently. It simplifies data exploration, cleaning, and preparation tasks, making it an invaluable tool for data analysis and machine learning.