### **Module 1: Introduction to Pandas and Data Structures (In-Depth Guide)**

This module introduces the essential concepts of **Pandas**, a widely used library in data science, machine learning (ML), and artificial intelligence (AI). We will cover the basics of working with **Series** and **DataFrames**, the core data structures in Pandas, and explore how to create, manipulate, and inspect data.

---

### **1.1 Introduction to Pandas**

#### **Why Pandas is essential for Data Science, ML, and AI**
Pandas is an open-source data manipulation and analysis library for Python. It provides high-level data structures (Series and DataFrames) and numerous functions that make data cleaning, wrangling, and preprocessing much easier.

Key reasons why Pandas is crucial for data science and ML:
- **Data manipulation**: Easily filter, group, merge, and reshape datasets.
- **Data cleaning**: Pandas simplifies handling missing data, dealing with duplicates, and correcting inconsistencies.
- **Integration**: Pandas works seamlessly with other data science libraries like NumPy, Matplotlib, and Scikit-learn.
- **Performance**: Built on top of NumPy, Pandas runs vectorized, column-wise operations much faster than equivalent Python loops.
  
#### **Installation and Setup**
You can install Pandas using `pip` (the package manager for Python). Ensure that you have Python and pip installed on your machine.

To install Pandas:
```bash
pip install pandas
```

Once installed, you can start using Pandas by importing it in your Python scripts:
```python
import pandas as pd
```
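To confirm that the installation worked, one quick check is to print the installed version. This is a minimal sketch; the version string you see depends on your environment.

```python
import pandas as pd

# Print the installed Pandas version to confirm the import works
version = pd.__version__
print(version)
```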

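**NumPy** is a required dependency of Pandas, so `pip install pandas` installs it automatically. You only need the command below to install NumPy on its own, for example in an environment without Pandas:
```bash
pip install numpy
```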

---

### **1.2 Pandas Data Structures**

The two core data structures in Pandas are:
1. **Series**: A one-dimensional array-like structure that can hold any data type (integers, floats, strings, etc.).
2. **DataFrame**: A two-dimensional table with rows and columns, much like an Excel spreadsheet, where each column can have a different data type.

#### **Series:**
A **Series** is like a column in a table, but it comes with labels (indices). It can store any data type: integers, floats, strings, or even Python objects.

**Creating a Series:**
```python
import pandas as pd

# Creating a Series from a list
s = pd.Series([10, 20, 30, 40, 50])

# Specifying custom index labels
s_custom_index = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s)
print(s_custom_index)
```

**Key Properties:**
- `values`: Returns the values of the Series (a NumPy array for standard numeric data).
- `index`: Returns the index (labels) of the Series.

```python
print(s.values)
print(s.index)
```
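A Series can also be built from a dictionary, in which case the keys become the index labels. The sketch below uses a made-up `prices` Series (the names and values are illustrative assumptions, not part of the module's dataset) to show label-based lookup and element-wise arithmetic.

```python
import pandas as pd

# Build a Series from a dictionary: keys become the index labels
prices = pd.Series({'apple': 1.5, 'banana': 0.5, 'cherry': 3.0})

# Look up a single value by its label
apple_price = prices['apple']

# Arithmetic is applied element-wise, and the labels are preserved
doubled = prices * 2

print(prices.dtype)   # float64
print(apple_price)    # 1.5
print(doubled)
```

Because the labels travel with the values, `doubled['cherry']` still refers to the cherry entry after the multiplication.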

#### **DataFrame:**
A **DataFrame** is a 2D table (tabular data) with labeled axes (rows and columns). It is the most commonly used data structure in Pandas.

**Creating a DataFrame:**
You can create a DataFrame from various inputs like dictionaries, lists, or NumPy arrays.

Example 1: Creating a DataFrame from a dictionary of lists:
```python
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
}

df = pd.DataFrame(data)
print(df)
```

Example 2: Creating a DataFrame from a list of dictionaries:
```python
data = [
    {'Name': 'Alice', 'Age': 25, 'Salary': 50000},
    {'Name': 'Bob', 'Age': 30, 'Salary': 60000},
    {'Name': 'Charlie', 'Age': 35, 'Salary': 70000}
]

df = pd.DataFrame(data)
print(df)
```
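Creating a DataFrame from a NumPy array follows the same pattern. This is a small sketch that reuses the ages and salaries from the examples above; the variable names `array` and `df_from_array` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

# A 3x2 NumPy array: each row is one record, each column one field
array = np.array([[25, 50000], [30, 60000], [35, 70000]])

# Column names are supplied separately because the array carries no labels
df_from_array = pd.DataFrame(array, columns=['Age', 'Salary'])
print(df_from_array)
print(df_from_array.shape)  # (3, 2)
```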

**Key Properties:**
- `columns`: Returns the column labels.
- `index`: Returns the row labels.
- `shape`: Returns the shape (number of rows, number of columns) of the DataFrame.

```python
print(df.columns)
print(df.index)
print(df.shape)
```
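Since each column can hold a different data type, it can help to look at the `dtypes` property as well. The sketch below rebuilds the example DataFrame with `Salary` written as floats (an assumption made here only to show a second numeric type).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000.0, 60000.0, 70000.0]
})

# dtypes reports one data type per column
column_types = df.dtypes
print(column_types)
```

Here `Age` is stored as `int64` and `Salary` as `float64`; how the text column `Name` is reported depends on the Pandas version.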

---

### **1.3 Basic Operations with Pandas**

#### **Loading Data from Files**
Pandas makes it easy to load data from external sources such as CSV, Excel, and JSON files, as well as SQL databases.

1. **Loading a CSV File:**
```python
df = pd.read_csv('file_path.csv')
```

2. **Loading an Excel File** (needs an optional engine such as `openpyxl` for `.xlsx` files):
```python
df = pd.read_excel('file_path.xlsx')
```

3. **Loading a JSON File:**
```python
df = pd.read_json('file_path.json')
```
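The file paths above are placeholders. To try `read_csv()` without a file on disk, one option is to read from an in-memory string, as in this sketch. The CSV text is made up to match the earlier examples, and `io.StringIO` stands in for a real file.

```python
import io
import pandas as pd

# In-memory text standing in for the contents of a CSV file
csv_text = """Name,Age,Salary
Alice,25,50000
Bob,30,60000
Charlie,35,70000
"""

# read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)

# Write the DataFrame back out as CSV text, leaving out the row index
round_trip = df.to_csv(index=False)
print(round_trip)
```

Passing `index=False` to `to_csv()` keeps the automatically generated row labels out of the output, which is usually what you want when the index carries no meaning.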

#### **Basic Inspection of DataFrames**
Once the data is loaded, you can inspect it using basic Pandas functions to understand its structure and content.

1. **`head()`**: Displays the first 5 rows of the DataFrame by default; pass a number, such as `head(10)`, to show a different number of rows.
```python
print(df.head())
```

2. **`tail()`**: Displays the last 5 rows of the DataFrame by default; it accepts a row count in the same way.
```python
print(df.tail())
```

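3. **`info()`**: Prints a summary of the DataFrame, including column data types and non-null counts. It prints directly and returns `None`, so wrapping it in `print()` would add a stray `None` to the output.
```python
df.info()
```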

4. **`describe()`**: Generates descriptive statistics for numeric columns (e.g., count, mean, standard deviation).
```python
print(df.describe())
```

5. **`shape`**: Returns the shape (number of rows, number of columns).
```python
print(df.shape)
```
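The inspection functions can be tried on a small DataFrame with one missing value. This is a sketch under assumptions: the fourth row (`Dana`) is invented for illustration, and `isna().sum()` is used here as one common way to count missing values per column.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Age': [25, 30, 35, None],   # None becomes NaN, a missing value
    'Salary': [50000, 60000, 70000, 80000]
})

# head(n) and tail(n) accept the number of rows to show
print(df.head(2))
print(df.tail(1))

# isna().sum() counts the missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# describe() skips missing values when computing statistics
mean_age = df['Age'].describe()['mean']
print(mean_age)  # 30.0
```

The mean age is computed from the three known ages only, which is why the missing entry does not affect it.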

---

### **1.4 Indexing and Selecting Data**

#### **1.4.1 Using `[]` Operator:**
The `[]` operator can be used to select a single column or a subset of columns from a DataFrame.

- Select a single column:
```python
age_column = df['Age']
print(age_column)
```

- Select multiple columns:
```python
subset = df[['Name', 'Salary']]
print(subset)
```
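The two forms of `[]` return different types, which matters when you chain further operations. The following sketch rebuilds the example DataFrame and compares them.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

# A single label returns a one-dimensional Series
age_series = df['Age']
print(type(age_series).__name__)   # Series

# A list of labels returns a DataFrame, even with only one label in the list
age_frame = df[['Age']]
print(type(age_frame).__name__)    # DataFrame
print(age_frame.shape)             # (3, 1)
```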

#### **1.4.2 Using `.loc[]` and `.iloc[]`:**
1. **`.loc[]`**: Used for label-based indexing. You can select rows and columns by their labels (row/column names). Label slices include both endpoints, so `0:2` selects the rows labeled 0, 1, and 2.
```python
# Select a single row by its label
row = df.loc[0]

# Select rows labeled 0 through 2 (inclusive) and two columns by name
subset = df.loc[0:2, ['Name', 'Salary']]
print(subset)
```

2. **`.iloc[]`**: Used for integer position-based indexing. You can select rows and columns by their integer positions (similar to NumPy). Position slices exclude the end, so `0:2` selects positions 0 and 1 only.
```python
# Select the first row by position
row = df.iloc[0]

# Select the first two rows and the first and third columns by position
subset = df.iloc[0:2, [0, 2]]
print(subset)
```
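With the default integer index, labels and positions happen to coincide, which can hide the difference between the two accessors. The sketch below assigns hypothetical string labels (`'x'`, `'y'`, `'z'`) to the rows to make the distinction visible.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['x', 'y', 'z']   # hypothetical string labels for the rows
)

# .loc uses labels: the slice 'x':'y' includes both end labels
by_label = df.loc['x':'y', 'Name']
print(by_label.tolist())     # ['Alice', 'Bob']

# .iloc uses positions: the slice 0:1 stops before position 1
by_position = df.iloc[0:1, 0]
print(by_position.tolist())  # ['Alice']
```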

#### **1.4.3 Slicing Data:**
- You can slice rows based on their position:
```python
# Select the first three rows (positions 0, 1, and 2; the end position 3 is excluded)
subset = df[0:3]
print(subset)
```

- Slicing is also possible with `.loc[]` and `.iloc[]`. With `.loc[]`, both ends of a label slice are included:
```python
# Select rows labeled 0 through 2 and the columns from 'Name' through 'Age'
subset = df.loc[0:2, 'Name':'Age']
print(subset)
```
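For comparison, a positional slice with `.iloc[]` might look like the sketch below, which rebuilds the example DataFrame so that it runs on its own. Unlike label slices, the end position is excluded.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000]
})

# Rows at positions 0 and 1, columns at positions 0 and 1 (end positions excluded)
subset = df.iloc[0:2, 0:2]
print(subset)

# A step can be given too: every second row, all columns
every_other = df.iloc[::2]
print(every_other['Name'].tolist())  # ['Alice', 'Charlie']
```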

---

### **1.5 Hands-on Lab**

1. **Load a dataset into Pandas and inspect it:**
   - Load a CSV file using `pd.read_csv()`.
   - Inspect the first few rows using `head()`.
   - Check for missing values by comparing the non-null counts reported by `info()` with the total number of rows.

2. **Create a DataFrame manually:**
   - Create a DataFrame from a dictionary.
   - Print the DataFrame and its `shape`.

3. **Indexing and Slicing Practice:**
   - Select a single column from the DataFrame.
   - Use `.loc[]` to select rows by label and columns by name.
   - Use `.iloc[]` to select rows and columns by position.
   
---

### **Conclusion:**
In this module, you learned the core concepts of Pandas, including how to work with **Series** and **DataFrames**, perform basic operations like loading data from files, and inspect your datasets. You also practiced indexing and selecting data using different methods. This knowledge is fundamental for working with larger, more complex datasets in later modules.
