![image.png](attachment:image.png)

# <u>Getting Started with Pandas</u>

## <u>1. Objectives</u>

1. Understand what Pandas is and its role in data analysis
2. Install Pandas using pip or conda
3. Test your installation by importing Pandas and loading a sample dataset
4. Learn the structure of typical Pandas workflows
5. Understand common Pandas functions: `read_csv()`, `head()`, `describe()`, `info()`, etc.





## <u>2. Summary of the Technology</u>
**Pandas** is a powerful, open-source data analysis and manipulation library built on top of the Python programming language. It offers flexible and intuitive data structures, such as *Series* and *DataFrames*, that make working with structured (tabular) data fast and easy.

When you're dealing with data stored in formats like spreadsheets or databases, Pandas is the go-to tool. It helps streamline tasks such as **data exploration, cleaning, transformation, and analysis.**

At its core, **Pandas** provides two primary data structures:

1. **Series**: A one-dimensional labeled array.

2. **DataFrame**: A two-dimensional labeled data structure (similar to a spreadsheet or SQL table).

The **DataFrame** is the most commonly used structure and is ideal for handling tabular data.
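To make the two structures concrete, here is a minimal sketch. The labels and values are made up for illustration; the column names `total_bill` and `tip` are borrowed from the prompts later in this document.

```python
import pandas as pd

# A Series: one-dimensional values, each with an index label.
tips = pd.Series([1.01, 1.66, 3.50], index=["a", "b", "c"], name="tip")

# A DataFrame: a table whose columns are Series that share one index.
bills = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
    },
    index=["a", "b", "c"],
)

print(tips["b"])            # look up a single value by its label
print(bills.shape)          # (number of rows, number of columns)
print(type(bills["tip"]))   # selecting one column gives back a Series
```

Selecting a single column from a DataFrame returns a Series, which is why most Series operations carry over to DataFrame columns.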

![image.png](attachment:image.png)

Pandas comes with built-in support for **reading and writing** a variety of file formats and data sources, including:

* CSV

* Excel

* SQL databases

* JSON

* Parquet

To load data, use the top-level functions whose names start with `read_`, such as `read_csv()` or `read_excel()`. Similarly, to export or store data, use the DataFrame methods whose names start with `to_`, such as `to_csv()` or `to_sql()`.
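The `read_` / `to_` pairing can be sketched as a round trip. This example assumes an in-memory buffer instead of a file on disk so that it runs anywhere; the data values are hypothetical.

```python
import io

import pandas as pd

df = pd.DataFrame({"day": ["Thur", "Fri"], "total_bill": [16.99, 10.34]})

# Called without a path, to_csv() returns the CSV text as a string.
csv_text = df.to_csv(index=False)
print(csv_text)

# read_csv() accepts a file path or any file-like object,
# here an in-memory text buffer wrapping the string above.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

In a real project you would usually pass a file path to both functions, for example a hypothetical `tips.csv`.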

![image.png](attachment:image.png)

*Where it is used:*

* Data cleaning and preprocessing

* Exploratory data analysis (EDA)

* Statistical modeling and visualization

* Data pipelines and automation

### Real-world Examples
*Here are three real-world companies using Pandas in their data workflows:*

* **Netflix** – Analyzes user viewing data to personalize recommendations and optimize content delivery strategies.

* **Airbnb** – Uses Pandas for cleaning and analyzing property listing data, helping hosts and guests find better matches.

* **Spotify** – Leverages Pandas to process massive datasets for music recommendation systems and user behavior analytics.


## <u>3. System Requirements</u>

| Component          | Requirement                           |
| ------------------ | ------------------------------------- |
| OS                 | Linux, macOS, or Windows              |
| Editor             | VS Code, Jupyter Notebook, or PyCharm |
| Python Version     | 3.8 or higher                         |
| Package Manager    | pip or conda                          |
| Required Libraries | pandas, numpy, jupyter                |


## <u>4. Installation & Setup Instructions</u>

### *Step-by-Step Guide:*



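* Install Pandas with pip by running `pip install pandas` in a terminal (conda users can run `conda install pandas` instead):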

![WhatsApp%20Image%202025-08-07%20at%2004.12.42_28213ed3.jpg](attachment:WhatsApp%20Image%202025-08-07%20at%2004.12.42_28213ed3.jpg)

* (Optional) To install with additional performance dependencies:

![WhatsApp%20Image%202025-08-07%20at%2004.15.46_8ad67edc.jpg](attachment:WhatsApp%20Image%202025-08-07%20at%2004.15.46_8ad67edc.jpg)

* This installs optional dependencies like numexpr and bottleneck for faster operations on large datasets.

* **Verify the installation:** open a Python shell or Jupyter notebook and run the code below:

![WhatsApp%20Image%202025-08-07%20at%2004.26.59_ba25a16d.jpg](attachment:WhatsApp%20Image%202025-08-07%20at%2004.26.59_ba25a16d.jpg)
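Because the verification code above is shown as a screenshot, a copyable text version may be useful. This is a minimal sketch of one way to check the install and is not necessarily identical to the screenshot; the variable name `check` is arbitrary.

```python
import pandas as pd

# If the import succeeds, printing the version confirms which release is installed.
print(pd.__version__)

# Building a tiny DataFrame checks that the library works, not just that it imports.
check = pd.DataFrame({"x": [1, 2, 3]})
print(check["x"].sum())
```

If the import fails with `ModuleNotFoundError`, the package is most likely installed in a different Python environment than the one running the code.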

## <u>5. Working With Pandas Example</u>
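A typical workflow loads data, inspects it, and then summarizes it. The sketch below assumes a small, made-up tips-style dataset embedded as CSV text so that it runs without a file on disk; the column names `total_bill`, `tip`, and `day` are illustrative choices based on the prompts in the next section.

```python
import io

import pandas as pd

# Hypothetical sample data; in practice, pass a file path to read_csv().
CSV_TEXT = """total_bill,tip,day
16.99,1.01,Sun
10.34,1.66,Sun
21.01,3.50,Sat
23.68,3.31,Sat
24.59,3.61,Thur
"""

# 1. Load the data into a DataFrame.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Inspect it.
print(df.head(3))      # first three rows
df.info()              # column dtypes and non-null counts (prints directly)
print(df.describe())   # summary statistics for the numeric columns

# 3. Summarize: average tip for each day.
mean_tip_by_day = df.groupby("day")["tip"].mean()
print(mean_tip_by_day)
```

Note that `info()` prints its report itself and returns `None`, so it is called without `print()`.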

## <u>6. AI Prompt Journal</u>
### 🧠 i. Using AI to Comprehend an Existing Codebase

**Prompt Used:**
"Explain what this Pandas script does line by line."

**Response Summary:**
The AI broke down each line, explaining data loading, inspection, grouping, and statistical operations.

**Reflection:**
Very helpful for understanding unfamiliar Pandas scripts, especially `groupby` and `describe` usage.

### 🧠 ii. Using AI to Debug Code

**Prompt Used:**
"Why does my Pandas code return 'KeyError: Column not found' when I try df['total_bill']?"

**Response Summary:**
The AI explained that column names may have typos or trailing spaces. It suggested using `df.columns` to verify names.

**Reflection:**
Quickly identified and resolved a common beginner issue. Saved time that would’ve been spent searching Stack Overflow.
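The trailing-space case can be reproduced in a few lines. This is a sketch with hypothetical data, where the column name `"total_bill "` deliberately ends in a space; it is not the code from the original debugging session.

```python
import pandas as pd

# Hypothetical data: note the trailing space in the first column name.
df = pd.DataFrame({"total_bill ": [16.99, 10.34], "tip": [1.01, 1.66]})

# repr() makes stray whitespace in the names visible.
print([repr(name) for name in df.columns])

try:
    df["total_bill"]
except KeyError as err:
    print("KeyError:", err)

# Strip whitespace from every column name; the lookup then succeeds.
df.columns = df.columns.str.strip()
print(df["total_bill"].tolist())
```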

### 🧠 iii. Using AI to Refactor and Enhance Code

**Prompt Used:**
"Refactor this code to include missing value handling and visualize tip distribution."

**Response Summary:**
The AI suggested adding a missing value check, `.dropna()` if needed, and using Seaborn to plot tip distribution.
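The suggested refactor might look roughly like the sketch below. It is not the AI's actual output: the data is hypothetical, `plot_tip_distribution` is an illustrative helper name, and the plot assumes that Seaborn and Matplotlib are installed.

```python
import pandas as pd

# Hypothetical data; None marks a missing tip.
df = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, None, 3.50, 3.31],
    }
)

# Count missing values per column before deciding how to handle them.
missing_counts = df.isna().sum()
print(missing_counts)

# Drop only the rows where the tip itself is missing.
clean = df.dropna(subset=["tip"])


def plot_tip_distribution(frame):
    """Draw a histogram of the tip column (needs seaborn and matplotlib)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.histplot(data=frame, x="tip", bins=10)
    plt.title("Tip distribution")
    plt.show()
```

In a notebook, calling `plot_tip_distribution(clean)` would display the histogram. The plotting imports sit inside the function so that the missing-value steps still run when Seaborn is not installed.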



## <u>7. Common Issues & Fixes</u>

| Issue                 | Error Message                                   | Fix                                                   |
| --------------------- | ----------------------------------------------- | ----------------------------------------------------- |
| Typo in column name   | `KeyError: 'totl_bill'`                         | Check spelling using `df.columns`                     |
| File not found        | `FileNotFoundError`                             | Ensure file path is correct or use a public dataset   |
| Missing package       | `ModuleNotFoundError: No module named 'pandas'` | Install using `pip install pandas`                    |
| Jupyter not launching | Server error or command not found               | Ensure Jupyter is installed and in PATH, or reinstall |

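For the "File not found" row, one option is to check the path before reading so that the error message shows the full location that was tried. This is a sketch; `load_csv` is a hypothetical helper, not part of Pandas.

```python
from pathlib import Path

import pandas as pd


def load_csv(path):
    """Read a CSV file, failing early with the resolved path if it is missing."""
    csv_path = Path(path)
    if not csv_path.exists():
        # resolve() shows the absolute path, which exposes working-directory mix-ups.
        raise FileNotFoundError(f"No file found at {csv_path.resolve()}")
    return pd.read_csv(csv_path)
```

Relative paths are resolved against the current working directory, which in Jupyter is usually the folder containing the notebook.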