# Week 5, Day 23: Introduction to Pandas

**Date:** August 1, 2025
**Topic:** Pandas Fundamentals - Data Structures & File I/O

---

#### **1. What is Pandas?**

*   A fundamental Python library for data manipulation and analysis.
*   It introduces two primary data structures that we will work with: `Series` and `DataFrame`.

In [1]:
import pandas as pd
import numpy as np

```mermaid
graph TD
    A["Pandas Library"] --> B["Series (1D Data)"]
    A --> C["DataFrame (2D Data)"]
```

---

#### **2. The Pandas `Series`**

> **Definition:** A **`Series`** is a one-dimensional array-like object that can hold any data type. It has an associated array of data labels, called its **index**.

##### **Creating a `Series`**

*   **From a list (most common):**

In [16]:
# The index is automatically generated (0, 1, 2...)
ser1 = pd.Series([10, 20, 30])
print(ser1)

0    10
1    20
2    30
dtype: int64


*   **From a dictionary:**

In [3]:
# The dictionary keys are used as the index
di = {101: "Riyan", 102: "Amaan", 103: "Adnan"}
s = pd.Series(di)
print(s)

101    Riyan
102    Amaan
103    Adnan
dtype: object
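
As a small sketch of how the index relates to the data, the example below reuses the dictionary from above and looks values up by label with `.loc` and by position with `.iloc`. It also passes an explicit `index` to a list-based `Series`; the labels `'a'`, `'b'`, `'c'` are hypothetical and chosen only for illustration.

```python
import pandas as pd

s = pd.Series({101: "Riyan", 102: "Amaan", 103: "Adnan"})

# .loc looks up by index label, .iloc by integer position
by_label = s.loc[102]
by_position = s.iloc[0]
print(by_label, by_position)

# A list-based Series can also be given explicit labels (hypothetical here)
ser_labeled = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(ser_labeled.loc["b"])
print(list(ser_labeled.index))
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when the index labels are themselves integers, as they are in `s`.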


---

#### **3. The Pandas `DataFrame`**

> **Definition:** A **`DataFrame`** is a 2-dimensional, tabular data structure with labeled axes (rows and columns). It's like a spreadsheet or an SQL table.

*   **Creating a `DataFrame`:** The most common way is from a **dictionary of collections** (like lists, arrays, or other Series).
    *   The dictionary keys become the **column names**.
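    *   Each collection supplies the values of its column, so the i-th elements across all collections form row i. Lists and arrays must therefore all have the same length.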

```mermaid
graph TD
    subgraph "Input: Dictionary of Lists"
        A["{'col1': [r1, r2], 'col2': [r1, r2]}"]
    end
    
    A --> B["pd.DataFrame()"]
    
    subgraph "Output: DataFrame"
        C["DataFrame Structure:<br/>Index | col1 | col2<br/>0 | r1 | r1<br/>1 | r2 | r2"]
    end
    
    B --> C
```
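
Because each collection fills one column, list inputs need to line up row by row. The sketch below uses hypothetical column names and values to show one dictionary that works and one that pandas rejects with a `ValueError`.

```python
import pandas as pd

# Equal-length lists: each key becomes a column, each position a row
ok = pd.DataFrame({"col1": [1, 2], "col2": [3, 4]})
print(ok.shape)

# Unequal-length lists cannot be aligned into rows
try:
    pd.DataFrame({"col1": [1, 2], "col2": [3, 4, 5]})
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```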

##### **Examples of DataFrame Creation:**

**1. From a Dictionary of Lists:**

In [4]:
di = {
    'Name': ['Riyan', 'Amaan', 'Adnan'],
    'age': [20, 22, 23],
    'Course':['Ds', 'DevOP', 'Bussiness']
}
df = pd.DataFrame(di)
df

|   | Name  | age | Course    |
|---|-------|-----|-----------|
| 0 | Riyan | 20  | Ds        |
| 1 | Amaan | 22  | DevOP     |
| 2 | Adnan | 23  | Bussiness |


**2. From a Dictionary of NumPy Arrays:**

In [5]:
name = np.array(['Riyan','Amaan','Adnan'])
course = np.array(['DS','DevOps','WebDev'])
score = np.array([44,46,78])

di_np = {'Name':name, 'Course':course, 'Score':score}
df_np = pd.DataFrame(di_np)
df_np

|   | Name  | Course | Score |
|---|-------|--------|-------|
| 0 | Riyan | DS     | 44    |
| 1 | Amaan | DevOps | 46    |
| 2 | Adnan | WebDev | 78    |


**3. From a Nested Dictionary (Dictionary of Dictionaries)**

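This creates a DataFrame where the outer keys become the columns and the inner keys become the index. Because each inner value here is a list, every cell ends up holding a whole list rather than a single value.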

In [6]:
Osmania = {
    'Eng':{
        'Enr_ID':[101,102,103],
        'Student_Name':['Amaan','Adnan','Riyan'],
        'Location':['TGS', "AP", 'TS']
          },
    'Arts':{
        'Enr_ID':[201,202,203],
        'Student_Name':['faizaan', 'Shahid', 'Qhari'],
        'Location':['TS','Goa','TGS']
          },
    'Medical':{
        'Enr_ID':[301,302,303],
        'Student_Name':["Imaraan", 'Ghouse', 'ligma'],
        'Location':['Ap','TS','TGS']
          },
}

df_osmania = pd.DataFrame(Osmania)
df_osmania

|              | Eng                   | Arts                     | Medical                  |
|--------------|-----------------------|--------------------------|--------------------------|
| Enr_ID       | [101, 102, 103]       | [201, 202, 203]          | [301, 302, 303]          |
| Student_Name | [Amaan, Adnan, Riyan] | [faizaan, Shahid, Qhari] | [Imaraan, Ghouse, ligma] |
| Location     | [TGS, AP, TS]         | [TS, Goa, TGS]           | [Ap, TS, TGS]            |
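
For contrast, here is a minimal sketch of a nested dictionary whose inner values are single numbers instead of lists. The names and scores are hypothetical. The outer keys still become columns and the inner keys become the index, but each cell now holds one value, and any inner key missing from a column is filled with `NaN`.

```python
import pandas as pd

# Hypothetical scores: outer keys -> columns, inner keys -> index labels
scores = {
    "Eng": {"Riyan": 90, "Amaan": 80},
    "Arts": {"Riyan": 70, "Adnan": 60},
}

df_scores = pd.DataFrame(scores)
print(df_scores)

# 'Adnan' has no 'Eng' entry, so that cell is missing
print(pd.isna(df_scores.loc["Adnan", "Eng"]))
```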


To get a conventional table with one student per row, build a separate DataFrame from each department's inner dictionary.

In [7]:
df_eng = pd.DataFrame(Osmania['Eng'])
df_eng

|   | Enr_ID | Student_Name | Location |
|---|--------|--------------|----------|
| 0 | 101    | Amaan        | TGS      |
| 1 | 102    | Adnan        | AP       |
| 2 | 103    | Riyan        | TS       |
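
If a single table covering every department is more useful, one possible approach (not part of the original notebook) is to build one DataFrame per department and stack them with `pd.concat`. The sketch below assumes the same nested dictionary as above; the `Department` column name and the temporary `Row` level name are hypothetical choices.

```python
import pandas as pd

Osmania = {
    'Eng': {
        'Enr_ID': [101, 102, 103],
        'Student_Name': ['Amaan', 'Adnan', 'Riyan'],
        'Location': ['TGS', 'AP', 'TS'],
    },
    'Arts': {
        'Enr_ID': [201, 202, 203],
        'Student_Name': ['faizaan', 'Shahid', 'Qhari'],
        'Location': ['TS', 'Goa', 'TGS'],
    },
    'Medical': {
        'Enr_ID': [301, 302, 303],
        'Student_Name': ['Imaraan', 'Ghouse', 'ligma'],
        'Location': ['Ap', 'TS', 'TGS'],
    },
}

# One conventional DataFrame per department
frames = {dept: pd.DataFrame(records) for dept, records in Osmania.items()}

# Stack them; the dictionary keys become an outer index level named 'Department'
combined = pd.concat(frames, names=['Department', 'Row'])

# Turn the department level into a regular column and renumber the rows
combined = combined.reset_index(level='Department').reset_index(drop=True)
print(combined)
```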


---

#### **4. Inspecting Your DataFrame**

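Once you have a DataFrame, you can check its basic properties: the number of dimensions (`ndim`), the `(rows, columns)` pair (`shape`), and the total number of cells (`size`).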

In [8]:
print("DataFrame used for inspection:")
print(df)
print("\nDimensions (df.ndim):", df.ndim)
print("Shape (df.shape):", df.shape)
print("Size (df.size):", df.size)

DataFrame used for inspection:
    Name  age     Course
0  Riyan   20         Ds
1  Amaan   22      DevOP
2  Adnan   23  Bussiness

Dimensions (df.ndim): 2
Shape (df.shape): (3, 3)
Size (df.size): 9
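
The three properties are related: `size` is the product of the two numbers in `shape`. The sketch below rebuilds the same DataFrame and checks that relationship. It also prints `columns`, `dtypes`, and `head()`, which are standard pandas attributes and methods that this section does not otherwise cover.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Riyan', 'Amaan', 'Adnan'],
    'age': [20, 22, 23],
    'Course': ['Ds', 'DevOP', 'Bussiness'],
})

# shape is a (rows, columns) tuple; size is rows * columns
n_rows, n_cols = df.shape
print(df.size == n_rows * n_cols)

# Column labels and the data type stored in each column
print(list(df.columns))
print(df.dtypes)

# head(n) returns the first n rows (5 if n is omitted)
print(df.head(2))
```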


---

#### **5. Custom Indexing**

By default, the index is `0, 1, 2, ...`. We can assign a more meaningful index during creation.

> **Key Point:** The `index` parameter in `pd.DataFrame()` must have the same number of elements as the number of rows in your data.

In [9]:
# The 'index' list provides the row labels
df_custom = pd.DataFrame(
    {
    'course':['DS','DevOps','WebDev'],
    'score':[90, 80 , 60]
    }, 
    index=['Riyan','Amaan','Adnan']
)

df_custom

|       | course | score |
|-------|--------|-------|
| Riyan | DS     | 90    |
| Amaan | DevOps | 80    |
| Adnan | WebDev | 60    |
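
A short sketch of what the custom index makes possible, and of what happens when the key point above is violated. It reuses the same data. The label-based lookup with `.loc` is standard pandas, and the two-element index is a deliberately wrong input used only to trigger the error.

```python
import pandas as pd

data = {
    'course': ['DS', 'DevOps', 'WebDev'],
    'score': [90, 80, 60],
}

df_custom = pd.DataFrame(data, index=['Riyan', 'Amaan', 'Adnan'])

# With meaningful labels, a row can be looked up by name
amaan_score = df_custom.loc['Amaan', 'score']
print(amaan_score)

# Three rows of data but only two index labels: pandas refuses to build it
try:
    pd.DataFrame(data, index=['Riyan', 'Amaan'])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)
```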


---

#### **6. Reading Data From Files**

This is how we'll work with real-world datasets.

*   **Core Functions:**
    *   `pd.read_csv('filename.csv')` for Comma-Separated Values files.
    *   `pd.read_excel('filename.xlsx')` for Excel files.
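
To try `pd.read_csv` without needing a dataset on disk, the sketch below first writes a small DataFrame to a temporary folder with `DataFrame.to_csv` and then reads it back. The file name `example.csv` and the data are hypothetical, and `to_csv` is not covered in these notes. It appears here only so the example is self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

original = pd.DataFrame({
    'course': ['DS', 'DevOps', 'WebDev'],
    'score': [90, 80, 60],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / 'example.csv'  # hypothetical file name

    # index=False keeps the row labels out of the file
    original.to_csv(csv_path, index=False)

    # read_csv accepts a string or a pathlib.Path
    loaded = pd.read_csv(csv_path)

print(loaded)
```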

*   **Two Scenarios for File Paths:**

    1.  **File in the Same Location:** If the notebook (`.ipynb`) and the data file (`.csv`) are in the same folder, you just need the file's name.

In [19]:
# This works only if 'dataset.csv' is in the same directory as the notebook.
# If not, pd.read_csv raises a FileNotFoundError.
try:
    df_from_csv = pd.read_csv('dataset.csv')
    print("Successfully loaded dataset.csv")
    # Display the first 5 rows
    # df_from_csv.head()
except FileNotFoundError as e:
    print(e)

Successfully loaded dataset.csv


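    2.  **File in a Different Location:** You must provide the full file path.
        > **Important:** On Windows, write the path with double backslashes (`\\`), single forward slashes (`/`), or a raw string (`r'...'`). A single backslash in an ordinary string starts an escape sequence, which can corrupt the path or raise a `SyntaxError`.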
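
The path spellings can be compared without touching the filesystem. The sketch below uses the standard-library `pathlib.PureWindowsPath`, which parses Windows-style paths on any operating system, to check that the different spellings name the same file. The path itself is only an example and does not need to exist.

```python
from pathlib import PureWindowsPath

# Three ways to write the same Windows path in Python source
escaped = "C:\\Users\\uwais\\Downloads\\DA1 Score.xlsx"   # doubled backslashes
raw = r"C:\Users\uwais\Downloads\DA1 Score.xlsx"          # raw string
forward = "C:/Users/uwais/Downloads/DA1 Score.xlsx"       # forward slashes

# The escaped and raw spellings produce the identical string
print(escaped == raw)

# Windows treats both separators alike, so the parsed paths compare equal
print(PureWindowsPath(escaped) == PureWindowsPath(forward))
print(PureWindowsPath(forward).name)
```

Writing the same path with single backslashes in an ordinary (non-raw) string would not work, because `\U` begins a Unicode escape sequence.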

In [25]:
# These examples show how to format paths on different operating systems.
# A read raises FileNotFoundError if the file does not exist at that exact location.

# Windows example (left commented out; the path is only an illustration).
# Either double backslashes or forward slashes work:
# df_windows = pd.read_excel("C:\\Users\\uwais\\Downloads\\DA1 Score.xlsx")
# df_windows = pd.read_excel("C:/Users/uwais/Downloads/DA1 Score.xlsx")

# Mac/Linux example: single forward slashes are enough
try:
    df_linux = pd.read_csv('/home/riyan/Desktop/SDHub-DS/SDHub-DS/01_Foundation/03 Python For Data Science/dataset.csv')
except FileNotFoundError as e:
    print("Linux/Mac Path Example Error:", e)

df_linux

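|      | College_ID | IQ  | Prev_Sem_Result | CGPA | Academic_Performance | Internship_Experience | Extra_Curricular_Score | Communication_Skills | Projects_Completed | Placement |
|------|------------|-----|-----------------|------|----------------------|-----------------------|------------------------|----------------------|--------------------|-----------|
| 0    | CLG0030    | 107 | 6.61            | 6.28 | 8                    | No                    | 8                      | 8                    | 4                  | No        |
| 1    | CLG0061    | 97  | 5.52            | 5.37 | 8                    | No                    | 7                      | 8                    | 0                  | No        |
| 2    | CLG0036    | 109 | 5.36            | 5.83 | 9                    | No                    | 3                      | 1                    | 1                  | No        |
| 3    | CLG0055    | 122 | 5.47            | 5.75 | 6                    | Yes                   | 1                      | 6                    | 1                  | No        |
| 4    | CLG0004    | 96  | 7.91            | 7.69 | 7                    | No                    | 8                      | 10                   | 2                  | No        |
| ...  | ...        | ... | ...             | ...  | ...                  | ...                   | ...                    | ...                  | ...                | ...       |
| 9995 | CLG0021    | 119 | 8.41            | 8.29 | 4                    | No                    | 1                      | 8                    | 0                  | Yes       |
| 9996 | CLG0098    | 70  | 9.25            | 9.34 | 7                    | No                    | 0                      | 7                    | 2                  | No        |
| 9997 | CLG0066    | 89  | 6.08            | 6.25 | 3                    | Yes                   | 3                      | 9                    | 5                  | No        |
| 9998 | CLG0045    | 107 | 8.77            | 8.92 | 3                    | No                    | 7                      | 5                    | 1                  | No        |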
