# **10. Input/Output Operations**

# 📊 2. Excel Files in Pandas (Reading and Writing)

```python
import pandas as pd
```

## 1️⃣ What It Does and When to Use It

### ✅ What it does:

* `pd.read_excel()` → **Reads** Excel files (`.xls`, `.xlsx`) into a pandas DataFrame.
* `df.to_excel()` → **Writes** a pandas DataFrame into an Excel file.

### 📌 When to use:

* When working with **corporate/business data**, which often comes in Excel format.
* When importing/exporting reports, dashboards, financial statements, etc.
* When handling **multi-sheet** Excel files or structured tabular data.

## 2️⃣ Syntax and Key Parameters

### 🔹 Reading an Excel File — `pd.read_excel()`

```python
pd.read_excel(io, sheet_name=0, header=0, index_col=None, usecols=None, dtype=None)
```

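| Parameter    | Description                                                                                           |
| ------------ | ----------------------------------------------------------------------------------------------------- |
| `io`         | File path, URL, file-like object, or `ExcelFile` object                                               |
| `sheet_name` | Sheet(s) to read: name, 0-based position, a list of these, or `None` for all sheets (lists and `None` return a dict of DataFrames) |
| `header`     | Row number (0-based) to use as column names; `None` if the sheet has no header row                    |
| `index_col`  | Column(s) to use as row labels                                                                        |
| `usecols`    | Subset of columns to read: names, positions, or an Excel-style letter range such as `"A:C"`           |
| `dtype`      | Data types to apply to columns, e.g. `{'EmployeeID': str}`                                            |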
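
The sketch below exercises `usecols`, `dtype`, and `header` on a tiny workbook built in memory, so it runs without `employees.xlsx`. It assumes `openpyxl` is installed; the sample rows and the `Employees` sheet name mirror this notebook's examples but are otherwise hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample data mirroring employees.xlsx; .xlsx I/O assumes openpyxl is installed
source = pd.DataFrame({
    "EmployeeID": [101, 102, 103],
    "Name": ["Alice", "Bob", "Charlie"],
    "Salary": [50000, 75000, 62000],
})

# Write the workbook to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
source.to_excel(buffer, sheet_name="Employees", index=False, engine="openpyxl")
xlsx_bytes = buffer.getvalue()

# usecols accepts Excel column letters: A = EmployeeID, B = Name (C = Salary is skipped).
# dtype keeps the IDs as strings ('101') instead of integers.
df = pd.read_excel(
    io.BytesIO(xlsx_bytes),
    sheet_name="Employees",
    usecols="A:B",
    dtype={"EmployeeID": str},
    engine="openpyxl",
)
print(df)

# header=None: the header row is treated as data and columns are numbered 0, 1, 2
raw = pd.read_excel(io.BytesIO(xlsx_bytes), sheet_name="Employees", header=None, engine="openpyxl")
print(raw.head(2))
```

Reading from a fresh `io.BytesIO` each time keeps the two reads independent; with real files you would simply pass the path again.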

---

### 🔹 Writing an Excel File — `df.to_excel()`

```python
df.to_excel(excel_writer, sheet_name='Sheet1', index=True, columns=None, header=True)
```

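| Parameter      | Description                                                              |
| -------------- | ------------------------------------------------------------------------ |
| `excel_writer` | File path, file-like object, or `ExcelWriter` object                     |
| `sheet_name`   | Name of the sheet to write to (default `'Sheet1'`)                       |
| `index`        | Whether to write the row index as the first column (default `True`)      |
| `columns`      | Subset of columns to write                                               |
| `header`       | Whether to write column names; a list of strings is written as aliases   |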
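
To see how these parameters combine, the sketch below writes to an in-memory buffer and reads the sheet back. It assumes `openpyxl` is installed; the data, the `Summary` sheet name, and the `Employee`/`Pay` aliases are illustrative only.

```python
import io

import pandas as pd

# Hypothetical rows in the shape of employees.xlsx; .xlsx I/O assumes openpyxl is installed
df = pd.DataFrame({
    "Name": ["Alice", "Bob"],
    "Department": ["HR", "Engineering"],
    "Salary": [50000, 75000],
})

buffer = io.BytesIO()  # in-memory stand-in for a path such as 'summary.xlsx'
df.to_excel(
    buffer,
    sheet_name="Summary",
    index=False,                 # skip the 0, 1, ... index column
    columns=["Name", "Salary"],  # write only these columns, in this order
    header=["Employee", "Pay"],  # aliases written instead of the column names
    engine="openpyxl",
)

# Read the result back to see what was written
check = pd.read_excel(io.BytesIO(buffer.getvalue()), sheet_name="Summary", engine="openpyxl")
print(check)
```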


## 3️⃣ Examples of Reading/Writing

### 📥 Reading from Excel

```python
# Read default (first) sheet
df = pd.read_excel('data files/excel/employees.xlsx')

df
```

|   | EmployeeID | Name    | Department  | Salary | JoiningDate |
| - | ---------- | ------- | ----------- | ------ | ----------- |
| 0 | 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 1 | 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 2 | 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 3 | 104        | David   | Marketing   | 58000  | 2018-11-30  |


```python
# Read specific sheet by name
df = pd.read_excel('data files/excel/employees.xlsx', sheet_name='Employees')

df
```

|   | EmployeeID | Name    | Department  | Salary | JoiningDate |
| - | ---------- | ------- | ----------- | ------ | ----------- |
| 0 | 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 1 | 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 2 | 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 3 | 104        | David   | Marketing   | 58000  | 2018-11-30  |


```python
# Read only selected columns
df = pd.read_excel('data files/excel/employees.xlsx', sheet_name='Employees',
                   usecols=['Name', 'Salary'])

df
```

|   | Name    | Salary |
| - | ------- | ------ |
| 0 | Alice   | 50000  |
| 1 | Bob     | 75000  |
| 2 | Charlie | 62000  |
| 3 | David   | 58000  |


```python
# Use one column as the index
df = pd.read_excel(
    'data files/excel/employees.xlsx',
    sheet_name='Employees',
    usecols=['EmployeeID', 'Name', 'Salary'],
    index_col='EmployeeID'
)

df
```

| EmployeeID | Name    | Salary |
| ---------- | ------- | ------ |
| 101        | Alice   | 50000  |
| 102        | Bob     | 75000  |
| 103        | Charlie | 62000  |
| 104        | David   | 58000  |


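```python
# Reload the full sheet (all columns, default index) before the writing examples
df = pd.read_excel('data files/excel/employees.xlsx')

df
```

|   | EmployeeID | Name    | Department  | Salary | JoiningDate |
| - | ---------- | ------- | ----------- | ------ | ----------- |
| 0 | 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 1 | 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 2 | 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 3 | 104        | David   | Marketing   | 58000  | 2018-11-30  |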


### 📤 Writing to Excel

```python
# Save to a new Excel file
df.to_excel('data files/excel/employees processed.xlsx', index=False)
```

```python
# Save to a specific sheet name
df.to_excel(
    'data files/excel/employees processed sheet.xlsx',
    index=False,
    sheet_name='Cleaned Data'
)
```

```python
# Save selected columns only
df.to_excel(
    'data files/excel/employees summary.xlsx',
    index=False,
    columns=['Name', 'Salary']
)
```
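
Each `to_excel()` call with a file path writes a fresh workbook, so the calls above produce separate files. To put several DataFrames into one workbook, the `excel_writer` parameter also accepts a `pd.ExcelWriter`. A minimal sketch, assuming `openpyxl` is installed; the in-memory buffer stands in for a path, and the per-department summary is only an illustrative second sheet.

```python
import io

import pandas as pd

# Employee rows from the examples above; .xlsx I/O assumes openpyxl is installed
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Department": ["HR", "Engineering", "Sales", "Marketing"],
    "Salary": [50000, 75000, 62000, 58000],
})
# Illustrative second sheet: average salary per department
summary = df.groupby("Department", as_index=False)["Salary"].mean()

buffer = io.BytesIO()  # a path such as 'report.xlsx' works the same way
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    # Both sheets go into the same workbook; it is saved when the block exits
    df.to_excel(writer, sheet_name="Cleaned Data", index=False)
    summary.to_excel(writer, sheet_name="By Department", index=False)

# sheet_name=None reads every sheet into a dict keyed by sheet name
sheets = pd.read_excel(io.BytesIO(buffer.getvalue()), sheet_name=None, engine="openpyxl")
print(list(sheets))
```

With the `openpyxl` engine, `pd.ExcelWriter(path, mode="a")` appends sheets to an existing file instead of replacing it.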

## 4️⃣ Common Pitfalls

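| Pitfall                   | Description & Solution                                                                                                                                   |
| ------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Missing dependency**    | pandas needs an engine library: `openpyxl` for `.xlsx` (`pip install openpyxl`). Recent `xlrd` versions (2.0+) read only legacy `.xls` files.              |
| **Large files**           | Excel read/write is slower than CSV. For big data, prefer CSV or Parquet.                                                                                  |
| **Multi-sheet confusion** | Only the first sheet (`sheet_name=0`) is read by default. Specify `sheet_name` explicitly, or pass `sheet_name=None` to get a dict of all sheets.          |
| **Date columns**          | Excel stores dates as serial numbers. Date-formatted cells usually arrive as `datetime64`, but dates stored as text or plain numbers need `parse_dates=` or `pd.to_datetime()`. |
| **Index column added**    | `to_excel()` writes the index as the first column unless `index=False` is passed; reading such a file back yields an extra `Unnamed: 0` column (`index_col=0` restores it as the index). |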
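
The last two pitfalls can be reproduced in a few lines. This sketch assumes `openpyxl` is installed and uses a hypothetical two-row frame plus two example serial numbers; the `1899-12-30` origin is the usual choice for Excel's default 1900 date system.

```python
import io

import pandas as pd

# Hypothetical two-row frame; .xlsx I/O assumes openpyxl is installed
df = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [50000, 75000]})

# Pitfall: the default index=True writes the index (0, 1) as an unnamed first column
buffer = io.BytesIO()
df.to_excel(buffer, engine="openpyxl")
xlsx_bytes = buffer.getvalue()

with_extra = pd.read_excel(io.BytesIO(xlsx_bytes), engine="openpyxl")
print(with_extra.columns.tolist())  # ['Unnamed: 0', 'Name', 'Salary']

# Fix on the reading side: use that first column as the index again
restored = pd.read_excel(io.BytesIO(xlsx_bytes), index_col=0, engine="openpyxl")
print(restored.columns.tolist())  # ['Name', 'Salary']

# Pitfall: dates that arrive as plain serial numbers instead of date-formatted cells.
# Origin 1899-12-30 matches Excel's 1900 date system for dates from March 1900 onward.
serials = pd.Series([43831, 45000])
dates = pd.to_datetime(serials, unit="D", origin="1899-12-30")
print(dates.dt.date.tolist())  # [datetime.date(2020, 1, 1), datetime.date(2023, 3, 15)]
```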


## 5️⃣ Real-World Usage

### 🧮 Financial Reporting

* Export DataFrame summaries like expenses, profits, etc. to Excel.
* Excel exports are a common input for dashboard tools such as Power BI.

### 📊 Multi-Sheet Workbooks

* Read from multiple sheets (e.g., monthly reports).
* Use `sheet_name=["Jan", "Feb"]` to load multiple sheets at once; the result is a dict mapping each sheet name to its DataFrame.
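
Because a list of sheet names returns a dict, `pd.concat` can stack the monthly sheets into one table. The sketch below builds a hypothetical two-sheet workbook (`Jan`, `Feb`) in memory so it runs stand-alone; it assumes `openpyxl` is installed, and the expense figures are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical monthly sheets standing in for a real workbook; assumes openpyxl is installed
jan = pd.DataFrame({"Item": ["Rent", "Travel"], "Amount": [1200, 300]})
feb = pd.DataFrame({"Item": ["Rent", "Software"], "Amount": [1200, 150]})

buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine="openpyxl") as writer:
    jan.to_excel(writer, sheet_name="Jan", index=False)
    feb.to_excel(writer, sheet_name="Feb", index=False)

# A list of sheet names returns a dict: {sheet name: DataFrame}
monthly = pd.read_excel(io.BytesIO(buffer.getvalue()), sheet_name=["Jan", "Feb"], engine="openpyxl")

# Stack the sheets; the dict keys become an outer "Month" index level
combined = pd.concat(monthly, names=["Month", "Row"])
print(combined)
```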

### 🧪 Data Audits

* Analysts or QA teams use Excel exports to verify data corrections or pipeline changes.

## ✅ Summary Table

| Task               | Method            |
| ------------------ | ----------------- |
| Read Excel         | `pd.read_excel()` |
| Write Excel        | `df.to_excel()`   |
| Specify sheet      | `sheet_name=...`  |
| Select columns     | `usecols=...`     |
| Save without index | `index=False`     |


<center><b>Thanks</b></center>