# **10. Input/Output Operations**

# 📄 1. CSV Files in Pandas (Reading and Writing)

In [1]:
import pandas as pd 

## 1️⃣ What It Does and When to Use It

### ✅ What it does:

CSV (Comma-Separated Values) files are simple text files where each line represents a row of data, and columns are separated by commas (or other delimiters).

* `pd.read_csv()` → **Reads** data from a CSV file into a DataFrame.
* `df.to_csv()` → **Writes** data from a DataFrame to a CSV file.

### 📌 When to use:

* When exchanging tabular data between different systems (Python ↔ Excel, databases, web apps).
* For reading public datasets (Kaggle, GitHub, UCI ML datasets).
* For saving processed datasets, predictions, or intermediate steps during analysis.

## 2️⃣ Syntax and Key Parameters

### 🔹 Reading a CSV – `pd.read_csv()`

```python
pd.read_csv(filepath_or_buffer, sep=',', header='infer', index_col=None, usecols=None, dtype=None, na_values=None)
```

| Parameter            | Description                                                      |
| -------------------- | ---------------------------------------------------------------- |
| `filepath_or_buffer` | Path to the CSV file, a URL, or a file-like object               |
| `sep`                | Delimiter to use (default: comma `,`)                            |
| `header`             | Row number to use as column names (default `'infer'`: first row) |
| `index_col`          | Column to use as row labels                                      |
| `usecols`            | Subset of columns to read                                        |
| `dtype`              | Data type for all columns, or a dict mapping column to type      |
| `na_values`          | Additional strings to recognize as NA/NaN                        |

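To make the parameters above concrete, here is a minimal sketch that reads a small semicolon-delimited dataset from memory. The data, the `raw` buffer, and the `employees` variable are made up for illustration; `io.StringIO` stands in for a file on disk so the snippet runs anywhere.

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited data, held in memory instead of on disk.
raw = io.StringIO(
    "EmployeeID;Name;Department;Salary\n"
    "101;Alice;HR;50000\n"
    "102;Bob;Engineering;75000\n"
)

employees = pd.read_csv(
    raw,
    sep=";",  # the data uses ';' rather than ','
    header=0,  # the first line holds the column names
    index_col="EmployeeID",  # use this column as the row labels
    usecols=["EmployeeID", "Name", "Salary"],  # Department is never loaded
    dtype={"Salary": "float64"},  # read Salary as float instead of int
)

print(employees)
print(employees.dtypes)
```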
---

### 🔹 Writing a CSV – `df.to_csv()`

```python
df.to_csv(path_or_buf=None, sep=',', index=True, header=True, columns=None, encoding=None)
```

| Parameter     | Description                                                                      |
| ------------- | -------------------------------------------------------------------------------- |
| `path_or_buf` | File path or buffer to write to; if `None`, the CSV text is returned as a string |
| `sep`         | Delimiter to use (default: comma `,`)                                            |
| `index`       | Whether to write the row index (default: `True`)                                 |
| `header`      | Whether to write the column names (default: `True`)                              |
| `columns`     | Subset of columns to write                                                       |
| `encoding`    | Encoding of the output file (default: UTF-8; e.g. `'ISO-8859-1'`)                |

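The sketch below shows how the writing parameters change the output. It relies on the fact that `to_csv()` returns the CSV text when no path is given, which makes the effect easy to inspect. The `df_demo` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data mirroring the employees example.
df_demo = pd.DataFrame(
    {
        "Name": ["Alice", "Bob"],
        "Department": ["HR", "Engineering"],
        "Salary": [50000, 75000],
    }
)

# No path given: to_csv returns the CSV text as a string.
# With the default index=True, the row labels become the first column.
with_index = df_demo.to_csv()

# Drop the index, keep two columns, and use ';' as the delimiter.
subset = df_demo.to_csv(index=False, columns=["Name", "Salary"], sep=";")

print(with_index)
print(subset)
```

In the first result the header line starts with an empty field, because the index has no name. That empty field is what later shows up as `Unnamed: 0` if the file is read back without `index_col`.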

## 3️⃣ Examples of Reading/Writing

### 🔹 Reading CSV Files

In [2]:
# Basic read
df = pd.read_csv('data files/csv/employees.csv')

df

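|     | EmployeeID | Name    | Department  | Salary | JoiningDate |
| --- | ---------- | ------- | ----------- | ------ | ----------- |
| 0   | 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 1   | 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 2   | 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 3   | 104        | David   | Marketing   | 58000  | 2018-11-30  |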


In [3]:
# With an explicit separator (',' is already the default) and EmployeeID as the index
df = pd.read_csv('data files/csv/employees.csv', sep=',', index_col='EmployeeID')

df

| EmployeeID | Name    | Department  | Salary | JoiningDate |
| ---------- | ------- | ----------- | ------ | ----------- |
| 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 104        | David   | Marketing   | 58000  | 2018-11-30  |


In [4]:
# Reading selected columns
df = pd.read_csv('data files/csv/employees.csv', usecols=['Name', 'Salary'])

df

|     | Name    | Salary |
| --- | ------- | ------ |
| 0   | Alice   | 50000  |
| 1   | Bob     | 75000  |
| 2   | Charlie | 62000  |
| 3   | David   | 58000  |


In [5]:
# Handling missing values
df = pd.read_csv('data files/csv/employees.csv', na_values=["n/a", "NA", "--"])

df

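|     | EmployeeID | Name    | Department  | Salary | JoiningDate |
| --- | ---------- | ------- | ----------- | ------ | ----------- |
| 0   | 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 1   | 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 2   | 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 3   | 104        | David   | Marketing   | 58000  | 2018-11-30  |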

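The employees file has no missing entries, so the output above looks the same with or without `na_values`. The following sketch uses made-up data with three missing-value markers to show the difference. As far as I know, `NA` and `n/a` are already in pandas' default list of NaN strings, so only `--` needs to be declared; the `csv_text`, `default`, and `custom` names are illustrative.

```python
import io

import pandas as pd

# Hypothetical data with three different missing-value markers.
csv_text = (
    "Name,Department,Salary\n"
    "Alice,HR,50000\n"
    "Bob,--,n/a\n"
    "Charlie,Sales,NA\n"
)

# Default parsing: 'n/a' and 'NA' become NaN, but '--' stays a plain string.
default = pd.read_csv(io.StringIO(csv_text))

# Declaring '--' as a missing-value marker turns it into NaN as well.
custom = pd.read_csv(io.StringIO(csv_text), na_values=["--"])

print(default.isna().sum())
print(custom.isna().sum())
```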

### 🔹 Writing CSV Files

In [6]:
# Write to CSV with index

display(df)

df.to_csv('data files/csv/cleaned employees.csv')

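|     | EmployeeID | Name    | Department  | Salary | JoiningDate |
| --- | ---------- | ------- | ----------- | ------ | ----------- |
| 0   | 101        | Alice   | HR          | 50000  | 2020-01-15  |
| 1   | 102        | Bob     | Engineering | 75000  | 2019-07-23  |
| 2   | 103        | Charlie | Sales       | 62000  | 2021-03-12  |
| 3   | 104        | David   | Marketing   | 58000  | 2018-11-30  |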


In [7]:
# Write without index

df.to_csv('data files/csv/cleaned employees no index.csv', index=False)

In [8]:
# Write only specific columns

df.to_csv('data files/csv/employees specific cols.csv', columns=['Name', 'Salary'], index=False)

In [9]:
# Write using different delimiter (e.g., tab-separated)

df.to_csv('data files/csv/employees tab sep.csv', sep='\t', index=False)

## 4️⃣ Common Pitfalls

| Pitfall                         | Description & Solution                                                                                                                            |
| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Wrong delimiter**             | The file may use `;` or `\t` instead of `,`. Pass `sep=';'` or `sep='\t'`.                                                                        |
| **Extra index column**          | Writing with the default `index=True` and reading the file back adds an `Unnamed: 0` column. Write with `index=False` or read with `index_col=0`. |
| **Encoding errors**             | Files that are not UTF-8 can raise `UnicodeDecodeError` on read. Pass the file's actual encoding, e.g. `encoding='ISO-8859-1'`.                   |
| **Missing data misinterpreted** | Default NaN handling does not catch custom missing-value strings like `--`. List them in `na_values`.                                             |
| **Large files**                 | Reading large CSVs can be memory intensive. Read in pieces with `chunksize`, or reduce memory with `dtype` and `usecols`.                         |

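The index pitfall is easiest to see in a round trip. This is a small sketch with a hypothetical `df_demo` frame, using in-memory text instead of files so it does not touch the disk.

```python
import io

import pandas as pd

df_demo = pd.DataFrame({"Name": ["Alice", "Bob"], "Salary": [50000, 75000]})

# Written with the default index=True, the row labels become an unnamed first column.
text_with_index = df_demo.to_csv()
reread = pd.read_csv(io.StringIO(text_with_index))
print(list(reread.columns))  # ['Unnamed: 0', 'Name', 'Salary']

# Fix 1: tell read_csv that the first column holds the index.
fixed_on_read = pd.read_csv(io.StringIO(text_with_index), index_col=0)

# Fix 2: do not write the index in the first place.
fixed_on_write = pd.read_csv(io.StringIO(df_demo.to_csv(index=False)))

print(list(fixed_on_read.columns))
print(list(fixed_on_write.columns))
```

Either fix works on its own. Writing with `index=False` is usually the simpler choice when the index is just the default 0, 1, 2, ... numbering.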

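For the large-file case, one possible approach is to process the file in chunks and keep only running totals. The sketch below fakes a "large" file with ten in-memory rows; the column names, the chunk size of 4, and the `int32` choice are assumptions made for the example, not recommendations for any particular dataset.

```python
import io

import pandas as pd

# Hypothetical stand-in for a large file: 10 rows held in memory.
csv_text = "EmployeeID,Salary\n" + "".join(
    f"{100 + i},{50000 + 1000 * i}\n" for i in range(10)
)

total_salary = 0
row_count = 0

# With chunksize set, read_csv yields DataFrames of up to 4 rows each
# instead of loading everything at once.
for chunk in pd.read_csv(
    io.StringIO(csv_text),
    chunksize=4,
    dtype={"EmployeeID": "int32", "Salary": "int32"},  # narrower than the int64 default
):
    total_salary += int(chunk["Salary"].sum())
    row_count += len(chunk)

print(row_count, total_salary)
```

Only one chunk is in memory at a time, so peak memory depends on `chunksize` rather than on the size of the file.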
## 5️⃣ Real-World Usage

### 💼 Business Analytics

* A sales team exports data from Excel → analyst uses `pd.read_csv()` to import it.
* After cleaning and analysis, they use `df.to_csv()` to share the processed report.

### 🔍 Data Science Projects

* Many Kaggle datasets are distributed as `.csv` files → load them with `pd.read_csv()` for modeling.
* Save processed features or model predictions with `df.to_csv("predictions.csv")`.

### 🧪 Experiment Logging

* Save experiment logs, metrics, and hyperparameters with `.to_csv()` for reproducibility.

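One way to keep such a log is to append one row per run, writing the header only the first time. This is a sketch under stated assumptions: the log path, the `runs` list, and the metric names are all hypothetical, and a temporary directory is used so the example leaves no files behind in the project.

```python
import os
import tempfile

import pandas as pd

# Hypothetical log location and metrics, used only for this illustration.
log_path = os.path.join(tempfile.mkdtemp(), "experiments.csv")

runs = [
    {"run": 1, "learning_rate": 0.1, "accuracy": 0.81},
    {"run": 2, "learning_rate": 0.01, "accuracy": 0.86},
]

for run in runs:
    row = pd.DataFrame([run])
    # mode="a" appends to the file; the header is written only if the file is new.
    row.to_csv(log_path, mode="a", header=not os.path.exists(log_path), index=False)

log = pd.read_csv(log_path)
print(log)
```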

## ✅ Summary Table

| Task                  | Method          |
| --------------------- | --------------- |
| Read CSV              | `pd.read_csv()` |
| Write CSV             | `df.to_csv()`   |
| Handle delimiters     | `sep=...`       |
| Handle encoding       | `encoding=...`  |
| Handle missing values | `na_values=...` |
