# 📊 04 - Data Input and Output

Much of day-to-day data science comes down to **loading data, exploring it, and saving results**.  
In this notebook you will learn:
- Reading CSV files with Pandas
- Inspecting DataFrames (`head`, `info`, `describe`)
- Writing data back to CSV
- Writing and reading Excel files (requires `openpyxl`)


## 1. Reading CSV Files

In [1]:
import pandas as pd

# Create a small CSV for demo
data = """Name,Age,Score
Alice,23,90
Bob,25,85
Charlie,22,95
"""
with open("students.csv", "w") as f:
    f.write(data)

# Read CSV into DataFrame
df = pd.read_csv("students.csv")
df

,Name,Age,Score
0,Alice,23,90
1,Bob,25,85
2,Charlie,22,95
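
`pd.read_csv` accepts any file-like object, not only a path, and it takes optional parameters that control what gets loaded. The sketch below reads the same demo content from an in-memory string and uses `usecols` and `dtype`; the variable names `csv_text` and `students` are illustrative and not part of the notebook.

```python
import io

import pandas as pd

# Same demo content as students.csv, kept in memory instead of on disk.
csv_text = """Name,Age,Score
Alice,23,90
Bob,25,85
Charlie,22,95
"""

# StringIO wraps the string in a file-like object that read_csv can consume.
students = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["Name", "Score"],   # load only these two columns
    dtype={"Score": "float64"},  # override the inferred integer type
)
print(students)
```

Reading from a string is handy for quick experiments because it leaves no files behind.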


✅ **Your Turn**: Create your own small CSV (3–5 rows) and read it into Pandas.

In [2]:
# create a small csv
bts = """Name, Age, Position
Kim Namjoon, 31, Leader and Main Rapper
Kim Seokjin, 32, Lead Vocalist and Visual
Min Yoongi, 32, Lead Rapper
Jung Hoseok, 31, Lead Rapper and Main Dancer
Park Jimin, 29, Lead Vocalist and Main Dancer
Kim Taehyung, 29, Lead Vocalist and Lead Dancer
Jeon Jungkook, 28, Main Vocalist and Lead Dancer
"""

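with open("bangtan.csv", "w") as f:
    f.write(bts)

# The rows have a space after each comma; skipinitialspace=True drops it
# so column names and values come back without leading spaces.
bg = pd.read_csv("bangtan.csv", skipinitialspace=True)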
bg

,Name,Age,Position
0,Kim Namjoon,31,Leader and Main Rapper
1,Kim Seokjin,32,Lead Vocalist and Visual
2,Min Yoongi,32,Lead Rapper
3,Jung Hoseok,31,Lead Rapper and Main Dancer
4,Park Jimin,29,Lead Vocalist and Main Dancer
5,Kim Taehyung,29,Lead Vocalist and Lead Dancer
6,Jeon Jungkook,28,Main Vocalist and Lead Dancer
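
The rows in `bts` put a space after each comma. By default `pd.read_csv` keeps that space as part of the column names and string values, while `skipinitialspace=True` drops it. Here is a minimal sketch on a two-row sample; the names `spaced`, `raw`, and `clean` are illustrative.

```python
import io

import pandas as pd

# Two rows in the same "comma plus space" style as bangtan.csv.
spaced = """Name, Age, Position
Kim Namjoon, 31, Leader and Main Rapper
Min Yoongi, 32, Lead Rapper
"""

raw = pd.read_csv(io.StringIO(spaced))
clean = pd.read_csv(io.StringIO(spaced), skipinitialspace=True)

print(list(raw.columns))    # header names keep the leading space
print(list(clean.columns))  # leading space removed
```

With the default settings, `raw[" Age"]` works but `raw["Age"]` raises a `KeyError`, which is an easy mistake to miss because the space is invisible in most table displays.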


## 2. Exploring DataFrames

In [3]:
# Look at the first rows
df.head()

,Name,Age,Score
0,Alice,23,90
1,Bob,25,85
2,Charlie,22,95


In [4]:
# Info about columns and data types
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   Name    3 non-null      object
 1   Age     3 non-null      int64 
 2   Score   3 non-null      int64 
dtypes: int64(2), object(1)
memory usage: 204.0+ bytes
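
The pieces that `.info()` prints can also be read individually, which helps when you want to check them in code rather than by eye. The sketch below rebuilds the students table directly, so it runs without the CSV file, and reads its shape, dtypes, and missing-value counts.

```python
import pandas as pd

# Same content as students.csv, built directly for a self-contained example.
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie"], "Age": [23, 25, 22], "Score": [90, 85, 95]}
)

print(df.shape)         # (number of rows, number of columns)
print(df.dtypes)        # one dtype per column
print(df.isna().sum())  # count of missing values per column
```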


In [5]:
# Summary statistics
df.describe()

,Age,Score
count,3.0,3.0
mean,23.333333,90.0
std,1.527525,5.0
min,22.0,85.0
25%,22.5,87.5
50%,23.0,90.0
75%,24.0,92.5
max,25.0,95.0
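
To see where the numbers from `.describe()` come from, the sketch below recomputes the `Age` column's mean, standard deviation, and 25% quantile by hand. Note that `std` is the sample standard deviation, which divides by n - 1. The names `ages`, `mean`, and `sample_std` are illustrative.

```python
import pandas as pd

ages = pd.Series([23, 25, 22], name="Age")

mean = ages.sum() / len(ages)
# Sample standard deviation: divide the squared deviations by n - 1, not n.
sample_std = (((ages - mean) ** 2).sum() / (len(ages) - 1)) ** 0.5

print(mean, sample_std)
# Quantiles interpolate linearly between the sorted values by default.
print(ages.quantile(0.25))
```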


✅ **Your Turn**: Use `.head()` to preview your dataset and `.describe()` to see basic statistics.

In [6]:
bg.head()

,Name,Age,Position
0,Kim Namjoon,31,Leader and Main Rapper
1,Kim Seokjin,32,Lead Vocalist and Visual
2,Min Yoongi,32,Lead Rapper
3,Jung Hoseok,31,Lead Rapper and Main Dancer
4,Park Jimin,29,Lead Vocalist and Main Dancer


In [7]:
bg.describe()

,Age
count,7.0
mean,30.285714
std,1.603567
min,28.0
25%,29.0
50%,31.0
75%,31.5
max,32.0


## 3. Writing CSV Files

In [8]:
# Save the DataFrame to a new CSV
df.to_csv("students_copy.csv", index=False)
%ls *.csv

bangtan.csv  students_copy.csv  students.csv
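
Why pass `index=False`? By default `to_csv` also writes the row index as an extra first column with no header, and reading that file back yields a column named `Unnamed: 0`. The sketch below shows both cases using in-memory text; the names `with_index` and `without_index` are illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [90, 85]})

# Called without a path, to_csv returns the CSV text instead of writing a file.
with_index = df.to_csv()
without_index = df.to_csv(index=False)

cols_with = pd.read_csv(io.StringIO(with_index)).columns.tolist()
cols_without = pd.read_csv(io.StringIO(without_index)).columns.tolist()

print(cols_with)     # includes an extra 'Unnamed: 0' column
print(cols_without)  # only the original columns
```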


✅ **Your Turn**: Save your DataFrame to a CSV file called `my_data.csv`.

In [9]:
bg.to_csv("my_data.csv", index=False)
%ls *.csv

bangtan.csv  my_data.csv  students_copy.csv  students.csv


## 4. Reading Excel Files

In [10]:
# Optional: Requires 'openpyxl' installed
# Save to Excel
df.to_excel("students.xlsx", index=False)

# Read Excel file
pd.read_excel("students.xlsx")

,Name,Age,Score
0,Alice,23,90
1,Bob,25,85
2,Charlie,22,95
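
If `openpyxl` is missing, `to_excel` fails with an `ImportError`. One way to keep a notebook running in either environment is to check for the package first. This is a sketch under that assumption: the file name `students_demo.xlsx` and the variables `has_openpyxl` and `round_trip` are hypothetical, and a temporary directory is used so nothing is left on disk.

```python
import importlib.util
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [23, 25]})

# find_spec returns None when the package cannot be imported.
has_openpyxl = importlib.util.find_spec("openpyxl") is not None

if has_openpyxl:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "students_demo.xlsx")
        df.to_excel(path, index=False)    # write the workbook
        round_trip = pd.read_excel(path)  # read it straight back
    print(round_trip)
else:
    round_trip = None
    print("openpyxl is not installed; skipping the Excel round trip.")
```

If the check fails, installing the package with `pip install openpyxl` is usually enough.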


✅ **Your Turn**: Try saving your dataset to Excel and reading it back (if supported in your environment).

In [11]:
bg.to_excel("my_data.xlsx", index=False)
pd.read_excel("my_data.xlsx")

,Name,Age,Position
0,Kim Namjoon,31,Leader and Main Rapper
1,Kim Seokjin,32,Lead Vocalist and Visual
2,Min Yoongi,32,Lead Rapper
3,Jung Hoseok,31,Lead Rapper and Main Dancer
4,Park Jimin,29,Lead Vocalist and Main Dancer
5,Kim Taehyung,29,Lead Vocalist and Lead Dancer
6,Jeon Jungkook,28,Main Vocalist and Lead Dancer


---
### Summary
- `pd.read_csv` loads data into a DataFrame.
- Use `.head()`, `.info()`, `.describe()` to explore quickly.
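- `df.to_csv` and `df.to_excel` save results; pass `index=False` to leave out the row index.
- Each reader and writer is a single function call, so moving data between files and DataFrames takes only a few lines.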
