<a href="https://colab.research.google.com/github/naazsibia/ug-trainer/blob/main/tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Welcome! In this notebook we’ll walk through:

1. **Loading data** from CSV  
2. **Basic DataFrame operations**  
3. **Applying functions** to columns  
4. **Useful DataFrame methods**  
5. **Merging multiple files**  
6. **Saving results**  

Each section has a short explanation and example code. Feel free to modify and rerun the cells!

## 1. Setup and Sample Data

First, import pandas and create a small in-memory sample dataset.

```python
import pandas as pd
from io import StringIO

# Create a sample CSV as a string
csv_data = """name,age,score
Alice,30,85
Bob,25,90
Carol,22,78"""

# Read it into a DataFrame
df_sample = pd.read_csv(StringIO(csv_data))
df_sample  # display the loaded DataFrame
```
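In practice you will usually pass a file path to `pd.read_csv` instead of a `StringIO` buffer. The sketch below writes the same sample to a temporary file (the name `scores.csv` is only an illustration) and reads it back, using `usecols` to load a subset of columns and `dtype` to set a column's type explicitly.

```python
import os
import tempfile

import pandas as pd

csv_data = """name,age,score
Alice,30,85
Bob,25,90
Carol,22,78"""

# Use a temporary directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "scores.csv")  # hypothetical file name
    with open(path, "w") as f:
        f.write(csv_data)

    # Load only 'name' and 'score', and store 'score' as a float
    df_file = pd.read_csv(path, usecols=["name", "score"], dtype={"score": "float64"})

print(df_file)
print(df_file.dtypes)
```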




## 2. Basic DataFrame Operations
Explore some fundamental DataFrame methods.

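```python
# Display the first few rows (5 by default)
df_sample.head()
```

```python
# Summary statistics for the numeric columns
df_sample.describe()
```

```python
# Select specific columns (a list of names returns a DataFrame)
df_sample[['name', 'score']]
```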
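Beyond `head` and `describe`, it often helps to check a DataFrame's size and column types, and to select rows by a condition. A minimal sketch on the same sample values; the DataFrame name `people` and the thresholds (score above 80, age under 28) are arbitrary choices for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 22],
    "score": [85, 90, 78],
})

# (rows, columns) and the dtype of each column
print(people.shape)
print(people.dtypes)

# Boolean mask: keep only rows where the score is above 80
high_scorers = people[people["score"] > 80]

# .loc combines a row condition with a column selection
young_names = people.loc[people["age"] < 28, "name"]

print(high_scorers)
print(young_names.tolist())
```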

## 3. Applying Functions to Columns
Define a custom function and apply it to a column.

```python
# Example function that adds 5 bonus points to a score
def add_bonus(score):
    return score + 5

# Apply it to every value in the 'score' column
df_sample['score_plus_bonus'] = df_sample['score'].apply(add_bonus)
df_sample
```
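For simple arithmetic, `apply` is not strictly needed: operations on a whole column are vectorized and usually faster. `apply` is more useful for logic that is awkward to vectorize, or, with `axis=1`, for functions that need several columns from the same row. In this sketch the 80-point cut-off and the `describe_row` helper are made up for illustration.

```python
import pandas as pd

scores = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "age": [30, 25, 22],
    "score": [85, 90, 78],
})

# Vectorized: same result as .apply(add_bonus), without a Python-level loop
scores["score_plus_bonus"] = scores["score"] + 5

# Element-wise apply with a lambda for a conditional rule (hypothetical cut-off of 80)
scores["passed"] = scores["score"].apply(lambda s: "pass" if s >= 80 else "review")

# Row-wise apply: axis=1 passes each row as a Series, so every column is available
def describe_row(row):
    return f"{row['name']} ({row['age']}): {row['score']}"

scores["summary"] = scores.apply(describe_row, axis=1)
print(scores)
```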

## 4. Useful DataFrame Methods
Common DataFrame transformations: handling missing values, sorting, renaming columns, and removing duplicates.

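```python
# Copy the data and add a row with a missing age for demonstration.
# After section 3, df_sample has four columns (name, age, score, score_plus_bonus),
# so the new row needs four values in that order.
df = df_sample.copy()
df.loc[3] = ['David', None, 88, 93]
df
```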


```python
# Drop rows with any missing values (dropna returns a new DataFrame)
df = df.dropna()
df
```
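Dropping rows is not the only option: `fillna` replaces missing values instead, and `dropna(subset=...)` only checks the listed columns. A small sketch, assuming that filling a missing age with the mean of the known ages is acceptable for your analysis (often it is not):

```python
import numpy as np
import pandas as pd

people = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "David"],
    "age": [30, 25, 22, np.nan],
    "score": [85, 90, 78, 88],
})

# Replace the missing age with the mean of the known ages: (30 + 25 + 22) / 3
filled = people.fillna({"age": people["age"].mean()})

# Drop only rows whose 'score' is missing; David's missing age is tolerated
scored = people.dropna(subset=["score"])

print(filled)
print(len(scored))
```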


```python
# Sort by score, highest first
df_sample.sort_values('score', ascending=False)
```
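`sort_values` also accepts a list of columns, together with a matching list for `ascending`, which is handy for breaking ties. The tied scores below are invented for illustration:

```python
import pandas as pd

results = pd.DataFrame({
    "name": ["Carol", "Alice", "Bob", "David"],
    "score": [90, 85, 90, 85],
})

# Highest score first; among equal scores, alphabetical by name
ranked = results.sort_values(["score", "name"], ascending=[False, True])
print(ranked)
```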


```python
# Rename a column (returns a new DataFrame; df_sample is unchanged)
df_sample.rename(columns={'score': 'exam_score'})
```

```python
# Stack df_sample on itself to create duplicate rows
df_dup = pd.concat([df_sample, df_sample])
df_dup
```

```python
# Remove the duplicated rows
df_dup.drop_duplicates()
```
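By default `drop_duplicates` compares entire rows. The `subset` parameter restricts the comparison to certain columns, `keep` chooses which occurrence survives (`'first'`, `'last'`, or `False` to drop every copy), and `ignore_index=True` renumbers the result. The repeated-attempts data below is hypothetical:

```python
import pandas as pd

attempts = pd.DataFrame({
    "name": ["Alice", "Bob", "Alice", "Carol", "Bob"],
    "score": [85, 90, 88, 78, 92],
})

# Keep each student's last attempt: compare only 'name', keep the last occurrence
latest = attempts.drop_duplicates(subset="name", keep="last", ignore_index=True)

# keep=False removes every row whose name appears more than once
single_attempt = attempts.drop_duplicates(subset="name", keep=False)

print(latest)
print(single_attempt)
```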


## 5. Merging Multiple Files

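Combine two DataFrames on a common key. In a real project each DataFrame would usually be loaded from its own CSV file; here the second one comes from another in-memory string.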

```python
# Create a second sample CSV with department info
csv_data2 = """name,department
Alice,Physics
Bob,Chemistry
Carol,Biology"""

df_dept = pd.read_csv(StringIO(csv_data2))
df_dept
```

```python
# Merge the sample data with the department info on 'name'
# (how='inner' keeps only names present in both DataFrames)
merged = pd.merge(df_sample, df_dept, on='name', how='inner')
merged
```
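Here every name appears in both DataFrames, so the choice of `how` makes no difference. When the keys differ, `how='inner'` keeps only shared names, `how='left'` keeps every row of the left DataFrame, and `how='outer'` keeps all names from both; `indicator=True` adds a `_merge` column recording where each row came from. In this sketch, `Dan` and `Eve` are hypothetical names added so the keys do not line up:

```python
import pandas as pd

scores = pd.DataFrame({"name": ["Alice", "Bob", "Dan"], "score": [85, 90, 70]})
depts = pd.DataFrame({
    "name": ["Alice", "Bob", "Eve"],
    "department": ["Physics", "Chemistry", "Biology"],
})

inner = pd.merge(scores, depts, on="name", how="inner")  # only Alice and Bob
left = pd.merge(scores, depts, on="name", how="left", indicator=True)  # Dan has no department
outer = pd.merge(scores, depts, on="name", how="outer", indicator=True)  # all four names

print(left)
print(outer)
```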

## 6. Saving Results
Write your processed DataFrame back to CSV.

```python
# Save the merged DataFrame to a CSV file (index=False omits the row labels)
merged.to_csv('merged_output.csv', index=False)
print('Saved merged_output.csv')
```
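One way to confirm the file was written as expected is to read it back and compare it with the DataFrame you saved. This sketch uses a small stand-in DataFrame and a temporary directory so it runs on its own; `pandas.testing.assert_frame_equal` raises an `AssertionError` if columns, dtypes, or values differ. Without `index=False`, the re-read DataFrame would gain an extra `Unnamed: 0` column and the comparison would fail.

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal

# Stand-in for the merged DataFrame from section 5
merged = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol"],
    "score": [85, 90, 78],
    "department": ["Physics", "Chemistry", "Biology"],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "merged_output.csv")
    merged.to_csv(path, index=False)
    round_trip = pd.read_csv(path)

# Raises AssertionError if the file does not reproduce the DataFrame
assert_frame_equal(merged, round_trip)
print("Round trip OK")
```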