<a href="https://colab.research.google.com/github/jtsui23-code/Jtsui23-code-DataScience-2025/blob/main/Completed/06-Working_with_Data_Adv/06-column_operations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧱 Column Operations: Creating, Renaming, and Dropping Columns

## 🔹 LEARNING GOALS:
- Create new columns using calculations or functions
- Rename one or multiple columns
- Drop unwanted columns safely
- Apply functions across columns using `.apply()` and lambdas


### 🏗️ 1. Setup and Sample Data

In [None]:
import pandas as pd

data = {
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "last_name": ["Smith", "Jones", "Brown", "Wilson"],
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95]
}

df = pd.DataFrame(data)
df

|   | first_name | last_name | math_score | science_score |
|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 |
| 1 | Bob | Jones | 90 | 85 |
| 2 | Charlie | Brown | 78 | 82 |
| 3 | David | Wilson | 92 | 95 |


### ➕ 2. Creating New Columns

In [None]:
# Simple column arithmetic
df["average_score"] = (df["math_score"] + df["science_score"]) / 2
df

|   | first_name | last_name | math_score | science_score | average_score |
|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 |
| 1 | Bob | Jones | 90 | 85 | 87.5 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 |
| 3 | David | Wilson | 92 | 95 | 93.5 |
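The arithmetic above hard-codes two columns. A sketch of two related options on a fresh copy of the sample data: `mean(axis=1)` averages across any list of columns row by row, and `DataFrame.assign()` returns a new DataFrame instead of modifying `df`. The `honor_roll` column and its 85-point cutoff are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "math_score": [85, 90, 78, 92],
    "science_score": [88, 85, 82, 95],
})

score_cols = ["math_score", "science_score"]

# assign() returns a NEW DataFrame; df keeps its original columns.
# Each lambda receives the DataFrame built so far, so honor_roll can use
# the average_score column created one argument earlier.
scored = df.assign(
    average_score=lambda d: d[score_cols].mean(axis=1),  # row-wise mean
    honor_roll=lambda d: d["average_score"] >= 85,       # hypothetical cutoff
)

print(scored)
print(list(df.columns))  # original df is unchanged
```

Adding a subject then only means extending `score_cols`, whereas `(a + b) / 2` would need rewriting.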


In [None]:
# Create full name column
df["full_name"] = df["first_name"] + " " + df["last_name"]
df

|   | first_name | last_name | math_score | science_score | average_score | full_name |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | Alice Smith |
| 1 | Bob | Jones | 90 | 85 | 87.5 | Bob Jones |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | Charlie Brown |
| 3 | David | Wilson | 92 | 95 | 93.5 | David Wilson |
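String columns also have vectorized helpers under `.str`. A small sketch: `str.cat` joins two columns with a separator (same result as the `+ " " +` version above), and `.str[0]` takes the first character of each value. The `initials` column is a hypothetical extra, not part of the exercise.

```python
import pandas as pd

df = pd.DataFrame({
    "first_name": ["Alice", "Bob", "Charlie", "David"],
    "last_name": ["Smith", "Jones", "Brown", "Wilson"],
})

# str.cat joins element-wise with a separator; equivalent to first + " " + last here.
df["full_name"] = df["first_name"].str.cat(df["last_name"], sep=" ")

# .str[0] takes the first character of each string (hypothetical "initials" column).
df["initials"] = df["first_name"].str[0] + df["last_name"].str[0]

print(df)
```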


### ✍️ 3. Renaming Columns

In [None]:
# Rename specific columns by passing an {old_name: new_name} mapping
df.rename(columns={"math_score": "Math", "science_score": "Science"}, inplace=True)
df

|   | first_name | last_name | Math | Science | average_score | full_name |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | Alice Smith |
| 1 | Bob | Jones | 90 | 85 | 87.5 | Bob Jones |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | Charlie Brown |
| 3 | David | Wilson | 92 | 95 | 93.5 | David Wilson |
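Two details of `rename` worth seeing in isolation, sketched on a tiny two-column frame: without `inplace=True` it returns a renamed copy, and by default a mapping key that matches no column is silently ignored. Passing `errors="raise"` makes a typo fail loudly; the misspelled `"math_scroe"` is deliberate.

```python
import pandas as pd

df = pd.DataFrame({"math_score": [85, 90], "science_score": [88, 85]})

# Without inplace=True, rename() returns a renamed copy and leaves df as it was.
renamed = df.rename(columns={"math_score": "Math", "science_score": "Science"})
print(list(renamed.columns))  # ['Math', 'Science']
print(list(df.columns))       # ['math_score', 'science_score']

# A key that matches no column is ignored by default: nothing changes.
unchanged = df.rename(columns={"math_scroe": "Math"})
print(list(unchanged.columns))  # ['math_score', 'science_score']

# errors="raise" turns the same typo into a KeyError instead.
typo_caught = False
try:
    df.rename(columns={"math_scroe": "Math"}, errors="raise")
except KeyError as exc:
    typo_caught = True
    print("Rename failed:", exc)
```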


In [None]:
# Rename all columns to uppercase
df.columns = [col.upper() for col in df.columns]
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | FULL_NAME |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | Alice Smith |
| 1 | Bob | Jones | 90 | 85 | 87.5 | Bob Jones |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | Charlie Brown |
| 3 | David | Wilson | 92 | 95 | 93.5 | David Wilson |
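The list comprehension works, but a couple of alternatives are sketched below, assuming some messy, hypothetical headers with spaces and stray whitespace: `rename()` also accepts a function applied to every label, and the `.str` accessor on `df.columns` can chain several clean-ups in one line.

```python
import pandas as pd

df = pd.DataFrame(columns=["First Name", " Last Name ", "Math Score"])

# rename() also accepts a function, which it calls on every column label.
upper = df.rename(columns=str.upper)
print(list(upper.columns))  # ['FIRST NAME', ' LAST NAME ', 'MATH SCORE']

# The .str accessor on df.columns chains clean-ups:
# trim stray whitespace, lowercase, then swap spaces for underscores.
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
print(list(df.columns))  # ['first_name', 'last_name', 'math_score']
```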


### ❌ 4. Dropping Columns

In [None]:
# Drop by name
df.drop(columns=["FULL_NAME"], inplace=True)
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE |
|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 |
| 1 | Bob | Jones | 90 | 85 | 87.5 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 |
| 3 | David | Wilson | 92 | 95 | 93.5 |
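Because the cell above uses `inplace=True`, running it a second time raises `KeyError`: `FULL_NAME` is already gone. A sketch of safer patterns on a small frame (the `NICKNAME` column is hypothetical and intentionally missing): `errors="ignore"` skips labels that aren't there, assigning the result keeps the original intact, and `pop()` removes one column while handing it back.

```python
import pandas as pd

df = pd.DataFrame({
    "FIRST_NAME": ["Alice", "Bob"],
    "AVERAGE_SCORE": [86.5, 87.5],
    "FULL_NAME": ["Alice Smith", "Bob Jones"],
})

# By default, dropping a label that isn't there raises KeyError.
missing_raised = False
try:
    df.drop(columns=["NICKNAME"])  # hypothetical column that does not exist
except KeyError:
    missing_raised = True

# errors="ignore" skips missing labels, and assigning the result to a new
# name leaves df untouched, so re-running this line cannot fail.
trimmed = df.drop(columns=["FULL_NAME", "NICKNAME"], errors="ignore")
print(list(trimmed.columns))  # ['FIRST_NAME', 'AVERAGE_SCORE']

# pop() removes a single column from df in place and returns it as a Series.
averages = df.pop("AVERAGE_SCORE")
print(list(df.columns))  # ['FIRST_NAME', 'FULL_NAME']
```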


### 🔁 5. Applying Functions Across Columns

In [None]:
# Categorize based on average score
def grade(score):
    if score >= 90:
        return "A"
    elif score >= 80:
        return "B"
    else:
        return "C"

df["GRADE"] = df["AVERAGE_SCORE"].apply(grade)
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | GRADE |
|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | B |
| 1 | Bob | Jones | 90 | 85 | 87.5 | B |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | B |
| 3 | David | Wilson | 92 | 95 | 93.5 | A |
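The learning goals mention lambdas, so here is the same A/B/C rule as `grade()` written inline, plus a vectorized alternative with `pd.cut`, which bins the whole column at once. The `GRADE_LAMBDA` and `GRADE_CUT` column names are only for comparison. With `right=False`, each bin includes its left edge, so 80.0 lands in "B" and 90 would land in "A", matching the `>=` checks in `grade()`.

```python
import pandas as pd

df = pd.DataFrame({"AVERAGE_SCORE": [86.5, 87.5, 80.0, 93.5]})

# Same rule as grade(), as a lambda with a nested conditional expression.
df["GRADE_LAMBDA"] = df["AVERAGE_SCORE"].apply(
    lambda s: "A" if s >= 90 else ("B" if s >= 80 else "C")
)

# Vectorized: bins are [-inf, 80), [80, 90), [90, inf) because right=False.
df["GRADE_CUT"] = pd.cut(
    df["AVERAGE_SCORE"],
    bins=[float("-inf"), 80, 90, float("inf")],
    labels=["C", "B", "A"],
    right=False,
)
print(df)
```

`pd.cut` returns a categorical column; call `.astype(str)` if plain strings are needed.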


### 🧪 Try It Yourself

- Add a new column called `"NAME_LENGTH"` that contains the length of each `FIRST_NAME`
- Create a column `"MATH_PLUS_5"` that adds a 5-point bonus to each `MATH` score


In [8]:
# The columns were uppercased in section 3, so the name column is FIRST_NAME
df["NAME_LENGTH"] = df["FIRST_NAME"].str.len()
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | GRADE | NAME_LENGTH |
|---|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | B | 5 |
| 1 | Bob | Jones | 90 | 85 | 87.5 | B | 3 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | B | 7 |
| 3 | David | Wilson | 92 | 95 | 93.5 | A | 5 |
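`.str.len()` is preferable to `.apply(len)` when a column might contain missing values. A small sketch with a hypothetical `None` in the middle: the `.str` method returns `NaN` for the gap, while `apply(len)` calls `len(None)` and fails.

```python
import pandas as pd

names = pd.Series(["Alice", None, "Charlie"])

# .str.len() skips missing entries and returns NaN for them.
lengths = names.str.len()
print(lengths.tolist())  # [5.0, nan, 7.0]

# apply(len) calls len() on every element, including None, which raises TypeError.
apply_failed = False
try:
    names.apply(len)
except TypeError as exc:
    apply_failed = True
    print("apply(len) failed:", exc)
```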


In [9]:
# Add a 5-point bonus by applying add_5 to every MATH score
def add_5(score):
    return score + 5

df["MATH_PLUS_5"] = df["MATH"].apply(add_5)
df

|   | FIRST_NAME | LAST_NAME | MATH | SCIENCE | AVERAGE_SCORE | GRADE | NAME_LENGTH | MATH_PLUS_5 |
|---|---|---|---|---|---|---|---|---|
| 0 | Alice | Smith | 85 | 88 | 86.5 | B | 5 | 90 |
| 1 | Bob | Jones | 90 | 85 | 87.5 | B | 3 | 95 |
| 2 | Charlie | Brown | 78 | 82 | 80.0 | B | 7 | 83 |
| 3 | David | Wilson | 92 | 95 | 93.5 | A | 5 | 97 |
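For simple arithmetic, a vectorized expression gives the same values as `apply` without calling a Python function per element. The sketch below checks that, then shows `apply(..., axis=1)`, which passes each row so a lambda can compare columns. The `BEST_SUBJECT` column is a hypothetical example.

```python
import pandas as pd

df = pd.DataFrame({"MATH": [85, 90, 78, 92], "SCIENCE": [88, 85, 82, 95]})

def add_5(score):
    return score + 5

# Vectorized: pandas adds 5 to the whole column in one operation.
vectorized = df["MATH"] + 5
# Element-wise: apply() calls add_5 once per value.
applied = df["MATH"].apply(add_5)
print((vectorized == applied).all())  # True

# axis=1 passes each ROW as a Series, so the lambda can read several columns.
df["BEST_SUBJECT"] = df.apply(
    lambda row: "MATH" if row["MATH"] >= row["SCIENCE"] else "SCIENCE",
    axis=1,
)
print(df)
```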


### 🧠 Mini-Challenge

> Load `"data/students.csv"` and:
- Create a `"total_score"` column by summing up all numeric test columns
- Rename any columns that contain spaces (e.g., `"Test 1"`) to use underscores
- Drop any column that contains only `NaN` values


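`data/students.csv` isn't included with this notebook, so I built a small stand-in DataFrame, `student_df`, with the same kinds of columns (names containing spaces plus numeric test scores).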

In [13]:
student_data = {
    "Student Name": ["Alice", "Bob", "Charlie", "David"],
    "Test 1": [85, 90, 78, 92],
    "Test 2": [88, 85, 82, 95],
    "Comments": ["Good", "Average", "Needs improvement", "Excellent"],
}

student_df = pd.DataFrame(student_data)
print(student_df)
print("-----------------------------------------------------------------------------------------------\n")

# Replace the space in every column name with an underscore
student_df.columns = student_df.columns.str.replace(" ", "_")
print("Replaced white space with underscore")
print(student_df)
print("-----------------------------------------------------------------------------------------------\n")




  Student Name  Test 1  Test 2           Comments
0        Alice      85      88               Good
1          Bob      90      85            Average
2      Charlie      78      82  Needs improvement
3        David      92      95          Excellent
-----------------------------------------------------------------------------------------------

Replaced white space with underscore
  Student_Name  Test_1  Test_2           Comments
0        Alice      85      88               Good
1          Bob      90      85            Average
2      Charlie      78      82  Needs improvement
3        David      92      95          Excellent
-----------------------------------------------------------------------------------------------
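A literal `" "` replacement covers the headers here, but real CSV headers are often messier. A sketch assuming hypothetical names with a doubled space and a leading space: stripping first, then replacing any run of whitespace (regex `\s+`) with a single underscore handles both.

```python
import pandas as pd

student_df = pd.DataFrame(columns=["Student Name", "Test  1", " Test 2"])

# A plain " " -> "_" swap would give "Test__1" and "_Test_2".
# strip() removes edge whitespace; \s+ collapses each inner run to one "_".
student_df.columns = student_df.columns.str.strip().str.replace(r"\s+", "_", regex=True)
print(list(student_df.columns))  # ['Student_Name', 'Test_1', 'Test_2']
```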



In [14]:
# Add an all-missing column so there is something for dropna to remove
student_df["Empty_column"] = [None, None, None, None]
print(student_df)
print("-----------------------------------------------------------------------------------------------\n")


  Student_Name  Test_1  Test_2           Comments Empty_column
0        Alice      85      88               Good         None
1          Bob      90      85            Average         None
2      Charlie      78      82  Needs improvement         None
3        David      92      95          Excellent         None
-----------------------------------------------------------------------------------------------



In [15]:
# axis=1 checks columns; how="all" drops a column only if every value is missing
student_df = student_df.dropna(axis=1, how="all")
print(student_df)
print("-----------------------------------------------------------------------------------------------\n")



  Student_Name  Test_1  Test_2           Comments
0        Alice      85      88               Good
1          Bob      90      85            Average
2      Charlie      78      82  Needs improvement
3        David      92      95          Excellent
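`how="all"` is the right choice for "drop columns that contain only NaN". To see how it differs from the other options, here is a sketch on a hypothetical three-column frame: one full column, one partly empty, one entirely empty.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "full": [1, 2, 3],
    "partly_empty": [1, np.nan, 3],
    "all_empty": [None, None, None],
})

# how="all": drop a column only when every value is missing (None and NaN both count).
print(list(df.dropna(axis=1, how="all").columns))  # ['full', 'partly_empty']

# how="any": drop a column that has even one missing value.
print(list(df.dropna(axis=1, how="any").columns))  # ['full']

# thresh=n: keep a column only if it has at least n non-missing values.
print(list(df.dropna(axis=1, thresh=2).columns))   # ['full', 'partly_empty']
```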
-----------------------------------------------------------------------------------------------



In [16]:
# Sum the numeric columns (Test_1 and Test_2) across each row
student_df["total_score"] = student_df.select_dtypes(include="number").sum(axis=1)
print(student_df)
print("-----------------------------------------------------------------------------------------------\n")



  Student_Name  Test_1  Test_2           Comments  total_score
0        Alice      85      88               Good          173
1          Bob      90      85            Average          175
2      Charlie      78      82  Needs improvement          160
3        David      92      95          Excellent          187
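One caveat with `select_dtypes`: `total_score` is numeric too, so running that cell a second time adds the old total back in. A sketch, on the same stand-in data, that simulates two runs of each approach; the alternative selects test columns by name with `DataFrame.filter(regex=...)`, assuming they all follow the `Test_<number>` pattern.

```python
import pandas as pd

student_df = pd.DataFrame({
    "Student_Name": ["Alice", "Bob", "Charlie", "David"],
    "Test_1": [85, 90, 78, 92],
    "Test_2": [88, 85, 82, 95],
})

# Pitfall: the second run includes the existing total_score in the sum.
naive = student_df.copy()
for _ in range(2):  # simulate running the cell twice
    naive["total_score"] = naive.select_dtypes(include="number").sum(axis=1)
print(naive["total_score"].tolist())  # [346, 350, 320, 374]

# Safer: pick columns by name pattern, so the total never includes itself.
test_cols = student_df.filter(regex=r"^Test_\d+$").columns
for _ in range(2):
    student_df["total_score"] = student_df[test_cols].sum(axis=1)
print(student_df["total_score"].tolist())  # [173, 175, 160, 187]
```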
-----------------------------------------------------------------------------------------------



### 📝 Summary

| Action             | Method                     |
|--------------------|----------------------------|
| Create column      | `df["new"] = ...`          |
| Rename column(s)   | `df.rename(columns={...})` |
| Rename all columns | `df.columns = [...]`       |
| Drop column(s)     | `df.drop(columns=[...])`   |
| Apply function     | `df["col"].apply(func)`    |
