# Clean Courses CSV

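This notebook cleans the courses.csv file and saves the result as courses_cleaned.csv:
- Replaces semicolons in text fields with full stops (a '.;' pair becomes a single '.')
- Standardizes location names (removes ', UK')
- Strips leading and trailing whitespace from text fields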

In [10]:
import pandas as pd

df = pd.read_csv('courses.csv')

print(f"Total courses: {len(df)}")
df.head(2)

Total courses: 211


Unnamed: 0,ID,Course Name,Instructor,Course Type,Location,Cost,Learning Objectives,Provided Materials,Skills Developed,Description
0,4033,The Art of Wondrous Waffle Weaving,Chef Waffleby,Culinary Arts,Harrogate,£75.00,Master the technique of creating intricate waf...,Professional waffle iron; Selection of flours ...,"Culinary Arts, Baking, Creative Cooking, Food ...",Join us at the Harrogate Culinary Academy for ...
1,317,Cornish Pasty Poetry & Patisserie,Chef Rhubarb Mince,Traditional Skills,Cornwall,£85.00,Master the art of crafting the perfect Cornish...,Organic flour and butter; Locally sourced ingr...,"Cooking, Creative Writing, Cultural History, P...",Welcome to the Cornish Pasty Poetry & Patisser...

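The `Unnamed: 0` column in this preview looks like a row index that was written into courses.csv when it was last saved. Because it is an ordinary column in `df`, the later `to_csv(..., index=False)` call will still write it to the cleaned file. Assuming it is only a saved index, here is a minimal sketch of two ways to keep it out. The inline CSV text is a made-up stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for courses.csv: the first, unnamed column is a previously saved index
raw = io.StringIO(
    ",ID,Course Name\n"
    "0,4033,The Art of Wondrous Waffle Weaving\n"
    "1,317,Cornish Pasty Poetry & Patisserie\n"
)

# Option 1: use the first column as the DataFrame index while reading
df_indexed = pd.read_csv(raw, index_col=0)

# Option 2: read normally, then drop any auto-named "Unnamed: ..." columns
raw.seek(0)
df_dropped = pd.read_csv(raw)
df_dropped = df_dropped.loc[:, ~df_dropped.columns.str.startswith("Unnamed:")]

print(list(df_indexed.columns))
print(list(df_dropped.columns))
```

`index_col=0` keeps the old numbering as the index. Dropping by name prefix is more forgiving if the file is sometimes saved without the extra column.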

In [11]:
columns_to_check = ['Learning Objectives', 'Provided Materials', 'Skills Developed', 'Description']

for col in columns_to_check:
    if col in df.columns:
        semicolon_count = df[col].astype(str).str.endswith(';').sum()
        print(f"{col}: {semicolon_count} rows end with semicolon")

Learning Objectives: 0 rows end with semicolon
Provided Materials: 187 rows end with semicolon
Skills Developed: 0 rows end with semicolon
Description: 0 rows end with semicolon

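This cell counts only *trailing* semicolons, but the cleaning step below replaces every `;` in a text field. The sketch below also counts rows that contain a semicolon anywhere, along with the total number of semicolons, so that a before/after comparison covers mid-field separators too. The three-row frame is invented for illustration. In the notebook, `sample` would be `df`.

```python
import pandas as pd

# Illustrative stand-in for `df`; values are made up
sample = pd.DataFrame({
    "Provided Materials": ["Waffle iron; Flours;", "Flour and butter", None],
    "Description": ["Sweet; savoury", "Plain text", "More text"],
})

counts = {}
for col in ["Provided Materials", "Description"]:
    text = sample[col].astype("string")  # missing values become <NA>, not the text 'None'
    trailing = int(text.str.endswith(";").sum())               # rows ending in ';'
    anywhere = int(text.str.contains(";", regex=False).sum())  # rows with any ';'
    total = int(text.str.count(";").sum())                     # every ';' in the column
    counts[col] = (trailing, anywhere, total)
    print(f"{col}: trailing={trailing}, rows containing ';'={anywhere}, total ';'={total}")
```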

In [12]:
if 'Location' in df.columns:
    uk_locations = df[df['Location'].astype(str).str.contains(', UK', na=False)]
    print(f"Locations with ', UK': {len(uk_locations)}")
    print(uk_locations['Location'].unique())

Locations with ', UK': 1
['Harrogate, UK']

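Only one location variant, `Harrogate, UK`, appears here, so the plain substring replacement in the next cell is enough. Later exports might also contain forms such as `,UK` or `, U.K.`, although none occur in this file. To cover that case, the sketch below uses an anchored regular expression that removes the suffix only at the end of a value:

```python
import pandas as pd

# Made-up examples; only 'Harrogate, UK' actually occurs in courses.csv
locations = pd.Series(["Harrogate, UK", "Cornwall", "Oxford ,UK", "Bath, U.K.", None])

# Drop a trailing comma + UK / U.K. (any case, any spacing), then trim whitespace
cleaned = (
    locations
    .str.replace(r"\s*,\s*U\.?K\.?\s*$", "", regex=True, case=False)
    .str.strip()
)
print(cleaned.tolist())
```

Because of the `$` anchor, the pattern matches ", UK" only at the very end of a value. The substring version would also remove it from the middle of one.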

In [13]:
# Clean the data
import re

def clean_semicolons(text):
    """Clean semicolons from text:
    - If there's a full stop before semicolon (.;), remove the semicolon
    - If there's just a semicolon (;), replace it with a full stop (.)
    """
    if pd.isna(text) or text == 'nan':
        return text
    
    text = str(text)
    # First, replace '.;' with just '.'
    text = text.replace('.;', '.')
    # Then replace remaining ';' with '.'
    text = text.replace(';', '.')
    # Collapse runs of two or more full stops (this also shortens '...' to '.')
    text = re.sub(r'\.{2,}', '.', text)
    return text.strip()

# Apply semicolon cleaning to text columns
for col in df.columns:
    if df[col].dtype == 'object':  # Only text columns
        df[col] = df[col].apply(clean_semicolons)

# Remove ', UK' from locations
if 'Location' in df.columns:
    df['Location'] = df['Location'].str.replace(', UK', '', regex=False)

# Final whitespace cleanup
for col in df.columns:
    if df[col].dtype == 'object':
        df[col] = df[col].str.strip()

print("Data cleaned!")
df.head()

Data cleaned!


Unnamed: 0,ID,Course Name,Instructor,Course Type,Location,Cost,Learning Objectives,Provided Materials,Skills Developed,Description
0,4033,The Art of Wondrous Waffle Weaving,Chef Waffleby,Culinary Arts,Harrogate,£75.00,Master the technique of creating intricate waf...,Professional waffle iron. Selection of flours ...,"Culinary Arts, Baking, Creative Cooking, Food ...",Join us at the Harrogate Culinary Academy for ...
1,317,Cornish Pasty Poetry & Patisserie,Chef Rhubarb Mince,Traditional Skills,Cornwall,£85.00,Master the art of crafting the perfect Cornish...,Organic flour and butter. Locally sourced ingr...,"Cooking, Creative Writing, Cultural History, P...",Welcome to the Cornish Pasty Poetry & Patisser...
2,8879,Mystical Moss Mosaics,Professor Mossbottom,Nature Crafts,Scottish Highlands,£85.00,Identify and collect different types of moss s...,Moss identification booklet. Trowel. Canvas bo...,"Nature Art, Botanical Knowledge, Eco-Friendly ...",Venture into the enchanting wilderness of the ...
3,1153,Advanced Hedgehog Husbandry,Mr. Pricklesworth,Traditional Skills,Norfolk,£95.00,Learn the basics of hedgehog care and feeding....,Hedgehog-friendly gloves. Illustrated hedgehog...,"Animal Care, Wildlife Conservation, Mindfulnes...","Welcome to ""Advanced Hedgehog Husbandry,"" a ch..."
4,9224,Leafy Quill and Ink Potion Mastery,Professor Ivy Fernsnap,Nature Crafts,Oxford,£75.00,Craft quills from natural materials. Create bo...,Selection of feathers and twigs. Natural plant...,"Nature crafts, Botanical art, Writing techniqu...",Join Professor Ivy Fernsnap in the historic ci...

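Since `clean_semicolons` runs over every text column, a few hand-written strings are a quick way to check its behaviour. The sketch repeats the function so it runs on its own. One side effect is worth knowing: the `\.{2,}` clean-up also shortens a genuine ellipsis (`...`) to a single full stop, which may matter for the `Description` column.

```python
import re

import pandas as pd

def clean_semicolons(text):
    # Same logic as the cell above: '.;' -> '.', ';' -> '.', collapse '..+' -> '.'
    if pd.isna(text) or text == 'nan':
        return text
    text = str(text).replace('.;', '.').replace(';', '.')
    text = re.sub(r'\.{2,}', '.', text)
    return text.strip()

cases = {
    "Waffle iron; Flours;": "Waffle iron. Flours.",
    "Flour.; Butter.;": "Flour. Butter.",
    "Wait for it...": "Wait for it.",  # an ellipsis is collapsed too
    "  no semicolons  ": "no semicolons",
}
for raw, expected in cases.items():
    result = clean_semicolons(raw)
    assert result == expected, (raw, result)
    print(repr(raw), "->", repr(result))

print(clean_semicolons(float("nan")))  # missing values pass through unchanged
```

To change only the four free-text columns checked earlier, the cleaning loop could iterate over `columns_to_check` instead of `df.columns`. Fields such as `Course Name` and `Cost` would then be left untouched.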

In [14]:
# Verify the cleaning
print("\nVerifying semicolons removed:")
for col in columns_to_check:
    if col in df.columns:
        semicolon_count = df[col].astype(str).str.endswith(';').sum()
        print(f"{col}: {semicolon_count} rows end with semicolon")

print("\nVerifying UK removed from locations:")
if 'Location' in df.columns:
    uk_locations = df[df['Location'].astype(str).str.contains(', UK', na=False)]
    print(f"Locations with ', UK': {len(uk_locations)}")


Verifying semicolons removed:
Learning Objectives: 0 rows end with semicolon
Provided Materials: 0 rows end with semicolon
Skills Developed: 0 rows end with semicolon
Description: 0 rows end with semicolon

Verifying UK removed from locations:
Locations with ', UK': 0

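The verification above looks for semicolons only at the end of a value, while the cleaning replaced them everywhere. The sketch below defines a hypothetical helper, `check_cleaned`, that turns the checks into assertions. It looks for a `;` anywhere, for untrimmed whitespace and for a leftover `, UK`, so that a rerun of the notebook stops if a rule was not applied. It assumes the column names used in this notebook.

```python
import pandas as pd

def check_cleaned(frame, text_columns):
    """Raise AssertionError if a cleaning rule was not applied (hypothetical helper)."""
    for col in text_columns:
        values = frame[col].dropna().astype(str)
        assert not values.str.contains(";", regex=False).any(), f"';' left in {col}"
        assert (values == values.str.strip()).all(), f"untrimmed whitespace in {col}"
    if "Location" in frame.columns:
        leftover = frame["Location"].str.contains(", UK", regex=False, na=False)
        assert not leftover.any(), "', UK' left in Location"

# Small made-up frame; in the notebook: check_cleaned(df, columns_to_check)
cleaned = pd.DataFrame({
    "Location": ["Harrogate", "Cornwall"],
    "Provided Materials": ["Waffle iron. Flours.", "Flour and butter."],
})
check_cleaned(cleaned, ["Provided Materials"])
print("All checks passed")
```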

In [15]:
# Save the cleaned CSV
df.to_csv('courses_cleaned.csv', index=False)
print("✅ Saved to courses_cleaned.csv")

# Optional: back up the original, then overwrite it with the cleaned data
# import shutil
# shutil.copy('courses.csv', 'courses_backup.csv')
# df.to_csv('courses.csv', index=False)
# print("✅ Original backed up and replaced")

✅ Saved to courses_cleaned.csv

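One way to confirm that the saved file matches the DataFrame is to read it straight back and compare. The sketch uses a small made-up frame and a temporary directory so that it does not touch courses_cleaned.csv. In the notebook, the comparison would be between `df` and `pd.read_csv('courses_cleaned.csv')`. The check can fail even when nothing is wrong, because some column types change on reload: for example, a text column that is entirely empty comes back as floats. Treat it as a sanity check rather than a guarantee.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({
    "ID": [4033, 317],
    "Location": ["Harrogate", "Cornwall"],
    "Cost": ["£75.00", "£85.00"],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "courses_cleaned.csv")
    frame.to_csv(path, index=False)  # same call as the notebook
    reloaded = pd.read_csv(path)

# Raises AssertionError if columns, dtypes or values differ after the round trip
pd.testing.assert_frame_equal(frame, reloaded)
print("Round trip OK")
```

If the `£` signs look garbled when the CSV is opened in Excel, writing it with `encoding='utf-8-sig'` is a common fix.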

In [16]:
# Inspect one row in full after cleaning
# Change 'Waffle' to part of another course name to view a different row
sample = df[df['Course Name'].str.contains('Waffle', na=False)].iloc[0]
print("Sample course after cleaning:")
for col in sample.index:
    print(f"\n{col}:")
    print(sample[col])

Sample course after cleaning:

ID:
4033

Course Name:
The Art of Wondrous Waffle Weaving

Instructor:
Chef Waffleby

Course Type:
Culinary Arts

Location:
Harrogate

Cost:
£75.00

Learning Objectives:
Master the technique of creating intricate waffle patterns. Understand the science behind perfect waffle batter consistency. Explore innovative flavour combinations for both sweet and savoury waffles. Learn artistic presentation techniques for waffle dishes. Develop skills in using waffle irons of various designs.

Provided Materials:
Professional waffle iron. Selection of flours (including gluten-free options). Assorted spices and herbs. Fresh fruits and savory toppings. Waffle weaving toolkit (spatulas, piping bags, edible glitter).

Skills Developed:
Culinary Arts, Baking, Creative Cooking, Food Presentation, Flavour Pairing

Description:
Join us at the Harrogate Culinary Academy for an unforgettable experience in 'The Art of Wondrous Waffle Weaving.' Under the whimsical guidance of Ch