SIS Faculty List — Data Quality Cleaning

This project contains a Python script (sis_clean.py) to perform data quality analysis and cleaning on the SIS Faculty dataset.
It prepares the dataset for use in Machine Learning tasks by addressing missing values, invalid identifiers, inconsistent categories, and redundant fields.

Features

Normalises column names and trims whitespace
Parses date fields with multiple formats
Validates and corrects identifiers (ID)
Drops columns with >95% missing values or constant values
Standardises qualification labels (e.g., Ph.D → PhD)
Imputes missing values (mode for categorical, median for numeric)
Removes duplicate rows
Saves the cleaned dataset as a new CSV file
Prints before/after summaries of data quality
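The cleaning steps above can be sketched with pandas as follows. This is a minimal illustration, not the code in sis_clean.py. Several details are assumptions made for the example: the column names `id` and `qualification`, the rule that any column whose name contains `date` holds dates, the entries in the hypothetical `DATE_FORMATS` and `QUALIFICATION_MAP`, and the ID rule (a valid ID is a positive whole number).

```python
import pandas as pd

# Hypothetical settings: the real script's date formats and label mapping are not documented.
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d-%b-%Y"]
QUALIFICATION_MAP = {"ph.d": "PhD", "ph.d.": "PhD", "phd": "PhD",
                     "m.sc": "MSc", "m.sc.": "MSc", "msc": "MSc"}
MISSING_THRESHOLD = 0.95  # drop columns with more than 95% missing values


def normalise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Snake-case column names, trim string cells and turn blank strings into NaN."""
    df = df.copy()
    df.columns = [str(c).strip().lower().replace(" ", "_") for c in df.columns]
    for col in df.columns:
        stripped = df[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        df[col] = stripped.mask(stripped == "")
    return df


def parse_dates(series: pd.Series) -> pd.Series:
    """Try each format in turn; values that no format can parse become NaT."""
    result = pd.Series(pd.NaT, index=series.index, dtype="datetime64[ns]")
    for fmt in DATE_FORMATS:
        result = result.fillna(pd.to_datetime(series, format=fmt, errors="coerce"))
    return result


def clean_ids(series: pd.Series) -> pd.Series:
    """Assumed rule: a valid ID is a positive whole number; anything else becomes <NA>."""
    ids = pd.to_numeric(series, errors="coerce")
    ids = ids.where((ids > 0) & (ids % 1 == 0))
    return ids.astype("Int64")


def standardise_qualification(series: pd.Series) -> pd.Series:
    """Map spelling variants (e.g. 'Ph.D', 'ph. d.') to one label; unknown values are kept."""
    key = series.str.lower().str.replace(r"\s+", "", regex=True)
    return key.map(QUALIFICATION_MAP).fillna(series)


def drop_sparse_and_constant(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns that are mostly empty or hold at most one distinct value."""
    missing_share = df.isna().mean()
    n_unique = df.nunique(dropna=True)
    sparse = missing_share[missing_share > MISSING_THRESHOLD].index
    constant = n_unique[n_unique <= 1].index
    return df.drop(columns=sparse.union(constant))


def impute(df: pd.DataFrame, exclude: set) -> pd.DataFrame:
    """Median for numeric columns, mode for everything else (categories, dates)."""
    df = df.copy()
    for col in df.columns:
        if col in exclude or not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            mode = df[col].mode(dropna=True)
            if not mode.empty:
                df[col] = df[col].fillna(mode.iloc[0])
    return df


def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = normalise_columns(df)
    if "id" in df.columns:  # assumed column name
        df["id"] = clean_ids(df["id"])
    for col in [c for c in df.columns if "date" in c]:  # assumed naming heuristic
        df[col] = parse_dates(df[col])
    if "qualification" in df.columns:  # assumed column name
        df["qualification"] = standardise_qualification(df["qualification"])
    df = drop_sparse_and_constant(df)
    df = impute(df, exclude={"id"})  # identifiers are never imputed
    return df.drop_duplicates().reset_index(drop=True)
```

In this sketch, sparse and constant columns are dropped before imputation so that the constant check sees the original values rather than filled-in ones, and duplicates are removed last so that rows made identical by imputation are also caught.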

Requirements

Python 3.9+
Libraries:
```
pip install pandas numpy
```

Project Structure

├── sis_clean.py                  # Python script for cleaning the dataset
├── SIS_Faculty-List.csv          # Raw dataset (input)
├── SIS_Faculty-List_clean.csv    # Cleaned dataset (output, generated by script)
└── README.md                     # Instructions for setup and usage

How to Use

Place the raw dataset file in the same folder as sis_clean.py. With the default settings the script expects it to be named SIS_Faculty-List.csv.
Open a terminal (or PowerShell on Windows) and navigate to the folder:
```
cd path/to/folder
```
Run the script with the default input/output names:
```
python sis_clean.py
```

This will:

Load SIS_Faculty-List.csv
Generate SIS_Faculty-List_clean.csv in the same folder
Print before/after data quality metrics in the terminal
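Which metrics the script prints is not listed here. As a hypothetical cross-check, the sketch below computes a few common ones (row and column counts, duplicate rows, missing cells, missing percentage per column) for any two DataFrames. The function names are illustrative and are not part of sis_clean.py; only the default file names come from this README.

```python
import pandas as pd


def quality_summary(df: pd.DataFrame) -> dict:
    """Collect a few simple data quality metrics for one DataFrame."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_cells": int(df.isna().sum().sum()),
        "missing_pct_by_column": df.isna().mean().mul(100).round(1).to_dict(),
    }


def print_before_after(before: pd.DataFrame, after: pd.DataFrame) -> None:
    """Print the headline metrics side by side."""
    b, a = quality_summary(before), quality_summary(after)
    for key in ("rows", "columns", "duplicate_rows", "missing_cells"):
        print(f"{key:<15} before={b[key]:>7} after={a[key]:>7}")


def compare_files(raw_path: str = "SIS_Faculty-List.csv",
                  clean_path: str = "SIS_Faculty-List_clean.csv") -> None:
    """Load the input and output CSV files and print their metrics."""
    print_before_after(pd.read_csv(raw_path), pd.read_csv(clean_path))
```

After running `python sis_clean.py`, calling `compare_files()` from a Python session opened in the same folder prints the comparison for the two CSV files.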
