This project builds a spam detection model using Logistic Regression to classify text messages as spam or ham (legitimate messages). The workflow includes data cleaning, text preprocessing, TF-IDF feature extraction, model training, and evaluation using standard machine learning metrics.
- Uses a labeled SMS dataset (spam.csv)
- Converts text into numerical vectors using TF-IDF
- Trains a Logistic Regression classifier with scikit-learn
- Evaluates performance using accuracy and classification metrics
- Fully implemented in a Jupyter Notebook
- Python
- Pandas
- NumPy
- Scikit-learn
- Jupyter Notebook
The dataset spam.csv contains two main columns:
- text: message content
- label: spam or ham
Ensure spam.csv is located in the project folder.
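A minimal sketch of loading the dataset with pandas, assuming the file has a header row with the `text` and `label` columns described above. The function name `load_messages` and the 0/1 label encoding are illustrative choices, not taken from the notebook, and the file's encoding may need to be passed to `read_csv` if the default does not work for your copy.

```python
import pandas as pd


def load_messages(path="spam.csv"):
    """Return (texts, labels) with labels encoded as 0 = ham, 1 = spam.

    Hypothetical helper: assumes columns named `text` and `label`.
    """
    df = pd.read_csv(path)

    # Fail early if the expected columns are not present.
    missing = {"text", "label"} - set(df.columns)
    if missing:
        raise ValueError(f"missing expected columns: {sorted(missing)}")

    # Basic cleaning: drop rows with empty fields and normalize label text.
    df = df[["text", "label"]].dropna().copy()
    df["label"] = df["label"].str.strip().str.lower()

    # Keep only rows whose label is one of the two known classes.
    df = df[df["label"].isin(["spam", "ham"])]

    labels = (df["label"] == "spam").astype(int)
    return df["text"].astype(str), labels
```

Calling `load_messages()` from the project folder returns two aligned pandas Series that can be passed directly to a scikit-learn train/test split.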
Install dependencies:
pip install scikit-learn pandas numpy
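With the dependencies installed, the workflow described above (TF-IDF features, a Logistic Regression classifier, then accuracy and a classification report) could be sketched roughly as follows. This is an illustration rather than the notebook's actual code: the function name `train_and_evaluate`, the 80/20 stratified split, the English stop-word list, and `max_iter=1000` are all assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline


def train_and_evaluate(texts, labels, test_size=0.2, seed=42):
    """Fit TF-IDF + Logistic Regression and score it on a held-out split.

    Hypothetical helper; split ratio and hyperparameters are assumptions.
    """
    # Stratify so the spam/ham ratio is similar in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        texts, labels, test_size=test_size, random_state=seed, stratify=labels
    )

    # The pipeline fits the vectorizer on training text only, which keeps
    # test-set vocabulary statistics out of the TF-IDF weights.
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        LogisticRegression(max_iter=1000),
    )
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    accuracy = accuracy_score(y_test, predictions)
    report = classification_report(y_test, predictions, zero_division=0)
    return model, accuracy, report
```

The function accepts any aligned sequences of message strings and labels, so it can be called with the `text` and `label` columns read from spam.csv. The returned `model` can classify new messages with `model.predict([...])`.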