This project aims to introduce the basics of data science and logistic regression while exploring different tools used in machine learning.
- Read the dataset and preprocess it.
- Create visualizations to detect defects or anomalies.
- Implement a logistic regression model.
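For reference, the standard formulation of binary logistic regression is summarized below: the hypothesis, the log-loss cost over m training examples, and its gradient, which is the direction gradient descent follows. The notation (θ, x^(i), y^(i), m) is chosen here for illustration and is not taken from the project's code. A common way to handle more than two classes is one-vs-all, which trains one such model per class.

```latex
\begin{aligned}
h_\theta(x) &= \sigma(\theta^\top x) = \frac{1}{1 + e^{-\theta^\top x}} \\
J(\theta) &= -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\Big(1 - h_\theta\big(x^{(i)}\big)\Big) \right] \\
\frac{\partial J}{\partial \theta_j} &= \frac{1}{m} \sum_{i=1}^{m} \Big( h_\theta\big(x^{(i)}\big) - y^{(i)} \Big) x_j^{(i)}
\end{aligned}
```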
Clone the repository.
git clone https://github.com/framdani/dslr.git
cd dslr
Install the required packages.
pip install -r requirements.txt
Use the describe.py script to get an overview of the dataset.
python3 describe.py dataset_train.csv
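One way to produce such an overview is to compute per-column summary statistics by hand, similar to what pandas' DataFrame.describe() reports. The sketch below is not the repository's describe.py: it assumes missing values are represented as None and skipped, uses the sample standard deviation (dividing by n - 1), and uses linear interpolation for percentiles, which are also pandas' defaults. The function names percentile and describe_column are hypothetical.

```python
import math


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in [0, 100]) of an already sorted list."""
    if not sorted_values:
        return float("nan")
    k = (len(sorted_values) - 1) * p / 100
    lower = math.floor(k)
    upper = math.ceil(k)
    if lower == upper:
        return sorted_values[int(k)]
    # Interpolate between the two closest ranks
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (k - lower)


def describe_column(values):
    """Return count, mean, std, min, 25%, 50%, 75% and max for one numeric column.

    None entries (missing values) are skipped; std is the sample standard
    deviation, which divides by n - 1.
    """
    data = sorted(v for v in values if v is not None)
    n = len(data)
    mean = sum(data) / n if n else float("nan")
    var = sum((v - mean) ** 2 for v in data) / (n - 1) if n > 1 else float("nan")
    return {
        "Count": n,
        "Mean": mean,
        "Std": math.sqrt(var),
        "Min": data[0] if n else float("nan"),
        "25%": percentile(data, 25),
        "50%": percentile(data, 50),
        "75%": percentile(data, 75),
        "Max": data[-1] if n else float("nan"),
    }


# Toy column with one missing value
stats = describe_column([1.0, 2.0, None, 3.0, 4.0])
for name, value in stats.items():
    print(f"{name:>5}: {value:.4f}")
```

Applied to every numeric column of the CSV, this yields one row of statistics per feature.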
Use the following scripts to create visualizations:
- histogram.py: Displays a histogram that answers the question, "Which Hogwarts course has a homogeneous score distribution between all houses?"
- scatter_plot.py: Displays a scatter plot that answers the question, "What are the features that are similar?"
- pair_plot.py: Displays a pair plot that helps identify the features to use in the logistic regression.
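The scatter-plot question can also be checked numerically: if "similar" is taken to mean linearly related, two such features have a Pearson correlation coefficient close to +1 or -1. The sketch below assumes the features are given as aligned lists, drops rows where either value is missing, and assumes no column is constant. The helper names pearson and most_similar_pair are hypothetical and not part of the repository.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two aligned, non-constant columns.

    Rows where either value is None are skipped.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


def most_similar_pair(columns):
    """Return the pair of column names with the largest absolute correlation."""
    return max(
        combinations(columns, 2),
        key=lambda pair: abs(pearson(columns[pair[0]], columns[pair[1]])),
    )


# Toy data: "b" is an exact linear function of "a"; "c" is arbitrary
columns = {
    "a": [1.0, 2.0, 3.0, 4.0, None],
    "b": [-2.0, -4.0, -6.0, -8.0, -10.0],
    "c": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(most_similar_pair(columns))
```

A negative correlation near -1 counts as "similar" here too, since one feature then predicts the other just as well; that is why the absolute value is compared.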
Use the logreg_train.py script to train the model and generate the weights.
python3 logreg_train.py dataset_train.csv
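Training a logistic regression model typically means running gradient descent on the log-loss. The sketch below is not the repository's implementation: it assumes the features are already cleaned (no missing values) and scaled, uses a one-vs-all scheme with one binary classifier per label, and the learning rate, iteration count, and function names (train_binary, train_one_vs_all) are illustrative choices. It returns the weights as a dictionary instead of writing weights.csv, whose format this README does not describe.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, lr=0.1, iterations=1000):
    """Batch gradient descent on the log-loss for one binary classifier.

    X is a list of feature rows and y a list of 0/1 targets. The bias is
    handled by prepending a constant 1.0 to every row, so theta[0] is the bias.
    """
    rows = [[1.0] + list(row) for row in X]
    m, n = len(rows), len(rows[0])
    theta = [0.0] * n
    for _ in range(iterations):
        grad = [0.0] * n
        for row, target in zip(rows, y):
            error = sigmoid(sum(t * x for t, x in zip(theta, row))) - target
            for j in range(n):
                grad[j] += error * row[j]
        # theta_j <- theta_j - lr * (1/m) * sum_i (h(x_i) - y_i) * x_ij
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


def train_one_vs_all(X, labels, lr=0.1, iterations=1000):
    """Train one binary classifier per label: that label (1) versus all others (0)."""
    return {
        label: train_binary(X, [1 if other == label else 0 for other in labels], lr, iterations)
        for label in sorted(set(labels))
    }


# Toy data (hypothetical): each label occupies its own region of the plane
X = [[-2.0, 0.0], [-1.5, 0.5], [2.0, 0.0], [1.5, 0.5], [0.0, 2.0], [0.5, 2.5]]
labels = ["A", "A", "B", "B", "C", "C"]
weights = train_one_vs_all(X, labels, lr=0.5, iterations=1000)
print(weights)
```

With real data, each feature would usually be standardized first (subtract its mean, divide by its standard deviation) so that a single learning rate suits every feature.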
Use the logreg_predict.py script to make predictions using the weights generated by logreg_train.py.
python3 logreg_predict.py dataset_test.csv weights.csv
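With one-vs-all weights, prediction amounts to scoring each example with every class's classifier and keeping the class with the highest probability. The sketch below assumes the weights are held in a dictionary mapping each label to [bias, w1, w2, ...]; the actual layout of weights.csv is not described in this README, and the function name predict is hypothetical. The weights in the example are written by hand for illustration and were not produced by logreg_train.py.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, X):
    """Return the most probable label for each feature row in X.

    weights maps each label to [bias, w1, ..., wn]; a row's score for a label
    is sigmoid(bias + w . x), and the label with the highest score wins.
    """
    predictions = []
    for row in X:
        scores = {
            label: sigmoid(theta[0] + sum(w * x for w, x in zip(theta[1:], row)))
            for label, theta in weights.items()
        }
        predictions.append(max(scores, key=scores.get))
    return predictions


# Hand-written weights for illustration only
weights = {
    "A": [0.0, -2.0, 0.0],  # favors negative values of the first feature
    "B": [0.0, 2.0, 0.0],   # favors positive values of the first feature
    "C": [-1.0, 0.0, 2.0],  # favors large values of the second feature
}
print(predict(weights, [[-1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]))  # → ['A', 'B', 'C']
```

Taking the maximum over sigmoid scores gives the same answer as taking it over the raw linear scores, because the sigmoid is monotonic.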
The purpose of this project is educational. You are welcome to clone, modify, and use it for your own learning purposes.