Root /fldr
- Alpha&Omega.ipynb <---- Main Jupyter notebook (name left unchanged)
- Assignement - Data Scientist (1).docx <---- Assessment problem document
- Testing_predicitons.csv <---- Predicted target-class scores for the test data
- README.md <---- this very file
- XGBFTW.sav <---- XGBoost model exported with pickle
- requirements.txt <---- Full environment snapshot (pip freeze)
- essentials_only_req.txt <---- Requirements specific to the notebook
- Data /fldr
- Training_set.csv <---- Training Dataset
- Test_set.csv <---- Testing Dataset
- Train Dataset Shape -> (3910,58)
- Test Dataset Shape -> (691,57)
- The dataset is sparse and high-dimensional.
- Features are highly skewed.
- Used a RandomForestClassifier for feature selection.
- Selected the top 30 features by feature importance.
- Metrics considered: Binary Cross-Entropy (LogLoss) and ROC-AUC.
- Model of choice: XGBoost.
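A minimal sketch of the feature-selection step described above: rank features by RandomForest importance and keep the top 30. The data here is synthetic stand-in data (not the actual Training_set.csv schema), so column counts and hyperparameters are illustrative assumptions.

```python
# Hypothetical sketch: RandomForest-based feature selection, top 30 features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the real sparse, high-dimensional training data (57 features).
X, y = make_classification(n_samples=500, n_features=57, n_informative=10,
                           random_state=42)

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

# Indices of the 30 most important features, highest importance first.
top30 = np.argsort(rf.feature_importances_)[::-1][:30]
X_selected = X[:, top30]
print(X_selected.shape)  # reduced feature matrix
```

The same column indices would then be applied to the test set so that training and inference use identical features.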
Notebook workflow:
- EDA
- Splitting the data
- Feature Selection
- Data Scaling - Normalization
- Model Training
- Prediction Metrics
- Processing and Predicting on Test Data
- Saving Model for Future Usage
- Exporting Y_test Predicted scores
- Generating requirements (has an important note in the notebook; must read!)
Workflow for reusing the saved model:
- Splitting the Data
- Feature Selection
- Importing Presaved model
- Using presaved model to generate scores
- Using Prettytable to print output table
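The reuse workflow above (import the pre-saved model, generate scores, print a table) might look like the sketch below. To keep the snippet self-contained, a small stand-in model is pickled inline; in the repo the saved model is the trained XGBoost classifier in XGBFTW.sav, and the table is printed with PrettyTable as noted above.

```python
# Sketch of the reuse workflow: load a pickled model and tabulate its scores.
import pickle
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Setup: create and pickle a stand-in model (the repo stores the trained
# XGBoost model in XGBFTW.sav instead).
X, y = make_classification(n_samples=200, n_features=30, random_state=1)
with open("XGBFTW.sav", "wb") as f:
    pickle.dump(LogisticRegression(max_iter=1000).fit(X, y), f)

# Reuse: import the pre-saved model and generate scores.
with open("XGBFTW.sav", "rb") as f:
    model = pickle.load(f)
scores = model.predict_proba(X[:5])[:, 1]

# Print an output table; fall back to plain text if prettytable is absent.
try:
    from prettytable import PrettyTable
    table = PrettyTable(["Row", "Predicted score"])
    for i, s in enumerate(scores):
        table.add_row([i, round(float(s), 4)])
    print(table)
except ImportError:
    for i, s in enumerate(scores):
        print(i, round(float(s), 4))
```

Note that unpickling requires the same library versions that produced the file, which is why requirements.txt is part of the repo.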