raphaelmansuy/machine-learning-feature-selection

Machine Learning Feature Selection — SOTA EDA & Cleaning

A didactic, Colab-ready tutorial that demonstrates state-of-the-art (SOTA) exploratory data analysis (EDA), cleaning, and feature selection for tabular datasets using Polars, Sweetviz, Pyjanitor, and XGBoost.

This repository contains:

  • tutorial_eda_feature_selection.ipynb — A progressive Google Colab notebook that walks you through dataset download, EDA, cleaning, and feature selection with explanations and runnable cells.

Why this repo:

  • Designed for Kaggle-style tabular problems (datasets of roughly 100 MB to 5 GB)
  • Uses fast, modern tooling: polars for performance, sweetviz for automated EDA, pyjanitor for clean transforms, and xgboost for embedded feature selection.
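As a rough illustration of what "embedded feature selection" with xgboost can look like, the sketch below ranks features by the importance scores of a fitted model and keeps the top ones until they cover a chosen share of the total importance. This is not taken from the notebook: the function names, the `keep_share` cutoff, and the model hyperparameters are all assumptions made for the example. xgboost is imported lazily so the selection helper can be tried without it.

```python
from typing import Mapping, Sequence


def select_by_cumulative_importance(
    importances: Mapping[str, float], keep_share: float = 0.95
) -> list[str]:
    """Keep the highest-scoring features until they cover `keep_share` of total importance.

    `keep_share` is a hypothetical cutoff chosen for illustration.
    """
    total = sum(importances.values())
    if total <= 0:
        return []
    # Rank features from most to least important.
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected: list[str] = []
    running = 0.0
    for name, score in ranked:
        if running >= keep_share * total:
            break
        selected.append(name)
        running += score
    return selected


def xgboost_gain_importances(X, y, feature_names: Sequence[str]) -> dict[str, float]:
    """Fit a small XGBoost classifier and return gain-based importances per feature.

    Hyperparameters are placeholders, not the notebook's settings.
    """
    import xgboost as xgb  # imported lazily; only needed for this step

    model = xgb.XGBClassifier(n_estimators=200, max_depth=4, importance_type="gain")
    model.fit(X, y)
    return {name: float(score) for name, score in zip(feature_names, model.feature_importances_)}
```

A possible usage, assuming `X` is a numeric feature matrix and `y` a label vector: `select_by_cumulative_importance(xgboost_gain_importances(X, y, names), keep_share=0.9)`. A cumulative cutoff is only one option; a fixed top-k or a minimum-importance threshold would fit the same structure.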

Quick Start (local or Colab)

  1. Open the notebook in VS Code or upload it to Google Colab. The notebook includes an "Open in Colab" badge.
  2. Run the first cell to install dependencies, then run cells sequentially.

Dependencies

The notebook relies on the tools named above (polars, sweetviz, pyjanitor, xgboost). In Colab, the notebook's first cell installs them.
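For a local setup, a quick way to see which packages still need installing is to check whether each one is importable. The sketch below assumes the dependency set is the four tools this README names (polars, sweetviz, pyjanitor, xgboost); the notebook's install cell may list more, and no versions are implied. The helper name `missing_packages` is hypothetical. Note that pyjanitor is imported as `janitor`.

```python
from importlib.util import find_spec

# pip package name -> import name (assumed list, based on the tools named in this README)
REQUIRED = {
    "polars": "polars",
    "sweetviz": "sweetviz",
    "pyjanitor": "janitor",  # the pyjanitor package is imported as `janitor`
    "xgboost": "xgboost",
}


def missing_packages(required: dict[str, str] = REQUIRED) -> list[str]:
    """Return the pip names of packages whose import module cannot be found."""
    return [pip_name for pip_name, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All listed tutorial dependencies are importable.")
```

This only checks that the modules can be located; it does not verify versions or that the imports succeed at runtime.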

Contributing

  • Suggest improvements via issues or PRs.

License

This repository is provided as-is for educational purposes. (Add your preferred license.)


About

See ABOUT.md for project goals, author, and contact details.
