Predict California Housing Prices Using Two AutoML Tools
This repository contains two tutorials that show how to use two state-of-the-art AutoML methods, AutoGluon and TabPFN, on the California Housing dataset.
AutoML (Automated Machine Learning) automates the process of training machine learning models: model selection, hyperparameter tuning, ensembling, and evaluation. This makes it faster and easier to build high-performing models without hand-coding each step.
We'll use the California Housing dataset, a regression problem.
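As a rough sketch of what loading the data can look like, the snippet below uses scikit-learn's copy of the dataset. That choice is an assumption: the notebooks may load the data differently. The helper names `load_california_housing` and `to_dollars` are hypothetical and exist only for this illustration.

```python
# Assumption: scikit-learn's copy of the dataset; the notebooks may use another source.
TARGET_COLUMN = "MedHouseVal"  # median house value per block group, in units of $100,000


def to_dollars(med_house_val):
    """Convert a target value from units of $100,000 to dollars."""
    return med_house_val * 100_000


def load_california_housing():
    """Return features and target together as a single pandas DataFrame."""
    # Imported lazily so the helper above works without scikit-learn installed.
    from sklearn.datasets import fetch_california_housing  # downloads on first call

    return fetch_california_housing(as_frame=True).frame
```

In a notebook cell, `df = load_california_housing()` should give a frame with eight numeric feature columns plus `MedHouseVal` (20,640 rows in scikit-learn's copy), and `to_dollars(df[TARGET_COLUMN]).describe()` summarises the target in dollars.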
A hands-on guide using AutoGluon, an open-source AutoML toolkit by Amazon.
- Open this Google Colab notebook: Tutorial Link
- Go to File → Save a copy in Drive
- Click Connect (upper right corner) and start running cells
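For orientation, here is a minimal sketch of a typical AutoGluon regression run, not a copy of the notebook. It assumes scikit-learn's copy of the dataset (target column `MedHouseVal`), an 80/20 split, and a 120-second time budget; the function name `run_autogluon_demo` and the `rmse` helper are hypothetical.

```python
import math

LABEL = "MedHouseVal"  # assumption: target column name in scikit-learn's copy


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    pairs = list(zip(y_true, y_pred))
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))


def run_autogluon_demo(time_limit=120):
    """Fit AutoGluon on a train split and report RMSE on a held-out split."""
    # Heavy imports stay inside the function so `rmse` works without them.
    from autogluon.tabular import TabularPredictor
    from sklearn.datasets import fetch_california_housing
    from sklearn.model_selection import train_test_split

    frame = fetch_california_housing(as_frame=True).frame
    train_df, test_df = train_test_split(frame, test_size=0.2, random_state=0)

    # AutoGluon trains and ensembles several models within the time budget.
    predictor = TabularPredictor(
        label=LABEL,
        problem_type="regression",
        eval_metric="root_mean_squared_error",
    ).fit(train_df, time_limit=time_limit)

    preds = predictor.predict(test_df.drop(columns=[LABEL]))
    print("Test RMSE:", rmse(test_df[LABEL], preds))
    print(predictor.leaderboard(test_df))  # per-model scores on the test split
    return predictor
```

Calling `run_autogluon_demo()` in a Colab cell (after `pip install autogluon`) runs the whole pipeline; raising `time_limit` generally gives AutoGluon room to train more models.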
Try out TabPFN, a fast transformer-based model trained to approximate Bayesian posteriors.
- Open this Google Colab notebook: Tutorial Link
- Go to File → Save a copy in Drive
- Click Connect to run
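The sketch below shows one way a TabPFN regression run might look; it is not taken from the notebook. It assumes a `tabpfn` release that exposes `TabPFNRegressor` with a scikit-learn style `fit`/`predict` interface, and scikit-learn's copy of the dataset. The training set is subsampled because TabPFN is aimed at smaller tables; the cap of 1,000 rows is an illustrative assumption, and `subsample_indices` and `run_tabpfn_demo` are hypothetical names.

```python
import random


def subsample_indices(n_rows, max_rows, seed=0):
    """Return sorted row positions, capped at max_rows, sampled without replacement."""
    if n_rows <= max_rows:
        return list(range(n_rows))
    return sorted(random.Random(seed).sample(range(n_rows), max_rows))


def run_tabpfn_demo(max_train_rows=1000):
    """Fit TabPFN on a subsampled train split and report RMSE on a held-out split."""
    # Heavy imports stay inside the function so the helper above works without them.
    from sklearn.datasets import fetch_california_housing
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split
    from tabpfn import TabPFNRegressor

    X, y = fetch_california_housing(return_X_y=True, as_frame=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    # Keep the training set small (assumed cap, adjust to your hardware).
    keep = subsample_indices(len(X_train), max_train_rows)
    X_small, y_small = X_train.iloc[keep], y_train.iloc[keep]

    model = TabPFNRegressor()
    model.fit(X_small, y_small)  # no hyperparameter search in this sketch
    preds = model.predict(X_test)

    print("Test RMSE:", mean_squared_error(y_test, preds) ** 0.5)
    return model
```

Calling `run_tabpfn_demo()` in a Colab cell (after `pip install tabpfn`, ideally with a GPU runtime) runs the sketch end to end. Depending on the installed version, TabPFN may download model weights on first use.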
"Although TabPFN provides a powerful drop-in replacement for traditional tabular data models such as CatBoost, similar to these models, it is intended to be only one component in the toolkit of a data scientist. Achieving top performance on real-world problems often requires domain expertise and the ingenuity of data scientists. As with other modeling approaches, data scientists should continue to apply their skills and insights in feature engineering, data cleaning and problem framing to get the most out of TabPFN. We hope that the training speed of TabPFN will facilitate faster iterations in the data science workflow."
- AutoGluon Docs
- TabPFN Research Paper
- ML Models Cheat Sheet (PDF)
Both tutorials run in Google Colab, so no local installation is needed.
To run locally:
pip install autogluon
pip install tabpfn

| Tool | Task | Highlights |
|---|---|---|
| AutoGluon | Regression | Fast, interpretable, ensemble-based |
| TabPFN | Regression | Fast transformer, few-shot learning |
Miscellaneous
Script: https://colab.research.google.com/drive/1cFR4n7N0WxUI2vtQiEzjbEzbgULWXmfI?usp=sharing
AutoGluon: https://auto.gluon.ai/
Part B: California Housing dataset and TabPFN
Script: https://colab.research.google.com/drive/1Mbkvk2egLQkVN6qCimvhIDCSyrX1OWxj?usp=sharing