Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -158,3 +158,4 @@ tests/.datasets/
test.py
lightning_logs/
docs/tutorials/examples/basic/
docs/tutorials/pytorch-tabular-covertype/
7 changes: 3 additions & 4 deletions README.md
@@ -76,13 +76,13 @@ For complete Documentation with tutorials visit [ReadTheDocs](https://pytorch-ta
- FT Transformer from [Revisiting Deep Learning Models for Tabular Data](https://arxiv.org/abs/2106.11959)
- [Gated Additive Tree Ensemble](https://arxiv.org/abs/2207.08548v3) is a novel high-performance, parameter and computationally efficient deep learning architecture for tabular data. GATE uses a gating mechanism, inspired by GRU, as a feature representation learning unit with an in-built feature selection mechanism. We combine it with an ensemble of differentiable, non-linear decision trees, re-weighted with simple self-attention to predict our desired output.
- [Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)](https://arxiv.org/abs/2207.08548) is a pared-down version of GATE which is more efficient and performant than GATE. GANDALF makes GFLUs the main learning unit, also introducing some speed-ups in the process. With very few hyperparameters to tune, this becomes an easy model to use and tune.

- [DANETs: Deep Abstract Networks for Tabular Data Classification and Regression](https://arxiv.org/pdf/2112.02962v4.pdf) is a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks.

**Semi-Supervised Learning**

- [Denoising AutoEncoder](https://www.kaggle.com/code/faisalalsrheed/denoising-autoencoders-dae-for-tabular-data) is an autoencoder which learns robust feature representations to compensate for any noise in the dataset.

## Implement Custom Models
To implement new models, see the [How to implement new models tutorial](https://github.com/manujosephv/pytorch_tabular/blob/main/docs/tutorials/04-Implementing%20New%20Architectures.ipynb). It covers basic as well as advanced architectures.

## Usage
@@ -140,11 +140,10 @@ loaded_model = TabularModel.load_model("examples/basic")
## Future Roadmap(Contributions are Welcome)

1. Integrate Optuna Hyperparameter Tuning
1. Integrate Captum for interpretability
1. Have a scikit-learn compatible API
1. Migrate Datamodule to Polars or NVTabular for faster data loading and to handle larger than RAM datasets.
1. Add GaussRank as Feature Transformation
1. Have a scikit-learn compatible API
1. Enable support for multi-label classification
1. Migrate Datamodule to Polars or Vaex for faster data loading and to handle larger than RAM datasets.
1. Keep adding more architectures

## Contributors
28 changes: 28 additions & 0 deletions docs/gs_cite.md
@@ -0,0 +1,28 @@
If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:

- [arxiv Paper](https://arxiv.org/abs/2104.13638)

```
@misc{joseph2021pytorch,
title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
author={Manu Joseph},
year={2021},
eprint={2104.13638},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

- Zenodo Software Citation

```
@article{manujosephv_2021,
title={manujosephv/pytorch_tabular: v0.5.0-alpha},
DOI={10.5281/zenodo.4732773},
abstractNote={<p>First Alpha Release</p>},
publisher={Zenodo},
author={manujosephv},
year={2021},
month={May}
}
```
43 changes: 43 additions & 0 deletions docs/gs_installation.md
@@ -0,0 +1,43 @@
!!! note

    Although the installation includes PyTorch, the recommended approach is to first install PyTorch from [here](https://pytorch.org/get-started/locally/), picking the right CUDA version for your machine (PyTorch version >1.3).

Once you have PyTorch installed and working, run:

```bash
pip install pytorch_tabular[extra]
```

to install the complete library with extra dependencies:

- Weights&Biases for experiment tracking
- Plotly for some visualization
- Captum for Interpretability

And:

``` bash
pip install pytorch_tabular
```

for the bare essentials.
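
To confirm which of the two install variants ended up in an environment, a small check like the one below can help. It is only a sketch: the import names `wandb`, `plotly`, and `captum` are assumed to correspond to the Weights&Biases, Plotly, and Captum extras listed above, and `installation_report` is a hypothetical helper, not part of PyTorch Tabular.

```python
from importlib.util import find_spec

# Assumed import names for the optional extras (not taken from the library itself).
OPTIONAL_EXTRAS = {
    "Weights & Biases": "wandb",
    "Plotly": "plotly",
    "Captum": "captum",
}


def installation_report(core="pytorch_tabular", extras=OPTIONAL_EXTRAS):
    """Return a dict mapping each package label to True if it can be imported."""
    # find_spec returns None for a top-level module that is not installed.
    report = {core: find_spec(core) is not None}
    for label, module_name in extras.items():
        report[label] = find_spec(module_name) is not None
    return report


if __name__ == "__main__":
    for name, available in installation_report().items():
        print(f"{name}: {'installed' if available else 'missing'}")
```

If any extra is reported as missing after a bare install, re-running the `[extra]` variant of the install command should pull it in.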

The sources for `pytorch_tabular` can be downloaded from the GitHub repo.

You can clone the public repository:

``` bash
git clone https://github.com/manujosephv/pytorch_tabular
```

Once you have a copy of the source, you can install it with:

``` bash
pip install .
```

or

``` bash
python setup.py install
```
48 changes: 48 additions & 0 deletions docs/gs_usage.md
@@ -0,0 +1,48 @@
PyTorch Tabular comes with intelligent defaults that make it easy to get started with tabular deep learning. However, it also provides the flexibility to customize the model and pipeline to suit your needs.

Here is a simple example of how to use PyTorch Tabular to train a model, evaluate on new data, generate predictions, and save and load the model.

```python
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)

data_config = DataConfig(
    target=[
        "target"
    ],  # target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between each layer
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_model("examples/basic")
```

For more detailed tutorials and how-to guides, refer to the **Tutorials** and **How-To Guides** sections.
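
The usage example relies on `train`, `val`, `test`, `num_col_names`, and `cat_col_names` already existing. One possible way to produce them with pandas is sketched below. The toy columns (`age`, `income`, `city`), the 70/15/15 split, and the dtype-based column detection are illustrative assumptions, not something the library prescribes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset; replace it with your own DataFrame.
rng = np.random.default_rng(42)
n_rows = 1000
df = pd.DataFrame(
    {
        "age": rng.integers(18, 80, size=n_rows),
        "income": rng.normal(50_000, 15_000, size=n_rows),
        "city": rng.choice(["A", "B", "C"], size=n_rows),
        "target": rng.integers(0, 2, size=n_rows),
    }
)

# Treat numeric feature columns as continuous and everything else as categorical.
target_col = "target"
features = df.drop(columns=[target_col])
num_col_names = features.select_dtypes(include="number").columns.tolist()
cat_col_names = features.select_dtypes(exclude="number").columns.tolist()

# Shuffle once, then slice into 70% train, 15% validation, 15% test.
shuffled = df.sample(frac=1.0, random_state=42).reset_index(drop=True)
n_train = n_rows * 70 // 100
n_val = n_rows * 15 // 100
train = shuffled.iloc[:n_train]
val = shuffled.iloc[n_train : n_train + n_val]
test = shuffled.iloc[n_train + n_val :]
```

Integer-coded categories would be picked up as continuous by this dtype heuristic, so for real data it is usually safer to list the categorical columns explicitly.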
Binary file added docs/imgs/diataxis.webp
Binary file added docs/imgs/gflu_v2.png
Binary file added docs/imgs/pytorch_tabular_logo_inv.png
126 changes: 13 additions & 113 deletions docs/index.md
@@ -1,4 +1,5 @@
![PyTorch Tabular](imgs/pytorch_tabular_logo.png)
![PyTorch Tabular](imgs/pytorch_tabular_logo.png#only-light)
![PyTorch Tabular](imgs/pytorch_tabular_logo_inv.png#only-dark)

[![pypi](https://img.shields.io/pypi/v/pytorch_tabular.svg)](https://pypi.python.org/pypi/pytorch_tabular)
[![Testing](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml/badge.svg?event=push)](https://github.com/manujosephv/pytorch_tabular/actions/workflows/testing.yml)
@@ -8,126 +9,25 @@
[![DOI](https://zenodo.org/badge/321584367.svg)](https://zenodo.org/badge/latestdoi/321584367)
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat-square)](https://github.com/manujosephv/pytorch_tabular/issues)

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

- **Low Resistance Usability**
- **Easy Customization**
- **Scalable and Easier to Deploy**
**PyTorch Tabular** is a powerful library that aims to simplify and popularize the application of deep learning techniques to tabular data. Tabular deep learning has gained significant importance in the field of machine learning due to its ability to handle structured data, such as data in spreadsheets or databases. However, working with tabular data can be challenging, requiring expertise in both deep learning and data preprocessing.

It has been built on the shoulders of giants like [**PyTorch**](https://pytorch.org/)(obviously), [**PyTorch Lightning**](https://www.pytorchlightning.ai/), and [pandas](https://pandas.pydata.org/)
This is where **PyTorch Tabular** comes in. Built on the shoulders of giants like `PyTorch`, `PyTorch Lightning`, and `pandas`, PyTorch Tabular offers **low-resistance usability**, making it accessible to both real-world use cases and research projects. The library's core principles revolve around **easy customization**, allowing users to tailor their models and pipelines to specific requirements. Moreover, PyTorch Tabular provides **scalable and efficient tooling**, making it easier to deploy models in production environments. The underlying goodness of `PyTorch` makes designing deep learning architectures Pythonic and intuitive, while `PyTorch Lightning` simplifies the training process. `pandas` is the de facto standard for working with tabular data, and PyTorch Tabular leverages its strengths to simplify the preprocessing of tabular data. With PyTorch Tabular, data scientists and researchers can focus on the core aspects of their work, while the library takes care of the underlying complexities, enabling efficient and effective tabular deep learning.

## Installation
The documentation is organized taking inspiration from the Diátaxis system of documentation.

Although the installation includes PyTorch, the best and recommended way is to first install PyTorch from [here](https://pytorch.org/get-started/locally/), picking up the right CUDA version for your machine. (PyTorch Version >1.3)
> Diátaxis is a way of thinking about and doing documentation. Diátaxis identifies four distinct needs, and four corresponding forms of documentation - tutorials, how-to guides, technical reference and explanation. It places them in a systematic relationship, and proposes that documentation should itself be organised around the structures of those needs. Diátaxis solves problems related to documentation content (what to write), style (how to write it) and architecture (how to organise it). It is a system for thinking about documentation, and a system for doing documentation. - [Diátaxis](https://diataxis.fr/)

Once, you have got Pytorch installed, just use:
![Diátaxis System of Documentation](imgs/diataxis.webp)

```bash
pip install pytorch_tabular[extra]
```
Taking cues from the system, the documentation is separated into five sections:

to install the complete library with extra dependencies(Weights&Biases and Plotly).
- **Getting Started** - A quick introduction on how to install and get started with PyTorch Tabular.

And :
- **Tutorials** - Short and focused exercises to get you going quickly.

```bash
pip install pytorch_tabular
```
- **How-to Guides** - Step-by-step guides covering key tasks, real-world operations, and common problems.

for the bare essentials.
- **Concepts** - Explanations of some of the larger concepts and intricacies of the library.

The sources for pytorch_tabular can be downloaded from the `Github repo`.

You can either clone the public repository:

```bash
git clone git://github.com/manujosephv/pytorch_tabular
```

Once you have a copy of the source, you can install it with:

```bash
pip install .
```

or

```bash
python setup.py install
```

## Usage

```python
from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)

data_config = DataConfig(
    target=[
        "target"
    ],  # target should always be a list. Multi-targets are only supported for regression. Multi-Task Classification is not implemented
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    auto_lr_find=True,  # Runs the LRFinder to automatically derive a learning rate
    batch_size=1024,
    max_epochs=100,
)
optimizer_config = OptimizerConfig()

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # Number of nodes in each layer
    activation="LeakyReLU",  # Activation between each layers
    learning_rate=1e-3,
)

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
tabular_model.fit(train=train, validation=val)
result = tabular_model.evaluate(test)
pred_df = tabular_model.predict(test)
tabular_model.save_model("examples/basic")
loaded_model = TabularModel.load_model("examples/basic")
```

## Citation

If you use PyTorch Tabular for a scientific publication, we would appreciate citations to the published software and the following paper:

- [arxiv Paper](https://arxiv.org/abs/2104.13638)

```
@misc{joseph2021pytorch,
title={PyTorch Tabular: A Framework for Deep Learning with Tabular Data},
author={Manu Joseph},
year={2021},
eprint={2104.13638},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```

- Zenodo Software Citation

```
@article{manujosephv_2021,
title={manujosephv/pytorch_tabular: v0.5.0-alpha},
DOI={10.5281/zenodo.4732773},
abstractNote={<p>First Alpha Release</p>},
publisher={Zenodo},
author={manujosephv},
year={2021},
month={May}
}
```
- **API Reference** - The technical details of the library: all classes and functions, along with their parameters and return types.