2 changes: 1 addition & 1 deletion README.md
@@ -12,7 +12,7 @@

PyTorch Tabular aims to make Deep Learning with Tabular data easy and accessible to real-world cases and research alike. The core principles behind the design of the library are:

- Low Resistance Useability
- Low Resistance Usability
- Easy Customization
- Scalable and Easier to Deploy

42 changes: 21 additions & 21 deletions docs/models.md
@@ -27,7 +27,7 @@ While there are separate config classes for each model, all of them share a few

- `learning_rate`: float: The learning rate of the model. Defaults to 1e-3.

- `loss`: Optional\[str\]: The loss function to be applied. By Default it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification
- `loss`: Optional\[str\]: The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

- `metrics`: Optional\[List\[str\]\]: The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is `accuracy` if classification and `mean_squared_error` for regression
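
These shared options are set directly on whichever model config you choose. A minimal sketch, assuming a regression task (the values shown are illustrative, not tuned recommendations):

```python
from pytorch_tabular.models import CategoryEmbeddingModelConfig

# Every model config accepts the common arguments described above.
model_config = CategoryEmbeddingModelConfig(
    task="regression",               # "regression" or "classification"
    learning_rate=1e-3,              # the default learning rate
    loss="L1Loss",                   # any loss from torch.nn; MSELoss is the regression default
    metrics=["mean_squared_error"],  # functional metrics from torchmetrics
)
```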

@@ -55,13 +55,13 @@ That's it, Thats the most basic necessity. All the rest is intelligently inferre

The Adam optimizer and a `learning_rate` of 1e-3 are defaults set in PyTorch Tabular. They are a rule of thumb that works in most cases and a good starting point which has worked well empirically. If you want to change the learning rate (which is a pretty important hyperparameter), this is where you should do it. There is also an automatic way to derive a good learning rate, which we will talk about in the TrainerConfig. In that case, PyTorch Tabular will ignore the learning rate set through this parameter.
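
A minimal sketch of switching on that automatic learning-rate search in the trainer configuration (assuming the `auto_lr_find` flag; the other values are illustrative):

```python
from pytorch_tabular.config import TrainerConfig

# With auto_lr_find enabled, the learning rate in the model config is ignored
# and the rate suggested by the LR finder is used instead.
trainer_config = TrainerConfig(
    auto_lr_find=True,  # run the learning-rate finder before training
    batch_size=1024,
    max_epochs=100,
)
```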

Another key component of the model is the `loss`. Pytorch Tabular can use any loss function from standard PyTorch([`torch.nn`](https://pytorch.org/docs/stable/nn.html#loss-functions)) through this config. By default it is set to `MSELoss` for regression and `CrossEntropyLoss` for classification, which works well for those use cases and are the most popular loss functions used. If you want to use something else specficaly, like `L1Loss`, you just need to mention it in the `loss` parameter
Another key component of the model is the `loss`. PyTorch Tabular can use any loss function from standard PyTorch ([`torch.nn`](https://pytorch.org/docs/stable/nn.html#loss-functions)) through this config. By default, it is set to `MSELoss` for regression and `CrossEntropyLoss` for classification, which work well for those use cases and are the most popular loss functions used. If you want to use something else specifically, like `L1Loss`, you just need to mention it in the `loss` parameter

```python
loss = "L1Loss"
```

PyTorch Tabular also accepts custom loss functions(which are drop in replacements for the standard loss functions) through the `fit` method in the `TabularModel`.
PyTorch Tabular also accepts custom loss functions (which are drop in replacements for the standard loss functions) through the `fit` method in the `TabularModel`.
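
A sketch of passing such a custom loss at fit time (this assumes an already-configured `TabularModel` named `tabular_model` and training/validation DataFrames; the exact `fit` signature should be checked against the API docs):

```python
import torch

# Any torch.nn loss module can act as a drop-in replacement here.
tabular_model.fit(
    train=train_df,
    validation=val_df,
    loss=torch.nn.SmoothL1Loss(),  # custom loss passed through `fit`
)
```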

!!! warning

@@ -113,7 +113,7 @@ All the parameters have intelligent default values. Let's look at few of them:
- `use_batch_norm`: bool: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to `False`
- `dropout`: float: The probability of the element to be zeroed. This applies to all the linear layers. Defaults to `0.0`

**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.CategoryEmbeddingModelConfig][]
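
A sketch of a typical configuration using the parameters above together with the backbone options (`layers` and `activation` are further options of this config; check the API docs for the full list; values are illustrative):

```python
from pytorch_tabular.models import CategoryEmbeddingModelConfig

model_config = CategoryEmbeddingModelConfig(
    task="classification",
    layers="1024-512-512",  # hidden layer sizes of the MLP backbone
    activation="ReLU",      # activation between the linear layers
    use_batch_norm=True,    # BatchNorm after each Linear Layer+Dropout
    dropout=0.1,            # dropout probability for all linear layers
)
```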

### Gated Adaptive Network for Deep Automated Learning of Features (GANDALF)
@@ -141,7 +141,7 @@ All the parameters have beet set to recommended values from the paper. Let's loo
GANDALF can be considered a lighter and more performant version of the Gated Additive Tree Ensemble (GATE). For most purposes, GANDALF is a better choice than GATE.


**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.GANDALFConfig][]
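
A minimal sketch of using this config (only `task` is mandatory; `gflu_stages`, the number of feature-abstraction stages, is an assumed parameter name — check the API docs):

```python
from pytorch_tabular.models import GANDALFConfig

# Defaults follow the recommended values from the paper.
model_config = GANDALFConfig(
    task="classification",
    gflu_stages=6,  # number of GFLU stages (assumed name and default)
)
```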


@@ -165,14 +165,14 @@ All the parameters have beet set to recommended values from the paper. Let's loo

- `share_head_weights`: bool: If True, we will share the weights between the heads. Defaults to True

**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.GatedAdditiveTreeEnsembleConfig][]
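
A sketch with the head-weight sharing discussed above (`num_trees` and `tree_depth` are assumed parameter names for the tree hyperparameters — check the API docs; values are illustrative):

```python
from pytorch_tabular.models import GatedAdditiveTreeEnsembleConfig

model_config = GatedAdditiveTreeEnsembleConfig(
    task="regression",
    num_trees=20,             # number of trees in the ensemble (assumed name)
    tree_depth=5,             # depth of each tree (assumed name)
    share_head_weights=True,  # share weights between the heads
)
```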

### Neural Oblivious Decision Ensembles (NODE)

[Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data](https://arxiv.org/abs/1909.06312) is a model presented in ICLR 2020 and according to the authors have beaten well-tuned Gradient Boosting models on many datasets. It uses a Neural equivalent of Oblivious Trees(the kind of trees Catboost uses) as the basic building blocks of the architecture. You can use it by choosing `NodeConfig`.
[Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data](https://arxiv.org/abs/1909.06312) is a model presented in ICLR 2020 and according to the authors have beaten well-tuned Gradient Boosting models on many datasets. It uses a Neural equivalent of Oblivious Trees (the kind of trees Catboost uses) as the basic building blocks of the architecture. You can use it by choosing `NodeConfig`.

The basic block, or a "layer" looks something like below(from the paper)
The basic block, or a "layer" looks something like below (from the paper)

![NODE Architecture](imgs/node_arch.png)

@@ -185,37 +185,37 @@ All the parameters have beet set to recommended values from the paper. Let's loo
- `num_layers`: int: Number of Oblivious Decision Tree Layers in the Dense Architecture. Defaults to `1`
- `num_trees`: int: Number of Oblivious Decision Trees in each layer. Defaults to `2048`
- `depth`: int: The depth of the individual Oblivious Decision Trees. Parameters increase exponentially with the increase in depth. Defaults to `6`
- `choice_function`: str: Generates a sparse probability distribution to be used as feature weights(aka, soft feature selection). Choices are: `entmax15` `sparsemax`. Defaults to `entmax15`
- `bin_function`: str: Generates a sparse probability distribution to be used as tree leaf weights. Choices are: `entmax15` `sparsemax`. Defaults to `entmax15`
- `choice_function`: str: Generates a sparse probability distribution to be used as feature weights (aka, soft feature selection). Choices are: `entmax15` `sparsemax`. Defaults to `entmax15`
- `bin_function`: str: Generates a sparse probability distribution to be used as tree leaf weights. Choices are: `entmoid15` `sparsemoid`. Defaults to `entmoid15`
- `additional_tree_output_dim`: int: The additional output dimensions which are only used to pass through the different layers of the architecture. Only the first `output_dim` outputs will be used for prediction. Defaults to `3`
- `input_dropout`: float: Dropout which is applied to the input to the different layers in the Dense Architecture. The probability of the element to be zeroed. Defaults to `0.0`


**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.NodeConfig][]

!!! note

NODE model has a lot of parameters and therefore takes up a lot of memory. Smaller batchsizes(like 64 or 128) makes the model manageable in a smaller GPU(~4GB).
NODE model has a lot of parameters and therefore takes up a lot of memory. Smaller batch sizes (like 64 or 128) make the model manageable on a smaller GPU (~4GB).
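
A sketch of a configuration that keeps the memory footprint in check (values are illustrative; the batch size itself is set in the `TrainerConfig`):

```python
from pytorch_tabular.config import TrainerConfig
from pytorch_tabular.models import NodeConfig

model_config = NodeConfig(
    task="classification",
    num_layers=1,    # Oblivious Decision Tree layers in the Dense Architecture
    num_trees=1024,  # fewer trees than the default 2048 to save memory
    depth=6,         # depth of each Oblivious Decision Tree
)

# Small batches keep NODE manageable on a ~4GB GPU.
trainer_config = TrainerConfig(batch_size=128)
```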

### TabNet

- [TabNet: Attentive Interpretable Tabular Learning](https://arxiv.org/abs/1908.07442) is another model coming out of Google Research which uses Sparse Attention in multiple steps of decision making to model the output. You can use it by choosing `TabNetModelConfig`.

The architecture is as shown below(from the paper)
The architecture is as shown below (from the paper)

![TabNet Architecture](imgs/tabnet_architecture.png)

All the parameters have been set to recommended values from the paper. Let's look at a few of them:

- `n_d`: int: Dimension of the prediction layer (usually between 4 and 64). Defaults to `8`
- `n_a`: int: Dimension of the attention layer (usually between 4 and 64). Defaults to `8`
- `n_steps`: int: Number of sucessive steps in the newtork (usually betwenn 3 and 10). Defaults to `3`
- `n_steps`: int: Number of successive steps in the network (usually between 3 and 10). Defaults to `3`
- `n_independent`: int: Number of independent GLU layers in each GLU block. Defaults to `2`
- `n_shared`: int: Number of shared GLU layers in each GLU block. Defaults to `2`
- `virtual_batch_size`: int: Batch size for Ghost Batch Normalization. BatchNorm on large batches sometimes does not do very well and therefore Ghost Batch Normalization which does batch normalization in smaller virtual batches is implemented in TabNet. Defaults to `128`

**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.TabNetModelConfig][]
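
A sketch using the parameters above (values are illustrative; `n_d` and `n_a` are usually kept equal):

```python
from pytorch_tabular.models import TabNetModelConfig

model_config = TabNetModelConfig(
    task="classification",
    n_d=16,                  # width of the prediction layer
    n_a=16,                  # width of the attention layer
    n_steps=5,               # number of successive decision steps
    virtual_batch_size=128,  # virtual batch size for Ghost Batch Normalization
)
```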

### Automatic Feature Interaction Learning via Self-Attentive Neural Networks (AutoInt)
@@ -228,9 +228,9 @@ All the parameters have beet set to recommended values from the paper. Let's loo

- `num_heads`: int: The number of heads in the Multi-Headed Attention layer. Defaults to 2

- `num_attn_blocks`: int: The number of layers of stacked Multi-Headed Attention layers. Defaults to 2
- `num_attn_blocks`: int: The number of layers of stacked Multi-Headed Attention layers. Defaults to 3

**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.AutoIntConfig][]
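
A sketch using the two attention hyperparameters listed above (values are illustrative):

```python
from pytorch_tabular.models import AutoIntConfig

model_config = AutoIntConfig(
    task="regression",
    num_heads=2,        # heads in each Multi-Headed Attention layer
    num_attn_blocks=3,  # number of stacked attention layers
)
```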

### DANETs: Deep Abstract Networks for Tabular Data Classification and Regression
@@ -239,18 +239,18 @@ All the parameters have beet set to recommended values from the paper. Let's loo

All the parameters have been set to recommended values from the paper. Let's look at them:

- `n_layers`: int: Number of Blocks in the DANet. Defaults to 16
- `n_layers`: int: Number of Blocks in the DANet. Each block has 2 Abstlay blocks. Defaults to 8

- `abstlay_dim_1`: int: The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32

- `abstlay_dim_2`: int: The dimension for the intermediate output in the second ABSTLAY layer in a Block. Defaults to 64
- `abstlay_dim_2`: int: The dimension for the intermediate output in the second ABSTLAY layer in a Block. If None, it will be twice abstlay_dim_1. Defaults to None

- `k`: int: The number of feature groups in the ABSTLAY layer. Defaults to 5

- `dropout_rate`: float: Dropout to be applied in the Block. Defaults to 0.1


**For a complete list of parameters refer to the API Docs**
[pytorch_tabular.models.DANetConfig][]
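
A sketch using the parameters above, leaving `abstlay_dim_2` unset so it defaults to twice `abstlay_dim_1` (values are illustrative):

```python
from pytorch_tabular.models import DANetConfig

model_config = DANetConfig(
    task="classification",
    n_layers=8,        # number of Blocks in the DANet
    abstlay_dim_1=32,  # intermediate dim of the first ABSTLAY layer in a Block
    k=5,               # number of feature groups in the ABSTLAY layer
    dropout_rate=0.1,  # dropout applied in each Block
)
```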

## Implementing New Architectures
@@ -308,7 +308,7 @@ In addition to the model, you will also need to define a config. Configs are pyt

**Key things to note:**

1. All the different parameters in the different configs(like TrainerConfig, OptimizerConfig, etc) are all available in `config` before calling `super()` and in `self.hparams` after.
1. All the different parameters in the different configs (like TrainerConfig, OptimizerConfig, etc) are all available in `config` before calling `super()` and in `self.hparams` after.
1. The input batch at the `forward` method is a dictionary with keys `continuous` and `categorical`
1. In the `_build_network` method, save every component that you want to access in `forward` to `self`
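
A minimal skeleton illustrating these points (this is a sketch: `MyModelConfig`, `MyModel`, and `hidden_dim` are hypothetical names, and the assumption that `forward` returns a dictionary with a `logits` key should be verified against the `BaseModel` API docs):

```python
from dataclasses import dataclass, field

import torch.nn as nn

from pytorch_tabular.config import ModelConfig
from pytorch_tabular.models import BaseModel


@dataclass
class MyModelConfig(ModelConfig):
    """Configs are python dataclasses inheriting ModelConfig."""

    hidden_dim: int = field(default=64)  # hypothetical model-specific parameter


class MyModel(BaseModel):
    def __init__(self, config, **kwargs):
        # All parameters from the different configs are available in `config`
        # here, before calling super(), and in `self.hparams` afterwards.
        super().__init__(config=config, **kwargs)

    def _build_network(self):
        # Save every component you want to use in `forward` to `self`.
        self.backbone = nn.Sequential(
            nn.Linear(self.hparams.continuous_dim, self.hparams.hidden_dim),
            nn.ReLU(),
            nn.Linear(self.hparams.hidden_dim, self.hparams.output_dim),
        )

    def forward(self, x):
        # The batch is a dictionary with keys `continuous` and `categorical`.
        out = self.backbone(x["continuous"])
        return {"logits": out}  # assumed return contract; verify against BaseModel
```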

26 changes: 13 additions & 13 deletions src/pytorch_tabular/config/config.py
@@ -68,31 +68,31 @@ class DataConfig:
introduction_date and with a monthly frequency like "2023-12" should have
an entry ('intro_date','M','%Y-%m')

encode_date_columns (bool): Whether or not to encode the derived variables from date
encode_date_columns (bool): Whether to encode the derived variables from date

validation_split (Optional[float]): Percentage of Training rows to keep aside as validation. Used
only if Validation Data is not given separately

continuous_feature_transform (Optional[str]): Whether or not to transform the features before
modelling. By default it is turned off.. Choices are: [`None`,`yeo-johnson`,`box-
cox`,`quantile_normal`,`quantile_uniform`].
continuous_feature_transform (Optional[str]): Whether to transform the features before
modelling. By default, it is turned off. Choices are: [`None`,`yeo-johnson`,`box-cox`,
`quantile_normal`,`quantile_uniform`].

normalize_continuous_features (bool): Flag to normalize the input features(continuous)

quantile_noise (int): NOT IMPLEMENTED. If specified fits QuantileTransformer on data with added
gaussian noise with std = :quantile_noise: * data.std ; this will cause discrete values to be more
separable. Please not that this transformation does NOT apply gaussian noise to the resulting
separable. Please note that this transformation does NOT apply gaussian noise to the resulting
data, the noise is only applied for QuantileTransformer

num_workers (Optional[int]): The number of workers used for data loading. For Windows, always set to
0

pin_memory (bool): Whether or not to pin memory for data loading.
pin_memory (bool): Whether to pin memory for data loading.

handle_unknown_categories (bool): Whether or not to handle unknown or new values in categorical
handle_unknown_categories (bool): Whether to handle unknown or new values in categorical
columns as unknown

handle_missing_values (bool): Whether or not to handle missing values in categorical columns as
handle_missing_values (bool): Whether to handle missing values in categorical columns as
unknown
"""

@@ -146,7 +146,7 @@ class DataConfig:
)
normalize_continuous_features: bool = field(
default=True,
metadata={"help": "Flag to normalize the input features(continuous)"},
metadata={"help": "Flag to normalize the input features (continuous)"},
)
quantile_noise: int = field(
default=0,
@@ -264,7 +264,7 @@ class TrainerConfig:
Choices are: [`cpu`,`gpu`,`tpu`,`ipu`,'mps',`auto`].

devices (Optional[int]): Number of devices to train on (int). -1 uses all available devices. By
default uses all available devices (-1)
default, uses all available devices (-1)

devices_list (Optional[List[int]]): List of devices to train on (list). If specified, takes
precedence over `devices` argument. Defaults to None
@@ -563,7 +563,7 @@ class ExperimentConfig:
this defines the folder under which the logs will be saved and for W&B it defines the project name

run_name (Optional[str]): The name of the run; a specific identifier to recognize the run. If left
blank, will be assigned a auto-generated name
blank, will be assigned an auto-generated name

exp_watch (Optional[str]): The level of logging required. Can be `gradients`, `parameters`, `all`
or `None`. Defaults to None. Choices are: [`gradients`,`parameters`,`all`,`None`].
@@ -695,7 +695,7 @@ def __init__(
exp_version_manager: str = ".pt_tmp/exp_version_manager.yml",
) -> None:
"""The manages the versions of the experiments based on the name. It is a simple dictionary(yaml) based lookup.
Primary purpose is to avoid overwriting of saved models while runing the training without changing the
Primary purpose is to avoid overwriting of saved models while running the training without changing the
experiment name.

Args:
Expand Down Expand Up @@ -752,7 +752,7 @@ class ModelConfig:

learning_rate (float): The learning rate of the model. Defaults to 1e-3.

loss (Optional[str]): The loss function to be applied. By Default it is MSELoss for regression and
loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
or L1Loss for regression and CrossEntropyLoss for classification

Expand Down