
sklearn-compatible interface #147

Open
34j opened this issue Oct 24, 2023 · 15 comments

Comments

@34j

34j commented Oct 24, 2023

I think it would be great to have this feature, since sklearn is so widely used for tabular data. I tried skorch, but it does not accept TensorFrame inputs, so it did not work well.

(examples/tutorial.py)

from skorch import NeuralNetClassifier

net = NeuralNetClassifier(module=model, max_epochs=args.epochs, lr=args.lr,
                          device=device, batch_size=args.batch_size,
                          classes=dataset.num_classes, iterator_train=DataLoader,
                          iterator_valid=DataLoader, train_split=None)
net.fit(train_dataset, y=None)
Traceback (most recent call last):
  File "\examples\tutorial.py", line 346, in <module>
    net.fit(train_dataset, y=None)
  File "\site-packages\skorch\classifier.py", line 165, in fit
    return super(NeuralNetClassifier, self).fit(X, y, **fit_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\skorch\net.py", line 1319, in fit
    self.partial_fit(X, y, **fit_params)
  File "\site-packages\skorch\net.py", line 1278, in partial_fit
    self.fit_loop(X, y, **fit_params)
  File "\site-packages\skorch\net.py", line 1190, in fit_loop
    self.run_single_epoch(iterator_train, training=True, prefix="train",
  File "\site-packages\skorch\net.py", line 1226, in run_single_epoch
    step = step_fn(batch, **fit_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\skorch\net.py", line 1105, in train_step
    self._step_optimizer(step_fn)
  File "\site-packages\skorch\net.py", line 1060, in _step_optimizer
    optimizer.step(step_fn)
  File "\site-packages\torch\optim\optimizer.py", line 373, in wrapper
    out = func(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\torch\optim\optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\torch\optim\sgd.py", line 66, in step
    loss = closure()
           ^^^^^^^^^
  File "\site-packages\skorch\net.py", line 1094, in step_fn
    step = self.train_step_single(batch, **fit_params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\skorch\net.py", line 993, in train_step_single
    y_pred = self.infer(Xi, **fit_params)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\skorch\net.py", line 1517, in infer
    x = to_tensor(x, device=self.device)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\skorch\utils.py", line 104, in to_tensor
    return [to_tensor_(x) for x in X]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "\site-packages\skorch\utils.py", line 104, in <listcomp>
    return [to_tensor_(x) for x in X]
            ^^^^^^^^^^^^^
  File "\site-packages\skorch\utils.py", line 118, in to_tensor
    raise TypeError("Cannot convert this data type to a torch tensor.")
TypeError: Cannot convert this data type to a torch tensor.

I think the following changes are needed:

  • Add an ability to convert from DataFrame to TensorFrame without much prior information.
  • Create a wrapper that passes Tensor to skorch or create a scikit-learn compatible estimator specifically for this package.
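To make the second bullet concrete, here is a minimal pure-Python sketch of the estimator protocol such a scikit-learn-compatible wrapper would need to satisfy (`TensorFrameClassifier` and everything inside it are hypothetical; nothing here is from pytorch-frame or skorch):

```python
class TensorFrameClassifier:
    """Hypothetical sketch of an sklearn-style estimator wrapper."""

    def __init__(self, max_epochs=10, lr=1e-3):
        # sklearn convention: __init__ only stores hyperparameters as-is.
        self.max_epochs = max_epochs
        self.lr = lr

    def get_params(self, deep=True):
        # Required so GridSearchCV and clone() can introspect the estimator.
        return {"max_epochs": self.max_epochs, "lr": self.lr}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, X, y=None):
        # A real implementation would build a Dataset/TensorFrame from X
        # here and run the training loop; this sketch only records that
        # fit was called and returns self, as sklearn requires.
        self.fitted_ = True
        return self

    def predict(self, X):
        if not getattr(self, "fitted_", False):
            raise RuntimeError("call fit before predict")
        return [0 for _ in range(len(X))]  # placeholder predictions
```

The key constraints are that `__init__` does no work, `fit` returns `self`, and fitted state uses trailing-underscore attributes.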

I am sorry, but I cannot take much time to assist in creating this feature, so if it is not possible, please close this.

@yiweny
Contributor

yiweny commented Oct 25, 2023

You can convert a DataFrame to a TensorFrame easily with

dataset = Dataset(df, col_to_stype=col_to_stype, target_col="y")
dataset.tensor_frame

See tutorial.

@weihua916
Contributor

Thanks for your suggestion! I think this is great to add. Setting this as P2 feature, as we first want to prioritize more stype support #88.

@MacOS

MacOS commented Dec 18, 2023

Is someone already working on that?

@weihua916
Contributor

No, as far as I know. Let us know if you are interested!

@MacOS

MacOS commented Dec 22, 2023

Yes, I'm interested, so you can assign this to me. How soon should this task be completed?

@weihua916
Contributor

@MacOS Great, thank you! It'd be good to complete this feature by the end of January. Would that be possible?

@MacOS

MacOS commented Jan 1, 2024

@weihua916 As of now, yes.

@34j
Author

34j commented Mar 11, 2024

I have tried this, and it seems to be very difficult.
As a quick fix that isn't pretty, the following seems necessary:

  • Patch skorch.utils.to_tensor_ to bypass TensorFrame.
  • Add index = torch.tensor(index) to torch_frame.DataLoader.collate_fn so that it returns a TensorFrame instead of list[TensorFrame].

Next, we want to pass a validation dataset as well, but if we pass the two as a tuple, as in skorch.NeuralNet.fit((train_dataset.tensor_frame, val_dataset.tensor_frame), None), skorch raises a lot of errors. Therefore, I tried to split them inside skorch instead:

  • Pass col_to_stype as y, as in skorch.NeuralNet.fit(dataset.df, dataset.col_to_stype), utilizing the internal structure.
  • Remove self.check_data(X, y) in skorch.NeuralNet.fit_loop().
  • Modify TensorFrame to call self.materialize() in the constructor.
  • To avoid an error in torch_frame.Dataset.split(), set split_col like skorch.NeuralNet(... , dataset=lambda d, c: Dataset(d, c, split_col='split_col')).
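The patching idea in the first bullet can be illustrated generically: wrap a conversion function so that instances of a given type pass through untouched. This is a standalone sketch with made-up names (`SpecialType` stands in for TensorFrame, `to_number` for skorch's to_tensor); it is not the actual skorch patch:

```python
import functools


class SpecialType:
    """Stand-in for TensorFrame: a type the converter must not touch."""


def passthrough(convert, skip_type):
    # Wrap `convert` so instances of `skip_type` bypass conversion,
    # mirroring the suggested bypass in skorch's tensor conversion.
    @functools.wraps(convert)
    def wrapper(x, *args, **kwargs):
        if isinstance(x, skip_type):
            return x
        return convert(x, *args, **kwargs)
    return wrapper


def to_number(x):
    # Stand-in for the library conversion that chokes on SpecialType.
    return float(x)


to_number = passthrough(to_number, SpecialType)
```

The same wrapper shape would be applied to the real conversion function at import time, before skorch's training loop runs.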

@MacOS

MacOS commented Mar 11, 2024

🤔

Thank you for looking into this, @34j! I was about to start working on it.

Add an ability to convert from DataFrame to TensorFrame without much prior information.

I would simply have converted the DataFrame to a TensorFrame internally, worked with it, and, if requested, returned a DataFrame again. This means, of course, that one has to track what was given. Or am I missing something?
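The type-tracking idea can be sketched in a few lines of plain Python (all names here are hypothetical; a dict stands in for a DataFrame and a list for a TensorFrame):

```python
class FrameRoundTripper:
    """Hypothetical sketch: remember the caller's input representation
    so the same kind of object can be handed back after processing.
    A dict plays the role of DataFrame, a list the role of TensorFrame."""

    def to_internal(self, data):
        # Record what was given so to_external can mirror it.
        self._was_frame = isinstance(data, dict)
        if self._was_frame:
            self._columns = list(data.keys())
            return list(data.values())
        return data

    def to_external(self, internal):
        # Return the representation the caller originally supplied.
        if self._was_frame:
            return dict(zip(self._columns, internal))
        return internal
```

The real version would convert DataFrame to TensorFrame on the way in and back on the way out, but the bookkeeping is the same.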

Create a wrapper that passes Tensor to skorch or create a scikit-learn compatible estimator specifically for this package.

This seems to be very big and unrealistic because we would have to make all estimators compatible with scikit-learn, which is a lot to ask for. At the moment, scikit-learn is an optional dependency.

May I ask you, @34j, to post a self-contained example (or examples) of what would qualify pytorch-frame as sklearn-compatible?

PS: I will submit a PR today, though maybe only as a draft.

@34j
Author

34j commented Mar 11, 2024

Add an ability to convert from DataFrame to TensorFrame without much prior information.

This was an implicit request for the recently implemented infer_df_stype, so thankfully it has already been resolved.

Create a wrapper that passes Tensor to skorch

I feel like this could probably be done. I'll send a draft PR in an hour, and I want to ask @MacOS to take it over and do the documentation, testing, and tutorial work.

dirty prototype code

examples/tutorial.py:

import torch
import torch.nn as nn
from typing import List

from torch import Tensor
from skorch import NeuralNetClassifier
from skorch.dataset import Dataset as SkorchDataset
from torch_frame import TensorFrame
from torch_frame.data import DataLoader
from torch_frame.data.dataset import Dataset

# `model`, `args`, `device`, and `dataset` come from the surrounding
# tutorial script.


def create_dataset(df, _) -> Dataset:
    # Rebuild a materialized Dataset from the raw DataFrame.
    dataset_ = Dataset(
        df, dataset.col_to_stype, split_col="split_col", target_col="target_col"
    )
    dataset_.materialize()
    return dataset_


def split_dataset(dataset: Dataset) -> tuple[TensorFrame, TensorFrame]:
    # Hand skorch's train_split hook the train/val tensor frames.
    datasets = dataset.split()[:2]
    return datasets[0].tensor_frame, datasets[1].tensor_frame


class DataLoader2(DataLoader):
    def collate_fn(
        self, index: int | List[int] | range | slice | Tensor
    ) -> tuple[TensorFrame, Tensor | None]:
        # skorch passes plain index lists; coerce them to a Tensor so
        # collate_fn returns a TensorFrame instead of list[TensorFrame].
        index = torch.tensor(index)
        res = super().collate_fn(index).to(device)
        return res, res.y


def get_iterator(dataset: SkorchDataset, **kwargs) -> DataLoader:
    return DataLoader2(dataset, **kwargs)


net = NeuralNetClassifier(
    module=model,
    max_epochs=args.epochs,
    lr=args.lr,
    device=device,
    batch_size=6,
    iterator_train=get_iterator,
    dataset=create_dataset,
    iterator_valid=get_iterator,
    train_split=split_dataset,
    classes=dataset.df["target_col"].unique(),
    verbose=1,
    criterion=nn.CrossEntropyLoss,
)
net.fit(dataset.df, None)
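The index coercion in the collate_fn override above matters because skorch's batch iteration can hand the loader an int, list, range, or slice, while the torch_frame side wants a single tensor of positions. A pure-Python sketch of that normalization (lists standing in for tensors, `normalize_index` being a made-up helper name):

```python
def normalize_index(index, length):
    # Coerce the index forms a batch iterator may pass (int, list,
    # range, slice) into one flat list of row positions, mirroring the
    # torch.tensor(index) coercion in the collate_fn override.
    if isinstance(index, int):
        return [index]
    if isinstance(index, slice):
        # slice.indices clamps start/stop/step to the dataset length.
        return list(range(*index.indices(length)))
    return list(index)  # already a list or range of positions
```

With every index form reduced to one shape, the downstream batching code only has to handle a single case.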

@MacOS

MacOS commented Mar 11, 2024

@34j That's fine with me!

So we drop the second part of your request then, correct?

@MacOS

MacOS commented Mar 13, 2024

Heads up everyone, I have started working on it. I already merged the PR draft of @34j into my fork.

It would be nice if you were available in case I have questions. :)

@34j
Author

34j commented Mar 14, 2024

Heads up everyone, I started working on it. I already merge the PR draft of @34j into my fork.

Would be nice if you guys would be available in case I have questions. :)

May I ask what your question is? Never mind, sorry for my poor English comprehension.

@MacOS

MacOS commented Mar 19, 2024

So far none. I meant just in case.

Sorry for the delay, but I had personal matters to deal with. I'm confident that I can submit a PR this month.

@MacOS

MacOS commented Apr 6, 2024

Hi all,

A short update: unfortunately, I got sick, hence another delay. Should I still work on it?
