InMemoryDataset's load() method is obsolete #9414

Open
gbg141 opened this issue Jun 10, 2024 · 1 comment
gbg141 commented Jun 10, 2024

🛠 Proposed Refactor

Update InMemoryDataset's load() method so that it is compatible with the latest datasets (e.g. TUDataset) that store 4 elements in the processed data.pt object (data, slices, sizes, data_cls).

Right now it is only compatible with 3 elements (data, slices, data_cls) or 2 (the backward-compatibility case, with only data and slices). When implementing a custom InMemoryDataset whose processed tuple may have any size 2 <= x <= 4, the default load() method breaks on the newest format.
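To make the incompatibility concrete, here is a minimal plain-Python sketch (no PyTorch Geometric imports; `load_current` is a hypothetical stand-in that mimics the current unpacking logic, and the sample tuple stands in for a TUDataset-style processed file):

```python
def load_current(out):
    """Hypothetical stand-in for the current load(): only 2- or 3-tuples."""
    assert isinstance(out, tuple)
    assert len(out) == 2 or len(out) == 3  # rejects 4-element tuples
    if len(out) == 2:  # Backward compatibility: (data, slices).
        data, slices = out
    else:  # (data, slices, data_cls).
        data, slices, _data_cls = out
    return data, slices

# A TUDataset-style processed tuple has 4 elements:
# (data, slices, sizes, data_cls).
saved = ({"x": [1.0, 2.0]}, {"x": [0, 2]}, {"num_node_labels": 3}, dict)

try:
    load_current(saved)
except AssertionError:
    print("load() rejects the 4-element tuple")
```

The second assertion is exactly where the default implementation fails for datasets that persist the extra `sizes` entry.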

Suggest a potential alternative/fix

An easy fix could be to allow up to 4 elements in InMemoryDataset's load() method, something like:

```python
def load(self, path: str) -> None:
    r"""Load the dataset from the file path :obj:`path`."""
    out = fs.torch_load(path)
    assert isinstance(out, tuple)
    assert 2 <= len(out) <= 4
    if len(out) == 2:  # Backward compatibility (1): (data, slices).
        data, self.slices = out
    elif len(out) == 3:  # Backward compatibility (2): (data, slices, data_cls).
        data, self.slices, data_cls = out
    else:  # TUDataset stores an additional `sizes` element in the processed file.
        data, self.slices, sizes, data_cls = out

    if not isinstance(data, dict):  # Backward compatibility.
        self.data = data
    else:
        self.data = data_cls.from_dict(data)
```
rusty1s commented Jun 24, 2024

You are correct that TUDataset.load is not applicable due to additional information being stored. That's why we use custom logic in TUDataset to load the dataset. I personally think this is fine (since it is a rare case).
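The override-based approach described above can be sketched in plain Python (all names here are hypothetical stand-ins: `_torch_load` uses pickle in place of `fs.torch_load`, and `BaseInMemory` / `FourTupleDataset` mimic the base class and a TUDataset-style subclass):

```python
import os
import pickle
import tempfile

def _torch_load(path):
    """Stand-in for fs.torch_load (pickle keeps this demo dependency-free)."""
    with open(path, "rb") as f:
        return pickle.load(f)

class BaseInMemory:
    """Mimics InMemoryDataset.load(): accepts only 2- or 3-element tuples."""
    def load(self, path):
        out = _torch_load(path)
        assert isinstance(out, tuple) and len(out) in (2, 3)
        self.data, self.slices = out[0], out[1]

class FourTupleDataset(BaseInMemory):
    """Mimics TUDataset-style custom logic: also handles the 4-element tuple."""
    def load(self, path):
        out = _torch_load(path)
        assert isinstance(out, tuple) and 2 <= len(out) <= 4
        if len(out) == 4:  # (data, slices, sizes, data_cls)
            self.data, self.slices, self.sizes, _data_cls = out
        else:  # Fall back to the base behavior for 2-/3-element tuples.
            self.data, self.slices = out[0], out[1]

# Round-trip a TUDataset-style 4-element tuple through the subclass.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pt")
    with open(path, "wb") as f:
        pickle.dump(({"x": [1.0]}, {"x": [0, 1]},
                     {"num_node_labels": 3}, dict), f)
    ds = FourTupleDataset()
    ds.load(path)
    print(ds.sizes)  # {'num_node_labels': 3}
```

The trade-off is the one the maintainer notes: the custom logic lives in the rare subclass that needs it, and the base class stays simple.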
