Let Data and HeteroData implement FeatureStore #4807

Merged: mananshah99 merged 11 commits into master from feature_store_pt1 on Jun 20, 2022

Contributor

@mananshah99 mananshah99 commented Jun 15, 2022

This PR lets Data and HeteroData implement the feature store abstraction. In particular, it defines put_tensor, get_tensor, and remove_tensor methods on both classes, and adds basic tests for these functionalities.
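
To make the three operations named above concrete, here is a minimal sketch of a put/get/remove store. It is not the PyG implementation: `ToyFeatureStore`, its method signatures, and the use of plain lists in place of tensors are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple


class ToyFeatureStore:
    """Hypothetical stand-in for a feature store; not the PyG class."""

    def __init__(self) -> None:
        # Features are keyed by (group_name, attr_name).
        self._store: Dict[Tuple[str, str], List[float]] = {}

    def put_tensor(self, values: Sequence[float], group_name: str,
                   attr_name: str) -> bool:
        # Store a copy so later changes to `values` do not leak in.
        self._store[(group_name, attr_name)] = list(values)
        return True

    def get_tensor(self, group_name: str, attr_name: str,
                   index: Optional[Sequence[int]] = None
                   ) -> Optional[List[float]]:
        values = self._store.get((group_name, attr_name))
        if values is None:
            return None  # nothing stored under this key
        if index is None:
            return list(values)  # None means "return everything"
        return [values[i] for i in index]

    def remove_tensor(self, group_name: str, attr_name: str) -> bool:
        # True if something was actually removed.
        return self._store.pop((group_name, attr_name), None) is not None


store = ToyFeatureStore()
store.put_tensor([1.0, 2.0, 3.0], group_name="paper", attr_name="x")
full = store.get_tensor("paper", "x")
subset = store.get_tensor("paper", "x", index=[0, 2])
removed = store.remove_tensor("paper", "x")
```

In this sketch a missing key yields `None` rather than an exception; whether the real classes behave that way is not stated in this PR description.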

Member

@rusty1s rusty1s left a comment

I think this looks pretty great. Thank you! Let me know if you need further help for the DynamicInheritance issue in Batch.

On a different note: it looks like Data and FeatureStore both implement, e.g., __getitem__. As a result, we currently do not really take advantage of the View functionality in FeatureStore. It might be good to add this as a TODO somewhere. WDYT?
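
The shadowing described in this comment follows from Python's method resolution order. The sketch below uses hypothetical toy classes (`ToyBaseData`, `ToyFeatureStore`, `ToyData`) and assumes the data base class is listed before the store; the actual base order in PyG is not shown in this thread.

```python
class ToyBaseData:
    """Hypothetical stand-in for the data base class."""

    def __getitem__(self, key):
        return f"data attribute: {key}"


class ToyFeatureStore:
    """Hypothetical stand-in for a store with view-style indexing."""

    def __getitem__(self, key):
        return f"feature store view: {key}"


class ToyData(ToyBaseData, ToyFeatureStore):
    """Inherits both __getitem__ methods; only the first one in the MRO wins."""


data = ToyData()

# Normal indexing resolves to ToyBaseData.__getitem__ because it
# comes first in the method resolution order.
result = data["x"]

# The store's version is still reachable, but only by calling it explicitly.
explicit = ToyFeatureStore.__getitem__(data, "x")

mro_names = [cls.__name__ for cls in ToyData.__mro__]
```

Under that assumed ordering, `data["x"]` never reaches the store's `__getitem__`, which is why the view-based indexing goes unused unless it is called explicitly.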

@mananshah99 mananshah99 marked this pull request as ready for review June 17, 2022 02:53

codecov bot commented Jun 17, 2022

Codecov Report

Merging #4807 (7504407) into master (c13d62c) will increase coverage by 0.01%.
The diff coverage is 89.83%.

@@            Coverage Diff             @@
##           master    #4807      +/-   ##
==========================================
+ Coverage   82.62%   82.64%   +0.01%     
==========================================
  Files         325      325              
  Lines       17373    17427      +54     
==========================================
+ Hits        14355    14403      +48     
- Misses       3018     3024       +6     
| Impacted Files | Coverage Δ |
|---|---|
| torch_geometric/data/hetero_data.py | 93.63% <85.18%> (-0.78%) ⬇️ |
| torch_geometric/data/data.py | 90.88% <92.85%> (+0.10%) ⬆️ |
| torch_geometric/data/batch.py | 93.67% <100.00%> (+0.16%) ⬆️ |
| torch_geometric/data/feature_store.py | 88.80% <100.00%> (ø) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@yaoyaowd yaoyaowd left a comment

Minor comment, LGTM.

torch_geometric/data/hetero_data.py
r"""Obtains a feature tensor from node storage."""
# Retrieve tensor and index accordingly:
tensor = getattr(self[attr.group_name], attr.attr_name)
if tensor is not None:
Contributor

If it requires that we set the index, why do we even need to check?

Contributor

Maybe another way to solve this, if it's confusing, is to have a different value for UNSET that indicates it's going to index 'all'.

Contributor Author

Thanks for the comment! I'm not sure what you mean by needing to check the index. The check exists because TensorAttr only requires that its attributes are set (they can be set to None). The current way to index everything is None, which I think is acceptable, although I'm happy to define a custom value for the UNSET enum to indicate "all" indexing in a follow-up PR.
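
The distinction discussed in this thread, between an attribute that was never set and one explicitly set to None, can be sketched with a sentinel enum. Everything below (`FieldStatus`, `ToyTensorAttr`, `select`) is hypothetical and only mirrors the idea, not the actual TensorAttr code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Sequence


class FieldStatus(Enum):
    """Hypothetical sentinel: marks a field the caller never provided."""
    UNSET = "unset"


@dataclass
class ToyTensorAttr:
    """Hypothetical attribute descriptor; not the PyG TensorAttr."""
    group_name: Any = FieldStatus.UNSET
    attr_name: Any = FieldStatus.UNSET
    index: Any = FieldStatus.UNSET

    def is_fully_specified(self) -> bool:
        # None counts as "set"; only the sentinel counts as missing.
        fields = (self.group_name, self.attr_name, self.index)
        return all(value is not FieldStatus.UNSET for value in fields)


def select(values: Sequence[int], attr: ToyTensorAttr) -> List[int]:
    if not attr.is_fully_specified():
        raise ValueError("all attributes must be set (None is allowed)")
    if attr.index is None:
        # An explicit None index means "take everything".
        return list(values)
    return [values[i] for i in attr.index]


everything = select([10, 20, 30], ToyTensorAttr("paper", "x", None))
one_row = select([10, 20, 30], ToyTensorAttr("paper", "x", [1]))
```

Because None passes the "is set" check, the lookup still has to branch on it afterwards. A dedicated "all" value, as suggested above, would make that intent explicit.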

Member

@rusty1s rusty1s left a comment

I think this looks great. Thank you! One thing that would be good to confirm is whether pickling of data objects between PyG 2.0.4 and PyG master still works, e.g., by running an example in PyG 2.0.4 and re-running it with the pre-processed dataset in master. Otherwise, we have to list this as a breaking change.

@mananshah99
Contributor Author

@rusty1s I tried testing this explicitly by constructing a Data object from torch_geometric@2.0.4, pickling it, and re-loading it in torch_geometric@feature_store_pt1, and observed no errors, so I'm not listing it as a breaking change for the time being. If there are more expansive tests that need to be conducted beyond this one, let me know :)
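
The shape of such a check can be sketched as a save step and a load step. The version below runs both in one process with a plain dict standing in for a Data object, so it shows the workflow only; it does not exercise cross-version compatibility. The helper names and the fixture contents are assumptions. In a real check, the save step would run under the old release and the load step under the new branch.

```python
import os
import pickle
import tempfile
from typing import Any


def save_fixture(obj: Any, path: str) -> None:
    # Step 1 (old environment): serialize the object to disk.
    with open(path, "wb") as f:
        pickle.dump(obj, f)


def load_fixture(path: str) -> Any:
    # Step 2 (new environment): deserialize and hand back for comparison.
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in for a small graph object.
fixture = {"x": [1.0, 2.0], "edge_index": [[0, 1], [1, 0]]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data_fixture.pkl")
    save_fixture(fixture, path)
    restored = load_fixture(path)

round_trip_ok = restored == fixture
```

Loading without an exception is the minimum bar. Comparing attribute values afterwards, as `round_trip_ok` does here, catches silent changes as well.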

@mananshah99 mananshah99 merged commit 4b30b6d into master Jun 20, 2022
@mananshah99 mananshah99 deleted the feature_store_pt1 branch June 20, 2022 17:20