load_graph: attributes #72

Open
alyakin314 opened this issue Jan 22, 2020 · 3 comments

@alyakin314
Contributor

part of #66

for graphs that are read in as edgelists, the node attributes are not attached to the graph object, so they never get embedded by the ase, which leads to terrible performance. they should be attached to the graph object at the stage when the graph is read.
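a minimal sketch of what "attaching at read time" could look like, assuming the graph object is a networkx graph (the thread does not say which graph type is used). the function name `load_graph_with_attributes`, the toy edges and the `age` attribute are all hypothetical.

```python
import networkx as nx

# hypothetical toy data: an edge list plus per-node attributes kept separately
edges = [(0, 1), (1, 2), (2, 3)]
node_attributes = {
    0: {"age": 31.0},
    1: {"age": 25.0},
    2: {"age": 47.0},
    3: {"age": 19.0},
}


def load_graph_with_attributes(edges, node_attributes):
    """Build a graph from an edge list and attach node attributes right away."""
    graph = nx.Graph()
    graph.add_edges_from(edges)
    # without this call the nodes exist but carry no attributes,
    # so anything downstream that reads graph.nodes[...] sees empty dicts
    nx.set_node_attributes(graph, node_attributes)
    return graph


bare_graph = nx.Graph(edges)  # edge list only, attributes are lost
graph = load_graph_with_attributes(edges, node_attributes)
```

here `bare_graph.nodes[0]` is an empty dict, while `graph.nodes[0]` carries the `age` value, so a later embedding step has something to read.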

@alyakin314
Contributor Author

alyakin314 commented Jan 22, 2020

unfortunately this is close to infeasible for the classification task (which happens to be the one where this matters most) at the current stage. this is because the attributes are stored in the learningdata.csv, which gets train-test split. hence, we do not have access to the attributes of the test part of the data at the time we embed during training.

there are some very weird workarounds. one is to access the full (not train-test split) dataset. this has to be done in a very hacky way, with weird and very fragile path manipulations, AND it is borderline cheating because that csv file has the labels for the test data.

another is a "lazy" approach, which means rewriting our whole framework to not do the embedding at train time, but only to freeze the training nodes with their attributes. then at test time we join them with the test attributes, do the embedding, learn the classifier and predict. this both runs counter to foundational ml ideas, because our classifier would be trained right before test time, and would very likely require rewriting a significant portion of our framework (ase and gclass notably).

this issue is dropped until there is a reasonable change in the way d3m handles graph attributes.

CC: @hhelm10 @bvarjavand

@alyakin314
Contributor Author

the easiest way to resolve this is to ask Mitar or Swaroop to include a nodeID+Attributes csv for all vertices and a nodeID+label csv for the training vertices. or, alternatively, to have one csv that has nodeIDs+Attributes for all vertices and labels only for the ones being trained on (the rest can be nan, for example). the former seems more intuitive and easier to adjust to, but both would resolve the embedding issue.
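to make the two layouts concrete, here is a small pandas sketch. the column names `attr1` and `label` and all the values are made up for illustration; only `nodeID` comes from the proposal above. it shows that the two-csv layout can be turned into the one-csv layout with a left join, and that the nan labels then mark the non-training vertices.

```python
import pandas as pd

# layout 1, file A (hypothetical values): attributes for ALL vertices
attributes = pd.DataFrame(
    {"nodeID": [0, 1, 2, 3], "attr1": [0.1, 0.4, 0.3, 0.9]}
)

# layout 1, file B (hypothetical values): labels for TRAINING vertices only
train_labels = pd.DataFrame({"nodeID": [0, 2], "label": [1, 0]})

# layout 2: one table, attributes for everyone, label is NaN when unknown
combined = attributes.merge(train_labels, on="nodeID", how="left")

# the NaN labels tell us which vertices are not part of training,
# while their attributes are still available for the embedding
is_train = combined["label"].notna()
train_ids = combined.loc[is_train, "nodeID"].tolist()
test_ids = combined.loc[~is_train, "nodeID"].tolist()
```

either way, the embedding step sees attributes for every vertex and no test labels leak into training.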

@alyakin314
Contributor Author

attributes for datasets with edgelists are now provided as nodelists. first of all, this is awesome. second of all, we now need to add a way to load them in.
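a rough sketch of such a loader, not the actual `load_graph` implementation. it assumes the nodelist and edgelist are csv files, that the graph is a networkx graph, that every attribute is numeric, and that the columns are named `nodeID`, `source` and `target`; the real d3m files may be laid out differently. the file contents below are made up.

```python
import csv
import io

import networkx as nx

# hypothetical file contents, held in memory so the sketch is self-contained
EDGELIST_CSV = "source,target\n0,1\n1,2\n2,3\n"
NODELIST_CSV = "nodeID,attr1,attr2\n0,0.1,1.0\n1,0.4,0.0\n2,0.3,1.0\n3,0.9,0.0\n"


def load_graph(edgelist_file, nodelist_file):
    """Read a nodelist and an edgelist and return a graph with node attributes."""
    graph = nx.Graph()
    # add nodes first so that isolated vertices and their attributes are kept
    for row in csv.DictReader(nodelist_file):
        node_id = int(row.pop("nodeID"))
        # assumption: all remaining columns are numeric attributes
        graph.add_node(node_id, **{key: float(value) for key, value in row.items()})
    for row in csv.DictReader(edgelist_file):
        graph.add_edge(int(row["source"]), int(row["target"]))
    return graph


graph = load_graph(io.StringIO(EDGELIST_CSV), io.StringIO(NODELIST_CSV))
```

reading the nodelist before the edgelist means a vertex that has attributes but no edges still ends up in the graph.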
