load_graph: attributes #72

alyakin314 · 2020-01-22T19:27:09Z

part of #66

for graphs that are read in as edgelists, the node attributes are not attached to the graph and so are not embedded by the ase, which leads to terrible performance. they should be attached to the graph object at the stage when the graph is read.

alyakin314 · 2020-01-22T22:02:54Z

unfortunately this is almost infeasible for the classification task (which happens to be the one this matters for the most) at the current stage. this is because the attributes are stored in the learningdata.csv, which gets train-test split. hence, we do not have access to the attributes of the test part of the data at the time we embed in training.

there are some very weird workarounds. one is to access the full (not train-test split) dataset. this both needs to be done in a very hacky way, with weird path manipulations that are very fragile, AND is borderline cheating because that csv file has labels for the test data.

another is a "lazy" approach, which means rewriting our whole framework to not do the embedding at train time, but only freeze those nodes with their attributes. then at test time we join them with the test attributes, do the embedding, learn the classifier and predict. this both runs counter to foundational ml ideas, because our classifier will be trained right before test time, and will very likely require rewriting a significant portion of our framework (ase and gclass notably).
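to make the control flow of the "lazy" approach concrete, here is a minimal sketch. everything in it is hypothetical: `LazyNodeClassifier`, the `embed` callable, and the nearest-centroid classifier are stand-ins for illustration, not the actual ase or gclass code, and the graph itself is left out (a real `embed` would also take the frozen graph).

```python
class LazyNodeClassifier:
    """Hypothetical sketch: fit() only stores data; all real work happens in predict()."""

    def __init__(self, embed):
        # embed is an assumed callable: {node: attributes} -> {node: vector}
        self.embed = embed
        self.train_attributes = {}
        self.train_labels = {}

    def fit(self, attributes, labels):
        # "freeze" the training nodes; no embedding and no classifier yet
        self.train_attributes = dict(attributes)
        self.train_labels = dict(labels)
        return self

    def predict(self, test_attributes):
        # join train and test attributes, then embed everything at once
        vectors = self.embed({**self.train_attributes, **test_attributes})

        # only now train a (stand-in) nearest-centroid classifier
        grouped = {}
        for node, label in self.train_labels.items():
            grouped.setdefault(label, []).append(vectors[node])
        centroids = {
            label: [sum(col) / len(col) for col in zip(*rows)]
            for label, rows in grouped.items()
        }

        def closest(vec):
            return min(
                centroids,
                key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, centroids[lab])),
            )

        return {node: closest(vectors[node]) for node in test_attributes}


def identity_embed(attrs):
    # stand-in embedding: returns the attribute vectors unchanged (NOT an ase)
    return {node: list(vec) for node, vec in attrs.items()}


model = LazyNodeClassifier(identity_embed).fit(
    {"0": [0.0, 0.0], "1": [1.0, 1.0]}, {"0": "a", "1": "b"}
)
predictions = model.predict({"2": [0.1, 0.2], "3": [0.9, 0.8]})
```

the sketch shows the awkward part directly: `fit` learns nothing, and the classifier only exists inside `predict`.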

this issue is dropped until there is a reasonable change in the way d3m handles graph attributes.

CC: @hhelm10 @bvarjavand

alyakin314 · 2020-01-27T18:54:49Z

the easiest way to resolve this is to ask Mitar or Swaroop to include a nodeID+Attributes csv for all vertices and a nodeID+label csv for the training vertices. or, alternatively, to have one csv that has nodeIDs+Attributes for all vertices and labels for the ones being trained on (the rest can be nan, for example). the former seems more intuitive and easier to adjust to, but both would resolve the embedding issue.
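a rough sketch of the two layouts, using pandas. the column names (`nodeID`, `attr1`, `attr2`, `label`) and the inline file contents are made up for illustration; the real d3m files may look different.

```python
import io

import pandas as pd

# hypothetical file contents; column names are assumptions
ATTRIBUTES_CSV = """nodeID,attr1,attr2
0,0.1,1.0
1,0.4,0.0
2,0.9,1.0
3,0.3,0.5
"""
TRAIN_LABELS_CSV = """nodeID,label
0,a
2,b
"""

# layout 1: attributes for every vertex, labels only for the training vertices
attributes = pd.read_csv(io.StringIO(ATTRIBUTES_CSV))
train_labels = pd.read_csv(io.StringIO(TRAIN_LABELS_CSV))

# layout 2: one table, with nan labels for vertices outside the training set
combined = attributes.merge(train_labels, on="nodeID", how="left")

# either way, every vertex has attributes at embedding time,
# and the training rows are the ones with a non-missing label
train_rows = combined[combined["label"].notna()]
test_rows = combined[combined["label"].isna()]
```

layout 2 can be derived from layout 1 with a single left join, which is part of why the two-file version seems easier to adjust to.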

alyakin314 · 2020-02-03T14:49:22Z

attributes for datasets with edgelists are now provided as nodelists. first of all, this is awesome. second of all, we now need to add a way to load them in.
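one possible shape for the loader, as a sketch only. `load_graph_with_attributes` is a hypothetical helper, not the existing load_graph primitive, and the column names (`source`, `target`, `nodeID`) plus the assumption that all attributes are numeric are guesses about the file format.

```python
import csv
import io

import networkx as nx

# hypothetical edgelist and nodelist contents; real d3m column names may differ
EDGELIST_CSV = """source,target
0,1
1,2
"""
NODELIST_CSV = """nodeID,attr1,attr2
0,0.1,1.0
1,0.4,0.0
2,0.9,1.0
3,0.3,0.5
"""


def load_graph_with_attributes(edge_file, node_file, id_column="nodeID"):
    """Build a graph from an edgelist and attach the nodelist attributes to it."""
    graph = nx.Graph()
    for row in csv.DictReader(edge_file):
        graph.add_edge(row["source"], row["target"])
    for row in csv.DictReader(node_file):
        node_id = row.pop(id_column)
        # add_node updates existing vertices and also creates isolated ones
        # that never appear in the edgelist; attributes are assumed numeric
        graph.add_node(node_id, **{key: float(value) for key, value in row.items()})
    return graph


graph = load_graph_with_attributes(
    io.StringIO(EDGELIST_CSV), io.StringIO(NODELIST_CSV)
)
```

note that node ids stay strings here (e.g. `graph.nodes["3"]`), and vertex 3 ends up in the graph with attributes even though it has no edges.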

This was referenced Jan 22, 2020

lse: use attributes #73

Open

submit working primitives/pipelines to v2020.1.9 #64

Closed

alyakin314 added the good first issue label Jan 22, 2020

alyakin314 added invalid and removed good first issue labels Jan 22, 2020

alyakin314 mentioned this issue Jan 27, 2020

submit load_graphs and lcc to core #66

Open
