Make HinSAGE reproducible #926

kjun9 · 2020-02-24T01:09:28Z

This covers both node attribute inference (NAI) and link prediction, the two tasks currently available for HinSAGE.

Part of #749

codeclimate · 2020-02-24T01:09:57Z

stellargraph/mapper/sampled_link_generators.py

@@ -75,7 +75,7 @@ def __init__(self, G, batch_size, schema=None):
    def sample_features(self, head_links, batch_num):
        pass

-    def flow(self, link_ids, targets=None, shuffle=False):
+    def flow(self, link_ids, targets=None, shuffle=False, seed=None):


Refactor this function to reduce its Cognitive Complexity from 17 to the 15 allowed.

codeclimate · 2020-02-24T01:09:59Z

Code Climate has analyzed commit 69270e9 and detected 0 issues on this pull request.

kjun9 · 2020-02-28T00:18:15Z

Hmm.. something is still flaky, investigating the test failure now

codecov-io · 2020-02-28T01:11:36Z

Codecov Report

Merging #926 into develop will decrease coverage by 0.9%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           develop    #926     +/-   ##
=========================================
- Coverage     85.4%   84.5%   -0.9%     
=========================================
  Files           66      58      -8     
  Lines         6774    4956   -1818     
=========================================
- Hits          5784    4186   -1598     
+ Misses         990     770    -220

Impacted Files	Coverage Δ
stellargraph/mapper/sampled_link_generators.py	`91.2% <100.0%> (-0.2%)`	⬇️
stellargraph/mapper/sampled_node_generators.py	`93.5% <100.0%> (+0.6%)`	⬆️
stellargraph/losses.py	`33.3% <0.0%> (-66.7%)`	⬇️
stellargraph/layer/watch_your_step.py	`30.0% <0.0%> (-61.9%)`	⬇️
stellargraph/mapper/adjacency_generators.py	`31.4% <0.0%> (-41.7%)`	⬇️
stellargraph/datasets/datasets.py	`70.8% <0.0%> (-14.6%)`	⬇️
stellargraph/mapper/mini_batch_node_generators.py	`83.8% <0.0%> (-11.4%)`	⬇️
stellargraph/data/loader.py	`37.5% <0.0%> (-4.6%)`	⬇️
stellargraph/layer/gcn.py	`85.4% <0.0%> (-4.2%)`	⬇️
stellargraph/layer/cluster_gcn.py	`90.8% <0.0%> (-3.1%)`	⬇️
... and 56 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc46b7a...69270e9.

kieranricardo

looks good! just a few minor comments on the tests

kieranricardo · 2020-02-28T05:14:14Z

tests/mapper/test_node_mappers.py

+        ]
+        features, labels = zip(*batches)
+        features, labels = np.concatenate(features), np.concatenate(labels)
+        assert all(features == labels)


I think this can be:

Suggested change

-        assert all(features == labels)
+        assert np.array_equal(features, labels)
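
For context, a minimal sketch of why `np.array_equal` can be the safer check. The arrays below are made-up illustrations, not the data used in these tests. The built-in `all(a == b)` iterates over the result of the elementwise comparison, so it ignores shape mismatches that broadcast and fails outright on 2-D inputs, whereas `np.array_equal` compares shape and contents in one call.

```python
import numpy as np

# 1-D arrays of the same shape: both idioms agree.
x = np.array([1, 2, 3])
y = np.array([1, 2, 3])
one_d_builtin = all(x == y)          # True
one_d_numpy = np.array_equal(x, y)   # True

# Different shapes that broadcast: the built-in check passes silently.
ones = np.array([1, 1, 1])
single = np.array([1])
broadcast_builtin = all(ones == single)         # True, shapes are ignored
broadcast_numpy = np.array_equal(ones, single)  # False, shapes differ

# 2-D arrays: iterating over (a == b) yields whole rows, and the truth
# value of a multi-element row is ambiguous, so all() raises ValueError.
a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 4]])
try:
    two_d_builtin = all(a == b)
except ValueError as err:
    two_d_builtin = err
two_d_numpy = np.array_equal(a, b)   # True
```

If the features in a test are ever multi-dimensional, the `all(...)` form would raise rather than report a mismatch, which is one reason to prefer the single NumPy call.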

kieranricardo · 2020-02-28T05:15:05Z

tests/mapper/test_node_mappers.py

+
+    for i in range(max_iter):
+        f1, f2 = consecutive_epochs(seq)
+        comparison_results.add(all(f1 == f2))


Same here:

Suggested change

-        comparison_results.add(all(f1 == f2))
+        comparison_results.add(np.array_equal(f1, f2))

kieranricardo · 2020-02-28T05:19:05Z

tests/mapper/test_node_mappers.py

+
+    for i in range(max_iter):
+        f1, f2 = consecutive_epochs(seq)
+        comparison_results.add(all(f1 == f2))


Could the assertions happen on this line?

Suggested change

-        comparison_results.add(all(f1 == f2))
+        if shuffle:
+            assert not np.array_equal(f1, f2)
+        else:
+            assert np.array_equal(f1, f2)

Avalee21 · 2021-06-24T06:44:53Z

The cause is that the random state is used multiple times, which generates different results. The following steps guarantee consistent results, even at every epoch:

Set the stellargraph seed in (1) HinSAGELinkGenerator (fix the seed in the run function of SampledHeterogeneousBreadthFirstWalk) and (2) the flow function.
Set the tensorflow seed.
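
The effect described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not StellarGraph code: `sample_epoch` is a hypothetical stand-in for one epoch of neighbour sampling, and the node list and seed value are made up. A random state that is shared across epochs advances on every draw, so each epoch sees different samples. Creating the state afresh from the same seed at the start of each epoch gives identical samples.

```python
import random


def sample_epoch(rng, population, k):
    """Hypothetical stand-in for one epoch of neighbour sampling."""
    return [rng.choice(population) for _ in range(k)]


nodes = list(range(100))  # made-up node IDs
SEED = 42                 # arbitrary seed for illustration

# One shared random state: each epoch continues where the last one stopped,
# so the sampled neighbours differ from epoch to epoch.
shared = random.Random(SEED)
shared_epochs = [sample_epoch(shared, nodes, 20) for _ in range(3)]

# A fresh state built from the same seed for every epoch: the samples repeat.
reseeded_epochs = [
    sample_epoch(random.Random(SEED), nodes, 20) for _ in range(3)
]
```

In the real pipeline the same idea would apply to each source of randomness separately: the sampler, the `seed` argument of `flow` shown in the diff, and the TensorFlow seed.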

codeclimate bot reviewed Feb 24, 2020

kjun9 mentioned this pull request Feb 24, 2020

Test all algorithms with randomness for reproducibility #749

Open

14 tasks

kjun9 requested a review from kieranricardo February 28, 2020 00:07

kjun9 marked this pull request as ready for review February 28, 2020 00:08

kieranricardo reviewed Feb 28, 2020

kevin added 9 commits June 4, 2020 16:05

Add hinsage reproducibility tests

319ea22

Fix link prediction

9dd436b

Use SeededPerBatch for hinsage node gen

56b54d8

Turn off verbose

74dd406

Remove link prediction shuffle=True

11e01e6

Update hinnodemapper test

cefa903

Remove unused optional parameters

8150c05

Address review comments

5913b16

Try adding failing case

69270e9

kjun9 force-pushed the feature/749-hinsage branch from a54510b to 69270e9 Compare June 4, 2020 06:23

huonw mentioned this pull request Mar 31, 2021

inconsistent prediction results using HinSage #1900

Open
