Make HinSAGE reproducible #926

Open
wants to merge 9 commits into
base: develop

kjun9 (Contributor) commented Feb 24, 2020

This covers both node attribute inference (NAI) and link prediction, the two tasks currently available for HinSAGE.

Part of #749

@@ -75,7 +75,7 @@ def __init__(self, G, batch_size, schema=None):
     def sample_features(self, head_links, batch_num):
         pass
 
-    def flow(self, link_ids, targets=None, shuffle=False):
+    def flow(self, link_ids, targets=None, shuffle=False, seed=None):

Refactor this function to reduce its Cognitive Complexity from 17 to the 15 allowed.

codeclimate bot commented Feb 24, 2020

Code Climate has analyzed commit 69270e9 and detected 0 issues on this pull request.

View more on Code Climate.

@kjun9 kjun9 marked this pull request as ready for review February 28, 2020 00:08
kjun9 (Contributor, Author) commented Feb 28, 2020

Hmm, something is still flaky. Investigating the test failure now.

codecov-io commented Feb 28, 2020

Codecov Report

Merging #926 into develop will decrease coverage by 0.9%.
The diff coverage is 100.0%.

@@            Coverage Diff            @@
##           develop    #926     +/-   ##
=========================================
- Coverage     85.4%   84.5%   -0.9%     
=========================================
  Files           66      58      -8     
  Lines         6774    4956   -1818     
=========================================
- Hits          5784    4186   -1598     
+ Misses         990     770    -220     
| Impacted Files | Coverage Δ |
|---|---|
| stellargraph/mapper/sampled_link_generators.py | 91.2% <100.0%> (-0.2%) ⬇️ |
| stellargraph/mapper/sampled_node_generators.py | 93.5% <100.0%> (+0.6%) ⬆️ |
| stellargraph/losses.py | 33.3% <0.0%> (-66.7%) ⬇️ |
| stellargraph/layer/watch_your_step.py | 30.0% <0.0%> (-61.9%) ⬇️ |
| stellargraph/mapper/adjacency_generators.py | 31.4% <0.0%> (-41.7%) ⬇️ |
| stellargraph/datasets/datasets.py | 70.8% <0.0%> (-14.6%) ⬇️ |
| stellargraph/mapper/mini_batch_node_generators.py | 83.8% <0.0%> (-11.4%) ⬇️ |
| stellargraph/data/loader.py | 37.5% <0.0%> (-4.6%) ⬇️ |
| stellargraph/layer/gcn.py | 85.4% <0.0%> (-4.2%) ⬇️ |
| stellargraph/layer/cluster_gcn.py | 90.8% <0.0%> (-3.1%) ⬇️ |

... and 56 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update fc46b7a...69270e9.

kieranricardo (Contributor) left a comment

looks good! just a few minor comments on the tests

]
features, labels = zip(*batches)
features, labels = np.concatenate(features), np.concatenate(labels)
assert all(features == labels)
I think this can be:

Suggested change
- assert all(features == labels)
+ assert np.array_equal(features, labels)
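
One reason to prefer `np.array_equal`, sketched under the assumption that the compared arrays may be two-dimensional: Python's builtin `all` iterates over rows, and the truth value of a multi-element boolean row is ambiguous, so NumPy raises `ValueError`. The arrays below are made up for illustration and are not the test's data.

```python
import numpy as np

# Illustrative data only; the real test compares generator output.
features = np.array([[1, 2], [3, 4]])
labels = np.array([[1, 2], [3, 4]])

# Builtin all() iterates over the rows of the 2-D boolean array.
# Each row has more than one element, so bool(row) is ambiguous
# and NumPy raises ValueError.
try:
    all(features == labels)
    builtin_all_raised = False
except ValueError:
    builtin_all_raised = True

# np.array_equal returns a single boolean for any shape, and
# returns False (rather than raising) when the shapes differ.
same = np.array_equal(features, labels)
different = np.array_equal(features, labels + 1)
shape_mismatch = np.array_equal(features, labels[:1])
```

For one-dimensional arrays the two forms agree, so the change matters mostly as a guard against shape surprises.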


for i in range(max_iter):
f1, f2 = consecutive_epochs(seq)
comparison_results.add(all(f1 == f2))
Same here:

Suggested change
- comparison_results.add(all(f1 == f2))
+ comparison_results.add(np.array_equal(f1, f2))


for i in range(max_iter):
f1, f2 = consecutive_epochs(seq)
comparison_results.add(all(f1 == f2))
Could the assertions happen on this line?

Suggested change
- comparison_results.add(all(f1 == f2))
+ if shuffle:
+     assert not np.array_equal(f1, f2)
+ else:
+     assert np.array_equal(f1, f2)
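
A rough, self-contained sketch of what asserting inside the loop could look like. `ToySequence` is a hypothetical stand-in written for this note, not StellarGraph's sequence class, and the body of `consecutive_epochs` is assumed; only its name and the loop shape come from the test above.

```python
import random


class ToySequence:
    """Hypothetical stand-in for a Keras-style sequence (not StellarGraph's)."""

    def __init__(self, ids, shuffle, seed=None):
        self.ids = list(ids)
        self.shuffle = shuffle
        self._rng = random.Random(seed)  # private RNG, seeded once
        self.on_epoch_end()

    def on_epoch_end(self):
        # Rebuild the epoch order, reshuffling only when requested.
        self.order = list(self.ids)
        if self.shuffle:
            self._rng.shuffle(self.order)


def consecutive_epochs(seq):
    # Assumed helper: return the order for this epoch and for the next one.
    first = list(seq.order)
    seq.on_epoch_end()
    second = list(seq.order)
    return first, second


def check_epochs(shuffle, max_iter=5, seed=0):
    seq = ToySequence(range(100), shuffle=shuffle, seed=seed)
    for _ in range(max_iter):
        f1, f2 = consecutive_epochs(seq)
        # Assert on each iteration instead of collecting results in a set.
        if shuffle:
            assert f1 != f2
        else:
            assert f1 == f2
    return True
```

Asserting per iteration means a failure points at the first epoch pair that misbehaves, which a set of collected booleans cannot do.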

Avalee21 commented Jun 24, 2021

This is caused by the random state being used multiple times, which generates different results on each use. The following steps guarantee consistent results, even at every epoch:

  1. Set the StellarGraph seed in (1) HinSAGELinkGenerator (fix the seed in the run function of SampledHeterogeneousBreadthFirstWalk) and (2) the flow function.
  2. Set the TensorFlow seed.
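
The effect described above can be illustrated with the standard library alone. This is a toy sketch, not StellarGraph code: `sample_neighbours` is a hypothetical helper, and the point is only that reusing one random state gives different draws on each call, while a fresh generator with the same seed repeats them. The TensorFlow side is not shown; in TensorFlow 2 the global seed is set with `tf.random.set_seed`.

```python
import random


def sample_neighbours(neighbours, n, rng):
    # Hypothetical helper: draw n neighbours with replacement using rng.
    return [rng.choice(neighbours) for _ in range(n)]


neighbours = list(range(50))

# One shared random state: the second call continues where the first
# stopped, so the two draws differ even though the seed was fixed.
shared = random.Random(42)
first = sample_neighbours(neighbours, 10, shared)
second = sample_neighbours(neighbours, 10, shared)

# A fresh generator with the same seed on every call repeats the draw.
fresh_a = sample_neighbours(neighbours, 10, random.Random(42))
fresh_b = sample_neighbours(neighbours, 10, random.Random(42))
```

Whether results should repeat at every epoch or only across runs is a design choice; fixing the seed inside the sampling call gives the former.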
