Validate times when constructing TemporalRandomWalk #1072

kjun9 · 2020-03-12T05:35:56Z

This adds validation of edge weights (time) for temporal random walks.

I've moved the construction of the edges and times to sample from into the constructor, since it makes sense to validate the times once, when the user creates a TemporalRandomWalk object, instead of running this validation every time run gets called.

Part of #828

codeclimate · 2020-03-12T05:36:30Z

Code Climate has analyzed commit 2263161 and detected 1 issue on this pull request.

Here's the issue category breakdown:

Category	Count
Security	1

tests/data/test_temporal_random_walker.py

huonw

Looks good... mostly tiny/optional tweaks.

huonw · 2020-03-13T01:07:32Z

stellargraph/data/explorer.py

+    def __init__(self, graph, graph_schema=None, seed=None):
+        super().__init__(graph, graph_schema=graph_schema, seed=seed)
+        self._edges, self._times = self.graph.edges(include_edge_weight=True)
+        self._validate_times()


Instead of constructing the object and then validating, meaning it may be in an invalid state, what do you think about validating first?

    edges, times = self.graph.edges(include_edge_weight=True)
    (neg_time_locs,) = np.where(times < 0)
    if len(neg_time_locs) > 0:
        ...
    # all good
    self._edges = edges
    self._times = times

Totally minor, though.

Is your suggestion based more on readability or do you mean that if we assign self._edges and self._times first before running validation, it's somehow possible for the object to be instantiated in an invalid state?

A bit of both, but I guess this class is simple enough that the invalid state probably isn't observable (I think the self value would have to have been saved somewhere externally accessible inside __init__, and then after the validation error is thrown, the saved instance accessed and used: I don't think this class does that saving anywhere, and thus we shouldn't worry).

Hmm, I see, I hadn't considered that! I also didn't realise that the externally accessible reference would end up pointing to the semi-constructed object. I've taken your suggestion just to be safe.
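To make the concern in this thread concrete, here is a minimal sketch of how a half-built object could escape `__init__`. The `REGISTRY` list, both classes, and `try_build` are hypothetical stand-ins rather than StellarGraph code; only the validate-then-assign ordering comes from the suggestion above.

```python
import numpy as np

# Hypothetical "externally accessible" place where a constructor saves `self`.
REGISTRY = []


class AssignThenValidate:
    def __init__(self, times):
        REGISTRY.append(self)  # self escapes before validation
        self._times = np.asarray(times)  # state is assigned first...
        if np.any(self._times < 0):  # ...and validated afterwards
            raise ValueError("times: expected non-negative values")


class ValidateThenAssign:
    def __init__(self, times):
        times = np.asarray(times)
        if np.any(times < 0):
            raise ValueError("times: expected non-negative values")
        # all good: only now does the object take on state or get saved
        self._times = times
        REGISTRY.append(self)


def try_build(cls, times):
    """Return True if construction succeeded, False if validation failed."""
    try:
        cls(times)
        return True
    except ValueError:
        return False


built_first = try_build(AssignThenValidate, [-1, 2])
leaked_after_first = len(REGISTRY)  # the invalid instance is still reachable
built_second = try_build(ValidateThenAssign, [-1, 2])
leaked_after_second = len(REGISTRY)  # nothing new was saved
```

In the first class the registry keeps a reference to an instance holding negative times even though the constructor raised. As noted in the thread, TemporalRandomWalk does not appear to save `self` anywhere, so this is a defensive ordering rather than a fix for an observable bug.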

huonw · 2020-03-13T01:09:19Z

stellargraph/data/explorer.py

+            if num_neg_times > max_edges_shown:
+                neg_time_edges_formatted += " ..."
+            raise ValueError(
+                f"All edge times must be non-negative. Found {num_neg_times} negatives: "


This message is good, but it doesn't quite match the style we're kinda sorta using elsewhere (at least, I and sometimes @kieranricardo do), maybe it could be:

graph: expected all edge times to be non-negative, found ...

In particular this tells the user which parameter the problem is with. Although maybe that's obvious in this case and so is pointless?

Ah yeah that style was what I was going for, so I'll switch to that
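As an illustration of the message style discussed here (parameter name first, then the expectation, then what was found, with a capped list of offenders), the following is a rough sketch. The helper name `format_negative_times` and its default of `max_edges_shown=3` are assumptions made for this example; the PR's actual helper is only partially visible in the diff.

```python
import numpy as np


def format_negative_times(edges, times, max_edges_shown=3):
    """Hypothetical helper: build the error message, or None if all times are valid."""
    (neg_time_locs,) = np.where(times < 0)
    num_neg_times = len(neg_time_locs)
    if num_neg_times == 0:
        return None

    # Convert NumPy values to plain Python ones so the text reads cleanly.
    shown = ", ".join(
        str((tuple(edges[loc].tolist()), times[loc].item()))
        for loc in neg_time_locs[:max_edges_shown]
    )
    if num_neg_times > max_edges_shown:
        shown += " ..."  # signal that the list was truncated
    return (
        f"graph: expected edge times to be non-negative, "
        f"found {num_neg_times} negative times: {shown}"
    )


edges = np.array([[1, 2], [2, 3], [3, 4], [4, 5]])
times = np.array([-1.0, 5.0, -2.0, -3.0])
message = format_negative_times(edges, times, max_edges_shown=2)
```

Capping the number of edges shown keeps the message readable on large graphs while still reporting the total count.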

huonw · 2020-03-13T01:13:20Z

tests/data/test_temporal_random_walker.py

+def temporal_graph_negative_times(num_edges):
+    nodes = [1, 2, 3, 4, 5, 6]
+    edges = np.hstack(
+        [np.random.choice(nodes, size=(num_edges, 2)), -np.ones((num_edges, 1))]


Is it worth having some non-negative ones here, to ensure we don't accidentally refactor to something like np.all(times < 0) instead of np.any(times < 0)/np.where(times < 0)? E.g.

Suggested change

-        [np.random.choice(nodes, size=(num_edges, 2)), -np.ones((num_edges, 1))]
+        [np.random.choice(nodes, size=(num_edges, 2)), -np.repeat([-1, 1], num_edges / 2)]

Plus adjusting 1 to 2 in the test parameterisation, and the num_edges > 10 threshold too.

I might go with temporal_graph_negative_times(num_neg_edges) then this becomes

[np.random.choice(nodes, size=(num_neg_edges * 2, 2)), -np.repeat([-1, 1], num_neg_edges)]

That's better 👍 (also, I noticed that my suggestion and your replacement accidentally retain the - in front of the np.repeat, which is probably unnecessary)
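The reason for mixing signs in the fixture can be shown with a small sketch: on all-negative data, `np.all(times < 0)` and `np.any(times < 0)` agree, so a test built on that data cannot tell them apart. The seeded `default_rng` generator and the variable names below are choices made for this example, not the test file's code.

```python
import numpy as np

num_neg_edges = 2
nodes = [1, 2, 3, 4, 5, 6]
rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Half negative, half positive times: [-1, -1, 1, 1]
times = np.repeat([-1, 1], num_neg_edges)
edges = np.hstack(
    [rng.choice(nodes, size=(num_neg_edges * 2, 2)), times.reshape(-1, 1)]
)

# On the mixed fixture the two checks disagree, so a refactoring slip is caught.
any_negative_mixed = bool(np.any(times < 0))
all_negative_mixed = bool(np.all(times < 0))

# On the original all-negative fixture both checks pass, hiding the slip.
old_times = -np.ones(num_neg_edges * 2)
any_negative_old = bool(np.any(old_times < 0))
all_negative_old = bool(np.all(old_times < 0))
```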

huonw

Looks good other than two potentially minor tweaks.

However, does it matter if times are negative? I think the temporal walks basically only care about the relative distance between the times, and a constant shift (e.g. times - 999999999 or times + 123456) doesn't actually change the results?
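A quick numerical check of the shift argument: adding a constant to every time leaves all pairwise differences and the chronological ordering unchanged. This sketch only demonstrates that arithmetic fact; whether the walk implementation depends solely on these quantities is the assumption raised in the comment above, not something verified here.

```python
import numpy as np

times = np.array([3.0, 10.0, 4.0, 7.0])
shift = 123456.0


def pairwise_differences(t):
    """Matrix of t[j] - t[i] for every pair of edge times."""
    return t[np.newaxis, :] - t[:, np.newaxis]


# Differences between times survive a constant shift in either direction.
same_differences = bool(
    np.allclose(pairwise_differences(times), pairwise_differences(times + shift))
)

# So does the ordering, even when the shifted times become negative.
shifted_negative = times - shift
same_order = bool(np.array_equal(np.argsort(times), np.argsort(shifted_negative)))
```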

huonw · 2020-03-16T00:03:23Z

stellargraph/data/explorer.py

+                neg_time_locs, stringify=lambda loc: str((edges[loc], times[loc])),
+            )
+            raise ValueError(
+                f"graph: expected edge times to be non-negative, found {num_neg_times}: "


Suggested change

-                f"graph: expected edge times to be non-negative, found {num_neg_times}: "
+                f"graph: expected edge times to be non-negative, found {num_neg_times} negative times: "

huonw · 2020-03-16T00:05:28Z

tests/data/test_temporal_random_walker.py

+    edges = np.hstack(
+        [
+            np.random.choice(nodes, size=(num_neg_edges * 2, 2)),
+            np.repeat([[-1], [1]], num_neg_edges, axis=0),


Totally minor and potentially not an improvement, but this could be:

Suggested change

-            np.repeat([[-1], [1]], num_neg_edges, axis=0),
+            np.repeat([-1, 1], num_neg_edges)[:, np.newaxis],

if you were so inclined.

I like that it avoids defining the nested lists so I'll use your suggestion
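For reference, the two spellings in this suggestion produce the same column of times; a short check, with `num_neg_edges = 3` picked arbitrarily for the example:

```python
import numpy as np

num_neg_edges = 3

# Original: repeat each row of a nested (2, 1) list along axis 0.
nested = np.repeat([[-1], [1]], num_neg_edges, axis=0)

# Suggested: repeat a flat list, then add a trailing axis to make a column.
flat_then_column = np.repeat([-1, 1], num_neg_edges)[:, np.newaxis]

identical = bool(np.array_equal(nested, flat_then_column))
```

Both yield a `(2 * num_neg_edges, 1)` array, which is the shape `np.hstack` needs to append the times as a third column next to the `(n, 2)` edge array.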

kjun9

> However, does it matter if times are negative?

Hmm, good point. I initially thought that negatives would indicate that the user had done something incorrectly to produce the time values, but I guess the reference time could be at some arbitrary point, so I'm not so convinced anymore.

I guess I should really just be validating that they are numerical instead? I had incorrectly assumed that weights are already forced to be numerical upon constructing a stellargraph

huonw · 2020-03-16T01:58:15Z

> I guess I should really just be validating that they are numerical instead? I had incorrectly assumed that weights are already forced to be numerical upon constructing a stellargraph

Uhh, I'd (implicitly) assumed this too. Maybe we should validate it when constructing the StellarGraph instead?

kjun9 · 2020-03-16T03:18:40Z

> Maybe we should validate it when constructing the StellarGraph instead?

Yeah I guess they should actually always be numerical since they're supposed to be used to create adjacency matrices, etc. I can close this and do that instead if we agree that's a good idea
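One way such a numeric check could look, as a rough sketch only: the function name, the error wording, and the use of `np.issubdtype` are assumptions for illustration, and this is not taken from the follow-up PR.

```python
import numpy as np


def validate_numeric_weights(weights):
    """Hypothetical check: accept any numeric dtype, including negative values."""
    weights = np.asarray(weights)
    if not np.issubdtype(weights.dtype, np.number):
        raise TypeError(
            f"edges: expected weights to be numeric, found dtype {weights.dtype}"
        )
    return weights


ok = validate_numeric_weights([0.5, -3, 2])  # negatives are allowed

try:
    validate_numeric_weights(["early", "late"])
    rejected_strings = False
except TypeError:
    rejected_strings = True
```

Checking the dtype once at graph construction means downstream consumers such as the temporal walker or adjacency matrix code would not each need their own validation.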

kjun9 · 2020-03-19T03:52:50Z

Closing this in favour of #1118

Validate times and test it

bd70c2e

Merge branch 'develop' into feature/828-ctdne-test-validate-times

349ebef

kjun9 marked this pull request as ready for review March 12, 2020 06:25

kjun9 requested review from huonw and kieranricardo March 12, 2020 06:25

kieranricardo reviewed Mar 12, 2020

tests/data/test_temporal_random_walker.py (outdated)

kjun9 changed the title ~~Validate times and test it~~ Validate times when constructing TemporalRandomWalk Mar 12, 2020

simplify edge creation

35361b2

kjun9 requested a review from kieranricardo March 13, 2020 00:01

huonw reviewed Mar 13, 2020

Address review comments

2263161

kjun9 requested a review from huonw March 13, 2020 03:00

huonw approved these changes Mar 16, 2020

kjun9 commented Mar 16, 2020

kjun9 closed this Mar 19, 2020

kjun9 deleted the feature/828-ctdne-test-validate-times branch March 19, 2020 03:52
