-
Notifications
You must be signed in to change notification settings - Fork 425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Validate times when constructing TemporalRandomWalk #1072
Conversation
Code Climate has analyzed commit 2263161 and detected 1 issue on this pull request. Here's the issue category breakdown:
View more on Code Climate. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good... mostly tiny/optional tweaks.
stellargraph/data/explorer.py
Outdated
def __init__(self, graph, graph_schema=None, seed=None): | ||
super().__init__(graph, graph_schema=graph_schema, seed=seed) | ||
self._edges, self._times = self.graph.edges(include_edge_weight=True) | ||
self._validate_times() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of constructing the object and then validating, meaning it may be in an invalid state, what do you think about validating first?
edges, times = self.graph.edges(include_edge_weight=True)
(neg_time_locs,) = np.where(times < 0)
if len(neg_time_locs) > 0:
...
# all good
self._edges = edges
self._times = times
Totally minor, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is your suggestion based more on readability or do you mean that if we assign self._edges
and self._times
first before running validation, it's somehow possible for the object to be instantiated in an invalid state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit of both, but I guess this class is simple enough that the invalid state probably isn't observable (I think the self
value would have to have been saved somewhere externally accessible inside __init__
, and then after the validation error is thrown, the saved instance accessed and used: I don't think this class does that saving anywhere, and thus we shouldn't worry).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I see, I hadn't considered that! And didn't realise that would result in that externally accessible thing actually pointing to the semi-constructed object.. I've taken your suggestion just to be safe
stellargraph/data/explorer.py
Outdated
if num_neg_times > max_edges_shown: | ||
neg_time_edges_formatted += " ..." | ||
raise ValueError( | ||
f"All edge times must be non-negative. Found {num_neg_times} negatives: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is good, but it doesn't quite match the style we're kinda sorta using elsewhere (at least, I and sometimes @kieranricardo do), maybe it could be:
graph: expected all edge types to be non-negative, found ...
In particular this tells the user which parameter the problem is with. Although maybe that's obvious in this case and so is pointless?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah that style was what I was going for, so I'll switch to that
def temporal_graph_negative_times(num_edges): | ||
nodes = [1, 2, 3, 4, 5, 6] | ||
edges = np.hstack( | ||
[np.random.choice(nodes, size=(num_edges, 2)), -np.ones((num_edges, 1))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth having some non-negative ones here, to ensure we don't accidentally refactor to something like np.all(times < 0)
instead of np.any(times < 0)
/np.where(times < 0)
? E.g.
[np.random.choice(nodes, size=(num_edges, 2)), -np.ones((num_edges, 1))] | |
[np.random.choice(nodes, size=(num_edges, 2)), -np.repeat([-1, 1], num_edges / 2)] |
Plus adjusting 1
to 2
in the test parameterisation, and the num_edges > 10
threshold too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might go with temporal_graph_negative_times(num_neg_edges)
then this becomes
[np.random.choice(nodes, size(num_neg_edges * 2, 2)), -np.repeat([-1, 1], num_neg_edges)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's better 👍 (also, I noticed that my suggestion and your replacement accidentally retain the -
in front of the np.repeat
, which is probably unnecessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good other than two potentially minor tweaks.
However, does it matter if times are negative? I think the temporal walks basically only care about the relative distance between the times, and a constant shift (e.g. times - 999999999
or times + 123456
) doesn't actually change the results?
neg_time_locs, stringify=lambda loc: str((edges[loc], times[loc])), | ||
) | ||
raise ValueError( | ||
f"graph: expected edge times to be non-negative, found {num_neg_times}: " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
f"graph: expected edge times to be non-negative, found {num_neg_times}: " | |
f"graph: expected edge times to be non-negative, found {num_neg_times} negative times: " |
edges = np.hstack( | ||
[ | ||
np.random.choice(nodes, size=(num_neg_edges * 2, 2)), | ||
np.repeat([[-1], [1]], num_neg_edges, axis=0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally minor and potentially not an improvement, but this could be:
np.repeat([[-1], [1]], num_neg_edges, axis=0), | |
np.repeat([-1, 1], num_neg_edges)[:, np.newaxis], |
if you were so inclined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that it avoids defining the nested lists so I'll use your suggestion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, does it matter if times are negative?
Hmmm good point.. I initially thought that negatives would indicate that the user had done something incorrectly to produce the time values, but yeah I guess the reference time could actually be at some arbitrary point and I'm not so convinced anymore..
I guess I should really just be validating that they are numerical instead? I had incorrectly assumed that weights are already forced to be numerical upon constructing a stellargraph
Uhh, I'd (implicitly) assumed this too. Maybe we should validate it when constructing the |
Yeah I guess they should actually always be numerical since they're supposed to be used to create adjacency matrices, etc. I can close this and do that instead if we agree that's a good idea |
Closing this in favour of #1118 |
This adds validation of edge weights (time) for temporal random walks.
The construction of edges and times to sample from I've moved to the constructor, since I thought it made sense to do the time validation when the user creates a
TemporalRandomWalk
object, instead of running this validation every timerun
gets called.Part of #828