Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate times when constructing TemporalRandomWalk #1072

Closed
wants to merge 4 commits into from

Conversation

kjun9
Copy link
Contributor

@kjun9 kjun9 commented Mar 12, 2020

This adds validation of edge weights (time) for temporal random walks.

The construction of edges and times to sample from I've moved to the constructor, since I thought it made sense to do the time validation when the user creates a TemporalRandomWalk object, instead of running this validation every time run gets called.

Part of #828

@codeclimate
Copy link

codeclimate bot commented Mar 12, 2020

Code Climate has analyzed commit 2263161 and detected 1 issue on this pull request.

Here's the issue category breakdown:

Category Count
Security 1

View more on Code Climate.

@kjun9 kjun9 marked this pull request as ready for review March 12, 2020 06:25
@kjun9 kjun9 changed the title Validate times and test it Validate times when constructing TemporalRandomWalk Mar 12, 2020
Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good... mostly tiny/optional tweaks.

stellargraph/data/explorer.py Outdated Show resolved Hide resolved
def __init__(self, graph, graph_schema=None, seed=None):
super().__init__(graph, graph_schema=graph_schema, seed=seed)
self._edges, self._times = self.graph.edges(include_edge_weight=True)
self._validate_times()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of constructing the object and then validating, meaning it may be in an invalid state, what do you think about validating first?

edges, times = self.graph.edges(include_edge_weight=True)

(neg_time_locs,) = np.where(times < 0)
if len(neg_time_locs) > 0:
    ...

# all good
self._edges = edges
self._times = times

Totally minor, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is your suggestion based more on readability or do you mean that if we assign self._edges and self._times first before running validation, it's somehow possible for the object to be instantiated in an invalid state?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of both, but I guess this class is simple enough that the invalid state probably isn't observable (I think the self value would have to have been saved somewhere externally accessible inside __init__, and then after the validation error is thrown, the saved instance accessed and used: I don't think this class does that saving anywhere, and thus we shouldn't worry).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I see, I hadn't considered that! And didn't realise that would result in that externally accessible thing actually pointing to the semi-constructed object.. I've taken your suggestion just to be safe

if num_neg_times > max_edges_shown:
neg_time_edges_formatted += " ..."
raise ValueError(
f"All edge times must be non-negative. Found {num_neg_times} negatives: "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is good, but it doesn't quite match the style we're kinda sorta using elsewhere (at least, I and sometimes @kieranricardo do), maybe it could be:

graph: expected all edge types to be non-negative, found ...

In particular this tells the user which parameter the problem is with. Although maybe that's obvious in this case and so is pointless?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah that style was what I was going for, so I'll switch to that

tests/data/test_temporal_random_walker.py Outdated Show resolved Hide resolved
def temporal_graph_negative_times(num_edges):
nodes = [1, 2, 3, 4, 5, 6]
edges = np.hstack(
[np.random.choice(nodes, size=(num_edges, 2)), -np.ones((num_edges, 1))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worth having some non-negative ones here, to ensure we don't accidentally refactor to something like np.all(times < 0) instead of np.any(times < 0)/np.where(times < 0)? E.g.

Suggested change
[np.random.choice(nodes, size=(num_edges, 2)), -np.ones((num_edges, 1))]
[np.random.choice(nodes, size=(num_edges, 2)), -np.repeat([-1, 1], num_edges / 2)]

Plus adjusting 1 to 2 in the test parameterisation, and the num_edges > 10 threshold too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might go with temporal_graph_negative_times(num_neg_edges) then this becomes

[np.random.choice(nodes, size(num_neg_edges * 2, 2)), -np.repeat([-1, 1], num_neg_edges)]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's better 👍 (also, I noticed that my suggestion and your replacement accidentally retain the - in front of the np.repeat, which is probably unnecessary)

@kjun9 kjun9 requested a review from huonw March 13, 2020 03:00
Copy link
Member

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good other than two potentially minor tweaks.

However, does it matter if times are negative? I think the temporal walks basically only care about the relative distance between the times, and a constant shift (e.g. times - 999999999 or times + 123456) doesn't actually change the results?

neg_time_locs, stringify=lambda loc: str((edges[loc], times[loc])),
)
raise ValueError(
f"graph: expected edge times to be non-negative, found {num_neg_times}: "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
f"graph: expected edge times to be non-negative, found {num_neg_times}: "
f"graph: expected edge times to be non-negative, found {num_neg_times} negative times: "

edges = np.hstack(
[
np.random.choice(nodes, size=(num_neg_edges * 2, 2)),
np.repeat([[-1], [1]], num_neg_edges, axis=0),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally minor and potentially not an improvement, but this could be:

Suggested change
np.repeat([[-1], [1]], num_neg_edges, axis=0),
np.repeat([-1, 1], num_neg_edges)[:, np.newaxis],

if you were so inclined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that it avoids defining the nested lists so I'll use your suggestion

Copy link
Contributor Author

@kjun9 kjun9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, does it matter if times are negative?

Hmmm good point.. I initially thought that negatives would indicate that the user had done something incorrectly to produce the time values, but yeah I guess the reference time could actually be at some arbitrary point and I'm not so convinced anymore..

I guess I should really just be validating that they are numerical instead? I had incorrectly assumed that weights are already forced to be numerical upon constructing a stellargraph

@huonw
Copy link
Member

huonw commented Mar 16, 2020

I guess I should really just be validating that they are numerical instead? I had incorrectly assumed that weights are already forced to be numerical upon constructing a stellargraph

Uhh, I'd (implicitly) assumed this too. Maybe we should validate it when constructing the StellarGraph instead?

@kjun9
Copy link
Contributor Author

kjun9 commented Mar 16, 2020

Maybe we should validate it when constructing the StellarGraph instead?

Yeah I guess they should actually always be numerical since they're supposed to be used to create adjacency matrices, etc. I can close this and do that instead if we agree that's a good idea

@kjun9
Copy link
Contributor Author

kjun9 commented Mar 19, 2020

Closing this in favour of #1118

@kjun9 kjun9 closed this Mar 19, 2020
@kjun9 kjun9 deleted the feature/828-ctdne-test-validate-times branch March 19, 2020 03:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants