Skip to content

Fix find primary timeout#2284

Merged
jumaffre merged 2 commits into
microsoft:mainfrom
jumaffre:fix_find_primary_timeout
Mar 9, 2021
Merged

Fix find primary timeout#2284
jumaffre merged 2 commits into
microsoft:mainfrom
jumaffre:fix_find_primary_timeout

Conversation

@jumaffre
Copy link
Copy Markdown
Contributor

@jumaffre jumaffre commented Mar 9, 2021

In #2241, I incorrectly changed the logic of network.find_primary() so that we don't exit early if the primary is found. This meant that some end-to-end tests that rely heavily on this function took longer, and started to fail in the Daily pipeline. This PR should fix this.

@jumaffre jumaffre requested a review from a team as a code owner March 9, 2021 11:55
Comment thread tests/infra/network.py
res = c.get("/node/network", log_capture=logs)
if res.status_code != 200:
continue
assert res.status_code == http.HTTPStatus.OK.value, res
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The /node/network endpoint isn't guaranteed to return 200 but at this stage, we only loop through nodes that have joined the service so this assert is actually safe.

Comment thread tests/infra/network.py

view_change_in_progress = body["view_change_in_progress"]
primary_id = body["primary_id"]
if primary_id is not None:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this if necessary? It looks like line 702 has already checked that "primary_id" is present, and then we've assigned it here. Or is it possible to have a "primary_id": null in the response object?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not now, but when @jumaffre and I talked about it, I suggested we should always have a primary_id and it should be set to null when it's unknown, rather than not have the key at all when we don't know which node is primary.

There's a more general point here about what do for optional fields in our JSON responses, we can talk about it.

Copy link
Copy Markdown
Contributor Author

@jumaffre jumaffre Mar 9, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's discuss (and agree on) this separately. I'll merge this PR as it is as it is blocking other PRs and will address further changes in a follow-up PR.

@ghost
Copy link
Copy Markdown

ghost commented Mar 9, 2021

fix_find_primary_timeout@20029 aka 20210309.8 vs main ewma over 20 builds from 19707 to 20009
images

@jumaffre jumaffre merged commit 3f6c436 into microsoft:main Mar 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants