Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race conditions #9

Closed
wants to merge 16 commits into from
Closed

Race conditions #9

wants to merge 16 commits into from

Conversation

colmsnowplow
Copy link

@colmsnowplow colmsnowplow commented Oct 27, 2021

Resolves the issues summarised in #3

Specific descriptions and reasoning for each change can be found in the below issues:

#6
#5
#3

Note that the integration tests do sometimes fail due to duplicates - about 1 in 5 GH actions has one of 3 tests fail with this. So about 1 in 15 runs of 4k events.

I have some ideas of how we might attempt to avoid those duplicates. I don't believe that this behaviour is caused by my changes. The previous strategy was to crash when we hit a scenario that might produce these duplicates (without mitigating the cause of duplicates). It is impossible to say conclusively however since the integration tests weren't running previously.

Copy link

@paulboocock paulboocock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just done an initial code read before we catch up tomorrow to have some expectations on what to expect. Left a couple of thoughts around your comments.

run: make integration-up

- name: Run integration tests
run: GO111MODULE=on go test ./... -v

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: GO111MODULE=on is default as of 1.16, so can be removed. https://golang.org/doc/go1.16#go-command

Comment on lines +44 to +60
# - name: Block Concurrent Executions of Integration Tests
# if: ${{ env.NGROK_TOKEN != '' }}
# uses: softprops/turnstyle@v1
# with:
# poll-interval-seconds: 10
# same-branch-only: false
# env:
# GITHUB_TOKEN: ${{ github.token }}

# - name: Setup integration resources
# run: export NGROK_DIR=$GITHUB_WORKSPACE/bin/ && make integration-up

# - name: Run integration & unit tests
# run: make integration-test

# - name: Compile all targets
# run: make all

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I assume this can be removed.

// TODO: Add logging for these cases
if awsErr, ok := err.(awserr.Error); ok && awsErr.Code() == conditionalFail && cp.lastUpdate < time.Now().Add(-cp.maxAgeForClientRecord).UnixNano() {

// TODO: investigate if not marking cp.dirty = false here causes duplicates

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you do this investigation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still to do!

config Config // configuration struct
numberOfRuns int32 // Used to atomically make sure we only ever allow one Run() to be called
isLeader bool // Whether this client is the leader
unbecomingLeader bool // Flag to avoid race condition when unbecoming leader

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: unbecomingLeader made me chuckle. I assume this isn't describing a leader which is unsuitable, or unattractive... but a leader who is stepping down from being a leader?

Does retiringLeader better describe this, or isLeaderRetiring?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading the usages of this flag, I understand the naming now (due to unbecomeLeader) 🤷

numberOfRuns int32 // Used to atomically make sure we only ever allow one Run() to be called
isLeader bool // Whether this client is the leader
unbecomingLeader bool // Flag to avoid race condition when unbecoming leader
restartingConsumers bool // Flag to indicate that we should only update the clients table while restarting consumers

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I like flags that start with is. Lets me know its a "status" flag.

Suggested change
restartingConsumers bool // Flag to indicate that we should only update the clients table while restarting consumers
isRestartingConsumers bool // Flag to indicate that we should only update the clients table while restarting consumers

// block until we know we're not unbecoming leader already
iteration := 0
for {
if !k.unbecomingLeader { // TODO: does looping indefinitely like this have any negative effects? should we sleep for 50ms in the loop?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're definitely going to be blocking a thread to some degree here, and perhaps would lead to performance hiccups in the hosting app. A sleep is a fast way to stop this being too painful. Typically this is a bit of an antipattern though, you'd want to be blocking execution using a channel or something similar, however for brevity I think plopping a sleep here isn't too bad. I don't expect we'd get stuck in this loop for particularly long or for us to end up in here that often.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK so just to check my understanding:

for {
}

blocks a thread

for {
time.Sleep(100)
}

also blocks a thread, but with less impact? Or makes no difference?

I don't know why I have this impression but it feels like go has more opportunity to context switch when there's a sleep but reflecting on that might not make sense.

Copy link

@paulboocock paulboocock Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you sleep in a thread, you give other threads the opportunity to do some work. So you're still blocking execution of this particular thread but your only checking your condition every x milliseconds rather than as fast as possible, which gives the cpu a little more time to do other tasks in between checks. So yes, it still blocks but with less impact.

@colmsnowplow
Copy link
Author

Closing in favour of #13

@colmsnowplow colmsnowplow deleted the race-conditions branch March 8, 2023 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants