Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug 1819161 - Attempt to fix issue when leaving DispatchGroup #2392

Merged
merged 1 commit into from
Mar 8, 2023

Conversation

travis79
Copy link
Member

My suspicion is that this is due to the recursive nature of process and defer being especially executed at the end of the scope and some interaction between the two.

@travis79 travis79 requested a review from a team as a code owner February 27, 2023 21:05
@travis79 travis79 requested review from badboy and removed request for a team February 27, 2023 21:05
@codecov
Copy link

codecov bot commented Feb 27, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (36c1d78) 32.50% compared to head (7701dcc) 32.50%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2392   +/-   ##
=======================================
  Coverage   32.50%   32.50%           
=======================================
  Files           1        1           
  Lines          40       40           
=======================================
  Hits           13       13           
  Misses         27       27           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Dexterp37 Dexterp37 requested review from rosahbruno and removed request for badboy February 28, 2023 08:07
@Dexterp37
Copy link
Contributor

Since @badboy is not around I redirected this to @rosahbruno for a first look. @travis79 I'd recommend seeking an additional review from an iOS expert outside of our team.

@OrlaM
Copy link

OrlaM commented Feb 28, 2023

I don’t think this is a timing issue as DispatchGroup will hang forever if you let it. The error we are seeing is Unbalanced call to dispatch_group_leave(), this means there were more leave calls than there were enter calls. Having more enter calls than leave calls is also an issue but it won’t result in a crash, it will just hang the process indefinitely.

There are potential issues with the upload function that is called. If for some reason the buildRequest function returns nil then the function will exit without triggering the callback. Similarly if the write to file in the upload function fails it will not call the completion. These aren’t causing the crash but would cause the process to hang forever as the dispatch group will never complete.
I suspect but can’t confirm that there is a use case that causes the SessionResponseDelegate to trigger the callback multiple times. Or at the very least some code path is causing multiple returns from the upload function.

To get around this issue you can instead call the same defer block you had previously but put it at the top of the function, at the same scope as the enter and then don’t call leave anywhere else, this will guarantee the number of enter calls matches the number of leave calls.

Something like this:

        uploadGroup.enter()
        defer {
            uploadGroup.leave()
        }

@OrlaM
Copy link

OrlaM commented Feb 28, 2023

The reason the upload functions completion fires twice is potentially related to the same iOS crash we had last year, the quick fix was to remove finishTasksAndInvalidate for the url session, but that crash had the same presentation as this issue where something funny was going on with the url session task on foregrounding the app.

The fix I suggest above will likely still resolve this crash but I recommend creating a follow up task to investigate what the underlying issue is with how url session is managed because it's likely this will continue to cause random issues going forward.

@travis79
Copy link
Member Author

Thanks @OrlaM! I agree 100% that this seems very much like the previous crash related to the URL Session usage and I'd really like to resolve that for good. I'll adjust this to add the single defer at the same scope as the group entry occurs and file a bug to do a more thorough investigation of the URL Session usage.

@OrlaM
Copy link

OrlaM commented Feb 28, 2023

Sorry, the advice I gave was bad on further consideration. Moving the defer to the top of the function will ensure there is a leave for every enter but it will not have the effect you want it to since the leave will fire before the completion of the upload. I assume this isn't what you are after and will potentially cause other knock on issues.

I'm not sure there is a good fix for this 🤔
The options I see are:

  1. fix the underlying issue that is causing upload to complete more than once
  2. add some hacky check, like a flag or something to the upload completion block to ensure leave is only called once
  3. Remove the dispatch group. I haven't dug into see what it's used for so not sure how viable this is but it didn't used to exist so it's potentially an option to remove it until option 1) is fixed.

@travis79 travis79 force-pushed the Bug1819161 branch 2 times, most recently from 46535fc to e74e49b Compare March 2, 2023 20:25
Copy link

@OrlaM OrlaM left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🤞

@travis79 travis79 force-pushed the Bug1819161 branch 4 times, most recently from 8cfeacb to a521527 Compare March 6, 2023 20:28
Copy link
Contributor

@chutten chutten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs CHANGELOG and maybe we can drop testingMode?

I'm no iOS wizard and I barely understand Swift, but I'm feeling some unearned confidence that the approach is sound from having a proper iOS dev already look over this.

And generally speaking, less code means fewer bugs, right?

@travis79 travis79 force-pushed the Bug1819161 branch 3 times, most recently from 635148f to 916920a Compare March 7, 2023 21:57
@travis79 travis79 requested a review from chutten March 7, 2023 22:27
Copy link
Contributor

@rosahbruno rosahbruno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good and changes make sense - just a few nit-picky things

CHANGELOG.md Outdated Show resolved Hide resolved
glean-core/ios/Glean/Net/HttpPingUploader.swift Outdated Show resolved Hide resolved
glean-core/ios/Glean/Net/HttpPingUploader.swift Outdated Show resolved Hide resolved
My suspicion is that this is due to the recursive nature of `process` and `defer` being especially executed at the end of the scope and some interaction between the two.
@travis79 travis79 merged commit 8805658 into mozilla:main Mar 8, 2023
@travis79 travis79 deleted the Bug1819161 branch March 8, 2023 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants