Skip to content

Bug 1240041 - Fix Top Sites concurrency issues#1642

Closed
thebnich wants to merge 8 commits into
mozilla-mobile:masterfrom
thebnich:topsites
Closed

Bug 1240041 - Fix Top Sites concurrency issues#1642
thebnich wants to merge 8 commits into
mozilla-mobile:masterfrom
thebnich:topsites

Conversation

@thebnich
Copy link
Copy Markdown
Contributor

As mentioned this morning, the issue here is that sync triggers while we're acting on a stale UI, causing the UI to be out-of-sync with the data source count. There are two places this can affect us:

  1. A sync complete notification triggers while we're in the middle of a deletion. This calls refreshTopSites, which can results in inconsistency between the stale post-deletion data and the newly updated sync data.
  2. A partial sync happens while we're in the middle of a deletion. There's not necessarily a notification, but because we do a requery after deleting the site, we could inadvertently pull in Sync additions (or deletions!), or new pages that have since loaded in the background.

To address the first case, I proposed we simply drop sync notifications while the view is visible; @rnewman suggested postponing Sync (and other) refreshes until we're done editing. I went with the latter here since that's closer to our current behavior, and we can discuss how we want to handle Sync (and location change) updates in bug 1257592.

The fix to the second case is to simply act on the existing data source instead of pulling it in after a deletion, which can cause bad things to happen. Right now, for sync additions, we hit bug 1257291; for sync deletions, we'll crash.

@thebnich thebnich force-pushed the topsites branch 2 times, most recently from 1210e65 to e75ee34 Compare March 17, 2016 18:26
@sleroux
Copy link
Copy Markdown

sleroux commented Mar 17, 2016

I fired up the simulator with an existing synced account, went to delete the first top site, and got a crash:

_endItemAnimationsWithInvalidationContext:tentativelyForReordering:], /BuildRoot/Library/Caches/com.apple.xbs/Sources/UIKit_Sim/UIKit-3512.30.14/UICollectionView.m:4324
2016-03-17 14:38:55.193 Client[62167:4877457] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Invalid update: invalid number of items in section 0.  The number of items contained in an existing section after the update (12) must be equal to the number of items contained in that section before the update (12), plus or minus the number of items inserted or deleted from that section (0 inserted, 1 deleted) and plus or minus the number of items moved into or out of that section (0 moved in, 0 moved out).'
*** First throw call stack:
(
    0   CoreFoundation                      0x00000001070aee65 __exceptionPreprocess + 165
    1   libobjc.A.dylib                     0x00000001063cedeb objc_exception_throw + 48
    2   CoreFoundation                      0x00000001070aecca +[NSException raise:format:arguments:] + 106
    3   Foundation                          0x000000010601b4de -[NSAssertionHandler handleFailureInMethod:object:file:lineNumber:description:] + 198
    4   UIKit                               0x00000001086b4ac7 -[UICollectionView _endItemAnimationsWithInvalidationContext:tentativelyForReordering:] + 15347
    5   UIKit                               0x00000001086b0d2a -[UICollectionView _updateRowsAtIndexPaths:updateAction:] + 350
    6   UIKit                               0x000000011a819ce4 -[UICollectionViewAccessibility deleteItemsAtIndexPaths:] + 43
    7   Client                              0x00000001042bac08 _TFFC6Client13TopSitesPanelP33_C9DFE36B31E426C19626C23F17C1D22924deleteHistoryTileForSiteFS0_FTC7Storage4Site11atIndexPathCSo11NSIndexPath_T_U_FGO6Shared5MaybeT__T_ + 808
    8   Deferred                            0x000000010996b498 _TPA__TTRGrXFo_iq__dT__XFo_iq__iT__50 + 56
    9   Deferred                            0x000000010996b152 _TPA__TTRGrXFo_iq__iT__XFo_iq__dT__35 + 50
    10  Deferred                            0x000000010996b216 _TPA__TFFC8Deferred8DeferredP33_8471EEB727F82F3A241B76FBEA842E6D5_fillurFGS0_q__FTq_14assertIfFilledSb_T_U0_FT_T_ + 118
    11  libdispatch.dylib                   0x000000010a325e5d _dispatch_call_block_and_release + 12
    12  libdispatch.dylib                   0x000000010a34649b _dispatch_client_callout + 8
    13  libdispatch.dylib                   0x000000010a32e2af _dispatch_main_queue_callback_4CF + 1738
    14  CoreFoundation                      0x000000010700ed09 __CFRUNLOOP_IS_SERVICING_THE_MAIN_DISPATCH_QUEUE__ + 9
    15  CoreFoundation                      0x0000000106fd02c9 __CFRunLoopRun + 2073
    16  CoreFoundation                      0x0000000106fcf828 CFRunLoopRunSpecific + 488
    17  GraphicsServices                    0x000000010d3cdad2 GSEventRunModal + 161
    18  UIKit                               0x0000000107e4a610 UIApplicationMain + 171
    19  Client                              0x000000010441e8b6 main + 582
    20  libdyld.dylib                       0x000000010a37a92d start + 1
    21  ???                                 0x0000000000000001 0x0 + 1
)
libc++abi.dylib: terminating with uncaught exception of type NSException
(Recorded Frame) 

I have a feeling its crashing because I have more than 12 top sites and the deletion doesn't factor in that a new tile will come in after deleting the first one?

@sleroux
Copy link
Copy Markdown

sleroux commented Mar 17, 2016

I think it might have to do with this removal:

e75ee34#diff-abfc8d88f8fbb974154bce59e22c81a6L218

It checks to see if we have enough to add in another tile so instead of just a deletion its a delete + insert.

@thebnich
Copy link
Copy Markdown
Contributor Author

Nice catch -- I emptied my sites from testing deleting so much and forgot to try on a full set of sites ;)

@thebnich
Copy link
Copy Markdown
Contributor Author

Though that presents another question: what do we show?

If our cache is larger than the number of thumbnails that fit, we can use the next site in the list to fill the gap (though there's no guarantee it will be the same site after the actual requery). Best case, it matches. Otherwise, it doesn't, and the top sites change on us after we click Done.

Worse: what happens if the cache isn't larger than the number of thumbnails? We'll end up with gaps where the deleted sites are, and they won't be filled until we click Done.

@rnewman
Copy link
Copy Markdown
Contributor

rnewman commented Mar 17, 2016

I think the most consistent thing to do is that we let you delete whatever you see, and when you exit edit mode we fill in the gaps. But that's annoying: if your results are crap, you will have to re-enter edit mode multiple times to get rid of the junk on your top sites.

I think the least annoying thing to do, then, is to keep streaming in new results.

As @sleroux notes, to handle that you'd need to compute the new results, and in the same animation block switch the data source, diff, remove the deleted item, and then insert the new one.

We could more smoothly handle the common case by pre-fetching extras (perhaps we already do?), or pre-fetching the next N as soon as the user enters edit mode or has deleted N-M, with the goal that there's always a filled Deferred waiting with more items for us to futz with directly on the UI thread.

@thebnich
Copy link
Copy Markdown
Contributor Author

This is some gnarly stuff, but here goes (going out of order for ease of explanation):

Part 2: First, we move the refresh to happen after the deletion. This gives us a consistent data set to work with during the deletion while making sure we also keep our buffer up-to-date if we delete more. To prevent multiple quick deletions from conflicting with each other, we hold the userInteractionEnabled from start to finish. When we delete a site, we pull the replacement from last visible index in our data source.

But there's a problem: since we pull in the latest sites after each deletion, and since the sites/order can change for any Sync updates or location changes, we have no guarantees about which site we're getting back from the last visible index. It's possible that some new sites have shown up in the meantime, pushing down the existing visible sites, so the "new" site we're pulling in is actually a site we're already showing. That's where Part 1 comes in: we now update, rather than replace, the existing data source -- preserving the order in the process. That means any new sites will always be appended to the existing set of sites.

Finally, Part 3 is a simple extension to this userInteractionEnabled lock by ignoring any refreshes if the lock is set. We update after each deletion, so a refresh is right around the corner anyway.

@sleroux
Copy link
Copy Markdown

sleroux commented Mar 18, 2016

Logically it makes more sense than what was happening before but I was able to find two edge cases where this breaks down:

  1. Starting with the suggested sites, I kicked off a first time sync to pull down my data and when the data came in it appended itself to the end of the suggested sites. The sites should populate ahead of the suggested sites instead of behind.

1. I was able to get duplicate sites when quickly deleting top sites using two thumbs going up and down the list:

[bnicholson: edited to scale down images]

@thebnich
Copy link
Copy Markdown
Contributor Author

Thanks for testing. Yeah, I came across similar issues last night.

Two more parts:
Part 4: Since we were merging suggested sites each time the sites were updated, deleting a site could still cause jank (a reflow) if a new site came in while the panel was open. The reason is that our suggested sites merge logic puts suggested sites at the end, so a new site would jump ahead of any suggested site tiles at the end. This commit makes sure that updates move new sites to be after suggested tiles after deletion. In the case of deletions, I think we want new sites to come after suggested sites, since the alternative is reflow jank. But if we're doing a full sites refresh (e.g., after Sync), I think a reflow is expected, so we should handle that separately (see below).

Part 5: After adding Part 4, I noticed that new sites would always appear after suggested sites unless I opened Top Sites a second time (open the panel, Cancel, open the panel again). The reason is that we do two queries to the DB when we open the panel: the first pass gets the stale set of sites, and the second pass gets the updated set of sites. But since we hold onto the position of the suggested sites after the first pass, the setHistorySites merging will then put any new sites at the end for the second pass. The fix here is to invalidate our sites list whenever our cache is updated, meaning setHistorySites should do a replace, not a merge. Consequently, this also handles the case where we want to reflow after syncs as mentioned above (since a Sync will cause the cache to be updated).

So, long story short: we should never show suggested sites before normal sites if we're simply loading the panel (unless your top history sites actually include a suggested site domain, of course). Deletions may cause suggested sites to appear before normal sites because we want the animation to be smooth, and new sites may have been added to the DB since the panel was open. But that will fix itself the next time we open Top Sites. Depending what we decide for bug 1257592, that could also have an effect here.

Also, it was possible to hit duplicates because the performBatchUpdates could happen in parallel with the reloadTopSitesWithLimit since performBatchUpdates is async. Fixed this in Part 2 by returning a Deferred, filled when performBatchUpdates hits its completion handler.

self.deleteOrUpdateSites(result, indexPath: indexPath)
self.collection?.userInteractionEnabled = true
profile.history.removeSiteFromTopSites(site).uponQueue(dispatch_get_main_queue()) { _ in
// Remove the site from the current data source. Don't requery yet
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should check isSuccess on the input here. If the removal failed, bail out.

@rnewman
Copy link
Copy Markdown
Contributor

rnewman commented Mar 19, 2016

I hate to say "LGTM" on a PR like this, but: modulo those nits, if it stands up to the testing you've done, this looks fine to me.

@thebnich
Copy link
Copy Markdown
Contributor Author

Speaking of testing, I'll throw in a few UI tests that can verify some of the things here (order of incoming sites after deletions, maintaining the positioning of suggested sites, etc.).

@sleroux
Copy link
Copy Markdown

sleroux commented Mar 21, 2016

Did some more testing and I'm seeing a lot of flickering on the tiles when they are being reloaded. Seems that the tiles that come in after deletion have their favicons flicker on the next deletion. Were you seeing this as well? I'm not sure if this was happening before or not.

@rnewman
Copy link
Copy Markdown
Contributor

rnewman commented Mar 21, 2016

FWIW on all branches I've seen icons flicker in every time top sites loads, sometimes. We're probably doing something wrong somewhere :/

@thebnich
Copy link
Copy Markdown
Contributor Author

@sleroux Yeah, I see the flicker when deleting here. This happens because we now call reloadData() after the deletion, which invalidates and recreates all of the thumbnails. I think the simple thing to do is simply not call reloadData() after updating the data source with deletions (we'll still fetch the latest sites from the DB, we just won't invalidate the view). With this PR series, we take careful steps to make sure the data source and the thumbnails are always in sync during deletions, so there's no need to reload set of sites. Made this change in Part 6.

@rnewman Related: what's happening is we often fetch (and call reloadData()) twice each time we refresh Top Sites (which we do every time we open the panel). This will happen any time we're browsing and a new domain is added to history since that will invalidate the Top Sites cache -- and we don't update the cache until we're actually showing the panel.

Separate bug, but obvious options are:

  1. Don't do a double load. If the cache is invalidated, wait until we have the updated cache before calling reloadData(). This means the Top Sites panel might be blank for a short period before we show anything, but that still might be better than flicker.
  2. Building on option 1, we could auto-update the cache as domains are added to history so that we don't need to rebuild the cache while the panel is visible. We'd have to be careful how we do this -- we don't want a sync to trigger hundreds of top site requeries!

@thebnich thebnich changed the title Bug 1240041 - Don't refresh Top Sites while editing Bug 1240041 - Fix Top Sites concurrency issues Mar 21, 2016
thebnich added a commit that referenced this pull request Mar 21, 2016
Bug 1240041 - Fix Top Sites concurrency issues
@thebnich
Copy link
Copy Markdown
Contributor Author

Merged with tests and comments addressed: f39f2d8

@thebnich thebnich closed this Mar 21, 2016
@thebnich thebnich deleted the topsites branch March 21, 2016 22:52
isabelrios pushed a commit to isabelrios/firefox-ios that referenced this pull request Feb 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants