Performance - Send Recursion #1290
Replies: 3 comments 19 replies
-
Hey @alexjameslittle, thanks for starting the convo! We know there are some things we can improve with the performance, and we believe the reducer protocol stuff helps a bunch. We've also been working with companies that have very large TCA applications to track down performance bottlenecks and improve instrumentation. We've even recently added a dedicated article about performance to our docs to address some of the most common pitfalls we see people fall into. Also, thanks for providing the demo app to play around with, so that we all have a common thing to look at when discussing performance.

There's a lot to respond to in your post; some of it is specific to your demo and some of it is general to how applications are built and how/when to use certain tools.

First, when discussing performance I think it would be good to have hard numbers to reference. For example, the demo app's performance can currently only be measured by a feeling of lagginess. It runs quite smoothly in my simulator, but then again I have an M1 Pro. If I make changes to improve the performance, it's going to be difficult to compare before and after.
Now, technically I didn't see any performance issues in the demo, but putting that aside, the application described is not what I would call "simple". 😅 It's an application with 7 layers of behavior ending in a list of 10,000 items, each of which has its own behavior, such that any layer can inspect what is happening to any child layer and any change to a child layer is instantly visible to every layer above it. Recreating such a thing in vanilla SwiftUI is likely to hit all the same problems you might run into with TCA. It's just that people don't typically build vanilla SwiftUI applications in that style. Instead, people use islands of isolated state and behavior.

Now, we would love it if we could expose all of that functionality without incurring a performance cost, but honestly that may not even be feasible. There's definitely low-hanging fruit that we can address (and we're starting to), but at the end of the day there may just be a performance wall if every detail, even at the smallest leaf, is put into the global store. And so in those cases maybe we can come up with tools to eject you from the infinitely observable closed system into something not as nice, but still powerful. One such tool could be an official way of splitting off disconnected stores to work on a little bit of state, and then somehow communicating that change back to the main store. We're not sure what that communication mechanism will look like, but we think the new dependency management style could help here.
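To make the "disconnected store" idea concrete, here is a hedged sketch of one possible shape: a leaf view owns its own small, self-contained Store, and changes are reported back to the parent through a plain closure rather than through the global reducer hierarchy. The `ItemFeature` names, the `WithViewStore(_:observe:)` initializer, and the callback shape are illustrative assumptions, not an existing TCA API, and the exact library APIs vary by version:

```swift
import ComposableArchitecture
import SwiftUI

// Hypothetical leaf feature with its own tiny, self-contained reducer.
struct ItemFeature: ReducerProtocol {
  struct State: Equatable { var isFavorite = false }
  enum Action: Equatable { case favoriteTapped }

  func reduce(into state: inout State, action: Action) -> EffectTask<Action> {
    switch action {
    case .favoriteTapped:
      state.isFavorite.toggle()
      return .none
    }
  }
}

struct ItemCell: View {
  // A "disconnected" store: created locally, never plugged into the
  // app-level reducer, so sends never traverse the global hierarchy.
  let store: StoreOf<ItemFeature>
  // Changes are communicated upward via a plain closure instead of a
  // parent action running through every layer of reducers.
  let onFavoriteChanged: (Bool) -> Void

  var body: some View {
    WithViewStore(self.store, observe: { $0 }) { viewStore in
      Button(viewStore.isFavorite ? "★" : "☆") {
        let newValue = !viewStore.isFavorite
        viewStore.send(.favoriteTapped)
        onFavoriteChanged(newValue)
      }
    }
  }
}
```

The interesting design question, as noted above, is the reverse channel: here it is a bare closure, but any real tool would need a principled way to merge the local change back into the main store.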
Judging from the screenshot above, I believe the app must have been running in debug mode rather than release. If you re-run in release you will find that there are 88 significant stack frames (i.e. starting at the viewStore.send) and 9 of them get inlined. So that's a bit better, not huge, but it's also always best to run in release mode when discussing performance. Do you see the same performance problems when running in release? The good news here is that naively porting the app to the new reducer protocol should improve things here too.
There's nothing in TCA that requires every view to hold onto its own store. Some views are mostly inert and can get by with just some data. And if their behavior is quite limited, maybe their actions can be exposed with some simple action closures, as many vanilla SwiftUI components do.

So, in the situation you described with a calendar, I would question whether each cell needs its own store of behavior. And if it did, what exactly is it doing in its onAppear and onDisappear actions? It seems the more likely scenario is that a parent domain could handle that work on behalf of the cells.

And more generally, if a reducer is being defined for a domain that doesn't actually have much significant logic and doesn't execute any effects, then maybe it doesn't need to exist and the view can be "dumb". Such "dumb" data components can manage all kinds of internal state that doesn't need to be representable in the global state, just as local @State is used in vanilla SwiftUI views.
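As a minimal sketch of such a "dumb" cell (names are illustrative): it receives plain data and an action closure from its parent, and keeps purely presentational details in local @State that never touches the global store:

```swift
import SwiftUI

// A presentation-only cell: no store, no reducer. The parent passes in
// data and a closure, and purely visual state stays in local @State.
struct CalendarCellView: View {
  let day: Int
  let isSelected: Bool
  let onTap: () -> Void

  // Internal, presentation-only state that never needs to be
  // representable in the global app state.
  @State private var isPressed = false

  var body: some View {
    Text("\(day)")
      .padding(8)
      .background(isSelected ? Color.blue.opacity(0.3) : Color.clear)
      .scaleEffect(isPressed ? 0.9 : 1)
      .onTapGesture { onTap() }
      .onLongPressGesture(
        minimumDuration: .infinity,
        pressing: { isPressed = $0 },
        perform: {}
      )
  }
}
```

Because nothing here sends actions into a store, scrolling through thousands of these cells never touches the reducer hierarchy at all.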
While this is true, the alternative of running all state mutations on a non-main thread isn't without its gotchas either. Such a situation can easily lead to a backlog of actions waiting to be processed, producing a weird UI experience that always seems to be catching up. At the end of the day, we think state mutations do need to be serialized, and that the focus should be on making reducers run as quickly as possible. So any heavy work should be moved to an effect and fed back into the system via an action. We do think there is something to running the store off the main thread, and it's something we want to explore further.

Sorry for dumping so much all at once, but to summarize my thoughts a bit: measure performance with hard numbers in release mode, question whether every leaf view truly needs its own store, keep reducers fast by moving heavy work into effects, and know that we're actively researching tools (such as disconnected stores) for the cases that remain.
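Concretely, "move heavy work to an effect" means the reducer itself returns quickly, while the expensive computation runs asynchronously and feeds its result back as an action. A sketch using ReducerProtocol-era APIs (the feature names and `expensiveSearch` helper are made up, and effect constructors vary by library version):

```swift
import ComposableArchitecture

struct SearchFeature: ReducerProtocol {
  struct State: Equatable {
    var query = ""
    var results: [String] = []
  }
  enum Action: Equatable {
    case queryChanged(String)
    case resultsComputed([String])
  }

  func reduce(into state: inout State, action: Action) -> EffectTask<Action> {
    switch action {
    case let .queryChanged(query):
      state.query = query
      // Keep the reducer fast: hand the heavy work to an effect, which
      // runs outside the synchronous send...
      return .run { send in
        let results = await expensiveSearch(matching: query)
        // ...and feed the result back into the system as an action.
        await send(.resultsComputed(results))
      }

    case let .resultsComputed(results):
      state.results = results
      return .none
    }
  }
}

// Placeholder for whatever expensive work the feature actually does.
func expensiveSearch(matching query: String) async -> [String] {
  []
}
```

The mutation itself (`state.results = results`) stays serialized on the store, but the expensive part no longer blocks the send.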
Thanks again for bringing this up, and we will regularly report back as we research more ways of improving the performance. 😁
-
Thank you for the very in-depth response @mbrandonw. I think it's more than reasonable to assume a cell in a grid or list will have to do synchronous or asynchronous work as soon as it appears, or even just before it appears; this is what UICollectionView's prefetching APIs are often used for. Even in a normal (non-TCA) collection view or grid, such as a photo gallery, the cells will be responsible for fetching data such as remote images asynchronously.

I agree, however, on the point that the parents don't necessarily need to listen to changes to the children/leaf nodes. This is what the discussion was meant to highlight: the library may need a solution for complex projects such as ours, with many routes and infinite scrolling lists. I think it would be great for TCA to have a baked-in solution for this problem, especially as you mentioned you've seen similar approaches to isolated stores implemented in other projects.

The idea of a disconnected store within TCA is great, and is something we've been playing with and implementing around the whole app to achieve better performance. We are actively looking into a delegate action pattern for this problem, to allow the leaf/child reducer to still delegate back up to the parent where necessary.

A real-world example I would use is a Router that is responsible for potentially recursive navigation, such as how you can browse on TikTok: profile -> post -> comment -> profile -> post -> comment -> profile. You will probably agree that keeping all of this state in one large app state is destined for performance/memory issues, especially if a profile is responsible for displaying hundreds or thousands of posts. I believe this is a great case for a detached/isolated store.

I am looking forward to seeing any future updates from Point-Free or the community on this topic, and we will also regularly report back to this discussion with any findings or potential solutions we find along the way.
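The delegate action pattern mentioned above can be sketched as a child reducer that exposes a dedicated Delegate action case: the parent matches only on those, and the child's internal actions never concern it. The feature and case names here are illustrative, and the `.send` effect constructor varies by TCA version:

```swift
import ComposableArchitecture

struct ProfileFeature: ReducerProtocol {
  struct State: Equatable { var username = "" }

  enum Action: Equatable {
    case postTapped(id: Int)   // internal detail; the parent ignores it
    case delegate(Delegate)    // the only surface the parent observes

    enum Delegate: Equatable {
      case openPost(id: Int)
    }
  }

  func reduce(into state: inout State, action: Action) -> EffectTask<Action> {
    switch action {
    case let .postTapped(id):
      // Translate an internal event into an explicit message for the parent.
      return .send(.delegate(.openPost(id: id)))

    case .delegate:
      // Delegate actions are for the parent; the child never handles them.
      return .none
    }
  }
}

// In the parent reducer, only the delegate surface is matched:
//
//   case let .profile(.delegate(.openPost(id))):
//     // push the post screen, etc.
//     return .none
//
//   case .profile:
//     return .none
```

This keeps the child's full action vocabulary private while still giving the parent a small, well-defined channel to observe, which pairs naturally with a detached store for the child.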
-
@alexjameslittle Just a side note on your demo: while I couldn't reproduce much of an issue on iOS, building your example on macOS had a noticeably slow frame rate while scrolling. Still, it's probably important to evaluate what state and actions truly need to be in the global store and which can be kept more local.
-
At Lapse we've been using TCA in production since the inception of the project. We are currently deeply invested in the architecture and have really enjoyed using it in conjunction with SwiftUI. As time has gone on, we find ourselves building more features of increasing complexity, and we have started noticing a large number of performance issues, which is forcing us to build certain features using a different pattern or using completely detached stores altogether.
These performance issues are very easy to replicate by setting up a list of 10,000 items within a deeply nested state, with an onAppear action and an onDisappear action for each cell in the list. In the example I have linked to, pausing execution mid-scroll results in a stack frame count of well over 100. The stack trace also shows that the issue is due to send recursion on the main thread.
I'm defining send recursion as the process of an action being sent from a deeply nested child through all layers of the TCA hierarchy. The action has to run through every layer of reducers, all the way from the top AppState down through 7 levels, before the changes we need can be applied to the child state in the list. In an example such as a calendar inside a UICollectionView, as you scroll, 7 items appear while another 7 disappear. This quickly adds up and means scrolling will be impacted even on the most powerful new iPhones.
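The layering that produces this recursion can be sketched as nested reducer composition: every child action is wrapped in a parent action case, so one onAppear per cell walks the entire stack. An illustrative ReducerProtocol-style sketch with made-up names (a real app would have several more layers between App and Row):

```swift
import ComposableArchitecture

// Top-level domain: every feed action passes through here first.
struct AppFeature: ReducerProtocol {
  struct State: Equatable { var feed = FeedFeature.State() }
  enum Action: Equatable { case feed(FeedFeature.Action) }

  var body: some ReducerProtocol<State, Action> {
    Scope(state: \.feed, action: /Action.feed) {
      FeedFeature()
    }
  }
}

// Middle layer: routes each action to the right row out of 10,000.
struct FeedFeature: ReducerProtocol {
  struct State: Equatable {
    var rows: IdentifiedArrayOf<RowFeature.State> = []
  }
  enum Action: Equatable {
    case row(id: Int, action: RowFeature.Action)
  }

  var body: some ReducerProtocol<State, Action> {
    EmptyReducer()
      .forEach(\.rows, action: /Action.row) { RowFeature() }
  }
}

// Leaf: the cell behavior that fires on every scroll tick. A send of
// .row(id:action:.onAppear) must traverse AppFeature and FeedFeature
// (and, in the real app, every intermediate layer) to reach this code.
struct RowFeature: ReducerProtocol {
  struct State: Equatable, Identifiable { let id: Int; var isVisible = false }
  enum Action: Equatable { case onAppear, onDisappear }

  func reduce(into state: inout State, action: Action) -> EffectTask<Action> {
    switch action {
    case .onAppear:    state.isVisible = true;  return .none
    case .onDisappear: state.isVisible = false; return .none
    }
  }
}
```

Each additional Scope or forEach layer adds constant work per send, but with 14 sends per scroll step (7 appearing, 7 disappearing) multiplied across 7 layers, the per-frame cost adds up quickly.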
By design, TCA's store processes all actions on the main thread. While this makes a lot of sense, it doesn't scale well to the complicated state managed by production apps. This is extremely problematic in the examples above, as it causes very large hitches in the render phase and the CPU quickly rises to 100%. If you also take into account that the user could have low power mode on, which throttles CPU usage, the lag becomes unbearable even on an iPhone 13 Pro Max.
At Lapse we believe this is the biggest issue facing TCA, as it is an extremely common requirement, from a business and product standpoint, to deliver perfect scrolling on a feed or infinite scrolling list. Seamless scrolling and fluid interactions/transitions are essential for a great user experience, especially with newer phones such as the 13 Pro having a much higher refresh rate and apps like Instagram, Snapchat and TikTok setting the bar high for quality and performance.
We are starting this discussion to open a conversation with the community around these issues, to learn whether other teams have come across anything like this, and to share any potential solutions or ideas for TCA going forward.
Example performance repo:
https://github.com/alexjameslittle/tca-performance-issues