Replace local EFcore database backing with Realm #7057
The main value of EF lies in change tracking and cascade loading. If change tracking is unacceptable, EF will not be so friendly.

EF also has a large startup overhead and memory presence (mostly reflection related). I understand that change tracking can actually improve performance, but as mentioned, our threading model is very non-standard and leads to headaches when attempting to use it. And at the end of the day, SQLite is a bad backend for EF or anything else, as it doesn't type so nicely, nor migrate.
As a side note for anyone looking to test the attached realm branch, please build it twice for now to bypass the compile-time errors (there should be three remaining). It still works as expected. Also ignore the errors in test projects; I have not updated tests yet.
Do you mean `PropertyChanged` calls? It is not that bad.

I've written helpers like Prism's in my UI application. The experience is still bad, because it still requires 6 lines for one property. As peppy said, change tracking is not the only problem, and not even the major one.
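For context, the boilerplate in question is the standard `INotifyPropertyChanged` pattern; a minimal sketch (names purely illustrative):

```csharp
using System.ComponentModel;

public class BeatmapViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string title;

    // Each bindable property needs a backing field, a getter, a setter
    // with an equality check, and a change notification - six-odd lines
    // of ceremony per property.
    public string Title
    {
        get => title;
        set
        {
            if (title == value) return;

            title = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Title)));
        }
    }
}
```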
You're welcome to attempt that. It doesn't work without fixes (many other users have reported the same).
Things are feeling good, so I began looking into the realm filesize issue. The problem here is that we are holding realm contexts open on all background threads indefinitely. My first thought was to clean them up, so I added read usage tracking so we could use a finalizer. This doesn't work, and `RealmWrapper` may not be the solution, in its current form anyway. Two paths forward are: Return
I remember having file size issues with Realm earlier this year, and it was mostly resolved by ensuring we didn't do too many bulk writes. IIRC, since transactions are copy-on-write, doing a large transaction will reserve a lot of space in the DB file. In our case we were moving around hundreds of megabytes of serialised data.
That's not the issue; it's definitely what I explained in my last post. I've already ensured transactions are of realistic sizes (one beatmap set at a time).
Realm 6.0 beta details have now been released. Not yet available via .NET API but at a glance, "Frozen Objects" look to solve the one major issue I was facing until now. I will continue keeping my realm branch up-to-date with master, but not make any further changes until 6.0 becomes available for our consumption. |
Moving out of milestone for now. Will become a current task once we have Realm 6.0 to test against. |
Thanks for attempting to be useful, but it has only been released for cocoa, not .NET. |
Oddly enough, it says that in January listing it as GA meant that .NET would be supported, but according to your comment, it's not. The Realm blog hasn't been updated since the Realm 6.0 post, either.
Please refrain from these comments. Software takes time and it's not ready yet. You can rest assured I'm following progress.
Just as an update, it looks like May at earliest for a revisit to this. |
Over the past five days I've dedicated a large block of time to looking at alternatives to EF Core. This is a consideration we have been talking about internally for a while now, for the following reasons:
`store.Refresh()` must be called to retrieve a "live" version which can be modified and updated correctly.

The first step was to decide whether we should be sticking with SQLite.
Pro:
Con:
Attempt 1: Dapper + LiteDB
In an initial effort to move forward with simplicity in mind, I attempted switching to Dapper + LiteDB. LiteDB is a very lightweight (and not heavily maintained or used) project which allowed me to quickly test things out without changing our models too much.
The branch above is in a mostly working state, but brought up a few crippling issues:
All queries need to be converted to raw SQL.
While this isn't a killer, it feels like a step backwards in terms of code quality, readability and maintainability. Rider will give inspections on the SQL to ensure correctness to some extent, but this relies on the table name being present in the queries (which it isn't in my branch, to use our existing store layout).
SQLite bugs prevail
I ran into this issue. It has a multi-year history and no sign of being fixed without local workarounds. This isn't necessarily Dapper's fault either – it's a shortcoming in the communications between the sqlite native/wrappers and the Dapper mappings which can't easily be resolved without a workaround that would affect other database engines too.
At this point I stopped looking towards dapper as it didn't feel like the correct direction, and would leave a lot of the issues that we have with EF core.
Attempt 2: Realm
Realm is a lightweight database that boasts a "zero-copy" architecture. All queries on the database return live `IQueryable<T>` instances, and all models and relations are lazy-marshalled on access. This means very low memory and allocation overhead on retrieval. It also means you do not need to specify relations to include on retrieval – you have access to all of them to an infinite recursion depth. Here's some good further reading to get up to speed on the points of pain that can be encountered with realm.

For the record, this is my second attempt at using realm. This time I was more confident, as over the last year I have been thinking about how we can work around the limitations it has which affect us:
RealmObject cannot be derived more than once

One example which required changes is `DatabasedKeyBinding : KeyBinding : RealmObject`. This can be quite easily solved by flattening the inheritance somewhat. Another was `LegacyScoreInfo`, which tidied up very nicely, resulting in better code quality and better defined behaviour than before.

There are a few remaining cases such as `APIBeatmap`, which is deriving `BeatmapMetadata` – this is a very ugly inheritance and one which I believe we should be looking to fix anyway. I believe this limitation will actually benefit us from a code quality perspective, and make our data models more portable.
Realm is thread-safe, with caveats

Each thread needs its own realm context. This is the same as EF and slots in nicely in our `DatabaseContextFactory` with no issue. The caveat is that objects retrieved on one thread cannot be accessed on any other thread. This is something we do absolutely everywhere. Consider a simple class:
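A minimal sketch of what such a class looks like under realm (model and property names here are illustrative, not the actual osu! models):

```csharp
using System;
using Realms;

public class BeatmapInfo : RealmObject
{
    [PrimaryKey]
    public Guid ID { get; set; } = Guid.NewGuid();

    public string Title { get; set; }
}

// var beatmap = realm.All<BeatmapInfo>().First();
//
// Passing `beatmap` to another thread and touching any property there
// throws, because realm objects are confined to the thread (and realm
// instance) that retrieved them.
```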
Another example is that we store a `Bindable<BeatmapInfo>` at an `OsuGame` level. Without special attention the whole game breaks due to these cross-thread accesses.

I did a good amount of reading on how realm should be used, and it seems to favour the mobile app development model, where you:
For us, we use a different model with our `BackgroundDependencyLoader` async flow. We expect visuals or UI elements to take longer than a frame to load in some cases and don't want to be inhibited by the main UI thread. This is a very different approach, but one that will not be changing any time soon, as it is one of the core features/paradigms of osu-framework.

So, we have two options available to use realm in our ecosystem:
Detach all objects from the database

Copying the data from realm into an in-memory version would completely solve this issue with no side-effects, but does limit the usefulness and performance benefits of realm (and in some people's eyes is outright blasphemy).

While I agree we definitely shouldn't be doing this everywhere, as the copy-to-memory overhead is actually quite large when compared to EF core, it does have its place. In my implementation so far there are two use-cases which require this:

- Code which works on a `BeatmapInfo` but does not wish to affect the databased version.
- `RulesetStore` and `RulesetInfo` in general, which are passed around and available game-wide as a bindable, but (currently) never change. The overhead of using a realm-backed object here would be higher than detaching/copying, even if threading was not an issue.

I have implemented this using AutoMapper, via the following implementation:
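The actual mapping configuration isn't reproduced here, but the general shape of an AutoMapper-based detach is roughly as follows (the profile contents and the `Detach` name are assumptions, not the exact osu! code):

```csharp
using AutoMapper;
using Realms;

public static class RealmExtensions
{
    private static readonly IMapper mapper = new MapperConfiguration(cfg =>
    {
        // Every model (and every collection/relation member) needs an
        // explicit map, with lookup depth capped to avoid runaway
        // traversal of circular references.
        cfg.CreateMap<BeatmapInfo, BeatmapInfo>().MaxDepth(2);
        cfg.CreateMap<BeatmapMetadata, BeatmapMetadata>();
        cfg.CreateMap<RulesetInfo, RulesetInfo>();
    }).CreateMapper();

    // Copies a realm-backed object into a standalone in-memory instance.
    public static T Detach<T>(this T item) where T : RealmObject
        => mapper.Map<T, T>(item);
}
```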
Mapping specifications must be made for each model, including collection members and lookup depth. I believe the above can be further optimised by reducing inclusions further, as I believe there are some circular references present at a lookup depth of 2.
Re-fetch objects on each thread on use
This is the recommended approach, and realm actually provides an API for passing objects between threads. While that API doesn't work so well for us, because it requires explicit implementation at every usage, the concept can still be used by re-fetching via primary key on each thread.
Doing this manually would be quite a pain, so I came up with another proposal...
Enter RealmWrapper
I ended up creating a class to manage passing realm objects around inside osu!:
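The class itself isn't shown here, but its core idea – hold the primary key and re-fetch a live instance per thread – can be sketched roughly like this (everything beyond `Get()` is an assumption):

```csharp
using System;
using System.Threading;
using Realms;

public class RealmWrapper<T> where T : RealmObject
{
    private readonly Guid primaryKey;
    private readonly Func<Realm> contextFactory;

    // One instance per thread, since realm objects cannot cross threads.
    private readonly ThreadLocal<T> threadInstance = new ThreadLocal<T>();

    public RealmWrapper(Guid primaryKey, Func<Realm> contextFactory)
    {
        this.primaryKey = primaryKey;
        this.contextFactory = contextFactory;
    }

    // Returns a realm-backed object usable on the calling thread,
    // re-fetching by primary key the first time each thread asks.
    public T Get()
    {
        if (!threadInstance.IsValueCreated)
            threadInstance.Value = contextFactory().Find<T>(primaryKey);

        return threadInstance.Value;
    }
}
```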
This is the core of the realm implementation and probably the most important class. It allows us to use realm with minimal changes (initially), but also to expand our usage to include realm's observable changes and zero-copy optimisations as we require. In my branch (at the time of writing this) `RealmWrapper` is used correctly by the `BeatmapCarousel` and `MusicController`, as these are scenarios where performance is important.

All data stores should return `RealmWrapper<T>` (`BeatmapManager` already does in my branch). Consumers that want to benefit from the realm-backed object should accept a `RealmWrapper<T>` instead of a `T`, then "unwrap" it locally via `Get()` on usage.

What makes this tick:
- `Get()` will ensure you have a realm-backed object usable on the current thread, via a re-lookup using the primary key.
- Conversion to `T` allows it to be used in all existing usages. This will force a copy-to-memory via `Detach()`, so it does come with an overhead.

Further improvements I would like to make to `RealmWrapper<T>`:

- Rename to `Live<T>` or `LiveData<T>` to better denote what it is.
- `ThreadLocal` is probably overkill. I think storing the last instance is enough (and will be more memory/performance efficient).
- Moving the `IsSameInstance` check into `Get()` depends on realm instance lifetime management direction.
- Expose via `IBindable` or similar if we can.
- Add a ctor which creates an unmanaged `RealmWrapper` via `WrapAsUnmanaged`. This allows classes provided with models (especially tests) to do so without explicitly changing the code to wrap inside the `RealmWrapper` class.

Performance
Using the above two methods combined, my branch is in a usable state for importing beatmaps and playing the game. Some auxiliary functionality, like selecting the current skin in settings, will still crash the game. I wanted to get the song select and gameplay loop working so I could benchmark the new structure and ensure we have not regressed before continuing any further.
Startup
On a multi-core system, we save around 1.5 seconds of startup time which used to be consumed by EF core. I believe the benefit will be larger with fewer cores.
Startup memory usage is also lower than EF:
...although things seem to even out after a longer play session (this needs further investigation).
Property retrieval via realm
Realm is designed to run on non-spinning, solid-state storage media. In order to test realm under the worst scenario possible, I ran some read-heavy benchmarks on an average-speed USB drive.

The benchmarks covered various ways of reading properties from a `RealmObject`, including fetching a `RealmObject` using a primary key (in a table with 10,000 rows) – basically what we will be doing on each thread when accessing a `RealmWrapper<T>` for the first time.

With no external load:
Worst case is around one order of magnitude difference in property retrieval performance.
Let's try again with high external write load induced by `dd if=/dev/urandom of=/Volumes/UNTITLED/test-output bs=1024 count=40000000000` (100% IO saturation):

Thankfully, it looks like either realm or the OS file cache is enough to ensure little performance deviation. Writes, on the other hand, are noticeably slower (this was noticeable during the test setup when inserting rows), but this is to be expected and out of the scope of testing for now (writes will generally be async, so are not relevant).
In a more optimal scenario, here's the same benchmarks on my desktop with SSD backing:
It's safe to say that for read workloads, at least on macOS under my testing conditions, there is no issue with slower IO. This should probably be tested on Windows for comparison, but I think it should behave sanely enough to not be an issue.
I will add more benchmark results as to how this translates into usage in osu! as they become available to me, but the following shows what a worst-case scenario can look like (this is around 10 filters running at song select, doing >100k comparisons and many more property retrievals).
Disk space usage
Right now, realm is consuming more space on disk than I would hope for. Initially, the realm database was 2 GB after an import of 564 beatmap sets. This was somewhat alleviated by reducing the number of transactions during the import process (196 MB after). Performing a manual compact on the realm database brings it within sane bounds, at about 75% the size of our SQLite database:
There's existing documentation online about database file bloat, and this will require further reading and investigation, but as it can be resolved by a manual compact after large import processes, it is not a blocker.
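For reference, the manual compact in question can be performed via Realm's static `Compact` method (the configuration/filename shown is an assumption):

```csharp
using Realms;

// Compaction requires that no other instances of this realm are open.
// Returns false if the file could not be compacted.
bool compacted = Realm.Compact(new RealmConfiguration("client.realm"));
```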
Remaining tasks
- `Realm.Refresh()` to be run regularly on the main thread, and not run on every `Get()`
- `ItemAdded` / `ItemRemoved` to remove schedules
- `PrimaryKey`s, including fixing `RulesetInfo`'s ID
- `long` primary key instead of GUID (realm now supports guid efficiently, so we'll stick with that)
- `RealmWrapper` improvements (as listed above)
- Remaining inheritance cases (`APIBeatmap : BeatmapMetadata` and `DummyRulesetInfo : RulesetInfo`)

Closing remarks
I have spent around 50 hours on this so far (~40 of that in realm). I do still plan to continue investigating realm, as it does seem to be, at the very least, objectively a better solution than EF core.

Until now, our models and stores have been quite haphazard. In some cases, I'm not even sure how things are working correctly. The threading model realm uses forces a degree of consistency and structure that will help us better define our data usage.
At the time of writing, I estimate another 50 hours of work required to get the full game into a flawlessly working state, including addressing all the points above. The remaining work feels like the downhill part of this journey – I am only posting this now because most of the points of pain have been resolved.
I plan to do this over the coming weeks.
2021 progress on this task