- at Mailbox for about 2.5 years
- Worked on Mailbox infra, mostly the core email sync algorithms until Mailbox launch
- Mapping IMAP to a sane protocol
- Layering features on top of IMAP (threading, conversation parsing, delta diff, checkpoint IMAP sync)
- then moved to non-UI mobile (libmailbox)
- paving the way for more platforms, making longer term bets to be able to quickly build Mac Desktop, Android & other platforms
- migrate tens of thousands of LOC to C++
- websockets -> C++
- NSUserDefaults -> LevelDB + json11 + C++
- CoreData -> SQLite + C++
Rewrite CoreData... seriously?
It's very important to understand why we decided to migrate away from CoreData. CoreData is fast enough and powerful enough, but we needed Android and Mac desktop, and we will need Windows desktop. The immediate choice in front of us was to rewrite Mailbox in either Java or C++.
- SQLite is written in C
- API is much more difficult than CoreData (but query API is powerful, SQL)
- you need to build your own abstractions on top
- OR find a good (and thin!) SQLite wrapper
- C++ is complex / difficult
- NDK build system is hard / confusing
- the Java Native Interface (JNI) is terrible to work with
- this is a property of Java, not Android. Google has the ability to fix this specifically for Android
- I believe the community will build tooling to mitigate this
Can such a small team rewrite CoreData?
That is not what we tried to do. We didn't aim to rewrite CoreData (would be fun though). We didn't even try to rewrite just the parts Mailbox uses. We thought back to our initial goal (ship Mailbox on Android) and realized we only needed one thing: a persistence and query layer in C++.
SQLite pretty much already meets this description, but we also needed NSManagedObjectContextObjectsDidChangeNotification to maintain the delightful animation and fast processing that have defined the Mailbox product. Armed with SQLite, we only needed to work on delta changes: our C++ ObjectsChangedNotification.
3 major concepts
Query: roughly an NSFetchRequest. Running the same Query over and over, as the data changes, produces different DataViews
DataView: a sorted list of stuff, roughly the result set of a SQL query
ChangeSet: what was added, what was deleted, what was moved, what was updated
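To make the three concepts concrete, here is a minimal C++ sketch of how they could be modeled. All type and field names (`Row`, `Query`, `DataView`, `ChangeSet`, `Move`) are hypothetical, chosen for illustration, and are not taken from libmailbox.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row: a stable id plus whatever the UI displays.
struct Row {
    std::int64_t id;
    std::string title;
};

// Query: roughly NSFetchRequest. In this sketch it is just SQL text;
// running it again later may yield a different DataView.
struct Query {
    std::string sql;
};

// DataView: the sorted result of running a Query once.
using DataView = std::vector<Row>;

// A row that kept its identity but changed position.
struct Move {
    std::size_t from;  // index in the old DataView
    std::size_t to;    // index in the new DataView
};

// ChangeSet: what was added, deleted, moved, and updated
// between an old DataView and a new one.
struct ChangeSet {
    std::vector<std::size_t> deleted;   // indices into the old DataView
    std::vector<std::size_t> inserted;  // indices into the new DataView
    std::vector<Move> moved;
    std::vector<std::size_t> updated;   // indices into the new DataView

    bool empty() const {
        return deleted.empty() && inserted.empty() &&
               moved.empty() && updated.empty();
    }
};
```

Keeping deleted indices relative to the old view and inserted indices relative to the new view is an assumption here, but it matches how batch table updates are usually expressed.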
how concepts relate
- (time passes, something changes)
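Steps 1 and 3 (producing a DataView before and after the change) are essentially just running a SQLite query. Step 5 (applying the changes to the table) is already defined by UITableView & NSTableView (beginUpdates, ..., endUpdates). So we need to design an algorithm for step 4: deriving the ChangeSet.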
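As a rough illustration of the step that turns two DataViews into a ChangeSet, here is a deliberately naive C++ sketch. It is not the Mailbox algorithm: it assumes every row has a unique id, reduces a DataView to its ordered ids, and reports only deletes and inserts (move and update detection is left out). The names `DataView`, `ChangeSet`, and `diff` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// For this sketch a DataView is just the ordered list of row ids.
using DataView = std::vector<std::int64_t>;

struct ChangeSet {
    std::vector<std::size_t> deleted;   // indices into the old view
    std::vector<std::size_t> inserted;  // indices into the new view
};

// Naive diff: ids that disappeared are deletes, ids that appeared are
// inserts. Rows that merely changed position are NOT reported as moves.
ChangeSet diff(const DataView& before, const DataView& after) {
    std::unordered_set<std::int64_t> old_ids(before.begin(), before.end());
    std::unordered_set<std::int64_t> new_ids(after.begin(), after.end());

    ChangeSet cs;
    for (std::size_t i = 0; i < before.size(); ++i) {
        if (new_ids.count(before[i]) == 0) cs.deleted.push_back(i);
    }
    for (std::size_t i = 0; i < after.size(); ++i) {
        if (old_ids.count(after[i]) == 0) cs.inserted.push_back(i);
    }
    return cs;
}
```

For example, diffing ids `{10, 20, 30, 40}` against `{10, 40, 50}` yields deletes at old indices 1 and 2 and an insert at new index 2.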
an aside on DB replication
An interesting way to think about this problem is to imagine that UITableView is a replica of your data in SQLite that you want to keep consistent. Also, just like a database system, we want to be able to do this as efficiently as possible. Deriving this ChangeSet enables efficient updates (with animation) to the UITableView.
diffing 2 DataViews
What is the minimal set of changes between these 2 DataViews?
- Delete Clear
- Move Objc.io
What about this one?
- Delete Clear
- Delete Mailbox
Think about this, how would you express this to UITableView?
It turns out NSTableView wants option 1 and UITableView wants option 2. NSTableView wants a consistent view of the data after each update. We struggled with this problem for a bit. The easy solution was to write separate code for each, but that defeats the purpose of a shared library :(
However, if you apply the updates in the correct order, you don't need to do index shuffling...
[Delete(2), Delete(1)] works for both!
- Deletes in descending order
- Inserts in ascending order
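The ordering rule above can be checked with a small C++ sketch that applies updates one at a time, the way a table that wants a consistent view after each update would. Delete indices refer to the original list and insert indices refer to the final list, so the same numbers also work as a batch. The names `Rows`, `Insert`, and `apply_in_order` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Rows = std::vector<std::string>;

struct Insert {
    std::size_t index;  // position in the final list
    std::string value;
};

// Applies each update immediately to `rows`.
// Deletes run in descending order, so removing a row never shifts a row
// that is still waiting to be deleted. Inserts run in ascending order,
// so every position below the current insert is already final.
Rows apply_in_order(Rows rows,
                    std::vector<std::size_t> deletes,
                    std::vector<Insert> inserts) {
    std::sort(deletes.begin(), deletes.end(), std::greater<std::size_t>());
    for (std::size_t i : deletes) {
        rows.erase(rows.begin() + static_cast<std::ptrdiff_t>(i));
    }

    std::sort(inserts.begin(), inserts.end(),
              [](const Insert& a, const Insert& b) { return a.index < b.index; });
    for (Insert& ins : inserts) {
        rows.insert(rows.begin() + static_cast<std::ptrdiff_t>(ins.index),
                    std::move(ins.value));
    }
    return rows;
}
```

With rows `a, b, c, d`, deleting original indices 1 and 2 and inserting `x` at final index 0 and `y` at final index 3 produces `x, a, d, y`, with no index shuffling between steps.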
accidental nice things
After we built this and looked back on the solution, we realized we had accidentally accomplished a few things we didn't set out to.
- data locality
- perf improvements (not at first though, we stupidly ported over some optimizations that made CoreData faster, but SQLite slower)
- ability to trivially move away from SQLite (we separated ChangeSet calculation from Query completely)