New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stale reads make our conversations' messages count lag behind #127

Closed
Trolldemorted opened this Issue Feb 15, 2018 · 10 comments

Comments

Projects
None yet
2 participants
@Trolldemorted
Copy link
Contributor

Trolldemorted commented Feb 15, 2018

Imagine the following DbContext:

public static void Lock()
{
    GlobalSemaphore = new Semaphore(1, 1, "ourfancyexamplesemaphore", out bool b);
    GlobalSemaphore.WaitOne();
}

public static void Unlock()
{
    GlobalSemaphore.Release();
}

public static void PlaceOrder()
{
    Lock();
    try
    {
        using (var ctx = new FooDBContext())
        {
            Customer customer = ctx.Customers
                .Include(c => c.Orders)
                .First();
            customer.Orders.Add(new Order()
            {
                Whatever = "cookies"
            });
            customer.OrdersCount += 1;
            ctx.SaveChanges();
        }
    }
    catch(Exception e)
    {
        Debug.WriteLine("{0}: {1}", e.Message, e.StackTrace);
    }
    finally
    {
        Unlock();
    }
}

This code looks implies (at least to me) that OrdersCount will always be equal to the number of orders of each customer (if you do not delete records), even if you use it in multiple processes. However, if you call PlaceOrder in two processes often, OrdersCount will be lower than the actual orders of a customer.

The same thing is happening with our MessagesCount when the App and bg-task fight for the lock. Not super-often, but it happens. If it does, the conversation is borderline unusable. If you select it, the last few messages won't be displayed, the unread marker will be wrong, new messages will be appended but disappear once you reselect the conversation.

Comments very welcome!

  • is our semaphore usage wrong?
  • is our assumption that a new DbContext does not use old caches wrong?
  • is it a bug in ef core?
  • is it a bug in sqlite?
  • is my drive failing?

Full poc available here. Deploy the app, run the app, push the button, change the console client's db path to the one on your system, run the console client, observe db.

cc @golf1052 @SERVCUBED

@Trolldemorted Trolldemorted added the bug label Feb 15, 2018

@Trolldemorted Trolldemorted added this to the 0.3.0 public beta milestone Feb 15, 2018

@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Feb 18, 2018

Added timestamps. Apparently the lock isn't really a lock if both applications can get a hold of it at the same time...

[2018-02-18T11:12:04.4472573-08:00][UWP][12]: Started place order
[2018-02-18T11:12:04.4482559-08:00][UWP][12]: Attempting to create lock
[2018-02-18T11:12:04.4492575-08:00][UWP][12]: Lock creation status: False. Attempting to get lock.
[2018-02-18T11:12:04.4502565-08:00][UWP][12]: Lock got
[2018-02-18T11:12:04.4622692-08:00][UWP][12]: Customer order count BEFORE addition: 47
[2018-02-18T11:12:04.4623182-08:00][Console][1]: Started place order
[2018-02-18T11:12:04.4632612-08:00][UWP][12]: Customer order number BEFORE addition: 47
[2018-02-18T11:12:04.4635083-08:00][Console][1]: Attempting to create lock
[2018-02-18T11:12:04.4642569-08:00][UWP][12]: Customer order count AFTER addition: 47
[2018-02-18T11:12:04.4647733-08:00][Console][1]: Lock creation status: False. Attempting to get lock.
[2018-02-18T11:12:04.4659480-08:00][Console][1]: Lock got
[2018-02-18T11:12:04.4662562-08:00][UWP][12]: Customer order number AFTER addition: 48
[2018-02-18T11:12:04.4672709-08:00][UWP][12]: Customer order count AFTER increment: 48
[2018-02-18T11:12:04.4697093-08:00][Console][1]: Customer order count BEFORE addition: 47
[2018-02-18T11:12:04.4708298-08:00][Console][1]: Customer order number BEFORE addition: 47
[2018-02-18T11:12:04.4719322-08:00][Console][1]: Customer order count AFTER addition: 47
[2018-02-18T11:12:04.4729332-08:00][Console][1]: Customer order number AFTER addition: 48
[2018-02-18T11:12:04.4739024-08:00][Console][1]: Customer order count AFTER increment: 48
[2018-02-18T11:12:04.4934393-08:00][UWP][12]: Unlocking lock
[2018-02-18T11:12:04.4944382-08:00][UWP][12]: Unlocked lock
[2018-02-18T11:12:04.5002894-08:00][Console][1]: Unlocking lock
[2018-02-18T11:12:04.5014012-08:00][Console][1]: Unlocked lock
@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Feb 18, 2018

The above statement is false, the Console semaphore and the UWP semaphore are actually two different semaphores.

Yes, you can have duplicate named semaphores.

If I lock the semaphore in the Console before the while loop starts the Console can't get the lock. However if I call Semaphore.OpenExisting() in UWP instead of creating a semaphore I get the exception UnauthorizedAccessException. If I do the reverse, lock the semaphore in UWP and try to open it in the Console, the Console says the semaphore doesn't exist. Looks like UWP semaphores aren't really global. Going to try and see if UWP and UWP works.

@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Feb 18, 2018

If I create another UWP app and then try opening the existing semaphore I get a WaitHandleCannotBeOpenedException meaning that UWP apps cannot make globally visible semaphores. However I am fairly certain that the App semaphore works in locking out the background task and vice versa. So let me try a background task in the same app.

@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Feb 18, 2018

Yep, named semaphores between the main app and background tasks do lock as expected. Got no desync issues when testing. That doesn't really help narrow down the current problem in our app.

@Trolldemorted

This comment has been minimized.

Copy link
Contributor Author

Trolldemorted commented Feb 21, 2018

Can you reproduce the bug in Signal-Windows?

You can search for desynced conversations easily:

SELECT count(*) AS RealMessagesCount, SignalConversation.MessagesCount, SignalConversation.ThreadDisplayName
FROM Messages
JOIN SignalConversation ON SignalConversation.ThreadId = Messages.ThreadId
GROUP BY SignalConversation.ThreadDisplayName
HAVING RealMessagesCount != SignalConversation.MessagesCount
@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Feb 21, 2018

I don't have any issues with my DB. It's also probably because I don't have enough conversations for it to trigger on. Do you have any conversations that were fine before you posted this issue that are now out of sync?

We definitely need more logging so we can start to track down the issue, and it should be fairly easy to notify us if a desync issue has occurred so we can immediately check the logs.

@Trolldemorted

This comment has been minimized.

Copy link
Contributor Author

Trolldemorted commented Feb 21, 2018

New theory, imagine the following:

  • a message with an attachment arrives
  • the message is saved, conversation updated, StartDownloads starts a new task that handles the download
  • a new message in the same conversation arrives
  • the new message is saved, conversation updated
  • the task started by StartDownloads calls UpdateAttachmentLocked, which uses EF core's update function.

Update doc:

Begins tracking the given entity in the Microsoft.EntityFrameworkCore.EntityState.Modified state such that it will be updated in the database when Microsoft.EntityFrameworkCore.DbContext.SaveChanges is called. All properties of the entity will be marked as modified. To mark only some properties as modified, use Microsoft.EntityFrameworkCore.DbSet.Attach to begin tracking the entity in the Microsoft.EntityFrameworkCore.EntityState.Unchanged state and then use the returned Microsoft.EntityFrameworkCore.ChangeTracking.EntityEntry to mark the desired properties as modified. A recursive search of the navigation properties will be performed to find reachable entities that are not already being tracked by the context. These entities will also begin to be tracked by the context. If a reachable entity has its primary key value set then it will be tracked in the Microsoft.EntityFrameworkCore.EntityState.Modified state. If the primary key value is not set then it will be tracked in the Microsoft.EntityFrameworkCore.EntityState.Added state. An entity is considered to have its primary key value set if the primary key property is set to anything other than the CLR default for the property type.

  • ef core searches the attachment resursively
  • attachment has a message
  • message has an author
  • the author is the record representing the conversation
  • the SignalContact object instance was created in IncomingMessages via GetOrCreateContactLocked

=> we could be overwriting the changes that were written in the message handler! ❗️

@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Feb 22, 2018

Was able to reproduce this, pushed fix here c563eb7

Definitely need to read more about EF to know if this is the preferred way.

@Trolldemorted

This comment has been minimized.

Copy link
Contributor Author

Trolldemorted commented Feb 22, 2018

Can't we just fetch a new record from the db, and update only the values we want? It works like a charm for messages and conversations

@golf1052

This comment has been minimized.

Copy link
Contributor

golf1052 commented Mar 3, 2018

Since this was in the attachments branch that hasn't been pushed yet and the issue is known and has been fixed I'm closing this.

@golf1052 golf1052 closed this Mar 3, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment