
About nested chat messages query efficiency #2838

Closed
luohao123 opened this issue Jan 16, 2024 · 8 comments


@luohao123

Hi, I have a nested chat messages query: it needs to fetch messages by roomId, along with the messages that reply to them.

Currently, as suggested by the community, I have a working version like this:

 Stream<List<DbMessageWithDbContactOrRoomMember>>
      getMessagesWithRoomMemberByRoomId(String roomId,
          {int perPage = 25, int pageNum = 0}) {
    // debugPrint("[dao] roomid: $roomId, perPage: $perPage, pageNum: $pageNum");
    final outerMessages = alias(messages, 'm');
    final replies = alias(messages, 'i');
    final selectReplies = selectOnly(replies)
      ..addColumns([
        FunctionCallExpression('json_group_array', [replies.id])
      ])
      ..where(replies.originalId.equalsExp(outerMessages.id));
    final replyIds = subqueryExpression<String>(selectReplies);
    final query = select(outerMessages).join([
      leftOuterJoin(
          roomMember,
          roomMember.id.equalsExp(outerMessages.fromId) &
              roomMember.roomId.equalsExp(outerMessages.roomId))
    ])
      // ..limit(perPage, offset: pageNum * perPage)
      ..limit((pageNum + 1) * perPage, offset: 0)
      ..addColumns([replyIds])
      ..where(outerMessages.roomId.equals(roomId))
      ..orderBy([
        OrderingTerm(
            expression: outerMessages.sendTime, mode: OrderingMode.desc)
      ]);
    return query.watch().map((rows) {
      final messageById = {
        for (final row in rows)
          row.read(outerMessages.id)!: row.readTable(outerMessages),
      };

      return rows.map((row) {
        final id = row.read(outerMessages.id)!;
        DbMessage dbMessage = messageById[id]!;
        final original = switch (dbMessage.originalId) {
          null => null,
          var id => messageById[id],
        };
        final childIds =
            (json.decode(row.read(replyIds)!) as List).cast<String>();
        DbMessageExtra dbMessageExtra = DbMessageExtra.fromDbMessage(
          dbMessage,
          // originalityMessage: null, repliedMessages: null
          originalityMessage: original != null
              ? DbMessageExtra.fromDbMessage(original,
                  repliedMessages: null, originalityMessage: null)
              : null,
          repliedMessages: [
            for (final childId in childIds)
              if (messageById.containsKey(childId))
                DbMessageExtra.fromDbMessage(messageById[childId]!,
                    repliedMessages: null, originalityMessage: null)
          ],
        );
        // print(dbMessageExtra);
        return DbMessageWithDbContactOrRoomMember(
          dbMessageExtra,
          roomMember: row.readTableOrNull(roomMember),
        );
      }).toList();
    });
  }

But I found that it causes an efficiency issue in my app once a room reaches about 500 messages.

If there aren't many messages it loads in a blink, but with more than about 500 messages it takes around 1 s.

However, I have already limited the query to 25 rows per page.

Can you suggest some way to make it faster? At least it shouldn't take this long when the limit is 25.

@simolus3
Owner

Why did you comment out offset: pageNum * perPage? Isn't that required for pagination?

As for the actual issue, nothing obvious stands out. Are you using foreign keys for originalId and roomId? Did you enable foreign keys with a pragma? That would generate some indexes which might make this query faster.
Alternatively, you could add an index on the originalId column to find replies more efficiently - but I don't know if that's the actual bottleneck. If you use the profiler from the DevTools, do calls in the sqlite3 package stand out as particularly slow? (If we spend a second on the query, it should be visible in the profiler.)
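
In case it helps, here is a minimal sketch of enabling that pragma in drift; the surrounding database class and migration setup are assumed here, not taken from your code:

// Inside the @DriftDatabase class.
@override
MigrationStrategy get migration => MigrationStrategy(
      beforeOpen: (details) async {
        // sqlite3 keeps foreign keys disabled by default, so turn them on
        // for every connection before the database is used.
        await customStatement('PRAGMA foreign_keys = ON');
      },
    );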

@luohao123
Author

@simolus3 The reason I'm not using pagination, but instead simply querying 20, 40, 60, 80 and so on, is that I can't tell whether a batch is the last one; that's complicated to determine on the UI side, so I just query everything up to the current page.

But that's not the real question here, since the lag already happens on the first batch, where only 25 rows are queried.

I'm not using a foreign key; originalId is just a plain column on the messages table, not a foreign key.

To be more specific:

  1. The slowness only happens when a room has more than about 500 messages; with just 100-200 messages, the query comes back in a blink.

Alternatively, you could add an index on the originalId column to find replies more efficiently

Will this make it faster?

I think this code basically works, but I'd like it to be a bit faster.

@simolus3
Owner

The slowness only happens when a room has more than about 500 messages; with just 100-200 messages, the query comes back in a blink.

This sounds like the query runtime might be quadratic in the number of messages being looked up. A missing index on originalId is one possible explanation for this, since for every message found, sqlite3 has to iterate over all messages again just to check whether they are a reply.

So you can try adding an index on originalId and see if it improves things. You can also pull the database file from the device and run the query locally (or use EXPLAIN) to see what sqlite3 is doing.

@luohao123
Author

@simolus3 Thank you.

A missing index on originalId is one possible explanation for this

So the missing index on originalId causes repeated scans, which makes it quadratic.

Do you know how I can add such an index to my table? Do I need to write a migration?

@luohao123
Author

Also, after thinking about it a bit more, this part could cause very serious repeated querying: no matter which batch I query, it scans through all the data:

final selectReplies = selectOnly(replies)
  ..addColumns([
    FunctionCallExpression('json_group_array', [replies.id])
  ])
  ..where(replies.originalId.equalsExp(outerMessages.id));

@simolus3
Owner

Do you know how I can add such an index to my table?

Add a @TableIndex annotation to your Messages table class, like

@TableIndex(name: 'messages_original_id', columns: {#originalId})
class Messages extends Table {

Do I need to write a migration?

Yes, you need to increment the schema version and run m.create(messagesOriginalId) to create the index after adding it to the table.
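
A minimal sketch of that migration (the version numbers and class layout here are just an example):

// Inside the database class; assumes the schema version was 1 before.
@override
int get schemaVersion => 2;

@override
MigrationStrategy get migration => MigrationStrategy(
      onUpgrade: (m, from, to) async {
        if (from < 2) {
          // Creates the index declared via @TableIndex on Messages.
          await m.create(messagesOriginalId);
        }
      },
    );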

Also, after thinking about it a bit more, this part could cause very serious repeated querying: no matter which batch I query, it scans through all the data:

Databases apply a bunch of optimizations to the query. When this subquery is executed, the outerMessages.id value is already known, so the database shouldn't have to go through all messages, just the ones with a matching originalId. But since there is no index, it has no efficient way to look up only the matching messages. So my assumption is that, after adding an index, that part of the query will be much faster.

But it's important to know that this is just an assumption. To analyze seemingly slow queries, you need to run an EXPLAIN statement to see whether the assumptions about the missing index are actually true. The easiest way to do that is to use the sqlite3 command line tool on the database in the app.
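
If pulling the file is inconvenient, you can also run EXPLAIN QUERY PLAN from inside the app. A rough sketch - the SQL here is a simplified stand-in for the statement drift actually generates, and it assumes the default snake_case table and column names:

// Prints the query plan for the reply lookup. A "SEARCH ... USING INDEX"
// row means an index is used; a plain "SCAN messages" means it is not.
Future<void> debugReplyLookupPlan() async {
  final rows = await customSelect(
    'EXPLAIN QUERY PLAN '
    'SELECT json_group_array(i.id) FROM messages i WHERE i.original_id = ?',
    variables: [Variable<String>('some-message-id')],
  ).get();
  for (final row in rows) {
    print(row.data);
  }
}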

@luohao123
Author

@simolus3 Hi, I ended up creating a new table that stores message_id and original_id (for the replies to the current message and the message that the current message replies to).

The speed is good now.

However, my solution doesn't seem to be the same as yours. Could your solution possibly be even faster?
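
For reference, the new table looks roughly like this (the names here are simplified, not my exact schema):

// Lookup table linking a reply to the message it replies to.
class MessageReplies extends Table {
  TextColumn get messageId => text().references(Messages, #id)();
  TextColumn get originalId => text().references(Messages, #id)();

  @override
  Set<Column> get primaryKey => {messageId, originalId};
}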

@simolus3
Owner

I don't think so; it's likely that your solution sped things up because sqlite3 will create indices automatically for foreign keys. So your solution is essentially the same as mine.
