Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch from LocalStorage to IndexedDB for Chat History Storage #158

Merged
merged 3 commits into from Sep 19, 2023
Merged

Switch from LocalStorage to IndexedDB for Chat History Storage #158

merged 3 commits into from Sep 19, 2023

Conversation

Ashesh3
Copy link
Contributor

@Ashesh3 Ashesh3 commented Aug 30, 2023

Partially Resolves: #137

This PR addresses the limitation of storing only 20 conversations in the chat history due to the max storage limit of LocalStorage. The solution proposed is to switch to IndexedDB for storing chat conversations, which has a higher storage limit and thus will allow for an unlimited number of conversations to be stored.

Key changes include:

  1. Addition of idb-keyval package to interact with IndexedDB.

  2. Removal of MAX_CONVERSATIONS constant and related logic that limited the number of stored conversations.

  3. Modification of the useChatStore state to use IndexedDB for storage instead of LocalStorage.

  4. Update of the ChatDrawerItems.tsx and ChatMenuItems.tsx components to remove the restrictions on maximum conversations.

By implementing these changes, users will no longer face the restriction of a limited number of stored conversations, improving the user experience and functionality of the chat feature.

Note: This PR includes a version bump to 3 in store-chats.ts to reflect the switch to IndexedDB.

image

Storage limit on IndexedDB:

  • Firefox: 10% of the total disk size. (up to 10GB)
  • Chrome: 60% of the total disk space. (No max limit)

@vercel
Copy link

vercel bot commented Aug 30, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
big-agi ✅ Ready (Inspect) Visit Preview Sep 11, 2023 7:28pm

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 11, 2023

@enricoros Did you get a chance to review it? :) Hope you are enjoying your vacations!

@enricoros
Copy link
Owner

enricoros commented Sep 11, 2023

Hi, this looks really good! I was on purpose ignoring requests when coming back from vacation, to prioritize product brainstorming to code but this looks like a great quality commit.

Do you think the chats can be preserved / migrated when switching? (As a one time operation).

I'm asking because many people have valuable data in the chats and would be disappointed at the product if it wiped the chats at a certain point.

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 11, 2023

Hi, this looks really good! I was on purpose ignoring requests when coming back from vacation, to prioritize product brainstorming to code but this looks like a great quality commit.

Do you think the chats can be preserved / migrated when switching? (As a one time operation).

I'm asking because many people have valuable data in the chats and would be disappointed at the product if it wiped the chats at a certain point.

Hey, thanks for the comment, I would have used the migrate option in the persist middleware if it were not for the fact that we are switching the entire storage backend itself. In this case, it will not be able to read the old data saved in local storage since the change itself suggests it to use Indexed DB going forward.

What we can do here is, instead use a custom one-off migration to migrate local storage to indexed DB, it will check if there's existing data in local storage, if so, it will migrate it then clear it out from local storage so that the migration need not happen again.

I've implemented the above-mentioned migration in this PR. Do review and let me know how you feel about this approach!

Note: Tested by manually setting local storage key app-chats from the current live version on this deploy preview and was successfully able to move that data to indexed DB.

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 13, 2023

@enricoros any updates on reviewing this? :P It'd be really nice to have 20+ chats, I constantly find myself having to delete chats 😅

@enricoros
Copy link
Owner

I'm being very careful particularly on this, as I'm super scared to kill people's chats. Imagine the backlash! The code is definitely of high quality, and I want to merge it. Just being cautious for this, as it's potentially destructive.

On the Chat Limit

I am thinking of keeping the limit mechanism in place and raise it to 50. I have no idea what will happen for someone that goes to thousands of chats, and then the DB corrupts or the browser crashes or whatnot.

What if we keep the safety net? The product has no analytics, so we don't know how many people are at limit, but I'm pretty much at limit in all my installs :). However, raising 20 -> 50 could be a good safety net to keep in place.

There's a visual indicator on the chats list that can be activated when nearing the limit (turn it on with Preferences > Experiments > ON), and can help to visually see which chats are "mostly empty" and good to delete.

You may also consider the "MAX" limit as a limit that forces creativity and tidyness. My ChatGPT (I haven't used it since big-AGI) looked like an untify mess :) It's like having 160 chars for a tweet.

So the question is: how do you feel to raise the limit to 50, as a "defensive/safety" measure, while switching to IndexDB?

Side Effects / Other Questions

Support: I checked browser compatibility for IndexedDB and it seems good, but I haven't checked on a few browsers (Android/Chrome, iOS/Mobile Safari, MacOS/Safari). Do you think there's the chance some configuration will be unsupported?

Speed: haven't checked the speed difference of IndexedDB vs localStorage - is there any?

Versioning: If this works well, we can migrate more things to IndexedDB, and use it properly as a structured DB - for instance AI personas definition, which really needs something like a DB. Do you think we need to "version" the name of the DBs/Tables?

@enricoros
Copy link
Owner

enricoros commented Sep 13, 2023

Also, thinking of importing IndexedDB management functions (from the 'idb-keyval' lib, https://github.com/jakearchibald/idb-keyval) internally - so we can have error handling and more control. e.g. common/utils/idbUtils.ts.
This is so that we can change DB names, be in control of closing, apply some of the fixes in the upstream repo that haven't been merged.
Wdyt?

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 13, 2023

I'm being very careful particularly on this, as I'm super scared to kill people's chats. Imagine the backlash! The code is definitely of high quality, and I want to merge it. Just being cautious for this, as it's potentially destructive.

I totally understand, merging something which has the potential to destroy existing data should be thoroughly considered and examined. I have tested the code on 3 of my existing deployments and there were no issues migrating the chats and no data was lost. If someone uses the app for the first time, they use indexedDB in the first place, no issues for them too. I encourage you and others stumbling across this PR to copy their localstorage key to this deploy preview to see for themselves their data being migrated and not losing anything.

I am thinking of keeping the limit mechanism in place and raise it to 50. I have no idea what will happen for someone that goes to thousands of chats, and then the DB corrupts or the browser crashes or whatnot.

I would have to disagree with keeping the limit mechanism as the entire point of having chat history is that they serve as historical records to refer to in future, I take reference from every other provider, all of them instead of imposing limits on total conversations, instead aim towards improving chat management to make it easier for users to find any of their old chats. If someone hits the 50-chat limit, now every time they have to delete a previous conversation and spend extra time and effort to manually erase chats not needed anymore which is something I believe users should not be troubled with.

Talking about DB corruption, it's as likely to happen as with local storage, both are ultimately managed by the browser itself and corruption to browser files will result in data loss in both cases, no exceptions. In terms of reliability, the way atomic transactions work in IndexedDB, in case a crash happens, the browser will preserve the state of the database just the second before crash. WhatsApp for example stores 100s of MBs of data in the IndexedDB and billions use it on a daily basis.

You may also consider the "MAX" limit as a limit that forces creativity and tidyness. My ChatGPT (I haven't used it since big-AGI) looked like an untify mess :) It's like having 160 chars for a tweet.

I agree with you here, my ChatGPT history is probably cluttered with old conversations I no longer need but I still do not feel like deleting them. I feel it should be left on the user to decide, if they want to keep it organized or not. We should not force them into our own ways. if I want a quick chat, I do not sit and spend time finding a useless chat to delete.

I checked browser compatibility for IndexedDB and it seems good, but I haven't checked on a few browsers (Android/Chrome, iOS/Mobile Safari, MacOS/Safari). Do you think there's the chance some configuration will be unsupported?

I am confident that there would be no issues combability wise, It's a fairly matured feature and used by major projects across the globe. It's supported in all browsers, on all platforms (Opera Mini (phone) does not support either localstorage or indexedDB). I am certain not a single user would be left out incompatible in scope of this project.

Speed: haven't checked the speed difference of IndexedDB vs localStorage - is there any?

IndexedDB is slower by a few milliseconds, not a difference to be ever noticed by any user.

Versioning: If this works well, we can migrate more things to IndexedDB, and use it properly as a structured DB - for instance AI personas definition, which really needs something like a DB. Do you think we need to "version" the name of the DBs/Tables?

This is a good idea; AI personas would benefit from moving it out to IndexedDB. We already have versioning for the database, it will start at 3 once this PR is merged. We can also use a separate IndexedDB for AI personas and keep it separate and clean from the chat records.

P.S: Above are just my opinions, feel free to disagree and I will make the required changes as per your suggestions.

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 13, 2023

Also, thinking of importing IndexedDB management functions (from the 'idb-keyval' lib, https://github.com/jakearchibald/idb-keyval) internally - so we can have error handling and more control. e.g. common/utils/idbUtils.ts. This is so that we can change DB names, be in control of closing, apply some of the fixes in the upstream repo that haven't been merged. Wdyt?

No issues doing that but i ask myself if we really do need to reinvent the weel and copy their implementation when their library works best for our use case already. If any errors, although rare, do happen, the middleware will catch it and preserve the state accordingly is my understanding. I feel like we'd be unnecessarily adding to the codebase for something that isn't worth major improvement. Basically, keeping it simple and let the library do state management / error handling for us.

Also, currently the middleware allows us to detect and run migrations, in case we decide to add/remove/alter any field in the database.

But yes, changing the DB name itself isn't simple. But again, how often are we going to change the entire name of the database where chats are stored?

@enricoros
Copy link
Owner

@Ashesh3 great comments across the board. Gives me good confidence on the code. In the end I'm the only QA to ensure the production-ready quality of the App, but I feel that you get that and I'm very comfortable with the patch.

Apologies for not jumping on it already - I stayed away from code on purpose the last couple of weeks. I'll be hosting a friend for the next few days, but this will be the first commit you'll see.

I'll probably add back the 50 or 100 limit, but again, it's just a product choice, one of those "constraints that push creativity" kind of things. To be lifted quickly in time. But no worries about the code: I was actually looking at the merge last night and I'll add back the code where needed.

Thanks again, and look forward to this landing next week. If you're interested, I'm also working on the larger and long term planning for this project.

@enricoros
Copy link
Owner

enricoros commented Sep 15, 2023

Found an issue: if the user opens the app (state is migrated) and then closes the window and opens again, chats are lost.

The reason is that 'setItem' on idb is not called after the _migrateLocalStorageToIndexedDB function restores the localStorage data.

Looking at the 'persist.ts' middleware implementation in Zustand, it seems that there isn't any 'setItem()' after the onRehydrateStorage().

Also see the comment a few lines above::

image

We seem to have the issue that migrate() is the place where to perform the load from localStorage, but migrate is not called (as you point out) because we swap the storage backend. The next place, onRehydrateStorage is not persisted to disk after updating the state, which may lead to data loss if the user reloads or closes the window before updating any state - since at this point the local storage values are gone.

@Ashesh3 Ideas?

enricoros added a commit that referenced this pull request Sep 15, 2023
@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 15, 2023

Found an issue: if the user opens the app (state is migrated) and then closes the window and opens again, chats are lost.

The reason is that 'setItem' on idb is not called after the _migrateLocalStorageToIndexedDB function restores the localStorage data.

Looking at the 'persist.ts' middleware implementation in Zustand, it seems that there isn't any 'setItem()' after the onRehydrateStorage().

Also see the comment a few lines above::

image

We seem to have the issue that migrate() is the place where to perform the load from localStorage, but migrate is not called (as you point out) because we swap the storage backend. The next place, onRehydrateStorage is not persisted to disk after updating the state, which may lead to data loss if the user reloads or closes the window before updating any state - since at this point the local storage values are gone.

@Ashesh3 Ideas?

How about this approach? We only clear localStorage after verifying that the IndexedDB contains the migrated data and that it matches the data stored in localStorage. So, before deleting localStorage, we confirm that the chat count in the state (after hydration) is the same as in localStorage. In this case, we can be confident that the migration has been successfully completed and persisted, and it's safe to remove the data from localStorage.

If a situation arises where the user does not interact with the app (and subsequently closes it), the state update may not be written to IndexedDB as expected. However, the next time the user uses the app and interacts with it, their chats will still be available in localStorage, and migration will be attempted again.

After the migration, we'll check if the chat object matches the one in IndexedDB. We could do this in the onFinishHydration function. Only then will we clear the localStorage key.

This shouldn't cause any performance issues, since the migration will be a one-time thing if the user decides to interact with the app, once localStorage is migrated and cleared, the entire migration flow will be skipped.

Otherwise, we could manually call the setItem function with the key and value to perform the migration.

What do you recommend?

@enricoros
Copy link
Owner

Thanks for the two ideas. I tried looking into the setItem, but could not get visibility into that method. Possibly the first idea is best, but need to look at it a bit more.
By the way, the merge in progress is in the https://github.com/enricoros/big-agi/tree/Ashesh3-IndexedDB-storage branch, in the repo, I've not done anything major - just need to clear this issue first.

@enricoros enricoros merged commit b7c7268 into enricoros:main Sep 19, 2023
1 check passed
@enricoros
Copy link
Owner

@Ashesh3 Thanks for your contribution here. This has been merged and tested thoroughly.

The localStorage key is backed up to app-chats-v2 once imported, and the updated code is using a trick to trigger a migrate-ion.

Very good help here, the app is better now.

@tmikaeld
Copy link

This is awesome! Great commit and I love that you're both so careful ❤️

I still think it could benefit from categories/folders/tabs or similar, because now it's just one list short of chaos.
Plus, it would prevent huge DOM objects causing the browser to lag.

So with the 50/100-object limit proposed, maybe if we apply that to the categories, then anyone can just create a new category when it's been filled up.

And you're right about indexedDB not being limited, I have a database with millions of items and handling that without issues. It's only an issue if you apply string-based search or filtering on it, fetching keys is no issue at all no matter the size.

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 20, 2023

This is awesome! Great commit and I love that you're both so careful ❤️

I still think it could benefit from categories/folders/tabs or similar, because now it's just one list short of chaos. Plus, it would prevent huge DOM objects causing the browser to lag.

So with the 50/100-object limit proposed, maybe if we apply that to the categories, then anyone can just create a new category when it's been filled up.

And you're right about indexedDB not being limited, I have a database with millions of items and handling that without issues. It's only an issue if you apply string-based search or filtering on it, fetching keys is no issue at all no matter the size.

I love the idea about adding folders for better organization of conversations!

It could look something like:

image

@tmikaeld Can you open a new issue about this?

@tmikaeld
Copy link

@Ashesh3

Sweet mockup! It doesn't avoid the huge-DOM issue though, but that's exactly how I'd imagine setting it up.

Maybe it's better if you create it, since I don't want to use your mockup without permission.

@Ashesh3
Copy link
Contributor Author

Ashesh3 commented Sep 20, 2023

@Ashesh3

Sweet mockup! It doesn't avoid the huge-DOM issue though, but that's exactly how I'd imagine setting it up.

Maybe it's better if you create it, since I don't want to use your mockup without permission.

Feel free to create the issue, I myself scavenged the mockup it from a template.

Would storing groups of chat in separate key (where key is the folder or the date) under the same database, resolve the huge-DOM object issue? In that way, we'd be loading small chunks of conversations, instead of all at once.

@tmikaeld
Copy link

The problem occurs when you see too much on the screen at the same time.

Something like this solves it:
https://jcoreio.github.io/react-router-drilldown

Since only the category you're in is rendered.

An alternative to handle unlimited amounts is virtual scroll.

britt pushed a commit to britt/big-agi that referenced this pull request Oct 17, 2023
britt pushed a commit to britt/big-agi that referenced this pull request Oct 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enhanced Chat History Management
3 participants