Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide for a caching level #573

Closed
M123-dev opened this issue Sep 12, 2021 · 9 comments
Closed

Decide for a caching level #573

M123-dev opened this issue Sep 12, 2021 · 9 comments

Comments

@M123-dev
Copy link
Member

A big question that is in the air right now is how we want to cache products and I think it would be good if we solve this question as early as possible. Especially for #470 (comment)

At the moment we cache the products in sqflite and show a snackbar on the product page with the time since caching and a refresh button. The only problem with the current implementation is that we also display outdated data teoretically forever.

It would be best if we find a way to get the best out of both of them, my suggestion would be that we request the last edited time from the backend as often as possible and when the data is outdated we either display a snack bar with "We have new data about this product, reload?" or even better refresh the product directly, of course we should then make sure that things like the scroll position don't change.

Since the offf-dart packge minifies the data (if we only need the last modified date, we only get that and not the whole product) we wouldn't overload the phone, the internet and the backend with unnecessary data.

What is your opinion on that: @monsieurtanuki @teolemon @stephanegigandet

@M123-dev
Copy link
Member Author

Regarding the number of cached products which is interesting for the choice of db for web and probably for the whole app.

Offline scanning (#18) got a bit out of sight but if we implement this feature (with sliced data sets) how many products can we expect?

@monsieurtanuki
Copy link
Contributor

A big question that is in the air right now is how we want to cache products and I think it would be good if we solve this question as early as possible. Especially for #470 (comment)

@M123-dev That's 100% relevant: if we find out that we would be better off with another database system, it's better to acknowledge that before the app actually goes live.

At the moment we cache the products in sqflite and show a snackbar on the product page with the time since caching and a refresh button. The only problem with the current implementation is that we also display outdated data teoretically forever.

Not really "at the moment": it's only in non-merged-yet #571, and the snackbar is not systematic - only when the staleness is longer than 5 minutes (which could be set longer or in the end-user preferences).

It would be best if we find a way to get the best out of both of them, my suggestion would be that we request the last edited time from the backend as often as possible and when the data is outdated we either display a snack bar with "We have new data about this product, reload?"

I already evoked this "freshness" trigger in #571 (comment).
And @stephanegigandet already replied in #178 (comment) that it's not that easy as on the backend the last modified field is not 100% accurate - as it may reflect the product data change but not the data analysis change. That could be a start, though: using the last modified field.

Assuming that we rely a lot on the backend, there's no reason why we should have a large local database. Let's say 1000 products is enough (about 10Mb). Which I think is more than generous.
Going back to my benchmark (#470 (comment)), it looks like:

  • opening a local hive database with 1000 products as box would take about 1s (a "lazy" box would be slightly faster)
  • switching from SharedPreferences to hive would probably give back that extra second

Regarding the relevancy of the products, it's fair to apply a LRU algorithm for the garbage collecting if we go beyond the 1000 product limit. And that could also be part of the end-user settings, as I suspect that many people have faster smartphones that mine.

@M123-dev
Copy link
Member Author

Not really "at the moment": it's only in non-merged-yet #571, and the snackbar is not systematic - only when the staleness is longer than 5 minutes (which could be set longer or in the end-user preferences).

Yes I know, I had now described the behavior before the PR since it does not change the fundamental behavior that much.

And @stephanegigandet already replied in #178 (comment) that it's not that easy as on the backend the last modified field is not 100% accurate - as it may reflect the product data change but not the data analysis change. That could be a start, though: using the last modified field.

Ohh that would of course be a reason against using the last edit time, but if we decide for such a kind of caching behavior then it probably won't be a problem to find / build an alternative for it.

Regarding the relevancy of the products, it's fair to apply a LRU algorithm for the garbage collecting if we go beyond the 1000 product limit. And that could also be part of the end-user settings, as I suspect that many people have faster smartphones that mine.

LRU with maybe an exception for products which are stored in lists should do the job. But thats for another discussion.

@stephanegigandet
Copy link
Contributor

At the moment we cache the products in sqflite and show a snackbar on the product page with the time since caching and a refresh button. The only problem with the current implementation is that we also display outdated data teoretically forever.

I really think we should change that so that we always try to refresh the data that we show to users. e.g. when a user scans a product, we refresh it, when an user opens a product page, we refresh it, when an user accesses the history, we refresh it.

The cache is just to show something faster (and then we update it if needed when we get the results back), or to offer a good experience when there is no internet connection.

So there should not be a "refresh" button: we should refresh.

Regarding the last edit / last modified timestamp: the current OFF API does not support returning 304 today, but that something we could explore. Instead of the last edit / last modified, we could also generate a fingerprint instead (similar to the HTTP ETag : https://developer.mozilla.org/fr/docs/Web/HTTP/Headers/ETag ). That would be much better to make sure we get the last version when and only when it has effectively changed.

@monsieurtanuki
Copy link
Contributor

@stephanegigandet That would mean that #571 should be dismissed, am I correct? (no problem, just checking)
And the concept of stale or fresh data would not be a priority either, as we should refresh a product each time it's accessed by the end-user, am I correct?

@stephanegigandet
Copy link
Contributor

@monsieurtanuki Right, I think it would make things simpler and provide a much better user experience. It's more like most other apps do things: they always refresh, and in the mean time they display what they had previously (if anything).

@M123-dev
Copy link
Member Author

Okay that means we have a caching strategy.

For now, I'd say go with the last edited time stack so we have the basic logic in place, but we should definitely switch to a ETag later. @stephane are you taking care of that server side?


Now it's about how much we leave cached.
I would definitely keep the products which are stored in lists (user lists, not history) cached and then maybe up to a thousand and we can of course let the user modify that.
The last question is whether we can find a future-proof solution that includes offline scanning (#18). Can we query the csv file directly or do we have to add the data to a db for performance reasons?

@stephanegigandet
Copy link
Contributor

@stephane are you taking care of that server side?

Eventually, yes, but as @monsieurtanuki said, if we always refresh when we show something, then proactively refreshing everything is not really needed. The ETag feature would save a few bytes though as we could just return a 304 or something if the product has not changed when we try to refresh.

I filed an issue on the server to track it: openfoodfacts/openfoodfacts-server#5674

@monsieurtanuki
Copy link
Contributor

@monsieurtanuki Right, I think it would make things simpler and provide a much better user experience. It's more like most other apps do things: they always refresh, and in the mean time they display what they had previously (if anything).

Just for the record it's not what we do with osmdroid (openstreetmap library on Android): map tiles have an expiry date, and we refresh them only when they are accessed and expired. Typical freshness duration is one week.

But here it's not osmdroid, I know ;)

The ETag solution is elegant, and would save more than a few bytes as a product weights 10Kb.

@openfoodfacts openfoodfacts locked and limited conversation to collaborators Mar 9, 2022
@teolemon teolemon converted this issue into discussion #1186 Mar 9, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

3 participants