Replace the current server call for offline products to something more scalable #4066

Closed
Tracked by #18
teolemon opened this issue Jun 2, 2023 · 6 comments · Fixed by #4166
Labels: 🐛 bug (Something isn't working), Offline - Browsing

Comments

@teolemon
Member

teolemon commented Jun 2, 2023

What

Potential solutions

WDYT @stephanegigandet @raphael0202 @alexgarel @CharlesNepote ?

Part of

@monsieurtanuki
Contributor

A solution could be to query the top 10k barcodes first, then to download the products as background tasks, in chunks of 1k or 100.
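The chunking step can be sketched as follows. This is only an illustration in Python (the app itself is not written in Python); `chunked` and the fake barcode list are hypothetical names introduced here, not part of the project.

```python
from typing import Iterator, Sequence


def chunked(barcodes: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` barcodes."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(barcodes), size):
        yield barcodes[start:start + size]


# Hypothetical stand-in for the top 10k barcodes returned by the first query.
top_barcodes = [f"{n:013d}" for n in range(10_000)]

# Each chunk would become one background download task.
tasks = list(chunked(top_barcodes, 100))
```

With chunks of 100 this gives 100 tasks; with chunks of 1k it gives 10.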

@CharlesNepote
Member

CharlesNepote commented Jun 6, 2023

The Mirabelle tool has a very aggressive cache, well suited to this kind of usage: the first request can be slow (possibly more than 30 s, depending on the query), but repeating the same request should be very fast. The database and the cache are refreshed every day.

Datasette (the engine of Mirabelle) can export data in CSV or JSON:

See my examples here: openfoodfacts/openfoodfacts-server#6328 (comment)

The service works well, but we should check the HTTP response (status code) and validate the result.
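A minimal sketch of that check, in Python for illustration only. It assumes the JSON export is a flat array of row objects and that the barcode column is named `code`; both are assumptions, as is the function name `parse_barcode_response`.

```python
import json


def parse_barcode_response(status: int, body: str) -> list[str]:
    """Return the barcodes in a JSON response, or raise if anything looks wrong."""
    if status != 200:
        raise RuntimeError(f"unexpected HTTP status: {status}")
    try:
        rows = json.loads(body)
    except json.JSONDecodeError as error:
        raise RuntimeError("response body is not valid JSON") from error
    if not isinstance(rows, list):
        raise RuntimeError("expected a JSON array of rows")
    barcodes = []
    for row in rows:
        # "code" is an assumed column name for the barcode.
        code = row.get("code") if isinstance(row, dict) else None
        if not isinstance(code, str) or not code.isdigit():
            raise RuntimeError(f"malformed row: {row!r}")
        barcodes.append(code)
    return barcodes
```

Rejecting the whole page on the first malformed row keeps the caller simple: a page is either fully usable or retried later.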

monsieurtanuki self-assigned this Jun 11, 2023
@monsieurtanuki
Contributor

This is what I'm going to implement:

  • create a new SQL table: offline_barcode(barcode pk)
  • create a new background task that downloads the top 10k products
    • download the top 10k barcodes, possibly with Mirabelle, possibly in pages of 1k, and populate table offline_barcode
    • download the products in offline_barcode that are not already downloaded, possibly in pages of 1k
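The plan above could look roughly like this, sketched with Python's `sqlite3` for illustration only. Only `offline_barcode(barcode pk)` comes from the plan; the `product` table, the function names and the `fetch_products` callback are hypothetical stand-ins for the app's real storage and API calls.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List


def create_tables(db: sqlite3.Connection) -> None:
    # offline_barcode is the work table from the plan; "product" is a
    # hypothetical stand-in for the app's local product storage.
    db.execute("CREATE TABLE IF NOT EXISTS offline_barcode (barcode TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE IF NOT EXISTS product (barcode TEXT PRIMARY KEY, json TEXT)")


def store_barcodes(db: sqlite3.Connection, barcodes: Iterable[str]) -> None:
    # INSERT OR IGNORE keeps this step idempotent if a page is fetched twice.
    db.executemany(
        "INSERT OR IGNORE INTO offline_barcode (barcode) VALUES (?)",
        [(barcode,) for barcode in barcodes],
    )
    db.commit()


def missing_barcodes(db: sqlite3.Connection, limit: int) -> List[str]:
    # Barcodes listed in the work table but not yet downloaded.
    rows = db.execute(
        "SELECT o.barcode FROM offline_barcode o "
        "LEFT JOIN product p ON p.barcode = o.barcode "
        "WHERE p.barcode IS NULL ORDER BY o.barcode LIMIT ?",
        (limit,),
    ).fetchall()
    return [row[0] for row in rows]


def download_missing(
    db: sqlite3.Connection,
    fetch_products: Callable[[List[str]], Dict[str, str]],
    page_size: int = 1000,
) -> int:
    """Download missing products page by page; return how many were stored."""
    total = 0
    while True:
        page = missing_barcodes(db, page_size)
        if not page:
            return total
        fetched = fetch_products(page)
        # Keep only what was asked for, so every pass shrinks the missing set.
        products = {b: j for b, j in fetched.items() if b in page}
        if not products:
            return total  # nothing usable came back: stop instead of looping
        db.executemany(
            "INSERT OR REPLACE INTO product (barcode, json) VALUES (?, ?)",
            list(products.items()),
        )
        db.commit()
        total += len(products)
```

Because each page is committed on its own, an interrupted task can simply be restarted: `missing_barcodes` only returns what is still left to download.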

That would also pave the way for something @teolemon will appreciate: refreshing all the locally stored products.

@teolemon
Member Author

@monsieurtanuki

  • why should we create a new table?
  • eventually, we should be able to invalidate local caches when needed (e.g. when we refresh translations for knowledge panels, add a new one, or, as we did 2 weeks ago, make the summary card clickable for allergens)

@monsieurtanuki
Contributor

@teolemon We are better off with an additional temporary work table of barcodes, which helps us split long-running and potentially failing HTTP queries into smaller, bulletproof operations.

That will also help us share code with the "full refresh" feature: downloading all products from a list of barcodes, either the top 10k or the current local products.
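To make the argument concrete, here is a rough Python sketch (illustration only, not the app's code). The `pending` set plays the role of the temporary work table, `fetch_products` is a hypothetical download callback, and `OSError` stands in for a network failure. The same function serves both cases: fill `pending` with the top 10k barcodes, or with the barcodes of the products already stored locally.

```python
from typing import Callable, Dict, List, Set


def process_work_list(
    pending: Set[str],
    fetch_products: Callable[[List[str]], Dict[str, str]],
    page_size: int = 100,
) -> Dict[str, str]:
    """Download the pending barcodes in small pages.

    A page that fails stays in `pending`, so a later run retries only that page.
    """
    downloaded: Dict[str, str] = {}
    ordered = sorted(pending)
    for start in range(0, len(ordered), page_size):
        page = ordered[start:start + page_size]
        try:
            products = fetch_products(page)
        except OSError:
            continue  # leave this page in the work list for the next run
        downloaded.update(products)
        # The page was answered, so all of its barcodes leave the work list,
        # including any the server did not return a product for.
        pending.difference_update(page)
    return downloaded
```

A failure costs at most one page of work rather than the whole 10k download.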

@monsieurtanuki
Contributor

I'm really not sure what you're worried about, performance-wise: I've just managed to download the top 10k (FR_fr) barcodes in 191 seconds, with 100-item pages (less than 2 seconds per query).
The next step is just to download the products from a barcode list, again with 100-item pages.
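For reference, the per-query figure follows directly from the numbers quoted above, assuming the 191 seconds cover exactly the paged barcode queries:

```python
import math

total_barcodes = 10_000   # "top 10k"
page_size = 100           # "100-item pages"
total_seconds = 191       # measured duration quoted above

pages = math.ceil(total_barcodes / page_size)
seconds_per_page = total_seconds / pages
print(pages, round(seconds_per_page, 2))  # prints: 100 1.91
```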

3 participants