Store Cloud Vision OCR json files compressed in gzip format #6273

Closed
Tracked by #8764
stephanegigandet opened this issue Jan 11, 2022 · 6 comments
Labels
disk space, ✨ Feature, good first issue, Hacktoberfest, NGINX, OCR, 🚅 Performance, ⏰ Stale

Comments

@stephanegigandet
Contributor

stephanegigandet commented Jan 11, 2022

What

We currently run Cloud Vision on all source images of products, and store the result as .json files, uncompressed:

html/images/products/330/201/010/0105$ ls -lrt
total 1128
-rw-r--r-- 1 off off 152033 Mar 28 2021 1.jpg.orig
-rw-r--r-- 1 off off 182454 Mar 28 2021 1.jpg
-rw-r--r-- 1 off off 56126 Mar 28 2021 2.jpg.orig
-rw-r--r-- 1 off off 68245 Mar 28 2021 2.jpg
-rw-r--r-- 1 off off 170344 Mar 28 2021 1.json
-rw-r--r-- 1 off off 221838 Mar 28 2021 2.json

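These JSON files take a lot of disk space: in this example the two of them total about 390 KB, almost as much as the four image files (about 460 KB). We could instead store them directly in gzip format.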

Proposed solution

  1. Change the nginx configuration so that nginx looks for 1.json.gz and serves it (either compressed or uncompressed, depending on what the client accepts)

nginx on Debian is compiled with --with-http_gunzip_module --with-http_gzip_static_module

http://nginx.org/en/docs/http/ngx_http_gzip_static_module.html

  2. Write the files directly as 1.json.gz
  3. Run a script to compress all existing files
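
The one-off compression pass over existing files could look roughly like the Python sketch below. It is an illustration only, not the script that was actually run on the server: the function name `compress_json_tree` and the idea of walking a single root directory are assumptions, and a real run would need to take care with file ownership, permissions and timestamps.

```python
import gzip
import shutil
from pathlib import Path


def compress_json_tree(root: Path) -> int:
    """Replace every *.json file under root with a gzipped *.json.gz copy.

    Hypothetical helper for illustration. Returns the number of files compressed.
    """
    count = 0
    # Materialize the listing first, since files are created and removed below.
    for src in sorted(root.rglob("*.json")):
        dst = src.with_name(src.name + ".gz")
        if dst.exists():
            # Already compressed on an earlier run; leave both files untouched.
            continue
        tmp = dst.with_name(dst.name + ".tmp")
        with src.open("rb") as fin, gzip.open(tmp, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        # Rename into place so a reader never sees a partially written .gz file.
        tmp.replace(dst)
        src.unlink()
        count += 1
    return count
```

Calling `compress_json_tree(Path("html/images/products"))` would turn 1.json into 1.json.gz throughout the tree. Running it a second time is harmless, because the remaining files no longer end in .json.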

Part of

@stephanegigandet stephanegigandet added the ✨ Feature Features or enhancements to Open Food Facts server label Jan 11, 2022
@teolemon teolemon added the OCR label Jan 20, 2022
@stephanegigandet
Contributor Author

Trying nginx's gzip_static.

This does not work:

    location ~ ^/(.well-known|fonts|images|js|css|rss|data|files|resources)/ {
            # First attempt to serve request as file, then
            # as directory, then fall back to displaying a 404.
            include cors_support.conf;
            auth_basic "off";
            allow all; # Allow all to see content
            gzip_static always;
            try_files $uri $uri/ =404;
    }

Maybe because of https://trac.nginx.org/nginx/ticket/1570

@github-actions
Contributor

github-actions bot commented May 3, 2022

This issue is stale because it has been open 90 days with no activity.

@github-actions github-actions bot added the ⏰ Stale This issue hasn't seen activity in a while. You can try documenting more to unblock it. label May 3, 2022
@raphael0202
Contributor

Is this issue still relevant? It seems the change has been made in production: I can only find gzipped OCR JSON files.

@alexgarel
Member

@alexgarel alexgarel added good first issue Welcome to Open Food Facts. This issue should be approachable if you're new. Get in touch for help. Hacktoberfest labels Sep 27, 2022
@raphael0202
Contributor

To summarize (because the previous exchange on this thread does not reflect the current status):

  • @cquest has run a script to gzip all existing OCR JSON files on the production server
  • The nginx config has been updated (both on prod and in the openfoodfacts-server repository) to serve gzipped JSON
  • What remains to be done before closing this issue is to save new OCR JSON directly as gzipped files
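
For the remaining step, writing the OCR result straight to a .json.gz file, a minimal Python sketch is shown below. It only illustrates the idea and is not the project's implementation (the actual change is in the pull request referenced later in this thread); the helper names `save_ocr_json` and `load_ocr_json` are made up for the example.

```python
import gzip
import json
from pathlib import Path


def save_ocr_json(result: dict, path: Path) -> None:
    """Serialize an OCR result and write it gzip-compressed (e.g. to 1.json.gz)."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump(result, f)


def load_ocr_json(path: Path) -> dict:
    """Read a gzip-compressed OCR JSON file back into a dict."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)
```

With this approach no uncompressed .json file ever reaches the disk, so the one-off compression script does not need to be run again for new images.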

@raphael0202
Contributor

This has been fixed by #8320
