feat: USDA API import to a list of products missing in OFF #9083

rkiddy · 2023-09-27T23:05:11Z

I am not sure of how to start so I am just starting something and, if it is completely wrong, I can re-start.

I can see correspondences between many of the keys in the USDA structure and the OFF structure. How much am I supposed to add from the USDA data? How much is necessary to have in the OFF data? If some of the OFF data is created, are there processes that will create the other values that can be derived from these?

I cannot see the answers to these questions. If my Python seems gross, apologies. I spent many years writing Java, and at times I am not Pythonic.


rkiddy · 2023-09-27T23:15:12Z

You do not need to run this yourself. :--) Here is the output at this point:

 $ python3 usda_to_off.py
 -------------------------
 status: 200
 -------------------------
 USDA data:
 {'allHighlightFields': '<b>GTIN/UPC</b>: <em>619128673216</em>',
  'brandOwner': "NATURE'S HARVEST",
  'dataSource': 'LI',
  'dataType': 'Branded',
  'description': 'SOUTHWESTERN HOT BAR MIX, HOT',
  'fdcId': 1115329,
  'finalFoodInputFoods': [],
  'foodAttributeTypes': [{'description': 'Changes that were made to this food',
                          'foodAttributes': [{'id': 1019459,
                                              'name': 'Description',
                                              'value': '2'}],
                          'id': 998,
                          'name': 'Update Log'}],
  'foodAttributes': [],
  'foodCategory': 'Other Snacks',
  'foodMeasures': [],
  'foodNutrients': [{'derivationCode': 'LCCS',
                     'derivationDescription': 'Calculated from value per '
                                              'serving size measure',
                     'derivationId': 70,
                     'foodNutrientId': 13797640,
                     'foodNutrientSourceCode': '12',
                     'foodNutrientSourceDescription': "Manufacturer's "
                                                      'analytical; partial '
                                                      'documentation',
                     'foodNutrientSourceId': 9,
                     'indentLevel': 1,
                     'nutrientId': 1003,
                     'nutrientName': 'Protein',
                     'nutrientNumber': '203',
                     'rank': 600,
                     'unitName': 'G',
                     'value': 25.0},
                    {'derivationCode': 'LCCS',
                     'derivationDescription': 'Calculated from value per '
                                              'serving size measure',
                     'derivationId': 70,
                     'foodNutrientId': 13797641,
                     'foodNutrientSourceCode': '12',
                     'foodNutrientSourceDescription': "Manufacturer's "
                                                      'analytical; partial '
                                                      'documentation',
                     'foodNutrientSourceId': 9,
                     'indentLevel': 1,
                     'nutrientId': 1004,
                     'nutrientName': 'Total lipid (fat)',
                     'nutrientNumber': '204',
                     'percentDailyValue': 21,
                     'rank': 800,
                     'unitName': 'G',
                     'value': 50.0},
                    ...(SNIP)....
                    {'derivationCode': 'LCCS',
                     'derivationDescription': 'Calculated from value per '
                                              'serving size measure',
                     'derivationId': 70,
                     'foodNutrientId': 13797652,
                     'foodNutrientSourceCode': '12',
                     'foodNutrientSourceDescription': "Manufacturer's "
                                                      'analytical; partial '
                                                      'documentation',
                     'foodNutrientSourceId': 9,
                     'indentLevel': 1,
                     'nutrientId': 1257,
                     'nutrientName': 'Fatty acids, total trans',
                     'nutrientNumber': '605',
                     'rank': 15400,
                     'unitName': 'G',
                     'value': 0.0},
                    {'derivationCode': 'LCCS',
                     'derivationDescription': 'Calculated from value per '
                                              'serving size measure',
                     'derivationId': 70,
                     'foodNutrientId': 13797653,
                     'foodNutrientSourceCode': '12',
                     'foodNutrientSourceDescription': "Manufacturer's "
                                                      'analytical; partial '
                                                      'documentation',
                     'foodNutrientSourceId': 9,
                     'indentLevel': 1,
                     'nutrientId': 1258,
                     'nutrientName': 'Fatty acids, total saturated',
                     'nutrientNumber': '606',
                     'percentDailyValue': 10,
                     'rank': 9700,
                     'unitName': 'G',
                     'value': 7.14}],
  'foodVersionIds': [],
  'gtinUpc': '619128673216',
  'ingredients': 'CHILE CHICKPEAS, CHILE PEANUTS, PEPITAS, SMOKEHOUSE HICKORY '
                 'SMOKED ALMONDS, CHILE POWDER, CITRIC ACID, VEGETABLE OIL & '
                 'SALT.',
  'marketCountry': 'United States',
  'microbes': [],
  'modifiedDate': '2020-09-12',
  'publishedDate': '2020-11-13',
  'score': -405.18314,
  'servingSize': 28.0,
  'servingSizeUnit': 'g',
  'tradeChannels': ['NO_TRADE_CHANNEL']}
 -------------------------
 OFF data:
 {'categories_tags': ['en:Snacks'],
  'code': '0619128673216',
  'code_tags': ['code-13',
                '0619128673XXX',
                '061912867XXXX',
                '06191286XXXXX',
                '0619128XXXXXX',
                '061912XXXXXXX',
                '06191XXXXXXXX',
                '0619XXXXXXXXX',
                '061XXXXXXXXXX',
                '06XXXXXXXXXXX',
                '0XXXXXXXXXXXX'],
  'ingredients_text': 'CHILE CHICKPEAS, CHILE PEANUTS, PEPITAS, SMOKEHOUSE '
                      'HICKORY SMOKED ALMONDS, CHILE POWDER, CITRIC ACID, '
                      'VEGETABLE OIL & SALT.',
  'ingredients_text_en': 'CHILE CHICKPEAS, CHILE PEANUTS, PEPITAS, SMOKEHOUSE '
                         'HICKORY SMOKED ALMONDS, CHILE POWDER, CITRIC ACID, '
                         'VEGETABLE OIL & SALT.',
  'nutriments': {'calcium_100g': 14.3,
                 'calcium_serving': 143,
                 'calcium_unit': 'MG',
                 'calcium_value': 143,
                 'carbohydrates_100g': 0.679,
                 'carbohydrates_serving': 67.9,
                 'carbohydrates_unit': 'G',
                 'carbohydrates_value': 67.9,
                 'cholesterol_100g': 0.0,
                 'cholesterol_serving': 0.0,
                 'cholesterol_unit': 'MG',
                 'cholesterol_value': 0.0,
                 'energy_serving': 571,
                 'energy_unit': 'KCAL',
                 'energy_value': 571,
                 'fat_100g': 0.5,
                 'fat_serving': 50.0,
                 'fat_unit': 'G',
                 'fat_value': 50.0,
                 'fiber_100g': 0.071,
                 'fiber_serving': 7.1,
                 'fiber_unit': 'G',
                 'fiber_value': 7.1,
                 'iron_100g': 0.257,
                 'iron_serving': 2.57,
                 'iron_unit': 'MG',
                 'iron_value': 2.57,
                 'proteins_100g': 0.25,
                 'proteins_serving': 25.0,
                 'proteins_unit': 'G',
                 'proteins_value': 25.0,
                 'saturated-fat_100g': 0.07139999999999999,
                 'saturated-fat_serving': 7.14,
                 'saturated-fat_unit': 'G',
                 'saturated-fat_value': 7.14,
                 'sodium_100g': 67.9,
                 'sodium_serving': 679,
                 'sodium_unit': 'MG',
                 'sodium_value': 679,
                 'sugars_100g': 0.035699999999999996,
                 'sugars_serving': 3.57,
                 'sugars_unit': 'G',
                 'sugars_value': 3.57,
                 'trans-fat_100g': 0.0,
                 'trans-fat_serving': 0.0,
                 'trans-fat_unit': 'G',
                 'trans-fat_value': 0.0,
                 'vitamin-a_serving': 0.0,
                 'vitamin-a_unit': 'IU',
                 'vitamin-a_value': 0.0,
                 'vitamin-c_100g': 0.43,
                 'vitamin-c_serving': 4.3,
                 'vitamin-c_unit': 'MG',
                 'vitamin-c_value': 4.3},
  'serving_quantity': 28.0,
  'serving_size': '28.0 g',
  'sources_fields': {'org-database-usda': {'available_date': None,
                                           'fdc_category': 'Other Snacks',
                                           'fdc_data_source': 'LI',
                                           'fdc_id': 1115329,
                                           'modified_date': '2020-09-12',
                                           'published_date': '2020-11-13'}}}
 -------------------------
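For readers following along, the top-level correspondences visible in the output above can be sketched roughly as follows. This is only an illustration, not the code in `usda_to_off.py`: the function name `usda_to_off_basic` is hypothetical, and padding the UPC to 13 digits is an assumption based on the single sample shown (`619128673216` becoming `0619128673216`).

```python
def usda_to_off_basic(usda: dict) -> dict:
    """Map a few top-level USDA keys onto the OFF keys seen in the sample output.

    Hypothetical helper: it covers only fields visible in the output above.
    """
    off = {
        # Assumption: the 12-digit UPC is left-padded with zeros to 13 digits,
        # as the sample output suggests.
        "code": usda["gtinUpc"].zfill(13),
        "ingredients_text": usda.get("ingredients", ""),
        "sources_fields": {
            "org-database-usda": {
                "fdc_id": usda.get("fdcId"),
                "fdc_category": usda.get("foodCategory"),
                "fdc_data_source": usda.get("dataSource"),
                "modified_date": usda.get("modifiedDate"),
                "published_date": usda.get("publishedDate"),
            }
        },
    }
    size = usda.get("servingSize")
    unit = usda.get("servingSizeUnit")
    if size is not None and unit:
        # Keep the numeric quantity and a display string side by side.
        off["serving_quantity"] = size
        off["serving_size"] = f"{size} {unit}"
    return off
```

The `_tags` fields and the nutriments are deliberately left out of this sketch.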

raphael0202 · 2023-09-28T08:16:08Z

Hello!
We already import USDA data, see https://github.com/openfoodfacts/openfoodfacts-server/tree/main/scripts/usda-import

rkiddy · 2023-09-29T16:43:37Z

> Hello! We already import USDA data, see https://github.com/openfoodfacts/openfoodfacts-server/tree/main/scripts/usda-import

Yes. We are trying to use the USDA API and not the downloaded CSV file.

Having the code to do this in something more modern than Perl would not be a terrible thing, either.

see https://docs.google.com/spreadsheets/d/1EoguFCEF3ZOxyhoJikNX2a6mjGBxV0jqt5A9gV8wmPo/edit#gid=1527328796
and
#4943


rkiddy · 2023-10-13T22:10:28Z

I am not seeing something. The USDA gives me nutrients per 100 g. I can see some things in the USDA's labelNutrients values. If something is in the per-100 g data, I can calculate the label value.

But that only works if I get the serving size in grams. What if I get the serving size in mL and have no way to determine the mass of a serving?

For example, see:

 python3 usda_check.py --upc 041190406913
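The calculation in question can be sketched as below. This is a minimal illustration under stated assumptions, not the code in `usda_check.py`: the function name `per_serving` is hypothetical, and only a serving unit of `g` is treated as a mass. For any other unit (such as mL) it returns `None` rather than guessing a density.

```python
from typing import Optional


def per_serving(value_per_100g: float, serving_size: float,
                serving_unit: str) -> Optional[float]:
    """Scale a per-100 g nutrient value to one serving.

    Returns None when the serving size is not given in grams, because the
    mass of the serving is then unknown.
    """
    if serving_unit.strip().lower() != "g":
        return None
    return value_per_100g * serving_size / 100.0
```

For instance, 25.0 g of protein per 100 g with a 28.0 g serving gives 7.0 g per serving, while the same call with a unit of `ml` yields `None`.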

alexgarel

Hi @rkiddy your approach seems good to me.

I think originally @stephanegigandet did this through a csv so it can be imported in the producers platform and checked there. But I think it's ok to have json objects, we could then submit to the API or transform as CSV.

Although, you don't need to create "_tags" fields, they will be created by ProductOpener on product import.

You can look at the example CSV file on the pro platform to get an idea of the fields that need to be supplied (or simply look at the product form on the Open Food Facts website).

I will commit a file of the correspondences of fields as setup by @stephanegigandet at the time.

So you could also lean on this file to transform field names in the script (with a for loop) and then do specific transformations on the fields that need it.
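The for-loop approach suggested here could look roughly like the sketch below. The names `rename_fields` and `FIELD_MAP` are hypothetical, and the two entries in `FIELD_MAP` are taken from the sample output earlier in this thread rather than from the correspondence file itself, whose exact layout is an assumption.

```python
# Hypothetical excerpt of a USDA-to-OFF field-name mapping.
FIELD_MAP = {
    "gtinUpc": "code",
    "ingredients": "ingredients_text",
}


def rename_fields(usda_row: dict, field_map: dict) -> dict:
    """Copy values from a USDA record into a new dict under OFF field names.

    Keys missing from the record, or mapped to an empty name, are skipped so
    that field-specific transformations can be applied afterwards.
    """
    off_row = {}
    for usda_key, off_key in field_map.items():
        if off_key and usda_key in usda_row:
            off_row[off_key] = usda_row[usda_key]
    return off_row
```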

alexgarel · 2023-10-25T14:48:55Z

scripts/usda-import/usda_to_off.py

+def category(name):
+    global categories
+    if len(categories) == 0:
+        with open('/home/ray/Projects/OFF/usda/USDA_fdc_categories.csv', newline='') as csvfile:


beware the absolute path ;-)

So, Stephane includes absolute paths for files on his computers but I cannot do it here? Humph. :--)
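One common way around the absolute path is to resolve the CSV relative to the script's own location. The sketch below is only a suggestion: `default_categories_file` and `load_categories` are hypothetical names, and reading the first column as the USDA category and the third as the OFF category is an assumption based on the few rows shown in this review.

```python
import csv
from pathlib import Path


def default_categories_file() -> Path:
    """Locate USDA_fdc_categories.csv next to this script."""
    return Path(__file__).resolve().parent / "USDA_fdc_categories.csv"


def load_categories(path) -> dict:
    """Read the USDA category to OFF category correspondences.

    Assumption: column 0 is the USDA category name and column 2 is the OFF
    category tag; rows without an OFF category are skipped.
    """
    mapping = {}
    with open(path, newline="") as csvfile:
        for row in csv.reader(csvfile):
            if len(row) >= 3 and row[2]:
                mapping[row[0]] = row[2]
    return mapping
```

A caller would then use `load_categories(default_categories_file())` instead of a hard-coded home directory.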

alexgarel · 2023-10-25T14:54:40Z

scripts/usda-import/USDA_fdc_categories.csv

+Beer,1,en:Beers,
+Amino Acid Supplements,1,,
+Processed Cheese & Cheese Novelties,1,,
+Sauces - Cooking (Shelf Stable),1,en:Sauces,


@rkiddy ok, so this is an export of https://docs.google.com/spreadsheets/d/1kO8r2OWLLRuqP-NLkAR8gliVELQHqir6HZ-4CIRs4cM/edit#gid=0 right ?

That may have been where I got it. Stephane pointed it out to me in a slack thread. I am, by the way, not understanding that part of it yet. So, I am open to any suggestions.

alexgarel · 2023-10-25T15:08:19Z

fields potentially provided by producers openfoodfacts_import.xlsx


rkiddy · 2023-10-25T18:45:16Z

@alexgarel A few things in the usda_to_off_fields.json:

I assume that this list is not complete, yes? For example, I see in a product that there are columns for "omega-3-fat", "omega-6-fat", "omega-9-fat" and others. I will go ahead and add them.

Are these the same columns as in the API? For example, there I do not see a "modified-date", but I do see a "last_modified_t" that is a unix timestamp. So there may be a separate list for the API?

Would it be objectionable to add a "path" string? Perhaps this is how to get around the API difference issue. Perhaps "fiber-g-100g:paths" should be ["nutriment:fiber", "nutriment:fiber_100g", "nutriment:fiber_unit", "nutriment:fiber_value"].

See, for example, https://world.openfoodfacts.org/api/v3/product/3760232740033

Also, I am tempted to add a "datatype" field but will resist for now. For example, "code" is an arbitrary-length integer and "category" is a string, or perhaps a list of strings. And "to-create" is a boolean? But "modified-date" would seem to be a date and I will (for now) add a "format" value here, that might look like "YYYY-mm-dd HH:MM:ss" in UTC.
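To make the date question concrete, here is a minimal sketch of turning a formatted date string into a unix timestamp of the kind `last_modified_t` holds. The function name `date_to_unix` is hypothetical, the default format matches the `modifiedDate` values seen in the USDA output (`2020-09-12`), and treating the date as midnight UTC is an assumption.

```python
from datetime import datetime, timezone


def date_to_unix(date_text: str, fmt: str = "%Y-%m-%d") -> int:
    """Parse a date string with the given strptime format and return a
    unix timestamp, interpreting the value as UTC."""
    parsed = datetime.strptime(date_text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())
```

A "format" value such as "YYYY-mm-dd HH:MM:ss" would correspond to passing `fmt="%Y-%m-%d %H:%M:%S"`.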

rkiddy · 2023-10-25T19:13:56Z

FYI, I was asked to look at 10 barcodes, but:

Not in USDA and not in OFF:
'20200129783'
'44400176002'
'41570094754'
'72745804113'
'15800050117'

Not in USDA but in OFF:
'0053000006329'

In USDA and OFF:
'4099100028829'
'850229005207'
'856481003043'
'4099100099157'
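A grouping like the one above can be produced with a small helper along these lines. It is only a sketch: `classify_barcodes` is a hypothetical name, and the two sets of "found" codes are assumed to come from separate lookups against the USDA API and OFF that are not shown here.

```python
def classify_barcodes(barcodes, found_in_usda, found_in_off):
    """Group barcodes by which of the two databases knows them.

    found_in_usda and found_in_off are sets of barcodes for which a lookup
    succeeded (the lookups themselves are outside this sketch).
    """
    groups = {"neither": [], "off_only": [], "usda_only": [], "both": []}
    for code in barcodes:
        in_usda = code in found_in_usda
        in_off = code in found_in_off
        if in_usda and in_off:
            groups["both"].append(code)
        elif in_usda:
            groups["usda_only"].append(code)
        elif in_off:
            groups["off_only"].append(code)
        else:
            groups["neither"].append(code)
    return groups
```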


sonarcloud · 2023-10-25T19:32:17Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
11 Code Smells

No Coverage information
0.0% Duplication

codecov-commenter · 2023-10-26T14:25:01Z

Codecov Report

Merging #9083 (0ee12e4) into main (bd6b3da) will not change coverage.
Report is 3 commits behind head on main.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #9083   +/-   ##
=======================================
  Coverage   47.95%   47.95%           
=======================================
  Files          65       65           
  Lines       20223    20223           
  Branches     4914     4914           
=======================================
  Hits         9697     9697           
  Misses       9271     9271           
  Partials     1255     1255


alexgarel · 2023-10-26T15:56:48Z

> Are these the same columns as in the API? For example, there I do not see a "modified-date", but I do see a "last_modified_t" that is a unix timestamp. So there may be a separate list for the API?

@rkiddy you are right. The fact is this is a mapping to obtain a csv file that can be imported through the import function.

So we have two options:

either you stick to your approach of creating products through the API (which seems fine to me),
or, from the USDA API, you create a CSV/Excel file which is imported into the producers platform.

@stephanegigandet any thoughts on that ?

rkiddy · 2023-11-08T18:48:44Z

@alexgarel I have learned some things.

The first thing is that no API coming out of the USDA has per-serving nutrition information. So, no matter what, we will have to calculate things.

The second thing is a bit of a surprise. Comparing the information coming out of the API with the exported CSV file, the CSV data carries more digits; the API values are rounded. This suggests that the CSV file, not the API, should be the canonical source.

But there is obviously a problem with processing updates to the information. So, what might you think of these suggestions?

code can run periodically and automatically (somehow) to make note of new files in the USDA download page.
code for doing the CSV import can be moved from ... that other project where it is, to inside the openfoodfacts-server project
code can be put into the openfoodfacts-server project that will check the API for updated or added products, to be run periodically and automatically (again, somehow). This has a chance of staying under the rate limit for the USDA API.
anything that I am forgetting that will better integrate the import process into the app.

What think you?
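On the rate-limit point, one simple way to keep a periodic API check bounded is to split the barcodes into batches no larger than the allowed number of requests per window. This is a sketch only: `plan_batches` is a hypothetical name, and the actual USDA limit is not stated in this thread, so it is left as a parameter.

```python
def plan_batches(barcodes, max_requests_per_window):
    """Split a list of barcodes into consecutive batches, each small enough
    to be queried within one rate-limit window (one request per barcode)."""
    if max_requests_per_window < 1:
        raise ValueError("max_requests_per_window must be at least 1")
    return [
        barcodes[start:start + max_requests_per_window]
        for start in range(0, len(barcodes), max_requests_per_window)
    ]
```

Each batch would then be processed in its own window by whatever scheduler runs the job.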

stephanegigandet · 2023-11-10T10:53:19Z

I think it's best to keep things simple. We can use the CSV only, and update it every 6 months when there is a new CSV export.

Start something. From a list of products missing in OFF, start to cre…

a1d543f

…ate them.

rkiddy requested a review from a team as a code owner September 27, 2023 23:05

github-actions bot assigned rkiddy Sep 27, 2023

rkiddy changed the title ~~Start something. From a list of products missing in OFF, start to create them~~ USDA API to a list of products missing in OFF, start to create them Sep 29, 2023

rkiddy added 4 commits September 29, 2023 12:16

Merge branch 'main' of github.com:openfoodfacts/openfoodfacts-server …

252169f

…into usda_import_via_api

Tests include one created food.

34ae37a

Merge branch 'main' of github.com:openfoodfacts/openfoodfacts-server …

3d6a96a

…into usda_import_via_api

First version of tool to check USDA importability.

b7599e3

alexgarel approved these changes Oct 25, 2023

View reviewed changes

chore: add usda - off fields correspondance as json

71bf631

teolemon added 🇺🇸 United States Project to improve support in the United States. Data import 🐍 Python labels Oct 25, 2023

rkiddy changed the title ~~USDA API to a list of products missing in OFF, start to create them~~ USDA API import to a list of products missing in OFF Oct 25, 2023

Merge branch 'main' of github.com:openfoodfacts/openfoodfacts-server …

00551b6

…into usda_import_via_api

rkiddy added 2 commits October 25, 2023 12:26

Including requested barcodes as potential tests and adding column dat…

e90b30f

…a json file.

Resolve differences in usda_to_off_fields.json

0ee12e4

github-actions bot removed the Data import label Oct 25, 2023

teolemon changed the title ~~USDA API import to a list of products missing in OFF~~ feat: USDA API import to a list of products missing in OFF Oct 26, 2023

teolemon added 🏭 Producers Platform - data imports 🇺🇸 USDA import labels Nov 10, 2023
