feat: USDA API import to a list of products missing in OFF #9083
base: main
Conversation
You do not need to run this yourself. :--) Here is the output at this point:
Hello!
Yes. We are trying to use the USDA API rather than the downloaded CSV file. Having the code for this in something more modern than Perl would not be a terrible thing either. See https://docs.google.com/spreadsheets/d/1EoguFCEF3ZOxyhoJikNX2a6mjGBxV0jqt5A9gV8wmPo/edit#gid=1527328796
…into usda_import_via_api
…into usda_import_via_api
I am missing something. The USDA gives me nutrients per 100 g, and I can see some values in the USDA's labelNutrients. If a nutrient is in the per-100 g data, I can calculate the label value, but only if I get the serving size in grams. What if the serving size is given in mL and there is no way to determine the mass of a serving? For example, see:
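The per-100 g to per-serving calculation described in this comment can be sketched as below. This is only an illustration under stated assumptions: the function name `per_serving`, the accepted unit spellings, and the optional `density_g_per_ml` parameter are hypothetical and do not come from the PR. When the serving size is a volume and no density is known, the sketch returns `None` instead of guessing.

```python
from typing import Optional


def per_serving(value_per_100g: float, serving_size: float, serving_unit: str,
                density_g_per_ml: Optional[float] = None) -> Optional[float]:
    """Scale a per-100 g nutrient value to one serving, if the mass is known."""
    unit = serving_unit.strip().lower()
    if unit in ("g", "grm"):  # assumed spellings for grams
        grams = serving_size
    elif unit in ("ml", "mlt") and density_g_per_ml is not None:
        # A volume can only be converted when a density is supplied.
        grams = serving_size * density_g_per_ml
    else:
        return None  # mass of a serving cannot be determined
    return value_per_100g * grams / 100.0
```

Returning `None` keeps the "unknown" case explicit, so a caller can decide whether to skip the per-serving field or fall back to the label values.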
Hi @rkiddy, your approach seems good to me.
I think @stephanegigandet originally did this through a CSV so that it could be imported into the producers platform and checked there. But I think it's OK to have JSON objects; we could then submit them to the API or transform them to CSV.
You don't need to create "_tags" fields, though; ProductOpener will create them on product import.
You can look at the example CSV file on the pro platform to get an idea of the fields that need to be supplied (or simply look at the product form on the Open Food Facts website).
I will commit a file with the field correspondences that @stephanegigandet set up at the time.
You could then lean on this file to transform field names in the script (with a for loop) and apply specific transformations to the fields that need them.
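The "for loop over a correspondence file" idea might look roughly like the sketch below. The mapping content here is a made-up excerpt for illustration; the real correspondences live in the file mentioned in the comment, and the function name `rename_fields` is hypothetical.

```python
import json

# Hypothetical excerpt of a USDA-to-OFF field correspondence file.
MAPPING_JSON = '{"gtinUpc": "code", "description": "product_name", "brandOwner": "brands"}'


def rename_fields(usda_record, mapping):
    """Copy the mapped keys of a USDA record into a dict keyed by OFF field names."""
    off_record = {}
    for usda_key, off_key in mapping.items():
        if usda_key in usda_record:
            off_record[off_key] = usda_record[usda_key]
    return off_record


mapping = json.loads(MAPPING_JSON)
product = rename_fields({"gtinUpc": "0001", "description": "Beer", "extra": 1}, mapping)
```

Keys that are absent from the mapping are dropped here; fields needing a specific transformation would be handled in a separate step after the rename.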
def category(name):
    global categories
    if len(categories) == 0:
        with open('/home/ray/Projects/OFF/usda/USDA_fdc_categories.csv', newline='') as csvfile:
beware the absolute path ;-)
So, Stephane includes absolute paths for files on his computers but I cannot do it here? Humph. :--)
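One common way to avoid the absolute path flagged in this review thread is to resolve data files relative to the script itself. The helper below is a sketch only; the name `data_file` and its `base_dir` parameter are hypothetical and not part of the PR.

```python
from pathlib import Path


def data_file(name, base_dir=None):
    """Return the path of a data file, by default next to the running script."""
    if base_dir is None:
        # Directory containing this script, wherever the repository is checked out.
        base_dir = Path(__file__).resolve().parent
    return Path(base_dir) / name
```

With this, the `open(...)` call could take `data_file('USDA_fdc_categories.csv')` instead of a path under a particular home directory.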
Beer,1,en:Beers,
Amino Acid Supplements,1,,
Processed Cheese & Cheese Novelties,1,,
Sauces - Cooking (Shelf Stable),1,en:Sauces,
@rkiddy OK, so this is an export of https://docs.google.com/spreadsheets/d/1kO8r2OWLLRuqP-NLkAR8gliVELQHqir6HZ-4CIRs4cM/edit#gid=0, right?
That may have been where I got it. Stephane pointed it out to me in a slack thread. I am, by the way, not understanding that part of it yet. So, I am open to any suggestions.
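As one possible reading of the category file, the sketch below loads rows like the ones shown in the diff into a lookup table. It assumes, without confirmation from the thread, that the first column is the USDA category name and the third column is the matching OFF category tag (empty when no match has been chosen yet); the second column is ignored because its meaning is not stated.

```python
import csv
import io

# Sample rows copied from the diff above.
SAMPLE = """Beer,1,en:Beers,
Amino Acid Supplements,1,,
Sauces - Cooking (Shelf Stable),1,en:Sauces,
"""


def load_categories(csvfile):
    """Map USDA category name -> OFF category tag (None when unmapped)."""
    categories = {}
    for row in csv.reader(csvfile):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        usda_name, off_tag = row[0].strip(), row[2].strip()
        categories[usda_name] = off_tag or None
    return categories


categories = load_categories(io.StringIO(SAMPLE))
```

Loading the table once and passing it around would also remove the need for the `global categories` cache in the reviewed function.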
Fields potentially provided by producers: openfoodfacts_import.xlsx
…into usda_import_via_api
@alexgarel A few things about usda_to_off_fields.json:
I assume that this list is not complete, yes? For example, I see in a product that there are columns for "omega-3-fat", "omega-6-fat", "omega-9-fat" and others. I will go ahead and add them.
Are these the same columns as in the API? For example, there I do not see a "modified-date", but I do see a "last_modified_t" that is a Unix timestamp. So there may be a separate list for the API?
Would it be objectionable to add a "path" string? Perhaps this is how to get around the API difference issue. Perhaps "fiber-g-100g:paths" should be ["nutriment:fiber", "nutriment:fiber_100g", "nutriment:fiber_unit", "nutriment:fiber_value"]. See, for example, https://world.openfoodfacts.org/api/v3/product/3760232740033
Also, I am tempted to add a "datatype" field but will resist for now. For example, "code" is an arbitrary-length integer and "category" is a string, or perhaps a list of strings. And "to-create" is a boolean? But "modified-date" would seem to be a date, and I will (for now) add a "format" value here that might look like "YYYY-mm-dd HH:MM:ss" in UTC.
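For the "modified-date" versus "last_modified_t" difference, a conversion from the Unix timestamp to the proposed "YYYY-mm-dd HH:MM:ss" UTC format could be sketched as follows. The function name `format_modified` is hypothetical, and whether the import expects exactly this format is an assumption taken from the comment above.

```python
from datetime import datetime, timezone


def format_modified(last_modified_t: int) -> str:
    """Render a Unix timestamp (seconds) as 'YYYY-mm-dd HH:MM:ss' in UTC."""
    moment = datetime.fromtimestamp(last_modified_t, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")
```

Passing `tz=timezone.utc` explicitly avoids depending on the local time zone of the machine running the import.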
FYI, I was asked to look at 10 barcodes, but: Not in USDA and not in OFF: Not in USDA but in OFF: In USDA and OFF:
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information
Codecov Report
@@ Coverage Diff @@
## main #9083 +/- ##
=======================================
Coverage 47.95% 47.95%
=======================================
Files 65 65
Lines 20223 20223
Branches 4914 4914
=======================================
Hits 9697 9697
Misses 9271 9271
Partials 1255 1255
@rkiddy you are right. The fact is that this is a mapping to obtain a CSV file that can be imported through the import function. So we have two options:
@stephanegigandet any thoughts on that?
@alexgarel I have learned some things. The first thing is that no API coming out of the USDA has per-serving nutrition information. So, no matter what, we will have to calculate things. The second thing is a bit of a surprise. Comparing the information coming out of the API with the exported CSV file, the CSV data contains more digits; the API values are rounded. This suggests that the CSV file, not the API, should be the canonical source. But there is obviously a problem with processing updates to the information. So, what might you think of these suggestions?
What think you?
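The observation that the API values look like rounded CSV values could be checked mechanically with something like the sketch below. It assumes half-up rounding, which the thread does not confirm, and the function name `matches_after_rounding` is hypothetical. Values are passed as strings so that the number of digits in the API value is preserved.

```python
from decimal import Decimal, ROUND_HALF_UP


def matches_after_rounding(csv_value: str, api_value: str) -> bool:
    """True if rounding the CSV value to the API value's precision yields the API value."""
    api = Decimal(api_value)
    # Smallest step at the API value's precision, e.g. 0.01 for two decimals.
    quantum = Decimal(1).scaleb(api.as_tuple().exponent)
    rounded = Decimal(csv_value).quantize(quantum, rounding=ROUND_HALF_UP)
    return rounded == api
```

Running this over products present in both sources would show whether the two only differ by precision or whether some values genuinely disagree.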
I think it's best to keep things simple. We can use the CSV only, and update it every 6 months when there is a new CSV export.
I am not sure of how to start so I am just starting something and, if it is completely wrong, I can re-start.
I can see correspondences between many of the keys in the USDA structure and the OFF structure. How much am I supposed to add from the USDA data? How much is necessary to have in the OFF data? If some of the OFF data is created, are there processes that will create the other values that can be derived from it?
I cannot see the answers to these questions. If my Python seems gross, apologies. I spent many years doing Java and at times I am not pythonic.