feat: Packaging import through producers platform #8207

stephanegigandet · 2023-03-15T09:59:10Z

This PR enables producers to send us detailed packaging import data through CSV / Excel files uploaded on the producers platform.

The default is for producers to send fields like "packaging 1 shape", "packaging 1 material" etc. for each packaging component, with separate columns for each component.

At least one big producer (Les Mousquetaires / Intermarché) is sending us data with multiple lines for one product (one for each packaging component), so we now have a mechanism to support this as well.
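A minimal sketch of the multi-line layout, assuming each row carries a product code plus one component's shape and material: rows sharing a code are folded into a single product with one packaging entry per row. The column names (`code`, `packaging_shape`, `packaging_material`) and `group_packaging_rows` are hypothetical and do not come from the PR.

```perl
use strict;
use warnings;

# Hypothetical sketch: fold rows that share a product code into one product,
# with one packaging component per row. Column names are assumptions.
sub group_packaging_rows {
    my @rows = @_;
    my (%products, @order);
    foreach my $row (@rows) {
        my $code = $row->{code};
        if (not exists $products{$code}) {
            $products{$code} = {code => $code, packagings => []};
            push @order, $code;    # remember first-seen order of products
        }
        # Each line describes exactly one packaging component
        push @{$products{$code}{packagings}},
            {shape => $row->{packaging_shape}, material => $row->{packaging_material}};
    }
    return @products{@order};    # hash slice: products in first-seen order
}

my @products = group_packaging_rows(
    {code => 'product-a', packaging_shape => 'bottle', packaging_material => 'pet'},
    {code => 'product-a', packaging_shape => 'cap',    packaging_material => 'pp'},
    {code => 'product-b', packaging_shape => 'box',    packaging_material => 'cardboard'},
);
```

Keeping first-seen order means components stay in the order the producer listed them.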

Other changes included:

- Extended the packaging shapes and materials taxonomies to support values sent by some producers
- New packaging-shapes, packaging-materials and packaging-recycling facets, which are very useful for checking whether we correctly map producer data to our taxonomies. They are populated from the packagings data structure.
- New feature in Tags.pm: canonicalize_taxonomy_tag() now recognizes entries like "Parent / Child" and "Synonym 1 / Synonym 2" (mapped respectively to the child, and to the entry that matches both synonyms)
- Removed the import of packaging data from GS1 (we only had a single shape for the whole product, and the data was often incorrect). GS1 now has a new, much improved format for packaging data that we can add support for.
- Fix for "2 exact same packaging components added through API are conflated into one" #8197
- Some refactoring (e.g. deduplicating the regular expressions used to process imported data)
- A lot of tests
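The "Parent / Child" and "Synonym 1 / Synonym 2" rule can be illustrated on a toy taxonomy. This is only a sketch of the idea, not the Tags.pm implementation: the `%synonyms` and `%parent_of` tables and the `canonicalize_slash_entry` name are made up for the example.

```perl
use strict;
use warnings;

# Toy taxonomy (hypothetical data): synonyms map to a canonical id,
# and %parent_of records each entry's parent.
my %synonyms = (
    'plastic'                    => 'en:plastic',
    'plastique'                  => 'en:plastic',
    'pet'                        => 'en:pet-polyethylene-terephthalate',
    'polyethylene terephthalate' => 'en:pet-polyethylene-terephthalate',
);
my %parent_of = ('en:pet-polyethylene-terephthalate' => 'en:plastic');

# True if $ancestor appears somewhere above $id in the hierarchy
sub is_ancestor {
    my ($ancestor, $id) = @_;
    while (defined($id = $parent_of{$id})) {
        return 1 if $id eq $ancestor;
    }
    return 0;
}

# Resolve "A / B": same entry -> that entry; "Parent / Child" -> the child
sub canonicalize_slash_entry {
    my ($value) = @_;
    my $id = $synonyms{lc $value};
    return $id if defined $id;    # plain match, no splitting needed
    my @parts = split(/\s*\/\s*/, lc $value);
    return unless @parts == 2;
    my ($first_id, $second_id) = @synonyms{@parts};
    return unless defined $first_id and defined $second_id;
    return $first_id if $first_id eq $second_id;               # Synonym 1 / Synonym 2
    return $second_id if is_ancestor($first_id, $second_id);   # Parent / Child
    return;                                                     # no match
}
```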

alexgarel

Great work!

I'll let you look at my (minor) comments.

alexgarel · 2023-03-30T12:46:43Z

lib/ProductOpener/Import.pm

+	# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units
+	for (my $i = 1; $i <= 10; $i++) {


Suggested change

-	# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units
-	for (my $i = 1; $i <= 10; $i++) {
+	# packaging data is specified in the CSV file in columns named like packagings_1_number_of_units
+	# we currently search up to 10 components
+	$IMPORT_MAX_COMPONENTS = 10;
+	for (my $i = 1; $i <= $IMPORT_MAX_COMPONENTS; $i++) {
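For comparison, here is one way to write the reviewer's suggestion as a compile-time constant with `use constant` (a bare `$IMPORT_MAX_COMPONENTS = 10;` would also need a declaration under `use strict`). The `packagings_from_columns` helper is hypothetical; it only shows how `packagings_N_<field>` columns could be collected into components while skipping empty cells.

```perl
use strict;
use warnings;

# Upper bound on the number of packaging components searched in the columns
use constant IMPORT_MAX_COMPONENTS => 10;

# Hypothetical helper: collect packagings_N_<field> columns into components
sub packagings_from_columns {
    my ($row_ref) = @_;
    my @packagings;
    for (my $i = 1; $i <= IMPORT_MAX_COMPONENTS; $i++) {
        my %component;
        foreach my $key (keys %$row_ref) {
            # e.g. packagings_1_number_of_units -> field "number_of_units"
            if ($key =~ /^packagings_${i}_(\w+)$/) {
                my $value = $row_ref->{$key};
                $component{$1} = $value if defined $value and $value ne '';
            }
        }
        push @packagings, \%component if %component;    # skip empty components
    }
    return \@packagings;
}

my $packagings = packagings_from_columns({
    packagings_1_number_of_units => 1,
    packagings_1_shape           => 'bottle',
    packagings_2_shape           => 'cap',
    packagings_2_material        => '',
});
```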


alexgarel · 2023-03-30T12:52:50Z

lib/ProductOpener/Import.pm

+	if ($data_is_complete) {
+		# We seem to have complete data, replace existing data
+		$product_ref->{packagings} = \@input_packagings;


$data_is_complete only tells you that there is at least one complete line. Is this enough to consider the data complete?

Merging packaging data is very tricky and very likely to generate duplicates, so if the producer sends weights for at least one component, I think it's better to replace the whole structure.
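The replace-or-keep policy above could be sketched as follows. The `weight_specified` key and the `apply_input_packagings` name are assumptions, and the behavior when no weight is sent (here: keep existing data unless the product has none) is a guess, not something stated in this thread.

```perl
use strict;
use warnings;

# Hypothetical sketch: replace the whole packagings structure when the
# producer sent a weight for at least one component.
sub apply_input_packagings {
    my ($product_ref, $input_ref) = @_;
    my $has_weight = grep { defined $_->{weight_specified} } @$input_ref;
    if ($has_weight) {
        $product_ref->{packagings} = [@$input_ref];    # replace, never merge
        return 'replaced';
    }
    if (not @{$product_ref->{packagings} || []}) {
        $product_ref->{packagings} = [@$input_ref];    # nothing to conflict with
        return 'set';
    }
    return 'kept';    # assumption: avoid merging, which risks duplicates
}

my $product = {packagings => [{shape => 'bottle'}]};
my $status  = apply_input_packagings($product, [{shape => 'bottle', weight_specified => 30}]);
```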

alexgarel · 2023-03-30T12:54:17Z

lib/ProductOpener/ImportConvert.pm

+$empty_regexp = '(?:,|\%|;|_|°|-|\/|\\|\.|\s)*';
+$unknown_regexp = 'unknown|inconnu|inconnue|non renseigné(?:e)?(?:s)?|nr|n\/r';
+$not_applicable_regexp = 'n(?:\/|\\|\.|-)?a(?:\.)?|(?:not|non)(?: |-)applicable|no aplica';
+$none_regexp = 'none|aucun|aucune|aucun\(e\)';


really cool.
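A sketch of how patterns like these might be used to classify raw cell values (`classify_value` and the added `/i` flags are assumptions). One detail worth double-checking in the diff: inside a single-quoted Perl string, `\\` collapses to a single backslash, so `'...|\\|\.|...'` hands the regex engine `\|\.` (a literal pipe followed by a literal dot) rather than an alternative matching a backslash. Writing the patterns with `qr//`, as below, keeps `\\` as an escaped backslash.

```perl
use strict;
use warnings;
use utf8;

# Patterns adapted from the diff, written with qr// so that \\ reaches the
# regex engine as an escaped backslash. The /i flags are an assumption.
my $empty_regexp          = qr/(?:,|%|;|_|°|-|\/|\\|\.|\s)*/;
my $unknown_regexp        = qr/unknown|inconnu|inconnue|non renseigné(?:e)?(?:s)?|nr|n\/r/i;
my $not_applicable_regexp = qr/n(?:\/|\\|\.|-)?a(?:\.)?|(?:not|non)(?: |-)applicable|no aplica/i;
my $none_regexp           = qr/none|aucun|aucune|aucun\(e\)/i;

# Hypothetical helper: classify a raw cell, ignoring surrounding filler chars
sub classify_value {
    my ($value) = @_;
    return 'empty'          if $value =~ /^$empty_regexp$/;
    return 'unknown'        if $value =~ /^$empty_regexp(?:$unknown_regexp)$empty_regexp$/;
    return 'not_applicable' if $value =~ /^$empty_regexp(?:$not_applicable_regexp)$empty_regexp$/;
    return 'none'           if $value =~ /^$empty_regexp(?:$none_regexp)$empty_regexp$/;
    return 'value';
}
```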

lib/ProductOpener/Producers.pm

alexgarel · 2023-03-30T15:24:53Z

tests/integration/convert_and_import_excel_file.t


 	# TODO verify images
 	# clean csv and sto
 	unlink $inputs_dir . "eco-score-template.xlsx.csv";
 	unlink $inputs_dir . "test.columns_fields.sto";
-	rmdir remove_tree($outputs_dir);
+	#rmdir remove_tree($outputs_dir);


Shouldn't we remove the outputs dir?

Ah yes, I commented it out for debugging and forgot about it.
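Side note on the line being restored: `File::Path`'s `remove_tree()` already deletes the directory itself along with its contents and returns the number of files deleted, so wrapping it in `rmdir` just passes that count to `rmdir`. A minimal cleanup sketch, using a temporary directory as a stand-in for `$outputs_dir`:

```perl
use strict;
use warnings;
use File::Path qw(make_path remove_tree);
use File::Spec;
use File::Temp qw(tempdir);

# Stand-in for the test's outputs directory, with a nested file in it
my $base        = tempdir(CLEANUP => 1);
my $outputs_dir = File::Spec->catdir($base, 'outputs');
make_path(File::Spec->catdir($outputs_dir, 'sub'));
open(my $fh, '>', File::Spec->catfile($outputs_dir, 'sub', 'test.converted.csv'))
    or die "Cannot create file: $!";
print $fh "code\n";
close($fh);

# remove_tree() removes $outputs_dir and everything under it; no rmdir needed
remove_tree($outputs_dir, {error => \my $errors});
warn "Some entries could not be removed\n" if $errors && @$errors;
```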

alexgarel · 2023-03-30T15:26:15Z

tests/integration/expected_test_results/api_v2_product_read/get-existing-product.json

@@ -770,6 +769,23 @@
      "origin" : "france",
      "origin_en" : "france",
      "other_nutritional_substances_tags" : [],
+      "packaging_materials_tags" : [


Why do we have all those changes?

It's a new feature: I added new packaging_(shapes|materials|recycling) facets so that we can better see what producers send us, and what needs to be added to the taxonomies.
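A sketch of how such facets could be derived from the packagings structure, assuming each component stores its taxonomy ids under `shape`, `material` and `recycling` keys; `set_packaging_facets_tags` is a hypothetical name, not the function added in the PR.

```perl
use strict;
use warnings;

# Hypothetical sketch: fill one facet per component property, deduplicated
sub set_packaging_facets_tags {
    my ($product_ref) = @_;
    my %facet_for = (
        shape     => 'packaging_shapes_tags',
        material  => 'packaging_materials_tags',
        recycling => 'packaging_recycling_tags',
    );
    foreach my $property (sort keys %facet_for) {
        my %seen;
        # Keep defined values, first occurrence only, in component order
        my @tags = grep { defined $_ and not $seen{$_}++ }
            map { $_->{$property} } @{$product_ref->{packagings} || []};
        $product_ref->{$facet_for{$property}} = \@tags;
    }
    return;
}

my $product = {packagings => [
    {shape => 'en:bottle', material => 'en:pet-polyethylene-terephthalate', recycling => 'en:recycle'},
    {shape => 'en:cap',    material => 'en:pp-polypropylene'},
    {shape => 'en:bottle', material => 'en:glass'},
]};
set_packaging_facets_tags($product);
```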

alexgarel · 2023-03-30T15:30:10Z

tests/integration/convert_and_import_excel_file.t

 	create_sto_from_json($columns_fields_json, $columns_fields_file);

 	# step3 convert file
-	my $converted_file = $outputs_dir . "test.converted.csv";
-	my $conv_result;
-	($out, $err, $conv_result) = capture_ouputs(


I think I used capture_outputs because I was annoyed by the long logs when running tests, which make them really hard to read…

The issue I have with it is that if the code inside dies, then the test actually passes, and it's very hard to understand what happened.

If you can fix it and keep the capture it would be cool :-)
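One possible way to keep the capture while still failing the test when the code dies, using only core Perl (in-memory filehandles plus `eval`). `capture_and_rethrow` is a hypothetical helper, not the existing `capture_ouputs`; it only captures output printed from Perl itself, not from child processes.

```perl
use strict;
use warnings;

# Hypothetical helper: capture STDOUT/STDERR into scalars while running $code,
# then re-throw any exception so the calling test cannot pass silently.
sub capture_and_rethrow {
    my ($code) = @_;
    my ($out, $err) = ('', '');
    my (@result, $ok, $error);
    {
        # Localized handles are restored automatically when this block exits
        local *STDOUT;
        local *STDERR;
        open(STDOUT, '>', \$out) or die "Cannot capture STDOUT: $!";
        open(STDERR, '>', \$err) or die "Cannot capture STDERR: $!";
        $ok = eval { @result = $code->(); 1 };
        $error = $@ unless $ok;
        close(STDOUT);
        close(STDERR);
    }
    # Re-raise with the captured logs attached, to help understand what happened
    die "Captured code died: $error--- captured STDERR ---\n$err" unless $ok;
    return ($out, $err, @result);
}

my ($out, $err, $answer) = capture_and_rethrow(sub {
    print "converting file\n";
    print STDERR "some noisy log line\n";
    return 42;
});
```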

tests/unit/packaging.t

alexgarel · 2023-03-30T15:34:16Z

tests/unit/producers.t

+foreach my $l (qw(en fr es)) {
+	compare_to_expected_results(
+		init_fields_columns_names_for_lang($l),
+		$expected_result_dir . "/column_names_$l.json",
+		$update_expected_results
+	);
+}


😎 Cool addition.

Well, it made me realize that the generated hash tables are huge, and that we probably need a better solution than generating zillions of column synonyms. Maybe do some universal synonym normalization first.
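The normalization idea might look like this: normalize every column name once (lowercase, strip accents, collapse punctuation), so that a single key per spelling family replaces many generated synonyms. `normalize_column_name` and the mapping table are hypothetical.

```perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Hypothetical sketch: canonical form used for column name lookups
sub normalize_column_name {
    my ($name) = @_;
    my $s = NFD(lc $name);        # decompose accented letters
    $s =~ s/\p{Mn}//g;            # drop combining marks (é -> e)
    $s =~ s/[^a-z0-9]+/ /g;       # punctuation and underscores -> one space
    $s =~ s/^\s+|\s+$//g;         # trim
    return $s;
}

# One entry per language instead of one per spelling variant
my %column_to_field = (
    'packaging 1 material' => 'packagings_1_material',
    'emballage 1 materiau' => 'packagings_1_material',
);
```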


sonarcloud · 2023-04-03T16:38:33Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

No Coverage information
0.0% Duplication

stephanegigandet added 5 commits March 14, 2023 14:29

feat: import packaging data (start)

226573c

lint

49c8455

test column names to fields mapping

b6000d7

column names for packaging components

50fc0c5

update tests

203e03a

stephanegigandet requested a review from a team as a code owner March 15, 2023 09:59

github-actions bot assigned stephanegigandet Mar 15, 2023

stephanegigandet marked this pull request as draft March 15, 2023 09:59

stephanegigandet added 9 commits March 15, 2023 15:01

improve column names matching for packagings

88959ae

fix for weights

cc1c9b9

Merge branch 'main' into packaging-import

2c60d1e

lint

dd2359e

Merge branch 'main' into packaging-import

0cc5211

Merge branch 'main' into packaging-import

a2d83e6

refactor

3d1cafa

refactor and better tests

4bc3c61

lint

bba0882

packaging components on multiple lines import

d265a4f

github-actions bot added the STO label Mar 21, 2023

taxonomy changes for Les Mousquetaires / Intermarché

53e0e13

github-actions bot added the 💥 Merge Conflicts label Mar 23, 2023

stephanegigandet added 3 commits March 23, 2023 19:34

merge

5604e21

canonicalize_tags: match Synonym 1 / Synonym 2 and Parent / Child

818f62a

lint

7c0649a

alexgarel reviewed Mar 30, 2023


github-actions bot removed the 💥 Merge Conflicts label Mar 30, 2023

stephanegigandet and others added 7 commits March 30, 2023 18:28

Update lib/ProductOpener/Import.pm

1469dd6

Co-authored-by: Alex Garel <alex@garel.org>

Update lib/ProductOpener/Producers.pm

6a7f91f

Co-authored-by: Alex Garel <alex@garel.org>

remove debug print

dd211f2

Update lib/ProductOpener/Producers.pm

00522ca

Co-authored-by: Alex Garel <alex@garel.org>

Update lib/ProductOpener/Tags.pm

829d8f6

Co-authored-by: Alex Garel <alex@garel.org>

Update tests/unit/packaging.t

9cc4d5e

Co-authored-by: Alex Garel <alex@garel.org>

small fix

c2a2934

stephanegigandet changed the title ~~feat: Packaging import through producers platform (wip)~~ feat: Packaging import through producers platform Mar 30, 2023

stephanegigandet and others added 5 commits March 30, 2023 19:16

Update lib/ProductOpener/Producers.pm

57e2e25

Co-authored-by: Alex Garel <alex@garel.org>

Apply suggestions from code review

6a0a7eb

Co-authored-by: Alex Garel <alex@garel.org>

suggestions from code review

c0f3543

Merge branch 'packaging-import' of github.com:openfoodfacts/openfoodf…

cdb9f55

…acts-server into packaging-import

Merge branch 'main' into packaging-import

ff0fcdd

github-actions bot added the Text label Mar 31, 2023

stephanegigandet added 8 commits March 31, 2023 18:13

fix constant

f38baa3

update tests

7c7d259

update tests

1dd6508

update tests

2f5053f

fix tests

8bfd76a

update tests

d69e5cf

fix test

f04ba04

lint

67c0951

alexgarel approved these changes Apr 4, 2023


alexgarel merged commit bfc1fe2 into main Apr 4, 2023

alexgarel deleted the packaging-import branch April 4, 2023 09:19

openfoodfacts-bot mentioned this pull request Apr 4, 2023

chore(main): release 2.10.0 #8251

Merged

stephanegigandet mentioned this pull request Jun 27, 2023

2 exact same packaging components added through API are conflated into one #8197

Closed
