build: Add check_quality service for real-time data validation #10129

Open
IsaiahLevy wants to merge 2 commits into base: main

@IsaiahLevy IsaiahLevy commented Apr 14, 2024

What

This update introduces a new check_quality service within the APIProductServices module to provide real-time quality warnings and errors for nutrition tables and ingredient lists not directly associated with a product.

Screenshot

Related issue(s) and discussion

@IsaiahLevy IsaiahLevy requested a review from a team as a code owner April 14, 2024 19:40
@github-actions github-actions bot added the API Issues related to the Open Food Facts API. More specific labels exist & should be used (API WRITE…) label Apr 14, 2024
@openfoodfacts openfoodfacts deleted a comment from teolemon Apr 15, 2024
@openfoodfacts openfoodfacts deleted a comment from teolemon Apr 15, 2024
@codecov-commenter

Codecov Report

Attention: Patch coverage is 0%, with 59 lines in your changes missing coverage. Please review.

Project coverage is 49.59%. Comparing base (dc04d18) to head (2d0ac16).
Report is 239 commits behind head on main.

| Files | Patch % | Lines |
|---|---|---|
| lib/ProductOpener/APIProductServices.pm | 0.00% | 59 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #10129      +/-   ##
==========================================
+ Coverage   49.54%   49.59%   +0.05%     
==========================================
  Files          67       71       +4     
  Lines       20650    20996     +346     
  Branches     4980     5036      +56     
==========================================
+ Hits        10231    10414     +183     
- Misses       9131     9290     +159     
- Partials     1288     1292       +4     


Comment on lines 275 to 285
# Check if nutrition data is provided
if (exists $product_ref->{nutrition}) {
	ProductOpener::DataQuality::check_quality($product_ref->{nutrition});
	$updated_product_fields_ref->{nutrition_data_quality_tags} = $product_ref->{nutrition}->{data_quality_tags};
}

# Check if ingredient data is provided
if (exists $product_ref->{ingredients}) {
	ProductOpener::DataQuality::check_quality($product_ref->{ingredients});
	$updated_product_fields_ref->{ingredient_data_quality_tags} = $product_ref->{ingredients}->{data_quality_tags};
}
Member

check_quality needs the product_ref itself, not an individual field.

Member

I would call check_quality($product_ref) and then remove any false positives (warnings that come from missing fields).

Maybe we will have to categorize the quality errors/warnings in their respective taxonomies to say which fields each one depends on, so that we can filter them.

Or maybe we should split the check_quality function with more granularity, or even add a parameter to say which fields we want to check.
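
A minimal sketch of the first option (run every check, then filter by field), assuming each quality tag can be mapped to the fields it depends on. The `%fields_for_tag` mapping, the tag names, and `filter_quality_tags` are all hypothetical and only illustrate the idea; none of them exist in the codebase.

```perl
use strict;
use warnings;

# Hypothetical mapping from quality tags to the product fields they depend on.
# The tag names are made up for illustration; in practice this information
# could be stored as a property in the data quality taxonomies.
my %fields_for_tag = (
	'en:example-sugars-exceed-carbohydrates' => ['nutriments'],
	'en:example-ingredients-text-too-short' => ['ingredients_text'],
	'en:example-missing-product-name' => ['product_name'],
);

# Keep only the tags whose required fields were all supplied by the caller
sub filter_quality_tags {
	my ($tags_ref, $product_ref) = @_;
	my @kept_tags;
	foreach my $tag (@{$tags_ref}) {
		my $needed_fields_ref = $fields_for_tag{$tag};
		# A tag with no known field dependency cannot be attributed, so skip it
		next if not defined $needed_fields_ref;
		my $missing_count = grep {!exists $product_ref->{$_}} @{$needed_fields_ref};
		push @kept_tags, $tag if $missing_count == 0;
	}
	return \@kept_tags;
}

# The caller only sent nutrition data, so the product name tag is a false positive
my $product_ref = {nutriments => {sugars_100g => 12, carbohydrates_100g => 10}};
my $all_tags_ref = ['en:example-sugars-exceed-carbohydrates', 'en:example-missing-product-name'];
my $filtered_tags_ref = filter_quality_tags($all_tags_ref, $product_ref);
print join(", ", @{$filtered_tags_ref}), "\n";
```

Dropping tags with no known dependency is a conservative choice for this sketch; keeping them instead would be equally defensible if unattributed warnings are still useful to the caller.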

Comment on lines 230 to 241
if ($service eq 'check_quality') {
	# Create a temporary product reference for quality checks
	my $temp_product_ref = {};
	$temp_product_ref->{nutrition} = $request_body_ref->{nutrition} if defined $request_body_ref->{nutrition};
	$temp_product_ref->{ingredients} = $request_body_ref->{ingredients} if defined $request_body_ref->{ingredients};

	# Call the check_quality service, passing the temporary product ref
	&$service_function($temp_product_ref, $request_ref->{updated_product_fields});

	# Integrate the quality check results back into the main product_ref
	$product_ref->{data_quality_tags} = $temp_product_ref->{data_quality_tags} if defined $temp_product_ref->{data_quality_tags};
} else {
Member

The idea of this pattern is to avoid service-specific code…

Also, we will put the ingredients and nutrition fields under the "product" field, so that product_ref works as is.
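
To illustrate the pattern, here is a rough sketch of a dispatch loop with no per-service branches, assuming a request body with a "services" list and the input data under "product". The two `example_*` services are stand-ins, `run_services` is a hypothetical name, and the "services" key is an assumption; the real code in APIProductServices.pm may differ.

```perl
use strict;
use warnings;

# Stand-in services: each takes the product ref and a ref to the hash of updated fields
sub example_analyze_ingredients_service {
	my ($product_ref, $updated_product_fields_ref) = @_;
	$product_ref->{ingredients_analysis_tags} = ['en:example-analysis'];
	$updated_product_fields_ref->{ingredients_analysis_tags} = 1;
	return;
}

sub example_check_quality_service {
	my ($product_ref, $updated_product_fields_ref) = @_;
	$product_ref->{data_quality_tags} = ['en:example-quality-warning'];
	$updated_product_fields_ref->{data_quality_tags} = 1;
	return;
}

my %service_functions = (
	analyze_ingredients => \&example_analyze_ingredients_service,
	check_quality => \&example_check_quality_service,
);

# Every service is called the same way, so adding a service only means adding a table entry
sub run_services {
	my ($request_body_ref) = @_;
	my $product_ref = $request_body_ref->{product} // {};
	my %updated_product_fields;
	my @errors;
	foreach my $service (@{$request_body_ref->{services} // []}) {
		my $service_function = $service_functions{$service};
		if (defined $service_function) {
			$service_function->($product_ref, \%updated_product_fields);
		}
		else {
			push @errors, "unknown service: $service";
		}
	}
	return ($product_ref, \%updated_product_fields, \@errors);
}

my ($result_product_ref, $updated_ref, $errors_ref) = run_services(
	{
		services => ['check_quality', 'not_a_service'],
		product => {ingredients_text => 'sugar, cocoa butter'},
	}
);
print join(", ", sort keys %{$updated_ref}), "\n";
print join(", ", @{$errors_ref}), "\n";
```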

}
}
return $error;
my $response_ref = $request_ref->{api_response};
Contributor

For Perl files we use tabs; they shouldn't be converted to spaces. (Depending on your editor, it might pick this up from the .editorconfig file.)

@@ -107,69 +107,99 @@ my %service_functions = (
extend_ingredients => \&ProductOpener::Ingredients::extend_ingredients_service,
estimate_ingredients_percent => \&ProductOpener::Ingredients::estimate_ingredients_percent_service,
analyze_ingredients => \&ProductOpener::Ingredients::analyze_ingredients_service,
check_quality => \&check_quality_service,
Contributor

It would be best to put the check_quality_service function in the DataQuality.pm module, like the other services.

In fact, we could change the existing check_quality function so that it behaves like a service (you can look at other service functions: it expects a product_ref as the first argument, and a reference to a hash of updated fields as the second argument), and rename it to check_quality_service().
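
As a sketch of that service shape, the example below wraps a stand-in `check_quality` in a `check_quality_service` that takes the two arguments described above. The stand-in body, the example tag name, and the list of reported fields are assumptions for illustration; the real function in DataQuality.pm runs many more checks.

```perl
use strict;
use warnings;

# Stand-in for the existing check_quality(): it writes quality tags onto the product
sub check_quality {
	my ($product_ref) = @_;
	$product_ref->{data_quality_tags} = [];
	if (not defined $product_ref->{ingredients_text}) {
		# Illustrative tag name, not a real taxonomy entry
		push @{$product_ref->{data_quality_tags}}, 'en:example-ingredients-missing';
	}
	return;
}

# Service wrapper: first argument is the product ref, second is a ref to
# the hash of fields that the service created or updated
sub check_quality_service {
	my ($product_ref, $updated_product_fields_ref) = @_;
	check_quality($product_ref);
	# Assumed list of fields to report back; extend it as needed
	foreach my $field (qw(data_quality_tags)) {
		$updated_product_fields_ref->{$field} = 1 if exists $product_ref->{$field};
	}
	return;
}

my $product_ref = {nutriments => {sugars_100g => 5}};
my %updated_product_fields;
check_quality_service($product_ref, \%updated_product_fields);
print join(", ", @{$product_ref->{data_quality_tags}}), "\n";
```

With this signature the function can be registered in `%service_functions` like the other services, and the caller can use the updated-fields hash to decide what to return in the response.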

@stephanegigandet
Contributor

Hi @IsaiahLevy, thanks a lot for the PR.
Have you been able to run it successfully locally?
Could you add some tests in tests/integration/api_v3_product_services.t to verify that it works as expected?
Thank you!

sonarcloud bot commented May 5, 2024

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication


@github-actions github-actions bot added the 💥 Merge Conflicts label May 16, 2024
Labels
API Issues related to the Open Food Facts API. More specific labels exist & should be used (API WRITE…) config 🧽 Data quality https://wiki.openfoodfacts.org/Quality 🖼️ Images 💥 Merge Conflicts 💥 Merge Conflicts Products 🧪 tests
Projects
Status: In progress
Development

Successfully merging this pull request may close these issues.

Create an API route to provide Quality Warnings and Errors on a given nutrition table/ingredients list
4 participants