Split insight/prediction (part 2) #556

mithridatea · 2022-01-24T10:04:13Z

2nd part of insight refactoring (see #544).

Create a new Prediction table, store all predictions in this table. Keep the insight import logic as it is.

All latent insights will be moved to the Prediction table, which should greatly reduce the API latency issue described in #567.

Latent insight migration script, to be run after the version is deployed:

WITH to_copy_rows AS (
    SELECT
        barcode,
        type,
        data,
        timestamp,
        value_tag,
        value,
        source_image,
        automatic_processing,
        server_domain,
        predictor
    FROM
        product_insight
    WHERE
        latent IS TRUE
)
INSERT INTO
    prediction(
        barcode,
        type,
        data,
        timestamp,
        value_tag,
        value,
        source_image,
        automatic_processing,
        server_domain,
        predictor
    )
SELECT
    *
FROM
    to_copy_rows;

DELETE FROM
    product_insight
WHERE
    latent IS TRUE;
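
The script above runs the copy and the delete as two separate statements. If the migration is interrupted between them, the latent rows would exist in both tables, and a rerun would copy them twice. One possible safeguard is to run both statements in a single transaction. The sketch below illustrates that idea with Python's built-in sqlite3 module and an in-memory database so that it is self-contained. The use of SQLite, the reduced column set, and the names `build_demo_db` and `migrate_latent_insights` are assumptions made for illustration, not part of the project.

```python
import sqlite3

# Reduced column set for illustration; the real script copies more columns.
COLUMNS = "barcode, type, value_tag, predictor"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with three insights, two of them latent."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product_insight (
            barcode TEXT, type TEXT, value_tag TEXT, predictor TEXT,
            latent INTEGER
        );
        CREATE TABLE prediction (
            barcode TEXT, type TEXT, value_tag TEXT, predictor TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO product_insight VALUES (?, ?, ?, ?, ?)",
        [
            ("123", "label", "en:organic", "regex", 1),
            ("123", "category", "en:teas", "ocr", 1),
            ("456", "label", "en:vegan", "regex", 0),
        ],
    )
    conn.commit()
    return conn


def migrate_latent_insights(conn: sqlite3.Connection) -> int:
    """Copy latent insights to prediction and delete them in one transaction.

    Returns the number of rows copied.
    """
    # `with conn` commits if the block succeeds and rolls back on an exception.
    with conn:
        # SQLite stores booleans as integers, hence `latent = 1`.
        cursor = conn.execute(
            f"INSERT INTO prediction ({COLUMNS}) "
            f"SELECT {COLUMNS} FROM product_insight WHERE latent = 1"
        )
        moved = cursor.rowcount
        conn.execute("DELETE FROM product_insight WHERE latent = 1")
    return moved
```

With PostgreSQL the same effect would normally be obtained by wrapping the two statements in `BEGIN; ... COMMIT;`.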

codecov · 2022-01-24T14:22:46Z

Codecov Report

Merging #556 (1e63ca2) into master (30dc472) will increase coverage by 7.05%.
The diff coverage is 60.64%.

@@            Coverage Diff             @@
##           master     #556      +/-   ##
==========================================
+ Coverage   37.68%   44.73%   +7.05%     
==========================================
  Files         103       96       -7     
  Lines        7438     6981     -457     
==========================================
+ Hits         2803     3123     +320     
+ Misses       4635     3858     -777

Impacted Files	Coverage Δ
robotoff/cli/insights.py	`0.00% <0.00%> (ø)`
robotoff/health.py	`0.00% <0.00%> (ø)`
robotoff/metrics.py	`25.80% <0.00%> (-1.32%)`	⬇️
...prediction/category/prediction_from_ocr/cleaner.py	`95.65% <ø> (ø)`
...ediction/category/prediction_from_ocr/constants.py	`100.00% <ø> (ø)`
...ediction/category/prediction_from_ocr/predictor.py	`95.83% <ø> (ø)`
robotoff/prediction/langid.py	`76.74% <ø> (ø)`
robotoff/prediction/object_detection/__init__.py	`100.00% <ø> (ø)`
robotoff/prediction/object_detection/download.py	`0.00% <ø> (ø)`
...rediction/object_detection/utils/label_map_util.py	`22.22% <ø> (ø)`
... and 74 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 89c1c13...1e63ca2.

alexgarel

These are mostly style comments, plus one condition that may be missing a negation.

I see you removed the __init__.py files from the tests structure. I reread https://docs.pytest.org/en/6.2.x/goodpractices.html#choosing-a-test-layout-import-rules and I'm OK with that, but I will add a check to make tests that there are no duplicate filenames!
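
For context: without __init__.py files, pytest's default import mode imports each test module under its bare file name, so two test files with the same name in different directories conflict. A check like the one mentioned could be sketched roughly as follows. The function name and the `test_*.py` naming pattern are assumptions, and this is not the check that was actually added to the Makefile.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_test_filenames(root: Path) -> dict[str, list[Path]]:
    """Return the test file names that occur more than once under `root`."""
    by_name: dict[str, list[Path]] = defaultdict(list)
    # Group every test file by its bare file name, ignoring its directory.
    for path in sorted(root.rglob("test_*.py")):
        by_name[path.name].append(path)
    # Keep only the names shared by at least two files.
    return {name: paths for name, paths in by_name.items() if len(paths) > 1}
```

A make target could call `find_duplicate_test_filenames(Path("tests"))` and fail the build when the result is not empty.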

sonarcloud · 2022-01-28T13:20:34Z

Kudos, SonarCloud Quality Gate passed!

0 Bugs
0 Vulnerabilities
1 Security Hotspot
28 Code Smells

0.0% Coverage
0.0% Duplication

alexgarel

OK, let's go!

alexgarel · 2022-01-31T09:53:29Z

I tested the DB modifications in preprod, but adjusted the script to drop the indexes beforehand and recreate them afterwards, because this is much more efficient.

Here is the script: https://gist.github.com/alexgarel/c139f5dd8ed79506bde564a38b6a437a
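
The gist is not reproduced here. As a rough illustration of the general pattern described (drop the index, bulk insert, rebuild the index once at the end), here is a sketch using Python's built-in sqlite3 module. The index name `prediction_barcode_idx`, the indexed column, the two-column table, and the use of SQLite are all assumptions chosen to keep the example self-contained.

```python
import sqlite3

INDEX_NAME = "prediction_barcode_idx"  # hypothetical index name


def bulk_insert_without_index(conn: sqlite3.Connection, rows) -> None:
    """Insert `rows` into prediction with the index dropped, then rebuild it."""
    # Without the index, each inserted row no longer triggers an index update.
    conn.execute(f"DROP INDEX IF EXISTS {INDEX_NAME}")
    conn.executemany(
        "INSERT INTO prediction (barcode, type) VALUES (?, ?)", rows
    )
    # Rebuild the index in a single pass over the final table contents.
    conn.execute(f"CREATE INDEX {INDEX_NAME} ON prediction (barcode)")
    # Commit the inserted rows and the rebuilt index.
    conn.commit()
```

The trade-off is that queries relying on the index are slow while it is missing, so this kind of migration is best run when the table is not under load.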

Mithridatea and others added 17 commits January 18, 2022 15:12

Use import_insights everywhere

e4a4d2c

Instead of manually creating InsightImporter with InsightImporterFactory

Move tests.unit.insights.ocr to tests.unit.prediction.ocr

c5f9e8c

Remove IngredientSpellcheckImporter and BaseInsightImporter classes

142c5c3

It makes the insight import process easier to read by removing the BaseInsightImporter class. ingredient_spellcheck insights are currently not generated.

Remove _process_product_insights method

2b76cb7

It makes the import process simpler to understand

Add is_valid_product_predictions to check ProductPredictions validity

01df964

Save all latent insight in new Prediction table

68d3b99

Latent insights don't exist anymore.

Rename GroupedByOCRInsights into GroupedByBarcodeInsights

e365048

Simplify insight import process

0668b59

Split completely insight generation (all in process_insights) and insight import

Remove legacy migration script

66f709d

Simplify validator processing after latent insight deletion

e52427a

Update latent.py to use predictions instead of insights as input

ae2c5f1

Cosmetics

e49875d

Simplify get_insights function

0b49462

Remove additional legacy code after latent insight removal

8823719

Update generate_nutrition_image_insights after latent insight removal

e4d64a5

Update comment in scheduler file

78a02d9

Format code with black and isort

1a301d0

mithridatea changed the title ~~Split prediction insight 2~~ Slit insight/prediction (part 2) Jan 24, 2022

Mithridatea added 2 commits January 24, 2022 11:09

Merge branch 'master' into split-prediction-insight-2

fa71078

Fix branch after upstream changes in master

d7bf517

mithridatea requested a review from alexgarel January 24, 2022 10:19

mithridatea marked this pull request as ready for review January 24, 2022 10:20

Mithridatea and others added 3 commits January 24, 2022 12:34

Improve docstrings in insights/importer.py

5131f60

Format code with black and isort

99971fd

Fix tests in test_product_updated

29358b9

mithridatea changed the title ~~Slit insight/prediction (part 2)~~ Split insight/prediction (part 2) Jan 24, 2022

Merge branch 'master' into split-prediction-insight-2

36e5389

alexgarel reviewed Jan 27, 2022

robotoff/app/core.py

alexgarel requested changes Jan 27, 2022

mithridatea and others added 4 commits January 28, 2022 11:14

Add docstring to InsightImporter.process_insights

3c4ff7e

Co-authored-by: Alex Garel <alex@garel.org>

fixup: Format Python code with Black

43dd208

Fix bug in is_duplicated_prediction

be840a8

Co-authored-by: Alex Garel <alex@garel.org>

Improve performance of Prediction import

1e63ca2

Retrieve existing prediction in batch (same barcode).
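
One way to read this commit note: instead of one lookup per incoming prediction to check whether it is already stored, all stored predictions for the barcode are loaded once and compared in memory. The sketch below illustrates only that idea. The simplified `Prediction` dataclass, the fields used to detect duplicates, and the function names are hypothetical and are not taken from the Robotoff code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Prediction:
    """Hypothetical, simplified stand-in for a stored prediction."""

    barcode: str
    type: str
    value_tag: str
    source_image: str


def filter_new_predictions(
    barcode: str,
    candidates: Iterable[Prediction],
    fetch_existing: Callable[[str], Iterable[Prediction]],
) -> list[Prediction]:
    """Return the candidates that are not stored yet for `barcode`."""
    # One fetch for the whole barcode instead of one query per candidate.
    existing = set(fetch_existing(barcode))
    new: list[Prediction] = []
    for candidate in candidates:
        if candidate not in existing:
            new.append(candidate)
            # Also filters out duplicates inside the incoming batch.
            existing.add(candidate)
    return new
```

Here `fetch_existing` stands for whatever query returns the stored predictions of a product; it is called exactly once per barcode.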

mithridatea requested a review from alexgarel January 28, 2022 13:19

alexgarel approved these changes Jan 28, 2022

mithridatea merged commit a2b09e1 into master Jan 28, 2022

mithridatea deleted the split-prediction-insight-2 branch January 28, 2022 14:51
