Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DFC] match taxons to dfc product types #11817

Conversation

rioug
Copy link
Collaborator

@rioug rioug commented Nov 16, 2023

ℹ️ This is a funded feature. Please use the Clockify project with the following name when working on this, including review and test: #9170 OFN DFC Products

What? Why?

Upstream patch has been merged datafoodconsortium/connector-codegen#10, which fix previously blocking issue: datafoodconsortium/connector-ruby#13. We'll have to remove the parser monkey patching once the changes are releases.

To make matching taxon to DFC product type, we added a "dfc id" field so the user can specify which DFC product type a given taxon refers to. "dfc id" are meant to refers to a DFC product type URI defined in https://github.com/datafoodconsortium/taxonomies/releases/latest/download/productTypes.rdf, you can browse the file here . For instance, if you wanted to link the "Bread" category to the "DFC Bread", you would use the following URI : https://github.com/datafoodconsortium/taxonomies/releases/latest/download/productTypes.rdf#bread
The "dfc id" is then used to pick the correct DFC product type when the API returns a product.
Similarly, when creating/updating a product, the DFC api will try to find the appropriate taxon using the taxon's dfc id.

Technical note:

Product types (and other SKOSConcepts) are modelled as Class method on the connector, which return a SKOSConcept object. IE: to access the "soft drink" product type you call connector.PRODUCT_TYPES.DRINK.SOFT_DRINK. For ease of retrieving the SKOSConcept we want, we first parse the Product type tree and store it in SuppliedProductBuilder::PRODUCT_TYPES hash, with the path to access it, for instance :
PRODUCT_TYPES.DRINK.SOFT_DRINK.LEMONADE is stored as :
SuppliedProductBuilder::PRODUCT_TYPES = { lemonade: [:DRINK, :SOFT_DRINK] }

What should we test?

As a super admin

  • visit Configuration -> Taxonomies
  • Edit the product Taxon
  • Right click on your chosen taxon -> Edit

NOTE "dfc name" has been renamed "dfc id"

taxon_dfc_name

  • Add a dfc id click update
    -> dfc id is saved
  • Right click on the same taxon -> Edit
  • Update the dfc id
    -> dfc id is saved

Release notes

Changelog Category (reviewers may add a label for the release notes):

  • User facing changes
  • API changes (V0, V1, DFC or Webhook)
  • Technical changes only
  • Feature toggled

The title of the pull request will be included in the release notes.

@rioug rioug self-assigned this Nov 16, 2023
@rioug
Copy link
Collaborator Author

rioug commented Nov 16, 2023

As explained here datafoodconsortium/connector-ruby#13 , only the narrowest (more specific one) product type is modelled as a DataFoodConsortium::Connector::SKOSConcept, other broader product types are modelled as ataFoodConsortium::Connector::SKOSInstance.
The DfcLoader.connector.export requires a DataFoodConsortium::Connector::SKOSConcept for the property dfc-b:hasType, meaning we can only use the most specific produc type:

  • we can't use Drink -> Soft Drink,
  • we can only use the most specific one Drink -> Soft Drink -> Lemonade

We also need broaders, narrowers and prefLabels to be populated for DataFoodConsortium::Connector::SKOSConcept so we can easily traverse the product type try for matching purposes

As a starting point this is where the DataFoodConsortium::Connector::SKOSInstance is added to the parent, so we can call something like : DfcLoader.connector.PRODUCT_TYPES.DRINK
https://github.com/datafoodconsortium/connector-ruby/blob/330d97f0a94696ed51919231e8678e05d67f0021/lib/datafoodconsortium/connector/skos_parser.rb#L126C1-L128

@rioug rioug force-pushed the 10809-match-taxons-to-DFC-product-types branch from 287876b to 792834d Compare December 15, 2023 03:20
@rioug rioug changed the title [WIP][DFC] match taxons to dfc product types [DFC] match taxons to dfc product types Dec 15, 2023
@rioug rioug added the api changes These pull requests change the API and can break integrations label Dec 15, 2023
@rioug rioug marked this pull request as ready for review December 15, 2023 04:39
@rioug rioug requested a review from mkllnk December 15, 2023 04:40
Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great!

I'm tempted to approve but I think that we should change to using a URI instead of a name. That will avoid guessing and allows for name changes.

I also have a pull request on the connector which will replace the guessing of setter method names. I wonder if we need to tweak the import feature there as well:

@@ -11,9 +11,11 @@
id: 90_000,
supplier: enterprise, name: "Pesto", description: "Basil Pesto",
variants: [variant],
primary_taxon: taxon
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you change the spec data, it's good to run swagger to document that change in the same commit.

Comment on lines 113 to 114
dfc_name = supplied_product.productType.prefLabels[:en].downcase
taxon = Spree::Taxon.find_by(dfc_name: )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it would be more reliable to store the URL as dfc_id instead of the name. Labels can change. And we may want to support additional vocabularies one day.

def self.taxon(supplied_product)
# We use english locale, might need to make this configurable
dfc_name = supplied_product.productType.prefLabels[:en].downcase
taxon = Spree::Taxon.find_by(dfc_name: )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we look for the presence of broader types? If a product comes in as wine and OFN doesn't have wine then it could still be classified as drink, right? That would mean traversing through the broader terms. Potentially, that could be in one SQL query if we can sort by broadness. That may be easier in Ruby. Anyway, this could be an improvement later on.

@@ -0,0 +1,5 @@
class AddDfcNameToSpreeTaxons < ActiveRecord::Migration[7.0]
def change
add_column :spree_taxons, :dfc_name, :string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagined a new table that can attach a DFC link to anything. But it may be good to start simple like this. We can always generalise later and move the data into a new table.

@rioug
Copy link
Collaborator Author

rioug commented Dec 18, 2023

I also have a pull request on the connector which will replace the guessing of setter method names. I wonder if we need to tweak the import feature there as well:

I had a quick look, and I think it should work as it is but I'll double check that.

@rioug rioug requested a review from mkllnk December 18, 2023 05:47
db/schema.rb Outdated
@@ -924,6 +925,7 @@
t.boolean "show_api_key_view", default: false, null: false
t.string "provider"
t.string "uid"
t.datetime "terms_of_service_accepted_at"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
t.datetime "terms_of_service_accepted_at"

@rioug rioug requested a review from mkllnk December 22, 2023 03:56
Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good refactor.

Am I right to assume that this will become easier when the connector has broader and narrower defined?

type = call_dfc_product_type(current_stack)

id = type.semanticId
@product_types[id] = current_stack
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you store the type directly? Then you don't need to resolve the stack when looking up types by id.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure to be honest, it makes so much sense now that you are pointing it out 😄

@rioug
Copy link
Collaborator Author

rioug commented Dec 22, 2023

Am I right to assume that this will become easier when the connector has broader and narrower defined?

With the patched skos parser, skos concept now have broader and narrower populated, but they return an array of URI. So it's just easier to do something like this DfcLoader.connector.PRODUCT_TYPES.DRINK.SOFT_DRINK.method(false) to get a list of narrower product types, because you can then call the result straight away without having to parse/transform the URI to get the matching method name.
The way they implemented it looks like what Maxime was talking about here for the type script version : datafoodconsortium/connector-codegen#10 (review) , but I wonder if it would be easier to have some sort of store like the PHP version, so then we could just query the connector like : DfcLoader.connector.store.fetch("https://test.com/productTypes.rdf#lemonade") so we don't have to do the extra matching work.

But that means your suggested improvement about checking broader terms should be easily feasible.

@rioug rioug requested a review from mkllnk December 22, 2023 05:36
Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement. 👍

@rioug rioug requested a review from mkllnk January 4, 2024 04:51
Comment on lines 132 to 134
).sub(
"dfc-pt:",
"https://github.com/datafoodconsortium/taxonomies/releases/latest/download/productTypes.rdf#"
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mkllnk I fixed the importer based on your comment here : datafoodconsortium/connector-codegen#12 (comment)

Copy link
Member

@mkllnk mkllnk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

Now we just have to reconcile this with the code generator... 🙃

@mkllnk mkllnk force-pushed the 10809-match-taxons-to-DFC-product-types branch 4 times, most recently from ab61505 to e9399ce Compare January 5, 2024 03:35
@drummer83 drummer83 force-pushed the 10809-match-taxons-to-DFC-product-types branch 2 times, most recently from 49f569a to 9a48e7f Compare January 10, 2024 11:49
@drummer83 drummer83 self-assigned this Jan 10, 2024
@drummer83 drummer83 added pr-staged-fr staging.coopcircuits.fr pr-staged-uk staging.openfoodnetwork.org.uk labels Jan 10, 2024
rioug and others added 20 commits January 22, 2024 10:12
It relies on having dfc_name populated on the given taxon.
Matching is as follow:
 - parse the DFC product types and store in PRODUCT_TYPES if needed
 - match the dfc_name against PRODUCT_TYPES
 - call the method returned on the DFC connector
This is to test the product type matching is working properly
Ie: you can now use "Soft drink" instead of "soft_drink", case is also
ignored
It keeps SuppliedProductBuilder and move all the matching logic to its
own class
Doesn't try to load skos concept if we are not dealing with an object
that doesn't refers to a skos concept
The given productType.rdf file doesn't give us any context for `dfc-pt`,
so there was no reason to do that.
We still need to do some substitution in the importer, as some times
we are given `dfc-pt` as input data.
The `dfc-b:hasType` value can only be parsed as object id if the context
contains:

```
    "dfc-b:hasType":{
      "@type":"@id"
    },
```

The standard context includes this and it's easier to use. Now that the
URIs of product types are correctly resolved, we don't need to
substitute the URI manually.

Also dropped an old unneeded spec for backwards compatibility.
This was missed in a previous refactor
@rioug rioug force-pushed the 10809-match-taxons-to-DFC-product-types branch from 45268fe to 96a0100 Compare January 21, 2024 23:13
Currently it's not possibel to compare two
`DataFoodConsortium::Connector::SKOSConcept` or
two `VirtualAssembly::Semantizer::SemanticObject with` `==`.

Related to : https://github.com/assemblee-virtuelle/semantizer-ruby/pull/2/files
@rioug rioug requested a review from mkllnk January 22, 2024 00:53
@drummer83 drummer83 self-assigned this Jan 24, 2024
@drummer83 drummer83 added the pr-staged-uk staging.openfoodnetwork.org.uk label Jan 24, 2024
@drummer83
Copy link
Contributor

Hi @rioug,
Thanks for fixing the conflict and providing such detailed testing information.

Before staging

image

After staging

We now have the field DFC-URI available - note that it's neither called DFC ID nor DFC Name - as I had expected from the opening post.
Anyway, I can add, edit and save the value.

image

I have also tried the "bread" example. It worked well when I found out that the request requires the variant id, not the product id. I found this misleading because the URL actually has "supplied_product" in it - not sure if this could benefit from some clarification.
image

In the end it worked well. 👍
Thanks again! I will merge this one! 🚀

@drummer83 drummer83 merged commit 7731317 into openfoodfoundation:master Jan 24, 2024
52 checks passed
@drummer83 drummer83 removed the pr-staged-uk staging.openfoodnetwork.org.uk label Jan 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api changes These pull requests change the API and can break integrations
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Match product taxons to DFC product types
4 participants