codemetapy fails to merge triples for the same person #43

Closed
apirogov opened this issue Jun 16, 2023 · 3 comments
Labels
invalid This doesn't seem right

Comments

apirogov commented Jun 16, 2023

File in1.json:

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "author": [
    {
      "@id": "https://orcid.org/0000-1234-5678-9101",
      "@type": "Person",
      "familyName": "Doe",
      "givenName": "John"
    }
  ],
  "codeRepository": "https://github.com/example/repository",
  "description": "an example",
  "name": "example",
  "version": "0.1.0"
}

File in2.json:

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "author": [
    {
      "email": "john.doe@example.com",
      "@type": "Person",
      "familyName": "Doe",
      "givenName": "John"
    }
  ],
  "codeRepository": "https://github.com/example/repository",
  "description": "an example",
  "name": "example",
  "version": "0.1.0"
}

Run codemetapy in1.json in2.json

Expected result:

Person will have both email and orcid

Actual result:

Person has only email (when passed in this order) or only orcid (when passing in2.json before in1.json)

@apirogov apirogov changed the title codemetapy files to merge triples for the same person codemetapy fails to merge triples for the same person Jun 16, 2023
@broeder-j

@apirogov: The current compose in codemetapy is a simple overwrite at the triple level: triples that are not in the new graph are removed, and then there is an RDF merge. There is no entity resolution implemented in codemetapy, but this is also stated in the README.

I can imagine that one can do better.

A simple rdf merge could already be better (in some cases), but would not be enough, since it only works for objects with identifiers in both graphs.
It would, however, at least merge the email if the second person also had an ORCID as identifier, as a usual RDF merge does. Please check whether this is the case. I am not sure how blank nodes are handled in detail in codemetapy.
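
To illustrate the point about identifiers and blank nodes, here is a toy sketch that models a graph as a plain Python set of triples. It uses neither codemetapy nor a real RDF library; the `rdf_merge` helper and the `_:b0` label are made up for this example, and it says nothing about how codemetapy itself treats blank nodes.

```python
# Toy model: a graph is a set of (subject, predicate, object) tuples.
# Terms starting with "_:" stand in for blank nodes, which are only
# meaningful inside the graph they came from.
ORCID = "https://orcid.org/0000-1234-5678-9101"


def rdf_merge(g1, g2):
    """Union two graphs, renaming blank nodes of g2 so they stay distinct."""
    def rename(term):
        return term + "_g2" if term.startswith("_:") else term

    return g1 | {(rename(s), p, rename(o)) for s, p, o in g2}


def properties_of(graph, subject):
    """Return the set of predicates attached to one subject."""
    return {p for s, p, o in graph if s == subject}


# First input: the person is identified by an ORCID IRI.
g1 = {(ORCID, "givenName", "John"), (ORCID, "familyName", "Doe")}

# Second input, variant 1: the person carries the same IRI.
g2_with_id = {(ORCID, "email", "john.doe@example.com")}

# Second input, variant 2: no @id, so the person is a blank node.
g2_blank = {
    ("_:b0", "givenName", "John"),
    ("_:b0", "familyName", "Doe"),
    ("_:b0", "email", "john.doe@example.com"),
}

merged_same_id = rdf_merge(g1, g2_with_id)
merged_blank = rdf_merge(g1, g2_blank)
```

With a shared IRI the email ends up on the same node as the names. With a blank node the merged graph contains two separate persons, and deciding that they are the same individual would require entity resolution.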

proycon (Owner) commented Sep 13, 2023

> The current compose in codemetapy is a simple overwrite at the triple level: triples that are not in the new graph are removed, and then there is an RDF merge. There is no entity resolution implemented in codemetapy, but this is also stated in the README.

Correct, it overwrites the entire triple. This behaviour is by design so you
can compose a codemeta file from multiple input files, where the ordering
determines which takes priority. This behaviour is used by
codemeta-harvester.

> A simple rdf merge could already be better (in some cases), but would not be enough, since it only works for objects with identifiers in both graphs.

Yes. If you want a merge, the only way to do so currently is to ensure the authors
have the same @id. So if everything already has ORCIDs, it'll work fine.
I realize it's sub-optimal and that some better mechanism could be implemented.
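
One way to follow this advice before running codemetapy is to add the missing @id to the input yourself. The sketch below is a hypothetical user-side preprocessing step, not part of codemetapy: `assign_known_ids` and the hand-curated `known_ids` mapping are assumptions made for illustration, and the exact-name lookup only works when names are spelled identically.

```python
import copy


def assign_known_ids(doc, known_ids):
    """Return a copy of a codemeta document in which authors lacking "@id"
    receive one from known_ids, keyed by (givenName, familyName)."""
    result = copy.deepcopy(doc)
    authors = result.get("author", [])
    if isinstance(authors, dict):
        # A single author may be given as an object instead of a list.
        authors = [authors]
        result["author"] = authors
    for author in authors:
        if not isinstance(author, dict) or "@id" in author:
            continue
        key = (author.get("givenName"), author.get("familyName"))
        if key in known_ids:
            author["@id"] = known_ids[key]
    return result


# The author entry from in2.json above, which has no @id.
in2 = {
    "author": [
        {
            "email": "john.doe@example.com",
            "@type": "Person",
            "familyName": "Doe",
            "givenName": "John",
        }
    ]
}

# Mapping maintained by hand by whoever prepares the inputs.
known_ids = {("John", "Doe"): "https://orcid.org/0000-1234-5678-9101"}

patched = assign_known_ids(in2, known_ids)
```

The patched document could then be written back to disk and passed to codemetapy in place of the original in2.json.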

However, merging multiple instances of persons is more tricky than it might
seem. Names are not always consistent (an extra middle name, a missing
diacritic, etc). Then which do you choose? We definitely don't want to end up
with multiple givenName and familyName properties. Multiple emails or urls
may be ok.
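
A minimal sketch of what such a policy could look like, assuming names are single-valued and taken from the preferred record while other properties such as emails are collected. Nothing here exists in codemetapy; `merge_person`, `SINGLE_VALUED` and the second email address are invented for the example.

```python
# Properties that must never end up with more than one value.
SINGLE_VALUED = {"givenName", "familyName"}


def as_list(value):
    """Wrap a scalar in a list; copy an existing list."""
    return list(value) if isinstance(value, list) else [value]


def merge_person(preferred, other):
    """Merge two Person objects already known to describe the same person.

    Single-valued properties and JSON-LD keywords come from `preferred`;
    every other property keeps the distinct values of both records.
    """
    merged = dict(other)
    for key, value in preferred.items():
        if key in SINGLE_VALUED or key.startswith("@") or key not in other:
            merged[key] = value
            continue
        combined = as_list(value)
        for item in as_list(other[key]):
            if item not in combined:
                combined.append(item)
        merged[key] = combined[0] if len(combined) == 1 else combined
    return merged


preferred = {
    "@id": "https://orcid.org/0000-1234-5678-9101",
    "@type": "Person",
    "givenName": "John",
    "familyName": "Doe",
    "email": "john@example.org",
}
other = {
    "@type": "Person",
    "givenName": "Jon",  # inconsistent spelling in the second source
    "familyName": "Doe",
    "email": "john.doe@example.com",
}

merged = merge_person(preferred, other)
```

This sidesteps the hard part, which is deciding that two records refer to the same person in the first place; it only shows how a conflict between names could be settled once that decision has been made.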

Another challenge is when having a graph of multiple SoftwareSourceCode
instances (which codemetapy supports) where an author appears in multiple
projects; but what if he/she has different affiliations in such a context?

@proycon proycon added the invalid This doesn't seem right label Sep 13, 2023
proycon (Owner) commented Sep 13, 2023

Closing as 'invalid' since it's not a bug but by design. But of course the question and discussion itself (feel free to continue here) is very valid, and a better solution may be devised.

@proycon proycon closed this as completed Sep 13, 2023