Add provenance information #128
Comments
Regarding the concrete use case I just realized that one can already filter by source through the
What's missing is the ability to use the scroll parameter for API queries; I will open another issue for this. I am renaming this issue to cover provenance information in general, which we should add. |
We should add provenance information like we do in lobid-resources, e.g.: "id": "http://lobid.org/resources/HT017203152#!",
"describedby":[
{
"dateCreated":"20120417",
"dateModified":"20120426",
"id":"http://lobid.org/resources/HT017203152"
}
] As the schema.org property we should use http://schema.org/mainEntityOfPage. The creation and modification dates of the source record are in fields 001A and 001B, respectively. Here is a first draft of how it could look: {
"@context":{
"id":"@id",
"mainEntityOfPage":{
"@id":"http://schema.org/mainEntityOfPage",
"@type":"@id"
},
"dateCreated":"http://schema.org/dateCreated",
"dateModified":"http://schema.org/dateModified",
"isPartOf": "http://schema.org/isPartOf",
"wasDerivedFrom":"http://www.w3.org/ns/prov#wasDerivedFrom"
},
"id":"http://beta.lobid.org/organisations/DE-6#!",
"mainEntityOfPage":[
{
"id":"http://beta.lobid.org/organisations/DE-6",
"dateModified":"2016-01-06",
"wasDerivedFrom":"http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1",
"isPartOf": [ "http://lobid.org/organisations/DBS", "http://lobid.org/organisations/ISIL" ]
}
]
} In this example draft, we would have to create descriptions of the DBS dataset and the ISIL registry at http://lobid.org/organisations/DBS and http://lobid.org/organisations/ISIL. I will also take some time to look into the PROV ontology to find a good solution. |
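With `isPartOf` on each `mainEntityOfPage` entry as in the draft above, filtering records by source dataset becomes a simple membership check. The following is only a minimal client-side sketch under that assumption; the function names `in_dataset` and `filter_by_dataset` and the second sample record are hypothetical and not part of lobid-organisations.

```python
from typing import Iterable, List

# Dataset URIs as used in the draft; the descriptions behind them would still have to be created.
DBS = "http://lobid.org/organisations/DBS"
ISIL = "http://lobid.org/organisations/ISIL"


def _as_list(value):
    """Wrap a single value in a list so strings and objects are handled uniformly."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def in_dataset(record: dict, dataset_uri: str) -> bool:
    """True if any mainEntityOfPage entry lists dataset_uri in isPartOf."""
    pages = _as_list(record.get("mainEntityOfPage"))
    return any(dataset_uri in _as_list(page.get("isPartOf")) for page in pages)


def filter_by_dataset(records: Iterable[dict], dataset_uri: str) -> List[dict]:
    """Keep only the records that belong to the given source dataset."""
    return [r for r in records if in_dataset(r, dataset_uri)]


records = [
    {
        "id": "http://beta.lobid.org/organisations/DE-6#!",
        "mainEntityOfPage": [
            {"id": "http://beta.lobid.org/organisations/DE-6", "isPartOf": [DBS, ISIL]}
        ],
    },
    {
        # Hypothetical record that is only in the ISIL registry.
        "id": "http://example.org/organisations/X#!",
        "mainEntityOfPage": [
            {"id": "http://example.org/organisations/X", "isPartOf": [ISIL]}
        ],
    },
]

dbs_only = filter_by_dataset(records, DBS)
print([r["id"] for r in dbs_only])
```

On the server side the same idea would presumably be expressed as a query on the `isPartOf` field rather than in client code.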
After taking a look at the PROV ontology, I suggest the following approach: describe the merging activity, pointing to the base records (we'll need to create an entry for DBS data for each record) and the morph files: {
"@context":{
"id":"@id",
"type":"@type",
"mainEntityOfPage":{
"@id":"http://schema.org/mainEntityOfPage",
"@type":"@id"
},
"dateCreated":"http://schema.org/dateCreated",
"dateModified":"http://schema.org/dateModified",
"wasGeneratedBy":{
"@id":"http://www.w3.org/ns/prov#wasGeneratedBy",
"@type":"@id"
},
"Activity":"http://www.w3.org/ns/prov#Activity",
"startedAtTime":{
"@id":"http://www.w3.org/ns/prov#startedAtTime",
"@type":"http://www.w3.org/2001/XMLSchema#dateTime"
},
"endedAtTime":{
"@id":"http://www.w3.org/ns/prov#endedAtTime",
"@type":"http://www.w3.org/2001/XMLSchema#dateTime"
},
"used":{
"@id":"http://www.w3.org/ns/prov#used",
"@type":"@id"
}
},
"id":"http://lobid.org/organisations/DE-6#!",
"mainEntityOfPage":{
"id":"http://lobid.org/organisations/DE-6",
"wasGeneratedBy":{
"type":"Activity",
"startedAtTime":"2016-09-01T04:30:00Z",
"endedAtTime":"2016-09-01T05:00:00Z",
"used":[
{
"id":"http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1",
"dateModified":"2016-11-16",
"dateCreated":"1999-11-18"
},
{
"id":"http://lobid.org/dbs/AC006"
},
"https://github.com/hbz/lobid-organisations/blob/master/conf/morph-sigel.xml",
"https://github.com/hbz/lobid-organisations/blob/master/conf/morph-dbs.xml",
"https://github.com/hbz/lobid-organisations/blob/master/conf/morph-enriched.xml"
]
}
}
} Information for <ppxml:tag id="001A" occ="">
<ppxml:subf id="0">9006:18-11-99</ppxml:subf>
</ppxml:tag>
<ppxml:tag id="001B" occ="">
<ppxml:subf id="0">9006:02-11-16</ppxml:subf>
<ppxml:subf id="t">14:29:24.000</ppxml:subf>
</ppxml:tag> |
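To fill `dateCreated` and `dateModified` from these fields, the subfield `0` values have to be converted to ISO dates. Below is a minimal sketch, assuming the value has the shape `<prefix>:DD-MM-YY` (inferred from `9006:18-11-99` corresponding to `1999-11-18` above); the function name `pica_date_to_iso` is hypothetical, and the two-digit year is resolved by Python's default pivot (69-99 map to 19xx, 00-68 to 20xx).

```python
from datetime import datetime


def pica_date_to_iso(subfield_0: str) -> str:
    """Convert a 001A/001B subfield 0 value such as '9006:18-11-99' to 'YYYY-MM-DD'.

    Assumes everything before the colon is a prefix that can be ignored and
    the remainder is DD-MM-YY. Raises ValueError if the value does not match.
    """
    _, _, raw = subfield_0.partition(":")
    return datetime.strptime(raw, "%d-%m-%y").date().isoformat()


print(pica_date_to_iso("9006:18-11-99"))  # 1999-11-18
print(pica_date_to_iso("9006:02-11-16"))  # 2016-11-02
```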
Simple version as discussed offline deployed to stage: http://stage.lobid.org/organisations/DE-38.json |
+1 |
Deployed to production, closing. See http://lobid.org/organisations/DE-38.json (But is this really all we need for this issue to be complete?) |
Reopening as we need to link to the source data. I suggest using the approach from #128 (comment), which is now a bit simpler because we stopped merging information from the two sources (Sigel registry and DBS) into one entry. |
Here is an updated version of the proposal: {
"@context":{
"id":"@id",
"type":"@type",
"mainEntityOfPage":{
"@id":"http://schema.org/mainEntityOfPage",
"@type":"@id"
},
"dateCreated":"http://schema.org/dateCreated",
"dateModified":"http://schema.org/dateModified",
"wasGeneratedBy":{
"@id":"http://www.w3.org/ns/prov#wasGeneratedBy",
"@type":"@id"
},
"Activity":"http://www.w3.org/ns/prov#Activity",
"startedAtTime":{
"@id":"http://www.w3.org/ns/prov#startedAtTime",
"@type":"http://www.w3.org/2001/XMLSchema#dateTime"
},
"endedAtTime":{
"@id":"http://www.w3.org/ns/prov#endedAtTime",
"@type":"http://www.w3.org/2001/XMLSchema#dateTime"
},
"used":{
"@id":"http://www.w3.org/ns/prov#used",
"@type":"@id"
}
},
"id":"http://lobid.org/organisations/DE-6#!",
"mainEntityOfPage":{
"id":"http://lobid.org/organisations/DE-6",
"dateModified":"2016-11-16",
"dateCreated":"1999-11-18",
"wasGeneratedBy":{
"type":"Activity",
"startedAtTime":"2016-09-01T04:30:00Z",
"endedAtTime":"2016-09-01T05:00:00Z",
"used":[
{
"id":"http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1"
},
"https://github.com/metafacture/metafacture-core",
"https://github.com/hbz/lobid-organisations/blob/master/conf/morph-sigel.xml",
"https://github.com/hbz/lobid-organisations/blob/master/conf/morph-enriched.xml"
]
}
}
} If we did it all correctly, we'd have to add |
@acka47 any reason why we do not use |
We chose another approach for lobid-organisations: use schema.org where applicable and add properties from other vocabs or from lobid-vocabs if schema.org does not contain what we want. |
@acka47:
for sigil:
|
They created problems for Elasticsearch.
Perhaps we could introduce a facet for this. |
The property sourceOrganisation needs to be changed to sourceOrganization |
Changed property name: |
Add Info about provenance to GUI #497 ElasticSearch is not able to provide mixed arrays of strings and of objects in |
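One way around the mixed-array limitation is to normalize such arrays before indexing, so that every entry is an object with an `id`. This is only a sketch of that idea, not what lobid-organisations actually does; the function name `normalize_used` is hypothetical.

```python
from typing import Any, List


def normalize_used(used: List[Any]) -> List[dict]:
    """Turn bare URI strings into {"id": uri} objects; leave existing objects untouched."""
    return [{"id": entry} if isinstance(entry, str) else entry for entry in used]


used = [
    {"id": "http://lobid.org/dbs/AC006"},
    "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-sigel.xml",
]
print(normalize_used(used))
```

After normalization the array contains only objects, so a single object mapping for the field is sufficient.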
We should have information about the source dataset(s) of a record so that people can filter the result list to show only orgs that are in ISIL or DBS.