Add provenance information #128

Closed
acka47 opened this issue Apr 12, 2016 · 15 comments

@acka47
Contributor

acka47 commented Apr 12, 2016

We should record the source dataset(s) of each record so that people can filter result lists, e.g. so that only organisations from ISIL or from DBS are listed.

@acka47 acka47 self-assigned this Apr 12, 2016
@acka47
Contributor Author

acka47 commented Apr 12, 2016

Regarding the concrete use case, I just realized that one can already filter by source through the isil and dbsID fields.

What's missing is the possibility to use the scroll parameter for API queries; I will open another issue for this. As for this issue, I am renaming it to cover provenance information in general, which we should add.

@acka47 acka47 changed the title Add information about source of a record (ISIL/DBS) Add provenance information Apr 12, 2016
@acka47
Contributor Author

acka47 commented Apr 12, 2016

We should add provenance information like we do in lobid-resources, e.g.:

```json
{
   "id": "http://lobid.org/resources/HT017203152#!",
   "describedby":[
      {
         "dateCreated":"20120417",
         "dateModified":"20120426",
         "id":"http://lobid.org/resources/HT017203152"
      }
   ]
}
```

As the schema.org property, we should use http://schema.org/mainEntityOfPage. The creation and modification dates of the source record are in fields 001A and 001B, respectively. Here is a first draft of how it could look:

```json
{
   "@context":{
      "id":"@id",
      "mainEntityOfPage":{
         "@id":"http://schema.org/mainEntityOfPage",
         "@type":"@id"
      },
      "dateCreated":"http://schema.org/dateCreated",
      "dateModified":"http://schema.org/dateModified",
      "isPartOf": "http://schema.org/isPartOf",
      "wasDerivedFrom":"http://www.w3.org/ns/prov#wasDerivedFrom"
   },
   "id":"http://beta.lobid.org/organisations/DE-6#!",
   "mainEntityOfPage":[
      {
         "id":"http://beta.lobid.org/organisations/DE-6",
         "dateModified":"2016-01-06",
         "wasDerivedFrom":"http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1",
         "isPartOf": [ "http://lobid.org/organisations/DBS", "http://lobid.org/organisations/ISIL" ]
      }
   ]
}
```

In this draft, we would have to create descriptions of the DBS dataset and the ISIL registry at http://lobid.org/organisations/DBS and http://lobid.org/organisations/ISIL. I will also take some time to look into the PROV ontology to find a good solution.
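The wasDerivedFrom link in the draft follows a fixed pattern around the ISIL. A minimal sketch of building it in Python, assuming that the pattern seen in this thread's two examples (DE-6 and DE-294) holds for every ISIL; the function name `sigel_sru_url` is hypothetical:

```python
from urllib.parse import urlencode

# Base URL and parameters as they appear in the example links in this thread.
SRU_BASE = "http://services.dnb.de/sru/bib"


def sigel_sru_url(isil: str) -> str:
    """Build the DNB SRU link that returns the PicaPlus-xml record for an ISIL."""
    params = {
        "operation": "searchRetrieve",
        "query": f"isl={isil}",  # urlencode escapes '=' as %3D
        "recordSchema": "PicaPlus-xml",
        "version": "1.1",
    }
    return f"{SRU_BASE}?{urlencode(params)}"


if __name__ == "__main__":
    print(sigel_sru_url("DE-6"))
```

Using `urlencode` rather than string concatenation keeps the escaping of the query value correct for any ISIL.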

@acka47
Contributor Author

acka47 commented Sep 1, 2017

After taking a look at the PROV ontology, I suggest the following approach: describe the merging activity, pointing to the base records (we'll need to create an entry for the DBS data of each record) and to the morph files:

```json
{
   "@context":{
      "id":"@id",
      "type":"@type",
      "xsd":"http://www.w3.org/2001/XMLSchema#",
      "mainEntityOfPage":{
         "@id":"http://schema.org/mainEntityOfPage",
         "@type":"@id"
      },
      "dateCreated":"http://schema.org/dateCreated",
      "dateModified":"http://schema.org/dateModified",
      "wasGeneratedBy":{
         "@id":"http://www.w3.org/ns/prov#wasGeneratedBy",
         "@type":"@id"
      },
      "Activity":"http://www.w3.org/ns/prov#Activity",
      "startedAtTime":{
         "@id":"http://www.w3.org/ns/prov#startedAtTime",
         "@type":"xsd:dateTime"
      },
      "endedAtTime":{
         "@id":"http://www.w3.org/ns/prov#endedAtTime",
         "@type":"xsd:dateTime"
      },
      "used":{
         "@id":"http://www.w3.org/ns/prov#used",
         "@type":"@id"
      }
   },
   "id":"http://lobid.org/organisations/DE-6#!",
   "mainEntityOfPage":{
      "id":"http://lobid.org/organisations/DE-6",
      "wasGeneratedBy":{
         "type":"Activity",
         "startedAtTime":"2016-09-01T04:30:00Z",
         "endedAtTime":"2016-09-01T05:00:00Z",
         "used":[
            {
               "id":"http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1",
               "dateModified":"2016-11-16",
               "dateCreated":"1999-11-18"
            },
            {
               "id":"http://lobid.org/dbs/AC006"
            },
            "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-sigel.xml",
            "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-dbs.xml",
            "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-enriched.xml"
         ]
      }
   }
}
```

Information on dateCreated and dateModified of Sigel records can be found in fields 001A and 001B (see also https://wiki.dnb.de/download/attachments/43090988/normdaten_badr.pdf). Example:

```xml
<ppxml:tag id="001A" occ="">
  <ppxml:subf id="0">9006:18-11-99</ppxml:subf>
</ppxml:tag>
<ppxml:tag id="001B" occ="">
  <ppxml:subf id="0">9006:02-11-16</ppxml:subf>
  <ppxml:subf id="t">14:29:24.000</ppxml:subf>
</ppxml:tag>
```
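Before these values can be used as schema.org dates, the prefix and the two-digit years need normalising to ISO 8601. A minimal sketch in Python, assuming the subfield 0 value always has the form `<code>:DD-MM-YY` (the meaning of the `9006` code is not explained here, so it is simply discarded) and relying on the `%y` rule of `strptime`, which maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068. Both function names are hypothetical:

```python
from datetime import datetime


def pica_date_to_iso(value: str) -> str:
    """Turn a 001A/001B subfield 0 value like '9006:18-11-99' into '1999-11-18'."""
    _, _, raw_date = value.partition(":")  # drop the leading code (assumed irrelevant)
    return datetime.strptime(raw_date, "%d-%m-%y").date().isoformat()


def pica_datetime_to_iso(value: str, time: str) -> str:
    """Combine 001B subfield 0 (date) and subfield t (time) into an ISO 8601 datetime."""
    _, _, raw_date = value.partition(":")
    moment = datetime.strptime(f"{raw_date} {time}", "%d-%m-%y %H:%M:%S.%f")
    return moment.isoformat()  # no time zone: the source gives none


if __name__ == "__main__":
    print(pica_date_to_iso("9006:18-11-99"))
    print(pica_datetime_to_iso("9006:02-11-16", "14:29:24.000"))
```

The time zone of the 001B time is not stated in the source record, so the sketch deliberately returns a naive datetime instead of guessing one.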

@fsteeg
Member

fsteeg commented Sep 15, 2017

Simple version as discussed offline deployed to stage: http://stage.lobid.org/organisations/DE-38.json

@fsteeg fsteeg assigned acka47 and unassigned fsteeg Sep 15, 2017
@fsteeg fsteeg added review and removed ready labels Sep 15, 2017
@acka47
Contributor Author

acka47 commented Sep 15, 2017

+1

@fsteeg
Copy link
Member

fsteeg commented Sep 18, 2017

Deployed to production, closing. See http://lobid.org/organisations/DE-38.json

(But is this really all we need for this issue to be complete?)

@fsteeg fsteeg closed this as completed Sep 18, 2017
@fsteeg fsteeg removed the deploy label Sep 18, 2017
@acka47
Contributor Author

acka47 commented Apr 20, 2018

Reopening, as we need to link to the source data. I suggest using the approach from #128 (comment), which has become a bit simpler now that we have stopped merging information from the two sources (Sigel registry and DBS) into one entry.

@acka47 acka47 reopened this Apr 20, 2018
@acka47
Contributor Author

acka47 commented Apr 20, 2018

Here is an updated version of the proposal:

```json
{
  "@context":{
    "id":"@id",
    "type":"@type",
    "xsd":"http://www.w3.org/2001/XMLSchema#",
    "mainEntityOfPage":{
      "@id":"http://schema.org/mainEntityOfPage",
      "@type":"@id"
    },
    "dateCreated":"http://schema.org/dateCreated",
    "dateModified":"http://schema.org/dateModified",
    "wasGeneratedBy":{
      "@id":"http://www.w3.org/ns/prov#wasGeneratedBy",
      "@type":"@id"
    },
    "Activity":"http://www.w3.org/ns/prov#Activity",
    "startedAtTime":{
      "@id":"http://www.w3.org/ns/prov#startedAtTime",
      "@type":"xsd:dateTime"
    },
    "endedAtTime":{
      "@id":"http://www.w3.org/ns/prov#endedAtTime",
      "@type":"xsd:dateTime"
    },
    "used":{
      "@id":"http://www.w3.org/ns/prov#used",
      "@type":"@id"
    }
  },
  "id":"http://lobid.org/organisations/DE-6#!",
  "mainEntityOfPage":{
    "id":"http://lobid.org/organisations/DE-6",
    "dateModified":"2016-11-16",
    "dateCreated":"1999-11-18",
    "wasGeneratedBy":{
      "type":"Activity",
      "startedAtTime":"2016-09-01T04:30:00Z",
      "endedAtTime":"2016-09-01T05:00:00Z",
      "used":[
        {
          "id":"http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1"
        },
        "https://github.com/metafacture/metafacture-core",
        "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-sigel.xml",
        "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-enriched.xml"
      ]
    }
  }
}
```

To do this entirely correctly, we would have to attach `dateCreated` and `dateModified` to the PicaPlus-xml resource, but that would mean an API break (see also hbz/lobid-resources#809 (comment)).
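The updated proposal can also be assembled programmatically. A minimal sketch in Python, where the helper names `main_entity_of_page` and `xsd_datetime_utc` are hypothetical, the list of inputs is copied from the proposal, and dates and activity times are passed in rather than derived (the actual pipeline runs on Metafacture, not on this code):

```python
import json
from datetime import datetime, timezone

# Transformation inputs copied from the proposal above.
TOOLS = [
    "https://github.com/metafacture/metafacture-core",
    "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-sigel.xml",
    "https://github.com/hbz/lobid-organisations/blob/master/conf/morph-enriched.xml",
]


def xsd_datetime_utc(moment: datetime) -> str:
    """Render a time-zone-aware datetime as xsd:dateTime in UTC, e.g. 2016-09-01T04:30:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def main_entity_of_page(isil, source_url, date_created, date_modified, started, ended):
    """Build the mainEntityOfPage object of the updated proposal for one ISIL."""
    return {
        "id": f"http://lobid.org/organisations/{isil}",
        "dateModified": date_modified,
        "dateCreated": date_created,
        "wasGeneratedBy": {
            "type": "Activity",
            "startedAtTime": xsd_datetime_utc(started),
            "endedAtTime": xsd_datetime_utc(ended),
            # One object for the source record, followed by plain IRIs for the tools.
            "used": [{"id": source_url}, *TOOLS],
        },
    }


if __name__ == "__main__":
    page = main_entity_of_page(
        "DE-6",
        "http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-6&recordSchema=PicaPlus-xml&version=1.1",
        "1999-11-18",
        "2016-11-16",
        datetime(2016, 9, 1, 4, 30, tzinfo=timezone.utc),
        datetime(2016, 9, 1, 5, 0, tzinfo=timezone.utc),
    )
    record = {"id": "http://lobid.org/organisations/DE-6#!", "mainEntityOfPage": page}
    print(json.dumps(record, indent=2))
```

The `used` array mirrors the proposal exactly, mixing one object with plain strings.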

@acka47 acka47 assigned acka47 and unassigned fsteeg Sep 10, 2018
@acka47 acka47 added the ready label Sep 10, 2018
@acka47 acka47 removed the ready label Apr 9, 2019
@TobiasNx
Contributor

@acka47 any reason why we do not use describedBy as in lobid-resources here?

TobiasNx added a commit that referenced this issue Jul 11, 2023
@acka47
Contributor Author

acka47 commented Jul 13, 2023

> any reason why we do not use describedBy as in lobid-resources here?

We chose a different approach for lobid-organisations: use schema.org where applicable and add properties from other vocabularies or from lobid-vocabs where schema.org does not contain what we want.

@TobiasNx
Contributor

@acka47:
For DBS:

```json
{
  "mainEntityOfPage": {
    "id": "http://lobid.org/organisations/DE-9#!",
    "wasGeneratedBy": {
      "type": "Activity",
      "used": [
        {
          "sourceOrganisation": {
            "id": "https://www.bibliotheksstatistik.de/",
            "label": "Deutsche Bibliotheksstatistik (DBS)"
          }
        },
        "https://github.com/metafacture/metafacture-core",
        "https://github.com/hbz/lobid-organisations/blob/master/conf/fix-dbs.fix",
        "https://github.com/hbz/lobid-organisations/blob/master/conf/fix-enriched.fix"
      ]
    }
  }
}
```

For Sigel:

```json
{
  "mainEntityOfPage": {
    "id": "http://lobid.org/organisations/DE-294#!",
    "dateCreated": "18-11-99",
    "dateModified": "24-04-12",
    "wasGeneratedBy": {
      "type": "Activity",
      "used": [
        {
          "id": "http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-294&recordSchema=PicaPlus-xml&version=1.1",
          "sourceOrganisation": {
            "id": "https://sigel.staatsbibliothek-berlin.de/vergabe/isil/",
            "label": "Deutsche ISIL-Agentur und Sigelstelle an der Staatsbibliothek zu Berlin"
          }
        },
        "https://github.com/metafacture/metafacture-core",
        "https://github.com/hbz/lobid-organisations/blob/master/conf/fix-sigel.fix",
        "https://github.com/hbz/lobid-organisations/blob/master/conf/fix-enriched.fix"
      ]
    }
  }
}
```

TobiasNx added a commit that referenced this issue Aug 8, 2023
They created problems for Elasticsearch
@TobiasNx
Contributor

TobiasNx commented Aug 9, 2023

The property sourceOrganisation needs to be changed to sourceOrganization, the spelling of the schema.org property.

@TobiasNx
Contributor

Add Info about provenance to GUI #497

Elasticsearch cannot handle arrays that mix strings and objects in `used`, so we now only use the link to the source data.
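For comparison, one alternative that would keep every `used` entry is to wrap each plain IRI string in an object, so the array contains a single type. This is a hypothetical helper sketched in Python, not what was implemented here:

```python
def homogenize_used(used: list) -> list:
    """Return a copy of a PROV `used` array in which every entry is an object.

    Plain strings (IRIs) become {"id": ...}; existing objects are kept as they are,
    so the field maps to one object type instead of a string/object mix.
    """
    return [entry if isinstance(entry, dict) else {"id": entry} for entry in used]


if __name__ == "__main__":
    mixed = [
        {"id": "http://services.dnb.de/sru/bib?operation=searchRetrieve&query=isl%3DDE-294&recordSchema=PicaPlus-xml&version=1.1"},
        "https://github.com/metafacture/metafacture-core",
    ]
    print(homogenize_used(mixed))
```

The trade-off is a slightly more verbose record in exchange for keeping the links to the tools and fix files.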

@acka47 acka47 mentioned this issue Feb 19, 2024
Projects
Status: Done
Development

No branches or pull requests

3 participants