Separate subject chains from another in subject array #187

Closed
acka47 opened this Issue Jan 23, 2017 · 15 comments

7 participants

acka47 commented Jan 23, 2017

In hbz/nwbib#363 (comment) I wrote:

I just remembered that we still have no way of separating two subject chains in the JSON. See for example http://test.lobid.org/resources/HT013099804.

@dr0i said in response:

As I remembered we talked about leaving the subjectChains as they are. If not, please make an issue and let us discuss how to encode the subject chains under the subject-list.

I wouldn't have brought this up again, but today @larsgsvensson made me aware, over at the DINI KIM Titeldatengruppen wiki, that there is a way to express this with MADS/RDF that would actually not look bad in our JSON-LD. For the example document http://lobid.org/resources/HT013099804 it would look like this (showing only the relevant parts):

{
   "@context":"http://lobid.org/download/context.json",
   "id":"http://lobid.org/resources/HT013099804#!",
   "subject":[
      {
         "type":"madsrdf:ComplexSubject",
         "madsrdf:componentList":{
            "@list":[
               {
                  "id":"http://d-nb.info/gnd/4032952-5",
                  "label":"Krefeld",
                  "type":[
                     "PlaceOrGeographicName"
                  ]
               },
               {
                  "id":"http://d-nb.info/gnd/4041156-4",
                  "label":"Naherholungsgebiet",
                  "type":[
                     "SubjectHeading"
                  ]
               },
               {
                  "id":"http://d-nb.info/gnd/4125082-5",
                  "label":"Freizeitverhalten",
                  "type":[
                     "SubjectHeading"
                  ]
               },
               {
                  "id":"http://d-nb.info/gnd/4061619-8",
                  "label":"Umweltbelastung",
                  "type":[
                     "SubjectHeading"
                  ]
               }
            ]
         }
      },
      {
         "type":"madsrdf:ComplexSubject",
         "madsrdf:componentList":{
            "@list":[
               {
                  "id":"http://d-nb.info/gnd/4032952-5",
                  "label":"Krefeld",
                  "type":[
                     "PlaceOrGeographicName"
                  ]
               },
               {
                  "id":"http://d-nb.info/gnd/4041156-4",
                  "label":"Naherholungsgebiet",
                  "type":[
                     "SubjectHeading"
                  ]
               },
               {
                  "id":"http://d-nb.info/gnd/4280189-8",
                  "label":"Sportverhalten",
                  "type":[
                     "SubjectHeading"
                  ]
               },
               {
                  "id":"http://d-nb.info/gnd/4061619-8",
                  "label":"Umweltbelastung",
                  "type":[
                     "SubjectHeading"
                  ]
               }
            ]
         }
      }
   ]
}
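
To illustrate how a client could read this structure, here is a minimal Python sketch. It is not part of lobid: the function name `subject_chains` and the shortened inline record (labels only) are assumptions made for illustration. It treats every `madsrdf:ComplexSubject` entry in `subject` as one chain and keeps the order of its `@list`.

```python
import json

# Shortened copy of the record proposed above (labels only, hypothetical excerpt).
RECORD = json.loads("""
{
  "subject": [
    {"type": "madsrdf:ComplexSubject",
     "madsrdf:componentList": {"@list": [
        {"label": "Krefeld"}, {"label": "Naherholungsgebiet"},
        {"label": "Freizeitverhalten"}, {"label": "Umweltbelastung"}]}},
    {"type": "madsrdf:ComplexSubject",
     "madsrdf:componentList": {"@list": [
        {"label": "Krefeld"}, {"label": "Naherholungsgebiet"},
        {"label": "Sportverhalten"}, {"label": "Umweltbelastung"}]}}
  ]
}
""")


def subject_chains(record):
    """Return one list of labels per ComplexSubject, in cataloged order."""
    chains = []
    for subject in record.get("subject", []):
        if subject.get("type") != "madsrdf:ComplexSubject":
            continue  # plain (unchained) subjects are skipped in this sketch
        components = subject.get("madsrdf:componentList", {}).get("@list", [])
        chains.append([component["label"] for component in components])
    return chains


if __name__ == "__main__":
    for chain in subject_chains(RECORD):
        print(" ; ".join(chain))
```

Because each chain is its own object in the `subject` array, the two chains no longer blur into one flat list of subjects.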

acka47 commented Jan 23, 2017

And yes, this really is valid JSON-LD and no "lists in lists" error is thrown, see this example in the JSON-LD playground.

nichtich commented Jan 24, 2017

This can be simplified with @context:

{
  "@context": {
    "componentList": {
      "@id": "http://www.loc.gov/mads/rdf/v1#componentList",
      "@container": "@list"
    },
    "subject": "http://purl.org/dc/elements/1.1/subject",
    "id": "@id"
  },
  "id": "http://lobid.org/resources/HT013099804#!",
  "subject": [
    {
      "type": "madsrdf:ComplexSubject",
      "componentList": [
          {
            "id": "http://d-nb.info/gnd/4032952-5",
            "label": "Krefeld",
            "type": [
              "PlaceOrGeographicName"
            ]
          },
          {
            "id": "http://d-nb.info/gnd/4041156-4",
            "label": "Naherholungsgebiet",
            "type": [
              "SubjectHeading"
            ]
          },
          {
            "id": "http://d-nb.info/gnd/4125082-5",
            "label": "Freizeitverhalten",
            "type": [
              "SubjectHeading"
            ]
          },
          {
            "id": "http://d-nb.info/gnd/4061619-8",
            "label": "Umweltbelastung",
            "type": [
              "SubjectHeading"
            ]
          }
        ]

    }
  ]
}

As far as I know, the use of mads:componentList with SKOS was first implemented in mc2skos.

In the JSKOS draft I use the JSON key memberList (see JSKOS concept bundles) instead of componentList. By the way, I have not found corresponding properties for sets and choices. One of the two could be the default, but the other needs a way to be expressed. In summary, there are three ways in which multiple subjects can be grouped:

  • ordered lists (A, B, C...)
  • sets (A AND B AND C AND ...)
  • choices (A OR B OR C OR ...)
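
As a rough illustration of why the three groupings need distinct representations, here is a small Python sketch. The class names `OrderedList`, `AllOf` and `AnyOf` are hypothetical and are not JSKOS or MADS/RDF terms. It only shows that order matters for lists, does not matter for sets and choices, and that a set and a choice over the same members are still different statements.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class OrderedList:
    """A, B, C ... : the sequence itself is significant."""
    members: Tuple[str, ...]


@dataclass(frozen=True)
class AllOf:
    """A AND B AND C ... : an unordered set."""
    members: FrozenSet[str]


@dataclass(frozen=True)
class AnyOf:
    """A OR B OR C ... : an unordered choice."""
    members: FrozenSet[str]


if __name__ == "__main__":
    # Reordering changes an ordered list ...
    print(OrderedList(("A", "B")) == OrderedList(("B", "A")))
    # ... but not a set.
    print(AllOf(frozenset({"A", "B"})) == AllOf(frozenset({"B", "A"})))
    # Same members, different grouping: not interchangeable.
    print(AllOf(frozenset({"A", "B"})) == AnyOf(frozenset({"A", "B"})))
```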

acka47 commented Jan 24, 2017

Thanks, Jakob. Interestingly, you can also make the subject array a list, and the JSON-LD Playground accepts it. But I guess the order of the subject chains is not important, so we can ignore this.

{
   "@context":{
      "componentList":{
         "@id":"http://www.loc.gov/mads/rdf/v1#componentList",
         "@container":"@list"
      },
      "subject":{
         "@id":"http://purl.org/dc/terms/subject",
         "@container":"@list"
      },
      "id":"@id"
   },
   "id":"http://lobid.org/resources/HT013099804#!",
   "subject":[
      {
         "type":"madsrdf:ComplexSubject",
         "componentList":[
            {
               "id":"http://d-nb.info/gnd/4032952-5",
               "label":"Krefeld",
               "type":[
                  "PlaceOrGeographicName"
               ]
            },
            {
               "id":"http://d-nb.info/gnd/4041156-4",
               "label":"Naherholungsgebiet",
               "type":[
                  "SubjectHeading"
               ]
            },
            {
               "id":"http://d-nb.info/gnd/4125082-5",
               "label":"Freizeitverhalten",
               "type":[
                  "SubjectHeading"
               ]
            },
            {
               "id":"http://d-nb.info/gnd/4061619-8",
               "label":"Umweltbelastung",
               "type":[
                  "SubjectHeading"
               ]
            }
         ]
      },
      {
         "type":"madsrdf:ComplexSubject",
         "componentList":[
            {
               "id":"http://d-nb.info/gnd/4032952-5",
               "label":"Krefeld",
               "type":[
                  "PlaceOrGeographicName"
               ]
            },
            {
               "id":"http://d-nb.info/gnd/4041156-4",
               "label":"Naherholungsgebiet",
               "type":[
                  "SubjectHeading"
               ]
            },
            {
               "id":"http://d-nb.info/gnd/4280189-8",
               "label":"Sportverhalten",
               "type":[
                  "SubjectHeading"
               ]
            },
            {
               "id":"http://d-nb.info/gnd/4061619-8",
               "label":"Umweltbelastung",
               "type":[
                  "SubjectHeading"
               ]
            }
         ]
      }
   ]
}

fsteeg commented Jan 24, 2017

But I guess the order of the subject chains is not important so we can ignore this.

I seem to recall that the order is actually cataloged deliberately, i.e. first topic, second topic, etc.

larsgsvensson commented Jan 24, 2017

But I guess the order of the subject chains is not important so we can ignore this.

I seem to recall that the order is actually consciously cataloged. Like 1. topic, 2. topic etc.

To my knowledge, the order of the chains does not carry any semantic importance; it merely reflects the order in which the topics occur in the document. The order of the terms within a chain does of course matter. I guess the ultimate answer can be found in the RSWK (or other relevant cataloguing code).
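
Under that reading (which this thread does not settle; RSWK would have to confirm it), two records carry the same subject indexing if they contain the same chains in any order, while the terms inside each chain must be in the same order. A minimal Python sketch, with the helper name `normalize` being an illustrative assumption:

```python
def normalize(subject_chains):
    """Ignore the order of chains, keep the order of terms within each chain.

    Note: a frozenset also collapses exact duplicate chains.
    """
    return frozenset(tuple(chain) for chain in subject_chains)


CHAINS = [
    ["Krefeld", "Naherholungsgebiet", "Freizeitverhalten", "Umweltbelastung"],
    ["Krefeld", "Naherholungsgebiet", "Sportverhalten", "Umweltbelastung"],
]

if __name__ == "__main__":
    swapped_chains = list(reversed(CHAINS))               # chain order changed
    swapped_terms = [list(reversed(CHAINS[0])), CHAINS[1]]  # term order changed
    print(normalize(CHAINS) == normalize(swapped_chains))
    print(normalize(CHAINS) == normalize(swapped_terms))
```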

@dr0i dr0i added the launch label Jan 31, 2017

@acka47 acka47 added the ready label Jan 31, 2017

@dr0i dr0i added working and removed ready labels Feb 9, 2017

dr0i commented Feb 9, 2017

<rant>
I openly despise subject chains!
Working with them (and creating them in the first place) causes a lot of trouble and bloats complexity on all levels, while the gain is extraordinarily small, if it exists at all.

Instead of using subject chains, users drill down into large result sets using facets (which are built solely on subjects), then narrow the result set further by drilling down with another facet, and so on.
Subject chains are just an annoyance that one should reject entirely.
</rant>

jprante commented Feb 13, 2017

This issue is an example of underestimating RSWK.

It makes only limited sense to keep track of the terms and use syntactic indexing. This is like the catalog card era, when librarians wrote subject terms (descriptors) down in order, in the hope that the catalog user would be able to navigate across the catalog titles.

A more powerful use of RSWK subjects is linking. You can link between subjects following rules.

Example:

Imagine RSWK-Schlagwortkette Sierra Leone ; Schule ; Bildungshilfe ; Geschichte

This can be modeled in RDF, in a Turtle-like notation, roughly as in the following sketch:

SierraLeone
   a rswk:Place;
   rdfs:label "Sierra Leone"@de .

Schule
   a rswk:SubjectName;
   rdfs:label "Schule"@de .
   
Bildungshilfe
   a rswk:Topic;
   rdfs:label "Bildungshilfe"@de .
   
Geschichte
  a rswk:Temporal;
  rdfs:label "Geschichte"@de .
       
rel1
  a rswk:Relation;
  from SierraLeone;
  to Schule .

rel2
  a rswk:Relation;
  from Schule;
  to Bildungshilfe .

rel3
  a rswk:Relation;
  from Bildungshilfe;
  to Geschichte .

A title my_title can be classified by

my_title hasSubject [
      SierraLeone, Schule, Bildungshilfe, Geschichte 
    ] .

or

my_title hasPlace SierraLeone;
   hasSubjectName Schule;
   hasTopic Bildungshilfe;
   hasTemporal Geschichte .

or

my_title hasNavigation [
    rel1, rel2, rel3
   ] .

hasNavigation is where the power lies. You get four main entry points into navigation, authorized by the descriptors, and can follow the chain in forward or backward direction. Or you can set up faceted navigation, whatever you like. You can even group titles by common navigation patterns and find similar or related titles. You could also compute transitive closures (for example: give me all titles that cover a given set of terms with a given set of relations), which simulates the search process in the engineering sciences (e.g. when searching for patents or standards).
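
The forward and backward navigation described here can be sketched in a few lines of Python. This is only an illustration of the idea, not an existing implementation: the relations rel1 to rel3 from the sketch are written as plain `(from, to)` pairs, and the function name `walk` is an assumption.

```python
# rel1, rel2, rel3 from the sketch, as (from, to) pairs.
RELATIONS = [
    ("SierraLeone", "Schule"),
    ("Schule", "Bildungshilfe"),
    ("Bildungshilfe", "Geschichte"),
]


def walk(start, relations, backward=False):
    """Follow the chain from `start` in forward or backward direction."""
    if backward:
        step = {dst: src for src, dst in relations}
    else:
        step = dict(relations)
    path = [start]
    # Stop at the end of the chain, or if a relation would loop back.
    while path[-1] in step and step[path[-1]] not in path:
        path.append(step[path[-1]])
    return path


if __name__ == "__main__":
    print(walk("Schule", RELATIONS))                 # forward from an inner term
    print(walk("Schule", RELATIONS, backward=True))  # backward from the same term
```

Any of the four descriptors can serve as the entry point; the direction flag decides which neighbours are reached from there.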

@dr0i dr0i added ready working and removed working ready labels Feb 16, 2017

acka47 commented Feb 20, 2017

Currently, we put DDC (from 700) and GND subjects (from 9--s) both into one subject list, see e.g. http://lobid.org/resources/HT006932194. @dr0i now asks whether the madsrdf:ComplexSubject solution should be used only for transforming the 9--s or also for the 700s. For the sake of homogeneous queries, a unified approach would be good. We could add something like bf:source "DDC" or bf:source "GND" (preferably with URIs) to indicate the underlying authority file for a complex subject. (While we are at it, we might also think about adding the NWBib subjects in the same way.)

On the other hand, there are a lot of entries with only one subject where a complex subject seems to be too much...

@dr0i dr0i added ready and removed working labels Feb 21, 2017

@dr0i dr0i assigned acka47 and unassigned dr0i Feb 21, 2017

acka47 commented Feb 21, 2017

@jprante Thanks for the ideas. As the coverage with subject chains isn't very good in the lobid data for now, we are currently focusing on adequately representing subject chains in the RDF for display purposes. We don't have the time and resources to implement solutions for the more sophisticated use cases you describe.

For browsing and searching with subject chains, we have a feature in the NWBib called Themensuche, see http://nwbib.de/topics. It is based on one string per subject chain, built by concatenating the preferred names with the separator |. We will probably leave this in the data, as it isn't trivial to implement this based on mads:ComplexSubjects.
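
For illustration only, such a chain string could be derived from a componentList along the following lines. This is a sketch, not the NWBib code: the function name `chain_string` is hypothetical, and the spaces around `|` in the default separator are an assumption.

```python
def chain_string(component_list, separator=" | "):
    """Concatenate the preferred names (labels) of one subject chain."""
    return separator.join(component["label"] for component in component_list)


# First chain of HT013099804, as given earlier in this issue.
COMPONENTS = [
    {"id": "http://d-nb.info/gnd/4032952-5", "label": "Krefeld"},
    {"id": "http://d-nb.info/gnd/4041156-4", "label": "Naherholungsgebiet"},
    {"id": "http://d-nb.info/gnd/4125082-5", "label": "Freizeitverhalten"},
    {"id": "http://d-nb.info/gnd/4061619-8", "label": "Umweltbelastung"},
]

if __name__ == "__main__":
    print(chain_string(COMPONENTS))
```

Producing the string is the easy part; the non-trivial part mentioned above is browsing and searching on top of the structured data instead of these strings.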

nichtich commented Feb 21, 2017

@dr0i http://lobid.org/resources/HT006932194 only contains simple DDC entries:

so this is not a good example for madsrdf:ComplexSubject. I doubt that DDC is used for subject chains, because a complex DDC number is already a combination of topics. VZG intends to provide a service that maps a long DDC URI to its complex list of smaller numbers.

acka47 commented Mar 7, 2017

We decided to only use the ComplexSubject approach for RSWK subjects.

@acka47 acka47 assigned dr0i and unassigned acka47 Mar 7, 2017

@dr0i dr0i added working and removed ready labels Mar 14, 2017

dr0i added a commit that referenced this issue Apr 7, 2017

Add mads:componentList
- enable lists in sets
- add componentList and ComplexSubject to context

See #187.

@dr0i dr0i added review and removed working labels Apr 10, 2017

@dr0i dr0i removed their assignment Apr 10, 2017

@fsteeg fsteeg added ready and removed review labels Apr 11, 2017

dr0i commented May 12, 2017

One more question:
we still have the subjectChain in our data. Can't we now get rid of it?

@dr0i dr0i reopened this May 12, 2017

@dr0i dr0i added the ready label May 12, 2017

ChristophEwertowski commented May 12, 2017

Since everything is covered by the componentList, yes. I don't know whether someone might still need the subjectOrder as well, though.

@ChristophEwertowski ChristophEwertowski removed their assignment May 12, 2017

acka47 commented May 15, 2017

In API 1.0, we only used the data from subjectChain for the NWBib UI. I am not sure whether and how we can recreate the same function with componentList. Thus, I suggest retaining subjectChain for now but not providing it via the public API.

acka47 commented May 16, 2017

As discussed today in the stand up, please go ahead and remove subjectChain.

@acka47 acka47 assigned dr0i and unassigned acka47 May 16, 2017

@dr0i dr0i added working and removed ready labels Jun 1, 2017

dr0i added a commit that referenced this issue Jun 2, 2017

Remove subjectChain as this is already covered
With 'componentList' the chained subjects are already covered.
This 'subjectChain' is superfluous.

Complements #187.

@dr0i dr0i added review deploy and removed working review labels Jun 2, 2017

@dr0i dr0i closed this in 9793d49 Jun 8, 2017

@dr0i dr0i removed the deploy label Jun 8, 2017

ChristophEwertowski added a commit that referenced this issue Jul 5, 2017
