VOCAB: Adds abstract property to Article #228

Closed

Tpt
Contributor

@Tpt Tpt commented Jan 18, 2015

This property is very useful for scholarly articles, which databases such as JSTOR, arXiv, and HAL often reference together with their abstracts.

@dbs
Contributor

dbs commented Jan 18, 2015

How would abstract differ from http://schema.org/description, "A short description of the item."? I fear this proposal would essentially duplicate an existing property.

That said, we could certainly enhance examples to show the expected use of schema:description in the role of an abstract (although arguably the last example in http://schema.org/Article already does just that).

@dbs
Contributor

dbs commented Jan 18, 2015

See #229 for a pull request that addresses the missing schema:description markup in the MedicalScholarlyArticle example.

@progval

progval commented Jan 18, 2015

I think the difference is that the abstract is part of the paper, while the description is an annotation of the paper, which would differ from one database to another.

@Tpt
Contributor Author

Tpt commented Jan 18, 2015

I agree with @progval. People may want to provide both a short description of the article and its abstract, which may be longer (more than 50 words), and to distinguish between the two.

But yes, it is true that this property is very close to schema:description (it only adds the semantics "this is the official abstract of the article, not a description written by an external person"). So I'm OK if you close this pull request with a "won't add it to schema.org vocabulary".

Should I start a thread on the public-vocabs mailing list to get more input on this?

@sballesteros

Hello, I think that the description property falls short for structured abstracts, that is, abstracts with distinct, labeled sections (e.g., Introduction, Methods, Results, Discussion). Structured abstracts are quite important in science and for search applications (see, for instance, the 23M abstracts in PubMed).

How about introducing:

  • an Abstract class, subclass of CreativeWork.
  • an abstract property to Article
  • take advantage of the hasPart property to handle structured abstracts.

Taking inspiration from the Review class and this example of a structured review (view source), we could (optionally) introduce two new properties:

  • abstractBody (we could use text if we don't want to add more properties)
  • itemAbstracted (we could use isPartOf if we don't want to add more properties)

Here is an example for a structured abstract:

"abstract": {
     "@id": "http://example.com/abstract",
     "@type": "Abstract",
     "hasPart": [
         {
             "@id": "http://example.com/abstract#Methods",
             "@type": ["Abstract", "http://purl.org/spar/deo/Methods"],
             "headline": "Methods",
             "abstractBody": ...
         },
         {
             "@id": "http://example.com/abstract#Results",
             "@type": ["Abstract",  "http://purl.org/spar/deo/Results"],
             "headline": "Results",
             "abstractBody": ...
         }
     ]
}

In the case of a simple unstructured abstract, we would have:

"abstract": {
   "@id": "http://example.com/abstract",
   "@type": "Abstract",
   "abstractBody": ...
}
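
As a rough illustration of how a consumer might read the two shapes above, here is a minimal Python sketch that flattens either form to plain text. It assumes the proposed (not yet existing) `Abstract` type and `abstractBody` property, assumes `hasPart` is always a list, and uses made-up placeholder strings where the examples above elide the body text. The function name `flatten_abstract` is hypothetical.

```python
from typing import Any


def flatten_abstract(node: dict[str, Any]) -> str:
    """Render an abstract node (hypothetical shape from this proposal) as plain text."""
    parts = node.get("hasPart")
    if not parts:
        # Unstructured abstract: the body sits directly on the node.
        return node.get("abstractBody", "")
    sections = []
    for part in parts:
        # Each part is itself an abstract, so recurse into it.
        body = flatten_abstract(part)
        headline = part.get("headline")
        sections.append(f"{headline}: {body}" if headline else body)
    return "\n".join(sections)


# Placeholder bodies; the original examples elide the actual text.
structured = {
    "@type": "Abstract",
    "hasPart": [
        {"@type": "Abstract", "headline": "Methods", "abstractBody": "(methods text)"},
        {"@type": "Abstract", "headline": "Results", "abstractBody": "(results text)"},
    ],
}
simple = {"@type": "Abstract", "abstractBody": "(abstract text)"}

print(flatten_abstract(structured))
print(flatten_abstract(simple))
```

Because the parts reuse the same type as the whole, a single recursive function covers both the structured and the unstructured case.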

Happy to send a PR or bring that to the mailing list if it helps.

Thanks!

@dbs
Contributor

dbs commented Jan 19, 2015

Of the original examples @Tpt cited (JSTOR, arXiv, and HAL), the abstracts for articles generally seem to be single-paragraph descriptions. There is also a mention of abstracts being longer than 50-word descriptions, but as far as I can tell, the schema.org documentation expresses no length limit for schema:description.

@sballesteros makes a good point that PubMed (and perhaps medical scholarly literature in general?) does appear to provide more structured abstracts, although in a random sampling the headings differ slightly from abstract to abstract. Checking another large science database, the Biosis Citation Index, I see single-paragraph abstracts again. So between arXiv, HAL, and Biosis, I'm not sure that the general statement that "structured abstracts are quite important in science" is borne out by the evidence.

If we were to put forward a new Abstract type, then I think we could just use schema:about rather than the proposed itemAbstracted or schema:isPartOf properties to link the abstract to the full-text article.

It might make more sense, though, to just use the SPAR ontology directly for those who want to provide that level of granularity in markup. Something like the following example allows one to describe the Article via an abstract on one page and link to the full-text of the article on another page via schema:url, providing the schema:description properties for the coarse description and the SPAR structure classes for those clients that might desire the more granular structure:

<div vocab="http://schema.org/" prefix="deo: http://purl.org/spar/deo/" typeof="Article">
    <h1 property="name">Prevalence of data-driven decision-making in various disciplines</h1>
    <h2 property="author" typeof="Person"><a href="http://example.com/author1">Bar, Foo</a></h2>
    <div><a href="http://example.com/fulltext.pdf" property="url">Full-text</a></div>
    <div property="description">
        <h3>Methods</h3>
        <div property="hasPart" typeof="deo:Methods">Given a random sampling of conference proceedings from...</div>
    </div>
    <div property="description">
        <h3>Results</h3>
        <div property="hasPart" typeof="deo:Results">27% of papers in the Alchemy discipline cited sample sizes greater than...</div>
    </div>
</div>

In any case, yes, I think the proposal warrants more discussion on public-vocabs.

@sballesteros

I could not find very recent data (the latest is from around 2005), but the following references are interesting:

From the first ref:

The top thirty journals according to impact factors noted in the “Medicine, General and Internal” category of the ISI Journal Citation Reports (2000) were sampled [...]
Among 304 original articles that included abstracts, 188 (61.8%) had structured and 116 (38.2%) had unstructured abstracts.

From the second ref:

The percentage of new MEDLINE records containing structured abstracts rose from 2.5% for 1992 to 20.3% for 2005 (Figure 1). Because the number of articles indexed each year was also increasing throughout this period, the absolute number of structured abstracts increased even more substantially, from 9,975 (1992) to 118,051 (2005, the last publication year with complete data in the research dataset), more than 1,000.0%.

@danbri danbri added the "status:needs review" and "schema.org vocab" labels Jan 22, 2015
@danbri
Contributor

danbri commented Jan 22, 2015

Thanks for the (evidence based :) discussion...

I've opened an issue to make sure we track this aside from its proposed implementation - #276

Would an 'abstract' be a sub-property of 'description', in that every value for the former would also be a reasonable value for the latter? Maybe, but then I see the discussion evolved towards an 'abstract' entity with structure and relations...
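
To illustrate what such a sub-property relationship would entail, here is a minimal Python sketch, assuming a hypothetical declaration that `abstract` is a sub-property of `description`. The mapping `SUBPROPERTY_OF`, the helper `with_inferred`, and the sample values are invented for illustration and are not part of schema.org.

```python
# Hypothetical declaration: 'abstract' as a sub-property of 'description'.
SUBPROPERTY_OF = {"abstract": "description"}


def with_inferred(item: dict) -> dict:
    """Copy item, adding every super-property value implied by SUBPROPERTY_OF."""
    result = {key: list(values) for key, values in item.items()}
    for prop, values in item.items():
        parent = SUBPROPERTY_OF.get(prop)
        while parent is not None:
            merged = result.setdefault(parent, [])
            for value in values:
                if value not in merged:
                    merged.append(value)
            # Walk further up in case the parent has its own super-property.
            parent = SUBPROPERTY_OF.get(parent)
    return result


# Placeholder values, one list of values per property.
article = {
    "name": ["(article title)"],
    "abstract": ["(official abstract)"],
    "description": ["(database blurb)"],
}
inferred = with_inferred(article)
print(inferred["description"])
```

Under that assumption, a consumer that only understands `description` would see the official abstract alongside any database-supplied blurb, with no way to tell the two apart.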

@csarven

csarven commented Jun 7, 2015

schema:abstract is a nice addition, but -1 to it being a property of schema:Article only:

It may be preferable to align schema:abstract more closely with dcterms:abstract, i.e., "A summary of the resource." Hence, it should not be specific to schema:Article. It may also be preferable to use the domains schema:CreativeWork, schema:Event, and schema:Product for schema:abstract (if not the overly broad schema:Thing). If the semantics are roughly "summary" (see, for instance, iCal SUMMARY), "abstract", or "short caption", then an event or a product can also use this property.

There is sufficient semantic distinction between "abstract" and "description" that the former need not be a sub-property of the latter. Otherwise, abstract comes across as an even "shorter" version of description. As there is no way to tell what constitutes "short" in schema:description, I'm not sure that making schema:abstract a subPropertyOf schema:description would bring anything interesting. Better left flat.

@darobin
Contributor

darobin commented Mar 10, 2016

I agree that abstract is not a subproperty of description. It is true that some document types confuse the two (e.g. W3C specifications routinely have a description under an "Abstract" heading), but overall the two are distinct.

An abstract really provides an outline. That is why structured abstracts are becoming more common: they make better outlines for people who no longer have the time to read everything that is published in their domain.

Is there any specific impediment to making progress on this property that we could help alleviate?

@danbri
Contributor

danbri commented Apr 24, 2016

Do we have an open issue corresponding to this discussion? I'd like to explore getting a draft property into the pending extension (see pending.webschemas.org). In particular, the question is whether "structured" is handled in terms of properties, or in terms of HTML markup within a single property value.

@danbri
Contributor

danbri commented Apr 25, 2016

Re-reading the thread and @sballesteros's earlier proposal, the idea would be roughly to have a type (presumably a CreativeWork) to represent abstracts, which might have parts of the same type. Is there some support here for adding something in this direction to pending.[web]schema[s].org?

One nit - we would prefer not to use the same word for new types and properties, if they differ only by case (i.e. the 'abstract' property takes an 'Abstract' as its value).

@csarven

csarven commented Apr 25, 2016

Do we agree on two use cases for capturing abstracts: 1) literals and 2) structured abstracts?

For the literal case, schema:abstract would be in line with how schema:description, schema:purpose, schema:name, etc. are used. If we don't want to overload schema:abstract, it may be clearer to have a different property for structured abstracts.

@Dataliberate
Contributor

Dataliberate commented Apr 25, 2016

At least for a start, schema:abstract with a range of Text and CreativeWork would cover both use cases.

A structured abstract would most likely be described as a CreativeWork, or as a current or future subtype thereof.
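
A consumer of such a dual-range property would have to accept both kinds of value. The sketch below shows one way to do that in Python, by coercing a Text value into a minimal CreativeWork node so that downstream code only deals with one shape. The helper name `as_creative_work` is hypothetical, and using the existing schema.org `text` property to hold the literal is an assumption, not something agreed in this thread.

```python
def as_creative_work(abstract_value):
    """Coerce a Text-valued abstract into a minimal CreativeWork node."""
    if isinstance(abstract_value, str):
        # Assumption: carry the literal in the CreativeWork 'text' property.
        return {"@type": "CreativeWork", "text": abstract_value}
    # Already a node (e.g. a structured abstract): pass it through unchanged.
    return abstract_value


plain = as_creative_work("(abstract text)")
node = as_creative_work({"@type": "CreativeWork", "hasPart": []})
print(plain)
print(node)
```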

@danbri
Contributor

danbri commented Aug 4, 2016

Thanks everyone for the discussion. I am going to close this pull request to encourage any follow-ups to happen in the corresponding issue, #276. From a quick look back at the discussion, my sense is that the notion of a structured abstract is what adds complexity here. But see #276...

@danbri danbri closed this Aug 4, 2016