On Scholia landing page, provide some overview stats about Wikidata and scholarly publications in it #336

Closed
Daniel-Mietchen opened this issue Apr 9, 2018 · 14 comments
Assignees
Labels
LandingPage default page for a Scholia aspect P50-author Wikidata property P108-employer Wikidata property P166-award-received Wikidata property P225-taxon-name Wikidata property P356-DOI Wikidata property P496-ORCID Wikidata property P625-geolocation Wikidata property P921-main-subject Wikidata property P932-PMCID Wikidata property P1416-affiliation Wikidata property P2093-author-name-string Wikidata property P2860-cites Wikidata property stats quantitative information

Comments

@Daniel-Mietchen
Member

e.g. number of triples in Wikidata

SELECT (COUNT(*) AS ?counts) WHERE {
  ?s ?p ?o .
}

and some WikiCite-focused ones, e.g. as per this list

or some version of http://wikicite.org/statistics.html .

@Daniel-Mietchen Daniel-Mietchen added LandingPage default page for a Scholia aspect stats quantitative information labels Apr 9, 2018
fnielsen added a commit that referenced this issue Apr 9, 2018
Now statistics on number of triples, DOIs and PMIDs.
@fnielsen fnielsen self-assigned this Apr 9, 2018
@fnielsen
Collaborator

fnielsen commented Apr 9, 2018

Now running at https://tools.wmflabs.org/scholia/

@fnielsen fnielsen closed this as completed Apr 9, 2018
@Daniel-Mietchen
Member Author

I think adding a few more would be useful, e.g. the total number of items and of scientific articles, and then a good selection of properties from the above list and/or from https://www.wikidata.org/wiki/Template:Bibliographic_properties .

@Daniel-Mietchen
Member Author

Daniel-Mietchen commented Apr 11, 2018

Here is a query that gives a more comprehensive list:

SELECT ?count ?description
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] ?p [] . }
} AS %triples
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P50 []. }
} AS %authors
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P356 []. }
} AS %dois
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P496 []. }
} AS %orcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P698 []. }
} AS %pmids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P932 []. }
} AS %pmcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2093 [] . }
} AS %authorstrings
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2860 [] . }
} AS %cites
WHERE {
  {
    INCLUDE %triples
    BIND("Total number of triples" AS ?description)
  }
  UNION
  {
    INCLUDE %pmids
    BIND("Items with a PubMed ID" AS ?description)
  }
  UNION
  {
    INCLUDE %pmcids
    BIND("Items with a PubMed Central ID" AS ?description)
  }
  UNION
  {
    INCLUDE %dois
    BIND("Items with a DOI" AS ?description)
  }
  UNION
  {
    INCLUDE %cites
    BIND("Citations" AS ?description)
  }
  UNION
  {
    INCLUDE %authors
    BIND("Links from items about works to items about their authors" AS ?description)
  }
  UNION
  {
    INCLUDE %authorstrings
    BIND("Author name strings on items about works" AS ?description)
  }
  UNION
  {
    INCLUDE %orcids
    BIND("Items about authors with an ORCID profile that has public content" AS ?description)
  }
}
ORDER BY DESC(?count)
 

Still missing:

  • total number of items
  • total number of works
  • probably some more, e.g. arXiv ID, taxon author, doctoral advisor, published in, affiliation/employer, field of work, educated at, ISSN
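
The query above repeats the same pattern for every statistic: one named subquery (`WITH { ... } AS %name`, a Blazegraph extension available on WDQS) and one `INCLUDE`/`BIND` branch in the `UNION`. Entries from the "still missing" list could therefore be added mechanically. The following is only a sketch of that idea in Python; the `STATS` table and the `build_stats_query` helper are hypothetical names introduced for illustration and are not part of Scholia. Descriptions are assumed not to contain double quotes.

```python
# Hypothetical table of statistics: (subquery name, triple pattern, description).
# The property IDs are the ones that appear in the queries in this thread.
STATS = [
    ("triples", "[] ?p [] .", "Total number of triples"),
    ("dois", "[] wdt:P356 [] .", "Items with a DOI"),
    ("pmids", "[] wdt:P698 [] .", "Items with a PubMed ID"),
    ("arxiv", "[] wdt:P818 [] .", "Items with an arXiv ID"),
]


def build_stats_query(stats):
    """Return a SPARQL query string with one named subquery per statistic."""
    with_clauses = []
    union_branches = []
    for name, pattern, description in stats:
        # One counting subquery per statistic, named %name.
        with_clauses.append(
            "WITH {\n"
            f"  SELECT (COUNT(*) AS ?count) WHERE {{ {pattern} }}\n"
            f"}} AS %{name}"
        )
        # One UNION branch that pulls in the count and labels it.
        union_branches.append(
            "  {\n"
            f"    INCLUDE %{name}\n"
            f'    BIND("{description}" AS ?description)\n'
            "  }"
        )
    return (
        "SELECT ?count ?description\n"
        + "\n".join(with_clauses)
        + "\nWHERE {\n"
        + "\n  UNION\n".join(union_branches)
        + "\n}\nORDER BY DESC(?count)"
    )


query = build_stats_query(STATS)
print(query)
```

Adding a statistic then means adding one row to the table, which keeps the subquery name and its `INCLUDE` from drifting apart.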

@fnielsen
Collaborator

Added with b8f8f6a and now running at https://tools.wmflabs.org/scholia/

@Daniel-Mietchen
Member Author

Here are some further ideas on what to include in these stats:

@Daniel-Mietchen Daniel-Mietchen added this to To do in Taxa May 5, 2018
@lucaswerkmeister
Contributor

Number of properties:

SELECT (COUNT(*) AS ?propertyCount) WHERE {
  ?property a wikibase:Property.
}

For the number of triples, you can also use ?s ?p ?o (subject predicate object) instead of [] ?p [] – equivalent but slightly more readable :)

Daniel-Mietchen added a commit that referenced this issue May 18, 2018
@Daniel-Mietchen
Member Author

Thanks, @lucaswerkmeister — I've just included it in the above batch of additional stats.

@Daniel-Mietchen
Member Author

The above patch caused display problems, so we reverted it. Here is the query again:

SELECT ?count ?description
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] ?p [] . }
} AS %triples
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { ?property a wikibase:Property . }
} AS %properties
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P50 []. }
} AS %authors
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P69 [] . }
} AS %almamater
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P108 [] . }
} AS %employer
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P166 [] . }
} AS %award_received
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P212 [] . }
} AS %isbn13
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P225 []. }
} AS %taxa
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P234 []. }
} AS %inchi
WITH {
  SELECT (COUNT(DISTINCT ?serials) AS ?count) WHERE { ?serials wdt:P236 [] . }
} AS %issn
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P356 []. }
} AS %dois
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P496 []. }
} AS %orcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P625 []. }
} AS %geoloc
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P638 [] . }
} AS %pdb
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P686 [] . }
} AS %gene
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P698 []. }
} AS %pmids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P699 [] . }
} AS %disease
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P859 [] . }
} AS %sponsor
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P818 [] . }
} AS %arxivID
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P921 []. }
} AS %topics
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P932 []. }
} AS %pmcids
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P1416 [] . }
} AS %affiliation
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2093 []. }
} AS %authorstrings
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2427 [] . }
} AS %GRID
WITH {
  SELECT (COUNT(*) AS ?count) WHERE { [] wdt:P2860 [] . }
} AS %cites
WHERE {
  {
    INCLUDE %triples
    BIND("Total number of triples" AS ?description)
  }
  UNION
  {
    INCLUDE %properties
    BIND("Total number of properties" AS ?description)
  }
  UNION
  {
    INCLUDE %pmids
    BIND("Items with a PubMed ID" AS ?description)
  }
  UNION
  {
    INCLUDE %pmcids
    BIND("Items with a PubMed Central ID" AS ?description)
  }
  UNION
  {
    INCLUDE %dois
    BIND("Items with a Digital Object Identifier (DOI)" AS ?description)
  }
  UNION
  {
    INCLUDE %cites
    BIND("Citations" AS ?description)
  }
  UNION
  {
    INCLUDE %authors
    BIND("Links from items about works to items about their authors" AS ?description)
  }
  UNION
  {
    INCLUDE %authorstrings
    BIND("Author name strings on items about works" AS ?description)
  }
  UNION
  {
    INCLUDE %orcids
    BIND("Items about authors with an ORCID profile that has public content" AS ?description)
  }
  UNION
  {
    INCLUDE %taxa
    BIND("Items with a taxon name" AS ?description)
  }
  UNION
  {
    INCLUDE %geoloc
    BIND("Items with a geolocation" AS ?description)
  }
  UNION
  {
    INCLUDE %topics
    BIND("Links from items about works to items about their main subjects" AS ?description)
  }
  UNION
  {
    INCLUDE %inchi
    BIND("Items with an International Chemical Identifier (InChI)" AS ?description)
  }
  UNION
  {
    INCLUDE %isbn13
    BIND("Items with a 13-digit International Standard Book Number (ISBN 13)" AS ?description)
  }
  UNION
  {
    INCLUDE %award_received
    BIND("Links from items about people or others to an award they have received" AS ?description)
  }
  UNION
  {
    INCLUDE %affiliation
    BIND("Links from items about people to items about groups they are affiliated with" AS ?description)
  }
  UNION
  {
    INCLUDE %employer
    BIND("Links from items about people to items about their employer" AS ?description)
  }
  UNION
  {
    INCLUDE %almamater
    BIND("Links from items about people to items about the educational establishments they attended" AS ?description)
  }
  UNION
  {
    INCLUDE %issn
    BIND("Items with an International Standard Serial Number (ISSN)" AS ?description)
  }
  UNION
  {
    INCLUDE %arxivID
    BIND("Items with an arXiv ID" AS ?description)
  }
  UNION
  {
    INCLUDE %GRID
    BIND("Items about institutions with an identifier from the Global Research Identifier Database (GRID ID)" AS ?description)
  }
  UNION
  {
    INCLUDE %sponsor
    BIND("Links from items about anything to items about corresponding sponsors" AS ?description)
  }
  UNION
  {
    INCLUDE %disease
    BIND("Items indexed in the Disease Ontology" AS ?description)
  }
  UNION
  {
    INCLUDE %gene
    BIND("Items indexed in the Gene Ontology" AS ?description)
  }
  UNION
  {
    INCLUDE %pdb
    BIND("Protein structures indexed in the Protein Data Bank" AS ?description)
  }
}
ORDER BY DESC(?count)
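
For the landing page, the result of such a query has to be turned into rows of description and count. A minimal Python sketch of that step follows, assuming the endpoint returns the standard SPARQL 1.1 JSON results layout. The sample response and its numbers are made up for illustration, and `stats_rows` is a hypothetical helper, not Scholia code.

```python
import json

# Made-up sample in the SPARQL 1.1 JSON results layout; the counts are
# placeholders, not real Wikidata figures.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["count", "description"]},
  "results": {"bindings": [
    {"count": {"type": "literal", "value": "20"},
     "description": {"type": "literal", "value": "Items with a DOI"}},
    {"count": {"type": "literal", "value": "100"},
     "description": {"type": "literal", "value": "Total number of triples"}}
  ]}
}
"""


def stats_rows(response_text):
    """Return (description, count) pairs sorted by descending count."""
    bindings = json.loads(response_text)["results"]["bindings"]
    # Values arrive as strings, so the count is converted before sorting.
    rows = [(b["description"]["value"], int(b["count"]["value"])) for b in bindings]
    return sorted(rows, key=lambda row: row[1], reverse=True)


rows = stats_rows(SAMPLE_RESPONSE)
for description, count in rows:
    print(f"{count:>12,}  {description}")
```

Sorting again on the client is redundant when the query already has `ORDER BY DESC(?count)`, but it keeps the table ordered numerically if the counts are ever compared as strings.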

Pinging @lucaswerkmeister

@lucaswerkmeister
Contributor

What kinds of display problems did it cause?

@fnielsen
Collaborator

There was no response from WDQS, probably because the query was too long. Perhaps the getJSON call can be changed to a POST.
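
To illustrate the GET-versus-POST idea, here is a rough sketch in Python rather than the JavaScript `getJSON` call mentioned above. It builds a request without sending it and switches to a form-encoded POST when the GET URL would be too long. The endpoint URL is assumed to be the public WDQS one, the length threshold is an arbitrary assumption rather than a documented limit, and `build_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed public WDQS endpoint
MAX_GET_URL_LENGTH = 2000  # assumed threshold, not a documented WDQS limit


def build_request(query, max_url_length=MAX_GET_URL_LENGTH):
    """Build (but do not send) a GET request, or a POST one if the URL is too long."""
    params = urlencode({"query": query})
    url = f"{ENDPOINT}?{params}"
    if len(url) <= max_url_length:
        return Request(url, headers={"Accept": "application/sparql-results+json"})
    # Form-encoded POST body, as allowed by the SPARQL 1.1 Protocol.
    return Request(
        ENDPOINT,
        data=params.encode("utf-8"),
        headers={
            "Accept": "application/sparql-results+json",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


short = build_request("SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o . }")
long = build_request("SELECT * WHERE { ?s ?p ?o . } # " + "x" * 5000)
print(short.get_method(), long.get_method())
```

Keeping GET for short queries leaves them cacheable and easy to share as links; only the long statistics query would take the POST path.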

@lucaswerkmeister
Contributor

WDQS already retries using POST if the GET request fails due to being too long. If I run the query in @Daniel-Mietchen’s comment on WDQS, it works both on index.html and embed.html.

@fnielsen
Collaborator

fnielsen commented Jul 3, 2018

"Items about authors with an ORCID profile that has public content" Why "that has public content"?

@fnielsen
Collaborator

fnielsen commented Jul 3, 2018

"Items with a 13-digit International Standard Book Number (ISBN 13)" This should be rephrased, as there might be items with multiple ISBNs (there are some, especially Springer volumes).
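
The distinction matters because `COUNT(*)` over `[] wdt:P212 []` counts statements, not items, whereas the ISSN subquery in the query above uses `COUNT(DISTINCT ?serials)` and so counts items. A tiny Python illustration with made-up data (the item and ISBN values are placeholders, not real identifiers):

```python
# Made-up (item, ISBN-13) statements; identifiers are placeholders.
STATEMENTS = [
    ("item-a", "isbn-1"),
    ("item-a", "isbn-2"),  # same item, second ISBN
    ("item-b", "isbn-3"),
]

# Analogous to COUNT(*): every statement is counted.
statement_count = len(STATEMENTS)

# Analogous to COUNT(DISTINCT ?item): each item is counted once.
item_count = len({item for item, _ in STATEMENTS})

print(statement_count, item_count)
```

With `COUNT(*)`, a label such as "ISBN-13 statements" would describe the number more accurately than "Items with ...".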

@Daniel-Mietchen Daniel-Mietchen added P50-author Wikidata property P2093-author-name-string Wikidata property P2860-cites Wikidata property P1416-affiliation Wikidata property P921-main-subject Wikidata property P625-geolocation Wikidata property P108-employer Wikidata property P496-ORCID Wikidata property labels Aug 23, 2018
@Daniel-Mietchen Daniel-Mietchen added P225-taxon-name Wikidata property P356-DOI Wikidata property P698-PMID Wikidata property P166-award-received Wikidata property P932-PMCID Wikidata property and removed P698-PMID Wikidata property labels Aug 23, 2018
@Daniel-Mietchen
Member Author

I have reworked the query, as per Daniel-Mietchen/ideas#1022 (comment) .

Projects
Meta
  
Done
Taxa
  
Done
Development

No branches or pull requests

3 participants