LD-Connect

LD Connect is a Linked Data portal for IOS Press scientometrics, consisting of all IOS Press bibliographic data enriched with geographic information. This work is funded by IOS Press in collaboration with the STKO lab at UC Santa Barbara. A SPARQL endpoint for querying LD Connect is published at http://ld.iospress.nl/sparql. This documentation describes the shared data and the scientometric system, and explains how to reuse them. The shared data includes the ontology, triples, and embeddings, which can be accessed in our figshare repository. To use the shared data, please download it first and place it in the root folder of this GitHub repository. More information about our work is provided in our Spotlight Paper "LD Connect: A Linked Data Portal for IOS Press Scientometrics", accepted at ESWC 2022.

Ontology

The ontology triples can be found at data/ontology/ontology.ttl. The two schema diagrams below show ontology fragments of iospress:Publication and iospress:Contributor, respectively. In addition, for convenience of reuse, we include a recent collection of selected triples extracted from LD Connect in data/triples/. The file categories.ttl contains triples that map each iospress:Journal to its corresponding iospress:Category, geocoded.ttl contains geocoded information about each iospress:Organization, and triplify-union.ttl contains the union of all triples in LD Connect at the time of data collection.

Fig. 1 An overview of the ontology behind LD Connect. Edges with filled arrowheads are object/datatype properties; edges with open arrowheads represent subclass relations. All classes and properties without a prefix are in the namespace iospress: http://ld.iospress.nl/rdf/ontology/ .

Semantic search is available at http://ld.iospress.nl/explore/semantic-search/. The sample SPARQL query below retrieves information about papers whose first author is affiliated with an organization located in China.

select ?title (group_concat(?keyword; separator=',')
       as ?keywords) ?year ?journal ?first_author_name ?org_name 
{
    ?paper iospress:publicationTitle ?title;
           iospress:publicationIncludesKeyword ?keyword;
           iospress:publicationDate ?date;
           iospress:articleInIssue/iospress:issueInVolume/
           iospress:volumeInJournal ?journal;
           iospress:publicationAuthorList ?author_list.
    ?author_list rdf:_0 ?first_author.
    ?first_author iospress:contributorFullName ?first_author_name;
                  iospress:contributorAffiliation ?org.
    ?org iospress:geocodingInput ?org_name ;
         iospress:geocodingOutput/
         iospress-geocode:country ?org_country.
    bind(year(?date) as ?year)
    values ?org_country {"China"@en}
} group by ?title ?year ?journal ?first_author_name ?org_name
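
Queries like the one above can also be sent to the endpoint from a script. The following is a minimal sketch, assuming the endpoint follows the standard SPARQL Protocol (a GET request with a `query` parameter) and can return results as `application/sparql-results+json`. The helper names (`build_request`, `rows_from_results`, `run_query`) and the short demonstration query are illustrative and not part of this repository.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URL as published in this README.
ENDPOINT = "http://ld.iospress.nl/sparql"

# A deliberately small demonstration query (hypothetical, not from the repo).
# The iospress: namespace is the one given in the Fig. 1 caption.
QUERY = """
PREFIX iospress: <http://ld.iospress.nl/rdf/ontology/>
SELECT ?paper ?title WHERE {
    ?paper iospress:publicationTitle ?title .
} LIMIT 5
"""


def build_request(query, endpoint=ENDPOINT):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def rows_from_results(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT, timeout=30):
    """Send the query over the network and return the flattened rows."""
    request = build_request(query, endpoint)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return rows_from_results(json.load(response))
```

Calling `run_query(QUERY)` would return one dictionary per result row, keyed by the variable names in the `SELECT` clause. If the endpoint does not predefine the prefixes used in the sample query above, matching `PREFIX` declarations would need to be added before sending it.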

Embeddings

A version of the pre-trained embeddings is located in data/embeddings/. We provide document embeddings in plain text format (see data/embeddings/IOS-Doc2Vec-TXT/). The file doc2vec.txt is the Doc2Vec model, and doc2vec_voc.txt lists the paper entity URLs of the document embeddings. The file w2v.txt is the corresponding Word2Vec model, and w2v_voc.txt lists the vocabulary of the word embeddings. In addition, we provide knowledge graph embeddings in plain text format (see data/embeddings/IOS-TransE/). Specifically, the provided graph embeddings, TransE_person.txt, cover contributor information. Also, entity_sameAs_merge_mapping_iri.json is a JSON file describing how identical entities (e.g., contributors and affiliations) are linked after co-reference resolution. All embeddings have 200 dimensions.
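
As a rough illustration of how a model file and its vocabulary file might be paired, here is a minimal sketch. The README does not specify the exact layout of these files, so the sketch assumes one whitespace-separated vector per line in the model file and one key per line, in the same order, in the vocabulary file. The function name `load_embeddings` is hypothetical. The dimension and length checks fail loudly if the real files do not match this assumption.

```python
EMBEDDING_DIM = 200  # dimension stated in this README


def load_embeddings(vector_path, vocab_path, dim=EMBEDDING_DIM):
    """Pair each vocabulary entry with the vector on the same line number.

    ASSUMPTION (not confirmed by the README): `vector_path` holds one
    whitespace-separated vector per line, `vocab_path` holds one key per
    line, and both files are in the same order.
    """
    with open(vocab_path, encoding="utf-8") as handle:
        keys = [line.strip() for line in handle if line.strip()]

    vectors = []
    with open(vector_path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # skip blank lines
            values = [float(token) for token in line.split()]
            if len(values) != dim:
                raise ValueError(
                    f"line {line_number}: expected {dim} values, found {len(values)}"
                )
            vectors.append(values)

    if len(keys) != len(vectors):
        raise ValueError(f"{len(keys)} keys but {len(vectors)} vectors")
    return dict(zip(keys, vectors))
```

Under these assumptions, `load_embeddings("data/embeddings/IOS-Doc2Vec-TXT/doc2vec.txt", "data/embeddings/IOS-Doc2Vec-TXT/doc2vec_voc.txt")` would return a dictionary from paper URL to a 200-dimensional vector. If a `ValueError` is raised, the files use a different layout and the parsing needs to be adapted.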

To see how the embeddings support embedding-based similarity search in our scientometric system, please refer to server.js, mod-author-similarity.js, and mod-paper-similarity.js.
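
For readers who want a quick feel for embedding-based similarity search before reading those files, the sketch below ranks entities by cosine similarity. This is a common choice for comparing embeddings, but it is an assumption here: the sketch is not taken from mod-author-similarity.js or mod-paper-similarity.js and may differ from what they do. The function names and the toy two-dimensional vectors are illustrative only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_key, embeddings, top_k=5):
    """Return the top_k (key, score) pairs most similar to query_key, excluding itself."""
    query = embeddings[query_key]
    scored = [
        (key, cosine_similarity(query, vector))
        for key, vector in embeddings.items()
        if key != query_key
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for the real 200-dimensional embeddings.
toy = {
    "paper/A": [1.0, 0.0],
    "paper/B": [0.9, 0.1],
    "paper/C": [0.0, 1.0],
}
nearest = most_similar("paper/A", toy, top_k=2)
```

In this toy example, `paper/B` ranks above `paper/C` because its vector points in nearly the same direction as that of `paper/A`. A linear scan like this is fine for a sketch; a large collection would usually call for vectorized or indexed search.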

IOS Press scientometrics

Getting started

IOS Press scientometrics are built upon LD Connect and developed using several JavaScript libraries such as D3.js and Leaflet. The scientometrics can be downloaded from the scientometrics folder, migrated to other academic knowledge graphs, and reused for relevant applications and research. Follow the instructions below to set them up and run them locally.

  1. After cloning this repository, type the following commands in the terminal.

    $ cd scientometrics/
    $ npm install
  2. Create a folder data/ within scientometrics/sites/. Copy both pre-trained embedding folders (data/embeddings/IOS-Doc2Vec-TXT/ and data/embeddings/IOS-TransE/) to the scientometrics/sites/data/ directory.

  3. Launch the server on an open port:

    $ node src/server/server.js

    You can change the port by editing N_PORT in server.js. The default is 7200.

  4. Now open a browser and navigate to http://localhost:N_PORT/iospress_scientometrics, replacing N_PORT with the configured port.
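
Step 2 above can also be scripted. The following is a minimal sketch, assuming the shared data has already been downloaded into the repository root as described earlier, and that each embedding folder keeps its name inside scientometrics/sites/data/. The target layout is an interpretation of step 2, and `stage_embeddings` is a hypothetical helper, not part of this repository.

```python
import shutil
from pathlib import Path

# Folder names as given in this README.
EMBEDDING_FOLDERS = ("IOS-Doc2Vec-TXT", "IOS-TransE")


def stage_embeddings(repo_root):
    """Copy data/embeddings/<folder> to scientometrics/sites/data/<folder>.

    ASSUMPTION: each folder keeps its own name under the target directory.
    Requires Python 3.8+ for `dirs_exist_ok`.
    """
    root = Path(repo_root)
    target = root / "scientometrics" / "sites" / "data"
    target.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in EMBEDDING_FOLDERS:
        source = root / "data" / "embeddings" / name
        if not source.is_dir():
            raise FileNotFoundError(f"missing embedding folder: {source}")
        # dirs_exist_ok=True makes the copy safe to repeat.
        shutil.copytree(source, target / name, dirs_exist_ok=True)
        copied.append(target / name)
    return copied
```

Running `stage_embeddings(".")` from the repository root would create scientometrics/sites/data/ if needed and copy both folders into it. A missing source folder raises `FileNotFoundError`, which usually means the shared data has not been downloaded yet.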

Descriptions

IOS Press scientometrics can be accessed at http://stko-roy.geog.ucsb.edu:7200/iospress_scientometrics. Note that the URL must use HTTP rather than HTTPS.

These scientometrics include Home (a choropleth map), Country Collaboration, Author Map, Author Similarity, Paper Similarity, Keyword Graph, and Streamgraph. Please select a journal category first and then a journal of interest for bibliographic analysis, visualization, and embedding-based similarity search. An example of how information is displayed for the Semantic Web journal is shown below.

License

This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/4.0/ or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.