Heidi Uphoff edited this page Dec 16, 2015 · 12 revisions

Introduction

This is my final project for 590 Document Modeling. I created an XSD schema and a Schematron validation file for metadata records of scientific research outputs. The schema covers descriptive and administrative elements only. I was interested in linking records from the same project together, and the schema allows for that. I created XPath queries to demonstrate how project records can be queried separately from the full set of records. I also created an XSL stylesheet to display these records in a web browser.

Description/Proposal Summary

For my final project, I created an XML schema for records of scientific research project collections. The schema covers descriptive and administrative elements only. This type of collection needs one record for the entire collection and records for the materials found inside it: file directories of data, single data items, documentation, grants, data management plans, conference papers, presentations, posters, author-final-version journal articles, related resources, etc. The schema also accounts for files that need records but do not have an associated research project record. The schema is aimed at small to medium data projects, not big data projects. It could potentially be adapted for big data projects, although those projects often have their own specific needs.

Research project members could use this type of metadata schema to document their project and the locations of their project files. An institutional repository that collects research products could also use it. However, the specific use case I designed this demo for is an institutional repository that both publicly shares products of research and tracks metrics on everything its institution’s authors share and produce. The schema therefore needs to fit most disciplines and be useful to others who discover and use the described data. It also needs to be flexible enough to map to other metadata schemas, so that records from other public data repositories can be pulled into the institution’s repository.

The metadata schema accounts for a few other important details. First, the data files may not be stored within the repository but elsewhere. The size of data projects is a major barrier for institutional repositories; for now, many data files are shared on request to the PI or stored on large server space paid for by the research grant. This schema can document these varying locations of data files. Second, not all data will be saved after the project finishes, due to cost. Decisions about what was kept and what was destroyed may be important to describe when sharing the project, especially if funders mandate that researchers share their data openly. To this end, users need opportunities to share their data management plans for the project both formally and informally. With this XML model, users can put a short data management plan description in an element of the record, or they can create a record for a Data Management Plan (DMP) document and link it to a research project.
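A minimal sketch of how both forms of DMP could be gathered for one project, assuming records are stored as XML strings keyed by file name and that project_record holds the project record’s file name (as in the XPath examples later on this page). The record contents and the find_dmps name are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample records keyed by file name. Element and attribute names
# follow the schema documentation; the values are invented for illustration.
RECORDS = {
    "project_record3.xml": (
        '<record id="p3" resource_type="project" draft_status="no">'
        "<title>Example project</title>"
        "<data_management_plans>Raw data kept for 5 years.</data_management_plans>"
        "</record>"
    ),
    "dmp7.xml": (
        '<record id="d7" resource_type="dmp" draft_status="no">'
        "<title>DMP for Example project</title>"
        "<project_record>project_record3.xml</project_record>"
        "</record>"
    ),
    "dataset9.xml": (
        '<record id="s9" resource_type="dataset" draft_status="no">'
        "<title>Unrelated dataset</title>"
        "</record>"
    ),
}


def find_dmps(records, project_file):
    """Return (inline DMP descriptions, titles of linked dmp records) for a project.

    Assumes project_record holds the project record's file name.
    """
    inline, linked = [], []
    for name, xml in records.items():
        root = ET.fromstring(xml)
        in_project = name == project_file or any(
            (pr.text or "").strip() == project_file
            for pr in root.findall("project_record")
        )
        if not in_project:
            continue
        # Informal DMP: a short description stored directly in the record.
        inline.extend((d.text or "").strip() for d in root.findall("data_management_plans"))
        # Formal DMP: a separate record of resource_type "dmp" linked to the project.
        if root.get("resource_type") == "dmp":
            linked.append(root.findtext("title"))
    return inline, linked


inline, linked = find_dmps(RECORDS, "project_record3.xml")
print(inline)  # ['Raw data kept for 5 years.']
print(linked)  # ['DMP for Example project']
```

Keeping both lookups in one function reflects the design choice above: a DMP may live inside a record or as its own linked record, so a search has to check both places.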

Creating a stylesheet was an important part of this demo because it guided the XML data model. By examining the records from the end user’s perspective, I could see what needed to be included in each individual record versus what should be set globally by stylesheets or repository controls. One example is administrative versus public access. In this demo’s stylesheet, funder is displayed in a different color to show that it is an admin-only field; in an actual repository, admin fields would be visible only when administrators log into the system. Likewise, if a record is marked as a draft, only administrators would be able to see it. The stylesheet also formats and normalizes some of the data fields, which removes some formatting considerations from the XML model. If a record has no link to the item it describes, its display shows “Contact Author.”
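The display rules in the paragraph above (hide drafts from the public, mark admin-only fields, fall back to “Contact Author”) can be sketched outside XSLT as plain Python. This is an illustration under stated assumptions, not the project’s stylesheet: the render_record name is hypothetical, funder and record_vendor are assumed to be the admin-only fields, a text marker stands in for the different color, and “no link” is taken to mean neither object_url nor object_doi is present.

```python
import xml.etree.ElementTree as ET

# Assumed admin-only fields, taken from the stylesheet goals (funder, record_vendor).
ADMIN_ONLY = {"funder", "record_vendor"}


def render_record(xml, is_admin=False):
    """Return display lines for a record, or None if the viewer may not see it."""
    root = ET.fromstring(xml)
    # Draft records are shown to administrators only.
    if root.get("draft_status") == "yes" and not is_admin:
        return None
    lines = []
    for child in root:
        if len(child):  # skip structured elements such as creator in this sketch
            continue
        if child.tag in ADMIN_ONLY:
            if is_admin:
                # A text marker stands in for the stylesheet's different color.
                lines.append(f"[admin] {child.tag}: {child.text}")
        else:
            lines.append(f"{child.tag}: {child.text}")
    # No link to the described item: tell users to contact the author instead.
    if root.find("object_url") is None and root.find("object_doi") is None:
        lines.append("Contact Author")
    return lines


poster = (
    '<record id="r1" draft_status="no" resource_type="poster">'
    "<title>Example poster</title><funder>DOE Office of Science</funder></record>"
)
print(render_record(poster))
# ['title: Example poster', 'Contact Author']
print(render_record(poster, is_admin=True))
# ['title: Example poster', '[admin] funder: DOE Office of Science', 'Contact Author']
```

Passing the viewer’s role in as a parameter mirrors the point above: access is a repository-level setting, not something stored in each record.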

Deliverables

  • Documentation – definition of elements, purpose for schema
  • XSD schema
  • XSL Stylesheet for displaying elements
  • 15 – 20 Example Records, some associated with the same research projects, and a few standalone records (without an associated research project record) displayed on my VPS for demonstration purposes
  • XPath expressions demonstrating searches over a database of records of this type

Goals for XSD Schema

  • Simple, flexible
  • General enough to pull records from other sources and track the original vendor
  • Distinguish between the types of research products (papers, datasets, media, presentations, etc.) and provide more specific elements for these different types
  • Link records together from same project, but allow “stand alone” records that are not linked to a project record
  • Link to outside resources
  • Provide access levels for elements and draft statuses for records to influence how these records would be displayed in a repository or database

Goals for Schematron document

  • Ensure no empty elements
  • Make sure elements that should contain text do
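The “no empty elements” goal corresponds to a Schematron assertion along the lines of normalize-space(.) != '' applied to every element. That rule is an assumption, not the project’s actual Schematron file; the Python sketch below shows the same check, with a hypothetical find_empty_elements function.

```python
import xml.etree.ElementTree as ET


def find_empty_elements(xml):
    """List tags of elements whose text, including all descendants, is blank.

    Intended to mirror an assumed Schematron rule normalize-space(.) != ''.
    """
    root = ET.fromstring(xml)
    return [e.tag for e in root.iter() if not "".join(e.itertext()).strip()]


sample = (
    "<record id='r2'><title>Dataset</title><abstract>   </abstract>"
    "<creator><name/></creator></record>"
)
print(find_empty_elements(sample))  # ['abstract', 'creator', 'name']
```

Because the test uses the text of all descendants, a parent such as creator is reported when every child inside it is empty, which also covers the second goal of making sure text-bearing elements actually contain text.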

Schema Documentation

Also available as an Excel file in the project files: http://192.241.220.234/document_modeling/documentation

Element | Required | Multiple | Children | Parent Elements | Attributes | Description | Example
record | yes | no, root element | title, alt_title, creator, contributor, object_id, object_url, object_doi, date, date_ingested, creator_keyword, osti_subject, abstract, report_number, related_publication, meeting, funder, data_management_plans, language, related_resource, data_source, record_vendor, place, method, processing, variable_list, version, required_software, file_structure, rights, project_record | collection, or none if root element | id, draft_status, resource_type | record for a research project product |
title | yes | no | none | record | none | short descriptive title for the item described by the record | Synthesis, Structure, and Luminescent Property of a 1D Zn(II) Polymer with a Rigid Dicarboxylate
alt_title | no | yes | none | record | none | alternative title for the item described by the record | Synthesis, Structure, and Luminescent Property of a 1D Zn(II) Polymer with a Rigid Dicarboxylate
creator | at least one creator or contributor must be included | yes | name, orcid, organization, email | record | none | creator of the item described by the record; name is a required child element of creator |
contributor | at least one creator or contributor must be included | yes | name, orcid, organization, email | record | none | contributor to the project; name or organization is a required child element of contributor. If no name is listed, then orcid and email are not allowed. |
name | yes as child of creator; name or organization is required as child of contributor | no | none | creator, contributor | none | person’s name, last name first | Uphoff, Heidi A.
orcid | no | no | none | creator, contributor | none | ORCID researcher identifier | http://orcid.org/0000-0002-1266-6008
organization | name or organization is required as child of contributor | no | none | creator, contributor | none | organization the person is affiliated with | Los Alamos National Laboratory, Research Library
email | no | no | none | creator, contributor | none | contact email for the person | hauphoff@lanl.gov
object_id | no | yes | none | record | none | internal database identifier for the object being described |
object_url | no | yes | none | record | none | URL of the object being described | http://figshare.com/articles/Synthesis_Structure_and_Luminescent_Property_of_a_1D_Zn_II_Polymer_with_a_Rigid_Dicarboxylate/1254919
object_doi | no | yes | none | record | none | DOI of the object being described | 10.6084/m9.figshare.1254919
date | no | yes | none | record | none | date associated with the item; could be the date range of a project or the date created | 2010-2015
date_ingested | yes | no | none | record | none | date the record was ingested into the database | 2014/12/01
creator_keyword | yes | yes | none | record | none | keywords to assist with discovery of the record | Phase transition
osti_subject | no | yes | none | record | none | subjects used by the Office of Scientific and Technical Information (OSTI) to track research categories in its databases | Condensed Matter Physics, Superconductivity & Superfluidity(75)
abstract | no | no | none | record | none | a summary of the contents of the item being described by the record | We report on the doping-induced antiferromagnetic state and Fermi-liquid state that are connected by a superconducting region in a series of CeIrIn5-xHgx, CeIrIn5-xSnx, and CeIr1-xPtxIn5 single crystals. Measurements of the specific heat C(T) and electrical resistivity rho(T) demonstrate that hole doping via Hg/In substitution gives rise to an antiferromagnetic ground state, but substitutions of In by Sn or Ir by Pt (electron doping) favor a paramagnetic Fermi-liquid state. A conelike non-Fermi-liquid region is observed near CeIrIn5, showing a diverging effective mass on the slightly Hg-doped side. The obtained temperature-doping phase diagram suggests that CeIrIn5 is in proximity to an antiferromagnetic quantum critical point, and heavy fermion superconductivity in this compound is mediated by magnetic quantum fluctuations rather than by valence fluctuations.
report_number | no | no | none | record | none | report number | LA-UR-2012-02542
related_publication | no | yes | publication_title, paper_title, pub_citation, pub_year, pub_doi | record | none | publication related to the dataset, or the final published citation for a pre-print or post-print |
publication_title | yes as child of related_publication | no | none | related_publication | none | title of the journal or publication | Physical Review B
paper_title | yes as child of related_publication | no | none | related_publication | none | title of the related paper | CeIrIn5: Superconductivity on a magnetic instability
pub_citation | yes as child of related_publication | no | none | related_publication | none | citation of the related paper | Vol.89, iss.4, p.041101, JAN 3 2014
pub_year | yes as child of related_publication | no | none | related_publication | none | year the related publication was published | 2014
pub_doi | no | no | none | related_publication, related_resource | none | DOI of the related publication or resource | 10.1103/PhysRevB.89.041101
meeting | no | yes | meeting_name, start_date, end_date, city, state, country | record | none | conference or meeting related to the resource |
meeting_name | yes as child of meeting | no | none | meeting | none | name of the meeting or conference | Physics Conference Series
start_date | no | no | none | meeting | none | start date of the meeting or conference | 2014/12/01
end_date | no | no | none | meeting | none | end date of the meeting or conference | 2014/12/03
city | no | no | none | meeting | none | city where the meeting or conference was held | Los Alamos
state | no | no | none | meeting | none | state where the meeting or conference was held | New Mexico
country | no | no | none | meeting | none | country where the meeting or conference was held | United States
funder | no | yes | none | record | none | research funder | DOE Office of Science
data_management_plans | no | no | none | record | none | short description of data management plans for datasets or the research project; useful if a formal DMP is not recorded in the database |
language | no | yes | none | record | none | language of the item described in the record | French
related_resource | no | yes | resource_title, source_title, resource_citation, resource_url, resource_doi | record | none | resource related to the item described in the record |
resource_title | no | no | none | related_resource | none | title of the related resource |
source_title | no | no | none | related_resource | none | source of the related resource |
resource_citation | no | no | none | related_resource | none | citation of the related resource |
resource_url | either resource_url or resource_doi is required | no | none | related_resource | none | URL of the related resource |
resource_doi | either resource_url or resource_doi is required | no | none | related_resource | none | DOI of the related resource |
data_source | no | yes | none | record | none | citation of the source the dataset or project was derived from |
record_vendor | no | no | none | record | none | if the record was pulled from another source, the name of that source or vendor | FigShare
place | no | yes | none | record | none | geographical locations relevant to the item described by the record | Carlsbad Caverns National Park
method | no | no | none | record | none | method of gathering data |
processing | no | no | none | record | none | processing done to normalize data |
variable_list | no | no | variable | record | none | list of variables in the data |
variable | yes as child of variable_list | yes | variable_name, variable_description | variable_list | none | variable in the data or software |
variable_name | yes as child of variable | no | none | variable | none | name of the variable in the data or software | temp
variable_description | yes as child of variable | no | none | variable | none | description of the variable in the data or software | temperature of sample in Celsius
version | no | no | none | record | none | version of the item | v2, version 5.2
required_software | no | yes | none | record | none | software required to use the source; especially useful for datasets | Mathematica
file_structure | no | no | none | record | none | description of the organization and structure of file directories and their naming | Data is organized in file folders by run and named by the data collected.
rights | no | yes | none | record | none | relevant rights or copyright information | Creative Commons Attribution 4.0 International
project_record | no | no | none | record | none | id of the project record this resource is related to |
Attribute Name | Description | Options
id | unique identifier for the record |
draft_status | whether the record is a draft, i.e. an item not yet ready to be searchable and visible in the database | yes or no
resource_type | type of resource described | figure, media, dataset, poster, thesis, code, presentation, file_directory, proposal, dmp, report, pre-print, post-print, documentation, project
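The tables state several rules that span more than one element or attribute: creator or contributor, name or organization, resource_url or resource_doi, and the allowed attribute values. As a sketch, assuming records are available as XML strings, a checker for these rules could look like the following Python; the check_record name and its messages are hypothetical, and in the project itself these rules may be enforced by the XSD or the Schematron file instead.

```python
import xml.etree.ElementTree as ET

RESOURCE_TYPES = {
    "figure", "media", "dataset", "poster", "thesis", "code", "presentation",
    "file_directory", "proposal", "dmp", "report", "pre-print", "post-print",
    "documentation", "project",
}


def check_record(xml):
    """Return messages for documented rules that a record breaks."""
    root = ET.fromstring(xml)
    errors = []
    for tag in ("title", "date_ingested", "creator_keyword"):
        if root.find(tag) is None:
            errors.append(f"missing required {tag}")
    creators = root.findall("creator")
    contributors = root.findall("contributor")
    if not creators and not contributors:
        errors.append("at least one creator or contributor is required")
    for c in creators:
        if c.find("name") is None:
            errors.append("creator requires name")
    for c in contributors:
        has_name = c.find("name") is not None
        if not has_name and c.find("organization") is None:
            errors.append("contributor requires name or organization")
        if not has_name and (c.find("orcid") is not None or c.find("email") is not None):
            errors.append("contributor without name may not have orcid or email")
    for r in root.findall("related_resource"):
        if r.find("resource_url") is None and r.find("resource_doi") is None:
            errors.append("related_resource requires resource_url or resource_doi")
    # Attribute values are checked only when the attribute is present.
    status = root.get("draft_status")
    if status is not None and status not in ("yes", "no"):
        errors.append("draft_status must be yes or no")
    rtype = root.get("resource_type")
    if rtype is not None and rtype not in RESOURCE_TYPES:
        errors.append(f"unknown resource_type {rtype}")
    return errors


bad = (
    '<record id="r3" draft_status="maybe" resource_type="dataset">'
    "<title>T</title><date_ingested>2014/12/01</date_ingested>"
    "<creator_keyword>Phase transition</creator_keyword>"
    "<contributor><orcid>http://orcid.org/0000-0002-1266-6008</orcid></contributor>"
    "</record>"
)
for message in check_record(bad):
    print(message)
```

Collecting every problem into a list, rather than stopping at the first one, matches how a Schematron report lists all failed assertions for a document.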

XPath Query Examples

  • Find all project records
      ◦ /record[@resource_type="project"]
  • Find all records from the project with the phrase “Synaptic Proteins” in the title
      ◦ Get the project record’s id: /record[@resource_type="project"]/title[contains(.,'Synaptic Proteins')]/parent::record/@id/string()
      ◦ Find the titles of records that are part of that project (here project_record3.xml): /record/title[./following-sibling::project_record[contains(.,"project_record3.xml")]]
  • Find all data management plans in the database; this requires searching both for dmp records and for records with a <data_management_plans> element
      ◦ Find records for Data Management Plans (DMPs): /record[@resource_type="dmp"]
      ◦ Find records that have data_management_plans elements: /record[data_management_plans]
  • List all organizations in the database
      ◦ //organization/text()
  • Full-record search for “Biology” or “Climate” (contains() is case-sensitive, so capitalized and lower-case forms are queried separately)
      ◦ /record[contains(.,"Biology")]|/record[contains(.,"Climate")]|/record[contains(.,"biology")]|/record[contains(.,"climate")]
      ◦ XPath 2.0 alternative using lower-case(): /record[contains(lower-case(.),"biology") or contains(lower-case(.),"climate")]
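Python’s standard-library ElementTree supports only a subset of XPath (no contains() function and no parent:: axis), so the sketch below reproduces the queries above with plain filtering over a hypothetical in-memory collection. The DB contents and function names are invented, and, as in the example query, project_record is assumed to hold the project record’s file name. A library such as lxml could run the XPath 1.0 expressions directly through its xpath() method, although the /string() step requires an XPath 2.0 processor.

```python
import xml.etree.ElementTree as ET

# Hypothetical "database": file name -> parsed <record> root element.
DB = {
    "project_record3.xml": ET.fromstring(
        '<record id="proj3" resource_type="project">'
        "<title>Synaptic Proteins in Mice</title></record>"),
    "poster12.xml": ET.fromstring(
        '<record id="post12" resource_type="poster">'
        "<title>Synaptic Proteins Poster</title>"
        "<creator><name>Doe, Jane</name><organization>Example University</organization></creator>"
        "<project_record>project_record3.xml</project_record></record>"),
    "data5.xml": ET.fromstring(
        '<record id="data5" resource_type="dataset">'
        "<title>Climate Readings</title></record>"),
}


def project_records(db):
    # Equivalent of /record[@resource_type="project"]
    return [f for f, r in db.items() if r.get("resource_type") == "project"]


def titles_in_project(db, phrase):
    # Step 1: project records whose title contains the phrase.
    projects = {f for f, r in db.items()
                if r.get("resource_type") == "project" and phrase in (r.findtext("title") or "")}
    # Step 2: titles of records whose project_record points at one of them.
    return [r.findtext("title") for r in db.values()
            if any((p.text or "").strip() in projects for p in r.findall("project_record"))]


def all_organizations(db):
    # Equivalent of //organization/text()
    return [o.text for r in db.values() for o in r.iter("organization")]


def full_text_search(db, *words):
    # Case-insensitive match over each record's full text content.
    hits = []
    for f, r in db.items():
        text = " ".join(r.itertext()).lower()
        if any(w.lower() in text for w in words):
            hits.append(f)
    return hits


print(project_records(DB))                          # ['project_record3.xml']
print(titles_in_project(DB, "Synaptic Proteins"))   # ['Synaptic Proteins Poster']
print(all_organizations(DB))                        # ['Example University']
print(full_text_search(DB, "Biology", "Climate"))   # ['data5.xml']
```

Lower-casing the text once per record replaces the four-way union, since each query term then only needs to be checked in one case.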

Goals for XSL Stylesheet

  • Provide descriptive labels for users
  • Mark admin only elements (example: record_vendor, funder) in a different color
  • Formatting for repeated elements (example: creator_keyword) and for elements that need to be concatenated (example: meeting_name, start_date – end_date, city state, country)
  • Display project records differently and link their related records on display
  • If there is no resource to download or link to, instruct users to “Contact Author”
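The concatenation and list formatting in these goals are done in the XSL stylesheet; the same logic is sketched here in Python for illustration. The function names are hypothetical, the separators follow the pattern written in the goals, and the sample values come from the documentation table (the second keyword is invented).

```python
import xml.etree.ElementTree as ET


def format_meeting(meeting):
    """Join meeting parts as 'name, start_date – end_date, city state, country'."""
    def text(tag):
        return (meeting.findtext(tag) or "").strip()

    dates = " – ".join(d for d in (text("start_date"), text("end_date")) if d)
    place = " ".join(p for p in (text("city"), text("state")) if p)
    parts = (text("meeting_name"), dates, place, text("country"))
    # Leave out missing pieces so optional children don't leave stray commas.
    return ", ".join(p for p in parts if p)


def format_keywords(record):
    """Show repeated creator_keyword elements on one comma-separated line."""
    return ", ".join(k.text.strip() for k in record.findall("creator_keyword") if k.text)


record = ET.fromstring(
    "<record><creator_keyword>Phase transition</creator_keyword>"
    "<creator_keyword>Superconductivity</creator_keyword>"
    "<meeting><meeting_name>Physics Conference Series</meeting_name>"
    "<start_date>2014/12/01</start_date><end_date>2014/12/03</end_date>"
    "<city>Los Alamos</city><state>New Mexico</state><country>United States</country>"
    "</meeting></record>"
)
print(format_keywords(record))  # Phase transition, Superconductivity
print(format_meeting(record.find("meeting")))
# Physics Conference Series, 2014/12/01 – 2014/12/03, Los Alamos New Mexico, United States
```

Because every meeting child except meeting_name is optional, skipping empty parts keeps the display clean for partially filled records.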