Add DOI metadata renderer with DataCite and CrossRef support, citation styles, and enhanced metadata extraction #355
Co-authored-by: maximiliani <43142805+maximiliani@users.noreply.github.com>
maximiliani left a comment
This is a good start. Please add some Storybook stories that demonstrate multiple examples. See comments for more stuff to change... Please separate the item generation and parsing logic for DataCite and Crossref.
Some example DOIs:
- Journal paper, DataCite: 10.5445/IR/1000185135
- Journal paper, CrossRef: 10.1109/eScience65000.2025.00022
- Software on Zenodo, DataCite: https://doi.org/10.5281/zenodo.13629109
- RFC, DataCite: doi:10.17487/rfc3650
- Book, CrossRef: 10.1007/978-1-4419-8598-9
- Slides, DataCite: 10.5445/IR/1000178054
- Preprint on arXiv, DataCite: 10.48550/ARXIV.2505.16550
…d citation styles, icons, and Storybook examples Co-authored-by: maximiliani <43142805+maximiliani@users.noreply.github.com>
Added 8 Storybook stories demonstrating all example DOIs:
DataCite and CrossRef logic now fully separated into dedicated parser files. Commit: d4c31ec
Signed-off-by: Maximilian Inckmann <maximilian.inckmann@kit.edu>
Pull request overview
This pull request implements comprehensive DOI (Digital Object Identifier) metadata rendering for academic resources. The implementation adds a modular DOI renderer that fetches metadata from DataCite or CrossRef APIs, displays rich citation information with configurable styles, and presents structured metadata including authors, publication dates, resource types, and descriptions.
Changes:
- Added modular DOI renderer architecture with separate parsers for DataCite and CrossRef metadata schemas
- Implemented 5 citation styles (APA, Chicago, IEEE, Harvard, Anglia Ruskin) with configurable settings
- Enhanced metadata extraction including ORCiD identifiers, ROR affiliations, and JATS syntax parsing for CrossRef abstracts
- Integrated DOI renderer into the renderer priority system (priority 2, before HandleType at 3)
- Added comprehensive Storybook examples covering 8 different DOI scenarios
- Updated documentation in README and MDX files
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| packages/stencil-library/src/utils/utils.ts | Registers DOIType renderer at priority 2 and adjusts priorities for subsequent renderers |
| packages/stencil-library/src/rendererModules/DOI/DOI.ts | Core DOI class with validation, parsing, and URL generation |
| packages/stencil-library/src/rendererModules/DOI/DataCiteInfo.ts | DataCite-specific metadata parser with schema-aware field extraction |
| packages/stencil-library/src/rendererModules/DOI/CrossRefInfo.ts | CrossRef-specific metadata parser with JATS syntax support |
| packages/stencil-library/src/rendererModules/DOI/DOIInfo.ts | Wrapper combining DataCite and CrossRef sources with fallback logic |
| packages/stencil-library/src/rendererModules/DOI/DOIType.tsx | Main renderer implementing GenericIdentifierType with preview and citation display |
| packages/stencil-library/src/rendererModules/DOI/CitationStyles.ts | Citation formatting utilities for 5 academic citation styles |
| packages/stencil-library/src/rendererModules/DOI/ResourceTypeIcons.tsx | Resource type mapping, beautification, and DataCite/CrossRef logo SVG components |
| packages/stencil-library/src/components/pid-component/pid-component.stories.ts | 8 Storybook examples demonstrating DOI rendering with different sources and configurations |
| packages/stencil-library/src/components/pid-component/pid-component.mdx | Documentation updates explaining DOI support, citation styles, and usage examples |
| packages/stencil-library/src/components/pid-pagination/readme.md | Table delimiter alignment fix (auto-generated) |
| packages/stencil-library/src/components/pid-data-table/readme.md | Table delimiter alignment fix (auto-generated) |
| packages/stencil-library/src/components/pid-component/readme.md | Table delimiter alignment fix (auto-generated) |
| packages/stencil-library/src/components/pid-actions/readme.md | Table delimiter alignment fix (auto-generated) |
| README.md | Added DOI support documentation including citation styles and configuration options |

    const titleTrunc = truncateTitle(title, 60);
    const yearPart = year ? `, ${year.split('-')[0]}` : '';

    return `${initial} ${authorName}${etAl}, "${truncate ? titleTrunc : title}"${yearPart}`;

When the author has no `givenName`, the `initial` variable will be an empty string, resulting in a citation starting with a space: " AuthorName et al., ...". Consider adjusting the formatting to handle the case when there's no initial: ``${initial}${initial ? ' ' : ''}${authorName}${etAl}``

Suggested change:

    - return `${initial} ${authorName}${etAl}, "${truncate ? titleTrunc : title}"${yearPart}`;
    + return `${initial}${initial ? ' ' : ''}${authorName}${etAl}, "${truncate ? titleTrunc : title}"${yearPart}`;

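An alternative to the conditional separator is to join only the non-empty name pieces. The following is a minimal sketch, not the code in `CitationStyles.ts`: `formatShortCitation` and the `AuthorParts` shape are hypothetical names introduced for illustration, and title truncation is left out.

```typescript
// Hypothetical author shape; the real type in the PR may differ.
interface AuthorParts {
  initial: string;    // e.g. "J." or "" when givenName is missing
  authorName: string; // family name or full name
}

// Joins only the non-empty pieces, so no leading space can appear.
function formatShortCitation(author: AuthorParts, count: number, title: string, year?: string): string {
  const namePart = [author.initial, author.authorName]
    .filter(part => part.length > 0)
    .join(' ');
  const etAl = count > 1 ? ' et al.' : '';
  const yearPart = year ? `, ${year.split('-')[0]}` : '';
  return `${namePart}${etAl}, "${title}"${yearPart}`;
}
```

With an empty `initial`, the result starts directly with the author name; with an initial present, the two are separated by exactly one space.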

    };

    if (!result.name && corresponding.givenName && corresponding.familyName) {
      result.name = `${corresponding.givenName} ${corresponding.familyName}`;

The corresponding author name construction logic is incomplete. If `corresponding.name` is empty and only one of `givenName` or `familyName` is present (but not both), `result.name` will remain empty. Add an `else if (!result.name)` block after line 168 to handle the case where only one name component is available, similar to the logic in the `creators` getter at lines 119-120.

Suggested change:

    -   result.name = `${corresponding.givenName} ${corresponding.familyName}`;
    +   result.name = `${corresponding.givenName} ${corresponding.familyName}`;
    + } else if (!result.name) {
    +   result.name = corresponding.givenName || corresponding.familyName || '';

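The fallback described in this comment can also be written as a single helper that covers all combinations of name fields. This is a sketch under assumptions: `resolveDisplayName` and `PersonName` are hypothetical and only mirror the field names visible in the snippet.

```typescript
// Hypothetical shape for illustration; field names follow the snippet above.
interface PersonName {
  name?: string;
  givenName?: string;
  familyName?: string;
}

// Prefers the full name, then falls back to whichever components exist.
function resolveDisplayName(person: PersonName): string {
  if (person.name) return person.name;
  return [person.givenName, person.familyName]
    .filter((part): part is string => Boolean(part))
    .join(' ');
}
```

Filtering before joining means a single available component is returned on its own, and an entry with no name fields yields an empty string.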

    export function beautifyResourceType(resourceType: string): string {
      const normalized = resourceType
        .toLowerCase()
        .replace("_", "").replace("-", "");

The `replace` method only replaces the first occurrence of "_" and "-" characters. If a resource type contains multiple underscores or hyphens (e.g., "journal_article_preprint"), only the first underscore would be removed. Use `replaceAll()` or a global regex instead: `.replace(/_/g, "").replace(/-/g, "")` to replace all occurrences.

Suggested change:

    - .replace("_", "").replace("-", "");
    + .replace(/_/g, "").replace(/-/g, "");

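The difference between a string pattern and a global regex is easy to check in isolation. In this sketch, `normalizeResourceType` is a hypothetical helper that mirrors only the normalization step of `beautifyResourceType`; it uses a single character class instead of two chained calls.

```typescript
// Hypothetical helper mirroring only the normalization step shown above.
function normalizeResourceType(resourceType: string): string {
  // The character class plus the g flag removes every "_" and "-" in one pass.
  return resourceType.toLowerCase().replace(/[_-]/g, '');
}

// A string pattern replaces the first match only.
const firstOnly = 'journal_article_preprint'.replace('_', ''); // "journalarticle_preprint"

// The global regex removes all separators.
const allRemoved = normalizeResourceType('Journal_Article-Preprint'); // "journalarticlepreprint"
```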

    private parseJATS(text: string): string {
      if (!text) return text;

      // Remove common JATS tags
      return text
        .replace(/<jats:p>/g, '')
        .replace(/<\/jats:p>/g, '\n')
        .replace(/<jats:italic>/g, '<i>')
        .replace(/<\/jats:italic>/g, '</i>')
        .replace(/<jats:bold>/g, '<b>')
        .replace(/<\/jats:bold>/g, '</b>')
        .replace(/<jats:sub>/g, '<sub>')
        .replace(/<\/jats:sub>/g, '</sub>')
        .replace(/<jats:sup>/g, '<sup>')
        .replace(/<\/jats:sup>/g, '</sup>')
        .replace(/<jats:title>/g, '<strong>')
        .replace(/<\/jats:title>/g, '</strong>')
        .replace(/\n\n+/g, '\n\n')
        .trim();

The JATS parsing converts XML tags to HTML tags (e.g., `<jats:italic>` to `<i>`, `<jats:bold>` to `<b>`), which are then included in the description text. If this text is later rendered as HTML without proper sanitization, it could potentially lead to XSS vulnerabilities. Ensure that the component rendering this description properly sanitizes or escapes HTML content, or consider removing HTML tags entirely and using plain text with formatting indicators instead.

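One possible mitigation is to escape the whole abstract first and then re-enable only an allowlist of attribute-free JATS tags. The sketch below illustrates that idea; `escapeHtml`, `JATS_TO_HTML`, and `jatsToSafeHtml` are hypothetical names, not part of the PR. A known limitation: entities already present in the input (such as `&amp;`) are escaped a second time, so a real implementation would need to decide how to treat them.

```typescript
// Escape every character that is significant in HTML.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Allowlist of attribute-free JATS elements and their HTML equivalents.
const JATS_TO_HTML: Record<string, string> = {
  italic: 'i',
  bold: 'b',
  sub: 'sub',
  sup: 'sup',
  title: 'strong',
};

function jatsToSafeHtml(text: string): string {
  if (!text) return text;
  // After escaping, only exact allowlisted tags are turned back into markup.
  let html = escapeHtml(text)
    .replace(/&lt;jats:p&gt;/g, '')
    .replace(/&lt;\/jats:p&gt;/g, '\n');
  for (const jats of Object.keys(JATS_TO_HTML)) {
    const tag = JATS_TO_HTML[jats];
    html = html
      .replace(new RegExp(`&lt;jats:${jats}&gt;`, 'g'), `<${tag}>`)
      .replace(new RegExp(`&lt;/jats:${jats}&gt;`, 'g'), `</${tag}>`);
  }
  return html.replace(/\n\n+/g, '\n\n').trim();
}
```

Any tag outside the allowlist, including `<script>` or a JATS tag carrying attributes, stays escaped and is displayed as literal text.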

    const authorName = author.familyName || author.name.split(' ').pop() || author.name;
    const etAl = count > 1 ? ' et al.' : '';
    const yearPart = year ? ` (${year.split('-')[0]})` : '';
    const titleTrunc = truncateTitle(title, 60);

The title truncation length (60) is hardcoded in multiple citation format functions. Consider extracting this as a constant (e.g., `const MAX_TITLE_LENGTH = 60`) at the top of the file to make it easier to maintain and adjust if needed. This follows the DRY principle and makes the magic number more explicit.

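A sketch of the suggested constant, used as the default argument so call sites no longer repeat the number. The `truncateTitle` body here is a stand-in written for illustration, since the PR's own implementation is not shown in this review.

```typescript
// Single source of truth for the truncation length used by all citation styles.
const MAX_TITLE_LENGTH = 60;

// Stand-in implementation; the PR's own truncateTitle is not shown here.
function truncateTitle(title: string, maxLength: number = MAX_TITLE_LENGTH): string {
  if (title.length <= maxLength) return title;
  // Reserve one character for the ellipsis so the result never exceeds maxLength.
  return `${title.slice(0, maxLength - 1).replace(/\s+$/, '')}…`;
}
```

Call sites can then use `truncateTitle(title)` and only pass a second argument when a style genuinely needs a different limit.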

    const yearPart = year ? `, ${year.split('-')[0]}` : '';
    const titleTrunc = truncateTitle(title, 60);

    return `${authorName}, ${initials}${etAl}${yearPart}. ${truncate ? titleTrunc : title}`;

When the author has no `givenName`, the `initials` variable will be an empty string, resulting in a citation like "AuthorName, , Year. Title" with an extra comma and space. Consider adding a check to omit the initials and extra comma when empty, or adjust the formatting logic: ``${authorName}${initials ? `, ${initials}` : ''}${etAl}${yearPart}``

Suggested change:

    - return `${authorName}, ${initials}${etAl}${yearPart}. ${truncate ? titleTrunc : title}`;
    + return `${authorName}${initials ? `, ${initials}` : ''}${etAl}${yearPart}. ${truncate ? titleTrunc : title}`;

Implements comprehensive DOI (Digital Object Identifier) rendering to display rich metadata for academic resources. DOIs are Handle PIDs starting with `10.` prefix and are resolved via DataCite or CrossRef APIs.
Implementation
Modular renderer architecture (`rendererModules/DOI/`):
- `DOI.ts` - Detection via `/^10\.\d{4,9}\/[-._;()/:A-Za-z0-9]+$/`, handles `doi:` and `doi.org` URL prefixes
- `DataCiteInfo.ts` - Dedicated DataCite metadata parser with schema-specific logic
- `CrossRefInfo.ts` - Dedicated CrossRef metadata parser with JATS syntax support
- `DOIInfo.ts` - Lightweight wrapper combining both sources with fallback
- `DOIType.tsx` - Renders preview with logos and citation styles, generates metadata table
- `CitationStyles.ts` - Citation formatting utilities (APA, Chicago, IEEE, Harvard, Anglia Ruskin)
- `ResourceTypeIcons.tsx` - Resource type icons and logo components
Renderer priority: Set to 2 (before HandleType at 3) to catch DOIs before generic Handle processing.
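The detection step described for `DOI.ts` could look roughly like the following. The regular expression is the one quoted above; `extractDoi` and its prefix handling are assumptions made for illustration (including the optional `dx.` host variant) and may differ from the actual class.

```typescript
// Pattern quoted in the PR description for a bare DOI.
const DOI_PATTERN = /^10\.\d{4,9}\/[-._;()/:A-Za-z0-9]+$/;

// Hypothetical helper: strips a "doi:" or doi.org URL prefix, then validates.
function extractDoi(input: string): string | null {
  const bare = input
    .trim()
    .replace(/^doi:/i, '')
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, '');
  return DOI_PATTERN.test(bare) ? bare : null;
}
```

All three input forms from the reviewer's example list (bare DOI, `doi:` prefix, and `https://doi.org/` URL) reduce to the same bare identifier, while non-DOI input yields `null` so a later renderer can pick it up.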
Enhanced metadata extraction:
- `nameIdentifiers` (DataCite) and `ORCID` field (CrossRef)
- `affiliationIdentifier` in affiliations
Metadata fields displayed:
Citation styles:
`{"type":"DOIType","values":[{"name":"citationStyle","value":"APA"}]}`
Storybook examples: 8 stories covering DataCite (journal papers, software, RFC, slides, preprints) and CrossRef (journal papers, books) examples.
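As an illustration of how a `citationStyle` setting of the shape shown above might be read, here is a small sketch. Several things are assumptions: that settings arrive as an array of such objects, that the accepted values match the style names listed in the description, and the helper name `readCitationStyle`. The fallback style is passed in by the caller because the renderer's default is not stated here.

```typescript
// Style names as listed in the PR description; the exact accepted strings are an assumption.
const CITATION_STYLES = ['APA', 'Chicago', 'IEEE', 'Harvard', 'Anglia Ruskin'] as const;
type CitationStyle = (typeof CITATION_STYLES)[number];

interface RendererSetting {
  type: string;
  values: { name: string; value: string }[];
}

// Hypothetical helper: returns the configured style, or the fallback when absent or unknown.
function readCitationStyle(settings: RendererSetting[], fallback: CitationStyle): CitationStyle {
  const doiSettings = settings.find(s => s.type === 'DOIType');
  const raw = doiSettings?.values.find(v => v.name === 'citationStyle')?.value;
  const match = CITATION_STYLES.find(style => style === raw);
  return match ?? fallback;
}

const exampleSettings: RendererSetting[] = JSON.parse(
  '[{"type":"DOIType","values":[{"name":"citationStyle","value":"IEEE"}]}]',
);
const chosenStyle = readCitationStyle(exampleSettings, 'APA');
```

Validating against the known list keeps an unrecognized or misspelled value from reaching the formatting code.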
Usage
Technical notes
- `cachedFetch` for API calls with proper error handling
- `generateItems()` method for schema-specific rendering