metanorma/pubid-ieee

IEEE publication identifiers ("IEEE PubID")

Purpose

Implements a mechanism to parse and use IEEE publication identifiers.

Historic identifier patterns

There are at least two major "pattern series" of identifiers due to historical reasons: old (type I) and new (type II). This implementation attempts to support both types of publication identifier patterns.

Use cases to support

  • analyze the pattern of a type I identifier

  • parse a type II identifier into components

  • generate a filename from the components similar to type I pattern

Elements of the PubID

Publisher

| Name | Abbrev |
| --- | --- |
| Institute of Electrical and Electronics Engineers | IEEE |

Report number

{number} - is a set of one or more digits and optional letters

Part

{part} - is a set of digits and optional letters; starts with a digit; if a letter or letters are present then they are in the end; optional

Subpart

{subpart} - is a set of digits and optional letters; optional, many subparts are possible

Year

{year} - is a set of 4 digits; optional

Corrigendum & Amendment

{cor} - is a corrigendum or an amendment with the pattern Cor {cornum}-{year} or Amd {cornum}:{year}, where {cornum} is a set of digits; optional
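The two correction shapes described above can be checked with a short Ruby sketch. The helper name `parse_correction` and the keys of the returned hash are illustrative assumptions, not part of the gem's API.

```ruby
# Hypothetical patterns for "Cor {cornum}-{year}" and "Amd {cornum}:{year}".
COR_PATTERN = /\ACor (?<cornum>\d+)-(?<year>\d{4})\z/
AMD_PATTERN = /\AAmd (?<cornum>\d+):(?<year>\d{4})\z/

# Returns a hash of the recognized parts, or nil when neither shape matches.
def parse_correction(text)
  if (m = COR_PATTERN.match(text))
    { kind: "Cor", cornum: m[:cornum], year: m[:year] }
  elsif (m = AMD_PATTERN.match(text))
    { kind: "Amd", cornum: m[:cornum], year: m[:year] }
  end
end

p parse_correction("Cor 1-2013")
p parse_correction("Amd 2:2019")
```

A string that mixes the two shapes, such as "Cor 1:2013", yields nil in this sketch because each prefix is tied to its own separator.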

Type I pattern

{publisher} {type} {series} {number}{part}.{subpart}{year} {edition}/{conform}/{correction}
  • {publisher} IEEE

  • {type} one of the values: Standard, Std, Draft, Draft Standard, Draft Supplement *

  • {series} one of the values: ISO/IEC, ISO/IEC/IEEE *

  • {number} set of digits optionally prefixed with uppercase letter and optionally suffixed with letter

  • {part} from 1 to 2 digits prefixed with . or - and optionally suffixed with up to 4 letters *

  • {subpart} 1 digit optionally suffixed with a letter *

  • {year} 4 digits prefixed with -, :, ` - `, or a space *

  • {edition} prefix Edition followed by a reference in brackets or prefix First edition followed by date in format YYYY-MM-DD *

  • {conform} prefix Conformance followed by 2 digits, dash, and 4 digits year *

  • {correction} prefix Cor optionally followed by a space, or prefix Amd followed by ., then 1 to 2 digits, a dash, and a 4-digit year *

(*) - optional

An identifier can be composed of two other identifiers separated by a space. Only the first identifier needs to contain the publisher; for the second it is optional.

The following regular expression parses 100% of the identifiers in the type I dataset:

%r{
  ^IEEE\s
  ((?<type1>Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
  ((?<series>ISO\/IEC(\/IEEE)?)\s)?
  (?<number1>[A-Z]?\d+[[:alpha:]]?)
  ([.-](?<part1>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
  (\.(?<subpart1>\d[[:alpha:]]?))?
  (?<year1>([-:]|\s-\s|,\s)\d{4})?
  (\s(IEEE\s(?<type2>Std)\s)?(?<number2>[A-Z]?\d+[[:alpha:]]?)
    ([.-](?<part2>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
    ([.](?<subpart2>\d[[:alpha:]]?))?
    (?<year2>([-:.]|_-|\s-\s|,\s)\d{4})?)?
  (\s(?<edition>Edition(\s\([^)]+\))?|First\sedition\s[\d-]+))?
  (\/(?<conform>Conformance\d{2})-(?<confyear>\d{4}))?
  (\/(?<correction>(Cor\s?|(Amd\.))\d{1,2})
    (?<coryear>(:|-|:-)\d{4}))?$
}x
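As a minimal usage sketch, the pattern can be stored in a constant and applied with `Regexp#match`. The constant name `TYPE1`, the helper `parse_type1`, and the sample identifiers are assumptions made for illustration; the Cor/Amd alternation is closed before `\d{1,2}` so that the parentheses balance and the literal compiles.

```ruby
# Copy of the type I pattern, with the Cor/Amd alternation closed before \d{1,2}.
TYPE1 = %r{
  ^IEEE\s
  ((?<type1>Standard|Std|Draft(\sStandard|\sSupplement)?)\s)?
  ((?<series>ISO\/IEC(\/IEEE)?)\s)?
  (?<number1>[A-Z]?\d+[[:alpha:]]?)
  ([.-](?<part1>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
  (\.(?<subpart1>\d[[:alpha:]]?))?
  (?<year1>([-:]|\s-\s|,\s)\d{4})?
  (\s(IEEE\s(?<type2>Std)\s)?(?<number2>[A-Z]?\d+[[:alpha:]]?)
    ([.-](?<part2>\d{1,2}(?!\d)[[:alpha:]]{0,4}))?
    ([.](?<subpart2>\d[[:alpha:]]?))?
    (?<year2>([-:.]|_-|\s-\s|,\s)\d{4})?)?
  (\s(?<edition>Edition(\s\([^)]+\))?|First\sedition\s[\d-]+))?
  (\/(?<conform>Conformance\d{2})-(?<confyear>\d{4}))?
  (\/(?<correction>(Cor\s?|(Amd\.))\d{1,2})
    (?<coryear>(:|-|:-)\d{4}))?$
}x

# Hypothetical helper: returns only the elements that were present,
# or nil when the identifier does not match.
def parse_type1(id)
  TYPE1.match(id)&.named_captures&.compact
end

# Note that the year captures keep their leading separator.
p parse_type1("IEEE Std 802.11-2016")
# prints {"type1"=>"Std", "number1"=>"802", "part1"=>"11", "year1"=>"-2016"}
```

Because the pattern uses named groups, Ruby treats the unnamed parentheses as non-capturing, so `named_captures` contains only the PubID elements.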

Parsing PubID elements from type II identifiers

To parse PubID elements from type II identifiers, we can use the following regular expression:

%r{
  ^IEEE\s(?<number1>\w+(\.[A-Z]\d|\sHBK)?)
  (?<part1>(\.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
  (?<subpart11>\.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:_]))?
  (?<subpart12>\.\d|-\d+(?=-))?
  (?<year1>([-:.]|_-|\s-)\d{4})?
  (\/(?<number2>([A-Z]?\d+[a-z]?|Conformance\d+))
    ((\.|-)(?<part2>\d{1,3}[a-z]?)(?!\d))?
    (\.(?<subpart21>\d{1,2}))?)?
  (\/(?<number3>\d+)(\.(?<part3>\d))?)?
  (?<year2>([-:.]|_-|\s-)\d{4})?
  ((\/|_|-|\s\/)(?<correction>(Cor|(?i)Amd(?-i))(\s|\.|\.\s)?\d{1,2})
    (?<coryear>(:|-|:-|_[A-Z][a-z]{2}_)\d{4}(-\d{4})?)?)?$
}x

This regular expression covers 99% of the identifiers in the type II bibxml-ieee dataset.
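A minimal usage sketch for the type II pattern follows. The constant name `TYPE2`, the helper `parse_type2`, and the sample identifiers are assumptions chosen for illustration.

```ruby
# Copy of the type II pattern from this section.
TYPE2 = %r{
  ^IEEE\s(?<number1>\w+(\.[A-Z]\d|\sHBK)?)
  (?<part1>(\.|\s)\d{1,4}[[:alpha:],]{0,7}|-\d?[A-Z]+|-\d(?=[-.]))?
  (?<subpart11>\.\d{1,3}[a-z]?|-\d{5}[a-z]?|-\d+(?=[-:_]))?
  (?<subpart12>\.\d|-\d+(?=-))?
  (?<year1>([-:.]|_-|\s-)\d{4})?
  (\/(?<number2>([A-Z]?\d+[a-z]?|Conformance\d+))
    ((\.|-)(?<part2>\d{1,3}[a-z]?)(?!\d))?
    (\.(?<subpart21>\d{1,2}))?)?
  (\/(?<number3>\d+)(\.(?<part3>\d))?)?
  (?<year2>([-:.]|_-|\s-)\d{4})?
  ((\/|_|-|\s\/)(?<correction>(Cor|(?i)Amd(?-i))(\s|\.|\.\s)?\d{1,2})
    (?<coryear>(:|-|:-|_[A-Z][a-z]{2}_)\d{4}(-\d{4})?)?)?$
}x

# Hypothetical helper: returns only the elements that were present,
# or nil when the identifier does not match.
def parse_type2(id)
  TYPE2.match(id)&.named_captures&.compact
end

# Part, subpart, and year captures keep their leading separator here.
p parse_type2("IEEE 802.11-2016")
# prints {"number1"=>"802", "part1"=>".11", "year1"=>"-2016"}
```

Since several captures include their separator, a caller that needs bare values would have to strip the leading `.`, `-`, `:`, `_`, or space itself.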

File name generator

For type I identifiers, file names are generated by replacing the symbols /, \, ,, ', ", (, ), and space with symbol . Sequences of multiple symbols should be squeezed into one symbol.
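The replace-and-squeeze rule can be sketched in Ruby as follows. The replacement symbol is not legible in the text above, so this sketch takes it as a parameter and assumes `_` by default, in line with the separators used in the type II template; the helper name `type1_filename` is also hypothetical.

```ruby
# Hypothetical helper. Assumption: the replacement symbol is "_".
def type1_filename(id, symbol = "_")
  # Replace /, \, comma, quotes, parentheses, and whitespace,
  # then collapse runs of the replacement symbol into one.
  id.gsub(%r{[/\\,'"()\s]}, symbol).squeeze(symbol)
end

puts type1_filename("IEEE Std 1003.1-2008/Cor 1-2013")
# prints IEEE_Std_1003.1-2008_Cor_1-2013
```

`String#squeeze` is given the symbol explicitly so that only runs of the replacement symbol are collapsed, not repeated digits or letters.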

For type II identifiers, first parse the PubID elements, then join them in the following order:

IEEE.{number1}_{part1}.{subpart11}.{subpart12}-{year1}_{number2}_{part2}.{subpart21}_{number3}_{part3}-{year2}_{correction}-{coryear}
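A minimal sketch of this join is shown below. It rests on two assumptions that the text does not state: the element values have already had their leading separators stripped, and a missing element is dropped together with the separator that precedes it. The names `TYPE2_LAYOUT` and `type2_filename` are hypothetical.

```ruby
# Separator/element pairs in the order given by the template above.
TYPE2_LAYOUT = [
  [".", :number1], ["_", :part1], [".", :subpart11], [".", :subpart12],
  ["-", :year1], ["_", :number2], ["_", :part2], [".", :subpart21],
  ["_", :number3], ["_", :part3], ["-", :year2], ["_", :correction],
  ["-", :coryear]
].freeze

# Hypothetical helper: appends each present element with its separator.
# Assumption: nil or empty elements are skipped along with their separator.
def type2_filename(elements)
  TYPE2_LAYOUT.reduce("IEEE") do |name, (sep, key)|
    value = elements[key]
    (value.nil? || value.empty?) ? name : "#{name}#{sep}#{value}"
  end
end

puts type2_filename({ number1: "802", part1: "11", year1: "2016" })
# prints IEEE.802_11-2016
```

Keeping the layout in a table makes the order of elements easy to compare against the template line.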