Pajek parser: quoted node labels with spaces or commas are truncated to first word #233

@oxy86

Summary

When a Pajek .net file has node labels that:

  • are quoted (wrapped in "...")
  • contain spaces or commas (or other special characters such as &)

SocNetV truncates the label to only the first whitespace-delimited token, discarding the rest.
Because the GraphML exporter writes whatever label is stored in memory, exporting to GraphML also produces the truncated label, so the original data is silently lost.

Reproduction

File TinyPajek_Comma_Label_Quoted_N2.net (already in the test-data directory):

*Network CommaLabelRegression

*Vertices 2
1 "Bureau of Alcohol, Tobacco, & Firearms"
2 "Federal Bureau of Investigation"

*Arcs
1 2 1

Expected node labels:

  • Node 1 → Bureau of Alcohol, Tobacco, & Firearms
  • Node 2 → Federal Bureau of Investigation

Actual labels shown (and saved to GraphML):

  • Node 1 → Bureau
  • Node 2 → Federal

Root cause

In parser/parser_pajek.cpp the line is split on \s+ (whitespace) before the label is extracted:

myRegExp.setPattern("\\s+");
lineElement = str.split(myRegExp, Qt::SkipEmptyParts);
...
label = lineElement[1];   // only the first token of a multi-word quoted label

For the line 1 "Bureau of Alcohol, Tobacco, & Firearms", lineElement[1] is "Bureau: only the first word, with the opening quote still attached. The remaining words of the label end up in later tokens and are never read.
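To make the failure mode concrete, here is a minimal standalone sketch in plain C++ (standard library only, no Qt). It is not the SocNetV code: splitOnWhitespace only imitates the \s+ split described above, and extractVertexLabel is a hypothetical helper showing one possible quote-aware way to read the label. It assumes the label is the first field after the node number, and it ignores any coordinates or attributes that may follow the label on a vertex line.

```cpp
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Imitates the behaviour described above: split the line on runs of
// whitespace and drop empty parts. Not the actual SocNetV code.
std::vector<std::string> splitOnWhitespace(const std::string &line) {
    std::istringstream in(line);
    std::vector<std::string> tokens;
    std::string tok;
    while (in >> tok) {
        tokens.push_back(tok);
    }
    return tokens;
}

// Hypothetical helper (an assumption, not part of SocNetV): returns the
// label of a vertex line. If the text after the node number starts with a
// double quote, the label is everything up to the next double quote;
// otherwise it is the next whitespace-delimited token.
std::string extractVertexLabel(const std::string &line) {
    const std::size_t n = line.size();
    auto isSpace = [&line](std::size_t k) {
        return std::isspace(static_cast<unsigned char>(line[k])) != 0;
    };

    std::size_t i = 0;
    while (i < n && isSpace(i)) ++i;   // skip leading whitespace
    while (i < n && !isSpace(i)) ++i;  // skip the node number
    while (i < n && isSpace(i)) ++i;   // skip whitespace before the label
    if (i >= n) {
        return "";                     // no label on this line
    }

    if (line[i] == '"') {
        const std::size_t close = line.find('"', i + 1);
        if (close == std::string::npos) {
            return line.substr(i + 1); // unterminated quote: take the rest
        }
        return line.substr(i + 1, close - i - 1);
    }

    std::size_t j = i;
    while (j < n && !isSpace(j)) ++j;  // unquoted label: a single token
    return line.substr(i, j - i);
}
```

With the first vertex line of the reproduction file, splitOnWhitespace(line)[1] yields "Bureau (opening quote included), while extractVertexLabel(line) yields the full Bureau of Alcohol, Tobacco, & Firearms. A real fix inside parser_pajek.cpp would also need to handle whatever fields follow the closing quote, which this sketch does not attempt.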
