Summary
When a Pajek .net file has node labels that:
- are quoted (wrapped in "...")
- contain spaces or commas (or other special characters like &)
SocNetV truncates each label to its first whitespace-delimited token, discarding the rest.
Because the GraphML exporter writes whatever label is stored in memory, exporting to GraphML also produces the truncated label, so the original data is silently lost.
Reproduction
File TinyPajek_Comma_Label_Quoted_N2.net (already in the test-data directory):
*Network CommaLabelRegression
*Vertices 2
1 "Bureau of Alcohol, Tobacco, & Firearms"
2 "Federal Bureau of Investigation"
*Arcs
1 2 1
Expected node labels:
- Node 1 → Bureau of Alcohol, Tobacco, & Firearms
- Node 2 → Federal Bureau of Investigation
Actual labels shown (and saved to GraphML):
- Node 1 → Bureau
- Node 2 → Federal
Root cause
In parser/parser_pajek.cpp the line is split on \s+ (whitespace) before the label is extracted:
myRegExp.setPattern("\\s+");
lineElement = str.split(myRegExp, Qt::SkipEmptyParts);
...
label = lineElement[1]; // only the first token of a multi-word quoted label
For the line 1 "Bureau of Alcohol, Tobacco, & Firearms", lineElement[1] is "Bureau, that is, only the first word with its opening quote still attached. The remaining tokens of the label are never read.
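One possible direction for a fix is to extract the label before any whitespace splitting happens. The sketch below is not SocNetV code: extractPajekLabel is a hypothetical helper, it uses std::string instead of QString so it can be compiled on its own, and it assumes a vertex line has the form `<id> "<label>" [parameters]` or `<id> <label> [parameters]`. Treating an unterminated quote as "label runs to end of line" is also an assumption, not something the Pajek format is confirmed to specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of SocNetV): returns the label of one Pajek
// vertex line, keeping quoted multi-word labels intact.
std::string extractPajekLabel(const std::string &line) {
    const std::string ws = " \t\r\n";

    // Skip leading whitespace, then skip the vertex id token.
    std::size_t pos = line.find_first_not_of(ws);
    if (pos == std::string::npos) return "";
    pos = line.find_first_of(ws, pos);
    if (pos == std::string::npos) return "";   // id only, no label

    // Move to the first character of the label.
    pos = line.find_first_not_of(ws, pos);
    if (pos == std::string::npos) return "";

    if (line[pos] == '"') {
        // Quoted label: take everything up to the next double quote.
        const std::size_t close = line.find('"', pos + 1);
        if (close == std::string::npos) {
            // Assumption: an unterminated quote means the label is the rest of the line.
            return line.substr(pos + 1);
        }
        return line.substr(pos + 1, close - pos - 1);
    }

    // Unquoted label: a single whitespace-delimited token, as before.
    const std::size_t end = line.find_first_of(ws, pos);
    return line.substr(pos, end == std::string::npos ? std::string::npos : end - pos);
}

// Checks the two vertex lines from the reproduction file above.
void checkIssueExamples() {
    assert(extractPajekLabel("1 \"Bureau of Alcohol, Tobacco, & Firearms\"")
           == "Bureau of Alcohol, Tobacco, & Firearms");
    assert(extractPajekLabel("2 \"Federal Bureau of Investigation\"")
           == "Federal Bureau of Investigation");
}
```

In the real parser the same idea would presumably be expressed with QString or a QRegularExpression capture group, and any coordinates or shape parameters after the closing quote could still be split on whitespace as they are today.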