Spaces in data labels on #L and other lines #109

Closed
JPHammonds opened this issue Jun 29, 2017 · 23 comments

Comments

@JPHammonds
Contributor

I have run into a case where a beam line uses spaces in data labels (column names in the data). This carries over into other lines that refer to these column labels, such as #M and #@Roi. I have attached a file where scan 26 exhibits this problem. I see the problem when trying to parse the #M line. The parser does seem to split the #L line properly, although I do not see why in the code. On #L lines, there are double spaces between column labels. On the #M line there is also a double space between the monitor counts and the (channel name). The same is true for #@Roi.

lineup.zip

@JPHammonds
Contributor Author

I should note that on #@Roi the double space only appears before (channel name) not after.

@prjemian
Owner

prjemian commented Jun 29, 2017 via email

@JPHammonds
Contributor Author

This is not a double space in the data columns; it is a double space in the #L line, I think so that spaces within column labels can be detected. It is also in the #M and #@Roi lines. It seems to be in the files consistently. These are files generated by spec. Looks like in #L you use re.split("  +", strip_first_word(text)) to split the names.

@JPHammonds
Contributor Author

People should just not use spaces in labels in a text file, but spec has allowed it this long.

@prjemian
Owner

prjemian commented Jun 29, 2017 via email

@JPHammonds
Contributor Author

JPHammonds commented Jun 29, 2017 via email

@prjemian
Owner

For #L lines, this splits on two or more spaces: scan.L = re.split("  +", strip_first_word(text))

@prjemian
Owner

Re: SPEC data files rely on separating the data column labels with a two-space delimiter. A single space is allowed within a column label; two or more spaces mark the start of a new column label. Despite this common practice, it was necessary to cover the special case where a scan used a single space as a delimiter.
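
As an illustration of the two-space rule and the single-space special case described above, here is a minimal sketch. The helper name `split_labels` and the fallback heuristic are assumptions for illustration, not the spec2nexus implementation.

```python
import re


def split_labels(line):
    """Split the column labels of a '#L' control line (hypothetical helper)."""
    # Drop the leading '#L' key and trim the surrounding white space.
    text = line.partition(" ")[2].strip()
    # Usual case: two or more spaces delimit labels, so a single
    # space may appear inside a label.
    labels = re.split(r" {2,}", text)
    # Assumed fallback: if no split happened but white space is present,
    # treat the line as single-space delimited. This is ambiguous for a
    # scan with exactly one label that itself contains a space.
    if len(labels) == 1 and " " in text:
        labels = text.split()
    return labels


print(split_labels("#L img_n  Epoch  two theta  Seconds"))
print(split_labels("#L a b c"))
```

The fallback cannot tell a lone label such as "two theta" from two single-space-delimited labels, which is one reason spaces in labels are troublesome.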

@prjemian
Owner

Not sure I understand how the #M line is involved. Syntax is #M num

@prjemian
Owner

The syntax for the #@ROI line is #@ROI n f l, where n is a name and f and l are integers.

The #M line is parsed with:

        scan.M, dname = strip_first_word(text).split()
        scan.monitor_name = dname.lstrip('(').rstrip(')')

The code expects no white space in dname. Handling a name that contains a space would require a code change, which does not look difficult.

Can you post an example here?
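
One way such a change might look is sketched below. It is only an assumption about a possible approach: the function name `parse_monitor` and the regular expression are hypothetical and are not taken from spec2nexus.

```python
import re

# '#M', the count value, then an optional name in parentheses.
# The name may contain spaces; the pattern itself is an assumption.
M_PATTERN = re.compile(r"^#M\s+([^\s(]+)\s*(?:\((.*)\))?\s*$")


def parse_monitor(line):
    """Return (counts, name) from a '#M' line; name is None if absent."""
    match = M_PATTERN.match(line.strip())
    if match is None:
        raise ValueError("not a recognizable #M line: %r" % line)
    counts = float(match.group(1))
    name = match.group(2)
    return counts, (name.strip() if name is not None else None)


print(parse_monitor("#M 200000 (I0)"))
print(parse_monitor("#M 100000  (IC6 B)"))
print(parse_monitor("#M 1000"))
```

Because the name is taken from between the outermost parentheses, any white space inside it no longer affects how the line is split.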

@prjemian
Owner

Above, I'm referring to the SPEC syntax, as described in spec2nexus docs.

@JPHammonds
Contributor Author

JPHammonds commented Jun 29, 2017 via email

@JPHammonds
Contributor Author

JPHammonds commented Jun 29, 2017 via email

@JPHammonds
Contributor Author

More food for thought here. The file that I supplied for #110 (I'll include here also) shows a column name after a #T line as well. This comes from a different APS beam line than the one mentioned above. I am not sure if this is a modification of SPEC to allow defining the column that did not make it into the user manual, or if this is simply a local customization that does not seem to bother SPEC.

bnt5bt.spec.zip

@prjemian
Owner

There's a pattern emerging here.

  1. People can't read the SPEC documentation.
  2. The parser must be more robust for lines such as:
#T 0.5  (Seconds)
#M 200000 (I0)

The string provided in parentheses is not necessary since the standard macros put the column with the counting reference (time or monitor) ALWAYS in the next to last column.

Nevertheless, the parser should be changed to strip the #T or #M away and then parse only the next number. Anything else on the line should be ignored since it has no value.
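
A minimal sketch of that suggestion follows. The helper name `first_number` is hypothetical, and this is not the change that was actually made in the parser.

```python
def first_number(line):
    """Return the first number after the control key, ignoring the rest."""
    parts = line.split()
    # parts[0] is the key ('#T' or '#M'); parts[1] should be the number.
    if len(parts) < 2:
        raise ValueError("no value found on control line: %r" % line)
    return float(parts[1])


print(first_number("#T 0.5  (Seconds)"))
print(first_number("#M 200000 (I0)"))
```

Since only the second white-space-delimited token is read, a parenthesized name with or without spaces cannot break the parse.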

@jkirchman: This is 8-ID-E writing this data file. They need some advice.

@prjemian prjemian changed the title Spaces in data labels on #L and other lines Spaces in data labels on #L and other lines Jul 11, 2017
@prjemian
Owner

That same file is even more troublesome (attn @jkirchman). Later, scan 26 starts with counting time of 0 seconds:

#S 26  xpcsscan 20 1
#D Wed Jun 07 23:31:38 2017
#T 0  (Seconds)

and then reports a counting time of 1 second for each point:

#L img_n  Epoch  pind1  pind2  pind3  pind4  I_APS  cyber  cyber_u  cyber_l  atten  T_CTL  T_SAM  Seconds  ccdc
1 21903 0 1030 850 549 509029 0 2 2 0 0 27 1 25.37771
2 21905 0 1030 850 549 509004 1 2 3 0 0 27 1 51.346129
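
The mismatch can be checked mechanically. The sketch below uses only the lines quoted above; the function name `count_time_report` is hypothetical, and the default column name "Seconds" is an assumption that holds for this file.

```python
import re

T_LINE = "#T 0  (Seconds)"
L_LINE = ("#L img_n  Epoch  pind1  pind2  pind3  pind4  I_APS  cyber  "
          "cyber_u  cyber_l  atten  T_CTL  T_SAM  Seconds  ccdc")
ROWS = [
    "1 21903 0 1030 850 549 509029 0 2 2 0 0 27 1 25.37771",
    "2 21905 0 1030 850 549 509004 1 2 3 0 0 27 1 51.346129",
]


def count_time_report(t_line, l_line, data_rows, column="Seconds"):
    """Compare the '#T' count time with the values in a data column."""
    # First number after the '#T' key; the rest of the line is ignored.
    declared = float(t_line.split()[1])
    # Labels are delimited by two or more spaces.
    labels = re.split(r" {2,}", l_line.split(None, 1)[1].strip())
    index = labels.index(column)
    recorded = [float(row.split()[index]) for row in data_rows]
    # Also report whether the column sits next to last, as expected
    # from the standard macros.
    return declared, recorded, index == len(labels) - 2


declared, recorded, next_to_last = count_time_report(T_LINE, L_LINE, ROWS)
print(declared, recorded, next_to_last)
```

For the two rows shown, the header declares 0 seconds while the Seconds column records 1 second per point, and that column is indeed next to last.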

@prjemian
Owner

The problem seems to occur when a data name supplied in an unexpected place on a control line (such as #T, #M, #@ROI, ...) has a space in it.

@jkirchman: Is the parenthesized data column name something new in SPEC? It is not listed in the SPEC page about the standard file content.

@prjemian
Owner

@JPHammonds : the example file you provided for the #@ROI lines has these two ROIs:

FeKa(mca1R1)  MnKa(mca1R0)

Users should be cautioned against using data column names with special characters such as parentheses, braces, brackets, and lots of other characters that are often delimiters or decorators.
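
A parser can still be made tolerant of such names. The sketch below assumes the #@ROI n f l syntax quoted earlier in this thread and takes the two integers from the right-hand end of the line, so the name may contain parentheses or spaces. The function name `parse_roi` and the channel numbers are hypothetical.

```python
def parse_roi(line):
    """Return (name, first, last) from a '#@ROI n f l' line."""
    # Drop the control key, whatever its letter case.
    text = line.split(None, 1)[1]
    # Split from the right: the last two tokens are the integers,
    # and everything before them is the name.
    name, first, last = text.rsplit(None, 2)
    return name.strip(), int(first), int(last)


# The channel numbers here are made up for illustration.
print(parse_roi("#@ROI FeKa(mca1R1) 500 600"))
print(parse_roi("#@Roi Mn Ka  (mca1R0) 400 450"))
```

Splitting from the right only works if the two trailing fields are always present, which is an assumption about the files.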

@prjemian prjemian added this to the 2017-07 bugfix release milestone Jul 11, 2017
@jkirchman

I checked this with numerous spec data files from several beamlines. I found files from as far back as 2010 that contained either #T or #M lines with the description in parentheses.
#T 1 (Seconds)
#M 100000 (IC6-B)
I believe this has been spec's normal behavior for quite some time now. I think the suggestion to ignore anything after the first number following either #T or #M is a good one.

@JPHammonds
Contributor Author

JPHammonds commented Jul 11, 2017 via email

@prjemian
Owner

prjemian commented Jul 11, 2017 via email

@JPHammonds
Contributor Author

JPHammonds commented Jul 11, 2017 via email

@jkirchman

Checking the spec manual is advisable, but please be aware that the spec user manual at certif.com is not updated as often as it should be. I would not call the website the definitive help that you might expect. The most effective help can be found through the help command in spec or in the latest release notes.
