particles.star from relion export particles not compatible with pyem #91

Closed
pconesa opened this issue Aug 8, 2019 · 6 comments
@pconesa
Contributor

pconesa commented Aug 8, 2019

From @andreanans, ported in Scipion repo: I2PC/scipion#2024

Hi,

I noticed that the star file written out by 'relion export particles' is incompatible with the pyem scripts. I have to pass the star file through relion_star_handler to produce a star file in the correct format.

We often run this command:
csparc2star.py --copy-micrograph-coordinates particles.star particles_selected.cs final_particles.star

and get the following error:

File "pandas/_libs/parsers.pyx", line 545, in pandas._libs.parsers.TextReader.cinit
pandas.errors.EmptyDataError: No columns to parse from file

I think it's something to do with the delimiter but I have no clue how to fix it other than running relion_star_handler and removing a column to reformat the file.

Any ideas?

@azazellochg
Member

@andreanans, would you mind attaching example particles.star, particles_selected.cs and final_particles.star files? I do not use cryosparc or pyem myself, but I have pyem installed.

@azazellochg
Member

Alright, I found the error: https://github.com/asarnow/pyem/blob/master/pyem/star.py#L260 - the string is not stripped before parsing, so it fails if there is a space before "_rlnColumnName...". @andreanans, you should report it to the pyem developer. Relion itself has no problem parsing such files.

The original STAR format definition does not specify anything about this; however, the paper contains examples that use a leading space.
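
To illustrate the failure mode described in this comment, here is a minimal sketch of a label reader that does or does not strip each line before checking for the leading underscore. This is not pyem's actual code: `read_labels` is a hypothetical helper, and the header contents are made up for illustration.

```python
def read_labels(lines, strip):
    """Collect column labels from STAR header lines (hypothetical helper)."""
    labels = []
    for raw in lines:
        # Without stripping, a leading blank hides the underscore.
        line = raw.strip() if strip else raw
        if line.startswith("_"):
            # Keep the label itself, drop the optional "#N" column index.
            labels.append(line.split()[0])
    return labels


# Illustrative header with one blank before each label, as described above.
star_header = [
    "data_\n",
    "loop_\n",
    " _rlnCoordinateX #1\n",
    " _rlnCoordinateY #2\n",
    " _rlnMicrographName #3\n",
]

print(read_labels(star_header, strip=False))  # []
print(read_labels(star_header, strip=True))
# ['_rlnCoordinateX', '_rlnCoordinateY', '_rlnMicrographName']
```

With no labels found, a downstream table parser has no columns to work with, which would be consistent with the "No columns to parse from file" error reported above.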

@andreanans

Thank you so much, I spent tons of time trying to debug but didn't notice there was a space before _rlnColumnName etc. I'll send a message to the pyem github. Cheers
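
For anyone stuck on a parser that does not strip label lines, a possible stopgap is to remove the leading blanks from the label lines only. The sketch below is an assumption-laden workaround, not part of relion or pyem: `fix_label_indent` is a hypothetical name, and it assumes that no data row begins with a token starting with an underscore.

```python
def fix_label_indent(lines):
    """Yield lines with leading blanks removed from label lines only.

    Hypothetical helper: a line counts as a label line if its first
    non-blank character is an underscore. All other lines pass through
    unchanged.
    """
    for line in lines:
        stripped = line.lstrip(" \t")
        yield stripped if stripped.startswith("_") else line


# Illustrative input: indented labels followed by one data row.
example = [
    "loop_\n",
    " _rlnCoordinateX #1\n",
    " _rlnCoordinateY #2\n",
    "  12.0  34.0\n",
]

fixed = list(fix_label_indent(example))
print(fixed)
```

The generator accepts any iterable of lines, so an open file object can be passed in and the result handed to `writelines` on an output file.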

@asarnow

asarnow commented Aug 28, 2019

I fixed this in pyem (master branch, release branch soon). However the original specification linked above does clearly require data names to begin with an underscore - see Table 1.

The only lines that start with spaces in the paper are lines of data items, since these items contain "text strings" which are bounded by "blanks, single quotes, double quotes or semicolons as the first character of a line."

@delarosatrevin
Member

@asarnow Thanks a lot for the speedy fix!

I was curious and took a look at the reference provided by @azazellochg. In Table 1, the first definition is 'text string', which is stated to be a string delimited by blanks. So all of the following values are the same 'text string':

' _name'
'    _name'
'       _name'

So, as I understand it, the correct parsing is to strip leading and trailing spaces from every line.
Following this first definition, the 'data name' definition (a 'text string' starting with a '_') makes all of the above values valid 'data names'.
Indeed, in one of the examples that follow, the 'data names' after 'loop_' are preceded by spaces:
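
Under this reading, blank-delimited tokenization makes the three values above indistinguishable. A minimal sketch using Python's whitespace splitting (`is_data_name` is a hypothetical helper, and this is only one way to model "delimited by blanks"):

```python
def is_data_name(token):
    """A data name is a text string that starts with an underscore."""
    return token.startswith("_")


candidates = [" _name", "    _name", "       _name"]

# str.split() with no arguments splits on runs of whitespace and
# discards leading and trailing blanks.
tokens = [c.split()[0] for c in candidates]

print(tokens)                             # ['_name', '_name', '_name']
print(all(is_data_name(t) for t in tokens))  # True
```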

start-example

@asarnow

asarnow commented Aug 28, 2019

That's true, a data name is explicitly stated to be a text string. I guess I read the paper closely, but needed further grinding.
