String/Bytes with PDB Files #81
Comments
You can fix with
|
"fix" |
Not that I know of. Or rather, I don't think PDB files are ever supposed to contain non-ASCII characters. |
One of these two things needs to change. I'm not sure which though. |
(for file io, this is the difference between |
We always use 'r', not 'rb', so that's the problem. |
No, that's why it works within OpenMM, because |
Yes, that's what I meant. The problem is that it expects a string, not bytes. How do we change the urlopen() call to get strings instead? |
I don't think there's an easy one liner. You can do |
We could change OpenMM to respond appropriately to either strings or bytes too. |
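A minimal sketch of one way to get strings rather than bytes from a binary stream such as the one urlopen() returns under Python 3: wrap it in io.TextIOWrapper. The io.BytesIO object only stands in for the HTTP response, and the helper name as_text_stream and the sample records are hypothetical.

```python
import io

def as_text_stream(binary_stream, encoding="utf-8"):
    """Wrap a binary file-like object so that iterating it yields str lines."""
    return io.TextIOWrapper(binary_stream, encoding=encoding)

# Stand-in for urllib.request.urlopen(url); a real response is also a
# binary stream, so the same wrapper should apply (assumption, not tested
# against a live URL here).
fake_response = io.BytesIO(b"HEADER    TEST\nEND\n")
lines = list(as_text_stream(fake_response))
```

This leaves the reader code untouched, at the cost of requiring every caller that opens a URL to remember the wrapper.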
What would be the easiest way of doing that? PdbStructure reads lines from the PDB file with
But if it's a stream of bytes, each iteration will get one byte instead of one line, right? Is there an easy way it can detect the type of stream and convert/wrap/whatever it? |
The iterator still splits on bytes-newlines.
|
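A quick check of the point about iteration, using io.BytesIO as a stand-in for a binary stream (the sample records are made up): looping over the stream yields whole lines of type bytes, while indexing into one of those lines yields an int in Python 3.

```python
import io

stream = io.BytesIO(b"ATOM      1\nATOM      2\nEND\n")

# Iterating a binary stream still splits on b"\n"; each item is a bytes line.
byte_lines = [line for line in stream]

# Indexing a bytes object in Python 3 gives an int, not a 1-character string.
first_byte = byte_lines[0][0]
```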
So if we just add "pdb_line = pdb_line.decode('utf-8')", will that work everywhere? |
I think so. |
No. In Py3 if you open as 'r' instead of 'rb', you read strings. Strings in Py3 don't have a decode() method:
    >>> 'this is a string'.decode('utf-8')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'decode'
    >>> b'these are bytes'.decode('utf-8')
    'these are bytes'
This is, by far, the worst part about the Py2->Py3 transition. It ruins duck typing between file-like objects unless you just open everything with 'rb'... but yuck. For instance, I have a helper class in ParmEd that wraps around an open file-like object; see https://github.com/ParmEd/ParmEd/blob/master/chemistry/formats/io.py#L62-L103. |
Just check the type before decoding it. (Replying to Jason Swails, Thu, Feb 26, 2015 at 4:23 PM.)
|
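The type check suggested above could look roughly like this. The function name to_text is hypothetical, and this is only a sketch of the idea, not the change that was made in OpenMM.

```python
def to_text(line, encoding="utf-8"):
    """Return line as text, decoding only when it arrives as bytes."""
    if isinstance(line, bytes):
        return line.decode(encoding)
    return line

decoded = to_text(b"END\n")    # bytes input is decoded
unchanged = to_text("END\n")   # str input passes through untouched
```

Under Python 3 this accepts lines from files opened with either 'r' or 'rb', and from binary streams such as a URL response.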
Ok, how about this:
That should work everywhere, right? |
How many places do you read a line from the file? You need to do this in each place. I have so many parsers in ParmEd that I needed to write a class to handle it more robustly. |
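A wrapper class along those lines might be sketched as follows. This is not ParmEd's actual implementation; the class name TextLineReader is hypothetical, and it only covers iteration, which is all a single read loop needs.

```python
import io

class TextLineReader:
    """Hypothetical wrapper: iterate any file-like object, always yielding str."""

    def __init__(self, fileobj, encoding="utf-8"):
        self._fileobj = fileobj
        self._encoding = encoding

    def __iter__(self):
        for line in self._fileobj:
            # Decode only when the underlying stream hands back bytes.
            if isinstance(line, bytes):
                line = line.decode(self._encoding)
            yield line

# The same reader works on binary and text streams.
from_bytes = list(TextLineReader(io.BytesIO(b"ATOM\nEND\n")))
from_text = list(TextLineReader(io.StringIO("ATOM\nEND\n")))
```

Centralizing the decode in one place means each parser can loop over the wrapper without repeating the type check.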
PdbStructure is the only one we're talking about here. There are lots of other file readers, but they aren't relevant to PDBFixer. |
I meant how many times do you read a line from a file inside PDBFixer? Or is it just one big |
You mean PdbStructure? It's just one big loop. |
This is hopefully fixed by openmm/openmm#839. |
One of the py3 errors is
The reason for this is that pdbfile, in this error case, is a urllib request, which is a file-like object of bytes. The type of pdb_line is bytes. The error is the same error as in
I'm not sure if we should count this as a bug in simtk/openmm/app/internal/pdbstructure.py or not. Does the PDB format specify an encoding? Should we consider it string-based or bytes-based?