BUG: long str unconvert fix #6166
this was a bug that got fixed in numpy 1.7.2
needs a perf check; iirc this was an odd bottleneck. Of course it only matters if the data is encoded
I did an update on numpy and it fixed the issue. If you think I should perf check the patch for numpy <= 1.7.1, I could do this; otherwise you can close the PR
no...I like this PR! I don't think it's actually an issue (the issue is in decoding strings, which is why I used this method in the first place, which is quite fast). True, you have to figure out the max length of the string... tell you what...why don't you put in a conditional based on numpy <= 1.7.1 and then do it your way, otherwise leave it? (use LooseVersion)
I did the perf test and nothing changed significantly. I also implemented the check for the numpy version
@@ -4157,8 +4158,8 @@ def _convert_string_array(data, encoding, itemsize=None):
    data = np.array(data, dtype="S%d" % itemsize)
    return data


def _unconvert_string_array(data, nan_rep=None, encoding=None):
    _numpy_needs_str_fix = LooseVersion(np.__version__) < '1.7.2'
name this like: _np_version_under_172 and put it near the top of the file
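A minimal sketch of the version gate being discussed, for illustration only. The PR itself uses `LooseVersion`; this stand-in parses the version string by hand so it does not depend on `distutils` (removed in Python 3.12), and the helper names `version_tuple` and `np_version_under_172` are hypothetical. In the module the flag would be computed once near the top of the file from `np.__version__`.

```python
import re


def version_tuple(version):
    """Return the leading numeric components, e.g. '1.7.1' -> (1, 7, 1).

    Hypothetical helper: stops at the first component that does not
    start with a digit, so '1.8.0.dev-abc' -> (1, 8, 0).
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def np_version_under_172(version):
    """True if `version` is older than 1.7.2 (the release with the fix)."""
    return version_tuple(version) < (1, 7, 2)


# Assumed usage at module level (np being numpy):
#     _np_version_under_172 = np_version_under_172(np.__version__)
```

Comparing numeric tuples rather than raw strings matters here: as plain strings, `'1.10.0' < '1.7.2'` is true, which would wrongly enable the workaround on newer releases.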
Did the changes and added release notes.
This change is actually very tricky and subtle because of exactly how 3.3 / 2.7 interact with encodings.... still failing on 3.2 on 1.7.1..... I am not sure how to specify a dtype on 3.2 like 'S100'....maybe you can figure this out?
what about
do u have the ability to try in py3.3 on windows? u can use the just released 3.1 and try modifying the source. if u want to try compiling on windows I can help though
can't help with this, have no windows machine around.
ok no problem
I did a PR on your branch with some fixes to make this work on windows: wabu#1
@wabu pls incorporate my changes when u can
prior to numpy 1.7.2 long strings got truncated when read back from a hdf5 file
@wabu can you incorporate the changes and rebase?
closing in favor of #6821
When long strings are stored in HDF5, they get truncated at 64 bytes when converted back to strings. The fix computes the itemsize of the strings and uses it for the dtype before converting back to strings. It would be possible to use the attribute information of the table instead, but that would require changes in the calling code.
Test that failed without the patch is included.
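The idea behind the fix can be illustrated with a small pure-Python model of fixed-width string storage. This is a sketch under stated assumptions, not the patch itself: the function names are hypothetical, NUL-padding stands in for how a fixed-width `"S%d"` dtype stores bytes, and the hard-coded width of 64 only mimics the truncation described above.

```python
def max_itemsize(values, encoding="utf-8"):
    """Byte length of the longest encoded string (at least 1)."""
    return max([len(v.encode(encoding)) for v in values] + [1])


def to_fixed_width(values, itemsize, encoding="utf-8"):
    """Encode each string into exactly `itemsize` bytes.

    Longer values are cut off and shorter ones are NUL-padded, which is
    the behaviour this sketch assumes for a fixed-width bytes column.
    Cutting multi-byte characters in half is not handled here.
    """
    return [
        v.encode(encoding)[:itemsize].ljust(itemsize, b"\x00")
        for v in values
    ]


def from_fixed_width(cells, encoding="utf-8"):
    """Strip the NUL padding and decode back to text."""
    return [c.rstrip(b"\x00").decode(encoding) for c in cells]


values = ["short", "x" * 100]

# A fixed width of 64 loses the tail of the long string.
truncated = from_fixed_width(to_fixed_width(values, 64))

# Computing the width from the data first preserves every value.
itemsize = max_itemsize(values)
restored = from_fixed_width(to_fixed_width(values, itemsize))
```

With numpy, the computed width would be interpolated into the dtype string, as `_convert_string_array` already does with `dtype="S%d" % itemsize`.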