Issues with very wide datasets #20

Tagar · 2017-10-19T16:13:24Z

There is a bug in parso when it tries to read a really wide file. When I read a sas7bdat with only 18 columns, it works perfectly, but when I read my sas7bdat with ~13000 columns, it breaks. For some reason, any of the values that aren’t zeros are replaced with null.
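
To make the symptom easier to report, it can help to count nulls, zeros, and other values per column after loading. Below is a minimal sketch, assuming the rows have already been read into plain Python lists and that column names are unique. The names `summarize_columns` and `suspicious_columns` are hypothetical helpers and not part of Parso or spark-sas7bdat.

```python
from typing import Any, Dict, List, Optional, Sequence


def summarize_columns(
    names: Sequence[str],
    rows: Sequence[Sequence[Optional[Any]]],
) -> Dict[str, Dict[str, int]]:
    """Count null, zero, and other values for each column.

    Assumes column names are unique and every row has one value per column.
    """
    summary = {name: {"null": 0, "zero": 0, "other": 0} for name in names}
    for index, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(names)}"
            )
        for name, value in zip(names, row):
            if value is None:
                summary[name]["null"] += 1
            elif value == 0:
                summary[name]["zero"] += 1
            else:
                summary[name]["other"] += 1
    return summary


def suspicious_columns(summary: Dict[str, Dict[str, int]]) -> List[str]:
    """Return columns that hold only zeros and nulls, with at least one null.

    This matches the symptom described above (non-zero values showing up
    as null), but such a column can also be legitimate data.
    """
    return [
        name
        for name, counts in summary.items()
        if counts["other"] == 0 and counts["null"] > 0
    ]
```

Running the same summary on the 18-column file and on the wide file, and comparing the results with what SAS itself reports, would show whether the nulls come from the reader or are present in the data.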

Yana-Guseva · 2017-10-20T09:41:04Z

Hi Ruslan,

Could you please specify which version of Parso you use? Could you also tell me more about the error, or even copy its stack trace here? It would be great to get the file where the error occurs, as that is the easiest way to understand where the problem is. Thank you.

Tagar · 2017-10-20T15:31:41Z

Hi Yana,

We're using 1.2.1 through this library: https://github.com/saurfang/spark-sas7bdat/
We ran into multiple issues; some of them were resolved by switching off compression in
the sas7bdat files. But there are still some issues with floating-point numbers not getting
through correctly. I think at this point it makes sense for us to try to get spark-sas7bdat
compiled with the latest version of Parso, to make sure we're not running into issues that
are already resolved.

Thank you.

Yana-Guseva · 2017-10-20T18:35:52Z

Ok, let me know if it doesn't help.

printsev · 2017-10-26T09:05:14Z

Please note that this old version doesn't support some compression mechanisms, while the latest version does. Some other issues have also been fixed. As you can tell, 1.2.1 is significantly outdated: there has been a major release and a couple of minor ones since.

Tagar · 2017-10-26T23:24:18Z

We can read very wide SAS datasets with the latest version of the Parso library.
Thank you @printsev and @Yana-Guseva

Tagar mentioned this issue Oct 19, 2017

Issues with very wide datasets saurfang/spark-sas7bdat#26

Closed

printsev assigned Tagar Oct 26, 2017

Tagar closed this as completed Oct 26, 2017
