Issues with very wide datasets #20

Closed
Tagar opened this issue Oct 19, 2017 · 5 comments

@Tagar

Tagar commented Oct 19, 2017

There is a bug in parso when it tries to read a really wide file. When I read a sas7bdat with only 18 columns, it works perfectly, but when I read my sas7bdat with ~13000 columns, it breaks. For some reason, any of the values that aren’t zeros are replaced with null.
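
One way to quantify the symptom described above is to count, per column, how many values come back as null, as zero, or as anything else. The following is a minimal sketch, assuming the rows have already been loaded into plain Python sequences by whatever reader is in use; `summarize_columns`, `suspicious_columns`, and the sample data are hypothetical names introduced for illustration and are not part of Parso or spark-sas7bdat.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Sequence


def summarize_columns(
    rows: Iterable[Sequence[Any]], names: Sequence[str]
) -> Dict[str, Counter]:
    """Count null, zero, and other values for each column."""
    stats = {name: Counter() for name in names}
    for row in rows:
        for name, value in zip(names, row):
            if value is None:
                stats[name]["null"] += 1
            elif value == 0:
                stats[name]["zero"] += 1
            else:
                stats[name]["other"] += 1
    return stats


def suspicious_columns(stats: Dict[str, Counter]) -> List[str]:
    """Return columns that contain nulls but no non-zero values.

    Such columns match the reported pattern (only zeros survive),
    although a column may also legitimately look like this.
    """
    return [
        name
        for name, counts in stats.items()
        if counts["null"] > 0 and counts["other"] == 0
    ]


# Hypothetical sample data standing in for rows read from a wide dataset.
sample_names = ["a", "b", "c"]
sample_rows = [
    (0.0, 1.5, None),
    (None, 2.0, 0),
    (0, None, None),
]

sample_stats = summarize_columns(sample_rows, sample_names)
flagged = suspicious_columns(sample_stats)
```

Running the same summary on the 18-column file and on the wide file would show whether the nulls are confined to particular columns or affect every non-zero value.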

@Yana-Guseva
Collaborator

Yana-Guseva commented Oct 20, 2017

Hi Ruslan,

Could you please specify which version of Parso you use? Could you also tell me more about the error, or even copy its stack trace here? It would be great to get the file where the error occurs; that is the easiest way to understand where the problem is. Thank you.

@Tagar
Author

Tagar commented Oct 20, 2017

Hi Yana,

We're using 1.2.1 through this library: https://github.com/saurfang/spark-sas7bdat/
We ran into multiple issues; some of them got resolved by switching off compression in
sas7bdat files. But there are still some issues with floating-point numbers not getting
through correctly. I think at this point it makes sense for us to try to get spark-sas7bdat
compiled with the latest version of Parso, to make sure we're not running into issues that
are already resolved.

Thank you.

@Yana-Guseva
Collaborator

Ok, let me know if it doesn't help.

@printsev
Contributor

Please note that this old version doesn't support some compression mechanisms, while the latest version does. Some other issues were fixed as well. As you can see, 1.2.1 is significantly outdated: there has been a major release and a couple of minor ones since.

@Tagar
Author

Tagar commented Oct 26, 2017

We can read very wide SAS datasets with the latest version of the Parso library.
Thank you @printsev and @Yana-Guseva

Tagar closed this as completed Oct 26, 2017