[INF] Support Pyspark but not require it? #551

Closed
hectormz opened this issue Sep 2, 2019 · 3 comments
Labels
available for hacking: This issue has not been claimed by any individual.
good intermediate issue: Issues that are good for seasoned programmers to make a contribution.
infrastructure: Infrastructure-related issues.

Comments

@hectormz
Collaborator

hectormz commented Sep 2, 2019

I'm trying to catch up on the pyspark conversation, and I was wondering whether we plan to support pyspark without requiring it, @ericmjl, @zjpoh?

I updated my local pyjanitor and had to download pyspark because the requirements changed, and it was about 215 MB. I think that's fine if it's required for development, but it might not be expected by users who only anticipate using pandas. I believe rdkit is currently optional for the users who need it (maybe something else is too), and I think that could be a good idea here too.

Again, I haven't looked through the entire pyspark conversation and changes yet.

@zjpoh
Collaborator

zjpoh commented Sep 2, 2019

I think that makes a lot of sense. I'll look into how rdkit is set as optional and apply that to pyspark. I'm open to making it optional for the development version too.
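
For readers following along, one common way to make a heavy dependency optional is to defer the import until a feature that needs it is actually called, and to raise a helpful error if the package is missing. The sketch below is only an illustration of that general pattern; it is not taken from pyjanitor, and the names `import_optional` and `spark_row_count` are hypothetical.

```python
import importlib


def import_optional(module_name, install_hint):
    """Import and return a module, or raise an ImportError with an install hint.

    Hypothetical helper for illustration; not part of pyjanitor's API.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        raise ImportError(
            f"{module_name} is required for this feature but is not installed. "
            f"{install_hint}"
        ) from err


def spark_row_count(df):
    """Hypothetical Spark-only function.

    pyspark is checked only when this function is called, so users who
    work purely with pandas never need to install it.
    """
    import_optional("pyspark", "Try: pip install pyspark")
    return df.count()
```

With this kind of guard, importing the package itself stays cheap, and only users who call a Spark-specific function see the request to install pyspark.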

@hectormz
Collaborator Author

hectormz commented Sep 2, 2019

Sounds good. Feel free to discuss any thoughts, or we'll check it out in the PR.

@ericmjl added the available for hacking, good intermediate issue, and infrastructure labels Sep 4, 2019
@hectormz
Collaborator Author

hectormz commented Sep 4, 2019

@zjpoh, on your next pyspark-related PR, do you want to add spark_functions to pytest.ini, following the pattern of the other modules?

Running pytest gives this warning:

Unknown pytest.mark.spark_functions - is this a typo?  You can register custom marks to avoid this warning 
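
The warning goes away once the mark is registered. The request above is to do this in pytest.ini; as a rough sketch, the same registration can also be done from a `conftest.py` using pytest's `pytest_configure` hook and `config.addinivalue_line`. The marker description text here is an assumption, and the layout of the project's existing pytest.ini entries is not shown in this thread.

```python
# conftest.py (sketch): register the custom mark so pytest stops warning.
#
# Roughly equivalent pytest.ini entry (description text is an assumption):
#   [pytest]
#   markers =
#       spark_functions: tests for the PySpark-backed functions


def pytest_configure(config):
    """Register the spark_functions mark when pytest starts up."""
    config.addinivalue_line(
        "markers",
        "spark_functions: tests for the PySpark-backed functions",
    )
```

Once registered, the mark can be selected with `pytest -m spark_functions` or excluded with `pytest -m "not spark_functions"`.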

@zjpoh mentioned this issue Sep 5, 2019
8 tasks
@hectormz closed this as completed Sep 6, 2019