Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add EDA for input data set #125 #145

Merged
merged 30 commits into from Aug 25, 2020
Merged

Add EDA for input data set #125 #145

merged 30 commits into from Aug 25, 2020

Conversation

shahules786
Copy link
Contributor

Added EDA for the input data set
Results stored in Automl/EDA/Readme.md

@pplonski
Copy link
Contributor

@shahules786 thank you for the contribution! Great job! 👍

My comments below:

  1. Please use black for formatting.
  2. EDA should work even if y_train is missing. For example, EDA for test data with no target available. Can you add a test for that?
  3. I think compute in EDA should be static. Not need to store X, y in the EDA object. And the call from AutoML class should be static as well (it doesn't depend on AutoML parameters).

Maybe keep static compute in EDA and call it for fit data in AutoML if explain_level == 2 ?

  1. Not sure, if we need information about the ml_task, I think we can just guess what kind to target it is and plot the distribution?
  2. Can we add mean and std for continuous features?
  3. Would love to see some examples and maybe little explanations how to use ins in docs directory (just add automatic_eda.md file in docs and explain with examples how it can be used)

@pplonski pplonski added this to the version 0.6.1 milestone Aug 19, 2020
@shahules786
Copy link
Contributor Author

shahules786 commented Aug 20, 2020

@pplonski Thanks for your review, I have some doubts
1.sorry, didn't get that..can you explain?
2. yes, Will do.
3 I can make compute in EDA static, the eda method of automl class should be static too? So how to identify the results_path?
4. Yes, will do
5. I think it's already there?
6. Yes, will do

@pplonski
Copy link
Contributor

Hey @shahules786 Im on the trip right now. I will be back on Monday.

Black is pyhon code formating program.

@pplonski
Copy link
Contributor

Hey @shahules786! Regarding questions:
3. You dont need eda method in AutoML class. The eda will be only called on new data in fit method in AutoML when explain_level==2. What do you think?
5. You are right. I missed that.

@shahules786
Copy link
Contributor Author

@pplonski Ok,got it. Will do the necessary changes.

@shahules786
Copy link
Contributor Author

@pplonski Done. Can you review it?
I will add the examples in this issue #146
Planning to add more detailed examples displaying more functionalities.

@pplonski
Copy link
Contributor

Thank you! Looks good :) Please fix the tests and I can merge it.

@pplonski pplonski merged commit f97e534 into mljar:master Aug 25, 2020
@pplonski
Copy link
Contributor

Thank you @shahules786 for contribution! 👏 👏 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants