-
Notifications
You must be signed in to change notification settings - Fork 855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds plot_decision_region_slices function #189
Conversation
…/plotting/decision_regions.py
A couple of comments:
training_features = ['feature1', 'feature2']
target_feature = 'target'
filler_feature_dict = {'feature3': 1.0}
plot_decision_region_slices('feature1', 'feature2', dataframe,
training_features, target_feature, clf,
filler_feature_dict=filler_feature_dict) But this can also be done using numpy arrays and replacing the column names with the column indices. Which direction (pandas dataframe vs. numpy array) do you think is the better way to go? |
Good point. I think what we could do is to add a parameter to the plot_decision_regions(..., feature_index=None) And if Or in other words, I guess there's no good way around making the What do you think? |
Good idea. On second thought, I think that modifying
Couldn't agree more 👍. I'll work on adding this additional feature to |
plot_decision_regions should behave exactly the same for the case when the number of training features (X.shape[1]) is 1 or 2. This commit adds the additional keyword arguments feature_index and filler_feature_dict. Feature_index is an array-like object with two items that specify the features indices in X to be plotted on the x and y axis of the decision regions plot. Filler_feature_dict is a dictionary of feature index-value pairs that will be filled in for the features not on the x and y axis.
Hello @jrbourbeau! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 16, 2017 at 02:30 Hours UTC |
A few comments about the previous commit
plot_decision_regions(X, y, clf, feature_index=(0, 2), filler_feature_dict={1: 999}) Passing the filler values in with a dictionary of index-value pairs is a little awkward, but I haven't been able to think of simpler way of doing it. Any suggestions?
My solution to this problem was...the null solution. I put |
Thanks for the notes!
Sounds great! Unfortunately, there are currently no unit tests for the plotting functions, but I will also test it "manually" before I merge (later when we think the PR looks ready :))
But the
Hm, not sure, but is there a reason why you would use different filler values for different feature columns? Otherwise, we could simply accept an integer for plot_decision_regions(X, y, clf, feature_index=(0, 2), filler_feature_dict=999) would will the columns with index 1 and index 3 with the 999 filler values. Would that work, and would this also address the issue in
|
Sorry, my first response is super long!
The
When the
The user needs to specify values to be filled in for the other features (in this case the petal length and petal width). Say, they specify a petal length filler value of 1.2 and petal width filler value of 0.3. Then the new feature array would be something like
This example, plotting the sepal length and width decision region for a petal length of 1.2 and a petal width of 0.3, would be implemented using the following code. plot_decision_regions(X_iris, y_iris, clf,
feature_index=(0, 1),
filler_feature_dict={2: 1.2, 3: 0.3}) That was pretty long-winded, hopefully my explanation makes some sense
I think that the filler values will change what the decision regions look like. In the above example, if instead of choosing a petal length of 1.2 and petal width of 0.3 the user chose 1.2 for both the petal length and petal width, one would expect the decision boundaries to look different from the 1.2 and 0.3 case. While a single filler value can be used when there are 3 training features, for more than 3 features multiple filler values should be possible. Although, you bring up a good point, we could build into |
Thanks for the clarification; I was a bit confused yesterday and thought one would typically use the same filler values. But yeah, you bring up an important point:
We need to remember to be careful with using a dictionary as a default for a function argument -- i.e., we mustn't modify it in the function in any way, since it is mutable. I remember some gotcha from when I was new at Python :P : >>> def my_func(dct={'a': 1, 'b': 2}, some_pair=('c', 1)):
... dct[some_pair[0]] = some_pair[1]
... return dct
>>> res1 = my_func()
>>> print(res1)
{'a': 1, 'b': 2, 'c': 1}
>>> res2 = my_func(some_pair=('d', 2))
>>> print(res2)
{'a': 1, 'b': 2, 'c': 1, 'd': 2} |
That's a great point. I was thinking about making Is this sufficient to avoid mutability issues with |
That sounds great! It would make it also kind of fool-proof to avoid confusion about potentially unexpected results if someone hasn't read the docstring carefully. |
Awesome! With regards to making a scatter plot of the training data on top of the decision regions, I was thinking about a two options:
What do you think about this? |
Adds a test that checks the format of feature_index and the existence of filler_feature_dict. Also removes test that checks that there aren't more than 2 training features.
That's a good idea; hope it doesn't overcomplicate things when plotting the training samples though. Otherwise we could just leave them out -- I think it would be useful to have them though. Would you suggest specifying this range via the |
Yeah, it seems like either way will be a little cluttered : / My thought was to make it an additional function parameter, say plot_decision_regions(X, y, clf,
feature_index=(0, 1),
filler_feature_dict={2: 1.2, 3: 0.3}) will just plot the decision regions. And if the user really wants to scatter the training data, they can then provide the width of the feature values they want to include plot_decision_regions(X, y, clf,
feature_index=(0, 1),
filler_feature_dict={2: 1.2, 3: 0.3},
filler_feature_width={2: 0.1, 3: 0.05}) Does this seem too complicated, or okay to you? |
That sounds okay. This way, it would do only sth for people who really want to -- and others don't have to worry about the additional param :P |
Sounds good to me! I'll go ahead and add the training set plotting to the PR. Also, I was thinking about dropping the |
Yeah, I somehow didn't think that it would require so many additions to the original function. I think dropping it would be fine, but it would be nice to have an example for plotting a grid of slices in the documentation (https://github.com/rasbt/mlxtend/blob/master/docs/sources/user_guide/plotting/plot_decision_regions.ipynb) then! Thanks for putting so much thought effort into it, I really appreciate it! :) |
In addition, this commit also: * Replaces several of existing checks with check_Xy in the utils module * Adds additional input validation
…2 training feaures.
…to be either a float or an array-like object. This allows for different x-axis and y-axis resolutions. * Adds a test * Adds two examples to plot_decision_regions.ipynb in the user guide * Updates the changelog accordingly
That series of commits should take care of the stuff we've been discussing. The last thing I wanted your opinion on was the name of the new function parameters. Do you think No problem! I think this is a really great project and I'm happy to help out however I can |
They sound all reasonable to me! However, what about I am looking forward to give the new code a try tomorrow to see how it works and if I have additional feedback :) -- looks great so far! |
Was looking through the code; this PR looks fantastic! Thanks a lot! Nice that it doesn't change the "previous" usage but just adds additional functionality that can come in handy for higher-dim datasets. I think the I am happy merge this now, and regarding the unit tests, mlxtend.plotting will be added back to the CI after merging #190 which addresses an issue in one of the other plotting functions. Thanks for the great work! |
Awesome! Thanks for all the comments and suggestions |
Description
Adds
plot_decision_region_slices
function to mlxtend/plotting/decision_regions.pyRelated issues or pull requests
See issue #188
Pull Request requirements
./mlxtend/*/tests
directoriesnosetests ./mlxtend -sv
and make sure that all unit tests passnosetests ./mlxtend --with-coverage
flake8 ./mlxtend
./docs/sources/CHANGELOG.md
filemlxtend/docs/sources/
(optional)