-
Notifications
You must be signed in to change notification settings - Fork 855
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added sparse option to onehot transform #328
Conversation
Hello @jaksmid! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 17, 2018 at 10:15 Hours UTC |
Hello @rasbt , please check if the API changes are to your liking. Wasnt sure about two possible format outcomes in the Also not sure about the documentation notebooks though. Should I change them? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me, just a few minor suggestions :)
mlxtend/preprocessing/onehot.py
Outdated
Returns | ||
------------ | ||
onehot : NumPy array [n_transactions, n_unique_items] | ||
The NumPy one-hot encoded integer array of the input transactions, | ||
onehot : NumPy array [n_transactions, n_unique_items] If not sparse |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest changing
"If not sparse"
to
"if sparse=False
(default)
and SciPy Compressed Sparse Row (CSR) matrix, otherwise
mlxtend/preprocessing/onehot.py
Outdated
@@ -119,12 +120,19 @@ def transform(self, X): | |||
['Milk', 'Beer', 'Rice'], | |||
['Milk', 'Beer'], | |||
['Apple', 'Bananas']] | |||
|
|||
sparse: bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested change
sparse: bool -> sparse: bool (default=False)
oht = OnehotTransactions() | ||
oht.fit(dataset) | ||
trans = oht.transform(dataset, sparse=True) | ||
np.testing.assert_array_equal(expect, trans.todense()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here, for readability, adding a line
assert (isinstance(trans, csr_matrix))
wouldn't hurt :)
docs/sources/CHANGELOG.md
Outdated
@@ -32,6 +32,7 @@ The CHANGELOG for the current development version is available at | |||
- Raises an informative error message if `predict` or `predict_meta_features` is called prior to calling the `fit` method in `StackingRegressor` and `StackingCVRegressor`. ([#315](https://github.com/rasbt/mlxtend/issues/315)) | |||
- The `plot_decision_regions` function now automatically determines the optimal setting based on the feature dimensions and supports anti-aliasing. The old `res` parameter has been deprecated. ([#309](https://github.com/rasbt/mlxtend/pull/309) by [Guillaume Poirier-Morency](https://github.com/arteymix)) | |||
- Apriori code is faster due to optimization in `onehot transformation` and the amount of candidates generated by the `apriori` algorithm. ([#327](https://github.com/rasbt/mlxtend/pull/327) by [Jakub Smid](https://github.com/jaksmid)) | |||
- `onehot transform` can be now called with `sparse` argument resulting in the sparse representation of the `onehot` matrix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can append
(#328 by Jakub Smid)
Thanks for the code review @rasbt . Incorporated your suggestions. What about the documentation in the notebooks though? Should I address that as well? |
Good point. If you would like to add an example to the Jupyter Notebook, that would be great (but not 100% necessary). To update the API documentation in the notebooks, you need to go to mlxtend/docs and execute
in this directory. It will then update/generate the markdown documentation files. After that's completed, you can open the jupyter notebook and just rerun the last section of the notebook. As a note, these markdown files are not added to the version control in GitHub because of redundancy (i.e., the API doc markdown gets included in the notebook at the bottom), and the markdown generated from the notebook for MKdocs has the same content as the notebook (so, for space reasons, only the notebooks are currently in version control on GitHub). |
Will do. Was also thinking about changing the |
* Onehot uses bools
Hello @rasbt , updated the doc, added a Looking forward to the feedback. |
Oh, I somehow assumed we were talking about the OneHotTransaction class regarding the Jupyter notebook documentation. But yeah, modifying the apriori documentation is even better :). However, it would maybe be nice to update the OneHotTransaction documentation as well (and then add a note about why it's using bool by default, and maybe showing an Overall, this PR is great, and it was a good idea to use sparse representations for this! |
Just a little suggestion, but can you modify the Changelog message a little bit to illustrate that the apriori func also "improves"? E.g., changing the original
to The Below is a rawtext version for convenience :)
|
Not a problem at all. I tried to find all places that needs/deserves an update, but as I am not that familiar with the whole codebase, it may well happen that I unintentionally overlook something. Really appreciate your suggestions. |
Looks great, thanks a lot! Will merge it now, thanks for the PR! |
Added option to create sparse one hot matrices on
onehot transform
Should mitigate issue #298