-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/pca 2d projection #75
Feature/pca 2d projection #75
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Thanks a lot!
Just a few minor stylefixes needed, otherwise LGTM. Maybe you could try using a Python linter with whatever IDE you're using.
scikitplot/decomposition.py
Outdated
title_fontsize="large", text_fontsize="medium"): | ||
figsize=None, cmap='Spectral', | ||
title_fontsize="large", text_fontsize="medium", | ||
biplot=False, feature_labels=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These new arguments should be before ax
scikitplot/decomposition.py
Outdated
|
||
feature_labels (array-like, shape (n_classes), optional): List of labels | ||
that represent each feature of X. It's index position must also be | ||
relative to the features. If none is given, then labels will be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If ``None``
scikitplot/decomposition.py
Outdated
biplots. If false, the biplots are not generated. | ||
|
||
feature_labels (array-like, shape (n_classes), optional): List of labels | ||
that represent each feature of X. It's index position must also be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its*
scikitplot/decomposition.py
Outdated
@@ -156,14 +166,36 @@ def plot_pca_2d_projection(clf, X, y, title='PCA 2-D Projection', ax=None, | |||
classes = np.unique(np.array(y)) | |||
|
|||
colors = plt.cm.get_cmap(cmap)(np.linspace(0, 1, len(classes))) | |||
|
|||
|
|||
vectors=np.transpose(clf.components_[0:2, :]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Add spaces around equals sign.
Also, not sure about this but can 0:2
be written as simply :2
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sry, Bad habit on indexing, I'll fix that.
scikitplot/decomposition.py
Outdated
vectors=np.transpose(clf.components_[0:2, :]) | ||
|
||
xs = transformed_X[:,0] | ||
ys = transformed_X[:,1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: space after comma
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vectors
, xs
, and ys
should be evaluated within the if biplot:
block since they're only used there.
scikitplot/decomposition.py
Outdated
xs = transformed_X[:,0] | ||
ys = transformed_X[:,1] | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Remove one newline
scikitplot/decomposition.py
Outdated
if biplot: | ||
for i in range(vectors.shape[0]): | ||
ax.annotate("", xy=(vectors[i,0]*max(xs), | ||
vectors[i,1]*max(ys)), xycoords='data', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: space after comma
scikitplot/decomposition.py
Outdated
|
||
if feature_labels is None: | ||
plt.text(vectors[i,0]*max(xs)*1.05, vectors[i,1]*max(ys)*1.05, | ||
"Variable"+str(i), color='b', fontsize=text_fontsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style: Space after comma, space around operators
scikitplot/decomposition.py
Outdated
"Variable"+str(i), color='b', fontsize=text_fontsize) | ||
else: | ||
plt.text(vectors[i,0]*max(xs)*1.05, vectors[i,1]*max(ys)*1.05, | ||
feature_labels[i], color='b', fontsize=text_fontsize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the if-else branch here should only set a single variable feature_label
since the rest of the calculations are duplicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually this is cleaner
plt.text(vectors[i,0]*max(xs)*1.05, vectors[i,1]*max(ys)*1.05,
feature_labels[i] if feature_labels else "Variable" + str(i), color='b', fontsize=text_fontsize)
scikitplot/decomposition.py
Outdated
ax.legend(loc='best', shadow=False, scatterpoints=1, | ||
fontsize=text_fontsize) | ||
ax.set_xlabel('First Principal Component', fontsize=text_fontsize) | ||
ax.set_ylabel('Second Principal Component', fontsize=text_fontsize) | ||
ax.tick_params(labelsize=text_fontsize) | ||
|
||
return ax | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should only have single newline at end of file
First time I used pylint, solid Haha. Just let me know if I missed anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for dragging this on, but could you add a test that sets the new parameters added by this PR and make sure it doesn't raise any exceptions?
scikitplot/decomposition.py
Outdated
xs = transformed_X[:, 0] | ||
ys = transformed_X[:, 1] | ||
for i in range(vectors.shape[0]): | ||
ax.annotate("", xy=(vectors[i, 0]*max(xs), vectors[i, 1] * max(ys)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you could do this calculation outside the for loop since it's duplicated e.g.
vectors = np.transpose(clf.components_[:2, :])
xs = transformed_X[:, 0]
ys = transformed_X[:, 1]
vectors_scaled = vectors * [max(xs), max(ys)]
Then inside the for loop:
ax.annotate("", xy=vectors_scaled[i], ...
Btw I'm not sure what vectors_scaled
actually is so you can probably name it something else.
Also, haven't tested this code so might have made a mistake somewhere. Idea is to take advantage of numpy
's matrix multiplication instead of multiplying inside a Python for loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I can work on a test.
You are absolutely correct, I should have considered matrix multiplication.
vectors_scaled
seems like an appropriate enough name. Its simply a multiplier to scale the biplot vectors so that they may be better visualized in the pca plane.
scikitplot/decomposition.py
Outdated
xycoords='data', xytext=(0, 0), textcoords='data', | ||
arrowprops={'arrowstyle': '-|>', 'ec': 'r'}) | ||
|
||
plt.text(vectors[i, 0] * max(xs) * 1.05, vectors[i, 1] * max(ys) * 1.05, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then here you could write vectors_scaled[i, 0] * 1.05, vectors_scaled[i, 1] * 1.05
Merged. Thanks a lot 👍 |
addressed enhancement on #21.
Also updated the pca_2d_projection notebook example to show biplots.