Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/pca 2d projection #75

Merged
merged 6 commits into from
Jan 10, 2018

Conversation

ryanliwag
Copy link
Contributor

addressed enhancement on #21.
Also updated the pca_2d_projection notebook example to show biplots.

Copy link
Owner

@reiinakano reiinakano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks a lot!

Just a few minor stylefixes needed, otherwise LGTM. Maybe you could try using a Python linter with whatever IDE you're using.

title_fontsize="large", text_fontsize="medium"):
figsize=None, cmap='Spectral',
title_fontsize="large", text_fontsize="medium",
biplot=False, feature_labels=None):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These new arguments should be before ax


feature_labels (array-like, shape (n_classes), optional): List of labels
that represent each feature of X. It's index position must also be
relative to the features. If none is given, then labels will be
Copy link
Owner

@reiinakano reiinakano Jan 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If ``None``

biplots. If false, the biplots are not generated.

feature_labels (array-like, shape (n_classes), optional): List of labels
that represent each feature of X. It's index position must also be
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its*

@@ -156,14 +166,36 @@ def plot_pca_2d_projection(clf, X, y, title='PCA 2-D Projection', ax=None,
classes = np.unique(np.array(y))

colors = plt.cm.get_cmap(cmap)(np.linspace(0, 1, len(classes)))


vectors=np.transpose(clf.components_[0:2, :])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Add spaces around equals sign.

Also, not sure about this but can 0:2 be written as simply :2?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sry, Bad habit on indexing, I'll fix that.

vectors=np.transpose(clf.components_[0:2, :])

xs = transformed_X[:,0]
ys = transformed_X[:,1]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: space after comma

Copy link
Owner

@reiinakano reiinakano Jan 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

vectors, xs, and ys should be evaluated within the if biplot: block since they're only used there.

xs = transformed_X[:,0]
ys = transformed_X[:,1]


Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Remove one newline

if biplot:
for i in range(vectors.shape[0]):
ax.annotate("", xy=(vectors[i,0]*max(xs),
vectors[i,1]*max(ys)), xycoords='data',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: space after comma


if feature_labels is None:
plt.text(vectors[i,0]*max(xs)*1.05, vectors[i,1]*max(ys)*1.05,
"Variable"+str(i), color='b', fontsize=text_fontsize)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style: Space after comma, space around operators

"Variable"+str(i), color='b', fontsize=text_fontsize)
else:
plt.text(vectors[i,0]*max(xs)*1.05, vectors[i,1]*max(ys)*1.05,
feature_labels[i], color='b', fontsize=text_fontsize)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if-else branch here should only set a single variable feature_label since the rest of the calculations are duplicated.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually this is cleaner

plt.text(vectors[i,0]*max(xs)*1.05, vectors[i,1]*max(ys)*1.05,
feature_labels[i] if feature_labels else "Variable" + str(i), color='b', fontsize=text_fontsize)

ax.legend(loc='best', shadow=False, scatterpoints=1,
fontsize=text_fontsize)
ax.set_xlabel('First Principal Component', fontsize=text_fontsize)
ax.set_ylabel('Second Principal Component', fontsize=text_fontsize)
ax.tick_params(labelsize=text_fontsize)

return ax

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should only have single newline at end of file

@ryanliwag
Copy link
Contributor Author

First time I used pylint, solid Haha. Just let me know if I missed anything.

Copy link
Owner

@reiinakano reiinakano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for dragging this on, but could you add a test that sets the new parameters added by this PR and make sure it doesn't raise any exceptions?

xs = transformed_X[:, 0]
ys = transformed_X[:, 1]
for i in range(vectors.shape[0]):
ax.annotate("", xy=(vectors[i, 0]*max(xs), vectors[i, 1] * max(ys)),
Copy link
Owner

@reiinakano reiinakano Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you could do this calculation outside the for loop since it's duplicated e.g.

vectors = np.transpose(clf.components_[:2, :])
xs = transformed_X[:, 0]
ys = transformed_X[:, 1]
vectors_scaled = vectors * [max(xs), max(ys)]

Then inside the for loop:

ax.annotate("",  xy=vectors_scaled[i], ...

Btw I'm not sure what vectors_scaled actually is so you can probably name it something else.

Also, haven't tested this code so might have made a mistake somewhere. Idea is to take advantage of numpy's matrix multiplication instead of multiplying inside a Python for loop.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I can work on a test.

You are absolutely correct, I should have considered matrix multiplication.
vectors_scaled seems like an appropriate enough name. Its simply a multiplier to scale the biplot vectors so that they may be better visualized in the pca plane.

xycoords='data', xytext=(0, 0), textcoords='data',
arrowprops={'arrowstyle': '-|>', 'ec': 'r'})

plt.text(vectors[i, 0] * max(xs) * 1.05, vectors[i, 1] * max(ys) * 1.05,
Copy link
Owner

@reiinakano reiinakano Jan 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then here you could write vectors_scaled[i, 0] * 1.05, vectors_scaled[i, 1] * 1.05

@reiinakano reiinakano merged commit cba009f into reiinakano:master Jan 10, 2018
@reiinakano
Copy link
Owner

Merged. Thanks a lot 👍

@reiinakano reiinakano mentioned this pull request Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants