Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Clustergram when passing labels. #492

Merged
merged 16 commits into from
Mar 16, 2020
Merged

Fix Clustergram when passing labels. #492

merged 16 commits into from
Mar 16, 2020

Conversation

mkcor
Copy link
Contributor

@mkcor mkcor commented Feb 28, 2020

Fixes #449

About

  • I am closing an issue
  • This is a new component
  • I am adding a feature to an existing component, or improving an existing feature

Description of changes

When clustering along rows (resp. columns), the bug would appear for non-unique row (resp. column) labels mixed with non-unique rows (resp. columns). For example, it would get unnoticed when calling Clustergram(data, row_labels=row_labels) if

> data
array([[1, 1, 1, 1],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [3, 3, 3, 3],
       [1, 1, 1, 1],
       [3, 3, 3, 3]])

and

> row_labels
['a', 'b', 'a', 'b', 'a', 'b']

but the clustering would fail if, instead,

> row_labels
['a', 'b', 'b', 'b', 'b', 'b']

(as used in the unit tests), leading to the following (wrong) clustering:

array([[3, 3, 3, 3],
       [1, 1, 1, 1],
       [3, 3, 3, 3],
       [3, 3, 3, 3],
       [3, 3, 3, 3],
       [3, 3, 3, 3]])

Apparently I forgot to push after I committed 3c83048; I wanted to showcase the (wrong) clustering result (along columns):

ipdb> curves_dict['heatmap']['z']                                                                         
array([[3, 1, 3, 3, 3, 3],
       [3, 1, 3, 3, 3, 3],
       [3, 1, 3, 3, 3, 3],
       [3, 1, 3, 3, 3, 3]])
ipdb> clustered_data                                                                                      
array([[1, 1, 1, 3, 3, 3],
       [1, 1, 1, 3, 3, 3],
       [1, 1, 1, 3, 3, 3],
       [1, 1, 1, 3, 3, 3]])

before implementing the fix (f23f7ed).

@mkcor mkcor force-pushed the clustergram-row-labels branch 2 times, most recently from 5575e18 to c9a8cf4 Compare February 29, 2020 18:03
@mkcor
Copy link
Contributor Author

mkcor commented Feb 29, 2020

I noticed that we didn't have any integration test for prop hidden_labels so I added one with cf5c66b. While running the Clustergram integration tests locally, I noticed that (row or column) labels would not show up in the (testing) 'simple app' although I passed them via, e.g.,

dash_bio.Clustergram(
    data=data,
    row_labels=row_labels,
    column_labels=col_labels
)

e0a60eb fixed it.

With the latter change, I also meant to echo what @joelostblom suggested at some point, i.e., (whenever input data is a dataframe) we should use the dataframe's labels by default: Say df is the input dataframe, df.index (resp. df.columns) get passed as row_labels (resp. column_labels) by default.

Note that, in this spirit, the Clustergram demo app uses prop/parameter row_labels_source (which is not readily available in the Clustergram component).

Finally, this change (e0a60eb) is 'non-breaking' even if it lays the ground for a new behaviour. To give all the details, it does not affect the current rendering in terms of not showing row (resp. column) labels whenever row_labels (resp. column_labels) is not specified, even if hidden_labels=['row'] (resp. hidden_labels=['col']) is not specified either. Indeed:

Copy link
Contributor

@shammamah-zz shammamah-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a CHANGELOG entry and update the version. Other than that, this looks good - thanks!

@mkcor
Copy link
Contributor Author

mkcor commented Mar 12, 2020

I'm not sure which version I should update to: The latest changes to CHANGELOG by @Marc-Andre-Rivet introduced a ## [Unreleased] section with two changes, so I'm not sure the next version should be patch (## [0.4.8] - ) or minor (## [0.5.0] - ).

@mkcor
Copy link
Contributor Author

mkcor commented Mar 13, 2020

@shammamah I added 1fc32b4 assuming the build would be fine and we could release today, but apparently it's not the case! I went like

npm ci
npm run build
npm i  # to update package-lock.json
npm audit fix  # following npm's suggestion

@Marc-Andre-Rivet would you mind having a look? Thanks!

@mkcor
Copy link
Contributor Author

mkcor commented Mar 13, 2020

By the way, what happened to the CI visual (Percy) check?

@shammamah-zz
Copy link
Contributor

@mkcor

There is nothing in package-lock.json that should be updated in this PR (other than the version number), since it's updating a Python component. Please refer to the contributing guidelines: npm i should only be run when installing a new npm package (which you would do if working on a React component).

Please revert the changes and only run npm ci; npm run build after updating package.json.

@mkcor
Copy link
Contributor Author

mkcor commented Mar 13, 2020

Thanks for the quick response, @shammamah ! The changes to package-lock.json other than the version number come from running npm audit fix indeed.

But the version number change for dash-bio in package-lock.json was obtained from running npm i -- how else can we achieve the package version update in package-lock.json?

@shammamah-zz
Copy link
Contributor

@mkcor If you just run npm i after npm ci; npm run build it should update package-lock.json in the way that we would expect (just the version of the package itself gets changed)! I think the only issue here was the npm audit fix.

Re: the tests, we're seeing something similar in other Dash repos (e.g., here). This might have to do with the pytest-related changes made in plotly/dash#1151.

@mkcor
Copy link
Contributor Author

mkcor commented Mar 13, 2020

@shammamah exactly, that's what I did and reported. So I am allowed to run npm i? ^^

@mkcor
Copy link
Contributor Author

mkcor commented Mar 13, 2020

Done. It's just a little stressful to let

found 37991 vulnerabilities (37850 low, 136 moderate, 5 high)
  run `npm audit fix` to fix them, or `npm audit` for details

be, with 'high' printed in red... :s

Thanks for pointing me to related Python test issues!

@mkcor
Copy link
Contributor Author

mkcor commented Mar 16, 2020

@shammamah @Marc-Andre-Rivet please review/approve: Could we release today? Thanks!

@shammamah-zz shammamah-zz self-requested a review March 16, 2020 19:05
Copy link
Contributor

@shammamah-zz shammamah-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💃 Thank you for taking care of this! :)

@mkcor mkcor merged commit c920884 into master Mar 16, 2020
@mkcor mkcor deleted the clustergram-row-labels branch March 16, 2020 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Clustergram row labels change heatmap values/colors
2 participants