
Different results with plotly ternary vs python-ternary #140

Open
olgabot opened this issue Jun 17, 2020 · 8 comments

Comments

@olgabot

olgabot commented Jun 17, 2020

Hello,
Thank you so much for making this package! I'd like to overlay a heatmap and a scatter plot, as mentioned in #129 and addressed in #121. However, I'm having trouble using the library.

I'm plotting the median gene expression values of a cell type across three species. When I use Plotly, the result makes sense to me: there are many dots in the middle, indicating many shared genes:

[Screenshot: ternary scatter plot of the data made with Plotly]

However, when I use the same data with python-ternary, the result doesn't make sense to me. A bunch of points fall outside of the plot, and it's not clear what's happening to the rest. Here is the code:

```python
import ternary

def threewayplot(data, title=None):
    fig, tax = ternary.figure()
    tax.scatter(data.values.tolist())
    tax.set_title(title)
    tax.boundary(linewidth=1.0)

    corner_offset = 0.005
    tax.right_corner_label(data.columns[0], offset=corner_offset)
    tax.top_corner_label(data.columns[1], offset=0.12)
    tax.left_corner_label(data.columns[2], offset=corner_offset)
    tax.gridlines(color="blue", multiple=5)

    tax.ticks()

    tax.get_axes().axis('off')
    tax.clear_matplotlib_ticks()
    fig.tight_layout()

threewayplot(df_nonzero_for_ternary)
```

[Screenshot: python-ternary scatter plot of the raw data]

I thought this was a simple rescaling issue, so I divided each column by its maximum so that no value was greater than 1. However, this didn't replicate the results I saw in Plotly, and the plot still doesn't make sense to me: there are still dots outside of the plot, and the pattern doesn't match what I see in Plotly:

[Screenshot: python-ternary scatter plot after dividing each column by its maximum]

```python
threewayplot(df_nonzero_for_ternary/df_nonzero_for_ternary.max())
```

Do you know what may be happening?

Here is the data for reference: medians.csv.txt

@marcharper
Owner

So I can't speak to what Plotly is doing, but I think one difference is that ternary assumes that the coordinates to be plotted sum to a constant, in this case 1 by default. In fact ternary typically ignores the 3rd coordinate, assuming that z = scale - x - y. The data linked above doesn't have this property, so when the coordinates are projected to the planar simplex, they don't necessarily fall within the boundary triangle.

Since some of the data rows sum to more than 1, I presume that Plotly is doing some kind of truncation or normalization to keep plots in the simplex, or the meaning of a ternary scatter plot is different in their implementation.
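To make the geometry concrete, here is a minimal plain-Python sketch, assuming a unit triangle with corners (0, 0), (scale, 0) and (scale/2, scale·√3/2) and a projection that, as described above, uses only the first two coordinates. The functions `project` and `inside_triangle` are hypothetical stand-ins written for illustration, not python-ternary's API. Because the third value is dropped, a row whose first two values already exceed `scale` lands outside the boundary, while a row with the "wrong" third value is silently drawn as if it were `scale - a - b`.

```python
import math

SQRT3 = math.sqrt(3)


def project(point):
    """Map a ternary point (a, b, c) to 2D Cartesian coordinates.

    Hypothetical sketch: the third coordinate is ignored, so the point is
    implicitly treated as (a, b, scale - a - b).
    """
    a, b, _c = point
    return (a + b / 2, SQRT3 / 2 * b)


def inside_triangle(xy, scale=1.0, tol=1e-9):
    """Return True if xy lies in the triangle (0, 0), (scale, 0), (scale/2, scale*sqrt(3)/2)."""
    x, y = xy
    return (
        y >= -tol                            # above the bottom edge
        and y <= SQRT3 * x + tol             # right of the left edge
        and y <= SQRT3 * (scale - x) + tol   # left of the right edge
    )


# A row that sums to 1 lands inside the triangle.
print(inside_triangle(project((0.2, 0.3, 0.5))))   # True
# A row that sums to 2.2 (a + b = 1.3 > 1) lands outside.
print(inside_triangle(project((0.6, 0.7, 0.9))))   # False
# This row sums to 1.7, but a + b = 0.8, so it is drawn inside, at the
# same place as (0.6, 0.2, 0.2): its third value has no effect.
print(project((0.6, 0.2, 0.9)) == project((0.6, 0.2, 0.2)))  # True
```

The last case is arguably the more dangerous one: nothing looks wrong on the plot, but the point's position does not reflect its third value.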

@olgabot
Author

olgabot commented Jun 18, 2020


So is the position normalized per row rather than per column? I was confused because even after normalizing the data so that the maximum of each column was 1, points still fell outside the triangle.

I renormalized the data so the rows sum to 1 and yay, it's working now, thank you so much!!

[Screenshot: python-ternary scatter plot after normalizing each row to sum to 1]
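The difference between the two rescalings discussed above shows up on a small, made-up example (the numbers are illustrative, not the data from medians.csv.txt, and the helper names are hypothetical). Dividing each column by its maximum bounds every value by 1 but leaves row sums anywhere up to 3, whereas dividing each row by its own sum puts every point on the simplex.

```python
def scale_columns_by_max(rows):
    """Divide each column by its maximum (the first attempt above)."""
    maxima = [max(column) for column in zip(*rows)]
    return [[value / m for value, m in zip(row, maxima)] for row in rows]


def normalize_rows(rows, scale=1.0):
    """Divide each row by its own sum so its coordinates add up to `scale`."""
    normalized = []
    for row in rows:
        total = sum(row)
        if total == 0:
            raise ValueError("cannot normalize a row whose values sum to 0")
        normalized.append([scale * value / total for value in row])
    return normalized


# Made-up median expression values for three species (one row per gene).
medians = [[2.0, 4.0, 2.0], [1.0, 1.0, 2.0], [4.0, 2.0, 1.0]]

print([sum(row) for row in scale_columns_by_max(medians)])       # [2.5, 1.5, 2.0]
print([round(sum(row), 9) for row in normalize_rows(medians)])   # [1.0, 1.0, 1.0]
```

With a pandas DataFrame such as `df_nonzero_for_ternary`, the same row normalization can be written as `df.div(df.sum(axis=1), axis=0)`, and the result can be passed to `threewayplot` unchanged.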

I guess I assumed that normalization would happen within the library, or at least that there would be a check that the rows sum to 1. What do you think? I'd be happy to add it.

@olgabot olgabot closed this as completed Jun 18, 2020
@olgabot olgabot reopened this Jun 18, 2020
@marcharper
Owner

marcharper commented Jun 19, 2020

It could be useful to check whether each point's coordinates sum to scale (and maybe whether the values are all positive), and warn the user if not. If you'd like to try to add it, feel free to open a PR! Maybe project_point is a good place to add the check, or a function in TernaryAxesSubplot that the other plotting functions can call when they receive data.
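A check along the lines suggested here could look like the sketch below. `check_points` is a hypothetical helper, not an existing python-ternary function, and `rel_tol=1e-6` is an arbitrary tolerance to absorb floating-point error left over from normalization. It flags only negative values, since zero coordinates are valid points on the triangle's edges.

```python
import math
import warnings


def check_points(points, scale=1.0, rel_tol=1e-6):
    """Hypothetical validation helper (not part of python-ternary).

    Warns if any point's coordinates do not sum to `scale` or contain
    negative values, and returns the offending indices for inspection.
    """
    bad_sum = [i for i, p in enumerate(points)
               if not math.isclose(sum(p), scale, rel_tol=rel_tol)]
    negative = [i for i, p in enumerate(points) if any(v < 0 for v in p)]
    if bad_sum:
        warnings.warn(f"{len(bad_sum)} point(s) do not sum to scale={scale} "
                      f"(first at index {bad_sum[0]}); they may be drawn "
                      "outside the triangle or in the wrong place")
    if negative:
        warnings.warn(f"{len(negative)} point(s) have negative coordinates "
                      f"(first at index {negative[0]})")
    return bad_sum, negative


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = check_points([(0.2, 0.3, 0.5), (0.6, 0.7, 0.9), (-0.1, 0.6, 0.5)])

print(result)       # ([1], [2])
print(len(caught))  # 2
```

Warning rather than raising keeps existing scripts working while still surfacing the "sum to a constant" assumption to new users.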

@cmacdonald


I think even having this stated in the README.md and in the introduction Jupyter notebook would be helpful. I liked the idea of a ternary notebook, but spent hours scratching my head before I found the "sum to constant" constraint mentioned on Wikipedia.

@marcharper
Owner

Hi @cmacdonald, do you want to open a PR with a change to the README where you would have liked a warning or statement? You should be able to do it easily through the GitHub interface.

@ivan-marroquin

Hi @marcharper and @cmacdonald

Thanks for the detailed discussion on the need for each row to sum to scale. I think it would be beneficial to include this information on the main GitHub page and in the documentation, as well as a nice example of how to perform the normalization.

In my case, I have a data set with the following characteristics:
"A" --> magnitude in the thousands
"B" --> magnitude in the tens
"C" --> magnitude between [1,2]

So I proceed in two steps to produce a ternary plot: (i) min-max scaling of each column, and (ii) row normalization (as done by @olgabot).
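The two-step preprocessing described here can be sketched in plain Python as follows. The function name and the sample values (chosen only to mimic the stated magnitudes of "A", "B" and "C") are illustrative. The sketch refuses two edge cases rather than guessing: a constant column cannot be min-max scaled, and a row that sits at the minimum of every column becomes all zeros and cannot be row-normalized.

```python
def minmax_then_row_normalize(rows, scale=1.0):
    """(i) Min-max scale each column to [0, 1], then (ii) divide each row by its sum."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    spans = [max(c) - min(c) for c in columns]
    if any(span == 0 for span in spans):
        raise ValueError("a constant column cannot be min-max scaled")

    result = []
    for row in rows:
        scaled = [(v - lo) / span for v, lo, span in zip(row, lows, spans)]
        total = sum(scaled)
        if total == 0:
            raise ValueError("row is at the minimum of every column; cannot normalize")
        result.append([scale * v / total for v in scaled])
    return result


# Columns: "A" in the thousands, "B" in the tens, "C" in [1, 2].
data = [[1000.0, 50.0, 1.5], [3000.0, 10.0, 1.0], [2000.0, 30.0, 2.0]]
for row in minmax_then_row_normalize(data):
    print([round(v, 3) for v in row])
# [0.0, 0.667, 0.333]
# [1.0, 0.0, 0.0]
# [0.25, 0.25, 0.5]
```

Step (i) maps each column's smallest value to 0, so it changes the proportions being plotted; whether that is appropriate depends on what the plot is meant to show.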

@marcharper
Owner

marcharper commented Oct 16, 2021

Hi, thanks for the suggestions.

The Wikipedia page on ternary plots explains that the coordinates have to sum to a constant. There's a link to the Wikipedia page at the top of the documentation.

This library just plots. There are many ways to normalize or otherwise transform data, and the library doesn't know which method the user wants. For almost any scenario, there is example code on Stack Overflow and other sites that explains how to, for example, normalize a pandas DataFrame by row or column.

@ivan-marroquin

@marcharper

Indeed, there are many ways to normalize data sets, and it is up to users to decide which is best. However, I do think it would be very helpful to let new users of this nice package know that the coordinates must sum to a constant (1, 100, or something else). Including the link to Wikipedia would be an extra benefit!

And again, many thanks for such a great package!
