Different results with plotly ternary vs python-ternary #140

olgabot · 2020-06-17T16:18:32Z

Hello,
Thank you so much for making this package! I'd like to overlay heatmap + scatter as mentioned in this issue: #129 and addressed in #121. However, I'm having trouble using the library.

I'm plotting median values of gene expression of a cell type across three species. When I use plotly, the result makes sense to me, where there are many dots in the middle, indicating many shared genes:

However, when I use that same data for python-ternary, the result doesn't make any sense to me. There are a bunch of points outside of the plot, and it's not clear what's happening to the rest.

Here is the code:

import ternary

def threewayplot(data, title=None):
    fig, tax = ternary.figure()
    tax.scatter(data.values.tolist())
    tax.set_title(title)
    tax.boundary(linewidth=1.0)
    
    corner_offset = 0.005
    tax.right_corner_label(data.columns[0], offset=corner_offset)
    tax.top_corner_label(data.columns[1], offset=0.12)
    tax.left_corner_label(data.columns[2], offset=corner_offset)
    tax.gridlines(color="blue", multiple=5)

    tax.ticks()

    tax.get_axes().axis('off')
    tax.clear_matplotlib_ticks()
    fig.tight_layout()
    
threewayplot(df_nonzero_for_ternary)

I thought this was a simple rescaling issue and divided each column by its maximum so there were no values greater than 1, but this didn't replicate the results I saw in Plotly. It also doesn't make sense to me, as there are still dots outside of the plot and the pattern doesn't match what I see in Plotly:

threewayplot(df_nonzero_for_ternary/df_nonzero_for_ternary.max())

Do you know what may be happening?

Here is the data for reference: medians.csv.txt

marcharper · 2020-06-18T03:23:24Z

So I can't speak to what Plotly is doing, but I think one difference is that ternary assumes that the coordinates to be plotted sum to a constant, in this case 1 by default. In fact ternary typically ignores the 3rd coordinate, assuming that z = scale - x - y. The data linked above doesn't have this property, so when the coordinates are projected to the planar simplex, they don't necessarily fall within the boundary triangle.

Since some of the data rows sum to more than 1, I presume that Plotly is doing some kind of truncation or normalization to keep plots in the simplex, or the meaning of a ternary scatter plot is different in their implementation.
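
To illustrate the projection described above, here is a minimal sketch of how a three-component point can be mapped to the plane using only its first two coordinates. The names `project` and `inside_triangle` and the exact formula are assumptions for illustration and may differ from the library's own `project_point`; the sketch only aims to show why a row whose sum differs from `scale` can land outside the triangle.

```python
import math

SQRT3_OVER_2 = math.sqrt(3) / 2

def project(point):
    """Map (a, b, c) to 2D using only a and b; c is implied as scale - a - b."""
    a, b = point[0], point[1]
    x = a + b / 2.0
    y = SQRT3_OVER_2 * b
    return x, y

def inside_triangle(xy, scale=1.0, tol=1e-9):
    """True if the 2D point lies in the triangle with corners
    (0, 0), (scale, 0) and (scale / 2, scale * sqrt(3) / 2)."""
    x, y = xy
    b = y / SQRT3_OVER_2      # recover the second coordinate
    a = x - b / 2.0           # recover the first coordinate
    c = scale - a - b         # the third coordinate the plot implicitly assumes
    return min(a, b, c) >= -tol

# A row that sums to 1 lands inside the triangle.
print(inside_triangle(project((0.2, 0.3, 0.5))))
# A row where every value is <= 1 but the sum is 2.7 lands outside,
# because the implied third coordinate is 1 - 0.9 - 0.8 = -0.7.
print(inside_triangle(project((0.9, 0.8, 1.0))))
```

The second point is the kind of row that column-wise max scaling can produce: each value is at most 1, but the row does not sum to 1.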

olgabot · 2020-06-18T16:33:32Z

So is the position being normalized per-row rather than per-column? I was confused that points still fell outside the plot even after normalizing the data such that the maximum of each column was 1.

I renormalized the data so the rows sum to 1 and yay, it's working now, thank you so much!!
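
For readers who land here with the same problem, the difference between the two normalizations can be sketched with pandas as follows. The DataFrame below is a hypothetical stand-in for the medians table (its column names and values are made up); only the last two operations matter.

```python
import pandas as pd

# Hypothetical stand-in for the medians table: one row per gene, one column per species.
df = pd.DataFrame(
    {
        "species_a": [10.0, 0.5, 3.0],
        "species_b": [5.0, 0.5, 1.0],
        "species_c": [5.0, 1.0, 0.0],
    },
    index=["gene1", "gene2", "gene3"],
)

# Column-wise max scaling: each column peaks at 1, but rows can still sum to anything.
by_column = df / df.max()

# Row-wise normalization: every row sums to 1, matching the default scale of 1.
by_row = df.div(df.sum(axis=1), axis=0)

print(by_column.sum(axis=1))  # row sums vary
print(by_row.sum(axis=1))     # every row sums to 1
```

Rows whose values are all zero would produce NaN under the row-wise division, so they would need to be dropped first; this matters less if the table is already filtered to nonzero rows, as the variable name `df_nonzero_for_ternary` suggests.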

I guess I assumed that normalization would happen within the program. Or potentially a check on the data to make sure the rows sum to 1. What do you think? I'd be happy to add it.

marcharper · 2020-06-19T04:23:34Z

It could be useful to check that the sum is equal to scale (and maybe that the values are all positive), warning the user if not. If you'd like to try to add it, feel free to open a PR! Maybe project_point is a good place to add the check, or a function in TernaryAxesSubplot that the other plotting functions can call when they receive data.
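
One possible shape for such a check is sketched below. The helper name `check_points`, its signature, and the tolerance are hypothetical and not part of python-ternary; where it would be called from inside the library is left open, as in the comment above.

```python
import warnings

def check_points(points, scale=1.0, tol=1e-8):
    """Warn about points that have negative values or do not sum to `scale`.

    Returns the indices of the offending points (hypothetical helper).
    """
    bad = []
    for i, p in enumerate(points):
        if any(v < 0 for v in p) or abs(sum(p) - scale) > tol:
            bad.append(i)
    if bad:
        warnings.warn(
            f"{len(bad)} of {len(points)} points have negative values or do not "
            f"sum to scale={scale}; they may be drawn outside the triangle.",
            UserWarning,
        )
    return bad

# Second row sums to 2.7, third row has a negative entry.
print(check_points([(0.2, 0.3, 0.5), (0.9, 0.8, 1.0), (-0.1, 0.6, 0.5)]))
```

A warning rather than an exception keeps existing scripts running while still flagging data that will be misplaced.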

cmacdonald · 2021-01-09T15:50:23Z

I think even having this stated in the README.md and in the introduction Jupyter notebook would be helpful. I liked the idea of a ternary notebook, but spent hours scratching my head before I found the "sum to constant" constraint mentioned on Wikipedia.

marcharper · 2021-01-12T03:32:42Z

Hi @cmacdonald, do you want to open a PR with a change to the readme where you would have liked a warning / statement? You should be able to do it easily through the github interface.

ivan-marroquin · 2021-10-13T16:22:17Z

Hi @marcharper and @cmacdonald

Thanks for the detailed discussion on the need to have the sum equal to scale on a row basis. I think it would be beneficial to include this information on the main GitHub page and in the documentation, along with a nice example of how to perform the normalization.

In my case, I have a data set with the following characteristics:
"A" --> magnitude in the thousands
"B" --> magnitude in the tens
"C" --> magnitude between [1,2]

So, I proceed in two steps: i) min max scaling normalization on each column, and ii) row normalization (as done by @olgabot) to produce a ternary plot
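
The two steps above can be sketched with NumPy as follows. The array values are hypothetical and only mimic the magnitudes described; whether min-max scaling is an appropriate transform depends on the data.

```python
import numpy as np

# Hypothetical values: A in the thousands, B in the tens, C between 1 and 2.
data = np.array([
    [1200.0, 15.0, 1.1],
    [3400.0, 42.0, 1.9],
    [2500.0, 30.0, 1.5],
    [4100.0, 22.0, 1.0],
])

# Step i: min-max scale each column to [0, 1].
col_min = data.min(axis=0)
col_range = data.max(axis=0) - col_min
scaled = (data - col_min) / col_range

# Step ii: divide each row by its sum so the three coordinates sum to 1.
row_sums = scaled.sum(axis=1, keepdims=True)
keep = row_sums[:, 0] > 0  # a row made of all column minima sums to 0
points = scaled[keep] / row_sums[keep]

print(points.sum(axis=1))  # every remaining row sums to 1
```

Note that min-max scaling maps each column's minimum to 0, so a row holding the minimum of all three columns cannot be row-normalized and is dropped here. The result can be passed to `tax.scatter(points.tolist())` in the same way as in the original code.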

marcharper · 2021-10-16T16:46:52Z

Hi, thanks for the suggestions.

The wikipedia page on ternary plots explains the coordinates have to sum to a constant. There's a link to the wikipedia page on the top of the documentation.

This library just plots. There are many ways to normalize or otherwise transform data and the library doesn't know which methods the user wants. For almost any scenario, there exists example code on Stack Overflow and other sites that explain how to for example normalize a Pandas dataframe by row or column.

ivan-marroquin · 2021-10-16T18:43:45Z

@marcharper

Indeed, there are many ways to normalize data sets and it is up to users to decide which is best. However, I do think it would be very helpful to let new users of this nice package know that the coordinates must sum to a constant (either 1 or 100, or even something else). Including the link to Wikipedia would be an extra benefit!

And again, many thanks for such a great package!

olgabot closed this as completed Jun 18, 2020

olgabot reopened this Jun 18, 2020

marcharper mentioned this issue Oct 17, 2021: Add warning if input data doesn't sum to a constant #183 (Open)

fairliereese mentioned this issue Mar 17, 2022: ternary does not seem to use third axis value when plotting lines #187 (Closed)
