Adding orders of magnitude to fractional values in plot_bar_stacked #43

Closed
magruca opened this issue Nov 27, 2018 · 6 comments

Comments

@magruca

magruca commented Nov 27, 2018

Is this a BUG REPORT or FEATURE REQUEST?:

Uncomment only one, leave it on its own line:

type: bug

type: feature

Environment:

  • Chartify version(s): beta (pip install)
  • Operating System(s): Linux 4.19.3-300.fc29.x86_64 x86_64
  • Python version(s): 3.7

What happened:
In stacked bar charts, floating-point values are displayed about two orders of magnitude too large (or more if I add decimal places to the tick format).

What you expected to happen:
When setting the y-axis tick format to '0%', I should get percentage values ranging from 0% to 100%.

How to reproduce it (as minimally and precisely as possible):
Plot all chromosome data using stacked bar plot

RdGy = chartify.color_palettes['RdGy']
shifted_RdGy = RdGy.shift_palette('black', percent=20)
shifted_RdGy.show()

(chartify.Chart(blank_labels=True,
                x_axis_type='categorical')
 .style.set_color_palette('diverging', palette=shifted_RdGy)
 .plot.bar_stacked(
     data_frame=results,
     categorical_columns='#ID',
     categorical_order_by='labels',
     categorical_order_ascending=True,
     numeric_column='Covered_percent',
     stack_column='sample',
     normalize=False)
 .set_legend_location('outside_right', orientation='vertical')
 .axes.set_yaxis_tick_format('0.0%')
 .axes.set_xaxis_tick_orientation('vertical')
 .show('png'))

Anything else we need to know?:
Included the df and ipynb if you want to replicate.
Allen_concatData.txt
pileup.zip

@cphalpert
Contributor

Hi @magruca

It looks like percent formatting is working as expected (see screenshot below).

Percents are expected to be expressed as decimals with 0.0 - 1.0 mapping to 0% to 100%. E.g. 0.1 will be displayed as 10%.

In your case you can divide the Covered percent by 100 to get it to work.
results['Covered_percent'] = results['Covered_percent'] / 100.
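
As a quick sanity check of the scaling, here is a minimal sketch in plain Python. It assumes the percent tick format behaves as described above (fractions in, percentages out) and uses Python's own `%` format spec as a stand-in, since that follows the same convention. The list names are made up for illustration and the values are examples of the kind found in Covered_percent.

```python
# Example coverage values stored as percentages (6.91 means 6.91%).
covered_percent = [6.91, 5.53, 5.38, 4.03, 4.89]

# Rescale to fractions so that 1.0 corresponds to 100%.
covered_fraction = [value / 100 for value in covered_percent]

# Python's '%' format spec multiplies by 100 before appending the % sign.
unscaled_labels = [format(value, '.1%') for value in covered_percent]
scaled_labels = [format(value, '.1%') for value in covered_fraction]

print(unscaled_labels[0])  # 691.0%
print(scaled_labels[0])    # 6.9%
```

The unscaled label is off by exactly the factor of 100 (two orders of magnitude) that the percent format applies.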

image

@magruca
Author

magruca commented Nov 28, 2018

The problem I'm having isn't with the normal bar plot but with the stacked bar plot. Your explanation of the percent function makes sense, and the base functions might be the same between both plots -- I haven't looked into it -- but with or without the percent, I'm getting an extra couple orders of magnitude:

image

Code (same as above, just removing the axis tick format):

(chartify.Chart(blank_labels=True,
                x_axis_type='categorical')
 .style.set_color_palette('diverging', palette=shifted_RdGy)
 .plot.bar_stacked(
     data_frame=results,
     categorical_columns='#ID',
     categorical_order_by='labels',
     categorical_order_ascending=True,
     numeric_column='Covered_percent',
     stack_column='sample',
     normalize=False)
 .set_legend_location('outside_right', orientation='vertical')
 .axes.set_xaxis_tick_orientation('vertical')
 .show('png'))

with Covered_percent data formatted as such:

6.91
5.53
5.38
4.03
4.89

I do not have the same issue with a standard bar plot:

image

RdGy = chartify.color_palettes['RdGy']
shifted_RdGy = RdGy.shift_palette('black', percent=20)
shifted_RdGy.show()

color_order = [
    'SRR1105736', 'SRR1224574', 'SRR1105738', 'SRR1105739',
    'SRR1105740', 'SRR1105737', 'SRR1105741', 'SRR1224573',
]
sample_order = [
    'SRR1105736', 'SRR1224574', 'SRR1105738', 'SRR1105739',
    'SRR1105740', 'SRR1105737', 'SRR1105741', 'SRR1224573',
]

ch = chartify.Chart(blank_labels=True, y_axis_type='categorical')
ch.style.set_color_palette('sequential', palette=shifted_RdGy)
ch.axes.set_yaxis_label('Sample ID')
ch.axes.set_xaxis_label('Chr1 Coverage (%)')
ch.plot.bar(
    data_frame=results,
    categorical_columns='sample',
    numeric_column='Covered_percent',
    categorical_order_by=sample_order,
    categorical_order_ascending=False,
    color_column='sample',
    color_order=color_order)
ch.axes.set_xaxis_tick_format('0%')
ch.show('png')

Thanks for the help!

@cphalpert
Contributor

Are you sure that it's not an issue with the input data?

Summing over ID in your sample data gives me totals in the 100-200 range, which is consistent with what's shown in the stacked bar graph that you shared:

results.groupby('#ID')['Covered_percent'].sum()
image
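
To illustrate why a stacked bar ends up taller than any single sample's value, here is a small sketch on hypothetical data. The frame `toy`, the sample names A, B, C and all numbers are made up; only the column names mirror the ones in this thread.

```python
import pandas as pd

# Hypothetical data: two chromosomes, three samples, values in percent.
toy = pd.DataFrame({
    '#ID': ['chr1', 'chr1', 'chr1', 'chr2', 'chr2', 'chr2'],
    'sample': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Covered_percent': [32.0, 20.0, 10.0, 25.0, 15.0, 5.0],
})

# A stacked bar places the segments end to end, so its height is the sum.
stacked_height = toy.groupby('#ID')['Covered_percent'].sum()

# The largest single segment is only the max within each category.
largest_sample = toy.groupby('#ID')['Covered_percent'].max()

print(stacked_height['chr1'])  # 62.0
print(largest_sample['chr1'])  # 32.0
```

With eight samples per #ID, as in the data here, sums in the 100-200 range are plausible even when no individual value exceeds about 32.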

@magruca
Author

magruca commented Nov 28, 2018

Ah I see, thank you! In my mind I was imagining a stacked bar plot where each categorical value's maximum is set by its largest sample value (e.g. 32% for chr1 in this case), as if the bars were layered one behind another along the z-axis, rather than a plot where the values from each sample are summed.

@cphalpert
Contributor

You could use this if the samples rank in the same order for every #ID:

sample_order = results.groupby('sample')['Covered_percent'].max().sort_values().index
xaxis_order = ['chr' + str(number) for number in range(1, 21)] + ['chrX', 'chrY']

results = results.sort_values(['#ID', 'Covered_percent'])
# transform keeps the original row index, so the result aligns with `results` on assignment
results['incremental_percent'] = results.groupby('#ID')['Covered_percent'].transform(lambda x: x - x.shift().fillna(0))

# Plot all chromosome data using stacked bar plot

RdGy = chartify.color_palettes['RdGy']
shifted_RdGy = RdGy.shift_palette('black', percent=20)
shifted_RdGy.show()

ch = (chartify.Chart(blank_labels=True,
                x_axis_type='categorical')
 .style.set_color_palette('diverging', palette=shifted_RdGy)
     )

ch.plot.bar_stacked(
     data_frame=results,
     categorical_columns='#ID',
     categorical_order_by=xaxis_order,
     numeric_column='incremental_percent',
     stack_column='sample',
     stack_order=sample_order,
     normalize=False)
ch.set_legend_location('outside_right', orientation='vertical')
# ch.axes.set_yaxis_tick_format('0%')
ch.axes.set_xaxis_tick_orientation('vertical')
ch.show()
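
A rough check of the incremental trick on hypothetical data (the frame `toy`, the sample names and the numbers are made up). This sketch uses `groupby(...).diff()` rather than the shift-based expression above; the two should give the same increments. The point is that once each segment holds only the difference from the next-smaller sample, the stack tops out at the largest sample value.

```python
import pandas as pd

# Hypothetical data: two chromosomes, three samples, values in percent.
toy = pd.DataFrame({
    '#ID': ['chr1', 'chr1', 'chr1', 'chr2', 'chr2', 'chr2'],
    'sample': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Covered_percent': [32.0, 20.0, 10.0, 25.0, 15.0, 5.0],
})

# Sort ascending within each #ID so each row is compared to the next-smaller one.
toy = toy.sort_values(['#ID', 'Covered_percent'])

# Difference from the previous row in the same group; the first row of each
# group has no predecessor, so it keeps its own value.
toy['incremental_percent'] = (
    toy.groupby('#ID')['Covered_percent'].diff().fillna(toy['Covered_percent'])
)

stacked_height = toy.groupby('#ID')['incremental_percent'].sum()
largest_sample = toy.groupby('#ID')['Covered_percent'].max()

print(toy.loc[toy['#ID'] == 'chr1', 'incremental_percent'].tolist())  # [10.0, 10.0, 12.0]
print(stacked_height['chr1'], largest_sample['chr1'])  # 32.0 32.0
```

This only reads correctly in the chart when the samples rank in the same order for every #ID, since `stack_order` is a single global order.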

If the order changes from one #ID to another, I'd try something like:

RdGy = chartify.color_palettes['RdGy']
shifted_RdGy = RdGy.shift_palette('black', percent=20)
shifted_RdGy.show()

ch = (chartify.Chart(blank_labels=True,
                x_axis_type='categorical')
 .style.set_color_palette('diverging', palette=shifted_RdGy)
     )

ch.plot.parallel(
     data_frame=results,
     categorical_columns='#ID',
     categorical_order_by=xaxis_order,
     numeric_column='Covered_percent',
     color_column='sample',)
ch.set_legend_location('outside_right', orientation='vertical')
# ch.axes.set_yaxis_tick_format('0%')
ch.axes.set_xaxis_tick_orientation('vertical')
ch.show()

@magruca
Author

magruca commented Nov 28, 2018

Gotcha that makes sense -- thanks for the help! Sorry for the confusion on my end.
