Plot GC content from user provided dataframe (.tsv) #21

Closed
LFelipe-B opened this issue Apr 12, 2023 · 2 comments
Labels
question Further information is requested


@LFelipe-B

Dear moshi4, I wish to thank you for the earlier implementation that accepts a multi-contig GFF file as user input; it was really helpful. May I now ask for help on how to plot GC content from an uploaded table (or other similar data) as a separate track, either as a colorbar/heatmap or as lines for a sliding window? The examples only show GC content calculated from .gbk files, and I don't have these files, only .gff files. I think this would be useful for the whole community, since your tool is great and really user-friendly!
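If the genome sequence itself is available (for example from a FASTA file, or from a `##FASTA` section at the end of a GFF3 file), GC content can be computed directly without a GenBank file. The sketch below assumes the sequence has already been read into a string. `gc_content_windows`, `window_size`, and `step_size` are hypothetical names for illustration, the default window sizes are arbitrary (not pyCirclize's), and positions are reported as window start coordinates.

```python
def gc_content_windows(seq, window_size=500, step_size=250):
    """Sliding-window GC content of a DNA sequence.

    Hypothetical helper (not part of pyCirclize). Returns two lists:
    window start positions (0-based) and GC percent in each window.
    """
    seq = seq.upper()  # handle soft-masked (lowercase) bases
    positions, gc_percents = [], []
    # Only full windows are used; a sequence shorter than one window
    # gets a single window covering all of it
    last_start = max(len(seq) - window_size, 0)
    for start in range(0, last_start + 1, step_size):
        window = seq[start:start + window_size]
        if not window:  # empty sequence
            break
        gc = window.count("G") + window.count("C")
        positions.append(start)
        gc_percents.append(100 * gc / len(window))
    return positions, gc_percents


# Example: three 4 bp windows, stepping by 2 bp
positions, gc_percents = gc_content_windows("GGCCAATT", window_size=4, step_size=2)
```

For a multi-contig genome, the function would be called once per contig. The two lists can then be written to a `Position`/`GCcontent` TSV with `csv.writer(f, delimiter="\t")` and plotted like any other per-position table.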

I assume that in the colorbar example (https://moshi4.github.io/pyCirclize/plot_tips/) the user could provide the dataframe in place of matrix1 = np.random.randint(vmin1, vmax1, (5, 100))?
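As far as the plot tips example suggests, `matrix1` is a 2D array of shape (5, 100): 5 rows stacked inside the track and 100 columns spread along the sector. A user table can take its place if each value column becomes one row. The following is a minimal standard-library sketch, assuming a TSV with a `Position` column followed by numeric value columns; `tsv_to_matrix` and the column names `sampleA`/`sampleB` are hypothetical.

```python
import csv
import io


def tsv_to_matrix(handle, value_columns):
    """Build a heatmap matrix: one row per value column, one column per TSV line.

    Hypothetical helper; value_columns are the TSV headers to plot as rows.
    The Position column is not used, so rows are assumed evenly spaced.
    """
    rows = list(csv.DictReader(handle, delimiter="\t"))
    return [[float(row[col]) for row in rows] for col in value_columns]


# Example: two hypothetical value columns over three positions
sample = "Position\tsampleA\tsampleB\n0\t1\t2\n60\t3\t4\n120\t5\t6\n"
matrix = tsv_to_matrix(io.StringIO(sample), ["sampleA", "sampleB"])
# matrix has 2 rows (sampleA, sampleB) and 3 columns (positions 0, 60, 120)
```

With a real file, pass `open(path, newline="")` instead of the `StringIO`, and wrap the result in `numpy.array(matrix)` before handing it to the track's heatmap method.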

For the lines, I am not sure. Another idea I had was to provide the GC value only for the CDSs and map each prot_ID to its value, but I have no clue how to add this data to a specific track. I hope I am being clear, and sorry if this is a trivial problem for Python users. Thank you!
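Mapping a per-CDS value onto the plot comes down to joining the user's table to CDS coordinates by ID. A minimal sketch, assuming the GFF3 CDS lines carry a `protein_id` attribute that matches the IDs in the table, and that the table has already been loaded into a dict such as `{"P1": 40.0}`. `cds_gc_intervals` and `parse_attributes` are hypothetical helpers, and percent-encoded attribute values are not decoded.

```python
def parse_attributes(attr_field):
    """Parse a GFF3 column-9 string such as 'ID=c1;protein_id=P1' into a dict."""
    attrs = {}
    for item in attr_field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def cds_gc_intervals(gff_lines, gc_by_protein, id_key="protein_id"):
    """Join per-protein GC values to CDS coordinates from GFF3 lines.

    Hypothetical helper. Returns (seqid, start, end, strand, gc) tuples with
    0-based, end-exclusive coordinates; CDSs without a matching ID are skipped.
    """
    intervals = []
    for line in gff_lines:
        if line.startswith("##FASTA"):  # embedded sequence follows; no more features
            break
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        prot_id = parse_attributes(cols[8]).get(id_key)
        if prot_id in gc_by_protein:
            # GFF3 coordinates are 1-based and inclusive
            start, end = int(cols[3]) - 1, int(cols[4])
            intervals.append((cols[0], start, end, cols[6], gc_by_protein[prot_id]))
    return intervals


gff_lines = [
    "##gff-version 3\n",
    "ctg1\tsrc\tCDS\t1\t90\t.\t+\t0\tID=c1;protein_id=P1\n",
    "ctg1\tsrc\tCDS\t101\t200\t.\t-\t0\tID=c2;protein_id=P2\n",
]
intervals = cds_gc_intervals(gff_lines, {"P1": 40.0})
```

With a real file, an open file handle can be passed as `gff_lines`. Each tuple's seqid selects the sector and its (start, end) pair gives the x-position and width for drawing one value per CDS, e.g. as bars; check the pyCirclize API docs for the exact plotting call.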

@moshi4
Owner

moshi4 commented Apr 13, 2023

Here is example code for your questions:

  • How to plot GC content from a file
  • How to plot GC content with `fill_between` and `heatmap` (with colorbar)
```python
from pycirclize import Circos
import pandas
from pycirclize.parser import Genbank, Gff
from pycirclize.utils import load_prokaryote_example_file

# Generate GC content example file (gc_content.tsv)
# (demo only: with your own table, start from pandas.read_csv below)
gbk_file = load_prokaryote_example_file("enterobacteria_phage.gbk")
gbk = Genbank(gbk_file)
pos_list, gc_contents = gbk.calc_gc_content()
gc_content_df = pandas.DataFrame({"Position": pos_list, "GCcontent": gc_contents})
gc_content_tsv_file = "gc_content.tsv"
gc_content_df.to_csv(gc_content_tsv_file, sep="\t", index=False)

# Plot genomic features & GC content (from GFF & GC content file)
gff_file = load_prokaryote_example_file("enterobacteria_phage.gff")
gff = Gff(gff_file)

# One sector spanning the whole genome, drawn from 0 to 320 degrees
circos = Circos(sectors={gff.name: gff.range_size}, start=0, end=320)
sector = circos.sectors[0]

# Outer axis with a tick every 5 Kb
outer_track = sector.add_track(r_lim=(100, 100))
outer_track.xticks_by_interval(5000, label_formatter=lambda v: f"{v / 1000:.0f} Kb", show_bottom_line=True)
# Forward (+1) and reverse (-1) strand CDS tracks
f_cds_track = sector.add_track(r_lim=(94, 98), r_pad_ratio=0.1)
f_cds_track.genomic_features(gff.extract_features(target_strand=1), fc="tomato")
r_cds_track = sector.add_track(r_lim=(90, 94), r_pad_ratio=0.1)
r_cds_track.genomic_features(gff.extract_features(target_strand=-1), fc="skyblue")

# Load GC content from the TSV file
df = pandas.read_csv(gc_content_tsv_file, sep="\t")
pos_list, gc_contents = df["Position"].to_numpy(), df["GCcontent"].to_numpy()
max_gc, min_gc = gc_contents.max(), gc_contents.min()

# GC content as a filled line plot, with y-ticks at the min and max GC
gc_fill_track = sector.add_track(r_lim=(75, 85))
gc_fill_track.grid()
gc_fill_track.fill_between(pos_list, gc_contents, y2=min_gc, vmin=min_gc, color="lightgrey")
yticks = [min_gc, max_gc]
yticks_label = [f"{y:.1f}" for y in yticks]
gc_fill_track.yticks(yticks, yticks_label, vmin=min_gc, label_size=6, side="left")

# GC content as a one-row heatmap: reshape(1, -1) turns the 1D array into a (1, N) matrix
gc_heatmap_track = sector.add_track(r_lim=(65, 70))
gc_heatmap_track.heatmap(gc_contents.reshape(1, -1), cmap="bwr")
# Colorbar for the heatmap; vmin/vmax match the GC range of the data
circos.colorbar((0.3, 0.5, 0.4, 0.02), vmin=min_gc, vmax=max_gc, cmap="bwr", orientation="horizontal", colorbar_kws=dict(label="GC Content (%)"))

# Track labels at 0 degrees, right-aligned into the gap left by end=320
circos.text("Forward CDS ", r=96, ha="right", size=8)
circos.text("Reverse CDS ", r=92, ha="right", size=8)
circos.text("GC content (fill_between) ", r=80, ha="right", size=8)
circos.text("GC content (heatmap) ", r=67.5, ha="right", size=8)

circos.savefig("gc_content_plot_example.png", dpi=300)
```

gc_content.tsv

```
Position	GCcontent
0	23.333333333333332
60	38.333333333333336
120	53.333333333333336
180	52.5
...
60840	35.0
60900	33.33333333333333
60942	33.33333333333333
```
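The reply reads this file with pandas. For a quick dependency-free check of a user-provided file before plotting, a sketch like the following reads the same two columns and computes the min/max used for the y-ticks. `read_gc_tsv` is a hypothetical name, and the ascending-order check reflects an assumption that line and fill plots along the sector should receive x values in order.

```python
import csv
import io


def read_gc_tsv(handle):
    """Read a Position/GCcontent TSV into (positions, gc_values) lists.

    Hypothetical helper; column names follow gc_content.tsv.
    """
    positions, gc_values = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        positions.append(int(row["Position"]))
        gc_values.append(float(row["GCcontent"]))
    # Assumption: plots along the sector expect x values in ascending order
    if positions != sorted(positions):
        raise ValueError("Position column must be sorted in ascending order")
    return positions, gc_values


# Example with the first rows of gc_content.tsv
sample = (
    "Position\tGCcontent\n"
    "0\t23.333333333333332\n"
    "60\t38.333333333333336\n"
    "120\t53.333333333333336\n"
    "180\t52.5\n"
)
positions, gc_values = read_gc_tsv(io.StringIO(sample))
min_gc, max_gc = min(gc_values), max(gc_values)  # y-tick range, as in the reply
```

For the real file, use `with open("gc_content.tsv", newline="") as f: positions, gc_values = read_gc_tsv(f)`.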

[Figure: gc_content_plot_example.png]

With these code examples, I think you can achieve what you want to do.
I do not intend to explain and teach everything in detail, so please do your best with these code examples.

@LFelipe-B
Author

Thank you again moshi4, you were really helpful! Cheers

moshi4 added the question (Further information is requested) label on May 3, 2024