Plot GC content from user provided dataframe (.tsv) #21
Labels
question
Further information is requested
Comments
Here is an example implementation for your question.
```python
import pandas
from pycirclize import Circos
from pycirclize.parser import Genbank, Gff
from pycirclize.utils import load_prokaryote_example_file

# Generate GC content example file (gc_content.tsv)
# (Only needed for this demo; replace it with your own TSV that has
#  "Position" and "GCcontent" columns.)
gbk_file = load_prokaryote_example_file("enterobacteria_phage.gbk")
gbk = Genbank(gbk_file)
pos_list, gc_contents = gbk.calc_gc_content()
gc_content_df = pandas.DataFrame({"Position": pos_list, "GCcontent": gc_contents})
gc_content_tsv_file = "gc_content.tsv"
gc_content_df.to_csv(gc_content_tsv_file, sep="\t", index=False)

# Plot genomic features & GC content (from GFF & GC content file)
gff_file = load_prokaryote_example_file("enterobacteria_phage.gff")
gff = Gff(gff_file)

circos = Circos(sectors={gff.name: gff.range_size}, start=0, end=320)
sector = circos.sectors[0]

# Outer track: position ticks every 5 Kb
outer_track = sector.add_track(r_lim=(100, 100))
outer_track.xticks_by_interval(5000, label_formatter=lambda v: f"{v / 1000:.0f} Kb", show_bottom_line=True)

# Forward (+) and reverse (-) strand CDS tracks
f_cds_track = sector.add_track(r_lim=(94, 98), r_pad_ratio=0.1)
f_cds_track.genomic_features(gff.extract_features(target_strand=1), fc="tomato")
r_cds_track = sector.add_track(r_lim=(90, 94), r_pad_ratio=0.1)
r_cds_track.genomic_features(gff.extract_features(target_strand=-1), fc="skyblue")

# Load GC content from the user-provided TSV
df = pandas.read_csv(gc_content_tsv_file, sep="\t")
pos_list, gc_contents = df["Position"].to_numpy(), df["GCcontent"].to_numpy()
max_gc, min_gc = gc_contents.max(), gc_contents.min()

# GC content as a filled line plot
gc_fill_track = sector.add_track(r_lim=(75, 85))
gc_fill_track.grid()
gc_fill_track.fill_between(pos_list, gc_contents, y2=min_gc, vmin=min_gc, color="lightgrey")
yticks = [min_gc, max_gc]
yticks_label = [f"{y:.1f}" for y in yticks]
gc_fill_track.yticks(yticks, yticks_label, vmin=min_gc, label_size=6, side="left")

# GC content as a one-row heatmap; the colorbar uses the same vmin/vmax
gc_heatmap_track = sector.add_track(r_lim=(65, 70))
gc_heatmap_track.heatmap(gc_contents.reshape(1, -1), vmin=min_gc, vmax=max_gc, cmap="bwr")
circos.colorbar((0.3, 0.5, 0.4, 0.02), vmin=min_gc, vmax=max_gc, cmap="bwr", orientation="horizontal", colorbar_kws=dict(label="GC Content (%)"))

# Track labels
circos.text("Forward CDS ", r=96, ha="right", size=8)
circos.text("Reverse CDS ", r=92, ha="right", size=8)
circos.text("GC content (fill_between) ", r=80, ha="right", size=8)
circos.text("GC content (heatmap) ", r=67.5, ha="right", size=8)

circos.savefig("gc_content_plot_example.png", dpi=300)
```

gc_content.tsv

```
Position GCcontent
0 23.333333333333332
60 38.333333333333336
120 53.333333333333336
180 52.5
...
60840 35.0
60900 33.33333333333333
60942 33.33333333333333
```

With these code examples, I think you can achieve what you want to do.
Thank you again, moshi4, you were really helpful! Cheers
Dear moshi4, I wish to thank you for the earlier implementation of using a multi-contig GFF file as input from the user; it was really helpful. May I now ask for assistance on how to plot GC content from an uploaded table (or other similar information) as a colorbar, or as lines for a sliding window, i.e. as a separate track? The examples only show the calculation from .gbk files, and I don't have this type of file, only .gff files. I think it would be useful for the whole community, since your tool is great and really user-friendly!
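Without a GenBank file, one way to produce a table in the Position/GCcontent layout of gc_content.tsv is to compute windowed GC content directly from the genome sequence. The sketch below is a plain-Python illustration, not pyCirclize's own calc_gc_content; the helper names gc_content_windows and write_gc_tsv, the window/step sizes, and the choice of reporting each window's 0-based start as its position are all assumptions. It also assumes you already have the sequence as a string (for example read from a FASTA file).

```python
import csv
import io


def gc_content_windows(seq, window_size=500, step_size=500):
    """Return (positions, gc_percents) for sliding windows over seq.

    Hypothetical helper: each position is the 0-based start of a window,
    and windows near the end of the sequence may be shorter than window_size.
    """
    seq = seq.upper()
    positions, gc_percents = [], []
    for start in range(0, len(seq), step_size):
        window = seq[start:start + window_size]
        gc = window.count("G") + window.count("C")
        positions.append(start)
        gc_percents.append(gc / len(window) * 100)
    return positions, gc_percents


def write_gc_tsv(positions, gc_percents, handle):
    """Write a two-column TSV with a Position/GCcontent header."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Position", "GCcontent"])
    writer.writerows(zip(positions, gc_percents))


if __name__ == "__main__":
    # Toy sequence; replace with your own genome sequence
    pos, gc = gc_content_windows("GGGGAAAACCAT", window_size=4, step_size=4)
    buf = io.StringIO()
    write_gc_tsv(pos, gc, buf)
    print(buf.getvalue())
```

Writing to a real file works the same way with open("gc_content.tsv", "w", newline="") as the handle; the resulting file can then be read with pandas.read_csv(..., sep="\t") as in the answer's plotting code.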
I assume that in the colorbar example (https://moshi4.github.io/pyCirclize/plot_tips/) the user could provide the dataframe in place of matrix1 = np.random.randint(vmin1, vmax1, (5, 100))?
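The random matrix1 in that example is a 2D array (rows × columns), so a single TSV column can take its place once it is shaped as one row. A minimal sketch, assuming a TSV with a GCcontent column; gc_matrix_from_tsv is a hypothetical helper and the TSV values are made-up toy numbers.

```python
import csv
import io

import numpy as np


def gc_matrix_from_tsv(handle, column="GCcontent"):
    """Read one numeric TSV column and return it as a (1, N) float matrix.

    One row = one heatmap band, one column = one window, i.e. the same
    2D layout as the (5, 100) random matrix1, just with a single row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    values = [float(row[column]) for row in reader]
    return np.asarray(values, dtype=float).reshape(1, -1)


# Toy TSV content; in practice pass open("gc_content.tsv", newline="")
tsv_text = "Position\tGCcontent\n0\t20.0\n60\t40.0\n120\t60.0\n"
matrix = gc_matrix_from_tsv(io.StringIO(tsv_text))
vmin, vmax = matrix.min(), matrix.max()
print(matrix.shape, vmin, vmax)
```

The matrix could then go to a track's heatmap with explicit vmin/vmax, and the same vmin/vmax to circos.colorbar, so the track colors and the colorbar share one scale.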
As for the lines, I am not sure. Another idea I had was to provide the GC value just for the CDSs and map each prot_ID to its specific value, but I have no clue how to add this data to a specific track. I hope I am clear, and sorry if this is a really trivial problem for Python users. Thank you!
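For the per-CDS idea, the key step is joining each CDS's coordinates to its GC value by protein ID. Below is a stdlib-only sketch that reads CDS rows directly from GFF3 text and pairs them with a {protein_id: gc} mapping. Assumptions: the ID is stored in a protein_id= attribute (some GFFs use ID= or Name= instead, hence the id_key parameter), each protein ID appears on a single CDS row (multi-segment CDSs would overwrite each other), and the helper names are hypothetical.

```python
def cds_intervals_from_gff(lines, id_key="protein_id"):
    """Map each CDS's id_key attribute to a 0-based, half-open (start, end)."""
    intervals = {}
    for line in lines:
        # Skip blank lines and comment/directive lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip non-feature lines (e.g. ##FASTA sequence) and non-CDS features
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            item.strip().split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        if id_key in attrs:
            # GFF3 coordinates are 1-based and inclusive; convert to 0-based, half-open
            intervals[attrs[id_key]] = (int(cols[3]) - 1, int(cols[4]))
    return intervals


def per_cds_gc_points(intervals, gc_by_id):
    """Return (midpoints, widths, gc_values) sorted by CDS position.

    IDs present in gc_by_id but missing from intervals are ignored.
    """
    rows = sorted(
        (intervals[pid], gc) for pid, gc in gc_by_id.items() if pid in intervals
    )
    mids = [(s + e) / 2 for (s, e), _ in rows]
    widths = [e - s for (s, e), _ in rows]
    values = [gc for _, gc in rows]
    return mids, widths, values


if __name__ == "__main__":
    # Toy GFF3 lines and toy GC values
    gff_lines = [
        "##gff-version 3\n",
        "seq1\tRefSeq\tCDS\t1\t100\t.\t+\t0\tID=cds1;protein_id=P1\n",
        "seq1\tRefSeq\tCDS\t201\t300\t.\t-\t0\tID=cds2;protein_id=P2\n",
    ]
    iv = cds_intervals_from_gff(gff_lines)
    print(per_cds_gc_points(iv, {"P1": 55.5, "P2": 40.0}))
```

The midpoints and values could then be drawn on their own track, for example as bars or points; check the pyCirclize API reference for the exact arguments of whichever track plotting method you choose.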