unable to 'scatter' plot region of chromosome name with punctuation marks #602
Well... let's just say this wasn't a use case that was ever really considered :D Thank you for reporting this, I've submitted #603 which should fix this problem, hopefully even without breaking anything. |
Hi Kirill Tsukanov, Thank you for the fast response and quick fix! I went ahead and did the quick change in the
No zoom of chromosome NC_039902.1:
Command:
Output:
Zoom into region 10000000-15000000 of chromosome NC_039902.1:
Command:
Output:
Is there an edit that can correctly output the zoomed-in region? We are very close to getting the right output! Thanks ahead of time, |
@amora197 Hmm, that's interesting, thanks for letting me know. I didn't have any plant test data at hand, so I just ran my tests using human data with one of the chromosomes renamed to Could you share your |
Note to self: files received |
PR is merged, but reopening for additional remaining investigations |
@amora197 Thank you for waiting; I have now been able to take a look into this. It looks like the
This makes
In turn, the gene names in your CNR file almost certainly look like this because something went wrong during the target construction ( So, in light of this, could you please elaborate on how your target files were constructed? What commands/flags were used, and what does the original (unprocessed) target BED look like? |
@tskir Thank you for your reply and for the wait. We took into account your suggestions and tried some troubleshooting. We have indeed used a GFF3 file to build our bed file (general case for us). The GFF3 file we used was downloaded using the following NCBI websites:
We had previously refrained from editing the
We realized that there was non-uniqueness in the
We realized that the
In summary, we were able to zoom into a region of interest after preparing our bed file like so:
More generally, though: could you please describe what the ideal bed file for running CNVkit needs to look like? More specifically, could you please advise us on the following:
Finally, we have a feature request:
Thank you again for your assistance. We look forward to your reply. |
Thanks for the details, @amora197 . CNVkit can read GFF directly; you can give a GFF3/GTF/GFF2 file as input in most places where BED works. You can use the bundled script
The GFF reader checks for these tags in order:
Gene names do not need to be unique. Apparently the gene name is allowed to be pretty long, as you've seen; CNVkit does not enforce any limit, but the plots start to look weird when the strings are very long.
Overlapping regions are allowed.
There is no minimum region size; regions of size 0 or smaller will tend to get dropped automatically within the CNVkit pipeline.
When you give a target BED as input to the
CNVkit will work for non-diploid genomes already. The |
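For readers preparing a BED file from GFF3 by hand, the two points above (a gene name picked from the first matching attribute tag, and coordinates converted between formats) can be sketched roughly as follows. This is only an illustration, not CNVkit's own reader: the tag list `EXAMPLE_TAG_ORDER`, the helper names, and the `-` placeholder for unnamed regions are all assumptions made for this example. The one fixed fact it relies on is that GFF3 coordinates are 1-based and inclusive, while BED coordinates are 0-based and half-open.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical priority list, for illustration only. The tags that CNVkit's
# GFF reader actually checks are not reproduced here.
EXAMPLE_TAG_ORDER = ("Name", "gene", "ID")


def parse_gff3_attributes(field: str) -> Dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def pick_gene_name(attrs: Dict[str, str],
                   tag_order: Sequence[str] = EXAMPLE_TAG_ORDER,
                   default: str = "-") -> str:
    """Return the value of the first tag in tag_order that is present and non-empty."""
    for tag in tag_order:
        if attrs.get(tag):
            return attrs[tag]
    return default


def gff3_line_to_bed(line: str,
                     tag_order: Sequence[str] = EXAMPLE_TAG_ORDER
                     ) -> Optional[Tuple[str, int, int, str]]:
    """Convert one GFF3 feature line to (chrom, start, end, name) in BED coordinates.

    Returns None for blank lines and comment/directive lines.
    """
    if not line.strip() or line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"Expected 9 tab-separated GFF3 columns, got {len(fields)}")
    chrom = fields[0]
    start = int(fields[3])  # GFF3: 1-based, inclusive
    end = int(fields[4])    # GFF3: 1-based, inclusive
    name = pick_gene_name(parse_gff3_attributes(fields[8]), tag_order)
    # BED: 0-based start, exclusive end, so only the start shifts by one.
    return (chrom, start - 1, end, name)
```

The chromosome column is passed through untouched, so names such as NC_039902.1 keep their version suffix.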
Hello,
I am trying to plot a region of a plant chromosome, but to no avail. I first successfully plotted the whole chromosome NC_039902.1 of my sample1-C7 using the following command:
When I want to zoom into a specific region of the chromosome, I get the following error:
The command I am using to plot the desired region is the following:
It seems that when I provide a region in -c NC_039902.1:10000000-15000000, CNVkit raises an error due to the period (.) in the chromosome name NC_039902.1. The error is not raised when I delete the chromosome name suffix .1, but obviously the output figure is empty since the chromosome name NC_039902 does not exist in my bed/fasta/bam files.
I am using the original NCBI chromosome names and wish to keep them unchanged. Is there a way to make the cnvkit.py scatter command tolerate punctuation marks in chromosome names? Some of the chromosome names I am using are: NC_039898.1 NC_039899.1 NC_039900.1 NC_039901.1 NC_039902.1 etc. And not: chr1 chr2 chr3 etc.
Thanks ahead of time!
-Anibal
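To illustrate the kind of parsing the request above calls for, here is a minimal sketch of a region-label parser that accepts punctuation in chromosome names. It is not CNVkit's implementation and not the change made in #603; the names `REGION_PATTERN`, `Region`, and `parse_region` are hypothetical. The idea is to treat only a trailing ":start-end" as coordinates and to accept everything before it as the chromosome name.

```python
import re
from typing import NamedTuple, Optional

# Chromosome name: any non-empty text (dots, underscores, even colons).
# Optional trailing ":start-end", where either coordinate may be left empty.
REGION_PATTERN = re.compile(r"^(?P<chrom>.+?)(?::(?P<start>\d*)-(?P<end>\d*))?$")


class Region(NamedTuple):
    chromosome: str
    start: Optional[int]
    end: Optional[int]


def parse_region(label: str) -> Region:
    """Parse 'chrom', 'chrom:start-end', 'chrom:start-' or 'chrom:-end'.

    Coordinates are returned exactly as written; no 0-based/1-based
    conversion is applied in this sketch.
    """
    match = REGION_PATTERN.match(label.strip())
    if match is None:
        raise ValueError(f"Cannot parse region label: {label!r}")
    start, end = match.group("start"), match.group("end")
    return Region(
        match.group("chrom"),
        int(start) if start else None,
        int(end) if end else None,
    )
```

With this pattern, a label such as NC_039902.1:10000000-15000000 keeps the full versioned accession as the chromosome name, and a bare NC_039902.1 is read as a whole-chromosome request.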