Position and Cluster_Position #30

Closed
dsantesmasses opened this issue Mar 24, 2023 · 3 comments

@dsantesmasses

Hi!

Thanks for the quick reply on the other issues! I managed to run it successfully on chr22.

I am inspecting output.bestMerge.txt. What does Position correspond to, and how does it differ from Cluster_Position?
How can I get the start and end coordinates of the sequence in the DNA?

Thanks!

@ManuelTgn
Contributor

Hi @dsantesmasses,

The position field corresponds to the position of the first nucleotide (nt) of each sequence reported as a target.
Here are two brief examples of how to interpret the values in position:

  • Suppose the PAM sequence occurs downstream of the guide (e.g. Cas9). If the reported target was found on the + strand, position is the target's first nt. If the reported target was found on the - strand, position is still the target's first nt, but of the reverse-complemented sequence.
  • Suppose the PAM sequence occurs upstream of the guide (e.g. Cas12). position is still the target's first nt, but here the reverse complement is computed for targets found on the + strand.

Note that position accounts for bulges.

Cluster_position is the position of the PAM's first nt minus the guide length, without accounting for bulges. In other words, it identifies all the genome sequences that share the exact same PAM sequence, regardless of bulges.
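
As a rough illustration of that definition (not CRISPRme code; the function name, the dictionary keys, and the coordinates are made up for the example), two alignments anchored at the same PAM get the same cluster value whether or not they contain a bulge:

```python
def cluster_position(pam_start: int, guide_len: int) -> int:
    """PAM's first nt minus the guide length, ignoring bulges.

    Hypothetical helper that only restates the definition given above.
    """
    return pam_start - guide_len


# Two hypothetical alignments that end at the same PAM; one has a bulge.
alignments = [
    {"pam_start": 1020, "bulge_size": 0},
    {"pam_start": 1020, "bulge_size": 1},
]

# The bulge size is deliberately not used: both alignments share one cluster.
clusters = {cluster_position(a["pam_start"], guide_len=20) for a in alignments}
print(clusters)
```

The bulge size never enters the calculation, which is why this value works as a grouping key rather than as a precise target start.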

However, we suggest using position to get the actual target position in the genome sequence.

CRISPRme does not report the targets' end coordinates, but they can be computed by adding the length of the complete sequence (guide + PAM) to the corresponding value in position.
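
A minimal sketch of that calculation, assuming you already know the guide and PAM lengths (the function name and the example coordinates are hypothetical, not part of CRISPRme). Whether the result is an inclusive or exclusive end depends on the coordinate convention you use downstream, and the sketch makes no adjustment for bulges:

```python
def target_end(position: int, guide_len: int, pam_len: int) -> int:
    """Return position plus the length of the complete sequence (guide + PAM)."""
    if guide_len <= 0 or pam_len < 0:
        raise ValueError("guide_len must be positive and pam_len non-negative")
    return position + guide_len + pam_len


# Hypothetical row: a 20-nt guide with a 3-nt PAM reported at position 1000.
start = 1000
end = target_end(start, guide_len=20, pam_len=3)
print(start, end)
```

If you need an inclusive end coordinate, subtract 1 from the result.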

Let us know if you have any further questions.

Manuel

@dsantesmasses
Author

Hi @ManuelTgn, thanks very much for your reply!

Just to make sure I got it right: in the case of Cas9, position corresponds to the genomic coordinate aligned to the first nt of the guide (see below). Therefore, if the alignment is on the top strand, position points to the start coordinate (lowest number), whereas if the alignment is on the reverse strand, position is the end coordinate (highest number). Is that correct?

position in + strand
NNNNNNNNNNNPAM
^

position in - strand
PAMNNNNNNNNNNN
             ^

Thanks!

@samuelecancellieri
Collaborator

Hello @dsantesmasses

Position always corresponds to the position of the first nucleotide at the 5' end.
So if you use a downstream PAM such as SpCas9, you will have something like:

NNNNNNNPAM
P

This holds for both the - and + strands, since the software applies the reverse complement to negative-strand targets.

If you use an upstream PAM such as Cas12, you will have something like this:

PAMNNNNNNN
P

In this case, the software applies the reverse complement to positive-strand targets.
But the position is always the first nucleotide of the 5'-3' sequence.
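
The strand rule described above can be written down as a small sketch. This is only a restatement of the rule in this thread, not CRISPRme's actual implementation; the function names and the `pam_upstream` flag are assumptions made for the example:

```python
# Translation table for complementing DNA (N stays N, case is preserved).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_reverse_complemented(strand: str, pam_upstream: bool) -> bool:
    """Say whether a target is reverse complemented, per the rule above.

    Downstream PAM (e.g. SpCas9): negative-strand targets are reverse complemented.
    Upstream PAM (e.g. Cas12): positive-strand targets are reverse complemented.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return strand == "+" if pam_upstream else strand == "-"


print(is_reverse_complemented("-", pam_upstream=False))
print(is_reverse_complemented("+", pam_upstream=False))
print(reverse_complement("CCTAAAC"))
```

Either way, position refers to the first nucleotide of the sequence as it is finally reported in 5'-3' orientation.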

Hope this helps. If you have any other questions, don't hesitate to ask.

Best,
Samuele
