Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

getting mostly NaN values for affinity calculations #3

Open
chrisclarkson opened this issue Jan 10, 2019 · 8 comments
Open

getting mostly NaN values for affinity calculations #3

chrisclarkson opened this issue Jan 10, 2019 · 8 comments

Comments

@chrisclarkson
Copy link

Hi,
I am interested in using your package to calculating the affinity of predicted binding sites and subsequently the significance of the affinity values.

My pipeline (using your instructions from https://rdrr.io/github/matthuska/tRap/man/) is as follows:

pfm
1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
A 2  3  5  1  0 13  1  3 18  0  6  1  0  1  1  6  1  2  6  2
C 8  3  2 17 19  0 12 10  1  0  0  1  0  1 18  0 12  7  6  5
G 2 10 12  1  0  5  7  1  1 20 14 14 20 18  0 14  5  3  7  7
T 8  4  2  1  0  2  0  6  0  0  0  5  0  1  1  0  1  8  1  6

pwm <- toPWM(pfm)
pwm=PWMatrix(ID="Unknown", name=tf, matrixClass="Unknown", strand="+", bg=c(A=0.25, C=0.25, G=0.25, T=0.25), tags=list(), profileMatrix=pwm, pseudocounts=numeric())
peaks = searchSeq(pwm, seq, min.score = "80%",mc.cores=10L)
peaks_bed = as(peaks, "GRanges")

head(as.data.frame(peaks_bed$siteSeqs)$x)
[1] "AGCCCACTAGGGTGCAGTCC" "ATACCAGAAGAAGGCATCAG" "ACACCAGAAGAGGGCGTCAG"
[4] "ATGCCACGAGGTGGAGATAA" "GACTCACTAGAGGGCACAGG" "TCTACAGCAGGTGGCAACAC"

af=affinity(normalize.pwm(pwm@profileMatrix), as.data.frame(peaks_bed$siteSeqs)$x)

However this results in a many NaN values....

sum(af=='NaN')/length(af)
[1] 0.4785195

I was advised that tRap is used on long sequences rather than short ones so I extended the sequences:

start(peaks_bed) <- start(peaks_bed) - 30
end(peaks_bed) <- end(peaks_bed) + 30
extended_seqs <- getSeq(Mmusculus, peaks_bed)
af_ext=affinity(normalize.pwm(pwm@profileMatrix),as.data.frame(extended_seqs)$x)

However this results in 100% NaNs....

I'm wondering if I am doing something wrong?

@matthiasheinig
Copy link
Collaborator

matthiasheinig commented Jan 10, 2019 via email

@chrisclarkson
Copy link
Author

chrisclarkson commented Jan 10, 2019

Hi Matthias,
Thank you for your quick reply:
There are not many zeros in the normalised matrix:

normalize.pwm(pwm@profileMatrix)
           1          2           3          4          5          6
A  1.0626965  0.9503723  0.04448988  0.4578354  0.3879632 -0.2923233
C -0.5626965  0.9503723  0.85949647 -0.3735062 -0.1638895  1.0223918
G  1.0626965 -1.3188119 -0.76348283  0.4578354  0.3879632  0.0000000
T -0.5626965  0.4180672  0.85949647  0.4578354  0.3879632  0.2699315
            7          8          9         10          11          12
A  0.41349136  0.4404797 -0.2536981  0.3870730 -0.03296475  0.71519479
C -0.24047408 -0.6112445  0.2969491  0.3870730  0.61061995  0.71519479
G -0.09176563  1.3303426  0.2969491 -0.1612191 -0.18827516 -0.45258182
T  0.91874835 -0.1595778  0.6597998  0.3870730  0.61061995  0.02219224
          13         14         15          16          17         18
A  0.3870730  0.4538871  0.2969491 -0.03296475  0.75263253  1.5229892
C  0.3870730  0.4538871 -0.2536981  0.61061995 -0.47909620 -0.5761614
G -0.1612191 -0.3616612  0.6597998 -0.18827516 -0.02616885  0.8595932
T  0.3870730  0.4538871  0.2969491  0.61061995  0.75263253 -0.8064209
          19         20
A -0.2228909  2.3968502
C -0.2228909  0.0000000
G -0.4123795 -0.9067515
T  1.8581614 -0.4900988

when I try the following, I still get NaNs:

affinity(normalize.pwm(pwm@profileMatrix)+0.0000000000000001, as.data.frame(tmp)$x)
[1] NaN
Warning messages:
1: In log(maxCG/p[3]) : NaNs produced
2: In log(maxAT/p) : NaNs produced

@matthiasheinig
Copy link
Collaborator

matthiasheinig commented Jan 10, 2019 via email

@chrisclarkson
Copy link
Author

Hi sorry for delay,
Hmm very strange.... I just got the PFM from the jaspar 2018 database and then TFBSTools::toPWM command which results in a matrix like the one seen above... Can you recommend a package/ command that could perform the conversion (PFM>PWM) in the way that is necessary for your package to work?

@matthiasheinig
Copy link
Collaborator

matthiasheinig commented Jan 11, 2019 via email

@chrisclarkson
Copy link
Author

chrisclarkson commented Jan 11, 2019

Hi again Matthias,
Thank you for this fantastic help.
It works now
Just 2 last questions:
1.
I would also like to apply this analysis to more than one transcription factor- hence if I download a list of PFMs from Jaspar:

ARNT
  [,1] [,2] [,3] [,4] [,5] [,6]
A    4   19    0    0    0    0
C   16    0   20    0    0    0
G    0    1    0   20    0   20
T    0    0    0    0   20    0

AHR
  [,1] [,2] [,3] [,4] [,5] [,6]
A    3    0    0    0    0    0
C    8    0   23    0    0    0
G    2   23    0   23    0   24
T   11    1    1    1   24    0

Ddit3::Cebpa
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
A   14   11   18    0    0    4   38   36    0    14     4     0
C    7    7    3    1    0   33    1    2    6    17    23    26
G   12   14   15    0   38    0    0    1    0     5     9     6
T    6    7    3   38    1    2    0    0   33     3     3     7
...

Can I take it as a suitable strategy to add 0.25 to all of these PFMs and then implement your analysis on them?

  1. As for the command local.paffinity, I tried it on the calculated affinity values:
seqs=as.data.frame(extended_seqs)$x
head(seqs)
  A DNAStringSet instance of length 6
    width seq
[1]    79 TACGTAAGTACACTGTAGCTGTCTTCAGACACAC...TCAGATCTCATTATGGGTAGTTGTGAGCTACCA
[2]    79 TTTTACTTTCTCTCTCCCTCTTATTGCTAGATGC...ATAAACAGCTTGCTTCTGCCATGTTCTGCAGAA
[3]    79 GACATCTGAGTACCTTCCCTGTAAGAGAGCTTGC...CTGAGCACTGAAACTCAGAGGAGAGAATCTGTC

head(af_ext)
[1] 10.586463 12.458601 10.153033  7.571788  9.838501 10.966423

af_ext=affinity(normalize.pwm(pwm.for.trap@profileMatrix),seqs)

for(i in c(1:length(af_ext))){
         print(local.paffinity(af_ext[i],pwm.for.trap,seqs[i]))
}

[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
[1] 0.01612903
........

The p-values is 0.01612903 in every case....
Can I take these values as correct or am I not applying this function correctly?

@matthiasheinig
Copy link
Collaborator

matthiasheinig commented Jan 11, 2019 via email

@chrisclarkson
Copy link
Author

Do you mean that I should use the unnormalised PFMs? Not the PWMs as they have values < 0 (as shown above)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants