Adjusting signals of Agilent file fails #2

Open
lbeltrame opened this issue Jun 6, 2016 · 8 comments

Comments

@lbeltrame

> cgh_test

Instance of class rCGH-Agilent

Dataset with 132452 probes and 7 columns.
Array information:

                                                                      info
fileName         US45102933_254852310046_S01_CytoCGH_0209_4x_Mar14_1_1.txt
sampleName                                                            <NA>
labName                                                               <NA>
platform                                                           Agilent
suppressFlags                                                         TRUE
genome                                                                hg19
barCode                                                   254852310046_1_1
gridName                                                   048523_20130328
scanDate                                                        2015-11-04
programVersion                           CytoCGH_0209_4x_Mar14 (Read Only)
gridGenomicBuild                                       hg19:GRCh37:Feb2009
reference                                         Dual color hybridization
analyseDate                                                     2016-06-06
rCGH_version                                                         1.2.2

> cgh_test = adjustSignal(cgh_test, Ref="cy5")
Recall you are using cy5 as reference.
Cy effect adjustment...
GC% adjustment...
Error in .dlrs(cnSet$Log2Ratio) : Vector length>2 needed for computation

> traceback()
4: stop("Vector length>2 needed for computation")
3: .dlrs(cnSet$Log2Ratio)
2: adjustSignal(cgh_test, Ref = "cy5")
1: adjustSignal(cgh_test, Ref = "cy5")

> sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Tumbleweed (20160603) (x86_64)

locale:
 [1] LC_CTYPE=it_IT.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=it_IT.UTF-8        LC_COLLATE=it_IT.UTF-8    
 [5] LC_MONETARY=it_IT.UTF-8    LC_MESSAGES=it_IT.UTF-8   
 [7] LC_PAPER=it_IT.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=it_IT.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] rCGH_1.2.2

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.5                            
 [2] BiocInstaller_1.22.2                   
 [3] plyr_1.8.3                             
 [4] GenomeInfoDb_1.8.1                     
 [5] XVector_0.12.0                         
 [6] GenomicFeatures_1.24.2                 
 [7] bitops_1.0-6                           
 [8] tools_3.3.0                            
 [9] zlibbioc_1.18.0                        
[10] mclust_5.2                             
[11] biomaRt_2.28.0                         
[12] digest_0.6.9                           
[13] preprocessCore_1.34.0                  
[14] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
[15] RSQLite_1.0.0                          
[16] gtable_0.2.0                           
[17] lattice_0.20-33                        
[18] Matrix_1.2-6                           
[19] shiny_0.13.2                           
[20] DBI_0.4-1                              
[21] parallel_3.3.0                         
[22] cluster_2.0.4                          
[23] rtracklayer_1.32.0                     
[24] Biostrings_2.40.1                      
[25] S4Vectors_0.10.1                       
[26] IRanges_2.6.0                          
[27] multtest_2.28.0                        
[28] stats4_3.3.0                           
[29] grid_3.3.0                             
[30] TxDb.Hsapiens.UCSC.hg18.knownGene_3.2.2
[31] Biobase_2.32.0                         
[32] R6_2.1.2                               
[33] AnnotationDbi_1.34.3                   
[34] DNAcopy_1.46.0                         
[35] survival_2.39-4                        
[36] BiocParallel_1.6.2                     
[37] XML_3.98-1.4                           
[38] limma_3.28.5                           
[39] ggplot2_2.1.0                          
[40] org.Hs.eg.db_3.3.0                     
[41] MASS_7.3-45                            
[42] splines_3.3.0                          
[43] GenomicAlignments_1.8.1                
[44] scales_0.4.0                           
[45] Rsamtools_1.24.0                       
[46] htmltools_0.3.5                        
[47] BiocGenerics_0.18.0                    
[48] GenomicRanges_1.24.1                   
[49] SummarizedExperiment_1.2.2             
[50] TxDb.Hsapiens.UCSC.hg38.knownGene_3.1.3
[51] mime_0.4                               
[52] xtable_1.8-2                           
[53] colorspace_1.2-6                       
[54] httpuv_1.3.3                           
[55] aCGH_1.50.0                            
[56] affy_1.50.0                            
[57] RCurl_1.95-4.8                         
[58] munsell_0.4.3                          
[59] affyio_1.42.0
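The error is raised by the dLRs quality metric (the same "dLRs" reported under "Log2Ratios QCs"), which cannot be computed on fewer than three values. Below is a minimal R sketch of a derivative-log-ratio spread, assuming the common definition (robust spread of differences between neighbouring probes, scaled to a standard deviation). `dlrsSketch` is a hypothetical name, and rCGH's internal `.dlrs` may differ in its details:

```r
# Hypothetical dLRs-style statistic (a sketch, not rCGH's own code).
# x: log2 ratios ordered along the genome.
dlrsSketch <- function(x) {
  if (length(x) < 3) {
    # Same guard as in the report: an empty cnSet yields numeric(0) here
    stop("Vector length>2 needed for computation")
  }
  diffs <- diff(x)  # differences between neighbouring probes
  # IQR / 1.34 approximates a normal SD; dividing by sqrt(2) accounts for
  # differencing, which doubles the variance of independent noise
  IQR(diffs, na.rm = TRUE) / (sqrt(2) * 1.34)
}

# A zero-row data frame with a Log2Ratio column reproduces the failure mode
emptySet <- data.frame(Log2Ratio = numeric(0))
res <- tryCatch(dlrsSketch(emptySet$Log2Ratio), error = conditionMessage)
print(res)  # "Vector length>2 needed for computation"
```

In other words, the message does not mean the Log2Ratio column is missing. It means the column reached this step with fewer than three values.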
@lbeltrame
Author

lbeltrame commented Jun 6, 2016

This is due to the cnSet slot not having a Log2Ratio column:

[1] "ProbeName"      "SystematicName" "ChrNum"         "ChrStart"      
[5] "ChrEnd"         "gMedianSignal"  "rMedianSignal" 

The LogRatio column is, however, present in the FE-generated file.

@fredcommo
Owner

Hi Luca,

Thanks for reporting this bug.

I just checked on the Agilent demo file provided with the package and everything works well:

filePath <- system.file("extdata", "Agilent4x180K.txt.bz2", package = "rCGH")
cgh <- readAgilent(filePath, sampleName = "Agilent4x180K", labName = "myLab")
head(getCNset(cgh))
       ProbeName       SystematicName ChrNum ChrStart  ChrEnd gMedianSignal
1 A_16_P00000195   chr1:931484-931543      1   931484  931543         327.0
2   A_14_P118104 chr1:1217984-1218032      1  1217984 1218032         719.0
3 A_18_P10003523 chr1:1680138-1680187      1  1680138 1680187         215.5
4 A_16_P00001630 chr1:2321395-2321454      1  2321395 2321454         209.0
5 A_16_P15007723 chr1:3149613-3149672      1  3149613 3149672         724.0
6 A_18_P10006605 chr1:3397635-3397694      1  3397635 3397694         212.0
  rMedianSignal
1         219.0
2        1304.0
3         188.0
4         318.0
5         905.0
6         539.5
cgh <- adjustSignal(cgh)
Recall you are using cy3 as reference.
Cy effect adjustment...
GC% adjustment...
Log2Ratios QCs:
dLRs: 0.191
MAD: 0.235

Scaling...
Signal filtering…

Can you please check whether your input file contains ‘gMedianSignal’ and ‘rMedianSignal’? These values are used to compute the Log2Ratio, and the corresponding column is created at this step.
Are you using an original Agilent Feature Extraction file?

Fred
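As an illustration of computing a Log2Ratio from those two columns, here is a minimal R sketch. It assumes Agilent's channel naming (gMedianSignal = Cy3/green, rMedianSignal = Cy5/red) and the usual log2(test/reference) definition. `log2RatioSketch` is a hypothetical helper, not part of rCGH, and rCGH's own Cy-effect adjustment may transform the signals beyond this raw ratio. The signal values come from the head(getCNset(cgh_test)) output posted in this thread:

```r
# Hypothetical helper (not part of rCGH): raw log2(test / reference).
# Assumes gMedianSignal = Cy3 (green) and rMedianSignal = Cy5 (red).
log2RatioSketch <- function(cnSet, Ref = c("cy3", "cy5")) {
  Ref <- match.arg(Ref)
  g <- cnSet$gMedianSignal
  r <- cnSet$rMedianSignal
  # With Cy3 as reference the test sample is on Cy5, and vice versa
  cnSet$Log2Ratio <- if (Ref == "cy3") log2(r / g) else log2(g / r)
  cnSet
}

# First two probes of the OGT chip (the report used Ref = "cy5")
cnSet <- data.frame(
  ProbeName = c("0754_1c1_1_152_s_PSO-60-0084", "0754_409c1_1_3048_s_PSO-60-1908"),
  gMedianSignal = c(7682.0, 31500.5),
  rMedianSignal = c(22443.5, 17967.0),
  stringsAsFactors = FALSE
)
out <- log2RatioSketch(cnSet, Ref = "cy5")
print(out$Log2Ratio)
```

Swapping the reference channel only flips the sign of each ratio.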


@lbeltrame
Author

Yes, the file has been generated by Agilent CytoGenomics 3 (FE does not process CGH arrays, at least not the most recent versions).

And both columns are present in the file. Where in the code is the Log2Ratio calculated? I see the code explicitly skips the Log2Ratio column in the feature extracted files.

@lbeltrame
Author

lbeltrame commented Jun 6, 2016

For reference:

head(getCNset(cgh_test))
                        ProbeName     SystematicName ChrNum ChrStart ChrEnd
1    0754_1c1_1_152_s_PSO-60-0084   chr1:10478-10537      1    10478  10537
2 0754_409c1_1_3048_s_PSO-60-1908 chr1:566032-566091      1   566032 566091
3  0754_452c1_1_945_s_PSO-60-0088 chr1:632953-633012      1   632953 633012
4 0754_479c1_1_3081_s_PSO-60-2775 chr1:657598-657657      1   657598 657657
5  0754_557c1_1_253_s_PSO-60-0088 chr1:723318-723377      1   723318 723377
6  0754_585c1_1_751_s_PSO-60-0690 chr1:752671-752730      1   752671 752730
  gMedianSignal rMedianSignal
1        7682.0       22443.5
2       31500.5       17967.0
3        7682.0        7345.0
4       29365.5       18970.5
5        3083.0        4204.5
6        2901.0        3381.0

(Probe IDs are different because this chip is manufactured by Agilent for another supplier).

@lbeltrame
Author

lbeltrame commented Jun 6, 2016

I've done a bit of debugging: Log2Ratio is added correctly in the Cy adjustment step. And now I know what's wrong.

This chip is made by Agilent for Oxford Gene Technology (OGT). FE/CytoGenomics processes it correctly, but its probe IDs do not match the GC annotation in .GCadjust(cnSet). As a result, both tmpDB and cnSet end up empty (zero rows), the if check still passes, and the errors follow:

Browse[3]> head(cnSet)
[1] ProbeName      SystematicName ChrNum         ChrStart       ChrEnd        
[6] gMedianSignal  rMedianSignal  Log2Ratio     
<0 rows> (or 0-length row.names)
Browse[3]> head(tmpDB)
[1] ProbeID GC     
<0 rows> (or 0-length row.names)
Browse[3]> 

The workaround is to disable GC correction, but this case (both tmpDB and cnSet empty) should probably be handled explicitly.
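The failure mode can be reproduced in isolation: an inner merge() on non-matching IDs returns zero rows without raising any error. Below is a minimal R sketch, assuming a GC table with the same ProbeID/GC columns as tmpDB. The GC values are placeholders, and `gcAdjustIfPossible` with its `minProbes` threshold is a hypothetical guard, not rCGH code:

```r
# GC annotation keyed on Agilent catalogue IDs (GC values are placeholders)
gcTable <- data.frame(
  ProbeID = c("A_16_P00000195", "A_14_P118104", "A_18_P10003523"),
  GC = c(0.50, 0.55, 0.60),
  stringsAsFactors = FALSE
)

# Probes from the OGT-designed chip use different IDs
cnSet <- data.frame(
  ProbeName = c("0754_1c1_1_152_s_PSO-60-0084", "0754_409c1_1_3048_s_PSO-60-1908",
                "0754_452c1_1_945_s_PSO-60-0088"),
  Log2Ratio = log2(c(7682.0, 31500.5, 7682.0) / c(22443.5, 17967.0, 7345.0)),
  stringsAsFactors = FALSE
)

# An inner join on non-matching IDs silently returns zero rows
merged <- merge(cnSet, gcTable, by.x = "ProbeName", by.y = "ProbeID")
nrow(merged)  # 0

# Hypothetical guard: skip GC correction when too few probes are annotated
gcAdjustIfPossible <- function(cnSet, gcTable, adjustFun, minProbes = 3) {
  nMatched <- sum(cnSet$ProbeName %in% gcTable$ProbeID)
  if (nMatched < minProbes) {
    warning(sprintf("Only %d probe(s) match the GC table; skipping GC%% adjustment.", nMatched))
    return(cnSet)
  }
  adjustFun(cnSet, gcTable)
}

# adjustFun is never called here because no probe matches
checked <- gcAdjustIfPossible(cnSet, gcTable, adjustFun = function(x, gc) stop("not reached"))
```

Returning the input unchanged with a warning keeps downstream steps (such as the dLRs QC) working on the full probe set instead of an empty one.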

@fredcommo
Owner

Oh I see.
A possible workaround would be to compute the corrections and Log2Ratio outside rCGH, and then use readGeneric().
This function was added precisely to support custom designs. You may want to look at ?readGeneric to see which columns it expects.

Sorry for the inconvenience this may have caused,
Fred


@lbeltrame
Author

readAgilent does most of the boring work of reading the data, so I'd rather not reimplement it ;). What I could do instead is run the Cy adjustment in adjustSignal and then do the GC adjustment externally.
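One way to do the GC step externally is a generic loess normalisation of Log2Ratio against probe GC content. This is a commonly used technique swapped in for illustration, not rCGH's .GCadjust. The minimal R sketch below assumes you can build a per-probe GC table for the OGT design (same ProbeID/GC columns as tmpDB). `gcCorrectLoess` and its `span` default are hypothetical choices, and the example data are simulated:

```r
# Hypothetical external GC% correction via loess (not rCGH's internal method)
gcCorrectLoess <- function(cnSet, gcTable, span = 0.3) {
  m <- merge(cnSet, gcTable, by.x = "ProbeName", by.y = "ProbeID")
  if (nrow(m) == 0) stop("No probe matches the GC table; cannot GC-correct")
  fit <- loess(Log2Ratio ~ GC, data = m, span = span)
  trend <- predict(fit, newdata = m)
  # Remove the GC-dependent trend but keep the overall level of the profile
  m$Log2Ratio <- m$Log2Ratio - trend + median(trend, na.rm = TRUE)
  m
}

# Simulated data: log2 ratios with an artificial GC trend plus noise
set.seed(1)
n <- 500
gcTable <- data.frame(ProbeID = paste0("probe", seq_len(n)),
                      GC = runif(n, 0.3, 0.7), stringsAsFactors = FALSE)
cnSet <- data.frame(ProbeName = gcTable$ProbeID,
                    Log2Ratio = 2 * (gcTable$GC - 0.5) + rnorm(n, sd = 0.05),
                    stringsAsFactors = FALSE)
corrected <- gcCorrectLoess(cnSet, gcTable)
```

On real data the span would need tuning, and the corrected Log2Ratio would then be carried into the remaining steps.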

@fredcommo
Owner

Right! :)

