createDataFile returns empty diag.geno.file #15

Closed
ryshi06 opened this issue Sep 14, 2023 · 6 comments
@ryshi06

ryshi06 commented Sep 14, 2023

Hi,

I have raw text files from Illumina and am following the DataCleaning guide to prepare the snpAnnotation and scanAnnotation data frames. But when I try to generate the GDS file, the corresponding diagnostics file contains NULL for several values, including samples, sample.match, etc. I have attached the empty output I received. I double-checked the file path, and the two annotation data frames and the raw data files look fine. Can you give me some idea why I am having this issue? Thank you!

[Screenshot (2023-09-13): diagnostics output with NULL values]
@smgogarten
Owner

I can't diagnose this problem without a reproducible example.

@smgogarten
Owner

Can you please also supply the code you used that produced the error, and the output of sessionInfo()?

@smgogarten
Owner

Email attachments don't seem to work when replying to GitHub issues. Please go to the issue page on the GitHub website and paste your code into the comment box.

@ryshi06
Author

ryshi06 commented Sep 21, 2023

Generate SnpAnnot:

```r
library(GWASTools)

# SNP list for the array (no header row); only the first three columns are used
ref <- read.table("GDA_A1_snps.txt", sep = "\t", header = FALSE,
                  stringsAsFactors = FALSE)
colnames(ref) <- c("snpName", "chromosome", "position", "perc_match", "strand", "TOP")

d1 <- subset(ref, select = c("snpName", "chromosome", "position"))

# Recode non-numeric chromosome labels to the integer codes described below
d1$chromosome[d1$chromosome == "X"] <- 23
d1$chromosome[d1$chromosome == "XY"] <- 24
d1$chromosome[d1$chromosome == "Y"] <- 25
d1$chromosome[d1$chromosome == "MT"] <- 26
d1$chromosome[d1$chromosome == "0"] <- 27

d1$chromosome <- as.integer(d1$chromosome)

# Sort by chromosome and position, then assign sequential snpIDs
d <- d1[order(d1$chromosome, d1$position), ]
d$snpID <- seq_len(nrow(d))
d <- d[, c("snpID", "snpName", "chromosome", "position")]
snpAnnot <- SnpAnnotationDataFrame(d)

meta <- varMetadata(snpAnnot)
meta[c("snpID", "snpName", "chromosome", "position"), "labelDescription"] <-
  c("unique integer ID for SNPs (row number assigned)",
    "BeadSet SNP ID from Illumina",
    paste("integer code for chromosome: 1:22=autosomes,",
          "23=X, 24=pseudoautosomal, 25=Y, 26=Mitochondrial, 27=Unknown"),
    "base pair position on chromosome (build 37)")
varMetadata(snpAnnot) <- meta
```
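The chained recoding above can also be written as one lookup table, with a check that nothing slipped through: as.integer() turns any label it cannot parse into NA with only a warning, so an unexpected chromosome label could otherwise go unnoticed. A minimal base-R sketch; the input vector is made up, and the codes are the ones listed in the varMetadata description.

```r
# Made-up chromosome labels standing in for the GDA_A1_snps.txt column
chrom <- c("1", "X", "XY", "Y", "MT", "0", "22")

# Integer codes from the varMetadata description above
codes <- c(X = "23", XY = "24", Y = "25", MT = "26", "0" = "27")

# Replace labels found in the lookup; keep numeric labels as they are
recoded <- ifelse(chrom %in% names(codes), codes[chrom], chrom)
chrom_int <- as.integer(recoded)

# Any NA means a label was neither numeric nor in the lookup table
stopifnot(!anyNA(chrom_int))
chrom_int
# [1]  1 23 24 25 26 27 22
```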

Generate ScanAnnot:

```r
# One row per sample; columns scanID, scanName, file, sex, race
d <- read.table("scanAnnot_fake.txt", sep = "\t", header = TRUE)

scanAnnot <- ScanAnnotationDataFrame(d)

meta <- varMetadata(scanAnnot)
meta[c("scanID", "scanName", "file", "sex", "race"), "labelDescription"] <-
  c("unique ID for scans",
    "subject identifier",
    "raw data file",
    "Sex",
    "Race")
varMetadata(scanAnnot) <- meta
```
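Since createDataFile takes the raw file names from the scan annotation and matches sample names against it, a quick column check can rule out one class of problems. This sketch assumes, based on the GWASTools documentation for createDataFile (worth confirming for your version), that scan.annotation needs scanID, scanName and file columns; the data frame is a made-up stand-in for scanAnnot_fake.txt.

```r
# Made-up stand-in for scanAnnot_fake.txt
scan_df <- data.frame(scanID = 1:3,
                      scanName = c("sample1", "sample2", "sample3"),
                      file = c("sample1.txt", "sample2.txt", "sample3.txt"))

# Columns assumed to be required by createDataFile()
required <- c("scanID", "scanName", "file")
missing_cols <- setdiff(required, names(scan_df))

stopifnot(length(missing_cols) == 0,      # all required columns present
          !anyDuplicated(scan_df$scanID)) # scan IDs are unique
```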

Create GDS file:

```r
path <- "."                 # directory containing the raw sample files
geno.file <- "tmp.geno.gds"

scan_annotation <- getAnnotation(scanAnnot)
snp_annotation <- getAnnotation(snpAnnot)

# Columns of the raw files holding the SNP name, sample ID and the two alleles
col.nums <- as.integer(c(1, 2, 10, 11))
names(col.nums) <- c("snp", "sample", "a1", "a2")

diag.geno.file <- "diag.geno.RData"
diag.geno <- createDataFile(path = path, geno.file, file.type = "gds",
                            variables = "genotype",
                            snp.annotation = snp_annotation,
                            scan.annotation = scan_annotation, sep.type = "\t",
                            skip.num = 10, col.total = 11, col.nums = col.nums,
                            scan.name.in.file = 1,
                            diagnostics.filename = diag.geno.file)
```

Attached files: sample1.txt, sample2.txt, sample3.txt, scanAnnot_fake.txt, GDA_A1_snps.txt
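Two base-R pre-flight checks can help narrow down an empty diagnostics result before calling createDataFile: that every file named in the scan annotation exists under path, and that each raw file has col.total tab-separated fields once skip.num lines are skipped. The helper names are hypothetical, the demo files are made up, and the layout check assumes (as skip.num = 10 for an Illumina report suggests) that skip.num counts every line before the first genotype row, including the column-header row.

```r
# Hypothetical helper: raw files listed in the scan annotation's `file`
# column that do not exist under `path` (empty result = all found)
missing_raw_files <- function(path, scan.annotation) {
  files <- file.path(path, as.character(scan.annotation$file))
  files[!file.exists(files)]
}

# Hypothetical helper: TRUE if every line after the first `skip.num`
# lines has exactly `col.total` fields when split on `sep`
has_expected_layout <- function(file, skip.num, col.total, sep = "\t") {
  n <- count.fields(file, sep = sep, skip = skip.num,
                    quote = "", comment.char = "")
  length(n) > 0 && all(n == col.total)
}

# Self-contained demo with made-up files in a temporary directory
dir <- tempfile("raw_")
dir.create(dir)
writeLines(c("[Header]", "[Data]",
             "SNP Name\tSample ID\tAllele1\tAllele2",
             "rs1\tsample1\tA\tB",
             "rs2\tsample1\tA\tA"),
           file.path(dir, "sample1.txt"))
scan_df <- data.frame(scanID = 1:2,
                      scanName = c("sample1", "sample2"),
                      file = c("sample1.txt", "sample2.txt"))

missing_raw_files(dir, scan_df)  # only the path ending in sample2.txt
has_expected_layout(file.path(dir, "sample1.txt"), skip.num = 3, col.total = 4)  # TRUE
has_expected_layout(file.path(dir, "sample1.txt"), skip.num = 0, col.total = 4)  # FALSE
```

The layout check cannot tell a column-header row from a data row, since both have the same number of fields, so it catches a skip.num that is too small only when the skipped region contains lines with a different field count.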

@smgogarten
Owner

I just ran your code and got the expected (non-empty) output:

```
> diag.geno
$read.file
[1] 1 1 1

$row.num
[1] 90 90 90

$samples
$samples[[1]]
[1] "sample1"

$samples[[2]]
[1] "sample2"

$samples[[3]]
[1] "sample3"


$sample.match
[1] 1 1 1

$missg
$missg[[1]]
character(0)

$missg[[2]]
character(0)

$missg[[3]]
character(0)


$snp.chk
[1] 1 1 1

$chk
[1] 1 1 1
```
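Comparing a returned diagnostics list with the output above can be scripted. A minimal sketch with a hypothetical helper, assuming (as the output above suggests) that read.file, sample.match, snp.chk and chk hold one value per raw file and that 1 means the check passed; a list with these fields missing, like the one in the original report, comes back with all four flagged as empty.

```r
# Hypothetical helper: flag per-file checks in a createDataFile() diagnostics
# list that are empty (missing/NULL) or contain a value other than 1
check_diagnostics <- function(diag) {
  fields <- c("read.file", "sample.match", "snp.chk", "chk")
  vals <- diag[fields]
  names(vals) <- fields
  is_empty <- vapply(vals, function(v) length(v) == 0, logical(1))
  failed <- vapply(vals, function(v) length(v) > 0 && any(is.na(v) | v != 1),
                   logical(1))
  list(empty = fields[is_empty], failed = fields[failed])
}

# Mock of the successful output shown above (three files, all checks = 1)
ok <- list(read.file = c(1, 1, 1), row.num = c(90, 90, 90),
           sample.match = c(1, 1, 1), snp.chk = c(1, 1, 1), chk = c(1, 1, 1))
res_ok <- check_diagnostics(ok)         # both elements are character(0)

# Mock of a result where the second file's sample name did not match
bad <- ok
bad$sample.match <- c(1, 0, 1)
check_diagnostics(bad)$failed           # "sample.match"

# An empty list, as in the original report
res_empty <- check_diagnostics(list())
res_empty$empty                         # "read.file" "sample.match" "snp.chk" "chk"
```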

Details on my R session:

```
> sessionInfo()
R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] GWASTools_1.46.0    Biobase_2.60.0      BiocGenerics_0.46.0

loaded via a namespace (and not attached):
 [1] shape_1.4.6          formula.tools_1.7.1  lattice_0.21-8       vctrs_0.6.3         
 [5] tools_4.3.1          generics_0.1.3       sandwich_3.0-2       tibble_3.2.1        
 [9] fansi_1.0.4          RSQLite_2.3.1        pan_1.9              blob_1.2.4          
[13] pkgconfig_2.0.3      jomo_2.7-6           Matrix_1.6-0         data.table_1.14.8   
[17] lifecycle_1.0.3      compiler_4.3.1       MatrixModels_0.5-2   codetools_0.2-19    
[21] SparseM_1.81         quantreg_5.97        GWASExactHW_1.01     glmnet_4.1-8        
[25] mice_3.16.0          pillar_1.9.0         nloptr_2.0.3         tidyr_1.3.0         
[29] MASS_7.3-60          cachem_1.0.8         iterators_1.0.14     rpart_4.1.19        
[33] boot_1.3-28.1        foreach_1.5.2        mitml_0.4-5          nlme_3.1-162        
[37] tidyselect_1.2.0     dplyr_1.1.2          purrr_1.0.1          splines_4.3.1       
[41] operator.tools_1.6.3 fastmap_1.1.1        grid_4.3.1           cli_3.6.1           
[45] magrittr_2.0.3       survival_3.5-5       utf8_1.2.3           broom_1.0.5         
[49] backports_1.4.1      bit64_4.0.5          quantsmooth_1.66.0   logistf_1.26.0      
[53] bit_4.0.5            nnet_7.3-19          lme4_1.1-34          zoo_1.8-12          
[57] memoise_2.0.1        DNAcopy_1.74.1       lmtest_0.9-40        mgcv_1.9-0          
[61] rlang_1.1.1          Rcpp_1.0.11          glue_1.6.2           DBI_1.1.3           
[65] gdsfmt_1.36.1        rstudioapi_0.15.0    minqa_1.2.5          R6_2.5.1 
```

@ryshi06
Author

ryshi06 commented Sep 22, 2023 via email
