Improper header in pheno dataframe with single column pheno file #103

Closed
fdchevalier opened this issue Sep 18, 2023 · 3 comments

@fdchevalier

Dear Karl,

First, thank you very much for creating and maintaining this very useful package.

I came across an unexpected behavior when using the read.cross() function with a phenotype file that has a single column. Although the column has an "id" header, this header is missing from the pheno data frame after the cross object is created. This is not the case when the phenotype file has two columns.

Here is a reproducible example:

library(qtl)

# Mock data
phe <- structure(list(id = c("F2A_1", "F2A_10", "F2A_100", "F2A_101", "F2A_102", "F2A_103")), row.names = c(NA, 6L), class = "data.frame")
gen <- structure(list(id = c("", "", "F2A_1", "F2A_10", "F2A_100", "F2A_101",
			"F2A_102", "F2A_103"), X1 = c("1", "0.781214",
			"LL", "HH", "LL", "HL", "HH", "HL"), X2 = c("1",
			"0.981928", "LL", "HH", "LL", "HL", "HH", "HL"), X3 = c("1",
			"1.060362", "LL", "HH", "LL", "HL", "HH", "HL"), X4 = c("1",
			"1.201365", "LL", "HH", "LL", "HL", "HH", "HL"), X5 = c("1",
			"1.220872", "LL", "HH", "LL", "HL", "HH", "HL")), row.names = c(NA,
			8L), class = "data.frame")

# Write mock data into files
write.table(phe, "phe.csv", row.names = FALSE, quote = FALSE, sep = ",")
write.table(cbind(phe, phe), "phe2.csv", row.names = FALSE, quote = FALSE, sep = ",")
write.table(gen, "gen.csvs", row.names = FALSE, quote = FALSE, sep = ",")

# Create a cross object with a single-column phenotype file
cross1 <- read.cross("csvs", genfile = "gen.csvs", phefile = "phe.csv", estimate.map = FALSE, genotypes = c("LL", "HL", "HH"), alleles = c("L", "H"))
colnames(cross1$pheno)

# Create a cross object with a two-column phenotype file
cross2 <- read.cross("csvs", genfile = "gen.csvs", phefile = "phe2.csv", estimate.map = FALSE, genotypes = c("LL", "HL", "HH"), alleles = c("L", "H"))
colnames(cross2$pheno)

This prevents getid() from working as expected.

Here are my environment details:

sessionInfo()
R version 4.2.3 (2023-03-15)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /xxxx/miniconda3/envs/gen_map/lib/libopenblasp-r0.3.24.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] qtl_1.60       magrittr_2.0.3

loaded via a namespace (and not attached):
[1] compiler_4.2.3 parallel_4.2.3 tools_4.2.3

Please let me know if you need more details.

Fred

@kbroman
Owner

kbroman commented Sep 18, 2023

Thanks! I really appreciate the excellent example.

The problem is in lines 199-201 of read.cross.csvs.R:

colnames(pheno) <- unlist(pheno[1,])
pheno <- apply(pheno, 2, function(a) { a[!is.na(a) & a==""] <- NA; a })
pheno <- as.data.frame(pheno[-1,], stringsAsFactors=TRUE)

The apply() function with a single-column data frame messes up the column names.
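
For anyone who wants to see the effect without the package, here is a minimal base-R sketch that steps through the three quoted lines on a toy data frame. The toy data and the intermediate names `step_apply`, `step_subset`, and `result` are assumptions made for illustration. In this sketch the column name appears to survive apply() itself and to be lost one step later, when the one-column matrix is subset with `[-1, ]` and collapses to a plain vector.

```r
# Toy stand-in for the raw phenotype table: the first row holds the header
# (this data frame is an assumption, not the data from the issue).
pheno <- data.frame(V1 = c("id", "F2A_1", "F2A_10", ""),
                    stringsAsFactors = FALSE)

# Line 1: promote the first row to column names.
colnames(pheno) <- unlist(pheno[1, ])

# Line 2: replace empty strings with NA, column by column.
step_apply <- apply(pheno, 2, function(a) { a[!is.na(a) & a == ""] <- NA; a })
dim(step_apply)        # still a one-column matrix
colnames(step_apply)   # the "id" name is still attached here

# Line 3, first half: dropping the header row uses the default drop = TRUE,
# so a one-column matrix collapses to a plain vector.
step_subset <- step_apply[-1, ]
is.null(dim(step_subset))

# Line 3, second half: the data frame is built from an unnamed vector,
# so the original column name is no longer available.
result <- as.data.frame(step_subset, stringsAsFactors = TRUE)
colnames(result)
```

With two or more columns, `[-1, ]` returns a matrix and the column names are kept, which would match the difference between cross1 and cross2 reported above.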

@kbroman kbroman added the bug label Sep 18, 2023
@fdchevalier
Author

I am glad the example helped.

So, a simple fix could be storing the column names and setting them after the data frame is created. Something like:

pnames <- unlist(pheno[1,])
pheno <- apply(pheno, 2, function(a) { a[!is.na(a) & a==""] <- NA; a })
pheno <- as.data.frame(pheno[-1,], stringsAsFactors=TRUE)
colnames(pheno) <- pnames
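
The suggestion above can be tried on toy data with base R alone. This is only a sketch: `clean_pheno` is a hypothetical wrapper around the four proposed lines, not a function in R/qtl, and the two toy data frames are made up for the check.

```r
# Hypothetical helper wrapping the four proposed lines; not part of R/qtl.
clean_pheno <- function(pheno) {
  pnames <- unlist(pheno[1, ])   # remember the header row before it can be lost
  pheno <- apply(pheno, 2, function(a) { a[!is.na(a) & a == ""] <- NA; a })
  pheno <- as.data.frame(pheno[-1, ], stringsAsFactors = TRUE)
  colnames(pheno) <- pnames      # restore the names on the final data frame
  pheno
}

# Toy inputs with the header in the first row (assumed shapes).
one_col <- data.frame(V1 = c("id", "F2A_1", "F2A_10"),
                      stringsAsFactors = FALSE)
two_col <- data.frame(V1 = c("id", "F2A_1", "F2A_10"),
                      V2 = c("wt", "1.5", ""),
                      stringsAsFactors = FALSE)

colnames(clean_pheno(one_col))
colnames(clean_pheno(two_col))
```

Another option that avoids the collapse to a vector in the first place would be `pheno[-1, , drop = FALSE]`. The thread does not say which approach the final fix uses.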

Happy to send a PR your way if you would like.

@kbroman
Owner

kbroman commented Sep 18, 2023

@fdchevalier I've got it fixed; thanks!

@kbroman kbroman closed this as completed Sep 18, 2023