
Added Nakayama et al. (2007) sarcoma data set. Also, bumped version to 0.2.1
@ramhiser committed Jan 10, 2013; commit 253bcddb58b7d9daec71c17ebbe1f5c16c076a54 (1 parent: e1d2a0e)
@@ -1,7 +1,7 @@
Package: datamicroarray
Title: Collection of Data Sets for Classification
-Version: 0.2
-Date: 2012-12-18
+Version: 0.2.1
+Date: 2013-01-09
Author: John A. Ramey <johnramey@gmail.com>
Maintainer: John A. Ramey <johnramey@gmail.com>
Description: A collection of scripts to download, process, and load
7 NEWS
@@ -0,0 +1,7 @@
+datamicroarray 0.2.1
+-------------
+
+NEW DATA SETS
+
+* Nakayama et al. (2007) Sarcoma Data Set
+* Sun et al. (2006) Glioma Data Set
@@ -27,12 +27,14 @@ describe_data <- function() {
c("gordon", 2002, 181, 12533, 2, "Lung Cancer"),
c("gravier", 2010, 168, 2905, 2, "Breast Cancer"),
c("khan", 2001, 63, 2308, 4, "SRBCT"),
+ c("nakayama", 2007, 105, 22283, 10, "Sarcoma"),
c("pomeroy", 2002, 60, 7128, 2, "CNS Tumor"),
c("shipp", 2002, 58, 6817, 2, "Lymphoma"),
c("singh", 2002, 102, 12600, 2, "Prostate Cancer"),
c("sorlie", 2001, 85, 456, 5, "Breast Cancer"),
c("su", 2002, 102, 5565, 4, "N/A"),
c("subramanian", 2005, 50, 10100, 2, "N/A"),
+ c("sun", 2006, 180, 54613, 4, "Glioma"),
c("tian", 2003, 173, 12625, 2, "Myeloma"),
c("west", 2001, 49, 7129, 2, "Breast Cancer"),
c("yeoh", 2002, 248, 12625, 6, "Leukemia")
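Each row passed to the table above is built with `c()`, which coerces the numeric fields to character. The following minimal sketch stacks such rows into a queryable data frame, using four rows copied from this hunk. The column names and the example query are assumptions for illustration, since the rest of `describe_data()` lies outside the diff.

```r
# Four rows copied from the describe_data() hunk; each row is
# c(name, year, n, p, K, disease). c() turns every field into character.
rows <- list(
  c("khan", 2001, 63, 2308, 4, "SRBCT"),
  c("sun", 2006, 180, 54613, 4, "Glioma"),
  c("gordon", 2002, 181, 12533, 2, "Lung Cancer"),
  c("yeoh", 2002, 248, 12625, 6, "Leukemia")
)

# Stack the character vectors into a matrix, then into a data frame.
# These column names are assumptions, not the package's own.
summary_df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
colnames(summary_df) <- c("author", "year", "n", "p", "K", "disease")

# Convert the count columns back from character to integer.
num_cols <- c("year", "n", "p", "K")
summary_df[num_cols] <- lapply(summary_df[num_cols], as.integer)

# Example query: multiclass data sets (K > 2), largest sample size first.
multiclass <- summary_df[summary_df$K > 2, ]
multiclass <- multiclass[order(-multiclass$n), ]
print(multiclass)
```

Without the `as.integer()` step, comparisons such as `K > 2` would compare strings rather than numbers.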
@@ -34,6 +34,8 @@ Each data set is listed below by the first author on the original paper. The dat
* [Tian (2003)](https://github.com/ramey/datamicroarray/wiki/Tian-%282003%29)
* Prostate Cancer
* [Singh (2002)](https://github.com/ramey/datamicroarray/wiki/Singh-%282002%29)
+* Sarcoma
+ * [Nakayama (2007)](https://github.com/ramey/datamicroarray/wiki/Nakayama-%282007%29)
* Small Round Blue Cell Tumors
* [Khan (2001)](https://github.com/ramey/datamicroarray/wiki/Khan-%282001%29)
* Miscellaneous
Binary file not shown.
@@ -0,0 +1,7 @@
+# Nakayama et al. (2007) Sarcoma Data Set
+# Downloads the data set from the Gene Expression Omnibus (GEO) database
+source('http://bioconductor.org/biocLite.R')
+biocLite('GEOquery')
+library('GEOquery')
+geo_obj <- getGEO('GDS2736')
+
@@ -0,0 +1,9 @@
+# Nakayama et al. (2007) Sarcoma Data Set
+nakayama_x <- Table(geo_obj)
+x <- unname(t(data.matrix(nakayama_x[, -c(1:2)])))
+colnames(x) <- nakayama_x[, 1]
+
+y <- Columns(geo_obj)$disease.state
+
+nakayama <- list(x = x, y = y)
+
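The processing script above turns the GDS table into a sample-by-gene matrix. The table has one row per probe and one column per sample, following the `ID_REF` and `IDENTIFIER` columns. The following minimal sketch applies the same reshaping to a tiny hypothetical table. The probe IDs, gene symbols, sample names, values, and class labels are invented stand-ins for `Table(geo_obj)` and `Columns(geo_obj)$disease.state`.

```r
# Hypothetical stand-in for Table(geo_obj): the first two columns hold the
# probe ID and gene symbol, and each remaining column holds one sample.
gds_table <- data.frame(
  ID_REF     = c("probe_1", "probe_2", "probe_3"),
  IDENTIFIER = c("GENE_A", "GENE_B", "GENE_C"),
  GSM_1      = c(5.1, 7.3, 2.2),
  GSM_2      = c(4.8, 6.9, 2.5),
  stringsAsFactors = FALSE
)

# Drop the two annotation columns, transpose so rows are samples,
# strip all dimnames, then label the columns by probe ID.
x <- unname(t(data.matrix(gds_table[, -c(1:2)])))
colnames(x) <- gds_table[, 1]

# Hypothetical class labels, one per sample (row of x).
y <- factor(c("Class 1", "Class 2"))

toy <- list(x = x, y = y)
print(dim(toy$x))
print(toy$x)
```

After the transpose, each row of `x` is one sample, so `y[i]` labels row `i`. This is the layout a classifier expects.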
@@ -0,0 +1,5 @@
+# Save a compressed version of the Nakayama et al. (2007) data set.
+# The 'xz' compression format will compress the data more than the
+# default 'gzip' format. However, 'xz' compression takes slightly longer
+# (about 2 seconds more) than 'gzip'.
+save(nakayama, file = "nakayama.RData", compress = "xz")
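The comment above says that 'xz' compresses more than 'gzip' but saves more slowly. The following minimal sketch lets you check that trade-off on your own machine. It uses a synthetic list shaped like `nakayama`, and the dimensions, seed, and class labels are arbitrary assumptions rather than the real data.

```r
# Synthetic stand-in for the real data: 105 samples x 2000 features,
# rounded to 2 decimals so there is some redundancy to compress.
set.seed(1)
nakayama <- list(
  x = matrix(round(rnorm(105 * 2000), 2), nrow = 105),
  y = factor(sample(paste("class", 1:10), 105, replace = TRUE))
)

formats <- c("gzip", "bzip2", "xz")
sizes <- setNames(numeric(length(formats)), formats)
times <- setNames(numeric(length(formats)), formats)

for (fmt in formats) {
  path <- tempfile(fileext = ".RData")
  # Time the save() call and record the resulting file size in bytes.
  times[fmt] <- system.time(save(nakayama, file = path, compress = fmt))[["elapsed"]]
  sizes[fmt] <- file.size(path)
}
print(sizes)
print(times)

# Reload the last file written (the "xz" one) into a fresh environment
# and confirm that the round trip is lossless.
env <- new.env()
load(path, envir = env)
stopifnot(identical(env$nakayama, nakayama))
```

Exact sizes and timings depend on the data and the machine, so the sketch prints them and does not assert an ordering.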
