Skip to content
This repository

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP
Browse code

imported source for r-cran-catools

  • Loading branch information...
commit 4fe8f019f0ceed3f64eb392e882b14420d777aab 1 parent 6c1e1bf
Scott Smith authored
90  CHANGES
... ...
@@ -0,0 +1,90 @@
  1
+Changes:
  2
+1) caMassClass-1.0 to caTools-1.0 - caTools created after splitting caMassClass
  3
+   package:
  4
+  - bin2raw and raw2bin files: endian variable was added
  5
+  - raw2bin: additional error checking
  6
+  - runmean, runmax, runmin, runquantile, runmad: additional error checking and
  7
+    corrected error checking
  8
+  - EndRule: added error checking
  9
+  - runmean, runmax, runmin: additional parameter "alg" was added
  10
+  - runmean - new C code can survive round-off errors
  11
+2) caTools-1.1  (7/15/2005)
  12
+  - read.GIF and write.GIF files were added.
  13
+  - bin2raw and raw2bin files: much faster raw to numeric conversion
  14
+  - sum.exact, cumsum.exact and runsum.exact  functions were added for 
  15
+    under-flow safe addition.
  16
+3) caTools-1.2 (8/8/2005)
  17
+  - minor changes to .Rd files to fix problems found by new 'checkDocFiles'
  18
+    function.
  19
+4) caTools-1.3 (9/23/2005)
  20
+  - changes to colAUC function. Parameters: plotROC and alg were added. 
  21
+    Parameter 'p.val' was removed, since it gave wrong results in case of data 
  22
+    with ties. And it was too complicated to fix.
  23
+  - added much better testing of "colAUC" in Example section.
  24
+4) caTools-1.4 (9/28/2005)
  25
+  - a small change in 1.3 that used call to 'max' instead of 'pmax' made 
  26
+    'colAUC' return wrong numbers in case of multi-feature data.
  27
+5) caTools-1.5 (11/10/2005)
  28
+  - several new examples
  29
+  - functions raw2bin and bin2raw still work but return warnings. They will be 
  30
+    retired in the next version, since they parallel new capabilities of 
  31
+    readBin and writeBin.
  32
+  - Argument 'col' of function 'write.gif' was changed to allow several other 
  33
+    ways to define a color palette.
  34
+  - base64encode and base64decode now use readBin and writeBin instead
  35
+    raw2bin and bin2raw.
  36
+ 6) caTools-1.6 (04/11/2006)
  37
+  - functions raw2bin and bin2raw were retired, since they parallel new 
  38
+    (as of R-2.2.0) capabilities of readBin and writeBin.
  39
+  - Bug in plotting in colAUC function was fixed, after it was reported by Tom 
  40
+    Wright.
  41
+  - Documentation of colAUC was updated and some examples modified in order to 
  42
+    reduce dependency on external libraries.
  43
+  - GifTools.cc was split into GifTools.cpp and GifTools.cc. The first file
  44
+    contains GIF encoding/decoding algorithm and second is a shell that 
  45
+    comunicates with R.
  46
+ 7) caTools-1.7 (Jan 31 2007)
  47
+  - Added 4th variable in 'rgb' function (line 174).
  48
+ 8) caTools-1.8 (Oct. 9 2007)
  49
+  - Major changes to moving window statistics functions:
  50
+    * use fast C code to process array edges
  51
+    * added suport for NaN's
  52
+    * added suport for even size windows
  53
+    * split help file into 5 new help files
  54
+    * added much more examples and self-tests
  55
+  - Followed Prof. Brian Ripley request to move R header files out of extern "C" 
  56
+    blocks in C++ files.
  57
+  - Changed licence to GPL
  58
+  9) caTools-1.9 (July 7 2008)
  59
+  - No code changes
  60
+  - Corrected inconsistent licence information
  61
+  10) caTools-1.10 (Oct 08 2009)
  62
+  - Fixed runmax to handle correctly negative numbers
  63
+  - Added align argument to moving window statistics functions: runmean, runmax, 
  64
+    runmin, runsd, runmad and runquintile functions. It allows left and right 
  65
+    alligned window in addition to centered window. 
  66
+  - Added support for 2D arrays to be passed to moving window statistics 
  67
+    functions: runmean, runmax, runmin, runsd, runmad and runquintile functions.
  68
+    If input array is a 2D matrix than the operations are performed for each 
  69
+    column separately. This change was mostly handled by EndRule function. Speed
  70
+    for 2D array input is expected to be slower than for vectors since 
  71
+    calculations for beggining and end of arrays if handled in R instead of C.
  72
+    is expected to be 
  73
+  - Changed "if(!require(LIB)) warning(..." to ""library(LIB)" to get rid or 
  74
+    warnings by RCMD CHECK
  75
+  11) caTools-1.11 (Dec 2010)  
  76
+  - Changed EndRule function used by all run... functions to fix handling of 
  77
+    different endrules when for non-central aligments and for k=2.
  78
+  - Fully retired runsum.exact, which was not working for a while, use runmean
  79
+    with "exact" option instead.
  80
+  - Removed references to several no-longer-existing packages and functions from 
  81
+    Rd files.  
  82
+  - changed documentation of predict.LogitBoost to follow S3 mathod syntax.
  83
+  - renamed sum.exact and cumsum.exact to sumexact and cumsumexact to avoid 
  84
+    package build errors. those functions were confused with S3 mathods.
  85
+====================================================\==========================                 
  86
+ Jarek Tuszynski, PhD.                           o / \ 
  87
+ Science Applications International Corporation  <\__,|  
  88
+ Jaroslaw.W.Tuszynski@saic.com                    ">   \
  89
+                                                   `    \
  90
+
17  DESCRIPTION
... ...
@@ -0,0 +1,17 @@
  1
+Package: caTools
  2
+Version: 1.11
  3
+Date: December 6, 2010
  4
+Title: Tools: moving window statistics, GIF, Base64, ROC AUC, etc.
  5
+Author: Jarek Tuszynski <jaroslaw.w.tuszynski@saic.com>
  6
+Maintainer: Jarek Tuszynski <jaroslaw.w.tuszynski@saic.com>
  7
+Depends: R (>= 2.2.0), bitops
  8
+Suggests: MASS, rpart
  9
+Description: Contains several basic utility functions including: moving
  10
+        (rolling, running) window statistic functions, read/write for
  11
+        GIF and ENVI binary files, fast calculation of AUC, LogitBoost
  12
+        classifier, base64 encoder/decoder, round-off error free sum
  13
+        and cumsum, etc.
  14
+License: GPL-3
  15
+Repository: CRAN
  16
+Date/Publication: 2010-12-17 07:35:29
  17
+Packaged: 2010-12-16 17:17:05 UTC; tuszynskij
113  R/ENVI.R
... ...
@@ -0,0 +1,113 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+write.ENVI = function(X, filename, interleave=c("bsq", "bil", "bip") ) 
  8
+{ # write matrix or data cube to binary ENVI file
  9
+  if (is.vector(X)) {
  10
+    nCol = length(X)
  11
+    nRow <- nBand <- 1
  12
+  } else {
  13
+    d = dim(X)
  14
+    nRow  = d[1]
  15
+    nCol  = d[2]
  16
+    nBand = prod(d)/(nRow*nCol)
  17
+  }
  18
+  dim(X) = c(nRow, nCol, nBand)    # make it into 3D array in case it was not
  19
+  
  20
+  # check data type
  21
+  data.type = 0
  22
+  if (is.double (X)) data.type = 5 # 64-bit double
  23
+  if (is.integer(X)) data.type = 3 # 32-bit int
  24
+  if (is.complex(X)) data.type = 9 # 2x64-bit complex<double>
  25
+  if (data.type == 0) {            # do not know what is it -> make it a double
  26
+    X = as.double(X) 
  27
+    data.type = 5 
  28
+  } 
  29
+  
  30
+  # change interleave and store tha data
  31
+  interleave = match.arg(interleave)
  32
+  if      (interleave=="bil") X=aperm(X, c(2,3,1))  # R's [row,col,band] -> bil [col,band,row] 
  33
+  else if (interleave=="bip") X=aperm(X, c(3,2,1))  # R's [row,col,band] -> bip [band,col,row] 
  34
+  else if (interleave=="bsq") X=aperm(X, c(2,1,3))  # R's [row,col,band] -> bsq [col,row,band] 
  35
+  writeBin(as.vector(X), filename)                  # write Envi file
  36
+
  37
+  # write header file
  38
+  out  = "ENVI\ndescription = { R-language data }\n"
  39
+  out  = paste(out, "samples = ", nCol, "\n", sep="")
  40
+  out  = paste(out, "lines = ", nRow, "\n", sep="")
  41
+  out  = paste(out, "bands = ", nBand, "\n", sep="")
  42
+  out  = paste(out, "data type = ",data.type,"\n", sep="")
  43
+  out  = paste(out, "header offset = 0\n", sep="")
  44
+  out  = paste(out, "interleave = ",interleave,"\n", sep="")   # interleave is assumed to be bsq - in case of 1 band images all 3 formats are the same 
  45
+  ieee = if(.Platform$endian=="big") 1 else 0       # does this machine uses ieee (UNIX) format? or is it intel format?
  46
+  out  = paste(out, "byte order = ", ieee, "\n", sep="")
  47
+  cat(out, file=paste(filename, ".hdr", sep=""))
  48
+  invisible(NULL)
  49
+}
  50
+
  51
+# =======================================================================================
  52
+
  53
+read.ENVI = function(filename, headerfile=paste(filename, ".hdr", sep=""))  
  54
+{  # read matrix or data cube from binary ENVI file
  55
+  
  56
+  # parse header file 
  57
+  nCol <- nRow <- nBand <- data.type <- header.offset <- byte.order <- (-1)
  58
+  interleave = "bsq"
  59
+  if (!file.exists(headerfile)) stop("read.ENVI: Could not open input header file: ", headerfile)
  60
+  Lines  = read.table(headerfile, sep="=", strip.white=TRUE, row.names = NULL, as.is=TRUE, fill=TRUE)
  61
+  Fields = c("samples", "lines", "bands", "data type", "header offset", "interleave", "byte order")
  62
+  for (i in 1:nrow(Lines)) {
  63
+    Lab = tolower(Lines[i,1])
  64
+    Lab = gsub("[ ]+", " ", Lab) # Replace all multiple spaces with a single space 
  65
+    j = match(Lab, Fields)
  66
+    Val = Lines[i,2]
  67
+    if (length(j) == 1)
  68
+     switch( j, 
  69
+       nCol          <- as.integer(Val),
  70
+       nRow          <- as.integer(Val),
  71
+       nBand         <- as.integer(Val),
  72
+       data.type     <- as.integer(Val),
  73
+       header.offset <- as.integer(Val),
  74
+       interleave    <- gsub(" ", "", Val),
  75
+       byte.order    <- as.integer(Val)
  76
+     )
  77
+   }
  78
+
  79
+  if (nCol <= 0 | nRow <= 0 | nBand <= 0) 
  80
+    stop("read.ENVI: Error in input header file ", headerfile, " data sizes missing or incorrect", nRow, nCol, nBand)
  81
+  if (! ( data.type %in% c(1,2,3,4,5,9,12) ) ) 
  82
+    stop("read.ENVI: Error in input header file ", headerfile, " data type is missing, incorrect or unsupported ")
  83
+  
  84
+  # read the data binary file
  85
+  ieee = if(.Platform$endian=="big") 1 else 0       # does this machine uses ieee (UNIX) format? or is it intel format?
  86
+  endian = if(ieee==byte.order | byte.order<0) .Platform$endian else "swap"  
  87
+  size   = nRow*nCol*nBand
  88
+  if (!file.exists(filename)) stop("read.ENVI: Could not open input file: ", filename)
  89
+  f = file(filename, "rb")
  90
+  if (header.offset>0) readBin(f, raw(), n=header.offset)
  91
+  switch( data.type, 
  92
+    X <- readBin(f, integer(), n=size, size=1, signed=FALSE),  # data.type==1 -> 1-byte unsigned integer (char)
  93
+    X <- readBin(f, integer(), n=size, size=2, endian=endian), # data.type==2 -> 2-byte short 
  94
+    X <- readBin(f, integer(), n=size, endian=endian),         # data.type==3 -> 4-byte int
  95
+    X <- readBin(f, double() , n=size, size=4, endian=endian), # data.type==4 -> 4-byte float 
  96
+    X <- readBin(f, double() , n=size, endian=endian), , , ,   # data.type==5 -> 8-byte double
  97
+    X <- readBin(f, complex(), n=size, endian=endian), , ,     # data.type==9 -> 2x8-byte complex<double>
  98
+    X <- readBin(f, integer(), n=size, size=2, endian=endian, signed=FALSE) # data.type==12 -> 2-byte unsigned short integer
  99
+  )
  100
+  close(f)
  101
+
  102
+  Fields = c("bil", "bip", "bsq")
  103
+  j = match(interleave, Fields)
  104
+  if (length(j)==0) stop("read.ENVI: Error in input header file ", headerfile, " incorrect interleave type")
  105
+  switch(j,        
  106
+   { dim(X)<-c(nCol,nBand,nRow); X<-aperm(X, c(3,1,2)); }, # bil [col,band,row] -> R's [row,col,band]
  107
+   { dim(X)<-c(nBand,nCol,nRow); X<-aperm(X, c(3,2,1)); }, # bip [band,col,row] -> R's [row,col,band]
  108
+   { dim(X)<-c(nCol,nRow,nBand); X<-aperm(X, c(2,1,3)); }  # bsq [col,row,band] -> R's [row,col,band]
  109
+  )
  110
+  if (nBand==1) dim(X)=c(nRow, nCol)
  111
+  return(X)
  112
+} 
  113
+
179  R/GIF.R
... ...
@@ -0,0 +1,179 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+write.gif = function(image, filename, col="gray", 
  8
+            scale=c("smart", "never", "always"), transparent=NULL, 
  9
+            comment=NULL, delay=0, flip=FALSE, interlace=FALSE)
  10
+{
  11
+  if (!is.character(filename)) stop("write.gif: 'filename' has to be a string")
  12
+  if (length(filename)>1) filename = paste(filename, collapse = "")  # combine characters into a string
  13
+
  14
+  #======================================
  15
+  # cast 'image' into a proper dimentions
  16
+  #======================================
  17
+  dm = dim(image)
  18
+  if (is.null(dm)) stop("write.gif: input 'x' has to be an matrix or 3D array")
  19
+  if (length(dm)<=2) { # this is a 2D matrix or smaller
  20
+    image = as.matrix(image)   # cast to 2D matrix
  21
+    if (flip) x = image[,dm[2]:1]
  22
+    else x = t(image)
  23
+  } else {             # 3D data cube or bigger
  24
+    dim(image) = c(dm[1], dm[2], prod(dm)/(dm[1]*dm[2])) # cast to 3D
  25
+    if (flip) x = image[,dm[2]:1,]
  26
+    else x = aperm(image, c(2,1,3))
  27
+  }
  28
+  image = 0            # release memory
  29
+  dm = dim(x)          # save dimentions and ...
  30
+  x = as.vector(x)     # convert to 1D vector
  31
+  
  32
+  #=================================
  33
+  # scale x into a proper range
  34
+  #=================================
  35
+  scale = match.arg(scale)
  36
+  if (!is.null(transparent)) 
  37
+   if ((transparent<0) || (transparent>255)) 
  38
+    stop("write.gif:'transparent' has to be an integer between 0 and 255")
  39
+  mask = !is.finite(x)
  40
+  xx = 0
  41
+  mColor = 255
  42
+  if (any(mask)) {  # some non-finite numbers were found
  43
+    if (is.null(transparent)) mColor = 254
  44
+    xx = x          # save original x
  45
+    x  = x[!mask]   # remove non-finite numbers
  46
+  }
  47
+  minx = min(x)
  48
+  maxx = max(x)
  49
+  d = mColor/(maxx-minx)
  50
+  if (scale=="never") {
  51
+    if ((minx<0) || (maxx>mColor)) 
  52
+     warning("write.gif: 'x' is not in proper range and 'scale' is set to 'never',",
  53
+     " clipping 'x' to proper range ")
  54
+    if (minx<0     ) x[x<0     ] = 0 
  55
+    if (maxx>mColor) x[x>mColor] = mColor 
  56
+  } else
  57
+  if (scale=="always") {
  58
+    if ((minx>=0) && (maxx<=1)) 
  59
+      x  = mColor*x    # doubles between [0 and 1] -> scale them
  60
+    else 
  61
+      x = (x-minx)*d   # numbers outside allowed range -> scale them
  62
+  } else
  63
+  if (scale=="smart") {
  64
+    if ((minx<0) || (maxx>mColor)) {
  65
+      x = (x-minx)*d   # numbers outside allowed range -> scale them
  66
+    } else if ((minx>=0) && (maxx<=1)) {
  67
+      if (any(x!=as.integer(x))) x = mColor*x    # doubles between [0 and 1] -> scale them
  68
+    }
  69
+  }
  70
+  maxx = max(x)
  71
+
  72
+  if (length(xx)>1) { # some non-finite numbers were found
  73
+    if (is.null(transparent)) transparent = maxx+1
  74
+    xx[ mask] = transparent
  75
+    xx[!mask] = x
  76
+    x = xx
  77
+  }
  78
+  if (is.null(transparent)) transparent = -1
  79
+  x = as.integer(round(x))
  80
+  
  81
+  #=================================
  82
+  # format color palette
  83
+  #=================================
  84
+  n = maxx+1
  85
+  if (is.character(col) && length(col)==1) {
  86
+    if (col %in% c("grey", "gray")) col = gray(0:n/n)
  87
+    if (col=="jet") 
  88
+      col = colorRampPalette(c("#00007F", "blue", "#007FFF", "cyan", "#7FFF7F", 
  89
+            "yellow", "#FF7F00", "red", "#7F0000")) # define "jet" palette
  90
+  }
  91
+  if (length(col)==1) { # if not a vector than maybe it is a palette function
  92
+    FUN = match.fun(col) # make sure it is a function not a string with function name
  93
+    col = FUN(n)
  94
+  }
  95
+  crgb  = col2rgb(col)
  96
+  Palette = as.integer(c(256^(2:0) %*% crgb)) # convert to internal int format
  97
+  nColor = length(Palette)
  98
+  if (nColor<maxx) 
  99
+    stop("write.gif: not enough colors in color palette 'col'. Has ",nColor,
  100
+         " need at least ", maxx)
  101
+  if (nColor<256) Palette = c(Palette, rep(0,256-nColor)) # pad it
  102
+  
  103
+  # format and cast other input variables into proper format
  104
+  param = as.integer(c( dm[2], dm[1], prod(dm)/(dm[1]*dm[2]), nColor, transparent, delay, interlace, 0 ))
  105
+  if (is.null(comment)) comment = as.character("")
  106
+  else comment = as.character(comment)
  107
+  # call C++ function
  108
+  .C("imwritegif", filename, x, Palette, param, comment,
  109
+     NAOK=FALSE, PACKAGE="caTools") 
  110
+  if (param[7]<0) stop("write.gif: cannot open the output file (connection)")
  111
+  invisible(NULL)
  112
+}
  113
+
  114
+#==============================================================================
  115
+
  116
+read.gif = function(filename, frame=0, flip=FALSE, verbose=FALSE)
  117
+{
  118
+  if (!is.character(filename)) stop("write.gif: 'filename' has to be a string")
  119
+  if (length(filename)>1) filename = paste(filename, collapse = "")  # combine characters into a string
  120
+  isURL = length(grep("^http://", filename)) | 
  121
+          length(grep("^ftp://",  filename)) | 
  122
+          length(grep("^file://", filename))
  123
+  if(isURL) {
  124
+    tf <- tempfile()
  125
+    download.file(filename, tf, mode='wb', quiet=TRUE)
  126
+    filename = tf
  127
+  }
  128
+
  129
+  x = .Call("imreadgif", filename, as.integer(frame), as.integer(verbose), 
  130
+       PACKAGE="caTools") 
  131
+  comt = as.character(attr(x, 'comm'))
  132
+  if (isURL) file.remove(filename)
  133
+
  134
+  nRow    = x[1]
  135
+  nCol    = x[2]
  136
+  nBand   = x[3]
  137
+  tran    = x[4]
  138
+  success = x[5]
  139
+  nPixel  = nRow*nCol*nBand
  140
+  stats = -success
  141
+  if (stats>=6)  {
  142
+    warning("write.gif: file '", filename, 
  143
+      "' contains multiple color-maps. Use 'frame' > 0.") 
  144
+    stats = stats-6
  145
+  }
  146
+  if (nPixel==0) {
  147
+    switch (stats,
  148
+    stop("write.gif: cannot open the input file: ", filename, call.=FALSE),
  149
+    stop("write.gif: input file '", filename, "' is not a GIF file", call.=FALSE),
  150
+    stop("write.gif: unexpected end of file: ", filename, call.=FALSE),
  151
+    stop("write.gif: syntax error in file: ", filename, call.=FALSE) )
  152
+  } else {
  153
+    switch (stats, , , 
  154
+    warning("write.gif: unexpected end of file: ", filename, call.=FALSE),
  155
+    warning("write.gif: syntax error in file: ", filename, call.=FALSE),
  156
+    warning("write.gif: file '", filename,
  157
+      "' contains multiple images (frames) of uneven length. Use 'frame' > 0." , call.=FALSE))
  158
+  }   
  159
+  Palette = x[ 10:265 ]
  160
+  x       = x[-(1:265)] # delete non image data
  161
+  if (nBand>1) { # 3D data cubes
  162
+    dim(x)  = c(nCol, nRow, nBand)
  163
+    if (flip) x = x[,ncol(x):1,]
  164
+    else x = aperm(x, c(2,1,3))
  165
+  } else {       # this is a matrix
  166
+    dim(x) = c(nCol, nRow)
  167
+    if (flip) x = x[,ncol(x):1]
  168
+    else x = t(x)
  169
+  }
  170
+  Palette = Palette[Palette>=0]
  171
+  red     = bitAnd(bitShiftR(Palette,16), 255)
  172
+  green   = bitAnd(bitShiftR(Palette, 8), 255)
  173
+  blue    = bitAnd(          Palette    , 255)
  174
+  Palette = rgb (red, green, blue, 255, maxColorValue = 255)
  175
+  if (tran==-1) tran = NULL
  176
+  return (list(image=x, col=Palette, transparent=tran, comment=comt))
  177
+}
  178
+
  179
+# source("c:/programs/R/rw2011/src/library/caTools/R/GIF.R")
201  R/LogitBoost.R
... ...
@@ -0,0 +1,201 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+LogitBoost = function(xlearn, ylearn, nIter=ncol(xlearn))
  8
+#   An implementation of the LogitBoost classification algorithm with
  9
+#   decision stumps as weak learners. 
  10
+#   
  11
+# Input:
  12
+#   xlearn - A dataset, whose n rows contain the training instances.
  13
+#   ylearn - Class labels of dataset
  14
+#   nIter - An integer, describing the number of iterations for
  15
+#     which boosting should be run. 
  16
+#     
  17
+# Output:
  18
+#   Stump - list of decision stumps used:
  19
+#            column 1: contains feature numbers or each stump
  20
+#            column 2: contains threshold
  21
+#            column 3: contains bigger/smaller info: 1 means (col>thresh ? class_1 : class_2);  -1 means (col>thresh ? class_2 : class_1)
  22
+#
  23
+# Writen by Jarek Tuszynski - SAIC (jaroslaw.w.tuszynski@saic.com)
  24
+# This code was addapted from logitboost.R function written by Marcel Dettling
  25
+# See "Boosting for Tumor Classification of Gene Expression Data", Dettling and Buhlmann (2002), 
  26
+#   available on the web page http://stat.ethz.ch/~dettling/boosting.html
  27
+{
  28
+  lablist = sort(unique(ylearn))     # different classes in label array
  29
+  nClass  = length(lablist)          # number of different classes in label array
  30
+  
  31
+  if (nClass>2) {                    # Multi class version uses 2-class code ...
  32
+    Stump = NULL                     # ... recursivly
  33
+    for (jClass in 1:nClass) {
  34
+      y = as.numeric(ylearn!=lablist[jClass]) # lablist[jClass]->1; rest->0
  35
+      Stump = cbind(Stump, LogitBoost(xlearn, y, nIter)$Stump)
  36
+    }
  37
+    object = list(Stump=Stump, lablist=lablist) # create LogitBoost object
  38
+    class(object) <- "LogitBoost"
  39
+    return(object)
  40
+  }
  41
+  
  42
+  Mask = is.na(xlearn)                # any NA in test data will ... 
  43
+  if (any(Mask)) xlearn[Mask] = Inf   # ... be changed to +Inf
  44
+  ylearn = as.numeric(ylearn!=lablist[1]) # change class labels to boolean 
  45
+  nLearn = nrow(xlearn)               # Length of training data         
  46
+  nFeat  = ncol(xlearn)               # number of features to choose from         
  47
+ 
  48
+  # Array Initialization
  49
+  f      = 0                          # range -1 or 1          
  50
+  p      = numeric(nLearn)+1/2        # range [0,1]
  51
+  Stump  = matrix(0, nIter,3)         # will hold the results
  52
+  colnames(Stump) = c("feature", "threshhold", "sign")
  53
+  Thresh =            matrix(0, nLearn, nFeat)  # sorted xlearn each fearure at a time
  54
+  Index  = matrix(as.integer(0), nLearn, nFeat) # order of samples in Thresh
  55
+  Mask   = matrix(as.logical(0), nLearn, nFeat) # mask of unique samples in Thresh 
  56
+  repts  = as.logical(numeric(nFeat))           # for each feature: are all samples unique
  57
+ 
  58
+  # sort all columns and store them in Thresh; Index stores original positions
  59
+  for (iFeat in 1:nFeat) {  
  60
+    x = sort(xlearn[,iFeat], index=TRUE)
  61
+    Thresh[,iFeat] = x[[1]]
  62
+    Index [,iFeat] = x[[2]]
  63
+    Mask  [,iFeat] = c((diff(Thresh[,iFeat])!=0), FALSE) 
  64
+    repts [ iFeat] = !all(Mask[,iFeat])
  65
+  }
  66
+
  67
+  # Boosting Iterations
  68
+  jFeat = 0
  69
+  for (m in 1:nIter) {
  70
+    # Computation of working response and weights
  71
+    w = pmax(p*(1-p), 1e-24) # weight for each sample
  72
+    z = (ylearn-p)/w
  73
+    w = w/sum(w)             # normalize to 1
  74
+
  75
+    # older version similar to rpart (more readable but slower) left as documentation
  76
+      # MinLS = 1000                  # initialize search for minimum Least-Square
  77
+      # for (iFeat in 1:nFeat) {      # for each feature do: for each spliting point calculate least square regresion of z to xlearn with weights w.
  78
+        # Col = xlearn[,iFeat]        # sort all columns and store each one in thr
  79
+        # thr = Thresh[,iFeat]        # sort all columns and store each one in thr
  80
+        # for (j in 1:nLearn) {       # for every possible threshold
  81
+          # ff = 2*(Col<=thr[j])-1     # for every point in the column if (col<=Thresh) then f=1 else f=-1 
  82
+          # LS1[j] = sum(w*(z-ff)^2)   # Least Square array for every possible theshold  ( f = (col<=Thresh ? 1 : -1) )
  83
+          # LS2[j] = sum(w*(z+ff)^2)   # Least Square array for every possible theshold  after swaping output classes ( f = (col<sp ? -1 : 1) )
  84
+        # }
  85
+        # iLS1 = which.min(LS1)
  86
+        # iLS2 = which.min(LS2)
  87
+        # vLS1 = LS1[iLS1]
  88
+        # vLS2 = LS2[iLS2]      
  89
+        # if (MinLS>vLS1) { stump=c(iFeat, thr[iLS1],  1); MinLS=vLS1; }
  90
+        # if (MinLS>vLS2) { stump=c(iFeat, thr[iLS2], -1); MinLS=vLS2; }
  91
+      # }
  92
+      
  93
+    # Same as code above but faster:
  94
+    # for each feature do: for each spliting point calculate least square regresion of z to xlearn with weights w.
  95
+    # This part of the code is my version of rpart object.
  96
+    # LS1 is least square array LS1 = sum((w.*(z-f)).^2) where f = (col<=sp ? 1 : -1) LS1 is calculated for all spliting points
  97
+    # LS2 is least square array LS2 = sum((w.*(z-f)).^2) where f = (col<=sp ? -1 : 1) LS2 is calculated for all spliting points
  98
+    # for each spliting point col(idx) we will take one sample and change its sign, that will change current least square:
  99
+    # LS(i) = LS(i) - ( w(idx(i)) .* ( z(idx(i)) - (-1) ) ).^2   +   ( w(idx(i)) .* ( z(idx(i)) - 1 ) ).^2 what can be simplified to:
  100
+    # LS(i) = LS(i) - 4*(w(idx(i)).^2) .*  z(idx(i))
  101
+    ls1 = sum(w*(z+1)^2)     # Least Square value for left-most theshold  ( f = -1 )
  102
+    ls2 = sum(w*(z-1)^2)     # Least Square value for left-most theshold after swaping output classes ( f =  1 )
  103
+    MinLS = max(ls1,ls2)     # initialize search for minimum Least-Square
  104
+    wz  = 4 * w * z          # precompute vector to be used later
  105
+    for (iFeat in 1:nFeat) { # for each feature do: for each spliting point calculate least square regresion of z to xlearn with weights w.
  106
+      if (iFeat==jFeat) next # Prevent the simplest cycle
  107
+      Col = Thresh[,iFeat]   # get one column of sorted data
  108
+      LS  = cumsum(wz[Index[,iFeat]])  # find offset to Least Square value for every possible theshold
  109
+      if (repts[iFeat]) {    # if any repeating thresholds than delete all but last LS value
  110
+        mask = Mask[,iFeat]  # use mask of unique samples 
  111
+        Col = Col[mask]      # delete non-unique thresholds since they can cause errors
  112
+        LS  = LS [mask]      # delete coressponding Least Square values         
  113
+      }
  114
+      iLS1 = which.max(LS)   # min of LS1=ls1-LS - Least Square value for every possible theshold  ( f = (col<=Thresh ? 1 : -1) )
  115
+      iLS2 = which.min(LS)   # min of LS2=ls2+LS - Least Square value for every possible theshold after swaping output classes ( f = (col<Thresh ? -1 : 1) )
  116
+      vLS1 = ls1-LS[iLS1]
  117
+      vLS2 = ls2+LS[iLS2]
  118
+      if (MinLS>vLS1) { stump=c(iFeat, Col[iLS1],  1); MinLS=vLS1; }
  119
+      if (MinLS>vLS2) { stump=c(iFeat, Col[iLS2], -1); MinLS=vLS2; }
  120
+    }
  121
+ 
  122
+    # =================
  123
+    # Fitting the tree
  124
+    # =================
  125
+    Stump[m,] = stump                                     # if stump[3]>0  f(i) += (xlearn(i,stump[1])<=stump[2] ? 1 : -1)
  126
+    jFeat = stump[1]
  127
+    f = f + stump[3] * (2*( xlearn[,jFeat]<=stump[2] )-1) # if stump[3]<0  f(i) += (xlearn(i,stump[1])> stump[2] ? 1 : -1)
  128
+    p = 1/(1+exp(-f))    # Updating and probabilities - range (0, 1]
  129
+    
  130
+    y = (f>0)
  131
+    y[f==0] = 0.5
  132
+    Conv = sum(abs(ylearn-y)) # keep track of error rate
  133
+  }
  134
+  object = list(Stump=Stump, lablist=lablist)  # create LogitBoost object
  135
+  class(object) <- "LogitBoost"
  136
+  return(object)
  137
+}
  138
+
  139
+
  140
+predict.LogitBoost = function(object, xtest, type = c("class", "raw"), nIter=NA, ...)
  141
+#   An implementation of the LogitBoost classification algorithm with
  142
+#   decision stumps as weak learners. 
  143
+#   
  144
+# Input:
  145
+#   xtest - A dataset, whose n rows contain samples to be classified.
  146
+#   ytest - Class labels of dataset (if given than Conv will return error rates as function of iterations)
  147
+#     
  148
+# input:
  149
+#   Stump - list of decision stumps used:
  150
+#            column 1: contains feature numbers or each stump
  151
+#            column 2: contains threshold
  152
+#            column 3: contains bigger/smaller info: 1 means (col>Thresh ? class_1 : class_2);  -1 means (col>Thresh ? class_2 : class_1)
  153
+#
  154
+# Writen by Jarek Tuszynski - SAIC (jaroslaw.w.tuszynski@saic.com)
  155
+# This code was addapted from logitboost.R function written by Marcel Dettling
  156
+# See "Boosting for Tumor Classification of Gene Expression Data", Dettling and Buhlmann (2002), 
  157
+#   available on the web page http://stat.ethz.ch/~dettling/boosting.html
  158
+{
  159
+  type    = match.arg(type)
  160
+  Stump   = object$Stump
  161
+  lablist = object$lablist
  162
+  if (is.na(nIter)) nIter = nrow(Stump)
  163
+  else nIter  =  min(nIter, nrow(Stump))
  164
+  nTest   = nrow(xtest)
  165
+  nClass  = ncol(Stump)/3
  166
+  if (nClass==1) nClass=2
  167
+  Prob    = matrix(0,nTest, nClass)
  168
+  colnames(Prob) = lablist
  169
+  if (nClass>2) {                     # multi class problem
  170
+    object$lablist = c(1,2)           # generic labels 
  171
+    for(iClass in 1:nClass) {
  172
+      object$Stump = Stump[,3*iClass + (-2:0)]
  173
+      prob = predict.LogitBoost(object, xtest, type="raw", nIter=nIter)
  174
+      Prob[,iClass] = prob[,1] # probability that "sample belongs to iClass" is true 
  175
+    }
  176
+  } else {                            # two class problem
  177
+    Mask    = is.na(xtest)            # any NA in test data will... be changed 
  178
+    if (any(Mask)) xtest[Mask] = Inf  # be changed to +Inf
  179
+    f = numeric(nTest)           
  180
+    for (iter in 1:nIter) {
  181
+      iFeat  = Stump[iter,1]
  182
+      thresh = Stump[iter,2]
  183
+      Sign   = Stump[iter,3]
  184
+      f = f + Sign*(2*(xtest[,iFeat]<=thresh)-1)
  185
+    }
  186
+    Ytest = (f>0)
  187
+    Ytest[f==0] = 0.5
  188
+    prob = 1/(1+exp(-f))
  189
+    Prob[,1] = 1-prob # probability that "sample belongs to 1" 
  190
+    Prob[,2] =   prob # probability that "sample belongs to 2" 
  191
+  }
  192
+  if (type=="raw") RET=Prob        # request to return raw probabilities
  193
+  else {                           # otherwise assign labels
  194
+    ord = t(apply(-Prob, 1, order))# find order of sorted Probs 
  195
+    RET = lablist[ord[,1]]         # find label with highest Prob
  196
+    Prob = -t(apply(-Prob,1,sort)) # sort Probs
  197
+    RET[Prob[,1]==Prob[,2]] = NA   # in case of ties return NA's
  198
+  }
  199
+  return (RET)
  200
+}
  201
+
133  R/base64.R
... ...
@@ -0,0 +1,133 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+#===============================================================================
  8
+# The Base64 encoding is designed to encode arbitrary binary information for 
  9
+# transmission by electronic mail. It is defined by MIME (Multipurpose Internet 
  10
+# Mail Extensions) specification RFC 1341, RFC 1421, RFC 2045 and others. 
  11
+# Triplets of 8-bit octets are encoded as groups of four characters, each 
  12
+# representing 6 bits of the source 24 bits. Only a 65-character subset 
  13
+# ([A-Z,a-z,0-9,+,/,=]) present in all variants of ASCII and EBCDIC is used, 
  14
+# enabling 6 bits to be represented per printable character
  15
+#===============================================================================
  16
+
  17
+base64encode = function(x, size=NA, endian=.Platform$endian)
  18
+{
  19
+   library(bitops)                # needed for bitOr and bitAnd
  20
+   if (typeof(x)!="raw") x = writeBin(x, raw(), size=size, endian=endian)
  21
+   x = as.integer(x)
  22
+   ndByte = length(x)            # number of decoded bytes
  23
+   nBlock = ceiling(ndByte / 3)  # number of blocks/groups
  24
+   neByte = 4 * nBlock           # number of encoded bytes
  25
+
  26
+   # add padding if necessary, to make the length of x a multiple of 3
  27
+   if (ndByte < 3*nBlock) x[(ndByte+1) : (3*nBlock)] = 0;
  28
+   dim(x) = c(3, nBlock)         # reshape the data
  29
+   y = matrix(as.integer(0), 4, nBlock)  # for the encoded data
  30
+   
  31
+   #-------------------------------------------
  32
+   # Split up every 3 bytes into 4 pieces
  33
+   #   x = aaaaaabb bbbbcccc ccdddddd
  34
+   # to form
  35
+   #   y = 00aaaaaa 00bbbbbb 00cccccc 00dddddd
  36
+   # than convert y to integers in 0-63 range
  37
+   # This section is based on Matlab code by Peter Acklam
  38
+   # http://home.online.no/~pjacklam/matlab/software/util/datautil/
  39
+   #-------------------------------------------
  40
+   y[1,] = bitShiftR(x[1,], 2) # 6 highest bits of x(1,:)
  41
+   y[2,] = bitOr(bitShiftL(x[1,], 4), bitShiftR(x[2,], 4))
  42
+   y[3,] = bitOr(bitShiftL(x[2,], 2), bitShiftR(x[3,], 6))
  43
+   y[4,] = x[3,]
  44
+   y = bitAnd(y, 63)           # trim numbers to lower 6-bits
  45
+   
  46
+   #----------------------------------
  47
+   # Perform the following mapping
  48
+   #   0  - 25  ->  A-Z
  49
+   #   26 - 51  ->  a-z
  50
+   #   52 - 61  ->  0-9
  51
+   #   62       ->  +
  52
+   #   63       ->  /
  53
+   #----------------------------------
  54
+   alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
  55
+   alpha = strsplit(alpha, NULL)[[1]]  # convert string to array of characters
  56
+   z = alpha[y+1]                      # rearrange characters 
  57
+
  58
+   #-------------------------
  59
+   # Add padding if necessary.
  60
+   #-------------------------
  61
+   npbytes = 3 * nBlock - ndByte   # number of padding bytes needed
  62
+   if (npbytes>0) z[(neByte-npbytes+1) : neByte] = '='  # '=' is used for padding
  63
+   z = paste(z, collapse = "")       # combine characters into a string
  64
+   return (z)
  65
+}
  66
+
  67
+#====================================================================
  68
+
  69
+base64decode = function(z, what, size=NA, signed = TRUE, endian=.Platform$endian)
  70
+{  
  71
+  library(bitops)                 # needed for bitOr and bitAnd
  72
+  if (!is.character(z)) 
  73
+    stop("base64decode: Input argument 'z' is suppose to be a string")
  74
+  if (length(z)==1) z = strsplit(z, NULL)[[1]] # convert string to array of characters
  75
+  if (length(z)%%4!=0) 
  76
+   warning("In base64decode: Length of base64 data (z) not a multiple of 4.")
  77
+  #-----------------------------------
  78
+  # Now perform the following mapping
  79
+  #   A-Z  ->  0  - 25
  80
+  #   a-z  ->  26 - 51
  81
+  #   0-9  ->  52 - 61
  82
+  #   +    ->  62
  83
+  #   /    ->  63
  84
+  #   =    ->  64  - special padding character
  85
+  #  otherwise -1
  86
+  #-----------------------------------
  87
+  alpha = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
  88
+  alpha = strsplit(alpha, NULL)[[1]]    # convert string to array of characters
  89
+  y     = match(z, alpha, nomatch=-1)-1 # lookup number of each character
  90
+  if (any(y == -1)) 
  91
+    stop("base64decode: Input string is not in Base64 format")
  92
+  if (any(y == 64)) y = y[y != 64]      # remove padding
  93
+  neByte = length(y);                   # number of encoded bytes
  94
+  nBlock = ceiling(neByte/4);           # number of blocks/groups
  95
+  ndByte = 3 * nBlock                   # number of decoded bytes
  96
+  
  97
+  # add padding if necessary
  98
+  if (neByte < 4*nBlock) y[(neByte+1) : (4*nBlock)] = 0;
  99
+  dim(y) = c(4, nBlock);                # shape into a matrix
  100
+  x = matrix(as.integer(0), 3, nBlock); # for the decoded data
  101
+ 
  102
+  #---------------------------------------------
  103
+  # Rearrange every 4 bytes into 3 bytes
  104
+  #    y = 00aaaaaa 00bbbbbb 00cccccc 00dddddd
  105
+  # to form
  106
+  #    x = aaaaaabb bbbbcccc ccdddddd
  107
+  # This section is based on Matlab code by Peter Acklam
  108
+  # http://home.online.no/~pjacklam/matlab/software/util/datautil/
  109
+  #---------------------------------------------
  110
+  x[1,] = bitOr(bitShiftL(y[1,], 2), bitShiftR(y[2,], 4))
  111
+  x[2,] = bitOr(bitShiftL(y[2,], 4), bitShiftR(y[3,], 2))
  112
+  x[3,] = bitOr(bitShiftL(y[3,], 6), y[4,])
  113
+  x = bitAnd(x, 255) # trim numbers to lower 8-bits
  114
+  
  115
+  # remove padding
  116
+  if (neByte %% 4 == 2) x = x[1:(ndByte-2)]
  117
+  if (neByte %% 4 == 3) x = x[1:(ndByte-1)]
  118
+  
  119
+  # perform final conversion from 'raw' to type given by 'what'
  120
+  r = as.raw(x)
  121
+  TypeList = c("logical", "integer", "double", "complex", "character", "raw", 
  122
+               "numeric", "int")
  123
+  if (!is.character(what) || length(what) != 1 || !(what %in% TypeList)) 
  124
+    what <- typeof(what)
  125
+  if (what=="raw") return(r)
  126
+  if (is.na(size)) size = switch(match(what, TypeList), 4, 4, 8, 16, 2, 1, 8, 4) 
  127
+  n = length(r)
  128
+  if (n%%size) stop("raw2bin: number of elements in 'r' is not multiple of 'size'")
  129
+  x = readBin(r, what, n = n%/%size, size=size, signed=signed, endian=endian)
  130
+  if (what=="character")  x = paste(x, collapse = "") # convert arrays of characters to strings
  131
+  return (x)
  132
+}
  133
+
115  R/colAUC.R
... ...
@@ -0,0 +1,115 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+colAUC = function (X, y, plotROC=FALSE, alg=c("Wilcoxon","ROC"))
  8
+{   
  9
+  # input:
  10
+  #   X - 2D matrix (of feature columns and samples rows)
  11
+  #   y - 1D array identifying which class each sample belongs to (csource(lass numbers start from 1)
  12
+  
  13
+  #=======================================
  14
+  # make sure inputs are in correct format
  15
+  #=======================================
  16
+  y    = as.factor(y)
  17
+  X    = as.matrix(X)               # make sure inputs are in correct format
  18
+  alg  = match.arg(alg)
  19
+  if (nrow(X)==1) X = t(X)
  20
+  if (plotROC) alg = "ROC"
  21
+  
  22
+  #=======================================
  23
+  # Prepare for calculations & error-check
  24
+  #=======================================
  25
+  nR   = nrow(X)                    # number of samples
  26
+  nC   = ncol(X)                    # get dimentions of the data set
  27
+  nY   = table(y)                   # number of elements per label
  28
+  uL   = as.factor(rownames(nY))    # find all the classes among the labels 
  29
+  nL   = length(nY)                 # number of unique classes
  30
+  if (nL<=1) 
  31
+    stop("colAUC: List of labels 'y' have to contain at least 2 class labels.")
  32
+  if (!is.numeric(X)) stop("colAUC: 'X' must be numeric")
  33
+  if (nR!=length(y))  stop("colAUC: length(y) and nrow(X) must be the same")
  34
+  L    = matrix(rep(uL,each=nR),nR,nL)  # store vector L as row vector and copy it into nR rows
  35
+  per  = combs(1:nL,2)                  # find all possible pairs of L columns
  36
+  nP   = nrow(per)                      # how many possible pairs were found?
  37
+  Auc  = matrix(0.5,nP,nC)              # initialize array to store results
  38
+  rownames(Auc) = paste(uL[per[,1]]," vs. ",uL[per[,2]], sep="")
  39
+  colnames(Auc) = colnames(X)
  40
+  
  41
+  #=======================================
  42
+  # prepare the plot, if needed
  43
+  #=======================================
  44
+  if (plotROC) {                        # initialize the plot
  45
+    plot(c(0,1), c(0,1), type='n', xaxs="i", yaxs="i",
  46
+     xlab="probability of false alarm", sub="(1-Specificity)", 
  47
+     ylab="probability of detection\n(Sensitivity)")
  48
+    title("ROC Curves")
  49
+    abline(h=0:10/10, v=0:10/10, col = "lightgray") # grid on
  50
+    if (nC*nP<20) { # if too many curves than skip the labels
  51
+      S = colnames(Auc)
  52
+      if (is.null(S)) S=paste('col',1:nC);
  53
+      if (nP>1) S = paste(rep(S,each=nP), "[", rownames(Auc), "]")
  54
+      legend("bottomright", S,  col=1:(nC*nP),  lty=1, lwd=1, pch=20, 
  55
+           merge=TRUE, inset=0.01, bg="white")
  56
+    }
  57
+    nClr = 1
  58
+  }
  59
+  
  60
+  #=============================================
  61
+  # Calculate AUC by using Wilcox test algorithm
  62
+  #=============================================
  63
+  if(alg=='Wilcoxon') {
  64
+    idxL = vector(mode="list", length=nL) 
  65
+    for (i in 1:nL) idxL[[i]] = which(y==uL[i])
  66
+    for (j in 1:nC) {                   # for each column representing a feature
  67
+      for (i in 1:nP) {                 # go through all permutations of columns in d
  68
+        c1 = per[i,1]                   # and identify 2 classes to be compared
  69
+        c2 = per[i,2]
  70
+        n1 = nY[c1]        
  71
+        n2 = nY[c2]
  72
+        if (n1>0 & n2>0) {
  73
+          r = rank(c(X[idxL[[c1]],j], X[idxL[[c2]],j]))
  74
+          Auc[i,j] = (sum(r[1:n1]) - n1*(n1+1)/2) / (n1*n2)
  75
+        }
  76
+      } # end of 'for i' loop
  77
+    } # end of 'for j' loop
  78
+  } 
  79
+  
  80
+  #==============================================
  81
+  # Calculate AUC by using integrating ROC curves
  82
+  #==============================================
  83
+  if(alg=='ROC') {                        # use 'ROC' algorithm
  84
+    for (j in 1:nC) {                     # for each column representing a feature
  85
+      x = sort(X[, j], index=TRUE)        # sort all columns and store each one in x[[1]]. x[[2]] stores original positions
  86
+      nunq = which(diff(x$x)==0)          # find non-unique A's in column j (if vector is [1 1] nunq=1
  87
+      nTies = length(nunq)                # number of non-unique values
  88
+      if (nTies<nR-1) {                   # make sure all numbers in A column are not the same
  89
+        idx = y[x$ix]                     # reorder label vector in the same order as b, or associate label with each number in b
  90
+        # assign column for each label (class) and for each point add 1 in the column corresponding to its class
  91
+        d = ( matrix(rep(idx,nL),nR,nL) == L ) 
  92
+        for (i in 1:nL) d[,i] = cumsum(d[,i])  # cumulative sum of d columns
  93
+        if (nTies) d = d[-nunq, ]         # remove non unique rows if any
  94
+        d = rbind( matrix(0,1,nL), d )    # append row of zeros at the beggining
  95
+        nD = nrow(d)
  96
+        # assume that col#1 ploted on x axis is correct clasification and col#2 (y) is false find AUC
  97
+        for (i in 1:nP) {                 # go through all permutations of columns in d
  98
+          c1 = per[i,1]                   # and identify 2 classes to be compared
  99
+          c2 = per[i,2]
  100
+          n  = d[nD,c1]*d[nD,c2]          # normalize area to 1 at the maximum
  101
+          if (n>0) Auc[i,j] = trapz(d[,c1], d[,c2])/n  # Trapezoidal numerical integration
  102
+          if (plotROC) {                  # plot the results
  103
+            xx = if(n>0) d[,c1]/d[nD,c1] else c(0,1)         
  104
+            yy = if(n>0) d[,c2]/d[nD,c2] else c(0,1)
  105
+            if (2*Auc[i,j]<1) { xx=1-xx; yy=1-yy; } # if auc<0.5 than mirror it to the other side of 0.5
  106
+            lines(xx, yy, col=nClr, type='o', pch=20)
  107
+            nClr = nClr+1                 # next color
  108
+          }
  109
+        } # end of 'for i' loop
  110
+      }
  111
+    } # end of 'for j' loop
  112
+  }
  113
+  Auc = pmax(Auc, 1-Auc) # if any auc<0.5 than mirror it to the other side of 0.5
  114
+  return (Auc)
  115
+}
31  R/combs.R
... ...
@@ -0,0 +1,31 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+combs = function(v,k) {
  8
+# combs(V,K) - finds all unordered combinations of K elements from vector V 
  9
+#  V is a vector of length N
  10
+#  K is a integer 
  11
+# combs(V,K) creates a matrix with N!/((N-K)! K!) rows
  12
+# and K columns containing all possible combinations of N elements taken K at a time.
  13
+# example: combs(1:3,2) returns matrix with following rows (1 2), (1 3), (2 3)
  14
+  n = length(v)
  15
+  if      (n==k  ) P = matrix(v,1,n)
  16
+  else if (k==1  ) P = matrix(v,n,1)
  17
+  else if (k==n-1) P = matrix( rep(v, each=n-1), n, n-1)
  18
+  else if (k< n) {
  19
+    P = matrix(0,0,k)
  20
+    if (k < n & k > 1) {
  21
+      for (i in 1:(n-k+1)) {
  22
+        Q = combs(v[(i+1):n],k-1)
  23
+        j = nrow(Q)
  24
+        P = rbind(P, cbind(rep(v[i],j), Q))
  25
+      }
  26
+    }
  27
+  } else 
  28
+    stop("combs: number m has to be smaller or equal to length of vector v")
  29
+  return(P)
  30
+}
  31
+
275  R/runfunc.R
... ...
@@ -0,0 +1,275 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+#source('C:/Projects/R Packages/caTools/R/runfunc.R')
  7
+
  8
+runmean = function(x, k, alg=c("C", "R", "fast", "exact"), 
  9
+                   endrule=c("mean", "NA", "trim", "keep", "constant", "func"),
  10
+                   align = c("center", "left", "right"))
  11
+{
  12
+  alg     = match.arg(alg)
  13
+  endrule = match.arg(endrule)
  14
+  align   = match.arg(align)
  15
+  dimx = dim(x) # Capture dimension of input array - to be used for formating y
  16
+  x = as.vector(x) 
  17
+  n = length(x)
  18
+  if (k<=1) return (x)
  19
+  if (k >n) k = n
  20
+  k2 = k%/%2
  21
+  y=double(n)
  22
+   
  23
+  if (alg=="exact") {
  24
+    .C("runmean_exact", x, y , as.integer(n), as.integer(k), 
  25
+       NAOK=TRUE, DUP=FALSE, PACKAGE="caTools") 
  26
+  } else if (alg=="C") {
  27
+    .C("runmean", as.double(x), y , as.integer(n), as.integer(k), 
  28
+       NAOK=TRUE, DUP=FALSE, PACKAGE="caTools") 
  29
+  } else if (alg=="fast") {
  30
+    .C("runmean_lite", as.double(x), y , as.integer(n), as.integer(k), 
  31
+       NAOK=TRUE, DUP=FALSE, PACKAGE="caTools") 
  32
+  } else {     # the similar algorithm implemented in R language
  33
+    k1 = k-k2-1
  34
+    y = c( sum(x[1:k]), diff(x,k) ); # find the first sum and the differences from it
  35
+    y = cumsum(y)/k                  # apply precomputed differences 
  36
+    y = c(rep(0,k1), y, rep(0,k2))   # make y the same length as x
  37
+    if (endrule=="mean") endrule="func"
  38
+  }
  39
+  y = EndRule(x, y, k, dimx, endrule, align, mean, na.rm=TRUE)
  40
+  return(y)
  41
+}
  42
+
  43
+#==============================================================================
  44
+
  45
+runmin = function(x, k, alg=c("C", "R"), 
  46
+                  endrule=c("min", "NA", "trim", "keep", "constant", "func"),
  47
+                  align = c("center", "left", "right"))
  48
+{
  49
+  alg = match.arg(alg)
  50
+  align   = match.arg(align)
  51
+  endrule = match.arg(endrule)
  52
+  dimx = dim(x)  # Capture dimension of input array - to be used for formating y
  53
+  x = as.vector(x) 
  54
+  n = length(x)
  55
+  if (k<=1) return (x)
  56
+  if (k >n) k = n
  57
+  y = double(n)
  58
+  
  59
+  if (alg=="C") {
  60
+    .C("runmin", as.double(x) ,y , as.integer(n), as.integer(k), 
  61
+       NAOK=TRUE, DUP=FALSE, PACKAGE="caTools")
  62
+  } else { # the similar algorithm implemented in R language
  63
+    k2 = k%/%2
  64
+    k1 = k-k2-1
  65
+    a <- y[k1+1] <- min(x[1:k], na.rm=TRUE)
  66
+    if (k!=n) for (i in (2+k1):(n-k2)) {
  67
+      if (a==y[i-1]) # point leaving the window was the min, so ...
  68
+        y[i] = min(x[(i-k1):(i+k2)], na.rm=TRUE) # recalculate min of the window 
  69
+      else           # min=y[i-1] is still inside the window
  70
+        y[i] = min(y[i-1], x[i+k2 ], na.rm=TRUE) # compare it with the new point 
  71
+      a = x[i-k1]    # point that will be removed from the window next
  72
+      if (!is.finite(a)) a=y[i-1]+1 # this will force the 'else' option
  73
+    }
  74
+    if (endrule=="min") endrule="func"
  75
+  }
  76
+  y = EndRule(x, y, k, dimx, endrule, align, min, na.rm=TRUE)
  77
+  return(y)
  78
+}
  79
+
  80
+#==============================================================================
  81
+
  82
+runmax = function(x, k, alg=c("C", "R"), 
  83
+                  endrule=c("max", "NA", "trim", "keep", "constant", "func"),
  84
+                  align = c("center", "left", "right"))
  85
+{
  86
+  alg     = match.arg(alg)
  87
+  endrule = match.arg(endrule)
  88
+  align   = match.arg(align)
  89
+  dimx = dim(x) # Capture dimension of input array - to be used for formating y
  90
+  x = as.vector(x) 
  91
+  n = length(x)
  92
+  k = as.integer(k)
  93
+  if (k<=1) return (x)
  94
+  if (k >n) k = n
  95
+  y = double(n)
  96
+
  97
+  if (alg=="C") {
  98
+    .C("runmax", as.double(x) ,y , as.integer(n), as.integer(k), 
  99
+       NAOK=TRUE, DUP=FALSE, PACKAGE="caTools")
  100
+  } else { # the same algorithm implemented in R language
  101
+    k2 = k%/%2
  102
+    k1 = k-k2-1
  103
+    a <- y[k1+1] <- max(x[1:k], na.rm=TRUE)
  104
+    if (k!=n) for (i in (2+k1):(n-k2)) {
  105
+      if (a==y[i-1]) # point leaving the window was the max, so ...
  106
+        y[i] = max(x[(i-k1):(i+k2)], na.rm=TRUE) # recalculate max of the window 
  107
+      else           # max=y[i-1] is still inside the window
  108
+        y[i] = max(y[i-1], x[i+k2 ], na.rm=TRUE) # compare it with the new point 
  109
+      a = x[i-k1]    # point that will be removed from the window next
  110
+      if (!is.finite(a)) a=y[i-1]+1 # this will force the 'else' option
  111
+    }
  112
+    if (endrule=="max") endrule="func"
  113
+  } 
  114
+  y = EndRule(x, y, k, dimx, endrule, align, max, na.rm=TRUE)
  115
+  return(y)
  116
+}
  117
+
  118
+#==============================================================================
  119
+
  120
+runquantile = function(x, k, probs, type=7,
  121
+                endrule=c("quantile", "NA", "trim", "keep", "constant", "func"),
  122
+                align = c("center", "left", "right"))
  123
+{ ## see http://mathworld.wolfram.com/Quantile.html for very clear definition
  124
+  ## of different quantile types
  125
+  endrule = match.arg(endrule)
  126
+  align   = match.arg(align)
  127
+  dimx = dim(x) # Capture dimension of input array - to be used for formating y
  128
+  yIsVec = is.null(dimx) # original x was a vector 
  129
+  x    = as.vector(x) 
  130
+  n    = length(x)
  131
+  np   = length(probs) 
  132
+  k    = as.integer(k)
  133
+  type = as.integer(type)
  134
+  if (k<=1) return (rep(x,n,np))
  135
+  if (k >n) k = n
  136
+  if (is.na(type) || (type < 1 | type > 9)) 
  137
+    warning("'type' outside allowed range [1,9]; changing 'type' to ", type<-7)
  138
+  
  139
+  y = double(n*np)
  140
+  .C("runquantile", as.double(x) ,y , as.integer(n), as.integer(k), 
  141
+       as.double(probs), as.integer(np),as.integer(type), 
  142
+       NAOK=TRUE, DUP=FALSE, PACKAGE="caTools")
  143
+  dim(y) =  c(n,np) 
  144
+
  145
+  for (i in 1:np) {   # for each percentile
  146
+    yTmp = EndRule(x, y[,i], k, dimx, endrule, align, quantile, probs=probs[i], type=type, na.rm=TRUE)
  147
+    if (i==1) {
  148
+      if (is.null(dimx)) dimy = length(yTmp) else dimy = dim(yTmp)  
  149
+      yy = matrix(0,length(yTmp),np)   # initialize output array
  150
+    }
  151
+    yy[,i] = as.vector(yTmp)    
  152
+  }
  153
+  if (np>1) dim(yy) = c(dimy,np) else dim(yy) = dimy
  154
+  return(yy)
  155
+}
  156
+
  157
+#==============================================================================
  158
+
  159
+runmad = function(x, k, center = runmed(x,k), constant = 1.4826,
  160
+                  endrule=c("mad", "NA", "trim", "keep", "constant", "func"),
  161
+                  align = c("center", "left", "right"))
  162
+{
  163
+  endrule = match.arg(endrule)
  164
+  align   = match.arg(align)
  165
+  dimx = dim(x) # Capture dimension of input array - to be used for formating y
  166
+  x = as.vector(x) 
  167
+  n = length(x)
  168
+  if (k<3) stop("'k' must be larger than 2")
  169
+  if (k>n) k = n
  170
+  y = double(n)
  171
+  .C("runmad", as.double(x), as.double(center), y, as.integer(n), 
  172
+       as.integer(k), NAOK=TRUE, DUP=FALSE, PACKAGE="caTools")
  173
+  y = EndRule(x, y, k, dimx, endrule, align, mad, constant=1, na.rm=TRUE)
  174
+  return(constant*y)
  175
+}
  176
+
  177
+#==============================================================================
  178
+
  179
+runsd = function(x, k, center = runmean(x,k), 
  180
+                 endrule=c("sd", "NA", "trim", "keep", "constant", "func"),
  181
+                 align = c("center", "left", "right"))
  182
+{
  183
+  endrule = match.arg(endrule)
  184
+  align   = match.arg(align)
  185
+  dimx = dim(x) # Capture dimension of input array - to be used for formating y
  186
+  x = as.vector(x) 
  187
+  n = length(x)
  188
+  if (k<3) stop("'k' must be larger than 2")
  189
+  if (k>n) k = n
  190
+  y = double(n)
  191
+  .C("runsd", as.double(x), as.double(center), y, as.integer(n), 
  192
+       as.integer(k), NAOK=TRUE, DUP=FALSE, PACKAGE="caTools")
  193
+  y = EndRule(x, y, k, dimx, endrule, align, sd, na.rm=TRUE)
  194
+  return(y)
  195
+}
  196
+
  197
+#==============================================================================
  198
+
  199
+EndRule = function(x, y, k, dimx,
  200
+             endrule=c("NA", "trim", "keep", "constant", "func"), 
  201
+             align = c("center", "left", "right"), Func, ...)
  202
+{
  203
+  # Function which postprocess results of running windows functions and cast 
  204
+  # them in to specified format. On input y is equivalent to
  205
+  #   y = runFUNC(as.vector(x), k, endrule="func", align="center")
  206
+  
  207
+  # === Step 1: inspects inputs and unify format ===
  208
+  align   = match.arg(align)
  209
+  k = as.integer(k)
  210
+  k2 = k%/%2
  211
+  if (k2<1) k2 = 1
  212
+  yIsVec = is.null(dimx) # original x was a vector -> returned y will be a vector
  213
+  if (yIsVec) dimx=c(length(y),1) # x & y will become 2D arrays
  214
+  dim(x) <- dimx
  215
+  dim(y) <- dimx
  216
+  n = nrow(x)
  217
+  m = ncol(x)
  218
+  if (k>n) k2 = (n-1)%/%2
  219
+  k1 = k-k2-1
  220
+  if (align=="center" && k==2) align='right'
  221
+
  222
+  # === Step 2: Apply different endrules ===
  223
+  if (endrule=="trim") {
  224
+    y = y[(k1+1):(n-k2),] # change y dimensions
  225
+  } else if (align=="center") {
  226
+    idx1 = 1:k1
  227
+    idx2 = (n-k2+1):n
  228
+    # endrule calculation in R will be skipped for most common case when endrule 
  229
+    # is default and array was a vector not a matrix
  230
+    if (endrule=="NA") {
  231
+      y[idx1,] = NA
  232
+      y[idx2,] = NA
  233
+    } else if (endrule=="keep") {
  234
+      y[idx1,] = x[idx1,]
  235
+      y[idx2,] = x[idx2,]
  236
+    } else if (endrule=="constant") {
  237
+      y[idx1,] = y[k1+1+integer(m),]
  238
+      y[idx2,] = y[n-k2+integer(m),]
  239
+    } else if (endrule=="func" || !yIsVec) {
  240
+      for (j in 1:m) {
  241
+        for (i in idx1) y[i,j] = Func(x[1:(i+k2),j], ...)
  242
+        for (i in idx2) y[i,j] = Func(x[(i-k1):n,j], ...)
  243
+      }
  244
+    }
  245
+  } else if (align=="left") {
  246
+    y[1:(n-k1),] = y[(k1+1):n,]
  247
+    idx = (n-k+2):n
  248
+    if (endrule=="NA") {
  249
+      y[idx,] = NA
  250
+    } else if (endrule=="keep") {
  251
+      y[idx,] = x[idx,]
  252
+    } else if (endrule=="constant") {
  253
+      y[idx,] = y[n-k+integer(m)+1,]
  254
+    } else {
  255
+      for (j in 1:m) for (i in idx) y[i,j] = Func(x[i:n,j], ...)
  256
+    }
  257
+  } else if (align=="right") {
  258
+    y[(k2+1):n,] = y[1:(n-k2),]
  259
+    idx = 1:(k-1)
  260
+    if (endrule=="NA") {
  261
+      y[idx,] = NA
  262
+    } else if (endrule=="keep") {
  263
+      y[idx,] = x[idx,]
  264
+    } else if (endrule=="constant") {
  265
+      y[idx,] = y[k+integer(m),]
  266
+    } else {
  267
+      for (j in 1:m) for (i in idx) y[i,j] = Func(x[1:i,j], ...)
  268
+    }
  269
+  } 
  270
+  
  271
+  # === Step 4: final casting and return results ===
  272
+  if (yIsVec) y = as.vector(y);
  273
+  return(y)
  274
+}
  275
+
50  R/sample.split.R
... ...
@@ -0,0 +1,50 @@
  1
+#===========================================================================#
  2
+# caTools - R library                                                       #
  3
+# Copyright (C) 2005 Jarek Tuszynski                                        #
  4
+# Distributed under GNU General Public License version 3                    #
  5
+#===========================================================================#
  6
+
  7
+sample.split = function( Y, SplitRatio = 2/3, group = NULL ) 
  8
+{ # Split data from vector Y into 2 bins in predefined ratio while preserving relative retios of
  9
+  # different labels in Y.
  10
+  # if (0<=SplitRatio<1) then "SplitRatio" fraction of points from Y will go to bin 1
  11
+  # if (1<=SplitRatio)   then "SplitRatio" number   of points from Y will go to bin 1
  12
+  # Returns logical vector of the same length as Y marking points to be added to bin 1
  13
+  nSamp  = length(Y)                  # number of sample labels
  14
+  nGroup = length(group)