# Today's Topic

## 1) R, its pros and cons
## 2) R coding practices and time
## 3) Apply functions in R
## 4) Parallel Processing in R

1. The Pros  
1) The large, vibrant R community (with a strong base in academia)
2) Visualization packages (ggvis, ggplot2, googleVis, rCharts)  
3) Easy to start learning even with little programming background 
4) Ideal for individual servers and data analysis

2. The Cons (although rather misunderstood)  
1) Computationally limited
2) Steep Learning Curve


# Ways of Getting Around the Cons 

## 1. Code better!  
a) Avoid explicit loops where a vectorised alternative exists  
b) Avoid supplying function arguments that do not need to be supplied  
c) Write vectorisation-friendly code   
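
As a minimal sketch of points a) and c), the snippet below computes the same element-wise power twice, once with an explicit nested loop and once with a single vectorised expression. The small matrix `m` and the names `looped` and `vectorised` are illustrative assumptions, not part of the benchmark session further down.

```r
# Small illustrative matrix (hypothetical; far smaller than the benchmark data)
m <- matrix(1:6, nrow = 2)

# Loop version: visit each element individually
looped <- matrix(0, nrow = nrow(m), ncol = ncol(m))
for (i in seq_len(nrow(m))) {
  for (j in seq_len(ncol(m))) {
    looped[i, j] <- m[i, j]^2
  }
}

# Vectorised version: arithmetic operators already work element-wise
vectorised <- m^2

# Both approaches should give the same values
all.equal(looped, vectorised)
```

The vectorised line does the looping inside R's compiled internals, which is why it is usually the first thing to try before reaching for an explicit loop.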
   
## 2. Use R-spinoffs if speed is your priority
a) pqR (a "pretty quick" version of R)    
b) Renjin (an R interpreter that runs on the JVM)  
   
## 3. Use alongside other languages
a) MATLAB is extremely fast at matrix calculations  
b) Python is useful as well  
c) Jupyter is a nice way to use multiple languages  

## 4. Use parallel processing features on R
a) R uses one core by default  
b) If the task is "embarrassingly parallel", always strive to spread it across several cores  
c) Use dedicated packages such as "parallel"  

## 5. Get better hardware
a) Larger memory (since R keeps its objects in physical memory)  
b) Better processor
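
To get a rough feel for how much physical memory an object occupies, base R's `object.size()` can be used. The sketch below builds a matrix with the same shape as the one used in the benchmarks (100,000 rows by 10 numeric columns); the names `m` and `size` are only illustrative.

```r
# A 100,000 x 10 numeric (double) matrix: 1e6 values at 8 bytes each
m <- matrix(as.numeric(seq_len(1e6)), nrow = 100000, ncol = 10)

# Approximate memory footprint of the object
size <- object.size(m)
format(size, units = "Mb")
```

Intermediate copies made during a computation add to this figure, so the memory actually needed is typically a multiple of the size of the data itself.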


## Performance is very important in coding, especially when dealing with large data


In [52]:
library(parallel)
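# Example of a simple numerical computation done on a matrix.
# create() builds a 100000 x 10 matrix whose k-th column is 1:100000 + (k - 1).
# The argument x is never evaluated (it is overwritten immediately), so the
# call create(x) works even when no x exists yet.
create <- function(x){
  x <- 1:100000
  x <- cbind(x,x+1,x+2,x+3,x+4,x+5,x+6,x+7,x+8,x+9)
  return(x)
}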

head(create(x))

     x
[1,] 1 2 3 4  5  6  7  8  9 10
[2,] 2 3 4 5  6  7  8  9 10 11
[3,] 3 4 5 6  7  8  9 10 11 12
[4,] 4 5 6 7  8  9 10 11 12 13
[5,] 5 6 7 8  9 10 11 12 13 14
[6,] 6 7 8 9 10 11 12 13 14 15


In [53]:
# Now let's see the performance of a simple for loop in R!
# Visiting every element of a matrix one at a time requires a nested for loop.

x <- create(x)
temp <- proc.time()
for(i in seq_len(nrow(x))){
  for(j in seq_len(ncol(x))){
    x[i,j] <- x[i,j]^10
  }
}
proc.time()-temp
rm(x)
rm(i,j)





   user  system elapsed 
   2.94    0.00    3.13 
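
For comparison, the same transformation can be written without any loop, because `^` is applied element-wise to a whole matrix. The sketch below rebuilds a matrix with the same values as `create()` produces (the names `m`, `res` and `timing` are assumptions for illustration) and uses `system.time()`, a compact alternative to subtracting two `proc.time()` readings. No timings are quoted here since they depend on the machine.

```r
# Same values as create(): column k holds 1:100000 + (k - 1)
m <- outer(1:100000, 0:9, "+")

# Time the vectorised computation; system.time() reports user/system/elapsed
timing <- system.time(res <- m^10)
print(timing)

# Spot-check one element against the scalar computation
res[3, 2] == m[3, 2]^10
```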

In [54]:
# Let's look at the apply functions. They do the same task, but what happens to the code?

# apply() by column (MARGIN = 2): FUN is called once per column, 10 calls in total
x <- create(x)

temp <- proc.time()
x <- apply(x, 2, FUN = function(x){x^10})
proc.time() - temp
rm(x,temp)

# apply() by row (MARGIN = 1): FUN is called once per row, 100000 calls in total.
# Note that the result comes back transposed (10 x 100000).
x <- create(x)

temp <- proc.time()
x <- apply(x, 1, FUN = function(x){x^10})
proc.time() - temp
rm(x,temp)

   user  system elapsed 
   0.08    0.00    0.08 

   user  system elapsed 
   0.63    0.02    0.65 
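
One detail that is easy to miss when comparing the two calls: with `MARGIN = 1`, `apply()` returns each row's result as a column, so the output is transposed relative to the input. A minimal sketch on a small matrix (the names `m`, `by_col` and `by_row` are illustrative):

```r
m <- matrix(1:6, nrow = 3)               # 3 rows, 2 columns

by_col <- apply(m, 2, function(v) v^2)   # 3 x 2, same layout as m
by_row <- apply(m, 1, function(v) v^2)   # 2 x 3, transposed

dim(by_col)
dim(by_row)

# Transposing the row-wise result recovers the original layout
all.equal(t(by_row), by_col)
```

When the operation is element-wise, as it is here, working by column gives the result in the right shape with far fewer function calls.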

In [57]:
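# sapply and lapply, other members of the apply family

# sapply: on a matrix, FUN is called once per element (1,000,000 calls)
# and the result is a plain vector
x <- create(x)
temp <- proc.time()
x <- sapply(x, FUN = function(x){x^10})
proc.time() - temp
rm(x, temp)

# lapply: also called once per element, but the result is a list
x <- create(x)
temp <- proc.time()
x <- lapply(x, FUN = function(x){x^10})
proc.time() - temp
rm(x, temp)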

   user  system elapsed 
   1.56    0.00    1.56 

   user  system elapsed 
   1.69    0.00    1.89 
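
The slower timings above are likely explained by how `sapply()` and `lapply()` treat a matrix: they iterate over it one element at a time, as if it were a plain vector, and the matrix shape is lost in the result. The sketch below shows this on a small matrix and one way to restore the shape; all names (`m`, `s`, `l`, `sq`, `restored`) are illustrative assumptions.

```r
m <- matrix(1:6, nrow = 2)

s <- sapply(m, function(v) v^2)   # numeric vector of length 6, no dim attribute
l <- lapply(m, function(v) v^2)   # list of 6 single values

is.null(dim(s))
length(l)

# vapply() is a stricter sibling of sapply(): the type and length of each result are declared
sq <- vapply(m, function(v) v^2, FUN.VALUE = numeric(1))

# Restore the matrix shape explicitly if it is needed
restored <- matrix(sq, nrow = nrow(m))
all.equal(restored, m^2)
```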

In [58]:
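# Parallel processing (advanced)
x <- create(x)

n_core <- detectCores() - 1   # leave one core free for the rest of the system
cl <- makeCluster(n_core)
temp <- proc.time()           # timer starts after the cluster exists, so startup cost is not included
x <- parApply(cl = cl, x, 2, FUN = function(x){x^10})
proc.time()-temp
stopCluster(cl)
rm(temp,cl,n_core)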

   user  system elapsed 
   0.17    0.08    0.27 
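
A slightly more defensive sketch of the same pattern is shown below. It wraps the cluster in a helper so that `stopCluster()` runs even if the computation fails, and it checks the parallel result against the plain vectorised one. The names `power_fun` and `par_power`, the small test matrix, and the default of two workers are all hypothetical choices made for illustration.

```r
library(parallel)

# Worker function defined at top level so that only the function itself is sent to the workers
power_fun <- function(v, p) v^p

# Hypothetical helper: raise every element of a matrix to the power p, column by column
par_power <- function(m, p, n_workers = 2) {
  cl <- makeCluster(n_workers)
  on.exit(stopCluster(cl), add = TRUE)   # always release the workers
  parApply(cl, m, 2, power_fun, p = p)
}

m <- matrix(as.numeric(1:20), nrow = 5)
res <- par_power(m, 3)

# The parallel result should match the vectorised computation
all.equal(res, m^3)
```

For a matrix this small the cost of starting the workers will dominate; the pattern only pays off when each chunk of work is substantial.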