## Chapter 1 Introduction to R

### 1.1 What is ${\textsf R}$?
    
- go to the official [home page](http://www.R-project.org)
- ${\textsf R}$ consists of two parts
    - base system : the core ${\textsf R}$ language and its fundamental libraries
    - packages : more specialized applications contributed by expert ${\textsf R}$ users in their fields

### 1.3 ${\textsf R}$ for Clinical Trials

#### ${\textsf R}$ : Regulatory Compliance and Validation Issues
- a [guidance document](http://www.r-project.org/doc/R-FDA.pdf) on using ${\textsf R}$ in regulated clinical trial environments
    - it addresses the use of ${\textsf R}$ for human clinical trials conducted by the pharmaceutical industry in compliance with the regulations of the United States Food and Drug Administration (FDA) and the International Conference on Harmonisation (ICH) of Technical Requirements for Registration of Pharmaceuticals for Human Use

#### CRAN Task View  
- http://cran.r-project.org/web/views/ClinicalTrials.html : specific packages for the design, monitoring, and analysis of data from clinical trials
       
### 1.4 A Simple Simulated Clinical Trial

A simulated two-arm clinical trial comparing a new drug with placebo for reducing diastolic blood pressure in hypertensive adult men:
- $n=100$ participants per group, *drug* vs. *placebo*
- variables
    - age : an important risk factor linked to blood pressure
    - baseline diastolic blood pressure just before randomization
    - diastolic blood pressure measured at the end of the trial
   

#### 1.4.1 Data Simulation

**${\textsf R}$ Functions**
- density (`d`), cumulative distribution (`p`), quantile (`q`), and random number generation (`r`) functions for the `normal`, `Poisson`, `binomial`, `t`, `Beta`, and other distributions; for the normal distribution:

    - `dnorm(x, mean=0, sd=1, log=FALSE)` : density; `x` is a vector of quantiles
    - `pnorm(q, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)` : distribution function; `q` is a vector of quantiles
    - `qnorm(p, mean=0, sd=1, lower.tail=TRUE, log.p=FALSE)` : quantile function; `p` is a vector of probabilities
    - `rnorm(n, mean=0, sd=1)` : random generation; `n` is the number of observations
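The four normal-distribution functions follow a naming pattern that base ${\textsf R}$ shares across distributions: the prefix `d`, `p`, `q`, or `r` is attached to a distribution name such as `norm`, `pois`, `binom`, `t`, or `beta`. The short sketch below checks that `pnorm()` and `qnorm()` invert each other and applies the same prefixes to a few other distributions; the arguments and variable names are arbitrary illustrative choices.

```r
# pnorm() and qnorm() are inverses: quantile -> probability -> quantile
p  <- pnorm(1.96)    # P(Z <= 1.96) for a standard normal, about 0.975
z  <- qnorm(p)       # recovers 1.96
d0 <- dnorm(0)       # standard normal density at 0, equal to 1/sqrt(2*pi)

# the same d/p/q/r prefixes work for other distributions (illustrative arguments)
p.pois <- dpois(2, lambda = 3)               # P(X = 2), X ~ Poisson(3)
p.bin  <- pbinom(5, size = 10, prob = 0.5)   # P(X <= 5), X ~ Binomial(10, 0.5)
q.t    <- qt(0.975, df = 10)                 # 97.5% quantile of t with 10 df
set.seed(1)
x.beta <- rbeta(5, shape1 = 2, shape2 = 5)   # five random Beta(2, 5) values

c(p = p, z = z, d0 = d0, p.pois = p.pois, p.bin = p.bin, q.t = q.t)
```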

**Data Generation and Manipulation**

Simulate $n$ participants per arm with
- baseline diastolic blood pressure `bp.base` $\sim N(\mu=100,\ sd=10)$ (mmHg)
- `age` $\sim N(age.mu=50,\ age.sd=10)$ (years)
- a mean reduction in diastolic blood pressure of `mu.d` $=20$ mmHg due to the new drug

In [None]:
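# simulation parameters
n      = 100   # participants per arm
mu     = 100   # mean diastolic blood pressure (mmHg)
sd     = 10    # standard deviation of blood pressure (mmHg)
mu.d   = 20    # mean reduction in blood pressure due to the new drug (mmHg)
age.mu = 50    # mean age (years)
age.sd = 10    # standard deviation of age (years)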

- simulate $n=100$ **placebo** participants with
    - `age`
    - `bp.base` (baseline blood pressure)
    - `bp.end` (endpoint blood pressure)
    - `bp.diff = bp.end - bp.base` (change in blood pressure from baseline to endpoint)

In [None]:
set.seed(123)                     # fix the seed so the simulation is reproducible

# use "rnorm" to generate normally distributed random values
age         = rnorm(n, age.mu, age.sd)
bp.base     = rnorm(n, mu, sd)
bp.end      = rnorm(n, mu, sd)    # placebo: same mean at baseline and endpoint

bp.diff     = bp.end - bp.base    # change from baseline to endpoint

dat4placebo = round(cbind(age, bp.base, bp.end, bp.diff))  # column-bind with "cbind", then round to whole numbers

In [None]:
head(dat4placebo)

- simulate $n=100$ **new drug** participants in the same way, except that
    - the mean of `bp.end` is `mu - mu.d` $=80$ mmHg

In [None]:
age      = rnorm(n, age.mu, age.sd)
bp.base  = rnorm(n, mu, sd)
bp.end   = rnorm(n, mu-mu.d, sd)    # the drug lowers the mean endpoint blood pressure by mu.d
bp.diff  = bp.end - bp.base
dat4drug = round(cbind(age, bp.base, bp.end, bp.diff))
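Because `bp.base` and `bp.end` are drawn independently, the change `bp.diff` has mean $0$ in the placebo arm and $-20$ mmHg (that is, $-$`mu.d`) in the drug arm, with standard deviation $\sqrt{10^2+10^2}=10\sqrt{2}\approx 14.1$ mmHg in both arms. The sketch below checks this with a large Monte Carlo sample; `N`, the seed, and the names `mu.bp`, `sd.bp`, `d.bp` are hypothetical illustrative choices, not part of the original design.

```r
# Monte Carlo check of the bp.diff distribution implied by the simulation design
# (hypothetical names; values mirror mu, sd, and mu.d above)
mu.bp <- 100   # mean blood pressure (mmHg)
sd.bp <- 10    # SD of blood pressure (mmHg)
d.bp  <- 20    # drug effect (mmHg)
N     <- 1e5   # large illustrative sample size

set.seed(2024)
diff.placebo <- rnorm(N, mu.bp, sd.bp) - rnorm(N, mu.bp, sd.bp)          # end - base
diff.drug    <- rnorm(N, mu.bp - d.bp, sd.bp) - rnorm(N, mu.bp, sd.bp)   # end - base

theory    <- c(mean.placebo = 0, mean.drug = -d.bp, sd = sqrt(2) * sd.bp)
simulated <- c(mean.placebo = mean(diff.placebo),
               mean.drug    = mean(diff.drug),
               sd           = sd(diff.drug))
round(rbind(theory, simulated), 2)
```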

- combine both arms into one data frame, `dat`
- add `trt`, a factor identifying the treatment arm

In [None]:
dat     = data.frame(rbind(dat4placebo, dat4drug))   # placebo in rows 1..n, drug in rows n+1..2n
dat[c(1:4, n + 1:4), ]                                 # first four participants of each arm

In [None]:
dat$trt = as.factor(rep(c("Placebo", "Drug"), each=n))
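`as.factor()` orders the levels alphabetically, so `"Drug"` becomes the first (reference) level even though the placebo rows come first in `dat`; `summary(lm1)` below therefore reports the treatment term as `trtPlacebo`, Placebo relative to Drug. The sketch uses a short hypothetical vector `arms` to show the default ordering and two ways to make Placebo the reference level.

```r
arms <- rep(c("Placebo", "Drug"), each = 2)   # hypothetical short example

trt.default  <- as.factor(arms)                               # levels sorted alphabetically
trt.explicit <- factor(arms, levels = c("Placebo", "Drug"))   # Placebo as reference level

levels(trt.default)    # "Drug"    "Placebo"
levels(trt.explicit)   # "Placebo" "Drug"

# relevel() changes the reference level of an existing factor
levels(relevel(trt.default, ref = "Placebo"))   # "Placebo" "Drug"
```

Applied to the data above, `dat$trt = relevel(dat$trt, ref="Placebo")` would make placebo the reference level before the model is fitted.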

In [None]:
# check the data dimension
dim(dat)
# print the first 6 observations to see the variable names
head(dat)

**Basic ${\textsf R}$ Graphics**
- `boxplot` of the four variables within each arm

In [None]:
boxplot(dat4placebo, las=1, main="Placebo")

In [None]:
boxplot(dat4drug, las=1, main="Drug")
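The two plots above show the variables of one arm side by side; comparing the arms on a common axis is often easier with the formula interface of `boxplot()`. This sketch re-simulates a hypothetical data frame `dat.demo` with the same design; its values will not match `dat`, because the random draws are made in a different order.

```r
# hypothetical re-simulation of bp.diff for both arms (same design as above)
set.seed(123)
n.demo <- 100
dat.demo <- data.frame(
  trt = factor(rep(c("Placebo", "Drug"), each = n.demo), levels = c("Placebo", "Drug")),
  bp.diff = c(rnorm(n.demo, 100, 10) - rnorm(n.demo, 100, 10),   # placebo: end - base
              rnorm(n.demo,  80, 10) - rnorm(n.demo, 100, 10))   # drug: end - base
)

# one box per arm on a common axis; boxplot() invisibly returns the summary statistics
b <- boxplot(bp.diff ~ trt, data = dat.demo, las = 1,
             ylab = "Change in diastolic BP (mmHg)", main = "Change from Baseline")
b$stats[3, ]   # medians for Placebo and Drug
```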

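- `xyplot` from the `lattice` package: change in blood pressure versus age, with a fitted regression line, in one panel per treatment arm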

In [None]:
library(lattice)

In [None]:
# xyplot() returns a trellis object; print() draws it (required inside functions, loops, or code run with source())
print(xyplot(bp.diff~age|trt, data=dat,
            xlab="Age", strip=strip.custom(bg="white"), 
            ylab="Blood Pressure Difference",lwd=3,cex=1.3,pch=20,type=c("p", "r")))

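**Data Analysis**
- fit a linear model of the change in blood pressure $y$ (`bp.diff`) on treatment ($trt$), age, and their interaction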
$$
    y=\beta_0+\beta_1\times trt+\beta_2\times age+\beta_3\times age\times trt +\epsilon,\quad \epsilon\stackrel{iid}{\sim} N(0,\sigma^2)
$$
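Substituting the two values of the 0/1 indicator $trt$ into the model shows how to read the coefficients; which arm plays the role of $trt=1$ depends on the reference level of the `trt` factor.

$$
\begin{aligned}
trt=0:&\quad E(y\mid age)=\beta_0+\beta_2\times age\\
trt=1:&\quad E(y\mid age)=(\beta_0+\beta_1)+(\beta_2+\beta_3)\times age\\
\text{difference } (trt=1)-(trt=0):&\quad \beta_1+\beta_3\times age
\end{aligned}
$$

So $\beta_1$ alone is the treatment difference at $age=0$, far outside the simulated age range; at the mean age of 50 years the difference is $\beta_1+50\,\beta_3$, and $\beta_3$ is the change in that difference per year of age.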

In [None]:
lm1 = lm(bp.diff~trt*age, data=dat)
summary(lm1)
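Under the model above, with Placebo as the reference level, the Drug minus Placebo difference in mean change at a given age is $\beta_1+\beta_3\times age$, which can be computed from `coef()`. The sketch fits the same model to a hypothetical re-simulated data set `sim` (same design, Placebo as reference), so `trtDrug` and `trtDrug:age` are the Drug terms; its numbers will differ from `lm1`.

```r
# hypothetical re-simulation with Placebo as the reference level
set.seed(123)
n.sim <- 100
sim <- data.frame(
  trt = factor(rep(c("Placebo", "Drug"), each = n.sim), levels = c("Placebo", "Drug")),
  age = rnorm(2 * n.sim, 50, 10)
)
base.sim <- rnorm(2 * n.sim, 100, 10)
end.sim  <- rnorm(2 * n.sim, ifelse(sim$trt == "Drug", 80, 100), 10)  # drug lowers the mean by 20
sim$bp.diff <- end.sim - base.sim

fit <- lm(bp.diff ~ trt * age, data = sim)
b   <- coef(fit)   # names: (Intercept), trtDrug, age, trtDrug:age

# Drug minus Placebo difference in mean change at selected ages: beta1 + beta3 * age
ages   <- c(40, 50, 60)
effect <- unname(b["trtDrug"] + b["trtDrug:age"] * ages)
setNames(round(effect, 1), ages)
```

The same difference can be obtained from `predict()` at two rows that differ only in `trt`, which is a convenient cross-check.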

- `xtable()` from the `xtable` package converts the coefficient table of `lm1` into a LaTeX table

In [None]:
library(xtable)

In [None]:
print(xtable(lm1, caption="Coefficient Estimates for Simulated Clinical Trial Data", label = "tab4RI.coef"),
        table.placement = "htbp", caption.placement = "top")
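For an `lm` fit, `xtable()` tabulates the estimates, standard errors, t values, and p-values that `summary()` reports. An analysis-of-variance table comes from `anova()`, which uses sequential (type I) sums of squares, so its rows depend on the order of terms in the formula; `xtable` also has a method for `anova` objects. The sketch uses a hypothetical re-simulated data set `sim2` with the same design, so its numbers differ from `lm1`.

```r
# hypothetical re-simulation with the same design
set.seed(123)
n.sim <- 100
sim2 <- data.frame(
  trt = factor(rep(c("Placebo", "Drug"), each = n.sim), levels = c("Placebo", "Drug")),
  age = rnorm(2 * n.sim, 50, 10)
)
sim2$bp.diff <- rnorm(2 * n.sim, ifelse(sim2$trt == "Drug", 80, 100), 10) -
                rnorm(2 * n.sim, 100, 10)   # endpoint minus baseline

fit2 <- lm(bp.diff ~ trt * age, data = sim2)
tab  <- anova(fit2)   # sequential (type I) sums of squares: trt, then age, then trt:age
tab
```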

In [None]:
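layout(matrix(1:4, nrow=2))   # 2 x 2 grid for the four diagnostic plots
plot(lm1)                     # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage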
