
# Biological Computing in R &ndash; II <a name="chap:R_II"></a>


In this chapter, you will learn some additional topics in R that will help you to:

*  Make data wrangling, analyses, and simulations more efficient using vectorization and tools such as `plyr`

*  Use some advanced tools for control flow and looping

*  Generate random numbers for statistical simulations

*  Find and fix errors in R code using debugging

*  Become aware of some additional tools and topics in R (accessing databases, building your own packages, etc.).


## Vectorization

R is very slow at cycling through a data structure such as a dataframe or matrix (using `for` and `while` loops). This is because R is a "nimble" (interpreted) language: at execution time, R does not know what you are going to do until it "reads" the code. Compiled languages such as C know exactly what the flow of the program is, because the code is "compiled" before execution.

As a metaphor, C is a musician playing a score she has seen before -- optimizing each passage, while R is playing it "a prima vista" (i.e., at first sight) -- this can slow code execution and operations down. Let's see an example that illustrates this point.

*  Save the following script as `Vectorize1.R` in your `Code` directory, and run it (it sums all elements of a matrix):

```R
M <- matrix(runif(1000000),1000,1000)

SumAllElements <- function(M){
  Dimensions <- dim(M)
  Tot <- 0
  for (i in 1:Dimensions[1]){
    for (j in 1:Dimensions[2]){
      Tot <- Tot + M[i,j]
    }
  }
  return (Tot)
}
## This on my computer takes about 1 sec
print(system.time(SumAllElements(M)))
## While this takes about 0.01 sec
print(system.time(sum(M)))
```
Note the `system.time` R function --- it calculates how much time your code takes.

Both the `SumAllElements()` and `sum()` approaches are correct, and will give you the right answer. However, the inbuilt function `sum()` is about 100 times faster in the timings above, because it is vectorized: the looping happens inside fast compiled code, rather than in interpreted R code as in `SumAllElements()`!

In R, even though you should try to avoid loops, in practice it is often much easier to throw in a `for` loop, and *then* "optimize" the code to avoid the loop if the running time is not satisfactory. Therefore, it won't hurt you to become really familiar with loops and looping, as you learned in the [first R Chapter](06-R_I.ipynb).

Fortunately, R has several functions that can operate on entire vectors and matrices without requiring explicit looping (vectorization). That is, vectorizing a computer program means that you write it such that as many operations as possible are applied to whole data structures (vectors, matrices, dataframes, lists, etc.) at one go, instead of to their individual elements.
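As a minimal sketch of what this means in practice (the vectors `a` and `b` and the helper `AddLoop` are made up for illustration), compare an element-by-element loop with the equivalent whole-vector operation:

```R
a <- c(1, 2, 3, 4)
b <- c(10, 20, 30, 40)

## Loop version: visit each element in turn
AddLoop <- function(a, b){
  out <- numeric(length(a))   # pre-allocate the result vector
  for (i in seq_along(a)){
    out[i] <- a[i] + b[i]
  }
  return (out)
}

LoopResult <- AddLoop(a, b)

## Vectorized version: one operation on the whole vectors
VecResult <- a + b

print(LoopResult)
print(VecResult)
print(identical(LoopResult, VecResult)) # both approaches agree
```

The vectorized line is shorter and also leaves the looping to R's compiled internals.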

You will learn about some important R functions that allow vectorization in the following sections.

### The `*apply` family of functions

There is a family of functions called `*apply` in R that help you vectorize your code. These functions are described in the help files (e.g. `?apply`).

For example, `apply` can be used when you want to apply a function to the rows or columns of a matrix (and higher-dimensional analogues; remember arrays!). This is not generally advisable for data frames, as `apply` will first need to coerce the data frame to a matrix.

*  Type the following in a script file called `apply1.R`, save it to your `Code` directory, and run it:

```R
## apply: applying the same function to rows/columns of a matrix

## Build a random matrix
M <- matrix(rnorm(100), 10, 10)

## Take the mean of each row
RowMeans <- apply(M, 1, mean)
print (RowMeans)

## Now the variance
RowVars <- apply(M, 1, var)
print (RowVars)

## By column
ColMeans <- apply(M, 2, mean)
print (ColMeans)
```

That was using `apply` with some of R's inbuilt functions. You can also use `apply` with functions that you define yourself. Let's try it.

*  Type the following in a script file called `apply2.R`, save it to your `Code` directory, and run it:

```R
SomeOperation <- function(v){ # What does this function do?
  if (sum(v) > 0){
    return (v * 100)
  }
  return (v)
}

M <- matrix(rnorm(100), 10, 10)
print (apply(M, 1, SomeOperation))
```

There are many other members of this family: `lapply`, `sapply`, `eapply`, etc. Each is best suited to a given data type. For example, `lapply` is best for R lists. Have a look at [this Stack Overflow thread](https://stackoverflow.com/questions/3505701/grouping-functions-tapply-by-aggregate-and-the-apply-family) for some guidelines.
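As a small sketch of the difference between two of these (the list `Samples` is made-up data), `lapply` always returns a list, while `sapply` tries to simplify the result to a vector:

```R
## A small list of numeric vectors of different lengths (made-up data)
Samples <- list(a = 1:5, b = c(2.5, 3.5), c = c(10, 20, 30))

## lapply returns a list, with one element per input element
MeansList <- lapply(Samples, mean)
print(MeansList)

## sapply does the same, but simplifies the result to a named vector
MeansVec <- sapply(Samples, mean)
print(MeansVec)
```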

#### The `tapply` function

We will look at `tapply`, which is particularly useful because it allows you to apply a function to subsets of a vector in a dataframe, with the subsets defined by some other vector in the same dataframe, usually a factor (this could be useful for the Pound Hill data analysis in the [First R Chapter](06-R_I.ipynb), for example).

This makes it a bit of a different member of the `*apply` family. Try this:

```R
x <- 1:20 # a vector
x
```

Now create a `factor` type variable (of the same length) defining groups:

```R
y <- factor(rep(letters[1:5], each = 4))
y
```

Now add up the values in x within each subgroup defined by y:

```R
tapply(x, y, sum)
```

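For reference, here is a self-contained sketch of the `tapply` call on the vectors defined above, followed by the same idea applied to columns of R's built-in `iris` dataframe (the variable names `GroupSums` and `SpeciesMeans` are just illustrative):

```R
x <- 1:20
y <- factor(rep(letters[1:5], each = 4))

## Sum of x within each level of y
GroupSums <- tapply(x, y, sum)
print(GroupSums) # a = 10, b = 26, c = 42, d = 58, e = 74

## Mean sepal length for each species in the iris dataset
SpeciesMeans <- tapply(iris$Sepal.Length, iris$Species, mean)
print(SpeciesMeans)
```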

### Using `by`

You can also do something similar to `tapply` with the `by` function, i.e., apply a function to a dataframe using some factor to define the subsets. Try this.

First import some data:

```R
attach(iris)
iris
```


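| Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|
| 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 4.9 | 3.1 | 1.5 | 0.1 | setosa |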


Now run the `colMeans` function (it is better suited to dataframes than plain `mean`) on multiple columns:

```R
by(iris[,1:2], iris$Species, colMeans)
```

```
iris$Species: setosa
Sepal.Length  Sepal.Width 
       5.006        3.428 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width 
       5.936        2.770 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width 
       6.588        2.974 
```

```R
by(iris[,1:2], iris$Petal.Width, colMeans)
```

```
iris$Petal.Width: 0.1
Sepal.Length  Sepal.Width 
        4.82         3.36 
------------------------------------------------------------ 
iris$Petal.Width: 0.2
Sepal.Length  Sepal.Width 
    4.972414     3.379310 
------------------------------------------------------------ 
iris$Petal.Width: 0.3
Sepal.Length  Sepal.Width 
    4.971429     3.328571 
------------------------------------------------------------ 
iris$Petal.Width: 0.4
Sepal.Length  Sepal.Width 
    5.300000     3.785714 
------------------------------------------------------------ 
iris$Petal.Width: 0.5
Sepal.Length  Sepal.Width 
         5.1          3.3 
------------------------------------------------------------ 
iris$Petal.Width: 0.6
Sepal.Length  Sepal.Width 
         5.0          3.5 
------------------------------------------------------------ 
iris$Petal.Width: 1
Sepal.Length  Sepal.Width 
    5.414286     2.371429 
------------------------------------------------------------ 
iris$Petal.Width: 1.1
Sepal.Length  
[output truncated]
```

### Using `replicate`

The `replicate` function is useful for avoiding a loop when you want to call a function repeatedly, typically one that involves random number generation (more on this below). For example:

```R
replicate(10, runif(5))
```

```
0.25888155 0.1476238 0.5954552 0.8074188 0.8916004 0.5222074 0.795254  0.0906624  0.88246467 0.2094422
0.09401763 0.5074343 0.9851042 0.2605838 0.218953  0.4962177 0.203454  0.37673681 0.29768375 0.3024634
0.73955006 0.7379014 0.912972  0.5680505 0.4455375 0.8162595 0.8115905 0.7342564  0.02221009 0.1137064
0.45298293 0.5754574 0.9230039 0.5775192 0.4349828 0.9983535 0.9171582 0.98418235 0.0373946  0.8005109
0.25969546 0.6782946 0.9256775 0.7322016 0.6026458 0.8317724 0.9313614 0.03503012 0.12921967 0.4077644
```

That is, you just generated 10 sets (columns) of 5 uniformly-distributed random numbers (a 5 $\times$ 10 matrix).
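`replicate` works equally well with your own functions. The sketch below (the helper `SampleMean` is hypothetical) repeats a small random "experiment" many times without writing a loop, and also checks the shape of the matrix returned by the call above:

```R
## Hypothetical helper: the mean of n standard-normal draws
SampleMean <- function(n){
  return (mean(rnorm(n)))
}

## Repeat the "experiment" 1000 times without writing a loop
Means <- replicate(1000, SampleMean(20))
print(length(Means))  # one value per replicate
print(sd(Means))      # should be close to 1/sqrt(20), i.e. about 0.22

## Each replicate becomes a column when the function returns a vector
print(dim(replicate(10, runif(5))))  # 5 rows, 10 columns
```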

### Using `plyr` and `ddply`

The `plyr` package combines the functionality of the `*apply` family into a few handy functions. Look up the [web page](http://plyr.had.co.nz/).

In particular, `ddply` is very useful, because for each subset of a data frame, it applies a function and then combines the results into another data frame. In other words, "ddply" means: take a data frame, split it up, do something to each piece, and return a data frame. Look up [this](http://seananderson.ca/2013/12/01/plyr.html) and [this](https://www.r-bloggers.com/transforming-subsets-of-data-in-r-with-by-ddply-and-data-table/) for examples. The latter web page also compares the speed of `ddply` with that of `by`; `ddply` is actually slower than other vectorized methods, as it trades off some of the speed of vectorization for compactness of use! Indeed, functions in `plyr` can be slow overall if you are working with very large datasets that involve a lot of subsetting (analyses by many groups or grouping variables).

The base `*apply` functions remain useful and worth knowing even if you do get into `plyr` or, better still, `dplyr`, which we will see in the [Data chapter](09-Data_R.ipynb).
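To make the "split it up, do something to it, and return a data frame" idea concrete, here is a base-R sketch of the same three steps on the built-in `iris` data. This is only an illustration of the concept, not how `plyr` is implemented, and the column names `MeanSepalLength` and `N` are made up:

```R
## Split: one data frame per species
Pieces <- split(iris, iris$Species)

## Apply: summarise each piece as a one-row data frame
Summaries <- lapply(Pieces, function(d){
  data.frame(Species = d$Species[1],
             MeanSepalLength = mean(d$Sepal.Length),
             N = nrow(d))
})

## Combine: stack the pieces back into a single data frame
Result <- do.call(rbind, Summaries)
print(Result)
```

If you have `plyr` installed, a call along the lines of `ddply(iris, .(Species), summarise, MeanSepalLength = mean(Sepal.Length))` should do the same job in a single line.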


## Practicals


	
*  **A vectorization challenge**

The Ricker model is a classic discrete population model which was 
introduced in 1954 by Ricker to model recruitment of stock in fisheries.
It gives the expected number (or density) $N_{t+1}$ of individuals in 
generation $t + 1$ as a function of the number of individuals in the 
previous generation $t$:

$$N_{t+1} = N_t e^{r\left(1-\frac{N_t}{k}\right)}$$
 
Here $r$ is the intrinsic growth rate and $k$ is the carrying capacity of the environment. Try running the script `Practicals/Code/Ricker.R`, which simulates this model.
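The equation translates almost directly into code. The following is a minimal sketch written for illustration, not the course's `Ricker.R` script; the function name `RickerSketch` and the default parameter values are assumptions:

```R
## Hypothetical helper, not the course's Ricker.R script
RickerSketch <- function(N0 = 1, r = 1, k = 10, generations = 50){
  N <- rep(NA, generations)   # pre-allocate the population vector
  N[1] <- N0
  for (t in 2:generations){
    ## N_{t+1} = N_t * exp(r * (1 - N_t / k))
    N[t] <- N[t-1] * exp(r * (1.0 - (N[t-1] / k)))
  }
  return (N)
}

N <- RickerSketch(generations = 10)
print(N)
```

With these parameter values the population grows from `N0` and settles at the carrying capacity `k`.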

Now open and run the script `Vectorize2.R` (available on the bitbucket repository). This is the stochastic Ricker model (compare with the above script to see where the stochasticity (random error) enters). Now modify the script to complete the exercise given.

*You will be marked on this one based upon how much faster your solution is compared to mine!*

*  **Extra Credit:** Implement the Python versions of `Vectorize1.R` and `Vectorize2.R` (call them `Vectorize1.py` and `Vectorize2.py` respectively). Then write a bash script that compares the computational speed of the four scripts. The bash script should display a meaningful summary of the results in the terminal.
 



## Generating Random Numbers

You will probably need to generate random numbers at some point in your 
journey towards becoming a proper data analyst or quantitative 
biologist. 

R has many routines for generating random samples from various probability distributions; we have already used `runif()` and `rnorm()`. There are a number of probability distributions that you can sample or generate random numbers from, each with companion functions for its density, cumulative distribution, and quantiles:

| Function | What it does |
|---|---|
| `rnorm(10, mean=0, sd=1)` | Draw 10 normal random numbers with mean 0 and s.d. 1 |
| `dnorm(x, mean=0, sd=1)` | Density function |
| `pnorm(x, mean=0, sd=1)` | Cumulative distribution function |
| `qnorm(p, mean=0, sd=1)` | Quantile function (the inverse of the cumulative distribution function) |
| `runif(20, min=0, max=2)` | Twenty random numbers from uniform [0,2] |
| `rpois(20, lambda=10)` | Twenty random numbers from Poisson($\lambda$) |
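R uses the same naming scheme for every distribution: the prefix `r` draws random numbers, `d` gives the density, `p` the cumulative probability, and `q` the quantile. A short sketch for the normal distribution (the value 1.96 is just a convenient example):

```R
x <- 1.96

## Density (height of the bell curve) at x
print(dnorm(x, mean = 0, sd = 1))

## Cumulative probability P(X <= x)
p <- pnorm(x, mean = 0, sd = 1)
print(p)   # about 0.975

## Quantile function: the inverse of pnorm
print(qnorm(p, mean = 0, sd = 1))   # recovers 1.96
```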

### "Seeding" random number generators   

Before proceeding further, note that computers don't really generate mathematically random numbers, but instead a sequence of numbers that are close to random: "pseudo-random numbers". They are generated based on some iterative formula:

$$x_{new} = f(x_{old}) \bmod N$$

where the modulo operation gives the remainder after division by $N$.

So to generate the first random number, you need a **seed**. Setting the seed allows you to reliably generate the same sequence of numbers, which can be useful when debugging programs (next section).
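To see how such an iterative formula behaves, here is a toy generator of this kind. It is a sketch for illustration only: the function name `ToyRandom` is made up, the constants are the classic "minimal standard" choices, and R's own default generator uses a different, much more sophisticated algorithm.

```R
## Toy generator: x_new = (a * x_old) mod m
## The constants a and m are used here only for illustration;
## R's own default generator is different.
ToyRandom <- function(seed, n, a = 16807, m = 2147483647){
  x <- numeric(n)
  current <- seed
  for (i in 1:n){
    current <- (a * current) %% m
    x[i] <- current
  }
  return (x / m)   # rescale to the interval (0, 1)
}

print(ToyRandom(seed = 1, n = 3))
print(ToyRandom(seed = 1, n = 3))   # same seed, same "random" numbers
print(ToyRandom(seed = 2, n = 3))   # different seed, different numbers
```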

Now, try this:


```R
set.seed(1234567)
rnorm(1)
```

```
0.1567038
```


What happened?! If this were truly a random number, how would everybody get the same answer? Now try `rnorm(10)` and compare the results with your neighbour. Thus "random" numbers generated in R and in any other software are in fact "deterministic", but from a very complex formula that yields numbers with properties like random numbers.

Effectively, `rnorm` has an enormous list that it cycles through. The random seed starts the process, i.e., indicates where in the list to start. This is usually taken from the clock when you start R.

But why bother with this? Well, for debugging (next section). Bugs in 
code can be hard to find --- harder still if you are generating random 
numbers, so repeat runs of your code may or may not all trigger the 
same behaviour. You can set the seed once at the beginning of the code 
--- ensuring repeatability, retaining (pseudo) randomness. Once 
debugged, if you want, you can remove the set seed line.
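A quick sketch of this repeatability (the seed value 42 is arbitrary):

```R
set.seed(42)
FirstRun <- rnorm(5)

set.seed(42)          # reset to the same starting point
SecondRun <- rnorm(5)

ThirdRun <- rnorm(5)  # no reset: the sequence simply continues

print(identical(FirstRun, SecondRun))  # TRUE
print(identical(FirstRun, ThirdRun))   # FALSE
```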

To try out how sampling works, type the contents of the script `Practicals/Code/sample.R` into a file called `sample.R` and save it in `Code`.
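Independently of that script, the basic behaviour of the `sample` function can be sketched as follows (the vector `Population` is made-up data, and the seed is set only so that repeated runs give the same draws):

```R
set.seed(1)   # only so that repeated runs give the same draws

Population <- 1:10   # made-up "population" of ten individuals

## Random permutation: every element exactly once, in random order
Shuffled <- sample(Population)

## Subsample of 3 without replacement (the default)
Sub <- sample(Population, 3)

## Bootstrap-style sample: same size, drawn with replacement
Boot <- sample(Population, length(Population), replace = TRUE)

print(Shuffled)
print(Sub)
print(Boot)
```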

## Errors and Debugging

### "Catching" errors

Often, you don't know if a simulation or an R function will work on a particular dataset or variable, or on a particular value of a variable (this can happen in many stats functions).

Indeed, as most of you must have already experienced by now, there can be frustrating, puzzling bugs in programs that lead to mysterious errors. Often, the error and warning messages you get are incomprehensible, especially in R!

Rather than having R throw you out of the code, you would rather catch the error and keep going. This can be done using `try`. Modify `sample.R` as shown in the script `Practicals/Code/try.R`, and save the result as `try.R` in `Code` (what does this script do?).

Note the functions `sample` and `stop` in the above script. Also check out `tryCatch`.
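As a self-contained sketch of both approaches (the function `RiskyMean` and the list `Inputs` are made up for illustration, and are not the contents of `try.R`):

```R
## Hypothetical function that fails for some inputs
RiskyMean <- function(x){
  if (length(unique(x)) < 3){
    stop("Too few unique values to calculate a mean")
  }
  return (mean(x))
}

Inputs <- list(c(1, 2, 3, 4), c(5, 5, 5), c(2, 4, 6))

## try(): the loop keeps going; failed runs return a "try-error" object
Results <- lapply(Inputs, function(x) try(RiskyMean(x), silent = TRUE))
Failed <- sapply(Results, inherits, what = "try-error")
print(Failed)

## tryCatch(): supply a handler that decides what to return on error
Safe <- sapply(Inputs, function(x){
  tryCatch(RiskyMean(x), error = function(e) NA)
})
print(Safe)
```

With `try`, you inspect the result afterwards to see what failed; with `tryCatch`, you decide up front what value an error should be turned into.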

### Debugging

Once you have found an error, you would like to fix it. This is called 
debugging. Here are some useful debugging functions in R : 


*  Warnings vs errors: you can convert warnings to errors with `options(warn = 2)`, and use `stopifnot()` to raise an error when a condition you rely on is not true

*  What to do when you get an error: `traceback()`

*  Simple `print` commands in the right places can be useful for testing (but not strongly recommended)

*  Use of `browser()` at key points in code: my favourite option (also look up `recover()`)

*  `debug(fn)`, `undebug(fn)`: a more technical approach to debugging; explore them
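A short sketch of two of these ideas, `stopifnot()` and converting warnings to errors via `options(warn = 2)` (the function `SafeLog` is made up for illustration):

```R
## stopifnot(): fail early if an assumption does not hold
SafeLog <- function(x){
  stopifnot(is.numeric(x), all(x > 0))
  return (log(x))
}
Check <- try(SafeLog(-1), silent = TRUE)
print(inherits(Check, "try-error"))   # TRUE: the check raised an error

## By default, a failed conversion only gives a warning (and NA)...
Value <- suppressWarnings(as.numeric("abc"))
print(Value)

## ...but with options(warn = 2) the warning becomes an error
OldOptions <- options(warn = 2)
Converted <- try(as.numeric("abc"), silent = TRUE)
options(OldOptions)                   # restore the previous setting
print(inherits(Converted, "try-error"))
```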


Let's look at an example using `browser()`. `browser()` is handy because it will allow you to "single-step" through your code. Place it within your function at the point where you want to examine, for example, local variables.

An example usage of `browser()` is given in the script `Practicals/Code/browse.R`; type it into `browse.R` and save it in `Code`.

Now, within the browser, you can enter expressions as 
normal, or you can use a few particularly useful debug commands:

*  `n`: single-step
*  `c`: exit browser and continue
*  `Q`: exit browser and abort, return to top-level.
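Here is a minimal sketch of where a `browser()` call might go (the function `ExpGrowth` is hypothetical and is not the course's `browse.R`). The `interactive()` guard is an addition so that the script also runs unattended; in an interactive session, execution pauses at each iteration and you can inspect `N` and `t` before typing `n`, `c`, or `Q`:

```R
## Hypothetical example: exponential growth, pausing inside the loop
ExpGrowth <- function(N0 = 1, r = 1, generations = 5){
  N <- rep(NA, generations)   # pre-allocate the population vector
  N[1] <- N0
  for (t in 2:generations){
    N[t] <- N[t-1] * exp(r)
    if (interactive()){
      browser()   # execution pauses here in an interactive session
    }
  }
  return (N)
}

print(ExpGrowth(generations = 3))
```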


## Building your own R packages 

You can package up your code, data sets and documentation to make a *bona fide* R package. You may wish to do this for particularly large projects that you think will be useful for others. Read the *Writing R Extensions* manual ([cran.r-project.org/doc/manuals/r-release/R-exts.html](https://cran.r-project.org/doc/manuals/r-release/R-exts.html)) and see `package.skeleton` to get started. The R tool set EcoDataTools ([https://github.com/DomBennett/EcoDataTools](https://github.com/DomBennett/EcoDataTools)) and the package `cheddar` were written by Silwood Grad Students!
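As a rough sketch of the first step, `package.skeleton` can generate the directory structure of a package from objects in your workspace. The function `AddOne` and the package name `toyPkg` below are made up, and the skeleton is written to a temporary directory so that nothing is cluttered:

```R
## A toy function to package (hypothetical)
AddOne <- function(x){
  return (x + 1)
}

## Write the skeleton into a temporary directory
PkgParent <- tempdir()
package.skeleton(name = "toyPkg", list = "AddOne",
                 environment = environment(), path = PkgParent, force = TRUE)

## Look at what was created (a DESCRIPTION file, an R/ directory, etc.)
print(list.files(file.path(PkgParent, "toyPkg")))
```

The generated files are only a starting point; you still need to edit the description and help files before the package can be built.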
        
## Sweave and knitr

Sweave and knitr are tools that allow you to write your Dissertation Report or some other document such that it can be updated automatically if the data or R analysis change. Instead of inserting a prefabricated graph or table into the report, the master document contains the R code necessary to obtain it. When run through R, all data analysis output (tables, graphs, etc.) is created on the fly and inserted into a final document, which can be written using LaTeX, LyX, HTML, or Markdown.
The report can be automatically updated if data or analysis change, 
which allows for truly reproducible research. Check out 
[https://support.rstudio.com/hc/en-us/articles/200552056-Using-Sweave-and-knitr](https://support.rstudio.com/hc/en-us/articles/200552056-Using-Sweave-and-knitr) and 
[http://yihui.name/knitr/](http://yihui.name/knitr/).

## Practicals 


	
*  **Autocorrelation in weather** (this Practical may make more sense once you have done the R and Stats week, where you will learn about correlation coefficients and p-values).

    *  Make a new script named `TAutoCorr.R`, and save it in the `Code` directory.

    *  At the start of the script, load, examine, and plot `KeyWestAnnualMeanTemperature.Rdata`, using `load()`. This is the temperature in Key West, Florida for the 20th century.

    *  **The question this script will help answer is**: Is the temperature of one year significantly correlated with that of the next year (successive years), *across the years*? That is, you will be calculating the correlation between $n-1$ pairs of years, where $n$ is the total number of years. However, you can't use the standard p-value calculated for a correlation coefficient (using R's `cor()` function; see below), because measurements of climatic variables at successive time-points in a time series (successive seconds, minutes, hours, months, years, etc.) are *not independent*.

    *  Therefore:

        1.  Compute the appropriate correlation coefficient between successive years and store it (look at the help file for `cor()`).
        2.  Then repeat this calculation 10000 times by randomly permuting the time series (Hint: you can use the `sample` function that we learned about in this Chapter; read the help file for this function and experiment with it), then computing the correlation coefficient for each randomly permuted year sequence and storing it.
        3.  Then calculate what fraction of the correlation coefficients from step 2 were greater than that from step 1 (this is your approximate p-value).

    *  *How do you interpret these results? Why? Present your results and their interpretation in a pdf document written in LaTeX (please include the document's source code as well).*



*  **Mapping** (Extra Credit)

Your project may not really need GIS, but you may still like/need to do some mapping. You can do it in R using the `maps` package. In this practical, you will map the Global Population Dynamics Database ([https://www.imperial.ac.uk/cpb/gpdd/gpdd.aspx](https://www.imperial.ac.uk/cpb/gpdd/gpdd.aspx)) (GPDD). This is a freely available database that was developed at Silwood.

If any of you are interested in doing a project around this database, 
please contact David Orme or Samraat Pawar! It is a gold mine of as yet 
under-utilized information. Note that the Living Planet Index 
([http://livingplanetindex.org/home/index](http://livingplanetindex.org/home/index)) is based upon these 
data.


*  Use `load()` to load `GPDDFiltered.RData`, which is available on the bitbucket git repository, and have a look at the database field headers and contents.
*  What you need is latitude and longitude information for a bunch of species for which population time series are available in the GPDD.
*  Now use `install.packages()` to install the package `maps`, as you did with `ggplot2` (hopefully without any problems!).
*  Now create a script (saved under a sensible name in a sensible location; hint hint!) that:

    *  Loads the `maps` package
    *  Loads the GPDD data
    *  Creates a world map (use the `map` function; read its help file)
    *  Superimposes on the map all the locations from which we have data in the GPDD dataframe

*  Compare your map with a fellow student's to check.

*  *Based on this map, what biases might you expect in any analysis based on the data represented? Include your answer as a comment at the end of your R script.*




## R Module Wrap up
 
### Some comments and suggestions

Thanks for enduring through the week! Learning to program in R or any 
other language, especially if it's your first-ever effort to learn 
programming, demands perseverance. Y'all have shown admirable 
quantities of this necessary quality. Keep going! I believe most if not 
all of you have climbed a significant part of a steep learning curve. 
Here are some things to keep in mind:


*  There are many R nerds at Silwood that you can talk to. *They walk among us!*

*  There is a Silwood R list that you can subscribe to: [https://mailman.ic.ac.uk/mailman/listinfo/silwood-r](https://mailman.ic.ac.uk/mailman/listinfo/silwood-r)

*  However, post questions only as a last resort! Google it first, and even before that, make sure you revise this week's (and stats week's) work.

*  Solutions to this week's Practicals will become available by 15th Nov.



## Practicals wrap-up

  

*  Review and make sure you can run all the commands, code fragments, and named scripts we have built till now and get the expected outputs.

*  Annotate/comment your code lines as much and as often as necessary using `#`.

*  Keep all code, data and results files organized in the `Week3/` directory.

*  `git add`, `commit` and `push` all your code and data from this chapter to your git repository by Wednesday, Nov 2, 5PM.


## Readings


    
*  See **An introduction to the Interactive Debugging Tools in R** by Roger D Peng for detailed usage: [http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf](http://www.biostat.jhsph.edu/~rpeng/docs/R-debug-tools.pdf)

*  Friedrich Leisch (2002) Sweave: Dynamic generation of statistical reports using literate data analysis. Proceedings in Computational Statistics, pages 575-580. Physica Verlag, Heidelberg. [http://www.statistik.lmu.de/~leisch/Sweave](http://www.statistik.lmu.de/~leisch/Sweave)

*  Remember, R packages come with pdf guides/documentation!
	


