---
title: "Introduction to R"
output:
  html_document:
---

# Introduction

In this lab, I will introduce you to R and R Studio.  Both programs are free to download, and you can find links to download the software on the blog (http://tiny.cc/stat_250).

In the Behrend computer labs, as with many applications, R Studio can be found in the program menu under "R Studio."

When you first run R Studio, the main screen you will see is the "Console" window, which is below this one.  This is the window in which you can type commands.  After each command, press Enter.  For example:

In [None]:
2+2

In [None]:
3*4

In [None]:
sqrt(16)

In [None]:
log(123)

This shows that you can easily use R as a calculator.  Some tips:

* If you want to scroll up to the previous command(s) you have entered, use the up arrow key.
* If you forget the exact command you want to use, enter the first few letters and hit tab.  A list of possible commands will appear.  Try "`sq`".
* Help is available, either in the "Help" tab in the lower right hand window or by entering `?log` in the console.  The R help is sometimes useful and sometimes not.
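
The tab-completion and help tips above can also be reproduced with ordinary functions.  This is a minimal sketch using only base R; the variable name `sq_matches` is made up for illustration.

```r
# List every visible object whose name starts with "sq",
# similar to typing "sq" and pressing tab in the console.
sq_matches <- apropos("^sq")
print(sq_matches)

# Show the arguments of log() without opening the full help page.
args(log)

# log() computes natural logarithms by default;
# other bases are available through the `base` argument.
log(100, base = 10)   # logarithm base 10
log10(100)            # shortcut for the same calculation
```

If `?log` is hard to follow, `args(log)` is often enough to remind you which arguments a function accepts.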

Since this is a statistics class, we are interested in data sets, not single numbers.  If you want to enter a small data set very quickly, use the `c` command, which stands for "combine."

In [None]:
x <- c(10,14,15,12,13)

In [None]:
y = c(16,19,14,18,20)

Note that for a simple assignment like this, the arrow (`<-`) and the equals sign (`=`) do the same thing.  They assign those values to `x` and `y` respectively.  Once you have the variables `x` and `y` you can perform some operations.

In [None]:
sqrt(x)

In [None]:
y^2

In [None]:
sum(x)

In [None]:
mean(y)
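
The commands above fall into two groups, which the following sketch makes explicit.  It re-creates `x` and `y` so that it runs on its own; the names `x_arrow` and `x_equal` are made up for illustration.

```r
# Re-create the two small data sets from above.
x <- c(10, 14, 15, 12, 13)
y <- c(16, 19, 14, 18, 20)

# For a simple assignment like this, `=` gives the same result as `<-`.
x_arrow <- c(10, 14, 15, 12, 13)
x_equal = c(10, 14, 15, 12, 13)
identical(x_arrow, x_equal)

# Element-wise operations return one result per value.
sqrt(x)
y^2
x + y        # adds the first values, then the second values, and so on

# Summary functions collapse the whole vector to a single number.
sum(x)       # 64
mean(y)      # 17.4
length(x)    # 5
```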

In biology, data sets are rarely that small, and they contain more than one variable.  I have put one such data set on the web: bullhead data collected in Presque Isle Bay.  Here is how you can read it into R.

In [None]:
bullhead <- read.csv("http://www.personal.psu.edu/mar36/stat_250/data_pib.csv")

In [None]:
bullhead <- read.csv("http://tiny.cc/pubh2w")

The command `read.csv` will read any comma-delimited file.  The file can be located on the internet or on your computer (I will show you that later in the semester).  Once you have read the data into R, you can do a number of things.
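
As a preview of reading a file from your own computer, here is a minimal sketch that does not need an internet connection.  The data frame `toy`, its values, and the names `toy_path` and `toy_back` are all invented for this example; they are not part of the bullhead data.

```r
# A tiny made-up data set, used only to demonstrate the file round trip.
toy <- data.frame(
  id     = 1:3,
  length = c(250, 310, 275),
  sex    = c("F", "M", "F")
)

# Write it to a temporary comma-delimited file on this computer.
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

# read.csv() takes a local path in the same way it takes a URL.
toy_back <- read.csv(toy_path)
toy_back
```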

In [None]:
head(bullhead)        # First six lines

In [None]:
bullhead              # The entire data set

In [None]:
tail(bullhead)        # Last six lines

In [None]:
names(bullhead)       # What are the variable names

In [None]:
bullhead$age          # Just the 'age' variable
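
A few more base R functions are handy for a first look at a data set.  The sketch below uses a small invented data frame called `fish` as a stand-in, so the values shown are not from the bullhead data.

```r
# Made-up stand-in for a data set like `bullhead`; the values are invented.
fish <- data.frame(
  age    = c(3, 5, 4, 6, 5, 7, 4),
  weight = c(210, 340, 280, 410, 355, 480, 300),
  sex    = c("F", "M", "M", "F", "F", "M", "F")
)

dim(fish)          # number of rows, then number of columns
str(fish)          # type and first few values of every variable
head(fish, 3)      # first three rows instead of the default six
summary(fish$age)  # minimum, quartiles, mean, and maximum of one variable
fish$age[1:2]      # first two values of the 'age' variable
```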

# Tools for Categorical Data

Looking at the data, you see that some variables are quantitative and some are categorical.  The categorical variables can be examined with the `table` command.

In [None]:
table(bullhead$sex)

In [None]:
table(bullhead$pibloc)

In [None]:
table(bullhead$pibloc,bullhead$sex)

In [None]:
table(bullhead$pibloc,bullhead$skin)

In [None]:
table(bullhead$length)

You can use `table` on quantitative variables, but the output is rarely useful because almost every value gets its own column.  Bar charts are the most useful way to plot categorical variables, and R gives us a number of options for drawing them.

In [None]:
barplot(table(bullhead$sex))

In [None]:
barplot(table(bullhead$sex),names.arg=c("Missing","Female","Male"))

In [None]:
barplot(table(bullhead$pibloc))

In [None]:
barplot(table(bullhead$pibloc),cex.names=.75)

In [None]:
barplot(table(bullhead$pibloc),cex.names=.45,horiz=TRUE)
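
To see how `table` and `barplot` fit together, here is a minimal sketch on invented data.  The vectors `sex` and `location` and their labels are made up; they only mimic the kind of categorical variables found in the bullhead data.

```r
# Made-up categorical data; the category labels are invented.
sex <- c("F", "M", "F", "F", "M", "U", "F", "M")
location <- c("North", "North", "South", "South",
              "South", "North", "North", "South")

counts <- table(sex)          # one-way table of counts
counts
prop.table(counts)            # the same table as proportions
names(counts)                 # category order; check this before using names.arg

table(location, sex)          # two-way table: rows are locations

# barplot() draws one bar per category and invisibly returns the bar midpoints.
mids <- barplot(counts, main = "Toy counts by sex", ylab = "Count")
```

Checking `names(counts)` first matters because the labels you pass to `names.arg` are applied in that order.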

Another way to plot categorical variables is to use a pie chart.  Statisticians are not big fans of pie charts; I will demonstrate why in class.

In [None]:
pie(table(bullhead$sex),labels=c("Missing","Female","Male"))

In [None]:
pie(table(bullhead$sex),labels=c("Missing","Female","Male"),
          col=c("wheat","deeppink","blue"))

In [None]:
pie(table(bullhead$pibloc))

# Tools for Quantitative Data

To graphically summarize quantitative data, one solution is a histogram.

In [None]:
hist(bullhead$length)

In [None]:
hist(bullhead$weight)

In [None]:
hist(bullhead$weight,breaks=20,main="Histogram of Bullhead Weights (g)")

In [None]:
hist(bullhead$age)

When we look at histograms, we are interested in describing two things: the number of modes and the shape.  We will discuss this in lab/class.
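
As a small illustration of shape, the sketch below uses ten invented measurements with a long right tail.  The vector `w` is made up, and the comparison of mean and median is only one informal way to notice skew.

```r
# Made-up, right-skewed measurements; the values are invented.
w <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 12)

# plot = FALSE returns the histogram's bins and counts without drawing it.
h <- hist(w, plot = FALSE)
h$breaks    # bin boundaries
h$counts    # number of observations in each bin

# In this right-skewed sample the long tail pulls the mean above the median.
mean(w)     # 3.9
median(w)   # 3
```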