# Basic Operations and Numerical Descriptions

## Basic Operations

Once you have a vector (or a list of numbers) in memory, most basic operations are available. Most of them act on a whole vector at once, so a single command can perform a large number of calculations. One thing to note: if you perform an operation on more than one vector, the vectors should usually contain the same number of entries.

In [1]:
a <- c(1,2,3,4)
a
a + 5
a - 10
a*4
a/5
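Because these operators are vectorized, `a*4` does the same work as an explicit loop over the entries. The sketch below makes that equivalence concrete. The loop version (`times_four_loop`, a hypothetical helper) exists only for comparison and is not how you would normally write R.

```r
a <- c(1, 2, 3, 4)

# Vectorized: one expression multiplies every entry
vectorized <- a * 4

# Hypothetical loop equivalent, written out only for comparison
times_four_loop <- function(x) {
  out <- numeric(length(x))   # preallocate the result vector
  for (i in seq_along(x)) {
    out[i] <- x[i] * 4        # one multiplication per entry
  }
  out
}

looped <- times_four_loop(a)
print(vectorized)                    # 4 8 12 16
print(identical(vectorized, looped)) # TRUE: same values, same type
```

The vectorized form is shorter and, for long vectors, usually much faster than the loop.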

We can save the results in another vector called b:

In [2]:
b <- a - 10
b

If you want to take the square root, raise e to the power of each number, take the (natural) logarithm, and so on, the usual commands can be used:

In [3]:
sqrt(a)
exp(a)
log(a)
exp(log(a))
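By default `log()` computes the natural logarithm, which is why `exp(log(a))` gives back `a`. The following minimal sketch shows other common bases, using `log()`'s `base` argument and the `log10()` and `log2()` shortcuts. It uses `all.equal()` rather than `==` because floating-point results can differ in the last digit.

```r
a <- c(1, 2, 3, 4)

natural <- log(a)            # natural log (base e)
base10  <- log10(a)          # same values as log(a, base = 10)
base2   <- log(a, base = 2)  # same values as log2(a)

# exp() undoes log(); all.equal() tolerates tiny rounding differences
print(all.equal(exp(natural), a))
print(all.equal(base10, log(a, base = 10)))
print(base2)
```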

In [4]:
# Named d rather than c, to avoid confusion with the c() function
d <- (a + sqrt(a)) / (exp(2) + 1)
d

The operation is performed element by element, and this is true for almost all of the basic functions. This means you can combine them into quite complicated expressions:

In [5]:
a + b
a*b
a/b
(a+3)/(sqrt(1-b)*2-1)

You need to be careful of one thing. Because operations on vectors are performed element by element, all of the vectors in an expression should have the same length. If the lengths differ, R recycles the shorter vector, repeating its values until it matches the length of the longer one. When the longer length is not a multiple of the shorter one you get a warning message, and the result is probably not what you intended:

In [6]:
a <- c(1,2,3)
b <- c(10,11,12,13)
a+b

[1] 11 13 15 14
Warning message:
In a + b :
  longer object length is not a multiple of shorter object length
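The situation is more dangerous when the longer length *is* a multiple of the shorter one. In that case R recycles the shorter vector silently, with no warning at all, so a length mismatch can go unnoticed. Here is a minimal sketch:

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# y is recycled to c(10, 20, 10, 20); no warning is issued
z <- x + y
print(z)                       # 11 22 13 24

# The lengths differ, yet R did not complain
print(length(x) == length(y))  # FALSE
```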

The ls() command lists the names of the objects you have created so far in the current session:

In [7]:
ls()

If you apply the min command to two vectors, you get a single number: the minimum of all of the values in both vectors. If you instead want the element-by-element minimum (a[1] compared with b[1], a[2] with b[2], and so on), use the pmin ("parallel minimum") command:

In [8]:
a <- c(1,-2,3,-4)
b <- c(-1,2,-3,4)
min(a,b)
pmin(a,b)
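The difference between `min()` and `pmin()` is easiest to see side by side. The sketch below also uses `pmax()`, the matching parallel maximum. It checks `pmin()` against an explicit element-wise comparison written with `ifelse()`.

```r
a <- c(1, -2, 3, -4)
b <- c(-1, 2, -3, 4)

overall_min  <- min(a, b)   # one number: the smallest of all 8 values
parallel_min <- pmin(a, b)  # element-wise: min(a[i], b[i]) for each i
parallel_max <- pmax(a, b)  # element-wise maximum

# pmin() agrees with an explicit element-by-element comparison
same <- identical(parallel_min, ifelse(a < b, a, b))

print(overall_min)   # -4
print(parallel_min)  # -1 -2 -3 -4
print(parallel_max)  #  1  2  3  4
print(same)          # TRUE
```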

## Basic Numerical Descriptions

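The same kinds of functions are useful for describing a data set. The commands below read the file trees91.csv, which must be in your working directory, into a data frame called tree and list its column names. The header = TRUE and sep = "," arguments are already read.csv's defaults. The stringsAsFactors = TRUE argument makes text columns such as CHBR load as factors, matching the summary output later in this section. Since R 4.0.0, the default for stringsAsFactors has been FALSE.

In [9]:
tree <- read.csv(file = "trees91.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
names(tree)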
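If `trees91.csv` is not available, `read.csv()` stops with an error. One self-contained way to try the same calls is to write a small data frame to a temporary file first. The data frame `toy` and its values are made up for illustration and are **not** the contents of `trees91.csv`.

```r
# Hypothetical stand-in data; NOT the real trees91.csv values
toy <- data.frame(
  C    = c(1, 1, 2),
  N    = c(1, 2, 1),
  CHBR = c("A1", "A4", "B2"),
  LFBM = c(0.43, 0.91, 0.72)
)

path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)  # row.names = FALSE avoids an extra index column

# header = TRUE and sep = "," are already read.csv()'s defaults
tree_toy <- read.csv(file = path, header = TRUE, sep = ",")
print(names(tree_toy))  # "C" "N" "CHBR" "LFBM"
unlink(path)            # delete the temporary file
```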

Each column in the data frame can be accessed as a vector. For example, the leaf biomass values (LFBM) can be found using tree$LFBM:

In [10]:
tree$LFBM
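`$` is not the only way to pull a column out of a data frame. The sketch below compares it with `[[ ]]` and matrix-style `[ , ]` indexing on a small hypothetical data frame, `tree_toy`, with made-up values. `[[ ]]` is the form to use when the column name is stored in a variable.

```r
# Small hypothetical data frame standing in for `tree`
tree_toy <- data.frame(LFBM = c(0.43, 0.91, 0.72), STBM = c(0.19, 0.25, 0.38))

by_dollar <- tree_toy$LFBM       # convenient at the prompt
by_name   <- tree_toy[["LFBM"]]  # column name given as a string
by_index  <- tree_toy[, "LFBM"]  # matrix-style indexing; also a vector here

col  <- "STBM"
stem <- tree_toy[[col]]          # tree_toy$col would look for a column literally named "col"

print(identical(by_dollar, by_name) && identical(by_name, by_index))  # TRUE
print(stem)
```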

The following commands can be used to get the mean, median, quantiles, minimum, maximum, variance, and standard deviation of a set of numbers:

In [11]:
mean(tree$LFBM)

In [12]:
median(tree$LFBM)

In [13]:
quantile(tree$LFBM)

In [14]:
min(tree$LFBM)

In [15]:
max(tree$LFBM)

In [16]:
var(tree$LFBM)

In [17]:
sd(tree$LFBM)
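`var()` and `sd()` compute the *sample* variance and standard deviation, which divide by n - 1 rather than n. The following sketch recomputes both from their definitions on a small made-up vector, so you can see what the built-in functions return.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)  # small made-up sample

n <- length(x)
m <- sum(x) / n                # mean: 40 / 8 = 5
v <- sum((x - m)^2) / (n - 1)  # sample variance: 32 / 7
s <- sqrt(v)                   # standard deviation is its square root

print(c(m, mean(x)))
print(all.equal(v, var(x)))    # TRUE
print(all.equal(s, sd(x)))     # TRUE
```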

Finally, the summary command prints the minimum, first quartile, median, mean, third quartile, and maximum:

In [18]:
summary(tree$LFBM)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.1300  0.4800  0.7200  0.7649  1.0075  1.7600 

The summary command is especially handy because, if you give it a data frame, it prints a summary of every column in the data frame:

In [19]:
summary(tree)

       C               N              CHBR         REP             LFBM       
 Min.   :1.000   Min.   :1.000   A1     : 3   Min.   : 1.00   Min.   :0.1300  
 1st Qu.:2.000   1st Qu.:1.000   A4     : 3   1st Qu.: 9.00   1st Qu.:0.4800  
 Median :2.000   Median :2.000   A6     : 3   Median :14.00   Median :0.7200  
 Mean   :2.519   Mean   :1.926   B2     : 3   Mean   :13.05   Mean   :0.7649  
 3rd Qu.:3.000   3rd Qu.:3.000   B3     : 3   3rd Qu.:20.00   3rd Qu.:1.0075  
 Max.   :4.000   Max.   :3.000   B4     : 3   Max.   :20.00   Max.   :1.7600  
                                 (Other):36   NA's   :11                      
      STBM             RTBM            LFNCC           STNCC       
 Min.   :0.0300   Min.   :0.1200   Min.   :0.880   Min.   :0.3700  
 1st Qu.:0.1900   1st Qu.:0.2825   1st Qu.:1.312   1st Qu.:0.6400  
 Median :0.2450   Median :0.4450   Median :1.550   Median :0.7850  
 Mean   :0.2883   Mean   :0.4662   Mean   :1.560   Mean   :0.7872  
 3rd Qu.:0.3800   3rd Qu.:0.
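The summary above reports 11 missing values (`NA's`) in the `REP` column. Functions such as `mean()` and `median()` return `NA` if any input is missing, unless you ask them to drop missing values with `na.rm = TRUE`. The sketch below uses a made-up vector, `rep_toy`, rather than the real `REP` data.

```r
rep_toy <- c(9, NA, 14, 20, NA)  # hypothetical values, not the real REP column

print(mean(rep_toy))                  # NA: a single missing value propagates
print(mean(rep_toy, na.rm = TRUE))    # mean of the 3 observed values, 43 / 3
print(sum(is.na(rep_toy)))            # 2 missing values
print(median(rep_toy, na.rm = TRUE))  # 14
```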

## Quantiles
In statistics and probability, quantiles are cut points that divide the range of a probability distribution into continuous intervals with equal probabilities, or that divide the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Thus the quartiles are the three cut points that divide a data set into four equal-sized groups.
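In R, `quantile()` takes the cut points you want through its `probs` argument. This sketch shows the "one fewer cut point than groups" rule on a simple vector: three cut points for quartiles, four for quintiles.

```r
x <- 1:9

# Three cut points -> four groups (quartiles)
quartiles <- quantile(x, probs = c(0.25, 0.5, 0.75))
print(quartiles)          # 25%: 3, 50%: 5, 75%: 7

# Four cut points -> five groups (quintiles)
quintiles <- quantile(x, probs = c(0.2, 0.4, 0.6, 0.8))
print(length(quintiles))  # 4
```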

A quartile is a type of quantile. The first quartile (Q1) is the middle value between the smallest value and the median of the data set. The second quartile (Q2) is the median of the data. The third quartile (Q3) is the middle value between the median and the largest value of the data set.
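One reading of "the middle value between the smallest value and the median" is "the median of the lower half of the data". That is only one of several conventions. R's `quantile()` uses a different one by default (type 7), so the two methods can disagree slightly. The sketch below compares them for an even number of observations. With an odd count, conventions also differ on whether the median belongs to each half.

```r
x <- c(1, 3, 5, 6, 9, 11, 12, 13)  # already sorted, n = 8 (even)

n <- length(x)
lower_half <- x[1:(n / 2)]         # 1 3 5 6
upper_half <- x[(n / 2 + 1):n]     # 9 11 12 13

q1_halves <- median(lower_half)    # 4
q3_halves <- median(upper_half)    # 11.5

# R's default quantile() interpolates differently (type 7)
q_default <- unname(quantile(x, probs = c(0.25, 0.75)))  # 4.5 11.25

print(c(q1_halves, q3_halves))
print(q_default)
print(median(x))                   # Q2 = 7.5 under either convention
```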

### How to Find Quantiles

Sample question: Find the number in the following set of data where 20 percent of values fall below it, and 80 percent fall above:
1 3 5 6 9 11 12 13 19 21 22 32 35 36 44 45 55 68 79 80 81 88 90 91 92 100 112 113 114 120 121 132 145 146 149 150 155 180 189 190

Step 1: Order the data from smallest to largest. The data in the question is already in ascending order.

Step 2: Count how many observations you have in your data set. This particular data set has 40 items.

Step 3: Convert the percentage to a decimal, q. We are looking for the number where 20 percent of the values fall below it, so q = 0.2.

Step 4: Insert your values into the formula for the position i of the observation, where n is the number of observations:

$$i = q\,(n + 1)$$

$$i = 0.2\,(40 + 1) = 8.2$$

Answer: The observation is at position 8.2, so we round down to 8 (remembering that this formula gives only an estimate). The 8th number in the set is 13, which is the number where roughly 20 percent of the values fall below it.
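Here are the same four steps in R, using the 40 values from the sample question. Rounding the position down is the hand method described above. `quantile()` instead interpolates between the 8th and 9th values. Its `type = 6` option uses the same q(n + 1) position, while R's default (type 7) uses a slightly different position, so the three answers differ.

```r
x <- c(1, 3, 5, 6, 9, 11, 12, 13, 19, 21, 22, 32, 35, 36, 44, 45, 55, 68, 79, 80,
       81, 88, 90, 91, 92, 100, 112, 113, 114, 120, 121, 132, 145, 146, 149, 150,
       155, 180, 189, 190)

x <- sort(x)              # Step 1: order the data (harmless if already sorted)
n <- length(x)            # Step 2: 40 observations
q <- 0.2                  # Step 3: 20 percent as a decimal
i <- q * (n + 1)          # Step 4: position 8.2

estimate <- x[floor(i)]       # round down to the 8th value: 13
below    <- mean(x < estimate) # share strictly below 13: 7 / 40 = 0.175

# Interpolating versions: 13 + 0.2 * (19 - 13) = 14.2 for type 6
q6 <- unname(quantile(x, probs = q, type = 6))
q7 <- unname(quantile(x, probs = q))  # default type 7 gives 17.8

print(c(n = n, i = i, estimate = estimate, below = below))
print(c(type6 = q6, type7 = q7))
```

The share below 13 is 17.5 percent rather than exactly 20 percent, which is why the hand method is described as an estimate.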


The output of the summary() function shows a set of descriptive statistics for every variable, depending on the variable's type:

- Numerical variables: summary() gives you the range, quartiles, median, and mean.
- Factor variables: summary() gives you a table of frequencies.
- Numerical and factor variables: summary() also gives you the number of missing values, if there are any.
- Character variables: summary() gives you only the length, the class (which is 'character'), and the mode.
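The following sketch illustrates each case in the list above using small made-up vectors. `stringsAsFactors = FALSE` is passed explicitly so the character column stays character in every R version.

```r
num_na <- c(1, NA, 3)
fac    <- factor(c("a", "b", "a"))
chr    <- c("a", "b", "a")

s_num <- summary(num_na)  # Min., quartiles, Mean, Max., plus an "NA's" count
s_fac <- summary(fac)     # frequency table: a = 2, b = 1
s_chr <- summary(chr)     # only Length, Class, and Mode

print(s_num)
print(s_fac)
print(s_chr)

# The same rules apply column by column when summary() gets a data frame
df <- data.frame(num_na, fac, chr, stringsAsFactors = FALSE)
print(summary(df))
```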


## Operations on Vectors
Here we look at some commonly used commands that operate on vectors, including sort, min, max, and sum.

In [20]:
a <- c(2, 4, 6, 3, 1, 5)
b <- sort(a)                     # increasing order (the default)
d <- sort(a, decreasing = TRUE)  # decreasing order
a
b
d
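`sort()` returns the values themselves. The related `order()` returns the *positions* that would put the values in order, which is useful when you want to reorder something else, such as the rows of a data frame, by this vector. This is a minimal sketch:

```r
a <- c(2, 4, 6, 3, 1, 5)

idx <- order(a)   # positions of the values in increasing order
print(idx)        # 5 1 4 2 6 3
print(a[idx])     # same as sort(a): 1 2 3 4 5 6

# Reversing an increasing sort matches decreasing = TRUE
print(identical(rev(sort(a)), sort(a, decreasing = TRUE)))  # TRUE
```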

In [21]:
min(a)
max(a)
sum(a)
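These commands combine naturally. For example, `sum()` divided by `length()` reproduces `mean()`, and `range()` returns the minimum and maximum together. The sketch below uses the same vector `a`.

```r
a <- c(2, 4, 6, 3, 1, 5)

total <- sum(a)              # 21
avg   <- sum(a) / length(a)  # 3.5, the same value mean(a) returns
lims  <- range(a)            # c(min(a), max(a)) in one call: 1 6

print(c(total = total, avg = avg, mean = mean(a)))
print(lims)
```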