# Descriptive Statistics in R

#### Central Tendency:
Measures of central tendency describe how a set of data clusters around a central value of its distribution. The common measures are:
* Arithmetic Mean
* Geometric Mean
* Harmonic Mean
* Median
* Mode

### Arithmetic Mean
The arithmetic mean of a given set of numbers is calculated with the `mean()` function in R.

In [1]:
#Example
x=c(3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23)
mean(x)
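As a quick check on what `mean()` computes, the sketch below compares it with the sum of the values divided by their count. The missing-value case and the names `manual_mean` and `x_missing` are illustrative additions, not part of the original example.

```r
# Same data as the example above
x <- c(3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23)

# Arithmetic mean by definition: sum of the values divided by their count
manual_mean <- sum(x) / length(x)
manual_mean                               # 21.5
isTRUE(all.equal(manual_mean, mean(x)))   # TRUE

# With a missing value, mean() returns NA unless na.rm = TRUE is set
x_missing <- c(x, NA)
mean(x_missing)                 # NA
mean(x_missing, na.rm = TRUE)   # 21.5
```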

### Geometric Mean
The geometric mean of n positive numbers is the nth root of their product. Base R has no dedicated function for it, so we combine the `prod()` and `length()` functions.

In [2]:
#Example
x=c(3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23)
prod(x)^(1/length(x))
#prod() takes product of all the terms
#length() finds length of the vector
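The product form can overflow when there are many or very large values. A commonly used alternative, sketched here under the assumption that all values are strictly positive, takes the exponential of the mean of the logarithms. The vector `big` is made up purely to show the overflow.

```r
x <- c(3, 7, 5, 13, 20, 23, 39, 23, 40, 23, 14, 12, 56, 23)

# Product form, as in the example above
gm_product <- prod(x)^(1 / length(x))

# Log form: exp of the arithmetic mean of the logs (requires x > 0)
gm_logs <- exp(mean(log(x)))

isTRUE(all.equal(gm_product, gm_logs))   # TRUE

# Illustrative data: the product overflows to Inf, the log form does not
big <- rep(1e200, 10)
prod(big)^(1 / length(big))   # Inf
exp(mean(log(big)))           # approximately 1e200
```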

### Harmonic Mean
The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the given values, so we can compute it with the `mean()` function as follows:

In [3]:
#Example
x=c(1, 2, 3, 4, 5)
y=1/x #element-wise reciprocals of the vector
1/mean(y) #harmonic mean

In [4]:
#another way
1/mean(1/x)
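The same calculation can be wrapped in a small function. `harmonic_mean()` below is a hypothetical helper, not a base R function, and the speed figures are invented for illustration: for two legs of equal distance, the average speed over the whole trip is the harmonic mean of the two speeds, not the arithmetic mean.

```r
# Hypothetical helper (not part of base R); assumes strictly positive input
harmonic_mean <- function(v) {
  stopifnot(is.numeric(v), all(v > 0))
  length(v) / sum(1 / v)   # algebraically the same as 1 / mean(1 / v)
}

x <- c(1, 2, 3, 4, 5)
harmonic_mean(x)           # same value as 1/mean(1/x)

# Two legs of equal distance driven at 40 and 60 km/h (made-up figures)
speeds <- c(40, 60)
harmonic_mean(speeds)      # 48
mean(speeds)               # 50
```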

### Median
The median of a given set of numbers is calculated with the `median()` function in R.

In [5]:
#odd number of values: the median is the middle value
x=c(1, 2, 3, 4, 5)
median(x)

In [6]:
#even number of values: the median is the mean of the two middle values
y=c(1, 2, 3, 4, 5, 6)
median(y)
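To make the odd and even cases explicit, here is a minimal sketch that computes the median by hand from the sorted values. `manual_median()` is a hypothetical helper written for this illustration; in practice `median()` is the function to use.

```r
# Hypothetical helper that mirrors the definition of the median
manual_median <- function(v) {
  s <- sort(v)        # the median is defined on the sorted values
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                   # odd count: the middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2    # even count: mean of the two middle values
  }
}

manual_median(c(1, 2, 3, 4, 5))      # 3
manual_median(c(1, 2, 3, 4, 5, 6))   # 3.5
manual_median(c(5, 1, 4, 2, 3))      # 3, input order does not matter
```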

### Frequency Table
We use the `table()` function to create a frequency table for a given set of observations.

In [7]:
#Example
x=c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)
table(x)
#gives numbers and their frequencies

x
1 2 3 
2 3 5 

In [8]:
y=c("a", "a", "b", "b", "b", "c")
table(y)
#gives the number of times each letter occurs

y
a b c 
2 3 1 
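A frequency table can be processed further like any other R object. The sketch below, using the numeric data from the first example, turns counts into relative frequencies with `prop.table()`, orders them with `sort()`, and looks up a single count by name. The variable name `counts` is an illustrative choice.

```r
x <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)
counts <- table(x)

# Relative frequencies: each count divided by the total number of observations
prop.table(counts)                # 0.2 0.3 0.5

# Most frequent values first
sort(counts, decreasing = TRUE)

# Look up the count of one value; table names are character strings
counts[["3"]]                     # 5
```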

#### Use of `names()`
The `names()` function gets or sets the names of an object such as a vector, list or data frame (for a data frame, the names are the column names). To set names, assign a character vector to `names(x)`. The assigned vector should have the same length as the object: a shorter vector is padded with `NA`, and a longer one raises an error.

In [9]:
#for example
x=c(1,2,3)
names(x)=c('x','y','z') #assigns names to the numbers from the vector
x
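Once names are set they can be read back and used for indexing. The following sketch also shows the padding behaviour when too few names are supplied. The vector `short` is a made-up example.

```r
x <- c(1, 2, 3)
names(x) <- c("x", "y", "z")

names(x)      # "x" "y" "z"
x["y"]        # named element: y 2
x[["y"]]      # bare value: 2

# Supplying fewer names than elements pads the rest with NA
short <- c(1, 2, 3)
names(short) <- c("a", "b")
names(short)  # "a" "b" NA

# Assigning NULL removes the names again
names(x) <- NULL
```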

### Mode
The mode of a set of values is the value that occurs most often. A set can have more than one mode. Base R has no function for the statistical mode (the built-in `mode()` returns an object's storage mode), so we build a frequency table with `table()` and then use `names()` and `max()` to pick out the most frequent value or values.

In [10]:
#Example
#Single Mode
x=c(1,1,2,2,2,3,3,3,3)
y=table(x) #Creating the frequency table
m=names(y)[which(y==max(y))]
#from the frequency table, take the name(s) of the entries whose count equals the maximum count
m

In [11]:
#Double Mode
x=c(1,1,2,2,2,3,3,3)
y=table(x)
m=names(y)[which(y == max(y))]
m
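Note that `names()` returns character strings, so `m` above is `"2" "3"` rather than the numbers 2 and 3. One way to package the steps and convert back is sketched below. `stat_mode()` is a hypothetical helper name, and the conversion with `as.numeric()` is an assumption that suits simple numeric data like these examples.

```r
# Hypothetical helper; returns every value that attains the highest frequency
stat_mode <- function(v) {
  counts <- table(v)
  modes <- names(counts)[counts == max(counts)]
  # table names are character, so convert back when the input was numeric
  if (is.numeric(v)) as.numeric(modes) else modes
}

stat_mode(c(1, 1, 2, 2, 2, 3, 3, 3, 3))        # 3
stat_mode(c(1, 1, 2, 2, 2, 3, 3, 3))           # 2 3
stat_mode(c("a", "a", "b", "b", "b", "c"))     # "b"
```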

#### Use of Covariance and Correlation
Covariance and correlation both measure the relationship between two random variables, that is, bivariate data. Covariance indicates the direction of the linear relationship, while correlation also indicates its strength on a fixed scale from -1 to 1.
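The two measures are directly related: the Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below checks this on made-up data (the vectors `x` and `y` are illustrative) and shows that rescaling a variable changes the covariance but not the correlation.

```r
# Illustrative data, not taken from the examples in this notebook
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Pearson correlation = covariance / (sd of x * sd of y)
r_manual <- cov(x, y) / (sd(x) * sd(y))
isTRUE(all.equal(r_manual, cor(x, y)))   # TRUE

# Multiplying x by 100 scales the covariance by 100 ...
cov(100 * x, y) / cov(x, y)              # 100
# ... but leaves the correlation unchanged
isTRUE(all.equal(cor(100 * x, y), cor(x, y)))   # TRUE
```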

### Covariance in R
We use the `cov()` function to measure the covariance between two vectors. Syntax:
`cov(x, y, method = "pearson")` where,
* `x` and `y` are the data vectors
* `method` is the method used to compute the covariance: "pearson" (the default), "kendall" or "spearman". Pass it by name, because the third positional argument of `cov()` is `use`, not `method`.

In [12]:
#For Example:
x=c(1,2,3,4,5)
y=c(10,20,30,40,50)
cov(x,y)
#By default the method is 'pearson'

In [13]:
cov(x, y, method = "pearson")

In [14]:
cov(x, y, method = "kendall")

In [15]:
cov(x, y, method = "spearman")
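For the default Pearson method, `cov()` returns the sample covariance, which divides by n - 1 rather than n. A minimal sketch of that formula on the same data follows. The names `n` and `manual_cov` are illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(10, 20, 30, 40, 50)
n <- length(x)

# Sample covariance: sum of the products of deviations from the means, over n - 1
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
manual_cov    # 25
cov(x, y)     # 25

# The covariance of a variable with itself is its variance
isTRUE(all.equal(cov(x, x), var(x)))   # TRUE
```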

### Correlation in R
We use the `cor()` function to measure the correlation between two vectors. Syntax:
`cor(x, y, method = "pearson")` where,
* `x` and `y` are the data vectors
* `method` is the method used to compute the correlation: "pearson" (the default), "kendall" or "spearman". Pass it by name, because the third positional argument of `cor()` is `use`, not `method`.

In [16]:
#For Example:
x=c(1,2,3)
y=c(1,4,6)
cor(x, y)

In [17]:
cor(x, y, method = "pearson")

In [18]:
cor(x, y, method = "kendall")

In [19]:
cor(x, y, method = "spearman")
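The three methods can disagree when the relationship is monotonic but not linear. The sketch below uses made-up data (`y` is the cube of `x`) to illustrate this: Pearson stays below 1, while the rank-based Spearman and Kendall coefficients reach 1. It also shows that the Spearman coefficient is the Pearson coefficient computed on the ranks.

```r
# Illustrative data: y increases with x, but not along a straight line
x <- 1:6
y <- x^3

cor(x, y)                        # below 1: the relationship is not linear
cor(x, y, method = "spearman")   # 1: the ranks agree perfectly
cor(x, y, method = "kendall")    # 1: every pair of points is concordant

# Spearman is Pearson applied to the ranks of the data
isTRUE(all.equal(cor(rank(x), rank(y)), cor(x, y, method = "spearman")))   # TRUE
```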

The End