# STATISTICAL INFERENCE: Second Computer Practice

## Estimations and Confidence Intervals

In this practice session we will see which R commands compute confidence intervals for one and two populations. To learn these commands, we will use the file heights.txt from the first session.

In [9]:
Data <-read.table(file="heights.txt", header=TRUE, dec=",", sep="\t")
str(Data)

'data.frame':	171 obs. of  6 variables:
 $ FATHER    : int  174 177 173 174 160 167 171 174 175 174 ...
 $ MOTHER    : int  156 159 161 156 165 157 162 158 162 161 ...
 $ SEX       : chr  "female" "male" "male" "male" ...
 $ BIRTHPLACE: int  2 2 1 2 1 3 1 3 3 1 ...
 $ HEIGHT    : int  165 170 168 167 162 163 165 168 168 167 ...
 $ WEIGHT    : int  65 67 51 69 54 61 65 76 67 69 ...


### Population mean weight: point estimation and Confidence Interval

In [10]:
t.test(Data$WEIGHT,alternative='two.sided',conf.level=.95)


	One Sample t-test

data:  Data$WEIGHT
t = 52.957, df = 170, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 69.83973 75.24799
sample estimates:
mean of x 
 72.54386 


#### Explanation

The point estimate of the population mean weight is 72.54386, and the 95% confidence interval is (69.83973, 75.24799).
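
The interval reported by t.test can be reproduced by hand from the figures printed in the output above. The following is a minimal sketch that assumes only those printed values (sample mean, t statistic and degrees of freedom); the variable names are illustrative, and tiny discrepancies in the last digits are expected because the printed t statistic is rounded.

```r
# Values copied from the t.test output above
xbar  <- 72.54386   # sample mean of WEIGHT
tstat <- 52.957     # t statistic for H0: mu = 0
df    <- 170        # degrees of freedom (n - 1)

# Under H0: mu = 0 the statistic is t = xbar / se, so se = xbar / t
se <- xbar / tstat

# Two-sided interval: xbar -/+ t_{1 - alpha/2, df} * se
conf_level <- 0.95
tcrit <- qt(1 - (1 - conf_level) / 2, df)
ci <- c(lower = xbar - tcrit * se, upper = xbar + tcrit * se)
print(ci)
```

Changing conf_level in this sketch shows directly how the critical value, and therefore the width of the interval, depends on the confidence level.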

#### Note: Different confidence level

To use a different confidence level, change the value of conf.level. For instance, for a 99% confidence interval, replace conf.level=.95 with conf.level=.99.

#### Note: Another variable

To consider another variable, we must change the first argument of the command. For instance, for the 95% confidence interval of the mean height, we would execute the following command:

t.test(Data$HEIGHT,alternative='two.sided',conf.level=.95)


### Variance of the population height: point estimation and 95% Confidence Interval

In [13]:
# You may need to install the following package before loading it:
# install.packages("EnvStats")
library(EnvStats)
varTest(Data$HEIGHT,alternative  = "two.sided",conf.level=0.95)


Results of Hypothesis Test
--------------------------

Null Hypothesis:                 variance = 1

Alternative Hypothesis:          True variance is not equal to 1

Test Name:                       Chi-Squared Test on Variance

Estimated Parameter(s):          variance = 11.08242

Data:                            Data$HEIGHT

Test Statistic:                  Chi-Squared = 1884.012

Test Statistic Parameter:        df = 170

P-value:                         0

95% Confidence Interval:         LCL =  9.057947
                                 UCL = 13.874455


#### Explanation:

The point estimate of the population variance is 11.08242, and the 95% confidence interval is (9.057947, 13.874455).
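
As a cross-check, the same interval follows from the usual chi-squared formula for the variance of a normal population, ((n-1)s²/χ²(1-α/2), (n-1)s²/χ²(α/2)). The sketch below assumes only the values printed above (n = 171 observations and sample variance 11.08242); the variable names are illustrative.

```r
# Values taken from the outputs above
n  <- 171        # number of observations
s2 <- 11.08242   # sample variance of HEIGHT

conf_level <- 0.95
alpha <- 1 - conf_level

# Chi-squared interval for the variance with n - 1 degrees of freedom
lcl <- (n - 1) * s2 / qchisq(1 - alpha / 2, df = n - 1)
ucl <- (n - 1) * s2 / qchisq(alpha / 2, df = n - 1)
var_ci <- c(LCL = lcl, UCL = ucl)
print(var_ci)
```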

#### Note: Confidence Interval for the standard deviation of the population

If we are interested in the standard deviation instead of the variance, we use the command sqrt to take the square root of the point estimate and of both interval bounds. In our example:

In [14]:
# Point estimate of the standard deviation
sqrt(11.08242)
# Confidence interval lower bound (LCL of the variance from the output above)
sqrt(9.057947)
# Confidence interval upper bound (UCL of the variance from the output above)
sqrt(13.874455)

## Ratio of variances of two independent populations

### Estimation of the ratio of the variances of the heights of women and men

In [15]:
women <-Data$HEIGHT[Data$SEX=="female"]       
men <-Data$HEIGHT[Data$SEX=="male"]
var.test(women,men,conf.level=0.95)


	F test to compare two variances

data:  women and men
F = 0.88431, num df = 81, denom df = 88, p-value = 0.576
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
 0.5765382 1.3616341
sample estimates:
ratio of variances 
         0.8843092 


#### Explanation:

The point estimate of the ratio of the variances of women's and men's heights is 0.8843092, and the 95% confidence interval is (0.5765382, 1.3616341). Therefore, we cannot conclude that the variances are unequal (since 1 is inside the interval).
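
The interval printed by var.test can be reproduced from the F distribution: the estimated ratio is divided by the upper and lower F quantiles. This is a minimal sketch that assumes only the values in the output above (estimated ratio and the two degrees of freedom); the variable names are illustrative.

```r
# Values copied from the var.test output above
ratio <- 0.8843092   # sample variance of women / sample variance of men
df1   <- 81          # numerator degrees of freedom (women)
df2   <- 88          # denominator degrees of freedom (men)

conf_level <- 0.95
alpha <- 1 - conf_level

# Interval for the ratio of variances based on the F(df1, df2) distribution
ratio_ci <- c(lower = ratio / qf(1 - alpha / 2, df1, df2),
              upper = ratio / qf(alpha / 2, df1, df2))
print(ratio_ci)

# TRUE when 1 lies inside the interval (equal variances cannot be ruled out)
contains_one <- unname(ratio_ci["lower"] < 1 && ratio_ci["upper"] > 1)
print(contains_one)
```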

#### Note: Standard Deviation

If we are interested in the ratio of the standard deviations instead of the ratio of the variances, we apply the sqrt function to the point estimate and to both interval bounds, as before.

## Difference of the mean of two independent populations

### 90% Confidence Interval for the difference in mean height between women and men

In [16]:
# In the previous example, we saw that we could not reject the equality of variances of the heights of women and men.
# That is why we set var.equal=TRUE in the command below (if the variances were unequal, we would need to set
#   var.equal=FALSE)
t.test(women,men,conf.level=0.9,var.equal=TRUE)


	Two Sample t-test

data:  women and men
t = -4.0665, df = 169, p-value = 7.304e-05
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -2.790305 -1.176809
sample estimates:
mean of x mean of y 
 165.7805  167.7640 


#### Explanation:

Point estimate of the mean height of women: 165.7805

Point estimate of the mean height of men: 167.7640

90% confidence interval for the difference between the mean heights of women and men: (-2.790305, -1.176809)

#### Conclusion:

Both endpoints of the interval are negative; therefore, with a confidence level of 90%, we conclude that mu1-mu2 (that is, the mean height of women minus the mean height of men) is negative and, as a result, that the mean height of women is smaller than the mean height of men.
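
With var.equal=TRUE, the interval is built from the pooled variance of the two samples. The sketch below illustrates that formula and checks it against t.test. It uses synthetic data generated with rnorm, not the heights.txt sample: the group sizes 82 and 89 match the degrees of freedom shown earlier, but the means and standard deviation passed to rnorm are arbitrary assumptions.

```r
set.seed(1)
# Synthetic samples (NOT the real data); parameters are illustrative assumptions
x <- rnorm(82, mean = 166, sd = 3.3)
y <- rnorm(89, mean = 168, sd = 3.3)
n1 <- length(x)
n2 <- length(y)

# Pooled variance and standard error of the difference of means
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)
se  <- sqrt(sp2 * (1 / n1 + 1 / n2))

# 90% two-sided interval uses the 0.95 quantile with n1 + n2 - 2 df
tcrit <- qt(0.95, df = n1 + n2 - 2)
manual_ci <- (mean(x) - mean(y)) + c(-1, 1) * tcrit * se

# Same interval as computed by t.test with var.equal = TRUE
r_ci <- as.numeric(t.test(x, y, conf.level = 0.90, var.equal = TRUE)$conf.int)
print(rbind(manual = manual_ci, t.test = r_ci))
```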

#### Note:

If instead of t.test(women,men,...) we wrote t.test(men,women,...), the point estimates would be the same (listed in the opposite order), but both endpoints of the confidence interval would be positive, leading to the same conclusion. Do you understand why?
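
The effect of swapping the arguments can be checked on a small example. The sketch below uses synthetic data (generated with rnorm, with arbitrary illustrative parameters) rather than the heights sample, and compares the two intervals.

```r
set.seed(2)
# Synthetic samples (NOT the real data); parameters are illustrative assumptions
a <- rnorm(30, mean = 166, sd = 3)
b <- rnorm(30, mean = 168, sd = 3)

# Interval for mean(a) - mean(b)
ci_ab <- as.numeric(t.test(a, b, conf.level = 0.90, var.equal = TRUE)$conf.int)
# Interval for mean(b) - mean(a)
ci_ba <- as.numeric(t.test(b, a, conf.level = 0.90, var.equal = TRUE)$conf.int)

print(ci_ab)
print(ci_ba)

# The second interval is the first one with signs flipped and endpoints reversed
same_up_to_sign <- isTRUE(all.equal(ci_ab, -rev(ci_ba)))
print(same_up_to_sign)
```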

## Paired Data

### 98% Confidence interval for the difference between the heights of father and mother

In [17]:
t.test(Data$FATHER,Data$MOTHER, conf.level=0.98)


	Welch Two Sample t-test

data:  Data$FATHER and Data$MOTHER
t = 23.914, df = 316.34, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
98 percent confidence interval:
 10.19882 12.40937
sample estimates:
mean of x mean of y 
 172.1871  160.8830 


#### Explanation:

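Point estimate of the mean height of the fathers: 172.1871

Point estimate of the mean height of the mothers: 160.8830

The 98% confidence interval for the difference between the mean height of the fathers and the mean height of the mothers is (10.19882, 12.40937). Note that the call above does not set paired=TRUE, so R ran a Welch two-sample test, which treats the two samples as independent. For paired data, the command t.test(Data$FATHER,Data$MOTHER,paired=TRUE,conf.level=0.98) gives the interval for the mean of the differences.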

#### Conclusion:

Both endpoints of the confidence interval are positive; therefore, with a confidence level of 98%, we conclude that muD (that is, the mean height of the fathers minus the mean height of the mothers) is positive and, as a result, that the mean height of the fathers is greater than the mean height of the mothers.
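
For paired data, t.test accepts the argument paired=TRUE, which is equivalent to a one-sample t-test on the within-pair differences. The sketch below illustrates this on synthetic data (not the heights sample): the vectors father and mother, the shared family component and all numeric parameters are hypothetical and were chosen only for illustration.

```r
set.seed(3)
# Synthetic paired sample (NOT the real data); all parameters are assumptions
family <- rnorm(50, mean = 0, sd = 4)              # shared component per couple
father <- 172 + family + rnorm(50, mean = 0, sd = 4)
mother <- 161 + family + rnorm(50, mean = 0, sd = 4)

# Paired interval for the mean of the differences father - mother
paired_ci <- as.numeric(
  t.test(father, mother, paired = TRUE, conf.level = 0.98)$conf.int)

# Equivalent one-sample interval computed on the differences
diff_ci <- as.numeric(t.test(father - mother, conf.level = 0.98)$conf.int)

# Welch interval that treats the samples as independent, for comparison
welch_ci <- as.numeric(t.test(father, mother, conf.level = 0.98)$conf.int)

print(rbind(paired = paired_ci, differences = diff_ci, welch = welch_ci))
```

The first two rows coincide, while the Welch interval generally differs because it ignores the pairing.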



## The parameter p of a population Bin(1,p)

### The 95% confidence interval for the proportion of individuals from Alava

In [18]:
# we count how many are from Alava 
Data.alava <- subset(Data,BIRTHPLACE==1)
amount.alava <- length(Data.alava$HEIGHT)
# we now count how many are in total
amount.total <- length(Data$HEIGHT)

prop.test(amount.alava,amount.total,conf.level=0.95)


	1-sample proportions test with continuity correction

data:  amount.alava out of amount.total, null probability 0.5
X-squared = 49.497, df = 1, p-value = 1.987e-12
alternative hypothesis: true p is not equal to 0.5
95 percent confidence interval:
 0.1689802 0.2996563
sample estimates:
        p 
0.2280702 


#### Explanation:

The point estimate of the proportion of the population from Alava: 0.2280702

The 95% confidence interval: (0.1689802, 0.2996563)
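
The result can be reproduced without the data file once the counts are known. In the sketch below, the count 39 is not printed anywhere above; it is inferred from the estimate 0.2280702 and the 171 observations (39/171 = 0.2280702). The interval reported by prop.test is a score (Wilson) interval with continuity correction, so it differs slightly from the simpler Wald interval p ± z·sqrt(p(1-p)/n), which is computed here only for comparison.

```r
# Counts: n from str(Data); x inferred from the printed estimate (assumption)
x <- 39    # individuals from Alava
n <- 171   # total number of individuals

p_hat <- x / n
print(p_hat)

# Interval as computed by prop.test (score interval, continuity correction)
ci_score <- as.numeric(prop.test(x, n, conf.level = 0.95)$conf.int)
print(ci_score)

# Simple Wald interval, shown only for comparison
z <- qnorm(0.975)
ci_wald <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
print(ci_wald)
```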


## Two populations Bin(1,p1) and Bin(1,p2)

### Difference of the proportion of being from Alava between men and women

In [19]:
# we count how many women and men are from Alava
Data.alava <- subset(Data,BIRTHPLACE==1)
women.alava <- subset(Data.alava,SEX=="female")
men.alava <- subset(Data.alava,SEX=="male")
n_women_alava <- length(women.alava$HEIGHT)
n_men_alava <- length(men.alava$HEIGHT)
# we count the number of women and men
Data.women <- subset(Data,SEX=="female")
n_women <- length(Data.women$HEIGHT)
Data.men <- subset(Data,SEX=="male")
n_men <- length(Data.men$HEIGHT)
# we perform the test
prop.test(c(n_women_alava,n_men_alava),c(n_women,n_men),conf.level=0.95)



	2-sample test for equality of proportions with continuity correction

data:  c(n_women_alava, n_men_alava) out of c(n_women, n_men)
X-squared = 0.43037, df = 1, p-value = 0.5118
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.08391627  0.19161701
sample estimates:
   prop 1    prop 2 
0.2560976 0.2022472 


#### Explanation:

The point estimate of the proportion of women who are from Alava: 0.2560976

The point estimate of the proportion of men who are from Alava: 0.2022472

The 95% confidence interval for the difference between the proportion of women from Alava and the proportion of men from Alava: (-0.08391627, 0.19161701)

#### Conclusion:

As zero is inside the interval, we cannot conclude, with a confidence level of 95%, that there is a significant difference between the proportion of women from Alava and the proportion of men from Alava.
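
The two-sample interval can also be reproduced by hand. In the sketch below, the counts are not printed above: the group sizes 82 and 89 are inferred from the degrees of freedom of the F test, and the counts 21 and 18 from the estimated proportions (21/82 = 0.2560976, 18/89 = 0.2022472). The formula is the normal-approximation interval for a difference of proportions plus a continuity correction of 0.5·(1/n1 + 1/n2); prop.test caps that correction when the observed difference is very small, which does not happen here.

```r
# Counts inferred from the outputs above (assumptions, see the text)
x <- c(women = 21, men = 18)   # individuals from Alava in each group
n <- c(women = 82, men = 89)   # group sizes

p <- x / n
delta <- unname(p["women"] - p["men"])

# Standard error of the difference and continuity correction
se <- sqrt(sum(p * (1 - p) / n))
cc <- 0.5 * sum(1 / n)
z  <- qnorm(0.975)

manual_ci <- delta + c(-1, 1) * (z * se + cc)

# Same interval as computed by prop.test
r_ci <- as.numeric(prop.test(x, n, conf.level = 0.95)$conf.int)
print(rbind(manual = manual_ci, prop.test = r_ci))
```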
