::: {.callout-note collapse="true"}
### Update history

2022-11-17 First draft.
:::

# Introduction

This article extends Rohit Farmer (2022), "Parametric Hypothesis Tests with Examples in R," November 10, 2022, <https://www.dataalltheway.com/posts/010-parametric-hypothesis-tests-r>, with example code in Julia. Please see the parent article for the theoretical background. The following tests are covered:

-   Z-test (@sec-z-test)
-   T-test (@sec-t-test)
-   F-test (@sec-f-test)

## Import packages


```julia
import Pkg
Pkg.activate(".")
using CSV
using DataFrames
using Statistics
using HypothesisTests
```

```
  Activating project at `~/sandbox/dataalltheway/posts/010-01-parametric-hypothesis-tests-julia`
```


# Getting the data

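The raw CSV codes missing BMI values as the string `"NA"`, so the `BMI` column is read as text. The cleaning step below converts `"NA"` to `missing` and parses the remaining values as `Float64`, which gives a column of type `Float64?` (that is, `Union{Missing, Float64}`).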


```julia
begin
    url = "https://raw.githubusercontent.com/opencasestudies/ocs-bp-rural-and-urban-obesity/master/data/wrangled/BMI_long.csv"
    data = CSV.read(download(url), DataFrame)  # download the file and load it into a DataFrame
    allowmissing!(data, :BMI)                  # allow the BMI column to hold missing values
    replace!(data.BMI, "NA" => missing)        # convert "NA" strings to missing
    # Parse the strings to Float64, passing missing through unchanged (column type becomes Float64?)
    data[!, :BMI] .= passmissing(parse).(Float64, data[!, :BMI])
end;
```
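The cleaning chain above uses `passmissing` from Missings.jl together with DataFrames.jl functions. As a minimal sketch of the same idea in base Julia, a small helper can map `"NA"` to `missing` and parse everything else. `parse_bmi` is a hypothetical name used only for illustration, and the three strings are toy values rather than rows from the dataset.

```julia
# Hypothetical helper: map the "NA" placeholder to `missing`, parse everything else as Float64
parse_bmi(s::AbstractString) = s == "NA" ? missing : parse(Float64, s)

raw_bmi = ["20.2", "NA", "22.4"]      # toy strings, not rows from BMI_long.csv
bmi = parse_bmi.(raw_bmi)             # element type widens to Union{Missing, Float64}
clean = collect(skipmissing(bmi))     # drop missing values before running a test
println(clean)                        # [20.2, 22.4]
```

Broadcasting the helper with `.` lets Julia widen the element type automatically once it meets the first `missing`.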

```julia
first(data, 10)
```

| Row | Country | Sex | Region | Year | BMI |
|----:|:--------|:----|:-------|-----:|----:|
| | *String* | *String7* | *String15* | *Int64* | *Float64?* |
| 1 | Afghanistan | Men | National | 1985 | 20.2 |
| 2 | Afghanistan | Men | Rural | 1985 | 19.7 |
| 3 | Afghanistan | Men | Urban | 1985 | 22.4 |
| 4 | Afghanistan | Men | National | 2017 | 22.8 |
| 5 | Afghanistan | Men | Rural | 2017 | 22.5 |
| 6 | Afghanistan | Men | Urban | 2017 | 23.6 |
| 7 | Afghanistan | Women | National | 1985 | 20.6 |
| 8 | Afghanistan | Women | Rural | 1985 | 20.1 |
| 9 | Afghanistan | Women | Urban | 1985 | 23.2 |
| 10 | Afghanistan | Women | National | 2017 | 24.4 |


# Z-test {#sec-z-test}

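Each code cell below draws a fresh random sample of 300 BMI values per group with `rand(x, 300)`. Because `rand` samples with replacement and no random seed is set, the estimates and p-values change from run to run; call `Random.seed!` (from the `Random` standard library) before sampling if reproducible numbers are needed.

## Two sample unpaired z-test (unequal variance)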


```julia
uneqvarztest = let
    # Fetch a random sample of BMI data for women in the year 1985 and 2017
    x1 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==1985 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==2017 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    UnequalVarianceZTest(x1, x2)
end
```

```
Two sample z-test (unequal variance)
------------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -2.45267
    95% confidence interval: (-2.89, -2.015)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-27

Details:
    number of observations:   [300,300]
    z-statistic:              -10.98012310638998
    population standard error: 0.2233733304173345
```
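To see what `UnequalVarianceZTest` reports, the statistic can be computed by hand. This is a sketch of the textbook unpooled formula $z = (\bar{x}_1 - \bar{x}_2) / \sqrt{s_1^2/n_1 + s_2^2/n_2}$ on synthetic normal data; the means, standard deviations and seed are arbitrary choices, not values from the BMI data, and `unequal_var_z` is a hypothetical helper. The interval uses 1.96 as a rounded 97.5% standard-normal quantile, so it may differ very slightly from the interval the library prints.

```julia
using Statistics, Random

# Unpooled two-sample z statistic:
#   z = (mean(x1) - mean(x2)) / sqrt(var(x1)/n1 + var(x2)/n2)
function unequal_var_z(x1, x2)
    se = sqrt(var(x1) / length(x1) + var(x2) / length(x2))  # standard error of the mean difference
    est = mean(x1) - mean(x2)                                # point estimate
    ci = (est - 1.96 * se, est + 1.96 * se)                  # approximate 95% confidence interval
    return (z = est / se, se = se, estimate = est, ci = ci)
end

Random.seed!(42)
a = 20 .+ 2 .* randn(300)   # synthetic group 1: mean 20, sd 2
b = 22 .+ 3 .* randn(300)   # synthetic group 2: mean 22, sd 3
res_z = unequal_var_z(a, b)
reject = abs(res_z.z) > 1.96  # two-sided decision at the 5% level
```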


## Two sample unpaired z-test (equal variance)


```julia
eqvarztest = let
    # Fetch a random sample of BMI data for women in the year 1985 and 2017
    x1 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==1985 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Year] => (s, y) -> s=="Women" && y==2017 , data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    EqualVarianceZTest(x1, x2)
end
```

```
Two sample z-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -2.66433
    95% confidence interval: (-3.077, -2.251)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-35

Details:
    number of observations:   [300,300]
    z-statistic:              -12.646344239797454
    population standard error: 0.21068012089602933
```
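`EqualVarianceZTest` pools the two sample variances instead. Below is a minimal sketch of the usual pooled standard error, assuming the library uses this standard estimate; `pooled_se` is a hypothetical helper and the vectors are toy values. One consequence is worth noting: when both groups have the same size, as with the 300 and 300 observations here, the pooled and unpooled standard errors are algebraically identical, so the different z-statistics in the two outputs above come from the two independent random draws rather than from the variance assumption.

```julia
using Statistics

# Pooled (equal-variance) standard error for two independent samples:
#   sp^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
#   SE   = sqrt(sp^2 * (1/n1 + 1/n2))
function pooled_se(x1, x2)
    n1, n2 = length(x1), length(x2)
    sp2 = ((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2)
    return sqrt(sp2 * (1 / n1 + 1 / n2))
end

g1 = [19.8, 20.4, 21.0, 20.1, 19.5]   # toy values
g2 = [22.9, 23.5, 22.1, 24.0, 23.2]
z_pooled = (mean(g1) - mean(g2)) / pooled_se(g1, g2)

# With equal group sizes the pooled SE equals the unpooled one
pooled_se(g1, g2) ≈ sqrt(var(g1) / 5 + var(g2) / 5)   # true
```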


# T-test {#sec-t-test}

## One sample t-test


```julia
onesamplettest = let
    x1 = filter(
        [:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Men" && r=="Rural" && y == 2017,
        data
    ) |>
    x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    OneSampleTTest(x1, 24.5)
end
```

```
One sample t-test
-----------------
Population details:
    parameter of interest:   Mean
    value under h_0:         24.5
    point estimate:          25.142
    95% confidence interval: (24.84, 25.44)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-04

Details:
    number of observations:   300
    t-statistic:              4.19927592137962
    degrees of freedom:       299
    empirical standard error: 0.15288349992231
```
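The one-sample statistic above follows the standard formula $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom, where $s/\sqrt{n}$ is what the output labels the empirical standard error (here $(25.142 - 24.5)/0.15288 \approx 4.20$). A minimal sketch, with `one_sample_t` as a hypothetical helper and toy data:

```julia
using Statistics

# One-sample t statistic for H0: population mean == μ0
function one_sample_t(x, μ0)
    n = length(x)
    se = std(x) / sqrt(n)   # empirical standard error
    return (t = (mean(x) - μ0) / se, df = n - 1, se = se)
end

bmi_toy = [24.1, 25.3, 24.9, 25.8, 25.0, 24.7]   # toy BMI values
res_t = one_sample_t(bmi_toy, 24.5)
println(res_t.df)   # 5
```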


## Two sample unpaired (independent) t-test


```julia
unpairedtwosamplettest = let
    x1 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    UnequalVarianceTTest(x1, x2)
end
```

```
Two sample t-test (unequal variance)
------------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -1.04333
    95% confidence interval: (-1.491, -0.5958)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-05

Details:
    number of observations:   [300,300]
    t-statistic:              -4.579077347889367
    degrees of freedom:       584.0098943512115
    empirical standard error: 0.22784793836562495
```


::: callout-warning
## Welch's Test

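`UnequalVarianceTTest` always applies the Welch correction: it does not pool the variances and uses Welch-Satterthwaite degrees of freedom (hence the non-integer 584.01 above). There is no option in `HypothesisTests.jl` to turn the correction off for this test; for Student's pooled-variance t-test, use `EqualVarianceTTest` instead.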
:::
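The non-integer degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation, $\nu = (s_1^2/n_1 + s_2^2/n_2)^2 / \left[(s_1^2/n_1)^2/(n_1-1) + (s_2^2/n_2)^2/(n_2-1)\right]$. A minimal sketch follows, with `welch_df` as a hypothetical helper and toy data; when the two sample variances and sizes are equal, it reduces to the Student value $n_1 + n_2 - 2$.

```julia
using Statistics

# Welch-Satterthwaite approximation to the degrees of freedom
function welch_df(x1, x2)
    v1 = var(x1) / length(x1)   # squared standard error of mean(x1)
    v2 = var(x2) / length(x2)   # squared standard error of mean(x2)
    return (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))
end

df_equal = welch_df([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])          # ≈ 4.0, i.e. n1 + n2 - 2
df_unequal = welch_df([19.8, 20.4, 21.0], [20.0, 25.5, 30.1])  # smaller, since the variances differ
```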

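### Right-tailed (one-tailed) test

To test in one direction only, pass `tail=:right` (or `tail=:left`) to `pvalue`. With `tail=:right`, the alternative hypothesis is that the mean of `x1` (rural) is greater than the mean of `x2` (urban).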


```julia
unpairedtwosamplettest = let
    x1 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    UnequalVarianceTTest(x1, x2)
end
pvalue(unpairedtwosamplettest, tail=:right)
```

```
0.9999995678106779
```

The right-tailed p-value is close to 1 because the observed mean difference is negative (rural BMI is lower than urban BMI in this sample), which is the opposite of the direction being tested.

## Two sample unpaired t-test with equal variances (Student's t-test)


```julia
pairedtwosamplettest = let
    x1 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    EqualVarianceTTest(x1, x2)
end
```

```
Two sample t-test (equal variance)
----------------------------------
Population details:
    parameter of interest:   Mean difference
    value under h_0:         0
    point estimate:          -1.10067
    95% confidence interval: (-1.515, -0.6868)

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           <1e-06

Details:
    number of observations:   [300,300]
    t-statistic:              -5.223181810509686
    degrees of freedom:       598
    empirical standard error: 0.21072723611726182
```
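A paired (dependent) t-test works on within-pair differences rather than on two independent samples. In `HypothesisTests.jl` this is `OneSampleTTest(x, y)`, which tests whether the mean of `x .- y` is zero (`OneSampleZTest(x, y)` is the z-test analogue). The samples in this article are drawn independently with `rand`, so they are not paired; a genuine paired analysis would first have to match observations, for example the 1985 and 2017 values of the same country. The sketch below computes the paired statistic by hand on toy values for five hypothetical countries; `paired_t` is an illustrative helper, not a library function.

```julia
using Statistics

# Paired t-test: a one-sample t-test on the differences d = x .- y, with H0: mean(d) == 0
function paired_t(x, y)
    length(x) == length(y) || throw(ArgumentError("paired samples must have the same length"))
    d = x .- y
    n = length(d)
    return (t = mean(d) / (std(d) / sqrt(n)), df = n - 1, mean_diff = mean(d))
end

bmi_1985 = [20.6, 23.1, 24.8, 21.9, 25.4]   # toy values, one per hypothetical country
bmi_2017 = [24.4, 25.0, 27.1, 23.6, 27.9]   # same countries, same order
res_paired = paired_t(bmi_2017, bmi_1985)
```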


# F-test {#sec-f-test}


```julia
Ftest = let
    x1 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Rural" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    x2 = filter([:Sex, :Region, :Year] =>
            (s, r, y) -> s=="Women" && r=="Urban" && y == 1985,
        data) |>
        x -> x[!, :BMI] |> skipmissing |> collect |> x->rand(x, 300)
    VarianceFTest(x1, x2)
end
```

```
Variance F-test
---------------
Population details:
    parameter of interest:   variance ratio
    value under h_0:         1.0
    point estimate:          1.54765

Test summary:
    outcome with 95% confidence: reject h_0
    two-sided p-value:           0.0002

Details:
    number of observations: [300, 300]
    F statistic:            1.5476495641893069
    degrees of freedom:     [299, 299]
```
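The variance F-test statistic is the ratio of the two sample variances, $F = s_1^2/s_2^2$, referred to an F distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom. This is why the output above reports the same number (1.54765) as both the point estimate and the F statistic, with degrees of freedom [299, 299]. A minimal sketch with a hypothetical helper and toy data:

```julia
using Statistics

# Variance-ratio statistic for H0: var(population 1) == var(population 2)
variance_f(x1, x2) = (F = var(x1) / var(x2), df = (length(x1) - 1, length(x2) - 1))

spread_wide   = [19.0, 21.5, 20.2, 23.1, 18.7, 22.4]   # toy values with a large spread
spread_narrow = [22.9, 23.5, 22.1, 24.0, 23.2, 23.8]   # toy values with a small spread
res_f = variance_f(spread_wide, spread_narrow)
res_f.df   # (5, 5)
```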
