<a href="https://clarusway.com/contact-us/"><img align="center" src="https://i.ibb.co/B43qn24/officially-licensed-logo.png" alt="Open in Clarusway LMS" width="110" height="200" title="This notebook is licensed by Clarusway IT training school. Please contact the authorized persons about the conditions under which you can use or share."></a>

In [1]:
# from google.colab import drive
# drive.mount('/content/drive')

In [1]:
# Import the libraries used in this notebook: pandas, NumPy and scipy.stats

import pandas as pd
import numpy as np
from scipy import stats

In [2]:
mu = 80
alpha = 0.05
n = 9
sample = [95,70,120,65,130,38,110,90,60]
sample_mean = np.mean(sample)
sample_mean

86.44444444444444

In [3]:
stats.ttest_1samp(sample, mu)

Ttest_1sampResult(statistic=0.6339555069870142, pvalue=0.5438063764856331)

In [4]:
mu = 60
alpha = 0.05
n = 9
sample = [35, 60, 70, 95, 30, 110, 80, 95, 130]
sample_mean = np.mean(sample)
sample_mean

78.33333333333333

In [5]:
stats.ttest_1samp(sample, mu)
# p-value (≈0.137) > alpha (0.05), so we fail to reject the null hypothesis

Ttest_1sampResult(statistic=1.653621261606223, pvalue=0.13680337348338098)

In [22]:
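# H0: mu = 68
# H1: mu != 68

mu = 68
alpha = 0.01
sample_mean = 72
sample_std = 10
n = 24

# Two-sided critical value from the t distribution's percent point function (inverse CDF)
t_critical = stats.t.ppf(1 - alpha/2, n - 1)
print(t_critical)

# Test statistic computed from the summary statistics
t_test = (sample_mean - mu) / (sample_std / np.sqrt(n))
print(t_test)
# |t_test| < t_critical, so we fail to reject H0 at the 0.01 level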


2.8073356837675227
1.9595917942265424
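The cell above stops at comparing the statistic with the critical value. As a sketch, the two-sided p-value for the same summary statistics can be obtained from the t distribution's survival function `stats.t.sf`. The helper name `ttest_1samp_from_stats` is hypothetical (it is not a SciPy function), and `std` is assumed to be the sample standard deviation.

```python
import numpy as np
from scipy import stats


def ttest_1samp_from_stats(mean, std, n, mu0):
    """Hypothetical helper: two-sided one-sample t-test from summary statistics.

    Assumes `std` is the sample standard deviation (ddof=1).
    """
    se = std / np.sqrt(n)               # standard error of the mean
    t = (mean - mu0) / se               # test statistic
    p = 2 * stats.t.sf(abs(t), n - 1)   # two-sided p-value with n - 1 degrees of freedom
    return t, p


t_stat, p_value = ttest_1samp_from_stats(mean=72, std=10, n=24, mu0=68)
alpha = 0.01
print(t_stat, p_value)
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The p-value is larger than 0.01, which agrees with the critical-value comparison (|t| < t_critical).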


In [21]:
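# Simulated sample from N(72, 10): its sample mean and std will not be exactly 72 and 10,
# so the result differs from the hand calculation above and changes on every run
ref = np.random.normal(72, 10, 24)

stats.ttest_1samp(ref, 68)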

Ttest_1sampResult(statistic=0.02763336327109021, pvalue=0.9781929445105383)
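Because the simulated `ref` sample only approximates a mean of 72 and a standard deviation of 10, its t-statistic does not match the hand calculation. A minimal sketch, assuming you want a sample whose summary statistics are exactly 72 and 10: standardize random draws and rescale them. The seed value is arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)   # arbitrary seed for reproducibility
raw = rng.normal(size=24)

# Standardize to mean 0 and sample std 1 (ddof=1), then rescale to mean 72 and std 10
z = (raw - raw.mean()) / raw.std(ddof=1)
ref_exact = 72 + 10 * z

print(ref_exact.mean(), ref_exact.std(ddof=1))  # 72 and 10 up to floating-point error
res = stats.ttest_1samp(ref_exact, 68)
print(res.statistic)  # equals the hand-computed t_test up to floating-point error
```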

# One Sample T-Test

- According to the Reynolds Intellectual Ability Scales, the ``average VIQ`` (Verbal IQ score based on the four Wechsler (1981) subtests) is about ``109``.

- Our sample data contain ``40 cases``.
- Let's test whether the ``average VIQ`` in the population is significantly ``greater than 109``.

In [3]:
# Brain size and weight and IQ data (Willerman et al. 1991)

df = pd.read_csv("brain_size.csv", sep=";", na_values = ".", index_col=0)

df.head()

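| | Gender | FSIQ | VIQ | PIQ | Weight | Height | MRI_Count |
|---|---|---|---|---|---|---|---|
| 1 | Female | 133 | 132 | 124 | 118.0 | 64.5 | 816932 |
| 2 | Male | 140 | 150 | 124 | NaN | 72.5 | 1001121 |
| 3 | Male | 139 | 123 | 150 | 143.0 | 73.3 | 1038437 |
| 4 | Male | 133 | 129 | 128 | 172.0 | 68.8 | 965353 |
| 5 | Female | 137 | 132 | 134 | 147.0 | 65.0 | 951545 |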


In [4]:
df.shape

(40, 7)

**Stating the null (H0) and alternative (Ha or H1) hypotheses:**

In [5]:
# H0: mu = 109
# H1: mu > 109

**Calculating Test Statistic & p-value:**

In [6]:
# Calculating the mean of VIQ

xbar = df.VIQ.mean()
xbar

112.35

In [7]:
# Calculating the std of VIQ

s = df.VIQ.std()
s

23.616107063199735

In [8]:
# Standard error of the mean: s / sqrt(n), where n = df.shape[0] = 40

s / np.sqrt(df.shape[0])

3.7340343893050596


In [10]:
# Calculating the test statistic

t_test = (xbar - 109)/(s/np.sqrt(df.shape[0]))

In [11]:
# test statistic

t_test

0.8971529586323553
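The cell above applies the one-sample t formula. With the values computed in this section ($\bar{x} = 112.35$, $s \approx 23.616$, $n = 40$, $\mu_0 = 109$) it works out as:

$$
t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}
  = \frac{112.35 - 109}{23.616/\sqrt{40}}
  = \frac{3.35}{3.734}
  \approx 0.897, \qquad \text{df} = n - 1 = 39
$$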

In [12]:
# Calculating the one-sided (upper-tail) p-value P(T > t_test), with n - 1 = 39 degrees of freedom

1 - stats.t.cdf(t_test, 39)

0.18757115929257173
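For an upper-tail p-value, `stats.t.sf` (the survival function, 1 - CDF) gives the same number as `1 - stats.t.cdf(...)`. It is a common choice because subtracting a CDF value from 1 loses precision when the p-value is very small. A short check using the statistic above (the extreme value 50 is purely illustrative):

```python
from scipy import stats

t_test = 0.8971529586323553  # statistic computed above
df = 39                      # n - 1 for the 40 VIQ scores

p_cdf = 1 - stats.t.cdf(t_test, df)
p_sf = stats.t.sf(t_test, df)
print(p_cdf, p_sf)

# For an extreme statistic, 1 - cdf evaluates to exactly 0.0 in floating point,
# while sf still returns a tiny positive probability
print(1 - stats.t.cdf(50, df), stats.t.sf(50, df))
```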

In [13]:
# help(stats.ttest_1samp)

In [14]:
# Using stats.ttest_1samp() to calculate the test statistic and p-value

oneSamp = stats.ttest_1samp(df.VIQ, 109, alternative='greater')
oneSamp

Ttest_1sampResult(statistic=0.8971529586323551, pvalue=0.18757115929257173)

In [15]:
# Displaying p-value

oneSamp.pvalue

0.18757115929257173

**Making a decision:**

In [16]:
# Comparing p-value and alpha

alpha = 0.05

if oneSamp.pvalue < alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of the alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we fail to reject the null hypothesis.
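The same decision can be reached with the critical-value approach used earlier in this notebook, adapted to the one-sided alternative: all of alpha goes into the upper tail, so the cutoff is `stats.t.ppf(1 - alpha, n - 1)` rather than `stats.t.ppf(1 - alpha/2, n - 1)`. A minimal sketch with the values from this section:

```python
from scipy import stats

alpha = 0.05
n = 40
t_test = 0.8971529586323553  # statistic computed above

# Upper-tail (one-sided) critical value: reject H0 only if t_test exceeds it
t_critical = stats.t.ppf(1 - alpha, n - 1)
print(t_critical)
print("reject H0" if t_test > t_critical else "fail to reject H0")
```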


# Independent Samples T-Test

## Arsenic Example

- Arsenic concentration in public drinking water supplies is a potential health risk.
- An article in the Arizona Republic (May 27, 2001) reported drinking water arsenic concentrations in parts per billion (ppb) for 10 metropolitan Phoenix communities and 10 communities in rural Arizona.
- The data are in the CSV file ``arsenic.csv``.

Determine whether the mean arsenic concentration differs between metropolitan Phoenix communities and communities in rural Arizona.

In [17]:
# Importing arsenic dataset

arsenic = pd.read_csv("arsenic.csv")
arsenic

| | Metro Phoenix | x1 | Rural Arizona | x2 |
|---|---|---|---|---|
| 0 | Phoenix | 3 | Rimrock | 48 |
| 1 | Chandler | 7 | Goodyear | 44 |
| 2 | Gilbert | 25 | New River | 40 |
| 3 | Glendale | 10 | Apache Junction | 38 |
| 4 | Mesa | 15 | Buckeye | 33 |
| 5 | Paradise Valley | 6 | Nogales | 21 |
| 6 | Peoria | 12 | Black Canyon City | 20 |
| 7 | Scottsdale | 25 | Sedona | 12 |
| 8 | Tempe | 15 | Payson | 1 |
| 9 | Sun City | 7 | Casa Grande | 18 |


In [18]:
arsenic.columns

Index(['Metro Phoenix', 'x1', 'Rural Arizona', 'x2'], dtype='object')

**Checking the Homogeneity of Variances:**

**[What is Levene's test?](https://en.wikipedia.org/wiki/Levene%27s_test)**

In statistics, Levene's test is an inferential statistic used to assess the equality of variances for a variable calculated for two or more groups.
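As a sketch of how the statistic is built: SciPy's `stats.levene` defaults to `center='median'` (the Brown-Forsythe variant), which amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The arsenic values below are copied from the table in this section.

```python
import numpy as np
from scipy import stats

# Arsenic concentrations (ppb) from the table in this section
x1 = np.array([3, 7, 25, 10, 15, 6, 12, 25, 15, 7])     # Metro Phoenix
x2 = np.array([48, 44, 40, 38, 33, 21, 20, 12, 1, 18])  # Rural Arizona

# Absolute deviation of each value from its own group's median
z1 = np.abs(x1 - np.median(x1))
z2 = np.abs(x2 - np.median(x2))

# One-way ANOVA on the deviations gives Levene's W statistic and its p-value
manual = stats.f_oneway(z1, z2)
library = stats.levene(x1, x2)  # default center='median'
print(manual.statistic, library.statistic)
print(manual.pvalue, library.pvalue)
```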

In [19]:
# Performing Levene test for equal variances

# H0: The population variances are equal
# H1: There is a difference between the variances in the population

# The small p-value suggests that the populations do not have equal variances.

leveneTest = stats.levene(arsenic.x1, arsenic.x2)
leveneTest

LeveneResult(statistic=7.7015516672169, pvalue=0.012482954069299166)

**Stating the null (H0) and alternative (Ha or H1) hypotheses:**

In [20]:
# H0: mu1 = mu2
# H1: mu1 != mu2

In [21]:
# average Metro Phoenix

arsenic.x1.mean()

12.5

In [22]:
# average Rural Arizona

arsenic.x2.mean()

27.5

**Calculating Test Statistic & p-value (the T-test for the means of two independent samples of scores):**

In [23]:
# help(stats.ttest_ind)

In [24]:
# Welch's t-test via stats.ttest_ind(equal_var=False), since Levene's test rejected equal variances

indTest = stats.ttest_ind(arsenic.x1, arsenic.x2, equal_var=False)
indTest

Ttest_indResult(statistic=-2.7669395785560553, pvalue=0.0158272848161009)
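With `equal_var=False`, `ttest_ind` performs Welch's t-test: each group's variance is estimated separately, and the degrees of freedom come from the Welch-Satterthwaite approximation. A minimal sketch that recomputes the result by hand from the arsenic values in the table above:

```python
import numpy as np
from scipy import stats

x1 = np.array([3, 7, 25, 10, 15, 6, 12, 25, 15, 7])     # Metro Phoenix
x2 = np.array([48, 44, 40, 38, 33, 21, 20, 12, 1, 18])  # Rural Arizona

# Squared standard error of each group mean (sample variance with ddof=1, divided by n)
v1 = x1.var(ddof=1) / len(x1)
v2 = x2.var(ddof=1) / len(x2)

t_welch = (x1.mean() - x2.mean()) / np.sqrt(v1 + v2)

# Welch-Satterthwaite degrees of freedom (generally not an integer)
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))

# Two-sided p-value
p_welch = 2 * stats.t.sf(abs(t_welch), df_welch)

print(t_welch, df_welch, p_welch)
print(stats.ttest_ind(x1, x2, equal_var=False))
```

The non-integer degrees of freedom (about 13.2 here, versus 18 for the pooled test) are what make Welch's test more conservative when the variances differ.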

In [25]:
indTest.statistic

-2.7669395785560553

In [26]:
indTest.pvalue

0.0158272848161009

**Making a decision:**

In [27]:
alpha = 0.05

if indTest.pvalue < alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of the alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we can reject the null hypothesis in favor of the alternative hypothesis.


# Paired (Dependent) Samples T-Test

## Prozac Data

- Let us consider a simple example of what is often termed "pre/post" or "pretest/posttest" data.
- Suppose you wish to test the effect of Prozac on the well-being of depressed individuals, using a standardised "well-being scale" that sums Likert-type items to obtain a score ranging from 0 to 20.
- Higher scores indicate greater well-being, so an increase from the pre score to the post score would suggest that Prozac has a positive effect.
- While this design has flaws (e.g., the lack of a control group), it serves as an example of how to analyse such data.

Determine whether Prozac enhances well-being in depressed individuals. Use α = 0.05.


In [28]:
# Reading Prozac dataset

prozac = pd.read_csv("prozac.csv")
prozac

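| | moodpre | moodpost | difference |
|---|---|---|---|
| 0 | 3 | 5 | 2 |
| 1 | 0 | 1 | 1 |
| 2 | 6 | 5 | -1 |
| 3 | 7 | 7 | 0 |
| 4 | 4 | 10 | 6 |
| 5 | 3 | 9 | 6 |
| 6 | 2 | 7 | 5 |
| 7 | 1 | 11 | 10 |
| 8 | 4 | 8 | 4 |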


**Stating the null (H0) and alternative (Ha or H1) hypotheses:**

In [29]:
# d = moodpost - moodpre; mu_d is the population mean of d

# H0: mu_d = 0
# H1: mu_d > 0

**Calculating Test Statistic & p-value (the T-test for the means of two dependent (Paired) samples of scores):**

In [30]:
# Calculating test statistics using stats.ttest_rel() 

# moodpost - moodpre

pairedtest = stats.ttest_rel(prozac.moodpost, prozac.moodpre, alternative="greater")
pairedtest

Ttest_relResult(statistic=3.1428571428571423, pvalue=0.006872912197394244)
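A paired t-test is a one-sample t-test on the within-subject differences. As a check, running `stats.ttest_1samp` on `moodpost - moodpre` (the `difference` column in the table above) against 0 with the same one-sided alternative should reproduce the `ttest_rel` result. The values are copied from that table.

```python
import numpy as np
from scipy import stats

moodpre = np.array([3, 0, 6, 7, 4, 3, 2, 1, 4])
moodpost = np.array([5, 1, 5, 7, 10, 9, 7, 11, 8])
d = moodpost - moodpre  # matches the 'difference' column

paired = stats.ttest_rel(moodpost, moodpre, alternative="greater")
one_sample = stats.ttest_1samp(d, 0, alternative="greater")

# Swapping the order of the samples flips the sign of t, so the alternative becomes "less"
flipped = stats.ttest_rel(moodpre, moodpost, alternative="less")

print(paired)
print(one_sample)
print(flipped)
```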

In [31]:
# d = moodpre - moodpost; mu_d is the population mean of d

# H0: mu_d = 0
# H1: mu_d < 0

# stats.ttest_rel(prozac.moodpre, prozac.moodpost, alternative="less")

**Making a decision:**

In [32]:
alpha = 0.05

if pairedtest.pvalue < alpha:
    print('At {} level of significance, we can reject the null hypothesis in favor of the alternative hypothesis.'.format(alpha))
else:
    print('At {} level of significance, we fail to reject the null hypothesis.'.format(alpha))

At 0.05 level of significance, we can reject the null hypothesis in favor of the alternative hypothesis.


In [33]:
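# The mean well-being score increased significantly after treatment.
# Without a control group, however, this design cannot attribute the improvement to Prozac alone.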
