# Overview

### Has the network latency gone up since we switched internet service providers?

$H_0:$ Mean latency in the three months after switching internet service providers is less than or equal to mean latency in the three months before switching internet service providers.

$H_a:$ Mean latency in the three months after switching internet service providers is greater than mean latency in the three months before switching internet service providers.

**Type I Error**: To test this hypothesis, we sampled a group of people. Within our sample group, average latency for three months after switching internet service providers is greater than latency for three months before switching internet service providers, but the overall population experienced a decrease or no change in latency after switching internet service providers.

**Type II Error**: To test our hypothesis, we sampled a group of people. Within our sample group, average latency for three months after switching internet service providers is less than the average latency for three months before switching internet service providers. The overall population, however, experienced an average increase in latency after switching internet service providers. 

### Is the website redesign any good?

$H_0$: The mean click-through rate in the six months before the website redesign is equal to or greater than the mean click-through rate in the six months after the website redesign.

$H_a:$ The mean click-through rate in the six months before the website redesign is less than the mean click-through rate in the six months after the website redesign. 

**Type I Error**: To test our hypothesis, we sampled the website's mean click-through rates for users in a set number of areas. In the areas we sampled, the mean click-through rate for the six months before the website redesign was less than the mean click-through rate for the six months after the website redesign. However, across users in all areas, the mean click-through rate did not increase after the website redesign.

**Type II Error**: To test our hypothesis, we sampled the website's mean click-through rates for users in a set number of areas. In the areas we sampled, the mean click-through rate for the six months before the website redesign was greater than the mean click-through rate for the six months after the website redesign. However, across users in all areas, the mean click-through rate increased after the website redesign.

### Is our television ad driving more sales?


$H_0:$ The mean sales rate in the six months before the television advertisement campaign began is greater than or equal to the mean sales rate in the six months after the television campaign began. 

$H_{a}:$ The mean sales rate in the six months before the television advertisement campaign began is less than the mean sales rate in the six months after the television campaign began. 

**Type I Error**: To test this hypothesis we sampled sales figures for a six month period in a set number of locations. In the sampled locations, the mean number of sales in the six months before the television advertising campaign began was less than the mean number of sales in the six months after the television advertising campaign began. 

However, in the whole market, sales rates remained the same after the television advertising campaign began. 

**Type II Error**: To test this hypothesis, we sampled the sales figures in a set number of locations. In the sampled locations, the mean number of sales in the six months before the television advertising campaign began was greater than the mean number of sales in the six months after the television advertising campaign began. 

However, in the whole market, sales increased after the television advertising campaign began.
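
The Type I errors described above can be illustrated with a small simulation. The sketch below is not part of the original analysis: the function name `type_i_error_rate`, the population parameters (mean 100, standard deviation 15), the sample size, and the seed are all hypothetical. It draws "before" and "after" samples from the same population, so the null hypothesis is true by construction, and counts how often a one-tailed t-test rejects it anyway.

```python
import numpy as np
from scipy import stats


def type_i_error_rate(n_trials=2000, n=50, alpha=0.05, seed=0):
    """Estimate how often a one-tailed t-test rejects a true null hypothesis."""
    rng = np.random.default_rng(seed)  # hypothetical seed, for reproducibility
    rejections = 0
    for _ in range(n_trials):
        # Both samples come from the same population, so H0 is true here.
        before = rng.normal(loc=100, scale=15, size=n)
        after = rng.normal(loc=100, scale=15, size=n)
        # H_a: the mean of `after` is greater than the mean of `before`.
        result = stats.ttest_ind(after, before, alternative="greater")
        if result.pvalue < alpha:
            rejections += 1  # a false positive, i.e. a Type I error
    return rejections / n_trials


rate = type_i_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

Under these assumptions the estimated rate should land close to the chosen significance level, which is what the significance level is meant to control.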

# T-test

We will import the libraries that we may need.

In [22]:
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
from pydataset import data
from numpy import random
import seaborn as sns
import pandas as pd
from math import sqrt

Ace Realty wants to determine whether the average time it takes to sell homes is different for its two offices.

A sample of 40 sales from office #1 revealed a mean of 90 days and a standard deviation of 15 days.

In [2]:
sample_size1 = 40
mean1 = 90
std1 = 15

A sample of 50 sales from office #2 revealed a mean of 100 days and a standard deviation of 20 days.

In [3]:
sample_size2 = 50
mean2 = 100
std2 = 20

Use a .05 level of significance.

In [4]:
level_of_significance = 0.05

First we will formulate our hypotheses.

$H_0$: There is no difference between the average home sale time for office #1 and office #2.

$H_a$: There is a difference between the average home sale time for office #1 and office #2.

In [5]:
office_1 = random.normal(mean1, std1, size = 10_000)
office_1_pd = pd.DataFrame({'time': office_1})
office_1_pd

| | time |
|---|---|
| 0 | 80.817265 |
| 1 | 100.948677 |
| 2 | 113.713092 |
| 3 | 92.537064 |
| 4 | 118.481805 |
| ... | ... |
| 9995 | 85.832188 |
| 9996 | 81.864330 |
| 9997 | 71.503399 |
| 9998 | 64.256225 |


In [13]:
office_2 = random.normal(mean2, std2, size = 10_000)
office_2_pd = pd.DataFrame({'time': office_2})
office_2_pd

| | time |
|---|---|
| 0 | 97.484297 |
| 1 | 88.186364 |
| 2 | 114.691223 |
| 3 | 117.984245 |
| 4 | 88.488167 |
| ... | ... |
| 9995 | 76.633079 |
| 9996 | 120.206621 |
| 9997 | 70.985564 |
| 9998 | 101.485143 |
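
The draws above come from NumPy's global random state, so the simulated values change on every run. A minimal sketch of a reproducible version follows, assuming NumPy's `Generator` API. The seed value and the `_seeded` variable names are hypothetical and do not come from the original notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # hypothetical seed; any fixed integer works

# Same distributions as above: office #1 ~ N(90, 15), office #2 ~ N(100, 20).
office_1_seeded = pd.DataFrame({"time": rng.normal(90, 15, size=10_000)})
office_2_seeded = pd.DataFrame({"time": rng.normal(100, 20, size=10_000)})

print(office_1_seeded["time"].mean(), office_2_seeded["time"].mean())
```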


Let's calculate the mean home sale time for each office.

In [8]:
office_1_mean = office_1_pd['time'].mean()
office_2_mean = office_2_pd['time'].mean()

Let's calculate the standard deviation of the home sale time for each office.

In [9]:
office_1_std = office_1_pd['time'].std()
office_2_std = office_2_pd['time'].std()

We apply the degrees of freedom formula. 

In [17]:
n1 = office_1_pd['time'].shape[0]
n2 = office_2_pd['time'].shape[0]

In [18]:
degrees_of_freedom = n1 + n2 - 2
degrees_of_freedom

19998

Now we will calculate the pooled standard deviation.
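
For reference, the calculation in the next cells corresponds to the usual pooled standard deviation formula, where $s_1$ and $s_2$ stand for `office_1_std` and `office_2_std`, and $n_1 + n_2 - 2$ is `degrees_of_freedom`:

$$
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
$$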

In [27]:
office_1_calculation = (n1 - 1) * (office_1_std**2)
office_2_calculation = (n2 - 1) * (office_2_std**2)

In [30]:
std_p = sqrt((office_1_calculation + office_2_calculation) / (degrees_of_freedom))
std_p

17.764366792778816

We will calculate standard error.

In [32]:
standard_error_office_1_calculation = (office_1_std**2)/n1
standard_error_office_2_calculation = (office_2_std**2)/n2

In [34]:
standard_error = sqrt(standard_error_office_1_calculation + standard_error_office_2_calculation)
standard_error

0.25122608445318045

We will now apply the formula to calculate the t value.
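
Written out, the pooled two-sample t statistic used in the following cells is shown below. Here $\bar{x}_1$ and $\bar{x}_2$ stand for `office_1_mean` and `office_2_mean`, and $s_p$ is the pooled standard deviation `std_p`:

$$
t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}
$$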

In [35]:
t_numerator = office_1_mean - office_2_mean
t_denominator = (std_p * sqrt(1/n1 + 1/n2))

In [36]:
t_value = t_numerator/t_denominator 
t_value

-39.72394318544515

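We can use our calculated t_value to calculate our p value. Because the test is two-tailed and t_value is negative, we pass its absolute value to the survival function before doubling.

In [38]:
p = stats.t(degrees_of_freedom).sf(abs(t_value)) * 2
p

With a t value of about -39.72 and 19,998 degrees of freedom, the two-tailed p value is effectively zero.

Because our p value is less than our 0.05 level of significance, we reject the null hypothesis.

### Based on these calculations, we conclude that there is a statistically significant difference in the average amount of time it takes Office #1 and Office #2 to sell homes.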
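
The walkthrough above works from 10,000 simulated draws per office, so its degrees of freedom (19,998) reflect the simulation size and not the 40 and 50 sales that were actually sampled. As a cross-check, the sketch below runs the same pooled-variance test directly on the reported summary statistics using `scipy.stats.ttest_ind_from_stats`, which gives 88 degrees of freedom. This is an alternative illustration, not the notebook's own method.

```python
from scipy import stats

# Summary statistics reported for each office's sample.
mean1, std1, sample_size1 = 90, 15, 40
mean2, std2, sample_size2 = 100, 20, 50
level_of_significance = 0.05

# equal_var=True requests the pooled-variance (Student's) two-sample t-test,
# matching the pooled standard deviation approach used above.
result = stats.ttest_ind_from_stats(
    mean1, std1, sample_size1,
    mean2, std2, sample_size2,
    equal_var=True,
)

print(f"t = {result.statistic:.3f}, p = {result.pvalue:.4f}")
if result.pvalue < level_of_significance:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")
```

The p value printed here can be compared against `level_of_significance` in the same way as the p value from the simulated data.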