You bought a spool of 1.75 mm filament for your 3D printer. You want to measure how close the filament diameter really is to 1.75 mm. You use a caliper tool and sample the diameter five times on the spool:\
1.78, 1.75, 1.72, 1.74, 1.77\
Calculate the mean and standard deviation for this set of values.

In [1]:
sample_data = [1.78, 1.75, 1.72, 1.74, 1.77]

mean = sum(sample_data) / len(sample_data)
print(mean)

1.752


In [4]:
from math import sqrt

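def variance(values, is_sample: bool = False):
    """Return the variance; divide by n - 1 when `values` is a sample."""
    mean = sum(values) / len(values)
    # Bessel's correction: a sample uses n - 1 in the denominator
    divisor = len(values) - (1 if is_sample else 0)
    return sum((v - mean) ** 2 for v in values) / divisor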

def std_dev(values, is_sample: bool = False): 
    return sqrt(variance(values, is_sample))

print("VARIANCE = {}".format(variance(sample_data, is_sample=True)))
print("STD DEV = {}".format(std_dev(sample_data, is_sample=True)))

VARIANCE = 0.0005700000000000011
STD DEV = 0.023874672772626667
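
As a cross-check on the hand-written functions above, the same figures can be obtained from Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative and not part of the original exercise. `statistics.variance` and `statistics.stdev` use the sample (n - 1) denominator, while `statistics.pstdev` uses the population (n) denominator.

```python
import statistics

sample_data = [1.78, 1.75, 1.72, 1.74, 1.77]

# Sample statistics: denominators use n - 1 (Bessel's correction)
sample_mean = statistics.mean(sample_data)
sample_variance = statistics.variance(sample_data)
sample_std = statistics.stdev(sample_data)

# Population standard deviation for comparison: denominator uses n
population_std = statistics.pstdev(sample_data)

print("MEAN =", sample_mean)
print("SAMPLE VARIANCE =", sample_variance)
print("SAMPLE STD DEV =", sample_std)
print("POPULATION STD DEV =", population_std)
```

Because five measurements are only a sample of the whole spool, the sample versions are the appropriate ones here; the population figure is always slightly smaller.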


A manufacturer says the Z-Phone smart phone has a mean consumer life of 42 months with a standard deviation of 8 months. Assuming a normal distribution, what is the probability a given random Z-Phone will last between 20 and 30 months?

In [5]:
# using CDF
from scipy.stats import norm

mean = 42
std_dev = 8

prob_20_30 = norm.cdf(30, mean, std_dev) - norm.cdf(20, mean, std_dev)
print(prob_20_30)

0.0638274380338035
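
The same probability can also be reached by standardizing the bounds into z-scores first. The sketch below uses `statistics.NormalDist` from the standard library instead of SciPy; the names `lifetime`, `z_low`, and `z_high` are illustrative choices, not part of the original solution.

```python
from statistics import NormalDist

# Z-Phone lifetime model from the exercise: mean 42 months, std dev 8 months
lifetime = NormalDist(mu=42, sigma=8)

# Standardize the bounds: how many standard deviations from the mean?
z_low = (20 - lifetime.mean) / lifetime.stdev   # -2.75
z_high = (30 - lifetime.mean) / lifetime.stdev  # -1.5

# Area under the standard normal curve between the two z-scores
standard_normal = NormalDist()
prob_20_30 = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)
print(prob_20_30)
```

Both bounds sit well below the mean (1.5 and 2.75 standard deviations), which is why only about 6% of phones fall in this range.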


I am skeptical that my 3D printer filament really averages 1.75 mm in diameter as advertised. I took 34 measurements with my tool. The sample mean is 1.715588 and the sample standard deviation is 0.029252.\
What is the 99% confidence interval for the mean of my entire spool of filament?

In [6]:
from math import sqrt
from scipy.stats import norm

def critical_z_value(p):
    norm_dist = norm(loc=0.0, scale=1.0)
    left_tail_area = (1.0 - p) / 2.0
    upper_area = 1.0 - ((1.0 - p) / 2.0)
    return norm_dist.ppf(left_tail_area), norm_dist.ppf(upper_area)

def confidence_interval(p, sample_mean, sample_std, n): 
    # Sample size must be greater than 30
    lower, upper = critical_z_value(p)
    lower_ci = lower * (sample_std / sqrt(n))
    upper_ci = upper * (sample_std / sqrt(n))

    return sample_mean + lower_ci, sample_mean + upper_ci

print(confidence_interval(p=.99, sample_mean=1.715588, sample_std=0.029252, n=34))

(np.float64(1.7026658973748656), np.float64(1.7285101026251342))


We can be 99% confident that the mean filament diameter for the spool is between 1.7027 mm and 1.7285 mm.
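
To make the arithmetic behind the interval explicit, here is a minimal sketch that computes the margin of error step by step and then checks whether the advertised 1.75 mm falls inside the interval. It uses `statistics.NormalDist` from the standard library rather than SciPy, and the names `margin_of_error` and `advertised` are assumptions introduced for illustration.

```python
from math import sqrt
from statistics import NormalDist

n = 34
sample_mean = 1.715588
sample_std = 0.029252
confidence = 0.99

# Critical z-value that leaves (1 - confidence) / 2 in each tail
z_critical = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)

# Margin of error = critical z-value * standard error of the mean
standard_error = sample_std / sqrt(n)
margin_of_error = z_critical * standard_error

lower = sample_mean - margin_of_error
upper = sample_mean + margin_of_error
print(lower, upper)

# Does the advertised diameter fall inside the interval?
advertised = 1.75
print("Advertised diameter inside interval:", lower <= advertised <= upper)
```

The advertised value lies above the upper bound, so the measurements do not support a 1.75 mm average at this confidence level.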

Your marketing department has started a new advertising campaign and wants to know if it affected sales, which in the past averaged $10,345 a day with a standard deviation of $552. The new advertising campaign ran for 45 days and averaged $11,641 in sales.\
Did the campaign affect sales? Why or why not? (Use a two-tailed test for more reliable significance.)

H0: population mean = 10_345\
H1: population mean ≠ 10_345

In [11]:
# Calculating a range for a statistical significance of 5%
from scipy.stats import norm

# Past sales $10,345 a day, $552 std dev
mean = 10_345
std_dev = 552

# What x-value has 2.5% of area behind it?
x1 = norm.ppf(.025, mean, std_dev)
# What x-value has 97.5% of area behind it?
x2 = norm.ppf(.975, mean, std_dev)

print(x1) 
print(x2) 

9263.09988053389
11426.90011946611


In [None]:
# Calculating the two-tailed p-value
from scipy.stats import norm

new_mean_sales = 11_641

# Probability of $9,049 (left side) or less in sales
p1 = norm.cdf(mean-(new_mean_sales-mean), mean, std_dev) 

# Probability of $11,641 or more in sales
p2 = 1.0 - norm.cdf(new_mean_sales, mean, std_dev) 

# P-value of both tails
p_value = p1 + p2
print(p_value) 

0.018883335964961386


Our new average of $11,641 is above the upper bound of the 5% significance range ($11,426.90).\
Our p-value of 1.888% is below the 5% significance threshold.\
The result passes the two-tailed test.\
As a result, we can reject the null hypothesis H0, meaning our advertising campaign had an impact on sales.\
Reaching $11,641 was unlikely to be pure coincidence; it is a statistically significant result consistent with the new advertising campaign.

In [None]:
# An even simpler calculation, since the normal distribution is symmetric

from scipy.stats import norm

mean = 10345
std_dev = 552

# Probability of $11,641 or more in sales (right tail)
p1 = 1.0 - norm.cdf(11641, mean, std_dev)
# Take advantage of symmetry: the left tail has the same area
p2 = p1
# P-value of both tails (equivalent to 2 * p1)
p_value = p1 + p2
print("Two-tailed P-value", p_value) 
if p_value <= .05:
    print("Passes two-tailed test")
else:
    print("Fails two-tailed test")


Two-tailed P-value 0.01888333596496139
Passes two-tailed test
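
One caveat worth flagging: the cells above compare the 45-day average against the spread of individual daily sales ($552). Since $11,641 is a mean over 45 days, a stricter treatment would arguably use the standard error of the mean, $552 / √45, instead. The sketch below is a variant under that assumption, treating the 45 days as independent observations; it is not the method used in the solution above, and the variable names are illustrative. It uses `statistics.NormalDist` from the standard library.

```python
from math import sqrt
from statistics import NormalDist

population_mean = 10_345
population_std = 552
n_days = 45
new_mean_sales = 11_641

# Standard error: expected spread of a 45-day average (assumes independent days)
standard_error = population_std / sqrt(n_days)

# How many standard errors is the campaign average above the past mean?
z_score = (new_mean_sales - population_mean) / standard_error

# Two-tailed p-value, using the symmetry of the normal distribution
right_tail = 1.0 - NormalDist().cdf(z_score)
p_value = 2.0 * right_tail

print("z-score:", z_score)
print("Two-tailed P-value:", p_value)
print("Passes two-tailed test" if p_value <= .05 else "Fails two-tailed test")
```

Under this assumption the campaign average sits more than 15 standard errors above the historical mean, so the p-value is far below 5% and the conclusion (reject H0) is unchanged, only stronger.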
