# Independent t-test

#### Gaussian distribution

$$ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{ -\frac{1}{2} \left(\frac{x-\mu}{\sigma}\right)^2 } $$
where:
- $ \mu $ is the mean.
- $ \sigma $ is the standard deviation.
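
As a quick sanity check, the density formula above can be evaluated directly and compared with `scipy.stats.norm.pdf`. This is only a minimal sketch: the helper name `gaussian_pdf` and the parameter values are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu, sigma):
    """Evaluate the Gaussian density for mean mu and standard deviation sigma."""
    return 1.0 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# Illustrative (assumed) parameters; the grid spans mu +/- 5 sigma
mu, sigma = 5.0, 2.0
x = np.linspace(-5, 15, 2001)

density = gaussian_pdf(x, mu, sigma)

# The hand-written formula should agree with SciPy's implementation
matches_scipy = np.allclose(density, norm.pdf(x, loc=mu, scale=sigma))

# Riemann-sum approximation of the area under the curve (should be close to 1)
area = np.sum(density) * (x[1] - x[0])

print(matches_scipy, round(area, 4))
```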


Important property of Gaussians: the sum of two independent Gaussian random variables is itself Gaussian.

This is a statement about random variables, not about adding density curves. Adding two density functions pointwise gives an unnormalized mixture, which is generally not Gaussian. The density of the sum of two independent random variables is the convolution of their densities.

If you have two independent Gaussian random variables $ X_1 $ and $ X_2 $, with means $ \mu_1 $ and $ \mu_2 $ and standard deviations $ \sigma_1 $ and $ \sigma_2 $, their sum is another Gaussian random variable with a mean equal to the sum of the individual means and a standard deviation equal to the square root of the sum of the squares of the individual standard deviations.

For two Gaussian densities $ f_1(x) $ and $ f_2(x) $:

$$ f_1(x) = \frac{1}{\sigma_1 \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_1)^2}{2\sigma_1^2}\right) $$
$$ f_2(x) = \frac{1}{\sigma_2 \sqrt{2\pi}} \exp\left(-\frac{(x - \mu_2)^2}{2\sigma_2^2}\right) $$

The sum $ X_1 + X_2 $ has density $ f = f_1 * f_2 $ (the convolution), which is Gaussian with mean $ \mu = \mu_1 + \mu_2 $ and standard deviation $ \sigma = \sqrt{\sigma_1^2 + \sigma_2^2} $. Means add, and variances (not standard deviations) add.

This property holds regardless of the specific values of $ \mu_1 $, $ \mu_2 $, $ \sigma_1 $, and $ \sigma_2 $, provided the two variables are independent. More generally, any linear combination $ aX_1 + bX_2 $ is Gaussian with mean $ a\mu_1 + b\mu_2 $ and variance $ a^2\sigma_1^2 + b^2\sigma_2^2 $. In particular, the difference $ X_1 - X_2 $ has mean $ \mu_1 - \mu_2 $ and variance $ \sigma_1^2 + \sigma_2^2 $, which is why the standard error of a difference between two sample means is obtained by adding the two squared standard errors. Non-linear operations (for example, products or squares) generally do not produce a Gaussian distribution.
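
A quick simulation can make this concrete. The sketch below draws independent samples from two Gaussians, adds and subtracts them, and compares the empirical mean and standard deviation with $ \mu_1 \pm \mu_2 $ and $ \sqrt{\sigma_1^2 + \sigma_2^2} $. The parameter values, sample count, and seed are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative (assumed) parameters for the two independent Gaussians
mu1, sigma1 = 5.0, 2.0
mu2, sigma2 = 10.0, 3.0
n = 200_000

x1 = rng.normal(mu1, sigma1, n)
x2 = rng.normal(mu2, sigma2, n)

total = x1 + x2   # sum of the two random variables
diff = x1 - x2    # difference of the two random variables

# Variances add for both the sum and the difference
expected_sd = np.sqrt(sigma1**2 + sigma2**2)

print(f"sum:  mean={total.mean():.3f}, sd={total.std(ddof=1):.3f}")
print(f"diff: mean={diff.mean():.3f}, sd={diff.std(ddof=1):.3f}")
print(f"expected sd for both: {expected_sd:.3f}")
```

Note that the standard deviation of the difference is the same as that of the sum: subtracting a random variable still adds its variance.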


#### Assumptions of the independent-measures t-test

**1. Independence of Observations**: The two groups being compared should be independent of each other. This means that the observations in one group should not influence the observations in the other group. This assumption often stems from the study design; for example, if you randomly assign participants to two different groups, the groups can be considered independent.

**2. Normality**: The data within each group should be approximately normally distributed. The t-test is robust to mild violations of this assumption, especially when sample sizes are large. For smaller sample sizes, if this assumption is violated, non-parametric tests like the Mann-Whitney U test might be considered.

**3. Homogeneity of Variance (Homoscedasticity)**: The variances of the two groups should be roughly equal. This is the assumption behind the "pooled" variance in the standard independent t-test formula. If this assumption is not met, a modification to the t-test, called the Welch's t-test, can be used. Welch's t-test does not assume equal variances.

**4. Interval or Ratio Level of Measurement**: The dependent variable should be measured at the interval or ratio level (i.e., continuous data).

**5. Random Sampling**: The data should come from a random sample, meaning every member of the population has an equal chance of being included in the sample.

**6. Absence of Outliers**: Outliers can unduly influence the t-test leading to possible false conclusions. It's good practice to check for outliers and decide how to manage them before conducting the test.

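Violation of these assumptions can lead to incorrect conclusions. However, the t-test is known to be robust to moderate departures from normality, especially when the sample sizes are large and similar in the two groups. It is not robust to violations of independence, and when the variances and the sample sizes both differ, Welch's t-test is the safer choice.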
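
Some of these assumptions can be checked numerically before running the test. The sketch below is one possible workflow, not a prescribed procedure: it uses the Shapiro-Wilk test for normality, Levene's test for equal variances, and then runs both the pooled (Student) and Welch versions of the t-test, plus the Mann-Whitney U test as a non-parametric alternative. The data are synthetic and the group names are hypothetical.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical data: group_b is given a deliberately larger spread
group_a = rng.normal(loc=50, scale=5, size=40)
group_b = rng.normal(loc=52, scale=12, size=40)

# Normality within each group: a small p-value suggests non-normal data
shapiro_a = stats.shapiro(group_a)
shapiro_b = stats.shapiro(group_b)

# Homogeneity of variance: a small p-value suggests unequal variances
levene = stats.levene(group_a, group_b)

# Student's t-test pools the variances; Welch's t-test does not
student = stats.ttest_ind(group_a, group_b, equal_var=True)
welch = stats.ttest_ind(group_a, group_b, equal_var=False)

# Non-parametric alternative when normality is doubtful
mwu = stats.mannwhitneyu(group_a, group_b, alternative='two-sided')

print(f"Shapiro-Wilk p-values: A={shapiro_a.pvalue:.3f}, B={shapiro_b.pvalue:.3f}")
print(f"Levene p-value:        {levene.pvalue:.3f}")
print(f"Student t-test:        t={student.statistic:.3f}, p={student.pvalue:.3f}")
print(f"Welch t-test:          t={welch.statistic:.3f}, p={welch.pvalue:.3f}")
print(f"Mann-Whitney U:        U={mwu.statistic:.1f}, p={mwu.pvalue:.3f}")
```

With equal group sizes, as here, the Student and Welch t-statistics coincide and only the degrees of freedom (and therefore the p-values) differ.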

#### Setup

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import ttest_ind, norm
import plotly.graph_objects as go
from plotly.subplots import make_subplots

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t
import ipywidgets as widgets
from IPython.display import display
import pandas as pd

def plot_distributions(mu_a=5, mu_b=10, std_a=2, std_b=3):
    # Sample size for population A and B
    sample_size_a = 50
    sample_size_b = 50

    # Generate the values for the x-axis
    x = np.linspace(-5, 20, 1000)

    # Compute the Gaussian distributions
    y_a = (1 / (std_a * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu_a) / std_a)**2)
    y_b = (1 / (std_b * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu_b) / std_b)**2)

    # Compute the standard error and degrees of freedom
    std_err_a = std_a / np.sqrt(sample_size_a)
    std_err_b = std_b / np.sqrt(sample_size_b)
    estimated_std_err = np.sqrt((std_a**2 / sample_size_a) + (std_b**2 / sample_size_b))
    mean_diff = mu_a - mu_b
    df_a = sample_size_a - 1
    df_b = sample_size_b - 1

    # Compute the t-distributions for population A and B
    y_t_a = t.pdf(x, df_a, loc=mu_a, scale=std_err_a)
    y_t_b = t.pdf(x, df_b, loc=mu_b, scale=std_err_b)

    # Create subplots with a wider distance
    fig, axes = plt.subplots(1, 2, figsize=(8, 3))
    plt.subplots_adjust(wspace=2)  # Increase the distance between plots

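    # Plot the population densities (solid) and the sampling distributions of the means (dashed)
    axes[0].plot(x, y_a, label='Population A')
    axes[0].plot(x, y_b, label='Population B')
    axes[0].plot(x, y_t_a, linestyle='--', label='Sample Means A')
    axes[0].plot(x, y_t_b, linestyle='--', label='Sample Means B')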
    axes[0].legend(loc='center left', bbox_to_anchor=(1, 0.5))
    axes[0].set_title('Theoretical Populations and Sampling Distributions')
    axes[0].set_xlabel('Value')
    axes[0].set_ylabel('Density')

    # Compute and plot the sampling distribution of the difference in means (A - B):
    # a t-distribution centred on mean_diff and scaled by the estimated standard error,
    # so the x-axis is in raw score units rather than standardized t-values
    degrees_of_freedom = sample_size_a + sample_size_b - 2
    t_values = np.linspace(-10, 10, 1000)
    y_t_distribution = t.pdf(t_values, degrees_of_freedom, loc=mean_diff, scale=estimated_std_err)
    axes[1].plot(t_values, y_t_distribution, label='Difference in Sample Means (A - B)')
    axes[1].legend(loc='center left', bbox_to_anchor=(1, 0.5))
    axes[1].axvline(x=0, linestyle='--', color='gray')  # No difference between the means (null hypothesis)
    axes[1].set_title('Sampling Distribution of the Difference in Means')
    axes[1].set_xlabel('Difference in Means (A - B)')
    axes[1].set_ylabel('Density')
    plt.show()

    # Create a DataFrame to display as a table
    data = {
        'Population': ['A', 'B', 'A - B'],
        'Standard Error': [std_err_a, std_err_b, estimated_std_err],
        'Mean': [mu_a, mu_b, mean_diff]
    }
    df = pd.DataFrame(data)
    display(df)

# Widgets to change the means and standard deviations
mu_a_widget = widgets.FloatSlider(value=5, min=0, max=20, step=0.5, description='mu_a:')
mu_b_widget = widgets.FloatSlider(value=10, min=0, max=20, step=0.5, description='mu_b:')
std_a_widget = widgets.FloatSlider(value=2, min=0.1, max=10, step=0.1, description='std_a:')
std_b_widget = widgets.FloatSlider(value=3, min=0.1, max=10, step=0.1, description='std_b:')

# Display the interactive plot
widgets.interactive(plot_distributions, mu_a=mu_a_widget, mu_b=mu_b_widget, std_a=std_a_widget, std_b=std_b_widget)


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Means
mu_a = 5
mu_b = 10

# Standard deviations
std_a = 2
std_b = 3

# Sample size for population A and B
sample_size_a = 50
sample_size_b = 50

# Generate the values for the x-axis
x = np.linspace(-5, 20, 1000)

# Compute the Gaussian distribution for population A
y_a = (1 / (std_a * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu_a) / std_a)**2)

# Compute the Gaussian distribution for population B
y_b = (1 / (std_b * np.sqrt(2 * np.pi))) * np.exp(-0.5 * ((x - mu_b) / std_b)**2)

# Compute the standard error for population A and B
std_err_a = std_a / np.sqrt(sample_size_a)
std_err_b = std_b / np.sqrt(sample_size_b)

# Compute the degrees of freedom for population A and B
df_a = sample_size_a - 1
df_b = sample_size_b - 1

# Compute the t-distribution for population A
y_t_a = t.pdf(x, df_a, loc=mu_a, scale=std_err_a)

# Compute the t-distribution for population B
y_t_b = t.pdf(x, df_b, loc=mu_b, scale=std_err_b)

# Plot both Gaussian distributions
plt.plot(x, y_a, label='Population A (mu={}, std={})'.format(mu_a, std_a))
plt.plot(x, y_b, label='Population B (mu={}, std={})'.format(mu_b, std_b))

# Plot both t-distributions
plt.plot(x, y_t_a, label='Sample Means A (sample_size={}, mu={}, std_err={})'.format(sample_size_a, mu_a, std_err_a), linestyle='--')
plt.plot(x, y_t_b, label='Sample Means B (sample_size={}, mu={}, std_err={})'.format(sample_size_b, mu_b, std_err_b), linestyle='--')

plt.title('Theoretical Populations and Sampling Distributions')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.show()



In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t

# Calculate the estimated standard error for the difference between means
estimated_std_err = np.sqrt((std_a**2 / sample_size_a) + (std_b**2 / sample_size_b))

# Calculate the difference between sample means
mean_diff = mu_a - mu_b

# Calculate the degrees of freedom for the independent-measures t-test
degrees_of_freedom = sample_size_a + sample_size_b - 2

# Define a range of values for the x-axis (differences in means, in raw score units)
t_values = np.linspace(-10, 10, 1000)

# Compute the sampling distribution of the difference in means: a t-distribution
# centred on mean_diff and scaled by the estimated standard error
y_t_distribution = t.pdf(t_values, degrees_of_freedom, loc=mean_diff, scale=estimated_std_err)

# Plot the sampling distribution
plt.plot(t_values, y_t_distribution, label='Difference in Sample Means (A - B)')
plt.title('Sampling Distribution of the Difference in Means')
plt.xlabel('Difference in Means (A - B)')
plt.ylabel('Density')
plt.legend()
plt.show()

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import t
import ipywidgets as widgets
from tabulate import tabulate

var_a = 1
var_b = 1
n_a = 100
n_b = 100

def plot_distributions(mu_a, mu_b):
    # Parameters for the sampling distributions
    sigma_a = np.sqrt(var_a)
    sigma_b = np.sqrt(var_b)
    est_se_diff = np.sqrt(var_a/n_a + var_b/n_b)

    # Common grid for the two sample means (min/max keeps it valid when mu_a > mu_b)
    x_lo = min(mu_a, mu_b) - 4*max(sigma_a, sigma_b)
    x_hi = max(mu_a, mu_b) + 4*max(sigma_a, sigma_b)
    x = np.linspace(x_lo, x_hi, 1000)

    # The difference m_A - m_B is centred on mu_a - mu_b, which generally lies outside
    # [x_lo, x_hi], so it gets its own grid of the same width
    half_width = (x_hi - x_lo) / 2
    x_diff = np.linspace(mu_a - mu_b - half_width, mu_a - mu_b + half_width, 1000)

    # Degrees of freedom for the difference (Welch-Satterthwaite approximation)
    df_diff = (var_a/n_a + var_b/n_b)**2 / ((var_a/n_a)**2/(n_a-1) + (var_b/n_b)**2/(n_b-1))

    # Using t-distribution PDF for m_A and m_B
    pdf_a = t.pdf(x, df=n_a-1, loc=mu_a, scale=sigma_a/np.sqrt(n_a))
    pdf_b = t.pdf(x, df=n_b-1, loc=mu_b, scale=sigma_b/np.sqrt(n_b))

    # Using t-distribution for m_A - m_B, evaluated on its own grid
    pdf_diff = t.pdf(x_diff, df=df_diff, loc=mu_a - mu_b, scale=est_se_diff)

    fig, ax = plt.subplots(1, 3, figsize=(22, 5))

    # First plot
    ax[0].plot(x, pdf_a, color='blue', label='Sample Mean A')
    ax[0].plot(x, pdf_b, color='green', label='Sample Mean B')
    ax[0].axvline(mu_a, color='red', linestyle='dashed', linewidth=1)
    ax[0].axvline(mu_b, color='red', linestyle='dashed', linewidth=1)
    ax[0].set_title('Sampling Distributions of $m_A$ and $m_B$')
    ax[0].legend()
    ax[0].set_ylim(bottom=0)

    # Second plot: same axis width and height as the first plot, centred on mu_a - mu_b
    ax[1].plot(x_diff, pdf_diff, color='purple', label='$m_A - m_B$')
    ax[1].axvline(mu_a - mu_b, color='red', linestyle='dashed', linewidth=1)
    ax[1].set_title('Sampling Distribution of $m_A - m_B$')
    ax[1].legend()
    ax[1].set_ylim(ax[0].get_ylim())
    ax[1].set_xlim(x_diff[0], x_diff[-1])

    # Third plot: the same distribution, zoomed to +/- 4 standard errors
    ax[2].plot(x_diff, pdf_diff, color='purple', label='$m_A - m_B$ t-Distribution')
    ax[2].axvline(mu_a - mu_b, color='red', linestyle='dashed', linewidth=1)
    ax[2].set_title('t-Distribution of Difference in Sample Means')
    ax[2].set_xlim(mu_a - mu_b - 4*est_se_diff, mu_a - mu_b + 4*est_se_diff)
    ax[2].legend()

    plt.tight_layout()
    plt.show()

    # Creating a table with values (the spread of each sampling distribution is its standard error)
    headers = ['Distribution', 'Peak', 'Mean', 'Standard Error']
    table_data = [
        ["m_A", max(pdf_a), mu_a, sigma_a/np.sqrt(n_a)],
        ["m_B", max(pdf_b), mu_b, sigma_b/np.sqrt(n_b)],
        ["m_A - m_B", max(pdf_diff), mu_a - mu_b, est_se_diff]
    ]

    table = tabulate(table_data, headers=headers, tablefmt='grid', floatfmt=(".4f", ".4f", ".4f", ".4f"))
    print(table)

# Create sliders for ma and mb
mu_a_slider = widgets.FloatSlider(value=50, min=0, max=100, step=1, description='mu_A:')
mu_b_slider = widgets.FloatSlider(value=51, min=0, max=100, step=1, description='mu_B:')

widgets.interactive(plot_distributions, mu_a=mu_a_slider, mu_b=mu_b_slider)


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import ipywidgets as widgets
from IPython.display import display

var_a = 25
var_b = 30
n_a = 100
n_b = 120

def plot_distributions(mu_a, mu_b):
    # Parameters for the sampling distributions
    sigma_a = np.sqrt(var_a / n_a)
    sigma_b = np.sqrt(var_b / n_b)
    sigma_diff = np.sqrt(var_a/n_a + var_b/n_b)

    x = np.linspace(min(mu_a, mu_b) - 4*max(sigma_a, sigma_b), max(mu_a, mu_b) + 4*max(sigma_a, sigma_b), 1000)
    
    pdf_a = norm.pdf(x, mu_a, sigma_a)
    pdf_b = norm.pdf(x, mu_b, sigma_b)
    pdf_diff = norm.pdf(x, mu_a - mu_b, sigma_diff)

    fig, ax = plt.subplots(1, 2, figsize=(15, 5))
    
    ax[0].plot(x, pdf_a, color='blue', label='Sample Mean A')
    ax[0].plot(x, pdf_b, color='green', label='Sample Mean B')
    ax[0].axvline(mu_a, color='red', linestyle='dashed', linewidth=1)
    ax[0].axvline(mu_b, color='red', linestyle='dashed', linewidth=1)
    ax[0].set_title('Sampling Distributions of $m_A$ and $m_B$')
    ax[0].legend()
    ax[0].set_ylim(bottom=0)
    
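    # m_A - m_B is centred on mu_a - mu_b, which generally lies outside the range of x,
    # so evaluate it on a grid of the same width shifted to that centre
    x_diff = x - (x[0] + x[-1]) / 2 + (mu_a - mu_b)
    ax[1].plot(x_diff, norm.pdf(x_diff, mu_a - mu_b, sigma_diff), color='purple', label='$m_A - m_B$')
    ax[1].axvline(mu_a - mu_b, color='red', linestyle='dashed', linewidth=1)
    ax[1].set_title('Sampling Distribution of $m_A - m_B$')
    ax[1].legend()
    ax[1].set_ylim(ax[0].get_ylim())  # same y-limits as the first plot, so heights are comparable
    ax[1].set_xlim(x_diff[0], x_diff[-1])  # same axis width as the first plot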
    
    plt.tight_layout()
    plt.show()

# Create sliders for ma and mb
mu_a_slider = widgets.FloatSlider(value=50, min=0, max=100, step=1, description='mu_A:')
mu_b_slider = widgets.FloatSlider(value=51, min=0, max=100, step=1, description='mu_B:')

widgets.interactive(plot_distributions, mu_a=mu_a_slider, mu_b=mu_b_slider)

In [None]:
# Generate data for plotting
x = np.linspace(-8, 8, 1000) # adjusted range to accommodate potential mean shifts
y = norm.pdf(x, 0, 1)

# Create ma and mb variables
ma = 1     # Mean for M_A, adjust as necessary
mb = 4.5   # Mean for M_B, adjust as necessary

fig, axes = plt.subplots(1, 1, figsize=(16, 6))

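# Adjust plots to use ma and mb (curve colors match the dashed mean lines below)
axes.plot(x + ma, y, label=r'$M_A$', color='blue')
axes.plot(x + mb, y, label=r'$M_B$', color='red')

axes.set_title('Sampling Distributions of the Means (SDMs)')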
axes.set_ylabel('Probability Density')
axes.set_xlabel('Value')

# Vertical lines for the means of each curve
y_peak = max(y)
axes.plot([ma, ma], [0, y_peak], color='blue', linestyle='--')
axes.plot([mb, mb], [0, y_peak], color='red', linestyle='--')

plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
axes.grid(True)
axes.set_ylim(0)
plt.show()


In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Generate data for plotting
x = np.linspace(-8, 8, 1000)  # adjusted range to accommodate potential mean shifts
y = norm.pdf(x, 0, 1)

# Create ma and mb variables
ma = 1     # Mean for M_A, adjust as necessary
mb = -1   # Mean for M_B, adjust as necessary

fig, axes = plt.subplots(1, 1, figsize=(16, 6))

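# Plot the distribution of the difference, centred on ma - mb.
# Assuming M_A and M_B are independent, each with standard error 1 (as in the previous plot),
# their variances add, so the difference has standard error sqrt(1 + 1) = sqrt(2).
difference = ma - mb
y_diff = norm.pdf(x, 0, np.sqrt(2))
axes.plot(x + difference, y_diff, label=r'$M_A - M_B$', color='green')

axes.set_title('SDM of the Difference ($M_A - M_B$)')
axes.set_ylabel('Probability Density')
axes.set_xlabel('Value')

# Vertical line for the mean of the difference curve
y_peak = max(y_diff)
axes.plot([difference, difference], [0, y_peak], color='green', linestyle='--')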

plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))
axes.grid(True)
axes.set_ylim(0)
plt.show()


In [None]:
# Generate data for plotting
ma_values = np.linspace(-8, 8, 100)
mb_values = np.linspace(-8, 8, 100)
MA, MB = np.meshgrid(ma_values, mb_values)
MA_minus_MB = MA - MB
PD = norm.pdf(MA_minus_MB, 0, 2)

# 2D Projections (Intersections)
fig, axarr = plt.subplots(2, 4, figsize=(24, 10))

# With the default meshgrid indexing, PD[i, j] corresponds to MB = mb_values[i] and MA = ma_values[j];
# the middle index is the grid point closest to 0.

# Intersection with MA=0 plane (fix the MA column, vary MB)
axarr[0][0].plot(mb_values, PD[:, len(ma_values)//2], color='blue')
axarr[0][0].set_title('Intersection with MA=0 plane')
axarr[0][0].set_xlabel('MB')
axarr[0][0].set_ylabel('Probability Density')

# Intersection with MB=0 plane (fix the MB row, vary MA)
axarr[0][1].plot(ma_values, PD[len(mb_values)//2, :], color='green')
axarr[0][1].set_title('Intersection with MB=0 plane')
axarr[0][1].set_xlabel('MA')
axarr[0][1].set_ylabel('Probability Density')

# Intersection with MA=-MB plane
diagonal_intersection_minus = [PD[i, len(mb_values) - 1 - i] for i in range(len(ma_values))]
axarr[0][2].plot(ma_values, diagonal_intersection_minus, color='red')
axarr[0][2].set_title('Intersection with MA=-MB plane')
axarr[0][2].set_xlabel('MA or -MB')
axarr[0][2].set_ylabel('Probability Density')

# Intersection with MA=MB plane
diagonal_intersection_equal = [PD[i, i] for i in range(len(ma_values))]
axarr[0][3].plot(ma_values, diagonal_intersection_equal, color='purple')
axarr[0][3].set_title('Intersection with MA=MB plane')
axarr[0][3].set_xlabel('MA or MB')
axarr[0][3].set_ylabel('Probability Density')

# $ma-mb$ vs Probability Density on each plane
# MA=0 plane (a column of the grid: MA fixed, MB varies)
axarr[1][0].scatter(MA_minus_MB[:, len(ma_values)//2], PD[:, len(ma_values)//2], color='blue')
axarr[1][0].set_title('ma-mb vs PD (MA=0 plane)')
axarr[1][0].set_xlabel('ma-mb')
axarr[1][0].set_ylabel('Probability Density')

# MB=0 plane (a row of the grid: MB fixed, MA varies)
axarr[1][1].scatter(MA_minus_MB[len(mb_values)//2, :], PD[len(mb_values)//2, :], color='green')
axarr[1][1].set_title('ma-mb vs PD (MB=0 plane)')
axarr[1][1].set_xlabel('ma-mb')
axarr[1][1].set_ylabel('Probability Density')

# MA=-MB plane
axarr[1][2].scatter(ma_values - (mb_values[::-1]), diagonal_intersection_minus, color='red')
axarr[1][2].set_title('ma-mb vs PD (MA=-MB plane)')
axarr[1][2].set_xlabel('ma-mb')
axarr[1][2].set_ylabel('Probability Density')

# MA=MB plane: ma-mb is 0 everywhere on this plane, so every point sits at x = 0
axarr[1][3].scatter(np.zeros_like(ma_values), diagonal_intersection_equal, color='purple')
axarr[1][3].set_title('ma-mb vs PD (MA=MB plane)')
axarr[1][3].set_xlabel('ma-mb')
axarr[1][3].set_ylabel('Probability Density')

plt.tight_layout()
plt.show()

# Create subplots with two columns for 3D plots
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=('3D Surface Plot with Multiple Planes', '3D Surface Plot with ma-mb on the Z-axis'),
                    specs=[[{'type': 'surface'}, {'type': 'surface'}]],
                    horizontal_spacing=0.3)  # Increased spacing

# Add first surface plot to the left column
fig.add_trace(
    go.Surface(z=PD, x=MA, y=MB, surfacecolor=MA_minus_MB, colorscale='Viridis', colorbar=dict(title="ma-mb", x=0.42, xanchor="center")),
    row=1, col=1
)

# Planes for the first plot
# Plane for ma=-mb
plane_ma = np.linspace(-8, 8, 100)
plane_mb = -plane_ma
plane_MA, plane_Z = np.meshgrid(plane_ma, np.linspace(0, PD.max(), 100))
plane_MB = -plane_MA
plane1 = go.Surface(z=plane_Z, x=plane_MA, y=plane_MB, colorscale=[[0, 'rgba(255,0,0,0.7)'], [1, 'rgba(255,0,0,0.7)']], showscale=False)
fig.add_trace(plane1, row=1, col=1)

# Plane for ma=mb
plane_ma2 = np.linspace(-8, 8, 100)
plane_mb2 = plane_ma2.copy()
plane_MA2, plane_Z2 = np.meshgrid(plane_ma2, np.linspace(0, PD.max(), 100))
plane_MB2 = plane_MA2
plane2 = go.Surface(z=plane_Z2, x=plane_MA2, y=plane_MB2, colorscale=[[0, 'rgba(128,0,128,0.7)'], [1, 'rgba(128,0,128,0.7)']], showscale=False)
fig.add_trace(plane2, row=1, col=1)

# Plane for MA=0
plane_Z_MA, plane_MB_MA = np.meshgrid(np.linspace(0, PD.max(), 100), mb_values)
plane_MA_0 = np.zeros_like(plane_Z_MA)
plane3 = go.Surface(z=plane_Z_MA, x=plane_MA_0, y=plane_MB_MA, colorscale=[[0, 'rgba(0,0,255,0.7)'], [1, 'rgba(0,0,255,0.7)']], showscale=False)
fig.add_trace(plane3, row=1, col=1)

# Plane for MB=0
plane_MA_MB, plane_Z_MB = np.meshgrid(ma_values, np.linspace(0, PD.max(), 100))
plane_MB_0 = np.zeros_like(plane_Z_MB)
plane4 = go.Surface(z=plane_Z_MB, x=plane_MA_MB, y=plane_MB_0, colorscale=[[0, 'rgba(0,255,0,0.7)'], [1, 'rgba(0,255,0,0.7)']], showscale=False)
fig.add_trace(plane4, row=1, col=1)

# Add second surface plot to the right column
fig.add_trace(
    go.Surface(z=MA_minus_MB, x=MA, y=MB, surfacecolor=PD, colorscale='Plasma', colorbar=dict(title="Probability Density")),
    row=1, col=2
)

# Planes for the second plot
# Plane for MA=-MB
plane_ma = np.linspace(-8, 8, 100)
plane_mb = -plane_ma
plane_MA, plane_Z = np.meshgrid(plane_ma, np.linspace(MA_minus_MB.min(), MA_minus_MB.max(), 100))
plane_MB = -plane_MA
plane5 = go.Surface(z=plane_Z, x=plane_MA, y=plane_MB, colorscale=[[0, 'rgba(255,0,0,0.7)'], [1, 'rgba(255,0,0,0.7)']], showscale=False)
fig.add_trace(plane5, row=1, col=2)

# Plane for MA=MB
plane_ma2 = np.linspace(-8, 8, 100)
plane_mb2 = plane_ma2.copy()
plane_MA2, plane_Z2 = np.meshgrid(plane_ma2, np.linspace(MA_minus_MB.min(), MA_minus_MB.max(), 100))
plane_MB2 = plane_MA2
plane6 = go.Surface(z=plane_Z2, x=plane_MA2, y=plane_MB2, colorscale=[[0, 'rgba(128,0,128,0.7)'], [1, 'rgba(128,0,128,0.7)']], showscale=False)
fig.add_trace(plane6, row=1, col=2)

# Plane for MA=0
plane_Z_MA, plane_MB_MA = np.meshgrid(np.linspace(MA_minus_MB.min(), MA_minus_MB.max(), 100), mb_values)
plane_MA_0 = np.zeros_like(plane_Z_MA)
plane7 = go.Surface(z=plane_Z_MA, x=plane_MA_0, y=plane_MB_MA, colorscale=[[0, 'rgba(0,0,255,0.7)'], [1, 'rgba(0,0,255,0.7)']], showscale=False)
fig.add_trace(plane7, row=1, col=2)

# Plane for MB=0
plane_MA_MB, plane_Z_MB = np.meshgrid(ma_values, np.linspace(MA_minus_MB.min(), MA_minus_MB.max(), 100))
plane_MB_0 = np.zeros_like(plane_Z_MB)
plane8 = go.Surface(z=plane_Z_MB, x=plane_MA_MB, y=plane_MB_0, colorscale=[[0, 'rgba(0,255,0,0.7)'], [1, 'rgba(0,255,0,0.7)']], showscale=False)
fig.add_trace(plane8, row=1, col=2)

# Update layout for the plots
fig.update_layout(scene=dict(
                    xaxis_title='MA',
                    yaxis_title='MB',
                    zaxis_title='Probability Density'
                    ),
                  scene2=dict(
                    xaxis_title='MA',
                    yaxis_title='MB',
                    zaxis_title='ma-mb'
                    ),
                  width=1200,  # Adjust the width as needed
                  height=600   # Adjust the height as needed
                 )

# Show the combined plot
fig.show()

In [None]:
import numpy as np
import plotly.graph_objects as go
import matplotlib.pyplot as plt

# PD, ma_values, mb_values and MA_minus_MB (and make_subplots) are reused from the earlier cells,
# so run those cells first.

# Create subplots with two columns for 3D plots
fig = make_subplots(rows=1, cols=2,
                    subplot_titles=('3D Surface Plot with Plane', '3D Surface Plot with ma-mb and Plane'),
                    specs=[[{'type': 'surface'}, {'type': 'surface'}]],
                    horizontal_spacing=0.3)

# Angle in radians to determine the plane's orientation
theta = 0  # When theta is 0, the plane should be along MA = 0

# Generate the plane's trace in the (MA, MB) plane using polar coordinates
distance = np.linspace(-10, 10, 200)  # Ensuring a larger range than the surface
line_MA = distance * np.sin(theta)
line_MB = distance * np.cos(theta)

# Extrude the 1D line vertically to build the plane for the first plot
z_pd = np.linspace(0, PD.max(), 100)
plane_MA, plane_Z = np.meshgrid(line_MA, z_pd)
plane_MB, _ = np.meshgrid(line_MB, z_pd)

plane1 = go.Surface(z=plane_Z, x=plane_MA, y=plane_MB, colorscale=[[0, 'rgba(255,0,0,0.7)'], [1, 'rgba(255,0,0,0.7)']], showscale=False)
fig.add_trace(plane1, row=1, col=1)

# Rebuild the plane from the 1D line with a z-range that fits MA_minus_MB for the second plot
# (reusing the 2D plane_MA here would make meshgrid flatten it into a much larger grid)
z_diff = np.linspace(MA_minus_MB.min(), MA_minus_MB.max(), 100)
plane_MA, plane_Z = np.meshgrid(line_MA, z_diff)
plane_MB, _ = np.meshgrid(line_MB, z_diff)

plane2 = go.Surface(z=plane_Z, x=plane_MA, y=plane_MB, colorscale=[[0, 'rgba(255,0,0,0.7)'], [1, 'rgba(255,0,0,0.7)']], showscale=False)
fig.add_trace(plane2, row=1, col=2)

# Add the 3D surface plots
fig.add_trace(
    go.Surface(z=PD, x=ma_values, y=mb_values, surfacecolor=MA_minus_MB, colorscale='Viridis', colorbar=dict(title="ma-mb", x=0.42, xanchor="center")),
    row=1, col=1
)
fig.add_trace(
    go.Surface(z=MA_minus_MB, x=ma_values, y=mb_values, surfacecolor=PD, colorscale='Plasma', colorbar=dict(title="Probability Density")),
    row=1, col=2
)

# Update layout for the 3D plots
fig.update_layout(scene=dict(
                    xaxis_title='MA',
                    yaxis_title='MB',
                    zaxis_title='Probability Density'
                    ),
                  scene2=dict(
                    xaxis_title='MA',
                    yaxis_title='MB',
                    zaxis_title='ma-mb'
                    ),
                  width=1200,
                  height=600)

# Show the combined 3D plot
fig.show()

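# Plotting the intersection on a 2D plot using matplotlib.
# Along the plane, (MA, MB) = (d*sin(theta), d*cos(theta)), and the surface height there is the
# density of MA - MB, using the same Normal(0, 2) that defines PD in the earlier cell.
line_MA = distance * np.sin(theta)
line_MB = distance * np.cos(theta)
intersection_curve_z = norm.pdf(line_MA - line_MB, 0, 2)

plt.figure(figsize=(10,6))
plt.plot(distance, intersection_curve_z, 'k-', label='Intersection Curve')
plt.xlabel('Position along the plane (equals MB when theta = 0)')
plt.ylabel('Probability Density')
plt.title('Intersection of Plane with Probability Density')
plt.legend()
plt.grid(True)
plt.show()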


## Example of a two-sample t-test

Suppose a teacher taught two classes. She expected both her classes to perform similarly on a test. She collects the results from both classes and wants to determine if the actual mean scores of the two classes are statistically different.
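
For reference, the hypotheses here are $ H_0: \mu_1 = \mu_2 $ against $ H_1: \mu_1 \neq \mu_2 $, and the pooled-variance test statistic is

$$ t = \frac{m_1 - m_2}{\sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$

with $ n_1 + n_2 - 2 $ degrees of freedom. The symbols $ m_1, m_2 $ (sample means), $ s_1^2, s_2^2 $ (sample variances) and $ n_1, n_2 $ (sample sizes) are notation introduced for this illustration. The sketch below computes the statistic by hand on synthetic, hypothetical scores and checks it against `scipy.stats.ttest_ind`, which uses the pooled version by default.

```python
import numpy as np
from scipy.stats import t, ttest_ind

rng = np.random.default_rng(0)

# Hypothetical scores for two classes (synthetic data, for illustration only)
a = rng.normal(55, 10, 25)
b = rng.normal(54, 10, 25)

n1, n2 = len(a), len(b)
s1_sq, s2_sq = a.var(ddof=1), b.var(ddof=1)   # unbiased sample variances

# Pooled variance and standard error of the difference in means
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
se_diff = np.sqrt(sp_sq * (1 / n1 + 1 / n2))

# t-statistic, degrees of freedom, and two-sided p-value
t_manual = (a.mean() - b.mean()) / se_diff
dof = n1 + n2 - 2
p_manual = 2 * t.sf(abs(t_manual), dof)

# SciPy's pooled-variance test should give the same numbers
t_scipy, p_scipy = ttest_ind(a, b)

print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_scipy:.4f}, p={p_scipy:.4f}")
```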

In [None]:
class1_mean = 55
class2_mean = 54  # Let's assume class2 has a slightly different mean for demonstration
sample_size = 25
sample_std = 10

alpha = 0.05
test_type = 'two-tails' # 'two-tails', 'one-tail'

np.random.seed(42)  # For reproducibility

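# Generate random integer scores for both classes, drawn uniformly from [mean - sample_std, mean + sample_std]
# (the scores are uniform rather than normal, so their actual standard deviation is about 6.1, not sample_std)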
scores_class1 = np.random.randint(class1_mean - sample_std, class1_mean + sample_std + 1, sample_size)
scores_class2 = np.random.randint(class2_mean - sample_std, class2_mean + sample_std + 1, sample_size)

# Display scores of students for both classes
plt.figure(figsize=(12, 6))
x = np.arange(len(scores_class1))
plt.scatter(x, scores_class1, color='blue', label='Class 1 Scores', marker='o')
plt.scatter(x, scores_class2, color='green', label='Class 2 Scores', marker='x')
plt.title('Test Scores of Students in Class 1 and Class 2')
plt.xlabel('Student Index')
plt.ylabel('Score')
plt.legend()
plt.grid(True)
plt.show()

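# Two sample t-test (Student's pooled-variance version, the default of ttest_ind).
# The `alternative` argument requires SciPy >= 1.6. The one-tailed direction
# (class 1 mean > class 2 mean) is an assumption made for this example.
alternative = 'two-sided' if test_type == 'two-tails' else 'greater'
t_stat, p_value = ttest_ind(scores_class1, scores_class2, alternative=alternative)

print(f"T-statistic: {t_stat:.3f}")
print(f"P-value: {p_value:.3f}")

# The p-value already corresponds to the chosen alternative, so it is compared with alpha directly
if p_value < alpha:
    print(f"Based on the p-value, we reject the null hypothesis. The means of the two classes are statistically different at alpha={alpha}.")
else:
    print(f"Based on the p-value, we fail to reject the null hypothesis. There is not enough evidence that the means of the two classes differ at alpha={alpha}.")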
