# Statistical Tests Using the Iris Dataset

### 1. T-Test:
The t-test is used to compare the means of two groups. In this example, we'll compare the petal lengths of two species in the Iris dataset.

In [4]:
import scipy.stats as stats
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
group_A = iris.data[iris.target == 0, 2]  # Petal length of setosa
group_B = iris.data[iris.target == 1, 2]  # Petal length of versicolor

# Perform a two-sample t-test
t_stat, p_value = stats.ttest_ind(group_A, group_B)

if p_value < 0.05:
    print("The means of the two species' petal lengths are significantly different.")
else:
    print("No significant difference in petal lengths.")


The means of the two species' petal lengths are significantly different.


#### In this code:

* We load the Iris dataset and extract the petal lengths of the Setosa (group_A) and Versicolor (group_B) species.
* We perform a two-sample t-test using stats.ttest_ind.
* We check the p-value, and if it's less than 0.05 (a common significance level), we conclude that the means of the two species' petal lengths are significantly different.
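
By default, `stats.ttest_ind` assumes both groups share the same variance. As a hedged sketch of one alternative, the block below repeats the comparison with Welch's t-test (`equal_var=False`), which drops that assumption. The variable names (`setosa`, `versicolor`, `t_welch`, `p_welch`) are illustrative and not part of the original cell.

```python
import scipy.stats as stats
from sklearn.datasets import load_iris

iris = load_iris()
setosa = iris.data[iris.target == 0, 2]      # Petal length of Setosa
versicolor = iris.data[iris.target == 1, 2]  # Petal length of Versicolor

# Sample variances (ddof=1) show how different the spread of the two groups is
var_setosa = setosa.var(ddof=1)
var_versicolor = versicolor.var(ddof=1)

# Welch's t-test: does not assume equal variances
t_welch, p_welch = stats.ttest_ind(setosa, versicolor, equal_var=False)

print(f"variances: {var_setosa:.3f} vs {var_versicolor:.3f}")
print(f"Welch t = {t_welch:.2f}, p = {p_welch:.3g}")
```

When the group variances differ noticeably, Welch's version is usually the safer choice; when they are similar, both versions give nearly the same result.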

### 2. F-Test:
The F-test is used to compare variances of two or more groups. In this example, we'll compare the variances of petal lengths between three species in the Iris dataset. 

In [5]:
import scipy.stats as stats
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
group_A = iris.data[iris.target == 0, 2]  # Petal length of Setosa
group_B = iris.data[iris.target == 1, 2]  # Petal length of Versicolor
group_C = iris.data[iris.target == 2, 2]  # Petal length of Virginica

# Perform Levene's test for equality of variances
# (stats.f_oneway would compare the group means, not the variances)
f_stat, p_value = stats.levene(group_A, group_B, group_C)

if p_value < 0.05:
    print("At least one species has a significantly different variance in petal lengths.")
else:
    print("No significant difference in variances of petal lengths.")


At least one species has a significantly different variance in petal lengths.


#### In this code:

* We load the Iris dataset and extract the petal lengths of the Setosa (group_A), Versicolor (group_B), and Virginica (group_C) species.
* We test for equal variances using stats.levene, whose statistic is compared against an F distribution. stats.f_oneway is a one-way ANOVA and compares group means, so it does not answer a question about variances.
* We check the p-value, and if it's less than 0.05, we conclude that at least one species has a significantly different variance in petal lengths.
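
For just two groups, the classical variance-ratio F-test can be computed by hand. The block below is a minimal sketch under the assumption that both samples are roughly normal, since this test is sensitive to departures from normality. The names `f_ratio`, `df_num`, `df_den`, and `p_two_sided` are illustrative.

```python
import scipy.stats as stats
from sklearn.datasets import load_iris

iris = load_iris()
setosa = iris.data[iris.target == 0, 2]      # Petal length of Setosa
versicolor = iris.data[iris.target == 1, 2]  # Petal length of Versicolor

# Ratio of the two sample variances (ddof=1)
f_ratio = versicolor.var(ddof=1) / setosa.var(ddof=1)

# Degrees of freedom for the numerator and the denominator
df_num = len(versicolor) - 1
df_den = len(setosa) - 1

# Two-sided p-value from the F distribution
p_upper = stats.f.sf(f_ratio, df_num, df_den)
p_two_sided = 2 * min(p_upper, 1 - p_upper)

print(f"F = {f_ratio:.2f}, p = {p_two_sided:.3g}")
```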

### 3. Z-Test:
The z-test is used to compare a sample mean to a known population mean. In this example, we'll compare the mean petal length of the entire Iris dataset to a hypothetical population mean.

In [20]:
import scipy.stats as stats
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
sample_mean = iris.data[:, 2].mean()  # Mean petal length of the entire dataset
population_mean = 3.75  # Hypothetical population mean

# Calculate the sample standard deviation (ddof=1 uses n - 1 in the denominator)
sample_stddev = iris.data[:, 2].std(ddof=1)

# Sample size
sample_size = len(iris.data)

# Perform a one-sample z-test
z_score = (sample_mean - population_mean) / (sample_stddev / (sample_size**0.5))
p_value = 2 * (1 - stats.norm.cdf(abs(z_score)))

if p_value < 0.05:
    print("Sample mean is significantly different from the population mean.")
else:
    print("No significant difference from the population mean.")


No significant difference from the population mean.


#### In this code:

* We load the Iris dataset and calculate the mean petal length of the entire dataset as sample_mean.
* We define a hypothetical population mean (population_mean) for comparison.
* We calculate the sample standard deviation (sample_stddev) and sample size (sample_size).
* We perform a one-sample z-test, calculate the z-score, and determine the p-value.
* If the p-value is less than 0.05, we conclude that the sample mean is significantly different from the population mean.
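
The same steps can be wrapped in a small reusable function. This is a sketch using only the Python standard library; the function name `one_sample_z_test` and the sample values are hypothetical. The normal approximation is only reasonable for large samples, so the short list here just shows the mechanics.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def one_sample_z_test(sample, population_mean):
    """Return (z, two-sided p-value) using the sample standard deviation."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses n - 1 in the denominator
    z = (mean(sample) - population_mean) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical measurements, for illustration only
sample = [4.7, 5.1, 4.9, 5.3, 5.0, 4.8, 5.2, 5.0]
z, p = one_sample_z_test(sample, population_mean=5.0)
print(f"z = {z:.3f}, p = {p:.3f}")
```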

### 4. ANOVA:
ANOVA is used to compare means of three or more groups. In this example, we'll compare petal lengths among all three species in the Iris dataset.

In [21]:
import scipy.stats as stats
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
group_A = iris.data[iris.target == 0, 2]  # Petal length of Setosa
group_B = iris.data[iris.target == 1, 2]  # Petal length of Versicolor
group_C = iris.data[iris.target == 2, 2]  # Petal length of Virginica

# Perform one-way ANOVA
f_stat, p_value = stats.f_oneway(group_A, group_B, group_C)

if p_value < 0.05:
    print("At least one species has a significantly different mean petal length.")
else:
    print("No significant difference in petal lengths among the species.")


At least one species has a significantly different mean petal length.


#### In this code:

* We load the Iris dataset and extract the petal lengths of the Setosa (group_A), Versicolor (group_B), and Virginica (group_C) species.
* We perform a one-way ANOVA using stats.f_oneway.
* We check the p-value, and if it's less than 0.05, we conclude that at least one species has a significantly different mean petal length.
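
To see what `stats.f_oneway` computes, the F-statistic can be rebuilt from the between-group and within-group sums of squares. The block below is a sketch for illustration; the names `ss_between`, `ss_within`, `f_manual`, and `p_manual` are not from the original cell.

```python
import numpy as np
import scipy.stats as stats
from sklearn.datasets import load_iris

iris = load_iris()
# Petal lengths, one array per species
groups = [iris.data[iris.target == label, 2] for label in range(3)]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k = len(groups)            # Number of groups
n_total = len(all_values)  # Total number of observations

# Between-group sum of squares: spread of group means around the grand mean
ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread of observations around their own group mean
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual F = {f_manual:.2f}, scipy F = {f_scipy:.2f}")
```

A large F means the group means are spread out far more than the variation inside each group would suggest.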

### 5. Chi-Square Test:
The chi-square test is used to test for associations between categorical variables. In this example, we'll test for an association between species and sepal width in the Iris dataset, with sepal width split into two categories at 3.0 cm.

In [22]:
import scipy.stats as stats
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()
is_wide = iris.data[:, 1] > 3.0  # Sepal width above 3.0 cm

# Contingency table: one row per species, columns are the counts of
# sepal width > 3.0 and sepal width <= 3.0 (non-overlapping categories)
observed = [[int((is_wide & (iris.target == k)).sum()),
             int((~is_wide & (iris.target == k)).sum())]
            for k in range(3)]

# Perform a chi-square test for independence
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

if p_value < 0.05:
    print("There is a significant association between species and sepal width.")
else:
    print("No significant association between species and sepal width.")


There is a significant association between species and sepal width.


#### In this code:

* We load the Iris dataset and create a contingency table (observed) with one row per species and two columns: the number of flowers with sepal width greater than 3.0 and the number with sepal width of 3.0 or less. Each flower is counted in exactly one cell.
* We perform a chi-square test for independence using stats.chi2_contingency.
* We check the p-value, and if it's less than 0.05, we conclude that there is a significant association between species and sepal width.
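
A contingency table can also be built with `pandas.crosstab`, which counts each flower in exactly one cell and labels the rows and columns. This is a sketch of an alternative, not the method used in the cell above; the names `species`, `width_class`, and `table` are illustrative.

```python
import pandas as pd
import scipy.stats as stats
from sklearn.datasets import load_iris

iris = load_iris()
species = pd.Series(iris.target_names[iris.target], name="species")
width_class = pd.Series(
    ["> 3.0" if w > 3.0 else "<= 3.0" for w in iris.data[:, 1]],
    name="sepal width (cm)",
)

# Cross-tabulate species against the two sepal width categories
table = pd.crosstab(species, width_class)
print(table)

chi2, p_value, dof, expected = stats.chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p_value:.3g}")
```

The `expected` array holds the counts that would be expected under independence, which is useful for checking that no cell is too sparse for the test.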

## ANOVA 


In [1]:
import pandas as pd
from scipy.stats import f_oneway
from sklearn.datasets import load_iris

# Load the Iris dataset
iris = load_iris()

# Convert the dataset into a Pandas DataFrame
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Group the data by species and extract the sepal length column
group1 = df[df['species'] == 0]['sepal length (cm)']
group2 = df[df['species'] == 1]['sepal length (cm)']
group3 = df[df['species'] == 2]['sepal length (cm)']

# Perform ANOVA to test for significant differences between group means
f, p = f_oneway(group1, group2, group3)

if p < 0.05:
    print('Reject null hypothesis: at least one group mean is different')
else:
    print('Fail to reject null hypothesis: no evidence that the group means differ')

Reject null hypothesis: at least one group mean is different


In this example, we first load the Iris dataset using the load_iris() function from scikit-learn. We then convert the dataset into a Pandas DataFrame and extract the sepal length (cm) column for each of the three species of iris.

Next, we perform ANOVA on the three groups of data using the f_oneway() function from scipy.stats. This function returns the F-statistic and p-value for the test. If the p-value is less than 0.05, we reject the null hypothesis and conclude that there is evidence of a difference between at least one pair of group means. If the p-value is greater than or equal to 0.05, we fail to reject the null hypothesis and conclude that there is not enough evidence to support a difference between the group means.
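
As a hedged variation on the same analysis, the groups can be collected with `groupby` instead of one filter per species, which also makes it easy to print per-group summary statistics. The names `summary` and `samples` are illustrative.

```python
import pandas as pd
from scipy.stats import f_oneway
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df['species'] = iris.target

# Per-species mean and standard deviation of sepal length
summary = df.groupby('species')['sepal length (cm)'].agg(['mean', 'std'])
print(summary)

# Collect one array per species and unpack them into f_oneway
samples = [g.to_numpy() for _, g in df.groupby('species')['sepal length (cm)']]
f, p = f_oneway(*samples)
print(f"F = {f:.2f}, p = {p:.3g}")
```

This form scales to any number of groups without adding a new variable for each one.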

| Test | Purpose | Example |
|---|---|---|
| F-test | Compares variances of two or more groups | Compare variances of petal lengths between species |
| z-test | Compares a sample mean to a population mean | Compare sample mean of petal length to the population mean |
| ANOVA | Compares means of three or more groups | Compare petal lengths between all three species |
| Chi-square test | Tests for associations between categorical variables | Test for association between species and sepal width |

The code examples above use the Iris dataset to demonstrate how to perform each test. They are simplified for illustration. In real-world analyses you would typically work with larger datasets and need to check each test's assumptions against your specific research design. The comments in the code explain the key steps in each analysis.



## A/B testing:



A/B testing, also known as split testing, is typically used in situations where you want to compare two or more versions of a webpage, application, or content to determine which one performs better in achieving a specific goal. A/B testing is commonly used in the following scenarios:

1. Website Optimization: A/B testing is widely used in web optimization to compare different versions of a webpage. It can be used to test variations of elements like headlines, images, call-to-action buttons, layout, and more to determine which combination results in higher conversion rates, such as sign-ups, purchases, or click-throughs.

2. Email Marketing: A/B testing is employed to test different subject lines, email copy, and images in email marketing campaigns. The goal is to improve open rates, click-through rates, and overall engagement.

3. Advertising: Marketers use A/B testing to compare the effectiveness of different ad creatives, ad copy, targeting options, and bidding strategies in online advertising campaigns.

4. Mobile App Optimization: Mobile app developers and marketers use A/B testing to improve user engagement, retention, and in-app conversions by testing variations of app features, designs, and user interfaces.

5. Product Testing: A/B testing can be used to test changes or features in a product, such as software applications or hardware devices, to determine which version is more preferred or effective.

6. Content Testing: Content publishers often perform A/B tests on articles, blog posts, or other content to see which headlines, content formats, or layouts lead to higher engagement, longer time spent on the page, or more social shares.

7. E-commerce: E-commerce platforms use A/B testing to optimize product listings, product descriptions, pricing, and the checkout process to maximize conversions and revenue.

In A/B testing, you need to have clear and measurable goals (e.g., increased click-through rate, higher conversion rate, more revenue) to assess the performance of different variations. It is crucial to have a well-defined user base, randomly assign users to different groups, and use statistical analysis to determine if the differences in performance are statistically significant.
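
Random assignment can be sketched in a few lines. The function name `assign_groups`, the seed value, and the user IDs below are all hypothetical, chosen only to make the example reproducible.

```python
import random

def assign_groups(user_ids, seed=42):
    """Randomly split users into two groups, A and B, of (nearly) equal size."""
    rng = random.Random(seed)  # Fixed seed (an arbitrary choice) makes the split reproducible
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}

# Hypothetical user IDs, for illustration only
groups = assign_groups(range(1, 21))
print(len(groups["A"]), len(groups["B"]))
```

Shuffling before splitting means that group membership does not depend on sign-up order or any other property of the user.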

A/B testing is not a natural fit for datasets like Iris, which record existing measurements rather than the outcome of randomly assigning users to variations. It applies when you change something and want to measure the impact of that change on user behavior or outcomes.

In [2]:
import random

# Define two variations or groups (A and B).
# Each entry is one user's click probability (toy values of 0 or 1).
variation_A = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]  # Control group
variation_B = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # Treatment group with a change

# Simulate user interactions (e.g., clicks) for each group
def simulate_user_interactions(variation):
    # A user clicks when a uniform draw in [0, 1) falls below their click probability
    return sum(1 for click_prob in variation if random.random() < click_prob)

# Run the A/B test and collect data
clicks_A = simulate_user_interactions(variation_A)
clicks_B = simulate_user_interactions(variation_B)

# Calculate the click-through rates (CTR)
CTR_A = clicks_A / len(variation_A)
CTR_B = clicks_B / len(variation_B)

# Compare CTRs. A higher CTR alone does not establish statistical significance;
# a follow-up statistical test is needed to confirm it.
if CTR_B > CTR_A:
    print("Variation B is performing better.")
elif CTR_A > CTR_B:
    print("Variation A is performing better.")
else:
    print("Both variations have the same CTR.")


Variation A is performing better.


### In this code:

1. We define two variations: variation_A and variation_B, representing the control group and the treatment group, respectively.

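2. We simulate user interactions (e.g., clicks) for each group by treating each list entry as that user's click probability and comparing it with a random draw. With entries of only 0 and 1, the result is the same on every run.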

3. We calculate the click-through rates (CTR) for each group, which is the proportion of users who clicked something in each group.

4. We compare the CTRs and determine which variation is performing better.

In a real-world scenario, you would typically have more complex data, larger sample sizes, and use statistical tests like t-tests or chi-squared tests to determine if the difference in performance is statistically significant. The code provided is a simplified example for illustration purposes.
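
As one example of such a test, the block below sketches a two-proportion z-test using only the Python standard library. The function name `two_proportion_z_test` and the click counts are hypothetical, and the normal approximation assumes reasonably large groups.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)  # Pooled rate under the null hypothesis
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / standard_error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# Hypothetical counts: 120 of 1000 users clicked in A, 160 of 1000 in B
z, p = two_proportion_z_test(120, 1000, 160, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A p-value below the chosen significance level (0.05 in the earlier examples) would suggest that the difference in click-through rates is unlikely to be due to chance alone.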