### 13 March Assignment

### Statistics Advance-6

### Q1. Explain the assumptions required to use ANOVA and provide examples of violations that could impact the validity of the results.

Analysis of Variance (ANOVA) is a statistical method used to compare means across two or more groups to determine if there are significant differences among them. ANOVA makes several assumptions about the data in order to provide valid results. Violations of these assumptions can impact the reliability and validity of ANOVA results. The key assumptions of ANOVA are:

1. **Independence**: Observations are independent of one another, both within and across groups. The value of one observation should not influence, or be related to, the value of any other.

2. **Normality**: The distribution of the dependent variable (outcome) within each group is approximately normal. Deviations from normality can affect the Type I error rate (false positive rate) of the test.

3. **Homogeneity of Variance**: The variance of the dependent variable is approximately equal across all groups. Unequal variances can lead to inflated Type I error rates and impact the validity of ANOVA results.

Examples of Violations and Impact on Validity:

1. **Non-Independence**: If observations within groups are not independent, it can lead to pseudoreplication and inflated Type I error rates. For instance, repeated measurements on the same subjects or subjects who are related (e.g., family members) violate the independence assumption.

2. **Non-Normality**: If the distribution of the dependent variable is significantly non-normal in one or more groups, it can impact the reliability of ANOVA results. This can lead to increased Type I or Type II errors. For example, if the data is highly skewed or has heavy tails, the assumption of normality may be violated.

3. **Heterogeneity of Variance**: Violations of homogeneity of variance can lead to incorrect conclusions about group differences. If one group has a much larger variance than the others, the F-statistic is distorted and the Type I error rate can be inflated, especially when group sizes are unequal. This violation is known as heteroscedasticity.

Addressing Violations:

- If the assumptions are mildly violated, ANOVA may still be robust enough to provide valid results.
- For normality violations, transformations (e.g., log transformation) might be applied to make the data more normally distributed.
- For heteroscedasticity, Welch's ANOVA or other robust methods can be used; this matters most when group sizes are also unequal.
- Non-independence can often be addressed through appropriate study design.
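
The normality and equal-variance assumptions can be screened before running ANOVA. The sketch below is one possible check, not a prescribed procedure: it assumes SciPy is available, uses made-up scores for three hypothetical groups, and applies the Shapiro-Wilk test per group and Levene's test across groups.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for three groups (made-up values for illustration)
groups = {
    "A": np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7, 24.9]),
    "B": np.array([27.2, 28.9, 26.5, 29.3, 27.8, 28.1, 26.9, 28.4]),
    "C": np.array([21.0, 30.5, 18.2, 33.1, 24.7, 27.9, 19.6, 31.4]),
}

# Shapiro-Wilk test per group: a small p-value suggests non-normality
normality_p = {name: stats.shapiro(x)[1] for name, x in groups.items()}

# Levene's test across groups: a small p-value suggests unequal variances
levene_stat, levene_p = stats.levene(*groups.values())

for name, p in normality_p.items():
    print(f"Shapiro-Wilk p-value for group {name}: {p:.3f}")
print(f"Levene statistic: {levene_stat:.3f}, p-value: {levene_p:.4f}")
```

Group C was deliberately given a much wider spread, so Levene's test should flag it. With small samples these tests have little power, so plots of residuals are a useful complement.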


### Q2. What are the three types of ANOVA, and in what situations would each be used?

There are three main types of Analysis of Variance (ANOVA) techniques: One-Way ANOVA, Two-Way ANOVA, and Multivariate ANOVA (MANOVA). Each type of ANOVA is used in different situations to analyze the variability among groups or factors. Here's a brief overview of each type and when it would be used:

1. **One-Way ANOVA**:
   - Situation: Used when you have one categorical independent variable (factor) and want to compare means across its levels (groups). It is typically applied with three or more groups, since a two-group comparison reduces to a t-test.
   - Example: Comparing the performance of students in three different teaching methods (A, B, C) to determine if there are significant differences in their test scores.

2. **Two-Way ANOVA**:
   - Situation: Used when you have two categorical independent variables (factors) and you want to examine their individual effects as well as their interaction effect on a continuous dependent variable.
   - Example: Analyzing the effects of both gender (male, female) and diet type (high protein, low protein) on weight loss.

3. **Multivariate ANOVA (MANOVA)**:
   - Situation: Used when you have two or more continuous dependent variables and one or more categorical independent variables (factors), and you want to test for overall differences among groups on the dependent variables jointly.
   - Example: Investigating the impact of different levels of exercise intensity (low, moderate, high) and dietary habits (vegetarian, non-vegetarian) on multiple health outcomes (weight, blood pressure, cholesterol level).

Each type of ANOVA helps address specific research questions and hypotheses related to group differences and interactions. It's important to choose the appropriate type of ANOVA based on your study design and the nature of your data. Additionally, checking the assumptions of ANOVA and considering alternatives (such as non-parametric tests) is essential to ensure the validity of your analysis.
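
As an illustration of the non-parametric alternative mentioned above, here is a minimal sketch of the Kruskal-Wallis test, which compares groups by ranks instead of means. It assumes SciPy and uses made-up values for three hypothetical groups.

```python
import numpy as np
from scipy import stats

# Hypothetical data for three groups (made-up values)
group_a = np.array([15, 18, 20, 22, 25])
group_b = np.array([30, 32, 35, 38, 40])
group_c = np.array([50, 52, 55, 58, 60])

# Kruskal-Wallis H-test: rank-based alternative to one-way ANOVA
h_statistic, p_value = stats.kruskal(group_a, group_b, group_c)

print("H-statistic:", h_statistic)
print("p-value:", p_value)
```

A small p-value here suggests that at least one group tends to produce larger values than another, without relying on the normality assumption.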

### Q4. How would you calculate the total sum of squares (SST), explained sum of squares (SSE), and residual sum of squares (SSR) in a one-way ANOVA using Python?

In a one-way ANOVA, the total sum of squares (SST) measures the total variability in the data, the explained sum of squares (SSE) measures the variability explained by the group means, and the residual sum of squares (SSR) measures the unexplained variability or error. You can calculate these sums of squares using Python. Here's how you can do it:

```python
import numpy as np
import scipy.stats as stats

# Sample data for each group
group1 = np.array([15, 18, 20, 22, 25])
group2 = np.array([30, 32, 35, 38, 40])
group3 = np.array([50, 52, 55, 58, 60])

# Combine the data from all groups
all_data = np.concatenate((group1, group2, group3))

# Calculate the overall mean
overall_mean = np.mean(all_data)

# Calculate the group means
group1_mean = np.mean(group1)
group2_mean = np.mean(group2)
group3_mean = np.mean(group3)

# Calculate the total sum of squares (SST)
sst = np.sum((all_data - overall_mean) ** 2)

# Calculate the explained sum of squares (SSE)
sse = (len(group1) * (group1_mean - overall_mean) ** 2 +
       len(group2) * (group2_mean - overall_mean) ** 2 +
       len(group3) * (group3_mean - overall_mean) ** 2)

# Calculate the residual sum of squares (SSR)
ssr = sst - sse

# Print results
print("Total Sum of Squares (SST):", sst)
print("Explained Sum of Squares (SSE):", sse)
print("Residual Sum of Squares (SSR):", ssr)
```

In this example, `group1`, `group2`, and `group3` represent the data for each group. The `all_data` array contains the combined data from all groups. The overall mean and group means are calculated. The sums of squares are computed using the formulas:

- SST = \(\sum_{i=1}^{N} (x_i - \bar{x})^2\)
- SSE = \(\sum_{i=1}^{k} n_i (\bar{x}_i - \bar{x})^2\)
- SSR = SST - SSE

Where \(N\) is the total number of observations, \(k\) is the number of groups, \(n_i\) is the number of observations in group \(i\), \(x_i\) is an individual observation, \(\bar{x}\) is the overall (grand) mean, and \(\bar{x}_i\) is the mean of group \(i\).

These calculations help you understand the distribution of variance in the data and the extent to which group means explain the overall variability.
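
As a sanity check, the sums of squares can be turned into the F-statistic and compared with `scipy.stats.f_oneway`. The sketch below reuses the same sample data and follows this section's naming (SSE = explained, SSR = residual); it computes the residual sum of squares directly from within-group deviations, so the identity SST = SSE + SSR can be verified instead of assumed.

```python
import numpy as np
from scipy import stats

groups = [
    np.array([15, 18, 20, 22, 25]),
    np.array([30, 32, 35, 38, 40]),
    np.array([50, 52, 55, 58, 60]),
]
all_data = np.concatenate(groups)
grand_mean = all_data.mean()

# Explained (between-group) and residual (within-group) sums of squares
ss_explained = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_residual = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_data - grand_mean) ** 2).sum()

# Degrees of freedom: k - 1 between groups, N - k within groups
k, n_total = len(groups), len(all_data)
f_manual = (ss_explained / (k - 1)) / (ss_residual / (n_total - k))

# Reference value from SciPy
f_scipy, p_scipy = stats.f_oneway(*groups)

print("SST:", ss_total, "= SSE + SSR:", ss_explained + ss_residual)
print("F (manual):", f_manual, "F (scipy):", f_scipy)
```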

### Q5. In a two-way ANOVA, how would you calculate the main effects and interaction effects using Python?

In a two-way ANOVA, you can calculate the main effects and interaction effects using Python by analyzing the variance attributed to each factor and their interactions. Here's how you can calculate main effects and interaction effects using Python and the `statsmodels` library:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

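# Create a DataFrame with sample data.
# 'A' and 'B' are categorical factors (string labels), so the formula
# below treats them as factors rather than continuous covariates.
data = {
    'A': ['a1', 'a1', 'a1', 'a1', 'a1', 'a2', 'a2', 'a2', 'a2', 'a2'],
    'B': ['b1', 'b2', 'b1', 'b2', 'b1', 'b2', 'b1', 'b2', 'b1', 'b2'],
    'Y': [22, 27, 32, 37, 42, 25, 30, 35, 40, 45]
}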

df = pd.DataFrame(data)

# Fit a two-way ANOVA model
model = ols('Y ~ A + B + A:B', data=df).fit()

# Print the ANOVA table
print(sm.stats.anova_lm(model, typ=2))
```

In this example, we have created a DataFrame `df` with three columns: 'A', 'B', and 'Y'. 'A' and 'B' represent the levels of the two factors, and 'Y' represents the dependent variable.

The `ols` function is used to fit the two-way ANOVA model with the formula 'Y ~ A + B + A:B', which includes the main effects of factors 'A' and 'B' as well as their interaction ('A:B'). The `anova_lm` function from `statsmodels.stats` is used to print the ANOVA table, which includes information about main effects, interaction effects, and more.

In the ANOVA table returned with `typ=2`, you'll find the sum of squares (`sum_sq`), degrees of freedom (`df`), F-statistic (`F`), and p-value (`PR(>F)`) for each main effect ('A' and 'B'), their interaction ('A:B'), and the residual. Mean squares are not listed in this table but can be obtained as `sum_sq / df`. The p-values help you determine if these effects are statistically significant.

Interpretation:
- If the p-value is less than your chosen significance level (e.g., 0.05), you would conclude that the corresponding effect (main effect or interaction) is statistically significant.
- If the p-value is greater than the significance level, you fail to reject the null hypothesis: the data do not provide sufficient evidence for that effect.
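
To see what the main and interaction effects mean in terms of raw averages, the following sketch computes marginal means and cell means with pandas. The 2x2 dataset, the factor names `A` and `B`, and all values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical balanced 2x2 design with two observations per cell
df = pd.DataFrame({
    "A": ["a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2"],
    "B": ["b1", "b1", "b2", "b2", "b1", "b1", "b2", "b2"],
    "Y": [10, 12, 20, 22, 14, 16, 34, 36],
})

# Marginal means: differences between them reflect the main effects
main_a = df.groupby("A")["Y"].mean()
main_b = df.groupby("B")["Y"].mean()

# Cell means: one mean per combination of factor levels
cell_means = df.pivot_table(index="A", columns="B", values="Y", aggfunc="mean")

# Interaction contrast: does the effect of B depend on the level of A?
effect_b_at_a1 = cell_means.loc["a1", "b2"] - cell_means.loc["a1", "b1"]
effect_b_at_a2 = cell_means.loc["a2", "b2"] - cell_means.loc["a2", "b1"]
interaction_contrast = effect_b_at_a2 - effect_b_at_a1

print(main_a, main_b, cell_means, sep="\n\n")
print("\nInteraction contrast:", interaction_contrast)
```

A contrast of zero would mean the effect of `B` is the same at both levels of `A`; the ANOVA table tests whether a nonzero contrast like this one is larger than expected from noise.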

### Q6. Suppose you conducted a one-way ANOVA and obtained an F-statistic of 5.23 and a p-value of 0.02. What can you conclude about the differences between the groups, and how would you interpret these results?

Given an F-statistic of 5.23 and a p-value of 0.02:

1. **Null Hypothesis (H0)**: All group (population) means are equal.

2. **Alternative Hypothesis (Ha)**: At least one group mean differs from the others.

Based on the p-value:

- Since the p-value (0.02) is less than the chosen significance level (e.g., 0.05), you would reject the null hypothesis.

Interpretation of Conclusions:

- The data provide statistically significant evidence that at least one group mean differs from the others.
- However, the p-value does not provide information about which specific groups are different from each other; it only tells you that differences exist somewhere among the groups.

In summary, with an F-statistic of 5.23 and a p-value of 0.02, you would conclude that there are significant differences between at least some of the group means.
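
The link between the F-statistic and the p-value depends on the degrees of freedom, which the question does not state. The sketch below therefore assumes, purely for illustration, three groups of five observations each (degrees of freedom 2 and 12) and uses the survival function of the F distribution from SciPy.

```python
from scipy import stats

f_statistic = 5.23

# Assumed design (not given in the question): 3 groups, 15 observations
k, n_total = 3, 15
df_between = k - 1        # numerator degrees of freedom
df_within = n_total - k   # denominator degrees of freedom

# p-value: probability of an F at least this large under H0
p_value = stats.f.sf(f_statistic, df_between, df_within)

# Critical value at the 5% level for the same degrees of freedom
f_critical = stats.f.ppf(0.95, df_between, df_within)

print("p-value:", round(p_value, 4))
print("Critical F at alpha = 0.05:", round(f_critical, 2))
```

Under these assumed degrees of freedom the p-value rounds to 0.02, and the observed F exceeds the critical value, which is the same decision expressed two ways.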

### Q7. In a repeated measures ANOVA, how would you handle missing data, and what are the potential consequences of using different methods to handle missing data?

Handling missing data in a repeated measures ANOVA is essential to ensure the validity of your analysis. There are various methods to handle missing data, each with its own potential consequences. Here's how you can handle missing data and the potential consequences of using different methods:

1. **Complete Case Analysis (Listwise Deletion)**:
   - This method involves analyzing only the cases with complete data for all variables.
   - Consequences:
     - Loss of valuable information: You discard all cases with missing data, which may reduce the representativeness of your sample and reduce statistical power.
     - Biased results: If the missing data are not missing completely at random (MCAR), this method can lead to biased estimates.

2. **Mean Imputation**:
   - Missing values are replaced with the mean value of the variable.
   - Consequences:
     - Distortion of variances: This method artificially reduces the variability of the variable, which can lead to underestimated standard errors and inflated F-statistics.
     - Attenuation of correlations: Mean imputation can artificially reduce the correlations between variables.

3. **Last Observation Carried Forward (LOCF)**:
   - Missing values are replaced with the last observed value for that subject.
   - Consequences:
     - Temporal bias: LOCF assumes that the missing value remains the same as the last observed value, which may not be accurate.
     - Overestimation of treatment effects: If missing data occur more frequently in the control group, for example, LOCF can lead to an overestimation of treatment effects.

4. **Multiple Imputation**:
   - Missing values are imputed multiple times, creating several datasets with imputed values. The analyses are performed on each dataset, and the results are combined.
   - Consequences:
     - Time-consuming: Multiple imputation involves multiple iterations and can be computationally intensive.
     - Assumes missing at random (MAR): The validity of results depends on the accuracy of the imputation model and the assumption that missing data are MAR.

5. **Maximum Likelihood Estimation (MLE)**:
   - This method estimates model parameters that maximize the likelihood of observed data.
   - Consequences:
     - Complex implementation: MLE involves solving complex likelihood equations and may require specialized software.
     - Requires model assumptions: MLE assumes a specific model for the data distribution.

When handling missing data in a repeated measures ANOVA, it's important to consider the nature of your data, the extent of missingness, and the assumptions of the missing data mechanism. In practice, a combination of methods (e.g., multiple imputation, sensitivity analysis) might be used to address missing data and assess the robustness of results. Careful consideration and reporting of the chosen method and potential implications are essential for ensuring the validity and reliability of your findings.
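
The first three strategies are simple enough to sketch with pandas. The example below assumes a wide-format table with one row per subject and one column per time point; the column names `t1` to `t3` and all values are hypothetical. Multiple imputation and likelihood-based methods need dedicated modelling tools and are not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical repeated measures: rows are subjects, columns are time points
wide = pd.DataFrame(
    {
        "t1": [5.0, 6.0, 7.0, 8.0],
        "t2": [5.5, np.nan, 7.5, 8.5],
        "t3": [6.0, 6.5, np.nan, 9.0],
    },
    index=["s1", "s2", "s3", "s4"],
)

# 1. Complete case analysis: drop every subject with any missing value
complete_cases = wide.dropna()

# 2. Mean imputation: fill each gap with that time point's observed mean
mean_imputed = wide.fillna(wide.mean())

# 3. LOCF: carry each subject's last observed value forward in time
locf = wide.ffill(axis=1)

print("Subjects kept by listwise deletion:", list(complete_cases.index))
print("Variance of t2, observed vs mean-imputed:",
      wide["t2"].var(), mean_imputed["t2"].var())
print(locf)
```

The printed variances illustrate the shrinkage described above: the mean-imputed column has a smaller variance than the observed values alone.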

### Q8. What are some common post-hoc tests used after ANOVA, and when would you use each one? Provide an example of a situation where a post-hoc test might be necessary.

Post-hoc tests are used after conducting an Analysis of Variance (ANOVA) to determine which specific groups have significant differences in means when the ANOVA indicates an overall significant difference. These tests help identify pairwise comparisons that contribute to the significant result. Some common post-hoc tests include:

1. **Tukey's Honestly Significant Difference (HSD)**:
   - When to use: Tukey's HSD is used when you have conducted a one-way ANOVA and want to perform all possible pairwise comparisons.
   - Example: You conducted an ANOVA to compare the effectiveness of three different treatments on pain relief. Tukey's HSD would help you identify which specific treatment pairs have significantly different effects.

2. **Bonferroni Correction**:
   - When to use: Bonferroni correction is used to control the family-wise error rate when conducting multiple pairwise comparisons.
   - Example: You conducted an ANOVA to compare the performance of five different advertising strategies. Since you are making multiple comparisons, Bonferroni correction can help reduce the chance of making a Type I error.

3. **Duncan's Multiple Range Test (MRT)**:
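   - When to use: Duncan's MRT is a stepwise procedure for comparing all pairs of group means. It is more powerful than Tukey's HSD but does not strictly control the family-wise error rate, so it is better suited to exploratory comparisons.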
   - Example: You conducted an ANOVA to compare the yields of six different fertilizer treatments. Duncan's MRT would help you determine which fertilizer treatments result in significantly different yields.

4. **Scheffé Test**:
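   - When to use: The Scheffé test controls the family-wise error rate across all possible contrasts, including complex ones (e.g., one group versus the average of two others). For simple pairwise comparisons it is more conservative than Tukey's HSD.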
   - Example: You conducted an ANOVA to compare the reaction times under three different conditions. Scheffé test would help identify which condition pairs have significantly different effects.

5. **Holm-Bonferroni Method**:
   - When to use: Holm-Bonferroni method is a modified version of the Bonferroni correction that provides greater statistical power.
   - Example: You conducted an ANOVA to compare the effectiveness of four different diets on weight loss. The Holm-Bonferroni method would help you determine which diet pairs show significant differences.

6. **Fisher's Least Significant Difference (LSD)**:
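   - When to use: Fisher's LSD runs unadjusted pairwise t-tests, and only after a significant overall F-test. It is the most liberal option and keeps the family-wise error rate under reasonable control only when there are three groups.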
   - Example: You conducted an ANOVA to compare the heights of individuals from three different regions. Fisher's LSD would help identify which region pairs have significantly different average heights.
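
Two of these approaches can be sketched with statsmodels: Tukey's HSD via `pairwise_tukeyhsd`, and Bonferroni or Holm adjustments applied to pairwise t-test p-values via `multipletests`. The three groups and their values below are made up for illustration.

```python
from itertools import combinations

import numpy as np
from scipy import stats
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multitest import multipletests

# Hypothetical data for three treatments (made-up values)
groups = {
    "A": np.array([15, 18, 20, 22, 25]),
    "B": np.array([30, 32, 35, 38, 40]),
    "C": np.array([50, 52, 55, 58, 60]),
}

# Tukey's HSD on all pairs
values = np.concatenate(list(groups.values()))
labels = np.repeat(list(groups.keys()), [len(g) for g in groups.values()])
tukey = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(tukey)

# Pairwise t-tests, then Bonferroni and Holm adjustments of the p-values
pairs = list(combinations(groups, 2))
raw_p = [stats.ttest_ind(groups[a], groups[b])[1] for a, b in pairs]
reject_bonf, p_bonf, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")
reject_holm, p_holm, _, _ = multipletests(raw_p, alpha=0.05, method="holm")

for pair, pb, ph in zip(pairs, p_bonf, p_holm):
    print(pair, "Bonferroni p:", pb, "Holm p:", ph)
```

Holm-adjusted p-values are never larger than the Bonferroni-adjusted ones, which is where Holm's extra power comes from.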

### Q9. A researcher wants to compare the mean weight loss of three diets: A, B, and C. They collect data from 50 participants who were randomly assigned to one of the diets. Conduct a one-way ANOVA using Python to determine if there are any significant differences between the mean weight loss of the three diets. Report the F-statistic and p-value, and interpret the results.

```python
import numpy as np
import scipy.stats as stats

# Sample data for each diet
diet_A = np.array([2.5, 3.2, 2.8, 3.5, 2.9, 3.1, 2.7, 2.6, 3.0, 3.3,
                   3.2, 2.8, 3.4, 3.1, 2.9, 3.0, 2.7, 2.5, 3.1, 3.2,
                   2.6, 2.9, 3.0, 3.3, 2.8, 3.1, 2.7, 2.5, 3.2, 3.4,
                   2.9, 3.1, 2.8, 2.6, 3.0, 3.2, 2.7, 2.5, 3.1, 3.3,
                   2.8, 3.0, 2.9, 3.2, 2.7, 2.6, 3.1, 3.4, 2.8, 3.0])

diet_B = np.array([3.8, 4.2, 4.0, 3.6, 3.9, 3.7, 4.1, 4.3, 3.5, 3.9,
                   4.0, 3.7, 3.4, 3.8, 3.6, 4.2, 3.9, 3.7, 3.8, 3.5,
                   4.3, 3.6, 3.8, 4.0, 3.9, 3.7, 3.5, 3.4, 4.1, 4.2,
                   3.7, 3.9, 3.8, 4.3, 3.6, 3.4, 3.9, 4.0, 3.8, 3.7,
                   4.2, 3.5, 4.1, 3.6, 3.9, 3.7, 3.8, 3.4, 4.0, 3.6])

diet_C = np.array([1.5, 1.2, 1.4, 1.7, 1.6, 1.3, 1.5, 1.8, 1.4, 1.6,
                   1.7, 1.3, 1.2, 1.6, 1.5, 1.7, 1.3, 1.4, 1.6, 1.5,
                   1.8, 1.7, 1.5, 1.4, 1.3, 1.6, 1.7, 1.2, 1.5, 1.6,
                   1.8, 1.3, 1.7, 1.4, 1.5, 1.6, 1.2, 1.4, 1.8, 1.3,
                   1.7, 1.5, 1.6, 1.2, 1.3, 1.4, 1.6, 1.7, 1.5, 1.8])

# Perform one-way ANOVA
f_statistic, p_value = stats.f_oneway(diet_A, diet_B, diet_C)

# Print results
print("F-statistic:", f_statistic)
print("p-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("There are significant differences between the mean weight loss of the three diets.")
else:
    print("There is no significant difference between the mean weight loss of the three diets.")
```

```text
F-statistic: 1185.216239215225
p-value: 2.1253161002416956e-91
There are significant differences between the mean weight loss of the three diets.
```


### Q10. A company wants to know if there are any significant differences in the average time it takes to complete a task using three different software programs: Program A, Program B, and Program C. They randomly assign 30 employees to one of the programs and record the time it takes each employee to complete the task. Conduct a two-way ANOVA using Python to determine if there are any main effects or interaction effects between the software programs and employee experience level (novice vs. experienced). Report the F-statistics and p-values, and interpret the results.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

# Sample data
software = ['A', 'B', 'C'] * 20
experience = ['novice'] * 15 + ['experienced'] * 15 + ['novice'] * 15 + ['experienced'] * 15
time = np.array([10.2, 9.8, 10.5, 9.5, 10.1, 10.3, 11.0, 9.7, 10.4, 10.6,
                 11.2, 10.0, 10.8, 11.1, 11.5, 10.3, 9.9, 10.7, 9.8, 10.2,
                 11.4, 10.3, 9.7, 10.1, 10.6, 11.2, 11.8, 11.0, 11.5, 11.7,
                 9.5, 10.0, 9.9, 10.3, 10.6, 10.9, 11.1, 10.2, 10.7, 10.8,
                 12.0, 11.5, 11.3, 10.4, 11.2, 10.8, 9.9, 10.5, 10.0, 10.3,
                 10.8, 11.1, 10.7, 10.5, 10.3, 11.0, 11.5, 11.2, 11.8, 12.1])

# Create a DataFrame
df = pd.DataFrame({'Software': software, 'Experience': experience, 'Time': time})

# Fit a two-way ANOVA model
model = ols('Time ~ Software + Experience + Software:Experience', data=df).fit()

# Get the ANOVA table
anova_table = sm.stats.anova_lm(model, typ=2)

# Extract F-statistics and p-values
f_software = anova_table['F']['Software']
p_software = anova_table['PR(>F)']['Software']

f_experience = anova_table['F']['Experience']
p_experience = anova_table['PR(>F)']['Experience']

f_interaction = anova_table['F']['Software:Experience']
p_interaction = anova_table['PR(>F)']['Software:Experience']

# Interpret the results
alpha = 0.05

print("Main Effect of Software:")
print("F-statistic:", f_software)
print("p-value:", p_software)
if p_software < alpha:
    print("There is a significant main effect of software.")
else:
    print("There is no significant main effect of software.")

print("\nMain Effect of Experience:")
print("F-statistic:", f_experience)
print("p-value:", p_experience)
if p_experience < alpha:
    print("There is a significant main effect of experience.")
else:
    print("There is no significant main effect of experience.")

print("\nInteraction Effect between Software and Experience:")
print("F-statistic:", f_interaction)
print("p-value:", p_interaction)
if p_interaction < alpha:
    print("There is a significant interaction effect between software and experience.")
else:
    print("There is no significant interaction effect between software and experience.")
```

```text
Main Effect of Software:
F-statistic: 2.0597679438747707
p-value: 0.13738325450332742
There is no significant main effect of software.

Main Effect of Experience:
F-statistic: 1.2692930383162777
p-value: 0.2648811383167071
There is no significant main effect of experience.

Interaction Effect between Software and Experience:
F-statistic: 0.5152455477603954
p-value: 0.6002608594784352
There is no significant interaction effect between software and experience.
```


### Q11. An educational researcher is interested in whether a new teaching method improves student test scores. They randomly assign 100 students to either the control group (traditional teaching method) or the experimental group (new teaching method) and administer a test at the end of the semester. Conduct a two-sample t-test using Python to determine if there are any significant differences in test scores between the two groups. If the results are significant, follow up with a post-hoc test to determine which group(s) differ significantly from each other.

```python
import numpy as np
import scipy.stats as stats
import statsmodels.api as sm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Sample data
control_group = np.array([78, 82, 85, 73, 90, 88, 79, 84, 81, 76,
                          77, 80, 83, 79, 87, 75, 82, 88, 74, 86,
                          81, 78, 85, 80, 83, 89, 77, 84, 79, 82,
                          86, 81, 84, 78, 82, 80, 75, 88, 83, 80,
                          85, 87, 89, 78, 81, 83, 76, 79, 80, 88,
                          85, 82, 86, 79, 84, 81, 76, 73, 78, 82,
                          85, 73, 90, 88, 79, 84, 81, 76, 77, 80,
                          83, 79, 87, 75, 82, 88, 74, 86, 81, 78,
                          85, 80, 83, 89, 77, 84, 79, 82, 86, 81,
                          84, 78, 82, 80, 75, 88, 83, 80, 85, 87,
                          89, 78, 81, 83, 76, 79, 80, 88, 85, 82,
                          86, 79, 84, 81, 76, 73])

experimental_group = np.array([85, 88, 92, 76, 94, 90, 83, 86, 89, 80,
                               82, 84, 87, 83, 91, 78, 86, 92, 75, 89,
                               88, 85, 91, 86, 88, 93, 79, 87, 82, 85,
                               91, 86, 88, 82, 84, 86, 80, 90, 87, 84,
                               92, 94, 95, 85, 86, 87, 76, 79, 82, 91,
                               89, 88, 93, 85, 90, 86, 80, 77, 85, 88,
                               92, 76, 94, 90, 83, 86, 89, 80, 82, 84,
                               87, 83, 91, 78, 86, 92, 75, 89, 88, 85,
                               91, 86, 88, 93, 79, 87, 82, 85, 91, 86,
                               88, 82, 84, 86, 80, 90, 87, 84, 92, 94,
                               95, 85, 86, 87, 76, 79, 82, 91, 89, 88,
                               93, 85, 90, 86, 80, 77])

# Perform two-sample t-test
t_statistic, p_value = stats.ttest_ind(control_group, experimental_group)

# Print t-test results
print("Two-sample t-test results:")
print("T-statistic:", t_statistic)
print("p-value:", p_value)

# Interpret the results
alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in test scores between the two groups.")
else:
    print("There is no significant difference in test scores between the two groups.")

# If the results are significant, perform a post-hoc test (Tukey's HSD)
if p_value < alpha:
    data = np.concatenate([control_group, experimental_group])
    group_labels = ['Control'] * len(control_group) + ['Experimental'] * len(experimental_group)
    tukey_result = pairwise_tukeyhsd(data, group_labels)
    print("\nTukey's HSD post-hoc test results:")
    print(tukey_result)
```

```text
Two-sample t-test results:
T-statistic: -7.193474763200661
p-value: 8.824309089669441e-12
There is a significant difference in test scores between the two groups.

Tukey's HSD post-hoc test results:
  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1    group2    meandiff p-adj lower  upper  reject
--------------------------------------------------------
Control Experimental   4.3621   0.0 3.1673 5.5569   True
--------------------------------------------------------
```

With only two groups there is a single pairwise comparison, so the t-test already answers which groups differ. Tukey's HSD reproduces that comparison here: the experimental group scores about 4.36 points higher on average than the control group.


### Q12. A researcher wants to know if there are any significant differences in the average daily sales of three retail stores: Store A, Store B, and Store C. They randomly select 30 days and record the sales for each store on those days. Conduct a repeated measures ANOVA using Python to determine if there are any significant differences in sales between the three stores. If the results are significant, follow up with a post-hoc test to determine which store(s) differ significantly from each other.

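Because sales for all three stores are recorded on the same 30 days, each day acts as the repeated "subject". One way to respect that pairing is `AnovaRM` from statsmodels. The sketch below rests on assumptions: the `Day` column, the simulated day effects, and all sales figures are hypothetical and generated only to show the call pattern.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
n_days = 30

# Hypothetical long-format data: one row per (day, store) combination
long = pd.DataFrame({
    "Day": np.tile(np.arange(n_days), 3),
    "Store": np.repeat(["A", "B", "C"], n_days),
})

# Simulated sales: a store-level mean, a shared day effect, and noise
store_mean = {"A": 105.0, "B": 100.0, "C": 90.0}
day_effect = rng.normal(0, 5, n_days)
long["Sales"] = (
    long["Store"].map(store_mean).to_numpy()
    + day_effect[long["Day"].to_numpy()]
    + rng.normal(0, 3, len(long))
)

# Repeated measures ANOVA: Day is the subject, Store the within factor
result = AnovaRM(long, depvar="Sales", subject="Day", within=["Store"]).fit()
rm_table = result.anova_table
print(rm_table)
```

`AnovaRM` expects balanced data with exactly one observation per subject and condition, which holds here by construction.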
```python
import numpy as np
import scipy.stats as stats
import pandas as pd
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.formula.api import ols
import statsmodels.api as sm

# Sample data (replace with your actual data)
store_a_sales = np.array([100, 110, 105, 95, 102, 108, 98, 112, 100, 105,
                          98, 102, 103, 106, 99, 108, 110, 115, 97, 105,
                          112, 108, 100, 105, 107, 102, 101, 98, 115, 109])

store_b_sales = np.array([95, 98, 105, 110, 108, 100, 92, 99, 105, 103,
                          95, 100, 98, 105, 110, 108, 100, 92, 99, 105,
                          103, 95, 100, 98, 105, 110, 108, 100, 92, 99])

store_c_sales = np.array([90, 92, 85, 88, 92, 94, 88, 85, 90, 86,
                          88, 92, 85, 88, 92, 94, 88, 85, 90, 86,
                          88, 92, 85, 88, 92, 94, 88, 85, 90, 86])

# Combine data into a DataFrame
data = pd.DataFrame({
    'Sales': np.concatenate([store_a_sales, store_b_sales, store_c_sales]),
    'Store': np.repeat(['A', 'B', 'C'], len(store_a_sales))
})

# Fit a one-way (between-groups) ANOVA model via OLS.
# Note: this treats the three stores as independent samples and ignores
# the day-level pairing that a repeated measures ANOVA would account for.
model = ols('Sales ~ Store', data=data).fit()

# Perform Tukey's HSD post-hoc test
tukey_result = pairwise_tukeyhsd(endog=data['Sales'], groups=data['Store'], alpha=0.05)

# Print ANOVA results
print("One-way ANOVA results:")
print(model.summary())

# Print Tukey's HSD post-hoc test results
print("\nTukey's HSD post-hoc test results:")
print(tukey_result)
```

One-way ANOVA results:
                            OLS Regression Results                            
Dep. Variable:                  Sales   R-squared:                       0.672
Model:                            OLS   Adj. R-squared:                  0.664
Method:                 Least Squares   F-statistic:                     89.08
Date:                Fri, 18 Aug 2023   Prob (F-statistic):           8.83e-22
Time:                        08:11:42   Log-Likelihood:                -267.08
No. Observations:                  90   AIC:                             540.2
Df Residuals:                      87   BIC:                             547.7
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    104.5000      0.