# Multiple Linear Regression Analysis: Marketing Channels Impact on Sales

## Introduction

In this comprehensive analysis, we'll explore how different marketing channels influence sales performance using multiple linear regression. This statistical technique helps us understand the linear relationship between one dependent variable (sales) and multiple independent variables (various marketing channels).

Multiple linear regression is particularly valuable for understanding:

- Which marketing channels have the strongest impact on sales
- How much each channel contributes to overall sales performance
- The statistical significance of each relationship
- How well our marketing variables explain sales variance

The goal is to build a predictive model that can guide marketing budget allocation decisions and provide actionable insights for business strategy.
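For reference, the model fitted throughout this notebook can be written in the standard form below. Here $x_{i1}, \dots, x_{ip}$ are generic placeholders for whatever numeric and dummy features survive data preparation, and ordinary least squares picks the coefficients that minimize the sum of squared residuals:

$$
\text{Sales}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip} + \varepsilon_i,
\qquad
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n} \Big(\text{Sales}_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2
$$

Each $\beta_j$ is read as the expected change in sales for a one-unit change in $x_j$ with the other predictors held fixed.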

## Dataset Overview

Our dataset contains information about different marketing promotional budgets and their corresponding sales results. Before building the regression model, we need to examine what variables are available and understand their characteristics.

**Key Variables Expected:**

- **TV**: TV promotional budget or category
- **Radio**: Radio promotional budget
- **Social Media**: Social media promotional budget
- **Influencer**: Influencer marketing budget or category
- **Sales**: Sales results (our target variable)

Let's start by loading and exploring the data to understand its structure.

## Step 1: Import Required Libraries

First, we'll import all the necessary Python libraries for data analysis, visualization, and modeling.

In [None]:
# Data manipulation and analysis
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Statistical analysis
from scipy import stats

# Machine learning
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Advanced statistical modeling
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Set visualization style
plt.style.use('default')
sns.set_palette("husl")
plt.rcParams['figure.figsize'] = (10, 6)

print("All libraries imported successfully!")

## Step 2: Load and Examine the Dataset

Now we'll load the marketing sales data and take our first look at its structure and contents.

In [None]:
# Load the dataset
data = pd.read_csv('sales_marketing_data.csv')

# Display basic information about the dataset
print("Dataset shape:", data.shape)
print("\nColumn names:")
print(data.columns.tolist())

In [None]:
# Display the first few rows to understand the data structure
print("First 5 rows of the dataset:")
print(data.head())

In [None]:
# Get detailed information about the dataset
print("Dataset Information:")
print("=" * 50)
data.info()

In [None]:
# Generate summary statistics
print("Summary Statistics:")
print("=" * 50)
print(data.describe())

## Step 3: Exploratory Data Analysis

### Data Quality Assessment

Before building our regression model, we need to thoroughly understand our data through exploratory analysis. This includes checking for missing values, understanding variable distributions, and identifying potential issues.

In [None]:
# Check for missing values
print("Missing Values Analysis:")
print("=" * 30)
missing_values = data.isnull().sum()
print(missing_values)
print("\nTotal missing values:", data.isnull().sum().sum())

# Check for duplicate rows
duplicates = data.duplicated().sum()
print(f"\nDuplicate rows: {duplicates}")

In [None]:
# Examine data types and unique values for each column
print("Data Types and Unique Values:")
print("=" * 40)
for column in data.columns:
    print(f"\n{column}:")
    print(f"  Data type: {data[column].dtype}")
    print(f"  Unique values: {data[column].nunique()}")
    # Low-cardinality columns are often categorical, so list their values.
    # NaNs are dropped first because sorting a mix of strings and NaN raises a TypeError.
    if data[column].nunique() <= 10:
        print(f"  Values: {sorted(data[column].dropna().unique())}")

### Visualizing Data Distributions

Understanding the distribution of each variable helps us identify patterns, outliers, and the nature of our data.

In [None]:
# Create distribution plots for all numeric variables
numeric_columns = data.select_dtypes(include=[np.number]).columns
n_plots = len(numeric_columns)

# Size the grid to the number of numeric columns (two plots per row),
# so no column is silently skipped when there are more than four
n_grid_cols = 2
n_grid_rows = max(1, int(np.ceil(n_plots / n_grid_cols)))
fig, axes = plt.subplots(n_grid_rows, n_grid_cols, figsize=(15, 5 * n_grid_rows))
axes = np.atleast_1d(axes).ravel()

for i, column in enumerate(numeric_columns):
    axes[i].hist(data[column].dropna(), bins=20, alpha=0.7, edgecolor='black')
    axes[i].set_title(f'Distribution of {column}')
    axes[i].set_xlabel(column)
    axes[i].set_ylabel('Frequency')

# Remove empty subplots
for i in range(n_plots, len(axes)):
    fig.delaxes(axes[i])

plt.tight_layout()
plt.show()

### Analyzing Relationships Between Variables

A key aspect of multiple linear regression is understanding how our predictor variables relate to each other and to our target variable (Sales). We'll use correlation analysis and visualizations to explore these relationships.
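The correlation heatmap relies on the Pearson coefficient, which is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it by hand on a small hypothetical toy frame (the values are made up purely for illustration) and checks it against pandas' `Series.corr`:

```python
import numpy as np
import pandas as pd


def pearson_r(x, y):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_centered = x - x.mean()
    y_centered = y - y.mean()
    numerator = x_centered @ y_centered
    denominator = np.sqrt((x_centered @ x_centered) * (y_centered @ y_centered))
    return float(numerator / denominator)


# Hypothetical toy values, used only to illustrate the calculation
toy = pd.DataFrame({"Radio": [1.0, 2.0, 3.0, 4.0, 5.0],
                    "Sales": [2.0, 4.1, 5.9, 8.2, 9.9]})

manual = pearson_r(toy["Radio"], toy["Sales"])
via_pandas = toy["Radio"].corr(toy["Sales"])  # pandas defaults to Pearson
print(f"manual r = {manual:.4f}, pandas r = {via_pandas:.4f}")
```

Pearson's r only captures pairwise linear association, so a weak correlation does not rule out a non-linear relationship, and it says nothing about a channel's effect after controlling for the other channels.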

In [None]:
# Create a correlation matrix for numeric variables
numeric_data = data.select_dtypes(include=[np.number])
correlation_matrix = numeric_data.corr()

# Create a heatmap to visualize correlations
plt.figure(figsize=(10, 8))
sns.heatmap(correlation_matrix,
            annot=True,
            cmap='coolwarm',
            center=0,
            square=True,
            fmt='.3f',
            cbar_kws={'label': 'Correlation Coefficient'})
plt.title('Correlation Matrix of Numeric Variables')
plt.tight_layout()
plt.show()

In [None]:
# Display correlations with Sales specifically
print("Correlations with Sales (sorted by absolute value):")
print("=" * 50)
sales_correlations = correlation_matrix['Sales'].drop('Sales').sort_values(key=abs, ascending=False)
for variable, correlation in sales_correlations.items():
    print(f"{variable:15}: {correlation:6.3f}")

### Pairwise Relationships Visualization

A pairplot helps us visualize the relationships between all numeric variables simultaneously, making it easier to spot patterns and potential issues.

In [None]:
# Create pairplot for numeric variables
# sns.pairplot creates its own figure, so a preceding plt.figure() call would only leave an empty figure behind
sns.pairplot(numeric_data, diag_kind='hist', plot_kws={'alpha': 0.6})
plt.suptitle('Pairwise Relationships Between Numeric Variables', y=1.02)
plt.show()

### Categorical Variables Analysis

If our dataset contains categorical variables (like TV promotion levels or Influencer categories), we need to understand how these categories relate to our sales outcomes.

In [None]:
# Identify categorical variables
categorical_columns = data.select_dtypes(include=['object']).columns
print(f"Categorical variables found: {list(categorical_columns)}")

if len(categorical_columns) > 0:
    for column in categorical_columns:
        print(f"\n{column} categories and their sales statistics:")
        print("-" * 50)
        category_stats = data.groupby(column)['Sales'].agg(['count', 'mean', 'std']).round(3)
        print(category_stats)
else:
    print("No categorical variables found in the dataset.")

In [None]:
# Visualize categorical variables' impact on sales
categorical_columns = data.select_dtypes(include=['object']).columns

if len(categorical_columns) > 0:
    n_cats = len(categorical_columns)
    fig, axes = plt.subplots(1, min(n_cats, 2), figsize=(15, 6))

    if n_cats == 1:
        axes = [axes]

    for i, column in enumerate(categorical_columns[:2]):  # Show up to 2 categorical variables
        if i < len(axes):
            sns.boxplot(data=data, x=column, y='Sales', ax=axes[i])
            axes[i].set_title(f'Sales Distribution by {column}')
            axes[i].tick_params(axis='x', rotation=45)

    plt.tight_layout()
    plt.show()
else:
    print("No categorical variables to visualize.")

## Step 4: Data Preparation for Modeling

### Handling Missing Values and Data Cleaning

Before building our regression model, we need to ensure our data is clean and properly formatted for analysis.
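This notebook handles missing values by dropping incomplete rows with `dropna()`, which is simple but can discard a large share of the data and bias the results if values are not missing at random. One common alternative, sketched here as an assumption rather than the approach used in this analysis, is to fill numeric gaps with the column median; the toy frame and the helper name `impute_numeric_with_median` are hypothetical:

```python
import numpy as np
import pandas as pd


def impute_numeric_with_median(df):
    """Return a copy where NaNs in numeric columns are replaced by that column's median.

    Non-numeric columns are left untouched (their gaps remain).
    """
    out = df.copy()
    numeric_cols = out.select_dtypes(include=[np.number]).columns
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())
    return out


# Hypothetical toy frame with a few gaps
toy = pd.DataFrame({"Radio": [1.0, np.nan, 3.0, 5.0],
                    "Sales": [10.0, 12.0, np.nan, 20.0],
                    "TV": ["Low", "High", None, "Medium"]})

print("rows kept by dropna():", len(toy.dropna()))
imputed = impute_numeric_with_median(toy)
print(imputed)
```

If you adopt imputation for modeling, compute the medians on the training split only and reuse them for the test split, so no information leaks from the test data.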

In [None]:
# Check and handle missing values
print("Data cleaning process:")
print("=" * 30)
print(f"Original dataset shape: {data.shape}")

# Remove rows with missing values
data_clean = data.dropna()
print(f"After removing missing values: {data_clean.shape}")
print(f"Rows removed: {data.shape[0] - data_clean.shape[0]}")

# Reset index after cleaning
data_clean = data_clean.reset_index(drop=True)
print("Index reset completed.")

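### Encoding Categorical Variables

Multiple linear regression requires all input variables to be numeric. We'll convert each categorical variable into dummy variables: binary (0/1) columns, one for every category except a baseline category that is dropped. Keeping every category (full one-hot encoding) alongside the intercept would make the columns perfectly collinear, which is known as the dummy variable trap. Each dummy's coefficient is then interpreted relative to the baseline category.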
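To make the encoding concrete, here is a minimal sketch on a hypothetical three-level `TV` column. pandas orders the levels alphabetically (`High`, `Low`, `Medium`), so `drop_first=True` removes `High`, which becomes the baseline; `dtype=int` is passed only to get readable 0/1 output:

```python
import pandas as pd

# Hypothetical toy data: one categorical predictor with three levels
toy = pd.DataFrame({"TV": ["Low", "Medium", "High", "Low"],
                    "Radio": [1.2, 3.4, 5.6, 2.0]})

# drop_first=True drops the first level in sorted order ("High"), making it the baseline
encoded = pd.get_dummies(toy, columns=["TV"], drop_first=True, dtype=int)
print(encoded)
```

A row with zeros in both `TV_Low` and `TV_Medium` belongs to the baseline `High` level, so a coefficient on `TV_Low` would be read as the expected sales difference between `Low` and `High`.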

In [None]:
# Identify categorical and numeric columns
categorical_cols = data_clean.select_dtypes(include=['object']).columns
numeric_cols = data_clean.select_dtypes(include=[np.number]).columns

print("Variable types identified:")
print(f"Categorical columns: {list(categorical_cols)}")
print(f"Numeric columns: {list(numeric_cols)}")

# Separate target variable
target_variable = 'Sales'
feature_numeric = [col for col in numeric_cols if col != target_variable]

print(f"\nTarget variable: {target_variable}")
print(f"Numeric features: {feature_numeric}")

In [None]:
# Create dummy variables for categorical features
if len(categorical_cols) > 0:
    print("Creating dummy variables for categorical features:")
    print("-" * 50)

    # drop_first=True drops one baseline level per variable to avoid perfect multicollinearity.
    # dtype=float avoids boolean dummy columns (the pandas >= 2.0 default), which can turn the
    # combined feature matrix into an object-dtype array that statsmodels rejects.
    dummy_variables = pd.get_dummies(data_clean[categorical_cols], drop_first=True,
                                     prefix_sep='_', dtype=float)

    print(f"Original categorical columns: {len(categorical_cols)}")
    print(f"Dummy variables created: {dummy_variables.shape[1]}")
    print("\nDummy variable columns:")
    print(list(dummy_variables.columns))

    # Show first few rows of dummy variables
    print("\nFirst 5 rows of dummy variables:")
    print(dummy_variables.head())

else:
    print("No categorical variables found - no dummy variables needed.")
    dummy_variables = pd.DataFrame()

In [None]:
# Combine all features for modeling
print("Preparing final feature matrix:")
print("=" * 40)

# Start with numeric features (excluding target)
X = data_clean[feature_numeric].copy()
print(f"Numeric features shape: {X.shape}")

# Add dummy variables if they exist
if not dummy_variables.empty:
    X = pd.concat([X, dummy_variables], axis=1)
    print(f"After adding dummy variables: {X.shape}")

# Ensure every feature is float so statsmodels and the VIF calculation receive a numeric array
X = X.astype(float)

# Prepare target variable
y = data_clean[target_variable].copy()

print(f"\nFinal feature matrix shape: {X.shape}")
print(f"Target variable shape: {y.shape}")
print("\nFeature columns:")
for i, col in enumerate(X.columns, 1):
    print(f"  {i:2d}. {col}")

### Checking for Multicollinearity

Before building our regression model, we should check for multicollinearity among predictor variables. High multicollinearity can make it difficult to determine the individual effect of each variable and can lead to unstable coefficient estimates.

We'll use the Variance Inflation Factor (VIF) to assess multicollinearity:

- **VIF < 5**: Low multicollinearity (acceptable)
- **VIF 5-10**: Moderate multicollinearity (caution needed)
- **VIF > 10**: High multicollinearity (problematic)
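The VIF for feature $j$ is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing that feature on all the other features plus an intercept. As a cross-check on statsmodels' `variance_inflation_factor`, this sketch computes the same quantity with NumPy on simulated data (the helper `vif_manual` and the data are hypothetical), where `x2` is built to be nearly a copy of `x1`:

```python
import numpy as np


def vif_manual(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        design = np.column_stack([np.ones(n), others])  # intercept + remaining features
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)


rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.2, size=500)  # strongly correlated with x1
x3 = rng.normal(size=500)                  # independent of both
vifs = vif_manual(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

With `x2` almost a copy of `x1`, both columns receive VIFs far above 10 while the independent `x3` stays near 1, which is the pattern the thresholds above are meant to flag.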

In [None]:
# Calculate Variance Inflation Factor (VIF) for multicollinearity check
def calculate_vif(dataframe):
    # variance_inflation_factor assumes the design matrix already contains an intercept;
    # without one, the VIFs are not centered and are usually overstated.
    with_const = sm.add_constant(dataframe, has_constant='add')
    exog = with_const.to_numpy(dtype=float)
    vif_data = pd.DataFrame()
    vif_data["Feature"] = dataframe.columns
    # Column 0 is the constant, so feature i sits at position i + 1
    vif_data["VIF"] = [variance_inflation_factor(exog, i + 1)
                       for i in range(dataframe.shape[1])]
    return vif_data.sort_values('VIF', ascending=False)

# Calculate VIF for all features
print("Variance Inflation Factor Analysis:")
print("=" * 40)
vif_results = calculate_vif(X)
print(vif_results)

print("\nVIF Interpretation:")
print("- VIF < 5:   Low multicollinearity (good)")
print("- VIF 5-10:  Moderate multicollinearity (caution)")
print("- VIF > 10:  High multicollinearity (problematic)")

# Flag high VIF variables
high_vif = vif_results[vif_results['VIF'] > 10]
if not high_vif.empty:
    print(f"\nWarning: {len(high_vif)} variables have high VIF (>10):")
    print(high_vif[['Feature', 'VIF']])
else:
    print("\nGood news: No variables have problematically high VIF values.")

## Step 5: Building the Multiple Linear Regression Model

### Train-Test Split

To properly evaluate our model's performance, we'll split our data into training and testing sets. The training set will be used to build the model, while the testing set will help us assess how well the model generalizes to new, unseen data.
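Conceptually, a shuffled hold-out split permutes the row positions and sets aside a fixed fraction as the test set. The sketch below illustrates that idea with NumPy; `random_split_indices` is a hypothetical helper, and because it uses NumPy's `default_rng` it will not reproduce the exact rows that `train_test_split(..., random_state=42)` selects:

```python
import numpy as np


def random_split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row positions and carve off a test fraction.

    A simplified illustration of a shuffled hold-out split, not scikit-learn's code.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    n_test = int(np.ceil(test_size * n_samples))  # round the test size up
    return shuffled[n_test:], shuffled[:n_test]    # (train positions, test positions)


train_idx, test_idx = random_split_indices(200, test_size=0.2)
print(len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which is why the notebook passes `random_state=42`.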

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=None
)

print("Data split completed:")
print("=" * 25)
print(f"Training set size: {X_train.shape[0]} samples ({X_train.shape[0]/len(X)*100:.1f}%)")
print(f"Testing set size:  {X_test.shape[0]} samples ({X_test.shape[0]/len(X)*100:.1f}%)")
print(f"Number of features: {X_train.shape[1]}")

print("\nTraining set statistics:")
print(f"Target variable (Sales) - Mean: {y_train.mean():.2f}, Std: {y_train.std():.2f}")
print("\nTesting set statistics:")
print(f"Target variable (Sales) - Mean: {y_test.mean():.2f}, Std: {y_test.std():.2f}")

### Model Training

Now we'll create and train our multiple linear regression model using scikit-learn's LinearRegression class. It fits the model by ordinary least squares, choosing the intercept and coefficients that minimize the sum of squared differences between actual and predicted sales on the training data.
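Ordinary least squares has a closed-form solution: prepend a column of ones for the intercept and solve the normal equations $(X^\top X)\hat\beta = X^\top y$. This sketch on simulated data (two hypothetical predictors with known true coefficients) shows that `np.linalg.lstsq` and the normal equations agree and recover the true values; it illustrates the same least-squares criterion `LinearRegression` optimizes, not scikit-learn's internal code:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 2))                  # two hypothetical predictors
true_beta = np.array([5.0, 2.0, -1.5])       # intercept, beta_1, beta_2
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n)

design = np.column_stack([np.ones(n), X])    # prepend the intercept column

# Least-squares solution; lstsq avoids explicitly inverting X^T X
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)

# Same answer from the normal equations (X^T X) beta = X^T y
beta_normal = np.linalg.solve(design.T @ design, design.T @ y)

print(np.round(beta_hat, 3))
```

In practice `lstsq` (or a library such as scikit-learn) is preferred over solving the normal equations directly, since forming $X^\top X$ squares the condition number and amplifies multicollinearity problems.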

In [None]:
# Create and train the multiple linear regression model
model = LinearRegression()

# Fit the model to training data
model.fit(X_train, y_train)

print("Multiple Linear Regression Model Training Complete!")
print("=" * 55)
print(f"Model intercept (β₀): {model.intercept_:.4f}")
print(f"Number of coefficients: {len(model.coef_)}")

print("\nModel equation:")
print(f"Sales = {model.intercept_:.4f}", end="")
for feature, coef in zip(X.columns, model.coef_):
    sign = "+" if coef >= 0 else ""
    print(f" {sign}{coef:.4f}*{feature}", end="")
print()

## Step 6: Model Evaluation and Performance Assessment

### Prediction and Performance Metrics

Let's evaluate how well our model performs on both training and testing data using key regression metrics.
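Both metrics come from the residual sum of squares (SSE). $R^2 = 1 - \text{SSE}/\text{SST}$ compares the model's squared errors with those of always predicting the mean, and $\text{RMSE} = \sqrt{\text{SSE}/n}$ is expressed in the same units as Sales. A small hand-checkable sketch with made-up numbers (`r2_and_rmse` is a hypothetical helper):

```python
import numpy as np


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) computed from the residual and total sums of squares."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    sst = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares around the mean
    return 1.0 - sse / sst, np.sqrt(sse / len(y_true))


r2, rmse = r2_and_rmse([3, 5, 7, 9], [2.5, 5.5, 7, 9.5])
print(f"R² = {r2:.4f}, RMSE = {rmse:.4f}")  # R² = 0.9625, RMSE = 0.4330
```

Here SSE = 0.75 and SST = 20, so $R^2 = 1 - 0.75/20 = 0.9625$. On test data $R^2$ can even be negative, which means the model predicts worse than the test-set mean would.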

In [None]:
# Make predictions on both training and testing sets
y_train_pred = model.predict(X_train)
y_test_pred = model.predict(X_test)

# Calculate performance metrics
train_r2 = r2_score(y_train, y_train_pred)
test_r2 = r2_score(y_test, y_test_pred)
train_rmse = np.sqrt(mean_squared_error(y_train, y_train_pred))
test_rmse = np.sqrt(mean_squared_error(y_test, y_test_pred))

print("MODEL PERFORMANCE METRICS")
print("=" * 40)
print(f"Training R² Score:    {train_r2:.4f}")
print(f"Testing R² Score:     {test_r2:.4f}")
print(f"Training RMSE:        {train_rmse:.4f}")
print(f"Testing RMSE:         {test_rmse:.4f}")

# Assess overfitting
r2_difference = abs(train_r2 - test_r2)
print(f"\nR² Difference:        {r2_difference:.4f}")

if r2_difference < 0.05:
    print("✓ Model shows good generalization (low overfitting)")
elif r2_difference < 0.10:
    print("⚠ Model shows moderate overfitting")
else:
    print("✗ Model shows significant overfitting")

print(f"\nModel explains {test_r2*100:.1f}% of sales variance in test data")

### Statistical Significance Testing

Using statsmodels, we can get detailed statistical information about our regression model, including p-values, confidence intervals, and other diagnostic statistics.
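The p-values in the statsmodels summary come from t-tests on each coefficient: the residual variance $\hat\sigma^2 = \text{SSE}/(n-k)$ gives the coefficient covariance $\hat\sigma^2 (X^\top X)^{-1}$, and each $t = \hat\beta_j / \text{SE}(\hat\beta_j)$ is compared with a t distribution on $n-k$ degrees of freedom. Below is a NumPy/SciPy sketch of that classical (non-robust) calculation on simulated data where only the first predictor has a real effect; `ols_t_tests` is a hypothetical helper name:

```python
import numpy as np
from scipy import stats


def ols_t_tests(X, y):
    """Classical OLS inference: coefficients, standard errors, t-stats, two-sided p-values.

    X should NOT include the intercept column; one is prepended here.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    k = design.shape[1]                               # parameters including the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = resid @ resid / (n - k)                  # unbiased residual variance
    cov_beta = sigma2 * np.linalg.inv(design.T @ design)
    se = np.sqrt(np.diag(cov_beta))
    t_stats = beta / se
    p_values = 2 * stats.t.sf(np.abs(t_stats), df=n - k)
    return beta, se, t_stats, p_values


rng = np.random.default_rng(7)
X = rng.normal(size=(100, 2))
y = 1.0 + 3.0 * X[:, 0] + rng.normal(size=100)       # second column has no true effect
beta, se, t_stats, p_values = ols_t_tests(X, y)
print(np.round(p_values, 4))
```

Applied to `X_train` and `y_train`, this should reproduce `sm_model.bse`, `sm_model.tvalues`, and `sm_model.pvalues` up to floating-point error (intercept first), since `OLS.fit()` uses the same non-robust covariance by default.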

In [None]:
# Use statsmodels for detailed statistical analysis
X_train_sm = sm.add_constant(X_train)  # Add intercept term
X_test_sm = sm.add_constant(X_test)

# Fit the model using statsmodels
sm_model = sm.OLS(y_train, X_train_sm).fit()

# Display comprehensive model summary
print("DETAILED STATISTICAL ANALYSIS")
print("=" * 50)
print(sm_model.summary())

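### Coefficient Analysis and Interpretation

Let's examine each coefficient in detail to understand which marketing channels have the strongest impact on sales. Each coefficient is the estimated change in sales for a one-unit increase in that feature, holding all other features constant; for a dummy variable, it is the difference from the baseline category. Because raw coefficients depend on each feature's units, their magnitudes are only directly comparable when the features share the same scale.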
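One common way to compare features measured in different units is to use standardized (beta) coefficients: multiply each raw coefficient by the feature's standard deviation and divide by the standard deviation of Sales. The sketch below is an illustration under assumptions, with a hypothetical helper and toy data in which the same Radio budget is recorded once in dollars and once in thousands of dollars. The raw coefficients are those of the exact fits `Sales = 8 + 0.002 * dollars` and `Sales = 8 + 2 * thousands`; they differ by a factor of 1000, but the standardized values match:

```python
import numpy as np
import pandas as pd


def standardized_coefficients(coefs, X, y):
    """Beta weights: raw coefficient * std(x_j) / std(y).

    They express effects in standard deviations, so features measured in
    different units can be compared on a common scale.
    """
    coefs = pd.Series(coefs, index=X.columns, dtype=float)
    y_std = np.std(np.asarray(y, dtype=float), ddof=1)
    return coefs * X.std(ddof=1) / y_std


# Hypothetical example: the same underlying relationship in two different units
X_toy = pd.DataFrame({"Radio_dollars": [1000.0, 2000.0, 3000.0, 4000.0],
                      "Radio_thousands": [1.0, 2.0, 3.0, 4.0]})
y_toy = pd.Series([10.0, 12.0, 14.0, 16.0])
raw = [0.002, 2.0]  # raw coefficient of each single-predictor fit

beta_weights = standardized_coefficients(raw, X_toy, y_toy)
print(beta_weights)
```

In this notebook the call would be `standardized_coefficients(model.coef_, X_train, y_train)`. For 0/1 dummy variables, the raw coefficient (difference from the baseline category) is usually easier to interpret than a standardized one.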

In [None]:
# Create a detailed coefficient analysis
coefficients_df = pd.DataFrame({
    'Feature': X.columns,
    'Coefficient': model.coef_,
    'Abs_Coefficient': np.abs(model.coef_)
}).sort_values('Abs_Coefficient', ascending=False)

print("COEFFICIENT ANALYSIS")
print("=" * 30)
print(coefficients_df)

print("\nCOEFFICIENT INTERPRETATION:")
print("-" * 35)
for _, row in coefficients_df.head().iterrows():
    direction = "increases" if row['Coefficient'] > 0 else "decreases"
    print(f"• {row['Feature']}: For each unit increase (other features held constant), "
          f"sales {direction} by {abs(row['Coefficient']):.4f}")

# Statistical significance from statsmodels
print("\nSTATISTICAL SIGNIFICANCE (p-values):")
print("-" * 40)
p_values = sm_model.pvalues.drop('const')  # Exclude the intercept by name rather than by position
for feature, p_val in p_values.items():
    significance = "***" if p_val < 0.001 else "**" if p_val < 0.01 else "*" if p_val < 0.05 else ""
    print(f"{feature:20}: p = {p_val:.4f} {significance}")

print("\nSignificance levels: *** p<0.001, ** p<0.01, * p<0.05")

### Model Visualization

Let's create visualizations to better understand our model's performance and the relationships it has captured.

In [None]:
# Create model performance visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Actual vs Predicted (Training)
axes[0,0].scatter(y_train, y_train_pred, alpha=0.6, color='blue')
axes[0,0].plot([y_train.min(), y_train.max()], [y_train.min(), y_train.max()], 'r--', lw=2)
axes[0,0].set_xlabel('Actual Sales')
axes[0,0].set_ylabel('Predicted Sales')
axes[0,0].set_title(f'Training Set: Actual vs Predicted\nR² = {train_r2:.3f}')

# 2. Actual vs Predicted (Testing)
axes[0,1].scatter(y_test, y_test_pred, alpha=0.6, color='green')
axes[0,1].plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=2)
axes[0,1].set_xlabel('Actual Sales')
axes[0,1].set_ylabel('Predicted Sales')
axes[0,1].set_title(f'Testing Set: Actual vs Predicted\nR² = {test_r2:.3f}')

# 3. Coefficient importance
top_features = coefficients_df.head(8)  # Show top 8 features
axes[1,0].barh(range(len(top_features)), top_features['Coefficient'])
axes[1,0].set_yticks(range(len(top_features)))
axes[1,0].set_yticklabels(top_features['Feature'])
axes[1,0].set_xlabel('Coefficient Value')
axes[1,0].set_title('Feature Coefficients (Impact on Sales)')
axes[1,0].axvline(x=0, color='red', linestyle='--', alpha=0.7)

# 4. Residuals distribution
residuals = y_test - y_test_pred
axes[1,1].hist(residuals, bins=15, alpha=0.7, edgecolor='black')
axes[1,1].set_xlabel('Residuals')
axes[1,1].set_ylabel('Frequency')
axes[1,1].set_title('Distribution of Residuals')
axes[1,1].axvline(x=0, color='red', linestyle='--', alpha=0.7)

plt.tight_layout()
plt.show()

## Step 7: Residual Analysis and Model Assumptions

### Checking Linear Regression Assumptions

Multiple linear regression relies on several key assumptions. Let's test these assumptions using residual analysis:

1. **Linearity**: The relationship between predictors and target should be linear
2. **Independence**: Residuals should be independent of each other
3. **Homoscedasticity**: Residuals should have constant variance
4. **Normality**: Residuals should be approximately normally distributed
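For the independence assumption, a common numeric companion to a residual-order plot is the Durbin-Watson statistic, $\text{DW} = \sum_{t=2}^{n} (e_t - e_{t-1})^2 / \sum_{t=1}^{n} e_t^2$. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A minimal NumPy sketch follows (`durbin_watson_stat` is a hypothetical name; statsmodels provides `durbin_watson` in `statsmodels.stats.stattools`, and the value also appears in the OLS summary):

```python
import numpy as np


def durbin_watson_stat(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2) for residuals in their natural order."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))


dw_alternating = durbin_watson_stat([1, -1, 1, -1])
print(dw_alternating)  # 3.0 (alternating signs point to negative autocorrelation)
```

The statistic is only meaningful when the residuals are in a natural order such as time, so apply it to residuals sorted by the original row order rather than the shuffled split order.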

In [None]:
# Comprehensive residual analysis
residuals_train = y_train - y_train_pred
residuals_test = y_test - y_test_pred

print("RESIDUAL ANALYSIS SUMMARY")
print("=" * 35)
print(f"Training residuals - Mean: {residuals_train.mean():.6f}, Std: {residuals_train.std():.4f}")
print(f"Testing residuals  - Mean: {residuals_test.mean():.6f}, Std: {residuals_test.std():.4f}")

# Test for normality using Shapiro-Wilk test
from scipy.stats import shapiro

stat_train, p_train = shapiro(residuals_train)
stat_test, p_test = shapiro(residuals_test)

print("\nNormality Test (Shapiro-Wilk):")
print(f"Training set: statistic = {stat_train:.4f}, p-value = {p_train:.4f}")
print(f"Testing set:  statistic = {stat_test:.4f}, p-value = {p_test:.4f}")

# A large p-value means no evidence against normality; it does not prove the residuals are normal
if p_test > 0.05:
    print("✓ No evidence against normality of residuals (p > 0.05)")
else:
    print("⚠ Residuals may not be normally distributed (p ≤ 0.05)")

In [None]:
# Create comprehensive residual plots
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# 1. Residuals vs Fitted Values (Homoscedasticity check)
axes[0,0].scatter(y_test_pred, residuals_test, alpha=0.6)
axes[0,0].axhline(y=0, color='red', linestyle='--')
axes[0,0].set_xlabel('Fitted Values')
axes[0,0].set_ylabel('Residuals')
axes[0,0].set_title('Residuals vs Fitted Values\n(Check for Homoscedasticity)')

# 2. Q-Q Plot (Normality check)
from scipy.stats import probplot
probplot(residuals_test, dist="norm", plot=axes[0,1])
axes[0,1].set_title('Q-Q Plot of Residuals\n(Check for Normality)')

# 3. Histogram of residuals
axes[1,0].hist(residuals_test, bins=15, alpha=0.7, edgecolor='black', density=True)
axes[1,0].set_xlabel('Residuals')
axes[1,0].set_ylabel('Density')
axes[1,0].set_title('Distribution of Residuals')

# Overlay normal distribution
x_norm = np.linspace(residuals_test.min(), residuals_test.max(), 100)
y_norm = stats.norm.pdf(x_norm, residuals_test.mean(), residuals_test.std())
axes[1,0].plot(x_norm, y_norm, 'r-', linewidth=2, label='Normal Distribution')
axes[1,0].legend()

# 4. Residuals vs Order (Independence check)
# train_test_split shuffles rows, so sort by the original index to restore row order;
# this check is only informative if the rows have a natural order (e.g., time)
residuals_test_ordered = residuals_test.sort_index()
axes[1,1].plot(range(len(residuals_test_ordered)), residuals_test_ordered.values, 'o-', alpha=0.6)
axes[1,1].axhline(y=0, color='red', linestyle='--')
axes[1,1].set_xlabel('Observation Order')
axes[1,1].set_ylabel('Residuals')
axes[1,1].set_title('Residuals vs Order\n(Check for Independence)')

plt.tight_layout()
plt.show()
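The residuals-vs-fitted plot checks homoscedasticity by eye; the Breusch-Pagan test gives a numeric counterpart. The sketch below implements the $nR^2$ (Koenker, studentized) form: regress the squared residuals on the predictors plus an intercept and compare $LM = nR^2$ with a chi-squared distribution whose degrees of freedom equal the number of predictors. The helper name and simulated data are hypothetical; statsmodels offers `het_breuschpagan` in `statsmodels.stats.diagnostic` for real use:

```python
import numpy as np
from scipy import stats


def breusch_pagan_lm(residuals, X):
    """Breusch-Pagan LM test (n * R^2 form): regress squared residuals on X plus an
    intercept; LM is approximately chi2 with df = number of columns in X."""
    e2 = np.asarray(residuals, dtype=float) ** 2
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(design, e2, rcond=None)
    fitted = design @ coef
    r2 = 1.0 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    lm = n * r2
    p_value = stats.chi2.sf(lm, df=X.shape[1])
    return lm, p_value


rng = np.random.default_rng(3)
x = rng.uniform(1, 10, size=300)
resid_hetero = rng.normal(scale=x)             # spread grows with x
resid_homo = rng.normal(scale=2.0, size=300)   # constant spread
_, p_hetero = breusch_pagan_lm(resid_hetero, x.reshape(-1, 1))
_, p_homo = breusch_pagan_lm(resid_homo, x.reshape(-1, 1))
print(f"heteroscedastic p = {p_hetero:.2e}, homoscedastic p = {p_homo:.3f}")
```

A small p-value (as for `resid_hetero`, whose spread grows with `x`) is evidence of heteroscedasticity; in the notebook, the call would be `breusch_pagan_lm(residuals_train, X_train)`.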

## Step 8: Business Insights and Recommendations

### Marketing Channel Effectiveness Analysis

Based on our multiple linear regression analysis, let's translate the statistical findings into actionable business insights.
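Because the model is linear, the change in predicted sales from a budget change is simply $\Delta\hat{y} = \sum_j \beta_j \Delta x_j$. The sketch below applies that to a hypothetical reallocation; the coefficients are invented for illustration and are not estimates from this dataset, and the result is only a model-implied association that is most trustworthy for budgets within the range observed in the data:

```python
import pandas as pd


def predicted_sales_change(coefficients, budget_change):
    """Change in predicted sales implied by a linear model: sum_j beta_j * delta_x_j.

    Channels absent from budget_change are treated as unchanged (delta = 0);
    channels in budget_change that are not model features are ignored.
    """
    coefs = pd.Series(coefficients, dtype=float)
    delta = pd.Series(budget_change, dtype=float).reindex(coefs.index, fill_value=0.0)
    return float((coefs * delta).sum())


# Hypothetical coefficients (sales units per budget unit), NOT estimates from this dataset
hypothetical_coefs = {"Radio": 3.0, "Social Media": 1.5}

# Move 10 budget units from Social Media to Radio
shift = {"Radio": 10.0, "Social Media": -10.0}
change = predicted_sales_change(hypothetical_coefs, shift)
print(change)  # 15.0
```

With the fitted model you could pass `dict(zip(X.columns, model.coef_))` as the coefficients.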

In [None]:
# Generate business insights based on model results
print("BUSINESS INSIGHTS AND RECOMMENDATIONS")
print("=" * 50)

# Identify most impactful marketing channels
top_positive = coefficients_df[coefficients_df['Coefficient'] > 0].head(3)
top_negative = coefficients_df[coefficients_df['Coefficient'] < 0].head(3)

print("\n🎯 MOST EFFECTIVE MARKETING CHANNELS:")
print("-" * 40)
for i, (_, row) in enumerate(top_positive.iterrows(), 1):
    impact = row['Coefficient']
    print(f"{i}. {row['Feature']}: +{impact:.4f} sales impact per unit")

if not top_negative.empty:
    print("\n⚠️  CHANNELS WITH NEGATIVE IMPACT:")
    print("-" * 35)
    for i, (_, row) in enumerate(top_negative.iterrows(), 1):
        impact = abs(row['Coefficient'])
        print(f"{i}. {row['Feature']}: -{impact:.4f} sales impact per unit")

# Model performance summary
print("\n📊 MODEL PERFORMANCE SUMMARY:")
print("-" * 30)
print(f"• Model explains {test_r2*100:.1f}% of sales variance")
print(f"• Typical prediction error (RMSE): {test_rmse:.2f} sales units")
print(f"• Model generalization: {'Good' if r2_difference < 0.05 else 'Moderate' if r2_difference < 0.10 else 'Poor'}")

In [None]:
# Strategic recommendations based on analysis
print("\n💡 STRATEGIC RECOMMENDATIONS:")
print("-" * 35)

# Recommendation 1: Budget allocation
if not top_positive.empty:
    best_channel = top_positive.iloc[0]['Feature']
    best_impact = top_positive.iloc[0]['Coefficient']
    print(f"1. PRIORITIZE {best_channel.upper()}")
    # The coefficient is a marginal association (sales change per unit, others held constant), not a measured ROI
    print(f"   • Largest positive coefficient: +{best_impact:.4f} sales per unit increase")
    print("   • Consider increasing budget allocation to this channel")

# Recommendation 2: Model reliability
if test_r2 > 0.7:
    print("\n2. HIGH MODEL CONFIDENCE")
    print(f"   • R² = {test_r2:.3f} indicates strong predictive power")
    print("   • Use this model for budget planning and forecasting")
elif test_r2 > 0.5:
    print("\n2. MODERATE MODEL CONFIDENCE")
    print(f"   • R² = {test_r2:.3f} shows reasonable predictive ability")
    print("   • Consider collecting additional data to improve accuracy")
else:
    print("\n2. LOW MODEL CONFIDENCE")
    print(f"   • R² = {test_r2:.3f} suggests limited predictive power")
    print("   • Investigate additional variables or non-linear relationships")

# Recommendation 3: Data quality (based on the residual normality test only)
print("\n3. DATA QUALITY ASSESSMENT")
if p_test > 0.05:
    print("   • ✓ Residual normality check passed (Shapiro-Wilk p > 0.05)")
    print("   • Review the residual plots for the remaining assumptions before relying on the results")
else:
    print("   • ⚠ The residual normality assumption may be violated")
    print("   • Consider data transformation or alternative modeling approaches")

print("\n4. NEXT STEPS")
print("   • Monitor actual vs predicted performance")
print("   • Update model quarterly with new data")
print("   • Test model predictions with A/B experiments")

## Conclusion

### Summary of Multiple Linear Regression Analysis

This comprehensive analysis has demonstrated how to perform multiple linear regression on marketing sales data. Here are the key takeaways:

#### **What We Accomplished:**

- ✅ **Data Exploration**: Thoroughly examined the dataset structure, distributions, and relationships
- ✅ **Data Preparation**: Cleaned data, handled categorical variables, and checked for multicollinearity
- ✅ **Model Building**: Created and trained a multiple linear regression model
- ✅ **Statistical Analysis**: Evaluated model performance and statistical significance
- ✅ **Assumption Testing**: Checked regression assumptions through residual analysis
- ✅ **Business Insights**: Translated statistical findings into actionable recommendations

#### **Key Statistical Findings:**

- Model performance metrics (R², RMSE) indicate the model's predictive accuracy
- Coefficient analysis reveals which marketing channels have the strongest impact on sales
- Statistical significance testing identifies which relationships are unlikely to be due to chance alone
- Residual analysis indicates whether regression assumptions are reasonably met

#### **Business Value:**

This analysis provides a data-driven foundation for:

- **Budget Allocation**: Prioritize marketing channels with the highest sales impact
- **Performance Forecasting**: Predict sales based on marketing spend
- **Strategic Planning**: Make informed decisions about marketing investments
- **ROI Optimization**: Focus resources on the most effective channels

#### **Model Limitations and Considerations:**

- Linear regression assumes linear relationships between variables
- Model performance depends on data quality and completeness
- External factors not captured in the data may influence sales
- Coefficients describe associations in historical data, not guaranteed causal effects
- Regular model updates are needed as market conditions change

### Next Steps for Implementation

1. **Validate Results**: Test model predictions against actual outcomes
2. **Monitor Performance**: Track model accuracy over time
3. **Expand Analysis**: Consider additional variables or advanced modeling techniques
4. **Automate Process**: Set up regular model retraining with new data

This analysis demonstrates the power of multiple linear regression for understanding complex business relationships and making data-driven decisions in marketing strategy.