# 🚀 Linear Regression Workspace Setup

Welcome to the **Simple Linear Regression: Sales vs Advertising Analysis** project workspace!

This notebook will guide you through setting up your development environment for analyzing the linear relationship between Sales and Advertising data using Python and Scikit-learn.

## 📋 What We'll Cover:
1. Install Required Packages
2. Configure Development Environment
3. Set Up Project Structure
4. Initialize Version Control
5. Configure Environment Variables
6. Test Installation and Setup

---

## 1. 📦 Install Required Packages

First, let's install all the essential Python packages needed for our Linear Regression analysis:

- **numpy**: Numerical computations
- **pandas**: Data manipulation and analysis
- **scikit-learn**: Machine learning library
- **matplotlib**: Data visualization
- **seaborn**: Statistical visualization
- **jupyter**: Interactive notebook environment

In [None]:
# Install required packages (restart the kernel afterwards if any of them were already imported)
import subprocess
import sys

def install_package(package):
    """Install a package using pip."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

# List of required packages
packages = [
    "numpy>=1.21.0",
    "pandas>=1.3.0", 
    "scikit-learn>=1.0.0",
    "matplotlib>=3.5.0",
    "seaborn>=0.11.0",
    "jupyter>=1.0.0"
]

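print("Installing required packages...")
failed = []
for package in packages:
    try:
        install_package(package)
        print(f"✅ Successfully installed: {package}")
    except Exception as e:
        failed.append(package)
        print(f"❌ Failed to install {package}: {e}")

# Only report success when every package actually installed
if failed:
    print(f"\n⚠️ Installation finished with {len(failed)} failure(s): {', '.join(failed)}")
else:
    print("\n🎉 Package installation complete!")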

In [None]:
# Verify installed packages and their versions
import numpy as np
import pandas as pd
import sklearn
import matplotlib
import seaborn as sns

print("📋 Installed Package Versions:")
print("=" * 40)
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Scikit-learn: {sklearn.__version__}")
print(f"Matplotlib: {matplotlib.__version__}")
print(f"Seaborn: {sns.__version__}")

print("\n✅ All packages successfully imported!")
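
Importing every library just to read `__version__` works, but it stops at the first import error. The standard-library module `importlib.metadata` (Python 3.8+) reads versions straight from installed package metadata instead, so it can also report packages that are missing. A minimal sketch; `installed_versions` is a hypothetical helper name, and the list uses PyPI distribution names (`scikit-learn`, not the import name `sklearn`).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None if it is missing."""
    report = {}
    for name in distributions:
        try:
            report[name] = version(name)
        except PackageNotFoundError:
            report[name] = None
    return report


# Distribution names as published on PyPI (assumed to match the install cell above)
required = ["numpy", "pandas", "scikit-learn", "matplotlib", "seaborn", "jupyter"]
for name, ver in installed_versions(required).items():
    print(f"{name:<14} {ver if ver else 'NOT INSTALLED'}")
```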

## 2. 🔧 Configure Development Environment

Let's configure plotting, table-display, and warning defaults so that output in the rest of the notebook is consistent and readable.

In [None]:
# Configure matplotlib, seaborn, and pandas display defaults
import warnings

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Set up matplotlib settings
plt.rcParams['figure.figsize'] = (12, 8)
plt.rcParams['font.size'] = 12
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.titlesize'] = 14
plt.rcParams['xtick.labelsize'] = 10
plt.rcParams['ytick.labelsize'] = 10
plt.rcParams['legend.fontsize'] = 10

# Configure seaborn style
sns.set_style("whitegrid")
sns.set_palette("husl")

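# Suppress warnings for cleaner output (this also hides deprecation warnings,
# so remove this line when debugging or upgrading libraries)
warnings.filterwarnings('ignore')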

# Set pandas display options
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 100)

print("🎨 Development environment configured!")
print("✅ Matplotlib settings optimized")
print("✅ Seaborn style applied")
print("✅ Pandas display options set")
print("✅ Warnings filtered")
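
`warnings.filterwarnings('ignore')` silences every warning for the rest of the session, including deprecation notices that flag code likely to break after a library upgrade. A narrower option is to suppress a single warning category only around a specific call, using the standard-library `warnings.catch_warnings()` context manager. A minimal sketch; `call_quietly` and `noisy` are hypothetical names for illustration.

```python
import warnings


def call_quietly(func, *args, category=FutureWarning, **kwargs):
    """Run func with warnings of `category` ignored; the previous filters are restored on exit."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=category)
        return func(*args, **kwargs)


def noisy():
    # Stand-in for a library call that emits a FutureWarning
    warnings.warn("this API will change", FutureWarning)
    return "done"


print(call_quietly(noisy))  # the FutureWarning is suppressed; prints "done"
```

Other warning categories, and the same call made outside `call_quietly`, still produce visible warnings.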

## 3. 📁 Set Up Project Structure

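Let's inspect the current project structure for the Linear Regression analysis. The cell below prints the directory tree up to four levels below the project root, skipping hidden entries other than `.github` and `.vscode`.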

In [None]:
import os

# Display current project structure
def display_project_structure(path, prefix="", max_depth=3, current_depth=0):
    """Print a tree view of `path`, skipping hidden entries except .github and .vscode."""
    if current_depth > max_depth:
        return

    try:
        entries = sorted(os.listdir(path))
    except PermissionError:
        print(f"{prefix}└── [permission denied]")
        return

    # Filter hidden entries first so the last *visible* item gets the └── connector
    items = [item for item in entries
             if not item.startswith('.') or item in ('.github', '.vscode')]
    for i, item in enumerate(items):
        item_path = os.path.join(path, item)
        is_last = i == len(items) - 1

        print(f"{prefix}{'└── ' if is_last else '├── '}{item}")

        if os.path.isdir(item_path) and current_depth < max_depth:
            extension = "    " if is_last else "│   "
            display_project_structure(item_path, prefix + extension, max_depth, current_depth + 1)

print("📂 Current Project Structure:")
print("=" * 40)
project_root = os.getcwd()
print(f"📁 {os.path.basename(project_root)}/")
display_project_structure(project_root)
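
If the tree above is missing the folders you want, they can be created in one step with `pathlib`. A minimal sketch; the folder names in `PROJECT_DIRS` are an assumed layout (not something this project prescribes), and `create_project_dirs` is a hypothetical helper. It is safe to re-run because `mkdir(exist_ok=True)` leaves existing folders untouched.

```python
from pathlib import Path

# Hypothetical layout; rename or extend to match your own conventions
PROJECT_DIRS = ["data/raw", "data/processed", "notebooks", "src", "outputs/figures"]


def create_project_dirs(root, subdirs=PROJECT_DIRS):
    """Create each subdirectory under root (parents included) and return their paths."""
    root = Path(root)
    created = []
    for sub in subdirs:
        path = root / sub
        path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


# Usage in the notebook (creates folders in the current working directory):
# create_project_dirs(Path.cwd())
```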

## 4. 🔄 Initialize Version Control

Let's check whether Git is available and whether this project is already under version control.

In [None]:
# Check Git availability and status
import os
import subprocess

def run_command(command):
    """Run a shell command and return the result."""
    try:
        result = subprocess.run(command, shell=True, capture_output=True, text=True)
        return result.returncode == 0, result.stdout.strip(), result.stderr.strip()
    except Exception as e:
        return False, "", str(e)

# Check if Git is installed
success, output, error = run_command("git --version")
if success:
    print(f"✅ Git is installed: {output}")
else:
    print("❌ Git is not installed or not in PATH")

# Check if we're in a Git repository
success, output, error = run_command("git status")
if success:
    print("✅ Already in a Git repository")
    print("📋 Current Git status:")
    print(output)
else:
    print("📝 Not in a Git repository")
    print("💡 You can initialize Git later with: git init")

# Show current directory for reference
print(f"\n📍 Current working directory: {os.getcwd()}")
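
After running `git init`, a `.gitignore` keeps notebook checkpoints, bytecode caches, virtual environments, and local secrets out of commits. A minimal sketch; the patterns in `DEFAULT_IGNORES` are common choices for Python projects but are an assumption here, and `ensure_gitignore` is a hypothetical helper that only appends patterns not already present.

```python
from pathlib import Path

# Assumed starter patterns for a Python data-science project; adjust as needed
DEFAULT_IGNORES = ["__pycache__/", "*.pyc", ".ipynb_checkpoints/", ".venv/", ".env"]


def ensure_gitignore(repo_root, patterns=DEFAULT_IGNORES):
    """Append any missing patterns to <repo_root>/.gitignore and return the ones added."""
    gitignore = Path(repo_root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    existing = {line.strip() for line in text.splitlines()}
    missing = [p for p in patterns if p not in existing]
    if missing:
        # Start on a fresh line if the existing file lacks a trailing newline
        prefix = "\n" if text and not text.endswith("\n") else ""
        with gitignore.open("a") as fh:
            fh.write(prefix + "\n".join(missing) + "\n")
    return missing


# Usage in the notebook: ensure_gitignore(Path.cwd())
```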

## 5. ⚙️ Configure Environment Variables

Let's set up environment variables and configuration for our machine learning project.

In [None]:
# Set up environment variables for the project
import os
import numpy as np

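# Seed NumPy's global random generator for reproducibility
# (scikit-learn functions such as train_test_split take their own random_state)
RANDOM_SEED = 42
np.random.seed(RANDOM_SEED)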

# Project configuration
PROJECT_CONFIG = {
    'PROJECT_NAME': 'Linear Regression Analysis',
    'PROJECT_VERSION': '1.0.0',
    'AUTHOR': 'Data Scientist',
    'DATA_SIZE': 100,
    'TEST_SIZE': 0.2,
    'RANDOM_SEED': RANDOM_SEED
}

# Display configuration
print("🔧 Project Configuration:")
print("=" * 40)
for key, value in PROJECT_CONFIG.items():
    print(f"{key}: {value}")

# Set environment variables (for this session)
for key, value in PROJECT_CONFIG.items():
    os.environ[key] = str(value)

print("\n✅ Environment variables configured!")
print("🎯 Random seed set for reproducible results")
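
Values stored in `os.environ` are always strings, so after the loop above `os.environ['TEST_SIZE']` returns `'0.2'`, not `0.2`. They also exist only for the current kernel process and any subprocesses it starts afterwards. A minimal sketch of reading them back with type conversion; `env_value` is a hypothetical helper name.

```python
import os


def env_value(key, default, cast=str):
    """Return os.environ[key] converted with `cast`, or `default` if the key is unset."""
    raw = os.environ.get(key)
    return default if raw is None else cast(raw)


# Numeric settings must be converted back from their string form
test_size = env_value("TEST_SIZE", 0.2, float)
random_seed = env_value("RANDOM_SEED", 42, int)
print(f"TEST_SIZE={test_size!r}, RANDOM_SEED={random_seed!r}")
```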

## 6. 🧪 Test Installation and Setup

Let's run comprehensive tests to verify that our workspace setup is working correctly.

In [None]:
# Comprehensive testing of all components
print("🧪 Running Comprehensive Tests...")
print("=" * 50)

# Test 1: Import all required libraries
try:
    import numpy as np
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    import matplotlib.pyplot as plt
    import seaborn as sns
    print("✅ Test 1 PASSED: All libraries imported successfully")
except ImportError as e:
    print(f"❌ Test 1 FAILED: Import error - {e}")

# Test 2: Create sample data
try:
    np.random.seed(42)
    X = np.random.uniform(10, 100, 50).reshape(-1, 1)
    y = 2.5 * X.flatten() + 15 + np.random.normal(0, 5, 50)
    print("✅ Test 2 PASSED: Sample data created successfully")
except Exception as e:
    print(f"❌ Test 2 FAILED: Data creation error - {e}")

# Test 3: Train/test split
try:
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    print("✅ Test 3 PASSED: Data split successful")
except Exception as e:
    print(f"❌ Test 3 FAILED: Data split error - {e}")

# Test 4: Model training
try:
    model = LinearRegression()
    model.fit(X_train, y_train)
    print("✅ Test 4 PASSED: Model training successful")
except Exception as e:
    print(f"❌ Test 4 FAILED: Model training error - {e}")

# Test 5: Predictions and metrics
try:
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    r2 = r2_score(y_test, y_pred)
    print("✅ Test 5 PASSED: Predictions and metrics calculated")
    print(f"   MSE: {mse:.2f}, R²: {r2:.2f}")
except Exception as e:
    print(f"❌ Test 5 FAILED: Prediction error - {e}")

# Test 6: Visualization
try:
    plt.figure(figsize=(8, 6))
    plt.scatter(X_test, y_test, alpha=0.6, label='Actual')
    plt.scatter(X_test, y_pred, alpha=0.6, label='Predicted')
    plt.xlabel('X')
    plt.ylabel('y')
    plt.title('Test Visualization')
    plt.legend()
    plt.show()
    print("✅ Test 6 PASSED: Visualization created successfully")
except Exception as e:
    print(f"❌ Test 6 FAILED: Visualization error - {e}")

print("\n🎉 All tests completed!")
print("🚀 If every test above passed, your Linear Regression workspace is ready to use!")
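
For a single feature, the ordinary least-squares fit that Test 4 delegates to `LinearRegression` has a closed form: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope · x̄. The MSE and R² from Test 5 are the mean squared residual and 1 − SS_res / SS_tot. The dependency-free sketch below computes these by hand so the scikit-learn results can be cross-checked. `fit_simple_ols` and `mse_and_r2` are hypothetical names, and the data is a noise-free version of the line used in Test 2.

```python
def fit_simple_ols(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def mse_and_r2(ys, preds):
    """Mean squared error and R² (assumes ys is not constant)."""
    n = len(ys)
    y_mean = sum(ys) / n
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return ss_res / n, 1 - ss_res / ss_tot


xs = [10, 20, 30, 40, 50]
ys = [2.5 * x + 15 for x in xs]  # noise-free version of Test 2's line
b0, b1 = fit_simple_ols(xs, ys)
mse, r2 = mse_and_r2(ys, [b0 + b1 * x for x in xs])
print(f"intercept={b0:.2f}, slope={b1:.2f}, MSE={mse:.2f}, R²={r2:.2f}")
# prints: intercept=15.00, slope=2.50, MSE=0.00, R²=1.00
```

On the same data, a fitted `LinearRegression` should report matching `intercept_` and `coef_[0]` values up to floating-point rounding.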

## 🎯 Workspace Setup Complete!

Congratulations! Your Linear Regression analysis workspace is now fully configured and ready to use.

### 📋 What's Available:

1. **Main Analysis Script**: `linear_regression_analysis.py`
   - Complete implementation of Simple Linear Regression
   - Data generation, exploration, and visualization
   - Model training, evaluation, and business insights

2. **VS Code Task**: Use `Ctrl+Shift+P` (`Cmd+Shift+P` on macOS) → "Tasks: Run Task" → "Run Linear Regression Analysis"

3. **All Dependencies Installed**:
   - ✅ NumPy for numerical operations
   - ✅ Pandas for data manipulation
   - ✅ Scikit-learn for machine learning
   - ✅ Matplotlib & Seaborn for visualization
   - ✅ Jupyter for interactive analysis

### 🚀 Next Steps:

1. **Run the main script**: Execute the VS Code task or run `python linear_regression_analysis.py`
2. **Explore the code**: Open `linear_regression_analysis.py` to understand the implementation
3. **Modify for your data**: Replace the sample data generation with your actual dataset
4. **Experiment**: Try different parameters and explore advanced regression techniques
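
When swapping in your own data, scikit-learn expects the single feature as a 2-D structure (one row per sample) and the target as 1-D. In the notebook the usual route is `pd.read_csv(...)` followed by `df[["Advertising"]]` and `df["Sales"]`. The dependency-free sketch below shows the same shapes using the standard-library `csv` module. The column names `Advertising` and `Sales` are assumptions about your file, `load_xy` is a hypothetical helper, and the inline rows are made-up illustration values, not real data.

```python
import csv
import io


def load_xy(csv_file, x_col="Advertising", y_col="Sales"):
    """Read two numeric columns from a CSV file object into (X, y).

    X is a list of one-element rows, the 2-D shape fit() expects for one feature.
    """
    X, y = [], []
    for row in csv.DictReader(csv_file):
        X.append([float(row[x_col])])
        y.append(float(row[y_col]))
    return X, y


# Inline illustration; with a real file use: open("your_data.csv", newline="")
sample = io.StringIO("Advertising,Sales\n10,42\n20,63\n30,91\n")
X, y = load_xy(sample)
print(X, y)  # [[10.0], [20.0], [30.0]] [42.0, 63.0, 91.0]
```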

### 💡 Tips:
- Use the VS Code integrated terminal for running scripts
- The project follows Python best practices and includes comprehensive documentation
- All random seeds are set for reproducible results
- The code includes error handling and informative outputs

**Happy coding! 🎉**

## 7. 🧪 Run Component-Level Tests

Let's run a second, function-based test suite that exercises each library in isolation and reports a pass/fail summary.

In [None]:
# Comprehensive test of all components
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

def test_installation():
    """Test all major components of our setup."""
    tests = []
    
    # Test 1: NumPy operations
    try:
        arr = np.array([1, 2, 3, 4, 5])
        mean_val = np.mean(arr)
        tests.append(("NumPy operations", True, f"Mean calculated: {mean_val}"))
    except Exception as e:
        tests.append(("NumPy operations", False, str(e)))
    
    # Test 2: Pandas DataFrame
    try:
        df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
        tests.append(("Pandas DataFrame", True, f"Shape: {df.shape}"))
    except Exception as e:
        tests.append(("Pandas DataFrame", False, str(e)))
    
    # Test 3: Scikit-learn model
    try:
        # Create sample data
        X = np.array([[1], [2], [3], [4], [5]])
        y = np.array([2, 4, 6, 8, 10])
        
        # Train model
        model = LinearRegression()
        model.fit(X, y)
        
        # Make prediction
        pred = model.predict([[6]])
        tests.append(("Scikit-learn model", True, f"Prediction for x=6: {pred[0]:.2f}"))
    except Exception as e:
        tests.append(("Scikit-learn model", False, str(e)))
    
    # Test 4: Matplotlib plotting
    try:
        fig, ax = plt.subplots(figsize=(6, 4))
        ax.plot([1, 2, 3], [1, 4, 9])
        ax.set_title("Test Plot")
        plt.close(fig)  # Close to avoid display
        tests.append(("Matplotlib plotting", True, "Plot created successfully"))
    except Exception as e:
        tests.append(("Matplotlib plotting", False, str(e)))
    
    return tests

# Run tests
print("🧪 Running Installation Tests...")
print("=" * 50)

test_results = test_installation()
passed = 0
total = len(test_results)

for test_name, success, message in test_results:
    status = "✅ PASS" if success else "❌ FAIL"
    print(f"{status} | {test_name}")
    print(f"     {message}")
    if success:
        passed += 1

print("\n" + "=" * 50)
print(f"📊 Test Results: {passed}/{total} tests passed")

if passed == total:
    print("🎉 All tests passed! Your workspace is ready!")
    print("\n🚀 Next Steps:")
    print("1. Run 'python linear_regression_analysis.py' to start the analysis")
    print("2. Or continue with the interactive analysis in this notebook")
else:
    print("⚠️  Some tests failed. Please check the error messages above.")

---

## 🎯 Workspace Setup Complete!

Congratulations! Your Linear Regression analysis workspace is now fully configured and ready for use.

### 📋 What You Have:
- ✅ All required Python packages installed
- ✅ Development environment optimized
- ✅ Project structure organized
- ✅ Version control checked
- ✅ Environment variables configured
- ✅ Installation tested and verified

### 🚀 Ready to Analyze:
Your workspace now includes:
- **`linear_regression_analysis.py`** - Complete implementation
- **`requirements.txt`** - All dependencies listed
- **`README.md`** - Comprehensive documentation
- **`LICENSE`** - MIT License for open source
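
If `requirements.txt` is missing or drifts from the install cell at the top of this notebook, it can be regenerated from the same version floors. A minimal sketch; `write_requirements` is a hypothetical helper and it overwrites the target file. The result is consumed with `pip install -r requirements.txt`; running `pip freeze > requirements.txt` instead would pin the exact versions currently installed.

```python
from pathlib import Path

# Same version floors as the pip-install cell in section 1
PACKAGES = [
    "numpy>=1.21.0",
    "pandas>=1.3.0",
    "scikit-learn>=1.0.0",
    "matplotlib>=3.5.0",
    "seaborn>=0.11.0",
    "jupyter>=1.0.0",
]


def write_requirements(path, packages=PACKAGES):
    """Write one requirement specifier per line (overwrites an existing file)."""
    Path(path).write_text("\n".join(packages) + "\n")


# Usage in the notebook: write_requirements("requirements.txt")
```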

### 📊 Next Steps:
1. **Run the analysis**: Execute the main script to see the complete linear regression workflow
2. **Explore the data**: Modify parameters and experiment with different datasets
3. **Extend the model**: Add features like polynomial regression or multiple variables
4. **Visualize results**: Create additional plots and insights
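
Polynomial regression can reuse the same `LinearRegression` model: expand the single feature into powers of itself, and the model stays linear in its coefficients while fitting a curve in x. For one input column, `sklearn.preprocessing.PolynomialFeatures(degree=d, include_bias=False)` produces the columns [x, x², ..., x^d]. The dependency-free sketch below shows that expansion; `polynomial_row` is a hypothetical name.

```python
def polynomial_row(x, degree):
    """Expand one scalar feature into [x, x**2, ..., x**degree]."""
    if degree < 1:
        raise ValueError("degree must be >= 1")
    return [x ** d for d in range(1, degree + 1)]


X_poly = [polynomial_row(x, 3) for x in [1.0, 2.0, 3.0]]
print(X_poly)  # [[1.0, 1.0, 1.0], [2.0, 4.0, 8.0], [3.0, 9.0, 27.0]]
```

Higher degrees fit the training data more closely but can overfit, so compare R² on the held-out test split rather than on the training data.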

**Happy analyzing! 🎉**