# Week 1: Introduction to Statistical Learning & Python Basics

### Part 1: Introduction to Statistical Learning


1. **What is Statistical Learning?**
   - **Goal**: Build models that capture the relationship between variables (features) and predict new outcomes or provide insight into how different variables are related.
   - **Supervised Learning**: Input features (X) and corresponding outputs (Y) are used to train models. Examples include regression and classification.
   - **Unsupervised Learning**: No labeled output (Y). Examples include clustering and dimensionality reduction.

2. **Key Concepts**
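   - **Training vs. Test Sets**: A model is fit on one subset of the data (the training set) and evaluated on held-out data it has not seen (the test set) to estimate how well it generalizes.
   - **Bias-Variance Tradeoff**: A model that is too simple underfits (high bias), while a model that is too complex overfits by tracking noise in the training data (high variance). A good model balances the two.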
   - **Prediction vs. Inference**: Predicting future observations vs. understanding relationships between variables.
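
To make the training/test split and the bias-variance tradeoff concrete, here is a minimal sketch using only NumPy. The synthetic data (a noisy sine curve), the 70/30 split, and the polynomial degrees are all assumptions chosen for illustration; they are not part of the course material.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# Synthetic data (assumed for illustration): y = sin(3x) + noise
n = 60
x = rng.uniform(-1, 1, size=n)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)

# Hold out 30% of the rows as a test set
idx = rng.permutation(n)
n_test = n * 3 // 10
test_idx, train_idx = idx[:n_test], idx[n_test:]
x_train, y_train = x[train_idx], y[train_idx]
x_test, y_test = x[test_idx], y[test_idx]


def mse(coeffs, x_vals, y_vals):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((np.polyval(coeffs, x_vals) - y_vals) ** 2))


# Fit polynomials of increasing flexibility on the training set only
results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train),
                       mse(coeffs, x_test, y_test))
    print(f"degree {degree}: train MSE = {results[degree][0]:.3f}, "
          f"test MSE = {results[degree][1]:.3f}")
```

Training error does not increase as the degree grows (up to floating-point error), because a higher-degree polynomial can reproduce any lower-degree fit. Test error carries no such guarantee, so comparing the two columns is how you spot underfitting (both errors high) and overfitting (low training error with a noticeably higher test error).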
    

### Part 2: Python Setup and Pandas Basics


Let's start by ensuring you're comfortable with some Python basics, including data manipulation with **Pandas**. If you haven’t installed these tools yet, here’s a quick guide:

1. **Install Python and Libraries**
   - Download and install Anaconda from [the Anaconda website](https://www.anaconda.com/products/individual). It bundles Jupyter Notebook along with libraries such as Pandas, NumPy, and Scikit-learn.

2. **Basic Pandas Operations**
    

```python
# Import necessary libraries
import pandas as pd

# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David'],
        'Age': [25, 30, 35, 40],
        'Salary': [70000, 80000, 120000, 90000]}

df = pd.DataFrame(data)

# Display the DataFrame
print("DataFrame:")
print(df)

# Access columns
print("\nAges of employees:")
print(df['Age'])

# Basic statistics
print("\nSummary statistics:")
print(df.describe())

# Filtering data
print("\nEmployees with Salary > 80,000:")
high_salary = df[df['Salary'] > 80000]
print(high_salary)
```

```
DataFrame:
      Name  Age  Salary
0    Alice   25   70000
1      Bob   30   80000
2  Charlie   35  120000
3    David   40   90000

Ages of employees:
0    25
1    30
2    35
3    40
Name: Age, dtype: int64

Summary statistics:
             Age         Salary
count   4.000000       4.000000
mean   32.500000   90000.000000
std     6.454972   21602.468995
min    25.000000   70000.000000
25%    28.750000   77500.000000
50%    32.500000   85000.000000
75%    36.250000   97500.000000
max    40.000000  120000.000000

Employees with Salary > 80,000:
      Name  Age  Salary
2  Charlie   35  120000
3    David   40   90000
```



#### Task: Run the code
1. Create a Pandas DataFrame and perform basic operations:
   - Create or load a dataset
   - Access columns and rows
   - Perform filtering based on conditions (like salary > 80,000)
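
The task mentions accessing rows as well as columns. As a hedged sketch of one way to do that, the snippet below rebuilds the same sample DataFrame and uses `iloc` (position-based), `loc` (label-based), and a combined boolean filter. The variable names `first_row`, `row_with_label_1`, and `subset` are just illustrative choices.

```python
import pandas as pd

# Same sample data as in the cell above
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David'],
                   'Age': [25, 30, 35, 40],
                   'Salary': [70000, 80000, 120000, 90000]})

# Row by integer position (first row)
first_row = df.iloc[0]
print(first_row)

# Row by index label (here the default labels are 0, 1, 2, 3)
row_with_label_1 = df.loc[1]
print(row_with_label_1)

# Combine two conditions with & (each condition needs its own parentheses)
# and keep only the Name and Salary columns
subset = df.loc[(df['Age'] > 28) & (df['Salary'] > 80000), ['Name', 'Salary']]
print(subset)
```

With the default integer index, `iloc[1]` and `loc[1]` return the same row. They differ once the index is something else, for example after filtering or after calling `set_index`.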
    

### Part 3: Python Libraries Overview


This week, we’ll work with the following Python libraries:
- **NumPy**: For numerical operations and array manipulation
- **Pandas**: For handling and analyzing data
- **Matplotlib/Seaborn**: For basic plotting
- **Scikit-learn**: For machine learning models (in later weeks)
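
Before going further, it can help to confirm which of these libraries are actually available in your environment. The following is a small sketch using only the standard library. The `installed_version` helper is a hypothetical name introduced here, and the mapping from import names to package names reflects the usual PyPI names (note that Scikit-learn is imported as `sklearn`).

```python
from importlib import metadata
from importlib.util import find_spec

# Import name -> installed package (distribution) name
libraries = {
    "numpy": "numpy",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "sklearn": "scikit-learn",
}


def installed_version(import_name, dist_name):
    """Return the installed version string, or None if the library is missing."""
    if find_spec(import_name) is None:
        return None
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Importable, but no package metadata was found
        return "unknown"


status = {dist: installed_version(imp, dist) for imp, dist in libraries.items()}
for name, version in status.items():
    print(f"{name}: {version if version else 'not installed'}")
```

Anything reported as not installed can usually be added with `conda install` or `pip install` followed by the package name.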

---

#### Next Steps:
- Practice with a simple dataset: Load a CSV file with Pandas and run some basic data analysis on it.
- Let me know if you need any sample datasets or help with any part of this process! Once you're comfortable with this, we can move to the next topic: **Linear Regression**.
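
As a starting point for the CSV practice, here is a minimal sketch. So that it runs without any file on disk, the CSV content is supplied inline through `io.StringIO`, which is an assumption made for illustration. With a real file you would pass its path to `pd.read_csv` instead (for example a hypothetical `employees.csv`).

```python
import io

import pandas as pd

# Inline CSV text standing in for a file on disk (illustrative data)
csv_text = """Name,Age,Salary
Alice,25,70000
Bob,30,80000
Charlie,35,120000
David,40,90000
"""

# read_csv accepts a file path or any file-like object
employees = pd.read_csv(io.StringIO(csv_text))

# First look at the data: leading rows, column types, and size
print(employees.head())
print(employees.dtypes)
print("Rows, columns:", employees.shape)

# A few basic analyses
mean_salary = employees['Salary'].mean()
print("Mean salary:", mean_salary)

by_salary = employees.sort_values('Salary', ascending=False)
print(by_salary)
```

Checking `head()`, `dtypes`, and `shape` right after loading is a quick way to catch problems such as a numeric column that was read as text.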
    