diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..f0e837c --- /dev/null +++ b/.gitignore @@ -0,0 +1,77 @@ +# Python +__pycache__/ +*.py[cod] +*$py.class +*.so +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# Virtual environments +venv/ +env/ +ENV/ +env.bak/ +venv.bak/ +.venv/ + +# IDEs +.vscode/ +.idea/ +*.swp +*.swo +*~ +.DS_Store + +# Jupyter Notebook +.ipynb_checkpoints +*.ipynb_checkpoints/ + +# Database files +*.db +*.sqlite +*.sqlite3 +etl_output.db + +# Data files (optional - uncomment if you don't want to track data files) +# *.csv +# *.json +# *.xlsx +# *.parquet + +# Logs +*.log +logs/ + +# Environment variables +.env +.env.local + +# Testing +.pytest_cache/ +.coverage +htmlcov/ +.tox/ + +# OS +Thumbs.db +.DS_Store + +# Temporary files +tmp/ +temp/ +*.tmp diff --git a/01-python-fundamentals/README.md b/01-python-fundamentals/README.md new file mode 100644 index 0000000..8611448 --- /dev/null +++ b/01-python-fundamentals/README.md @@ -0,0 +1,56 @@ +# Python Fundamentals + +Welcome to the Python Fundamentals section! This is where your journey begins. + +## πŸ“š What You'll Learn + +- Python syntax and basic data types +- Control structures (if, for, while) +- Functions and modules +- Object-oriented programming basics +- Error handling +- File operations + +## πŸ“– Lessons + +1. [Getting Started with Python](lessons/01-getting-started.md) +2. [Variables and Data Types](lessons/02-variables-datatypes.md) +3. [Control Flow](lessons/03-control-flow.md) +4. [Functions](lessons/04-functions.md) +5. [Object-Oriented Programming](lessons/05-oop-basics.md) +6. [Error Handling](lessons/06-error-handling.md) +7. [File I/O](lessons/07-file-io.md) + +## πŸ’» Examples + +Check the `examples/` folder for working code examples that demonstrate each concept. + +## ✏️ Exercises + +Complete the exercises in the `exercises/` folder to practice what you've learned. Solutions are provided, but try to solve them on your own first! + +## ⏱️ Estimated Time + +2-4 weeks, depending on your prior programming experience and time commitment. 
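+
+To give you a feel for where these topics lead, here is a minimal sketch of the todo-list project idea described further down this page. The file name and task structure are illustrative, not prescribed:
+
+```python
+# Sketch only: persist a small task list to disk with the json module.
+import json
+
+TASKS_FILE = "tasks.json"  # illustrative file name
+
+
+def load_tasks():
+    """Return the saved task list, or an empty list on first run."""
+    try:
+        with open(TASKS_FILE) as f:
+            return json.load(f)
+    except FileNotFoundError:
+        return []
+
+
+def save_tasks(tasks):
+    """Write the task list back to disk as JSON."""
+    with open(TASKS_FILE, "w") as f:
+        json.dump(tasks, f, indent=2)
+
+
+tasks = load_tasks()
+tasks.append({"title": "Finish lesson 1", "done": False})
+save_tasks(tasks)
+print(f"Saved {len(tasks)} task(s).")
+```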
+ +## βœ… Completion Checklist + +- [ ] Complete all lessons +- [ ] Run all examples +- [ ] Solve all exercises +- [ ] Build a small project using concepts learned + +## 🎯 Project Idea + +Build a simple command-line todo list application that: +- Adds tasks +- Removes tasks +- Marks tasks as complete +- Saves tasks to a file +- Loads tasks from a file + +## πŸ“š Additional Resources + +- [Python Official Tutorial](https://docs.python.org/3/tutorial/) +- [Real Python - Python Basics](https://realpython.com/tutorials/basics/) +- [Automate the Boring Stuff with Python](https://automatetheboringstuff.com/) diff --git a/01-python-fundamentals/examples/hello_world.py b/01-python-fundamentals/examples/hello_world.py new file mode 100644 index 0000000..de87ded --- /dev/null +++ b/01-python-fundamentals/examples/hello_world.py @@ -0,0 +1,34 @@ +""" +Basic Python Examples - Hello World and Simple Operations +""" + +# Simple print statement +print("Hello, Data Engineer!") + +# Print with multiple values +print("Welcome to", "Python", "Programming") + +# Basic arithmetic +print("\n=== Basic Arithmetic ===") +print("2 + 3 =", 2 + 3) +print("10 - 4 =", 10 - 4) +print("5 * 6 =", 5 * 6) +print("20 / 4 =", 20 / 4) +print("17 // 5 =", 17 // 5) # Integer division +print("17 % 5 =", 17 % 5) # Modulus +print("2 ** 8 =", 2 ** 8) # Exponentiation + +# String operations +print("\n=== String Operations ===") +name = "Data Engineer" +print("Hello,", name) +print("Length of name:", len(name)) +print("Uppercase:", name.upper()) +print("Lowercase:", name.lower()) + +# Comments +# This is a single-line comment +""" +This is a multi-line comment +or a docstring +""" diff --git a/01-python-fundamentals/exercises/README.md b/01-python-fundamentals/exercises/README.md new file mode 100644 index 0000000..0348bf6 --- /dev/null +++ b/01-python-fundamentals/exercises/README.md @@ -0,0 +1,263 @@ +# Python Fundamentals Exercises + +## πŸ“ Overview + +These exercises are designed to reinforce the concepts you've learned in the Python Fundamentals section. Start with Exercise 1 and work your way through them sequentially. + +## 🎯 How to Use These Exercises + +1. Read the exercise description +2. Try to solve it on your own first +3. Test your solution +4. Compare with the provided solution (if available) +5. 
Understand any differences + +## πŸ“š Exercise List + +### Exercise 1: Variables and Data Types +**Difficulty**: ⭐ Beginner + +Create a program that: +- Stores your name, age, and city in variables +- Prints a formatted message using these variables +- Calculates and prints your age in months and days + +**Example Output**: +``` +Name: Alice +Age: 25 years (300 months, 9125 days) +City: New York +``` + +**Skills Practiced**: Variables, basic math, string formatting + +--- + +### Exercise 2: Number Calculator +**Difficulty**: ⭐ Beginner + +Write a simple calculator that: +- Takes two numbers as input +- Performs addition, subtraction, multiplication, and division +- Prints all results + +**Example**: +```python +# Input: 10, 5 +# Output: +# Addition: 15 +# Subtraction: 5 +# Multiplication: 50 +# Division: 2.0 +``` + +**Skills Practiced**: Variables, arithmetic operators, input/output + +--- + +### Exercise 3: Grade Calculator +**Difficulty**: ⭐⭐ Intermediate + +Create a program that: +- Takes a score (0-100) as input +- Determines the letter grade: + - A: 90-100 + - B: 80-89 + - C: 70-79 + - D: 60-69 + - F: Below 60 +- Prints the grade and a message + +**Skills Practiced**: If/elif/else, comparison operators + +--- + +### Exercise 4: List Operations +**Difficulty**: ⭐⭐ Intermediate + +Write a program that: +- Creates a list of numbers +- Finds the sum, average, minimum, and maximum +- Prints all results + +**Example**: +```python +numbers = [10, 25, 30, 15, 40] +# Output: +# Sum: 120 +# Average: 24.0 +# Minimum: 10 +# Maximum: 40 +``` + +**Skills Practiced**: Lists, loops, built-in functions + +--- + +### Exercise 5: String Manipulator +**Difficulty**: ⭐⭐ Intermediate + +Create a program that: +- Takes a sentence as input +- Counts the number of words +- Counts the number of vowels +- Converts to uppercase and lowercase +- Reverses the string + +**Skills Practiced**: Strings, string methods, loops + +--- + +### Exercise 6: Number Guessing Game +**Difficulty**: ⭐⭐ Intermediate + +Build a game that: +- Generates a random number between 1 and 100 +- Asks the user to guess +- Provides hints (too high/too low) +- Counts number of guesses +- Congratulates on correct guess + +**Skills Practiced**: Loops, conditionals, random module + +--- + +### Exercise 7: Shopping List Manager +**Difficulty**: ⭐⭐⭐ Advanced + +Create a program that: +- Allows adding items to a shopping list +- Allows removing items +- Allows viewing all items +- Allows clearing the list +- Uses a menu system + +**Skills Practiced**: Lists, loops, functions, user input + +--- + +### Exercise 8: Contact Book +**Difficulty**: ⭐⭐⭐ Advanced + +Build a simple contact book that: +- Stores contacts (name, phone, email) +- Allows adding new contacts +- Allows searching by name +- Allows displaying all contacts +- Uses dictionaries to store data + +**Skills Practiced**: Dictionaries, functions, user input + +--- + +### Exercise 9: File Word Counter +**Difficulty**: ⭐⭐⭐ Advanced + +Write a program that: +- Reads a text file +- Counts total words +- Counts unique words +- Finds most common words +- Writes results to a new file + +**Skills Practiced**: File I/O, string processing, dictionaries + +--- + +### Exercise 10: Mini Project - Todo List Application +**Difficulty**: ⭐⭐⭐⭐ Challenge + +Build a command-line todo list app that: +- Adds tasks +- Marks tasks as complete +- Deletes tasks +- Views all tasks +- Saves to file +- Loads from file on start + +**Features**: +- Menu-driven interface +- Data persistence +- Input validation +- Error 
handling + +**Skills Practiced**: All Python fundamentals + +--- + +## πŸ§ͺ Testing Your Solutions + +### Basic Testing +```python +# Test with different inputs +# Check edge cases +# Verify output matches expected + +# Example: +def add(a, b): + return a + b + +# Test +assert add(2, 3) == 5 +assert add(-1, 1) == 0 +assert add(0, 0) == 0 +print("All tests passed!") +``` + +## πŸ’‘ Tips for Success + +1. **Start Simple**: Get basic functionality working first +2. **Test Often**: Test after each small change +3. **Read Errors**: Error messages tell you what's wrong +4. **Use Print**: Print statements help debug +5. **Take Breaks**: Step away if stuck +6. **Ask for Help**: Use communities when truly stuck + +## πŸ“ Submission Guidelines + +When practicing: +1. Create a file for each exercise (e.g., `exercise_01.py`) +2. Add comments explaining your approach +3. Test with multiple inputs +4. Compare with solution (if provided) +5. Refactor to improve code quality + +## 🎯 Bonus Challenges + +For each exercise, try to: +- Add input validation +- Handle errors gracefully +- Add more features +- Optimize your code +- Write cleaner code + +## πŸ“š Additional Practice + +After completing these exercises: +1. **LeetCode Easy**: Try Python easy problems +2. **HackerRank**: Python basics track +3. **Codewars**: 8 kyu and 7 kyu challenges +4. **Exercism**: Python track with mentoring + +## πŸ† Next Steps + +Once you've completed all exercises: +- Move to `02-python-data-engineering` +- Start building small projects +- Contribute your own exercises +- Help others learn + +## πŸ“– Solutions + +Solutions are available in the `solutions/` folder, but try to solve exercises on your own first! Learning happens when you struggle through problems. + +## 🀝 Getting Help + +If you're stuck: +1. Review the relevant lesson +2. Check Python documentation +3. Search for similar problems online +4. Ask specific questions in communities +5. Look at the solution as a last resort + +Good luck with your exercises! diff --git a/01-python-fundamentals/lessons/01-getting-started.md b/01-python-fundamentals/lessons/01-getting-started.md new file mode 100644 index 0000000..d3264b3 --- /dev/null +++ b/01-python-fundamentals/lessons/01-getting-started.md @@ -0,0 +1,162 @@ +# Getting Started with Python + +## What is Python? + +Python is a high-level, interpreted programming language known for its simplicity and readability. It's one of the most popular languages for data engineering, data science, and general-purpose programming. + +## Why Python for Data Engineering? + +- **Easy to Learn**: Clear and readable syntax +- **Extensive Libraries**: Rich ecosystem for data manipulation (Pandas, NumPy) +- **Cross-platform**: Works on Windows, Mac, and Linux +- **Large Community**: Extensive resources and support +- **Integration**: Works well with databases and data tools + +## Installing Python + +### Windows +1. Download Python from [python.org](https://www.python.org/downloads/) +2. Run the installer (check "Add Python to PATH") +3. 
Verify installation: `python --version` + +### Mac +```bash +# Using Homebrew +brew install python3 +python3 --version +``` + +### Linux +```bash +# Ubuntu/Debian +sudo apt update +sudo apt install python3 python3-pip + +# Verify +python3 --version +``` + +## Your First Python Program + +Create a file called `hello.py`: + +```python +print("Hello, Data Engineer!") +``` + +Run it: +```bash +python hello.py +``` + +## Python Interactive Shell + +You can also use Python interactively: + +```bash +python +>>> print("Hello!") +Hello! +>>> 2 + 2 +4 +>>> exit() +``` + +## Setting Up Your Development Environment + +### Option 1: VS Code (Recommended) +1. Install [VS Code](https://code.visualstudio.com/) +2. Install Python extension +3. Create a workspace for your projects + +### Option 2: PyCharm +1. Install [PyCharm Community Edition](https://www.jetbrains.com/pycharm/) +2. Create a new Python project + +### Option 3: Jupyter Notebook +Great for data exploration: +```bash +pip install jupyter +jupyter notebook +``` + +## Virtual Environments + +Always use virtual environments for your projects: + +```bash +# Create a virtual environment +python -m venv myenv + +# Activate it +# Windows: +myenv\Scripts\activate +# Mac/Linux: +source myenv/bin/activate + +# Install packages +pip install pandas + +# Deactivate +deactivate +``` + +## Basic Python Syntax + +### Comments +```python +# This is a single-line comment + +""" +This is a +multi-line comment +or docstring +""" +``` + +### Print Statement +```python +print("Hello World") +print("Value:", 42) +``` + +### Basic Arithmetic +```python +# Addition +print(5 + 3) # 8 + +# Subtraction +print(10 - 4) # 6 + +# Multiplication +print(3 * 4) # 12 + +# Division +print(15 / 3) # 5.0 + +# Integer Division +print(15 // 4) # 3 + +# Modulus +print(15 % 4) # 3 + +# Exponentiation +print(2 ** 3) # 8 +``` + +## Next Steps + +Now that you have Python installed and running, proceed to the next lesson on variables and data types. + +## Practice Exercise + +1. Install Python on your computer +2. Set up VS Code or your preferred editor +3. Create a Python file that prints your name +4. Use the Python interactive shell to calculate: (10 + 5) * 3 +5. Create a virtual environment for this course + +## Additional Resources + +- [Python.org Beginner's Guide](https://wiki.python.org/moin/BeginnersGuide) +- [Real Python - Installation & Setup](https://realpython.com/installing-python/) diff --git a/02-python-data-engineering/README.md b/02-python-data-engineering/README.md new file mode 100644 index 0000000..60aa102 --- /dev/null +++ b/02-python-data-engineering/README.md @@ -0,0 +1,204 @@ +# Python for Data Engineering + +This section focuses on using Python for data engineering tasks - the practical skills you'll use daily as a data engineer. + +## πŸ“š What You'll Learn + +- Working with Pandas for data manipulation +- Reading and writing various file formats (CSV, JSON, Parquet, Excel) +- API interactions and web scraping +- Data cleaning and transformation +- Working with dates and times +- Connecting to databases with Python +- Error handling in data pipelines + +## πŸ“– Lessons + +1. [Introduction to Pandas](lessons/01-pandas-intro.md) +2. [Data Cleaning](lessons/02-data-cleaning.md) +3. [File Formats](lessons/03-file-formats.md) +4. [Working with APIs](lessons/04-apis.md) +5. [Database Connections](lessons/05-database-connections.md) +6. [Date and Time Handling](lessons/06-datetime.md) +7. 
[Data Validation](lessons/07-data-validation.md) + +## πŸ’» Examples + +The `examples/` folder contains practical code examples: +- `pandas_basics.py` - Pandas fundamentals +- `csv_processing.py` - CSV file operations +- `json_handling.py` - Working with JSON +- `api_requests.py` - API interactions +- `database_operations.py` - Database connectivity + +## ✏️ Exercises + +Practice exercises in `exercises/` folder: +- Data cleaning challenges +- File format conversions +- API data extraction +- Database operations +- Real-world scenarios + +## πŸ› οΈ Required Libraries + +```bash +# Install required packages +pip install pandas numpy +pip install requests +pip install openpyxl # for Excel files +pip install pyarrow # for Parquet files +pip install sqlalchemy psycopg2-binary +``` + +## ⏱️ Estimated Time + +4-6 weeks with hands-on practice + +## βœ… Completion Checklist + +- [ ] Master Pandas basics +- [ ] Work with CSV, JSON, and Excel files +- [ ] Make API requests +- [ ] Connect to databases +- [ ] Clean and transform real datasets +- [ ] Handle errors properly +- [ ] Complete all exercises + +## 🎯 Project Ideas + +### Project 1: Data Pipeline +Build a pipeline that: +- Fetches data from an API +- Cleans and transforms the data +- Saves to database and CSV + +### Project 2: Data Integration +Combine data from: +- Multiple CSV files +- JSON API +- Database tables +- Output: Clean, unified dataset + +### Project 3: Automated Report +Create a script that: +- Reads data from database +- Performs analysis +- Generates Excel report +- Sends email notification + +## πŸ“Š Real-World Scenarios + +### E-commerce Data Processing +- Process order data from CSV +- Validate customer information +- Calculate metrics +- Load into database + +### API Data Extraction +- Fetch weather data from API +- Parse JSON responses +- Store in structured format +- Handle rate limits and errors + +### Log File Analysis +- Read server log files +- Parse and extract information +- Identify patterns +- Generate reports + +## πŸ”‘ Key Skills + +### Data Manipulation with Pandas +```python +import pandas as pd + +# Read data +df = pd.read_csv('data.csv') + +# Basic operations +df.head() +df.info() +df.describe() + +# Filtering +df[df['age'] > 30] + +# Grouping +df.groupby('category')['sales'].sum() + +# Transformation +df['new_column'] = df['old_column'] * 2 +``` + +### File Operations +```python +# CSV +df = pd.read_csv('file.csv') +df.to_csv('output.csv', index=False) + +# JSON +df = pd.read_json('file.json') +df.to_json('output.json') + +# Excel +df = pd.read_excel('file.xlsx') +df.to_excel('output.xlsx', index=False) + +# Parquet +df = pd.read_parquet('file.parquet') +df.to_parquet('output.parquet') +``` + +### API Requests +```python +import requests + +response = requests.get('https://api.example.com/data') +data = response.json() +df = pd.DataFrame(data) +``` + +### Database Operations +```python +from sqlalchemy import create_engine + +engine = create_engine('postgresql://user:pass@localhost/db') +df = pd.read_sql('SELECT * FROM table', engine) +df.to_sql('new_table', engine, if_exists='replace') +``` + +## πŸ’‘ Best Practices + +1. **Read Documentation**: Pandas docs are excellent +2. **Use Vectorization**: Avoid loops when possible +3. **Memory Management**: Be aware of large datasets +4. **Error Handling**: Always handle exceptions +5. **Data Validation**: Validate before processing +6. **Type Hints**: Use type hints in functions +7. 
**Testing**: Write tests for data transformations + +## πŸ“š Additional Resources + +- [Pandas Documentation](https://pandas.pydata.org/docs/) +- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) +- [Real Python - Pandas Tutorials](https://realpython.com/learning-paths/pandas-data-science/) +- [Kaggle Learn - Pandas](https://www.kaggle.com/learn/pandas) + +## Common Pitfalls to Avoid + +1. **Chained Indexing**: Use `.loc` instead +2. **Modifying During Iteration**: Use `.apply()` or vectorization +3. **Not Checking Data Types**: Always verify dtypes +4. **Ignoring Missing Data**: Handle NaN values properly +5. **Memory Issues**: Use chunking for large files +6. **Silent Failures**: Add logging and error handling + +## Next Steps + +After completing this section, you'll be able to: +- Build data ingestion pipelines +- Process various data formats +- Interact with APIs and databases +- Handle real-world data issues +- Write production-quality Python code for data engineering diff --git a/02-python-data-engineering/examples/pandas_basics.py b/02-python-data-engineering/examples/pandas_basics.py new file mode 100644 index 0000000..5fe1690 --- /dev/null +++ b/02-python-data-engineering/examples/pandas_basics.py @@ -0,0 +1,185 @@ +""" +Pandas Basics - Essential Operations for Data Engineers +""" + +import pandas as pd +import numpy as np + +# Creating DataFrames +print("=" * 50) +print("Creating DataFrames") +print("=" * 50) + +# From dictionary +data = { + 'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'], + 'age': [25, 30, 35, 28, 32], + 'city': ['New York', 'San Francisco', 'Los Angeles', 'Chicago', 'Boston'], + 'salary': [70000, 85000, 90000, 75000, 80000] +} +df = pd.DataFrame(data) +print("\nDataFrame from dictionary:") +print(df) + +# Basic information +print("\n" + "=" * 50) +print("Basic DataFrame Information") +print("=" * 50) +print("\nShape:", df.shape) +print("\nColumns:", df.columns.tolist()) +print("\nData types:") +print(df.dtypes) +print("\nBasic statistics:") +print(df.describe()) + +# Selecting data +print("\n" + "=" * 50) +print("Selecting Data") +print("=" * 50) + +# Select a column +print("\nNames column:") +print(df['name']) + +# Select multiple columns +print("\nNames and ages:") +print(df[['name', 'age']]) + +# Select rows by condition +print("\nPeople older than 30:") +print(df[df['age'] > 30]) + +# Multiple conditions +print("\nPeople older than 30 with salary > 80000:") +print(df[(df['age'] > 30) & (df['salary'] > 80000)]) + +# Sorting +print("\n" + "=" * 50) +print("Sorting Data") +print("=" * 50) + +print("\nSorted by age (ascending):") +print(df.sort_values('age')) + +print("\nSorted by salary (descending):") +print(df.sort_values('salary', ascending=False)) + +# Adding new columns +print("\n" + "=" * 50) +print("Adding New Columns") +print("=" * 50) + +df['monthly_salary'] = df['salary'] / 12 +df['senior'] = df['age'] > 30 +print(df) + +# Grouping and aggregation +print("\n" + "=" * 50) +print("Grouping and Aggregation") +print("=" * 50) + +# Group by senior status +print("\nAverage salary by senior status:") +print(df.groupby('senior')['salary'].mean()) + +# Multiple aggregations +print("\nMultiple statistics by senior status:") +print(df.groupby('senior')['salary'].agg(['mean', 'min', 'max', 'count'])) + +# Handling missing data +print("\n" + "=" * 50) +print("Handling Missing Data") +print("=" * 50) + +# Create DataFrame with missing values +df_missing = pd.DataFrame({ + 'A': [1, 2, np.nan, 4], + 'B': [5, np.nan, np.nan, 8], 
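+    # note: 'A' and 'B' are upcast to float64 because np.nan is a float;
+    # 'C' below stays int64 since it has no missing values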
+ 'C': [9, 10, 11, 12] +}) + +print("\nDataFrame with missing values:") +print(df_missing) + +print("\nCheck for missing values:") +print(df_missing.isnull()) + +print("\nCount of missing values per column:") +print(df_missing.isnull().sum()) + +print("\nDrop rows with any missing values:") +print(df_missing.dropna()) + +print("\nFill missing values with 0:") +print(df_missing.fillna(0)) + +print("\nFill missing values with column mean:") +print(df_missing.fillna(df_missing.mean())) + +# Merging DataFrames +print("\n" + "=" * 50) +print("Merging DataFrames") +print("=" * 50) + +# Create two DataFrames +df1 = pd.DataFrame({ + 'employee_id': [1, 2, 3], + 'name': ['Alice', 'Bob', 'Charlie'], + 'department': ['Sales', 'IT', 'HR'] +}) + +df2 = pd.DataFrame({ + 'employee_id': [1, 2, 4], + 'salary': [70000, 85000, 75000] +}) + +print("\nDataFrame 1:") +print(df1) +print("\nDataFrame 2:") +print(df2) + +print("\nInner join:") +print(pd.merge(df1, df2, on='employee_id', how='inner')) + +print("\nLeft join:") +print(pd.merge(df1, df2, on='employee_id', how='left')) + +print("\nOuter join:") +print(pd.merge(df1, df2, on='employee_id', how='outer')) + +# Apply functions +print("\n" + "=" * 50) +print("Applying Functions") +print("=" * 50) + +# Apply function to a column +df['name_length'] = df['name'].apply(len) +print("\nAdded name length column:") +print(df[['name', 'name_length']]) + +# Apply custom function +def categorize_salary(salary): + if salary < 75000: + return 'Low' + elif salary < 85000: + return 'Medium' + else: + return 'High' + +df['salary_category'] = df['salary'].apply(categorize_salary) +print("\nSalary categories:") +print(df[['name', 'salary', 'salary_category']]) + +print("\n" + "=" * 50) +print("String Operations") +print("=" * 50) + +# String methods +df['name_upper'] = df['name'].str.upper() +df['city_lower'] = df['city'].str.lower() +print("\nString transformations:") +print(df[['name', 'name_upper', 'city', 'city_lower']]) + +# Filtering with string methods +print("\nNames containing 'a':") +print(df[df['name'].str.contains('a', case=False)]) diff --git a/03-sql-fundamentals/README.md b/03-sql-fundamentals/README.md new file mode 100644 index 0000000..baf6664 --- /dev/null +++ b/03-sql-fundamentals/README.md @@ -0,0 +1,131 @@ +# SQL Fundamentals + +Welcome to SQL Fundamentals! Here you'll learn the essential skills for working with relational databases. + +## πŸ“š What You'll Learn + +- Basic SQL query syntax +- Filtering and sorting data +- Joining tables +- Aggregate functions +- Grouping data +- Subqueries and CTEs + +## πŸ“– Lessons + +1. [Introduction to SQL and Databases](lessons/01-intro-to-sql.md) +2. [SELECT Statements](lessons/02-select-statements.md) +3. [Filtering with WHERE](lessons/03-where-clause.md) +4. [Sorting and Limiting Results](lessons/04-order-limit.md) +5. [Joins](lessons/05-joins.md) +6. [Aggregate Functions](lessons/06-aggregates.md) +7. [GROUP BY and HAVING](lessons/07-groupby-having.md) +8. [Subqueries](lessons/08-subqueries.md) +9. 
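[Common Table Expressions (CTEs)](lessons/09-ctes.md)
+
+As a preview of lesson 9, a CTE lets you name an intermediate result and keep a complex query readable. The example below runs against the sample `sales` and `customers` tables described in the next section; the `amount` and `name` columns are assumed for illustration:
+
+```sql
+-- Total spend per customer, via a named intermediate result
+WITH customer_totals AS (
+    SELECT customer_id, SUM(amount) AS total_spend
+    FROM sales
+    GROUP BY customer_id
+)
+SELECT c.name, t.total_spend
+FROM customers c
+JOIN customer_totals t ON t.customer_id = c.customer_id
+ORDER BY t.total_spend DESC;
+```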
+
+## πŸ’» Practice Database
+
+We'll use a sample database with the following tables:
+
+- `employees` - Employee information
+- `departments` - Department details
+- `projects` - Project information
+- `sales` - Sales transactions
+- `customers` - Customer data
+
+## πŸ—„οΈ Setting Up Your Database
+
+### Using SQLite (Easiest for Beginners)
+```bash
+# SQLite comes pre-installed on macOS and most Linux systems
+sqlite3 practice.db
+```
+
+### Using PostgreSQL (Recommended for Production)
+```bash
+# Install PostgreSQL
+# Ubuntu/Debian
+sudo apt install postgresql
+
+# Mac
+brew install postgresql
+brew services start postgresql
+
+# Create a database
+createdb learning_db
+psql learning_db
+```
+
+## πŸ“ Sample Queries
+
+Check the `queries/` folder for example SQL queries organized by topic.
+
+## ✏️ Exercises
+
+Complete the exercises in the `exercises/` folder. Each exercise includes:
+- Problem description
+- Sample data
+- Expected output
+- Solution (try solving on your own first!)
+
+## ⏱️ Estimated Time
+
+3-4 weeks of consistent practice
+
+## βœ… Completion Checklist
+
+- [ ] Complete all lessons
+- [ ] Run all sample queries
+- [ ] Solve all exercises
+- [ ] Create your own practice database
+- [ ] Write 50+ SQL queries
+
+## 🎯 Project Idea
+
+Build a sample e-commerce database with:
+- Products table
+- Orders table
+- Customers table
+- Order items table
+
+Write queries to:
+- Find top-selling products
+- Calculate revenue by month
+- Identify best customers
+- Analyze product categories
+
+## πŸ“š Additional Resources
+
+- [SQLZoo](https://sqlzoo.net/) - Interactive SQL tutorial
+- [PostgreSQL Tutorial](https://www.postgresqltutorial.com/)
+- [LeetCode SQL Problems](https://leetcode.com/problemset/database/)
+- [Mode SQL Tutorial](https://mode.com/sql-tutorial/)
+
+## πŸ”‘ Key SQL Concepts
+
+### Basic Query Structure
+```sql
+SELECT column1, column2
+FROM table_name
+WHERE condition
+ORDER BY column1
+LIMIT 10;
+```
+
+### Common Data Types
+- `INTEGER` / `INT` - Whole numbers
+- `DECIMAL` / `NUMERIC` - Decimal numbers
+- `VARCHAR(n)` - Variable-length text
+- `TEXT` - Long text
+- `DATE` - Date values
+- `TIMESTAMP` - Date and time
+- `BOOLEAN` - True/false
+
+### SQL Keywords to Know
+- `SELECT` - Retrieve data
+- `FROM` - Specify table
+- `WHERE` - Filter rows
+- `JOIN` - Combine tables
+- `GROUP BY` - Group rows
+- `HAVING` - Filter groups
+- `ORDER BY` - Sort results
+- `LIMIT` - Restrict number of rows
diff --git a/03-sql-fundamentals/queries/01-basic-selects.sql b/03-sql-fundamentals/queries/01-basic-selects.sql
new file mode 100644
index 0000000..5c065b0
--- /dev/null
+++ b/03-sql-fundamentals/queries/01-basic-selects.sql
@@ -0,0 +1,56 @@
+-- Basic SELECT Queries
+-- This file contains examples of basic SELECT statements
+
+-- Select all columns from a table
+SELECT * FROM employees;
+
+-- Select specific columns
+SELECT first_name, last_name, email
+FROM employees;
+
+-- Select with column aliases
+SELECT
+    first_name AS "First Name",
+    last_name AS "Last Name",
+    salary AS "Annual Salary"
+FROM employees;
+
+-- Select distinct values (remove duplicates)
+SELECT DISTINCT department_id
+FROM employees;
+
+SELECT DISTINCT city, country
+FROM customers;
+
+-- Select with calculations
+SELECT
+    first_name,
+    last_name,
+    salary,
+    salary * 1.1 AS salary_with_raise,
+    salary / 12 AS monthly_salary
+FROM employees;
+
+-- Concatenate strings
+SELECT
+    first_name || ' ' || last_name AS full_name,
+    email
+FROM employees;
+
+-- Using CONCAT
function (in some SQL dialects) +SELECT + CONCAT(first_name, ' ', last_name) AS full_name, + email +FROM employees; + +-- Select with LIMIT (restrict number of rows) +SELECT first_name, last_name +FROM employees +LIMIT 5; + +-- Select current date/time +SELECT CURRENT_DATE; +SELECT CURRENT_TIMESTAMP; + +-- Select literal values +SELECT 'Hello' AS greeting, 42 AS answer; diff --git a/03-sql-fundamentals/queries/02-where-clause.sql b/03-sql-fundamentals/queries/02-where-clause.sql new file mode 100644 index 0000000..b4440bb --- /dev/null +++ b/03-sql-fundamentals/queries/02-where-clause.sql @@ -0,0 +1,99 @@ +-- WHERE Clause Examples +-- Filtering data with various conditions + +-- Basic equality +SELECT * FROM employees +WHERE department_id = 5; + +-- Not equal +SELECT * FROM employees +WHERE department_id != 5; +-- or +SELECT * FROM employees +WHERE department_id <> 5; + +-- Comparison operators +SELECT first_name, last_name, salary +FROM employees +WHERE salary > 50000; + +SELECT * FROM employees +WHERE hire_date >= '2020-01-01'; + +-- BETWEEN operator +SELECT first_name, last_name, salary +FROM employees +WHERE salary BETWEEN 40000 AND 60000; + +-- IN operator (match any value in a list) +SELECT * FROM employees +WHERE department_id IN (1, 3, 5); + +SELECT * FROM products +WHERE category IN ('Electronics', 'Clothing', 'Books'); + +-- LIKE operator (pattern matching) +-- % matches any sequence of characters +-- _ matches any single character + +-- Names starting with 'J' +SELECT * FROM employees +WHERE first_name LIKE 'J%'; + +-- Names ending with 'son' +SELECT * FROM employees +WHERE last_name LIKE '%son'; + +-- Names containing 'ar' +SELECT * FROM employees +WHERE first_name LIKE '%ar%'; + +-- Email addresses from gmail +SELECT * FROM employees +WHERE email LIKE '%@gmail.com'; + +-- Names with exactly 4 characters +SELECT * FROM employees +WHERE first_name LIKE '____'; + +-- NULL checks +SELECT * FROM employees +WHERE manager_id IS NULL; + +SELECT * FROM employees +WHERE phone_number IS NOT NULL; + +-- Combining conditions with AND +SELECT * FROM employees +WHERE salary > 50000 + AND department_id = 3; + +-- Combining conditions with OR +SELECT * FROM employees +WHERE department_id = 1 + OR department_id = 5; + +-- Using AND with OR (use parentheses for clarity) +SELECT * FROM employees +WHERE (department_id = 1 OR department_id = 5) + AND salary > 50000; + +-- NOT operator +SELECT * FROM employees +WHERE NOT department_id = 5; + +SELECT * FROM employees +WHERE department_id NOT IN (1, 2, 3); + +-- Complex conditions +SELECT + first_name, + last_name, + salary, + department_id +FROM employees +WHERE + (salary BETWEEN 40000 AND 70000) + AND department_id IN (2, 4, 6) + AND hire_date >= '2019-01-01' + AND email LIKE '%@company.com'; diff --git a/03-sql-fundamentals/queries/03-joins.sql b/03-sql-fundamentals/queries/03-joins.sql new file mode 100644 index 0000000..b905133 --- /dev/null +++ b/03-sql-fundamentals/queries/03-joins.sql @@ -0,0 +1,193 @@ +-- SQL Joins - Combining Data from Multiple Tables +-- Demonstrates different types of joins with examples + +-- Sample data structure (for reference): +-- employees: employee_id, first_name, last_name, department_id, manager_id +-- departments: department_id, department_name, location +-- projects: project_id, project_name, budget +-- project_assignments: employee_id, project_id, hours_worked + +-- ============================================ +-- INNER JOIN +-- Returns only rows that have matches in both tables +-- 
============================================ + +-- Basic inner join +SELECT + e.first_name, + e.last_name, + d.department_name +FROM employees e +INNER JOIN departments d ON e.department_id = d.department_id; + +-- Join with additional conditions +SELECT + e.first_name, + e.last_name, + d.department_name, + d.location +FROM employees e +INNER JOIN departments d ON e.department_id = d.department_id +WHERE d.location = 'New York'; + +-- ============================================ +-- LEFT JOIN (LEFT OUTER JOIN) +-- Returns all rows from left table and matching rows from right table +-- ============================================ + +-- Find all employees and their departments (including employees without departments) +SELECT + e.first_name, + e.last_name, + d.department_name +FROM employees e +LEFT JOIN departments d ON e.department_id = d.department_id; + +-- Find employees who are not assigned to any department +SELECT + e.first_name, + e.last_name +FROM employees e +LEFT JOIN departments d ON e.department_id = d.department_id +WHERE d.department_id IS NULL; + +-- ============================================ +-- RIGHT JOIN (RIGHT OUTER JOIN) +-- Returns all rows from right table and matching rows from left table +-- ============================================ + +-- Find all departments and their employees (including departments with no employees) +SELECT + d.department_name, + e.first_name, + e.last_name +FROM employees e +RIGHT JOIN departments d ON e.department_id = d.department_id; + +-- Find departments with no employees +SELECT + d.department_name +FROM employees e +RIGHT JOIN departments d ON e.department_id = d.department_id +WHERE e.employee_id IS NULL; + +-- ============================================ +-- FULL OUTER JOIN +-- Returns all rows when there's a match in either table +-- ============================================ + +-- Find all employees and departments (including unmatched records from both) +SELECT + e.first_name, + e.last_name, + d.department_name +FROM employees e +FULL OUTER JOIN departments d ON e.department_id = d.department_id; + +-- ============================================ +-- SELF JOIN +-- Joining a table to itself +-- ============================================ + +-- Find employees and their managers +SELECT + e.first_name || ' ' || e.last_name AS employee, + m.first_name || ' ' || m.last_name AS manager +FROM employees e +LEFT JOIN employees m ON e.manager_id = m.employee_id; + +-- ============================================ +-- MULTIPLE JOINS +-- Joining more than two tables +-- ============================================ + +-- Find employees, their departments, and projects +SELECT + e.first_name, + e.last_name, + d.department_name, + p.project_name, + pa.hours_worked +FROM employees e +INNER JOIN departments d ON e.department_id = d.department_id +INNER JOIN project_assignments pa ON e.employee_id = pa.employee_id +INNER JOIN projects p ON pa.project_id = p.project_id; + +-- ============================================ +-- JOIN with Aggregate Functions +-- ============================================ + +-- Count employees per department +SELECT + d.department_name, + COUNT(e.employee_id) AS employee_count +FROM departments d +LEFT JOIN employees e ON d.department_id = e.department_id +GROUP BY d.department_name +ORDER BY employee_count DESC; + +-- Total hours worked by employee on all projects +SELECT + e.first_name, + e.last_name, + SUM(pa.hours_worked) AS total_hours +FROM employees e +INNER JOIN project_assignments pa ON e.employee_id = 
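pa.employee_id
+GROUP BY e.employee_id, e.first_name, e.last_name
+HAVING SUM(pa.hours_worked) > 100;
+
+-- note: the hours threshold must live in HAVING rather than WHERE, because
+-- WHERE filters individual rows before aggregation while HAVING filters the
+-- aggregated groups; a condition on SUM(...) is only valid in HAVING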
+
+-- ============================================
+-- CROSS JOIN
+-- Cartesian product of two tables (all possible combinations)
+-- ============================================
+
+-- Create all possible employee-project combinations (use carefully!)
+SELECT
+    e.first_name,
+    e.last_name,
+    p.project_name
+FROM employees e
+CROSS JOIN projects p;
+
+-- Practical use: generate a date range for each employee
+-- (assuming you have a date_range calendar table)
+SELECT
+    e.first_name,
+    d.date
+FROM employees e
+CROSS JOIN date_range d
+WHERE d.date BETWEEN '2024-01-01' AND '2024-01-31';
+
+-- ============================================
+-- JOIN with USING clause
+-- When column names are the same in both tables
+-- ============================================
+
+-- Instead of: ON e.department_id = d.department_id
+-- You can use: USING (department_id)
+SELECT
+    e.first_name,
+    e.last_name,
+    d.department_name
+FROM employees e
+INNER JOIN departments d USING (department_id);
+
+-- ============================================
+-- Complex JOIN Example
+-- ============================================
+
+-- Find employees working on high-budget projects in specific locations
+SELECT
+    e.first_name || ' ' || e.last_name AS employee_name,
+    d.department_name,
+    d.location,
+    p.project_name,
+    p.budget,
+    pa.hours_worked
+FROM employees e
+INNER JOIN departments d ON e.department_id = d.department_id
+INNER JOIN project_assignments pa ON e.employee_id = pa.employee_id
+INNER JOIN projects p ON pa.project_id = p.project_id
+WHERE p.budget > 100000
+    AND d.location IN ('New York', 'San Francisco')
+ORDER BY p.budget DESC, pa.hours_worked DESC;
diff --git a/04-advanced-sql/README.md b/04-advanced-sql/README.md
new file mode 100644
index 0000000..22be85e
--- /dev/null
+++ b/04-advanced-sql/README.md
@@ -0,0 +1,160 @@
+# Advanced SQL
+
+Welcome to Advanced SQL! Here you'll learn optimization, database design, and advanced query techniques.
+
+## πŸ“š What You'll Learn
+
+- Database design and normalization
+- Indexes and query optimization
+- Window functions
+- Stored procedures and functions
+- Transactions and concurrency
+- Performance tuning
+- Advanced query patterns
+
+## πŸ“– Lessons
+
+1. [Database Design Principles](lessons/01-database-design.md)
+2. [Normalization](lessons/02-normalization.md)
+3. [Indexes and Performance](lessons/03-indexes.md)
+4. [Window Functions](lessons/04-window-functions.md)
+5. [Stored Procedures](lessons/05-stored-procedures.md)
+6. [Transactions](lessons/06-transactions.md)
+7. [Query Optimization](lessons/07-query-optimization.md)
+8. 
[Advanced Patterns](lessons/08-advanced-patterns.md) + +## πŸ’» Sample Queries + +Check the `queries/` folder for advanced SQL examples: +- Window functions +- Complex aggregations +- Recursive queries +- Performance optimization examples + +## ⏱️ Estimated Time + +3-4 weeks with hands-on practice + +## βœ… Completion Checklist + +- [ ] Understand database normalization +- [ ] Design efficient database schemas +- [ ] Use indexes effectively +- [ ] Write window functions +- [ ] Create stored procedures +- [ ] Understand transactions +- [ ] Optimize slow queries +- [ ] Complete all exercises + +## 🎯 Key Concepts + +### Window Functions +```sql +-- Running total +SELECT + date, + amount, + SUM(amount) OVER (ORDER BY date) AS running_total +FROM sales; + +-- Ranking +SELECT + employee_name, + salary, + RANK() OVER (ORDER BY salary DESC) AS salary_rank +FROM employees; + +-- Partitioned aggregation +SELECT + department, + employee_name, + salary, + AVG(salary) OVER (PARTITION BY department) AS dept_avg +FROM employees; +``` + +### Indexes +```sql +-- Create index +CREATE INDEX idx_employee_name ON employees(last_name, first_name); + +-- Create unique index +CREATE UNIQUE INDEX idx_employee_email ON employees(email); + +-- Create partial index +CREATE INDEX idx_active_employees +ON employees(department_id) +WHERE status = 'active'; +``` + +### Common Table Expressions (CTEs) +```sql +-- Simple CTE +WITH high_earners AS ( + SELECT * FROM employees + WHERE salary > 80000 +) +SELECT department_id, COUNT(*) +FROM high_earners +GROUP BY department_id; + +-- Recursive CTE (org hierarchy) +WITH RECURSIVE employee_hierarchy AS ( + SELECT employee_id, name, manager_id, 1 AS level + FROM employees + WHERE manager_id IS NULL + + UNION ALL + + SELECT e.employee_id, e.name, e.manager_id, eh.level + 1 + FROM employees e + JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id +) +SELECT * FROM employee_hierarchy; +``` + +## πŸ’‘ Best Practices + +1. **Indexing**: Index foreign keys and frequently queried columns +2. **Query Design**: Avoid SELECT *, use specific columns +3. **Joins**: Use appropriate join types +4. **Transactions**: Keep them short and focused +5. **Testing**: Test queries with production-like data volumes +6. **Documentation**: Document complex queries +7. **Monitoring**: Track slow queries + +## πŸ” Query Optimization Tips + +1. **Use EXPLAIN**: Analyze query execution plans +2. **Limit Result Sets**: Use WHERE clauses effectively +3. **Avoid Functions in WHERE**: Can prevent index usage +4. **Use Joins Instead of Subqueries**: Often faster +5. **Proper Data Types**: Use appropriate types for columns +6. **Batch Operations**: Bulk inserts instead of row-by-row +7. 
**Connection Pooling**: Reuse database connections + +## πŸ“š Additional Resources + +- [Use The Index, Luke](https://use-the-index-luke.com/) +- [PostgreSQL Performance Tuning](https://wiki.postgresql.org/wiki/Performance_Optimization) +- [SQL Server Execution Plans](https://www.red-gate.com/simple-talk/databases/sql-server/performance-sql-server/execution-plans/) + +## Real-World Scenarios + +### Scenario 1: Slow Dashboard Query +- Analyze execution plan +- Add appropriate indexes +- Rewrite query to reduce joins +- Consider materialized views + +### Scenario 2: Concurrent Updates +- Implement proper transactions +- Handle deadlocks +- Use appropriate isolation levels +- Design for concurrency + +### Scenario 3: Large Data Imports +- Use bulk insert methods +- Disable indexes during import +- Rebuild indexes after import +- Use transactions appropriately diff --git a/05-data-engineering/README.md b/05-data-engineering/README.md new file mode 100644 index 0000000..594aeb0 --- /dev/null +++ b/05-data-engineering/README.md @@ -0,0 +1,129 @@ +# Data Engineering Concepts + +Welcome to the Data Engineering section! Here you'll learn the core concepts and practices of data engineering. + +## πŸ“š What You'll Learn + +- ETL vs ELT processes +- Data pipeline architecture +- Data warehousing concepts +- Data quality and validation +- Data modeling +- Workflow orchestration +- Version control for data projects + +## πŸ“– Lessons + +1. [Introduction to Data Engineering](lessons/01-intro-data-engineering.md) +2. [ETL vs ELT](lessons/02-etl-vs-elt.md) +3. [Data Pipelines](lessons/03-data-pipelines.md) +4. [Data Warehousing](lessons/04-data-warehousing.md) +5. [Data Quality](lessons/05-data-quality.md) +6. [Data Modeling](lessons/06-data-modeling.md) +7. [Workflow Orchestration](lessons/07-orchestration.md) +8. [Version Control with Git](lessons/08-version-control.md) + +## πŸ—οΈ Projects + +### Project 1: Simple ETL Pipeline +Build an ETL pipeline that: +- Extracts data from CSV files +- Transforms and cleans the data +- Loads it into a database + +### Project 2: Data Quality Framework +Create a data quality checking system that: +- Validates data types +- Checks for null values +- Identifies duplicates +- Generates quality reports + +### Project 3: Automated Data Pipeline +Build an automated pipeline that: +- Runs on a schedule +- Processes incoming data +- Handles errors gracefully +- Sends notifications + +## ⏱️ Estimated Time + +4-6 weeks with hands-on projects + +## βœ… Completion Checklist + +- [ ] Understand ETL vs ELT +- [ ] Build a basic ETL pipeline +- [ ] Design a data warehouse schema +- [ ] Implement data quality checks +- [ ] Use Git for version control +- [ ] Complete all projects + +## 🎯 Real-World Scenarios + +### Scenario 1: E-commerce Analytics +Design a data pipeline for an e-commerce company that: +- Ingests order data from multiple sources +- Processes customer behavior data +- Creates aggregated reports +- Feeds a dashboard + +### Scenario 2: IoT Data Processing +Build a system to: +- Collect sensor data +- Clean and validate readings +- Store time-series data efficiently +- Generate alerts for anomalies + +## πŸ”‘ Key Concepts + +### ETL Process +1. **Extract**: Pull data from source systems +2. **Transform**: Clean, validate, and reshape data +3. 
**Load**: Store data in target system + +### Data Pipeline Components +- **Source**: Where data comes from +- **Ingestion**: How data is collected +- **Processing**: Data transformation logic +- **Storage**: Where data is stored +- **Orchestration**: How pipeline steps are coordinated + +### Data Quality Dimensions +- **Accuracy**: Is the data correct? +- **Completeness**: Is all required data present? +- **Consistency**: Is data consistent across sources? +- **Timeliness**: Is data up-to-date? +- **Validity**: Does data follow business rules? + +## πŸ› οΈ Tools You'll Use + +- **Python**: For data processing +- **Pandas**: For data manipulation +- **SQL**: For data querying +- **Git**: For version control +- **SQLite/PostgreSQL**: For data storage + +## πŸ“š Additional Resources + +- [The Data Engineering Cookbook](https://github.com/andkret/Cookbook) +- [Fundamentals of Data Engineering (Book)](https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/) +- [Data Engineering Weekly Newsletter](https://www.dataengineeringweekly.com/) + +## πŸ’‘ Best Practices + +1. **Documentation**: Document your pipelines thoroughly +2. **Testing**: Test your data pipelines +3. **Monitoring**: Monitor pipeline health and data quality +4. **Idempotency**: Design pipelines to be rerunnable +5. **Error Handling**: Handle failures gracefully +6. **Scalability**: Design for growth +7. **Security**: Protect sensitive data + +## πŸŽ“ Career Path + +Understanding these concepts prepares you for roles like: +- Data Engineer +- ETL Developer +- Data Pipeline Engineer +- Analytics Engineer +- Data Platform Engineer diff --git a/05-data-engineering/projects/simple-etl/etl_example.py b/05-data-engineering/projects/simple-etl/etl_example.py new file mode 100644 index 0000000..0c0e15d --- /dev/null +++ b/05-data-engineering/projects/simple-etl/etl_example.py @@ -0,0 +1,159 @@ +""" +Simple ETL Pipeline Example +Demonstrates Extract, Transform, Load process +""" + +import csv +import sqlite3 +from datetime import datetime + + +def extract_data(csv_file): + """ + Extract data from CSV file + + Args: + csv_file: Path to CSV file + + Returns: + List of dictionaries containing the data + """ + print(f"[{datetime.now()}] Extracting data from {csv_file}...") + data = [] + + try: + with open(csv_file, 'r') as file: + csv_reader = csv.DictReader(file) + for row in csv_reader: + data.append(row) + print(f"[{datetime.now()}] Extracted {len(data)} records") + return data + except FileNotFoundError: + print(f"Error: File {csv_file} not found") + return [] + + +def transform_data(data): + """ + Transform and clean the data + + Args: + data: List of dictionaries to transform + + Returns: + Transformed data + """ + print(f"[{datetime.now()}] Transforming data...") + transformed = [] + + for record in data: + # Example transformations + transformed_record = { + 'id': int(record.get('id', 0)), + 'name': record.get('name', '').strip().title(), + 'email': record.get('email', '').strip().lower(), + 'age': int(record.get('age', 0)) if record.get('age') else None, + 'city': record.get('city', '').strip().title(), + 'processed_date': datetime.now().strftime('%Y-%m-%d') + } + + # Data quality checks + if transformed_record['email'] and '@' in transformed_record['email']: + transformed.append(transformed_record) + else: + print(f"Skipping invalid record: {record}") + + print(f"[{datetime.now()}] Transformed {len(transformed)} valid records") + return transformed + + +def load_data(data, db_name='etl_output.db'): + """ + Load data 
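(idempotently, via INSERT OR REPLACE)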
into SQLite database + + Args: + data: List of dictionaries to load + db_name: Name of the database file + """ + print(f"[{datetime.now()}] Loading data into database...") + + # Connect to database (creates if doesn't exist) + conn = sqlite3.connect(db_name) + cursor = conn.cursor() + + # Create table if it doesn't exist + cursor.execute(''' + CREATE TABLE IF NOT EXISTS users ( + id INTEGER PRIMARY KEY, + name TEXT NOT NULL, + email TEXT UNIQUE NOT NULL, + age INTEGER, + city TEXT, + processed_date TEXT + ) + ''') + + # Insert data + inserted = 0 + for record in data: + try: + cursor.execute(''' + INSERT OR REPLACE INTO users (id, name, email, age, city, processed_date) + VALUES (?, ?, ?, ?, ?, ?) + ''', ( + record['id'], + record['name'], + record['email'], + record['age'], + record['city'], + record['processed_date'] + )) + inserted += 1 + except sqlite3.Error as e: + print(f"Error inserting record {record['id']}: {e}") + + conn.commit() + conn.close() + + print(f"[{datetime.now()}] Loaded {inserted} records into database") + + +def run_etl_pipeline(source_file, target_db='etl_output.db'): + """ + Run the complete ETL pipeline + + Args: + source_file: Path to source CSV file + target_db: Name of target database + """ + print(f"\n{'='*50}") + print("Starting ETL Pipeline") + print(f"{'='*50}\n") + + start_time = datetime.now() + + # Extract + raw_data = extract_data(source_file) + + if not raw_data: + print("No data to process. Exiting.") + return + + # Transform + clean_data = transform_data(raw_data) + + # Load + load_data(clean_data, target_db) + + end_time = datetime.now() + duration = (end_time - start_time).total_seconds() + + print(f"\n{'='*50}") + print(f"ETL Pipeline Completed in {duration:.2f} seconds") + print(f"{'='*50}\n") + + +if __name__ == "__main__": + # Example usage + # Create a sample CSV file first or replace with your file + run_etl_pipeline('sample_data.csv') diff --git a/06-advanced-topics/README.md b/06-advanced-topics/README.md new file mode 100644 index 0000000..a347409 --- /dev/null +++ b/06-advanced-topics/README.md @@ -0,0 +1,283 @@ +# Advanced Topics in Data Engineering + +This section covers advanced concepts and technologies that modern data engineers use in production environments. + +## πŸ“š What You'll Learn + +- Introduction to Apache Spark +- Cloud data platforms (AWS, GCP, Azure) +- Data streaming concepts +- Containerization with Docker +- Testing data pipelines +- CI/CD for data engineering +- Data governance and security + +## πŸ“– Lessons + +1. [Introduction to Big Data](lessons/01-big-data-intro.md) +2. [Apache Spark Basics](lessons/02-spark-basics.md) +3. [Cloud Platforms Overview](lessons/03-cloud-platforms.md) +4. [Data Streaming](lessons/04-data-streaming.md) +5. [Docker for Data Engineers](lessons/05-docker.md) +6. [Testing Data Pipelines](lessons/06-testing.md) +7. [CI/CD](lessons/07-cicd.md) +8. 
[Data Governance](lessons/08-data-governance.md) + +## 🎯 Projects + +### Project 1: Dockerized ETL Pipeline +- Package ETL pipeline in Docker +- Use Docker Compose for multi-container setup +- Include database and application + +### Project 2: Cloud Data Pipeline +- Build pipeline on cloud platform +- Use managed services +- Implement monitoring + +### Project 3: Streaming Data Pipeline +- Process real-time data +- Use message queues +- Handle high throughput + +## ⏱️ Estimated Time + +6-8 weeks for comprehensive understanding + +## βœ… Completion Checklist + +- [ ] Understand big data concepts +- [ ] Learn Spark basics +- [ ] Explore cloud platforms +- [ ] Build a Docker container +- [ ] Understand streaming concepts +- [ ] Implement testing +- [ ] Set up CI/CD pipeline +- [ ] Complete capstone project + +## πŸ”‘ Key Technologies + +### Apache Spark +```python +from pyspark.sql import SparkSession + +# Create Spark session +spark = SparkSession.builder \ + .appName("DataEngineering") \ + .getOrCreate() + +# Read data +df = spark.read.csv("data.csv", header=True) + +# Transform +df_transformed = df.filter(df.age > 25) \ + .groupBy("city") \ + .count() + +# Write +df_transformed.write.parquet("output/") +``` + +### Docker +```dockerfile +# Dockerfile for Python app +FROM python:3.9-slim + +WORKDIR /app + +COPY requirements.txt . +RUN pip install -r requirements.txt + +COPY . . + +CMD ["python", "etl_pipeline.py"] +``` + +### Docker Compose +```yaml +# docker-compose.yml +version: '3.8' + +services: + postgres: + image: postgres:13 + environment: + POSTGRES_PASSWORD: password + ports: + - "5432:5432" + + etl_app: + build: . + depends_on: + - postgres + environment: + DB_HOST: postgres +``` + +## ☁️ Cloud Platforms + +### AWS Services for Data Engineering +- **S3**: Object storage +- **RDS**: Managed databases +- **Redshift**: Data warehouse +- **Glue**: ETL service +- **Lambda**: Serverless compute +- **Kinesis**: Streaming data + +### GCP Services +- **Cloud Storage**: Object storage +- **Cloud SQL**: Managed databases +- **BigQuery**: Data warehouse +- **Dataflow**: Stream/batch processing +- **Cloud Functions**: Serverless +- **Pub/Sub**: Messaging + +### Azure Services +- **Blob Storage**: Object storage +- **Azure SQL**: Managed databases +- **Synapse Analytics**: Data warehouse +- **Data Factory**: ETL/ELT +- **Functions**: Serverless +- **Event Hubs**: Streaming + +## πŸ§ͺ Testing Data Pipelines + +### Unit Testing +```python +import pytest +import pandas as pd + +def test_data_transformation(): + # Arrange + input_data = pd.DataFrame({ + 'name': ['Alice', 'Bob'], + 'age': [25, 30] + }) + + # Act + result = transform_data(input_data) + + # Assert + assert len(result) == 2 + assert 'age_group' in result.columns +``` + +### Integration Testing +```python +def test_database_connection(): + engine = create_engine(TEST_DB_URL) + conn = engine.connect() + assert conn is not None + conn.close() + +def test_etl_pipeline(): + # Run entire pipeline on test data + run_pipeline(test_source, test_target) + # Verify results + result = read_from_target() + assert result.shape[0] > 0 +``` + +## πŸ”„ CI/CD Example + +### GitHub Actions Workflow +```yaml +name: Data Pipeline CI + +on: [push, pull_request] + +jobs: + test: + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v2 + + - name: Set up Python + uses: actions/setup-python@v2 + with: + python-version: 3.9 + + - name: Install dependencies + run: | + pip install -r requirements.txt + pip install pytest + + - name: Run tests + run: 
pytest + + - name: Run linter + run: pylint *.py +``` + +## πŸ“Š Data Streaming Concepts + +### Key Concepts +- **Event Stream**: Continuous flow of events +- **Message Queue**: Buffer between producers and consumers +- **Consumer Group**: Multiple consumers processing stream +- **Offset**: Position in stream +- **Windowing**: Time-based aggregations + +### Example Technologies +- **Apache Kafka**: Distributed streaming platform +- **RabbitMQ**: Message broker +- **AWS Kinesis**: Managed streaming +- **Google Pub/Sub**: Messaging service + +## πŸ’‘ Best Practices + +1. **Containerization**: Use Docker for consistency +2. **Testing**: Test at multiple levels +3. **Monitoring**: Implement comprehensive monitoring +4. **Documentation**: Document architecture and decisions +5. **Security**: Follow security best practices +6. **Cost Optimization**: Monitor and optimize cloud costs +7. **Scalability**: Design for growth + +## πŸ“š Additional Resources + +### Books +- "Learning Spark" by Matei Zaharia +- "Streaming Systems" by Tyler Akidau +- "Docker Deep Dive" by Nigel Poulton + +### Online +- [Apache Spark Documentation](https://spark.apache.org/docs/latest/) +- [Docker Documentation](https://docs.docker.com/) +- [AWS Data Analytics](https://aws.amazon.com/big-data/datalakes-and-analytics/) + +### Certifications +- AWS Certified Data Analytics +- Google Professional Data Engineer +- Microsoft Certified: Azure Data Engineer +- Databricks Certified Data Engineer + +## Career Advancement + +Mastering these topics prepares you for: +- Senior Data Engineer +- Data Platform Engineer +- Big Data Engineer +- Cloud Data Engineer +- MLOps Engineer + +## Capstone Project Ideas + +1. **Real-time Analytics Dashboard** + - Stream data from API + - Process with Spark Streaming + - Store in time-series database + - Visualize in real-time + +2. **Cloud Data Warehouse** + - Design star schema + - Implement on cloud platform + - Build ETL pipeline + - Add data quality checks + +3. **Containerized Pipeline** + - Full ETL pipeline in Docker + - Orchestrated with Airflow + - Automated testing + - CI/CD deployment diff --git a/07-projects/README.md b/07-projects/README.md new file mode 100644 index 0000000..ca5849a --- /dev/null +++ b/07-projects/README.md @@ -0,0 +1,293 @@ +# Capstone Projects + +This section contains comprehensive projects that bring together everything you've learned. Each project simulates real-world data engineering scenarios. + +## 🎯 Projects Overview + +### Project 1: ETL Pipeline +Build a complete Extract, Transform, Load pipeline for processing sales data. + +### Project 2: Data Warehouse +Design and implement a dimensional data warehouse for analytics. + +### Project 3: Real-Time Dashboard +Create a system that processes streaming data and displays real-time metrics. + +## πŸ“‹ Prerequisites + +Before starting these projects, you should have completed: +- Python Fundamentals +- Python for Data Engineering +- SQL Fundamentals +- Data Engineering Concepts + +## πŸš€ How to Approach These Projects + +1. **Understand Requirements**: Read project specs carefully +2. **Plan Architecture**: Design before coding +3. **Start Simple**: Build MVP first +4. **Iterate**: Add features incrementally +5. **Test**: Validate at each step +6. **Document**: Explain your design decisions +7. **Refactor**: Improve code quality +8. **Deploy**: Make it production-ready + +## Project 1: ETL Pipeline + +### Overview +Build an automated ETL pipeline that processes e-commerce data from multiple sources. 
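+
+Before you read the detailed requirements, here is one possible shape for the finished pipeline. It is a sketch only: the file names, column names, and connection string are placeholders to adapt, not part of the spec:
+
+```python
+# Illustrative skeleton only: adapt sources, columns, and targets to your data.
+import pandas as pd
+from sqlalchemy import create_engine
+
+
+def extract() -> pd.DataFrame:
+    """Pull raw data from example CSV and JSON sources."""
+    orders = pd.read_csv("data/orders.csv")
+    customers = pd.read_json("data/customers.json")
+    return orders.merge(customers, on="customer_id", how="left")
+
+
+def transform(df: pd.DataFrame) -> pd.DataFrame:
+    """Validate and clean: drop incomplete rows, normalize dates."""
+    df = df.dropna(subset=["order_id", "customer_id"])
+    df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
+    return df[df["order_date"].notna()]
+
+
+def load(df: pd.DataFrame) -> None:
+    """Write the cleaned rows to a PostgreSQL table."""
+    engine = create_engine("postgresql://user:password@localhost/ecommerce")
+    df.to_sql("orders_clean", engine, if_exists="replace", index=False)
+
+
+if __name__ == "__main__":
+    load(transform(extract()))
+```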
+ +### Objectives +- Extract data from CSV, JSON, and API +- Clean and validate data +- Transform for analytics +- Load into database +- Schedule regular updates +- Handle errors gracefully + +### Requirements +- Python 3.8+ +- PostgreSQL or SQLite +- Pandas +- SQLAlchemy + +### Skills Practiced +- Data extraction from multiple sources +- Data cleaning and validation +- Database operations +- Error handling +- Logging +- Scheduling + +### Deliverables +- Working ETL pipeline +- Documentation +- Unit tests +- Configuration files +- README with setup instructions + +### Success Criteria +- Pipeline runs without errors +- Data quality checks pass +- Handles edge cases +- Well-documented code +- Tests cover main functionality + +--- + +## Project 2: Data Warehouse + +### Overview +Design and implement a data warehouse using dimensional modeling for a fictional retail company. + +### Objectives +- Design star schema +- Create dimension and fact tables +- Build ETL to populate warehouse +- Write analytical queries +- Optimize for performance +- Document design decisions + +### Requirements +- PostgreSQL (or similar) +- Python for ETL +- Understanding of dimensional modeling +- SQL knowledge + +### Skills Practiced +- Database design +- Dimensional modeling +- Data warehouse concepts +- ETL development +- Query optimization +- Performance tuning + +### Deliverables +- Database schema (ERD) +- ETL scripts +- Sample analytical queries +- Documentation +- Performance analysis + +### Success Criteria +- Properly normalized dimensions +- Efficient fact table design +- Working ETL process +- Optimized queries +- Clear documentation + +--- + +## Project 3: Real-Time Dashboard + +### Overview +Create a system that ingests streaming data, processes it, and displays real-time metrics on a dashboard. + +### Objectives +- Simulate or connect to data stream +- Process data in real-time +- Store processed data +- Create visualization dashboard +- Handle high throughput +- Implement monitoring + +### Requirements +- Python +- Database (PostgreSQL/TimescaleDB) +- Message queue (optional) +- Visualization tool (Plotly/Dash/Grafana) + +### Skills Practiced +- Stream processing +- Real-time data handling +- Data visualization +- System design +- Performance optimization + +### Deliverables +- Data ingestion service +- Processing pipeline +- Dashboard application +- Documentation +- Demo video + +### Success Criteria +- Handles data in real-time +- Low latency processing +- Responsive dashboard +- Scalable design +- Proper error handling + +--- + +## πŸ“š Additional Project Ideas + +### Beginner Projects +1. **CSV Data Cleaner**: Tool to clean messy CSV files +2. **Database Backup Script**: Automate database backups +3. **Log File Parser**: Extract insights from log files +4. **Data Quality Checker**: Validate data against rules + +### Intermediate Projects +5. **API Data Aggregator**: Collect data from multiple APIs +6. **Automated Report Generator**: Generate daily/weekly reports +7. **Data Version Control**: Track changes in datasets +8. **Multi-Source Data Integration**: Combine different data sources + +### Advanced Projects +9. **Data Lakehouse**: Implement data lake and warehouse +10. **ML Pipeline**: Data pipeline for machine learning +11. **Data Observability Platform**: Monitor data quality and pipelines +12. 
**Change Data Capture (CDC)**: Track database changes + +## πŸ’‘ Tips for Success + +### Planning +- Sketch architecture diagrams +- List requirements clearly +- Break into small tasks +- Estimate time needed + +### Development +- Use version control (Git) +- Commit frequently +- Write tests as you go +- Document as you code + +### Best Practices +- Follow coding standards +- Handle errors properly +- Add logging +- Use configuration files +- Keep credentials secure + +### Testing +- Test with sample data first +- Validate edge cases +- Performance test with realistic data +- Test failure scenarios + +### Documentation +- Explain design decisions +- Document setup process +- Provide usage examples +- Include troubleshooting guide + +## πŸŽ“ Learning Outcomes + +After completing these projects, you will: +- Have portfolio projects for job applications +- Understand full data engineering lifecycle +- Know how to design data systems +- Be comfortable with production concepts +- Have experience with real-world challenges + +## πŸ“ Project Presentation + +For each project, prepare: +1. **Problem Statement**: What you're solving +2. **Architecture Diagram**: System design +3. **Technology Stack**: Tools used +4. **Demo**: Working demonstration +5. **Challenges**: What you learned +6. **Future Improvements**: What's next + +## 🀝 Getting Help + +If you get stuck: +1. Review relevant lessons +2. Check documentation +3. Search Stack Overflow +4. Ask in communities +5. Review similar projects on GitHub + +## 🌟 Showcase Your Work + +- Push to GitHub with good README +- Write blog post about your project +- Create demo video +- Add to your portfolio +- Share on LinkedIn + +## πŸ“Š Evaluation Rubric + +### Code Quality (25%) +- Clean, readable code +- Proper structure +- Comments and docstrings +- Follows best practices + +### Functionality (25%) +- Meets requirements +- Works as expected +- Handles edge cases +- Error handling + +### Design (20%) +- Good architecture +- Scalable solution +- Efficient implementation +- Proper data modeling + +### Testing (15%) +- Unit tests included +- Test coverage +- Tests pass +- Edge cases covered + +### Documentation (15%) +- Clear README +- Setup instructions +- Architecture explained +- Usage examples + +## Next Steps + +1. Choose a project that interests you +2. Read the detailed requirements +3. Plan your approach +4. Start building +5. Iterate and improve +6. Share your work + +Good luck with your projects! These will form the foundation of your data engineering portfolio. diff --git a/07-projects/etl-pipeline/README.md b/07-projects/etl-pipeline/README.md new file mode 100644 index 0000000..ba7c65a --- /dev/null +++ b/07-projects/etl-pipeline/README.md @@ -0,0 +1,360 @@ +# Project 1: E-commerce ETL Pipeline + +## πŸ“‹ Project Overview + +Build a production-ready ETL pipeline that processes e-commerce sales data from multiple sources, validates and transforms it, and loads it into a database for analytics. + +## 🎯 Objectives + +1. Extract data from multiple sources (CSV, JSON, API) +2. Implement data validation and quality checks +3. Transform data for analytics +4. Load data into a PostgreSQL database +5. Handle errors and edge cases +6. Implement logging and monitoring +7. 
Make the pipeline schedulable + +## πŸ“Š Data Sources + +### Source 1: Orders CSV +Daily exports of order data: +``` +order_id,customer_id,order_date,total_amount,status +1001,5001,2024-01-15,99.99,completed +1002,5002,2024-01-15,149.50,pending +``` + +### Source 2: Product Catalog API +REST API endpoint: `/api/products` +```json +{ + "products": [ + { + "product_id": "P001", + "name": "Laptop", + "category": "Electronics", + "price": 999.99 + } + ] +} +``` + +### Source 3: Customer Data JSON +Customer information updates: +```json +{ + "customer_id": 5001, + "name": "John Doe", + "email": "john@example.com", + "signup_date": "2023-01-10" +} +``` + +## πŸ—οΈ Architecture + +``` +β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” +β”‚ Sources │────▢│ ETL Process │────▢│ Database β”‚ +β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ + β”‚ β”‚ β”‚ + β”œβ”€ CSV Files β”œβ”€ Extract β”œβ”€ PostgreSQL + β”œβ”€ JSON Files β”œβ”€ Transform β”œβ”€ Staging Tables + └─ REST API β”œβ”€ Load └─ Final Tables + └─ Validate +``` + +## πŸ“ Project Structure + +``` +etl-pipeline/ +β”œβ”€β”€ README.md +β”œβ”€β”€ requirements.txt +β”œβ”€β”€ config/ +β”‚ β”œβ”€β”€ config.yaml +β”‚ └── logging_config.yaml +β”œβ”€β”€ data/ +β”‚ β”œβ”€β”€ input/ +β”‚ β”‚ β”œβ”€β”€ orders/ +β”‚ β”‚ β”œβ”€β”€ customers/ +β”‚ β”‚ └── products/ +β”‚ └── output/ +β”œβ”€β”€ src/ +β”‚ β”œβ”€β”€ __init__.py +β”‚ β”œβ”€β”€ extract/ +β”‚ β”‚ β”œβ”€β”€ __init__.py +β”‚ β”‚ β”œβ”€β”€ csv_extractor.py +β”‚ β”‚ β”œβ”€β”€ json_extractor.py +β”‚ β”‚ └── api_extractor.py +β”‚ β”œβ”€β”€ transform/ +β”‚ β”‚ β”œβ”€β”€ __init__.py +β”‚ β”‚ β”œβ”€β”€ data_cleaner.py +β”‚ β”‚ β”œβ”€β”€ data_validator.py +β”‚ β”‚ └── data_transformer.py +β”‚ β”œβ”€β”€ load/ +β”‚ β”‚ β”œβ”€β”€ __init__.py +β”‚ β”‚ └── database_loader.py +β”‚ β”œβ”€β”€ utils/ +β”‚ β”‚ β”œβ”€β”€ __init__.py +β”‚ β”‚ β”œβ”€β”€ logger.py +β”‚ β”‚ β”œβ”€β”€ db_connection.py +β”‚ β”‚ └── config_loader.py +β”‚ └── pipeline.py +β”œβ”€β”€ tests/ +β”‚ β”œβ”€β”€ __init__.py +β”‚ β”œβ”€β”€ test_extract.py +β”‚ β”œβ”€β”€ test_transform.py +β”‚ β”œβ”€β”€ test_load.py +β”‚ └── test_pipeline.py +β”œβ”€β”€ sql/ +β”‚ β”œβ”€β”€ schema.sql +β”‚ └── queries.sql +└── main.py +``` + +## πŸ”§ Technical Requirements + +### Required Software +- Python 3.8+ +- PostgreSQL 12+ +- Git + +### Python Libraries +``` +pandas>=1.3.0 +sqlalchemy>=1.4.0 +psycopg2-binary>=2.9.0 +requests>=2.26.0 +pyyaml>=5.4.0 +python-dotenv>=0.19.0 +pytest>=7.0.0 +``` + +## πŸ“ Implementation Steps + +### Phase 1: Setup (Week 1) +- [ ] Set up project structure +- [ ] Install dependencies +- [ ] Set up database +- [ ] Create configuration files +- [ ] Set up logging + +### Phase 2: Extract (Week 1-2) +- [ ] Implement CSV extractor +- [ ] Implement JSON extractor +- [ ] Implement API extractor +- [ ] Handle extraction errors +- [ ] Write extraction tests + +### Phase 3: Transform (Week 2-3) +- [ ] Implement data validation +- [ ] Implement data cleaning +- [ ] Implement transformations +- [ ] Add data quality checks +- [ ] Write transformation tests + +### Phase 4: Load (Week 3) +- [ ] Create database schema +- [ ] Implement database loader +- [ ] Handle loading errors +- [ ] Implement upsert logic +- [ ] Write loading tests + +### Phase 5: Integration (Week 4) +- [ ] Connect all components +- [ ] Implement orchestration +- [ ] Add comprehensive logging 
+
+- [ ] Handle end-to-end errors
+- [ ] Write integration tests
+
+### Phase 6: Production Ready (Week 4)
+- [ ] Add configuration management
+- [ ] Implement monitoring
+- [ ] Add scheduling capability
+- [ ] Create documentation
+- [ ] Performance optimization
+
+## πŸ§ͺ Testing Strategy
+
+### Unit Tests
+Test individual components:
+- Extractors
+- Transformers
+- Loaders
+- Utilities
+
+### Integration Tests
+Test component interactions:
+- Extract β†’ Transform
+- Transform β†’ Load
+- End-to-end pipeline
+
+### Data Quality Tests
+Validate data:
+- Schema validation
+- Data type checks
+- Null value checks
+- Duplicate detection
+- Business rule validation
+
+## πŸ“Š Database Schema
+
+### Staging Tables
+```sql
+CREATE TABLE staging_orders (
+    order_id VARCHAR(50),
+    customer_id VARCHAR(50),
+    order_date VARCHAR(50),
+    total_amount VARCHAR(50),
+    status VARCHAR(50),
+    loaded_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+### Final Tables
+```sql
+-- Create customers and products first so the foreign key
+-- reference from orders resolves.
+CREATE TABLE customers (
+    customer_id INTEGER PRIMARY KEY,
+    name VARCHAR(100) NOT NULL,
+    email VARCHAR(100) UNIQUE NOT NULL,
+    signup_date DATE,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE TABLE products (
+    product_id VARCHAR(50) PRIMARY KEY,
+    name VARCHAR(200) NOT NULL,
+    category VARCHAR(50),
+    price DECIMAL(10,2) NOT NULL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+
+CREATE TABLE orders (
+    order_id INTEGER PRIMARY KEY,
+    customer_id INTEGER REFERENCES customers(customer_id),
+    order_date DATE NOT NULL,
+    total_amount DECIMAL(10,2) NOT NULL,
+    status VARCHAR(20) NOT NULL,
+    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
+    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
+);
+```
+
+## πŸ” Data Quality Checks
+
+1. **Completeness**: All required fields present
+2. **Validity**: Data types and formats correct
+3. **Consistency**: Cross-field validation
+4. **Accuracy**: Values within expected ranges
+5. **Uniqueness**: No duplicate keys
+6. **Timeliness**: Data is current
+
+## πŸ“ˆ Monitoring and Logging
+
+### Log Levels
+- **INFO**: Pipeline start/stop, phase transitions
+- **WARNING**: Data quality issues, missing data
+- **ERROR**: Processing failures
+- **DEBUG**: Detailed processing information
+
+### Metrics to Track
+- Records processed
+- Records failed
+- Processing time
+- Data quality scores
+- Error rates
+
+## πŸš€ Running the Pipeline
+
+### Setup
+```bash
+# Clone repository
+git clone <repository-url>
+cd etl-pipeline
+
+# Create virtual environment
+python -m venv venv
+source venv/bin/activate  # or venv\Scripts\activate on Windows
+
+# Install dependencies
+pip install -r requirements.txt
+
+# Set up database
+psql -U postgres -f sql/schema.sql
+
+# Configure
+cp config/config.example.yaml config/config.yaml
+# Edit config.yaml with your settings
+```
+
+### Execution
+```bash
+# Run full pipeline
+python main.py
+
+# Run specific phase
+python main.py --phase extract
+python main.py --phase transform
+python main.py --phase load
+
+# Run with date range
+python main.py --start-date 2024-01-01 --end-date 2024-01-31
+
+# Dry run (no database writes)
+python main.py --dry-run
+```
+
+## πŸ“– Documentation Requirements
+
+1. **README**: Project overview and setup
+2. **Architecture Diagram**: System design
+3. **API Documentation**: If creating APIs
+4. **Configuration Guide**: How to configure
+5. **Troubleshooting**: Common issues and solutions
+6. 
**Code Comments**: Inline documentation
+
+## 🎯 Success Criteria
+
+- [ ] Pipeline processes all three data sources
+- [ ] Data quality checks are implemented
+- [ ] Errors are handled gracefully
+- [ ] Logging provides adequate information
+- [ ] Tests achieve >80% coverage
+- [ ] Documentation is complete
+- [ ] Code follows Python best practices
+- [ ] Pipeline runs without manual intervention
+
+## 🌟 Bonus Features
+
+- **Incremental Loading**: Only process new/changed data
+- **Parallel Processing**: Process multiple files simultaneously
+- **Email Notifications**: Alert on failures
+- **Dashboard**: Visualize pipeline metrics
+- **Containerization**: Package in Docker
+- **Cloud Deployment**: Deploy to AWS/GCP/Azure
+
+## πŸ“š Resources
+
+- [SQLAlchemy Documentation](https://docs.sqlalchemy.org/)
+- [Pandas User Guide: Indexing and Selecting Data](https://pandas.pydata.org/docs/user_guide/indexing.html)
+- [Python Logging Best Practices](https://docs.python.org/3/howto/logging.html)
+- [PostgreSQL Documentation](https://www.postgresql.org/docs/)
+
+## 🀝 Getting Help
+
+- Review previous lessons on ETL
+- Check Stack Overflow for specific errors
+- Refer to library documentation
+- Ask in data engineering communities
+
+## πŸ“ Submission Guidelines
+
+When completed, your repository should include:
+1. All source code
+2. Requirements file
+3. Database schema
+4. Sample data (or data generator)
+5. Test suite
+6. Complete documentation
+7. Demo video (optional)
+
+Good luck building your ETL pipeline!
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..c812f5d
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,146 @@
+# Contributing to Data Engineer Learning Path
+
+Thank you for your interest in contributing! This document provides guidelines for contributing to this learning repository.
+
+## How to Contribute
+
+### Reporting Issues
+- Check if the issue already exists
+- Provide a clear description
+- Include relevant details (lesson number, code snippets, etc.)
+
+### Suggesting Improvements
+- Open an issue with the "enhancement" label
+- Describe the improvement clearly
+- Explain why it would be valuable
+
+### Adding Content
+
+#### Adding Lessons
+1. Fork the repository
+2. Create a new branch (`git checkout -b add-lesson-topic`)
+3. Add your lesson in the appropriate directory
+4. Follow the existing lesson format
+5. Include examples and exercises
+6. Update the section README.md
+7. Submit a pull request
+
+#### Adding Examples
+- Place in the appropriate `examples/` folder
+- Include comments explaining the code
+- Make sure code runs without errors
+- Add a docstring at the top of the file
+
+#### Adding Exercises
+- Place in the appropriate `exercises/` folder
+- Include clear problem description
+- Provide sample input/output
+- Include solution in a separate file (e.g., `exercise_name_solution.py`)
+
+## Code Style Guidelines
+
+### Python Code
+- Follow PEP 8 style guide
+- Use meaningful variable names
+- Add comments for complex logic
+- Include docstrings for functions
+- Keep functions focused and small
+
+### SQL Code
+- Use uppercase for SQL keywords
+- Indent subqueries
+- Add comments explaining complex queries
+- Format for readability
+
+### Markdown
+- Use headers appropriately (# for main title, ## for sections, etc.) 
+- Include code blocks with language specification +- Add blank lines between sections +- Use bullet points for lists + +## Content Guidelines + +### Lessons +- Start with clear learning objectives +- Progress from simple to complex +- Include practical examples +- End with exercises or a project +- Link to additional resources + +### Examples +- Should be runnable without modification (when possible) +- Include error handling +- Demonstrate best practices +- Keep focused on one concept + +### Exercises +- Should reinforce lesson concepts +- Provide varying difficulty levels +- Include hints for difficult problems +- Solution should include explanation + +## Pull Request Process + +1. **Create a descriptive PR title** + - Good: "Add lesson on pandas groupby operations" + - Bad: "Update files" + +2. **Describe your changes** + - What was added/changed + - Why the change is needed + - Any relevant context + +3. **Ensure quality** + - Code runs without errors + - Markdown renders correctly + - No typos or grammar issues + - Links work correctly + +4. **Wait for review** + - Respond to feedback + - Make requested changes + - Be patient and respectful + +## Testing Your Contributions + +### Python Code +```bash +# Run the code to ensure it works +python your_script.py + +# Check for syntax errors +python -m py_compile your_script.py +``` + +### SQL Code +- Test queries in a database +- Verify results are correct +- Check for syntax errors + +### Markdown +- Preview in VS Code or GitHub +- Verify links work +- Check formatting + +## Code of Conduct + +- Be respectful and inclusive +- Welcome newcomers +- Provide constructive feedback +- Focus on the content, not the person +- Help create a positive learning environment + +## Questions? + +If you have questions about contributing: +1. Check existing issues and discussions +2. Open a new issue with your question +3. Tag it with "question" + +## Recognition + +Contributors will be acknowledged in the repository. Thank you for helping others learn! + +## License + +By contributing, you agree that your contributions will be licensed under the same license as this project (MIT License). diff --git a/FAQ.md b/FAQ.md new file mode 100644 index 0000000..19f2002 --- /dev/null +++ b/FAQ.md @@ -0,0 +1,232 @@ +# Frequently Asked Questions (FAQ) + +## General Questions + +### Q: Do I need prior programming experience? +**A:** No! This learning path starts from the basics. However, basic computer literacy is expected. + +### Q: How long will it take to complete? +**A:** It depends on your pace: +- **Full-time (40 hrs/week)**: 3-4 months +- **Part-time (15 hrs/week)**: 6-9 months +- **Casual (5 hrs/week)**: 12+ months + +### Q: Is this learning path free? +**A:** Yes! All materials in this repository are free. However, some recommended resources (books, courses) may have costs. + +### Q: What's the job market like for data engineers? +**A:** Data engineering is in high demand with competitive salaries. Entry-level positions typically require portfolio projects and internship experience. + +### Q: Can I skip sections? +**A:** Not recommended. Each section builds on previous ones. However, if you already know Python, you can move through it quickly. + +## Technical Questions + +### Q: Which Python version should I use? +**A:** Python 3.8 or higher. We recommend using the latest stable version (3.11 or 3.12). + +### Q: Windows, Mac, or Linux? +**A:** Any! All examples work on all platforms. Linux is common in production, but start with what you have. 
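+
+The one platform difference you'll hit early is virtual-environment activation; everything else in this path runs the same way on all three systems:
+
+```bash
+python -m venv venv          # identical on every OS
+source venv/bin/activate     # macOS / Linux
+venv\Scripts\activate        # Windows
+```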
+ +### Q: SQLite or PostgreSQL? +**A:** Start with SQLite (easier setup), then move to PostgreSQL (more features, production-ready). + +### Q: Do I need a powerful computer? +**A:** No. Basic specs are fine: +- 4GB RAM minimum (8GB recommended) +- 20GB free disk space +- Any modern processor + +### Q: How do I install Python? +**A:** See [Getting Started Guide](GETTING_STARTED.md) and the [Python Installation Lesson](01-python-fundamentals/lessons/01-getting-started.md). + +## Learning Questions + +### Q: I'm stuck on an exercise. What should I do? +**A:** +1. Read the error message carefully +2. Review the relevant lesson +3. Search for the error online +4. Ask in communities (provide details) +5. Check the solution (as last resort) + +### Q: How much time should I spend daily? +**A:** +- **Minimum**: 30 minutes (to maintain consistency) +- **Ideal**: 1-2 hours +- **Quality over quantity**: Focused 1 hour beats distracted 3 hours + +### Q: Should I take notes? +**A:** Yes! Taking notes helps retention. Keep a learning journal to track progress and challenges. + +### Q: When should I start building projects? +**A:** Start small projects early! Even simple programs help you learn. Complete the capstone projects after finishing relevant sections. + +### Q: How do I know if I'm ready to move to the next section? +**A:** You should be able to: +- Explain key concepts +- Complete most exercises independently +- Build a small project using the skills + +## Career Questions + +### Q: What jobs can I get after completing this? +**A:** +- Junior Data Engineer +- ETL Developer +- Data Pipeline Engineer +- Analytics Engineer +- BI Developer (with additional skills) + +### Q: What's the typical salary? +**A:** Varies by location and experience: +- **Entry-level**: $60k-$90k +- **Mid-level**: $90k-$130k +- **Senior**: $130k-$180k+ +(US market, adjust for your location) + +### Q: Do I need a degree? +**A:** Not always. Many companies hire based on skills and portfolio. However, some companies require a degree. A strong portfolio can compensate. + +### Q: Should I get certified? +**A:** Certifications can help but aren't required. Focus on: +1. Building strong portfolio +2. Understanding concepts deeply +3. Then consider certifications for specific tools/platforms + +### Q: How important is the capstone project? +**A:** Very! Employers want to see you can build real systems. Quality projects in your portfolio are crucial. + +## Tool Questions + +### Q: VS Code or PyCharm? +**A:** Either works great: +- **VS Code**: Free, lightweight, extensible +- **PyCharm**: More Python-specific features +- Try both, use what feels better + +### Q: Do I need to learn Docker? +**A:** Eventually, yes. It's covered in advanced topics. But master the basics first. + +### Q: Should I learn AWS, GCP, or Azure? +**A:** Learn cloud concepts first, then pick one: +- **AWS**: Most popular, lots of resources +- **GCP**: Strong data/ML tools +- **Azure**: Good for enterprise +Start with one, concepts transfer to others. + +### Q: What about Apache Spark? +**A:** Important for big data, but not required initially. It's covered in advanced topics after you're comfortable with Python and SQL. + +## Practice Questions + +### Q: Where can I practice SQL? +**A:** +- **LeetCode**: Database section +- **HackerRank**: SQL challenges +- **SQLZoo**: Interactive tutorials +- **Mode Analytics**: SQL tutorials with real data + +### Q: Where can I practice Python? 
+**A:** +- **LeetCode**: Python problems +- **HackerRank**: Python track +- **Codewars**: Community challenges +- **Exercism**: Mentor-supported practice + +### Q: How do I get real datasets to practice? +**A:** +- **Kaggle**: Thousands of datasets +- **data.gov**: Government data +- **GitHub**: Awesome datasets repositories +- **APIs**: Public APIs for real-time data + +## Troubleshooting + +### Q: My code isn't working but I don't see an error +**A:** +- Check indentation (Python is sensitive) +- Verify variable names (case-sensitive) +- Add print statements to debug +- Use Python debugger (pdb) + +### Q: I get "ModuleNotFoundError" +**A:** +```bash +# Install the missing module +pip install module_name + +# Make sure you're in the right virtual environment +which python # Should show your venv path +``` + +### Q: PostgreSQL connection fails +**A:** +- Is PostgreSQL running? `sudo service postgresql status` +- Check connection details (host, port, password) +- Verify database exists +- Check firewall settings + +### Q: Git push fails +**A:** +- Check you're on correct branch: `git branch` +- Pull first: `git pull origin branch_name` +- Verify credentials +- Check repository permissions + +## Community Questions + +### Q: Where can I get help? +**A:** +- **Reddit**: r/learnprogramming, r/dataengineering +- **Discord**: Python Discord, DataTalks.Club +- **Stack Overflow**: Ask specific questions +- **GitHub Issues**: For problems with this repo + +### Q: How can I contribute? +**A:** See [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. Contributions are welcome! + +### Q: Can I share my solutions? +**A:** Yes! Sharing helps others learn. Fork the repo, add your solutions, share on social media. + +### Q: Is there a study group? +**A:** Check the repository discussions or create your own! Many learners find study partners in Discord servers. + +## Next Steps Questions + +### Q: I finished everything. What's next? +**A:** Congratulations! πŸŽ‰ +1. Build more projects +2. Contribute to open source +3. Learn advanced topics (Airflow, Spark, etc.) +4. Prepare for interviews +5. Apply for jobs +6. Keep learning! + +### Q: What should I learn after this? +**A:** Depends on your interests: +- **Big Data**: Apache Spark, Hadoop +- **Cloud**: Deep dive into AWS/GCP/Azure +- **Orchestration**: Apache Airflow, Prefect +- **Streaming**: Kafka, Flink +- **ML Engineering**: MLOps, model deployment + +### Q: How do I prepare for interviews? +**A:** +1. Review system design +2. Practice SQL and Python problems +3. Prepare to explain your projects +4. Study common data engineering patterns +5. Research the company +6. Practice behavioral questions + +## Still Have Questions? + +- **Open an Issue**: For technical problems with the repository +- **Start a Discussion**: For learning questions and sharing +- **Join Communities**: Connect with other learners +- **Contact**: Check repository for contact information + +--- + +Remember: Every expert was once a beginner asking these same questions. Don't be afraid to ask for help! diff --git a/GETTING_STARTED.md b/GETTING_STARTED.md new file mode 100644 index 0000000..5ae9a84 --- /dev/null +++ b/GETTING_STARTED.md @@ -0,0 +1,358 @@ +# Getting Started with Data Engineer Learning Path + +Welcome! This guide will help you get started on your journey to becoming a data engineer. + +## 🎯 Is This For You? 
+ +This learning path is perfect if you: +- Want to become a data engineer +- Have basic computer skills +- Are willing to learn and practice regularly +- Can dedicate 10-15 hours per week +- Enjoy working with data and solving problems + +No prior programming experience is required, but it's helpful! + +## πŸ“… Time Commitment + +### Full-Time Study (40 hours/week) +- Complete path in 3-4 months +- Intensive learning +- Quick career transition + +### Part-Time Study (10-15 hours/week) +- Complete path in 6-9 months +- Balance with work/life +- Sustainable pace + +### Casual Learning (5-10 hours/week) +- Complete path in 12+ months +- Flexible schedule +- At your own pace + +## πŸ—ΊοΈ Your Learning Journey + +### Month 1-2: Python Foundations +**Goal**: Learn Python basics + +**What you'll do**: +- Install Python and VS Code +- Learn variables, loops, functions +- Write simple programs +- Complete exercises + +**Time**: 2-4 hours/day + +**Milestone**: Build a command-line todo app + +### Month 3-4: Python for Data +**Goal**: Use Python for data tasks + +**What you'll do**: +- Learn Pandas for data manipulation +- Work with CSV, JSON, Excel files +- Make API requests +- Process real datasets + +**Time**: 2-4 hours/day + +**Milestone**: Build data cleaning script + +### Month 3-5: SQL Fundamentals +**Goal**: Master database queries + +**What you'll do** (overlaps with Python): +- Learn SELECT, JOIN, GROUP BY +- Practice with real databases +- Write complex queries +- Optimize query performance + +**Time**: 1-2 hours/day + +**Milestone**: Design and query your own database + +### Month 5-7: Data Engineering Concepts +**Goal**: Understand data pipelines + +**What you'll do**: +- Learn ETL processes +- Build data pipelines +- Implement data quality checks +- Use version control + +**Time**: 2-3 hours/day + +**Milestone**: Build complete ETL pipeline + +### Month 7-9: Advanced Topics +**Goal**: Learn production tools + +**What you'll do**: +- Docker basics +- Cloud platforms intro +- Testing and CI/CD +- Workflow orchestration + +**Time**: 2-3 hours/day + +**Milestone**: Deploy containerized pipeline + +### Month 9+: Projects & Job Search +**Goal**: Build portfolio and find job + +**What you'll do**: +- Complete capstone projects +- Build portfolio +- Prepare for interviews +- Apply for jobs + +## πŸš€ Week 1 Action Plan + +### Day 1: Setup +- [ ] Install Python 3.8+ +- [ ] Install VS Code +- [ ] Install Git +- [ ] Create GitHub account +- [ ] Clone this repository + +### Day 2: Python Basics +- [ ] Read "Getting Started with Python" lesson +- [ ] Write your first Python program +- [ ] Complete basic syntax exercises +- [ ] Watch a Python tutorial video + +### Day 3: More Python +- [ ] Learn about variables and data types +- [ ] Practice with numbers and strings +- [ ] Do 5 coding exercises +- [ ] Start a learning journal + +### Day 4: Control Flow +- [ ] Learn if/else statements +- [ ] Learn loops (for, while) +- [ ] Write programs using control flow +- [ ] Complete 5 more exercises + +### Day 5: Functions +- [ ] Learn to write functions +- [ ] Understand parameters and return values +- [ ] Practice with function exercises +- [ ] Refactor previous code into functions + +### Day 6: Practice & Review +- [ ] Review all concepts from the week +- [ ] Complete remaining exercises +- [ ] Start a small project +- [ ] Join a coding community + +### Day 7: Rest & Plan +- [ ] Review your progress +- [ ] Plan next week +- [ ] Read about SQL +- [ ] Set up PostgreSQL (optional) + +## πŸ’» Required Software 
Setup + +### 1. Python +```bash +# Verify installation +python --version # Should be 3.8+ +pip --version +``` + +### 2. Code Editor (VS Code) +- Download from [code.visualstudio.com](https://code.visualstudio.com/) +- Install Python extension +- Install Git extension + +### 3. Git +```bash +# Verify installation +git --version +``` + +### 4. Database (Start with SQLite, add PostgreSQL later) +```bash +# SQLite is built into Python +python -c "import sqlite3; print('SQLite ready!')" +``` + +## πŸ“– Daily Study Routine + +### Option 1: Morning Learner (Before Work) +- **6:00-7:00 AM**: Study new concepts +- **7:00-7:30 AM**: Practice exercises +- **Evening**: Review and practice (30 min) + +### Option 2: Evening Learner (After Work) +- **7:00-8:00 PM**: Study new concepts +- **8:00-9:00 PM**: Practice and exercises +- **Weekend**: Projects and review + +### Option 3: Full-Time Student +- **9:00-11:00 AM**: Study new material +- **11:00-12:00 PM**: Practice exercises +- **1:00-3:00 PM**: Project work +- **3:00-4:00 PM**: Review and community + +## πŸ“š Learning Resources + +### Primary: This Repository +Follow the structured path here + +### Supplementary +- **Video**: YouTube Python tutorials +- **Practice**: LeetCode, HackerRank +- **Community**: Reddit r/learnprogramming +- **Documentation**: Official Python docs + +## 🎯 Setting Goals + +### Short-Term (1-2 weeks) +- Complete Python basics +- Write 10 simple programs +- Join online community + +### Medium-Term (1-3 months) +- Complete Python and SQL sections +- Build 3 small projects +- Start GitHub portfolio + +### Long-Term (6-12 months) +- Complete all sections +- Build 3 capstone projects +- Land data engineering job + +## πŸ“ Tracking Progress + +### Keep a Learning Journal +```markdown +# Date: 2024-01-15 +## What I Learned +- Python functions +- Return values +- Default parameters + +## What I Built +- Calculator program +- Temperature converter + +## Challenges +- Understanding scope +- Debugging errors + +## Tomorrow's Goals +- Learn about lists +- Practice with data structures +``` + +### Use GitHub +- Commit code daily +- Track your streak +- Build your portfolio +- Show your progress + +## 🀝 Getting Help + +### When Stuck +1. Read error messages carefully +2. Check documentation +3. Search Stack Overflow +4. Ask in communities +5. Take a break, come back fresh + +### Communities +- **Reddit**: r/learnprogramming, r/dataengineering +- **Discord**: Python Discord, DataTalks.Club +- **Stack Overflow**: Ask and answer questions +- **LinkedIn**: Connect with data engineers + +## πŸ’‘ Study Tips + +### Effective Learning +1. **Code Every Day**: Even 30 minutes +2. **Type, Don't Copy**: Type all examples +3. **Build Projects**: Apply what you learn +4. **Teach Others**: Explain concepts +5. **Review Regularly**: Revisit old topics +6. **Take Breaks**: Rest is important +7. 
**Stay Consistent**: Daily practice beats cramming
+
+### Avoid Common Pitfalls
+- ❌ Tutorial hell (watching without doing)
+- ❌ Learning too many things at once
+- ❌ Skipping fundamentals
+- ❌ Not practicing enough
+- ❌ Giving up when stuck
+- ❌ Comparing to others
+
+### Do This Instead
+- βœ… Build while learning
+- βœ… Focus on one topic at a time
+- βœ… Master basics first
+- βœ… Practice daily
+- βœ… Embrace challenges
+- βœ… Track your own progress
+
+## πŸŽ“ Study Groups
+
+### Find Study Partners
+- Local meetups
+- Online study groups
+- Discord servers
+- LinkedIn groups
+
+### Start Your Own
+- Invite friends to learn
+- Set weekly goals
+- Share progress
+- Support each other
+
+## πŸ“Š Progress Milestones
+
+### Beginner
+- βœ… Installed all required software
+- βœ… Written first Python program
+- βœ… Completed 10 exercises
+- βœ… Built first small project
+
+### Intermediate
+- βœ… Comfortable with Python basics
+- βœ… Can write SQL queries
+- βœ… Built data processing script
+- βœ… Used Git and GitHub
+
+### Advanced
+- βœ… Built complete ETL pipeline
+- βœ… Understand databases well
+- βœ… Can use cloud services
+- βœ… Ready for job interviews
+
+## πŸš€ Ready to Start?
+
+1. **Star this repository** on GitHub
+2. **Fork it** to your account
+3. **Clone** to your computer
+4. **Start with** 01-python-fundamentals
+5. **Code along** with examples
+6. **Complete** exercises
+7. **Build** projects
+8. **Share** your progress
+
+## πŸ“ž Questions?
+
+- Open an issue in this repository
+- Join our community discussions
+- Check the [FAQ](FAQ.md)
+
+## πŸŽ‰ Welcome!
+
+You're at the beginning of an exciting journey. Data engineering is a rewarding career with great opportunities. Take it one step at a time, practice consistently, and don't give up when things get challenging.
+
+**Remember**: Every expert was once a beginner. You can do this!
+
+---
+
+Ready to begin? Head over to **[01-python-fundamentals](01-python-fundamentals/README.md)** and start learning!
+
+**Happy Learning! πŸš€**
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..49ba495
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2025 Data Engineer Learning Path Contributors
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE. 
diff --git a/README.md b/README.md index b068193..62c7cd5 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,231 @@ -# data_engineer_learning_python_sql_path -data_engineer_learning_python_sql_path +# Data Engineer Learning Path - Python & SQL + +A comprehensive learning path for aspiring data engineers, covering essential Python programming and SQL database skills needed for modern data engineering roles. + +## πŸ“š Table of Contents + +- [Overview](#overview) +- [Getting Started](#getting-started) +- [Learning Path](#learning-path) +- [Prerequisites](#prerequisites) +- [Repository Structure](#repository-structure) +- [How to Use This Repository](#how-to-use-this-repository) +- [Resources](#resources) +- [FAQ](#faq) +- [Contributing](#contributing) + +## 🎯 Overview + +This repository provides a structured learning path for data engineers, focusing on: +- **Python Programming**: From basics to advanced data manipulation +- **SQL Databases**: Query writing, optimization, and database design +- **Data Engineering Concepts**: ETL/ELT, data pipelines, and data warehousing +- **Practical Projects**: Hands-on exercises and real-world scenarios + +## πŸš€ Getting Started + +**New to programming or data engineering?** Start here: + +πŸ‘‰ **[GETTING STARTED GUIDE](GETTING_STARTED.md)** - Your roadmap to begin learning + +This guide includes: +- Time commitments and study plans +- Week 1 action plan +- Software setup instructions +- Daily study routines +- Tips for success + +## πŸ›£οΈ Learning Path + +### Phase 1: Python Fundamentals (2-4 weeks) +- [ ] Python basics: variables, data types, control structures +- [ ] Functions and modules +- [ ] Object-oriented programming +- [ ] Error handling and debugging +- [ ] File I/O operations + +### Phase 2: Python for Data Engineering (4-6 weeks) +- [ ] Data structures: lists, dictionaries, sets, tuples +- [ ] Working with libraries: NumPy, Pandas +- [ ] Data manipulation and transformation +- [ ] API interactions and web scraping +- [ ] Working with CSV, JSON, and XML files + +### Phase 3: SQL Fundamentals (3-4 weeks) +- [ ] Basic SQL queries: SELECT, WHERE, ORDER BY +- [ ] Joins: INNER, LEFT, RIGHT, FULL OUTER +- [ ] Aggregate functions and GROUP BY +- [ ] Subqueries and CTEs (Common Table Expressions) +- [ ] Window functions + +### Phase 4: Advanced SQL (3-4 weeks) +- [ ] Database design and normalization +- [ ] Indexes and query optimization +- [ ] Stored procedures and functions +- [ ] Transactions and ACID properties +- [ ] Working with different SQL databases (PostgreSQL, MySQL, SQLite) + +### Phase 5: Data Engineering Concepts (4-6 weeks) +- [ ] ETL vs ELT processes +- [ ] Data pipelines and orchestration +- [ ] Data warehousing concepts +- [ ] Data quality and validation +- [ ] Version control with Git + +### Phase 6: Advanced Topics (6-8 weeks) +- [ ] Working with big data tools (introduction to Spark) +- [ ] Cloud platforms (AWS, GCP, Azure basics) +- [ ] Data streaming concepts +- [ ] Docker and containerization +- [ ] Testing and CI/CD for data pipelines + +### Phase 7: Practical Projects +- [ ] Build an ETL pipeline +- [ ] Create a data warehouse +- [ ] Implement data quality checks +- [ ] Build a dashboard with real-time data + +## πŸ“‹ Prerequisites + +- Basic computer literacy +- Understanding of basic programming concepts (helpful but not required) +- Willingness to learn and practice regularly +- A computer with internet access + +### Required Software +- Python 3.8 or higher +- A SQL database (PostgreSQL recommended, SQLite for 
beginners) +- Git for version control +- A code editor (VS Code, PyCharm, or similar) + +## πŸ“ Repository Structure + +``` +. +β”œβ”€β”€ 01-python-fundamentals/ # Python basics and fundamentals +β”‚ β”œβ”€β”€ lessons/ # Theory and explanations +β”‚ β”œβ”€β”€ examples/ # Code examples +β”‚ └── exercises/ # Practice exercises +β”‚ +β”œβ”€β”€ 02-python-data-engineering/ # Python for data tasks +β”‚ β”œβ”€β”€ lessons/ +β”‚ β”œβ”€β”€ examples/ +β”‚ └── exercises/ +β”‚ +β”œβ”€β”€ 03-sql-fundamentals/ # Basic SQL concepts +β”‚ β”œβ”€β”€ lessons/ +β”‚ β”œβ”€β”€ queries/ # SQL query examples +β”‚ └── exercises/ +β”‚ +β”œβ”€β”€ 04-advanced-sql/ # Advanced SQL topics +β”‚ β”œβ”€β”€ lessons/ +β”‚ β”œβ”€β”€ queries/ +β”‚ └── exercises/ +β”‚ +β”œβ”€β”€ 05-data-engineering/ # Data engineering concepts +β”‚ β”œβ”€β”€ lessons/ +β”‚ β”œβ”€β”€ projects/ +β”‚ └── exercises/ +β”‚ +β”œβ”€β”€ 06-advanced-topics/ # Advanced data engineering +β”‚ β”œβ”€β”€ lessons/ +β”‚ β”œβ”€β”€ projects/ +β”‚ └── exercises/ +β”‚ +β”œβ”€β”€ 07-projects/ # Capstone projects +β”‚ β”œβ”€β”€ etl-pipeline/ +β”‚ β”œβ”€β”€ data-warehouse/ +β”‚ └── real-time-dashboard/ +β”‚ +└── resources/ # Additional resources + β”œβ”€β”€ books.md + β”œβ”€β”€ courses.md + └── tools.md +``` + +## πŸš€ How to Use This Repository + +1. **Clone the repository** + ```bash + git clone https://github.com/fabianomalves/data_engineer_learning_python_sql_path.git + cd data_engineer_learning_python_sql_path + ``` + +2. **Install dependencies** (when needed) + ```bash + # Create a virtual environment + python -m venv venv + source venv/bin/activate # On Windows: venv\Scripts\activate + + # Install required packages + pip install -r requirements.txt + ``` + +3. **Follow the learning path sequentially** + - Start with Phase 1 and progress through each phase + - Complete exercises before moving to the next section + - Work on projects to apply your knowledge + +4. **Practice regularly** + - Code daily, even if just for 30 minutes + - Review concepts regularly + - Build your own projects alongside the provided ones + +5. **Track your progress** + - Check off completed topics in the learning path + - Keep a learning journal + - Share your progress and projects + +## πŸ“– Resources + +### Books +- "Python for Data Analysis" by Wes McKinney +- "SQL Performance Explained" by Markus Winand +- "Designing Data-Intensive Applications" by Martin Kleppmann + +### Online Platforms +- DataCamp +- Coursera +- LeetCode (for SQL practice) +- HackerRank + +### Documentation +- [Python Official Documentation](https://docs.python.org/3/) +- [PostgreSQL Documentation](https://www.postgresql.org/docs/) +- [Pandas Documentation](https://pandas.pydata.org/docs/) + +## ❓ FAQ + +Have questions? Check out our **[Frequently Asked Questions](FAQ.md)** covering: +- Getting started +- Technical setup +- Learning strategies +- Career advice +- Troubleshooting + +## 🀝 Contributing + +Contributions are welcome! If you'd like to contribute: + +1. Fork the repository +2. Create a new branch (`git checkout -b feature/improvement`) +3. Make your changes +4. Commit your changes (`git commit -am 'Add new feature'`) +5. Push to the branch (`git push origin feature/improvement`) +6. Create a Pull Request + +See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines. + +## πŸ“ License + +This project is open source and available under the MIT License. 
+ +## ⭐ Acknowledgments + +This learning path is designed to help aspiring data engineers build a strong foundation in Python and SQL, the two most essential skills for modern data engineering roles. + +--- + +**Happy Learning! πŸš€** + +For questions or discussions, please open an issue in this repository. diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..dab89ea --- /dev/null +++ b/requirements.txt @@ -0,0 +1,34 @@ +# Python Data Engineering Learning Path Requirements + +# Core data manipulation libraries +pandas>=1.5.0 +numpy>=1.24.0 + +# Database connectivity +sqlalchemy>=2.0.0 +psycopg2-binary>=2.9.0 + +# File format support +openpyxl>=3.1.0 # Excel files +pyarrow>=12.0.0 # Parquet files + +# API and web +requests>=2.31.0 + +# Configuration +python-dotenv>=1.0.0 +pyyaml>=6.0 + +# Testing +pytest>=7.4.0 +pytest-cov>=4.1.0 + +# Data validation +great-expectations>=0.17.0 + +# Utilities +python-dateutil>=2.8.0 + +# Optional: For advanced examples +# apache-airflow>=2.7.0 # Uncomment if learning Airflow +# pyspark>=3.4.0 # Uncomment if learning Spark diff --git a/resources/books.md b/resources/books.md new file mode 100644 index 0000000..2ca34e1 --- /dev/null +++ b/resources/books.md @@ -0,0 +1,134 @@ +# Recommended Books for Data Engineers + +## Python Programming + +### Beginner +1. **"Python Crash Course" by Eric Matthes** + - Perfect for complete beginners + - Hands-on projects + - Clear explanations + +2. **"Automate the Boring Stuff with Python" by Al Sweigart** + - Practical Python applications + - Free to read online + - Great for automation tasks + +### Intermediate +3. **"Python for Data Analysis" by Wes McKinney** + - Written by the creator of Pandas + - Essential for data manipulation + - Real-world examples + +4. **"Fluent Python" by Luciano Ramalho** + - Deep dive into Python + - Best practices + - Advanced features + +## SQL and Databases + +5. **"SQL Performance Explained" by Markus Winand** + - Query optimization + - Index strategies + - Database-agnostic + +6. **"Learning SQL" by Alan Beaulieu** + - Comprehensive introduction + - Practical examples + - Covers MySQL + +7. **"PostgreSQL: Up and Running" by Regina Obe and Leo Hsu** + - PostgreSQL specific + - Quick start guide + - Best practices + +## Data Engineering + +8. **"Designing Data-Intensive Applications" by Martin Kleppmann** + - Must-read for data engineers + - Covers fundamental concepts + - Architecture patterns + +9. **"Fundamentals of Data Engineering" by Joe Reis and Matt Housley** + - Modern data engineering practices + - Lifecycle approach + - Tool-agnostic + +10. **"The Data Warehouse Toolkit" by Ralph Kimball and Margy Ross** + - Dimensional modeling + - Data warehouse design + - Industry standard + +11. **"Data Pipelines Pocket Reference" by James Densmore** + - Quick reference guide + - Pipeline patterns + - Best practices + +## System Design and Architecture + +12. **"Building Microservices" by Sam Newman** + - Microservices architecture + - Relevant for distributed systems + - Practical guidance + +13. **"Site Reliability Engineering" by Google** + - Production systems + - Monitoring and reliability + - Free online + +## Additional Topics + +14. **"The Pragmatic Programmer" by David Thomas and Andrew Hunt** + - Software craftsmanship + - Best practices + - Career development + +15. **"Clean Code" by Robert C. 
Martin** + - Code quality + - Refactoring + - Maintainability + +## Online Resources + +### Free Books +- [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/) +- [SQL Murder Mystery](https://mystery.knightlab.com/) - Interactive SQL learning +- [The Data Engineering Cookbook](https://github.com/andkret/Cookbook) + +### Documentation +- [Python Official Docs](https://docs.python.org/3/) +- [PostgreSQL Docs](https://www.postgresql.org/docs/) +- [Pandas Documentation](https://pandas.pydata.org/docs/) + +## Reading Plan + +### Month 1-2: Python Foundations +- Python Crash Course +- Automate the Boring Stuff + +### Month 3-4: Data Manipulation +- Python for Data Analysis +- Learning SQL + +### Month 5-6: Advanced Topics +- Designing Data-Intensive Applications +- Fundamentals of Data Engineering + +### Ongoing +- SQL Performance Explained (reference) +- The Pragmatic Programmer (reference) + +## Tips for Reading Technical Books + +1. **Code Along**: Type out examples as you read +2. **Take Notes**: Summarize key concepts +3. **Do Exercises**: Complete all practice problems +4. **Apply**: Use concepts in your own projects +5. **Review**: Revisit difficult chapters +6. **Discuss**: Join study groups or forums + +## Where to Buy + +- [O'Reilly Learning Platform](https://learning.oreilly.com/) - Subscription access to many books +- Amazon (Kindle or Physical) +- [Manning Publications](https://www.manning.com/) - Often has sales +- Local Library - Many libraries have O'Reilly access diff --git a/resources/cheatsheet.md b/resources/cheatsheet.md new file mode 100644 index 0000000..cec0da4 --- /dev/null +++ b/resources/cheatsheet.md @@ -0,0 +1,572 @@ +# Data Engineer Cheatsheet + +Quick reference for common Python and SQL operations used in data engineering. + +## Python Basics + +### Variables and Data Types +```python +# Numbers +integer = 42 +floating = 3.14 +complex_num = 3 + 4j + +# Strings +text = "Hello" +multi_line = """Multiple +lines""" + +# Boolean +is_true = True +is_false = False + +# None +empty = None +``` + +### Lists +```python +# Create +my_list = [1, 2, 3, 4, 5] +mixed = [1, "two", 3.0, True] + +# Access +first = my_list[0] +last = my_list[-1] +slice = my_list[1:4] # [2, 3, 4] + +# Modify +my_list.append(6) # Add to end +my_list.insert(0, 0) # Insert at position +my_list.remove(3) # Remove first occurrence +popped = my_list.pop() # Remove and return last + +# Common operations +length = len(my_list) +sorted_list = sorted(my_list) +reversed_list = list(reversed(my_list)) +``` + +### Dictionaries +```python +# Create +person = {"name": "Alice", "age": 30, "city": "NYC"} + +# Access +name = person["name"] +age = person.get("age", 0) # With default + +# Modify +person["email"] = "alice@example.com" +person.update({"phone": "123-456-7890"}) + +# Iterate +for key, value in person.items(): + print(f"{key}: {value}") +``` + +### Control Flow +```python +# If-else +if x > 0: + print("Positive") +elif x < 0: + print("Negative") +else: + print("Zero") + +# For loop +for i in range(5): + print(i) + +for item in my_list: + print(item) + +# While loop +while x > 0: + x -= 1 + +# List comprehension +squares = [x**2 for x in range(10)] +evens = [x for x in range(10) if x % 2 == 0] +``` + +### Functions +```python +# Basic function +def greet(name): + return f"Hello, {name}!" + +# Default arguments +def greet(name="World"): + return f"Hello, {name}!" 
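+
+# Variable-length arguments (*args / **kwargs) -- a common pattern
+# in pipeline helpers; the function name here is illustrative
+def report(*args, **kwargs):
+    print(args)    # positional arguments as a tuple
+    print(kwargs)  # keyword arguments as a dict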
+ +# Multiple return values +def stats(numbers): + return min(numbers), max(numbers), sum(numbers) + +# Lambda function +square = lambda x: x**2 +``` + +## Pandas + +### DataFrame Creation +```python +import pandas as pd + +# From dictionary +df = pd.DataFrame({ + 'A': [1, 2, 3], + 'B': [4, 5, 6] +}) + +# From CSV +df = pd.read_csv('file.csv') + +# From SQL +df = pd.read_sql('SELECT * FROM table', connection) +``` + +### Data Selection +```python +# Columns +df['column_name'] +df[['col1', 'col2']] + +# Rows +df.iloc[0] # By position +df.loc[0] # By label +df.iloc[0:5] # First 5 rows +df.head(10) # First 10 rows +df.tail(10) # Last 10 rows + +# Conditional +df[df['age'] > 30] +df[(df['age'] > 30) & (df['city'] == 'NYC')] +``` + +### Data Manipulation +```python +# Add column +df['new_col'] = df['col1'] + df['col2'] + +# Drop column +df = df.drop('column_name', axis=1) + +# Rename +df = df.rename(columns={'old': 'new'}) + +# Sort +df = df.sort_values('column_name') +df = df.sort_values(['col1', 'col2'], ascending=[True, False]) + +# Group by +grouped = df.groupby('category')['value'].sum() +grouped = df.groupby('category').agg({ + 'value': ['sum', 'mean', 'count'] +}) +``` + +### Data Cleaning +```python +# Handle missing values +df.isnull().sum() # Count nulls +df = df.dropna() # Drop rows with nulls +df = df.fillna(0) # Fill with value +df = df.fillna(df.mean()) # Fill with mean + +# Remove duplicates +df = df.drop_duplicates() +df = df.drop_duplicates(subset=['column']) + +# Data types +df.dtypes # Check types +df['col'] = df['col'].astype(int) # Convert type +``` + +### Merging DataFrames +```python +# Merge (SQL-like joins) +merged = pd.merge(df1, df2, on='key') +merged = pd.merge(df1, df2, on='key', how='left') + +# Concat (append) +combined = pd.concat([df1, df2], axis=0) # Rows +combined = pd.concat([df1, df2], axis=1) # Columns +``` + +## SQL + +### Basic Queries +```sql +-- Select +SELECT column1, column2 FROM table; +SELECT * FROM table; +SELECT DISTINCT city FROM customers; + +-- Where +SELECT * FROM table WHERE age > 25; +SELECT * FROM table WHERE age > 25 AND city = 'NYC'; +SELECT * FROM table WHERE age BETWEEN 20 AND 30; +SELECT * FROM table WHERE city IN ('NYC', 'LA', 'SF'); +SELECT * FROM table WHERE name LIKE 'A%'; + +-- Order By +SELECT * FROM table ORDER BY age DESC; +SELECT * FROM table ORDER BY age, name; + +-- Limit +SELECT * FROM table LIMIT 10; +SELECT * FROM table LIMIT 10 OFFSET 20; +``` + +### Joins +```sql +-- Inner Join +SELECT a.*, b.name +FROM orders a +INNER JOIN customers b ON a.customer_id = b.id; + +-- Left Join +SELECT a.*, b.name +FROM orders a +LEFT JOIN customers b ON a.customer_id = b.id; + +-- Multiple Joins +SELECT o.order_id, c.name, p.product_name +FROM orders o +JOIN customers c ON o.customer_id = c.id +JOIN products p ON o.product_id = p.id; +``` + +### Aggregations +```sql +-- Basic aggregates +SELECT COUNT(*) FROM table; +SELECT SUM(amount) FROM orders; +SELECT AVG(salary) FROM employees; +SELECT MIN(price), MAX(price) FROM products; + +-- Group By +SELECT city, COUNT(*) as count +FROM customers +GROUP BY city; + +SELECT department, AVG(salary) as avg_salary +FROM employees +GROUP BY department +HAVING AVG(salary) > 50000; +``` + +### Subqueries +```sql +-- In WHERE clause +SELECT * FROM employees +WHERE salary > (SELECT AVG(salary) FROM employees); + +-- In FROM clause +SELECT dept, avg_sal +FROM ( + SELECT department as dept, AVG(salary) as avg_sal + FROM employees + GROUP BY department +) subquery +WHERE avg_sal > 60000; +``` + +### Window 
Functions +```sql +-- Running total +SELECT date, amount, + SUM(amount) OVER (ORDER BY date) as running_total +FROM sales; + +-- Ranking +SELECT name, salary, + RANK() OVER (ORDER BY salary DESC) as rank +FROM employees; + +-- Partition +SELECT department, name, salary, + AVG(salary) OVER (PARTITION BY department) as dept_avg +FROM employees; +``` + +### Data Modification +```sql +-- Insert +INSERT INTO table (col1, col2) VALUES (val1, val2); +INSERT INTO table VALUES (val1, val2, val3); + +-- Update +UPDATE table SET col1 = val1 WHERE condition; + +-- Delete +DELETE FROM table WHERE condition; + +-- Create Table +CREATE TABLE users ( + id SERIAL PRIMARY KEY, + name VARCHAR(100) NOT NULL, + email VARCHAR(100) UNIQUE, + created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP +); +``` + +## Database Operations in Python + +### SQLAlchemy +```python +from sqlalchemy import create_engine + +# Connect +engine = create_engine('postgresql://user:pass@localhost/db') + +# Read +df = pd.read_sql('SELECT * FROM table', engine) +df = pd.read_sql_table('table_name', engine) +df = pd.read_sql_query('SELECT * FROM table WHERE x > 5', engine) + +# Write +df.to_sql('table_name', engine, if_exists='replace', index=False) +# if_exists: 'fail', 'replace', 'append' +``` + +### Psycopg2 (PostgreSQL) +```python +import psycopg2 + +# Connect +conn = psycopg2.connect( + host="localhost", + database="mydb", + user="user", + password="password" +) + +# Execute +cursor = conn.cursor() +cursor.execute("SELECT * FROM table") +rows = cursor.fetchall() + +# With parameters +cursor.execute("SELECT * FROM table WHERE id = %s", (id,)) + +# Commit and close +conn.commit() +cursor.close() +conn.close() +``` + +## File Operations + +### CSV +```python +# Read +df = pd.read_csv('file.csv') +df = pd.read_csv('file.csv', sep=';', encoding='utf-8') + +# Write +df.to_csv('output.csv', index=False) +df.to_csv('output.csv', sep='\t', encoding='utf-8') +``` + +### JSON +```python +# Read +df = pd.read_json('file.json') +df = pd.read_json('file.json', orient='records') + +# Write +df.to_json('output.json', orient='records', indent=2) +``` + +### Excel +```python +# Read +df = pd.read_excel('file.xlsx', sheet_name='Sheet1') + +# Write +df.to_excel('output.xlsx', sheet_name='Data', index=False) + +# Multiple sheets +with pd.ExcelWriter('output.xlsx') as writer: + df1.to_excel(writer, sheet_name='Sheet1') + df2.to_excel(writer, sheet_name='Sheet2') +``` + +### Parquet +```python +# Read +df = pd.read_parquet('file.parquet') + +# Write +df.to_parquet('output.parquet', compression='snappy') +``` + +## Date/Time Operations + +### Python datetime +```python +from datetime import datetime, timedelta + +# Current +now = datetime.now() +today = datetime.today() + +# Create +dt = datetime(2024, 1, 15, 10, 30) + +# Parse +dt = datetime.strptime('2024-01-15', '%Y-%m-%d') + +# Format +formatted = dt.strftime('%Y-%m-%d %H:%M:%S') + +# Arithmetic +tomorrow = now + timedelta(days=1) +last_week = now - timedelta(weeks=1) +``` + +### Pandas datetime +```python +# Convert to datetime +df['date'] = pd.to_datetime(df['date_string']) + +# Extract components +df['year'] = df['date'].dt.year +df['month'] = df['date'].dt.month +df['day'] = df['date'].dt.day +df['dayofweek'] = df['date'].dt.dayofweek + +# Date arithmetic +df['next_week'] = df['date'] + pd.Timedelta(weeks=1) + +# Resample (time series) +df.set_index('date').resample('D').mean() # Daily average +df.set_index('date').resample('M').sum() # Monthly sum +``` + +## Common Patterns + +### ETL Pattern +```python 
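+# Setup assumed by this sketch: pandas plus a SQLAlchemy engine
+# (the sqlite URL is a stand-in -- point it at your own database)
+import pandas as pd
+from sqlalchemy import create_engine
+
+engine = create_engine('sqlite:///etl_output.db')
+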
+def extract_data(source):
+    """Extract data from source"""
+    return pd.read_csv(source)
+
+def transform_data(df):
+    """Clean and transform data"""
+    df = df.dropna()
+    df['new_col'] = df['col1'] + df['col2']
+    return df
+
+def load_data(df, target):
+    """Load data to target"""
+    # engine as created in the SQLAlchemy section above
+    df.to_sql(target, engine, if_exists='replace', index=False)
+
+# Run ETL
+df = extract_data('input.csv')
+df = transform_data(df)
+load_data(df, 'output_table')
+```
+
+### Error Handling
+```python
+try:
+    df = pd.read_csv('file.csv')
+except FileNotFoundError:
+    print("File not found")
+except pd.errors.ParserError:
+    print("Error parsing CSV")
+except Exception as e:
+    print(f"Unexpected error: {e}")
+finally:
+    print("Cleanup")
+```
+
+### Logging
+```python
+import logging
+
+logging.basicConfig(
+    level=logging.INFO,
+    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
+)
+
+logger = logging.getLogger(__name__)
+
+logger.info("Processing started")
+logger.warning("Warning message")
+logger.error("Error occurred")
+```
+
+## Git Commands
+
+```bash
+# Clone
+git clone <repository-url>
+
+# Status
+git status
+
+# Add files
+git add file.py
+git add .
+
+# Commit
+git commit -m "Message"
+
+# Push
+git push origin branch_name
+
+# Pull
+git pull origin branch_name
+
+# Branch
+git branch new_branch
+git checkout new_branch
+git checkout -b new_branch  # Create and switch
+
+# Merge
+git merge branch_name
+
+# View history
+git log
+git log --oneline
+```
+
+## Docker Commands
+
+```bash
+# Build
+docker build -t myapp .
+
+# Run
+docker run myapp
+docker run -p 5000:5000 myapp
+docker run -d myapp  # Detached
+
+# List
+docker ps     # Running
+docker ps -a  # All
+
+# Stop/Start
+docker stop container_id
+docker start container_id
+
+# Remove
+docker rm container_id
+docker rmi image_id
+
+# Docker Compose (newer Docker versions also accept `docker compose`)
+docker-compose up
+docker-compose up -d
+docker-compose down
+docker-compose logs
+```
+
+---
+
+This cheatsheet covers the most common operations. Bookmark it for quick reference!
diff --git a/resources/courses.md b/resources/courses.md
new file mode 100644
index 0000000..429e521
--- /dev/null
+++ b/resources/courses.md
@@ -0,0 +1,252 @@
+# Online Courses and Learning Platforms
+
+## Comprehensive Learning Platforms
+
+### DataCamp
+- **Data Engineer Career Track**
+  - Structured learning path
+  - Interactive exercises
+  - Hands-on projects
+  - Certificate upon completion
+
+### Coursera
+- **IBM Data Engineering Professional Certificate**
+  - 13 courses
+  - Real-world projects
+  - Industry-recognized certificate
+
+- **Google Data Analytics Professional Certificate**
+  - Beginner-friendly
+  - SQL and analysis skills
+  - Portfolio projects
+
+### Udacity
+- **Data Engineering Nanodegree**
+  - Project-based learning
+  - Mentor support
+  - Industry partnerships
+  - Advanced topics
+
+## Python Courses
+
+### Free
+1. **Python for Everybody (Coursera)**
+   - Dr. Chuck Severance
+   - Beginner-friendly
+   - University of Michigan
+
+2. **CS50's Introduction to Programming with Python (edX)**
+   - Harvard University
+   - Free certification available
+   - High quality
+
+3. **freeCodeCamp Python Courses (YouTube)**
+   - Multiple full courses
+   - Completely free
+   - Various skill levels
+
+### Paid
+4. **Complete Python Bootcamp (Udemy)**
+   - Jose Portilla
+   - Comprehensive coverage
+   - Practical projects
+
+5. **Python for Data Science and Machine Learning (Udemy)**
+   - Jose Portilla
+   - Focus on data libraries
+   - Hands-on projects
+
+## SQL Courses
+
+### Free
+1. **Khan Academy - Intro to SQL**
+   - Interactive lessons
+   - Beginner-friendly
+   - Query practice
+
+2. **SQLBolt**
+   - Interactive SQL lessons
+   - Progressive difficulty
+   - Great for beginners
+
+3. **Mode SQL Tutorial**
+   - SQL for data analysis
+   - Real datasets
+   - Advanced topics
+
+### Paid
+4. **The Complete SQL Bootcamp (Udemy)**
+   - PostgreSQL focus
+   - Practical examples
+   - Assessment tests
+
+5. **DataCamp SQL Fundamentals Track**
+   - Multiple courses
+   - Interactive exercises
+   - Progressive learning
+
+## Data Engineering Specific
+
+### Free
+1. **AWS Data Engineering Fundamentals (Coursera)**
+   - Cloud data engineering
+   - AWS services
+   - Free to audit
+
+2. **Microsoft Azure Data Engineering (Coursera)**
+   - Azure-specific
+   - Cloud technologies
+   - Free to audit
+
+### Paid
+3. **Data Engineering on Google Cloud Platform (Coursera)**
+   - GCP focus
+   - Professional certificate
+   - Hands-on labs
+
+4. **The Complete Hands-On Introduction to Apache Airflow (Udemy)**
+   - Workflow orchestration
+   - Practical projects
+   - Industry tool
+
+## Practice Platforms
+
+### Coding Practice
+1. **LeetCode**
+   - SQL problems
+   - Algorithm practice
+   - Interview preparation
+
+2. **HackerRank**
+   - Python challenges
+   - SQL challenges
+   - Certification tests
+
+3. **Codewars**
+   - Community challenges
+   - Multiple languages
+   - Progressive difficulty
+
+4. **Exercism**
+   - Mentor-supported
+   - Language tracks
+   - Free practice
+
+### SQL Specific
+5. **SQLZoo**
+   - Interactive SQL tutorials
+   - Progressive difficulty
+   - Multiple SQL dialects
+
+6. **SQL Murder Mystery**
+   - Learn through games
+   - Engaging format
+   - Good for beginners
+
+## YouTube Channels
+
+### Data Engineering
+1. **Data Engineering TV**
+   - Industry insights
+   - Tool reviews
+   - Best practices
+
+2. **Seattle Data Guy**
+   - Career advice
+   - Technical tutorials
+   - Industry trends
+
+### Python
+3. **Corey Schafer**
+   - Clear explanations
+   - Python tutorials
+   - Practical examples
+
+4. **Real Python**
+   - Python tutorials
+   - Best practices
+   - Various skill levels
+
+### SQL
+5. **Socratica - SQL**
+   - Clear, concise lessons
+   - Professional production
+   - Beginner-friendly
+
+## Learning Path Recommendations
+
+### Complete Beginner (0-3 months)
+1. Python for Everybody (Coursera)
+2. Khan Academy SQL
+3. DataCamp Python Fundamentals
+
+### Intermediate (3-6 months)
+1. Python for Data Analysis course
+2. Complete SQL Bootcamp
+3. Practice on LeetCode/HackerRank
+
+### Advanced (6-12 months)
+1. Data Engineering Nanodegree (Udacity)
+2. Cloud platform specialization
+3. Apache Airflow course
+
+## Certification Paths
+
+### Entry Level
+- **DataCamp Data Engineer Track**
+- **HackerRank Python/SQL Certificates**
+
+### Professional
+- **AWS Certified Data Analytics**
+- **Google Professional Data Engineer**
+- **Microsoft Certified: Azure Data Engineer**
+
+### Advanced
+- **Databricks Certified Data Engineer**
+- **Snowflake SnowPro Core Certification**
+
+## Tips for Online Learning
+
+1. **Set a Schedule**: Dedicate specific hours each week
+2. **Take Notes**: Summarize key concepts
+3. **Code Along**: Practice while watching
+4. **Build Projects**: Apply what you learn
+5. **Join Communities**: Connect with other learners
+6. **Review Regularly**: Revisit difficult topics
+7. **Stay Consistent**: Daily practice beats cramming
+
+## Free vs Paid
+
+### When Free is Enough
+- Learning basics
+- Exploring new topics
+- Casual learning
+- Budget constraints
+
+### When to Consider Paid
+- Structured learning path needed
+- Want certification
+- Need mentor support
+- Serious career change
+
+## Community Learning
+
+### Forums and Communities
+- Reddit: r/dataengineering, r/learnpython
+- Stack Overflow
+- Discord servers for data engineering
+- LinkedIn groups
+
+### Study Groups
+- Local meetups
+- Online study groups
+- Code review sessions
+- Peer learning
+
+## Budget-Friendly Options
+
+1. **Library Access**: Many public libraries offer free access to Coursera/LinkedIn Learning
+2. **Financial Aid**: Coursera offers financial aid
+3. **Free Trials**: Try platforms before committing
+4. **Company Benefits**: Check if your employer offers a learning budget
+5. **YouTube**: Vast free content available
diff --git a/resources/tools.md b/resources/tools.md
new file mode 100644
index 0000000..287638c
--- /dev/null
+++ b/resources/tools.md
@@ -0,0 +1,337 @@
+# Essential Tools for Data Engineers
+
+## Development Environment
+
+### Code Editors and IDEs
+
+#### Visual Studio Code (Recommended for Beginners)
+- **Free and open source**
+- **Extensions**: Python, SQL, Git
+- **Features**: Debugging, integrated terminal, Git integration
+- **Download**: [code.visualstudio.com](https://code.visualstudio.com/)
+
+#### PyCharm
+- **Professional**: Paid, full-featured
+- **Community**: Free, perfect for Python
+- **Features**: Advanced debugging, database tools, refactoring
+- **Download**: [jetbrains.com/pycharm](https://www.jetbrains.com/pycharm/)
+
+#### Jupyter Notebook / JupyterLab
+- **Interactive Python environment**
+- **Great for**: Data exploration, documentation
+- **Install**: `pip install jupyter`
+
+### Terminal and Command Line
+
+#### Windows
+- **PowerShell**
+- **Windows Terminal** (modern, recommended)
+- **Git Bash** (Unix-like commands on Windows)
+- **WSL** (Windows Subsystem for Linux)
+
+#### Mac/Linux
+- **Terminal** (built-in)
+- **iTerm2** (Mac, advanced features)
+- **Zsh** with Oh My Zsh (enhanced shell)
+
+## Version Control
+
+### Git
+- **Essential skill** for all developers
+- **Commands**: clone, commit, push, pull, branch, merge
+- **Download**: [git-scm.com](https://git-scm.com/)
+
+### GitHub
+- **Code hosting** and collaboration
+- **Portfolio**: Showcase your projects
+- **Learning**: Explore open source projects
+
+### Alternatives
+- **GitLab**: Similar to GitHub, strong built-in CI/CD
+- **Bitbucket**: Integrates with Atlassian tools
+
+## Databases
+
+### SQLite
+- **Best for**: Learning, small projects
+- **Advantages**: No server, file-based, simple
+- **Built-in** to Python (see the snippet at the end of this section)
+
+### PostgreSQL (Recommended)
+- **Best for**: Production, learning advanced SQL
+- **Features**: Full-featured, reliable, open source
+- **Download**: [postgresql.org](https://www.postgresql.org/)
+
+### MySQL/MariaDB
+- **Popular**: Widely used in web applications
+- **Similar to**: PostgreSQL in many ways
+
+### Database Tools
+
+#### DBeaver (Recommended)
+- **Free and open source**
+- **Supports**: All major databases
+- **Features**: Query builder, ER diagrams, data export
+- **Download**: [dbeaver.io](https://dbeaver.io/)
+
+#### pgAdmin
+- **PostgreSQL specific**
+- **Official tool**
+- **Full-featured**
+
+#### DataGrip (JetBrains)
+- **Paid, professional**
+- **Multi-database support**
+- **Advanced features**
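+Because `sqlite3` ships with Python's standard library, you can try SQL with zero setup. A minimal sketch (the `example.db` file and `users` table are just illustrations):
+
+```python
+import sqlite3
+
+# Creates example.db on first use; no server required
+conn = sqlite3.connect("example.db")
+cur = conn.cursor()
+cur.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
+cur.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))  # parameterized insert
+conn.commit()
+print(cur.execute("SELECT * FROM users").fetchall())
+conn.close()
+```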
+## Python Libraries
+
+### Essential for Data Engineering
+
+```bash
+# Data manipulation
+pip install pandas
+pip install numpy
+
+# Database connectivity
+pip install psycopg2-binary  # PostgreSQL
+pip install pymysql          # MySQL
+pip install sqlalchemy       # ORM and database abstraction
+
+# Data validation
+pip install great-expectations
+pip install pandera
+
+# API interactions
+pip install requests
+pip install httpx
+
+# File formats
+pip install openpyxl  # Excel
+pip install pyarrow   # Parquet
+pip install lxml      # XML
+
+# Configuration
+pip install python-dotenv  # Environment variables
+pip install pyyaml         # YAML files
+
+# Testing
+pip install pytest
+pip install pytest-cov
+
+# Date/Time
+pip install python-dateutil
+```
+
+### Advanced Libraries
+
+```bash
+# Workflow orchestration (Airflow is best installed with a pinned constraints file; see its docs)
+pip install apache-airflow
+
+# Data transformation (testing and docs built in)
+pip install dbt-core
+
+# Cloud SDKs
+pip install boto3                 # AWS
+pip install google-cloud-storage  # GCP
+pip install azure-storage-blob    # Azure
+
+# Big data
+pip install pyspark
+
+# Logging and monitoring
+pip install loguru
+```
+
+## Workflow Orchestration
+
+### Apache Airflow
+- **Industry standard** for data pipelines
+- **Features**: Scheduling, monitoring, retries
+- **Python-based** DAGs (Directed Acyclic Graphs); a minimal DAG sketch appears just before the Package Management section below
+
+### Alternatives
+- **Prefect**: Modern, Pythonic
+- **Dagster**: Data-aware orchestration
+- **Luigi**: Spotify's workflow manager
+
+## Containerization
+
+### Docker
+- **Essential** for modern data engineering
+- **Benefits**: Consistent environments, easy deployment
+- **Download**: [docker.com](https://www.docker.com/)
+
+### Docker Compose
+- **Multi-container** applications
+- **Great for**: Local development
+- **Included** with Docker Desktop
+
+## Cloud Platforms
+
+### AWS (Amazon Web Services)
+- **Services**: S3, RDS, Redshift, Glue, Lambda
+- **Most popular** cloud platform
+- **Free tier** available
+
+### Google Cloud Platform (GCP)
+- **Services**: BigQuery, Cloud Storage, Dataflow
+- **Strong** data and ML offerings
+- **Free tier** available
+
+### Microsoft Azure
+- **Services**: Azure SQL, Data Factory, Synapse
+- **Enterprise focused**
+- **Free tier** available
+
+## Data Quality and Testing
+
+### Great Expectations
+- **Data validation** framework
+- **Documentation** generation
+- **Integration** with pipelines
+
+### Pytest
+- **Testing framework**
+- **Essential** for production code
+- **Simple** and powerful
+
+### dbt (data build tool)
+- **SQL-based** transformations
+- **Testing** built-in
+- **Documentation** generation
+
+## Monitoring and Logging
+
+### Logging
+- **Python logging** module (built-in)
+- **Loguru**: Modern logging library
+- **Structured logging**: JSON logs
+
+### Monitoring Tools
+- **Prometheus**: Metrics collection
+- **Grafana**: Visualization
+- **ELK Stack**: Elasticsearch, Logstash, Kibana
+
+## Documentation
+
+### Markdown
+- **Standard** for documentation
+- **Easy to learn**
+- **Supported everywhere**
+
+### Sphinx
+- **Python documentation** generator
+- **Used by** Python itself
+- **Professional** output
+
+### Draw.io / Diagrams.net
+- **Free diagramming** tool
+- **Architecture diagrams**
+- **Data flow diagrams**
+
+## Productivity Tools
+
+### Task Management
+- **Notion**: All-in-one workspace
+- **Trello**: Kanban boards
+- **Todoist**: Simple todo lists
+
+### Note Taking
+- **Obsidian**: Markdown-based
+- **Notion**: Rich features
+- **Jupyter notebooks**: Code + notes
+
+### Communication
+- **Slack**: Team communication
+- **Discord**: Communities
+- **Stack Overflow**: Q&A
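+As referenced in the Workflow Orchestration section above, here is a minimal Airflow DAG sketch. It assumes Airflow 2.x (2.4+ for the `schedule` argument); the `example_etl` DAG and its tasks are placeholders, not a real pipeline:
+
+```python
+# minimal_dag.py -- place in Airflow's dags/ folder (assumes Airflow 2.x)
+from datetime import datetime
+
+from airflow import DAG
+from airflow.operators.python import PythonOperator
+
+def extract():
+    print("extracting...")
+
+def transform():
+    print("transforming...")
+
+with DAG(
+    dag_id="example_etl",
+    start_date=datetime(2024, 1, 1),
+    schedule="@daily",  # schedule_interval in Airflow < 2.4
+    catchup=False,
+) as dag:
+    extract_task = PythonOperator(task_id="extract", python_callable=extract)
+    transform_task = PythonOperator(task_id="transform", python_callable=transform)
+    extract_task >> transform_task  # extract runs before transform
+```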
+## Package Management
+
+### pip
+- **Default** Python package manager
+- **Essential** for installing libraries
+
+### conda
+- **Environment** and package manager
+- **Good for**: Data science
+- **Includes**: Non-Python dependencies
+
+### Poetry
+- **Modern** dependency management
+- **Better** dependency resolution
+- **Recommended** for projects
+
+## Recommended Setup for Beginners
+
+1. **Install**: Python 3.8+
+2. **Install**: VS Code
+3. **Install**: Git
+4. **Install**: PostgreSQL
+5. **Install**: Docker (when ready)
+6. **Create**: GitHub account
+7. **Install**: DBeaver
+8. **Learn**: Basic terminal commands
+
+## Recommended Setup for Advanced Users
+
+1. All beginner tools
+2. **Add**: PyCharm Professional
+3. **Add**: Docker and Docker Compose
+4. **Add**: Apache Airflow (in Docker)
+5. **Add**: Cloud platform CLI (AWS/GCP/Azure)
+6. **Add**: Monitoring tools
+7. **Add**: CI/CD tools (GitHub Actions)
+
+## Tool Selection Tips
+
+1. **Start Simple**: Don't overwhelm yourself
+2. **Master Basics**: Before moving to advanced tools
+3. **Open Source First**: Try free tools before paid
+4. **Community**: Choose tools with active communities
+5. **Job Market**: Consider what employers use
+6. **Personal Preference**: Use what works for you
+
+## Learning Resources
+
+### Practice Environments
+- **Google Colab**: Free Jupyter notebooks
+- **Kaggle**: Datasets and notebooks
+- **Replit** (formerly Repl.it): Online IDE
+
+### Sandboxes
+- **DB Fiddle**: Online SQL practice
+- **SQLite Online**: Browser-based SQLite
+- **PythonAnywhere**: Host Python apps for free
+
+## Cost Considerations
+
+### Free Forever
+- Python
+- VS Code
+- Git
+- PostgreSQL
+- SQLite
+- DBeaver
+- Most Python libraries
+
+### Free Tier (Limited)
+- AWS, GCP, Azure
+- GitHub (unlimited public repos)
+- Docker Hub
+
+### Worth Paying For
+- PyCharm Professional (student discounts available)
+- Cloud resources (for production)
+- Courses and books
+- Monitoring services (for production)
+
+## Next Steps
+
+1. **Install core tools**: Python, editor, Git, database
+2. **Set up environment**: Virtual environments, Git config (see the sketch below)
+3. **Practice**: Build small projects
+4. **Explore**: Try new tools as you learn
+5. **Share**: Show your work on GitHub
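+A minimal setup sketch for step 2 (assumes Python 3 and Git are already installed; the package names are only examples):
+
+```bash
+# Create and activate an isolated environment for a project
+python -m venv .venv             # or python3 on some systems
+source .venv/bin/activate        # Windows: .venv\Scripts\activate
+pip install --upgrade pip
+pip install pandas sqlalchemy
+
+# One-time Git identity configuration
+git config --global user.name "Your Name"
+git config --global user.email "you@example.com"
+```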