# Week 5: Organizing Projects with GitHub + AI Helpers

This notebook demonstrates best practices for organizing data science projects on GitHub and leveraging AI helpers like GitHub Copilot and ChatGPT to improve your development workflow.

## 1. Project Organization & Git Basics

A well-organized project structure makes collaboration easier and your code more maintainable.

In [None]:
# Example project structure
project_structure = """
my-data-science-project/
├── README.md                 # Project overview and setup instructions
├── .gitignore                # Files to exclude from version control
├── requirements.txt          # Python dependencies
├── notebooks/                # Jupyter notebooks for exploration
│   ├── 01_data_exploration.ipynb
│   ├── 02_analysis.ipynb
│   └── 03_visualization.ipynb
├── data/                      # Data files
│   ├── raw/                  # Original, unmodified data
│   └── processed/            # Cleaned, transformed data
├── src/                       # Source code modules
│   ├── __init__.py
│   ├── data_loader.py
│   ├── preprocessing.py
│   └── visualization.py
└── tests/                     # Unit tests
    └── test_preprocessing.py
"""

print(project_structure)
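
The layout above can also be created from Python rather than by hand. The following is a minimal sketch, not a standard tool: the `scaffold_project` function and the `FOLDERS`/`FILES` lists are hypothetical names introduced here, and the entries simply mirror the tree printed above.

```python
from pathlib import Path

# Folders and placeholder files copied from the example layout (assumed names).
FOLDERS = ["notebooks", "data/raw", "data/processed", "src", "tests"]
FILES = ["README.md", ".gitignore", "requirements.txt", "src/__init__.py"]


def scaffold_project(root):
    """Create the folder layout under `root` without altering existing file contents."""
    root = Path(root)
    for folder in FOLDERS:
        # parents=True builds nested paths such as data/raw in one call
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file, or leaves the contents alone if it exists
        (root / name).touch(exist_ok=True)
    return root
```

Calling `scaffold_project("my-data-science-project")` in a notebook cell would create the skeleton in the current working directory; running it a second time is harmless because every step tolerates existing paths.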

## 2. Writing Effective Commit Messages

Clear commit messages help you and your team understand what changed and why.

In [None]:
# Good vs Bad commit messages

good_commits = [
    "Add temperature trend analysis for Germany",
    "Fix bug in data filtering logic for missing values",
    "Refactor data loading to improve performance",
    "Update documentation for API integration",
    "Add unit tests for preprocessing functions"
]

bad_commits = [
    "update",
    "fixes",
    "asdf",
    "work in progress",
    "changes"
]

print("GOOD commit messages:")
for msg in good_commits:
    print(f"  ✓ {msg}")

print("\nBAD commit messages:")
for msg in bad_commits:
    print(f"  ✗ {msg}")
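
One way to make the difference concrete is a rough automated check. The sketch below is a hypothetical heuristic, not a Git or GitHub rule: `looks_descriptive`, `VAGUE_SUBJECTS`, and the three-word threshold are assumptions chosen for illustration, and the vague phrases come from the bad examples above.

```python
# Phrases taken from the "bad" examples; a real project would tune this list.
VAGUE_SUBJECTS = {"update", "fixes", "asdf", "work in progress", "changes"}


def looks_descriptive(message, min_words=3):
    """Return True if the first line of `message` passes a rough quality check."""
    lines = message.strip().splitlines()
    if not lines:
        return False  # empty message
    subject = lines[0].strip()
    if subject.lower() in VAGUE_SUBJECTS:
        return False  # known filler phrase
    # Assumed threshold: a subject with fewer words rarely says what changed
    return len(subject.split()) >= min_words
```

A check like this only catches the most obvious filler. It cannot tell whether a message explains why a change was made, so it complements review rather than replacing it.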

## 3. Creating a Comprehensive README

Your README is the first thing people see. Make it count!

In [None]:
readme_template = """
# Project Title: Temperature Analysis & Climate Insights

## Overview
This project analyzes global temperature data to identify trends and seasonal patterns across different countries.

## Setup Instructions

1. **Clone the repository**
   ```bash
   git clone https://github.com/username/temperature-analysis.git
   cd temperature-analysis
   ```

2. **Install dependencies**
   ```bash
   pip install -r requirements.txt
   ```

3. **Run the analysis**
   ```bash
   jupyter notebook notebooks/01_data_exploration.ipynb
   ```

## Data

- **Source**: Average Monthly Surface Temperature Dataset
- **Format**: CSV with columns: Entity, Code, Day, Monthly average
- **Time Period**: 1940-2023
- **Countries**: 200+ countries and regions

## Key Findings

- Seasonal temperature variations are more pronounced in Germany than in Egypt
- Global temperatures show a gradual warming trend over time
- Coastal regions have different seasonal patterns than inland areas

## Files

- `notebooks/`: Jupyter notebooks for exploration and analysis
- `data/`: Raw and processed data files
- `src/`: Reusable functions and modules

## Author
Your Name

## License
MIT
"""

print(readme_template)

## 4. Using GitHub Copilot & AI Helpers

AI helpers can accelerate your development by:
- Auto-completing code patterns
- Generating function stubs
- Writing tests and documentation
- Debugging and refactoring code

In [None]:
# Example: Function that AI could help complete

def load_temperature_data(filepath):
    """
    Load temperature data from CSV file and prepare for analysis.
    
    Parameters:
    -----------
    filepath : str
        Path to the CSV file
    
    Returns:
    --------
    pd.DataFrame
        Cleaned dataframe with columns: Entity, Code, Day, Monthly average, Year
    """
    import pandas as pd
    
    # AI-assisted: Load and parse the CSV
    df = pd.read_csv(filepath)
    df['Day'] = pd.to_datetime(df['Day'])
    df['Year'] = df['Day'].dt.year
    
    # AI-assisted: Convert temperature to numeric; unparseable values become NaN
    df['Monthly average'] = pd.to_numeric(df['Monthly average'], errors='coerce')
    
    # AI-assisted: Remove rows with missing values in key columns
    # (done after the conversion so that coerced NaN values are dropped too)
    df = df.dropna(subset=['Entity', 'Monthly average'])
    
    return df

print("Example function that could be AI-assisted:")
print(load_temperature_data.__doc__)
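
Before trusting an AI-assisted loader on the real dataset, it helps to run the same steps on a few rows whose expected result is known. The sketch below uses a compact stand-in, `load_sample`, which is a hypothetical name. The sample rows and their temperature values are made up for illustration and only follow the documented column layout.

```python
import io

import pandas as pd

# Made-up rows in the documented column layout; the second row has no reading.
SAMPLE_CSV = """Entity,Code,Day,Monthly average
Germany,DEU,1950-01-15,-2.5
Germany,DEU,1950-07-15,
Egypt,EGY,1950-01-15,13.1
"""


def load_sample(buffer):
    """Compact stand-in for the loader: parse dates, coerce to numeric, drop missing."""
    df = pd.read_csv(buffer)
    df["Day"] = pd.to_datetime(df["Day"])
    df["Year"] = df["Day"].dt.year
    df["Monthly average"] = pd.to_numeric(df["Monthly average"], errors="coerce")
    return df.dropna(subset=["Entity", "Monthly average"])


df = load_sample(io.StringIO(SAMPLE_CSV))

# The row without a reading should be gone, and every row needs a year.
assert len(df) == 2
assert df["Monthly average"].notna().all()
assert df["Year"].tolist() == [1950, 1950]
```

`pd.read_csv` accepts file-like objects, so an in-memory `io.StringIO` buffer is enough to exercise the logic without touching the `data/` folder.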

## 5. Best Practices for AI-Assisted Coding

When using GitHub Copilot or ChatGPT:

In [None]:
best_practices = {
    "DO": [
        "✓ Review all AI-generated code before using it",
        "✓ Test the code thoroughly",
        "✓ Understand what the code does",
        "✓ Modify and adapt as needed for your use case",
        "✓ Use AI to learn and understand new concepts",
        "✓ Ask AI to explain code you don't understand"
    ],
    "DON'T": [
        "✗ Blindly accept AI suggestions without review",
        "✗ Use AI code without testing it",
        "✗ Copy-paste code without understanding it",
        "✗ Trust AI for security-critical code",
        "✗ Rely solely on AI for important decisions",
        "✗ Forget to document your development process"
    ]
}

for category, items in best_practices.items():
    print(f"\n{category}:")
    for item in items:
        print(f"  {item}")
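
"Test the code thoroughly" can be as simple as writing a few assertions by hand before relying on a suggested helper. The example below is a sketch under stated assumptions: `seasonal_range` is a hypothetical function of the kind an AI helper might propose, and the input values are arbitrary numbers picked to make the expected result easy to verify.

```python
def seasonal_range(monthly_averages):
    """Return the spread between the warmest and coldest monthly average."""
    # Skip missing readings represented as None
    values = [v for v in monthly_averages if v is not None]
    if not values:
        raise ValueError("need at least one temperature value")
    return max(values) - min(values)


# Checks written by hand: a normal case, a case with a gap, and an empty input.
assert seasonal_range([1.0, 9.5, 18.0]) == 17.0
assert seasonal_range([20.0, None, 20.0]) == 0.0
try:
    seasonal_range([])
except ValueError:
    pass
else:
    raise AssertionError("empty input should be rejected")
```

Writing the edge cases yourself, rather than asking the AI helper for them, is what confirms you understand what the code is supposed to do.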

## 6. Common Git Workflows

Essential Git commands for collaboration and version control.

In [None]:
git_commands = """
# Initialize a repository
git init

# Add files and commit
git add .
git commit -m "Initial commit: Project setup"

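# Connect the local repository to GitHub (replace the URL with your own)
git branch -M main
git remote add origin https://github.com/username/temperature-analysis.git

# Push to GitHub (-u sets the upstream branch for later pushes and pulls)
git push -u origin main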

# Create a new branch for a feature
git branch feature/temperature-analysis
git checkout feature/temperature-analysis

# Make changes, then commit
git add src/analysis.py
git commit -m "Add temperature trend analysis function"

# Push your branch
git push origin feature/temperature-analysis

# Switch back to main
git checkout main

# Pull latest changes
git pull origin main

# Merge your feature branch
git merge feature/temperature-analysis

# View commit history
git log --oneline
"""

print(git_commands)
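
The output of `git log --oneline` is easy to inspect from Python, for example to count commits or review subject lines. The sketch below is an illustration only: `parse_oneline_log` is a hypothetical helper, the short hashes in `sample_log` are invented, and it assumes the plain `<hash> <subject>` form without branch decorations.

```python
def parse_oneline_log(text):
    """Split `git log --oneline` output into (short_hash, subject) pairs."""
    commits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # The hash is everything up to the first space; the rest is the subject
        short_hash, _, subject = line.partition(" ")
        commits.append((short_hash, subject))
    return commits


# Invented hashes; the subjects reuse commit messages from the commands above.
sample_log = """\
a1b2c3d Add temperature trend analysis function
9f8e7d6 Initial commit: Project setup
"""

commits = parse_oneline_log(sample_log)
print(len(commits), "commits; latest:", commits[0][1])
```

In a real repository the text could come from running the command yourself and pasting the result, or from capturing it with the standard library's `subprocess` module.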

## 7. Assignment: Create Your Project on GitHub

**Task**: Set up a GitHub repository for your data science project with proper organization and documentation.

**Requirements**:
1. Create a GitHub repository with a clear name
2. Set up a proper folder structure (notebooks/, data/, src/)
3. Write a comprehensive README.md
4. Create a .gitignore file
5. Make at least 5 meaningful commits with good messages
6. Use an AI helper to improve your code quality or documentation
7. Share your repository link and document how AI helpers assisted you

**Tips**:
- Commit frequently with clear messages
- Push your code to GitHub regularly
- Ask AI helpers to review your code or write docstrings
- Document what AI helped you with in your README
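
For the `.gitignore` requirement, a starter file can be generated from a notebook cell. This is a minimal sketch, not a definitive list: `write_gitignore` and `GITIGNORE_LINES` are hypothetical names, and the patterns are assumptions that suit a typical Python and Jupyter project. Whether to ignore anything under `data/` depends on the size and sensitivity of your data, so that choice is left out here.

```python
from pathlib import Path

# Assumed ignore patterns for a Python/Jupyter project; adjust to your needs.
GITIGNORE_LINES = [
    "__pycache__/",          # Python bytecode caches
    "*.pyc",
    ".ipynb_checkpoints/",   # Jupyter autosave folders
    ".venv/",                # local virtual environment
    ".env",                  # local secrets and settings
]


def write_gitignore(project_root="."):
    """Write a starter .gitignore unless one already exists."""
    path = Path(project_root) / ".gitignore"
    if path.exists():
        return path  # never overwrite an existing file
    path.write_text("\n".join(GITIGNORE_LINES) + "\n", encoding="utf-8")
    return path
```

Calling `write_gitignore()` from the project root creates the file once; later calls leave your manual edits in place.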