# Assignment 2: N-grams and Language Identification
## CNG463 - Introduction to Natural Language Processing
### METU NCC Computer Engineering
### Fall 2025-26

---

## Submission Information

**Due Date:** 15/11/2025 23:59

**Submission:** Submit through ODTUClass:
1. PDF export of this notebook
2. This .ipynb file (backup)

**File Naming:** `FirstnameLastname_StudentID_as2.pdf` and `FirstnameLastname_StudentID_as2.ipynb`

**Critical:**
- Email submissions will be deleted immediately and receive 0 points
- Late submissions are not permitted

---

## Student Information

**Full Name:** [YOUR FULL NAME HERE]

**Student ID:** [YOUR STUDENT ID HERE]

**Submission Date:** [DATE]

**Languages Used:** English + [YOUR SECOND LANGUAGE]

---

## Overview

This assignment provides hands-on experience with:
1. **N-gram language models** (2-grams and 3-grams)
2. **Language identification** using statistical methods
3. **Text generation** with n-gram models
4. **Evaluation metrics** (accuracy, precision, recall, F1-score)
5. **Cross-validation** and **statistical significance testing**

### Your Corpus

You will use **your own written reports** from other courses as the corpus:
- If you have reports in two languages → use them directly
- If you only have English reports → translate them to a second language using AI tools
- **Minimum corpus size:** ~5,000 words per language (combine multiple reports if needed)

### Grading Summary

**Total: 100 points**
- Task 1 (Corpus Statistics): 15 pts (10 baseline + 5 creativity)
- Task 2 (N-gram Language ID): 30 pts (20 baseline + 10 creativity)
- Task 3 (Evaluation & Comparison): 20 pts (15 baseline + 5 creativity)
- Task 4 (Text Generation): 20 pts (15 baseline + 5 creativity)
- Task 5 (Reflection): 15 pts (10 baseline + 5 creativity)

**Note:** Completing only baseline requirements yields 70/100. The remaining 30 points reward creativity, deeper analysis, and original insights.

---

## Instructions for Using This Notebook

### Getting Started
1. **Make your own copy:** `File → Save a copy in Drive`
2. **Rename it:** Include your name and student ID
3. **Work in order:** Complete tasks sequentially
4. **Save frequently:** Colab auto-saves, but be safe!

### Where to Write Code
- Look for cells marked with `# TODO: Your code here`
- You can add more code cells as needed
- Keep code clean and well-commented

### Where to Write Analysis
- Look for markdown sections: **[YOUR ANALYSIS HERE]**
- Double-click to edit markdown cells
- Write clear, concise observations

### Before Submitting
1. **Fill in** your name and student ID at the top
2. **Run all cells:** `Runtime → Run all`
3. **Check outputs:** Ensure all cells executed without errors
4. **Export to PDF:** `File → Print → Save as PDF`
5. **Download .ipynb:** `File → Download → Download .ipynb`
6. **Submit both files** to ODTUClass

---

## Setup and Imports

Run this cell first to import all necessary libraries.

In [None]:
# Standard libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from collections import defaultdict, Counter
import re
import random
from typing import List, Tuple, Dict

# Statistical testing
from scipy import stats

# Scikit-learn for evaluation
from sklearn.model_selection import KFold
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# For HuggingFace model
!pip install -q transformers
from transformers import pipeline

# Set random seed for reproducibility
random.seed(42)
np.random.seed(42)

# Plotting style
sns.set_style('whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

print("✓ All libraries imported successfully!")

---

# Task 1: Corpus Preparation and Statistics (15 points)

**Baseline (10 pts):** Upload and analyze your corpus in two languages. Calculate basic statistics including word count, vocabulary size, sentence count, special characters, and other relevant metrics.

**Creativity Bonus (5 pts):** Provide additional interesting statistical analyses (e.g., word length distribution, most frequent words, character n-gram analysis, type-token ratio, etc.) with visualization and insights.

---

## 1.1: Upload Your Corpus Files

Upload your text files (reports from other courses) in two languages.

In [None]:
from google.colab import files

# Upload English corpus
print("📁 Upload your ENGLISH corpus file(s):")
print("(You can select multiple files if needed)\n")
uploaded_en = files.upload()

print("\n" + "="*50 + "\n")

# Upload second language corpus
print("📁 Upload your SECOND LANGUAGE corpus file(s):")
print("(You can select multiple files if needed)\n")
uploaded_lang2 = files.upload()

print("\n✓ Files uploaded successfully!")

## 1.2: Load and Preprocess Corpora

Load your corpus files and perform basic preprocessing:
- Read all files and combine them
- Decide on preprocessing steps (lowercasing, punctuation, etc.)
- Tokenize into sentences and words

**Note:** Document your preprocessing decisions - they affect your results!
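
The steps above can be sketched with the standard library alone. This is a minimal illustration, not the required approach: the helper names `split_sentences` and `tokenize` and the sample string are made up for this example, and the regex-based splitting is only one of several reasonable choices.

```python
import re
from typing import List

def split_sentences(text: str) -> List[str]:
    """Split on runs of '.', '!' or '?' and drop empty pieces."""
    parts = re.split(r"[.!?]+", text)
    return [p.strip() for p in parts if p.strip()]

def tokenize(sentence: str, lowercase: bool = True) -> List[str]:
    """Return word tokens, optionally lowercased."""
    if lowercase:
        sentence = sentence.lower()
    # For str patterns, \w also matches non-ASCII letters such as ç or ü
    return re.findall(r"\w+", sentence)

sample = "N-grams are simple. Are they useful? Çok güzel!"
sentences = split_sentences(sample)
tokens = [tokenize(s) for s in sentences]
print(sentences)
print(tokens)
```

Two side effects are worth documenting if you use something like this: the pattern splits hyphenated words ("N-grams" becomes two tokens), and Python's `str.lower()` is locale-independent, so `"I".lower()` gives `"i"` rather than the Turkish `"ı"`.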

In [None]:
# TODO: Your code here
# 
# Suggested steps:
# 1. Read uploaded files
# 2. Combine multiple files if needed
# 3. Basic cleaning (optional: remove extra whitespace, etc.)
# 4. Sentence tokenization
# 5. Word tokenization
#
# Example structure:
# english_text = ""  # Combined text from all English files
# lang2_text = ""     # Combined text from all second language files
# 
# english_sentences = [...]  # List of sentences
# lang2_sentences = [...]
#
# english_words = [...]  # List of all words
# lang2_words = [...]

# Hint: You might want to use simple splits or regex for tokenization
# Example: sentences = text.split('.') or re.split(r'[.!?]+', text)  (drop empty strings afterwards)
# Example: words = sentence.lower().split() or re.findall(r'\b\w+\b', sentence)

pass

### Preprocessing Decisions

**[YOUR ANALYSIS HERE]**

Document the preprocessing choices you made:
- Did you lowercase the text? Why or why not?
- How did you handle punctuation?
- What method did you use for sentence tokenization?
- What method did you use for word tokenization?
- Any other preprocessing steps?

## 1.3: Basic Statistical Analysis

Calculate and display key statistics for both corpora.

In [None]:
# TODO: Your code here
#
# Required statistics (baseline):
# 1. Total word count
# 2. Vocabulary size (unique words)
# 3. Number of sentences
# 4. Average sentence length (in words)
# 5. Special characters count (e.g., Turkish: ç, ğ, ı, ö, ş, ü)
# 6. Any other relevant statistics
#
# Hint: Use Counter, set(), len() functions
# Hint: Create a nice comparison table or visualization

pass

## 1.4: Additional Statistical Analysis (Creativity Bonus)

Perform additional interesting analyses. Examples:
- Word length distribution
- Most frequent words (top 20)
- Type-token ratio (vocabulary richness)
- Character n-gram frequencies
- Sentence length distribution
- Zipf's law visualization
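
As a starting point for two of the items above, here is a small sketch of type-token ratio and character n-gram counting. The function names and toy inputs are hypothetical; swap in your own word lists and text.

```python
from collections import Counter

def type_token_ratio(words):
    """Unique words divided by total words (0.0 for an empty list)."""
    return len(set(words)) / len(words) if words else 0.0

def char_ngrams(text, n):
    """Count overlapping character n-grams in a string."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

toy_words = "the cat sat on the mat".split()
ttr = type_token_ratio(toy_words)          # 5 unique words out of 6 tokens
top_bigrams = char_ngrams("banana", 2).most_common(2)
print(f"TTR: {ttr:.3f}")
print(top_bigrams)
```

Type-token ratio tends to fall as a corpus grows, so if your two corpora differ in size, consider comparing it on equally sized samples.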

In [None]:
# TODO: Your creative analysis here
# 
# Suggestions:
# - Create visualizations (histograms, bar charts, word clouds)
# - Compare the two languages
# - Look for interesting patterns

pass

### Statistical Analysis: Observations and Insights

**[YOUR ANALYSIS HERE]**

Discuss your findings:
- What are the key differences between the two corpora?
- Are there significant differences in vocabulary size relative to corpus size?
- What interesting patterns did you observe?
- How might these statistics affect n-gram modeling?

---

# Task 2: N-gram Language Identification (30 points)

Build 2-gram and 3-gram language models using **10-fold cross-validation** and evaluate language identification performance.

**Baseline (20 pts):** 
- Implement 2-gram and 3-gram models with Laplace smoothing
- Use 10-fold cross-validation
- Calculate accuracy for both models across all folds
- Report mean and standard deviation of accuracy

**Creativity Bonus (10 pts):**
- Error analysis: which sentences are misclassified and why?
- Compare 2-gram vs 3-gram performance
- Test with different smoothing parameters
- Analyze confidence scores
- Any other creative exploration

---

## 2.1: N-gram Model Implementation

Implement n-gram language models with Laplace (add-1) smoothing.

**Key Concepts:**
- **N-gram:** Sequence of n words (2-gram: "the cat", 3-gram: "the black cat")
- **Language Model:** Probability distribution over word sequences
- **Laplace Smoothing:** Add 1 to every n-gram count so that unseen n-grams receive a small non-zero probability
- **Formula:** P(word|context) = (count(context, word) + 1) / (count(context) + V)
  - V = vocabulary size
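
To make the formula concrete, the sketch below applies it to a two-sentence toy corpus with bigrams. It is a worked example, not the class you are asked to implement. One assumption to note: V here counts every token that can be predicted (the words plus `</s>`, but not `<s>`); other conventions exist, so state yours in your analysis.

```python
from collections import Counter

sentences = [["the", "cat"], ["the", "dog"]]

bigram_counts = Counter()
context_counts = Counter()
vocab = set()
for sent in sentences:
    padded = ["<s>"] + sent + ["</s>"]
    vocab.update(padded[1:])  # assumption: predictable tokens = words plus </s>
    for prev, word in zip(padded, padded[1:]):
        bigram_counts[(prev, word)] += 1
        context_counts[prev] += 1

V = len(vocab)  # the, cat, dog, </s>

def laplace(prev, word):
    """P(word | prev) with add-1 smoothing."""
    return (bigram_counts[(prev, word)] + 1) / (context_counts[prev] + V)

p_seen = laplace("the", "cat")      # (1 + 1) / (2 + 4)
p_unseen = laplace("the", "the")    # (0 + 1) / (2 + 4)
total = sum(laplace("the", w) for w in vocab)
print(p_seen, p_unseen, total)
```

With this choice of V, the smoothed probabilities for a given context sum to 1 over the vocabulary, which is a quick sanity check you can reuse on your own model.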

In [None]:
class NgramLanguageModel:
    """
    N-gram language model with Laplace smoothing.
    
    This model learns the probability distribution of word sequences
    and can be used for language identification.
    """
    
    def __init__(self, n: int = 2):
        """
        Initialize the n-gram model.
        
        Args:
            n: The n in n-gram (2 for bigram, 3 for trigram)
        """
        self.n = n
        self.ngram_counts = defaultdict(int)  # Count of each n-gram
        self.context_counts = defaultdict(int)  # Count of each context (n-1 words)
        self.vocabulary = set()  # All unique words seen
        
    def train(self, sentences: List[List[str]]):
        """
        Train the model on a list of tokenized sentences.
        
        Args:
            sentences: List of sentences, where each sentence is a list of words
        """
        # TODO: Implement training
        # 
        # Steps:
        # 1. Add start/end tokens to sentences (e.g., <s>, </s>)
        # 2. Extract all n-grams from sentences
        # 3. Count n-grams and contexts
        # 4. Build vocabulary
        #
        # Hint: Use tuple for n-grams, e.g., ('the', 'cat') for bigram
        # Hint: Context is the first (n-1) words of the n-gram
        
        pass
    
    def get_probability(self, ngram: Tuple[str, ...]) -> float:
        """
        Calculate the probability of an n-gram using Laplace smoothing.
        
        Args:
            ngram: Tuple of n words
            
        Returns:
            Smoothed probability of the n-gram
        """
        # TODO: Implement Laplace smoothing
        #
        # Formula: P(word|context) = (count(ngram) + 1) / (count(context) + V)
        # where V is the vocabulary size
        #
        # Steps:
        # 1. Extract context (first n-1 words)
        # 2. Get counts
        # 3. Apply smoothing formula
        
        pass
    
    def get_sentence_log_probability(self, sentence: List[str]) -> float:
        """
        Calculate log probability of a sentence.
        
        Args:
            sentence: List of words
            
        Returns:
            Log probability of the sentence
        """
        # TODO: Implement sentence scoring
        #
        # Steps:
        # 1. Add start/end tokens
        # 2. Extract all n-grams
        # 3. Sum log probabilities with np.log (summing logs avoids the underflow
        #    that multiplying many small probabilities would cause)
        #
        # Hint: log(P1 * P2 * P3) = log(P1) + log(P2) + log(P3)
        
        pass

print("✓ NgramLanguageModel class defined")

## 2.2: Language Identification Function

Create a function that uses two language models to identify the language of a sentence.
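
The decision rule compares sentence scores in log space. The sketch below first shows why: multiplying many small probabilities underflows to zero, while summing their logs does not. It then applies the comparison to two made-up scores. The numbers and variable names are illustrative assumptions, not output from a real model.

```python
import math

probs = [1e-3] * 400           # 400 n-gram probabilities of 0.001 each

product = 1.0
for p in probs:
    product *= p               # too small for a float, ends up as 0.0

log_sum = sum(math.log(p) for p in probs)   # stays finite

# Hypothetical log probabilities of one sentence under two models
log_p_english = -42.7
log_p_lang2 = -58.1
predicted = "English" if log_p_english > log_p_lang2 else "Language2"
margin = abs(log_p_english - log_p_lang2)   # one possible confidence score
print(product, round(log_sum, 1), predicted, round(margin, 1))
```

Because longer sentences accumulate more negative log probability, a raw margin grows with sentence length; dividing by the number of n-grams is one option if you want confidence scores that are comparable across sentences.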

In [None]:
def identify_language(sentence: List[str], 
                     model_lang1: NgramLanguageModel, 
                     model_lang2: NgramLanguageModel,
                     lang1_name: str = "English",
                     lang2_name: str = "Language2") -> Tuple[str, float]:
    """
    Identify which language a sentence belongs to.
    
    Args:
        sentence: List of words to classify
        model_lang1: Trained n-gram model for language 1
        model_lang2: Trained n-gram model for language 2
        lang1_name: Name of language 1
        lang2_name: Name of language 2
        
    Returns:
        Tuple of (predicted_language, confidence_score)
    """
    # TODO: Implement language identification
    #
    # Steps:
    # 1. Calculate log probability under each model
    # 2. Choose language with higher log probability
    # 3. Calculate confidence (optional: difference in log probs)
    
    pass

print("✓ identify_language function defined")

## 2.3: Prepare Data for Cross-Validation

Organize your data for 10-fold cross-validation.

In [None]:
# TODO: Prepare your data
#
# Create a dataset where:
# - X = list of all sentences (from both languages)
# - y = list of labels (0 for English, 1 for second language)
#
# Example structure:
# X = english_sentences + lang2_sentences
# y = [0] * len(english_sentences) + [1] * len(lang2_sentences)
#
# Shuffle the data (optional: KFold below already uses shuffle=True;
# if you shuffle here, shuffle X and y together so labels stay aligned)

# X = []  # All sentences
# y = []  # Corresponding labels

pass

## 2.4: 10-Fold Cross-Validation

Implement 10-fold cross-validation for both 2-gram and 3-gram models.

**What is Cross-Validation?**
- Split data into 10 roughly equal parts (folds)
- Train on 9 folds, test on 1 fold
- Repeat 10 times (each fold serves as test set once)
- Average results across all folds
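
The index bookkeeping behind those four steps can be written by hand in a few lines. This is only an illustration of the idea: `kfold_indices` is a hypothetical helper, not how `sklearn.model_selection.KFold` is implemented, and in the notebook you would normally use `KFold` itself.

```python
import random

def kfold_indices(n_samples, n_splits=10, seed=42):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_splits-th index starting at i,
    # so fold sizes differ by at most one
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx

splits = list(kfold_indices(25, n_splits=10))
sizes = [len(test) for _, test in splits]
print(sizes)
```

Within each fold, remember that the training indices contain sentences from both languages: separate them by label before training, so that each language gets its own model.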

In [None]:
# TODO: Implement 10-fold cross-validation
#
# Steps:
# 1. Create KFold object with 10 splits
# 2. For each fold:
#    a. Split data into train and test
#    b. Train 2-gram models on both languages
#    c. Train 3-gram models on both languages
#    d. Test both models on test set
#    e. Calculate accuracy for both
# 3. Store results for each fold
#
# Hint: Use sklearn.model_selection.KFold

# Initialize cross-validation
kfold = KFold(n_splits=10, shuffle=True, random_state=42)

# Store results
results_2gram = []  # Accuracy for each fold (2-gram)
results_3gram = []  # Accuracy for each fold (3-gram)

# TODO: Implement the cross-validation loop

pass

## 2.5: Results Summary

Display cross-validation results.

In [None]:
# TODO: Calculate and display results
#
# For both 2-gram and 3-gram:
# - Mean accuracy across folds
# - Standard deviation
# - Create a visualization (e.g., bar plot comparing models)
# - Create a table showing fold-by-fold results

pass

## 2.6: Error Analysis (Creativity Bonus)

Analyze misclassified sentences to understand model limitations.

In [None]:
# TODO: Error analysis (for creativity bonus)
#
# Suggestions:
# - Collect all misclassified sentences from one fold
# - Analyze their characteristics (length, vocabulary, etc.)
# - Show examples of misclassified sentences
# - Discuss why they might be difficult

pass

### Language Identification: Analysis and Observations

**[YOUR ANALYSIS HERE]**

Discuss your findings:
- How did 2-gram and 3-gram models compare?
- What accuracy did you achieve?
- Were the results consistent across folds?
- What types of sentences were most difficult to classify?
- Did you notice any patterns in the errors?
- How did model performance relate to your corpus statistics from Task 1?

---

# Task 3: Evaluation and Comparison (20 points)

Calculate comprehensive evaluation metrics and compare with a HuggingFace model.

**Baseline (15 pts):**
- Calculate precision, recall, and F1-score for your models
- Compare your models with a HuggingFace language identification model
- Perform a paired t-test for statistical significance
- Report and interpret results

**Creativity Bonus (5 pts):**
- Confusion matrices
- Per-class metrics analysis
- Discussion of when each model performs better
- Additional statistical analyses

---

## 3.1: Detailed Evaluation Metrics

Calculate precision, recall, and F1-score for your n-gram models.

**Metrics Explained:**
- **Accuracy:** Overall correctness (TP + TN) / Total
- **Precision:** Of predicted positives, how many are correct? TP / (TP + FP)
- **Recall:** Of actual positives, how many did we find? TP / (TP + FN)
- **F1-Score:** Harmonic mean of precision and recall: 2 * (P * R) / (P + R)
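
Here is a worked example of the four formulas on a made-up set of eight predictions, treating label 1 (the second language) as the positive class. The label lists are invented for illustration.

```python
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1, 0, 0, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(t == 1 and p == 1 for t, p in pairs)
tn = sum(t == 0 and p == 0 for t, p in pairs)
fp = sum(t == 0 and p == 1 for t, p in pairs)
fn = sum(t == 1 and p == 0 for t, p in pairs)

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(tp, tn, fp, fn)
print(f"acc={accuracy:.3f} P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Precision and recall depend on which class you call positive. With `precision_recall_fscore_support(y_true, y_pred, average='binary')`, scikit-learn uses label 1 as the positive class by default, which should give the same numbers as this hand calculation.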

In [None]:
# TODO: Re-run cross-validation and collect predictions
#
# This time, store predictions for each fold to calculate all metrics
#
# For each fold, store:
# - y_true: actual labels
# - y_pred_2gram: predictions from 2-gram model
# - y_pred_3gram: predictions from 3-gram model

# Initialize storage for all folds
all_metrics_2gram = []
all_metrics_3gram = []

# TODO: Run cross-validation and collect metrics

pass

In [None]:
# TODO: Calculate and display comprehensive metrics
#
# For both models, calculate:
# - Mean and std of accuracy, precision, recall, F1
# - Create a comparison table
# - Visualize metrics

pass

## 3.2: HuggingFace Model Comparison

Compare your models with a pre-trained language identification model.

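**Recommended Model:** `papluca/xlm-roberta-base-language-detection` (a fine-tuned XLM-RoBERTa base model covering 20 languages, including English and Turkish)

**Note:** This model is trained on a fixed set of languages and is not specifically optimized for your language pair. Check its model card to confirm that your second language is supported.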

In [None]:
# Load HuggingFace model
print("Loading HuggingFace language identification model...")
print("This may take a minute on first run.\n")

hf_model = pipeline("text-classification", 
                   model="papluca/xlm-roberta-base-language-detection",
                   device=-1)  # -1 runs on CPU; set to 0 to use the first GPU

print("✓ Model loaded successfully!")

In [None]:
# TODO: Evaluate HuggingFace model on the same folds
#
# Steps:
# 1. For each fold in your cross-validation:
#    a. Get predictions from HuggingFace model on test set
#    b. Calculate accuracy (and other metrics)
# 2. Store results for comparison
#
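# Hint: hf_model(sentence_string) returns a list with one dict holding 'label' and 'score'
# Hint: If your sentences are token lists, join them first, e.g. " ".join(sentence)
# Hint: Map HuggingFace language codes to your labels
#       e.g., 'en' -> 0, 'tr' -> 1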

results_hf = []  # Accuracy for each fold (HuggingFace)

# TODO: Implement HuggingFace evaluation

pass

## 3.3: Statistical Significance Testing

Use a paired t-test to determine whether differences between models are statistically significant.

**Why Paired T-Test?**
- Same data (same folds) for all models
- Tests if mean difference is significantly different from zero
- Null hypothesis (H0): No difference between models
- Alternative hypothesis (H1): There is a difference

**Interpretation:**
- p-value < 0.05: Reject H0, difference is statistically significant
- p-value ≥ 0.05: Fail to reject H0, difference could be due to chance
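
To see what the test measures, the sketch below computes the paired t statistic by hand from per-fold differences. The accuracy lists are invented for illustration. The p-value is not computed here, since it needs the t distribution with n - 1 degrees of freedom; `stats.ttest_rel` returns both the statistic and the p-value.

```python
import math
import statistics

# Hypothetical per-fold accuracies for two models on the same 10 folds
acc_a = [0.90, 0.92, 0.88, 0.91, 0.93, 0.89, 0.90, 0.92, 0.91, 0.90]
acc_b = [0.93, 0.94, 0.90, 0.92, 0.95, 0.91, 0.93, 0.93, 0.94, 0.92]

diffs = [a - b for a, b in zip(acc_a, acc_b)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)              # sample std (n - 1 denominator)
t_stat = mean_diff / (sd_diff / math.sqrt(n))  # mean difference over its standard error
dof = n - 1
print(f"mean diff={mean_diff:.4f}, t={t_stat:.3f}, dof={dof}")
```

If two models score identically on every fold, or differ by exactly the same amount on every fold, the standard deviation of the differences is zero and the statistic is undefined. Check for that case before interpreting the result.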

In [None]:
# TODO: Perform paired t-tests
#
# Compare:
# 1. 2-gram vs 3-gram
# 2. 2-gram vs HuggingFace
# 3. 3-gram vs HuggingFace
#
# Use: stats.ttest_rel(results1, results2)

# Example:
# t_stat, p_value = stats.ttest_rel(results_2gram, results_3gram)
# print(f"2-gram vs 3-gram: t={t_stat:.4f}, p={p_value:.4f}")

pass

## 3.4: Comprehensive Comparison Visualization

In [None]:
# TODO: Create visualizations comparing all models
#
# Suggestions:
# - Box plots showing distribution of accuracies across folds
# - Bar chart comparing mean accuracies with error bars
# - Line plot showing fold-by-fold performance
# - Table with all metrics (mean ± std)

pass

## 3.5: Additional Analysis (Creativity Bonus)

Deeper analysis of model comparison.

In [None]:
# TODO: Additional analysis (for creativity bonus)
#
# Ideas:
# - Confusion matrices for each model
# - Per-language precision/recall comparison
# - Analysis of sentences where models disagree
# - Correlation analysis between models
# - Performance vs sentence length

pass

### Model Comparison: Analysis and Observations

**[YOUR ANALYSIS HERE]**

Discuss your findings:
- How did your n-gram models compare to HuggingFace?
- Were the differences statistically significant? What does this mean?
- Which model performed best? Why do you think so?
- What are the advantages and disadvantages of each approach?
- In what scenarios might your simple n-gram model be preferable to a complex neural model?
- How did the evaluation metrics (precision, recall, F1) provide additional insights beyond accuracy?

---

# Task 4: Text Generation with N-grams (20 points)

Use your English n-gram models to generate sample sentences.

**Baseline (15 pts):**
- Generate 10 sentences using your 2-gram English model
- Generate 10 sentences using your 3-gram English model
- Compare quality and coherence
- Discuss observations

**Creativity Bonus (5 pts):**
- Experiment with different generation strategies (e.g., temperature, top-k sampling)
- Compare sentences with different starting words
- Analyze common patterns or errors
- Generate sentences in your second language and compare

---

## 4.1: Text Generation Implementation

Implement sentence generation using n-gram models.

**How N-gram Generation Works:**
1. Start with initial context (e.g., `<s>` token or a seed word)
2. Look at all n-grams starting with current context
3. Sample next word based on n-gram probabilities
4. Update context (shift by one word)
5. Repeat until end token or max length
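
The five steps can be sketched for bigrams on a toy corpus as follows. This is a simplified illustration and not the `generate_sentence` function you need to write: it samples from raw (unsmoothed) counts of observed continuations only, and `next_words` and `sample_sentence` are hypothetical names.

```python
import random
from collections import Counter, defaultdict

sentences = [["the", "cat", "sat"], ["the", "dog", "sat"], ["a", "cat", "ran"]]

# next_words[prev] counts which words followed prev in the training data
next_words = defaultdict(Counter)
for sent in sentences:
    padded = ["<s>"] + sent + ["</s>"]
    for prev, word in zip(padded, padded[1:]):
        next_words[prev][word] += 1

def sample_sentence(rng, max_length=20):
    """Sample words one at a time until </s> or max_length is reached."""
    context, output = "<s>", []
    while len(output) < max_length:
        candidates = next_words[context]
        words = list(candidates)
        weights = list(candidates.values())
        word = rng.choices(words, weights=weights, k=1)[0]
        if word == "</s>":
            break
        output.append(word)
        context = word  # shift the context window by one word
    return " ".join(output)

rng = random.Random(42)
generated = [sample_sentence(rng) for _ in range(5)]
print(generated)
```

`random.choices` accepts relative weights, so raw counts work directly. `np.random.choice` instead expects probabilities in `p` that sum to 1, so normalize first if you use it.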

In [None]:
def generate_sentence(model: NgramLanguageModel, 
                     max_length: int = 20,
                     start_word: str = None) -> str:
    """
    Generate a sentence using an n-gram model.
    
    Args:
        model: Trained n-gram language model
        max_length: Maximum number of words to generate
        start_word: Optional starting word (if None, use <s>)
        
    Returns:
        Generated sentence as string
    """
    # TODO: Implement text generation
    #
    # Steps:
    # 1. Initialize context (start tokens or seed word)
    # 2. Loop until </s> or max_length:
    #    a. Find all possible next words given current context
    #    b. Get probability for each possible next word
    #    c. Sample next word (use np.random.choice with p=...; normalize the
    #       probabilities first, because p must sum to 1)
    #    d. Add word to sentence
    #    e. Update context (shift window)
    # 3. Return generated sentence
    #
    # Hint: You may need to add methods to NgramLanguageModel class
    #       to get possible next words and their probabilities
    
    pass

print("✓ generate_sentence function defined")

## 4.2: Train Models for Generation

Train 2-gram and 3-gram models on your complete English corpus.

In [None]:
# TODO: Train models on full English corpus (not cross-validation)
#
# Train:
# - english_model_2gram: 2-gram model
# - english_model_3gram: 3-gram model

pass

## 4.3: Generate Sample Sentences

Generate 10 sentences with each model.

In [None]:
# Generate with 2-gram model
print("=" * 60)
print("2-GRAM GENERATED SENTENCES")
print("=" * 60)

# TODO: Generate 10 sentences
# for i in range(10):
#     sentence = generate_sentence(english_model_2gram)
#     print(f"{i+1}. {sentence}")

pass

In [None]:
# Generate with 3-gram model
print("=" * 60)
print("3-GRAM GENERATED SENTENCES")
print("=" * 60)

# TODO: Generate 10 sentences
# for i in range(10):
#     sentence = generate_sentence(english_model_3gram)
#     print(f"{i+1}. {sentence}")

pass

## 4.4: Creative Generation Experiments (Creativity Bonus)

In [None]:
# TODO: Creative experiments (for creativity bonus)
#
# Ideas:
# - Generate with different seed words
# - Try different sampling strategies (greedy, random, top-k)
# - Generate longer/shorter sentences
# - Compare generation quality metrics
# - Generate in your second language and compare

pass

### Text Generation: Analysis and Observations

**[YOUR ANALYSIS HERE]**

Discuss your findings:
- How did 2-gram and 3-gram generated sentences compare?
- Which model produced more coherent sentences? Why?
- What patterns did you notice in the generated text?
- What were common errors or awkward constructions?
- How does the quality of generated text relate to your corpus size and characteristics?
- What are the fundamental limitations of n-gram generation?
- How might modern neural language models (like GPT) improve upon these results?

---

# Task 5: Reflection and Insights (15 points)

Provide a comprehensive reflection on your experiments.

**Baseline (10 pts):**
- Key findings and surprises
- Challenges encountered
- Lessons learned about n-grams and language modeling
- Limitations of the approaches used

**Creativity Bonus (5 pts):**
- Connections to linguistic theory or NLP concepts
- Practical applications and implications
- Suggestions for improvements
- Insights about the relationship between model complexity and performance

---

## Overall Reflection

**[YOUR COMPREHENSIVE REFLECTION HERE]**

Address the following points:

### Key Findings
- What were your main discoveries from this assignment?
- What surprised you most?
- How did your results compare to your initial expectations?

### Technical Challenges
- What were the most difficult aspects of implementation?
- What debugging or problem-solving strategies did you use?
- How did you handle edge cases or unexpected behaviors?

### Model Performance
- How did different n-gram sizes affect performance?
- Why do you think certain models performed better than others?
- What role did corpus size and quality play?

### Limitations
- What are the fundamental limitations of n-gram models?
- What kinds of linguistic phenomena can't n-grams capture?
- When would you choose (or not choose) n-grams for a real application?

### Practical Applications
- Where could these techniques be applied in real-world NLP systems?
- How do n-gram models compare to modern neural approaches?
- What are the trade-offs (accuracy, speed, resources, interpretability)?

### Learning Outcomes
- What did you learn about NLP and language modeling?
- How has this assignment changed your understanding of how language models work?
- What connections did you make to concepts from lectures or readings?

### Future Directions
- If you had more time, what would you explore further?
- What improvements could you make to your implementation?
- What other NLP techniques would you like to combine with n-grams?

---

## Pre-Submission Checklist ✓

Before submitting, verify that you have:

### Content Completeness
- [ ] Filled in name and student ID at the top
- [ ] Completed all baseline requirements for Tasks 1-5
- [ ] Written analysis and observations in all required sections
- [ ] Included at least some creativity bonus elements

### Technical Requirements  
- [ ] All code cells execute without errors
- [ ] Ran all cells: `Runtime → Run all`
- [ ] All outputs (tables, plots, text) are visible
- [ ] Code is well-commented and readable

### Submission Files
- [ ] Exported notebook to PDF: `File → Print → Save as PDF`
- [ ] Downloaded .ipynb file: `File → Download → Download .ipynb`
- [ ] Checked PDF rendering (all content visible)
- [ ] Verified file names: `FirstnameLastname_StudentID_as2.pdf` and `.ipynb`
- [ ] Ready to upload both files to ODTUClass

### Quality Check
- [ ] Results make sense and are properly interpreted
- [ ] Writing is clear and free of major errors
- [ ] Visualizations are labeled and readable
- [ ] Citations included if you used external resources

---

## Final Notes

**Academic Integrity:**
- This is individual work - do not copy from others
- You may discuss concepts with classmates but write your own code and analysis
- Cite any external resources you consulted (tutorials, Stack Overflow, etc.)
- Using AI assistants is allowed for debugging (and for translating your corpus, as described under "Your Corpus") but not for generating complete solutions

**Getting Help:**
- Review lecture materials and readings
- Attend office hours
- Ask questions on the course forum
- Start early to allow time for troubleshooting

**Good luck! 🎓**