# Student Report Email Automation Pipeline

This notebook provides a comprehensive system for:
1. **Scraping Think Academy student reports** from HTML pages
2. **Generating personalized AI feedback emails** using OpenAI or Perplexity
3. **Translating emails to Chinese** using Google Translate
4. **Sending emails via Outlook** with proper CC to weekly reports

## ‚ö†Ô∏è Requirements
- **OpenAI or Perplexity API key** (REQUIRED - no generic templates)
- Student data CSV with specific columns
- Optional: Email password for sending (otherwise prints to console)

## üìã CSV Format Required
Your spreadsheet must include these columns:
- `Email`: Parent's email address
- `Gender`: M/F/Male/Female (or blank for they/them)
- `Parent Name`: How to address parent (or blank for "Parent")
- `First Name`: Student's first name
- `Last Name`: Student's last name
- `engagement`: 1-4 rating (4=excited, 3=engaged, 2=distracted, 1=very distracted)
- `camera`: 1-4 rating (4=always on, 3=mostly on, 2=mostly off, 1=always off)
- `homework_score`: Previous homework score (e.g., "85/100")
- `report_link`: Think Academy report URL
- `absent`: True/False
- `absent_notified`: True/False
- `extra_feedback`: Additional feedback to include
- `extra_section`: Special section for top of email

## Step 1: Install Required Packages

First, we need to install all the necessary Python packages for web scraping, AI generation, translation, and email sending.

In [1]:
pip install selenium


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Install required packages
import subprocess
import sys

def install_package(package):
    try:
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"‚úÖ Successfully installed {package}")
    except subprocess.CalledProcessError as e:
        print(f"‚ùå Failed to install {package}: {e}")

# List of required packages
packages = [
    "pandas",
    "requests",
    "beautifulsoup4",
    "openai",
    "selenium"
]

print("Installing required packages...")
for package in packages:
    install_package(package)

print("\nüéâ Package installation complete!")

Installing required packages...



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


‚úÖ Successfully installed pandas



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


‚úÖ Successfully installed requests



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


‚úÖ Successfully installed beautifulsoup4
Collecting openai
  Downloading openai-2.6.1-py3-none-any.whl.metadata (29 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Using cached distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from openai)
  Downloading httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting jiter<1,>=0.10.0 (from openai)
  Downloading jiter-0.11.1-cp311-cp311-macosx_10_12_x86_64.whl.metadata (5.2 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->openai)
  Downloading httpcore-1.0.9-py3-none-any.whl.metadata (21 kB)
Collecting h11>=0.16 (from httpcore==1.*->httpx<1,>=0.23.0->openai)
  Downloading h11-0.16.0-py3-none-any.whl.metadata (8.3 kB)
Downloading openai-2.6.1-py3-none-any.whl (1.0 MB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m1.0/1.0 MB[0m [31m11.3 MB/s[0m eta [36m0:00:00[0m
[?25hUsing cached distro-1.9.0-py3-none-any.whl (2


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


‚úÖ Successfully installed openai
‚úÖ Successfully installed selenium

üéâ Package installation complete!



[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.3[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


## Step 2: Import Libraries and Check Dependencies

Import all necessary libraries and verify they're working correctly.

In [1]:
# Import all required libraries
try:
    import pandas as pd
    import numpy as np
    import requests
    from bs4 import BeautifulSoup
    import re
    import smtplib
    from email.mime.text import MIMEText
    from email.mime.multipart import MIMEMultipart
    import os
    import json
    from typing import Dict, List, Tuple, Optional
    import time
    from datetime import datetime
    import openai
    import getpass
    import requests
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    #from googletrans import Translator
    
    print("‚úÖ All libraries imported successfully!")
    print(f"üìä Pandas version: {pd.__version__}")
    print(f"üåê Requests version: {requests.__version__}")
    print(f"ü§ñ OpenAI library imported")
    #print(f"üî§ Google Translate library imported")
    
except ImportError as e:
    print(f"‚ùå Import error: {e}")
    print("Please run the previous cell to install missing packages.")

‚úÖ All libraries imported successfully!
üìä Pandas version: 2.0.3
üåê Requests version: 2.31.0
ü§ñ OpenAI library imported


## Step 3: Create Student Report Analyzer Class

This class handles:
- Web scraping of Think Academy report pages
- Extracting question performance data
- Analyzing attendance patterns
- Setting up AI API connections

In [2]:
class StudentReportAnalyzer:
    def __init__(self, api_key: str, api_type: str = "openai"):
        """
        Initialize the analyzer with API key
        api_type: 'perplexity' or 'openai'
        """
        self.api_key = api_key
        self.api_type = api_type
        #self.translator = Translator()
        
        # Set up API client
        if api_type == "openai":
            import openai
            self.client = openai.OpenAI(api_key=api_key)
        elif api_type == "perplexity":
            # For Perplexity API - using requests directly
            self.base_url = "https://api.perplexity.ai/chat/completions"
        
        print(f"‚úÖ StudentReportAnalyzer initialized with {api_type.upper()} API")
    
    def scrape_report_data(self, report_url: str) -> Dict:
        """
        Scrape student report data from Think Academy report link
        """
        options = Options()
        options.headless = True
        driver = webdriver.Chrome(options=options)
        driver.get(report_url)

        try:
            # Wait up to 10 seconds for the answer list section to load
            time.sleep(5)
            page_source = driver.page_source
        except Exception as e:
            print(f"Timeout or error waiting for page load: {e}")
            page_source = driver.page_source
        finally:
            driver.quit()
        try:
            soup = BeautifulSoup(page_source, 'html.parser')
            #print(page_source)
            # Extract question performance
            #print(f"Page Source: {page_source}")
            questions_data = self._extract_question_performance(soup)
            
            # Extract attendance data
            attendance_data = self._extract_attendance_data(soup)
            if not (questions_data and attendance_data):
                raise Exception(f"Report scraped, but no data found for {report_url}.")
            return {
                'questions': questions_data,
                'attendance': attendance_data,
                'report_link': report_url,
                'scrape_success': True
            }
        
        except Exception as e:
            print(f"‚ö†Ô∏è Error scraping {report_url}: {str(e)}")
            return {
                'questions': {},
                'attendance': {},
                'report_link': report_url,
                'scrape_success': False,
                'error': str(e)
        }
    
    def _extract_question_performance(self, soup: BeautifulSoup) -> Dict:
        """Extract question performance data"""
        questions = {}
        
       # Find the answer list section
        answer_section = soup.find('section', {'class': 'answer-list'})

        if not answer_section:
            return {}

        questions = {}

        question_divs = answer_section.find_all('div', class_='answer')
        for div in question_divs:
            # Extract question number (text content)
            question_num = div.get_text().strip()
            
            # Extract answer status from class list by checking known statuses
            class_list = div.get('class', [])
            # Possible statuses
            valid_statuses = ['correct', 'partly-correct', 'incorrect', 'no-answer']
            status = None
            for cls in class_list:
                if cls in valid_statuses:
                    status = cls
                    break
            
            if question_num and status:
                questions[question_num] = status
        #print(f"Answer section: {answer_section}")
        #print(f"Question Divs: {question_divs}")
        #print(f"Questions: {questions}")
        return questions
    
    def _extract_attendance_data(self, soup: BeautifulSoup) -> Dict:
        """Extract attendance data"""
        attendance = {
            'total_segments': 0,
            'online_segments': 0,
            'offline_segments': 0,
            'attendance_percentage': 0,
            'late_arrival': False,
            'early_departure': False,
            'intermittent_disconnections': 0
        }
        
        # Find attendance progress
        progress_ul = soup.find('ul', {'id': 'progress-inner', 'class': 'progress-inner'})
        if not progress_ul:
            return attendance
        
        # Find all li elements
        segments = progress_ul.find_all('li')
        attendance['total_segments'] = len(segments)
        
        online_count = 0
        offline_count = 0
        disconnection_count = 0
        
        prev_online = None
        
        for i, segment in enumerate(segments):
            class_list = segment.get('class', [])
            is_online = 'online' in class_list
            
            if is_online:
                online_count += 1
                if prev_online is False:  # Reconnection
                    disconnection_count += 1
            else:
                offline_count += 1
            
            prev_online = is_online
        
        attendance['online_segments'] = online_count
        attendance['offline_segments'] = offline_count
        attendance['attendance_percentage'] = (online_count / len(segments)) * 100 if segments else 0
        
        # Determine late arrival (first few segments offline)
        if segments and not ('online' in segments[0].get('class', [])):
            attendance['late_arrival'] = True
        
        # Determine early departure (last few segments offline)
        if segments and not ('online' in segments[-1].get('class', [])):
            attendance['early_departure'] = True
        
        # Count disconnections
        attendance['intermittent_disconnections'] = disconnection_count
        
        return attendance

    def calculate_time_missed(self, attendance_data: Dict, class_duration_minutes: int = 90) -> int:
        """Calculate approximate minutes missed based on attendance percentage"""
        if attendance_data['total_segments'] == 0:
            return 0
        
        missed_percentage = (attendance_data['offline_segments'] / attendance_data['total_segments'])
        minutes_missed = int(missed_percentage * class_duration_minutes)
        return minutes_missed

# Test the class creation
print("‚úÖ StudentReportAnalyzer class created successfully!")
print("üîß Ready to initialize with API key")

‚úÖ StudentReportAnalyzer class created successfully!
üîß Ready to initialize with API key


## Step 4: Create Email Generation Class

This class handles:
- Creating detailed AI prompts from student data
- Generating personalized emails using OpenAI or Perplexity
- Translating content to Chinese
- Handling pronouns based on gender

In [5]:
class EmailGenerator:
    def __init__(self, analyzer: StudentReportAnalyzer):
        self.analyzer = analyzer
        print("‚úÖ EmailGenerator initialized")
    
    def generate_email_content(self, student_data: Dict, report_data: Dict, class_stats: Dict = None) -> str:
        """
        Generate personalized email content using AI
        """
        # Prepare context for AI
        context = self._prepare_context(student_data, report_data, class_stats)
        
        # Create prompt for AI
        prompt = self._create_email_prompt(context)
        
        # Generate email using appropriate API
        if self.analyzer.api_type == "openai":
            email_content = self._generate_with_openai(prompt)
        elif self.analyzer.api_type == "perplexity":
            email_content = self._generate_with_perplexity(prompt)
        else:
            raise ValueError("Unsupported API type")
        
        return email_content
    
    def _prepare_context(self, student_data: Dict, report_data: Dict, class_stats: Dict = None) -> Dict:
        """Prepare structured context for AI prompt"""
        
        # Handle pronouns based on gender
        gender = student_data.get('Gender', '').lower().strip()
        if gender in ['m', 'male', 'boy']:
            pronouns = {'they': 'he', 'them': 'him', 'their': 'his', 'They': 'He', 'Them': 'Him', 'Their': 'His'}
        elif gender in ['f', 'female', 'girl']:
            pronouns = {'they': 'she', 'them': 'her', 'their': 'her', 'They': 'She', 'Them': 'Her', 'Their': 'Her'}
        else:
            pronouns = {'they': 'they', 'them': 'them', 'their': 'their', 'They': 'They', 'Them': 'Them', 'Their': 'Their'}
        
        context = {
            'student_name': student_data.get('First Name', 'Student'),
            'parent_name': student_data.get('Parent Name', 'Parent'),
            'pronouns': pronouns,
            'homework_score': student_data.get('homework_score', ''),
            'stage_test': student_data.get('stage_test', ''),
            'engagement': student_data.get('engagement', ''),
            'camera': student_data.get('camera', ''),
            'absent': student_data.get('absent', ''),
            'absent_notified': student_data.get('absent_notified', ''),
            'extra_feedback': student_data.get('extra_feedback', ''),
            'extra_section': student_data.get('extra_section', ''),
            'notes': student_data.get('notes',''),
            'report_link': student_data.get('report_link',''),
            'report_data': report_data,
            'class_stats': class_stats or {}
        }
        for key, value in context.items():
            if pd.isna(value):
                context[key] = ''
        return context
    
    def _create_email_prompt(self, context: Dict) -> str:
        """Create detailed prompt for AI email generation"""
        
        prompt = f"""You are a math teacher named Teacher Omar, writing a personalized weekly feedback email to a parent about their child's performance in class. Write a warm, professional, and specific email.

STUDENT INFORMATION:
- Student Name: {context['student_name']}
- Parent Name: {context['parent_name']}
- Pronouns: {context['pronouns']}

LESSON INFORMATION:
- This week's subject matter: Multiplying fractions by whole numbers, multiplying fractions by fractions, and cross-simplification.

PERFORMANCE DATA:
"""
        absent=False
        absent_notified=False
        questions = context['report_data'].get('questions', {})
        attendance = context['report_data'].get('attendance', {})
        #print(f"Attendance: {attendance.get('online_segments',0)}")
        if (context['report_data'].get('scrape_success') and attendance.get('online_segments',0)==0) or context['absent'] == 'yes':
            absent=True
            if context['absent_notified'] == 'yes':
                absent_notified=True
        if (questions and attendance) or absent:
            if (not absent):
                #print("Attended")
                # Add attendance and question performance if available
                prompt += 'IN-CLASS QUESTION DATA (NOT HOMEWORK):\n'
                prompt += f"- Questions Answered: {len(questions)} total\n"
                correct = sum(1 for status in questions.values() if status == 'correct')
                partly = sum(1 for status in questions.values() if status == 'partly-correct')
                incorrect = sum(1 for status in questions.values() if status == 'incorrect')
                no_answer = sum(1 for status in questions.values() if status == 'no-answer')
                    
                prompt += f"  - Correct: {correct}, Partially Correct: {partly}, Incorrect: {incorrect}, No Answer: {no_answer}\n"
                
                # Add specific question numbers for struggling areas
                incorrect_questions = [q for q, status in questions.items() if status in ['incorrect']]
                partly_correct_questions = [q for q, status in questions.items() if status in ['partly-correct']]
                unanswered_questions = [q for q, status in questions.items() if status in ['no-answer']]
                #print(incorrect_questions)
                if incorrect_questions:
                    prompt += f"  - Incorrect questions: Question #{', #'.join(incorrect_questions)}\n"
                if partly_correct_questions:
                    prompt += f"  - Partially correct questions: Question #{', #'.join(partly_correct_questions)}\n"
                if unanswered_questions:
                    prompt += f"  - Unanswered questions: Question #{', #'.join(unanswered_questions)}\n"
                
                if int(attendance['attendance_percentage']) != 100:
                    minutes = int(np.round(attendance['attendance_percentage']*0.9))
                    prompt += f"- Attendance: Student attended {minutes} minutes out of the 90-minute class time, missing {90-minutes} minutes.\n"
                    if attendance['late_arrival']:
                        prompt += "  - Arrived late to class. Acknowledge that sometimes things can come up, and there may have been a reason for this. Gently encourage parent to help the student attend class on time. If they did not answer some of the earlier questions, this is likely why.\n"
                    if attendance['early_departure']:
                        prompt += "  - Left class early. Acknowledge that sometimes things can come up, and there may have been a reason for this. Gently encourage parent to ensure the student remains in class until the end time. If they did not answer some of the later questions, this is likely why.\n"
                    if attendance['intermittent_disconnections'] > 0:
                        prompt += f"  - Had {attendance['intermittent_disconnections']} disconnection(s) during class. One of these may have been due to the break. Encourage the parent to ensure continuous attendance and troubleshoot technical issues. If needed, they can reach out to Think customer service for technical support.\n"
                    
                # Add engagement information
                #print(context['engagement'])
                if context['engagement']:
                    engagement_text = {
                        '4': 'excited and enthusiastic',
                        '3': 'engaged and attentive', 
                        '2': 'somewhat distracted - encourage to stay focused',
                        '1': 'very distracted - encourage to stay focused, maybe sit with student during class'
                    }.get(str(int(context['engagement'])), '')
                    #print(engagement_text)
                    if engagement_text:
                        prompt += f"- Class Engagement: {engagement_text}\n"
                
                # Add camera information
                if context['camera']:
                    camera_text = {
                        '4': 'camera consistently on',
                        '3': 'camera mostly on - encourage to turn on consistently',
                        '2': 'camera mostly off - encourage to keep on', 
                        '1': 'camera always off - encourage to keep on, please troubleshoot if due to technical issues'
                    }.get(str(int(context['camera'])), '')
                    if camera_text:
                        prompt += f"- Camera Usage: {camera_text}\n"
                #Add report link
                if context['report_link']:
                    prompt += f"- Link to in-class report: {context['report_link']}\n"
            else:
                prompt += "- Student was ABSENT from this class\n"
                if absent_notified:
                    prompt += "- Parent NOTIFIED me of student absence. Thank parent for notifying me.\n"
                else:
                    prompt += "- Parent DID NOT NOTIFY me of student absence. Acknowledge that things come up and that the winter break is coming, so they may have had holiday events or travel plans, but also gently check in and let parent know we can help with rescheduling/transferring class times if necessary\n"
        else:  
            raise Exception(f"Report scraped, but no data found for {context['student_name']}.")



        # Add homework information
        if not context['homework_score'] == '':
            prompt+= "HOMEWORK TOPIC: Finding the volume of rectangular prisms and cubes using two formulas: length x width x height and area of base x height. \n"
            if context['homework_score'] == 0:
                prompt += "THE STUDENT DID NOT COMPLETE THEIR HOMEWORK FROM LAST WEEK. ENCOURAGE STUDENT TO COMPLETE IT AND COMPLETE HOMEWORK ON TIME IN FUTURE.\n"
            else:    
                prompt += f"- Previous Homework Score: {int(context['homework_score'])}%.\n"
                if (context['homework_score'] != 0):
                    prompt+="Encourage the parent to have the student watch the homework solution videos and correct their mistakes."
        else:
            prompt+="DO NOT MENTION PREVIOUS HOMEWORK OR HOMEWORK SCORE.\n"
        #Add stage test information
        if not context['stage_test'] == '':
            prompt+= "Stage test - a short quiz covering topics from lessons 9 through 12. Students were asked to stay for a little bit at the end of class to complete it."
            if context['stage_test'] == 0:
                prompt += "THE STUDENT LEFT AND DID NOT COMPLETE THEIR STAGE TEST FOR LESSONS 9-12. ENCOURAGE STUDENT TO COMPLETE IT ASAP. It can be found in the Think Student app between lessons 12 and 13.\n"
            else:    
                prompt += f"- Stage Test Score: {int(context['stage_test'])}%.\n"
        # Add extra sections
        if not (context['extra_section']==''):
            prompt += f"\nEXTRA SECTION TO INCLUDE AT TOP:\n{context['extra_section']}\n"
        
        if not (context['extra_feedback'] == ''):
            prompt += f"\nADDITIONAL FEEDBACK TO INCORPORATE:\n{context['extra_feedback']}\n"
        if not (context['notes'] == ''):
            prompt += f"\nADDITIONAL NOTES ABOUT THE PARENT OR STUDENT TO BEAR IN MIND:\n{context['notes']}\n"
        prompt += f"""
AVAILABLE RESOURCES TO MENTION:
- Extra supplemental problems in the workbook (this is a supplemental book which is distinct from the textbook, where the regular classwork and homework is)
- Extra practice problems tailored for each lesson, attached to the "Learning Materials" module for the lesson in the Think Student app 
- Additional problems available upon request
- Office hours every Tuesday at 5 PM PDT on Think Student app (homework and workbook help). Not taught by me, but no need to mention that.
- Recordings of classes and office hours available for those who can't attend live
- Up to 3 free 1-on-1 tutoring sessions per semester: https://outlook.office365.com/book/ThinkAcademyMiddleSchool1on1@thethinkacademy.com/. Be sure to include the booking link exactly once in the email.

INSTRUCTIONS:
1. Address the parent by name (use "Parent" if no name provided). If there is an extra section, address and greet the parent (I hope this email finds you well) BEFORE the extra section.
2. Use the student's correct pronouns throughout
3. Mention the specific in-class question numbers that the student struggled with (available in report).
4. Do not mention the specific areas that the in-class questions covered, beyond the general subject matter covered in the lesson. That information is not available to you.
5. There is no need to mention detailed attendance data (e.g. how many minutes the student did or did not attend) if the student attended the entire lesson or was entirely absent.
6. If the student was partially absent, describe attendance data in minutes, not percentages. Since the full class time is 90 minutes, you would say to the parent of a child who missed 10% of the class that the child "missed 9 minutes of class". Round to the nearest minute.
7. Unless otherwise specified in the above instructions, mention and comment on the student's homework score (as a percentage). If the student did not complete the homework, say they did not complete it and encourage them to complete it. Do not provide feedback about specific questions on the homework, since that information is not available to you
8. Provide actionable recommendations based on the data
9. If student was absent, focus on homework feedback and offer catch-up resources
10. If provided above, include the in-class report link at the end of the email and mention to the parent they can find more detailed performance information in it.
11. Be encouraging but honest about areas needing improvement
12. Include relevant resources based on student's needs. If recommending 1-on-1 tutoring, include the tutoring link in the email.
13. Keep tone warm, professional, and personalized
14. Keep the feedback part (not including extra section) between 200-300 words. If the extra section is present, it should be written ABOVE/BEFORE the main email body and is not included in the word count. If an extra section was not mentioned, ignore this instruction.
15. Do NOT use generic template language
16. Do NOT include a subject at the top of the email body
17. Do NOT use asterisks ** to indicate bold, this is for markdown and does not work for plain text.
18. After the English email, write the exact same email translated into Mandarin Chinese.
19. DO NOT INCLUDE ANY PART OF THE INSTRUCTIONS OR PROMPT IN THE EMAIL!!!!!


Format your output exactly as follows:

English:
Dear --parent_name--,
<extra section in English here>
<write English email body here>

Chinese:
<exact email as above, all translated to Mandarin>"""
        #print(prompt)
        return prompt
    
    def _generate_with_openai(self, prompt: str) -> str:
        """Generate email using OpenAI API"""
        try:
            response = self.analyzer.client.chat.completions.create(
                model="gpt-4o-mini",
                messages=[
                    {"role": "system", "content": "You are an experienced math tutor writing personalized feedback emails to parents. Write specific, actionable, and warm emails based on the provided data."},
                    
                    {"role": "user", "content": prompt}
                ],
                max_tokens=1000,
                temperature=0.7
            )
            return response.choices[0].message.content.strip()
        except Exception as e:
            print(f"‚ùå Error generating email with OpenAI: {e}")
            return f"Error generating email: {e}"
    
    def _generate_with_perplexity(self, prompt: str) -> str:
        try:
            headers = {
                "Authorization": f"Bearer {self.analyzer.api_key}",
                "Content-Type": "application/json"
            }
            
            data = {
                "model": "sonar",
                "messages": [
                    {"role": "system", "content": "You are an experienced math tutor writing personalized feedback emails to parents. Write specific, actionable, and warm emails based on the provided data."},
                    {"role": "user", "content": prompt}
                ],
                "max_tokens": 2500,
                "temperature": 0.7
            }
            
            response = requests.post(self.analyzer.base_url, headers=headers, json=data)
            
            # Always print the response text to help debug
            print("Perplexity API response status:", response.status_code)
            print("Perplexity API response content:", response.text)
            
            # Raise for status after printing content
            response.raise_for_status()
            
            response_json = response.json()
            
            content = response_json.get("choices", [{}])[0].get("message", {}).get("content", "")
            return content.strip()
        
        except requests.exceptions.HTTPError as e:
            # Print response content even on HTTPError
            print(f"‚ùå HTTP error: {e}")
            if e.response is not None:
                print(f"Response content: {e.response.text}")
            return f"Error generating email: {e}"
        
        except Exception as e:
            print(f"‚ùå Error generating email with Perplexity: {e}")
            return f"Error generating email: {e}"

print("‚úÖ EmailGenerator class created successfully!")
print("ü§ñ Ready to generate AI emails with translation support")

‚úÖ EmailGenerator class created successfully!
ü§ñ Ready to generate AI emails with translation support


## Step 5: Create Email Sending Class

This class handles:
- Formatting emails with both English and Chinese content
- Sending via Outlook SMTP
- Proper CC handling to weeklyreport@thethinkacademy.com
- Fallback to console output if email credentials missing

In [6]:
class EmailSender:
    def __init__(self, sender_email: str = "omar.shohoud@thethinkacademy.com"):
        self.sender_email = sender_email
        self.cc_email = "weeklyreport@thethinkacademy.com"
        print(f"‚úÖ EmailSender initialized")
        print(f"üìß Sender: {self.sender_email}")
        print(f"üìã CC: {self.cc_email}")
    
    def send_email(self, to_email: str, subject: str, content: str, 
                   smtp_server: str = "smtp.office365.com", smtp_port: int = 587, password: str = None):
        """
        Send email via Outlook/Office365 SMTP
        """
        if not to_email:
            print(f"‚ö†Ô∏è No email provided. Email content would be:")
            print(f"Subject: {subject}")
            print(f"Content:\n{content}")
            print("-" * 50)
            return False
        
        if not password:
            print(f"‚ö†Ô∏è No SMTP password provided. Email content for {to_email}:")
            print(f"Subject: {subject}")
            print(f"Content:\n{content}")
            print("-" * 50)
            return False
        
        try:
            # Create message
            msg = MIMEMultipart()
            msg['From'] = self.sender_email
            msg['To'] = to_email
            msg['Cc'] = self.cc_email
            msg['Subject'] = subject
            
            # Combine English and Chinese content
            full_content = content
            
            msg.attach(MIMEText(full_content, 'plain', 'utf-8'))
            
            # Connect to server and send
            server = smtplib.SMTP(smtp_server, smtp_port)
            server.starttls()
            server.login(self.sender_email, password)
            
            recipients = [to_email, self.cc_email]
            server.sendmail(self.sender_email, recipients, msg.as_string())
            server.quit()
            
            print(f"‚úÖ Email sent successfully to {to_email}")
            return True
            
        except Exception as e:
            print(f"‚ùå Error sending email to {to_email}: {e}")
            print(f"üìÑ Email content would be:")
            print(f"Subject: {subject}")
            print(f"Content:\n{content}")
            print("-" * 50)
            return False

print("‚úÖ EmailSender class created successfully!")
print("üì¨ Ready to send emails via Outlook SMTP")

‚úÖ EmailSender class created successfully!
üì¨ Ready to send emails via Outlook SMTP


## Step 6: Create Main Pipeline Class

This is the orchestrator class that:
- Processes entire spreadsheets of student data
- Calculates class-wide statistics for comparison
- Handles the complete workflow from scraping to email sending
- Provides testing capabilities for individual students

In [7]:
class StudentReportPipeline:
    def __init__(self, api_key: str, api_type: str = "openai", smtp_password: str = None):
        """
        Main pipeline class that orchestrates the entire process
        """
        self.analyzer = StudentReportAnalyzer(api_key, api_type)
        self.email_generator = EmailGenerator(self.analyzer)
        self.email_sender = EmailSender()
        self.smtp_password = smtp_password
        
        print(f"‚úÖ Pipeline initialized with {api_type.upper()} API")
        if smtp_password:
            print("üìß SMTP password provided - emails will be sent")
        else:
            print("‚ö†Ô∏è No SMTP password - emails will be printed to console")
    
    def process_spreadsheet(self, lesson_no, file_path: str, start=None, end=None, delay_between_emails: float = 2.0):
        """
        Process the entire spreadsheet and send emails, optionally for a subset of students
        """
        try:
            # Read the spreadsheet
            if file_path.endswith('.csv'):
                df = pd.read_csv(file_path)
            else:
                df = pd.read_excel(file_path)
            
            print(f"üìä Loaded {len(df)} students from spreadsheet")
            print(f"üìã Columns: {list(df.columns)}")
            
            # Slice dataframe to selected student subset if specified
            if start is not None or end is not None:
                start_index = start-1 if start is not None else 0
                end_index = end if end is not None else len(df)
                df = df.iloc[start_index:end_index]
                print(f"üîç Processing students {start} through {end}")
            
            # Track statistics for class comparison
            all_report_data = []
            
            # Process each student
            for index, row in df.iterrows():
                student_name = row.get('First Name', 'Unknown')
                if start:
                    print(f"\nüéØ Processing student {index + 2-start}/{len(df)}: {student_name}")
                else:
                    print(f"\nüéØ Processing student {index + 1}/{len(df)}: {student_name}")
                
                try:
                    # Convert row to dictionary
                    student_data = row.to_dict()
                    
                    # Scrape report data if not absent and report_link exists
                    report_data = {'scrape_success': False}
                    if (not student_data.get('absent','') == 'yes') and student_data.get('report_link'):
                        print(f"  üåê Scraping report: {student_data['report_link']}")
                        report_data = self.analyzer.scrape_report_data(student_data['report_link'])
                        if report_data['scrape_success']:
                            all_report_data.append(report_data)
                            questions_count = len(report_data.get('questions', {}))
                            attendance_pct = report_data.get('attendance', {}).get('attendance_percentage', 0)
                            print(f"  ‚úÖ Scraping successful: {questions_count} questions, {attendance_pct:.1f}% attendance")
                        else:
                            print(f"  ‚ùå Scraping failed: {report_data.get('error', 'Unknown error')}")
                    elif student_data.get('absent','') == 'yes':
                        print(f"  üè† Student was absent - skipping report scraping")
                    else:
                        print(f"  ‚ö†Ô∏è No report link provided")
                    
                    # Calculate class statistics
                    class_stats = self._calculate_class_stats(all_report_data)
                    
                    # Generate email content
                    print(f"  ü§ñ Generating email content...")
                    content = self.email_generator.generate_email_content(student_data, report_data, class_stats)
                    
                    if "Error generating email" in content:
                        print(f"  ‚ùå Email generation failed")
                        continue
                    else:
                        print(f"  ‚úÖ Email generated ({len(content)} characters)")
                    
                    # Generate subject
                    student_name = student_data.get('First Name', 'Student')
                    last_name = student_data.get('Last Name','')
                    if type(last_name) == float:
                        last_name = ''
                    subject = f"Lesson {lesson_no} Feedback for {student_name} {last_name} ‚Äî Spring G4H Think Academy"
                    
                    # Send email
                    if not pd.isna(student_data.get('Email', '')):
                        to_email = student_data.get('Email', '').strip()
                        print(f"  üìß Sending email to: {to_email if to_email else 'Console (no email provided)'}")
                        
                        success = self.email_sender.send_email(
                            to_email=to_email,
                            subject=subject,
                            content=content,
                            password=self.smtp_password
                        )
                        
                        if success:
                            print(f"  ‚úÖ Email sent successfully!")
                        else:
                            print(f"  üìÑ Email printed to console")
                            print(content)
                    else:
                        print(f"  üìÑ Email printed to console")
                        print(content)
                    # Delay between emails to avoid rate limiting
                    if delay_between_emails > 0 and index < len(df) - 1:
                        print(f"  ‚è±Ô∏è Waiting {delay_between_emails}s before next student...")
                        time.sleep(delay_between_emails)
                
                except Exception as e:
                    print(f"  ‚ùå Error processing student: {e}")
                    continue
            
            print(f"\nüéâ Pipeline completed! Processed {len(df)} students.")
            print(f"üìà Successfully scraped {len(all_report_data)} reports")
            
        except Exception as e:
            print(f"‚ùå Error processing spreadsheet: {e}")
    
    def _calculate_class_stats(self, all_report_data: List[Dict]) -> Dict:
        """Calculate class-wide statistics for comparison"""
        if not all_report_data:
            return {}
        
        stats = {
            'total_students_with_reports': len(all_report_data),
            'average_attendance': 0,
            'average_correct_rate': 0,
            'question_difficulty': {}
        }
        
        # Calculate attendance stats
        attendance_percentages = []
        for data in all_report_data:
            attendance = data.get('attendance', {})
            if attendance.get('attendance_percentage'):
                attendance_percentages.append(attendance['attendance_percentage'])
        
        if attendance_percentages:
            stats['average_attendance'] = sum(attendance_percentages) / len(attendance_percentages)
        
        # Calculate question performance stats
        all_questions = {}
        correct_rates = []
        
        for data in all_report_data:
            questions = data.get('questions', {})
            if questions:
                total_questions = len(questions)
                correct_questions = sum(1 for status in questions.values() if status == 'correct')
                if total_questions > 0:
                    correct_rates.append(correct_questions / total_questions)
                
                # Track individual question difficulty
                for q_num, status in questions.items():
                    if q_num not in all_questions:
                        all_questions[q_num] = {'correct': 0, 'total': 0}
                    all_questions[q_num]['total'] += 1
                    if status == 'correct':
                        all_questions[q_num]['correct'] += 1
        
        if correct_rates:
            stats['average_correct_rate'] = sum(correct_rates) / len(correct_rates)
        
        # Determine question difficulty
        for q_num, q_stats in all_questions.items():
            if q_stats['total'] >= 3:  # Only consider questions answered by at least 3 students
                correct_rate = q_stats['correct'] / q_stats['total']
                if correct_rate < 0.3:
                    difficulty = 'very difficult'
                elif correct_rate < 0.5:
                    difficulty = 'difficult'
                elif correct_rate < 0.7:
                    difficulty = 'moderate'
                else:
                    difficulty = 'easy'
                stats['question_difficulty'][q_num] = {
                    'difficulty': difficulty,
                    'correct_rate': correct_rate
                }
        
        return stats
    
    def test_single_student(self, student_data: Dict) -> Dict:
        """
        Test the pipeline with a single student (for debugging)
        """
        student_name = student_data.get('First Name', 'Unknown')
        print(f"üß™ Testing with student: {student_name}")
        
        # Scrape report data if available
        report_data = {'scrape_success': False}
        if student_data.get('report_link') and not (student_data.get('absent','')=='yes'):
            print(f"üåê Scraping: {student_data['report_link']}")
            report_data = self.analyzer.scrape_report_data(student_data['report_link'])
            if report_data['scrape_success']:
                print(f"‚úÖ Scraping successful")
            else:
                print(f"‚ùå Scraping failed: {report_data.get('error')}")
        
        # Generate email
        print(f"ü§ñ Generating email...")
        content = self.email_generator.generate_email_content(student_data, report_data)
        
        result = {
            'student_data': student_data,
            'report_data': report_data,
            'content': content,
        }
        
        print(f"‚úÖ Test completed successfully")
        return result

print("‚úÖ StudentReportPipeline class created successfully!")
print("üöÄ Main pipeline ready for use")

‚úÖ StudentReportPipeline class created successfully!
üöÄ Main pipeline ready for use


## Step 7: Create Sample Data for Testing

Generate sample student data that demonstrates all the different scenarios the system can handle.

## Step 8: Configuration - Set Your API Key

‚ö†Ô∏è **CRITICAL STEP**: You must provide your actual API key here. The system requires either OpenAI or Perplexity API access.

In [8]:
# ========================================
# CONFIGURATION - REPLACE WITH YOUR VALUES
# ========================================

# üîë API KEY - REQUIRED!
# Get from: https://platform.openai.com/api-keys (OpenAI)
# Or from: https://www.perplexity.ai/settings/api (Perplexity)
API_KEY = "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"  # <<<< REPLACE WITH YOUR ACTUAL API KEY

# Choose API provider
API_TYPE = "perplexity"  # or "perplexity"

# üìß SMTP PASSWORD - OPTIONAL
# If not provided, emails will be printed to console

SMTP_PASSWORD = getpass.getpass("Enter your Outlook password/app password: ")  # <<<< REPLACE WITH YOUR EMAIL PASSWORD IF DESIRED

# Validate configuration
if API_KEY == "your-api-key-here" or not API_KEY:
    print("‚ùå ERROR: You must set your API key!")
    print("Please replace 'your-api-key-here' with your actual OpenAI or Perplexity API key")
    print("")
    print("How to get API keys:")
    print("‚Ä¢ OpenAI: https://platform.openai.com/api-keys")
    print("‚Ä¢ Perplexity: https://www.perplexity.ai/settings/api")
else:
    print(f"‚úÖ API key provided for {API_TYPE.upper()}")
    if SMTP_PASSWORD:
        print("‚úÖ SMTP password provided - emails will be sent")
    else:
        print("‚ö†Ô∏è No SMTP password - emails will be printed to console")
    
    print(f"\nüöÄ Configuration complete! Ready to initialize pipeline.")

‚úÖ API key provided for PERPLEXITY
‚úÖ SMTP password provided - emails will be sent

üöÄ Configuration complete! Ready to initialize pipeline.


## Step 9: Initialize the Pipeline

Create the main pipeline object with your API credentials.

In [9]:
# Initialize the pipeline
if API_KEY == "your-api-key-here" or not API_KEY:
    print("‚ùå Cannot initialize pipeline without API key")
    print("Please set your API key in the previous cell")
else:
    try:
        # Create the pipeline
        pipeline = StudentReportPipeline(
            api_key=API_KEY,
            api_type=API_TYPE,
            smtp_password=SMTP_PASSWORD
        )
        
        print("\nüéâ Pipeline initialization successful!")
        print("\nPipeline components:")
        print(f"‚Ä¢ üåê Web scraper for Think Academy reports")
        print(f"‚Ä¢ ü§ñ AI email generator using {API_TYPE.upper()}")
        print(f"‚Ä¢ üìß Email sender via Outlook SMTP")
        
        # Test API connection
        print("\nüîç Testing API connection...")
        if API_TYPE == "openai":
            try:
                # Try a minimal API call
                test_response = pipeline.analyzer.client.chat.completions.create(
                    model="gpt-4o-mini",
                    messages=[{"role": "user", "content": "Say 'API test successful'"}],
                    max_tokens=10
                )
                print("‚úÖ OpenAI API connection successful")
            except Exception as e:
                print(f"‚ùå OpenAI API connection failed: {e}")
                print("Please check your API key and account credits")
        
        print("\n‚ú® Ready to process student data!")
        
    except Exception as e:
        print(f"‚ùå Pipeline initialization failed: {e}")
        print("Please check your API key and try again")

‚úÖ StudentReportAnalyzer initialized with PERPLEXITY API
‚úÖ EmailGenerator initialized
‚úÖ EmailSender initialized
üìß Sender: omar.shohoud@thethinkacademy.com
üìã CC: weeklyreport@thethinkacademy.com
‚úÖ Pipeline initialized with PERPLEXITY API
üìß SMTP password provided - emails will be sent

üéâ Pipeline initialization successful!

Pipeline components:
‚Ä¢ üåê Web scraper for Think Academy reports
‚Ä¢ ü§ñ AI email generator using PERPLEXITY
‚Ä¢ üìß Email sender via Outlook SMTP

üîç Testing API connection...

‚ú® Ready to process student data!


## Step 10: Test with Single Student

Before processing the entire spreadsheet, let's test with one student to make sure everything works correctly.

## Step 11: Process Full Spreadsheet

Now let's process the complete sample spreadsheet. This demonstrates the full pipeline in action.

In [10]:
# Process the full sample spreadsheet
if 'pipeline' not in locals():
    print("‚ùå Pipeline not initialized. Please run the previous cells first.")
else:
    print("üöÄ Processing full sample spreadsheet...")
    print("This will demonstrate all pipeline features:")
    print("‚Ä¢ Web scraping (will fail for demo URLs)")
    print("‚Ä¢ AI email generation with Chinese Translation")
    print("‚Ä¢ Email sending (or console output)")
    print("‚Ä¢ Handling of absent students")
    print("‚Ä¢ Different gender pronoun usage")
    
    lesson_no = input("Enter lesson number: ")
    
    # Prompt for delay length in seconds (optional)
    
    try:
        # Process the sample data with slicing options if supported by the pipeline
        pipeline.process_spreadsheet(
            file_path='Student Tags - 0126.csv',
            lesson_no=lesson_no,
            delay_between_emails=2.0
        )
        
        print("\n" + "="*60)
        print("üéâ FULL PIPELINE PROCESSING COMPLETED!")
        print("="*60)
        print("\nüìã Summary of what happened:")
        print("‚Ä¢ ‚úÖ Loaded student data from CSV")
        print("‚Ä¢ üåê Attempted to scrape report links (demo URLs failed as expected)")
        print("‚Ä¢ ü§ñ Generated " \
        "personalized AI emails for each student")
        print("‚Ä¢ üìß Sent emails (or printed to console based on configuration)")
        print("‚Ä¢ üë• Handled different scenarios: absent students, missing data, etc.")
        
        print("\n‚ú® The pipeline is fully functional!")
        print("\nüìù To use with real data:")
        print("1. Replace the CSV file with your actual student data")
        print("2. Ensure report links are valid Think Academy URLs")
        print("3. Optionally provide SMTP password for actual email sending")
        print("4. Run: pipeline.process_spreadsheet('your_file.csv')") 
        
    except Exception as e:
        print(f"‚ùå Processing failed: {e}")
        print("Please check the error details above.")


üöÄ Processing full sample spreadsheet...
This will demonstrate all pipeline features:
‚Ä¢ Web scraping (will fail for demo URLs)
‚Ä¢ AI email generation with Chinese Translation
‚Ä¢ Email sending (or console output)
‚Ä¢ Handling of absent students
‚Ä¢ Different gender pronoun usage
üìä Loaded 99 students from spreadsheet
üìã Columns: ['Student Number', 'Class Time (Spring)', 'First Name', 'Last Name', 'homework_score', 'report_link', 'stage_test', 'Parent Name', 'Gender', 'Email', 'engagement', 'camera', 'absent', 'absent_notified', 'extra_feedback', 'extra_section', 'notes']

üéØ Processing student 1/99: Thea
  üåê Scraping report: https://www.thethinkacademy.com/j/98Prp1
  ‚úÖ Scraping successful: 9 questions, 100.0% attendance
  ü§ñ Generating email content...
Perplexity API response status: 200
Perplexity API response content: {"id":"1b4d27ee-808f-4aeb-b64f-7090437476b1","model":"sonar","created":1769482008,"usage":{"prompt_tokens":1062,"completion_tokens":825,"total_tokens"

## Step 12: Usage Instructions for Real Data

Here's how to use this notebook with your actual student data.

In [None]:
# Instructions for real usage
print("üìö HOW TO USE WITH REAL DATA")
print("="*50)

print("\n1. üìä PREPARE YOUR DATA:")
print("   ‚Ä¢ Create a CSV or Excel file with required columns")
print("   ‚Ä¢ Required columns: Email, Gender, Parent Name, First Name, Last Name,")
print("     engagement, camera, homework_score, report_link, absent, absent_notified,")
print("     extra_feedback, extra_section")

print("\n2. üîó VERIFY REPORT LINKS:")
print("   ‚Ä¢ Ensure all report_link URLs are valid Think Academy report pages")
print("   ‚Ä¢ Format: https://www.thethinkacademy.com/j/[unique_code]")
print("   ‚Ä¢ Leave blank for absent students")

print("\n3. üîë API SETUP:")
print("   ‚Ä¢ OpenAI: Get key from https://platform.openai.com/api-keys")
print("   ‚Ä¢ Perplexity: Get key from https://www.perplexity.ai/settings/api")
print("   ‚Ä¢ Ensure your account has sufficient credits")

print("\n4. üìß EMAIL SETUP (OPTIONAL):")
print("   ‚Ä¢ To send actual emails, provide your Outlook password")
print("   ‚Ä¢ If using 2FA, you may need an app-specific password")
print("   ‚Ä¢ Without password, emails will be printed to console")

print("\n5. üöÄ RUN THE PIPELINE:")
print("   ‚Ä¢ Upload your CSV file to this notebook environment")
print("   ‚Ä¢ Update the file path in the process_spreadsheet call")
print("   ‚Ä¢ Execute: pipeline.process_spreadsheet('your_file.csv')")

print("\n" + "="*50)
print("EXAMPLE CODE FOR YOUR DATA:")
print("="*50)
print("""
# After setting up your API key and initializing the pipeline:

# Process your actual data file
pipeline.process_spreadsheet(
    file_path='my_student_data.csv',  # Your file name
    delay_between_emails=2.0  # Delay between emails in seconds
)

# Or test with a single student first:
single_student = {
    'Email': 'real_parent@email.com',
    'Gender': 'M',
    'Parent Name': 'John Doe',
    'First Name': 'Student Name',
    # ... other fields
}
result = pipeline.test_single_student(single_student)
""")

print("\n‚ö†Ô∏è  IMPORTANT NOTES:")
print("‚Ä¢ Each API call costs money - test with small batches first")
print("‚Ä¢ Web scraping may fail if Think Academy changes their HTML structure")
print("‚Ä¢ Translation service requires internet connection")
print("‚Ä¢ Email sending requires valid SMTP credentials")
print("‚Ä¢ Always verify generated content before sending to parents")

print("\n‚úÖ The notebook is ready for production use!")

In [80]:
# Demonstration of BeautifulSoup find_all with class regex matching

import re
from bs4 import BeautifulSoup

# Sample HTML string with divs having various class attributes
html_doc = """
<section class="answer-list">
  <div class="answer correct">Question 1: Correct</div>
  <div class="answer partly-correct">Question 2: Partly Correct</div>
  <div class="answer incorrect">Question 3: Incorrect</div>
  <div class="answer no-answer">Question 4: No Answer</div>
  <div class="answer almost-correct">Question 5: Almost Correct (should NOT match)</div>
  <div class="answer-correct">Question 6: Single class (should NOT match)</div>
  <div class="question correct">Question 7: Wrong class (should NOT match)</div>
  <div class="answer wrong">Question 8: Wrong status (should NOT match)</div>
  <div class="answer">Question 9: Missing status (should NOT match)</div>
</section>
"""
print(html_doc)
# Parse HTML
soup = BeautifulSoup(html_doc, 'html.parser')

# Find the answer-list section
answer_section = soup.find('section', class_='answer-list')

# Use the regex search to find all divs matching the specified classes
question_divs = answer_section.find_all(
    'div',
    class_=re.compile(r'answer (correct|partly-correct|incorrect|no-answer)')
)

print(f"Found {len(question_divs)} matching <div> elements:")
for div in question_divs:
    print(f" - {div['class']}: {div.text.strip()}")



<section class="answer-list">
  <div class="answer correct">Question 1: Correct</div>
  <div class="answer partly-correct">Question 2: Partly Correct</div>
  <div class="answer incorrect">Question 3: Incorrect</div>
  <div class="answer no-answer">Question 4: No Answer</div>
  <div class="answer almost-correct">Question 5: Almost Correct (should NOT match)</div>
  <div class="answer-correct">Question 6: Single class (should NOT match)</div>
  <div class="question correct">Question 7: Wrong class (should NOT match)</div>
  <div class="answer wrong">Question 8: Wrong status (should NOT match)</div>
  <div class="answer">Question 9: Missing status (should NOT match)</div>
</section>

Found 4 matching <div> elements:
 - ['answer', 'correct']: Question 1: Correct
 - ['answer', 'partly-correct']: Question 2: Partly Correct
 - ['answer', 'incorrect']: Question 3: Incorrect
 - ['answer', 'no-answer']: Question 4: No Answer


## üéâ Notebook Complete!

This notebook provides a complete solution for:

### ‚úÖ What Works:
- **Web scraping** of Think Academy report pages for question performance and attendance
- **AI-powered email generation** using OpenAI or Perplexity APIs
- **Personalized content** based on student data, engagement, and performance
- **Chinese translation** using Google Translate
- **Email sending** via Outlook SMTP with proper CC handling
- **Error handling** for missing data, failed scraping, and API issues
- **Class statistics** for comparing individual performance to class averages

### üîß Key Features:
- No generic templates - every email is AI-generated and unique
- Handles different genders with appropriate pronouns
- Processes absent students appropriately
- Includes specific question numbers and performance details
- Offers relevant resources based on student needs
- Comprehensive error handling and logging

### üìù To Use:
1. Set your API key in Step 8
2. Prepare your student data CSV with required columns
3. Run the pipeline with `pipeline.process_spreadsheet('your_file.csv')`

The system will handle everything automatically while providing detailed progress updates!