# Python Email Exercise Ideas

Since we can't assess code that involves your personal email address, here are some ideas for practicing your new skills on your own. Please keep in mind that we cannot assess these.


## Ideas

* Daily automatic email reminder for your tasks
* Web-scrape some statistics from a website each day and email them to yourself automatically
* Automatically email daily/weekly/monthly reports at work
* Send end-of-day messages to friends and family at random times to spread joy
* Be creative! Mix together any of the skills you've learned so far with email :)
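As a starting point for the first idea, here is a minimal sketch of a task-reminder email built with the standard library's `email.message.EmailMessage` and sent with `smtplib.SMTP_SSL`. It assumes a Gmail account with an app password (host `smtp.gmail.com`, port 465); other providers use different hosts and ports. The environment variable names `EMAIL_ADDRESS` and `EMAIL_APP_PASSWORD` are hypothetical placeholders.

```python
import os
import smtplib
from email.message import EmailMessage


def build_reminder(tasks, sender, recipient):
    """Compose a plain-text reminder that lists one task per line."""
    msg = EmailMessage()
    msg["Subject"] = f"Daily reminder: {len(tasks)} task(s)"
    msg["From"] = sender
    msg["To"] = recipient
    # an empty task list falls back to a friendly default body
    body = "\n".join(f"- {task}" for task in tasks) or "No tasks today!"
    msg.set_content(body)
    return msg


def send(msg, host="smtp.gmail.com", port=465):
    """Send over implicit TLS; credentials come from hypothetical env variables."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(os.environ["EMAIL_ADDRESS"], os.environ["EMAIL_APP_PASSWORD"])
        server.send_message(msg)


reminder = build_reminder(["Finish scraper", "Send weekly report"], "me@example.com", "me@example.com")
print(reminder["Subject"])  # → Daily reminder: 2 task(s)
# send(reminder)  # uncomment once the credentials are set
```

Keeping message construction separate from sending means the content can be checked without touching a real mailbox; a scheduler (cron, or a cloud scheduler) would then only need to call `send`.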

# Podcast Newsletter App

In [None]:
'''
IDEA:

- create a Podcast newsletter for headlines from geopolitics, finance, science, tech, and AI.
    - websites:
        - geopolitics:
            - https://www.reuters.com/world/
            - https://ground.news/
        - finance
            - https://www.reuters.com/business/
            - https://www.reuters.com/markets/ 
        - science:
            - https://ground.news/interest/science
            - https://www.sciencenews.org/
        - tech: 
            - https://www.wired.com/
            - https://ground.news/interest/tech
        - AI:
            - https://ground.news/interest/ai
            - https://www.artificialintelligence-news.com/

- newsletter that will be sent every second day at 9am (server needed?)
- set up server on Google Cloud Run

'''
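
The "every second day at 9am" rule is easy to get subtly wrong, so here is a small sketch of the date logic on its own. The start date `FIRST_ISSUE` is a hypothetical anchor for the two-day cycle, and the function works in naive local time (no time zone handling).

```python
from datetime import date, datetime, time, timedelta

SEND_TIME = time(9, 0)          # 9am, as in the plan above
FIRST_ISSUE = date(2025, 1, 1)  # hypothetical start date that anchors the 2-day cycle


def next_send_time(now: datetime, first_issue: date = FIRST_ISSUE) -> datetime:
    """Return the next 9am slot that falls an even number of days after first_issue."""
    candidate = datetime.combine(now.date(), SEND_TIME)
    if candidate <= now:                   # today's slot has already passed
        candidate += timedelta(days=1)
    while (candidate.date() - first_issue).days % 2 != 0:
        candidate += timedelta(days=1)     # skip the "off" day
    return candidate


print(next_send_time(datetime(2025, 1, 1, 10, 30)))  # → 2025-01-03 09:00:00
```

On Cloud Run the trigger would typically come from an external scheduler such as Cloud Scheduler using a cron expression. Note that the common cron form `0 9 */2 * *` fires on odd days of the month, so it sends on both the 31st and the 1st; anchoring on a fixed start date, as above, avoids that.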

In [None]:
'''
STEPS:

1. set up web scraper -> headlines, text
2. set up LLM (Perplexity) integration -> short explainers for the biggest headlines (let the model choose which headlines are biggest)
3. finalize text and add image as part of email
4. set up emailing infrastructure
5. integrate structure into Cloud Run

'''
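
Step 3 (text plus an image in the email) can be sketched with the standard library alone: a plain-text part as a fallback and an HTML part whose image is attached as a "related" part, so mail clients render it inline instead of as a separate attachment. The function name, subject line and `example.com` addresses are illustrative assumptions; the image is passed in as raw bytes.

```python
from email.message import EmailMessage
from email.utils import make_msgid
from html import escape


def build_newsletter(headlines, image_bytes, sender, recipient, subtype="png"):
    """Plain-text fallback plus an HTML part that embeds one inline image."""
    msg = EmailMessage()
    msg["Subject"] = "Your headline digest"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(f"- {h}" for h in headlines))  # text/plain fallback

    # Content-ID looks like "<...@example.com>"; the HTML reference drops the brackets
    image_cid = make_msgid(domain="example.com")
    items = "".join(f"<li>{escape(h)}</li>" for h in headlines)  # escape &, <, > in scraped text
    msg.add_alternative(
        f'<html><body><img src="cid:{image_cid[1:-1]}" alt="header">'
        f"<ul>{items}</ul></body></html>",
        subtype="html",
    )
    # attach the image to the HTML part so it is displayed inline
    msg.get_payload()[1].add_related(image_bytes, "image", subtype, cid=image_cid)
    return msg
```

With Pillow (already imported in the notebook), the image bytes could come from saving an `Image` into an `io.BytesIO` buffer in PNG format.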

In [None]:
# TODO add error handling

import random
import os
import requests
import bs4
import smtplib
import getpass
from PIL import Image

from transformers import AutoModelForCausalLM, AutoTokenizer


In [5]:
### 1 ### scrape headlines and first few lines

# FIX needed?

headlines_and_text = {} # dict for headlines and text
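
The `headlines_and_text` dict is meant to pair each headline with its first few lines. Here is a minimal sketch of one way to fill such a dict, assuming a page layout where each `<h4>` headline is followed by a sibling `<p>` teaser; real sites use different markup, so the tag choices are assumptions. It runs on an inline HTML sample with the built-in `html.parser`, so no network access or `lxml` is needed.

```python
import bs4

SAMPLE_HTML = """
<article><h4>Headline one</h4><p>First paragraph of story one. More text.</p></article>
<article><h4>Headline two</h4><p>Story two begins here.</p></article>
<article><h4>Headline without text</h4></article>
"""


def headlines_with_text(html, max_chars=120):
    """Map each <h4> headline to the text of the next sibling <p> (truncated)."""
    soup = bs4.BeautifulSoup(html, "html.parser")  # stdlib parser, no lxml needed
    result = {}
    for heading in soup.find_all("h4"):
        title = heading.get_text(strip=True)
        # sibling search stays inside the same article, unlike find_next()
        paragraph = heading.find_next_sibling("p")
        text = paragraph.get_text(" ", strip=True) if paragraph else ""
        result[title] = text[:max_chars]
    return result


example = headlines_with_text(SAMPLE_HTML)
print(example)
```

Searching siblings rather than the whole rest of the document prevents a headline without a teaser from "borrowing" the next story's paragraph.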


In [6]:
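# request websites
# a browser-like User-Agent and a timeout: some sites may block the default
# requests User-Agent, and a request without a timeout can hang indefinitely
HEADERS = {'User-Agent': 'Mozilla/5.0'}
TIMEOUT = 10  # seconds

# TODO use Forbes instead of Reuters
# reuters_geo = requests.get('https://www.reuters.com/world/', headers=HEADERS, timeout=TIMEOUT)
ground_geo = requests.get('https://ground.news/', headers=HEADERS, timeout=TIMEOUT)
# reuters_bus = requests.get('https://www.reuters.com/business/', headers=HEADERS, timeout=TIMEOUT)
# reuters_fin = requests.get('https://www.reuters.com/markets/', headers=HEADERS, timeout=TIMEOUT)
news_sci = requests.get('https://www.sciencenews.org/', headers=HEADERS, timeout=TIMEOUT)
ground_sci = requests.get('https://ground.news/interest/science', headers=HEADERS, timeout=TIMEOUT)
wired_tech = requests.get('https://www.wired.com/', headers=HEADERS, timeout=TIMEOUT)
ground_tech = requests.get('https://ground.news/interest/tech', headers=HEADERS, timeout=TIMEOUT)
ground_ai = requests.get('https://ground.news/interest/ai', headers=HEADERS, timeout=TIMEOUT)
news_ai = requests.get('https://www.artificialintelligence-news.com/', headers=HEADERS, timeout=TIMEOUT)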
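
For the "TODO add error handling" note, one option is a `requests.Session` with automatic retries (via urllib3's `Retry` mounted on an `HTTPAdapter`) plus a fetch loop that records failures instead of stopping the whole run. This is a sketch: the retry count, back-off factor, status codes and User-Agent string are assumptions to tune.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(retries=3, backoff=0.5):
    """Session that retries connection errors and common transient HTTP statuses."""
    retry = Retry(
        total=retries,
        backoff_factor=backoff,                     # exponential back-off between attempts
        status_forcelist=(429, 500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    # assumption: some sites reject the default requests User-Agent
    session.headers["User-Agent"] = "Mozilla/5.0 (newsletter script)"
    return session


def fetch_pages(session, urls, timeout=10):
    """Fetch each URL; failures are stored as None instead of raising."""
    pages = {}
    for url in urls:
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()
            pages[url] = response
        except requests.RequestException as err:
            print(f"skipping {url}: {err}")
            pages[url] = None
    return pages
```

Usage would be along the lines of `pages = fetch_pages(make_session(), ["https://ground.news/", "https://www.wired.com/"])`, after which `get_headline` is only called for pages that are not `None`.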

In [None]:
# function for getting headlines:
# TODO add the images
# TODO how to get rid of extra symbols in text?

# how many headlines to sample per website, by category
HEADLINES_PER_CATEGORY = {
    "geo": 4,
    "science": 2,
    "tech": 3,
    "ai": 3,
}

def get_headline(request, category, request_2=None):
    """Return randomly sampled <h4> headlines from one or two pages of the same category."""

    def sample_headlines(response):
        soup = bs4.BeautifulSoup(response.text, 'lxml')
        # collapse newlines/repeated spaces inside each heading and drop empty ones
        texts = [" ".join(h4.get_text().split()) for h4 in soup.find_all('h4')]
        texts = list(dict.fromkeys(t for t in texts if t))  # de-duplicate, keep page order
        # random.sample raises ValueError if n > len(texts), so cap n
        n = min(HEADLINES_PER_CATEGORY[category], len(texts))
        return set(random.sample(texts, n))

    headlines = sample_headlines(request)
    if request_2 is not None:
        headlines.update(sample_headlines(request_2))
    return list(headlines)
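
For the "TODO how to get rid of extra symbols in text?" note, a possible cleanup step uses the standard `unicodedata` module: NFKC normalization folds compatibility characters (non-breaking spaces, ligatures), invisible formatting characters such as zero-width spaces are dropped, and control characters become spaces before whitespace is collapsed. Which symbols count as "extra" is an assumption here; curly quotes and other real punctuation are deliberately kept.

```python
import unicodedata


def clean_headline(text: str) -> str:
    """Normalize a scraped headline and tidy its whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. non-breaking space -> space, "ﬁ" -> "fi"
    kept = []
    for ch in text:
        category = unicodedata.category(ch)
        if category == "Cf":                     # zero-width and other formatting characters
            continue
        kept.append(" " if category == "Cc" else ch)  # control chars such as \n, \t -> space
    return " ".join("".join(kept).split())


print(clean_headline("  Denmark’s\u00a0leader\n apologizes\u200b "))  # → Denmark’s leader apologizes
```

Applied to each heading's text before sampling, this keeps visually identical headlines from appearing twice just because of hidden characters.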

In [None]:
# getting all headlines

headlines_geo = get_headline(ground_geo, "geo")
headlines_sci = get_headline(ground_sci, "science", news_sci)

# TODO finish all other websites

headlines_geo  # display the sampled geopolitics headlines (output below)


['Man Arrested in U.K. over Alleged Cyberattack that Affected European Airports',
 'Drone disruption unlikely to hit profitability, Ryanair boss says',
 'Denmark’s leader apologizes to Indigenous girls and women in Greenland for forced contraception',
 'Stellantis To Pause Output At Six European Factories: Report']
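
Step 2 asks the model to pick the biggest headlines itself. A minimal sketch of turning the scraped headlines into a single chat message, in the same `[{"role": "user", "content": ...}]` shape the Qwen cell below passes to `apply_chat_template`, could look like this. The prompt wording, the `n_explainers` parameter and the placeholder headlines are assumptions.

```python
def build_prompt(headlines_by_category, n_explainers=3):
    """Turn scraped headlines into one chat message asking the model to pick and explain the biggest stories."""
    lines = [
        f"Below are today's headlines grouped by category. Pick the {n_explainers} most "
        "significant ones and write a two- to three-sentence explainer for each.",
        "",
    ]
    for category, headlines in headlines_by_category.items():
        lines.append(f"{category.upper()}:")
        lines.extend(f"- {headline}" for headline in headlines)
        lines.append("")  # blank line between categories
    return [{"role": "user", "content": "\n".join(lines).strip()}]


messages = build_prompt({"geo": ["Headline A", "Headline B"], "science": ["Headline C"]})
print(messages[0]["content"])
```

Grouping by category keeps the prompt readable for the model and makes it easy to ask for at least one explainer per section later.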

In [None]:
### 2 ### LLM integration (using Qwen)

model_name = "Qwen/Qwen3-4B"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

In [None]:
# prepare the model input
prompt = "Give me a short introduction to large language models."
messages = [
    {
        "role": "user", 
        "content": prompt
    }
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# conduct text completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=32768
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist() 

# parsing thinking content
try:
    # rindex finding 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
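
The "rindex" step above splits the generated ids at the last `</think>` token (id 151668). Factoring it into a small pure-Python helper, as sketched below, lets the same split be reused for every headline explainer and tested without loading the model. The helper name is hypothetical; the logic mirrors the cell above, including keeping the `</think>` token with the thinking part. For newsletter text that doesn't need reasoning, passing `enable_thinking=False` to `apply_chat_template`, as the comment in that cell describes, is the simpler alternative.

```python
END_THINK_ID = 151668  # "</think>" token id, as used in the cell above


def split_thinking(output_ids, end_think_id=END_THINK_ID):
    """Split generated ids at the LAST end-of-thinking token.

    Returns (thinking_ids, answer_ids); if the token is absent, everything is the answer.
    """
    try:
        index = len(output_ids) - output_ids[::-1].index(end_think_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


print(split_thinking([11, 12, END_THINK_ID, 21, 22]))  # → ([11, 12, 151668], [21, 22])
print(split_thinking([21, 22]))                         # → ([], [21, 22])
```

Each half would then be passed to `tokenizer.decode(..., skip_special_tokens=True)` exactly as before.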

