# Generate Fake Datasets

In [1]:
topics = ["AI", "Career", "Design", "Machine Learning",
          "Hiring", "Remote Work", "Biology", "Leadership"]

In [2]:
texts = [
    "Transformer models are reshaping AI, but their size increases memory demands. Innovations in architecture like sparse attention aim to balance performance with efficiency.",
    "Effective remote work strategies hinge on clear communication and trust. Tools like async updates and regular check-ins help distributed teams stay aligned and productive.",
    "Biology meets machine learning as protein folding problems get solved with deep neural networks. These breakthroughs accelerate drug discovery and bioengineering applications.",
    "Design thinking empowers product teams to solve real user problems. Empathy, rapid prototyping, and testing are essential to building impactful, human-centered products.",
    "Hiring in tech remains competitive. Companies now use structured interviews and skill-based assessments to fairly evaluate candidates and reduce hiring biases.",
    "Career development today involves not just climbing ladders, but growing skills across domains. Lateral moves, mentorship, and self-directed learning are key for modern professionals.",
    "AI is revolutionizing leadership. From decision-support tools to sentiment analysis dashboards, executives gain real-time insights to steer complex organizations.",
    "Remote work culture is shifting. Companies are investing in virtual onboarding, mental health support, and asynchronous collaboration to build sustainable distributed teams.",
    "Machine learning models benefit from high-quality data curation. Feature engineering, bias reduction, and real-world validation remain crucial parts of any ML pipeline.",
    "Leadership in a digital era involves clarity, adaptability, and inclusion. The best leaders foster autonomy while aligning teams around shared, measurable goals.",
    "AI ethics are a growing concern. Teams must consider data privacy, explainability, and social impact when building and deploying intelligent systems.",
    "Biological systems inspire AI architectures. Neural networks borrow from brain-like structures, and evolutionary strategies mimic natural selection for optimization.",
    "Effective design is not just aesthetic—it solves real user problems. Grid systems, accessibility standards, and UX research all shape the final user experience.",
    "Career switches are more common than ever. Bootcamps, online courses, and portfolio-based hiring help professionals pivot into new tech roles.",
    "Hiring remote employees expands the talent pool. Companies must adapt onboarding, compliance, and team-building processes to work across time zones.",
    "AI is now embedded in daily tools. From smart email filters to adaptive learning apps, consumer products quietly harness LLMs and models behind the scenes.",
    "Machine learning fairness requires testing for demographic parity and representation. Responsible ML teams include evaluation metrics beyond accuracy.",
    "Design systems streamline product development. Shared components, tokens, and guidelines improve speed and consistency across design and engineering.",
    "Leadership in hybrid teams involves managing visibility and equity. Leaders must intentionally support remote voices and avoid proximity bias.",
    "Career ladders are evolving. Dual tracks for ICs and managers help retain talent while aligning growth with impact—not just seniority.",
    "Biology labs now integrate automation. Pipetting robots, image classifiers, and CRISPR optimization algorithms are redefining how experiments are run.",
    "Remote-first companies are leading with flexibility. They invest in outcomes, not hours, and prioritize written documentation and async workflows.",
    "AI-powered writing tools assist with clarity, grammar, and tone. These tools are especially helpful in non-native contexts and fast-paced environments.",
    "Design is collaborative. The best outcomes come from diverse teams that combine research insights, creative exploration, and technical feasibility.",
    "Hiring for potential—not pedigree—is a trend. Assessments and structured interviews outperform CVs when predicting on-the-job success.",
    "Machine learning in healthcare improves diagnostics. Algorithms now detect tumors, predict deterioration, and personalize treatment plans.",
    "Leadership storytelling creates alignment. Clear narratives help teams understand why decisions are made and where the company is headed.",
    "Career growth in tech now includes public speaking, open source contributions, and writing—forms of impact beyond just code.",
    "Biology visualization tools use 3D and VR to explore protein structures, neuron networks, and molecular interactions interactively.",
    "Remote work productivity varies. The best teams use checklists, clarity in goals, and rituals like stand-ups to stay on track.",
    "AI agents are being deployed in customer service, finance, and logistics. These autonomous systems handle multi-step tasks with increasing reliability.",
    "Design critiques, if structured well, drive innovation. Framing feedback around goals and user needs keeps teams aligned and productive.",
    "Machine learning engineers must understand infrastructure. Model training, serving, and monitoring all require devops literacy.",
    "Hiring diverse teams drives innovation. Diverse perspectives lead to better problem-solving and market relevance in global products.",
    "Leadership coaching is becoming data-driven. Feedback dashboards and 360 reviews help leaders track and adjust their development plans.",
    "Career satisfaction is influenced by autonomy, purpose, and mastery. These factors matter more than titles or compensation alone.",
    "Biology datasets are expanding rapidly. AI helps analyze genomics, behavior, and metabolic pathways with tools built for big biological data.",
    "Remote collaboration tools are moving beyond Zoom. Products like Miro, Notion, and Loom create shared spaces for async work.",
    "Design tokens unify themes across platforms. They encode visual identity and make cross-platform implementation faster and more accurate.",
    "AI model interpretability is key. Tools like SHAP, LIME, and feature importance plots help developers and users trust predictions.",
    "Hiring pipelines are being automated. From resume screening to coding tests, tech enables faster but also more fair hiring decisions.",
    "Leadership during uncertainty involves clarity, empathy, and decisiveness. Great leaders focus on principles and people, not just results.",
    "Machine learning models must generalize. Real-world performance often drops from benchmarks, so testing in production is essential.",
    "Design leadership requires advocating for users and designers. Balancing business goals and user experience is a strategic skill.",
    "Career transitions are powered by learning. Internal mobility programs and certifications help retain employees and fill skill gaps.",
    "Remote work policies now include location-based pay, home office stipends, and guidelines for offsite team retreats.",
    "AI in education personalizes learning. Adaptive systems adjust content difficulty, pace, and feedback for each student.",
    "Biology meets AI in synthetic life design. Scientists now use algorithms to propose viable, novel genetic configurations.",
    "Hiring during economic uncertainty shifts focus to core roles and essential outcomes. Recruiters must partner tightly with leadership.",
    "Machine learning in marketing optimizes customer journeys. Algorithms adjust targeting, timing, and creative content in real time."
]


In [3]:
import pandas as pd
from faker import Faker
import random

fake = Faker()

# Create fake blog posts
blog_posts = []

for i, text in enumerate(texts):
    sentences = text.split(". ")
    title = sentences[0].strip(".")  # Remove period for cleaner title
    content = ". ".join(sentences[1:]).strip()
    if content and not content.endswith("."):
        content += "."
    
    blog_posts.append({
        "post_id": f"post_{i+1}",
        "title": title,
        "content": content,
        "url": fake.url(),
        "date": fake.date_this_month(),
    })

# Create fake users with interests
user_profiles = []
user_topics = random.sample(topics, k=5)  # sample without replacement so a user's interests never repeat

for i in range(15):
    name = fake.name()
    user_profiles.append({
        "user_id": f"user_{i+1}",
        "name": name,
        "email": fake.email(),
        "interests": ", ".join(random.sample(user_topics, k=random.randint(1, 3)))
    })

# Convert to DataFrames
df_posts = pd.DataFrame(blog_posts)
df_users = pd.DataFrame(user_profiles)

# Save to CSV (create the output directory first if it does not exist)
import os

os.makedirs("data", exist_ok=True)
df_posts_path = "data/fake_blog_posts.csv"
df_users_path = "data/fake_user_profiles.csv"

df_posts.to_csv(df_posts_path, index=False)
df_users.to_csv(df_users_path, index=False)




In [4]:
df_posts.sample(3)

Unnamed: 0,post_id,title,content,url,date
16,post_17,Machine learning fairness requires testing for...,Responsible ML teams include evaluation metric...,http://holmes.biz/,2025-07-15
25,post_26,Machine learning in healthcare improves diagno...,"Algorithms now detect tumors, predict deterior...",http://huynh.com/,2025-07-21
35,post_36,"Career satisfaction is influenced by autonomy,...",These factors matter more than titles or compe...,http://byrd.com/,2025-07-06


In [5]:
df_users.sample(2)

Unnamed: 0,user_id,name,email,interests
1,user_2,Melissa Villanueva,stephaniewoodward@example.com,Biology
2,user_3,John Webb,jacob17@example.com,"Design, Machine Learning"
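
The title/content split used in the generation cell can be pulled into a small helper, which makes edge cases (such as a single-sentence text) easier to check. This is only a sketch; `split_title_and_content` is a hypothetical name and not part of the notebook.

```python
def split_title_and_content(text):
    """Split a blurb into (title, content) at the first sentence boundary."""
    # partition() splits only at the first ". ", so later sentences stay intact.
    first, _, rest = text.partition(". ")
    title = first.strip().rstrip(".")
    content = rest.strip()
    # Restore the final period if the blurb did not end with one.
    if content and not content.endswith("."):
        content += "."
    return title, content


title, content = split_title_and_content(
    "AI ethics are a growing concern. Teams must consider data privacy."
)
print(title)    # AI ethics are a growing concern
print(content)  # Teams must consider data privacy.
```

A text with only one sentence yields an empty `content`, which matters later because `summarize_text` is called on the content column.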


# Rule-based Python pipeline

In [6]:
import os
import pandas as pd
import openai
from openai import OpenAI
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import random

In [10]:
from dotenv import load_dotenv
import os

# Load env vars
load_dotenv()

# Check key is there
api_key = os.getenv("OPENAI_API_KEY")
assert api_key, "OPENAI_API_KEY is missing"

In [11]:
# OpenAI was imported above; the client reads OPENAI_API_KEY from the environment
client = OpenAI()

In [12]:
posts_df = pd.read_csv("data/fake_blog_posts.csv")
users_df = pd.read_csv("data/fake_user_profiles.csv")

## Tools

In [13]:
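def get_posts(posts_df, N=15):
    # Cap N at the number of available posts; DataFrame.sample raises if N is larger
    selected_posts_df = posts_df.sample(min(N, len(posts_df)))
    return selected_posts_df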

In [14]:
# Tool: simulate sending email (for dev/testing)
def send_email(user_email, subject, body, sender="no-reply@agentic.local"):
    print("\n--- Simulated Email ---")
    print(f"From: {sender}")
    print(f"To: {user_email}")
    print(f"Subject: {subject}")
    print(f"Body:\n{body}\n")
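
The import cell pulls in `MIMEText` and `MIMEMultipart`, but the simulated tool only prints. As a possible next step, the sketch below builds a real MIME message from the same arguments without sending it. `build_email` is a hypothetical helper, and the recipient address is a placeholder.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_email(user_email, subject, body, sender="no-reply@agentic.local"):
    """Build a MIME message from the send_email arguments without sending it."""
    msg = MIMEMultipart()
    msg["From"] = sender
    msg["To"] = user_email
    msg["Subject"] = subject
    # Attach the body as UTF-8 plain text so emoji and accents survive.
    msg.attach(MIMEText(body, "plain", "utf-8"))
    return msg


msg = build_email("reader@example.com", "Curated Articles", "Hi there")
print(msg["To"])  # reader@example.com
```

Actually delivering the message would require an SMTP host and credentials, which the notebook does not specify, so that part is left out.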

In [15]:
# Tool: summarize text using OpenAI
def summarize_text(text):
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that summarizes blog content."},
                {"role": "user", "content": f"Summarize the following blog post:\n\n{text}"}
            ]
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"[Error summarizing text: {e}]"

In [16]:
# Tool: classify topic
def classify_topic(text, topics):
    topics = ", ".join(topics)
    
    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant that categorizes blog content."},
                {"role": "user", "content": f"""Classify the text below into one of the given topics. \n
                     Text: \n\n{text},\n
                     Available topics: \n\n{topics}.\n
                     Output should be one topic from the list without comments."""}
            ]
        )
        return response.choices[0].message.content.strip()
    except Exception as e:
        return f"[Error classifying text: {e}]"
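
Because `classify_topic` returns whatever the model writes (or an error string), the result is not guaranteed to be one of the predefined topics. A minimal sketch of a validation step follows; `normalize_topic` and `example_topics` are hypothetical names introduced for illustration.

```python
def normalize_topic(reply, topics):
    """Return the entry of `topics` that matches `reply`, or None if none does."""
    # Ignore surrounding whitespace, quotes, trailing periods, and letter case.
    cleaned = reply.strip().strip(".\"'").casefold()
    for topic in topics:
        if topic.casefold() == cleaned:
            return topic
    return None


example_topics = ["AI", "Career", "Machine Learning"]
print(normalize_topic(" machine learning. ", example_topics))                # Machine Learning
print(normalize_topic("[Error classifying text: timeout]", example_topics))  # None
```

Rows whose topic normalizes to `None` could then be dropped before the topic list is passed to the planner.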


In [17]:
import json

def decide_action(users_interests, available_topics): 
    context = (
        f"""A user is interested in blog posts on: {users_interests}.
The blog has recent posts on the following topics: {available_topics}.

Your task:
- Choose which of the available topics are worth summarizing for this user.
- If none are relevant, return an empty list []. That is better than recommending irrelevant content.
- You may also suggest a new topic close to the user's interests if a better match is available.

Respond in JSON format with:
{{
  "comment": "...your reasoning...",
  "available_or_suggested_topics": ["topic1", "topic2", ...] or []
}}
"""
    )

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are an agent planner. Make decisions on task flow."},
            {"role": "user", "content": context}
        ]
    )

    raw_reply = response.choices[0].message.content.strip()
    
    try:
        decision = json.loads(raw_reply)
        return decision
    except json.JSONDecodeError:
        print("⚠️ Failed to parse JSON from agent. Raw output:")
        print(raw_reply)
        return {"comment": "Parsing error", "available_or_suggested_topics": []}


In [18]:
decide_action(users_interests=['AI'], available_topics=['Sun', "LLM", "ML"])

{'comment': 'There are no recent blog posts on AI related topics available. However, you might be interested in the Machine Learning (ML) post as it is closely related to AI.',
 'available_or_suggested_topics': ['ML']}
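
`decide_action` falls back to an empty topic list whenever `json.loads` fails, which also happens when the model wraps valid JSON in extra text. The sketch below shows one tolerant way to parse such replies; `parse_json_reply` is a hypothetical helper and the regular expression is a simple heuristic, not a full JSON extractor.

```python
import json
import re


def parse_json_reply(raw_reply):
    """Parse a JSON object from a model reply, tolerating surrounding text."""
    candidates = [raw_reply]
    # Heuristic fallback: the span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", raw_reply, flags=re.DOTALL)
    if match:
        candidates.append(match.group(0))
    for candidate in candidates:
        try:
            parsed = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(parsed, dict):
            return parsed
    return None


reply = 'Sure, here is my decision: {"comment": "ok", "available_or_suggested_topics": ["AI"]}'
print(parse_json_reply(reply))  # {'comment': 'ok', 'available_or_suggested_topics': ['AI']}
```

A `None` result can be mapped to the same "Parsing error" fallback that `decide_action` already returns.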

In [19]:
import json

def load_memory():
    try:
        with open("memory.json", "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}

def save_memory(memory):
    with open("memory.json", "w") as f:
        json.dump(memory, f, indent=2)
        
def clear_memory():
    memory = {"sent_posts": []}
    save_memory(memory)
    print("🧹 Memory has been reset to a clean list.")
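
Since `sent_posts` is a plain list stored in JSON, the same post ID can end up recorded more than once if callers append without checking. A small sketch of a deduplicating update is shown here; `mark_sent` is a hypothetical helper that works on the same dictionary shape as `load_memory` returns.

```python
def mark_sent(memory, post_ids):
    """Add post IDs to memory["sent_posts"], skipping ones already recorded."""
    sent = memory.setdefault("sent_posts", [])
    seen = set(sent)  # set lookup keeps the check cheap for long histories
    for post_id in post_ids:
        if post_id not in seen:
            sent.append(post_id)
            seen.add(post_id)
    return memory


memory_example = {"sent_posts": ["post_1"]}
mark_sent(memory_example, ["post_1", "post_2", "post_2"])
print(memory_example)  # {'sent_posts': ['post_1', 'post_2']}
```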


## Calling

In [20]:
# Optional: clean the memory
# clear_memory()
memory = load_memory()

In [21]:
import json

def summarize_and_email(posts_df, users_df, memory, topics, N=15):
    # 1. Sample N posts and classify
    selected_posts_df = get_posts(posts_df, N)
    selected_posts_df["topic"] = selected_posts_df["title"].apply(lambda x: classify_topic(x, topics))

    # 2. Skip already emailed posts
    new_posts = []
    for _, row in selected_posts_df.iterrows():
        if row["post_id"] in memory.get("sent_posts", []):
            print(f"Skipping already-emailed post {row['post_id']}")
        else:
            new_posts.append(row["post_id"])

    new_posts_df = selected_posts_df[selected_posts_df["post_id"].isin(new_posts)]
    available_topics = new_posts_df["topic"].unique()
    print(available_topics)

    # 3. Process each user individually
    for _, user in users_df.iterrows():
        user_id = user["user_id"]
        user_name = user["name"]
        user_email = user["email"]
        user_interests = user["interests"]

        # 4. Agent decides what topics to send based on interests and availability
        decision = decide_action(user_interests, available_topics)
        print(f"Decision for {user_name}: {decision}")

        suggested_topics = decision.get("available_or_suggested_topics", [])
        if not suggested_topics:
            print(f"Agent decided to skip user '{user_name}'.")
            continue

        # 5. Summarize relevant posts
        summaries = []
        for topic in suggested_topics:
            topic_posts = new_posts_df[new_posts_df["topic"] == topic]
            for _, post in topic_posts.iterrows():
                summary = summarize_text(post["content"])
                summaries.append(f"- {post['title']}\n{summary}\nLink: {post['url']}")
                
                memory.setdefault("sent_posts", [])
                if post["post_id"] not in memory["sent_posts"]:
                    memory["sent_posts"].append(post["post_id"])


        if summaries:
            summary_block = "\n\n".join(summaries)
            subject = f"📰 Curated Articles for You: {', '.join(suggested_topics)}"
            body = f"Hi {user_name},\n\nBased on your interests, here are some new articles you might enjoy:\n\n{summary_block}\n\nBest,\nYour Content Agent"
            send_email(user_email, subject, body)

    # Save memory once after all emails
    save_memory(memory)


In [22]:
summarize_and_email(posts_df, users_df, memory, topics, 5)

Skipping already-emailed post post_41
Skipping already-emailed post post_19
['Hiring' 'AI' 'Career']
Decision for Alexander Pennington: {'comment': 'None of the available topics directly relate to Biology. However, as a new topic suggestion close to Biology, you may be interested in the recent post on AI which can sometimes intersect with biological sciences through topics like artificial intelligence in healthcare or bioinformatics.', 'available_or_suggested_topics': ['AI']}

--- Simulated Email ---
From: no-reply@agentic.local
To: normancarmen@example.org
Subject: 📰 Curated Articles for You: AI
Body:
Hi Alexander Pennington,

Based on your interests, here are some new articles you might enjoy:

- AI agents are being deployed in customer service, finance, and logistics
The blog post discusses autonomous systems and their ability to manage multi-step tasks with growing dependability.
Link: https://ford.net/

Best,
Your Content Agent

Decision for Melissa Villanueva: {'comment': "None o

## Alternative: Modular structure

In [20]:
def agent_decide(user, available_topics):
    return decide_action(user["interests"], available_topics)


In [21]:
def agent_plan(topics, posts_df, memory):
    summaries = []
    post_ids_to_mark = []
    
    for topic in topics:
        topic_posts = posts_df[posts_df["topic"] == topic]
        for _, post in topic_posts.iterrows():
            if post["post_id"] in memory.get("sent_posts", []):
                continue
            
            summary = summarize_text(post["content"])
            summaries.append(f"- {post['title']}\n{summary}\nLink: {post['url']}")
            post_ids_to_mark.append(post["post_id"])
    
    return summaries, post_ids_to_mark


In [22]:
def agent_act(user, topics, summaries):
    if not summaries:
        return

    subject = f"📰 Curated Articles for You: {', '.join(topics)}"
    summary_block = "\n\n".join(summaries)
    body = f"Hi {user['name']},\n\nBased on your interests, here are some new articles you might enjoy:\n\n{summary_block}\n\nBest,\nYour Content Agent"
    send_email(user["email"], subject, body)


In [35]:
def summarize_and_email(posts_df, users_df, memory, topics, N=15):
    selected_posts_df = get_posts(posts_df, N=N)
    selected_posts_df["topic"] = selected_posts_df["title"].apply(lambda x: classify_topic(x, topics))
    available_topics = selected_posts_df["topic"].unique()

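    # Collect IDs first: updating memory inside the loop would make agent_plan
    # skip these posts for every later user in the same run.
    sent_post_ids = []

    for _, user in users_df.iterrows():
        decision = agent_decide(user, available_topics)
        suggested_topics = decision.get("available_or_suggested_topics", [])
        if not suggested_topics:
            print(f"Agent skipped user {user['name']}")
            continue

        summaries, post_ids = agent_plan(suggested_topics, selected_posts_df, memory)
        agent_act(user, suggested_topics, summaries)
        sent_post_ids.extend(post_ids)

    # Record each post once, after all users have been processed
    sent_posts = memory.setdefault("sent_posts", [])
    for post_id in sent_post_ids:
        if post_id not in sent_posts:
            sent_posts.append(post_id)

    save_memory(memory)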


In [33]:
summarize_and_email(posts_df, users_df, memory, topics, N=5)


--- Simulated Email ---
From: no-reply@agentic.local
To: fergusonvanessa@example.org
Subject: 📰 Curated Articles for You: Leadership
Body:
Hi Kelly Kent,

Based on your interests, here are some new articles you might enjoy:

- Hiring diverse teams drives innovation
The blog post discusses how diverse perspectives contribute to improved problem-solving and market relevance in global products. It highlights the importance of incorporating a variety of viewpoints to enhance the overall quality and effectiveness of products on a global scale.
Link: https://www.adams.info/

Best,
Your Content Agent


--- Simulated Email ---
From: no-reply@agentic.local
To: matthewgibbs@example.net
Subject: 📰 Curated Articles for You: Leadership, Remote Work
Body:
Hi Steven Mclaughlin,

Based on your interests, here are some new articles you might enjoy:

- Effective remote work strategies hinge on clear communication and trust
The blog post discusses how tools like async updates and regular check-ins can h

# LangChain Implementation

In [26]:
import os
# Reuse the key loaded from .env earlier; openai.api_key may be None with the v1 client
os.environ["OPENAI_API_KEY"] = api_key

In [30]:
from langchain.agents import initialize_agent, AgentType
from langchain.memory import ConversationBufferMemory
from langchain.tools import tool
from langchain_openai import ChatOpenAI
from langchain_experimental.tools import PythonREPLTool

global_posts_df = posts_df
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# Define tools using LangChain's Tool interface
@tool
def get_posts_tool(N: int = 15) -> str:
    """Get N random blog posts."""
    N = int(N)
    sampled_df = get_posts(global_posts_df, N)
    return sampled_df.to_csv(index=False)


@tool
def classify_topic_tool(title: str) -> str:
    """Classify blog title into a predefined topic."""
    return classify_topic(title, topics)

@tool
def summarize_text_tool(text: str) -> str:
    """Summarize blog post content."""
    return summarize_text(text)

@tool
def send_email_tool(email: str, subject: str, body: str) -> str:
    """Send a simulated email."""
    send_email(email, subject, body)
    return f"Email sent to {email}."

# Wrap tools in LangChain Tool format
tools = [
    get_posts_tool,
    classify_topic_tool,
    summarize_text_tool,
    send_email_tool,
    PythonREPLTool()
]


# Initialize the agent
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.OPENAI_FUNCTIONS,
    memory=memory,
    verbose=True
)

# Example call to agent with a simple task
agent.run("Send me email with summaries of articles with topic 'ML'. My email is hihello@blogs.com")



> Entering new AgentExecutor chain...

Invoking: `get_posts_tool` with `{}`


post_id,title,content,url,date
post_17,Machine learning fairness requires testing for demographic parity and representation,Responsible ML teams include evaluation metrics beyond accuracy.,http://holmes.biz/,2025-07-15
post_25,Hiring for potential—not pedigree—is a trend,Assessments and structured interviews outperform CVs when predicting on-the-job success.,http://donaldson.biz/,2025-07-20
post_22,Remote-first companies are leading with flexibility,"They invest in outcomes, not hours, and prioritize written documentation and async workflows.",https://www.sparks.com/,2025-07-11
post_27,Leadership storytelling creates alignment,Clear narratives help teams understand why decisions are made and where the company is headed.,http://rhodes-thompson.com/,2025-07-08
post_12,Biological systems inspire AI architectures,"Neural networks borrow from brain-like structures, and evolu

'I have sent an email to hihello@blogs.com with summaries of articles on the topic of Machine Learning. The email includes brief summaries of three articles related to Machine Learning.'

In [29]:
# response = agent.run("Get me 3 posts")
print(memory.buffer_as_messages)

[HumanMessage(content="Email me summaries of articles with topic 'ML'. My email is hihello@blogs.com"), AIMessage(content='I have emailed you a summary of the article on hiring. If you would like summaries of more articles on the topic of Machine Learning, please let me know!')]
