# Pretrain Research Intern Trial Notebook

Thank you so much for taking the time to go through this notebook! We are looking for an ambitious, creative and intelligent person to join our team as a research intern.

Your main tasks in the job will include:
- Creating evals for creative writing use cases such as tweet or email writing
- Testing different optimizers to figure out what prompts and models give the best outputs
- Logging each of your experiments and extracting insights from them
- Generalizing the outcomes of your experiments into actionable insights for the company
- Running experiments such as changing automated prompt engineering (DSPy) hyperparameters, fine-tuning, splitting tasks into multiple steps, RAG, and varying the input or output variables. Your creativity is most valued here: what else can we do to improve the quality of LLM outputs?
- Reading papers and staying up to date with the latest models and LLM techniques
- Writing daily reports on your experiments and new insights (including things you tried that didn't lead to improvements, so we know what to avoid in the future)

Not every use case is possible with today's AI. We do not have AGI yet. But we can figure out what is possible today and prepare for a future where models keep getting better. Over time, more and more use cases will become available to us.

At Pretrain we help our users train AI workflows for their specific use case. They don't need any code or prompts or even AI knowledge. All they need is some examples of what they want the AI to do. It is working well for several use cases already, and your job is to help us expand the set of possible use cases.

The ideal person for this job is curious and comfortable with uncertain outcomes. Nobody knows whether a single experiment is going to work. There will be periods where it's hard to come by new improvements, but we have to keep trying.

If we like working together and you show great growth throughout your time at the company, you will be eligible to become an AI lead, which comes with a substantial salary increase over the intern position.

Now, let's get started with the task:

# Training A Personal Tweet Writer

You are given a dataset of 300 business-related tweets.

We have created a baseline AI that creates new tweets based on the examples from the dataset.

Your task is to:
1) Create an evaluation metric that measures the quality of the tweets.
2) Create an AI system that produces better outputs according to your evaluation metric.

Your AI system should show a clear improvement compared to the baseline. 
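
One possible starting point for step 1 is a cheap rule-based proxy metric. The sketch below is only an illustration: the function name `heuristic_tweet_score` and every check in it (the 280-character limit, the hashtag cap, the no-links rule, the no-wrapping-quotes rule) are assumptions made for this example, not Pretrain's metric. A proxy like this would still need to be checked against human judgement before you rely on it.

```python
import re


def heuristic_tweet_score(tweet: str, max_chars: int = 280) -> float:
    """Return a score in [0, 1]; higher is better.

    Every check here is an illustrative assumption, not a validated
    definition of tweet quality.
    """
    text = tweet.strip()
    if not text:
        return 0.0

    checks = [
        # Fits within a single tweet.
        len(text) <= max_chars,
        # Not wrapped in quotation marks (a common LLM artifact).
        not (text.startswith('"') and text.endswith('"')),
        # Uses at most two hashtags.
        len(re.findall(r"#\w+", text)) <= 2,
        # Contains no links.
        "http" not in text.lower(),
    ]
    # Fraction of checks that passed.
    return sum(checks) / len(checks)


if __name__ == "__main__":
    print(heuristic_tweet_score("Work hard. Stay patient."))
    print(heuristic_tweet_score('"a tweet wrapped in quotes"'))
```

Rule-based checks are fast and deterministic, which makes them useful as a sanity filter, but they cannot capture voice or insight. An LLM-as-judge or a human preference comparison would be natural candidates to combine with them.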

When you have finished your task, please record a short video detailing how you approached this task:
- What did you try? Why did you try those things?
- What ended up working and what didn't?
- How did you choose your evaluation metric?
- What % improvement did you get?
- What are the strengths and weaknesses of your approach?
- What ideas do you have for improving the outputs further if you had more time and resources?
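
For the "% improvement" question above, one simple convention is the relative change in mean metric score between the baseline and your system. The sketch below assumes that convention; the helper name `percent_improvement` is hypothetical, and other definitions (for example, a win rate in pairwise comparisons) are equally reasonable.

```python
from statistics import mean
from typing import Sequence


def percent_improvement(
    baseline_scores: Sequence[float], system_scores: Sequence[float]
) -> float:
    """Relative change in mean score, in percent (assumed convention)."""
    base = mean(baseline_scores)
    new = mean(system_scores)
    if base == 0:
        # Relative improvement is undefined when the baseline mean is zero.
        raise ValueError("Baseline mean is zero; report the absolute difference instead.")
    return (new - base) / base * 100.0


if __name__ == "__main__":
    print(percent_improvement([0.5, 0.5], [0.75, 0.75]))
```

Reporting the number of samples behind each mean alongside the percentage helps the reader judge how much of the difference could be noise.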

A key principle of our company to keep in mind for this task:
Scaling human judgement is at the core of Pretrain. Our AI outputs are only as good as the customer says they are.

# Resources
Resources to get you started:

DSPy:
https://x.com/lateinteraction/status/1777098439832293593
https://github.com/stanfordnlp/dspy/blob/main/examples/tweets/tweet_metric.py

Alternatively, AdalFlow:
https://adalflow.sylph.ai/tutorials/evaluation.html

Any out-of-the-box thinking is also appreciated!

Just like in the real job, there aren't really any rules for what frameworks or strategies you should use. Whatever you can do to improve LLM outputs is fair game.

# Data

In [2]:
import json

# Read one JSON object per line, skipping blank lines
with open('./data/hormozi_tweets.jsonl', 'r', encoding='utf-8') as f:
    data = [json.loads(line) for line in f if line.strip()]

# Keep the first 300 tweets
tweets = [item['tweet'] for item in data][:300]

# Baseline Tweet Generator

In [3]:
import random
import openai
import os
from dotenv import load_dotenv

load_dotenv()

random_tweets = random.sample(tweets, 10)

prompt = f"""
Past Tweets:
{random_tweets}

Create a new tweet based on the past tweets.
"""

openai.api_key = os.environ.get('OPENAI_API_KEY')

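def generate_tweet(prompt):
    """Ask the model for one new tweet, given a prompt that contains past tweets."""
    response = openai.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that creates tweets similar to past tweets from the user."},
            {"role": "user", "content": prompt}
        ],
        # max_tokens caps tokens, not characters, so the output can still exceed 280 characters
        max_tokens=280,
    )
    # Strip surrounding whitespace and lowercase the generated text
    return response.choices[0].message.content.strip().lower()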

# Create a new tweet
new_tweet = generate_tweet(prompt)

In [4]:
new_tweet

'"success isn\'t about never hitting a low point; it\'s about recognizing that those moments are opportunities to push through while others quit. remember, every setback is just a setup for your comeback. 💪"'
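
A single generated tweet says little about average quality, since both the sampled examples and the model output vary from run to run. One way to get a steadier baseline number is to average a metric over several generations. The sketch below is a minimal version of that idea: `evaluate_generator`, its parameters, and the prompt layout are assumptions for illustration. It accepts any function with the same signature as `generate_tweet(prompt)` and any metric that maps a tweet to a number.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def evaluate_generator(
    generate: Callable[[str], str],
    metric: Callable[[str], float],
    examples: Sequence[str],
    n_runs: int = 20,
    k_examples: int = 10,
    seed: int = 0,
) -> float:
    """Average `metric` over `n_runs` generations (illustrative sketch)."""
    if k_examples > len(examples):
        raise ValueError("k_examples cannot exceed the number of examples")

    rng = random.Random(seed)  # fixed seed so example sampling is repeatable
    scores = []
    for _ in range(n_runs):
        # Draw a fresh set of example tweets for each run.
        sample = rng.sample(list(examples), k_examples)
        prompt = (
            "Past Tweets:\n"
            + "\n".join(f"- {t}" for t in sample)
            + "\n\nCreate a new tweet based on the past tweets."
        )
        scores.append(metric(generate(prompt)))
    return mean(scores)


if __name__ == "__main__":
    # Stub generator and metric so the sketch runs without an API key.
    fake_examples = [f"example tweet {i}" for i in range(30)]
    fits = lambda t: 1.0 if len(t) <= 280 else 0.0
    print(evaluate_generator(lambda p: "a short tweet", fits, fake_examples, n_runs=5))
```

Running the same function on the baseline and on your improved system, with the same seed and metric, keeps the comparison like for like. Seeding only fixes which examples are sampled; the model's own output remains non-deterministic.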