# Generate Synthetic Taylor Swift Lyrics with Gretel GPT

* This notebook demonstrates how to use Gretel GPT to generate synthetic Taylor Swift lyrics.
* To run this notebook, you will need an API key from the [Gretel Console](https://console.gretel.ai/).

## Getting Started

In [None]:
%%capture
!pip install -U gretel-client

In [None]:
import pandas as pd

from gretel_client import configure_session
from gretel_client.helpers import poll
from gretel_client.projects import create_or_get_unique_project, get_project

In [None]:
# Log in to Gretel (api_key="prompt" asks for your API key interactively)
configure_session(api_key="prompt", cache="yes", endpoint="https://api.gretel.cloud", validate=True, clear=True)

pd.set_option('max_colwidth', None)

## Load and preview training data

In [None]:
# Specify a dataset to train on 
DATASET_PATH = 'https://gretel-public-website.s3.us-west-2.amazonaws.com/datasets/taylor_swift_lyrics/TaylorSwiftLyrics.csv' 
df = pd.read_csv(DATASET_PATH, usecols=['text'])

# Preview the first training record
print(df['text'][0])
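The model trains on the single `text` column loaded above, so it can help to drop missing or whitespace-only rows and glance at record lengths before training. Below is a minimal sketch using a hypothetical helper, `prepare_text_column`, applied to a small made-up DataFrame standing in for the real CSV. It is not part of the Gretel client.

```python
import pandas as pd

def prepare_text_column(df: pd.DataFrame, column: str = "text") -> pd.DataFrame:
    """Hypothetical helper: keep only `column` and drop missing or blank rows."""
    texts = df[column].dropna().astype(str).str.strip()
    # Whitespace-only rows become "" after strip(), so filter those out too
    texts = texts[texts != ""]
    return texts.reset_index(drop=True).to_frame(name=column)

# Toy stand-in for the real dataset (invented rows, not from the CSV)
sample = pd.DataFrame({"text": ["First verse line\nSecond line", None, "   ", "Chorus line"]})
clean = prepare_text_column(sample)

# Word count per remaining record
word_counts = clean["text"].str.split().str.len().tolist()
print(len(clean), word_counts)
```

On the real data you would call `prepare_text_column(df)` before passing the result to `create_model_obj`.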

## Create the model configuration

In this notebook we use GPT-Neo, EleutherAI's open-source replication of OpenAI's GPT-3 architecture, specifically the 125M-parameter `EleutherAI/gpt-neo-125M` checkpoint. The model was pre-trained on the Pile, a large-scale curated text dataset, for 300 billion tokens over 572,300 steps. Here we fine-tune it to generate synthetic Taylor Swift lyrics.
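To put the pre-training figures in perspective, dividing the token count by the step count gives the average number of tokens processed per optimizer step. This is plain arithmetic on the two numbers quoted above and says nothing about how the steps were actually batched.

```python
# Pre-training figures quoted above for GPT-Neo on the Pile
total_tokens = 300_000_000_000
total_steps = 572_300

# Average tokens consumed per optimizer step
tokens_per_step = total_tokens / total_steps
print(f"{tokens_per_step:,.0f} tokens per step")
```

That works out to roughly half a million tokens per step, far more than the few lyric records per step used when fine-tuning below.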

In [None]:
config = {
  "models": [
    {
      "gpt_x": {
        # Placeholder; the DataFrame passed to create_model_obj() below supplies the training data
        "data_source": "__",
        "pretrained_model": "EleutherAI/gpt-neo-125M",
        "batch_size": 4,
        "epochs": 3,
        "weight_decay": 0.01,
        "warmup_steps": 100,
        "lr_scheduler": "linear",
        "learning_rate": 0.0002,
        "validation": 5
      }
    }
  ]
}
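The `warmup_steps`, `lr_scheduler`, and `learning_rate` settings together define how the learning rate changes during fine-tuning. The sketch below assumes `"linear"` means the common warmup-then-linear-decay schedule (as in Hugging Face Transformers); whether Gretel implements it exactly this way is an assumption. The dataset size is hypothetical, and the step count assumes one optimizer step per batch with no gradient accumulation.

```python
import math

def linear_warmup_lr(step: int, base_lr: float, warmup_steps: int, total_steps: int) -> float:
    """Learning rate at `step` under linear warmup followed by linear decay to zero."""
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr over the warmup period
        return base_lr * step / max(1, warmup_steps)
    # After warmup, decay linearly from base_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Hypothetical dataset size; use len(df) for the real training set
num_records = 2000
batch_size = 4      # "batch_size" from the config
epochs = 3          # "epochs" from the config
steps_per_epoch = math.ceil(num_records / batch_size)
total_steps = steps_per_epoch * epochs  # 1500 under these assumptions

for step in (0, 50, 100, total_steps // 2, total_steps):
    print(step, linear_warmup_lr(step, 0.0002, 100, total_steps))
```

With these numbers the rate reaches the configured `0.0002` at step 100 and falls to zero by the final step. A larger `warmup_steps` spends more of a short fine-tuning run at low learning rates.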

## Train the synthetic model

In [None]:
# Create the project, or reuse it if it already exists
PROJECT = 'taylor-swift-lyrics'
project = create_or_get_unique_project(name=PROJECT)

# Create and submit model
model = project.create_model_obj(model_config=config, data_source=df)
model.name = f"{PROJECT}-gpt"
model.submit_cloud()

poll(model)
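`poll(model)` blocks until the cloud training job finishes. The sketch below is a generic version of that polling pattern with exponential backoff. It is not Gretel's implementation: `wait_for_job`, the status strings in `TERMINAL`, and the fake status source are all hypothetical, and `sleep` is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable

# Hypothetical terminal job states
TERMINAL = {"completed", "error", "cancelled", "lost"}

def wait_for_job(
    get_status: Callable[[], str],
    interval: float = 5.0,
    max_interval: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal state, doubling the delay each time."""
    delay = interval
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        sleep(delay)
        # Back off exponentially, capped at max_interval
        delay = min(delay * 2, max_interval)

# Usage with a fake job that finishes on the fourth check
statuses = iter(["created", "pending", "active", "completed"])
sleeps = []
final = wait_for_job(lambda: next(statuses), interval=1.0, sleep=sleeps.append)
print(final, sleeps)
```

Backing off keeps request volume low during long training runs while still reporting short jobs promptly.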

## Generate Lyrics

In [None]:
# Generation settings: cap on generated text length, nucleus-sampling threshold, number of records
params = {"maximum_text_length": 200, "top_p": 0.95, "num_records": 1}

record_handler = model.create_record_handler_obj(params=params)
record_handler.submit_cloud()
poll(record_handler)
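The `top_p` parameter conventionally refers to nucleus sampling: at each step the model samples only from the smallest set of most-likely tokens whose probabilities sum to at least `top_p`. That Gretel passes it to a standard nucleus sampler is an assumption. The sketch below shows the technique on a made-up next-token distribution, using `0.8` instead of the notebook's `0.95` so the cutoff is visible.

```python
import random

def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest high-probability set whose mass reaches top_p, then renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

def sample_top_p(probs: dict[str, float], top_p: float, rng: random.Random) -> str:
    """Draw one token from the renormalized nucleus."""
    nucleus = top_p_filter(probs, top_p)
    return rng.choices(list(nucleus), weights=list(nucleus.values()), k=1)[0]

# Invented next-token distribution
next_token_probs = {"love": 0.5, "story": 0.25, "red": 0.125, "tractor": 0.125}
nucleus = top_p_filter(next_token_probs, 0.8)
print(nucleus)
print(sample_top_p(next_token_probs, 0.8, random.Random(0)))
```

Lower `top_p` values give safer, more repetitive lyrics. Values near 1.0 let rarer tokens through and increase variety.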

In [None]:
# Download the generated records (gzip-compressed CSV) and print the first one
gpt_output = pd.read_csv(record_handler.get_artifact_link("data"), compression='gzip')
print(gpt_output['text'][0])
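Because generation stops at `maximum_text_length`, the last line of a record may be cut off mid-word. A hypothetical post-processing helper, `trim_to_complete_lines`, could drop that trailing fragment and any blank lines. It is a heuristic: it treats any output not ending in a newline as truncated, so it can also drop a complete final line. The sample text is invented, not model output.

```python
def trim_to_complete_lines(text: str) -> list[str]:
    """Hypothetical cleanup: drop a possibly truncated final line and blank lines."""
    lines = text.splitlines()
    if lines and not text.endswith("\n"):
        # Heuristic: without a trailing newline, assume the length limit cut the last line
        lines = lines[:-1]
    return [line.strip() for line in lines if line.strip()]

# Invented sample resembling truncated generated lyrics
generated = "Midnight on the porch again\n\nI kept the letter that you never sent\nAnd the sky turned gol"
print(trim_to_complete_lines(generated))
```

Applied to the notebook output, this would be `gpt_output['text'].apply(trim_to_complete_lines)`.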