# Creative Text Generation with a Large Language Model

In this assignment, we will use `microsoft/Phi-3-mini-4k-instruct` LLM model to generate story based on user prompt. It covers the process of setting up the environment, loading a pre-trained model and tokenizer, and using a high-level pipeline for efficient inference.


## Environment Setup

### Install required libraries

In [1]:
!pip install transformers>=4.40.1 accelerate>=0.27.2

### Change runtime type to use GPU

### Model and Tokenization Setup

Load pretrained `microsoft/Phi-3-mini-4k-instruct` model and it's corresponding tokenizer.

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",
    device_map="cuda",
    torch_dtype="auto",
    trust_remote_code=False,
)

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")


Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/2.67G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.97G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/181 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/306 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/599 [00:00<?, ?B/s]

### Generating a pipeline

Define a pipeline utility which handle process of tokenizing the input, to decoding the output.

In [4]:
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    return_full_text=False,
    max_new_tokens=500,
    do_sample=False
)

Device set to use cuda
The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


### Define the prompt and execute.

In [5]:
messages = [
    {
        "role": "user",
        "content": "Write a very short story on 'My first Hello World program."
    }
]

output = generator(messages)


In [6]:
output[0]["generated_text"]

' Once upon a time, in the land of Codeville, there was a young coder named Alex. Alex had just begun their journey into the world of programming and was eager to learn the first step. They had heard of a magical phrase that every programmer must utter before they could create anything truly wondrous. This phrase was "Hello, World!"\n\nOne sunny afternoon, Alex sat down at their computer, a shiny new machine that hummed with potential. They opened their trusty text editor, a place where they could weave their code into existence. With a deep breath, Alex typed out the phrase, carefully following the instructions they had read in a book about programming.\n\n"print(\'Hello, World!\')"\n\nAlex hit the run button, and the computer whirred to life. A small window popped up, and there it was, the first message Alex had ever seen on their screen:\n\n"Hello, World!"\n\nAlex\'s heart swelled with pride. They had done it! They had spoken their first words to the computer, and it had responded. 