## Visual e-reader: basic functionality

This notebook shows the core design of the "visual e-reader" application, which visualizes selected passages of a story.

Given a story and one fragment of that story to visualize, an LLM generates a scene description for the fragment. The scene description then serves as the prompt for a text-to-image model, which renders the fragment as an image.
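
The two-stage flow described above can be sketched as a single function. This is a minimal sketch rather than the application's actual implementation: `visualize_fragment`, `describe_scene`, and `render_image` are hypothetical names, and the two model calls are passed in as plain callables (replaced here by stand-ins) so the flow can be run without network access.

```python
from typing import Callable


def visualize_fragment(
    story: str,
    fragment: str,
    prompt_template: str,
    describe_scene: Callable[[str], str],
    render_image: Callable[[str], str],
) -> tuple[str, str]:
    """Return (scene_description, image_reference) for one story fragment.

    Hypothetical helper: describe_scene wraps the LLM call and
    render_image wraps the text-to-image call.
    """
    if fragment not in story:
        raise ValueError("fragment must be a passage taken from the story")
    # Stage 1: the LLM reads the fragment in the context of the story
    # and produces a scene description.
    llm_prompt = prompt_template.format(story=story, fragment=fragment)
    scene_description = describe_scene(llm_prompt)
    # Stage 2: the scene description becomes the text-to-image prompt.
    image_ref = render_image(scene_description)
    return scene_description, image_ref


# Stand-in callables so the sketch runs offline.
def fake_llm(prompt: str) -> str:
    return "A woman kneels on a tiled bathroom floor, holding up a sapphire ring."


def fake_t2i(prompt: str) -> str:
    return f"image-for: {prompt}"


demo_story = "Lisa lost her ring. She found it on the bathroom floor!"
demo_scene, demo_image = visualize_fragment(
    demo_story,
    "She found it on the bathroom floor!",
    "Story: {story}\nFragment: {fragment}\nDescribe the scene:",
    fake_llm,
    fake_t2i,
)
print(demo_scene)
print(demo_image)
```

Passing the model calls in as arguments is only one possible design; it keeps the flow testable and makes it easy to swap either model.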

In [None]:
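# Load the LLM prompt template; it is filled in with the story and fragment below.
with open("llm_prompt.txt", "r", encoding="utf-8") as f:
    llm_prompt_template = f.read()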
print(llm_prompt_template)

In [None]:
story = ("Lisa has a beautiful sapphire ring. "
"She always takes it off to wash her hands. "
"One afternoon, she noticed it was missing from her finger! "
"Lisa searched everywhere she had been that day. "
"She was elated when she found it on the bathroom floor!")

In [None]:
llm_prompt = llm_prompt_template.format(
    story=story,
    fragment="She was elated when she found it on the bathroom floor!")
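
The contents of `llm_prompt.txt` are not reproduced in this notebook. As an illustration only, any template compatible with the `format` call above needs `{story}` and `{fragment}` placeholders. The wording below is invented for this sketch and may differ from the real file.

```python
# Hypothetical stand-in for llm_prompt.txt; the real file's wording may differ.
example_template = (
    "Here is a story:\n{story}\n\n"
    "Describe, in one or two sentences, the scene depicted by this fragment:\n"
    "{fragment}\n"
)

example_prompt = example_template.format(
    story="Lisa has a beautiful sapphire ring. She found it on the bathroom floor!",
    fragment="She found it on the bathroom floor!",
)
print(example_prompt)
```

Because `str.format` treats single braces as placeholders, any literal braces in such a template would have to be doubled (`{{` and `}}`).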

In [None]:
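import os

import replicate

# The API token is read from the REPLICATE_API_KEY environment variable.
client = replicate.client.Client(api_token=os.environ["REPLICATE_API_KEY"])
# The LLM's output arrives as a sequence of text chunks; join them into one string.
scene_description = "".join(client.run(
    "meta/meta-llama-3.1-405b-instruct",
    input={"prompt": llm_prompt}))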
print(scene_description)
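
The `"".join(...)` call above suggests that the language model's output arrives as an iterable of text chunks rather than a single string. A minimal offline sketch of that pattern follows; `fake_stream` is a hypothetical stand-in for the model call, and the trailing `.strip()` is an optional addition not present in the cell above.

```python
from typing import Iterator


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a model call that yields text in chunks."""
    yield "A woman kneels "
    yield "on a tiled floor, "
    yield "smiling at a ring."


# Concatenate the chunks in order, then trim surrounding whitespace.
joined = "".join(fake_stream()).strip()
print(joined)
```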

In [None]:
from IPython.display import Image

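# Use the scene description as the text-to-image prompt.
url = client.run("black-forest-labs/flux-1.1-pro",
                 input={"prompt": scene_description,
                        "output_format": "png"})
# Depending on the replicate client version, run() may return a URL string
# or a file output object; str() yields the URL in either case.
Image(url=str(url), width=300)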