# Building Your First Multimodal GenAI App

![Alt text](img/multimodal_app_1.png)

In this notebook, we'll explore a variety of state-of-the-art (SOTA) generative AI models and get a sense of how to stitch them together!

## Get our API Keys in our environment

- Create a [Groq](https://groq.com/) account and navigate [here to get your API key](https://console.groq.com/keys). They have a free tier with a bunch of LLMs (see screenshot below)!
- If you'd prefer to use OpenAI, you can do that and get [your API key here](https://platform.openai.com/api-keys).
- To use the models below as is, you'll need a [Replicate account](https://replicate.com/). If you're using this notebook in a workshop, chances are Hugo can provision free Replicate credits for you, so ask him if he hasn't mentioned it. If you're at ODSC APAC (August 2024), complete [this form](https://forms.gle/AcaY1dki6Gxpgd4y7) and Hugo will send you credits (they expire Aug 20).
- Many of these models [you can also find on HuggingFace](https://huggingface.co/models), if you'd prefer.

![Alt text](img/multimodal_app_2.png)

In [1]:
import getpass


# Prompt for the Replicate API key
replicate_api_key = getpass.getpass("Please enter your Replicate API key: ")
print("Replicate API key captured successfully!")

# Prompt for the Groq API key
groq_api_key = getpass.getpass("Please enter your Groq API key: ")
print("Groq API key captured successfully!")

# # Prompt for the OpenAI API key
# openai_api_key = getpass.getpass("Please enter your OpenAI API key: ")
# print("OpenAI API key captured successfully!")


Replicate API key captured successfully!
Groq API key captured successfully!
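
If you'd rather not paste your keys on every run, one option is to check an environment variable first and only fall back to the prompt. This is a minimal sketch: the helper name `load_api_key` is hypothetical, and the variable names `REPLICATE_API_TOKEN` and `GROQ_API_KEY` are assumed conventions, so adjust them to your own setup.

```python
import getpass
import os


def load_api_key(env_var, label):
    """Return the key from the environment if set, otherwise prompt for it."""
    key = os.environ.get(env_var)
    if key:
        return key
    # Fall back to an interactive prompt that does not echo the key
    return getpass.getpass(f"Please enter your {label} API key: ")


# Example usage (variable names are assumptions, adjust to your setup):
# replicate_api_key = load_api_key("REPLICATE_API_TOKEN", "Replicate")
# groq_api_key = load_api_key("GROQ_API_KEY", "Groq")
```

Either way, the key stays out of the notebook's saved source and output.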


## Suno Bark: text to audio

First up, we'll experiment with the [Suno Bark](https://github.com/suno-ai/bark) text to audio model:

In [2]:
import replicate

# Create a Replicate client instance with the API token
client = replicate.Client(api_token=replicate_api_key)

# Define the input parameters for the model
input_params = {
    "prompt": "Hello, my name is Hugo. And, uh — and I like pizza. [laughs] But I also have other interests such as playing chess. [chuckles]",
    "text_temp": 0.7,
    "output_full": False,
    "waveform_temp": 0.7,
    "history_prompt": "announcer"
}

# Run the model using Replicate API
try:
    output = client.run(
        "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
        input=input_params
    )
    print(output)
except Exception as e:
    print(f"Error: {e}")

{'audio_out': 'https://replicate.delivery/czjl/swKgub3nE0IALZpzMypqGmBfH3GZDl983qIi07fzwj4aKHSTA/audio.wav'}
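
The model hands back a URL rather than the audio itself, so to listen to it or keep it you need to fetch the file. Here's a minimal sketch using only the standard library; it assumes the output is a dict holding a plain URL string, as printed above, and the helper name `download_file` is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_file(url, dest):
    """Stream the resource at `url` to the local path `dest` and return the path."""
    dest = Path(dest)
    with urllib.request.urlopen(url) as response, dest.open("wb") as f:
        shutil.copyfileobj(response, f)
    return dest


# Example usage in a notebook (assumes `output` is the dict printed above):
# path = download_file(output["audio_out"], "bark_sample.wav")
# from IPython.display import Audio
# Audio(str(path))
```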


### LLM output --> Suno Bark

But what if we want to pipe the output of an LLM into Bark?

In [3]:
from groq import Groq

def get_llm_response(user_input):
    client = Groq(
        api_key=groq_api_key)

    response = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": user_input,
            }
        ],
        model="llama3-8b-8192",
    )

    return response.choices[0].message.content

# from openai import OpenAI
# import os


# def get_llm_response(user_input):
#     client = OpenAI(api_key=openai_api_key)
    
#     response = client.chat.completions.create(
#         model="gpt-3.5-turbo-0613",
#         messages=[
#     {"role": "user", "content": user_input}
#   ]
#         )
#     return response.choices[0].message.content
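
The `messages` list can hold more than one entry, which gives us a way to steer the LLM toward output that suits a text to audio model. This is a minimal sketch: the helper `build_messages` and the wording of the system prompt are hypothetical, and how closely a given model follows a system message varies.

```python
def build_messages(user_input, system_prompt=None):
    """Build a chat `messages` list, optionally led by a system message."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_input})
    return messages


# Hypothetical system prompt asking for output short enough to read aloud
BRIEF = "Reply with at most four short lines and no section labels."

messages = build_messages("a short pirates sea shanty", system_prompt=BRIEF)
print(messages)
```

The resulting list could be passed as the `messages` argument of `client.chat.completions.create` in place of the single user message.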

In [4]:
song = get_llm_response("a short pirates sea shanty")
print(song)

Arrrr, here's a short pirate sea shanty for ye:

(Verse 1)
Oh, we set sail on the Black Swan's tide
Bound for Spain, where the treasure's hide
Our crew's as salty as the sea we sail
We'll plunder and pillage, without fail

(Chorus)
Heave ho, me hearties, let the anchor go
In the Caribbean, we'll find our gold, yo
Heave ho, me hearties, the winds they do blow
We'll sing and we'll fight, until our victory show

(Verse 2)
Our captain's beard is long and gray
He's fought in battles, night and day
Our bosun's got a hook for a hand
He'll make ye walk the plank, if ye don't stand

(Chorus)
Heave ho, me hearties, let the anchor go
In the Caribbean, we'll find our gold, yo
Heave ho, me hearties, the winds they do blow
We'll sing and we'll fight, until our victory show

(Bridge)
So raise yer tankards, me hearties all
And toast to the sea, and the courage we'll call
For we be pirates, bold and true
And our legend will live, forever anew

(Chorus)
Heave ho, me hearties, let the anchor go
In the Ca

In [5]:
# Define the input parameters for the model
input_params = {
    "prompt": song,
    "text_temp": 0.7,
    "output_full": False,
    "waveform_temp": 0.7,
    "history_prompt": "announcer",
   # "duration": 30
}

# Run the model using Replicate API
try:
    output = client.run(
        "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
        input=input_params
    )
    print(output)
except Exception as e:
    print(f"Error: {e}")

{'audio_out': 'https://replicate.delivery/czjl/GTHvkAE45ELrIl2But8AJwTveBpeejReSLDXJpUB6gY3scINB/audio.wav'}


The audio this produces is totally bent and makes no sense ☝️
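
One plausible reason, offered as an assumption rather than a diagnosis: Bark is geared toward short utterances, and the shanty is long and full of section labels like `(Verse 1)`. A workaround worth trying is to split the text into short chunks and send each chunk to Bark separately. In the sketch below, the helper name `split_for_tts` and the 200-character default are hypothetical choices, not Bark parameters.

```python
import re


def split_for_tts(text, max_chars=200):
    """Group lines into chunks of roughly max_chars, dropping section labels.

    A single line longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and labels such as "(Verse 1)" or "(Chorus)"
        if not line or re.fullmatch(r"\(.*\)", line):
            continue
        candidate = f"{current} {line}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks


sample = "(Verse 1)\nOh, we set sail\nBound for Spain\n\n(Chorus)\nHeave ho"
print(split_for_tts(sample, max_chars=30))
```

Each chunk could then be used as the `"prompt"` in its own Bark call, with the resulting audio files played back in order.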

## Text to music with Meta MusicGen

What if we wanted to create some music from text? Let's try MusicGen from Meta.

In [6]:
input = {
    "prompt": "Horns and Drums. Edo25 major g melodies that sound triumphant and cinematic. Leading up to a crescendo that resolves in a 9th harmonic",
    "model_version": "stereo-large",
    "output_format": "mp3",
    "normalization_strategy": "peak"
}

output = client.run(
    "meta/musicgen:671ac645ce5e552cc63a54a2bbff63fcf798043055d2dac5fc9e36a837eedcfb",
    input=input
)
print(output)
#=> "https://replicate.delivery/pbxt/OeLYIQiltdzMaCex1shlEFy6...

https://replicate.delivery/yhqm/ToWKm0N4lUqFLh8N8nh9k4fhzyz8veKeHe0Lkrf2wxlFc5QaC/out.mp3


In [7]:
input = {
    "prompt": "Ancient Trip Hop with Throat Singing",
    "model_version": "stereo-large",
    "output_format": "mp3",
    "normalization_strategy": "peak",
    "duration": 30 
}

output = client.run(
    "meta/musicgen:671ac645ce5e552cc63a54a2bbff63fcf798043055d2dac5fc9e36a837eedcfb",
    input=input
)
print(output)
#=> "https://replicate.delivery/pbxt/OeLYIQiltdzMaCex1shlEFy6...

https://replicate.delivery/yhqm/f3gL4sACCDx5RaFu8MF5wsnhz8vqWROjfTcBw5PscPBqMHSTA/out.mp3


## Text to music with Riffusion

There are lots of other models to experiment with, such as Riffusion:

In [8]:
output = client.run(
    "riffusion/riffusion:8cf61ea6c56afd61d8f5b9ffd14d7c216c0a93844ce2d82ac1c9ecc9c7f24e05",
    input={
        "alpha": 0.5,
        "prompt_a": "West African Desert Blues",
        "prompt_b": "Throat Singing",
        "denoising": 0.75,
        "seed_image_id": "vibes",
        "num_inference_steps": 50
    }
)
print(output)

{'audio': 'https://replicate.delivery/czjl/G2OwfkftGzomoETExQylu2tXMFvmPWtMqfbYYJ47BPvnZOkmA/gen_sound.wav', 'spectrogram': 'https://replicate.delivery/czjl/pOiQNgILoHZ4G9e40ejnBefju9W3RBbBHvSIWdBT4BcMzcINB/spectrogram.jpg'}
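
The call above takes two prompts plus an `alpha` value. Assuming `alpha` controls the blend between `prompt_a` and `prompt_b` (a reading of the parameter names that this notebook does not verify), a small sweep is a quick way to hear what it does. The helper `riffusion_sweep` is hypothetical; it only builds the input dicts and calls nothing.

```python
def riffusion_sweep(prompt_a, prompt_b, alphas=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return one input dict per alpha value, keeping the other settings fixed."""
    return [
        {
            "alpha": alpha,
            "prompt_a": prompt_a,
            "prompt_b": prompt_b,
            "denoising": 0.75,
            "seed_image_id": "vibes",
            "num_inference_steps": 50,
        }
        for alpha in alphas
    ]


inputs = riffusion_sweep("West African Desert Blues", "Throat Singing")
for params in inputs:
    print(params["alpha"], params["prompt_a"], "->", params["prompt_b"])
    # To actually run each one (this uses credits), pass `params` as `input`
    # to client.run with the Riffusion model identifier from the cell above.
```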


___

## Experiment: One prompt to many models

Now what if we wanted to use a single prompt to create text, audio, images, and video?

In [9]:
message = "The Waffle House is really messing up the pancakes and bacon tonight HOLY MOLEY and there's anarchist jazz also!"

### text to image

In [10]:
input = {
    "prompt": message
}

output = client.run(
    "fofr/epicrealismxl-lightning-hades:0ca10b1fd361c1c5568720736411eaa89d9684415eb61fd36875b4d3c20f605a",
    input=input
)
print(output)
#=> ["https://replicate.delivery/pbxt/ulYZRIyAUDYpOZfl7OjhrKx...

['https://replicate.delivery/pbxt/2C3d46eff6ITXIJf3QCf0KteMMFiYUmtdzAefNc9zC4V1MHSTA/R8__00001_.webp']


### text to audio

In [11]:
# Define the input parameters for the model
input_params = {
    "prompt": message,
    "text_temp": 0.7,
    "output_full": False,
    "waveform_temp": 0.7,
    "history_prompt": "announcer",
   # "duration": 30
}

# Run the model using Replicate API
try:
    output = client.run(
        "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
        input=input_params
    )
    print(output)
except Exception as e:
    print(f"Error: {e}")

{'audio_out': 'https://replicate.delivery/czjl/bUffZN42SoioX0S9lVV9s4hKH3fL6sx4w1Dge1qGYvtk1cINB/audio.wav'}


### text to music

In [12]:
input = {
    "prompt": message,
    "model_version": "stereo-large",
    "output_format": "mp3",
    "normalization_strategy": "peak",
    "duration": 30 
}

output = client.run(
    "meta/musicgen:671ac645ce5e552cc63a54a2bbff63fcf798043055d2dac5fc9e36a837eedcfb",
    input=input
)
print(output)
#=> "https://replicate.delivery/pbxt/OeLYIQiltdzMaCex1shlEFy6...

https://replicate.delivery/yhqm/Du1yhfPBxM3XLS17tbC7x7yyOe2rOfsRZRsgGOR0bb7e5cINB/out.mp3


## Many models at once

Let's write some utility functions that use these models:

In [13]:
def generate_epic_realism(prompt, api_token):
    # Create a Replicate client instance with the API token passed in
    client = replicate.Client(api_token=api_token)

    # Define the input parameters for the model
    input_data = {
        "prompt": prompt
    }

    # Run the model using Replicate API
    output = client.run(
        "fofr/epicrealismxl-lightning-hades:0ca10b1fd361c1c5568720736411eaa89d9684415eb61fd36875b4d3c20f605a",
        input=input_data
    )

    return output


def generate_suno_bark(prompt, api_token, text_temp=0.7, output_full=False, waveform_temp=0.7, history_prompt="announcer"):
    # Create a Replicate client instance with the API token passed in
    client = replicate.Client(api_token=api_token)

    # Define the input parameters for the model
    input_params = {
        "prompt": prompt,
        "text_temp": text_temp,
        "output_full": output_full,
        "waveform_temp": waveform_temp,
        # Voice preset, e.g. "announcer" or "zh_speaker_7"
        "history_prompt": history_prompt,
    }

    # Run the model using Replicate API
    try:
        output = client.run(
            "suno-ai/bark:b76242b40d67c76ab6742e987628a2a9ac019e11d56ab96c4e91ce03b79b2787",
            input=input_params
        )
        return output
    except Exception as e:
        print(f"Error: {e}")
        return None


def generate_music_gen(prompt, api_token, duration=30, model_version="stereo-large", output_format="mp3", normalization_strategy="peak"):
    # Create a Replicate client instance with the API token passed in
    client = replicate.Client(api_token=api_token)

    # Define the input parameters for the model
    input_data = {
        "prompt": prompt,
        "model_version": model_version,
        "output_format": output_format,
        "normalization_strategy": normalization_strategy,
        "duration": duration
    }

    # Run the model using Replicate API
    output = client.run(
        "meta/musicgen:671ac645ce5e552cc63a54a2bbff63fcf798043055d2dac5fc9e36a837eedcfb",
        input=input_data
    )

    return output


Let's test them out:

In [14]:
message = "crazy wild zombie party at the blaring symphony orchestra"
output = generate_epic_realism(message, replicate_api_key)
print(output)

output = generate_suno_bark(message, replicate_api_key)
print(output)

output = generate_music_gen(message, replicate_api_key)
print(output)


['https://replicate.delivery/pbxt/qhe4MITqZGylCyKPeY8LHa86I6ep31HXV7AQh5z9oEPEdOkmA/R8__00001_.webp']
{'audio_out': 'https://replicate.delivery/czjl/u0QcO80BT9quEloJYuXzCKLTUaNloq9qrLpm4WP7fplanDpJA/audio.wav'}
https://replicate.delivery/yhqm/0Jyd5dzh5qJWMZfx8GMnsDkrKwoy95zo9abqp4oSoedcQHSTA/out.mp3
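
Notice that the three models return three different shapes: a list, a dict, and a bare string. If you want to treat them uniformly (say, to download everything), a small normalizer helps. This is a sketch with a hypothetical helper name, `collect_urls`, and it assumes the outputs are plain strings or containers of strings, as printed above.

```python
def collect_urls(output):
    """Flatten a model output (str, list, dict, or None) into a list of URL strings."""
    if output is None:
        return []
    if isinstance(output, str):
        return [output]
    if isinstance(output, dict):
        values = output.values()
    elif isinstance(output, (list, tuple)):
        values = output
    else:
        # Unknown type (assumption: its string form is the URL)
        return [str(output)]
    urls = []
    for value in values:
        urls.extend(collect_urls(value))
    return urls


# Same shapes as the outputs printed above, with placeholder URLs
print(collect_urls(["https://example.com/image.webp"]))
print(collect_urls({"audio_out": "https://example.com/audio.wav"}))
print(collect_urls("https://example.com/out.mp3"))
```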


In [15]:
# Define the prompt message (the Replicate API key was captured at the top of the notebook)
message = "The Waffle House messing it up for real with the pancakes and bacon and punk abstract jazz, yo!"

# Run the Epic Realism model
epicrealism_output = generate_epic_realism(message, replicate_api_key)
print("Epic Realism Output:")
print(epicrealism_output)

# Run the Meta MusicGen model
musicgen_output = generate_music_gen(message, replicate_api_key)
print("Meta MusicGen Output:")
print(musicgen_output)

# Run the Suno Bark model
bark_output = generate_suno_bark(message, replicate_api_key)
print("Suno Bark Output:")
print(bark_output)



Epic Realism Output:
['https://replicate.delivery/pbxt/ap5vInR47d7VAhnVO3gzhPv3ltKwZNueDufEUBAxyLFegOkmA/R8__00001_.webp']
Meta MusicGen Output:
https://replicate.delivery/yhqm/RQYRBFXtyRr7ChSfn2rpkDwZrBwMgZVE9LyX48YXkF6zoDpJA/out.mp3
Suno Bark Output:
{'audio_out': 'https://replicate.delivery/czjl/X4tGcQZIn0rjENB0hDFr5nSlzorIcHRem5xzjdGfeXx7jOkmA/audio.wav'}
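
The three calls above run one after another, so the total wait is the sum of all three. Since each call mostly waits on the network, one option is to run them in threads. Below is a minimal sketch with a hypothetical helper, `run_all`; the stand-in lambdas are there so it runs without API calls, and whether threading actually speeds things up on your account is an assumption to test.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(generators, prompt):
    """Call each generator with `prompt` in its own thread; return {name: result}.

    A generator that raises is recorded as None so one failure does not lose the rest.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=len(generators) or 1) as pool:
        futures = {name: pool.submit(fn, prompt) for name, fn in generators.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception as e:
                print(f"{name} failed: {e}")
                results[name] = None
    return results


# Stand-in generators so the sketch runs without API calls.
# In the notebook these could be, for example:
#   lambda p: generate_epic_realism(p, replicate_api_key)
demo = {
    "image": lambda p: [f"image for {p}"],
    "music": lambda p: f"music for {p}",
}
demo_results = run_all(demo, "zombie party")
print(demo_results)
```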


### Experiment: text to video

In [16]:
message = "The Waffle House messing it up for real with the pancakes and bacon and punk abstract jazz, yo!"

input = {
    "sampler": "klms",
    "max_frames": 100,
    "animation_prompts": message
}

output = client.run(
    "deforum/deforum_stable_diffusion:e22e77495f2fb83c34d5fae2ad8ab63c0a87b6b573b6208e1535b23b89ea66d6",
    input=input
)
print(output)
#=> "https://replicate.delivery/mgxm/873a1cc7-0427-4e8d-ab3c-...

https://replicate.delivery/yhqm/ghKmg6PI2s7fBS1mkWwJe1a9jD5V1MpisDZ0hGB2sX12UHSTA/out.mp4
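
The cell above passes a single prompt for the whole clip. Deforum-style prompts are often written with keyframes, in the form `frame: prompt | frame: prompt`, so the scene changes over time. Assuming this model accepts that format (check the model's page on Replicate before relying on it), a small hypothetical helper, `keyframed_prompts`, can build the string:

```python
def keyframed_prompts(prompts_by_frame):
    """Join {frame: prompt} pairs into a 'frame: prompt | frame: prompt' string."""
    parts = [f"{frame}: {prompt}" for frame, prompt in sorted(prompts_by_frame.items())]
    return " | ".join(parts)


animation_prompts = keyframed_prompts({
    0: "The Waffle House at night, pancakes and bacon",
    50: "punk abstract jazz band playing inside the Waffle House",
})
print(animation_prompts)
```

The result could then be used as the `"animation_prompts"` value in the input dict.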
