<a href="https://colab.research.google.com/github/galiperbas/soundtracks/blob/main/MusicGen.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [15]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [31]:
import os
import pandas as pd
import librosa
import soundfile as sf
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import ipywidgets as widgets
from IPython.display import display, Audio
import numpy as np
import scipy
import time

In [16]:
import os

soundtracks_path = '/content/drive/MyDrive/soundtracks'
games_excel_path = '/content/drive/MyDrive/Games.xlsx'

# Check if the soundtracks folder exists and list its contents
if os.path.exists(soundtracks_path):
    print(f"Contents of {soundtracks_path}:")
    for item in os.listdir(soundtracks_path):
        print(item)
else:
    print(f"Folder not found: {soundtracks_path}")

# Check if the Games.xlsx file exists
if os.path.exists(games_excel_path):
    print(f"File found: {games_excel_path}")
else:
    print(f"File not found: {games_excel_path}")

Contents of /content/drive/MyDrive/soundtracks:
Star Wars Battlefront.mp3
The Talos Principle.mp3
Heroes & Generals.mp3
Spelunky.mp3
Gris.mp3
Dying Light.mp3
Vampyr.mp3
Valheim.mp3
Killer is Dead.mp3
Battlefield 3.mp3
Battlefield V.mp3
Rogue Legacy.mp3
Overlord II.mp3
Chaos on Deponia.mp3
Crysis 2.mp3
Star Wars Battlefront II.mp3
Tabletop Simulator Rarrr!!.mp3
Dust An Elysian Tail.mp3
Dirt Showdown.mp3
Crash Bandicoot N. Sane Trilogy.mp3
Assassin's Creed Revelations.mp3
Outer Wilds.mp3
Day of Defeat Source.mp3
Max Payne 2 The Fall of Max Payne.mp3
Medal of Honor Allied Assault - Spearhead.mp3
Sniper Elite 4.mp3
Persona 5 Strikers.mp3
Life is Strange 2.mp3
The LEGO Ninjago Movie Video Game.mp3
Red Faction Guerrilla.mp3
Pillars of Eternity.mp3
Valiant Hearts The Great War.mp3
Machinarium.mp3
Yakuza Kiwami.mp3
The Swapper.mp3
Crysis.mp3
S.T.A.L.K.E.R. Call of Pripyat.mp3
Binary Domain.mp3
Shadow Tactics Blades of the Shogun.mp3
S.T.A.L.K.E.R. Clear Sky.mp3
Sins of a Solar Empire Rebellion

## Reading the Excel file

### Subtask:
Read "Games.xlsx" with pandas and build a structure that lets you match the soundtrack files with their text descriptions (summary, genres, themes).


**Reasoning**:
Import the pandas library and read the Excel file into a DataFrame. Then display the head of the DataFrame to inspect its structure.



In [17]:
import pandas as pd

df_games = pd.read_excel(games_excel_path)
display(df_games.head())

| Unnamed: 0 | name | summary | genres | themes |
|---|---|---|---|---|
| 0 | Name | Summary | Genres | Themes |
| 1 | Grand Theft Auto V | This bundle contains the original version of G... | Shooter, Arcade | Action, Open world |
| 2 | The Witcher 3: Wild Hunt + Dark Souls III | A bundle containing The Witcher 3: Wild Hunt a... | Role-playing (RPG), Adventure | Action, Fantasy |
| 3 | Portal Maze 2 | | Shooter, Platform, Puzzle | Action |
| 4 | Counter-Strike: Global Offensive | Not Found | | |


**Reasoning**:
The first row of the DataFrame repeats the column headers (Name, Summary, Genres, Themes) instead of holding data, so it should be dropped before further processing.
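Slicing off the first row, as the next cell does, works for this file. A slightly more defensive variant drops any row that merely repeats the headers and also treats the literal placeholder "Not Found" (visible in the summary column above) as missing, so that the word does not leak into the TF-IDF text built later. This is a sketch: the helper name `clean_games` and the small inline sample are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def clean_games(df: pd.DataFrame) -> pd.DataFrame:
    """Drop repeated header rows and turn the "Not Found" placeholder into NaN (hypothetical helper)."""
    # A repeated header row carries the literal string "Name" in the `name` column
    is_header_row = df["name"].astype(str).str.strip().eq("Name")
    cleaned = df.loc[~is_header_row].copy()
    # "Not Found" marks a missing description; store it as a real missing value
    text_cols = ["summary", "genres", "themes"]
    cleaned[text_cols] = cleaned[text_cols].replace("Not Found", np.nan)
    return cleaned


# Small stand-in for the Excel sheet, shaped like the rows displayed above
sample = pd.DataFrame(
    {
        "name": ["Name", "Grand Theft Auto V", "Counter-Strike: Global Offensive"],
        "summary": ["Summary", "This bundle contains ...", "Not Found"],
        "genres": ["Genres", "Shooter, Arcade", np.nan],
        "themes": ["Themes", "Action, Open world", np.nan],
    }
)
cleaned = clean_games(sample)
print(cleaned)
```

Because `fillna('')` is applied later when the text columns are combined, NaN values simply become empty strings there.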



In [18]:
df_games = df_games.iloc[1:].copy()
display(df_games.head())

| Unnamed: 0 | name | summary | genres | themes |
|---|---|---|---|---|
| 1 | Grand Theft Auto V | This bundle contains the original version of G... | Shooter, Arcade | Action, Open world |
| 2 | The Witcher 3: Wild Hunt + Dark Souls III | A bundle containing The Witcher 3: Wild Hunt a... | Role-playing (RPG), Adventure | Action, Fantasy |
| 3 | Portal Maze 2 | | Shooter, Platform, Puzzle | Action |
| 4 | Counter-Strike: Global Offensive | Not Found | | |
| 5 | Tomb Raider (2013) | Not Found | | |
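The handlers further down build the file name as `f"{name}.mp3"`, which only works when a title in the sheet and a file name agree character for character. The sheet keeps punctuation such as the colon in `The Witcher 3: Wild Hunt + Dark Souls III`, while file names in the folder listing, such as `Shadow Tactics Blades of the Shogun.mp3`, appear to omit the colon found in the game's official title. Comparing normalized keys is one way to build the text-to-audio mapping this subtask asks for. A minimal sketch: `normalize_title` and `build_audio_index` are hypothetical helpers, and using lowercase ASCII words as the key is an assumption.

```python
import os
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse every run of non-alphanumeric (ASCII) characters into one space."""
    return re.sub(r"[^0-9a-z]+", " ", title.lower()).strip()


def build_audio_index(file_names):
    """Map normalized titles to MP3 file names (hypothetical helper)."""
    index = {}
    for file_name in file_names:
        stem, ext = os.path.splitext(file_name)
        if ext.lower() == ".mp3":
            index[normalize_title(stem)] = file_name
    return index


audio_index = build_audio_index(
    ["Shadow Tactics Blades of the Shogun.mp3", "S.T.A.L.K.E.R. Call of Pripyat.mp3", "notes.txt"]
)
# Look spreadsheet titles up by their normalized form instead of f"{name}.mp3"
print(audio_index.get(normalize_title("Shadow Tactics: Blades of the Shogun")))
print(audio_index.get(normalize_title("S.T.A.L.K.E.R.: Call of Pripyat")))
```

In the notebook, `build_audio_index(os.listdir(soundtracks_path))` would give a dictionary that can replace the exact `audio_file_name in audio_data` check.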


## Loading and processing the soundtrack files

### Subtask:
Load the MP3 files in the "soundtracks" folder. Use suitable audio-processing libraries to bring each audio file into a format that MusicGen can process.


**Reasoning**:
Install necessary libraries for loading and processing audio files.



In [19]:
!pip install --quiet librosa soundfile

**Reasoning**:
List the MP3 files in the soundtracks directory, load each file at its native sampling rate, and store the waveform and sampling rate in a dictionary keyed by file name.
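Loading with `sr=None` keeps each file's native rate (44.1 kHz for most files and 48 kHz for `Dirt Showdown.mp3`, per the load log), but MusicGen works at 32 kHz, and the transformers MusicGen processor raises an error when the `sampling_rate` it is given differs from the model's. A sketch of turning a loaded waveform into a short 32 kHz audio prompt, using SciPy's polyphase resampler (`scipy.signal.resample_poly`) as a stand-in for `librosa.resample`. The 32 kHz target and the clip length are assumptions; check the rate against `processor.feature_extractor.sampling_rate`.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

MUSICGEN_SR = 32_000  # assumed model rate; verify with processor.feature_extractor.sampling_rate


def to_audio_prompt(y: np.ndarray, sr: int, target_sr: int = MUSICGEN_SR, max_seconds: float = 10.0) -> np.ndarray:
    """Resample a mono waveform to target_sr and keep at most max_seconds from the start."""
    if sr != target_sr:
        g = gcd(sr, target_sr)
        # Polyphase resampling by the rational factor target_sr / sr (320/441 for 44.1 kHz)
        y = resample_poly(y, target_sr // g, sr // g)
    return y[: int(max_seconds * target_sr)].astype(np.float32)


# Two seconds of a 440 Hz tone at 44.1 kHz, standing in for a loaded soundtrack
sr_in = 44_100
t = np.arange(2 * sr_in) / sr_in
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)

prompt = to_audio_prompt(tone, sr_in, max_seconds=1.5)
print(prompt.shape)
```

Trimming matters as much as resampling: a full multi-minute soundtrack passed as an audio prompt would use up the model's generation budget before any new audio is produced.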



In [20]:
import os
import librosa
import soundfile as sf

soundtracks_path = '/content/drive/MyDrive/soundtracks'
audio_data = {}

if os.path.exists(soundtracks_path):
    mp3_files = [f for f in os.listdir(soundtracks_path) if f.endswith('.mp3')]
    for mp3_file in mp3_files:
        file_path = os.path.join(soundtracks_path, mp3_file)
        try:
            # Load the audio file
            y, sr = librosa.load(file_path, sr=None)
            audio_data[mp3_file] = {"array": y, "sampling_rate": sr}
            print(f"Loaded {mp3_file} with sampling rate {sr}")
        except Exception as e:
            print(f"Error loading {mp3_file}: {e}")
else:
    print(f"Soundtracks folder not found at {soundtracks_path}")

print(f"Loaded {len(audio_data)} audio files.")

Loaded Star Wars Battlefront.mp3 with sampling rate 44100
Loaded The Talos Principle.mp3 with sampling rate 44100
Loaded Heroes & Generals.mp3 with sampling rate 44100
Loaded Spelunky.mp3 with sampling rate 44100
Loaded Gris.mp3 with sampling rate 44100
Loaded Dying Light.mp3 with sampling rate 44100
Loaded Vampyr.mp3 with sampling rate 44100
Loaded Valheim.mp3 with sampling rate 44100
Loaded Killer is Dead.mp3 with sampling rate 44100
Loaded Battlefield 3.mp3 with sampling rate 44100
Loaded Battlefield V.mp3 with sampling rate 44100
Loaded Rogue Legacy.mp3 with sampling rate 44100
Loaded Overlord II.mp3 with sampling rate 44100
Loaded Chaos on Deponia.mp3 with sampling rate 44100
Loaded Crysis 2.mp3 with sampling rate 44100
Loaded Star Wars Battlefront II.mp3 with sampling rate 44100
Loaded Tabletop Simulator Rarrr!!.mp3 with sampling rate 44100
Loaded Dust An Elysian Tail.mp3 with sampling rate 44100
Loaded Dirt Showdown.mp3 with sampling rate 48000
Loaded Crash Bandicoot N. Sane Tri

## Matching the text and audio data

### Subtask:
Prepare the MusicGen input by comparing or combining the user's text prompt with the text descriptions of the relevant soundtracks in the Excel file. Also prepare the audio data of the relevant soundtrack as an audio prompt.


**Reasoning**:
I need to get a text prompt from the user, find the most relevant game descriptions in the dataframe, retrieve the corresponding audio data, and then prepare the input for the MusicGen model using the processor.
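This and the following cells call `processor`, `model` and `device`, none of which are defined in this part of the notebook. A minimal sketch of how they could be created with Hugging Face transformers, assuming the `facebook/musicgen-small` checkpoint (the original does not say which checkpoint was used). The imports sit inside the function so nothing is imported or downloaded until it is called.

```python
def load_musicgen(checkpoint: str = "facebook/musicgen-small"):
    """Load a MusicGen processor and model, moving the model to the GPU when one is available.

    The default checkpoint is an assumption; other MusicGen checkpoints such as
    "facebook/musicgen-medium" are loaded the same way.
    """
    # Imported lazily so that defining this helper needs no heavy dependencies
    import torch
    from transformers import AutoProcessor, MusicgenForConditionalGeneration

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    processor = AutoProcessor.from_pretrained(checkpoint)
    model = MusicgenForConditionalGeneration.from_pretrained(checkpoint).to(device)
    return processor, model, device
```

Running `processor, model, device = load_musicgen()` once, before the prompt widgets, provides the three globals the handlers expect; the generation log at the end of the notebook reports `cuda:0`, which corresponds to the GPU branch.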



In [21]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import ipywidgets as widgets
from IPython.display import display
import numpy as np

def get_most_similar_game(user_prompt, df, vectorizer, tfidf_matrix):
    """Finds the most similar game description to the user prompt."""
    user_tfidf = vectorizer.transform([user_prompt])
    cosine_similarities = cosine_similarity(user_tfidf, tfidf_matrix).flatten()
    most_similar_index = np.argmax(cosine_similarities)
    return df.iloc[most_similar_index]

# Combine relevant text columns for similarity comparison
df_games['combined_text'] = df_games['summary'].fillna('') + ' ' + df_games['genres'].fillna('') + ' ' + df_games['themes'].fillna('')

# Create a TF-IDF vectorizer and fit it on the combined text
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df_games['combined_text'])

# Create a text input widget for the user prompt
prompt_input = widgets.Text(
    value='',
    placeholder='Enter your music description',
    description='Prompt:',
    disabled=False
)

def on_prompt_submit(sender):
    """Handles the submission of the user prompt."""
    user_prompt = sender.value
    if user_prompt:
        print(f"User prompt: {user_prompt}")

        # Find the most similar game
        most_similar_game = get_most_similar_game(user_prompt, df_games, vectorizer, tfidf_matrix)
        game_name = most_similar_game['name']
        print(f"Most similar game found: {game_name}")

        # Determine the expected audio file name (assuming it's game_name.mp3)
        audio_file_name = f"{game_name}.mp3"

        # Retrieve the audio data and sampling rate
        if audio_file_name in audio_data:
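            audio_sample = audio_data[audio_file_name]
            print(f"Retrieved audio data for {audio_file_name}")

            # Resample to the model's rate (the processor rejects any other rate) and keep
            # the first 10 s (an arbitrary length): the audio prompt counts toward
            # MusicGen's ~30 s generation limit
            target_sr = processor.feature_extractor.sampling_rate
            audio_array = librosa.resample(
                audio_sample["array"], orig_sr=audio_sample["sampling_rate"], target_sr=target_sr
            )[: 10 * target_sr]

            # Prepare input for MusicGen
            musicgen_inputs = processor(
                audio=audio_array,
                sampling_rate=target_sr,
                text=[user_prompt],
                padding=True,
                return_tensors="pt",
            )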
            print("MusicGen inputs prepared.")
            # At this point, musicgen_inputs can be used for model generation
            # For demonstration, we'll just print confirmation
            print(f"Prepared input tensor shape: {musicgen_inputs['input_values'].shape}")

        else:
            print(f"Audio file not found for game: {audio_file_name}")
            # If audio is not found, prepare input with only text prompt
            musicgen_inputs = processor(
                text=[user_prompt],
                padding=True,
                return_tensors="pt",
            )
            print("Prepared input with only text prompt.")
            print(f"Prepared input tensor shape: {musicgen_inputs['input_ids'].shape}")


prompt_input.on_submit(on_prompt_submit)

print("Please enter your music description in the text box below and press Enter.")
display(prompt_input)

Please enter your music description in the text box below and press Enter.


Text(value='', description='Prompt:', placeholder='Enter your music description')
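As written, TF-IDF ranks every game in the spreadsheet, although only the few dozen titles in the soundtracks folder have audio, so the best match often has no MP3 (the run at the end of the notebook falls back to text-only for *Need for Speed Hot Pursuit - Remastered*). A sketch that fits the vectorizer only on rows that have a soundtrack, assuming a boolean `has_audio` column has already been computed from the file listing; the column, the helper names and the toy rows are hypothetical, not taken from Games.xlsx.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_audio_search(df: pd.DataFrame):
    """Fit TF-IDF only on games flagged in the (hypothetical) `has_audio` column."""
    candidates = df[df["has_audio"]].reset_index(drop=True)
    vectorizer = TfidfVectorizer()
    matrix = vectorizer.fit_transform(candidates["combined_text"])
    return candidates, vectorizer, matrix


def most_similar_with_audio(prompt: str, candidates, vectorizer, matrix):
    """Return the best-matching row and its cosine score (0.0 means no shared terms)."""
    scores = cosine_similarity(vectorizer.transform([prompt]), matrix).ravel()
    best = int(np.argmax(scores))
    return candidates.iloc[best], float(scores[best])


# Toy rows for illustration only
games = pd.DataFrame(
    {
        "name": ["Need for Speed", "Crysis 2", "Gris"],
        "combined_text": ["Racing Action", "Shooter Action Science fiction", "Platform Adventure Drama"],
        "has_audio": [False, True, True],
    }
)
candidates, vec, mat = build_audio_search(games)
row, score = most_similar_with_audio("fast action racing track", candidates, vec, mat)
print(row["name"], round(score, 3))
```

Returning the score lets the caller fall back to text-only generation when it is 0, instead of silently taking the first row, which is what `np.argmax` does for an all-zero score vector.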

## Generating music with MusicGen

### Subtask:
Generate music with the MusicGen model using the prepared text and audio inputs. You may consider combining text-conditioned and audio-prompted generation.


**Reasoning**:
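Rebuild the prompt widget with a "Generate Music" button whose handler prepares `musicgen_inputs`, moves them to the model's device, generates audio with the model, and then plays and saves the result.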



In [24]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import ipywidgets as widgets
from IPython.display import display, Audio
import numpy as np
import scipy
import time

def get_most_similar_game(user_prompt, df, vectorizer, tfidf_matrix):
    """Finds the most similar game description to the user prompt."""
    user_tfidf = vectorizer.transform([user_prompt])
    cosine_similarities = cosine_similarity(user_tfidf, tfidf_matrix).flatten()
    most_similar_index = np.argmax(cosine_similarities)
    return df.iloc[most_similar_index]

# Combine relevant text columns for similarity comparison
df_games['combined_text'] = df_games['summary'].fillna('') + ' ' + df_games['genres'].fillna('') + ' ' + df_games['themes'].fillna('')

# Create a TF-IDF vectorizer and fit it on the combined text
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(df_games['combined_text'])

# Create a text input widget for the user prompt
prompt_input = widgets.Text(
    value='',
    placeholder='Enter your music description',
    description='Prompt:',
    disabled=False
)

# Create an output widget to display results
output_area = widgets.Output()

def on_generate_click(b):
    """Handles the click event of the generate button."""
    with output_area:
        output_area.clear_output()
        user_prompt = prompt_input.value
        if user_prompt:
            print(f"User prompt: {user_prompt}")

            # Find the most similar game
            most_similar_game = get_most_similar_game(user_prompt, df_games, vectorizer, tfidf_matrix)
            game_name = most_similar_game['name']
            print(f"Most similar game found: {game_name}")

            # Determine the expected audio file name (assuming it's game_name.mp3)
            audio_file_name = f"{game_name}.mp3"

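            # Retrieve the audio data and sampling rate
            audio_sample = audio_data.get(audio_file_name)

            if audio_sample:
                print(f"Retrieved audio data for {audio_file_name}")

                # Resample to the model's rate (the processor rejects any other rate) and keep
                # the first 10 s (an arbitrary length): the audio prompt counts toward
                # MusicGen's ~30 s generation limit
                target_sr = processor.feature_extractor.sampling_rate
                audio_array = librosa.resample(
                    audio_sample["array"], orig_sr=audio_sample["sampling_rate"], target_sr=target_sr
                )[: 10 * target_sr]

                # Prepare input for MusicGen with audio and text
                musicgen_inputs = processor(
                    audio=audio_array,
                    sampling_rate=target_sr,
                    text=[user_prompt],
                    padding=True,
                    return_tensors="pt",
                )
                print("MusicGen inputs prepared with audio and text.")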
            else:
                print(f"Audio file not found for game: {audio_file_name}. Preparing input with only text prompt.")
                # If audio is not found, prepare input with only text prompt
                musicgen_inputs = processor(
                    text=[user_prompt],
                    padding=True,
                    return_tensors="pt",
                )
                print("Prepared input with only text prompt.")

            # Move inputs to the correct device
            musicgen_inputs = {k: v.to(device) for k, v in musicgen_inputs.items()}
            print(f"MusicGen inputs moved to device: {device}")

            # MusicGen emits 50 audio tokens per second (frame rate 50 Hz), so
            # 2 minutes (120 s) would need 6000 tokens. The transformers docs state that
            # generation is limited to about 30 s (~1503 tokens) and that an audio prompt
            # counts toward that limit, so with a 10 s prompt about 20 s of new audio remains:
            # 20 seconds * 50 tokens/second = 1000 tokens
            max_new_tokens = 1000

            # Generate audio using the model
            print(f"Generating audio for {max_new_tokens} tokens (approx. {max_new_tokens // 50} seconds)...")
            audio_values = model.generate(**musicgen_inputs, do_sample=True, guidance_scale=3, max_new_tokens=max_new_tokens)
            print("Audio generation complete.")

            # Get the sampling rate from the model config
            sampling_rate = model.config.audio_encoder.sampling_rate

            # Display the generated audio in the notebook
            print("Displaying generated audio:")
            display(Audio(audio_values[0].cpu().numpy(), rate=sampling_rate))

            # Save the generated audio to a .wav file
            timestamp = int(time.time())
            output_filename = f"generated_music_{timestamp}.wav"
            scipy.io.wavfile.write(output_filename, rate=sampling_rate, data=audio_values[0, 0].cpu().numpy())
            print(f"Generated audio saved to {output_filename}")

        else:
            print("Please enter a text prompt to generate music.")

# Create a button to trigger music generation
generate_button = widgets.Button(description="Generate Music")
generate_button.on_click(on_generate_click)

print("Please enter your music description in the text box below and click 'Generate Music'.")
display(widgets.VBox([prompt_input, generate_button, output_area]))

Please enter your music description in the text box below and click 'Generate Music'.


VBox(children=(Text(value='', description='Prompt:', placeholder='Enter your music description'), Button(descr…
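The token arithmetic in the cell above (50 tokens per second of audio) can be wrapped in a helper that also respects the roughly 30-second ceiling described in the transformers MusicGen documentation, where an audio prompt uses up part of the budget. A sketch: the 50 Hz frame rate comes from the comments above, the 30 s limit is an assumption taken from that documentation, and `max_new_tokens_for` is a hypothetical name.

```python
FRAME_RATE_HZ = 50        # MusicGen audio tokens per second, as in the comments above
MAX_TOTAL_SECONDS = 30.0  # assumed ceiling on prompt + generated audio


def max_new_tokens_for(seconds: float, prompt_seconds: float = 0.0) -> int:
    """Tokens needed for `seconds` of new audio, clipped to what the 30 s budget leaves."""
    available = max(MAX_TOTAL_SECONDS - prompt_seconds, 0.0)
    return int(min(seconds, available) * FRAME_RATE_HZ)


print(max_new_tokens_for(120))                    # 1500: a 2-minute request is capped at 30 s
print(max_new_tokens_for(20, prompt_seconds=10))  # 1000: 20 s of new audio after a 10 s prompt
print(max_new_tokens_for(5))                      # 250: 5 s of audio
```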

## Saving or playing the generated music

### Subtask:
To present the generated music to the user, save it as an audio file or play it directly in the notebook.


**Reasoning**:
Add code to display the generated audio in the notebook and save it as a WAV file within the `on_prompt_submit` function, as per the instructions.
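`scipy.io.wavfile.write` stores a float32 array as a 32-bit floating-point WAV, which some players and editors handle less reliably than 16-bit PCM. A sketch that peak-normalizes the generated waveform, writes 16-bit PCM, and reads the file back as a check. Converting to int16 is a design choice rather than something the notebook requires, and `save_wav_int16` is a hypothetical helper.

```python
import os
import tempfile
import time

import numpy as np
from scipy.io import wavfile


def save_wav_int16(waveform: np.ndarray, rate: int, directory: str = ".") -> str:
    """Peak-normalize a float waveform and save it as a timestamped 16-bit PCM WAV file."""
    peak = float(np.max(np.abs(waveform))) or 1.0  # avoid dividing by zero on silence
    pcm = np.round(waveform / peak * 32767).astype(np.int16)
    path = os.path.join(directory, f"generated_music_{int(time.time())}.wav")
    wavfile.write(path, rate, pcm)
    return path


# Stand-in for audio_values[0, 0].cpu().numpy(): one second of a quiet tone at 32 kHz
rate = 32_000
fake_audio = 0.2 * np.sin(2 * np.pi * 220 * np.arange(rate) / rate).astype(np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = save_wav_int16(fake_audio, rate, tmp)
    read_rate, data = wavfile.read(path)
    print(read_rate, data.dtype, data.shape)
```

Peak normalization changes the loudness of quiet outputs; drop that step if the model's original level should be preserved.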



In [38]:
import scipy
from IPython.display import Audio
import time

def on_prompt_submit(sender):
    """Handles the submission of the user prompt."""
    user_prompt = sender.value
    if user_prompt:
        print(f"User prompt: {user_prompt}")

        # Find the most similar game
        most_similar_game = get_most_similar_game(user_prompt, df_games, vectorizer, tfidf_matrix)
        game_name = most_similar_game['name']
        print(f"Most similar game found: {game_name}")

        # Determine the expected audio file name (assuming it's game_name.mp3)
        audio_file_name = f"{game_name}.mp3"

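        # Retrieve the audio data and sampling rate
        audio_sample = audio_data.get(audio_file_name)

        if audio_sample:
            print(f"Retrieved audio data for {audio_file_name}")

            # Resample to the model's rate (the processor rejects any other rate) and keep
            # the first 10 s (an arbitrary length): the audio prompt counts toward
            # MusicGen's ~30 s generation limit
            target_sr = processor.feature_extractor.sampling_rate
            audio_array = librosa.resample(
                audio_sample["array"], orig_sr=audio_sample["sampling_rate"], target_sr=target_sr
            )[: 10 * target_sr]

            # Prepare input for MusicGen with audio and text
            musicgen_inputs = processor(
                audio=audio_array,
                sampling_rate=target_sr,
                text=[user_prompt],
                padding=True,
                return_tensors="pt",
            )
            print("MusicGen inputs prepared with audio and text.")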
        else:
            print(f"Audio file not found for game: {audio_file_name}. Preparing input with only text prompt.")
            # If audio is not found, prepare input with only text prompt
            musicgen_inputs = processor(
                text=[user_prompt],
                padding=True,
                return_tensors="pt",
            )
            print("Prepared input with only text prompt.")

        # Move inputs to the correct device
        musicgen_inputs = {k: v.to(device) for k, v in musicgen_inputs.items()}
        print(f"MusicGen inputs moved to device: {device}")

        # Generate audio using the model
        print("Generating audio...")
        audio_values = model.generate(**musicgen_inputs, do_sample=True, guidance_scale=3, max_new_tokens=256)
        print("Audio generation complete.")

        # Get the sampling rate from the model config
        sampling_rate = model.config.audio_encoder.sampling_rate

        # Display the generated audio in the notebook
        print("Displaying generated audio:")
        display(Audio(audio_values[0].cpu().numpy(), rate=sampling_rate))

        # Save the generated audio to a .wav file
        timestamp = int(time.time())
        output_filename = f"generated_music_{timestamp}.wav"
        scipy.io.wavfile.write(output_filename, rate=sampling_rate, data=audio_values[0, 0].cpu().numpy())
        print(f"Generated audio saved to {output_filename}")


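# Use a fresh widget for this handler: re-running this cell defines a new function object,
# and calling on_submit again on the shared `prompt_input` would append it as another
# callback, so every registered callback would fire (and generate music) on each Enter
submit_input = widgets.Text(
    value='',
    placeholder='Enter your music description',
    description='Prompt:',
)
submit_input.on_submit(on_prompt_submit)

print("Please enter your music description in the text box below and press Enter to generate music.")
display(submit_input)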

Please enter your music description in the text box below and press Enter to generate music.


Text(value='', description='Prompt:', placeholder='Enter your music description')

User prompt: An adrenaline-pumping, fast-paced action track with intense drums, powerful strings, and dramatic synths — perfect for chase scenes or high-stakes moments.
Most similar game found: Need for Speed Hot Pursuit - Remastered
Audio file not found for game: Need for Speed Hot Pursuit - Remastered.mp3. Preparing input with only text prompt.
Prepared input with only text prompt.
MusicGen inputs moved to device: cuda:0
Generating audio...
Audio generation complete.
Displaying generated audio:


Generated audio saved to generated_music_1750238114.wav
User prompt: An adrenaline-pumping, fast-paced action track with intense drums, powerful strings, and dramatic synths — perfect for chase scenes or high-stakes moments.
Most similar game found: Need for Speed Hot Pursuit - Remastered
Audio file not found for game: Need for Speed Hot Pursuit - Remastered.mp3. Preparing input with only text prompt.
Prepared input with only text prompt.
MusicGen inputs moved to device: cuda:0
Generating audio...
Audio generation complete.
Displaying generated audio:


Generated audio saved to generated_music_1750238124.wav
