# Pop2Piano - Inference (Kaggle) 🎹

This notebook is for generating piano covers using the pre-trained Pop2Piano model.

## How to Use on Kaggle:
1. Click the **"Add Data"** button (right side) and upload your MP3/WAV file
2. Run all cells in order
3. Find the output MIDI in the **Output** tab (right side)

## Key Steps:
1. **Setup Environment:** Install dependencies.
2. **Clone Repository:** Get the code.
3. **Load Model:** Download/Load the pre-trained model.
4. **Generate:** Process audio and create piano MIDI.
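
The notebook looks for the uploaded audio under `/kaggle/input`. The lookup can be sketched on its own as below; `find_audio_files` and `AUDIO_EXTENSIONS` are hypothetical names used only for illustration and are not part of the notebook or the repository. The sketch assumes that matching extensions case-insensitively and sorting the results is desirable, so that the first file picked is the same on every run.

```python
from pathlib import Path

# Extensions treated as audio input (illustrative; mirrors the MP3/WAV uploads described above).
AUDIO_EXTENSIONS = {".mp3", ".wav"}


def find_audio_files(root):
    """Return audio files found anywhere under `root`, in sorted order."""
    root = Path(root)
    if not root.is_dir():
        # Nothing uploaded yet, or not running on Kaggle.
        return []
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in AUDIO_EXTENSIONS
    )


# On Kaggle, uploaded datasets are mounted under /kaggle/input.
audio_files = find_audio_files("/kaggle/input")
print(f"Found {len(audio_files)} audio file(s)")
```

If the list is empty, add a file via **"Add Data"** before running the generation cell.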

In [None]:
# @title 1. Install Dependencies 📦
import os
import sys
import subprocess

print("Installing dependencies... ⏳")

# Install Python packages
subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", "transformers", "essentia", "librosa", "resampy", "pretty_midi", "midi2audio", "scipy", "yt-dlp", "imageio-ffmpeg", "omegaconf", "joblib", "tqdm"])

# Install System packages
os.system("sudo apt-get update -q")
os.system("sudo apt-get install -y -q fluidsynth fluid-soundfont-gm ffmpeg")

print("Done! ✅")

In [None]:
# @title 2. Clone Repository 🐙
import os

repo_url = "https://github.com/kareemkamal10/pop2piano.git"
repo_name = "pop2piano"

%cd /kaggle/working

if os.path.exists(repo_name):
    print("Repository already cloned. Pulling latest changes...")
    %cd {repo_name}
    !git pull
else:
    print(f"Cloning {repo_url}...")
    !git clone {repo_url}
    %cd {repo_name}

print(f"Current working directory: {os.getcwd()}")

In [None]:
# @title 3. Load Pre-trained Model & Generate 🎹
from transformers import Pop2PianoForConditionalGeneration, Pop2PianoProcessor
import librosa
import glob

# 1. Load Model
print("⏳ Loading Model...")
model = Pop2PianoForConditionalGeneration.from_pretrained("sweetcocoa/pop2piano")
processor = Pop2PianoProcessor.from_pretrained("sweetcocoa/pop2piano")
print("✅ Model Loaded!")

# 2. Find Audio File (uploaded via "Add Data" button)
# Sort the matches so the same file is picked on every run
audio_files = sorted(
    glob.glob("/kaggle/input/**/*.mp3", recursive=True)
    + glob.glob("/kaggle/input/**/*.wav", recursive=True)
)

if audio_files:
    audio_filename = audio_files[0]
    print(f"🎵 Found audio: {audio_filename}")
else:
    raise FileNotFoundError("⚠️ No audio file found! Please add an MP3/WAV file via the 'Add Data' button on the right.")

# 3. Process & Generate
audio, sr = librosa.load(audio_filename, sr=44100)
inputs = processor(audio=audio, sampling_rate=sr, return_tensors="pt")

print("🎹 Generating MIDI... (This may take a minute)")

model_output = model.generate(
    input_features=inputs["input_features"],
    composer="composer1"
)

# 4. Decode & Save (output goes to /kaggle/working/ which appears in Output tab)
output_midi_name = "/kaggle/working/output_midi.mid"

tokenizer_output = processor.batch_decode(
    token_ids=model_output, 
    feature_extractor_output=inputs
)["pretty_midi_objects"][0]

tokenizer_output.write(output_midi_name)

print(f"✨ Success! Saved to {output_midi_name}")
print("📥 Check the 'Output' tab on the right to download your MIDI file!")
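
The install cell pulls in `fluidsynth` and `fluid-soundfont-gm`, but the notebook stops at the MIDI file. As an optional follow-up, the sketch below shows one way the saved MIDI might be rendered to WAV by calling the `fluidsynth` command line directly. The helper names (`build_fluidsynth_command`, `render_midi`) are hypothetical, and the soundfont path is an assumption about where the `fluid-soundfont-gm` package installs its file on Debian/Ubuntu images; check it before relying on it.

```python
import shutil
import subprocess

# Assumed install location of the soundfont shipped by fluid-soundfont-gm.
DEFAULT_SOUNDFONT = "/usr/share/sounds/sf2/FluidR3_GM.sf2"


def build_fluidsynth_command(midi_path, wav_path,
                             soundfont=DEFAULT_SOUNDFONT, sample_rate=44100):
    """Build the fluidsynth argument list for an offline MIDI-to-WAV render."""
    return [
        "fluidsynth",
        "-ni",                     # no MIDI input device, no interactive shell
        "-F", wav_path,            # render straight to this audio file
        "-r", str(sample_rate),    # output sample rate in Hz
        soundfont,
        midi_path,
    ]


def render_midi(midi_path, wav_path, **kwargs):
    """Run fluidsynth; raises if the binary is missing or the render fails."""
    if shutil.which("fluidsynth") is None:
        raise RuntimeError("fluidsynth not found on PATH; run the install cell first")
    subprocess.run(build_fluidsynth_command(midi_path, wav_path, **kwargs), check=True)
    return wav_path


# Example (run after the generation cell has written the MIDI file):
# render_midi("/kaggle/working/output_midi.mid", "/kaggle/working/output_midi.wav")
```

Writing the WAV to `/kaggle/working/` would make it appear in the **Output** tab next to the MIDI file.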