# The AI Violinist
### Crafting Notes Beyond Human

<div style="text-align: left; border: 2px solid black;">
    <img src="notebooks/AI_Violinist.png" alt="AI_Violinist" width="600" height="500">
</div>
<div align="left">
    <font size="1">Image by DALL-E</font>
</div>

This project explores music generation with machine learning. Using a dataset of violin audio previews and models such as WaveNet, LSTM networks, and OpenAI's Jukebox, it aims to generate new violin music autonomously.

## Business Objective
The core objective of this project is to **generate violin music autonomously using machine learning models**. This aligns with broader interests in automating and enhancing creative processes through AI.

## Data Overview
- **Dataset**: The project utilizes a collection of 1,500 violin audio file previews, each 30 seconds in length. Additionally, 150 MIDI files and 688 spectrogram files were generated from the dataset.

- **Source**: These audio previews were collected through the Spotify API using the search queries `violin instrumental` and `strings instrumental`.

- [Link to Data Folders](https://drive.google.com/drive/u/0/folders/16jsUFzij_pxj7f-OBSa0ioWp5Ub78uOl)


## Audio Processors Used
1. **Pitch Detection with CREPE**: A deep learning-based tool designed to analyze audio and accurately detect pitch. 

2. **Rhythm & Tempo Analysis**: Advanced algorithms that dissect audio to pinpoint rhythmic patterns and tempo variations.
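
A pitch detector such as CREPE reports frequencies in Hz, while MIDI files store note numbers. The sketch below shows the standard equal-temperament conversion between the two. It is an illustration only: the function names are hypothetical, A4 = 440 Hz is an assumed tuning, and the conversion actually used in the notebooks may differ.

```python
import math

A4_HZ = 440.0   # assumed reference tuning (concert pitch)
A4_MIDI = 69    # MIDI note number of A4
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def hz_to_midi(frequency_hz):
    """Round a frequency in Hz to the nearest MIDI note number."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return round(A4_MIDI + 12 * math.log2(frequency_hz / A4_HZ))

def midi_to_name(midi_note):
    """Return a note name such as 'A4' for a MIDI note number."""
    octave = midi_note // 12 - 1
    return f"{NOTE_NAMES[midi_note % 12]}{octave}"

# The open G string of a violin is roughly 196 Hz
print(midi_to_name(hz_to_midi(196.0)))
```

Rounding to the nearest note discards vibrato and intonation detail, which may or may not be acceptable depending on what the downstream model needs.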


## Models Used
1. **WaveNet**: A deep neural network for generating raw audio waveforms, renowned for its ability to produce coherent and realistic sounds.

2. **LSTM (Long Short-Term Memory)**: A type of recurrent neural network (RNN) used to understand the sequence and temporal dependencies within the audio data.

3. **Jukebox (Pre-Trained)**: Leveraging OpenAI's Jukebox, a pre-trained model, as a foundation to further fine-tune and generate complex music compositions.
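
Sequence models such as an LSTM are commonly trained to predict the next element from a fixed-length window of previous elements. The sketch below shows one way to build such training pairs from a list of MIDI note numbers. The function name and window size are hypothetical, and the model notebooks may prepare their data differently.

```python
def make_training_windows(notes, window_size):
    """Split a note sequence into (input window, next note) pairs."""
    inputs, targets = [], []
    for start in range(len(notes) - window_size):
        # The window is the model input; the note right after it is the target
        inputs.append(notes[start:start + window_size])
        targets.append(notes[start + window_size])
    return inputs, targets

# Example note sequence (MIDI numbers), chosen only for illustration
notes = [55, 62, 69, 76, 74, 72]
inputs, targets = make_training_windows(notes, window_size=3)
print(inputs)
print(targets)
```

A longer window gives the model more context per prediction but yields fewer training pairs from each track.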


## Challenges & Limitations
- Limited computational resources.

- Multiple complex models for audio conversion, prediction, and generation.

- It's important to learn from failure. Check out `Failed_Models_Spectrograms.ipynb` for comprehensive trial-and-error documentation.


# Notebook Directory
``` bash
├── AI_Violinist_Intro.ipynb                <- Data capture/project overview
├── Model_1_WaveNet.ipynb                   <- Baseline/WaveNet Models
├── Model_2_LSTM.ipynb                      <- First LSTM Model
├── Model_3__Complex_LSTM.ipynb             <- Second LSTM Model
├── Visual_Analysis_Model_Comparison.ipynb  <- Model Evaluation
├── Pretrained_Model_Jukebox.ipynb          <- Generating Final Music
└── Failed_Models_Spectrograms.ipynb        <- Failed attempts
```


<div style="text-align: center;">
    <img src="notebooks/Computerized_Violin.png" alt="Computerized_Violin" width="700" height="200">
</div>
<div align="center">
    <font size="1">Image by DALL-E</font>
</div>

### Imports

In [None]:
import requests
import csv
import os
import time
from pydub import AudioSegment
import torch
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
import random
import shutil

## Collecting Data
### Spotify API Access and Track Downloads

First, I'll collect my data by automating access to the Spotify API: searching for violin tracks and downloading their 30-second previews. The code below includes functions for obtaining an access token, searching for tracks, and downloading the track previews along with their metadata.


In [None]:
# Function to obtain an access token from Spotify
def get_spotify_access_token(client_id, client_secret):
    # Making a POST request to Spotify Accounts for token
    auth_response = requests.post('https://accounts.spotify.com/api/token', {
        'grant_type': 'client_credentials',
        'client_id': client_id,
        'client_secret': client_secret,
    })
    # Returning the access token
    return auth_response.json().get('access_token')

# Function to search for violin tracks on Spotify
def search_violin_tracks(access_token, query, offset=0, limit=50):
    # Headers with authorization token
    headers = {'Authorization': f'Bearer {access_token}'}
    # Passing the query as parameters so that requests URL-encodes it
    params = {'q': query, 'type': 'track', 'market': 'US', 'limit': limit, 'offset': offset}
    # Making a GET request to search tracks
    response = requests.get('https://api.spotify.com/v1/search', headers=headers, params=params)
    # Returning the search result as JSON
    return response.json()

# Function to download track previews and save metadata
def download_track_previews(tracks, download_folder, metadata_file):
    with open(metadata_file, 'a', newline='', encoding='utf-8') as file:
        writer = csv.writer(file)
        # Iterating through each track item
        for track in tracks['tracks']['items']:
            preview_url = track['preview_url']
            # Downloading only if preview URL is available
            if preview_url:
                # Extracting track information
                track_id = track['id']
                track_name = track['name']
                artist_name = track['artists'][0]['name']
                album_name = track['album']['name']
                file_name = os.path.join(download_folder, f"{track_id}.mp3")
                # Writing metadata to CSV file
                writer.writerow([track_id, track_name, artist_name, album_name, preview_url])
                # Downloading the preview
                download_preview(preview_url, file_name)
                print(f"Downloaded preview: {file_name}")

# Function to download a track preview from a URL
def download_preview(url, file_name):
    with requests.get(url, stream=True) as r:
        with open(file_name, 'wb') as f:
            # Writing the content in chunks to a file
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)

# Main function to orchestrate the downloading process
def main():
    # Spotify API credentials (hidden for security)
    client_id = 'hidden_for_security'
    client_secret = 'hidden_for_security'
    # Setting up directories and file names
    download_folder = 'violin_track_previews'
    metadata_file = 'violin_track_metadata.csv'
    # Defining search queries
    search_queries = ['violin instrumental', 'strings instrumental']

    # Getting access token
    token = get_spotify_access_token(client_id, client_secret)

    # Creating download directory if it doesn't exist
    if not os.path.exists(download_folder):
        os.makedirs(download_folder)

    # Creating the metadata file with a header row if it doesn't exist yet
    if not os.path.exists(metadata_file):
        with open(metadata_file, 'w', newline='', encoding='utf-8') as file:
            writer = csv.writer(file)
            writer.writerow(['Track ID', 'Track Name', 'Artist Name', 'Album Name', 'Preview URL'])

    # Looping through each query to search and download tracks
    for query in search_queries:
        offset = 0
        while True:
            try:
                # Searching tracks
                tracks = search_violin_tracks(token, query, offset)
                # Break if no items found
                if not tracks['tracks']['items']:
                    break
                # Downloading previews and metadata
                download_track_previews(tracks, download_folder, metadata_file)
                # Incrementing offset for pagination
                offset += 50
                # Adding sleep to respect rate limits
                time.sleep(1)
            except KeyError:
                break

if __name__ == "__main__":
    main()
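
Search queries such as `violin instrumental` contain spaces, and other queries could contain characters like `&`, so the query string should be URL-encoded rather than pasted into the URL by hand. The sketch below builds the same search URL with the standard library. `build_search_url` is a hypothetical helper for illustration, not part of the code above.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://api.spotify.com/v1/search"

def build_search_url(query, offset=0, limit=50, market="US"):
    """Build a track-search URL with a safely encoded query string."""
    params = {
        "q": query,
        "type": "track",
        "market": market,
        "limit": limit,
        "offset": offset,
    }
    # urlencode escapes spaces as '+' and reserved characters such as '&'
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"

print(build_search_url("violin instrumental", offset=50))
```

Passing a `params` dictionary to `requests.get` has the same effect and avoids building the URL manually.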


## Data Preparation

### MP3 to WAV Conversion

In order for the models to process my data, I'll need to take some preprocessing steps.

First, I'll convert the MP3 audio files to WAV format. I'll set a target sample rate, as required by some pre-trained models.


In [None]:
input_folder = 'Violin_Comp_Data/MyDrive/Violin_Comp_Data/violin_track_previews'
output_folder = 'Violin_Comp_Data/MyDrive/Violin_Comp_Data/converted_music_files'
target_sample_rate = 44100

# Function for converting mp3 files to .wav
def convert_audio_to_wav(input_path, output_path, target_sample_rate=44100):
    # Load the MP3 file
    audio = AudioSegment.from_file(input_path, format="mp3")

    # Convert to the target sample rate
    audio = audio.set_frame_rate(target_sample_rate)

    # Export the converted file
    audio.export(output_path, format="wav")

# Create the output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

for file_name in os.listdir(input_folder):
    if file_name.endswith('.mp3'):
        input_path = os.path.join(input_folder, file_name)
        output_path = os.path.join(output_folder, file_name.replace('.mp3', '.wav'))
        convert_audio_to_wav(input_path, output_path, target_sample_rate)
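
After a conversion like this, it can be worth spot-checking that the output files really have the target sample rate. The sketch below reads the header of a WAV file with the standard-library `wave` module. `wav_properties` is a hypothetical helper, and it assumes uncompressed PCM WAV files, which is the only WAV variant `wave` can read.

```python
import wave

def wav_properties(path):
    """Read basic format information from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        return {
            "sample_rate": sample_rate,
            "channels": wav_file.getnchannels(),
            "sample_width_bytes": wav_file.getsampwidth(),
            "duration_seconds": wav_file.getnframes() / sample_rate,
        }
```

For a converted preview, `sample_rate` would be expected to equal the `target_sample_rate` passed to the conversion and `duration_seconds` to be close to 30.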


### Loading Audio Files into Tensors

These functions load audio files and convert them into PyTorch tensors for further processing. Each file is resampled to the target rate, downmixed to a single (mono) channel to reduce memory use, and normalized to the range [-1, 1).


In [None]:
def load_audio_file(file_path, target_sample_rate=44100):
    # Load an audio file using pydub
    audio = AudioSegment.from_file(file_path).set_frame_rate(target_sample_rate).set_channels(1)

    # Convert to samples
    samples = torch.tensor(audio.get_array_of_samples()).float()

    # Normalize to [-1, 1), assuming 16-bit samples
    samples = samples / (2**15)

    return samples.view(1, -1)

def load_dataset(directory):
    dataset = []
    for file_name in os.listdir(directory):
        if file_name.endswith('.wav'):
            file_path = os.path.join(directory, file_name)
            audio_tensor = load_audio_file(file_path)
            dataset.append(audio_tensor)

    return dataset


directory = '/content/gdrive/MyDrive/Violin_Comp_Data/converted_music_files'
dataset = load_dataset(directory)
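
Dividing by `2**15` maps signed 16-bit samples onto the range [-1, 1). The sketch below illustrates that scaling on a few hand-picked values using only the standard library. It assumes 16-bit audio; files with a different sample width would need a different divisor. `normalize_pcm16` is a hypothetical name used for illustration.

```python
from array import array

def normalize_pcm16(samples):
    """Scale signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    scale = float(2 ** 15)
    return [sample / scale for sample in samples]

# 'h' is the typecode for signed 16-bit integers
raw = array("h", [-32768, 0, 16384, 32767])
normalized = normalize_pcm16(raw)
print(normalized)
```

The largest positive sample (32767) maps to a value just below 1.0, which is why the range is half-open.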


### Converting Audio to Spectrogram Images

This function loads an audio file, computes a Short-Time Fourier Transform (STFT) spectrogram with `librosa`, and saves it as a PNG image.




In [None]:
def audio_to_spectrogram(file_path, save_path, file_name, sr=22050, n_fft=2048, hop_length=512):
    # Load audio file
    y, sr = librosa.load(file_path, sr=sr)

    # Create a spectrogram
    S = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
    Y = librosa.amplitude_to_db(np.abs(S), ref=np.max)

    # Plot
    plt.figure(figsize=(10, 4))
    librosa.display.specshow(Y, sr=sr, hop_length=hop_length, x_axis='time', y_axis='log')
    plt.colorbar(format='%+2.0f dB')
    plt.title('Spectrogram')

    # Save the spectrogram as an image
    plt.savefig(os.path.join(save_path, f"{file_name}.png"))
    plt.close()

# Directory containing your audio files
audio_dir = '/content/gdrive/MyDrive/Violin_Comp_Data/converted_music_files'

# Directory to save the spectrograms
spectrogram_dir = '/content/gdrive/MyDrive/Violin_Comp_Data/spectrograms'
if not os.path.exists(spectrogram_dir):
    os.makedirs(spectrogram_dir)

# Convert all files in the directory
for file in os.listdir(audio_dir):
    if file.endswith('.wav'):
        file_path = os.path.join(audio_dir, file)
        file_name = os.path.splitext(file)[0]
        audio_to_spectrogram(file_path, spectrogram_dir, file_name)


Due to high RAM requirements, I was only able to create 688 spectrograms out of approx. 1500 music files.
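
As a rough guide to the size of each spectrogram, the sketch below estimates the shape of the STFT matrix for one preview. It assumes a clip of exactly 30 seconds, the `sr`, `n_fft`, and `hop_length` defaults used above, librosa's default centered framing, and a `complex64` result (8 bytes per value). The figure-rendering overhead of matplotlib is not included.

```python
def stft_shape(num_samples, n_fft=2048, hop_length=512):
    """Return (frequency bins, time frames) for a centered STFT."""
    freq_bins = 1 + n_fft // 2
    frames = 1 + num_samples // hop_length
    return freq_bins, frames

sr = 22050               # sample rate used when loading the audio
duration_seconds = 30    # assumed preview length
bins, frames = stft_shape(sr * duration_seconds)

# Approximate size of the complex STFT matrix in bytes
stft_bytes = bins * frames * 8
print(bins, frames, stft_bytes)
```

Under these assumptions each preview gives a 1025 x 1292 matrix of about 10.6 MB.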

### Sampling Audio Files

Now I'll randomly select a sample of audio files (10% of the total) for testing models in order to save computational resources.


In [None]:
directory_path = '/content/gdrive/MyDrive/Violin_Comp_Data/converted_music_files'
new_directory_path = '/content/gdrive/MyDrive/Violin_Comp_Data/sample_audio_files'

# Create the new directory if it doesn't exist
if not os.path.exists(new_directory_path):
    os.makedirs(new_directory_path)

# Get a list of all files in the directory
all_files = [f for f in os.listdir(directory_path) if os.path.isfile(os.path.join(directory_path, f))]

# Filter to include only .wav files
audio_files = [f for f in all_files if f.lower().endswith('.wav')]

# Calculate 10% sample size
sample_size = int(len(audio_files) * 0.1)

# Randomly select the files
sampled_files = random.sample(audio_files, sample_size)

# Copy the sampled files to the new directory
for file in sampled_files:
    source = os.path.join(directory_path, file)
    destination = os.path.join(new_directory_path, file)
    shutil.copy2(source, destination)

print(f"Total files: {len(audio_files)}")
print(f"Sampled files ({sample_size}) have been copied to {new_directory_path}")
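
A random sample drawn this way changes on every run. If the same subset is needed again later, for example to compare models on identical files, a seeded generator makes the selection repeatable. The sketch below is one possible approach. The function name and the seed value are arbitrary choices for illustration.

```python
import random

def sample_fraction(items, fraction=0.1, seed=42):
    """Return a reproducible random sample of the given fraction of items."""
    rng = random.Random(seed)  # local generator, leaves global state untouched
    sample_size = int(len(items) * fraction)
    # Sorting first makes the result independent of directory listing order
    return rng.sample(sorted(items), sample_size)

# Hypothetical file names standing in for the converted previews
files = [f"track_{i}.wav" for i in range(1500)]
sampled = sample_fraction(files)
print(len(sampled))
```

Calling `sample_fraction` again with the same seed and the same set of file names returns the same files in the same order.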


<div style="text-align: center;">
    <img src="images/Computerized_Violin.png" alt="Computerized_Violin" width="700" height="200">
</div>
<div align="center">
    <font size="1">Image by DALL-E</font>
</div>

## Conclusion
This project demonstrates the use of machine learning models (WaveNet, LSTM, and a pre-trained Jukebox) to generate violin music. Despite computational limitations, the models were able to produce coherent and aesthetically pleasing music.

## Next Steps
- **Enhance Computational Resources**: To overcome the limitation of processing power, future work may involve leveraging more robust computational infrastructure or cloud-based solutions.

- **Refining MIDI Conversion Techniques**: Improving the quality and diversity of the generated music starts with high-quality input, so the audio-to-MIDI conversion should be refined alongside further fine-tuning of the models.

- **Exploring Diverse Instruments**: Expanding the project scope to include other instruments and genres, exploring the full spectrum of musical creativity.

## Sources
- Spotify API

- [Analytics Vidhya](https://www.analyticsvidhya.com/blog/2020/01/how-to-perform-automatic-music-generation/)

- [OpenAI's Jukebox](https://openai.com/research/jukebox)

- [One Click Jukebox with Autosave v2](https://colab.research.google.com/github/openai/jukebox/blob/master/jukebox/Interacting_with_Jukebox.ipynb)

- All images created by DALL-E

- This notebook was, of course, developed using knowledge from ChatGPT



# Proceed to 'Model_1_WaveNet.ipynb'