# Real-Time Art Generation from Audio [Pathway 1]
This Google Colab notebook provides an interactive art generator that creates surrealist art in real time from recorded audio. The generated art is produced by a combination of a trained Convolutional Neural Network (CNN) model and a trained Generative Adversarial Network (GAN) generator, both pre-trained on datasets of spectrograms and surrealist art, respectively.

# Purpose & Future Implementation
Due to time constraints, this component was not included in the final version of the project, but we hope to expand on it in the future. Further development of the code below should let us add real-time, dynamic art generation to the current audio-to-art generator.

# Integration for Future Development:
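*   **Advanced Audio Processing:** Incorporating additional audio features, such as rhythm patterns or pitch, could further enrich the input data and lead to more dynamic art.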

*   **User Feedback Loop:** Implementing a feedback mechanism where users can rate the generated art would allow the models to fine-tune outputs over time, achieving a dynamic, evolving art generator.
*   **Cross-Platform Integration:** The recording interface can be extended to integrate directly with streaming platforms or music applications, allowing for continuous, dynamic art generation based on a variety of audio inputs.
*   **Interactive Art Gallery:** Generated art can be archived and displayed in an interactive gallery, potentially integrating with web applications or augmented reality, enabling users to explore and engage with the generated images in new ways.



---



# Loading Models:
The pre-trained CNN model, which predicts valence and energy values from a music spectrogram, and the trained GAN generator, which synthesizes art from a latent vector, are both loaded from Google Drive. The generator is loaded from its saved file rather than rebuilt with the notebook's `build_generator(latent_dim)` function, since a freshly built generator would presumably start from untrained weights.

In [None]:
# Mount Google Drive so the saved models are reachable under /content/drive
from google.colab import drive
drive.mount('/content/drive')

import numpy as np
import tensorflow as tf

# Load the trained CNN model (spectrogram -> valence, energy)
cnn_model = tf.keras.models.load_model('/content/drive/My Drive/COSC_5470/trained_cnn_model.h5')

# Load the trained GAN generator (latent vector -> image); compile=False skips restoring
# the training configuration, which is not needed for inference
gan_generator = tf.keras.models.load_model('/content/drive/My Drive/COSC_5470/saved_gan_model_final.h5', compile=False)


# Mapping Mood to Latent Space:
The function `map_mood_to_latent_space` turns the valence and energy values predicted by the CNN into a latent vector for the GAN. It draws a random base vector from a standard normal distribution, rescales valence and energy from [0, 1] to [-1, 1], and then multiplies the first half of the latent vector by the rescaled valence and the second half by the rescaled energy. This mapping is an arbitrary design choice, not a learned one.

In [None]:
import numpy as np

def map_mood_to_latent_space(valence, energy, latent_dim=100):
    # Generate a base latent vector of shape (1, latent_dim) from a standard normal distribution
    latent_vector = np.random.normal(0, 1, (1, latent_dim))

    # Modulate the latent vector based on valence and energy.
    # This is an arbitrary (not learned) way to combine the two mood values.

    # Convert valence and energy from [0, 1] into the range [-1, 1]
    valence = valence * 2 - 1
    energy = energy * 2 - 1

    # Scale the first half of the vector by valence and the remainder by energy
    # (using latent_dim - half keeps the shapes aligned when latent_dim is odd)
    half = latent_dim // 2
    modulation_factor = np.concatenate([np.full(half, valence), np.full(latent_dim - half, energy)])
    modulated_latent_vector = latent_vector * modulation_factor

    return modulated_latent_vector  # shape (1, latent_dim)
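To make the modulation concrete, the sketch below re-implements the same halves-scaling rule in isolation, using a base vector of ones instead of a random normal draw so the effect is visible. The helper name `modulate` is illustrative only. It also exposes a property worth knowing: a mood value of 0.5 maps to 0, which zeroes out that half of the latent vector.

```python
import numpy as np

def modulate(base, valence, energy):
    """Scale the first half of `base` by valence and the rest by energy,
    after mapping both from [0, 1] to [-1, 1] (same rule as map_mood_to_latent_space)."""
    latent_dim = base.shape[-1]
    half = latent_dim // 2
    factor = np.concatenate([
        np.full(half, valence * 2 - 1),
        np.full(latent_dim - half, energy * 2 - 1),
    ])
    return base * factor

base = np.ones((1, 6))
print(modulate(base, valence=1.0, energy=0.25).tolist())  # [[1.0, 1.0, 1.0, -0.5, -0.5, -0.5]]
print(modulate(base, valence=0.5, energy=0.0).tolist())   # [[0.0, 0.0, 0.0, -1.0, -1.0, -1.0]]
```

Because the factor multiplies a random normal vector, it only changes the magnitude and sign of each half; the direction within each half is still random, so two calls with the same mood produce different artworks.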


# Art Generation from Music:
The function `generate_art_from_music` uses the pre-trained CNN model to predict valence and energy values from a music spectrogram. These values are then passed to the `map_mood_to_latent_space` function to generate a modulated latent vector, which is fed into the GAN generator to synthesize the corresponding artwork.

In [None]:
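import numpy as np

def generate_art_from_music(cnn_model, gan_generator, preprocessed_spectrogram):
    # preprocessed_spectrogram is a single (height, width, channels) array without a batch axis;
    # add the leading batch axis that Keras predict() expects
    predictions = cnn_model.predict(np.expand_dims(preprocessed_spectrogram, axis=0), verbose=0)
    valence_energy = predictions[0]

    valence = valence_energy[0]  # Assuming the first output is valence
    energy = valence_energy[1]   # Assuming the second output is energy

    # Build a mood-modulated latent vector sized to the generator's input; it is already (1, latent_dim)
    latent_vector = map_mood_to_latent_space(valence, energy, latent_dim=gan_generator.input_shape[-1])
    art = gan_generator.predict(latent_vector, verbose=0)

    return art.squeeze()  # Drop the batch axis (and any singleton channel axis)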

In [None]:
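import tensorflow as tf

# Example usage with a single spectrogram image. The folder comes from the original notebook;
# 'example.png' is a hypothetical file name, so replace it with one of your own spectrogram images.
spectrogram_file_path = '/content/drive/My Drive/COSC_5470/spectograms/example.png'

# Load the image at the CNN's expected height, width, and channel count, scaled to [0, 1]
# (the [0, 1] scaling is an assumption about how the CNN's training images were prepared)
height, width, channels = cnn_model.input_shape[1:]
img = tf.keras.utils.load_img(spectrogram_file_path, target_size=(height, width),
                              color_mode='grayscale' if channels == 1 else 'rgb')
spectrogram = tf.keras.utils.img_to_array(img) / 255.0

art = generate_art_from_music(cnn_model, gan_generator, spectrogram)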
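Because batch and latent dimensions are easy to get wrong, here is a shape-only trace of the steps inside `generate_art_from_music`, using stand-in functions instead of the real Keras models. The stub names, the 128×128×1 spectrogram size, and the 64×64×3 output size are illustrative assumptions; the stubs only mimic how Keras `predict` works on a leading batch axis.

```python
import numpy as np

LATENT_DIM = 100

def stub_cnn_predict(batch):
    # Stand-in for cnn_model.predict: one (valence, energy) pair per batch item.
    assert batch.ndim == 4, "expected (batch, height, width, channels)"
    return np.full((batch.shape[0], 2), 0.75)

def stub_gan_predict(latent):
    # Stand-in for gan_generator.predict: one 64x64 RGB image per latent vector.
    assert latent.shape[1:] == (LATENT_DIM,), "expected (batch, latent_dim)"
    return np.zeros((latent.shape[0], 64, 64, 3))

spectrogram = np.zeros((128, 128, 1))               # one spectrogram, no batch axis
preds = stub_cnn_predict(spectrogram[np.newaxis])   # add the batch axis once
valence, energy = preds[0]
latent = np.random.normal(size=(1, LATENT_DIM))     # already (1, latent_dim)
art = stub_gan_predict(latent)                      # pass as-is, not wrapped in a list
image = art[0]                                      # drop the batch axis for display
print(preds.shape, latent.shape, art.shape, image.shape)
# (1, 2) (1, 100) (1, 64, 64, 3) (64, 64, 3)
```

Wrapping the `(1, 100)` latent vector in another list, as `np.array([latent_vector])` would, produces a `(1, 1, 100)` array that no longer matches a `(batch, latent_dim)` generator input.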

# Real-Time Audio Recording and Processing:
The `get_system_audio` function records audio from the system through a virtual audio device (BlackHole) using the `sounddevice` library and returns it as a flattened NumPy array. A Colab runtime is a remote virtual machine with no access to your local audio devices, so this pathway has to run in a local Jupyter session. The function `real_time_art_generation` then implements the interactive art generator: it keeps a rolling buffer of audio samples, shifts each new audio snippet into it, converts the current buffer into a Mel spectrogram, and preprocesses that spectrogram to match the CNN's input format. The mood is predicted from the spectrogram, a corresponding artwork is generated, and the Matplotlib plot is updated in real time.

In [None]:
import numpy as np
import librosa
import matplotlib.pyplot as plt
import sounddevice as sd
import tensorflow as tf

def get_system_audio(duration, sr=22050):
    # Make sure the 'BlackHole' virtual audio device is installed and set as the input source
    recording = sd.rec(int(duration * sr), samplerate=sr, channels=1, dtype='float32')
    sd.wait()  # Wait until the recording is finished
    return recording.flatten()

def real_time_art_generation(cnn_model, gan_generator, buffer_size=None, sr=22050, duration=5, overlap=0.5):
    plt.ion()  # Turn on interactive mode for dynamic updates
    fig, ax = plt.subplots()
    ax.axis('off')
    image = ax.imshow(np.zeros((64, 64, 3)), cmap='gray')  # Initial placeholder image

    if buffer_size is None:
        buffer_size = int(sr * duration * (1 + overlap))  # Extended duration for overlap
    buffer = np.zeros(buffer_size, dtype=np.float32)
    hop_length = int(sr * duration * overlap)  # New samples shifted in per update
    height, width, channels = cnn_model.input_shape[1:]  # CNN input size (assumes a 4D input)

    while True:
        try:
            # Record the next snippet from the system audio device and shift it into the buffer
            new_audio = get_system_audio(hop_length / sr, sr)[-buffer_size:]
            buffer = np.roll(buffer, -len(new_audio))
            buffer[-len(new_audio):] = new_audio

            # Convert the buffer to a Mel spectrogram in dB; the STFT hop (512 samples)
            # is independent of the buffer hop, which is much larger than n_fft
            spectrogram = librosa.feature.melspectrogram(y=buffer, sr=sr, n_fft=2048, hop_length=512)
            spectrogram = librosa.power_to_db(spectrogram, ref=np.max)  # Values in [-80, 0] dB

            # Preprocess: scale to [0, 1] (assumed to match the CNN's training data),
            # resize to the CNN input size, and repeat the channel if the CNN expects RGB
            scaled = (spectrogram + 80.0) / 80.0
            resized = tf.image.resize(scaled[..., np.newaxis], (height, width)).numpy()
            preprocessed_spectrogram = np.repeat(resized, channels, axis=-1)

            # Predict mood and generate art (a single image without a batch axis)
            art = generate_art_from_music(cnn_model, gan_generator, preprocessed_spectrogram)

            # Update the plot; if the generator ends in tanh, rescale with (art + 1) / 2 first
            image.set_data(art)
            image.set_extent((0, art.shape[1], art.shape[0], 0))
            fig.canvas.draw()
            fig.canvas.flush_events()
            plt.pause(0.1)  # Pause briefly to allow the update

        except KeyboardInterrupt:
            plt.ioff()
            break  # Exit on Ctrl+C

# Define constants
# sr = 22050  # Sample rate
# duration = 5  # Duration of buffer in seconds
# overlap = 0.5  # 50% overlap
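The rolling-window update can be checked in isolation. The sketch below uses small constant "snippets" in place of recorded audio (an illustrative assumption; `push_snippet` is a hypothetical helper name) and applies the same `np.roll` plus tail-overwrite step as the loop above, so the oldest samples fall off the front while each new snippet lands at the end.

```python
import numpy as np

def push_snippet(buffer, snippet):
    """Shift `buffer` left by len(snippet) and write `snippet` into the freed tail."""
    hop = len(snippet)
    if hop > len(buffer):
        raise ValueError("snippet is longer than the buffer")
    buffer = np.roll(buffer, -hop)  # np.roll returns a new array
    buffer[-hop:] = snippet
    return buffer

buffer = np.zeros(6)
for k in range(1, 4):
    # Each "snippet" is 2 samples long, i.e. a hop of one third of the buffer
    buffer = push_snippet(buffer, np.full(2, k))
    print(buffer.tolist())
# [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]
# [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
# [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With the notebook's defaults (`sr=22050`, `duration=5`, `overlap=0.5`) the buffer holds 165,375 samples and each push adds 55,125, so consecutive spectrogram windows share two thirds of their samples.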


In [None]:
# Generate art continuously from live system audio (stop with Ctrl+C)
buffer_size = int(22050 * 5 * (1 + 0.5))  # = int(sr * duration * (1 + overlap))  # Buffer to hold extended duration for overlap
real_time_art_generation(cnn_model, gan_generator, buffer_size)

# Integration for Future Development:
Adding audio features such as rhythm patterns or pitch would enrich the model inputs and lead to more dynamic art. Web integration could connect the system to streaming platforms so that art is generated seamlessly from many audio sources. Optimizing performance and reducing latency would improve real-time interactivity, letting users see the effect of the audio on the art immediately.



---



# Real-Time Art Generation from Music [Pathway 2]
This part of the Google Colab notebook demonstrates a second pathway for generating surrealist art from audio input in real time. It uses the pre-trained Convolutional Neural Network (CNN) model to predict the mood (valence and energy) of a music spectrogram, and that mood modulates the latent vector fed to the Generative Adversarial Network (GAN) generator, which creates the corresponding artwork.

# Integration for Future Development:
* **Advanced Audio Processing:** Incorporating additional audio features such as rhythm patterns or pitch can further enrich the input data, leading to more dynamic art generation.
* **Web Integration:** The recording interface can be extended to integrate with streaming platforms or music applications, allowing for continuous, dynamic art generation from various audio sources.
* **Real-Time Performance:** Optimizing performance and reducing latency can enhance real-time interactivity, allowing users to see the immediate effects of audio on art generation.

# Dependencies Installation:
The necessary packages are installed: NumPy, Matplotlib, SciPy, Librosa, FFmpeg-Python, and SoundDevice. TensorFlow and IPython come preinstalled in Colab, and `ffmpeg-python` is only a wrapper around the `ffmpeg` binary, which Colab also provides.

In [None]:
!pip install numpy matplotlib scipy librosa ffmpeg-python sounddevice

In [None]:
import base64
import io
from base64 import b64decode

import numpy as np
import librosa
import matplotlib.pyplot as plt
import tensorflow as tf
import ffmpeg
import IPython
from IPython.display import Audio, display, HTML, Javascript
from scipy.io import wavfile
from scipy.io.wavfile import read as wav_read
from google.colab import output
from google.colab.output import eval_js



# Loading Models:
Two pre-trained models are loaded from Google Drive:

*   CNN Model: This model can predict valence and energy values from a music spectrogram.
*   GAN Generator: This model synthesizes surrealist art from a modulated latent vector.

In [None]:
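# Mount Google Drive so the saved models are reachable under /content/drive
from google.colab import drive
drive.mount('/content/drive')

# Load the trained CNN model
cnn_model = tf.keras.models.load_model('/content/drive/My Drive/COSC_5470/trained_cnn_model.h5')

# Load the trained GAN generator (compile=False: the training configuration is not needed for inference)
gan_generator = tf.keras.models.load_model('/content/drive/My Drive/COSC_5470/saved_gan_model_final.h5', compile=False)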


# Audio Recording Interface:
An HTML/JavaScript interface lets users record audio directly in the notebook through the browser's microphone. Users start and stop a recording; the browser stores it as a WebM blob, plays it back, and sends it to Python as a base64 data URL, where it is converted to WAV format for further processing.

In [None]:
AUDIO_HTML = """
<div>
    <button onclick="startRecording()">Start Recording</button>
    <button onclick="stopRecording()" disabled>Stop Recording</button>
    <audio controls></audio>
    <script>
        let mediaRecorder;
        let audioChunks = [];
        let audioElement = document.querySelector('audio');

        function startRecording() {
            audioChunks = [];
            navigator.mediaDevices.getUserMedia({ audio: true })
                .then(stream => {
                    mediaRecorder = new MediaRecorder(stream);
                    // Register the handlers before starting so no data event is missed
                    mediaRecorder.ondataavailable = event => {
                        audioChunks.push(event.data);
                    };
                    mediaRecorder.onstop = () => {
                        // Release the microphone once the recording has finished
                        stream.getTracks().forEach(track => track.stop());
                        const audioBlob = new Blob(audioChunks, {type: 'audio/webm'});
                        const audioUrl = URL.createObjectURL(audioBlob);
                        audioElement.src = audioUrl;
                        const reader = new FileReader();
                        reader.readAsDataURL(audioBlob);
                        reader.onloadend = function() {
                            var base64data = reader.result;
                            document.getElementById('base64data').value = base64data;
                            google.colab.kernel.invokeFunction('notebook.get_audio_data', [base64data], {});
                        };
                    };
                    mediaRecorder.start();
                    document.querySelector('button[onclick="stopRecording()"]').disabled = false;
                    document.querySelector('button[onclick="startRecording()"]').disabled = true;
                })
                .catch(err => console.error('Microphone access failed:', err));
        }

        function stopRecording() {
            mediaRecorder.stop();
            document.querySelector('button[onclick="startRecording()"]').disabled = false;
            document.querySelector('button[onclick="stopRecording()"]').disabled = true;
        }
    </script>
    <input type="hidden" id="base64data">
</div>
"""

# Audio Processing:
*   **Spectrogram Generation:** The WAV file is read and converted into a Mel spectrogram using the Librosa library, which serves as input to the art generator.
*   **Mood Prediction:** The spectrogram is preprocessed and passed to the CNN model, which predicts valence and energy values.
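The preprocessing step is left abstract in the bullets above. One possible version, sketched with NumPy only, turns a dB-scaled Mel spectrogram into a single CNN input: it maps the [-80, 0] dB range produced by `librosa.power_to_db(..., ref=np.max)` (with its default `top_db=80`) to [0, 1], resizes by nearest-neighbour sampling, and adds a channel axis. The helper name `to_cnn_input`, the 128×128 target size, and the [0, 1] scaling are assumptions; they must match however the CNN's training spectrograms were prepared, which this notebook does not show.

```python
import numpy as np

def to_cnn_input(spec_db, target_hw=(128, 128), channels=1, top_db=80.0):
    """Map a dB spectrogram (n_mels, n_frames) to a (H, W, channels) float32 array in [0, 1].

    Assumes spec_db came from librosa.power_to_db(..., ref=np.max), so it lies in [-top_db, 0].
    """
    scaled = np.clip((spec_db + top_db) / top_db, 0.0, 1.0)
    # Nearest-neighbour resize: pick evenly spaced source rows and columns
    rows = np.linspace(0, spec_db.shape[0] - 1, target_hw[0]).round().astype(int)
    cols = np.linspace(0, spec_db.shape[1] - 1, target_hw[1]).round().astype(int)
    resized = scaled[np.ix_(rows, cols)]
    # Repeat the single channel if the CNN was trained on RGB spectrogram images
    return np.repeat(resized[..., np.newaxis], channels, axis=-1).astype(np.float32)

spec_db = np.linspace(-80.0, 0.0, 128 * 323).reshape(128, 323)  # fake 128-mel, 323-frame input
x = to_cnn_input(spec_db, target_hw=(128, 128), channels=3)
print(x.shape, float(x.min()), float(x.max()))  # (128, 128, 3) 0.0 1.0
```

Nearest-neighbour sampling keeps the sketch dependency-free; `tf.image.resize` with bilinear interpolation is a smoother alternative when TensorFlow is already loaded.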

In [None]:
import base64
import ffmpeg
import matplotlib.pyplot as plt
from scipy.io import wavfile
from google.colab import output
from IPython.display import display, HTML

def get_audio_data(base64data):
    # base64data is a data URL such as 'data:audio/webm;base64,...'; keep only the payload
    if ',' in base64data:
        binary = base64.b64decode(base64data.split(',')[1])
        with open("/content/temp_audio.webm", "wb") as file:
            file.write(binary)

        # Use ffmpeg to convert the WebM file to a mono 22.05 kHz WAV file (blocks until done)
        (ffmpeg
            .input('/content/temp_audio.webm')
            .output('/content/temp_audio.wav', format='wav', ac=1, ar=22050)
            .run(quiet=True, overwrite_output=True)
        )

        # Read the WAV file and plot the waveform
        sr, audio = wavfile.read('/content/temp_audio.wav')
        audio = audio.flatten()
        plt.figure(figsize=(20, 10))
        plt.plot(audio)
        plt.title("Recorded Audio Waveform")
        plt.show()

        # The WAV file is now ready for spectrogram and art generation in the cells below
        print(f"Saved {len(audio) / sr:.1f} s of audio to /content/temp_audio.wav")
    else:
        print("No audio data found. Please ensure the recording was made.")

# Let the JavaScript recorder call get_audio_data, then display the audio recording controls
output.register_callback('notebook.get_audio_data', get_audio_data)
display(HTML(AUDIO_HTML))

# Art Generation:
*   **Latent Vector:** The predicted mood values are used to modulate a latent vector, which is then fed into the GAN generator to create corresponding artwork.
*   **Real-Time Visualization:** The generated artwork is displayed on a Matplotlib plot that is redrawn as the audio is processed.
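Matplotlib's `imshow` expects float RGB data in [0, 1] (or integers in [0, 255]) and clips anything outside that range. If the generator ends in a `tanh` layer, as many DCGAN-style generators do (an assumption, since the notebook does not show the architecture), its raw output lies in [-1, 1] and should be rescaled before display. A small hypothetical helper, `to_displayable`:

```python
import numpy as np

def to_displayable(art):
    """Convert generator output to a uint8 image for imshow.

    Assumes a tanh-activated generator (values in [-1, 1]); drops a leading batch axis
    and a singleton channel axis.
    """
    art = np.asarray(art)
    if art.ndim == 4:          # (1, H, W, C) straight from predict()
        art = art[0]
    if art.shape[-1] == 1:     # single-channel output -> (H, W) for a grayscale colormap
        art = art[..., 0]
    scaled = (np.clip(art, -1.0, 1.0) + 1.0) * 127.5
    return scaled.round().astype(np.uint8)

fake = np.array([[[[-1.0, 0.0, 1.0]]]])  # a 1x1 RGB "image" with a batch axis
print(to_displayable(fake).tolist())      # [[[0, 128, 255]]]
```

The mid-point 0.0 maps to 127.5, which NumPy rounds half-to-even to 128.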

In [None]:
import numpy as np
import librosa
import matplotlib.pyplot as plt
import tensorflow as tf
from IPython.display import clear_output, display

def real_time_art_generation(cnn_model, gan_generator, audio, sr=22050, duration=5, overlap=0.5):
    # Colab's inline backend does not redraw figures in place, so each frame is
    # re-displayed with clear_output/display instead of plt.ion()/canvas.draw()
    fig, ax = plt.subplots()

    buffer = np.zeros(int(sr * duration * (1 + overlap)), dtype=np.float32)
    hop_length = int(sr * duration * overlap)  # Samples shifted into the buffer per frame
    height, width, channels = cnn_model.input_shape[1:]  # CNN input size (assumes a 4D input)

    # Step through the recording one hop at a time, as if it were arriving live
    for start in range(0, len(audio), hop_length):
        chunk = audio[start:start + hop_length]
        buffer = np.roll(buffer, -len(chunk))
        buffer[-len(chunk):] = chunk

        # Convert the buffer to a Mel spectrogram in dB (STFT hop independent of the buffer hop)
        spectrogram = librosa.feature.melspectrogram(y=buffer, sr=sr, n_fft=2048, hop_length=512)
        spectrogram = librosa.power_to_db(spectrogram, ref=np.max)  # Values in [-80, 0] dB

        # Preprocess: scale to [0, 1] (assumed to match training), resize, match channel count
        scaled = (spectrogram + 80.0) / 80.0
        resized = tf.image.resize(scaled[..., np.newaxis], (height, width)).numpy()
        art = generate_art_from_music(cnn_model, gan_generator, np.repeat(resized, channels, axis=-1))

        # Redraw; if the generator ends in tanh, rescale with (art + 1) / 2 first
        ax.clear()
        ax.axis('off')
        ax.imshow(art, cmap='gray')
        ax.set_title(f"{start / sr:.1f} s")
        clear_output(wait=True)
        display(fig)
    plt.close(fig)

# Define constants
# sr = 22050  # Sample rate
# duration = 5  # Duration of buffer in seconds
# overlap = 0.5  # 50% overlap


In [None]:
# Generate art from the most recent recording saved by get_audio_data
audio, sr = librosa.load('/content/temp_audio.wav', sr=22050, mono=True)
real_time_art_generation(cnn_model, gan_generator, audio, sr=sr)



---

