# Network Model 2
This visualization is meant to mimic a human neural network. The model consists of three layers: an input layer, an association layer, and an output layer. The input layer takes in raw feature data from the audio, such as loudness, pitch, and timbre. The association layer holds the intermediate chromesthesia associations between audio features and visual features; its main job is to combine multiple audio features and activate the visual image associated with them. The output layer is the visual layer: the activation of its nodes determines the shapes, colors, and intensity the user sees.

When the visualization starts, every node is connected to every node in the adjacent layer (all input nodes to all association nodes, and all association nodes to all output nodes) with randomized initial weights. As the audio plays, these weights can be strengthened by the learning mechanisms in play.

While the audio plays, each input node is turned on in proportion to the strength of its feature in the audio. The resulting signal is transmitted along the edges to the following layers, simulating how action potentials propagate a signal through neurons. If the start and end nodes of an edge are active at the same time, the edge weight increases (Hebbian learning: "nodes that fire together, wire together"), which is why edges grow thicker.
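
The Hebbian rule described above can be sketched as a small weight update. This is only an illustration under stated assumptions: the helper name `hebbian_update` and the `max_weight` cap are hypothetical and do not appear in the notebook, and the default `learning_rate` of 0.01 simply mirrors the default in the `NeuralVisualizer` constructor.

```python
def hebbian_update(weight, pre, post, learning_rate=0.01, max_weight=1.0):
    """Return the edge weight after one Hebbian step (hypothetical helper)."""
    # Strengthen the edge in proportion to the product of the two activations:
    # the increase is largest when the start and end nodes are both strongly active.
    new_weight = weight + learning_rate * pre * post
    # Assumed cap so repeated co-activation cannot grow the weight without bound.
    return min(max_weight, new_weight)


# Both nodes active: the weight grows slightly.
w_active = hebbian_update(0.2, pre=1.0, post=0.5)
# Start node silent: the weight is unchanged.
w_silent = hebbian_update(0.2, pre=0.0, post=0.5)
print(w_active, w_silent)
```

One possible place to apply such an update would be once per frame for every edge, after activations have been propagated, so that edges between co-active nodes gradually thicken.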

Sources:
- https://www.jeremykun.com/2012/12/09/neural-networks-and-backpropagation/?utm_source=chatgpt.com - setting up nodes and edges
- https://github.com/SophieWalden/snakeNeuralNetwork/blob/master/snakeNN.py - neural network example

In [2]:
!pip install librosa soundfile



In [3]:
# import relevant libraries
import numpy as np
import librosa
import librosa.display
import pygame
import time
import matplotlib.pyplot as plt
import random
import sys
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

pygame 2.6.1 (SDL 2.28.4, Python 3.12.2)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [4]:
# implemented same way as from feature extraction doc

# predefined colors for each note (implement user entry later)
NOTE_RGB = {
    "C":  (214, 174, 16),
    "C#": (115, 58, 75),
    "D":  (38, 63, 145),
    "D#": (68, 118, 67),
    "E":  (211, 87, 49),
    "F":  (159, 194, 76),
    "F#": (195, 10, 103),
    "G":  (255, 156, 223),
    "G#": (37, 149, 150),
    "A":  (155, 152, 223),
    "A#": (10, 107, 62),
    "B":  (238, 145, 50)
}
NOTE_NAMES = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"]


def extract_audio_features(audio_path: str, duration: float = None, HOP_LENGTH: int = 2048, FRAME_LENGTH: int = 2048 ):
    """
    This function loads an audio file and extracts audio features: pitch (f0), loudness (RMS energy), and timbre (MFCC) for each beat frame.
    It also normalizes rms and MFCC on a 0-1 scale for visual mapping.
    
    Parameters:
        audio_path: str
            path to any audio file format (.wav, .mp3, .ogg, etc.)
        duration: float, optional
            Duration from the start of the file (seconds). 
            Loads the full track if none. 
        hop_length: int, fixed
            Number of samples between frames.
        frame_length: int, fixed
            Number of samples per frame. 

    Returns:
         audio_features : dict
            times: array of frame timestamps,
            midi: midi number ,
            rms: normalized loudness,
            mfcc: normalized MFCC matrix
    """

    # load audio file
    y, sr = librosa.load(audio_path, duration=duration)

    # pitch (f0) extraction
    f0, voiced_flag, voiced_probs = librosa.pyin(y,
                                             sr=sr,
                                             frame_length = FRAME_LENGTH,
                                             hop_length = HOP_LENGTH,
                                             fmin=librosa.note_to_hz('A0'),
                                             fmax=librosa.note_to_hz('C8'))
    f0 = np.nan_to_num(f0, nan=np.nanmean(f0))  
    # if there are NaN values, replace them with the mean pitch
    midi = librosa.hz_to_midi(f0)
    # convert frequency to midi note number

    # loudness(RMS energy) extraction
    rms = librosa.feature.rms(y=y,frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH)[0]

    # time stamps for mapping to pygame 
    times = librosa.times_like(rms, sr=sr, hop_length=HOP_LENGTH)

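    # normalize numerical features for mapping (0–1 range)
    def normalize(x):
        # use x_min / x_max so the built-in min() and max() are not shadowed
        x_min = np.min(x)
        x_max = np.max(x)
        denom = x_max - x_min
        if denom == 0:
            # constant signal: return zeros to avoid dividing by zero
            return np.zeros_like(x)
        return (x - x_min) / denom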
    
    rms_norm = normalize(rms)

    # dictionary of audio features for the function to return
    audio_features = {
       "times" : times,
       "midi" : midi,
       "rms" : rms_norm,
       "sr" : sr 
       }
    return audio_features
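
The `midi` array returned above is later reduced to a note name and an octave index when nodes are activated. A minimal sketch of that mapping, assuming the same `NOTE_NAMES` ordering as this notebook; the helper name `midi_to_note_and_octave` is hypothetical and is not part of the notebook:

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def midi_to_note_and_octave(midi_value):
    """Map a (possibly fractional) MIDI number to (note name, octave index)."""
    # Round first so the note name and octave index come from the same integer.
    midi_int = int(round(midi_value))
    # The remainder modulo 12 selects the pitch class (note name).
    note_name = NOTE_NAMES[midi_int % 12]
    # Integer division by 12 gives the octave index used by the association layer.
    octave_index = midi_int // 12
    return note_name, octave_index


print(midi_to_note_and_octave(60))    # ('C', 5)
print(midi_to_note_and_octave(69.2))  # ('A', 5)
```

Under this mapping MIDI 60 lands in octave index 5. In the common convention where MIDI 60 is written C4, the index here is one higher than the conventional octave number, which matters only if the octave labels are ever shown to the user.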

In [5]:
class Node:
    # creates class for nodes
    def __init__(self, x, y, node_type="input", feature=None):
        # init function, takes in self, x, y (referring to coordinate positions in x and y planes), what layer the node is a part of
        self.x = x
        # node's x position
        self.y = y
        # refers to node's y position
        self.type = node_type
        # refers to what layer the node is a part of- this neural network will have three layers (input, association, output/visual), helps for display
        self.feature = feature
        # pitch, rms, timbre --> what each node will represent
        self.activation = 0.0
        # current activation level for node
        self.pulse_size = 1.0
        # how much the node will pulse
        self.edges = []
        # initialize empty list to hold edges
        self.glow = 0.0
        # initialize to 0
    def add_edge(self, target_node, weight= 0.1):
        self.edges.append(Edge(self,target_node,weight))
        # create an Edge from this node to the target node and store it in the edge list

class Edge:
    def __init__(self, start_node, end_node, weight= 0.1):
        """
        sends activation from starting node to target node. 
        """
        # constructor: stores the two endpoints of the edge and its weight
        self.start_node = start_node
        # sets start node as where signal begins
        self.end_node = end_node
        # sets end node, where edge will be connected to/end
        self.weight = weight
        # strength of edge/connection between nodes with a higher value meaning more connection (this weight will be modified)
    def propagate(self):
        signal = self.start_node.activation * self.weight
        # self.start_node.activation = activation of the node we are starting at; it comes from the audio and its extracted features
        # weight refers to the strength of the connecting edge between the nodes
        # multiplying them gives the size of the signal: if activation and weight are both high, the signal is large
        # if the weight is low the signal is weak, and if the start node is inactive (or barely active) no signal (or only a very weak one) is sent
        self.end_node.activation = self.end_node.activation + signal
        # add the propagated signal to the activation already present in the end node

def coloradjust(rgb, factor):
    """
    Darken or lighten an RGB color by a multiplicative factor.
    """
    rgb = np.array(rgb, dtype=float)
    # convert the rgb value to a float array so it can be scaled with precision
    newrgb = rgb * factor
    # multiply each channel by the factor (factor > 1 brightens, factor < 1 darkens)
    newrgb[newrgb < 0] = 0
    # clamp values below 0 up to 0 so they stay within the valid rgb range
    newrgb[newrgb > 255] = 255
    # clamp values above 255 down to 255 for the same reason
    return tuple(int(c) for c in newrgb)
    # returns a tuple of plain ints in the 0-255 range
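
The `propagate` method above adds `activation * weight` to the end node every time it is called, and the visualizer below multiplies every activation by 0.85 once per display frame. A toy sketch of how these two steps interact on a single edge follows. It uses stripped-down stand-ins for `Node` and `Edge` (position, type, and glow are omitted), and it assumes the start node is held fully on for every frame, which is a simplification of how the notebook sets input activations.

```python
class Node:
    """Minimal stand-in for the notebook's Node: activation only."""
    def __init__(self):
        self.activation = 0.0


class Edge:
    """Minimal stand-in for the notebook's Edge."""
    def __init__(self, start_node, end_node, weight):
        self.start_node = start_node
        self.end_node = end_node
        self.weight = weight

    def propagate(self):
        # Same rule as the notebook: add activation * weight to the end node.
        self.end_node.activation += self.start_node.activation * self.weight


DECAY = 0.85  # per-frame decay factor used in the update step
source, target = Node(), Node()
edge = Edge(source, target, weight=0.2)

history = []
for _ in range(200):
    source.activation = 1.0      # assumption: the input stays fully on
    edge.propagate()             # push the signal across the edge
    source.activation *= DECAY   # decay both nodes, as the update step does
    target.activation *= DECAY
    history.append(target.activation)

# With a constant input the end node settles where decay balances the inflow:
# a = DECAY * (a + weight)  =>  a = DECAY * weight / (1 - DECAY)
steady_state = DECAY * edge.weight / (1 - DECAY)
print(round(history[0], 4), round(history[-1], 4), round(steady_state, 4))
```

Because the signal accumulates frame after frame, the end node's activation can exceed the start node's even with a small weight; a decay factor closer to 1 raises that plateau and a smaller one lowers it.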



In [6]:
class NeuralVisualizer:
    """
    Create neural visualizer class that will create a visualization of the neural network output. 
    Consists of three layers, input, association, output, each with separate nodes. Meant to simulate a neural network.
    """
    def __init__(self,audio_path,features,height=800,width=1200, fps=60, learning_rate=0.01):
        # init function, takes in the audio path to connect to audio, the features dict extracted above (times, midi, rms, sr), window size, frames per second, and learning rate
        pygame.init()
        # initialize all pygame parts
        pygame.mixer.init()
        # initialize audio playback
        pygame.display.set_caption("Chromesthesia Neural Visualizer")
        # captions with informative title
        self.clock = pygame.time.Clock()
        # control frame rate and time
        self.learning_rate = learning_rate
        # used for learning rate of node associations - will change over time
        self.width = width 
        # set width display
        self.height = height
        # set height display
        self.screen = pygame.display.set_mode((self.width, self.height))
        # this creates the display/window for output and sets it to width/height set above
        self.fps = fps
        # frames per second- controls how often this is updated in the window

        self.start_time = 0
        # when playback starts for audio and visualization
        self.current_frame = 0
        # tracks frames of visualization, initialize to zero to start
        self.running = False
        # don't start running yet

        self.audio_path = audio_path
        # connects to audio
        self.times = features["times"]
        # timestamps for each frame
        self.midi = features["midi"]
        # takes midi notes- pitch in midi numbers associated with color
        self.rms = features["rms"]
        # loudness- used for scaling
        self.sr = features["sr"]
        # sample rate of audio
        

        self.input_nodes = []
        # input layer nodes- represents raw features of audio-  in this case the note
        for i,note in enumerate(NOTE_NAMES):
            x= 100+ i*90
            # horizontal spacing between nodes; 90 px keeps all 12 nodes inside the default 1200 px window (120 px pushed the last two off-screen)
            y= 100
            # vertical spacing 
            self.input_nodes.append(Node(x,y, "input", note))
            # appends empty list with the node x/y position, input label, and note associated with it

        octaves = [1,2,3,4,5,6,7]
        self.association_nodes = []
        # association/middle layer of nodes- create the chromesthesia associations- notes to octaves
        for i,octave in enumerate(octaves):
            x= 150+ i*120
            # again horizontal position
            y= 300
            # again vertical
            self.association_nodes.append(Node(x,y, "association", octave))
            # again append position, label, and octave to association
            
        self.output_nodes = []
        # output/visualization layer- visual representation of nodes
        for i,note in enumerate(NOTE_NAMES):
            x= 100+ i*90
            # again horizontal position
            y= 500
            # again vertical
            self.output_nodes.append(Node(x,y, "output", note))
            # append position, label, and note to output
        self.max_radius = 20
        # just sets a maximum size a node can grow to

        for in_node in self.input_nodes:
            for association_node in self.association_nodes:
                in_node.add_edge(association_node, weight = random.uniform(0.1,0.3))
                # creates an edge from input node to association node with a random weight associated
                # edges represent strength of relationships between notes and octaves
        for association_node in self.association_nodes:
            for out_node in self.output_nodes:
                association_node.add_edge(out_node, weight=random.uniform(0.1,0.3))
                # this one is association node to output node edges with again random weight associated to start with

    
    # next two functions also explained in detail in video output file- just relating to starting/stopping play
    def start(self):
        """Start visualization and play audio."""
        
        # load and play audio
        pygame.mixer.music.load(self.audio_path)
        self.start_time = time.time()
        pygame.mixer.music.play()
        self.running = True
        self.current_frame = 0

        # while the visualization is in action
        while self.running and self.current_frame < len(self.times):
            self.stop()
            self.update()
            self.draw()

            if not pygame.mixer.music.get_busy(): # if the audio is no longer playing
                self.running = False

            pygame.display.flip() # make changes appear on display
            self.clock.tick(self.fps)

        pygame.quit()
        sys.exit()
    def stop(self):
        """Handle quit events."""
        for event in pygame.event.get(): # for each pending event object
            if event.type == pygame.QUIT: # if event is of type QUIT (eg. close window)
                self.running = False
                pygame.mixer.music.stop() 
                pygame.quit()
                sys.exit()
    
    def update(self):
        """ 
        Update node and edge activation based on audio
        """
        current_time = time.time()-self.start_time
        # calculate current time- take time right now and subtract by start time (0:0)
        while self.current_frame < len(self.times) and current_time >= self.times[self.current_frame]:
            # loop while the current frame index is less than the total number of frames and
            # the elapsed playback time has reached the timestamp of the current frame --> ensures every frame that is due gets triggered
            midi_val = int(round(self.midi[self.current_frame]))
            # MIDI value for this specific frame (indexed), rounded to an int
            note_name = NOTE_NAMES[midi_val % 12]
            # map the MIDI value to a note name using the remainder after division by 12 (number of notes)
            octave_val = midi_val // 12
            # get octave number by integer division of the MIDI value by 12 (so MIDI 60 --> octave value 5 in this mapping)
            for node in self.input_nodes:
                # activate the input node whose feature matches the note; every other input node is set to 0 activation
                node.activation = 1.0 if node.feature == note_name else 0.0
            for node in self.association_nodes:
                # activate the association node whose feature equals the octave value (taken from the rounded MIDI value above so note and octave agree); otherwise 0 activation
                node.activation = 1.0 if node.feature == octave_val else 0.0
            self.current_frame = self.current_frame + 1
            # increase current frame by one
            
        for node in self.input_nodes + self.association_nodes:
            # for all the nodes in the input and association layers
            for edge in node.edges:
                # for all the outgoing edges for the specific node, propagate activation forward to next node
                edge.propagate()
        
        for node in self.input_nodes + self.association_nodes + self.output_nodes:
            # for every node in all three layers
            node.activation = node.activation*0.85
            # node activation is reduced for each frame by 0.85 factor

        current_frame_index = max(0,self.current_frame-1)
        # use frame index one prior, but never less than 0
        for node in self.output_nodes:
            # for each node in output layer
            if node.feature == NOTE_NAMES[int(round(self.midi[current_frame_index])) %12]:
                # if the node's feature matches the note name for this frame (rounded MIDI value modulo 12)
                node.activation = self.rms[current_frame_index]
                # then node activation is the self rms value for the current frame
                # activation based on loudness // controls pulsing strength

    def draw(self):
        # draw on pygame display so user has visual output
        self.screen.fill((0,0,0))

        font = pygame.font.SysFont("Arial", 15, bold=True)
        # set font to be arial
        input_label = font.render("Input Layer (Notes-MIDI)", True, (200,200, 255))
        association_label = font.render("Association Layer (Note-Octaves)", True, (200,200, 255))
        output_label = font.render("Output Layer (Visualization)", True, (200,200, 255))
        self.screen.blit(input_label, (self.width//2 - input_label.get_width()//2, 40))
        self.screen.blit(association_label, (self.width//2 + 400 - association_label.get_width()//2, 300))
        self.screen.blit(output_label, (self.width//2 - output_label.get_width()//2, 650))



        for node in self.input_nodes + self.association_nodes:
            # loop through each node in input/association layers
            for edge in node.edges:
                # each outgoing edge for the node
                color_intensity = int(edge.weight * 200)
                # edge weight converted to intensity value for color (multiplied by 200, then clamped to the 0-255 rgb range)
                # controls brightness of edge
                color_intensity = max(0,min(255, color_intensity))
                # ensures the value stays between 0 and 255
                edge_color = (0,0, color_intensity)
                # edge color set to be the blue color intensity value 
                pygame.draw.line(self.screen, edge_color, (int(edge.start_node.x), int(edge.start_node.y)), (int(edge.end_node.x), int(edge.end_node.y)), max(1, int(edge.weight*8)))
                # draw the edge in that color from the start node's x,y coordinate to the end node's x,y coordinate; line width is the edge weight scaled by 8, with a minimum of 1

        for node in self.input_nodes + self.association_nodes:
            base_radius = 5
            # if node is inactive then radius is 5
            radius = base_radius + int(node.activation*10)
            # radius changed to base radius plus the node activation value multipled by ten (int value to avoid error)
            color = (0,0,245)
            # color set to be blue for all of them
            pygame.draw.circle(self.screen,color, (int(node.x), int(node.y)), radius)
            # draw a circle for each node for input/association nodes for this x,y coordinate and radius

        
        for node in self.output_nodes:
            # for output nodes, glow reset to 0 each time
            node.glow = 0
        for association_node in self.association_nodes:
            if association_node.activation > 0:
                for edge in association_node.edges:
                    # if the association node is active, add its activation to the glow of each connected output node, capped at 1
                    edge.end_node.glow = min(1.0, edge.end_node.glow + association_node.activation)
        for node in self.output_nodes:
            # scale the glow by a 0.95 factor; glow is reset to 0 at the start of each draw call, so this dampens the value rather than decaying it across frames
            node.glow = node.glow * 0.95
                    
        
        for node in self.input_nodes + self.association_nodes + self.output_nodes:
            # for nodes in all layers
            if node.type == "input" or node.type == "association":
                # if node is input or association labelled
                radius = 8 if node.activation > 0 else 5
                # if node is activated radius is 8 otherwise it stays 5 (on/off pulsing with audio)
                color = (0,0,245)
                # color set to a blue value
                pygame.draw.circle(self.screen,color,(int(node.x), int(node.y)), radius)
                # draw nodes at x/y coordinate with set radius
            elif node.type == "output":
                # but if output labelled
                radius = max(8, int(8+node.activation*15))
                # then radius changes based on node activation value
                color = NOTE_RGB[node.feature]
                # color is based on the color mapping from up above
                factor = 1 + node.glow
                # factor is 1 plus the node glow value 
                color = coloradjust(color, factor)
                # then use color adjust function
                pygame.draw.circle(self.screen, color, (int(node.x), int(node.y)), radius)
                # draw circle with set color and coordinate and radius

        

# test case
audio_path = "emotional-piano-005-am-80-97777.mp3"
# audio set to be simple piano chords
features = extract_audio_features(audio_path, duration=30)
# extract features
product = NeuralVisualizer(audio_path, features)
# use the neural visualizer!
product.start()
# start



SystemExit: 

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
