# Network Model 2
This visualization is meant to mimic a human neural network. The model consists of three layers: an input layer, an association layer, and an output layer.

- The input layer takes in raw feature data from the audio, such as loudness, pitch, and timbre.
- The association layer holds the intermediate chromesthesia associations between input features and visual features. It mostly combines multiple audio features to activate a visual image associated with those features.
- The output layer is the visual layer: the activation of its nodes determines what shapes, colors, and intensity the user sees.

When the visualization starts, every node is connected to every node in the adjacent layer (all input nodes to all association nodes, and all association nodes to all output nodes) with randomized initial weights. As the audio plays, those weights can be strengthened by the learning mechanisms in play.

While the audio plays, each input node is activated in proportion to the strength of its feature in the audio. That signal is then transmitted to the following layers along the edges, simulating how action potentials propagate a signal through neurons. If the start and end nodes of an edge are active at the same time, the edge weight increases (Hebbian learning: "nodes that fire together, wire together"), which is why edges grow thicker.

Sources:
- https://www.jeremykun.com/2012/12/09/neural-networks-and-backpropagation/ (setting up nodes and edges)
- https://github.com/SophieWalden/snakeNeuralNetwork/blob/master/snakeNN.py (neural network example)

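The Hebbian rule described in the overview can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the notebook's implementation: the `NeuralVisualizer` class in this notebook stores a `learning_rate` but its `update` method does not modify edge weights. The function name `hebbian_update`, the `threshold` of 0.1, and the `max_weight` cap are all hypothetical choices made for the example.

```python
def hebbian_update(weight, pre_activation, post_activation,
                   learning_rate=0.01, threshold=0.1, max_weight=1.0):
    """Return the new edge weight after one Hebbian step.

    The weight grows only when both the start (pre) and end (post) nodes
    are active above `threshold`; the growth is proportional to the
    product of the two activations. `threshold` and `max_weight` are
    illustrative assumptions, not values from the notebook.
    """
    if pre_activation > threshold and post_activation > threshold:
        weight += learning_rate * pre_activation * post_activation
    # cap the weight so repeated co-activation cannot grow it without bound
    return min(weight, max_weight)


# both nodes active: the connection strengthens
strengthened = hebbian_update(0.2, pre_activation=0.9, post_activation=0.8)
# only the start node active: the weight is unchanged
unchanged = hebbian_update(0.2, pre_activation=0.9, post_activation=0.0)
print(strengthened > 0.2, unchanged == 0.2)
```

One way to use this would be to call it for every edge once per frame, after activations have been propagated, and to assign the result back to `edge.weight`.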
In [2]:
!pip install librosa soundfile



In [3]:
# import relevant libraries
import numpy as np
import librosa
import librosa.display
import pygame
import time
import matplotlib.pyplot as plt
import random
import sys
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

pygame 2.6.1 (SDL 2.28.4, Python 3.12.2)
Hello from the pygame community. https://www.pygame.org/contribute.html


In [4]:
# implemented same way as from feature extraction doc

# predefined colors for each note (implement user entry later)
NOTE_RGB = {
    "C":  (214, 174, 16),
    "C#": (115, 58, 75),
    "D":  (38, 63, 145),
    "D#": (68, 118, 67),
    "E":  (211, 87, 49),
    "F":  (159, 194, 76),
    "F#": (195, 10, 103),
    "G":  (255, 156, 223),
    "G#": (37, 149, 150),
    "A":  (155, 152, 223),
    "A#": (10, 107, 62),
    "B":  (238, 145, 50)
}
NOTE_NAMES = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"]

# predefined cluster shapes 
CLUSTER_SHAPES = {
    0: "diamond", 
    1: "circle",    
    2: "wave",        
    }

def extract_audio_features(audio_path: str, duration: float = None, HOP_LENGTH: int = 2048, FRAME_LENGTH: int = 2048 ):
    """
    Load an audio file and extract per-frame audio features: pitch (f0), loudness (RMS energy), and timbre (MFCC).
    RMS and MFCC are normalized to a 0-1 scale for visual mapping, and the MFCC frames are clustered into 3 timbre groups.
    
    Parameters:
        audio_path: str
            Path to an audio file (.wav, .mp3, .ogg, etc.).
        duration: float, optional
            Number of seconds to load from the start of the file.
            Loads the full track if None.
        HOP_LENGTH: int
            Number of samples between frames.
        FRAME_LENGTH: int
            Number of samples per frame.

    Returns:
         audio_features : dict
            times: array of frame timestamps,
            midi: MIDI note number per frame,
            rms: normalized loudness,
            mfcc: normalized MFCC matrix,
            cluster_labels: K-Means timbre cluster (0, 1, or 2) per frame,
            sr: sample rate of the loaded audio
    """

    # load audio file
    y, sr = librosa.load(audio_path, duration=duration)

    # 1. Pitch (f0) extraction
    f0, voiced_flag, voiced_probs = librosa.pyin(y,
                                             sr=sr,
                                             frame_length = FRAME_LENGTH,
                                             hop_length = HOP_LENGTH,
                                             fmin=librosa.note_to_hz('A0'),
                                             fmax=librosa.note_to_hz('C8'))
    f0 = np.nan_to_num(f0, nan=np.nanmean(f0))  
    # if there are NaN values, replace them with the mean pitch
    midi = librosa.hz_to_midi(f0)
    # convert frequency to midi note number

    # 2. Loudness(RMS energy) extraction
    rms = librosa.feature.rms(y=y,frame_length=FRAME_LENGTH, hop_length=HOP_LENGTH)[0]
    
    # 3. Timbre (MFCC) extraction
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, hop_length=HOP_LENGTH)

    # time stamps for mapping to pygame 
    times = librosa.times_like(rms, sr=sr, hop_length=HOP_LENGTH)

    # normalize numerical features for mapping (0-1 range)
    def normalize(x):
        # local names avoid shadowing the built-in min() and max()
        lo = np.min(x)
        hi = np.max(x)
        denom = hi - lo
        if denom == 0:
            return np.zeros_like(x)  
        return (x - lo) / denom
    
    rms_norm = normalize(rms)
    mfcc_norm = np.apply_along_axis(normalize, 1, mfcc)
    
    # K-Means Clustering on MFCC to classify timbre into 3 general groups
    mfcc_T = mfcc_norm.T   # take transpose of MFCC so each row is a time stamp and cols are the 13 features
    scaler = StandardScaler()  
    mfcc_scaled = scaler.fit_transform(mfcc_T) # standardize each coefficient
    kmeans = KMeans(n_clusters=3, random_state=0)
    cluster_labels = kmeans.fit_predict(mfcc_scaled)

    # dictionary of audio features for the function to return
    audio_features = {
       "times" : times,
       "midi" : midi,
       "rms" : rms_norm,
       "mfcc" : mfcc_norm,
       "cluster_labels" : cluster_labels,
       "sr" : sr 
       }
    return audio_features

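`NOTE_NAMES` and `NOTE_RGB` are defined in the cell above, but the lookup from a MIDI number to a note color is not shown there. A minimal sketch of that lookup follows, assuming the standard convention that the pitch class is the MIDI number modulo 12 with 0 meaning C. The helper name `midi_to_note_color` is hypothetical; the two tables are copied from the cell above so the block runs on its own.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
NOTE_RGB = {
    "C": (214, 174, 16), "C#": (115, 58, 75), "D": (38, 63, 145),
    "D#": (68, 118, 67), "E": (211, 87, 49), "F": (159, 194, 76),
    "F#": (195, 10, 103), "G": (255, 156, 223), "G#": (37, 149, 150),
    "A": (155, 152, 223), "A#": (10, 107, 62), "B": (238, 145, 50),
}


def midi_to_note_color(midi_value):
    """Map a (possibly fractional) MIDI number to (note name, RGB)."""
    # round to the nearest semitone, then fold all octaves onto 0-11
    pitch_class = int(round(midi_value)) % 12  # 0 = C, 1 = C#, ..., 11 = B
    name = NOTE_NAMES[pitch_class]
    return name, NOTE_RGB[name]


print(midi_to_note_color(69))    # MIDI 69 is A4 (440 Hz)
print(midi_to_note_color(60.2))  # a slightly sharp middle C
```

Rounding before the modulo matters because `librosa.hz_to_midi` returns fractional values for pitches that fall between semitones.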
In [5]:
class Node:
    # creates class for nodes
    def __init__(self, x, y, node_type="input", feature=None):
        # init function, takes in self, x, y (referring to coordinate positions in x and y planes), what layer the node is a part of
        self.x = x
        # node's x position
        self.y = y
        # refers to node's y position
        self.type = node_type
        # refers to what layer the node is a part of- this neural network will have three layers (input, association, output/visual), helps for display
        self.feature = feature
        # pitch, rms, timbre --> what each node will represent
        self.activation = 0.0
        # current activation level for node
        self.pulse_size = 1.0
        # how much the node will pulse
        self.edges = []
        # initialize empty list to hold edges
    def add_edge(self, target_node, weight= 0.1):
        self.edges.append(Edge(self,target_node,weight))
        # create an Edge from this node to target_node and store it in this node's edge list

class Edge:
    def __init__(self, start_node, end_node, weight= 0.1):
        """
        sends activation from starting node to target node. 
        """
        # this function describes how to add edges
        self.start_node = start_node
        # sets start node as where signal begins
        self.end_node = end_node
        # sets end node, where edge will be connected to/end
        self.weight = weight
        # strength of edge/connection between nodes with a higher value meaning more connection (this weight will be modified)
    def propagate(self):
        signal = self.start_node.activation * self.weight
        # self.start_node.activation = activation of the node we are starting at; it comes from the audio and its extracted features
        # weight refers to strength of connective edges between nodes
        # multiplying them together to get the size of signal - if activation and weight are both high, then there's a big signal
        # if weight is low, then there's a weak signal, but if the node isn't activated at all (or weak) then no signal sent (or just super weak)
        self.end_node.activation = self.end_node.activation + signal
        # adds the already existent activity in the ending node of the edge and adds the signal propagating through to it


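To make the arithmetic in `propagate` concrete, here is a small worked example with two source nodes feeding one target. It is a sketch only: the classes below are stripped-down stand-ins for the `Node` and `Edge` classes above (positions and display fields are omitted), and the activations and weights are made-up values.

```python
class Node:
    """Stripped-down stand-in for the Node class above (position omitted)."""
    def __init__(self):
        self.activation = 0.0
        self.edges = []

    def add_edge(self, target_node, weight=0.1):
        self.edges.append(Edge(self, target_node, weight))


class Edge:
    def __init__(self, start_node, end_node, weight=0.1):
        self.start_node = start_node
        self.end_node = end_node
        self.weight = weight

    def propagate(self):
        # signal = source activation scaled by connection strength
        self.end_node.activation += self.start_node.activation * self.weight


loud, quiet, target = Node(), Node(), Node()
loud.activation, quiet.activation = 0.8, 0.1
loud.add_edge(target, weight=0.25)
quiet.add_edge(target, weight=0.25)

for source in (loud, quiet):
    for edge in source.edges:
        edge.propagate()

# target receives 0.8 * 0.25 + 0.1 * 0.25
print(round(target.activation, 3))
```

With equal weights, the strongly activated node contributes eight times as much signal as the weakly activated one, which is the behavior the comments in `propagate` describe.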
In [7]:
class NeuralVisualizer:
    """
    Create neural visualizer class that will create a visualization of the neural network output. 
    Consists of three layers, input, association, output, each with separate nodes. Meant to simulate a neural network.
    """
    def __init__(self,audio_path,features,height=800,width=1000, fps=60, learning_rate=0.01):
        # init function, takes in audio path to connect to audio, features as created above (shapes, colors, etc), size of image, frames per second, and learning rate
        pygame.init()
        # initialize all pygame parts
        pygame.mixer.init()
        # initialize audio playback
        pygame.display.set_caption("Chromesthesia Neural Visualizer")
        # captions with informative title
        self.clock = pygame.time.Clock()
        # control frame rate and time
        self.learning_rate = learning_rate
        # learning rate for node associations (how quickly connection strengths are meant to change over time)
        self.width = width 
        # set width display
        self.height = height
        # set height display
        self.screen = pygame.display.set_mode((self.width, self.height))
        # this creates the display/window for output and sets it to width/height set above
        self.fps = fps
        # frames per second- controls how often this is updated in the window

        self.start_time = 0
        # when playback starts for audio and visualization
        self.current_frame = 0
        # tracks frames of visualization, initialize to zero to start
        self.running = False
        # don't start running yet

        self.audio_path = audio_path
        # connects to audio
        self.times = features["times"]
        # timestamps for each frame
        self.midi = features["midi"]
        # takes midi notes- pitch in midi numbers associated with color
        self.rms = features["rms"]
        # loudness- used for scaling
        self.mfcc = features["mfcc"]
        # timbre coefficients
        self.sr = features["sr"]
        # sample rate of audio
        self.cluster_labels = features["cluster_labels"]
        # timbre clusters 

        self.input_nodes = []
        # input layer nodes- represents raw features of audio- pitch, timbre, loudness
        self.association_nodes = []
        # association/middle layer of nodes- create the chromesthesia associations- color, shape
        self.output_nodes = []
        # output/visualization layer- visual representation of nodes
        self.max_radius = 20
        # just sets a maximum size a node can grow to

        
        n_inputs = 17
        # pitch nodes = 12, timbre clusters = 3, rhythm = 1, loudness=1 ... add together
        for i in range(n_inputs):
            # create one node per input feature
            x = random.randint(50, width-50)
            # choose a random integer x position between 50 and width-50; the 50-pixel padding is arbitrary and keeps the node within the window
            y = random.randint(50, height//3)
            # the y coordinate is also randomized: 50 is the same arbitrary padding at the top, and height//3 (integer division)
            # keeps all nodes of the first layer
            # in the top third of the screen
            self.input_nodes.append(Node(x,y,"input"))
            # appends and adds this node to the input nodes at the x,y coordinate randomized above and labels it as input

        association_inputs = 8
        # associations to draw from raw feature data 
        # for instance: Pitch + Timbre -> color // Pitch + Rhythm -> movement //Loudness + Timbre -> shape size // Timbre cluster -> shape type
        # these are varied combinations of raw features, each mapped to the association it would trigger in the mind of a person with chromesthesia
        # more associations means more complex
        for i in range(association_inputs):
            # loop through association input amount
            x = random.randint(100, width-100)
            # same as before, randomize x position 
            # bigger padding since the nodes from above will converge onto less nodes here
            y = random.randint(height//3, height//2)
            # again, randomize the y position, this time within the band between one third and one half of the display height
            # so that the association nodes sit in the middle layer
            # and contribute to the downward vertical flow
            self.association_nodes.append(Node(x,y,"association"))
            # add the node from this iteration into the association nodes with randomized position and association label

        output_layer = 12
        # 12 outputs- correspond to visual elements users will see (color, shape) (again, higher value, means more complex)
        for i in range(output_layer):
            # loop through output layer amount
            x = random.randint(50,width-50)
            # again, randomly set x coordinate
            # made the value 50 again since the output layer has more nodes than middle layer, richer
            y = random.randint(height//2, height-50)
            # randomize y coordinate in the bottom half of the display (integer division), with padding so it doesn't pass the edge of the display
            self.output_nodes.append(Node(x,y,"output"))
            # append node to output nodes with randomized location and output label

        for inputs in self.input_nodes:
            # loop through each input layer node
            for association in self.association_nodes:
                # then loop through each association node
                inputs.add_edge(association, weight=random.uniform(0.02,0.5))
                # add an edge between input and association nodes
                # for now, random weights to simulate how each node is not connected with same strength in our human mind
        for association in self.association_nodes:
            # loop through association layer nodes
            for output in self.output_nodes:
                # then loop through each output node
                association.add_edge(output, weight=random.uniform(0.02,0.5))
                # again add an edge, connect association to output layer with randomly assigned weights
    
    # next two functions also explained in detail in video output file- just relating to starting/stopping play
    def start(self):
        """Start visualization and play audio."""
        
        # load and play audio
        pygame.mixer.music.load(self.audio_path)
        self.start_time = time.time()
        pygame.mixer.music.play()
        self.running = True
        self.current_frame = 0

        # while the visualization is in action
        while self.running and self.current_frame < len(self.times):
            self.stop()
            self.update()
            self.draw()

            if not pygame.mixer.music.get_busy(): # if the audio is no longer playing
                self.running = False

            pygame.display.flip() # make changes appear on display
            self.clock.tick(self.fps)

        pygame.quit()
        sys.exit()
    def stop(self):
        """Handle quit events."""
        for event in pygame.event.get(): # for every pending event object 
            if event.type == pygame.QUIT: # if event is of type QUIT (eg. close window)
                self.running = False
                pygame.mixer.music.stop() 
                pygame.quit()
                sys.exit()
    
    def update(self):
        """ 
        Update node and edge activation based on audio
        """
        current_time = time.time()-self.start_time
        # calculate current time- take time right now and subtract by start time (0:0)
        while self.current_frame < len(self.times) and current_time >= self.times[self.current_frame]:
            # loop while the current frame index is less than the total number of frames and 
            # the elapsed playback time has reached the timestamp of the current frame --> ensures every frame that is due gets triggered
            pitch_node_idx = int(round(self.midi[self.current_frame])) % 12
            # map the MIDI note of the current frame to its pitch class (0-11, C through B), one per pitch input node
            # self.midi is the array of MIDI numbers extracted from the features; indexing at the current frame gives the MIDI value at this time
            # rounded and cast to int so it can be used as a list index
            self.input_nodes[pitch_node_idx].activation = self.rms[self.current_frame]
            # set the activation of that pitch node to the current frame's rms
            # the modulo 12 above keeps the index inside the 12 pitch nodes, so it never lands on a timbre, rhythm, or loudness node
            cluster_idx = self.cluster_labels[self.current_frame]
            # find the cluster index by taking the cluster label (from k-means clustering of mfcc, which is 0, 1, or 2) for the current frame
            self.input_nodes[12 + cluster_idx].activation = self.rms[self.current_frame]
            # indexing input nodes, 12 represent the pitch (midi notes) and the cluster index is based on the three timbre options, so 
            # this is indexing one of the timbre nodes and activating it based on current frame rms
            self.input_nodes[-1].activation = self.rms[self.current_frame]
            # last node in input layer is for loudness
            # set to be activated based on the current frame's loudness
            self.current_frame = self.current_frame + 1
            # then increase the current frame by one for next loop
            
        for node in self.input_nodes + self.association_nodes:
            # for all the nodes in the input and association layers
            for edge in node.edges:
                # for all the edges for the specific node
                edge.end_node.activation = edge.end_node.activation + node.activation * edge.weight * 3.0 
                # the end node's activation is increased by the current node's activation multiplied by the weight of the edge (scaled by 3.0)
                edge.end_node.activation = min(edge.end_node.activation, 1.0) 
                # the end node activation capped at 1
        
        for node in self.input_nodes + self.association_nodes + self.output_nodes:
            # for every node in all three layers
            node.activation = node.activation*0.85
            # the node activation is reduced slightly each frame (fading out) by multiplying the current activation by 0.85, which gives a smoother decay

    def draw(self):
        # draw on pygame display so user has visual output
        self.screen.fill((0,0,0))
        #clear display to black at the start of the frame
        layer_colors = { "input": (0,128, 255), "association":(255, 128, 0), "output": (128,255,0) }
        # layer base colors, input is blue, association is orange, output is green        
        
        for node in self.input_nodes + self.association_nodes:
            # for node in input and association node displays
            for edge in node.edges:
                # all edges for current node
                color_intensity = int(edge.weight*200)
                # scale the edge weight into an integer color intensity (a factor of 200 rather than 255 keeps strong edges from washing out to white)
                color_intensity = max(0,min(255,color_intensity))
                # clamp the intensity to the valid 0-255 range
                if node.type == "input":
                    # if the node is in the input layer
                    edge_color = (0,0,color_intensity)
                    # change the intensity of the blue part of rgb -> shades of blue
                else:
                    # the only other layer looped through here is association
                    edge_color = (color_intensity, 128,0)
                    # alter edge color based on color intensity of red -> shades of orange
                pygame.draw.line(self.screen, edge_color, (int(edge.start_node.x), int(edge.start_node.y)),(int(edge.end_node.x), int(edge.end_node.y)), max(1, int(edge.weight*3)))
                # self.screen is where we draw the line
                # edge_color was chosen above: shades of blue for input edges, shades of orange for association edges
                # the line starts at the start node's coordinates and ends at the end node's coordinates
                # the width scales with the edge weight (multiplied by 3), with a minimum of 1 pixel so every line is visible
        for node in self.input_nodes + self.association_nodes + self.output_nodes:
            #iterate over all nodes including output nodes now
            radius= max(1,min(self.max_radius, int(self.max_radius * node.activation +5)))
            # choose circle radius: 1 is the minimum radius (to make sure it shows up) and self.max_radius (set above) is the maximum; in between it is
            # int(self.max_radius * node.activation + 5), where the + 5 keeps low-activation nodes noticeable
            intensity = max(50, min(255, int(255*node.activation)))
            # visibility/intensity of color
            if node.type == "input":
                color = (0,0,intensity)
                # shades of blue altered in rgb
            if node.type == "association":
                color = (intensity,128,0)
                # shades of orange altered by altering red in rgb
            if node.type == "output":
                color = (128,intensity, 0)
                # shades of green altered in rgb
            if node.type == "input":
                pygame.draw.circle(self.screen,color,(int(node.x),int(node.y)), radius)
            elif node.type == "association":
                size = radius * 2
                pygame.draw.rect(self.screen,color,(int(node.x - radius), int(node.y - radius), size, size))
            elif node.type == "output":
                pygame.draw.circle(self.screen,color,(int(node.x),int(node.y)), radius)

# test case
audio_path = "the-best-jazz-club-in-new-orleans-164472.mp3"
# audio set to be jazz music
features = extract_audio_features(audio_path, duration=30)
# extract features
product = NeuralVisualizer(audio_path, features)
# use the neural visualizer!
product.start()
# start

    


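The `update` method advances `current_frame` one step at a time until it catches up with the playback clock. The same catch-up logic can also be expressed as a single binary search over the timestamps. This is a minimal sketch, assuming `times` is sorted in ascending order; the helper name `frames_due` and the sample timestamps are hypothetical.

```python
from bisect import bisect_right


def frames_due(times, elapsed, current_frame):
    """Return the index of the first frame whose timestamp is still in the future.

    Frames in range(current_frame, result) have timestamps <= elapsed and
    should be processed now.
    """
    return bisect_right(times, elapsed, lo=current_frame)


# timestamps roughly 0.093 s apart (a hop of 2048 samples at 22050 Hz)
times = [0.0, 0.093, 0.186, 0.279, 0.372]
next_frame = frames_due(times, elapsed=0.2, current_frame=0)
print(list(range(0, next_frame)))  # indices of the frames to process
```

Knowing the full range of due frames up front makes it possible to process only the most recent one when the display falls behind the audio, instead of replaying every skipped frame.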