<a href="https://colab.research.google.com/github/saidyaka/saywar.github.io/blob/master/Instrument_Classifier.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Install needed code and data

We will need the following:
* Deep learning package [PyTorch](https://pytorch.org/)
* Sound file reading software [PySoundFile](https://pysoundfile.readthedocs.io/en/latest/)
* Data plotting package [Plotly](https://plotly.com/)
* Command-line audio processing software [SoX](https://pypi.org/project/sox/)
* Tool for audio file format conversion [ffmpeg](https://ffmpeg.org/)

In [None]:
%%capture
# The %%capture cell magic (which must be the first line of the cell) keeps the output
# of the install process from showing on the screen. Delete it if you want to see that output.
!pip install "torchtext==0.8.0"
!pip install "torchvision==0.8.2"
!pip install "torch==1.7.1"
!pip install "pytorch-lightning==1.2.2" 
!pip install pysoundfile
!pip install plotly
!apt-get install sox libsox-dev libsndfile1 libsndfile1-dev
!apt-get install ffmpeg

For this notebook, we'll use [a subset of the Philharmonia dataset](https://github.com/hugofloresgarcia/philharmonia-dataset/tree/master) that contains 19 instrument classes. Run the following code to install the dataset software and a [PyTorch port of OpenL3](https://github.com/hugofloresgarcia/torchopenl3) from their respective GitHub repositories.


In [None]:
%%capture
# The %%capture cell magic (which must be the first line of the cell) keeps the output
# of the install process from showing on the screen. Delete it if you want to see that output.
%load_ext autoreload

# Create local directories to hold the data and the trained embedding model 
!mkdir lib 
!mkdir data

# install torchopenl3
!git clone https://github.com/hugofloresgarcia/torchopenl3 lib/torchopenl3
!cd lib/torchopenl3 && pip install -e . && cd ../..

# install philharmonia dataset custom dataset software
!git clone https://github.com/hugofloresgarcia/philharmonia-dataset lib/philharmonia
!cd lib/philharmonia && pip install -e . && cd ../..

Once you've installed from the GitHub repositories, you may need to restart the runtime so that it recognizes the newly installed packages.

In [None]:
# restart runtime
import os
os.kill(os.getpid(), 9)

I found that, after the kernel restart, packages installed with apt-get need to 
be reinstalled, so the next cell repeats those installs.
NOTE: You may need to run this next cell twice, if you're in Colab.

In [None]:
# Try to install Sox AGAIN
# NOTE: You may need to run this twice, if you're in Colab.
!apt-get install sox libsox-dev libsndfile1 libsndfile1-dev
##!apt-get install sox libsox-dev libsndfile libsndfile-dev
!apt-get install ffmpeg

Reading package lists... Done
Building dependency tree       
Reading state information... Done
libsndfile1 is already the newest version (1.0.28-4ubuntu0.18.04.1).
libsndfile1-dev is already the newest version (1.0.28-4ubuntu0.18.04.1).
libsox-dev is already the newest version (14.4.2-3ubuntu0.18.04.1).
sox is already the newest version (14.4.2-3ubuntu0.18.04.1).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.
Reading package lists... Done
Building dependency tree       
Reading state information... Done
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
0 upgraded, 0 newly installed, 0 to remove and 29 not upgraded.


Now we can load the dataset. Note that this is a PyTorch dataset object. It is worth your time to read up on PyTorch datasets and dataloaders. Here's a [tutorial](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html) on the basic use of dataloaders and datasets. Here's [another tutorial](https://pytorch.org/tutorials/recipes/recipes/loading_data_recipe.html) that explains how to make custom datasets.

*NOTE 1: You may need to try a couple of times to install SoX (above) before the dataset creation works.*

*NOTE 2: Although you downloaded the dataset software in a previous step, you are now going to download the audio and load it into a data structure. As a step in this, each of the .mp3 files in the dataset is converted to .wav format. This takes time: about 20 minutes if you're on Colab. Be prepared.*

In [None]:

# Import Pytorch and its dataset module
import torch
import torch.utils.data as Data

from philharmonia_dataset import PhilharmoniaDataset
SAMPLE_RATE = 48000 # required for pretrained OpenL3

# create a PyTorch dataset object
dataset = PhilharmoniaDataset(
    root='./data/philharmonia',
    download=True,
    sample_rate=SAMPLE_RATE,
)

print('the size of my dataset is ', len(dataset))

Streaming output truncated to the last 5000 lines.
INFO:root:processing: violin_D5_025_mezzo-piano_arco-normal.mp3
INFO:root:processing: english-horn_Gs3_05_pianissimo_normal.mp3
INFO:root:processing: violin_Cs6_1_mezzo-piano_non-vibrato.mp3
INFO:root:processing: english-horn_Gs5_05_forte_normal.mp3
INFO:root:processing: violin_Fs7_1_piano_arco-normal.mp3
INFO:root:processing: english-horn_E3_phrase_mezzo-forte_fluttertonguing.mp3
INFO:root:processing: violin_Cs6_1_piano_arco-col-legno-tratto.mp3
INFO:root:processing: english-horn_Ds4_15_fortissimo_normal.mp3
INFO:root:processing: violin_C5_1_mezzo-piano_con-sord.mp3
INFO:root:processing: english-horn_As5_05_pianissimo_normal.mp3
INFO:root:processing: violin_G5_phrase_mezzo-forte_arco-spiccato.mp3
INFO:root:processing: english-horn_Ds5_05_pianissimo_normal.mp3
INFO:root:processing: violin_As3_1_mezzo-forte_arco-sul-ponticello.mp3
INFO:root:processing: english-horn_Fs5_1_fortissimo_normal.mp3
INFO:root:processing: violin_G

the size of my dataset is  13531


## Loading, encoding and examining the Philharmonia Dataset (8 points)


#### 7. (1 point) Write a function that lets a user see the spectrogram and hear the audio for a single example, in accordance with the provided docstring.  (*Hint: librosa, IPython.display and matplotlib are helpful here*) 

In [None]:
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt
import IPython
from IPython.display import Audio

def LookAndListen(item):
  """ Display the spectrogram for an item in the Philharmonia dataset and create a 
  play button to hear the example. The instrument, pitch, and dynamic are also printed 
  in the title of the spectrogram.

  Parameters
  ----------
  item      - An item in the Philharmonia dataset

  """
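  # The dataset was loaded at SAMPLE_RATE (48 kHz), so plot and play back at that rate.
  sr = SAMPLE_RATE
  audio = np.squeeze(np.asarray(item.get("audio")))
  instrument = item.get("instrument")
  pitch = item.get("pitch")
  # NOTE: the "dynamic" key is assumed from the docstring; .get returns None if it is absent.
  dynamic = item.get("dynamic")

  # Magnitude spectrogram in dB, relative to the loudest bin
  spec = librosa.stft(audio)
  spec_db = librosa.amplitude_to_db(np.abs(spec), ref=np.max)

  plt.figure()
  librosa.display.specshow(spec_db, sr=sr, x_axis='time', y_axis='log')
  plt.colorbar(format='%+2.0f dB')
  plt.title(f"{instrument}, pitch {pitch}, dynamic {dynamic}")
  plt.show()

  # Play button for the example
  IPython.display.display(Audio(audio, rate=sr))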

# Your code goes here.

In [None]:
# Leave this here. It's an easy way to test your LookAndListen function
import random
# randrange excludes the upper bound, so the index is always valid
item = dataset[random.randrange(len(dataset))]
LookAndListen(item)


#### Some helpful code to understand how to do embeddings


OpenL3 comes in a couple of flavors. We can choose from:

- **input representations**: `mel128` or `mel256`. `linear` coming soon
- **content types**: `music` or `env`. The `music` model variant was trained on music, while the `env` was trained on environmental sounds. 
- **embedding size**: output embedding size. Either 512 or 6144. 

Let's load a model! We will choose the `mel128`, `music`, `512` variant.

In [None]:
import torchopenl3
import torch

model = torchopenl3.OpenL3Embedding(input_repr='mel128', 
                                    embedding_size=512, 
                                    content_type='music')

We can use `torchopenl3.embed` to compute an embedding from a `numpy` array that we assume to be an audio waveform. 

It calculates a 512-dimensional embedding for each 1 second of audio, with a hop size defined by the `hop_size` argument (in seconds). We use a hop size of 1 second. 

Each audio file in this dataset has a different duration, so the model will compute multiple embeddings for files longer than 1 second.  Below is example code to compute the embedding for a single file in the dataset.

In [None]:
import numpy as np

# If you have a GPU available and CUDA drivers installed, 
# then this line makes things go faster.
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu' # use GPU if we can!
print(DEVICE)
# This will be a 2-D numpy array embedding[t][d], where t is the timestep and d 
# indexes the embedding dimension. Each embedding is calculated on 1 second of audio. 
# The hop size determines whether those embeddings are overlapped.
item = dataset[100]
embedding = torchopenl3.embed(model=model, 
                                audio=item['audio'],
                                sample_rate=SAMPLE_RATE, 
                                hop_size=1, 
                                device=DEVICE)
IPython.display.display(Audio(item['audio'], rate=SAMPLE_RATE))
print(embedding.shape)
# Average over time to get a single 512-element embedding for the clip
me = np.mean(embedding, axis=0)
print(me.shape)
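
To make the windowing described above concrete, here is a minimal NumPy sketch of how a waveform can be cut into 1-second windows with a given hop. It is an illustration only, not torchopenl3's actual implementation: the function name `frame_audio` is hypothetical, and zero-padding the last partial window is an assumption.

```python
import numpy as np

def frame_audio(audio, sample_rate, window_s=1.0, hop_s=1.0):
    """Split a mono waveform into fixed-length windows.

    Assumption (not taken from torchopenl3): the tail of the signal is
    zero-padded so that the last partial window is kept.
    """
    win = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    n = len(audio)
    # One window, plus one more for every hop needed to cover the rest
    n_frames = int(np.ceil(max(n - win, 0) / hop)) + 1
    padded = np.zeros((n_frames - 1) * hop + win, dtype=audio.dtype)
    padded[:n] = audio
    return np.stack([padded[i * hop: i * hop + win] for i in range(n_frames)])

# Toy example: a 2.5-second "clip" at a tiny sample rate of 8 Hz
sample_rate = 8
clip = np.arange(20, dtype=float)
frames = frame_audio(clip, sample_rate, hop_s=1.0)
print(frames.shape)  # (number of windows, samples per window)
```

With a hop smaller than the window (for example `hop_s=0.5`), consecutive windows overlap, so the same clip yields more embeddings.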

#### 8. (1 point) Create embeddings for the dataset. Then, for each example encoded as an embedding, calculate the mean value (over time) of its embedding, so that every example is represented by a 512 element mean embedding. From here onward, when we refer to embeddings, it will mean the mean embeddings.

*Hint: This step is slow. On Colab it takes around 20 minutes with CUDA on. Do yourself a favor: while you make the embeddings, also extract the one-hot labels and the instrument names, so that you don't have to iterate over the original dataset again.*

*Hint: If you save your embeddings, one-hot-labels and instrument names to file, then you never have to redo this step.* 
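
As a rough sketch of the two hints above, the snippet below averages toy per-window embeddings over time and then saves and reloads everything from a single `.npz` file. The array contents, the file name, and the temporary directory are made up for illustration; on Colab you would point the path at your mounted Drive instead.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 5 clips, each with a different number of 512-d window embeddings
window_counts = [1, 3, 2, 4, 1]
mean_embeddings = np.stack([
    rng.normal(size=(t, 512)).mean(axis=0)   # average over time -> 512 values
    for t in window_counts
])
names = np.array(['flute', 'tuba', 'flute', 'oboe', 'tuba'])
one_hot = np.eye(3)[[0, 2, 0, 1, 2]]         # toy one-hot labels for 3 classes

# Save all three arrays in one file (hypothetical file name)
save_dir = tempfile.mkdtemp()
path = os.path.join(save_dir, 'philharmonia_embeddings.npz')
np.savez(path, embeddings=mean_embeddings, instruments=names, one_hot=one_hot)

# Later (for example, after a runtime restart) load them back
with np.load(path) as saved:
    emb_loaded = saved['embeddings']
    names_loaded = saved['instruments']
    one_hot_loaded = saved['one_hot']

print(emb_loaded.shape, names_loaded.shape, one_hot_loaded.shape)
```

Keeping the three arrays in one file makes it harder for the embeddings and labels to get out of step with each other.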

In [None]:
import os
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
# If you have a GPU available and CUDA drivers installed, 
# then this line makes things go faster.
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu' # use GPU if we can!
print(DEVICE)
SAMPLE_RATE = 48000
SAVE_DIR = '/content/drive/My Drive/SavedArrays'
os.makedirs(SAVE_DIR, exist_ok=True)  # np.save does not create missing directories

# One 512-element mean embedding and one 19-element one-hot label per example
me = np.zeros([len(dataset), 512])
ones = np.zeros([len(dataset), 19])
instruments = []  # instrument name of each example

for i in range(len(dataset)):
  item = dataset[i]
  # embedding[t][d]: one 512-dimensional embedding per 1-second window (hop size 1 second)
  embedding = torchopenl3.embed(model=model, 
                                audio=item['audio'],
                                sample_rate=SAMPLE_RATE, 
                                hop_size=1, 
                                device=DEVICE)
  me[i] = np.mean(embedding, axis=0)  # average over time
  instruments.append(item.get("instrument"))
  ones[i] = item.get("one_hot")
  print(i)

instruments = np.array(instruments)
np.save(os.path.join(SAVE_DIR, 'embed.npy'), me)
np.save(os.path.join(SAVE_DIR, 'instruments.npy'), instruments)
np.save(os.path.join(SAVE_DIR, 'onehs.npy'), ones)
'''
labels = np.zeros(0)
instruments  = np.zeros(0)
for item in dataset:
  instruments = np.append(instruments,item.get("instrument"))

np.save('/content/drive/My Drive/SavedArrays/instruments.npy', instruments, allow_pickle=True, fix_imports=True)
a = np.load('/content/drive/My Drive/SavedArrays/arrs.npy')
print(a)
'''



Streaming output truncated to the last 5000 lines.
8531
8532
8533
...
8715
8716
8717


'\nlabels = np.zeros(0)\ninstruments  = np.zeros(0)\nfor item in dataset:\n  instruments = np.append(instruments,item.get("instrument"))\n\nnp.save(\'/content/drive/My Drive/SavedArrays/instruments.npy\', instruments, allow_pickle=True, fix_imports=True)\na = np.load(\'/content/drive/My Drive/SavedArrays/arrs.npy\')\nprint(a)\n'

#### 9. (1 point) Within each instrument class, calculate the number of examples, as well as the mean of the embeddings for the examples in that class.  

In [None]:
# Your code goes here.
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
# Instrument name and mean embedding of every example, saved in step 8
a = np.load('/content/drive/My Drive/SavedArrays/instruments.npy')
b = np.load('/content/drive/My Drive/SavedArrays/embed.npy')
inst = ['banjo', 'bass-clarinet', 'bassoon', 'cello' , 'clarinet' , 'contrabassoon', 'double-bass' , 'english-horn' , 'flute' , 'french-horn' , 'guitar', 'mandolin' , 'oboe' , 'saxophone', 'trombone', 'trumpet', 'tuba', 'viola' , 'violin']
gg = np.zeros([len(inst), 512])   # mean embedding of each class
occ = np.zeros(len(inst))         # number of examples in each class

for i in range(len(inst)):
  ind = np.where(a == inst[i])[0]   # indices of the examples of this class
  gg[i] = np.mean(b[ind], axis=0)
  occ[i] = len(ind)
  print("Instrument:" + inst[i] + "| "+str(occ[i]) + " occurrences \n")





Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


FileNotFoundError: ignored
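
The per-class bookkeeping asked for in question 9 can also be sketched on toy data without a loop over class names. This is a hedged illustration, not the notebook's method: `names` and `emb` are hypothetical stand-ins for the saved instrument names and mean embeddings.

```python
import numpy as np

# Toy stand-ins (hypothetical): 6 examples with 4-dimensional embeddings
names = np.array(['flute', 'tuba', 'flute', 'oboe', 'tuba', 'flute'])
emb = np.arange(24, dtype=float).reshape(6, 4)

# np.unique returns the sorted class names, an integer class id per example,
# and the number of examples in each class.
classes, class_ids, counts = np.unique(
    names, return_inverse=True, return_counts=True)

# Sum the embeddings of each class, then divide by the class size.
class_sums = np.zeros((len(classes), emb.shape[1]))
np.add.at(class_sums, class_ids, emb)
class_means = class_sums / counts[:, None]

for name, count in zip(classes, counts):
    print(f"{name}: {count} examples")
```

Deriving the class list from the data with `np.unique` avoids typing the 19 instrument names by hand, at the cost of relying on the labels being spelled consistently.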

#### 10. (1 point) Write code to print out 3 things:
* The number of examples of each instrument class
* The pair of classes with the closest means to each other
* The pair of classes with the most distant means from each other

Be sure to include an explanation of how you measured distance.


In [None]:
# Your code goes here.
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
a = np.load('/content/drive/My Drive/SavedArrays/instruments.npy')
b = np.load('/content/drive/My Drive/SavedArrays/embed.npy')
inst = ['banjo', 'bass-clarinet', 'bassoon', 'cello' , 'clarinet' , 'contrabassoon', 'double-bass' , 'english-horn' , 'flute' , 'french-horn' , 'guitar', 'mandolin' , 'oboe' , 'saxophone', 'trombone', 'trumpet', 'tuba', 'viola' , 'violin']
gg = np.zeros([len(inst), 512])   # mean embedding of each class
occ = np.zeros(len(inst))         # number of examples in each class

for i in range(len(inst)):
  ind = np.where(a == inst[i])[0]   # indices of the examples of this class
  gg[i] = np.mean(b[ind], axis=0)
  occ[i] = len(ind)
  print("Instrument:" + inst[i] + "| "+str(occ[i]) + " occurrences \n")

# Distance between two classes is measured as the Euclidean (L2) distance
# between their mean embeddings.
min_dist = float('inf')
max_dist = 0
closest = None
furthest = None

# Compare every unordered pair of classes once
for i in range(len(inst) - 1):
  for j in range(i+1, len(inst)):
    dist = np.linalg.norm(gg[i] - gg[j])
    if dist < min_dist:
      min_dist = dist
      closest = [inst[i], inst[j]]
    if dist > max_dist:
      max_dist = dist
      furthest = [inst[i], inst[j]]
print(closest)
print(furthest)
print(max_dist)
print(min_dist)


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
13531
Instrument:banjo| 74.0 occurrences 

Instrument:bass-clarinet| 943.0 occurrences 

Instrument:bassoon| 720.0 occurrences 

Instrument:cello| 889.0 occurrences 

Instrument:clarinet| 846.0 occurrences 

Instrument:contrabassoon| 710.0 occurrences 

Instrument:double-bass| 852.0 occurrences 

Instrument:english-horn| 691.0 occurrences 

Instrument:flute| 878.0 occurrences 

Instrument:french-horn| 652.0 occurrences 

Instrument:guitar| 105.0 occurrences 

Instrument:mandolin| 80.0 occurrences 

Instrument:oboe| 596.0 occurrences 

Instrument:saxophone| 732.0 occurrences 

Instrument:trombone| 831.0 occurrences 

Instrument:trumpet| 485.0 occurrences 

Instrument:tuba| 972.0 occurrences 

Instrument:viola| 973.0 occurrences 

Instrument:violin| 1502.0 occurrences 

['viola', 'violin']
['guitar', 'saxophone']
13.667629263858508
3.2214201575595904
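
The pairwise comparison in question 10 can also be written without nested loops. The sketch below computes every Euclidean distance between class means with NumPy broadcasting and then picks the closest and most distant pair. The helper name `closest_and_furthest` and the toy means are hypothetical, chosen only for illustration.

```python
import numpy as np

def closest_and_furthest(means, names):
    """Return (closest, furthest) pairs as (name_a, name_b, distance),
    using the Euclidean distance between class means."""
    means = np.asarray(means, dtype=float)
    # diff[i, j] is the vector from mean j to mean i
    diff = means[:, None, :] - means[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep each unordered pair once (i < j), which also drops the zero diagonal
    i_idx, j_idx = np.triu_indices(len(means), k=1)
    pair_dist = dist[i_idx, j_idx]
    lo, hi = pair_dist.argmin(), pair_dist.argmax()
    closest = (names[i_idx[lo]], names[j_idx[lo]], float(pair_dist[lo]))
    furthest = (names[i_idx[hi]], names[j_idx[hi]], float(pair_dist[hi]))
    return closest, furthest

# Toy class means in 2 dimensions
toy_means = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
toy_names = ['a', 'b', 'c']
closest, furthest = closest_and_furthest(toy_means, toy_names)
print(closest)
print(furthest)
```

For 19 classes the distance matrix has only 19 x 19 entries, so the broadcasted version is mainly a readability choice rather than a speed-up.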


##### A helper function to plot embeddings

In [None]:
import pandas as pd
import plotly.express as px
def plot_embeddings(emb, labels, method='tsne', n_components=3, title=''):
    """
    This performs dimensionality reduction for visualization and then
    returns a plotly figure to visualize a 2D or 3D reduction 
    of your data.

    Parameters
    ----------
        emb (np.ndarray): the samples to be reduced with shape (n, features)
        labels (list): list of labels for embedding with shape (n)
        method (str): one of "tsne", "umap" or "pca"
        n_components (int): number of dimensions to reduce to, either 2 or 3
        title (str): title for the figure

    Returns
    -------    
        fig (plotly figure): scatter plot of the reduced data, colored by label
    """
    if method == 'umap':
        import umap
        reducer = umap.UMAP(n_components=n_components)
    elif method == 'tsne':
        from sklearn.manifold import TSNE
        reducer = TSNE(n_components=n_components)
    elif method == 'pca':
        from sklearn.decomposition import PCA
        reducer = PCA(n_components=n_components)
    else:
        raise ValueError(f'unknown dimensionality reduction method: {method}')
 
    proj = reducer.fit_transform(emb)

    if n_components == 2:
        df = pd.DataFrame(dict(
            x=proj[:, 0],
            y=proj[:, 1],
            instrument=labels
        ))
        fig = px.scatter(df, x='x', y='y', color='instrument',
                        title=title)

    elif n_components == 3:
        df = pd.DataFrame(dict(
            x=proj[:, 0],
            y=proj[:, 1],
            z=proj[:, 2],
            instrument=labels
        ))
        fig = px.scatter_3d(df, x='x', y='y', z='z',
                        color='instrument',
                        title=title)

    else:
        raise ValueError("n_components must be 2 or 3")

    fig.update_traces(marker=dict(size=6,
                                  line=dict(width=1,
                                            color='DarkSlateGrey')),
                      selector=dict(mode='markers'))

    return fig

In [None]:
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
a = np.load('/content/drive/My Drive/SavedArrays/instruments.npy')
b = np.load('/content/drive/My Drive/SavedArrays/embed.npy')

# Plot the two closest classes (violin, viola) and the two most distant ones (guitar, saxophone)
selected = ['guitar', 'saxophone', 'violin', 'viola']
mask = np.isin(a, selected)
c1 = b[mask]   # embeddings of the selected examples
l1 = a[mask]   # instrument name of each selected example

plot_embeddings(c1, l1, method='umap')
#plot_embeddings(c1, l1, method='tsne')
#plot_embeddings(c1, l1, method='pca')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**YOUR ANSWER GOES HERE**

At first glance the classes look pretty mixed, but viewed from the angle where Y is at the bottom, we can clearly see that many of the string instruments are close to each other. We already knew that violin and viola were close to each other, and an instrument like cello also appears to be close to them.

After running it multiple times, I realized there are basically two big clumps of data. They aren't separated exactly, but one clump is mostly bass clarinet, saxophone, English horn, trombone, and bassoon, and the second clump is violin, viola, double bass, and cello. The regular clarinet and the French horn appear in both clumps.

But once I really zoom in, I can see that points of the same color are clumped very close together. Even though there are different colors nearby, a KNN with an appropriate number of neighbors could probably get it right.

Question: doesn't having more samples of one kind bias our results? Since we have a lot of violin samples, something like KNN might label many things as violin just because there are so many violin samples.

One thing that caught my attention is that there is a huge group of tubas clumped together very far away from everything else, so a classifier could easily tell that something is a tuba just by using distance.
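
One way to probe the class-imbalance question raised above is to compare results on the full data with results on a balanced subsample, where every class contributes the same number of examples. The sketch below shows only the subsampling step on toy labels. The helper name `balanced_indices` and the toy class sizes are hypothetical, and whether balancing actually helps here would need to be tested.

```python
import numpy as np

def balanced_indices(labels, per_class, seed=0):
    """Pick up to `per_class` random example indices from every class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    chosen = []
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        take = min(per_class, len(idx))   # small classes keep all they have
        chosen.append(rng.choice(idx, size=take, replace=False))
    return np.concatenate(chosen)

# Toy labels with a dominant class, loosely mimicking violin vs. banjo
toy_labels = np.array(['violin'] * 10 + ['banjo'] * 3 + ['tuba'] * 6)
keep = balanced_indices(toy_labels, per_class=3)
print(np.unique(toy_labels[keep], return_counts=True))
```

An alternative that keeps all the data is to weight each neighbor's vote by the inverse of its distance, so that a few very close neighbors outweigh many distant ones.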

In [None]:
## YOUR CODE GOES HERE
import numpy as np
from google.colab import drive
drive.mount('/content/drive')
a = np.load('/content/drive/My Drive/SavedArrays/instruments.npy')
b = np.load('/content/drive/My Drive/SavedArrays/embed.npy')

ind = np.random.randint(len(a), size=2000)
l1 = a[ind]

c1 = b[ind]

plot_embeddings(c1, l1, method='umap')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


#### 13. (1 point) Make 2 more random draws of the data and visualize it the same way, each time. Look at the visualization. Did the patterns you saw in the data repeat themselves every time? What stayed the same and what changed?

**YOUR ANSWER GOES HERE**

Regardless of how many times I run it, I can always see two big clumps, and the tuba group is always far away by itself. Similar instruments always tend to be close to each other. One thing I noticed is that the regular clarinet and the French horn tend to be in both clumps.
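
For repeated random draws like the ones in question 13, it can help to seed each draw so that a particular plot can be reproduced later. The sketch below is one possible way to do that, assuming draws of 2000 examples out of the 13531 in the dataset; unlike `np.random.randint`, it samples without replacement, so no example appears twice in a draw. The helper name `random_draw` is hypothetical.

```python
import numpy as np

def random_draw(n_items, n_draw, seed):
    """Return n_draw distinct indices in [0, n_items), reproducible per seed."""
    rng = np.random.default_rng(seed)
    return rng.choice(n_items, size=n_draw, replace=False)

# Three independent draws, one per seed
draws = [random_draw(13531, 2000, seed) for seed in (1, 2, 3)]
for seed, ind in zip((1, 2, 3), draws):
    print(seed, ind[:5])
```

Each index array can then be used the same way as `ind` in the cell above, for example `plot_embeddings(b[ind], a[ind], method='umap')`.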

## Designing, building, and testing an instrument class labeler

#### 18. (2 points) Code up and train your classifier here. Make sure to provide clear documentation (doc strings) and well commented code. 

In [None]:
## YOUR CODE GOES HERE
import numpy as np

def knn(data_X, data_Y, query_X, k):
    """
    Takes a data set of examples encoded as feature vectors, along with the label for each example in the data. 
    It also takes in a set of queries, for which we want to know the labels. It finds the distance from each 
    query_X to each example in data_X. It returns a label for each example in query_X by picking the most 
    popular label from the k nearest neighbors in data_X. Distance is the Euclidean (L2) distance.
    
    Input Parameters
    ----------------
    data_X: a 2-D numpy array with a shape of (the number of examples in the data, the number of features per example).
    data_Y: a 1-D numpy array containing labels for the examples in data_X. Encode the labels as integer values.
    query_X: a 2-D numpy array with a shape of (the number of query examples, the number of features). 
    Note, the query_X must have the same number of features, in the same order as the data_X 
    k: the number of nearest neighbors in the data to consider, when labeling a query
    
    Returns
    ----------------
    query_Y: a 1-D numpy array of integer values referring to predicted labels for the set of queries
    """
    
    
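    data_X = np.asarray(data_X, dtype=float)
    query_X = np.asarray(query_X, dtype=float)
    data_Y = np.asarray(data_Y).astype(int)   # np.bincount needs integer labels

    query_Y = np.zeros(len(query_X), dtype=int)
    for x in range(len(query_X)):
        # Euclidean distance from this query to every example in the data
        distances = np.linalg.norm(data_X - query_X[x], axis=1)
        # Indices of the k nearest neighbors
        nearest = np.argsort(distances)[:k]
        # Majority vote: the most frequent label among the neighbors
        # (ties go to the smallest label). Taking np.argmax of the labels
        # themselves would return the largest label, not the most popular one.
        query_Y[x] = np.bincount(data_Y[nearest]).argmax()

    return query_Y
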
def evaluation(pred, truth, num_labels):
    '''
    Takes a set of predicted labels and ground truth labels, and compute the 
    classification accuracy. *NOTE* We will assume the set of labels used is 
    drawn from the consecutive counting numbers, starting with zero (e.g. 0,1,2 
    for 3 classes or 0,1,2,3,4,5 for 6 classes). This will let us  use the label 
    as an index into the confusion matrix. 

    Input Parameters
    ----------------
    pred: a 1-D numpy array of integer values which refer to labels predicted from a classifier.
    truth: a 1-D numpy array of integer values which refer to ground truth labels
    num_labels: an integer specifying how many labels there are.

    Returns
    -----------------
    accuracy: a float number indicating the classification accuracy as a number in the range 0 to 1.
    confusion_matrix: a 2-D numpy array containing a confusion matrix of size c by c, where c
            is the number of classes. Here, confusion[t,p] contains the number of times 
            the predicted label p was applied to items whose true class is t. 
    '''
    assert len(pred) == len(truth)
    
    # confusion_matrix[t, p] counts the items of true class t that were labeled p
    confusion_matrix = np.zeros((num_labels, num_labels), dtype=int)
    correct = 0
    for i in range(len(pred)):
        predicted = int(pred[i])
        actual = int(truth[i])
        confusion_matrix[actual, predicted] += 1
        if predicted == actual:
            correct += 1

    accuracy = correct / len(pred)
    print(accuracy)
    print("\n")
    print(confusion_matrix)

    return accuracy, confusion_matrix
    
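def classifier(trainemb, trainlabel, testemb, testlabel):
  """
  Scale the embeddings, label the test set with a 3-nearest-neighbor classifier,
  and evaluate the predictions.

  Input Parameters
  ----------------
  trainemb: a 2-D numpy array of training embeddings, shape (n_train, n_features)
  trainlabel: a 1-D numpy array of integer labels for trainemb
  testemb: a 2-D numpy array of test embeddings, shape (n_test, n_features)
  testlabel: a 1-D numpy array of integer ground truth labels for testemb

  Returns
  ----------------
  accuracy: classification accuracy on the test set, in the range 0 to 1
  mat: the 19 by 19 confusion matrix returned by evaluation()
  """
  from sklearn.preprocessing import MinMaxScaler

  # Scale every feature to [0, 1]. The scaler is fit on the training data only,
  # so no information about the test set leaks into the classifier.
  scaler = MinMaxScaler()
  scaledD = scaler.fit_transform(trainemb)
  scaledq = scaler.transform(testemb)

  # KNN with 3 neighbors; I was able to get the best results with 3 neighbors
  pred = knn(scaledD, trainlabel, scaledq, 3)
  # Accuracy and confusion matrix for the KNN predictions
  accuracy, mat = evaluation(pred, testlabel, 19)
  return accuracy, mat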




#### 19. (2 points) Run the test you designed on your classifier. Report the results. Give your analysis of the strengths and weaknesses of your classifier. What does it label well? What does it do poorly on?

**Your answer goes here**


After running it 5 different times, the lowest accuracy I got was 94% and the highest was 96%, but it is mostly around 95% overall.
I honestly think this is really impressive: 19 classes is hard, and the classifier probably does better than I could myself, because I would struggle a lot to tell some instruments apart.

Highest accuracy on an instrument:

* 100% Double bass
* 99.5% Contrabassoon
* 99% Tuba

Lowest accuracy on an instrument:

* 80% Banjo
* 88.5% Saxophone
* 91% Mandolin

Banjo has far fewer samples than most classes, so I think that might be a big part of the reason it is the worst-labeled one.

Overall, though, I think 95% is very decent. A uniform random guess among 19 classes is right about 5.3% of the time (1/19), so the classifier is about 18 times better than guessing.
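
The per-instrument numbers and the chance comparison above can be computed directly from a confusion matrix. The sketch below uses a made-up 3-class matrix; only the class counts in the baseline lines (1502 violin examples out of 13531) come from this notebook's output. The helper name `per_class_accuracy` is hypothetical.

```python
import numpy as np

def per_class_accuracy(confusion):
    """Fraction of each true class (row) that was labeled correctly."""
    confusion = np.asarray(confusion, dtype=float)
    totals = confusion.sum(axis=1)      # number of test items per true class
    correct = np.diag(confusion)        # items whose prediction matched
    # Classes with no test items get NaN instead of a division error
    return np.divide(correct, totals,
                     out=np.full_like(totals, np.nan), where=totals > 0)

# Made-up confusion matrix: rows are true classes, columns are predictions
toy_confusion = np.array([[16, 4, 0],
                          [0, 9, 1],
                          [0, 0, 20]])
recalls = per_class_accuracy(toy_confusion)
print(recalls)

# Two baselines to compare an accuracy against
n_classes = 19
uniform_chance = 1 / n_classes            # guessing uniformly at random
majority_baseline = 1502 / 13531          # always answering "violin"
print(round(uniform_chance, 4), round(majority_baseline, 4))
```

Because the classes are imbalanced, the majority-class baseline (about 11%) is a slightly tougher comparison than uniform guessing (about 5.3%).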

In [None]:
#Your supporting code goes here.
import numpy as np
from google.colab import drive
from sklearn.utils import shuffle


drive.mount('/content/drive')
a = np.load('/content/drive/My Drive/SavedArrays/instruments.npy')
b = np.load('/content/drive/My Drive/SavedArrays/embed.npy')

inst = ['banjo', 'bass-clarinet', 'bassoon', 'cello' , 'clarinet' , 'contrabassoon', 'double-bass' , 'english-horn' , 'flute' , 'french-horn' , 'guitar', 'mandolin' , 'oboe' , 'saxophone', 'trombone', 'trumpet', 'tuba', 'viola' , 'violin']

# Shuffle the embeddings and the labels together, so we don't have to worry about
# any ordering in the saved data. This also gives a different random split on every run.
labshuff, embshuff = shuffle(a, b)

# Turn the instrument names into integer labels (the index of the name in inst), so we
# can switch back and forth. This builds a new integer array, because writing numbers
# back into the string array would store them as strings.
labshuff = np.array([inst.index(name) for name in labshuff])

# Split the data: the first 10000 examples are used for training
trainemb = embshuff[:10000]
trainlabel = labshuff[:10000]

# The remaining ~3500 examples are used for testing
testemb = embshuff[10000:]
testlabel = labshuff[10000:]

# Send the training and testing embeddings and labels to the classifier. This also prints
# the confusion matrix, which shows which instruments are labeled best.
accuracy, mat = classifier(trainemb, trainlabel, testemb, testlabel)


# Print the accuracy of the whole model
accuracy = accuracy*100
print("Accurate " + str(accuracy)+ "% of the time")


# Print the accuracy for each instrument to see which instruments were better separated
acc = np.zeros(len(inst))
for i in range(len(inst)):
  right = mat[i][i]
  tot = np.sum(mat[i])
  wr = tot-right
  acc[i] = (right/tot)*100
  print("---------------------\nInstrument: " + inst[i] + " \nCorrect: " + str(right) + "\n Wrong: " +  str(wr) + "\n Total: " + str(tot) + "\n Accuracy: " + str(acc[i]) +"%" )







Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
0.9595015576323987


[[ 16   0   0   0   0   0   0   0   0   0   0   4   0   0   0   0   0   0
    0]
 [  0 238   0   1   4   1   0   3   0   3   0   0   0   0   0   0   0   0
    0]
 [  0   0 174   1   5   0   0   5   0   0   0   0   0   0   0   0   0   0
    0]
 [  0   0   0 205   0   2   0   2   0   0   0   0   0   0   0   0   0   0
    0]
 [  0   0   0   0 229   0   1   2   0   0   0   0   0   0   0   0   0   0
    0]
 [  0   0   0   0   0 199   0   1   0   0   0   0   0   0   0   0   0   0
    0]
 [  0   0   0   0   0   0 217   0   0   0   0   0   0   0   0   0   0   0
    0]
 [  0   0   0   0   0   1   0 181   0   1   0   0   0   1   0   0   0   0
    0]
 [  0   0   0   0   0   0   0   0 237   1   0   0   0   0   0   0   0   0
    0]
 [  0   0   0   0   1   0   0   0   1 175   0   0   0   0   0   0   0   0
    0]
 [  0   0   0   0   0   0   0   0   0   