### Artist Valuation – How can we measure or forecast the intrinsic value of an artist?
In stock valuation, accounting figures and price movements are used to assess the health and prospects of a company. <br> Artist valuation can work in a similar way, using popularity measures and how those measures change over time. To make good ‘investments’ in promising artists and to distinguish itself from industry competitors, MMT can pioneer a systematic, data-driven approach.
1. Required Input Data Specification: a (weekly) [Artist Evaluation] table with [artist ID] as the primary key and the other attributes representing artist popularity measures on different platforms.
2. The update frequency (daily or weekly) should match how quickly artists' intrinsic values change, so that the table stays up to date and changes can be tracked over time.
<br>
3. Expected Output Insights: a visual, geometric mapping/clustering of artists in a human-recognizable 2D/3D form; groups of comparable artists; relationships among the different metrics.
4. Applicable Data Models: K-Nearest Neighbors (K-NN)<br>
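Items 1 and 2 can be illustrated with a minimal sketch. The snippet below builds a toy version of the weekly [Artist Evaluation] table and derives the week-over-week change of each metric for every artist. The column names (`artist_id`, `week`, `spotify_listeners`, `youtube_views`, `instagram_followers`) and all values are hypothetical placeholders, not a defined MMT schema.

```python
import pandas as pd

# Hypothetical weekly snapshots of the [Artist Evaluation] table.
# Column names and values are illustrative assumptions only.
snapshots = pd.DataFrame({
    "artist_id": ["A1", "A1", "A2", "A2"],
    "week": pd.to_datetime(["2024-01-01", "2024-01-08"] * 2),
    "spotify_listeners": [1000, 1200, 5000, 5100],
    "youtube_views": [20000, 26000, 90000, 88000],
    "instagram_followers": [800, 880, 4000, 4040],
})

metrics = ["spotify_listeners", "youtube_views", "instagram_followers"]

# Put each artist's rows in chronological order, then compute the
# week-over-week relative change of every metric within each artist.
snapshots = snapshots.sort_values(["artist_id", "week"])
growth = snapshots.groupby("artist_id")[metrics].pct_change()
growth.columns = [f"{m}_wow" for m in metrics]

# Keep the latest level and the latest growth rate: one row per artist ID.
features = pd.concat([snapshots, growth], axis=1)
latest = features.groupby("artist_id").tail(1).set_index("artist_id")
print(latest.drop(columns="week"))
```

Pairing levels with growth rates mirrors the stock analogy, which looks at both current figures and price movements. Whether weekly or daily deltas are more informative depends on how noisy each platform's counts are.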

### Below is an example of how this can be achieved, using the Iris dataset as a stand-in

STEPS
======
1> Load the data

2> Initialise the value of k

3> To get the predicted class, iterate over every training data point:

    1> Calculate the distance between the test data and each row of the training data. Here we use Euclidean distance, a common default for continuous features; other metrics such as Chebyshev or cosine distance can also be used.
    2> Sort the calculated distances in ascending order
    3> Get the top k rows from the sorted array
    4> Get the most frequent class among these rows
    5> Return the predicted class
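The steps above can also be sketched in vectorized NumPy form. `knn_predict` is a hypothetical helper written for illustration and is not used by the notebook below. It also shows where the Chebyshev and cosine alternatives from step 3.1 would slot in.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_test, k, metric="euclidean"):
    """Sketch of steps 3.1-3.5: return (predicted class, neighbor indices)."""
    X_train = np.asarray(X_train, dtype=float)
    x_test = np.asarray(x_test, dtype=float)

    # Step 3.1: distance from the test point to every training row.
    diff = X_train - x_test
    if metric == "euclidean":
        dist = np.sqrt((diff ** 2).sum(axis=1))
    elif metric == "chebyshev":
        dist = np.abs(diff).max(axis=1)
    elif metric == "cosine":
        # Cosine distance = 1 - cosine similarity (assumes no all-zero rows).
        norms = np.linalg.norm(X_train, axis=1) * np.linalg.norm(x_test)
        dist = 1.0 - (X_train @ x_test) / norms
    else:
        raise ValueError(f"unknown metric: {metric}")

    # Steps 3.2-3.3: indices of the k smallest distances, nearest first.
    nearest = np.argsort(dist, kind="stable")[:k]

    # Steps 3.4-3.5: majority vote among the k neighbors' labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest.tolist()


# Toy usage with two well-separated groups.
X = [[1.0, 1.0], [1.2, 0.9], [5.0, 5.0], [5.1, 4.8]]
y = ["small", "small", "large", "large"]
label, idx = knn_predict(X, y, [1.1, 1.0], k=3)
print(label, idx)
```

Cosine distance ignores magnitude, so on this toy data the points (1, 1) and (5, 5) are equally close to a test point along the same direction. That matters for popularity counts, where scale is often the signal.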

In [1]:
import pandas as pd
import numpy as np
import math
import operator

#### Start of STEP 1
# Importing data 
data = pd.read_csv("resources/iris.csv")
#### End of STEP 1

data.head()

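| Unnamed: 0 | SepalLength | SepalWidth | PetalLength | PetalWidth | Name |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |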


In [2]:
# Defining a function which calculates the Euclidean distance between two data points,
# using only their first `length` values (the numeric feature columns)
def euclideanDistance(data1, data2, length):
    # Flatten both inputs to 1-D arrays so indexing is positional,
    # then keep only the features (the class label is never used)
    a = np.asarray(data1).ravel()[:length].astype(float)
    b = np.asarray(data2).ravel()[:length].astype(float)
    return np.sqrt(np.sum(np.square(a - b)))

# Defining our K-NN model

def knn(trainingSet, testInstance, k):
    
    distances = {}
    
    # Number of feature columns; the training set has one extra column (the class label)
    length = testInstance.shape[1]
    
    #### Start of STEP 3
    # Calculating the Euclidean distance between each row of training data and the test data
    for x in range(len(trainingSet)):
        #### Start of STEP 3.1
        distances[x] = euclideanDistance(testInstance, trainingSet.iloc[x], length)
        #### End of STEP 3.1
    
    #### Start of STEP 3.2
    # Sorting the rows by distance (ascending)
    sorted_d = sorted(distances.items(), key=operator.itemgetter(1))
    #### End of STEP 3.2
    
    neighbors = []
    
    #### Start of STEP 3.3
    # Extracting the row indices of the top k neighbors
    for x in range(k):
        neighbors.append(sorted_d[x][0])
    #### End of STEP 3.3
    
    classVotes = {}
    
    #### Start of STEP 3.4
    # Counting how often each class appears among the neighbors
    for x in range(len(neighbors)):
        # .iloc[row, -1] selects the class label (last column) positionally
        response = trainingSet.iloc[neighbors[x], -1]
        
        if response in classVotes:
            classVotes[response] += 1
        else:
            classVotes[response] = 1
    #### End of STEP 3.4
    
    #### Start of STEP 3.5
    # Returning the most frequent class, plus the indices of the neighbors
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return (sortedVotes[0][0], neighbors)
    #### End of STEP 3.5
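The voting step leaves tie-breaking implicit. Here is a small sketch, assuming the neighbor labels are listed nearest first, as `knn` builds them. It shows that both the dictionary-plus-`sorted` vote used above and `collections.Counter.most_common` return the tied class encountered first. This holds because `sorted` stays stable even with `reverse=True`. The helper names `vote_sorted` and `vote_counter` are hypothetical.

```python
import operator
from collections import Counter


def vote_sorted(labels):
    """Steps 3.4-3.5 as written in knn(): count votes, then sort by count."""
    class_votes = {}
    for label in labels:
        class_votes[label] = class_votes.get(label, 0) + 1
    ranked = sorted(class_votes.items(), key=operator.itemgetter(1), reverse=True)
    return ranked[0][0]


def vote_counter(labels):
    """Equivalent majority vote using collections.Counter."""
    return Counter(labels).most_common(1)[0][0]


# Neighbor labels, nearest first: "A" and "B" tie with two votes each.
neighbors = ["A", "B", "B", "A"]
print(vote_sorted(neighbors), vote_counter(neighbors))  # A A
```

Because the first-encountered class is the one that appears earliest in the distance ranking, ties go to the class with the nearest neighbor among the tied classes. With three Iris classes, an odd k does not rule out ties on its own.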
        

In [3]:
# Creating a dummy test set: one row of SepalLength, SepalWidth, PetalLength, PetalWidth
testSet = [[7.2, 3.6, 5.1, 2.5]]
test = pd.DataFrame(testSet)

In [None]:
#### Start of STEP 2
# Setting number of neighbors = 1
k = 1
#### End of STEP 2

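# Running the K-NN model; "Unnamed: 0" is a saved row index, not a feature, so it is dropped first
result, neigh = knn(data.drop(columns="Unnamed: 0"), test, k)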

# Predicted class
print(result)
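The Iris walkthrough classifies flowers, but the artist use case mostly needs the neighbor list itself (comparable artists), a 2D map and metric relationships. Below is a minimal sketch under stated assumptions. The artist table, its column names and its values are hypothetical. Features are z-scored so that large-count metrics do not dominate the distance. K-NN is not a clustering or mapping method itself, so the 2D map here swaps in PCA (computed via NumPy's SVD) and the metric relationships use Pearson correlation.

```python
import numpy as np
import pandas as pd

# Hypothetical artist feature table: one row per artist ID (illustrative values).
artists = pd.DataFrame(
    {
        "spotify_listeners": [1200, 5100, 1300, 48000, 50000],
        "youtube_views": [26000, 88000, 24000, 900000, 870000],
        "listeners_wow": [0.20, 0.02, 0.18, 0.01, 0.03],
    },
    index=pd.Index(["A1", "A2", "A3", "A4", "A5"], name="artist_id"),
)

# Z-score each column so metrics on very different scales contribute comparably.
X = (artists - artists.mean()) / artists.std(ddof=0)


def comparable_artists(artist_id, k=2):
    """Return the k artists closest to artist_id by Euclidean distance."""
    dist = np.sqrt(((X - X.loc[artist_id]) ** 2).sum(axis=1))
    return dist.drop(artist_id).nsmallest(k).index.tolist()


# 2D map: project the (already centred) standardized features onto the
# first two principal components, obtained from the SVD.
_, _, Vt = np.linalg.svd(X.to_numpy(), full_matrices=False)
coords = pd.DataFrame(X.to_numpy() @ Vt[:2].T, index=X.index, columns=["pc1", "pc2"])

# Relationship among metrics: pairwise Pearson correlations.
corr = artists.corr()

print(comparable_artists("A1", k=1))
print(coords.round(2))
print(corr.round(2))
```

The signs of principal components are arbitrary, so only relative positions on the map are meaningful. Swapping z-scores for log-scaling may suit heavy-tailed counts better; that is a modelling choice left open here.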