https://towardsdatascience.com/machine-learning-basics-with-the-k-nearest-neighbors-algorithm-6a6e71d01761
Recommender Systems

At scale, this would look like recommending products on Amazon, articles on Medium, movies on Netflix, or videos on YouTube, although we can be certain they all use more efficient means of making recommendations, given the enormous volume of data they process.

However, we could replicate one of these recommender systems on a smaller scale using what we have learned in this article. Let us build the core of a movie recommender system.

What question are we trying to answer?

Given our movies data set, what are the 5 most similar movies to a movie query?

Gather movies data

If we worked at Netflix, Hulu, or IMDb, we could grab the data from their data warehouse. Since we don’t work at any of those companies, we have to get our data through some other means. We could use some movies data from the UCI Machine Learning Repository, IMDb’s data set, or painstakingly create our own.

Explore, clean, and prepare the data

Wherever we obtained our data, there may be some things wrong with it that we need to correct to prepare it for the KNN algorithm. For example, the data may not be in the format that the algorithm expects, or there may be missing values that we should fill or remove from the data before piping it into the algorithm.

Our KNN implementation above relies on structured data. It needs to be in a table format. Additionally, the implementation assumes that all columns contain numerical data and that the last column of our data has labels that we can perform some function on. So, wherever we got our data from, we need to make it conform to these constraints.
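
As a rough illustration of these constraints, the sketch below reads a small table, drops rows with missing values, and converts the remaining columns to floats. The sample rows, the column names, and the helper `load_numeric_rows` are all made up for this example; they are not taken from the real movies file. It also uses the standard `csv` module rather than `split(',')`, which is an assumption about what is convenient here: `csv` copes with quoted titles that contain commas.

```python
import csv
import io

# Hypothetical sample data: two descriptive columns, then numeric columns,
# with the last column acting as the label.
SAMPLE_CSV = """id,title,rating,flag_a,flag_b,label
1,Movie A,8.0,1,0,0
2,Movie B,,0,1,0
3,"Movie C, Part 2",7.5,1,1,0
"""


def load_numeric_rows(text, first_numeric_col=2):
    """Return (raw_rows, numeric_rows), skipping rows with missing values."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # discard the header line

    raw_rows, numeric_rows = [], []
    for row in reader:
        values = row[first_numeric_col:]
        # Here we simply remove incomplete rows; filling them is the alternative.
        if any(value.strip() == "" for value in values):
            continue
        numeric_rows.append([float(value) for value in values])
        raw_rows.append(row)
    return raw_rows, numeric_rows


raw_rows, numeric_rows = load_numeric_rows(SAMPLE_CSV)
for raw, numeric in zip(raw_rows, numeric_rows):
    print(raw[1], numeric)
```

Keeping `raw_rows` and `numeric_rows` in step matters later: an index returned for a numeric row can then be used to look up the matching title.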

In [2]:
from knn_from_scratch import knn, euclidean_distance
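
The `knn_from_scratch` module itself is not shown in this section. Judging only from how `knn` is called in the next cell, a minimal sketch of what it might contain is given below. The function bodies are an assumption: they follow the constraints described above (numeric rows, label in the last column) and return the nearest neighbours as (distance, index) pairs, but the real module may differ in its details.

```python
import math


def euclidean_distance(point1, point2):
    # zip stops at the shorter vector, so extra trailing values are ignored
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(point1, point2)))


def knn(data, query, k, distance_fn, choice_fn):
    """Return the k nearest (distance, index) pairs and choice_fn of their labels."""
    distances_and_indices = []
    for index, example in enumerate(data):
        # The last column is treated as the label, so it is left out of the distance
        distance = distance_fn(example[:-1], query)
        distances_and_indices.append((distance, index))

    k_nearest = sorted(distances_and_indices)[:k]
    k_nearest_labels = [data[index][-1] for _, index in k_nearest]
    return k_nearest, choice_fn(k_nearest_labels)


# Tiny usage example with made-up points: two features plus a label column
data = [[1.0, 0.0, 0], [5.0, 5.0, 1], [1.5, 0.5, 0]]
neighbors, choice = knn(data, [1.0, 0.0], k=2,
                        distance_fn=euclidean_distance,
                        choice_fn=lambda labels: None)
print(neighbors, choice)
```

For a recommender only the neighbour indices matter, which is why the call in the next cell passes a `choice_fn` that returns `None`.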


In [8]:
def recommend_movies(movie_query, k_recommendations):
    raw_movies_data = []
    with open('movies_recommendation_data.csv', 'r') as md:
        # Discard the first line (headings)
        next(md)

        # Read the data into memory, one list of strings per row.
        # strip() removes surrounding whitespace and split() breaks the text into
        # fields, e.g. ' insert 0 5 '.strip() -> 'insert 0 5', and calling
        # .split() on that -> ['insert', '0', '5']. Here we split on commas.
        for line in md:
            data_row = line.strip().split(',')
            raw_movies_data.append(data_row)
            print(data_row)  # debug aid: echo each parsed row

    # Prepare the data for use in the KNN algorithm by picking the relevant columns
    # and converting them to numbers, since they were read in as strings
    movies_recommendation_data = []
    for row in raw_movies_data:
        data_row = list(map(float, row[2:]))  # skip the first two columns (ID and title)
        movies_recommendation_data.append(data_row)

    # Use the KNN algorithm to get the k_recommendations movies most similar to the query.
    # No label has to be chosen here, so choice_fn is a do-nothing lambda that returns
    # None. (When a lambda is used for data transformation, "do nothing" instead means
    # returning the input unchanged: lambda x: x.)
    recommendation_indices, _ = knn(
        movies_recommendation_data, movie_query, k=k_recommendations,
        distance_fn=euclidean_distance, choice_fn=lambda x: None
    )

    # Look up the original rows; each neighbour is a (distance, index) pair
    movie_recommendations = []
    for _, index in recommendation_indices:
        movie_recommendations.append(raw_movies_data[index])

    # Return only after the loop has collected every recommendation
    return movie_recommendations
    # Note on the underscore: by convention, _ is Python's "don't care"
    # (throwaway) variable name.
    #
    # In the interactive interpreter only, _ holds the value of the last
    # evaluated expression:
    #     >>> 10
    #     10
    #     >>> _
    #     10
    #     >>> _ * 3
    #     30
    #
    # The underscore is also used to ignore specific values. If a value is not
    # needed, just assign it to _.
    #
    # Ignore a value when unpacking:
    #     >>> x, _, y = (1, 2, 3)
    #     >>> x
    #     1
    #     >>> y
    #     3
    #
    # Ignore the loop index:
    #     for _ in range(10):
    #         do_something()
    
    

In [9]:
the_post = [7.2, 1, 1, 0, 0, 0, 0, 1, 0]  # feature vector for The Post

recommended_movies = recommend_movies(movie_query=the_post, k_recommendations=5)

# Print recommended movie titles

for recommendation in recommended_movies:
    print(recommendation[1])

['58', 'The Imitation Game', '8', '1', '1', '1', '0', '0', '0', '0', '0']
['8', 'Ex Machina', '7.7', '0', '1', '0', '0', '0', '1', '0', '0']
['46', 'A Beautiful Mind', '8.2', '1', '1', '0', '0', '0', '0', '0', '0']
['62', 'Good Will Hunting', '8.3', '0', '1', '0', '0', '0', '0', '0', '0']
['97', 'Forrest Gump', '8.8', '0', '1', '0', '0', '0', '0', '0', '0']
['98', '21', '6.8', '0', '1', '0', '0', '1', '0', '1', '0']
['31', 'Gifted', '7.6', '0', '1', '0', '0', '0', '0', '0', '0']
['3', 'Travelling Salesman', '5.9', '0', '1', '0', '0', '0', '1', '0', '0']
['51', 'Avatar', '7.9', '0', '0', '0', '0', '0', '0', '0', '0']
['47', 'The Karate Kid', '7.2', '0', '1', '0', '0', '0', '0', '0', '0']
['50', 'A Brilliant Young Mind', '7.2', '0', '1', '0', '0', '0', '0', '0', '0']
['49', 'A Time To Kill', '7.4', '0', '1', '1', '0', '1', '0', '0', '0']
['30', 'Interstellar', '8.6', '0', '1', '0', '0', '0', '0', '0', '0']
['94', 'The Wolf of Wall Street', '8.2', '1', '0', '0', '1', '1', '0', '0', '0']
...

In [None]:
12 Years a Slave  ['8.1', '1', '1', '0', '0', '0', '0', '1', '0']
Queen of Katwe    ['7.4', '1', '1', '0', '0', '0', '0', '0', '0']
Hacksaw Ridge     ['8.2', '1', '1', '0', '0', '0', '0', '1', '0']
The Wind Rises    ['7.8', '1', '1', '0', '0', '0', '0', '0', '0']
A Beautiful Mind  ['8.2', '1', '1', '0', '0', '0', '0', '0', '0']
the_post          ['7.2', '1', '1', '0', '0', '0', '0', '1', '0']
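
To see how close each of these five movies sits to the_post, one can compute the distance from the query to each vector listed above. This is a minimal sketch that assumes plain Euclidean distance over all nine values. The imported `knn` may leave the last column out as a label, but that would not change these particular distances because the last column is 0 in every row shown. The helper name `euclidean` is introduced only for this example.

```python
import math

the_post = [7.2, 1, 1, 0, 0, 0, 0, 1, 0]

# Feature vectors copied from the listing above, converted to numbers
candidates = {
    "12 Years a Slave": [8.1, 1, 1, 0, 0, 0, 0, 1, 0],
    "Queen of Katwe":   [7.4, 1, 1, 0, 0, 0, 0, 0, 0],
    "Hacksaw Ridge":    [8.2, 1, 1, 0, 0, 0, 0, 1, 0],
    "The Wind Rises":   [7.8, 1, 1, 0, 0, 0, 0, 0, 0],
    "A Beautiful Mind": [8.2, 1, 1, 0, 0, 0, 0, 0, 0],
}


def euclidean(a, b):
    # Square root of the summed squared differences, column by column
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


distances = {title: euclidean(vector, the_post)
             for title, vector in candidates.items()}

# Smallest distance first, i.e. most similar first
ranked = sorted(distances, key=distances.get)
for title in ranked:
    print(f"{title}: {distances[title]:.3f}")
```

Under these assumptions, 12 Years a Slave comes out closest because it differs from the_post only in its rating, while A Beautiful Mind is furthest of the five because it differs in both the rating and one of the 0/1 columns.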