<a href="https://colab.research.google.com/github/nivedita-rajesh/Song-Reccomendation-System/blob/main/Song_Recommender.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Music Recommender System using Apache Spark and Python
**Estimated time: 8hrs**

## Description

For this project, you are to create a recommender system that will recommend new musical artists to a user based on their listening history. Suggesting different songs or musical artists to a user is important to many music streaming services, such as Pandora and Spotify. In addition, this type of recommender system could also be used as a means of suggesting TV shows or movies to a user (e.g., Netflix). 

To create this system you will be using Spark and the collaborative filtering technique. The instructions for completing this project will be laid out entirely in this file. You will have to implement any missing code as well as answer any questions.

**Submission Instructions:** 
* Add all of your updates to this IPython file and do not clear any of the output you get from running your code.
* Upload this file onto moodle.

## Datasets

You will be using some publicly available song data from audioscrobbler, which can be found [here](http://www-etud.iro.umontreal.ca/~bergstrj/audioscrobbler_data.html). However, we modified the original data files so that the code will run in a reasonable time on a single machine. The reduced data files have been suffixed with `_small.txt` and contains only the information relevant to the top 50 most prolific users (highest artist play counts).

The original data file `user_artist_data.txt` contained about 141,000 unique users, and 1.6 million unique artists. About 24.2 million users’ plays of artists are recorded, along with their count.

Note that when plays are scribbled, the client application submits the name of the artist being played. This name could be misspelled or nonstandard, and this may only be detected later. For example, "The Smiths", "Smiths, The", and "the smiths" may appear as distinct artist IDs in the data set, even though they clearly refer to the same artist. So, the data set includes `artist_alias.txt`, which maps artist IDs that are known misspellings or variants to the canonical ID of that artist.

The `artist_data.txt` file then provides a map from the canonical artist ID to the name of the artist.

## Necessary Package Imports

In [1]:
!pip install pyspark

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
from pyspark.mllib.recommendation import *
import random
from operator import *


## Loading data

Load the three datasets into RDDs and name them `artistData`, `artistAlias`, and `userArtistData`. View the README, or the files themselves, to see how this data is formated. Some of the files have tab delimeters while some have space delimiters. Make sure that your `userArtistData` RDD contains only the canonical artist IDs.

In [3]:
from pyspark import SparkContext
sc = SparkContext()

#Our Project:
when user listens to a song, the record in added to a file. That file is the dataset.
we mapreduce it
and then use it for recommendation

## MapReduce

In [4]:
#read from the user's listen history
redundant_lst= [['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['6lV2MSQmRIkycDScNtrBXO', 'Airplanes (feat. Hayley Williams of Paramore)', 'B.o.B, Hayley Williams', '1', 'pop rap'],
['4hrae8atte6cRlSC9a7VCO', 'Always On Time', 'Ja Rule, Ashanti', '2', 'pop rap'],
['03tqyYWC9Um2ZqU0ZN849H', 'No Hands [feat. Roscoe Dash and Wale]', 'Waka Flocka Flame', '3', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['6t2ubAB4iSYOuIpRAOGd4t', 'Cake', 'Flo Rida, 99 Percent', '8', 'pop rap'],
['6t2ubAB4iSYOuIpRAOGd4t', 'Cake', 'Flo Rida, 99 Percent', '8', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['7wqSzGeodspE3V6RBD5W8L', 'See You Again (feat. Charlie Puth)', 'Wiz Khalifa, Charlie Puth', '7', 'pop rap'],
['6t2ubAB4iSYOuIpRAOGd4t', 'Cake', 'Flo Rida, 99 Percent', '8', 'pop rap'],
['5IZc3KIVFhjzJ0L2kiXzUl', 'Promise', 'Kid Ink, Fetty Wap', '9', 'pop rap'],
['3B7i9OKRRmIsSBHEbJz58Y', 'Grind With Me', 'Pretty Ricky', '10', 'pop rap'],
['3tvWMBIblzT5FSjKtIeRR1', 'Whatever You Like', 'T.I.', '11', 'pop rap'],
['1hWYT0w2R0J19rlVkiez7X', 'Battle Scars', 'Lupe Fiasco, Guy Sebastian', '12', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['3B7i9OKRRmIsSBHEbJz58Y', 'Grind With Me', 'Pretty Ricky', '10', 'pop rap'],
['3B7i9OKRRmIsSBHEbJz58Y', 'Grind With Me', 'Pretty Ricky', '10', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap']
]

redundant_lst

with open('dummy.txt','a') as f:
  for i in redundant_lst:
    f.write(str(i)+"\n")


In [5]:
from pyspark.sql import SparkSession

dummyData = sc.textFile("dummy.txt")
# a = dummyData.map(lambda x:(x,1)).reduceByKey(lambda x,y : x + y).collect()
a = dummyData.map(lambda x:(x,1)).reduceByKey(lambda x,y : x + y).collect()


fin_list = []

for i in a:

  res = i[0].strip('][').replace("'","").split(', ')
  while res[4].isdigit():
    res[2]+="&"+res[3]
    del res[3]
  
  res.append(i[1])
  fin_list.append(res)


schema = ["spotify_id","title","artist(s)","popularity","genre", "count"]

rdd = sc.parallelize(fin_list)
spark = SparkSession(sc)

rdd.toDF(schema).show()

+--------------------+--------------------+--------------------+----------+-------+-----+
|          spotify_id|               title|           artist(s)|popularity|  genre|count|
+--------------------+--------------------+--------------------+----------+-------+-----+
|4hrae8atte6cRlSC9...|      Always On Time|     Ja Rule&Ashanti|         2|pop rap|    2|
|03tqyYWC9Um2ZqU0Z...|No Hands [feat. R...|   Waka Flocka Flame|         3|pop rap|    2|
|3B7i9OKRRmIsSBHEb...|       Grind With Me|        Pretty Ricky|        10|pop rap|    6|
|1hWYT0w2R0J19rlVk...|        Battle Scars|Lupe Fiasco&Guy S...|        12|pop rap|    2|
|14B2bUopOga5V3ypl...|           Suga Suga| Baby Bash&Frankie J|        13|pop rap|    8|
|3rMyMv8EjKXoPnaRo...|"Stuntin Like My ...|   Birdman&Lil Wayne|         5|pop rap|    8|
|6lV2MSQmRIkycDScN...|Airplanes (feat. ...|B.o.B&Hayley Will...|         1|pop rap|    2|
|66TRwr5uJwPt15mfF...|Crank That (Soulj...|          Soulja Boy|         4|pop rap|   10|
|6t2ubAB4i

In [6]:
rdd.sortBy(lambda x:x[-1], ascending=False).toDF(schema).show()

+--------------------+--------------------+--------------------+----------+-------+-----+
|          spotify_id|               title|           artist(s)|popularity|  genre|count|
+--------------------+--------------------+--------------------+----------+-------+-----+
|66TRwr5uJwPt15mfF...|Crank That (Soulj...|          Soulja Boy|         4|pop rap|   10|
|14B2bUopOga5V3ypl...|           Suga Suga| Baby Bash&Frankie J|        13|pop rap|    8|
|3rMyMv8EjKXoPnaRo...|"Stuntin Like My ...|   Birdman&Lil Wayne|         5|pop rap|    8|
|3B7i9OKRRmIsSBHEb...|       Grind With Me|        Pretty Ricky|        10|pop rap|    6|
|6t2ubAB4iSYOuIpRA...|                Cake| Flo Rida&99 Percent|         8|pop rap|    6|
|4hrae8atte6cRlSC9...|      Always On Time|     Ja Rule&Ashanti|         2|pop rap|    2|
|03tqyYWC9Um2ZqU0Z...|No Hands [feat. R...|   Waka Flocka Flame|         3|pop rap|    2|
|1hWYT0w2R0J19rlVk...|        Battle Scars|Lupe Fiasco&Guy S...|        12|pop rap|    2|
|6lV2MSQmR