# Music Recommender System using Apache Spark and Python
**Estimated time: 8hrs**

## Description

For this project, you are to create a recommender system that will recommend new musical artists to a user based on their listening history. Suggesting different songs or musical artists to a user is important to many music streaming services, such as Pandora and Spotify. In addition, this type of recommender system could also be used as a means of suggesting TV shows or movies to a user (e.g., Netflix). 

To create this system you will be using Spark and the collaborative filtering technique. The instructions for completing this project will be laid out entirely in this file. You will have to implement any missing code as well as answer any questions.

**Submission Instructions:** 
* Add all of your updates to this IPython file and do not clear any of the output you get from running your code.
* Upload this file onto moodle.

## Datasets

You will be using some publicly available song data from audioscrobbler, which can be found [here](http://www-etud.iro.umontreal.ca/~bergstrj/audioscrobbler_data.html). However, we modified the original data files so that the code will run in a reasonable time on a single machine. The reduced data files have been suffixed with `_small.txt` and contains only the information relevant to the top 50 most prolific users (highest artist play counts).

The original data file `user_artist_data.txt` contained about 141,000 unique users, and 1.6 million unique artists. About 24.2 million users’ plays of artists are recorded, along with their count.

Note that when plays are scribbled, the client application submits the name of the artist being played. This name could be misspelled or nonstandard, and this may only be detected later. For example, "The Smiths", "Smiths, The", and "the smiths" may appear as distinct artist IDs in the data set, even though they clearly refer to the same artist. So, the data set includes `artist_alias.txt`, which maps artist IDs that are known misspellings or variants to the canonical ID of that artist.

The `artist_data.txt` file then provides a map from the canonical artist ID to the name of the artist.

## Necessary Package Imports

In [2]:
!pip install pyspark

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting pyspark
  Downloading pyspark-3.2.1.tar.gz (281.4 MB)
[K     |████████████████████████████████| 281.4 MB 33 kB/s 
[?25hCollecting py4j==0.10.9.3
  Downloading py4j-0.10.9.3-py2.py3-none-any.whl (198 kB)
[K     |████████████████████████████████| 198 kB 56.4 MB/s 
[?25hBuilding wheels for collected packages: pyspark
  Building wheel for pyspark (setup.py) ... [?25l[?25hdone
  Created wheel for pyspark: filename=pyspark-3.2.1-py2.py3-none-any.whl size=281853642 sha256=47c444461995187ff88e742ef450faaf3e3fee9f321b58791b8b1c9854a2b51b
  Stored in directory: /root/.cache/pip/wheels/9f/f5/07/7cd8017084dce4e93e84e92efd1e1d5334db05f2e83bcef74f
Successfully built pyspark
Installing collected packages: py4j, pyspark
Successfully installed py4j-0.10.9.3 pyspark-3.2.1


In [3]:
from pyspark.mllib.recommendation import *
import random
from operator import *


## Loading data

Load the three datasets into RDDs and name them `artistData`, `artistAlias`, and `userArtistData`. View the README, or the files themselves, to see how this data is formated. Some of the files have tab delimeters while some have space delimiters. Make sure that your `userArtistData` RDD contains only the canonical artist IDs.

In [None]:
from pyspark import SparkContext
sc = SparkContext()

In [82]:
#Loading data into RDD
# artistData = sc.textFile("artist_data_small.txt")
artistData = sc.textFile("artist_data_small_new.txt")

artistAlias = sc.textFile("artist_alias_small.txt")
userArtistData = sc.textFile("user_artist_data_small.txt")

alias_data = artistAlias.collect()
user_data = userArtistData.collect()
artist_canonical_dict = {}
user_list = []

for line in alias_data:
    artist_record = line.split("\t")
    artist_canonical_dict[artist_record[0]] = artist_record[1]

#Function to get canonical artist names
def canonicalArtistID(line):
    line = line.split(" ")
    
    if line[1] in artist_canonical_dict:
        return (int(line[0]),int(artist_canonical_dict[line[1]]),int(line[2]))
    else:
        return (int(line[0]),int(line[1]),int(line[2]))
    
#Getting canonical artist names        
userArtistData = userArtistData.map(canonicalArtistID)

#Creating allArtists dataset to be used later during model evaluation process
allArtists = userArtistData.map(lambda x:(x[1])).collect()
print(allArtists)


allArtists = list(set(allArtists))

[1000010, 1000049, 1000056, 1000062, 1000094, 1000112, 1000113, 1000114, 1000123, 1000130, 1000139, 1000241, 1000263, 1000289, 1000305, 1000320, 1000340, 1000427, 1000428, 1000433, 1000445, 1000527, 1000617, 1000632, 1000676, 1000790, 1000877, 1000890, 1000926, 1000999, 1001007, 1001027, 1001066, 1001068, 1001107, 1001117, 1001130, 1001198, 1001233, 1001249, 1001412, 1001439, 1001482, 1001487, 1001523, 1001530, 1001779, 1001809, 1001828, 1001894, 1002095, 1002128, 1002204, 1002216, 1002223, 1002225, 1002269, 1002289, 1002326, 1002560, 1002584, 1034635, 1002723, 1002734, 1002742, 1002850, 1002912, 1003159, 1003176, 1003241, 1003250, 1003568, 1003673, 1003681, 1003689, 1003727, 1003794, 1003853, 1003928, 1004201, 1004226, 1004274, 1004278, 1004294, 1004296, 1004301, 1004342, 1004392, 1004484, 1004574, 1005222, 1005363, 1005990, 1006029, 1006113, 1006123, 1006185, 1006229, 1006230, 1006234, 1006245, 1006287, 1006354, 1006411, 1006594, 1006597, 1006607, 1006628, 1006631, 1006633, 1006657, 

#Our Project:
when user listens to a song, the record in added to a file. That file is the dataset.
we mapreduce it
and then use it for recommendation

## MapReduce

In [206]:
#read from the user's listen history
redundant_lst= [['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['6lV2MSQmRIkycDScNtrBXO', 'Airplanes (feat. Hayley Williams of Paramore)', 'B.o.B, Hayley Williams', '1', 'pop rap'],
['4hrae8atte6cRlSC9a7VCO', 'Always On Time', 'Ja Rule, Ashanti', '2', 'pop rap'],
['03tqyYWC9Um2ZqU0ZN849H', 'No Hands [feat. Roscoe Dash and Wale]', 'Waka Flocka Flame', '3', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['6t2ubAB4iSYOuIpRAOGd4t', 'Cake', 'Flo Rida, 99 Percent', '8', 'pop rap'],
['6t2ubAB4iSYOuIpRAOGd4t', 'Cake', 'Flo Rida, 99 Percent', '8', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap'],
['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['3rMyMv8EjKXoPnaRo2hdJN', "Stuntin' Like My Daddy - Street", 'Birdman, Lil Wayne', '5', 'pop rap'],
['7wqSzGeodspE3V6RBD5W8L', 'See You Again (feat. Charlie Puth)', 'Wiz Khalifa, Charlie Puth', '7', 'pop rap'],
['6t2ubAB4iSYOuIpRAOGd4t', 'Cake', 'Flo Rida, 99 Percent', '8', 'pop rap'],
['5IZc3KIVFhjzJ0L2kiXzUl', 'Promise', 'Kid Ink, Fetty Wap', '9', 'pop rap'],
['3B7i9OKRRmIsSBHEbJz58Y', 'Grind With Me', 'Pretty Ricky', '10', 'pop rap'],
['3tvWMBIblzT5FSjKtIeRR1', 'Whatever You Like', 'T.I.', '11', 'pop rap'],
['1hWYT0w2R0J19rlVkiez7X', 'Battle Scars', 'Lupe Fiasco, Guy Sebastian', '12', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['3B7i9OKRRmIsSBHEbJz58Y', 'Grind With Me', 'Pretty Ricky', '10', 'pop rap'],
['3B7i9OKRRmIsSBHEbJz58Y', 'Grind With Me', 'Pretty Ricky', '10', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['14B2bUopOga5V3ypld7d6n', 'Suga Suga', 'Baby Bash, Frankie J', '13', 'pop rap'],
['66TRwr5uJwPt15mfFkzhbi', 'Crank That (Soulja Boy)', 'Soulja Boy', '4', 'pop rap']
]

redundant_lst

with open('dummy.txt','a') as f:
  for i in redundant_lst:
    f.write(str(i)+"\n")


In [208]:
from pyspark.sql import SparkSession

dummyData = sc.textFile("dummy.txt")
# a = dummyData.map(lambda x:(x,1)).reduceByKey(lambda x,y : x + y).collect()
a = dummyData.map(lambda x:(x,1)).reduceByKey(lambda x,y : x + y).collect()


fin_list = []

for i in a:

  res = i[0].strip('][').replace("'","").split(', ')
  while res[4].isdigit():
    res[2]+="&"+res[3]
    del res[3]
  
  res.append(i[1])
  fin_list.append(res)


schema = ["spotify_id","title","artist(s)","popularity","genre", "count"]

rdd = sc.parallelize(fin_list)
spark = SparkSession(sc)

rdd.toDF(schema).show()

+--------------------+--------------------+--------------------+----------+-------+-----+
|          spotify_id|               title|           artist(s)|popularity|  genre|count|
+--------------------+--------------------+--------------------+----------+-------+-----+
|4hrae8atte6cRlSC9...|      Always On Time|     Ja Rule&Ashanti|         2|pop rap|    1|
|03tqyYWC9Um2ZqU0Z...|No Hands [feat. R...|   Waka Flocka Flame|         3|pop rap|    1|
|3B7i9OKRRmIsSBHEb...|       Grind With Me|        Pretty Ricky|        10|pop rap|    3|
|1hWYT0w2R0J19rlVk...|        Battle Scars|Lupe Fiasco&Guy S...|        12|pop rap|    1|
|14B2bUopOga5V3ypl...|           Suga Suga| Baby Bash&Frankie J|        13|pop rap|    4|
|3rMyMv8EjKXoPnaRo...|"Stuntin Like My ...|   Birdman&Lil Wayne|         5|pop rap|    4|
|6lV2MSQmRIkycDScN...|Airplanes (feat. ...|B.o.B&Hayley Will...|         1|pop rap|    1|
|66TRwr5uJwPt15mfF...|Crank That (Soulj...|          Soulja Boy|         4|pop rap|    5|
|6t2ubAB4i

In [197]:
rdd.sortBy(lambda x:x[-1], ascending=False).toDF(schema).show()

+--------------------+--------------------+--------------------+----------+-------+-----+
|          spotify_id|               title|           artist(s)|popularity|  genre|count|
+--------------------+--------------------+--------------------+----------+-------+-----+
|66TRwr5uJwPt15mfF...|Crank That (Soulj...|          Soulja Boy|         4|pop rap|    3|
|3B7i9OKRRmIsSBHEb...|       Grind With Me|        Pretty Ricky|        10|pop rap|    2|
|3rMyMv8EjKXoPnaRo...|"Stuntin Like My ...|   Birdman&Lil Wayne|         5|pop rap|    2|
|4hrae8atte6cRlSC9...|      Always On Time|     Ja Rule&Ashanti|         2|pop rap|    1|
|03tqyYWC9Um2ZqU0Z...|No Hands [feat. R...|   Waka Flocka Flame|         3|pop rap|    1|
|1hWYT0w2R0J19rlVk...|        Battle Scars|Lupe Fiasco&Guy S...|        12|pop rap|    1|
|14B2bUopOga5V3ypl...|           Suga Suga| Baby Bash&Frankie J|        13|pop rap|    1|
|6lV2MSQmRIkycDScN...|Airplanes (feat. ...|B.o.B&Hayley Will...|         1|pop rap|    1|
|7wqSzGeod

## Data Exploration

In the blank below, write some code that with find the users' total play counts. Find the three users with the highest number of total play counts (sum of all counters) and print the user ID, the total play count, and the mean play count (average number of times a user played an artist). Your output should look as follows:
```
User 1059637 has a total play count of 674412 and a mean play count of 1878.
User 2064012 has a total play count of 548427 and a mean play count of 9455.
User 2069337 has a total play count of 393515 and a mean play count of 1519.
```


In [29]:
artist_data = artistAlias.collect()
    
user_play_count = {}
user_count_number = {}

for line in user_data:
     user_record = line.split()
     if user_record[0] in user_play_count:
         user_play_count[str(user_record[0])] = user_play_count[user_record[0]] + int(user_record[2])
         user_count_number[str(user_record[0])] = user_count_number[user_record[0]] + 1
     else:
         user_play_count[str(user_record[0])] = int(user_record[2])
         user_count_number[str(user_record[0])] = 1
top = 0
maximum = 2


for k in user_play_count:
  if top > maximum:       
    break
  print ('User ' + str(k) + ' has a total play count of ' + str(user_play_count[k]) )
  top += 1
# print(sorted(user_play_count.iteritems(), reverse = True))
# for word, count in sorted(user_play_count.iteritems(), key=lambda (k,v): (v,k), reverse = True):
#      if top > maximum:
#         break
#      print ('User ' + str(word) + ' has a total play count of ' + str(count) + ' and a mean play count of ' + str(count/user_count_number[word]) )
#      top += 1

User 1059637 has a total play count of 674412
User 2064012 has a total play count of 548427
User 2069337 has a total play count of 393515


####  Splitting Data for Testing

Use the [randomSplit](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.randomSplit) function to divide the data (`userArtistData`) into:
* A training set, `trainData`, that will be used to train the model. This set should constitute 40% of the data.
* A validation set, `validationData`, used to perform parameter tuning. This set should constitute 40% of the data.
* A test set, `testData`, used for a final evaluation of the model. This set should constitute 20% of the data.

Use a random seed value of 13. Since these datasets will be repeatedly used you will probably want to persist them in memory using the [cache](http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD.cache) function.

In addition, print out the first 3 elements of each set as well as their sizes; if you created these sets correctly, your output should look as follows:
```
[(1059637, 1000049, 1), (1059637, 1000056, 1), (1059637, 1000113, 5)]
[(1059637, 1000010, 238), (1059637, 1000062, 11), (1059637, 1000112, 423)]
[(1059637, 1000094, 1), (1059637, 1000130, 19129), (1059637, 1000139, 4)]
19817
19633
10031
```

In [30]:
#Splitting the data into train, test and cross validation
trainData, validationData, testData = userArtistData.randomSplit([4, 4, 2], 13)

print (trainData.take(3))
print (validationData.take(3))
print( testData.take(3))
print (trainData.count())
print (validationData.count())
print (testData.count())

#Caching and creating ratings object
trainData = trainData.map(lambda l: Rating(*l)).cache()
validationData = validationData.map(lambda l: Rating(*l)).cache()
testData = testData.map(lambda l: Rating(*l)).cache()

[(1059637, 1000049, 1), (1059637, 1000056, 1), (1059637, 1000114, 2)]
[(1059637, 1000010, 238), (1059637, 1000062, 11), (1059637, 1000123, 2)]
[(1059637, 1000094, 1), (1059637, 1000112, 423), (1059637, 1000113, 5)]
19769
19690
10022


## The Recommender Model

For this project, we will train the model with implicit feedback. You can read more information about this from the collaborative filtering page: [http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html](http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html). The [function you will be using](http://spark.apache.org/docs/latest/api/python/pyspark.mllib.html#pyspark.mllib.recommendation.ALS.trainImplicit) has a few tunable parameters that will affect how the model is built. Therefore, to get the best model, we will do a small parameter sweep and choose the model that performs the best on the validation set

Therefore, we must first devise a way to evaluate models. Once we have a method for evaluation, we can run a parameter sweep, evaluate each combination of parameters on the validation data, and choose the optimal set of parameters. The parameters then can be used to make predictions on the test data.

### Model Evaluation

Although there may be several ways to evaluate a model, we will use a simple method here. Suppose we have a model and some dataset of *true* artist plays for a set of users. This model can be used to predict the top X artist recommendations for a user and these recommendations can be compared the artists that the user actually listened to (here, X will be the number of artists in the dataset of *true* artist plays). Then, the fraction of overlap between the top X predictions of the model and the X artists that the user actually listened to can be calculated. This process can be repeated for all users and an average value returned.

For example, suppose a model predicted [1,2,4,8] as the top X=4 artists for a user. Suppose, that user actually listened to the artists [1,3,7,8]. Then, for this user, the model would have a score of 2/4=0.5. To get the overall score, this would be performed for all users, with the average returned.

**NOTE: when using the model to predict the top-X artists for a user, do not include the artists listed with that user in the training data.**

Name your function `modelEval` and have it take a model (the output of ALS.trainImplicit) and a dataset as input. For parameter tuning, the dataset parameter should be set to the validation data (`validationData`). After parameter tuning, the model can be evaluated on the test data (`testData`).

In [31]:
from pyspark.mllib.recommendation import ALS, MatrixFactorizationModel, Rating
from collections import defaultdict

#model evaluation function
def modelEval(model, dataset):
    global trainData
    global allArtists
    
    #Getting nonTrainArtists for each user
    userArtists = defaultdict(list)
    
    for data in trainData.collect():
        userArtists[data[0]].append(data[1])
        
    cvList = []
        
    for key in userArtists.keys():
        userArtists[key] = list(set(allArtists) - set(userArtists[key]))
        for artist in userArtists[key]:
            cvList.append((key, artist))
    
    #Creating user,nonTrainArtists RDD
    cvData = sc.parallelize(cvList)
    
    userOriginal = dataset.map(lambda x:(x.user, (x.product, x.rating))).groupByKey().collect()
    
    #prediction on the user, nonTrainArtists RDD
    predictions = model.predictAll(cvData)
    userPredictions = predictions.map(lambda x:(x.user, (x.product, x.rating))).groupByKey().collect()
    original = {}
    predictions = {}
    
    #Getting top X artists for each user
    for line in userOriginal:
        original[line[0]] = sorted(line[1], key=lambda x:x[1], reverse = True)
        
    for line in userPredictions:
        predictions[line[0]] = sorted(line[1], key=lambda x:x[1], reverse = True)
        
    similarity = []
    
    for key in userOriginal:
        similar = 0.0
        
        pred = predictions[key[0]]
        org = original[key[0]]
            
        for value in org:
            for item in pred[0:len(org)]:
                if (value[0] == item[0]):
                    similar += 1
                    break
                    
        #Similarity calculation        
        similarity.append(float(similar/len(org)))        
        
    string = "The model score for rank " + str(rank) + " is " + str(float(sum(similarity)/len(similarity)))    
    print (string)

### Model Construction

Now we can build the best model possibly using the validation set of data and the `modelEval` function. Although, there are a few parameters we could optimize, for the sake of time, we will just try a few different values for the [rank parameter](http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#collaborative-filtering) (leave everything else at its default value, **except make `seed`=345**). Loop through the values [2, 10, 20] and figure out which one produces the highest scored based on your model evaluation function.

Note: this procedure may take several minutes to run.

For each rank value, print out the output of the `modelEval` function for that model. Your output should look as follows:
```
The model score for rank 2 is 0.090431
The model score for rank 10 is 0.095294
The model score for rank 20 is 0.090248
```

In [32]:
#Model evaluation through different rank parameters
rank_list = [2, 10, 20]

for rank in rank_list:
    model = ALS.trainImplicit(trainData, rank, seed=345)
    modelEval(model,validationData)

The model score for rank 2 is 0.08685469384370723
The model score for rank 10 is 0.09640706020783121
The model score for rank 20 is 0.08526939704391619


Now, using the bestModel, we will check the results over the test data. Your result should be ~`0.0507`.

In [16]:
bestModel = ALS.trainImplicit(trainData, rank=10, seed=345)
modelEval(bestModel, testData)

The model score for rank 20 is 0.062383739358596715


## Trying Some Artist Recommendations
Using the best model above, predict the top 5 artists for user `1059637` using the [recommendProducts](http://spark.apache.org/docs/1.5.2/api/python/pyspark.mllib.html#pyspark.mllib.recommendation.MatrixFactorizationModel.recommendProducts) function. Map the results (integer IDs) into the real artist name using `artistAlias`. Print the results. The output should look as follows:
```
Artist 0: Brand New
Artist 1: Taking Back Sunday
Artist 2: Evanescence
Artist 3: Elliott Smith
Artist 4: blink-182
```

In [73]:
ratings = bestModel.recommendProducts(1059637, 20)
print(len(ratings))

20


In [80]:
import re
artist_data = artistData.collect()
mood = "happy"
artist_names_dict = {}
recomm_num = 5

for line in artist_data:
    pattern = re.match( r'(\d+)(\s+)(.*)', line)
    if pattern.group(3).split(" ")[-1]==mood:
      artist_names_dict[str(pattern.group(1))] = pattern.group(3)

print(dict(list(artist_names_dict.items())[0: 10]) )
print(dict(list(artist_canonical_dict.items())[0: 10]) )
if '1002095' in artist_canonical_dict:
  print("yes")
else:
  print("no")

import json



with open('convert.txt', 'w') as convert_file:
	convert_file.write(json.dumps(artist_canonical_dict))

recommed = 0

print(artist_canonical_dict)
for i in range(0, len(ratings)):
  print(ratings[i].product)
  print(artist_canonical_dict[str(ratings[i].product)])


# i=-1
# while(i<recomm_num):
# # for i in range(0,50):
#   i+=1
#   try:
#       if str(ratings[i].product) in artist_canonical_dict:
#           # print("line try if")
#           artist_id = artist_canonical_dict[str(ratings[i].product)]
#           #print("Artist " + str(i) + ": "  + str(artist_names_dict[str(artist_id)]))
#           print(f'Know more from https://www.google.com/search?q={str(artist_names_dict[str(artist_id)]).replace(" ","+").strip()}+songs')
#       else:
#           # print("line try else")

#           print( "Artist " + str(i) + ": " + str(artist_names_dict[str(ratings[i].product)]))
#           print(f'Know more from https://www.google.com/search?q={str(artist_names_dict[str(ratings[i].product)]).replace(" ","+")}+songs')
#           # print(f'Listen in spotify https://open.spotify.com/track/spotifyid')

#   except KeyError:
#     i-=1
#     pass


{'1240185': 'Lexy & K. Paul happy', '6671680': 'Armstrong, Louis & His Hot Five happy', '6990766': 'Phil Hendrie - 11/06/98 happy', '1240554': 'Ami Yoshida happy', '10023740': 'Red & Blue feat. Cathy Dennis happy', '1240589': 'Sebastian Bach & Friends happy', '1240603': 'The Wake happy', '1238620': 'Juno Reactor, Don Davis happy', '6671734': 'Alternate Main Title #3 happy', '1127179': 'Shotgun Remedy happy'}
{'1027859': '1252408', '1017615': '668', '6745885': '1268522', '1018110': '1018110', '1014609': '1014609', '6713071': '2976', '1014175': '1014175', '1008798': '1008798', '1013851': '1013851', '6696814': '1030672'}
no
{'1027859': '1252408', '1017615': '668', '6745885': '1268522', '1018110': '1018110', '1014609': '1014609', '6713071': '2976', '1014175': '1014175', '1008798': '1008798', '1013851': '1013851', '6696814': '1030672', '1036747': '1239516', '1278781': '1021980', '2035175': '1007565', '1327067': '1308328', '2006482': '1140837', '1314530': '1237371', '1160800': '1345290', '12

KeyError: ignored