# Collaborativ Filtering

Ziel ist, eine Tabelle zu haben, mit verschiedenen Playlisten in den Zeilen und Songs in den Spalten. Die Zellen sagen aus, ob ein Song in einer Playlist ist oder nicht (0 oder 1).

Die Tabelle kann dann für Collab Filtering verwendet werden.

Da unsere Datnebank in einem anderen Netzwerk liegt, wird zunächst ein Tunnel über ein Terminal benötigt:

```ssh <username>@login1.mi.hs-rm.de -L 9001:db.intern.mi.hs-rm.de:5432```

In [1]:
import psycopg2
import pandas as pd
from getpass import getpass

## Generate Collab Filtering Table

Also brauchen wir zunächst einmal die Tabelle mit dem Inhalt, welches Lied in welcher Playlist ist.

### Datenbankverbindung

In [2]:
user = input("Bitte User für DB eingeben")
pswd = getpass("Bitte User für DB eingeben")
conn = psycopg2.connect(f"dbname='orent001_spotify_test' user='{user}' host='localhost' port='9001' password='{pswd}'")

In [3]:
# query = "SELECT playlist.name AS playlist, song.track_name AS song FROM playlist INNER JOIN p_enthaelt_s ON p_enthaelt_s.playlist = playlist.playlist_id INNER JOIN song ON song.song_id = p_enthaelt_s.song;"
# playlists = pd.io.sql.read_sql_query(query,conn)
# playlists

### Select Data from SQL and pivot it to a collab usefull table

pd.pivot_table macht aus:           gleich:

```
playlist | song | drinnen               song | A | B | C 
---------+------+--------           playlist |   |   |   
 0       | A    | 1                 ---------+---+---+---
 0       | B    | 1          ==>     0       | 1 | 1 | 1 
 0       | C    | 1                  1       | 0 | 0 | 1 
 1       | C    | 1                  2       | 0 | 1 | 1 
 2       | B    | 1      
 2       | C    | 1      
```

In [4]:
query = "SELECT playlist, song, 1 as drinnen FROM p_enthaelt_s;"
playlist_recommendations = pd.io.sql.read_sql_query(query,conn)
playlist_recommendations = playlist_recommendations.pivot_table(index='playlist', columns='song', values='drinnen')

# um unregemmaessigkeiten zu entfernen, werden alle Songs, die in weniger als 10 Playlisten drinnen sind, rausgeschmissen.
playlist_recommendations = playlist_recommendations.dropna(thresh=10, axis=1).fillna(0)

playlist_recommendations.columns.name = None
playlist_recommendations

Unnamed: 0_level_0,spotify:track:000xQL6tZNLJzIrtIgxqSl,spotify:track:00BuKLSAFkaEkaVAgIMbeA,spotify:track:00LfFm08VWeZwB0Zlm24AT,spotify:track:00fNdIFKoMxxt8Hnm2kAKL,spotify:track:00lNx0OcTJrS3MKHcB80HY,spotify:track:00qOE7OjRl0BpYiCiweZB2,spotify:track:00xR9dHhuaNznqB4FSzOlr,spotify:track:015IsLQFXbEm0f541N2qoX,spotify:track:01A7PEPSnmtixFPfB2UTal,spotify:track:01DidSmPasiXdPhDVuaULL,...,spotify:track:7zBPzAjKAqQpcv8F8GCq5s,spotify:track:7zBQRGpYImAdIZc97FNj3V,spotify:track:7zFXmv6vqI4qOt4yGf3jYZ,spotify:track:7zNM46fo01dCBidY4yGNTZ,spotify:track:7zTx8ePYAmPFQuxP3xlXZn,spotify:track:7zVCrzzEJU7u24sbJPXA5W,spotify:track:7zWj09xkFgA9tcV6YhfU6q,spotify:track:7zkLpY72g6lKQbiHDqri1S,spotify:track:7zsw78LtXUD7JfEwH64HK2,spotify:track:7zxRMhXxJMQCeDDg0rKAVo
playlist,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
10000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100001,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100002,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
100003,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Jetzt wollen wir unsere recommendation Matrix erstellen, die wie folget gefildet wird:

```
         |  A    |  B   |  C                  | A | B | C 
---------+-------+------+-------           ---+---+---+---
 0       |  0    | 0    |  0        ==>     A | 1 | ? | ?
 1       | -0.33 | 0.67 | -0.33             B | ? | 1 | ?
 2       | -0.67 | 0.33 |  0.33             C | ? | ? | 1  
 ```
Wert für das Beispiel hab ich nicht ausgerechnet.

Dafür verwende ich die corr function von pandas von DataFrame.

Dauer ca. 8 Minuten bei der Test Datenmenge :(

In [5]:
item_similarity = playlist_recommendations.corr(method='pearson')
item_similarity

Unnamed: 0,spotify:track:000xQL6tZNLJzIrtIgxqSl,spotify:track:00BuKLSAFkaEkaVAgIMbeA,spotify:track:00LfFm08VWeZwB0Zlm24AT,spotify:track:00fNdIFKoMxxt8Hnm2kAKL,spotify:track:00lNx0OcTJrS3MKHcB80HY,spotify:track:00qOE7OjRl0BpYiCiweZB2,spotify:track:00xR9dHhuaNznqB4FSzOlr,spotify:track:015IsLQFXbEm0f541N2qoX,spotify:track:01A7PEPSnmtixFPfB2UTal,spotify:track:01DidSmPasiXdPhDVuaULL,...,spotify:track:7zBPzAjKAqQpcv8F8GCq5s,spotify:track:7zBQRGpYImAdIZc97FNj3V,spotify:track:7zFXmv6vqI4qOt4yGf3jYZ,spotify:track:7zNM46fo01dCBidY4yGNTZ,spotify:track:7zTx8ePYAmPFQuxP3xlXZn,spotify:track:7zVCrzzEJU7u24sbJPXA5W,spotify:track:7zWj09xkFgA9tcV6YhfU6q,spotify:track:7zkLpY72g6lKQbiHDqri1S,spotify:track:7zsw78LtXUD7JfEwH64HK2,spotify:track:7zxRMhXxJMQCeDDg0rKAVo
spotify:track:000xQL6tZNLJzIrtIgxqSl,1.000000,-0.005607,0.021245,-0.003562,0.109532,-0.005130,-0.002907,-0.005046,-0.003562,-0.002907,...,-0.007795,-0.002907,0.027908,-0.004604,0.041288,-0.005607,-0.005046,-0.003904,-0.003562,0.031641
spotify:track:00BuKLSAFkaEkaVAgIMbeA,-0.005607,1.000000,0.033348,0.037933,-0.004736,-0.006820,-0.003865,0.023509,-0.004736,-0.003865,...,-0.010363,-0.003865,-0.007947,-0.006121,-0.005870,-0.007455,0.053727,-0.005190,0.037933,-0.007249
spotify:track:00LfFm08VWeZwB0Zlm24AT,0.021245,0.033348,1.000000,0.027540,-0.006045,-0.008705,-0.004934,0.015222,-0.006045,-0.004934,...,-0.013227,-0.004934,0.050234,-0.007812,0.019653,-0.009516,0.039006,-0.006624,-0.006045,0.012778
spotify:track:00fNdIFKoMxxt8Hnm2kAKL,-0.003562,0.037933,0.027540,1.000000,0.063858,-0.004333,-0.002456,0.090445,-0.003009,-0.002456,...,-0.006584,-0.002456,-0.005049,-0.003889,-0.003729,-0.004736,0.043092,-0.003297,-0.003009,-0.004606
spotify:track:00lNx0OcTJrS3MKHcB80HY,0.109532,-0.004736,-0.006045,0.063858,1.000000,-0.004333,-0.002456,-0.004262,-0.003009,-0.002456,...,0.024325,-0.002456,0.035021,-0.003889,-0.003729,-0.004736,-0.004262,-0.003297,-0.003009,-0.004606
spotify:track:00qOE7OjRl0BpYiCiweZB2,-0.005130,-0.006820,-0.008705,-0.004333,-0.004333,1.000000,0.053494,-0.006137,-0.004333,-0.003536,...,-0.009480,-0.003536,-0.007270,-0.005599,-0.005369,-0.006820,-0.006137,0.037794,-0.004333,-0.006632
spotify:track:00xR9dHhuaNznqB4FSzOlr,-0.002907,-0.003865,-0.004934,-0.002456,-0.002456,0.053494,1.000000,-0.003478,-0.002456,-0.002004,...,-0.005373,-0.002004,-0.004120,-0.003173,-0.003043,-0.003865,-0.003478,-0.002691,-0.002456,-0.003759
spotify:track:015IsLQFXbEm0f541N2qoX,-0.005046,0.023509,0.015222,0.090445,-0.004262,-0.006137,-0.003478,1.000000,-0.004262,-0.003478,...,-0.009325,-0.003478,-0.007151,-0.005508,-0.005282,-0.006708,0.061033,-0.004670,-0.004262,-0.006523
spotify:track:01A7PEPSnmtixFPfB2UTal,-0.003562,-0.004736,-0.006045,-0.003009,-0.003009,-0.004333,-0.002456,-0.004262,1.000000,0.079399,...,0.055234,-0.002456,-0.005049,-0.003889,-0.003729,-0.004736,-0.004262,-0.003297,-0.003009,-0.004606
spotify:track:01DidSmPasiXdPhDVuaULL,-0.002907,-0.003865,-0.004934,-0.002456,-0.002456,-0.003536,-0.002004,-0.003478,0.079399,1.000000,...,-0.005373,-0.002004,-0.004120,-0.003173,-0.003043,-0.003865,0.054489,-0.002691,-0.002456,-0.003759


## Speichern

Die Rating Matrix kann am einfachsten als csv gespeichert werden, um wieder verwendet zu werden.

Dauert mit den Test Daten ca. ne Minute und ist fast 800 mb gross

In [6]:
item_similarity.to_csv('item_similarity.csv')

In [7]:
def get_similar_songs(songURL):
    similar_score = item_similarity[songURL]*0.5
    similar_score = similar_score.sort_values(ascending=False)
    return similar_score

def get_similar_playlist(songs):
    similar_songs = pd.DataFrame()
    for song in songs:
        similar_songs = similar_songs.append(get_similar_songs(song), ignore_index=True)
    similar_songs = similar_songs.sum().sort_values(ascending=False)
    return similar_songs

get_similar_playlist(['spotify:track:7zsw78LtXUD7JfEwH64HK2', 'spotify:track:7zBPzAjKAqQpcv8F8GCq5s'])


ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.

ERROR:root:Internal Python error in the inspect module.
Below is the traceback from this internal error.



Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/pandas/core/indexes/base.py", line 2525, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 117, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 139, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1265, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1273, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'spotify:track:7zzSan8uETSRwOsg2CDFpN'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mi/orent001/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3343, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-7-a520d4e52b54>", line 13, in <module>
    get_similar_playlist(['spotify:track:7zzSan8uE