# Visualizing User Subscriptions and Building a Recommendation System
To wrap up this project, I will use the subreddit vector representations (possibly in 3 dimensions) to recommend subreddits given a user's current subscriptions. To start, I thought it would make sense to plot a subscription set in the vector space. Here is an example of what that looks like (taken from one of my own Reddit accounts):
![Example User Subscriptions Plotted in 2d Space](figures/user_subscriptions.png)
This particular account has a few subscriptions in the technology/programming region (far right), and a couple in other places. This makes it a pretty good representation of a typical user.

## Recommender Model
### Hypothesis
The subreddit recommendation system is built around the idea that my reduced subreddit vector representations in 2- (or 3-) dimensional space embody similarities in content and/or discussion on the subreddits themselves. I have observed this to be generally true, excepting a few anomalies, when using the vectorization and dimensionality reduction techniques described in previous entries. Furthermore, the recommender will use simple Euclidean distance as the metric by which to connect existing subscriptions to corresponding recommendations.
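As a minimal sketch of the distance metric, assuming made-up coordinates rather than values from the actual dataset, the Euclidean distance from one subscription to every other subreddit can be computed in a single vectorized step:

```python
import numpy as np

# Hypothetical 2-D coordinates for three subreddits (illustrative values only).
points = np.array([
    [0.0, 0.0],
    [3.0, 4.0],
    [6.0, 8.0],
])

# Query point: the reduced vector of one existing subscription.
query = np.array([0.0, 0.0])

# Broadcasting subtracts the query from every row; taking the norm over
# axis 1 yields one Euclidean distance per subreddit.
dists = np.linalg.norm(points - query, axis=1)
print(dists)
```

A smaller distance is read as "more similar", which only holds to the extent that the reduced space preserves similarity, as hypothesized above.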
### Versions
In keeping with my original inspiration for the project, I will implement two versions of the recommender: one to suggest similar but undiscovered subreddits based on existing subscriptions, and another to suggest communities that fall far away from a user's existing subscription domain.
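The two versions can be thought of as the same ranking read in opposite directions. The sketch below illustrates this under the assumption that distances have already been computed; `rank_by_distance` and its `farthest` flag are hypothetical names, and the inputs are toy values.

```python
import numpy as np
import pandas as pd

def rank_by_distance(names, dists, farthest=False):
    """Return names ordered by distance (nearest first unless farthest=True)."""
    table = pd.DataFrame({'name': names, 'dist': dists})
    ordered = table.sort_values('dist', ascending=not farthest)
    return ordered['name'].tolist()

# Toy inputs: placeholder names with made-up distances to a subscription.
names = ['a', 'b', 'c']
dists = np.array([2.0, 0.5, 7.0])

similar = rank_by_distance(names, dists)                   # similarity version
different = rank_by_distance(names, dists, farthest=True)  # "far away" version
```

Whether the second version should simply reverse the ordering, or use some other notion of "far from the subscription domain", is a design choice this sketch does not settle.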
### System
The recommender system is fairly simple; both versions should look something like this:


![Recommender Pipeline](figures/reccomender.png)

Here, the encoder and decoder simply look up vector data from subreddit names, and vice versa.
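To make the encoder/decoder idea concrete, here is a rough sketch on a toy table. The `encode` and `decode` helpers and the `toy_vecs` data are hypothetical. The sketch assumes a table with `name`, `x` and `y` columns and unique names, and the decoder picks the nearest stored point, which is only one possible way to go from a point back to a name.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the vector table (made-up names and coordinates).
toy_vecs = pd.DataFrame({
    'name': ['python', 'golang', 'hiking'],
    'x': [1.0, 1.5, -4.0],
    'y': [2.0, 2.5, 3.0],
})

def encode(names, vecs):
    """Encoder: subreddit names -> (n, 2) array; unknown names are skipped."""
    known = vecs.set_index('name')
    present = [n for n in names if n in known.index]
    return known.loc[present, ['x', 'y']].to_numpy()

def decode(points, vecs):
    """Decoder: each point -> name of the closest subreddit in the table."""
    coords = vecs[['x', 'y']].to_numpy()
    names = []
    for p in np.atleast_2d(points):
        nearest = np.linalg.norm(coords - p, axis=1).argmin()
        names.append(vecs['name'].iloc[nearest])
    return names

pts = encode(['python', 'not_a_sub', 'hiking'], toy_vecs)
round_trip = decode(pts, toy_vecs)
```

In this toy case, encoding and then decoding returns the known names and silently drops the unknown one.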

### Implementation – Similarity Version

In [123]:
# load the vector mappings
data_dir = 'data/dataset_medium'
import pandas as pd
vecs_map = pd.read_csv(data_dir + '/vecs.csv')

In [124]:
import numpy as np
import numpy.linalg as la

# map subreddit names to Cartesian points
def encode_subs(sub_names):
    points_list = []
    for name in sub_names:
        data_row = vecs_map.loc[vecs_map['name'] == name]
        if not data_row.empty:
            # keep the first matching row's (x, y) coordinates
            points_list.append(data_row.loc[:, 'x':'y'].values[0])
    # names missing from the mapping are skipped; shape is (n_found, 2)
    return np.array(points_list).reshape((len(points_list), 2))

In [125]:
# return an array of subreddit names,
# sorted by their distance to the input sub
def sort_dists(sub):
    point_mapping = encode_subs([sub])

    if len(point_mapping) == 0:
        print('subreddit does not match any in the database')
        return []

    v = point_mapping[0]
    X = vecs_map.loc[:, 'x':'y'].values
    D = X - v  # broadcasting subtracts the query point from every row

    dists = la.norm(D, axis=1)
    dists_map = pd.DataFrame({'name': vecs_map['name'], 'dist': dists})
    # the input sub has distance 0 and normally sorts first, so drop it
    return dists_map.sort_values('dist')['name'].values[1:]

sort_dists('learnprogramming')[0:5]

array(['coding', 'softwaredevelopment', 'Web_Development', 'FreeCodeCamp',
       'webdev'], dtype=object)

The similarity recommender works as expected. In a deployed version of the model, one might let a user view recommendations in the order supplied by the function, limited by user preference and easily 'skippable'.
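One way to extend this from a single subreddit to a whole subscription set is sketched below. It is an illustration rather than the project's actual method: the `recommend` function, its `limit` parameter, and the toy table are all hypothetical, and scoring each candidate by its distance to the *closest* subscription is an assumption (a centroid or an average distance would be equally plausible).

```python
import numpy as np
import pandas as pd

def recommend(subscriptions, vecs, limit=5):
    """Rank unsubscribed subreddits by distance to their closest subscription."""
    subscribed = vecs['name'].isin(subscriptions)
    sub_points = vecs.loc[subscribed, ['x', 'y']].to_numpy()
    if len(sub_points) == 0:
        return []  # none of the subscriptions are in the table
    candidates = vecs.loc[~subscribed]
    cand_points = candidates[['x', 'y']].to_numpy()
    # Pairwise differences, shape (n_candidates, n_subscriptions, 2).
    diffs = cand_points[:, None, :] - sub_points[None, :, :]
    # Distance from each candidate to its nearest subscription.
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    order = np.argsort(nearest)[:limit]
    return candidates['name'].to_numpy()[order].tolist()

# Toy table with made-up names and coordinates.
toy_vecs = pd.DataFrame({
    'name': ['python', 'golang', 'hiking', 'camping', 'cooking'],
    'x': [1.0, 1.5, -4.0, -4.5, 9.0],
    'y': [2.0, 2.5, 3.0, 4.0, -6.0],
})
picks = recommend(['python', 'hiking'], toy_vecs, limit=2)
```

Already-subscribed subreddits are excluded from the candidates, and `limit` plays the role of the user preference mentioned above.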