A Django Heroku web app that visualizes the "social graph" of a Discord server

samclane/SocialGraphWebapp

DiscordSocialGraph

https://infinite-sands-83078.herokuapp.com/

Please be patient: I'm using a free-tier Heroku account, so the app is very slow to respond initially and to load. You may have to wait around a minute.

Intro

DiscordSocialGraph is my first original Machine Learning project. Originally, I set out to create a Discord bot that would predict the next user to join the server. However, that revealed a more interesting prospect: who is the most popular person on our server? This is an inversion of the original question: which user has the most "draw" to get other users to join the server?

I started by creating an add-on to the existing Discord bot framework that I use (RedBot). I wanted to collect as little information as possible (to start simple), so all this module does is log two things:

  1. When a user joins a voice channel, it logs the user IDs of the members already in the channel.

  2. When a user mentions another user with @.

Here's what the resulting log file looks like:

timestamp,member,present
1539291278,user3,"['user8', 'user1', 'user0', 'user7', 'user4', 'user6']"
1539291514,user1,"['user5', 'user0', 'user4', 'user2', 'user7', 'user8', 'user9', 'user6']"
1539292425,user0,"['user7', 'user5', 'user3', 'user9', 'user4']"
1539293267,user3,"['user4', 'user0', 'user2', 'user7', 'user6']"
1539293442,user9,['user8']
1539293609,user3,"['user6', 'user0', 'user7', 'user4']"
1539293634,user3,"['user0', 'user6', 'user5', 'user8']"
1539293848,user9,"['user6', 'user4', 'user0', 'user7']"
1539294307,user9,"['user6', 'user1', 'user0']"
1539294408,user6,"['user1', 'user4', 'user0', 'user7', 'user5', 'user2']"
1539295361,user6,"['user0', 'user1', 'user4', 'user3', 'user9', 'user2']"

Note: User IDs have been changed to userX for readability
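As a rough illustration, a log in this format could be read back with Python's standard library. This is only a sketch, not the project's actual loader: the function name `parse_log` and the inline `SAMPLE_LOG` string are made up for the example, and it assumes the `present` column is always a Python-style list literal, as in the rows above.

```python
import ast
import csv
import io

# Two rows copied from the sample log above (hypothetical inline fixture).
SAMPLE_LOG = '''timestamp,member,present
1539291278,user3,"['user8', 'user1', 'user0', 'user7', 'user4', 'user6']"
1539293442,user9,['user8']
'''

def parse_log(text):
    """Return a list of (timestamp, member, present) tuples from CSV text."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # The 'present' column holds a list literal; literal_eval parses it
        # without executing arbitrary code.
        present = ast.literal_eval(record["present"])
        rows.append((int(record["timestamp"]), record["member"], present))
    return rows

rows = parse_log(SAMPLE_LOG)
```

Each tuple holds the join time, the user who joined, and the users who were already in the channel.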

The bot collects this information and uploads it to a remote PostgreSQL server.

Each logged event is encoded as a binary vector over all users who have interacted on the server, one-hot style, with a 1 for each user who was present. The user who joined is treated as the label. The classifier uses a probabilistic one-vs-all approach, so it returns a probability distribution over the entire user base instead of just the top answer. Applying this across the entire user base yields, for each pair of users, the one-way probability of one user interacting with the other. These probabilities become the weighted, directed edges of a graph:

Note: This graph uses the Fruchterman-Reingold layout algorithm, which tries to display the graph in a spatially meaningful way. Unfortunately, that doesn't always work.
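The encoding step described above might look something like the following. This is a minimal sketch under my own assumptions: the helper names `build_user_index` and `encode_presence` are hypothetical, and the toy rows are simplified to (member, present) pairs.

```python
def build_user_index(rows):
    """Map every user seen in the log to a fixed column position."""
    users = set()
    for member, present in rows:
        users.add(member)
        users.update(present)
    # Sorting keeps the column order stable between runs.
    return {user: i for i, user in enumerate(sorted(users))}

def encode_presence(present, index):
    """Binary vector with a 1 for each user already in the channel."""
    vec = [0] * len(index)
    for user in present:
        vec[index[user]] = 1
    return vec

# Toy (member, present) rows, made up for the example.
rows = [("user3", ["user8", "user1"]), ("user9", ["user8"])]
index = build_user_index(rows)
X = [encode_presence(present, index) for _, present in rows]  # features
y = [index[member] for member, _ in rows]                     # labels
```

Feature rows and labels shaped like `X` and `y` are what a scikit-learn classifier's `fit` expects, and `predict_proba` then returns one probability per user. Note that `SVC` only exposes `predict_proba` when constructed with `probability=True`.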

The "popularity" or "draw" of a user is the sum of the weights of all edges pointing to that user (the weighted in-degree). Currently, this correlates pretty heavily with the number of times that user appears in the dataset, but not exactly, meaning that some special relationships are being discovered. For example, the model has noticed that watersnake_test, my test account, joins exclusively when I'm already in the server (I only use it when I need to simulate another user besides myself). However, the algorithm can sometimes overfit, drawing strong bonds between my test account and other accounts I'm actually friends with. It's interesting to watch the model try to discover who's friends with whom.
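The popularity score can be sketched as a weighted in-degree sum. The edge dictionary, its weights, and the `(source, target)` direction convention below are assumptions made for illustration, not values or structures taken from the project.

```python
from collections import defaultdict

def popularity(edge_weights):
    """Sum the weights of incoming edges for each target user."""
    totals = defaultdict(float)
    for (source, target), weight in edge_weights.items():
        totals[target] += weight
    return dict(totals)

# Hypothetical directed edges: (source, target) -> predicted probability
# that `source` is drawn to `target`.
edges = {
    ("user1", "user0"): 0.5,
    ("user2", "user0"): 0.25,
    ("user0", "user1"): 0.75,
}
scores = popularity(edges)
```

In this sketch, users with no incoming edges simply do not appear in the result; a caller could treat them as having a popularity of zero.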

Three different models were used in the development of this project:

  1. Naive Bayes (sklearn.naive_bayes.GaussianNB)
  2. Support Vector Machine (sklearn.svm.SVC)
  3. Multilayer Perceptron (sklearn.neural_network.MLPClassifier)

Currently, model #3, the MLP, gives the most accurate results. The accuracy of the model is quantified by the area under the Receiver Operating Characteristic curve.

Basically, the ROC curve plots the rate of correct guesses (true positive rate) against the rate of bad guesses (false positive rate) as the classification threshold is varied.
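To make the metric concrete, here is a small pure-Python sketch of a ROC curve and its area for a single binary case (for example, one user versus everyone else). It is not the project's evaluation code, which presumably relies on scikit-learn; the function names and the toy labels and scores are invented for the example.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) points, sweeping the threshold from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every example tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy data: 1 = the user really joined, 0 = they did not.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
area = auc(roc_points(labels, scores))  # 0.75 for this toy data
```

An area of 0.5 corresponds to random guessing and 1.0 to a perfect ranking. In the toy data, three of the four positive/negative pairs are ranked correctly, which gives 0.75.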

The Webapp

Since this application uses information collected from other people, it was suggested that I put it online for all to see. The app is hosted on a free Heroku account, with a Hobby Dyno. The heavy ML lifting is done in a RedisQueue background job. The model is retrained from scratch each time the app restarts, as training doesn't take long and gives the most recent, accurate result.

Results

From what I've gathered, and from what I can interpret from the metrics returned on the test data set, the classifier is working better than random guessing, by roughly 25%. The way that popularity is calculated could use some work, as it relies heavily on the non-uniform distribution of user participation for the majority of its "accuracy". Also, some users who have rarely visited the server have really strong bonds with several other users: if a certain user happened to be in the server on every one of those visits, the model assigns a 100% bond strength between the outlier and the central user(s).
