playing with extracting knowledge from slack metadata
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Type Name Latest commit message Commit time
Failed to load latest commit information.
gh-pages @ 5f06eab
vis @ 81f143f

Build Status



Playing with extracting knowledge from slack data or metadata. For example, try to find similar channels, based on content.



virtualenv venv
source ./venv/bin/activate

pip3 install -r requirements.txt


Setup defaults

export API_TOKEN="your-api-token"

Get base information

Find all messages, for any channels that have between 10 and 1000 recent messages

python3 --token $API_TOKEN

Analyse these messages, extracting what we need for building anything with them


Find similar channels

For a single channel, find top 10 similar channels

python3 --channel some-channel-name --topn 10

Bulk Convert to channel similarities

Use analysed message content to find similarities between channels, and convert this similarity to a distance measure, only allowing channels closer than 0.8.

python3 --distance 0.8 --out content.sims.json

Visualise channel similarity

Set up vis submodule

cd vis/
git submodule init
git submodule update
cd ..

Generate visualisation format

python3 --in content.sims.json --out vis/vis.json

Display visualisation (after this step, you rerun generation step and just reload the page)

cd vis && python3 -m http.server 8000 &
open http://localhost:8000

Sharing channel similarity visualisation

If you'd like to share the visualisation, you can just share the contents of the vis/ directory.

If you want to share, but don't want to give away too much about your channel names then try

python3 --in content.sims.json --out vis/vis.json --obfuscate

This will replace parts of channel names with an internally consistent, but arbitrary, replacement e.g. "my-channel-name" becomes "mazut-duller-toffy" and "my-other-channel-name" becomes "mazut-tumefy-duller-toffy". The intent of this is to retain any insight from spotting similar-sounding channel names being related by content, without actually letting anyone know the real names of the channels.

These names are randomized so the exact replacement you see will be different. By default this will look for random replacement words in /usr/share/dict/words, but you can point it to any file you like

python3 --in content.sims.json --out vis/vis.json --obfuscate --words-file your-words-file