# Analysis of the OpenMRS #fhir Channel

### Some ideas
- [ ] Scoping by Thread
- [ ] Scoping by User
- [ ] Scoping by Timestamp




In [1]:
# Install a conda package in the current Jupyter kernel
import sys

try:
    import slacker
except ImportError:
    !conda install --yes -c conda-forge slacker

In [2]:
# Import required packages
# From: https://gist.github.com/Chandler/fb7a070f52883849de35

from slacker import Slacker
import json
import argparse
import os

In [3]:
# From: https://gist.github.com/Chandler/fb7a070f52883849de35

# This script finds all channels, private channels and direct messages
# that your user participates in, downloads the complete history for
# those conversations and writes each conversation out to separate JSON files.
#
# This user-centric history gathering is nice because the official Slack data exporter
# only exports public channels.
#
# PS, this only works if your Slack team has a paid account which allows for unlimited history.
#
# PPS, this use of the API is blessed by Slack.
# https://get.slack.help/hc/en-us/articles/204897248
# " If you want to export the contents of your own private groups and direct messages
# please see our API documentation."
#




## Registering your "App" to get an Authentication Token

This process, which uses OAuth and app registration in much the same way as SMART on FHIR, is the new way Slack would like you to interact with its API.

Read this disclaimer: https://api.slack.com/custom-integrations/legacy-tokens

### Create your App
To register a Slack App for your notebook, go here: https://api.slack.com/slack-apps

#### Select Scopes
In the `Scopes` section, select the `Access user's public channels` scope, and any additional scopes you might need (list to follow soon).

Create your app, and store the OAuth Access Token. It is the value assigned to `TOKEN` in the next cell.

In [20]:
TOKEN = '<your-api-token-from-slack-apps>'
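
To keep the token out of a shared notebook, one option is to read it from an environment variable instead of pasting it into a cell. This is only a minimal sketch: the variable name `SLACK_API_TOKEN` and the helper `load_token` are hypothetical and not part of the original notebook.

```python
import os


def load_token(env_var="SLACK_API_TOKEN"):
    """Return the token stored in env_var, or raise a clear error if it is missing.

    SLACK_API_TOKEN is a made-up name; use whatever variable you export.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError("Set the {0} environment variable first.".format(env_var))
    return token
```

With the variable exported before starting Jupyter, the cell above could become `TOKEN = load_token()`.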

## Testing out Authentication

You should see `Successfully authenticated for team OpenMRS and user <your-username>` as the output of the following cell.

In [21]:
# Get basic info about the slack channel to ensure the authentication token works
def doTestAuth(slack):
  testAuth = slack.auth.test().body
  teamName = testAuth['team']
  currentUser = testAuth['user']
  print("Successfully authenticated for team {0} and user {1} ".format(teamName, currentUser))
  return testAuth

slack = Slacker(TOKEN)
testAuth = doTestAuth(slack)

Successfully authenticated for team OpenMRS and user pmanko 


## Read in all posts on the #fhir channel

First, we can take a look at all of the public channels we have access to.

PS: The IPython `JSON` helper works in JupyterLab if you run the following in the Anaconda environment (Node v10.13.0): `jupyter labextension install @jupyterlab/geojson-extension`.  

The helper produces a nested JSON output. If, instead, you get `<IPython.core.display.JSON object>`, you can always just return the raw JSON without using the `JSON` helper.


In [22]:
from IPython.display import JSON

channels = slack.channels.list().body['channels']
JSON(channels[1:5])

<IPython.core.display.JSON object>
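
If the cell only shows `<IPython.core.display.JSON object>`, as it does here, a plain-text fallback with the standard `json` module is usually enough. The helper `pretty` and the sample data below are illustrative assumptions, not part of the original notebook.

```python
import json


def pretty(obj, max_items=None):
    """Return obj as indented JSON text, optionally truncating a list first."""
    if max_items is not None and isinstance(obj, list):
        obj = obj[:max_items]
    return json.dumps(obj, indent=2, sort_keys=True)


# Made-up sample, shaped loosely like entries of a channel list.
sample_channels = [
    {"id": "C0000000A", "name": "general"},
    {"id": "C0000000B", "name": "fhir"},
]
print(pretty(sample_channels, max_items=1))
```

In the notebook, `print(pretty(channels[1:5]))` would stand in for the `JSON(...)` call.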

Note the ID of the `fhir` channel, which in my case is `CKLPH66BB`.

In [23]:
import itertools
import json


# From https://hackersandslackers.com/extract-data-from-complex-json-python/
#   Probably overkill, but useful JSON helper function
def extract_values(obj, key):
    """Pull all values of specified key from nested JSON."""
    arr = []

    def extract(obj, arr, key):
        """Recursively search for values of key in JSON tree."""
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    results = extract(obj, arr, key)
    return results

fhirIndex = extract_values(channels, "name").index('fhir')
fhirId = extract_values(channels, "id")[fhirIndex]

fhirId


'CKLPH66BB'
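
As the comment in the cell admits, the recursive helper is probably overkill: the channel list is a flat list of dictionaries, so a direct lookup also works. The sketch below assumes each channel is a dict with `name` and `id` keys; `find_channel_id` and the sample list are hypothetical.

```python
def find_channel_id(channels, name):
    """Return the id of the first channel called `name`, or None if absent."""
    return next((c["id"] for c in channels if c.get("name") == name), None)


# Made-up sample; only the fhir ID is taken from the output above.
sample_channels = [
    {"id": "C0000000A", "name": "general"},
    {"id": "CKLPH66BB", "name": "fhir"},
]
fhir_id = find_channel_id(sample_channels, "fhir")
```

Unlike pairing two separately extracted lists by index, this keeps each name and ID together, so nested keys elsewhere in the JSON cannot shift the result.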

Now we can fetch the history of our channel. A single call returns only the most recent messages (100 by default), which we first print in a very messy way:

In [24]:
# slack.conversations.history("#fhir")
hist = slack.conversations.history(fhirId).body['messages']
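
Because one call returns a single page, collecting the whole channel means following the cursor that Slack includes in each response. The sketch below shows that loop against fake pages so it runs offline. It assumes the response body carries `messages` and `response_metadata.next_cursor`; `fetch_all_messages`, `fetch_page`, and `fake_pages` are hypothetical names.

```python
def fetch_all_messages(fetch_page, max_pages=50):
    """Collect messages across pages.

    fetch_page(cursor) must return a dict shaped like a conversations.history
    body: {"messages": [...], "response_metadata": {"next_cursor": "..."}}.
    max_pages is a safety limit so a bad cursor cannot loop forever.
    """
    messages = []
    cursor = None
    for _ in range(max_pages):
        body = fetch_page(cursor)
        messages.extend(body.get("messages", []))
        cursor = (body.get("response_metadata") or {}).get("next_cursor")
        if not cursor:  # an empty or missing cursor means this was the last page
            break
    return messages


# Fake two-page response used in place of the real API.
fake_pages = {
    None: {"messages": [{"ts": "3"}, {"ts": "2"}],
           "response_metadata": {"next_cursor": "abc"}},
    "abc": {"messages": [{"ts": "1"}],
            "response_metadata": {"next_cursor": ""}},
}
all_messages = fetch_all_messages(lambda cursor: fake_pages[cursor])
```

Against the real client this might be called as `fetch_all_messages(lambda cursor: slack.conversations.history(fhirId, cursor=cursor).body)`, assuming the installed `slacker` version accepts a `cursor` keyword; check its signature before relying on it.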

In [25]:
JSON(hist[:10])

<IPython.core.display.JSON object>

Now we can use Pandas to start playing with the dataset. First, let's transform it into a data frame and filter out the `channel_join` subtype:

In [26]:
import pandas as pd
import ast

#data = pd.read_json(print(hist))
#df = pd.DataFrame(data)


df = pd.DataFrame(hist)
df = df[df.subtype != 'channel_join']

df.head()

Unnamed: 0,attachments,bot_id,client_msg_id,edited,icons,inviter,last_read,latest_reply,name,old_name,...,root,subscribed,subtype,team,text,thread_ts,ts,type,user,username
5,,,51362F0A-D336-4AD6-B3DD-F3960CA7067D,,,,,,,,...,,,,T03U4PGDY,Because avoiding duplicate network requests is...,,1561805798.0971,message,UHV91HPGR,
6,,,CE28843B-C162-42FF-B7B7-6AD13A5D9E26,,,,,1561971201.1006,,,...,,False,,T03U4PGDY,The current OpenMRS REST APIs are so configura...,1561805491.0913,1561805491.0913,message,UHV91HPGR,
7,,,CFE2A3E5-8326-4542-BB7C-9859C44098E4,,,,,,,,...,,,,T03U4PGDY,"Like Burke pointed out, a big benefit of using...",,1561805129.088,message,UHV91HPGR,
8,,,56CE0450-1BD6-422F-9999-B4F64AF2768B,,,,,1561869546.1,,,...,,False,,T03U4PGDY,For the work being done with the microfrontend...,1561804834.0842,1561804834.0842,message,UHV91HPGR,
9,,,FA4C36D0-5C44-4B77-A104-3AE2C7E7F40B,"{'user': 'UHV91HPGR', 'ts': '1561804886.000000'}",,,1562088020.000479,1562000747.1009,,,...,,True,,T03U4PGDY,The conversations about internal vs external h...,1561804718.0822,1561804718.0822,message,UHV91HPGR,
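
For the "Scoping by Timestamp" idea, the `ts` column first has to become a real datetime. The sketch below assumes `ts` holds Unix epoch seconds, as the values above suggest. The sample frame and the cutoff are made up for illustration.

```python
import pandas as pd

# Made-up sample; the ts values are copied from the output above.
sample = pd.DataFrame({
    "ts": ["1561805798.0971", "1561805491.0913", "1561804718.0822"],
    "text": ["first", "second", "third"],
})

# ts may arrive as strings or floats; casting to float covers both cases.
sample["when"] = pd.to_datetime(sample["ts"].astype(float), unit="s", utc=True)

# Keep only messages at or after an arbitrary cutoff.
cutoff = pd.Timestamp("2019-06-29 10:55:00", tz="UTC")
recent = sample[sample["when"] >= cutoff]
```

The same two lines applied to `df` would add a `when` column that can be filtered, sorted, or resampled.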


Let's take a look at some other possibly-useful columns and their value sets:


In [27]:
df.columns

Index(['attachments', 'bot_id', 'client_msg_id', 'edited', 'icons', 'inviter',
       'last_read', 'latest_reply', 'name', 'old_name', 'pinned_info',
       'pinned_to', 'purpose', 'reactions', 'replies', 'reply_count',
       'reply_users', 'reply_users_count', 'root', 'subscribed', 'subtype',
       'team', 'text', 'thread_ts', 'ts', 'type', 'user', 'username'],
      dtype='object')
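
The `thread_ts` column is what makes the "Scoping by Thread" idea possible. A minimal sketch follows, assuming a reply carries the `ts` of its thread's root message in `thread_ts` and a message outside any thread has no `thread_ts` at all. The sample data and the `thread_key` column are hypothetical.

```python
import pandas as pd

# Made-up sample: two messages in one thread plus two standalone messages.
sample = pd.DataFrame({
    "ts": ["100.0", "101.0", "102.0", "103.0"],
    "thread_ts": ["100.0", "100.0", None, None],
    "text": ["root", "reply", "standalone a", "standalone b"],
})

# Treat a message without thread_ts as its own single-message thread.
sample["thread_key"] = sample["thread_ts"].fillna(sample["ts"])

# Number of messages per thread.
thread_sizes = sample.groupby("thread_key").size()
```

Grouping `df` by such a key gives one group per conversation, which can then be joined into a single text blob or summarized per thread.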

In [28]:
df.subtype.unique()

array([nan, 'thread_broadcast', 'channel_purpose', 'channel_name',
       'bot_message'], dtype=object)

In [29]:
df.username.unique()

array([nan, 'Polly'], dtype=object)

In [30]:
df.user.unique()

array(['UHV91HPGR', 'UJKMYB5GS', 'UHJURBNMR', 'UK9GPECLB', 'U04395ES6',
       'UGHMA5GCS', 'UJKN7QDEW', 'UKWGQRW3G', 'U055KNCB1', 'UJ87ATGH5',
       'UHTEN6LHW', 'U2933U3RN', nan], dtype=object)

Since the users in our data frame are identified by their ID, we need to get some more user information:

In [40]:
allUsers = slack.users.list().body['members']

In [43]:
userDf = pd.DataFrame(allUsers)

In [44]:
userDf.head()

Unnamed: 0,color,deleted,has_2fa,id,is_admin,is_app_user,is_bot,is_invited_user,is_owner,is_primary_owner,is_restricted,is_ultra_restricted,name,profile,real_name,team_id,tz,tz_label,tz_offset,updated
0,757575,False,,USLACKBOT,False,False,False,,False,False,False,False,slackbot,"{'title': '', 'phone': '', 'skype': '', 'real_...",Slackbot,T03U4PGDY,,Pacific Daylight Time,-25200.0,0
1,4bbe2e,False,,U03U2FS2Z,False,False,False,,False,False,False,False,ryan,"{'title': '', 'phone': '', 'skype': '', 'real_...",Ryan Yates,T03U4PGDY,America/Indiana/Indianapolis,Eastern Daylight Time,-14400.0,1510345295
2,,True,,U03U4PGEY,,False,False,,,,,,michael,"{'title': 'Worldwide Community Manager', 'phon...",,T03U4PGDY,,,,1545236742
3,e7392d,False,,U03U5TZUU,False,False,False,,False,False,False,False,elliott,"{'title': '', 'phone': '', 'skype': '', 'real_...",Elliott Williams,T03U4PGDY,America/Indiana/Indianapolis,Eastern Daylight Time,-14400.0,1510345303
4,,True,,U03U61N1B,,False,False,,,,,,r0bby,"{'title': '', 'phone': '', 'skype': '', 'real_...",,T03U4PGDY,,,,1533174676


In [46]:
userDf.columns

Index(['color', 'deleted', 'has_2fa', 'id', 'is_admin', 'is_app_user',
       'is_bot', 'is_invited_user', 'is_owner', 'is_primary_owner',
       'is_restricted', 'is_ultra_restricted', 'name', 'profile', 'real_name',
       'team_id', 'tz', 'tz_label', 'tz_offset', 'updated'],
      dtype='object')
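
With both frames available, the user IDs in the message frame can be resolved to names with a left merge, which also covers the "Scoping by User" idea. The sketch below uses made-up IDs and names. In the notebook, the frames would be `df` and `userDf`.

```python
import pandas as pd

# Made-up stand-ins for df (messages) and userDf (users).
messages = pd.DataFrame({
    "user": ["U0000001", "U0000002", None],
    "text": ["hello", "hi there", "bot post without a user"],
})
users = pd.DataFrame({
    "id": ["U0000001", "U0000002"],
    "name": ["alice", "bob"],
    "real_name": ["Alice Example", "Bob Example"],
})

# A left merge keeps every message, including ones with no matching user.
named = messages.merge(
    users[["id", "name", "real_name"]],
    how="left",
    left_on="user",
    right_on="id",
)

# Messages per user name; rows without a user are dropped by groupby.
per_user = named.groupby("name").size()
```

A left merge is used so that bot messages, which showed up as `nan` in `df.user.unique()`, stay in the result instead of being dropped silently.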