## Clinical NLP & Analysis with Elasticsearch & Kibana
In this notebook, I show how to use the Elasticsearch APIs to perform fast, scalable text analytics on a collection of posts from online users with self-diagnosed mental disorders.

In [1]:
%load_ext autoreload
%autoreload 2

### Load the data

In [2]:
import helpers.pickle_helpers as ph
import json

In [75]:
bipolar = ph.load_from_pickle("data/users/bipolar_group_clean.p")
bpd = ph.load_from_pickle("data/users/bpd_group_clean.p")
regular = ph.load_from_pickle("data/users/regular_users.p")

In [56]:
len(bipolar.group)

278

In [71]:
bipolar.group[1].head(n=5)

| Unnamed: 0 | text | polarity | emotion | emotion_2 | ambiguous | dt | date | user_id | user_type |
|---|---|---|---|---|---|---|---|---|---|
| 2014-09-30 23:26:47 | @DerekActual hehe Yeah it's definitely 1 that ... | 1 | joy | anger | True | 1.533333 | 2014-09-30 23:26:47 | 1 | bipolar |
| 2014-09-30 23:28:19 | @DestinyTheGame Omg plz bring it out for pc. | 0 | anticipation | joy | False | 4.05 | 2014-09-30 23:28:19 | 1 | bipolar |
| 2014-09-30 23:32:22 | @Redtippertruck with great pleasure. Xxx | 1 | joy | 0 | False | 1.316667 | 2014-09-30 23:32:22 | 1 | bipolar |
| 2014-09-30 23:33:41 | @TherapyAfterCSA every day. Xxx | 0 | joy | trust | False | 1.65 | 2014-09-30 23:33:41 | 1 | bipolar |
| 2014-09-30 23:35:20 | @Redtippertruck Hehe I signed it lol. Also ask... | 1 | sadness | joy | True | 7.033333 | 2014-09-30 23:35:20 | 1 | bipolar |


### Convert and Index the Data

In [72]:
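# example: convert one user's DataFrame into a list of JSON records
bipolar.group[1]["date"] = bipolar.group[1].index
bipolar.group[1]["user_id"] = 1
bipolar.group[1]["user_type"] = "bipolar"
# `index=True` is dropped: it has no effect with orient="records" and newer pandas versions reject it
bipolar.group[1].to_json(orient="records", date_format="iso", path_or_buf="data/user_json/user.json")
# read the records back, closing the file handle when done
with open("data/user_json/user.json") as f:
    converted = json.load(f)
converted[0:2]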

[{'text': "@DerekActual hehe Yeah it's definitely 1 that defies logic and explanation. Stranger things exist in heaven &amp; earth..",
  'polarity': 1,
  'emotion': 'joy',
  'emotion_2': 'anger',
  'ambiguous': True,
  'dt': 1.5333333333,
  'date': '2014-09-30T23:26:47.000Z',
  'user_id': 1,
  'user_type': 'bipolar'},
 {'text': '@DestinyTheGame Omg plz bring it out for pc.',
  'polarity': 0,
  'emotion': 'anticipation',
  'emotion_2': 'joy',
  'ambiguous': False,
  'dt': 4.05,
  'date': '2014-09-30T23:28:19.000Z',
  'user_id': 1,
  'user_type': 'bipolar'}]
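
The round trip through `data/user_json/user.json` is not strictly required. As a minimal sketch, assuming each user is a pandas DataFrame indexed by timestamp as in the sample above, the same records can be built in memory, because `to_json` returns a string when no path is given. The helper name `frame_to_records` is hypothetical and not part of the original notebook.

```python
import json

import pandas as pd


def frame_to_records(frame, user_id, user_type):
    """Return the rows of `frame` as a list of JSON-compatible dicts."""
    # assign() returns a copy, so the caller's DataFrame is left untouched
    enriched = frame.assign(date=frame.index, user_id=user_id, user_type=user_type)
    # With no path argument, to_json() returns the JSON document as a string
    return json.loads(enriched.to_json(orient="records", date_format="iso"))


# Tiny stand-in for one user's DataFrame (values copied from the sample above)
sample = pd.DataFrame(
    {"text": ["@DestinyTheGame Omg plz bring it out for pc."], "polarity": [0]},
    index=pd.to_datetime(["2014-09-30 23:28:19"]),
)
records = frame_to_records(sample, user_id=1, user_type="bipolar")
```

This avoids overwriting a shared temporary file for every user, at the cost of holding the JSON string in memory for a moment.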

In [46]:
from elasticsearch import Elasticsearch
from elasticsearch import helpers

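# Elasticsearch configuration
ELASTICSEARCH = dict(
    hostname = "http://127.0.0.1:9200",  # explicit scheme; newer clients require it
    index = "twitter",
    type = "_doc"  # mapping types are deprecated in Elasticsearch 7 and removed in 8
)

es = Elasticsearch(ELASTICSEARCH['hostname'])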
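
Elasticsearch infers field types dynamically when documents arrive in an index that does not exist yet. If explicit types are preferred, for example so that `emotion` and `user_type` are exact-match `keyword` fields for Kibana aggregations, a mapping along these lines could be created before indexing. This is only a sketch based on the fields visible in the sample records. The name `TWITTER_MAPPING` is hypothetical, and the way a mapping is passed to the index-creation call depends on the client version, so that call is not shown.

```python
# Hypothetical explicit mapping for the fields seen in the sample records.
TWITTER_MAPPING = {
    "properties": {
        "text": {"type": "text"},          # analyzed for full-text search
        "polarity": {"type": "integer"},
        "emotion": {"type": "keyword"},    # exact values, suited to aggregations
        "emotion_2": {"type": "keyword"},
        "ambiguous": {"type": "boolean"},
        "dt": {"type": "float"},
        "date": {"type": "date"},          # ISO 8601 strings as written by to_json
        "user_id": {"type": "integer"},
        "user_type": {"type": "keyword"},
    }
}
```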

In [47]:
# wrap each document in the action format expected by helpers.bulk
def convert_to_es_format(docs):
    return [
        {
            "_index": ELASTICSEARCH['index'],
            "_type": ELASTICSEARCH['type'],
            "_source": doc
        }
        for doc in docs
    ]

In [31]:
# index the two sample documents into Elasticsearch
helpers.bulk(es, convert_to_es_format(converted[0:2]))

(2, [])
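
The actions above carry no `_id`, so Elasticsearch generates one for every document, and indexing the same records twice stores them twice. If re-running a cell should overwrite rather than duplicate, one option is to derive the `_id` from the document itself. The sketch below assumes that `user_type`, `user_id`, `date`, and `text` together identify a record; `doc_id` and `with_ids` are hypothetical helpers, not part of the original notebook, and `_type` is omitted on the assumption that the cluster no longer requires it.

```python
import hashlib
import json


def doc_id(doc):
    """Build a stable identifier from the fields assumed to identify a record."""
    key = json.dumps(
        [doc["user_type"], doc["user_id"], doc["date"], doc["text"]],
        ensure_ascii=False,
    )
    return hashlib.sha1(key.encode("utf-8")).hexdigest()


def with_ids(docs, index_name="twitter"):
    """Yield bulk actions whose `_id` is derived from the document content."""
    for doc in docs:
        yield {"_index": index_name, "_id": doc_id(doc), "_source": doc}


example = {
    "text": "@DestinyTheGame Omg plz bring it out for pc.",
    "date": "2014-09-30T23:28:19.000Z",
    "user_id": 1,
    "user_type": "bipolar",
}
# The same record submitted twice maps to the same `_id`
actions = list(with_ids([example, dict(example)]))
```

The generator could then be passed to `helpers.bulk` in place of `convert_to_es_format(converted)`.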

### Elasticsearch Bulk Index

In [73]:
# index every user in a group into Elasticsearch
def insert_users(users, utype):
    print("Indexing users...")
    total = len(users.group)
    for i in range(total):
        users.group[i]["date"] = users.group[i].index
        users.group[i]["user_id"] = i
        users.group[i]["user_type"] = utype
        # `index=True` is dropped: it has no effect with orient="records" and newer pandas versions reject it
        users.group[i].to_json(orient="records", date_format="iso", path_or_buf="data/user_json/user.json")
        with open("data/user_json/user.json") as f:
            converted = json.load(f)
        helpers.bulk(es, convert_to_es_format(converted))
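
`insert_users` issues one `helpers.bulk` call per user. Since `helpers.bulk` accepts any iterable of actions and splits it into chunks itself, the actions for a whole group can instead be produced lazily and sent in a single call. A minimal sketch, assuming the per-user records have already been converted to lists of dicts; `iter_group_actions` is a hypothetical name, and `_type` is omitted on the assumption that the cluster no longer requires it.

```python
def iter_group_actions(users_records, utype, index_name="twitter"):
    """Lazily yield one bulk action per record across all users of a group."""
    for user_id, records in enumerate(users_records):
        for record in records:
            # Copy the record so the caller's data is not modified
            source = dict(record, user_id=user_id, user_type=utype)
            yield {"_index": index_name, "_source": source}


# Two hypothetical users with one and two records respectively
group = [
    [{"text": "first"}],
    [{"text": "second"}, {"text": "third"}],
]
actions = list(iter_group_actions(group, "bipolar"))
```

With a live client, the generator would be passed directly, as in `helpers.bulk(es, iter_group_actions(group, "bipolar"))`, without materializing the list first.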

In [76]:
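%%time
# `%%time` (cell magic) times the whole cell; the original `%time` line magic timed an empty statement,
# which is why the output recorded below reports only microseconds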
# insert the bipolar users
insert_users(bipolar, "bipolar")
# insert the normal users
insert_users(regular, "normal")
# insert the bpd users
insert_users(bpd, "bpd")

CPU times: user 4 µs, sys: 1 µs, total: 5 µs
Wall time: 9.06 µs
Indexing users...
Indexing users...
Indexing users...
