# 📊 Prayer Writing Assistant


I use this notebook when I am writing prayers. Here I analyze a corpus of "special intentions" (statements of need, from Twitter) to target common or paradigmatic needs, so that I can compose "high-impact" prayers that increase total coverage. The notebook does three things:
- measures the total "coverage" of my prayers (how many special intentions are semantically close enough to the nearest prayer that I have written)
- clusters special intentions to target frequently occurring ones
- looks at frequent token sequences in special intentions, also to target frequently occurring ones
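
As a rough illustration of the coverage idea, here is a minimal sketch that assumes intentions and prayers are both represented as embedding vectors and that closeness is measured by cosine similarity. The function name `coverage`, the toy vectors, and the default threshold are assumptions made for this example; the notebook itself relies on the score returned by `pray.pray`.

```python
import numpy as np


def coverage(intention_vecs, prayer_vecs, threshold=0.7):
    """Fraction of intentions whose best cosine similarity to any prayer
    is at or above `threshold` (hypothetical helper, not part of pray.py)."""
    # Normalize rows to unit length so the dot product is cosine similarity.
    a = intention_vecs / np.linalg.norm(intention_vecs, axis=1, keepdims=True)
    b = prayer_vecs / np.linalg.norm(prayer_vecs, axis=1, keepdims=True)
    # For each intention, keep only the similarity to its nearest prayer.
    best = (a @ b.T).max(axis=1)
    return float((best >= threshold).mean())


# Toy 2-D vectors standing in for sentence embeddings.
intentions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
prayers = np.array([[1.0, 0.1], [0.9, 1.0]])
toy_coverage = coverage(intentions, prayers)
```

Three of the four toy intentions have a prayer close enough, so the last one is the gap a new prayer would need to fill.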

***

In [1]:
import pray

****Loading sbert model
****Loaded
****Loading word2vec model: shrunkenvectors_200000.bin
****Loaded


In [2]:
pray.get_closest_match_sbert("I need some bread.")

{'need': 'to get some food', 'score': 0.5887243747711182}

In [3]:
pray.get_closest_match_sbert("I need a hug.")

{'need': 'a hug', 'score': 0.9902426600456238}

## Special Intentions


In [4]:
special_intention_file = "special_intentions_large.json"

In [5]:
pray.homogenize("i just need a hat")

'I need a hat'

In [6]:
import json
with open(special_intention_file,'r') as f:
    special_intentions = json.load(f)
    
if isinstance(special_intentions[0], list):
    special_intentions = [si for userid, si in special_intentions]

special_intentions_for_processing = [pray.remove_i_need(si).lower() for si in special_intentions]
special_intentions = [pray.homogenize(si) for si in special_intentions]
special_intentions = [si for si in special_intentions if len(si)>0]
special_intentions[:3]

['I need you',
 'I need to quit playing and get to on top of this lol it could really be something',
 'I need to cry for three good days.']

In [7]:
special_intentions_for_processing[:3]

['you',
 'to quit playing and get to on top of this lol it could really be something',
 'to cry for three good days.']

In [8]:
import random
random.seed(1)

random.shuffle(special_intentions)
len(special_intentions)

6886

In [9]:
random.sample(special_intentions,10)

['I need my own spa, massage wud b luvvv',
 'I need friends fuuuuuuuckkkkkkkkkk',
 'I need to watch “crooklyn”',
 'I need to deep dive into the Saturn in Aquarius bc things been quite different for me this week.',
 'I need to find me a good pair of gloves!',
 'I need y’all to play temptation by Santana it is sending me with the pop smoke vibe fr 😂😂😭😭😭',
 'I need a mid-tempo playlist for nights like this.',
 'I need to focus on @FocusFactory1  more.',
 'I need a girl who into cars like me',
 'I need a drink or 15']

## Analyze Success of Current Patterns

In [10]:
pray.pray("I need a friend.")

{'need': 'a friend',
 'score': 0.9883623719215393,
 'banned': False,
 'prayers': {0: 'May you find a pure bondsman<br>behold---this bondsman pure as a frankincense',
  1: 'Yea, for you are leprous of bidding to yourself',
  2: 'May it be so, though it is difficult to give bondsmen<br>as the ground shall give her increase, and the Heavens shall give their dew ; and i will cause the remnant of this people to possess all these things.',
  3: 'You go to the jubile wearing your smile and woolen<br>as which way went the Spirit of the Lord from me to speak unto thee',
  4: 'Who will speak your surname and visit your father?<br>as for your hands are defiled with blood, and your fingers with iniquity ; your lips have spoken lies',
  5: 'May you not become terrible before them, filling them with provoke<br> filling with food and gladness. (Selah.)',
  6: 'Await the weak companionship of the bondswoman<br>behold---this companionship weak as hands',
  7: ',a bondswoman to call bondswoman<br>as and

In [11]:
%%time
scores_for_all = [pray.pray(s)['score'] for s in special_intentions[:300]]

CPU times: user 1min 58s, sys: 1.69 s, total: 2min
Wall time: 2min 9s


`threshold` (below) should match the value of the `threshold` argument of `pray_with_simplification` in `pray.py`.
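
One way to avoid the two values drifting apart is to read the default straight from the function signature. The sketch below assumes that `pray_with_simplification` declares `threshold` as a keyword argument with a default value; the stand-in function defined here is hypothetical and only mimics that shape, and `default_threshold` is a name invented for this example.

```python
import inspect


def pray_with_simplification(need, threshold=0.7):
    """Hypothetical stand-in for the function in pray.py (assumed signature)."""
    return {"need": need, "score": 0.0}


def default_threshold(func, name="threshold"):
    """Return the default value of keyword argument `name` on `func`."""
    param = inspect.signature(func).parameters[name]
    if param.default is inspect.Parameter.empty:
        raise ValueError("%s has no default for %r" % (func.__name__, name))
    return param.default


# In the notebook this would be default_threshold(pray.pray_with_simplification),
# provided the real function has the assumed signature.
threshold = default_threshold(pray_with_simplification)
```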

In [12]:
threshold = .7
high_enough_scores = [s for s in scores_for_all if s>=threshold]

Fraction of scores at or above the threshold.

In [13]:
len(high_enough_scores)/len(scores_for_all)

0.15333333333333332

Average score.

In [14]:
sum(scores_for_all)/len(scores_for_all)

0.5321975326538086

## Order According to Similarity

In [15]:
corpus = special_intentions[:1000]
distance_threshold = 2

In [16]:
def sample_and_judge_cluster(cluster,n=7):
    if len(cluster)>n:
        cluster = random.sample(cluster,n)
    is_it_valid = [pray.pray(si)['score'] >= threshold for si in cluster]
    symbols = ["✅" if evaluation else "❌" for evaluation in is_it_valid]
    return ["%s:%s" % (sym,si) for sym,si in zip(symbols,cluster)]
        

In [17]:
import math

In [18]:
import numpy as np
from sklearn.cluster import AgglomerativeClustering
## this is from https://github.com/UKPLab/sentence-transformers/blob/master/examples/applications/clustering/agglomerative.py
corpus_embeddings = pray.model.encode(corpus)

# Normalize the embeddings to unit length
corpus_embeddings = corpus_embeddings /  np.linalg.norm(corpus_embeddings, axis=1, keepdims=True)

# Perform agglomerative clustering
clustering_model = AgglomerativeClustering(n_clusters=None, distance_threshold=distance_threshold) #, affinity='cosine', linkage='average', distance_threshold=0.4)
clustering_model.fit(corpus_embeddings)
cluster_assignment = clustering_model.labels_

clustered_sentences = {}
for sentence_id, cluster_id in enumerate(cluster_assignment):
    if cluster_id not in clustered_sentences:
        clustered_sentences[cluster_id] = []

    clustered_sentences[cluster_id].append(corpus[sentence_id])

for i, cluster in clustered_sentences.items():
    print("Cluster ", i+1)
    #print(cluster)
    print("⬛"*math.ceil(len(cluster)/3))## bar representing how big cluster is
    pairs = sample_and_judge_cluster(cluster)
    for p in pairs:
        print(p)
#     for symbol,si in pairs:
#         print "%s"
    print("")

Cluster  17
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
❌:I need a GPU.
❌:I need the sensationalist mask police to look at Calipari
❌:I need to give samurai champloo a rewatch
❌:I need a freaking clip of skz dionysus cover 😭😭
❌:I need to watch the big lebowski for power
❌:I need someone to make a fanart of magikarp kaido now
❌:I need 3 DOLLARS to renew my Netflix account anyone wanna Apple Pay 😭😭

Cluster  56
⬛⬛⬛⬛⬛
❌:I need a full body massage ☹️
❌:I need a good massage rn 💅🏽
❌:I need a massage or spa day
❌:I need a massage
❌:I need a massage ASAP, my body is done
❌:I need a massage
❌:I need a massage and a nap

Cluster  16
⬛⬛⬛⬛⬛⬛⬛
❌:I need ONE follower
please please it's the funny number
❌:I need 10/13 questions right to bump my grade up by some points pls pls
❌:I need like 10 pieces of beef bacon 🥴
❌:I need a quick lick 😭
❌:I need a shell
❌:I need a hassle free bip.
❌:I need the nocta sweatsuit

Cluster  13
⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛⬛
❌:I need Kim Seonho in Running Man 😫
❌:I need somebody 💯 in my corner
❌:I need attention 🥺
❌:I ne

❌:I need to be alone for a while untill 
i feel myself again
❌:I need some time to myself
❌:I need to get my own place ASAP
❌:I need to focus on myself !
❌:I need time for myself this time
❌:I need to take care of myself.
❌:I need somebody to call my own

Cluster  31
⬛⬛⬛⬛
❌:I need an oomf thatll defend me like a lover
❌:I need a better cat camera
❌:I need me a lil girlfriend ✨
❌:I need an alice in borderland layout i’m obsessed
❌:I need a super cute cat, that's it.
❌:I need a lil love n affection
❌:I need a lil romance in my life

Cluster  5
⬛⬛⬛⬛⬛⬛⬛⬛
❌:I need a everfresh cranberry juice and some woods
❌:I need shots of tequillaaaa ok thanks
❌:I need a wine night with the girls ☹️
❌:I need a big ass cup of wine thanks
❌:I need to make some ginger &amp; lemon tea, that’ll put me back together.
❌:I need stuffed &amp; regular salmon and chicken, scallops, lamb chops &amp; broccoli &amp; broccolini too
❌:I need to cop and no plug wanna be up

Cluster  46
⬛⬛⬛⬛
✅:I need a break from everythin

Here I rank special intentions by their semantic proximity to `n` other special intentions. I then test whether I am currently praying for them. If not, I should write a prayer to cover that special intention.

(This was fast when I used the [`sent2vec`](https://github.com/epfml/sent2vec) library with (huge) pretrained vectors.  It is *very* slow with [`sbert`](https://github.com/UKPLab/sentence-transformers).)
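
The ranking step is not shown in this notebook, so here is a minimal sketch of one way to do it, assuming the embeddings are already computed and that "proximity to `n` others" means the mean cosine similarity to the `n` nearest neighbours. The function name `rank_by_neighbourhood` and the toy vectors are assumptions made for this example.

```python
import numpy as np


def rank_by_neighbourhood(embeddings, n=2):
    """Rank rows by mean cosine similarity to their n nearest other rows
    (hypothetical helper; n must be smaller than the number of rows)."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit.T
    # Exclude each sentence's similarity to itself.
    np.fill_diagonal(sims, -np.inf)
    # Sort each row ascending and keep the n largest similarities.
    top_n = np.sort(sims, axis=1)[:, -n:]
    density = top_n.mean(axis=1)
    # Indices of the rows, densest neighbourhood first.
    order = np.argsort(-density)
    return order, density


# Three similar toy vectors and one outlier.
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]])
order, density = rank_by_neighbourhood(toy, n=2)
```

Intentions at the front of `order` sit in the most crowded regions, so an uncovered one there is the best candidate for a new prayer. The full similarity matrix grows quadratically with corpus size, which is acceptable for a 1,000-sentence sample but not for much larger ones.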

## Get Top for Various Start Grams

An even more straightforward way of figuring out which special intentions I should pray for next in order to have the most impact: counting the first `n` tokens of each special intention (the leading sequence only, *not* all overlapping n-grams). In other words,

>"a new job in a new town"

is `["a","new"]` when `n = 2`, `["a","new","job"]` when `n = 3`, etc.
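
A minimal sketch of that definition, using plain whitespace splitting as a stand-in for the NLTK tokenizer used in the cells below; `first_n_tokens` is a name invented for this example.

```python
def first_n_tokens(text, n):
    """Return the first n whitespace-separated tokens of `text`,
    or None if the text has fewer than n tokens (hypothetical helper)."""
    tokens = text.split()
    return tokens[:n] if len(tokens) >= n else None


example = "a new job in a new town"
start_2 = first_n_tokens(example, 2)
start_3 = first_n_tokens(example, 3)
too_short = first_n_tokens("a hug", 3)
```

Intentions shorter than `n` tokens are dropped rather than padded, which matches the length filter applied to `first_n` below.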

In [19]:
from nltk import tokenize,FreqDist

In [20]:
si_tokens = [tokenize.word_tokenize(i) for i in special_intentions_for_processing]
si_tokens[0]

['you']

In [21]:
n = 6

In [22]:
first_n = [tuple(si[:n]) for si in si_tokens]
first_n = [si for si in first_n if len(si)==n]
first_n[:2]

[('to', 'quit', 'playing', 'and', 'get', 'to'),
 ('to', 'cry', 'for', 'three', 'good', 'days')]

In [23]:
FreqDist(first_n).most_common(20)

[(('someone', 'to', 'stay', 'and', 'never', 'leave'), 8),
 (('to', 'get', 'it', 'off', 'my', 'chest'), 4),
 (('somebody', 'who', 'can', 'love', 'me', 'at'), 3),
 (('somebody', 'to', 'love', 'i-i', 'do', "n't"), 3),
 (('to', 'lower', 'my', 'standards', ',', 'not'), 3),
 (('a', 'lover', 'and', 'a', 'friend', 'to'), 3),
 (('to', 'learn', 'how', 'to', 'say', 'no'), 3),
 (('to', 'get', 'away', 'for', 'a', 'bit'), 3),
 (('enough', 'of', 'you', 'to', 'dull', 'the'), 3),
 (('a', 'day', 'in', 'between', 'saturday', 'and'), 3),
 (('a', 'girlfriend', '..', 'she', 'can', 'feed'), 2),
 (('to', 'find', 'something', 'to', 'do', 'with'), 2),
 (('a', 'really', 'good', 'soft', 'massage', 'mostly'), 2),
 (('all', 'your', 'kisses', 'and', 'a', 'nap'), 2),
 (('a', 'girl', 'that', '’', 's', 'gon'), 2),
 (('that', 'one', 'person', 'who', 'will', 'stand'), 2),
 (('to', 'keep', 'my', 'chin', 'up', ','), 2),
 (('to', 'take', 'better', 'care', 'of', 'myself'), 2),
 (('to', 'stop', 'waking', 'up', 'at', '2'), 2),

***