# Instructions

Use the `prodigyEnv` conda environment for this notebook.

To set up the Prodigy environment, download the wheel file from the email you receive after purchasing a Prodigy license.

Then run `pip install ./prodigy*.whl` from the directory containing the wheel.

Instructions: https://prodi.gy/docs/install

Database is stored at /

<br><br>

# Imports

In [1]:
from collections import defaultdict
import random

import pandas as pd
from prodigy.components.db import connect

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks', font_scale=1.2)

In [2]:
def sort_by_mean(df, by, column, rot=0):
    # use a dict comprehension to build a new dataframe from the groupby object;
    # each group name becomes a column in the new dataframe
    df2 = pd.DataFrame({col: vals[column] for col, vals in df.groupby(by)})
    # compute the mean of each group and sort the groups by that mean
    means = df2.mean().sort_values()
    # to plot the groups in sorted order instead, return the axes
    # so changes can be made outside the function:
#     return df2[means.index].boxplot(rot=rot, return_type="axes")
    return means

<br><br>

# Connect to database

In [3]:
db = connect()

db.datasets # This lists all of the datasets in your Prodigy database

['book-graph',
 'book-graph-ethics',
 'book-graph-structure',
 'book-graph-humor-romance',
 'birth-control',
 'birth-control-webmd',
 'birth-control-reddit',
 'birth-control-all-labels-reddit',
 'birth-control-all-labels-twitter',
 'birth-control-all-labels-webmd',
 'birth-control-reddit-discourse',
 'birth-control-all-labels-reddit-comments',
 'birth-control-all-labels-twitter-posts',
 'birth-control-all-labels-twitter-replies',
 'birth-control-all-labels-reddit-extra']

In [5]:
# db.drop_dataset('birth-control-all-labels-twitter')  # Only do this if you want to delete all your annotations!!!!!!!!!!!

<br><br>

# Explore REDDIT comments

In [13]:
examples = db.get_dataset('birth-control-all-labels-reddit-comments')

print(len(examples))

500


In [14]:
label_count_dict = defaultdict(int)
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        label_texts_dict[_label].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('[out of 4,395 total]')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
[out of 4,395 total]
------------------------------------------------------

270 	 PROVIDING INFORMATIONAL SUPPORT
243 	 PROVIDING EXPERIENCES
64 	 PROVIDING EMOTIONAL SUPPORT
40 	 DISCOURSE
8 	 SEEKING EMOTIONAL SUPPORT
7 	 SEEKING INFORMATIONAL SUPPORT
7 	 SEEKING EXPERIENCES


In [15]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', _label)

------------------------------
percent of posts with label
------------------------------

54.0% 	 PROVIDING INFORMATIONAL SUPPORT
48.6% 	 PROVIDING EXPERIENCES
12.8% 	 PROVIDING EMOTIONAL SUPPORT
8.0% 	 DISCOURSE
1.6% 	 SEEKING EMOTIONAL SUPPORT
1.4% 	 SEEKING INFORMATIONAL SUPPORT
1.4% 	 SEEKING EXPERIENCES
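
The count and percent cells above are repeated nearly verbatim for each dataset below. A minimal sketch of a shared helper follows, assuming each example is a dict whose `accept` key holds the list of selected labels, as in the printed examples. The name `summarize_labels` is hypothetical and not part of Prodigy, and skipping examples without an `accept` key is an assumption (the cells above index `e['accept']` directly).

```python
from collections import Counter


def summarize_labels(examples):
    """Return (label, count, percent) tuples, most frequent label first.

    Hypothetical helper: percent is the share of examples carrying the
    label, so the percents can sum to more than 100 for multi-label data.
    """
    counts = Counter()
    for example in examples:
        # Assumption: an example with no 'accept' key contributes no labels.
        counts.update(example.get('accept', []))
    total = len(examples)
    if total == 0:
        return []
    return [(label, count, round(100 * count / total, 1))
            for label, count in counts.most_common()]


# Toy data for illustration only, not taken from the database.
toy_examples = [
    {'text': 'a', 'accept': ['DISCOURSE']},
    {'text': 'b', 'accept': ['DISCOURSE', 'PROVIDING EXPERIENCES']},
    {'text': 'c'},
    {'text': 'd', 'accept': []},
]
for label, count, percent in summarize_labels(toy_examples):
    print(count, '\t', str(percent) + '%', '\t', label)
```

Calling `summarize_labels(db.get_dataset(name))` for each dataset name would then replace the per-section loops, at the cost of no longer collecting the label texts.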


In [16]:
for _label, _texts in label_texts_dict.items():
    if _label == 'DISCOURSE':
        print('===========================================')
        print(_label)
        print('===========================================\n')
        for e in _texts:
            print(e)
            print('\n-------------------------------------------\n')
        print()

DISCOURSE

Holy fucking smokes. Now I appreciate the doctor who dangled the old IUD that she removed before my eyes way more.

-------------------------------------------

I would think because it sounds like she’s already made up her mind about keeping the babies- so it’s kind of like someone asking “are you sure?” to someone to has made up their mind about an abortion. 

However, I completely agree with this post and agree that these are all extremely valid points. Everyone jumps to congratulate someone being pregnant even though it is very likely not a great situation. I just hope OP has a support system.

-------------------------------------------

Do you say that to patients who ask questions that have been asked millions of times too?

-------------------------------------------

Haha yeah, that's the point you switched to once your statement that "perfect use for pills is not 99.7% effective" was proven wrong. 

-------------------------------------------

Men’s problems (inclu

<br><br>

# Explore REDDIT posts

In [17]:
examples = db.get_dataset('birth-control-all-labels-reddit')

print(len(examples))

500


In [18]:
label_count_dict = defaultdict(int)
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        label_texts_dict[_label].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('[out of 4,395 total]')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
[out of 4,395 total]
------------------------------------------------------

355 	 SEEKING INFORMATIONAL SUPPORT
170 	 SEEKING EXPERIENCES
123 	 SEEKING EMOTIONAL SUPPORT
32 	 PROVIDING EXPERIENCES
5 	 PROVIDING INFORMATIONAL SUPPORT
4 	 PROVIDING EMOTIONAL SUPPORT
4 	 DISCOURSE


In [19]:
print(examples[0])

{'text': '[TITLE: starting/getting birth control online with online health center?] \n\nNew to all things birth control. Well, except condoms. Which my bf and I have been using up until this point. However, we both would like to ditch them or at least go without them once in awhile so I\'m looking into birth control and ways to obtain it.  My preferred choice would be those ones implanted in the arm that I\'ve read about, but due to expenses and being a poor college student at the moment... the pill seems as though the better solution to go for now. \n\nSo being that I\'ll be at college without a mode of transportation and just overall being a socially awkward person that doesn\'t like to go to new places and talk to strangers all that often... I was quite intrigued in finding out Planned Parenthood has an Online Health Center and I was wondering if anyone here uses this or can give me any and all information about it? \n\n-If one has never gone in for an appt. before and has never bee

In [20]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', _label)

------------------------------
percent of posts with label
------------------------------

71.0% 	 SEEKING INFORMATIONAL SUPPORT
34.0% 	 SEEKING EXPERIENCES
24.6% 	 SEEKING EMOTIONAL SUPPORT
6.4% 	 PROVIDING EXPERIENCES
1.0% 	 PROVIDING INFORMATIONAL SUPPORT
0.8% 	 PROVIDING EMOTIONAL SUPPORT
0.8% 	 DISCOURSE


In [21]:
for _label, _texts in label_texts_dict.items():
    if _label == 'SEEKING EMOTIONAL SUPPORT':
        print('===========================================')
        print(_label)
        print('===========================================\n')
        for e in _texts:
            print(e)
            print('\n-------------------------------------------\n')
        print()

SEEKING EMOTIONAL SUPPORT

[TITLE: took pill for 26 days - am i safe to start my placebo week?] 

Hey everybody! I'll try to make to make this very short -

I'm in a LDR, I recently visited my lovely partner and the last time we've had unprotected intercourse was the night of the 21st to the 22nd of July. I should've already been on my placebo week then, but didn't want to deal with my mood swings while I was with him.

Will I be protected if I take my 7 days of placebo week again starting on Saturday? This might be a no-brainer, sorry, but I just wanted to make sure! After that period, I'll start a new full blister and keep the started one for backup.

Many thanks!

-------------------------------------------

[TITLE: nervous about starting birth control for the first time (lo loestrin fe)] 

I am supposed to start taking Lo Loestrin Fe today but I am really nervous about the side effects. I have never taken any birth control before so I don’t have anything to compare it to. I felt go

<br><br>

# Explore TWITTER posts

In [22]:
examples = db.get_dataset('birth-control-all-labels-twitter-posts')

print(len(examples))

460


In [23]:
label_total_count_dict = defaultdict(int)
for e in examples:
    for _label in e['accept']:
        label_total_count_dict[_label] += 1

print('------------------------------------------------------')
print('total number of tweets labeled')
print('[out of 4,395 total]')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_total_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of tweets labeled
[out of 4,395 total]
------------------------------------------------------

205 	 DISCOURSE
131 	 PROVIDING INFORMATIONAL SUPPORT
101 	 PROVIDING EXPERIENCES
39 	 SEEKING EMOTIONAL SUPPORT
21 	 SEEKING EXPERIENCES
21 	 SEEKING INFORMATIONAL SUPPORT
1 	 PROVIDING EMOTIONAL SUPPORT


In [24]:
print(examples[0])

{'text': "President Obama &amp; HHS will fine all participants in Obamacare who won't pay for Oral Contraception $36,000.00 per year per person. #TYRANNY", 'meta': {'ID': 504048215810113540, 'Source': 'twitter-posts'}, '_input_hash': -1270619619, '_task_hash': -1560586231, 'options': [{'id': 'SEEKING INFORMATIONAL SUPPORT', 'text': 'SEEKING INFORMATIONAL SUPPORT'}, {'id': 'PROVIDING INFORMATIONAL SUPPORT', 'text': 'PROVIDING INFORMATIONAL SUPPORT'}, {'id': 'SEEKING EMOTIONAL SUPPORT', 'text': 'SEEKING EMOTIONAL SUPPORT'}, {'id': 'PROVIDING EMOTIONAL SUPPORT', 'text': 'PROVIDING EMOTIONAL SUPPORT'}, {'id': 'SEEKING EXPERIENCES', 'text': 'SEEKING EXPERIENCES'}, {'id': 'PROVIDING EXPERIENCES', 'text': 'PROVIDING EXPERIENCES'}, {'id': 'DISCOURSE', 'text': 'DISCOURSE'}], '_session_id': None, '_view_id': 'choice', 'accept': ['DISCOURSE'], 'answer': 'accept'}


In [25]:
label_positive_percent_dict = {_label: _positive_count/float(len(examples)) for _label, _positive_count in label_total_count_dict.items()}

print('------------------------------')
print('percent of reviews with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_positive_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', _label)

------------------------------
percent of reviews with label
------------------------------

44.6% 	 DISCOURSE
28.5% 	 PROVIDING INFORMATIONAL SUPPORT
22.0% 	 PROVIDING EXPERIENCES
8.5% 	 SEEKING EMOTIONAL SUPPORT
4.6% 	 SEEKING EXPERIENCES
4.6% 	 SEEKING INFORMATIONAL SUPPORT
0.2% 	 PROVIDING EMOTIONAL SUPPORT


In [26]:
label_examples_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_examples_dict[_label].append(e['text'])

for _label, _examples in label_examples_dict.items():
    if _label == 'SEEKING EMOTIONAL SUPPORT':
        print('===========================================')
        print(_label)
        print('===========================================\n')
        for e in _examples:
            print(e)
            print('-------------------------------------------')
        print()

SEEKING EMOTIONAL SUPPORT

I want the IUD so bad but I’m scared 🥺🙄
-------------------------------------------
i cannot believe there’s a non-hormonal oral contraceptive that’s been on the market for thirty YEARS but it’s only available in india. this world is a sick joke
-------------------------------------------
Thank you Jesus no more implanon !! Bck to basic birth control !!*praise the lord I'm free*
-------------------------------------------
I'm at the lady doctor... Recommend some BC for me... I lowkey want the IUD or Mirena but, people scare me with their horror stories
-------------------------------------------
I'm so nervous to get this nexplanon put it 😰
-------------------------------------------
Ok so I’m having really bad cramps Rn and i JUST got off my period. If having the iud is just constant cramps idk if i can do this shit
-------------------------------------------
Finally able to get this nexplanon removed as of tomorrow!!!
---------------------------------------

<br><br>

# Explore Twitter REPLIES

In [19]:
examples = db.get_dataset('birth-control-all-labels-twitter-replies')

print(len(examples))

500


In [20]:
label_total_count_dict = defaultdict(int)
for e in examples:
    for _label in e['accept']:
        label_total_count_dict[_label] += 1

print('------------------------------------------------------')
print('total number of tweets labeled')
print('[out of 4,395 total]')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_total_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of tweets labeled
[out of 4,395 total]
------------------------------------------------------

246 	 PROVIDING EXPERIENCES
163 	 DISCOURSE
124 	 PROVIDING INFORMATIONAL SUPPORT
19 	 SEEKING EMOTIONAL SUPPORT
16 	 SEEKING INFORMATIONAL SUPPORT
15 	 PROVIDING EMOTIONAL SUPPORT
11 	 SEEKING EXPERIENCES


In [21]:
label_positive_percent_dict = {_label: _positive_count/float(len(examples)) for _label, _positive_count in label_total_count_dict.items()}

print('------------------------------')
print('percent of reviews with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_positive_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', _label)

------------------------------
percent of reviews with label
------------------------------

49.2% 	 PROVIDING EXPERIENCES
32.6% 	 DISCOURSE
24.8% 	 PROVIDING INFORMATIONAL SUPPORT
3.8% 	 SEEKING EMOTIONAL SUPPORT
3.2% 	 SEEKING INFORMATIONAL SUPPORT
3.0% 	 PROVIDING EMOTIONAL SUPPORT
2.2% 	 SEEKING EXPERIENCES


In [23]:
label_examples_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_examples_dict[_label].append(e['text'])

for _label, _examples in label_examples_dict.items():
    if _label == 'SEEKING INFORMATIONAL SUPPORT':
        print('===========================================')
        print(_label)
        print('===========================================\n')
        for e in _examples:
            print(e)
            print('-------------------------------------------')
        print()

SEEKING INFORMATIONAL SUPPORT

@iAmMissKarma whats IUD?
-------------------------------------------
@fanxybean WTF IS A KYLEENA?!!?! https://t.co/zMSLGkNk0o
-------------------------------------------
@amberj__ I’ve heard of the IUD what’s the bar? The thing that goes into your arm or something?
-------------------------------------------
@LauraHolland The pill, as in the contraceptive pill?
-------------------------------------------
@linkHalfway Which was a condom or birth control pill right?
-------------------------------------------
@bcphoto can you not wear a cup with an iud?
-------------------------------------------
@SafeeraHussainy I got the emergency contraceptive pill recently from @ChemistWhouse. The pharmacist said it was a legal requirement that I fill out/sign the PSA checklist. I called later to query this and the pharmacist said PSA guidelines are still up-to-date. Is this true?
-------------------------------------------
@Olwe2Lesh How often should I be checking on m

<br><br>

# Backup labeling into a CSV

In [4]:
reddit_post_examples = db.get_dataset('birth-control-all-labels-reddit')
reddit_comment_examples = db.get_dataset('birth-control-all-labels-reddit-comments')
reddit_extra_examples = db.get_dataset('birth-control-all-labels-reddit-extra')
twitter_post_examples = db.get_dataset('birth-control-all-labels-twitter-posts')
twitter_replies_examples = db.get_dataset('birth-control-all-labels-twitter-replies')

In [5]:
len(reddit_post_examples), len(reddit_comment_examples), len(reddit_extra_examples), len(twitter_post_examples), len(twitter_replies_examples)

(500, 500, 53, 500, 500)

In [14]:
reddit_comment_examples[0]

{'text': "You should take it easy for the first week at least, you'll have bruising and it'll be tender. \n\nJust go with your body! I've had mine for almost 3 months and I don't notice it at all, the insertion mark is a tiny dot on my arm. Unless you were actively looking/feeling for it - you'd never know!",
 'meta': {'ID': 'exeejsv', 'Source': 'reddit-comments'},
 '_input_hash': -246065321,
 '_task_hash': 529846214,
 'options': [{'id': 'SEEKING INFORMATIONAL SUPPORT',
   'text': 'SEEKING INFORMATIONAL SUPPORT'},
  {'id': 'PROVIDING INFORMATIONAL SUPPORT',
   'text': 'PROVIDING INFORMATIONAL SUPPORT'},
  {'id': 'SEEKING EMOTIONAL SUPPORT', 'text': 'SEEKING EMOTIONAL SUPPORT'},
  {'id': 'PROVIDING EMOTIONAL SUPPORT', 'text': 'PROVIDING EMOTIONAL SUPPORT'},
  {'id': 'SEEKING EXPERIENCES', 'text': 'SEEKING EXPERIENCES'},
  {'id': 'PROVIDING EXPERIENCES', 'text': 'PROVIDING EXPERIENCES'},
  {'id': 'DISCOURSE', 'text': 'DISCOURSE'}],
 '_session_id': None,
 '_view_id': 'choice',
 'accept': 

In [6]:
label_dicts = []
for e in reddit_post_examples + reddit_comment_examples + reddit_extra_examples + twitter_post_examples + twitter_replies_examples:
    for _label in e['accept']:
        label_dicts.append({'Source': e['meta']['Source'],
                            'ID': e['meta']['ID'],
                            'Label': _label,
                            'Text': e['text']})
label_df = pd.DataFrame(label_dicts)

In [7]:
len(label_df)

2570
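
Because `label_df` holds one row per (example, label) pair, the per-source label counts printed in the earlier sections can be recovered from it in one step. The sketch below uses `pd.crosstab` on a small toy frame with the same `Source` and `Label` columns; the rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for label_df: one row per (example, label) pair.
toy_label_df = pd.DataFrame({
    'Source': ['twitter-posts', 'twitter-posts', 'reddit-comments', 'reddit-comments'],
    'Label': ['DISCOURSE', 'DISCOURSE', 'PROVIDING EXPERIENCES', 'DISCOURSE'],
})

# Rows are labels, columns are sources, cells are counts of labeled rows.
label_by_source = pd.crosstab(toy_label_df['Label'], toy_label_df['Source'])
print(label_by_source)
```

Running the same call on the real `label_df` should reproduce the per-dataset counts, provided every example carries the expected `Source` value in its `meta`.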

In [8]:
label_df.sample(3)

| | Source | ID | Label | Text |
|---|---|---|---|---|
| 2134 | twitter-replies | 1110522567302004700 | PROVIDING EXPERIENCES | @aardvarkwizard @J_M_T_ @tiarala OH I REMEMBER... |
| 2344 | twitter-replies | 1319984836921184300 | PROVIDING EXPERIENCES | @DarTwoDeeTwo @JessicaValenti YES! I have the ... |
| 781 | reddit-comments | dzjo6xk | PROVIDING EXPERIENCES | I would use it if I had a life modelling gig o... |


In [9]:
label_df.to_csv('/Volumes/Passport-1/data/birth-control/labeling/labeled_by_maria.all.csv')
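
When the backup is read back, two details are worth checking: `to_csv` writes the row index as an extra unnamed column by default, and the `ID` column mixes integer tweet IDs with string Reddit IDs. The sketch below round-trips a toy frame through an in-memory buffer, passing `index=False` on write and `dtype={'ID': str}` on read so that IDs come back as exact strings. The two IDs are taken from the examples printed in this notebook; the texts are placeholders.

```python
import io

import pandas as pd

toy_label_df = pd.DataFrame([
    {'Source': 'twitter-posts', 'ID': 504048215810113540,
     'Label': 'DISCOURSE', 'Text': 'placeholder tweet'},
    {'Source': 'reddit-comments', 'ID': 'exeejsv',
     'Label': 'PROVIDING EXPERIENCES', 'Text': 'placeholder comment'},
])

# Write without the row index so no unnamed column appears on reload.
buffer = io.StringIO()
toy_label_df.to_csv(buffer, index=False)

# Read IDs back as strings so large tweet IDs are not parsed as numbers.
buffer.seek(0)
restored = pd.read_csv(buffer, dtype={'ID': str})
print(restored)
```

The same two arguments apply when writing to and reading from a file path instead of a buffer.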

In [15]:
import pickle

data_directory_path = '/Volumes/Passport-1/data/birth-control'

# load the ID -> type lookups; `with` closes each file after reading
with open(data_directory_path + '/labeling/reddit_comments.id_type_dict.pickle', 'rb') as f:
    id_type_dict1 = pickle.load(f)
with open(data_directory_path + '/labeling/reddit.id_type_dict.pickle', 'rb') as f:
    id_type_dict2 = pickle.load(f)

In [17]:
for e in reddit_extra_examples:
    _id = e['meta']['ID']
    # every extra Reddit example should appear in one of the two lookups
    assert _id in id_type_dict1 or _id in id_type_dict2
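
An `assert` inside the loop stops at the first unknown ID. A minimal sketch of an alternative that collects every missing ID is shown below, assuming the same `e['meta']['ID']` layout as the printed examples. The helper name `find_missing_ids` and the toy lookups are hypothetical.

```python
def find_missing_ids(examples, *lookups):
    """Return the IDs of examples that appear in none of the lookup dicts."""
    missing = []
    for example in examples:
        example_id = example['meta']['ID']
        if not any(example_id in lookup for lookup in lookups):
            missing.append(example_id)
    return missing


# Toy data for illustration only.
toy_examples = [{'meta': {'ID': 'a'}}, {'meta': {'ID': 'b'}}, {'meta': {'ID': 'c'}}]
toy_comment_lookup = {'a': 'comment'}
toy_post_lookup = {'b': 'post'}

print(find_missing_ids(toy_examples, toy_comment_lookup, toy_post_lookup))
```

An empty list means every example was found in at least one lookup, which is the condition the assertion checks.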