# Instructions

Use the `prodigyEnv` conda environment for this notebook.

To set up the Prodigy environment, download the wheel file from the Prodigy email (sent after you purchase a license).

Then run `pip install ./prodigy*.whl`.

Instructions: https://prodi.gy/docs/install
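
Before running the cells below, it can help to confirm that the active environment actually contains Prodigy. A minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of Prodigy.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be imported in the current environment."""
    return importlib.util.find_spec(package_name) is not None


# Prints False if the wheel has not been installed in this environment
print("prodigy installed:", is_installed("prodigy"))
```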

Database is stored at /

<br><br>

# Imports

In [15]:
from collections import defaultdict
import random
import re

import pandas as pd
from prodigy.components.db import connect

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='ticks', font_scale=1.2)

In [16]:
def sort_by_mean(df, by, column, rot=0):
    # use dict comprehension to create new dataframe from the iterable groupby object
    # each group name becomes a column in the new dataframe
    df2 = pd.DataFrame({col:vals[column] for col, vals in df.groupby(by)})
    # compute the mean of each group and sort the groups by that mean (ascending)
    means = df2.mean().sort_values()
    # optional: plot the columns ordered by mean and return the axes
    # so changes can be made outside the function (rot is only used here)
#     return df2[means.index].boxplot(rot=rot, return_type="axes")
    return means
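
`sort_by_mean` is defined here but not called in the cells shown, so a small usage sketch may help. The toy data below is made up for illustration, and the function is repeated so that the block runs on its own.

```python
import pandas as pd


def sort_by_mean(df, by, column, rot=0):
    # One column per group, then the per-group mean in ascending order
    df2 = pd.DataFrame({col: vals[column] for col, vals in df.groupby(by)})
    return df2.mean().sort_values()


# Toy data (hypothetical): sentence lengths per source
toy_df = pd.DataFrame({
    "Source": ["reddit-posts", "reddit-posts", "twitter-posts"],
    "Length": [10, 20, 5],
})

means = sort_by_mean(toy_df, by="Source", column="Length")
print(means)
```

The result is a Series indexed by group name, with the lowest mean first.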

<br><br><br><br>

---

<br><br><br><br>


# Connect to database

In [17]:
db = connect()

db.datasets  # Lists all of your Prodigy datasets

['bc-reddit-posts',
 'bc-reddit-comments',
 'bc-twitter-posts',
 'bc-twitter-replies',
 'discourse-webmd-reviews',
 'discourse-reddit-posts',
 'discourse-reddit-comments',
 'discourse-twitter-posts',
 'discourse-twitter-replies']

In [18]:
# db.drop_dataset('discourse-reddit-comments')  # WARNING: only run this if you want to delete all of your annotations in this dataset

<br><br><br><br>

---

<br><br><br><br>

# Explore REDDIT posts

In [19]:
examples = db.get_dataset('discourse-reddit-posts')

print(len(examples))

200


In [20]:
label_count_dict = defaultdict(int)
method_label_count_dict = defaultdict(lambda: defaultdict(int))
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        method_label_count_dict[e['meta']['Method']][_label] += 1
        label_texts_dict[_label].append(e['text'])
    if len(e['accept']) < 1:
        label_count_dict['NONE'] += 1
        label_texts_dict['NONE'].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
------------------------------------------------------

80 	 SHARING PERSONAL EXPERIENCES
31 	 SEEKING INFORMATION
27 	 NONE
19 	 SHARING OPINIONS AND PREFERENCES
16 	 SHARING FUTURE PLANS
15 	 SHARING CAUSAL REASONING / HYPOTHESIZING
13 	 SEEKING EXPERIENCES
12 	 SEEKING EMOTIONAL SUPPORT
10 	 SHARING PERSONAL BACKGROUND
9 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
6 	 SEEKING ADVICE
4 	 SEEKING NORMALITY
3 	 SHARING INFORMATION
2 	 META DISCUSSION
2 	 SHARING SECONDHAND EXPERIENCES
1 	 SHARING NORMALITY
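
The counting cell above is repeated verbatim for each dataset further down. One way to avoid the repetition is to wrap it in a function. This is a sketch, assuming each example is a dict with the `accept`, `meta` and `text` keys used above; `count_labels` and the toy examples are hypothetical.

```python
from collections import defaultdict


def count_labels(examples):
    """Tally labels overall and per method, and collect the texts for each label."""
    label_counts = defaultdict(int)
    method_label_counts = defaultdict(lambda: defaultdict(int))
    label_texts = defaultdict(list)
    for e in examples:
        # Examples with no accepted label are counted under 'NONE'
        for label in (e['accept'] or ['NONE']):
            label_counts[label] += 1
            label_texts[label].append(e['text'])
            # As in the original cell, 'NONE' is not tallied per method
            if label != 'NONE':
                method_label_counts[e['meta']['Method']][label] += 1
    return label_counts, method_label_counts, label_texts


# Made-up examples in the same shape as the Prodigy annotations
toy_examples = [
    {'text': 'first', 'accept': ['SEEKING ADVICE', 'SHARING FUTURE PLANS'], 'meta': {'Method': 'pill'}},
    {'text': 'second', 'accept': [], 'meta': {'Method': 'iud'}},
]
counts, by_method, texts = count_labels(toy_examples)
print(dict(counts))
```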


In [21]:
for _method, _label_count_dict in method_label_count_dict.items():
    print('--------------------------------')
    print(_method)
    print('--------------------------------')
    for _label, _count in sorted(_label_count_dict.items(), key=lambda x: x[1], reverse=True):
        print(_count, '\t', _label)
    print()

--------------------------------
pill
--------------------------------
28 	 SHARING PERSONAL EXPERIENCES
12 	 SEEKING INFORMATION
5 	 SHARING CAUSAL REASONING / HYPOTHESIZING
5 	 SHARING OPINIONS AND PREFERENCES
4 	 SHARING PERSONAL BACKGROUND
4 	 SHARING FUTURE PLANS
3 	 SEEKING EMOTIONAL SUPPORT
2 	 SEEKING EXPERIENCES
2 	 SEEKING ADVICE
1 	 META DISCUSSION
1 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
1 	 SEEKING NORMALITY
1 	 SHARING NORMALITY
1 	 SHARING INFORMATION

--------------------------------
iud
--------------------------------
26 	 SHARING PERSONAL EXPERIENCES
9 	 SEEKING EXPERIENCES
9 	 SHARING FUTURE PLANS
8 	 SEEKING INFORMATION
7 	 SHARING OPINIONS AND PREFERENCES
6 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
6 	 SHARING CAUSAL REASONING / HYPOTHESIZING
5 	 SHARING PERSONAL BACKGROUND
4 	 SEEKING EMOTIONAL SUPPORT
2 	 SHARING INFORMATION
2 	 SEEKING ADVICE
1 	 SHARING SECONDHAND EXPERIENCES
1 	 META DISCUSSION
1 	 SEEKING NORMALITY

--------------------------------
implant


In [22]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', label_count_dict[_label], '\t', _label)

------------------------------
percent of posts with label
------------------------------

40.0% 	 80 	 SHARING PERSONAL EXPERIENCES
15.5% 	 31 	 SEEKING INFORMATION
13.5% 	 27 	 NONE
9.5% 	 19 	 SHARING OPINIONS AND PREFERENCES
8.0% 	 16 	 SHARING FUTURE PLANS
7.5% 	 15 	 SHARING CAUSAL REASONING / HYPOTHESIZING
6.5% 	 13 	 SEEKING EXPERIENCES
6.0% 	 12 	 SEEKING EMOTIONAL SUPPORT
5.0% 	 10 	 SHARING PERSONAL BACKGROUND
4.5% 	 9 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
3.0% 	 6 	 SEEKING ADVICE
2.0% 	 4 	 SEEKING NORMALITY
1.5% 	 3 	 SHARING INFORMATION
1.0% 	 2 	 META DISCUSSION
1.0% 	 2 	 SHARING SECONDHAND EXPERIENCES
0.5% 	 1 	 SHARING NORMALITY
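
Because an example can carry several labels, these percentages are shares of examples rather than shares of label assignments, so they add up to more than 100% (the counts above total 250 across 200 posts). A small sketch of the same calculation on made-up counts; `label_shares` is a hypothetical helper.

```python
def label_shares(label_counts, n_examples):
    """Fraction of examples carrying each label (the fractions can sum to more than 1)."""
    return {label: count / n_examples for label, count in label_counts.items()}


# Toy counts (made up): 4 examples carrying 5 label assignments in total
toy_counts = {'SHARING PERSONAL EXPERIENCES': 3, 'SEEKING INFORMATION': 1, 'NONE': 1}
shares = label_shares(toy_counts, n_examples=4)
print(shares)
print(sum(shares.values()))
```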


In [23]:
for _label, _texts in label_texts_dict.items():
    if _label == 'SHARING CAUSAL REASONING / HYPOTHESIZING':
        print('------------------------------------------')
        print(_label)
        print('------------------------------------------')
        print()
        for e in _texts:
            print(' '.join(e.split()))

------------------------------------------
SHARING CAUSAL REASONING / HYPOTHESIZING
------------------------------------------

It lasted for a few weeks but I figured it was because of the change .
But I know it will probably take up to 6 months for my body to re-adjust.
I thought I would be fine since I was on the depo so long beforehand, and didn't even realize this could possibly be a symptom of the implant.
This lasted for MONTHS, so I read somewhere online that vitamin e and zinc help with this, and it did stop the bleeding for a couple weeks, but I just started spotting again today.
I'm now doing a 5 month Accutane course as suggested by my dermatologist because I'm still breaking out consistently with large cysts.
I am taking the pill continuously to not have a period but this spotting is pretty much a period.
Paraguard prolonged period.
I'm thinking about counting it as a missed pill and taking another one just to cover all my bases.
I know this is normal if you’re off of the 

<br><br>

# Explore REDDIT comments

In [24]:
examples = db.get_dataset('discourse-reddit-comments')

print(len(examples))

200


In [25]:
label_count_dict = defaultdict(int)
method_label_count_dict = defaultdict(lambda: defaultdict(int))
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        method_label_count_dict[e['meta']['Method']][_label] += 1
        label_texts_dict[_label].append(e['text'])
    if len(e['accept']) < 1:
        label_count_dict['NONE'] += 1
        label_texts_dict['NONE'].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
------------------------------------------------------

58 	 SHARING PERSONAL EXPERIENCES
47 	 SHARING INFORMATION
32 	 NONE
24 	 SHARING ADVICE
16 	 SHARING OPINIONS AND PREFERENCES
7 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
7 	 SHARING EMOTIONAL SUPPORT
6 	 SHARING FUTURE PLANS
6 	 SHARING NORMALITY
6 	 SHARING SECONDHAND EXPERIENCES
5 	 SEEKING EXPERIENCES
5 	 SHARING CAUSAL REASONING / HYPOTHESIZING
4 	 META DISCUSSION
3 	 SHARING PERSONAL BACKGROUND
2 	 SEEKING EMOTIONAL SUPPORT
1 	 SEEKING INFORMATION


In [26]:
for _method, _label_count_dict in method_label_count_dict.items():
    print('--------------------------------')
    print(_method)
    print('--------------------------------')
    for _label, _count in sorted(_label_count_dict.items(), key=lambda x: x[1], reverse=True):
        print(_count, '\t', _label)
    print()

--------------------------------
iud
--------------------------------
18 	 SHARING PERSONAL EXPERIENCES
13 	 SHARING INFORMATION
9 	 SHARING OPINIONS AND PREFERENCES
6 	 SHARING ADVICE
3 	 SEEKING EXPERIENCES
3 	 SHARING FUTURE PLANS
3 	 SHARING NORMALITY
3 	 SHARING EMOTIONAL SUPPORT
2 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
2 	 META DISCUSSION
2 	 SHARING PERSONAL BACKGROUND
1 	 SHARING SECONDHAND EXPERIENCES

--------------------------------
implant
--------------------------------
22 	 SHARING PERSONAL EXPERIENCES
16 	 SHARING INFORMATION
5 	 SHARING ADVICE
4 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
4 	 SHARING SECONDHAND EXPERIENCES
3 	 SHARING OPINIONS AND PREFERENCES
2 	 META DISCUSSION
2 	 SHARING EMOTIONAL SUPPORT
2 	 SEEKING EMOTIONAL SUPPORT
2 	 SHARING CAUSAL REASONING / HYPOTHESIZING
1 	 SHARING FUTURE PLANS
1 	 SEEKING INFORMATION
1 	 SHARING NORMALITY
1 	 SEEKING EXPERIENCES
1 	 SHARING PERSONAL BACKGROUND

--------------------------------
pill
-------------------------

In [27]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', label_count_dict[_label], '\t', _label)

------------------------------
percent of posts with label
------------------------------

29.0% 	 58 	 SHARING PERSONAL EXPERIENCES
23.5% 	 47 	 SHARING INFORMATION
16.0% 	 32 	 NONE
12.0% 	 24 	 SHARING ADVICE
8.0% 	 16 	 SHARING OPINIONS AND PREFERENCES
3.5% 	 7 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
3.5% 	 7 	 SHARING EMOTIONAL SUPPORT
3.0% 	 6 	 SHARING FUTURE PLANS
3.0% 	 6 	 SHARING NORMALITY
3.0% 	 6 	 SHARING SECONDHAND EXPERIENCES
2.5% 	 5 	 SEEKING EXPERIENCES
2.5% 	 5 	 SHARING CAUSAL REASONING / HYPOTHESIZING
2.0% 	 4 	 META DISCUSSION
1.5% 	 3 	 SHARING PERSONAL BACKGROUND
1.0% 	 2 	 SEEKING EMOTIONAL SUPPORT
0.5% 	 1 	 SEEKING INFORMATION


In [28]:
for _label, _texts in label_texts_dict.items():
    if _label == 'SHARING/DESCRIBING ADDITIONAL RESEARCH':
        print('------------------------------------------')
        print(_label)
        print('------------------------------------------')
        print()
        for e in _texts:
            print(' '.join(e.split()))

------------------------------------------
SHARING/DESCRIBING ADDITIONAL RESEARCH
------------------------------------------

My doctor hasn't been concerned by it.
Reading/ watching experience stories has become my nightly routine lol
I'm definitely going to ask my doctor about the implant though!
I have had a blood clot already and was told not to personally.
My gyn knows about my history and didnt advise against it.
it even says on the nexplanon site that irregularbleeding is the most common side effect
Highly recommend the book Period Repair Manual for tips!!


<br><br>

# Explore TWITTER posts

In [29]:
examples = db.get_dataset('discourse-twitter-posts')

print(len(examples))

200


In [30]:
label_count_dict = defaultdict(int)
method_label_count_dict = defaultdict(lambda: defaultdict(int))
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        method_label_count_dict[e['meta']['Method']][_label] += 1
        label_texts_dict[_label].append(e['text'])
    if len(e['accept']) < 1:
        label_count_dict['NONE'] += 1
        label_texts_dict['NONE'].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
------------------------------------------------------

56 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
46 	 META DISCUSSION
34 	 SHARING INFORMATION
32 	 NONE
31 	 SHARING PERSONAL EXPERIENCES
14 	 SHARING OPINIONS AND PREFERENCES
13 	 SEEKING INFORMATION
12 	 SHARING SECONDHAND EXPERIENCES
8 	 SHARING FUTURE PLANS
8 	 SHARING CAUSAL REASONING / HYPOTHESIZING
6 	 SEEKING EMOTIONAL SUPPORT
4 	 SEEKING EXPERIENCES
3 	 SHARING ADVICE
2 	 SHARING PERSONAL BACKGROUND
1 	 SEEKING ADVICE


In [31]:
for _method, _label_count_dict in method_label_count_dict.items():
    print('--------------------------------')
    print(_method)
    print('--------------------------------')
    for _label, _count in sorted(_label_count_dict.items(), key=lambda x: x[1], reverse=True):
        print(_count, '\t', _label)
    print()

--------------------------------
iud
--------------------------------
18 	 META DISCUSSION
12 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
8 	 SHARING PERSONAL EXPERIENCES
7 	 SHARING INFORMATION
6 	 SEEKING INFORMATION
5 	 SHARING OPINIONS AND PREFERENCES
5 	 SHARING SECONDHAND EXPERIENCES
4 	 SEEKING EMOTIONAL SUPPORT
3 	 SHARING FUTURE PLANS
2 	 SHARING PERSONAL BACKGROUND
2 	 SHARING CAUSAL REASONING / HYPOTHESIZING
2 	 SEEKING EXPERIENCES
1 	 SEEKING ADVICE

--------------------------------
pill
--------------------------------
25 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
21 	 META DISCUSSION
17 	 SHARING INFORMATION
4 	 SHARING PERSONAL EXPERIENCES
3 	 SEEKING INFORMATION
1 	 SHARING SECONDHAND EXPERIENCES
1 	 SHARING CAUSAL REASONING / HYPOTHESIZING
1 	 SHARING ADVICE

--------------------------------
implant
--------------------------------
19 	 SHARING PERSONAL EXPERIENCES
19 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
10 	 SHARING INFORMATION
9 	 SHARING OPINIONS AND PREFERENCES
7 	 

In [32]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', label_count_dict[_label], '\t', _label)

------------------------------
percent of posts with label
------------------------------

28.0% 	 56 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
23.0% 	 46 	 META DISCUSSION
17.0% 	 34 	 SHARING INFORMATION
16.0% 	 32 	 NONE
15.5% 	 31 	 SHARING PERSONAL EXPERIENCES
7.0% 	 14 	 SHARING OPINIONS AND PREFERENCES
6.5% 	 13 	 SEEKING INFORMATION
6.0% 	 12 	 SHARING SECONDHAND EXPERIENCES
4.0% 	 8 	 SHARING FUTURE PLANS
4.0% 	 8 	 SHARING CAUSAL REASONING / HYPOTHESIZING
3.0% 	 6 	 SEEKING EMOTIONAL SUPPORT
2.0% 	 4 	 SEEKING EXPERIENCES
1.5% 	 3 	 SHARING ADVICE
1.0% 	 2 	 SHARING PERSONAL BACKGROUND
0.5% 	 1 	 SEEKING ADVICE


In [33]:
for _label, _texts in label_texts_dict.items():
    if _label == 'SHARING INFORMATION':
        print('------------------------------------------')
        print(_label)
        print('------------------------------------------')
        print()
        for e in _texts:
            print(' '.join(e.split()))

------------------------------------------
SHARING INFORMATION
------------------------------------------

(1960) First contraceptive pill made available for women, who can now make their https://t.co/xdjj2owDmY https://t.co/LfSuOtTsmB
It cause a lot of hormonal imbalances.
http://t.co/fWeL2X2M IUD Beats Pill at Preventing Pregnancy - WebMD: http://t.co/vpcDmI3X IUD Beats… http://t.co/UvM4gSDM
How did she get pregnant &amp; she's on the IUD?
RT New Birth Control Pill Beyaz Includes Folic Acid, Columnist Writes http://bit.ly/bFCMNr #contraception #prochoice
A male version of the #IUD may finally be on the way!
Unlike some other methods, the contraceptive implant is not affected by common antibiotics, diarrhoea or vomiting.
Future reproductive lifespan may be lessened in oral contraceptive users: Lower measures of ovarian reserve: http://t.co/QD0Yj8msqB
Long-Term Data on Complications Adds to Criticism of Contraceptive Implant Thousands of women who claim they were … http://t.co/wKFFaIn6

<br><br>

# Explore TWITTER replies

In [34]:
examples = db.get_dataset('discourse-twitter-replies')

print(len(examples))

200


In [35]:
label_count_dict = defaultdict(int)
method_label_count_dict = defaultdict(lambda: defaultdict(int))
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        method_label_count_dict[e['meta']['Method']][_label] += 1
        label_texts_dict[_label].append(e['text'])
    if len(e['accept']) < 1:
        label_count_dict['NONE'] += 1
        label_texts_dict['NONE'].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
------------------------------------------------------

53 	 SHARING PERSONAL EXPERIENCES
43 	 META DISCUSSION
38 	 NONE
33 	 SHARING INFORMATION
19 	 SHARING OPINIONS AND PREFERENCES
9 	 SHARING CAUSAL REASONING / HYPOTHESIZING
8 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
6 	 SHARING PERSONAL BACKGROUND
6 	 SHARING SECONDHAND EXPERIENCES
5 	 SHARING FUTURE PLANS
4 	 SHARING ADVICE
3 	 SEEKING EXPERIENCES
2 	 SEEKING INFORMATION
1 	 SEEKING EMOTIONAL SUPPORT


In [36]:
for _method, _label_count_dict in method_label_count_dict.items():
    print('--------------------------------')
    print(_method)
    print('--------------------------------')
    for _label, _count in sorted(_label_count_dict.items(), key=lambda x: x[1], reverse=True):
        print(_count, '\t', _label)
    print()

--------------------------------
implant
--------------------------------
29 	 SHARING PERSONAL EXPERIENCES
11 	 SHARING INFORMATION
8 	 SHARING OPINIONS AND PREFERENCES
5 	 META DISCUSSION
4 	 SHARING ADVICE
3 	 SHARING PERSONAL BACKGROUND
3 	 SHARING CAUSAL REASONING / HYPOTHESIZING
3 	 SHARING FUTURE PLANS
2 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
2 	 SHARING SECONDHAND EXPERIENCES
2 	 SEEKING INFORMATION

--------------------------------
pill
--------------------------------
28 	 META DISCUSSION
12 	 SHARING PERSONAL EXPERIENCES
9 	 SHARING INFORMATION
4 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
2 	 SHARING OPINIONS AND PREFERENCES
2 	 SHARING CAUSAL REASONING / HYPOTHESIZING
1 	 SHARING PERSONAL BACKGROUND
1 	 SEEKING EXPERIENCES

--------------------------------
iud
--------------------------------
13 	 SHARING INFORMATION
12 	 SHARING PERSONAL EXPERIENCES
10 	 META DISCUSSION
9 	 SHARING OPINIONS AND PREFERENCES
4 	 SHARING CAUSAL REASONING / HYPOTHESIZING
4 	 SHARING SECONDHAND

In [37]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', label_count_dict[_label], '\t', _label)

------------------------------
percent of posts with label
------------------------------

26.5% 	 53 	 SHARING PERSONAL EXPERIENCES
21.5% 	 43 	 META DISCUSSION
19.0% 	 38 	 NONE
16.5% 	 33 	 SHARING INFORMATION
9.5% 	 19 	 SHARING OPINIONS AND PREFERENCES
4.5% 	 9 	 SHARING CAUSAL REASONING / HYPOTHESIZING
4.0% 	 8 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
3.0% 	 6 	 SHARING PERSONAL BACKGROUND
3.0% 	 6 	 SHARING SECONDHAND EXPERIENCES
2.5% 	 5 	 SHARING FUTURE PLANS
2.0% 	 4 	 SHARING ADVICE
1.5% 	 3 	 SEEKING EXPERIENCES
1.0% 	 2 	 SEEKING INFORMATION
0.5% 	 1 	 SEEKING EMOTIONAL SUPPORT


In [38]:
for _label, _texts in label_texts_dict.items():
    if _label == 'SHARING OPINIONS AND PREFERENCES':
        print('------------------------------------------')
        print(_label)
        print('------------------------------------------')
        print()
        for e in _texts:
            print(' '.join(e.split()))

------------------------------------------
SHARING OPINIONS AND PREFERENCES
------------------------------------------

oh helll nahhhhhh I had one with hormones and that shit was horrible.
im just so reluctant to start meds bc it took so many attempts to get to a contraceptive pill that didnt fuck
oh bitch fuck nexplanon.
Miruiana sounds too close to the IUD I have
on my third mirena iud and can't say enough good things about it although I know my experiences aren't universal, but I haven't had my period in over ten years and I'm so grateful
I have the IUD now I’m want something different
Now I’m glad I didn’t.
yea girl nexplanon for life
No babies=happy me
I hated the nexplanon!
tbh I got nexplanon when I was 26 so I could just avoid that shit altogether
i dont think im about that iud life man
Best thing i have ever done.
Birth control pill ftw
I have the paragard because it's hormonal free... girl these hormonal birth controls and I don't mix ...
I’m debating on what to do I want to

<br><br><br><br>

# Explore WebMD reviews

In [39]:
examples = db.get_dataset('discourse-webmd-reviews')

print(len(examples))

200


In [40]:
label_count_dict = defaultdict(int)
method_label_count_dict = defaultdict(lambda: defaultdict(int))
label_texts_dict = defaultdict(list)
for e in examples:
    for _label in e['accept']:
        label_count_dict[_label] += 1
        method_label_count_dict[e['meta']['Method']][_label] += 1
        label_texts_dict[_label].append(e['text'])
    if len(e['accept']) < 1:
        label_count_dict['NONE'] += 1
        label_texts_dict['NONE'].append(e['text'])

print('------------------------------------------------------')
print('total number of posts labeled')
print('------------------------------------------------------')
print()
for _label, _count in sorted(label_count_dict.items(), key=lambda x: x[1], reverse=True):
    print(_count, '\t', _label)

------------------------------------------------------
total number of posts labeled
------------------------------------------------------

143 	 SHARING PERSONAL EXPERIENCES
49 	 SHARING OPINIONS AND PREFERENCES
14 	 SHARING PERSONAL BACKGROUND
10 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
10 	 SHARING FUTURE PLANS
10 	 SHARING CAUSAL REASONING / HYPOTHESIZING
9 	 NONE
6 	 SHARING INFORMATION
4 	 META DISCUSSION
3 	 SHARING ADVICE
2 	 SEEKING EXPERIENCES
1 	 SEEKING EMOTIONAL SUPPORT
1 	 SHARING SECONDHAND EXPERIENCES
1 	 SEEKING NORMALITY
1 	 SHARING NORMALITY


In [41]:
for _method, _label_count_dict in method_label_count_dict.items():
    print('--------------------------------')
    print(_method)
    print('--------------------------------')
    for _label, _count in sorted(_label_count_dict.items(), key=lambda x: x[1], reverse=True):
        print(_count, '\t', _label)
    print()

--------------------------------
pill
--------------------------------
40 	 SHARING PERSONAL EXPERIENCES
12 	 SHARING OPINIONS AND PREFERENCES
7 	 SHARING CAUSAL REASONING / HYPOTHESIZING
6 	 SHARING FUTURE PLANS
4 	 SHARING PERSONAL BACKGROUND
3 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
2 	 SHARING INFORMATION
2 	 SEEKING EXPERIENCES
1 	 META DISCUSSION
1 	 SHARING ADVICE

--------------------------------
iud
--------------------------------
49 	 SHARING PERSONAL EXPERIENCES
13 	 SHARING OPINIONS AND PREFERENCES
7 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
6 	 SHARING PERSONAL BACKGROUND
3 	 SHARING CAUSAL REASONING / HYPOTHESIZING
1 	 SHARING INFORMATION
1 	 SEEKING EMOTIONAL SUPPORT
1 	 SEEKING NORMALITY
1 	 META DISCUSSION

--------------------------------
implant
--------------------------------
54 	 SHARING PERSONAL EXPERIENCES
24 	 SHARING OPINIONS AND PREFERENCES
4 	 SHARING FUTURE PLANS
4 	 SHARING PERSONAL BACKGROUND
3 	 SHARING INFORMATION
2 	 META DISCUSSION
2 	 SHARING ADVICE

In [42]:
label_percent_dict = {_label: _count/float(len(examples)) for _label, _count in label_count_dict.items()}

print('------------------------------')
print('percent of posts with label')
print('------------------------------')
print()
for _label, _percent in sorted(label_percent_dict.items(), key=lambda x: x[1], reverse=True):
    print(str(round(_percent*100, 1)) + '%', '\t', label_count_dict[_label], '\t', _label)

------------------------------
percent of posts with label
------------------------------

71.5% 	 143 	 SHARING PERSONAL EXPERIENCES
24.5% 	 49 	 SHARING OPINIONS AND PREFERENCES
7.0% 	 14 	 SHARING PERSONAL BACKGROUND
5.0% 	 10 	 SHARING/DESCRIBING ADDITIONAL RESEARCH
5.0% 	 10 	 SHARING FUTURE PLANS
5.0% 	 10 	 SHARING CAUSAL REASONING / HYPOTHESIZING
4.5% 	 9 	 NONE
3.0% 	 6 	 SHARING INFORMATION
2.0% 	 4 	 META DISCUSSION
1.5% 	 3 	 SHARING ADVICE
1.0% 	 2 	 SEEKING EXPERIENCES
0.5% 	 1 	 SEEKING EMOTIONAL SUPPORT
0.5% 	 1 	 SHARING SECONDHAND EXPERIENCES
0.5% 	 1 	 SEEKING NORMALITY
0.5% 	 1 	 SHARING NORMALITY


In [43]:
for _label, _texts in label_texts_dict.items():
    if _label == 'SHARING/DESCRIBING ADDITIONAL RESEARCH':
        print('------------------------------------------')
        print(_label)
        print('------------------------------------------')
        print()
        for e in _texts:
            print(' '.join(e.split()))

------------------------------------------
SHARING/DESCRIBING ADDITIONAL RESEARCH
------------------------------------------

So if anyone is wondering, I have read all of these comments and they all apply.bloating, nausea, weight/appetite gain (it's been 12 days) extreme rage, breat tenderness and lumps.
So i went back again, gave me different meds, which i couldnt take upset stomach along with my regular symptoms then i kept having this pain, and hardness in my lower left abdomen, painful sex, doctor said my cervix and uterus were swollen, cramping, just horrible pain and very annoying.
I did not research until after placement...
While doing my research I read quite a few reviews about how much it hurt.
Before getting this form of birth control, I read tons of reviews.
But that was all explained to me before I chose to get it.
After a couple of excruciating hours trying to decide if it was skylas fault it hurt, I decided to go to the ER.
I started taking this pill as "suppressive" th

<br><br><br><br>

---

<br><br><br><br>

# Back up labels to a CSV

In [44]:
reddit_post_examples = db.get_dataset('discourse-reddit-posts')
reddit_comment_examples = db.get_dataset('discourse-reddit-comments')
twitter_post_examples = db.get_dataset('discourse-twitter-posts')
twitter_replies_examples = db.get_dataset('discourse-twitter-replies')
webmd_reviews_examples = db.get_dataset('discourse-webmd-reviews')

In [45]:
len(reddit_post_examples), len(reddit_comment_examples), len(twitter_post_examples), len(twitter_replies_examples), len(webmd_reviews_examples)

(200, 200, 200, 200, 200)

In [46]:
label_dicts = []
for e in reddit_post_examples + reddit_comment_examples + twitter_post_examples + twitter_replies_examples + webmd_reviews_examples:
    for _label in e['accept']:
        label_dicts.append({'Source': e['meta']['Source'],
                            'ID': e['meta']['ID'],
                            'Label': _label,
                            'Text': e['text']})
    if len(e['accept']) == 0:
        label_dicts.append({'Source': e['meta']['Source'],
                            'ID': e['meta']['ID'],
                            'Label': 'NONE',
                            'Text': e['text']})
label_df = pd.DataFrame(label_dicts)

In [47]:
len(label_df)

1243
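
The row count exceeds the 1,000 annotated examples because the loop above writes one row per (example, label) pair, plus one `NONE` row for each example without labels. A plain-Python sketch of that flattening on made-up examples; `to_rows` is a hypothetical helper.

```python
def to_rows(examples):
    """Flatten multi-label examples into one dict per (example, label) pair."""
    rows = []
    for e in examples:
        # Unlabeled examples still produce a single 'NONE' row
        for label in (e['accept'] or ['NONE']):
            rows.append({'Source': e['meta']['Source'],
                         'ID': e['meta']['ID'],
                         'Label': label,
                         'Text': e['text']})
    return rows


# Made-up examples: two labels on the first, none on the second
toy_examples = [
    {'text': 'first', 'accept': ['SEEKING ADVICE', 'SHARING FUTURE PLANS'],
     'meta': {'Source': 'reddit-posts', 'ID': 'abc'}},
    {'text': 'second', 'accept': [], 'meta': {'Source': 'twitter-posts', 'ID': 'def'}},
]
rows = to_rows(toy_examples)
print(len(rows))
```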

In [48]:
label_df['Label'].value_counts()

SHARING PERSONAL EXPERIENCES                365
NONE                                        138
SHARING INFORMATION                         123
SHARING OPINIONS AND PREFERENCES            117
META DISCUSSION                              99
SHARING/DESCRIBING ADDITIONAL RESEARCH       90
SHARING CAUSAL REASONING / HYPOTHESIZING     47
SEEKING INFORMATION                          47
SHARING FUTURE PLANS                         45
SHARING PERSONAL BACKGROUND                  35
SHARING ADVICE                               34
SHARING SECONDHAND EXPERIENCES               27
SEEKING EXPERIENCES                          27
SEEKING EMOTIONAL SUPPORT                    22
SHARING NORMALITY                             8
SEEKING ADVICE                                7
SHARING EMOTIONAL SUPPORT                     7
SEEKING NORMALITY                             5
Name: Label, dtype: int64

In [49]:
label_df['Source'].value_counts()

twitter-posts      270
webmd-reviews      264
reddit-posts       250
twitter-replies    230
reddit-comments    229
Name: Source, dtype: int64

In [50]:
label_df.sample(3)

| | Source | ID | Label | Text |
|---|---|---|---|---|
| 106 | reddit-posts | f8a6jp | SHARING FUTURE PLANS | I am getting a copper IUD soon. |
| 516 | twitter-posts | 230487448063447040 | META DISCUSSION | Now I got a baby from a drunk dyslexic” |
| 671 | twitter-posts | 556540323636056060 | SHARING/DESCRIBING ADDITIONAL RESEARCH | 6 Things Everywoman Should Know Before Using O... |


In [51]:
for i, r in label_df[label_df['Label'] == 'NONE'].sample(10).iterrows():
    print(' '.join(r['Text'].split()))

I understand that
Bitch i got dick wanna fuck?
Vinney said a IUD is a wishbone 😭😭 and these the type niggas BPD hire 🥴
Oh you learn something new everyday on this app 😭😭😭😭
I guess I’ll have to bear my stupid periods.
Crying and hives are included.
so I know it’s not him).. and crying about it.
I’m so happy to my bih Mother Nature slide down on me this morning.
No I totally understand.
and she just blew me off im starting to think maybe its from the 5g emf exposure I have all the symptoms


In [52]:
label_df.to_csv('/Users/maria/Documents/data/birth-control/labeling/label-discourse/labeled_by_maria.all.csv')
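
By default `to_csv` writes the DataFrame index as an unnamed first column, which comes back as an `Unnamed: 0` column when the file is read without `index_col`. A minimal sketch of the round trip, using an in-memory buffer and made-up rows instead of the real file:

```python
import io

import pandas as pd

# Made-up rows standing in for label_df
toy_df = pd.DataFrame({'Label': ['SEEKING ADVICE', 'META DISCUSSION'],
                       'Text': ['a', 'b']})

buffer = io.StringIO()
toy_df.to_csv(buffer)  # the index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)  # the index comes back as a regular column
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0)  # the first column becomes the index again

print(list(naive.columns))
print(list(restored.columns))
```

Passing `index=False` to `to_csv` is the other common option when the index carries no information.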

<br><br><br><br>

---

<br><br><br><br>

# Try training a simple model

In [53]:
data_directory_path   = '/Users/maria/Documents/data/birth-control'
test_df = pd.read_csv(data_directory_path + '/labeling/label-discourse/sampled-sentences.test.csv')
len(test_df)

11993

In [54]:
test_df.sample(3)

| | Unnamed: 0 | text | meta |
|---|---|---|---|
| 8747 | 8747 | This is the best descriptor for a dear little ... | {'ID': 1211812402435624962, 'Source': 'twitter... |
| 778 | 778 | Bcp and nexplanon? | {'ID': '3lzpfj', 'Source': 'reddit-posts', 'Me... |
| 7822 | 7822 | ahh, confused as to why a contraceptive implan... | {'ID': 269216691572051968, 'Source': 'twitter-... |


In [55]:
len(label_df.index)

1243

In [56]:
label_df.sample(3)

| | Source | ID | Label | Text |
|---|---|---|---|---|
| 126 | reddit-posts | y9a6w | SHARING CAUSAL REASONING / HYPOTHESIZING | I know the hormones in Mirerna are locally del... |
| 715 | twitter-posts | 1156128897458737200 | SEEKING INFORMATION | do u ever lose the weight u gained cause of th... |
| 317 | reddit-comments | ch2ap8s | SHARING PERSONAL BACKGROUND | I really, really want them to do that for me i... |


In [57]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, classification_report
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [58]:
def binarize_label(label, target_label):
    if label == target_label:
        return 1
    return 0
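
The helper above feeds a one-vs-rest setup: each label in turn becomes the positive class and everything else becomes 0. A minimal sketch of that step on a toy frame follows. `toy_df` and its rows are hypothetical stand-ins for `label_df`, and the comparison is a vectorized equivalent of applying `binarize_label` row by row. It also shows the ID filter used in the training loop, which presumably exists so that an item annotated with the target label is not also counted as a negative under one of its other labels.

```python
import pandas as pd

# Hypothetical toy rows standing in for label_df (not real annotations).
toy_df = pd.DataFrame({
    'ID':    ['a', 'a', 'b', 'c', 'd'],
    'Label': ['SEEKING INFORMATION', 'NONE', 'NONE',
              'SEEKING INFORMATION', 'META DISCUSSION'],
})
target_label = 'SEEKING INFORMATION'

# Vectorized equivalent of label_df['Label'].apply(binarize_label, ...).
binary = (toy_df['Label'] == target_label).astype(int)

# IDs that carry the target label at least once.
positive_ids = toy_df.loc[binary == 1, 'ID']

# Keep every row except negatives whose ID also appears as a positive.
keep = ~(toy_df['ID'].isin(positive_ids) & (binary == 0))
binarized = toy_df.assign(Label=binary)[keep]

print(binarized)
```

Here the second row (ID `a`, label `NONE`) is dropped because ID `a` also carries the target label; the remaining rows are labeled 1, 0, 1, 0.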

In [59]:
for _target_label in label_df['Label'].unique():

    # One-vs-rest: 1 for the target label, 0 for every other label
    _binarized_df = label_df.copy()
    _binarized_df['Label'] = label_df['Label'].apply(lambda x: binarize_label(x, _target_label))
    # Drop negative rows whose ID also appears with the target label
    _positive_ids = _binarized_df[_binarized_df['Label'] == 1]['ID'].tolist()
    _binarized_df = _binarized_df[~((_binarized_df['ID'].isin(_positive_ids)) & (_binarized_df['Label'] == 0))]

    # Balance the classes by sampling as many rows per class as there are positives
    _binarized_df = _binarized_df.groupby('Label').sample(n=len(_binarized_df[_binarized_df['Label'] == 1]), random_state=1)

    if len(_binarized_df.index) > 50:

        _train_df, _test_df = train_test_split(_binarized_df, test_size=0.33, random_state=42)

        _train_texts = _train_df['Text']
        _train_labels = _train_df['Label']
        _test_texts = _test_df['Text']
        _test_labels = _test_df['Label']

        _vectorizer = TfidfVectorizer()
        _X_train = _vectorizer.fit_transform(_train_texts)
        _X_test = _vectorizer.transform(_test_texts)

        _model = LogisticRegression(C=10).fit(_X_train, _train_labels)
        _predictions = _model.predict(_X_test)

        print(_target_label)
        print(classification_report(_test_labels, _predictions))

META DISCUSSION
              precision    recall  f1-score   support

           0       0.73      0.71      0.72        34
           1       0.70      0.72      0.71        32

    accuracy                           0.71        66
   macro avg       0.71      0.71      0.71        66
weighted avg       0.71      0.71      0.71        66

SHARING PERSONAL EXPERIENCES
              precision    recall  f1-score   support

           0       0.82      0.82      0.82       130
           1       0.79      0.79      0.79       111

    accuracy                           0.80       241
   macro avg       0.80      0.80      0.80       241
weighted avg       0.81      0.80      0.81       241

SEEKING EXPERIENCES
              precision    recall  f1-score   support

           0       0.86      0.55      0.67        11
           1       0.55      0.86      0.67         7

    accuracy                           0.67        18
   macro avg       0.70      0.70      0.67        18
weighted 

In [60]:
_binarized_df['Label'].value_counts()

0    7
1    7
Name: Label, dtype: int64
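
With balanced sets this small, a single 67/33 split leaves very few test rows (support of 18 for SEEKING EXPERIENCES above), so the reported scores can swing a lot with the random seed. One possible alternative, sketched here under assumptions, is stratified k-fold cross-validation over a pipeline that bundles the vectorizer and the classifier, so the TF-IDF vocabulary is refit inside each fold. The texts and labels below are hypothetical toy data, not rows from `label_df`, and the choice of 3 folds and F1 scoring is illustrative only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for one balanced, binarized label set.
toy_texts = [
    'has anyone tried the implant',
    'what was your experience with the iud',
    'did anyone else get cramps on the pill',
    'can someone share how the patch went',
    'anyone have experience with the ring',
    'has anyone switched from pill to iud',
    'i have an appointment next week',
    'the pharmacy was closed today',
    'my doctor recommended the pill',
    'i started the patch yesterday',
    'there are other forms of birth control',
    'take one pill a day in order',
]
toy_labels = [1] * 6 + [0] * 6

# Same model family as in the notebook, wrapped so each fold refits TF-IDF.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression(C=10))

# Stratification keeps both classes present in every fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, toy_texts, toy_labels, cv=cv, scoring='f1')

print(scores)
print(scores.mean())
```

Averaging over folds does not add data, but it gives a rough sense of how much the score varies from split to split.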

In [61]:
def process_string(text):
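    text = text.lower()
    # Collapse every run of digits into a single placeholder token
    text = re.sub('[0-9]+', 'NUM', text)
    # Pad each punctuation mark or symbol with spaces; the capture group and the
    # raw replacement string are both needed for \1 to mean "the matched character"
    text = re.sub(r'([^\sA-Za-z0-9À-ÖØ-öø-ÿЀ-ӿ/])', r' \1 ', text)
    # Collapse repeated whitespace
    text = ' '.join(text.split())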
    return text

In [62]:
t = process_string('Does this work? Hmmm,how about this???')
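
The cell above stores the result without displaying it. A minimal sketch of what this preprocessing is presumably meant to produce follows, assuming the intent is to lowercase the text, replace digit runs with `NUM`, and pad punctuation with spaces so each mark becomes its own token. `pad_punctuation` is a hypothetical stand-in, not part of the notebook. It wraps the character class in a capture group and uses a raw replacement string, so that `\1` refers to the matched character rather than to the control character `\x01`.

```python
import re

def pad_punctuation(text):
    """Hypothetical stand-in for process_string with a capture group."""
    text = text.lower()
    # Replace every run of digits with a placeholder token.
    text = re.sub(r'[0-9]+', 'NUM', text)
    # Surround each character outside the allowed set with spaces.
    text = re.sub(r'([^\sA-Za-z0-9À-ÖØ-öø-ÿЀ-ӿ/])', r' \1 ', text)
    # Collapse repeated whitespace.
    return ' '.join(text.split())

print(pad_punctuation('Does this work? Hmmm,how about this???'))
# does this work ? hmmm , how about this ? ? ?
print(pad_punctuation('I took 2 pills in 24 hours'))
# i took NUM pills in NUM hours
```

Since `TfidfVectorizer` with its default token pattern ignores single-character tokens, the padded punctuation mainly serves to separate words that were glued together, as in `Hmmm,how`.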

In [63]:
for _target_label in label_df['Label'].unique():

    _binarized_df = label_df.copy()
    _binarized_df['Label'] = label_df['Label'].apply(lambda x: binarize_label(x, _target_label))
    _positive_ids = _binarized_df[_binarized_df['Label'] == 1]['ID'].tolist()
    _binarized_df = _binarized_df[~((_binarized_df['ID'].isin(_positive_ids)) & (_binarized_df['Label'] == 0))]

    _binarized_df = _binarized_df.groupby('Label').sample(n=len(_binarized_df[_binarized_df['Label'] == 1]), random_state=1)

    if len(_binarized_df.index) > 50:

        _train_texts = _binarized_df['Text']
        _train_labels = _binarized_df['Label']

        _test_texts = test_df['text']

        _train_texts_processed = [process_string(t) for t in _train_texts]
        _test_texts_processed  = [process_string(t) for t in _test_texts]

        _vectorizer = TfidfVectorizer()
        _X_train = _vectorizer.fit_transform(_train_texts_processed)
        _X_test = _vectorizer.transform(_test_texts_processed)

        _model = LogisticRegression(C=10).fit(_X_train, _train_labels)
        _predictions = _model.predict(_X_test)

        print('---------------------------------')
        print(_target_label)
        print('---------------------------------')
        print()

        _positive_texts = [_text for _prediction, _text in zip(_predictions, _test_texts) if _prediction == 1]
        _negative_texts = [_text for _prediction, _text in zip(_predictions, _test_texts) if _prediction == 0]

        print('POSITIVE')
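        # Cap the sample size so labels with fewer than 10 predicted positives do not raise
        for _text in random.sample(_positive_texts, min(10, len(_positive_texts))):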
            print(' '.join(_text.split()))
        
        print()

        # test_df is unlabeled, so no classification report can be computed here;
        # _test_labels from the earlier split belongs to different rows


---------------------------------
META DISCUSSION
---------------------------------

POSITIVE
why are you tryna keep this baby at all?
I have an appointment to get an IUD early next week.
How the contraceptive pill changed Britain (BBC): Share With Friends: | | Health - Top Stories News, RSS Feeds... http://t.co/OgsJIjLB
Implanon all the way haha
Women reassured over safety of Essure birth control implant - http://t.co/xlDdvSGcmb
I started birth control pill yesterday hopefully that helps with migraines
There are other forms of birth control.
Constant mood swings, emotional states, crying at the tiniest things, and everything just seems to be too painful in my head.
No thanks, I have an IUD &amp; legal abortion.
Take one pill a day in order.

---------------------------------
SHARING PERSONAL EXPERIENCES
---------------------------------

POSITIVE
I cannot wait to have this gone.
they aren't heavier, the cramping isn't any different than without the IUD.
Now I'm bleeding off and on.
an