## More Broken Benchmarks

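This notebook looks for label errors in the [Google Emotions Data](https://arxiv.org/pdf/2005.00547.pdf) (GoEmotions), a collection of Reddit comments annotated with emotion labels.

The approach is similar in spirit to this [other project](https://labelerrors.com), which documents label errors in popular benchmark datasets.

Let's go!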

In [5]:
import pandas as pd 

from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import CountVectorizer

df = pd.read_csv("data/goemotions_1.csv")

In [6]:
df.columns

Index(['text', 'id', 'author', 'subreddit', 'link_id', 'parent_id',
       'created_utc', 'rater_id', 'example_very_unclear', 'admiration',
       'amusement', 'anger', 'annoyance', 'approval', 'caring', 'confusion',
       'curiosity', 'desire', 'disappointment', 'disapproval', 'disgust',
       'embarrassment', 'excitement', 'fear', 'gratitude', 'grief', 'joy',
       'love', 'nervousness', 'optimism', 'pride', 'realization', 'relief',
       'remorse', 'sadness', 'surprise', 'neutral'],
      dtype='object')

In [35]:
pd.set_option('display.max_colwidth', None)
label_of_interest = 'excitement'

df[['text', label_of_interest]].loc[lambda d: d[label_of_interest] == 1].sample(2)

Unnamed: 0,text,excitement
26961,"I'm sure there are people who really enjoy being parents though. Those people just aren't us, and aren't most parents.",1
43032,"Yeah I know, it’s just wishful thinking but I would be ecstatic if it came out today",1


In [38]:
(df
 [['text', label_of_interest, 'rater_id', 'example_very_unclear']]
 .loc[lambda d: d[label_of_interest] == 0]
 .sample(2))

Unnamed: 0,text,excitement,rater_id,example_very_unclear
12421,Possibly. I just couldn’t go any further with it. Not after that.,0,30,False
16675,bruh literally same,0,2,False


In [9]:
X, y = df['text'], df[label_of_interest]

pipe = make_pipeline(
    CountVectorizer(), 
    LogisticRegression(class_weight='balanced', max_iter=1000)
)

In [10]:
%%time 

pipe.fit(X, y)

CPU times: user 21 s, sys: 4.39 s, total: 25.4 s
Wall time: 3.2 s


Pipeline(steps=[('countvectorizer', CountVectorizer()),
                ('logisticregression',
                 LogisticRegression(class_weight='balanced', max_iter=1000))])

## Trick 1: Model Uncertainty

In [11]:
pipe.predict_proba(X)

array([[0.81905448, 0.18094552],
       [0.87338383, 0.12661617],
       [0.9988756 , 0.0011244 ],
       ...,
       [0.95766529, 0.04233471],
       [0.89403903, 0.10596097],
       [0.97988889, 0.02011111]])

In [39]:
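# predicted probability of class 0 (label absent) for every row;
# note that these are predictions on the training data itself
probas = pipe.predict_proba(X)[:, 0]

# in hindsight, keep the rows where the model is on the fence (proba close to 0.5)
(df
 .loc[(probas > 0.45) & (probas < 0.55)]
 [['text', label_of_interest, 'example_very_unclear', 'rater_id']]
 .head(7))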

Unnamed: 0,text,excitement,example_very_unclear,rater_id
8,that's adorable asf,0,False,73
46,"If there’s a pattern, yes.",0,False,37
107,My fans on patreon will be rewarded soon,0,False,33
154,"Ones with close ties to SA, anyway. An escaped apostate won't exactly be itching to run home.",0,False,3
158,I really like this ring so I’m glad to hear that.,0,False,16
262,OMG THOSE TINY SHOES! *desire to boop snoot intensifies*,0,True,61
362,This. I relate to this. So much. Almost too much.,0,False,55
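
The band filter above keeps the rows in their original order. A minimal sketch of a variation that also ranks the rows by how close they sit to 0.5 is shown below. The helper name `most_uncertain` and the toy probabilities are assumptions made for illustration; they are not part of the notebook.

```python
import numpy as np

def most_uncertain(probas, low=0.45, high=0.55):
    """Return positions inside the (low, high) band, most uncertain first.

    `probas` is a 1-D array of predicted probabilities for one class.
    """
    probas = np.asarray(probas, dtype=float)
    # positions whose probability falls strictly inside the band
    in_band = np.flatnonzero((probas > low) & (probas < high))
    # rank those positions by their distance to 0.5
    order = np.argsort(np.abs(probas[in_band] - 0.5), kind="stable")
    return in_band[order]

# hypothetical probabilities, only to show the behaviour
toy_probas = np.array([0.02, 0.49, 0.97, 0.53, 0.46, 0.60])
ranked = most_uncertain(toy_probas)
print(ranked)
```

With the notebook's objects this could be used as `df.iloc[most_uncertain(probas)]`, assuming `probas` is the array computed in the cell above.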


## Trick 2: Short on Confidence

Bonus: among the rows where the model disagrees with the label, sort by how little confidence the model has in the given label.

In [14]:
df.loc[lambda d: d[label_of_interest] != pipe.predict(X)].shape

(5315, 37)

In [15]:
def correct_class_confidence(X, y, mod):
    """
    Gives the predicted confidence (or proba) associated
    with the correct label `y` from a given model.
    """
    probas = mod.predict_proba(X)
    # map every class label to its column in `probas` once
    class_to_col = {label: j for j, label in enumerate(mod.classes_)}
    # iterate positionally: `y[i]` on a pandas Series is a label-based
    # lookup and breaks when the index is not 0..n-1
    return [proba[class_to_col[label]] for proba, label in zip(probas, y)]
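
The per-row lookup above can also be written without a Python loop. The sketch below is one possible vectorized variant, assuming the class labels are sorted (which is how scikit-learn stores `classes_`). The function name `correct_class_confidence_vectorized` and the toy arrays are made up for illustration.

```python
import numpy as np

def correct_class_confidence_vectorized(probas, y, classes):
    """Pick, for every row, the probability assigned to its given label.

    Assumes `classes` is sorted, as scikit-learn's `classes_` is.
    """
    probas = np.asarray(probas, dtype=float)
    y = np.asarray(y)              # drops any pandas index, so lookups are positional
    classes = np.asarray(classes)
    # column of each label; clip so unknown labels fail the check below
    cols = np.clip(np.searchsorted(classes, y), 0, len(classes) - 1)
    if not np.array_equal(classes[cols], y):
        raise ValueError("`y` contains labels that are not in `classes`")
    return probas[np.arange(len(y)), cols]

# hypothetical probabilities and labels, only to show the behaviour
toy_probas = [[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
toy_confidence = correct_class_confidence_vectorized(toy_probas, [0, 1, 1], [0, 1])
print(toy_confidence)
```

With the notebook's objects the call would look like `correct_class_confidence_vectorized(pipe.predict_proba(X), y, pipe.classes_)`.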

In [17]:
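(df
 .assign(confidence=correct_class_confidence(X, y, pipe))
 # keep only the rows where the prediction disagrees with the given label
 .loc[lambda d: pipe.predict(d['text']) != d[label_of_interest]]
 [['text', label_of_interest, 'confidence', 'example_very_unclear']]
 .sort_values("confidence")
 # focus on rows labelled 0 that the model thinks should be 1
 .loc[lambda d: d[label_of_interest] == 0]
 .head(20))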

Unnamed: 0,text,excitement,confidence,example_very_unclear
5676,I am inexplicably excited by [NAME]. I get so excited by how he curls passes,0,0.000148,False
42757,Omg this is so amazing ! Keep up the awesome work and have a fantastic New Year !,0,0.000187,False
28707,Omg this is so amazing ! Keep up the awesome work and have a fantastic New Year !,0,0.000187,False
24756,Sounds like a fun game. Our home game around here is .05/.10. Its fun but not very exciting.,0,0.000262,False
44459,So no replays for arsenal penalty calls.. Cool cool cool cool cool cool cool cool,0,0.000595,False
69395,"Wow, your posting history is a real... interesting ride.",0,0.000719,False
20823,"Wow, your posting history is a real... interesting ride.",0,0.000719,False
2001,No different than people making a big deal about their team winning the super bowl. People find it interesting.,0,0.00074,False
30921,"Hey congrats!! That's amazing, you've done such amazing progress! Hope you have a great day :)",0,0.000813,False
39475,"I just read your list and now I can't wait, either!! Hurry up with the happy, relieved and peaceful onward and upward!! Congratulations😎",0,0.001129,False


This feels a bit awkward: many of these texts read as excited, yet they are labelled otherwise. Luckily, there are also folks who have labelled the first example correctly. But still ...

In [40]:
df.loc[lambda d: d['id'] == 'eekgi19'][[label_of_interest, 'text', 'rater_id']]

Unnamed: 0,excitement,text,rater_id
5676,0,I am inexplicably excited by [NAME]. I get so excited by how he curls passes,37
52557,1,I am inexplicably excited by [NAME]. I get so excited by how he curls passes,41
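
The lookup above checks a single `id`. A rough sketch of how the same check could be run for every example at once is given below: it collects the labels given to each `id` and reports the ids where raters disagree. The function name and the second id (`abc123`) are hypothetical; only `eekgi19` comes from the data shown above.

```python
from collections import defaultdict

def ids_with_rater_disagreement(rows):
    """Return the ids that received more than one distinct label.

    `rows` is an iterable of (example_id, label) pairs, one per rater.
    """
    labels_by_id = defaultdict(set)
    for example_id, label in rows:
        labels_by_id[example_id].add(label)
    return sorted(i for i, labels in labels_by_id.items() if len(labels) > 1)

# one real id from the output above plus a made-up id on which raters agree
toy_rows = [("eekgi19", 0), ("eekgi19", 1), ("abc123", 0), ("abc123", 0)]
disputed = ids_with_rater_disagreement(toy_rows)
print(disputed)
```

In pandas, a comparable result can be obtained with `df.groupby('id')[label_of_interest].nunique()` and keeping the ids where the count is larger than one.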


## Trick 2.5: Effect of Model 

In [22]:
from sklearn.pipeline import make_union
from whatlies.language import BytePairLanguage

pipe_emb = make_pipeline(
    make_union(
        BytePairLanguage("en", vs=1_000), 
        BytePairLanguage("en", vs=100_000)
    ),
    LogisticRegression(class_weight='balanced', max_iter=1000)
)

In [23]:
%%time 
pipe_emb.fit(X.to_list(), y)

CPU times: user 49.3 s, sys: 2.05 s, total: 51.4 s
Wall time: 20.9 s


Pipeline(steps=[('featureunion',
                 FeatureUnion(transformer_list=[('bytepairlanguage-1',
                                                 BytePairLanguage(lang='en',
                                                                  vs=1000)),
                                                ('bytepairlanguage-2',
                                                 BytePairLanguage(lang='en',
                                                                  vs=100000))])),
                ('logisticregression',
                 LogisticRegression(class_weight='balanced', max_iter=1000))])

In [27]:
%%time 

(df
 .assign(confidence=correct_class_confidence(X.to_list(), y, pipe_emb))
 # filter with the embedding model, the same model that gives the confidence
 .loc[lambda d: pipe_emb.predict(d['text'].to_list()) != d[label_of_interest]]
 [['text', label_of_interest, 'confidence', 'id', 'rater_id']]
 .sort_values("confidence")
 .loc[lambda d: d[label_of_interest] == 0]
 .head(20))

CPU times: user 16.9 s, sys: 342 ms, total: 17.2 s
Wall time: 16.7 s


Unnamed: 0,text,excitement,confidence,id,rater_id
60381,WOW!!!,0,0.000207,eewtp9f,63
46854,Happy birthday!,0,0.000333,edtaxhn,15
66821,Happy birthday!,0,0.000333,edpzdjo,61
61637,Happy Birthday!,0,0.000333,eez8mcs,30
42584,Happy Birthday!,0,0.000333,edxnohm,79
35491,Happy one week anniversary,0,0.000458,efe7gef,60
44679,Happy Birthday!!!,0,0.000556,eed81k3,30
30302,Happy Birthday!!!,0,0.000556,eed81k3,73
52095,Enjoy the ride!,0,0.00097,eecwmbq,60
3545,Very interesting!!!,0,0.001007,edojuxv,66


From here, another interesting thing you can do is pit models against each other. When two models disagree, there may also be an opportunity for re-labelling.
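
A minimal sketch of that idea is shown below. It splits the rows into two groups: rows where the two models disagree with each other, and rows where both models agree with each other but not with the given label. The function name and the toy arrays are assumptions made for illustration.

```python
import numpy as np

def disagreement_candidates(y, pred_a, pred_b):
    """Return two arrays of positions worth a second look.

    The first holds rows where the two models disagree with each other,
    the second holds rows where both models agree but contradict `y`.
    """
    y, pred_a, pred_b = (np.asarray(v) for v in (y, pred_a, pred_b))
    models_disagree = np.flatnonzero(pred_a != pred_b)
    both_reject_label = np.flatnonzero((pred_a == pred_b) & (pred_a != y))
    return models_disagree, both_reject_label

# hypothetical labels and predictions, only to show the behaviour
toy_y      = [0, 1, 0, 1, 0]
toy_pred_a = [0, 1, 1, 0, 1]
toy_pred_b = [0, 0, 1, 1, 1]
models_disagree, both_reject_label = disagreement_candidates(toy_y, toy_pred_a, toy_pred_b)
print(models_disagree, both_reject_label)
```

In this notebook the two prediction arrays would be `pipe.predict(X)` and `pipe_emb.predict(X.to_list())`, and the resulting positions can be passed to `df.iloc[...]` for inspection.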

## Trick 3: Cleanlab Noise Indices

In [25]:
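# `cleanlab.pruning.get_noise_indices` is the cleanlab 1.x API;
# cleanlab 2.x replaces it with `cleanlab.filter.find_label_issues`
from cleanlab.pruning import get_noise_indices

ordered_label_errors = get_noise_indices(
    s=y,                                   # the given (possibly noisy) labels
    psx=pipe.predict_proba(X),             # predicted probabilities per class
    sorted_index_method='prob_given_label',
)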

In [26]:
df.iloc[ordered_label_errors][['text', label_of_interest]].head(20)

Unnamed: 0,text,excitement
5676,I am inexplicably excited by [NAME]. I get so excited by how he curls passes,0
28707,Omg this is so amazing ! Keep up the awesome work and have a fantastic New Year !,0
42757,Omg this is so amazing ! Keep up the awesome work and have a fantastic New Year !,0
24756,Sounds like a fun game. Our home game around here is .05/.10. Its fun but not very exciting.,0
44459,So no replays for arsenal penalty calls.. Cool cool cool cool cool cool cool cool,0
20823,"Wow, your posting history is a real... interesting ride.",0
69395,"Wow, your posting history is a real... interesting ride.",0
2001,No different than people making a big deal about their team winning the super bowl. People find it interesting.,0
30921,"Hey congrats!! That's amazing, you've done such amazing progress! Hope you have a great day :)",0
39475,"I just read your list and now I can't wait, either!! Hurry up with the happy, relieved and peaceful onward and upward!! Congratulations😎",0
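
To give a feel for what happens behind such a call, here is a heavily simplified sketch of the thresholding idea from confident learning. It is not cleanlab's implementation and will not reproduce its output exactly. It assumes integer labels `0..K-1` with at least one example per class; the function name and the toy data are made up for illustration.

```python
import numpy as np

def confident_label_issues(labels, pred_probs):
    """Simplified confident-learning style filter (illustration only).

    A row is flagged when its most probable "confident" class differs
    from the given label. Flagged rows are returned with the lowest
    probability for the given label first.
    """
    labels = np.asarray(labels)
    pred_probs = np.asarray(pred_probs, dtype=float)
    n_classes = pred_probs.shape[1]
    # per-class threshold: mean predicted probability of class k
    # among the rows that were labelled k
    thresholds = np.array(
        [pred_probs[labels == k, k].mean() for k in range(n_classes)]
    )
    above = pred_probs >= thresholds
    # most probable class among those above their threshold, -1 if none
    masked = np.where(above, pred_probs, -np.inf)
    confident_class = np.where(above.any(axis=1), masked.argmax(axis=1), -1)
    is_issue = (confident_class != -1) & (confident_class != labels)
    # probability that the model assigns to the given label
    self_confidence = pred_probs[np.arange(len(labels)), labels]
    issue_idx = np.flatnonzero(is_issue)
    return issue_idx[np.argsort(self_confidence[issue_idx], kind="stable")]

# hypothetical labels and probabilities: row 2 is labelled 0 but looks like 1
toy_labels = [0, 0, 0, 1, 1, 1]
toy_pred_probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9],
                  [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
issues = confident_label_issues(toy_labels, toy_pred_probs)
print(issues)
```

The per-class thresholds are the main difference from Trick 2: a row is only flagged when the model is more confident in another class than it typically is for rows that carry that class as their label.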
