## The Top-K approach

In [1]:
import sys
import warnings
from pathlib import Path

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Make the ml_editor package in the parent directory importable
sys.path.append("..")
warnings.filterwarnings("ignore")

from ml_editor.data_processing import (
    format_raw_df,
    get_split_by_author,
    add_text_features_to_df,
    get_vectorized_series,
    get_feature_vector_and_label,
)
from ml_editor.model_evaluation import get_top_k

# Load the raw writers dataset and format it
data_path = Path("../data/writers.csv")
df = pd.read_csv(data_path)
df = format_raw_df(df.copy())

In [2]:
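# Keep only questions, add the handcrafted text features,
# then hold out 20% of the data for testing, split by author
df = add_text_features_to_df(df.loc[df["is_question"]].copy())
train_df, test_df = get_split_by_author(df, test_size=0.2, random_state=42)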
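As its name suggests, `get_split_by_author` presumably keeps all posts from a given author on the same side of the split, so the model cannot score well on the test set just by recognizing an author's writing. Its implementation is not shown here. The following is a minimal sketch of that idea with NumPy and pandas, assuming a hypothetical author column named `OwnerUserId`; `split_by_author_sketch` is an illustrative name, not part of `ml_editor`.

```python
import numpy as np
import pandas as pd


def split_by_author_sketch(df, author_col="OwnerUserId", test_size=0.2, random_state=42):
    """Hypothetical author-grouped split: each author lands entirely in train or in test."""
    rng = np.random.default_rng(random_state)
    # Shuffle the distinct authors, not the rows
    authors = np.array(df[author_col].unique())
    rng.shuffle(authors)
    # Send roughly test_size of the *authors* to the test set
    n_test = int(round(len(authors) * test_size))
    test_authors = authors[:n_test]
    is_test = df[author_col].isin(test_authors)
    return df[~is_test], df[is_test]


# Toy data: 10 posts written by 7 distinct authors (OwnerUserId is an assumed column name)
posts = pd.DataFrame({
    "OwnerUserId": [1, 1, 2, 3, 3, 3, 4, 5, 6, 7],
    "Title": [f"question {i}" for i in range(10)],
})
train, test = split_by_author_sketch(posts)
print(len(train), len(test))
```

Because `test_size` applies to authors rather than rows, the share of test rows can drift away from 20% when some authors are much more prolific than others.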

In [3]:
# Load the previously trained classifier and its text vectorizer
model_path = Path("../models/model_1.pkl")
clf = joblib.load(model_path)
vectorizer_path = Path("../models/vectorizer_1.pkl")
vectorizer = joblib.load(vectorizer_path)

In [4]:
# Vectorize the full text of each question with the loaded vectorizer
train_df["vectors"] = get_vectorized_series(train_df["full_text"].copy(), vectorizer)
test_df["vectors"] = get_vectorized_series(test_df["full_text"].copy(), vectorizer)

# Handcrafted features used alongside the text vectors
features = [
    "action_verb_full",
    "question_mark_full",
    "text_len",
    "language_question",
]

X_train, y_train = get_feature_vector_and_label(train_df, features)
X_test, y_test = get_feature_vector_and_label(test_df, features)
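`get_feature_vector_and_label` turns a DataFrame into a model input matrix and a label vector. Its body is not shown in this notebook. One plausible sketch, assuming each entry of `vectors` is a dense 1-D NumPy array and using a hypothetical boolean `label` column, appends the handcrafted features as extra columns after the text vector; `feature_vector_and_label_sketch` is an illustrative name only.

```python
import numpy as np
import pandas as pd


def feature_vector_and_label_sketch(df, feature_names, label_col="label"):
    """Hypothetical: stack per-row text vectors, then append handcrafted features."""
    # One row per question: the vectorizer output for its text (assumed dense here)
    text_matrix = np.vstack(list(df["vectors"]))
    # Booleans and lengths cast to float so every column shares one dtype
    extra = df[feature_names].astype(float).to_numpy()
    X = np.hstack([text_matrix, extra])
    # label_col is an assumed column name, not taken from ml_editor
    y = df[label_col].to_numpy()
    return X, y


toy = pd.DataFrame({
    "vectors": [np.array([0.1, 0.0, 0.3]), np.array([0.0, 0.2, 0.0])],
    "question_mark_full": [True, False],
    "text_len": [120, 45],
    "label": [True, False],
})
X, y = feature_vector_and_label_sketch(toy, ["question_mark_full", "text_len"])
print(X.shape)  # (2, 5)
```

If the vectorizer returns sparse matrices instead, `scipy.sparse.vstack` and `scipy.sparse.hstack` would play the same role without densifying the text features.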

In [5]:
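# Attach the predicted probability of the positive class and the true label to each test question
test_analysis_df = test_df.copy()
y_predicted_proba = clf.predict_proba(X_test)
test_analysis_df["predicted_proba"] = y_predicted_proba[:, 1]
test_analysis_df["true_label"] = y_test

# Columns to show when inspecting examples
to_display = [
    "predicted_proba",
    "true_label",
    "Title",
    "body_text",
    "text_len",
    "action_verb_full",
    "question_mark_full",
    "language_question",
]

threshold = 0.5

# Retrieve k=2 examples for each of the five categories
top_pos, top_neg, worst_pos, worst_neg, unsure = get_top_k(
    test_analysis_df, "predicted_proba", "true_label", k=2
)
# Show up to 500 characters per cell so question bodies stay readable
pd.options.display.max_colwidth = 500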
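The five DataFrames returned by `get_top_k` hold, as the outputs below show, the most confident correct positives (`top_pos`) and negatives (`top_neg`), the most confident mistakes on each class (`worst_pos`, `worst_neg`), and the examples scored closest to the decision boundary (`unsure`). The library's own code is not reproduced here; this is a minimal pandas sketch of the same idea, assuming boolean labels and a decision threshold of 0.5. `top_k_sketch` is an illustrative name.

```python
import pandas as pd


def top_k_sketch(df, proba_col, label_col, k=2, threshold=0.5):
    """Return k confident hits and k confident misses per class, plus k least certain rows."""
    predicted = df[proba_col] > threshold
    correct = df[predicted == df[label_col]]
    incorrect = df[predicted != df[label_col]]

    # Confident and right: highest scores among positives, lowest among negatives
    top_pos = correct[correct[label_col]].nlargest(k, proba_col)
    top_neg = correct[~correct[label_col]].nsmallest(k, proba_col)
    # Confident and wrong: positives scored lowest, negatives scored highest
    worst_pos = incorrect[incorrect[label_col]].nsmallest(k, proba_col)
    worst_neg = incorrect[~incorrect[label_col]].nlargest(k, proba_col)
    # Least certain: scores closest to the threshold, whatever the label
    distance = (df[proba_col] - threshold).abs()
    unsure = df.loc[distance.nsmallest(k).index]
    return top_pos, top_neg, worst_pos, worst_neg, unsure


preds = pd.DataFrame({
    "predicted_proba": [0.95, 0.80, 0.10, 0.05, 0.30, 0.70, 0.52, 0.49],
    "true_label": [True, True, False, False, True, False, True, False],
})
top_pos, top_neg, worst_pos, worst_neg, unsure = top_k_sketch(
    preds, "predicted_proba", "true_label", k=2
)
print(list(unsure.index))
```

Ranking by distance to the threshold, rather than filtering a fixed band such as 0.4 to 0.6, always yields `k` unsure examples, even for a model that rarely hesitates.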

In [6]:
top_pos[to_display]

| Id | predicted_proba | true_label | Title | body_text | text_len | action_verb_full | question_mark_full | language_question |
|---|---|---|---|---|---|---|---|---|
| 1614 | 0.74 | True | How do evil protagonists win the reader over in dark fantasy stories? | I thought about writing fantasy story from the perspective of the evil antagonist (think from Sauron's perspective). So my bad guy will be the protagonist and my good guys will be the antagonist. I want my main protagonist to be in general a bad guy. I don't like the idea of using the "but he thinks he is a good guy". I want my character to be petty, shallow, and selfish. \nI am interested in this in large part because it has not been done. \nSimilar to this would be a fallen angel story. Th... | 797 | True | True | False |
| 37424 | 0.74 | True | Is it important to describe every character of the storyline? | Note: I am not sure of which site this question belongs to: Writing SE or Literature SE. If this question is unfit for this site, I will happily migrate it there.\nSo I am writing about the protagonist's school life adventures. Naturally there are lot of characters - classmates, school teachers, friends, etc. and by and large more than 50 characters has some or other appearance in the story. By appearance, I don't mean about their existence for one or two line. Each one has some reasonable a... | 1586 | True | True | False |


In [7]:
top_neg[to_display]

| Id | predicted_proba | true_label | Title | body_text | text_len | action_verb_full | question_mark_full | language_question |
|---|---|---|---|---|---|---|---|---|
| 16095 | 0.09 | False | Citation anniversary editions | How do I do MLA citations for an anniversary edition, e.g. 30th anniversary Selfish Gene. Do I treat it as a normal edition or just ignore that it is a new edition?\n | 195 | False | True | False |
| 35077 | 0.12 | False | Do I violate copyright when I write about historical figures? | I want to write a book about historical figures. Do I violate copyright when I do so?\n | 148 | False | True | False |


In [8]:
worst_pos[to_display]

| Id | predicted_proba | true_label | Title | body_text | text_len | action_verb_full | question_mark_full | language_question |
|---|---|---|---|---|---|---|---|---|
| 35367 | 0.07 | True | How to relate stormy weather to sadness? | Is it possible that stormy weather can be related to sadness? The weather could be related to gloom?\n | 142 | True | True | False |
| 17543 | 0.13 | True | Using Pronoun 'It' repetitvely for emphasis? | I'd like to know if using "It" repetitively (for emphasis) in this context is okay grammatically.\n\nTV has become the modern day baby sitter. It is raising our children. It is dictating the cultural narrative and shaping future society. It is raising the bored inattentive child. It is raising the consumer child. It is raising the aggressive child. It is raising the obese child. It is raising the misinformed and complacent child. It is raising the disenchanted child. And what’s more, it... | 591 | False | True | False |


In [9]:
worst_neg[to_display]

| Id | predicted_proba | true_label | Title | body_text | text_len | action_verb_full | question_mark_full | language_question |
|---|---|---|---|---|---|---|---|---|
| 30420 | 0.75 | False | Should I make my prologue chapter 1? | My prologue is set 17 years before the main story arc. I am reflecting on the discussion here, which was asked by another SE contributor. I'm trying to decide what to do with my prologue. Building a website for my world with minor character sketches, short stories, mythologies, etc and additional supplemental is one possibility. It could go there. Or,\n\nI can delete it entirely, and put any necessary points into the rest of the book. \nI can leave it as the prologue, since that is my first ... | 1658 | True | True | False |
| 30590 | 0.72 | False | Usefulness of writing conferences and realistic expectations of obtaining an agent | I completed my first fiction novel a short while ago (heist/romance). I've been trying to get a literary agent by viewing their websites and following their submission requirements. However, it hasn't been working. I was thinking of attending a few writing conferences where you can meet with literary agents/publishers and pitch your book, but they span a few days and are expensive, not to mention the cost of room and board. \nAre these conferences worth it in finding an agent/publisher? \nOr... | 1135 | True | True | False |


In [10]:
unsure[to_display]

| Id | predicted_proba | true_label | Title | body_text | text_len | action_verb_full | question_mark_full | language_question |
|---|---|---|---|---|---|---|---|---|
| 6296 | 0.5 | False | How do I start a business-presentation? | How do you welcome a business audience to your presentation?\nI've thought of:\nHello, my name is [prename] [surname] and I am going to talk about [...]\n\nbut it sounds rather like a school presentation and not like a business presentation. Or isn't there a difference in the English language?\nDoes it make a difference if there are only men or if it is a mixed audience?\n(I don't have any experience with business presentations and English is a foreign language for me. If you have any more h... | 736 | False | True | False |
| 20272 | 0.5 | False | How do I structure an essay into a thesis statement and three points in three paragraphs? -this is not a school homework | I'm writing an essay but I struggle with thesis statements and paragraphs. My teacher wants my thesis to be significant, specific, single, and supportable. I often know what I want to write, but struggle with writing about 3 points with 3 different paragraphs. I often see just one point broad point and try to make an essay based off of that. How do I fit one broad point into the structure of three specific points and an overall thesis?\nHere's an example of an essay in development that I am... | 3407 | True | True | False |
