<a href="https://colab.research.google.com/github/sudhang/css-nlp/blob/master/Guessing_Game.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Make it pretty
from IPython.display import HTML, display

def set_css():
  display(HTML('''
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  '''))
get_ipython().events.register('pre_run_cell', set_css)

# PARAMS

In [2]:
GITREPOPATH = "https://raw.githubusercontent.com/sudhang/css-nlp/master"

# Imports

In [3]:
import pandas as pd
import nltk
from nltk.translate.bleu_score import sentence_bleu
from nltk.translate.bleu_score import SmoothingFunction
from nltk.translate.meteor_score import meteor_score
import ipywidgets as widgets
import random
from IPython.display import display
from IPython.display import display, clear_output
from nltk.tokenize import sent_tokenize
nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [5]:
df_llama_nyt = pd.read_csv(f'{GITREPOPATH}/generated/llama2qlora_nyt.csv')
df_gptneo_nyt = pd.read_csv(f'{GITREPOPATH}/generated/gptneo_nyt.csv')
df_ngrams_nyt = pd.read_csv(f'{GITREPOPATH}/generated/ngram_nyt_6.csv')
df_falcon1_nyt = pd.read_csv(f'{GITREPOPATH}/generated/falconqlora_nyt_1.csv')
df_falcon2_nyt = pd.read_csv(f'{GITREPOPATH}/generated/falconqlora_nyt_2.csv')
df_falcon_nyt = pd.concat([df_falcon1_nyt, df_falcon2_nyt], axis=0, ignore_index=True)

## Combine Generated Data

In [8]:
# Add a 'Model' column to each DataFrame
df_ngrams_nyt['Model'] = 'n-grams'
df_llama_nyt['Model'] = 'LLaMa2'
df_gptneo_nyt['Model'] = 'GPT-NEO'
df_falcon_nyt['Model'] = "Falcon"

# ngrams has 20 articles.  we need only 10, because all the others do too
df_ngrams_nyt = df_ngrams_nyt.head(10)


# Concatenate the DataFrames
df_all = pd.concat([df_ngrams_nyt, df_llama_nyt, df_gptneo_nyt, df_falcon_nyt])

# Shuffle the DataFrame
df_all = df_all.sample(frac=1).reset_index(drop=True)

# Human Evaluation
To do a human evaluation, we have implemented a "Guessing Game": The evaluator is presented with both a real and a fake article.

In our models, we take the first two sentences from a real article, and then use it as seed texts (prompts) for the language model.  We present the real article and the article generated from this prompt side by side, and ask the user to choose which one is real and which is generated.  The user must make an active choice which is which, because the default value is "blank".

Another point to consider is that both the real and the generated articles are truncated to 25 sentences each.  This is because the generated articles are only made to be generated up to 51 or so sentences.  This would make "abrupt cutoffs" an easy way to determine that it is generated.  This, of course, is a limitation of our evaluation, as it does in fact hide a major failing of the generated articles!

Note that the evaluator is not given any feedback about whether his/her guess was correct or not.  This is done in order to prevent him/her from being influenced by their game playing.

In [9]:
from google.colab import files

# Initialize a counter for the current row
current_row = 0

evaluator_name = None

# Function to apply CSS
def set_css():
    display(HTML('''
    <style>
        pre {
            white-space: pre-wrap;
        }
    </style>
    '''))

# Function to display a row from the DataFrame
def guessing_game(df, row_num, num_sentences=25):
    global current_row
    global evaluator_name
    current_row = row_num

    # Clear previous output
    clear_output(wait=True)

    display(f"Turn {current_row} of {len(df)}")

    # Apply CSS for prettification
    set_css()

    # Create the evaluator_name widget
    if evaluator_name is None:
      evaluator_name = widgets.Text(value='', description='Evaluator Name:', placeholder='Enter your name')
      display(evaluator_name)
    else:
      display(evaluator_name.value)

    row = df.iloc[row_num]

    # Get the first num_sentences sentences of the original and generated articles
    original = ' '.join(nltk.sent_tokenize(row['Original Article'])[:num_sentences])
    generated = ' '.join(nltk.sent_tokenize(row['Generated Article'])[:num_sentences])

    # Randomly decide whether to show the original or generated article first
    if random.choice([True, False]):
        text1, text2 = original, generated
        correct_answer = 'Original, Generated'
    else:
        text1, text2 = generated, original
        correct_answer = 'Generated, Original'

    # Create and display the widgets
    print("Text 1:")
    print(text1)
    print("\nText 2:")
    print(text2)

    dropdown = widgets.Dropdown(options=['', 'Original, Generated', 'Generated, Original'], description='Order:')
    notes = widgets.Textarea(value='', description='Notes:', placeholder='Enter your notes')
    next_button = widgets.Button(description='Next Article Pair')

    display(dropdown)
    display(notes)
    display(next_button)

    def on_next_button_clicked(b):
      if dropdown.value == '':
          df.loc[current_row, f'EVALUATOR_{evaluator_name.value}_GUESS'] = 'SKIPPED'
      else:
          guess_correct = (dropdown.value == correct_answer)
          df.loc[current_row, f'EVALUATOR_{evaluator_name.value}_GUESS'] = guess_correct
      df.loc[current_row, f'EVALUATOR_{evaluator_name.value}_NOTES'] = notes.value
      if current_row < len(df) - 1:
          guessing_game(df, current_row + 1, num_sentences)
      else:
          print("You've reached the end of the DataFrame!")
          filename = f'EVALUATOR_{evaluator_name.value}.csv'
          df.to_csv(filename)
          files.download(filename)


    next_button.on_click(on_next_button_clicked)

# Call the function to display the first row
guessing_game(df_all, 0, num_sentences=10)

'Turn 0 of 40'

Text(value='', description='Evaluator Name:', placeholder='Enter your name')

Text 1:
Without a trace of bitterness, the billionaire Stephen A. Wynn once described his divorce from Elaine P. Wynn as the most expensive in American history. They evenly split his stake in the casino-resort company they co-founded and seemed to remain compatible leaders of some of the most profitable hotel-casinos in the world.. But Ms. Wynn’s relationship with Mr. Wynn was rocky for years, and he eventually left her for another woman — one who would soon be known as “Mrs. Steve.” She filed for divorce last year after months of allegations that she had been unfaithful to him by sleeping with other men, including two bodyguards hired by Donald J. Trump, the Republican presidential nominee. The couple married on Sept. 19, 2000, at St. John’s Roman Catholic Church in Las Vegas, where Mr. Wynn is buried. Their marriage has never been legally recognized, but it remains an enduring symbol of their union, which lasted more than seven decades. It also became a subject of intense speculation

Dropdown(description='Order:', options=('', 'Original, Generated', 'Generated, Original'), value='')

Textarea(value='', description='Notes:', placeholder='Enter your notes')

Button(description='Next Article Pair', style=ButtonStyle())