### This notebook selects a model for summarizing product reviews

In [278]:
# !pip install -U git+https://github.com/huggingface/transformers.git

In [29]:
from transformers import pipeline
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

#### Example reviews to test summarization performance

In [30]:
review1 = "My wife and I bought this keyboard for our new HTPC and were very excited about the size. The size and the feel of the keyboard are great. It feels very solid in the hands and works automatically upon connecting the USB dongle. After using it for a few moments, we soon realized the horrible mouse function of the keyboard. We have a 52 inch flatscreen tv which is about 10 feet away from our couch. We constantly had to recenter the mouse pointer because we would be pointing the device far off screen and the cursor would only move a few inches. We changed all the sensitivities going up and down to no avail. After using the mouse for a while, my hands would be bent at weird angles just to make it across the screen. Eventually the mouse began to move by itself to the left which made things even harder. It was a struggle to enjoy something that we hoped would be alot of fun. While the keyboard functions worked well, the horrible mouse function more than overshadows it."
review2 = "I ordered this as a keyboard/mouse for my HTPC. I liked the size of the product and the listed features. A gyroscopic mouse sounded intuitive and convenient. Unfortunately, it was not. The keyboard worked well eventually. I had to plug a regular keyboard in to get the PC to boot, but after the first boot, the full keyboard was never needed. Typing on the keyboard was easy enough--a good thumbboard with nice media keys for windows. The mouse function, on the other hand, was terrible. The gyroscope functioned poorly and wouldn't center the mouse pointer when the mouse was centered. The device was advertised as not needing any software calibration, but I'm betting the performance would have benefited from some sort of calibration. Without calibration, there was no way to recenter the pointer, so you had to twist your wrists awkwardly to try to get the mouse from one edge of the screen to the other. The device also suffered from poor RF performance, suggesting that it wasn't really RF at all, but rather IF connection. This means that I'd have to have line-of-sight to the receiver by sticking the receiver into a front usb port, rather than hiding the receiver in back. That's a poor option for an HTPC remote. For $89, this product is not worth it. I may have been able to accept the below-average performance if this thing was $29 or less. Do yourself a favor and get something else. Something with a wireless connection, like a wireless keyboard, should offer better performance. I returned this thing and went with a $30 Lenovo multimedia remote, on sale."
review3 = """
Very nice little keyboard. For some reason I was expecting it to be a bit bigger, but not disappointed. It fits in the hands nicely and has a good feel. The virtual mouse movements are pretty cool as well. Not perfect, but not bad. There may be some tweaks I need to apply, but out-of-the-box settings for mouse movement I'd say are pretty usable. Keep in mind this is best suited for sitting back on the couch and \\"consuming content\\" by point-and-click and limited text. You could use it to respond to a short email, but don't expect to write a book with it. Pros: Good construction, nice feel in hands, good \\"click\\" to key touch, great mouse buttons. Cons: Keeping in context with how it should be used, none. I thought $80 was a little steep when compared to other components (I only paid $60 for an ASUS motherboard), but I guess that's about what you'd expect for other remotes, such as Logitech, etc.
"""
review4 = """i just got this in the mail today and had to write a review. I have my laptop connected to my tv like most people viewing this item. I first purchased the Pro mini, barely got 5ft and the touchpad is way to small. Naturally I was relunctant to buy this item and loose more money. Cideko is so simple I used it straight out the box without reading the instructions. Right now I am writting this review with the Cideko from my couch (about 12ft away). 
The best way I can describe the functionality is to think of a Wii remote, mixed with a PSP, with standard keyboard buttons. 
I am extremely happy with my purchase and highly reccommend!
"""

#### Five candidate models

In [43]:
tokenizer_cnn = AutoTokenizer.from_pretrained('facebook/bart-large-cnn')
model_cnn = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-cnn')

tokenizer_cnn_12_6 = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")
model_cnn_12_6 = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-cnn-12-6")

tokenizer_xsum = AutoTokenizer.from_pretrained('facebook/bart-large-xsum')
model_xsum = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-xsum')

tokenizer_xsum_12_6 = AutoTokenizer.from_pretrained("sshleifer/distilbart-xsum-12-6")
model_xsum_12_6 = AutoModelForSeq2SeqLM.from_pretrained("sshleifer/distilbart-xsum-12-6")

tokenizer_reddit = AutoTokenizer.from_pretrained("google/pegasus-reddit_tifu")
model_reddit = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-reddit_tifu")

In [44]:
model_dictionary = {'CNN - base': (tokenizer_cnn, model_cnn), 'CNN - distilbart_12_6': (tokenizer_cnn_12_6, model_cnn_12_6),
                   'BBC - base': (tokenizer_xsum, model_xsum), 'BBC - distilbart_12_6': (tokenizer_xsum_12_6, model_xsum_12_6),
                   'Reddit - Pegasus': (tokenizer_reddit, model_reddit)}

In [45]:
def print_results(text, max_length=200, num_beams=4):
    """Print the summary that each candidate model generates for `text`."""
    print(f"Original Text:\n{text}\n")

    for model_name, (tokenizer, model) in model_dictionary.items():
        # Truncate to the model's maximum input length so long reviews don't overflow it.
        tokenized_text = tokenizer.encode(text, return_tensors='pt', truncation=True)
        summary_ids = model.generate(tokenized_text,
                                     num_beams=num_beams,
                                     no_repeat_ngram_size=2,  # never repeat a bigram
                                     min_length=30,
                                     max_length=max_length,
                                     early_stopping=True)
        summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
        print(f"\n{model_name}:\n{summary}")
    print("===============================================================================================================================")

#### Print the summaries of a few example reviews to compare the models
* The CNN - BART models behave more like extractive summarizers, i.e. they copy a few sentences from the review rather than rephrasing it as an abstractive summarizer would.
* The BBC - BART models' summaries are mostly short and contain less important statements.
* The Reddit - Pegasus model seems to summarize best, probably because the tone of Reddit posts is similar to the tone of Amazon reviews, so it generates words that are more relevant to the context.
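
One rough way to put a number on the "extractive vs. abstractive" observation is to measure how many of a summary's n-grams are copied verbatim from the review. The sketch below is only an illustration: the function names (`ngrams`, `copied_ngram_fraction`) and the toy strings are hypothetical and are not model outputs from this notebook, and whitespace tokenization is a simplifying assumption.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a list of tokens."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def copied_ngram_fraction(source, summary, n=3):
    """Fraction of the summary's n-grams that also occur in the source.

    Values near 1.0 suggest extractive behaviour (copying);
    values near 0.0 suggest abstractive behaviour (rephrasing).
    """
    source_tokens = source.lower().split()
    summary_tokens = summary.lower().split()
    summary_ngrams = ngrams(summary_tokens, n)
    if not summary_ngrams:
        # Summary is shorter than n tokens, so there is nothing to compare.
        return 0.0
    source_ngrams = ngrams(source_tokens, n)
    return len(summary_ngrams & source_ngrams) / len(summary_ngrams)


# Hypothetical toy inputs (assumptions for illustration, not real summaries).
toy_source = "the keyboard works well but the mouse function is horrible"
toy_copy = "the mouse function is horrible"
toy_rephrase = "great keys, awful pointer"

print(copied_ngram_fraction(toy_source, toy_copy))
print(copied_ngram_fraction(toy_source, toy_rephrase))
```

Applied to the real summaries, a consistently higher fraction for the CNN models than for the Reddit model would support the impression in the list above, although a metric this simple should be read as a hint rather than a verdict.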

In [46]:
for text in [review1, review2, review3, review4]:
    print_results(text)

Original Text:
My wife and I bought this keyboard for our new HTPC and were very excited about the size. The size and the feel of the keyboard are great. It feels very solid in the hands and works automatically upon connecting the USB dongle. After using it for a few moments, we soon realized the horrible mouse function of the keyboard. We have a 52 inch flatscreen tv which is about 10 feet away from our couch. We constantly had to recenter the mouse pointer because we would be pointing the device far off screen and the cursor would only move a few inches. We changed all the sensitivities going up and down to no avail. After using the mouse for a while, my hands would be bent at weird angles just to make it across the screen. Eventually the mouse began to move by itself to the left which made things even harder. It was a struggle to enjoy something that we hoped would be alot of fun. While the keyboard functions worked well, the horrible mouse function more than overshadows it.


CNN -

#### Trying different numbers of beams for the given text
Using 4 beams (`num_beams=4`) seems to give the best summary.
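
For readers unfamiliar with what the beam count controls, here is a minimal, self-contained sketch of beam search over a made-up next-token table. Everything in it is an assumption for illustration: `TOY_MODEL` and `beam_search` are hypothetical names, the probabilities are invented, and the real `generate` method adds details such as length penalties and n-gram blocking that are omitted here.

```python
import math

# Hypothetical next-token log-probabilities, keyed by the previous token.
# This toy table stands in for a real decoder; it is NOT the Pegasus model.
TOY_MODEL = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"mouse": math.log(0.55), "keys": math.log(0.45)},
    "a": {"gyro": math.log(0.9), "keys": math.log(0.1)},
    "mouse": {"</s>": 0.0},
    "keys": {"</s>": 0.0},
    "gyro": {"</s>": 0.0},
}


def beam_search(num_beams, max_steps=5):
    """Return (log_probability, tokens) of the best sequence found."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_steps):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, seq))
                continue
            for token, logp in TOY_MODEL[seq[-1]].items():
                candidates.append((score + logp, seq + [token]))
        # Keep only the `num_beams` highest-scoring partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:num_beams]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams[0]


greedy_score, greedy_seq = beam_search(num_beams=1)
beam_score, beam_seq = beam_search(num_beams=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With one beam the search is greedy: it commits to the most likely first token and can miss a sequence that is more probable overall. With two beams, the toy example keeps the less likely first token alive long enough to find that sequence. A wider beam explores more alternatives at a higher compute cost, which is the trade-off the experiment below probes.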

In [49]:
print(f"Original Text:\n{review1}\n")
# The input is the same for every beam width, so tokenize it once.
tokenized_text = tokenizer_reddit.encode(review1, return_tensors="pt")
for num_beams in [1, 2, 3, 4]:
    summary_ids = model_reddit.generate(tokenized_text,
                                        num_beams=num_beams,
                                        no_repeat_ngram_size=2,
                                        min_length=30,
                                        max_length=200,
                                        early_stopping=True)
    summary = tokenizer_reddit.decode(summary_ids[0], skip_special_tokens=True)
    print(f"\nReddit - Pegasus:\nNumber of beams = {num_beams}\n{summary}")

Original Text:
My wife and I bought this keyboard for our new HTPC and were very excited about the size. The size and the feel of the keyboard are great. It feels very solid in the hands and works automatically upon connecting the USB dongle. After using it for a few moments, we soon realized the horrible mouse function of the keyboard. We have a 52 inch flatscreen tv which is about 10 feet away from our couch. We constantly had to recenter the mouse pointer because we would be pointing the device far off screen and the cursor would only move a few inches. We changed all the sensitivities going up and down to no avail. After using the mouse for a while, my hands would be bent at weird angles just to make it across the screen. Eventually the mouse began to move by itself to the left which made things even harder. It was a struggle to enjoy something that we hoped would be alot of fun. While the keyboard functions worked well, the horrible mouse function more than overshadows it.


Reddi