In [None]:
# Copyright 2024 Reddit, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Use Case 4. Complex Post Summarization

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/reddit/kdd2024-tutorial-breaking-barriers/blob/master/Use_Case_4_Complex_Post_Summarization.ipynb)

## Overview

This notebook combines implementations from the other use cases to show how AI can summarize lengthy posts for users with cognitive impairments. We will compare and contrast different summarization techniques and discuss the ethical considerations of using AI for summarization.

---

## Setting Up Google Colab
Google Colab provides a convenient platform to run Python code in the cloud, with access to powerful computing resources, including GPUs. For this tutorial, we recommend enabling GPU acceleration:

1.   Click on *Runtime* in the top menu.
2.   Select *Change runtime type*.
3.   In the dialog that appears, under *Hardware accelerator*, choose **T4 GPU** (or any other GPU that you may have access to) if it is not already enabled.
4.   Click *Save*.

---

## Requirements

In [None]:
!pip install -U transformers bitsandbytes accelerate flash_attn

---

## Settings

Run the following cells to apply some convenient settings.

In [None]:
# Disable Transformer warnings
import logging
logging.basicConfig(level=logging.INFO)

import transformers
transformers.logging.set_verbosity_error()

import warnings
warnings.filterwarnings('ignore')

# Set GPU device
import torch
torch.set_default_device("cuda") # or "cpu" if a GPU is not available
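
The `"cuda"` setting above fails on a CPU-only runtime. As a minimal sketch of a fallback, the hypothetical helper `pick_device` below (not part of the original notebook) returns `"cuda"` only when PyTorch reports a usable GPU:

```python
import importlib.util


def pick_device() -> str:
    """Return "cuda" when PyTorch sees a GPU, otherwise "cpu"."""
    # If PyTorch is not installed at all, fall back to "cpu".
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The result could then be passed to `torch.set_default_device(device)`. Note that later cells in this notebook hard-code `"cuda"`, so they would need the same adjustment.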

Run the following cell to display the run time of every cell execution:

In [None]:
!pip install ipython-autotime
%load_ext autotime

Run the following cell to wrap long strings when printing:

In [None]:
from IPython.display import HTML, display

def set_css():
  display(HTML("""
  <style>
    pre {
        white-space: pre-wrap;
    }
  </style>
  """))
get_ipython().events.register("pre_run_cell", set_css)

Some other useful imports:

In [None]:
from PIL import Image
from tqdm import tqdm
from glob import glob

---

## Test Post

We will use the first post in the tutorial dataset.

In [None]:
post = {
    "title": "You get me fired, so you can’t work where I care about",
    "body": """I used to work at this factory. I was a housekeeper for about a year then I transferred to the "hoes" department. I made the hoses for the machines. Now there was the main girl that was a team lead and she was known for being a bad person. I wouldn’t have transferred but I was in an abusive relationship and was trying to save money to move. My biggest dislike of this lady was that she was VERY sexually driven. I am all for expressing yourself but she would flirt/hook up with all the guys on the line and she had a huge hang up on the big boss.

She HATED that big boss was "taking a liking" to me. By this I mean when he asked me to do something I did it. There was this big deal about how the hose department was behind (bc they couldn’t keep workers) and I was fast (even with my rib being out). She did this thing that took her an extra 10-15 mins per set which big boss told her to stop doing. She trained me to do it her way but I found it easier to do it big boss way and faster. Big boss encouraged me bc our production increased since I was there and I did what he said. (Plus I wasn’t flirting with the line 50% of the day) She instantly changed towards me. Very short and rude and pointing out every mistake.

This lady constantly told me that "her" department starts off at $17 and the job posting also said $17. So when I got $16 I was upset. I talked to boss about it and said $16 is what that department starts at and to be happy I’m making more than I was before. (Also talked to past hose girls and they said they got $16.50) After this I was pissy the rest of the day. Also my "move out" date was getting closer and that was also stressing me out. She was talking to me about how unfair it was that I wasn’t getting paid the right amount and I said something about how big boss was a piece of shit and shouldn’t even have his job (a lot more bad going on but to much to write). This angered her a lot.

I leave bc I have a doc appointment and boss called me while there and texted me to come straight to office when I got back. My bosses boss was there when I walked in. They accused me of PUTTING A KNIFE TO SOMEONES NECK AND THREATENING THEM. I was shocked. Obviously I denied it bc I DID NO SUCH THING. Then it was well you put a knife to someone’s neck, then you had a knife and pointed it at them, then you had a knife and were upset. I was like "who?? I was with lady all day. Did you ask her?" Then they said she was the one I "did it" to and there were "multiple witnesses". Long story short they fire me. (Later found out the one "witness" is this ladies close, self proclaimed "father figure")

That night my ex was extremely angry at me for losing my job then I get served DOMESTIC ABUSE papers by a cop regarding the work stuff. First off it says right on the paper this excludes coworkers. I go to court for it and she doesn’t even show up!!! I move away and I did contact a lawyer about everything but he said if there are "witnesses" even if they aren’t real that could end up having the whole thing turned against me.

Now for the revenge! On my last day of work at a nursing home I see lady. She is taking to the hiring manager. She leaves and I instantly go in and ask "was that lady’s name?" And get told yes. I explain how she got multiple people fired at her old job and LOVES to start and stir drama. Hiring manager says she will note that. LADY DIDNT GET THE JOB 😂
Footer""",
}

---

## Model 1: Phi-3-small-128k-instruct

The [Phi-3-small-128K-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct) is a lightweight, state-of-the-art open language model in the Phi-3 model family. The model has undergone a post-training process that incorporates both supervised fine-tuning and direct preference optimization for instruction following and safety.

### Getting Started

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-small-128k-instruct",
    device_map="cuda",
    trust_remote_code=True,
    torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3-small-128k-instruct",
    trust_remote_code=True)

# Create pipeline
pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    device="cuda"
)

In [None]:
# Test prompt
prompt = "What about solving an 2x + 3 = 7 equation?"

# Inference
messages = [{"role": "user", "content": prompt}]
generation_args = {
    "max_new_tokens": 500,
    "return_full_text": False,
    "temperature": 0.0,
    "do_sample": False,
}
output = pipe(messages, **generation_args)
result = output[0]["generated_text"]

# Display results
print(result)

### Improved Implementation

This is a more convenient implementation to prompt the model.

In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

class LLM:
  def __init__(self):
    # Load model and tokenizer
    model = AutoModelForCausalLM.from_pretrained(
        "microsoft/Phi-3-small-128k-instruct",
        device_map="cuda",
        trust_remote_code=True,
        torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-small-128k-instruct", trust_remote_code=True)

    # Create pipeline
    self.pipe = pipeline(
        "text-generation",
        model=model,
        tokenizer=tokenizer,
        device="cuda"
    )

  def __call__(self, prompt):
    # Wrap the prompt as a single-turn chat message
    messages = [{"role": "user", "content": prompt}]
    generation_args = {
        "max_new_tokens": 500,
        "return_full_text": False,
        "temperature": 0.0,
        "do_sample": False,
    }
    # Use the pipeline stored on the instance
    output = self.pipe(messages, **generation_args)
    result = output[0]["generated_text"]
    return result

In [None]:
# Load model
llm = LLM()

In [None]:
llm("What about solving an 2x + 3 = 7 equation?")

### Generate Post Summary

In [None]:
prompt = f"""Write a short summary of the following post with a language that can be clearly understood by non-expert people

## Post Title:
{post["title"]}

## Post Body:
{post["body"]}"""

print(llm(prompt))
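
Because the same prompt layout is reused for the second model, it may help to build it with a small function. The sketch below assumes a hypothetical helper named `build_summary_prompt` (not part of the original notebook) that reproduces the layout of the prompt above and lets the instruction line vary:

```python
DEFAULT_INSTRUCTION = (
    "Write a short summary of the following post with a language "
    "that can be clearly understood by non-expert people"
)


def build_summary_prompt(post: dict, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Assemble the summarization prompt from a post's title and body."""
    return (
        f"{instruction}\n\n"
        f"## Post Title:\n{post['title']}\n\n"
        f"## Post Body:\n{post['body']}"
    )


# Tiny stand-in post so the sketch runs on its own.
example_post = {"title": "A title", "body": "A body."}
print(build_summary_prompt(example_post))
```

In the notebook, `print(llm(build_summary_prompt(post)))` would be the equivalent call, and passing a different `instruction` gives a quick way to try prompt variants.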

---

## Model 2: imp-v1-3b

Now we are going to test the `MLLMv1` implementation (from Use Case 1. Image Short Captions), which is based on the `imp-v1-3b` model.

In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

class MLLMv1:

  def __init__(self):
    torch.set_default_device("cuda")
    self.vision_model = AutoModelForCausalLM.from_pretrained(
      "MILVLG/imp-v1-3b",
      torch_dtype=torch.float16,
      device_map="auto",
      trust_remote_code=True)
    self.vision_tokenizer = AutoTokenizer.from_pretrained(
        "MILVLG/imp-v1-3b",
        trust_remote_code=True)

  def get_image_caption(self,
                        image: Image.Image,
                        base_prompt="Write a very short caption for the image with less than 20 words") -> str:
    return self.prompt_llm(image, base_prompt)

  def get_image_description(self,
                            image: Image.Image,
                            base_prompt="Write a short description for the image") -> str:
    return self.prompt_llm(image, base_prompt)

  def prompt_llm(self,
                 image: Image.Image,
                 prompt: str,
                 max_new_tokens: int = 256,
                 temperature: float = 0.9,
                 top_k: int = 50,
                 top_p: float = 0.95) -> str:
    # Pass image=None for text-only prompts
    if image is not None:
      text = self.vision_tokenizer.apply_chat_template(
          [{"role": "user", "content": f"<image>\n{prompt}"}],
          tokenize=False,
          add_generation_prompt=True
      )
      image_tensor = self.vision_model.image_preprocess(image)
    else:
      text = self.vision_tokenizer.apply_chat_template(
          [{"role": "user", "content": f"{prompt}"}],
          tokenize=False,
          add_generation_prompt=True
      )
      image_tensor = None
    input_ids = self.vision_tokenizer(text, return_tensors="pt").input_ids
    output_ids = self.vision_model.generate(
      input_ids,
      max_new_tokens=max_new_tokens,
      images=image_tensor,
      temperature=temperature,
      do_sample=True,
      top_k=top_k,
      top_p=top_p,
      use_cache=True)[0]
    response = self.vision_tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True)
    response = response.replace("\n", " ").strip().replace("  ", " ")
    return response

In [None]:
# Load model
ic = MLLMv1()

In [None]:
prompt = f"""Write a short summary of the following post with a language that can be clearly understood by non-expert people

## Post Title:
{post["title"]}

## Post Body:
{post["body"]}"""

print(ic.prompt_llm(None, prompt))
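
To compare the two models' outputs side by side, a few simple length statistics can be a starting point. The sketch below uses a hypothetical helper, `summary_stats`, which is an illustration and not part of the original notebook. The numbers are crude proxies for brevity only; they say nothing about accuracy or faithfulness.

```python
import re


def summary_stats(source: str, summary: str) -> dict:
    """Compute simple length statistics for a summary of `source`."""
    source_words = source.split()
    summary_words = summary.split()
    # Rough sentence count: split on ., ! and ? and drop empty pieces.
    sentences = [s for s in re.split(r"[.!?]+", summary) if s.strip()]
    return {
        "summary_words": len(summary_words),
        # Fraction of the source length that the summary keeps.
        "compression": len(summary_words) / max(len(source_words), 1),
        "avg_sentence_words": len(summary_words) / max(len(sentences), 1),
    }


# Stand-in strings so the sketch runs on its own.
stats = summary_stats(
    "one two three four five six seven eight nine ten",
    "One two. Three four five.",
)
print(stats)
```

In the notebook, the same function could be called with `post["body"]` and each model's summary.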

Now test with several prompts to improve the results.

Finally, generate the summaries for all the posts in the provided dataset.
- Do you find any issues?
- Which prompt seems to work best for all posts?
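
One way to organize this experiment is to run every candidate prompt on every post and collect the results in one list. The sketch below is only an illustration: `summarize_all`, the `{title}`/`{body}` template placeholders, and the stand-in summarizer are assumptions, not part of the original notebook.

```python
from typing import Callable


def summarize_all(posts, prompts, summarizer: Callable[[str], str]):
    """Run every prompt template on every post and collect the outputs."""
    results = []
    for post_index, post in enumerate(posts):
        for prompt_name, template in prompts.items():
            # Fill the template with this post's title and body.
            prompt = template.format(title=post["title"], body=post["body"])
            results.append({
                "post": post_index,
                "prompt": prompt_name,
                "summary": summarizer(prompt),
            })
    return results


# Stand-in data and a stand-in summarizer so the sketch runs without a model.
demo_posts = [{"title": "A", "body": "B"}, {"title": "C", "body": "D"}]
demo_prompts = {
    "plain": "Summarize: {title} - {body}",
    "short": "TL;DR {title}: {body}",
}
demo_results = summarize_all(demo_posts, demo_prompts, str.upper)
for row in demo_results:
    print(row)
```

In the notebook, `summarizer` could be `llm` or `lambda p: ic.prompt_llm(None, p)`, with `posts` loaded from the tutorial dataset.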

Can you propose an approach to multimodal post summarization that uses the post title and body together with the image or video?
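
One possible approach, offered only as a sketch, is a two-stage pipeline: first describe the image with the multimodal model, then add that description as an extra section of the text prompt. The names `build_multimodal_prompt` and `summarize_post`, and the `## Image Description:` section, are assumptions made for illustration.

```python
from typing import Callable, Optional


def build_multimodal_prompt(post: dict, image_description: Optional[str]) -> str:
    """Build a summarization prompt, adding the image description if present."""
    sections = [
        "Write a short summary of the following post in plain language",
        f"## Post Title:\n{post['title']}",
        f"## Post Body:\n{post['body']}",
    ]
    if image_description:
        sections.append(f"## Image Description:\n{image_description}")
    return "\n\n".join(sections)


def summarize_post(post: dict,
                   image,
                   describe_image: Callable,
                   summarizer: Callable[[str], str]) -> str:
    """Stage 1: describe the image (if any). Stage 2: summarize the text."""
    description = describe_image(image) if image is not None else None
    return summarizer(build_multimodal_prompt(post, description))


# Stand-in callables so the sketch runs without loading a model.
demo_post = {"title": "T", "body": "B"}
print(summarize_post(demo_post, object(), lambda img: "a cat", lambda p: p))
```

In the notebook, `describe_image` could be `ic.get_image_description` and `summarizer` could be `llm`. A video could be handled in a similar way by describing a few sampled frames, although that is not tested here.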

---

# Discussion: How Long Post Summarization with Multimodal LLMs Can Improve Accessibility in Social Media

- **Concise and Accessible Summaries**: LLMs can condense lengthy social media posts into clear, concise summaries, making information easier to digest for users with cognitive impairments.

- **Highlighting Key Information**: LLMs can identify and highlight crucial information, reducing cognitive overload and improving focus.

- **Visual & Audio Support**: Multimodal LLMs can incorporate images, videos, and audio to enhance understanding and engagement. This is particularly helpful for users with visual or auditory processing difficulties.

- **Integration with Assistive Technologies**: LLM-generated summaries can be integrated with screen readers, text-to-speech software, and other assistive technologies.

- **Personalized Summaries**: LLMs can tailor summaries based on individual user preferences and cognitive abilities, ensuring optimal comprehension.

- **Transparency and Control**: Users should understand how the AI works and be able to control the summarization process. Allow users to customize the length, format, and level of detail in the summaries.

- **Accuracy and Bias**: Consider that LLMs can sometimes generate inaccurate or biased summaries, especially when dealing with sensitive or complex topics.
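
The user controls mentioned under "Transparency and Control" (length, format, and level of detail) could be exposed as explicit options that are turned into a prompt instruction. The sketch below is a hypothetical illustration: `SummaryOptions` and `options_to_instruction` are assumed names, and the wording of the instruction is only one possibility.

```python
from dataclasses import dataclass


@dataclass
class SummaryOptions:
    """User-facing controls for a summary (illustrative defaults)."""
    max_sentences: int = 3
    output_format: str = "paragraph"  # "paragraph" or "bullets"
    detail: str = "low"               # "low" or "high"


def options_to_instruction(options: SummaryOptions) -> str:
    """Translate the user's options into a plain-text prompt instruction."""
    layout = ("as a bullet list" if options.output_format == "bullets"
              else "as one paragraph")
    detail = ("only the main events" if options.detail == "low"
              else "the main events and key supporting details")
    return (f"Summarize the post {layout}, in at most {options.max_sentences} "
            f"sentences, covering {detail}. Use plain, simple language.")


print(options_to_instruction(SummaryOptions()))
print(options_to_instruction(SummaryOptions(5, "bullets", "high")))
```

Showing the generated instruction to the user is one simple way to make the summarization step more transparent.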
