

![JohnSnowLabs](https://nlp.johnsnowlabs.com/assets/images/logo.png)

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/JohnSnowLabs/spark-nlp-workshop/blob/master/tutorials/streamlit_notebooks/T5TRANSFORMER.ipynb)




# **Text Summarization & Question Answering using Google's T5 Transformer**

### Spark NLP documentation and instructions:
https://nlp.johnsnowlabs.com/docs/en/quickstart

### Spark NLP Google T5 article:
https://towardsdatascience.com/hands-on-googles-text-to-text-transfer-transformer-t5-with-spark-nlp-6f7db75cecff

### You can find details about Spark NLP annotators here:
https://nlp.johnsnowlabs.com/docs/en/annotators

### You can find details about Spark NLP models here:
https://nlp.johnsnowlabs.com/models


## 1. Colab Setup

In [None]:
!wget http://setup.johnsnowlabs.com/colab.sh -O - | bash
# !bash colab.sh
# -p is for pyspark
# -s is for spark-nlp
# !bash colab.sh -p 3.1.1 -s 3.0.1
# by default they are set to the latest

--2021-06-01 16:45:58--  http://setup.johnsnowlabs.com/colab.sh
Resolving setup.johnsnowlabs.com (setup.johnsnowlabs.com)... 51.158.130.125
Connecting to setup.johnsnowlabs.com (setup.johnsnowlabs.com)|51.158.130.125|:80... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/scripts/colab_setup.sh [following]
--2021-06-01 16:45:59--  https://raw.githubusercontent.com/JohnSnowLabs/spark-nlp/master/scripts/colab_setup.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1608 (1.6K) [text/plain]
Saving to: ‘STDOUT’

setup Colab for PySpark 3.0.2 and Spark NLP 3.0.3

2021-06-01 16:45:59 (1.90 

## 2. Start the Spark session

Import the dependencies and start the Spark session.

In [None]:
import json
import pandas as pd
import numpy as np

from pyspark.ml import Pipeline
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from sparknlp.annotator import *
from sparknlp.base import *
import sparknlp
from sparknlp.pretrained import PretrainedPipeline

spark = sparknlp.start()

## 3. Select the DL model

For the complete model list:
https://nlp.johnsnowlabs.com/models

For `T5` models:
https://nlp.johnsnowlabs.com/models?tag=t5

## 4. Text Summarization using T5 Transformer

Define the Spark NLP pipeline

In [None]:
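# Wrap the raw "text" column in a Spark NLP document annotation.
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

# Load the pretrained t5_small model and set the "summarize:" task prefix.
t5 = T5Transformer.pretrained("t5_small", "en") \
    .setTask("summarize:") \
    .setMaxOutputLength(200) \
    .setInputCols(["documents"]) \
    .setOutputCol("summaries")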

summarizer_pp = Pipeline(stages=[
    document_assembler, t5
])

t5_small download started this may take some time.
Approximate size to download 139 MB
[OK!]
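
T5 treats every task as text-to-text, so the `setTask("summarize:")` call above amounts to putting a task prefix in front of the input text. Below is a minimal sketch of that idea in plain Python, assuming the prefix and the text are joined with a single space. `build_t5_input` is a hypothetical helper written for illustration; it is not part of Spark NLP, and the library's internal handling may differ.

```python
def build_t5_input(task: str, text: str) -> str:
    """Prepend a task prefix to the input text, T5 text-to-text style.

    Hypothetical helper for illustration only (not a Spark NLP function).
    """
    task = task.strip()
    text = text.strip()
    # With an empty task the text is passed through unchanged.
    return f"{task} {text}" if task else text


prompt = build_t5_input("summarize:", "My original mortgage was for 140,000.")
print(prompt)
```

Changing only the prefix string is what lets the same model be pointed at a different task, which is why the pipeline definition exposes it as a single setter.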


Run the pipeline

In [None]:
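# Every stage is already pretrained, so fitting on an empty DataFrame
# only assembles the PipelineModel.
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = summarizer_pp.fit(empty_df)
# LightPipeline annotates plain Python strings without building a DataFrame.
sum_lmodel = LightPipeline(pipeline_model)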

In [None]:
example_txt = """

I'm interested in the laws around eliminating PMI on my mortgage. 
My original mortgage was for 140,000. My property appraised out at 169,000, but I paid 155,000 (I negotiated hard!). I currently owe 126,000. 
My automatic PMI removal date was set for Feb 2024 based on just paying the minimum payment -- however, I have been paying aggressively. 
From what I understand, the bank is obligated to stop charging for PMI once I reach 78% of my homes original appraised value. I reached 78% over a year ago. I spoke to someone at USbank and they said it doesn't work that way; that the original cost of the purchase is considered and not the appraised value, and that I have to wait until 2024, pay down more (?) or have another appraisal done. Honestly, I don't totally understand what they meant.
Does any know what the issue might be? Is it not as straightforward as the internet might have me believe? :)

"""

res = sum_lmodel.fullAnnotate(example_txt)[0]

print('Summary:', res['summaries'][0].result)

Summary: a bank is obligated to stop charging for PMI once I reach 78% of my homes original appraised value . a bank says it doesn't work that way; the original cost of the purchase is considered and not the appraised value . a bank says it's not as straightforward as the internet might have me believe .
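
The generated summary comes back with a space before each full stop and with lowercase sentence starts. If the text is meant for display, a small post-processing step can tidy it. The sketch below uses only Python's `re` module. `tidy_summary` is a hypothetical helper, and the capitalization rule is a simple heuristic, not something the model or Spark NLP provides.

```python
import re


def tidy_summary(text: str) -> str:
    """Remove spaces before punctuation and capitalize each sentence.

    Hypothetical display helper; a simple heuristic, not a Spark NLP feature.
    """
    # "value . a bank" -> "value. a bank"
    text = re.sub(r"\s+([.,;:!?])", r"\1", text.strip())
    # Uppercase the first letter of the text and of each following sentence.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


raw = "a bank is obligated to stop charging for PMI . a bank says it doesn't work that way ."
print(tidy_summary(raw))
```

In the notebook, the same function could be applied to `res['summaries'][0].result` before printing.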


## 5. Question Answering using T5 Transformer

Define the Spark NLP pipeline

In [None]:
document_assembler = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("documents")

sentence_detector = SentenceDetectorDLModel\
    .pretrained("sentence_detector_dl", "en")\
    .setInputCols(["documents"])\
    .setOutputCol("questions")

t5 = T5Transformer.pretrained("google_t5_small_ssm_nq", "en") \
    .setInputCols(["questions"]) \
    .setOutputCol("answers")

qa_pp = Pipeline(stages=[
    document_assembler, sentence_detector, t5
])

sentence_detector_dl download started this may take some time.
Approximate size to download 354.6 KB
[OK!]
google_t5_small_ssm_nq download started this may take some time.
Approximate size to download 139 MB
[OK!]


Run the pipeline

In [None]:
empty_df = spark.createDataFrame([['']]).toDF('text')
pipeline_model = qa_pp.fit(empty_df)
qa_lmodel = LightPipeline(pipeline_model)

questions = ["Do student loans or credit card debt take precedence?",
             "How does Wageworks work?",
             "What to ask for when buying a used car?",
             "How fast does your credit score actually update?",
             "What happens with unused credit card accounts?"
]

res = qa_lmodel.fullAnnotate(questions)

for question, r in zip(questions, res):
    print("Question:", question)
    for sent in r['answers']:
        print('Answer:\t', sent.result)


Question: Do student loans or credit card debt take precedence?
Answer:	 Over one million students
Question: How does Wageworks work?
Answer:	 sales of ten million units
Question: What to ask for when buying a used car?
Answer:	 car gage
Question: How fast does your credit score actually update?
Answer:	 until you are interrogated
Question: What happens with unused credit card accounts?
Answer:	 bankruptcy
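
For further processing it can be handier to collect the question/answer pairs into plain Python records than to print them. The sketch below assumes the result shape used in the loop above: one dict per question, whose `'answers'` entry is a list of objects with a `.result` attribute. `pair_answers` is a hypothetical helper, and the `SimpleNamespace` objects only stand in for Spark NLP annotations so that the example runs without a Spark session.

```python
from types import SimpleNamespace


def pair_answers(questions, results, column="answers"):
    """Return one {'question', 'answers'} record per question.

    Hypothetical helper; `results` is expected to look like the list
    returned by fullAnnotate in the cell above.
    """
    if len(questions) != len(results):
        raise ValueError("expected exactly one result per question")
    return [
        {"question": q, "answers": [ann.result for ann in r[column]]}
        for q, r in zip(questions, results)
    ]


# Stand-in for the fullAnnotate output, so the sketch runs without Spark.
fake_results = [{"answers": [SimpleNamespace(result="bankruptcy")]}]
rows = pair_answers(["What happens with unused credit card accounts?"], fake_results)
print(rows)
```

In the notebook, `pair_answers(questions, res)` would produce the same records from the real annotations.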
