# Text Summarization using [Hugging Face Transformers](https://huggingface.co/models?pipeline_tag=summarization&sort=trending)

- Let's install the required library:
```
!pip install transformers
```

```python
# Code to silence warnings: show only errors from the transformers logger
from transformers.utils import logging
logging.set_verbosity_error()
```
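
`set_verbosity_error()` only raises the threshold of the library's logger, so messages sent through Python's `logging` module below ERROR are hidden. Some notices (for example deprecation warnings) may be raised with Python's `warnings` module instead, and the logger level does not affect those. The standard-library sketch below shows both mechanisms side by side. It assumes the library logs under the logger name `"transformers"`, and the warning categories it ignores are an illustrative choice, not a recommendation.

```python
import logging
import warnings


def silence_transformers_noise() -> None:
    """Hide sub-ERROR log records and common Python warnings (illustrative sketch)."""
    # Assumption: transformers logs under the top-level logger "transformers",
    # so raising that logger's level mirrors what set_verbosity_error() does.
    logging.getLogger("transformers").setLevel(logging.ERROR)
    # Messages raised via warnings.warn() bypass logging entirely,
    # so they need a separate filter (categories chosen for illustration).
    warnings.filterwarnings("ignore", category=FutureWarning)
    warnings.filterwarnings("ignore", category=UserWarning)


silence_transformers_noise()
logger = logging.getLogger("transformers")
print(logging.getLevelName(logger.getEffectiveLevel()))  # ERROR
```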

## Let's build the summarization pipeline using the **Transformers** library

```python
# Code 1 - Use a pipeline as a high-level helper
from transformers import pipeline
```

How the pipeline works:

1. Model Download: When you create a pipeline (Code 2 below), the Transformers library automatically downloads the model weights and configuration files to your local machine. By default, these are stored in a cache directory (`~/.cache/huggingface/hub/` in recent versions; older versions used `~/.cache/huggingface/transformers/`).

2. Local Execution: Once downloaded, the model runs entirely on your local machine, so inference makes no API calls to Hugging Face servers. The library may still contact the Hub to check for updated files when a pipeline is created; setting the environment variable `HF_HUB_OFFLINE=1` makes it use only the cached files, so no internet connection is needed.

3. First-time vs. Subsequent Use: The first time you use a specific model with a pipeline, it will download the necessary files. On subsequent runs, it will use the locally cached version unless you explicitly request a new download.

4. Resource Usage: Since the model runs locally, it uses your system's resources (CPU, GPU, RAM) for computations.
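
To find where the files from step 1 end up, the sketch below mirrors the cache-location rules documented for recent versions of `huggingface_hub`, the package Transformers uses for downloads: `HF_HUB_CACHE` wins if set, otherwise the cache is `$HF_HOME/hub`, with `HF_HOME` defaulting to `$XDG_CACHE_HOME/huggingface` or `~/.cache/huggingface`. Each model gets its own `models--<org>--<name>` folder. This is a simplified illustration, not the library's own code, and it ignores legacy variables such as `TRANSFORMERS_CACHE`.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hub_cache_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Resolve the Hub cache root from environment variables (simplified sketch)."""
    env = os.environ if env is None else env
    # An explicit HF_HUB_CACHE takes precedence over everything else.
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    # Otherwise the cache lives in <HF_HOME>/hub.
    if env.get("HF_HOME"):
        hf_home = Path(env["HF_HOME"]).expanduser()
    else:
        xdg_cache = env.get("XDG_CACHE_HOME") or str(Path.home() / ".cache")
        hf_home = Path(xdg_cache).expanduser() / "huggingface"
    return hf_home / "hub"


def model_cache_dir(repo_id: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Folder holding a model's files, e.g. models--facebook--bart-large-cnn."""
    return hub_cache_dir(env) / ("models--" + repo_id.replace("/", "--"))


print(model_cache_dir("facebook/bart-large-cnn", {"HF_HOME": "/tmp/hf"}))
# /tmp/hf/hub/models--facebook--bart-large-cnn  (on POSIX systems)
```

Calling `model_cache_dir("facebook/bart-large-cnn")` with no `env` argument uses your real environment variables.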

```python
# Code 2 - create a summarizer object
summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")
```



```python
# Sample input text. Note: BART is an encoder-decoder model; "encoder-encoder" below is a typo kept from the source text.
text = """BART is a transformer encoder-encoder (seq2seq) model with a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.
BART is pre-trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.
BART is particularly effective when fine-tuned for text generation (e.g. summarization, translation) but also works well for comprehension tasks (e.g. text classification, question answering).
This particular checkpoint has been fine-tuned on CNN Daily Mail, a large collection of text-summary pairs."""
```

```python
# Summarize the text. min_length and max_length are measured in tokens, not words
summary = summarizer(text, min_length=10, max_length=100)
```

```python
# Display the summary
summary
```

```
[{'summary_text': 'BART is a transformer encoder-encoder (seq2seq) model with a bidirectional (BERT-like) encoder. BART is pre-trained by corrupting text with an arbitrary noising function, and learning a model to reconstruct the original text.'}]
```
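
The pipeline returns a list with one dictionary per input text, and the generated text sits under the `summary_text` key; this is why Code 5 below indexes `summary[0]["summary_text"]`. Passing a list of strings should likewise give one dictionary per string. The small helper below flattens that structure into plain strings. It is plain Python, so the example feeds it a truncated copy of the output recorded above instead of calling the model; the helper name `summary_texts` is hypothetical.

```python
from typing import Any, Dict, List


def summary_texts(pipeline_output: List[Dict[str, Any]]) -> List[str]:
    """Extract the generated summaries from a summarization pipeline result."""
    # Each element is a dict like {"summary_text": "..."}, one per input text.
    return [item["summary_text"] for item in pipeline_output]


# Output copied from the run above (truncated here for brevity).
recorded = [{"summary_text": "BART is a transformer encoder-encoder (seq2seq) model ..."}]
texts = summary_texts(recorded)
print(len(texts))    # 1
print(texts[0][:4])  # BART
```

With the real model, `summary_texts(summarizer([text_a, text_b]))` would return one string per input.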

#### To improve the quality of your summary, try adjusting these generation parameters

```python
# Code 3 - Summarize the text. The length here is in tokens
summary = summarizer(text,
    repetition_penalty=5.0,  # Values > 1.0 discourage repetition; increase for a stronger effect
    length_penalty=0.3,      # Lower values favor shorter summaries; higher values favor longer ones
    min_length=20, max_length=100)
```

```python
summary
```

```
[{'summary_text': 'BART is pre-trained by corrupting text with an arbitrary noising function, and learning a model to reconstruct the original text. BART is particularly effective when fine-tuned for text generation but also works well for comprehension tasks.'}]
```
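
Why these two parameters change the output: with beam search (which this checkpoint's configuration enables by default), Transformers scores each finished hypothesis as its summed log-probability divided by `length ** length_penalty`. Log-probabilities are negative, so a larger denominator pulls the score toward zero, and longer hypotheses get larger denominators; a larger `length_penalty` therefore favors longer outputs and a smaller one favors shorter outputs. `repetition_penalty` rescales the logits of tokens already present in the sequence: negative logits are multiplied by the penalty and positive ones divided by it, so values above 1.0 make repeats less likely. The sketch re-implements both formulas in plain Python for illustration only; the hypothesis lengths, log-probabilities, and logits are made-up numbers, not values from the model.

```python
from typing import Dict, List


def beam_score(sum_logprobs: float, length: int, length_penalty: float) -> float:
    """Length-normalized beam score: sum_logprobs / length ** length_penalty."""
    return sum_logprobs / (length ** length_penalty)


def apply_repetition_penalty(logits: Dict[int, float], seen: List[int], penalty: float) -> Dict[int, float]:
    """Penalize already-generated tokens: multiply negative logits, divide positive ones."""
    out = dict(logits)
    for tok in set(seen):
        if tok in out:
            score = out[tok]
            out[tok] = score * penalty if score < 0 else score / penalty
    return out


# Two made-up finished hypotheses: (sum of log-probs, length in tokens).
short_hyp = (-5.0, 10)
long_hyp = (-8.0, 20)

for lp in (0.3, 2.0):
    short_score = beam_score(*short_hyp, lp)
    long_score = beam_score(*long_hyp, lp)
    winner = "short" if short_score > long_score else "long"
    print(f"length_penalty={lp}: short={short_score:.3f} long={long_score:.3f} -> {winner}")

# Made-up logits for token ids 7, 9, 11; tokens 7 and 9 were already generated.
print(apply_repetition_penalty({7: 2.0, 9: -1.0, 11: 0.5}, seen=[7, 9], penalty=5.0))
# {7: 0.4, 9: -5.0, 11: 0.5}
```

With `length_penalty=0.3` the shorter hypothesis wins, and with `2.0` the longer one does, which matches the comment in Code 3.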

## Create an Inference Interface with Gradio

Gradio is an open-source Python library that provides an easy way to create web-based user interfaces for machine learning models, data processing pipelines, or any Python function. Here are the key points about Gradio:

* It allows developers to quickly create interactive demos for their machine learning models or data science projects without needing expertise in web development.
* Gradio supports various input types (text, image, audio, video) and output types, making it suitable for a wide range of applications.
* It works well with popular machine learning frameworks like TensorFlow, PyTorch, and scikit-learn, as well as with Hugging Face models.
* You can run Gradio interfaces locally or easily deploy them to sharing services like Hugging Face Spaces.


```
!pip install gradio
```

```python
# Code 4 - Let's create a UI using Gradio
import gradio as gr
```

```python
# Code 5 - define a function to summarize text
def nlp(input_text):
    summary = summarizer(
        input_text,
        repetition_penalty=5.0,  # Values > 1.0 discourage repetition; increase for a stronger effect
        length_penalty=0.3,      # Lower values favor shorter summaries; higher values favor longer ones
        min_length=20, max_length=100
    )
    return summary[0]["summary_text"]
```
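
Code 5 assumes the input is non-empty and that a global `summarizer` exists. A variant that takes the summarizer as an argument is easier to test without downloading the model, and it can return a friendly message for blank input instead of calling the model. The names `make_summarize_fn` and `fake_summarizer` are hypothetical, and the stub only mimics the return shape of the real pipeline.

```python
from typing import Any, Callable, Dict, List

Summarizer = Callable[..., List[Dict[str, Any]]]


def make_summarize_fn(summarizer: Summarizer) -> Callable[[str], str]:
    """Build a Gradio-ready function around any summarizer with the pipeline's return shape."""

    def summarize(input_text: str) -> str:
        # Skip the model call for empty or whitespace-only input.
        if not input_text or not input_text.strip():
            return "Please enter some text to summarize."
        summary = summarizer(
            input_text,
            repetition_penalty=5.0,  # values > 1.0 discourage repetition
            length_penalty=0.3,      # lower values favor shorter summaries
            min_length=20,
            max_length=100,
        )
        return summary[0]["summary_text"]

    return summarize


# Hypothetical stand-in for the real pipeline: returns the first sentence.
def fake_summarizer(text: str, **generate_kwargs: Any) -> List[Dict[str, Any]]:
    return [{"summary_text": text.split(".")[0] + "."}]


nlp = make_summarize_fn(fake_summarizer)
print(nlp("BART is a seq2seq model. It has an encoder and a decoder."))
print(nlp("   "))
```

With the real model, `nlp = make_summarize_fn(summarizer)` can be passed to `gr.Interface` in Code 6 unchanged.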

```python
# Code 6 - UI object
ui = gr.Interface(nlp,
    inputs=gr.Textbox(label="Input Text"),
    outputs=gr.Textbox(label="Summary"),
    title="Text Summarizer",
    description="Summarize your text using the BART model.")
```

```python
# Code 7 - launch UI
ui.launch(share=True)
```

```
Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
Running on public URL: https://92207fcaf235bb4757.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
```




# Model Deployment using Hugging Face

1. Visit [huggingface.co](https://huggingface.co/) and sign in.
2. Find the **New** button (left side of the page) and create a new Space.
3. Use these settings: Space name = `Bart-Text-Summarizer`, license = Apache 2.0, Space SDK = Gradio, Space hardware = the free CPU basic tier, and keep the Public option selected.
4. Once the Space is created, open the **Files** tab and create two files: `requirements.txt` (the required libraries) and `app.py` (the application code).
5. In `requirements.txt`, list `transformers`, `torch`, and `gradio`, one per line. The pipeline needs PyTorch (or another supported backend) to run the model, and a new Space may not have it preinstalled.
6. In `app.py`, add the code from Code 1, Code 2, Code 4, Code 5, Code 6, and Code 7. Code 4 is required because Code 6 uses `gr`. On Spaces, `ui.launch()` is enough; `share=True` is not needed there.
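
If you prefer to prepare the two files locally (and then upload them through the Files tab or push them with git), the script below writes them into a folder in the current directory. The generated `app.py` combines Code 1, 2, 4, 5, 6, and 7, calling `launch()` without `share=True` because a public Space already serves the app. The folder name and the unpinned package list are assumptions; add version pins if you need reproducible builds.

```python
from pathlib import Path

# Assumption: torch is listed because the pipeline needs a PyTorch backend.
REQUIREMENTS = "transformers\ntorch\ngradio\n"

# app.py assembled from Code 1, 2, 4, 5, 6, and 7.
APP_PY = '''\
from transformers import pipeline
import gradio as gr

# Load the summarization model once, when the Space starts
summarizer = pipeline(task="summarization", model="facebook/bart-large-cnn")


def nlp(input_text):
    summary = summarizer(
        input_text,
        repetition_penalty=5.0,  # values > 1.0 discourage repetition
        length_penalty=0.3,      # lower values favor shorter summaries
        min_length=20, max_length=100,
    )
    return summary[0]["summary_text"]


ui = gr.Interface(nlp,
    inputs=gr.Textbox(label="Input Text"),
    outputs=gr.Textbox(label="Summary"),
    title="Text Summarizer",
    description="Summarize your text using the BART model.")

ui.launch()
'''


def write_space_files(dest: Path) -> None:
    """Write requirements.txt and app.py for a Gradio Space into `dest`."""
    dest.mkdir(parents=True, exist_ok=True)
    (dest / "requirements.txt").write_text(REQUIREMENTS, encoding="utf-8")
    (dest / "app.py").write_text(APP_PY, encoding="utf-8")


write_space_files(Path("Bart-Text-Summarizer"))
```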
