# Isi Penting Generator HuggingFace headline news style

Generate a long text in headline news style from isi penting (important facts).

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [Malaya/example/isi-penting-generator-huggingface-headline-news-style](https://github.com/huseinzol05/Malaya/tree/master/example/isi-penting-generator-huggingface-headline-news-style).
    
</div>

<div class="alert alert-warning">

Results are generated using stochastic methods, so outputs vary between runs.
    
</div>

In [1]:
%%time
import malaya
from pprint import pprint

CPU times: user 3.09 s, sys: 3.53 s, total: 6.62 s
Wall time: 2.29 s


### List available HuggingFace models

In [2]:
malaya.generator.isi_penting.available_huggingface()

| Model | Size (MB) | ROUGE-1 | ROUGE-2 | ROUGE-L | Suggested length |
|---|---|---|---|---|---|
| mesolitica/finetune-isi-penting-generator-t5-small-standard-bahasa-cased | 242.0 | 0.246203 | 0.058961 | 0.15159 | 1024.0 |
| mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased | 892.0 | 0.246203 | 0.058961 | 0.15159 | 1024.0 |


### Load HuggingFace model

The Transformer generator in Malaya is unusual. Most text generation models found on the internet, such as GPT-2 or Markov chains, simply continue a prefix supplied by the user. This generator instead writes an article or karangan (a school-style essay) from the isi penting (important facts) that the user provides.

```python
def huggingface(model: str = 'mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased', **kwargs):
    """
    Load HuggingFace model to generate text based on isi penting.

    Parameters
    ----------
    model: str, optional (default='mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased')
        Check available models at `malaya.generator.isi_penting.available_huggingface()`.

    Returns
    -------
    result: malaya.torch_model.huggingface.IsiPentingGenerator
    """
```

In [3]:
model = malaya.generator.isi_penting.huggingface()
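The `model` argument accepts any name listed by `available_huggingface()`, so the smaller checkpoint can be loaded when memory or download size matters. The sketch below is only an illustration: `MODEL_NAMES`, `pick_model_name` and `load_generator` are hypothetical helpers that are not part of Malaya, and the names are copied from the table above.

```python
# Model names copied from the `available_huggingface()` table above.
MODEL_NAMES = {
    'small': 'mesolitica/finetune-isi-penting-generator-t5-small-standard-bahasa-cased',
    'base': 'mesolitica/finetune-isi-penting-generator-t5-base-standard-bahasa-cased',
}


def pick_model_name(size: str = 'base') -> str:
    """Hypothetical helper: map a short size label to a full model name."""
    try:
        return MODEL_NAMES[size]
    except KeyError:
        raise ValueError(
            f'unknown size {size!r}, expected one of {sorted(MODEL_NAMES)}'
        ) from None


def load_generator(size: str = 'base'):
    """Hypothetical helper: load the generator for the chosen size."""
    # Imported lazily so the name lookup above works without Malaya installed.
    import malaya

    return malaya.generator.isi_penting.huggingface(model=pick_model_name(size))
```

Calling `load_generator('small')` would then load the 242 MB checkpoint instead of the default 892 MB one.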

#### generate

```python
def generate(
    self,
    strings: List[str],
    mode: str = 'surat-khabar',
    **kwargs,
):
    """
    Generate a long text given a list of isi penting.

    Parameters
    ----------
    strings : List[str]
    mode: str, optional (default='surat-khabar')
        Mode supported. Allowed values:

        * ``'surat-khabar'`` - news style writing.
        * ``'tajuk-surat-khabar'`` - headline news style writing.
        * ``'artikel'`` - article style writing.
        * ``'penerangan-produk'`` - product description style writing.
        * ``'karangan'`` - karangan sekolah style writing.

    **kwargs: keyword arguments passed to the HuggingFace `generate` method.
        Read more at https://huggingface.co/docs/transformers/main_classes/text_generation

    Returns
    -------
    result: List[str]
    """
```

### Good things about HuggingFace

The `generate` method supports greedy search, beam search, sampling, nucleus (top-p) sampling and more. See https://huggingface.co/blog/how-to-generate for an overview.

HuggingFace has also released contrastive search, described at https://huggingface.co/blog/introducing-csearch
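Because `generate` forwards `**kwargs` to the HuggingFace `generate` method, switching decoding strategy is a matter of changing keyword arguments. The following is a minimal sketch, not part of Malaya: `DECODING_PRESETS` and `generate_with` are hypothetical names, the numeric values are illustrative rather than tuned for this model, and whether contrastive search (`penalty_alpha`) is available depends on the installed `transformers` version.

```python
from typing import Any, Dict, List

# Keyword arguments from the HuggingFace text-generation API, grouped by strategy.
# The values are illustrative assumptions, not recommendations for this model.
DECODING_PRESETS: Dict[str, Dict[str, Any]] = {
    'greedy': {'do_sample': False, 'num_beams': 1},
    'beam': {'do_sample': False, 'num_beams': 5, 'early_stopping': True},
    'top-k-top-p': {'do_sample': True, 'top_k': 50, 'top_p': 0.95},
    'contrastive': {'do_sample': False, 'penalty_alpha': 0.6, 'top_k': 4},
}


def generate_with(
    model,
    strings: List[str],
    strategy: str,
    mode: str = 'tajuk-surat-khabar',
    max_length: int = 256,
) -> List[str]:
    """Hypothetical helper: call `model.generate` with a named decoding preset."""
    if strategy not in DECODING_PRESETS:
        raise ValueError(
            f'unknown strategy {strategy!r}, expected one of {sorted(DECODING_PRESETS)}'
        )
    # Copy the preset so the shared dictionary is never mutated.
    kwargs = dict(DECODING_PRESETS[strategy], max_length=max_length)
    return model.generate(strings, mode=mode, **kwargs)
```

With a loaded generator, `generate_with(model, isi_penting, 'beam')` would run beam search, while `'top-k-top-p'` matches the sampling settings used in this tutorial.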

In [4]:
isi_penting = ['Dr M perlu dikekalkan sebagai perdana menteri',
              'Muhyiddin perlulah menolong Dr M',
              'rakyat perlu menolong Muhyiddin']

In [5]:
pprint(model.generate(
    isi_penting,
    mode='tajuk-surat-khabar',
    do_sample=True,
    max_length=256,
    top_k=50,
    top_p=0.95,
))

You're using a T5TokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


['SANDAKAN - Tun Dr Mahathir Mohamad menegaskan Pakatan Harapan tidak akan '
 'menerima tawaran Dr Mahathir untuk bertanding pada pilihan raya kecil (PRK) '
 'Tanjung Piai di Rantau. Pengerusi PH yang juga Ahli Parlimen Tanjung Piai '
 'itu berkata, beliau tidak akan berkompromi dengan keputusan dibuat Dr '
 'Mahathir dalam PRK tersebut.']
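The `top_k=50` and `top_p=0.95` arguments in the call above restrict which tokens can be sampled at each step: top-k keeps only the k most probable tokens, and top-p (nucleus) keeps the smallest set of tokens whose cumulative probability reaches p. The toy sketch below illustrates the idea on a hand-written probability table. It is not the HuggingFace implementation, which operates on logits inside the model, and `top_k_top_p_filter` is a hypothetical name.

```python
import random
from typing import Dict


def top_k_top_p_filter(
    probs: Dict[str, float], top_k: int = 0, top_p: float = 1.0
) -> Dict[str, float]:
    """Toy filter: keep the top-k tokens, then the smallest nucleus reaching top_p."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    # Top-k: drop everything beyond the k most probable tokens (0 disables it).
    if top_k > 0:
        ranked = ranked[:top_k]

    # Renormalise so the surviving probabilities sum to 1.
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Top-p: keep tokens until their cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    # Renormalise again over the final candidate set.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Made-up next-token distribution, for illustration only.
next_token_probs = {'menteri': 0.5, 'rakyat': 0.3, 'parti': 0.15, 'kucing': 0.05}
candidates = top_k_top_p_filter(next_token_probs, top_k=3, top_p=0.8)

# Sample one token from the filtered distribution.
rng = random.Random(0)
sampled = rng.choices(list(candidates), weights=list(candidates.values()), k=1)[0]
```

Here top-k removes `'kucing'`, and top-p then stops after `'menteri'` and `'rakyat'`, so only those two tokens remain as sampling candidates. Lower values of either argument make the output more conservative, and higher values make it more varied.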


In [6]:
isi_penting = ['Neelofa tetap dengan keputusan untuk berkahwin akhir tahun ini',
              'Long Tiger sanggup membantu Neelofa',
              'Tiba-tiba Long Tiger bergaduh dengan Husein']

Any isi penting can be given, even ones that do not make sense together.

In [7]:
pprint(model.generate(
    isi_penting,
    mode='tajuk-surat-khabar',
    do_sample=True,
    max_length=256,
    top_k=50,
    top_p=0.95,
))

['KUALA LUMPUR: Walaupun dia kini sudah berkeras terhadap keinginan mahu '
 'berkahwin dan sudah berkahwin, dua anak, Neelofa dan Husein, tetap tetap '
 'dengan pendirian untuk tidak melayan permintaan pelawaan ini untuk bercerai. '
 'Menceritakan tentang reaksi Neelofa ketika ditanya sama ada akan berkahwin '
 'dalam masa terdekat, beliau percaya perkara itu "mengandungi tanggungjawab", '
 'tetapi masih ada lagi alasan agar perkara itu tidak dijadikan isu dan '
 'menjadi kecoh kepada sesetengah pihak.']


In [14]:
isi_penting = ['Anwar Ibrahim jadi perdana menteri', 'Muhyiddin cemburu jadi PM tepi',
              'PAS menggunakan isu sentimen kaum dan agama']

pprint(model.generate(
    isi_penting,
    mode='tajuk-surat-khabar',
    do_sample=True,
    max_length=256,
))

[': Pakatan Harapan akan terus mengotakan janjinya dengan memberikan sokongan '
 'kepada calon Pakatan Harapan, Datuk Seri Anwar Ibrahim, yang disifatkannya '
 '"tidak boleh diterima" oleh rakyat. Naib Pengerusi PKR tersebut mengingatkan '
 'pemimpin parti itu agar tidak menghentam parti dan pimpinan seperti '
 'Presiden-Presiden sesebuah parti berkenaan".']
