# MS to EN

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [Malaya/example/ms-en-translation](https://github.com/huseinzol05/Malaya/tree/master/example/ms-en-translation).
    
</div>

<div class="alert alert-warning">

This module was trained only on standard language structure, so it is not safe to use it on local language structure.
    
</div>

In [2]:
%%time

import malaya

CPU times: user 6.3 s, sys: 1.47 s, total: 7.77 s
Wall time: 10.3 s


### Load dictionary

```python
def dictionary(**kwargs):
    """
    Load dictionary {MS: EN} .

    Returns
    -------
    result: Dict[str, str]
    """
```

In [2]:
dictionary = malaya.translation.ms_en.dictionary()

In [4]:
dictionary.get('ayam')

'chicken'
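
Because the dictionary is documented as a plain `Dict[str, str]`, it can be used for quick word-by-word glosses. The sketch below is illustrative only: `gloss_words` is a hypothetical helper, not part of Malaya, and `sample_dictionary` is a one-entry stand-in for the real dictionary. It also assumes keys are stored in lowercase, which the source does not confirm.

```python
from typing import Dict, List


def gloss_words(text: str, ms_en: Dict[str, str]) -> List[str]:
    """Look up each whitespace-separated token; keep unknown tokens unchanged."""
    # Lowercasing before lookup is an assumption about how keys are stored.
    return [ms_en.get(token.lower(), token) for token in text.split()]


# Stand-in for the dict returned by malaya.translation.ms_en.dictionary();
# only the 'ayam' entry comes from the example above.
sample_dictionary = {'ayam': 'chicken'}

print(gloss_words('Ayam goreng', sample_dictionary))
```

Tokens that are missing from the dictionary pass through untouched, so the result is a rough gloss rather than a translation.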

### List available Transformer models

In [3]:
malaya.translation.ms_en.available_transformer()

| Model | Size (MB) | Quantized Size (MB) | BLEU | Suggested length |
|---|---|---|---|---|
| small | 42.7 | 13.4 | 0.626 | 256.0 |
| base | 234.0 | 82.7 | 0.792 | 256.0 |
| large | 815.0 | 244.0 | 0.714 | 256.0 |
| bigbird | 246.0 | 63.7 | 0.678 | 1024.0 |
| small-bigbird | 50.4 | 13.1 | 0.586 | 1024.0 |
| noisy-base | 234.0 | 82.7 | 0.792 | 256.0 |


The models were tested on 100k MS-EN sentences.
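
One way to read the table is to choose the highest-BLEU model that fits a size budget. The sketch below copies the figures listed above into a plain dict; `MODELS` and `pick_model` are hypothetical names for illustration and are not part of Malaya. Whether a given name is accepted by the loader should be checked against the `transformer` docstring.

```python
from typing import Dict, Optional, Tuple

# (size_mb, quantized_size_mb, bleu), copied from the table above.
MODELS: Dict[str, Tuple[float, float, float]] = {
    'small': (42.7, 13.4, 0.626),
    'base': (234.0, 82.7, 0.792),
    'large': (815.0, 244.0, 0.714),
    'bigbird': (246.0, 63.7, 0.678),
    'small-bigbird': (50.4, 13.1, 0.586),
    'noisy-base': (234.0, 82.7, 0.792),
}


def pick_model(max_size_mb: float, quantized: bool = False) -> Optional[str]:
    """Return the highest-BLEU model whose size fits the budget, or None."""
    size_index = 1 if quantized else 0
    candidates = [
        name for name, row in MODELS.items() if row[size_index] <= max_size_mb
    ]
    if not candidates:
        return None
    # On a BLEU tie, max() keeps the first candidate in table order.
    return max(candidates, key=lambda name: MODELS[name][2])


print(pick_model(100))                  # full-precision budget of 100 MB
print(pick_model(100, quantized=True))  # same budget, quantized sizes
```

BLEU is only one criterion; the suggested length column matters as well when inputs are long.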

### Load Transformer models

```python
def transformer(model: str = 'base', quantized: bool = False, **kwargs):
    """
    Load Transformer encoder-decoder model to translate MS-to-EN.

    Parameters
    ----------
    model : str, optional (default='base')
        Model architecture supported. Allowed values:

        * ``'small'`` - Transformer SMALL parameters.
        * ``'base'`` - Transformer BASE parameters.
        * ``'large'`` - Transformer LARGE parameters.
    
    quantized : bool, optional (default=False)
        if True, will load the 8-bit quantized model.
        A quantized model is not necessarily faster; it depends entirely on the machine.

    Returns
    -------
    result: malaya.model.tf.Translation class
    """
```

In [1]:
transformer = malaya.translation.ms_en.transformer()
transformer_small = malaya.translation.ms_en.transformer(model = 'small')
transformer_large = malaya.translation.ms_en.transformer(model = 'large')

### Load Quantized model

To load the 8-bit quantized model, pass `quantized = True`. The default is `False`.

Expect a slight drop in accuracy from the quantized model. It is not necessarily faster than the normal 32-bit float model; that depends entirely on the machine.

In [4]:
quantized_transformer = malaya.translation.ms_en.transformer(quantized = True)
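
Since the speed difference depends on the machine, it is worth measuring on your own hardware. The helper below is a minimal sketch: `time_decoder` is a hypothetical name, not a Malaya function, and it accepts any callable that maps a list of strings to a list of strings. A trivial stand-in decoder is used here so the snippet runs without loading a model.

```python
import time
from typing import Callable, List


def time_decoder(
    decode: Callable[[List[str]], List[str]],
    strings: List[str],
    repeats: int = 3,
) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float('inf')
    for _ in range(repeats):
        start = time.perf_counter()
        decode(strings)
        best = min(best, time.perf_counter() - start)
    return best


# Stand-in decoder so the sketch is self-contained; with the models loaded
# above you would pass transformer.greedy_decoder and
# quantized_transformer.greedy_decoder instead.
def fake_decoder(strings: List[str]) -> List[str]:
    return [s.upper() for s in strings]


elapsed = time_decoder(fake_decoder, ['ayam'])
print(f'best of 3: {elapsed:.6f} s')
```

Taking the best of several runs reduces the effect of one-off warm-up costs on the comparison.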



### Translate

#### Using greedy decoder

```python
def greedy_decoder(self, strings: List[str]):
    """
    translate list of strings.

    Parameters
    ----------
    strings : List[str]

    Returns
    -------
    result: List[str]
    """
```

#### Using beam decoder

```python
def beam_decoder(self, strings: List[str]):
    """
    translate list of strings using beam decoder, beam width size 3, alpha 0.5 .

    Parameters
    ----------
    strings : List[str]

    Returns
    -------
    result: List[str]
    """
```

**For better results, always split the input into sentences before translating.**
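
The advice above can be sketched as follows. This is a rough illustration, not Malaya's own sentence splitter: `split_sentences` and `translate_by_sentence` are hypothetical helpers, and the regex simply splits after `.`, `!` or `?` followed by whitespace, so abbreviations ending in a period would be split incorrectly. The `decode` argument stands for any decoder method such as `greedy_decoder`.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    return [part for part in parts if part]


def translate_by_sentence(
    text: str, decode: Callable[[List[str]], List[str]]
) -> str:
    """Translate each sentence separately, then join the results."""
    return ' '.join(decode(split_sentences(text)))


print(split_sentences('Saya suka ayam. Awak suka ikan!'))

# With a loaded model, the decoder method is passed in directly, e.g.
# translate_by_sentence(some_text, transformer.greedy_decoder)
```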

In [4]:
from pprint import pprint

In [5]:
# https://www.sinarharian.com.my/article/89678/BERITA/Politik/Saya-tidak-mahu-sentuh-isu-politik-Muhyiddin

string_news1 = 'TANGKAK - Tan Sri Muhyiddin Yassin berkata, beliau tidak mahu menyentuh mengenai isu politik buat masa ini, sebaliknya mahu menumpukan kepada soal kebajikan rakyat serta usaha merancakkan semula ekonomi negara yang terjejas berikutan pandemik Covid-19. Perdana Menteri menjelaskan perkara itu ketika berucap pada Majlis Bertemu Pemimpin bersama pemimpin masyarakat Dewan Undangan Negeri (DUN) Gambir di Dewan Serbaguna Bukit Gambir hari ini.'
pprint(string_news1)

('TANGKAK - Tan Sri Muhyiddin Yassin berkata, beliau tidak mahu menyentuh '
 'mengenai isu politik buat masa ini, sebaliknya mahu menumpukan kepada soal '
 'kebajikan rakyat serta usaha merancakkan semula ekonomi negara yang terjejas '
 'berikutan pandemik Covid-19. Perdana Menteri menjelaskan perkara itu ketika '
 'berucap pada Majlis Bertemu Pemimpin bersama pemimpin masyarakat Dewan '
 'Undangan Negeri (DUN) Gambir di Dewan Serbaguna Bukit Gambir hari ini.')


In [6]:
# https://www.sinarharian.com.my/article/90021/BERITA/Politik/Tun-Mahathir-Anwar-disaran-bersara-untuk-selesai-kemelut-politik

string_news2 = 'ALOR SETAR - Kemelut politik Pakatan Harapan (PH) belum berkesudahan apabila masih gagal memuktamadkan calon Perdana Menteri yang dipersetujui bersama. Ahli Parlimen Sik, Ahmad Tarmizi Sulaiman berkata, sehubungan itu pihaknya mencadangkan mantan Pengerusi Parti Pribumi Bersatu Malaysia (Bersatu), Tun Dr Mahathir Mohamad dan Presiden Parti Keadilan Rakyat (PKR), Datuk Seri Anwar Ibrahim mengundurkan diri daripada politik sebagai jalan penyelesaian.'
pprint(string_news2)

('ALOR SETAR - Kemelut politik Pakatan Harapan (PH) belum berkesudahan apabila '
 'masih gagal memuktamadkan calon Perdana Menteri yang dipersetujui bersama. '
 'Ahli Parlimen Sik, Ahmad Tarmizi Sulaiman berkata, sehubungan itu pihaknya '
 'mencadangkan mantan Pengerusi Parti Pribumi Bersatu Malaysia (Bersatu), Tun '
 'Dr Mahathir Mohamad dan Presiden Parti Keadilan Rakyat (PKR), Datuk Seri '
 'Anwar Ibrahim mengundurkan diri daripada politik sebagai jalan penyelesaian.')


In [7]:
string_news3 = 'Menteri Kanan (Kluster Keselamatan) Datuk Seri Ismail Sabri Yaakob berkata, kelonggaran itu diberi berikutan kerajaan menyedari masalah yang dihadapi mereka untuk memperbaharui dokumen itu. Katanya, selain itu, bagi rakyat asing yang pas lawatan sosial tamat semasa Perintah Kawalan Pergerakan (PKP) pula boleh ke pejabat Jabatan Imigresen yang terdekat untuk mendapatkan lanjutan tempoh.'
pprint(string_news3)

('Menteri Kanan (Kluster Keselamatan) Datuk Seri Ismail Sabri Yaakob berkata, '
 'kelonggaran itu diberi berikutan kerajaan menyedari masalah yang dihadapi '
 'mereka untuk memperbaharui dokumen itu. Katanya, selain itu, bagi rakyat '
 'asing yang pas lawatan sosial tamat semasa Perintah Kawalan Pergerakan (PKP) '
 'pula boleh ke pejabat Jabatan Imigresen yang terdekat untuk mendapatkan '
 'lanjutan tempoh.')


In [8]:
# https://qcikgubm.blogspot.com/2018/02/contoh-soalan-dan-jawapan-karangan.html

string_karangan = 'Selain itu, pameran kerjaya membantu para pelajar menentukan kerjaya yang akan diceburi oleh mereka. Seperti yang kita ketahui, pasaran kerjaya di Malaysia sangat luas dan masih banyak sektor pekerjaan di negara ini yang masih kosong kerana sukar untuk mencari tenaga kerja yang benar-benar berkelayakan. Sebagai contohnya, sektor perubatan di Malaysia menghadapi masalah kekurangan tenaga kerja yang kritikal, khususnya tenaga pakar disebabkan peletakan jawatan oleh doktor dan pakar perubatan untuk memasuki sektor swasta serta berkembangnya perkhidmatan kesihatan dan perubatan. Setelah menyedari  hakikat ini, para pelajar akan lebih berminat untuk menceburi bidang perubatan kerana pameran kerjaya yang dilaksanakan amat membantu memberikan pengetahuan am tentang kerjaya ini'
pprint(string_karangan)

('Selain itu, pameran kerjaya membantu para pelajar menentukan kerjaya yang '
 'akan diceburi oleh mereka. Seperti yang kita ketahui, pasaran kerjaya di '
 'Malaysia sangat luas dan masih banyak sektor pekerjaan di negara ini yang '
 'masih kosong kerana sukar untuk mencari tenaga kerja yang benar-benar '
 'berkelayakan. Sebagai contohnya, sektor perubatan di Malaysia menghadapi '
 'masalah kekurangan tenaga kerja yang kritikal, khususnya tenaga pakar '
 'disebabkan peletakan jawatan oleh doktor dan pakar perubatan untuk memasuki '
 'sektor swasta serta berkembangnya perkhidmatan kesihatan dan perubatan. '
 'Setelah menyedari  hakikat ini, para pelajar akan lebih berminat untuk '
 'menceburi bidang perubatan kerana pameran kerjaya yang dilaksanakan amat '
 'membantu memberikan pengetahuan am tentang kerjaya ini')


In [9]:
# https://www.parlimen.gov.my/bills-dewan-rakyat.html?uweb=dr#, RUU Kumpulan Wang Simpanan Pekerja (Pindaan) 2019

string_parlimen = 'Subfasal 6(b) bertujuan untuk memasukkan subseksyen baharu 39(3) dan (4) ke dalam Akta 452. Subseksyen (3) yang dicadangkan bertujuan untuk menjadikan suatu kesalahan bagi mana-mana orang yang meninggalkan Malaysia tanpa membayar caruman yang tertunggak dan kena dibayar atau mengemukakan jaminan bagi pembayarannya. Subseksyen (4) yang dicadangkan memperuntukkan bahawa bagi maksud seksyen 39 Akta 452, “caruman” termasuklah apa-apa dividen atau caj lewat bayar yang kena dibayar ke atas mana-mana caruman.'
pprint(string_parlimen)

('Subfasal 6(b) bertujuan untuk memasukkan subseksyen baharu 39(3) dan (4) ke '
 'dalam Akta 452. Subseksyen (3) yang dicadangkan bertujuan untuk menjadikan '
 'suatu kesalahan bagi mana-mana orang yang meninggalkan Malaysia tanpa '
 'membayar caruman yang tertunggak dan kena dibayar atau mengemukakan jaminan '
 'bagi pembayarannya. Subseksyen (4) yang dicadangkan memperuntukkan bahawa '
 'bagi maksud seksyen 39 Akta 452, “caruman” termasuklah apa-apa dividen atau '
 'caj lewat bayar yang kena dibayar ke atas mana-mana caruman.')


In [10]:
string_random1 = 'saya menikmati filem mengenai makhluk asing yang menyerang bumi. <> Saya fikir fiksyen sains adalah genre yang luar biasa untuk apa sahaja. Sains masa depan, teknologi, perjalanan masa, perjalanan FTL, semuanya adalah konsep yang menarik. <> Saya sendiri peminat fiksyen sains!'
pprint(string_random1)

('saya menikmati filem mengenai makhluk asing yang menyerang bumi. <> Saya '
 'fikir fiksyen sains adalah genre yang luar biasa untuk apa sahaja. Sains '
 'masa depan, teknologi, perjalanan masa, perjalanan FTL, semuanya adalah '
 'konsep yang menarik. <> Saya sendiri peminat fiksyen sains!')


In [11]:
string_random2 = 'Fiksyen sains <> saya menikmati filem mengenai makhluk asing yang menyerang bumi. <> Fiksyen sains (sering dipendekkan menjadi SF atau sci-fi) adalah genre fiksyen spekulatif, biasanya berurusan dengan konsep khayalan seperti sains dan teknologi futuristik, perjalanan angkasa, perjalanan waktu, lebih cepat daripada perjalanan ringan, alam semesta selari, dan kehidupan di luar bumi .'
pprint(string_random2)

('Fiksyen sains <> saya menikmati filem mengenai makhluk asing yang menyerang '
 'bumi. <> Fiksyen sains (sering dipendekkan menjadi SF atau sci-fi) adalah '
 'genre fiksyen spekulatif, biasanya berurusan dengan konsep khayalan seperti '
 'sains dan teknologi futuristik, perjalanan angkasa, perjalanan waktu, lebih '
 'cepat daripada perjalanan ringan, alam semesta selari, dan kehidupan di luar '
 'bumi .')


#### Translate transformer base

In [13]:
%%time

pprint(transformer.greedy_decoder([string_news1, string_news2, string_news3]))

['TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on '
 'political issues at the moment, instead focusing on the welfare of the '
 "people and efforts to revitalize the affected country's economy following "
 'the Covid-19 pandemic. The prime minister explained the matter when speaking '
 'at a Leadership Meeting with Gambir State Assembly (DUN) leaders at the '
 'Bukit Gambir Multipurpose Hall today.',
 'ALOR SETAR - Pakatan Harapan (PH) political turmoil has not ended when it '
 "has failed to finalize the Prime Minister's candidate agreed upon. Sik MP "
 'Ahmad Tarmizi Sulaiman said he had suggested former United Nations (UN) '
 "Indigenous Party chairman Tun Dr Mahathir Mohamad and People's Justice Party "
 '(PKR) president Datuk Seri Anwar Ibrahim resign from politics as a solution.',
 'Senior Minister (Security Cluster) Datuk Seri Ismail Sabri Yaakob said the '
 'relaxation was given as the government was aware of the problems they had to '
 'renew the document. 

In [14]:
%%time

pprint(quantized_transformer.greedy_decoder([string_news1, string_news2, string_news3]))

['TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on '
 'political issues at the moment, instead focusing on the welfare of the '
 "people and efforts to revitalize the affected country's economy following "
 'the Covid-19 pandemic. The prime minister explained the matter when speaking '
 'at a Leadership Meeting with Gambir State Assembly (DUN) leaders at the '
 'Bukit Gambir Multipurpose Hall today.',
 'ALOR SETAR - Pakatan Harapan (PH) political turmoil has not ended when it '
 "has failed to finalize the Prime Minister's candidate agreed upon. Sik MP "
 'Ahmad Tarmizi Sulaiman said he had suggested former United Nations (UN) '
 "Indigenous Party chairman Tun Dr Mahathir Mohamad and People's Justice Party "
 '(PKR) president Datuk Seri Anwar Ibrahim resign from politics as a solution.',
 'Senior Minister (Security Cluster) Datuk Seri Ismail Sabri Yaakob said the '
 'relaxation was given as the government was aware of the problems they had to '
 'renew the document. 

In [15]:
%%time

pprint(transformer.greedy_decoder([string_karangan, string_parlimen]))

['In addition, career exhibitions help students determine their careers. As we '
 'know, the career market in Malaysia is very broad and there are still many '
 'job sectors in the country that are still vacant because it is difficult to '
 'find a truly qualified workforce. For example, the medical sector in '
 'Malaysia is facing a critical shortage of labor, especially specialists due '
 'to the resignation of doctors and physicians to enter the private sector and '
 'develop health and medical services. Upon realizing this fact, students will '
 'be more interested in medicine because the exhibition careers are very '
 'helpful in providing general knowledge of this career.',
 'Subclause 6 (b) seeks to introduce new subsections 39 (3) and (4) into Act '
 '452. Subsection (3) proposed to make an offense for any person leaving '
 'Malaysia without paying a deferred and payable contribution or filing a '
 'guarantee for payment. Subsection (4) proposed provides that for the purpose '


In [16]:
%%time

pprint(quantized_transformer.greedy_decoder([string_karangan, string_parlimen]))

['In addition, career exhibitions help students determine their careers. As we '
 'know, the career market in Malaysia is very broad and there are still many '
 'job sectors in the country that are still vacant because it is difficult to '
 'find a truly qualified workforce. For example, the medical sector in '
 'Malaysia is facing a critical shortage of labor, especially specialists due '
 'to the resignation of doctors and physicians to enter the private sector and '
 'develop health and medical services. Upon realizing this fact, students will '
 'be more interested in the medical field as the career exhibitions are very '
 'helpful to provide general knowledge of this career.',
 'Subclause 6 (b) seeks to introduce new subsections 39 (3) and (4) into Act '
 '452. Subsection (3) proposed to make an offense for any person leaving '
 'Malaysia without paying a deferred and payable contribution or to submit a '
 'guarantee for his payment. Subsection (4) proposed provides that for the '

In [17]:
%%time

result = transformer.greedy_decoder([string_random1, string_random2])
pprint(result)

['I enjoy movies about aliens attacking the earth. <> I think science fiction '
 'is an incredible genre for anything. Future science, technology, time '
 "travel, FTL travel, everything is an exciting concept. <> I'm a science "
 'fiction fan!',
 'Science fiction <> I enjoy movies about aliens invading the earth. <> '
 'Science fiction (often shortened to SF or sci-fi) is a genre of speculative '
 'fiction, usually dealing with imaginary concepts such as science and '
 'futuristic technology, space travel, time travel, faster than light travel, '
 'parallel universe, and life abroad.']
CPU times: user 19.1 s, sys: 10.3 s, total: 29.5 s
Wall time: 6.77 s


In [18]:
%%time

result = quantized_transformer.greedy_decoder([string_random1, string_random2])
pprint(result)

['I enjoy movies about aliens attacking the earth. <> I think science fiction '
 'is an incredible genre for anything. Future science, technology, time '
 "travel, FTL travel, everything is an exciting concept. <> I'm a science "
 'fiction fan!',
 'Science fiction <> I enjoy movies about aliens invading the earth. <> '
 'Science fiction (often shortened to SF or sci-fi) is a genre of speculative '
 'fiction, usually dealing with imaginary concepts such as science and '
 'futuristic technology, space travel, time travel, faster than light travel, '
 'parallel universe, and life abroad.']
CPU times: user 19.2 s, sys: 11.7 s, total: 30.9 s
Wall time: 6.66 s


#### Translate transformer small

In [19]:
%%time

pprint(transformer_small.greedy_decoder([string_news1, string_news2, string_news3]))

['TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on '
 'political issues at this time, instead focusing on the welfare of the people '
 "and efforts to revitalize the country's economy affected following the "
 'Covid-19 pandemic. The Prime Minister explained the matter when speaking at '
 'the Leaders Meeting with the leaders of the Gambir State Assembly (DUN) '
 'community at the Bukit Gambir Multipurpose Hall today.',
 'ALOR SETAR - Pakatan Harapan (PH) political turmoil has not been expected '
 "when it still fails to finalize the Prime Minister's candidate agreed "
 'together. Sik MP Ahmad Tarmizi Sulaiman said the party had suggested former '
 'United Nations Indigenous Party (UN) chairman Tun Dr Mahathir Mohamad and '
 "President of the People's Justice Party (PKR), Datuk Seri Anwar Ibrahim "
 'resigned from politics as a solution.',
 'Senior Minister (Security Cluster) Datuk Seri Ismail Sabri Yaakob said the '
 'relaxation was given as the government was aware

In [20]:
%%time

pprint(transformer_small.greedy_decoder([string_karangan, string_parlimen]))

['In addition, career exhibitions help students determine their careers. As we '
 'know, the career market in Malaysia is very broad and many employment '
 'sectors in the country are still vacant because it is difficult to find a '
 'truly qualified workforce. For example, the medical sector in Malaysia is '
 'facing critical labor shortages, especially specialists as specialists '
 'resign as well as medical professionals to enter the private sector and the '
 'development of health and medical services. After realizing this fact that '
 'students will be more interested in medicine as the exhibition of careers is '
 'helping to provide general knowledge of this career in this career as this '
 'career is especially in providing general knowledge of this career as this '
 'career as this career is difficult to gain knowledge of this career as it is '
 'difficult to gain knowledge of this career is difficult to find the career '
 'as it is difficult to find the career as it is difficu

In [21]:
%%time

result = transformer_small.greedy_decoder([string_random1, string_random2])
pprint(result)

['I enjoy movies about aliens attacking the earth. <> I think science fiction '
 'is a great genre for whatever future science, technology, travel, FTL '
 'travel, all of which is an interesting concept. <> I personally love science '
 'fiction!',
 'science fiction <> I enjoy movies about aliens who attack the earth. <> The '
 'science fiction (often shortened to SF or sci-fi) is a speculative fiction '
 'genre, usually dealing with the concept of imaginary science and futuristic '
 'technology, space travel, travel, faster than light travel, parallel '
 'universe, and outer life.']
CPU times: user 2.64 s, sys: 402 ms, total: 3.04 s
Wall time: 773 ms


### Compare with Google Translate using googletrans

Install it with:

```bash
pip3 install googletrans==4.0.0rc1
```

In [12]:
from googletrans import Translator

translator = Translator()

In [13]:
r = translator.translate(string_news1, src='ms', dest = 'en')
print(r.text)

TANGKAK - Tan Sri Muhyiddin Yassin said he did not want to touch on political issues at this time, instead of focusing on the welfare of the people and efforts to regenerate the country's economy following the Covid -19 pandemic.The prime minister explained the matter when speaking at a ceremony with a leader of the Gambir State Assembly (DUN) community leader at the Bukit Gambir Multipurpose Hall today.


In [14]:
r = translator.translate(string_news2, src='ms', dest = 'en')
print(r.text)

ALOR SETAR - The Pakatan Harapan (PH) political turmoil has not ended when it fails to finalize the agreed prime ministerial candidate.Sik Member of Parliament Ahmad Tarmizi Sulaiman said he had suggested former United Indigenous Party (UN) chairman Tun Dr Mahathir Mohamad and the People's Justice Party (PKR) president Datuk Seri Anwar Ibrahim resigned from politics as a solution.


In [15]:
r = translator.translate(string_news3, src='ms', dest = 'en')
print(r.text)

Senior Minister (Security Cluster) Datuk Seri Ismail Sabri Yaakob said the relaxation was given as the government was aware of the problem they had to renew the document.He said, for foreigners, the social visit ended during the Movement Control Order (CPP) could go to the nearest Immigration Department's office for extension.


In [16]:
r = translator.translate(string_karangan, src='ms', dest = 'en')
print(r.text)

In addition, career exhibitions help students determine the careers they will be involved in.As we know, the career market in Malaysia is very broad and there are still many employment sectors in the country that are still vacant because it is difficult to find a truly qualified workforce.For example, the medical sector in Malaysia is facing critical workforce problems, especially experts due to the resignation of doctors and physicians to enter the private sector as well as the growth of health and medical services.Upon realizing this fact, students will be more interested in getting into medicine because their career exhibitions are very helpful in providing general knowledge of this career


In [18]:
r = translator.translate(string_parlimen, src='ms', dest = 'en')
print(r.text)

Subfasal 6 (b) aims to include new subsections 39 (3) and (4) into Act 452. Subsection (3) proposed to make an offense for any person leaving Malaysia without paying an outstanding and payable contributionor submit a guarantee for his payment.Subsection (4) proposed providing that for the purposes of section 39 of Act 452, "Contribution" includes any dividends or late payments payable on any contribution.
