# Part-of-Speech Recognition

<div class="alert alert-info">

This tutorial is available as an IPython notebook at [Malaya/example/part-of-speech](https://github.com/huseinzol05/Malaya/tree/master/example/part-of-speech).
    
</div>

<div class="alert alert-warning">

This module was trained only on standard language structure, so it is not safe to use on local language structure.
    
</div>

In [1]:
%%time
import malaya

CPU times: user 2.83 s, sys: 3.88 s, total: 6.71 s
Wall time: 1.95 s




### Describe supported POS

In [2]:
malaya.pos.describe

[{'Tag': 'ADJ', 'Description': 'Adjective, kata sifat'},
 {'Tag': 'ADP', 'Description': 'Adposition'},
 {'Tag': 'ADV', 'Description': 'Adverb, kata keterangan'},
 {'Tag': 'ADX', 'Description': 'Auxiliary verb, kata kerja tambahan'},
 {'Tag': 'CCONJ', 'Description': 'Coordinating conjuction, kata hubung'},
 {'Tag': 'DET', 'Description': 'Determiner, kata penentu'},
 {'Tag': 'NOUN', 'Description': ' Noun, kata nama'},
 {'Tag': 'NUM', 'Description': 'Number, nombor'},
 {'Tag': 'PART', 'Description': 'Particle'},
 {'Tag': 'PRON', 'Description': 'Pronoun, kata ganti'},
 {'Tag': 'PROPN', 'Description': 'Proper noun, kata ganti nama khas'},
 {'Tag': 'SCONJ', 'Description': 'Subordinating conjunction'},
 {'Tag': 'SYM', 'Description': 'Symbol'},
 {'Tag': 'VERB', 'Description': 'Verb, kata kerja'},
 {'Tag': 'X', 'Description': 'Other'}]
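
Because the listing is a plain list of dictionaries, it can be turned into a lookup table for explaining predicted tags. The following is a minimal sketch that uses a few entries copied from the output above; the names `pos_tags`, `tag_to_description`, and `explain` are illustrative and not part of Malaya.

```python
# A subset of the entries shown in the `malaya.pos.describe` output.
pos_tags = [
    {'Tag': 'ADJ', 'Description': 'Adjective, kata sifat'},
    {'Tag': 'NOUN', 'Description': ' Noun, kata nama'},
    {'Tag': 'PROPN', 'Description': 'Proper noun, kata ganti nama khas'},
    {'Tag': 'VERB', 'Description': 'Verb, kata kerja'},
]

# Map tag -> description; strip() removes stray whitespace such as the
# leading space in the NOUN description.
tag_to_description = {row['Tag']: row['Description'].strip() for row in pos_tags}


def explain(tag: str) -> str:
    """Return the description for a tag, or a fallback for unlisted tags."""
    return tag_to_description.get(tag, 'Unknown tag')


print(explain('NOUN'))   # Noun, kata nama
print(explain('PUNCT'))  # Unknown tag
```

The fallback is useful in practice: the sample prediction later in this tutorial contains a `PUNCT` tag, which does not appear in the listing above.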

### List available HuggingFace POS models

In [3]:
malaya.pos.available_huggingface

{'mesolitica/pos-t5-tiny-standard-bahasa-cased': {'Size (MB)': 84.7,
  'PART': {'precision': 0.8938547486033519,
   'recall': 0.9411764705882353,
   'f1': 0.9169054441260744,
   'number': 170},
  'CCONJ': {'precision': 0.9713905522288756,
   'recall': 0.9785522788203753,
   'f1': 0.974958263772955,
   'number': 1492},
  'ADJ': {'precision': 0.9192897497982244,
   'recall': 0.88984375,
   'f1': 0.9043271139341008,
   'number': 1280},
  'ADP': {'precision': 0.9770908087220536,
   'recall': 0.9844271412680756,
   'f1': 0.9807452555755645,
   'number': 3596},
  'ADV': {'precision': 0.9478672985781991,
   'recall': 0.9523809523809523,
   'f1': 0.9501187648456056,
   'number': 1260},
  'VERB': {'precision': 0.9654357459379616,
   'recall': 0.9662921348314607,
   'f1': 0.9658637505541599,
   'number': 3382},
  'DET': {'precision': 0.9603854389721628,
   'recall': 0.9542553191489361,
   'f1': 0.9573105656350054,
   'number': 940},
  'NOUN': {'precision': 0.8789933694996986,
   'recall': 0.8976
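
The per-tag metrics can be compared with ordinary dictionary operations. The sketch below assumes the shape visible in the listing (a scalar `'Size (MB)'` entry next to one nested dictionary per tag) and uses only three tags copied from it, so the result applies to that subset only. The helper `per_tag_f1` is illustrative and not part of Malaya.

```python
# Metrics copied from the listing for 'mesolitica/pos-t5-tiny-standard-bahasa-cased'
# (three tags only; the remaining entries are omitted here).
tiny_metrics = {
    'Size (MB)': 84.7,
    'PART': {'precision': 0.8938547486033519, 'recall': 0.9411764705882353,
             'f1': 0.9169054441260744, 'number': 170},
    'CCONJ': {'precision': 0.9713905522288756, 'recall': 0.9785522788203753,
              'f1': 0.974958263772955, 'number': 1492},
    'ADJ': {'precision': 0.9192897497982244, 'recall': 0.88984375,
            'f1': 0.9043271139341008, 'number': 1280},
}


def per_tag_f1(metrics: dict) -> dict:
    """Keep only nested per-tag entries and return a tag -> f1 mapping."""
    return {tag: m['f1'] for tag, m in metrics.items() if isinstance(m, dict)}


f1_by_tag = per_tag_f1(tiny_metrics)

# The tag with the lowest F1 among the copied entries.
weakest = min(f1_by_tag, key=f1_by_tag.get)
print(weakest)  # ADJ
```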

In [5]:
string = 'KUALA LUMPUR: Sempena sambutan Aidilfitri minggu depan, Perdana Menteri Tun Dr Mahathir Mohamad dan Menteri Pengangkutan Anthony Loke Siew Fook menitipkan pesanan khas kepada orang ramai yang mahu pulang ke kampung halaman masing-masing. Dalam video pendek terbitan Jabatan Keselamatan Jalan Raya (JKJR) itu, Dr Mahathir menasihati mereka supaya berhenti berehat dan tidur sebentar  sekiranya mengantuk ketika memandu.'

### Load HuggingFace model

```python
def huggingface(
    model: str = 'mesolitica/pos-t5-small-standard-bahasa-cased',
    force_check: bool = True,
    **kwargs,
):
    """
    Load a HuggingFace model for Part-of-Speech Recognition.

    Parameters
    ----------
    model: str, optional (default='mesolitica/pos-t5-small-standard-bahasa-cased')
        Check available models at `malaya.pos.available_huggingface`.
    force_check: bool, optional (default=True)
        Force check that the model is one of the Malaya models.
        Set to False if you have your own HuggingFace model.

    Returns
    -------
    result: malaya.torch_model.huggingface.Tagging
    """
```

In [9]:
model = malaya.pos.huggingface()

#### Predict

```python
def predict(self, string: str):
    """
    Tag a string.

    Parameters
    ----------
    string : str

    Returns
    -------
    result: List[Tuple[str, str]]
    """
```
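
Each prediction is a `(token, tag)` pair, so the result can be filtered with a list comprehension. The sketch below uses a few pairs copied from the sample output in this tutorial. Tokens keep their attached punctuation (for example `'LUMPUR:'`), so the hypothetical helper `tokens_with_tag` strips it from both ends; this is one possible post-processing step and not something Malaya does for you.

```python
from string import punctuation

# (token, tag) pairs copied from the sample output of `model.predict`.
tagged = [
    ('KUALA', 'PROPN'),
    ('LUMPUR:', 'PROPN'),
    ('Sempena', 'PROPN'),
    ('sambutan', 'NOUN'),
    ('Aidilfitri', 'PROPN'),
    ('minggu', 'NOUN'),
    ('depan,', 'ADJ'),
]


def tokens_with_tag(pairs, wanted):
    """Return tokens tagged `wanted`, with leading/trailing punctuation removed."""
    return [token.strip(punctuation) for token, tag in pairs if tag == wanted]


print(tokens_with_tag(tagged, 'NOUN'))   # ['sambutan', 'minggu']
print(tokens_with_tag(tagged, 'PROPN'))  # ['KUALA', 'LUMPUR', 'Sempena', 'Aidilfitri']
```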

In [7]:
model.predict(string)

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


[('KUALA', 'PROPN'),
 ('LUMPUR:', 'PROPN'),
 ('Sempena', 'PROPN'),
 ('sambutan', 'NOUN'),
 ('Aidilfitri', 'PROPN'),
 ('minggu', 'NOUN'),
 ('depan,', 'ADJ'),
 ('Perdana', 'PROPN'),
 ('Menteri', 'PROPN'),
 ('Tun', 'PROPN'),
 ('Dr', 'PROPN'),
 ('Mahathir', 'PROPN'),
 ('Mohamad', 'PROPN'),
 ('dan', 'CCONJ'),
 ('Menteri', 'PROPN'),
 ('Pengangkutan', 'PROPN'),
 ('Anthony', 'PROPN'),
 ('Loke', 'PROPN'),
 ('Siew', 'PROPN'),
 ('Fook', 'PROPN'),
 ('menitipkan', 'VERB'),
 ('pesanan', 'NOUN'),
 ('khas', 'ADJ'),
 ('kepada', 'ADP'),
 ('orang', 'NOUN'),
 ('ramai', 'NOUN'),
 ('yang', 'PRON'),
 ('mahu', 'ADV'),
 ('pulang', 'VERB'),
 ('ke', 'ADP'),
 ('kampung', 'NOUN'),
 ('halaman', 'NOUN'),
 ('masing-masing.', 'DET'),
 ('Dalam', 'ADP'),
 ('video', 'NOUN'),
 ('pendek', 'ADJ'),
 ('terbitan', 'NOUN'),
 ('Jabatan', 'PROPN'),
 ('Keselamatan', 'PROPN'),
 ('Jalan', 'PROPN'),
 ('Raya', 'PROPN'),
 ('(JKJR)', 'PUNCT'),
 ('itu,', 'DET'),
 ('Dr', 'PROPN'),
 ('Mahathir', 'PROPN'),
 ('menasihati', 'VERB'),
 ('mereka

#### Group similar tags

```python
def analyze(self, string: str):
    """
    Analyze a string.

    Parameters
    ----------
    string : str

    Returns
    -------
    result: List[Dict], e.g. [{'text': ['text'], 'type': 'PROPN', 'score': 1.0, 'beginOffset': 0, 'endOffset': 1}]
    """
```
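
Conceptually, the grouping merges consecutive tokens that share a tag. The sketch below reproduces that idea with `itertools.groupby`, assuming the offsets count tokens, as the sample output of `model.analyze` in this tutorial suggests. It is not Malaya's implementation, and it omits the `score` field because the tutorial does not show how that value is computed. The helper name `group_consecutive` is illustrative.

```python
from itertools import groupby

# (token, tag) pairs copied from the sample output of `model.predict`.
tagged = [
    ('KUALA', 'PROPN'),
    ('LUMPUR:', 'PROPN'),
    ('Sempena', 'PROPN'),
    ('sambutan', 'NOUN'),
    ('Aidilfitri', 'PROPN'),
    ('minggu', 'NOUN'),
    ('depan,', 'ADJ'),
]


def group_consecutive(pairs):
    """Merge runs of adjacent tokens with the same tag into one group each."""
    groups = []
    offset = 0  # index of the first token in the current run
    for tag, run in groupby(pairs, key=lambda pair: pair[1]):
        words = [token for token, _ in run]
        groups.append({
            'text': words,
            'type': tag,
            'beginOffset': offset,
            'endOffset': offset + len(words),
        })
        offset += len(words)
    return groups


groups = group_consecutive(tagged)
print(groups[0])
# {'text': ['KUALA', 'LUMPUR:', 'Sempena'], 'type': 'PROPN', 'beginOffset': 0, 'endOffset': 3}
```

Note that `groupby` only merges adjacent items, so the two `NOUN` tokens separated by `Aidilfitri` stay in separate groups.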

In [8]:
model.analyze(string)

[{'text': ['KUALA', 'LUMPUR:', 'Sempena'],
  'type': 'PROPN',
  'score': 1.0,
  'beginOffset': 0,
  'endOffset': 3},
 {'text': ['sambutan'],
  'type': 'NOUN',
  'score': 1.0,
  'beginOffset': 3,
  'endOffset': 4},
 {'text': ['Aidilfitri'],
  'type': 'PROPN',
  'score': 1.0,
  'beginOffset': 4,
  'endOffset': 5},
 {'text': ['minggu'],
  'type': 'NOUN',
  'score': 1.0,
  'beginOffset': 5,
  'endOffset': 6},
 {'text': ['depan,'],
  'type': 'ADJ',
  'score': 1.0,
  'beginOffset': 6,
  'endOffset': 7},
 {'text': ['Perdana', 'Menteri', 'Tun', 'Dr', 'Mahathir', 'Mohamad'],
  'type': 'PROPN',
  'score': 1.0,
  'beginOffset': 7,
  'endOffset': 13},
 {'text': ['dan'],
  'type': 'CCONJ',
  'score': 1.0,
  'beginOffset': 13,
  'endOffset': 14},
 {'text': ['Menteri', 'Pengangkutan', 'Anthony', 'Loke', 'Siew', 'Fook'],
  'type': 'PROPN',
  'score': 1.0,
  'beginOffset': 14,
  'endOffset': 20},
 {'text': ['menitipkan'],
  'type': 'VERB',
  'score': 1.0,
  'beginOffset': 20,
  'endOffset': 21},
 {'tex