This project is an implementation of an English Part-of-Speech (POS) tagger using the Viterbi algorithm. The POS tagger is designed to label each word in a sentence with its corresponding part of speech (e.g., noun, verb, adjective).
The tagger is a hidden Markov model (HMM) decoded with the Viterbi algorithm, a dynamic programming algorithm that finds the most likely sequence of hidden states (here, POS tags) given a sequence of observed events (words). The model is trained on a tagged corpus and then used to predict the POS tags for new sentences.
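The train-then-decode idea described above can be illustrated with a minimal, self-contained sketch. It is not the code in `code/model.py` or `code/viterbi_decoding.py`: the function names `train_hmm` and `viterbi`, the add-alpha smoothing, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences, alpha=1.0):
    """Estimate HMM log-probabilities by counting (hypothetical helper).

    tagged_sentences: iterable of (words, tags) pairs.
    alpha: add-alpha smoothing constant (an assumption, not the repo's choice).
    """
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    vocab = set()
    for words, sent_tags in tagged_sentences:
        prev = None
        for word, tag in zip(words, sent_tags):
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(emit)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unseen words

    def log_p(counter, key, n_outcomes):
        total = sum(counter.values())
        return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

    def log_start(tag):
        return log_p(start, tag, n_tags)

    def log_trans(prev_tag, tag):
        return log_p(trans[prev_tag], tag, n_tags)

    def log_emit(tag, word):
        return log_p(emit[tag], word, n_words)

    return tags, log_start, log_trans, log_emit


def viterbi(words, tags, log_start, log_trans, log_emit):
    """Return the most likely tag sequence for `words`."""
    if not words:
        return []
    # best[i][t]: log-probability of the best path for words[:i + 1] ending in tag t
    best = [{t: log_start(t) + log_emit(t, words[0]) for t in tags}]
    back = []  # back[i][t]: best previous tag when words[i + 1] is tagged t
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[-1][p] + log_trans(p, t))
            scores[t] = best[-1][prev] + log_trans(prev, t) + log_emit(t, word)
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy corpus, invented for this example only.
corpus = [
    (["the", "dog", "barks"], ["DT", "NN", "VB"]),
    (["the", "cat", "sleeps"], ["DT", "NN", "VB"]),
    (["a", "dog", "sleeps"], ["DT", "NN", "VB"]),
]
model = train_hmm(corpus)
print(viterbi(["the", "cat", "barks"], *model))  # ['DT', 'NN', 'VB']
```

Working in log space avoids numeric underflow on long sentences, and the two nested loops over tags make each word cost time quadratic in the number of tags.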
- Efficient POS tagging: Viterbi decoding finds the most likely tag sequence without enumerating every candidate sequence.
- Customizable: Can be retrained with different corpora to improve accuracy for specific domains.
- Easy to use: Simple interface for tagging new sentences.
To install and run the POS tagger, follow these steps:
- Clone the repository:

      git clone https://github.com/yourusername/english-pos-tagger.git
      cd english-pos-tagger
- Install the required dependencies:

      pip install -r requirements.txt
To use the POS tagger, you can try my app (see `app.py` in the repository root).
| Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Accuracy | | | 0.51 | 21961 |
| Macro Avg | 0.48 | 0.45 | 0.46 | 21961 |
| Weighted Avg | 0.54 | 0.51 | 0.52 | 21961 |
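
To make the rows of the table concrete, here is a small sketch of how accuracy, macro-averaged, and weighted-averaged scores can be computed from gold and predicted tags. The function name `summarize` and the four-token example are hypothetical, and whether the figures above were produced this way is an assumption; the table layout only resembles a standard classification report.

```python
from collections import Counter


def summarize(gold, pred):
    """Return accuracy plus macro and weighted (precision, recall, F1) averages."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted tag p was wrong
            fn[g] += 1  # gold tag g was missed
    support = Counter(gold)  # number of gold tokens per tag
    rows = {}
    for tag in sorted(set(gold) | set(pred)):
        prec = tp[tag] / (tp[tag] + fp[tag]) if tp[tag] + fp[tag] else 0.0
        rec = tp[tag] / (tp[tag] + fn[tag]) if tp[tag] + fn[tag] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        rows[tag] = (prec, rec, f1)
    n = len(gold)
    # Macro: every tag counts equally. Weighted: tags count by their support.
    macro = tuple(sum(r[i] for r in rows.values()) / len(rows) for i in range(3))
    weighted = tuple(sum(rows[t][i] * support[t] for t in rows) / n for i in range(3))
    return {"accuracy": sum(tp.values()) / n, "macro avg": macro, "weighted avg": weighted}


# Invented example: one of four tokens is mis-tagged.
report = summarize(["NN", "NN", "VB", "DT"], ["NN", "VB", "VB", "DT"])
print(report["accuracy"])  # 0.75
```

Weighted-average recall always equals accuracy under this definition, which matches the 0.51 shown in both places in the table.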
If you want to train the model on a different corpus, pass the dataset to the `HMM` class in `code/model.py`:

    from code.model import HMM
    import pickle

    # `dataset` is your training data, in the structure described below
    model = HMM(dataset)

    # Save the trained model for later use
    with open('/kaggle/working/hmm_model.pkl', 'wb') as f:
        pickle.dump(model, f)

The dataset should have the following structure:
data.json

    [
        {
            "index" -> int: {0, 1, 2, ...},
            "sentence" -> list[str]: {["word", ...]},
            "labels" -> list[str]: {["VB", "NN"]}
        },
        ...
    ]
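Before training on a new corpus, it can help to check that the file matches this structure. The sketch below is a hypothetical helper, not part of the repository: the name `load_dataset` is invented, and it assumes each sentence carries exactly one label per word. Whether `HMM` accepts the resulting list directly or expects a wrapper from `code/dataset.py` is not stated here, so treat the return value as raw records.

```python
import json


def load_dataset(path):
    """Load a JSON list of records and validate their basic shape."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for rec in records:
        missing = {"index", "sentence", "labels"} - rec.keys()
        if missing:
            raise ValueError(f"record is missing keys: {sorted(missing)}")
        # Assumption: one POS label per word in the sentence.
        if len(rec["sentence"]) != len(rec["labels"]):
            raise ValueError(
                f"record {rec['index']}: {len(rec['sentence'])} words "
                f"but {len(rec['labels'])} labels"
            )
    return records
```

For example, `load_dataset("input/train.json")` would return the list of records, or raise a `ValueError` naming the first malformed one.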
ENGLISH_POS_TAGGER/
├── code/
│ ├── __pycache__/
│ ├── __init__.py
│ ├── dataset.py
│ ├── model.py
│ └── viterbi_decoding.py
├── input/
│ ├── dev.json
│ └── train.json
├── app.py
├── english-pos-tagger-viterbi.ipynb
├── hmm_model.pkl
├── README.md
└── requirements.txt
- numpy
- pandas
To install the dependencies, run:

    pip install -r requirements.txt

Thank you for using the English POS Tagger! If you have any questions or issues, please open an issue in the repository.