
A POS tagger for English, built on a Hidden Markov Model and decoded with the Viterbi algorithm


icode100/English_POS_tagger


English POS Tagger

This project is an implementation of an English Part-of-Speech (POS) tagger using the Viterbi algorithm. The POS tagger is designed to label each word in a sentence with its corresponding part of speech (e.g., noun, verb, adjective).


Introduction

The POS tagger uses the Viterbi algorithm, a dynamic programming algorithm that finds the most likely sequence of hidden states (here, POS tags) for a sequence of observed events (words) under a Hidden Markov Model. The model is trained on a tagged corpus and then used to predict POS tags for new sentences.
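To make the decoding step concrete, here is a minimal Viterbi sketch over a toy HMM. It is not the code in viterbi_decoding.py: the function name `viterbi`, its parameters, the probability floor for unseen events, and the hand-picked probabilities are all assumptions made for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a toy HMM.

    All names and the `floor` smoothing are illustrative assumptions.
    """
    def lp(p):
        # Log-probability with a floor so unseen events do not become -inf.
        return math.log(max(p, floor))

    if not words:
        return []

    # best[i][t]: log-probability of the best path that ends in tag t at word i.
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    # back[i][t]: the previous tag on that best path.
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0.0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(words[i], 0.0))
            back[i][t] = prev

    # Trace back from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy, hand-picked probabilities (not learned from the repository's data).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {
    "DT": {"NN": 0.9, "VB": 0.05, "DT": 0.05},
    "NN": {"VB": 0.8, "NN": 0.1, "DT": 0.1},
    "VB": {"DT": 0.5, "NN": 0.3, "VB": 0.2},
}
emit_p = {
    "DT": {"the": 0.9},
    "NN": {"dog": 0.5, "barks": 0.1},
    "VB": {"barks": 0.6, "dog": 0.05},
}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# prints ['DT', 'NN', 'VB']
```

Working in log space avoids numerical underflow on long sentences, and the two nested loops over tags give a cost that is linear in sentence length and quadratic in the number of tags.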

Features

  • Efficient POS tagging: Uses the Viterbi algorithm, whose dynamic programming avoids enumerating every possible tag sequence.
  • Customizable: Can be retrained with different corpora to improve accuracy for specific domains.
  • Easy to use: Simple interface for tagging new sentences.

Installation

To install and run the POS tagger, follow these steps:

  1. Clone the repository:

    git clone https://github.com/icode100/English_POS_tagger.git
    cd English_POS_tagger
  2. Install the required dependencies:

    pip install -r requirements.txt

Usage

To use the POS tagger, try the app; its source is app.py in the repository root.

Scores

Metric        Precision  Recall  F1-Score  Support
Accuracy      -          -       0.51      21961
Macro Avg     0.48       0.45    0.46      21961
Weighted Avg  0.54       0.51    0.52      21961
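The README does not say how these scores were produced. The sketch below shows one common way to compute accuracy, macro averages, and support-weighted averages from gold and predicted tags. The function name `classification_scores` and the sample tags are hypothetical and are not taken from the repository.

```python
from collections import Counter


def classification_scores(gold, pred):
    """Compute accuracy plus macro and weighted (precision, recall, F1).

    Hypothetical helper for illustration; not the repository's evaluation code.
    """
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the true tag but was missed

    labels = sorted(set(gold) | set(pred))
    support = Counter(gold)  # number of gold occurrences per tag
    per_label = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        prec = tp[label] / predicted if predicted else 0.0
        rec = tp[label] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_label[label] = (prec, rec, f1)

    total = len(gold)
    # Macro: unweighted mean over tags. Weighted: mean weighted by support.
    macro = tuple(sum(per_label[l][k] for l in labels) / len(labels)
                  for k in range(3))
    weighted = tuple(sum(per_label[l][k] * support[l] for l in labels) / total
                     for k in range(3))
    return {"accuracy": sum(tp.values()) / total,
            "macro": macro,
            "weighted": weighted}


scores = classification_scores(["NN", "VB", "NN", "DT"],
                               ["NN", "NN", "NN", "DT"])
print(scores["accuracy"])  # prints 0.75
```

Macro averaging treats every tag equally, so rare tags that the model handles poorly pull it down more than they pull down the weighted average.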

Training the Model

To train the model on a different corpus, pass the dataset to the HMM class in code/model.py:

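import pickle
from code.model import HMM

# `dataset` must already hold the training corpus in the structure shown below.
model = HMM(dataset)

# Save the trained model (Kaggle path; change it for your environment).
with open('/kaggle/working/hmm_model.pkl', 'wb') as f:
    pickle.dump(model, f)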

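The dataset is a JSON list with one object per sentence. "index" is an integer (0, 1, 2, ...), "sentence" is a list of word strings, and "labels" is a list of POS tag strings:

data.json
[
    {
        "index": 0,
        "sentence": ["word", ...],
        "labels": ["VB", "NN", ...]
    },
    ...
]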
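As a sketch of how records in this structure could be turned into HMM parameters, the example below counts tag starts, tag-to-tag transitions, and tag-to-word emissions, then normalizes the counts into probabilities. The function `estimate_hmm` and the two sample records are hypothetical. The repository's HMM class may estimate or smooth its probabilities differently, and the check that each sentence has as many labels as words is an assumption about the data.

```python
import json
from collections import Counter, defaultdict


def estimate_hmm(records):
    """Estimate start, transition and emission probabilities by counting.

    Hypothetical helper for illustration; no smoothing is applied.
    """
    start = Counter()
    trans = defaultdict(Counter)
    emit = defaultdict(Counter)

    for rec in records:
        words, labels = rec["sentence"], rec["labels"]
        if len(words) != len(labels):
            raise ValueError(
                f"record {rec.get('index')}: {len(words)} words "
                f"but {len(labels)} labels"
            )
        if not labels:
            continue
        start[labels[0]] += 1
        for word, tag in zip(words, labels):
            emit[tag][word] += 1
        for prev_tag, next_tag in zip(labels, labels[1:]):
            trans[prev_tag][next_tag] += 1

    def normalize(counter):
        # Turn raw counts into a probability distribution.
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (
        normalize(start),
        {tag: normalize(c) for tag, c in trans.items()},
        {tag: normalize(c) for tag, c in emit.items()},
    )


# Two made-up records in the data.json structure.
records = json.loads("""
[
    {"index": 0, "sentence": ["the", "dog", "barks"], "labels": ["DT", "NN", "VB"]},
    {"index": 1, "sentence": ["the", "cat", "sleeps"], "labels": ["DT", "NN", "VB"]}
]
""")

start_p, trans_p, emit_p = estimate_hmm(records)
print(emit_p["NN"])  # prints {'dog': 0.5, 'cat': 0.5}
```

With real data, the records would come from a file such as input/train.json via `json.load`, and unseen words would need some form of smoothing before decoding.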

File Structure

ENGLISH_POS_TAGGER/
├── code/
│   ├── __pycache__/
│   ├── __init__.py
│   ├── dataset.py
│   ├── model.py
│   └── viterbi_decoding.py
├── input/
│   ├── dev.json
│   └── train.json
├── app.py
├── english-pos-tagger-viterbi.ipynb
├── hmm_model.pkl
├── README.md
└── requirements.txt

Dependencies

  • numpy
  • pandas

To install the dependencies, run:

pip install -r requirements.txt

Thank you for using the English POS Tagger! If you have any questions or issues, please open an issue in the repository.
