English POS Tagger

This project is an implementation of an English Part-of-Speech (POS) tagger using the Viterbi algorithm. The POS tagger is designed to label each word in a sentence with its corresponding part of speech (e.g., noun, verb, adjective).

Introduction

The POS tagger uses the Viterbi algorithm, a dynamic programming algorithm that finds the most likely sequence of hidden states (here, POS tags) given a sequence of observed events (words). The underlying hidden Markov model (HMM) is trained on a tagged corpus and then used to predict POS tags for new sentences.
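To make the decoding step concrete, here is a minimal sketch of Viterbi decoding over a toy HMM. It is not the code in viterbi_decoding.py: the function name `viterbi`, the dictionary-based probability tables, the `floor` fallback for unseen events, and the toy probabilities are all assumptions made for illustration.

```python
import math


def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for `words` under a toy HMM.

    Probabilities are plain dicts. Missing entries fall back to `floor`
    (an assumed stand-in for smoothing) so that log(0) never occurs.
    """
    if not words:
        return []

    def lp(table, key):
        return math.log(table.get(key, floor))

    # best[i][t]: log-probability of the best path ending in tag t at word i
    best = [{t: lp(start_p, t) + lp(emit_p, (t, words[0])) for t in tags}]
    # back[i][t]: the previous tag on that best path
    back = [{}]

    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p, (p, t))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p, (t, words[i]))
            back[i][t] = prev

    # Trace back from the best final tag to recover the full path.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Toy model with made-up probabilities (not taken from the trained model).
tags = ["DT", "NN", "VB"]
start_p = {"DT": 0.8, "NN": 0.1, "VB": 0.1}
trans_p = {("DT", "NN"): 0.9, ("NN", "VB"): 0.8, ("VB", "DT"): 0.5}
emit_p = {("DT", "the"): 0.9, ("NN", "dog"): 0.5,
          ("VB", "barks"): 0.5, ("NN", "barks"): 0.1}

print(viterbi(["the", "dog", "barks"], tags, start_p, trans_p, emit_p))
# ['DT', 'NN', 'VB']
```

Working in log space avoids numeric underflow on long sentences, and the table-based recurrence costs time proportional to the sentence length times the square of the number of tags.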

Features

Efficient POS tagging: Uses Viterbi decoding, which finds the most likely tag sequence without enumerating every possible sequence.
Customizable: Can be retrained with different corpora to improve accuracy for specific domains.
Easy to use: Simple interface for tagging new sentences.

Installation

To install and run the POS tagger, follow these steps:

Clone the repository:

git clone https://github.com/yourusername/english-pos-tagger.git
cd english-pos-tagger

Install the required dependencies:
```
pip install -r requirements.txt
```

Usage

To try the POS tagger on new sentences, see the app in app.py.

Scores

| Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Accuracy | | | 0.51 | 21961 |
| Macro Avg | 0.48 | 0.45 | 0.46 | 21961 |
| Weighted Avg | 0.54 | 0.51 | 0.52 | 21961 |
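The macro average gives every tag equal weight, while the weighted average weights each tag by its support (how often it occurs in the gold labels). The sketch below shows how the two differ on a tiny, made-up example. The function name `f1_report` and the toy labels are assumptions, and this is not necessarily how the scores above were produced.

```python
from collections import Counter


def f1_report(gold, pred):
    """Compute accuracy, macro F1 and support-weighted F1 for two tag lists.

    Only tags that occur in `gold` are scored; tags that appear solely in
    `pred` are ignored in this simplified sketch.
    """
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and equally long")

    support = Counter(gold)      # occurrences of each tag in the gold labels
    predicted = Counter(pred)    # occurrences of each tag in the predictions
    correct = Counter(g for g, p in zip(gold, pred) if g == p)

    f1 = {}
    for tag in support:
        precision = correct[tag] / predicted[tag] if predicted[tag] else 0.0
        recall = correct[tag] / support[tag]
        denom = precision + recall
        f1[tag] = 2 * precision * recall / denom if denom else 0.0

    total = len(gold)
    return {
        "accuracy": sum(correct.values()) / total,
        "macro_f1": sum(f1.values()) / len(f1),
        "weighted_f1": sum(f1[t] * support[t] for t in f1) / total,
    }


# Toy labels for illustration only (not project data).
gold = ["NN", "NN", "NN", "VB"]
pred = ["NN", "NN", "VB", "VB"]
report = f1_report(gold, pred)
print({name: round(value, 2) for name, value in report.items()})
# {'accuracy': 0.75, 'macro_f1': 0.73, 'weighted_f1': 0.77}
```

Here the frequent tag NN scores higher than the rare tag VB, so the weighted F1 comes out above the macro F1.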

Training the Model

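To train the model on a different corpus, pass the dataset to the HMM class in code.model and save the result with pickle:

import pickle

from code.model import HMM

# dataset: your training data, in the structure described below
model = HMM(dataset)

# Save the trained model; change the path to suit your environment
with open('/kaggle/working/hmm_model.pkl', 'wb') as f:
    pickle.dump(model, f)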

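The dataset is a JSON file containing a list of records. Each record has an integer index, a sentence given as a list of words, and a list of labels (POS tags) for those words:

data.json
[
    {
        "index"->int:{0,1,2...},
        "sentence"->list[str]:{["word",...]},
        "labels"->list[str]:{["VB","NN"]}
    },
    ...
]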
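As an illustration of what training on records of this shape can involve, the sketch below estimates HMM start, transition, and emission probabilities by relative frequency. It is an assumption-laden example, not the implementation in code.model: the function name `estimate_hmm`, the return layout, and the toy records are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def estimate_hmm(records):
    """Estimate start, transition and emission probabilities from counts."""
    start = Counter()            # tag at the first position of a sentence
    trans = defaultdict(Counter) # trans[prev_tag][next_tag]
    emit = defaultdict(Counter)  # emit[tag][word]

    for rec in records:
        words, tags = rec["sentence"], rec["labels"]
        if len(words) != len(tags):
            raise ValueError(f"record {rec.get('index')}: length mismatch")
        if not tags:
            continue
        start[tags[0]] += 1
        for word, tag in zip(words, tags):
            emit[tag][word] += 1
        for prev_tag, next_tag in zip(tags, tags[1:]):
            trans[prev_tag][next_tag] += 1

    def normalize(counter):
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    return (
        normalize(start),
        {tag: normalize(counts) for tag, counts in trans.items()},
        {tag: normalize(counts) for tag, counts in emit.items()},
    )


# Toy records in the structure described above (made up for illustration).
records = [
    {"index": 0, "sentence": ["the", "dog", "barks"], "labels": ["DT", "NN", "VB"]},
    {"index": 1, "sentence": ["the", "cat", "sleeps"], "labels": ["DT", "NN", "VB"]},
    {"index": 2, "sentence": ["dogs", "bark"], "labels": ["NN", "VB"]},
]
start_p, trans_p, emit_p = estimate_hmm(records)
print(trans_p["NN"])
# {'VB': 1.0}
```

A practical tagger would usually add smoothing or an unknown-word strategy, since any word or transition missing from the training data gets zero probability under plain relative frequencies.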

File Structure

ENGLISH_POS_TAGGER/
├── code/
│   ├── __pycache__/
│   ├── __init__.py
│   ├── dataset.py
│   ├── model.py
│   └── viterbi_decoding.py
├── input/
│   ├── dev.json
│   └── train.json
├── app.py
├── english-pos-tagger-viterbi.ipynb
├── hmm_model.pkl
├── README.md
└── requirements.txt

Dependencies

numpy
pandas

To install the dependencies, run:

pip install -r requirements.txt

Thank you for using the English POS Tagger! If you have any questions or issues, please open an issue in the repository.
