## Introduction 

This notebook walks through a named-entity recognition (NER) implementation that uses only a Conditional Random Field (CRF), trained on a finance corpus. The notebook proceeds in the following order:

1. Loading the dataset into a dataframe
2. Preprocessing the data
3. Extracting features from the sentences (feature engineering)
4. Training a Conditional Random Field model
5. Evaluating the trained CRF model
6. Optimising the hyperparameters
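
To make the feature-engineering step more concrete, here is a minimal sketch of how per-token features for a CRF are often built. The function names `word2features` and `sent2features`, the feature names, and the example sentence are illustrative assumptions, not the feature set this notebook actually uses. sklearn-crfsuite's `CRF.fit` takes each sentence as a list of such feature dicts.

```python
def word2features(sentence, i):
    """Build a feature dict for the token at position i of a sentence.

    `sentence` is a list of token strings. The feature names are
    illustrative assumptions, not the notebook's actual feature set.
    """
    word = sentence[i]
    features = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.isupper": word.isupper(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    if i > 0:
        features["prev.lower"] = sentence[i - 1].lower()
    else:
        features["BOS"] = True  # token starts the sentence
    if i < len(sentence) - 1:
        features["next.lower"] = sentence[i + 1].lower()
    else:
        features["EOS"] = True  # token ends the sentence
    return features


def sent2features(sentence):
    """Feature dicts for every token in the sentence, in order."""
    return [word2features(sentence, i) for i in range(len(sentence))]


# Made-up example sentence, for illustration only.
example = ["Notional", "amount", "USD", "10", "million"]
features = sent2features(example)
```

Neighbouring-token features such as `prev.lower` and `next.lower` give the model local context. Which features help on a finance corpus is an empirical question.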

## Import the required libraries

In [1]:
import pandas as pd
import numpy as np 

from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import learning_curve
from sklearn.model_selection import train_test_split

from sklearn_crfsuite import CRF
from sklearn.metrics import make_scorer
from sklearn_crfsuite import metrics
from sklearn.exceptions import UndefinedMetricWarning 

import warnings
import math
import sys

## Import the dataset into a dataframe

In [7]:
# Read the NER data from a comma-separated file, keeping blank lines (they load as NaN rows), then name the columns
ner_data = pd.read_csv("../Data/tag1.csv", skip_blank_lines=False, encoding="utf-8", index_col=None)
ner_data.columns = ["Token", "NE"]

tag_distribution = ner_data.groupby("NE").size().reset_index(name='counts')
print(tag_distribution)

                     NE  counts
0        B-Counterparty       2
1  B-Direction of Trade       2
2     B-Expiration Date       2
3          B-Fixed Rate       2
4     B-Notional Amount       2
5    B-Reference Entity       2
6        I-Counterparty       2
7     I-Notional Amount       1
8    I-Reference Entity       5
9                     O      55
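
Because the blank lines are kept when loading, they can later serve as sentence boundaries. The sketch below assumes that each blank row marks the end of a sentence and shows up as a missing value (`NaN` or `None`) in the token column. The helper names `is_blank` and `group_sentences` and the sample rows are hypothetical.

```python
import math


def is_blank(value):
    """True for the placeholders a blank row produces (None or float NaN)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def group_sentences(tokens, tags):
    """Split parallel token/tag sequences into sentences at blank rows."""
    sentences, current = [], []
    for token, tag in zip(tokens, tags):
        if is_blank(token):
            if current:  # close the sentence in progress
                sentences.append(current)
                current = []
        else:
            current.append((token, tag))
    if current:  # the last sentence may have no trailing blank row
        sentences.append(current)
    return sentences


# Made-up rows for illustration; NaN stands in for a blank line.
nan = float("nan")
tokens = ["Bank", "A", "pays", nan, "Rate", "2%"]
tags = ["B-Counterparty", "I-Counterparty", "O", nan, "O", "B-Fixed Rate"]
sentences = group_sentences(tokens, tags)
```

With the dataframe above, the equivalent call would be `group_sentences(ner_data["Token"], ner_data["NE"])`, under the same assumption about blank rows.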


Next, filter out the labels that are not needed in this analysis: the outside tag `O` and the missing values produced by blank rows.

In [None]:
# Drop missing values first, then exclude the outside tag "O"; the column is named "NE", not "ne"
classes = [tag for tag in ner_data["NE"].dropna().unique() if tag != "O"]
print(classes)
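
When the remaining labels are later used in an evaluation report, it can help to keep the `B-` and `I-` tags of the same entity next to each other. The sketch below shows one way to do that. The label list is copied from the tag distribution printed earlier, and the name `sorted_classes` is an assumption for illustration.

```python
# Tag names copied from the tag distribution printed earlier, minus "O".
classes = [
    "B-Counterparty", "B-Direction of Trade", "B-Expiration Date",
    "B-Fixed Rate", "B-Notional Amount", "B-Reference Entity",
    "I-Counterparty", "I-Notional Amount", "I-Reference Entity",
]

# Sort by entity name first (the text after the "B-"/"I-" prefix),
# then by the prefix itself, so each B- tag precedes its I- tag.
sorted_classes = sorted(classes, key=lambda tag: (tag[2:], tag[0]))

for tag in sorted_classes:
    print(tag)
```

Grouping the tags this way makes per-entity rows in a classification report easier to compare.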