## Named Entity Recognition (NER) and Bidirectional LSTM

**Named Entity Recognition (NER)** is a natural language processing (NLP) task that involves identifying and classifying entities in text into predefined categories, such as:
- Person names (e.g., "John Doe")
- Organizations (e.g., "OpenAI")
- Locations (e.g., "New York")
- Dates and times (e.g., "January 1, 2023")

### NER with Bidirectional LSTM

A Bidirectional Long Short-Term Memory (BiLSTM) network is an effective architecture for NER because it reads each sentence in both the forward and the backward direction. The representation of every word therefore reflects both the preceding and the following words, which matters when the entity type depends on context: "Washington" can be a person or a location depending on the words around it.

#### Key Components:
1. **Input Representation**: Each word is mapped to a vector, either a static word embedding (e.g., Word2Vec or GloVe) or a contextual embedding produced by a pretrained model such as BERT.
2. **Bidirectional LSTM**: Processes the sequence of embeddings in both forward and backward directions, producing context-aware representations for each word.
3. **Output Layer**: A dense layer followed by a softmax activation is used to classify each word into its respective entity class.

#### Mathematical Explanation

Let $ x_t $ be the embedding of the word at position $ t $ in a sequence of length $ T $.

1. **Forward LSTM**:
   The hidden state at time $ t $ is computed as:
   $$
   \overrightarrow{h_t} = \text{LSTM}(x_t, \overrightarrow{h_{t-1}})
   $$
2. **Backward LSTM**:
   Similarly, the hidden state for the backward LSTM is:
   $$
   \overleftarrow{h_t} = \text{LSTM}(x_t, \overleftarrow{h_{t+1}})
   $$
3. **Concatenation**:
   The final hidden state for each word combines both directions:
   $$
   h_t = \text{concat}(\overrightarrow{h_t}, \overleftarrow{h_t})
   $$
4. **Classification**:
   The output for each word is computed by applying a dense layer with softmax activation:
   $$
   y_t = \text{softmax}(W \cdot h_t + b)
   $$
   where $ W $ and $ b $ are trainable parameters, and $ y_t $ represents the probabilities for each entity class.
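
The four steps above can be checked numerically. The following is a minimal sketch, assuming PyTorch and arbitrary illustrative sizes (`T`, `embed_dim`, `hidden_dim`, and `num_classes` are not taken from the text). It shows that `nn.LSTM` with `bidirectional=True` returns the concatenated states $ h_t $, and that a dense layer plus softmax yields one probability distribution per word.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Illustrative sizes (assumptions, not from the text)
T, embed_dim, hidden_dim, num_classes = 5, 8, 6, 9

# x: one sequence of T embeddings, shape (batch=1, T, embed_dim)
x = torch.randn(1, T, embed_dim)

# bidirectional=True runs a forward and a backward LSTM and concatenates
# their hidden states at every position
bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
h, (h_n, _) = bilstm(x)              # h: (1, T, 2 * hidden_dim)

forward_h = h[..., :hidden_dim]      # forward states, one per position
backward_h = h[..., hidden_dim:]     # backward states, one per position

# Dense layer + softmax: y_t = softmax(W h_t + b)
dense = nn.Linear(2 * hidden_dim, num_classes)
y = torch.softmax(dense(h), dim=-1)  # (1, T, num_classes)
```

The forward LSTM finishes at the last position and the backward LSTM finishes at the first, so `forward_h[:, -1]` equals `h_n[0]` and `backward_h[:, 0]` equals `h_n[1]`.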

#### Workflow:
1. Preprocess the text and tokenize it.
2. Convert tokens into embeddings.
3. Pass embeddings through the BiLSTM.
4. Use the output states for each token to predict entity classes.

By training the BiLSTM model on annotated datasets (e.g., CoNLL-2003), the model learns to effectively recognize entities in text based on context.


## CoNLL-2003 
The **CoNLL-2003 dataset** is one of the most widely used datasets for Named Entity Recognition (NER). It was introduced for the shared task of the 2003 Conference on Computational Natural Language Learning (CoNLL). The dataset is annotated with four entity types:

1. **PER**: Person names
2. **ORG**: Organizations
3. **LOC**: Locations
4. **MISC**: Miscellaneous entities that do not fit into the other categories

### Dataset Structure
The dataset covers two languages, English and German, and the English portion is the one most commonly used. Each token is annotated with four fields:
- **Tokens**: Words from sentences.
- **Part-of-Speech (POS) tags**: POS tags for each word.
- **Chunk tags**: Information about syntactic chunks.
- **NER labels**: Entity labels in the BIO format (Begin-Inside-Outside).

#### Example:
| Token   | POS | Chunk | NER    |
|---------|-----|-------|--------|
| EU      | NNP | I-NP  | B-ORG  |
| rejects | VBZ | I-VP  | O      |
| German  | JJ  | I-NP  | B-MISC |
| call    | NN  | I-NP  | O      |
| to      | TO  | I-VP  | O      |
| boycott | VB  | I-VP  | O      |
| British | JJ  | I-NP  | B-MISC |
| lamb    | NN  | I-NP  | O      |
| .       | .   | O     | O      |
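
A minimal sketch of reading this column layout in plain Python follows. It assumes one token per line with four whitespace-separated fields, blank lines between sentences, and `-DOCSTART-` lines marking document boundaries. The function name `read_conll` is hypothetical, and the second sample sentence is invented for illustration.

```python
def read_conll(lines):
    """Parse CoNLL-2003-style lines into a list of sentences.

    Each sentence is a list of (token, pos, chunk, ner) tuples.
    """
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        # Blank lines end a sentence; -DOCSTART- lines mark document boundaries
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        token, pos, chunk, ner = line.split()
        current.append((token, pos, chunk, ner))
    if current:  # flush the last sentence if the input has no trailing blank line
        sentences.append(current)
    return sentences


# Sample input in the tag style of the table above (second sentence is invented)
sample = """-DOCSTART- -X- O O

EU NNP I-NP B-ORG
rejects VBZ I-VP O
German JJ I-NP B-MISC
call NN I-NP O

John NNP I-NP B-PER
Doe NNP I-NP I-PER
""".splitlines()

sentences = read_conll(sample)
tokens = [tok for tok, _, _, _ in sentences[0]]
ner_tags = [ner for _, _, _, ner in sentences[0]]
```

Keeping tokens and tags as parallel lists per sentence makes the later conversion to index tensors straightforward.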

### BIO Format
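The dataset uses the **BIO tagging scheme**, in which a prefix is combined with the entity type (e.g., `B-ORG`, `I-PER`):
- **B**: Beginning of an entity.
- **I**: Inside an entity.
- **O**: Outside of any entity.

The NER tags in the example above mark the first token of every entity with `B-` (the IOB2 variant). The originally released files use the IOB1 variant, in which `B-` appears only when an entity directly follows another entity of the same type.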
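
To make the scheme concrete, here is a minimal sketch in plain Python that turns a BIO tag sequence into entity spans. The function name `bio_to_spans` is hypothetical. An `I-` tag that does not continue an entity of the same type is treated as starting a new entity, which is an assumed lenient convention.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (entity_type, start, end) spans.

    `end` is exclusive. An I- tag that does not continue a span of the same
    type starts a new span (lenient handling, assumed here).
    """
    spans = []
    start, ent_type = None, None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        continues = prefix == "I" and label == ent_type
        if not continues:
            # Close the open span, if any, before handling the current tag
            if ent_type is not None:
                spans.append((ent_type, start, i))
            # B- (or a stray I-) opens a new span; O leaves no span open
            start, ent_type = (i, label) if prefix in ("B", "I") else (None, None)
    if ent_type is not None:
        spans.append((ent_type, start, len(tags)))
    return spans


table_tags = ["B-ORG", "O", "B-MISC", "O", "O", "O", "B-MISC", "O", "O"]
table_spans = bio_to_spans(table_tags)
# [("ORG", 0, 1), ("MISC", 2, 3), ("MISC", 6, 7)]

multi_token = bio_to_spans(["B-PER", "I-PER", "O", "B-LOC", "I-LOC"])
# [("PER", 0, 2), ("LOC", 3, 5)]
```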



### 1. Dataset and Dataloader Creation
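
A minimal sketch of this step, assuming PyTorch. The class name `NERDataset`, the `collate` function, the special indices, and the toy sentences are all illustrative assumptions. Sentences have different lengths, so the collate function pads each batch. Labels are padded with `-100`, the default `ignore_index` of `nn.CrossEntropyLoss`, so padded positions do not contribute to the loss.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset

PAD_TOKEN_ID = 0     # index reserved for padding tokens (assumed convention)
UNK_TOKEN_ID = 1     # index for out-of-vocabulary tokens (assumed convention)
PAD_LABEL_ID = -100  # default ignore_index of nn.CrossEntropyLoss


class NERDataset(Dataset):
    """Maps tokenized sentences and their tags to index tensors."""

    def __init__(self, sentences, tags, token2id, tag2id):
        self.sentences, self.tags = sentences, tags
        self.token2id, self.tag2id = token2id, tag2id

    def __len__(self):
        return len(self.sentences)

    def __getitem__(self, idx):
        token_ids = [self.token2id.get(t, UNK_TOKEN_ID) for t in self.sentences[idx]]
        label_ids = [self.tag2id[t] for t in self.tags[idx]]
        return torch.tensor(token_ids), torch.tensor(label_ids)


def collate(batch):
    """Pad every sequence in the batch to the length of the longest one."""
    token_ids, label_ids = zip(*batch)
    tokens = pad_sequence(list(token_ids), batch_first=True, padding_value=PAD_TOKEN_ID)
    labels = pad_sequence(list(label_ids), batch_first=True, padding_value=PAD_LABEL_ID)
    return tokens, labels


# Toy data for illustration only
sentences = [["EU", "rejects", "German", "call"], ["John", "Doe"]]
tags = [["B-ORG", "O", "B-MISC", "O"], ["B-PER", "I-PER"]]

token2id = {"<pad>": PAD_TOKEN_ID, "<unk>": UNK_TOKEN_ID}
for sentence in sentences:
    for token in sentence:
        token2id.setdefault(token, len(token2id))
tag2id = {"O": 0, "B-ORG": 1, "B-MISC": 2, "B-PER": 3, "I-PER": 4}

dataset = NERDataset(sentences, tags, token2id, tag2id)
loader = DataLoader(dataset, batch_size=2, shuffle=False, collate_fn=collate)
tokens, labels = next(iter(loader))  # both of shape (2, 4)
```

In practice the vocabulary would be built from the training split only, so that unseen validation and test tokens fall back to `<unk>`.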

### 2. Model Creation

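Two common choices for the tagger are a Bidirectional LSTM (BiLSTM), optionally topped with a Conditional Random Field (CRF) layer that models dependencies between adjacent tags, and a Transformer-based model such as BERT fine-tuned for token classification.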
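
A minimal sketch of the BiLSTM option, assuming PyTorch. It omits the CRF layer and classifies each token independently with a linear layer. The class name `BiLSTMTagger` and all default sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BiLSTMTagger(nn.Module):
    """Embedding -> BiLSTM -> per-token linear classifier (no CRF)."""

    def __init__(self, vocab_size, num_tags, embed_dim=100, hidden_dim=128, pad_id=0):
        super().__init__()
        # padding_idx keeps the padding embedding fixed at zero
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # The BiLSTM output concatenates both directions, hence 2 * hidden_dim
        self.classifier = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)  # (batch, T, embed_dim)
        hidden, _ = self.lstm(embedded)       # (batch, T, 2 * hidden_dim)
        return self.classifier(hidden)        # (batch, T, num_tags) logits


model = BiLSTMTagger(vocab_size=50, num_tags=9)
token_ids = torch.randint(1, 50, (2, 7))  # random toy batch: 2 sentences, 7 tokens each
logits = model(token_ids)
predictions = logits.argmax(dim=-1)       # predicted tag index per token
```

The model returns raw logits rather than softmax probabilities because `nn.CrossEntropyLoss` applies log-softmax internally. At inference time, `argmax` over the logits gives the same tag as `argmax` over the probabilities.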

### 3. Training Loop

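Each training step runs a forward pass, computes the loss, backpropagates the gradients, and updates the parameters. Use `CrossEntropyLoss` over the per-token outputs, passing raw logits rather than softmax probabilities because the loss applies log-softmax internally, and exclude padded positions from the loss (for example through its `ignore_index` argument).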
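
A minimal sketch of such a loop, assuming PyTorch. It repeatedly fits one random toy batch, which only demonstrates the mechanics. A real loop would iterate over batches from a `DataLoader` for several epochs. The `TinyTagger` class, the sizes, the learning rate, and the gradient clipping are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
PAD_LABEL_ID = -100  # label value marking padded positions
vocab_size, num_tags = 20, 5


class TinyTagger(nn.Module):
    """Small BiLSTM tagger used only to demonstrate the loop."""

    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, 16, padding_idx=0)
        self.lstm = nn.LSTM(16, 16, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(32, num_tags)

    def forward(self, token_ids):
        hidden, _ = self.lstm(self.embedding(token_ids))
        return self.classifier(hidden)


# Toy batch: 4 sequences of length 6; the last two positions of row 0 are padding
tokens = torch.randint(1, vocab_size, (4, 6))
labels = torch.randint(0, num_tags, (4, 6))
tokens[0, 4:] = 0
labels[0, 4:] = PAD_LABEL_ID

model = TinyTagger()
criterion = nn.CrossEntropyLoss(ignore_index=PAD_LABEL_ID)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
model.train()
for step in range(30):
    optimizer.zero_grad()
    logits = model(tokens)  # (batch, T, num_tags)
    # CrossEntropyLoss expects (N, C) logits and (N,) targets, so flatten both
    loss = criterion(logits.reshape(-1, num_tags), labels.reshape(-1))
    loss.backward()
    nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)  # optional stabilizer
    optimizer.step()
    losses.append(loss.item())
```

Because the same batch is reused at every step, the recorded loss should fall as the model memorizes it. On real data, validation metrics are the signal to watch.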

### 4. Evaluation Process

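Evaluate the model on the test set with entity-level precision, recall, and F1-score, the standard metrics for CoNLL-2003: a predicted entity counts as correct only if both its span and its type match the annotation exactly. Token-level accuracy is less informative because most tokens carry the `O` label.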
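
A minimal sketch of entity-level scoring in plain Python, assuming gold and predicted tags are given as lists of BIO tag sequences. The function names `extract_spans` and `entity_prf` are hypothetical, and the scores are micro-averaged over all entities.

```python
def extract_spans(tags):
    """Return the set of (entity_type, start, end) spans in a BIO sequence."""
    spans, start, ent_type = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel O closes a trailing span
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == ent_type:
            continue  # current span goes on
        if ent_type is not None:
            spans.add((ent_type, start, i))
        start, ent_type = (i, label) if prefix in ("B", "I") else (None, None)
    return spans


def entity_prf(gold_sequences, pred_sequences):
    """Micro-averaged entity-level precision, recall, and F1 (exact match)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_spans, pred_spans = extract_spans(gold), extract_spans(pred)
        tp += len(gold_spans & pred_spans)  # span and type both match
        fp += len(pred_spans - gold_spans)  # predicted but not annotated
        fn += len(gold_spans - pred_spans)  # annotated but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [["B-PER", "I-PER", "O", "B-LOC", "I-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-LOC", "O"]]  # LOC span is cut short
precision, recall, f1 = entity_prf(gold, pred)  # (0.5, 0.5, 0.5)
```

In this example four of the five tokens are tagged correctly (80% token accuracy), yet only one of the two entities matches exactly, so the entity-level F1 is 0.5.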
