# Neural Chatbots

- 📺 **Video:** [https://youtu.be/Hc7P3QukmJk](https://youtu.be/Hc7P3QukmJk)

## Overview
- Use neural sequence models to generate conversational responses.
- Learn about encoder-decoder architectures and beam search decoding.

## Key ideas
- **Seq2seq training:** maximize likelihood of target response given context.
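- **Teacher forcing:** feeding the gold previous token to the decoder stabilizes training but can cause exposure bias, because at inference time the decoder conditions on its own predictions instead.
- **Decoding:** greedy decoding, beam search, random sampling, and nucleus (top-p) sampling trade off output quality against diversity.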
- **Conditioning:** persona or style tokens steer outputs.
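
To make the decoding bullet concrete, here is a minimal sketch of nucleus (top-p) sampling in plain Python. It is illustrative only: the helper names `softmax`, `nucleus_filter`, and `sample_next`, and the example logits, are assumptions made for this sketch and do not come from the lecture.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest set of most-probable tokens whose total mass
    reaches top_p, then renormalize over that set."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return {i: probs[i] / mass for i in kept}

def sample_next(logits, top_p=0.9, temperature=1.0, rng=random):
    """Sample one token id from the truncated, renormalized distribution."""
    filtered = nucleus_filter(softmax(logits, temperature), top_p)
    ids = list(filtered)
    return rng.choices(ids, weights=[filtered[i] for i in ids], k=1)[0]

# Hypothetical scores for a 4-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0]
print(nucleus_filter(softmax(logits), top_p=0.9))
print(sample_next(logits, top_p=0.9, rng=random.Random(0)))
```

With a small `top_p` only the single most likely token survives, so sampling collapses to greedy decoding. Raising `top_p` toward 1 admits more of the tail and increases diversity.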

## Demo
Train a miniature character-level seq2seq chatbot on three prompt-response pairs, echoing the lecture (https://youtu.be/kohvfY7l8Ww), as a starting point for quick experimentation.

```python
import torch
from torch import nn

# Three toy (prompt, response) pairs; the model simply memorizes them.
pairs = [
    ('hi', 'hello there!'),
    ('how are you?', 'i am doing well!'),
    ('thanks', 'you are welcome!')
]

# Character-level vocabulary plus padding, start, and end tokens.
vocab = sorted(set(' '.join(u + ' ' + v for u, v in pairs))) + ['<pad>', '<s>', '</s>']
char_to_id = {c: i for i, c in enumerate(vocab)}

max_len = 20

def encode(text):
    """Map text to ids: <s> + characters + </s>, right-padded to max_len."""
    ids = [char_to_id['<s>']] + [char_to_id[c] for c in text] + [char_to_id['</s>']]
    ids += [char_to_id['<pad>']] * (max_len - len(ids))
    return ids

src = torch.tensor([encode(u) for u, _ in pairs])
tgt = torch.tensor([encode(v) for _, v in pairs])

# Shared embedding, GRU encoder, GRU decoder, and output projection.
embed = nn.Embedding(len(vocab), 32)
encoder = nn.GRU(32, 32, batch_first=True)
decoder = nn.GRU(32, 32, batch_first=True)
out = nn.Linear(32, len(vocab))
# Padding positions in the target do not contribute to the loss.
criterion = nn.CrossEntropyLoss(ignore_index=char_to_id['<pad>'])
params = (list(embed.parameters()) + list(encoder.parameters())
          + list(decoder.parameters()) + list(out.parameters()))
optimizer = torch.optim.Adam(params, lr=5e-3)

for epoch in range(1, 201):
    # Encode the prompt; the final hidden state summarizes it.
    # Note: the encoder also reads the padding positions, which is fine for this toy.
    enc_in = embed(src)
    _, hidden = encoder(enc_in)
    # Teacher forcing: feed the gold tokens shifted right, predict the next token.
    dec_in = embed(tgt[:, :-1])
    outputs, _ = decoder(dec_in, hidden)
    logits = out(outputs)
    loss = criterion(logits.reshape(-1, len(vocab)), tgt[:, 1:].reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if epoch % 50 == 0:
        print(f"epoch {epoch:3d} | loss {loss.item():.4f}")

# Greedy decoding: start from <s> and repeatedly pick the most likely next character.
with torch.no_grad():
    prompt = torch.tensor([encode('hi')])
    enc_in = embed(prompt)
    _, hidden = encoder(enc_in)
    dec_input = torch.tensor([[char_to_id['<s>']]])
    response = []
    hidden_state = hidden
    for _ in range(max_len):
        emb_in = embed(dec_input)
        output, hidden_state = decoder(emb_in, hidden_state)
        logits = out(output.squeeze(1))
        next_id = logits.argmax(dim=-1)
        token = next_id.item()
        if token == char_to_id['</s>']:
            break  # stop at the end-of-sequence token
        if token != char_to_id['<pad>']:
            response.append(vocab[token])
        dec_input = next_id.unsqueeze(1)  # feed the prediction back in
    print('Generated response:', ''.join(response))
```


Example output (exact loss values vary between runs because no random seed is set):

```
epoch  50 | loss 0.3282
epoch 100 | loss 0.0410
epoch 150 | loss 0.0167
epoch 200 | loss 0.0100
Generated response: hello there!
```
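
The demo decodes greedily, while the overview also mentions beam search. The sketch below shows the idea in plain Python, separately from the PyTorch model. `step_fn` is a hypothetical callback that returns next-token log-probabilities for a prefix, and the toy probability table is invented for illustration. It omits length normalization, which practical systems often add.

```python
import math

def beam_search(step_fn, start_id, end_id, beam_width=3, max_len=20):
    """Return (log_prob, tokens) for the best sequence found with a fixed-width beam.

    step_fn(prefix) is a hypothetical callback returning a dict
    {token_id: log_probability} for the token that follows `prefix`.
    """
    beams = [(0.0, [start_id])]  # (cumulative log-prob, tokens so far)
    finished = []
    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates = []
        for score, seq in beams:
            for tok, logp in step_fn(seq).items():
                candidates.append((score + logp, seq + [tok]))
        # Keep only the best `beam_width` expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_width]:
            if seq[-1] == end_id:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    finished.extend(beams)  # fall back to unfinished hypotheses at max_len
    return max(finished, key=lambda c: c[0])

# Toy model (assumed for illustration): ids 0=<s>, 1=</s>, 2='a', 3='b'.
TABLE = {
    (0,): {2: 0.6, 3: 0.4},
    (0, 2): {1: 0.55, 3: 0.45},
    (0, 3): {1: 0.9, 2: 0.1},
}

def toy_step(prefix):
    probs = TABLE.get(tuple(prefix), {1: 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

greedy = beam_search(toy_step, start_id=0, end_id=1, beam_width=1)
wider = beam_search(toy_step, start_id=0, end_id=1, beam_width=2)
print('beam_width=1:', greedy[1])
print('beam_width=2:', wider[1])
```

In this toy table, a beam of width 1 (equivalent to greedy decoding) commits to `'a'` first and ends with total probability 0.33. A width of 2 keeps `'b'` alive and finds the sequence with probability 0.36. To try this on the demo model, `step_fn` would wrap one decoder step and a log-softmax over its logits.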


## Try it
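- Modify the demo: change the hidden size, learning rate, or number of epochs and watch how the loss curve and the generated response change.
- Add a few more pairs, or try a prompt the model never saw during training (a counter-example), and check how the response degrades.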


## References
- [MCTest: A Challenge Dataset for the Open-Domain Machine Comprehension of Text](https://www.aclweb.org/anthology/D13-1020.pdf)
- [SQuAD: 100,000+ Questions for Machine Comprehension of Text](https://www.aclweb.org/anthology/D16-1264/)
- [Adversarial Examples for Evaluating Reading Comprehension Systems](https://www.aclweb.org/anthology/D17-1215/)
- [Reading Wikipedia to Answer Open-Domain Questions](https://arxiv.org/abs/1704.00051)
- [Latent Retrieval for Weakly Supervised Open Domain Question Answering](https://www.aclweb.org/anthology/P19-1612.pdf)
- [Natural Questions (website)](https://ai.google.com/research/NaturalQuestions)
- [Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks](https://arxiv.org/pdf/2005.11401.pdf)
- [WebGPT](https://arxiv.org/abs/2112.09332)
- [HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering](https://arxiv.org/abs/1809.09600)
- [Understanding Dataset Design Choices for Multi-hop Reasoning](https://www.aclweb.org/anthology/N19-1405/)
- [Learning to Retrieve Reasoning Paths over Wikipedia Graph for Question Answering](https://openreview.net/forum?id=SJgVHkrYDH)
- [QAMPARI](https://arxiv.org/abs/2205.12665)
- [Wizard of Wikipedia: Knowledge-Powered Conversational Agents](https://arxiv.org/pdf/1811.01241.pdf)
- [Task-Oriented Dialogue as Dataflow Synthesis](https://arxiv.org/abs/2009.11423)
- [A Neural Network Approach to Context-Sensitive Generation of Conversational Responses](https://arxiv.org/abs/1506.06714)
- [A Diversity-Promoting Objective Function for Neural Conversation Models](https://arxiv.org/abs/1510.03055)
- [Recipes for building an open-domain chatbot](https://arxiv.org/pdf/2004.13637.pdf)
- [BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage (Kurt Shuster et al.)](https://arxiv.org/abs/2208.03188)
- [character.ai](https://character.ai)


*Links only; we do not redistribute slides or papers.*