# ❓ Question Answering Systems

**Author**: Data Science Master System  
**Difficulty**: ⭐⭐⭐ Advanced  
**Time**: 60 minutes  
**Prerequisites**: 18_nlp_text_generation

## Learning Objectives
- Compare extractive and generative QA
- Use BERT for extractive QA
- Build RAG pipelines
- Evaluate QA systems

In [None]:
import torch

## 1. Extractive QA with Pipeline

In [None]:
try:
    from transformers import pipeline
    
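    # No model is named here, so transformers falls back to its default
    # question-answering checkpoint; pass model=... to pin one for reproducibility.
    qa = pipeline('question-answering')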
    
    context = """Machine learning is a subset of artificial intelligence that enables 
    systems to learn from data. It was pioneered by researchers like Arthur Samuel 
    in the 1950s. Today, ML powers recommendation systems, self-driving cars, and 
    natural language processing."""
    
    questions = [
        "What is machine learning?",
        "Who pioneered machine learning?",
        "What does ML power today?"
    ]
    
    print("❓ Question Answering:")
    for q in questions:
        result = qa(question=q, context=context)
        print(f"\nQ: {q}")
        print(f"A: {result['answer']} (confidence: {result['score']:.2%})")
        
except ImportError:
    print("Install: pip install transformers")
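
Answers like the ones printed above are usually scored against reference answers with exact match (EM) and token-level F1, in the style of the SQuAD benchmark. The cell below is a minimal sketch of those two metrics, not an official evaluation script. The helper names (`normalize`, `exact_match`, `token_f1`) and the example pairs are hypothetical and chosen only for illustration.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (prediction, reference) pairs, for illustration only.
examples = [
    ("Arthur Samuel", "Arthur Samuel"),
    ("researchers like Arthur Samuel", "Arthur Samuel"),
]
for pred, ref in examples:
    print(f"EM={exact_match(pred, ref):.0f}  F1={token_f1(pred, ref):.2f}  pred={pred!r}")
```

EM is strict, so the second pair scores 0 even though it contains the right name. F1 gives partial credit for the overlapping tokens, which is why the two metrics are usually reported together.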

## 2. QA Types

In [None]:
import pandas as pd

qa_types = pd.DataFrame({
    'Type': ['Extractive', 'Generative', 'Open-Domain', 'Multi-hop'],
    'Method': ['Span extraction', 'Answer generation', 'Retrieve, then answer', 'Reasoning across multiple documents'],
    'Example': ['BERT', 'T5/GPT', 'RAG', 'HotpotQA (benchmark dataset)'],
    'Use Case': ['Document QA', 'Chatbots', 'Search', 'Complex reasoning']
})

display(qa_types)
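
The "Span extraction" method in the table can be sketched as follows. A BERT-style reader produces one start score and one end score per token, and the answer is the span whose combined score is highest. The logits and tokens below are made-up toy values, and `best_span` is a hypothetical helper. Real pipelines add further steps such as masking question tokens and converting scores to probabilities.

```python
import numpy as np


def best_span(start_logits, end_logits, max_answer_len=5):
    """Return (start, end, score) maximizing start + end logit with start <= end."""
    best = (0, 0, -np.inf)
    for s in range(len(start_logits)):
        # Only consider spans of at most max_answer_len tokens.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = start_logits[s] + end_logits[e]
            if score > best[2]:
                best = (s, e, float(score))
    return best


# Toy values for illustration; a real model would produce these logits.
tokens = ["it", "was", "pioneered", "by", "arthur", "samuel", "in", "the", "1950s"]
start_logits = np.array([0.1, 0.0, 0.3, 0.2, 4.0, 1.0, 0.1, 0.2, 0.5])
end_logits = np.array([0.0, 0.1, 0.2, 0.1, 1.5, 4.5, 0.2, 0.1, 0.9])

start, end, score = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(f"span=({start}, {end})  answer={answer!r}  score={score:.1f}")
```

Because the answer is always a slice of the input tokens, an extractive model cannot produce text that is absent from the context.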

## 3. RAG (Retrieval-Augmented Generation)

In [None]:
rag_pipeline = '''
RAG Pipeline:

1. User Question
       ↓
2. Embed Question (sentence-transformers)
       ↓
3. Search Vector DB (FAISS, Pinecone, Chroma)
       ↓
4. Retrieve Top-K Documents
       ↓
5. Combine: Question + Context
       ↓
6. Generate Answer (LLM)
       ↓
7. Return Answer + Sources
'''
print(rag_pipeline)
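
Steps 2 through 5 of the pipeline above can be sketched without any external services. This is a toy illustration under stated assumptions: a bag-of-words `Counter` stands in for a sentence-transformers embedding, a plain dictionary stands in for the vector database, and the generation step is left out entirely. The document IDs, texts, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini corpus standing in for a vector database.
documents = {
    "doc1": "Machine learning is a subset of artificial intelligence "
            "that enables systems to learn from data.",
    "doc2": "Arthur Samuel pioneered machine learning in the 1950s.",
    "doc3": "FAISS is a library for similarity search over dense vectors.",
}


def embed(text):
    """Toy embedding: bag-of-words counts (a stand-in for a real encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, k=2):
    """Return the k most similar (doc_id, score) pairs, best first."""
    q = embed(question)
    scored = sorted(
        ((cosine(q, embed(text)), doc_id) for doc_id, text in documents.items()),
        reverse=True,
    )
    return [(doc_id, score) for score, doc_id in scored[:k]]


def build_prompt(question, hits):
    """Combine the question with retrieved context, tagging each source."""
    context = "\n".join(f"[{doc_id}] {documents[doc_id]}" for doc_id, _ in hits)
    return (
        "Answer using only the context below and cite sources.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


question = "Who pioneered machine learning?"
hits = retrieve(question)
prompt = build_prompt(question, hits)
print(prompt)  # This prompt would then be passed to an LLM (step 6).
```

Tagging each passage with its document ID in the prompt is one simple way to let the generated answer point back to its sources (step 7).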

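## 🎯 Key Takeaways
1. Extractive: fast and precise, but limited to spans that appear in the context
2. Generative: flexible, but may hallucinate
3. RAG: grounds generated answers in retrieved documents, combining flexibility with traceable sources
4. Always cite sources in production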

**Next**: 20_nlp_multilingual.ipynb