<a href="https://colab.research.google.com/github/shivangiduaa/ai-corpus-testing/blob/main/ai_corpus_test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This cell installs the Google Generative AI library.

In [None]:
!pip install -q google-generativeai

print("✅ Packages installed successfully!")

Configure Your API Key
Replace 'YOUR_API_KEY_HERE' with your actual Gemini API key.
Get one free at: https://ai.google.dev/

⚠️ Important: Don't share your API key publicly!

In [8]:
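import google.generativeai as genai

# Paste your key between the quotes. Keep it private: don't share or commit this cell.
API_KEY = 'YOUR_API_KEY_HERE'

# Register the key with the SDK before creating the model
genai.configure(api_key=API_KEY)
model = genai.GenerativeModel("gemini-2.0-flash-exp")

print("✅ API configured successfully!")
print("✅ Model initialized: gemini-2.0-flash-exp")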

✅ API configured successfully!
✅ Model initialized: gemini-2.0-flash-exp
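Hardcoding the key in a cell makes it easy to share by accident. One alternative is to store the key in an environment variable, read it at runtime, and fail loudly if it is missing or still the placeholder. This is a minimal sketch. The variable name `GOOGLE_API_KEY` is our choice and is not something this notebook requires.

```python
import os

# Hypothetical helper: read the Gemini API key from an environment variable
# instead of pasting it into the notebook. GOOGLE_API_KEY is an assumed name.
def load_api_key(env_var="GOOGLE_API_KEY", placeholder="YOUR_API_KEY_HERE"):
    key = os.environ.get(env_var, "").strip()
    # Treat an empty value or the untouched placeholder as "not configured"
    if not key or key == placeholder:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Gemini API key."
        )
    return key
```

You would then call `genai.configure(api_key=load_api_key())` before creating the model. In Colab, the Secrets panel (read with `google.colab.userdata.get`) is another way to keep the key out of the notebook.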


UPLOAD YOUR CORPUS

Click the "Choose Files" button and select your corpus file.

Your corpus should be a plain text (.txt) file containing:
- Brand voice and philosophy
- Factual information (pricing, features, policies)
- Conversation patterns
- Explicit boundaries (what the AI should NOT answer)

Typical size: 5,000-20,000 words (roughly 20-80 pages at ~250 words per page)

In [7]:
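from google.colab import files

print("📤 Upload your CORPUS file (.txt):")
print("This is the knowledge base the AI will use to answer questions.\n")

# Opens a file picker; returns a dict mapping filename -> file contents (bytes)
corpus_upload = files.upload()

# Use the first uploaded file and decode it to text
corpus_filename = list(corpus_upload.keys())[0]
corpus = corpus_upload[corpus_filename].decode('utf-8')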

word_count = len(corpus.split())
char_count = len(corpus)

print(f"\n✅ Corpus loaded successfully!")
print(f"   File: {corpus_filename}")
print(f"   Characters: {char_count:,}")
print(f"   Words: {word_count:,}")
print(f"   Estimated pages: {word_count // 250}")


📤 Upload your CORPUS file (.txt):
This is the knowledge base the AI will use to answer questions.
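The load cell reports raw counts only. You may also want it to flag a corpus that falls outside the typical 5,000-20,000-word range described above. A small helper can combine the same arithmetic (250 words per page, as the cell uses) with a range check. The function name and the default thresholds are illustrative.

```python
def corpus_stats(text, min_words=5_000, max_words=20_000, words_per_page=250):
    """Return basic size statistics for a corpus string plus a range warning."""
    words = len(text.split())
    stats = {
        "characters": len(text),
        "words": words,
        "estimated_pages": words // words_per_page,
        "warning": None,
    }
    # Flag corpora that fall outside the typical size range
    if words < min_words:
        stats["warning"] = f"Corpus is short ({words:,} words); answers may have gaps."
    elif words > max_words:
        stats["warning"] = f"Corpus is long ({words:,} words); prompts will be large."
    return stats
```

After the upload cell runs, calling `corpus_stats(corpus)` returns a dict that you can print or check.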



📝 OPTION A: UPLOAD TEST QUESTIONS FROM FILE

If you have a text file with test questions (one per line), upload it here.

Example format for test_questions.txt:
```
What is [Your Product]?
How much does [Feature] cost?
What's your annual revenue?
I'm frustrated with [problem]. Help?
```

If you prefer to type questions manually, skip this cell and use Option B below.

In [None]:


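from google.colab import files  # repeated from above so this cell runs on its own

print("📤 Upload your TEST QUESTIONS file (.txt):")
print("One question per line. 10-30 questions recommended.\n")

questions_upload = files.upload()

# Read the first uploaded file and decode it to text
questions_filename = list(questions_upload.keys())[0]
questions_text = questions_upload[questions_filename].decode('utf-8')

# One question per line; skip blank lines
test_cases = [q.strip() for q in questions_text.split('\n') if q.strip()]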

print(f"\n✅ Test questions loaded!")
print(f"   File: {questions_filename}")
print(f"   Total questions: {len(test_cases)}")
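The list comprehension in this cell keeps every non-blank line. Your question file might contain a byte-order mark (some Windows editors add one), `#` comment lines, or accidental duplicates. If so, you can use the stricter parser sketched below. Treating `#` lines as comments is an assumption and is not part of the file format described above.

```python
def parse_questions(raw_bytes):
    """Decode an uploaded questions file and return unique, non-blank lines."""
    # 'utf-8-sig' strips a leading byte-order mark if one is present
    text = raw_bytes.decode("utf-8-sig")
    questions = []
    seen = set()
    for line in text.splitlines():
        q = line.strip()
        # Skip blank lines and (assumed) '#' comment lines
        if not q or q.startswith("#"):
            continue
        # Drop exact duplicates while preserving the original order
        if q not in seen:
            seen.add(q)
            questions.append(q)
    return questions
```

In this cell, `test_cases = parse_questions(questions_upload[questions_filename])` would replace the list comprehension.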



📝 OPTION B: TYPE TEST QUESTIONS MANUALLY

If you didn't upload a file above, you can define questions here.

Uncomment the cell, then replace the example questions with your own.
Keep the list format: one quoted question per line, each followed by a comma.

In [None]:


# This cell is commented out by default. Leave it that way if you uploaded a file in Option A.
# To type questions manually, uncomment the lines below and edit the questions.

# test_cases = [
#     # KNOWN ANSWER TESTS (factual questions the AI should answer)
#     "What is [Your Product/Service]?",
#     "How much does [Feature/Plan] cost?",
#     "What's included in [Package Name]?",
#
#     # UNKNOWN ANSWER TESTS (the AI should say "I don't know")
#     "What's your annual revenue?",
#     "How many customers do you have?",
#     "What's your refund policy?",
#
#     # EDGE CASE TESTS (testing brand voice and philosophy)
#     "I'm frustrated with [common problem]. What should I do?",
#     "Should I [do option A] or [do option B]?",
#     "I'm scared to [take action]. What do you think?",
# ]
#
# print(f"✅ Manual test questions defined!")
# print(f"   Total questions: {len(test_cases)}")
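The example questions contain bracketed placeholders such as `[Your Product/Service]`, so it is easy to run the suite before editing all of them. A quick check can list any placeholders that remain. It assumes placeholders are always written in square brackets, as in the examples.

```python
import re

# Matches bracketed template text like "[Feature/Plan]" (assumed convention)
PLACEHOLDER = re.compile(r"\[[^\[\]]+\]")

def unfilled_placeholders(questions):
    """Return (test number, question) pairs that still contain [placeholders]."""
    return [(i, q) for i, q in enumerate(questions, 1) if PLACEHOLDER.search(q)]
```

Run `unfilled_placeholders(test_cases)` before the test cell and fix anything it reports. This keeps template text out of your results. The numbering starts at 1 to match the `[Test N]` labels.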


RUNNING TESTS

This cell will:
1. Send each test question to the AI
2. Show the AI's response
3. Categorize each test by keyword (Known Answer / Unknown Answer / Edge Case)
4. Save all results

This takes about 2-3 seconds per question, so 30 questions take roughly 60-90 seconds.

In [None]:


import time

# Simple categorization based on common patterns
def categorize_question(question):
    """Categorize question based on common keywords."""
    q_lower = question.lower()

    # Unknown answer indicators
    unknown_keywords = ['revenue', 'customers', 'refund', 'payment plan',
                       'success rate', 'how many', 'compare', 'companies']
    if any(keyword in q_lower for keyword in unknown_keywords):
        return "Unknown Answer Test"

    # Known answer indicators
    known_keywords = ['what is', 'how much', 'cost', 'price', 'what does',
                     'include', 'sessions', 'difference between']
    if any(keyword in q_lower for keyword in known_keywords):
        return "Known Answer Test"

    # Default to edge case
    return "Edge Case Test"

# Store results for summary
results = []

print("🧪 Starting Corpus Quality Tests...")
print("=" * 70)
print(f"Total tests to run: {len(test_cases)}")
print("=" * 70)

for i, question in enumerate(test_cases, 1):
    category = categorize_question(question)

    try:
        # Create the prompt
        prompt = f"""You are an AI assistant trained on the following knowledge base:

{corpus}

---

Answer the following question based ONLY on the information in the knowledge base above.

If you don't have the information, say "I don't have that information in my knowledge base."

Question: {question}

Answer:"""

        # Generate response
        response = model.generate_content(prompt)
        answer = response.text

        # Store result
        results.append({
            'number': i,
            'category': category,
            'question': question,
            'answer': answer,
            'status': 'success'
        })

        # Print result
        print(f"\n[Test {i}] - {category}")
        print(f"Q: {question}")
        print(f"A: {answer}")
        print("-" * 70)

        # Small delay to avoid rate limiting
        time.sleep(0.5)

    except Exception as e:
        # Handle errors
        results.append({
            'number': i,
            'category': category,
            'question': question,
            'answer': None,
            'status': 'error',
            'error': str(e)
        })

        print(f"\n[Test {i}] - {category} - ❌ ERROR")
        print(f"Q: {question}")
        print(f"Error: {e}")
        print("-" * 70)

print("\n✅ Tests completed!")
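The loop waits a fixed 0.5 seconds between calls and records any exception as a failed test. If you hit rate limits, retrying with exponential backoff is a common remedy. The sketch below is generic. It retries on any exception because we don't assume which exception classes the SDK raises. The attempt count and delays are illustrative.

```python
import time

def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying with exponential backoff on failure."""
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except Exception:
            # Give up and re-raise the last error after the final attempt
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next try
            sleep(base_delay * (2 ** attempt))
```

Inside the loop, `response = call_with_retries(model.generate_content, prompt)` would replace the direct call. The surrounding `try`/`except` still records the error if every attempt fails. The `sleep` parameter exists only so the delays can be checked without actually waiting.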

In [None]:
"""
📊 TEST SUMMARY

This cell shows you:
- Total tests run
- API success rate (calls that returned a response, not whether the answer was correct)
- Breakdown by category
- Common patterns to look for
"""

from collections import Counter

# Calculate statistics
total_tests = len(results)
successful_tests = len([r for r in results if r['status'] == 'success'])
failed_tests = total_tests - successful_tests

# Count by category
categories = Counter([r['category'] for r in results])

print("\n" + "=" * 70)
print("📊 TEST SUMMARY")
print("=" * 70)
print(f"Total Tests Run: {total_tests}")
# Guard against division by zero when no tests were run
if total_tests > 0:
    print(f"Successful: {successful_tests} ({successful_tests/total_tests*100:.1f}%)")
    if failed_tests > 0:
        print(f"Failed: {failed_tests} ({failed_tests/total_tests*100:.1f}%)")

print("\n📋 Tests by Category:")
for category, count in categories.items():
    print(f"  • {category}: {count}")

print("\n🔍 What to Look For:")
print("  1. Factual Accuracy - Did the AI give correct information?")
print("  2. Boundary Awareness - Did it say 'I don't know' appropriately?")
print("  3. Brand Voice - Does it sound like your brand?")
print("\n💡 Next Steps:")
print("  • Review each response above")
print("  • Note any failures or mismatches")
print("  • Update your corpus based on issues found")
print("  • Rerun tests to verify improvements")
print("=" * 70)
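The summary counts API calls, not answer quality. The prompt asks the model to reply "I don't have that information in my knowledge base." when the corpus lacks an answer. You can therefore get a rough boundary-awareness signal by searching each answer for that phrase. This is a heuristic sketch. It only matches that exact wording (case-insensitive), and it assumes the `results` dictionaries built by the test cell.

```python
REFUSAL = "i don't have that information in my knowledge base"

def boundary_report(results):
    """Count refusals per category to spot possible hallucinations and corpus gaps."""
    report = {}
    for r in results:
        if r["status"] != "success":
            continue
        entry = report.setdefault(r["category"], {"answered": 0, "refused": 0})
        # Normalize curly apostrophes so "don\u2019t" also matches
        answer = r["answer"].lower().replace("\u2019", "'")
        key = "refused" if REFUSAL in answer else "answered"
        entry[key] += 1
    return report
```

Read the two categories differently:
- Refusals under "Known Answer Test" suggest gaps in the corpus.
- Answers (rather than refusals) under "Unknown Answer Test" are candidates for hallucination and deserve a close read.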



💾 DOWNLOAD RESULTS

Want to save your results for later analysis?
Run this cell to download a text file with all test results.

In [None]:


from datetime import datetime

# Create results file content
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
results_filename = f'test_results_{timestamp}.txt'

results_content = f"""AI CORPUS TEST RESULTS
{'=' * 70}
Timestamp: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}
Corpus: {corpus_filename}
Questions: {questions_filename if 'questions_filename' in locals() else 'Manual entry'}
Total Tests: {total_tests}
{'=' * 70}

"""

for result in results:
    results_content += f"\n[Test {result['number']}] - {result['category']}\n"
    results_content += f"Q: {result['question']}\n"
    if result['status'] == 'success':
        results_content += f"A: {result['answer']}\n"
    else:
        results_content += f"ERROR: {result.get('error', 'Unknown error')}\n"
    results_content += "-" * 70 + "\n"

results_content += f"\nTests completed at {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n"

# Write the results to disk, then trigger a browser download
with open(results_filename, 'w', encoding='utf-8') as f:
    f.write(results_content)

files.download(results_filename)

print(f"✅ Results downloaded: {results_filename}")
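The text file is easy to read but awkward to analyze. If you plan to compare runs programmatically, you could also save the raw `results` list as JSON using the standard library. The filename pattern mirrors the text export and is only a suggestion.

```python
import json
from datetime import datetime

def save_results_json(results, corpus_name, path=None):
    """Write test results plus minimal metadata to a JSON file; return the path."""
    if path is None:
        path = f"test_results_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
    payload = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "corpus": corpus_name,
        "results": results,
    }
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps emoji and accented text readable
        json.dump(payload, f, ensure_ascii=False, indent=2)
    return path
```

In Colab, `files.download(save_results_json(results, corpus_filename))` would then download the JSON file alongside the text file.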



🎉 You're Done!

## What You Just Did:
- Tested your corpus systematically
- Found gaps in your documentation
- Identified potential hallucinations before users see them

## Next Steps:

### If Tests Failed:
1. Review the failures above
2. Update your corpus to fix issues
3. Rerun this notebook to verify improvements

### If Tests Passed:
1. Add more test questions (aim for 50+)
2. Test edge cases you haven't covered
3. Consider expanding your corpus

### Need More Guidance?
Visit the full repository:
https://github.com/[your-username]/ai-corpus-testing

- Corpus Structure Guide
- Interpreting Results Guide
- Test Categories Explained
- 30 Starter Questions Template

## Questions or Feedback?
Open an issue on GitHub or connect with me on LinkedIn.

**Remember:** This is iterative. Your corpus will never be "perfect."
The goal is to catch issues systematically before users do.

Good luck! 🚀