# MariaDB Vector Magics - Professional Demo

This notebook walks through MariaDB Vector Magics, a set of Jupyter magic commands for semantic search and RAG (Retrieval Augmented Generation) applications on MariaDB.

## Prerequisites
- A running MariaDB 11.7+ server with Vector support
- A Python environment with `mariadb-vector-magics` installed
- Optional: Groq API key for RAG functionality

## 1. Setup and Connection

Load the extension and establish a connection to MariaDB.

In [None]:
# Load the MariaDB Vector Magics extension
%load_ext mariadb_vector_magics

In [None]:
# Connect to MariaDB (replace with your credentials)
%connect_mariadb --password=your_password --database=vectordb

## 2. Load Embedding Model

Load a sentence transformer model for generating embeddings.

In [None]:
# Load embedding model (this may take a moment on first run)
%load_embedding_model all-MiniLM-L6-v2

## 3. Create Vector Store

Create a vector table with an HNSW index. The `--dimensions` value must match the output size of the embedding model (384 for all-MiniLM-L6-v2).

In [None]:
# Create vector store for documents
%create_vector_store --table tech_docs --dimensions 384 --distance cosine --drop_if_exists
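For readers curious about what a command like this might do on the server, the sketch below builds a plausible `CREATE TABLE` statement using MariaDB's `VECTOR(n)` column type and its `VECTOR INDEX ... M=... DISTANCE=...` index options. The helper name `build_vector_table_ddl` and the column names are hypothetical; the DDL the extension actually issues may differ.

```python
from typing import Optional


def build_vector_table_ddl(
    table: str,
    dimensions: int,
    distance: str = "cosine",
    m_value: Optional[int] = None,
) -> str:
    """Return an illustrative CREATE TABLE statement for a vector table."""
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    if distance not in ("cosine", "euclidean"):
        raise ValueError(f"unsupported distance: {distance!r}")

    index_options = f"DISTANCE={distance}"
    if m_value is not None:
        # Only emit M when requested, so the server default applies otherwise.
        index_options = f"M={m_value} {index_options}"

    # Column names (id, document_text, embedding) are assumptions of this sketch.
    return (
        f"CREATE TABLE {table} (\n"
        "    id INT AUTO_INCREMENT PRIMARY KEY,\n"
        "    document_text TEXT NOT NULL,\n"
        f"    embedding VECTOR({dimensions}) NOT NULL,\n"
        f"    VECTOR INDEX (embedding) {index_options}\n"
        ")"
    )


ddl = build_vector_table_ddl("tech_docs", 384, "cosine")
print(ddl)
```

The statement is only printed here; running it against a server would need a database driver and appropriate privileges.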

## 4. Add Sample Documents

Insert sample technology documents with automatic embedding generation.

In [None]:
%%embed_table --table tech_docs --batch_size 16 --chunk_size 500
[
    "MariaDB is an open-source relational database management system that originated as a fork of MySQL. It provides enterprise-grade features including ACID compliance, advanced security, and high availability. The recent addition of Vector support enables semantic search capabilities for AI applications.",
    
    "Vector databases are specialized systems designed to store and query high-dimensional vectors efficiently. They use algorithms like HNSW (Hierarchical Navigable Small World) to enable fast approximate nearest neighbor search. This technology is essential for applications involving embeddings, similarity search, and recommendation systems.",
    
    "Retrieval Augmented Generation (RAG) is an AI architecture that combines information retrieval with large language models. The system first retrieves relevant documents from a knowledge base using semantic search, then uses those documents as context for generating accurate, grounded responses. This approach significantly reduces hallucinations in AI systems.",
    
    "Jupyter notebooks provide an interactive computing environment that combines code execution, rich text, and visualizations. They have become the standard tool for data science, machine learning experimentation, and educational content. The notebook format enables reproducible research and collaborative development.",
    
    "Python has emerged as the dominant programming language for artificial intelligence and machine learning. Its extensive ecosystem includes libraries like NumPy for numerical computing, pandas for data manipulation, scikit-learn for machine learning, and PyTorch/TensorFlow for deep learning. The language's simplicity and readability make it accessible to researchers and practitioners.",
    
    "Semantic search goes beyond keyword matching to understand the meaning and context of queries. It uses natural language processing and vector embeddings to find relevant content based on conceptual similarity rather than exact word matches. This technology powers modern search engines, recommendation systems, and question-answering applications.",
    
    "Large Language Models (LLMs) like GPT, Claude, and Llama have revolutionized natural language processing. These models are trained on vast amounts of text data and can perform tasks like text generation, summarization, translation, and question answering. However, they can suffer from hallucinations and lack access to current information, which RAG systems help address.",
    
    "Docker containerization has transformed software deployment by packaging applications with their dependencies into lightweight, portable containers. This technology ensures consistent behavior across different environments and simplifies scaling, orchestration, and continuous integration/continuous deployment (CI/CD) pipelines."
]
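The `--chunk_size` and `--batch_size` options suggest a two-step flow: split long documents into chunks, then embed and insert the chunks in groups. The sketch below shows one way this could work, assuming `chunk_size` counts characters and chunks break on whitespace; both are assumptions, and `chunk_text` and `batched` are hypothetical helper names rather than the extension's own functions.

```python
from typing import Iterator, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Greedily pack whole words into chunks of up to chunk_size characters.

    A single word longer than chunk_size is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) <= chunk_size or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


docs = ["alpha beta gamma delta", "short"]
chunks = [piece for doc in docs for piece in chunk_text(doc, chunk_size=11)]
batches = list(batched(chunks, batch_size=2))
print(batches)
```

With a 500-character limit, each sample document in the cell above would likely yield only one or two chunks, so a batch size of 16 would cover the whole set in a single batch or two.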

## 5. Explore the Data

Check what we've stored in our vector database.

In [None]:
# List the vector tables in the database
%show_vector_tables

In [None]:
# Query the table to see sample data
%query_table tech_docs

## 6. Semantic Search Examples

Perform semantic searches to find relevant documents.

In [None]:
# Search for database-related content
%semantic_search "database systems and storage" --table tech_docs --top_k 3 --show_distance

In [None]:
# Search for AI and machine learning content
%semantic_search "artificial intelligence and machine learning" --table tech_docs --top_k 3 --show_distance

In [None]:
# Search for development tools
%semantic_search "software development and programming tools" --table tech_docs --top_k 2

In [None]:
# Search with distance threshold filtering
%semantic_search "natural language processing" --table tech_docs --top_k 5 --threshold 0.7 --show_distance
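To make `--top_k`, `--threshold`, and `--show_distance` concrete, here is a minimal in-memory sketch of the ranking step. It assumes the reported distance is cosine distance (one minus cosine similarity) and that the threshold is an upper bound on that distance; both are assumptions about the extension, and the two-dimensional vectors are toy values rather than real embeddings.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    rows: List[Tuple[str, Sequence[float]]],
    k: int,
    threshold: Optional[float] = None,
) -> List[Tuple[float, str]]:
    """Rank rows by distance to the query, drop those above threshold, keep k."""
    scored = sorted(
        ((cosine_distance(query, vector), text) for text, vector in rows),
        key=lambda pair: pair[0],
    )
    if threshold is not None:
        scored = [pair for pair in scored if pair[0] <= threshold]
    return scored[:k]


rows = [
    ("same direction", [1.0, 0.0]),
    ("orthogonal", [0.0, 1.0]),
    ("in between", [1.0, 1.0]),
]
results = top_k([1.0, 0.0], rows, k=5, threshold=0.7)
for distance, text in results:
    print(f"{distance:.3f}  {text}")
```

A real search would delegate the ranking to the database's HNSW index, which returns approximate nearest neighbors instead of scanning every row.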

## 7. RAG (Retrieval Augmented Generation)

Demonstrate the complete RAG pipeline with LLM integration.

**Note:** You need to set your Groq API key as an environment variable or pass it as a parameter.

In [None]:
# Set your Groq API key (get a free key at https://console.groq.com/)
import os
# os.environ['GROQ_API_KEY'] = 'your_groq_api_key_here'

In [None]:
# Ask questions about the stored documents
%rag_query "What is MariaDB and what makes it special?" --table tech_docs --top_k 3

In [None]:
# Ask about RAG systems
%rag_query "How does RAG work and why is it useful?" --table tech_docs --top_k 3 --temperature 0.1

In [None]:
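# Ask about Python for AI with an explicit model
# (hosted model IDs are retired over time; substitute a currently available one if this fails)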
%rag_query "Why is Python popular for machine learning?" --table tech_docs --top_k 2 --model llama-3.1-70b-versatile
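Between retrieval and generation, a RAG pipeline has to turn the retrieved documents into a prompt for the LLM. The sketch below shows one common way to do that; the prompt wording and the `build_rag_prompt` name are illustrative assumptions, not the template the extension uses.

```python
from typing import List


def build_rag_prompt(question: str, contexts: List[str]) -> str:
    """Combine retrieved passages and the user question into a single prompt."""
    # Number the passages so the model (and the reader) can refer back to them.
    numbered = "\n\n".join(
        f"[{index}] {text}" for index, text in enumerate(contexts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_rag_prompt(
    "What is MariaDB?",
    [
        "MariaDB is an open-source relational database management system.",
        "Vector support enables semantic search capabilities.",
    ],
)
print(prompt)
```

In this picture, `--top_k` controls how many passages go into the context, and `--temperature` is passed on to the model that completes the prompt.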

## 8. Advanced Use Cases

Create a second vector store with a different distance metric and a custom HNSW M value, then populate and search it.

In [None]:
# Create a specialized vector store for code documentation
%create_vector_store --table code_docs --dimensions 384 --distance euclidean --m_value 32
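The choice between `cosine` and `euclidean` matters when vectors differ in length as well as direction. The toy comparison below (plain Python, not the extension's code) uses two vectors that point the same way but differ in magnitude: cosine distance treats them as essentially identical, while Euclidean distance does not.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 2.0]
b = [2.0, 4.0]  # same direction as a, twice the length

print("euclidean:", euclidean_distance(a, b))
print("cosine:   ", cosine_distance(a, b))
```

If the embedding model emits unit-length vectors, the two metrics produce the same ranking, so the choice then mostly affects how the reported distances read.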

In [None]:
%%embed_table --table code_docs --batch_size 8 --chunk_size 300
[
    "The connect_mariadb magic command establishes a connection to MariaDB server. Parameters include host, port, user, password, and database. It validates Vector support and displays connection status.",
    
    "The load_embedding_model command loads a sentence transformer model for generating embeddings. Default model is all-MiniLM-L6-v2 with 384 dimensions. Models are cached locally for faster subsequent loads.",
    
    "The create_vector_store command creates optimized tables for vector storage. It supports HNSW indexing with configurable M values, distance metrics (cosine/euclidean), and automatic schema generation.",
    
    "The embed_table cell magic processes documents in batches, generates embeddings, and inserts them into vector tables. It supports automatic chunking for large documents with configurable overlap.",
    
    "The semantic_search command performs similarity search using vector embeddings. It supports distance thresholds, result limiting, and displays similarity scores when requested."
]

In [None]:
# Search the code documentation
%semantic_search "how to connect to database" --table code_docs --top_k 2 --show_distance

## 9. Performance and Monitoring

Inspect both tables as a basic check on what has been stored.

In [None]:
# Check both tables
%query_table tech_docs
%query_table code_docs
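For rough timing of individual operations, a small context manager around `time.perf_counter` is often enough. The `timed` helper below is a hypothetical utility, not part of the extension, and the summed range merely stands in for a real vector operation.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def timed(label: str, results: Dict[str, float]) -> Iterator[None]:
    """Record the wall-clock duration of the with-block under results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings: Dict[str, float] = {}
with timed("sum", timings):
    total = sum(range(100_000))  # placeholder workload

for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.2f} ms")
```

Inside a notebook, the body of the `with` block could call a magic programmatically, for example through `get_ipython().run_line_magic("semantic_search", ...)` with the same argument string used in the cells above.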

## 10. Cleanup (Optional)

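Reset the demo tables if needed.

In [None]:
# Uncomment to reset the demo tables. As in section 3, --drop_if_exists drops an
# existing table before creating it again, so this leaves the tables in place but empty.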
# %create_vector_store --table tech_docs --dimensions 384 --drop_if_exists
# %create_vector_store --table code_docs --dimensions 384 --drop_if_exists

## Conclusion

This demo showcased the key features of MariaDB Vector Magics:

1. **Easy Setup**: Simple magic commands for connection and configuration
2. **Automatic Embeddings**: Seamless text-to-vector conversion
3. **Optimized Storage**: HNSW-indexed vector tables for fast search
4. **Semantic Search**: Natural language queries with similarity scoring
5. **RAG Integration**: Complete retrieval-augmented generation pipeline
6. **Batch Processing**: Batched embedding and automatic chunking of documents

For production use, consider:
- Using environment variables for credentials
- Implementing proper error handling
- Monitoring performance and scaling
- Setting up automated backups
- Configuring appropriate security measures
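As a sketch of the first recommendation, connection settings can be read from environment variables so that passwords never appear in the notebook. The variable names (`MARIADB_HOST` and so on), the defaults, and the `load_db_settings` helper are hypothetical; whether the extension reads any such variables itself is not covered in this demo.

```python
import os
from typing import Dict, Mapping


def load_db_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect connection settings from the environment, failing fast if the password is unset."""
    if "MARIADB_PASSWORD" not in env:
        raise KeyError("MARIADB_PASSWORD is not set")
    return {
        "host": env.get("MARIADB_HOST", "localhost"),
        "port": env.get("MARIADB_PORT", "3306"),
        "user": env.get("MARIADB_USER", "root"),
        "password": env["MARIADB_PASSWORD"],
        "database": env.get("MARIADB_DATABASE", "vectordb"),
    }


# A plain dict stands in for os.environ so the example runs anywhere.
settings = load_db_settings({"MARIADB_PASSWORD": "example-only"})
print({key: value for key, value in settings.items() if key != "password"})
```

Passing the mapping as a parameter keeps the helper easy to test and avoids touching the real environment in examples.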

Visit the [GitHub repository](https://github.com/jayant99acharya/mariadb) for more information and updates.