teachrag provides RAG (Retrieval-Augmented Generation) for course materials. It parses teaching materials (slides, scripts, syllabus), builds a DuckDB-backed ragnar store with embeddings, and provides Q&A via Ollama (local) or Claude (API).
Before installing or running teachrag, ensure you have:
- R packages: dplyr (>= 1.2.0) and others (see DESCRIPTION). Run `teachrag::check_dependencies()` to verify, or `teachrag::ensure_dependencies()` to install any that are missing.
- Ollama with the models `qwen2.5:3b` and `nomic-embed-text` (for local Q&A and embeddings)
- Claude API (optional): if you prefer using Claude via the API, set `ANTHROPIC_API_KEY` in your environment to use it instead of a local LLM. You will still need Ollama and `nomic-embed-text` for the embeddings.
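As a minimal sketch, you can set the key for the current session from within R using base functions (the key value shown is a placeholder; no teachrag-specific calls are assumed):

```r
# Set the Claude API key for the current R session only.
# For persistence across sessions, add ANTHROPIC_API_KEY=... to your ~/.Renviron instead.
Sys.setenv(ANTHROPIC_API_KEY = "sk-ant-placeholder")

# Check that the key is visible to the session
nzchar(Sys.getenv("ANTHROPIC_API_KEY"))
```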
```r
# From source
devtools::install("fellennert/teachrag")
```

The package ships with pre-built course data, so you can run Q&A immediately without any setup:
```r
library(teachrag)

# Single-turn Q&A (uses bundled data by default)
ask_rag("What is supervised machine learning?")

# Multi-turn chat
chat_state <- NULL
res1 <- ask_rag_chat(chat_state, "What is supervised machine learning?")
chat_state <- res1$chat_state
res2 <- ask_rag_chat(chat_state, "Can you give an example?")

# Shiny app (with progress bar)
run_app()

# CLI (prints status: Querying database → Producing answer → Fact-checking)
interactive_cli()
```

All of these interfaces report progress: Querying database… → Producing initial answer… → Fact-checking answer…
To parse and index your own course materials:
```r
library(teachrag)
run_setup_wizard()
```

The wizard guides you through choosing directories, parsing materials, building the store, testing a question, and launching the app.
If you have your own intermediate directory and store:
```r
library(teachrag)

intermediate_dir <- "path/to/your/intermediate"  # contains chunks.rds, syllabus.rds, store
store_path <- file.path(intermediate_dir, "teaching_db.ragnar.duckdb")

# Pass explicitly, or set options
options(teachrag.intermediate_dir = intermediate_dir, teachrag.store_path = store_path)

ask_rag("What is supervised machine learning?")
run_app()
```

Note that there is currently no rigorous check that chunks are not too large for nomic-embed-text (i.e., chunk length must stay under its 8,192-token limit). You may want to preprocess your data manually or amend the preprocessing functions (R/parse.R) to prevent issues.
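Until such a check exists in the package, here is a minimal sketch of a guard you could run over your chunks before building the store. It assumes chunks.rds holds a data frame with a `text` column, and uses a rough heuristic of ~4 characters per token rather than an exact tokenizer:

```r
# Flag chunks likely to exceed an embedding model's token limit.
# Heuristic: ~4 characters per token; this is NOT an exact tokenizer.
flag_long_chunks <- function(texts, max_tokens = 8192, chars_per_token = 4) {
  est_tokens <- ceiling(nchar(texts) / chars_per_token)
  which(est_tokens > max_tokens)  # indices of suspiciously long chunks
}

# Usage sketch (assumes chunks.rds stores a data frame with a `text` column):
# chunks <- readRDS(file.path(intermediate_dir, "chunks.rds"))
# too_long <- flag_long_chunks(chunks$text)
# if (length(too_long) > 0) warning(length(too_long), " chunk(s) may be too long")
```

Chunks flagged this way can then be split or shortened before calling `build_store()`.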
```r
# 1. Parse and chunk materials
parse_materials(corpus_dir = "path/to/course_material", output_dir = "path/to/intermediate")

# 2. Build the ragnar store (requires nomic-embed-text via Ollama)
build_store(output_dir = "path/to/intermediate")
```