Clojure System Prompt

This is a system prompt designed to work with the Clojure programming language.

Why Custom System Prompts for Niche Languages Matter

When working with LLM-based coding assistants, languages like Clojure face inherent disadvantages due to training data imbalances. Custom system prompts can help bridge this gap and significantly improve code generation quality.

The Training Data Bias Problem

Research has documented significant programming language bias in LLMs:

  • Python dominance: Studies show LLMs use Python in 90-97% of benchmark tasks, even for language-agnostic problems. For high-performance tasks where Python is not optimal, it remains the dominant choice in 58% of cases.

  • Training data imbalance: The StarCoder dataset shows Python alone accounts for nearly 40% of its training corpus, while many other languages appear only marginally. Users in communities like StackOverflow concentrate on certain languages (Python, JavaScript), which degrades diversity when collecting training data.

  • The "Matthew Effect": Research suggests that AI programming assistance may systematically influence which languages, frameworks, and paradigms thrive or decline—mainstream ecosystems get reinforced while niche languages receive weaker support.

  • Functional language challenges: LLMs "frequently hallucinate functions that don't exist and have more trouble writing good Clojure code" according to analysis of LLM Clojure generation.

Why Custom Prompts Help

Custom system prompts (like CLAUDE.md files) compensate for training data gaps by:

  1. Providing domain-specific knowledge: Including idioms, conventions, and best practices the LLM may not have encountered frequently in training data.

  2. Preventing hallucinations: Explicitly documenting which libraries, functions, and patterns actually exist in your ecosystem.

  3. Enforcing paradigm consistency: Ensuring the LLM generates idiomatic functional code rather than defaulting to imperative patterns from more common languages.

  4. Context efficiency: Clojure's concise syntax means less context space is needed for code examples, allowing more room for guidance and conventions.
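A CLAUDE.md fragment putting these four ideas into practice might look like the following. The conventions and library names here are illustrative examples, not prescriptions; adapt them to your own project:

```markdown
# Clojure conventions

- Prefer pure functions and immutable data; avoid atoms unless state is essential.
- Use threading macros (`->`, `->>`) for multi-step transformations.
- Destructure maps in function signatures rather than chaining `get` calls.

## Approved libraries (do not invent others)

- HTTP client: clj-http
- JSON: cheshire
- Testing: clojure.test only
```

An explicit "approved libraries" list directly targets the hallucination problem: the model has a closed set of names to draw from instead of guessing at plausible-sounding ones.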

Best Practices for Language-Specific Prompts

Based on Anthropic's recommendations and community research:

  • Keep it concise: Research indicates frontier LLMs can follow ~150-200 instructions reliably. Since Claude Code's system prompt already contains ~50 instructions, your custom prompt should contain as few additional instructions as possible.

  • Use pointers, not copies: Don't include code snippets that will become outdated. Instead, reference file:line locations to point to authoritative context.

  • Avoid redundant style guidelines: Let linters and formatters handle code style. LLMs are slow and expensive compared to traditional tooling for these tasks.

  • Prioritize correctness over completeness: For each line, ask "Would removing this cause Claude to make mistakes?" If not, remove it.

  • Add emphasis for critical rules: Use "IMPORTANT" or "YOU MUST" for instructions that require strict adherence.
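Applied to a Clojure project, these practices might yield entries like the following. The file paths and the lint command are hypothetical examples of the pointer-and-emphasis style, not references to real files:

```markdown
IMPORTANT: YOU MUST run `clj-kondo --lint src` before proposing any change.

- Error-handling conventions: see src/myapp/errors.clj:15
- Namespace layout: mirror src/myapp/core.clj for new namespaces.
```

Note that nothing here duplicates code or style rules: the linter enforces style, and the file:line pointers stay valid as the referenced code evolves.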

Clojure-Specific Advantages

Despite training data challenges, Clojure has characteristics that work well with LLM-assisted development:

  • Easier validation: Consistent syntax and functional code enable easier linting and testing. LLMs perform better in loops where generated code is validated and errors are fed back.

  • REPL-driven development: Current LLMs work well with the Clojure REPL, enabling interactive validation of generated code.

  • Data-oriented design: Immutable state and pure functions make LLM-generated agents testable, traceable, and straightforward to reason about.

  • Homoiconicity: The "data = code" feature of Lisp has potential for automatic program generation and manipulation.
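As a small illustration of why validation loops are cheap in Clojure, a generated pure function over plain data can be checked immediately at the REPL or with clojure.test. This sketch is illustrative; the namespace and function names are invented:

```clojure
(ns example.core-test
  (:require [clojure.test :refer [deftest is run-tests]]))

;; A pure, data-oriented function: no hidden state, trivially testable.
(defn summarize-orders
  "Totals the :amount of each order, grouped by :status."
  [orders]
  (reduce (fn [acc {:keys [status amount]}]
            (update acc status (fnil + 0) amount))
          {}
          orders))

(deftest summarize-orders-test
  (is (= {:paid 30 :pending 5}
         (summarize-orders [{:status :paid :amount 10}
                            {:status :paid :amount 20}
                            {:status :pending :amount 5}]))))

;; At the REPL: (run-tests 'example.core-test)
```

Because the function takes and returns plain data, a validation loop can feed any test failure straight back to the LLM with the exact expected and actual values.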

Why Define Your Own System Prompt?

Research demonstrates that custom system prompts significantly improve LLM performance, and for niche languages, they outperform alternative approaches like Skills or AGENTS.md files.

Quantifiable Improvements from Custom Prompts

Studies show substantial accuracy gains from well-engineered prompts:

  • 57-67% accuracy improvements: Research on 26 prompting principles found that well-engineered prompts can increase accuracy by 57% on LLaMA models and 67% on GPT-4.

  • High sensitivity to instructions: LLM performance is highly sensitive to prompt choices—"reordering examples in a prompt produced accuracy shifts of more than 40 percent."

  • Domain-specific gains: Classification tasks showed "providing clear category definitions before examples improved accuracy by an average of 18% across all models."

Why Custom Instructions Beat Defaults

LLMs have known limitations that custom prompts address:

  • Verbosity by design: Models are trained to be helpful through comprehensive answers, but custom prompts can guide more concise, targeted responses.

  • Missing domain context: "LLMs lack intrinsic knowledge of research... this limitation emphasizes the importance of domain expertise in crafting prompts" according to prompt engineering research.

  • Coding-specific benefits: Addy Osmani notes that providing "in-line examples of the output format or approach you want" dramatically improves results—"LLMs are great at mimicry."

  • GitHub Copilot evidence: One developer reported being "shocked" at how few people use custom instructions given how effective they are; with them, he could guide the AI to output code matching his team's idioms.

Why CLAUDE.md Outperforms Skills and AGENTS.md

For language-specific conventions, always-loaded prompts have structural advantages over on-demand mechanisms:

The Skill Activation Reliability Problem

Skills rely on the LLM to decide when to invoke them, and that decision is unreliable: community reports put activation rates anywhere from roughly 20% to 84%.

Cognitive Science: The "Lost in the Middle" Effect

Research on LLM attention explains why always-loaded context works better:

  • U-shaped attention: Studies show "information at the beginning and end of a context window is more reliably processed than information in the middle."

  • Recency bias: "Transformers naturally weight recent tokens more heavily"—a 10,000-token prompt might effectively operate on just the last 2,000 tokens.

  • System prompt advantage: CLAUDE.md appears at the beginning of every conversation, benefiting from the primacy effect.

Context Length vs. Retrieval Research

Academic research has compared keeping information always in context against retrieving it on demand; the practical trade-offs for each mechanism are summarized below.

Comparison Table

| Aspect      | CLAUDE.md                  | Skills                 | AGENTS.md                |
| ----------- | -------------------------- | ---------------------- | ------------------------ |
| Loading     | Always loaded              | On-demand, LLM decides | Cross-tool standard      |
| Reliability | 100% (guaranteed)          | ~20-84% activation     | Varies by tool           |
| Position    | Beginning (primacy effect) | Mid-conversation       | Tool-dependent           |
| Best for    | Language conventions       | Complex workflows      | Multi-tool compatibility |

Practical Recommendations

For language-specific guidance like Clojure idioms:

  1. Put critical conventions in CLAUDE.md (always loaded, 100% reliability)
  2. Keep CLAUDE.md under ~500 lines to avoid attention dilution
  3. Use skills only for optional workflows you'll invoke explicitly with slash commands
  4. Don't rely on automatic skill activation for anything critical
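In Claude Code terms, these recommendations typically translate into a project layout like the one below. The project and command file names are examples; the key point is the split between the always-loaded CLAUDE.md and explicitly invoked commands under .claude/commands/:

```
my-clojure-project/
├── CLAUDE.md              # always loaded: critical Clojure conventions
├── deps.edn
└── .claude/
    └── commands/
        └── release.md     # optional workflow, invoked explicitly as /release
```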

Further Reading

Training Data Bias

Prompt Engineering Research

CLAUDE.md Best Practices

Skills and Context Engineering

Clojure-Specific
