Development Practices and Coding Standards

Development Practices & Standards

Tech Stack at a Glance

Area	Technology
Language	Kotlin (preferred over Java)
Framework	Spring Boot / Spring Framework
Build Tool	Apache Maven
Database	Neo4j (graph database)
LLM Provider	OpenAI (GPT-4o / GPT-4o-mini) — switchable via `ModelProvider`
Scripting	Python (client scripts, tooling)
Templating	Jinja2 (LLM prompt templates)
Container	Docker (Neo4j via Docker Compose)
API Docs	Swagger / OpenAPI
CI/CD	GitHub Actions + Dependabot
Quality	SonarQube / Jacoco
IDE	IntelliJ IDEA

General Development Practices

AI-Assisted Development

Use AI tools wherever possible. GitHub Copilot and Claude are the preferred tools.
Always closely review AI-generated code suggestions for correctness and IP concerns.

Technology Choices

Favor mainstream, well-supported technologies.
Do not introduce new technologies without clear justification.
Example: Neo4j is already used for the knowledge graph — no second database should be added without a strong reason.

Code Style

Kotlin Guidelines

Prefer Kotlin over Java for all new code.
Always use named arguments in Kotlin function calls for readability.
Use Kotlin's nullable types (T?) instead of Java's Optional.
Follow Spring naming conventions — consistency outweighs minimizing name length.
Names should be descriptive enough to make code self-explanatory.
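
The named-argument and nullability guidelines above can be illustrated with a minimal sketch. The User type and the findUser, formatContact, and describe functions are hypothetical and exist only for this example.

```kotlin
// Hypothetical domain type, used only for illustration.
data class User(val name: String, val email: String?)

// A nullable return type (User?) takes the place of Optional<User>.
fun findUser(users: List<User>, name: String): User? =
    users.firstOrNull { it.name == name }

// Several parameters of the same type: positional calls are easy to get wrong,
// so call sites use named arguments.
fun formatContact(name: String, email: String, separator: String): String =
    "$name$separator$email"

fun describe(users: List<User>, name: String): String {
    val user = findUser(users = users, name = name) ?: return "unknown user '$name'"
    // Elvis operator instead of Optional.orElse(...)
    val email = user.email ?: "no email"
    return formatContact(name = user.name, email = email, separator = " <-> ")
}

fun main() {
    val users = listOf(
        User(name = "ada", email = "ada@example.com"),
        User(name = "bob", email = null),
    )
    println(describe(users = users, name = "ada")) // ada <-> ada@example.com
    println(describe(users = users, name = "bob")) // bob <-> no email
    println(describe(users = users, name = "eve")) // unknown user 'eve'
}
```

Named arguments make the three String parameters of formatContact impossible to swap silently, which is the readability benefit the guideline is after.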

Implementation class naming — think before reaching for Default, Simple, or Impl:

Prefix/Suffix	When to use
`SimpleX`	Deliberately basic/minimal implementation; more sophisticated variants may follow
`DefaultX`	Standard out-of-the-box implementation; alternatives exist or may exist
`XImpl`	Sole implementation, not expecting multiples; often private (e.g. behind a companion object or from deserialization)

Subclasses and implementations should contain the name of the supertype:

// Correct
class DefaultUserService : UserService

// Incorrect
class DefaultUsers : UserService

If none of Simple, Default, or Impl fit, prefer a descriptive name that reflects what the class actually does (e.g. CachingUserService).
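
As a rough sketch of how these naming rules play out, the example below defines a hypothetical UserService with a Default implementation and a descriptively named caching decorator. None of these types are taken from the project.

```kotlin
// Hypothetical supertype, used only to illustrate the naming rules.
interface UserService {
    fun displayName(userId: String): String
}

// "Default": the standard implementation; alternatives exist or may exist.
// The class name contains the supertype name.
class DefaultUserService(private val names: Map<String, String>) : UserService {
    override fun displayName(userId: String): String = names[userId] ?: "unknown"
}

// None of Simple, Default, or Impl fits, so the name says what the class does.
class CachingUserService(private val delegate: UserService) : UserService {
    private val cache = mutableMapOf<String, String>()

    // Exposed only so the example can show that the cache is effective.
    var delegateCalls = 0
        private set

    override fun displayName(userId: String): String =
        cache.getOrPut(userId) {
            delegateCalls++
            delegate.displayName(userId = userId)
        }
}

fun main() {
    val service = CachingUserService(
        delegate = DefaultUserService(names = mapOf("u1" to "Ada")),
    )
    println(service.displayName(userId = "u1")) // Ada
    println(service.displayName(userId = "u1")) // Ada
    println(service.delegateCalls)              // 1
}
```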

Other conventions:

infoString is the preferred name for a method returning human-readable information about an object. Implement the HasInfoString interface.
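
A minimal sketch of the convention follows. The HasInfoString interface declared here is a local stand-in so the example compiles on its own; the project's real interface may declare additional parameters. GraphNode is a hypothetical class.

```kotlin
// Local stand-in for the project's HasInfoString interface.
// Assumption: the real interface may have a different signature.
interface HasInfoString {
    fun infoString(): String
}

// Hypothetical class, used only for illustration.
data class GraphNode(val id: String, val labels: Set<String>) : HasInfoString {
    // Human-readable description, distinct from the generated toString().
    override fun infoString(): String =
        "GraphNode '$id' with labels ${labels.sorted().joinToString(separator = ", ")}"
}

fun main() {
    val node = GraphNode(id = "n1", labels = setOf("Person", "Admin"))
    println(node.infoString()) // GraphNode 'n1' with labels Admin, Person
}
```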

General Code Quality

Emphasize readability and maintainability over cleverness.
Comment anything non-obvious. Use descriptive names to reduce the need for trivial comments.
If the obvious approach does not work, always comment why — this saves future developers time.
In log messages, enclose values that may contain whitespace in single quotes:

logger.info("Processing entity '{}'", entity.name)

Use @Schema and related annotations on all types exposed via REST for accurate Swagger/OpenAPI documentation.
Event types passed over WebSockets must end with Event (required for TypeScript generator).

Neo4j / Cypher

Externalize all Cypher queries to src/main/resources/cypher — do not inline them in code.
Use Spring Data Neo4j 6 (SDN) with care: it has no second-level cache and deletes/reinserts entire subgraphs on save.
Treat SDN like an ORM only with full awareness of its performance characteristics.
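
One way to read externalized queries is a small classpath loader, sketched below. CypherQueries, the .cypher file extension, and the query name are assumptions made for this example; the project may already provide its own loading mechanism.

```kotlin
// Hypothetical helper: loads a query from the classpath folder "cypher/",
// which is where files under src/main/resources/cypher end up after a build.
// Assumption: files are named <queryName>.cypher.
object CypherQueries {
    fun load(queryName: String): String {
        val path = "cypher/$queryName.cypher"
        val stream = CypherQueries::class.java.classLoader.getResourceAsStream(path)
            ?: throw IllegalArgumentException(
                "No Cypher query found at classpath location '$path'"
            )
        return stream.bufferedReader(Charsets.UTF_8).use { it.readText() }
    }
}

fun main() {
    // "find_user_by_name" is a made-up query name. If no such file exists,
    // the loader fails loudly instead of returning an empty query.
    val result = runCatching { CypherQueries.load(queryName = "find_user_by_name") }
    println(result.getOrNull() ?: result.exceptionOrNull()?.message)
}
```

Failing fast on a missing file keeps a typo in a query name from surfacing later as a confusing database error.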

Testing

Test Types

Suffix	Description
`*IntegrationTest`	Spring integration test. Automated, runs under `mvn test`. Requires Docker (Neo4j).
`*IT`	Requires real infrastructure (e.g., a live LLM). Not automated — run manually for exploration.

Test Conventions

Use @Nested JUnit Jupiter tests to group related test cases within a class.
Write test method names in natural language describing the scenario:

fun `should return true when the user is an admin`()

Use mockk for mocking — it is the Kotlin-idiomatic mocking library.
Integration tests mock the layer immediately below them (e.g., web controllers mock graph building).
Avoid code duplication in tests where possible via fixtures and utility functions, but do not be overly strict about it.
Never make real LLM API calls in tests. All LLM interactions must be mocked. If you find yourself reaching for a real ChatModel in a test, stop — mock it with mockk instead. Reasons:
- Accessibility — this is an open source project. Community contributors should not need a paid API key just to run the test suite. Real LLM calls are a contribution barrier.
- Cost — real calls cost money. With many contributors and frequent CI runs, this adds up quickly.
- Non-determinism — LLM responses vary between calls, making assertions brittle and flaky tests hard to diagnose.
- Latency — real calls can take several seconds each, making the full suite painfully slow.
- Rate limits — heavy CI usage can hit API rate limits, causing random failures unrelated to code changes.
- Offline development — contributors should be able to work and run tests without an internet connection.
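
The conventions above (nested grouping, natural-language test names, mockk, and no real LLM calls) might combine as in the following sketch. TextGenerator and Summarizer are hypothetical types invented for this example; TextGenerator stands in for whatever LLM-facing dependency the class under test receives.

```kotlin
import io.mockk.every
import io.mockk.mockk
import io.mockk.verify
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Nested
import org.junit.jupiter.api.Test

// Hypothetical stand-in for the LLM-facing dependency.
interface TextGenerator {
    fun generate(prompt: String): String
}

// Hypothetical class under test.
class Summarizer(private val textGenerator: TextGenerator) {
    fun summarize(text: String): String =
        if (text.isBlank()) "" else textGenerator.generate(prompt = "Summarize: $text").trim()
}

class SummarizerTest {

    // The LLM is always a mock: no API key, no network, deterministic output.
    private val textGenerator = mockk<TextGenerator>()
    private val summarizer = Summarizer(textGenerator = textGenerator)

    @Nested
    inner class WhenTextIsBlank {
        @Test
        fun `should return an empty summary without calling the LLM`() {
            assertEquals("", summarizer.summarize(text = "   "))
            verify(exactly = 0) { textGenerator.generate(prompt = any()) }
        }
    }

    @Nested
    inner class WhenTextHasContent {
        @Test
        fun `should return the trimmed LLM response`() {
            every { textGenerator.generate(prompt = any()) } returns "  A short summary.  "

            assertEquals("A short summary.", summarizer.summarize(text = "A long document"))
            verify(exactly = 1) { textGenerator.generate(prompt = "Summarize: A long document") }
        }
    }
}
```

Because the mock returns a fixed string, the assertion on the trimmed result is stable across runs, which addresses the non-determinism and latency concerns listed above.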

Dependencies

Favor mainstream choices for all libraries.
Prefer Spring or Spring-recommended libraries over third-party alternatives.
Use the latest GA version of all dependencies unless there is a specific reason not to.
Dependabot is enabled — keep an eye on automated dependency PRs.
During active development, Spring AI snapshots may be used; shift to GA as soon as available.

LLM Integration

ModelProvider Abstraction

Never use a Spring AI ChatModel or EmbeddingModel directly. Always go through the ModelProvider interface:

val model = modelProvider.getLlm("best")

LLMs are mapped to roles (e.g., best, cheapest) via application properties.
Role mapping is simpler and more predictable than resolving by quality or cost.
Model configuration lives in @Configuration classes under the config directory.
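
To make the role idea concrete, here is a simplified sketch of role-based resolution. The ModelProvider interface and Llm type below are local stand-ins, not the project's real declarations, and the mapping of best to gpt-4o and cheapest to gpt-4o-mini is an assumption chosen for illustration.

```kotlin
// Local stand-ins for illustration only. The project's real ModelProvider
// and its return type are not shown here and will differ.
data class Llm(val name: String)

interface ModelProvider {
    fun getLlm(role: String): Llm
}

// Hypothetical implementation: roles are resolved from a plain map, which in
// a real application would be populated from application properties.
class RoleMappedModelProvider(
    private val modelNamesByRole: Map<String, String>,
) : ModelProvider {
    override fun getLlm(role: String): Llm {
        val modelName = modelNamesByRole[role]
            ?: throw IllegalArgumentException(
                "No LLM configured for role '$role'; known roles: ${modelNamesByRole.keys.sorted()}"
            )
        return Llm(name = modelName)
    }
}

fun main() {
    // Assumed mapping, for illustration only.
    val modelProvider: ModelProvider = RoleMappedModelProvider(
        modelNamesByRole = mapOf("best" to "gpt-4o", "cheapest" to "gpt-4o-mini"),
    )
    println(modelProvider.getLlm(role = "best").name)     // gpt-4o
    println(modelProvider.getLlm(role = "cheapest").name) // gpt-4o-mini
}
```

Callers name the role they need and never a concrete model, so swapping models becomes a configuration change rather than a code change.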

Prompts

All LLM prompts are Jinja2 templates under src/main/resources/prompts.
Always escape potentially problematic user input with the esc filter:

{{ text|esc }}

Standard template variables: text (input text), formatInstructions (from Spring AI StructuredOutputConverter).
Experiment with prompts in the OpenAI Playground before embedding them in code.
Polish existing prompts by copying them from logs/prompts.log into the Playground UI.

Logging

Do not put logging configuration in application.properties — use logback-spring.xml.
General output goes to console. Focused logs go to the logs/ directory.

File	Contents
`logs/cypher.log`	All Cypher queries executed
`logs/prompts.log`	Prompts sent to and responses from LLMs
`logs/security.log`	Security-related events

What to Log

Keep logs at a consistent level of detail for a given log level. Extra detail belongs in DEBUG.
Don't spam logs. Remove debug log messages unless they have ongoing value.

Where to Log It

Use well-known named loggers where appropriate:

PROMPT_LOGGER — for exchanges with LLMs
CYPHER_LOGGER — for Neo4j queries
Otherwise, use a logger appropriate for the class.

Obtain a logger using the logger() method unless efficiency is a concern (e.g. inside a nested loop). If you want to avoid the stack examination cost, declare logger as a field on the class.

Always get the logger by .java class reference to avoid issues with Spring CGLIB proxies:

// Correct
private val logger = LoggerFactory.getLogger(CypherRagQueryExecutor::class.java)

// Incorrect — may break under Spring proxying
private val logger = LoggerFactory.getLogger(javaClass)

Only use javaClass if you are certain Spring will not proxy the object and inheritance may be involved (e.g. a protected logger field).
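
The difference between the two forms can be seen without Spring. CGLIB proxies are generated subclasses, so the sketch below uses a hand-written subclass as a stand-in for a proxy; ReportService and ReportServiceProxyStandIn are hypothetical names.

```kotlin
// Hypothetical service. It is "open" because CGLIB-style proxies work by subclassing.
open class ReportService {
    // Resolved at runtime: yields the class of the actual instance.
    val nameViaJavaClass: String = javaClass.simpleName

    // Resolved at compile time: always the declaring class.
    val nameViaClassReference: String = ReportService::class.java.simpleName
}

// Hand-written subclass standing in for a generated proxy class.
class ReportServiceProxyStandIn : ReportService()

fun main() {
    val proxied: ReportService = ReportServiceProxyStandIn()
    println(proxied.nameViaJavaClass)      // ReportServiceProxyStandIn
    println(proxied.nameViaClassReference) // ReportService
}
```

A logger obtained via javaClass would therefore be named after the proxy class, which may not match the logger names configured for the real class.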

How to Log It

Always use {} placeholders, never string interpolation. With placeholders the message is formatted only when the log level is enabled, so suppressed log statements cost very little:

// Correct
logger.info("The value is {}", value)

// Incorrect
logger.info("The value is $value")

Making Logs Entertaining

Log messages should remain clear while making the world a more entertaining place — think funny airline safety videos.

Draw inspiration from: The Big Lebowski, Peep Show, Sherlock Holmes, Silicon Valley, The League of Gentlemen, and current affairs.

A few approved quotes ready for use:

"Yeah, well, you know, that's just like, uh, your opinion, man."
"What in god's holy name are you blathering about?"
"Sometimes you eat the bear, and sometimes, well, he eats you."
"That rug really tied the room together."
"This is a very complicated case, Maude. You know, a lotta ins, a lotta outs, a lotta what-have-yous."
"Is this your homework?"
"This is a local shop for local people."

Pull Request Conventions

PRs should do one thing. Keep scope focused.
Reference related issues from the issue tracker in the PR description.
All PRs must pass the GitHub Actions CI build (mvn test).
Review AI-generated code for correctness and IP concerns before merging.

API & Server Conventions

Endpoint prefix	Access
`api/v1/*`	Programmatic. Requires API key (`X-API-KEY` header).
`api/internal/*`	UI access. Secured via OAuth. Not for remote clients.
`/dev/*`	Dev profile only. No API key required. For diagnostics and client development.
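
For programmatic clients, attaching the API key might look like the sketch below, which uses the JDK's built-in HTTP client types and only builds the request without sending it. The base URL, the documents resource path, and the key value are placeholders, not real endpoints or credentials.

```kotlin
import java.net.URI
import java.net.http.HttpRequest

// Builds (but does not send) a request for the programmatic API.
// Assumption: resourcePath is whatever follows the api/v1/ prefix.
fun buildApiRequest(baseUrl: String, resourcePath: String, apiKey: String): HttpRequest =
    HttpRequest.newBuilder()
        .uri(URI.create("$baseUrl/api/v1/$resourcePath"))
        .header("X-API-KEY", apiKey)
        .GET()
        .build()

fun main() {
    val request = buildApiRequest(
        baseUrl = "http://localhost:8080",
        resourcePath = "documents", // hypothetical resource
        apiKey = "replace-with-a-real-key",
    )
    println(request.uri())                                         // http://localhost:8080/api/v1/documents
    println(request.headers().firstValue("X-API-KEY").isPresent)   // true
}
```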

Swagger / OpenAPI docs: http://localhost:8080/swagger-ui/index.html#/
WebSocket support uses the STOMP sub-protocol.
TypeScript interfaces are generated at target/typescript/embabel-rag.ts via mvn install.

Code Coverage & Quality

Code coverage is computed with Jacoco.
View the local report at target/site/jacoco/index.html after running tests.
SonarQube reports are available on the project dashboard.
Quality gate must pass on SonarCloud before merging.

(c) Embabel Software Inc 2024-2025.
