Force-pushed from 186eff5 to 5a161d3
```go
// DagLoader provides read and traversal operations on the DAG.
// Get, GetByParent, and Ancestry come from this embedded interface.
merkle.DagLoader
```
This is a big change worth calling out: `storage.Driver` now just has the `DagLoader` interface methods directly; no more subverting one interface into another, since everything should just be one driver now.
```sql
)
LANGUAGE SQL
AS $$
WITH RECURSIVE walk AS (
```
cc @yeazelm - this is one of the bigger bits to look at - this is the query you had written for CTEs walking the chains and is now re-written for RECURSIVE on duckdb's analytical engine:
https://duckdb.org/docs/current/sql/query_syntax/with#recursive-ctes
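For readers unfamiliar with the pattern, a depth-bounded recursive CTE has the general shape below. This is an illustrative sketch only, not the actual migration: the column names (`hash`, `parent_hash`) follow the baseline `nodes` DDL described in this PR, but the real `ancestry_chains` query is more involved.

```sql
WITH RECURSIVE walk AS (
    -- anchor: start from the requested node
    SELECT hash, parent_hash, 0 AS depth
    FROM nodes
    WHERE hash = $1
    UNION ALL
    -- recursive step: follow parent pointers, bounded by depth
    SELECT n.hash, n.parent_hash, w.depth + 1
    FROM nodes n
    JOIN walk w ON n.hash = w.parent_hash
    WHERE w.depth < $2
)
SELECT hash, depth FROM walk ORDER BY depth;
```

The `WHERE w.depth < $2` bound is what keeps the UNION ALL from walking an unbounded chain.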
@greptile-ai please review 🙏🏼
| Filename | Overview |
|---|---|
| pkg/storage/postgres/postgres.go | Core Postgres storage driver using sqlc + go-migrate; contains P1 silent-zero risk in interfaceInt32/interfaceInt64 helpers used for ancestry token stats, and a P2 edge case in toMigrateDSN losing sslmode for key-value DSNs. |
| migrations/1776787153_ancestry_func.up.sql | New recursive CTE SQL function for ancestry traversal with a depth-bounded UNION ALL; logic is correct and properly depth-limited. |
| pkg/storage/postgres/gensqlc/ancestry_chains.sql.go | sqlc-generated code for ancestry query; deliberately uses interface{} typed fields to handle pg_duckdb mixed-type returns — the P1 type-assertion risk in the calling code stems from this design. |
| cmd/tapes/local/local.go | New tapes local command that bootstraps Postgres and Ollama via Docker; hardcoded UID 26 in prepareLocalPostgresDir couples it to the custom Postgres image when a different image is passed. |
| cmd/tapes/serve/serve.go | Refactored serve command now Postgres-only; unconditionally initialises pgvector (unlike tapes serve proxy which is conditional), requiring the vector extension even without embedding configuration. |
| pkg/vector/pgvector/pgvector.go | New pgvector driver replacing chroma/qdrant/sqlitevec; correctly uses pgx batch for upserts, HNSW cosine index, and parameterised ANY queries. |
| migrations/1776773689_baseline.up.sql | Baseline nodes table DDL with appropriate indexes; self-referencing FK parent_hash ON DELETE SET NULL preserves orphaned children correctly. |
| cli/sqlite-import/main.go | Standalone SQLite-to-Postgres import utility; well-structured with batching, dry-run mode, and proper ON CONFLICT DO NOTHING semantics. |
Flowchart
```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[tapes serve / proxy / ingest] --> B[postgres.NewDriver]
    B --> C[go-migrate: up/down SQL]
    C --> D[(Postgres + pg_duckdb + pgvector)]
    A --> E[pgvector.NewDriver]
    E --> D
    A --> F[embeddingutils.NewEmbedder]
    F --> G[Ollama / embedding provider]
    H[tapes local up] --> I[Docker: tapes-local-postgres]
    H --> J[Docker: tapes-local-ollama]
    I --> D
    J --> G
    K[tapes deck] --> L[HTTP API calls]
    L --> A
    M[cli/sqlite-import] -->|one-time migration| D
    N[Proxy intercept] --> A
    N --> O[Upstream LLM]
    subgraph Ancestry Query
        P[AncestryChains ctx] -->|BEGIN tx| Q[DuckdbForceExecution]
        Q --> R[ancestry_chains_rows CTE]
        R -->|COMMIT| S[Chain map by start_hash]
    end
    B --> P
```
```go
func interfaceInt32(v any) int32 {
	i, _ := v.(int32)
	return i
}

func interfaceInt64(v any) int64 {
	i, _ := v.(int64)
	return i
}
```
**Silent zero-return on integer type mismatch in ancestry rows** (pkg/storage/postgres/postgres.go:814-822)

`interfaceInt32` and `interfaceInt64` silently return `0` when the type assertion fails. DuckDB internally uses 64-bit integers for all integer types, so when `pg_duckdb` executes the ancestry CTE and scans into `interface{}`, INTEGER columns may come back as `int64` instead of `int32`. Every call to `interfaceInt32` would then return `0`, causing all token-usage stats (prompt tokens, completion tokens, total tokens, cache tokens, durations) to be silently zeroed out for every node in an ancestry query result.

A defensive implementation should try both widths:

```go
func interfaceInt32(v any) int32 {
	switch i := v.(type) {
	case int32:
		return i
	case int64:
		return int32(i) //nolint:gosec
	}
	return 0
}

func interfaceInt64(v any) int64 {
	switch i := v.(type) {
	case int64:
		return i
	case int32:
		return int64(i)
	}
	return 0
}
```
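To see why the single-width assertion fails, note that a Go type assertion on an interface value matches the dynamic type exactly: an `int64` never satisfies `.(int32)`, and the comma-ok form silently yields the zero value. A minimal self-contained illustration of both behaviors:

```go
package main

import "fmt"

// interfaceInt32 mirrors the defensive version discussed above: it
// accepts either integer width rather than returning 0 on a mismatch.
func interfaceInt32(v any) int32 {
	switch i := v.(type) {
	case int32:
		return i
	case int64:
		return int32(i)
	}
	return 0
}

func main() {
	var v any = int64(42) // what pg_duckdb may hand back for an INTEGER column

	// The single-width assertion: ok is false, i is the zero value.
	i, ok := v.(int32)
	fmt.Println(i, ok) // 0 false

	// The defensive switch recovers the value.
	fmt.Println(interfaceInt32(v)) // 42
}
```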
```go
proxyConfig.VectorDriver, err = pgvector.NewDriver(context.TODO(), &pgvector.Config{
	ConnString: c.vectorStoreTarget,
	Dimensions: c.embeddingDimensions,
}, c.logger)
if err != nil {
	return fmt.Errorf("could not create new vector driver: %w", err)
}
defer proxyConfig.VectorDriver.Close()
```
**`tapes serve` always initialises pgvector unconditionally** (cmd/tapes/serve/serve.go:183-191)

`tapes serve` auto-sets `vectorStoreTarget` to `postgresDSN` in `PreRunE` and then unconditionally calls `pgvector.NewDriver`, which runs `CREATE EXTENSION IF NOT EXISTS vector` on every startup. In contrast, `tapes serve proxy` wraps the same initialization in `if c.vectorStoreTarget != ""`. A plain PostgreSQL instance without the `vector` extension would fail `tapes serve` immediately, even if vector search is not needed. Consider guarding this the same way as the proxy sub-command, or documenting that the custom Postgres image (with `vector` pre-installed) is required.
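A guarded version of the initialization would mirror what `tapes serve proxy` already does. The sketch below is self-contained, so `serveConfig` and `newVectorDriver` are hypothetical stand-ins for the real serve config and `pgvector.NewDriver`; only the guard pattern is the point.

```go
package main

import "fmt"

// serveConfig stands in for the real serve command config.
type serveConfig struct {
	vectorStoreTarget string
}

// newVectorDriver stands in for pgvector.NewDriver; the real driver
// would run CREATE EXTENSION IF NOT EXISTS vector against the target.
func newVectorDriver(target string) (string, error) {
	return "pgvector driver for " + target, nil
}

// initVectorDriver only touches pgvector when a vector store target is
// configured, so a plain Postgres without the extension still serves.
func initVectorDriver(c *serveConfig) (string, error) {
	if c.vectorStoreTarget == "" {
		return "", nil // vector search disabled; skip CREATE EXTENSION
	}
	return newVectorDriver(c.vectorStoreTarget)
}

func main() {
	d, err := initVectorDriver(&serveConfig{})
	fmt.Printf("%q %v\n", d, err) // "" <nil>
}
```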
```go
cmd := fmt.Sprintf("mkdir -p %s && chown -R 26:26 %s && chmod 0700 %s", postgresDataPath, postgresDataPath, postgresDataPath)
if err := runDocker(
	"run", "--rm",
	"--user", "0",
	"--entrypoint", "sh",
	"-v", postgresDir+":"+path.Dir(postgresDataPath),
	image,
	"-lc",
	cmd,
); err != nil {
	return fmt.Errorf("preparing postgres directory %q: %w", postgresDir, err)
}
return nil
}
```
**Hardcoded UID 26 couples `tapes local` to the custom Postgres image** (cmd/tapes/local/local.go:263-276)

`chown -R 26:26` is the UID of the `postgres` user inside `public.ecr.aws/…/papercomputeco/postgres`. If a user passes a different `--postgres-image` (e.g. the official `postgres:16` image where the postgres user is UID 999), the `chown` will set wrong ownership and the container will fail to start. Consider deriving the UID from the target image or documenting that `--postgres-image` only supports the official tapes image.
This is fine - the whole point is that the user would be using our image (same on the cloud platform).
```go
func toMigrateDSN(dsn string) string {
	for _, p := range []string{"pgx5://", "postgres://", "postgresql://"} {
		if strings.HasPrefix(dsn, p) {
			if p == "pgx5://" {
				return dsn
			}
			return "pgx5://" + strings.TrimPrefix(dsn, p)
		}
	}

	cfg, err := pgx.ParseConfig(dsn)
	if err != nil {
		return dsn
	}

	port := cfg.Port
	if port == 0 {
		port = 5432
	}

	v := url.URL{
		Scheme: "pgx5",
		Host:   fmt.Sprintf("%s:%d", cfg.Host, port),
		Path:   "/" + cfg.Database,
	}
	if cfg.User != "" {
		if cfg.Password != "" {
			v.User = url.UserPassword(cfg.User, cfg.Password)
		} else {
			v.User = url.User(cfg.User)
		}
	}
	if len(cfg.RuntimeParams) > 0 {
		q := url.Values{}
		keys := make([]string, 0, len(cfg.RuntimeParams))
		for k := range cfg.RuntimeParams {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			q.Set(k, cfg.RuntimeParams[k])
		}
		v.RawQuery = q.Encode()
	}

	return v.String()
}
```
**`toMigrateDSN` drops TLS settings for key-value style DSNs** (pkg/storage/postgres/postgres.go:479-525)

When the caller provides a libpq key-value DSN (e.g. `host=localhost dbname=tapes sslmode=disable`), the function falls through to `pgx.ParseConfig` and rebuilds the URL from `cfg.Host`, `cfg.User`, `cfg.Database`, and `cfg.RuntimeParams`. However, `sslmode=disable` is not stored in `RuntimeParams` — pgx converts it into a `TLSConfig` field. The rebuilt `pgx5://` URL therefore omits `sslmode`, causing `go-migrate` to attempt a TLS handshake against a server that has TLS disabled, resulting in a migration failure. The common `postgres://…?sslmode=disable` format is handled correctly by the prefix-replacement branch; only key-value DSNs are affected.
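One concise way to avoid losing `sslmode` is to parse the key-value pairs directly and carry every non-address key into the rebuilt URL's query string, instead of relying on `pgx.ParseConfig`'s `RuntimeParams`. The sketch below (`kvDSNToURL` is a hypothetical helper, not the PR's code) handles only space-separated unquoted `key=value` pairs:

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// kvDSNToURL rebuilds a libpq key-value DSN as a pgx5:// URL, carrying
// every non-address key (sslmode included) into the query string so
// go-migrate sees the same TLS settings as the application.
func kvDSNToURL(dsn string) string {
	params := map[string]string{}
	for _, f := range strings.Fields(dsn) {
		if k, v, ok := strings.Cut(f, "="); ok {
			params[k] = v
		}
	}
	port := params["port"]
	if port == "" {
		port = "5432"
	}
	u := url.URL{
		Scheme: "pgx5",
		Host:   params["host"] + ":" + port,
		Path:   "/" + params["dbname"],
	}
	if user := params["user"]; user != "" {
		if pw := params["password"]; pw != "" {
			u.User = url.UserPassword(user, pw)
		} else {
			u.User = url.User(user)
		}
	}
	// everything that is not part of the address becomes a query param
	q := url.Values{}
	var keys []string
	for k := range params {
		switch k {
		case "host", "port", "dbname", "user", "password":
		default:
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	for _, k := range keys {
		q.Set(k, params[k])
	}
	u.RawQuery = q.Encode()
	return u.String()
}

func main() {
	fmt.Println(kvDSNToURL("host=localhost dbname=tapes sslmode=disable"))
	// pgx5://localhost:5432/tapes?sslmode=disable
}
```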
Force pushed to address Greptile's review - the migration DSN now gets all the key/values so migrations don't silently fail. And defensively try both int32 and int64 so we don't silently lose values in the pg type conversions.
```go
if err := ensurePostgresContainer(c); err != nil {
	return err
}
if err := ensureOllamaContainer(c); err != nil {
```
We probably also need to ensure the model that is needed for embeddings is pulled when doing this:

```
2026/04/24 14:49:44 WARN failed to generate embedding hash=97a3af412a4637cf070420435945ae1048bfb6066d42732274f12734e952fd61 error="embedding failed: ollama returned status 404: {\"error\":\"model \\"nomic-embed-text\\" not found, try pulling it first\"}"
```
Those models are quite large: we mount the user's `~/.ollama/` directory so a normal `ollama pull ...` works and persists those models, and we can just use the ones already there.
Yeah, I think this was the missing piece: the opinion on what embedding model to use shifted and I didn't have the local one there, so it's more of a nice-to-have to pull the model. I think this was more of a rough edge from the shift in preferred embedding model than from this particular change.
Signed-off-by: John McBride <john@papercompute.com>
```go
func runDocker(args ...string) error {
	cmd := exec.CommandContext(context.Background(), "docker", args...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return fmt.Errorf("docker %s: %w\n%s", strings.Join(args, " "), err, strings.TrimSpace(string(out)))
	}
	return nil
}

func dockerOutput(args ...string) (string, error) {
	cmd := exec.CommandContext(context.Background(), "docker", args...)
	out, err := cmd.CombinedOutput()
	if err != nil {
		return "", fmt.Errorf("docker %s: %w\n%s", strings.Join(args, " "), err, strings.TrimSpace(string(out)))
	}
	return string(out), nil
}
```
These don't have cancellation, so a hung `docker run` will block `tapes local up` forever.
We can smooth out the `tapes local` experience in the future: this PR's already ballooned quite a bit. I'm not really worried about `tapes local` being the primary way people use or interface with tapes and, honestly, `tapes local` will probably be mostly a development utility for the present.
```go
// Create shared driver (satisfies storage.Driver, which embeds merkle.DagLoader)
driver, err := postgres.NewDriver(context.TODO(), c.postgresDSN)
```
We could probably replace the TODO with a signal.NotifyContext context so that Ctrl-C can cleanly close the database things.
I think that'd be a good future enhancement: I'd want to pipe a cancellation context down through the commander, since we also have a context on the telemetry client.
yeazelm left a comment:

`tapes deck` might be in a bad spot with the refactor but the rest looks good!
Signed-off-by: John McBride <john@papercompute.com>
Pushed

Nice! I like the new pager! Feels like a reasonable trade-off to let the TUI be responsive while letting the user get more data if they want.
Documentation Update

PR #189 includes significant user-facing changes that required documentation updates. A docs PR has been created: https://github.com/papercomputeco/tapes.dev/pull/103

Changes Documented

Breaking Changes:

New Features:

Updated Pages:

This PR is a large refactor that focuses tapes's storage and vector model on Postgres with `pg_duckdb` and `pgvector`. It also focuses direct database clients to exclusively use the API (i.e. `tapes deck`).

Closes PCC-366
Closes PCC-367
Removes sqlite storage driver in favor of only Postgres
This removes sqlite in favor of centralizing on Postgres with the `pg_duckdb` extension for large recursive columnar queries. This allows us to dramatically simplify the `storage.Driver` interface implementations and removes the need for a separate `merkle.DagLoader` interface: now, it's all just in the `storage.Driver` implementation for Postgres. We keep an `inmemory` storage driver implementation since that's very useful for tests, but remove the ability to configure it during `serve`.

This also refactors away all the vector implementations in favor of `pgvector`.

This unlocks tapes to be more scalable, optimized for huge-scale data, and deployable for our future cloud services.
Introduces `tapes local` command for local development and use

The new `tapes local` allows a user to run our postgres image and ollama on Docker - this is useful for super users and local development. `tapes local up` brings up the containers in Docker, which can then be targeted with `tapes serve --postgres "postgres://tapes:tapes@localhost:5432/tapes?sslmode=disable"`.

Removes ent in favor of sqlc compiled sql queries
Since ent is mostly focused on generating entity-related queries, we miss out on the recursive nature of duckdb queries and the need to inject `SET LOCAL duckdb.force_execution = true;` into the raw query context. So, instead, we use raw `.sql` queries with sqlc that compile into the `pkg/storage/postgres/gensqlc/` package.
pkg/storage/migrate/in favor ofup/downmigrations withgo-migrateWe had a hand rolled auto-migration package that could be brittle - this was removed in favor of using
go-migrateand up down migrations inmigrations/.... This also lets us get rid oftapes migratesince migrations are automatically managed by the tapes API when it connects to the Postgres db.Removes bespoke
tapesapiandtapesprox: centralizes on onetapesbinaryRemoved the
Dockerfiles/tapesapi.Dockerfileand seperate binary builds forcli/tapesapi/.... Everything goes through the monolithictapesbinary / build.tapes decknow uses the API without constant pollingThe deck tui was doing near constant polling of essentially a databases entire dataset which would take forever to load. Now, it is better about not grabbing all the data from the db at once.
Removes the tapes deck web view
This was some early experimentation on providing a web server locally for visualizing the tapes data. Since we're centralizing tapes on being more of a server technology on top of the database, we're removing this in favor of future enhancements on a platform.
`./cli/sqlite-import` utility for importing data into Postgres

After running `tapes local up`, this utility can be run to migrate a sqlite db into the local postgres.