
[RFC/Experimental] Server-side Search & DataTables Integration #39

Closed
mattiabonzi wants to merge 6 commits into lance-format:main from TuchSoft:feat/datatable-integration

@mattiabonzi
Contributor

This PR is a significant architectural experiment that replaces the static HTML table with jQuery DataTables and implements server-side processing. Previously, the app had no filtering capabilities; this change introduces global search, column ordering, and complex querying.

Key Changes:

  • Server-Side Engine: Rewrote the backend to support DataTables pagination and search protocols.
  • DuckDB Integration: Added DuckDB to the backend to lazily scan Lance datasets, enabling SQL-powered filtering and sorting without loading entire tables into memory.
  • Advanced Querying: Integrated DataTables SearchBuilder, allowing for complex visual construction of AND/OR logic.
  • UI Features: Added a "Wrap Text" toggle to handle large cell content and updated the sidebar for better column management.
  • Other frontend enhancements

Breaking Changes:
The backend logic has been largely rewritten. Specifically, the standard /rows endpoint is superseded by a new /datatables POST endpoint to handle the structured query payloads from the frontend.
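For readers unfamiliar with the protocol, here is a minimal sketch of what a DataTables server-side request and response look like. It is not the handler from this PR: the function name `handle_datatables` is hypothetical, the in-memory list of rows stands in for the DuckDB scan, and it assumes the frontend posts the DataTables parameters as a JSON body. The field names (`draw`, `start`, `length`, `search`, `order`, `columns`, `recordsTotal`, `recordsFiltered`, `data`) are the ones DataTables itself defines.

```python
from typing import Any


def handle_datatables(payload: dict[str, Any], rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Answer one DataTables server-side request against in-memory rows.

    Hypothetical stand-in for a POST /datatables handler; a real backend
    would push the filter, sort, and page into the query engine instead.
    """
    columns = [c["data"] for c in payload.get("columns", [])]
    needle = payload.get("search", {}).get("value", "").lower()

    # Global search: keep rows where any column's text contains the term.
    if needle:
        filtered = [
            r for r in rows
            if any(needle in str(r.get(c, "")).lower() for c in columns)
        ]
    else:
        filtered = list(rows)

    # Apply orderings last-to-first so the first entry ends up as the
    # primary sort key (Python's sort is stable).
    for order in reversed(payload.get("order", [])):
        key = columns[order["column"]]
        filtered.sort(key=lambda r: r[key], reverse=order["dir"] == "desc")

    start = int(payload.get("start", 0))
    length = int(payload.get("length", 10))
    # DataTables sends length = -1 to request every row.
    page = filtered[start:] if length < 0 else filtered[start:start + length]

    return {
        "draw": int(payload.get("draw", 0)),  # echoed back as an integer
        "recordsTotal": len(rows),            # count before filtering
        "recordsFiltered": len(filtered),     # count after filtering
        "data": page,
    }
```

`recordsTotal` and `recordsFiltered` are reported separately so the table can show both the overall row count and the count that matched the search while rendering only one page of `data`.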

Current Limitations:
This is an experimental build and requires further stabilization:

  • Query Guards: There are currently no guards in the SearchBuilder for vector fields; attempting to apply standard text filters to a vector column will cause the query to fail.
  • Type Coverage: While basic types are handled, the serialization for more obscure Arrow types needs more robust testing.
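One possible shape for the missing guard, sketched under assumptions: the operator names in `TEXT_OPERATORS`, the helper names, and the idea of passing the schema as a mapping from column name to an Arrow type string (in the style pyarrow prints, such as `fixed_size_list<item: float>[128]`) are all hypothetical and not taken from this PR. The point is only that the filter is rejected with a clear error before any SQL is built.

```python
# Hypothetical set of text-style filter conditions to block on vectors.
TEXT_OPERATORS = {"contains", "starts", "ends", "="}


def is_vector_type(arrow_type: str) -> bool:
    """Treat Arrow list-like types as vector columns (an assumption)."""
    return arrow_type.startswith(("fixed_size_list", "list", "large_list"))


def validate_filter(schema: dict[str, str], column: str, operator: str) -> None:
    """Raise ValueError before a text filter reaches a vector column."""
    if column not in schema:
        raise ValueError(f"unknown column: {column}")
    if operator in TEXT_OPERATORS and is_vector_type(schema[column]):
        raise ValueError(
            f"text filter '{operator}' is not supported on vector column '{column}'"
        )
```

A caller would run `validate_filter` on each SearchBuilder criterion and turn a `ValueError` into a 4xx response instead of letting the query fail inside the engine.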

Feedback & Testing Request:
I am looking for feedback on whether this DataTables-driven direction aligns with the project goals. If you feel this aligns with the project, it would be great if you could provide or point me to a diverse test dataset containing a wide variety of supported field types (e.g., nested structs, different vector dimensions, and timestamps) to ensure the DuckDB-to-Arrow serialization is seamless.

Please let me know if you would like me to proceed with these changes. If this is not of interest to the project, I will simply close the PR. I made this because I currently need a way to filter my own dataset while developing.

@mattiabonzi
Contributor Author

@gordonmurray would love to hear your thoughts on this. As I mentioned, this is experimental and will need more work to be stable, but I believe it could lead to a better user experience overall.

@gordonmurray
Collaborator

Hey @mattiabonzi, thanks for putting this together. The search and filtering need is real, and I can see the use case for this.

That said, I think this moves in a different direction from where the project is headed. The core goal is to stay a lightweight, zero-setup viewer: mount a folder, open a browser, browse your data.

  • Vanilla JS is a deliberate choice. The project avoids frameworks, bundlers, and external libraries by design. Adding jQuery, DataTables, and Select2 is a significant shift in that philosophy.
  • GET-only API. All existing endpoints are stateless GETs with no writes. The new POST /datatables endpoint changes that contract, and opening CORS to POST moves away from the read-only model.
  • DuckDB as a runtime dependency adds considerable weight to the container image and install footprint for what's meant to be a minimal tool.

I think filtering and search would be a genuinely useful addition, but ideally as something lighter: client-side sorting and filtering on the current page without new backend dependencies. Issue #30 covers column sorting as a starting point, and that could evolve from there.

If you're interested in contributing toward that lighter approach, I'd welcome it. And if you want to discuss a server-side search direction, opening an issue first to talk through the design would be a good next step.

Thanks again for the effort here. I hope this makes sense.

@mattiabonzi
Contributor Author

@gordonmurray Totally makes sense. I figured I'd share it since I’d already written the code for my own project, but I completely understand the intention of the project. Thanks for the feedback, closing the PR now!

mattiabonzi closed this on Apr 6, 2026
gordonmurray pushed a commit to gordonmurray/lance-data-viewer that referenced this pull request Apr 7, 2026
Names the load-bearing design constraints that shape the project so that
proposals touching them can be discussed against a written baseline rather
than reconstructed in each thread. Covers:

- Vanilla JS with no build step
- GET-only, stateless API
- No metadata database
- No in-app authentication
- Read-only access

Also adds a short "proposing changes" section pointing to prior design
discussions (lance-format#5, lance-format#29, lance-format#39) and a minimal development workflow snippet.

Fixes lance-format#42
gordonmurray added a commit that referenced this pull request Apr 7, 2026
…ts (#43)

Names the load-bearing design constraints that shape the project so that
proposals touching them can be discussed against a written baseline rather
than reconstructed in each thread. Covers:

- Vanilla JS with no build step
- GET-only, stateless API
- No metadata database
- No in-app authentication
- Read-only access

Also adds a short "proposing changes" section pointing to prior design
discussions (#5, #29, #39) and a minimal development workflow snippet.

Fixes #42
2 participants