-
Notifications
You must be signed in to change notification settings - Fork 5
feat: add searches and aggregation to express backend #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dacharyc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of route issues we should fix to match the client, and a tsconfig Q related to a client file included here. Everything else is just a Q to consider.
| "resolveJsonModule": true, | ||
| "isolatedModules": true, | ||
| "jsx": "react-jsx", | ||
| "jsx": "preserve", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run the client, it's automatically switching this value back to "react-jsx" with this message:
We detected TypeScript in your project and reconfigured your tsconfig.json file for you.
The following mandatory changes were made to your tsconfig.json:
- jsx was set to react-jsx (next.js uses the React automatic runtime)Curious why you're changing it here, and if we need it to be "preserve", what we might need to change in the client to force it to respect this config field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting, every time I run it it's been automatically changing it to preserve. I'm not sure what's causing the change but I'll look into it. For now I'll revert to react-jsx since that's what I originally had in there before it started changing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly it's almost the same message too:
We detected TypeScript in your project and reconfigured your tsconfig.json file for you.
The following mandatory changes were made to your tsconfig.json:
- jsx was set to preserve (next.js implements its own optimized jsx transform)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahhh I see, it's a versioning issue. From Claude:
The Problem
The issue occurs because Next.js automatically modifies your tsconfig.json file when it starts up. Different versions of Next.js have different preferences for the jsx setting:Your current setup: jsx: "preserve" (which Next.js 15+ prefers)
Others' setup: jsx: "react-jsx" (which older versions or different configurations prefer)
Why This Happens
Version Differences: Different Next.js versions have different default TSConfig preferences
Auto-Configuration: Next.js automatically "fixes" what it thinks are incorrect settings
Lock File Issues: Different package-lock.json or yarn.lock states can cause different dependency resolutions
Since this is using the latest version of Next, I think we should actually stick with preserve since later versions prefer it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't track with the usage, though. I ran npm install on the client in the last few days so it's pulling in the most recent version of Next.js. (Actually I checked and I've got 16.0.0 and it looks like Next released 16.0.1 recently.)
It may be a versioning difference issue, but that doesn't really hold up if we're both on the latest version of Next. So I think it's more likely it's some sort of configuration difference, or maybe a Node or other version issue, and we should probably try to get to the bottom of this as a fast follow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Okay yeah I'll add it to the list of fast follows and try and get to the bottom of what's going on here
| } | ||
|
|
||
| // Validate and set limit (default: 10, min: 1, max: 50) | ||
| const limitNum = Math.min(Math.max(parseInt(limit as string) || 10, 1), 50); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious about this. When performing vector search, the client provides a default limit of 20 - should the backend default match the client default, or are we intentionally defaulting to a different value of 10 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah it should match, good catch!
| index: "vector_index", | ||
| path: "plot_embedding_voyage_3_large", | ||
| queryVector: queryVector, | ||
| numCandidates: limitNum * 15, // Search more candidates for better results |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there something we should say about why we're multiplying the limit by 15 here? 15 feels like an arbitrary/magic number and I wonder if there's some way we arrived at this number or some guidance we should provide about balancing performance/quality here when setting this multiple?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out the number is actually supposed to be 20! From the docs here: https://www.mongodb.com/docs/atlas/atlas-vector-search/vector-search-stage/#numcandidates-selection
I clarified the comment to explain better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome - thanks for the doc link! Good to know what the guidance is. I haven't checked the other two backends but we should probably check what the others are using as part of the cleanup pass tomorrow, and update it if we're using different values there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, I'll check!
| * Find similar movies using vector search based on a movie ID. | ||
| * Demonstrates finding semantically similar movies using plot embeddings. | ||
| */ | ||
| export async function findSimilarMovies(req: Request, res: Response): Promise<void> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I missed it but I can't find anywhere in the frontend that we're using this API. Are we supporting future development here? I wonder if we can/should omit this from the v1 of the app if we haven't implemented the corresponding frontend feature yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yep, this was a separate, earlier implementation of vector search that wasn't working out. I definitely thought I got rid of it but I guess not! Removed now.
server/express/src/routes/movies.ts
Outdated
| * Aggregate movies with their most recent comments. | ||
| * Demonstrates MongoDB $lookup aggregation to join collections. | ||
| */ | ||
| router.get("/aggregations/comments", asyncHandler(movieController.getMoviesWithMostRecentComments)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, these aggregation routes don't match the frontend. We need some adjustments to match the client:
| router.get("/aggregations/comments", asyncHandler(movieController.getMoviesWithMostRecentComments)); | |
| router.get("/aggregations/reportingByComments", asyncHandler(movieController.getMoviesWithMostRecentComments)); |
server/express/src/routes/movies.ts
Outdated
| * Aggregate directors with the most movies. | ||
| * Demonstrates MongoDB $unwind and $group for array aggregation. | ||
| */ | ||
| router.get("/aggregations/directors", asyncHandler(movieController.getDirectorsWithMostMovies)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| router.get("/aggregations/directors", asyncHandler(movieController.getDirectorsWithMostMovies)); | |
| router.get("/aggregations/reportingByDirectors", asyncHandler(movieController.getDirectorsWithMostMovies)); |
server/express/src/routes/movies.ts
Outdated
| router.get("/aggregations/years", asyncHandler(movieController.getMoviesByYearWithStats)); | ||
|
|
||
| /** | ||
| * GET /api/movies/aggregations/directors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| * GET /api/movies/aggregations/directors | |
| * GET /api/movies/aggregations/reportingByDirectors |
| title: result.title, | ||
| year: result.year, | ||
| genres: result.genres, | ||
| imdb: result.imdbRating ? { rating: result.imdbRating } : undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to match what the client is expecting:
| imdb: result.imdbRating ? { rating: result.imdbRating } : undefined, | |
| imdbRating: result.imdbRating, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
server/express/src/types/index.ts
Outdated
| imdb?: { | ||
| rating?: number; | ||
| votes?: number; | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to the mismatch with what the client is expecting:
| imdb?: { | |
| rating?: number; | |
| votes?: number; | |
| }; | |
| imdbRating?: number; |
server/express/src/routes/movies.ts
Outdated
| * Find similar movies using vector search based on a movie ID. | ||
| * Demonstrates finding semantically similar movies using plot embeddings. | ||
| */ | ||
| router.get("/find-similar-movies", asyncHandler(movieController.findSimilarMovies)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted in the controller, but it looks like the frontend isn't using this route currently. Should we omit this until the frontend has implemented it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
dacharyc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome - thanks for the fixups, everything seems to be working well! ✅
Adds MongoDB Search, Vector Search, and Aggregation endpoints to the express API. Also adds unit testing for these endpoints and cleans up the unit test files.
This PR also fixes up an issue with the vector search results pagination not working correctly