Extend kennankole's solution by Judahmeek · Pull Request #385 · serpapi/code-challenge

Judahmeek · 2026-06-01T10:01:05Z

Obviously, instead of trying to come up with my own solution & tests, I went & checked out the competition. @kennankole's solution (#379) was by far the most robust (although it also proved overly complicated). The only real flaws it seemed to have were that it was more computationally expensive & that it had no way to detect drift, whereas a more brittle solution, such as one that depends on CSS, will be faster and will break as soon as Google changes whatever CSS it depends on.

I initially tried to solve this by merging #381 & #379 together (you can see my initial refactor of @DanTaiko's work here), with the idea that parts of @kennankole's logic would serve as a backup for scenarios that @DanTaiko's solution couldn't cover, but the longer I worked on it, the more it felt unnecessarily complex, so I scrapped that idea & tried using #379 as a base.

That said, the ideal is definitely robust logic that tries to address most possible variants & forms of drift, sitting behind a search index that provides performance for known solutions, and I hope that my code illustrates an approximate example of that.
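To make that layering concrete, here is a rough sketch of a fast lookup sitting in front of a slower, robust fallback. It is not the code from this PR: `MemoizedScraper`, the `fingerprint` key, and both strategy callables are hypothetical names chosen for illustration, and the "drift" signal is simply assumed to be a known fast path coming back empty.

```ruby
# Hypothetical sketch: a fast index of known strategies in front of robust logic.
class MemoizedScraper
  attr_reader :drift_events

  # known_strategies: { fingerprint => callable } for layouts seen before.
  # robust_strategy:  slower callable that copes with unknown layouts.
  def initialize(known_strategies:, robust_strategy:)
    @known_strategies = known_strategies
    @robust_strategy = robust_strategy
    @drift_events = []
  end

  def scrape(fingerprint, page)
    fast = @known_strategies[fingerprint]
    result = fast&.call(page)
    return result unless result.nil? || result.empty?

    # A known fast path that yields nothing is treated as a sign of drift.
    @drift_events << fingerprint if fast
    @robust_strategy.call(page)
  end
end
```

The fast path keeps known layouts cheap, while the recorded drift events give a place to notice when a previously working shortcut has stopped matching.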

...

P.S. The search results for "Tom Cruise films" are probably way outside the scope of what y'all expected us to cover, but if I were going to address them more effectively, I would have copied @dsojevic's solution of replacing the HTML mappings, since the initial search results for that particular kind of query even have the anchor links lazy-loaded.

Of course, redirecting users who searched for "Tom Cruise films" to search results for "Tom Cruise filmography" would probably be the best course of action.

Parse the bundled SERP HTML into a SerpApi-shaped array of `{name, extensions, link, image}` without making HTTP requests.

- Detect carousel tiles by structural signal (`/search?...&stick=...` sibling groups), not volatile Google CSS classes, so the parser works across Van Gogh and variant fixtures.
- Resolve thumbnails by parsing `_setImagesSrc(ii, s, r)` blocks into an `id -> image` map, including unescaping `\x3d` and `\/` values emitted in inline JS.
- Extract `extensions` from leaf text nodes under each anchor to avoid container-text noise (for example, concatenated `name+year`).
- Resolve `image` from values already present in the page file: inline JS mapping, inline non-placeholder data URIs, and in-file `data-src`/`src` URLs.
- Add comprehensive RSpec coverage for golden output, cross-layout fixtures, item parsing, thumbnail indexing, and carousel selection behavior.
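The thumbnail-indexing step can be sketched roughly as follows. This is a minimal illustration rather than the PR's actual code: the exact shape of the inline block (`var s='...';var ii=[...];_setImagesSrc(ii,s,r);`) is an assumption about Google's inline JS, and `SET_IMAGES_BLOCK`, `unescape_js`, and `build_thumbnail_index` are hypothetical names.

```ruby
# Assumed block shape: var s='<payload>';var ii=['id', ...];_setImagesSrc(ii,s[,r]);
SET_IMAGES_BLOCK = /
  var\s+s\s*=\s*'(?<src>[^']*)';\s*      # image payload
  var\s+ii\s*=\s*\[(?<ids>[^\]]*)\];\s*  # element ids
  (?:var\s+r\s*=\s*[^;]*;\s*)?           # optional third argument
  _setImagesSrc\(ii,\s*s(?:,\s*r)?\);
/x

# Undo the escapes emitted in inline JS: \x3d becomes "=", \/ becomes "/".
def unescape_js(value)
  value
    .gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr(Encoding::UTF_8) }
    .gsub('\\/', '/')
end

# Build an id -> image map from every matching block in the page source.
def build_thumbnail_index(html)
  index = {}
  html.scan(SET_IMAGES_BLOCK) do
    match = Regexp.last_match
    image = unescape_js(match[:src])
    match[:ids].scan(/'([^']+)'/).flatten.each { |id| index[id] = image }
  end
  index
end
```

Calling `build_thumbnail_index(File.read(path))` on a saved page would then give a hash that can be consulted by each tile's image element id, provided the assumed block shape holds for that page.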

The changes to the group score method are what I'm most proud of. The original method returned an array, which, when run through the max function (in the tiles method ~ line 36), acts like a series of tiebreakers. This gives an overwhelming amount of weight to whatever quality proxy is measured first. The other aspect of my changes that I would like to draw your attention to is the use of environment variables. It's a basic feature, but one I don't recall seeing in my competitors' PRs.
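To illustrate the tiebreaker effect, here is a small sketch comparing an array-valued score with a single weighted number. The signal names (`tile_count`, `image_ratio`, `link_ratio`), the weights, and the `SCORE_WEIGHT_*` environment variables are all hypothetical and are not taken from this PR; they only show why Ruby's element-by-element array comparison lets the first signal dominate.

```ruby
# Hypothetical default weights; each can be overridden via an environment variable.
DEFAULT_WEIGHTS = { tile_count: 1.0, image_ratio: 5.0, link_ratio: 5.0 }.freeze

def weights(env = ENV)
  DEFAULT_WEIGHTS.to_h do |signal, default|
    [signal, Float(env.fetch("SCORE_WEIGHT_#{signal.to_s.upcase}", default))]
  end
end

# Array scores compare element by element, so later signals only break ties.
def tiebreaker_score(group)
  [group[:tile_count], group[:image_ratio], group[:link_ratio]]
end

# A single number lets every signal contribute in proportion to its weight.
def weighted_score(group, signal_weights = weights)
  signal_weights.sum { |signal, weight| weight * group.fetch(signal) }
end

groups = [
  { name: 'nav_links', tile_count: 9, image_ratio: 0.0, link_ratio: 1.0 },
  { name: 'carousel',  tile_count: 8, image_ratio: 1.0, link_ratio: 1.0 }
]

# With the array score, the group with one extra tile wins despite having no images.
puts groups.max_by { |g| tiebreaker_score(g) }[:name]
# With the default weights above, the image-rich group wins instead.
puts groups.max_by { |g| weighted_score(g, DEFAULT_WEIGHTS) }[:name]
```

Reading the weights through `ENV.fetch` with defaults means the balance between signals could be tuned per deployment without a code change, which is one plausible use of environment variables here.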

One flaw I noticed in nearly all competitors was relying on Google's image lazy-load script not to change in any way. A more robust solution than mine would account for the `_setImagesSrc` function name possibly changing as well, and would probably rely only on the `data:image` structure as the initial clue. It would make scanning the first script more computationally expensive, but the detected variables could then be used to speed up processing of subsequent scripts. Hopefully, Google never decides to combine all their lazy-loading scripts together. I'm not sure how that could be detected performantly, but I'm sure I could find a way, given enough time.
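A rough sketch of that idea, under stated assumptions: the payload is assumed to be assigned to a variable as a single-quoted `data:image/...` string, and the loader is assumed to be called with that variable as its second argument. `DATA_URI_ASSIGNMENT`, `detect_loader_name`, and `scripts_with_thumbnails` are hypothetical names, and none of this is part of the PR.

```ruby
# Start from the data URI itself rather than from a hard-coded function name.
DATA_URI_ASSIGNMENT = /var\s+(?<var>[A-Za-z_$][\w$]*)\s*=\s*'(?<src>data:image\/[^']*)'/

# Infer the loader's name from whichever call receives the payload variable
# as its second argument (an assumption about the call shape).
def detect_loader_name(script)
  assignment = DATA_URI_ASSIGNMENT.match(script)
  return nil unless assignment

  payload_var = Regexp.escape(assignment[:var])
  call = /(?<fn>[A-Za-z_$][\w$]*)\(\s*[\w$]+\s*,\s*#{payload_var}(?![\w$])/.match(script)
  call && call[:fn]
end

# Pay for detection until a name is found, then use a cheap substring check.
def scripts_with_thumbnails(scripts)
  name = nil
  scripts.select do |script|
    if name
      script.include?("#{name}(")
    else
      name = detect_loader_name(script)
      !name.nil?
    end
  end
end
```

The expensive regex pass runs only until the first loader call is found; after that, each remaining script costs a single `include?` check. A page that merged all lazy-loading into one script would still need a different approach.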

Kennedy Omondi and others added 4 commits May 20, 2026 23:33

switch nokolexbor for nokogiri

6896c2a

fix for nokolexbor/nokogiri discrepancy

0810878

replace anchors with name elements as targets of initial tile search

a40de5a

This is because I discovered that interactive search results, such as the results for "Tom Cruise Movies", do not contain anchors in the initial HTML

CI fix

fix for missed @anchor reference

Judahmeek force-pushed the extend-kkole-solution branch from 7b9fda8 to dc8b05e Compare June 1, 2026 20:58

Judahmeek mentioned this pull request Jun 1, 2026

Nokolexbor does not support node identity equality comparisons like Nokogiri does serpapi/nokolexbor#27

Open

Judahmeek added 2 commits June 2, 2026 00:14

Add fixtures

3246d07

Adding Tom cruise filmography results to contrast with the Tom Cruise movies results. Adding the U.S. Presidents results because its parent data-attrid doesn't start with 'kc:' like most grid results

Judahmeek force-pushed the extend-kkole-solution branch from dc8b05e to 81a2bb2 Compare June 2, 2026 05:26

Judahmeek changed the title ~~Extend kennankole's solution [WIP]~~ Extend kennankole's solution Jun 2, 2026

Judahmeek marked this pull request as ready for review June 2, 2026 07:00

Judahmeek added 2 commits June 2, 2026 19:17

performance improvements & scrapeMemo index explanation

fcd228c

final tests & changes

e38aca0

Judahmeek force-pushed the extend-kkole-solution branch from 42197b4 to e38aca0 Compare June 3, 2026 05:08

Judahmeek wants to merge 9 commits into serpapi:master from Judahmeek:extend-kkole-solution
