Skip to content
This repository has been archived by the owner on Mar 5, 2022. It is now read-only.

site search broken? #79

Closed
jarun opened this issue May 12, 2016 · 11 comments · Fixed by #80
Closed

site search broken? #79

jarun opened this issue May 12, 2016 · 11 comments · Fixed by #80

Comments

@jarun
Copy link
Owner

jarun commented May 12, 2016

$ googler -d -c in -n 4 -w imdb.com mr robot
[DEBUG] Version 2.4
[DEBUG] Base URL [/search?ie=UTF-8&oe=UTF-8&start=0&num=4&]
[DEBUG] Search URL [www.google.co.in : /search?ie=UTF-8&oe=UTF-8&start=0&num=4&q=mr+robot+site:imdb.com]
[DEBUG] Response body written to '/tmp/googler-response-b40uq9sd'.

 1 Mr. Robot (TV Series 2015– ) - IMDb
http://www.imdb.com/title/tt4158110/
... who suffers from social anxiety disorder and forms connections through hacking. He's recruited by a mysterious anarchist, who calls himself Mr. Robot.

 2 Elliot played by Rami Malek | Characters & Crew | Mr
http://www.usanetwork.com/mrrobot/cast/elliot

 3 USA Network's Mr. Robot is Filming Again in C
http://amusingthezillion.com/2015/04/19/usa-networks-mr-robot-is-filming-again-in-coney-island/

 4 Mr. Robot (TV Series 2015– ) - IMDb
http://m.imdb.com/title/tt4158110/

 5 Mr. Robot (TV series) - Wikipedia, the free encyclopedia
https://en.m.wikipedia.org/wiki/Mr._Robot_(TV_series)

 6 Mr. Robot (TV Series 2015– ) - Episodes - IMDb
http://www.imdb.com/title/tt4158110/episodes?season=2
GET DISCOVERED. Enhance your IMDb Page. Go to IMDbPro ». |; Help · Login · Register · Login · Mr. Robot (TV Series 2015– ) Poster · Mr. Robot (2015– ) ...

 7 Mr. Robot (TV Series 2015– ) - Episodes - IMDb
http://www.imdb.com/title/tt4158110/episodes
Elliot attempts to hack Vera out of jail in order to save someone he cares about; Tyrell's "game" gets crazy; and Angela digs deeper into her mother's death.

 8 Mr. Robot (TV Series 2015– ) - Full Cast & Crew - IMDb
http://www.imdb.com/title/tt4158110/fullcredits/
Mr. Robot (TV Series 2015– ) cast and crew credits, including actors, actresses, directors, writers and more.

Enter n, p, o, result index or new keywords (? for help) o
[DEBUG] Next URL [/search?ie=UTF-8&oe=UTF-8&start=0&num=4&q=mr+robot+site:imdb.com]

Enter n, p, o, result index or new keywords (? for help)

Looks like card results (People who ask) are also showing up.

screenshot

Are we showing these deliberately? I would expect the card results to be shown as indented or not to show at all. Even if we don't show in case of site-specific search, I guess it's OK.

@jarun
Copy link
Owner Author

jarun commented May 12, 2016

On second thoughts, I am almost convinced we shouldn't show these People who ask results from other sites in our results.

@zmwangx
Copy link
Collaborator

zmwangx commented May 12, 2016

This is not a regression. Here's what you get from 2.3:

> /usr/local/bin/googler -c in -n 4 -w imdb.com mr robot
 1  Mr. Robot (TV Series 2015– ) - IMDb
http://www.imdb.com/title/tt4158110/
... who suffers from social anxiety disorder and forms connections through hacking. He's recruited by a mysterious
anarchist, who calls himself Mr. Robot.

 2  Elliot played by Rami Malek | Characters & Crew | Mr
http://www.usanetwork.com/mrrobot/cast/elliot
GET DISCOVERED. Enhance your IMDb Page. Go to IMDbPro ». |; Help · Login · Register · Login · Mr. Robot (TV Series
2015– ) Poster · Mr. Robot (2015– ) ...

 3  Mr. Robot (TV Series 2015– ) - Full Cast & Crew - IMDb
http://www.imdb.com/title/tt4158110/fullcredits/
Mr. Robot (TV Series 2015– ) cast and crew credits, including actors, actresses, directors, writers and more.

 4  "Mr. Robot" Episode #2.1 (TV Episode 2016) - IMDb
http://www.imdb.com/title/tt4901088/
Directed by Sam Esmail. With Rami Malek, Christian Slater, Carly Chaikin, Grace Gummer.

Enter n, p, result number or new keywords (? for help)

Note that it is showing the "People also ask" results as 2, 3, 4, then randomly got cut off, seemingly due to half-brokenness of the legacy parser logic.

@zmwangx
Copy link
Collaborator

zmwangx commented May 12, 2016

In this case these results are harder to cut since they satisfy all the constraints we are currently imposing on valid results. What I'll do is to implement a generic ignore feature to blacklist certain tag/class combinations, e.g. div.related-question-pair in this case. Then it'll be easy to ignore other things we don't want to show on a case-by-case basis with a simple addition to a ignore list.

CSS selectors would be ideal for this purpose, but we don't have access to external libraries, so there's only so much I can do without bloat.

@jarun
Copy link
Owner Author

jarun commented May 12, 2016

implement a generic ignore feature to blacklist certain tag/class combinations

Works for me. No external libs please. The current imports take 0.16 seconds... approximately 16% of a resultset fetch time on my home network. Plus there are other regular complexities of additional dependencies.

@zmwangx
Copy link
Collaborator

zmwangx commented May 12, 2016

Yeah, no plan to use external libs.

The current imports take 0.16 seconds...

It takes 96ms here (averaged over 100 runs). Anyway, very typical for Python.

@jarun
Copy link
Owner Author

jarun commented May 12, 2016

very typical for Python

True. I've removed 2 imports as conditional. That's the best we can do for now.

@zmwangx
Copy link
Collaborator

zmwangx commented May 12, 2016

I've removed 2 imports as conditional.

It does help a tiny bit, but it's considered bad practice per PEP 8. I personally avoid that style as much as possible.

@jarun
Copy link
Owner Author

jarun commented May 12, 2016

it's considered bad practice per PEP 8

We'll live with it for debug and json, which don't come in our regular workflow.

@zmwangx
Copy link
Collaborator

zmwangx commented May 12, 2016

I'm okay with it. Just pointing out that I avoid it personally.

@jarun
Copy link
Owner Author

jarun commented May 12, 2016

okies. i'm gonna sleep now. have a great day! 👯

@zmwangx
Copy link
Collaborator

zmwangx commented May 12, 2016

Good night. You'll have a PR by the time you wake up.

zmwangx added a commit to zmwangx/googler that referenced this issue May 12, 2016
Fixes jarun#79.

Currently ignored: div.related-question-pair.
@jarun jarun closed this as completed in #80 May 13, 2016
@lock lock bot locked and limited conversation to collaborators Nov 15, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants