Merged
4 changes: 2 additions & 2 deletions README.md
@@ -4,7 +4,7 @@
[![Dependency Status](https://david-dm.org/fossasia/query-server.svg)](https://david-dm.org/ossasia/query-server)
[![Join the chat at https://gitter.im/fossasia/query-server](https://badges.gitter.im/fossasia/query-server.svg)](https://gitter.im/fossasia/query-server?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

-> The query server can be used to search a keyword/phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo, Yandex, Baidu and Exalead) and get the results as `json` or `xml`. The tool also stores the searched query string in a MongoDB database for analytical purposes. (The search engine scrapper is based on the scraper at [fossasia/searss](https://github.com/fossasia/searss).)
+> The query server can be used to search a keyword/phrase on a search engine (Google, Yahoo, Bing, Ask, DuckDuckGo, Yandex, Baidu, Exalead, Quora and Youtube) and get the results as `json` or `xml`. The tool also stores the searched query string in a MongoDB database for analytical purposes. (The search engine scrapper is based on the scraper at [fossasia/searss](https://github.com/fossasia/searss).)

[![Deploy to Docker Cloud](https://files.cloud.docker.com/images/deploy-to-dockercloud.svg)](https://cloud.docker.com/stack/deploy/?repo=https://github.com/fossasia/query-server) [![Deploy](https://www.herokucdn.com/deploy/button.svg)](https://heroku.com/deploy?template=https://github.com/fossasia/query-server) [![Deploy on Scalingo](https://cdn.scalingo.com/deploy/button.svg)](https://my.scalingo.com/deploy?source=https://github.com/fossasia/query-server#master) [![Deploy to Bluemix](https://bluemix.net/deploy/button.png)](https://bluemix.net/deploy?repository=https://github.com/fossasia/query-server&branch=master)

@@ -23,7 +23,7 @@ The API(s) provided by query-server are as follows:

` GET /api/v1/search/<search-engine>?query=query&format=format `

-> *search-engine* : [`google`, `ask`, `bing`, `duckduckgo`, `yahoo`, `yandex`, `baidu`, `exalead`]
+> *search-engine* : [`google`, `ask`, `bing`, `duckduckgo`, `yahoo`, `yandex`, `baidu`, `exalead`, `quora`, `youtube`]

> *query* : query can be any string
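
As a rough illustration of the endpoint shape described above, the following sketch only builds a request URL and does not contact any server. The base URL `http://localhost:7001` and the helper name `build_search_url` are assumptions made for this example, not something this README states.

```python
from urllib.parse import urlencode

# Assumption: a locally running instance; substitute the real host and port.
BASE_URL = 'http://localhost:7001'


def build_search_url(engine, query, fmt='json', base_url=BASE_URL):
    """Build a URL of the form /api/v1/search/<search-engine>?query=...&format=..."""
    # urlencode takes care of escaping spaces and special characters in the query.
    params = urlencode({'query': query, 'format': fmt})
    return '{}/api/v1/search/{}?{}'.format(base_url, engine, params)


if __name__ == '__main__':
    print(build_search_url('google', 'open source'))
```

The resulting string can be passed to any HTTP client; `fmt` would be `json` or `xml` as described in the README.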

6 changes: 4 additions & 2 deletions app/scrapers/__init__.py
@@ -10,6 +10,7 @@
 from baidu import Baidu
 from exalead import Exalead
 from quora import Quora
+from youtube import Youtube

 scrapers = {
     'g': Google(),
@@ -20,7 +21,8 @@
     'yd': Yandex(),
     'u': Baidu(),
     'e': Exalead(),
-    'q': Quora()
+    'q': Quora(),
+    't': Youtube()
 }


@@ -34,7 +36,7 @@ def small_test():


 def feedgen(query, engine, count=10):
-    if engine == 'q':
+    if engine in ['q', 't']:
Contributor

Putting this logic in two different files can be the source of future bugs. Please encapsulate this logic in one file or the other but not in both.

Member Author

@cclauss Please review again

         urls = scrapers[engine].search_without_count(query)
     else:
         urls = scrapers[engine].search(query, count)
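
One way to read the review request is that the "does this engine accept a count?" decision should live in exactly one place. The sketch below is a minimal illustration of that idea with stand-in classes; `FakeScraper`, `NO_COUNT_ENGINES`, and `fetch_urls` are hypothetical names, and the real scrapers are not reproduced here.

```python
class FakeScraper(object):
    """Hypothetical stand-in for a scraper; it only reports which method ran."""

    def search(self, query, count):
        return ['search', query, count]

    def search_without_count(self, query):
        return ['search_without_count', query]


# Keys mirror a few entries of the scrapers dict shown in the diff.
scrapers = {'g': FakeScraper(), 'q': FakeScraper(), 't': FakeScraper()}

# Single source of truth for engines that ignore the count argument.
NO_COUNT_ENGINES = frozenset(['q', 't'])


def fetch_urls(query, engine, count=10):
    """Dispatch to the right scraper method so callers never special-case engines."""
    scraper = scrapers[engine]
    if engine in NO_COUNT_ENGINES:
        return scraper.search_without_count(query)
    return scraper.search(query, count)


if __name__ == '__main__':
    print(fetch_urls('python', 't'))
    print(fetch_urls('python', 'g', 5))
```

With the decision kept in one function, a caller such as the request handler can always pass `count` and stay unaware of which engines ignore it.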
28 changes: 28 additions & 0 deletions app/scrapers/youtube.py
@@ -0,0 +1,28 @@
+from __future__ import print_function
+from generalized import Scraper
+
+
+class Youtube(Scraper):
+    """Scraper class for Youtube"""
+
+    def __init__(self):
+        self.url = 'https://www.youtube.com/results'
+        self.queryKey = 'search_query'
+
+    def parseResponse(self, soup):
+        """ Parse the response and return list of urls
+        Returns: urls (list)
+                [[Tile1,url1], [Title2, url2],..]
Contributor

The returned values are not urls. They are dicts.

I know these changes were not tested under Python 3.

+        """
+        urls = []
+        for a in soup.findAll('a'):
+            if a.get('href').startswith('/watch?'):
+                link = 'https://www.youtube.com' + str(a.get('href'))
+                if not a.getText().startswith('\n\n'):
+                    urls.append({'title': a.getText(), 'link': link})
+                else:
+                    continue
+
+        print('Youtube parsed: ' + str(urls))
+
+        return urls
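
The filtering done in `parseResponse` can be sketched in isolation. The version below swaps BeautifulSoup for the standard-library `html.parser` so that it runs without extra dependencies, and it treats an anchor with no `href` as a non-match rather than calling `.startswith` on `None`. The names `WatchLinkParser` and `parse_watch_links` are hypothetical, and the sample markup is invented for the example; it is not YouTube's actual page structure.

```python
from html.parser import HTMLParser


class WatchLinkParser(HTMLParser):
    """Collect {'title', 'link'} dicts for anchors whose href starts with /watch?."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._href = None   # href of the matching anchor being read, if any
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # A missing or valueless href becomes '' instead of None.
            href = dict(attrs).get('href') or ''
            if href.startswith('/watch?'):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text)
            # Same filter as the diff: skip anchors whose text starts with two newlines.
            if not title.startswith('\n\n'):
                self.results.append({'title': title,
                                     'link': 'https://www.youtube.com' + self._href})
            self._href = None


def parse_watch_links(html):
    """Return a list of dicts, matching the shape the review comment points out."""
    parser = WatchLinkParser()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == '__main__':
    sample = '<a>no href</a><a href="/watch?v=abc">Some video</a>'
    print(parse_watch_links(sample))
```

Each result is a dict with `title` and `link` keys, not a `[title, url]` pair, which is the mismatch with the docstring that the reviewer raises.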
7 changes: 2 additions & 5 deletions app/server.py
@@ -40,7 +40,7 @@ def search(search_engine):

     engine = search_engine
     if engine not in ('google', 'bing', 'duckduckgo', 'yahoo', 'ask',
-                      'yandex', 'ubaidu', 'exalead', 'quora'):
+                      'yandex', 'ubaidu', 'exalead', 'quora', 'tyoutube'):
         err = [404, 'Incorrect search engine', qformat]
         return bad_request(err)

@@ -49,10 +49,7 @@ def search(search_engine):
         err = [400, 'Not Found - missing query', qformat]
         return bad_request(err)

-    if engine[0] == 'q':
-        result = feedgen(query, engine[0])
-    else:
-        result = feedgen(query, engine[0], count)
+    result = feedgen(query, engine[0], count)
     if not result:
         err = [404, 'No response', qformat]
         return bad_request(err)
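
The handler passes `engine[0]` to `feedgen`, so the first character of the engine name apparently doubles as the `scrapers` key, which would explain the spellings `ubaidu` and `tyoutube`. A hypothetical alternative, not part of this PR, is an explicit lookup table. The sketch below assumes the keys visible in the `scrapers` dict of this diff and omits engines whose keys are not shown there; `ENGINE_KEYS` and `scraper_key` are names invented for the example.

```python
# Assumption: keys copied from the scrapers dict visible in the diff. Engines
# whose keys do not appear in the diff are deliberately left out.
ENGINE_KEYS = {
    'google': 'g',
    'yandex': 'yd',
    'baidu': 'u',
    'exalead': 'e',
    'quora': 'q',
    'youtube': 't',
}


def scraper_key(engine):
    """Return the scrapers key for a public engine name, or None if it is unknown."""
    return ENGINE_KEYS.get(engine)


if __name__ == '__main__':
    print(scraper_key('youtube'))
    print(scraper_key('unknown-engine'))
```

A table like this would let the public names match the README (`baidu`, `youtube`) while keeping multi-character keys such as `yd` reachable.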
Binary file added app/static/images/youtube_icon.png
1 change: 1 addition & 0 deletions app/templates/index.html
@@ -64,6 +64,7 @@ <h1><code>query-server</code></h1>
<button type="submit" value="ubaidu" class="btn btn-lg search btn-outline"><img src="{{ url_for('static', filename='images/baidu_icon.ico') }}" width="30px" alt="Baidu Icon"> Baidu</button>
<button type="submit" value="exalead" class="btn btn-lg search btn-outline"><img src="{{ url_for('static', filename='images/exalead_icon.png') }}" width="30px" alt="Exalead Icon"> Exalead</button>
<button type="submit" value="quora" class="btn btn-lg search btn-outline"><img src="{{ url_for('static', filename='images/quora_icon.png') }}" width="30px" alt="Quora Icon"> Quora</button>
+<button type="submit" value="tyoutube" class="btn btn-lg search btn-outline"><img src="{{ url_for('static', filename='images/youtube_icon.png') }}" width="30px" alt="YouTube Icon"> YouTube</button>
</div>
</div>
<div class="col-sm-2">