use Py-StackExchange rather than requests and bs4 #13

Merged
merged 2 commits into from
Feb 13, 2015

Contributor

@WnP WnP commented Feb 10, 2015

answers are now truncated to 140 characters in the listing, followed by ... if they are longer

let me know if you think it's a good idea or not
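
For reference, the truncation described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `truncate` is a hypothetical helper name, and it assumes the first 140 characters are kept and `...` is appended after them.

```python
def truncate(text, limit=140, suffix='...'):
    """Return text unchanged if it fits, else its first `limit` chars plus suffix."""
    # Assumption: the limit applies to the answer text only; the suffix is extra.
    if len(text) <= limit:
        return text
    return text[:limit] + suffix


short_preview = truncate('a short answer')
long_preview = truncate('x' * 200)
print(short_preview)
print(long_preview)
```

A short answer passes through untouched; a long one comes back as 140 characters plus the three-character suffix.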

@lukasschwab
Owner

This was a big thing for us; hopefully it will improve speed as well.

I'll take a look at this later; there was some discussion in an issue on the Py-StackExchange repo about getting just answer bodies through the API. There's a good chance you've implemented that, and it sounds like a great way to eliminate unnecessary data transfer (the rest of the page's HTML that comes along with requests/bs4).

Just dropping this here for my own reference when reviewing. Cheers!

@WnP
Contributor Author

WnP commented Feb 11, 2015

I hadn't seen that issue before, but I read the StackExchange API docs and Py-StackExchange's source code before implementing this feature, so yes, it's implemented ;-)

and yes, I think dealing only with the JSON API is more efficient than making full HTML requests
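
To make the "JSON only" point a bit more concrete, here is a rough sketch of pulling an answer body out of an API-style payload with nothing but the standard library. The sample payload is hand-written for illustration (it is not a real response), and `extract_body` is a hypothetical helper; the PR itself goes through Py-StackExchange rather than parsing JSON by hand.

```python
import json

# Hand-written sample shaped like an answers response; NOT real API data.
SAMPLE_PAYLOAD = '''
{"items": [{"answer_id": 1, "is_accepted": true,
            "body": "<p>Use <code>flask run</code>.</p>"}],
 "has_more": false}
'''


def extract_body(payload, answer_id):
    """Return the HTML body of the answer with the given id, or None."""
    data = json.loads(payload)
    for item in data.get('items', []):
        if item.get('answer_id') == answer_id:
            return item.get('body')
    return None


body = extract_body(SAMPLE_PAYLOAD, 1)
print(body)
```

The only HTML involved is the answer body itself, so there is no page markup to walk through with bs4.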

@lukasschwab
Owner

@WnP This looks suuuuper clean. Starting testing, hopefully will merge by EOD.

@lukasschwab
Owner

@WnP I dig it, merging.

I will make some small modifications to the way the output is printed myself, just because I think it is easier to implement those changes than communicate them. Very minor, just adding some newlines here and there. Will do new release with those changes.

Also, I notice an anecdotal speed difference... Do you?

Thanks!

lukasschwab added a commit that referenced this pull request Feb 13, 2015
use Py-StackExchange rather than requests and bs4
@lukasschwab lukasschwab merged commit 8097c61 into lukasschwab:master Feb 13, 2015
lukasschwab added a commit that referenced this pull request Feb 13, 2015
@WnP WnP deleted the query_refactoring branch February 14, 2015 10:04
@WnP
Contributor Author

WnP commented Feb 14, 2015

@lukasschwab yes, from the client side the speed difference is anecdotal. Let's compare the two approaches with this simple script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from timeit import timeit
import stackexchange
from stackexchange import Sort
import bs4
import requests
import html2text

h = html2text.HTML2Text()
term = 'python flask'
API_KEY = "3GBT2vbKxgh*ati7EBzxGA(("
so = stackexchange.Site(stackexchange.StackOverflow, app_key=API_KEY, impose_throttling=True)
questions = so.search_advanced(
    q=term,
    sort=Sort.Votes)
question = None

for q in questions:
    if 'accepted_answer_id' in q.json:
        question = q
        break
else:
    raise Exception('No question found')


def old_way_query(question):
    questionurl = question.json['link']
    answerid = question.json['accepted_answer_id']
    response = requests.get(questionurl)
    soup = bs4.BeautifulSoup(response.text)
    # Focuses on the single div with the matching answerid--necessary b/c bs4 is quirky
    for answerdiv in soup.find_all('div', attrs={'id': 'answer-' + str(answerid)}):
        answertext = h.handle(answerdiv.find('div', attrs={'class': 'post-text'}).prettify())


def new_way_query(question):
    answerid = question.json['accepted_answer_id']
    questiontext = h.handle(so.question(question.id, body=True).body)
    answer = h.handle(so.answer(answerid, body=True).body)

print('old way: %s' % timeit("old_way_query(question)", "from __main__ import question, old_way_query", number=20))
print('new way: %s' % timeit("new_way_query(question)", "from __main__ import question, new_way_query", number=20))

on my laptop, using Python 2.7.9, it outputs:

old way: 12.9633069038
new way: 0.572069883347

so in this case (20 executions) it's roughly 22 times faster; the more executions you run, the bigger the gap

for a single execution the difference really is anecdotal:

old way: 0.849025964737
new way: 0.543494939804

about 1.56 times faster ^^
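
As a quick sanity check on the ratios, dividing the old timing by the new one for each run gives the speedup factor. The numbers below are copied from the outputs quoted above; `speedup` is just a hypothetical helper name for the division.

```python
def speedup(old_seconds, new_seconds):
    """How many times faster the new way is than the old way."""
    return old_seconds / new_seconds


# Timings copied from the two runs reported in this comment.
twenty_runs = speedup(12.9633069038, 0.572069883347)
single_run = speedup(0.849025964737, 0.543494939804)

print('20 executions: %.2fx' % twenty_runs)  # prints 22.66x
print('1 execution: %.2fx' % single_run)     # prints 1.56x
```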

however, these tests are highly dependent on the network connection
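
One way to soften the network noise is to repeat the whole measurement several times and look at the best and median runs instead of a single total. A minimal sketch, assuming a stand-in function: `fake_query` is hypothetical and does no network I/O, so in practice it would be swapped for `old_way_query` or `new_way_query` from the script above.

```python
import statistics
import timeit


def fake_query():
    # Hypothetical stand-in for old_way_query / new_way_query; no network calls.
    return sum(range(1000))


def summarize(func, repeat=5, number=20):
    """Time `number` calls of func, `repeat` times over, and summarize the runs."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return {'best': min(runs), 'median': statistics.median(runs), 'runs': runs}


summary = summarize(fake_query)
print('best: %s, median: %s' % (summary['best'], summary['median']))
```

The best run is the one least disturbed by other traffic, while the median gives a feel for a typical run; a large gap between the two suggests the connection was a big factor.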
