use Py-StackExchange rather than requests and bs4 #13

WnP · 2015-02-10T22:38:03Z

Answers are now truncated to 140 characters in the listing, followed by ... if they are longer.

Let me know if you think it's a good idea or not.
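For reference, a minimal sketch of the truncation rule described above. The function name `truncate` and its parameters are hypothetical, chosen for illustration; this is not the code from the PR.

```python
def truncate(text, limit=140, suffix="..."):
    """Return text unchanged if it fits within limit, else cut it and append suffix."""
    if len(text) <= limit:
        return text
    return text[:limit] + suffix


# A short answer passes through untouched; a long one is cut at 140 characters.
print(truncate("Use url_for() to build the link."))
clipped = truncate("x" * 200)
print(len(clipped))  # 140 characters plus the three-character suffix
```

Cutting at a fixed character count can split a word in half; whether that matters for a listing view is a design choice left open here.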

lukasschwab · 2015-02-11T22:28:41Z

This was a big thing for us; hopefully it will improve speed as well.

I'll take a look at this later; there was some discussion in an issue on the Py-StackExchange repo about getting just answer bodies using the API. There's a good chance you've implemented that, and it sounds like a great way to eliminate unnecessary data transfer (the rest of the page's HTML, as with requests/bs4).

Just dropping this here for my own reference when reviewing. Cheers!

WnP · 2015-02-11T22:39:39Z

I hadn't seen that issue before, but I read the StackExchange API docs and Py-StackExchange's source code before implementing this feature, so yes, it's implemented indeed ;-)

And yes, I think it is more efficient to deal only with the JSON API rather than full HTML requests.
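To illustrate why the JSON route moves less data, here is a rough sketch of pulling answer bodies out of an API-style response. The payload below is hand-written for illustration and only mimics the general shape of a Stack Exchange answers response (an `items` list whose entries carry `answer_id` and, when the body is requested, `body`); the helper name `answer_bodies` is hypothetical.

```python
import json

# Hypothetical, hand-written payload; a real response carries more fields.
SAMPLE_RESPONSE = '''
{
  "items": [
    {"answer_id": 1, "is_accepted": true, "body": "<p>Use <code>url_for</code>.</p>"}
  ],
  "has_more": false
}
'''


def answer_bodies(raw_json):
    """Map each answer_id in the payload to its HTML body."""
    payload = json.loads(raw_json)
    return {item["answer_id"]: item["body"] for item in payload["items"]}


bodies = answer_bodies(SAMPLE_RESPONSE)
print(bodies[1])
```

With this approach there is no page chrome to download or parse; only the answer's own HTML fragment still needs converting to text.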

lukasschwab · 2015-02-13T21:26:32Z

@WnP This looks suuuuper clean. Starting testing, hopefully will merge by EOD.

lukasschwab · 2015-02-13T21:31:31Z

@WnP I dig it, merging.

I will make some small modifications to the way the output is printed myself, just because I think it is easier to implement those changes than communicate them. Very minor, just adding some newlines here and there. Will do new release with those changes.

Also, I notice an anecdotal speed difference... Do you?

Thanks!

WnP · 2015-02-14T10:57:45Z

@lukasschwab yes, the speed difference is anecdotal from the client side. Let's compare the two approaches with this simple script:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from timeit import timeit
import stackexchange
from stackexchange import Sort
import bs4
import requests
import html2text

h = html2text.HTML2Text()
term = 'python flask'
API_KEY = "3GBT2vbKxgh*ati7EBzxGA(("
so = stackexchange.Site(stackexchange.StackOverflow, app_key=API_KEY, impose_throttling=True)
questions = so.search_advanced(
    q=term,
    sort=Sort.Votes)
question = None

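# Pick the first question in the results that has an accepted answer.
for q in questions:
    if 'accepted_answer_id' in q.json:
        question = q
        break
else:
    # for/else: this branch runs only if the loop ended without a break.
    raise Exception('No question found')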


def old_way_query(question):
    questionurl = question.json['link']
    answerid = question.json['accepted_answer_id']
    response = requests.get(questionurl)
    soup = bs4.BeautifulSoup(response.text)
    # Focuses on the single div with the matching answerid--necessary b/c bs4 is quirky
    for answerdiv in soup.find_all('div', attrs={'id': 'answer-' + str(answerid)}):
        answertext = h.handle(answerdiv.find('div', attrs={'class': 'post-text'}).prettify())


def new_way_query(question):
    answerid = question.json['accepted_answer_id']
    questiontext = h.handle(so.question(question.id, body=True).body)
    answer = h.handle(so.answer(answerid, body=True).body)

print('old way: %s' % timeit("old_way_query(question)", "from __main__ import question, old_way_query", number=20))
print('new way: %s' % timeit("new_way_query(question)", "from __main__ import question, new_way_query", number=20))

on my laptop using Python 2.7.9 it outputs:

old way: 12.9633069038
new way: 0.572069883347

So in this case (20 executions) it's about 22.7 times faster; the more executions you run, the bigger the gap.

for one execution the difference is really anecdotal

old way: 0.849025964737
new way: 0.543494939804

About 1.56 times faster ^^
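The ratios can be checked directly from the timings printed above. This is just arithmetic on the reported numbers; the `runs` table and the `speedup` helper are names made up for this sketch.

```python
# Timings in seconds, copied from the output reported above.
runs = {
    20: {"old": 12.9633069038, "new": 0.572069883347},
    1: {"old": 0.849025964737, "new": 0.543494939804},
}


def speedup(old, new):
    """How many times faster the new way is than the old way."""
    return old / new


for n in sorted(runs):
    t = runs[n]
    print("%2d execution(s): %.2fx faster" % (n, speedup(t["old"], t["new"])))
```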

however, these tests are highly dependent on the network connection
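One possible way to dampen network noise, sketched under assumptions: repeat the whole measurement several times and keep the fastest round. `best_time` and `placeholder_query` are hypothetical names; the placeholder stands in for `old_way_query(question)` or `new_way_query(question)` so the snippet runs without network access.

```python
from timeit import repeat


def best_time(func, number=20, rounds=5):
    """Run func `number` times per round and return the fastest round, in seconds."""
    return min(repeat(func, number=number, repeat=rounds))


def placeholder_query():
    # Stand-in workload; swap in one of the real query functions to benchmark.
    sum(range(1000))


print('best of 5 rounds: %s' % best_time(placeholder_query))
```

Taking the minimum is the usual convention with timeit; for network-bound code a median over the rounds may be more representative of what a user actually sees.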

WnP added 2 commits February 10, 2015 23:34

use Py-StackExchange rather than requests and bs4

9201637

remove useless dependecies in setup.py

d4adc08

lukasschwab added a commit that referenced this pull request Feb 13, 2015

Merge pull request #13 from WnP/query_refactoring

8097c61

lukasschwab merged commit 8097c61 into lukasschwab:master Feb 13, 2015

lukasschwab added a commit that referenced this pull request Feb 13, 2015

Adds credits to WnP for PR #13

fd59b21

WnP deleted the query_refactoring branch February 14, 2015 10:04
