
Any way to send HTTP POST requests? #106

Closed
scottwoodall opened this issue Jan 10, 2018 · 7 comments

Comments

@scottwoodall

In working with toapi I came across a scenario where the web page had an HTML table that was paginated.

Clicking on "next page" would issue an ajax post request to fetch the next set of records in the data set.

Is there any way to accomplish this with toapi?

@alvinwoon

I am also looking for the ability to follow links or parse the next page. Sometimes the first URL is not what you are looking for (for example, if you want to parse the first result of a search page rather than the search page itself).

@elliotgao2
Owner

@scottwoodall

There is a solution here:

api = Api(url)
app = api.server.app

@app.route('/post_page/')
def post_method():
    res = requests.post(url, data)  # You need to analyze the ajax POST request of the source site.
    return item.parse(res.text)

@elliotgao2
Owner

@alvinwoon

@gaojiuli Thanks!

@scottwoodall
Author

@gaojiuli Where do data and item come from? I tried:

@app.route('/posts')
def post_method(*args, **kwargs):
    print(args)
    print(kwargs)

but they are empty.
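(For reference: Flask passes a view function only the variables declared in its route rule, so a plain rule like /posts yields empty *args and **kwargs; form fields from a POST body are read from flask.request instead. A minimal sketch, where the page variable and keyword field are hypothetical, not part of toapi:)

```python
from flask import Flask, request

app = Flask(__name__)

# Flask passes only the variables declared in the route rule as arguments;
# a rule like '/posts' declares none, so *args and **kwargs stay empty.
@app.route('/posts/<int:page>', methods=['POST'])
def posts(page):
    # Form fields from a POST body come from flask.request, not the view arguments.
    keyword = request.form.get('keyword', '')
    return f'page={page} keyword={keyword}'

with app.test_client() as client:
    body = client.post('/posts/2', data={'keyword': 'toapi'}).get_data(as_text=True)
    print(body)  # page=2 keyword=toapi
```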

@Ehco1996

Ehco1996 commented Jan 21, 2018

Since toapi's built-in fetch_page_source() method does not handle POST requests,

we need to add a Flask route ourselves to implement this.

Here is a fairly detailed example.

Suppose I need to fetch the data at this url with a POST request and parse it the toapi way.

  • Writing the items
from toapi import Item, XPath


class Search(Item):
    '''
    Parse the book title, id, url, and summary
    from the search results page.
    '''
    title = XPath('//h3/a/text()')
    book_id = XPath('//h3/a/@href')
    url = XPath('//h3/a/@href')
    content = XPath('//p[2]/text()')

    def clean_title(self, title):
        return ''.join(title)

    def clean_book_id(self, book_id):
        return book_id.split('-')[1]

    def clean_url(self, url):
        return url[:url.find('?')]

    class Meta:
        source = XPath('//li[@class="pbw"]')
        # Leave route empty here to avoid registering a duplicate route.
        route = {}
  • Registering the route
from toapi import Api
from items.search import Search
from settings import MySettings
import json
import requests


api = Api('', settings=MySettings)
api.register(Search)

@api.server.app.route('/search/<keyword>')
def search_page(keyword):
    '''
    Search function for the
    91baby new-book forum.
    '''
    data = {
        'searchsel': 'forum',
        'mod': 'forum',
        'srchtype': 'title',
        'srchtxt': keyword,
    }
    r = requests.post(
        'http://91baby.mama.cn/search.php?searchsubmit=yes', data)
    r.encoding = 'utf8'
    html = r.text
    results = {}
    items = [Search]
    # Parse the page with toapi's own parser.
    for item in items:
        parsed_item = api.parse_item(html, item)
        results[item.__name__] = parsed_item
    # Return JSON.
    return api.server.app.response_class(
        response=json.dumps(results, ensure_ascii=False),
        status=200,
        mimetype='application/json'
    )

if __name__ == '__main__':
    api.serve()

Now we can visit http://127.0.0.1:5000/search/keyword to get the POST data parsed.

Because this approach is not supported by toapi itself,
the built-in cache cannot be used.

@howie6879
Contributor

Hi @Ehco1996
You can also manage the cache yourself.
There is a document here:
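(One way to get caching back for such a hand-written route is a small memoization layer around the upstream POST. A minimal sketch; the TTL, cache structure, and function names below are my own, not part of toapi:)

```python
import time

# Hypothetical in-memory cache for a hand-written /search/<keyword> route:
# maps keyword -> (timestamp, html) and expires entries after TTL seconds.
# This is a stand-in for toapi's unavailable built-in cache.
_cache = {}
TTL = 300  # seconds

def fetch_search_page(keyword, do_post):
    """Return the (possibly cached) HTML for a search keyword.

    do_post is whatever function actually issues the upstream POST,
    e.g. lambda kw: requests.post(search_url, data={...}).text
    """
    now = time.time()
    hit = _cache.get(keyword)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]  # fresh enough: serve from the cache
    html = do_post(keyword)
    _cache[keyword] = (now, html)
    return html

# Demo with a stub instead of a real network call:
calls = []
def fake_post(kw):
    calls.append(kw)
    return f'<html>{kw}</html>'

print(fetch_search_page('python', fake_post))  # issues the "POST"
print(fetch_search_page('python', fake_post))  # served from the cache
print(len(calls))  # 1
```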
