
Any way to send HTTP POST requests? #106

Closed
scottwoodall opened this issue Jan 10, 2018 · 7 comments

Comments

@scottwoodall

In working with toapi I came across a scenario where the web page had an HTML table that was paginated.

Clicking on "next page" would issue an ajax post request to fetch the next set of records in the data set.

Is there any way to accomplish this with toapi?

@alvinwoon

I am also looking for the ability to follow links or parse the next page. Sometimes the first URL is not what you are looking for (for example, if you want to parse the first result of a search page rather than the search page itself).

@elliotgao2
Owner

@scottwoodall

There is a solution here:

api = Api(url)
app = api.server.app

@app.route('/post_page/')
def post_method():
    res = requests.post(url, data)  # You need to analyze the ajax POST request of the source site.
    return item.parse(res.text)

@elliotgao2
Owner

@alvinwoon

@gaojiuli Thanks!

@scottwoodall
Author

@gaojiuli Where do data and item come from? I tried:

@app.route('/posts')
def post_method(*args, **kwargs):
    print(args)
    print(kwargs)

but they are empty.
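(For reference: Flask passes a view function only the variables declared in its route rule, so a plain rule like /posts yields empty *args and **kwargs; form fields from a POST body are read from flask.request instead. A minimal sketch, where the page variable and keyword field are hypothetical, not part of toapi:)

```python
from flask import Flask, request

app = Flask(__name__)

# Flask passes only the variables declared in the route rule as arguments;
# a rule like '/posts' declares none, so *args and **kwargs stay empty.
@app.route('/posts/<int:page>', methods=['POST'])
def posts(page):
    # Form fields from a POST body come from flask.request, not the view arguments.
    keyword = request.form.get('keyword', '')
    return f'page={page} keyword={keyword}'

with app.test_client() as client:
    body = client.post('/posts/2', data={'keyword': 'toapi'}).get_data(as_text=True)
    print(body)  # page=2 keyword=toapi
```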

@Ehco1996

Ehco1996 commented Jan 21, 2018

Since toapi's built-in fetch_page_source() method does not handle POST requests,

we need to add a Flask route ourselves to implement this.

Here is a fairly detailed example.

Suppose I need to fetch the data at this url with a POST request and parse it the toapi way.

  • Writing the items
from toapi import Item, XPath


class Search(Item):
    '''
    Parse the book title, id, url, and summary
    from the search results page.
    '''
    title = XPath('//h3/a/text()')
    book_id = XPath('//h3/a/@href')
    url = XPath('//h3/a/@href')
    content = XPath('//p[2]/text()')

    def clean_title(self, title):
        return ''.join(title)

    def clean_book_id(self, book_id):
        return book_id.split('-')[1]

    def clean_url(self, url):
        return url[:url.find('?')]

    class Meta:
        source = XPath('//li[@class="pbw"]')
        # Leave route empty here to avoid registering a duplicate route.
        route = {}
  • Registering the route
from toapi import Api
from items.search import Search
from settings import MySettings
import json
import requests


api = Api('', settings=MySettings)
api.register(Search)

@api.server.app.route('/search/<keyword>')
def search_page(keyword):
    '''
    Search function for the
    91baby new-book forum.
    '''
    data = {
        'searchsel': 'forum',
        'mod': 'forum',
        'srchtype': 'title',
        'srchtxt': keyword,
    }
    r = requests.post(
        'http://91baby.mama.cn/search.php?searchsubmit=yes', data)
    r.encoding = 'utf8'
    html = r.text
    results = {}
    items = [Search]
    # Parse the page with toapi's own parser.
    for item in items:
        parsed_item = api.parse_item(html, item)
        results[item.__name__] = parsed_item
    # Return JSON.
    return api.server.app.response_class(
        response=json.dumps(results, ensure_ascii=False),
        status=200,
        mimetype='application/json'
    )

if __name__ == '__main__':
    api.serve()

Now we can visit http://127.0.0.1:5000/search/keyword to get the POST data parsed.

Because this approach is not supported by toapi itself,
the built-in cache cannot be used.

@howie6879
Contributor

Hi @Ehco1996
You can also manage the cache yourself.
There is a document here:
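(One way to get caching back for such a hand-written route is a small memoization layer around the upstream POST. A minimal sketch; the TTL, cache structure, and function names below are my own, not part of toapi:)

```python
import time

# Hypothetical in-memory cache for a hand-written /search/<keyword> route:
# maps keyword -> (timestamp, html) and expires entries after TTL seconds.
# This is a stand-in for toapi's unavailable built-in cache.
_cache = {}
TTL = 300  # seconds

def fetch_search_page(keyword, do_post):
    """Return the (possibly cached) HTML for a search keyword.

    do_post is whatever function actually issues the upstream POST,
    e.g. lambda kw: requests.post(search_url, data={...}).text
    """
    now = time.time()
    hit = _cache.get(keyword)
    if hit is not None and now - hit[0] < TTL:
        return hit[1]  # fresh enough: serve from the cache
    html = do_post(keyword)
    _cache[keyword] = (now, html)
    return html

# Demo with a stub instead of a real network call:
calls = []
def fake_post(kw):
    calls.append(kw)
    return f'<html>{kw}</html>'

print(fetch_search_page('python', fake_post))  # issues the "POST"
print(fetch_search_page('python', fake_post))  # served from the cache
print(len(calls))  # 1
```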
