Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add encoding #36

Closed
allphfa opened this issue Dec 19, 2017 · 1 comment
Closed

add encoding #36

allphfa opened this issue Dec 19, 2017 · 1 comment

Comments

@allphfa
Copy link

allphfa commented Dec 19, 2017

request.py

import asyncio

from .log import logger

try:
    import uvloop

    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass


async def fetch(url, spider, session, semaphore):
    with (await semaphore):
        try:
            if callable(spider.headers):
                headers = spider.headers()
            else:
                headers = spider.headers
            # hare   hare   hare
            if hasattr(spider,'encoding'):
                codec = spider.encoding
            else:
                codec = 'utf-8'
            # hare   hare   hare

            
            async with session.get(url, headers=headers) as response:
                if response.status in [200, 201]:
                    data = await response.text(encoding=codec)   # hare   hare   hare
                    return data
                logger.error('Error: {} {}'.format(url, response.status))
                return None
        except:
            return None

test.py

class MySpider(Spider):
    concurrency = 5
    encoding = 'gbk'
    start_url = r'http://blog.sciencenet.cn/home.php?mod=space&uid=40109&do=blog&view=me&from=space&page=1'
    parsers = [Parser('http://blog.sciencenet.cn/home.php.*?page=\d+',Post)]
@allphfa
Copy link
Author

allphfa commented Dec 20, 2017

I forgot there was an important parameter in front of you

      data = await response.text(encoding=codec,errors='ignore')

@allphfa allphfa closed this as completed Apr 25, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants