Error when converting encoding #5

Closed
storyflow opened this issue Sep 6, 2017 · 4 comments

storyflow (Contributor) commented Sep 6, 2017

Version: 3.1.2
PHP version: 5.3.27
Operation: converting GB2312 to UTF-8

Error screenshot:
qq 20170906123551 (image)

The error does not affect normal execution.

Relevant code:

private function _arrayConvertEncoding($arr, $toEncoding, $fromEncoding)
{
    eval('$arr = '.iconv($fromEncoding, $toEncoding.'//IGNORE', var_export($arr,TRUE)).';');
    return $arr;
}
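
For comparison, the same recursive conversion can be done without `eval()`. The following is only a minimal sketch under stated assumptions: `convertString` and `convertArrayEncoding` are hypothetical helper names, not part of QueryList, and the sketch leaves out the `//IGNORE` suffix used in the quoted code so that a failed conversion shows up as a `false` return value instead of silently dropped bytes.

```php
<?php
// Hypothetical helper (not QueryList code): converts one string and
// falls back to the original value when iconv() reports a failure.
function convertString($value, $toEncoding, $fromEncoding)
{
    // iconv() returns false and raises a notice on invalid input;
    // the notice is suppressed here because the failure is handled below.
    $converted = @iconv($fromEncoding, $toEncoding, $value);
    return $converted === false ? $value : $converted;
}

// Hypothetical helper (not QueryList code): walks a nested array and
// converts every string key and string value, leaving other types alone.
function convertArrayEncoding(array $arr, $toEncoding, $fromEncoding)
{
    $result = array();
    foreach ($arr as $key => $value) {
        if (is_string($key)) {
            $key = convertString($key, $toEncoding, $fromEncoding);
        }
        if (is_array($value)) {
            $value = convertArrayEncoding($value, $toEncoding, $fromEncoding);
        } elseif (is_string($value)) {
            $value = convertString($value, $toEncoding, $fromEncoding);
        }
        $result[$key] = $value;
    }
    return $result;
}
```

Because nothing is serialized to PHP source and evaluated again, data that is already in the target encoding cannot break the generated code; at worst a string is returned unchanged.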

storyflow (Contributor, Author) commented

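Follow-up: the input was probably already UTF-8, which is what triggered the error.
However, the garbled-text problem is still not solved.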
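
If the input may already be UTF-8, one way to avoid converting it a second time is to check it first. This is a rough sketch, not what QueryList does: `isValidUtf8` and `toUtf8IfNeeded` are hypothetical names, and the check relies on `preg_match()` returning `false` for invalid UTF-8 when the `/u` modifier is set. A GBK byte sequence can occasionally also be valid UTF-8, so treat the check as a heuristic.

```php
<?php
// Hypothetical helper: true when $text is a valid UTF-8 byte sequence.
function isValidUtf8($text)
{
    // With the /u modifier, preg_match() fails (returns false) on
    // subjects that are not valid UTF-8; an empty pattern matches otherwise.
    return preg_match('//u', $text) === 1;
}

// Hypothetical helper: converts to UTF-8 only when the text is not
// already valid UTF-8. Pure ASCII also counts as valid UTF-8.
function toUtf8IfNeeded($text, $fromEncoding)
{
    if (isValidUtf8($text)) {
        return $text;
    }
    // Suppress the notice and fall back to the input on failure.
    $converted = @iconv($fromEncoding, 'UTF-8', $text);
    return $converted === false ? $text : $converted;
}
```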

jae-jae (Owner) commented Sep 6, 2017

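QueryList's built-in transcoding is fairly simple and does not handle complex cases. Please transcode the HTML manually and then pass the HTML to the Query method.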

storyflow (Contributor, Author) commented

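@jae-jae That is the same as the built-in transcoding method, which uses iconv.
I am parsing the DOM of a taobao page. If I transcode first and then pass the HTML to the Query method, it fails to parse.
I will keep experimenting on my own. Thanks.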

storyflow (Contributor, Author) commented

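@jae-jae The problem is solved, so this issue can be closed.

Problem: garbled text in scraped pages
Source site encoding: GBK. Desired encoding: UTF-8

Option 1: the built-in transcoding method
GBK => UTF-8 has no effect; the text is still garbled.

Option 2: transcode the HTML manually, then pass the HTML to the Query method.
Fix: the head needs to be removed.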
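
Option 2 could look roughly like the sketch below. It is an illustration under assumptions, not the exact code used here: `prepareHtml` is a hypothetical helper, and the reason for dropping the head is presumed to be that its original charset declaration no longer matches the transcoded bytes.

```php
<?php
// Hypothetical helper (not QueryList's API): transcodes a page to UTF-8
// and removes its <head> element, including any charset declaration in it.
function prepareHtml($html, $fromEncoding)
{
    // Keep the original bytes if iconv() fails on invalid input.
    $converted = @iconv($fromEncoding, 'UTF-8', $html);
    if ($converted !== false) {
        $html = $converted;
    }
    // Remove everything from <head ...> through </head>.
    // "i" ignores case, "s" lets "." span newlines; \b keeps <header> intact.
    $stripped = preg_replace('/<head\b[^>]*>.*?<\/head>/is', '', $html);
    // preg_replace() returns null on a PCRE error; keep the input then.
    return $stripped === null ? $html : $stripped;
}
```

The returned string is what would then be passed to the Query method in place of the raw page.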
