
Why are the results of these two examples stored in different shapes? Is there a rule for how results are structured? #28

Closed
jayroe opened this issue Feb 10, 2018 · 2 comments


jayroe commented Feb 10, 2018

Example 1:

html:

```html
<div id="one">
    <div class="two">
        <a href="http://querylist.cc">QueryList官网</a>
        <img src="http://querylist.com/1.jpg" alt="这是图片">
        <img src="http://querylist.com/2.jpg" alt="这是图片2">
    </div>
    <span>其它的<b>一些</b>文本</span>
</div>
```

rules:

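```php
$rules = array(
    // Extract the plain text inside the element with id "one"
    'text' => array('#one','text'),
    // Extract the href of the link under class "two"
    'link' => array('.two>a','href'),
    // Extract the src of the second image under class "two"
    'img' => array('.two>img:eq(1)','src'),
    // Extract the HTML content of the span tag
    'other' => array('span','html')
);
```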

Result: everything is placed in Array[0]:

```
Array
(
    [0] => Array
        (
            [text] => 
        QueryList官网
    其它的一些文本
            [link] => http://querylist.cc
            [img] => http://querylist.com/2.jpg
            [other] => 其它的<b>一些</b>文本
        )
)
```

Example 2:

html:

```html
<div class="xx">
    <img data-src="/path/to/1.jpg" alt="">
</div>
<div class="xx">
    <img data-src="/path/to/2.jpg" alt="">
</div>
<div class="xx">
    <img data-src="/path/to/3.jpg" alt="">
</div>
```

rules:

```php
$rules = array(
    'image' => array('.xx>img','data-src')
);
```

Result: the values are placed separately in Array[0], Array[1] and Array[2]. Why?

```
Array
(
    [0] => Array
        (
            [image] => /path/to/1.jpg
        )
    [1] => Array
        (
            [image] => /path/to/2.jpg
        )
    [2] => Array
        (
            [image] => /path/to/3.jpg
        )
)
```
jae-jae (Owner) commented Feb 12, 2018

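In Example 1, only one record matches the extraction rules, so there is only one result. In Example 2, several records match, so there are several results. Although Array[0] in Example 1 holds several fields, they belong together and form a single record.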
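The answer above can be illustrated with a small sketch. It assumes, consistent with both outputs in this issue, that each rule yields a list of matched values and that the i-th value of every rule is written into row i. `mergeByIndex` is a hypothetical helper written for illustration, not QueryList's actual source.

```php
<?php
// Hypothetical helper, NOT QueryList's source code. It only illustrates the
// index-based merging that both outputs in this issue are consistent with.
// $columns maps each rule name to the list of values its selector matched.
function mergeByIndex(array $columns): array
{
    $rows = [];
    foreach ($columns as $ruleName => $values) {
        foreach (array_values($values) as $i => $value) {
            // The i-th match of every rule is written into row i.
            $rows[$i][$ruleName] = $value;
        }
    }
    return $rows;
}

// Example 1: each of the four rules matched exactly one element -> one row.
// (The 'text' value is shortened here; the real one keeps the page's whitespace.)
$example1 = mergeByIndex([
    'text'  => ['QueryList官网 其它的一些文本'],
    'link'  => ['http://querylist.cc'],
    'img'   => ['http://querylist.com/2.jpg'],
    'other' => ['其它的<b>一些</b>文本'],
]);

// Example 2: the single rule matched three elements -> three rows.
$example2 = mergeByIndex([
    'image' => ['/path/to/1.jpg', '/path/to/2.jpg', '/path/to/3.jpg'],
]);

// Hypothetical uneven case: rules with different match counts.
// Row 1 only contains the rule that had a second match.
$uneven = mergeByIndex([
    'link' => ['a.html'],
    'img'  => ['1.jpg', '2.jpg'],
]);

echo count($example1), "\n"; // 1
echo count($example2), "\n"; // 3
print_r($uneven);
```

Under this model the number of rows equals the largest number of matches produced by any single rule, not the number of rules. Four rules with one match each therefore collapse into Array[0], while one rule with three matches spreads across Array[0] to Array[2].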

jayroe (Author) commented Feb 12, 2018

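@jae-jae Thanks for your reply and for answering my question. After reading it, though, I'm still a little confused. In Example 1 I wrote four extraction rules, 'text', 'link', 'img' and 'other', and every one of them matched data. Why does your answer say that only one record matches the rules? I've included the rules again below and look forward to your next reply.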

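```php
$rules = array(
    // Extract the plain text inside the element with id "one"
    'text' => array('#one','text'),
    // Extract the href of the link under class "two"
    'link' => array('.two>a','href'),
    // Extract the src of the second image under class "two"
    'img' => array('.two>img:eq(1)','src'),
    // Extract the HTML content of the span tag
    'other' => array('span','html')
);
```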
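The follow-up's premise can be checked by counting how many elements each Example 1 selector matches. This sketch uses PHP's built-in DOM extension; the XPath expressions are hand translations of the CSS selectors (an assumption: QueryList evaluates the CSS selectors with its own engine), and the extra `allImgs` entry is not one of the original rules. It only shows what `.two>img` would match without `:eq(1)`.

```php
<?php
// Count how many elements each Example 1 rule matches, using PHP's DOM extension.
// The XPath expressions are hand-written approximations of the CSS selectors in
// $rules; QueryList itself evaluates the CSS selectors with its own engine.
$html = <<<'HTML'
<div id="one">
    <div class="two">
        <a href="http://querylist.cc">QueryList官网</a>
        <img src="http://querylist.com/1.jpg" alt="这是图片">
        <img src="http://querylist.com/2.jpg" alt="这是图片2">
    </div>
    <span>其它的<b>一些</b>文本</span>
</div>
HTML;

$doc = new DOMDocument();
// Prefix an encoding hint so the Chinese text is parsed as UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html, LIBXML_NOERROR);
$xpath = new DOMXPath($doc);

// XPath equivalent of the CSS class selector ".two".
$hasClassTwo = "contains(concat(' ', normalize-space(@class), ' '), ' two ')";
$selectors = [
    'text'    => '//*[@id="one"]',                  // #one
    'link'    => "//*[{$hasClassTwo}]/a",           // .two>a
    'img'     => "(//*[{$hasClassTwo}]/img)[2]",    // .two>img:eq(1); eq() is 0-based
    'other'   => '//span',                          // span
    // Not in the original rules: .two>img without :eq(1), for comparison.
    'allImgs' => "//*[{$hasClassTwo}]/img",
];

$counts = [];
foreach ($selectors as $name => $expr) {
    $counts[$name] = $xpath->query($expr)->length;
}
print_r($counts);
```

Each of the four original rules matches exactly one element, which fits the owner's point: one match per rule means one record, however many rules there are. Dropping `:eq(1)` gives two matches, so with that rule a second row holding only the second image's `src` would be expected.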

jae-jae closed this as completed Sep 27, 2020