I can not get blank text in td tag. #62

pc10201 · 2016-10-19T06:40:37Z

# coding=utf-8

from parsel import Selector

html = u'''
                        <table class="table table-bordered table-hover table-condensed">
                            <thead>
                            <tr>
                                <th>#</th>
                                <th>code</th>
                                <th>vendor</th>
                                <th>name</th>
                                <th>num</th>
                            </tr>
                            </thead>
                            <tbody>
                                <tr>
                                    <th scope="row">1750</th>
                                    <td><a href="/exam/000-643">000-643</a></td>
                                    <td>IBM</td>
                                    <td></td>
                                    <td>45</td>
                                </tr>
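                                </tbody>
                        </table>
'''

sel = Selector(text=html)
# text nodes that are direct children of each <td>
print(sel.xpath('//tbody/tr//td/text()').extract())
# all descendant text nodes of each <td> (including text inside <a>)
print(sel.xpath('//tbody/tr//td//text()').extract())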

Output:

[u'000-643', u'IBM', u'45']

redapple · 2016-10-19T09:20:03Z

XPath's data model has no text node for "blank" (empty) text. As the XPath 1.0 specification puts it:

A text node always has at least one character of data.

So the output of parsel seems correct to me. (And I get the same results with http://codebeautify.org/Xpath-Tester for example.)

What you could do is loop over the <td> elements and apply text() to each one:

>>> sel.xpath('//tbody/tr//td/text()').extract()
[u'IBM', u'45']
>>> for td in sel.xpath('//tbody/tr//td'):
...     print(td.xpath('text()').extract())
... 
[]
[u'IBM']
[]
[u'45']
>>>

wsgggws · 2018-12-19T02:11:44Z

Python 3.6.7
Scrapy 1.5.1

[info.xpath('text()').extract()
      for info in response.xpath('//td').extract()]

Error:
    for info in response.xpath('//td').extract()]))
AttributeError: 'str' object has no attribute 'xpath'

Gallaecio · 2018-12-19T08:24:47Z

Please, instead of hijacking an unrelated Scrapy issue, ask your question on Stack Overflow. It is an easy question; I bet you will get a prompt answer there.

ilyazub · 2022-02-16T21:04:36Z

@pc10201 s.xpath("//tbody/tr//td").xpath("normalize-space()").getall() returns an empty string for empty cells, so every <td> gets an entry. text() skips them, as expected, because an empty <td> has no text node.

>>> s.xpath("//tbody/tr//td").xpath("normalize-space()").getall()
['000-643', 'IBM', '', '45']

Full code

from parsel import Selector

html = """
<table class="table table-bordered table-hover table-condensed">
  <thead>
    <tr>
        <th>#</th>
        <th>code</th>
        <th>vendor</th>
        <th>name</th>
        <th>num</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th scope="row">1750</th>
      <td><a href="/exam/000-643">000-643</a></td>
      <td>IBM</td>
      <td></td>
      <td>45</td>
    </tr>
  </tbody>
</table>
"""

s = Selector(text=html)

with_text = s.xpath("//tbody/tr//td//text()").getall()
with_normalize_space = s.xpath("//tbody/tr//td").xpath("normalize-space()").getall()

print(with_text, with_normalize_space)

Output

['000-643', 'IBM', '45'] ['000-643', 'IBM', '', '45']

I'm commenting on this old issue because I ran into it today.

pc10201 changed the title ~~I can get blank text in td tag.~~ I can not get blank text in td tag. Oct 19, 2016

redapple closed this as completed Oct 19, 2016

redapple added the invalid label Oct 19, 2016

barrio mentioned this issue Apr 30, 2024 in "Parsel import causes crash #294" (closed)
