This repository has been archived by the owner on Jul 13, 2021. It is now read-only.

Select /html/body/ #38

Open
lbell opened this issue Feb 24, 2014 · 5 comments

Comments

@lbell

lbell commented Feb 24, 2014

Is there any way to select the entire body of a page? I'm working on one that has no divs, no classes, and not much of anything except text wrapped in a body tag.

@ravenx99

I'm in the same boat: I want to pull a comic image from a page that has no classes. The following doesn't seem to work, even though Firebug reports it as the XPath to the image tag:

"xpath": "html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[2]/td/img"
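
One possible cause, offered as an assumption rather than a diagnosis of this particular page: browser tools such as Firebug report paths from the live DOM, where the browser inserts `tbody` elements that may not exist in the HTML source a server-side parser downloads. The sketch below uses Python's standard library on a made-up table (not the comic page, and not the plugin's own parser) to show a path containing `tbody` matching nothing while the same path without it finds the image.

```python
import xml.etree.ElementTree as ET

# Hypothetical page source: note there is no <tbody> in the markup itself.
SOURCE = """
<html><body>
  <table>
    <tr><td>header</td></tr>
    <tr><td><img src="comic.png"/></td></tr>
  </table>
</body></html>
"""

root = ET.fromstring(SOURCE)  # root is the <html> element

# Path as a browser inspector would report it (with the injected tbody).
with_tbody = root.findall("./body/table/tbody/tr[2]/td/img")

# Same path with the tbody step removed, matching the raw source.
without_tbody = root.findall("./body/table/tr[2]/td/img")

print(len(with_tbody), len(without_tbody))
```

If this is what is happening, removing every `tbody` step from the Firebug path would be worth trying before anything else.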

@m42e

m42e commented Jul 25, 2014

Can you provide a sample url?

@troydunham

I'm having a similar issue with a page without usable DIV classes. I have found a unique locator but can't seem to get it to pull body text.

Here is an example page: http://paddocktalk.com/news/html/story-259326.html
Here is the unique string: "IMG SRC=http://paddocktalk.com/news/html/images/smilies/icon_smile.gif"
I've tried every variation of XPath I can think of. My other pages, which do have div classes, are working perfectly.

@m42e

m42e commented Aug 11, 2014

It seems to me that the page is not well-formed XML: some tags may be missing, or no encoding is specified, so some characters cannot be read. This leads to parse errors, and the XPath selection is never performed. You can try my version and use the split method: https://github.com/m42e/ttrss_plugin-af_feedmod
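
To illustrate the point about malformed markup, here is a small sketch using Python's standard library rather than the plugin's own PHP parser. It feeds a fragment with an unquoted attribute, modeled on the `IMG SRC=...` string quoted earlier in the thread, to a strict XML parser and to a lenient HTML parser. The surrounding table markup and the `TagCollector` class are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical fragment: unquoted attribute value and unclosed <tr>/<td>,
# which old-style HTML tolerates but which is not well-formed XML.
FRAGMENT = (
    "<table><tr><td>Story text "
    "<IMG SRC=http://paddocktalk.com/news/html/images/smilies/icon_smile.gif>"
    "</table>"
)

# A strict XML parser rejects the fragment outright.
try:
    ET.fromstring(FRAGMENT)
    strict_ok = True
except ET.ParseError:
    strict_ok = False


class TagCollector(HTMLParser):
    """Records every start tag the lenient HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


collector = TagCollector()
collector.feed(FRAGMENT)
collector.close()

print(strict_ok)       # whether strict XML parsing succeeded
print(collector.tags)  # tags recovered by the lenient parser
```

The strict parser gives up before any query could run, while the lenient one still recovers the tags, which is consistent with the explanation above.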

@m42e

m42e commented Aug 11, 2014

OK, I dug in deeper.

@troydunham, try "xpath": "td[@width='85%' and @valign='top' and @bgcolor='#FFFFFF']" and you may be near the treasure.
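
As a rough sketch of why a combination of attributes can stand in for a missing class, the example below runs an equivalent query against a made-up layout table using Python's standard library. Two things here are assumptions: the sample markup is hypothetical, and because `xml.etree.ElementTree` supports only a subset of XPath and has no `and` operator, the three tests are written as chained predicates instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout table: two cells, only one carries all three values.
SOURCE = """
<html><body><table><tr>
  <td width="15%" valign="top" bgcolor="#FFFFFF">sidebar</td>
  <td width="85%" valign="top" bgcolor="#FFFFFF">story body</td>
</tr></table></body></html>
"""

root = ET.fromstring(SOURCE)

# Chained predicates replace `and`; for plain attribute comparisons
# like these, both forms select the same cells.
matches = root.findall(".//td[@width='85%'][@valign='top'][@bgcolor='#FFFFFF']")

print([td.text for td in matches])
```

Only the cell that satisfies all three attribute tests is returned, so the sidebar cell with the same `valign` and `bgcolor` is skipped.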
