Getting "floating" text outside of html elements #27

Closed
ainsleyc opened this issue Apr 25, 2014 · 2 comments

Comments

@ainsleyc

First off, awesome project!

This sample page contains the block of data shown below:
http://tlahuac.wired.com.mx/687770/grupo-escape.html

With the current UI, I haven't found a good way to extract the multiple pieces of text that follow each "span" label.

The best I have come up with is to select the "p" element and apply a regex to the annotation, but that only lets you retrieve one value (such as the street or the telephone number).

This pattern of putting "floating" text outside of an html element seems pretty common. Is there a good way of extracting it?

<p>
  <span>Nombre de empresa:</span> Grupo Escape
  <br/><br/>
  <span>Tel:</span> 5860 1232 1233, 5845 6457 6457
  <br/><br/>
  <span><input class="DefBtn" type="submit" value="Contáctenos" onclick="location.href='/contact.php?cid=687770';"/></span>
  <br/><br/>
  <span>Street:</span> Eje 10 mz-32 Lote 3
  <br/><br/>
  <span>Colonia:</span> colonia Santa Catarina
  <br/><br/>
  <span>Código postal:</span> 13100
  <br/><br/>
  <span>Cuidad:</span> Tlahuac, Distrito Federal
  <br/><br/>
  <span>Web:</span> <a href="http://www.grupoescape.com.mx">www.grupoescape.com.mx</a>
  <br/><br/>
</p>
<h2>Mapa</h2>
<p>
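
For anyone handling this layout in code rather than in the UI, here is a minimal sketch of one way to pair each "span" label with the bare text that follows it. It uses only Python's standard `html.parser` and is not how this project works internally. The names `LabelValueParser`, `extract_fields` and `SAMPLE` are made up for illustration, and the sample is a shortened copy of the markup above.

```python
from html.parser import HTMLParser


class LabelValueParser(HTMLParser):
    """Pair each <span>Label:</span> with the first bare text after it.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.in_span = False  # True while inside a <span>
        self.label = None     # label waiting for its value, if any
        self.fields = {}      # label -> text that followed it

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self.in_span = True
            self.label = None  # a new span discards any pending label

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_span = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace-only text nodes
        if self.in_span:
            self.label = text.rstrip(":")
        elif self.label is not None:
            # First non-empty text after the label is taken as its value,
            # even if it sits inside another tag such as <a>.
            self.fields[self.label] = text
            self.label = None


def extract_fields(html):
    parser = LabelValueParser()
    parser.feed(html)
    parser.close()
    return parser.fields


# Shortened copy of the markup from this issue.
SAMPLE = """<p>
  <span>Nombre de empresa:</span> Grupo Escape
  <br/><br/>
  <span>Tel:</span> 5860 1232 1233, 5845 6457 6457
  <br/><br/>
  <span><input class="DefBtn" type="submit" value="Contáctenos"/></span>
  <br/><br/>
  <span>Street:</span> Eje 10 mz-32 Lote 3
  <br/><br/>
  <span>Web:</span> <a href="http://www.grupoescape.com.mx">www.grupoescape.com.mx</a>
  <br/><br/></p><h2>Mapa</h2>"""

fields = extract_fields(SAMPLE)
for label, value in fields.items():
    print(f"{label}: {value}")
```

The sketch assumes every label lives in its own "span" and that the value is the first non-empty text after it. Spans without text (like the one wrapping the button) are skipped, and text with no pending label (like the "Mapa" heading) is ignored.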
ainsleyc changed the title from 'Getting' to 'Getting "floating" html outside of html elements' on Apr 25, 2014
ainsleyc changed the title from 'Getting "floating" html outside of html elements' to 'Getting "floating" text outside of html elements' on Apr 25, 2014
@duendex
Contributor

duendex commented Apr 25, 2014

Have you tried selecting just the piece of text you want to extract (like in a text editor)? I think it will do what you need.


@ainsleyc
Author

Oh nice, I didn't know you could do that.

Thanks for the response!
