
Update tutorials so Floki does not parse documents multiple times #42

Closed
oltarasenko opened this issue Dec 29, 2019 · 2 comments
Labels
good first issue Good for newcomers


@oltarasenko
Collaborator

Based on:
https://elixirforum.com/t/web-scraping-tools/4823/31

Stop parsing each page four times.

When you run response.body |> Floki.find(...), you’re really running the equivalent of response.body |> Floki.parse() |> Floki.find(...) which means your four Floki.finds are parsing the whole document four times.

Instead, try parsed_body = Floki.parse(response.body), then run each query as parsed_body |> Floki.find(...).
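A minimal sketch of the parse-once pattern, assuming a standalone .exs script that pulls in Floki via Mix.install and a hypothetical HTML string standing in for response.body. It uses Floki.parse_document/1, the replacement for the deprecated Floki.parse/1 discussed later in the thread. The document is parsed a single time and every Floki.find/2 call then reuses the same tree:

```elixir
Mix.install([{:floki, "~> 0.36"}])

# Hypothetical page body standing in for response.body.
html = """
<html>
  <body>
    <h1 class="title">Product</h1>
    <span class="price">9.99</span>
    <a class="next" href="/page/2">Next</a>
  </body>
</html>
"""

# Parse the HTML exactly once...
{:ok, document} = Floki.parse_document(html)

# ...then run every query against the already-parsed tree,
# so no Floki.find call re-parses the raw string.
title = document |> Floki.find("h1.title") |> Floki.text()
price = document |> Floki.find("span.price") |> Floki.text()
next_links = document |> Floki.find("a.next") |> Floki.attribute("href")

IO.inspect({title, price, next_links})
```

The selectors and markup are illustrative only; the point is that the parsing cost is paid once, however many queries follow.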
@oltarasenko added the good first issue label Dec 29, 2019
@philss

philss commented Jan 5, 2020

For that, I recommend we start using the newer functions Floki.parse_document/1 or Floki.parse_fragment/1.

They return an {:ok, ...} or {:error, ...} tuple, so a parsing failure can be handled explicitly instead of going unnoticed. Also, Floki.parse/1 is now deprecated.

TLDR:

with {:ok, parsed_body} <- Floki.parse_document(response.body) do
  # Floki.find(parsed_body, "a.foo") or something similar
end
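Extending that TLDR into a reusable function might look like the sketch below. ScraperSketch.extract/1 and the "a.foo" / "h2" selectors are hypothetical. The `with` has no `else`, so a non-matching result from Floki.parse_document/1 (its {:error, reason} tuple) is returned to the caller unchanged; which inputs actually produce an error depends on the HTML parser Floki is configured to use.

```elixir
Mix.install([{:floki, "~> 0.36"}])

defmodule ScraperSketch do
  # Hypothetical helper: parse the body once, run several queries on the
  # same tree, and pass any parser error through unchanged.
  def extract(body) do
    with {:ok, document} <- Floki.parse_document(body) do
      links = document |> Floki.find("a.foo") |> Floki.attribute("href")
      headings = document |> Floki.find("h2") |> Enum.map(&Floki.text/1)
      {:ok, %{links: links, headings: headings}}
    end
  end
end
```

A caller can then match on {:ok, result} or {:error, reason} instead of assuming the parse always succeeds.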

@oltarasenko
Collaborator Author

I think this issue is done now.

Projects
None yet
Development

No branches or pull requests

2 participants