Update tutorials so Floki does not parse documents multiple times #42

oltarasenko · 2019-12-29T20:45:34Z

Based on:
https://elixirforum.com/t/web-scraping-tools/4823/31

Stop parsing each page four times.

When you run response.body |> Floki.find(...), you’re really running the equivalent of response.body |> Floki.parse() |> Floki.find(...), which means your four Floki.find calls parse the whole document four times.

Instead, try parsed_body = Floki.parse(response.body), then parsed_body |> Floki.find(...).

philss · 2020-01-05T18:48:13Z

For that I recommend that we start using the newer functions Floki.parse_document/1 or Floki.parse_fragment/1.

Both return an error tuple if something goes wrong during parsing. Also, Floki.parse/1 is now deprecated.

TLDR:

with {:ok, parsed_body} <- Floki.parse_document(response.body) do
  # Floki.find(parsed_body, "a.foo") or something similar
end
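
To make the suggestion above concrete, here is a minimal sketch of parsing once and reusing the tree for several lookups. The HTML snippet, the selectors other than "a.foo", and the map keys are made up for illustration; in a real spider the string would be response.body. It assumes Floki is installed (here via Mix.install/1, so it can run as a standalone script).

```elixir
# Assumption: running as a script, so fetch Floki on the fly.
Mix.install([{:floki, "~> 0.30"}])

# Hypothetical stand-in for response.body.
html = """
<html><body>
  <h1 class="title">Example product</h1>
  <span class="price">9.99</span>
  <a class="foo" href="/next">Next</a>
</body></html>
"""

item =
  # Parse the document a single time; on failure the {:error, reason}
  # tuple falls through `with` and becomes the value of `item`.
  with {:ok, parsed_body} <- Floki.parse_document(html) do
    # Every lookup below reuses the already-parsed tree.
    %{
      title: parsed_body |> Floki.find("h1.title") |> Floki.text(),
      price: parsed_body |> Floki.find("span.price") |> Floki.text(),
      next_url:
        parsed_body
        |> Floki.find("a.foo")
        |> Floki.attribute("href")
        |> List.first()
    }
  end

IO.inspect(item)
```

The same shape should work in a tutorial's parse_item callback: bind parsed_body once at the top, then run each Floki.find against it instead of against the raw body.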

oltarasenko · 2020-04-14T13:25:50Z

I think this issue is done now

oltarasenko added the "good first issue" label Dec 29, 2019

oltarasenko closed this as completed Apr 14, 2020
