Suggestions for Learning to Crawl #31

Closed
mischov opened this Issue Oct 11, 2017 · 0 comments

mischov commented Oct 11, 2017

I will make code suggestions in the appropriate repo. These are just suggestions and fixes that might be made to the blog post itself.

1) Add a section about dependencies

You show part of the getting-started process when you run mix new hello_crawler, but then you skip over how dependencies get added. A small section showing the update to mix.exs and discussing the purpose of each dependency would set the context nicely. It would also keep new Elixirists who are following along from wondering why they can't use HTTPoison or Floki.
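As a rough sketch of what that section might show, here is a mix.exs with both dependencies declared. The module name, the app version, and both version requirements are assumptions on my part, not taken from the post, so adjust them to whatever the project actually uses.

```elixir
# mix.exs (sketch; module name and all version numbers are assumptions)
defmodule HelloCrawler.Mixfile do
  use Mix.Project

  def project do
    [
      app: :hello_crawler,
      version: "0.1.0",
      deps: deps()
    ]
  end

  def application do
    [extra_applications: [:logger]]
  end

  defp deps do
    [
      # HTTP client used to fetch each page
      {:httpoison, "~> 0.13"},
      # HTML parser used to find the links on a fetched page
      {:floki, "~> 0.18"}
    ]
  end
end
```

After editing the file, running mix deps.get fetches both packages so that HTTPoison and Floki are available in the project.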

2) Fix the with code example

with {:ok, %{body: body}} <- HTTPoison.get(url, [], [follow_redirect: true]),
     tags                 <- Floki.find(body, "a"),
     hrefs                <- Floki.attribute(tags, "href") do
  [url | body
         |> Floki.find("a")
         |> Floki.attribute("href")]
else
  _ -> [url]
end

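The code in the above snippet duplicates the Floki work: the with clauses already bind tags and hrefs, but the do block ignores them and recomputes both from body. A better example would be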

with {:ok, %{body: body}} <- HTTPoison.get(url, [], [follow_redirect: true]),
     tags                 <- Floki.find(body, "a"),
     hrefs                <- Floki.attribute(tags, "href") do
  [url | hrefs]
else
  _ -> [url]
end

Also (and I'll address this point further in the code suggestions), HTTPoison.get returning {:ok, ...} doesn't mean that you fetched the page you wanted. It can return {:ok, %{status_code: 404, ...}}, which is still a failure.
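To illustrate the status code point, here is a minimal sketch of one way to guard against it. The module HelloCrawlerSketch and the function check_response/1 are hypothetical names of mine, not part of HTTPoison or the post. The function takes the tuple returned by HTTPoison.get and treats only 2xx responses as success.

```elixir
defmodule HelloCrawlerSketch do
  # Hypothetical helper: narrows the result of an HTTP request so that
  # only 2xx responses count as success.

  # A 2xx response: hand back just the body.
  def check_response({:ok, %{status_code: code, body: body}}) when code in 200..299 do
    {:ok, body}
  end

  # Any other status (404, 500, ...) is reported as an error,
  # even though the request itself completed.
  def check_response({:ok, %{status_code: code}}) do
    {:error, {:unexpected_status, code}}
  end

  # Transport-level failures pass through unchanged.
  def check_response({:error, reason}) do
    {:error, reason}
  end
end
```

The first clause of the with could then match {:ok, body} against the checked result of HTTPoison.get(url, [], [follow_redirect: true]), so a 404 would fall through to the else branch instead of being parsed for links.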

@pcorey pcorey closed this Oct 12, 2017
