Protocol error #238

Closed
assertnotnull opened this issue Feb 24, 2023 · 7 comments
@assertnotnull

I followed the docs and wrote a simple spider, but running it gives me a protocol error.
Elixir 1.14
Erlang 24
Crawly 0.14

defmodule BasicSpider do
  use Crawly.Spider

  @impl Crawly.Spider
  def base_url do
    "https://www.metal-archives.com"
  end

  @impl Crawly.Spider
  def init() do
    [
      start_urls: [
        "https://www.metal-archives.com/bands/Judas_Priest/97"
      ]
    ]
  end

  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)
    IO.inspect(document)

    items =
      document
      |> Floki.find("#band_content")
      |> Enum.map(fn x ->
        %{
          name: Floki.find(x, ".band_name") |> Floki.text()
        }
      end)

    IO.inspect(items)

    %Crawly.ParsedItem{items: items, requests: []}
  end
end

Error:

** (Protocol.UndefinedError) protocol String.Chars not implemented for %Crawly.Request{url: "https://www.metal-archives.com/bands/Judas_Priest/97", headers: [{"User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"}], prev_response: nil, options: [], middlewares: [{Crawly.Middlewares.UserAgent, [user_agents: ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36"]]}, {Crawly.Pipelines.WriteToFile, [folder: "./tmp", extension: "jl"]}], retries: 0} of type Crawly.Request (a struct). This protocol is implemented for the following type(s): Atom, BitString, Date, DateTime, Decimal, Float, Floki.Selector, Floki.Selector.AttributeSelector, Floki.Selector.Combinator, Floki.Selector.Functional, Floki.Selector.PseudoClass, Hex.Solver.Assignment, Hex.Solver.Constraints.Empty, Hex.Solver.Constraints.Range, Hex.Solver.Constraints.Union, Hex.Solver.Incompatibility, Hex.Solver.PackageRange, Hex.Solver.Term, Integer, List, NaiveDateTime, Phoenix.LiveComponent.CID, Postgrex.Copy, Postgrex.Query, Time, URI, Version, Version.Requirement
@oltarasenko
Collaborator

Hey @assertnotnull, the code above seems to work fine for me:

iex(2)>
10:45:07.962 [warning] Description: 'Server authenticity is not verified since certificate path validation is not enabled'
     Reason: 'The option {verify, verify_peer} and one of the options \'cacertfile\' or \'cacerts\' are required to enable this.'

[
  %{
    name: "Judas Priest",
    url: "https://www.metal-archives.com/bands/Judas_Priest/97"
  }
]

10:45:09.800 [debug] Stored item: %{name: "Judas Priest", url: "https://www.metal-archives.com/bands/Judas_Priest/97"}

Could you provide a bit more info about the case? To me it looks like the problem is in one of the inspect calls in the code above.

@zongwu233

I have the same problem.

Elixir 1.14.3 (compiled with Erlang/OTP 25)
Crawly 0.14

@zongwu233

I think I found the reason. I checked the source of version 0.14 that my local project depends on, and, strangely, the Request module there does not contain the implementation below:

defimpl String.Chars, for: Crawly.Request do
  def to_string(s) do
    inspect(s)
  end
end

But the master branch has it.
Is there a problem with the released 0.14 code?
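
To illustrate why a missing `String.Chars` implementation produces this error, here is a minimal sketch. `Demo.Request` and `Demo.describe/1` are hypothetical stand-ins, not Crawly code, and the script is assumed to be run with plain `elixir` outside a Mix project (where protocols are not yet consolidated). String interpolation calls `to_string/1`, which dispatches through the `String.Chars` protocol, so interpolating a struct without an implementation raises `Protocol.UndefinedError`.

```elixir
defmodule Demo.Request do
  # Hypothetical stand-in for Crawly.Request; only one field is needed here.
  defstruct url: nil
end

defmodule Demo do
  # "#{...}" calls Kernel.to_string/1, which dispatches to String.Chars.
  def describe(request) do
    "Fetching #{request}"
  rescue
    # Raised when no String.Chars implementation exists for the struct.
    e in Protocol.UndefinedError -> "not printable: #{inspect(e.protocol)}"
  end
end

# No implementation yet, so the rescue branch runs.
IO.puts(Demo.describe(%Demo.Request{url: "https://example.com"}))

# Same shape as the implementation quoted above, for the demo struct.
defimpl String.Chars, for: Demo.Request do
  def to_string(request), do: inspect(request)
end

# With the implementation defined, interpolation succeeds.
IO.puts(Demo.describe(%Demo.Request{url: "https://example.com"}))
```

The first call should hit the rescue branch and the second should print the inspected struct. Inside a compiled Mix project, protocols are usually consolidated at build time, so an implementation like this would need to be compiled as part of the project rather than defined at runtime.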

@zongwu233

Yeah, I switched the dependency in mix.exs to point directly at the master branch, and the error disappeared.
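
For anyone wanting to try the same workaround, a dependency on the master branch might look roughly like the fragment below. The repository path `elixir-crawly/crawly` is an assumption (it is not stated in this thread), so adjust it if the project lives elsewhere.

```elixir
# mix.exs (fragment, other dependencies omitted)
defp deps do
  [
    # Assumption: the Crawly repository is elixir-crawly/crawly on GitHub.
    # This tracks the master branch instead of the published Hex release.
    {:crawly, github: "elixir-crawly/crawly", branch: "master"}
  ]
end
```

After changing the entry, run `mix deps.get` to fetch the new source. Tracking a branch means the dependency can change underneath you, so this is best treated as a temporary measure until a fixed release is published.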

@oltarasenko
Collaborator

Strange. @zongwu233, as far as I can see, that code was added 6 months ago. In any case, I am preparing 0.15.0, so hopefully the error will disappear soon.
