Unable to crawl https site #41

Closed
ArsenShnurkov opened this issue Nov 15, 2014 · 2 comments

Comments

@ArsenShnurkov

I am trying to run Abot.Demo against
https://focus.kontur.ru

I added the site certificate with:
yes | certmgr -ssl -v https://focus.kontur.ru

The Abot.Demo program throws a "Max. redirections exceeded." exception
and writes the following line to the log:
[2014-11-15 08:24:41,678] [1] [INFO ] - Page crawl complete, Status:[302] Url:[https://focus.kontur.ru/] Parent:[https://focus.kontur.ru/] - [AbotLogger]

I am using Mono 3.10.1 on Linux.

What is the problem, and how can I overcome it?

@sjdirect
Owner

The site is replying with an HTTP 302 redirect on every request. I would bet
that the site requires a cookie to be sent with each request. If the cookie is
not sent, it likely redirects over and over until the crawler reaches the
maximum number of redirects, which is 7 by default.

  1. Extend the Abot.Core.PageRequester class and override the MakeRequest()
    method.

  2. Copy the base class's MakeRequest() code as the starting point for your
    new implementation of MakeRequest().

  3. After you receive the first response from that site, save the cookie
    that you get back and resend it with all future requests in the
    overridden MakeRequest() method.
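
To illustrate why resending the cookie breaks the loop, here is a minimal, network-free simulation in Python. It is not Abot code: `fake_server`, `fetch`, the `session` cookie name and its value are all hypothetical, and the site's real behavior is only assumed to match the guess above. The limit of 7 is the default mentioned above.

```python
# Simulation only: no real network traffic, and none of these names come from Abot.
MAX_REDIRECTS = 7  # the default redirect limit mentioned above


def fake_server(request_cookies):
    """Hypothetical site: redirects to itself until the session cookie is sent."""
    if request_cookies.get("session") == "abc123":
        return 200, {}
    # No cookie yet: redirect back to the same URL and hand out the cookie.
    return 302, {"Location": "/", "Set-Cookie": ("session", "abc123")}


def fetch(keep_cookies):
    """Follow redirects and return (final_status, redirects_followed)."""
    cookie_jar = {}
    redirects = 0
    while True:
        status, headers = fake_server(dict(cookie_jar))
        if status != 302:
            return status, redirects
        if redirects == MAX_REDIRECTS:
            raise RuntimeError("Max. redirections exceeded.")
        redirects += 1
        if keep_cookies and "Set-Cookie" in headers:
            name, value = headers["Set-Cookie"]
            cookie_jar[name] = value  # saved once, resent on every later request
```

With `keep_cookies=True` the call returns `(200, 1)`: one redirect, then the page. With `keep_cookies=False` every response is a 302 and the call raises after 7 redirects, which matches the symptom in the log. In the overridden MakeRequest() the same idea would apply to the real request object, for example by reusing one cookie store across all requests.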

Good luck
Steven

@ArsenShnurkov
Author

Yes, it works this way.

See https://github.com/sjdirect/abot/pull/42/files
