
Cookies from the Cookie request header are not processed #1992

Open
exotfboy opened this issue May 16, 2016 · 5 comments · Fixed by #2400 · May be fixed by #4812

Comments


exotfboy commented May 16, 2016

I am new to Scrapy, and I have run into some problems I could not find answers to on Google, so I am posting them here:

1. The Cookie header does not work even when set in DEFAULT_REQUEST_HEADERS:

import scrapy

DEFAULT_REQUEST_HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, sdch',
    'cache-control': 'no-cache',
    'cookie': 'xx=yy',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36',
}

class MySpider(scrapy.Spider):
    name = 'myspider'

    def make_requests_from_url(self, url):
        # The Cookie value from DEFAULT_REQUEST_HEADERS is expected to be sent here
        return scrapy.http.Request(url, headers=DEFAULT_REQUEST_HEADERS)

I know make_requests_from_url is only called for the URLs in start_urls, and I expected that first request to send the cookie I set in DEFAULT_REQUEST_HEADERS; however, it does not.
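
For comparison, a rough sketch of passing the same cookie through the cookies argument instead (spider name and URL are made up); CookiesMiddleware is expected to pick this up and write the Cookie header itself:

import scrapy

class CookieSketchSpider(scrapy.Spider):
    name = 'cookie_sketch'                  # made-up name
    start_urls = ['http://example.com/']    # made-up URL

    def make_requests_from_url(self, url):
        # Cookies passed this way go through CookiesMiddleware,
        # which stores them in its cookiejar and sets the Cookie header.
        return scrapy.http.Request(url, cookies={'xx': 'yy'})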

2. Sharing settings between spiders.

I have multiple spiders in the project that share most of their settings, such as RandomAgentMiddleware, RandomProxyMiddleware, UserAgent, DEFAULT_REQUEST_HEADERS, etc.; however, these are currently configured in settings.py for each spider.

Is it possible to share these settings?


The COOKIES_ENABLED setting is set to True.


BruceDone commented May 16, 2016

How about COOKIES_ENABLED in settings.py? Did you set it to False?

kmike (Member) commented May 16, 2016

So the first issue is that CookiesMiddleware sets the Cookie header even if the cookiejar is empty, or, more broadly, that it discards a Cookie header set on a request instead of adding to it. This happens here. I think this is a valid concern. A pull request to fix that is welcome.
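
Until that is fixed, a possible workaround, sketched with a placeholder URL: the dont_merge_cookies meta key makes CookiesMiddleware skip the request, so a manually set Cookie header is sent as-is (response cookies are not merged for that request either).

import scrapy

request = scrapy.Request(
    'http://example.com/',                # placeholder URL
    headers={'Cookie': 'xx=yy'},          # sent untouched
    meta={'dont_merge_cookies': True},    # CookiesMiddleware skips this request
)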

Sorry, I don't get the second issue. All settings defined in settings.py are shared between spiders; you can't configure per-spider settings in the settings.py file. What do you mean?
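
If you mean the opposite, per-spider values, the custom_settings class attribute is the usual place for overrides; a rough sketch with a made-up spider name:

import scrapy

class OverrideSpider(scrapy.Spider):
    name = 'override_sketch'    # made-up name
    # Overrides settings.py for this spider only; anything not listed
    # here falls back to the shared project-wide settings.
    custom_settings = {
        'COOKIES_ENABLED': True,
    }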

elacuesta (Member) commented Nov 17, 2016

A question about priorities: when creating a Request, if a cookie name is specified both directly as part of headers['Cookie'] and as a value in the cookies argument, which one should be used? I'm tempted to keep the one in cookies, but that's just my opinion.

kmike (Member) commented Nov 21, 2016

@elacuesta Yeah, I agree that using the value set in the cookies argument makes more sense in this case.
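
To illustrate the case in question, a sketch with a placeholder URL and values; with the precedence agreed above, the xx cookie would be expected to go out with the value from the cookies argument:

import scrapy

request = scrapy.Request(
    'http://example.com/',                   # placeholder URL
    headers={'Cookie': 'xx=from_header'},
    cookies={'xx': 'from_cookies_arg'},      # expected to take precedence
)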

Gallaecio (Member) commented Oct 8, 2020

Reopening as per #4823

@Gallaecio Gallaecio reopened this Oct 8, 2020
@Gallaecio Gallaecio linked a pull request Oct 8, 2020 that will close this issue
@elacuesta elacuesta changed the title DEFAULT_REQUEST_HEADERS not work as expected Cookies from the Cookie request header are not processed Oct 30, 2020