Improve cookie handling #5463
Open
farsene wants to merge 22 commits into scrapy:master from OrestisKan:improve-cookie-handling
Commits (22)
b104c98  Create AccessCookiesMiddleware that passes the cookie jar to the spid… (OrestisKan)
87deaac  Remove singular get cookie method (OrestisKan)
0f22239  Add persistence of cookies when spider is closed (farsene)
417cfe3  Add tests for storage (farsene)
606cd3f  Merge pull request #3 from OrestisKan/cookies_storage (OrestisKan)
2e12447  updating branch
764cc71  fixed an issue w regards to the return of get cookies at spider, seei… (farsene)
ab063fd  temp
b60cb1e  to cookies (farsene)
6429e8c  changed get_cookies from from_dict to iter
9067f2f  deleted extra methods
16e4c86  change innit method, testing for accesscookiemiddleware working for a…
334a315  final cookie test working
3ee3327  remove prints
1f5db64  Revert "to cookies" (farsene)
e2f90e8  Revert changes to conftest (farsene)
85b3ea7  Docs spiders.rst (TBG1998)
83d07cd  Docs downloader-middleware.rst update (TBG1998)
5b3fa25  Add documentation storage.rst (TBG1998)
2c9c58a  update Index.rst to include Storage (TBG1998)
4608083  Fix storage.rst (TBG1998)
3f279e8  Fix 2 storage.rst (TBG1998)
@@ -0,0 +1,49 @@
.. _topics-storage:

=======
Storage
=======

The storage functionality stores information, locally or globally, in response to a spider being opened or closed. It was originally introduced to handle cookie storage across spiders, but it can be extended for other purposes.

.. _topics-base-storage:

BaseStorage
===========

BaseStorage is the interface that every storage class implements; it defines how a storage should behave. Its main methods are:

.. method:: open_spider(spider)

    Called when a spider is opened.

    :param spider: the spider that is being opened
    :type spider: :class:`~scrapy.Spider` object

.. method:: close_spider(spider)

    Called when a spider is closed.

    :param spider: the spider that is being closed
    :type spider: :class:`~scrapy.Spider` object

.. _topics-in-memory-storage:

InMemoryStorage
===============

InMemoryStorage keeps cookies in memory and can optionally persist them to a local file. If the COOKIES_PERSISTENCE setting is ``True``, the cookies are saved to the file named by COOKIES_PERSISTENCE_DIR when a spider closes and loaded from it when a spider opens.

.. method:: open_spider(spider)

    Called when a spider is opened. If a spider from a previous crawling session saved cookies to the file, they are loaded from it.

    :param spider: the spider that is being opened
    :type spider: :class:`~scrapy.Spider` object

.. method:: close_spider(spider)

    Called when a spider is closed. The cookies are saved to the file so that another spider can reuse them later.

    :param spider: the spider that is being closed
    :type spider: :class:`~scrapy.Spider` object
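The open/close lifecycle described in the Storage documentation above can be sketched without Scrapy installed. This is only an illustration under stated assumptions: `FileBackedStorage` and `run_two_sessions` are hypothetical names, the settings object is a plain dict, and the spider argument is passed as `None`. Only the setting names `COOKIES_PERSISTENCE` and `COOKIES_PERSISTENCE_DIR` come from the pull request.

```python
import os
import pickle
import tempfile


class FileBackedStorage(dict):
    """Hypothetical stand-in for a storage class; not part of Scrapy."""

    def __init__(self, settings):
        super().__init__()
        self.settings = settings
        # Despite the "_DIR" suffix, the PR opens this path as a file.
        self.path = settings["COOKIES_PERSISTENCE_DIR"]

    def open_spider(self, spider):
        # Load previously saved data if persistence is on and a file exists.
        if not self.settings["COOKIES_PERSISTENCE"]:
            return
        if not os.path.exists(self.path):
            return
        with open(self.path, "rb") as f:
            self.update(pickle.load(f))

    def close_spider(self, spider):
        # Save the current data so that a later session can pick it up.
        if self.settings["COOKIES_PERSISTENCE"]:
            with open(self.path, "wb") as f:
                pickle.dump(dict(self), f)


def run_two_sessions():
    """Simulate two crawling sessions that share one persistence file."""
    with tempfile.TemporaryDirectory() as tmp:
        settings = {
            "COOKIES_PERSISTENCE": True,
            "COOKIES_PERSISTENCE_DIR": os.path.join(tmp, "cookies"),
        }
        first = FileBackedStorage(settings)
        first.open_spider(spider=None)   # nothing to load yet
        first["example.com"] = {"session": "abc"}
        first.close_spider(spider=None)  # writes the file

        second = FileBackedStorage(settings)
        second.open_spider(spider=None)  # reads what the first session wrote
        return dict(second)
```

Calling `run_two_sessions()` returns the data written by the first session, which is the behaviour the documentation describes for cookies reused by a later spider. Note that `pickle.load` should only be used on files the project itself wrote, since unpickling untrusted data can execute arbitrary code.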
@@ -0,0 +1,36 @@
from collections.abc import MutableMapping

from scrapy.spiders import Spider


class BaseStorage(MutableMapping):
    name = None

    def __init__(self, settings):
        self.settings = settings

    @classmethod
    def from_middleware(cls, middleware):
        obj = cls(middleware.settings)  # reuse the middleware's settings
        return obj

    def open_spider(self, spider: Spider):
        pass  # hook: subclasses may load stored data here

    def close_spider(self, spider: Spider):
        pass  # hook: subclasses may save stored data here

    def __delitem__(self, v):
        pass

    def __getitem__(self, k):
        pass

    def __iter__(self):
        pass

    def __len__(self):
        pass

    def __setitem__(self, k, v):
        pass
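One consequence of stubbing the mapping methods with `pass` is worth spelling out: the class becomes instantiable, but lookups return `None` instead of raising `KeyError`, so membership tests succeed for every key. The sketch below reproduces this with a stand-in class and shows one possible dict-backed subclass. `StubStorage` and `DictBackedStorage` are hypothetical names used for illustration only; neither is part of the pull request.

```python
from collections.abc import MutableMapping


class StubStorage(MutableMapping):
    """Stand-in mirroring the stubbed mapping methods shown above."""

    def __delitem__(self, k):
        pass

    def __getitem__(self, k):
        pass

    def __iter__(self):
        pass

    def __len__(self):
        pass

    def __setitem__(self, k, v):
        pass


class DictBackedStorage(StubStorage):
    """Hypothetical subclass that backs the mapping with a plain dict."""

    def __init__(self):
        self._data = {}

    def __delitem__(self, k):
        del self._data[k]

    def __getitem__(self, k):
        return self._data[k]  # raises KeyError for unknown keys

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def __setitem__(self, k, v):
        self._data[k] = v


stub = StubStorage()
stub_lookup = stub["missing"]        # no KeyError: the stub returns None
stub_contains = "missing" in stub    # True, because the lookup never raises

store = DictBackedStorage()
store["example.com"] = "jar"
store_contains = "missing" in store  # False: KeyError is raised and caught
```

A subclass that overrides all five methods, as `DictBackedStorage` does here, behaves like a normal mapping. Raising `NotImplementedError` in the base class would be another way to surface missing overrides early, although that is a design choice the pull request does not make.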
@@ -0,0 +1,37 @@
import io
import logging
import os
import pickle
from collections import UserDict
from typing import Dict

from scrapy.http.cookies import CookieJar
from scrapy.spiders import Spider
from scrapy.storage import BaseStorage
from scrapy.utils.project import data_path

logger = logging.getLogger(__name__)


class InMemoryStorage(UserDict, BaseStorage):
    def __init__(self, settings):
        super().__init__()  # UserDict.__init__ creates self.data
        self.settings = settings
        self.cookies_dir = data_path(settings["COOKIES_PERSISTENCE_DIR"])  # opened as a file below

    def open_spider(self, spider):
        if not self.settings["COOKIES_PERSISTENCE"]:
            return
        if not os.path.exists(self.cookies_dir):
            return
        with io.open(self.cookies_dir, "rb") as f:
            self.data: Dict = pickle.load(f)  # only unpickle files this project wrote

    def close_spider(self, spider):
        if self.settings["COOKIES_PERSISTENCE"]:
            with io.open(self.cookies_dir, "wb") as f:
                pickle.dump(self.data, f)

    def __missing__(self, key) -> CookieJar:
        self.data.update({key: CookieJar()})  # create a jar on first access
        return self.data[key]
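The `__missing__` hook above is what lets callers index the storage by key without creating a jar first. The following sketch isolates that behaviour. It is a reduction for illustration, not the PR's class: `JarStore` is a hypothetical name, and the standard library's `http.cookiejar.CookieJar` stands in for `scrapy.http.cookies.CookieJar` so that the example runs without Scrapy.

```python
from collections import UserDict
from http.cookiejar import CookieJar  # stdlib stand-in for Scrapy's CookieJar


class JarStore(UserDict):
    """Hypothetical reduction of the class above to its __missing__ behaviour."""

    def __missing__(self, key):
        # UserDict.__getitem__ calls this hook when the key is absent.
        jar = CookieJar()
        self.data[key] = jar
        return jar


store = JarStore()
first = store["session-a"]   # creates and stores a new jar
second = store["session-a"]  # returns the jar created above
other = store["session-b"]   # a different key gets its own jar
probe = "session-c" in store  # membership tests do not create jars
```

Because the jar is stored before it is returned, repeated lookups for the same key yield the same object, which is what allows cookies to accumulate per key over a crawl.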
This is a big API change, which should probably be discussed in #1878 before working on an implementation.