Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Make data_path work when outside project (used by HttpCacheMiddleware and Deltafetch plugin) #1581

Merged
merged 4 commits into from Sep 30, 2016
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
14 changes: 11 additions & 3 deletions scrapy/utils/project.py
Expand Up @@ -12,6 +12,7 @@
ENVVAR = 'SCRAPY_SETTINGS_MODULE'
DATADIR_CFG_SECTION = 'datadir'


def inside_project():
scrapy_module = os.environ.get('SCRAPY_SETTINGS_MODULE')
if scrapy_module is not None:
Expand All @@ -23,6 +24,7 @@ def inside_project():
return True
return bool(closest_scrapy_cfg())


def project_data_dir(project='default'):
"""Return the current project data dir, creating it if it doesn't exist"""
if not inside_project():
Expand All @@ -39,16 +41,22 @@ def project_data_dir(project='default'):
os.makedirs(d)
return d


def data_path(path, createdir=False):
"""If path is relative, return the given path inside the project data dir,
otherwise return the path unmodified
"""
Return the given path joined with the .scrapy data directory.
If given an absolute path, return it unmodified.
"""
if not isabs(path):
path = join(project_data_dir(), path)
if inside_project():
path = join(project_data_dir(), path)
else:
path = join('.scrapy', path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, before it raised NotConfigured and now it creates a directory in CWD/.scrapy/<path>.

And the goal is to allow httpcache and deltafetch (and similars) to run even outside a scrapy project.

It is a bit backwards incompatible, scripts running in readonly filesystem now will fail because directory can't be created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, it's indeed a bit backwards incompatible, there is a PR to mention it in the release notes: #2298

About the case of scripts running in readonly filesystem, is that a huge concern?
It looks minor to me, because for the breaking change to happen the script would need to be 1) running scrapy outside a project and 2) using the settings to enabling http cache or some other plugin that would try to use the data dir (in which case, the expected behavior would still be for the settings to be used and any existing data previously stored in .scrapy to be used).

So, still a bug fix, don't you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Looks good then.

if createdir and not exists(path):
os.makedirs(path)
return path


def get_project_settings():
if ENVVAR not in os.environ:
project = os.environ.get('SCRAPY_PROJECT', 'default')
Expand Down
34 changes: 34 additions & 0 deletions tests/test_utils_project.py
@@ -0,0 +1,34 @@
import unittest
import os
import tempfile
import shutil
import contextlib
from scrapy.utils.project import data_path


@contextlib.contextmanager
def inside_a_project():
prev_dir = os.getcwd()
project_dir = tempfile.mkdtemp()

try:
os.chdir(project_dir)
with open('scrapy.cfg', 'w') as f:
# create an empty scrapy.cfg
f.close()

yield project_dir
finally:
os.chdir(prev_dir)
shutil.rmtree(project_dir)


class ProjectUtilsTest(unittest.TestCase):
def test_data_path_outside_project(self):
self.assertEquals('.scrapy/somepath', data_path('somepath'))

def test_data_path_inside_project(self):
with inside_a_project() as proj_path:
expected = os.path.join(proj_path, '.scrapy', 'somepath')
self.assertEquals(expected, data_path('somepath'))
self.assertEquals('/absolute/path', data_path('/absolute/path'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the test for abs path inside project only?

If it makes a difference then it should be tested out and inside.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, lemme add it to the test outside project as well!