Setup default settings when we run a spider #55

Closed
botzill opened this issue Jun 20, 2019 · 15 comments
Labels: question, suggestion


botzill commented Jun 20, 2019

I was wondering how easy it would be to configure the default settings that are shown when we run a spider.

Thx.

botzill added the bug label Jun 20, 2019
Author

botzill commented Jun 20, 2019

Yes, for sure not a bug :), just asking.

Owner

my8100 commented Jun 20, 2019

Do you mean customizing the default value of the textarea below?

[image]

my8100 added the question label and removed the bug label Jun 20, 2019
Author

botzill commented Jun 20, 2019

I mean

[screenshot: Screen Shot 2019-06-20 at 17 14 12]

Here, I'd like to set custom defaults that we can define in settings.py, so that the values shown here are configurable. It's harder to set up in that textarea, but here it would be nice. Plus, it would be nice if a dropdown were possible as well.

Thx.

Owner

my8100 commented Jun 20, 2019

For now, you can manually modify the code below like this.
I would consider making it configurable in the configuration file.
Thanks for your suggestion.

Modified, with example defaults filled in:

    self.kwargs.setdefault('USER_AGENT', 'Chrome')  # Chrome|iPhone|iPad|Android
    self.kwargs.setdefault('ROBOTSTXT_OBEY', 'False')
    self.kwargs.setdefault('COOKIES_ENABLED', 'False')
    self.kwargs.setdefault('CONCURRENT_REQUESTS', '16')
    self.kwargs.setdefault('DOWNLOAD_DELAY', '0')
    _additional = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1"
    self.kwargs.setdefault('additional', _additional)

Current code, with empty defaults:

    self.kwargs.setdefault('USER_AGENT', '')  # Chrome|iPhone|iPad|Android
    self.kwargs.setdefault('ROBOTSTXT_OBEY', '')
    self.kwargs.setdefault('COOKIES_ENABLED', '')
    self.kwargs.setdefault('CONCURRENT_REQUESTS', '')
    self.kwargs.setdefault('DOWNLOAD_DELAY', '')
    _additional = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1"
    self.kwargs.setdefault('additional', _additional)
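
A minimal sketch of why editing these lines only changes the defaults: `dict.setdefault` assigns a value only when the key is missing, so a value that is already present is left alone. The `kwargs` dict below is a hypothetical stand-in for `self.kwargs`, assuming it behaves like a plain dict; it is not taken from the scrapydweb source.

```python
# Hypothetical stand-in for self.kwargs, assumed to be a plain dict
# that already holds one value.
kwargs = {'USER_AGENT': 'iPhone'}

# Key already present: the existing value is kept.
kwargs.setdefault('USER_AGENT', 'Chrome')
# Keys missing: the defaults are applied.
kwargs.setdefault('ROBOTSTXT_OBEY', 'False')
kwargs.setdefault('DOWNLOAD_DELAY', '0')

print(kwargs)
```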

my8100 added the suggestion label Jun 20, 2019
Author

botzill commented Jun 20, 2019

I see, thx @my8100! Would be really useful to have this configurable.

my8100 closed this as completed in 30b39b7 Jun 22, 2019
Owner

my8100 commented Jun 22, 2019

@botzill

  1. pip install -U git+https://github.com/my8100/scrapydweb.git
  2. Update the existing config file with the options below:
    ############################## Run Spider #####################################
    # The default is False, set it to True to automatically
    # expand the 'settings & arguments' section in the Run Spider page.
    SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = False
    # The default is 'Mozilla/5.0', set it to a non-empty string to customize the default value of `custom`
    # in the drop-down list of `USER_AGENT`.
    SCHEDULE_CUSTOM_USER_AGENT = 'Mozilla/5.0'
    # The default is None, set it to any value of ['custom', 'Chrome', 'iPhone', 'iPad', 'Android']
    # to customize the default value of `USER_AGENT`.
    SCHEDULE_USER_AGENT = None
    # The default is None, set it to True or False to customize the default value of `ROBOTSTXT_OBEY`.
    SCHEDULE_ROBOTSTXT_OBEY = None
    # The default is None, set it to True or False to customize the default value of `COOKIES_ENABLED`.
    SCHEDULE_COOKIES_ENABLED = None
    # The default is None, set it to a non-negative integer to customize the default value of `CONCURRENT_REQUESTS`.
    SCHEDULE_CONCURRENT_REQUESTS = None
    # The default is None, set it to a non-negative number to customize the default value of `DOWNLOAD_DELAY`.
    SCHEDULE_DOWNLOAD_DELAY = None
    # The default is "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1",
    # set it to '' or any non-empty string to customize the default value of `additional`.
    # Use '\r\n' as the line separator.
    SCHEDULE_ADDITIONAL = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1"
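
As an illustration of the options above, a config file that pre-fills the Run Spider form might look like the sketch below. The option names come from the list above; every value is only an example chosen for illustration, not a recommendation.

```python
# Example values only; the option names are the ones listed above.
SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = True   # expand the 'settings & arguments' section by default
SCHEDULE_USER_AGENT = 'Chrome'              # one of 'custom', 'Chrome', 'iPhone', 'iPad', 'Android'
SCHEDULE_ROBOTSTXT_OBEY = False             # True or False
SCHEDULE_COOKIES_ENABLED = False            # True or False
SCHEDULE_CONCURRENT_REQUESTS = 8            # non-negative integer
SCHEDULE_DOWNLOAD_DELAY = 0.5               # non-negative number
```

Any option left at None keeps the behavior described in its comment above.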

Author

botzill commented Jun 22, 2019

Thx @my8100, very useful!

Author

botzill commented Jun 22, 2019

So, if I want to define a custom one, can I add SCHEDULE_MY_CUSTOM_ONE? Or should I add it to SCHEDULE_ADDITIONAL?

Owner

my8100 commented Jun 22, 2019

Customize the SCHEDULE_ADDITIONAL option.
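
A rough sketch of what that could look like in the config file: each custom entry goes on its own line in the same `-d setting=NAME=VALUE` or `-d arg=val` form as the default value, joined with '\r\n'. The specific entries here (`DEPTH_LIMIT`, and a spider argument named `category`) are hypothetical examples, not values from this thread.

```python
# Hypothetical custom entries, written in the same form as the default value.
custom_lines = [
    "-d setting=CLOSESPIDER_TIMEOUT=120",  # override a Scrapy setting
    "-d setting=DEPTH_LIMIT=3",            # another Scrapy setting
    "-d category=books",                   # hypothetical spider argument
]

# The config comment asks for '\r\n' as the line separator.
SCHEDULE_ADDITIONAL = "\r\n".join(custom_lines)

print(SCHEDULE_ADDITIONAL)
```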

Author

botzill commented Jun 22, 2019

OK, thx.

Author

botzill commented Jun 23, 2019

Do you think it is possible to add something like SCHEDULE_MY_CUSTOM_ONE, in the same way we have SCHEDULE_CONCURRENT_REQUESTS and the others?

Owner

my8100 commented Jun 23, 2019

Which setting options do you use frequently?

Author

botzill commented Jun 23, 2019 via email

Owner

my8100 commented Jun 23, 2019

I think the SCHEDULE_ADDITIONAL option is good enough for most cases.

Author

botzill commented Jun 23, 2019 via email

my8100 added this to the 1.3.0 milestone Aug 4, 2019