
add instructions about how to define output file when running scrapy … #3882

Merged
merged 3 commits into scrapy:master on Jul 18, 2019

Conversation

@lena-kuhn (Contributor) commented Jul 16, 2019

…from script instead of cmd

The following Stack Overflow questions show that it is unclear how to define the feed format and output file name from within a script, i.e. the equivalent of the "-o" flag when starting the crawler process from the command line.
https://stackoverflow.com/questions/43468275/running-scrapy-from-a-script-with-file-output/43468852
https://stackoverflow.com/questions/23574636/scrapy-from-script-output-in-json
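For context, the behaviour this PR documents is the script-side equivalent of `scrapy crawl myspider -o items.json`: pass the feed settings to `CrawlerProcess`. A minimal sketch (the Scrapy calls are shown commented out, since this page only quotes the docs diff, and `MySpider` is a hypothetical spider class):

```python
# The FEED_FORMAT and FEED_URI settings replace the command-line
# "-o items.json" flag when a crawl is started from a script.
feed_settings = {
    'FEED_FORMAT': 'json',     # serialization format of the exported items
    'FEED_URI': 'items.json',  # where the exported items are written
}

# The actual crawl would look like this (requires Scrapy installed):
# from scrapy.crawler import CrawlerProcess
# process = CrawlerProcess(settings=feed_settings)
# process.crawl(MySpider)
# process.start()  # blocks until the crawl finishes
```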

@@ -35,12 +35,17 @@ Here's an example showing how to run a single spider with it.
...

 process = CrawlerProcess({
-    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
+    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)',
@Gallaecio (Member) Jul 16, 2019

What about removing USER_AGENT as part of these changes? I believe the point of USER_AGENT was to make it obvious that the first parameter of CrawlerProcess is a dictionary of settings, which the new settings already accomplish.

@lena-kuhn (Contributor, author) Jul 17, 2019

Maybe we can add CrawlerProcess(settings={}) so that it's totally clear.
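To illustrate the point of this exchange — why passing the dict as `settings=` reads more clearly than a bare positional argument — here is a toy stand-in (the class below is hypothetical, not Scrapy's actual CrawlerProcess):

```python
# Toy stand-in for a CrawlerProcess-like constructor, used only to show
# why the keyword form makes the call site self-documenting.
class ToyCrawlerProcess:
    def __init__(self, settings=None):
        # Copy the dict so later mutation by the caller has no effect.
        self.settings = dict(settings or {})

# Positional: a reader cannot tell what the dict configures.
p1 = ToyCrawlerProcess({'USER_AGENT': 'Mozilla/4.0'})

# Keyword, as suggested above: the intent is explicit.
p2 = ToyCrawlerProcess(settings={'USER_AGENT': 'Mozilla/4.0'})
```

Both calls are equivalent; only the readability differs.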

docs/topics/practices.rst (outdated review thread, resolved)
@lena-kuhn (Contributor, author) left a comment

I totally agree with you: saying that you can define any setting in a dictionary passed as a parameter to CrawlerProcess should be enough; there shouldn't be extra emphasis on any specific setting. Shall I make a new pull request, or can you mark this one as "needs work"?


@Gallaecio (Member) commented Jul 17, 2019

> Shall I make a new pull request or can you mark it as "needs work"?

Please use the same pull request.

You can edit the title and prefix it with [WIP] to indicate that it is currently work in progress, so that people don't bother reviewing it until you remove the prefix from the title again.

@lena-kuhn lena-kuhn changed the title add instructions about how to define output file when running scrapy … [WIP] add instructions about how to define output file when running scrapy … Jul 17, 2019
@codecov (bot) commented Jul 17, 2019

Codecov Report

Merging #3882 into master will not change coverage.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #3882   +/-   ##
=======================================
  Coverage   85.56%   85.56%           
=======================================
  Files         164      164           
  Lines        9551     9551           
  Branches     1431     1431           
=======================================
  Hits         8172     8172           
  Misses       1132     1132           
  Partials      247      247

@codecov (bot) commented Jul 17, 2019

Codecov Report

Merging #3882 into master will increase coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3882      +/-   ##
==========================================
+ Coverage   85.56%   85.56%   +<.01%     
==========================================
  Files         164      164              
  Lines        9551     9565      +14     
  Branches     1431     1435       +4     
==========================================
+ Hits         8172     8184      +12     
- Misses       1132     1133       +1     
- Partials      247      248       +1
Impacted Files Coverage Δ
scrapy/utils/datatypes.py 60.2% <0%> (-0.01%) ⬇️
scrapy/crawler.py 89.94% <0%> (ø) ⬆️
scrapy/loader/processors.py 100% <0%> (ø) ⬆️
scrapy/_monkeypatches.py 57.14% <0%> (ø) ⬆️
scrapy/utils/conf.py 89.39% <0%> (ø) ⬆️
scrapy/item.py 98.55% <0%> (+0.06%) ⬆️
scrapy/settings/__init__.py 93.1% <0%> (+0.12%) ⬆️

-    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
+process = CrawlerProcess(settings={
+    'FEED_FORMAT':'json',
+    'FEED_URI':'items.json'
@kmike (Member) Jul 17, 2019

A nitpick: could you please add whitespace after the ':', as per PEP 8?

@lena-kuhn (Contributor, author) Jul 17, 2019

sure
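The whitespace nitpick is purely cosmetic — both spellings produce the same dict at runtime; PEP 8 simply prefers a space after the colon for readability:

```python
# PEP 8 style: one space after ':' in dict literals.
pep8_style = {'FEED_FORMAT': 'json', 'FEED_URI': 'items.json'}

# The form flagged in review (no space after ':') — legal, just less readable.
flagged = {'FEED_FORMAT':'json', 'FEED_URI':'items.json'}
```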

@lena-kuhn lena-kuhn changed the title [WIP] add instructions about how to define output file when running scrapy … add instructions about how to define output file when running scrapy … Jul 17, 2019
@kmike (Member) commented Jul 18, 2019

Thanks @MagdalenaDeschner, and thanks @Gallaecio for a review!

@kmike kmike merged commit 44eb21a into scrapy:master Jul 18, 2019
3 checks passed
@kmike kmike added this to the v1.7 milestone Jul 18, 2019
3 participants