
settings per deploy target #388

Closed
wants to merge 4 commits into from

Conversation

@plainas commented Sep 16, 2013

This adds the possibility to define settings per deploy target in scrapy.cfg.

If you run several Scrapyd instances, you can define a different settings module for each of them in your Scrapy project. This is useful for testing things in a development/testing environment before sending them to a production environment.

My scrapy.cfg looks like this:

[deploy:live]
url = http://live.mysite.com:6800/
project = myproj
settings = myproj.settings_live

[deploy:tests]
url = http://tests.mysite.com:6800/
project = myproj
settings = myproj.settings_tests
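
For context, a per-target settings module like the ones referenced above can simply extend the project's base settings and override only what differs per environment. A minimal sketch (the specific overrides are hypothetical example values):

# myproj/settings_live.py - hypothetical per-target settings module
from myproj.settings import *  # start from the base project settings

# override only what differs in the live environment (example values)
HTTPCACHE_ENABLED = False
LOG_LEVEL = 'INFO'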

Cherry-pick the change set; only the changes in deploy.py are relevant.

EDIT: Do note that the deploy command was moved to scrapyd. I found out about this after I created this pull request.

if not os.path.exists('setup.py'):

if os.path.exists('setup.py'):
    open('setup.py', 'w').close()

Member

This will clear the existing setup.py file. I think it is wrong to clear it, because it is possible that setup.py is not auto-generated.

Author

Generating a different setup.py with different configs is the whole purpose of this pull request. The setup.py IS generated by the deploy command anyway, based on a hardcoded template.

However, I notice now that the deploy command has been moved to scrapyd. Any chance this could be integrated into scrapyd?

Member

Yes, setup.py is generated, but it is not overwritten if already present, because the user can (and in many cases must) customize it.

Author

I am not aware of what most people customize in their setup.py. What do people use it for in this case?

Couldn't they just edit the template in that case? (Honest question.)
It is just a single Python script to edit anyway.

Also related: this pull request adds the possibility of a distinct settings module per deploy target, which I would guess covers much of the setup.py tuning people do, but again, I am not sure what people use it for.

Either way, any suggestion on how this could be added more smoothly, i.e. without this impact?

Member

setup.py is not tied to scrapy/scrapyd; it is just the standard means of creating setuptools distributions / eggs. Users can customize it to include data files or change the metadata.

Please correct me if I'm wrong, but the template is hardcoded in the scrapy/scrapyd code. It is not a good idea to change the installed Scrapy version, for multiple reasons: the template can be shared with other Scrapy projects, Scrapy updates will overwrite the changes, these changes won't be in VCS, etc.

The multiple deploy targets feature looks reasonable, but I haven't dug in to check how it interacts with other scrapy/scrapyd parts - sorry, I don't have suggestions right now.

Author

It is hardcoded into the source, yes. AFAIK the reason it was put there is that it provides an easy means of packaging so the project can be deployed. I don't think the need to build eggs arises from anything else. Were it not for the deploy command, would the template be there at all?
I have yet to see a single Scrapy project distributed as a package or egg. The documentation only mentions packaging as a necessary step for deployment.

And it also says this:

The simplest way to deploy your project is by using the deploy command, which automates the process of building the egg and uploading it using the Scrapyd HTTP JSON API.

It is not a good idea to change the installed Scrapy version, for multiple reasons: the template can be shared with other Scrapy projects, Scrapy updates will overwrite the changes, these changes won't be in VCS, etc.

OK, here I can tell you for a fact that you got confused. The template is not changed, as you can see in the pull request. What is changed is the settings definition passed to it. There is no possible way one project could impact another, regardless of whether these changes are used or not.
Also, the setup.py we are talking about is the project's one, not Scrapy's setup.py - and I think nobody really cares about it or ever uses it manually. Especially if it is under VCS, this won't be a problem.

But we are discussing a deprecated implementation; ideally this should be integrated with the new scrapyd-deploy, where this problem possibly doesn't even exist (I haven't checked).

Member

Some use cases for custom setup.py:

  • Add data files to the project. For example, these could be supporting data files that spiders require, pickled scikit-learn models for data extraction or classification, or even a read-only SQLite database. They go under the "package_data" setup.py key.
  • Custom scripts that should be available on the scrapyd server; they go under the "scripts" setup key.
  • Excluding some files from the distribution; they can be excluded e.g. using the 'exclude' argument to find_packages.

This is not hard evidence, but all Scrapy projects I touched last month had setup.py customized for one or more of the reasons above, and there could be reasons for using a custom setup.py other than those three. I see that this feature (custom setup.py files) is not widely used everywhere, but at least in some environments developers rely on it, so we can't just remove it without providing a (better) alternative.
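
As an illustration only, a customized setup.py along those lines might look like the sketch below (the package name, file paths and script name are hypothetical; the scrapy entry point is the part the deploy machinery relies on):

from setuptools import setup, find_packages

setup(
    name='myproj',
    version='1.0',
    # keep throwaway/experimental code out of the distribution
    packages=find_packages(exclude=['scratch*', 'tests*']),
    # data files shipped inside the package (models, a read-only SQLite db, ...)
    package_data={'myproj': ['data/*.sqlite', 'models/*.pkl']},
    # extra scripts that should be available on the scrapyd server
    scripts=['bin/refresh_models.py'],
    # the entry point scrapyd uses to locate the project settings
    entry_points={'scrapy': ['settings = myproj.settings']},
)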

Re template changes: you're right that the templates are not changed in this PR, but I was answering your question "Couldn't they just edit the template in that case?" - the templates live in Scrapy, so changing them requires changing the installed Scrapy.

@nramirezuy
Contributor

The question here is: why do you need different settings in the same Scrapy project?

@plainas
Author

plainas commented Sep 16, 2013

nramirezuy, it is clearly stated in the pull request:

This is useful for testing things in a development/testing environment before sending them to a production environment.

The whole point of deploy is precisely to send the project to a target environment. If you can't cherry-pick the settings, then the whole concept of scrapyd is not very useful; you could just use Scrapy's crawl command.
Scrapy's documentation mentions a trick to use different settings when testing locally and when deploying to scrapyd, but that is very limiting: you have no access to everything scrapyd offers, and you are limited to two different sets of settings. What if you need to test your scraping code against many different settings?
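
For reference, one common form of such a trick is to branch on an environment variable inside the project's settings.py; a minimal sketch under that assumption (the SCRAPY_ENV variable and the overridden values are purely illustrative):

# settings.py - two-environment switch (hypothetical names and values)
import os

LOG_LEVEL = 'DEBUG'
HTTPCACHE_ENABLED = True

if os.environ.get('SCRAPY_ENV') == 'production':
    # when deployed: quieter logging, no local HTTP cache
    LOG_LEVEL = 'INFO'
    HTTPCACHE_ENABLED = False

As noted, this only distinguishes the environments hardcoded into the conditional.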

I understand that most people create ad hoc scrapers to crawl a handful of domains and have no need for such a feature. I crawl 15,000+ sites continuously; I need to really put the pedal to the metal, and being able to deploy to multiple environments with fine-grained control of the settings is a crucial need for me. It could be a nice feature for many people, I would guess.

@pablohoffman
Member

A couple of comments:

  1. I like the proposed functionality change; it's something that we discussed internally some time ago and I'm happy to see it proposed by someone from the community.
  2. "scrapy deploy" is indeed deprecated and the changes should be made to scrapyd-deploy (please submit a pull request on the scrapyd project). Also, scrapyd-deploy still lacks this feature.
  3. @kmike is right that the Scrapy project's setup.py must not be overwritten. Even though packaging here is a mechanism for deployment, projects must be able to customize how packaging is done (by writing their own setup.py), for example when they need to deploy resources/static files.

My best suggestion for implementing this functionality is to modify the generated egg, changing the scrapy settings entry point to the desired value. The egg is a zip file and the entry point is in the file EGG-INFO/entry_points.txt, which contains something like this:

[scrapy]
settings = testspiders.settings

So we need to open the egg archive, modify that file and save it back. This can all be done with the Python zipfile module, I'd expect. I know it's ugly, but so are eggs, and Python packaging in general (according to common consensus).
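
A rough sketch of that egg surgery with the standard-library zipfile module (the helper name and egg paths are hypothetical; since zipfile cannot edit an archive in place, the egg is rewritten into a new file):

import zipfile

def set_settings_entry_point(egg_path, new_egg_path, settings_module):
    # Copy every member of the egg into a new archive, swapping in a
    # rewritten EGG-INFO/entry_points.txt that points at settings_module.
    with zipfile.ZipFile(egg_path) as src:
        with zipfile.ZipFile(new_egg_path, 'w', zipfile.ZIP_DEFLATED) as dst:
            for info in src.infolist():
                data = src.read(info.filename)
                if info.filename == 'EGG-INFO/entry_points.txt':
                    data = ('[scrapy]\nsettings = %s\n' % settings_module).encode('utf-8')
                dst.writestr(info, data)

# e.g. retarget the egg at the "live" settings before uploading it:
# set_settings_entry_point('myproj.egg', 'myproj-live.egg', 'myproj.settings_live')

A real implementation would also want to preserve any other sections that entry_points.txt may contain.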

@plainas
Author

plainas commented Sep 17, 2013

Maybe suffix the filename with the environment label, so we would have setup.py, setup_live.py, setup_tests.py. That should work.

@pablohoffman
Member

@plainas your suggestion sounds good, but it doesn't support projects that have their own setup.py. How about my suggestion of modifying EGG-INFO/entry_points.txt after egg generation?

@plainas
Author

plainas commented Sep 18, 2013

That sounds a little tricky and potentially error-prone, to be honest.
If the egg surgery goes wrong, we risk deploying the wrong settings to the target. This is probably me being over-cautious.
On the other hand, if the existing [modified] setup.py is to be honoured in all deploys (regardless of the target), your suggestion might be the only possible one.

The strategy used in this diff works for me and it is what I am using. I like it because it is simple (notice the tiny amount of code that I needed). I maintain my own forks of scrapy and scrapyd which I unfortunately cannot release due to my company's privacy policies. I was hoping I could push at least some of the things I've been working on, but as I notice now, my branch and scrapyd's master have diverged considerably.

@nramirezuy
Contributor

@plainas what about something like the approach I used here #293 ?

@Digenis
Member

Digenis commented Jan 5, 2017

@plainas, this part of Scrapy has been moved to scrapyd-client.
If you are still interested, you can open a new PR there.

I think we can close this.

@redapple
Contributor

redapple commented Jan 5, 2017

Right @Digenis, this is pretty old. And it belongs to scrapyd-client indeed.

@redapple redapple closed this Jan 5, 2017