allow users to pass spider arguments via url #29
Comments
Having similar problems.
I managed to achieve this in a very easy way: scrapyrt lets you configure a custom CrawlResource, and then you just configure scrapyrt to use it. Could this or something similar be added to the core class? I can do a PR.
@gdelfresno: hey, I'm trying a custom CrawlResource but my project doesn't pick it up :(. Can you show me an example project using a custom CrawlResource? Edit: Yeah, I did manage to pass arguments to the spider following your guide, but I had to change the library source under /usr/local :(. I tried adding the CrawlResource in spider.py and settings.py but it's not working.
@pawelmhm: I had the same problem a year ago. After a year, do you have a solution? I'm new to Python and Scrapy, can you help me? Sorry for my bad English :(
Hey @dotungvp1994, yeah, we'll prioritize this and add it to the next release, but I can't give you an exact ETA yet. We'll need some time to implement it for sure, not sure how much.
Any news on this? We are having the same problem. We will try @gdelfresno's solution, maybe by building a modified Docker image with that change :)
@pianista215: a tip for your problem: you can pass arguments in the request's meta data and read them from the response.
Edit: Sorry @dotungvp1994, I'm trying to pass it to the API: curl -XPOST -d '{ "spider_name":"XXX", "start_requests":true, "request":{ "meta": {"lookup_until_date": "23-09-2017" } } }' "http://localhost:9081/crawl.json" >> response. But unfortunately, it is not in my response.request.meta dictionary in the parse method. Am I missing something?
Hi @dotungvp1994, if you are still interested, it's really easy with @gdelfresno's trick. First modify the file mentioned earlier, or, if you prefer, use the Docker image I already made with the changes: pianista215/scrapyrt:0.10-parameter-patched. Then modify your spider to read the parameters from kwargs in __init__. Now your lookup_until_date is populated when you invoke scrapyrt.
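The spider-side change described above can be sketched like this (a minimal sketch only; the spider name and the `lookup_until_date` argument are taken from the examples in this thread, and the `scrapy.Spider` base class is omitted so the sketch stays self-contained):

```python
# Sketch of a spider that reads a custom "lookup_until_date" argument
# from kwargs in __init__. In a real project the class would subclass
# scrapy.Spider; the argument handling is the same.
class XxxSpider:
    name = "XXX"

    def __init__(self, *args, **kwargs):
        # scrapyrt (with the patch discussed above) forwards extra API
        # parameters as spider keyword arguments
        self.lookup_until_date = kwargs.pop("lookup_until_date", None)
```

With this in place, `XxxSpider(lookup_until_date="23-09-2017")` has the date available as `self.lookup_until_date` inside its callbacks.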
@pianista215: Yeah, I know. I'll try @gdelfresno's trick and it works, but I thought passing arguments via meta data would be the same.
@pawelmhm Implemented here: gdelfresno@ee3be05. Do you want me to open a pull request?
Sure, open a PR and let's discuss this, @gdelfresno.
I have tested the pull request and it is working well; I think it should be merged.
Any news?
In my case, I just used the "meta" field as suggested earlier in this thread.
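The meta workaround can be sketched as follows (a sketch only; the spider name, URL, and date key are placeholders reused from earlier in the thread). Values placed under the request's meta are handed to the Scrapy Request that scrapyrt builds, so the spider callback can read them from response.meta; the earlier report in this thread suggests this does not propagate when start_requests is true, so the sketch uses the single-request mode:

```python
import json

# POST body for the meta-based workaround: scrapyrt builds a Request
# from the "request" object, so meta travels with that request and
# should be readable as response.meta in the spider callback.
payload = {
    "spider_name": "XXX",
    "request": {
        "url": "http://example.com/page",
        "meta": {"lookup_until_date": "23-09-2017"},
    },
}
body = json.dumps(payload)
```

The resulting `body` is what you would send with `curl -XPOST -d` to the crawl.json endpoint.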
Working well for me.
@janceChun @shadiakiki1986 @dotungvp1994 I couldn't access the meta params from the spider. How did you do it exactly? I want to use those meta params in the spider's request form. Thanks!
@shadiakiki1986 Thanks. I finally patched the file with @gdelfresno's code; my problem was that I didn't know where resources.py was. In my case it was stored in /usr/local/lib/python3.5/dist-packages/scrapyrt/resources.py. After that I could access the variable with self.param.
Any update on merging #72?
@TapanInexture this is going to be part of the next release, probably released this month.
There are two complications here. One is that arguments can override spider methods, so someone could crash your spider by passing a bad argument; see this Scrapy issue scrapy/scrapy#1633. For example, passing an argument named "start_requests" will break the spider, so we should validate arguments.

The other is that it seems better to isolate spider arguments and pass them as JSON. For example, you could pass: http://localhost/crawl.json?url=http://aa.com&spider_arguments=%7B%22zipcode%22%3A%20%2214001%22%7D where spider_arguments is %7B%22zipcode%22%3A%20%2214001%22%7D, i.e. the urlencoded form of {"zipcode": "14001"}. This way you can pass any object as an argument: a dictionary, a list, etc. It is more flexible and won't collide with same-name API parameters and request arguments. E.g. someone could pass a dont_filter spider argument that collides with the dont_filter Request argument, and that would cause trouble. I'm implementing this on a branch and will create a PR soon.
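Building such a URL can be sketched in Python with the standard library (using the same example endpoint and zipcode value as above):

```python
import json
from urllib.parse import quote

# URL-encode a JSON object so arbitrary spider arguments (dicts, lists,
# strings, numbers) survive as a single query parameter value.
spider_arguments = {"zipcode": "14001"}
encoded = quote(json.dumps(spider_arguments))
api_url = "http://localhost/crawl.json?url=http://aa.com&spider_arguments=" + encoded
# encoded is "%7B%22zipcode%22%3A%20%2214001%22%7D"
```

On the server side the value is decoded with the inverse steps: unquote the parameter, then json-load it back into an object.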
Hi, once you pass these arguments in, how do you retrieve them in your script? Sorry, I'm attempting this for the first time.
They are available as spider attributes, @doverradio. I'll add this to the documentation. E.g. if you pass an argument zipcode, it should be available in the spider as self.zipcode:

```python
def parse_xxx(self, response):
    print('zipcode is ' + self.zipcode)
```

Crawl args need to be passed as JSON.
closed via #120 |
When running Scrapy from the command line you can pass arguments to spiders, e.g. scrapy crawl myspider -a some_argument=some_value, but this is NOT possible with ScrapyRT now. You cannot pass arguments to spiders; you can only pass arguments for the request. Adding support for these "command line" arguments is not difficult to implement and seems important IMO. You could simply pass them via the URL.

EDIT: to clarify, we're talking about passing arguments to the API via the URL.