unable to deploy with scrapyd-deploy #128

Closed
MihaiCraciun opened this issue Dec 16, 2014 · 40 comments

Comments

@MihaiCraciun

Hello,

Could you please help me figure out what I'm doing wrong? Here are the steps:
I followed the Portia install manual - all ok
I created a new project, entered a URL, tagged an item - all ok
Clicked "continue browsing", browsed through the site, items were being extracted as expected - all ok

Next I wanted to deploy my spider:
1st try: I tried to run, as the docs specified, scrapyd-deploy your_scrapyd_target -p project_name - got an error - scrapyd wasn't installed
Fix: pip install scrapyd
2nd try: I launched the scrapyd server (also missing from the docs), accessed http://localhost:6800/ - all ok
After a brief reading of the scrapyd docs I found out I had to edit the file scrapy.cfg in my project, slyd/data/projects/new_project/scrapy.cfg,
and added the following:
[deploy:local]
url = http://localhost:6800/

Went back to the console and checked all was ok:
$:> scrapyd-deploy -l
local http://localhost:6800/

$:> scrapyd-deploy -L local
default

Seemed ok, so I gave it another try:
$:> scrapyd-deploy local -p default
Packing version 1418722113
Deploying to project "default" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

What am I missing?

@ruairif
Contributor

ruairif commented Dec 16, 2014

I'm not sure exactly. I was able to recreate your issue, but once I had resolved it I couldn't recreate it again.
Try reinstalling scrapyd, then deploy again and see if that works.

@MihaiCraciun
Author

Could you give me a step-by-step on how I would go about doing that? I have next to zero experience with Python/Scrapy and did everything by following tutorials.
I shut down everything: killed the twistd process used for running Portia, and killed the scrapyd process.
I did a pip uninstall scrapyd and pip install scrapyd.
I ran scrapyd and it started the server ok.
I ran scrapyd-deploy -L local and it returned empty.
I ran curl http://localhost:6800/listprojects.json and it returned {"status": "ok", "projects": []}
I tried to run twistd -n slyd and got:
Another twistd server is running, PID 1896

This could either be a previously started instance of your application or a different application entirely.
To start a new one, either run it in some other directory, or use the --pidfile and --logfile parameters to avoid clashes.

@ruairif
Contributor

ruairif commented Dec 16, 2014

You probably didn't kill slyd correctly.
Did you try deploying again?

@MihaiCraciun
Author

I couldn't deploy because I didn't have the required "project name" argument: scrapyd-deploy your_scrapyd_target -p project_name
All I had was the target argument, local.

@ruairif
Contributor

ruairif commented Dec 16, 2014

Is it all sorted now then?

@MihaiCraciun
Author

No. I didn't understand what I needed to do and was waiting for some guidance. What should I do / check?

@ruairif
Contributor

ruairif commented Dec 16, 2014

Change directory to the project
Run scrapyd &
Run scrapyd-deploy local -p default

Then it should be available on scrapyd (scrapyd-deploy -L local)

@MihaiCraciun
Author

Still not working...
scrapyd-deploy local -p default
Packing version 1418730681
Deploying to project "default" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mike/www/portia/slyd/data/projects/new_project'"}
Just so we're clear, I have to run scrapyd without the & sign, correct?

@tpeng
Contributor

tpeng commented Dec 16, 2014

@MihaiCraciun scrapyd-deploy local -p default seems wrong to me; the argument after -p should be the project name, and in your case it should be new_project.

Also, could you please paste your slyd/data/projects/new_project/scrapy.cfg? It can give more information.

@MihaiCraciun
Author

cat scrapy.cfg

# Automatically created by: slyd

[settings]
default = slybot.settings

[deploy:local]
url = http://localhost:6800/
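As an aside, scrapy.cfg is plain INI, so the [deploy:local] section can be read with Python's standard library. A minimal sketch of how deploy targets are discovered from sections named deploy:<target> (the file contents above are inlined as a string for illustration; this mirrors, but is not, scrapyd-deploy's own code):

```python
import configparser

# The scrapy.cfg pasted above, inlined for illustration.
SCRAPY_CFG = """
[settings]
default = slybot.settings

[deploy:local]
url = http://localhost:6800/
"""

parser = configparser.ConfigParser()
parser.read_string(SCRAPY_CFG)

# Deploy targets are the sections named "deploy:<target>".
targets = {name.split(':', 1)[1]: dict(parser[name])
           for name in parser.sections() if name.startswith('deploy:')}
print(targets['local']['url'])  # http://localhost:6800/
```

This is roughly what `scrapyd-deploy -l` enumerates, which is why the command above lists `local http://localhost:6800/`.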

@MihaiCraciun
Author

(venv)192-168-0-197:new_project Mihai$ scrapyd-deploy local -p new_project
Packing version 1418735228
Deploying to project "new_project" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 2] No such file or directory: 'slybot-project/project.json'"}

@tpeng
Contributor

tpeng commented Dec 16, 2014

That doesn't look right. Could you also paste the slyd/data/projects/new_project/setup.py?

@MihaiCraciun
Author

The original file looked like this:

# Automatically created by: slyd

from setuptools import setup, find_packages

setup(
    name         = 'new_project',
    version      = '1.0',
    packages     = find_packages(),
    package_data = {
        'spiders': ['*.json']
    },
    data_files = [('', ['project.json', 'items.json', 'extractors.json'])],
    entry_points = {'scrapy': ['settings = spiders.settings']},
    zip_safe = True
)

But:
Searching on Stack Overflow I noticed other people had issues with deployment, and one fix mentioned removing build, eggs, project.egg-info and also setup.py, then doing another scrapyd-deploy target -p project.
After doing so, my setup.py looked like this:

# Automatically created by: scrapyd-deploy

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = slybot.settings']},
)

Again, I deleted build, eggs and so on, and put back the original setup.py (the first one in this comment). Did a build and got:
scrapyd-deploy local -p new_project
Packing version 1418735717
Deploying to project "new_project" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

@MihaiCraciun
Author

I can access http://localhost:6800/ just fine, and all the commands I'm running are inside my virtual environment, in the project folder (new_project).. :(

@tpeng
Contributor

tpeng commented Dec 16, 2014

Did you accidentally remove slyd/data/projects/new_project/spiders/settings.py? Could you paste that too?

@almeidaf
Contributor

I had the same issue and reported it in #100. I never got it to work, but I thought I was doing something wrong since I'd never used Scrapy/scrapyd before.

@MihaiCraciun
Author

Here it is (new_project/spiders/settings.py):

# Automatically created by: slyd
import os

SPIDER_MANAGER_CLASS = 'slybot.spidermanager.ZipfileSlybotSpiderManager'
EXTENSIONS = {'slybot.closespider.SlybotCloseSpider': 1}
ITEM_PIPELINES = ['slybot.dupefilter.DupeFilterPipeline']
SPIDER_MIDDLEWARES = {'slybot.spiderlets.SpiderletsMiddleware': 999} # as close as possible to spider output
SLYDUPEFILTER_ENABLED = True

PROJECT_ZIPFILE = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))

try:
    from local_slybot_settings import *
except ImportError:
    pass
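Worth noting: the PROJECT_ZIPFILE expression above resolves to the project directory itself, which is exactly the path reported in the "Is a directory" errors earlier in this thread. A small sketch of that path arithmetic, with a hypothetical path standing in for __file__:

```python
import os

# Hypothetical location of spiders/settings.py inside a slyd project
# (stands in for __file__ in the settings module above).
settings_file = '/slyd/data/projects/new_project/spiders/settings.py'

# Same expression as PROJECT_ZIPFILE in settings.py.
project_zipfile = os.path.abspath(
    os.path.join(os.path.dirname(settings_file), '..'))
print(project_zipfile)  # /slyd/data/projects/new_project
```

So slybot expects to load the project as a zipfile at that path; in this thread the server ends up trying to open the directory instead.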

@MihaiCraciun
Author

I can remove the whole project and do it again.. Just one question: can I have both Portia and scrapyd running at the same time? I tried to start up Portia in another terminal window while scrapyd was running and got an error that another twistd application was running.

@tpeng
Contributor

tpeng commented Dec 16, 2014

@MihaiCraciun yes, you can

@ruairif
Contributor

ruairif commented Dec 16, 2014

@MihaiCraciun Before you delete the whole project, would you mind just deleting the new_project.egg-info and build folders and seeing if you can deploy it then?

@MihaiCraciun
Author

I tried that earlier and it failed. I'm just now rebuilding the project and giving it another go.

@MihaiCraciun
Author

Good: extraction is working in Portia. Now.. can anybody tell me step by step what I should do, so that I don't mess this up again?
I have both Portia and scrapyd up and running (I don't know how I did it, it just worked; fingers crossed it won't happen again).

@tpeng
Contributor

tpeng commented Dec 16, 2014

You can find the steps at https://github.com/scrapinghub/portia/blob/master/README.md#deploying-a-project; if you find something unclear or obscure, I'm glad to update it.

@almeidaf
Contributor

I'm going to reinstall scrapyd and see if I can get it to work this time.

@MihaiCraciun
Author

That's what got me into this mess the first time :))
Correct me if I'm wrong (and if not, please update the README.md):

To deploy on localhost:

  1. cd into your project folder and run scrapyd to get the scrapyd server running. If you don't have scrapyd installed, follow these instructions: http://scrapyd.readthedocs.org/en/latest/install.html
  2. Edit scrapy.cfg and add your target like so:
    [deploy:local-target-name-here]
    url = http://localhost:6800/
    Question: should I also specify the project here?
    A link with more details would also be nice for this step.
  3. In your project folder run: scrapyd-deploy -L local-target-name-here
    The output is your spider-name.
  4. Use it in the following command: scrapyd-deploy local-target-name-here -p spider-name

And then schedule your spider with: curl http://localhost:6800/schedule.json -d project=your_project_name -d spider=your_spider_name
You can get the project name by running... what command is it? I didn't find anything.
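The curl schedule call above can also be expressed with Python's standard library. A minimal sketch that only builds the POST request (the project and spider names are placeholders; sending it requires a running scrapyd):

```python
from urllib import parse, request

def build_schedule_request(project, spider, base='http://localhost:6800'):
    """Build a POST request equivalent to the curl schedule.json call above."""
    data = parse.urlencode({'project': project, 'spider': spider}).encode()
    return request.Request(base + '/schedule.json', data=data)

req = build_schedule_request('new_project', 'example.com')
print(req.get_full_url())  # http://localhost:6800/schedule.json
# request.urlopen(req) would actually submit the job to a running scrapyd.
```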

@MihaiCraciun
Author

IT'S FREAKIN' IMPOSSIBLE !!!!
Sorry for venting..
So:

  1. Killed Portia and scrapyd
  2. Deleted every project I had in slyd/data/projects
  3. Started Portia, defined a spider
  4. Edited scrapy.cfg and added a local target with url=http://localhost:6800/
  5. cd'd into the new_project folder
  6. Ran scrapyd. Startup worked ok
  7. Ran scrapyd-deploy local -p new_project and got an error:
    scrapyd-deploy local -p new_project
    Packing version 1418743344
    Deploying to project "new_project" in http://localhost:6800/addversion.json
    Server response (200):
    {"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

Please... HELP! I'm out of ideas...

@ruairif
Contributor

ruairif commented Dec 16, 2014

Did you try it through curl?
curl http://localhost:6800/addversion.json -F project=new_project -F egg=<egg_file>

If that doesn't work, would you mind uploading your zip file so I can see if there's anything weird about it.

@MihaiCraciun
Author

http://we.tl/Qolkk9JhYT

@ruairif
Contributor

ruairif commented Dec 16, 2014

There's nothing strange about the project.
Can you try replacing the _upload_egg function in your scrapyd-deploy script with:

def _upload_egg(target, eggpath, project, version):
    print('Reading egg from: %s' % eggpath)
    with open(eggpath, 'rb') as f:
         eggdata = f.read()
    print('Finished reading egg: %s' % eggdata[:100])
    data = {
        'project': project,
        'version': version,
        'egg': ('project.egg', eggdata),
    }
    body, boundary = encode_multipart(data)
    url = _url(target, 'addversion.json')
    headers = {
        'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
        'Content-Length': str(len(body)),
    }
    req = urllib2.Request(url, body, headers)
    _add_auth_header(req, target)
    _log('Deploying to project "%s" in %s' % (project, url))
    return _http_post(req)

Then could you post your output from the deploy here?

@MihaiCraciun
Author

Traceback (most recent call last):
  File "/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/venv/bin/scrapyd-deploy", line 291, in <module>
    main()
  File "/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/venv/bin/scrapyd-deploy", line 96, in main
    if not _upload_egg(target, egg, project, version):
  File "/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/venv/bin/scrapyd-deploy", line 194, in _upload_egg
    print('Reading egg from: ' % eggpath)
TypeError: not all arguments converted during string formatting

@ruairif
Contributor

ruairif commented Dec 16, 2014

I forgot a %s in the script. I've updated it above. Would you mind trying again?

@MihaiCraciun
Author

Here it is:

Reading egg from: /var/folders/1j/7w389xgj51l1hfw_4l9xwyz80000gn/T/scrapydeploy-M1ya7T/new_project-1.0-py2.7.egg
Finished reading egg: P/??EC???extractors.json??Pp??Eu[*m?
items.json͎1
?0
  ???
Deploying to project "new_project" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

@MihaiCraciun
Author

I'm working in a virtual environment.. I don't know why it's trying to go into the system /var (or at least that's what I understand).

@ruairif
Contributor

ruairif commented Dec 16, 2014

I've no idea why it's doing that either; I'm not really familiar with how scrapyd works.
Try removing the folder that it's trying to upload to and see if that helps.
rm /Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project

@MihaiCraciun
Author

That's my project folder, from which I'm supposed to run scrapyd-deploy.

@ruairif
Contributor

ruairif commented Dec 16, 2014

Looks like you are running them from the same folder then.
Restart scrapyd from some other folder and then try it.

@MihaiCraciun
Author

HOLY C%@p it worked!!! What now? :)

@ruairif
Contributor

ruairif commented Dec 16, 2014

I'll have to add it to the docs as a gotcha.
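For the record, the gotcha here: scrapyd had been started inside slyd/data/projects/new_project, and the server ended up trying to open that directory as if it were a file, which is exactly the "IOError: [Errno 21] Is a directory" in the responses above. A minimal reproduction of that error class (errno 21 is EISDIR on POSIX; the temp directory is only a stand-in for the project folder):

```python
import errno
import tempfile

# Opening a directory for reading fails with errno 21 (EISDIR) on POSIX --
# the same "IOError: [Errno 21] Is a directory" seen in the scrapyd responses.
path = tempfile.mkdtemp()
try:
    open(path, 'rb')
    error_number = None
except OSError as err:  # IOError is an alias of OSError in Python 3
    error_number = err.errno

print(error_number == errno.EISDIR)  # True
```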

@MihaiCraciun
Author

What should I read next on actually using my spider? I want to run my spider and save the scraped data to a database.

@ruairif
Contributor

ruairif commented Dec 16, 2014

Schedule your spider through the API, and you can monitor it through the web interface.

@ruairif ruairif closed this as completed Dec 17, 2014
donsunsoft added a commit to donsunsoft/portia that referenced this issue Jan 3, 2015
scrapinghub#128 Update docs to warn users not to run scrapyd in their project direc...