unable to deploy with scrapyd-deploy #128

Closed
MihaiCraciun opened this issue Dec 16, 2014 · 40 comments

Comments

@MihaiCraciun

Hello,

Could you please help me figure out what I'm doing wrong? Here are the steps:
I followed the Portia install manual - all ok
I created a new project, entered a URL, tagged an item - all ok
Clicked "continue browsing", browsed through the site, items were being extracted as expected - all ok

Next I wanted to deploy my spider:
1st try: I tried to run, as the docs specified, scrapyd-deploy your_scrapyd_target -p project_name - got an error - scrapyd wasn't installed
Fix: pip install scrapyd
2nd try: I launched the scrapyd server (also missing from the docs), accessed http://localhost:6800/ - all ok
After a brief reading of the scrapyd docs I found out I had to edit the file scrapy.cfg in my project, slyd/data/projects/new_project/scrapy.cfg,
and added the following:
[deploy:local]
url = http://localhost:6800/

Went back to the console and checked all was ok:
$:> scrapyd-deploy -l
local http://localhost:6800/

$:> scrapyd-deploy -L local
default

Seemed ok, so I gave it another try:
$:> scrapyd-deploy local -p default
Packing version 1418722113
Deploying to project "default" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

What am I missing?

@ruairif
Contributor

ruairif commented Dec 16, 2014

I'm not sure exactly. I was able to recreate your issue, but once I had resolved it I couldn't recreate it again.
Try reinstalling scrapyd, then deploy again and see if that works.

@MihaiCraciun
Author

Could you give me a step-by-step on how I would go about doing that? I have next to zero experience with Python/Scrapy and did everything by following tutorials.
I shut down everything: killed the twistd process used for running Portia, and killed the scrapyd process.
I did a pip uninstall scrapyd and pip install scrapyd.
I ran scrapyd and it started the server ok.
I ran scrapyd-deploy -L local and it returned empty.
I ran curl http://localhost:6800/listprojects.json and it returned {"status": "ok", "projects": []}
I tried to run twistd -n slyd and got:
Another twistd server is running, PID 1896

This could either be a previously started instance of your application or a different application entirely.
To start a new one, either run it in some other directory, or use the --pidfile and --logfile parameters to avoid clashes.

@ruairif
Contributor

ruairif commented Dec 16, 2014

You probably didn't kill slyd correctly.
Did you try deploying again?

@MihaiCraciun
Author

I couldn't deploy because I didn't have the required "project name" argument: scrapyd-deploy your_scrapyd_target -p project_name
All I had was the target argument, local.

@ruairif
Contributor

ruairif commented Dec 16, 2014

Is it all sorted now then?

@MihaiCraciun
Author

No. I didn't understand what I needed to do and was waiting for some guidance. What should I do / check?

@ruairif
Contributor

ruairif commented Dec 16, 2014

Change directory to the project
Run scrapyd &
Run scrapyd-deploy local -p default

Then it should be available on scrapyd (scrapyd-deploy -L local)

@MihaiCraciun
Author

Still not working...
scrapyd-deploy local -p default
Packing version 1418730681
Deploying to project "default" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mike/www/portia/slyd/data/projects/new_project'"}
Just so we're clear, I have to run scrapyd without the & sign, correct?

@tpeng
Contributor

tpeng commented Dec 16, 2014

@MihaiCraciun scrapyd-deploy local -p default seems wrong to me; the argument after -p should be the project name, and in your case it should be new_project.

Also, could you please paste your slyd/data/projects/new_project/scrapy.cfg? It can give more information.

@MihaiCraciun
Author

cat scrapy.cfg

# Automatically created by: slyd

[settings]
default = slybot.settings

[deploy:local]
url = http://localhost:6800/
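As an aside, scrapy.cfg is plain INI, so the [deploy:local] section can be read with Python's standard library. A minimal sketch of how deploy targets are discovered from sections named deploy:<target> (the file contents above are inlined as a string for illustration; this mirrors, but is not, scrapyd-deploy's own code):

```python
import configparser

# The scrapy.cfg pasted above, inlined for illustration.
SCRAPY_CFG = """
[settings]
default = slybot.settings

[deploy:local]
url = http://localhost:6800/
"""

parser = configparser.ConfigParser()
parser.read_string(SCRAPY_CFG)

# Deploy targets are the sections named "deploy:<target>".
targets = {name.split(':', 1)[1]: dict(parser[name])
           for name in parser.sections() if name.startswith('deploy:')}
print(targets['local']['url'])  # http://localhost:6800/
```

This is roughly what `scrapyd-deploy -l` enumerates, which is why the command above lists `local http://localhost:6800/`.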

@MihaiCraciun
Author

(venv)192-168-0-197:new_project Mihai$ scrapyd-deploy local -p new_project
Packing version 1418735228
Deploying to project "new_project" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 2] No such file or directory: 'slybot-project/project.json'"}

@tpeng
Contributor

tpeng commented Dec 16, 2014

That doesn't look right. Could you also paste the slyd/data/projects/new_project/setup.py?

@MihaiCraciun
Author

The original file looked like this:

# Automatically created by: slyd

from setuptools import setup, find_packages

setup(
    name         = 'new_project',
    version      = '1.0',
    packages     = find_packages(),
    package_data = {
        'spiders': ['*.json']
    },
    data_files = [('', ['project.json', 'items.json', 'extractors.json'])],
    entry_points = {'scrapy': ['settings = spiders.settings']},
    zip_safe = True
)

But:
Searching on Stack Overflow I noticed other people had issues with deployment, and one fix mentioned removing build, eggs, project.egg-info and also setup.py, then doing another scrapyd-deploy target -p project.
After doing so, my setup.py looked like this:

# Automatically created by: scrapyd-deploy

from setuptools import setup, find_packages

setup(
    name         = 'project',
    version      = '1.0',
    packages     = find_packages(),
    entry_points = {'scrapy': ['settings = slybot.settings']},
)

Again, I deleted build, eggs and so on, and put back the original setup.py (the first one in this comment). Did a build and got:
scrapyd-deploy local -p new_project
Packing version 1418735717
Deploying to project "new_project" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

@MihaiCraciun
Author

I can access http://localhost:6800/ just fine, and all the commands I'm running are inside my virtual environment, in the project folder (new_project).. :(

@tpeng
Contributor

tpeng commented Dec 16, 2014

Did you accidentally remove slyd/data/projects/new_project/spiders/settings.py? Could you paste that too?

@almeidaf
Contributor

I had the same issue and reported it in #100. I never got it to work, but I thought I was doing something wrong since I'd never used Scrapy/scrapyd before.

@MihaiCraciun
Author

Here it is (new_project/spiders/settings.py):

# Automatically created by: slyd
import os

SPIDER_MANAGER_CLASS = 'slybot.spidermanager.ZipfileSlybotSpiderManager'
EXTENSIONS = {'slybot.closespider.SlybotCloseSpider': 1}
ITEM_PIPELINES = ['slybot.dupefilter.DupeFilterPipeline']
SPIDER_MIDDLEWARES = {'slybot.spiderlets.SpiderletsMiddleware': 999} # as close as possible to spider output
SLYDUPEFILTER_ENABLED = True

PROJECT_ZIPFILE = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))

try:
    from local_slybot_settings import *
except ImportError:
    pass
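Worth noting: the PROJECT_ZIPFILE expression above resolves to the project directory itself, which is exactly the path reported in the "Is a directory" errors earlier in this thread. A small sketch of that path arithmetic, with a hypothetical path standing in for __file__:

```python
import os

# Hypothetical location of spiders/settings.py inside a slyd project
# (stands in for __file__ in the settings module above).
settings_file = '/slyd/data/projects/new_project/spiders/settings.py'

# Same expression as PROJECT_ZIPFILE in settings.py.
project_zipfile = os.path.abspath(
    os.path.join(os.path.dirname(settings_file), '..'))
print(project_zipfile)  # /slyd/data/projects/new_project
```

So slybot expects to load the project as a zipfile at that path; in this thread the server ends up trying to open the directory instead.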

@MihaiCraciun
Author

I can remove the whole project and do it again.. Just one question: can I have both Portia and scrapyd running at the same time? I tried to start up Portia in another terminal window while scrapyd was running and got an error that another twistd application was running.

@tpeng
Contributor

tpeng commented Dec 16, 2014

@MihaiCraciun yes, you can

@ruairif
Contributor

ruairif commented Dec 16, 2014

@MihaiCraciun Before you delete the whole project, would you mind just deleting the new_project.egg-info and build folders and seeing if you can deploy it then?

@MihaiCraciun
Author

I tried that earlier and it failed. I'm just now rebuilding the project and giving it another go.

@MihaiCraciun
Author

Good: extraction is working in Portia. Now.. can anybody tell me step by step what I should do, so that I don't mess this up again?
I have both Portia and scrapyd up and running (I don't know how I did it, it just worked; fingers crossed it won't happen again).

@tpeng
Contributor

tpeng commented Dec 16, 2014

You can find the steps at https://github.com/scrapinghub/portia/blob/master/README.md#deploying-a-project; if you find something unclear or obscure, I'm glad to update it.

@almeidaf
Contributor

I'm going to reinstall scrapyd and see if I can get it to work this time.

@MihaiCraciun
Author

That's what got me into this mess the first time :))
Correct me if I'm wrong (and if not, please update the README.md):

To deploy on localhost:

  1. cd into your project folder and run scrapyd to get the scrapyd server running. If you don't have scrapyd installed, follow these instructions: http://scrapyd.readthedocs.org/en/latest/install.html
  2. Edit scrapy.cfg and add your target like so:
    [deploy:local-target-name-here]
    url = http://localhost:6800/
    Question: should I also specify the project here?
    A link with more details would also be nice for this step.
  3. In your project folder run: scrapyd-deploy -L local-target-name-here
    The output is your spider-name.
  4. Use it in the following command: scrapyd-deploy local-target-name-here -p spider-name

And then schedule your spider with: curl http://localhost:6800/schedule.json -d project=your_project_name -d spider=your_spider_name
You can get the project name by running... what command is it? I didn't find anything.
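The curl schedule call above can also be expressed with Python's standard library. A minimal sketch that only builds the POST request (the project and spider names are placeholders; sending it requires a running scrapyd):

```python
from urllib import parse, request

def build_schedule_request(project, spider, base='http://localhost:6800'):
    """Build a POST request equivalent to the curl schedule.json call above."""
    data = parse.urlencode({'project': project, 'spider': spider}).encode()
    return request.Request(base + '/schedule.json', data=data)

req = build_schedule_request('new_project', 'example.com')
print(req.get_full_url())  # http://localhost:6800/schedule.json
# request.urlopen(req) would actually submit the job to a running scrapyd.
```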

@MihaiCraciun
Author

IT'S FREAKIN' IMPOSSIBLE !!!!
Sorry for venting..
So:

  1. Killed Portia and scrapyd
  2. Deleted every project I had in slyd/data/projects
  3. Started Portia, defined a spider
  4. Edited scrapy.cfg and added a local target with url=http://localhost:6800/
  5. cd'd into the new_project folder
  6. Ran scrapyd. Startup worked ok
  7. Ran scrapyd-deploy local -p new_project and got an error:
    scrapyd-deploy local -p new_project
    Packing version 1418743344
    Deploying to project "new_project" in http://localhost:6800/addversion.json
    Server response (200):
    {"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

Please... HELP! I'm out of ideas...

@ruairif
Contributor

ruairif commented Dec 16, 2014

Did you try it through curl?
curl http://localhost:6800/addversion.json -F project=new_project -F egg=<egg_file>

If that doesn't work, would you mind uploading your zip file so I can see if there's anything weird about it.

@MihaiCraciun
Author

http://we.tl/Qolkk9JhYT

@ruairif
Contributor

ruairif commented Dec 16, 2014

There's nothing strange about the project.
Can you try replacing the _upload_egg function in your scrapyd-deploy script with:

def _upload_egg(target, eggpath, project, version):
    print('Reading egg from: %s' % eggpath)
    with open(eggpath, 'rb') as f:
         eggdata = f.read()
    print('Finished reading egg: %s' % eggdata[:100])
    data = {
        'project': project,
        'version': version,
        'egg': ('project.egg', eggdata),
    }
    body, boundary = encode_multipart(data)
    url = _url(target, 'addversion.json')
    headers = {
        'Content-Type': 'multipart/form-data; boundary=%s' % boundary,
        'Content-Length': str(len(body)),
    }
    req = urllib2.Request(url, body, headers)
    _add_auth_header(req, target)
    _log('Deploying to project "%s" in %s' % (project, url))
    return _http_post(req)

Then could you post your output from the deploy here?

@MihaiCraciun
Author

Traceback (most recent call last):
  File "/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/venv/bin/scrapyd-deploy", line 291, in <module>
    main()
  File "/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/venv/bin/scrapyd-deploy", line 96, in main
    if not _upload_egg(target, egg, project, version):
  File "/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/venv/bin/scrapyd-deploy", line 194, in _upload_egg
    print('Reading egg from: ' % eggpath)
TypeError: not all arguments converted during string formatting

@ruairif
Contributor

ruairif commented Dec 16, 2014

I forgot a %s in the script. I've updated it above. Would you mind trying again?

@MihaiCraciun
Author

Here it is:

Reading egg from: /var/folders/1j/7w389xgj51l1hfw_4l9xwyz80000gn/T/scrapydeploy-M1ya7T/new_project-1.0-py2.7.egg
Finished reading egg: P/??EC???extractors.json??Pp??Eu[*m?
items.json͎1
?0
  ???
Deploying to project "new_project" in http://localhost:6800/addversion.json
Server response (200):
{"status": "error", "message": "IOError: [Errno 21] Is a directory: '/Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project'"}

@MihaiCraciun
Author

I'm working in a virtual environment.. I don't know why it's trying to go into the system /var (or at least that's what I understand).

@ruairif
Contributor

ruairif commented Dec 16, 2014

I've no idea why it's doing that either; I'm not really familiar with how scrapyd works.
Try removing the folder that it's trying to upload to and see if that helps.
rm /Users/Mihai/Work/www/4ideas/MarketWatcher/portia_tryout/portia/slyd/data/projects/new_project

@MihaiCraciun
Author

That's my project folder, from which I'm supposed to run scrapyd-deploy.

@ruairif
Contributor

ruairif commented Dec 16, 2014

Looks like you are running them from the same folder then.
Restart scrapyd from some other folder and then try it.

@MihaiCraciun
Author

HOLY C%@p it worked!!! What now? :)

@ruairif
Contributor

ruairif commented Dec 16, 2014

I'll have to add it to the docs as a gotcha.
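For the record, the gotcha here: scrapyd had been started inside slyd/data/projects/new_project, and the server ended up trying to open that directory as if it were a file, which is exactly the "IOError: [Errno 21] Is a directory" in the responses above. A minimal reproduction of that error class (errno 21 is EISDIR on POSIX; the temp directory is only a stand-in for the project folder):

```python
import errno
import tempfile

# Opening a directory for reading fails with errno 21 (EISDIR) on POSIX --
# the same "IOError: [Errno 21] Is a directory" seen in the scrapyd responses.
path = tempfile.mkdtemp()
try:
    open(path, 'rb')
    error_number = None
except OSError as err:  # IOError is an alias of OSError in Python 3
    error_number = err.errno

print(error_number == errno.EISDIR)  # True
```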

@MihaiCraciun
Author

What should I read next on actually using my spider? I want to run my spider and save the scraped data to a database.

@ruairif
Contributor

ruairif commented Dec 16, 2014

Schedule your spider through the API, and you can monitor it through the web interface.

@ruairif ruairif closed this as completed Dec 17, 2014
donsunsoft added a commit to donsunsoft/portia that referenced this issue Jan 3, 2015
scrapinghub#128 Update docs to warn users not to run scrapyd in their project direc...