Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrating Scurl into Scrapy #3332

Open
wants to merge 86 commits into
base: master
Choose a base branch
from
Open

Integrating Scurl into Scrapy #3332

wants to merge 86 commits into from

Conversation

malloxpb
Copy link
Member

Hey @lopuhin , I will continue the PR #3319 in here. I screwed it up somehow and I could not get the diff updated...

@lopuhin
Copy link
Member

lopuhin commented Jul 25, 2018

I just added scurl on py27 test. I don't know if this is the best way to make it tested on both py27 and 36 but please let me know if this can be improved

@nctl144 in general I think what you did is correct, but the build https://travis-ci.org/scrapy/scrapy/jobs/407810222 has a lot of failures, maybe this is due to this small requirements issue that I left a comment on: #3332 (comment)

@malloxpb
Copy link
Member Author

malloxpb commented Aug 4, 2018

Hey @lopuhin , the build is green now haha. This PR is also ready for review!

@malloxpb malloxpb changed the title [WIP] Integrating Scurl into Scrapy Integrating Scurl into Scrapy Aug 6, 2018
@lopuhin
Copy link
Member

lopuhin commented Aug 6, 2018

Hey @nctl144 looks good! Before the merge, this might need tweaking the warning wording, change scurl URL to git+git://github.com/scrapy/scurl.git (btw, is Cython needed there?) once all PRs are merged. I wonder if we also need to add scurl to the installation docs?

@nctl144 could you please also provide benchmarking results on python 2 and python 3, to provide motivation why scurl is useful?

@malloxpb
Copy link
Member Author

malloxpb commented Aug 6, 2018

(btw, is Cython needed there?)

I suppose it is not since I tried installing the library without having Cython installed. I will remove it!

this might need tweaking the warning wording

hmm, do you have anything in mind for this? I am not sure how I should change that haha

could you please also provide benchmarking results on python 2 and python 3, to provide motivation why scurl is useful?

Yeah I will work on that now :)

@lopuhin
Copy link
Member

lopuhin commented Aug 7, 2018

this might need tweaking the warning wording

hmm, do you have anything in mind for this? I am not sure how I should change that haha

I think it would be better to remove the warning - sorry, it was my idea initially but after talking to @kmike we decided that at the moment it would be better to go without it.

@malloxpb
Copy link
Member Author

malloxpb commented Aug 7, 2018

haha sure I will just remove it for now @lopuhin !

@malloxpb
Copy link
Member Author

malloxpb commented Aug 8, 2018

I just added Scurl to the installation doc in this PR @lopuhin . I will include the profiling result in the comment section soon! But please let me know if there's anything else that I should work on 😄

@malloxpb
Copy link
Member Author

malloxpb commented Aug 8, 2018

here is some speed check result (I don't have the profile to share yet since vmprof is not cooperating...

using scrapy-bench bookworm on py3

With Scurl 72.70933595483582 items/second
Without Scurl 62.18140819263788 items/second

Using scrapy bench on py3:

With Scurl 2880 pages/min
Without Scurl 2400 pages/min

More information can be found in the GSoC final report, which can be found here. Lemme know what you think @lopuhin !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants