
[MRG+1] add lru_cache to css2xpath #109

Merged
merged 8 commits into scrapy:master from nctl144:lru_cache on May 31, 2018

Conversation

@nctl144 (Member) commented Feb 23, 2018

@lopuhin , I am continuing the performance discussion from this topic. I just tried adding lru_cache to the parsel library's css_to_xpath function, and I am planning to test it and report the results.
I am just wondering whether it is correct to test the performance by creating a new virtualenv and running the benchmarker available in scrapy-bench, and how I can test it against the new version of parsel that I created (I tried python setup.py develop in parsel after adding the new code, but it did not work). Please let me know!

nctl144 added some commits Feb 23, 2018

@lopuhin (Member) commented Feb 25, 2018

@nctl144 yes, having two virtualenvs is the easiest solution IMO. The first is the base one, and the second is the same, except that you install the local parsel with python setup.py develop or pip install -e . while inside the parsel directory.

Regarding "I tried python setup.py develop in parsel after adding new code but it did not work":

Did you get an error, or was the old parsel imported? Maybe you can first uninstall parsel, check that it then fails to import, and install the local one after that? And double-check that you are in the correct virtualenv.
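For example, a quick way to confirm which copy of parsel the active virtualenv is importing (a minimal sketch, not part of this PR):

    # run inside the virtualenv under test; after a successful editable
    # install the printed path should point at the local parsel checkout
    import parsel
    print(parsel.__version__, parsel.__file__)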

@lopuhin (Member) commented Feb 25, 2018

@nctl144 I assume the PR is not completely ready yet; I hope you don't mind me changing the title to reflect that.

@lopuhin lopuhin changed the title add lru_cache to css2xpath [WIP] add lru_cache to css2xpath Feb 25, 2018

@nctl144 (Member, Author) commented Feb 25, 2018

Thank you @lopuhin for adding the tag. I will add more results to this topic when it's ready!

@codecov commented Feb 25, 2018

Codecov Report

Merging #109 into master will decrease coverage by 0.38%.
The diff coverage is 90.9%.


@@            Coverage Diff             @@
##           master     #109      +/-   ##
==========================================
- Coverage     100%   99.61%   -0.39%     
==========================================
  Files           5        5              
  Lines         252      260       +8     
  Branches       46       47       +1     
==========================================
+ Hits          252      259       +7     
- Misses          0        1       +1
Impacted Files Coverage Δ
parsel/csstranslator.py 98.57% <90.9%> (-1.43%) ⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f6103c8...0bedb8a.

@nctl144 (Member, Author) commented Feb 26, 2018

Hey @lopuhin , I have added the lru_cache decorator to cache the function css_to_xpath. The result is here (I ran it with scrapy-bench cssbench):
python3:

With cache option:

    The results of the benchmark are (all speeds in items/sec) :
    Test = 'css Benchmark' Iterations = '50'
    Mean : 198.84502027847827 Median : 200.51676914384882 Std Dev : 6.7986926982947775

Without cache option:

    The results of the benchmark are (all speeds in items/sec) :
    Test = 'css Benchmark' Iterations = '50'
    Mean : 169.82949117379943 Median : 170.35505304421105 Std Dev : 4.068657666687966

python2:

With cache option:

    The results of the benchmark are (all speeds in items/sec) :
    Test = 'css Benchmark' Iterations = '50'
    Mean : 197.24539931 Median : 198.732379741 Std Dev : 4.99212917841

Without cache option:

    The results of the benchmark are (all speeds in items/sec) :
    Test = 'css Benchmark' Iterations = '50'
    Mean : 168.903527219 Median : 169.90905336 Std Dev : 3.98855536418

My code is not that clean, but please let me know if there is anything that I can improve here!
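For anyone who wants to reproduce the effect without scrapy-bench, a quick micro-benchmark along these lines should also show the difference (a minimal sketch; the absolute numbers will not match the scrapy-bench figures above):

    import timeit

    from parsel import Selector

    sel = Selector(text='<html><body><p class="x">hi</p></body></html>')
    # repeated identical CSS queries benefit from caching the CSS-to-XPath
    # translation; run once with and once without the lru_cache patch
    print(timeit.timeit(lambda: sel.css('p.x'), number=10000))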

@@ -1,8 +1,15 @@
import sys

if sys.version_info[0] < 3:

@lopuhin (Member) Feb 26, 2018

I think it would be better to use six.PY2 in this case.
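For illustration, the version-dependent import this points at might look roughly like the following (a sketch, assuming the functools32 backport provides lru_cache on Python 2):

    import six

    if six.PY2:
        # functools32 backports functools.lru_cache to Python 2
        from functools32 import lru_cache
    else:
        from functools import lru_cache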

setup.py Outdated
'cssselect>=0.9'
]

if sys.version_info[0:2] < (3, 0):

@lopuhin (Member) Feb 26, 2018

I think that this will not do what we want in all cases, because parsel uses a universal wheel (https://github.com/scrapy/parsel/blob/master/setup.cfg).

First, how to check (it's important to do it for this PR anyway):

  • build a source and a universal wheel, the command is python setup.py sdist bdist_wheel (this will generate two files)
  • make fresh virtual environments for python 2 and python 3 with recent pip, and try installing both files (also in separate virtualenvs): this should result in functools32 being installed only for python2

I didn't check, but I think that for a wheel it will result in functools32 not being installed in either case (if you build the wheel under Python 3), or installed in both cases (if you build the wheel under Python 2, which is what is done for PyPI releases):

condition: $TOXENV == py27

The reason is that setup.py code is not executed when installing from a wheel, but this is the most common (and fastest) way of installation.

In order to fix this, the conditional dependency must be added in a different way, like it's done for PyPy in scrapy (https://github.com/scrapy/scrapy/blob/master/setup.py) - but in this case we want to condition on the Python version; I hope there are such examples.

If anything is not clear, don't hesitate to ask - and also please check that there is indeed a problem before fixing.
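For illustration, one common way to express a Python-version-conditional dependency so that pip honors it even when installing from a universal wheel is a PEP 508 environment marker (a rough sketch, not parsel's actual setup.py; the dependency list is abbreviated):

    # sketch only: illustrates the environment-marker approach
    from setuptools import setup

    setup(
        name='parsel',
        install_requires=[
            'cssselect>=0.9',
            # the marker is evaluated by pip at install time, so functools32
            # is pulled in only on Python 2, regardless of which interpreter
            # built the (universal) wheel
            'functools32; python_version < "3.0"',
        ],
    )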

@nctl144 (Author, Member) Feb 26, 2018

Hey @lopuhin , I checked and indeed there were some errors:
Wheel created with Python 2:

  • Python 2, install with whl: no problem with the installation, functools32 was installed
  • Python 3, install with whl: it tried to install functools32 and the installation failed.

Wheel created with Python 3:

  • Python 3, install with whl: no problem with the installation
  • Python 2, install with whl: functools32 was not installed
pass

@lru_cache(maxsize=256)
def css_to_xpath(self, css, prefix='descendant-or-self::'):

@lopuhin (Member) Feb 26, 2018

I think it would be nice to add caching without repeating code - is it possible to call the parent method, or are there any differences?

@nctl144 (Author, Member) Feb 26, 2018

You're right @lopuhin , I just changed the code so it calls the parent method instead. Please let me know if that resolves the problem!
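For reference, a minimal sketch of what such a cached override delegating to the parent class might look like (the subclass name is illustrative and the base here is cssselect's HTMLTranslator, not the exact parsel code; the decorator and signature match the diff above):

    from functools import lru_cache  # functools32.lru_cache on Python 2

    from cssselect import HTMLTranslator


    class CachingHTMLTranslator(HTMLTranslator):
        # hypothetical subclass: cache translations without repeating code

        @lru_cache(maxsize=256)
        def css_to_xpath(self, css, prefix='descendant-or-self::'):
            # memoize per (self, css, prefix) and delegate the actual
            # translation work to the parent implementation
            return super(CachingHTMLTranslator, self).css_to_xpath(css, prefix)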

@lopuhin (Member) left a comment

Thanks @nctl144 ! I think the general approach is in line with what was discussed in #98, left some comments on the implementation.

Also, someone from the core parsel developers will need to have another look at this after me.

@nctl144 (Member, Author) commented Feb 26, 2018

commented Feb 26, 2018

Hey @lopuhin , I have fixed the requirements in setup.py based on your suggestion, and I also tested it in Python 2 and Python 3 environments, including packaging and installing it. Thank you so much for your help, and please let me know if there is anything else I can improve! :)

@lopuhin (Member) left a comment

Thanks @nctl144 , looks great! I also verified that both a wheel and a source package are handled correctly under python 2 and python 3.

Now let's wait for someone else to check it too.

@lopuhin lopuhin changed the title [WIP] add lru_cache to css2xpath [MRG+1] add lru_cache to css2xpath Feb 27, 2018

@cathalgarvey commented Feb 27, 2018

Looks concise and well-written, to me. :)

@kmike (Member) commented May 31, 2018

Thanks @nctl144 for the fix and @lopuhin and @cathalgarvey for the review!

@kmike kmike merged commit 5323b83 into scrapy:master May 31, 2018

1 of 3 checks passed

codecov/patch: 90.9% of diff hit (target 100%)
codecov/project: 99.61% (-0.39%) compared to f6103c8
continuous-integration/travis-ci/pr: The Travis CI build passed

@nctl144 nctl144 deleted the nctl144:lru_cache branch May 31, 2018

@kmike kmike added this to the v1.5 milestone Jun 22, 2018
