fix: return unique_list only when link_extractor.unique is True #5458
Ok so this won't get fixed? |
Any update on this? |
Added to my review queue, but it may take a while 🙇 |
Just a ping :D |
I am OK with merging as is, provided we update the docstring of |
I updated the docstring and fixed the HTML used in the tests so that they also cover the new case (unique = False, where the output contains duplicates). |
Until your test changes, I did not realize the issue only affected links with matching URL and text, but the original issue does indeed describe it that way.
Thanks!
Co-authored-by: Adrián Chaves <adrian@chaves.io>
Codecov Report
@@            Coverage Diff             @@
##           master    #5458      +/-   ##
==========================================
- Coverage   88.87%   88.86%   -0.02%
==========================================
  Files         162      162
  Lines       10980    10982       +2
  Branches     1796     1797       +1
==========================================
  Hits         9759     9759
- Misses        940      941       +1
- Partials      281      282       +1
In the `lxmlhtml.py` script, the function `extract_links(self, response)` always returns `unique_list(all_links)`, even when the `unique` attribute of the `link_extractor` is `False`.

With this small change to the return line, the behavior is consistent with the `unique` attribute.

Fixes #3798, closes #3799, closes #4695
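
The behavior described in this PR can be illustrated with a minimal, self-contained sketch. It is not the actual Scrapy code: `SketchLinkExtractor` is a hypothetical class, links are reduced to `(url, text)` tuples, `extract_links` takes a plain list instead of a response, and `unique_list` is a stand-in helper written here for illustration. It only shows the idea of deduplicating when `unique` is `True` and returning the links untouched otherwise.

```python
from typing import Hashable, Iterable, List, Tuple

# Simplified stand-in for a link object: (url, text).
Link = Tuple[str, str]


def unique_list(items: Iterable[Hashable]) -> list:
    """Stand-in helper: drop duplicates while keeping first-seen order."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


class SketchLinkExtractor:
    """Hypothetical, simplified extractor; not Scrapy's real class."""

    def __init__(self, unique: bool = True) -> None:
        self.unique = unique

    def extract_links(self, all_links: List[Link]) -> List[Link]:
        # Deduplicate only when the extractor was configured with unique=True.
        if self.unique:
            return unique_list(all_links)
        # With unique=False, duplicates (same URL and text) are preserved.
        return list(all_links)


links = [
    ("http://example.com/a", "A"),
    ("http://example.com/a", "A"),  # exact duplicate: same URL and text
    ("http://example.com/b", "B"),
]
deduplicated = SketchLinkExtractor(unique=True).extract_links(links)
with_duplicates = SketchLinkExtractor(unique=False).extract_links(links)
```

In this sketch, `deduplicated` holds two links and `with_duplicates` keeps all three, which mirrors the case the updated tests cover (links with matching URL and text when `unique` is `False`).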