🔗 Follow URLs #90

Merged
merged 13 commits into from
Mar 21, 2022

roniemartinez
Owner

@roniemartinez roniemartinez commented Mar 17, 2022

Resolves #62 (easier alternative, no code change for decorated functions)

TODO

  • Extract URLs
  • Fix relative paths
  • [ ] Prevent following external URLs
  • Prevent duplicates
  • Prevent requesting current URL
  • Specify allowed domains (For now, allowed domains are extracted from URL input.)
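The TODO steps above (extract URLs, fix relative paths, skip external URLs, skip duplicates and the current URL, restrict to domains taken from the input URLs) can be sketched with the Python standard library alone. This is a minimal illustration under those assumptions, not dude's actual implementation; `LinkExtractor` and `follow_urls` are hypothetical names, and `example.com` / `other.org` are placeholder hosts.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Hypothetical helper: collect the raw href of every <a> tag (extract URLs)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def follow_urls(current_url: str, html: str, allowed_domains: set[str], seen: set[str]) -> list[str]:
    """Hypothetical helper: return new same-domain URLs on a page; updates `seen` in place."""
    parser = LinkExtractor()
    parser.feed(html)
    found: list[str] = []
    for href in parser.hrefs:
        # Fix relative paths: resolve against the page URL, then drop any #fragment.
        url, _fragment = urldefrag(urljoin(current_url, href))
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https"):
            continue  # ignore mailto:, javascript:, and similar links
        if parsed.hostname not in allowed_domains:
            continue  # prevent following external URLs
        if url == current_url or url in seen:
            continue  # prevent requesting the current URL and duplicates
        seen.add(url)
        found.append(url)
    return found


start_urls = ["https://example.com/docs/"]
# Allowed domains are derived from the input URLs, as the TODO list describes.
allowed = {urlparse(u).hostname for u in start_urls}
seen = set(start_urls)
page = (
    '<a href="page2.html">next</a>'
    '<a href="/docs/">self</a>'
    '<a href="https://other.org/">external</a>'
    '<a href="page2.html#top">duplicate</a>'
)
print(follow_urls(start_urls[0], page, allowed, seen))
# ['https://example.com/docs/page2.html']
```

Keeping `seen` outside the function lets one set be shared across every page of a crawl, so a URL discovered on two different pages is still requested only once.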

NOTES

This is a simple spider/crawler for now; it will be improved in the future.

@roniemartinez roniemartinez added enhancement New feature or request WIP Work-in-progress labels Mar 17, 2022
@roniemartinez roniemartinez self-assigned this Mar 17, 2022
@codecov-commenter

codecov-commenter commented Mar 17, 2022

Codecov Report

Merging #90 (d07a92b) into master (d067127) will decrease coverage by 2.51%.
The diff coverage is 86.23%.

@@            Coverage Diff             @@
##           master      #90      +/-   ##
==========================================
- Coverage   99.24%   96.72%   -2.52%     
==========================================
  Files          13       13              
  Lines         929     1070     +141     
==========================================
+ Hits          922     1035     +113     
- Misses          7       35      +28     
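The summary sentence reports a 2.51% decrease while the diff table shows -2.52%. Both figures follow from coverage = hits / lines if each value is truncated (rounded down) to two decimals, which we take to be Codecov's default rounding; that rounding mode is an assumption here. The sketch below checks the arithmetic exactly with `fractions.Fraction`; `pct` and `truncate` are illustrative helper names.

```python
import math
from decimal import Decimal
from fractions import Fraction


def pct(hits: int, lines: int) -> Fraction:
    """Exact coverage percentage: hits / lines * 100."""
    return Fraction(hits * 100, lines)


def truncate(value: Fraction, places: int = 2) -> Decimal:
    """Round down to `places` decimals (assumed Codecov rounding mode)."""
    scale = 10 ** places
    return Decimal(math.floor(value * scale)) / Decimal(scale)


base = pct(922, 929)    # master: 922 hits of 929 lines
head = pct(1035, 1070)  # #90: 1035 hits of 1070 lines

print(truncate(base), truncate(head))   # 99.24 96.72
print(truncate(base - head))            # 2.51, the summary sentence
print(truncate(base) - truncate(head))  # 2.52, the Coverage Diff table
```

The exact drop is about 2.5175 percentage points: truncating the difference gives 2.51, while subtracting the already-truncated totals gives 2.52.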
Impacted Files                          Coverage Δ
dude/__init__.py                        100.00% <ø> (ø)
dude/scraper.py                         100.00% <ø> (ø)
dude/playwright_scraper.py              82.85% <46.15%> (-12.95%) ⬇️
dude/base.py                            97.75% <86.20%> (-2.25%) ⬇️
dude/optional/parsel_scraper.py         98.21% <93.75%> (-1.79%) ⬇️
dude/optional/lxml_scraper.py           98.24% <93.93%> (-1.76%) ⬇️
dude/optional/beautifulsoup_scraper.py  99.09% <96.66%> (-0.91%) ⬇️
dude/optional/pyppeteer_scraper.py      98.19% <100.00%> (+0.28%) ⬆️
dude/optional/selenium_scraper.py       100.00% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d067127...d07a92b.

@roniemartinez roniemartinez marked this pull request as ready for review March 19, 2022 16:58
@roniemartinez roniemartinez mentioned this pull request Mar 21, 2022
@roniemartinez roniemartinez removed the WIP Work-in-progress label Mar 21, 2022
@roniemartinez roniemartinez merged commit 43b153d into master Mar 21, 2022
@roniemartinez roniemartinez deleted the follow-links branch March 21, 2022 21:55
Development

Successfully merging this pull request may close these issues.

Spider
2 participants