Implement Reddit scraper #159
Pull Request Overview
This PR implements Reddit scraping capabilities using the Apify API, adding a new Reddit job type to the worker's capabilities.
Key changes:
- Added Reddit scraper job type with support for URL scraping, post search, user search, and community search
- Refactored Apify client interface to enable reuse across different scrapers
- Updated type definitions and capabilities to support Reddit functionality
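The interface refactor described above could look roughly like the following sketch. Everything here is an assumption for illustration: the `Apify` method set, `RedditClient`, `SearchPosts`, the actor ID, and `fakeApify` are hypothetical names and are not taken from `pkg/client/apify_client.go`. The point is only that a scraper depending on an interface (with `uint` limits) can be exercised with a fake instead of the real Apify API.

```go
package main

import (
	"errors"
	"fmt"
)

// Apify is a hypothetical stand-in for the shared interface the PR describes.
// The real method set in pkg/client/apify_client.go may differ.
type Apify interface {
	// RunActor runs an actor with the given input and returns at most limit items.
	RunActor(actorID string, input map[string]any, limit uint) ([]map[string]any, error)
}

// RedditClient depends only on the interface, so any implementation can be injected.
type RedditClient struct {
	apify Apify
}

// SearchPosts validates the query and delegates to the injected Apify implementation.
func (c *RedditClient) SearchPosts(query string, limit uint) ([]map[string]any, error) {
	if query == "" {
		return nil, errors.New("query must not be empty")
	}
	// The actor ID below is a placeholder, not a real Apify actor.
	return c.apify.RunActor("hypothetical/reddit-actor", map[string]any{"search": query}, limit)
}

// fakeApify is a test double that serves canned items and honours the limit.
type fakeApify struct {
	items []map[string]any
}

func (f fakeApify) RunActor(_ string, _ map[string]any, limit uint) ([]map[string]any, error) {
	if uint(len(f.items)) > limit {
		return f.items[:limit], nil
	}
	return f.items, nil
}

func main() {
	c := &RedditClient{apify: fakeApify{items: []map[string]any{
		{"id": "a"}, {"id": "b"}, {"id": "c"},
	}}}
	posts, err := c.SearchPosts("golang", 2)
	fmt.Println(len(posts), err)
}
```

Using `uint` for the limit removes the need to check for negative values at each call site, which is one plausible motivation for the type change listed above.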
Reviewed Changes
Copilot reviewed 21 out of 23 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pkg/client/apify_client.go | Added Apify interface and refactored types to use uint instead of int for pagination |
| internal/jobserver/worker.go | Added error logging improvements for job execution |
| internal/jobserver/jobserver_test.go | Added WorkerID field to test job structures |
| internal/jobserver/jobserver.go | Registered new Reddit scraper job type |
| internal/jobs/webscraper.go | Changed log levels from Info to Debug/Warn for reduced verbosity |
| internal/jobs/twitterapify/client.go | Updated to use new Apify interface and uint types |
| internal/jobs/twitter.go | Cleaned up unused function and updated import statements |
| internal/jobs/redditapify/client_test.go | Comprehensive tests for Reddit Apify client functionality |
| internal/jobs/redditapify/client.go | New Reddit Apify client implementation |
| internal/jobs/reddit_test.go | Unit tests for Reddit scraper job execution |
| internal/jobs/reddit.go | New Reddit scraper implementation |
| internal/capabilities/detector.go | Added Reddit capabilities detection when Apify key is available |
| internal/api/routes.go | Improved error message clarity |
| go.mod | Added temporary replace directive for tee-types dependency |
| api/types/reddit/reddit_test.go | Tests for Reddit response type marshalling/unmarshalling |
| api/types/reddit/reddit_suite_test.go | Test suite setup for Reddit types |
| api/types/reddit/reddit.go | Reddit data type definitions and JSON marshalling logic |
| api/types/job.go | Added Reddit configuration support and Job string method |
| api/types/encrypted.go | Enhanced error messages with proper error wrapping |
| README.md | Added documentation for Reddit job types and capabilities |
| Makefile | Added test target for Reddit functionality |
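As a rough illustration of the `internal/capabilities/detector.go` row, the sketch below advertises Reddit capabilities only when an Apify key is configured. The function name `DetectRedditCapabilities` and the capability strings are assumptions made for this example, modelled on the four operations named in the overview (URL scraping, post search, user search, community search); the identifiers in the PR may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical capability names; the PR's actual identifiers are not shown here.
const (
	CapScrapeURLs        = "scrapeurls"
	CapSearchPosts       = "searchposts"
	CapSearchUsers       = "searchusers"
	CapSearchCommunities = "searchcommunities"
)

// DetectRedditCapabilities returns the Reddit capabilities to advertise.
// With no Apify key (empty or whitespace only), nothing is advertised.
func DetectRedditCapabilities(apifyKey string) []string {
	if strings.TrimSpace(apifyKey) == "" {
		return nil
	}
	return []string{CapScrapeURLs, CapSearchPosts, CapSearchUsers, CapSearchCommunities}
}

func main() {
	fmt.Println(len(DetectRedditCapabilities("")))
	fmt.Println(len(DetectRedditCapabilities("example-key")))
}
```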
rapidfix left a comment
Changes looking good!
Depends on gopher-lab/tee-types#15
What
Add jobs for scraping Reddit
Testing
Tested with `curl` on the devbox, using an Apify API key.