-
Notifications
You must be signed in to change notification settings - Fork 2
Add types for Reddit scraper #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds comprehensive type definitions and support for Reddit scraping functionality. It introduces Reddit-specific job types, capabilities, validation logic, and custom JSON marshaling/unmarshaling for polymorphic Reddit item types.
- Adds Reddit job type and capabilities to the job system with validation
- Implements polymorphic Reddit item types (User, Post, Comment, Community) with custom JSON handling
- Creates comprehensive test coverage for Reddit argument validation and type marshaling
Reviewed Changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| types/reddit.go | Core Reddit types with polymorphic JSON marshaling for different item types |
| types/jobs.go | Extends job system with Reddit job type and capabilities |
| args/reddit.go | Reddit job arguments with validation and default value handling |
| pkg/util/set.go | Enhances Set methods to return self for method chaining |
| types/reddit_test.go | Tests for Reddit item JSON marshaling/unmarshaling |
| args/reddit_test.go | Tests for Reddit argument validation and defaults |
| args/unmarshaller.go | Adds Reddit job argument unmarshaling support |
| args/unmarshaller_test.go | Tests for job argument unmarshaling including Reddit |
| types/types_suite_test.go | Test suite setup for types package |
| args/args_suite_test.go | Test suite setup for args package |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| errs = append(errs, fmt.Errorf("%s is not a valid URL", q.URL)) | ||
| } else { | ||
| if !strings.HasSuffix(u.Host, redditDomainSuffix) { | ||
| errs = append(errs, fmt.Errorf("invalid Reddit URL %s", q.URL)) |
Copilot
AI
Aug 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The method assignment to q.Method only modifies the local copy since Go passes structs by value. This means the validation passes but the original struct is not updated. Use a pointer to modify the original: r.URLs[i].Method = strings.ToUpper(r.URLs[i].Method)
| errs = append(errs, fmt.Errorf("invalid Reddit URL %s", q.URL)) | |
| for i := range r.URLs { | |
| r.URLs[i].Method = strings.ToUpper(r.URLs[i].Method) | |
| if r.URLs[i].Method == "" { | |
| r.URLs[i].Method = "GET" | |
| } | |
| if !allowedHttpMethods.Contains(r.URLs[i].Method) { | |
| errs = append(errs, fmt.Errorf("%s is not a valid HTTP method", r.URLs[i].Method)) | |
| } | |
| u, err := url.Parse(r.URLs[i].URL) | |
| if err != nil { | |
| errs = append(errs, fmt.Errorf("%s is not a valid URL", r.URLs[i].URL)) | |
| } else { | |
| if !strings.HasSuffix(u.Host, redditDomainSuffix) { | |
| errs = append(errs, fmt.Errorf("invalid Reddit URL %s", r.URLs[i].URL)) |
| errs = append(errs, fmt.Errorf("%s is not a valid URL", q.URL)) | ||
| } else { | ||
| if !strings.HasSuffix(u.Host, redditDomainSuffix) { | ||
| errs = append(errs, fmt.Errorf("invalid Reddit URL %s", q.URL)) |
Copilot
AI
Aug 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the previous issue, this assignment only affects the local copy. Use r.URLs[i].Method = "GET" to modify the original struct.
| errs = append(errs, fmt.Errorf("invalid Reddit URL %s", q.URL)) | |
| for i := range r.URLs { | |
| r.URLs[i].Method = strings.ToUpper(r.URLs[i].Method) | |
| if r.URLs[i].Method == "" { | |
| r.URLs[i].Method = "GET" | |
| } | |
| if !allowedHttpMethods.Contains(r.URLs[i].Method) { | |
| errs = append(errs, fmt.Errorf("%s is not a valid HTTP method", r.URLs[i].Method)) | |
| } | |
| u, err := url.Parse(r.URLs[i].URL) | |
| if err != nil { | |
| errs = append(errs, fmt.Errorf("%s is not a valid URL", r.URLs[i].URL)) | |
| } else { | |
| if !strings.HasSuffix(u.Host, redditDomainSuffix) { | |
| errs = append(errs, fmt.Errorf("invalid Reddit URL %s", r.URLs[i].URL)) |
| r.MaxUsers = redditDefaultMaxUsers | ||
| r.Sort = redditDefaultSort | ||
|
|
||
| aux := &struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this slightly confuses me, why not unmarshalling on r directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to #7 (comment), this pattern is a go idiom to prevent infinite recursion w/ custom UnmarshalJSON methods... go will call your custom unmarshal method when using json.Unmarshal, which then uses json.Unmarshal...
I'll add an appropriate comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see. OK cool 👍
args/reddit.go
Outdated
| } | ||
| } | ||
|
|
||
| if len(errs) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be actually safe to call errors.Join instead of appending and checking at the end. errors.Join checks if the error is nil and discards it:
Join returns an error that wraps the given errors. Any nil error values are discarded. Join returns nil if every value in errs is nil. The error formats as the concatenation of the strings obtained by calling the Error method of each element of errs, with a newline between each string.
rapidfix
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just small nits, but overall should be safe to go
No description provided.