Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: deduplicate bucket names when ingesting from file #207

Merged
merged 4 commits into from
Sep 19, 2023

Conversation

lavafroth
Copy link
Contributor

Fixes #156

Changes

  • Adds a string hashset (empty struct map) to keep track of bucket names that are already seen
  • Continues the file scanning loop if a bucket name is seen
  • Sends the bucket to the channel and adds it to the set of seen bucket names otherwise

Copy link
Owner

@sa7mon sa7mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work! Can you add a unit test for this method to verify duplicate buckets don't get scanned? Finding a way to do it without adding test files would be ideal, but if the only clean way is to add a test_buckets.txt file to the module then so be it.

@lavafroth
Copy link
Contributor Author

I'll create a helper function that does this using an io.Reader so that we can test that with a strings.NewReader for a test string.

Copy link
Owner

@sa7mon sa7mon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last thing: README.md should be updated as well. Under the -bucket-file section after the example file code block add:

Bucket names listed multiple times will only be scanned once.

@sa7mon
Copy link
Owner

sa7mon commented Sep 19, 2023

Thanks for the contribution!

@sa7mon sa7mon merged commit 611a452 into sa7mon:main Sep 19, 2023
7 of 8 checks passed
@lavafroth lavafroth deleted the unique-bucket-names branch September 20, 2023 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

De-dupe bucket list when ingesting bucket file
2 participants