[CHANGE] Investigate ways of optimizing the NSRL database download #46

Closed
ross-spencer opened this issue Mar 31, 2024 · 2 comments · Fixed by #52

ross-spencer commented Mar 31, 2024

Is your feature request related to a problem? Please describe.

The NSRL database is currently 4 GB+ and takes over an hour to download. I have investigated using a WaitGroup to download chunks in multiple goroutines here: ross-spencer#1. This takes between 15 and 30 minutes off the download. That said, it can still take up to an hour to get this file.
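
As an illustration of the chunked approach described above, here is a minimal sketch of downloading byte ranges in parallel goroutines coordinated by a `sync.WaitGroup`. It is not the code from ross-spencer#1; the names `byteRange`, `splitRanges`, `fetchRange` and `downloadChunks` are hypothetical, and it assumes the server honours HTTP `Range` requests and that the total size is already known (for example from a prior `HEAD` request). `io.NewOffsetWriter` needs Go 1.20 or later.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
	"sync"
)

// byteRange is an inclusive byte span, as used in an HTTP Range header.
// (Hypothetical helper type, not part of the project.)
type byteRange struct{ start, end int64 }

// splitRanges divides size bytes into at most n contiguous inclusive ranges.
func splitRanges(size int64, n int) []byteRange {
	if size <= 0 || n <= 0 {
		return nil
	}
	chunk := (size + int64(n) - 1) / int64(n) // ceiling division
	var out []byteRange
	for start := int64(0); start < size; start += chunk {
		end := start + chunk - 1
		if end > size-1 {
			end = size - 1
		}
		out = append(out, byteRange{start, end})
	}
	return out
}

// fetchRange requests one range and writes it at its own offset in f.
func fetchRange(url string, r byteRange, f *os.File) error {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", r.start, r.end))
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusPartialContent {
		return fmt.Errorf("range %d-%d: unexpected status %s", r.start, r.end, resp.Status)
	}
	// OffsetWriter lets each goroutine write to a separate region of the file.
	_, err = io.Copy(io.NewOffsetWriter(f, r.start), resp.Body)
	return err
}

// downloadChunks fetches url into dst using up to n concurrent range requests.
// size is the total length in bytes, assumed to be known in advance.
func downloadChunks(url, dst string, size int64, n int) error {
	f, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer f.Close()

	ranges := splitRanges(size, n)
	errs := make([]error, len(ranges)) // one slot per goroutine, no locking needed
	var wg sync.WaitGroup
	for i, r := range ranges {
		wg.Add(1)
		go func(i int, r byteRange) {
			defer wg.Done()
			errs[i] = fetchRange(url, r, f)
		}(i, r)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return err
		}
	}
	return nil
}
```

Parallel ranges only help when a single connection is the bottleneck; they do not reduce the number of bytes on the wire.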

Describe the solution you'd like

One option is to consider merging ross-spencer#1.

Another is to consider compressing the bolt db as it compresses fairly efficiently:

4.0G	nsrl.db
1.3G	nsrl.tar.xz

This would also reduce download times, but would require a decompression function on the client side. It might still be complemented by other download options such as https://github.com/melbahja/got.

Describe alternatives you've considered

Also considered was enabling gzip compression in nginx for application/octet-stream, but this would prevent chunking (I believe). It may simply be quicker to host a compressed file and let the app extract it.

@ross-spencer ross-spencer added the enhancement New feature or request label Mar 31, 2024
@steffenfritz steffenfritz added this to the v1.0.0 milestone Mar 31, 2024
steffenfritz (Owner) commented

Thanks for the pull request!

Thinking about your proposed ways to optimize the download, I prefer the compressed file as it reduces the size on the wire.

  1. Compression only takes time when creating a new version from RDS, so there is no impact on users.
  2. Decompression takes time once, after ftrove downloads the file, but it was more than four times faster than the download.

So, unless I'm missing something, gzip would be suitable:

1,5G  1 Apr 15:22 nsrl.db.gz
4,0G  1 Apr 15:22 nsrl.db

ross-spencer (Author) commented

That seems like a pretty good compression rate!

@steffenfritz steffenfritz linked a pull request Apr 10, 2024 that will close this issue
steffenfritz added a commit that referenced this issue Apr 10, 2024
…f-optimizing-the-nsrl-database-download

Closing #46