How to make a database of a very large codebase very quickly? #8342
Description of the issue
I have close to 10K small projects. Each project has its own CSS/HTML/JS files. I often want to scan the entire catalog for some query. The catalog also keeps changing as packages are added or modified.
My requirement is to scan the entire catalog on demand as efficiently as possible.
I have explored two approaches:
- Upload all packages as individual databases to LGTM and scan from there. The challenge here is that scanning the entire catalog is very slow: with 9 query workers it takes 7 hours.
- Create one big LGTM database and scan it only on demand. The challenge here is that, combined, all these packages are around 27GB. How do I create an LGTM database in the minimum time possible?
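To put the first approach in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes exactly 10,000 projects (the real count is "close to 10K") and that all 9 workers stay evenly busy for the full 7 hours, neither of which is exact.

```python
# Back-of-the-envelope numbers taken from the description above.
PROJECTS = 10_000        # "close to 10K" projects, rounded (assumption)
WORKERS = 9              # query workers
WALL_CLOCK_HOURS = 7     # wall-clock time for one full catalog scan


def seconds_per_project(projects: int, workers: int, hours: float) -> float:
    """Average worker-seconds spent per project, assuming evenly loaded workers."""
    return hours * 3600 * workers / projects


per_project = seconds_per_project(PROJECTS, WORKERS, WALL_CLOCK_HOURS)
print(f"{per_project:.1f} worker-seconds per project")
```

Under those assumptions each small project costs roughly 23 worker-seconds per scan.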
I am using a Standard_F32s_v2 VM on Azure, which is a compute-optimized VM, and it is still taking a long time to run (at least 5 hours so far).
What should my VM configuration be? Is more power better? Should I go for a memory-optimized or a compute-optimized VM?
Are there any more command parameters that I can specify to speed this up? I currently use threads=0 while creating a database, though I don't know its significance.
Any other inputs would be highly appreciated.