How to make a database of a very large codebase very quickly? #8342

**Description of the issue**
I have close to 10K small projects, each with its own CSS/HTML/JS files. I often want to scan the entire catalog for some query. The catalog also keeps changing as packages are added or modified.

My requirement is to scan the entire catalog on demand as efficiently as possible.

I have explored 2 approaches:

1. Upload all packages as individual databases to LGTM and scan from there. The challenge here is that scanning the entire catalog is very slow: with 9 query workers, a full scan takes 7 hours.

2. Create one big LGTM database and scan it on demand. The challenge here is that, combined, all these packages come to around 27GB. How do I create an LGTM database of that size in the minimum time possible?
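To put the numbers from approach 1 in perspective, here is a rough back-of-the-envelope calculation. It assumes exactly 10,000 projects and that all 9 workers stay busy for the full 7 hours, so the results are only approximate.

```python
# Rough throughput implied by approach 1.
# Assumptions: exactly 10,000 projects, 9 workers fully busy for 7 hours.
projects = 10_000
workers = 9
wall_clock_hours = 7

# Total worker time spent across the whole scan, in seconds.
worker_seconds = workers * wall_clock_hours * 3600

# Average worker time consumed per project.
seconds_per_project = worker_seconds / projects

# Projects completed per wall-clock hour across all workers.
projects_per_hour = projects / wall_clock_hours

print(f"{seconds_per_project:.1f} worker-seconds per project")
print(f"{projects_per_hour:.0f} projects per hour")
```

If these assumptions hold, each small project costs roughly 20+ worker-seconds on average, which suggests per-database overhead rather than raw code size may dominate the scan time.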

I am using a Standard_F32s_v2 VM on Azure, which is compute optimized, and database creation is still taking a long time (at least 5 hours so far).
What should my VM configuration be? Is more power better? Should I go for a memory-optimized or a compute-optimized VM?
Are there any more command parameters that I can specify to speed this up? I currently use `threads=0` while creating the database, though I don't know its significance.
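For concreteness, here is a minimal sketch of the kind of invocation in question. It assumes the database is built with the CodeQL CLI's `database create` command. The directory names, the `javascript` language choice, and the RAM value are hypothetical placeholders, not my actual settings. As far as the CodeQL CLI manual describes it, `--threads=0` means one thread per core and `--ram` caps memory use in MB.

```python
import shlex


def build_create_command(db_dir, source_root, threads=0, ram_mb=None):
    """Build an argument list for `codeql database create`.

    Assumption: the CodeQL CLI is the tool in use. All values passed in
    below are illustrative placeholders.
    """
    cmd = [
        "codeql", "database", "create", db_dir,
        "--language=javascript",          # assumed language for a JS/HTML catalog
        f"--source-root={source_root}",
        f"--threads={threads}",           # 0 = one thread per core
    ]
    if ram_mb is not None:
        cmd.append(f"--ram={ram_mb}")     # memory budget in MB
    return cmd


# Hypothetical paths and RAM budget, only to show the shape of the command.
command = build_create_command("catalog-db", "catalog/", ram_mb=48000)
print(shlex.join(command))
```

The sketch only prints the command; it could be passed to `subprocess.run(command, check=True)` to execute it.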


Any other input would be highly appreciated.

