
How to make a database of a very large codebase very quickly? #8342

@Shivam60


Description of the issue
I have close to 10K small projects, each with its own CSS/HTML/JS files. I often want to scan the entire catalog with some query. The catalog also keeps changing as packages are added or modified.

My requirements are to scan the entire catalog on demand in the most efficient manner possible.

I have explored 2 approaches:

  1. Upload all packages as individual databases to LGTM and scan from there. The challenge here is that scanning the entire catalog is very slow: with 9 query workers it takes 7 hours.

  2. Create one big LGTM database and scan it only on demand. The challenge here is that, combined, all these packages are around 27GB. How do I create an LGTM database in the minimum time possible?
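
To put the numbers from approach 1 in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the work is spread evenly across workers and scales linearly with worker count, which may not hold in practice. The function names and the 32-worker figure are hypothetical and only for illustration.

```python
def worker_seconds_per_project(projects: int, workers: int, hours: float) -> float:
    """Average worker time per project, assuming an even spread of work."""
    return workers * hours * 3600 / projects


def estimated_hours(projects: int, workers: int, seconds_per_project: float) -> float:
    """Wall-clock estimate, assuming perfectly linear scaling with workers."""
    return projects * seconds_per_project / (workers * 3600)


# Figures from approach 1: ~10K projects, 9 query workers, 7 hours.
per_project = worker_seconds_per_project(10_000, 9, 7)
print(f"{per_project:.1f} worker-seconds per project")  # 22.7

# Hypothetical: the same catalog with 32 workers, under the same assumptions.
print(f"{estimated_hours(10_000, 32, per_project):.1f} hours")  # 2.0
```

So each project costs roughly 23 worker-seconds on average, and under these idealized assumptions the total time would only shrink in proportion to the number of workers.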

I am using a Standard_F32s_v2 VM on Azure, which is a compute-optimized VM, and database creation is still taking a long time (at least 5 hours so far).
What should my VM configuration be? Is more power better? Should I go for a memory-optimized or a compute-optimized VM?
Are there any more command parameters that I can specify to speed this up? I currently use threads=0 while creating the database, though I don't know its significance.
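
For reference, here is a minimal Python sketch of the kind of invocation in question, assuming the database is built with the CodeQL CLI's `codeql database create` command. As far as I can tell from the CLI manual, `--threads=0` means one thread per core, a negative value `-N` leaves N cores unused (but still uses at least one thread), and `--ram` sets a memory budget in MB. The helper names, the paths, and the choice of `javascript` as the language are assumptions made for illustration only.

```python
import os
from typing import List, Optional


def build_create_command(db_path: str, source_root: str,
                         language: str = "javascript",
                         threads: int = 0,
                         ram_mb: Optional[int] = None) -> List[str]:
    """Assemble an argument list for `codeql database create` (not executed here)."""
    cmd = [
        "codeql", "database", "create", db_path,
        f"--language={language}",
        f"--source-root={source_root}",
        f"--threads={threads}",
    ]
    if ram_mb is not None:
        cmd.append(f"--ram={ram_mb}")
    return cmd


def effective_threads(threads: int, cores: Optional[int] = None) -> int:
    """Illustrate how a --threads value maps to an actual thread count."""
    cores = cores or os.cpu_count() or 1
    if threads == 0:
        return cores                    # 0: one thread per core
    if threads < 0:
        return max(1, cores + threads)  # -N: leave N cores unused, minimum 1
    return threads


# Hypothetical paths; pass the list to subprocess.run(cmd, check=True) to execute.
cmd = build_create_command("catalog-db", "./catalog", threads=0)
print(" ".join(cmd))
print(effective_threads(0, cores=32))  # 32
```

On a 32-core machine, threads=0 would therefore already request 32 threads, so raising the thread count further is unlikely to be the missing knob.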

Any other inputs would be highly appreciated.

Labels: question (Further information is requested)