Skip to content

Query with REGEX expressions is consuming too much memory #775

@ajnavarro

Description

@ajnavarro

The following query:

SELECT f.repository_id, f.blob_hash, f.commit_hash, f.file_path
      FROM (
          SELECT blob_hash, repository_id
          FROM blobs
          WHERE NOT IS_BINARY(blob_content) AND (
              blob_content REGEXP '(?i)facebook.*[\'\\"][0-9a-f]{32}[\'\\"]'
              OR blob_content REGEXP '(?i)twitter.*[\'\\"][0-9a-zA-Z]{35,44}[\'\\"]'
              OR blob_content REGEXP '(?i)github.*[\'\\"][0-9a-zA-Z]{35,40}[\'\\"]'
              OR blob_content REGEXP 'AKIA[0-9A-Z]{16}'
              OR blob_content REGEXP '(?i)heroku.*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}'
              OR blob_content REGEXP '.*-----BEGIN ((RSA|DSA|OPENSSH|SSH2|EC) )?PRIVATE KEY-----.*'
          )
      ) h
      INNER JOIN commit_files f
      ON h.blob_hash = f.blob_hash
          AND h.repository_id = f.repository_id
          AND f.file_path NOT REGEXP '^vendor.*'

Is consuming 9 times more memory than previous v0.19.0 version

We should have a look into that.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't workingperformancePerformance improvements

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions