gitlint slowdown inside docker #129

Closed
melg8 opened this issue Apr 19, 2020 · 2 comments
Labels
bug User-facing bugs docs Documentation related items

melg8 commented Apr 19, 2020

Problem

When using gitlint from a Docker container (either jorisroovers/gitlint or a custom image), its speed drops by orders of magnitude. I tried to lint my own repo in CI using a Docker container and ran into a slowdown of about 30 seconds; then I tested the same command on my host OS and saw the linting results right away.

Expected Behavior

Fast linting of the repository's git history from a Docker container.

Actual Behavior

Super slow linting inside the Docker container.

Steps to Reproduce the Problem

Test running natively on the host system

  1. install python, pip, git
  2. pip install gitlint
  3. clone a repo, for example gitlint's own:
        git clone https://github.com/jorisroovers/gitlint.git
    
  4. cd gitlint
  5. Run from host:
        time gitlint -s --commits HEAD
    
    Results:
    real 0m4,951s
    user 0m3,709s
    sys 0m2,624s

Test running inside a Docker container

  1. install git, docker
  2. docker pull jorisroovers/gitlint:latest
  3. Run from host:
        time docker run -v $(pwd):/repo jorisroovers/gitlint:latest -s --commits HEAD
    
    Results:
    real 6m31,495s
    user 0m0,056s
    sys 0m0,026s
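To put "orders of magnitude" into numbers, here is a small Python sketch that converts the `real` values reported by bash's `time` (printed with a comma decimal separator in the reporter's locale) into seconds and computes the ratio. The `to_seconds` helper is hypothetical, written only for this comparison.

```python
import re

# Matches bash `time` fields such as "6m31,495s" or "0m4.951s".
# Accepts either "," or "." as the decimal separator, since some
# locales (like the reporter's) print commas.
_TIME_RE = re.compile(r"^(\d+)m(\d+)[.,](\d+)s$")


def to_seconds(stamp: str) -> float:
    """Convert a bash `time` field like '6m31,495s' to seconds."""
    match = _TIME_RE.match(stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time format: {stamp!r}")
    minutes, secs, frac = match.groups()
    return int(minutes) * 60 + float(f"{secs}.{frac}")


native = to_seconds("0m4,951s")   # host run, "real" value
docker = to_seconds("6m31,495s")  # Docker run, "real" value
print(f"slowdown: {docker / native:.0f}x")  # prints "slowdown: 79x"
```

That works out to roughly a 79x slowdown, i.e. about two orders of magnitude.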

Specifications

  • Host system: Xubuntu 18.04
  • docker -v: Docker version 19.03.8, build afacb8b7f0
  • uname -a: Linux user-computer 4.15.0-96-generic #97-Ubuntu SMP Wed Apr 1 03:25:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
--debug flag output from host run:
gitlint -s --debug --commits HEAD
DEBUG: gitlint.cli To report issues, please visit https://github.com/jorisroovers/gitlint/issues
DEBUG: gitlint.cli Platform: Linux-4.15.0-96-generic-x86_64-with-Ubuntu-18.04-bionic
DEBUG: gitlint.cli Python version: 3.6.9 (default, Nov  7 2019, 10:44:02) 
[GCC 8.3.0]
DEBUG: gitlint.cli Git version: git version 2.26.1
DEBUG: gitlint.cli Gitlint version: 0.13.1
DEBUG: gitlint.cli GITLINT_USE_SH_LIB: [NOT SET]
DEBUG: gitlint.cli Configuration
config-path: None
[GENERAL]
extra-path: None
contrib: []
ignore: 
ignore-merge-commits: True
ignore-fixup-commits: True
ignore-squash-commits: True
ignore-revert-commits: True
ignore-stdin: False
staged: False
verbosity: 0
debug: True
target: /tmp/gitlint
[RULES]
  I1: ignore-by-title
     ignore=all
     regex=None
  I2: ignore-by-body
     ignore=all
     regex=None
  T1: title-max-length
     line-length=72
  T2: title-trailing-whitespace
  T6: title-leading-whitespace
  T3: title-trailing-punctuation
  T4: title-hard-tab
  T5: title-must-not-contain-word
     words=WIP
  T7: title-match-regex
     regex=.*
  B1: body-max-line-length
     line-length=80
  B5: body-min-length
     min-length=20
  B6: body-is-missing
     ignore-merge-commits=True
  B2: body-trailing-whitespace
  B3: body-hard-tab
  B4: body-first-line-empty
  B7: body-changed-file-mention
     files=
  M1: author-valid-email
     regex=[^@ ]+@[^@ ]+\.[^@ ]+

DEBUG: gitlint.cli No --msg-filename flag, no or empty data passed to stdin. Using the local repo.
DEBUG: gitlint.cli Linting 289 commit(s)
DEBUG: gitlint.lint Linting commit c77d4a1009a8e9b567134b295720f92173911b33
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
Version bump to 0.14.0dev

Bumped version to 0.14.0dev.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 13:59:12 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['gitlint/__init__.py']
-----------------------
DEBUG: gitlint.lint Linting commit d3dba4dd1a139d49ae17babbb0b87887d4a7dc61
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
0.13.1 release

- Patch to enable --staged flag for pre-commit.
- Minor doc updates #109

Full Release details in CHANGELOG.md.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 13:39:51 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['gitlint/__init__.py']
-----------------------
DEBUG: gitlint.lint Linting commit fadf697f09b4f69703b1fa6ecbfc9d8f6d991b08
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
./run_tests.sh tweaks

Most importantly, a fix to properly deactivate virtualenv.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 12:31:53 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['CHANGELOG.md', 'gitlint/__init__.py', 'run_tests.sh']
-----------------------
DEBUG: gitlint.lint Linting commit 4acc477a19b772e8298e5dac72903e684ab215ad
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
Docs: pre-commit invocation details

Minor doc edit outlining how to invoke gitlint with additional arguments
through gitlint.

This fixes #109

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 10:01:38 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['docs/index.md']
-----------------------
DEBUG: gitlint.lint Linting commit de355864f4aa4b7973058c2203a3cee10fa8ffc8
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
Use tools\windows\run_tests.bat in Github Actions

Previously we directly invoked pytest, using run_tests.bat is cleaner.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-25 13:28:58 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['.github/workflows/checks.yml']
-----------------------
DEBUG: gitlint.lint Linting commit 2471e2f7d2fdd6476b763a46940e3e116a16f7c9
--debug flag output inside docker:
docker run -v $(pwd):/repo jorisroovers/gitlint:latest -s --debug --commits HEAD
DEBUG: gitlint.cli To report issues, please visit https://github.com/jorisroovers/gitlint/issues
DEBUG: gitlint.cli Platform: Linux-4.15.0-96-generic-x86_64-with
DEBUG: gitlint.cli Python version: 3.8.1 (default, Jan 18 2020, 02:42:17) 
[GCC 9.2.0]
DEBUG: gitlint.cli Git version: git version 2.24.1
DEBUG: gitlint.cli Gitlint version: 0.13.1
DEBUG: gitlint.cli GITLINT_USE_SH_LIB: [NOT SET]
DEBUG: gitlint.cli Configuration
config-path: None
[GENERAL]
extra-path: None
contrib: []
ignore: 
ignore-merge-commits: True
ignore-fixup-commits: True
ignore-squash-commits: True
ignore-revert-commits: True
ignore-stdin: False
staged: False
verbosity: 0
debug: True
target: /repo
[RULES]
  I1: ignore-by-title
     ignore=all
     regex=None
  I2: ignore-by-body
     ignore=all
     regex=None
  T1: title-max-length
     line-length=72
  T2: title-trailing-whitespace
  T6: title-leading-whitespace
  T3: title-trailing-punctuation
  T4: title-hard-tab
  T5: title-must-not-contain-word
     words=WIP
  T7: title-match-regex
     regex=.*
  B1: body-max-line-length
     line-length=80
  B5: body-min-length
     min-length=20
  B6: body-is-missing
     ignore-merge-commits=True
  B2: body-trailing-whitespace
  B3: body-hard-tab
  B4: body-first-line-empty
  B7: body-changed-file-mention
     files=
  M1: author-valid-email
     regex=[^@ ]+@[^@ ]+\.[^@ ]+

DEBUG: gitlint.cli No --msg-filename flag, no or empty data passed to stdin. Using the local repo.
DEBUG: gitlint.cli Linting 289 commit(s)
DEBUG: gitlint.lint Linting commit c77d4a1009a8e9b567134b295720f92173911b33
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
Version bump to 0.14.0dev

Bumped version to 0.14.0dev.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 13:59:12 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['gitlint/__init__.py']
-----------------------
DEBUG: gitlint.lint Linting commit d3dba4dd1a139d49ae17babbb0b87887d4a7dc61
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
0.13.1 release

- Patch to enable --staged flag for pre-commit.
- Minor doc updates #109

Full Release details in CHANGELOG.md.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 13:39:51 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['gitlint/__init__.py']
-----------------------
DEBUG: gitlint.lint Linting commit fadf697f09b4f69703b1fa6ecbfc9d8f6d991b08
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
./run_tests.sh tweaks

Most importantly, a fix to properly deactivate virtualenv.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 12:31:53 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['CHANGELOG.md', 'gitlint/__init__.py', 'run_tests.sh']
-----------------------
DEBUG: gitlint.lint Linting commit 4acc477a19b772e8298e5dac72903e684ab215ad
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
Docs: pre-commit invocation details

Minor doc edit outlining how to invoke gitlint with additional arguments
through gitlint.

This fixes #109

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-26 10:01:38 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['docs/index.md']
-----------------------
DEBUG: gitlint.lint Linting commit de355864f4aa4b7973058c2203a3cee10fa8ffc8
DEBUG: gitlint.lint Commit Object
--- Commit Message ----
Use tools\windows\run_tests.bat in Github Actions

Previously we directly invoked pytest, using run_tests.bat is cleaner.

--- Meta info ---------
Author: Joris Roovers <joris.roovers@gmail.com>
Date:   2020-02-25 13:28:58 +0100
is-merge-commit:  False
is-fixup-commit:  False
is-squash-commit: False
is-revert-commit: False
Branches: ['master']
Changed Files: ['.github/workflows/checks.yml']
-----------------------
DEBUG: gitlint.lint Linting commit 2471e2f7d2fdd6476b763a46940e3e116a16f7c9
^C
Aborted!

Hints

I'm not a Python programmer, so this may be misleading, but here's what I've tested so far.
The problem persists even if I:

  • rebuild my own Docker image
  • rebuild the image with an Ubuntu base
  • rebuild the image with a different Python version: 2.7, 3.5, 3.8 (e.g. base image python:2.7-alpine)
  • rebuild the image with a different git version
  • install gitlint from the repo using setup.py

I also tried to profile gitlint using cProfile:
 python -m cProfile ./gitlint/cli.py -s --commits HEAD

and found that the big difference is in these calls (columns: ncalls, tottime, percall, cumtime, percall, filename:lineno(function)):
native, ~4.9 seconds cumulative:
10207/7710 0.008 0.000 4.859 0.001 cache.py:7(_try_cache)
inside Docker, ~394 seconds cumulative:
10207/7710 0.008 0.000 394.278 0.051 cache.py:7(_try_cache)

Maybe this gives you a hint.
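The comparison above boils down to finding which call's cumulative time (`cumtime`) explodes inside the container. Below is a minimal sketch using the standard-library `cProfile` and `pstats` modules to rank calls by cumulative time; `profile_top` and `slow_sum` are hypothetical names for illustration, not part of gitlint.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10, **kwargs):
    """Run func under cProfile; return its result and a report sorted by cumtime."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        result = func(*args, **kwargs)
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "cumulative" orders entries by time spent in the call and its callees,
    # which is the column that grows from ~4.9s to ~394s above.
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


def slow_sum(n):
    """Toy workload standing in for the code under investigation."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    value, report = profile_top(slow_sum, 100_000)
    print(report)
```

From the command line, `python -m cProfile -s cumulative ./gitlint/cli.py -s --commits HEAD` should give the same ordering for gitlint itself (the first `-s` is cProfile's sort option; the second is passed through to gitlint).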

P.S.

  • If you can suggest a hotfix or workaround for this problem that doesn't require dropping Docker or the full git history check, it would be much appreciated.
  • If you need more detailed info or some tests, I'll try to provide them.
  • Anyhow, thanks for your work! gitlint is a great tool which helps to keep git history nice and clean.

melg8 commented Apr 19, 2020

Update

Story:

I dug through the cProfile output some more and found that calls to _git are the more likely source of the trouble:

870    0.022    0.000  394.980    0.454 git.py:28(_git)

Then I found that _git uses an sh call, and this call is repeated many times (870 calls in the profile above).
I searched Google for something like "docker python sh git slow"
and found this great issue.
So the problem is basically that "spawning PTY processes is many times slower on Docker".
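The thread doesn't spell out why spawning is slower in Docker. One plausible mechanism, which is an assumption here and not confirmed in this issue, is that the container's open-file limit (`RLIMIT_NOFILE`) is much higher than on the host, and the spawned child spends time closing every possible file descriptor up to that limit before running git. That would also explain why the `--ulimit nofile=1024` workaround helps. The sketch below just reports the limit so the host and container can be compared; `nofile_limits`, `describe` and the 1024 threshold are hypothetical choices for illustration.

```python
import resource

# Threshold chosen for illustration only: the workaround in this thread
# uses 1024, so a soft limit far above that is worth noting.
SUSPICIOUS_SOFT_LIMIT = 1024


def nofile_limits():
    """Return the (soft, hard) open-file limits for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"RLIMIT_NOFILE soft={describe(soft)} hard={describe(hard)}")
    # Compare against RLIM_INFINITY first: Python may represent it as -1.
    if soft == resource.RLIM_INFINITY or soft > SUSPICIOUS_SOFT_LIMIT:
        print("High soft limit: children that close every fd up to it may spawn slowly.")
```

Run it once on the host and once inside the container; a much higher soft limit in the container would be consistent with (though not proof of) the explanation above.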

Workaround

I tried the workaround from that post, and it seems to work for me!

time docker run --ulimit nofile=1024 -v $(pwd):/repo jorisroovers/gitlint:latest -s --commits HEAD

Result:
real 0m9,539s
user 0m0,038s
sys 0m0,018s
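`--ulimit nofile=1024` lowers the open-file limit for the whole container. If gitlint is launched from a Python wrapper you control, a similar effect can presumably be had from inside the process by lowering the soft `RLIMIT_NOFILE` before spawning children, since child processes inherit it. This is an untested sketch, not something gitlint itself does, and `lowered_nofile` is a hypothetical helper. Unlike `--ulimit`, it only changes the soft limit and leaves the hard limit alone.

```python
import contextlib
import resource


@contextlib.contextmanager
def lowered_nofile(limit=1024):
    """Temporarily cap the soft open-file limit at `limit`; restore it on exit.

    Child processes started inside the block inherit the capped limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never raise the limit: only cap it. If soft is unlimited, so is hard
    # (soft <= hard), so setting `limit` is always permitted.
    target = limit if soft == resource.RLIM_INFINITY else min(limit, soft)
    resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    try:
        yield target
    finally:
        # Restoring the original soft value is allowed since it is <= hard.
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))
```

For example, `with lowered_nofile(): subprocess.run(["gitlint", "-s", "--commits", "HEAD"], check=True)` would run gitlint under the lower limit, assuming gitlint is on `PATH`.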

But I think this should still be investigated further as a regression in the tool's behavior.
For now, it might be nice to at least add a comment to the Dockerfile so new users can avoid this problem.

jorisroovers (Owner) commented

Hi!

Thanks for the elaborate bug report!

I was able to reproduce this:

# Native MacOS, MBP 2019 (2.6 GHz 6-Core Intel Core i7), gitlint 0.13.1
$ time gitlint -s --commits HEAD
gitlint -s --commits HEAD  5.66s user 7.40s system 101% cpu 12.835 total

# Docker on MacOS, same system
# Note that docker on Mac will always be slower since it runs in a VM
$ time docker run -v $(pwd):/repo jorisroovers/gitlint:latest -s --commits HEAD
docker run -v $(pwd):/repo jorisroovers/gitlint:latest -s --commits HEAD  0.05s user 0.03s system 0% cpu 8:33.18 total

$ time docker run --ulimit nofile=1024 -v $(pwd):/repo jorisroovers/gitlint:latest -s --commits HEAD
docker run --ulimit nofile=1024 -v $(pwd):/repo jorisroovers/gitlint:latest -  0.03s user 0.07s system 0% cpu 1:17.45 total

So ~8.5 min vs ~1.3 min using your suggested workaround. I repeated this a few times and got similar results each time.

I'll make sure to update the docs and Dockerfile the next time I work on gitlint and before closing this out :-)

jorisroovers added the bug (User-facing bugs) and docs (Documentation related items) labels May 8, 2020
jorisroovers added a commit that referenced this issue Oct 24, 2020
- IMPORTANT: Gitlint 0.14.x will be the last gitlint release to support Python
  2.7 and Python 3.5, as both are EOL which makes it difficult to keep
  supporting them.
- Python 3.9 support
- New Rule: title-min-length enforces a minimum length on titles
  (default: 5 chars) (#138)
- New Rule: body-match-regex allows users to enforce that the commit-msg body
  matches a given regex (#130)
- New Rule: ignore-body-lines allows users to ignore parts of a commit by
  matching a regex against the lines in a commit message body (#126)
- Named Rules allow users to have multiple instances of the same rule active at
  the same time. This is useful when you want to enforce the same rule multiple
  times but with different options (#113, #66)
- User-defined Configuration Rules allow users to dynamically change gitlint's
  configuration and/or the commit before any other rules are applied.
- The commit-msg hook has been re-written in Python (it contained a lot of
  Bash before), fixing a number of platform specific issues. Existing users
  will need to reinstall their hooks
  (gitlint uninstall-hook; gitlint install-hook) to make use of this.
- Most general options can now be set through environment variables (e.g. set
  the general.ignore option via GITLINT_IGNORE=T1,T2). The list of available
  environment variables can be found in the configuration documentation.
- Users can now use self.log.debug("my message") for debugging purposes in
  their user-defined rules. Debug messages will show up when running
  gitlint --debug.
- Breaking: User-defined rule id's can no longer start with 'I', as those are
  reserved for built-in gitlint ignore rules.
- New RegexOption rule option type for use in user-defined rules. By using the
  RegexOption, regular expressions are pre-validated at gitlint startup and
  compiled only once which is much more efficient when linting multiple commits.
- Bugfixes:
 - Improved UTF-8 fallback on Windows (ongoing - #96)
 - Windows users can now use the 'edit' function of the commit-msg hook (#94)
 - Doc update: Users should use --ulimit nofile=1024 when invoking gitlint
   using Docker (#129)
 - The commit-msg hook was broken in Ubuntu's gitlint package due to a
   python/python3 mismatch (#127)
 - Better error message when no git username is set (#149)
 - Options can now actually be set to None (from code) to make them optional.
 - Ignore rules no longer have "None" as default regex, but an empty regex -
   effectively disabling them by default (as intended).
- Contrib Rules:
 - Added 'ci' and 'build' to conventional commit types (#135)
- Under-the-hood: minor performance improvements (removed some unnecessary
  regex matching), test improvements, improved debug logging, CI runs on pull
  requests, PR request template.

Full Release details in CHANGELOG.md.