Reduce size of large commit messages #21

Open
rtyley opened this issue Jul 15, 2013 · 3 comments

Owner

rtyley commented Jul 15, 2013

@chenri mentions in #20 that his repo contains a very large commit message:

due to someone repeating a short line a huge number of times (editing error?) so that a single msg line was 6Mb long. We never detected this until we tried bfg last week.

It would be nice if The BFG had the ability to somehow reduce the size of large commit messages - but how exactly would it do it? Some options:

  • Simply truncate the entire message after the first X KB?
  • Allow the user to run a --replace-message-text option, similar to --replace-text? This would work for a repeated value on a single line, so long as the line wasn't too long, but how would it work on a commit message with 1 million lines, each distinct?
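
For illustration, the first option (truncating after the first X KB) could look roughly like the sketch below. It is written in Java only as an example; `MessageTruncator` and `truncate` are hypothetical names, not anything that exists in The BFG, and the sketch assumes the message is treated as UTF-8 with a non-negative limit given in bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of The BFG.
final class MessageTruncator {
    private MessageTruncator() {}

    /** Cuts the message to at most maxBytes UTF-8 bytes without splitting a character. */
    static String truncate(String message, int maxBytes) {
        byte[] utf8 = message.getBytes(StandardCharsets.UTF_8);
        if (utf8.length <= maxBytes) {
            return message; // already small enough
        }
        int end = maxBytes;
        // If the first dropped byte is a UTF-8 continuation byte (10xxxxxx),
        // back up to the start of that character so it is dropped whole.
        while (end > 0 && (utf8[end] & 0xC0) == 0x80) {
            end--;
        }
        return new String(utf8, 0, end, StandardCharsets.UTF_8);
    }
}
```

This handles the "1 million distinct lines" case as well, since it never looks at the content, but it also discards everything after the cut-off regardless of whether it was junk.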

chenri commented Jul 15, 2013

If the option accepts perl-compatible regex, then this is possible and simple:

--replace-message-text 'qr{^.{1000000,1000000000}$}s'

The above will match msg lines that are > 1 MB and < 1 GB in size.
Note that we found another instance where a single short msg line has been repeated millions of times.
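
As a rough sketch of the same idea under the assumption that the option were backed by `java.util.regex` rather than Perl: the `qr{...}s` wrapper is Perl syntax, and the `/s` behaviour would be expressed with `Pattern.DOTALL`. `OversizedMessage`, `atLeast` and `replaceIfOversized` are hypothetical names, and the threshold is a parameter so the example can be tried on small input.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of The BFG.
final class OversizedMessage {
    private OversizedMessage() {}

    /** Builds a pattern that matches a whole message of at least minChars characters. */
    static Pattern atLeast(int minChars) {
        // DOTALL lets '.' match line terminators too, like Perl's /s flag.
        return Pattern.compile("^.{" + minChars + ",}$", Pattern.DOTALL);
    }

    /** Returns the placeholder if the message is oversized, otherwise the message itself. */
    static String replaceIfOversized(String message, int minChars, String placeholder) {
        return atLeast(minChars).matcher(message).matches() ? placeholder : message;
    }
}
```

Note that the bound counts characters, not bytes, so it only approximates a size in MB for non-ASCII text.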

Thanks

Richard


Owner Author

rtyley commented Jul 16, 2013

The BFG uses the regular expressions provided by the Java SDK, which are quite extensive:

http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

I'll hopefully get some time in the next 24 hours to look at this.
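
To make the Java regex point concrete, here is a minimal sketch of how the "same short line repeated many times" case might be collapsed with `java.util.regex`. `RepeatedLineCollapser` and `collapse` are hypothetical names for illustration, not The BFG's implementation, and the sketch assumes lines are separated by `\n`.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of The BFG.
final class RepeatedLineCollapser {
    // MULTILINE makes ^ and $ match at line boundaries;
    // \1 refers back to the line captured by the first group.
    private static final Pattern REPEATED_LINE =
            Pattern.compile("^(.*)(?:\\n\\1)+$", Pattern.MULTILINE);

    private RepeatedLineCollapser() {}

    /** Replaces each run of consecutive identical lines with a single copy. */
    static String collapse(String message) {
        return REPEATED_LINE.matcher(message).replaceAll("$1");
    }
}
```

For example, `collapse("fix typo\nfix typo\nfix typo\nreal text")` gives `"fix typo\nreal text"`. On a message with millions of repeats a backtracking regex engine may run into time or stack limits, so a plain line-by-line loop might turn out to be the more robust choice.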


chenri commented Jul 26, 2013

Hi Roberto,

Did you get a chance to work on this? It would be really nice if bfg-repo-cleaner could handle huge msg sizes.

Thanks

Richard


rtyley added a commit that referenced this issue Sep 2, 2013
...need to add test against tag messages
rtyley added a commit that referenced this issue Sep 2, 2013
The most unusual case to be handled is when... the only thing that needs
to be cleaned in a repo is the annotated tag message text itself.
rtyley added a commit that referenced this issue Sep 15, 2013
...need to add test against tag messages
rtyley added a commit that referenced this issue Sep 15, 2013
The most unusual case to be handled is when... the only thing that needs
to be cleaned in a repo is the annotated tag message text itself.
rtyley added a commit that referenced this issue Sep 15, 2013
The most unusual case to be handled is when... the only thing that needs
to be cleaned in a repo is the annotated tag message text itself.

Fixing -rt switch to -rmt
rtyley added a commit that referenced this issue Nov 18, 2013
Still some scruff in here...

Chunky refactor unifying the cleaning of commit/tag message text.

The most unusual case to be handled is when the /only/ thing that needs
to be cleaned in a repo is annotated tag message text.
rtyley added a commit that referenced this issue Nov 22, 2013
Still some scruff in here...

Chunky refactor unifying the cleaning of commit/tag message text.

The most unusual case to be handled is when the /only/ thing that needs
to be cleaned in a repo is annotated tag message text.