Hotspot analysis doesn't work after merging two Git repositories. #5

Closed
mhenrichsen opened this issue Jul 17, 2017 · 10 comments

@mhenrichsen

Hi @smontanari,

I've tried this method to merge two repositories without messing with the history.

It seems to work fine, but when I run the hotspot analysis, there are no hotspots: all dots have the same color.

Here is a picture.

Any ideas?

@smontanari
Owner

smontanari commented Jul 21, 2017

The symptom suggests that your analysis results do not contain any complexity and/or lines-of-code metrics (depending on which diagram you're trying to visualise). It could just be a matter of configuring code-forensics correctly. If your code is in an open repo, let me know which one and I can give it a quick try.

@dlehammer

dlehammer commented Oct 13, 2017

The symptom described by @mhenrichsen seems to match my experience with git subtree.
I've also attempted to combine multiple git repositories into a single repository, with each repository represented by a sub-directory, in order to run code-forensics on the full code base.
I've created an example repo here.
The repo was created as follows (using Git v2.14.2):

$ mkdir code-forensics_git_subtree_issue-5 && cd code-forensics_git_subtree_issue-5

code-forensics_git_subtree_issue-5$ git init

code-forensics_git_subtree_issue-5$ echo "Workaround for fatal: /usr/lib/git-core/git-subtree cannot be used without a working tree." > dummy.txt

code-forensics_git_subtree_issue-5$ git add .

code-forensics_git_subtree_issue-5$ git commit --message="Workaround for fatal: /usr/lib/git-core/git-subtree cannot be used without a working tree."

code-forensics_git_subtree_issue-5$ git subtree add --prefix=grails-cache https://github.com/grails-plugins/grails-cache.git master

code-forensics_git_subtree_issue-5$ git subtree add --prefix=grails-quartz https://github.com/grails-plugins/grails-quartz.git master

Running git log gives a promising result: my 3 commits plus the full history.
[image: code-forensics_git_subtree_issue-5_git_log]

Unfortunately, $ gulp hotspot-analysis --dateFrom=2007-05-07 doesn't provide the expected result.
[image: code-forensics_git_subtree_issue-5_hotspot_analysis]

I.e. the number of commits seems to be equal across the repo. How can that be? A visual representation gives a clue.
[image: code-forensics_git_subtree_issue-5_visual_git_log]

Hmm, further probing provides the following clue for a file with 100+ commits in the original repository.
[image: code-forensics_git_subtree_issue-5_visual_git_log_for_file]

I've tried to research the underlying issue, i.e. retrieving the full history via git log; this is the most informative discussion I've found.

Conclusion: this seems to be a git issue.

@dlehammer

dlehammer commented Oct 13, 2017

Hotspot analysis works as expected for a single git repository, in this example grails-quartz.
[image: screenshot from 2017-10-13 15-42-37]

@dlehammer

dlehammer commented Oct 17, 2017

I couldn't let this issue go, as additional digging revealed there are several approaches for merging git repositories into a single git repository, in separate sub-directories, without losing file history. As described below, I suspect there's an issue here, and perhaps it's possible to tease out a fix.

I've tried several approaches. They all seem to produce full history for git log, but the result is seemingly incompatible with code-forensics; for example git-merge-repos, git-stitch-repo, How do you merge two Git repositories?, etc.

Common to the above approaches is that code-forensics doesn't produce a "Revision churn level" above 1, and the log output changes when run on merged repositories. I've uploaded a git-merge-repos example here.

When executing code-forensics on a single git repository, as described in the comment above, the following bolded line is present in the log.

...
Starting 'vcs-log-dump'...
Fetching git log from 2015-01-01 to 2017-10-17
...

When executing code-forensics on a merged git repository (example), the bolded line above is omitted!
Instead, the following bolded line is present in the log.

...
Starting 'hotspot-analysis'...
Can't determine weight of collection. Assigning a value of 0 to every item.
...
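
The two symptoms above can be checked mechanically in a captured run log. The following is a minimal sketch, assuming the gulp output has been saved or piped in; `diagnose_run` is a hypothetical helper name, and the matched strings are simply the two log lines quoted above.

```shell
#!/bin/sh
# diagnose_run: read a captured code-forensics run log on stdin and report
# which of the two quoted log lines it contains.
# Hypothetical helper, not part of code-forensics.
diagnose_run() {
  log=$(cat)
  # Line seen when the analysis runs on a single repository.
  if printf '%s\n' "$log" | grep -qF "Fetching git log from"; then
    echo "vcs log fetched"
  else
    echo "vcs log NOT fetched"
  fi
  # Warning seen when the analysis runs on the merged repository.
  if printf '%s\n' "$log" | grep -qF "Can't determine weight of collection"; then
    echo "weight warning present"
  else
    echo "no weight warning"
  fi
}
```

It could be used by piping the analysis output into it, for example `gulp hotspot-analysis --dateFrom=2015-01-01 2>&1 | diagnose_run`.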

Regardless, the full file history seems to be present as far as git is concerned.

[image: screenshot from 2017-10-17 15-58-37]

Running git log on the same example also provides the full file history, as expected.

[image: screenshot from 2017-10-17 15-59-44]

I've tried digging around in the source for code-forensics and found the following in git_adapter.js; I expected that removing or changing the flag would solve this issue.

gitlog_analysis: ['log', '--all', '--numstat', '--date=short', '--no-renames', ...

Unfortunately, my limited experience with node.js has blocked my attempts to determine why this line is only executed for a single git repository.

@dlehammer

dlehammer commented Oct 17, 2017

To support reproducibility, the git-merge-repos example was created using the following steps:

  1. clone repositories
~/tmp$ git clone --mirror https://github.com/grails-plugins/grails-cache.git

~/tmp$ git clone --mirror https://github.com/grails-plugins/grails-quartz.git
  2. move content into sub-directory
~/tmp/grails-cache.git$ git filter-branch --index-filter \
  'tab=$(printf "\t") && git ls-files -s --error-unmatch . >/dev/null 2>&1; [ $? != 0 ] || (git ls-files -s | sed "s~$tab\"*~&grails-cache/~" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE")' \
  --tag-name-filter cat \
  -- --all

~/tmp/grails-quartz.git$ git filter-branch --index-filter \
  'tab=$(printf "\t") && git ls-files -s --error-unmatch . >/dev/null 2>&1; [ $? != 0 ] || (git ls-files -s | sed "s~$tab\"*~&grails-quartz/~" | GIT_INDEX_FILE=$GIT_INDEX_FILE.new git update-index --index-info && mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE")' \
  --tag-name-filter cat \
  -- --all
  3. merge repositories
~/tmp$ git clone https://github.com/robinst/git-merge-repos.git

~/tmp/git-merge-repos$ ./run.sh /home/dleh/tmp/grails-cache.git:. /home/dleh/tmp/grails-quartz.git:.
Started merging 2 repositories into one, output directory: /home/dleh/tmp/git-merge-repos/merged-repo
...
Done, took 4437 ms
Merged repository: /home/dleh/tmp/git-merge-repos/merged-repo
dleh@nine-deh:~/tmp/git-merge-repos$

The resulting merged repository can be found in ~/tmp/git-merge-repos/merged-repo, see also example.

@smontanari
Owner

smontanari commented Oct 23, 2017

Thank you @dlehammer for the extensive report on your attempts to dig into this.

code-forensics needs to produce a git log in a particular format in order to be parseable by code-maat. The code in the module git_adapter.js simply wraps git commands and streams the output back to the program so it can be parsed accordingly.
In particular, the repo log/history information necessary for many of the analyses is retrieved through a git log command with the parameters you identified already, i.e.:

git log --all --numstat --date=short --no-renames --pretty=format='--%h--%ad--%an'

Have you tried to manually run this command in your merged git repo? Is the output of this command different from when it's executed on a normal repository?
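
One way to compare the two outputs is to count how many revisions each file receives in the log. The following is a minimal sketch, assuming the log has the shape produced by the command above: a header line per commit followed by `--numstat` lines of added, deleted and path separated by tabs. The name `count_revisions` is hypothetical and not part of code-forensics or code-maat.

```shell
#!/bin/sh
# count_revisions: read a git log with --numstat output on stdin and print
# "<revisions><TAB><path>" per file, most frequently changed files first.
# Hypothetical helper for manual comparison only.
count_revisions() {
  tab=$(printf '\t')
  # numstat lines have exactly three tab-separated fields: added, deleted, path.
  # Commit header lines and blank lines contain no tabs, so they are skipped.
  awk -F '\t' 'NF == 3 { revs[$3]++ } END { for (f in revs) print revs[f] "\t" f }' |
    sort -t "$tab" -k1,1nr -k2,2
}
```

Piping the log of the merged repository and the log of one of the original repositories through this function would show whether any file still has more than one revision after the merge.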

@dlehammer

dlehammer commented Jan 4, 2018

> git log --all --numstat --date=short --no-renames --pretty=format='--%h--%ad--%an'
>
> Have you tried to manually run this command in your merged git repo? Is the output of this command different from when it's executed on a normal repository?

As far as I can tell, the output format is identical between repositories and both produce output.
git log ... diff

My main suspicion is still that the command isn't executed because some error blocks the flow at an earlier stage, as described in the comment above.

> When executing code-forensics on a single git repository as described in above comment.
> The following bolded line is present in the log.
>
> ...
> Starting 'vcs-log-dump'...
> Fetching git log from 2015-01-01 to 2017-10-17
> ...
>
> When executing code-forensics on a merged git repository, example, the bolded line above is omitted!
> And instead the following bolded line is present in the log.
>
> ...
> Starting 'hotspot-analysis'...
> Can't determine weight of collection. Assigning a value of 0 to every item.
> ...

@smontanari
Owner

smontanari commented Jan 5, 2018

@dlehammer I cloned your example repo and had no problem running the hotspot-analysis.
[image: ha-screenshot]

This is an extract of the output:

[15:02:52] Using gulpfile ~/temp/merged-git_code_forensics/gulpfile.js
[15:02:52] Starting 'sloc-report'...
...
[15:02:52] Finished 'sloc-report' after 172 ms
[15:02:52] Starting 'code-stats-reports'...
[15:02:52] Finished 'code-stats-reports' after 58 μs
[15:02:52] Created: vcslog_normalised_2015-01-01_2017-10-17.log
[15:02:52] Finished 'vcs-log-dump' after 113 ms
[15:02:52] Starting 'revisions-report'...
[15:02:56] Finished 'revisions-report' after 3.84 s
[15:02:56] Starting 'hotspot-analysis'...
[15:02:56] Generating report file 2015-01-01_2017-10-17_revisions-hotspot-data.json
[15:02:56] Open the following link to see the results:
[15:02:56] http://localhost:3000/index.html?reportId=7b228da598b9213fc610a513f7be9f3e2da49fbe
[15:02:56] Finished 'hotspot-analysis' after 23 ms

I also checked the vcs log file produced and it looks ok. Maybe you're missing something in your setup?
I suggest you run the analysis with the COMMAND_DEBUG=1 env variable; the more verbose output may show more information.

@dlehammer

Well, this is a bit unsettling, but I'm able to run the analysis successfully for the example repo - just like you now.

I suspect something has changed in my environment since the last time the symptom appeared, but I haven't taken any active steps in this regard myself, so I'm unable to tell what affected the outcome.

This is my current setup, as best as I can gather:

  • Ubuntu 16.04.3 LTS (64-bit)
  • git v2.15.1
  • nodejs v4.2.6
  • npm v3.5.2
  • Oracle JDK v1.8.0_141-b15
  • code-forensics v0.14.0
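
A setup summary like the one above can be gathered with a small script. This is a minimal sketch, assuming the tools are on the PATH under the names used here (on some systems the node binary is called `nodejs`); `report_version` and `report_setup` are hypothetical helper names.

```shell
#!/bin/sh
# report_version TOOL [ARGS...]: print the first line of the tool's version
# output, or a note when the tool is not installed. Hypothetical helper.
report_version() {
  tool=$1
  shift
  if command -v "$tool" >/dev/null 2>&1; then
    # java prints its version on stderr, hence 2>&1
    printf '%s: %s\n' "$tool" "$("$tool" "$@" 2>&1 | head -n 1)"
  else
    printf '%s: not found\n' "$tool"
  fi
}

# report_setup: summarise the tools listed in the comment above.
report_setup() {
  report_version git --version
  report_version node --version
  report_version npm --version
  report_version java -version
}
```

Calling `report_setup` prints one line per tool, which can be pasted into a bug report.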

Thank you for your patience 👍

@smontanari
Owner

No worries, I'll close the issue for now
