Fixes some issues parsing gitweb, should make it work on 1.7.9

Eventually someone should rewrite this to parse the atom or rss feed
on gitweb, but in the meantime this should work.
1 parent 78850bd commit d953a6088bd67c42171e5fdbea56bdc485d1c480 @jportoles jportoles committed Mar 29, 2012
Showing with 4 additions and 5 deletions.
  1. +4 −5 SourceGitweb/SourceGitweb.php
9 SourceGitweb/SourceGitweb.php
@@ -272,8 +272,7 @@ private function commit_changeset( $p_repo, $p_input, $p_branch='' ) {
if ( !SourceChangeset::exists( $p_repo->id, $t_commit['revision'] ) ) {
# Parse for commit data
- preg_match( '#<tr><td>author</td><td>(?:<a[^>]*>)?([^<>]*)(?:</a>)? *(?:<a[^>]*>)?<([^<>]*)>(?:</a>)?</td>(?:<[^<>]*>\s*)*?</tr>'.
- '<tr><td></td><td> \w*, (\d* \w* \d* \d*:\d*:\d*)#', $t_gitweb_data, $t_matches );
+ preg_match( '#<tr><td>author</td><td>(?:<a[^>]*>)?([^<>]*)(?:</a>)? *(?:<a[^>]*>)?<([^<>]*)>(?:</a>)?</td>(?:<[^<>]*>\s*)*?</tr>\n<tr><td></td><td><span class="datetime">\w*, (\d* \w* \d* \d*:\d*:\d*)#', $t_gitweb_data, $t_matches );
$t_commit['author'] = $t_matches[1];
$t_commit['author_email'] = $t_matches[2];
$t_commit['date'] = date( 'Y-m-d H:i:s', strtotime( $t_matches[3] ) );
@@ -290,7 +289,7 @@ private function commit_changeset( $p_repo, $p_input, $p_branch='' ) {
}
}
- preg_match( '#<div class="page_body">(.*)</div>#', $t_gitweb_data, $t_matches );
+ preg_match( '#<div class="page_body">\n(.*)\n</div>#', $t_gitweb_data, $t_matches );
$t_commit['message'] = trim( str_replace( '<br/>', PHP_EOL, $t_matches[1] ) );
# Strip ref links and signoff spans from commit message
@@ -302,8 +301,8 @@ private function commit_changeset( $p_repo, $p_input, $p_branch='' ) {
# Parse for changed file data
$t_commit['files'] = array();
- preg_match_all( '#<tr class="(?:light|dark)"><td><a class="list" href="[^"]*;h=(\w+);[^"]*">([^<>]+)</a></td>'.
- '<td>(?:<span class="file_status (\w+)">[^<>]*</span>)?</td>#',
+ preg_match_all( '#<tr class="(?:light|dark)">\n<td><a class="list" href="[^"]*;h=(\w+);[^"]*">([^<>]+)</a></td>'.
+ '\n<td>(?:<span class="file_status (\w+)">[^<>]*</span>)?</td>#',
$t_gitweb_files, $t_matches, PREG_SET_ORDER );
foreach( $t_matches as $t_file_matches ) {

7 comments on commit d953a60

@ashumkin

Excuse me, did you test it?
Last two changes (that add "\n" to regexp) break parsing of commit message and changed files list (tested on gitweb 1.7.7 and 1.7.11).
Take a look at line 241 (https://github.com/mantisbt-plugins/source-integration/blob/d953a6088bd67c42171e5fdbea56bdc485d1c480/SourceGitweb/SourceGitweb.php#L241): it removes every newline from the page source.

@jportoles

I'm using it actively with gitweb 1.7.9 and it works like a charm. The problem with this parser is that it's impossible to make it work with every version of Git; if you fix it for 1.7.7 it will break with 1.7.9 and so on. But if you have a patch for a more recent version go ahead and make a pull request.

@ashumkin

The key line here is 241: it removes all EOLs from the page content. How can they affect the regexp then?

@jportoles

Ah, I see what you mean. Perhaps I didn't notice/test that part then, would be happy to see a fix if you can write it.

@jportoles

Most likely it was working on Windows (the environment I tested it on) because PHP_EOL is \r\n there, so replacing it did not remove bare line feeds that had no carriage return, and some line breaks were left. The best fix is probably to remove the newlines from the regexp and to use both \n and \r\n instead of PHP_EOL with str_replace on line 241, so that it works on both Windows and Linux.
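
The fix proposed above could be sketched roughly as follows. This is only an illustration, assuming line 241 is a `str_replace( PHP_EOL, '', ... )` call as the comments describe; the helper name `strip_newlines` and the sample HTML are hypothetical and do not come from the plugin.

```php
<?php
// Hypothetical helper (not part of the plugin): removes CRLF, LF and CR,
// so the result does not depend on the value of PHP_EOL on the host OS.
function strip_newlines( $p_html ) {
    return str_replace( array( "\r\n", "\n", "\r" ), '', $p_html );
}

// Made-up sample resembling a gitweb commit page body with mixed line endings.
$t_sample = "<div class=\"page_body\">\r\nFix parser<br/>\nSecond line<br/>\n</div>";

// Replacing only "\r\n" (the value of PHP_EOL on Windows) leaves bare "\n" behind.
$t_windows_style = str_replace( "\r\n", '', $t_sample );

// Stripping every newline variant lets a pattern without "\n" match.
$t_clean = strip_newlines( $t_sample );
preg_match( '#<div class="page_body">(.*)</div>#', $t_clean, $t_matches );
$t_message = trim( str_replace( '<br/>', PHP_EOL, $t_matches[1] ) );

echo $t_message, PHP_EOL;
```

With every newline variant stripped up front, the patterns no longer need a literal `\n`, which is the combination suggested in the comment above.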

@ashumkin

Oh, do you use gitweb on Windows? That makes it clear to me then. I use gitweb on Linux.

@jportoles

It will still break on Windows unless that str_replace is changed, but otherwise yes, I see your point.
