Refactored fix.
The original fix in the previous commit didn't work for articles
that contained a lot of tags. The fix was modified to go through
ALL the tags BEFORE the article body. Only after finding the start
of the article do we start assembling the text for the snippet.
rickumali committed Jan 12, 2013
1 parent 0fa9e1d commit fe4e8f9
Showing 1 changed file with 18 additions and 12 deletions.
30 changes: 18 additions & 12 deletions process_tech_feed.pl
@@ -161,32 +161,38 @@ ()

 my $tag;
 my $text;
-my $found_body = 0;

-# The description element contains lots of HTML (specifically,
-# lots of Drupal <divs>). This code reads in the first few HTML tags,
-# storing their text into the $text variable.
-for (my $i = 0; $i < 20; $i++) {
+# We first have to find the article text, skipping past
+# all the tags until we find the DIV with the 'field-name-body'
+# in the class attribute.
+my $found_body = 0;
+while (!$found_body) {
   $tag = $text_stream->get_tag();
   if ($opt_debug) {
     print "TAG FOUND: $tag->[3]\n";
   }
-  # This could examines each tag, and checks whether the
+  # This code examines each tag, and checks whether the
   # Drupal DIV class 'field-name-body' is present. As soon
-  # as we find the first one, THEN we start collecting the
-  # text. We do this because until we see the 'field-name-body'
-  # most of the text are just the tags, and we don't want
-  # to display that text.
+  # as we find the first one, THEN we can start collecting the
+  # text.
   if (!$found_body) {
     if ($tag->[3] =~ /.* field-name-body .*/) {
       $found_body = 1;
       print "FIELD-BODY FOUND\n" if $opt_debug;
     }
-  } else {
-    $text .= $text_stream->get_phrase();
   }
 }
+
+# Now that we've found the actual article text (in body)
+# let's just read the first few tags.
+for (my $i = 0; $i < 20; $i++) {
+  $tag = $text_stream->get_tag();
+  if ($opt_debug) {
+    print "TAG FOUND: $tag->[3]\n";
+  }
+  $text .= $text_stream->get_phrase();
+}

 if (length($text) < $min_chars) {
   return($text);
 } else {
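
The two-phase scan in this commit can be tried in isolation with a sketch like the one below. It rests on several assumptions: that $text_stream is an HTML::TokeParser object (the get_tag/get_phrase calls and the $tag->[3] access suggest this, but the diff does not show it), that the helper name snippet_after_body and the sample HTML are hypothetical, and that a looser /field-name-body/ match is acceptable in place of the space-delimited pattern in the commit. It also adds two guards that are not in the commit: it stops when get_tag() runs out of tags, and it treats end tags (which carry no fourth element) as non-matching.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper, not part of process_tech_feed.pl.
# Returns the text that follows up to $max_tags tags after the first
# start tag whose raw text mentions 'field-name-body'.
sub snippet_after_body {
    my ($html, $max_tags) = @_;
    my $stream = HTML::TokeParser->new(\$html)
        or die "cannot create token stream";

    # Phase 1: skip every tag until the body DIV shows up.
    my $found_body = 0;
    while (my $tag = $stream->get_tag()) {
        # Start tags keep their raw text in element 3; end tags have
        # only two elements, so fall back to an empty string.
        my $raw = defined $tag->[3] ? $tag->[3] : '';
        if ($raw =~ /field-name-body/) {
            $found_body = 1;
            last;
        }
    }
    return '' unless $found_body;    # no body DIV, so no snippet

    # Phase 2: collect the text that follows the next few tags.
    my $text = '';
    for (1 .. $max_tags) {
        my $tag = $stream->get_tag() or last;    # end of document
        $text .= $stream->get_phrase();
    }
    return $text;
}

# Sample input (made up): a tag list first, then the article body.
my $html = '<div class="field field-name-tags"><a href="#">perl</a></div>'
         . '<div class="field field-name-body clearfix">'
         . '<p>Hello world.</p><p>More text.</p></div>';

print snippet_after_body($html, 20), "\n";
```

Because phase 1 consumes tags without ever calling get_phrase(), the link text inside the tag list never reaches the snippet, however many tags precede the body.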