Commit

Item12322: Item5492: Item5955: essential feature add, to support the filtering of certain attributes that otherwise block TML conversion of tags. For example, at the moment if a P tag has a name attribute, that attribute will block conversion of the P back to a TML para. This is a major problem when importing HTML from external sources, so I have fixed it. There is no functional impact on the WysiwygEdit, unless the user chooses to enable the WYSIWYGPLUGIN_IGNOREATTRS preference (which is null by default). Added some chunks of doc while I was in there.
Comment committed Jan 20, 2015
1 parent c634ad1 commit 91888e7
Showing 6 changed files with 272 additions and 94 deletions.
2 changes: 1 addition & 1 deletion WysiwygPlugin/data/System/WysiwygPlugin.txt
@@ -40,7 +40,7 @@ There is also the advantage that the translator can be used to *import* HTML fro

Both translators can be used directly from Perl scripts, for example to build your own stand-alone translators.

A stand-alone convertor script for HTML to TML is included in the installation. It can be found in =tools/html2tml.pl=.
A stand-alone convertor script for HTML to TML is included in the installation. It can be found in =tools/html2tml.pl=. Run it with a =--help= parameter to find out how to use it.

---+++ Integrating a HTML Editor
The plugin can be used to integrate an HTML editor in a number of different ways.
16 changes: 14 additions & 2 deletions WysiwygPlugin/data/System/WysiwygPluginSettings.txt
@@ -28,9 +28,10 @@ The default setting for this preference is defined within the plugin.
It corresponds to =div, span=.

This feature may be disabled by setting the preference to a single comma.
Thi does _not_ guarantee that HTML markup will be removed; the conversion
This does _not_ guarantee that HTML markup will be removed; the conversion
of HTML tags to TML markup remains subject to the other controls provided
by the !WysiwygPlugin, including the =WYSIWYGPLUGIN_STICKYBITS= preference,
by the !WysiwygPlugin, including the =WYSIWYGPLUGIN_STICKYBITS= and
=WYSIWYGPLUGIN_IGNOREATTRS= preferences,
=<sticky>= blocks, =<literal>= blocks and the rules applied
to tables and lists.

@@ -125,4 +126,15 @@ in the plugin:

If you edit using the plain-text editor, you can use the <sticky>..</sticky> tags to delimit HTML (or TML) that you do *not* want to be WYSIWYG edited.

---++++ WYSIWYGPLUGIN_IGNOREATTRS - Ignore tag attributes when deciding whether to keep a tag or not when converting HTML to TML. This is most useful when you
have specific styling that you want to make sure you strip off.

This preference takes the same format as =WYSIWYGPLUGIN_STICKYBITS=. It
specifies tags and their attributes that are to be ignored when making the
decision whether to keep the tag or not. For example, a =<font face="Open Sans">= tag will normally be maintained in the TML. However setting
=WYSIWYGPLUGIN_IGNOREATTRS= to =font=face= will result in it being removed.

By default =WYSIWYGPLUGIN_IGNOREATTRS= is empty. =WYSIWYGPLUGIN_STICKYBITS=
takes precedence over this setting.

%STOPINCLUDE%
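The preference format documented above (`tag=attr,attr;tag=attr`) can be illustrated with a small stand-alone sketch. It mirrors the split-and-match approach used in this commit, but `parse_attr_rules` and `attr_is_ignored` are hypothetical names chosen for the example, not functions of the plugin.

```perl
use strict;
use warnings;

# Parse a preference such as "font=face;span=style,class" into rule rows.
# Hypothetical helper for illustration only.
sub parse_attr_rules {
    my ($pref) = @_;
    my @rules;
    return \@rules unless defined $pref && length $pref;
    foreach my $def ( split /;\s*/, $pref ) {
        my ( $tag_re, $attrs ) = split /\s*=\s*/, $def, 2;
        next unless defined $attrs;    # skip malformed entries
        push @rules,
          {
            tag   => qr/$tag_re/i,
            attrs => join( '|', split( /\s*,\s*/, $attrs ) ),
          };
    }
    return \@rules;
}

# True if some rule names both this tag and this attribute.
sub attr_is_ignored {
    my ( $rules, $tag, $attr ) = @_;
    foreach my $row (@$rules) {
        next unless $tag =~ /^$row->{tag}$/i;
        return 1 if $attr =~ /^($row->{attrs})$/i;
    }
    return 0;
}

my $rules = parse_attr_rules('font=face');
print attr_is_ignored( $rules, 'font', 'face' ) ? "ignored\n" : "kept\n";
print attr_is_ignored( $rules, 'font', 'size' ) ? "ignored\n" : "kept\n";
```

Because the rules are built in a local array and returned, repeated calls do not accumulate rows; that is a design choice of this sketch, not a statement about the plugin.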
95 changes: 95 additions & 0 deletions WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML.pm
@@ -99,6 +99,9 @@ IMPORTANT: $html is a perl internal string, *NOT* octets
=cut

my @protectedByAttr;
my @ignoreAttr;

sub convert {
my ( $this, $text, $options ) = @_;

@@ -108,6 +111,64 @@ sub convert {
$opts = WC::VERY_CLEAN
if ( $options->{very_clean} );

# See the WysiwygPluginSettings for information on stickybits
my $pref = $options->{stickybits};
$pref = <<'DEFAULT' unless defined $pref;
(?!img).*=id,lang,title,dir,on.*;
a=accesskey,coords,shape,target;
bdo=dir;
br=clear;
col=char,charoff,span,valign,width;
colgroup=align,char,charoff,span,valign,width;
dir=compact;
div=align,style;
dl=compact;
font=size,face;
h\d=align;
hr=align,noshade,size,width;
legend=accesskey,align;
li=value;
ol=compact,start,type;
p=align;
param=name,type,value,valuetype;
pre=width;
q=cite;
table=align,bgcolor,.*?background-color:.*,frame,rules,summary,width;
tbody=align,char,charoff,valign;
td=abbr,align,axis,bgcolor,.*?background-color:.*,.*?border-color:.*,char,charoff,headers,height,nowrap,rowspan,scope,valign,width;
tfoot=align,char,charoff,valign;
th=abbr,align,axis,bgcolor,.*?background-color:.*,char,charoff,height,nowrap,rowspan,scope,valign,width,headers;
thead=align,char,charoff,valign;
tr=bgcolor,.*?background-color:.*,char,charoff,valign;
ul=compact,type;
DEFAULT

foreach my $def ( split( /;\s*/s, $pref ) ) {
my ( $re, $ats ) = split( /\s*=\s*/s, $def, 2 );
push(
@protectedByAttr,
{
tag => qr/$re/i,
attrs => join( '|', split( /\s*,\s*/, $ats ) )
}
);
}

$pref = $options->{ignoreattrs};

if ( defined $pref ) {
foreach my $def ( split( /;\s*/s, $pref ) ) {
my ( $re, $ats ) = split( /\s*=\s*/s, $def, 2 );
push(
@ignoreAttr,
{
tag => qr/$re/i,
attrs => join( '|', split( /\s*,\s*/, $ats ) )
}
);
}
}

#print STDERR "input [". WC::encode_specials($text). "]\n\n";

# Convert (safe) named entities back to the
@@ -217,6 +278,40 @@ sub _closeTag {
}
}

# Determine if sticky attributes prevent a tag being converted to
# TML when this attribute is present.

sub protectedByAttr {
my ( $tag, $attr ) = @_;

foreach my $row (@protectedByAttr) {
if ( $tag =~ /^$row->{tag}$/i ) {

if ( $attr =~ /^($row->{attrs})$/i ) {
return 1;
}
}
}
return 0;
}

# Determine if an attribute is to be ignored when deciding whether
# to keep a tag as HTML or not.

sub ignoreAttr {
my ( $tag, $attr ) = @_;

foreach my $row (@ignoreAttr) {
if ( $tag =~ /^$row->{tag}$/i ) {

if ( $attr =~ /^($row->{attrs})$/i ) {
return 1;
}
}
}
return 0;
}

sub _text {
my ( $this, $text ) = @_;
my $l = new Foswiki::Plugins::WysiwygPlugin::HTML2TML::Leaf($text);
108 changes: 84 additions & 24 deletions WysiwygPlugin/lib/Foswiki/Plugins/WysiwygPlugin/HTML2TML/Node.pm
@@ -5,6 +5,21 @@
# act to express format requirements - for example, the need to have a
# newline before some text, or the need for a space. Whitespace is then
# collapsed down to the minimum that satisfies the format requirements.
#
# 10,000 foot overview:
# _handleTAG functions are called on the Node object, passing
# in an options bitmask and receiving back a bitmask of flags
# and some TML text. The expansion is recursive, so the
# TML returned is the expansion of the entire DOM tree
# under the node. If the TML text is undef, that is taken as a
# signal that the node cannot be converted to TML, in which
# case _defaultTag is used to expand it as HTML. _defaultTag
# is itself recursive, so sub-nodes may well be expanded as
# TML. The options flags, and the flags returned from the
# _handle function, are used to steer the expansion. As well
# as the flags, there are special characters dropped into the
# TML, for example for non-breaking space, or space that can
# be collapsed etc.

# VERY IMPORTANT: ALL STRINGS STORED IN NODES ARE UNICODE
# (perl character strings)
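The overview comment above describes a handler that returns flags plus TML text, with `undef` text meaning "fall back to HTML". A minimal, non-recursive sketch of that contract follows. The `%handlers` table, `default_tag` and `generate` are hypothetical stand-ins for this example and do not reproduce the real `_handleTAG` or `_defaultTag` code.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for _handleTAG functions: each returns
# ( $flags, $tml ), with $tml undef when the node cannot become TML.
my %handlers = (
    b    => sub { my ($node) = @_; return ( 0, '*' . $node->{text} . '*' ) },
    font => sub { return ( 0, undef ) },    # pretend conversion would lose information
);

# Hypothetical stand-in for _defaultTag: re-emit the node as HTML.
sub default_tag {
    my ($node) = @_;
    return ( 0, "<$node->{tag}>$node->{text}</$node->{tag}>" );
}

# Try the tag's handler first; use the HTML fallback when it declines.
sub generate {
    my ($node) = @_;
    my $handler = $handlers{ $node->{tag} };
    if ($handler) {
        my ( $flags, $tml ) = $handler->($node);
        return ( $flags, $tml ) if defined $tml;
    }
    return default_tag($node);
}

my ( undef, $bold ) = generate( { tag => 'b',    text => 'hi' } );
my ( undef, $font ) = generate( { tag => 'font', text => 'hi' } );
print "$bold\n$font\n";
```

The real code recurses into child nodes and threads an options bitmask through each call; both are omitted here to keep the fallback rule visible.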
@@ -29,11 +44,12 @@ use warnings;

use Foswiki::Func; # needed for regular expressions
use Assert;
use HTML::Entities ();

use vars qw( $reww );

use Foswiki::Plugins::WysiwygPlugin::HTML2TML ();
use Foswiki::Plugins::WysiwygPlugin::HTML2TML::WC ();
use HTML::Entities ();

our $reww;

my %jqueryChiliClass = map { $_ => 1 }
qw( cplusplus csharp css bash delphi html java js
@@ -461,9 +477,10 @@ sub _collapse {
}
}

# Pressing return in a "foswikiDeleteMe" paragraph will cause the paragraph
# to be split into a 2nd paragraph with the same class. We only want to clean
# the first one in the blockquote, and preserve the rest without the class.
# Pressing return in a "foswikiDeleteMe" paragraph will cause
# the paragraph to be split into a 2nd paragraph with the same
# class. We only want to clean the first one in the blockquote,
# and preserve the rest without the class.
if ( $node->{tag} eq 'p'
&& $node->hasClass('foswikiDeleteMe')
&& $node->{parent}
@@ -727,11 +744,10 @@ sub _defaultTag {
sub _isProtectedByAttrs {
my $this = shift;

require Foswiki::Plugins::WysiwygPlugin::Handlers;
foreach my $attr ( keys %{ $this->{attrs} } ) {
next unless length( $this->{attrs}->{$attr} ); # ignore nulls
return $attr
if Foswiki::Plugins::WysiwygPlugin::Handlers::protectedByAttr(
if Foswiki::Plugins::WysiwygPlugin::HTML2TML::protectedByAttr(
$this->{tag}, $attr );
}
return 0;
@@ -943,7 +959,7 @@ sub _isConvertableTable {
if (
defined $this->{attrs}->{style}
&& length $this->{attrs}->{style}
&& Foswiki::Plugins::WysiwygPlugin::Handlers::protectedByAttr(
&& Foswiki::Plugins::WysiwygPlugin::HTML2TML::protectedByAttr(
'style', $this->{attrs}
)
);
@@ -1060,7 +1076,7 @@ sub _isConvertableTableRow {

if (
$key eq 'style'
&& Foswiki::Plugins::WysiwygPlugin::Handlers::protectedByAttr(
&& Foswiki::Plugins::WysiwygPlugin::HTML2TML::protectedByAttr(
$kid->{tag}, $atts{$key}
)
);
@@ -1669,9 +1685,22 @@ sub _handleFONT {
}
}

# Either the colour can't be mapped, or we can't do the conversion
# without loss of information
return ( 0, undef );
# Check if any of the attributes can be ignored
foreach my $a ( keys %atts ) {
delete $atts{$a}
if Foswiki::Plugins::WysiwygPlugin::HTML2TML::ignoreAttr(
$this->{tag}, $a );
}

if ( scalar( keys(%atts) ) ) {

# Either the colour can't be mapped, or we can't do the conversion
# without loss of attribute information
return ( 0, undef );
}

# We can ignore this
return $this->_flatten($options);
}

# FORM
@@ -1926,13 +1955,22 @@ sub _handleSPAN {
# delete $atts{style} if defined $atts{style};
# }

# ignore the span tag if there are no other attrs
if ( scalar( keys %atts ) == 0 ) {
return $this->_flatten($options);
# Check if any of the attributes can be ignored
foreach my $a ( keys %atts ) {
delete $atts{$a}
if Foswiki::Plugins::WysiwygPlugin::HTML2TML::ignoreAttr(
$this->{tag}, $a );
}

# otherwise use the default generator.
return ( 0, undef );
if ( scalar( keys(%atts) ) ) {

# Either the colour can't be mapped, or we can't do the conversion
# without loss of attribute information
return ( 0, undef );
}

# We can ignore this
return $this->_flatten($options);
}

# STRIKE
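Both `_handleFONT` and `_handleSPAN` now follow the same pattern: drop the attributes that the ignore rules cover, then flatten the tag only if nothing is left. A stand-alone sketch of that decision is shown here. `blocking_attrs` is a hypothetical helper, and the `$is_ignored` callback merely stands in for the `ignoreAttr( $tag, $attr )` call in the diff.

```perl
use strict;
use warnings;

# Hypothetical helper: return the attribute names that still block
# conversion once the ignorable ones have been removed.
sub blocking_attrs {
    my ( $tag, $atts, $is_ignored ) = @_;
    my %left = %$atts;    # copy so the caller's hash is untouched
    foreach my $name ( keys %left ) {
        delete $left{$name} if $is_ignored->( $tag, $name );
    }
    return sort keys %left;
}

# Assumed rule for the demo: ignore the face attribute on font tags.
my $ignore_face = sub {
    my ( $tag, $attr ) = @_;
    return $tag eq 'font' && $attr eq 'face';
};

my @only_face = blocking_attrs( 'font', { face => 'Open Sans' }, $ignore_face );
my @face_size =
  blocking_attrs( 'font', { face => 'Open Sans', size => '2' }, $ignore_face );

print scalar(@only_face) ? "keep HTML\n" : "flatten\n";
print scalar(@face_size) ? "keep HTML\n" : "flatten\n";
```

Factoring the loop into one helper would remove the duplication between the two handlers; whether that fits the module's style is a judgment for its maintainers.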
@@ -1958,20 +1996,40 @@ sub _handleTABLE {
# print STDERR "Found TABLE Attr $key = $atts{$key} \n";
# }

# Preserve HTML if non-default options are requested for padding, spacing, border.
# Preserve HTML if non-default options are requested for
# padding, spacing, border.
return ( 0, undef )
if ( defined $atts{cellpadding} && $atts{cellpadding} ne '0' );
if (
defined $atts{cellpadding}
&& $atts{cellpadding} ne '0'
&& !Foswiki::Plugins::WysiwygPlugin::HTML2TML::ignoreAttr(
$this->{tag}, 'cellpadding'
)
);
return ( 0, undef )
if ( defined $atts{cellspacing} && $atts{cellspacing} ne '1' );
return ( 0, undef ) if ( defined $atts{border} && $atts{border} ne '1' );
if (
defined $atts{cellspacing}
&& $atts{cellspacing} ne '1'
&& !Foswiki::Plugins::WysiwygPlugin::HTML2TML::ignoreAttr(
$this->{tag}, 'cellspacing'
)
);
return ( 0, undef )
if (
defined $atts{border}
&& $atts{border} ne '1'
&& !Foswiki::Plugins::WysiwygPlugin::HTML2TML::ignoreAttr(
$this->{tag}, 'border'
)
);

#use Data::Dumper;
#print STDERR Data::Dumper::Dumper( \%atts);

return 0
if (
defined $atts{style}
&& Foswiki::Plugins::WysiwygPlugin::Handlers::protectedByAttr(
&& Foswiki::Plugins::WysiwygPlugin::HTML2TML::protectedByAttr(
'table', $atts{style}
)
);
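The three table checks above share one rule: an attribute forces HTML only when it is present, differs from its default, and is not covered by the ignore rules. The sketch below restates that rule in table-driven form. The defaults are the ones visible in the diff (cellpadding 0, cellspacing 1, border 1); `table_needs_html` and the callbacks are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Defaults taken from the comparisons in the diff.
my %table_default = ( cellpadding => '0', cellspacing => '1', border => '1' );

# Hypothetical helper: true if any layout attribute has a non-default
# value that the ignore callback does not cover.
sub table_needs_html {
    my ( $atts, $is_ignored ) = @_;
    foreach my $name ( sort keys %table_default ) {
        my $value = $atts->{$name};
        next unless defined $value;
        next if $value eq $table_default{$name};
        next if $is_ignored->( 'table', $name );
        return 1;
    }
    return 0;
}

my $ignore_nothing = sub { return 0 };
my $ignore_border  = sub { my ( $tag, $attr ) = @_; return $attr eq 'border' };

print table_needs_html( { border => '0' }, $ignore_nothing ) ? "HTML\n" : "TML\n";
print table_needs_html( { border => '0' }, $ignore_border )  ? "HTML\n" : "TML\n";
```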
@@ -2011,7 +2069,9 @@ sub _handleVAR { return _flatten(@_); }
__END__
Foswiki - The Free and Open Source Wiki, http://foswiki.org/
Copyright (C) 2008-2010 Foswiki Contributors. Foswiki Contributors
Author: Crawford Currie http://c-dot.co.uk
Copyright (C) 2008-2015 Foswiki Contributors. Foswiki Contributors
are listed in the AUTHORS file in the root of this distribution.
NOTE: Please extend that file, not this notice.