Unicode control characters in inputs may break generation of documents #192

Open
tagliala opened this issue Feb 1, 2024 · 2 comments

tagliala (Contributor) commented Feb 1, 2024

For some reason (maybe copy & paste), we ended up with a text field containing the \u0002 character (Start of Text), and this broke document generation with Sablon.

Minimum reproducible test case

# frozen_string_literal: true

require 'bundler/inline'

begin
  gemfile(true) do
    source 'https://rubygems.org'

    gem 'sablon'
  end

  require 'sablon'
rescue Gem::LoadError => e
  puts "\nMissing Dependency:\n#{e.backtrace.first} #{e.message}"
  exit 1
rescue LoadError => e
  puts "\nError:\n#{e.backtrace.first} #{e.message}"
  exit 1
end

template = Sablon.template(File.expand_path('./cv_template.docx'))

context = {
  title: "\u0002",
  skills: [],
  education: [],
  career: [],
  referees: []
}

template.render_to_file File.expand_path('./output.docx'), context

The CV template is the same file as https://github.com/senny/sablon/blob/master/test/fixtures/cv_template.docx

This affects Unicode characters from 1 (0001) to 31 (001F) except, unsurprisingly, for:

  • horizontal tab 0009
  • line feed 000a
  • carriage return 000d
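
To make the affected range concrete, here is a minimal sketch of a pattern that matches exactly the characters described above. The constant and method names are hypothetical and not part of Sablon. The three exceptions are also the only control characters below U+0020 that XML 1.0 allows, which is presumably why they are harmless in a .docx; that explanation is an assumption, not something confirmed in this thread.

```ruby
# Hypothetical helper, not part of Sablon: matches U+0001..U+001F
# except tab (U+0009), line feed (U+000A) and carriage return (U+000D).
BREAKING_CONTROL_CHARS = /[\u0001-\u0008\u000B\u000C\u000E-\u001F]/

# Returns true when the string contains at least one of those characters.
def breaking_control_chars?(text)
  text.match?(BREAKING_CONTROL_CHARS)
end

puts breaking_control_chars?("Title\u0002")          # prints "true"
puts breaking_control_chars?("Line one\nLine two\t") # prints "false"
```

A check like this could be used to find which field of a context carries the offending character before rendering.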

Expected output

The template without content

Actual output

An empty file

Workarounds

I've tried Sablon.content(:string, "\u0002") without success.

Sanitizing the input works, but there is always a chance of forgetting a field, so I was wondering whether this can be fixed in Sablon itself.
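
One way to reduce the risk of forgetting a field when sanitizing on the caller's side is to clean the whole context in a single pass before rendering. The following is only a sketch under that assumption: `strip_control_chars` and `CONTROL_CHARS` are hypothetical names, not part of Sablon, and the sample context is made up for illustration.

```ruby
# Hypothetical helper, not part of Sablon. The character set uses the
# range syntax understood by String#delete.
CONTROL_CHARS = "\u0001-\u0008\u000B\u000C\u000E-\u001F"

# Recursively removes the breaking control characters from every String
# found in nested Hashes and Arrays; other values are returned unchanged.
def strip_control_chars(value)
  case value
  when String then value.delete(CONTROL_CHARS)
  when Hash   then value.transform_values { |v| strip_control_chars(v) }
  when Array  then value.map { |v| strip_control_chars(v) }
  else value
  end
end

context = {
  title: "CV\u0002",
  skills: ["Ruby\u0007", 'XML'],
  career: [{ role: "Developer\u001F" }]
}

clean = strip_control_chars(context)
p clean[:title] # prints "CV"
```

Values that are neither String, Hash nor Array (for example objects whose methods are called from the template, or content already wrapped with Sablon.content) pass through untouched, so this does not cover every case.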

tagliala (Contributor, Author) commented

At the moment we are using the following monkey patch:

# frozen_string_literal: true

# TODO: Remove when senny/sablon#192 is fixed
Sablon::Statement.send(:remove_const, :Insertion)

module Sablon
  module Statement
    class Insertion < Struct.new(:expr, :field)
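      # Control characters 1..31 that break the generated document;
      # tab, line feed and carriage return are safe and are kept.
      RESERVED_CHARS = ((1..31).map { |c| c.chr(Encoding::UTF_8) } - ["\t", "\n", "\r"]).join.freeze
      # An empty replacement makes String#tr delete the matched characters.
      REPLACING_CHAR = ''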

      def evaluate(env)
        if (content = expr.evaluate(env.context))
          sanitized_content = sanitize(content)
          field.replace(Sablon::Content.wrap(sanitized_content), env)
        else
          field.remove
        end
      end

      def sanitize(content)
        return content unless content.respond_to?(:tr)

        content.tr(RESERVED_CHARS, REPLACING_CHAR)
      end
    end
  end
end

If there is a better place for this, please advise.

senny (Owner) commented Jun 26, 2024

That's unfortunate and annoying to debug for sure. Assuming all characters that we strip do indeed cause the generated document to be broken, I see no drawbacks in using some defensive sanitization directly in Sablon.

Conceptually, it would be better to sanitize in Sablon::Content rather than in the insertion. That would also allow more flexible behavior if someone wanted a different kind of sanitization for Strings, for example. It might be a bit more involved to add it to Content, though, as we might need to sanitize in multiple places.
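
As a rough illustration of what sanitizing at the content level could look like, the sketch below strips the characters when a string content object is constructed. `StringContent` is a stand-in Struct and `ControlCharSanitizer` is a hypothetical module; neither is Sablon's actual code, and where such a hook belongs in Sablon::Content is left open.

```ruby
# Stand-in for a string content class; NOT Sablon's real implementation.
StringContent = Struct.new(:string)

# Hypothetical module: removes the breaking control characters before
# the value is stored, so every consumer of the content sees clean text.
module ControlCharSanitizer
  BREAKING = "\u0001-\u0008\u000B\u000C\u000E-\u001F"

  def initialize(value)
    super(value.to_s.delete(BREAKING))
  end
end

# Prepending puts the sanitizing initializer ahead of Struct's own.
StringContent.prepend(ControlCharSanitizer)

content = StringContent.new("Title\u0002")
p content.string # prints "Title"
```

Keeping the sanitizer in its own module would let a caller swap in a different policy for strings, which is the flexibility mentioned above; other content types would each need their own hook.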
