Add sanitizer rules per renderer #16110

Merged
merged 5 commits into from
Jun 23, 2021
Changes from 3 commits
4 changes: 4 additions & 0 deletions docs/content/doc/advanced/config-cheat-sheet.en-us.md
@@ -904,13 +904,17 @@ Gitea supports customizing the sanitization policy for rendered HTML. The exampl
ELEMENT = span
ALLOW_ATTR = class
REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
ALLOW_DATA_URI_IMAGES = true
```

- `ELEMENT`: The element this policy applies to. Must be non-empty.
- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.
- `ALLOW_DATA_URI_IMAGES`: **false** Allow data URI images (`<img src="data:image/png;base64,..."/>`).

Multiple sanitisation rules can be defined by adding unique subsections, e.g. `[markup.sanitizer.TeX-2]`.
To apply a sanitisation rule only to a specific external renderer, the section name must include the renderer name, e.g. `[markup.sanitizer.asciidoc.rule-1]`.
If the rule is defined above the renderer's ini section, or the name does not match a renderer, it is applied to every renderer.
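
The scoping rule for section names can be sketched in Go. This is a minimal illustration, not Gitea's implementation: `rendererForSection` and the `known` list are hypothetical names, and the sketch assumes a rule is scoped only when the text after `markup.sanitizer.` starts with `<renderer>.`. Passing an empty `known` list models a rule that appears above the renderer's ini section.

```go
package main

import (
	"fmt"
	"strings"
)

// rendererForSection reports which renderer a sanitizer section is scoped to.
// It returns "" when the rule should apply to every renderer.
// known holds the renderer names read from the ini file so far (hypothetical).
func rendererForSection(section string, known []string) string {
	const prefix = "markup.sanitizer."
	if !strings.HasPrefix(section, prefix) {
		return ""
	}
	rest := strings.TrimPrefix(section, prefix)
	for _, name := range known {
		// Assumption: the renderer name must be followed by a dot and a rule name.
		if strings.HasPrefix(rest, name+".") {
			return name
		}
	}
	return ""
}

func main() {
	known := []string{"asciidoc", "jupyter"}
	for _, s := range []string{
		"markup.sanitizer.asciidoc.rule-1",
		"markup.sanitizer.TeX-2",
	} {
		fmt.Printf("%s -> %q\n", s, rendererForSection(s, known))
	}
}
```

In this sketch `TeX-2` matches no known renderer, so the rule falls through to the global set.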

## Time (`time`)

41 changes: 38 additions & 3 deletions docs/content/doc/advanced/external-renderers.en-us.md
@@ -64,8 +64,8 @@ IS_INPUT_FILE = false
[markup.jupyter]
ENABLED = true
FILE_EXTENSIONS = .ipynb
RENDER_COMMAND = "jupyter nbconvert --stdout --to html --template basic "
IS_INPUT_FILE = true
RENDER_COMMAND = "jupyter nbconvert --stdin --stdout --to html --template basic"
IS_INPUT_FILE = false

[markup.restructuredtext]
ENABLED = true
@@ -90,15 +90,50 @@ FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND = pandoc -f markdown -t html --katex
```

You must define `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` in each section.
You must define `ELEMENT` and `ALLOW_ATTR` in each section.

To define multiple entries, add a unique alphanumeric suffix (e.g., `[markup.sanitizer.1]` and `[markup.sanitizer.something]`).

To apply a sanitisation rule only to a specific external renderer, the section name must include the renderer name, e.g. `[markup.sanitizer.asciidoc.rule-1]`, `[markup.sanitizer.<renderer>.rule-1]`.

**Note**: If the rule is defined above the renderer's ini section, or the name does not match a renderer, it is applied to every renderer.

Once your configuration changes have been made, restart Gitea to have changes take effect.

**Note**: Prior to Gitea 1.12 there was a single `markup.sanitiser` section with keys that were redefined for multiple rules.
This method of configuration had significant problems, which made configuration through multiple sections necessary.

### Example: Office DOCX

Display Office DOCX files with [`pandoc`](https://pandoc.org/):
```ini
[markup.docx]
ENABLED = true
FILE_EXTENSIONS = .docx
RENDER_COMMAND = "pandoc --from docx --to html --self-contained --template /path/to/basic.html"

[markup.sanitizer.docx.img]
ALLOW_DATA_URI_IMAGES = true
```

The template file has the following content:
```
$body$
```

### Example: Jupyter Notebook

Display Jupyter Notebook files with [`nbconvert`](https://github.com/jupyter/nbconvert):
```ini
[markup.jupyter]
ENABLED = true
FILE_EXTENSIONS = .ipynb
RENDER_COMMAND = "jupyter-nbconvert --stdin --stdout --to html --template basic"

[markup.sanitizer.jupyter.img]
ALLOW_DATA_URI_IMAGES = true
```

## Customizing CSS
The external renderer is specified in the .ini in the format `[markup.XXXXX]` and the HTML supplied by your external renderer will be wrapped in a `<div>` with classes `markup` and `XXXXX`. The `markup` class provides out of the box styling (as does `markdown` if `XXXXX` is `markdown`). Otherwise you can use these classes to specifically target the contents of your rendered HTML.

10 changes: 10 additions & 0 deletions modules/markup/csv/csv.go
@@ -10,6 +10,7 @@ import (
"html"
"io"
"io/ioutil"
"regexp"
"strconv"

"code.gitea.io/gitea/modules/csv"
@@ -38,6 +39,15 @@ func (Renderer) Extensions() []string {
return []string{".csv", ".tsv"}
}

// SanitizerRules implements markup.Renderer
func (Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return []setting.MarkupSanitizerRule{
{Element: "table", AllowAttr: "class", Regexp: regexp.MustCompile(`data-table`)},
{Element: "th", AllowAttr: "class", Regexp: regexp.MustCompile(`line-num`)},
{Element: "td", AllowAttr: "class", Regexp: regexp.MustCompile(`line-num`)},
}
}

func writeField(w io.Writer, element, class, field string) error {
if _, err := io.WriteString(w, "<"); err != nil {
return err
7 changes: 6 additions & 1 deletion modules/markup/external/external.go
@@ -30,7 +30,7 @@ func RegisterRenderers() {

// Renderer implements markup.Renderer for external tools
type Renderer struct {
setting.MarkupRenderer
*setting.MarkupRenderer
}

// Name returns the external tool name
@@ -48,6 +48,11 @@ func (p *Renderer) Extensions() []string {
return p.FileExtensions
}

// SanitizerRules implements markup.Renderer
func (p *Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return p.MarkupSanitizerRules
}
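
Embedding `*setting.MarkupRenderer` instead of the value means the renderer reads the shared settings struct rather than a copy. The sketch below shows that difference with hypothetical stand-in types (`markupRenderer`, `byValue`, `byPointer`). Whether sharing later-added rules is the motivation for the change in this PR is an assumption, not something the diff states.

```go
package main

import "fmt"

// markupRenderer is a hypothetical stand-in for setting.MarkupRenderer.
type markupRenderer struct {
	MarkupName string
	Rules      []string
}

// byValue embeds a copy of the settings; byPointer shares them.
type byValue struct{ markupRenderer }
type byPointer struct{ *markupRenderer }

func main() {
	cfg := &markupRenderer{MarkupName: "asciidoc"}
	v := byValue{*cfg}
	p := byPointer{cfg}

	// A rule attached to the settings after the renderers were created.
	cfg.Rules = append(cfg.Rules, "rule-1")

	fmt.Println("by value sees", len(v.Rules), "rule(s)")
	fmt.Println("by pointer sees", len(p.Rules), "rule(s)")
}
```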

func envMark(envName string) string {
if runtime.GOOS == "windows" {
return "%" + envName + "%"
4 changes: 2 additions & 2 deletions modules/markup/html_test.go
@@ -112,7 +112,7 @@ func TestRender_links(t *testing.T) {

defaultCustom := setting.Markdown.CustomURLSchemes
setting.Markdown.CustomURLSchemes = []string{"ftp", "magnet"}
ReplaceSanitizer()
InitializeSanitizer()
CustomLinkURLSchemes(setting.Markdown.CustomURLSchemes)

test(
@@ -192,7 +192,7 @@ func TestRender_links(t *testing.T) {

// Restore previous settings
setting.Markdown.CustomURLSchemes = defaultCustom
ReplaceSanitizer()
InitializeSanitizer()
CustomLinkURLSchemes(setting.Markdown.CustomURLSchemes)
}

9 changes: 7 additions & 2 deletions modules/markup/markdown/markdown.go
@@ -199,7 +199,7 @@ func actualRender(ctx *markup.RenderContext, input io.Reader, output io.Writer)
}
_ = lw.Close()
}()
buf := markup.SanitizeReader(rd)
buf := markup.SanitizeReader(rd, "")
_, err := io.Copy(output, buf)
return err
}
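
`SanitizeReader` now takes a renderer name, and the Markdown renderer passes an empty string. One plausible reading is that an empty or unknown name selects the default policy. The sketch below illustrates that lookup; the `policy` and `sanitizer` types and the `policyFor` method are hypothetical and are not taken from the PR.

```go
package main

import "fmt"

// policy is a hypothetical stand-in for an HTML sanitization policy.
type policy struct{ name string }

// sanitizer holds one default policy plus optional per-renderer policies.
type sanitizer struct {
	defaultPolicy    *policy
	rendererPolicies map[string]*policy
}

// policyFor returns the renderer's own policy, falling back to the default
// when renderer is empty or has no policy registered (assumed behaviour).
func (s *sanitizer) policyFor(renderer string) *policy {
	if renderer != "" {
		if p, ok := s.rendererPolicies[renderer]; ok {
			return p
		}
	}
	return s.defaultPolicy
}

func main() {
	s := &sanitizer{
		defaultPolicy:    &policy{name: "default"},
		rendererPolicies: map[string]*policy{"jupyter": {name: "jupyter"}},
	}
	for _, r := range []string{"", "jupyter", "unknown"} {
		fmt.Printf("%q -> %s\n", r, s.policyFor(r).name)
	}
}
```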
@@ -215,7 +215,7 @@ func render(ctx *markup.RenderContext, input io.Reader, output io.Writer) error
if log.IsDebug() {
log.Debug("Panic in markdown: %v\n%s", err, string(log.Stack(2)))
}
ret := markup.SanitizeReader(input)
ret := markup.SanitizeReader(input, "")
_, err = io.Copy(output, ret)
if err != nil {
log.Error("SanitizeReader failed: %v", err)
@@ -249,6 +249,11 @@ func (Renderer) Extensions() []string {
return setting.Markdown.FileExtensions
}

// SanitizerRules implements markup.Renderer
func (Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return []setting.MarkupSanitizerRule{}
}

// Render implements markup.Renderer
func (Renderer) Render(ctx *markup.RenderContext, input io.Reader, output io.Writer) error {
return render(ctx, input, output)
6 changes: 6 additions & 0 deletions modules/markup/orgmode/orgmode.go
@@ -12,6 +12,7 @@ import (
"strings"

"code.gitea.io/gitea/modules/markup"
"code.gitea.io/gitea/modules/setting"
"code.gitea.io/gitea/modules/util"

"github.com/niklasfasching/go-org/org"
@@ -38,6 +39,11 @@ func (Renderer) Extensions() []string {
return []string{".org"}
}

// SanitizerRules implements markup.Renderer
func (Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return []setting.MarkupSanitizerRule{}
}

// Render renders orgmode rawbytes to HTML
func Render(ctx *markup.RenderContext, input io.Reader, output io.Writer) error {
htmlWriter := org.NewHTMLWriter()
56 changes: 26 additions & 30 deletions modules/markup/renderer.go
@@ -49,6 +49,7 @@ type Renderer interface {
Name() string // markup format name
Extensions() []string
NeedPostProcess() bool
SanitizerRules() []setting.MarkupSanitizerRule
Render(ctx *RenderContext, input io.Reader, output io.Writer) error
}

@@ -104,37 +105,32 @@ func render(ctx *RenderContext, renderer Renderer, input io.Reader, output io.Wr
_ = pw.Close()
}()

if renderer.NeedPostProcess() {
pr2, pw2 := io.Pipe()
defer func() {
_ = pr2.Close()
_ = pw2.Close()
}()

wg.Add(1)
go func() {
buf := SanitizeReader(pr2)
_, err = io.Copy(output, buf)
_ = pr2.Close()
wg.Done()
}()

wg.Add(1)
go func() {
pr2, pw2 := io.Pipe()
defer func() {
_ = pr2.Close()
_ = pw2.Close()
}()

wg.Add(1)
go func() {
buf := SanitizeReader(pr2, renderer.Name())
_, err = io.Copy(output, buf)
_ = pr2.Close()
wg.Done()
}()

wg.Add(1)
go func() {
if renderer.NeedPostProcess() {
err = PostProcess(ctx, pr, pw2)
_ = pr.Close()
_ = pw2.Close()
wg.Done()
}()
} else {
wg.Add(1)
go func() {
buf := SanitizeReader(pr)
_, err = io.Copy(output, buf)
_ = pr.Close()
wg.Done()
}()
}
} else {
_, err = io.Copy(pw2, pr)
}
_ = pr.Close()
_ = pw2.Close()
wg.Done()
}()

if err1 := renderer.Render(ctx, input, pw); err1 != nil {
return err1
}