Skip to content

Commit

Permalink
Add sanitizer rules per renderer (go-gitea#16110)
Browse files Browse the repository at this point in the history
* Added sanitizer rules per renderer.

* Updated documentation.

Co-authored-by: techknowlogick <techknowlogick@gitea.io>
  • Loading branch information
2 people authored and AbdulrhmnGhanem committed Aug 10, 2021
1 parent 0b2aa4a commit 181d16f
Show file tree
Hide file tree
Showing 10 changed files with 215 additions and 113 deletions.
4 changes: 4 additions & 0 deletions docs/content/doc/advanced/config-cheat-sheet.en-us.md
Original file line number Diff line number Diff line change
Expand Up @@ -907,13 +907,17 @@ Gitea supports customizing the sanitization policy for rendered HTML. The exampl
ELEMENT = span
ALLOW_ATTR = class
REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
ALLOW_DATA_URI_IMAGES = true
```

- `ELEMENT`: The element this policy applies to. Must be non-empty.
- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.
- `ALLOW_DATA_URI_IMAGES`: **false** Allow data uri images (`<img src="data:image/png;base64,..."/>`).

Multiple sanitisation rules can be defined by adding unique subsections, e.g. `[markup.sanitizer.TeX-2]`.
To apply a sanitisation rules only for a specify external renderer they must use the renderer name, e.g. `[markup.sanitizer.asciidoc.rule-1]`.
If the rule is defined above the renderer ini section or the name does not match a renderer it is applied to every renderer.

## Time (`time`)

Expand Down
41 changes: 38 additions & 3 deletions docs/content/doc/advanced/external-renderers.en-us.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,8 +64,8 @@ IS_INPUT_FILE = false
[markup.jupyter]
ENABLED = true
FILE_EXTENSIONS = .ipynb
RENDER_COMMAND = "jupyter nbconvert --stdout --to html --template basic "
IS_INPUT_FILE = true
RENDER_COMMAND = "jupyter nbconvert --stdin --stdout --to html --template basic"
IS_INPUT_FILE = false

[markup.restructuredtext]
ENABLED = true
Expand All @@ -90,15 +90,50 @@ FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND = pandoc -f markdown -t html --katex
```

You must define `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` in each section.
You must define `ELEMENT` and `ALLOW_ATTR` in each section.

To define multiple entries, add a unique alphanumeric suffix (e.g., `[markup.sanitizer.1]` and `[markup.sanitizer.something]`).

To apply a sanitisation rules only for a specify external renderer they must use the renderer name, e.g. `[markup.sanitizer.asciidoc.rule-1]`, `[markup.sanitizer.<renderer>.rule-1]`.

**Note**: If the rule is defined above the renderer ini section or the name does not match a renderer it is applied to every renderer.

Once your configuration changes have been made, restart Gitea to have changes take effect.

**Note**: Prior to Gitea 1.12 there was a single `markup.sanitiser` section with keys that were redefined for multiple rules, however,
there were significant problems with this method of configuration necessitating configuration through multiple sections.

### Example: Office DOCX

Display Office DOCX files with [`pandoc`](https://pandoc.org/):
```ini
[markup.docx]
ENABLED = true
FILE_EXTENSIONS = .docx
RENDER_COMMAND = "pandoc --from docx --to html --self-contained --template /path/to/basic.html"

[markup.sanitizer.docx.img]
ALLOW_DATA_URI_IMAGES = true
```

The template file has the following content:
```
$body$
```

### Example: Jupyter Notebook

Display Jupyter Notebook files with [`nbconvert`](https://github.com/jupyter/nbconvert):
```ini
[markup.jupyter]
ENABLED = true
FILE_EXTENSIONS = .ipynb
RENDER_COMMAND = "jupyter-nbconvert --stdin --stdout --to html --template basic"

[markup.sanitizer.jupyter.img]
ALLOW_DATA_URI_IMAGES = true
```

## Customizing CSS
The external renderer is specified in the .ini in the format `[markup.XXXXX]` and the HTML supplied by your external renderer will be wrapped in a `<div>` with classes `markup` and `XXXXX`. The `markup` class provides out of the box styling (as does `markdown` if `XXXXX` is `markdown`). Otherwise you can use these classes to specifically target the contents of your rendered HTML.

Expand Down
10 changes: 10 additions & 0 deletions modules/markup/csv/csv.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ import (
"html"
"io"
"io/ioutil"
"regexp"
"strconv"

"code.gitea.io/gitea/modules/csv"
Expand Down Expand Up @@ -38,6 +39,15 @@ func (Renderer) Extensions() []string {
return []string{".csv", ".tsv"}
}

// SanitizerRules implements markup.Renderer
func (Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return []setting.MarkupSanitizerRule{
{Element: "table", AllowAttr: "class", Regexp: regexp.MustCompile(`data-table`)},
{Element: "th", AllowAttr: "class", Regexp: regexp.MustCompile(`line-num`)},
{Element: "td", AllowAttr: "class", Regexp: regexp.MustCompile(`line-num`)},
}
}

func writeField(w io.Writer, element, class, field string) error {
if _, err := io.WriteString(w, "<"); err != nil {
return err
Expand Down
7 changes: 6 additions & 1 deletion modules/markup/external/external.go
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ func RegisterRenderers() {

// Renderer implements markup.Renderer for external tools
type Renderer struct {
setting.MarkupRenderer
*setting.MarkupRenderer
}

// Name returns the external tool name
Expand All @@ -48,6 +48,11 @@ func (p *Renderer) Extensions() []string {
return p.FileExtensions
}

// SanitizerRules implements markup.Renderer
func (p *Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return p.MarkupSanitizerRules
}

func envMark(envName string) string {
if runtime.GOOS == "windows" {
return "%" + envName + "%"
Expand Down
4 changes: 2 additions & 2 deletions modules/markup/html_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ func TestRender_links(t *testing.T) {

defaultCustom := setting.Markdown.CustomURLSchemes
setting.Markdown.CustomURLSchemes = []string{"ftp", "magnet"}
ReplaceSanitizer()
InitializeSanitizer()
CustomLinkURLSchemes(setting.Markdown.CustomURLSchemes)

test(
Expand Down Expand Up @@ -192,7 +192,7 @@ func TestRender_links(t *testing.T) {

// Restore previous settings
setting.Markdown.CustomURLSchemes = defaultCustom
ReplaceSanitizer()
InitializeSanitizer()
CustomLinkURLSchemes(setting.Markdown.CustomURLSchemes)
}

Expand Down
9 changes: 7 additions & 2 deletions modules/markup/markdown/markdown.go
Original file line number Diff line number Diff line change
Expand Up @@ -199,7 +199,7 @@ func actualRender(ctx *markup.RenderContext, input io.Reader, output io.Writer)
}
_ = lw.Close()
}()
buf := markup.SanitizeReader(rd)
buf := markup.SanitizeReader(rd, "")
_, err := io.Copy(output, buf)
return err
}
Expand All @@ -215,7 +215,7 @@ func render(ctx *markup.RenderContext, input io.Reader, output io.Writer) error
if log.IsDebug() {
log.Debug("Panic in markdown: %v\n%s", err, string(log.Stack(2)))
}
ret := markup.SanitizeReader(input)
ret := markup.SanitizeReader(input, "")
_, err = io.Copy(output, ret)
if err != nil {
log.Error("SanitizeReader failed: %v", err)
Expand Down Expand Up @@ -249,6 +249,11 @@ func (Renderer) Extensions() []string {
return setting.Markdown.FileExtensions
}

// SanitizerRules implements markup.Renderer
func (Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return []setting.MarkupSanitizerRule{}
}

// Render implements markup.Renderer
func (Renderer) Render(ctx *markup.RenderContext, input io.Reader, output io.Writer) error {
return render(ctx, input, output)
Expand Down
6 changes: 6 additions & 0 deletions modules/markup/orgmode/orgmode.go
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@ import (

"code.gitea.io/gitea/modules/highlight"
"code.gitea.io/gitea/modules/markup"
"code.gitea.io/gitea/modules/setting"
"code.gitea.io/gitea/modules/util"

"github.com/alecthomas/chroma"
Expand Down Expand Up @@ -41,6 +42,11 @@ func (Renderer) Extensions() []string {
return []string{".org"}
}

// SanitizerRules implements markup.Renderer
func (Renderer) SanitizerRules() []setting.MarkupSanitizerRule {
return []setting.MarkupSanitizerRule{}
}

// Render renders orgmode rawbytes to HTML
func Render(ctx *markup.RenderContext, input io.Reader, output io.Writer) error {
htmlWriter := org.NewHTMLWriter()
Expand Down
56 changes: 26 additions & 30 deletions modules/markup/renderer.go
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ type Renderer interface {
Name() string // markup format name
Extensions() []string
NeedPostProcess() bool
SanitizerRules() []setting.MarkupSanitizerRule
Render(ctx *RenderContext, input io.Reader, output io.Writer) error
}

Expand Down Expand Up @@ -136,37 +137,32 @@ func render(ctx *RenderContext, renderer Renderer, input io.Reader, output io.Wr
_ = pw.Close()
}()

if renderer.NeedPostProcess() {
pr2, pw2 := io.Pipe()
defer func() {
_ = pr2.Close()
_ = pw2.Close()
}()

wg.Add(1)
go func() {
buf := SanitizeReader(pr2)
_, err = io.Copy(output, buf)
_ = pr2.Close()
wg.Done()
}()

wg.Add(1)
go func() {
pr2, pw2 := io.Pipe()
defer func() {
_ = pr2.Close()
_ = pw2.Close()
}()

wg.Add(1)
go func() {
buf := SanitizeReader(pr2, renderer.Name())
_, err = io.Copy(output, buf)
_ = pr2.Close()
wg.Done()
}()

wg.Add(1)
go func() {
if renderer.NeedPostProcess() {
err = PostProcess(ctx, pr, pw2)
_ = pr.Close()
_ = pw2.Close()
wg.Done()
}()
} else {
wg.Add(1)
go func() {
buf := SanitizeReader(pr)
_, err = io.Copy(output, buf)
_ = pr.Close()
wg.Done()
}()
}
} else {
_, err = io.Copy(pw2, pr)
}
_ = pr.Close()
_ = pw2.Close()
wg.Done()
}()

if err1 := renderer.Render(ctx, input, pw); err1 != nil {
return err1
}
Expand Down

0 comments on commit 181d16f

Please sign in to comment.