Andrew Gaul edited this page Feb 20, 2024 · 2 revisions

S3Proxy can use regular expressions to rename blobs before they are uploaded to the backend. The regex middleware is configured as follows:

s3proxy.regex-blobstore.match.<regex name 1> = <regex match expression>
s3proxy.regex-blobstore.replace.<regex name 1> = <regex replace expression>

    .... [snip] ....

s3proxy.regex-blobstore.match.<regex name N> = <regex match expression>
s3proxy.regex-blobstore.replace.<regex name N> = <regex replace expression>

You can define multiple regexes; they are evaluated in order, and evaluation stops at the first regex that matches. Each match entry MUST be accompanied by a replace entry with the same regex name.

For example, the following rule:

s3proxy.regex-blobstore.match.type_partition=^prefix/(\\w+)/(\\d{4})/(\\d{2})/(\\d{2})/(.*)$
s3proxy.regex-blobstore.replace.type_partition=prefix/date=$2-$3-$4/type=$1/$5

will rewrite the blob name from:

prefix/test/2023/01/01/test.txt

to:

prefix/date=2023-01-01/type=test/test.txt
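
The rewrite above can be reproduced outside S3Proxy with plain `java.util.regex`. The following is a minimal sketch, not the middleware's actual code: the class `RegexRenameSketch` and its helpers are hypothetical, and it assumes the middleware applies the first rule whose pattern is found in the blob name and substitutes `$1`..`$5` with the capture groups.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; S3Proxy's own implementation may differ.
class RegexRenameSketch {
    // Apply the first rule that matches, in insertion order.
    static String rename(Map<Pattern, String> rules, String blobName) {
        for (Map.Entry<Pattern, String> rule : rules.entrySet()) {
            Matcher matcher = rule.getKey().matcher(blobName);
            if (matcher.find()) {
                // $N in the replacement refers to capture group N.
                return matcher.replaceAll(rule.getValue());
            }
        }
        return blobName; // no rule matched: name is left unchanged
    }

    // The type_partition rule from the example above.
    // The doubled backslashes here are Java string-literal escaping.
    static Map<Pattern, String> exampleRules() {
        Map<Pattern, String> rules = new LinkedHashMap<>();
        rules.put(
            Pattern.compile("^prefix/(\\w+)/(\\d{4})/(\\d{2})/(\\d{2})/(.*)$"),
            "prefix/date=$2-$3-$4/type=$1/$5");
        return rules;
    }

    public static void main(String[] args) {
        // prints prefix/date=2023-01-01/type=test/test.txt
        System.out.println(rename(exampleRules(), "prefix/test/2023/01/01/test.txt"));
        // prints other/file.txt (no rule matches)
        System.out.println(rename(exampleRules(), "other/file.txt"));
    }
}
```

A `LinkedHashMap` is used so that rules keep their insertion order, which mirrors the "evaluated in order, first match wins" behaviour described earlier.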

NOTE: The match expression must be written in escaped string format. For example, to match a digit you need to escape the backslash (use \\d, NOT \d).
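
To see why the backslash has to be doubled, here is a small sketch. It assumes the configuration file is parsed with `java.util.Properties`, which treats a backslash as an escape character and silently drops it before a character that is not a recognised escape. The class `PropertiesEscapeSketch` and the rule name `digits` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of how a properties parser unescapes backslashes.
class PropertiesEscapeSketch {
    static final String KEY = "s3proxy.regex-blobstore.match.digits";

    // Parse a single properties line and return the value stored under key.
    static String loadValue(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File text: ...digits=\\d{4}  (Java literal needs four backslashes)
        System.out.println(loadValue(KEY + "=\\\\d{4}", KEY)); // prints \d{4}
        // File text: ...digits=\d{4}   (backslash is dropped by the parser)
        System.out.println(loadValue(KEY + "=\\d{4}", KEY));   // prints d{4}
    }
}
```

With a single backslash the regex engine would receive `d{4}`, which matches four literal `d` characters rather than four digits.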