Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash on illegal characters #48

Closed
retikulum opened this issue Mar 24, 2021 · 4 comments · Fixed by #49
Closed

Crash on illegal characters #48

retikulum opened this issue Mar 24, 2021 · 4 comments · Fixed by #49
Labels
bug Something isn't working

Comments

@retikulum
Copy link

Hi,

While I was fuzzing library, I encountered some bugs. You can find them in below.

What is my program?

#include <stdio.h>
#include "/mnt/ramdisk/PeppaPEG/peppapeg.h"
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>
#include <string.h>
# define ENTRY 1

int main(int argc, char* argv[]) {
    FILE *in_file  = fopen(argv[1], "r");
    char data[255];
    fscanf(in_file, "%[^\n]", data);
    P4_Grammar* grammar = P4_CreateGrammar();
    if (grammar == NULL) {
        return 1;
    }

    if (P4_AddLiteral(grammar, ENTRY, data, false) != P4_Ok) {
        return 1;
    }

    P4_Source*  source = P4_CreateSource(data, ENTRY);
    if (source == NULL) {
        return 1;
    }

    if (P4_Parse(grammar, source) != P4_Ok) {
        return 1;
    }

    P4_Token*   root = P4_GetSourceAst(source);
    char*       text = P4_CopyTokenString(root);

    free(text);
    P4_DeleteSource(source);
    P4_DeleteGrammar(grammar);
    return 1;

}

Ubuntu Version

Ubuntu 18.04.5 LTS
Linux ubuntu 4.15.0-112-generic #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Bug

Some special characters can't be parsed and it caused memory (heap-bufffer-overflow) errors.

You can produce it with string "Hello Worìd" (ì = 0xEC or 236). I patched P4_CaseCmpInsensitive function to show what is the value of P4_String given to P4_ReadRune as an argument. It can be seen that it is stopped after ì because it is parsing variable wrong and assign wrong number to P4_Rune* c . With the use of wrong value, program will access memory areas where it shouldn’t be. I am aware of open issue (#38) for supporting other utf encodings.

image

image

image

If input is a long string which contains “illegal” characters, program will be aboreted and throws DEADLYSIGNAL. This bug can be produced without sanitizers. Inpus can be found here: https://controlc.com/74f1e9b9

image

image

AFL++ (https://github.com/AFLplusplus/AFLplusplus) is used for this fuzzing campaign.

@soasme soasme added the bug Something isn't working label Mar 24, 2021
@soasme soasme reopened this Mar 24, 2021
@soasme
Copy link
Owner

soasme commented Mar 24, 2021

@retikulum I have fixed the issue by revamping the implementation of P4_CaseCmpInsensitive. The case-insensitive literal rule "Hello Worìd" can now match the input "HELLO WORÌD", see test.

The case https://controlc.com/74f1e9b9 can pass on my local. Can you please revisit this issue when you're free?

Also, I think introducing fuzzy testing into this project is definitely a must-do. I have added it to the roadmap.

@retikulum
Copy link
Author

@soasme Thanks for your great effort. I have tested both cases. "Hello Worìd" works perfectly on my machine however the second case it is still failing. I think I shouldn't have copy-pasted test case on controlc because it generally brokes for non-ascii. I have tested second case both with sanitizers and without sanitizers. It again crashes. My crash_test case is attached.

image

crash_test.txt

For the introducing fuzzing test into project, you can consider integrating it with oss-fuzz (I know that it can take time to integrate it for small projects) or you can use similar script that I use for this campaign. It is inspired by afl++ installation script. I haven't tested it, I just wrote it for giving brief explanation to you.

#install afl++
sudo apt-get update
sudo apt-get install -y build-essential python3-dev automake git flex bison libglib2.0-dev libpixman-1-dev python3-setuptools
# try to install llvm 11 and install the distro default if that fails
sudo apt-get install -y lld-11 llvm-11 llvm-11-dev clang-11 || sudo apt-get install -y lld llvm llvm-dev clang 
sudo apt-get install -y gcc-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-plugin-dev libstdc++-$(gcc --version|head -n1|sed 's/.* //'|sed 's/\..*//')-dev
git clone https://github.com/AFLplusplus/AFLplusplus && cd AFLplusplus
make distrib
sudo make install
#change directory to source
cd ..
#you can change harness according to which function/s you want to fuzz
afl-clang-fast harness.c peppapeg.c -o harness 
#create one small corpus
mkdir inputs
echo "Hello World" > inputs/rand
sudo afl-system-config
mkdir sync_dir
#start afl -- you can change/play parameters for specific scenarios
afl-fuzz -i inputs -o sync_dir -- ./harness @@

@soasme
Copy link
Owner

soasme commented Mar 25, 2021

Thanks for the procedure. I'll take it a look.

The exact character that caused the problem is 00 as it is used by C to determine the end of the input string. Currently, Peppa PEG only supports UTF-8 encoding; 00 is supposed not to be appearing in the middle of the content. I think you probably need to sanitize the input string by replacing 00 to something else like whitespace or \uFFFD. Similar assumption can be found in commonmark spec.

@retikulum
Copy link
Author

@soasme Hi again. This will help me a lot to gain better understanding of how these encoding schemes work. You can close the issue as you wish. Thank you for your time and effort.

@soasme soasme closed this as completed Mar 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants