Welcome to the Custom GREP Implementation project! This project is a custom implementation of the classic grep utility, focusing on regular expression pattern matching. It supports a range of regex features and aims to mirror much of the functionality of traditional grep tools.
- Introduction
- Supported Patterns
  - Literal Characters
  - \d Pattern - Digits
  - \w Pattern - Alphanumerical Characters
  - [--] Pattern - Positive Character Groups
  - [^--] Pattern - Negative Character Groups
  - ^ Pattern - Start of String / Line Anchor
  - $ Pattern - End of String / Line Anchor
  - + Pattern - "One or More" Quantifier
  - ? Pattern - "Zero or One" Quantifier
  - . Pattern - Wildcard
  - (-|-) Pattern - Alternation Pattern
- Future Additions
- Setup and Usage
- Implementation Details
- Contributing
- License
- Acknowledgments
Regular expressions (regexes, for short) are patterns used to match character combinations in strings. grep is a command-line tool that searches text using regexes.
This project is a custom implementation that mirrors many of grep's features for matching regex-defined patterns in any text corpus. It is built in C++ and aims to be both educational and practical.
A literal pattern matches that exact sequence of characters in the input string.
Example:
$ echo "hello" | ./grep.sh -E "hello"
The \d pattern matches any digit character, equivalent to [0-9].
Example:
$ echo "2023" | ./grep.sh -E "\d\d\d\d"
The \w pattern matches any word character (a letter, a digit, or an underscore), equivalent to [a-zA-Z0-9_].
Example:
$ echo "hello123" | ./grep.sh -E "\w\w\w\w\w\d\d\d"
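The two character classes above can be sketched as a single predicate. This is a minimal illustration, not the code in src/Server.cpp; the function name matchesClass and its parameters are hypothetical.

```cpp
#include <cctype>

// Hypothetical helper (not taken from the project): returns true when
// character c belongs to the class named by the escape letter,
// 'd' for \d (digits) or 'w' for \w (letters, digits, underscore).
bool matchesClass(char c, char escapeLetter) {
    // <cctype> functions require a value representable as unsigned char.
    const unsigned char uc = static_cast<unsigned char>(c);
    switch (escapeLetter) {
        case 'd': return std::isdigit(uc) != 0;
        case 'w': return std::isalnum(uc) != 0 || c == '_';
        default:  return false;  // unknown class: no match
    }
}
```

With this helper, matching "\d\d\d\d" against "2023" amounts to calling matchesClass(ch, 'd') once per input character.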
A positive character group such as [abc] matches any single character listed in the group.
Example:
$ echo "hello" | ./grep.sh -E "[aeiou]"
A negative character group such as [^abc] matches any single character NOT listed in the group.
Example:
$ echo "hello" | ./grep.sh -E "[^aeiou]"
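Positive and negative groups differ only in whether membership counts as a match, so one function can serve both. The sketch below assumes the text between the brackets has already been extracted; matchesGroup and lineMatchesGroup are hypothetical names, and ranges such as a-z are deliberately not handled.

```cpp
#include <string>

// Hypothetical helper: group is the text between the brackets,
// e.g. "aeiou" for [aeiou] or "^aeiou" for [^aeiou].
bool matchesGroup(char c, const std::string& group) {
    const bool negated = !group.empty() && group[0] == '^';
    const std::string members = negated ? group.substr(1) : group;
    const bool found = members.find(c) != std::string::npos;
    return negated ? !found : found;
}

// A line matches when at least one of its characters satisfies the group.
bool lineMatchesGroup(const std::string& line, const std::string& group) {
    for (char c : line) {
        if (matchesGroup(c, group)) return true;
    }
    return false;
}
```

Note that "hello" matches both [aeiou] (because of 'e') and [^aeiou] (because of 'h'), since a single qualifying character is enough.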
The ^ anchor matches at the start of a string or line.
Example:
$ echo "hello" | ./grep.sh -E "^h"
The $ anchor matches at the end of a string or line.
Example:
$ echo "hello" | ./grep.sh -E "o$"
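For a pattern made only of literal characters, the two anchors reduce to prefix and suffix checks. The following sketch assumes exactly that restricted case; matchAnchoredLiteral is a hypothetical name and does not reflect how the project itself handles anchors.

```cpp
#include <string>

// Hypothetical sketch: supports literal characters plus an optional
// leading '^' and an optional trailing '$'.
bool matchAnchoredLiteral(std::string pattern, const std::string& line) {
    const bool atStart = !pattern.empty() && pattern.front() == '^';
    if (atStart) pattern.erase(0, 1);
    const bool atEnd = !pattern.empty() && pattern.back() == '$';
    if (atEnd) pattern.pop_back();

    if (pattern.size() > line.size()) return false;
    if (atStart && atEnd) return line == pattern;  // whole line must match
    if (atStart) return line.compare(0, pattern.size(), pattern) == 0;  // prefix
    if (atEnd) {
        // suffix check
        return line.compare(line.size() - pattern.size(), pattern.size(), pattern) == 0;
    }
    return line.find(pattern) != std::string::npos;  // unanchored search
}
```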
The + quantifier matches one or more occurrences of the preceding element.
Example:
$ echo "hellooo" | ./grep.sh -E "o+"
The ? quantifier matches zero or one occurrence of the preceding element.
Example:
$ echo "color" | ./grep.sh -E "colou?r"
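Both quantifiers can be illustrated with a small recursive backtracking matcher. This is a sketch of one common technique, restricted to literal characters followed by an optional + or ?; the names matchHere and searchQuantified are hypothetical, and the project's own implementation may work differently.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: try to match pattern[p..] starting exactly at text[t].
bool matchHere(const std::string& pattern, std::size_t p,
               const std::string& text, std::size_t t) {
    if (p == pattern.size()) return true;  // whole pattern consumed
    const char c = pattern[p];
    const bool hasNext = p + 1 < pattern.size();

    if (hasNext && pattern[p + 1] == '?') {
        // Zero or one: first try consuming one character, then try skipping.
        if (t < text.size() && text[t] == c && matchHere(pattern, p + 2, text, t + 1)) {
            return true;
        }
        return matchHere(pattern, p + 2, text, t);
    }
    if (hasNext && pattern[p + 1] == '+') {
        // One or more: consume greedily, then back off one character at a time.
        std::size_t end = t;
        while (end < text.size() && text[end] == c) ++end;
        for (; end > t; --end) {
            if (matchHere(pattern, p + 2, text, end)) return true;
        }
        return false;
    }
    // Plain literal character.
    if (t < text.size() && text[t] == c) return matchHere(pattern, p + 1, text, t + 1);
    return false;
}

// Unanchored search: try every starting position in the text.
bool searchQuantified(const std::string& pattern, const std::string& text) {
    for (std::size_t t = 0; t <= text.size(); ++t) {
        if (matchHere(pattern, 0, text, t)) return true;
    }
    return false;
}
```

Under these assumptions, "colou?r" matches both "color" and "colour", while "o+" needs at least one 'o' in the input.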
The . wildcard matches any single character.
Example:
$ echo "dog" | ./grep.sh -E "d.g"
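Because the wildcard always consumes exactly one character, a pattern of literals and dots has a fixed length and can be checked position by position. The sketch below assumes that restricted pattern form; matchesAt and searchWithWildcard are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: compare pattern against line starting at `start`.
// The caller must guarantee start + pattern.size() <= line.size().
bool matchesAt(const std::string& pattern, const std::string& line, std::size_t start) {
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        const char p = pattern[i];
        // '.' accepts any character; anything else must match exactly.
        if (p != '.' && p != line[start + i]) return false;
    }
    return true;
}

// Slide the fixed-length pattern over every position where it fits.
bool searchWithWildcard(const std::string& pattern, const std::string& line) {
    if (pattern.size() > line.size()) return false;
    for (std::size_t start = 0; start + pattern.size() <= line.size(); ++start) {
        if (matchesAt(pattern, line, start)) return true;
    }
    return false;
}
```

In this sketch the wildcard must consume a character, so "d.g" matches "dog" but not "dg".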
Alternation matches either of the patterns separated by |.
Example:
$ echo "cat" | ./grep.sh -E "cat|dog"
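For alternatives that are plain literals, alternation can be sketched as splitting the pattern on | and accepting the line if any alternative occurs in it. This handles only top-level alternation without parentheses or nested regex features; splitAlternatives and matchesAnyAlternative are hypothetical names.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch: split "cat|dog" into {"cat", "dog"}.
std::vector<std::string> splitAlternatives(const std::string& pattern) {
    std::vector<std::string> parts;
    std::string current;
    for (char c : pattern) {
        if (c == '|') {
            parts.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    parts.push_back(current);  // last (or only) alternative
    return parts;
}

// The line matches when any literal alternative appears somewhere in it.
bool matchesAnyAlternative(const std::string& pattern, const std::string& line) {
    for (const std::string& alternative : splitAlternatives(pattern)) {
        if (line.find(alternative) != std::string::npos) return true;
    }
    return false;
}
```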
Planned future enhancements include:
- Support for single backreferences
- Handling multiple backreferences
- Nested backreferences
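Until backreferences are implemented, the C++ standard library's std::regex can serve as a reference for the behavior these features are expected to have. The snippet below is only such a reference check, not part of this project; matchesWithStdRegex is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Hypothetical reference check: uses std::regex (ECMAScript grammar),
// where \1, \2, ... refer back to the text captured by earlier groups.
bool matchesWithStdRegex(const std::string& pattern, const std::string& line) {
    const std::regex re(pattern, std::regex::ECMAScript);
    return std::regex_search(line, re);
}
```

For instance, the pattern (cat) and \1 requires the text captured by the first group to appear again, so it matches "cat and cat" but not "cat and dog".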
- Ensure you have cmake installed locally.
- Run ./grep.sh to execute the program. The main implementation is in src/Server.cpp.
- Example usage:
$ echo "cats and dogs" | ./grep.sh -E "cat."
Note: The . character acts as a wildcard, matching any single character.
Example Result:
No backref found
Checking for pattern: cat
Input line length: 13, Logical Pattern Length: 3
Starting match attempt from Input Position: 0
Starting match:
Pattern: "cat"
Input Line: "cats and dogs"
Initial Input Position: 0
Pattern Position: 0, Input Position: 0
Checking for character literal: 'c' at Input[0] = 'c'
Pattern Position: 1, Input Position: 1
Checking for character literal: 'a' at Input[1] = 'a'
Pattern Position: 2, Input Position: 2
Checking for character literal: 't' at Input[2] = 't'
[END MATCHING]
Matched String: cat
Found: true
Result: true
Matched String: cat.
Execution time: 1000334 microseconds
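Note 2: The program reduces patterns before matching by eliminating unnecessary wildcards, which makes pattern matching more efficient. In the output above, the pattern being checked is cat, with a logical length of 3, rather than the full cat. that was passed in.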
Literal matching compares pattern characters with input characters exactly. It is the most basic form of pattern matching.
Quantifiers like + and ? modify the behavior of the preceding element, letting it repeat (+) or be optional (?).
Character groups and their negations ([--], [^--]) test a single character against a specified set: any character in the set, or any character outside it.
Anchors like ^ and $ restrict a match to a specific position within the string: its start or its end.
Wildcards match any single character, which is useful when a position must be filled but its content does not matter.
The (-|-) pattern matches one pattern or another, so a single regex can cover several alternatives.
Contributions are welcome! Please feel free to submit issues, feature requests, and pull requests. To make contributing easier, a simple CI workflow runs the provided test script, which contains sanity checks for existing functionality. Before pushing any changes, run:
cd grep
./test.sh
Then make sure all tests pass. Thank you in advance!
This project is licensed under the MIT License. See the LICENSE file for details.
- Inspired by the classic grep utility and the CodeCrafters project.
- Thanks to all contributors and supporters!