# Evolving grep to egrep
In this chapter we enhance the application further so it can search using regular expressions. We also refactor the code from a function-based to an object-oriented approach and extend the application with some new command-line arguments.

## Objective

To understand how to define and use **classes** and **objects**, and how to use regular expressions from the **re** module.

## Classes and objects

Classes are defined as in the following example:

```python
class ClassName:
    def __init__(self):
        # Constructor: initialise the object's attributes here
        pass

    def method_name(self):
        # Do something
        pass
```

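The `self` argument refers to the object itself, i.e. the instance on which the method is called. Python passes it automatically, so `obj.method_name()` is equivalent to `ClassName.method_name(obj)`.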
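As a minimal illustration (the `WordCounter` class below is a hypothetical example, not part of the egrep application), every object created from a class keeps its own state, and `self` is how a method reaches the state of the object it was called on:

```python
class WordCounter:
    """Hypothetical example class: counts occurrences of one word."""

    def __init__(self, word):
        # Constructor: store per-object state on self
        self.word = word
        self.count = 0

    def feed(self, text):
        # self refers to the object the method was called on
        self.count += text.split().count(self.word)
        return self.count


cats = WordCounter('cat')
mats = WordCounter('mat')
sentence = 'the cat sat on the mat next to another cat'
cats.feed(sentence)
mats.feed(sentence)
print(cats.count, mats.count)  # 2 1
```

Both objects share the same methods, but `cats.count` and `mats.count` are independent attributes.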

## Regular expressions

The **re** module provides regular expression matching operations similar to those found in **Perl**.

The following metacharacters have a special meaning in a pattern; to match one of them literally, escape it with a backslash (`\`):

```
. ^ $ * + ? { } [ ] \ | ( )
```
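The short sketch below (the sample strings are made up for illustration) shows why metacharacters matter: an unescaped `.` matches any character, a backslash makes it literal, and the standard `re.escape()` function escapes every metacharacter in a literal string so it can be used safely as a pattern.

```python
import re

text = 'Total: 3.50 (incl. tax) + 1.25'

# Unescaped, '.' matches any single character, so '3.50' also matches '3x50'
print(re.search(r'3.50', '3x50') is not None)  # True

# A backslash turns the metacharacter into a literal dot
print(re.search(r'3\.50', '3x50') is not None)  # False

# re.escape() escapes all metacharacters in a literal search string
literal = re.escape('(incl. tax)')
print(re.search(literal, text).group())  # (incl. tax)
```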

`\d` Matches any decimal digit; this is equivalent to the class `[0-9]`.

`\D`  Matches any non-digit character; this is equivalent to the class `[^0-9]`.

`\s` Matches any whitespace character; this is equivalent to the class `[ \t\n\r\f\v]`.

`\S` Matches any non-whitespace character; this is equivalent to the class `[^ \t\n\r\f\v]`.

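`\w` Matches any alphanumeric character or underscore; this is equivalent to the class `[a-zA-Z0-9_]`.

`\W` Matches any character that is not alphanumeric or underscore; this is equivalent to the class `[^a-zA-Z0-9_]`.

The equivalent classes above are exact for bytes patterns or when the `ASCII` flag is set. For `str` patterns (the default in Python 3), `\d`, `\s` and `\w` also match the corresponding Unicode characters, such as non-ASCII digits and accented letters.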
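A quick sketch of these special sequences with `re.findall()` and `re.split()` (the sample line is invented for illustration). The last two calls show that, for `str` patterns, `\d` also matches non-ASCII digits unless `re.ASCII` is used:

```python
import re

line = 'Order 66 shipped to room_12 at 09:30'

# \d+ : runs of digits
print(re.findall(r'\d+', line))   # ['66', '12', '09', '30']

# \w+ : runs of letters, digits and underscores ("words")
print(re.findall(r'\w+', line))   # ['Order', '66', 'shipped', 'to', 'room_12', 'at', '09', '30']

# \s+ : split on runs of whitespace
print(re.split(r'\s+', line))     # ['Order', '66', 'shipped', 'to', 'room_12', 'at', '09:30']

# \W : every single non-word character (here the spaces and the colon)
print(re.findall(r'\W', line))

# For str patterns \d also matches U+0663 (ARABIC-INDIC DIGIT THREE);
# re.ASCII restricts \d to [0-9]
print(re.findall(r'\d', 'x\u0663y'))            # one match
print(re.findall(r'\d', 'x\u0663y', re.ASCII))  # []
```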

The module also supports the following flags, which change how a pattern is matched:

| Flag                            | Meaning                                                      |
| ------------------------------- | ------------------------------------------------------------ |
| `ASCII`, `A`                    | Makes several escapes like `\w`, `\b`, `\s` and `\d` match only on ASCII characters with the respective property. |
| `DOTALL`, `S`                   | Make `.` match any character, including newlines.            |
| `IGNORECASE`, `I`               | Do case-insensitive matches.                                 |
| `LOCALE`, `L`                   | Do a locale-aware match (only allowed with bytes patterns in Python 3). |
| `MULTILINE`, `M`                | Multi-line matching: `^` and `$` match at the start and end of each line, not only of the whole string. |
| `VERBOSE`, `X` (for ‘extended’) | Enable verbose REs, which can be organized more cleanly and understandably. |
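The sketch below (with an invented log excerpt) shows how flags are passed as the last argument of the `re` functions, how they can be combined with `|`, and how `VERBOSE` allows whitespace and comments inside a pattern:

```python
import re

text = 'Error: disk full\nwarning: low memory\nERROR: fan failure'

# IGNORECASE: 'error' matches 'Error' and 'ERROR'
print(re.findall(r'error', text, re.IGNORECASE))  # ['Error', 'ERROR']

# Without MULTILINE, '^' only matches at the very start of the string
print(re.findall(r'^\w+', text))                   # ['Error']

# With MULTILINE, '^' matches at the start of every line
print(re.findall(r'^\w+', text, re.MULTILINE))     # ['Error', 'warning', 'ERROR']

# Flags can be combined with '|'
print(re.findall(r'^error', text, re.I | re.M))    # ['Error', 'ERROR']

# VERBOSE ignores whitespace in the pattern and allows '#' comments
time_re = re.compile(r'''
    (\d{2})   # hours
    :
    (\d{2})   # minutes
''', re.VERBOSE)
print(time_re.search('Backup finished at 23:15').groups())  # ('23', '15')
```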

## Hands-on

```python
import argparse
import os
import re
import sys

import colorama

# 4.1 Implementing egrep using a function
# def search_in_path(search, input_path, is_regex=False, only_matching=False, with_filename=False):
#     search_results = []
#     if os.path.isdir(input_path):
#         print('Scanning path: {:25s}'.format(input_path))
#         with os.scandir(path=input_path) as input_dir_contents:
#             for input_dir_element in input_dir_contents:
#                 # Forward the options, otherwise files in subdirectories are searched with the defaults
#                 search_results.extend(search_in_path(search, input_dir_element.path,
#                                                      is_regex, only_matching, with_filename))
#     else:
#         with open(input_path, 'r') as input_file:
#             print('Opening file: {:25s}'.format(input_file.name))
#             for input_line in input_file:
#                 input_line = input_line.rstrip('\n')
#                 if is_regex:
#                     if match := re.search(search, input_line):
#                         search_result = match.group() if only_matching else input_line
#                         search_result = '{}: {}'.format(os.path.basename(input_path), search_result) \
#                             if with_filename else search_result
#                         search_results.append(search_result)
#                 elif input_line.find(search) >= 0:
#                     search_results.append(
#                         '{}: {}'.format(os.path.basename(input_path), input_line) if with_filename else input_line)
#     return search_results


# 4.2 Implementing egrep using a class
class Grep:
    def __init__(self, is_regex=False, only_matching=False, with_filename=False, recursive=False):
        # The search options are stored on the object once,
        # instead of being passed to every function call
        self._is_regex = is_regex
        self._only_matching = only_matching
        self._with_filename = with_filename
        self._recursive = recursive

    def search_in_string(self, search, search_string):
        """Return the matched text (or the whole string), or None if nothing matches."""
        search_result = None
        if self._is_regex:
            match = re.search(search, search_string)
            # if match := re.search(search, search_string):  (Python 3.8+)
            if match:
                search_result = match.group() if self._only_matching else search_string
        elif search_string.find(search) >= 0:
            search_result = search_string
        return search_result

    def search_in_path(self, search, input_path):
        search_results = []
        if os.path.isdir(input_path):
            if not self._recursive:
                # Like grep, only descend into directories when -r is given
                print('Skipping directory: {:25s}'.format(input_path))
                return search_results
            print('Scanning path: {:25s}'.format(input_path))
            with os.scandir(path=input_path) as input_dir_contents:
                for input_dir_element in input_dir_contents:
                    search_results.extend(self.search_in_path(search, input_dir_element.path))
        else:
            with open(input_path, 'r') as input_file:
                print('Opening file: {:25s}'.format(input_file.name))
                for input_line in input_file:
                    # Drop the trailing newline so printed results are not double-spaced
                    input_line = input_line.rstrip('\n')
                    search_result = self.search_in_string(search, input_line)
                    # if search_result := self.search_in_string(search, input_line):  (Python 3.8+)
                    if search_result:
                        search_results.append('{}: {}'.format(os.path.basename(input_file.name), search_result)
                                              if self._with_filename else search_result)
        return search_results


def main():
    # autoreset=True restores the default colour after every print
    colorama.init(autoreset=True)
    parser = argparse.ArgumentParser()
    parser.add_argument('search', type=str, help='Pattern to search for')
    parser.add_argument('input_paths', nargs='+', type=str, help='List of input file paths')
    parser.add_argument('-e', '--regex', dest='is_regexp', action='store_true', help='Use search as regexp')
    parser.add_argument('-r', '--recursive', action='store_true', help='Search recursively in directories')
    parser.add_argument('-o', '--only-matching', dest='only_matching', action='store_true',
                        help='Show matched string only')
    parser.add_argument('-H', '--with-filename', dest='with_filename', action='store_true',
                        help='Prefix each result with its file name')
    args = parser.parse_args(sys.argv[1:])

    # 4.1 Implementing egrep using a function
    # for input_path in args.input_paths:
    #     search_results = search_in_path(args.search, input_path, args.is_regexp, args.only_matching, args.with_filename)
    #     print(colorama.Style.RESET_ALL + 'Search results:')
    #     for search_result in search_results:
    #         print(colorama.Fore.GREEN + search_result)

    # 4.2 Implementing egrep using a class
    grep = Grep(args.is_regexp, args.only_matching, args.with_filename, args.recursive)
    for input_path in args.input_paths:
        search_results = grep.search_in_path(args.search, input_path)
        print(colorama.Style.RESET_ALL + 'Search results:')
        for search_result in search_results:
            print(colorama.Fore.GREEN + search_result)


if __name__ == '__main__':
    main()
```

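The `-o` option in the code above keeps only the part of each line that the regular expression matched. As a small sketch using only the standard `re` module (the sample line is invented), a match object exposes that text either by slicing the line with `span()` or directly with `group()`:

```python
import re

line = 'The price is 42 EUR'
match = re.search(r'\d+', line)

start, end = match.span()         # (13, 15): start and end index of the match
print(line[start:end])            # 42  (slicing the line with span())
print(match.group())              # 42  (the same text, returned directly)
print(match.start(), match.end()) # 13 15
```

`group()` avoids indexing `span()` twice and reads more clearly when only the matched text is needed.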

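When the cell is run inside Jupyter, `argparse` reads the kernel's own command line from `sys.argv`, so the required positional arguments are missing and it exits with `SystemExit: 2`. `ArgumentParser.parse_args()` also accepts an explicit list of strings, which is convenient for trying the parser out in a notebook. The sketch below rebuilds a subset of the options from the hands-on code; the file names are placeholders and are never opened:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('search', type=str, help='Pattern to search for')
parser.add_argument('input_paths', nargs='+', type=str, help='List of input file paths')
parser.add_argument('-e', '--regex', dest='is_regexp', action='store_true', help='Use search as regexp')
parser.add_argument('-o', '--only-matching', dest='only_matching', action='store_true',
                    help='Show matched string only')
parser.add_argument('-H', '--with-filename', dest='with_filename', action='store_true',
                    help='Prefix each result with its file name')

# Pass the arguments explicitly instead of letting argparse read sys.argv
args = parser.parse_args(['-e', '-o', '-H', r'\d+', 'notes.txt', 'todo.txt'])
print(args.search, args.input_paths)                           # \d+ ['notes.txt', 'todo.txt']
print(args.is_regexp, args.only_matching, args.with_filename)  # True True True
```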
## Quiz

1. Can you have static class methods in Python?

2. What regular expression locates the word **fox** in the following sample text?

   ```
   The quick brown fox jumps over the lazy dog.
   The quick brown dog jumps over the small fox.
   The quick red fox jumps over the brown fox.
   ```

```python
from IPython.display import Markdown as md

with open('Answers/Answers4.md') as file:
    md_content = file.read()
md(md_content)
```

[Previous chapter - Chapter3](Chapter3.ipynb) | [Next chapter - Chapter5](Chapter5.ipynb)