# Making a simple log file analyzer
Creating a second CLI application that analyzes log files and reuses the egrep tool from previous chapters.

## Objective

To understand the key concepts of **comprehensions** and **generators**.

## Comprehensions

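A **comprehension** is a concise way to create lists, sets and dicts. (There is no tuple comprehension: the same syntax in parentheses creates a generator expression, covered in the next section.) Common applications are to build a new list or dict where each element is the result of an operation applied to each member of another sequence or iterable, or to create a subsequence of the elements that satisfy a certain condition. The result is essentially the same as building the collection in a loop or a function: every element is created and held in memory at once.

Both cases below build the full list of even numbers in memory before the first value is printed, so with a range this large you will most likely run out of memory.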
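As a small illustration of the two uses described above (transforming every element and filtering by a condition), the sketch below builds a list, a set and a dict with comprehensions. The sample data in `words` is made up for this example.

```python
# Hypothetical sample data, used only for illustration.
words = ["error", "warning", "info", "error", "debug"]

# Transform: apply an operation to every element.
upper_words = [w.upper() for w in words]

# Filter: keep only the elements that satisfy a condition.
long_words = [w for w in words if len(w) > 4]

# Set comprehension: duplicates collapse automatically.
unique_words = {w for w in words}

# Dict comprehension: map each distinct word to its length.
word_lengths = {w: len(w) for w in unique_words}

print(upper_words)
print(long_words)
print(sorted(unique_words))
print(word_lengths)
```

Each comprehension produces the complete collection immediately, which is convenient for small inputs like this one.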

```python
def find_even_number_function(number_stream):
    even_number = []
    for n in number_stream:
        if n % 2 == 0:
            even_number.append(n)
    return even_number


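# Loop version: the whole list is built before the first print.
for i in find_even_number_function(range(1, 1000000000)):
    print(i)

# Comprehension version: same result, same memory cost.
for i in [n for n in range(1, 1000000000) if n % 2 == 0]: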
    print(i)
```

## Generators

A **generator** is very similar to a function that returns a list: it has parameters, can be called, and produces a sequence of values. However, instead of building a list containing all the values and returning them all at once, a generator yields the values one at a time (using the `yield` keyword), which requires less memory and allows the caller to start processing the first few values immediately.

Because generators are evaluated lazily, they are memory efficient: each value is produced only when the caller asks for it.

The cases below will not run out of memory, but printing half a billion numbers still takes a very long time.
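To make the memory difference concrete, here is a minimal sketch that compares a list comprehension with the equivalent generator expression. The size `n` is an arbitrary value chosen so the example runs quickly, and `sys.getsizeof` reports only the size of the container object itself, so treat the numbers as a rough indication.

```python
import sys
from itertools import islice

n = 100_000  # arbitrary size, small enough to run quickly

squares_list = [i * i for i in range(n)]  # every value is computed and stored now
squares_gen = (i * i for i in range(n))   # nothing is computed yet

# The list object grows with n; the generator object stays small.
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

# Only the first five values are ever computed here.
first_five = list(islice(squares_gen, 5))
print(first_five)
```

`itertools.islice` is used because a generator cannot be indexed or sliced like a list.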

```python
def find_even_number_generator(number_stream):
    for n in number_stream:
        if n % 2 == 0:
            yield n


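# Generator function version: values are produced one at a time.
for i in find_even_number_generator(range(1, 1000000000)):
    print(i)

# Generator expression version: same behavior, written inline.
for i in (n for n in range(1, 1000000000) if n % 2 == 0):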
    print(i)
```

A generator function differs from a normal function in the following ways:

- It contains one or more `yield` statements. It may also contain `return`, which ends the generator.
- Calling it does not run the body; it returns a generator object.
- Once the function yields, it is paused and control is transferred to the caller.
- Local variables and their state are remembered between successive calls.
- When the function terminates, `StopIteration` is raised automatically on further calls to `next()`.
- Because methods like `__iter__()` and `__next__()` are implemented automatically, we can iterate through the items with a `for` loop or with `next()`.
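The points in the list above can be observed directly by driving a generator by hand with `next()`. The `countdown` function is a hypothetical example written for this sketch.

```python
def countdown(start):
    """Yield start, start - 1, ..., 1."""
    current = start      # local state survives between next() calls
    while current > 0:
        yield current    # pause here and hand the value to the caller
        current -= 1


gen = countdown(3)       # the body has not started running yet
assert iter(gen) is gen  # a generator is its own iterator

first = next(gen)        # runs the body up to the first yield
second = next(gen)       # resumes after the yield, with current preserved
third = next(gen)

try:
    next(gen)            # the function has finished, so this raises
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

A `for` loop performs the same `next()` calls and handles `StopIteration` for you.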

## Hands-on

```python
import argparse
import os
import re
import sys

import colorama


class Grep:
    def __init__(self, is_regex=False, only_matching=False, with_filename=False):
        self._is_regex = is_regex
        self._only_matching = only_matching
        self._with_filename = with_filename

    def search_in_string(self, search, search_string, return_groups=False):
        """Return the matching text (or None), optionally with the named regex groups."""
        search_result = None
        search_groups = None
        if self._is_regex:
            match = re.search(search, search_string)
            if match:
                search_groups = match.groupdict()
                # With only_matching, return just the matched part of the line.
                search_result = match.group(0) if self._only_matching else search_string
        elif search in search_string:
            search_result = search_string
        if return_groups:
            return search_result, search_groups
        return search_result

    def search_in_path(self, search, input_path):
        """Search a file, or every file below a directory, and return the matches."""
        search_results = []
        if os.path.isdir(input_path):
            print('Scanning path: {:25s}'.format(input_path))
            with os.scandir(input_path) as input_dir_contents:
                for input_dir_element in input_dir_contents:
                    search_results.extend(self.search_in_path(search, input_dir_element.path))
        else:
            # The with block closes the file; iterating it reads one line at a time.
            with open(input_path, 'r') as input_file:
                print('Opening file: {:25s}'.format(input_file.name))
                for input_line in input_file:
                    search_result = self.search_in_string(search, input_line)
                    if search_result:
                        search_results.append(
                            '{}: {}'.format(os.path.basename(input_file.name), search_result)
                            if self._with_filename else search_result)
        return search_results


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('search', type=str, help='Pattern to search for')
    parser.add_argument('input_paths', nargs='+', type=str, help='List of input file paths')
    parser.add_argument('-e', '--regex', dest='is_regexp', action='store_true', help='Use search as regexp')
    # NOTE: this flag is parsed but not used yet; search_in_path always descends into directories.
    parser.add_argument('-r', '--recursive', action='store_true', help='Search recursively in directories')
    parser.add_argument('-o', '--only-matching', dest='only_matching', action='store_true',
                        help='Show matched string only')
    parser.add_argument('-H', '--with-filename', dest='with_filename', action='store_true',
                        help='Print the file name with each match')
    args = parser.parse_args(sys.argv[1:])

    grep = Grep(args.is_regexp, args.only_matching, args.with_filename)
    for input_path in args.input_paths:
        search_results = grep.search_in_path(args.search, input_path)
        print(colorama.Style.RESET_ALL + 'Search results:')
        for search_result in search_results:
            print(colorama.Fore.GREEN + search_result)


if __name__ == '__main__':
    main()
```
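The `Grep` class above collects every match in a list before returning it. As a sketch of how the generator idea from this chapter could be applied to log analysis, the example below parses lines lazily and counts log levels. It is not the chapter's implementation: the log format, `LOG_PATTERN`, `parse_lines` and `count_levels` are all assumptions made for illustration, and only the use of named groups with `groupdict()` mirrors `search_in_string`.

```python
import re
from collections import Counter

# Hypothetical log format: "<date> <time> <LEVEL> <message>".
LOG_PATTERN = re.compile(
    r'(?P<date>\S+) (?P<time>\S+) (?P<level>[A-Z]+) (?P<message>.*)')


def parse_lines(lines):
    """Lazily yield the named groups of every line that matches LOG_PATTERN."""
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            yield match.groupdict()


def count_levels(lines):
    """Count log levels, consuming the parsed records one at a time."""
    return Counter(record['level'] for record in parse_lines(lines))


# Made-up sample lines; the third one does not match and is skipped.
sample = [
    '2024-01-01 12:00:00 INFO service started',
    '2024-01-01 12:00:05 ERROR disk full',
    'not a log line',
    '2024-01-01 12:00:09 ERROR disk full',
]
levels = count_levels(sample)
print(levels)
```

Because `parse_lines` accepts any iterable of strings, an open file object could be passed in place of `sample`, in which case the file would be read one line at a time rather than loaded whole.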


[Previous chapter - Chapter4](Chapter4.ipynb) | [Next chapter - Chapter6](Chapter6.ipynb)