# grep
---

## Introduction to Regular Expressions

```
grep [options] [regexp] [filename]
```

### Quotation Marks and Regular Expressions
### Metacharacters

Regular expression metacharacters

 Metacharacter  |         Name            |     Matches
----------------|-------------------------|----------------
**Items to match a single character** ||
.               | Dot                     | Any one character
[...]           | Character class         | Any character listed in brackets
[^...]          | Negated character class | Any character not listed in brackets
\char           | Escape character        | The character after the backslash, taken literally; used when you want to search for a "special" character, such as "$" (i.e., use "\$")
**Items that match a position** ||
^               | Caret                   | Start of a line
$               | Dollar sign             | End of a line
\<              | Backslash less-than     | Start of a word
\>              | Backslash greater-than  | End of a word
**The quantifiers** ||
?               | Question mark           | Optional (zero or one of the preceding expression); considered a quantifier
*               | Asterisk                | Any number (including zero) of the preceding expression; combined with a dot (.*) it acts as a general wildcard
+               | Plus                    | One or more of the preceding expression
{N}             | Match exactly           | Match exactly *N* times
{N,}            | Match at least          | Match at least *N* times
{min,max}       | Specified range         | Match between *min* and *max* times
**Other** ||
&#124;          | Alternation             | Matches either expression given
-               | Dash                    | Indicates a range
(...)           | Parentheses             | Used to limit scope of alternation
\1, \2, ...     | Backreference           | Matches text previously matched within parentheses (e.g., first set, second set, etc.)
\b              | Word boundary           | Matches the empty string at the edge of a word (i.e., the position between a word character and a non-word character such as a space or period)
\B              | Non-word boundary       | Matches the empty string at any position that is not the edge of a word
\w              | Word character          | Matches any "word" character (i.e., any letter, number, or the underscore character)
\W              | Non-word character      | Matches any character that isn't used in words (i.e., not a letter, number, or underscore)
\`              | Start of buffer         | Matches the start of a buffer sent to *grep*
\'              | End of buffer           | Matches the end of a buffer sent to *grep*

### POSIX Character Classes

POSIX character definitions

 POSIX definition | Contents of character definition
------------------|-------------------------------
[:alpha:]         | Any alphabetical character, regardless of case
[:digit:]         | Any numerical character
[:alnum:]         | Any alphabetical or numerical character
[:blank:]         | Space or tab characters
[:xdigit:]        | Hexadecimal characters; any digit, A-F, or a-f
[:punct:]         | Any punctuation symbol
[:print:]         | Any printable character (not control characters)
[:space:]         | Any whitespace character
[:graph:]         | Any visible character (printable characters, excluding the space)
[:upper:]         | Any uppercase letter
[:lower:]         | Any lowercase letter
[:cntrl:]         | Control characters

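A single POSIX character definition matches exactly one character, and it is only recognized inside a bracket expression, so the class name needs its own pair of brackets (`[[:digit:]]`, not `[:digit:]`). To match repetitions of a character class, either repeat the definition or apply a quantifier (written `{3}` with *grep -E*, or `\{3\}` in basic *grep*):

```
'[[:digit:]]'
'[[:digit:]][[:digit:]][[:digit:]]'
'[[:digit:]]{3}'
```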
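
As a minimal sketch of the bracket-expression form in use, the following compares the basic and extended spellings of "three digits in a row". The sample input and the variable names (`sample`, `bre`, `ere`) are made up for illustration.

```shell
# Hypothetical sample input: one line with a three-digit run, one without.
sample=$(printf 'room 101\nroom 7\n')

# Basic syntax: the class sits inside a bracket expression; braces are escaped.
bre=$(printf '%s\n' "$sample" | grep '[[:digit:]]\{3\}')

# Extended syntax: same pattern, no escaping needed for the braces.
ere=$(printf '%s\n' "$sample" | grep -E '[[:digit:]]{3}')

echo "basic:    $bre"
echo "extended: $ere"
```

Both commands should select only the line containing `101`.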

## grep Basics

There are two ways to employ *grep*.

```
$ grep regexp filename
$ cat filename | grep regexp
```

There is a case to be made for piping commands when you wish to search through content that is continually streaming. For instance, if you want to monitor a logfile in real time for specified content:

```
tail -f /var/log/messages | grep WARNING
```

## Basic Regular Expressions (grep or grep -G)

One limitation of basic *grep*: the "extended" regular expression metacharacters (?, +, {, }, |, (, and )) do not work with basic *grep* directly. The functions provided by those characters are available if you preface them with an escape.

### Match Control

```
-e pattern, --regexp=pattern
    grep -e -style doc.txt
```

Ensures that *grep* recognizes the pattern as the regular expression argument. Useful if the regular expression begins with a hyphen, which makes it look like an option.

```
-f file, --file=file
    grep -f pattern.txt searchhere.txt
```

Takes pattern from *file*. This option allows you to input all the patterns you want to match into a file, called *pattern.txt* here. Then, *grep* searches for all the patterns from *pattern.txt* in the designated file *searchhere.txt*. The patterns are additive; that is, *grep* returns every line that matches any pattern. The pattern file must list one pattern per line. If *pattern.txt* is empty, nothing will match.

```
-i, --ignore-case
    grep -i 'help' me.txt
```

Ignores capitalization in the given regular expressions, either via the command line or in a file of regular expressions specified by the **-f** option.

```
-v, --invert-match
    grep -v oranges filename
```

Returns lines that do **not** match, instead of lines that do.

```
-w, --word-regexp
    grep -w 'xyz' filename
```

Matches only when the matched text forms a whole word, that is, when it is not immediately preceded or followed by a letter, digit, or underscore. This is roughly the equivalent of putting **\b** at the beginning and end of the regular expression.
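
To illustrate the whole-word rule, here is a short sketch with made-up input in which the string appears both as a word and as part of longer words:

```shell
# Hypothetical input: "xyz" alone, inside a sentence, and embedded in longer words.
sample=$(printf 'xyz\nabc xyz def\nwxyz\nxyz1\n')

# Only lines where xyz stands as a complete word are selected.
whole=$(printf '%s\n' "$sample" | grep -w 'xyz')
echo "$whole"
```

The lines `wxyz` and `xyz1` are skipped because a word character touches the match on one side.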

```
-x, --line-regexp
    grep -x 'Hello, world!' filename
```

Like **-w**, but the pattern must match an entire line.

### General Output Control

```
-c, --count
    grep -c contact.html access.log
```

Instead of the normal output, you receive just a count of how many lines matched in each input file.

```
grep -c -v contact.html access.log
```

This example returns a count of all the lines that do *not* match the given string.
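
The two counts are complementary, which a quick sketch with a made-up three-line log can show:

```shell
tmp=$(mktemp -d)

# Hypothetical access log with three requests.
printf 'GET /index.html\nGET /contact.html\nGET /contact.html\n' > "$tmp/access.log"

hits=$(grep -c 'contact.html' "$tmp/access.log")        # lines that match
others=$(grep -c -v 'contact.html' "$tmp/access.log")   # lines that do not

echo "matching: $hits, non-matching: $others"
rm -rf "$tmp"
```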

```
--color[=WHEN], --colour[=WHEN]
    grep --color=auto regexp filename
```

If the terminal supports color, *grep* will colorize the matched text in the output. The colors are defined by the environment variable **GREP_COLORS**. **WHEN** has three options: **never**, **always**, and **auto**.

```
-l, --files-with-matches
    grep -l "ERROR:" *.log
```

Instead of normal output, prints just names of input files containing the pattern. As with **-L**, the search stops on the first match. This can make *grep* more efficient.

```
-L, --files-without-match
    grep -L 'ERROR:' *.log
```

Instead of normal output, prints just names of input files that contain no matches. This is an efficient use of *grep* because it stops searching each file once it finds any match, instead of continuing to search the entire file for multiple matches.

```
-m NUM, --max-count=NUM
    grep -m 10 'ERROR:' *.log
```

This option tells *grep* to stop reading a file after *NUM* lines are matched. This is useful for reading large files where repetition is likely, such as logfiles. If you simply want to see whether strings are present without flooding the terminal, use this option. This helps to distinguish between pervasive and intermittent errors.

```
-o, --only-matching
    grep -o pattern filename
```

Prints only the text that matches, instead of the whole line of input. This is particularly useful when using *grep* to examine a disk partition or a binary file for the presence of multiple patterns: the output contains the matched text without the surrounding content that would cause problems for the terminal.

```
-q, --quiet, --silent
    grep -q pattern filename
```

Suppresses output. The command still conveys useful information because the *grep* command's exit status (0 for success if a match is found, 1 for no match found, 2 if the program cannot run because of an error) can be checked. The option is used in scripts to determine the presence of a pattern in a file without displaying unnecessary output.
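
A sketch of how a script might branch on the three exit statuses. The log file name and contents are hypothetical, and the status variables exist only for this example.

```shell
tmp=$(mktemp -d)
printf 'all good\nERROR: disk full\n' > "$tmp/app.log"   # hypothetical log file

# Exit status 0: a match was found, so the "then" branch runs.
if grep -q 'ERROR:' "$tmp/app.log"; then found=yes; else found=no; fi

# Exit status 1: the file was read but nothing matched.
grep -q 'FATAL:' "$tmp/app.log" && status_nomatch=0 || status_nomatch=$?

# Exit status 2: an error occurred, here a missing file (-s hides the message).
grep -qs 'ERROR:' "$tmp/missing.log" && status_error=0 || status_error=$?

echo "found=$found nomatch=$status_nomatch error=$status_error"
rm -rf "$tmp"
```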

```
-s, --no-messages
    grep -s pattern filename
```

Silently discards any error messages resulting from nonexistent files or permission errors. This is helpful for scripts that search an entire filesystem without root permissions and would otherwise produce a stream of unwanted permission errors. On the other hand, it also suppresses useful diagnostic information, which could mean that problems go undiscovered.

### Output Line Prefix Control

```
-b, --byte-offset
    grep -b pattern filename
```

Displays the byte offset of each matching line in front of the line. The number displayed is the byte offset of the start of the line within the input. This is particularly useful for binary file analysis, constructing (or reverse-engineering) patches, or other tasks where line numbers are meaningless.

```
grep -b -o pattern filename
```

Adding **-o** prints the offset along with the matched text itself rather than the whole line containing it. In that case *grep* prints the byte offset of the start of the matched string instead of the start of the matching line.
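
The difference between the two offsets can be seen in a short sketch with invented two-line input:

```shell
# Hypothetical input: "alpha\n" occupies bytes 0-5, so line two starts at byte 6.
sample=$(printf 'alpha\nbeta gamma\n')

# Offset of the start of the matching line.
line_off=$(printf '%s\n' "$sample" | grep -b 'gamma')

# Offset of the match itself: "beta " adds five more bytes.
match_off=$(printf '%s\n' "$sample" | grep -b -o 'gamma')

echo "$line_off"
echo "$match_off"
```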

```
-H, --with-filename
    grep -H pattern filename
```

Includes the name of the file before each line printed, and is the default when more than one file is input to the search. This is useful when searching only one file and you want the filename to be contained in the output.

```
-h, --no-filename
    grep -h pattern *
```

The opposite of **-H**. When more than one file is involved, it suppresses printing the filename before each output line. It is the default when only one file or standard input is involved. This is useful for suppressing filenames when searching entire directories.

```
--label=LABEL
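    gzip -cd file.gz | grep --label=LABEL -H pattern
```

When the input is taken from standard input (for instance, when the output of another command is piped into *grep*), the **--label** option uses **LABEL** as the filename in place of "(standard input)". The label appears wherever a filename would be printed, so combine it with **-H** to prefix each matching line.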

```
-n, --line-number
    grep -n pattern filename
```

Includes the line number of each line displayed. This can be useful in code debugging, allowing you to go into the file and specify a particular line number to start editing.

```
-T, --initial-tab
    grep -T pattern filename
```

Inserts a tab before each matching line, putting the tab between the information generated by *grep* and the matching lines. This option is useful for clarifying the layout. For instance, it can separate line numbers, byte offsets, labels, etc., from the matching text.

```
-u, --unix-byte-offsets
    grep -u -b pattern filename
```

This option only works under the MS-DOS and Microsoft Windows platforms and needs to be invoked with **-b**. This option will compute the byte-offset as if it were running under a Unix system and strip out carriage return characters.

```
-Z, --null
    grep -Z pattern filename
```

Prints an ASCII NUL (a zero byte) after each filename instead of the character that normally follows it. This is useful when processing filenames that may contain special characters (such as spaces or carriage returns).
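
One common pairing, sketched here under the assumption that an `xargs` supporting `-0` is available, is to hand NUL-terminated filenames from *grep* to another command. The file names are invented, and the long form `--null` is used in place of **-Z**.

```shell
tmp=$(mktemp -d)

# Hypothetical files; one name contains a space.
printf 'ERROR: one\n' > "$tmp/app one.log"
printf 'ok\n' > "$tmp/app2.log"

# -l lists matching files, --null ends each name with NUL, xargs -0 splits on NUL,
# so the name with a space reaches basename as a single argument.
names=$(grep -l --null 'ERROR:' "$tmp"/*.log | xargs -0 -n 1 basename)

echo "$names"
rm -rf "$tmp"
```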

### Context Line Control

```
-A NUM, --after-context=NUM
    grep -A 3 Copyright filename
```

Offers context for matching lines by printing the *NUM* lines that follow each match. A group separator (--) is placed between each set of matches. In this case, it will print the next three lines after the matching line. This is useful when searching through source code.

```
-B NUM, --before-context=NUM
    grep -B 3 Copyright filename
```

Same concept as the **-A NUM** option, except that it prints the lines *before* the match instead of after it.

```
-C NUM, --context=NUM
    grep -C 3 Copyright filename
```

The **-C NUM** option operates as if the user entered both the **-A NUM** and **-B NUM** options. It will display *NUM* lines before and after the match.
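
A brief sketch comparing the three context options on made-up five-line input, with *NUM* set to 1 to keep the output short:

```shell
# Hypothetical input with the match on the middle line.
sample=$(printf 'one\ntwo\nCopyright notice\nfour\nfive\n')

after=$(printf '%s\n' "$sample" | grep -A 1 'Copyright')    # match + 1 line after
before=$(printf '%s\n' "$sample" | grep -B 1 'Copyright')   # 1 line before + match
around=$(printf '%s\n' "$sample" | grep -C 1 'Copyright')   # both sides

printf '%s\n' "-A:" "$after" "-B:" "$before" "-C:" "$around"
```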

### File and Directory Selection

```
-a, --text
    grep -a pattern filename
```

Equivalent to the **--binary-files=text** option, allowing a binary file to be processed as if it were a text file.

```
--binary-files=TYPE
    grep --binary-files=TYPE pattern filename
```

*TYPE* can be either **binary**, **without-match**, or **text**. When *grep* first examines a file, it determines whether the file is a "binary" file (a file primarily composed of non-human-readable text) and changes its output accordingly. By default, a match in a binary file causes *grep* to display simply the message "Binary file *somefile.bin* matches." The default behavior can also be specified with the **--binary-files=binary** option.

When **TYPE** is **without-match**, *grep* does not search the binary file and proceeds as if it had no matches (equivalent to the **-I** option). When **TYPE** is **text**, the binary file is processed like text (equivalent to the **-a** option). Sometimes **--binary-files=text** outputs binary garbage, and the terminal may interpret some of that garbage as commands, which in turn can render the terminal unreadable until reset. To recover from this, use the commands *tput init* and *tput reset*.

```
-D ACTION, --devices=ACTION
    grep -D read 123-45-6789 /dev/hda1
```

If the input file is a special file, such as a FIFO or a socket, this flag tells *grep* how to proceed. By default, *grep* processes these files as if they were normal files on the system. When **ACTION** is set to **read**, *grep* reads through the device as if it were a normal file; when it is set to **skip**, *grep* silently ignores such files. The example searches an entire disk partition for the fake Social Security number shown.

```
-d ACTION, --directories=ACTION
    grep -d ACTION pattern path
```

This flag tells *grep* how to process directories submitted as input files. When **ACTION** is **read**, *grep* reads the directory as if it were a file. **recurse** searches the files within that directory (same as the **-R** option), and **skip** skips the directory without searching it.

```
--exclude=GLOB
    grep --exclude=GLOB pattern path
```

Refines the list of input files by telling *grep* to ignore files whose names match the specified glob. *GLOB* can be an entire filename or can contain the typical "file-globbing" wildcards the shell uses when matching files (*, ? and []). For instance, **--exclude='*.exe'** will skip all files ending in *.exe* (the quotes keep the shell from expanding the wildcard first).

```
--exclude-from=FILE
    grep --exclude-from=FILE pattern path
```

Similar to the **--exclude** option, except that it takes a list of patterns from a specified filename, which lists each pattern on a separate line. *grep* will ignore all files that match any lines in the list of patterns given.

```
--exclude-dir=DIR
    grep --exclude-dir=DIR pattern path
```

Any directories in the path matching the pattern *DIR* will be excluded from recursive searches. When searching recursively, *grep* compares *DIR* against the name of each subdirectory it encounters, so give the directory's own name (for example, **--exclude-dir=.git**) rather than a longer path. This option must be used with the **-r** option or the **-d recurse** option in order to be relevant.

```
-I
    grep -I pattern filename
```

Same as the **--binary-files=without-match** option. When *grep* finds a binary file, it will assume there is no match in the file.

```
--include=GLOB
    grep -r --include='*.log' pattern path
```

Limits searches to input files whose names match the given glob. This option is particularly useful when searching directories using the **-R** option. Files not matching the given glob will be ignored. The glob can be an entire filename or can contain the typical "file-globbing" wildcards the shell uses when matching files (*, ? and []).

```
-R, -r, --recursive
    grep -R pattern path
    grep -r pattern path
```

Searches all files underneath each directory submitted as an input file to *grep*.
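
The selection options combine naturally with a recursive search. The sketch below builds a small, made-up directory tree and assumes GNU *grep* behavior for **--include** and **--exclude-dir**:

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/logs" "$tmp/cache"

# Hypothetical tree: only logs/app.log should survive both filters.
printf 'ERROR: a\n' > "$tmp/logs/app.log"
printf 'ERROR: b\n' > "$tmp/logs/app.txt"    # wrong extension
printf 'ERROR: c\n' > "$tmp/cache/old.log"   # excluded directory

# Recurse from the tree root, keep *.log files, skip any directory named cache.
found=$(cd "$tmp" && grep -r -l --include='*.log' --exclude-dir=cache 'ERROR:' .)

echo "$found"
rm -rf "$tmp"
```

Quoting the glob keeps the shell from expanding it before *grep* sees it.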

## Extended Regular Expressions (egrep or grep -E)

*grep -E* and *egrep* are the same command (recent GNU *grep* releases print a warning that *egrep* is obsolescent and recommend *grep -E*). The commands search files for patterns that are interpreted as extended regular expressions. As far as command-line options go, *grep -E* and *grep* take the same ones; the only differences are in how they process the search pattern:

```
?
```

`?` carries the meaning of *optional*: the character preceding the question mark may or may not appear in the target string. For example, say you are looking for the word "behavior", which can also be written as "behaviour". Instead of using alternation (|), you can use the command:

```
egrep 'behaviou?r' filename
```

```
+
```

`+` applies to the preceding character and matches one or more repetitions of it. For instance, the following command would match both "pattern1" and "pattern1111", but would not match "pattern":

```
egrep 'pattern1+' filename
```

```
{n,m}
```

The braces specify how many times the preceding character or group must be repeated for a match to occur. In the examples below, the count applies only to the final *n* of *pattern*.

```
egrep 'pattern{4}' filename
egrep 'pattern{4,}' filename
egrep 'pattern{,4}' filename    # "at most 4" is a GNU extension; not portable to other grep implementations
egrep 'pattern{4,6}' filename
```

```
|
```

Used in a regular expression, this character signifies "or". As a result, the pipe (|) allows you to combine several patterns into one expression.

```
egrep 'name1|name2' filename
```

```
()
```

Parentheses can be used to "group" particular strings of text for the purposes of backreferences, alternation, or simply readability. Additionally, the use of parentheses can help resolve any ambiguity in precisely what the user wants the search pattern to do. Patterns placed inside parentheses are often called subpatterns.

Parentheses also limit the reach of the pipe (|). This allows the user to define more tightly which strings are in scope of the "or" operation.

```
egrep 'patt(a|e)rn' filename
```

Without the parentheses, the search pattern would be **patta|ern**, which would match if the string "patta" or "ern" is found, a very different outcome than the intention.
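
The effect of the parentheses can be checked with a short sketch. The input words are invented so that some contain only "patta" or only "ern":

```shell
# Hypothetical input: two intended targets plus two lines that only
# contain one side of the ungrouped alternation.
sample=$(printf 'pattarn\npattern\nmodern\npatta\n')

# Grouped: the alternation covers just the single letter.
grouped=$(printf '%s\n' "$sample" | grep -E 'patt(a|e)rn')

# Ungrouped: the alternation splits the whole pattern into "patta" or "ern".
ungrouped=$(printf '%s\n' "$sample" | grep -E 'patta|ern')

printf 'grouped:\n%s\nungrouped:\n%s\n' "$grouped" "$ungrouped"
```

The ungrouped form also selects `modern` and `patta`, which is rarely what was intended.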

In basic regular expressions, the backslash (\) negates the metacharacter's behavior and forces the search to match the character in a literal sense. The same happens in *egrep*, but there is an exception. The metacharacter *{* is not supported by the traditional *egrep*. Although some versions interpret \{ literally, it should be avoided in *egrep* patterns. Instead, [{] should be used to match the character without invoking the special meaning.

It is not precisely true that basic *grep* does not have these metacharacters as well. It does, but they cannot be used directly. Each of the special metacharacters in extended regular expressions needs to be prefaced by an escape to draw out its special meaning. Note that this is the reverse of normal escaping behavior, which usually strips special meaning.

Basic versus extended regular expressions comparison


 Basic regular expressions | Extended regular expressions
---------------------------|-------------------------------
'\(red\)'                  | '(red)'
'a\{1,3\}'                 | 'a{1,3}'
'behaviou\?r'              | 'behaviou?r'
'pattern\+'                | 'pattern+'
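
One row of the comparison can be verified with a short sketch. It assumes GNU *grep*, where `\?` is accepted in basic mode, and uses made-up input:

```shell
# Hypothetical input: two valid spellings and one misspelling.
sample=$(printf 'behavior\nbehaviour\nbehaviouur\n')

# Basic mode: the question mark must be escaped to act as a quantifier.
bre=$(printf '%s\n' "$sample" | grep 'behaviou\?r')

# Extended mode: the bare question mark is the quantifier.
ere=$(printf '%s\n' "$sample" | grep -E 'behaviou?r')

# Basic mode without the escape: ? is an ordinary character, so nothing matches.
literal=$(printf '%s\n' "$sample" | grep -c 'behaviou?r' || true)

printf 'basic:\n%s\nextended:\n%s\nunescaped basic count: %s\n' "$bre" "$ere" "$literal"
```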

## Fixed Strings (fgrep or grep -F)

*fgrep* is known as fixed string or fast *grep*. It is known as "fast grep" because of the great performance it has compared to *grep* and *egrep*. It accomplishes this by dropping regular expressions altogether and looking for a defined string pattern. It is useful for searching for specific static content in a precise manner.

The command to invoke *fgrep* is:

```
fgrep string_pattern filename
```

By design, *fgrep* was intended to operate fast and free of intensive functions; as a result, it can take a more limited set of command-line options.

```
-b
    fgrep -b string_pattern filename
```

Shows the byte offset at which the line containing the *string_pattern* begins. Because entire lines are printed by default, the number displayed is the byte offset of the start of the line.

```
-c
    fgrep -c string_pattern filename
```

This counts the number of lines that contain one or more instances of the *string_pattern*.

```
-e string
    fgrep -e string_pattern filename
```

Used to search for more than one pattern or when the *string_pattern* begins with a hyphen. Though you can use a newline character to specify more than one string, you could instead use multiple **-e** options, which is useful in scripting:

```
fgrep -e string_pattern1 \
      -e string_pattern2 filename
```

```
-f file
    fgrep -f pattern_file filename
```

Reads the strings to search for from *file*, one per line, and prints every line of *filename* that contains any of them. This is the same behavior as the **-f** option in *grep*; it does not redirect output, which is done with the shell (for example, `> newfile`).

```
-h
    fgrep -h string_pattern filename
```

When the search is done in more than one file, using -h stops *fgrep* from displaying *filenames* before the matched output.

```
-i
    fgrep -i string_pattern filename
```

The -i option tells *fgrep* to ignore capitalization contained in the *string_pattern* when matching the pattern.

```
-l
    fgrep -l string_pattern filename
```

Displays the files containing the *string_pattern* but not the matching lines themselves.

```
-n
    fgrep -n string_pattern filename
```

Prints out the line number before the line that matches the given *string_pattern*.

```
-v
    fgrep -v string_pattern filename
```

Matches any lines that do not contain the given *string_pattern*.

```
-x
    fgrep -x string_pattern filename
```

Prints only the lines that consist of the *string_pattern* in their entirety, with nothing before or after it. This is not the default: without **-x**, *fgrep* selects any line that contains the string anywhere.

## Perl-Style Regular Expressions (grep -P)
## Introduction to grep-Relevant Environment Variables
## Choosing Between grep Types and Performance Considerations

### When to Use grep -E

Although almost everything that can be done in *grep -E* can also be done in *grep -G*, the former has the advantage of accomplishing the task in fewer characters, without the counterintuitive escaping discussed earlier. All of the extra functionality in extended regular expressions has to do with quantifiers or subpatterns. Additionally, if any significant use of backreferences is needed, extended regular expressions are ideal.

### When to Use grep -F

There is one prerequisite to using *grep -F*, and if a user cannot meet that requirement, *grep -F* is simply not an option. Namely, the search pattern cannot rely on any metacharacters, escapes, wildcards, or alternations, because *grep -F* treats every character in the pattern literally. Its performance is faster, but at the expense of functionality.
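
A minimal sketch of the literal matching, using invented input in which the dot and plus sign would otherwise act as (or be escaped as) metacharacters:

```shell
# Hypothetical input containing characters that are special in regular expressions.
sample=$(printf 'a.b\naxb\n1+1=2\n')

# As a regular expression, the dot matches any character.
regex=$(printf '%s\n' "$sample" | grep 'a.b')

# With -F, the dot is just a dot.
fixed=$(printf '%s\n' "$sample" | grep -F 'a.b')

# With -F, the plus sign needs no escaping either.
plus=$(printf '%s\n' "$sample" | grep -F '1+1')

printf 'regex:\n%s\nfixed:\n%s\nplus:\n%s\n' "$regex" "$fixed" "$plus"
```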

That said, *grep -F* is extremely useful for quickly searching large amounts of data for tightly defined strings, making it the ideal tool to search through immense logfiles quickly. In fact, it is fairly easy to develop a robust "log watching" script with *grep -F* and a good text file listing the important words or phrases that should be pulled out of logfiles for analysis.

Another good use for *grep -F* is searching through mail logs and mail folders to ensure delivery of emails to users, especially on systems with many mail accounts. This is made possible by assigning every email message a unique Message ID. For instance:

```
grep -FHr Message-ID /var/mail
```

This command will search for the fixed string *Message-ID* in all files inside */var/mail* (recursing into any subdirectories), and then display each match along with the filename.

### When to Use grep -P

## Advanced Tips and Tricks with grep

### Backreferences
### Binary File Searching
### Useful Recipes
#### IP addresses

```
$ grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' patterns
```

A more complicated formula to ensure that false positives are not registered looks like:

```
$ grep -E '\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b' patterns
```
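
To see the two recipes disagree, here is a sketch that runs both against made-up addresses. The `octet` variable is a helper introduced only to keep the stricter pattern readable, and `\b` is assumed to be supported as in GNU *grep*.

```shell
# Helper (hypothetical name): one octet in the range 0-255.
octet='(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)'

# Hypothetical input: one valid address and two with an out-of-range octet.
sample=$(printf '192.168.0.1\n999.1.1.1\n10.0.0.256\n')

# Loose recipe: any four groups of one to three digits, so all three lines count.
loose=$(printf '%s\n' "$sample" | grep -cE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b')

# Strict recipe: each octet must be 0-255, so only the first line is selected.
strict=$(printf '%s\n' "$sample" | grep -E "\b(${octet}\.){3}${octet}\b")

echo "loose count: $loose"
echo "strict match: $strict"
```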

#### MAC addresses

```
$ grep -Ei '\b[0-9a-f]{2}(:[0-9a-f]{2}){5}\b' patterns
```

In this case, the additional -i option is added so no regard is given to capitalization. [[:xdigit:]] could be used in place of [0-9a-f] if better readability is desired.

#### Email addresses

```
$ grep -Ei '\b[a-z0-9._-]+@[a-z0-9.-]+\.(com|net|org|uk|mil|gov|edu)\b' patterns
```

The list is only a partial subset of the top-level domains that are currently approved for use. For instance, one may wish to search for only U.S.-based addresses, so the *.uk* result may not make much sense. This pattern is basically a starting point for customization.

#### U.S.-based phone numbers

```
$ grep -E '\b(\(|)[0-9]{3}(\)|-|\)-|\) |)[0-9]{3}(-|)[0-9]{4}\b' patterns

(312)-555-1212
(312) 555-1212
312-555-1212
3125551212
```

#### Social Security numbers

```
$ grep -E '\b[0-9]{3}( |-|)[0-9]{2}( |-|)[0-9]{4}\b' patterns

333333333
333 33 3333
333-33-3333
```

#### Credit card numbers
#### Copyright-protected or confidential material
#### Searching through large numbers of files
#### Matching strings across multiple lines

## Tips

### Find connection leak issue

```
find . -type f -name \*.cs -exec grep -n -B 10 "Connection.Open()" {} + > ~/result.txt
find . -type f -name \*.cs -exec grep -n -C 3 SMiPatchManager {} + | grep SMObject > ~/result.txt
```

Delete matching lines in Vim:

```
:g/Contains/d