<img src="https://user-images.strikinglycdn.com/res/hrscywv4p/image/upload/c_limit,fl_lossy,h_300,w_300,f_auto,q_auto/1266110/Logo_wzxi0f.png" style="float: left; margin: 20px; height: 55px">

**'It takes something more than intelligence to act intelligently.' - [Fyodor Dostoyevsky](https://en.wikipedia.org/wiki/Fyodor_Dostoevsky)**

***LEARNING OBJECTIVES***
- Get an introduction to `regular expressions`
- Learn how to use `lambda functions`
- Deal with missing values
- Learn how to use `list comprehensions`

# Regular Expressions

## Introduction

Regular expressions are used to check whether a pattern exists in a given sequence of characters (a string). They help in manipulating textual data, which is often a prerequisite for data science projects that involve text mining. You have probably come across applications of regular expressions already: they are used on the server side to validate the format of email addresses or passwords during registration, and to parse text files in order to find, replace, or delete certain strings.

In Python, regular expressions are supported by the `re` module. To start using them in your Python scripts, import this module with `import`:

In [None]:
import re

## Basic Patterns: Ordinary Characters

You can easily tackle many basic patterns in Python using the ordinary characters. Ordinary characters are the simplest regular expressions. They match themselves exactly and do not have a special meaning in their regular expression syntax.

Ordinary characters can be used to perform simple exact matches:

In [None]:
pattern = r"Cookie"
sequence = "Cookie"

if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")


In [None]:
pattern = r"Cokie"
sequence = "Cookie"

if re.match(pattern, sequence):
    print("Match!")
else:
    print("Not a match!")


The `match()` function returns a match object if the pattern matches at the beginning of the text. Otherwise it returns `None`. The `re` module also contains several other functions, and you will learn some of them later in the tutorial.

For now, though, let's focus on ordinary characters! Do you notice the `r` at the start of the pattern `Cookie`?

This is called a raw string literal. It changes how the string literal is interpreted. Such literals are stored as they appear.

For example, `\` is just a backslash when the literal is prefixed with an `r`, rather than the start of an escape sequence. You will see what this means with special characters. Regular expression syntax often involves backslash-escaped characters, and the raw `r` prefix prevents Python from interpreting them as escape sequences. You don't actually need it for this example, but it is good practice to use it for consistency.
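To make the effect of the `r` prefix concrete, here is a minimal sketch comparing a normal and a raw string literal. The variable names and the sample sentence are illustrative only, and `\b` (word boundary) is used simply because it is a case where the two literals behave differently.

```python
import re

# In a normal string literal, "\n" is a single newline character.
normal = "\n"
# In a raw string literal, r"\n" is two characters: a backslash and the letter n.
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# In a normal literal, "\b" becomes a backspace character, so the
# word-boundary meaning is lost before the pattern reaches the re module.
without_raw = re.search("\bCookie\b", "A Cookie jar")
with_raw = re.search(r"\bCookie\b", "A Cookie jar")

print(without_raw)        # None
print(with_raw.group())   # Cookie
```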

## Special Characters

Special characters do not match themselves literally; they have a special meaning when used in a regular expression.

The most widely used special characters are:
- `.` (a period): matches any single character except a newline.

In [None]:
re.search(r'Co.k.e', 'Cookie').group()

- `\w` (lowercase w): matches any single letter, digit, or underscore.

In [None]:
re.search(r'Co\wk\we', 'Cookie').group()

- `\W` (uppercase W): matches any single character that `\w` does not match.

In [None]:
re.search(r'C\Wke', 'C@ke').group()

- `\s` (lowercase s): matches a single whitespace character such as a space, newline, tab, or carriage return.

In [None]:
re.search(r'Eat\scake', 'Eat cake').group()

- `\S` (uppercase S): matches any single character that `\s` does not match.

In [None]:
re.search(r'Cook\Se', 'Cookie').group()

- `[a-zA-Z0-9]`: matches any single character in the ranges a to z, A to Z, or 0 to 9. Characters outside a range can be matched by complementing the set: if the first character of the set is `^`, any character that is not in the set is matched.

In [None]:
re.search(r'Number: [0-6]', 'Number: 5').group()

In [None]:
# [^5] matches any single character except 5, so the result is 'Number: 0'
re.search(r'Number: [^5]', 'Number: 03').group()
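The list above does not mention `\d`, which this notebook uses later for phone numbers. As a small supplementary sketch (the sample strings are made up for illustration), `\d` and `\D` follow the same lowercase/uppercase convention as `\w` and `\s`:

```python
import re

# \d matches a single decimal digit.
room = re.search(r'Room \d\d\d', 'Room 204').group()
print(room)          # Room 204

# \D matches any single character that is not a digit.
letters = re.search(r'\D\D\D', '12abc34').group()
print(letters)       # abc

# For this input, the negated set [^0-9] gives the same result as \D.
negated = re.search(r'[^0-9]+', '12abc34').group()
print(negated)       # abc
```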

## Repetitions

Writing out long patterns character by character quickly becomes tedious. Fortunately, the `re` module handles repetitions using the following special characters:

- `+`: matches one or more repetitions of the character or group to its left.

In [None]:
re.search(r'Co+kie', 'Cooookie').group()

- `*`: matches zero or more repetitions of the character or group to its left.

In [None]:
# Matches 'C', then zero or more 'a', then zero or more 'o', then 'kie'
re.search(r'Ca*o*kie', 'Caokie').group()

- `?`: matches zero or one occurrence of the character or group to its left.

In [None]:
# The 'u' is optional, so both 'Color' and 'Colour' match
re.search(r'Colou?r', 'Color').group()

But what if you want to check for an exact number of repetitions?

For example, checking the validity of a phone number in an application. The `re` module handles this gracefully as well, using the following quantifiers:

- `{x}`: repeat exactly x times.

- `{x,}`: repeat at least x times.

- `{x,y}`: repeat at least x times but no more than y times. Do not put a space after the comma: Python's `re` module treats `{x, y}` as literal text rather than as a quantifier.

In [None]:
re.search(r'\d{9,10}', '0987654321').group()
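The three quantifier forms above can be sketched side by side. The sample digit strings are made up, and `re.fullmatch()` is a standard-library function not otherwise covered in this notebook; it is shown here only because validating a phone number usually means checking the whole string, not just finding a run of digits inside it.

```python
import re

# {3}: exactly three digits
exactly = re.search(r'\d{3}', '12345').group()
print(exactly)     # 123

# {2,}: two or more digits (greedy, so it takes as many as it can)
at_least = re.search(r'\d{2,}', '12345').group()
print(at_least)    # 12345

# {2,4}: between two and four digits (no space after the comma)
between = re.search(r'\d{2,4}', '12345').group()
print(between)     # 1234

# fullmatch() succeeds only if the entire string fits the pattern.
ten_digits = re.fullmatch(r'\d{9,10}', '0987654321') is not None
eleven_digits = re.fullmatch(r'\d{9,10}', '09876543210') is not None
print(ten_digits)      # True
print(eleven_digits)   # False
```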

## re Python Library

The `re` library in Python provides several functions that make it a skill worth mastering. You have already seen some of them, such as `re.search()` and `re.match()`. Let's look at some useful functions in detail:

- `search(pattern, string, flags=0)`

With this function, you scan through the given string/sequence looking for the first location where the regular expression produces a match. It returns a corresponding match object if found, else returns None if no position in the string matches the pattern. Note that None is different from finding a zero-length match at some point in the string.

In [None]:
pattern = "cookie"
sequence = "Cake and cookie"

re.search(pattern, sequence).group()
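Because `search()` returns `None` when nothing matches, calling `.group()` directly on the result fails in that case. The following sketch (with a made-up pattern `brownie`) shows one way to guard against that, and also illustrates the difference between `None` and a zero-length match mentioned above.

```python
import re

sequence = "Cake and cookie"

# No match: re.search() returns None, so check before calling .group().
match = re.search(r"brownie", sequence)
if match is None:
    print("No match")
else:
    print(match.group())

# A zero-length match is still a match object, not None.
empty = re.search(r"x*", sequence)
print(repr(empty.group()))  # ''
print(empty.span())         # (0, 0)
```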

- `match(pattern, string, flags=0)`

Returns a corresponding match object if zero or more characters at the beginning of string match the pattern. Else it returns None, if the string does not match the given pattern.

In [None]:
pattern = "C"
sequence1 = "IceCream"

# No match since "C" is not at the start of "IceCream"
re.match(pattern, sequence1)


In [None]:
sequence2 = "Cake"

re.match(pattern,sequence2).group()

### search() versus match()

The `match()` function checks for a match only at the beginning of the string (by default) whereas the `search()` function checks for a match anywhere in the string.
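A short side-by-side sketch of the difference, reusing the `IceCream` string from the earlier cell (the variable names are illustrative):

```python
import re

text = "IceCream"

# match() only looks at the start of the string.
at_start = re.match(r"Cream", text)
print(at_start)              # None

# search() scans the whole string.
anywhere = re.search(r"Cream", text)
print(anywhere.group())      # Cream
print(anywhere.start())      # 3

# Anchoring the pattern with ^ makes search() behave like match() here.
anchored = re.search(r"^Cream", text)
print(anchored)              # None
```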

- `findall(pattern, string, flags=0)`

Finds all non-overlapping matches in the entire sequence and returns them as a list of strings. Each returned string represents one match.

In [None]:
email_address = "Please contact us at: support@datacamp.com, xyz@datacamp.com"

# 'addresses' is a list that stores all the matches
addresses = re.findall(r'[\w\.-]+@[\w\.-]+', email_address)
for address in addresses: 
    print(address)
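One detail worth knowing about `findall()`: if the pattern contains a capturing group, the returned list holds the captured text rather than the full match. The sketch below reuses the email string from the cell above; the group placement is just one illustrative choice.

```python
import re

email_address = "Please contact us at: support@datacamp.com, xyz@datacamp.com"

# No capturing group: each item is the full match.
full = re.findall(r'[\w\.-]+@[\w\.-]+', email_address)
print(full)      # ['support@datacamp.com', 'xyz@datacamp.com']

# One capturing group: each item is only the text captured by the group.
users = re.findall(r'([\w\.-]+)@[\w\.-]+', email_address)
print(users)     # ['support', 'xyz']
```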

If you want to know more about regular expressions, visit https://www.datacamp.com/community/tutorials/python-regular-expression-tutorial

#  Lambda Functions

Lambda is a tool for building functions. We already know how to build functions
using `def`, but let's do a quick comparison of the two.


Here's building a function using `def`:
```Python
import math

def square_root(x): return math.sqrt(x)
```

Here's building the same function using `lambda`:
```Python
square_root_lambda = lambda x: math.sqrt(x)
```
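As a runnable sketch of the comparison above, both definitions can be called in exactly the same way. The only visible difference in this example is the name Python records for each function object.

```python
import math

def square_root(x):
    return math.sqrt(x)

square_root_lambda = lambda x: math.sqrt(x)

print(square_root(16))          # 4.0
print(square_root_lambda(16))   # 4.0

# The def version carries its own name; a lambda is always named "<lambda>".
print(square_root.__name__)          # square_root
print(square_root_lambda.__name__)   # <lambda>
```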

We know that functions are usually created to reduce code duplication or to modularize code. But suppose you need to create a function that is going to be used only once, called from only one place in your application. What would you do then?

Lambdas are most useful when you want to define a **one-off function**: a function that will be used only once in your program. Such functions are called anonymous functions.

As you will see later, there are many situations where anonymous functions can be useful.

First of all, you *don't need to give the function a name*. It can be "anonymous", and you can define it right in the place where you want to use it. That's where lambda is useful.

Some things to remember about lambda:
- it can only take a single expression
- it does not contain a return statement
- it is a tool for creating anonymous procedures

More information on [Lambda](https://pythonconquerstheuniverse.wordpress.com/2011/08/29/lambda_tutorial/).

Lambda functions can accept zero or more arguments but only one expression. The return value of the lambda function is the value that this expression is evaluated to.

For example, if we want to define a function `f` that squares its argument using lambda syntax, this is what it looks like:

In [None]:
f = lambda x: x * x
type(f)
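The single-expression rule is less restrictive than it may sound, because a conditional expression still counts as one expression. A minimal sketch, with hypothetical names `sign` and `add_one`:

```python
# A conditional expression is one expression, so it fits in a lambda.
sign = lambda n: "negative" if n < 0 else "non-negative"

print(sign(-3))   # negative
print(sign(7))    # non-negative

# The value of the expression is returned implicitly; no `return` is written.
add_one = lambda n: n + 1
print(add_one(41))  # 42
```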

## Lambdas with multiple arguments

As you saw earlier, it was easy to define a lambda function with one argument.

In [None]:
f = lambda x: x * x
f(5)

But if you want to define a lambda function that accepts more than one argument, you can separate the input arguments by commas.

For example, say we want to define a lambda that takes two integer arguments and returns their product.

In [None]:
f = lambda x, y: x * y
f(5, 2)

## Lambdas with no arguments

Say you want to define a lambda function that takes no arguments and returns True.

You can achieve this with the following code.

In [None]:
f = lambda: True
f()

## Multiline lambdas

Yes, at some point in your life you will be wondering if you can have a lambda function with multiple lines.

And the answer is:

No, you can't 🙂

Python lambda functions accept one and only one expression.

If your function has multiple expressions/statements, you are better off defining a function the traditional way instead of using lambdas.
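As an illustration, the hypothetical function below needs an intermediate assignment before it returns, so it is written with `def` rather than as a lambda:

```python
def describe(n):
    # Two statements: an assignment and a return. A lambda cannot hold both.
    parity = "even" if n % 2 == 0 else "odd"
    return f"{n} is {parity}"

print(describe(4))   # 4 is even
print(describe(7))   # 7 is odd
```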

## Using lambdas with map

One common task with Python lists is applying an operation to each item.

`map` is a Python built-in function that takes a function and an iterable as arguments and calls the function on each item of the iterable. In Python 3 it returns an iterator, which is why the example below wraps it in `list()`.

For example, assume we have a list of integers and we want to square each element of the list using the map function.

In [None]:
L = [1, 2, 3, 4]
list(map(lambda x: x**2, L))

See, instead of defining a function and then passing it to map as an argument, you can just use lambdas to quickly define a function inside the map parentheses.

This makes sense especially if you are not going to use this function again in your code.
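For comparison, here is a sketch of the same operation written both ways. The named function `square` is a hypothetical helper introduced only for this comparison.

```python
def square(x):
    return x ** 2

L = [1, 2, 3, 4]

# Passing a named function to map().
squares_named = list(map(square, L))
# Passing an inline lambda to map().
squares_lambda = list(map(lambda x: x ** 2, L))

print(squares_named)                     # [1, 4, 9, 16]
print(squares_named == squares_lambda)   # True
```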

## Using lambdas with filter

As the name suggests, filter is another built-in function that actually filters a sequence or any iterable object.

In other words, given any iterable object (like a list), the filter function filters out some of the elements while keeping some based on some criteria.

This criterion is defined by the caller of `filter`, who passes in a function as an argument.

This function is applied to each element of the iterable.

If the return value is True, the element is kept. Otherwise, the element is disregarded.

For example, let's define a very simple function that returns True for even numbers and False for odd numbers:

In [None]:
def even_fn(x):
    if x % 2 == 0:
        return True
    return False

print(list(filter(even_fn, [1, 3, 2, 5, 20, 21])))

With lambdas you can do the same thing more succinctly.

The above code turns into this one-liner:

In [None]:
print(list(filter(lambda x: x % 2 == 0, [1, 3, 2, 5, 20, 21])))
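Like `map`, `filter` returns a lazy iterator in Python 3, which is why the cells above wrap it in `list()`. A small sketch of what that means in practice, reusing the same numbers:

```python
numbers = [1, 3, 2, 5, 20, 21]

evens = filter(lambda x: x % 2 == 0, numbers)

# Nothing is computed until the iterator is consumed.
first_pass = list(evens)
print(first_pass)    # [2, 20]

# The iterator is now exhausted, so a second pass yields nothing.
second_pass = list(evens)
print(second_pass)   # []
```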

## Missing data

Missing data often shows up as `NaN` values. By "missing" we simply mean null or "not present for whatever reason". Many data sets simply arrive with missing data, either because it exists and was not collected or because it never existed.

Use the `isnull()` method to detect missing values. The output shows `True` where the value is missing. By using that boolean result as an index into the dataset, you obtain just the entries that are missing.

In [None]:
import pandas as pd
import numpy as np

# np.nan is used instead of np.NaN, which was removed in NumPy 2.0
s = pd.Series([1, 2, 3, np.nan, 5, 6, None])
print(s.isnull())

In [None]:
print(s[s.isnull()])

A dataset could represent missing data in several ways. In this example, you see missing data represented as `np.nan`, NumPy's "Not a Number" value, and as the Python `None` value.
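A quick way to check how much data is missing is to sum the boolean mask. The sketch below assumes `pandas` and `numpy` are installed, as in the cells above, and uses the same example series.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, 6, None])

# Both np.nan and None are reported as missing.
n_missing = int(s.isnull().sum())
print(n_missing)     # 2

# None was converted to NaN because the Series holds floats.
print(s.dtype)       # float64

# notnull() is the inverse mask: it keeps only the present values.
present = s[s.notnull()].tolist()
print(present)       # [1.0, 2.0, 3.0, 5.0, 6.0]
```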

### Fill in missing data

To fill in missing data, use `fillna()` and pass it the replacement value. Usually the mean, median, or mode is used. Let's use the same data set and this time fill in the missing values with the mean, truncated to an integer with `int()`.


In [None]:
s = pd.Series([1, 2, 3, np.nan, 5, 6, None])
print(s.fillna(int(s.mean())))
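The cell above truncates the mean to an integer. As an alternative sketch under the same assumptions (same series, `pandas` and `numpy` installed), the untruncated mean or the median can be passed directly. Note that `fillna()` returns a new Series and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, np.nan, 5, 6, None])

# mean() and median() skip missing values by default.
mean_value = s.mean()
median_value = s.median()
print(round(mean_value, 2))   # 3.4
print(median_value)           # 3.0

# Fill with the median; `s` itself still contains its two missing values.
filled = s.fillna(median_value)
print(filled.tolist())            # [1.0, 2.0, 3.0, 3.0, 5.0, 6.0, 3.0]
print(int(s.isnull().sum()))      # 2
```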

We could also just drop all the missing values by using `dropna()`:

In [None]:
s = pd.Series([1, 2, 3, np.nan, 5, 6, None])
print(s.dropna())

Note: Here is some information on dealing with [missing data](http://pandas.pydata.org/pandas-docs/stable/missing_data.html).

#  List Comprehensions

Sometimes a programming design pattern becomes common enough to warrant its own special syntax. Python's list comprehensions are a prime example of such syntactic sugar.

List comprehensions in Python are great, but mastering them can be tricky because they don't solve a new problem: they just provide a new syntax to solve an existing problem.

List comprehensions are a tool for transforming one list (any iterable, actually) into another list. During this transformation, elements can be conditionally included in the new list and each element can be transformed as needed.

If you're familiar with functional programming, you can think of list comprehensions as syntactic sugar for a `filter` followed by a `map`:



In [1]:
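numbers = [1, 2, 3, 4, 5]

# filter + map version; list() is needed because map() returns an iterator in Python 3
doubled_odds = list(map(lambda n: n * 2, filter(lambda n: n % 2 == 1, numbers)))
# Equivalent list comprehension
doubled_odds = [n * 2 for n in numbers if n % 2 == 1]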

## From loops to comprehensions

Every list comprehension can be rewritten as a for loop but not every for loop can be rewritten as a list comprehension.

The key to understanding when to use list comprehensions is to practice identifying problems that smell like list comprehensions.

If you can rewrite your code to look just like this for loop, you can also rewrite it as a list comprehension:

In [None]:
# Iterating through a string Using for Loop

h_letters = []

for letter in 'human':
    h_letters.append(letter)

print(h_letters)

In [None]:
#  Iterating through a string Using List Comprehension
h_letters = [letter for letter in 'human']
print(h_letters)
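To illustrate the earlier claim that not every for loop can be rewritten as a list comprehension, the sketch below contrasts a loop that fits the comprehension shape with one that does not. The data and variable names are made up for the example.

```python
# A loop that fits the comprehension shape: build one list from one iterable.
squares = []
for n in range(5):
    squares.append(n * n)

squares_comp = [n * n for n in range(5)]
print(squares == squares_comp)   # True

# A loop that does not fit: it stops early with `break`,
# which has no counterpart inside a list comprehension.
values = [3, 8, 1, -4, 9]
before_negative = []
for v in values:
    if v < 0:
        break
    before_negative.append(v)

print(before_negative)   # [3, 8, 1]
```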

## List Comprehensions vs Lambda functions

List comprehensions aren't the only way to work on lists. Various built-in functions and lambda functions can create and modify lists in fewer lines of code.

In [None]:
letters = list(map(lambda x: x, 'human'))
print(letters)

However, list comprehensions are usually more human readable than lambda functions. It is easier to understand what the programmer was trying to accomplish when list comprehensions are used.

## Conditionals in List Comprehension

List comprehensions can use conditional statements to filter or modify an existing list (or another iterable, such as a tuple). We will create lists using mathematical operators, integers, and `range()`.

In [None]:
# Using if with for loops

number_list = []

for x in range(20):
    if x % 2 == 0:
        number_list.append(x)

print (number_list)

In [None]:
# Using if with List Comprehension

number_list = [ x for x in range(20) if x % 2 == 0]
print(number_list)

### Nested `if` and `if...else`

In [None]:
# Nested IF with for loops

number_list = []

for y in range(100):
    if y % 2 == 0 and y % 5 == 0:
        number_list.append(y)

print (number_list)

In [None]:
# Nested IF with List Comprehension

num_list = [y for y in range(100) if y % 2 == 0 if y % 5 == 0]
print(num_list)

In [None]:
# if...else With List Comprehension

obj = ["Even" if i%2==0 else "Odd" for i in range(10)]
print(obj)
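The position of the `if` changes its meaning, which is easy to miss in the cells above. A brief sketch with made-up variable names:

```python
# A trailing `if` filters: items that fail the test are dropped.
evens_only = [i for i in range(6) if i % 2 == 0]
print(evens_only)   # [0, 2, 4]

# A leading `if ... else ...` transforms: every item is kept, with a chosen value.
labels = ["Even" if i % 2 == 0 else "Odd" for i in range(6)]
print(labels)       # ['Even', 'Odd', 'Even', 'Odd', 'Even', 'Odd']

# Both can be combined: filter first, then choose the output value.
mixed = ["small" if i < 3 else "large" for i in range(6) if i % 2 == 0]
print(mixed)        # ['small', 'small', 'large']
```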

# Dictionary Comprehensions `EXTRA`

[DataCamp Tutorial](https://www.datacamp.com/community/tutorials/python-dictionary-comprehension)
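As a brief taste before following the link, a dictionary comprehension uses the same structure as a list comprehension but produces `key: value` pairs inside curly braces. The examples below are a minimal sketch with illustrative names, not taken from the linked tutorial.

```python
# Map each letter of 'human' to its position in the word.
positions = {letter: index for index, letter in enumerate('human')}
print(positions)      # {'h': 0, 'u': 1, 'm': 2, 'a': 3, 'n': 4}

# A condition works the same way as in a list comprehension.
even_squares = {n: n * n for n in range(6) if n % 2 == 0}
print(even_squares)   # {0: 0, 2: 4, 4: 16}
```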