### Lambda functions
You'll now be introduced to a powerful Python feature that will help you clean your data more effectively: lambda functions. Instead of using the def syntax that you used in the previous exercise, lambda functions let you make simple, one-line functions.

For example, here's a function that squares a variable used in an .apply() method:

def my_square(x):
    return x ** 2

df.apply(my_square)

The equivalent code using a lambda function is:

df.apply(lambda x: x ** 2)

The lambda function takes one parameter - the variable x. The function itself just squares x and returns the result, which is whatever the one line of code evaluates to. In this way, lambda functions can make your code concise and Pythonic.
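As a quick check that the two forms behave the same, here is a minimal sketch that applies both to a plain Python list with the built-in map(). The `values` list is made-up sample data, and pandas is left out on purpose so the snippet runs anywhere.

```python
def my_square(x):
    return x ** 2

values = [1, 2, 3]  # hypothetical sample data

# The named function and the lambda are interchangeable here.
squared_def = list(map(my_square, values))
squared_lambda = list(map(lambda x: x ** 2, values))

print(squared_def)     # [1, 4, 9]
print(squared_lambda)  # [1, 4, 9]
```

The lambda is convenient when the function is short and used only once; a def is usually easier to read and test once the logic grows beyond a single expression.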

The tips dataset has been pre-loaded into a DataFrame called tips. Your job is to clean its 'total_dollar' column by removing the dollar sign. You'll do this using two different methods: With the .replace() method, and with regular expressions. The regular expression module re has been pre-imported.

### Instructions

- Use the .replace() method inside a lambda function to remove the dollar sign from the 'total_dollar' column of tips.
- You need to specify two arguments to the .replace() method: The string to be replaced ('$'), and the string to replace it by ('').
- Apply the lambda function over the 'total_dollar' column of tips.
- Use a regular expression to remove the dollar sign from the 'total_dollar' column of tips.
- The pattern has been provided for you: It is the first argument of the re.findall() function.
- Complete the rest of the lambda function and apply it over the 'total_dollar' column of tips. Notice that because re.findall() returns a list, you have to slice it in order to access the actual value.
- Hit 'Submit Answer' to verify that you have removed the dollar sign from the column.
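The two approaches in the instructions can be sketched on plain strings before touching the DataFrame. The `total_dollar` list below is a hypothetical stand-in for the column, not the real dataset.

```python
import re

# Hypothetical sample values shaped like the 'total_dollar' column.
total_dollar = ["$16.99", "$10.34", "$21.01"]

# Method 1: str.replace() swaps every '$' for an empty string.
cleaned_replace = list(map(lambda x: x.replace("$", ""), total_dollar))

# Method 2: re.findall() returns a list of matches, so [0] picks the first one.
cleaned_regex = list(map(lambda x: re.findall(r"\d+\.\d+", x)[0], total_dollar))

print(cleaned_replace)  # ['16.99', '10.34', '21.01']
print(cleaned_regex)    # ['16.99', '10.34', '21.01']
```

With pandas, the same lambdas are passed to the .apply() method of the 'total_dollar' column instead of map().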

In [10]:
import pandas as pd
import re
tips=pd.read_csv('tip.txt')
tips.head()

Unnamed: 0,num,total_bill,tip,sex,smoker,day,time,size,total_dollar,Unnamed: 9
0,0,16.99,1.01,Female,No,Sun,Dinner,2,$16.99,
1,1,10.34,1.66,Male,No,Sun,Dinner,3,$10.34,
2,2,21.01,3.5,Male,No,Sun,Dinner,3,$21.01,
3,3,23.68,3.31,Male,No,Sun,Dinner,2,$23.68,
4,4,24.59,3.61,Female,No,Sun,Dinner,4,$24.59,


In [11]:
# Write the lambda function using replace
tips['total_dollar_replace'] = tips.total_dollar.apply(lambda x: x.replace('$', ''))

# Write the lambda function using regular expressions
tips['total_dollar_re'] = tips.total_dollar.apply(lambda x: re.findall(r'\d+\.\d+', x)[0])  # [0] is the only match; [1] would raise IndexError: list index out of range

# Print the head of tips
print(tips.head())

   num  total_bill   tip     sex smoker  day    time  size total_dollar  \
0    0       16.99  1.01  Female     No  Sun  Dinner     2       $16.99   
1    1       10.34  1.66    Male     No  Sun  Dinner     3       $10.34   
2    2       21.01  3.50    Male     No  Sun  Dinner     3       $21.01   
3    3       23.68  3.31    Male     No  Sun  Dinner     2       $23.68   
4    4       24.59  3.61  Female     No  Sun  Dinner     4       $24.59   

   Unnamed: 9 total_dollar_replace total_dollar_re  
0         NaN                16.99           16.99  
1         NaN                10.34           10.34  
2         NaN                21.01           21.01  
3         NaN                23.68           23.68  
4         NaN                24.59           24.59  
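Indexing the result of re.findall() with [0] fails with an IndexError whenever a value contains no match at all. One possible guard, sketched below, uses re.search() and returns None in that case. The helper name `first_amount` and the sample inputs are assumptions made for illustration.

```python
import re

def first_amount(text):
    """Return the first decimal number found in text, or None if there is none."""
    match = re.search(r"\d+\.\d+", text)
    return match.group() if match else None

print(first_amount("$16.99"))  # 16.99
print(first_amount("n/a"))     # None
```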


In [12]:
re.findall(r'\w','http://www.hackerrank.com/')

['h',
 't',
 't',
 'p',
 'w',
 'w',
 'w',
 'h',
 'a',
 'c',
 'k',
 'e',
 'r',
 'r',
 'a',
 'n',
 'k',
 'c',
 'o',
 'm']

In [13]:
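# re.finditer() returns a lazy iterator of match objects, so printing it shows only the iterator itself
print(re.finditer(r'\w','http://www.hackerrank.com/'))
# In Python 3, map() is lazy as well; wrap it in list() to see the matched characters
map(lambda x: x.group(),re.finditer(r'\w','http://www.hackerrank.com/'))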

<callable_iterator object at 0x0000024AD5B3CAC8>


<map at 0x24ad5b3ccc0>
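Both outputs above are lazy objects rather than results. A small sketch of how the matches could be materialized, assuming the same URL string:

```python
import re

url = "http://www.hackerrank.com/"

matches = re.finditer(r"\w", url)                 # lazy iterator of match objects
chars = list(map(lambda m: m.group(), matches))   # list() forces evaluation

print(chars[:4])   # ['h', 't', 't', 'p']
print(len(chars))  # 20
```

The resulting list is the same one that re.findall(r'\w', ...) returns directly; re.finditer() is mainly useful when the position of each match is needed as well.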

- \d = any digit
- \D = any character that is not a digit
- \s = any whitespace character (space, tab, newline, etc.)
- \S = any character that is not whitespace
- \t = tab
- \n = new line
- \e = escape character in some regex flavors (not supported by Python's re module)
- \w = word character (letter, digit, or "_")
- \W = any character that is not a word character
- . = any character except a new line
- ? = matches 0 or 1 of the preceding element
- \b = word boundary (the empty position between a \w and a \W character)
- {x} = exactly x repetitions of the preceding element
- ^ = matches the start of a string
- + = matches 1 or more of the preceding element
- [] = character set, which may include ranges such as [a-z]
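Several of the tokens listed above can be tried out on one string. This is only an illustrative sketch; the `sample` text is invented for the purpose.

```python
import re

sample = "Tip: $16.99 on Sun_1"  # hypothetical test string

print(re.findall(r"\d", sample))        # ['1', '6', '9', '9', '1']
print(re.findall(r"\d+\.\d+", sample))  # ['16.99']
print(re.findall(r"\d{2}", sample))     # ['16', '99']
print(re.findall(r"^\w+", sample))      # ['Tip']
print(re.findall(r"\bS\w+", sample))    # ['Sun_1']
print(re.findall(r"[A-Z]", sample))     # ['T', 'S']
```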

In [14]:
import re
regex = r"\[P\] (.+?) \[/P\]"  # raw string; .+? is non-greedy, so each [P] ... [/P] pair is matched separately
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)
print(person)

['Barack Obama', 'Bill Gates']
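The non-greedy quantifier `.+?` is what keeps the two names apart. A short comparison with the greedy `.+`, using the same input line, might look like this:

```python
import re

line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."

lazy = re.findall(r"\[P\] (.+?) \[/P\]", line)   # stops at the nearest closing tag
greedy = re.findall(r"\[P\] (.+) \[/P\]", line)  # runs on to the last closing tag

print(lazy)    # ['Barack Obama', 'Bill Gates']
print(greedy)  # ['Barack Obama [/P] met Microsoft founder [P] Bill Gates']
```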


In [15]:
# Example of the \w+ and ^ expressions
# "^": matches the start of the string
# "\w+": matches one or more word characters (letters, digits, "_")
xx = "guru99,education is fun"
r1 = re.findall(r"^\w+",xx)
print(r1)

['guru99']


In [16]:
xx = "guru99,education is fun"
r1 = re.findall(r"^\w",xx)
print(r1)

['g']


In [17]:
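# "\s" matches a single whitespace character, so the first call splits on whitespace;
# a bare "s" is just the literal letter s, so the second call splits on that letter

print((re.split(r'\s','we are splitting the words')))
print((re.split('s','split the words')))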

['we', 'are', 'splitting', 'the', 'words']
['', 'plit the word', '']
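When words are separated by more than one whitespace character, splitting on a single `\s` leaves empty strings in the result. A sketch of the difference, using an invented input with mixed whitespace:

```python
import re

messy = "we  are\tsplitting   the words"  # hypothetical input with repeated whitespace

print(re.split(r"\s", messy))   # ['we', '', 'are', 'splitting', '', '', 'the', 'words']
print(re.split(r"\s+", messy))  # ['we', 'are', 'splitting', 'the', 'words']
```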


### Using regular expression methods
The "re" module provides several functions for querying an input string. The ones we are going to look at are:

- re.match()
- re.search()
- re.findall()
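The three functions differ mainly in where they look and what they return. A minimal sketch on a single string:

```python
import re

txt = "The rain in Spain"

# re.match() only tries to match at the very start of the string.
print(re.match(r"rain", txt))           # None
print(re.match(r"The", txt).group())    # The

# re.search() scans the string and returns the first match anywhere.
print(re.search(r"rain", txt).start())  # 4

# re.findall() returns every non-overlapping match as a list of strings.
print(re.findall(r"ai", txt))           # ['ai', 'ai']
```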

In [18]:
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x)

<re.Match object; span=(0, 17), match='The rain in Spain'>


- [Official re documentation](https://docs.python.org/2/library/re.html)
- [W3Schools Python RegEx tutorial](https://www.w3schools.com/python/python_regex.asp)

In [19]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.findall("ai", txt)
print(x)

['ai', 'ai']


In [20]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.search(r"\s", txt)
print(x)
print(x.start())
print("The first white-space character is located in position:", x.start())

<re.Match object; span=(3, 4), match=' '>
3
The first white-space character is located in position: 3


In [21]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.split(r"\s", txt)
print(x)

['The', 'rain', 'in', 'Spain']


In [22]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.split(r"\s", txt, maxsplit=1)  # split at the first whitespace only
print(x)

['The', 'rain in Spain']


In [23]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.sub(r"\s", ":", txt)
print(x)

The:rain:in:Spain


In [24]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.sub(r"\s", "9", txt, count=2)  # replace only the first two matches
print(x)

The9rain9in Spain


In [25]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.sub(r"\s", "9", txt, count=1)  # replace only the first match
print(x)

The9rain in Spain
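re.sub() also offers a third way to strip the dollar sign in the cleaning exercise. The sketch below works on a single made-up value; in the exercise it would be wrapped in a lambda and passed to .apply().

```python
import re

price = "$16.99"  # hypothetical value shaped like the 'total_dollar' column

# '$' is escaped because an unescaped '$' means "end of string" in a pattern.
cleaned = re.sub(r"\$", "", price)

print(cleaned)         # 16.99
print(float(cleaned))  # 16.99
```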


In [38]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.search("Sp", txt)
print(x)
print(x.start()) # first position

<re.Match object; span=(12, 14), match='Sp'>
12


In [27]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.search(r"\bS\w+", txt)
print(x.span())

(12, 17)


In [28]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.search(r"\bS\w+", txt)
print(x.string)

The rain in Spain


In [29]:
txt = "The rain in Spain"  # named txt to avoid shadowing the built-in str
x = re.search(r"\bS\w+", txt)
print(x.group())

Spain
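The match object attributes used in the last few cells can be seen side by side in one sketch. The capture groups in the pattern are an addition for illustration and are not part of the cells above.

```python
import re

txt = "The rain in Spain"
m = re.search(r"\b(S)(\w+)", txt)  # two capture groups added for illustration

print(m.group())           # Spain
print(m.group(1))          # S
print(m.group(2))          # pain
print(m.span())            # (12, 17)
print(m.start(), m.end())  # 12 17
print(m.string)            # The rain in Spain
```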


[Match object documentation](https://docs.python.org/2.0/lib/match-objects.html)