## Python Basics - 3

### Lambda

In [4]:
var = lambda a:a+20
var(5)

25

In [5]:
lambda_add = lambda a,b : a+b
lambda_add(5,3)

8

### Filter

filter() keeps the elements of a sequence for which the function returns True and discards the rest. In Python 3 it returns an iterator, so the examples below wrap it in list().

In [6]:
original_list = [5, 17, 32, 43, 12, 62, 237, 133, 78, 21]
# odd numbers
list(filter(lambda x: (x%2 != 0) , original_list))

[5, 17, 43, 237, 133, 21]

In [7]:
original_list = [5, 11, 15, 43, 20, 65, 235, 133, 75, 21]
# multiples of 5
list(filter(lambda x: (x%5 == 0) , original_list))

[5, 15, 20, 65, 235, 75]

In [8]:
original_list = ['Analytics','Standard','Super','Data','Science','Vidhya']
# starting with an uppercase 'S'
list(filter(lambda x: (x[0] == 'S') , original_list))

['Standard', 'Super', 'Science']
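
Because `filter()` returns a lazy iterator in Python 3, nothing is computed until the iterator is consumed. The following is a minimal sketch of that behaviour, reusing the numbers from the first example; the variable names are illustrative, and the list comprehension is shown only as a common equivalent, not as the notebook's own approach.

```python
numbers = [5, 17, 32, 43, 12, 62, 237, 133, 78, 21]

# filter() keeps the items for which the function returns True
odd_iter = filter(lambda x: x % 2 != 0, numbers)

# The iterator is consumed here; list() collects the kept items
odds = list(odd_iter)

# An equivalent list comprehension
odds_comprehension = [x for x in numbers if x % 2 != 0]

# A second pass over the same iterator yields nothing, because it is exhausted
second_pass = list(odd_iter)

print(odds)
print(odds == odds_comprehension)
print(second_pass)
```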

### Map

map() applies a function (often a lambda) to every item of a sequence and returns an iterator over the results; wrapping it in list() produces a new list of the transformed items.

In [11]:
def times2(var):
    return var*2
seq = [1,2,3,4,5]
# map() alone returns a lazy map object; list() is needed to see the values
map(times2,seq)
list(map(times2,seq))

[2, 4, 6, 8, 10]

In [12]:
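# same as times2 above; this result is not displayed, since a cell shows only its last expression
list(map(lambda var: var*2,seq))
# filter() for comparison: keep only the even numbers
list(filter(lambda item: item%2 == 0,seq))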

[2, 4]

In [9]:
original_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
# double each number
list(map(lambda x: x*2 , original_list)) 

[2, 4, 6, 8, 10, 12, 14, 16, 18, 20]

In [10]:
original_list = ['analytics','vidhya','king','south','east']
# capitalize first letters
list(map(lambda x: x[0].upper()+x[1:] , original_list)) 

['Analytics', 'Vidhya', 'King', 'South', 'East']
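
`map()` also accepts more than one iterable, in which case the function receives one item from each per call. A small sketch, with made-up `prices` and `quantities` lists used purely for illustration:

```python
prices = [10, 20, 30]        # hypothetical sample data
quantities = [1, 2, 3]       # hypothetical sample data

# With two iterables, the lambda takes two arguments
totals = list(map(lambda p, q: p * q, prices, quantities))

# map() stops as soon as the shortest iterable runs out
short = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20]))

print(totals)
print(short)
```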

### Reduce

reduce() applies a two-argument function cumulatively to the items of a sequence, from left to right, and returns a single reduced value.

In [13]:
from functools import reduce
original_list = [1, 2, 3, 4, 5, 6, 7, 8, 9]
# sum
reduce((lambda x, y : x + y), original_list) 

45

In [14]:
original_list = [110, 53, 3, 424, 255, 16, 42, 256]
# largest number
reduce((lambda x, y: x if (x > y) else y ), original_list) 

424
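
`reduce()` also takes an optional third argument, an initial value that seeds the accumulation and is returned as-is for an empty sequence. The sketch below uses illustrative names (`values`, `total_from_100`, `empty_total`) to show the folding order and the effect of the initial value.

```python
from functools import reduce

values = [1, 2, 3, 4]

# The function is applied cumulatively: ((1 + 2) + 3) + 4
total = reduce(lambda x, y: x + y, values)

# The optional third argument is the starting value of the accumulation
total_from_100 = reduce(lambda x, y: x + y, values, 100)

# With an initial value, an empty sequence returns that value instead of raising TypeError
empty_total = reduce(lambda x, y: x + y, [], 0)

print(total, total_from_100, empty_total)
```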

### Math

In [3]:
import math
print(abs(-5))
print(math.ceil(35.74))
print(math.floor(16.94))
print(math.exp(7))
print(math.sqrt(25))
print(math.log(5))
print(math.log10(2))
print(max(3,5,7,1,10))
print(min(4,7,56,1,-9))
print(math.pow(4,2))
print(math.hypot(3,4))
print(math.pi)
print(math.factorial(4))

5
36
16
1096.6331584284585
5.0
1.6094379124341003
0.3010299956639812
10
-9
16.0
5.0
3.141592653589793
24


## Regex

Regular expressions, or regexes, are written in a condensed formatting language. In general, you can think of a regular expression as a pattern that you give to a regex processor along with some source data. The processor parses the source data using that pattern and returns chunks of text to the data scientist or programmer for further manipulation. There are three main reasons to do this: to check whether a pattern exists within some source data, to get all instances of a complex pattern from some source data, or to clean your source data using a pattern, generally through string splitting.

In [4]:
import re

In [5]:
# match() - checks for a match at the beginning of the string
# search() - checks for a match anywhere in the string

text = "This is a good day."
if re.search("good", text): 
    print("Wonderful!")
else:
    print("Alas :(")

Wonderful!
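
The comments in the cell above mention `match()` without exercising it. As a minimal sketch of the difference, using the same sentence (the variable names are illustrative):

```python
import re

text = "This is a good day."

# search() scans the whole string, so it finds "good" in the middle
found_anywhere = re.search("good", text)

# match() only tries at the beginning of the string, so "good" is not found
found_at_start = re.match("good", text)

# match() succeeds when the pattern sits at the very start
starts_with_this = re.match("This", text)

print(found_anywhere.span())
print(found_at_start)
print(starts_with_this.group())
```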


In [6]:
# Tokenizing - string is separated into substrings based on patterns
# findall(), split() - parse the string and return chunks

text = "Amy works diligently. Amy gets good grades. Our student Amy is successful."
re.split("Amy", text)

['',
 ' works diligently. ',
 ' gets good grades. Our student ',
 ' is successful.']

In [7]:
re.findall("Amy", text)

['Amy', 'Amy', 'Amy']

In [8]:
# regex markup language - describes patterns in text
# Anchors - specify start and/or the end of the string that you are trying to match
# ^ - start, $ - end

text = "Amy works diligently. Amy gets good grades. Our student Amy is successful."
re.search("^Amy",text)

<re.Match object; span=(0, 3), match='Amy'>
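
The `$` anchor is described above but not demonstrated. A short sketch on a shortened, illustrative sentence shows both anchors; note that the final period is escaped as `\.` because an unescaped `.` matches any character.

```python
import re

sentence = "Amy gets good grades."   # shortened sample sentence for illustration

# $ anchors the pattern to the end of the string
ends_with_grades = re.search(r"grades\.$", sentence)

# ^ anchors to the start, so "grades" is not found there
starts_with_grades = re.search("^grades", sentence)

# Both anchors together require the pattern to cover the whole string
whole = re.search(r"^Amy.*\.$", sentence)

print(ends_with_grades.group())
print(starts_with_grades)
print(whole.group() == sentence)
```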

In [9]:
grades="ACAAAABCBCBAA"
re.findall("B",grades)

['B', 'B', 'B']

In [10]:
# find every A or B in the string by putting the characters A and B inside square brackets
# a range also works inside brackets, e.g. [a-z] matches any lowercase letter
re.findall("[AB]",grades)

['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'A', 'A']

In [11]:
# A followed by B or C
re.findall("[A][B-C]",grades)

['AC', 'AB']

In [12]:
# | = OR
re.findall("AB|AC",grades)

['AC', 'AB']

In [13]:
# negate our results - not A's
re.findall("[^A]",grades)

['C', 'B', 'C', 'B', 'C', 'B']

In [14]:
# Quantifiers specify how many times a pattern must repeat for a match
# e{m,n}: e = expression, m = minimum and n = maximum number of repetitions
re.findall("A{2,10}",grades)

['AAAA', 'AA']

In [15]:
# exactly two A's; equivalent to re.findall("AA",grades)
re.findall("A{2,2}",grades)

['AA', 'AA', 'AA']

In [16]:
# decreasing trend in a student's grades
re.findall("A{1,10}B{1,10}C{1,10}",grades)

['AAAABC']
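
Python's `re` module also offers shorthand quantifiers: `+` for one or more, `*` for zero or more, `?` for zero or one, and `{m}` for exactly m repetitions. A brief sketch on the same `grades` string, with illustrative variable names:

```python
import re

grades = "ACAAAABCBCBAA"

# + is shorthand for {1,}: one or more repetitions
plus_form = re.findall("A+B+C+", grades)
brace_form = re.findall("A{1,}B{1,}C{1,}", grades)

# {2} is shorthand for {2,2}: exactly two repetitions
exactly_two = re.findall("A{2}", grades)

# ? is shorthand for {0,1}: an A optionally followed by one B
optional_b = re.findall("AB?", grades)

print(plus_form)
print(exactly_two)
print(optional_b)
```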

In [17]:
with open("dataset/ferpa.txt","r") as file:
    wiki=file.read()
wiki

'\'Overview[edit]\\nFERPA gives parents access to their child\\\'s education records, an opportunity to seek to have the records amended, and some control over the disclosure of information from the records. With several exceptions, schools must have a student\\\'s consent prior to the disclosure of education records after that student is 18 years old. The law applies only to educational agencies and institutions that receive funds under a program administered by the U.S. Department of Education.\\n\\nOther regulations under this act, effective starting January 3, 2012, allow for greater disclosures of personal and directory student identifying information and regulate student IDs and e-mail addresses.[2] For example, schools may provide external companies with a student\\\'s personally identifiable information without the student\\\'s consent.[2]\\n\\nExamples of situations affected by FERPA include school employees divulging information to anyone other than the student about the stud

In [18]:
# find all headers in the text
# headers are followed by the marker [edit] and then a newline
# \w matches any letter, digit, or underscore
# * matches 0 or more times
# the leading n in some results appears to come from literal "\n" text in the file
re.findall(r"[\w ]*\[edit\]",wiki)

['Overview[edit]',
 'nAccess to public records[edit]',
 'nStudent medical records[edit]']
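
The stray leading `n` in two of the results is consistent with the file holding the two characters `\` and `n` rather than real newlines: `\w` matches the `n` but not the backslash. The sketch below reproduces this on small hypothetical strings (not the contents of `ferpa.txt`) and shows one possible way to return only the title, using a capture group.

```python
import re

# Hypothetical sample with real newline characters
sample = "Overview[edit]\nFERPA gives parents access.\n\nAccess to public records[edit]\nSome text."

# Hypothetical sample where each newline is the two literal characters \ and n
sample_literal = "Overview[edit]\\nFERPA gives parents access.\\n\\nAccess to public records[edit]"

pattern = r"[\w ]*\[edit\]"

headers = re.findall(pattern, sample)
headers_literal = re.findall(pattern, sample_literal)

# With a capture group, findall() returns only the parenthesized part
titles = re.findall(r"([\w ]+)\[edit\]", sample)

print(headers)
print(headers_literal)
print(titles)
```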