### Some good tricks in Python programming

**Reference:** https://towardsdatascience.com/python-for-data-science-8-concepts-you-may-have-forgotten-i-did-825966908393

#### 1. List comprehension

In [1]:
x = [1, 2, 4, 6]
out = [w**2 for w in x]
print(out)

[1, 4, 16, 36]
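
To make clear what the comprehension does, here is a minimal sketch of the equivalent explicit loop, plus a variant with a condition. The names `squares_loop` and `squares_filtered` are made up for this illustration.

```python
x = [1, 2, 4, 6]

# Explicit loop that builds the same list as [w**2 for w in x]
squares_loop = []
for w in x:
    squares_loop.append(w**2)

# A comprehension can also carry a condition:
# square only the values larger than 1
squares_filtered = [w**2 for w in x if w > 1]

print(squares_loop)      # [1, 4, 16, 36]
print(squares_filtered)  # [4, 16, 36]
```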


#### 2. Lambda function

Basic syntax: `lambda arguments: expression`
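
As a small sketch of that syntax, the example below compares a named function with an equivalent lambda, both used as a sort key. The `scores` data and the helper name `second_item` are hypothetical and serve only as illustration.

```python
# Hypothetical (name, score) pairs, for illustration only
scores = [("ann", 82), ("bob", 95), ("cy", 78)]

# A named function that returns the second item of a pair
def second_item(pair):
    return pair[1]

# The same logic written inline as a lambda
by_def = sorted(scores, key=second_item)
by_lambda = sorted(scores, key=lambda pair: pair[1])

print(by_lambda)  # [('cy', 78), ('ann', 82), ('bob', 95)]
```

A lambda is convenient when the function is short and needed in one place only; a `def` is usually easier to read and reuse otherwise.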

In [2]:
seq = [1, 2, 3, 4, 5]
# suppose we want to multiply each element by 2
# see what we get here:
seq * 2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

Multiplying a list by 2 repeats the list instead of doubling each element. Now try this:

In [3]:
results = list(map(lambda var: var*2, seq))
print(results)

[2, 4, 6, 8, 10]


Keep only the elements larger than 2:

In [4]:
result = list(filter(lambda var: var>2, seq))
print(result)

[3, 4, 5]
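
For comparison, the same two results can be written with list comprehensions instead of `map` and `filter`. This is just a sketch; the names `doubled` and `larger_than_two` are chosen for illustration.

```python
seq = [1, 2, 3, 4, 5]

# Equivalent of list(map(lambda var: var*2, seq))
doubled = [var * 2 for var in seq]

# Equivalent of list(filter(lambda var: var>2, seq))
larger_than_two = [var for var in seq if var > 2]

print(doubled)          # [2, 4, 6, 8, 10]
print(larger_than_two)  # [3, 4, 5]
```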


**A simple application:**
Filter each tagged text to keep only the tokens whose part-of-speech tag is 'NN'

In [12]:
import nltk

text = ["managing or supervising people or projects",
       "biological scientist in a research institute"]

tokenize = list(map(nltk.word_tokenize, text))
pos_tag = list(map(nltk.pos_tag, tokenize))
pos_tag

[[('managing', 'NN'),
  ('or', 'CC'),
  ('supervising', 'VBG'),
  ('people', 'NNS'),
  ('or', 'CC'),
  ('projects', 'NNS')],
 [('biological', 'JJ'),
  ('scientist', 'NN'),
  ('in', 'IN'),
  ('a', 'DT'),
  ('research', 'NN'),
  ('institute', 'NN')]]

Use the *filter* function:

In [35]:
list(filter(lambda x: x[1] == 'NN', pos_tag[0]))

[('managing', 'NN')]

Use a *list comprehension*:

In [45]:
[x for x in pos_tag[0] if x[1] == 'NN']

[('managing', 'NN')]

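Define a function and combine it with the *map* function to apply the filter to every text:

In [39]:
def pos_filter(compnt, lst):
    # keep only the (word, tag) pairs whose tag equals compnt
    output = list(filter(lambda x: x[1] == compnt, lst))
    return output

In [44]:
# bind the tag with a lambda so that map iterates over the texts only;
# map(pos_filter, 'NN', pos_tag) would pair the characters 'N', 'N' with the texts
list(map(lambda lst: pos_filter('NN', lst), pos_tag))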

[[('managing', 'NN')],
 [('scientist', 'NN'), ('research', 'NN'), ('institute', 'NN')]]
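
Another way to pass one fixed tag to every text is `functools.partial`. The sketch below copies the tagged tokens from the output above so that it runs without nltk; the helper name `filter_by_tag` and the variable `keep_nn` are hypothetical.

```python
from functools import partial

# Tagged tokens copied from the pos_tag output above (no nltk needed here)
pos_tag = [
    [('managing', 'NN'), ('or', 'CC'), ('supervising', 'VBG'),
     ('people', 'NNS'), ('or', 'CC'), ('projects', 'NNS')],
    [('biological', 'JJ'), ('scientist', 'NN'), ('in', 'IN'),
     ('a', 'DT'), ('research', 'NN'), ('institute', 'NN')],
]

def filter_by_tag(tag, tagged_tokens):
    """Keep only the (word, tag) pairs whose tag equals `tag`."""
    return [pair for pair in tagged_tokens if pair[1] == tag]

# partial fixes the first argument, leaving a one-argument function for map
keep_nn = partial(filter_by_tag, 'NN')
nn_only = list(map(keep_nn, pos_tag))

print(nn_only)
```

Because the tag is an ordinary argument, the same helper works for other tags, for example `filter_by_tag('NNS', pos_tag[0])`.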

#### 3. Meaning of axis

In [14]:
import pandas as pd
my_df = pd.DataFrame({'id': [1,2,3,4], 
                      'state': ['TN', 'VA', 'CA', 'IL'],
                      'timezone': ['ET', 'ET', 'PT', 'CT']}) 
my_df

|   | id | state | timezone |
|---|----|-------|----------|
| 0 | 1  | TN    | ET       |
| 1 | 2  | VA    | ET       |
| 2 | 3  | CA    | PT       |
| 3 | 4  | IL    | CT       |


To work on rows, use `axis = 0`. Here the row with index label 1 is dropped:

In [13]:
my_df.drop(1, axis = 0)

|   | id | state | timezone |
|---|----|-------|----------|
| 0 | 1  | TN    | ET       |
| 2 | 3  | CA    | PT       |
| 3 | 4  | IL    | CT       |


To work on columns, use `axis = 1`. Here the column 'id' is dropped:

In [18]:
my_df.drop('id', axis = 1)

|   | state | timezone |
|---|-------|----------|
| 0 | TN    | ET       |
| 1 | VA    | ET       |
| 2 | CA    | PT       |
| 3 | IL    | CT       |
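
The `axis` argument also appears in aggregations, where it names the axis that is collapsed. The sketch below uses a small numeric frame invented for this illustration (`num_df` is not part of the example above): `axis = 0` adds down the rows and gives one total per column, while `axis = 1` adds across the columns and gives one total per row.

```python
import pandas as pd

# Hypothetical numeric frame, for illustration only
num_df = pd.DataFrame({'a': [1, 2, 3],
                       'b': [10, 20, 30]})

# axis=0: collapse the rows, one result per column
col_totals = num_df.sum(axis=0)

# axis=1: collapse the columns, one result per row
row_totals = num_df.sum(axis=1)

print(col_totals.tolist())  # [6, 60]
print(row_totals.tolist())  # [11, 22, 33]
```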
