- nbextensions has many useful tools, e.g. Table of Contents, Autopep8

https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/install.html

<br>

- View → Cell Toolbar → Slideshow to prepare slides in Jupyter notebook

<br>

Below,
- a pytest fixture sets up (and can tear down) resources shared by several tests
- `pytest.mark.parametrize` runs the unit test once for each value in the list passed to it

In [None]:
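import pytest

import module_1  # placeholder name for the module under test


@pytest.fixture
def commonly_used_object():
    # Built fresh for every test that requests this fixture
    return module_1.func_1()


@pytest.mark.parametrize('col', ['col_1', 'col_2', 'col_3'])
def test_function_1(commonly_used_object, col):
    # Runs three times, once per column name
    df = commonly_used_object._generate_df()
    assert col in df.columns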

<br>
<br>

Below,
- A single asterisk (`*args`) collects unnamed (positional) arguments into a tuple
- A double asterisk (`**kwargs`) collects named (keyword) arguments into a dict

In [13]:
def func_1(*args):
    for number in args:
        print(number)

func_1(1, 2, 3)

def func_2(**kwargs):
    print(sum(kwargs.values()))
    for name, number in kwargs.items():
        print("Name {} has a number of {}.".format(name, number))

func_2(aaa = 1, bbb = 2, ccc = 3)

1
2
3
6
Name aaa has a number of 1.
Name bbb has a number of 2.
Name ccc has a number of 3.
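
The same operators also work at the call site, where they unpack a sequence or a dict into arguments. A minimal sketch; `describe`, `numbers` and `named` are made-up names for illustration:

```python
def describe(*args, **kwargs):
    """Return what was collected, to show the container types."""
    return args, kwargs

numbers = [1, 2, 3]
named = {"aaa": 1, "bbb": 2}

# * unpacks the list into positional arguments,
# ** unpacks the dict into keyword arguments.
collected_args, collected_kwargs = describe(*numbers, **named)

print(type(collected_args).__name__, collected_args)      # tuple (1, 2, 3)
print(type(collected_kwargs).__name__, collected_kwargs)  # dict {'aaa': 1, 'bbb': 2}
```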


<br>
<br>

In below, 

Decorator, from https://www.python-course.eu/python3_decorators.php

In [27]:
def our_decorator(func):
    def function_wrapper(x):
        print("Before calling " + func.__name__)
        func(x)
        print("After calling " + func.__name__)
    return function_wrapper

@our_decorator
def foo(y):
    print("Hi, foo has been called with " + str(y))

# The @our_decorator is equivalent to
# foo = our_decorator(foo)
# The argument of foo() will be passed to function_wrapper()
foo("Hi")

Before calling foo
Hi, foo has been called with Hi
After calling foo


The same decorator can be applied to other functions

In [26]:
@our_decorator
def succ(n):
    print(n + 1)

succ(10)

Before calling succ
11
After calling succ


Decorators with Parameters... Need to wrap another function around our previous decorator function.

In [35]:
def greeting(expr):
    def greeting_decorator(func):
        def function_wrapper(x):
            print(expr + ", " + func.__name__ + " returns:")
            func(x)
        return function_wrapper
    return greeting_decorator

@greeting("Good evening")
def foo(x):
    print(x + ", time to eat foo.")

@greeting("Good morning")
def bar(x):
    print(x + ", time to eat bar.")
    
foo("Adam")
bar("Alex")

Good evening, foo returns:
Adam, time to eat foo.
Good morning, bar returns:
Alex, time to eat bar.
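
The wrappers above accept exactly one argument and drop the return value of the wrapped function. A more general wrapper might look like the sketch below, which forwards any arguments, returns the result, and uses `functools.wraps` to keep the original function name. `log_calls` and `add` are hypothetical names, not from the tutorial linked above.

```python
import functools

def log_calls(func):
    """Hypothetical decorator: prints around the call and passes the result through."""
    @functools.wraps(func)  # keeps func.__name__ and the docstring on the wrapper
    def wrapper(*args, **kwargs):
        print("Before calling " + func.__name__)
        result = func(*args, **kwargs)
        print("After calling " + func.__name__)
        return result
    return wrapper

@log_calls
def add(a, b=0):
    return a + b

total = add(2, b=3)
print(total, add.__name__)  # 5 add
```

Without `functools.wraps`, `add.__name__` would report `wrapper` instead of `add`.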


<br>

In below,

an easy way to set up a SQLite database from a pandas DataFrame

In [None]:
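import pandas as pd
import sqlite3

# The database file is created if it does not exist yet
conn = sqlite3.connect("../data/test.db")

# test_df is assumed to be an existing pandas DataFrame
test_df.to_sql("test_table", con=conn, if_exists="replace", index=False)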
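
To confirm that the table was written, it can be read back with `pd.read_sql`. A minimal sketch, assuming an in-memory database and a made-up two-row `test_df` (neither comes from the original notebook):

```python
import sqlite3

import pandas as pd

# Made-up data for illustration only
test_df = pd.DataFrame({"col_1": [1, 2], "col_2": ["a", "b"]})

# ":memory:" keeps the database in RAM, so no file is left behind
conn = sqlite3.connect(":memory:")
test_df.to_sql("test_table", con=conn, if_exists="replace", index=False)

# Read the table back into a new DataFrame
read_back = pd.read_sql("SELECT * FROM test_table", conn)
conn.close()

print(read_back)
```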

<br>

In below,

Flask example. The two chunks of code need to be run in two separate notebooks: the first starts the server (and blocks while it runs), the second sends requests to it.

In [None]:
import flask

def main():

    app = flask.Flask(__name__)

    example = {
        "student_1": 100,
        "student_2": 90,
        "student_3": 85,
        "student_4": 99
    }

    @app.route("/endpoint_1", methods=["GET"])
    def api_all():
        return flask.jsonify(example)

    @app.route("/endpoint_2", methods=["GET"])
    def api_code():
        code = flask.request.args["code"]
        return flask.jsonify(example[code])
    
    app.run()

if __name__ == "__main__":
    main()

In [None]:
import requests

# Example 1
endpoint = "http://127.0.0.1:5000/endpoint_1"

r = requests.get(endpoint)

print(r.status_code)

if r.status_code == 200:
    print("Success")
    print(r.json())
else:
    print("Fail")
    
# Example 2
endpoint = "http://127.0.0.1:5000/endpoint_2"

r = requests.get(endpoint, params={"code":"student_1"})

if r.status_code == 200:
    print("Success")
    print(r.json())
else:
    print("Fail")

<br>
<br>

Regex:
- `re.search()` with `span()` or `group()` reports the first match it finds.
- `re.findall()` finds all substrings where the RE matches, and returns them as a list.
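
A small sketch of the difference between the two calls, using a made-up string and pattern:

```python
import re

text = "cat 12, dog 7, bird 345"
pattern = r"\d+"  # one or more digits

# re.search stops at the first match and returns a match object (or None)
first = re.search(pattern, text)
print(first.group())  # 12
print(first.span())   # (4, 6)

# re.findall returns every non-overlapping match as a list of strings
all_hits = re.findall(pattern, text)
print(all_hits)       # ['12', '7', '345']
```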

<br>
<br>
In below,

fancy assertion:

In [36]:
assert False, "Did you know you can put explanation here?"

AssertionError: Did you know you can put explanation here?

<br>
<br>

In below,

Logging:

In [41]:
import logging

# for jupyter notebooks
logger = logging.getLogger()

# the file handler
fhandler = logging.FileHandler(filename="example.log", mode="a")

# format
formatter = logging.Formatter(
    "Timestamp: %(asctime)s - Level: %(levelname)s - Message: %(message)s"
)

# set the format
fhandler.setFormatter(formatter)

# add the file handler
logger.addHandler(fhandler)

# setting the level of logging, messages below this severity will not be logged
logger.setLevel(logging.WARNING)

Now try generating five messages in increasing order of severity

In [42]:
logging.debug("debug message")
logging.info("info message")
logging.warning("warning message")
logging.error("error message")
logging.critical("critical message")

In the log, there is now:
- Timestamp: 2020-06-05 10:04:15,631 - Level: WARNING - Message: warning message
- Timestamp: 2020-06-05 10:04:15,632 - Level: ERROR - Message: error message
- Timestamp: 2020-06-05 10:04:15,632 - Level: CRITICAL - Message: critical message

<br>
<br>

In below,

`np.where()`

In [1]:
import numpy as np

a = np.arange(9).reshape((3, 3))
print(a)

[[0 1 2]
 [3 4 5]
 [6 7 8]]


In [2]:
print(np.where(a < 4, a * 10, -1))

[[ 0 10 20]
 [30 -1 -1]
 [-1 -1 -1]]


<br>
<br>

In below,

To return something other than a number or string (e.g. a list) from a function wrapped with `np.vectorize`, specify the output type through `otypes`:

    np.vectorize(foo, otypes=[list])

otherwise you will get a

    ValueError: setting an array element with a sequence
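
A minimal runnable sketch of the above. `to_pair` is a hypothetical function, and this sketch passes `object` (the generic dtype for arbitrary Python objects) rather than `list`:

```python
import numpy as np

def to_pair(x):
    """Hypothetical function that returns a list for each input element."""
    return [x, x * 2]

# otypes=[object] tells np.vectorize to store each returned list as one element
vec_to_pair = np.vectorize(to_pair, otypes=[object])

result = vec_to_pair(np.array([1, 2, 3]))
print(result.shape, result.dtype)
print(result[1])
```

The result is a one-dimensional object array whose elements are the returned lists.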

<br>
<br>

In below,

how to generate `requirements.txt`, and how to use it:

In [None]:
pip freeze > requirements.txt
pip install -r requirements.txt

<br>
<br>

In below,

how to ask pylint to ignore specific checks on a specific line. All checks that can be disabled are listed at https://docs.pylint.org/en/1.6.0/features.html

In [None]:
x    =         x                +       1 # pylint: disable=line-too-long, bad-whitespace

<br>
<br>

In below,

how to get every combination of items in two lists:

In [2]:
import itertools
for row, col in list(itertools.product([1,2,3], [4,5,6])):
    print(row, col)

1 4
1 5
1 6
2 4
2 5
2 6
3 4
3 5
3 6


<br>
<br>

In [6]:
s = pd.Series(['cat', 'dog', np.nan, 'rabbit'])
s.map({'cat': 'kitten', 'dog': 'puppy'})

0    kitten
1     puppy
2       NaN
3       NaN
dtype: object

<br>
<br>


In [4]:
s = pd.Series([3, 1, 2, 3, 4, np.nan])
s.value_counts()

3.0    2
4.0    1
2.0    1
1.0    1
dtype: int64

In [6]:
s.value_counts(normalize=True, dropna=False)

3.0    0.333333
NaN    0.166667
4.0    0.166667
2.0    0.166667
1.0    0.166667
dtype: float64

<br>
<br>

In below,

a function to compare two dataframes

In [None]:
# compare two data frames and show diffs (need equal shapes and the same sort order)
import numbers

def dfDiff(oldFrame, newFrame):
    # round all numerical columns to 4 decimal places as 64-bit floats
    def normalise(col):
        # .iloc[0] is positional, so it works whatever the index labels are
        if isinstance(col.iloc[0], numbers.Number):
            return col.astype(np.float64).round(4)
        return col

    oldFrame = oldFrame.apply(normalise)
    newFrame = newFrame.apply(normalise)

    # True where the two frames disagree, stacked into a (row, column) index
    dfBool = (oldFrame != newFrame).stack()
    diff = pd.concat([oldFrame.stack()[dfBool],
                      newFrame.stack()[dfBool]], axis=1)
    diff.columns = ["Old", "New"]
    return diff

<br>
<br>

In below,

sick of big data frames when showing examples? Just pick, say, 30 random rows.

In [None]:
df.sample(n=30, random_state=1)

<br>
<br>

crosstab

In [2]:
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"])
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"])
pd.crosstab(a, b, rownames=['a'], colnames=['b'])

b    one  two
a
bar    3    1
foo    4    3


<br>
<br>

In below,

Counter. See more of its functionality at https://docs.python.org/3/library/collections.html#collections.Counter

In [1]:
from collections import Counter
c = Counter(['eggs', 'ham', 'bacon', 'bacon', 'ham', 'ham', 'ham'])
c['bacon']

2

In [2]:
c.most_common(2)

[('ham', 4), ('bacon', 2)]
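
Counters also support arithmetic: `+` adds counts and `-` subtracts them, keeping only positive results. A short sketch with made-up data:

```python
from collections import Counter

breakfast = Counter(["eggs", "ham", "ham"])
lunch = Counter(["ham", "bacon"])

combined = breakfast + lunch   # counts are added per key
leftover = breakfast - lunch   # counts are subtracted; zero and negative counts are dropped

print(combined["ham"])  # 3
print(leftover)         # Counter({'eggs': 1, 'ham': 1})
```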

<br>
<br>

In below,

1) Linting and formatting are not the same. Quote: "Linting, on the other hand, analyzes code for common syntactical, stylistic, and functional errors as well as unconventional programming practices that can lead to errors. Although there is a little overlap between formatting and linting, the two capabilities are complementary."

2) Run autopep8 by `Shift + Option + F`

3) No need to enable linting. It is enabled by default.

4) Autopep8 is the default VS Code Python formatter, but it does not fix everything. To catch the rest, set `"python.linting.pycodestyleEnabled": true` in “settings.json”, check the errors in the lower pane, and correct them manually.

5) To customize pep8 rules, in the Command Palette, search for “settings.json” (choose Open Settings (JSON), not Open Default Settings (JSON)). Then add the following:

In [None]:
{
    "_a_place_for_comments" : [
        "pycodestyleArgs and autopep8Args are both about pep8 and must be the same",
        "autopep8Args specifies shift + option + F autoformating, won't correct everything, so need pycodestyleArgs",
        "pycodestyleArgs displays pep8 warnings and errors"
    ],
     "python.linting.pycodestyleEnabled": true,
     "python.linting.pycodestyleArgs": [
        "--max-line-length=100",
        "--ignore=E251,E266"
    ],
    "python.formatting.autopep8Args": [
        "--max-line-length=100",
        "--ignore=E251,E266"
    ]
}

<br>
<br>
<br>

<div class="alert alert-danger" role="alert">
    <p>How to show an alert.</p>
    <p>How to show an alert.</p>
</div>

<div class="alert alert-info">
    <p>How to show an alert.</p>
</div>

<br>
<br>

## Table of contents
[1. Step 1](#Step_1)<br>
[2. Step 2](#Step_2)<br>
[3. Step 3](#Step_3)<br>

- It seems that a snake case section heading is necessary.
- `#` is necessary
- As long as `<a class="anchor" id="Step_3"></a>` is added in markdown, naming of the Step 3 heading does not matter.
- Size of the heading does not matter.

## Step_1

# Step_2

<a class="anchor" id="Step_3"></a>

#### Step aaaaaaaa 3

```python
# How to use python style markdown
df = function_1(df)
df = function_2(df, 'SOME_STRING')
```

<br>
<br>
<br>

In below,

how to make a table

col_1|col_2
----|-------
1 | “aa”
1 | “bb”
2 | “aa”

<br>
<br>

In below,

how to read the different YAML constructs (strings, lists, dicts, references)

In [45]:
import yaml
with open('yaml_examples.yml', 'r') as file:
    config = yaml.load(file, Loader=yaml.FullLoader)

In [46]:
config["example_1"]

'20190101'

<br>
To get List

In [47]:
config["example_2"]

[['item_1', 'item_2', 'item_3']]

In [48]:
config["example_3"]

['item_1', 'item_2', 'item_3']

In [49]:
config["example_4"]

['item_1', 'item_2', 'item_3']

<br>
To get Dict

In [50]:
config["example_5"]

{'item_1': '9:00 am', 'item_2': '12:00 pm', 'item_3': '5:00 pm'}

In [51]:
config["example_6"]

{'item_1': '9:00 am', 'item_2': '12:00 pm', 'item_3': '5:00 pm'}

<br>
Creative

In [52]:
config["example_7"]

'{item_1} + {item_2} = 1234'

In [53]:
config["example_7"].format(item_1 = "AAA", item_2 = "BBB")

'AAA + BBB = 1234'

YAML has no tuple syntax, so `()` are not recognized and the whole value is read as a string.

In [54]:
config["example_8"]

"('item_1','item_2','item_3')"

<br>
Reference within YAML file

In [55]:
config["example_9"]

{'item_1': '9:00 am', 'item_2': '12:00 pm', 'item_3': ['5:00 pm', '9:00 pm']}

In [56]:
config["example_10"]

{'employee_1': {'name': 'AAA',
  'age': 20,
  'arrival': '9:00 am',
  'lunch': '12:00 pm',
  'leave (normal/night)': ['5:00 pm', '9:00 pm']},
 'employee_2': {'name': 'BBB',
  'age': 21,
  'arrival': '9:00 am',
  'lunch': '12:00 pm',
  'leave (normal/night)': ['5:00 pm', '9:00 pm']},
 'employee_3': {'name': 'CCC',
  'age': 22,
  'arrival': '9:00 am',
  'lunch': '12:00 pm',
  'leave (normal/night)': ['5:00 pm', '9:00 pm']},
 'employee_4': {'name': 'DDD',
  'age': 23,
  'arrival': '9:00 am',
  'lunch': '12:00 pm',
  'leave (normal/night)': ['5:00 pm', '9:00 pm']},
 'employee_5': {'name': 'EEE',
  'age': 24,
  'arrival': '9:00 am',
  'lunch': '12:00 pm',
  'leave (normal/night)': ['5:00 pm', '9:00 pm']}}
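
The contents of `yaml_examples.yml` are not shown here. As a rough sketch only, the hypothetical YAML text below (not the original file) illustrates the kinds of syntax that produce strings, lists, dicts, and references. It is parsed from a string with `yaml.safe_load` so that it runs without a file on disk.

```python
import yaml

# Hypothetical content for illustration; NOT the original yaml_examples.yml
yaml_text = """
example_1: '20190101'            # quoted, so it stays a string
example_3: [item_1, item_2]      # flow-style list
example_4:                       # block-style list
  - item_1
  - item_2
example_5: {item_1: '9:00 am'}   # flow-style dict
hours: &hours                    # & defines an anchor
  arrival: '9:00 am'
employee_1:
  name: AAA
  hours: *hours                  # * reuses the anchored value
"""

config = yaml.safe_load(yaml_text)

print(config["example_1"])
print(config["example_4"])
print(config["employee_1"])
```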

<br>
<br>

In below,

in a notebook on the Kaggle platform, create a link to download the dataframe which was saved with .to_csv method

In [None]:
from IPython.display import HTML

def create_download_link(title="Download CSV file", filename="data.csv"):
    # Quote the href so a filename containing spaces still gives a valid link
    html = '<a href="{filename}">{title}</a>'
    html = html.format(title=title, filename=filename)
    return HTML(html)

# create a link to download the dataframe which was saved with .to_csv method
create_download_link(filename='predictions.csv')

<br>
<br>

In below,

@classmethod

A class method is bound to the class rather than to an instance, and receives the class itself as its first argument, `cls`.

In [2]:
from datetime import date

# random Person
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    @classmethod
    def fromBirthYear(cls, name, birthYear):
        return cls(name, date.today().year - birthYear)

    def display(self):
        print(self.name + "'s age is: " + str(self.age))

person = Person('Adam', 19)
person.display()

person1 = Person.fromBirthYear('John',  1985)
person1.display()

Adam's age is: 19
John's age is: 35


<br>

@staticmethod

In [7]:
class Student(object):

    @staticmethod
    def is_full_name(name_str):
        names = name_str.split(' ')
        return len(names) > 1

print(Student.is_full_name('Scott Robinson'))   # True
print(Student.is_full_name('Scott'))            # False

True
False


<br>
<br>

In below,

Set the cell type to `Raw NBConvert` to keep code in the notebook without it being run

<br>
<br>

In below,

An instance of a class that defines a `__call__` method behaves like a function and can be called like one.

From: https://discuss.pytorch.org/t/is-model-forward-x-the-same-as-model-call-x/33460

    forward function is called in the .__call__ function.
    __call__ is already defined in nn.Module, will register all hooks and call your forward function. 
    That’s also the reason to call the module directly (output = model(data)) instead of model.forward(data).

In [1]:
class Product: 
    def __init__(self): 
        print("Instance Created") 
  
    # Defining __call__ method 
    def __call__(self, a, b): 
        print(a * b) 
  
# Instance created
ans = Product() 
  
# __call__ method will be called 
ans(10, 20)

Instance Created
200


<br>
<br>

A function that returns `self` will allow method cascading. 

See https://stackoverflow.com/questions/43380042/purpose-of-return-self-python 

An example is having it in a `fit()` function.
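
A minimal sketch of the idea. `MeanCenterer` is a hypothetical class written for illustration, not taken from any library:

```python
class MeanCenterer:
    """Hypothetical estimator-like class whose fit() returns self."""

    def __init__(self):
        self.mean_ = None

    def fit(self, values):
        self.mean_ = sum(values) / len(values)
        return self  # returning self is what makes the chained call possible

    def transform(self, values):
        return [v - self.mean_ for v in values]


data = [1.0, 2.0, 3.0]

# fit() hands back the same object, so transform() can be called right away
centered = MeanCenterer().fit(data).transform(data)
print(centered)  # [-1.0, 0.0, 1.0]
```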

<br>
<br>

In below,

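use `setattr` to attach a user-defined function to `pd.DataFrame`, so that it can be called as a method in a chain (the function has to return a DataFrame if the chain should continue after it)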

In [1]:
import pandas as pd

df = pd.DataFrame({
    'Brand': ['Honda Civic','Toyota Corolla','Ford Focus','Audi A4'],
    'Price': [22000,25000,27000,35000]
})

def user_defined_function(df, arg_1, arg_2):
    print(df[arg_1].max())
    print(df.sort_values(arg_2))

setattr(pd.DataFrame, 'user_defined_function', user_defined_function)

df

Unnamed: 0,Brand,Price
0,Honda Civic,22000
1,Toyota Corolla,25000
2,Ford Focus,27000
3,Audi A4,35000


In [2]:
df.user_defined_function(arg_1="Price", arg_2="Brand")

35000
            Brand  Price
3         Audi A4  35000
2      Ford Focus  27000
0     Honda Civic  22000
1  Toyota Corolla  25000
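
pandas also provides `DataFrame.pipe`, which gives a similar chaining style without modifying the `pd.DataFrame` class. A minimal sketch; `add_price_rank` is a hypothetical helper and the two-row frame is made up:

```python
import pandas as pd

def add_price_rank(df, column):
    """Hypothetical helper: returns a new DataFrame so the chain can continue."""
    return df.assign(price_rank=df[column].rank(ascending=False))

df = pd.DataFrame({
    "Brand": ["Honda Civic", "Audi A4"],
    "Price": [22000, 35000],
})

# pipe() calls add_price_rank(df, column="Price") and returns its result
result = df.pipe(add_price_rank, column="Price").sort_values("price_rank")
print(result)
```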


<br>
<br>

Good source of built-in data https://scikit-learn.org/stable/datasets/index.html

make_classification( ) is a good starting point of making dummy data https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html

<br>
<br>

In below

pandas.explode

In [3]:
df = pd.DataFrame({'A': [[1, 2, 3], 'foo', [], [3, 4]], 'B': 1})
df

Unnamed: 0,A,B
0,"[1, 2, 3]",1
1,foo,1
2,[],1
3,"[3, 4]",1


In [4]:
df.explode('A')

Unnamed: 0,A,B
0,1,1
0,2,1
0,3,1
1,foo,1
2,,1
3,3,1
3,4,1


<br>
<br>

In below,

how to add a percentile (of a column) for each row

In [8]:
df = pd.DataFrame({'A': [1, 2, 3, 3, 4, 10, 13, 17, 18, 18, 20]})
df.assign(percentile_rank=df["A"].rank(pct=True))

Unnamed: 0,A,percentile_rank
0,1,0.090909
1,2,0.181818
2,3,0.318182
3,3,0.318182
4,4,0.454545
5,10,0.545455
6,13,0.636364
7,17,0.727273
8,18,0.863636
9,18,0.863636


<br>
<br>

In below,

`groupby` and `agg` can work with a `lambda`

In [11]:
df = pd.DataFrame({
    "a": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "foo", "foo", "foo"],
    "b": ["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"]
})
df

Unnamed: 0,a,b
0,foo,one
1,foo,one
2,foo,one
3,foo,two
4,bar,one
5,bar,one
6,bar,one
7,bar,two
8,foo,two
9,foo,two


In [14]:
(
    df
    .groupby("a")
    .agg({'b': lambda x: x.str.cat(sep=' ')})
    .reset_index()
)

Unnamed: 0,a,b
0,bar,one one one two
1,foo,one one one two two two one
