Jupyter Notebook tips: https://www.dataquest.io/blog/jupyter-notebook-tips-tricks-shortcuts/
1. command + shift + f  (open command palette)

2. Show the value of every expression in a cell (not only the last one)
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
OR, to make this the default in every session,
create the file ~/.ipython/profile_default/ipython_config.py with the lines below.
c = get_config()

#Run all nodes interactively
c.InteractiveShell.ast_node_interactivity = "all"


In [None]:
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
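
As a rough illustration of what the "all" setting means, the sketch below uses only the standard ast module: it walks the top-level statements of a cell and keeps the value of every expression statement instead of only the last one. It is a conceptual emulation, not IPython's actual implementation, and the helper name evaluate_all_expressions is made up for this example.

```python
import ast

def evaluate_all_expressions(cell_source, namespace=None):
    """Run a cell and collect the value of every top-level expression."""
    namespace = {} if namespace is None else namespace
    shown = []
    for node in ast.parse(cell_source).body:
        if isinstance(node, ast.Expr):
            # Expression statement: evaluate it and keep the result for display.
            code = compile(ast.Expression(node.value), "<cell>", "eval")
            value = eval(code, namespace)
            if value is not None:
                shown.append(value)
        else:
            # Any other statement (assignment, import, ...) is simply executed.
            code = compile(ast.Module([node], type_ignores=[]), "<cell>", "exec")
            exec(code, namespace)
    return shown

cell = "x = 2\nx + 1\ny = x * 10\ny\nx - y"
print(evaluate_all_expressions(cell))  # [3, 20, -18]
```

With the default setting ("last_expr"), only the final value (-18 here) would be displayed.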


3. Easy links to documentation
3.1. Use the "Help" menu, or
3.2. Prefix a name with ? in a code cell, e.g. ?str.replace
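
Outside IPython, where the ? syntax is not available, roughly the same information can be pulled with the standard library. This is a minimal sketch using inspect.getdoc; it prints only the first line of the docstring to keep the output short.

```python
import inspect

# Fetch the docstring that the ? prefix would display in a notebook.
doc = inspect.getdoc(str.replace)
print(doc.splitlines()[0])

# help(str.replace) prints the full text in a plain Python session.
```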

4. Plotting in notebooks
There are many options for generating plots in your notebooks.

matplotlib (the de-facto standard), activated with %matplotlib inline – Here’s a Dataquest Matplotlib Tutorial.
%matplotlib notebook provides interactivity but can be a little slow, since rendering is done server-side.
Seaborn is built over Matplotlib and makes building more attractive plots easier. Just by importing Seaborn, your matplotlib plots are made ‘prettier’ without any code modification.
mpld3 provides an alternative renderer (using d3) for matplotlib code. Quite nice, though incomplete.
bokeh is a better option for building interactive plots.
plot.ly can generate nice plots – this used to be a paid service only but was recently open sourced.
Altair is a relatively new declarative visualization library for Python. It’s easy to use and makes great looking plots, however the ability to customize those plots is not nearly as powerful as in Matplotlib.

5. IPython Magic Commands


In [4]:
# This will list all magic commands
%lsmagic
#https://ipython.readthedocs.io/en/stable/interactive/magics.html

Available line magics:
%alias  %alias_magic  %autoawait  %autocall  %automagic  %autosave  %bookmark  %cat  %cd  %clear  %colors  %conda  %config  %connect_info  %cp  %debug  %dhist  %dirs  %doctest_mode  %ed  %edit  %env  %gui  %hist  %history  %killbgscripts  %ldir  %less  %lf  %lk  %ll  %load  %load_ext  %loadpy  %logoff  %logon  %logstart  %logstate  %logstop  %ls  %lsmagic  %lx  %macro  %magic  %man  %matplotlib  %mkdir  %more  %mv  %notebook  %page  %pastebin  %pdb  %pdef  %pdoc  %pfile  %pinfo  %pinfo2  %pip  %popd  %pprint  %precision  %prun  %psearch  %psource  %pushd  %pwd  %pycat  %pylab  %qtconsole  %quickref  %recall  %rehashx  %reload_ext  %rep  %rerun  %reset  %reset_selective  %rm  %rmdir  %run  %save  %sc  %set_env  %store  %sx  %system  %tb  %time  %timeit  %unalias  %unload_ext  %who  %who_ls  %whos  %xdel  %xmode

Available cell magics:
%%!  %%HTML  %%SVG  %%bash  %%capture  %%debug  %%file  %%html  %%javascript  %%js  %%latex  %%markdown  %%perl  %%prun  %%pypy  %%

6. IPython Magic – %env: Set Environment Variables
You can manage environment variables of your notebook without restarting the jupyter server process. Some libraries (like theano) use environment variables to control behavior, and %env is the most convenient way to set them.

In [None]:
# Running %env without any arguments
# lists all environment variables

# The line below sets the environment
# variable
%env OMP_NUM_THREADS=4
# With only a name, %env prints the current value
%env OMP_NUM_THREADS
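
For comparison, the same thing can be done in plain Python through os.environ, which is handy in scripts where magics are not available. This is a small sketch; note that libraries which read such variables at import time generally need them set before the import.

```python
import os

# Equivalent of: %env OMP_NUM_THREADS=4  (values are always strings)
os.environ["OMP_NUM_THREADS"] = "4"

# Equivalent of: %env OMP_NUM_THREADS
print(os.environ.get("OMP_NUM_THREADS"))

# Equivalent of a bare %env: list the variable names that are set
for name in sorted(os.environ)[:5]:
    print(name)
```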

7. IPython Magic – %run: Execute python code
%run can execute python code from .py files – this is well-documented behavior. Lesser known is the fact that it can also execute other jupyter notebooks, which can be quite useful.
Note that using %run is not the same as importing a python module.

In [None]:
# this will execute and show the output from
# all code cells of the specified notebook
%run ./two-histograms.ipynb
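
One visible difference between running a file and importing it is the value of __name__. The sketch below uses the standard runpy module as a stand-in for %run (it is not what IPython calls internally) and a throwaway script named demo_script.py, which is made up for the example.

```python
import runpy
import tempfile
from pathlib import Path

script_source = (
    "values = [1, 2, 3]\n"
    'if __name__ == "__main__":\n'
    "    total = sum(values)\n"
)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"
    script.write_text(script_source)
    # Run as a script: __name__ is "__main__", so the guarded block executes.
    as_script = runpy.run_path(str(script), run_name="__main__")
    # Run as if imported: __name__ is the module name, so the guard is skipped.
    as_module = runpy.run_path(str(script), run_name="demo_script")

print("total" in as_script, "total" in as_module)  # True False
```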

8. IPython Magic – %load: Insert the code from an external script
This will replace the contents of the cell with an external script. You can either use a file on your computer as a source, or alternatively a URL.

In [None]:
# Before Running
%load ./hello_world.py
# After Running
# %load ./hello_world.py
if __name__ == "__main__":
    print("Hello World!")

Imbalanced classification: credit card fraud detection
Description: Demonstration of how to handle highly imbalanced classification problems.
This example looks at the Kaggle Credit Card Fraud Detection dataset to demonstrate how to train a classification model on data with highly imbalanced classes.

In [None]:
from tensorflow import keras
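# Get the real data from https://www.kaggle.com/mlg-ulb/creditcardfraud/
# Note: Kaggle normally requires a signed-in account to download datasets, so
# the direct URL below may not work; if it fails, download the CSV manually.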

data_file_url = (
    "https://www.kaggle.com/mlg-ulb/creditcardfraud/creditcard.csv"
)
csv_download = keras.utils.get_file(
    "creditcard.csv", data_file_url, extract=False
)


In [5]:
#vectorize the CSV data
import csv
import numpy as np

from tensorflow import keras
# Get the real data from https://www.kaggle.com/mlg-ulb/creditcardfraud/

fname = "/Users/lcao@us.ibm.com/keras_example_structured/creditcard.csv"

all_features = []
all_targets = []
with open(fname) as f:
    for i, line in enumerate(f):
        if i == 0:
            print("HEADER:", line.strip())
            continue  # Skip header
        fields = line.strip().split(",")
        all_features.append([float(v.replace('"', "")) for v in fields[:-1]])
        all_targets.append([int(fields[-1].replace('"', ""))])
        if i == 1:
            print("EXAMPLE FEATURES:", all_features[-1])

features = np.array(all_features, dtype="float32")
targets = np.array(all_targets, dtype="uint8")
print("features.shape:", features.shape)
print("targets.shape:", targets.shape)

HEADER: "Time","V1","V2","V3","V4","V5","V6","V7","V8","V9","V10","V11","V12","V13","V14","V15","V16","V17","V18","V19","V20","V21","V22","V23","V24","V25","V26","V27","V28","Amount","Class"
EXAMPLE FEATURES: [0.0, -1.3598071336738, -0.0727811733098497, 2.53634673796914, 1.37815522427443, -0.338320769942518, 0.462387777762292, 0.239598554061257, 0.0986979012610507, 0.363786969611213, 0.0907941719789316, -0.551599533260813, -0.617800855762348, -0.991389847235408, -0.311169353699879, 1.46817697209427, -0.470400525259478, 0.207971241929242, 0.0257905801985591, 0.403992960255733, 0.251412098239705, -0.018306777944153, 0.277837575558899, -0.110473910188767, 0.0669280749146731, 0.128539358273528, -0.189114843888824, 0.133558376740387, -0.0210530534538215, 149.62]
features.shape: (284807, 30)
targets.shape: (284807, 1)
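
The manual split(",") and replace('"', "") above work because no field in this file contains a comma. A slightly more defensive variant, sketched here with the standard csv module, lets the reader handle the quoting. The three-column sample string is invented for the example and is not taken from the real file.

```python
import csv
import io

# Hypothetical miniature of the file: quoted header, last column is the label.
sample = '"Time","V1","Amount","Class"\n0,-1.36,149.62,"0"\n406,-2.31,0.0,"1"\n'

features, targets = [], []
reader = csv.reader(io.StringIO(sample))
header = next(reader)  # csv.reader strips the surrounding quotes
for row in reader:
    features.append([float(v) for v in row[:-1]])
    targets.append([int(row[-1])])

print("HEADER:", header)
print("features:", features)
print("targets:", targets)
```

With the real file, io.StringIO(sample) would be replaced by an open file handle, e.g. open(fname, newline="").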


In [6]:
#Prepare a validation set
num_val_samples = int(len(features) * 0.2)
train_features = features[:-num_val_samples]
train_targets = targets[:-num_val_samples]
val_features = features[-num_val_samples:]
val_targets = targets[-num_val_samples:]

print("Number of training samples:", len(train_features))
print("Number of validation samples:", len(val_features))

Number of training samples: 227846
Number of validation samples: 56961


In [7]:
#Analyze class imbalance in the targets
counts = np.bincount(train_targets[:, 0])
print(
    "Number of positive samples in training data: {} ({:.2f}% of total)".format(
        counts[1], 100 * float(counts[1]) / len(train_targets)
    )
)

weight_for_0 = 1.0 / counts[0]
weight_for_1 = 1.0 / counts[1]

Number of positive samples in training data: 417 (0.18% of total)
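
To see what these inverse-frequency weights do, the arithmetic can be worked through with the counts printed above (417 positives out of 227846 training samples). A small sketch: each class ends up contributing the same total weight, and one fraud sample counts about as much as 545 legitimate ones.

```python
num_train = 227846
num_pos = 417
num_neg = num_train - num_pos  # 227429

weight_for_0 = 1.0 / num_neg
weight_for_1 = 1.0 / num_pos

# Summed over all samples of a class, the weights add up to 1 for each class.
total_weight_neg = num_neg * weight_for_0
total_weight_pos = num_pos * weight_for_1

# How many negative samples one positive sample is worth.
ratio = weight_for_1 / weight_for_0
print(f"{ratio:.1f}")  # 545.4
```

Because both weights are far below 1, the weighted training loss is scaled down as well, which would be consistent with the very small training loss values in the log further down.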


In [None]:
#Normalize the data using training set statistics
mean = np.mean(train_features, axis=0)
train_features -= mean
val_features -= mean
std = np.std(train_features, axis=0)
train_features /= std
val_features /= std
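
The point of using training-set statistics is that the validation data is scaled with the same mean and standard deviation, not with its own. Below is a tiny single-column sketch in plain Python with made-up numbers; pstdev is the population standard deviation, which matches the default of np.std.

```python
from statistics import fmean, pstdev

# Made-up values for a single feature column.
train_amounts = [10.0, 20.0, 30.0, 40.0]
val_amounts = [25.0, 100.0]

mean = fmean(train_amounts)   # computed on the training data only
std = pstdev(train_amounts)   # population std, like np.std's default

train_scaled = [(v - mean) / std for v in train_amounts]
val_scaled = [(v - mean) / std for v in val_amounts]  # reuses the train stats

print(train_scaled)
print(val_scaled)
```

After scaling, the training column has mean 0 and standard deviation 1; the validation column generally does not, which is expected.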

In [11]:
print(train_features.shape[-1], train_features.shape[0], train_features.shape[1])

30 227846 30


In [8]:
#Build a binary classification model
from tensorflow import keras

model = keras.Sequential(
    [
        keras.layers.Dense(
            256, activation="relu", input_shape=(train_features.shape[-1],)
        ),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dropout(0.3),
        keras.layers.Dense(1, activation="sigmoid"),
    ]
)
model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 256)               7936      
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dropout (Dropout)            (None, 256)               0         
_________________________________________________________________
dense_2 (Dense)              (None, 256)               65792     
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_3 (Dense)              (None, 1)                 257       
=================================================================
Total params: 139,777
Trainable params: 139,777
Non-trainable params: 0
_________________________________________________________________

In [None]:
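# Parameter counts in the summary above:
# 7936  = 30 * 256 (weights) + 256 (biases)   -> dense
# 65792 = 256 * 256 (weights) + 256 (biases)  -> dense_1, dense_2
# 257   = 256 * 1 (weights) + 1 (bias)        -> dense_3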
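
The same arithmetic can be checked for every layer at once. A short sketch; dense_params is a helper name chosen for this example, and the layer sizes are read off the model definition above.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for dense, dense_1, dense_2, dense_3; Dropout has no parameters.
layer_sizes = [(30, 256), (256, 256), (256, 256), (256, 1)]
per_layer = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]

print(per_layer)       # [7936, 65792, 65792, 257]
print(sum(per_layer))  # 139777
```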

In [15]:
#Train the model with class_weight argument
metrics = [
    keras.metrics.FalseNegatives(name="fn"),
    keras.metrics.FalsePositives(name="fp"),
    keras.metrics.TrueNegatives(name="tn"),
    keras.metrics.TruePositives(name="tp"),
    keras.metrics.Precision(name="precision"),
    keras.metrics.Recall(name="recall"),
]

model.compile(
    optimizer=keras.optimizers.Adam(1e-2), loss="binary_crossentropy", metrics=metrics
)

callbacks = [keras.callbacks.ModelCheckpoint("fraud_model_at_epoch_{epoch}.h5")]
class_weight = {0: weight_for_0, 1: weight_for_1}

model.fit(
    train_features,
    train_targets,
    batch_size=2048,
    epochs=20,
    verbose=2,
    callbacks=callbacks,
    validation_data=(val_features, val_targets),
    class_weight=class_weight,
)

Epoch 1/20
112/112 - 2s - loss: 6.0948e-06 - fn: 256.0000 - fp: 85669.0000 - tn: 141760.0000 - tp: 161.0000 - precision: 0.0019 - recall: 0.3861 - val_loss: 0.6968 - val_fn: 0.0000e+00 - val_fp: 56886.0000 - val_tn: 0.0000e+00 - val_tp: 75.0000 - val_precision: 0.0013 - val_recall: 1.0000
Epoch 2/20
112/112 - 2s - loss: 6.0928e-06 - fn: 280.0000 - fp: 74315.0000 - tn: 153114.0000 - tp: 137.0000 - precision: 0.0018 - recall: 0.3285 - val_loss: 0.7231 - val_fn: 0.0000e+00 - val_fp: 56886.0000 - val_tn: 0.0000e+00 - val_tp: 75.0000 - val_precision: 0.0013 - val_recall: 1.0000
Epoch 3/20
112/112 - 2s - loss: 6.1002e-06 - fn: 169.0000 - fp: 142405.0000 - tn: 85024.0000 - tp: 248.0000 - precision: 0.0017 - recall: 0.5947 - val_loss: 0.7257 - val_fn: 0.0000e+00 - val_fp: 56886.0000 - val_tn: 0.0000e+00 - val_tp: 75.0000 - val_precision: 0.0013 - val_recall: 1.0000
Epoch 4/20
112/112 - 2s - loss: 6.0897e-06 - fn: 143.0000 - fp: 149985.0000 - tn: 77444.0000 - tp: 274.0000 - precision: 0.0018 - 

<tensorflow.python.keras.callbacks.History at 0x657bc83d0>
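
The precision and recall columns in the log follow directly from the tp, fp and fn counts. As a quick check, this sketch recomputes them for epoch 1 from the numbers printed above; precision_recall is a helper written for this example.

```python
def precision_recall(tp, fp, fn):
    """Precision = tp / (tp + fp); recall = tp / (tp + fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Epoch 1, training counts from the log.
train_p, train_r = precision_recall(tp=161, fp=85669, fn=256)
# Epoch 1, validation counts from the log.
val_p, val_r = precision_recall(tp=75, fp=56886, fn=0)

print(f"train: precision={train_p:.4f} recall={train_r:.4f}")  # 0.0019 0.3861
print(f"val:   precision={val_p:.4f} recall={val_r:.4f}")      # 0.0013 1.0000
```

Note that val_tp + val_fp = 75 + 56886 = 56961, the full validation set: in these epochs every validation transaction is being flagged as fraud, which is why recall is 1.0 while precision is close to zero.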

In [None]:
Conclusions
At the end of training, out of 56,961 validation transactions, we are:

Correctly identifying 66 of them as fraudulent
Missing 9 fraudulent transactions
At the cost of incorrectly flagging 441 legitimate transactions

In the real world, one would put an even higher weight on class 1, so as to reflect that false negatives are more costly than false positives.

Next time your credit card gets declined in an online purchase -- this is why.
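
Expressed as rates, the counts quoted in the conclusions correspond to a recall of 0.88 and a precision of about 0.13. The sketch below works that out and then shows one possible way to weight class 1 more heavily; fn_cost_factor is a hypothetical knob introduced here for illustration, and its value of 10 is arbitrary.

```python
# Counts quoted in the conclusions above.
tp, fn, fp = 66, 9, 441

recall = tp / (tp + fn)      # share of fraud cases that were caught
precision = tp / (tp + fp)   # share of flagged transactions that were fraud
print(f"recall={recall:.2f} precision={precision:.2f}")  # recall=0.88 precision=0.13

# Training-set class counts from the imbalance analysis (227429 negatives, 417 positives).
counts = {0: 227429, 1: 417}

# Hypothetical: treat a missed fraud as 10x more costly than a false alarm.
fn_cost_factor = 10.0
class_weight = {0: 1.0 / counts[0], 1: fn_cost_factor / counts[1]}
print(class_weight)
```

A dictionary like this could be passed as the class_weight argument of model.fit in place of the balanced one; how large the factor should be depends on the real cost of each kind of error.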
