### Log loss for binary classification
 - Actual value: $y_i \in \{0, 1\}$ for sample $i$ (1 = yes, 0 = no)
 - Prediction: $p_i$, the predicted probability that $y_i = 1$
 - $N$: the number of samples

$$\text{logloss} = -\frac{1}{N}\sum_{i=1}^{N} \left[ y_i \log(p_i) + (1-y_i)\log(1-p_i) \right]$$
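To see why log loss punishes confident mistakes, it helps to evaluate single terms of the sum by hand. This is a minimal sketch that assumes natural logarithms (matching `np.log`, which the notebook's code uses); `term_loss` is a hypothetical helper name.

```python
import math


def term_loss(y, p):
    """Per-sample log loss: -(y*log(p) + (1-y)*log(1-p)). Hypothetical helper."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


# Same predicted probability p = 0.9, two different true labels
confident_right = term_loss(1, 0.9)  # -log(0.9), about 0.105
confident_wrong = term_loss(0, 0.9)  # -log(0.1), about 2.303

# Averaging over N = 2 samples gives the log loss of the pair
pair_loss = (confident_right + confident_wrong) / 2

print(f"{confident_right:.3f} {confident_wrong:.3f} {pair_loss:.3f}")  # 0.105 2.303 1.204
```

The confidently wrong prediction costs about 20 times as much as the confidently right one, so a single such mistake dominates the average.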

In [None]:
import numpy as np


def compute_log_loss(predicted, actual, eps=1e-14):
    """Computes the logarithmic loss between predicted and actual
    when these are 1D arrays.

    :param predicted: The predicted probabilities as floats between 0 and 1
    :param actual: The actual binary labels, each either 0 or 1
    :param eps (optional): log(0) is -inf, so predicted values are clipped
        to [eps, 1 - eps] to keep them slightly away from 0 and 1
    """
    # Convert inputs to float arrays so plain lists also work
    predicted = np.clip(np.asarray(predicted, dtype=float), eps, 1 - eps)
    actual = np.asarray(actual, dtype=float)
    # Negated mean of the per-sample terms from the formula above
    loss = -1 * np.mean(actual * np.log(predicted) + (1 - actual) * np.log(1 - predicted))

    return loss
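The `eps` parameter matters because a prediction of exactly 0 for a true label of 1 makes the loss infinite. A short sketch, using toy arrays chosen only for illustration, that compares the loss on one completely wrong prediction with and without clipping:

```python
import numpy as np

eps = 1e-14
actual = np.array([1.0])
predicted = np.array([0.0])  # a confident, completely wrong prediction

# Without clipping, log(0) evaluates to -inf and the loss becomes inf
with np.errstate(divide="ignore"):
    raw_loss = -np.mean(actual * np.log(predicted)
                        + (1 - actual) * np.log(1 - predicted))

# Clipping to [eps, 1 - eps] caps the per-sample loss at -log(eps)
clipped = np.clip(predicted, eps, 1 - eps)
clipped_loss = -np.mean(actual * np.log(clipped)
                        + (1 - actual) * np.log(1 - clipped))

print(raw_loss, clipped_loss)
```

The clipped loss equals $-\log(10^{-14}) \approx 32.24$: one bad prediction still weighs heavily on the average, but it no longer makes the average infinite.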

### Combining text columns for tokenization
 - To get a bag-of-words representation of all the text data in the DataFrame, you must first combine the text in each row into a single string.

 - The function below converts all text data in the DataFrame into one string per row. The result can be passed to a vectorizer object, whose .fit_transform() method turns it into a bag-of-words representation.

 - The function uses NUMERIC_COLUMNS and LABELS to decide which columns to drop. These lists must already be loaded into the workspace before the function is defined, because they form the default value of `to_drop`.
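Only columns that actually exist in the DataFrame can be dropped safely. The following sketch shows why: by default, pandas' `DataFrame.drop` raises `KeyError` for a missing label. The column names here are hypothetical stand-ins for the entries of NUMERIC_COLUMNS and LABELS.

```python
import pandas as pd

# Hypothetical toy frame: one text column and one numeric column
df = pd.DataFrame({"Description": ["office supplies", None], "Total": [12.5, 3.0]})

# Hypothetical drop list naming a column that df does not have
to_drop = ["Total", "Missing_Label"]

# DataFrame.drop raises KeyError by default when a label is missing
raised = False
try:
    df.drop(columns=to_drop)
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Keeping only the labels that exist avoids the error...
present = [col for col in to_drop if col in df.columns]
print(df.drop(columns=present).columns.tolist())  # ['Description']

# ...as does telling pandas to skip missing labels
print(df.drop(columns=to_drop, errors="ignore").columns.tolist())  # ['Description']
```

Intersecting with the existing columns, as the function does, and passing `errors="ignore"` are two ways of expressing the same intent.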

In [None]:
# Define combine_text_columns()
# NOTE: NUMERIC_COLUMNS and LABELS must already exist, because the
# default argument is evaluated when the function is defined
def combine_text_columns(data_frame, to_drop=NUMERIC_COLUMNS + LABELS):
    """Converts all text in each row of data_frame to a single string."""

    # Drop only the non-text columns that are present in the DataFrame
    to_drop = set(to_drop) & set(data_frame.columns)
    text_data = data_frame.drop(columns=list(to_drop))

    # Replace NaNs with empty strings
    text_data = text_data.fillna("")

    # Join all text items in a row with a space between them
    # (astype(str) lets any non-string values be joined too)
    return text_data.apply(lambda row: " ".join(row.astype(str)), axis=1)
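After combining, each row is one string that a vectorizer can tokenize and count. The sketch below uses `collections.Counter` as a simplified stand-in for a vectorizer's .fit_transform(). It is not scikit-learn's API: it splits on whitespace, whereas scikit-learn's `CountVectorizer` by default uses a regex token pattern that keeps tokens of two or more word characters. The column names are hypothetical.

```python
from collections import Counter

import pandas as pd

# Hypothetical toy frame with two text columns and a missing value
text_data = pd.DataFrame({
    "Object_Description": ["printer paper", "teacher salary"],
    "Program_Description": [None, "math instruction"],
})

# Same steps as combine_text_columns(): blank out NaNs, join each row
combined = text_data.fillna("").apply(lambda row: " ".join(row), axis=1)

# Bag-of-words stand-in: lowercase, split on whitespace, count tokens
bags = [Counter(doc.lower().split()) for doc in combined]
vocabulary = sorted(set().union(*bags))
matrix = [[bag[word] for word in vocabulary] for bag in bags]

print(vocabulary)
print(matrix)
```

Each row of `matrix` is one document's bag of words: the count of every vocabulary term in that row's combined string. The blank cell in the first row only adds a trailing space, which `split()` ignores.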

In [None]:
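def add_zeros(string):
    """Returns string right-padded with zeros to a length of at least 6."""
    updated_string = string

    def add_more():
        """Appends one zero to the enclosing function's updated_string."""
        # nonlocal lets the inner function rebind the outer variable
        nonlocal updated_string
        updated_string = updated_string + '0'

    # Keep appending zeros until the target length of 6 is reached
    while len(updated_string) < 6:
        add_more()
    return updated_string

# test: add_zeros('5.6') -> '5.6000', add_zeros('45.67') -> '45.670'
print(add_zeros('5.6'), add_zeros('45.67'))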
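The nested helper in `add_zeros` mainly illustrates `nonlocal`. For strings shorter than six characters, Python's built-in `str.ljust` produces the same right-padding in a single call. A sketch with a hypothetical helper name, `pad_with_zeros`:

```python
def pad_with_zeros(string, width=6):
    """Right-pads string with '0' up to width characters (hypothetical helper)."""
    # str.ljust pads on the right with the fill character; strings that are
    # already at least width characters long are returned unchanged
    return string.ljust(width, "0")


print(pad_with_zeros("5.6"), pad_with_zeros("45.67"))  # 5.6000 45.670
```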