<a href="https://colab.research.google.com/github/marquesarthur/vanilla-bert-vs-huggingface/blob/main/vanilla_keras_bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Based on https://colab.research.google.com/drive/14b2rbIgwhQ1BI-zkyiMjQv-jV85xj9tf#scrollTo=5qSd2lLwJ7lH


**Problem statement:**


*   A developer has to inspect an **artifact X**.
*   Within the artifact, only a portion of the text is relevant to an **input task Y**.
*   We want a model that relates Y to each sentence x ∈ X.
*   **The model must determine whether x is relevant to task Y.**



<br>

___

*Example of a task and an annotated artifact:*

<br>

[<img src="https://i.imgur.com/Zj1317H.jpg">](https://i.imgur.com/Zj1317H.jpg)




* The coloured sentences are those annotated as relevant to the input task.
* The warmer the colour, the more annotators selected that portion of the text.
* For simplicity, we process the data and use individual sentences as the unit of analysis.

<br>

___

*Ultimately, our data is a tuple representing:*


*   **text** = artifact sentence

*   **question** = task description

*   **source** = URL of the artifact

*   **category_index** = whether the sentence is relevant (1) or not (0) to the input task

*   **weights** = number of participants who annotated the sentence as relevant
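
As a rough sketch of what one such tuple looks like in code: the field names come from the list above, while the `Record` class itself is a hypothetical name used only for illustration (the notebook works with plain dictionaries). The values mirror the sample entry printed further down in the notebook.

```python
from dataclasses import dataclass, asdict


@dataclass
class Record:  # hypothetical container, for illustration only
    text: str            # artifact sentence
    question: str        # task description
    source: str          # URL of the artifact
    category_index: int  # 1 if the sentence is relevant to the task, 0 otherwise
    weights: int         # number of participants who marked the sentence as relevant


sample = Record(
    text="public class ArrayAdapter extends BaseAdapter implements Filterable, ThemedSpinnerAdapter",
    question="Explanation of the getView() method of an ArrayAdapter",
    source="https://developer.android.com/reference/android/widget/ArrayAdapter",
    category_index=0,
    weights=0,
)
print(asdict(sample))
```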


<br>

___



In [1]:
## The setup cells below are commented out; uncomment them only when running on Colab

In [2]:
# @title Install dependencies

# !pip install -q keras-bert==0.85.0 keras-rectified-adam==0.15.0
# !pip install -q keras-bert keras-rectified-adam
# %tensorflow_version 1.x

In [3]:
# !pip install -q scikit-learn tqdm pandas python-Levenshtein path colorama

In [4]:
# @title Download git repo
# !git clone https://github.com/marquesarthur/vanilla-bert-vs-huggingface.git

In [5]:
# %cd vanilla-bert-vs-huggingface
# !git pull
# !ls -l

In [6]:
# @title Download BERT model
# !wget -q https://storage.googleapis.com/bert_models/2018_10_18/uncased_L-12_H-768_A-12.zip
# !unzip -o uncased_L-12_H-768_A-12.zip

In [7]:
# @title Import data as JSON
import itertools
import json
import logging
import os
import sys
import random
from pathlib import Path

from Levenshtein import ratio
from colorama import Fore, Style

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)

from ds_android import get_input_for_BERT

raw_data = get_input_for_BERT()

print('Sample entry from data:')
print(json.dumps(raw_data[0], indent=4, sort_keys=True))

[31m5 [33m47 [0m https://developer.android.com/reference/android/widget/ArrayAdapter
[31m9 [33m21 [0m https://stackoverflow.com/questions/6442054
[31m3 [33m22 [0m https://github.com/nostra13/Android-Universal-Image-Loader/issues/462
[31m22 [33m211 [0m https://www.raywenderlich.com/155-android-listview-tutorial-with-kotlin
[31m21 [33m59 [0m https://guides.codepath.com/android/Using-an-ArrayAdapter-with-ListView
[31m6 [33m33 [0m https://github.com/realm/realm-java/issues/776
[31m9 [33m15 [0m https://developer.android.com/training/volley/request
[31m14 [33m65 [0m https://stackoverflow.com/questions/28504524
[31m20 [33m59 [0m https://medium.com/@JasonCromer/android-asynctask-http-request-tutorial-6b429d833e28
[31m5 [33m97 [0m https://www.twilio.com/blog/5-ways-to-make-http-requests-in-java
[31m17 [33m33 [0m https://developer.android.com/guide/navigation/navigation-custom-back
[31m6 [33m55 [0m https://stackoverflow.com/questions/10108774
[31m5 [33m470 

[31m22 [33m104 [0m https://developer.android.com/reference/org/json/JSONObject
[31m8 [33m31 [0m https://guides.codepath.com/android/converting-json-to-models
[31m5 [33m34 [0m https://developer.android.com/guide/topics/media-apps/volume-and-earphones
[31m4 [33m40 [0m https://developer.android.com/training/gestures/scale
[31m6 [33m32 [0m https://stackoverflow.com/questions/10630373
Sample entry from data:
{
    "category_index": 0,
    "question": "Explanation of the getView() method of an ArrayAdapter",
    "source": "https://developer.android.com/reference/android/widget/ArrayAdapter",
    "text": "public class ArrayAdapter extends BaseAdapter implements Filterable, ThemedSpinnerAdapter",
    "weights": 0
}


In [8]:
from collections import Counter, defaultdict

cnt = Counter([d['category_index'] for d in raw_data])

total = sum(cnt.values())

labels_cnt = [cnt[0] / float(total), cnt[1] / float(total)]
print('label distribution')
print('')
print('not-relevant -- {:.0f}%'.format(labels_cnt[0] * 100))
print('RELEVANT ------ {:.0f}%'.format(labels_cnt[1] * 100))

label distribution

not-relevant -- 88%
RELEVANT ------ 12%


In [9]:
# @title Set environment variables

import os
import codecs
import contextlib
import json
import math

import numpy as np
import pandas as pd
import tensorflow as tf

from collections import defaultdict, Counter
from tqdm import tqdm

USE_TPU = False
os.environ['TF_KERAS'] = '1'

# # @title Initialize TPU Strategy
if USE_TPU:
    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)
    tf.contrib.distribute.initialize_tpu_system(resolver)
    strategy = tf.contrib.distribute.TPUStrategy(resolver)

# sklearn libs
import sklearn
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score
from sklearn.metrics import precision_recall_fscore_support

# Tensorflow Imports
from tensorflow import keras
import tensorflow.keras.backend as K
from tensorflow.keras.optimizers import Adam


# Keras-bert imports (TF_KERAS must already be set, as done above, so keras-bert uses tf.keras)
from keras_bert import Tokenizer
from keras_bert import get_custom_objects
from keras_bert import load_trained_model_from_checkpoint

# Bert Model Constants
SEQ_LEN = 128
BATCH_SIZE = 32 # larger batch size causes OOM errors
EPOCHS = 3
LR = 2e-5

pretrained_path = 'uncased_L-12_H-768_A-12'
config_path = os.path.join(pretrained_path, 'bert_config.json')
checkpoint_path = os.path.join(pretrained_path, 'bert_model.ckpt')
vocab_path = os.path.join(pretrained_path, 'vocab.txt')

Creating converter from 7 to 5
Creating converter from 5 to 7
Creating converter from 7 to 5
Creating converter from 5 to 7


In [10]:
# @title Initialize Variables

sess = K.get_session()
uninitialized_variables = set([i.decode('ascii') for i in sess.run(tf.report_uninitialized_variables())])
init_op = tf.variables_initializer(
    [v for v in tf.global_variables() if v.name.split(':')[0] in uninitialized_variables]
)
sess.run(init_op)

In [11]:
# @title JSON to dataframe helper functions
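def undersample_df(df, n_times=3):
    """Keep every relevant row plus at most n_times as many non-relevant rows."""
    c0 = df[df['category_index'] == 0]
    c1 = df[df['category_index'] == 1]
    # Count by label: value_counts() orders by frequency, not by label, so
    # unpacking it positionally is only correct while class 0 is the majority.
    # The min() guard avoids sampling more rows than class 0 contains.
    n_keep = min(len(c0), int(n_times * len(c1)))
    df_0 = c0.sample(n_keep)

    undersampled_df = pd.concat([df_0, c1], axis=0)
    return undersampled_df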

def get_ds_synthetic_data(min_w=3):
    short_task = {
          "bugzilla": """How to query bugs using the custom fields with the Bugzilla REST API?""",
          "databases": """Which technology should be adopted for the database layer abstraction: Object/Relational Mapping (ORM) or a Java Database Connectivity API (JDBC)?""",
          "gpmdpu": """Can I bind the cmd key to the GPMDPU shortcuts?""",
          "lucene": """How does Lucene compute similarity scores for the BM25 similarity?""",
          "networking": """Which technology should be adopted for the notification system, Server-Sent Events (SSE) or WebSockets?""",
    }

    with open('relevance_corpus.json') as ipf:
        aux = json.load(ipf)
        raw_data = defaultdict(list)
        for d in aux:
            if d['task'] == 'yargs':
                continue

            raw_data['text'].append(d['text'])
            raw_data['question'].append(short_task[d['task']])
            raw_data['source'].append(d['source'])
            raw_data['category_index'].append(1 if d['weight'] > min_w else 0)
            raw_data['weights'].append(d['weight'] if d['weight'] > min_w else 0)

        data = pd.DataFrame.from_dict(raw_data)
        data = undersample_df(data, n_times=1)
        data = data.sample(frac=1).reset_index(drop=True)

    return data

def get_class_weights(y, smooth_factor=0, upper_bound=5.0):
    """
    Returns the weight for each class based on the frequency of its samples
    :param y: list of true labels (the labels must be hashable)
    :param smooth_factor: factor that smooths extremely uneven weights
    :param upper_bound: maximum weight any class can receive
    :return: dictionary with the weight for each class
    """
    counter = Counter(y)

    if smooth_factor > 0:
        p = max(counter.values()) * smooth_factor
        for k in counter.keys():
            counter[k] += p

    majority = max(counter.values())

    clazz = {cls: float(majority / count) for cls, count in counter.items()}
    result = {}
    for key, value in clazz.items():
        if value > upper_bound:
            value = upper_bound
        
        result[key] = value
    return result

def add_raw_data(result, data):
    result['text'].append(data['text'])
    result['question'].append(data['question'])
    result['source'].append(data['source'])
    result['category_index'].append(data['category_index'])
    result['weights'].append(data['weights'])
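
To make the weighting rule in `get_class_weights` concrete, here is a small stand-alone sketch. It assumes the default `smooth_factor=0` (so smoothing is left out), and `capped_class_weights` is a hypothetical name rather than the notebook's function. The first set of counts is the train plus validation label counts printed for fold 0 later in the notebook; the second set is made up to show the cap.

```python
from collections import Counter


def capped_class_weights(labels, upper_bound=5.0):
    """Weight each class by majority_count / class_count, capped at upper_bound."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: min(majority / n, upper_bound) for cls, n in counts.items()}


# Fold 0 label counts for train + validation: 2645 + 294 and 991 + 110.
fold_labels = [0] * 2939 + [1] * 1101
fold_weights = capped_class_weights(fold_labels)
print(fold_weights)

# A more skewed, made-up split: 88 / 12 is about 7.3, so the cap of 5.0 applies.
skewed_weights = capped_class_weights([0] * 88 + [1] * 12)
print(skewed_weights)
```

The fold 0 ratio works out to roughly 2.67, which agrees with the weight logged for class 1 in that fold.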


In [12]:
# @title Tokenizer

token_dict = {}
with codecs.open(vocab_path, 'r', 'utf8') as reader:
    for line in reader:
        token = line.strip()
        token_dict[token] = len(token_dict)

tokenizer = Tokenizer(token_dict)

# FIXME: global variable that is referenced inside the train/test functions...
model = None

In [13]:
# @title data encoder

def encode_data(df, tokenizer, over_sampling=1, testing=False):
    relevant = 1
    indices, segments, labels, metadata = [], [], [], []
    
    for index, row in df.iterrows():
        _ids, _segments = tokenizer.encode(
            first=row["text"], 
            second=row["question"], 
            max_len=SEQ_LEN
        )
        
        label = row["category_index"]
        if label == relevant:
            for _ in range(over_sampling):
                indices.append(_ids)
                segments.append(_segments)
                labels.append(label)
                metadata.append((row['weights'], row['text'], row["question"]))
        else:
            indices.append(_ids)
            segments.append(_segments)
            labels.append(label)
            metadata.append((row['weights'], row['text'], row["question"]))
        
    # zip data into single list, shuffle everything and decompress
    items = list(zip(indices, segments, labels, metadata))
    np.random.shuffle(items)
    indices, segments, labels, metadata = zip(*items)
    indices = np.array(indices)
    
    # Outside of testing, drop the trailing entries so the number of examples is divisible by the batch size
    mod = indices.shape[0] % BATCH_SIZE
    if mod > 0 and not testing:
        indices, segments, labels, metadata = indices[:-mod], segments[:-mod], labels[:-mod], metadata[:-mod]
    
    X, y = [indices, np.array(segments)], np.array(labels)

    return X, y, metadata
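
`encode_data` passes each sentence and its task description to the tokenizer as a pair. The sketch below illustrates the usual BERT sentence-pair layout that such an encoding follows. It is a toy under stated assumptions: `pack_pair` is a hypothetical helper (not part of keras-bert), it works on whitespace tokens instead of WordPiece ids, and it raises instead of truncating long pairs.

```python
def pack_pair(first_tokens, second_tokens, max_len):
    """Lay out a sentence pair as [CLS] first [SEP] second [SEP] plus padding."""
    tokens = ["[CLS]"] + first_tokens + ["[SEP]"] + second_tokens + ["[SEP]"]
    if len(tokens) > max_len:
        raise ValueError("pair is longer than max_len; a real tokenizer would truncate")
    # Segment 0 covers [CLS], the first sentence and its [SEP];
    # segment 1 covers the second sentence and the final [SEP].
    segments = [0] * (len(first_tokens) + 2) + [1] * (len(second_tokens) + 1)
    padding = max_len - len(tokens)
    return tokens + ["[PAD]"] * padding, segments + [0] * padding


tokens, segments = pack_pair(
    first_tokens="replace getitemcount with this".split(),  # artifact sentence
    second_tokens="jsonobject parse".split(),                # task description
    max_len=12,
)
print(tokens)
print(segments)
```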

In [14]:
# @title Metrics & Logging functions

from sklearn.metrics import classification_report

recommendation_metrics = defaultdict(list)
prediction_metrics = defaultdict(list)

classification_report_lst = []
log_examples_lst = []

def aggregate_macro_metrics(store_at, precision, recall, fscore):   
    store_at['precision'].append(precision)
    store_at['recall'].append(recall)
    store_at['fscore'].append(fscore)

def aggregate_recommendation_metrics(store_at, k, precision_at_k, pyramid_precision_at_k):
    store_at['k'].append(k)
    store_at['precision'].append(precision_at_k)
    store_at['∆ precision'].append(pyramid_precision_at_k)

def log_examples(task_title, source, text, pweights, y_predict, y_probs, k=10):
    # get the predicted prob at every index
    idx_probs = [(idx, y_predict[idx], y_probs[idx]) for idx, _ in enumerate(y_predict)]
    
    # keep only the indexes predicted as relevant
    idx_probs = [p for p in idx_probs if p[1] == 1]
    
    most_probable = sorted(idx_probs, key=lambda i: i[2], reverse=True)
    
    result = [idx for idx, _, _ in most_probable][:k]
    
    for idx in result:
        log_examples_lst.append((
            source, 
            task_title,
            pweights[idx],
            y_predict[idx],
            y_probs[idx],
            text[idx]
        ))

def _precision_at_k(y_test, y_predict, y_prob, k=10):
    # get the predicted prob at every index
    idx_probs = [(idx, y_predict[idx], y_prob[idx]) for idx, _ in enumerate(y_test)]
    
    # keep only the indexes predicted as relevant
    idx_probs = [p for p in idx_probs if p[1] == 1]
    
    most_probable = sorted(idx_probs, key=lambda i: i[2], reverse=True)
    result = [y_test[idx] * y_predict[idx] for idx, _, _ in most_probable]   
    y_predict = [y for _, y, _ in most_probable]
    
    result = result[:k]
    y_predict = y_predict[:k]
    ratio = sum(result) / float(len(y_predict) + 0.00001)
    return ratio     


def _pyramid_score(y_optimal, y_predicted, y_prob, k=10):

    # create reference table for weights 
    # y_predicted = [i for i in y_optimal]
    # get the predicted prob at every index
    idx_probs = [(idx, y_optimal[idx], y_predicted[idx], y_prob[idx]) for idx, _ in enumerate(y_optimal)]
    
    # filter probs for all indexes predicted as relevant  
    idx_probs = list(filter(lambda aux: aux[2] == 1, idx_probs))

    # sort
    most_probable = sorted(idx_probs, key=lambda i: i[3], reverse=True)

    # compute predicted and optimal score up until K
    predicted_score = [w for _, w, _, _ in most_probable][:k]
    optimal_score = sorted(y_optimal, reverse=True)[:k]
    
    ratio = sum(predicted_score) / float(sum(optimal_score) + 0.00001)
    return ratio           
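
A small worked example may help when reading the two ranking metrics above. This is a simplified sketch, not the notebook's code: the function names are hypothetical, the data is made up, and explicit empty checks replace the `0.00001` term used above to avoid division by zero.

```python
def top_k_predicted(y_pred, y_prob, k):
    """Indexes predicted as relevant, ordered by predicted probability, cut at k."""
    predicted = [i for i, label in enumerate(y_pred) if label == 1]
    return sorted(predicted, key=lambda i: y_prob[i], reverse=True)[:k]


def precision_at_k(y_true, y_pred, y_prob, k):
    """Share of the top-k predicted sentences that are truly relevant."""
    top = top_k_predicted(y_pred, y_prob, k)
    return sum(y_true[i] for i in top) / len(top) if top else 0.0


def pyramid_at_k(weights, y_pred, y_prob, k):
    """Annotator weight captured by the top-k predictions, relative to the best possible k."""
    top = top_k_predicted(y_pred, y_prob, k)
    best = sum(sorted(weights, reverse=True)[:k])
    return sum(weights[i] for i in top) / best if best else 0.0


# Five made-up sentences: true labels, annotator weights, predictions, probabilities.
y_true = [1, 0, 1, 0, 1]
weights = [3, 0, 2, 0, 1]
y_pred = [1, 1, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.2, 0.7]

p_at_2 = precision_at_k(y_true, y_pred, y_prob, k=2)   # top-2 are sentences 0 and 1
pyr_at_2 = pyramid_at_k(weights, y_pred, y_prob, k=2)  # (3 + 0) / (3 + 2)
print(p_at_2, pyr_at_2)
```

With k=2, one of the two top-ranked sentences is relevant, so precision is 0.5; the pyramid score is 3/5 because the model missed the sentence with weight 2.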

In [15]:
#@title Training procedures

def get_train_val_test(task_uid, size=0.9, undersample=False, aug=True, undersample_n=3):
    if not isinstance(task_uid, list):
        task_uid = [task_uid]
        
    train_data_raw = defaultdict(list)
    test_data_raw = defaultdict(list)
    
    for _data in tqdm(CORPUS):
        if _data['question'] in task_uid:
            add_raw_data(test_data_raw, _data)
        else:
            add_raw_data(train_data_raw, _data)
    
    train_val = pd.DataFrame.from_dict(train_data_raw)
    test = pd.DataFrame.from_dict(test_data_raw)
    
    # https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows
    #  randomize rows....    
    train_val = train_val.sample(frac=1).reset_index(drop=True)
    test = test.sample(frac=1).reset_index(drop=True)
    
    if undersample:
        train_val = undersample_df(train_val, n_times=undersample_n)
        train_val = train_val.sample(frac=1).reset_index(drop=True)
        
    if aug:
        train_val = pd.concat([train_val, get_ds_synthetic_data()],axis=0)
        train_val = train_val.sample(frac=1).reset_index(drop=True)
    
    weights = get_class_weights(train_val['category_index'].tolist())
    
    train, val = train_test_split(
        train_val, 
        stratify=train_val['category_index'].tolist(), 
        train_size=size
    )
    
    return train, val, test, weights        
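
The key property of `get_train_val_test` is that every sentence belonging to a held-out task goes to the test set, so the model is never trained on sentences for a task it is later tested on. A minimal stand-alone sketch of that split, assuming records are dictionaries with a `question` key as in the corpus (`split_by_task` and the toy records are hypothetical):

```python
def split_by_task(records, held_out_tasks):
    """Send every record of a held-out task to test; everything else to train."""
    held_out = set(held_out_tasks)
    train, test = [], []
    for record in records:
        (test if record["question"] in held_out else train).append(record)
    return train, test


toy_records = [
    {"question": "task A", "text": "sentence 1"},
    {"question": "task B", "text": "sentence 2"},
    {"question": "task A", "text": "sentence 3"},
    {"question": "task C", "text": "sentence 4"},
]

toy_train, toy_test = split_by_task(toy_records, held_out_tasks=["task A"])
print(len(toy_train), len(toy_test))
```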

In [16]:
# @title Testing procedures

def test_model(source, df_test, model, tokenizer):
    
    test_x, test_y, metadata = encode_data(df_test, tokenizer, testing=True)
    
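    # Log the number of test sentences (test_x is an [indices, segments] pair, so len(test_x) is always 2)
    logger.info(Fore.YELLOW + str(len(test_y)) + Style.RESET_ALL)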
    
    text = [m[1] for m in metadata]
    pweights = [m[0] for m in metadata]
    task_title = metadata[0][2]

    predicts = model.predict(test_x, verbose=True)
    
    y_probs = predicts[:, 1]
    y_predict = predicts.argmax(axis=-1)

    accuracy = accuracy_score(test_y, y_predict)
    macro_f1 = f1_score(test_y, y_predict, average='macro')
    
    classification_report_lst.append(classification_report(test_y, y_predict))

    logger.info("-" * 20)    
    
    logger.info("Y")
    logger.info("[0s] {} [1s] {}".format(
        len(list(filter(lambda k: k== 0, test_y))),
        len(list(filter(lambda k: k== 1, test_y)))
    ))
    
        
    logger.info("predicted")
    logger.info("[0s] {} [1s] {}".format(
        len(list(filter(lambda k: k== 0, y_predict))),
        len(list(filter(lambda k: k== 1, y_predict)))
    ))
    
    logger.info("-" * 20)
    
    logger.info("Accuracy: {:.4f}".format(accuracy))
    logger.info("macro_f1: {:.4f}".format(macro_f1))

    precision, recall, fscore, _ = precision_recall_fscore_support(test_y, y_predict, average='macro')
    
    aggregate_macro_metrics(prediction_metrics, precision, recall, fscore)
    
    logger.info("Precision: {:.4f}".format(precision))
    logger.info("Recall: {:.4f}".format(recall))
    logger.info("F1: {:.4f}".format(fscore))
    
    logger.info("-" * 20)
    
    for k in [3, 5, 10]:
        p_at_k = _precision_at_k(test_y, y_predict, y_probs, k=k)
        score_at_k = _pyramid_score(pweights, y_predict, y_probs, k=k)
                                     
        aggregate_recommendation_metrics(recommendation_metrics, k, p_at_k, score_at_k)
        
        logger.info("")
        logger.info("Precision_at_{}: {:.4f}".format(k, p_at_k))
        logger.info("Pyramid_at_{}: {:.4f}".format(k, score_at_k))
    logger.info("-" * 20)
    
    log_examples(task_title, source, text, pweights, y_predict, y_probs, k=5)

In [17]:
# @title 10-fold cross validation WIP
CORPUS = raw_data

all_tasks = sorted(list(set([d['question'] for d in raw_data])))
rseed = 20210343
random.seed(rseed)
random.shuffle(all_tasks)

from sklearn.model_selection import KFold

n_splits = 10
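# all_tasks was already shuffled above with a seeded RNG, so KFold can split in order.
# random_state only applies when shuffle=True; recent scikit-learn versions reject it otherwise.
kf = KFold(n_splits=n_splits)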
np_tasks_arr = np.array(all_tasks)

idx_split = 0
for train_index, test_index in kf.split(np_tasks_arr):    
    test_tasks_lst = np_tasks_arr[test_index].tolist()
    
    logger.info("")
    logger.info(Fore.RED + f"Fold {idx_split}" + Style.RESET_ALL)
    logger.info('\n'.join(test_tasks_lst))
    
    df_train, df_val, df_test, weights = get_train_val_test(test_tasks_lst, undersample=True, undersample_n=3) 
    
    logger.info('-' * 10)
    logger.info(Fore.RED + 'train'+ Style.RESET_ALL)
    logger.info(str(df_train.category_index.value_counts()))
    logger.info("")

    logger.info(Fore.RED + 'val'+ Style.RESET_ALL)
    logger.info(str(df_val.category_index.value_counts()))
    logger.info("")

    logger.info(Fore.RED + 'test'+ Style.RESET_ALL)
    logger.info(str(df_test.category_index.value_counts()))
    logger.info("")

    logger.info(Fore.RED + 'weights'+ Style.RESET_ALL)
    logger.info(str(weights))
    logger.info('-' * 10)
    
    train_x, train_y, _ = encode_data(df_train, tokenizer, over_sampling=1)
    val_x, val_y, _ = encode_data(df_val, tokenizer)
    

    model = load_trained_model_from_checkpoint(
      config_path,
      checkpoint_path,
      training=True,
      trainable=True,
      seq_len=SEQ_LEN
    )
    
    inputs = model.inputs[:2]
    dense = model.get_layer('NSP-Dense').output
    outputs = keras.layers.Dense(units=2, activation='softmax', name="probs")(dense)
    model = keras.models.Model(inputs, outputs)

    optimizer = Adam(lr=LR)

    
    model.compile(
      optimizer=optimizer,
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'],
    )
    
    
    logger.info("")
    logger.info(Fore.RED + f"Training model" + Style.RESET_ALL)
    history = model.fit(
        train_x,
        train_y,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        class_weight=weights,
        validation_data=(val_x, val_y)
    )
    
    logger.info("")
    logger.info(Fore.RED + f"Testing model" + Style.RESET_ALL)
    for source in df_test["source"].unique():
        df_source = df_test[df_test["source"] == source]   

        logger.info(source)
        test_model(source, df_source, model, tokenizer)
            
    idx_split += 1
    break  # WIP: stop after the first fold; remove this line to run all folds
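
Note that the folds above are built over task descriptions, not over individual sentences. As a stdlib-only sketch of how an unshuffled k-fold partition hands out contiguous chunks, the helper below assumes the same sizing rule as scikit-learn's `KFold` with `shuffle=False` (the first `n % k` folds receive one extra item). `contiguous_folds` and the task names are hypothetical.

```python
def contiguous_folds(items, n_splits):
    """Partition items, in order, into n_splits contiguous test folds."""
    base, extra = divmod(len(items), n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(items[start:start + size])
        start += size
    return folds


toy_tasks = [f"task {i}" for i in range(12)]
toy_folds = contiguous_folds(toy_tasks, n_splits=5)
for fold_id, held_out in enumerate(toy_folds):
    print(fold_id, held_out)
```

Each task appears in exactly one test fold, so across all folds every task is evaluated once by a model that never saw it during training.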


Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whether app is running in debug mode or not?
JSONObject parse dictionary objects
Want to add drawable icons insteadof colorful dots


100%|██████████| 7917/7917 [00:00<00:00, 449389.71it/s]


----------
train
0    2645
1     991
Name: category_index, dtype: int64

val
0    294
1    110
Name: category_index, dtype: int64

test
0    669
1     66
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.6693914623069936}
----------
From /home/msarthur/vanilla/lib/python3.7/site-packages/tensorflow_core/python/keras/initializers.py:119: calling RandomUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
From /home/msarthur/vanilla/lib/python3.7/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.

[31mTra

In [18]:
#@title Metrics report
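def avg_recommendation_metric_for(data, k=3, filter_outliers=True):
    """Average precision@k and pyramid@k; with filter_outliers, zero scores are left out of the mean."""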
    __precision = []
    __pyramid = []
    
    total_len = len(data['k'])
    
    for idx in range(total_len):
        
        __value = data['k'][idx]
        if __value  == k:
            if filter_outliers:            
                if data['precision'][idx] > 0.:
                    __precision.append(data['precision'][idx])
                if data['∆ precision'][idx] > 0.:
                    __pyramid.append(data['∆ precision'][idx])
            else:
                __precision.append(data['precision'][idx])
                __pyramid.append(data['∆ precision'][idx])
                

    return np.mean(__precision), np.mean(__pyramid)

def avg_macro_metric_for(data):
    __precision = data['precision']
    __recall = data['recall']
    __fscore = data['fscore']

    return np.mean(__precision), np.mean(__recall), np.mean(__fscore)    
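
One caveat when reading the averages produced by `avg_recommendation_metric_for`: with the default `filter_outliers=True`, artifacts whose score is zero are excluded before averaging, so the reported mean covers only artifacts with a non-zero score. A tiny sketch with made-up scores shows how much this can shift the result:

```python
from statistics import mean

# Made-up precision@k values for four artifacts; two of them scored zero.
scores = [0.0, 0.5, 1.0, 0.0]

mean_all = mean(scores)                               # every artifact counts
mean_nonzero = mean([s for s in scores if s > 0.0])   # zero scores filtered out
print(mean_all, mean_nonzero)
```

Here the unfiltered mean is 0.375 while the filtered mean is 0.75, so the two settings should not be compared directly.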

In [19]:

_precision, __pyramid_score = avg_recommendation_metric_for(
    recommendation_metrics, 
    k=3
)

logger.info(Fore.YELLOW + "k=3" + Style.RESET_ALL)
logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
logger.info("pyramid:   " + Fore.RED + "{:.3f}".format(__pyramid_score) + Style.RESET_ALL)

k=3
precision: 0.467
pyramid:   0.475


In [20]:
_precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)

logger.info("")
logger.info(Fore.YELLOW + "Model metrics" + Style.RESET_ALL)
logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)


Model metrics
precision: 0.510
recall:    0.532
f1-score:  0.474


In [21]:
def examples_per_source_type(source_type='misc', n_samples=None):
    _sources = list(set([x[0] for x in log_examples_lst]))
    _template = "[w={}]" + Fore.RED + "[y={}]" + Fore.YELLOW + "[p={:.4f}]" + Style.RESET_ALL + " {}"

    idx = 0
    for s in _sources:
        examples_in_source = []
        if source_type == 'api' and ('docs.oracle' in s or 'developer.android' in s):
            examples_in_source = list(filter(lambda k: k[0] == s, log_examples_lst))
            task_title = examples_in_source[0][1]
            idx += 1
        elif source_type == 'so' and ('stackoverflow.com' in s):
            examples_in_source = list(filter(lambda k: k[0] == s, log_examples_lst))
            task_title = examples_in_source[0][1]            
            idx += 1
        elif source_type == 'git' and ('github.com' in s):
            examples_in_source = list(filter(lambda k: k[0] == s, log_examples_lst))
            task_title = examples_in_source[0][1]
            idx += 1
        elif source_type == 'misc' and 'github.com' not in s and 'docs.oracle' not in s and 'developer.android' not in s and 'stackoverflow.com' not in s:
            examples_in_source = list(filter(lambda k: k[0] == s, log_examples_lst))
            task_title = examples_in_source[0][1]
            idx += 1
        if not examples_in_source:
            continue
        logger.info('')
        logger.info(Fore.RED + f"{task_title}" + Style.RESET_ALL)    
        logger.info(s)
        logger.info('')

        for _, _, pweights, y_predict, y_probs, text in examples_in_source:
            logger.info(_template.format(pweights, y_predict, y_probs, text))
            logger.info('')
        logger.info('-' * 20)
        
        if n_samples and idx >= n_samples:
            break
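
The branches in `examples_per_source_type` group artifacts by the kind of site they come from. The same substring checks can be sketched as a single helper; `classify_source` is a hypothetical name, and unlike the function above it assigns each URL to exactly one type, checking in the same order as the branches (API, then Stack Overflow, then GitHub, then everything else).

```python
def classify_source(url):
    """Map an artifact URL to one of 'api', 'so', 'git' or 'misc'."""
    if 'docs.oracle' in url or 'developer.android' in url:
        return 'api'
    if 'stackoverflow.com' in url:
        return 'so'
    if 'github.com' in url:
        return 'git'
    return 'misc'


for url in [
    'https://developer.android.com/training/volley/request',
    'https://stackoverflow.com/questions/6442054',
    'https://github.com/realm/realm-java/issues/776',
    'https://guides.codepath.com/android/converting-json-to-models',
]:
    print(classify_source(url), url)
```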
    

In [22]:
#@title Sample prediction outputs for API sources

logger.info(Fore.RED + "API" + Style.RESET_ALL)
examples_per_source_type(source_type='api', n_samples=8)

API

how can i get the value of text view in recyclerview item?
https://developer.android.com/codelabs/basic-android-kotlin-training-recyclerview-scrollable-list

[w=0][y=1][p=0.8901] Create a variable called recyclerView and use findViewById ( ) to find a reference to the RecyclerView within the layout.

[w=0][y=1][p=0.8862] Since your layout only has a single child view, RecyclerView, you can switch to a simpler ViewGroup called FrameLayout that should be used for holding a single child view.

[w=0][y=1][p=0.8749] When you run the app, RecyclerView uses the adapter to figure out how to display your data on screen.

[w=0][y=1][p=0.8701] In this case, you need an adapter that takes an Affirmation instance from the list returned by loadAffirmations ( ), and turns it into a list item view, so that it can be displayed in the RecyclerView.

[w=0][y=1][p=0.8629] Replace getItemCount ( ) with this:

-----

In [23]:
#@title Sample prediction outputs for GIT sources

logger.info(Fore.RED + "GIT" + Style.RESET_ALL)
examples_per_source_type(source_type='git', n_samples=4)

GIT

How to check programmatically whether app is running in debug mode or not?
https://github.com/flutter/flutter/issues/11392

[w=0][y=1][p=0.8367] Document how to check if profile/release/debug mode in dart

[w=0][y=1][p=0.7998] Document how to check if profile/release/debug mode in dart · Issue # 11392 · flutter/flutter · GitHub

[w=0][y=1][p=0.7703] The only way that works reliably has been posted above in # 11392 ( comment ).

[w=0][y=1][p=0.6336] Check Flutter mode from Dart code

[w=0][y=1][p=0.5799] Meanwhile, quickest way to get this into the repo might be by updating the docs: )

--------------------

Want to add drawable icons insteadof colorful dots
https://github.com/SundeepK/CompactCalendarView/issues/181

[w=0][y=1][p=0.9016] You can tweak the code on how you want to draw the icons:

[w=3][y=1][p=0.8781] So really you want to replace:

[w=0][31m[

In [24]:
#@title Sample prediction outputs for SO sources

logger.info(Fore.RED + "SO" + Style.RESET_ALL)
examples_per_source_type(source_type='so', n_samples=4)

SO

How to check programmatically whether app is running in debug mode or not?
https://stackoverflow.com/questions/23844667

[w=0][y=1][p=0.9095] Try the following:

[w=0][y=1][p=0.8545] then, in your code you detect the ENABLE_CRASHLYTICS flag as follows:

[w=0][y=1][p=0.8243] Alternatively, you could differentiate using BuildConfig.BUILD _ TYPE ;

[w=3][y=1][p=0.8208] If you are using Android Studio, or if you are using Gradle from the command line, you can add your own stuff to BuildConfig or otherwise tweak the debug and release build types to help distinguish these situations at runtime.

[w=0][y=1][p=0.7598] Due to the mixed comments about BuildConfig.DEBUG, I used the following to disable crashlytics -LRB- and analytics -RRB- in debug mode:

--------------------

how can i get the value of text view in recyclerview item?
https://stackoverflow.com/questions/37096547

[w=0][31m[y=1][

In [25]:
#@title Sample prediction outputs for MISC sources

logger.info(Fore.RED + "MISC" + Style.RESET_ALL)
examples_per_source_type(source_type='misc', n_samples=4)

MISC

JSONObject parse dictionary objects
https://guides.codepath.com/android/converting-json-to-models

[w=0][y=1][p=0.8979] In this case, we want to execute a request to http://api.yelp.com/v2/search?term=food&location=San+Francisco and then this will return us a JSON dictionary that looks like:

[w=1][y=1][p=0.8885] With this method in place, we could take a single business JSON dictionary such as:

[w=1][y=1][p=0.8839] Next, we need to add method that would manage the deserialization of a JSON dictionary into a populated Business object:

[w=0][y=1][p=0.8347] We could now run the app and verify that the JSON array of business has the format we expect from the provided sample response in the documentation.

[w=0][y=1][p=0.8255] Jump to SectionTable of ContentsOverviewFetching JSON ResultsSetting up our ModelPutting It All TogetherBonus: Setting Up Your Adapter

--------------------

[31mhow can 