<a href="https://colab.research.google.com/github/marquesarthur/vanilla-bert-vs-huggingface/blob/main/hugging_face_keras_bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Based on:

1.   https://towardsdatascience.com/hugging-face-transformers-fine-tuning-distilbert-for-binary-classification-tasks-490f1d192379
2.   https://www.analyticsvidhya.com/blog/2020/07/transfer-learning-for-nlp-fine-tuning-bert-for-text-classification/
3.   https://huggingface.co/transformers/training.html#fine-tuning-with-keras




**Problem statement:**


*   A developer has to inspect an **artifact X**.
*   Within the artifact, only a portion of the text is relevant to an **input task Y**.
*   We want to build a model that relates **Y** to each **sentence x ∈ X**.
*   The model must determine: **is x relevant to task Y?**




<br>

___

*Example of a task and an annotated artifact:*

<br>

[<img src="https://i.imgur.com/Zj1317H.jpg">](https://i.imgur.com/Zj1317H.jpg)




* The coloured sentences are those annotated as relevant to the input task. 
* The warmer the colour, the more annotators selected that portion of the text. 
* For simplicity, we process the data at the sentence level.

<br>

___

*Ultimately, our data is a tuple representing:*


*   **text** = artifact sentence

*   **question** = task description

*   **source** = URL of the artifact

*   **category_index** = whether the sentence is relevant (1) or not (0) to the input task

*   **weights** = number of participants who annotated the sentence as relevant
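
As a minimal sketch of the tuple described above, one entry can be modelled as a small dataclass. The class name `AnnotatedSentence` and the `is_relevant` helper are illustrative assumptions and are not part of the repository; the field names follow the keys listed above, and the values mirror the sample entry printed by the import cell further down.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnnotatedSentence:
    """One (sentence, task) pair; hypothetical container, not from the repo."""
    text: str            # artifact sentence
    question: str        # task description
    source: str          # URL of the artifact
    category_index: int  # 1 = relevant to the task, 0 = not relevant
    weights: int         # number of annotators who marked the sentence relevant

    def is_relevant(self) -> bool:
        return self.category_index == 1


entry = AnnotatedSentence(
    text="Every Android app runs in a limited-access sandbox.",
    question="Permission Denial when trying to access contacts in Android",
    source="https://developer.android.com/training/permissions/requesting",
    category_index=1,
    weights=1,
)
print(entry.is_relevant())
```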


<br>

___



In [1]:
# @title Install dependencies

# !pip install transformers
# %tensorflow_version 2.x

In [2]:
# !pip install scikit-learn tqdm pandas python-Levenshtein path colorama matplotlib seaborn

In [3]:
# !pip install python-Levenshtein

In [4]:
# @title Download git repo
# !git clone https://github.com/marquesarthur/vanilla-bert-vs-huggingface.git

In [5]:
# %cd vanilla-bert-vs-huggingface
# !git pull
# !ls -l

In [6]:
# @title Import data as JSON
import itertools
import json
import logging
import os
import sys
import random
from pathlib import Path

from Levenshtein import ratio
from colorama import Fore, Style

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)

from ds_android import get_input_for_BERT

raw_data = get_input_for_BERT()

print('Sample entry from data:')
print(json.dumps(raw_data[0], indent=4, sort_keys=True))

39 129 https://developer.android.com/training/permissions/requesting
14 21 https://stackoverflow.com/questions/5233543
4 34 https://github.com/morenoh149/react-native-contacts/issues/516
27 63 https://guides.codepath.com/android/Understanding-App-Permissions
9 161 https://www.avg.com/en/signal/guide-to-android-app-permissions-how-to-use-them-smartly
9 15 https://developer.android.com/training/volley/request
14 65 https://stackoverflow.com/questions/28504524
20 59 https://medium.com/@JasonCromer/android-asynctask-http-request-tutorial-6b429d833e28
5 97 https://www.twilio.com/blog/5-ways-to-make-http-requests-in-java
4 12 https://stackoverflow.com/questions/33241952
6 33 https://github.com/realm/realm-java/issues/776
3 17 https://stackoverflow.com/questions/8712652
8 59 https://dzone.com/articles

6 32 https://stackoverflow.com/questions/10630373
4 54 https://developer.android.com/training/gestures/scroll
4 16 https://stackoverflow.com/questions/39588322
20 196 https://developer.android.com/training/dependency-injection/dagger-android
6 44 https://stackoverflow.com/questions/57235136
24 121 https://guides.codepath.com/android/dependency-injection-with-dagger-2
Sample entry from data:
{
    "category_index": 1,
    "question": "Permission Denial when trying to access contacts in Android",
    "source": "https://developer.android.com/training/permissions/requesting",
    "text": "Every Android app runs in a limited-access sandbox.",
    "weights": 1
}


In [7]:
from collections import Counter, defaultdict

cnt = Counter([d['category_index'] for d in raw_data])

total = sum(cnt.values())

labels_cnt = [cnt[0] / float(total), cnt[1] / float(total)]
print('label distribution')
print('')
print('not-relevant -- {:.0f}%'.format(labels_cnt[0] * 100))
print('RELEVANT ------ {:.0f}%'.format(labels_cnt[1] * 100))

label distribution

not-relevant -- 87%
RELEVANT ------ 13%


In [8]:
seframes = {}
with open('seframes.json') as input_file:
    seframes = json.load(input_file)

In [9]:
def has_meaningful_frame(text):    
    meaning_frames = [
        'Temporal_collocation', 'Execution', 'Using', 'Intentionally_act',
        'Being_obligated', 'Likelihood', 'Causation', 'Required_event',
        'Desiring', 'Awareness', 'Grasp', 'Attempt'
    ]
    
    if text in seframes:
        text_labels = seframes[text]
        if any([elem in meaning_frames for elem in text_labels]):
            return True
        
    return False

In [10]:
fold_results = dict()
if os.path.isfile('bert_ds_android.json'):
    logger.info(Fore.YELLOW + "Loading data from cache" + Style.RESET_ALL)
    with open('bert_ds_android.json') as input_file:
        fold_results = json.load(input_file)

Loading data from cache


In [11]:
# @title Set environment variables

model_id = 'bert-base-uncased'
# model_id = 'distilbert-base-uncased'

import os
import contextlib
import codecs
import math
import json

import numpy as np
import pandas as pd
import tensorflow as tf

from collections import defaultdict, Counter
from tqdm import tqdm

USE_TPU = False
os.environ['TF_KERAS'] = '1'

# @title Initialize TPU Strategy
# tf.contrib was removed in TensorFlow 2.x; use the tf.distribute / tf.tpu equivalents
if USE_TPU:
    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=TPU_WORKER)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)

# sklearn libs
import sklearn
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    f1_score,
    precision_recall_fscore_support,
    roc_auc_score,
    roc_curve,
)

# Tensorflow Imports
import tensorflow as tf
from tensorflow import keras  # public API; tensorflow.python is a private namespace
import tensorflow.keras.backend as K
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers


# Hugging face imports
from transformers import AutoTokenizer
from transformers import TFDistilBertForSequenceClassification, TFBertForSequenceClassification
from transformers import TFDistilBertModel, DistilBertConfig
from transformers import DistilBertTokenizerFast, BertTokenizerFast

Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.


In [12]:
# @title Model parameters

# Bert Model Constants
SEQ_LEN = 64     # alternative: 128
BATCH_SIZE = 64  # alternative: 32; larger batch sizes cause OOM errors
EPOCHS = 10      # alternatives: 3, 4
LR = 1e-5        # alternative: 2e-5

# 3e-4, 1e-4, 5e-5, 3e-5
# My own constants
USE_FRAME_FILTERING = False
UNDERSAMPLING = True
N_UNDERSAMPLING = 2 # ratio of how many samples from 0-class, to 1-class, e.g.: 2:1
USE_DS_SYNTHETIC = False
MIN_W = 3
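
To make the `N_UNDERSAMPLING` ratio concrete, here is a small standard-library sketch of majority-class undersampling. It is an illustration under assumptions, not the notebook's pandas implementation: the `undersample` function, its `seed` parameter, and the toy records are made up for this example.

```python
import random


def undersample(records, ratio=2, seed=0):
    """Keep every relevant record and at most `ratio` times as many not-relevant ones."""
    relevant = [r for r in records if r["category_index"] == 1]
    not_relevant = [r for r in records if r["category_index"] == 0]
    # Never ask for more not-relevant records than actually exist.
    n_keep = min(len(not_relevant), ratio * len(relevant))
    rng = random.Random(seed)
    result = relevant + rng.sample(not_relevant, n_keep)
    rng.shuffle(result)
    return result


# Toy corpus with the 87% / 13% imbalance reported earlier in the notebook.
toy = [{"category_index": 1}] * 13 + [{"category_index": 0}] * 87
balanced = undersample(toy, ratio=2)
n_pos = sum(r["category_index"] for r in balanced)
print(len(balanced), n_pos, len(balanced) - n_pos)  # 39 13 26
```

With `ratio=2` the 13 relevant records are kept and 26 not-relevant records are sampled, which gives the 2:1 ratio that `N_UNDERSAMPLING = 2` asks for.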

In [13]:
# @title JSON to dataframe helper functions
def undersample_df(df, n_times=3):
    # Split by label explicitly; value_counts() orders by frequency, not by label
    c0 = df[df['category_index'] == 0]
    c1 = df[df['category_index'] == 1]
    # Keep at most n_times as many not-relevant rows as relevant rows
    n_keep = min(len(c0), int(n_times * len(c1)))
    df_0 = c0.sample(n_keep)
    
    undersampled_df = pd.concat([df_0, c1], axis=0)
    return undersampled_df

def get_ds_synthetic_data(min_w=MIN_W):
    short_task = {
      "bugzilla": """How to query bugs using the custom fields with the Bugzilla REST API?""",
      "databases": """Which technology should be adopted for the database layer abstraction: Object/Relational Mapping (ORM) or a Java Database Connectivity API (JDBC)?""",
      "gpmdpu": """Can I bind the cmd key to the GPMDPU shortcuts?""",
      "lucene": """How does Lucene compute similarity scores for the BM25 similarity?""",
      "networking": """Which technology should be adopted for the notification system, Server-Sent Events (SSE) or WebSockets?""",
    }

    with open('relevance_corpus.json') as ipf:
        aux = json.load(ipf)
        raw_data = defaultdict(list)
        for d in aux:
            if d['task'] == 'yargs':
                continue

            raw_data['text'].append(d['text'])
            raw_data['question'].append(short_task[d['task']])
            raw_data['source'].append(d['source'])
            raw_data['category_index'].append(1 if d['weight'] > min_w else 0)
            raw_data['weights'].append(d['weight'] if d['weight'] > min_w else 0)
 
        data = pd.DataFrame.from_dict(raw_data)
        data = undersample_df(data, n_times=1)
        data = data.sample(frac=1).reset_index(drop=True)
      
    return data

def get_class_weights(y, smooth_factor=0, upper_bound=5.0):
    """
    Returns the weight of each class, computed as majority_count / class_count
    :param y: list of true labels (the labels must be hashable)
    :param smooth_factor: factor that smooths extremely uneven weights
    :param upper_bound: maximum weight assigned to any class
    :return: dictionary with the weight for each class
    """
    counter = Counter(y)

    if smooth_factor > 0:
        p = max(counter.values()) * smooth_factor
        for k in counter.keys():
            counter[k] += p

    majority = max(counter.values())

    clazz = {cls: float(majority / count) for cls, count in counter.items()}
    result = {}
    for key, value in clazz.items():
        if value > upper_bound:
            value = upper_bound
        
        result[key] = value
    return result

def add_raw_data(result, data):
    s = data['source']
    if 'docs.oracle' in s or 'developer.android' in s:
        source_type = 'api'
    elif 'stackoverflow.com' in s:
        source_type = 'so'
    elif 'github.com' in s:
        source_type = 'git'
    else:
        source_type = 'misc'
    
    result['text'].append(data['text'])
    result['question'].append(data['question'])
    result['source'].append(data['source'])
    result['category_index'].append(data['category_index'])
    result['weights'].append(data['weights'])
    result['source_type'].append(source_type)
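
A worked example may help to see what `get_class_weights` returns. The sketch below restates the core of the computation (majority count divided by class count, capped at an upper bound) so that it runs on its own; the name `class_weights` and the label counts are assumptions chosen for illustration, and the smoothing factor is left out.

```python
from collections import Counter


def class_weights(y, upper_bound=5.0):
    """Weight each class by majority_count / class_count, capped at upper_bound."""
    counter = Counter(y)
    majority = max(counter.values())
    return {cls: min(majority / count, upper_bound) for cls, count in counter.items()}


# After 2:1 undersampling, the minority class gets twice the weight.
print(class_weights([0] * 1844 + [1] * 922))  # {0: 1.0, 1: 2.0}

# Without undersampling, an 87:13 split would give 87 / 13 ≈ 6.7, which is capped to 5.0.
print(class_weights([0] * 87 + [1] * 13))     # {0: 1.0, 1: 5.0}
```

The first case matches the `{0: 1.0, 1: 2.0}` weights that the training log reports for each fold.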


In [14]:
# @title Tokenizer

print(model_id)
if model_id == 'distilbert-base-uncased':
    tokenizer = DistilBertTokenizerFast.from_pretrained(model_id, cache_dir='/home/msarthur/scratch', local_files_only=True)
else:
    tokenizer = BertTokenizerFast.from_pretrained(model_id, cache_dir='/home/msarthur/scratch', local_files_only=True)

bert-base-uncased


In [15]:
tokenizer

PreTrainedTokenizerFast(name_or_path='bert-base-uncased', vocab_size=30522, model_max_len=512, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})

In [16]:
# @title data encoder

def _encode(tokenizer, dataframe, max_length=SEQ_LEN):
    
    seq_a = dataframe['text'].tolist()
    seq_b = dataframe['question'].tolist()
    
    return tokenizer(seq_a, seq_b, truncation=True, padding=True, max_length=max_length)

def to_one_hot_encoding(data, nb_classes = 2):
    targets = np.array([data]).reshape(-1)
    one_hot_targets = np.eye(nb_classes)[targets]
    return one_hot_targets    
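
The encoder passes the artifact sentence and the task description to the tokenizer as a sentence pair. The toy function below sketches the layout that a BERT-style pair encoding produces (`[CLS] text [SEP] question [SEP]`, segment ids, attention mask). It is a simplification under stated assumptions: it splits on whitespace instead of using WordPiece, keeps tokens as strings instead of vocabulary ids, and pads to `max_length`, whereas `padding=True` in the notebook pads to the longest sequence in the batch. The name `encode_pair` and the short example strings are hypothetical.

```python
def encode_pair(text, question, max_length=12):
    """Toy sentence-pair layout: [CLS] text [SEP] question [SEP] plus padding."""
    a, b = text.lower().split(), question.lower().split()
    budget = max_length - 3  # three positions are reserved for [CLS] and two [SEP]
    # Drop one token at a time from the longer segment until the pair fits.
    while len(a) + len(b) > budget:
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(tokens)
    pad = max_length - len(tokens)
    return {
        "tokens": tokens + ["[PAD]"] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }


enc = encode_pair("Request the permission at runtime", "Permission denial")
for key, value in enc.items():
    print(key, value)
```

Segment id 0 marks the sentence (including `[CLS]` and the first `[SEP]`), segment id 1 marks the task description, and the attention mask is 0 only on padding positions.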

In [17]:
# @title Metrics & Logging functions

from sklearn.metrics import classification_report

recommendation_metrics = defaultdict(list)
prediction_metrics = defaultdict(list)
api_metrics = defaultdict(list)
so_metrics = defaultdict(list)
git_metrics = defaultdict(list)
misc_metrics = defaultdict(list)

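classification_report_lst = []
clz_report_lst = defaultdict(list)  # per-class precision/recall, filled by aggregate_report_metrics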
log_examples_lst = []
source_lst = []
venn_diagram_set = []

def aggregate_macro_metrics(store_at, precision, recall, fscore):   
    store_at['precision'].append(precision)
    store_at['recall'].append(recall)
    store_at['fscore'].append(fscore)
    
    
def aggregate_macro_source_metrics(precision, recall, fscore, source):
    s = source
    if 'docs.oracle' in s or 'developer.android' in s:
        aggregate_macro_metrics(api_metrics, precision, recall, fscore)
    elif 'stackoverflow.com' in s:
        aggregate_macro_metrics(so_metrics, precision, recall, fscore)
    elif 'github.com' in s:
        aggregate_macro_metrics(git_metrics, precision, recall, fscore)        
    else:
        aggregate_macro_metrics(misc_metrics, precision, recall, fscore)
    

def aggregate_recommendation_metrics(store_at, k, precision_at_k, pyramid_precision_at_k):
    store_at['k'].append(k)
    store_at['precision'].append(precision_at_k)
    store_at['∆ precision'].append(pyramid_precision_at_k)
    
def aggregate_report_metrics(clz_report):
    relevant_label = str(1)
    if relevant_label in clz_report:
        for _key in ['precision', 'recall']:
            if _key in clz_report[relevant_label]:
                clz_report_lst[_key].append(clz_report[relevant_label][_key])    
                
def log_examples(task_title, source, text, pweights, y_predict, y_probs, k=10):
    # get the predicted prob at every index
    idx_probs = [(idx, y_predict[idx], y_probs[idx]) for idx, _ in enumerate(y_predict)]
    
    # keep only the indexes predicted as relevant
    idx_probs = [item for item in idx_probs if item[1] == 1]
    
    most_probable = sorted(idx_probs, key=lambda i: i[2], reverse=True)
    
    result = [idx for idx, _, _ in most_probable][:k]
    
    for idx in result:
        log_examples_lst.append((
            source, 
            task_title,
            pweights[idx],
            y_predict[idx],
            y_probs[idx],
            text[idx]
        ))
        
def log_venn_diagram(y_true, y_predicted, text):
    cnt = 0
    try:
        for _true, _predict, _t in zip(y_true, y_predicted, text):
            if _true == 1 and _predict == 1:
                cnt += 1
                venn_diagram_set.append(_t)
    except Exception as ex:
        logger.info(str(ex))
    logger.info(Fore.RED + str(cnt) + Style.RESET_ALL + " entries logged")

    
def avg_macro_metric_for(data):
    __precision = data['precision']
    __recall = data['recall']
    __fscore = data['fscore']

    return np.mean(__precision), np.mean(__recall), np.mean(__fscore)        
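
The ranking step inside `log_examples` can be tried in isolation. The following sketch keeps only the sentences predicted as relevant, sorts them by predicted probability, and returns the top `k` indexes; the function name `top_k_relevant` and the toy predictions are assumptions made for this example.

```python
def top_k_relevant(y_predict, y_probs, k=10):
    """Indexes of sentences predicted relevant, most confident first, at most k."""
    candidates = [
        (idx, prob)
        for idx, (label, prob) in enumerate(zip(y_predict, y_probs))
        if label == 1
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [idx for idx, _ in candidates[:k]]


y_predict = [0, 1, 1, 0, 1]
y_probs = [0.10, 0.62, 0.91, 0.45, 0.77]
print(top_k_relevant(y_predict, y_probs, k=2))  # [2, 4]
```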

In [18]:
#@title Training procedures

def get_train_val_test(task_uid, size=0.9, undersample=False, aug=True, undersample_n=3):
    if not isinstance(task_uid, list):
        task_uid = [task_uid]
        
    train_data_raw = defaultdict(list)
    test_data_raw = defaultdict(list)
    
    for _data in tqdm(CORPUS):
        if _data['question'] in task_uid:
            add_raw_data(test_data_raw, _data)
        else:
            add_raw_data(train_data_raw, _data)
    
    train_val = pd.DataFrame.from_dict(train_data_raw)
    test = pd.DataFrame.from_dict(test_data_raw)
    
    # https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows
    #  randomize rows....    
    train_val = train_val.sample(frac=1).reset_index(drop=True)
    test = test.sample(frac=1).reset_index(drop=True)
    
    if undersample:
        train_val = undersample_df(train_val, n_times=undersample_n)
        train_val = train_val.sample(frac=1).reset_index(drop=True)
        
    if aug:
        train_val = pd.concat([train_val, get_ds_synthetic_data()],axis=0)
        train_val = train_val.sample(frac=1).reset_index(drop=True)
    
    weights = get_class_weights(train_val['category_index'].tolist())
    
    train, val = train_test_split(
        train_val, 
        stratify=train_val['category_index'].tolist(), 
        train_size=size
    )
    
    return train, val, test, weights        
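
`get_train_val_test` holds out whole tasks rather than individual sentences, so no sentence of a test task is ever seen during training. The sketch below shows that idea with plain lists: contiguous folds over the task list (which is how an unshuffled k-fold assigns blocks) followed by a split of the corpus by task. The helper names `kfold_by_task` and `split_corpus` and the toy corpus are hypothetical.

```python
def kfold_by_task(tasks, n_splits):
    """Yield (train_tasks, test_tasks) using contiguous folds over the task list."""
    base, extra = divmod(len(tasks), n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds get one more task
        yield tasks[:start] + tasks[start + size:], tasks[start:start + size]
        start += size


def split_corpus(corpus, test_tasks):
    """Send every sentence of a held-out task to the test split."""
    held_out = set(test_tasks)
    train = [d for d in corpus if d["question"] not in held_out]
    test = [d for d in corpus if d["question"] in held_out]
    return train, test


tasks = ["t0", "t1", "t2", "t3", "t4"]
corpus = [{"question": t, "text": f"{t} sentence {i}"} for t in tasks for i in range(2)]

for train_tasks, test_tasks in kfold_by_task(tasks, n_splits=2):
    train, test = split_corpus(corpus, test_tasks)
    print(test_tasks, len(train), len(test))
```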

In [19]:
def update_predictions(task_title, text, y_predict, y_probs, relevant_class=1):
    result = []
    
    for _t, _y, _prob in zip(text, y_predict, y_probs):
        if _y == relevant_class:
            if has_meaningful_frame(_t):
                result.append(_y)
            else:
                result.append(0)
        else:
            result.append(_y)
    
    return result    
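
The frame-based post-filter can be illustrated without the model: a sentence predicted as relevant keeps its label only if at least one of its frames is in the list of meaningful frames. In the sketch below, the frame names in `MEANINGFUL_FRAMES` are a subset of the list used by `has_meaningful_frame`; the function name `apply_frame_filter`, the sentences, and their frame annotations (including the placeholder `Some_other_frame`) are invented for the example.

```python
MEANINGFUL_FRAMES = {"Using", "Required_event", "Attempt"}  # subset, for illustration

# Hypothetical sentence -> frames mapping standing in for seframes.json.
toy_frames = {
    "You must request the permission at runtime.": ["Required_event"],
    "Thanks for the detailed answer!": ["Some_other_frame"],
}


def apply_frame_filter(sentences, y_predict, frames):
    """Demote predicted-relevant sentences that carry no meaningful frame."""
    filtered = []
    for sentence, label in zip(sentences, y_predict):
        has_frame = any(f in MEANINGFUL_FRAMES for f in frames.get(sentence, []))
        filtered.append(label if label == 1 and has_frame else 0)
    return filtered


sentences = [
    "You must request the permission at runtime.",
    "Thanks for the detailed answer!",
    "A sentence without any frame annotation.",
]
print(apply_frame_filter(sentences, [1, 1, 1], toy_frames))  # [1, 0, 0]
```

The filter can only turn a 1 into a 0, so it may raise precision at the cost of recall.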

In [20]:
# @title Testing procedures

# https://medium.com/geekculture/hugging-face-distilbert-tensorflow-for-custom-text-classification-1ad4a49e26a7
def eval_model(model, test_data):
    preds = model.predict(test_data.batch(1)).logits  
    
    #transform to array with probabilities
    res = tf.nn.softmax(preds, axis=1).numpy()      

    return res.argmax(axis=-1), res[:, 1]

def test_model(source, df_test, model, tokenizer, pos_filter=False):
    
    df_source = df_test[df_test["source"] == source]   
    task_title = df_source['question'].tolist()[0]
    text = df_source['text'].tolist()
    pweights = df_source['weights'].tolist()
    
    # Encode X_test
    test_encodings = _encode(tokenizer, df_source)
    test_labels = df_source['category_index'].tolist()
    
    test_dataset = tf.data.Dataset.from_tensor_slices((
        dict(test_encodings),
        test_labels
    ))
    
    y_true = [y.numpy() for x, y in test_dataset]
    y_predict, y_probs = eval_model(model, test_dataset)
    
    if pos_filter:
        y_predict = update_predictions(task_title, text, y_predict, y_probs)
    

    accuracy = accuracy_score(y_true, y_predict)
    macro_f1 = f1_score(y_true, y_predict, average='macro')
    
    classification_report_lst.append(classification_report(y_true, y_predict))
    aggregate_report_metrics(classification_report(y_true, y_predict, output_dict=True))
    

    logger.info("-" * 20)    
    
    logger.info("Y")
    logger.info("[0s] {} [1s] {}".format(
        len(list(filter(lambda k: k== 0, y_true))),
        len(list(filter(lambda k: k== 1, y_true)))
    ))
    
        
    logger.info("predicted")
    logger.info("[0s] {} [1s] {}".format(
        len(list(filter(lambda k: k== 0, y_predict))),
        len(list(filter(lambda k: k== 1, y_predict)))
    ))
    
    logger.info("-" * 20)
    
    logger.info("Accuracy: {:.4f}".format(accuracy))
    logger.info("macro_f1: {:.4f}".format(macro_f1))

    precision, recall, fscore, _ = precision_recall_fscore_support(y_true, y_predict, average='macro')
    
    aggregate_macro_metrics(prediction_metrics, precision, recall, fscore)
    aggregate_macro_source_metrics(precision, recall, fscore, source)
    
    logger.info("Precision: {:.4f}".format(precision))
    logger.info("Recall: {:.4f}".format(recall))
    logger.info("F1: {:.4f}".format(fscore))
    
    log_examples(task_title, source, text, pweights, y_predict, y_probs, k=10)
    log_venn_diagram(y_true, y_predict, text)
    source_lst.append(source)
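
The macro-averaged numbers that `test_model` logs can be checked by hand. The sketch below recomputes macro precision, recall, and F1 in plain Python for a small case with one not-relevant and three relevant sentences, where one relevant sentence is missed. The function name is an assumption, and the label vectors are one arrangement consistent with those counts; scikit-learn's `precision_recall_fscore_support(..., average='macro')` is what the notebook actually calls.

```python
def macro_precision_recall_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class precision, recall and F1."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = [0, 1, 1, 1]
y_pred = [0, 0, 1, 1]
p, r, f = macro_precision_recall_f1(y_true, y_pred)
print(f"{p:.4f} {r:.4f} {f:.4f}")  # 0.7500 0.8333 0.7333
```

Class 0 has precision 1/2 and recall 1/1, class 1 has precision 2/2 and recall 2/3; averaging the two classes gives the values printed above.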

In [21]:
def add_idx_fold_results(idx_split, store_at):
    if idx_split not in store_at:
        store_at[idx_split] = dict()
        store_at[idx_split]['run_cnt'] = 0
        store_at[idx_split]['overall'] = defaultdict(list)
        store_at[idx_split]['api'] = defaultdict(list)
        store_at[idx_split]['so'] = defaultdict(list)
        store_at[idx_split]['git'] = defaultdict(list)
        store_at[idx_split]['misc'] = defaultdict(list)
    
    store_at[idx_split]['run_cnt'] += 1
    
    _precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)
    store_at[idx_split]['overall']['precision'].append(_precision)
    store_at[idx_split]['overall']['recall'].append(_recall)
    store_at[idx_split]['overall']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(api_metrics)
    store_at[idx_split]['api']['precision'].append(_precision)
    store_at[idx_split]['api']['recall'].append(_recall)
    store_at[idx_split]['api']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(so_metrics)
    store_at[idx_split]['so']['precision'].append(_precision)
    store_at[idx_split]['so']['recall'].append(_recall)
    store_at[idx_split]['so']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(git_metrics)
    store_at[idx_split]['git']['precision'].append(_precision)
    store_at[idx_split]['git']['recall'].append(_recall)
    store_at[idx_split]['git']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(misc_metrics)
    store_at[idx_split]['misc']['precision'].append(_precision)
    store_at[idx_split]['misc']['recall'].append(_recall)
    store_at[idx_split]['misc']['fscore'].append(_f1score)  

In [22]:
# model = TFBertForSequenceClassification.from_pretrained(model_id, cache_dir='/home/msarthur/scratch', local_files_only=True)

In [23]:
# @title 10-fold cross validation WIP
CORPUS = raw_data

all_tasks = sorted(list(set([d['question'] for d in raw_data])))
rseed = 20210343
random.seed(rseed)
random.shuffle(all_tasks)

from sklearn.model_selection import KFold


file_handler = logging.FileHandler('/home/msarthur/scratch/LOG-bert_ds_android.ans')
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)


n_splits = 10
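# No shuffle here: all_tasks was already shuffled with a fixed seed above,
# and recent scikit-learn versions reject random_state when shuffle=False
kf = KFold(n_splits=n_splits)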
np_tasks_arr = np.array(all_tasks)


for _iterations in range(5):
    logger.info(Fore.YELLOW + f"i={_iterations}" + Style.RESET_ALL)
    idx_split = 0
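    for fold_idx, (train_index, test_index) in enumerate(kf.split(np_tasks_arr)):
        # Derive the key from the fold index so that skipped folds (see `continue` below) still advance
        idx_split = str(fold_idx)
        # 10 runs per fold to avoid reporting peak results in a given fold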
        if idx_split in fold_results and fold_results[idx_split]['run_cnt'] >= 10:
            logger.info(Fore.RED + f"Fold {idx_split} FULLY TESTED" + Style.RESET_ALL)
            continue


        # <------------------------------------------------------------------------- EVAL VARIABLES
        recommendation_metrics = defaultdict(list)
        prediction_metrics = defaultdict(list)
        api_metrics = defaultdict(list)
        so_metrics = defaultdict(list)
        git_metrics = defaultdict(list)
        misc_metrics = defaultdict(list)
        random_prediction_metrics = defaultdict(list)
        clz_report_lst = defaultdict(list)

        classification_report_lst = []
        log_examples_lst = []
        source_lst = []
        venn_diagram_set = []
        # <------------------------------------------------------------------------- EVAL VARIABLES


        test_tasks_lst = np_tasks_arr[test_index].tolist()

        logger.info("")
        logger.info(Fore.RED + f"Fold {idx_split}" + Style.RESET_ALL)
        logger.info('\n'.join(test_tasks_lst))

        # <------------------------------------------------------------------------- INPUT
        df_train, df_val, df_test, weights = get_train_val_test(
            test_tasks_lst,
            aug=USE_DS_SYNTHETIC,
            undersample=UNDERSAMPLING, 
            undersample_n=N_UNDERSAMPLING
        )
        # <------------------------------------------------------------------------- INPUT

        logger.info('-' * 10)
        logger.info(Fore.RED + 'train'+ Style.RESET_ALL)
        logger.info(str(df_train.category_index.value_counts()))
        logger.info("")

        logger.info(Fore.RED + 'test'+ Style.RESET_ALL)
        logger.info(str(df_test.category_index.value_counts()))
        logger.info("")

        logger.info(Fore.RED + 'weights'+ Style.RESET_ALL)
        logger.info(str(weights))
        logger.info('-' * 10)


        # Encode X_train
        train_encodings = _encode(tokenizer, df_train)
        train_labels = df_train['category_index'].tolist()

        # Encode X_valid
        val_encodings = _encode(tokenizer, df_val)
        val_labels = df_val['category_index'].tolist()


        # https://huggingface.co/transformers/custom_datasets.html
        train_dataset = tf.data.Dataset.from_tensor_slices((
            dict(train_encodings),
            train_labels
        ))

        val_dataset = tf.data.Dataset.from_tensor_slices((
            dict(val_encodings),
            val_labels
        ))


        if model_id == 'distilbert-base-uncased':
            model = TFDistilBertForSequenceClassification.from_pretrained(
                model_id, cache_dir='/home/msarthur/scratch'
            )
        else:
            model = TFBertForSequenceClassification.from_pretrained(
                model_id, cache_dir='/home/msarthur/scratch', local_files_only=True
            )

        # freeze all the parameters
        # for param in model.parameters():
        #   param.requires_grad = False


        optimizer = tf.keras.optimizers.Adam(learning_rate=LR)
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

        METRICS = [
            tf.keras.metrics.SparseCategoricalAccuracy()
        ]

        early_stopper = tf.keras.callbacks.EarlyStopping(
            monitor='val_loss', mode='min', patience=4, 
            verbose=1, restore_best_weights=True
        )
        
        # https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
        checkpoint_filepath = '/home/msarthur/scratch/best_model'

        mc = tf.keras.callbacks.ModelCheckpoint(
            checkpoint_filepath, 
            monitor='val_loss', mode='min', verbose=1, 
            save_best_only=True,
            save_weights_only=True
        )

        model.compile(
            optimizer=optimizer,
            loss=loss_fn,
            metrics=METRICS
        )

        # https://discuss.huggingface.co/t/how-to-dealing-with-data-imbalance/393/3
        # https://wandb.ai/ayush-thakur/huggingface/reports/Early-Stopping-in-HuggingFace-Examples--Vmlldzo0MzE2MTM
        model.fit(
            train_dataset.shuffle(1000).batch(BATCH_SIZE), 
            epochs=EPOCHS, 
            # batch_size is omitted: the tf.data.Dataset inputs are already batched
            class_weight=weights,
            validation_data=val_dataset.shuffle(1000).batch(BATCH_SIZE),
            callbacks=[early_stopper, mc]
        )

        model.load_weights(checkpoint_filepath)

        logger.info("")
        logger.info(Fore.RED + "Testing model" + Style.RESET_ALL)
        for source in df_test["source"].unique():
            df_source = df_test[df_test["source"] == source]   
            logger.info(source)
            test_model(source, df_source, model, tokenizer, pos_filter=USE_FRAME_FILTERING)

        add_idx_fold_results(idx_split, fold_results)
        if 'venn_diagram_set' not in fold_results:
            fold_results['venn_diagram_set'] = []

        fold_results['venn_diagram_set'] += venn_diagram_set
        fold_results['venn_diagram_set'] = list(set(fold_results['venn_diagram_set']))


        _precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)

        logger.info("")
        logger.info(Fore.YELLOW + "Model metrics" + Style.RESET_ALL)
        logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
        logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
        logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

        idx_split = int(idx_split)

        idx_split += 1


        log_sources_data = [api_metrics, so_metrics, git_metrics, misc_metrics]
        log_sources_ids = ['api_metrics', 'so_metrics', 'git_metrics', 'misc_metrics']

        for _id, __data in zip(log_sources_ids, log_sources_data):
            _precision, _recall, _f1score = avg_macro_metric_for(__data)

            logger.info("")
            logger.info(Fore.YELLOW + f"{_id}" + Style.RESET_ALL)
            logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
            logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
            logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)


    #     break
        if idx_split >= 3:
            break

i=0

Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whether app is running in debug mode or not?
JSONObject parse dictionary objects
Want to add drawable icons insteadof colorful dots


100%|██████████| 7918/7918 [00:00<00:00, 860331.05it/s]

----------
train
0    1659
1     830
Name: category_index, dtype: int64

test
0    664
1     71
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x2b2ffb0193d0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
The parameter `return_dict` cannot be set in graph mode an


misc_metrics
precision: 0.569
recall:    0.637
f1-score:  0.492

Fold 1
 height must be > 0
Write and Read a json data to internal storage android
Android PDF Rendering
How can I hide a fragment on start of my MainActivity( or the application)?
polymorphic deserialization of JSON with jackson, property type becomes &quot;null&quot;


100%|██████████| 7918/7918 [00:00<00:00, 856006.88it/s]

----------
train
0    1605
1     803
Name: category_index, dtype: int64

test
0    659
1    101
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.66488, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.66488 to 0.60778, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
--------------------
Y
[0s] 1 [1s] 3
predicted
[0s] 2 [1s] 2
--------------------
Accuracy: 0.7500
macro_f1: 0.7333
Precision: 0.7500
Recall: 0.8333
F1: 0.7333
2 entries logged

Model metrics
precision: 0.568
recall:    0.595
f1-score:  0.524

api_metrics
precision: 0.493
recall:    0.534
f1-score:  0.380

so_metrics
precision: 0.563
recall:    0.589
f1-score:  0.551

git_metrics
precision: 0.729
recall:    0.754
f1-score:  0.738

misc_metrics
precision: 0.621
recall:    0.635
f1-score:  0.599

Fold 2
How to Integrate reCAPTCHA 2.0 in Android
How can I make this rxjava zip to run in parallel?
Permission Denial when trying to access contacts in Android
keyUp called when key is still pressed
Don’t leak Mo

100%|██████████| 7918/7918 [00:00<00:00, 848440.31it/s]

----------
train
0    1463
1     732
Name: category_index, dtype: int64

test
0    1178
1     180
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.67841, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.67841 to 0.61994, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

recall:    0.677
f1-score:  0.516
i=1

Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whether app is running in debug mode or not?
JSONObject parse dictionary objects
Want to add drawable icons insteadof colorful dots


100%|██████████| 7918/7918 [00:00<00:00, 824122.76it/s]

----------
train
0    1659
1     830
Name: category_index, dtype: int64

test
0    664
1     71
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.62363, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.62363 to 0.55935, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

100%|██████████| 7918/7918 [00:00<00:00, 817207.59it/s]

----------
train
0    1605
1     803
Name: category_index, dtype: int64

test
0    659
1    101
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.68739, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.68739 to 0.65358, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

Recall: 0.6667
F1: 0.5833
1 entries logged
https://github.com/FasterXML/jackson-databind/issues/1538
--------------------
Y
[0s] 26 [1s] 10
predicted
[0s] 25 [1s] 11
--------------------
Accuracy: 0.6944
macro_f1: 0.6303
Precision: 0.6273
Recall: 0.6346
F1: 0.6303
5 entries logged
https://stackoverflow.com/questions/40168601
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
--------------------
Y
[0s] 1 [1s] 3
predicted
[0s] 2 [1s] 2
--------------------
Accuracy: 0.7500
macro_f1: 0.7333
Precision: 0.7500
Recall: 0.8333
F1: 0.7333
2 entries logged

Model metrics
precision: 0.614
recall:    0.656
f1-score:  0.538

api_metrics
precision: 

100%|██████████| 7918/7918 [00:00<00:00, 855411.58it/s]

----------
train
0    1463
1     732
Name: category_index, dtype: int64

test
0    1178
1     180
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.61150, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.61150 to 0.60043, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

Accuracy: 0.5556
macro_f1: 0.5000
Precision: 0.5687
Recall: 0.6359
F1: 0.5000
3 entries logged
https://stackoverflow.com/questions/27297067
--------------------
Y
[0s] 10 [1s] 11
predicted
[0s] 5 [1s] 16
--------------------
Accuracy: 0.3810
macro_f1: 0.3259
Precision: 0.3187
Recall: 0.3682
F1: 0.3259
7 entries logged

Model metrics
precision: 0.532
recall:    0.566
f1-score:  0.433

api_metrics
precision: 0.575
recall:    0.544
f1-score:  0.437

so_metrics
precision: 0.489
recall:    0.513
f1-score:  0.435

git_metrics
precision: 0.521
recall:    0.550
f1-score:  0.471

misc_metrics
precision: 0.533
recall:    0.629
f1-score:  0.422
i=2

Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whe

100%|██████████| 7918/7918 [00:00<00:00, 710536.99it/s]

----------
train
0    1659
1     830
Name: category_index, dtype: int64

test
0    664
1     71
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.64738, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.64738 to 0.60657, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

f1-score:  0.525

git_metrics
precision: 0.543
recall:    0.575
f1-score:  0.553

misc_metrics
precision: 0.616
recall:    0.671
f1-score:  0.594

Fold 1
 height must be > 0
Write and Read a json data to internal storage android
Android PDF Rendering
How can I hide a fragment on start of my MainActivity( or the application)?
polymorphic deserialization of JSON with jackson, property type becomes "null"


100%|██████████| 7918/7918 [00:00<00:00, 418320.94it/s]

----------
train
0    1605
1     803
Name: category_index, dtype: int64

test
0    659
1    101
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.68938, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.68938 to 0.65312, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

  _warn_prf(average, modifier, msg_start, len(result))


--------------------
Y
[0s] 26 [1s] 10
predicted
[0s] 32 [1s] 4
--------------------
Accuracy: 0.7222
macro_f1: 0.5567
Precision: 0.6250
Recall: 0.5615
F1: 0.5567
2 entries logged
https://developer.android.com/reference/android/graphics/pdf/PdfRenderer
--------------------
Y
[0s] 36 [1s] 8
predicted
[0s] 12 [1s] 32
--------------------
Accuracy: 0.3182
macro_f1: 0.3125
Precision: 0.4531
Recall: 0.4375
F1: 0.3125
5 entries logged
https://stackoverflow.com/questions/14347588
--------------------
Y
[0s] 20 [1s] 5
predicted
[0s] 25 [1s] 0
--------------------
Accuracy: 0.8000
macro_f1: 0.4444
Precision: 0.4000
Recall: 0.5000
F1: 0.4444
0 entries logged
https://stackoverflow.com/questions/2883355
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` c

100%|██████████| 7918/7918 [00:00<00:00, 818577.29it/s]

----------
train
0    1463
1     732
Name: category_index, dtype: int64

test
0    1178
1     180
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.64538, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.64538 to 0.60769, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

recall:    0.624
f1-score:  0.536

api_metrics
precision: 0.622
recall:    0.662
f1-score:  0.566

so_metrics
precision: 0.584
recall:    0.589
f1-score:  0.558

git_metrics
precision: 0.480
recall:    0.458
f1-score:  0.444

misc_metrics
precision: 0.552
recall:    0.656
f1-score:  0.511
i=3

Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whether app is running in debug mode or not?
JSONObject parse dictionary objects
Want to add drawable icons insteadof colorful dots


100%|██████████| 7918/7918 [00:00<00:00, 778894.39it/s]

----------
train
0    1659
1     830
Name: category_index, dtype: int64

test
0    664
1     71
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.67419, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.67419 to 0.62809, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

100%|██████████| 7918/7918 [00:00<00:00, 785396.69it/s]

----------
train
0    1605
1     803
Name: category_index, dtype: int64

test
0    659
1    101
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.67348, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.67348 to 0.60028, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

  _warn_prf(average, modifier, msg_start, len(result))


--------------------
Y
[0s] 132 [1s] 31
predicted
[0s] 159 [1s] 4
--------------------
Accuracy: 0.7853
macro_f1: 0.4399
Precision: 0.4025
Recall: 0.4848
F1: 0.4399
0 entries logged
https://developer.android.com/reference/android/graphics/pdf/PdfRenderer
--------------------
Y
[0s] 36 [1s] 8
predicted
[0s] 12 [1s] 32
--------------------
Accuracy: 0.4545
macro_f1: 0.4500
Precision: 0.6250
Recall: 0.6667
F1: 0.4500
8 entries logged
https://developer.android.com/training/basics/firstapp/starting-activity
--------------------
Y
[0s] 66 [1s] 6
predicted
[0s] 66 [1s] 6
--------------------
Accuracy: 0.8889
macro_f1: 0.6364
Precision: 0.6364
Recall: 0.6364
F1: 0.6364
2 entries logged
https://docs.oracle.com/javase/7/docs/api/java/awt/Rectangle.html
--------------------
Y
[0s] 53 [1s] 3
predicted
[0s] 19 [1s] 37
--------------------
Accuracy: 0.3571
macro_f1: 0.3000
Precision: 0.5007
Recall: 0.5031
F1: 0.3000
2 entries logged
https://medium.com/@chahat.jain

100%|██████████| 7918/7918 [00:00<00:00, 775511.37it/s]

----------
train
0    1463
1     732
Name: category_index, dtype: int64

test
0    1178
1     180
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.69001, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.69001 to 0.68811, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

macro_f1: 0.6786
Precision: 0.6786
Recall: 0.6786
F1: 0.6786
11 entries logged
https://developer.android.com/training/keyboard-input/commands
--------------------
Y
[0s] 11 [1s] 3
predicted
[0s] 7 [1s] 7
--------------------
Accuracy: 0.7143
macro_f1: 0.6889
Precision: 0.7143
Recall: 0.8182
F1: 0.6889
3 entries logged

Model metrics
precision: 0.583
recall:    0.631
f1-score:  0.567

api_metrics
precision: 0.614
recall:    0.637
f1-score:  0.590

so_metrics
precision: 0.597
recall:    0.620
f1-score:  0.588

git_metrics
precision: 0.547
recall:    0.600
f1-score:  0.530

misc_metrics
precision: 0.553
recall:    0.642
f1-score:  0.538
i=4

Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whe

100%|██████████| 7918/7918 [00:00<00:00, 408478.15it/s]

----------
train
0    1659
1     830
Name: category_index, dtype: int64

test
0    664
1     71
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.68193, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.68193 to 0.59190, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

recall:    0.636
f1-score:  0.598

git_metrics
precision: 0.509
recall:    0.521
f1-score:  0.496

misc_metrics
precision: 0.596
recall:    0.644
f1-score:  0.554

Fold 1
 height must be > 0
Write and Read a json data to internal storage android
Android PDF Rendering
How can I hide a fragment on start of my MainActivity( or the application)?
polymorphic deserialization of JSON with jackson, property type becomes "null"


100%|██████████| 7918/7918 [00:00<00:00, 753619.39it/s]

----------
train
0    1605
1     803
Name: category_index, dtype: int64

test
0    659
1    101
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.64965, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.64965 to 0.61753, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

--------------------
Y
[0s] 11 [1s] 5
predicted
[0s] 3 [1s] 13
--------------------
Accuracy: 0.5000
macro_f1: 0.4921
Precision: 0.6923
Recall: 0.6364
F1: 0.4921
5 entries logged
https://stackoverflow.com/questions/40168601
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
--------------------
Y
[0s] 1 [1s] 3
predicted
[0s] 3 [1s] 1
--------------------
Accuracy: 0.5000
macro_f1: 0.5000
Precision: 0.6667
Recall: 0.6667
F1: 0.5000
1 entries logged

Model metrics
precision: 0.616
recall:    0.631
f1-score:  0.499

api_metrics
precision: 0.584
recall:    0.653
f1-score:  0.482

so_metrics
precision: 0.616

100%|██████████| 7918/7918 [00:00<00:00, 858573.95it/s]

----------
train
0    1463
1     732
Name: category_index, dtype: int64

test
0    1178
1     180
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.64753, saving model to /home/msarthur/scratch/best_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.64753 to 0.60221, saving model to /home/msarthur/scratch/best_model
Epoch 3/10
Ep

recall:    0.595
f1-score:  0.496

api_metrics
precision: 0.586
recall:    0.581
f1-score:  0.493

so_metrics
precision: 0.560
recall:    0.573
f1-score:  0.534

git_metrics
precision: 0.547
recall:    0.600
f1-score:  0.530

misc_metrics
precision: 0.543
recall:    0.623
f1-score:  0.463
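
The per-source blocks in the log above print `Precision`, `Recall` and `F1` next to `macro_f1`, and the numbers are consistent with macro averaging over the two classes. The sketch below checks this on two logged blocks. The helper `macro_prf` is hypothetical, and the confusion matrices are reconstructed from the logged label counts and accuracy, not printed by the notebook.

```python
def macro_prf(tn, fp, fn, tp):
    """Macro-averaged precision, recall and F1 for a binary confusion matrix.

    Undefined ratios (zero denominator) count as 0.0.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    def f1(p, r):
        return ratio(2 * p * r, p + r)

    # Treat class 1 and class 0 as the positive class in turn, then average.
    p1, r1 = ratio(tp, tp + fp), ratio(tp, tp + fn)
    p0, r0 = ratio(tn, tn + fn), ratio(tn, tn + fp)
    precision = (p0 + p1) / 2
    recall = (r0 + r1) / 2
    fscore = (f1(p0, r0) + f1(p1, r1)) / 2
    return precision, recall, fscore


# Block with Y = 66 zeros / 6 ones, predicted = 66 zeros / 6 ones, accuracy 0.8889:
# the only confusion matrix matching those counts is tn=62, fp=4, fn=4, tp=2.
balanced = macro_prf(tn=62, fp=4, fn=4, tp=2)
print([round(v, 4) for v in balanced])

# Block with Y = 20 zeros / 5 ones and no positive predictions (accuracy 0.8000).
no_positives = macro_prf(tn=20, fp=0, fn=5, tp=0)
print([round(v, 4) for v in no_positives])
```

Both results agree with the logged values (0.6364 for all three metrics in the first block; 0.4000, 0.5000 and 0.4444 in the second). The second case has no predicted positives, so class 1 precision is undefined; scikit-learn counts it as 0 by default and emits a warning.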


In [24]:
# For every dict-valued entry of fold_results, report its 'overall' metrics:
# the mean (in red) followed by the individual values rounded to two decimals.
for key_i, value in fold_results.items():
    if not isinstance(value, dict) or 'overall' not in value:
        continue
    overall = value['overall']
    logger.info(Fore.YELLOW + f"{key_i}" + Style.RESET_ALL)
    for label, metric in (("precision: ", "precision"),
                          ("recall:    ", "recall"),
                          ("f1-score:  ", "fscore")):
        scores = overall[metric]
        logger.info(label + Fore.RED + "{:.3f}".format(np.mean(scores)) + Style.RESET_ALL +
                    f" {str([round(x, 2) for x in scores])}")

0
precision: 0.565 [0.57, 0.54, 0.55, 0.59, 0.58, 0.54, 0.57, 0.57]
recall:    0.573 [0.59, 0.54, 0.56, 0.58, 0.57, 0.56, 0.61, 0.58]
f1-score:  0.507 [0.5, 0.45, 0.48, 0.53, 0.52, 0.53, 0.54, 0.51]
1
precision: 0.576 [0.65, 0.54, 0.57, 0.61, 0.53, 0.51, 0.62]
recall:    0.606 [0.67, 0.59, 0.6, 0.66, 0.54, 0.56, 0.63]
f1-score:  0.505 [0.54, 0.5, 0.52, 0.54, 0.48, 0.47, 0.5]
2
precision: 0.563 [0.55, 0.54, 0.6, 0.53, 0.58, 0.58, 0.56]
recall:    0.597 [0.57, 0.57, 0.63, 0.57, 0.62, 0.63, 0.59]
f1-score:  0.501 [0.44, 0.5, 0.53, 0.43, 0.54, 0.57, 0.5]


In [25]:
logger.info(Fore.YELLOW + "Caching results" + Style.RESET_ALL)
with open('bert_ds_android.json', 'w') as fo:
    json.dump(fold_results, fo, indent=4)

Caching results


In [26]:
fold_results.keys()

dict_keys(['0', 'venn_diagram_set', '1', '2'])
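
The cached JSON can be read back later without retraining. The sketch below is a minimal, hedged example of that round trip: `summarize_runs` is a hypothetical helper (not part of the notebook), and it assumes the layout implied by the reporting cell above, namely that each run key maps to a dict whose `'overall'` entry holds per-fold lists under `'precision'`, `'recall'` and `'fscore'`, while entries such as `'venn_diagram_set'` are plain lists. The toy numbers and the temporary file name are made up for illustration.

```python
import json
import os
import tempfile
from statistics import mean


def summarize_runs(results):
    """Return {run: {metric: mean over folds}} for every run with an 'overall' entry."""
    summary = {}
    for run, value in results.items():
        # Entries such as 'venn_diagram_set' are not dicts and carry no metrics.
        if not isinstance(value, dict) or 'overall' not in value:
            continue
        summary[run] = {
            metric: mean(value['overall'][metric])
            for metric in ('precision', 'recall', 'fscore')
        }
    return summary


# Toy stand-in for fold_results; the values are invented for this example.
toy_results = {
    '0': {'overall': {'precision': [0.5, 0.7],
                      'recall': [0.6, 0.6],
                      'fscore': [0.4, 0.6]}},
    'venn_diagram_set': ['some sentence'],
}

# Write the toy results to a temporary file and read them back,
# mirroring the json.dump call used for caching.
path = os.path.join(tempfile.mkdtemp(), 'toy_results.json')
with open(path, 'w') as fo:
    json.dump(toy_results, fo, indent=4)
with open(path) as fi:
    reloaded = json.load(fi)

summary = summarize_runs(reloaded)
print(summary)
```

Pointing `path` at `bert_ds_android.json` would summarize the real runs, provided the file follows the assumed layout.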

In [27]:
# cnt = 0
# for source in df_test["source"].unique():
#     df_source = df_test[df_test["source"] == source]   
#     logger.info(source)
#     test_model(source, df_source, model, tokenizer, pos_filter=True)
#     cnt += 1
#     if cnt >= 5:
#         break

In [28]:
#@title Metrics report
# logger.info(json.dumps(fold_results, indent=4, sort_keys=True))

In [29]:
# _precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "Model metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)


# _precision, _recall, _f1score = avg_macro_metric_for(api_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "API metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

# _precision, _recall, _f1score = avg_macro_metric_for(so_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "SO metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

# _precision, _recall, _f1score = avg_macro_metric_for(git_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "GIT metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

# _precision, _recall, _f1score = avg_macro_metric_for(misc_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "MISC metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

In [30]:
def examples_per_source_type(source_type='misc', n_samples=None):
    """Log the sampled predictions of up to `n_samples` artifacts of one source type."""
    _sources = list(set([x[0] for x in log_examples_lst]))

    _template = "[w={}]" + Fore.RED + "[y={}]" + Fore.YELLOW + "[p={:.4f}]" + Style.RESET_ALL + " {}"

    # Substrings that identify each known source type in an artifact URL.
    _known = ('github.com', 'docs.oracle', 'developer.android', 'stackoverflow.com')
    _matchers = {
        'api': lambda s: 'docs.oracle' in s or 'developer.android' in s,
        'so': lambda s: 'stackoverflow.com' in s,
        'git': lambda s: 'github.com' in s,
        'misc': lambda s: not any(d in s for d in _known),
    }
    _matches = _matchers.get(source_type, lambda s: False)

    idx = 0
    for s in _sources:
        if not _matches(s):
            continue

        # Each entry is (source, task title, weight, prediction, probability, text).
        examples_in_source = [k for k in log_examples_lst if k[0] == s]
        task_title = examples_in_source[0][1]
        idx += 1

        logger.info('')
        logger.info(Fore.RED + f"{task_title}" + Style.RESET_ALL)
        logger.info(s)
        logger.info('')

        for _, _, pweights, y_predict, y_probs, text in examples_in_source:
            logger.info(_template.format(pweights, y_predict, y_probs, text))
            logger.info('')
        logger.info('-' * 20)

        if n_samples and idx >= n_samples:
            break
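
The substring checks that `examples_per_source_type` applies to each URL can also be isolated in a small classifier, which makes the source-type rules easy to test on their own. This is only a sketch: `source_type_of` and `group_by_source_type` are hypothetical helpers, not functions from the notebook. Note one assumption: the checks run in a fixed order, so a URL matching several domains is assigned to the first match, whereas the function above tests each type independently. For URLs that match a single domain the two agree.

```python
def source_type_of(url):
    """Classify an artifact URL as 'api', 'so', 'git' or 'misc' (hypothetical helper)."""
    if 'docs.oracle' in url or 'developer.android' in url:
        return 'api'
    if 'stackoverflow.com' in url:
        return 'so'
    if 'github.com' in url:
        return 'git'
    # Anything that matches none of the known domains.
    return 'misc'


def group_by_source_type(urls):
    """Group URLs into a dict keyed by source type, preserving input order."""
    groups = {'api': [], 'so': [], 'git': [], 'misc': []}
    for url in urls:
        groups[source_type_of(url)].append(url)
    return groups


# URLs taken from the sample outputs in this notebook.
urls = [
    'https://developer.android.com/training/safetynet/recaptcha',
    'https://github.com/morenoh149/react-native-contacts/issues/516',
    'https://stackoverflow.com/questions/24952513',
    'https://www.avg.com/en/signal/guide-to-android-app-permissions-how-to-use-them-smartly',
]
groups = group_by_source_type(urls)
for source_type, members in groups.items():
    print(source_type, len(members))
```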
    

In [31]:
#@title Sample prediction outputs for API sources

logger.info(Fore.RED + "API" + Style.RESET_ALL)
examples_per_source_type(source_type='api', n_samples=8)

API

How to Integrate reCAPTCHA 2.0 in Android
https://developer.android.com/training/safetynet/recaptcha

[w=1][y=1][p=0.7355] In the Adding reCAPTCHA to your app section on the page that appears next, your public and private keys appear under Site key and Secret key, respectively.

[w=2][y=1][p=0.7347] reCAPTCHA is a free service that uses an advanced risk analysis engine to protect your app from spam and other abusive actions.

[w=1][y=1][p=0.7316] The SafetyNet service includes a reCAPTCHA API that you can use to protect your app from malicious traffic.

[w=1][y=1][p=0.7314] When the reCAPTCHA API executes the onSuccess ( ) method, the user has successfully completed the CAPTCHA challenge.

[w=0][y=1][p=0.7314] Add the calling app's package name to the site key on the reCAPTCHA Admin Console, or disable package name validation for your site key.

[w=0][y=1][p=0.7307] By accessing o

In [32]:
#@title Sample prediction outputs for GIT sources

logger.info(Fore.RED + "GIT" + Style.RESET_ALL)
examples_per_source_type(source_type='git', n_samples=8)

GIT

Permission Denial when trying to access contacts in Android
https://github.com/morenoh149/react-native-contacts/issues/516

[w=1][y=1][p=0.7097] Check permissions before calling Contacts.getAll ( )

[w=0][y=1][p=0.7090] If permissions are not granted, the callback should be called, with the error field being non-null/undefined.

[w=0][y=1][p=0.6979] You must use read profile permission in android platform.

[w=0][y=1][p=0.6974] My PermissionsAndroid is granted and i can not catch the error, still have the crash with API 22 when i make a getAll call.

[w=0][y=1][p=0.6937] Contacts.getAll ( ) crashes Android app when permissions are not granted

[w=0][y=1][p=0.6931] In iOS, permissions aren't granted, it will be handled in the error block:

[w=0][y=1][p=0.6911] Contacts.getAll ( ) crashes Android app when permissions are not granted · Issue # 516 · morenoh149/react-nat

In [33]:
#@title Sample prediction outputs for SO sources

logger.info(Fore.RED + "SO" + Style.RESET_ALL)
examples_per_source_type(source_type='so', n_samples=8)

SO

Don’t leak MockWebServer ports across tests
https://stackoverflow.com/questions/24952513

[w=1][y=1][p=0.7288] The easiest way to simulate network issues with MockWebServer is by setting the SocketPolicy to SocketPolicy.DISCONNECT _ AT_START, SocketPolicy.NO _ RESPONSE or etc:

[w=0][y=1][p=0.7264] As stated in above answers, MockWebServer is a great library for mocking retrofit responses, but you don't need that library for mocking this exception.

[w=0][y=1][p=0.7243] MockRestAdapter offers these APIs:

[w=0][y=1][p=0.7213] For mocking all other exceptions I would recommend MockWebServer, I use it a lot in my project for testing responses.

[w=0][y=1][p=0.7180] ConnectException - mockwebserver can throw a timeout exception.

[w=1][y=1][p=0.7111] I don't know if it's useful, but you can simulate a timeout with MockWebServer:

[w=0][y=1][p=0.7106] Retrofit has a retro

In [34]:
#@title Sample prediction outputs for MISC sources

logger.info(Fore.RED + "MISC" + Style.RESET_ALL)
examples_per_source_type(source_type='misc', n_samples=8)

MISC

Permission Denial when trying to access contacts in Android
https://www.avg.com/en/signal/guide-to-android-app-permissions-how-to-use-them-smartly

[w=0][y=1][p=0.7060] They're the most dangerous, because any app with root privileges can do whatever it wants -- regardless which permissions you've already blocked or enabled.

[w=0][y=1][p=0.7058] See all apps that are using a specific permission This is similar to the method above, but it works from the opposite direction.

[w=0][y=1][p=0.7039] Choose any app, and tap Permissions.

[w=3][y=1][p=0.7031] Now Android allows you to decide which permissions to accept on a case-by-case basis -- after the app is installed.

[w=0][y=1][p=0.7018] The good: Fitness apps need this permission to monitor your heart rate while you exercise, provide health tips, etc..

[w=0][y=1][p=0.7016] If you've installed a camera app, for example, it will n


[w=0][y=1][p=0.7136] Let us create our custom runner class that extends the AndroidJUnitRunner.MockRunner.javaLet me bring you focus to this lineWhen we use this MockRunner for testing our application, rather than using MyTestingApp for creating our application component the test will use UiTestApp.So how does that help ?

--------------------


In [35]:
logger.info(Fore.RED + f"{len(fold_results['venn_diagram_set'])} entries VENN SET" + Style.RESET_ALL)
for _t in fold_results['venn_diagram_set']:
    logger.info(_t)

301 entries VENN SET
To set this up, you'll need a mechanism to tell the app to use the real URL normally, but the mock URL when you run tests.
Next, modify teardown ( ) to stop the server:
Make sure you add it outside of the application tag.
You can then use one of the following classes:
Define a concrete implementation of the ContentProvider class and its required methods.
Of the suggestions proposed, LINK actually combines observable results with each other, which may or may not be what is wanted, but was not asked in the question.
Now that we've defined the basic adapter and ViewHolder, we need to begin filling in our adapter.
Returns the value mapped by name if it exists and is an int or can be coerced to an int, or throws otherwise.
I had the same error and traced it to a bug with DrawableCompat.wrap -LRB- -RRB- in 23.4.0 that doesn't exist in earlier & later versions of the support library.
Important: Normal Permissions must be added to the AndroidManifest:
The system p

Returns the value mapped by name, or throws if no such mapping exists.
In MainActivity, add the EXTRA_MESSAGE constant and the sendMessage ( ) code, as shown:
However, in the API response, we actually get a collection of business JSON in an array.
Consider the following below: If you have a JSON object for `` Vehicle'', it could be a `` Car'' or `` Plane'', each with its own fields, some unique to the other.
If you don't declare any dangerous permissions, or if your app is installed on a device that runs Android 5.1 ( API level 22 ) or lower, the permissions are automatically granted, and you don't need to complete any of the remaining steps on this page.
Next, we need to add method that would manage the deserialization of a JSON dictionary into a populated Business object:
Therefore, Android will always ask you to approve dangerous permissions.
Note that in the zip function, the parameters have concrete types that correspond to the types of the observables being zipped.
And this code 

That line draws a single event.
In earlier versions of Android, accepting potentially dangerous permission groups was an all-or-nothing affair.
Setting up our Model The primary resource in the Yelp API is the Business.
Make sure you are referencing your project's BuildConfig class, not from any of your dependency libraries.
When you create a test with a mock server, the app shouldn't use the real URL.
Every Android app runs in a limited-access sandbox.
In the Adding reCAPTCHA to your app section on the page that appears next, your public and private keys appear under Site key and Secret key, respectively.
The following code snippet shows how to invoke this method:
To display the system permissions dialog when necessary, call the launch ( ) method on the instance of ActivityResultLauncher that you saved in the previous step.
You can also control concurrency, which means coordinating the execution of several coroutines declaratively with Flow.
To learn how to validate the user's response

In particular, your app should make users aware of the features that don't work because of the missing permission.
Here, http://127.0.0.1 is the local URL of your computer and 8080 is the port MockWebServer will use.
Check out this stackoverflow for a discussion on deciding when to replace vs hide and show.
These permission can then be allowed or denied by the user.
The system displays a runtime permission prompt, such as the one shown on the permissions overview page.
Overview A fragment is a reusable class implementing a portion of an activity.
If the user presses and holds the button, then onKeyDown ( ) is called multiple times.
The first step when adding a `` Runtime Permission'' is to add it to the AndroidManifest:
render the page on the prepared bitmap
A content provider manages access to a central repository of data.
A Rectangle whose width or height is exactly zero has location along those axes with zero dimension, but is otherwise considered empty.
Beginning in Android 6.0 -LR

Do not inset the content with any margins from the PrintAttributes as the application is responsible to render it such that the margins are respected.
If the permission you need to add isn't listed under the normal permissions, you'll need to deal with `` Runtime Permissions''.
The user had no way of changing permissions, even after installing the app.
Anyone with HTTP POST knowledge could put random data inside of the g-recaptcha-response form field, and foll your site to make it think that this field was provided by the google widget.
Instead, it should use the mock server's URL.
