<a href="https://colab.research.google.com/github/marquesarthur/vanilla-bert-vs-huggingface/blob/main/hugging_face_keras_bert.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Based on:



1.   https://towardsdatascience.com/hugging-face-transformers-fine-tuning-distilbert-for-binary-classification-tasks-490f1d192379
2.   https://www.analyticsvidhya.com/blog/2020/07/transfer-learning-for-nlp-fine-tuning-bert-for-text-classification/
3.   https://huggingface.co/transformers/training.html#fine-tuning-with-keras




**Problem statement:**


*   A developer has to inspect an **artifact X**.
*   Within the artifact, only a portion of the text is relevant to **input task Y**.
*   We need a model that relates **Y** to the **sentences x ∈ X**.
*   The model must determine: **is x relevant to task Y?**




<br>

___

*Example of a task and an annotated artifact:*

<br>

[<img src="https://i.imgur.com/Zj1317H.jpg">](https://i.imgur.com/Zj1317H.jpg)




* The coloured sentences are those annotated as relevant to the input task. 
* The warmer the colour, the more annotators selected that portion of the text. 
* For simplicity, we process the data at the sentence level and use sentences as the unit of classification.

<br>

___

*Ultimately, our data is a tuple representing:*


*   **text** = artifact sentence

*   **question** = task description

*   **source** = URL of the artifact

*   **category_index** = whether the sentence is relevant to the input task (1) or not (0)

*   **weights** = number of participants who annotated the sentence as relevant
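
As a concrete illustration, one record of this tuple could be modelled as shown below. This is only a sketch: the `Record` class and the `is_relevant` helper are hypothetical names introduced here for illustration and are not part of the notebook, which works with plain dictionaries. The field values are copied from the sample entry printed by the data-loading cell.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical typed view of one (sentence, task) entry."""
    text: str            # artifact sentence
    question: str        # task description
    source: str          # URL of the artifact
    category_index: int  # 1 = relevant, 0 = not relevant
    weights: int         # number of annotators who marked the sentence


def is_relevant(record: Record) -> bool:
    # The binary label the classifier is trained to predict.
    return record.category_index == 1


sample = Record(
    text="Every Android app runs in a limited-access sandbox.",
    question="Permission Denial when trying to access contacts in Android",
    source="https://developer.android.com/training/permissions/requesting",
    category_index=1,
    weights=1,
)
print(is_relevant(sample))  # True
```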


<br>

___



In [1]:
# @title Install dependencies

# !pip install transformers
# %tensorflow_version 2.x

In [2]:
# !pip install scikit-learn tqdm pandas python-Levenshtein path colorama matplotlib seaborn

In [3]:
# !pip install python-Levenshtein

In [4]:
# @title Download git repo
# !git clone https://github.com/marquesarthur/vanilla-bert-vs-huggingface.git

In [5]:
# %cd vanilla-bert-vs-huggingface
# !git pull
# !ls -l

In [6]:
# @title Import data as JSON
import itertools
import json
import logging
import os
import sys
import random
from pathlib import Path

from Levenshtein import ratio
from colorama import Fore, Style

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)
stream_handler = logging.StreamHandler(sys.stdout)
logger.addHandler(stream_handler)

from ds_android import get_input_for_BERT

raw_data = get_input_for_BERT()

print('Sample entry from data:')
print(json.dumps(raw_data[0], indent=4, sort_keys=True))

[31m39 [33m129 [0m https://developer.android.com/training/permissions/requesting
[31m14 [33m21 [0m https://stackoverflow.com/questions/5233543
[31m4 [33m34 [0m https://github.com/morenoh149/react-native-contacts/issues/516
[31m27 [33m63 [0m https://guides.codepath.com/android/Understanding-App-Permissions
[31m9 [33m161 [0m https://www.avg.com/en/signal/guide-to-android-app-permissions-how-to-use-them-smartly
[31m9 [33m15 [0m https://developer.android.com/training/volley/request
[31m14 [33m65 [0m https://stackoverflow.com/questions/28504524
[31m20 [33m59 [0m https://medium.com/@JasonCromer/android-asynctask-http-request-tutorial-6b429d833e28
[31m5 [33m97 [0m https://www.twilio.com/blog/5-ways-to-make-http-requests-in-java
[31m4 [33m12 [0m https://stackoverflow.com/questions/33241952
[31m6 [33m33 [0m https://github.com/realm/realm-java/issues/776
[31m3 [33m17 [0m https://stackoverflow.com/questions/8712652
[31m8 [33m59 [0m https://dzone.com/articles

[31m6 [33m32 [0m https://stackoverflow.com/questions/10630373
[31m4 [33m54 [0m https://developer.android.com/training/gestures/scroll
[31m4 [33m16 [0m https://stackoverflow.com/questions/39588322
[31m20 [33m196 [0m https://developer.android.com/training/dependency-injection/dagger-android
[31m6 [33m44 [0m https://stackoverflow.com/questions/57235136
[31m24 [33m121 [0m https://guides.codepath.com/android/dependency-injection-with-dagger-2
Sample entry from data:
{
    "category_index": 1,
    "question": "Permission Denial when trying to access contacts in Android",
    "source": "https://developer.android.com/training/permissions/requesting",
    "text": "Every Android app runs in a limited-access sandbox.",
    "weights": 1
}


In [7]:
from collections import Counter, defaultdict

cnt = Counter([d['category_index'] for d in raw_data])

total = sum(cnt.values())

labels_cnt = [cnt[0] / float(total), cnt[1] / float(total)]
print('label distribution')
print('')
print('not-relevant -- {:.0f}%'.format(labels_cnt[0] * 100))
print('RELEVANT ------ {:.0f}%'.format(labels_cnt[1] * 100))

label distribution

not-relevant -- 88%
RELEVANT ------ 12%
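
With an 88% / 12% split, accuracy alone is a misleading target, which is why the later cells report macro-averaged metrics. The sketch below works through a trivial baseline that labels every sentence as not relevant; `always_negative_baseline` is a hypothetical helper written for this illustration, and the 88/12 counts simply restate the percentages above.

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall, defined as 0 when both are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def always_negative_baseline(n_negative, n_positive):
    """Scores of a classifier that predicts class 0 for every sentence."""
    total = n_negative + n_positive
    accuracy = n_negative / total
    # Class 0: every prediction is 0, so recall is 1 and precision is the class share.
    f1_negative = f1(precision=n_negative / total, recall=1.0)
    # Class 1: never predicted, so precision, recall and F1 are all 0.
    f1_positive = 0.0
    macro_f1 = (f1_negative + f1_positive) / 2
    return accuracy, macro_f1


accuracy, macro_f1 = always_negative_baseline(n_negative=88, n_positive=12)
print(f"accuracy={accuracy:.2f} macro_f1={macro_f1:.3f}")  # accuracy=0.88 macro_f1=0.468
```

The baseline reaches 88% accuracy while finding no relevant sentence at all, so macro F1 is the more informative number for this dataset.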


In [8]:
seframes = {}
with open('seframes.json') as input_file:
    seframes = json.load(input_file)

In [9]:
def has_meaningful_frame(text):    
    meaning_frames = [
        'Temporal_collocation', 'Execution', 'Using', 'Intentionally_act',
        'Being_obligated', 'Likelihood', 'Causation', 'Required_event',
        'Desiring', 'Awareness', 'Grasp', 'Attempt'
    ]
    
    if text in seframes:
        text_labels = seframes[text]
        if any([elem in meaning_frames for elem in text_labels]):
            return True
        
    return False

In [10]:
fold_results = dict()
if os.path.isfile('bert_ds_synthetic_pyramid.json'):
    logger.info(Fore.YELLOW + "Loading data from cache" + Style.RESET_ALL)
    with open('bert_ds_synthetic_pyramid.json') as input_file:
        fold_results = json.load(input_file)

Loading data from cache


In [11]:
# @title Set environment variables

model_id = 'bert-base-uncased'
# model_id = 'distilbert-base-uncased'

import os
import contextlib
import codecs
import math
import json

import numpy as np
import pandas as pd
import tensorflow as tf

from collections import defaultdict, Counter
from tqdm import tqdm

USE_TPU = False
os.environ['TF_KERAS'] = '1'

# @title Initialize TPU Strategy
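if USE_TPU:
    # tf.contrib was removed in TensorFlow 2.x; the tf.distribute / tf.tpu APIs replace it.
    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=TPU_WORKER)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)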

# sklearn libs
import sklearn
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, f1_score
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from sklearn.metrics import classification_report

# Tensorflow Imports
from tensorflow import keras  # public API; tensorflow.python is internal
import tensorflow.keras.backend as K
from tensorflow.keras.optimizers import Adam
from tensorflow.keras import initializers


# Hugging face imports
from transformers import AutoTokenizer
from transformers import TFDistilBertForSequenceClassification, TFBertForSequenceClassification
from transformers import TFDistilBertModel, DistilBertConfig
from transformers import DistilBertTokenizerFast, BertTokenizerFast

Falling back to TensorFlow client; we recommended you install the Cloud TPU client directly with pip install cloud-tpu-client.


In [12]:
# @title Model parameters

# Bert Model Constants
SEQ_LEN = 64 # 128
BATCH_SIZE = 64 # 64 32 larger batch size causes OOM errors
EPOCHS = 10 # 3 4
LR = 1e-5 # 2e-5

# 3e-4, 1e-4, 5e-5, 3e-5
# My own constants

USE_FRAME_FILTERING = False
UNDERSAMPLING = True
N_UNDERSAMPLING = 2 # ratio of how many samples from 0-class, to 1-class, e.g.: 2:1
USE_DS_SYNTHETIC = False
MIN_W = 3
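
`N_UNDERSAMPLING` sets how many not-relevant sentences are kept per relevant one. The sketch below illustrates that ratio on a toy label list using only the standard library; `undersample` is a hypothetical stand-in for the pandas-based helper defined in the next cell, and the 880/120 label counts are made up to mimic the imbalance reported earlier.

```python
import random


def undersample(labels, ratio, seed=0):
    """Return indices that keep every positive and at most ratio * n_pos negatives."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than exist.
    n_keep = min(len(negatives), int(ratio * len(positives)))
    kept = rng.sample(negatives, n_keep) + positives
    rng.shuffle(kept)  # avoid leaving all positives at the end
    return kept


labels = [0] * 880 + [1] * 120  # toy data, roughly 88% / 12%
kept = undersample(labels, ratio=2)
n_pos = sum(labels[i] for i in kept)
n_neg = len(kept) - n_pos
print(n_neg, n_pos)  # 240 120
```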

In [13]:
# @title JSON to dataframe helper functions
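def undersample_df(df, n_times=3):
    """Keep every relevant row (label 1) and at most n_times as many non-relevant rows (label 0)."""
    c0 = df[df['category_index'] == 0]
    c1 = df[df['category_index'] == 1]
    # Count by label instead of unpacking value_counts(), whose order follows
    # frequency, and never request more rows than class 0 actually has.
    n_samples = min(len(c0), int(n_times * len(c1)))
    df_0 = c0.sample(n_samples)
    
    undersampled_df = pd.concat([df_0, c1], axis=0)
    return undersampled_df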

def get_ds_synthetic_data(min_w=MIN_W, undersample_n=2):
    short_task = {
      "bugzilla": """How to query bugs using the custom fields with the Bugzilla REST API?""",
      "databases": """Which technology should be adopted for the database layer abstraction: Object/Relational Mapping (ORM) or a Java Database Connectivity API (JDBC)?""",
      "gpmdpu": """Can I bind the cmd key to the GPMDPU shortcuts?""",
      "lucene": """How does Lucene compute similarity scores for the BM25 similarity?""",
      "networking": """Which technology should be adopted for the notification system, Server-Sent Events (SSE) or WebSockets?""",
    }

    with open('relevance_corpus.json') as ipf:
        aux = json.load(ipf)
        raw_data = defaultdict(list)
        for d in aux:
            if d['task'] == 'yargs':
                continue

            raw_data['text'].append(d['text'])
            raw_data['question'].append(short_task[d['task']])
            raw_data['source'].append(d['source'])
            raw_data['category_index'].append(1 if d['weight'] > min_w else 0)
            raw_data['weights'].append(d['weight'] if d['weight'] > min_w else 0)
            raw_data['source_type'].append('synthetic_dataset')
 
        data = pd.DataFrame.from_dict(raw_data)
        data = undersample_df(data, n_times=undersample_n)
        data = data.sample(frac=1).reset_index(drop=True)
      
    return data

def get_class_weights(y, smooth_factor=0, upper_bound=5.0):
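    """
    Returns the weight for each class based on the frequencies of the samples.
    The majority class gets weight 1.0; every other class gets majority / count
    (after optional smoothing), capped at upper_bound.
    :param y: list of true labels (the labels must be hashable)
    :param smooth_factor: factor that smooths extremely uneven weights
    :param upper_bound: maximum weight assigned to any class
    :return: dictionary with the weight for each class
    """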
    counter = Counter(y)

    if smooth_factor > 0:
        p = max(counter.values()) * smooth_factor
        for k in counter.keys():
            counter[k] += p

    majority = max(counter.values())

    clazz = {cls: float(majority / count) for cls, count in counter.items()}
    result = {}
    for key, value in clazz.items():
        if value > upper_bound:
            value = upper_bound
        
        result[key] = value
    return result

def add_raw_data(result, data):
    s = data['source']
    if 'docs.oracle' in s or 'developer.android' in s:
        source_type = 'api'
    elif 'stackoverflow.com' in s:
        source_type = 'so'
    elif 'github.com' in s:
        source_type = 'git'
    else:
        source_type = 'misc'
    pyramid = 1 if data['weights'] >= 1 else 0
    
    result['text'].append(data['text'])
    result['question'].append(data['question'])
    result['source'].append(data['source'])
    result['category_index'].append(pyramid)
    result['weights'].append(data['weights'])
    result['source_type'].append(source_type)
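
To make the class-weight computation concrete, the sketch below reproduces it without smoothing. `class_weights` is a simplified, hypothetical re-implementation of `get_class_weights` written for this illustration. The 470/235 counts match the label counts logged for each fold, for which the notebook reports `{0: 1.0, 1: 2.0}`; the 880/120 counts are made up to show the cap.

```python
from collections import Counter


def class_weights(labels, upper_bound=5.0):
    """Weight = majority count / class count, capped at upper_bound."""
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: min(majority / n, upper_bound) for cls, n in counts.items()}


balanced = class_weights([0] * 470 + [1] * 235)
skewed = class_weights([0] * 880 + [1] * 120)
print(balanced)  # {0: 1.0, 1: 2.0}
print(skewed)    # {0: 1.0, 1: 5.0}  (880 / 120 is about 7.33, capped at 5.0)
```

The cap keeps a very rare class from dominating the loss when the data is not undersampled first.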


In [14]:
# @title Tokenizer

print(model_id)
if model_id == 'distilbert-base-uncased':
    tokenizer = DistilBertTokenizerFast.from_pretrained(model_id, cache_dir='/home/msarthur/scratch', local_files_only=True)
else:
    tokenizer = BertTokenizerFast.from_pretrained(model_id, cache_dir='/home/msarthur/scratch', local_files_only=True)

bert-base-uncased


In [15]:
tokenizer

PreTrainedTokenizerFast(name_or_path='bert-base-uncased', vocab_size=30522, model_max_len=512, is_fast=True, padding_side='right', special_tokens={'unk_token': '[UNK]', 'sep_token': '[SEP]', 'pad_token': '[PAD]', 'cls_token': '[CLS]', 'mask_token': '[MASK]'})

In [16]:
# @title data encoder

def _encode(tokenizer, dataframe, max_length=SEQ_LEN):
    
    seq_a = dataframe['text'].tolist()
    seq_b = dataframe['question'].tolist()
    
    return tokenizer(seq_a, seq_b, truncation=True, padding=True, max_length=max_length)

def to_one_hot_encoding(data, nb_classes = 2):
    targets = np.array([data]).reshape(-1)
    one_hot_targets = np.eye(nb_classes)[targets]
    return one_hot_targets    
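
`_encode` passes the artifact sentence and the task description to the tokenizer as a sentence pair. The sketch below shows the resulting layout, `[CLS] text [SEP] question [SEP]`, on whitespace tokens. It is a toy illustration only: `encode_pair` is a hypothetical function, real WordPiece tokenization splits words differently, the truncation rule is simplified, and this version always pads to `max_length`, whereas `padding=True` in the notebook pads to the longest sequence in the batch.

```python
def encode_pair(text, question, max_length):
    """Toy sentence-pair encoding with truncation and padding."""
    if max_length < 3:
        raise ValueError("max_length must leave room for [CLS] and two [SEP] tokens")
    a, b = text.lower().split(), question.lower().split()
    # Three positions are reserved for [CLS] and the two [SEP] markers.
    while len(a) + len(b) > max_length - 3:
        # Trim one token at a time from the longer segment.
        if len(a) > len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment ids: 0 for the sentence (and its markers), 1 for the question.
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    attention_mask = [1] * len(tokens)
    pad = max_length - len(tokens)
    return {
        "tokens": tokens + ["[PAD]"] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }


example = encode_pair(
    "Every Android app runs in a sandbox",
    "Permission Denial in Android",
    max_length=16,
)
truncated = encode_pair("a b c d e f", "g h", max_length=8)
print(example["tokens"])
print(truncated["tokens"])
```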

In [17]:
# @title Metrics & Logging functions

from sklearn.metrics import classification_report

recommendation_metrics = defaultdict(list)
prediction_metrics = defaultdict(list)
api_metrics = defaultdict(list)
so_metrics = defaultdict(list)
git_metrics = defaultdict(list)
misc_metrics = defaultdict(list)

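classification_report_lst = []
clz_report_lst = defaultdict(list)  # precision/recall of the relevant class, filled by aggregate_report_metrics
log_examples_lst = []
source_lst = []
venn_diagram_set = []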

def aggregate_macro_metrics(store_at, precision, recall, fscore):   
    store_at['precision'].append(precision)
    store_at['recall'].append(recall)
    store_at['fscore'].append(fscore)
    
    
def aggregate_macro_source_metrics(precision, recall, fscore, source):
    s = source
    if 'docs.oracle' in s or 'developer.android' in s:
        aggregate_macro_metrics(api_metrics, precision, recall, fscore)
    elif 'stackoverflow.com' in s:
        aggregate_macro_metrics(so_metrics, precision, recall, fscore)
    elif 'github.com' in s:
        aggregate_macro_metrics(git_metrics, precision, recall, fscore)        
    else:
        aggregate_macro_metrics(misc_metrics, precision, recall, fscore)
    

def aggregate_recommendation_metrics(store_at, k, precision_at_k, pyramid_precision_at_k):
    store_at['k'].append(k)
    store_at['precision'].append(precision_at_k)
    store_at['∆ precision'].append(pyramid_precision_at_k)
    
def aggregate_report_metrics(clz_report):
    relevant_label = str(1)
    if relevant_label in clz_report:
        for _key in ['precision', 'recall']:
            if _key in clz_report[relevant_label]:
                clz_report_lst[_key].append(clz_report[relevant_label][_key])    
                
def log_examples(task_title, source, text, pweights, y_predict, y_probs, k=10):
    # get the predicted prob at every index
    idx_probs = [(idx, y_predict[idx], y_probs[idx]) for idx, _ in enumerate(y_predict)]
    
    # keep only the indexes predicted as relevant
    idx_probs = [entry for entry in idx_probs if entry[1] == 1]
    
    most_probable = sorted(idx_probs, key=lambda i: i[2], reverse=True)
    
    result = [idx for idx, _, _ in most_probable][:k]
    
    for idx in result:
        log_examples_lst.append((
            source, 
            task_title,
            pweights[idx],
            y_predict[idx],
            y_probs[idx],
            text[idx]
        ))
        
def log_venn_diagram(y_true, y_predicted, text):
    cnt = 0
    try:
        for _true, _predict, _t in zip(y_true, y_predicted, text):
            if _true == 1 and _predict == 1:
                cnt += 1
                venn_diagram_set.append(_t)
    except Exception as ex:
        logger.info(str(ex))
    logger.info(Fore.RED + str(cnt) + Style.RESET_ALL + " entries logged")

    
def avg_macro_metric_for(data):
    __precision = data['precision']
    __recall = data['recall']
    __fscore = data['fscore']

    return np.mean(__precision), np.mean(__recall), np.mean(__fscore)        
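
The selection step inside `log_examples` (keep the sentences predicted as relevant, then rank them by predicted probability) can be sketched in isolation. `top_k_relevant` is a hypothetical helper written for this illustration, and the predictions and probabilities are toy values.

```python
def top_k_relevant(y_predict, y_probs, k=10):
    """Indices of sentences predicted relevant, most probable first."""
    candidates = [
        (idx, prob)
        for idx, (label, prob) in enumerate(zip(y_predict, y_probs))
        if label == 1
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [idx for idx, _ in candidates[:k]]


y_predict = [0, 1, 1, 0, 1]
y_probs = [0.10, 0.62, 0.91, 0.45, 0.77]
print(top_k_relevant(y_predict, y_probs, k=2))  # [2, 4]
```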

In [18]:
#@title Training procedures

def get_train_val_test(task_uid, size=0.9, undersample=False, aug=True, undersample_n=3, min_w=2):
    if not isinstance(task_uid, list):
        task_uid = [task_uid]
        
    train_data_raw = defaultdict(list)
    test_data_raw = defaultdict(list)
    
    for _data in tqdm(CORPUS):
        if _data['question'] in task_uid:
            add_raw_data(test_data_raw, _data)
        
    
    train_val = get_ds_synthetic_data(undersample_n=undersample_n, min_w=min_w)
    test = pd.DataFrame.from_dict(test_data_raw)
    
    # https://stackoverflow.com/questions/29576430/shuffle-dataframe-rows
    #  randomize rows....    
    train_val = train_val.sample(frac=1).reset_index(drop=True)
    test = test.sample(frac=1).reset_index(drop=True)
    
    print(train_val['category_index'].value_counts())

    
    weights = get_class_weights(train_val['category_index'].tolist())
    
    train, val = train_test_split(
        train_val, 
        stratify=train_val['category_index'].tolist(), 
        train_size=size
    )
    
    return train, val, test, weights        

In [19]:
def update_predictions(task_title, text, y_predict, y_probs, relevant_class=1):
    result = []
    
    for _t, _y, _prob in zip(text, y_predict, y_probs):
        if _y == relevant_class:
            if has_meaningful_frame(_t):
                result.append(_y)
            else:
                result.append(0)
        else:
            result.append(_y)
    
    return result    
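
The frame-based post-filter keeps a positive prediction only when the sentence carries at least one of the "meaningful" semantic frames. The sketch below shows the idea on toy data. `filter_by_frames` and `MEANINGFUL` are hypothetical names; the three frame names are a subset of the list in `has_meaningful_frame`, while the sentences and the `Judgment` label are invented for the example.

```python
MEANINGFUL = {"Using", "Required_event", "Attempt"}  # subset of the notebook's list


def filter_by_frames(sentences, predictions, frames_by_sentence):
    """Downgrade positive predictions whose sentence has no meaningful frame."""
    filtered = []
    for sentence, label in zip(sentences, predictions):
        frames = frames_by_sentence.get(sentence, [])
        keep = label == 1 and any(frame in MEANINGFUL for frame in frames)
        filtered.append(1 if keep else 0)
    return filtered


sentences = [
    "Use requestPermissions() to ask for access.",
    "Thanks, that worked!",
    "See the changelog.",
]
frames_by_sentence = {
    sentences[0]: ["Using"],
    sentences[1]: ["Judgment"],
}
print(filter_by_frames(sentences, [1, 1, 0], frames_by_sentence))  # [1, 0, 0]
```

The filter can only turn predicted positives into negatives, so it trades recall for precision.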

In [20]:
# @title Testing procedures

# https://medium.com/geekculture/hugging-face-distilbert-tensorflow-for-custom-text-classification-1ad4a49e26a7
def eval_model(model, test_data):
    preds = model.predict(test_data.batch(1)).logits  
    
    #transform to array with probabilities
    res = tf.nn.softmax(preds, axis=1).numpy()      

    return res.argmax(axis=-1), res[:, 1]

def test_model(source, df_test, model, tokenizer, pos_filter=False):
    
    df_source = df_test[df_test["source"] == source]   
    task_title = df_source['question'].tolist()[0]
    text = df_source['text'].tolist()
    pweights = df_source['weights'].tolist()
    
    # Encode X_test
    test_encodings = _encode(tokenizer, df_source)
    test_labels = df_source['category_index'].tolist()
    
    test_dataset = tf.data.Dataset.from_tensor_slices((
        dict(test_encodings),
        test_labels
    ))
    
    y_true = [y.numpy() for x, y in test_dataset]
    
    # Artifacts without any sentence labelled relevant (per the original note:
    # nothing highlighted by two or more annotators) are ignored.
    if any(y == 1 for y in y_true):
        y_predict, y_probs = eval_model(model, test_dataset)

        if pos_filter:
            y_predict = update_predictions(task_title, text, y_predict, y_probs)


        accuracy = accuracy_score(y_true, y_predict)
        macro_f1 = f1_score(y_true, y_predict, average='macro')

        classification_report_lst.append(classification_report(y_true, y_predict))
        aggregate_report_metrics(classification_report(y_true, y_predict, output_dict=True))


        logger.info("-" * 20)    

        logger.info("Y")
        logger.info("[0s] {} [1s] {}".format(
            len(list(filter(lambda k: k== 0, y_true))),
            len(list(filter(lambda k: k== 1, y_true)))
        ))


        logger.info("predicted")
        logger.info("[0s] {} [1s] {}".format(
            len(list(filter(lambda k: k== 0, y_predict))),
            len(list(filter(lambda k: k== 1, y_predict)))
        ))

        logger.info("-" * 20)

        logger.info("Accuracy: {:.4f}".format(accuracy))
        logger.info("macro_f1: {:.4f}".format(macro_f1))

        precision, recall, fscore, _ = precision_recall_fscore_support(y_true, y_predict, average='macro')

        aggregate_macro_metrics(prediction_metrics, precision, recall, fscore)
        aggregate_macro_source_metrics(precision, recall, fscore, source)

        logger.info("Precision: {:.4f}".format(precision))
        logger.info("Recall: {:.4f}".format(recall))
        logger.info("F1: {:.4f}".format(fscore))

        log_examples(task_title, source, text, pweights, y_predict, y_probs, k=10)
        log_venn_diagram(y_true, y_predict, text)
        source_lst.append(source)
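
The macro-averaged precision, recall and F1 logged by `test_model` can be checked by hand. The sketch below recomputes them in plain Python for one artifact that appears in the logged output: 2 not-relevant and 3 relevant sentences, all five predicted as not relevant. `macro_scores` is a hypothetical helper; it treats an undefined precision or recall as 0, which is an assumption chosen to match the scores the log reports for this case.

```python
def macro_scores(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class precision, recall and F1."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = sum(t == label for t in y_true)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0]
precision, recall, f1 = macro_scores(y_true, y_pred)
print(f"{precision:.4f} {recall:.4f} {f1:.4f}")  # 0.2000 0.5000 0.2857
```

Class 0 scores precision 0.4 and recall 1.0, class 1 scores 0 on both, and the averages reproduce the logged 0.2000 / 0.5000 / 0.2857.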

In [21]:
def add_idx_fold_results(idx_split, store_at):
    if idx_split not in store_at:
        store_at[idx_split] = dict()
        store_at[idx_split]['run_cnt'] = 0
        store_at[idx_split]['overall'] = defaultdict(list)
        store_at[idx_split]['api'] = defaultdict(list)
        store_at[idx_split]['so'] = defaultdict(list)
        store_at[idx_split]['git'] = defaultdict(list)
        store_at[idx_split]['misc'] = defaultdict(list)
    
    store_at[idx_split]['run_cnt'] += 1
    
    _precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)
    store_at[idx_split]['overall']['precision'].append(_precision)
    store_at[idx_split]['overall']['recall'].append(_recall)
    store_at[idx_split]['overall']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(api_metrics)
    store_at[idx_split]['api']['precision'].append(_precision)
    store_at[idx_split]['api']['recall'].append(_recall)
    store_at[idx_split]['api']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(so_metrics)
    store_at[idx_split]['so']['precision'].append(_precision)
    store_at[idx_split]['so']['recall'].append(_recall)
    store_at[idx_split]['so']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(git_metrics)
    store_at[idx_split]['git']['precision'].append(_precision)
    store_at[idx_split]['git']['recall'].append(_recall)
    store_at[idx_split]['git']['fscore'].append(_f1score)  
    
    _precision, _recall, _f1score = avg_macro_metric_for(misc_metrics)
    store_at[idx_split]['misc']['precision'].append(_precision)
    store_at[idx_split]['misc']['recall'].append(_recall)
    store_at[idx_split]['misc']['fscore'].append(_f1score)  

In [22]:
# model = TFBertForSequenceClassification.from_pretrained(model_id, cache_dir='/home/msarthur/scratch', local_files_only=True)
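
The cross-validation cell that follows splits the list of tasks, not individual sentences, so every sentence of a task lands in the same fold. The sketch below reproduces that kind of contiguous, unshuffled split in plain Python. `contiguous_folds` is a hypothetical helper written for this illustration, and the task names are placeholders.

```python
def contiguous_folds(items, n_splits):
    """Yield (train, test) lists; fold sizes differ by at most one item."""
    base, extra = divmod(len(items), n_splits)
    start = 0
    for fold in range(n_splits):
        # The first `extra` folds take one additional item.
        size = base + (1 if fold < extra else 0)
        test = items[start:start + size]
        train = items[:start] + items[start + size:]
        yield train, test
        start += size


tasks = [f"task-{i}" for i in range(12)]  # placeholder task titles
folds = list(contiguous_folds(tasks, n_splits=5))
sizes = [len(test) for _, test in folds]
print(sizes)  # [3, 3, 2, 2, 2]
```

Because the folds are taken in order, any randomness has to come from shuffling the task list beforehand, as the notebook does with a seeded `random.shuffle`.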

In [23]:
# @title 10-fold cross validation WIP
CORPUS = raw_data

all_tasks = sorted(list(set([d['question'] for d in raw_data])))
rseed = 20210343
random.seed(rseed)
random.shuffle(all_tasks)

from sklearn.model_selection import KFold


file_handler = logging.FileHandler('/home/msarthur/scratch/LOG-bert_ds_synthetic_pyramid.ans')
file_handler.setLevel(logging.DEBUG)
logger.addHandler(file_handler)


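n_splits = 10
# random_state only applies when shuffle=True (recent scikit-learn versions raise
# a ValueError otherwise); the task list is already shuffled above with the seeded
# `random` module, so the folds are taken in order.
kf = KFold(n_splits=n_splits)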
np_tasks_arr = np.array(all_tasks)



idx_split = 0
for train_index, test_index in kf.split(np_tasks_arr):

    idx_split = str(idx_split)
    eval_fold = True
    # 10 runs per fold to avoid reporting peak results in a given fold
    if idx_split in fold_results and fold_results[idx_split]['run_cnt'] >= 10:
        logger.info(Fore.RED + f"Fold {idx_split} FULLY TESTED" + Style.RESET_ALL)
        eval_fold = False


    if eval_fold:
        # <------------------------------------------------------------------------- EVAL VARIABLES
        recommendation_metrics = defaultdict(list)
        prediction_metrics = defaultdict(list)
        api_metrics = defaultdict(list)
        so_metrics = defaultdict(list)
        git_metrics = defaultdict(list)
        misc_metrics = defaultdict(list)
        random_prediction_metrics = defaultdict(list)
        clz_report_lst = defaultdict(list)

        classification_report_lst = []
        log_examples_lst = []
        source_lst = []
        venn_diagram_set = []
        # <------------------------------------------------------------------------- EVAL VARIABLES


        test_tasks_lst = np_tasks_arr[test_index].tolist()

        logger.info("")
        logger.info(Fore.RED + f"Fold {idx_split}" + Style.RESET_ALL)
        logger.info('\n'.join(test_tasks_lst))

        # <------------------------------------------------------------------------- INPUT
        df_train, df_val, df_test, weights = get_train_val_test(
            test_tasks_lst,
            aug=USE_DS_SYNTHETIC,
            undersample=UNDERSAMPLING, 
            undersample_n=N_UNDERSAMPLING
        )
        # <------------------------------------------------------------------------- INPUT

        logger.info('-' * 10)
        logger.info(Fore.RED + 'train'+ Style.RESET_ALL)
        logger.info(str(df_train.category_index.value_counts()))
        logger.info("")

        logger.info(Fore.RED + 'test'+ Style.RESET_ALL)
        logger.info(str(df_test.category_index.value_counts()))
        logger.info("")

        logger.info(Fore.RED + 'weights'+ Style.RESET_ALL)
        logger.info(str(weights))
        logger.info('-' * 10)


        # Encode X_train
        train_encodings = _encode(tokenizer, df_train)
        train_labels = df_train['category_index'].tolist()

        # Encode X_valid
        val_encodings = _encode(tokenizer, df_val)
        val_labels = df_val['category_index'].tolist()


        # https://huggingface.co/transformers/custom_datasets.html
        train_dataset = tf.data.Dataset.from_tensor_slices((
            dict(train_encodings),
            train_labels
        ))

        val_dataset = tf.data.Dataset.from_tensor_slices((
            dict(val_encodings),
            val_labels
        ))


        if model_id == 'distilbert-base-uncased':
            model = TFDistilBertForSequenceClassification.from_pretrained(
                model_id, cache_dir='/home/msarthur/scratch'
            )
        else:
            model = TFBertForSequenceClassification.from_pretrained(
                model_id, cache_dir='/home/msarthur/scratch', local_files_only=True
            )

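        # freeze all the parameters (the snippet below is PyTorch-style; with the
        # TF/Keras models used here the equivalent would be setting `trainable = False`
        # on the encoder layer, e.g. model.bert.trainable = False)
        # for param in model.parameters():
        #   param.requires_grad = False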


        optimizer = tf.keras.optimizers.Adam(learning_rate=LR)
        loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

        METRICS = [
            tf.keras.metrics.SparseCategoricalAccuracy()
        ]

        early_stopper = tf.keras.callbacks.EarlyStopping(
            monitor='val_loss', mode='min', patience=4, 
            verbose=1, restore_best_weights=True
        )

        # https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/ModelCheckpoint
        checkpoint_filepath = '/home/msarthur/scratch/best_synthetic_pyramid_model'

        mc = tf.keras.callbacks.ModelCheckpoint(
            checkpoint_filepath, 
            monitor='val_loss', mode='min', verbose=1, 
            save_best_only=True,
            save_weights_only=True
        )

        model.compile(
            optimizer=optimizer,
            loss=loss_fn,
            metrics=METRICS
        )

        # https://discuss.huggingface.co/t/how-to-dealing-with-data-imbalance/393/3
        # https://wandb.ai/ayush-thakur/huggingface/reports/Early-Stopping-in-HuggingFace-Examples--Vmlldzo0MzE2MTM
        model.fit(
            train_dataset.shuffle(1000).batch(BATCH_SIZE), 
            epochs=EPOCHS, 
            # batch_size is not passed: the tf.data datasets are already batched
            class_weight=weights,
            validation_data=val_dataset.shuffle(1000).batch(BATCH_SIZE),
            callbacks=[early_stopper, mc]
        )

        model.load_weights(checkpoint_filepath)

        logger.info("")
        logger.info(Fore.RED + f"Testing model" + Style.RESET_ALL)
        for source in df_test["source"].unique():
            df_source = df_test[df_test["source"] == source]   
            logger.info(source)
            test_model(source, df_source, model, tokenizer, pos_filter=USE_FRAME_FILTERING)

        add_idx_fold_results(idx_split, fold_results)
        if 'venn_diagram_set' not in fold_results:
            fold_results['venn_diagram_set'] = []

        fold_results['venn_diagram_set'] += venn_diagram_set
        fold_results['venn_diagram_set'] = list(set(fold_results['venn_diagram_set']))


        _precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)

        logger.info("")
        logger.info(Fore.YELLOW + "Model metrics" + Style.RESET_ALL)
        logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
        logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
        logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)




        log_sources_data = [api_metrics, so_metrics, git_metrics, misc_metrics]
        log_sources_ids = ['api_metrics', 'so_metrics', 'git_metrics', 'misc_metrics']

        for _id, __data in zip(log_sources_ids, log_sources_data):
            _precision, _recall, _f1score = avg_macro_metric_for(__data)

            logger.info("")
            logger.info(Fore.YELLOW + f"{_id}" + Style.RESET_ALL)
            logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
            logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
            logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)


    idx_split = int(idx_split)
    idx_split += 1
    logger.info(f"next {idx_split}")
#     break
#         if idx_split >= 7:
#             logger.info(f"breaking at {idx_split}")
#             break


Fold 0
how can i get the value of text view in recyclerview item?
Hide MarkerView when nothing selected
How to check programmatically whether app is running in debug mode or not?
JSONObject parse dictionary objects
Want to add drawable icons insteadof colorful dots


100%|██████████| 7948/7948 [00:00<00:00, 2294942.05it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    669
1     66
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
AutoGraph could not transform <bound method Socket.send of <zmq.sugar.socket.Socket object at 0x2afb29ef83d0>> and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
The parameter `return_dict` cannot be set in graph mode an

--------------------
Y
[0s] 23 [1s] 8
predicted
[0s] 8 [1s] 23
--------------------
Accuracy: 0.3871
macro_f1: 0.3871
Precision: 0.5054
Recall: 0.5054
F1: 0.3871
6 entries logged
https://stackoverflow.com/questions/23844667
--------------------
Y
[0s] 23 [1s] 5
predicted
[0s] 21 [1s] 7
--------------------
Accuracy: 0.7143
macro_f1: 0.5758
Precision: 0.5714
Recall: 0.5913
F1: 0.5758
2 entries logged
https://github.com/SundeepK/CompactCalendarView/issues/181
--------------------
Y
[0s] 33 [1s] 3
predicted
[0s] 20 [1s] 16
--------------------
Accuracy: 0.6389
macro_f1: 0.5353
Precision: 0.5938
Recall: 0.8030
F1: 0.5353
3 entries logged
https://stackoverflow.com/questions/37096547
--------------------
Y
[0s] 12 [1s] 5
predicted
[0s] 9 [1s] 8
--------------------
Accuracy: 0.4706
macro_f1: 0.4396
Precision: 0.4583
Recall: 0.4500
F1: 0.4396
2 entries logged

Model metrics
precision: 0.548
recall:    0.562
f1-score:  0.436

100%|██████████| 7948/7948 [00:00<00:00, 1868837.77it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    606
1     98
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.67741, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.67741 to 0.65933, saving model to /home/msarthur/scratch/best_s

  _warn_prf(average, modifier, msg_start, len(result))
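
The `_warn_prf` line above comes from scikit-learn's metric code, which warns when precision or recall is undefined for a class and substitutes 0. The per-artifact numbers in this output are consistent with macro-averaging over the two classes. As a minimal sketch (plain Python, not the notebook's own evaluation code, which is not shown here), the hypothetical helper `macro_prf` below reproduces the report in which `Y` holds two 0s and three 1s and all five sentences are predicted as 0.

```python
def macro_prf(y_true, y_pred, labels=(0, 1)):
    """Macro-averaged precision, recall and F1; undefined ratios count as 0."""
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Two irrelevant and three relevant sentences, all predicted as irrelevant.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0]

precision, recall, f1 = macro_prf(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(f"Accuracy: {accuracy:.4f}")    # Accuracy: 0.4000
print(f"Precision: {precision:.4f}")  # Precision: 0.2000
print(f"Recall: {recall:.4f}")        # Recall: 0.5000
print(f"F1: {f1:.4f}")                # F1: 0.2857
```

Class 1 is never predicted, so its precision is undefined and counted as 0, which is what pulls the macro precision down to 0.2.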


--------------------
Y
[0s] 11 [1s] 5
predicted
[0s] 12 [1s] 4
--------------------
Accuracy: 0.5625
macro_f1: 0.4589
Precision: 0.4583
Recall: 0.4636
F1: 0.4589
1 entries logged
https://stackoverflow.com/questions/38980595
--------------------
Y
[0s] 2 [1s] 3
predicted
[0s] 5 [1s] 0
--------------------
Accuracy: 0.4000
macro_f1: 0.2857
Precision: 0.2000
Recall: 0.5000
F1: 0.2857
0 entries logged
https://stackoverflow.com/questions/40168601
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
--------------------
Y
[0s] 1 [1s] 3
predicted
[0s] 3 [1s] 1
--------------------
Accuracy: 0.5000
macro_f1: 0.5000
Precision: 0.6667
Recall: 0.6667
F1: 0.5000
1 entries logged

Model metri

100%|██████████| 7948/7948 [00:00<00:00, 1683993.14it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    1178
1     180
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.71270, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.71270 to 0.63430, saving model to /home/msarthur/scratch/best_s

macro_f1: 0.4415
Precision: 0.5456
Recall: 0.6083
F1: 0.4415
3 entries logged
https://stackoverflow.com/questions/24952513
--------------------
Y
[0s] 23 [1s] 4
predicted
[0s] 18 [1s] 9
--------------------
Accuracy: 0.7407
macro_f1: 0.6454
Precision: 0.6389
Recall: 0.7446
F1: 0.6454
3 entries logged
https://dzone.com/articles/rxjava-idiomatic-concurrency-flatmap-vs-parallel
--------------------
Y
[0s] 106 [1s] 11
predicted
[0s] 78 [1s] 39
--------------------
Accuracy: 0.6068
macro_f1: 0.4150
Precision: 0.4679
Recall: 0.4164
F1: 0.4150
2 entries logged
https://stackoverflow.com/questions/5233543
--------------------
Y
[0s] 7 [1s] 14
predicted
[0s] 5 [1s] 16
--------------------
Accuracy: 0.8095
macro_f1: 0.7667
Precision: 0.8063
Recall: 0.7500
F1: 0.7667
13 entries logged
https://medium.com/mindorks/instrumentation-testing-with-mockwebserver-and-dagger2-56778477f0cf
--------------------
Y
[0s] 69 [1s] 3
predicted
[0s] 54 [1s] 18
--------------------

100%|██████████| 7948/7948 [00:00<00:00, 1762241.80it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    673
1     97
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.71560, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.71560 to 0.66843, saving model to /home/msarthur/scratch/best_s


Fold 4
Android: rotate canvas around the center of the screen
TS shows numbers instead of contact names in notifications
No lock screen controls ever
Enums support with Realm?
Sound panning should work for stereo files (and if not, add it to the docs)


  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)
100%|██████████| 7948/7948 [00:00<00:00, 2155458.95it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    235
1     41
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.65116, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.65116 to 0.64955, saving model to /home/msarthur/scratch/best_s

100%|██████████| 7948/7948 [00:00<00:00, 1888101.96it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    752
1    115
Name: category_index, dtype: int64

weights
{1: 2.0, 0: 1.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.65030, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.65030 to 0.57656, saving model to /home/msarthur/scratch/best_s

predicted
[0s] 14 [1s] 24
--------------------
Accuracy: 0.4211
macro_f1: 0.3780
Precision: 0.5268
Recall: 0.5662
F1: 0.3780
3 entries logged
https://developer.android.com/guide/navigation/navigation-swipe-view-2
--------------------
Y
[0s] 16 [1s] 3
predicted
[0s] 5 [1s] 14
--------------------
Accuracy: 0.3158
macro_f1: 0.3081
Precision: 0.4714
Recall: 0.4583
F1: 0.3081
2 entries logged
https://stackoverflow.com/questions/36275986
--------------------
Y
[0s] 19 [1s] 5
predicted
[0s] 12 [1s] 12
--------------------
Accuracy: 0.6250
macro_f1: 0.5901
Precision: 0.6250
Recall: 0.6895
F1: 0.5901
4 entries logged

Model metrics
precision: 0.533
recall:    0.578
f1-score:  0.395

api_metrics
precision: 0.495
recall:    0.495
f1-score:  0.322

so_metrics
precision: 0.557
recall:    0.632
f1-score:  0.449

git_metrics
precision: nan
recall:  

  out=out, **kwargs)
  ret = ret.dtype.type(ret / rcount)
100%|██████████| 7948/7948 [00:00<00:00, 1734189.68it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    1182
1     169
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.64676, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.64676 to 0.61618, saving model to /home/msarthur/scratch/best_s

macro_f1: 0.3703
Precision: 0.4799
Recall: 0.4444
F1: 0.3703
3 entries logged
https://github.com/nostra13/Android-Universal-Image-Loader/issues/462
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
--------------------
Y
[0s] 19 [1s] 3
predicted
[0s] 20 [1s] 2
--------------------
Accuracy: 0.7727
macro_f1: 0.4359
Precision: 0.4250
Recall: 0.4474
F1: 0.4359
0 entries logged
https://www.i-programmer.info/programming/android/8521-android-adventures-menus-a-the-action-bar.html?start=1
--------------------
Y
[0s] 49 [1s] 5
predicted
[0s] 30 [1s] 24
--------------------
Accuracy: 0.5741
macro_f1: 0.4579
Precision: 0.5292
Recall: 0.5857
F1: 0.4579
3 entries logged
https://stackoverflow.c

100%|██████████| 7948/7948 [00:00<00:00, 1666233.23it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    835
1     86
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.65799, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss did not improve from 0.65799
Epoch 3/10
Epoch 00003: val_loss did not improve f
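
The `val_loss improved ... saving model` and `val_loss did not improve` messages have the shape that Keras's `ModelCheckpoint` callback prints when it monitors `val_loss` and keeps only the best model. The callback configuration is not shown in this section, so what follows is a plain-Python sketch of that bookkeeping only. `track_best` is a hypothetical helper, and the second and third loss values are invented for illustration (only 0.65799 appears in the log).

```python
import math


def track_best(val_losses):
    """Return the best loss and one log message per epoch."""
    best = math.inf
    messages = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            messages.append(
                f"Epoch {epoch:05d}: val_loss improved from {best:.5f} to {loss:.5f}"
            )
            best = loss  # this is the point where a checkpoint would be written
        else:
            messages.append(
                f"Epoch {epoch:05d}: val_loss did not improve from {best:.5f}"
            )
    return best, messages


# 0.65799 is taken from the log; 0.66120 and 0.67003 are made-up values.
best, messages = track_best([0.65799, 0.66120, 0.67003])
for line in messages:
    print(line)
```

Because only improvements overwrite the saved model, the checkpoint on disk after this run would still be the one from epoch 1.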

--------------------
Y
[0s] 4 [1s] 7
predicted
[0s] 7 [1s] 4
--------------------
Accuracy: 0.5455
macro_f1: 0.5455
Precision: 0.5893
Recall: 0.5893
F1: 0.5455
3 entries logged

Model metrics
precision: 0.498
recall:    0.544
f1-score:  0.388

api_metrics
precision: 0.439
recall:    0.492
f1-score:  0.269

so_metrics
precision: 0.519
recall:    0.562
f1-score:  0.459

git_metrics
precision: 0.612
recall:    0.708
f1-score:  0.617

misc_metrics
precision: 0.518
recall:    0.533
f1-score:  0.369
next 8

Fold 8
SeekTo Position of cutted song not working
Android Gallery with pinch zoom
Wait for 2 async REST calls to result in success or error
how  to set Screenshot frame size


100%|██████████| 7948/7948 [00:00<00:00, 2030227.05it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    337
1     51
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.65710, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.65710 to 0.62825, saving model to /home/msarthur/scratch/best_s

100%|██████████| 7948/7948 [00:00<00:00, 2043794.26it/s]

0    470
1    235
Name: category_index, dtype: int64
----------
train
0    423
1    211
Name: category_index, dtype: int64

test
0    493
1     85
Name: category_index, dtype: int64

weights
{0: 1.0, 1: 2.0}
----------



All model checkpoint layers were used when initializing TFBertForSequenceClassification.

Some layers of TFBertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Epoch 1/10
The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained('name', output_attentions=True)`).
The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.

Epoch 00001: val_loss improved from inf to 0.67288, saving model to /home/msarthur/scratch/best_synthetic_pyramid_model
Epoch 2/10
Epoch 00002: val_loss improved from 0.67288 to 0.65153, saving model to /home/msarthur/scratch/best_s

F1: 0.4667
1 entries logged

Model metrics
precision: 0.554
recall:    0.615
f1-score:  0.426

api_metrics
precision: 0.500
recall:    0.519
f1-score:  0.264

so_metrics
precision: 0.579
recall:    0.616
f1-score:  0.489

git_metrics
precision: 0.551
recall:    0.732
f1-score:  0.517

misc_metrics
precision: 0.557
recall:    0.639
f1-score:  0.420
next 10


In [24]:
# Pool the 'overall' metrics of every fold, logging each fold's mean and raw values.
__precision, __recall, __fscore = [], [], []

for key_i, value in fold_results.items():
    # Skip entries without per-fold metrics (e.g. 'venn_diagram_set').
    if not isinstance(value, dict) or 'overall' not in value:
        continue
    __data = value['overall']
    logger.info(Fore.YELLOW + f"{key_i}" + Style.RESET_ALL)
    for __label, __metric in (("precision: ", 'precision'),
                              ("recall:    ", 'recall'),
                              ("f1-score:  ", 'fscore')):
        logger.info(__label + Fore.RED +
                    "{:.3f}".format(np.mean(__data[__metric])) + Style.RESET_ALL +
                    f" {[round(x, 2) for x in __data[__metric]]}")

    __precision += __data['precision']
    __recall += __data['recall']
    __fscore += __data['fscore']

# Drop NaN entries so they do not poison the aggregated means.
__precision = [x for x in __precision if not np.isnan(x)]
__recall = [x for x in __recall if not np.isnan(x)]
__fscore = [x for x in __fscore if not np.isnan(x)]


logger.info("\n")
logger.info(Fore.RED + "AGGREGATED METRICS" + Style.RESET_ALL)
logger.info("\nprecision: " + Fore.RED + "{:.3f}".format(np.mean(__precision)) + Style.RESET_ALL)
logger.info("recall:    " + Fore.RED + "{:.3f}".format(np.mean(__recall)) + Style.RESET_ALL)
logger.info("f1-score:  " +  Fore.RED + "{:.3f}".format(np.mean(__fscore)) + Style.RESET_ALL)

0
precision: 0.548 [0.55]
recall:    0.562 [0.56]
f1-score:  0.436 [0.44]
1
precision: 0.473 [0.47]
recall:    0.509 [0.51]
f1-score:  0.451 [0.45]
2
precision: 0.571 [0.57]
recall:    0.590 [0.59]
f1-score:  0.488 [0.49]
3
precision: 0.501 [0.5]
recall:    0.525 [0.52]
f1-score:  0.425 [0.43]
4
precision: 0.547 [0.55]
recall:    0.582 [0.58]
f1-score:  0.443 [0.44]
5
precision: 0.533 [0.53]
recall:    0.578 [0.58]
f1-score:  0.395 [0.4]
6
precision: 0.477 [0.48]
recall:    0.491 [0.49]
f1-score:  0.434 [0.43]
7
precision: 0.498 [0.5]
recall:    0.544 [0.54]
f1-score:  0.388 [0.39]
8
precision: 0.528 [0.53]
recall:    0.539 [0.54]
f1-score:  0.487 [0.49]
9
pr
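
The aggregation step can also be written without NumPy. The sketch below is a standalone illustration rather than the notebook's code: `aggregate` is a hypothetical helper, the `fold_results` dictionary is a small stand-in whose numbers are rounded values copied from the log above, and the NaN entry is made up to show the filtering.

```python
import math
from statistics import fmean

# Illustrative stand-in for `fold_results`. The NaN is an invented example.
fold_results = {
    "0": {"overall": {"precision": [0.548], "recall": [0.562], "fscore": [0.436]}},
    "1": {"overall": {"precision": [0.473], "recall": [0.509], "fscore": [0.451]}},
    "2": {"overall": {"precision": [float("nan")], "recall": [0.590], "fscore": [0.488]}},
    "venn_diagram_set": ["not a dict, so it is skipped"],
}


def aggregate(results, section="overall"):
    """Pool every fold's metric lists, drop NaNs, and average what is left."""
    pooled = {"precision": [], "recall": [], "fscore": []}
    for value in results.values():
        if not isinstance(value, dict) or section not in value:
            continue
        for metric, values in pooled.items():
            values += [x for x in value[section][metric] if not math.isnan(x)]
    return {metric: fmean(values) for metric, values in pooled.items()}


aggregated = aggregate(fold_results)
for metric, mean in aggregated.items():
    print(f"{metric}: {mean:.3f}")
```

Note that `fmean` raises `StatisticsError` on an empty list, so a metric whose values are all NaN would need separate handling.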

In [25]:
logger.info(Fore.YELLOW + "Caching results" + Style.RESET_ALL)
with open('bert_ds_synthetic_pyramid.json', 'w') as fo:
    json.dump(fold_results, fo, indent=4)

Caching results


In [26]:
fold_results.keys()

dict_keys(['0', 'venn_diagram_set', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
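
Since some metrics in the log are `nan`, it is worth knowing how the cached file behaves when read back. The sketch below is a self-contained round trip through a temporary file and does not touch `bert_ds_synthetic_pyramid.json`; the `results` dictionary and its `git_example` key are invented for illustration.

```python
import json
import math
import os
import tempfile

# Small stand-in for `fold_results`; the values are illustrative only.
results = {
    "0": {"overall": {"precision": [0.548], "recall": [0.562], "fscore": [0.436]}},
    "git_example": {"precision": [float("nan")]},
    "venn_diagram_set": ["Such instructions are called bindings ..."],
}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "results.json")
    with open(path, "w") as fo:
        json.dump(results, fo, indent=4)  # NaN is written as the bare token NaN
    with open(path) as fi:
        restored = json.load(fi)

# NaN never compares equal to itself, so test it with math.isnan.
nan_survived = math.isnan(restored["git_example"]["precision"][0])

# With allow_nan=False the encoder refuses NaN instead of emitting non-standard JSON.
try:
    json.dumps(results, allow_nan=False)
    strict_ok = True
except ValueError:
    strict_ok = False

print(sorted(restored.keys()), nan_survived, strict_ok)
```

Python reads its own `NaN` token back without trouble, but the token is not part of standard JSON, so other parsers may reject the file.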

In [27]:
# cnt = 0
# for source in df_test["source"].unique():
#     df_source = df_test[df_test["source"] == source]   
#     logger.info(source)
#     test_model(source, df_source, model, tokenizer, pos_filter=True)
#     cnt += 1
#     if cnt >= 5:
#         break

In [28]:
#@title Metrics report
# logger.info(json.dumps(fold_results, indent=4, sort_keys=True))

In [29]:
# _precision, _recall, _f1score = avg_macro_metric_for(prediction_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "Model metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)


# _precision, _recall, _f1score = avg_macro_metric_for(api_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "API metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

# _precision, _recall, _f1score = avg_macro_metric_for(so_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "SO metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

# _precision, _recall, _f1score = avg_macro_metric_for(git_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "GIT metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

# _precision, _recall, _f1score = avg_macro_metric_for(misc_metrics)

# logger.info("")
# logger.info(Fore.YELLOW + "MISC metrics" + Style.RESET_ALL)
# logger.info("precision: " + Fore.RED + "{:.3f}".format(_precision) + Style.RESET_ALL)
# logger.info("recall:    " + Fore.RED + "{:.3f}".format(_recall) + Style.RESET_ALL)
# logger.info("f1-score:  " + Fore.RED + "{:.3f}".format(_f1score) + Style.RESET_ALL)

In [30]:
def examples_per_source_type(source_type='misc', n_samples=None):
    """Log the recorded predictions of every source of the given type."""
    # URL fragments that identify each source type; 'misc' is anything else.
    _markers = {
        'api': ('docs.oracle', 'developer.android'),
        'so': ('stackoverflow.com',),
        'git': ('github.com',),
    }
    _known = [m for ms in _markers.values() for m in ms]

    _sources = list(set(x[0] for x in log_examples_lst))

    _template = "[w={}]" + Fore.RED + "[y={}]" + Fore.YELLOW + "[p={:.4f}]" + Style.RESET_ALL + " {}"

    idx = 0
    for s in _sources:
        if source_type == 'misc':
            matches = not any(m in s for m in _known)
        else:
            matches = any(m in s for m in _markers.get(source_type, ()))
        if not matches:
            continue

        # Each entry is (source, task title, weight, prediction, probability, text).
        examples_in_source = [k for k in log_examples_lst if k[0] == s]
        task_title = examples_in_source[0][1]
        idx += 1

        logger.info('')
        logger.info(Fore.RED + f"{task_title}" + Style.RESET_ALL)
        logger.info(s)
        logger.info('')

        for _, _, pweights, y_predict, y_probs, text in examples_in_source:
            logger.info(_template.format(pweights, y_predict, y_probs, text))
            logger.info('')
        logger.info('-' * 20)

        if n_samples and idx >= n_samples:
            break
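
The source-type test inside `examples_per_source_type` can be isolated as a small function. The following is a sketch under the assumption that the same URL fragments are used; `SOURCE_MARKERS` and `source_type_of` are hypothetical names, and a URL matching several types is assigned to the first one listed, which is a simplification.

```python
SOURCE_MARKERS = {
    "api": ("docs.oracle", "developer.android"),
    "so": ("stackoverflow.com",),
    "git": ("github.com",),
}


def source_type_of(url):
    """Return 'api', 'so' or 'git' for a known host fragment, else 'misc'."""
    for source_type, markers in SOURCE_MARKERS.items():
        if any(marker in url for marker in markers):
            return source_type
    return "misc"


# URLs taken from the sample outputs in this notebook.
urls = [
    "https://developer.android.com/training/dependency-injection/hilt-android",
    "https://stackoverflow.com/questions/4015026",
    "https://github.com/google/dagger/issues/1991",
    "https://medium.com/mindorks/how-to-pass-large-data-between-server-and-client-android-securely-345fed551651",
]
types = [source_type_of(u) for u in urls]
print(types)  # ['api', 'so', 'git', 'misc']
```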
    

In [31]:
#@title Sample prediction outputs for API sources

logger.info(Fore.RED + "API" + Style.RESET_ALL)
examples_per_source_type(source_type='api', n_samples=8)

API

Hilt: How to prevent Hilt from picking dependency from a library?
https://developer.android.com/training/dependency-injection/hilt-android

[w=0][y=1][p=0.8345] Each Hilt component is responsible for injecting its bindings into the corresponding Android class.

[w=0][y=1][p=0.8296] To perform field injection, Hilt needs to know how to provide instances of the necessary dependencies from the corresponding component.

[w=0][y=1][p=0.8238] Classes that Hilt injects can have other base classes that also use injection.

[w=0][y=1][p=0.8200] Doing manual dependency injection requires you to construct every class and its dependencies by hand, and to use containers to reuse and manage dependencies.

[w=0][y=1][p=0.8174] Hilt automatically creates and destroys instances of generated component classes following the lifecycle of the corresponding Android classes.

[w=0][y=1][p=0.8153] One wa

In [32]:
#@title Sample prediction outputs for GIT sources

logger.info(Fore.RED + "GIT" + Style.RESET_ALL)
examples_per_source_type(source_type='git', n_samples=8)

GIT

Hilt: How to prevent Hilt from picking dependency from a library?
https://github.com/google/dagger/issues/1991

[w=0][y=1][p=0.8452] The application gradle module only contains your Application class and defines the root of your dependency injection graph.

[w=0][y=1][p=0.8326] In this project all feature/module projects declare dependencies on their api and impl using:

[w=0][y=1][p=0.8284] The difference in Hilt is that it can be easier to encounter since modules are aggregated into the components whereas in vanilla Dagger you would have to specify the Dagger module class in the @Component - annotated interface, which would ultimately require you to either expose those Gradle modules via api dependencies or make the app module ( or wherever the component is ) depend directly on the Gradle modules containing the Dagger modules.

[w=1][y=1][p=0.7965] Instead of changing implementation to api ( Not Recommende

In [33]:
#@title Sample prediction outputs for SO sources

logger.info(Fore.RED + "SO" + Style.RESET_ALL)
examples_per_source_type(source_type='so', n_samples=8)

SO

Android SQLite performance in complex queries
https://stackoverflow.com/questions/4015026

[w=3][y=1][p=0.8258] For SELECTs and UPDATEs, indexes can things up, but only if the indexes you create can actually be used by the queries that you need speeding up.

[w=0][y=1][p=0.8201] You can have indexes that contain multiple columns -LRB- to assist queries with multiple predicates -RRB-.

[w=0][y=1][p=0.8014] LINK -LRB- of table structures -RRB- is also worth considering -LRB- if you haven't already -RRB- simply because it tends to provide the smallest representation of the data in the database ; this is a trade-off, less I/O for more CPU, and one that is usually worthwhile in medium-scale enterprise databases -LRB- the sort I'm most familiar with -RRB-, but I'm afraid I've no idea whether the trade-off works well on small-scale platforms like Android.

[w=3][y=1][p=0.8006] If you have more complex queries that c

In [34]:
#@title Sample prediction outputs for MISC sources

logger.info(Fore.RED + "MISC" + Style.RESET_ALL)
examples_per_source_type(source_type='misc', n_samples=8)

MISC

Android App Retrieve Data from Server but in a Secure way
https://medium.com/mindorks/how-to-pass-large-data-between-server-and-client-android-securely-345fed551651

[w=1][y=1][p=0.7718] When we want to transfer some sensitive data to server ( at runtime ), we generate a passcode ( aka secret key ) using a symmetric encryption ( say AES ).

[w=0][y=1][p=0.7717] Hence it gets the large texts of data which was sent by the client securely.This technique is called Hybrid Cryptography.Show me the code!First of all server needs to generates a key pair using RSA.

[w=0][y=1][p=0.7604] Using this secret key we encrypt our large texts of data quickly.Now we use the public key to encrypt our secret key.We send this encrypted data and encrypted secret key combination to server ( using any commonly used way to send combination of data, like JSON ) Server receives this combination, extracts encrypted data and encrypted secret key fro

In [35]:
logger.info(Fore.RED + f"{len(fold_results['venn_diagram_set'])} entries VENN SET" + Style.RESET_ALL)
for _t in fold_results['venn_diagram_set']:
    logger.info(_t)

563 entries VENN SET

Stop using battery & phone optimizing apps Most phone and battery optimizing apps close all the background processes to free up RAM and put less load on the phone.
Use the addTrack ( ) method to mix multipe tracks together.
Such instructions are called bindings ...
Defining the Adapter Next, we need to define the adapter to describe the process of converting the Java object to a View ( in the getView method ).
The ActivityResultCallback defines how your app handles the user's response to the permission request.
If the user granted the permission to your app, you can access the private user data.
2 - Create MarkerView
The TextView being in wrap_content this does nothing, as the TextView is exactly the size of the text.
If the device is running Android 6.0 or higher, and your app's target SDK is 23 or higher: The app has to list the permissions in the manifest, and it must request each dangerous permission it needs while the app is running.
Continuous Locat

Normal Permissions When you need to add a new permission, first check this page to see if the permission is considered a PROTECTION_NORMAL permission.
You simply override the onOptionsItemSelected method in your `` child'' activity and check for the id of the back button which is android.R.id.home.
Checking for permissions before performing privileged actions seems fine to me.
Although content providers are meant to make data available to other applications, you may of course have activities in your application that allow the user to query and modify the data managed by your provider.
If the user denies or revokes a permission that a feature needs, gracefully degrade your app so that the user can continue using your app, possibly by disabling the feature that requires the permission.
As you can read LINK:
You can call KeyguardManager methods to find out if the device is locked and use an Activity lifecycle callback ( such as onResume ( ) that's called after unlocking ) to start lock ta

Layouts query the preferred size of their nodes by invoking the prefWidth ( height ) and prefHeight ( width ) methods.
Due to the specifics of Android threading, we can not run network tasks on the same thread as the UI thread.
Here's a bit of code to get EXPLAIN QUERY PLAN results into Android logcat from a running Android app.
Location Interface Implementations for Callbacks The following interfaces should be implemented to get the location update.
The system displays a runtime permission prompt, such as the one shown on the permissions overview page.
Permission Groups Permission Groups avoids spamming the user with a lot of permission requests while allowing the app developer to only request the minimal amount of permissions needed at any point in time.
In order to change preview orientation as the user re-orients the phone, within the surfaceChanged ( ) method of your preview class, first stop the preview with Camera.stopPreview ( ) change the orientation and then start the preview

By default the system places menu items into the overflow area that is only revealed when the user selects the three dot icon or more generally the action overflow icon.
To do so, include the request code in a call to requestPermissions ( ).
The following code snippet illustrates the request and a simple handling of the response:
The fused location provider is one of the location APIs in Google Play services.
In the meantime I think that you will need to annotate your child classes with @JsonTypeInfo and @JsonSubTypes to override the inherited annotations.
This also continuously updates the location on the move.
However, Power Saver mode disables an important feature of the phone i.e. data syncing.
Well we can put some kind of encryption into our requests, for example using RSA we can have a private key, put it into the application, and encrypt the requests.
When the reCAPTCHA API executes the onSuccess ( ) method, the user has successfully completed the CAPTCHA challenge.
However, in 

We can create the basic empty adapter and holder together in ContactsAdapter.java as follows:
Next, you'll need to initiate the permission request and handle the result.
Have you checked RemoteControlClient ?
So no need to implement your async tasks.
Use:
Every android app has its own internal storage only that app can access, you can read from there or write to it.
Instantiating the component We should do all this work within a specialization of the Application class since these instances should be declared only once throughout the entire lifespan of the application:
and in the LinearLayout, the default gravity -LRB- used here -RRB- is ` center'
If you are using Android Studio, or if you are using Gradle from the command line, you can add your own stuff to BuildConfig or otherwise tweak the debug and release build types to help distinguish these situations at runtime.
You can also control concurrency, which means coordinating the execution of several coroutines declaratively with Flow

You don't need to use so many lists, just create a class that will contain all the data of single item, there is no need for buttons, use just text change listener instead.
These permission can then be allowed or denied by the user.
For more complex use-cases or if you want to have your HTTP APIs abstracted as Java classes as part of a larger application look at Retrofit or Feign.
getLastLocation ( GoogleApiClient ) this API should be used when there is no need for continuous access to location from an application.
This is happening because FusedLocationProviderApi deprecated in a recent version of google play services.
Wait for the user to invoke the task or action in your app that requires access to specific private user data.
layout_gravity is the way the TextView will align itself in its parent, in your case in the vertical LinearLayout
Returns the value mapped by name if it exists and is a JSONObject, or null otherwise.
The important thing to keep in mind is that fragments should 

If this isn't the case, see the backwards compatibility section to understand how permissions will behave on your configuration.
To insert child views that represent each page, you need to hook this layout to a FragmentStateAdapter.
Binding the Adapter to the RecyclerView In our activity, we will populate a set of sample users which should be displayed in the RecyclerView.
Important: Normal Permissions must be added to the AndroidManifest:
For example, you could override onTouchEvent ( ) to process touch events directly, and produce a scrolling effect or a `` snapping to page'' animation in response to those touch events.
Installing a module into a component allows its bindings to be accessed as a dependency of other bindings in that component or in any child component below it in the component hierarchy:
Returns the value mapped by name if it exists and is a JSONArray, or null otherwise.
Both of these options will be set to 5 minutes, which is perfect as most connections usually timeo

However, in both of these cases, the zip function can only accept a single Object -LSB- -RSB- parameter since the types of the observables in the list are not known in advance as well as their number.
Add this block of code inside onCreate ( ), above the line where you set viewPager.currentItem:
Open MoviesPagerAdapter.kt and add the following method inside the class:
Design your app's UX so that specific actions in your app are associated with specific runtime permissions.
To add an action button, pass a PendingIntent to the addAction ( ) method.
Runtime Permissions If the permission you need to add isn't listed under the normal permissions, you'll need to deal with `` Runtime Permissions''.
The simplest adapter to use is called an ArrayAdapter because the adapter converts an ArrayList of objects into View items loaded into the ListView container.
Thus, all objects provided in the parent component are provided in the subcomponent too.
First, define the qualifiers that you will use to 