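Plan

A first model decides whether the label is zero or non-zero.

If it is zero, the predicted label is 0.

If it is non-zero, a second model predicts which non-zero label it is.


Both models use KoELECTRA. We then measure how much this pipeline improves on a single KoELECTRA model that predicts all labels directly.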
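The routing in the plan above can be sketched as follows. This is a minimal NumPy sketch, not the notebook's code: `two_stage_predict` is a hypothetical helper, and it assumes stage 1 outputs a probability that the label is non-zero and stage 2 outputs scores over the non-zero labels, with stage-2 column `j` (hypothetically) standing for label `j + 1`.

```python
import numpy as np

def two_stage_predict(p_nonzero, stage2_logits, threshold=0.5):
    """Route each sample through the zero/non-zero gate, then the fine-grained model.

    p_nonzero:     shape (N,), stage-1 probability that the label is non-zero
    stage2_logits: shape (N, K), stage-2 scores over the non-zero labels
    Assumption: stage-2 column j corresponds to label j + 1.
    """
    p_nonzero = np.asarray(p_nonzero, dtype=float)
    stage2_logits = np.asarray(stage2_logits, dtype=float)
    # Best non-zero label per sample, shifted so column 0 maps to label 1.
    fine_labels = np.argmax(stage2_logits, axis=1) + 1
    # Samples the gate considers zero get label 0; the rest keep the stage-2 label.
    return np.where(p_nonzero >= threshold, fine_labels, 0)

preds = two_stage_predict([0.2, 0.9, 0.6],
                          [[0.0, 1.0, 0.0], [5.0, 0.0, 0.0], [0.0, 0.0, 3.0]])
```

The threshold is a free parameter: raising it sends fewer samples to the second model, so it trades recall on non-zero labels for fewer false alarms.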

In [1]:
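# explicit imports instead of a wildcard; AdamW here is the transformers implementation
# (deprecated in newer transformers releases in favour of torch.optim.AdamW)
from transformers import (ElectraTokenizer, ElectraForSequenceClassification,
                          AdamW, get_linear_schedule_with_warmup)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
import re
import math
from tqdm import tqdm
import sklearn
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler
from sklearn.model_selection import train_test_split
import time
import datetime
import seaborn as sns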

PyTorch version 1.7.0+cu110 available.
TensorFlow version 2.5.0 available.


## Build a model that decides whether a label is zero or non-zero

In [2]:
train = pd.read_csv("open/train.csv") 
test = pd.read_csv("open/test.csv") 

train.shape, test.shape, train['label'].nunique()


((174304, 13), (43576, 12), 46)

In [3]:
zero_train = train[train['label'] == 0]
nonzero_train = train[train['label'] != 0].copy()  # explicit copy so its labels can be modified safely

In [4]:
zero_train.shape, nonzero_train.shape

((142571, 13), (31733, 13))

In [5]:
# relabel every non-zero sample as 1 (binary target: zero vs. non-zero)
nonzero_train['label'] = 1


In [6]:
train_df = pd.concat([zero_train, nonzero_train], axis = 0)
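The same binary target can be built in one step without the intermediate slices. A minimal sketch (the helper name `make_binary_target` is hypothetical); unlike the concat above it keeps the original row order, which does not matter once the data is shuffled.

```python
import pandas as pd

def make_binary_target(df, label_col="label"):
    """Return a copy of df in which label_col is 0 for zero rows and 1 otherwise."""
    out = df.copy()  # never modify the caller's frame
    out[label_col] = (out[label_col] != 0).astype(int)
    return out

# e.g. train_df = make_binary_target(train)
```

With the counts above, 142,571 of 174,304 documents (about 81.8%) are label 0, so always predicting 0 already gives roughly 0.82 accuracy at the document level; the chunk-level share differs because longer documents produce more chunks.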

## Define utility functions

In [7]:
def clean_text(sent):
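    # keep Hangul syllables (가-힣) and compatibility jamo (ㄱ-ㅎ, ㅏ-ㅣ); everything else,
    # including digits and Latin letters, becomes a space
    sent_clean = re.sub("[^가-힣ㄱ-ㅎㅏ-ㅣ]", " ", sent)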
    return sent_clean

In [8]:
def split_text(s, overlap = 20, chunk_size = 50): 
    """Split s into windows of chunk_size words; each window starts chunk_size - overlap words after the previous one."""
    words = s.split()
    stride = chunk_size - overlap
    # Number of windows is len(words) // stride (at least 1). This can leave trailing
    # words uncovered, e.g. words 80-88 of an 89-word text with the defaults.
    n = max(len(words) // stride, 1)
    return [" ".join(words[w * stride : w * stride + chunk_size]) for w in range(n)]
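With the defaults (`chunk_size=50`, `overlap=20`, so a stride of 30 words), `split_text` creates `len(words) // 30` windows, which can leave the end of a text uncovered: an 89-word text yields 2 windows covering words 0 to 79, so the last 9 words are dropped. Below is a sketch of a variant (hypothetical name `split_text_full`) that adds windows until the last word is included. Switching to it would change the number of chunks, so the counts in this notebook would no longer match.

```python
import math

def split_text_full(s, overlap=20, chunk_size=50):
    """Overlapping word windows that always include the last word of s."""
    words = s.split()
    stride = chunk_size - overlap  # 30 words with the defaults
    if len(words) <= chunk_size:
        return [" ".join(words)]
    # Smallest n such that the last window, starting at (n - 1) * stride, reaches the end.
    n = math.ceil((len(words) - chunk_size) / stride) + 1
    return [" ".join(words[k * stride : k * stride + chunk_size]) for k in range(n)]
```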

## Preprocess data and tokenize

In [9]:
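# Note: '+' joins the three parts without a space, and a missing value in any part
# makes the whole result NaN (filled with 'NAN' below).
train_df['요약문_내용'] = train_df['요약문_연구목표'] + train_df['요약문_연구내용'] + train_df['요약문_기대효과'] 
test['요약문_내용'] = test['요약문_연구목표'] + test['요약문_연구내용'] + test['요약문_기대효과']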

In [10]:
train_df['요약문_내용'].fillna('NAN',inplace=True) 
test['요약문_내용'].fillna('NAN',inplace=True)
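Because `+` on these string columns returns NaN whenever one part is missing, a document with a single empty summary field loses the other two fields and becomes `'NAN'`; the parts are also glued together without a space. A minimal pandas sketch of a safer join (the helper `join_summary` is hypothetical); like any preprocessing change, it would alter the chunks produced later.

```python
import pandas as pd

SUMMARY_COLS = ("요약문_연구목표", "요약문_연구내용", "요약문_기대효과")

def join_summary(df, cols=SUMMARY_COLS, placeholder="NAN"):
    """Join the summary columns with spaces, treating missing parts as empty strings."""
    parts = df[list(cols)].fillna("").astype(str)
    # Join each row's parts with a space so words at the boundaries do not fuse.
    joined = parts.apply(" ".join, axis=1).str.strip()
    # Rows where every part was missing fall back to the notebook's placeholder.
    return joined.mask(joined == "", placeholder)

# e.g. train_df['요약문_내용'] = join_summary(train_df)
```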

In [11]:
train_df['사업명'].fillna('NAN',inplace=True) 
train_df['사업_부처명'].fillna('NAN',inplace=True) 
train_df['내역사업명'].fillna('NAN',inplace=True) 
train_df['과제명'].fillna('NAN',inplace=True) 
train_df['요약문_한글키워드'].fillna('NAN',inplace=True)

In [12]:
contents = train_df['요약문_내용'].values 
feature1 = train_df['사업명'].values 
feature2 = train_df['사업_부처명'].values 
feature3 = train_df['내역사업명'].values 
feature4 = train_df['과제명'].values 
feature5 = train_df['요약문_한글키워드'].values 
feature6 = train_df['label'].values 

train_data = {'사업명':[],'사업_부처명':[],'내역사업명':[],'과제명':[],'한글키워드':[],'요약문':[],'label':[]} 

for i in tqdm(range(contents.shape[0]), position = 0, leave = True): 
    sample = str(contents[i]) 
    splitted_text = split_text(clean_text(sample)) 
    for t in splitted_text: 
        train_data['요약문'].append(t) 
        train_data['사업명'].append(clean_text(str(feature1[i])))
        train_data['사업_부처명'].append(clean_text(str(feature2[i]))) 
        train_data['내역사업명'].append(clean_text(str(feature3[i]))) 
        train_data['과제명'].append(clean_text(str(feature4[i])))  
        train_data['한글키워드'].append(feature5[i]) # no cleaning for this one
        train_data['label'].append(feature6[i])


100%|██████████| 174304/174304 [01:41<00:00, 1724.88it/s]
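The row-by-row loop above can also be written with `DataFrame.explode`, which repeats the other columns once per list element. A sketch under assumptions: `expand_chunks` is a hypothetical helper, `splitter` would be something like `lambda s: split_text(clean_text(s))`, and the per-column `clean_text` calls and the 한글키워드 rename are left out for brevity. It also records a `doc_id` per chunk, which is useful for splitting by document later.

```python
import pandas as pd

def expand_chunks(df, text_col, keep_cols, splitter, out_col="요약문"):
    """Return one row per chunk of df[text_col]; keep_cols are repeated for each chunk."""
    out = df[list(keep_cols)].copy()
    out["doc_id"] = range(len(df))  # position of the source document
    out[out_col] = df[text_col].astype(str).map(splitter)  # list of chunks per row
    # explode turns each list of chunks into one row per chunk
    return out.explode(out_col, ignore_index=True)
```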


In [13]:
train_data = pd.DataFrame(train_data)
train_data.head(5) 

index,사업명,사업_부처명,내역사업명,과제명,한글키워드,요약문,label
0,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",최종목표 감수성 표적 유전자를 발굴하고 내성제어 기전을 연구 발굴된 유전자를 통한 ...,0
1,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",저항성 극복 기전을 규명 환자조직 동물실험 세포실험을 통해 대장암에 특이적으로 조절...,0
2,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",내성 유발 유전자를 발굴하고 저항성 극복 기전을 규명 추후 기반 항암화학요법 치료효...,0
3,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",대장암 환자조직을 이용하여 후보 유전자의 발현과 발현 양상 분석 후보 유전자 발현 ...,0
4,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",표적 가능성 검증 선천적 내성 예측인자 발굴 차년도 를 통한 후천적 내성 표적 후보...,0


In [14]:
## Tokenize every sample and pad it to 512 tokens;
## samples longer than 512 tokens are truncated and reported.

tokenizer = ElectraTokenizer.from_pretrained("monologg/koelectra-base-v3-discriminator") 

def electra_tokenizer(sent, MAX_LEN):  
    encoded_dict = tokenizer.encode_plus(
        text = sent, 
        add_special_tokens = True, # add [CLS] and [SEP]
        padding = False, # padding is done manually below (replaces the deprecated pad_to_max_length)
        return_attention_mask = True # build the attention mask
    )  
    
    input_id = encoded_dict['input_ids'] 
    attention_mask = encoded_dict['attention_mask'] # 1 for real tokens, 0 for padding
    token_type_id = encoded_dict['token_type_ids'] # segment ids; all 0 for single-segment input
    
    if len(input_id) > MAX_LEN: # head-only truncation: keep the first MAX_LEN tokens (the final [SEP] is cut off)
        input_id = input_id[:MAX_LEN] 
        attention_mask = attention_mask[:MAX_LEN] 
        token_type_id = token_type_id[:MAX_LEN]  
        print("Long Text!! Truncating to the first {} tokens!".format(MAX_LEN))
    else: 
        n_pad = MAX_LEN - len(input_id)
        input_id = input_id + [tokenizer.pad_token_id] * n_pad
        attention_mask = attention_mask + [0] * n_pad
        token_type_id = token_type_id + [0] * n_pad
        
    return input_id, attention_mask, token_type_id
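The padding and truncation step is independent of the tokenizer and can be tested on its own. In this sketch (hypothetical helper `pad_or_truncate`) the token ids are made-up numbers; in the notebook, `tokenizer.pad_token_id` and `tokenizer.sep_token_id` would supply `pad_id` and `sep_id`. Passing `sep_id` keeps a closing [SEP] on truncated sequences, which plain slicing drops; whether that helps accuracy here has not been measured.

```python
def pad_or_truncate(ids, max_len, pad_id=0, sep_id=None):
    """Fit a token-id list to exactly max_len and build the matching attention mask.

    If sep_id is given, a truncated sequence ends with that separator token.
    """
    ids = list(ids)
    if len(ids) > max_len:
        ids = ids[:max_len]
        if sep_id is not None:
            ids[-1] = sep_id  # overwrite the last kept token with [SEP]
        return ids, [1] * max_len
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, [1] * (max_len - n_pad) + [0] * n_pad
```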


In [15]:
# define important hyperparameters
BATCH_SIZE = 32
NUM_EPOCHS = 30
VALID_SPLIT = 0.1 
MAX_LEN = 512 # maximum sequence length for BERT/ELECTRA-base (512 position embeddings)


In [16]:
train_data['data'] = train_data['사업명'] + " " + train_data['사업_부처명'] + " " + train_data['내역사업명'] + " " + train_data['과제명'] + " " + train_data['한글키워드'] + " " + train_data['요약문'] 

train_data.head(2)


index,사업명,사업_부처명,내역사업명,과제명,한글키워드,요약문,label,data
0,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",최종목표 감수성 표적 유전자를 발굴하고 내성제어 기전을 연구 발굴된 유전자를 통한 ...,0,이공학학술연구기반구축 교육부 지역대학우수과학자지원사업 년 년 대장암의...
1,이공학학술연구기반구축,교육부,지역대학우수과학자지원사업 년 년,대장암의 내성 표적 인자 발굴 및 반응 예측 유전자 지도 구축...,"대장암,항암제 내성,세포사멸,유전자발굴",저항성 극복 기전을 규명 환자조직 동물실험 세포실험을 통해 대장암에 특이적으로 조절...,0,이공학학술연구기반구축 교육부 지역대학우수과학자지원사업 년 년 대장암의...


In [17]:
train_text = train_data['data'].values 
train_labels = train_data['label'].values

In [18]:
N = train_data.shape[0] 

input_ids = np.zeros((N, MAX_LEN), dtype=int)
attention_masks = np.zeros((N, MAX_LEN), dtype=int)
token_type_ids = np.zeros((N, MAX_LEN), dtype=int) 
labels = np.zeros((N), dtype=int)
failed = []  # indices that could not be tokenized; their rows stay all-zero with label 0

for i in tqdm(range(N), position=0, leave=True): 
    cur_str = train_text[i]
    try:
        input_id, attention_mask, token_type_id = electra_tokenizer(cur_str, MAX_LEN=MAX_LEN) 
        input_ids[i,] = input_id 
        attention_masks[i,] = attention_mask 
        token_type_ids[i,] = token_type_id
        labels[i] = train_labels[i] 
    except Exception as e: 
        print(e)
        print(cur_str)
        failed.append(i)


  4%|▍         | 65413/1638867 [01:54<43:07, 607.99it/s]
Token indices sequence length is longer than the specified maximum sequence length for this model (619 > 512). Running this sequence through the model will result in indexing errors
  4%|▍         | 65541/1638867 [01:54<43:34, 601.83it/s]

Long Text!! Truncating to the first 512 tokens!


  6%|▌         | 100868/1638867 [02:55<40:33, 632.10it/s] 

Long Text!! Truncating to the first 512 tokens!


  7%|▋         | 113267/1638867 [03:16<40:46, 623.47it/s]  

Long Text!! Truncating to the first 512 tokens!


  9%|▊         | 141721/1638867 [04:05<41:26, 602.15it/s]  

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


  9%|▉         | 146011/1638867 [04:12<39:20, 632.31it/s]

Long Text!! Truncating to the first 512 tokens!


 12%|█▏        | 191878/1638867 [05:31<57:13, 421.44it/s]

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


 13%|█▎        | 205819/1638867 [05:56<40:49, 585.10it/s]  

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


 13%|█▎        | 211854/1638867 [06:06<39:13, 606.24it/s]

Long Text!! Truncating to the first 512 tokens!


 23%|██▎       | 376012/1638867 [10:50<32:50, 640.81it/s]

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


 24%|██▍       | 390028/1638867 [11:14<32:44, 635.62it/s]

Long Text!! Truncating to the first 512 tokens!


 24%|██▍       | 400880/1638867 [11:33<32:51, 627.99it/s]

Long Text!! Truncating to the first 512 tokens!


 27%|██▋       | 446477/1638867 [12:52<31:29, 631.23it/s]

Long Text!! Truncating to the first 512 tokens!


 28%|██▊       | 465702/1638867 [13:27<31:24, 622.62it/s]

Long Text!! Truncating to the first 512 tokens!


 29%|██▉       | 471453/1638867 [13:37<44:09, 440.58it/s]

Long Text!! Truncating to the first 512 tokens!


 29%|██▉       | 476419/1638867 [13:46<56:39, 342.00it/s]

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


 29%|██▉       | 481139/1638867 [13:55<29:17, 658.72it/s]

 47%|████▋     | 773839/1638867 [22:27<25:58, 554.89it/s]

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


 47%|████▋     | 776874/1638867 [22:33<22:16, 644.74it/s]

Long Text!! Truncating to the first 512 tokens!


 53%|█████▎    | 876297/1638867 [25:24<20:05, 632.67it/s]

Long Text!! Truncating to the first 512 tokens!


 54%|█████▍    | 891554/1638867 [25:51<29:56, 415.96it/s]

Long Text!! Truncating to the first 512 tokens!
Long Text!! Truncating to the first 512 tokens!


 56%|█████▋    | 924688/1638867 [26:51<20:48, 571.85it/s]

 73%|███████▎  | 1192080/1638867 [34:46<11:49, 629.53it/s]

Long Text!! Truncating to the first 512 tokens!


 78%|███████▊  | 1273483/1638867 [37:07<09:47, 622.08it/s]

Long Text!! Truncating to the first 512 tokens!


 79%|███████▊  | 1288719/1638867 [37:33<09:25, 619.20it/s]

Long Text!! Truncating to the first 512 tokens!


 82%|████████▏ | 1351974/1638867 [39:23<07:36, 628.82it/s]

 95%|█████████▌| 1557696/1638867 [45:22<02:04, 649.46it/s]

Long Text!! Truncating to the first 512 tokens!


 95%|█████████▌| 1558224/1638867 [45:23<02:55, 458.95it/s]

Long Text!! Truncating to the first 512 tokens!


 96%|█████████▌| 1569307/1638867 [45:42<01:47, 645.65it/s]

Long Text!! Truncating to the first 512 tokens!


100%|██████████| 1638867/1638867 [47:46<00:00, 571.73it/s]


In [19]:
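# as_tensor reuses the NumPy buffer when the dtype already matches instead of copying it
input_ids = torch.as_tensor(input_ids, dtype=torch.long)
attention_masks = torch.as_tensor(attention_masks, dtype=torch.long)
token_type_ids = torch.as_tensor(token_type_ids, dtype=torch.long)
labels = torch.as_tensor(labels, dtype=torch.long)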
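The arrays built above are large. A quick size estimate with the numbers from the outputs (1,638,867 chunks × 512 tokens): on platforms where `dtype=int` means int64 (e.g. Linux), each array takes about 6.25 GiB, so the three token arrays need about 18.8 GiB, and a conversion that copies the data (as `torch.tensor` does) roughly doubles peak memory. The KoELECTRA vocabulary has 35,000 entries (see `word_embeddings` in the model summary), so token ids fit in int32 but not int16 (max 32,767), while masks and token type ids fit in int8. Storing smaller dtypes and casting each batch with `.long()` is one option; this is a sketch of the arithmetic, not something the notebook does.

```python
import numpy as np

def array_gib(n_rows, max_len, dtype):
    """Memory in GiB of an (n_rows, max_len) NumPy array of the given dtype."""
    return n_rows * max_len * np.dtype(dtype).itemsize / 2**30

N_CHUNKS, MAX_LEN = 1_638_867, 512  # taken from the tokenization output above
for dt in (np.int64, np.int32, np.int8):
    print(f"{np.dtype(dt).name:>5}: {array_gib(N_CHUNKS, MAX_LEN, dt):.2f} GiB per array")

# token ids must fit the 35,000-entry vocabulary: too large for int16, fine for int32
print(np.iinfo(np.int16).max < 35_000 <= np.iinfo(np.int32).max)
```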

In [20]:
# one call splits all four arrays with the same shuffled indices
(train_inputs, val_inputs,
 train_attention_mask, val_attention_mask,
 train_token_ids, val_token_ids,
 train_labels, val_labels) = train_test_split(
    input_ids, attention_masks, token_type_ids, labels,
    random_state=42, test_size=VALID_SPLIT)


train_inputs.shape, train_attention_mask.shape, train_token_ids.shape, train_labels.shape

(torch.Size([1474980, 512]),
 torch.Size([1474980, 512]),
 torch.Size([1474980, 512]),
 torch.Size([1474980]))

In [21]:
val_inputs.shape, val_attention_mask.shape, val_token_ids.shape, val_labels.shape

(torch.Size([163887, 512]),
 torch.Size([163887, 512]),
 torch.Size([163887, 512]),
 torch.Size([163887]))
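Because the split is done on chunks, overlapping windows from the same document can end up in both the training and validation sets, which likely makes validation accuracy optimistic. A minimal NumPy sketch of a document-level split, assuming a per-chunk array `doc_ids` recording which row of `train_df` each chunk came from (the notebook does not build one); `sklearn.model_selection.GroupShuffleSplit` is an alternative.

```python
import numpy as np

def group_split(doc_ids, test_size=0.1, seed=42):
    """Split chunk indices so that all chunks of a document land on the same side."""
    doc_ids = np.asarray(doc_ids)
    unique_docs = np.unique(doc_ids)
    rng = np.random.default_rng(seed)
    # pick whole documents for validation, at least one
    n_val = max(1, int(round(len(unique_docs) * test_size)))
    val_docs = rng.choice(unique_docs, size=n_val, replace=False)
    is_val = np.isin(doc_ids, val_docs)
    return np.flatnonzero(~is_val), np.flatnonzero(is_val)
```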

In [22]:
train_dataset = TensorDataset(train_inputs, train_attention_mask, train_token_ids, train_labels) 
train_sampler = RandomSampler(train_dataset) 
train_dataloader = DataLoader(train_dataset, sampler=train_sampler, batch_size=BATCH_SIZE) 

validation_data = TensorDataset(val_inputs, val_attention_mask, val_token_ids, val_labels) 
validation_sampler = SequentialSampler(validation_data) 
validation_dataloader = DataLoader(validation_data, sampler=validation_sampler, batch_size=BATCH_SIZE)


In [23]:
# binary classifier: class 0 = label is zero, class 1 = label is non-zero
model = ElectraForSequenceClassification.from_pretrained("monologg/koelectra-base-v3-discriminator", num_labels=2)
model.cuda()


Some weights of the model checkpoint at monologg/koelectra-base-v3-discriminator were not used when initializing ElectraForSequenceClassification: ['discriminator_predictions.dense.weight', 'discriminator_predictions.dense.bias', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense_prediction.bias']
- This IS expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of ElectraForSequenceClassification were not initialized from the model checkpoint at monologg/koelectra-base-v3-discriminator and are newly initialized: 

ElectraForSequenceClassification(
  (electra): ElectraModel(
    (embeddings): ElectraEmbeddings(
      (word_embeddings): Embedding(35000, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (token_type_embeddings): Embedding(2, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (encoder): ElectraEncoder(
      (layer): ModuleList(
        (0): ElectraLayer(
          (attention): ElectraAttention(
            (self): ElectraSelfAttention(
              (query): Linear(in_features=768, out_features=768, bias=True)
              (key): Linear(in_features=768, out_features=768, bias=True)
              (value): Linear(in_features=768, out_features=768, bias=True)
              (dropout): Dropout(p=0.1, inplace=False)
            )
            (output): ElectraSelfOutput(
              (dense): Linear(in_features=768, out_features=768, bias=True)
              (LayerNorm): LayerNorm

In [24]:
optimizer = AdamW(model.parameters(), lr=2e-5, eps=1e-8)

epochs = 10  # overrides NUM_EPOCHS (30) set in the hyperparameter cell

total_steps = len(train_dataloader) * epochs 

scheduler = get_linear_schedule_with_warmup(optimizer, 
                                            num_warmup_steps = 0,
                                            num_training_steps = total_steps)

def flat_accuracy(preds, labels): 
    pred_flat = np.argmax(preds, axis=1).flatten() 
    labels_flat = labels.flatten() 
    return np.sum(pred_flat == labels_flat) / len(labels_flat) 

def format_time(elapsed):
    # round to the nearest second
    elapsed_rounded = int(round((elapsed)))
    # format as h:mm:ss
    return str(datetime.timedelta(seconds=elapsed_rounded))


device = torch.device("cuda")


# fix random seeds for reproducibility
seed_val = 42
random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# reset gradients
model.zero_grad()

# loop over epochs
for epoch_i in range(0, epochs):
    
    # ========================================
    #               Training
    # ========================================
    
    print("")
    print('======== Epoch {:} / {:} ========'.format(epoch_i + 1, epochs))
    print('Training...')

    # record the start time
    t0 = time.time()

    # reset the running loss
    total_loss = 0

    # switch to training mode
    model.train()
        
    # iterate over the data loader one batch at a time
    for step, batch in enumerate(train_dataloader):
        # report progress every 20 batches
        if step % 20 == 0 and not step == 0:
            elapsed = format_time(time.time() - t0)
            print('  Batch {:>5,}  of  {:>5,}.    Elapsed: {:}.'.format(step, len(train_dataloader), elapsed))
            print('  current average loss = {}'.format(total_loss / step))

        # move the batch to the GPU
        batch = tuple(t.to(device) for t in batch)
        
        # unpack the batch
        b_input_ids, b_input_mask, b_token_type_ids, b_labels = batch

        # forward pass
        outputs = model(b_input_ids, 
                        token_type_ids=b_token_type_ids, 
                        attention_mask=b_input_mask, 
                        labels=b_labels)
        
        # the loss is the first output when labels are passed
        loss = outputs[0]

        # accumulate the loss
        total_loss += loss.item()

        # backward pass to compute gradients
        loss.backward()

        # clip gradients to a maximum norm of 1.0
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)

        # update the weights
        optimizer.step()

        # decay the learning rate via the scheduler
        scheduler.step()

        # reset gradients
        model.zero_grad()

    # average loss over the epoch
    avg_train_loss = total_loss / len(train_dataloader)            

    print("")
    print("  Average training loss: {}".format(avg_train_loss))
    print("  Training epoch took: {:}".format(format_time(time.time() - t0)))
        
    # ========================================
    #               Validation
    # ========================================

    print("")
    print("Running Validation...")

    # record the start time
    t0 = time.time()

    # switch to evaluation mode
    model.eval()

    # reset counters
    eval_loss, eval_accuracy = 0, 0
    nb_eval_steps, nb_eval_examples = 0, 0

    # iterate over the validation loader one batch at a time
    for batch in validation_dataloader:
        # move the batch to the GPU
        batch = tuple(t.to(device) for t in batch)
        
        # unpack the batch
        b_input_ids, b_input_mask, b_token_type_ids, b_labels = batch
        
        # no gradient computation during evaluation
        with torch.no_grad():     
            # forward pass
            outputs = model(b_input_ids, 
                            token_type_ids=b_token_type_ids, 
                            attention_mask=b_input_mask, 
                            labels=b_labels)
        
        loss = outputs[0] 
        logits = outputs[1] 
        
        # accumulate the loss
        eval_loss += loss.item() 
        
        # move logits and labels to the CPU
        logits = logits.detach().cpu().numpy()
        label_ids = b_labels.to('cpu').numpy()
        
        # compare predicted classes with labels; weight by batch size so the
        # smaller final batch does not skew the overall accuracy
        tmp_eval_accuracy = flat_accuracy(logits, label_ids)
        eval_accuracy += tmp_eval_accuracy * len(label_ids)
        nb_eval_examples += len(label_ids)
        nb_eval_steps += 1
    
    avg_val_loss = eval_loss / len(validation_dataloader)            
    print("  Average validation loss: {}".format(avg_val_loss))
    print("  Accuracy: {}".format(eval_accuracy / nb_eval_examples))
    print("  Validation took: {:}".format(format_time(time.time() - t0)))
    
    # save a checkpoint after every epoch
    torch.save(model.state_dict(), "ELECTRA_ZERO_MODEL_" + str(epoch_i + 1)) 
print("")
print("Training complete!")



Training...
  Batch    20  of  46,094.    Elapsed: 0:00:19.
  current average loss = 0.5253074169158936
  Batch    40  of  46,094.    Elapsed: 0:00:39.
  current average loss = 0.5173340976238251
  Batch    60  of  46,094.    Elapsed: 0:00:58.
  current average loss = 0.4975190247098605
  Batch    80  of  46,094.    Elapsed: 0:01:18.
  current average loss = 0.48271819911897185
  Batch   100  of  46,094.    Elapsed: 0:01:37.
  current average loss = 0.46367808103561403
  Batch   120  of  46,094.    Elapsed: 0:01:57.
  current average loss = 0.4480092347910007
  Batch   140  of  46,094.    Elapsed: 0:02:16.
  current average loss = 0.4409704953432083
  Batch   160  of  46,094.    Elapsed: 0:02:35.
  current average loss = 0.42653598710894586
  Batch   180  of  46,094.    Elapsed: 0:02:55.
  current average loss = 0.4130048093696435
  Batch   200  of  46,094.    Elapsed: 0:03:14.
  current average loss = 0.40458577204495666
  Batch   220  of  46,094.    Elapsed: 0:03:34.
  current average loss = … [output truncated]

  Batch 1,800  of  46,094.    Elapsed: 0:29:21.
  current average loss = 0.260937618802612
  Batch 1,820  of  46,094.    Elapsed: 0:29:41.
  current average loss = 0.26082038300439386
  Batch 1,840  of  46,094.    Elapsed: 0:30:00.
  current average loss = 0.2601363322444503
  Batch 1,860  of  46,094.    Elapsed: 0:30:20.
  current average loss = 0.2594171258440662
  Batch 1,880  of  46,094.    Elapsed: 0:30:39.
  current average loss = 0.25852836796399603
  Batch 1,900  of  46,094.    Elapsed: 0:30:59.
  current average loss = 0.25769283980719354
  Batch 1,920  of  46,094.    Elapsed: 0:31:19.
  current average loss = 0.2571763135473399
  Batch 1,940  of  46,094.    Elapsed: 0:31:38.
  current average loss = 0.2565373414666536
  Batch 1,960  of  46,094.    Elapsed: 0:31:59.
  current average loss = 0.2560250788565953
  Batch 1,980  of  46,094.    Elapsed: 0:32:18.
  current average loss = 0.25570937190204857
  Batch 2,000  of  46,094.    Elapsed: 0:32:38.
  current average loss = … [output truncated]

  Batch 3,580  of  46,094.    Elapsed: 0:58:22.
  current average loss = 0.23172058185572897
  Batch 3,600  of  46,094.    Elapsed: 0:58:41.
  current average loss = 0.2314276478356785
  Batch 3,620  of  46,094.    Elapsed: 0:59:01.
  current average loss = 0.23119373669981627
  Batch 3,640  of  46,094.    Elapsed: 0:59:20.
  current average loss = 0.2310831024413826
  Batch 3,660  of  46,094.    Elapsed: 0:59:40.
  current average loss = 0.23108487712229536
  Batch 3,680  of  46,094.    Elapsed: 0:59:59.
  current average loss = 0.2307760171511251
  Batch 3,700  of  46,094.    Elapsed: 1:00:19.
  current average loss = 0.2307320638997732
  Batch 3,720  of  46,094.    Elapsed: 1:00:38.
  current average loss = 0.23056110459409895
  Batch 3,740  of  46,094.    Elapsed: 1:00:58.
  current average loss = 0.230335403672036
  Batch 3,760  of  46,094.    Elapsed: 1:01:18.
  current average loss = 0.2300824071290566
  Batch 3,780  of  46,094.    Elapsed: 1:01:37.
  current average loss = … [output truncated]

  Batch 5,360  of  46,094.    Elapsed: 1:27:20.
  current average loss = 0.21603943056621547
  Batch 5,380  of  46,094.    Elapsed: 1:27:39.
  current average loss = 0.21584740003508596
  Batch 5,400  of  46,094.    Elapsed: 1:27:59.
  current average loss = 0.21578729584675144
  Batch 5,420  of  46,094.    Elapsed: 1:28:18.
  current average loss = 0.21566373441863215
  Batch 5,440  of  46,094.    Elapsed: 1:28:38.
  current average loss = 0.21541275619494948
  Batch 5,460  of  46,094.    Elapsed: 1:28:57.
  current average loss = 0.21522417515993883
  Batch 5,480  of  46,094.    Elapsed: 1:29:17.
  current average loss = 0.21509580162360611
  Batch 5,500  of  46,094.    Elapsed: 1:29:36.
  current average loss = 0.21509487054226073
  Batch 5,520  of  46,094.    Elapsed: 1:29:56.
  current average loss = 0.21500188480131327
  Batch 5,540  of  46,094.    Elapsed: 1:30:16.
  current average loss = 0.21490083777342356
  Batch 5,560  of  46,094.    Elapsed: 1:30:35.
  current average loss = … [output truncated]

  Batch 7,140  of  46,094.    Elapsed: 1:56:21.
  current average loss = 0.20458111937643036
  Batch 7,160  of  46,094.    Elapsed: 1:56:41.
  current average loss = 0.20449233052344193
  Batch 7,180  of  46,094.    Elapsed: 1:57:01.
  current average loss = 0.20431595980098943
  Batch 7,200  of  46,094.    Elapsed: 1:57:20.
  current average loss = 0.2042221578083829
  Batch 7,220  of  46,094.    Elapsed: 1:57:40.
  current average loss = 0.2042076225805295
  Batch 7,240  of  46,094.    Elapsed: 1:58:00.
  current average loss = 0.20416844377515436
  Batch 7,260  of  46,094.    Elapsed: 1:58:19.
  current average loss = 0.20401002054767217
  Batch 7,280  of  46,094.    Elapsed: 1:58:39.
  current average loss = 0.2039089649212086
  Batch 7,300  of  46,094.    Elapsed: 1:58:58.
  current average loss = 0.20377173934820783
  Batch 7,320  of  46,094.    Elapsed: 1:59:18.
  current average loss = 0.2036259635176021
  Batch 7,340  of  46,094.    Elapsed: 1:59:37.
  current average loss = … [output truncated]

  Batch 8,920  of  46,094.    Elapsed: 2:25:23.
  current average loss = 0.19475046516053646
  Batch 8,940  of  46,094.    Elapsed: 2:25:43.
  current average loss = 0.1946692486033177
  Batch 8,960  of  46,094.    Elapsed: 2:26:02.
  current average loss = 0.19458929059432453
  Batch 8,980  of  46,094.    Elapsed: 2:26:22.
  current average loss = 0.19445918303632276
  Batch 9,000  of  46,094.    Elapsed: 2:26:41.
  current average loss = 0.19430433496843197
  Batch 9,020  of  46,094.    Elapsed: 2:27:00.
  current average loss = 0.19432204397335368
  Batch 9,040  of  46,094.    Elapsed: 2:27:20.
  current average loss = 0.19425391191956629
  Batch 9,060  of  46,094.    Elapsed: 2:27:39.
  current average loss = 0.19412459730145995
  Batch 9,080  of  46,094.    Elapsed: 2:27:59.
  current average loss = 0.19407804644008123
  Batch 9,100  of  46,094.    Elapsed: 2:28:18.
  current average loss = 0.19404721949771456
  Batch 9,120  of  46,094.    Elapsed: 2:28:38.
  current average loss = … [output truncated]

  Batch 10,680  of  46,094.    Elapsed: 2:54:01.
  current average loss = 0.18712652213302483
  Batch 10,700  of  46,094.    Elapsed: 2:54:21.
  current average loss = 0.18705909194466092
  Batch 10,720  of  46,094.    Elapsed: 2:54:40.
  current average loss = 0.18696996770030733
  Batch 10,740  of  46,094.    Elapsed: 2:55:00.
  current average loss = 0.18691085102147156
  Batch 10,760  of  46,094.    Elapsed: 2:55:19.
  current average loss = 0.18684613099899472
  Batch 10,780  of  46,094.    Elapsed: 2:55:39.
  current average loss = 0.1867756642714611
  Batch 10,800  of  46,094.    Elapsed: 2:55:58.
  current average loss = 0.18664498800125093
  Batch 10,820  of  46,094.    Elapsed: 2:56:18.
  current average loss = 0.18651371871002015
  Batch 10,840  of  46,094.    Elapsed: 2:56:37.
  current average loss = 0.18645867569158558
  Batch 10,860  of  46,094.    Elapsed: 2:56:56.
  current average loss = 0.18634135332067872
  Batch 10,880  of  46,094.    Elapsed: 2:57:16.
  current average loss = … [output truncated]

  Batch 12,440  of  46,094.    Elapsed: 3:22:39.
  current average loss = 0.17990088473346272
  Batch 12,460  of  46,094.    Elapsed: 3:22:58.
  current average loss = 0.1798222375457193
  Batch 12,480  of  46,094.    Elapsed: 3:23:18.
  current average loss = 0.17971026721842906
  Batch 12,500  of  46,094.    Elapsed: 3:23:37.
  current average loss = 0.17964564169248565
  Batch 12,520  of  46,094.    Elapsed: 3:23:57.
  current average loss = 0.17959061860344552
  Batch 12,540  of  46,094.    Elapsed: 3:24:16.
  current average loss = 0.17953686180449888
  Batch 12,560  of  46,094.    Elapsed: 3:24:36.
  current average loss = 0.1794662518763231
  Batch 12,580  of  46,094.    Elapsed: 3:24:55.
  current average loss = 0.1793711286930061
  Batch 12,600  of  46,094.    Elapsed: 3:25:15.
  current average loss = 0.17933393071972883
  Batch 12,620  of  46,094.    Elapsed: 3:25:34.
  current average loss = 0.1792657421303141
  Batch 12,640  of  46,094.    Elapsed: 3:25:54.
  current average loss = … [output truncated]

  Batch 14,200  of  46,094.    Elapsed: 3:51:16.
  current average loss = 0.1735750619965953
  Batch 14,220  of  46,094.    Elapsed: 3:51:35.
  current average loss = 0.17349256354273768
  Batch 14,240  of  46,094.    Elapsed: 3:51:55.
  current average loss = 0.1734742240103719
  Batch 14,260  of  46,094.    Elapsed: 3:52:15.
  current average loss = 0.1733761899151544
  Batch 14,280  of  46,094.    Elapsed: 3:52:34.
  current average loss = 0.17328571609920218
  Batch 14,300  of  46,094.    Elapsed: 3:52:54.
  current average loss = 0.17320332151510381
  Batch 14,320  of  46,094.    Elapsed: 3:53:13.
  current average loss = 0.17312850039463695
  Batch 14,340  of  46,094.    Elapsed: 3:53:33.
  current average loss = 0.17305635099350883
  Batch 14,360  of  46,094.    Elapsed: 3:53:52.
  current average loss = 0.1729721783437088
  Batch 14,380  of  46,094.    Elapsed: 3:54:12.
  current average loss = 0.17288797794864222
  Batch 14,400  of  46,094.    Elapsed: 3:54:31.
  current average loss = … [output truncated]

  Batch 15,960  of  46,094.    Elapsed: 4:19:53.
  current average loss = 0.16719606567644235
  Batch 15,980  of  46,094.    Elapsed: 4:20:13.
  current average loss = 0.16714494032912056
  Batch 16,000  of  46,094.    Elapsed: 4:20:32.
  current average loss = 0.16710462231854034
  Batch 16,020  of  46,094.    Elapsed: 4:20:52.
  current average loss = 0.16701279897686064
  Batch 16,040  of  46,094.    Elapsed: 4:21:11.
  current average loss = 0.16690570968349855
  Batch 16,060  of  46,094.    Elapsed: 4:21:31.
  current average loss = 0.16686714445195852
  Batch 16,080  of  46,094.    Elapsed: 4:21:50.
  current average loss = 0.1667689269878444
  Batch 16,100  of  46,094.    Elapsed: 4:22:10.
  current average loss = 0.1666951384520453
  Batch 16,120  of  46,094.    Elapsed: 4:22:29.
  current average loss = 0.16666663381279442
  Batch 16,140  of  46,094.    Elapsed: 4:22:49.
  current average loss = 0.1665924847831279
  Batch 16,160  of  46,094.    Elapsed: 4:23:08.
  current average loss = … [output truncated]

KeyboardInterrupt: 
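Training was interrupted during the first epoch, so no validation numbers were produced. Once they are, plain accuracy is a weak signal for the zero/non-zero gate: 142,571 of the 174,304 documents are label 0, so always predicting 0 already scores high, and non-zero documents the gate misses can never be recovered by the second model. A minimal sketch of precision, recall and F1 for the non-zero class (hypothetical helper `binary_prf`), to be fed with `np.argmax(logits, axis=1)` and `label_ids` collected in the validation loop.

```python
import numpy as np

def binary_prf(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (non-zero) class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    # define each ratio as 0.0 when its denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)
```

For the gate, recall on the non-zero class is arguably the number to watch, since its misses are final while its false alarms can still be reviewed by the second model.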