# Overview
Word Embedding은 구조적으로 dictionary(bag of words) 방식의 limited Vector Space에 한계가 있다. 즉, 알려진 단어 위주의 학습. 
본 Jupyter Notebook은 Log file 등의 layout 중심의 Document classification을 위해서는 Character-Level의 분류기를 설계할 필요가 있다 판단하여 관련 지식 및 논문을 정리해본다. 

In [1]:
import tensorflow as tf

  from ._conv import register_converters as _register_converters


### 참고 논문
*'Character-level Convolutional Networks for Text Classification', Xiang Zhang et al*

### 논문 中 2.2 Character quantization
...Our models accept a sequence of encoded characters as input. The encoding is done by prescribing an alphabet of size m for the input language, and then quantize each character using 1-of-m encoding (or "one-hot" encoding). Then, the sequence of characters is transformed to a sequence of such m sized vectors with fixed length L0. Any character exceeding length L0 is ignored, and <span style="color:blue">any characters that are not in the alphabet including blank characters are quantized as all-zero vectors. </span> 

- 공백(Blank)은 0이 아닌 고유의 Vector 값을 추가 부여
- L0 size를 분류대상 문서의 일부 threshold로 설정 (100 lines but <span style="color:red">how can we count the characters in one line)</span>

...The character quantization order is backward so that the latest reading on charaters is always placed near the begin of the output, making it easy for fully connected layers to associate weights with the latest reading.

The alphabet used in all of our models consists of 70 characters, <span style="color:green">==> **71** characters</span>

- including 26 english letters : **abcdefghijklmnopqrstuvwxyz**
- 10 digits : **0123456789**
- 33 other characters and the new line character : **<span>-,;.!?:’’’/\|_@#$%ˆ&*˜‘+-=<>()[]{}</span>**
- <span style="color:green">(Customize) add character : **공백(Black)**</span>

In [2]:
# add blank at the end of string variable
alphabet = "abcdefghijklmnopqrstuvwxyz0123456789-,;.!?:'\"/\\|_@#$%^&*~`+-=<>()[]{} "

## Dataset

Dataset은 산문(Report), 규격화된 표(Table) data에 Label 정보명을 File 명으로 하는 File Dataset이다.

예를 들어, 

- Filename : report01.txt
- Contents : 
            ....Studying examples of poems using various poetic devices helps such as context create an understanding of how those poetry terms work within different types of poetry.  For instance examples of poems using onomatopoeia can illustrate how sounds can be represented in poems.  Likewise, examples of poems using alliteration can shed light on how alliteration affects the rhythm of a poem.  Many poems can be an example of context, but sometimes good examples are hard to find.  You'll find relevant, concise poetry examples here. 
            
- Filename : table01.txt
- Contents :

                 ==============================================================================
                  ID              NAME                AGE                  PHONE
                 ==============================================================================
                  machine01       John                28                   089-7835-1945
                  k-meaning       Anderson            40                   1-333-5321-3542
                 ....
                 
두 데이터의 Bag of Words 차이는 거의 없다고 할 때, 문서 구조적인 차이만 보이고 있다. Character Level의 분류가 잘 작동하는지가 본 Notebook의 실험 주요 목표이다.

### 논문 中 2.3 Model Design
...We designed 2 ConvNets - one large and one samll. They are both 9 layers deep with 6 convolutional layers and 3 fully-connected layers. 

![](imgs/model.jpg)

The input have number of features equal to 70 due to our character quantization method, and the input feature length is 1014. It seems that 1014 characters could already capture most of the texts of interest. We also insert 2 dropout [10] modules in between the 3 fully-connected layers to regularize. They have dropout probability of 0.5. **Table 1** lists the configurations for convolutional layers, and **table 2** lists the configurations for fully-connected (linear) layers

In [39]:
probability = 0.5

def do_dropout(x):
    return tf.nn.dropout(x, probability)

### Table 1 
Convolutional layers used in our experiments. The convolutional layers have stride 1 and pooling layers are all non-overlapping ones, so we omit the description of their strides.

![](imgs/table1.jpg)

In [144]:
def table_1(x_input):
    
    # 71 X 1014 size of x_input

    #Layer 1
    conv1 = tf.layers.conv1d(inputs=x_input, filters=256, strides=1, kernel_size=7, activation=tf.nn.relu)
    conv1 = tf.layers.max_pooling1d(inputs=conv1, pool_size=3, strides=1)

    #Layer 2
    conv2 = tf.layers.conv1d(inputs=conv1, filters=256, strides=1, kernel_size=7, activation=tf.nn.relu)
    conv2 = tf.layers.max_pooling1d(inputs=conv2, pool_size=3, strides=1)
    
    #Layer 3
    conv3 = tf.layers.conv1d(inputs=conv2, filters=256, strides=1, kernel_size=3, activation=tf.nn.relu)
    
    #Layer 4
    conv4 = tf.layers.conv1d(inputs=conv3, filters=256, strides=1, kernel_size=3, activation=tf.nn.relu)
    
    #Layer 5
    conv5 = tf.layers.conv1d(inputs=conv4, filters=256, strides=1, kernel_size=3, activation=tf.nn.relu)
    
    #Layer 6
    conv6 = tf.layers.conv1d(inputs=conv5, filters=256, strides=1, kernel_size=3, activation=tf.nn.relu)
    conv6 = tf.layers.max_pooling1d(inputs=conv6, pool_size=3, strides=1)

    return conv6

### Table 2


![](imgs/table2.jpg)

In [176]:
def table_2(x_input):
    
    x = tf.reshape(x_input, [-1, 44*256])
    
    fc1 = tf.layers.flatten(x_input)
    fc1 = tf.layers.dense(fc1, 1024)
    fc1 = do_dropout(fc1)
    
    fc2 = tf.layers.flatten(fc1)
    fc2 = tf.layers.dense(fc2, 1024)
    fc2 = do_dropout(fc2)
    
    fc3 = tf.layers.flatten(fc2)
    fc3 = tf.layers.dense(fc3, 10)
    output = do_dropout(fc3)
    
    return output

In [197]:
X = tf.placeholder(np.float32, [None, 55000, 784])

In [198]:
Table1 = table_1(X)
Table2 = table_2(Table1)

In [199]:
Table2

<tf.Tensor 'dropout_21/mul:0' shape=(?, 10) dtype=float32>

In [209]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('./mnist/')

Extracting ./mnist/train-images-idx3-ubyte.gz
Extracting ./mnist/train-labels-idx1-ubyte.gz
Extracting ./mnist/t10k-images-idx3-ubyte.gz
Extracting ./mnist/t10k-labels-idx1-ubyte.gz


In [180]:
x = mnist.images.shape

(55000, 784)

In [181]:
y = mnist.labels.shape

(55000,)

In [200]:
loss = tf.reduce_mean(Table2)

In [202]:
train = tf.train.AdamOptimizer(learning_rate=0.002).minimize(loss)

In [204]:
sess = tf.Session()

In [210]:
# Training Parameters
learning_rate = 0.001
num_steps = 2000
batch_size = 128

# Network Parameters
num_input = 784 # MNIST data input (img shape: 28*28)
num_classes = 10 # MNIST total classes (0-9 digits)
dropout = 0.25 # Dropout, probability to drop a unit


# Create the neural network
def conv_net(x_dict, n_classes, dropout, reuse, is_training):
    # Define a scope for reusing the variables
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer with 32 filters and a kernel size of 5
        conv1 = tf.layers.conv2d(x, 32, 5, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Fully connected layer (in tf contrib folder for now)
        fc1 = tf.layers.dense(fc1, 1024)
        # Apply Dropout (if is_training is False, dropout is not applied)
        fc1 = tf.layers.dropout(fc1, rate=dropout, training=is_training)

        # Output layer, class prediction
        out = tf.layers.dense(fc1, n_classes)

    return out


# Define the model function (following TF Estimator Template)
def model_fn(features, labels, mode):
    # Build the neural network
    # Because Dropout have different behavior at training and prediction time, we
    # need to create 2 distinct computation graphs that still share the same weights.
    logits_train = conv_net(features, num_classes, dropout, reuse=False,
                            is_training=True)
    logits_test = conv_net(features, num_classes, dropout, reuse=True,
                           is_training=False)

    # Predictions
    pred_classes = tf.argmax(logits_test, axis=1)
    pred_probas = tf.nn.softmax(logits_test)

    # If prediction mode, early return
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode, predictions=pred_classes)

        # Define loss and optimizer
    loss_op = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        logits=logits_train, labels=tf.cast(labels, dtype=tf.int32)))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
    train_op = optimizer.minimize(loss_op,
                                  global_step=tf.train.get_global_step())

    # Evaluate the accuracy of the model
    acc_op = tf.metrics.accuracy(labels=labels, predictions=pred_classes)

    # TF Estimators requires to return a EstimatorSpec, that specify
    # the different ops for training, evaluating, ...
    estim_specs = tf.estimator.EstimatorSpec(
        mode=mode,
        predictions=pred_classes,
        loss=loss_op,
        train_op=train_op,
        eval_metric_ops={'accuracy': acc_op})

    return estim_specs

# Build the Estimator
model = tf.estimator.Estimator(model_fn)

# Define the input function for training
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.train.images}, y=mnist.train.labels,
    batch_size=batch_size, num_epochs=None, shuffle=True)
# Train the Model
model.train(input_fn, steps=num_steps)

# Evaluate the Model
# Define the input function for evaluating
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': mnist.test.images}, y=mnist.test.labels,
    batch_size=batch_size, shuffle=False)
# Use the Estimator 'evaluate' method
e = model.evaluate(input_fn)

print("Testing Accuracy:", e['accuracy'])

INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_save_checkpoints_steps': None, '_evaluation_master': '', '_save_summary_steps': 100, '_keep_checkpoint_every_n_hours': 10000, '_session_config': None, '_task_type': 'worker', '_save_checkpoints_secs': 600, '_global_id_in_cluster': 0, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_train_distribute': None, '_log_step_count_steps': 100, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7fd2800087b8>, '_service': None, '_master': '', '_model_dir': '/tmp/tmptvdy98yr', '_keep_checkpoint_max': 5, '_task_id': 0, '_tf_random_seed': None, '_is_chief': True}
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 1 into /tmp/tmptvdy98yr/model.ckpt.
INFO:tensorflow:loss = 2.313