<a href="https://colab.research.google.com/github/maruhachi/work-colaboratory/blob/master/RNN-suburi.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Text generation with an RNN


In [0]:
!git clone https://github.com/oreilly-japan/deep-learning-with-keras-ja.git

fatal: could not create work tree dir 'deep-learning-with-keras-ja': Operation not supported


In [0]:
from google.colab import drive
drive.mount('/gdrive')
%cd /gdrive
!ls -la "/gdrive/My Drive/develop/twitter-2019-12-14-a38ba08b715a79675a4a7b065962453045a5501c5174e970eea2aa16acfc1ea7/tweet.js"

Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).
/gdrive
-rw------- 1 root root 1607640 Dec 14 05:32 '/gdrive/My Drive/develop/twitter-2019-12-14-a38ba08b715a79675a4a7b065962453045a5501c5174e970eea2aa16acfc1ea7/tweet.js'
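
The training cell further down reads `result.txt` from this same folder, but the notebook does not show how that file was produced from `tweet.js`. The following is a minimal sketch of one possible preprocessing step. It assumes the archive file is a JavaScript assignment of the form `window.YTD.tweet.part0 = [...]` wrapping a JSON array, and that each entry holds its text under `full_text`, either directly or nested under a `tweet` key. The helper name `extract_tweet_texts` and these layout details are assumptions, not taken from the notebook.

```python
import json


def extract_tweet_texts(js_source):
    """Return tweet texts from the contents of an archive tweet.js file.

    Assumes the file looks like `window.YTD.tweet.part0 = [ ... ]`.
    """
    # Drop the JavaScript assignment prefix; the JSON array starts at the first "[".
    start = js_source.index("[")
    entries = json.loads(js_source[start:])
    texts = []
    for entry in entries:
        # Assumption: some archive versions nest the fields under a "tweet" key.
        tweet = entry.get("tweet", entry)
        text = tweet.get("full_text", "")
        if text:
            # Keep one tweet per line so a later step can read line by line.
            texts.append(text.replace("\n", " "))
    return texts


# Tiny made-up sample covering both assumed layouts.
sample = ('window.YTD.tweet.part0 = [ {"full_text": "hello\\nworld"}, '
          '{"tweet": {"full_text": "second"}} ]')
print(extract_tweet_texts(sample))
```

Writing the returned list to `result.txt`, one text per line, would give a file of the shape the training cell expects.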


In [0]:
!pip install -r /content/deep-learning-with-keras-ja/ch06/requirements.txt

Collecting keras==2.1.6
  Downloading https://files.pythonhosted.org/packages/54/e8/eaff7a09349ae9bd40d3ebaf028b49f5e2392c771f294910f75bb608b241/Keras-2.1.6-py2.py3-none-any.whl (339kB)
Collecting tensorflow==1.8.0
  Downloading https://files.pythonhosted.org/packages/22/c6/d08f7c549330c2acc1b18b5c1f0f8d9d2af92f54d56861f331f372731671/tensorflow-1.8.0-cp36-cp36m-manylinux1_x86_64.whl (49.1MB)
Collecting h5py==2.7.1
  Downloading https://files.pythonhosted.org/packages/f2/b8/a63fcc840bba5c76e453dd712dbca63178a264c8990e0086b72965d4e954/h5py-2.7.1-cp36-cp36m-manylinux1_x86_64.whl (5.4MB)
Collecting matplotlib==2.1.1
  Downloading https://files.pythonhosted.org/packages/34/50/d1649dafaecc91e360b1ca8defebb25f865e29928a98bc7d42ba3b1350e5/matplotlib-2.1.1-cp36-cp36m-manylinux1_x86_64.wh

In [0]:
from __future__ import print_function

import numpy as np
from keras.layers import Dense, Activation, SimpleRNN
from keras.models import Sequential
import codecs


INPUT_FILE = "/gdrive/My Drive/develop/twitter-2019-12-14-a38ba08b715a79675a4a7b065962453045a5501c5174e970eea2aa16acfc1ea7/result.txt"

# extract the input as a stream of characters
print("Extracting text from input...")
with codecs.open(INPUT_FILE, "r", encoding="utf-8") as f:
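    # A raw line still ends with "\n", so len(line) is never 0;
    # test the stripped line to actually skip blank lines.
    lines = [line.strip().lower() for line in f
             if line.strip()]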
    text = " ".join(lines)

# creating lookup tables
# nb_chars is the number of features in our character "vocabulary".
# Sorting the set keeps the index assignment reproducible across runs.
chars = sorted(set(text))
nb_chars = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))

# create inputs and labels from the text. We do this by stepping
# through the text STEP characters at a time, extracting a sequence
# of SEQLEN characters and the character that follows it. For example,
# assuming an input text "The sky was falling", we would get the
# following input_chars and label_chars (first 5 only)
#   The sky wa -> s
#   he sky was -> (space)
#   e sky was  -> f
#    sky was f -> a
#   sky was fa -> l
print("Creating input and label text...")
SEQLEN = 10
STEP = 1

input_chars = []
label_chars = []
for i in range(0, len(text) - SEQLEN, STEP):
    input_chars.append(text[i:i + SEQLEN])
    label_chars.append(text[i + SEQLEN])

# vectorize the input and label chars
# Each row of the input is represented by SEQLEN characters, each
# represented as a 1-hot encoding of size nb_chars. There are
# len(input_chars) such rows, so shape(X) is (len(input_chars),
# SEQLEN, nb_chars).
# Each row of output is a single character, also represented as a
# 1-hot encoding of size nb_chars. Hence shape(y) is (len(input_chars),
# nb_chars).
print("Vectorizing input and label text...")
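# np.bool is a deprecated alias that newer NumPy releases removed;
# the builtin bool works on every version.
X = np.zeros((len(input_chars), SEQLEN, nb_chars), dtype=bool)
y = np.zeros((len(input_chars), nb_chars), dtype=bool)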
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1

# Build the model. We use a single RNN with a fully connected layer
# to compute the most likely predicted output char
HIDDEN_SIZE = 128
BATCH_SIZE = 128
NUM_ITERATIONS = 25
NUM_EPOCHS_PER_ITERATION = 1
NUM_PREDS_PER_EPOCH = 100

model = Sequential()
model.add(SimpleRNN(HIDDEN_SIZE, return_sequences=False,
                    input_shape=(SEQLEN, nb_chars),
                    unroll=True))
model.add(Dense(nb_chars))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop")

# Train for NUM_EPOCHS_PER_ITERATION epochs per iteration and print a
# sample of generated text after each iteration
for iteration in range(NUM_ITERATIONS):
    print("=" * 50)
    print("Iteration #: {}".format(iteration))
    model.fit(X, y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS_PER_ITERATION)

    # testing model
    # randomly choose a row from input_chars as the seed, then greedily
    # generate the next NUM_PREDS_PER_EPOCH chars from the model
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Generating from seed: {}".format(test_chars))
    print(test_chars, end="")
    for i in range(NUM_PREDS_PER_EPOCH):
        Xtest = np.zeros((1, SEQLEN, nb_chars))
        for j, ch in enumerate(test_chars):
            Xtest[0, j, char2index[ch]] = 1
        pred = model.predict(Xtest, verbose=0)[0]
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move forward with test_chars + ypred
        test_chars = test_chars[1:] + ypred
    print()
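
To make the windowing and one-hot steps in the cell above concrete, here is a small standalone sketch that runs them on the sentence used in the comments. The helper names `make_windows` and `one_hot` are illustrative and do not appear in the notebook; the logic mirrors the loops that fill `input_chars`, `label_chars`, `X` and `y`.

```python
import numpy as np


def make_windows(text, seqlen, step):
    """Slide a window of seqlen chars over text; the label is the next char."""
    inputs, labels = [], []
    for i in range(0, len(text) - seqlen, step):
        inputs.append(text[i:i + seqlen])
        labels.append(text[i + seqlen])
    return inputs, labels


def one_hot(inputs, labels, char2index):
    """Encode windows as (n, seqlen, vocab) and labels as (n, vocab) arrays."""
    seqlen = len(inputs[0])
    X = np.zeros((len(inputs), seqlen, len(char2index)), dtype=bool)
    y = np.zeros((len(inputs), len(char2index)), dtype=bool)
    for i, window in enumerate(inputs):
        for j, ch in enumerate(window):
            X[i, j, char2index[ch]] = True
        y[i, char2index[labels[i]]] = True
    return X, y


text = "The sky was falling"
inputs, labels = make_windows(text, seqlen=10, step=1)
char2index = {c: i for i, c in enumerate(sorted(set(text)))}
X, y = one_hot(inputs, labels, char2index)

print(list(zip(inputs, labels))[:3])
print(X.shape, y.shape)  # (windows, seqlen, vocab) and (windows, vocab)
```

The 19-character sentence yields 9 windows over a 14-character vocabulary, so every time step of `X` and every row of `y` has exactly one entry set.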

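The model in the cell above is a `SimpleRNN` that returns only its last hidden state, followed by a `Dense` layer and a softmax. As a rough illustration of what that stack computes, the sketch below writes the forward pass in plain NumPy (1.17 or newer), using the `tanh` activation that is the Keras default for `SimpleRNN`. The weight names `W`, `U`, `b`, `V`, `c`, the toy sizes, and the random weights are assumptions for demonstration only; nothing here is trained or taken from the notebook's model.

```python
import numpy as np


def simple_rnn_forward(X, W, U, b):
    """Return the last hidden state of a tanh RNN for a batch of sequences.

    X: (batch, seqlen, features), W: (features, hidden),
    U: (hidden, hidden), b: (hidden,).
    """
    h = np.zeros((X.shape[0], U.shape[0]))
    for t in range(X.shape[1]):
        # Each step mixes the current input with the previous hidden state.
        h = np.tanh(X[:, t, :] @ W + h @ U + b)
    return h


def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
seqlen, nb_chars, hidden = 10, 14, 8  # toy sizes, not the notebook's
# Random one-hot batch of 2 sequences, shape (2, seqlen, nb_chars).
X = np.eye(nb_chars)[rng.integers(nb_chars, size=(2, seqlen))]
W = rng.normal(scale=0.1, size=(nb_chars, hidden))
U = rng.normal(scale=0.1, size=(hidden, hidden))
b = np.zeros(hidden)
V = rng.normal(scale=0.1, size=(hidden, nb_chars))  # stands in for the Dense weights
c = np.zeros(nb_chars)                               # stands in for the Dense bias

probs = softmax(simple_rnn_forward(X, W, U, b) @ V + c)
print(probs.shape, probs.sum(axis=1))
```

Each row of `probs` is a distribution over the vocabulary for the next character, which is what `model.predict` returns in the generation loop.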

Extracting text from input...
Creating input and label text...
Vectorizing input and label text...
Iteration #: 0
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 1/1
Generating from seed: http://bit
http://bitoosaiaaaso  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧  #ラブライブ三昧
Iteration #: 1
Epoch 1/1
Generating from seed: にして誤魔化せるかの
にして誤魔化せるかの #ラブライブ三昧  @lovelive_sife  rt @lovelive_sife  rt @lovelive_sife  rt @lovelive_sife  rt @lovelive_sif
Iteration #: 2
Epoch 1/1
Generating from seed: でビックリしたぞ 微
でビックリしたぞ 微うはのイント  aqours  rt @lovelive_sif: #スクフェスシスース5ス555555555555555555555555555555555555555555555555555555
Iteration #: 3
Epoch 1/1
Generating from seed:   オイオイ スクフ
  オイオイ スクフェスのリートをrt @lovelive_sif: #スクフェスシリーズ5周年 記念し毎日当たるrtキャンペーン  催aqours  rt @lovelive_sif: #スクフェスシリーズ5周年 記念
Iteration #: 4
Epoch 1/1
Generating from seed: ①@lovelive
①@lovelive_sif: #スクフェスシリーズ5周年 記念 毎日当たるrtキャンペーン  催a  スクフェスシャン
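
Several of the samples above fall into loops, such as the long run of `5`s in iteration 2. The generation loop always takes `np.argmax(pred)`, so a given 10-character context always produces the same next character. A common alternative, which this notebook does not use, is to sample from the predicted distribution after rescaling it with a temperature. The sketch below assumes NumPy 1.17 or newer; `sample_index` and its `temperature` parameter are hypothetical names.

```python
import numpy as np


def sample_index(pred, temperature=1.0, rng=None):
    """Sample an index from a probability vector rescaled by temperature.

    Low temperatures approach argmax; higher ones give more varied output.
    """
    if rng is None:
        rng = np.random.default_rng()
    pred = np.asarray(pred, dtype=np.float64)
    # Rescale in log space; the small epsilon avoids log(0).
    logits = np.log(pred + 1e-12) / temperature
    # Subtract the max before exponentiating for numerical stability.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))


pred = np.array([0.7, 0.2, 0.1])  # made-up distribution for illustration
rng = np.random.default_rng(0)
draws = [sample_index(pred, temperature=1.0, rng=rng) for _ in range(1000)]
# Empirical frequencies should sit close to pred at temperature 1.0.
print(np.bincount(draws, minlength=3) / 1000.0)
```

In the generation loop, `index2char[sample_index(pred, temperature=0.5)]` would take the place of `index2char[np.argmax(pred)]`; the value 0.5 is only an example starting point.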