# 2019/09/21 Report: 森下健太

# 100 NLP Exercises (自然言語処理100本ノック)

## Chapter 1: Warm-up

### 00. Reversing a string

In [1]:
s = "stressed"
print(s[::-1])

desserts


### 01. 「パタトクカシーー」

In [2]:
s = "パタトクカシーー"
print(s[::2])

パトカー


### 02. 「パトカー」＋「タクシー」＝「パタトクカシーー」

In [3]:
s1 = "パトカー"
s2 = "タクシー"
print("".join([x + y for x, y in zip(s1, s2)]))

パタトクカシーー


### 03. Pi

In [4]:
import re
s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
print([len("".join(re.findall("[a-zA-Z]+", x))) for x in s.split()])  # count alphabetic characters only

[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
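
The same count can probably be obtained without a regular expression. The following is a minimal sketch, assuming that "letters" means whatever `str.isalpha` accepts; the names `sentence` and `digits` are introduced here for illustration only.

```python
sentence = ("Now I need a drink, alcoholic of course, "
            "after the heavy lectures involving quantum mechanics.")

# For each whitespace-separated word, count only the alphabetic characters,
# so trailing commas and periods do not contribute to the length.
digits = [sum(ch.isalpha() for ch in word) for word in sentence.split()]
print(digits)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```

Note that `str.isalpha` also accepts non-ASCII letters, so this variant is not strictly equivalent to an ASCII-only character class.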


### 04. Element symbols

In [5]:
s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. " \
    "New Nations Might Also Sign Peace Security Clause. Arthur King Can."
print({i + 1: x[0] if i + 1 in [1, 5, 6, 7, 8, 9, 15, 16, 19] else x[:2] for i, x in enumerate(s.split())})

{1: 'H', 2: 'He', 3: 'Li', 4: 'Be', 5: 'B', 6: 'C', 7: 'N', 8: 'O', 9: 'F', 10: 'Ne', 11: 'Na', 12: 'Mi', 13: 'Al', 14: 'Si', 15: 'P', 16: 'S', 17: 'Cl', 18: 'Ar', 19: 'K', 20: 'Ca'}


### 05. n-gram

In [6]:
def ngram(s, n):
    """
    Build n-grams.
    --------------------------------------------------
    s: a string (character n-grams) or a list of words (word n-grams)
    n: integer
    ---------------------------------------------------
    return: list of n-grams
    """
    
    return [s[i:i + n] for i in range(len(s) - n + 1)]


s = "I am an NLPer"
print(ngram(s.split(), 2))  # word bi-grams
print(ngram(s, 2))  # character bi-grams

[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
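
Word n-grams returned as lists cannot be placed in a `set`, because lists are not hashable. One possible workaround is a tuple-based variant; the sketch below uses a hypothetical helper `ngram_tuples` that is not part of the original report.

```python
def ngram_tuples(seq, n):
    """Return the n-grams of seq as a list of tuples (hypothetical helper)."""
    # zip over n shifted views of the sequence; zip stops at the shortest view,
    # which yields exactly len(seq) - n + 1 tuples (or none if seq is too short).
    return list(zip(*(seq[i:] for i in range(n))))


words = "I am an NLPer".split()
word_bigrams = ngram_tuples(words, 2)
print(word_bigrams)  # [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]

# Tuples are hashable, so the word bi-grams can be collected in a set.
unique_bigrams = set(word_bigrams)
print(len(unique_bigrams))  # 3
```

For character n-grams this variant yields tuples of single characters rather than substrings, so the slicing version is likely more convenient there.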


### 06. Sets

In [7]:
s1 = "paraparaparadise"
s2 = "paragraph"

X = set(ngram(s1, 2))
Y = set(ngram(s2, 2))

print("X | Y: ", X | Y)  # union
print("X & Y: ", X & Y)  # intersection
print("X - Y: ", X - Y)  # difference
print("\"se\" in X: ", "se" in X)
print("\"se\" in Y: ", "se" in Y)

X | Y:  {'ad', 'se', 'is', 'ap', 'gr', 'di', 'ph', 'ag', 'pa', 'ra', 'ar'}
X & Y:  {'pa', 'ra', 'ar', 'ap'}
X - Y:  {'di', 'ad', 'se', 'is'}
"se" in X:  True
"se" in Y:  False


### 07. Sentence generation from a template

In [8]:
def temp(x, y, z):
    """
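    Fill in a sentence template.
    -----------------------------
    x: any object convertible to a string
    y: any object convertible to a string
    z: any object convertible to a string
    -----------------------------
    return: "x時のyはz" ("y at x o'clock is z")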
    """
    
    return "{}時の{}は{}".format(x, y, z)


print(temp(12, "気温", 22.4))

12時の気温は22.4


### 08. Cipher text

In [9]:
def cipher(s):
    """
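    Encrypt a string.
    ----------------------
    s: string
    ----------------------
    return: cipher text
    """
    
    # Each lowercase letter becomes the decimal code 219 - ord(x), zero-padded to
    # three digits so that x, y and z (codes 99, 98, 97) stay three characters wide.
    return "".join(["{:03d}".format(219 - ord(x)) if "a" <= x <= "z" else x for x in s])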


s = cipher("I'm a perfect human.")  # encrypt
print(s)
uncipher = ""

# Decrypt: every run of three digits is the code of one lowercase letter
i = 0
while i < len(s):
    if s[i:i + 3].isdecimal():
        uncipher += chr(219 - int(s[i:i + 3]))
        i += 3
    else:
        uncipher += s[i]
        i += 1
print(uncipher)

I'110 122 107118105117118120103 115102110122109.
I'm a perfect human.
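
The exercise statement asks for each lowercase letter to be replaced by the character with code `219 - ord(c)`, rather than by the decimal number itself. A minimal sketch of that reading follows; the name `cipher_chr` is hypothetical and chosen only to avoid clashing with `cipher` above.

```python
def cipher_chr(text):
    """Map each lowercase ASCII letter c to chr(219 - ord(c)); leave other characters unchanged."""
    return "".join(chr(219 - ord(c)) if "a" <= c <= "z" else c for c in text)


message = "I'm a perfect human."
encrypted = cipher_chr(message)
print(encrypted)  # I'n z kviuvxg sfnzm.

# Since 219 - (219 - k) == k, applying the same function again decrypts the text.
decrypted = cipher_chr(encrypted)
print(decrypted)  # I'm a perfect human.
```

Because the mapping is its own inverse, no separate decryption loop is needed in this variant.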


### 09. Typoglycemia

In [10]:
from random import sample

s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
# Shuffle the inner letters of words longer than 4 characters, keeping the first and last letters in place
print(" ".join([x[0] + "".join(sample(x[1:-1], len(x) - 2)) + x[-1] if len(x) > 4 else x for x in s.split()]))

I cnuld'ot bieleve that I colud atacully udsaetnrnd what I was reinadg : the pehnoenaml power of the human mind .
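
Since the output changes on every run, a seeded generator may make the result easier to reproduce and check. The sketch below is one possible restructuring, not the original approach: the function `typoglycemia` and its `rng` parameter are assumptions introduced for illustration.

```python
import random


def typoglycemia(text, rng=random):
    """Shuffle the inner letters of every word longer than 4 characters (hypothetical helper)."""

    def scramble(word):
        if len(word) <= 4:
            return word  # short words are left untouched
        middle = list(word[1:-1])
        rng.shuffle(middle)  # in-place shuffle of the inner letters
        return word[0] + "".join(middle) + word[-1]

    return " ".join(scramble(word) for word in text.split())


text = ("I couldn't believe that I could actually understand what I was reading : "
        "the phenomenal power of the human mind .")
rng = random.Random(0)  # fixed seed so that repeated runs give the same shuffle
result = typoglycemia(text, rng)
print(result)
```

Passing a `random.Random` instance keeps the global random state untouched, which is usually preferable in tests.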
