### Regular expressions
Search a target string using a regular expression pattern.

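|Operation|Code|
|:-:|:-:|
|Check whether the start of the string matches|re.match(pattern, target)|
|Search for the first match anywhere in the string|re.search(pattern, target)|
|Return all matching parts|re.findall(pattern, target)|
|Replace|re.sub('　', '', string)|
|Split; returns a list|re.split(' ', string)|
|Compile a pattern|p = re.compile('one')|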

- Reference: https://note.nkmk.me/python-re-match-search-findall-etc/
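The functions in the table above can be tried side by side in a minimal sketch. The sample string `text` and all variable names here are hypothetical, chosen only for illustration.

```python
import re

text = 'one two  three one'  # hypothetical sample string

# re.match succeeds only when the pattern matches at the start of the string.
m_start = re.match(r'one', text)   # a match object
m_none = re.match(r'two', text)    # None, because 'two' is not at the start

# re.search scans the whole string and returns the first match.
m_search = re.search(r'two', text)

# re.findall returns every non-overlapping match as a list of strings.
all_ones = re.findall(r'one', text)

# re.sub replaces each match; re.split cuts the string at each match.
no_spaces = re.sub(r'\s+', '', text)
words = re.split(r'\s+', text)

# A compiled pattern offers the same operations as methods.
p = re.compile(r'one')
compiled_ones = p.findall(text)

print(m_start.group(), m_none, m_search.span())
print(all_ones, no_spaces, words, compiled_ones)
```

Compiling is mostly useful when the same pattern is applied many times.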

### Quiz

In [2]:
# 7-1 Display a Unicode string and look up its name

import unicodedata
mystery = '\U0001f4a9'
mystery

unicodedata.name(mystery)

'PILE OF POO'
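As a small complement to 7-1, the name lookup can be reversed with `unicodedata.lookup()`. This is only a sketch; the variable names `name`, `same_char`, and `code_point` are introduced here for illustration.

```python
import unicodedata

mystery = '\U0001f4a9'

# name() maps a character to its official Unicode name.
name = unicodedata.name(mystery)

# lookup() goes the other way: from a name back to the character.
same_char = unicodedata.lookup(name)

# ord() exposes the underlying code point as an integer.
code_point = hex(ord(mystery))

print(name, same_char == mystery, code_point)
```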

In [3]:
# 7-2 Encode the Unicode string as UTF-8.

pop_bytes = mystery.encode('utf-8')
pop_bytes



b'\xf0\x9f\x92\xa9'

In [6]:
# 7-3 Decode

pop_string = pop_bytes.decode('utf-8')
pop_string

pop_string == mystery

True
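The difference between the string and its encoded bytes in 7-2 and 7-3 can be made more concrete. The following sketch compares lengths and shows what happens with an encoding that cannot represent the character; the `errors='replace'` case is an addition that is not part of the original quiz.

```python
mystery = '\U0001f4a9'
pop_bytes = mystery.encode('utf-8')

char_count = len(mystery)    # number of code points in the str
byte_count = len(pop_bytes)  # number of bytes in the UTF-8 encoding

# ASCII cannot represent this character, so strict encoding raises an error.
try:
    mystery.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# errors='replace' substitutes '?' instead of raising.
replaced = mystery.encode('ascii', errors='replace')

print(char_count, byte_count, ascii_failed, replaced)
```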

In [15]:
# 7-4 Old-style formatting

poem = '''
My kitty cat likes %s, 
My kitty cat likes %s,
My kitty cat fell on his %s,
And now thinks he's a %s.
'''

# With % formatting, multiple values to insert must be grouped in a tuple.
args = ('roast beef', 'ham', 'head', 'calm')
print(poem % args)




My kitty cat likes roast beef, 
My kitty cat likes ham,
My kitty cat fell on his head,
And now thinks he's a calm.



In [17]:
# (Alternative solution)
# 7-4 Old-style formatting

poem = '''
My kitty cat likes %s, 
My kitty cat likes %s,
My kitty cat fell on his %s,
And now thinks he's a %s.
''' % ('roast beef', 'ham', 'head', 'calm')  # With % formatting, multiple values to insert must be grouped in a tuple.

print(poem)





My kitty cat likes roast beef, 
My kitty cat likes ham,
My kitty cat fell on his head,
And now thinks he's a calm.
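The tuple rule noted in 7-4 applies when there is more than one placeholder. A brief sketch with hypothetical example strings shows the single-value, tuple, and dictionary forms of `%` formatting.

```python
# A single placeholder accepts a bare value; no tuple is required.
one = 'likes %s' % 'ham'

# Two or more placeholders need the values grouped in a tuple.
two = '%s and %s' % ('ham', 'eggs')

# Named placeholders take a dictionary instead of a tuple.
named = '%(food)s on his %(part)s' % {'food': 'ham', 'part': 'head'}

print(one)
print(two)
print(named)
```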



In [27]:
# 7-5 Preparing for new-style formatting: '{}{}{}'.format(n, f, s)

letter = '''
Dear {salutation} {name},

Thank you for your letter. We are sorry that our {product} {verbed} in your
{room}. Please note that it should never be used in a {room}, especially
near my {animals}.

Send us your receipt and {amount} for shipping and handling. We will send 
you another {product} that, in our tests, is {percent}% less likely to 
have {verbed}.

Thank you for your support.

Sincerly,
{spokesman}
{job_title}
'''


In [29]:
# 7-6 New-style formatting: '{}{}{}'.format(n, f, s)

response = {
    'salutation': 'Colonel',
    'name': 'Hackenbush', 
    'product': 'KIBIT',
    'verbed': 'imploded',
    'room': 'conservatory',
    'animals': 'emus',
    'amount': '$1.38',
    'percent': '1',
    'spokesman': 'Ikki Ikazaki',
    'job_title': 'Analyst'
    }

print(letter.format(**response))


Dear Colonel Hackenbush,

Thank you for your letter. We are sorry that our KIBIT imploded in your
conservatory. Please note that it should never be used in a conservatory, especially
near my emus.

Send us your receipt and $1.38 for shipping and handling. We will send 
you another KIBIT that, in our tests, is 1% less likely to 
have imploded.

Thank you for your support.

Sincerly,
Ikki Ikazaki
Analyst
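The `**response` call in 7-6 unpacks the dictionary into keyword arguments. As a sketch, the same result can be reached with explicit keywords or with `str.format_map()`; the shortened `template` and `values` below are hypothetical stand-ins for `letter` and `response`.

```python
template = 'Dear {salutation} {name}, our {product} is {percent}% better.'
values = {
    'salutation': 'Colonel',
    'name': 'Hackenbush',
    'product': 'KIBIT',
    'percent': '1',
}

# ** unpacks the dict into keyword arguments for format().
by_unpacking = template.format(**values)

# Equivalent call with the keywords written out by hand.
by_keywords = template.format(
    salutation='Colonel', name='Hackenbush', product='KIBIT', percent='1'
)

# format_map() takes the mapping directly, without unpacking.
by_mapping = template.format_map(values)

print(by_unpacking)
```

A field name that appears several times in the template, such as `{room}` in the letter, is filled with the same value each time.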



In [1]:
# 7-7 Preparing for regular expressions

mammoth = '''
    We have seen the Queen of cheese, 
    Laying quietly at your ease, 
    Gently fanned by evening breeze -- 
    Thy fair form no flies dare seize. 

    All gaily dressed soon you'll go 
    To the great Provincial Show, 
    To be admired by many a beau 
    In the city of Toronto. 

    Cows numerous as a swarm of bees -- 
    Or as the leaves upon the trees -- 
    It did require to make thee please, 
    And stand unrivalled Queen of Cheese. 

    May you not receive a scar as 
    We have heard that Mr. Harris 
    Intends to send you off as far as 
    The great World's show at Paris. 

    Of the youth -- beware of these -- 
    For some of them might rudely squeeze 
    And bite your cheek; then songs or glees 
    We could not sing o' Queen of Cheese. 

    We'rt thou suspended from baloon, 
    You'd cast a shade, even at noon; 
    Folks would think it was the moon 
    About to fall and crush them soon. 
    '''

In [2]:
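# 7-8 Regular expressions

import re
# Remember what \b, \w, and * mean:
#   \b matches a word boundary (the start or end of a word),
#   \w matches any single word character (letter, digit, or underscore),
#   *  matches zero or more repetitions of the preceding element.
# The r prefix makes this a raw string, so Python does not interpret \b as a backspace.
pat = r'\bc\w*'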
re.findall(pat, mammoth)

['cheese', 'city', 'cheek', 'could', 'cast', 'crush']
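The roles of the `r` prefix and `\b` in 7-8 can be checked directly. This is a minimal sketch using a hypothetical sample sentence rather than the poem.

```python
import re

# Without the r prefix, '\b' is a single backspace character.
plain = '\b'
raw = r'\b'  # two characters: a backslash followed by 'b'

sample = 'a cat in a bucket, c'  # hypothetical sample text

# With \b, only words that start with 'c' match.
# Because * allows zero repetitions, the lone 'c' matches too.
with_boundary = re.findall(r'\bc\w*', sample)

# Without \b, a 'c' in the middle of a word also starts a match.
without_boundary = re.findall(r'c\w*', sample)

print(len(plain), len(raw))
print(with_boundary)
print(without_boundary)
```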

In [6]:
# 7-9 Regular expressions

pat = r'\bc\w{3}\b'
re.findall(pat, mammoth)

['city', 'cast']

In [7]:
# 7-10 Regular expressions

pat = r'\b\w*r\b'
re.findall(pat, mammoth)

['your', 'fair', 'Or', 'scar', 'Mr', 'far', 'For', 'your', 'or']

In [14]:
# 7-11 Regular expressions

pat = r'\b\w*[aiueo]{3}[^aiueo\s]*\w*\b'
re.findall(pat, mammoth)

['Queen', 'quietly', 'beau', 'Queen', 'squeeze', 'Queen']

In [18]:
# 7-12 Binary representation

import binascii
bin1 = '47494638396101000100800000000000ffffff21f90401000000002c000000000100010000020144003b'
gif = binascii.unhexlify(bin1)  # binascii.unhexlify() converts a hexadecimal string into the bytes it represents.
gif

b'GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00\x02\x01D\x00;'
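The conversion in 7-12 can be reversed, and the built-in `bytes.fromhex()` and `bytes.hex()` methods give the same results as `binascii`. The sketch below reuses the hex string from the cell above; the name `hex_text` is introduced here for illustration.

```python
import binascii

hex_text = '47494638396101000100800000000000ffffff21f90401000000002c000000000100010000020144003b'

# Hex string to bytes, two equivalent ways.
gif = binascii.unhexlify(hex_text)
gif_builtin = bytes.fromhex(hex_text)

# Bytes back to a hex string, two equivalent ways.
round_trip = binascii.hexlify(gif).decode('ascii')
round_trip_builtin = gif.hex()

# startswith() accepts a tuple, so both GIF signatures can be checked at once.
is_gif = gif.startswith((b'GIF87a', b'GIF89a'))

print(gif == gif_builtin, round_trip == hex_text, is_gif)
```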

In [19]:
# 7-13 GIF signature check

gif[:6] == b'GIF89a'  # The b prefix defines a bytes literal. True means the data starts with a valid GIF (GIF89a) signature.

True

In [20]:
# 7-14 Extract the GIF size

import struct
width, height = struct.unpack('<HH', gif[6:10])  # struct.unpack() converts binary data into Python values; '<HH' reads two little-endian unsigned 16-bit integers.
width, height

(1, 1)
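To see why the format string in 7-14 is `'<HH'`, it may help to build the four size bytes with `struct.pack()` and read them back with both byte orders. This is a sketch; `header` is a hypothetical, truncated stand-in for the full GIF data.

```python
import struct

# Build a minimal header: the signature plus width and height,
# each stored as a little-endian unsigned 16-bit integer.
header = b'GIF89a' + struct.pack('<HH', 1, 1)

# Reading with the matching byte order recovers the original values.
width, height = struct.unpack('<HH', header[6:10])

# Reading the same bytes as big-endian gives different numbers.
be_width, be_height = struct.unpack('>HH', header[6:10])

# calcsize() reports how many bytes a format string consumes.
size = struct.calcsize('<HH')

print((width, height), (be_width, be_height), size)
```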