#### Web scraping for Kigo Finder
To build a dataset of kigo (seasonal words used in haiku), this notebook uses requests and BeautifulSoup to scrape the kigo list from a haiku association website.
It then counts the syllables in each word's reading, adds the count to the data, and writes the result to a JSON file for later use.

In [1]:
# import libraries
import requests
from bs4 import BeautifulSoup

In [2]:
# Fetch the kigo list from the haiku association website.
# Spring words: http://www.haiku-data.jp/kigo_list.php?season_cd=1#result
res = requests.get('http://www.haiku-data.jp/kigo_list.php?season_cd=1#result')
sp = BeautifulSoup(res.text, "html.parser")
elems = sp.find_all("tr")

In [3]:
list1 = []

# Start from the third <tr>; the first two rows are skipped
for row in elems[2:]:
    cells = row.find_all("td", class_="font1", background="img/bg-w.gif")
    # First two cells: the word and its phonetic reading (non-breaking spaces removed)
    tmp = [cell.text.replace('\xa0', '') for cell in cells[:2]]

    # Count syllables from the reading.
    # Small ゃ, ゅ, ょ combine with the preceding kana, so they are not counted separately.
    small_kana = sum(tmp[1].count(ch) for ch in "ゃゅょ")
    onsu = len(tmp[1]) - small_kana
    tmp.append(onsu)
    list1.append(tmp)
list1

[['藍蒔く', 'あいまく', 4],
 ['青木の花', 'あおきのはな', 6],
 ['石蓴', 'あおさ', 3],
 ['青饅', 'あおぬた', 4],
 ['青麦', 'あおむぎ', 4],
 ['通草の花', 'あけびのはな', 6],
 ['胡葱', 'あさつき', 4],
 ['朝寝', 'あさね', 3],
 ['麻蒔く', 'あさまく', 4],
 ['浅蜊', 'あさり', 3],
 ['薊', 'あざみ', 3],
 ['明日葉', 'あしたば', 4],
 ['蘆の角', 'あしのつの', 5],
 ['蘆の若葉', 'あしのわかば', 6],
 ['馬酔木の花', 'あしびのはな', 6],
 ['蘆焼く', 'あしやく', 4],
 ['アスパラガス', 'あすぱらがす', 6],
 ['アズマイチゲ', 'あずまいちげ', 6],
 ['東菊', 'あずまぎく', 5],
 ['畦塗', 'あぜぬり', 4],
 ['暖か', 'あたたか', 4],
 ['アネモネ', 'あねもね', 4],
 ['虻', 'あぶ', 2],
 ['海女', 'あま', 2],
 ['甘茶', 'あまちゃ', 3],
 ['鮎汲', 'あゆくみ', 4],
 ['杏の花', 'あんずのはな', 6],
 ['飯蛸', 'いいだこ', 4],
 ['鮊子', 'いかなご', 4],
 ['伊勢参', 'いせまいり', 5],
 ['磯遊び', 'いそあそび', 5],
 ['磯竈', 'いそかまど', 5],
 ['磯巾着', 'いそぎんちゃく', 6],
 ['磯菜摘', 'いそなつみ', 5],
 ['虎杖', 'いたどり', 4],
 ['苺の花', 'いちごのはな', 6],
 ['一の午', 'いちのうま', 5],
 ['銀杏の花', 'いちょうのはな', 6],
 ['一輪草', 'いちりんそう', 6],
 ['凍解', 'いてどけ', 4],
 ['犬ふぐり', 'いぬふぐり', 5],
 ['芋植う', 'いもうう', 4],
 ['岩燕', 'いわつばめ', 5],
 ['魚島', 'うおじま', 4],
 ['萍生ひ初む', 'うきくさおいそむ', 8],
 ['鶯', 'うぐいす', 4],
 ['鶯菜', 'うぐいすな
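
The counting rule used in the loop above can be pulled out into a small helper so it is easy to check against individual readings. This is only a sketch: `count_onsu` and `SMALL_YOON` are names introduced here for illustration, and, like the loop, it treats only small ゃ, ゅ, ょ as non-counting characters.

```python
# Small kana that merge with the preceding character (same set as in the loop above)
SMALL_YOON = "ゃゅょ"


def count_onsu(yomi):
    """Count syllables in a hiragana reading, skipping small ゃ, ゅ, ょ."""
    return sum(1 for ch in yomi if ch not in SMALL_YOON)


# Readings taken from the scraped output above
print(count_onsu("あさね"))          # 3
print(count_onsu("あまちゃ"))        # 3
print(count_onsu("いそぎんちゃく"))  # 6
```

The three results agree with the counts stored for 朝寝, 甘茶, and 磯巾着 in the list above.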

In [3]:
key1=['word', 'yomi', 'syllables']
spring=[dict(zip(key1,item)) for item in list1]

In [4]:
import json

# Writing JSON
with open('spring.json', 'w', encoding="utf-8") as f:
    json.dump(spring, f, indent=2, ensure_ascii=False)

Read the file back to make sure the words were stored properly.

In [5]:
# Reading JSON file
with open('spring.json', 'r', encoding="utf-8") as f:
    json_output = json.load(f)
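
Instead of inspecting the output by eye, the loaded data can be compared with what was written. A minimal sketch, using a two-entry stand-in for `spring` and a temporary file so that it runs on its own; `round_trips` is a hypothetical helper, not part of the notebook.

```python
import json
import os
import tempfile

# Stand-in for the `spring` list built above (two entries copied from the output)
sample = [
    {"word": "朝寝", "yomi": "あさね", "syllables": 3},
    {"word": "甘茶", "yomi": "あまちゃ", "syllables": 3},
]


def round_trips(records, path):
    """Write records as UTF-8 JSON, read them back, and report whether they match."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f) == records


with tempfile.TemporaryDirectory() as tmp:
    ok = round_trips(sample, os.path.join(tmp, "sample.json"))
print(ok)  # True
```

In the notebook itself, the equivalent check would be `json_output == spring` after the cell above has run.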

In [6]:
json_output

[{'word': '藍蒔く', 'yomi': 'あいまく', 'syllables': 4},
 {'word': '青木の花', 'yomi': 'あおきのはな', 'syllables': 6},
 {'word': '石蓴', 'yomi': 'あおさ', 'syllables': 3},
 {'word': '青饅', 'yomi': 'あおぬた', 'syllables': 4},
 {'word': '青麦', 'yomi': 'あおむぎ', 'syllables': 4},
 {'word': '通草の花', 'yomi': 'あけびのはな', 'syllables': 6},
 {'word': '胡葱', 'yomi': 'あさつき', 'syllables': 4},
 {'word': '朝寝', 'yomi': 'あさね', 'syllables': 3},
 {'word': '麻蒔く', 'yomi': 'あさまく', 'syllables': 4},
 {'word': '浅蜊', 'yomi': 'あさり', 'syllables': 3},
 {'word': '薊', 'yomi': 'あざみ', 'syllables': 3},
 {'word': '明日葉', 'yomi': 'あしたば', 'syllables': 4},
 {'word': '蘆の角', 'yomi': 'あしのつの', 'syllables': 5},
 {'word': '蘆の若葉', 'yomi': 'あしのわかば', 'syllables': 6},
 {'word': '馬酔木の花', 'yomi': 'あしびのはな', 'syllables': 6},
 {'word': '蘆焼く', 'yomi': 'あしやく', 'syllables': 4},
 {'word': 'アスパラガス', 'yomi': 'あすぱらがす', 'syllables': 6},
 {'word': 'アズマイチゲ', 'yomi': 'あずまいちげ', 'syllables': 6},
 {'word': '東菊', 'yomi': 'あずまぎく', 'syllables': 5},
 {'word': '畦塗', 'yomi': 'あぜぬり', 'sylla

Repeat the same steps for the other seasons and save each one as its own JSON file.
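
One way to avoid copying the cells once per season is to loop over the season codes. The sketch below rests on assumptions: only `season_cd=1` (spring) appears in this notebook, so the other codes in `SEASON_CODES` are guesses that should be checked against the site, and `scrape` stands for a hypothetical function that wraps the fetching and parsing cells above and returns a list of dicts for a given URL.

```python
import json
from pathlib import Path

BASE_URL = "http://www.haiku-data.jp/kigo_list.php?season_cd={code}"

# Assumption: only code 1 (spring) is confirmed in this notebook;
# the remaining codes are guesses and must be verified on the site.
SEASON_CODES = {"spring": 1, "summer": 2, "autumn": 3, "winter": 4}


def save_all_seasons(scrape, out_dir="."):
    """Call scrape(url) for each season and write <season>.json into out_dir."""
    out_dir = Path(out_dir)
    written = []
    for name, code in SEASON_CODES.items():
        rows = scrape(BASE_URL.format(code=code))
        path = out_dir / f"{name}.json"
        with open(path, "w", encoding="utf-8") as f:
            json.dump(rows, f, indent=2, ensure_ascii=False)
        written.append(path)
    return written
```

Passing the scraping step in as an argument keeps the file-writing loop independent of the network, so it can be tried with a stub such as `lambda url: []` before running it against the real site.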