# Information Retrieval

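## Understanding Information Retrieval

Information Retrieval (IR) literally means the retrieval of information. How does it differ from the more familiar term, Search? A conventional search scans a data collection such as a database exhaustively, in brute-force fashion, and returns every item that matches the search condition. Such a search ignores the context and intended meaning of a word and returns anything that matches literally, so it often fails to serve what the user actually wants. Retrieval, by contrast, uses techniques such as term weighting to find the material most relevant to the user's intent, so it responds more intelligently to that intent.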
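
To make the contrast concrete, here is a minimal sketch that is not part of the original program. The tiny document collection, the query and both function names are made up for illustration, and raw term counts stand in for the weights introduced later.

```python
# Hypothetical mini-collection; the documents are invented for illustration.
docs = {
    "doc1": "apple pie recipe with apple sauce",
    "doc2": "apple releases a new phone",
    "doc3": "banana bread recipe",
}

def exact_search(query, docs):
    """Return every document containing the query term, with no notion of relevance."""
    return [name for name, text in docs.items() if query in text.split()]

def ranked_retrieval(query, docs):
    """Score each document by how often the query term occurs and sort by that score."""
    scores = {name: text.split().count(query) for name, text in docs.items()}
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score > 0]

matches = exact_search("apple", docs)
ranking = ranked_retrieval("apple", docs)
print(matches)
print(ranking)
```

Both calls find the same two documents, but only the second one orders them, which is the role the weights play in the rest of this tutorial.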

![](./pic/IR.png)

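The retrieval program we build here uses a crawler to collect web documents from the internet. The collected documents are raw, that is, unrefined collections of text. A preprocessing step removes what is not needed for search, such as special characters and punctuation marks. The cleaned data is then analyzed, and an algorithm returns search results that respond appropriately to the user's query.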

## Crawler

The crawler performs the web-document collection step described above. Given an input URL, it collects the related web documents, and users can then carry out whatever task they want on top of them. Let us implement such a crawler ourselves. First, here are the packages it needs.

### Requests

![](./pic/requests.png)

The first step, collecting web documents, needs a URL and the Requests package, which loads the document behind that URL into Python. Requests is installed by default if you use Anaconda.

### BeautifulSoup

BeautifulSoup is a package that picks out the desired parts of a web document fetched with Requests, based on tags, classes and ids. Many web documents distinguish their HTML tags by assigning them class and id attributes. Look at the picture below.

![](./pic/tag.png)

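The various tags carry class values. Some tags share the same class, and others use a class of their own. If we selected by tag alone, we would get a huge amount of data, and the work would become complicated and slow. With BeautifulSoup we can select by class and id as well as by tag, which makes it easy to obtain just the data we are after.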

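### urllib.parse

The urllib.parse module of the Python standard library parses a URL, finds the components it contains and returns them separately. A URL packs the command to run and its parameter values into a single string. We could take it apart with the ordinary split function, but the urlparse and parse_qs functions do this more quickly and conveniently.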
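
As a small illustration of URL parsing, the sketch below splits a relative article URL into its path and query parameters. The URL is an abbreviated form of the ranking links this crawler collects, and the variable names are chosen here for illustration.

```python
from urllib.parse import urlparse, parse_qs

# Abbreviated article URL of the kind found on the ranking page.
news_url = "/main/ranking/read.nhn?mid=etc&sid1=111&oid=001&aid=0009483730&date=20170818"

parts = urlparse(news_url)      # splits the URL into path, query and other components
params = parse_qs(parts.query)  # maps each parameter name to a list of values

article_id = params["aid"][0]
print(parts.path)
print(article_id)
```

parse_qs returns a list for every parameter because a name may occur more than once in a query string, which is why the first element is taken.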

## Building the Crawler

First, import the packages introduced above. BeautifulSoup lives in a package named bs4.

In [1]:
from bs4 import BeautifulSoup
#import urllib.request             # HTTP Request and Response
#from urllib.parse import quote    # UTF-8 to ASCII for URL
from urllib.parse import urlparse, parse_qs # URL Parsing
import requests
import re

Here we use the "Most Viewed News" ("가장 많이 본 뉴스") section of Naver News.

![](./pic/top.png)

It is the part shown above. Since news articles are updated every day, the content of the collected documents changes daily.

To use this page, we first store Naver's URL.

In [2]:
#-*- coding: utf-8 -*-
NEWS_PATH = "./news/"
NAVER_NEWSNOW_URL = "http://news.naver.com"

First we write a function that fetches the list of web documents. Since our crawler collects news, we name it get_news_list.

In [3]:
def get_news_list():
    newslist_html = requests.get(url=NAVER_NEWSNOW_URL).content
    newslist_lxml = BeautifulSoup(newslist_html, 'lxml', from_encoding='utf-8')
    newslist = [i.get('href') for i in newslist_lxml.select('ul.type14 li a.sm2')]
    print(newslist)
    return newslist

The get function of Requests fetches the web document that the URL points to. As the trailing .content suggests, however, calling get alone does not hand us the content, because get returns a Response object defined in the requests package. To obtain the document itself we read the content attribute of that object, which holds the response body as bytes. Running the following code prints the result.

In [4]:
print(requests.get(url=NAVER_NEWSNOW_URL).content)

b'<!DOCTYPE HTML>\n<html lang="ko">\n<head>\n<meta charset="euc-kr">\n<meta http-equiv="X-UA-Compatible" content="IE=edge">\n<meta http-equiv="refresh" content="600" />\n<meta name="viewport" content="width=1023" />\n\r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n    \r\n\r\n<meta property="og:title"       content="\xb3\xd7\xc0\xcc\xb9\xf6 \xb4\xba\xbd\xba">\r\n<meta property="og:type"        content="website">\r\n<meta property="og:url"         content="http://news.naver.com/main/home.nhn">\r\n<meta property="og:image"       content="http://static.news.naver.net/image/news/ogtag/navernews_200x200_20160804.png"/>\r\n<meta property="og:description" content="\xc1\xa4\xc4\xa1, \xb0\xe6\xc1\xa6, \xbb\xe7\xc8\xb8, \xbb\xfd\xc8\xb0/\xb9\xae\xc8\xad, \xbc\xbc\xb0\xe8, IT/\xb0\xfa\xc7\xd0 \xb5\xee \xbe\xf0\xb7\xd0\xbb\xe7\xba\xb0, \xba\xd0\xbe\xdf\xba\xb0 \xb4\xba\xbd\xba \xb1\xe2\xbb\xe7 \xc1\xa6\xb0\xf8">\r\

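The result is hard to read. The content attribute holds the raw bytes of the response, and as the charset meta tag shows, this page is encoded in EUC-KR, so the Korean text appears as escaped byte values. Next we parse the fetched document with BeautifulSoup, using the lxml parser, and extract only the parts we want. The extraction proceeds as follows.

* In the parsed document, find the ul tag whose class is 'type14' and look at the li tags inside it.
* Among the a tags inside, find those whose class is 'sm2'.
* Finally, read the 'href' attribute of each of those tags and store it. This gives the URLs of the "Most Viewed News" articles in the document.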
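
The steps above can be tried without any network access. The sketch below runs the same CSS selector on a hypothetical HTML fragment that only imitates the structure described here; the fragment and its links are invented, and Python's built-in html.parser is used instead of lxml so that no extra parser is required.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment imitating the ranking list; not real Naver markup.
html = """
<ul class="type14">
  <li><a class="sm2" href="/read?aid=001">First article</a></li>
  <li><a class="sm2" href="/read?aid=002">Second article</a></li>
  <li><a class="other" href="/ads">Advertisement</a></li>
</ul>
<ul class="type99">
  <li><a class="sm2" href="/read?aid=999">Different list</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# ul.type14 -> li -> a.sm2, then read the href attribute of each match.
links = [a.get("href") for a in soup.select("ul.type14 li a.sm2")]
print(links)
```

Only the two links that sit inside the type14 list and carry the sm2 class are returned; the advertisement link and the link in the other list are skipped.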

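Classes, ids and tags **differ from site to site**, so the selectors used here may not apply to the site you want to scrape. You have to inspect the structure of the target site yourself and write matching selectors; every major browser provides developer tools for this. We have now fetched the page and stored only the URL of each article. Next, we follow each URL and save only the parts of the article page we need, the title and the body. For this we write a new function, get_news_content.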

In [5]:
def get_news_content(news_url): 
    news_id = ''
    newscontent_text = ''
    url_params = urlparse(news_url)
    url_params = parse_qs(url_params.query)
    news_id = url_params['aid']

    newscontent_html = requests.get(NAVER_NEWSNOW_URL + news_url).content
    newscontent_lxml = BeautifulSoup(newscontent_html, 'lxml', from_encoding='utf-8')
    
    newscontent_head = newscontent_lxml.select('#articleTitle')
    newscontent_list = newscontent_lxml.select('#articleBodyContents')
    
    if not newscontent_head:
        newscontent_head = newscontent_lxml.select('.end_tit')
    if not newscontent_list:
        newscontent_list = newscontent_lxml.select('#articeBody')
    
    newscontent_text += str(newscontent_head[0].find_all(text=True))
    newscontent_text += str(newscontent_list[0].find_all(text=True))
    
    save_file(NEWS_PATH + str(news_id[0]) + '.txt', newscontent_text)


The urlparse and parse_qs functions split the URL into its parameters. Article URLs in Naver's "Most Viewed News" section carry a parameter named 'aid', so we take its value as the id of the article and later use it as the name of the file that stores the title and body. As before, the article page is fetched with requests and parsed with BeautifulSoup. The element with id 'articleTitle' is the title and the element with id 'articleBodyContents' is the body; if either selector finds nothing, the if statements fall back to the alternatives '.end_tit' and '#articeBody'. The extracted text is appended to the variable newscontent_text. Finally, save_file writes newscontent_text to a file named after news_id.

In [6]:
def save_file(filename, text):
    news_file = open(filename, 'w', encoding='utf-8')
    news_file.write(text)
    news_file.close()

In [7]:
def open_file(file):
    news_file = open(file, 'r', encoding='utf-8')
    content = news_file.read()
    news_file.close()
    return content

Now we actually crawl. We read the "Most Viewed News" list with the function written earlier and store the article URLs in news_list, then pass the URLs one by one to get_news_content, which saves each title and body in a file named after its news_id.

In [8]:
news_list = get_news_list()
for news_url in news_list:
    get_news_content(news_url)

['/main/ranking/read.nhn?mid=etc&sid1=111&rankingType=popular_day&oid=001&aid=0009483730&date=20170818&type=1&rankingSeq=1&rankingSectionId=100', '/main/ranking/read.nhn?mid=etc&sid1=111&rankingType=popular_day&oid=421&aid=0002897237&date=20170818&type=1&rankingSeq=1&rankingSectionId=101', '/main/ranking/read.nhn?mid=etc&sid1=111&rankingType=popular_day&oid=032&aid=0002811684&date=20170818&type=1&rankingSeq=1&rankingSectionId=102', '/main/ranking/read.nhn?mid=etc&sid1=111&rankingType=popular_day&oid=015&aid=0003811615&date=20170818&type=0&rankingSeq=1&rankingSectionId=104', '/main/ranking/read.nhn?mid=etc&sid1=111&rankingType=popular_day&oid=241&aid=0002700418&date=20170818&type=1&rankingSeq=1&rankingSectionId=106']


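If the ./news/ directory does not exist yet, saving the first article fails with the following error, so the directory has to be created before this cell is run:

FileNotFoundError: [Errno 2] No such file or directory: './news/0009483730.txt'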
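
The FileNotFoundError above is raised because open does not create missing directories. One possible way around it is sketched below; save_file_safe is a hypothetical variant of save_file, not part of the original notebook, and the demonstration writes into a temporary directory.

```python
import os
import tempfile

def save_file_safe(filename, text):
    """Hypothetical variant of save_file that creates the target directory first."""
    directory = os.path.dirname(filename)
    if directory:
        os.makedirs(directory, exist_ok=True)  # no error if it already exists
    with open(filename, "w", encoding="utf-8") as news_file:
        news_file.write(text)

# Demonstrate on a temporary directory so nothing is written into the project.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "news", "0009483730.txt")
    save_file_safe(target, "title and body")
    saved_ok = os.path.exists(target)
    with open(target, encoding="utf-8") as handle:
        saved_text = handle.read()

print(saved_ok)
```

Creating the ./news/ directory by hand once has the same effect and leaves the original save_file unchanged.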

This completes the crawler. The next task is to analyze the content so that the algorithm can respond appropriately to a user's query. We start by implementing an indexer, which records how often each word occurs in each document.

## KoNLPy

![](./pic/konlpy.png)

KoNLPy is a Python package for Korean language processing, created at Seoul National University, that helps with basic NLP tasks on Korean text. It features intuitive function names and detailed documentation.

In [22]:
from konlpy.tag import Kkma
from konlpy.utils import pprint
from collections import Counter
import os
import re
import math
import pickle
import operator

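## Data Preprocessing

The text we collected comes from HTML pages. Line breaks and tabs in the HTML source appear as characters such as \n and \t, which are unnecessary when processing text data. Removing leftover HTML terms and script code from the news corpus as far as possible leads to better indexing results.

To process Korean only, we remove every upper- and lower-case letter of the alphabet and a range of punctuation symbols, so that English text and symbols are excluded.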

In [23]:
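def preprocessing(content):
    # Remove every ASCII letter so that only Korean text (and digits) remains.
    content = re.sub(r'[a-zA-Z]', '', content)
    # Remove punctuation and other symbols (raw string avoids invalid-escape warnings).
    content = re.sub(r'[\{\}\[\]\/?.,;:|\)*~`!^\-_+<>@\#$%&\\\=\(\'\")]', '', content)
    # Collapse runs of whitespace, including \n and \t, into single spaces.
    content = ' '.join(content.split())

    return content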
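
To see what these three steps do, the sketch below applies them to a made-up string. It is a simplified stand-in, not the function above: the punctuation pattern is replaced by the shorter expression [^\w\s], which is an assumption made here for readability and removes any character that is neither a word character nor whitespace.

```python
import re

def preprocess_sketch(content):
    """Simplified stand-in for preprocessing(); the punctuation pattern is an assumption."""
    content = re.sub(r"[a-zA-Z]", "", content)   # drop ASCII letters
    content = re.sub(r"[^\w\s]", "", content)    # drop punctuation and symbols
    return " ".join(content.split())             # collapse whitespace, \n and \t

sample = "<b>Hello</b> 신혼일기2, 출연 확정!\n\t(단독)"
cleaned = preprocess_sketch(sample)
print(cleaned)
```

The HTML tag, the English word and the punctuation disappear, while the Korean words and the digit remain, separated by single spaces.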

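There are four ways to index: splitting into eojeol (space-delimited phrases), extracting nouns only, running a morphological analyzer, and cutting the text into n-syllable units.

phrase splits the text into eojeol. It is the simplest method to implement, but it measures word frequency poorly. We may want the frequency of 아빠 (dad), but the document contains forms such as 아빠가, 아빠는 and 아빠에게, each a noun plus a particle that is taken as a single unit, so they are treated as different words and the frequency of the word itself is hard to obtain.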
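
The limitation of eojeol-based counting can be seen in a few lines. The sentence below is made up for illustration, and Counter from the standard library is used for brevity.

```python
from collections import Counter

# Made-up sentence in which 아빠 appears three times, each time with a different particle.
sentence = "아빠가 왔다 아빠는 웃었다 아빠에게 말했다"

phrase_counts = Counter(sentence.split())
print(phrase_counts["아빠"])    # the bare noun is never counted
print(phrase_counts["아빠가"])  # each noun + particle form is counted separately
```

Although 아빠 occurs three times in the sentence, its count is zero, because every occurrence is stored under a different eojeol.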

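ngram cuts the text into syllable units, usually of 2 or 3 syllables. Applied directly to the body text, it also cuts across spaces and symbols such as ", so word frequencies can be obtained but a lot of unnecessary data is produced as well.

noun extracts nouns only. It uses KoNLPy's Korean analysis tools to analyze the words while splitting the text into eojeol. Anything with even a slight chance of being a noun is selected, so neologisms and previously unseen words are also picked up as nouns.

morpheme uses a morphological analyzer provided by KoNLPy to tag each morpheme with its part of speech. Nouns receive tags such as NNG and verbs receive tags such as VV. The KoNLPy documentation describes the tags in more detail. The picture below shows a simple example.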
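
A minimal sketch of syllable bigrams on a short made-up phrase shows how terms that include a space are produced. The helper name char_ngrams is hypothetical.

```python
def char_ngrams(text, n=2):
    """Return every run of n consecutive characters, spaces included."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

bigrams = char_ngrams("아빠가 왔다")
print(bigrams)
```

Two of the five bigrams contain the space between the words, which is the kind of unnecessary data mentioned above.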

![](./pic/table.png)

In [24]:
def indexing(tool, content):
    """Count how often each term occurs in content, using the chosen tool."""
    featurelist = dict()
    if tool == 'phrase':
        # One term per whitespace-delimited eojeol.
        for term in content.split():
            featurelist[term] = featurelist.get(term, 0) + 1
    elif tool == 'ngram':
        n = 2
        # Slide a window of n characters; stop early so every term has n characters.
        for i in range(len(content) - n + 1):
            term = content[i:i+n]
            featurelist[term] = featurelist.get(term, 0) + 1
    elif tool == 'noun':
        kkma = Kkma()
        for phrase in content.split():
            for term in kkma.nouns(phrase):
                featurelist[term] = featurelist.get(term, 0) + 1
    elif tool == 'morpheme':
        kkma = Kkma()
        # kkma.pos returns (morpheme, tag) pairs; store them as "morpheme/tag".
        for morpheme, tag in kkma.pos(content):
            term = morpheme + '/' + tag
            featurelist[term] = featurelist.get(term, 0) + 1
    else:
        return False

    return featurelist

Returning the frequency of the most frequent term is needed repeatedly, so we wrap it in a function.

In [25]:
def maxfreq(termlist):
    return max(termlist.values())

## TF-IDF(Term Frequency - Inverse Document Frequency)

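TF-IDF is a weight commonly used in information retrieval and text mining; it is a statistical measure of how important a word is in a particular document. The formula is as follows.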

$\text{TF-IDF} = tf \times idf$

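TF (Term Frequency) is the number of times a word appears in a particular document and expresses how important the word is within that document. The word may, however, be a common one that appears in many documents, so it is only when it appears rarely in other documents that we can call it important. TF is computed as $TF = \frac{\text{Word Frequency}}{\text{Max Frequency}}$. Word Frequency is the number of times the word occurs in the document, and Max Frequency is the frequency of the most frequent word in that document, as returned by maxfreq. Because the denominator is the largest frequency in the document, TF lies between 0 and 1.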
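
A short worked example of this normalization follows. It mirrors the calculation in the weighting code, which divides each frequency by the value returned by maxfreq, that is, the highest frequency in the document. The counts are hypothetical.

```python
# Hypothetical term counts for a single document.
term_counts = {"신혼": 6, "일기": 5, "단독": 1}

max_freq = max(term_counts.values())  # what maxfreq() returns for this document
tf = {term: count / max_freq for term, count in term_counts.items()}

for term, value in tf.items():
    print(term, round(value, 3))
```

The most frequent term always receives a TF of 1, and every other term receives a value between 0 and 1.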

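IDF (Inverse Document Frequency) is the total number of documents divided by the number of documents that contain the word, on a log scale. It is computed as $IDF = \log{\frac{n(D)}{n(D : term \in D)}}$. The numerator is the total number of documents and the denominator is the number of documents that contain term. The logarithm is taken because, unlike TF, the raw ratio has no fixed upper bound: it can become very large and drown out the TF value. Taking the log narrows the gap between the two so that IDF does not dominate. The code below uses the base-10 logarithm.

TF-IDF is the usual choice for information retrieval, but the appropriate weight depends on the situation; when the number of documents is small, as it is here, TF alone or other schemes can also be used. The boolean option used here simply checks whether a word occurs in a document and records a weight of 1 for every document that contains it.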
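
The effect of the logarithm can be checked with a small calculation. The document counts below are hypothetical; the base-10 logarithm matches the math.log10 call in the weighting function.

```python
import math

total_docs = 5                               # hypothetical collection size
doc_freq = {"신혼": 1, "1": 4, "본문": 5}     # hypothetical number of documents containing each term

idf = {term: math.log10(total_docs / df) for term, df in doc_freq.items()}

for term, value in idf.items():
    print(term, round(value, 4))
```

A term that occurs in every document gets an IDF of 0, so its TF-IDF weight is 0 no matter how often it appears, while a term found in a single document gets the largest value.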

In [26]:
def weighting(tool, total, maxlist, featurelist):
    # tf/arg max tf * log N/df
    weightlist = dict()
    docvlength = dict()
    
    if tool == 'boolean':
        for term, freqinfo in featurelist.items():
            for docname, freq in freqinfo.items():
                if term in weightlist:
                    weightlist[term].update({docname:1})
                else:
                    weightlist[term] = {docname:1}
    elif tool == 'tf':
        for term, freqinfo in featurelist.items():
            for docname, freq in freqinfo.items():
                tf = freq/maxlist[docname]
                
                if term in weightlist:
                    weightlist[term].update({docname:tf})
                else:
                    weightlist[term] = {docname:tf}
                    
                if docname in docvlength:
                    docvlength[docname] += (tf ** 2) 
                else:
                    docvlength[docname] = (tf ** 2)
    elif tool == 'idf':
        for term, freqinfo in featurelist.items():
            for docname, freq in freqinfo.items():
                idf = math.log10(total/len(freqinfo))
                
                if term in weightlist:
                    weightlist[term].update({docname:idf})
                else:
                    weightlist[term] = {docname:idf}
                    
                if docname in docvlength:
                    docvlength[docname] += (idf ** 2)
                else:
                    docvlength[docname] = (idf ** 2)
    elif tool == 'tfidf':
        for term, freqinfo in featurelist.items():
            for docname, freq in freqinfo.items():
                tf = freq/maxlist[docname]
                idf = math.log10(total/len(freqinfo))
                tfidf = tf*idf
                
                if term in weightlist:
                    weightlist[term].update({docname:tfidf})
                else:
                    weightlist[term] = {docname:tfidf}
                    
                if docname in docvlength:
                    docvlength[docname] += (tfidf ** 2)
                else:
                    docvlength[docname] = (tfidf ** 2)
                    
    with open('./weight/termweight.pkl', 'wb') as f:
        pickle.dump(weightlist, f, pickle.HIGHEST_PROTOCOL)
    
    with open('./weight/docvlength.pkl', 'wb') as f:
        pickle.dump(docvlength, f, pickle.HIGHEST_PROTOCOL)
            
    return weightlist
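
The function also stores docvlength, the sum of squared weights per document, but the text does not say how it is used. One plausible use, shown here purely as an assumption, is to normalize query scores by the length of each document vector. All data and the function name rank are hypothetical.

```python
import math

# Hypothetical data in the layout used by weighting(): term -> {document: weight}.
weightlist = {
    "신혼": {"a.txt": 0.6},
    "일기": {"a.txt": 0.8, "b.txt": 0.3},
    "부부": {"b.txt": 0.4},
}
# Sum of squared weights per document, as accumulated in docvlength.
docvlength = {"a.txt": 0.6 ** 2 + 0.8 ** 2, "b.txt": 0.3 ** 2 + 0.4 ** 2}

def rank(query_terms, weightlist, docvlength):
    """Add up the weights of the query terms per document and divide by the vector length."""
    scores = {}
    for term in query_terms:
        for docname, weight in weightlist.get(term, {}).items():
            scores[docname] = scores.get(docname, 0.0) + weight
    normalized = {doc: total / math.sqrt(docvlength[doc]) for doc, total in scores.items()}
    return sorted(normalized.items(), key=lambda item: item[1], reverse=True)

ranking = rank(["신혼", "일기"], weightlist, docvlength)
print(ranking)
```

Dividing by the vector length keeps long documents with many weighted terms from outranking short ones merely because of their size.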

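os.walk makes it possible to visit every file under a directory, including its subdirectories. For each directory it yields the current path, the list of subdirectory names and the list of file names in that directory. os.path.splitext splits a file name into its base name and extension, and since we saved the web documents as txt files, comparing the extension with .txt collects all of them.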
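
The two calls can be tried on a throwaway directory tree. The file names below are made up, and everything is created inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: two files at the top level and one in a subdirectory.
    os.makedirs(os.path.join(root, "sub"))
    for relative in ("a.txt", "b.log", os.path.join("sub", "c.txt")):
        with open(os.path.join(root, relative), "w", encoding="utf-8") as handle:
            handle.write("sample")

    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns (base, extension); keep only .txt files.
            if os.path.splitext(name)[-1] == ".txt":
                found.append(os.path.relpath(os.path.join(dirpath, name), root))

found.sort()
print(found)
```

The .log file is skipped, and the .txt file in the subdirectory is found because os.walk descends into subdirectories on its own.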

In [27]:
def get_filelist(path):
    filelist = []
    # Walk the directory passed in, rather than the NEWS_PATH global.
    for (dirpath, dirnames, files) in os.walk(path):
        for file in files:
            ext = os.path.splitext(file)[-1]

            if ext == '.txt':
                filelist.append(os.path.join(dirpath, file))
    return filelist

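Using all the functions above, we build maxlist, which holds the highest term frequency in each document, and featurelist, which records for each term the documents it occurs in and its frequency there. The weights computed by the weighting function go into weightlist, and toplist is updated with those weights, grouped by document.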

In [None]:
maxlist = dict()      # document name -> highest term frequency in that document
featurelist = dict()  # term -> {document name: frequency}
weightlist = dict()   # term -> {document name: weight}
toplist = dict()      # document name -> {term: weight}

filelist = get_filelist(NEWS_PATH)
for file in filelist:
    filename = os.path.basename(file)
    content = preprocessing(open_file(file))
    termlist = indexing('noun', content)

    if not termlist:
        continue
    maxlist[filename] = maxfreq(termlist)
    for term in termlist:
        if term in featurelist:
            featurelist[term].update({filename:termlist[term]})
        else:
            featurelist[term] = {filename:termlist[term]}
weightlist = weighting('tfidf',len(filelist), maxlist, featurelist)
for term, weightinfo in weightlist.items():
    for docname, weight in weightinfo.items():
        if (docname in toplist):
            toplist[docname].update({term:weight})
        else:
            toplist[docname] = {term:weight}

    # This loop is nested inside the loop over terms, so the sorted lists are
    # printed again after every term; dedent it to print each document once.
    for docname, termweight in toplist.items():
        #print(docname)
        sortedweight = sorted(termweight.items(), key=operator.itemgetter(1), reverse=True)
        pprint(sortedweight)
        #print()
#pprint(featurelist)

[('단독', 0.05376692341046299)]
[('단독', 0.05376692341046299), ('단독신혼일기2', 0.05376692341046299)]
[('신혼', 0.6452030809255559),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('2', 0.3979400086720376),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299)]
[('2', 0.11369714533486788)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('2', 0.3979400086720376),
 ('오상', 0.32260154046277795),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299)]
[('2', 0.11369714533486788)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('2', 0.3979400086720376),
 ('오상', 0.32260154046277795),
 ('오상진', 0.10753384682092598),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299)]
[('2', 0.11369714533486788)]
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('18', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.376368

[('년', 0.2218487496163564),
 ('공개', 0.1989700043360188),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.06632333477867293),
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('18', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.39794000

 ('진심', 0.05376692341046299),
 ('송구', 0.05376692341046299),
 ('점', 0.05376692341046299),
 ('양해', 0.05376692341046299),
 ('한편', 0.05376692341046299),
 ('퇴사', 0.05376692341046299),
 ('프리', 0.05376692341046299),
 ('선언', 0.05376692341046299),
 ('웨이', 0.05376692341046299),
 ('3', 0.05119586529608225),
 ('전', 0.03413057686405483),
 ('년', 0.03413057686405483),
 ('사람', 0.03413057686405483),
 ('관계자', 0.030610769897849048),
 ('12', 0.030610769897849048),
 ('12일', 0.030610769897849048),
 ('키', 0.030610769897849048),
 ('예정', 0.030610769897849048),
 ('등', 0.030610769897849048),
 ('월', 0.030610769897849048),
 ('발표', 0.030610769897849048),
 ('영', 0.030610769897849048),
 ('당시', 0.030610769897849048),
 ('강원', 0.030610769897849048),
 ('강원도', 0.030610769897849048),
 ('도', 0.030610769897849048),
 ('과정', 0.030610769897849048),
 ('현재', 0.030610769897849048),
 ('분', 0.030610769897849048),
 ('페이스', 0.030610769897849048),
 ('페이스북', 0.030610769897849048),
 ('트위터', 0.030610769897849048),
 ('1', 0.029818465540940

 ('1', 0.029818465540940437),
 ('18일', 0.017065288432027415),
 ('한', 0.017065288432027415),
 ('관련', 0.017065288432027415),
 ('중인', 0.017065288432027415),
 ('이', 0.017065288432027415),
 ('북', 0.017065288432027415),
 ('전재', 0.017065288432027415),
 ('주', 0.014909232770470219),
 ('18', 0.007454616385235109),
 ('중', 0.007454616385235109),
 ('무단', 0.007454616385235109),
 ('금지', 0.007454616385235109),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('2', 0.11369714533486788),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0.0369181001935453),
 ('18', 0.027688575145158975),
 ('관련', 0.021128452344414895),
 ('중인', 0.021128452344414895),
 ('건', 0.01894952422247798),
 ('5', 0.01894952422247798),
 ('관계자', 0.01894952422247798),
 ('예정', 0.01894952422247798),
 ('발표', 0.01894952422247798),
 ('전', 0.010564226172207447),
 ('이', 0.010564226172207447),
 ('전재', 0.010564226172207447),
 ('중', 0.009229525048386325),
 ('주', 0.0046147625241931625),
 ('무단'

 ('배포', 0.0)]
[('2', 0.11369714533486788),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0.0369181001935453),
 ('18', 0.027688575145158975),
 ('관련', 0.021128452344414895),
 ('중인', 0.021128452344414895),
 ('건', 0.01894952422247798),
 ('5', 0.01894952422247798),
 ('관계자', 0.01894952422247798),
 ('예정', 0.01894952422247798),
 ('발표', 0.01894952422247798),
 ('전', 0.010564226172207447),
 ('이', 0.010564226172207447),
 ('전재', 0.010564226172207447),
 ('중', 0.009229525048386325),
 ('내용', 0.009229525048386325),
 ('플레이어', 0.009229525048386325),
 ('주', 0.0046147625241931625),
 ('무단', 0.0046147625241931625),
 ('금지', 0.0046147625241931625),
 ('오류', 0.0046147625241931625),
 ('우회', 0.0046147625241931625),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('딸', 0.6989700043360189),
 ('137', 0.3994114310491536),
 ('부부', 0.22739429066973577),
 ('만', 0.1705457180023018),
 ('년', 0.1584633925831117),
 ('월', 0.11369714533486788),
 ('137년만', 0.099852857762

 ('위', 0.01615166883467607),
 ('함수', 0.01615166883467607),
 ('추가', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 

 ('구혜', 0.05376692341046299),
 ('혜', 0.05376692341046299),
 ('배경', 0.05376692341046299),
 ('동화', 0.05376692341046299),
 ('화제', 0.05376692341046299),
 ('바', 0.05376692341046299),
 ('측은', 0.05376692341046299),
 ('최근', 0.05376692341046299),
 ('준비', 0.05376692341046299),
 ('조윤희씨', 0.05376692341046299),
 ('임신', 0.05376692341046299),
 ('만큼', 0.05376692341046299),
 ('중요', 0.05376692341046299),
 ('시점', 0.05376692341046299),
 ('고심', 0.05376692341046299),
 ('본의', 0.05376692341046299),
 ('피해', 0.05376692341046299),
 ('회복', 0.05376692341046299),
 ('전념', 0.05376692341046299),
 ('한번', 0.05376692341046299),
 ('진심', 0.05376692341046299),
 ('송구', 0.05376692341046299),
 ('점', 0.05376692341046299),
 ('양해', 0.05376692341046299),
 ('한편', 0.05376692341046299),
 ('퇴사', 0.05376692341046299),
 ('프리', 0.05376692341046299),
 ('선언', 0.05376692341046299),
 ('웨이', 0.05376692341046299),
 ('3', 0.05119586529608225),
 ('전', 0.03413057686405483),
 ('년', 0.03413057686405483),
 ('사람', 0.03413057686405483),
 ('관계자', 0.030

 ('제작진', 0.16130077023138897),
 ('결혼', 0.16130077023138897),
 ('결정', 0.16130077023138897),
 ('4', 0.12244307959139619),
 ('오상진', 0.10753384682092598),
 ('진', 0.10753384682092598),
 ('투입', 0.10753384682092598),
 ('고딕', 0.10753384682092598),
 ('4폰트', 0.10753384682092598),
 ('현', 0.10753384682092598),
 ('러브', 0.10753384682092598),
 ('방송', 0.10753384682092598),
 ('모델', 0.10753384682092598),
 ('장윤주', 0.10753384682092598),
 ('그', 0.10753384682092598),
 ('남편', 0.10753384682092598),
 ('정승', 0.10753384682092598),
 ('정승민', 0.10753384682092598),
 ('민', 0.10753384682092598),
 ('커플', 0.10753384682092598),
 ('생활', 0.10753384682092598),
 ('끝', 0.10753384682092598),
 ('에서', 0.10753384682092598),
 ('신혼일기', 0.10753384682092598),
 ('시즌', 0.10753384682092598),
 ('프로그램', 0.10753384682092598),
 ('건강', 0.10753384682092598),
 ('건', 0.09183230969354715),
 ('5', 0.09183230969354715),
 ('공개', 0.09183230969354715),
 ('장', 0.061221539795698096),
 ('두', 0.061221539795698096),
 ('마음', 0.061221539795698096),
 ('단독', 

 ('전재', 0.010564226172207447),
 ('중', 0.009229525048386325),
 ('내용', 0.009229525048386325),
 ('플레이어', 0.009229525048386325),
 ('주', 0.0046147625241931625),
 ('무단', 0.0046147625241931625),
 ('금지', 0.0046147625241931625),
 ('오류', 0.0046147625241931625),
 ('우회', 0.0046147625241931625),
 ('위', 0.0046147625241931625),
 ('함수', 0.0046147625241931625),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('딸', 0.6989700043360189),
 ('사진', 0.5991171465737304),
 ('틀', 0.499264288811442),
 ('137', 0.3994114310491536),
 ('137년', 0.2995585732868652),
 ('윌', 0.2995585732868652),
 ('부부', 0.22739429066973577),
 ('미국', 0.1997057155245768),
 ('가문', 0.1997057155245768),
 ('켈렌', 0.1997057155245768),
 ('만', 0.1705457180023018),
 ('년', 0.1584633925831117),
 ('월', 0.11369714533486788),
 ('137년만', 0.0998528577622884),
 ('아들', 0.0998528577622884),
 ('부잣집', 0.0998528577622884),
 ('7', 0.0998528577622884),
 ('7월', 0.0998528577622884),
 ('매체', 0.0998528577622884),
 ('피플', 0.0998528577622884),
 ('사우스캐롤라이', 0.09

 ('마음', 0.05684857266743394),
 ('페이스', 0.05684857266743394),
 ('페이스북', 0.05684857266743394),
 ('트위터', 0.05684857266743394),
 ('등에', 0.05684857266743394),
 ('36', 0.05684857266743394),
 ('6', 0.05684857266743394),
 ('한', 0.03169267851662234),
 ('사람', 0.03169267851662234),
 ('북', 0.03169267851662234),
 ('전재', 0.03169267851662234),
 ('내용', 0.027688575145158975),
 ('플레이어', 0.027688575145158975),
 ('주', 0.013844287572579488),
 ('무단', 0.013844287572579488),
 ('금지', 0.013844287572579488),
 ('오류', 0.013844287572579488),
 ('우회', 0.013844287572579488),
 ('위', 0.013844287572579488),
 ('함수', 0.013844287572579488),
 ('추가', 0.013844287572579488),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('년', 0.2218487496163564),
 ('공개', 0.1989700043360188),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.0663

 ('배포', 0.0)]
[('년', 0.2218487496163564),
 ('공개', 0.1989700043360188),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.06632333477867293),
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('만', 0.06632333477867293),
 ('36', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('내용', 0.03230333766935214),
 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607),
 ('함수', 0.01615166883467607),
 ('추가', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.19897000

 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('6', 0.1989700043360188),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 ('하차', 0.268834617052315),
 ('사이즈', 0.268834617052315),
 ('출연', 0.268834

 ('마음', 0.061221539795698096),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299),
 ('설정', 0.05376692341046299),
 ('안내', 0.05376692341046299),
 ('안내나눔고딕', 0.05376692341046299),
 ('나눔', 0.05376692341046299),
 ('2돋움', 0.05376692341046299),
 ('돋움', 0.05376692341046299),
 ('3바탕', 0.05376692341046299),
 ('바탕', 0.05376692341046299),
 ('1폰트', 0.05376692341046299),
 ('2폰트', 0.05376692341046299),
 ('3폰트', 0.05376692341046299),
 ('복수', 0.05376692341046299),
 ('확정', 0.05376692341046299),
 ('녹화', 0.05376692341046299),
 ('러브콜', 0.05376692341046299),
 ('콜', 0.05376692341046299),
 ('응답', 0.05376692341046299),
 ('합류', 0.05376692341046299),
 ('합류키', 0.05376692341046299),
 ('일신상', 0.05376692341046299),
 ('이유', 0.05376692341046299),
 ('의견', 0.05376692341046299),
 ('전달', 0.05376692341046299),
 ('번', 0.05376692341046299),
 ('적', 0.05376692341046299),
 ('일상', 0.05376692341046299),
 ('신부', 0.05376692341046299),
 ('애정', 0.05376692341046299),
 ('각종', 0.05376692341046299),
 ('예능', 0.053766923410462

 ('하트', 0.05376692341046299),
 ('애처가', 0.05376692341046299),
 ('면모', 0.05376692341046299),
 ('요리', 0.05376692341046299),
 ('실력', 0.05376692341046299),
 ('수준급', 0.05376692341046299),
 ('4개월차', 0.05376692341046299),
 ('개월', 0.05376692341046299),
 ('차', 0.05376692341046299),
 ('신혼부부', 0.05376692341046299),
 ('아나운서', 0.05376692341046299),
 ('선후배', 0.05376692341046299),
 ('사이', 0.05376692341046299),
 ('연예계', 0.05376692341046299),
 ('모범', 0.05376692341046299),
 ('모범부부', 0.05376692341046299),
 ('신혼생활', 0.05376692341046299),
 ('기대', 0.05376692341046299),
 ('5월', 0.05376692341046299),
 ('30', 0.05376692341046299),
 ('30일', 0.05376692341046299),
 ('2년간', 0.05376692341046299),
 ('간', 0.05376692341046299),
 ('열애', 0.05376692341046299),
 ('백', 0.05376692341046299),
 ('백년가약', 0.05376692341046299),
 ('가약', 0.05376692341046299),
 ('맺었', 0.05376692341046299),
 ('사내', 0.05376692341046299),
 ('비밀', 0.05376692341046299),
 ('연애', 0.05376692341046299),
 ('사단', 0.05376692341046299),
 ('시리즈', 0.05376692341046

 ('6월', 0.0998528577622884),
 ('25', 0.0998528577622884),
 ('25일', 0.0998528577622884),
 ('루이즈', 0.0998528577622884),
 ('을', 0.0998528577622884),
 ('품', 0.0998528577622884),
 ('일곱', 0.0998528577622884),
 ('일곱살', 0.0998528577622884),
 ('살', 0.0998528577622884),
 ('난', 0.0998528577622884),
 ('큰아들', 0.0998528577622884),
 ('롤', 0.0998528577622884),
 ('롤랜드', 0.0998528577622884),
 ('랜드', 0.0998528577622884),
 ('를', 0.0998528577622884),
 ('포함', 0.0998528577622884),
 ('일가친척', 0.0998528577622884),
 ('흥분', 0.0998528577622884),
 ('몇', 0.0998528577622884),
 ('광고업계', 0.0998528577622884),
 ('업계', 0.0998528577622884),
 ('동료', 0.0998528577622884),
 ('도로', 0.0998528577622884),
 ('옆', 0.0998528577622884),
 ('광고판', 0.0998528577622884),
 ('12', 0.05684857266743394),
 ('12일', 0.05684857266743394),
 ('두', 0.05684857266743394),
 ('영', 0.05684857266743394),
 ('마음', 0.05684857266743394),
 ('페이스', 0.05684857266743394),
 ('페이스북', 0.05684857266743394),
 ('트위터', 0.05684857266743394),
 ('등에', 0.05684857266743394),


 ('중인', 0.021128452344414895),
 ('건', 0.01894952422247798),
 ('5', 0.01894952422247798),
 ('관계자', 0.01894952422247798),
 ('예정', 0.01894952422247798),
 ('발표', 0.01894952422247798),
 ('등에', 0.01894952422247798),
 ('추가', 0.01845905009677265),
 ('전', 0.010564226172207447),
 ('이', 0.010564226172207447),
 ('전재', 0.010564226172207447),
 ('중', 0.009229525048386325),
 ('내용', 0.009229525048386325),
 ('플레이어', 0.009229525048386325),
 ('주', 0.0046147625241931625),
 ('무단', 0.0046147625241931625),
 ('금지', 0.0046147625241931625),
 ('오류', 0.0046147625241931625),
 ('우회', 0.0046147625241931625),
 ('위', 0.0046147625241931625),
 ('함수', 0.0046147625241931625),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('딸', 0.6989700043360189),
 ('사진', 0.5991171465737304),
 ('틀', 0.499264288811442),
 ('카', 0.499264288811442),
 ('137', 0.3994114310491536),
 ('137년', 0.2995585732868652),
 ('윌', 0.2995585732868652),
 ('카터', 0.2995585732868652),
 ('터', 0.2995585732868652),
 ('부부', 0.22739429066973577),
 ('미국', 0.1

 ('관계자', 0.030610769897849048),
 ('12', 0.030610769897849048),
 ('12일', 0.030610769897849048),
 ('키', 0.030610769897849048),
 ('예정', 0.030610769897849048),
 ('등', 0.030610769897849048),
 ('월', 0.030610769897849048),
 ('발표', 0.030610769897849048),
 ('영', 0.030610769897849048),
 ('당시', 0.030610769897849048),
 ('강원', 0.030610769897849048),
 ('강원도', 0.030610769897849048),
 ('도', 0.030610769897849048),
 ('과정', 0.030610769897849048),
 ('현재', 0.030610769897849048),
 ('분', 0.030610769897849048),
 ('페이스', 0.030610769897849048),
 ('페이스북', 0.030610769897849048),
 ('트위터', 0.030610769897849048),
 ('1', 0.029818465540940437),
 ('18일', 0.017065288432027415),
 ('한', 0.017065288432027415),
 ('관련', 0.017065288432027415),
 ('중인', 0.017065288432027415),
 ('이', 0.017065288432027415),
 ('북', 0.017065288432027415),
 ('전재', 0.017065288432027415),
 ('주', 0.014909232770470219),
 ('18', 0.007454616385235109),
 ('중', 0.007454616385235109),
 ('무단', 0.007454616385235109),
 ('금지', 0.007454616385235109),
 ('본문', 0.0)

 ('부부', 0.22739429066973577),
 ('미국', 0.1997057155245768),
 ('가문', 0.1997057155245768),
 ('켈렌', 0.1997057155245768),
 ('소식', 0.1997057155245768),
 ('집안', 0.1997057155245768),
 ('광고', 0.1997057155245768),
 ('만', 0.1705457180023018),
 ('년', 0.1584633925831117),
 ('월', 0.11369714533486788),
 ('137년만', 0.0998528577622884),
 ('아들', 0.0998528577622884),
 ('부잣집', 0.0998528577622884),
 ('7', 0.0998528577622884),
 ('7월', 0.0998528577622884),
 ('매체', 0.0998528577622884),
 ('피플', 0.0998528577622884),
 ('사우스캐롤라이', 0.0998528577622884),
 ('38', 0.0998528577622884),
 ('6월', 0.0998528577622884),
 ('25', 0.0998528577622884),
 ('25일', 0.0998528577622884),
 ('루이즈', 0.0998528577622884),
 ('을', 0.0998528577622884),
 ('품', 0.0998528577622884),
 ('일곱', 0.0998528577622884),
 ('일곱살', 0.0998528577622884),
 ('살', 0.0998528577622884),
 ('난', 0.0998528577622884),
 ('큰아들', 0.0998528577622884),
 ('롤', 0.0998528577622884),
 ('롤랜드', 0.0998528577622884),
 ('랜드', 0.0998528577622884),
 ('를', 0.0998528577622884),
 ('포함', 

 ('관련', 0.021128452344414895),
 ('중인', 0.021128452344414895),
 ('건', 0.01894952422247798),
 ('5', 0.01894952422247798),
 ('관계자', 0.01894952422247798),
 ('예정', 0.01894952422247798),
 ('발표', 0.01894952422247798),
 ('등에', 0.01894952422247798),
 ('추가', 0.01845905009677265),
 ('전', 0.010564226172207447),
 ('이', 0.010564226172207447),
 ('전재', 0.010564226172207447),
 ('중', 0.009229525048386325),
 ('내용', 0.009229525048386325),
 ('플레이어', 0.009229525048386325),
 ('주', 0.0046147625241931625),
 ('무단', 0.0046147625241931625),
 ('금지', 0.0046147625241931625),
 ('오류', 0.0046147625241931625),
 ('우회', 0.0046147625241931625),
 ('위', 0.0046147625241931625),
 ('함수', 0.0046147625241931625),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('딸', 0.6989700043360189),
 ('사진', 0.5991171465737304),
 ('틀', 0.499264288811442),
 ('카', 0.499264288811442),
 ('137', 0.3994114310491536),
 ('137년', 0.2995585732868652),
 ('윌', 0.2995585732868652),
 ('카터', 0.2995585732868652),
 ('터', 0.2995585732868652),
 ('이름', 0.

 ('백년가약', 0.05376692341046299),
 ('가약', 0.05376692341046299),
 ('맺었', 0.05376692341046299),
 ('사내', 0.05376692341046299),
 ('비밀', 0.05376692341046299),
 ('연애', 0.05376692341046299),
 ('사단', 0.05376692341046299),
 ('시리즈', 0.05376692341046299),
 ('하나', 0.05376692341046299),
 ('시즌2', 0.05376692341046299),
 ('씨도', 0.05376692341046299),
 ('시즌1', 0.05376692341046299),
 ('재현', 0.05376692341046299),
 ('구', 0.05376692341046299),
 ('구혜', 0.05376692341046299),
 ('혜', 0.05376692341046299),
 ('배경', 0.05376692341046299),
 ('동화', 0.05376692341046299),
 ('화제', 0.05376692341046299),
 ('바', 0.05376692341046299),
 ('측은', 0.05376692341046299),
 ('최근', 0.05376692341046299),
 ('준비', 0.05376692341046299),
 ('조윤희씨', 0.05376692341046299),
 ('임신', 0.05376692341046299),
 ('만큼', 0.05376692341046299),
 ('중요', 0.05376692341046299),
 ('시점', 0.05376692341046299),
 ('고심', 0.05376692341046299),
 ('본의', 0.05376692341046299),
 ('피해', 0.05376692341046299),
 ('회복', 0.05376692341046299),
 ('전념', 0.05376692341046299),
 ('한번'

 ('신혼일기', 0.10753384682092598),
 ('시즌', 0.10753384682092598),
 ('프로그램', 0.10753384682092598),
 ('건강', 0.10753384682092598),
 ('건', 0.09183230969354715),
 ('5', 0.09183230969354715),
 ('공개', 0.09183230969354715),
 ('장', 0.061221539795698096),
 ('두', 0.061221539795698096),
 ('마음', 0.061221539795698096),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299),
 ('설정', 0.05376692341046299),
 ('안내', 0.05376692341046299),
 ('안내나눔고딕', 0.05376692341046299),
 ('나눔', 0.05376692341046299),
 ('2돋움', 0.05376692341046299),
 ('돋움', 0.05376692341046299),
 ('3바탕', 0.05376692341046299),
 ('바탕', 0.05376692341046299),
 ('1폰트', 0.05376692341046299),
 ('2폰트', 0.05376692341046299),
 ('3폰트', 0.05376692341046299),
 ('복수', 0.05376692341046299),
 ('확정', 0.05376692341046299),
 ('녹화', 0.05376692341046299),
 ('러브콜', 0.05376692341046299),
 ('콜', 0.05376692341046299),
 ('응답', 0.05376692341046299),
 ('합류', 0.05376692341046299),
 ('합류키', 0.05376692341046299),
 ('일신상', 0.05376692341046299),
 ('이유', 0.05376692341

 ('과정', 0.030610769897849048),
 ('현재', 0.030610769897849048),
 ('분', 0.030610769897849048),
 ('페이스', 0.030610769897849048),
 ('페이스북', 0.030610769897849048),
 ('트위터', 0.030610769897849048),
 ('1', 0.029818465540940437),
 ('18일', 0.017065288432027415),
 ('한', 0.017065288432027415),
 ('관련', 0.017065288432027415),
 ('중인', 0.017065288432027415),
 ('이', 0.017065288432027415),
 ('북', 0.017065288432027415),
 ('전재', 0.017065288432027415),
 ('주', 0.014909232770470219),
 ('18', 0.007454616385235109),
 ('중', 0.007454616385235109),
 ('무단', 0.007454616385235109),
 ('금지', 0.007454616385235109),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('살충제', 0.1997057155245768),
 ('2', 0.11369714533486788),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0.0369181001935453),
 ('18', 0.027688575145158975),
 ('관련', 0.021128452344414895),
 ('중인', 0.021128452344414895),
 ('건', 0.01894952422247798),
 ('5', 0.01894952422247798),
 ('관계자', 0.01894952422247798),

 ('이', 0.017065288432027415),
 ('북', 0.017065288432027415),
 ('전재', 0.017065288432027415),
 ('주', 0.014909232770470219),
 ('18', 0.007454616385235109),
 ('중', 0.007454616385235109),
 ('무단', 0.007454616385235109),
 ('금지', 0.007454616385235109),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('계란', 0.4659800028906792),
 ('살충제', 0.1997057155245768),
 ('결과', 0.13264666955734586),
 ('2', 0.11369714533486788),
 ('전국', 0.0998528577622884),
 ('최종', 0.0665685718415256),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0.0369181001935453),
 ('18', 0.027688575145158975),
 ('관련', 0.021128452344414895),
 ('중인', 0.021128452344414895),
 ('건', 0.01894952422247798),
 ('5', 0.01894952422247798),
 ('관계자', 0.01894952422247798),
 ('예정', 0.01894952422247798),
 ('발표', 0.01894952422247798),
 ('등에', 0.01894952422247798),
 ('추가', 0.01845905009677265),
 ('전', 0.010564226172207447),
 ('이', 0.010564226172207447),
 ('전재', 0.010564226172207447),
 ('중', 0.00922

 ('인턴기자', 0.0998528577622884),
 ('네이버', 0.0998528577622884),
 ('네이버포스트', 0.0998528577622884),
 ('포스트', 0.0998528577622884),
 ('12', 0.05684857266743394),
 ('12일', 0.05684857266743394),
 ('두', 0.05684857266743394),
 ('영', 0.05684857266743394),
 ('마음', 0.05684857266743394),
 ('페이스', 0.05684857266743394),
 ('페이스북', 0.05684857266743394),
 ('트위터', 0.05684857266743394),
 ('등에', 0.05684857266743394),
 ('36', 0.05684857266743394),
 ('6', 0.05684857266743394),
 ('남', 0.05684857266743394),
 ('한', 0.03169267851662234),
 ('사람', 0.03169267851662234),
 ('북', 0.03169267851662234),
 ('전재', 0.03169267851662234),
 ('내용', 0.027688575145158975),
 ('플레이어', 0.027688575145158975),
 ('주', 0.013844287572579488),
 ('무단', 0.013844287572579488),
 ('금지', 0.013844287572579488),
 ('오류', 0.013844287572579488),
 ('우회', 0.013844287572579488),
 ('위', 0.013844287572579488),
 ('함수', 0.013844287572579488),
 ('추가', 0.013844287572579488),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('년', 0.2218487496163564),
 ('공

 ('분', 0.030610769897849048),
 ('페이스', 0.030610769897849048),
 ('페이스북', 0.030610769897849048),
 ('트위터', 0.030610769897849048),
 ('1', 0.029818465540940437),
 ('18일', 0.017065288432027415),
 ('한', 0.017065288432027415),
 ('관련', 0.017065288432027415),
 ('중인', 0.017065288432027415),
 ('이', 0.017065288432027415),
 ('북', 0.017065288432027415),
 ('전재', 0.017065288432027415),
 ('주', 0.014909232770470219),
 ('18', 0.007454616385235109),
 ('중', 0.007454616385235109),
 ('무단', 0.007454616385235109),
 ('금지', 0.007454616385235109),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('농장', 0.6989700043360189),
 ('개', 0.5325485747322048),
 ('계란', 0.4659800028906792),
 ('살충제', 0.1997057155245768),
 ('49', 0.1331371436830512),
 ('49개', 0.1331371436830512),
 ('결과', 0.13264666955734586),
 ('2', 0.11369714533486788),
 ('전국', 0.0998528577622884),
 ('확인', 0.07579809688991192),
 ('최종', 0.0665685718415256),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0

 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 ('하차', 0.268834617052315),
 ('사이즈', 0.268834617052315),
 ('출연', 0.268834617052315),
 ('이동건과', 0.21506769364185196),
 ('건과', 0.21506769364185196),
 ('오상진과', 0.21506769364185196),
 ('진과', 0.21506769364185196),
 ('부부', 0.1836646193870943),
 ('이동건', 0.16130077023138897),
 ('일간', 0.16130077023138897),
 ('일간스포츠', 0.16130077023138897),
 ('스포츠', 0.16130077023138897),
 ('제작진', 0.16130077023138897),
 ('결혼', 0.16130077023138897),
 ('결정', 0.16130077023138897),
 ('4', 0.12244307959139619),
 ('오상진', 0.10753384

 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607),
 ('함수', 0.01615166883467607),
 ('추가', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('6', 0.1989700043360188),
 ('군', 0.1989700043360188),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('오후', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 

 ('기자', 0.0),
 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('6', 0.1989700043360188),
 ('군', 0.1989700043360188),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('오후', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 (

 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 ('하차', 0.268834617052315),
 ('사이즈', 0.268834617052315),
 ('출연', 0.268834617052315),
 ('이동건과', 0.21506769364185196),
 ('건과', 0.21506769364185196),
 ('오상진과', 0.21506769364185196),
 ('진과', 0.21506769364185196),
 ('부부', 0.1836646193870943),
 ('이동건', 0.16130077023138897),
 ('일간', 0.16130077023138897),
 ('일간스포츠', 0.1613

 ('일간스포츠', 0.16130077023138897),
 ('스포츠', 0.16130077023138897),
 ('제작진', 0.16130077023138897),
 ('결혼', 0.16130077023138897),
 ('결정', 0.16130077023138897),
 ('4', 0.12244307959139619),
 ('오상진', 0.10753384682092598),
 ('진', 0.10753384682092598),
 ('투입', 0.10753384682092598),
 ('고딕', 0.10753384682092598),
 ('4폰트', 0.10753384682092598),
 ('현', 0.10753384682092598),
 ('러브', 0.10753384682092598),
 ('방송', 0.10753384682092598),
 ('모델', 0.10753384682092598),
 ('장윤주', 0.10753384682092598),
 ('그', 0.10753384682092598),
 ('남편', 0.10753384682092598),
 ('정승', 0.10753384682092598),
 ('정승민', 0.10753384682092598),
 ('민', 0.10753384682092598),
 ('커플', 0.10753384682092598),
 ('생활', 0.10753384682092598),
 ('끝', 0.10753384682092598),
 ('에서', 0.10753384682092598),
 ('신혼일기', 0.10753384682092598),
 ('시즌', 0.10753384682092598),
 ('프로그램', 0.10753384682092598),
 ('건강', 0.10753384682092598),
 ('건', 0.09183230969354715),
 ('5', 0.09183230969354715),
 ('공개', 0.09183230969354715),
 ('장', 0.061221539795698096),
 ('두'

 ('이름', 0.2995585732868652),
 ('부부', 0.22739429066973577),
 ('미국', 0.1997057155245768),
 ('가문', 0.1997057155245768),
 ('켈렌', 0.1997057155245768),
 ('소식', 0.1997057155245768),
 ('집안', 0.1997057155245768),
 ('광고', 0.1997057155245768),
 ('중앙일보', 0.1997057155245768),
 ('일보', 0.1997057155245768),
 ('만', 0.1705457180023018),
 ('년', 0.1584633925831117),
 ('월', 0.11369714533486788),
 ('중앙', 0.11369714533486788),
 ('137년만', 0.0998528577622884),
 ('아들', 0.0998528577622884),
 ('부잣집', 0.0998528577622884),
 ('7', 0.0998528577622884),
 ('7월', 0.0998528577622884),
 ('매체', 0.0998528577622884),
 ('피플', 0.0998528577622884),
 ('사우스캐롤라이', 0.0998528577622884),
 ('38', 0.0998528577622884),
 ('6월', 0.0998528577622884),
 ('25', 0.0998528577622884),
 ('25일', 0.0998528577622884),
 ('루이즈', 0.0998528577622884),
 ('을', 0.0998528577622884),
 ('품', 0.0998528577622884),
 ('일곱', 0.0998528577622884),
 ('일곱살', 0.0998528577622884),
 ('살', 0.0998528577622884),
 ('난', 0.0998528577622884),
 ('큰아들', 0.0998528577622884),
 ('롤

 ('기자', 0.0),
 ('배포', 0.0)]
[('년', 0.2218487496163564),
 ('공개', 0.1989700043360188),
 ('뉴스', 0.13264666955734586),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.06632333477867293),
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('만', 0.06632333477867293),
 ('36', 0.06632333477867293),
 ('남', 0.06632333477867293),
 ('중앙', 0.06632333477867293),
 ('결과', 0.06632333477867293),
 ('확인', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('오후', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('내용', 0.03230333766935214),
 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607)

 ('살충제', 0.1997057155245768),
 ('곳', 0.1997057155245768),
 ('폐기', 0.166421429603814),
 ('49', 0.1331371436830512),
 ('49개', 0.1331371436830512),
 ('산란계', 0.1331371436830512),
 ('결과', 0.13264666955734586),
 ('2', 0.11369714533486788),
 ('전국', 0.0998528577622884),
 ('기준치', 0.0998528577622884),
 ('확인', 0.07579809688991192),
 ('최종', 0.0665685718415256),
 ('성분', 0.0665685718415256),
 ('위생', 0.0665685718415256),
 ('부실', 0.0665685718415256),
 ('뉴스', 0.05684857266743394),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0.0369181001935453),
 ('확인상보', 0.0332842859207628),
 ('상보', 0.0332842859207628),
 ('경남', 0.0332842859207628),
 ('창녕', 0.0332842859207628),
 ('창녕군', 0.0332842859207628),
 ('유어', 0.0332842859207628),
 ('유어면', 0.0332842859207628),
 ('면', 0.0332842859207628),
 ('초과', 0.0332842859207628),
 ('군청', 0.0332842859207628),
 ('가축', 0.0332842859207628),
 ('위생관리', 0.0332842859207628),
 ('관리', 0.0332842859207628),
 ('처분', 0.0332842859207628),
 (

 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('오후', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('내용', 0.03230333766935214),
 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607),
 ('함수', 0.01615166883467607),
 ('추가', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('6', 0.1989700043360188),
 ('군', 0.1989700043360188),
 ('철', 0.0994850021680094),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('오후', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014

 ('하나', 0.05376692341046299),
 ('시즌2', 0.05376692341046299),
 ('씨도', 0.05376692341046299),
 ('시즌1', 0.05376692341046299),
 ('재현', 0.05376692341046299),
 ('구', 0.05376692341046299),
 ('구혜', 0.05376692341046299),
 ('혜', 0.05376692341046299),
 ('배경', 0.05376692341046299),
 ('동화', 0.05376692341046299),
 ('화제', 0.05376692341046299),
 ('바', 0.05376692341046299),
 ('측은', 0.05376692341046299),
 ('최근', 0.05376692341046299),
 ('준비', 0.05376692341046299),
 ('조윤희씨', 0.05376692341046299),
 ('임신', 0.05376692341046299),
 ('만큼', 0.05376692341046299),
 ('중요', 0.05376692341046299),
 ('시점', 0.05376692341046299),
 ('고심', 0.05376692341046299),
 ('본의', 0.05376692341046299),
 ('피해', 0.05376692341046299),
 ('회복', 0.05376692341046299),
 ('전념', 0.05376692341046299),
 ('한번', 0.05376692341046299),
 ('진심', 0.05376692341046299),
 ('송구', 0.05376692341046299),
 ('점', 0.05376692341046299),
 ('양해', 0.05376692341046299),
 ('한편', 0.05376692341046299),
 ('퇴사', 0.05376692341046299),
 ('프리', 0.05376692341046299),
 ('선언', 0.

 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 ('하차', 0.268834617052315),
 ('사이즈', 0.268834617052315),
 ('출연', 0.268834617052315),
 ('이동건과', 0.21506769364185196),
 ('건과', 0.21506769364185196),
 ('오상진과', 0.21506769364185196),
 ('진과', 0.21506769364185196),
 ('부부', 0.1836646193870943),
 ('이동건', 0.16130077023138897),
 ('일간', 0.16130077023138897),
 ('일간스포츠', 0.16130077023138897),
 ('스포츠', 0.16130077023138897),
 ('제작진', 0.16130077023138897),
 ('결혼', 0.16130077023138897),
 ('결정', 0.16130077023138897),
 ('4', 0.12244307959139619),
 ('오상진', 0.10753384682092598),
 ('진', 0.10753384682092598),
 ('투입', 0.10753384682092598),
 ('고딕', 0.10753384682092598),
 ('4폰트', 0.10753384682092598),
 ('현', 0.107533846

 ('살충제', 0.1997057155245768),
 ('곳', 0.1997057155245768),
 ('유통', 0.1997057155245768),
 ('폐기', 0.166421429603814),
 ('49', 0.1331371436830512),
 ('49개', 0.1331371436830512),
 ('산란계', 0.1331371436830512),
 ('결과', 0.13264666955734586),
 ('2', 0.11369714533486788),
 ('전국', 0.0998528577622884),
 ('기준치', 0.0998528577622884),
 ('적합', 0.0998528577622884),
 ('1190', 0.0998528577622884),
 ('1190개', 0.0998528577622884),
 ('전체', 0.0998528577622884),
 ('확인', 0.07579809688991192),
 ('최종', 0.0665685718415256),
 ('성분', 0.0665685718415256),
 ('위생', 0.0665685718415256),
 ('부실', 0.0665685718415256),
 ('즉시', 0.0665685718415256),
 ('957', 0.0665685718415256),
 ('뉴스', 0.05684857266743394),
 ('18일', 0.04225690468882979),
 ('등', 0.03789904844495596),
 ('과정', 0.03789904844495596),
 ('1', 0.0369181001935453),
 ('확인상보', 0.0332842859207628),
 ('상보', 0.0332842859207628),
 ('경남', 0.0332842859207628),
 ('창녕', 0.0332842859207628),
 ('창녕군', 0.0332842859207628),
 ('유어', 0.0332842859207628),
 ('유어면', 0.0332842859207628

 ('5월', 0.05376692341046299),
 ('30', 0.05376692341046299),
 ('30일', 0.05376692341046299),
 ('2년간', 0.05376692341046299),
 ('간', 0.05376692341046299),
 ('열애', 0.05376692341046299),
 ('백', 0.05376692341046299),
 ('백년가약', 0.05376692341046299),
 ('가약', 0.05376692341046299),
 ('맺었', 0.05376692341046299),
 ('사내', 0.05376692341046299),
 ('비밀', 0.05376692341046299),
 ('연애', 0.05376692341046299),
 ('사단', 0.05376692341046299),
 ('시리즈', 0.05376692341046299),
 ('하나', 0.05376692341046299),
 ('시즌2', 0.05376692341046299),
 ('씨도', 0.05376692341046299),
 ('시즌1', 0.05376692341046299),
 ('재현', 0.05376692341046299),
 ('구', 0.05376692341046299),
 ('구혜', 0.05376692341046299),
 ('혜', 0.05376692341046299),
 ('배경', 0.05376692341046299),
 ('동화', 0.05376692341046299),
 ('화제', 0.05376692341046299),
 ('바', 0.05376692341046299),
 ('측은', 0.05376692341046299),
 ('최근', 0.05376692341046299),
 ('준비', 0.05376692341046299),
 ('조윤희씨', 0.05376692341046299),
 ('임신', 0.05376692341046299),
 ('만큼', 0.05376692341046299),
 ('중요'

 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 ('하차', 0.268834617052315),
 ('사이즈', 0.268834617052315),
 ('출연', 0.268834617052315),
 ('이동건과', 0.21506769364185196),
 ('건과', 0.21506769364185196),
 ('오상진과', 0.21506769364185196),
 ('진과', 0.21506769364185196),
 ('부부', 0.1836646193870943),
 ('이동건', 0.16130077023138897),
 ('일간', 0.16130077023138897),
 ('일간스포츠', 0.16130077023138897),
 ('스포츠', 0.16130077023138897),
 ('제작진', 0.16130077023138897),
 ('결혼', 0.16130077023138897),
 ('결정', 0.16130077023138897),
 ('4', 0.12244307959139619),
 ('오상진', 0.10753384682092598),
 ('진', 0.10753384682092598),
 ('투입', 0.10753384682092598),
 ('고딕', 0.10753384682

 ('피해', 0.05376692341046299),
 ('회복', 0.05376692341046299),
 ('전념', 0.05376692341046299),
 ('한번', 0.05376692341046299),
 ('진심', 0.05376692341046299),
 ('송구', 0.05376692341046299),
 ('점', 0.05376692341046299),
 ('양해', 0.05376692341046299),
 ('한편', 0.05376692341046299),
 ('퇴사', 0.05376692341046299),
 ('프리', 0.05376692341046299),
 ('선언', 0.05376692341046299),
 ('웨이', 0.05376692341046299),
 ('3', 0.05119586529608225),
 ('전', 0.03413057686405483),
 ('년', 0.03413057686405483),
 ('사람', 0.03413057686405483),
 ('관계자', 0.030610769897849048),
 ('12', 0.030610769897849048),
 ('12일', 0.030610769897849048),
 ('키', 0.030610769897849048),
 ('예정', 0.030610769897849048),
 ('등', 0.030610769897849048),
 ('월', 0.030610769897849048),
 ('발표', 0.030610769897849048),
 ('영', 0.030610769897849048),
 ('당시', 0.030610769897849048),
 ('강원', 0.030610769897849048),
 ('강원도', 0.030610769897849048),
 ('도', 0.030610769897849048),
 ('과정', 0.030610769897849048),
 ('현재', 0.030610769897849048),
 ('분', 0.030610769897849048),
 

 ('오상진과', 0.21506769364185196),
 ('진과', 0.21506769364185196),
 ('부부', 0.1836646193870943),
 ('이동건', 0.16130077023138897),
 ('일간', 0.16130077023138897),
 ('일간스포츠', 0.16130077023138897),
 ('스포츠', 0.16130077023138897),
 ('제작진', 0.16130077023138897),
 ('결혼', 0.16130077023138897),
 ('결정', 0.16130077023138897),
 ('4', 0.12244307959139619),
 ('오상진', 0.10753384682092598),
 ('진', 0.10753384682092598),
 ('투입', 0.10753384682092598),
 ('고딕', 0.10753384682092598),
 ('4폰트', 0.10753384682092598),
 ('현', 0.10753384682092598),
 ('러브', 0.10753384682092598),
 ('방송', 0.10753384682092598),
 ('모델', 0.10753384682092598),
 ('장윤주', 0.10753384682092598),
 ('그', 0.10753384682092598),
 ('남편', 0.10753384682092598),
 ('정승', 0.10753384682092598),
 ('정승민', 0.10753384682092598),
 ('민', 0.10753384682092598),
 ('커플', 0.10753384682092598),
 ('생활', 0.10753384682092598),
 ('끝', 0.10753384682092598),
 ('에서', 0.10753384682092598),
 ('신혼일기', 0.10753384682092598),
 ('시즌', 0.10753384682092598),
 ('프로그램', 0.10753384682092598),
 

 ('년', 0.03413057686405483),
 ('사람', 0.03413057686405483),
 ('관계자', 0.030610769897849048),
 ('12', 0.030610769897849048),
 ('12일', 0.030610769897849048),
 ('키', 0.030610769897849048),
 ('예정', 0.030610769897849048),
 ('등', 0.030610769897849048),
 ('월', 0.030610769897849048),
 ('발표', 0.030610769897849048),
 ('영', 0.030610769897849048),
 ('당시', 0.030610769897849048),
 ('강원', 0.030610769897849048),
 ('강원도', 0.030610769897849048),
 ('도', 0.030610769897849048),
 ('과정', 0.030610769897849048),
 ('현재', 0.030610769897849048),
 ('분', 0.030610769897849048),
 ('페이스', 0.030610769897849048),
 ('페이스북', 0.030610769897849048),
 ('트위터', 0.030610769897849048),
 ('1', 0.029818465540940437),
 ('18일', 0.017065288432027415),
 ('한', 0.017065288432027415),
 ('관련', 0.017065288432027415),
 ('중인', 0.017065288432027415),
 ('이', 0.017065288432027415),
 ('북', 0.017065288432027415),
 ('전재', 0.017065288432027415),
 ('주', 0.014909232770470219),
 ('18', 0.007454616385235109),
 ('중', 0.007454616385235109),
 ('무단', 0.00745

 ('공개', 0.1989700043360188),
 ('뉴스', 0.13264666955734586),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.06632333477867293),
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('만', 0.06632333477867293),
 ('36', 0.06632333477867293),
 ('남', 0.06632333477867293),
 ('중앙', 0.06632333477867293),
 ('결과', 0.06632333477867293),
 ('확인', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('오후', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('내용', 0.03230333766935214),
 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607),
 ('함수', 0.01615166883467607),
 ('추가', 0.01615166883467

 ('이', 0.0554621874040891),
 ('오후', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상', 0.32260154046277795),
 ('폰트', 0.32260154046277795),
 ('하차', 0.268834617052315),
 ('사이즈', 0.268834617052315),
 ('출연', 0.268834617052315),
 ('이동건과', 0.21506769364185196),
 ('건과', 0.21506769364185196),
 ('오상진과', 0.21506769364185196),
 ('진과', 0.21506769364185196),
 ('부부', 0.1836646193870943),
 ('이동건', 0.1613007702

 ('기자', 0.0),
 ('배포', 0.0)]
[('강원', 0.1989700043360188),
 ('강원도', 0.1989700043360188),
 ('도', 0.1989700043360188),
 ('6', 0.1989700043360188),
 ('군', 0.1989700043360188),
 ('철', 0.0994850021680094),
 ('조사', 0.0994850021680094),
 ('1', 0.07268250975604232),
 ('중', 0.07268250975604232),
 ('3', 0.0554621874040891),
 ('이', 0.0554621874040891),
 ('오후', 0.0554621874040891),
 ('내용', 0.04845500650402821),
 ('플레이어', 0.04845500650402821),
 ('18', 0.024227503252014105),
 ('무단', 0.024227503252014105),
 ('금지', 0.024227503252014105),
 ('오류', 0.024227503252014105),
 ('우회', 0.024227503252014105),
 ('위', 0.024227503252014105),
 ('함수', 0.024227503252014105),
 ('추가', 0.024227503252014105),
 ('본문', 0.0),
 ('일', 0.0),
 ('기자', 0.0),
 ('배포', 0.0)]
[('신혼', 0.6452030809255559),
 ('일기', 0.53766923410463),
 ('윤', 0.4839023106941669),
 ('2', 0.3979400086720376),
 ('김소영', 0.3763684638732409),
 ('이동', 0.3763684638732409),
 ('조', 0.3763684638732409),
 ('조윤', 0.3763684638732409),
 ('신혼일기2', 0.3763684638732409),
 ('오상

 ('시즌', 0.10753384682092598),
 ('프로그램', 0.10753384682092598),
 ('건강', 0.10753384682092598),
 ('건', 0.09183230969354715),
 ('5', 0.09183230969354715),
 ('공개', 0.09183230969354715),
 ('장', 0.061221539795698096),
 ('두', 0.061221539795698096),
 ('마음', 0.061221539795698096),
 ('단독', 0.05376692341046299),
 ('단독신혼일기2', 0.05376692341046299),
 ('설정', 0.05376692341046299),
 ('안내', 0.05376692341046299),
 ('안내나눔고딕', 0.05376692341046299),
 ('나눔', 0.05376692341046299),
 ('2돋움', 0.05376692341046299),
 ('돋움', 0.05376692341046299),
 ('3바탕', 0.05376692341046299),
 ('바탕', 0.05376692341046299),
 ('1폰트', 0.05376692341046299),
 ('2폰트', 0.05376692341046299),
 ('3폰트', 0.05376692341046299),
 ('복수', 0.05376692341046299),
 ('확정', 0.05376692341046299),
 ('녹화', 0.05376692341046299),
 ('러브콜', 0.05376692341046299),
 ('콜', 0.05376692341046299),
 ('응답', 0.05376692341046299),
 ('합류', 0.05376692341046299),
 ('합류키', 0.05376692341046299),
 ('일신상', 0.05376692341046299),
 ('이유', 0.05376692341046299),
 ('의견', 0.0537669234104

 ('배포', 0.0)]
[('년', 0.2218487496163564),
 ('공개', 0.1989700043360188),
 ('뉴스', 0.13264666955734586),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.06632333477867293),
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('만', 0.06632333477867293),
 ('36', 0.06632333477867293),
 ('남', 0.06632333477867293),
 ('중앙', 0.06632333477867293),
 ('결과', 0.06632333477867293),
 ('확인', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('오후', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('내용', 0.03230333766935214),
 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607),
 ('함수', 0.01

 ('큰아들', 0.0998528577622884),
 ('롤', 0.0998528577622884),
 ('롤랜드', 0.0998528577622884),
 ('랜드', 0.0998528577622884),
 ('를', 0.0998528577622884),
 ('포함', 0.0998528577622884),
 ('일가친척', 0.0998528577622884),
 ('흥분', 0.0998528577622884),
 ('몇', 0.0998528577622884),
 ('광고업계', 0.0998528577622884),
 ('업계', 0.0998528577622884),
 ('동료', 0.0998528577622884),
 ('도로', 0.0998528577622884),
 ('옆', 0.0998528577622884),
 ('광고판', 0.0998528577622884),
 ('탄생', 0.0998528577622884),
 ('축하', 0.0998528577622884),
 ('걸기', 0.0998528577622884),
 ('이벤트', 0.0998528577622884),
 ('라', 0.0998528577622884),
 ('귀', 0.0998528577622884),
 ('남성적', 0.0998528577622884),
 ('성적', 0.0998528577622884),
 ('느낌', 0.0998528577622884),
 ('중성', 0.0998528577622884),
 ('온', 0.0998528577622884),
 ('새', 0.0998528577622884),
 ('구성원', 0.0998528577622884),
 ('출현', 0.0998528577622884),
 ('정우', 0.0998528577622884),
 ('정우영', 0.0998528577622884),
 ('인턴', 0.0998528577622884),
 ('인턴기자', 0.0998528577622884),
 ('네이버', 0.0998528577622884),
 ('네이버포스

 ('뉴스', 0.13264666955734586),
 ('한', 0.1109243748081782),
 ('전', 0.1109243748081782),
 ('3', 0.07394958320545213),
 ('18일', 0.07394958320545213),
 ('사람', 0.07394958320545213),
 ('4', 0.06632333477867293),
 ('장', 0.06632333477867293),
 ('키', 0.06632333477867293),
 ('당시', 0.06632333477867293),
 ('현재', 0.06632333477867293),
 ('분', 0.06632333477867293),
 ('만', 0.06632333477867293),
 ('36', 0.06632333477867293),
 ('남', 0.06632333477867293),
 ('중앙', 0.06632333477867293),
 ('결과', 0.06632333477867293),
 ('확인', 0.06632333477867293),
 ('1', 0.04845500650402821),
 ('주', 0.04845500650402821),
 ('관련', 0.03697479160272606),
 ('중인', 0.03697479160272606),
 ('북', 0.03697479160272606),
 ('오후', 0.03697479160272606),
 ('18', 0.03230333766935214),
 ('내용', 0.03230333766935214),
 ('플레이어', 0.03230333766935214),
 ('중', 0.01615166883467607),
 ('오류', 0.01615166883467607),
 ('우회', 0.01615166883467607),
 ('위', 0.01615166883467607),
 ('함수', 0.01615166883467607),
 ('추가', 0.01615166883467607),
 ('본문', 0.0),
 ('일', 0.

 ('롤', 0.0998528577622884),
 ('롤랜드', 0.0998528577622884),
 ('랜드', 0.0998528577622884),
 ('를', 0.0998528577622884),
 ('포함', 0.0998528577622884),
 ('일가친척', 0.0998528577622884),
 ('흥분', 0.0998528577622884),
 ('몇', 0.0998528577622884),
 ('광고업계', 0.0998528577622884),
 ('업계', 0.0998528577622884),
 ('동료', 0.0998528577622884),
 ('도로', 0.0998528577622884),
 ('옆', 0.0998528577622884),
 ('광고판', 0.0998528577622884),
 ('탄생', 0.0998528577622884),
 ('축하', 0.0998528577622884),
 ('걸기', 0.0998528577622884),
 ('이벤트', 0.0998528577622884),
 ('라', 0.0998528577622884),
 ('귀', 0.0998528577622884),
 ('남성적', 0.0998528577622884),
 ('성적', 0.0998528577622884),
 ('느낌', 0.0998528577622884),
 ('중성', 0.0998528577622884),
 ('온', 0.0998528577622884),
 ('새', 0.0998528577622884),
 ('구성원', 0.0998528577622884),
 ('출현', 0.0998528577622884),
 ('정우', 0.0998528577622884),
 ('정우영', 0.0998528577622884),
 ('인턴', 0.0998528577622884),
 ('인턴기자', 0.0998528577622884),
 ('네이버', 0.0998528577622884),
 ('네이버포스트', 0.0998528577622884),
 ('포스

This completes the indexing step. Next, we build a Retriever that returns the relevant documents for a given query.

## Retriever

If you have followed along so far, we now have a working Crawler and Indexer. Next we build a Retriever that takes the user's query, looks it up in the indexed documents, and returns the most relevant ones. First, import the packages it uses.

In [17]:
from konlpy.tag import Kkma
import re
import math
import pickle

The Retriever reuses the preprocessing, indexing, and maxfreq functions from the Indexer, so we do not define them again. Only the weighting function is redesigned here so that it suits the Retriever.

In [18]:
def weighting_retrieve(tool, total, maxtf, featurelist, indexlist):
    weightlist = dict()
    if tool == 'tfidf':
        for term, freq in featurelist.items():
            tf = freq/maxtf
            idf = 0
            
            if term in indexlist.keys():
                idf = math.log10(total/len(indexlist[term]))
                
            tfidf = tf*idf
            
            if tfidf > 0:
                weightlist[term] = tfidf 
    elif tool == 'halfnorm':
        for term, freq in featurelist.items():
            # Gives idf more influence than tf. In a very short query tf is
            # likely to be 1 for most terms, so tf is smoothed to compensate.
            tf = 0.5 + 0.5*(freq/maxtf)
            idf = 0
            if term in indexlist.keys():
                idf = math.log10(total/len(indexlist[term]))
            tfidf = tf*idf
            if tfidf > 0:
                weightlist[term] = tfidf
    return weightlist

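The Retriever's weighting function is a new design. Its tool argument selects one of two schemes: tf-idf or halfnorm.

tf-idf is the scheme already described in the Indexer section. For each term in the input, the function computes its tf and idf values and, when their product is greater than 0, stores the term together with its tf-idf value in the weightlist dictionary.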
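Written as a formula, the tfidf branch of weighting_retrieve computes the weight below. The symbols are introduced here only for illustration and do not appear in the code: $f_t$ is the frequency of term $t$ in the query, $N$ is the number of indexed documents (`total`), and $n_t$ is the number of documents that contain $t$ (`len(indexlist[term])`).

$$
w_t = \frac{f_t}{\max_{t'} f_{t'}} \cdot \log_{10}\frac{N}{n_t}
$$

A term that is missing from the index keeps idf = 0 in the code, so its weight is 0 and it is left out of weightlist.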

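Halfnorm is a variant of tf-idf that rebalances the two factors. Its tf part is 0.5 + 0.5 × tf: a guaranteed base value of 0.5 plus the original tf scaled by 0.5. Since tf is (term frequency) / (highest term frequency), it is already normalized to the range 0 to 1, so the halfnorm tf always falls between 0.5 and 1. The base value matters when the query is very short: only a few terms are used and most of them occur just once, so the raw tf carries little information and cannot be relied on by itself. Adding the midpoint 0.5 keeps the tf factor stable and lets idf carry more of the weight. The idf part is the same as in tf-idf: it is the base-10 log of the total number of documents divided by the number of documents that contain the term, which keeps its magnitude in a comparable range.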
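To make the difference between the two schemes concrete, here is a minimal sketch that computes the weight of a single query term both ways. The helper name `query_term_weight` and the numbers are hypothetical and chosen only for illustration; the arithmetic follows the two branches of weighting_retrieve.

```python
import math

def query_term_weight(tool, freq, maxtf, total_docs, doc_freq):
    """Weight of one query term, following the two branches of weighting_retrieve."""
    if tool == 'tfidf':
        tf = freq / maxtf
    elif tool == 'halfnorm':
        tf = 0.5 + 0.5 * (freq / maxtf)
    else:
        raise ValueError('unknown tool: ' + tool)
    idf = math.log10(total_docs / doc_freq)
    return tf * idf

# Hypothetical numbers: 100 indexed documents, the term appears in 10 of them,
# and it occurs once in a query whose most frequent term occurs twice.
plain = query_term_weight('tfidf', 1, 2, 100, 10)
half = query_term_weight('halfnorm', 1, 2, 100, 10)
print(plain, half)  # 0.5 0.75
```

With the same idf, halfnorm lifts the less frequent term from 0.5 to 0.75, so the gap between terms inside a short query shrinks and idf decides more of the final ranking.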

With the weighting function that measures term importance in place, we now build retrieving, the function that searches on top of those weights.

In [19]:
def retrieving(tool, weightlist, indexlist, docvlength):
    resultlist = dict()
    if tool == 'euclidean':
        # Sum of squared weight differences over the terms the query shares
        # with each document; the square root is not applied.
        for term, weight in weightlist.items():
            if term in indexlist:
                for docname, indexweight in indexlist[term].items():
                    distance = (weight - indexweight) ** 2
                    resultlist[docname] = resultlist.get(docname, 0) + distance
    elif tool == 'cosine':
        # Dot product of the query and document weights over shared terms.
        for term, weight in weightlist.items():
            if term in indexlist:
                for docname, indexweight in indexlist[term].items():
                    resultlist[docname] = resultlist.get(docname, 0) + weight * indexweight
        # Normalize by the document vector length (sqrt of the stored vsize).
        for docname, vsize in docvlength.items():
            if docname in resultlist:
                resultlist[docname] /= math.sqrt(vsize)
    # Sort by score, highest first; a key function avoids needing the operator module.
    resultlist = sorted(resultlist.items(), key=lambda item: item[1], reverse=True)
    return resultlist

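The retrieving function measures how close the query is to each document, using the weights of the terms they share. Two measures are available: Euclidean and Cosine. The Euclidean option computes a Euclidean distance from the weight of each term in the query and the weight of the same term in the document (the code skips the final square root). The Cosine option uses the inner product of the two vectors. It is based on the formula $\cos \theta = \frac{a \cdot b}{|a|\,|b|}$, where a is the query vector and b is the document vector. Because the search is built around a single query, the query vector's length $|a|$ is the same for every document, so the code divides only by the document vector's length.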
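The two measures can be tried on toy data with the sketch below. Everything in it is hypothetical (the terms, the weights, and the helper name `shared_term_scores`); it only mirrors the arithmetic of retrieving: squared differences and dot products are accumulated over the terms that the query and a document share, and the dot product is then divided by the document vector's length.

```python
import math

# Hypothetical toy data: query term weights, an inverted index of
# term -> {document: weight}, and the squared length of each document vector.
query_weights = {'chicken': 1.0, 'price': 0.5}
index = {
    'chicken': {'doc1': 1.0, 'doc2': 0.5},
    'price': {'doc1': 0.5},
    'news': {'doc2': 1.0},
}
doc_sq_length = {'doc1': 1.25, 'doc2': 1.25}

def shared_term_scores(query_weights, index):
    """Return (sum of squared differences, dot product) per document over shared terms."""
    sq_diff, dot = {}, {}
    for term, q_weight in query_weights.items():
        for doc, d_weight in index.get(term, {}).items():
            sq_diff[doc] = sq_diff.get(doc, 0.0) + (q_weight - d_weight) ** 2
            dot[doc] = dot.get(doc, 0.0) + q_weight * d_weight
    return sq_diff, dot

sq_diff, dot = shared_term_scores(query_weights, index)

# Divide the dot product by the document length only. The query length is the
# same for every document, so leaving it out does not change the ranking.
cosine_like = {doc: value / math.sqrt(doc_sq_length[doc]) for doc, value in dot.items()}
ranking = sorted(cosine_like.items(), key=lambda item: item[1], reverse=True)

print(sq_diff)
print(ranking)
```

In this toy case doc1 matches both query terms exactly, so its squared difference is 0 and it also ranks first under the cosine-style score.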

In [23]:
query = input()
featurelist = dict()
weightlist = dict()
# Load the term weights and document vector lengths saved by the Indexer.
with open('./weight/termweight.pkl', 'rb') as f:
    indexlist = pickle.load(f)
with open('./weight/docvlength.pkl', 'rb') as f:
    docvlength = pickle.load(f)
# Both structures are required, so stop if either one is empty.
if not (indexlist and docvlength):
    print('색인파일이 없습니다.')  # "No index file found."
else:
    content = preprocessing(query)          # clean the raw query text
    termlist = indexing('phrase', content)  # extract index terms and their frequencies
    if not termlist:
        print('검색어가 없습니다.')  # "No search terms."
    else:
        maxtf = maxfreq(termlist)  # highest term frequency in the query
        weightlist = weighting_retrieve('tfidf', len(docvlength), maxtf, termlist, indexlist)

        # Rank with the (squared) Euclidean measure.
        resultlist = retrieving('euclidean', weightlist, indexlist, docvlength)
        print('Euclidean')
        print('Query - ' + query)
        print('IndexTerm - ' + str(weightlist.keys()))

        for docname, similarity in resultlist:
            with open('./news/' + docname, 'r', encoding='utf-8') as f:
                newscontent = f.read()
            newscontent = preprocessing(newscontent)
            # Show the document name, its score, and the first 50 characters.
            print('{}[{:.5f}] - {}...'.format(docname, similarity, newscontent[:50]))

        # Rank the same query with the cosine measure.
        resultlist = retrieving('cosine', weightlist, indexlist, docvlength)

        print()
        print('Cosine')
        print('Query - ' + query)
        print('IndexTerm - ' + str(weightlist.keys()))

        for docname, similarity in resultlist:
            with open('./news/' + docname, 'r', encoding='utf-8') as f:
                newscontent = f.read()
            newscontent = preprocessing(newscontent)
            print('{}[{:.5f}] - {}...'.format(docname, similarity, newscontent[:50]))
        

1
색인파일이 없습니다.


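When the user enters what they want to search for, the program analyzes the input and extracts only the useful terms. Suppose we want news about chicken prices and enter the query '치킨 가격 뉴스' ("chicken price news"). The procedure then runs as follows.
* First the query is preprocessed and then analyzed to extract its index terms.
* The extracted terms are counted to obtain their frequencies in the query and the highest frequency among them.
* These frequencies are combined with the values collected earlier by the Indexer to compute a weight for each term with either tf-idf or halfnorm.
* The weights are used to score each document against the query with either Euclidean distance or Cosine.
* The documents are displayed in descending order of the computed score.
Running this procedure and comparing Euclidean with Cosine gave somewhat different results for the two measures.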
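The steps above can be sketched end to end on a tiny in-memory corpus. This is an illustration under loud assumptions, not the tutorial's actual pipeline: the three documents are invented, whitespace splitting stands in for preprocessing and the Kkma-based indexing, and the function names `term_freqs`, `build_index`, and `search` are hypothetical. Only the tf-idf weighting and the cosine-style scoring follow the code shown earlier.

```python
import math
from collections import Counter

# Hypothetical three-document corpus; file names and texts are invented.
docs = {
    'a.txt': 'chicken price rises again',
    'b.txt': 'chicken recipe news',
    'c.txt': 'stock price news today',
}

def term_freqs(text):
    """Lower-case the text and count whitespace-separated terms."""
    return Counter(text.lower().split())

def build_index(docs):
    """Return (term -> {doc: tf-idf weight}, doc -> squared vector length)."""
    freqs = {name: term_freqs(text) for name, text in docs.items()}
    doc_freq = Counter(term for counts in freqs.values() for term in counts)
    index, sq_length = {}, {}
    for name, counts in freqs.items():
        maxtf = max(counts.values())
        for term, freq in counts.items():
            weight = (freq / maxtf) * math.log10(len(docs) / doc_freq[term])
            index.setdefault(term, {})[name] = weight
            sq_length[name] = sq_length.get(name, 0.0) + weight ** 2
    return index, sq_length

def search(query, index, sq_length):
    """Rank documents by dot product with the query, divided by document length."""
    counts = term_freqs(query)
    if not counts:
        return []
    maxtf = max(counts.values())
    total = len(sq_length)
    scores = {}
    for term, freq in counts.items():
        if term not in index:
            continue  # terms missing from the index contribute nothing
        q_weight = (freq / maxtf) * math.log10(total / len(index[term]))
        for name, d_weight in index[term].items():
            scores[name] = scores.get(name, 0.0) + q_weight * d_weight
    for name in scores:
        if sq_length[name] > 0:
            scores[name] /= math.sqrt(sq_length[name])
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

index, sq_length = build_index(docs)
results = search('chicken price news', index, sq_length)
for name, score in results:
    print('{}[{:.5f}]'.format(name, score))
```

In this toy corpus b.txt comes out on top: it shares two query terms and has the shortest document vector, so the length normalization favors it over the two longer documents.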

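## Wrapping Up

This completes a simple Information Retrieval program. What we built really is a "simple" program: it works on a small collection of documents, so its accuracy is relatively low. Still, we close this chapter by noting that existing retrieval systems are also built on the same techniques.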