In [8]:
import markovify
import pandas as pd

# Load the text to imitate
df = pd.read_csv("./datasets/airport_reviews.csv")
#print(df.head())

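# Join the first N reviews into one large string, then build a Markov chain model from the airport reviews.
# Note: "".join adds no separator, so the end of one review runs directly into the start of the next.
N = 100
review_subset = df['content'][0:N]
text = "".join(review_subset)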
markov_chain_model = markovify.Text(text)

# Generate five sentences with the Markov chain model (make_sentence returns None if no valid sentence is found).
for i in range(5):
    print(markov_chain_model.make_sentence())
    print()

# Generate five sentences of at most 140 characters each with the Markov chain model.
for i in range(5):
    print(markov_chain_model.make_short_sentence(140))
    print()

After my laptop was scanned, I was however required to take off shoes since they have a long underground tunnel with lots of escalators.

Security check takes forever and is quite comfortable for a very short period.

Security was so rude but as I was collected by car.

The ticket barcode access scanner for fast track security and off to the airport makes you go up and down on escalators and long distances so enough time is needed even if you are through security.

No lines at passport control areas are old overcrowded messy and do not speak English still not used to like travelling through Brussels take it.Flew from the airport to Leuven as well as traffic can be long here at busy times with only some of the critical reviews on this site and while Brussels airport a lot of shopping and eating options lounges are well signposted.

Good selection of food excellent.

Had to rush to my gate to immigration counters in concourse B with onward flight to Geneva from T-1 Brussels Airlines.

Fi
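
The fused `take it.Flew` in the output above is the kind of artifact that joining reviews with an empty string tends to produce: the last sentence of one review runs straight into the first sentence of the next, so a sentence splitter that relies on whitespace after the period cannot see the boundary. Below is a minimal sketch of one alternative, joining with a space. The `reviews` list is invented illustration data standing in for `df['content'][0:N]`, not text from the dataset.

```python
# Hypothetical stand-in for df['content'][0:N]; these strings are invented.
reviews = [
    "Security was slow. Take the train if you can.",
    "Flew from gate B. Good selection of food.",
]

fused = "".join(reviews)    # no separator: "...you can.Flew from..."
spaced = " ".join(reviews)  # one space keeps the sentence boundary visible

print(fused)
print(spaced)
```

Whether the extra space changes the generated sentences noticeably depends on the corpus, but it keeps review boundaries from turning into run-on sentences.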

- Snippet of the `__init__` method of the `markovify.Text` class
```python
def __init__(
        self,
        input_text,
        state_size=2,
        chain=None,
        parsed_sentences=None,
        retain_original=True,
        well_formed=True,
        reject_reg="",
    ):   
```

- Parameter descriptions
    - input_text: A string.
    - state_size: An integer, indicating the number of words in the model's state.
    - chain: A trained markovify.Chain instance for this text, if pre-processed.
    - parsed_sentences: A list of lists, where each outer list is a "run" of the process (e.g. a single sentence), and each inner list contains the steps (e.g. words) in the run. If you want to simulate an infinite process, you can come very close by passing just one, very long run.
    - retain_original: Indicates whether to keep the original corpus.
    - well_formed: Indicates whether sentences should be well-formed, preventing unmatched quotes, parenthesis by default, or a custom regular expression can be provided.
    - reject_reg: If well_formed is True, this can be provided to override the standard rejection pattern.

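- `state_size = 2` means each state of the Markov chain is a pair of consecutive words, and a transition moves from that pair to the next word.
    - From the training text, the model can estimate the probability of each word that may follow a given state (the transition probability).
    - Increasing this value makes generated sentences follow the source text more closely and so read more realistically, though it generally needs more training text to work well.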
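
To make the idea of a two-word state concrete, here is a minimal pure-Python sketch that counts which word follows each pair of consecutive words and turns the counts into transition probabilities. It is not markovify's implementation: the helper names `build_transitions` and `transition_probabilities` and the toy text are assumptions made for illustration, and the sketch ignores sentence boundaries.

```python
from collections import Counter, defaultdict


def build_transitions(text, state_size=2):
    """Count which word follows each run of `state_size` consecutive words."""
    words = text.split()
    counts = defaultdict(Counter)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        counts[state][words[i + state_size]] += 1
    return counts


def transition_probabilities(counts, state):
    """Normalize the follower counts of one state into probabilities."""
    followers = counts[state]
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()}


# Invented toy corpus, not taken from the airport reviews.
toy_text = "the gate was far the gate was closed the gate was far"
counts = build_transitions(toy_text, state_size=2)
probs = transition_probabilities(counts, ("gate", "was"))
print(probs)
```

In the toy text the state `("gate", "was")` occurs three times and is followed by `far` twice and `closed` once, so `far` gets probability 2/3. With a larger `state_size`, each state is more specific and has fewer observed followers, which is why longer states track the source more closely but need more text.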