In [7]:
import markovify
import pandas as pd

# Load the text to imitate
df = pd.read_csv("./datasets/airport_reviews.csv")
#print(df.head())

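# Join the individual reviews into one large string, then build a Markov chain
# model from the airport reviews. The reviews are joined with a space so that
# the last word of one review does not fuse with the first word of the next.
N = 100
review_subset = df["content"][0:N]
text = " ".join(review_subset.dropna())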
markov_chain_model = markovify.Text(text)

# Generate five sentences with the Markov chain model.
for i in range(5):
    print(markov_chain_model.make_sentence())
    print()

# Generate five sentences of at most 140 characters each with the Markov chain model.
for i in range(5):
    print(markov_chain_model.make_short_sentence(140))
    print()

In the new terminal over a gangway.

Luggage delivery is unorganized and usually directly from the airport maximises transfer times due to heightened measures ahead of the airport.

A cheap alternative to a single stinking toilet.

Short lines at passport control + security resulted in my working life.

The bus parks ± 50 metres from the UK on a connecting bus to drive a further few kms to get through the formalities the shops or catering so can't comment there.

Overall nothing is horrible but it's a clean and friendly.

Clean airport with modern facilities a good 200m from the time you get ripped off.

Whereas departures may go smoothly arrivals are a crude rip off.

It is a well-organised easy to navigate around the world I am always amazed about the poor quality of Brussels Zaventem airport.

Baggage drop off was very efficient my luggage was on the 4th of August for my trip to Bangkok via Istanbul.



- Code snippet of the `__init__` method of the `Text` class
```python
def __init__(
        self,
        input_text,
        state_size=2,
        chain=None,
        parsed_sentences=None,
        retain_original=True,
        well_formed=True,
        reject_reg="",
    ):
        ...  # body omitted in this excerpt
```

- Parameter descriptions
    - input_text: A string.
    - state_size: An integer, indicating the number of words in the model's state.
    - chain: A trained markovify.Chain instance for this text, if pre-processed.
    - parsed_sentences: A list of lists, where each outer list is a "run" of the process (e.g. a single sentence), and each inner list contains the steps (e.g. words) in the run. If you want to simulate an infinite process, you can come very close by passing just one, very long run.
    - retain_original: Indicates whether to keep the original corpus.
    - well_formed: Indicates whether sentences should be well-formed, preventing unmatched quotes, parenthesis by default, or a custom regular expression can be provided.
    - reject_reg: If well_formed is True, this can be provided to override the standard rejection pattern.

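- `state_size = 2` means that each state of the Markov chain is a pair of consecutive words, so transitions go from one word pair to the next.
    - From the training text, the probability of each word following a given state (the transition probability) can be computed.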
    - Increasing this value makes it possible to generate sentences that imitate the source text more realistically.
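
To make the role of `state_size` concrete, here is a minimal pure-Python sketch of a word-level Markov chain. It is not markovify's implementation. The helper names (`build_chain`, `transition_probs`, `generate`) and the toy corpus are made up for illustration. Each state is a tuple of `state_size` consecutive words, and the transition probabilities are estimated by counting which word follows each state.

```python
import random
from collections import Counter, defaultdict


def build_chain(words, state_size=2):
    """Count how often each word follows each state (hypothetical helper).

    A state is a tuple of `state_size` consecutive words.
    """
    counts = defaultdict(Counter)
    for i in range(len(words) - state_size):
        state = tuple(words[i:i + state_size])
        counts[state][words[i + state_size]] += 1
    return counts


def transition_probs(counts, state):
    """Turn the follower counts of one state into probabilities."""
    followers = counts.get(state, {})
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()} if total else {}


def generate(counts, start, length, seed=0):
    """Extend `start` by up to `length` words, sampling by transition counts."""
    rng = random.Random(seed)
    out = list(start)
    state_size = len(start)
    for _ in range(length):
        followers = counts.get(tuple(out[-state_size:]))
        if not followers:  # state never seen in training: stop early
            break
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)


# Toy corpus (made up for illustration, not taken from the review dataset).
corpus = ("the airport is clean and the airport is small "
          "and the staff is friendly").split()

model = build_chain(corpus, state_size=2)
probs = transition_probs(model, ("airport", "is"))
print(probs)
print(generate(model, ("the", "airport"), length=3))
```

With `state_size=2`, the state `("airport", "is")` is followed by only "clean" or "small". With `state_size=1`, the state `("is",)` would also allow "friendly". Longer states therefore stay closer to the phrasing of the training text, but they need more data because each state is seen less often.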