### Web Crawling of Wikipedia

In [1]:
import requests
from bs4 import BeautifulSoup
print ("Ready.")

Ready.


### Define `entity` for simple search

In [2]:
entity = 'President of South Korea'
print ("entity:[%s]"%(entity))

entity:[President of South Korea]


### Now crawl

In [3]:
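entity_ = entity.replace(" ", "+")
search_url = f"https://en.wikipedia.org/w/index.php?search={entity_}"
response_text = requests.get(search_url).text
soup = BeautifulSoup(response_text, features="html.parser")
# An exact title match usually lands directly on the article; otherwise
# Wikipedia returns a search-results page containing these heading divs.
result_divs = soup.find_all("div", {"class": "mw-search-result-heading"})
if result_divs:  # search-results page, i.e., no exact article match
    print ("entity:[%s] mismatched."%(entity))
    page = []  # keep `page` defined so that later cells do not raise NameError
else:
    # Collect the text of every paragraph and list in the article
    page = [p.get_text().strip() for p in soup.find_all("p") + soup.find_all("ul")]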
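
Replacing spaces with `+` is enough for plain queries like the one above, but an entity containing reserved characters such as `&` or `?` would need percent-encoding. A minimal sketch using only the standard library; `build_search_url` is a hypothetical helper name, not something defined elsewhere in this notebook:

```python
from urllib.parse import quote_plus

def build_search_url(entity):
    """Return the Wikipedia search URL for `entity` (hypothetical helper)."""
    # quote_plus turns spaces into '+' and percent-encodes reserved characters
    return "https://en.wikipedia.org/w/index.php?search=" + quote_plus(entity)

print(build_search_url("President of South Korea"))
# https://en.wikipedia.org/w/index.php?search=President+of+South+Korea
```

For the entity used here the result is identical to the hand-built `search_url`.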

In [4]:
page

['',
 '',
 '',
 '',
 'The president of the Republic of Korea (Korean:\xa0대한민국 대통령; RR:\xa0Daehanmin-guk daetongnyeong), also known as the president of South Korea (Korean:\xa0대통령), is the head of state and head of government of the Republic of Korea. The president leads the State Council, and is the chief of the executive branch of the national government as well as the commander-in-chief of the Republic of Korea Armed Forces.',
 'The Constitution and the amended Presidential Election Act of 1987 provide for election of the president by direct, secret ballot, ending sixteen years of indirect presidential elections under the preceding two authoritarian governments. The president is directly elected to a five-year term, with no possibility of re-election.[2] If a presidential vacancy should occur, a successor must be elected within sixty days, during which time presidential duties are to be performed by the prime minister or other senior cabinet members in the order of priority as determ

### Clean `page` to `paragraphs`

In [5]:
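def clean_str(p):
    # Interpret literal backslash escape sequences while keeping non-ASCII
    # text intact; fall back to the raw string if an escape is malformed.
    try:
        return p.encode().decode("unicode-escape").encode("latin1").decode("utf-8")
    except (UnicodeDecodeError, UnicodeEncodeError):
        return p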
page_clean = ""
for p in page:
    page_clean += clean_str(p)
    if not p.endswith('\n'):
        page_clean += '\n'
paragraphs = page_clean.split("\n")
paragraphs = [p.strip() for p in paragraphs if p.strip()]
print ("We have [%d] paragraphs."%(len(paragraphs)))

We have [358] paragraphs.
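
The splitting step can also be written as a small function, which makes it easy to try on toy input. This is a sketch under the assumption that only the newline splitting and empty-line filtering matter; it leaves out the `clean_str` escape handling, and `split_paragraphs` is a hypothetical name:

```python
def split_paragraphs(page):
    """Flatten scraped text blocks into non-empty, stripped lines (hypothetical helper)."""
    paragraphs = []
    for block in page:
        # A <ul> block yields one line per list item, so split every block on newlines
        for line in block.split("\n"):
            line = line.strip()
            if line:
                paragraphs.append(line)
    return paragraphs

demo_page = ["", "First paragraph.", "item one\nitem two\n", "  "]
print(split_paragraphs(demo_page))
# ['First paragraph.', 'item one', 'item two']
```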


In [6]:
paragraphs

['The president of the Republic of Korea (Korean:\xa0대한민국 대통령; RR:\xa0Daehanmin-guk daetongnyeong), also known as the president of South Korea (Korean:\xa0대통령), is the head of state and head of government of the Republic of Korea. The president leads the State Council, and is the chief of the executive branch of the national government as well as the commander-in-chief of the Republic of Korea Armed Forces.',
 'The Constitution and the amended Presidential Election Act of 1987 provide for election of the president by direct, secret ballot, ending sixteen years of indirect presidential elections under the preceding two authoritarian governments. The president is directly elected to a five-year term, with no possibility of re-election.[2] If a presidential vacancy should occur, a successor must be elected within sixty days, during which time presidential duties are to be performed by the prime minister or other senior cabinet members in the order of priority as determined by law. The pre

In [7]:
min_char_len = 100 # minimum number of characters in a paragraph
first_k      = 5   # we want 'first_k' paragraphs to be included
top_m_excluding_first_k = 3 # we also want the 'top_m' longest paragraphs, excluding the 'first_k', for a total of 'k+m'
paragraphs_filtered = [p for p in paragraphs if len(p) >= min_char_len]
paragraphs_first_k  = paragraphs_filtered[:first_k]
paragraphs_remain   = paragraphs_filtered[first_k:]
paragraphs_sorted   = sorted(paragraphs_remain,key=len,reverse=True)
paragraphs_top_m    = paragraphs_sorted[:top_m_excluding_first_k]
paragraphs_return   = paragraphs_first_k + paragraphs_top_m
print ("Now we have [%d] filtered paragraphs and [%d] returning paragraphs where k:[%d] and m:[%d]"%
       (len(paragraphs_filtered),len(paragraphs_return),first_k,top_m_excluding_first_k))

Now we have [31] filtered paragraphs and [8] returning paragraphs where k:[5] and m:[3]
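
The selection rule (keep the first `k` sufficiently long paragraphs, then add the `m` longest of the rest) can be checked on toy data. The following is a sketch of the same logic wrapped in a function; `select_paragraphs` and its parameter names are hypothetical:

```python
def select_paragraphs(paragraphs, min_char_len=100, first_k=5, top_m=3):
    """Return the first `first_k` long-enough paragraphs plus the `top_m` longest of the rest."""
    filtered = [p for p in paragraphs if len(p) >= min_char_len]
    head = filtered[:first_k]
    # sorted() is stable, so equally long paragraphs keep their original order
    longest_rest = sorted(filtered[first_k:], key=len, reverse=True)[:top_m]
    return head + longest_rest

demo = ["a" * n for n in (3, 12, 10, 30, 20, 25)]
picked = select_paragraphs(demo, min_char_len=10, first_k=2, top_m=2)
print([len(p) for p in picked])
# [12, 10, 30, 25]
```

Note that the first `k` paragraphs stay in document order, while the remaining `m` are ordered by length.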


In [8]:
for p_idx,p in enumerate(paragraphs_return):
    print ("[%d/%d]\n%s"%(p_idx,len(paragraphs_return),p))

[0/8]
The president of the Republic of Korea (Korean: 대한민국 대통령; RR: Daehanmin-guk daetongnyeong), also known as the president of South Korea (Korean: 대통령), is the head of state and head of government of the Republic of Korea. The president leads the State Council, and is the chief of the executive branch of the national government as well as the commander-in-chief of the Republic of Korea Armed Forces.
[1/8]
The Constitution and the amended Presidential Election Act of 1987 provide for election of the president by direct, secret ballot, ending sixteen years of indirect presidential elections under the preceding two authoritarian governments. The president is directly elected to a five-year term, with no possibility of re-election.[2] If a presidential vacancy should occur, a successor must be elected within sixty days, during which time presidential duties are to be performed by the prime minister or other senior cabinet members in the order of priority as determined by law. The presid
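
The selected paragraphs still carry Wikipedia's numeric citation markers such as `[2]`. If these are unwanted in later processing, a regular expression can drop them. This is an optional sketch, not a step the notebook performs; `strip_citations` is a hypothetical helper:

```python
import re

def strip_citations(text):
    """Remove bracketed numeric citation markers such as [2] or [3][4]."""
    return re.sub(r"\[\d+\]", "", text)

print(strip_citations("assumed office on 10 May 2022,[3][4] after the election.[5]"))
# assumed office on 10 May 2022, after the election.
```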

### Define a chat wrapper class for `GPT`

In [9]:
import openai
from IPython.display import Markdown,display
def printmd(string):
    display(Markdown(string))
class GPTchatClass():
    def __init__(self,
                 gpt_model = 'gpt-4',
                 role_msg  = 'You are a helpful assistant.',
                 VERBOSE   = True
                ):
        self.gpt_model     = gpt_model
        self.messages      = [{'role':'system','content':f'{role_msg}'}]
        self.init_messages = [{'role':'system','content':f'{role_msg}'}]
        self.VERBOSE       = VERBOSE
        self.response      = None
        if self.VERBOSE:
            print ("Chat agent using [%s] initialized with the follow role:[%s]"%
                   (self.gpt_model,role_msg))
    
    def _add_message(self,role='assistant',content=''):
        """
            role: 'assistant' / 'user'
        """
        self.messages.append({'role':role, 'content':content})
        
    def _get_response_content(self):
        if self.response:
            return self.response['choices'][0]['message']['content']
        else:
            return None
        
    def _get_response_status(self):
        if self.response:
            # 'finish_reason' is a field of the choice itself, not of its message
            return self.response['choices'][0]['finish_reason']
        else:
            return None
    
    def chat(self,user_msg='hi',
             PRINT_USER_MSG=True,PRINT_GPT_OUTPUT=True,
             RESET_CHAT=False,RETURN_RESPONSE=True):
        self._add_message(role='user',content=user_msg)
        self.response = openai.ChatCompletion.create(
            model    = self.gpt_model,
            messages = self.messages
        )
        # Back up the response for continuous chatting
        self._add_message(role='assistant',content=self._get_response_content())
        if PRINT_USER_MSG:
            print("[USER_MSG]")
            printmd(user_msg)
        if PRINT_GPT_OUTPUT:
            print("[GPT_OUTPUT]")
            printmd(self._get_response_content())
        # Reset (copy, so that later appends do not modify init_messages)
        if RESET_CHAT:
            self.messages = [dict(m) for m in self.init_messages]
        # Return
        if RETURN_RESPONSE:
            return self._get_response_content()
print ("Ready.")

Ready.
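
With `RESET_CHAT=True`, each call is meant to start again from the system message alone. That only holds if the working message list never aliases the stored initial list; otherwise every later append also grows the "initial" history. A minimal offline sketch of the bookkeeping, with no API calls; `ChatHistory` is a hypothetical stand-in, not the class above:

```python
class ChatHistory:
    """Hypothetical stand-in for the message bookkeeping of a chat wrapper."""
    def __init__(self, role_msg):
        self.init_messages = [{'role': 'system', 'content': role_msg}]
        self.messages = [dict(m) for m in self.init_messages]

    def add(self, role, content):
        self.messages.append({'role': role, 'content': content})

    def reset(self):
        # Copy instead of assigning, so later appends cannot grow init_messages
        self.messages = [dict(m) for m in self.init_messages]

history = ChatHistory('You are a helpful assistant.')
for question in ['first question', 'second question']:
    history.add('user', question)
    history.reset()
print(len(history.messages), len(history.init_messages))
# 1 1
```

Had `reset` simply assigned `self.messages = self.init_messages`, both lengths would grow with every question after the first reset.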


In [10]:
key_path = '../key/rilab_key.txt'
with open(key_path, 'r') as f: OPENAI_API_KEY = f.read().strip() # drop any trailing newline
openai.api_key = OPENAI_API_KEY
GPT = GPTchatClass(gpt_model='gpt-4',role_msg  = 'You are a helpful assistant.')

Chat agent using [gpt-4] initialized with the follow role:[You are a helpful assistant.]


### Summarize each paragraph into one sentence using `GPT`

In [11]:
for p_idx,p in enumerate(paragraphs_return):
    print ("[%d/%d]\n%s"%(p_idx,len(paragraphs_return),p))
    user_msg = "Could you summarize the following paragraph into one sentence? \n "+p
    response_content = GPT.chat(
        user_msg=user_msg,PRINT_USER_MSG=False,PRINT_GPT_OUTPUT=False,
        RESET_CHAT=True,RETURN_RESPONSE=True)
    # Print summarized sentence with a markdown format
    printmd(response_content)

[0/8]
The president of the Republic of Korea (Korean: 대한민국 대통령; RR: Daehanmin-guk daetongnyeong), also known as the president of South Korea (Korean: 대통령), is the head of state and head of government of the Republic of Korea. The president leads the State Council, and is the chief of the executive branch of the national government as well as the commander-in-chief of the Republic of Korea Armed Forces.


The president of the Republic of Korea, also known as the president of South Korea, is the head of state and government, leading the State Council, overseeing the executive branch of the national government, and also functioning as the commander-in-chief of the Republic of Korea Armed Forces.

[1/8]
The Constitution and the amended Presidential Election Act of 1987 provide for election of the president by direct, secret ballot, ending sixteen years of indirect presidential elections under the preceding two authoritarian governments. The president is directly elected to a five-year term, with no possibility of re-election.[2] If a presidential vacancy should occur, a successor must be elected within sixty days, during which time presidential duties are to be performed by the prime minister or other senior cabinet members in the order of priority as determined by law. The president is exempt from criminal liability (except for insurrection or treason).


The Constitution and the amended Presidential Election Act of 1987 necessitate a direct, secret ballot election for the president who serves a non-renewable five-year term, with succession rules in case of a vacancy, and the president is immune to criminal charges unless for insurrection or treason.

[2/8]
The current president, Yoon Suk Yeol, a former prosecutor general and member of the conservative People Power Party, assumed office on 10 May 2022,[3][4] after defeating the Democratic Party's nominee Lee Jae-myung with a narrow 48.5% plurality in the 2022 South Korean presidential election.[5]


The current South Korean president, Yoon Suk Yeol, a conservative People Power Party member and former prosecutor general, took office on May 10, 2022, defeating Democratic Party's nominee Lee Jae-myung in a narrow 48.5% plurality in the 2022 presidential election.


[3/8]
Prior to the establishment of the First Republic in 1948, the Provisional Government of the Republic of Korea established in Shanghai in September 1919 as the continuation of several governments proclaimed in the aftermath of March 1st Movement earlier that year coordinated Korean people's resistance against the Japanese occupation. The legitimacy of the Provisional Government has been recognized and succeeded by South Korea in the latter's original Constitution of 1948 and the current Constitution of 1988.


Before the First Republic in 1948, the Provisional Government of the Republic of Korea, set up in Shanghai in September 1919 following the March 1st Movement, led resistance against Japanese occupation, a government whose legitimacy was later recognized and succeeded by South Korea's original Constitution in 1948 and current Constitution in 1988.

[4/8]
The presidential term has been set at five years since 1988. It was previously set at four years from 1948 to 1972, six years from 1972 to 1981, and seven years from 1981 to 1988. Since 1981, the president has been barred from re-election.


Since 1988, the presidential term has been set at five years, following varied term lengths from 1948 to 1988, and re-election has been prohibited for the president since 1981.


[5/8]
The 1987 Constitution removed the 1980 Constitution's explicit provisions that empowered the government to temporarily suspend the freedoms and rights of the people. However, the president is permitted to take other measures that could amend or abolish existing laws for the duration of a crisis. It is unclear whether such emergency measures could temporarily suspend portions of the Constitution itself. Emergency measures must be referred to the National Assembly for concurrence. If not endorsed by the assembly, the emergency measures can be revoked; any laws that had been overridden by presidential order regain their original effect. In this respect, the power of the legislature is more vigorously asserted than in cases of ratification of treaties or declarations of war, in which the Constitution simply states that the National Assembly "has the right to consent" to the president's actions. In a change from the 1980 Constitution, the 1987 Constitution stated that the president is

The 1987 Constitution removed provisions that allowed the government to suspend people's freedoms and rights, instead permitting the president to amend or abolish laws during a crisis, although it remains unclear if these powers extend to suspending parts of the Constitution itself; these changes must be approved by the National Assembly to remain in effect, with general legislative power getting emphasized more strongly than in treaty ratifications or declarations of war, meanwhile, unlike the 1980 Constitution, the 1987 version doesn't permit the president to dissolve the National Assembly.


[6/8]
These constitutional organs included the National Security Council, which provided advice concerning the foreign, military, and domestic policies bearing on national security. Chaired by the president, the council in 1990 had as its statutory members the prime minister, the deputy prime minister, the ministers for foreign affairs, home affairs, finance, and national defense, the director of the Agency for National Security Planning (ANSP) which was known as the Korean Central Intelligence Agency (KCIA) until December 1980, and others designated by the president. Another important body is the Peaceful Unification Advisory Council, inaugurated in June 1981 under the chairmanship of the president. From its inception, this body had no policy role, but rather appeared to serve as a government sounding board and as a means to disburse political rewards by providing large numbers of dignitaries and others with titles and opportunities to meet periodically with the president and other se

The National Security Council, which includes the president and key governmental members, advises on foreign, military, and domestic policies affecting national security, and the Peaceful Unification Advisory Council, established in June 1981 under the president's chairmanship, essentially functions as a sounding board for the government and an avenue to distribute political rewards, with no direct policy role.

[7/8]
One controversial constitutional organ was the Advisory Council of Elder Statesmen, which replaced a smaller body in February 1988, just before Roh Tae Woo was sworn in as president. This body was supposed to be chaired by the immediate former president; its expansion to eighty members, broadened functions, and elevation to cabinet rank made it appear to have been designed, as one Seoul newspaper said, to "preserve the status and position of a certain individual." The government announced plans to reduce the size and functions of this body immediately after Roh's inauguration. Public suspicions that the council might provide former President Chun with a power base within the Sixth Republic were rendered moot when Chun withdrew to an isolated Buddhist temple in self-imposed exile in November 1988.


The Advisory Council of Elder Statesmen, a controversial constitutional organ that was expanded in stature and function just before Roh Tae Woo's presidency, was suspected of being a power base for former President Chun, a concern that became insignificant when Chun chose self-imposed exile in a Buddhist temple in November 1988.