<a href="https://colab.research.google.com/github/suprech/class2025Spring/blob/main/nlp_class2025spring.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [82]:
!pip install selenium



In [83]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import re
import requests
from bs4 import BeautifulSoup

# Base URL for Google Patents
base_url = "https://patents.google.com/patent"

# Publication number (application number) of the patent to extract
search_publication_number = "US20250058223"
#search_publication_number = "US20220336530A1"

search_url = f"{base_url}/{search_publication_number}"
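
A small sketch of validating the publication number before building the URL. It assumes numbers shaped like the two used in this notebook (a two-letter country code, digits, and an optional kind code such as `A1`); the `build_patent_url` helper and the regular expression are illustrative assumptions, not part of the original notebook or of any Google Patents API.

```python
import re

# Assumed shape: 2-letter country code, digits, optional kind code (e.g. "A1").
PUBLICATION_RE = re.compile(r"^([A-Z]{2})(\d+)([A-Z]\d?)?$")


def build_patent_url(publication_number,
                     base_url="https://patents.google.com/patent"):
    """Return the page URL, raising ValueError for an unexpected format."""
    cleaned = publication_number.strip().upper()
    if not PUBLICATION_RE.match(cleaned):
        raise ValueError(
            f"Unexpected publication number: {publication_number!r}"
        )
    return f"{base_url}/{cleaned}"


print(build_patent_url("US20250058223"))
print(build_patent_url(" us20220336530a1 "))
```

Failing early on a malformed number avoids sending a request that can only return an error page.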

# Beautiful Soup setup

In [84]:
# Set the User-Agent header
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko)"
}

# Request the page
response = requests.get(search_url, headers=headers)
response.encoding = "utf-8"

# Parse the HTML
soup = BeautifulSoup(response.text, "html.parser")

# Selenium setup

In [85]:
# Configure headless mode
chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")

# Open the page with ChromeDriver
driver = webdriver.Chrome(options=chrome_options)
driver.get(search_url)

In [86]:
# Extract the total number of claims
claim_selector = (
    "#claims > h3 > "
    "div:nth-child(1) > "
    "div.flex.style-scope.patent-result > "
    "span"
)

claim_count = driver.find_element(By.CSS_SELECTOR, claim_selector)
claim_count_text = claim_count.text
claim_count_number = int(re.search(r'\d+', claim_count_text).group())
print(f"Number of claims: {claim_count_number}")

# Extract the total number of drawings
drawings_selector = "#thumbnails > h3 > span"
drawing_count = driver.find_element(By.CSS_SELECTOR, drawings_selector)
drawing_count_text = drawing_count.text
print(f"Number of drawings: {drawing_count_text}")

Number of claims: 19
Number of drawings: 22
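
The cell above calls `.group()` directly on the result of `re.search`, which raises `AttributeError` when the element text contains no digits. A minimal sketch of a more defensive parse follows; `parse_count` is a hypothetical helper, and the sample label strings are assumptions for illustration rather than text captured from the page.

```python
import re


def parse_count(text):
    """Return the first integer found in text, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None


# Sample strings are made up; the real element text may differ.
print(parse_count("Claims (19)"))
print(parse_count("no digits here"))
```

Returning `None` lets the caller decide whether a missing count should stop the run or be skipped.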


In [87]:
# Close the browser
driver.quit()

# Extracting each section of the patent specification (in order: title, abstract, claims, description)

In [88]:
# Extract the patent title
title = soup.find("span", itemprop="title")
print(title.text)

Computer-readable non-transitory storage medium having game program stored therein, game apparatus, game system, and game processing method 
     


In [89]:
# Extract the abstract from the patent specification
abstract = soup.find_all("div", class_="abstract")
print(abstract[0].text)

When a first condition related to a change in a position of a virtual microphone in a virtual space is satisfied, a residual virtual microphone is placed at a position of the virtual microphone before the change in the position, and residual virtual microphone acquisition sound data whose volume is set on the basis of a distance between the residual virtual microphone and a virtual sound source and current virtual microphone acquisition sound data whose volume is set on the basis of a distance between the virtual microphone after the change in the position and the virtual sound source are outputted to a speaker such that an output level of the residual virtual microphone acquisition sound data is gradually decreased and an output level of the current virtual microphone acquisition sound data is gradually increased.


In [90]:
# Extract the claims from the patent specification
claims = []

for i in range(1, claim_count_number+1):
    formatted = f"{i:05d}"
    claims.append(soup.select_one(f"#CLM-{formatted} > div").text)

for claim in claims:
    print(claim + "\n")

 1. A method for controlling a sound outputted from a speaker by game processing, the method comprising:
placing one or more virtual sound sources with each of which sound data is associated, in a virtual space; reproducing the sound data of any of the one or more virtual sound sources; moving a virtual microphone in the virtual space; determining that a first condition related to a change in a position of the virtual microphone in the virtual space is satisfied; and when the first condition related to the change in the position of the virtual microphone in the virtual space is satisfied,
placing a residual virtual microphone at a position corresponding to a position of the virtual microphone before the change in the position, and
crossfading the sound outputted from the speaker, from a second output sound based on sound data acquired virtually by the residual virtual microphone from one of the one or more virtual sound sources, to a first output sound based on sound data acquired virt
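
The claims loop assumes every `#CLM-00001`-style element exists; if `soup.select_one` returns `None` for any index, accessing `.text` fails. The sketch below shows one way to collect the texts while recording missing indices. `collect_claim_texts` is a hypothetical helper, and the dictionary is only a stand-in for `soup.select_one` so the example runs without a network request.

```python
from types import SimpleNamespace


def collect_claim_texts(select_one, count):
    """Collect claim texts for claims 1..count.

    select_one is any callable that maps a CSS selector to an object
    with a .text attribute, or to None (for example soup.select_one).
    Returns (texts, missing), where missing lists indices not found.
    """
    texts, missing = [], []
    for i in range(1, count + 1):
        node = select_one(f"#CLM-{i:05d} > div")
        if node is None:
            missing.append(i)
        else:
            texts.append(node.text.strip())
    return texts, missing


# Stand-in page with only claim 1 present (illustrative data).
fake_page = {
    "#CLM-00001 > div": SimpleNamespace(text=" 1. A method for controlling a sound "),
}
texts, missing = collect_claim_texts(fake_page.get, 2)
print(texts)
print(missing)
```

In the notebook this could be called as `collect_claim_texts(soup.select_one, claim_count_number)`.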

In [91]:
# Extract the description from the patent specification
description = soup.find("section", itemprop="description").get_text()
print(description)


Description

CROSS REFERENCE TO RELATED APPLICATION
   This application is a continuation of U.S. patent application Ser. No. 17/957,638 filed Sep. 30, 2022, which claims priority to Japanese Patent Application No. 2021-166400 filed on Oct. 8, 2021, the entire contents of which are incorporated herein by reference.
 FIELD
   The present disclosure relates to sound control processing of outputting a sound to a speaker.
 BACKGROUND AND SUMMARY
   Conventionally, a technology for controlling the sound volume, etc., of an output sound on the basis of the distance between a virtual sound source and a virtual microphone in a virtual space, is known.
    However, in the above technology, when the relative positional relationship between the virtual microphone and the virtual sound source is abruptly changed, there is a possibility that, for example, an abrupt change for sound output occurs, making a user feel uncomfortable.
    Therefore, an object of the present disclosure is to provide a c

# Splitting the description into n-character chunks

In [92]:
# Split the description into 5000-character chunks
chunk_size = 5000
description_chunks = []

for i in range(0, len(description), chunk_size):
    chunk = description[i:i + chunk_size]
    description_chunks.append(chunk)

# Inspect the resulting chunks
print(f"Split into {len(description_chunks)} chunks in total.")
for i, chunk in enumerate(description_chunks):
    print(f"Chunk {i+1} length: {len(chunk)} characters")
    print(chunk)

Split into 21 chunks in total.
Chunk 1 length: 5000 characters

Description

CROSS REFERENCE TO RELATED APPLICATION
   This application is a continuation of U.S. patent application Ser. No. 17/957,638 filed Sep. 30, 2022, which claims priority to Japanese Patent Application No. 2021-166400 filed on Oct. 8, 2021, the entire contents of which are incorporated herein by reference.
 FIELD
   The present disclosure relates to sound control processing of outputting a sound to a speaker.
 BACKGROUND AND SUMMARY
   Conventionally, a technology for controlling the sound volume, etc., of an output sound on the basis of the distance between a virtual sound source and a virtual microphone in a virtual space, is known.
    However, in the above technology, when the relative positional relationship between the virtual microphone and the virtual sound source is abruptly changed, there is a possibility that, for example, an abrupt change for sound output occurs, making a user feel uncomfortable.
    Therefore, an object of the p
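
Fixed-size slicing cuts sentences at arbitrary positions, so text near a boundary loses its context. One common variation, shown here as a sketch and not as the notebook's method, repeats the last few characters of each chunk at the start of the next. The `split_with_overlap` helper and the `overlap` value are assumptions for illustration.

```python
def split_with_overlap(text, chunk_size=5000, overlap=200):
    """Split text into chunks of at most chunk_size characters.

    Each chunk after the first starts `overlap` characters before the
    end of the previous one, so boundary text appears in both chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end, to avoid a redundant tail.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=5, overlap=2))
```

With `overlap=0` the function produces the same chunks as the plain slicing loop above.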