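# **1. Crawling**
Crawling is the process of automatically collecting data from web pages using a program or script called a web crawler (or scraper). Search engines use it mainly to explore websites and add their pages to an index, but it is also used to gather data on a specific topic for analysis. A crawler parses the page's HTML structure to extract the information it needs and saves the result in a usable data format. `requests` and BeautifulSoup, used below, only see the HTML the server returns; content that JavaScript builds in the browser is not executed. When crawling, follow the site's terms of service and the Robots Exclusion Standard (`robots.txt`) to avoid legal and ethical problems.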
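To illustrate the `robots.txt` check mentioned above, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are a made-up example parsed from a string so the check runs offline. Against a real site you would call `set_url(...)` and `read()` instead of `parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, parsed from a string so no network access is needed
rules = """
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) returns True if the rules allow fetching the URL
allowed = parser.can_fetch("*", "https://example.com/chart")
blocked = parser.can_fetch("*", "https://example.com/private/data")
print(allowed, blocked)  # True False
```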

# **2. Basic English Speaking**
- [Site](https://basicenglishspeaking.com/daily-english-conversation-topics/)

In [2]:
import requests
from bs4 import BeautifulSoup

In [4]:
site = 'https://basicenglishspeaking.com/daily-english-conversation-topics/'
request = requests.get(site)
print(request) # 200: OK, the request succeeded

<Response [200]>


In [5]:
request.text

'<!DOCTYPE html><html lang="en-US"><head><meta charset="UTF-8"/>\n<script>var __ezHttpConsent={setByCat:function(src,tagType,attributes,category,force,customSetScriptFn=null){var setScript=function(){if(force||window.ezTcfConsent[category]){if(typeof customSetScriptFn===\'function\'){customSetScriptFn();}else{var scriptElement=document.createElement(tagType);scriptElement.src=src;attributes.forEach(function(attr){for(var key in attr){if(attr.hasOwnProperty(key)){scriptElement.setAttribute(key,attr[key]);}}});var firstScript=document.getElementsByTagName(tagType)[0];firstScript.parentNode.insertBefore(scriptElement,firstScript);}}};if(force||(window.ezTcfConsent&&window.ezTcfConsent.loaded)){setScript();}else if(typeof getEzConsentData==="function"){getEzConsentData().then(function(ezTcfConsent){if(ezTcfConsent&&ezTcfConsent.loaded){setScript();}else{console.error("cannot get ez consent data");force=true;setScript();}});}else{force=true;setScript();console.error("getEzConsentData is not

In [6]:
soup = BeautifulSoup(request.text, "html.parser")

In [7]:
box_div = soup.find('div', {'class':'thrv-columns'})
box_div

<div class="thrv_wrapper thrv-columns" style="--tcb-col-el-width:792;"><div class="tcb-flex-row tcb--cols--3"><div class="tcb-flex-col"><div class="tcb-col"><div class="thrv_wrapper thrv_text_element"><p>1. <a class="tve-froala" href="https://basicenglishspeaking.com/family/" style="outline: none;">Family</a><br/>2. <a class="tve-froala" href="https://basicenglishspeaking.com/restaurant/" style="outline: none;">Restaurant</a><br/>3. <a href="https://basicenglishspeaking.com/books/">Books</a><br/>4. <a href="https://basicenglishspeaking.com/travel/">Travel</a><br/>5. <a href="https://basicenglishspeaking.com/website/">Website</a><br/>6. <a href="https://basicenglishspeaking.com/accident/">Accident</a><br/>7. <a class="tve-froala" href="https://basicenglishspeaking.com/childhood-memory/" style="outline: none;">Childhood memory</a><br/>8. <a class="tve-froala" href="https://basicenglishspeaking.com/favorite-rooms/" style="outline: none;">Favorite rooms</a><br/>9. <a href="https://basiceng

In [8]:
links = box_div.find_all('a')
links

[<a class="tve-froala" href="https://basicenglishspeaking.com/family/" style="outline: none;">Family</a>,
 <a class="tve-froala" href="https://basicenglishspeaking.com/restaurant/" style="outline: none;">Restaurant</a>,
 <a href="https://basicenglishspeaking.com/books/">Books</a>,
 <a href="https://basicenglishspeaking.com/travel/">Travel</a>,
 <a href="https://basicenglishspeaking.com/website/">Website</a>,
 <a href="https://basicenglishspeaking.com/accident/">Accident</a>,
 <a class="tve-froala" href="https://basicenglishspeaking.com/childhood-memory/" style="outline: none;">Childhood memory</a>,
 <a class="tve-froala" href="https://basicenglishspeaking.com/favorite-rooms/" style="outline: none;">Favorite rooms</a>,
 <a href="https://basicenglishspeaking.com/presents/">Presents</a>,
 <a class="tve-froala" href="https://basicenglishspeaking.com/historical-place/" style="outline: none;">Historical place</a>,
 <a class="tve-froala" href="https://basicenglishspeaking.com/newspaper-magazi
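Each anchor above carries both a topic name and a link to that topic's page. Here is a minimal sketch that collects `(text, href)` pairs with a list comprehension. It uses a small hand-written HTML snippet modeled on that output, not the live page. `a.get('href')` returns `None` instead of raising when an anchor has no `href`.

```python
from bs4 import BeautifulSoup

# Hand-written snippet modeled on the div.thrv-columns output above
html = """
<div class="thrv_wrapper thrv-columns">
  <p>1. <a href="https://basicenglishspeaking.com/family/">Family</a><br/>
     2. <a href="https://basicenglishspeaking.com/restaurant/">Restaurant</a></p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
box_div = soup.find("div", {"class": "thrv-columns"})

# Collect (topic name, URL) pairs from every <a> in the box
topics = [(a.text, a.get("href")) for a in box_div.find_all("a")]
for name, url in topics:
    print(name, url)
```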

In [9]:
links[0].text

'Family'

In [10]:
for link in links:
    print(link.text)

Family
Restaurant
Books
Travel
Website
Accident
Childhood memory
Favorite rooms
Presents
Historical place
Newspaper/ Magazine
A memorable event
A favorite subject
A museum
A favorite movie
A foreign country
Parties
A teacher
A friend
A hotel
A letter
Hobbies
Music
Shopping
Holiday
Animals
A practical skill
Sport
A School
Festival
Food
Household appliance
A music band
Weather
Neighbor
Natural scenery
Outdoor activities
Law
Pollution
Traffic jam
TV program
Architect/ Building
Electronic Media
Job/ Career
Competition/ contest
A garden
Hometown
Clothing
Advertisement
A project
A wedding
A Coffee shop
Culture
Transport
Politician
Communication
Business
Computer
Exercise
Goal/ ambition
Art
Fashion
Jewelry
Cosmetic
Indoor Game
Phone conversation
Learning A Second language
A Creative Person
A celebrity
A Health Problem
Technological advancements
A Landmark
Handcraft Items
Plastic Surgery
Success


In [11]:
subject = []
for link in links:
    subject.append(link.text)

In [12]:
print(len(subject))

75


In [13]:
print(f'Found {len(subject)} topics in total.')
for i, topic in enumerate(subject, 1):
    print(f'{i}, {topic}')

Found 75 topics in total.
1, Family
2, Restaurant
3, Books
4, Travel
5, Website
6, Accident
7, Childhood memory
8, Favorite rooms
9, Presents
10, Historical place
11, Newspaper/ Magazine
12, A memorable event
13, A favorite subject
14, A museum
15, A favorite movie
16, A foreign country
17, Parties
18, A teacher
19, A friend
20, A hotel
21, A letter
22, Hobbies
23, Music
24, Shopping
25, Holiday
26, Animals
27, A practical skill
28, Sport
29, A School
30, Festival
31, Food
32, Household appliance
33, A music band
34, Weather
35, Neighbor
36, Natural scenery
37, Outdoor activities
38, Law
39, Pollution
40, Traffic jam
41, TV program
42, Architect/ Building
43, Electronic Media
44, Job/ Career
45, Competition/ contest
46, A garden
47, Hometown
48, Clothing
49, Advertisement
50, A project
51, A wedding
52, A Coffee shop
53, Culture
54, Transport
55, Politician
56, Communication
57, Business
58, Computer
59, Exercise
60, Goal/ ambition
61, Art
62, Fashion
63, Jewelry
64, Cosmetic
65, Indoor Game
6

# **3. Crawling Daum News Headlines**

- https://v.daum.net/v/20250715064132016

- https://v.daum.net/v/20250715062440818

- https://v.daum.net/v/20250715064141021

In [14]:
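def daum_news_title(news_id):
    # Build the article URL from its ID and fetch the page
    url = f'https://v.daum.net/v/{news_id}'
    request = requests.get(url)
    soup = BeautifulSoup(request.text, 'html.parser')
    # The headline sits in <h3 class="tit_view">
    title = soup.find('h3', {'class':'tit_view'})
    if title:
        return title.text.strip()
    return 'No title'  # fallback when the headline element is missing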
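One possible refinement, offered as a suggestion rather than the notebook's method, is to separate parsing from fetching. The `h3.tit_view` lookup can then be checked offline. `extract_title` is a hypothetical helper and the HTML strings are made up. Only the `tit_view` class comes from the function above.

```python
from bs4 import BeautifulSoup

def extract_title(html, default='No title'):
    """Return the stripped text of <h3 class="tit_view">, or `default` if absent."""
    soup = BeautifulSoup(html, 'html.parser')
    title = soup.find('h3', {'class': 'tit_view'})
    return title.text.strip() if title else default

# Made-up page fragments for an offline check
sample = '<div><h3 class="tit_view">  Sample headline  </h3></div>'
print(extract_title(sample))                 # Sample headline
print(extract_title('<div>no title</div>'))  # No title
```

With this split, the fetching function only needs to return `extract_title(requests.get(url).text)`.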

In [15]:
daum_news_title('20250715064132016')

'키움, 후반기 앞두고 초강수…홍원기 감독·고형욱 단장·김창현 수석코치과 결별'

In [16]:
daum_news_title('20250715062440818')

'이승엽 양아들? 작년까지 SNS로 욕만 먹었는데.. → 그의 \'각성\'은 우연이 아니다. "아직 성에 덜 차시겠지만"'

In [17]:
daum_news_title('20250715100936185')

'“뜨거워도 너무 뜨겁다”…국내 가상자산 거래소 하루 거래량 폭증[투자360]'

# **4. Bugs Music Chart**
- [Site](https://music.bugs.co.kr/chart)

In [18]:
request = requests.get('https://music.bugs.co.kr/chart')
print(request)

<Response [200]>


In [23]:
soup = BeautifulSoup(request.text, 'html.parser')

titles = soup.find_all('p', {'class':'title'})
# print(titles)
artists = soup.find_all('p', {'class':'artist'})
# print(artists)

In [25]:
# Print each entry as "rank. title - artist"
for i, (t, a) in enumerate(zip(titles, artists), 1):
    title = t.text.strip()
    artist = a.text.strip().split('\n')[0]  # keep only the first line of the artist text
    print(f'{i}. {title} - {artist}')

1. Golden - HUNTR/X
2. 뛰어(JUMP) - BLACKPINK
3. Soda Pop - Saja Boys
4. FAMOUS - ALLDAY PROJECT
5. Dirty Work - aespa
6. Your Idol - Saja Boys
7. WICKED - ALLDAY PROJECT
8. Drowning - WOODZ
9. How It’s Done - HUNTR/X
10. Pookie - FIFTY FIFTY
11. 눈물참기 - QWER
12. 너에게 닿기를 - 10CM
13. like JENNIE - 제니 (JENNIE)
14. 여름이었다 - H1-KEY (하이키)
15. 시작의 아이 - 마크툽(MAKTUB)
16. 빌려온 고양이 (Do the Dance) - 아일릿(ILLIT)
17. Whiplash - aespa
18. LIKE YOU BETTER - 프로미스나인
19. 모르시나요(PROD.로코베리) - 조째즈
20. Takedown - HUNTR/X
21. STYLE - Hearts2Hearts (하츠투하츠)
22. Never Ending Story - 아이유(IU)
23. Free - Rumi
24. HANDS UP - MEOVV (미야오)
25. CHILLER - NCT DREAM
26. toxic till the end - 로제(ROSÉ)
27. BTTF - NCT DREAM
28. REBEL HEART - IVE (아이브)
29. 나는 반딧불 - 황가람
30. 청춘만화 - 이무진
31. 네모의 꿈 - 아이유(IU)
32. TOO BAD (feat. Anderson .Paak) - G-DRAGON
33. HAPPY - DAY6 (데이식스)
34. HOT - LE SSERAFIM (르세라핌)
35. APT. - 로제(ROSÉ)
36. 한 페이지가 될 수 있게 - DAY6 (데이식스)
37. THIS IS FOR - TWICE (트와이스)
38. Maybe Tomorrow - DAY6 (데이식스)
39. 고민중독 - QWER
40. 

# **5. Melon Chart**
- [Site](https://www.melon.com/chart/index.htm)

In [26]:
request = requests.get('https://www.melon.com/chart/index.htm')
request

<Response [406]>
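To interpret status codes like the 406 above, you can use the standard library's `http.HTTPStatus`, which maps each code to its reason phrase. This is only a lookup table and says nothing about why the server chose 406. On the `requests` side, `Response.ok` is `False` for 4xx and 5xx codes, and `Response.raise_for_status()` raises `requests.HTTPError` for them.

```python
from http import HTTPStatus

# Look up the standard reason phrase for each status code seen in this notebook
for code in (200, 406):
    status = HTTPStatus(code)
    print(code, status.phrase)
# 200 OK
# 406 Not Acceptable
```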

In [27]:
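# The default requests User-Agent got 406 Not Acceptable, so send a browser-like User-Agent header.
# Full browser string for reference: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/138.0.0.0 Safari/537.36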
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}
request = requests.get('https://www.melon.com/chart/index.htm', headers=headers)
request

<Response [200]>

In [30]:
soup = BeautifulSoup(request.text, 'html.parser')
titles = soup.find_all('div', {'class':'rank01'})
artists = soup.find_all('span', {'class':'checkEllipsis'})

In [32]:
for i, (t, a) in enumerate(zip(titles, artists), 1):
    title = t.text.strip()
    artist = a.text.strip()
    print(f'{i}. {title} - {artist}')

1. Golden - HUNTR/X, EJAE, AUDREY NUNA, REI AMI, KPop Demon Hunters Cast
2. FAMOUS - ALLDAY PROJECT
3. Dirty Work - aespa
4. Soda Pop - KPop Demon Hunters Cast, Danny Chung, Saja Boys, Andrew Choi, Neckwav, Kevin Woo, samUIL Lee
5. 뛰어(JUMP) - BLACKPINK
6. Drowning - WOODZ
7. 시작의 아이 - 마크툽 (MAKTUB)
8. 너에게 닿기를 - 10CM
9. 모르시나요(PROD.로코베리) - 조째즈
10. 어제보다 슬픈 오늘 - 우디 (Woody)
11. Whiplash - aespa
12. like JENNIE - 제니 (JENNIE)
13. Never Ending Story - 아이유
14. HOME SWEET HOME (feat. 태양, 대성) - G-DRAGON
15. WICKED - ALLDAY PROJECT
16. Your Idol - KPop Demon Hunters Cast, Danny Chung, Saja Boys, Andrew Choi, Neckwav, Kevin Woo, samUIL Lee
17. 청춘만화 - 이무진
18. 나는 반딧불 - 황가람
19. TOO BAD (feat. Anderson .Paak) - G-DRAGON
20. 눈물참기 - QWER
21. HANDS UP - MEOVV (미야오)
22. APT. - 로제 (ROSÉ), Bruno Mars
23. HAPPY - DAY6 (데이식스)
24. LIKE YOU BETTER - 프로미스나인
25. REBEL HEART - IVE (아이브)
26. 오늘만 I LOVE YOU - BOYNEXTDOOR
27. 소나기 - 이클립스 (ECLIPSE)
28. Flower - 오반(OVAN)
29. 한 페이지가 될 수 있게 - DAY6 (데이식스)
30. 빌려온 고양이 (Do the 
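Section 1 mentions saving extracted data in a usable format. One minimal option uses the standard `csv` module to write `(rank, title, artist)` rows. The two rows are copied from the output above, and an in-memory buffer stands in for a file such as a hypothetical `melon_chart.csv`. The writer automatically quotes the artist field because it contains commas.

```python
import csv
import io

# Two rows copied from the chart output above
rows = [
    (1, 'Golden', 'HUNTR/X, EJAE, AUDREY NUNA, REI AMI, KPop Demon Hunters Cast'),
    (2, 'FAMOUS', 'ALLDAY PROJECT'),
]

# For a real file: open('melon_chart.csv', 'w', newline='', encoding='utf-8')
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['rank', 'title', 'artist'])  # header row
writer.writerows(rows)

print(buffer.getvalue())
```

For Korean titles that will be opened in Excel, `encoding='utf-8-sig'` is a common choice because it writes a byte-order mark.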

# **6. Naver Finance**
- https://finance.naver.com/item/main.naver?code=260930

In [37]:
site = 'https://finance.naver.com/item/main.naver?code=260930'
request = requests.get(site)
request

<Response [200]>

In [38]:
with open('snapshot_2025-07-15.html', 'w', encoding='utf-8') as f:
    f.write(request.text)
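If you save the raw HTML, as in the cell above, you can re-parse the same page later without sending another request. Here is a minimal sketch of that round trip. It uses a temporary file and a made-up fragment in place of the real snapshot. `get_text(strip=True)` also drops the trailing newline that `.text` keeps inside the `<h2>`.

```python
import os
import tempfile
from bs4 import BeautifulSoup

# Made-up fragment standing in for request.text
html = '<div class="new_totalinfo"><h2><a href="#">Sample Co</a>\n</h2></div>'

path = os.path.join(tempfile.mkdtemp(), 'snapshot.html')
with open(path, 'w', encoding='utf-8') as f:
    f.write(html)

# Later: parse the saved file instead of fetching the page again
with open(path, encoding='utf-8') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')

name = soup.find('div', {'class': 'new_totalinfo'}).find('h2').get_text(strip=True)
print(name)  # Sample Co
```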

In [40]:
soup = BeautifulSoup(request.text, 'html.parser')

div_totalinfo = soup.find('div', {'class':'new_totalinfo'})
div_totalinfo

<div class="new_totalinfo" id="middle">
<dl class="blind">
<dt>종목 시세 정보</dt>
<dd>2025년 07월
        15일 12시
        24분 기준 장중</dd>
<dd>종목명 씨티케이</dd>
<dd>종목코드 260930 코스닥</dd>
<dd>현재가 7,420 전일대비 상승 890
            플러스
            13.63 퍼센트
        </dd>
<dd>전일가 6,530</dd>
<dd>시가 6,880</dd>
<dd>고가 8,050</dd>
<dd>상한가 8,480</dd>
<dd>저가 6,530</dd>
<dd>하한가 4,580</dd>
<dd>거래량 21,025,922</dd>
<dd>거래대금 157,010백만</dd>
</dl>
<div class="h_company">
<div class="wrap_company">
<h2><a href="#" onclick="clickcr(this, 'sop.title', '', '', event);window.location.reload();">씨티케이</a>
</h2>
<div class="description">
<span class="code">260930</span>
<img alt="코스닥" class="kosdaq" height="16" src="https://ssl.pstatic.net/imgstock/item_renewal/btn_kosdaq.gif" width="33"/>
<span class="blind">날짜</span>
<span id="time">
<em class="date">2025.07.15  12:24 <span>기준(KRX 장중)</span></em>
</span>
<em class="realtime">
<span class="blind">실시간</span>
</em>
<em class="summary">
<a href="#" onclick="togglePannel('summary_l
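The `<dl class="blind">` block at the top of this output already lists the quote as label/value text, for example `전일가 6,530` (previous close), `시가 6,880` (open), and `거래량 21,025,922` (volume). As an alternative sketch, not what the notebook does below, those `<dd>` items can be split into a dictionary. The fragment copies three items from the output. Whether other stock pages use the same labels is an assumption.

```python
from bs4 import BeautifulSoup

# Three <dd> items copied from the output above
html = """
<div class="new_totalinfo">
<dl class="blind">
<dt>종목 시세 정보</dt>
<dd>전일가 6,530</dd>
<dd>시가 6,880</dd>
<dd>거래량 21,025,922</dd>
</dl>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
summary = {}
for dd in soup.select('dl.blind dd'):
    parts = dd.get_text().split()
    # Keep only simple "label number" items; multi-part ones (e.g. the current-price line) are skipped
    if len(parts) == 2 and parts[1].replace(',', '').isdigit():
        summary[parts[0]] = int(parts[1].replace(',', ''))

print(summary)
```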

In [41]:
name = div_totalinfo.find('h2').text
name

'씨티케이\n'

In [44]:
div_today = div_totalinfo.find('div', {'class':'today'})
# div_today
price = div_today.find('span', {'class':'blind'}).text
# price
price = int(price.replace(",",""))
price

7420
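The price here and the volume in the next cell both go through `int(text.replace(",", ""))`. A hypothetical helper, `parse_number`, would keep that conversion in one place:

```python
def parse_number(text):
    """Convert a comma-grouped number string such as '21,025,922' to an int."""
    # int() tolerates surrounding whitespace but not thousands separators, so drop the commas
    return int(text.replace(',', ''))

print(parse_number('7,420'))       # 7420
print(parse_number('21,025,922'))  # 21025922
```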

In [50]:
table_no_info = soup.find('table', {'class':'no_info'})
# table_no_info
tds = table_no_info.find_all('td')
# tds
volume = tds[2].find('span', {'class':'blind'}).text
volume = int(volume.replace(",",""))
volume

21025922

In [51]:
dic = {"name":name, "code":"260930", "price":price, "volume":volume}
dic

{'name': '씨티케이\n', 'code': '260930', 'price': 7420, 'volume': 21025922}

In [52]:
def naver_finance(code):
    site = f'https://finance.naver.com/item/main.naver?code={code}'
    request = requests.get(site)

    soup = BeautifulSoup(request.text, 'html.parser')

    div_totalinfo = soup.find('div', {'class':'new_totalinfo'})
    # Company name from the <h2>; strip() removes the trailing '\n' seen in the cells above
    name = div_totalinfo.find('h2').text.strip()
    # Current price: first <span class="blind"> inside <div class="today">
    div_today = div_totalinfo.find('div', {'class':'today'})
    price = div_today.find('span', {'class':'blind'}).text
    price = int(price.replace(",",""))
    # Trading volume: third <td> of <table class="no_info">
    table_no_info = soup.find('table', {'class':'no_info'})
    tds = table_no_info.find_all('td')
    volume = tds[2].find('span', {'class':'blind'}).text
    volume = int(volume.replace(",",""))
    dic = {"name":name, "code":code, "price":price, "volume":volume}
    return dic

In [53]:
naver_finance('260930')

{'name': '씨티케이', 'code': '260930', 'price': 7350, 'volume': 21375013}

In [54]:
naver_finance('004140')

{'name': '동방', 'code': '004140', 'price': 3895, 'volume': 56383315}
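Section 1 stresses crawling responsibly. When you look up several codes, one simple courtesy is to pause between requests. The sketch below takes the fetch function as a parameter so it can run here with a stand-in. In the notebook you would pass `naver_finance`, as in `crawl_many(['260930', '004140'], naver_finance)`. The `crawl_many` name and the 1-second default delay are assumptions, not a documented Naver limit.

```python
import time

def crawl_many(codes, fetch, delay=1.0):
    """Call fetch(code) for each code, sleeping `delay` seconds between calls."""
    results = []
    for i, code in enumerate(codes):
        if i > 0:
            time.sleep(delay)  # pause between requests to go easy on the server
        results.append(fetch(code))
    return results

# Stand-in for naver_finance so the example runs offline
def fake_fetch(code):
    return {"code": code}

print(crawl_many(['260930', '004140'], fake_fetch, delay=0))
```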