<a href="https://colab.research.google.com/github/peculab/PythonAI4Beginners/blob/main/week13_%E7%B6%B2%E8%B7%AF%E8%B3%87%E6%96%99%E8%87%AA%E5%8B%95%E7%88%AC%E5%8F%96%E8%88%87%E5%88%86%E6%9E%90.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

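✅ 1. Fetch news from the web (web scraping)

Using PTT as the example, fetch post titles and content with requests + BeautifulSoup.

✅ 2. Analyze the news content (NLP analysis)

Run sentiment analysis with SnowNLP (built for Chinese text).

✅ 3. Visualize the results

Use pandas for tabular processing and plotly to visualize the results.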

In [20]:
!pip install snownlp plotly beautifulsoup4



In [22]:
import requests
from bs4 import BeautifulSoup
from snownlp import SnowNLP
import pandas as pd
import plotly.express as px

# Simulated headers and cookies (the over18 cookie unlocks age-gated boards such as Gossiping)
HEADERS = {
    'User-Agent': 'Mozilla/5.0'
}
COOKIES = {'over18': '1'}
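
As a possible alternative to passing `headers` and `cookies` on every call, the same settings can be stored once on a `requests.Session`. This is only a sketch and not the approach the notebook takes; `make_ptt_session` is a hypothetical helper name, and no request is sent when the block runs.

```python
import requests

def make_ptt_session():
    """Build a session carrying the same User-Agent and over18 cookie (hypothetical helper)."""
    session = requests.Session()
    session.headers.update({'User-Agent': 'Mozilla/5.0'})
    # The over18 cookie answers PTT's age-confirmation prompt.
    session.cookies.set('over18', '1')
    return session

session = make_ptt_session()
# A later call such as session.get(url, timeout=10) would reuse these settings.
```

A session also reuses the underlying connection, which may help when many posts are fetched from the same host.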

In [23]:
def get_ptt_articles(board='Stock', max_pages=2):
    base_url = f'https://www.ptt.cc/bbs/{board}/index.html'
    articles = []

    for _ in range(max_pages):
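        # Fetch the board index page; the timeout keeps a stalled request from hanging the loop
        res = requests.get(base_url, headers=HEADERS, cookies=COOKIES, timeout=10)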
        soup = BeautifulSoup(res.text, 'html.parser')
        entries = soup.select('div.r-ent')

        for entry in entries:
            title_tag = entry.select_one('div.title > a')
            if title_tag:
                title = title_tag.text.strip()
                link = 'https://www.ptt.cc' + title_tag['href']

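                try:
                    # Fetch the post itself, again with a timeout
                    article_res = requests.get(link, headers=HEADERS, cookies=COOKIES, timeout=10)
                    article_soup = BeautifulSoup(article_res.text, 'html.parser')
                    # Keep only the text before the first '--' (the signature separator)
                    content = article_soup.select_one('#main-content').text.split('--')[0]
                    # Drop system lines, which start with '※'
                    content = '\n'.join(line for line in content.split('\n') if not line.startswith('※'))
                    # SnowNLP score in [0, 1]; values closer to 1 are more positive
                    sentiment = SnowNLP(content).sentiments
                except Exception:
                    # Network or parsing failure: record no score so dropna can filter the row later
                    content = ""
                    sentiment = None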

                articles.append({
                    '標題': title,
                    '連結': link,
                    '情緒分數': sentiment
                })

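        # Link to the previous (older) page: the second button in the paging bar
        paging = soup.select_one('div.btn-group-paging > a.btn.wide:nth-child(2)')
        if paging and paging.get('href'):
            base_url = 'https://www.ptt.cc' + paging['href']
        else:
            # No older page to follow (the button is missing or has no href)
            break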

    return pd.DataFrame(articles)
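
To see what the selectors in `get_ptt_articles` pick up without touching the network, the same parsing steps can be run on a small hand-written page. The HTML below only imitates the structure the selectors expect; it is an assumption, not a copy of a real PTT page, and `parse_index` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hand-written HTML imitating the expected structure (assumption, not real PTT output).
SAMPLE_HTML = """
<div class="btn-group btn-group-paging">
  <a class="btn wide" href="/bbs/Stock/index1.html">oldest</a>
  <a class="btn wide" href="/bbs/Stock/index99.html">prev</a>
  <a class="btn wide disabled">next</a>
  <a class="btn wide" href="/bbs/Stock/index.html">newest</a>
</div>
<div class="r-ent"><div class="title"><a href="/bbs/Stock/M.1.A.001.html">[News] sample post</a></div></div>
<div class="r-ent"><div class="title">(post deleted)</div></div>
"""

def parse_index(html):
    """Return (title, link) pairs plus the href of the second paging button."""
    soup = BeautifulSoup(html, 'html.parser')
    posts = []
    for entry in soup.select('div.r-ent'):
        title_tag = entry.select_one('div.title > a')
        if title_tag is None:
            continue  # entry without a link, e.g. a deleted post
        posts.append((title_tag.text.strip(), 'https://www.ptt.cc' + title_tag['href']))
    paging = soup.select_one('div.btn-group-paging > a.btn.wide:nth-child(2)')
    prev_href = paging.get('href') if paging else None
    return posts, prev_href

posts, prev_href = parse_index(SAMPLE_HTML)
```

In this sample the entry without an `<a>` tag is skipped, and `:nth-child(2)` selects the second button of the paging bar, which is why the loop walks from the newest page toward older ones.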

In [24]:
# Fetch 2 pages from the PTT Stock board
df = get_ptt_articles('Stock', max_pages=2)
df = df.dropna(subset=['情緒分數'])

In [25]:
df

index,標題,連結,情緒分數
0,[新聞] 台積論壇 首揭機器人願景,https://www.ptt.cc/bbs/Stock/M.1745536426.A.B9...,2.643736e-10
1,[新聞] Alphabet Q1營收增長亮眼、將砸700億美,https://www.ptt.cc/bbs/Stock/M.1745536823.A.1D...,0.0
2,[新聞] 烏克蘭每個月消耗10萬台無人機　京鼎、緯,https://www.ptt.cc/bbs/Stock/M.1745537298.A.24...,0.0
3,[情報] intel財報發佈，盤後跌5%,https://www.ptt.cc/bbs/Stock/M.1745540291.A.0F...,2.990941e-13
4,[新聞] 各說各話？川普曝美中官員已會晤 中方打,https://www.ptt.cc/bbs/Stock/M.1745540821.A.03...,0.0
5,Re: [新聞] 特斯拉財報重挫71%！馬斯克嘆氣盼川普降,https://www.ptt.cc/bbs/Stock/M.1745540981.A.3B...,4.773877e-08
6,[閒聊] 2025/04/25 盤中閒聊,https://www.ptt.cc/bbs/Stock/M.1745541002.A.F7...,1.036439e-08
7,Re: [新聞] 特斯拉財報重挫71%！馬斯克嘆氣盼川普降,https://www.ptt.cc/bbs/Stock/M.1745544624.A.62...,0.0
8,[公告] 股票板板規 v4.7 (2024/10/06 修正),https://www.ptt.cc/bbs/Stock/M.1719872231.A.9B...,0.0003780178
9,[公告] 4-6-1的初犯罰則在三個月內將加重至30天,https://www.ptt.cc/bbs/Stock/M.1739730011.A.26...,0.0
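
Almost every score in this output sits very close to 0. One possible contributor, offered here as an assumption rather than a diagnosis, is that scoring an entire post in one call pushes the result toward an extreme. A hedged sketch of an alternative is to score each sentence and average the results. `mean_sentence_sentiment` is a hypothetical helper, the punctuation-based sentence split is a simplification, and the `score` parameter exists only so another scorer can be swapped in.

```python
import re

def mean_sentence_sentiment(text, score=None):
    """Average a per-sentence sentiment score; returns None for empty text."""
    if score is None:
        # Default scorer: SnowNLP's sentiments property, imported lazily.
        from snownlp import SnowNLP
        score = lambda s: SnowNLP(s).sentiments
    # Split on common Chinese and Western sentence-ending punctuation and on newlines.
    sentences = [s.strip() for s in re.split(r'[。！？!?\n]+', text) if s.strip()]
    if not sentences:
        return None
    return sum(score(s) for s in sentences) / len(sentences)
```

Inside `get_ptt_articles`, a call such as `mean_sentence_sentiment(content)` could stand in for `SnowNLP(content).sentiments`. Whether that yields more informative scores for PTT posts would need to be checked on real data.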


In [26]:
# Plot the chart
fig = px.bar(
    df.sort_values('情緒分數'),
    x='情緒分數',
    y='標題',
    orientation='h',
    title='PTT Article Sentiment Analysis',
    height=600
)
fig.update_layout(yaxis=dict(automargin=True))
fig.show()
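
Raw scores this small are hard to compare on a bar chart, so one option is to bucket them into labels with pandas before plotting. This is a sketch only: the 0.4 and 0.6 thresholds and the `label_sentiment` name are illustrative assumptions, and the sample values are three scores copied from the output above.

```python
import pandas as pd

def label_sentiment(scores, low=0.4, high=0.6):
    """Map scores in [0, 1] to 'negative' / 'neutral' / 'positive' (thresholds are assumptions)."""
    bins = [0.0, low, high, 1.0]
    labels = ['negative', 'neutral', 'positive']
    # include_lowest=True keeps a score of exactly 0.0 inside the first bin.
    return pd.cut(scores, bins=bins, labels=labels, include_lowest=True).astype(str)

sample = pd.Series([2.643736e-10, 0.0, 0.0003780178])
labels = label_sentiment(sample)
```

The resulting labels could be stored in a new column and passed to `px.bar` through its `color` argument.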