## Web Development

### BeautifulSoup: A Tool for Extracting Web Data

- https://www.crummy.com/software/BeautifulSoup/

A Python library for parsing HTML and XML files.

In [1]:
!pip install beautifulsoup4

Collecting beautifulsoup4
  Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4)
  Using cached soupsieve-2.6-py3-none-any.whl.metadata (4.6 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached soupsieve-2.6-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4
Successfully installed beautifulsoup4-4.12.3 soupsieve-2.6


In [2]:
from bs4 import BeautifulSoup
import requests

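# Fetch the web page; fail fast on HTTP errors or a stalled connection
response = requests.get("https://example.com", timeout=10)
response.raise_for_status()
data = response.text

# Create a BeautifulSoup object using the built-in parser
soup = BeautifulSoup(data, 'html.parser')

# Find the <title> tag (find() returns None when the tag is missing)
title = soup.find('title')
print("Page Title:", title.text if title else None)

# Find all hyperlinks
for link in soup.find_all('a'):
    print("Hyperlink:", link.get('href'))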

Page Title: Example Domain
Hyperlink: https://www.iana.org/domains/example
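
The same lookups can be tried without a network call. The sketch below parses an inline HTML string (the markup and the variable names are made up for illustration) and uses `select()`, BeautifulSoup's CSS-selector method, alongside `find_all()`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup used only for this illustration
html = """
<html>
  <head><title>Sample Page</title></head>
  <body>
    <a href="https://example.com/a">First</a>
    <a href="https://example.com/b" class="external">Second</a>
    <a>No href here</a>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() returns the text inside the tag
page_title = soup.title.get_text()

# Keep only anchors that actually carry an href attribute
links = [a["href"] for a in soup.find_all("a", href=True)]

# CSS selectors are available through select()
external = [a.get_text() for a in soup.select("a.external")]

print(page_title)
print(links)
print(external)
```

Filtering with `href=True` avoids the `None` values that `link.get('href')` would print for anchors without a target.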


### HTTPX: A Modern Async HTTP Client

- https://www.python-httpx.org/

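A modern HTTP client library with both sync and async APIs. It keeps the ease of use of the requests library while adding async support, and offers modern features such as HTTP/1.1, HTTP/2 (through the optional `http2` extra), and automatic content decoding.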

In [4]:
!pip install httpx

Collecting httpx
  Using cached httpx-0.28.1-py3-none-any.whl.metadata (7.1 kB)
Collecting httpcore==1.* (from httpx)
  Using cached httpcore-1.0.7-py3-none-any.whl.metadata (21 kB)
Using cached httpx-0.28.1-py3-none-any.whl (73 kB)
Using cached httpcore-1.0.7-py3-none-any.whl (78 kB)
Installing collected packages: httpcore, httpx
Successfully installed httpcore-1.0.7 httpx-0.28.1


In [5]:
# Synchronous code
import httpx

response = httpx.get('https://www.example.com/')
print(response.status_code)
print(response.text)

200
<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This d

In [None]:
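# Asynchronous code: run this from a .py file.
# asyncio.run() raises RuntimeError inside a notebook, where an event loop is already running; use `await main()` there.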
import httpx
import asyncio

async def main():
    async with httpx.AsyncClient() as client:
        response = await client.get('https://www.example.com/')
        print(response.status_code)
        print(response.text)

asyncio.run(main())

### MechanicalSoup: Automation + Web Scraping Library

- http://mechanicalsoup.readthedocs.io/

A library for web scraping in Python. It automates actions such as parsing a page's HTML, filling in forms, and clicking submit buttons. It does not execute JavaScript.

In [7]:
!pip install MechanicalSoup

Collecting MechanicalSoup
  Downloading MechanicalSoup-1.3.0-py3-none-any.whl.metadata (6.0 kB)
Collecting lxml (from MechanicalSoup)
  Using cached lxml-5.3.0-cp310-cp310-win_amd64.whl.metadata (3.9 kB)
Downloading MechanicalSoup-1.3.0-py3-none-any.whl (19 kB)
Using cached lxml-5.3.0-cp310-cp310-win_amd64.whl (3.8 MB)
Installing collected packages: lxml, MechanicalSoup
Successfully installed MechanicalSoup-1.3.0 lxml-5.3.0


In [10]:
!pip install lxml



In [11]:
import mechanicalsoup

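# Create a browser object (pages are parsed with lxml by default)
browser = mechanicalsoup.StatefulBrowser()

# Open a web page
browser.open("https://example.com/")

# Print the HTML of the current page
print(browser.page)

# Select a form and fill in a field.
# The form name and field name are placeholders: example.com has no form,
# so point this at a page that has one before running.
browser.select_form('form[name="example"]')
browser["some_field"] = "some value"

# Submit the form
response = browser.submit_selected()

# Print the HTML of the page returned after submission
print(response.text)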

FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?

### PyQuery: A Library for Parsing and Manipulating HTML Documents

- http://pyquery.rtfd.org/

A library for parsing and manipulating HTML documents in Python using jQuery-style syntax.

In [12]:
!pip install pyquery

Collecting pyquery
  Downloading pyquery-2.0.1-py3-none-any.whl.metadata (9.0 kB)
Collecting cssselect>=1.2.0 (from pyquery)
  Downloading cssselect-1.2.0-py2.py3-none-any.whl.metadata (2.2 kB)
Downloading pyquery-2.0.1-py3-none-any.whl (22 kB)
Downloading cssselect-1.2.0-py2.py3-none-any.whl (18 kB)
Installing collected packages: cssselect, pyquery
Successfully installed cssselect-1.2.0 pyquery-2.0.1


In [13]:
from pyquery import PyQuery as pq

# Define an HTML document
html = """
<html>
    <head>
        <title>PyQuery Example</title>
    </head>
    <body>
        <h1>Hello, PyQuery!</h1>
        <p>PyQuery makes it easy to work with HTML.</p>
    </body>
</html>
"""

# Create a PyQuery object
doc = pq(html)

# Get the text of the h1 tag
h1_text = doc('h1').text()
print(f'Text of the h1 tag: {h1_text}')

# Change the text of the p tag
doc('p').text('The content has now been changed!')
print(f'Updated text of the p tag: {doc("p").text()}')


Text of the h1 tag: Hello, PyQuery!
Updated text of the p tag: The content has now been changed!


### PyZMQ: Python Bindings for the ZeroMQ Library

- https://zguide.zeromq.org/

Python bindings for ZeroMQ, a high-performance asynchronous messaging library for distributed systems.

In [14]:
!pip install pyzmq



In [None]:
# Server code (run it in its own process; the loop never exits)
import zmq

context = zmq.Context()
socket = context.socket(zmq.REP)
socket.bind("tcp://*:5555")

while True:
    # Receive a message from the client, then send a reply
    message = socket.recv()
    print(f"Received request: {message}")
    socket.send(b"World")

In [None]:
# Client code
import zmq

context = zmq.Context()
socket = context.socket(zmq.REQ)
socket.connect("tcp://localhost:5555")

for request in range(10):
    socket.send(b"Hello")
    message = socket.recv()
    print(f"Received reply {request} [ {message} ]")
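
The server and client above need two separate processes. To try the same REQ/REP exchange in a single script, one option is to run the server in a thread and connect over the `inproc` transport. This is a sketch under that assumption; `ENDPOINT` and `serve` are hypothetical names, and the server handles a fixed number of requests so that the script can exit.

```python
import threading

import zmq

ENDPOINT = "inproc://hello-demo"  # in-process transport, no TCP port needed


def serve(context, ready, count):
    socket = context.socket(zmq.REP)
    socket.bind(ENDPOINT)
    ready.set()  # signal that the endpoint is bound
    for _ in range(count):
        socket.recv()          # wait for a request
        socket.send(b"World")  # REP must reply before the next recv
    socket.close()


context = zmq.Context()
ready = threading.Event()
server = threading.Thread(target=serve, args=(context, ready, 3))
server.start()
ready.wait()  # connect only after the server has bound

client = context.socket(zmq.REQ)
client.connect(ENDPOINT)

replies = []
for _ in range(3):
    client.send(b"Hello")
    replies.append(client.recv())

client.close()
server.join()
context.term()
print(replies)
```

Both sockets share one `Context`, which `inproc` requires; each socket is created and used in a single thread only.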

### Requests: A Concise HTTP Library

- https://requests.readthedocs.io/en/latest/

A popular library that makes sending HTTP requests simple.

In [1]:
import requests

# Simple GET request
response = requests.get('https://api.github.com')

# Print the response body
print(response.text)

# Parse the JSON response
data = response.json()
print(data)

# POST request
post_response = requests.post('https://httpbin.org/post', data={'key': 'value'})
print(post_response.json())

{"current_user_url":"https://api.github.com/user","current_user_authorizations_html_url":"https://github.com/settings/connections/applications{/client_id}","authorizations_url":"https://api.github.com/authorizations","code_search_url":"https://api.github.com/search/code?q={query}{&page,per_page,sort,order}","commit_search_url":"https://api.github.com/search/commits?q={query}{&page,per_page,sort,order}","emails_url":"https://api.github.com/user/emails","emojis_url":"https://api.github.com/emojis","events_url":"https://api.github.com/events","feeds_url":"https://api.github.com/feeds","followers_url":"https://api.github.com/user/followers","following_url":"https://api.github.com/user/following{/target}","gists_url":"https://api.github.com/gists{/gist_id}","hub_url":"https://api.github.com/hub","issue_search_url":"https://api.github.com/search/issues?q={query}{&page,per_page,sort,order}","issues_url":"https://api.github.com/issues","keys_url":"https://api.github.com/user/keys","label_searc
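
To see what Requests would put on the wire without contacting a server, a request can be built and prepared but not sent. A minimal sketch; the search query values are illustrative assumptions, not part of the original example:

```python
import requests

# Build a GET request with query parameters, without sending it
get_request = requests.Request(
    "GET",
    "https://api.github.com/search/repositories",
    params={"q": "httpx", "per_page": 5},
).prepare()

# Build a POST request with form data, again without sending it
post_request = requests.Request(
    "POST",
    "https://httpbin.org/post",
    data={"key": "value"},
).prepare()

print(get_request.url)
print(post_request.body)
print(post_request.headers["Content-Type"])
```

A prepared request can later be sent with `requests.Session().send(get_request, timeout=10)`. Passing a timeout is worth the habit, since Requests waits indefinitely by default.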

### Scapy: An Interactive Packet Manipulation Program

- https://scapy.net/
- https://github.com/secdev/scapy

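A powerful interactive packet manipulation program. It is used for network testing, packet analysis, and security testing, and lets users send, receive, and analyze network packets.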

In [3]:
!pip install scapy

Collecting scapy
  Downloading scapy-2.6.1-py3-none-any.whl.metadata (5.6 kB)
Downloading scapy-2.6.1-py3-none-any.whl (2.4 MB)
Installing collected packages: scapy
Successfully installed scapy-2.6.1


In [4]:
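from scapy.all import IP, ICMP, sr1

# Set the target IP address
target_ip = "8.8.8.8"
# Build an ICMP Echo request
packet = IP(dst=target_ip)/ICMP()
# Send the packet and wait up to 2 seconds for a single reply
response = sr1(packet, timeout=2)
# Print the reply; sr1 returns None if nothing arrived in time
if response is not None:
    response.show()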



OSError: Windows native L3 Raw sockets are only usable as administrator ! Please install Npcap to workaround !