# Project: Data Collection Practice 1 (BeautifulSoup4)

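#### 📌 BeautifulSoup is mainly used to scrape static sites; it extracts data from HTML tags.
- BeautifulSoup: static pages, Selenium: dynamic pages
- A drawback of BeautifulSoup is that it is hard to use on pages where data only appears as you scroll, since that content is typically loaded by JavaScript after the initial HTML.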

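#### 🚨 Caveats
- Respect the policies of sites that do not want their data scraped.
- Too many requests can get you blocked by the server, so call time.sleep(5) between requests to avoid overloading it.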
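
The two caveats above can be sketched with the standard library alone. This is a minimal illustration, not a complete politeness policy: the robots.txt content (`ROBOTS_TXT`), the `example.com` URLs, and the helper `polite_delay` are all hypothetical, and the file is inlined so the example runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def polite_delay(parser, user_agent="*", default=5.0):
    """Return the crawl delay the site asks for, or a default, in seconds."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


# Check whether two (hypothetical) URLs may be fetched under these rules.
allowed = parser.can_fetch("*", "https://example.com/jobs")
blocked = parser.can_fetch("*", "https://example.com/private/data")
print(allowed, blocked, polite_delay(parser))
```

In a real crawl, one would load the site's own robots.txt with `parser.set_url(...)` and `parser.read()`, skip URLs for which `can_fetch` is false, and call `time.sleep(polite_delay(parser))` between requests.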

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

### 1. Fetching data with requests

In [2]:
url = "https://weworkremotely.com/remote-full-time-jobs"
response = requests.get(url)

# Check whether the request to the URL returned a successful response
if response.status_code == 200:
    print("Success")
else:
    print("Failure")

Success


### 2. Using BS4

In [3]:
soup = BeautifulSoup(response.content, 'html.parser')

### 2-1. Main BS methods
- find : returns only the first element that matches.
- find_all : returns every element that matches.
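
To make the difference between the two methods concrete, here is a minimal sketch on a small, hypothetical HTML snippet (the job titles and the `company` class are made up for illustration). It also shows that `find` returns `None` when nothing matches, which is worth checking before accessing `.text`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a job listing page.
html = """
<ul>
  <li><span class="title">Backend Developer</span></li>
  <li><span class="title">Data Analyst</span></li>
</ul>
"""
soup_demo = BeautifulSoup(html, "html.parser")

first = soup_demo.find("span", class_="title")       # first match only
every = soup_demo.find_all("span", class_="title")   # list-like collection of all matches
missing = soup_demo.find("span", class_="company")   # None, because nothing matches

print(first.text)
print([tag.text for tag in every])
print(missing)
```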

![Class of a tag](https://firebasestorage.googleapis.com/v0/b/ls-storage-e452a.appspot.com/o/%E1%84%90%E1%85%A2%E1%84%80%E1%85%B3%E1%84%8F%E1%85%B3%E1%86%AF%E1%84%85%E1%85%A2%E1%84%89%E1%85%B3.png?alt=media&token=4431179e-8b1a-4c24-88ef-655d601cdd22)

In [4]:
# Scrape the topmost title (using the class)
soup.find('span', class_='title')

<span class="title">Full-Stack Wordpress Developer</span>

In [5]:
soup.find('span', class_='title').text

'Full-Stack Wordpress Developer'

In [6]:
# Scrape the topmost title (using a CSS selector)
soup.select('#job_list > section > article > ul > li:nth-child(1) > a > span.title')

[<span class="title">Full-Stack Wordpress Developer</span>]
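
Note that `select` returns a list even when only one element matches, as in the output above. When a single element is wanted, `select_one` may be more convenient. The following sketch uses a hypothetical HTML snippet with a simplified structure, not the real page markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real page structure may differ.
html = """
<section id="job_list">
  <ul>
    <li><a href="/jobs/1"><span class="title">Backend Developer</span></a></li>
    <li><a href="/jobs/2"><span class="title">Data Analyst</span></a></li>
  </ul>
</section>
"""
demo = BeautifulSoup(html, "html.parser")

selector = "#job_list li:nth-child(1) span.title"
matches = demo.select(selector)    # always a list, possibly empty
first = demo.select_one(selector)  # a single tag, or None if nothing matches

# Guard against None before reading the text.
first_text = first.get_text(strip=True) if first is not None else None
print(len(matches), first_text)
```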

In [7]:
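# Use find_all to scrape every title on the page
# (note: the name `all` shadows Python's built-in all() for the rest of the notebook)
all = soup.find_all('span', class_='title')
all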

[<span class="title">Full-Stack Wordpress Developer</span>,
 <span class="title">Admin and Support Specialist </span>,
 <span class="title">Senior iOS Engineer</span>,
 <span class="title">Social Media Video Content Creator</span>,
 <span class="title">Front end Manager</span>,
 <span class="title">German Customer Support Specialist</span>,
 <span class="title">HR Business Partner</span>,
 <span class="title">Senior Account Manager – Automotive Industry</span>,
 <span class="title">Account Strategist, Mid-Market Sales, Google Customer Solutions (Italian, English)</span>,
 <span class="title"> Manager, Mandiant Proactive Services, Google Cloud</span>,
 <span class="title">Partnership Manager</span>,
 <span class="title">Growth Manager</span>,
 <span class="title">Managing Editor</span>,
 <span class="title">Project Manager</span>,
 <span class="title">Customer Success Manager (career changers welcome, no prior experience needed, B2B cold email agency, fully remote)</span>,
 <span class=

In [8]:
all[2].text

'Senior iOS Engineer'

Using a loop to crawl the following pages automatically

In [9]:
for i in range(1, 11):
    # Build the URL for each of pages 1 through 10
    url = f"https://weworkremotely.com/remote-full-time-jobs?page={i}"
    print(url)
    # Uncomment the lines below to fetch each page and print its titles
    # response = requests.get(url)
    # soup = BeautifulSoup(response.content, 'html.parser')
    # all = soup.find_all('span', class_='title')
    # for title in all:
    #     print(title.text)

https://weworkremotely.com/remote-full-time-jobs?page=1
https://weworkremotely.com/remote-full-time-jobs?page=2
https://weworkremotely.com/remote-full-time-jobs?page=3
https://weworkremotely.com/remote-full-time-jobs?page=4
https://weworkremotely.com/remote-full-time-jobs?page=5
https://weworkremotely.com/remote-full-time-jobs?page=6
https://weworkremotely.com/remote-full-time-jobs?page=7
https://weworkremotely.com/remote-full-time-jobs?page=8
https://weworkremotely.com/remote-full-time-jobs?page=9
https://weworkremotely.com/remote-full-time-jobs?page=10
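
The commented-out part of the loop can be sketched as a function that collects titles page by page and pauses between requests. This is only one possible arrangement: `collect_titles`, `fetch_html`, `fake_fetch`, and `BASE_URL` are hypothetical names, and the fetcher is passed in as a parameter so the sketch can run offline against canned HTML.

```python
import time

from bs4 import BeautifulSoup

BASE_URL = "https://weworkremotely.com/remote-full-time-jobs?page="


def collect_titles(fetch_html, pages=range(1, 11), delay=5.0):
    """Gather titles page by page.

    fetch_html is a hypothetical callable that takes a URL and returns HTML text.
    """
    rows = []
    for page in pages:
        url = f"{BASE_URL}{page}"
        page_soup = BeautifulSoup(fetch_html(url), "html.parser")
        for tag in page_soup.find_all("span", class_="title"):
            rows.append({"page": page, "title": tag.get_text(strip=True)})
        time.sleep(delay)  # pause between requests to avoid overloading the server
    return rows


# Offline stand-in for a real fetcher, so the sketch runs without network access.
def fake_fetch(url):
    page = url.rsplit("=", 1)[-1]
    return f'<span class="title">Job on page {page}</span>'


rows = collect_titles(fake_fetch, pages=range(1, 4), delay=0)
print(rows)
```

Against the live site, one could pass something like `lambda u: requests.get(u, timeout=10).text` as `fetch_html` and keep the default delay. The resulting list of dictionaries can be turned into a table with `pd.DataFrame(rows)`.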


Attempting to crawl a dynamic web page

In [10]:
url = "http://m.sports.naver.com"
response = requests.get(url)

# Attempt to scrape
soup = BeautifulSoup(response.content, 'html.parser')
soup.find_all('strong', class_='ReporterSubscription_news_title__voqZ4')

[]
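
The empty list is what one would expect if the target elements are created by JavaScript in the browser, which is an assumption here rather than something verified against the site. The sketch below illustrates the mechanism on a hypothetical page: the `<strong>` element exists only after the script runs, and BeautifulSoup never runs scripts, so it sees an empty container.

```python
from bs4 import BeautifulSoup

# Hypothetical page whose headline is injected by JavaScript after load.
raw_html = """
<html><body>
  <div id="root"></div>
  <script>
    const el = document.createElement("strong");
    el.className = "news_title";
    el.textContent = "Headline";
    document.getElementById("root").appendChild(el);
  </script>
</body></html>
"""

static_view = BeautifulSoup(raw_html, "html.parser")

# The script is kept as plain text and never executed, so no <strong> exists.
found = static_view.find_all("strong", class_="news_title")
root = static_view.find(id="root")
print(found, repr(root.get_text()))
```

For pages like this, a browser automation tool such as Selenium, mentioned at the top of this notebook, renders the page first and then exposes the resulting HTML.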