# Intro to BeautifulSoup

In [1]:
import requests
from bs4 import BeautifulSoup

In [2]:
url = 'https://en.wikipedia.org/wiki/Python_(programming_language)'

In [3]:
response = requests.get(url)

In [4]:
response.status_code

200
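
The request and status check above can be folded into one helper. This is a minimal sketch: the function name `fetch_html` and the 10-second timeout are assumptions for illustration, not part of the notebook. `raise_for_status()` raises `requests.HTTPError` on 4xx/5xx responses, so a failed download does not get parsed by mistake.

```python
import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as a string (hypothetical helper)."""
    # The timeout keeps the call from hanging forever on a dead server.
    response = requests.get(url, timeout=timeout)
    # Raises requests.HTTPError for 4xx/5xx status codes.
    response.raise_for_status()
    return response.text
```

It could then be called as `fetch_html(url)` in place of the separate `requests.get(url)` and `response.text` steps.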

In [6]:
type(response.text)

str

In [11]:
soup = BeautifulSoup(response.text, 'lxml')  # other parsers: 'html.parser', 'html5lib', 'xml'; if omitted, bs4 picks the best installed one and warns
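
To try the parser argument without a network request, the same constructor can be run on a small inline string. This is a sketch on made-up HTML; the variable names `html` and `soup_demo` are illustrative only. It uses `'html.parser'`, which ships with Python and needs no extra install.

```python
from bs4 import BeautifulSoup

# A tiny hand-written document standing in for response.text.
html = "<html><head><title>Demo</title></head><body><p>Hello</p></body></html>"

# Naming the parser explicitly keeps results the same across machines.
soup_demo = BeautifulSoup(html, "html.parser")

print(soup_demo.title.string)  # Demo
print(soup_demo.p.get_text())  # Hello
```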

In [12]:
type(soup)

bs4.BeautifulSoup

In [14]:
#  print(soup.prettify())
soup.name

'[document]'

In [16]:
soup.attrs

{}

In [19]:
tag = soup.find('div')

In [21]:
type(tag)

bs4.element.Tag

In [22]:
tag.attrs

{'id': 'mw-page-base', 'class': ['noprint']}

In [23]:
tag

<div class="noprint" id="mw-page-base"></div>
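
Individual attributes of a tag can be read like dictionary entries. The sketch below rebuilds the same `div` from an inline string (the name `demo_tag` is illustrative) and shows that `class` comes back as a list, while `.get()` returns `None` for a missing attribute instead of raising `KeyError`.

```python
from bs4 import BeautifulSoup

html = '<div class="noprint" id="mw-page-base"></div>'
demo_tag = BeautifulSoup(html, "html.parser").find("div")

print(demo_tag["id"])         # mw-page-base
print(demo_tag["class"])      # ['noprint'] (class is multi-valued, so it is a list)
print(demo_tag.get("style"))  # None (attribute not present)
```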

In [24]:
tags = soup.find_all('div')

In [25]:
len(tags)

150

In [27]:
tags = soup.find_all(name=['div', 'p'])

In [28]:
len(tags)

230

In [34]:
tags = soup.find_all(attrs={'aria-labelledby':True})

In [35]:
len(tags)

23
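
The two filters above are easier to follow on a small document. This is a sketch on made-up HTML, with illustrative variable names: a list of names matches any of the listed tags, and `True` as an attribute value matches every tag that has the attribute, whatever its value.

```python
from bs4 import BeautifulSoup

html = """
<div id="a" aria-labelledby="h1"><p>one</p></div>
<div id="b"><p>two</p><span>skip</span></div>
"""
demo = BeautifulSoup(html, "html.parser")

# A list of names matches any of them, in document order.
divs_and_ps = demo.find_all(["div", "p"])
# True means "the attribute exists", regardless of its value.
labelled = demo.find_all(attrs={"aria-labelledby": True})

print([t.name for t in divs_and_ps])  # ['div', 'p', 'div', 'p']
print([t["id"] for t in labelled])    # ['a']
```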

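**Other `find_all` arguments:**
- `recursive` - `True` by default, so all descendants are searched; pass `recursive=False` to look at direct children only
- `limit` - return at most this many matches and stop searching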
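
A short sketch of both arguments on a made-up nested list (the HTML and variable names are illustrative):

```python
from bs4 import BeautifulSoup

html = (
    "<ul id='outer'>"
    "<li>1</li>"
    "<li>2<ul><li>2.1</li><li>2.2</li></ul></li>"
    "<li>3</li>"
    "</ul>"
)
outer = BeautifulSoup(html, "html.parser").find("ul")

all_items = outer.find_all("li")                   # every descendant <li>
direct = outer.find_all("li", recursive=False)     # only direct children
first_two = outer.find_all("li", limit=2)          # stop after two matches

print(len(all_items), len(direct), len(first_two))  # 5 3 2
```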

### Example

1. Let's find all links that have an `href` attribute

In [38]:
links = soup.find_all('a', attrs={'href': True})

In [39]:
len(links)

2234

In [40]:
links[0]

<a href="/wiki/Wikipedia:Good_articles" title="This is a good article. Click here for more information."><img alt="This is a good article. Click here for more information." data-file-height="185" data-file-width="180" decoding="async" height="20" src="//upload.wikimedia.org/wikipedia/en/thumb/9/94/Symbol_support_vote.svg/19px-Symbol_support_vote.svg.png" srcset="//upload.wikimedia.org/wikipedia/en/thumb/9/94/Symbol_support_vote.svg/29px-Symbol_support_vote.svg.png 1.5x, //upload.wikimedia.org/wikipedia/en/thumb/9/94/Symbol_support_vote.svg/39px-Symbol_support_vote.svg.png 2x" width="19"/></a>

In [41]:
import re

In [42]:
links = soup.find_all('a', attrs={'href': re.compile('https://')})

In [43]:
len(links)

538

In [50]:
links[3]

<a class="extiw" href="https://en.wikibooks.org/wiki/Python_Programming" title="wikibooks:Python Programming">Python Programming</a>
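
One detail worth knowing: Beautiful Soup applies a compiled pattern with `search()`, so `'https://'` matches anywhere inside the `href`, not only at the start. The sketch below, on made-up links, compares the unanchored pattern with an anchored `'^https://'`.

```python
import re
from bs4 import BeautifulSoup

html = """
<a href="https://example.org/page">external</a>
<a href="/wiki/Local">local</a>
<a href="/redirect?to=https://example.org">redirect</a>
"""
demo = BeautifulSoup(html, "html.parser")

# Unanchored: also matches the redirect link, where https:// is mid-string.
anywhere = demo.find_all("a", attrs={"href": re.compile("https://")})
# Anchored: only hrefs that start with https://.
at_start = demo.find_all("a", attrs={"href": re.compile("^https://")})

print(len(anywhere))  # 2
print(len(at_start))  # 1
```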

Get the text of one of those links:

In [49]:
links[3].text

'Python Programming'
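
To collect the target and the text of every link at once, a list comprehension works well. This is a sketch on made-up HTML with illustrative names; `get_text(strip=True)` trims surrounding whitespace, which plain `.text` keeps.

```python
from bs4 import BeautifulSoup

html = (
    '<p><a class="extiw" href="https://en.wikibooks.org/wiki/Python_Programming">'
    "Python Programming</a> "
    '<a href="/wiki/Python">  Python </a></p>'
)
demo = BeautifulSoup(html, "html.parser")

# Build (href, text) pairs for every <a> that has an href attribute.
pairs = [
    (a["href"], a.get_text(strip=True))
    for a in demo.find_all("a", attrs={"href": True})
]

for href, text in pairs:
    print(href, "->", text)
```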

**Extracting one specific part of the page, for example the table of contents**

![image.png](attachment:image.png)

In [51]:
result = soup.find('div', attrs={'id': 'toc'})
result

<div aria-labelledby="mw-toc-heading" class="toc" id="toc" role="navigation"><input class="toctogglecheckbox" id="toctogglecheckbox" role="button" style="display:none" type="checkbox"/><div class="toctitle" dir="ltr" lang="en"><h2 id="mw-toc-heading">Contents</h2><span class="toctogglespan"><label class="toctogglelabel" for="toctogglecheckbox"></label></span></div>
<ul>
<li class="toclevel-1 tocsection-1"><a href="#History"><span class="tocnumber">1</span> <span class="toctext">History</span></a></li>
<li class="toclevel-1 tocsection-2"><a href="#Design_philosophy_and_features"><span class="tocnumber">2</span> <span class="toctext">Design philosophy and features</span></a></li>
<li class="toclevel-1 tocsection-3"><a href="#Syntax_and_semantics"><span class="tocnumber">3</span> <span class="toctext">Syntax and semantics</span></a>
<ul>
<li class="toclevel-2 tocsection-4"><a href="#Indentation"><span class="tocnumber">3.1</span> <span class="toctext">Indentation</span></a></li>
<li class

In [53]:
elements = result.find_all('li')
len(elements)

29

In [54]:
for element in elements:
    print(element.text)

1 History
2 Design philosophy and features
3 Syntax and semantics

3.1 Indentation
3.2 Statements and control flow
3.3 Expressions
3.4 Methods
3.5 Typing
3.6 Arithmetic operations


3.1 Indentation
3.2 Statements and control flow
3.3 Expressions
3.4 Methods
3.5 Typing
3.6 Arithmetic operations
4 Programming examples
5 Libraries
6 Development environments
7 Implementations

7.1 Reference implementation
7.2 Other implementations
7.3 Unsupported implementations
7.4 Cross-compilers to other languages
7.5 Performance


7.1 Reference implementation
7.2 Other implementations
7.3 Unsupported implementations
7.4 Cross-compilers to other languages
7.5 Performance
8 Development
9 API documentation generators
10 Naming
11 Popularity
12 Uses
13 Languages influenced by Python
14 See also
15 References

15.1 Sources


15.1 Sources
16 Further reading
17 External links
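
The subsections appear twice in the output above because `.text` on a parent `<li>` also includes the text of every `<li>` nested inside it. One way around this, sketched below on a shortened, hand-written copy of the table of contents, is to read only each item's own `<a>` element. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div id="toc"><ul>
<li><a href="#History"><span class="tocnumber">1</span> <span class="toctext">History</span></a></li>
<li><a href="#Syntax"><span class="tocnumber">2</span> <span class="toctext">Syntax</span></a>
<ul>
<li><a href="#Indentation"><span class="tocnumber">2.1</span> <span class="toctext">Indentation</span></a></li>
</ul>
</li>
</ul></div>
"""
toc = BeautifulSoup(html, "html.parser").find("div", attrs={"id": "toc"})

titles = []
for li in toc.find_all("li"):
    # find() returns the first <a> in document order, which is this item's
    # own link; nested items' links come later and are handled on their turn.
    link = li.find("a")
    titles.append(link.get_text(" ", strip=True))

print(titles)  # ['1 History', '2 Syntax', '2.1 Indentation']
```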
