# References

* [BS4 Quick Start](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start)

In [1]:
import re
import json

import requests
from bs4 import BeautifulSoup

In [2]:
%%html
<style>
table {float:left}
</style>

# Data

In [3]:
html_doc = """
<html>
<head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>

<ix:nonfraction 
  contextref="i531402faf1d04969ac2b2ba0e1680766_I20210403" 
  decimals="-3" 
  format="ixt:numdotdecimal" 
  id="f05f-df5e-45b4-ba6f-72638eca470f" 
  name="us-gaap:CashAndCashEquivalentsAtCarryingValue" 
  scale="3" 
  unitref="usd"
>
1,397,880
</ix:nonfraction>

</body>
</html>
"""

# Runtime

In [4]:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')

---
# find() and find_all(name, attrs, recursive, string, limit)

## Find tag(s)

In [5]:
soup.find(name='a')

<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>
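The cell above returns only the first match. A minimal sketch of how `find()` and `find_all()` differ, including the `limit` argument and what each returns when nothing matches. The `snippet` and `demo` names are illustrative and stand in for the notebook's `html_doc` and `soup`.

```python
from bs4 import BeautifulSoup

# A shortened copy of the sample document so the sketch runs on its own.
snippet = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""
demo = BeautifulSoup(snippet, "html.parser")

first = demo.find(name="a")                   # first matching tag only
first_two = demo.find_all(name="a", limit=2)  # list-like result, capped at two
missing = demo.find(name="table")             # no match: find() returns None
none_found = demo.find_all(name="table")      # no match: empty result

print(first["id"])                    # link1
print([a["id"] for a in first_two])   # ['link1', 'link2']
print(missing, list(none_found))      # None []
```

Because `find()` can return `None`, check the result before chaining attribute access onto it.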

## Search specific string(s)

* [The string argument](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-string-argument) 

> With string you can search for strings instead of tags. As with name and the keyword arguments, you can pass in a string, a regular expression, a list, a function, or the value `True`.

> The string argument is new in Beautiful Soup 4.4.0. In earlier versions it was called `text`.

In [6]:
soup.find_all(string=['Lacie', 'Elsie'])

['Elsie', 'Lacie']

In [7]:
soup.find_all(string=re.compile(r'^Lacie$|^Elsie$'))

['Elsie', 'Lacie']

In [8]:
soup.find_all(string=lambda x: len(x) > 20)

['Once upon a time there were three little sisters; and their names were\n',
 ';\nand they lived at the bottom of a well.']
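The results above look like plain strings, but each one is a `NavigableString` that still knows where it sits in the tree. The sketch below, on an illustrative one-line `snippet`, walks from a matched string back to its enclosing tag and shows why the regular expression in the earlier cell is anchored with `^` and `$`: patterns are applied as a search, so an unanchored pattern matches substrings.

```python
import re
from bs4 import BeautifulSoup, NavigableString

# Illustrative markup, not the notebook's html_doc.
snippet = '<p class="story">Names: <a id="link1">Elsie</a>, <a id="link2">Lacie</a></p>'
demo = BeautifulSoup(snippet, "html.parser")

matches = demo.find_all(string=re.compile(r"^(Elsie|Lacie)$"))

# Each hit is a NavigableString (a str subclass), not a bare str.
is_text = all(isinstance(m, NavigableString) for m in matches)

# .parent leads from the string back to the tag that contains it.
parent_ids = [m.parent["id"] for m in matches]

# An unanchored pattern matches any string that contains it.
partial = demo.find_all(string=re.compile("El"))

print(is_text)      # True
print(parent_ids)   # ['link1', 'link2']
print(partial)      # ['Elsie']
```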

## Search all strings

* [True argument value](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#true)

> The value True **matches everything it can**.

Passed as `string=True`, it matches every text string in the document, including the whitespace-only strings between tags:

In [9]:
soup.find_all(string=True)

['\n',
 '\n',
 "The Dormouse's story",
 '\n',
 '\n',
 "The Dormouse's story",
 '\n',
 'Once upon a time there were three little sisters; and their names were\n',
 'Elsie',
 ',\n',
 'Lacie',
 ' and\n',
 'Tillie',
 ';\nand they lived at the bottom of a well.',
 '\n',
 '...',
 '\n',
 '\n1,397,880\n',
 '\n',
 '\n',
 '\n']
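Most of the entries above are whitespace between tags. If only the visible text is wanted, one option is the `stripped_strings` generator, which strips each string and skips the ones that end up empty. A minimal sketch on an illustrative `snippet`:

```python
from bs4 import BeautifulSoup

# Illustrative markup with the same kind of inter-tag newlines as html_doc.
snippet = """
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">...</p>
"""
demo = BeautifulSoup(snippet, "html.parser")

raw = demo.find_all(string=True)          # includes whitespace-only strings
cleaned = list(demo.stripped_strings)     # stripped, empty strings skipped

# The same result built by hand from the raw list.
manual = [s.strip() for s in raw if s.strip()]

print(len(raw))    # 5
print(cleaned)     # ["The Dormouse's story", '...']
```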

## Search string in a tag

In [10]:
soup.a.find_all(string=True, recursive=False)

['Elsie']

In [11]:
soup.find(name='a').find_all(string=True, recursive=False)

['Elsie']
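For an `<a>` tag with a single text child, `recursive=False` makes no visible difference. The effect shows up on a tag with nested children. The sketch below uses an illustrative paragraph and compares the two modes.

```python
from bs4 import BeautifulSoup

# Illustrative markup: a paragraph with text and nested <a> tags.
snippet = '<p class="story">Once upon a time <a id="link1">Elsie</a> and <a id="link2">Lacie</a>.</p>'
demo = BeautifulSoup(snippet, "html.parser")
p = demo.p

all_strings = p.find_all(string=True)                      # descends into <a>
direct_strings = p.find_all(string=True, recursive=False)  # direct children only

# The <a> tags are children of <p>, not of the document root,
# so a non-recursive search from the root finds none of them.
top_level_links = demo.find_all("a", recursive=False)

print(all_strings)      # ['Once upon a time ', 'Elsie', ' and ', 'Lacie', '.']
print(direct_strings)   # ['Once upon a time ', ' and ', '.']
print(top_level_links)  # []
```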

## Filter with tag attribute value(s)

> Any argument that’s not recognized will be turned into **a filter on one of a tag’s attributes**. If you pass in a value for an argument called id, Beautiful Soup will filter against each tag’s ‘id’ attribute:

In [12]:
soup.find_all(name='a', id=re.compile('link1|link2'))

[<a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
 <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a>]
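The same keyword mechanism works for other attributes. A short sketch of a few common variants follows: `class_` (because `class` is a Python keyword), a regular expression on `href`, an `attrs` dictionary, and `True` to require only that the attribute exists. The `snippet` and `demo` names are illustrative.

```python
import re
from bs4 import BeautifulSoup

# The three links from the sample document, repeated so the sketch is standalone.
snippet = """
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a>
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
"""
demo = BeautifulSoup(snippet, "html.parser")

by_class = demo.find_all("a", class_="sister")                 # CSS class
by_href = demo.find_all(href=re.compile(r"/(elsie|tillie)$"))  # regex on href
by_attrs = demo.find_all("a", attrs={"id": "link2"})           # dict form
by_presence = demo.find_all(id=True)                           # any tag with an id

print(len(by_class))                  # 3
print([a["id"] for a in by_href])     # ['link1', 'link3']
print(by_attrs[0].string)             # Lacie
print(len(by_presence))               # 3
```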

## Find tags having specific attributes

In [26]:
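# `name` is find_all()'s own parameter for the tag name, so an attribute
# called "name" has to be passed through `attrs` instead of as a keyword.
tags = soup.find_all(
    # Combined with tag filters, `string` is matched against each tag's .string.
    string=re.compile(r"[0-9]+"),
    attrs={
        "name": re.compile(r"us-gaap:CashAndCashEquivalents.*"),
        "unitref": True,   # attribute must be present, any value
        "decimals": True
    }
)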

In [27]:
tags

[<ix:nonfraction contextref="i531402faf1d04969ac2b2ba0e1680766_I20210403" decimals="-3" format="ixt:numdotdecimal" id="f05f-df5e-45b4-ba6f-72638eca470f" name="us-gaap:CashAndCashEquivalentsAtCarryingValue" scale="3" unitref="usd">
 1,397,880
 </ix:nonfraction>]
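Once the tag is found, its text and attributes can be turned into a number. The sketch below makes two assumptions: the displayed text is a comma-grouped integer, and the `scale` attribute is a power-of-ten multiplier, as I understand Inline XBRL to define it. The `snippet`, `displayed` and `value` names are illustrative and are not part of the original notebook.

```python
import re
from bs4 import BeautifulSoup

# A trimmed copy of the ix:nonfraction element from the sample document.
snippet = """
<ix:nonfraction decimals="-3" format="ixt:numdotdecimal"
  name="us-gaap:CashAndCashEquivalentsAtCarryingValue" scale="3" unitref="usd">
1,397,880
</ix:nonfraction>
"""
demo = BeautifulSoup(snippet, "html.parser")

tag = demo.find(attrs={"name": re.compile(r"^us-gaap:CashAndCashEquivalents")})

# Assumption: the text is a comma-grouped integer such as "1,397,880".
displayed = int(tag.get_text(strip=True).replace(",", ""))

# Assumption: scale="3" means the displayed figure is in thousands.
scale = int(tag.get("scale", "0"))
value = displayed * 10 ** scale

print(tag.name)          # ix:nonfraction
print(displayed)         # 1397880
print(value)             # 1397880000
print(tag["unitref"])    # usd
```

Real filings may use other `format` values, decimals, or sign attributes, which this sketch does not handle.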