# Parsing only part of a document

Let’s say you want to use Beautiful Soup to look at a document’s `<a>` tags. It’s a waste of time and memory to parse the entire document and then go over it again looking for `<a>` tags. It would be much faster to ignore everything that isn’t an `<a>` tag in the first place. The ``SoupStrainer`` class lets you choose which parts of an incoming document are parsed. You just create a ``SoupStrainer`` and pass it to the ``BeautifulSoup`` constructor as the ``parse_only`` argument.

(Note that *this feature won’t work if you’re using the html5lib parser*. If you use html5lib, the whole document will be parsed, no matter what. This is because html5lib constantly rearranges the parse tree as it works, and if some part of the document didn’t actually make it into the parse tree, it’ll crash. To avoid confusion, in the examples below I’ll be forcing Beautiful Soup to use Python’s built-in parser.)
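As a minimal sketch of the idea, the snippet below parses a tiny document with Python’s built-in parser and keeps only its `<a>` tags. The markup and the names `markup`, `only_links`, and `hrefs` are made up for illustration and are not part of the examples that follow.

```python
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical markup, used only for this sketch.
markup = (
    '<p>See <a href="http://example.com/a">A</a>, <b>bold text</b> and '
    '<a href="http://example.com/b">B</a>.</p>'
)

# Keep only <a> tags while parsing; everything else is discarded.
only_links = SoupStrainer("a")
soup = BeautifulSoup(markup, "html.parser", parse_only=only_links)

# The resulting tree contains just the two links.
hrefs = [a["href"] for a in soup.find_all("a")]
print(hrefs)
```

Because the `<p>` and `<b>` tags never make it into the tree, searching the resulting soup for them finds nothing.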



## ``SoupStrainer``

The ``SoupStrainer`` class takes the same arguments as a typical method from Searching the tree: ``name``, ``attrs``, ``string``, and ``**kwargs``. Here are three ``SoupStrainer`` objects:


In [33]:
from bs4 import BeautifulSoup, SoupStrainer

html_doc = """
<!DOCTYPE html>
<html>
<head>
	<title>The Dormouse's Story</title>
</head>
<body>
	<p class="title"><b>The Dormouse's Story</b></p>

	<p class="story">Once upon a time there were three little sisters; and their names were <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>, <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a> and they lived at the bottom of a well.</p>
</body>
</html>"""

only_a_tags = SoupStrainer('a')
only_tags_with_id_link2 = SoupStrainer(id='link2')

def is_short_string(string):
    if string is None:
        return False
    return len(string) < 10

only_short_strings = SoupStrainer(string=is_short_string)

I’m going to bring back the “three sisters” document one more time, and we’ll see what the document looks like when it’s parsed with these three `SoupStrainer` objects:

In [34]:
print(BeautifulSoup(html_doc, 'html.parser', parse_only=only_a_tags).prettify())

<a class="sister" href="http://example.com/elsie" id="link1">
 Elsie
</a>
<a class="sister" href="http://example.com/lacie" id="link2">
 Lacie
</a>
<a class="sister" href="http://example.com/tillie" id="link3">
 Tillie
</a>


In [35]:
print(BeautifulSoup(html_doc, 'html.parser', parse_only=only_tags_with_id_link2).prettify())

<a class="sister" href="http://example.com/lacie" id="link2">
 Lacie
</a>


In [38]:
print(BeautifulSoup(html_doc, 'html.parser', parse_only=only_short_strings).prettify())




You can also pass a `SoupStrainer` into any of the methods covered in Searching the tree. This probably isn’t terribly useful, but I thought I’d mention it:

In [39]:
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.find_all(only_short_strings))

[]
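Since a ``SoupStrainer`` accepts the same kinds of arguments as the search methods, a tag name and an attribute filter can be combined in one strainer. The sketch below is an illustration under that assumption: it keeps only `<a>` tags whose `href` matches a regular expression. The shortened markup and the names `sisters_doc`, `lacie_only`, and `ids` are hypothetical.

```python
import re

from bs4 import BeautifulSoup, SoupStrainer

# Shortened, hypothetical version of the "three sisters" markup.
sisters_doc = """
<p class="story">
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>
</p>
"""

# Match on the tag name and on the href attribute at the same time.
lacie_only = SoupStrainer("a", href=re.compile(r"/lacie$"))
soup = BeautifulSoup(sisters_doc, "html.parser", parse_only=lacie_only)

# Only the link whose href ends in "/lacie" survives parsing.
ids = [a["id"] for a in soup.find_all("a")]
print(ids)
```

Filtering at parse time like this means the unwanted links are never built into the tree, rather than being parsed and then skipped by a later search.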
