# Soups - Editing

`soups` is a module of helper functions for working with the Beautiful Soup library.

This notebook shows the functions that edit a soup object.

# Initialization

The following code imports `soups`. It adds the directory two levels above the notebook's working directory to `sys.path` and assumes that this directory contains the `scrape` package.

In [1]:
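import os
import sys
# Despite its name, CURR_DIR is two levels above the working directory:
# abspath('..') is the parent, and dirname() steps up once more.
CURR_DIR = os.path.dirname(os.path.abspath('..'))
print('Current dir: ' + CURR_DIR)
# Make the scrape package importable from that directory.
sys.path.append(CURR_DIR)
from scrape import soups
from bs4 import BeautifulSoup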

Current dir: D:\Projects\Python\projects\scrape
Initializing scrape ...


# Editing a soup

In [2]:
page = """<html><head><title>The Dormouse's story</title></head>
    <body>
    <p class="title"><b>The Dormouse's story</b></p>

    <p class="story">Once upon a time there were three little sisters; and their names were
    <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
    <a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
    <a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>

    <p class="story">...</p>
    """

soup = soups.get_soup(page)

### Splitting and merging a soup
The function `split` returns two objects: `meta`, a Beautiful Soup object that contains all HTML outside `<body>` plus an empty `<body>`, and `body`, a tag that contains the `<body>` element itself. Both are deep copies, so editing them leaves the original soup unchanged.

In [3]:
meta, body = soups.split(soup)
print('Meta:')
print(meta)
print('\nBody:')
print(body)

Meta:
<html><head><title>The Dormouse's story</title></head>
<body></body></html>

Body:
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body>
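
To make the split and merge behaviour concrete, here is a minimal sketch of how such functions could be written with plain Beautiful Soup calls. It is not the implementation in `soups`: the names `split_sketch` and `merge_sketch` are hypothetical, and the copy-by-re-parsing approach and the choice of `html.parser` are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def split_sketch(soup):
    """Hypothetical helper: return (meta, body) as independent copies."""
    # Serializing and re-parsing yields trees that share nothing with `soup`.
    meta = BeautifulSoup(str(soup), "html.parser")
    body = BeautifulSoup(str(soup), "html.parser").body.extract()
    # Empty the <body> kept in meta so it holds only the surrounding markup.
    meta.body.clear()
    return meta, body


def merge_sketch(meta, body):
    """Hypothetical helper: return a new soup with body placed back into meta."""
    merged = BeautifulSoup(str(meta), "html.parser")
    body_copy = BeautifulSoup(str(body), "html.parser").body
    # Swap the empty <body> for the copied one.
    merged.body.replace_with(body_copy)
    return merged


html = "<html><head><title>Demo</title></head><body><p id='p1'>text</p></body></html>"
original = BeautifulSoup(html, "html.parser")
meta, body = split_sketch(original)

# Edit the copy, then merge; the original document is not affected.
body.find(id="p1").string = "TEXT"
merged = merge_sketch(meta, body)
print(original.find(id="p1").string)
print(merged.find(id="p1").string)
```

Re-parsing the serialized HTML is a simple way to guarantee that the copies are independent of the source tree, at the cost of parsing the document more than once.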


In [4]:
# Change meta and body
meta.title.string = meta.title.string.upper()
atag = body.find(id="link1")
atag.string = atag.string.upper()

new_soup = soups.merge(meta, body)
print('Old Soup:')
print(soup)

print('\nNew Soup:')
print(new_soup)

Old Soup:
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">Elsie</a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>

New Soup:
<html><head><title>THE DORMOUSE'S STORY</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
    <a class="sister" href="http://example.com/elsie" id="link1">ELSIE</a>,
    <a class="sister" href="http://example.com/lacie" id="link2">Lacie</a> and
    <a class="sister" href="http://example.com/tillie" id="link3">Tillie</a>;
    and they lived at the b