remove references to clean_html, now replaced by BeautifulSoup
stevenbird committed Sep 9, 2015
1 parent 03f3e1d commit 1d83322
Showing 2 changed files with 1 addition and 9 deletions.
3 changes: 0 additions & 3 deletions book/ch03.rst
@@ -300,9 +300,6 @@ of a blog, as shown below:
 'was', 'being', 'au', 'courant', ',', 'I', 'mentioned', 'the', 'expression',
 'DUI4XIANG4', '\u5c0d\u8c61', '("', 'boy', '/', 'girl', 'friend', '"', ...]

-..
-    >>> word_tokenize(nltk.clean_html(llog.entries[2].content[0].value))
-
 With some further work, we can write programs to create a small corpus of blog posts,
 and use this as the basis for our |NLP| work.
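
For reference, a minimal sketch of the BeautifulSoup-based replacement for the removed ``nltk.clean_html()`` call (the feed URL and parser choice here are assumptions, following the pattern used earlier in ch03):

    >>> import feedparser
    >>> from bs4 import BeautifulSoup
    >>> from nltk import word_tokenize
    >>> llog = feedparser.parse("http://languagelog.ldc.upenn.edu/nll/?feed=atom")
    >>> html = llog.entries[2].content[0].value
    >>> raw = BeautifulSoup(html, "html.parser").get_text()   # stands in for nltk.clean_html(html)
    >>> tokens = word_tokenize(raw)

``get_text()`` drops the markup and keeps the element text, which is all ``clean_html()`` provided.
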
7 changes: 1 addition & 6 deletions book/ch11.rst
@@ -544,7 +544,7 @@ the original file *using the original word processor*.

 Once we know the data is correctly formatted, we
 can write other programs to convert the data into a different format.
-The program in code-html2csv_ strips out the HTML markup using ``nltk.clean_html()``,
+The program in code-html2csv_ strips out the HTML markup using the ``BeautifulSoup`` library,
 extracts the words and their pronunciations, and generates output
 in "comma-separated value" (CSV) format.

@@ -572,11 +572,6 @@ with gzip.open(fn+".gz","wb") as f_out:
 f_out.write(bytes(s, 'UTF-8'))


-.. note::
-    For more sophisticated processing of |HTML|, use the *Beautiful Soup* package,
-    available from ``http://www.crummy.com/software/BeautifulSoup/``
-
-
 Obtaining Data from Spreadsheets and Databases
 ----------------------------------------------
