# Table of Contents
* [Plan](#Plan)
* [Variable & Function Definitions](#Variable-&-Function-Definitions)
* [Test reading in just 1 Ne 1](#Test-reading-in-just-1-Ne-1)
* [Test reading in all of 1 Ne & constructing dictionary](#Test-reading-in-all-of-1-Ne-&-constructing-dictionary)
* [Test pickling dictionary](#Test-pickling-dictionary)
* [Read 1 Ne and 2 Ne & save](#Read-1-Ne-and-2-Ne-&-save)
* [Now let's get the whole Book of Mormon and save it!](#Now-let's-get-the-whole-Book-of-Mormon-and-save-it!)


In [1]:
%%javascript
IPython.load_extensions('calico-document-tools');

<IPython.core.display.Javascript object>

In [42]:
# __future__ imports must come before any other statement in a module
from __future__ import division
from __future__ import print_function
from bs4 import BeautifulSoup
import urllib2
from collections import namedtuple
import pickle

# Plan

Purpose: Extract text of all verses in the current online Book of Mormon from lds.org

Inputs:
- List containing tuples to specify book names & # of chapters like this: [ ('1-ne', 22), ('2-ne', 33), etc.]
  - Single-chapter books such as Enos, Jarom, etc. are specified as having one chapter
- Base url for Book of Mormon: https://www.lds.org/scriptures/bofm/
  - Chapter urls are constructed like this: https://www.lds.org/scriptures/bofm/1-ne/1?lang=eng

Output: dict

    { '1-ne':[ [list for chapter 1, each element being the text for the corresponding verse], 
               [same for chapter 2], etc.
             ], 
      '2-ne':[ same type of list of chapter lists], 
      ...
    }
    
  Example access of Moroni 10:5 (we're indexing from zero so we access chp_num-1 and verse_num-1):
  
    bom_text['moro'][9][4]
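
The off-by-one bookkeeping could be wrapped in a small accessor. This is only a sketch: `get_verse` is a hypothetical helper name, and `toy_text` is a made-up stand-in with the same nesting as the planned dictionary, not real scraped text.

```python
def get_verse(bom_text, book, chapter, verse):
    """Return one verse using 1-based chapter and verse numbers.

    Hypothetical helper: the dict itself is 0-indexed, so subtract 1 from both.
    """
    if chapter < 1 or verse < 1:
        # Guard against 0 or negatives, which would otherwise silently
        # index from the end of the list.
        raise IndexError('chapter and verse numbers start at 1')
    return bom_text[book][chapter - 1][verse - 1]


# Toy stand-in: one book with two chapters of placeholder "verses".
toy_text = {'moro': [['c1 v1', 'c1 v2'], ['c2 v1', 'c2 v2', 'c2 v3']]}

print(get_verse(toy_text, 'moro', 2, 3))  # same as toy_text['moro'][1][2]
```

With the real dictionary, `get_verse(bom_text, 'moro', 10, 5)` would read the same element as `bom_text['moro'][9][4]`.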
    
**Questions**

1. How to save the final dictionary? It needs to go into a date-labeled file so I can build up a history of the text over time.
  1. http://stackoverflow.com/questions/19201290/python-how-to-read-save-dict-to-file
  2. https://wiki.python.org/moin/UsingPickle
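
One possible answer to the date-labeled file question, sketched with `pickle` and `datetime`. The names `dated_pickle_name` and `save_snapshot` are hypothetical, and the `prefix_YYMMDD.pickle` naming pattern is an assumption rather than a settled convention.

```python
import datetime
import os
import pickle
import tempfile


def dated_pickle_name(prefix, date=None):
    """Build a name like 'bom_150524.pickle' (two-digit year, month, day)."""
    if date is None:
        date = datetime.date.today()
    return '{0}_{1}.pickle'.format(prefix, date.strftime('%y%m%d'))


def save_snapshot(data, directory, prefix='bom', date=None):
    """Pickle `data` into a date-labeled file and return the file's path."""
    path = os.path.join(directory, dated_pickle_name(prefix, date))
    with open(path, 'wb') as f:
        pickle.dump(data, f)
    return path


# Usage with a throwaway directory and placeholder data.
snapshot_dir = tempfile.mkdtemp()
snapshot_path = save_snapshot({'1-ne': [['placeholder verse']]}, snapshot_dir,
                              date=datetime.date(2015, 5, 24))
print(os.path.basename(snapshot_path))  # bom_150524.pickle
```

Putting the date in the file name keeps each scrape in its own file, so earlier snapshots are never overwritten.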

# Variable & Function Definitions

In [20]:
urlbase = 'https://www.lds.org/scriptures/bofm/'
urlpostfix = '?lang=eng'
def make_url( book, chapter, urlbase='https://www.lds.org/scriptures/bofm/', urlpostfix='?lang=eng'):
    return urlbase + book + '/' + str(chapter) + urlpostfix

In [21]:
make_url('1-ne',1)

'https://www.lds.org/scriptures/bofm/1-ne/1?lang=eng'

In [72]:
Book = namedtuple('Book', 'book num_chps')
bookofmormon = [Book('1-ne', 22),
                Book('2-ne', 33),
                Book('jacob', 7),
                Book('enos', 1),
                Book('jarom', 1),
                Book('omni', 1),
                Book('w-of-m', 1),
                Book('mosiah', 29),
                Book('alma', 63),
                Book('hel', 16),
                Book('3-ne', 30),
                Book('4-ne', 1),
                Book('morm', 9),
                Book('ether', 15),
                Book('moro', 10)]

print(len(bookofmormon), bookofmormon[0], bookofmormon[3].book, bookofmormon[3].num_chps, bookofmormon[3][1])

15 Book(book='1-ne', num_chps=22) enos 1 1


In [31]:
print(make_url(bookofmormon[0].book,1))

https://www.lds.org/scriptures/bofm/1-ne/1?lang=eng
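
As a quick check on the book list and the url builder together, the sketch below enumerates every chapter url without touching the network. `all_chapter_urls` is a hypothetical helper; `Book`, `bookofmormon`, and `make_url` are repeated from the cells above only so the block runs on its own.

```python
from collections import namedtuple

Book = namedtuple('Book', 'book num_chps')

# Same (name, chapter count) pairs as the list defined above.
bookofmormon = [Book('1-ne', 22), Book('2-ne', 33), Book('jacob', 7),
                Book('enos', 1), Book('jarom', 1), Book('omni', 1),
                Book('w-of-m', 1), Book('mosiah', 29), Book('alma', 63),
                Book('hel', 16), Book('3-ne', 30), Book('4-ne', 1),
                Book('morm', 9), Book('ether', 15), Book('moro', 10)]


def make_url(book, chapter,
             urlbase='https://www.lds.org/scriptures/bofm/',
             urlpostfix='?lang=eng'):
    return urlbase + book + '/' + str(chapter) + urlpostfix


def all_chapter_urls(books):
    """Yield (book, chapter number, url) for every chapter, counting from 1."""
    for b in books:
        for chpnum in range(1, b.num_chps + 1):
            yield b.book, chpnum, make_url(b.book, chpnum)


chapter_urls = list(all_chapter_urls(bookofmormon))
print(len(chapter_urls))      # 239 chapters in total
print(chapter_urls[0][2])     # https://www.lds.org/scriptures/bofm/1-ne/1?lang=eng
print(chapter_urls[-1][2])    # https://www.lds.org/scriptures/bofm/moro/10?lang=eng
```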


In [33]:
def extract_verse_text(verse_html, verbose=False):
    '''Concatenate the text of one verse's p tag, skipping span and sup children.'''
    s = ''
    for i, item in enumerate(verse_html):
        if verbose:
            print(i, item)
        if item.name in ['span', 'sup']:
            # span and sup children contribute nothing to the verse text
            if verbose:
                print('   *** IS span OR sup SO SKIP***')
        elif item.name == 'a':   # compare strings with ==, not `is`
            temp = item.get('class', [])   # an <a> with no class attribute gives []
            if verbose:
                print('   --- ', temp, '---', item.text, item.string)
            if 'footnote' in temp:
                # keep the linked word(s) of a footnote anchor
                if verbose:
                    print('      --- THIS IS A FOOTNOTE ---')
                s += item.text
        else:
            s += item.string
    return s

def extract_verse_text_in_chapter_from_verses_html(verses_html):
    '''
    Purpose: given the html for the p tags in div class='verses' from a church chapter
    web page, extract the text in all verses.
    
    Arguments:
      verses_html - list containing all of the p tags
    
    Returns:
      list of strings, with each string containing the text of the corresponding verse
    '''
    verses = []
    for verse_html in verses_html:
        verses.append(extract_verse_text(verse_html))
    return verses

def extract_verse_text_in_chapter_from_url(chapter_url):
    '''
    Purpose: given a url to a chapter of text in one of the lds standard works, extract
    the text in all verses.
    
    Arguments:
      chapter_url - string containing the url of the chapter from which to extract text
    
    Returns:
      list of strings, with each string containing the text of the corresponding verse
    '''
    webpage = urllib2.urlopen(chapter_url)
    # Name the parser explicitly so results do not depend on which parsers are installed
    soup = BeautifulSoup(webpage, 'html.parser')
    verses_div = soup.find_all('div', attrs={'class': 'verses'})
    verses_html = verses_div[0].find_all('p')
    return extract_verse_text_in_chapter_from_verses_html(verses_html)
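
To try the skip-span-and-sup rule without hitting the website, here is a rough offline sketch. It swaps in the standard library's `HTMLParser` instead of BeautifulSoup, so it is not the function above; it also keeps the text of every `a` tag rather than only those with the `footnote` class. `VerseTextParser`, `verse_text`, and the `sample` markup are all hypothetical, and the real page markup may differ.

```python
try:
    from html.parser import HTMLParser   # Python 3
except ImportError:
    from HTMLParser import HTMLParser    # Python 2


class VerseTextParser(HTMLParser):
    """Collect text, ignoring anything inside <span> or <sup> tags."""
    SKIPPED = ('span', 'sup')

    def __init__(self):
        HTMLParser.__init__(self)
        self.skip_depth = 0   # greater than 0 while inside a skipped tag
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def verse_text(verse_html):
    """Return the text of one verse's markup with span/sup content removed."""
    parser = VerseTextParser()
    parser.feed(verse_html)
    parser.close()
    return ''.join(parser.parts)


# Made-up markup for illustration only.
sample = ('<p><span class="verse">1 </span>I, Nephi, having been born of '
          '<sup>a</sup><a class="footnote" href="#">goodly</a> parents</p>')
print(verse_text(sample))  # I, Nephi, having been born of goodly parents
```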

# Test reading in just 1 Ne 1

In [34]:
tempurl = make_url(bookofmormon[0].book,1)
tempresult = extract_verse_text_in_chapter_from_url(tempurl)
tempresult

[u'I, Nephi, having been born of goodly parents, therefore I was taught somewhat in all the learning of my father; and having seen many afflictions in the course of my days, nevertheless, having been highly favored of the Lord in all my days; yea, having had a great knowledge of the goodness and the mysteries of God, therefore I make a record of my proceedings in my days.',
 u'Yea, I make a record in the language of my father, which consists of the learning of the Jews and the language of the Egyptians.',
 u'And I know that the record which I make is true; and I make it with mine own hand; and I make it according to my knowledge.',
 u'For it came to pass in the commencement of the first year of the reign of Zedekiah, king of Judah, (my father, Lehi, having dwelt at Jerusalem in all his days); and in that same year there came many prophets, prophesying unto the people that they must repent, or the great city Jerusalem must be destroyed.',
 u'Wherefore it came to pass that my father, Leh

# Test reading in all of 1 Ne & constructing dictionary

In [35]:
booktext = []
for i in range(bookofmormon[0].num_chps):
    chpnum = i+1
    tempurl = make_url(bookofmormon[0].book,chpnum)
    tempresult = extract_verse_text_in_chapter_from_url(tempurl)
    booktext.append(tempresult)

len(booktext)

22

In [36]:
booktext[0]

[u'I, Nephi, having been born of goodly parents, therefore I was taught somewhat in all the learning of my father; and having seen many afflictions in the course of my days, nevertheless, having been highly favored of the Lord in all my days; yea, having had a great knowledge of the goodness and the mysteries of God, therefore I make a record of my proceedings in my days.',
 u'Yea, I make a record in the language of my father, which consists of the learning of the Jews and the language of the Egyptians.',
 u'And I know that the record which I make is true; and I make it with mine own hand; and I make it according to my knowledge.',
 u'For it came to pass in the commencement of the first year of the reign of Zedekiah, king of Judah, (my father, Lehi, having dwelt at Jerusalem in all his days); and in that same year there came many prophets, prophesying unto the people that they must repent, or the great city Jerusalem must be destroyed.',
 u'Wherefore it came to pass that my father, Leh

In [37]:
bom_text = { bookofmormon[0].book:booktext }

In [39]:
bom_text['1-ne'][0]

[u'I, Nephi, having been born of goodly parents, therefore I was taught somewhat in all the learning of my father; and having seen many afflictions in the course of my days, nevertheless, having been highly favored of the Lord in all my days; yea, having had a great knowledge of the goodness and the mysteries of God, therefore I make a record of my proceedings in my days.',
 u'Yea, I make a record in the language of my father, which consists of the learning of the Jews and the language of the Egyptians.',
 u'And I know that the record which I make is true; and I make it with mine own hand; and I make it according to my knowledge.',
 u'For it came to pass in the commencement of the first year of the reign of Zedekiah, king of Judah, (my father, Lehi, having dwelt at Jerusalem in all his days); and in that same year there came many prophets, prophesying unto the people that they must repent, or the great city Jerusalem must be destroyed.',
 u'Wherefore it came to pass that my father, Leh

In [40]:
bom_text['1-ne'][1]

[u'For behold, it came to pass that the Lord spake unto my father, yea, even in a dream, and said unto him: Blessed art thou Lehi, because of the things which thou hast done; and because thou hast been faithful and declared unto this people the things which I commanded thee, behold, they seek to take away thy life.',
 u'And it came to pass that the Lord commanded my father, even in a dream, that he should take his family and depart into the wilderness.',
 u'And it came to pass that he was obedient unto the word of the Lord, wherefore he did as the Lord commanded him.',
 u'And it came to pass that he departed into the wilderness. And he left his house, and the land of his inheritance, and his gold, and his silver, and his precious things, and took nothing with him, save it were his family, and provisions, and tents, and departed into the wilderness.',
 u'And he came down by the borders near the shore of the Red Sea; and he traveled in the wilderness in the borders which are nearer the R

In [41]:
bom_text['1-ne'][1][1]

u'And it came to pass that the Lord commanded my father, even in a dream, that he should take his family and depart into the wilderness.'

In [48]:
bom_text['1-ne'][21][30]

u'Wherefore, ye need not suppose that I and my father are the only ones that have testified, and also taught them. Wherefore, if ye shall be obedient to the commandments, and endure to the end, ye shall be saved at the last day. And thus it is. Amen.'
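
Spot checks like the ones above only look at a few verses. A cheap way to look at the whole structure might be to count verses per chapter and flag chapters that came back empty. `verse_counts` and `find_empty_chapters` are hypothetical helpers, shown here on made-up placeholder data.

```python
def verse_counts(bom_text):
    """Map each book name to a list of verse counts, one entry per chapter."""
    return dict((book, [len(chapter) for chapter in chapters])
                for book, chapters in bom_text.items())


def find_empty_chapters(bom_text):
    """Return (book, chapter number) pairs whose verse list came back empty."""
    return [(book, i + 1)
            for book, chapters in sorted(bom_text.items())
            for i, chapter in enumerate(chapters)
            if len(chapter) == 0]


# Placeholder data; the second 'jarom' chapter is deliberately empty.
toy_text = {'enos': [['v1', 'v2', 'v3']],
            'jarom': [['v1', 'v2'], []]}

print(verse_counts(toy_text)['enos'])   # [3]
print(find_empty_chapters(toy_text))    # [('jarom', 2)]
```

An empty chapter would suggest the page layout did not match what the extraction code expects.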

# Test pickling dictionary 

In [44]:
with open("test_save.pickle", "wb") as f:
    pickle.dump(bom_text, f)

In [45]:
with open("test_save.pickle", "rb") as f:
    test_read_pickle = pickle.load(f)

In [46]:
test_read_pickle['1-ne'][1][1]

u'And it came to pass that the Lord commanded my father, even in a dream, that he should take his family and depart into the wilderness.'

In [49]:
test_read_pickle['1-ne'][21][30]

u'Wherefore, ye need not suppose that I and my father are the only ones that have testified, and also taught them. Wherefore, if ye shall be obedient to the commandments, and endure to the end, ye shall be saved at the last day. And thus it is. Amen.'

**Works great!**
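
Comparing a couple of verses is a good smoke test; comparing the whole structure is a stricter one. Below is a minimal sketch of a full round-trip check. `pickle_round_trip` is a hypothetical helper, and the data is a small made-up stand-in for `bom_text`.

```python
import os
import pickle
import tempfile


def pickle_round_trip(data):
    """Dump `data` to a temporary file, load it back, and return the copy."""
    fd, path = tempfile.mkstemp(suffix='.pickle')
    os.close(fd)   # reopen by name below so the with-blocks own the handles
    try:
        with open(path, 'wb') as f:
            pickle.dump(data, f)
        with open(path, 'rb') as f:
            return pickle.load(f)
    finally:
        os.remove(path)   # clean up the temporary file either way


# Made-up stand-in, including a non-ASCII character (an em dash escape).
toy_text = {'1-ne': [[u'verse one', u'verse two'], [u'baptism\u2014yea']]}
restored = pickle_round_trip(toy_text)
print(restored == toy_text)   # True
```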

# Read 1 Ne and 2 Ne & save

In [52]:
def get_book_text(bookinfo):
    booktext = []
    for i in range(bookinfo.num_chps):
        chpnum = i+1
        tempresult = extract_verse_text_in_chapter_from_url( make_url(bookinfo.book, chpnum) )
        booktext.append(tempresult)
    return booktext

In [56]:
bom_text = {}
for i in range(2):
    print(i, bookofmormon[i].book, ' starting...')
    bom_text[bookofmormon[i].book] = get_book_text(bookofmormon[i])

0 1-ne  starting...
1 2-ne  starting...


In [57]:
print(len(bom_text))
print(bom_text.keys())

2
['2-ne', '1-ne']
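
The keys print as `['2-ne', '1-ne']` because a plain `dict` in Python 2 does not keep insertion order. If canonical book order matters when iterating, one option is to walk the book list instead of the dict, or to build an `OrderedDict`. A sketch, with `in_canonical_order` as a hypothetical helper and placeholder contents:

```python
from collections import OrderedDict, namedtuple

Book = namedtuple('Book', 'book num_chps')
books = [Book('1-ne', 22), Book('2-ne', 33)]

# Placeholder contents keyed by book name, written out of order on purpose.
toy_text = {'2-ne': [['placeholder']], '1-ne': [['placeholder']]}


def in_canonical_order(bom_text, books):
    """Return an OrderedDict of bom_text that follows the order of `books`."""
    return OrderedDict((b.book, bom_text[b.book])
                       for b in books if b.book in bom_text)


ordered = in_canonical_order(toy_text, books)
print(list(ordered.keys()))   # ['1-ne', '2-ne']
```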


In [58]:
bom_text['2-ne'][30][12]

u'Wherefore, my beloved brethren, I know that if ye shall follow the Son, with full purpose of heart, acting no hypocrisy and no deception before God, but with real intent, repenting of your sins, witnessing unto the Father that ye are willing to take upon you the name of Christ, by baptism\u2014yea, by following your Lord and your Savior down into the water, according to his word, behold, then shall ye receive the Holy Ghost; yea, then cometh the baptism of fire and of the Holy Ghost; and then can ye speak with the tongue of angels, and shout praises unto the Holy One of Israel.'

In [59]:
with open("test_save_1ne2ne.pickle", "wb") as f:
    pickle.dump(bom_text, f)

In [60]:
with open("test_save_1ne2ne.pickle", "rb") as f:
    test_read_pickle = pickle.load(f)
test_read_pickle['2-ne'][30][12]

u'Wherefore, my beloved brethren, I know that if ye shall follow the Son, with full purpose of heart, acting no hypocrisy and no deception before God, but with real intent, repenting of your sins, witnessing unto the Father that ye are willing to take upon you the name of Christ, by baptism\u2014yea, by following your Lord and your Savior down into the water, according to his word, behold, then shall ye receive the Holy Ghost; yea, then cometh the baptism of fire and of the Holy Ghost; and then can ye speak with the tongue of angels, and shout praises unto the Holy One of Israel.'

**Works great!**

# Now let's get the whole Book of Mormon and save it!

In [61]:
len(bookofmormon)

15

In [64]:
for i,b in enumerate(bookofmormon):
    print(i, b)

0 Book(book='1-ne', num_chps=22)
1 Book(book='2-ne', num_chps=33)
2 Book(book='jacob', num_chps=7)
3 Book(book='enos', num_chps=1)
4 Book(book='jarom', num_chps=1)
5 Book(book='omni', num_chps=1)
6 Book(book='w-of-m', num_chps=1)
7 Book(book='mosiah', num_chps=29)
8 Book(book='alma', num_chps=63)
9 Book(book='hel', num_chps=16)
10 Book(book='3-ne', num_chps=30)
11 Book(book='4-ne', num_chps=1)
12 Book(book='morm', num_chps=9)
13 Book(book='ether', num_chps=15)
14 Book(book='moro', num_chps=10)


In [77]:
bom_text = {}
for b in bookofmormon:
    print(b.book, ' starting...')
    bom_text[b.book] = get_book_text(b)

1-ne  starting...
2-ne  starting...
jacob  starting...
enos  starting...
jarom  starting...
omni  starting...
w-of-m  starting...
mosiah  starting...
alma  starting...
hel  starting...
3-ne  starting...
4-ne  starting...
morm  starting...
ether  starting...
moro  starting...


In [79]:
with open("bom_150524.pickle", "wb") as f:
    pickle.dump(bom_text, f)

In [82]:
with open("bom_150524.pickle", "rb") as f:
    test_read_pickle = pickle.load(f)
test_read_pickle['moro'][9][4]

u'And by the power of the Holy Ghost ye may know the truth of all things.'

In [81]:
bom_text['moro'][9][4]

u'And by the power of the Holy Ghost ye may know the truth of all things.'
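
Since the point of date-labeled snapshots is to track the text over time, a later step might compare two saved dictionaries verse by verse. The following is only a sketch under that assumption: `diff_snapshots` is a hypothetical helper, and the two small dictionaries are invented examples, not real snapshots.

```python
def diff_snapshots(old, new):
    """List (book, chapter, verse, old_text, new_text) for verses that differ.

    Chapter and verse numbers are 1-based. A verse missing from one side
    is reported with None on that side.
    """
    changes = []
    for book in sorted(set(old) | set(new)):
        old_chapters = old.get(book, [])
        new_chapters = new.get(book, [])
        for c in range(max(len(old_chapters), len(new_chapters))):
            old_verses = old_chapters[c] if c < len(old_chapters) else []
            new_verses = new_chapters[c] if c < len(new_chapters) else []
            for v in range(max(len(old_verses), len(new_verses))):
                old_v = old_verses[v] if v < len(old_verses) else None
                new_v = new_verses[v] if v < len(new_verses) else None
                if old_v != new_v:
                    changes.append((book, c + 1, v + 1, old_v, new_v))
    return changes


# Invented data: one verse reworded and one verse added.
older = {'enos': [['first verse', 'second verse']]}
newer = {'enos': [['first verse', 'second verse, revised', 'third verse']]}
for change in diff_snapshots(older, newer):
    print(change)
```

Two snapshots loaded from their pickle files could be passed in the same way; an empty list would mean no verse changed between the two dates.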