## Book -- Basic Usage

- Make sure you open this notebook in the same folder as `Book.py`.

In [1]:
# import the Book class into the IPython environment
from Book import Book

In [2]:
# create an instance of the Book class by specifying the bookname, date, and creator
book = Book('test', '2018-07-02', 'MF')

In [3]:
book.bookname, book.date, book.creator

('test', datetime.datetime(2018, 7, 2, 0, 0), 'MF')

`bookname` and `creator` are `str`, `date` is a `datetime` object.
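The `datetime` value shown above matches what Python's standard library produces when parsing a date string of this shape. A minimal sketch, assuming `Book` converts the string with the `%Y-%m-%d` format (the source of `Book.py` is not shown here, so the format string is an assumption):

```python
from datetime import datetime

# Assumption: Book parses its date argument with a format like this one.
parsed = datetime.strptime("2018-07-02", "%Y-%m-%d")

# Same value as book.date in the output cell above
print(repr(parsed))
```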

### Fetch Data

A `Book` instance can be used to scrape the Han-Ji website.

In [None]:
# make sure to pass the first page of the Han-Ji book you want to scrape
book.fetch_data('(First webpage URL to a Han-Ji book)',
                pages_limit=1000, print_bookmark=True,)

While the data is being fetched, `print_bookmark` tells you which page you have reached.

After fetching the data from the web, you can write the HTML pages to a folder such as `data/`.

In [None]:
# write the htmls into a folder
book.write_htmls(path="data", html_cutoff=True) # html_cutoff ensures you only get the text itself and a bookmark

Then, the next time, you can load the htmls from `data/` instead of fetching them again.

In [4]:
# load the saved files into the book
book.load_htmls(path="data")

INFO:root:Stop at loading data/test_0582.html.
INFO:root:Total length of the data is 582.
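The log output above is consistent with files named `<bookname>_<4-digit index>.html` being read in order until one is missing. The sketch below illustrates that pattern only; `load_numbered_htmls` is a hypothetical helper, and the naming scheme and zero-based numbering are assumptions inferred from the log message, not taken from `Book.py`.

```python
import tempfile
from pathlib import Path


def load_numbered_htmls(folder, bookname):
    """Read bookname_0000.html, bookname_0001.html, ... until a file is missing.

    Hypothetical helper: the file naming is an assumption, not Book's actual code.
    """
    pages = []
    index = 0
    while True:
        path = Path(folder) / f"{bookname}_{index:04d}.html"
        if not path.exists():
            break  # first missing index marks the end of the book
        pages.append(path.read_text(encoding="utf-8"))
        index += 1
    return pages


# Demonstrate with three small files in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        (Path(tmp) / f"test_{i:04d}.html").write_text(f"<p>page {i}</p>", encoding="utf-8")
    demo_pages = load_numbered_htmls(tmp, "test")

print(len(demo_pages))
```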


### Pretty Print

It is also possible to pretty print the html source page directly in your output cell.

In [5]:
book.pretty_print(7) # give the function the index of the page you want to show

0
...42...


### Flat Bodies

`book.flat_bodies` stores a list of the `bs4` objects you scraped from the Han-Ji book. Each index in `flat_bodies` corresponds to the position of a page in your book.

In [6]:
# The piece of work at index 7 in the book instance is 白雉詩
book.flat_bodies[7].find("h3")

<h3>　　　　白雉詩</h3>

You can use `flat_bodies` to further decompose the tree structure of the Han-Ji book you are interested in.

### Bookmark 

`book.paths` stores a list of bookmarks, in the same order as the pieces of work in your book.

In [7]:
# get the bookmarks from source page
book.extract_paths()

In [8]:
book.paths[7]

'集／總集／文選／卷第一\u3000賦甲之一／京都上之一／班孟堅兩都賦二首／東都賦／白雉詩(P.42)'

The bookmark of the piece of work at index 7 shows the full hierarchical structure of that work within the scraped book.
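Because a bookmark is a plain string, its hierarchy can be recovered with standard string operations. The following is a minimal sketch using only the bookmark shown above; the assumption that every bookmark uses the fullwidth slash `／` as a separator and ends in a `(P.n)` page marker comes from this one example and may not hold for every book.

```python
import re

# The bookmark string from book.paths[7] above
bookmark = "集／總集／文選／卷第一\u3000賦甲之一／京都上之一／班孟堅兩都賦二首／東都賦／白雉詩(P.42)"

# Split on the fullwidth slash to get one entry per hierarchy level
levels = bookmark.split("／")

# Assumption: the last level ends with a page marker such as "(P.42)"
match = re.search(r"\(P\.(\d+)\)$", levels[-1])
page = int(match.group(1)) if match else None
title = re.sub(r"\(P\.\d+\)$", "", levels[-1])

print(len(levels), title, page)
```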

### Other Class Methods

In [9]:
# __repr__ shows a brief summary of the instance
book

       type       variable                 method current_length
0      meta      flat_meta      self.extract_meta              0
1      path          paths     self.extract_paths            582
2  passages  flat_passages  self.extract_passages              0

In the `path` row (index 1), you can see that `book.paths` holds 582 bookmarks.

In [10]:
# __len__ shows the number of works in the book instance
len(book)

582

## HTML Highlighting in `flat_bodies`

In [11]:
# __getitem__, or indexing, gives you the syntax-highlighted html page from Han-Ji
book[6]
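For readers unfamiliar with the special methods used above, here is a minimal sketch of how `__repr__`, `__len__`, and `__getitem__` hook into `repr()`, `len()`, and indexing. `TinyBook` is a hypothetical stand-in written for illustration; it is not the implementation in `Book.py`, and unlike `Book` it returns the raw string rather than syntax-highlighted HTML.

```python
class TinyBook:
    """Hypothetical stand-in for Book, for illustration only."""

    def __init__(self, bookname, bodies):
        self.bookname = bookname
        self.flat_bodies = list(bodies)

    def __repr__(self):
        # Called when the instance is echoed in an output cell
        return f"TinyBook({self.bookname!r}, pages={len(self)})"

    def __len__(self):
        # Called by len(instance)
        return len(self.flat_bodies)

    def __getitem__(self, index):
        # Called by instance[index]; the real Book highlights the HTML instead
        return self.flat_bodies[index]


tiny = TinyBook("test", ["<h3>a</h3>", "<h3>b</h3>"])
print(repr(tiny), len(tiny), tiny[1])
```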