<a href="https://colab.research.google.com/github/jasmeet0817/booklm/blob/main/booklm_usage.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Setup

In [None]:
!pip install -U sentence-transformers
!pip install llama-index
!pip install ebooklib

In [2]:
from google.colab import drive
drive.mount('data')

Mounted at data


In [3]:
DATA_FOLDER = '/content/data/MyDrive/Colab Notebooks/book-llm/data/'

In [4]:
from sentence_transformers import SentenceTransformer

model_load_path = DATA_FOLDER + 'finetuned_bge_small_v3'
model = SentenceTransformer(model_load_path)

## Read book

In [5]:
PARAGRAPH_GROUP_COUNT = 75

class ParagraphGroup:
    def __init__(self, chapter_index, paragraphs, paragraph_index):
        self.chapter_index = chapter_index
        self.paragraphs = paragraphs
        self.paragraphs_content = "\n".join(paragraphs)
        self.paragraph_index = paragraph_index
        # self.paragraph_embeddings = openai_embedding(paragraphs_content)

    def __str__(self):
        return self.paragraphs_content

class Chapter:
    def __init__(self, index, chapter_content, paragraphs):
        self.index = index
        self.chapter_content = chapter_content
        self.paragraph_groups = []

        split_paragraphs = [paragraphs[i:i + PARAGRAPH_GROUP_COUNT] for i in range(0, len(paragraphs), PARAGRAPH_GROUP_COUNT)]
        for split_paragraph in split_paragraphs:
            self.add_paragraphs_(split_paragraph)

    def __str__(self):
        return self.chapter_content

    def add_paragraphs_(self, paragraphs):
        if len(paragraphs) == 0:
            return
        paragraph_group = ParagraphGroup(self.index, paragraphs, len(self.paragraph_groups))
        self.paragraph_groups.append(paragraph_group)

class Book:
    def __init__(self):
        self.chapters = []

    def add_chapter(self, chapters_content, paragraphs):
        self.chapters.append(Chapter(len(self.chapters), chapters_content, paragraphs))
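
`Chapter.__init__` splits a chapter's paragraphs into consecutive slices of at most `PARAGRAPH_GROUP_COUNT` items, so the last group may be shorter than the rest. The sketch below isolates that slicing step. `GROUP_SIZE` and `split_into_groups` are hypothetical names introduced only for illustration, and the group size is kept small so the result is easy to read.

```python
# Hypothetical stand-in for PARAGRAPH_GROUP_COUNT, kept small for illustration.
GROUP_SIZE = 3

def split_into_groups(paragraphs, group_size):
    """Split a list into consecutive slices of at most group_size items."""
    return [paragraphs[i:i + group_size] for i in range(0, len(paragraphs), group_size)]

# Seven placeholder paragraphs: two full groups plus one leftover paragraph.
paragraphs = [f"paragraph {n}" for n in range(7)]
groups = split_into_groups(paragraphs, GROUP_SIZE)
group_sizes = [len(group) for group in groups]
print(group_sizes)  # [3, 3, 1]
```

An empty paragraph list yields no groups at all, which is why `add_paragraphs_` never receives an empty slice from this comprehension.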

In [6]:
import ebooklib
from ebooklib import epub
import bs4

def read_until_string(file_path, search_str=None):
    book = Book()
    book_content = epub.read_epub(DATA_FOLDER + file_path)
    word_count = 0
    for item in book_content.get_items_of_type(ebooklib.ITEM_DOCUMENT):
        soup = bs4.BeautifulSoup(item.get_content(), 'html.parser')
        paragraphs = [paragraph.get_text() for paragraph in soup.find_all('p')]
        chapter_content = soup.get_text()
        # Count whitespace-separated words; len(chapter_content) would count characters.
        word_count += len(chapter_content.split())
        if search_str is not None:
            index = chapter_content.find(search_str)
            if index != -1:
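                # Search string found: truncate the chapter text and stop reading the book.
                # Note that `paragraphs` is not truncated, so this chapter's paragraph
                # groups can still include text after search_str.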
                chapter_content = chapter_content[:index + len(search_str)]
                book.add_chapter(chapter_content, paragraphs)
                break
        book.add_chapter(chapter_content, paragraphs)
    return book

book = read_until_string('waybound.epub', 'I had it in my hands! Why did I have to give it back?')



## Test semantics

In [13]:
from sentence_transformers import util

def get_detailed_instruct(task_description: str, query: str) -> str:
    return f'Instruct: {task_description}\nQuery: {query}'

# The instruct format expects a one-sentence instruction that describes the task, e.g.
# task = 'Given a web search query, retrieve relevant passages that answer the query'
# It is left empty here, so each query is encoded as 'Instruct: \nQuery: <query>'.
task = ''
queries = [
    get_detailed_instruct(task, 'What did Reigan Shen create out of Subject one\' core binding?'),
    get_detailed_instruct(task, 'Who is Ozmanthus Arelius?'),
    get_detailed_instruct(task, 'Did Reigan Shen do his own Soulsmithing?'),
]
# No need to add instruction for retrieval documents
passages = [
    """Reigan Shen didn’t do his own Soulsmithing. He had people for that.\nBut the skills of Ozmanthus Arelius, one of the greatest Soulsmiths of all time, still flowed through his mind and spirit. Instincts honed by years of practice, the insight of a genius, and decades if not centuries of weapons-crafting experience now lurked inside Reigan Shen. Now and then, he even felt a shadow of the human’s arrogance bubbling up.\nIt was the one thing he appreciated about the man.\nThe core binding of Subject One was too valuable a material for Reigan to trust to others, but it was also unique and irreplaceable, and thus unsuitable for amateurs. His teams of expert Soulsmiths had labored ceaselessly for days while he breathed down their necks, giving them direction filtered through the talents of his greatest enemy.\nThey finally turned it into the form he wanted, and they had certainly earned their reputations. If they weren’t fine craftsmen, he wouldn’t have retained their services in the first place. Even his memories of Ozmanthus approved.\nThe Wraith Horn—which was his current working title for the wide-mouthed trumpet made from Subject One’s binding—was carved with delicate swirls until it resembled a seashell. It looked like it had grown into the shape of a horn by natural forces rather than design.\nIt was a pure, smooth gray-white, like most of Subject One’s body, but Reigan could only admire its surface in brief stints. The treasure warped the air around it by the mere weight of its existence, so he usually had to keep it sealed away.\nThe Horn had several applications, as expected from the product of such a fine material. But one was of most interest to him at the moment. He could send a call through it, which would reach the other Dreadgods as though it came from the Slumbering Wraith itself.\nIntelligent as they now were, they might be aware it was a trap, but this spoke to their instincts. 
They would follow its lure.\nThis was the leash he had placed on the Dreadgods.""" +
    """ His current plan.\nHis first plan, to gain the power of a Dreadgod for himself, had been ruined by the very man whose Soulsmithing skill now infused Reigan’s spirit.\nReigan looked over the distant Sacred Valley and reflected on how much time and money he’d wasted.\nIf only he’d known who Eithan was. Reigan Shen would have been Tiberian’s best friend. He might have even followed the man’s plan; there were ways to turn forced ascension to an advantage.\nBut now wasn’t the time for regret. Now was the time for desperate survival.\nHe had no need to call the Dreadgods now. They were headed where he wanted them anyway: for Lindon and Yerin Arelius. No matter what else he did, he needed those two gone. His greatest nightmare was that they could return centuries later as Ozmanthus had, in disguise, but he suspected that had only been allowed because the Arelius Patriarch had disguised himself as one of his own descendants.\nHe needed the rest of them out of Cradle before they left any little humans behind.\nNow the Weeping Dragon was going to do his job for him, but Reigan Shen needed to make sure everything went according to plan. Then again, this plan was already going wrong.\nHe couldn’t sense Lindon anywhere.\nThere was a barrier around Sacred Valley, projected by the great labyrinth, and he had expected Lindon to be waiting behind it. He didn’t sense as much, but that told him little. No matter what detection methods he used, there was always the possibility that Lindon had come up with a way to hide from him.\nHe had requested each Monarch tell him what Lindon had stolen from them, but no one had cooperated. They might even know where Lindon was, but they hadn’t shared that with him either. 
As far as he knew, Lindon could be almost anywhere and could have access to practically anything.\nAnd from Reigan Shen, he had stolen the core to a pocket world.\nReigan had to assume that Lindon was tucked away somewhere in a space that had been time-warped to the extreme. Days could be passing every second.\nIn the worst-case scenario, half a dozen Monarchs could burst out at any moment. They could swarm the Weeping Dragon and from its corpse fashion a weapon to slay Reigan Shen.\nThat was monumentally unlikely. For one thing, they didn’t have Eithan leading them forward now, so they were far more likely to run into one of the thousand potential roadblocks to advancement.\nIf it was so easy to manufacture Monarchs, someone would have done it already.\nThen again, Lindon had access to the labyrinth, with all its unexplored secrets. He had the unlimited consumption powers of Subject One, an unknown number of resources and hidden projects stolen from Monarchs, and—perhaps worst of all—guidance from Ozmanthus Arelius.\nA feeling of smug arrogance drifted up from the Soulsmith inheritance inside Reigan, and he had to force it down.\nAs much as he tried to convince himself that advancing multiple people to Monarch at once was impossible, Reigan Shen had the uncomfortable premonition that it might really happen.\nHe needed to take action immediately, but first he floated in the sky for a long moment, considering his options.\nLindon would have preparations against attacks, and Reigan was more than familiar with the capabilities of the labyrinth. With that under his control, Lindon could have any number of nasty surprises ready and waiting.\nWhat if Lindon wasn’t in Sacred Valley at all?\nWhat if he was?\nReigan could break down the barrier Lindon had left around the Valley, given enough time, but was that the best way to pressure him?\nHe needed to corner Lindon. To run the young man out of energy, focus, and time. 
To exhaust him so he couldn’t face the Weeping Dragon.\nPack tactics. Cut off the prey’s escape routes and run it into the ground, until it collapsed from exhaustion and waited to be eaten. A hunt worthy of a lion.\nHe only had to pull Lindon out.\n* * *\nLindon found Ziel seated in a cycling position in front of the Paths of Heaven, which was what Dross called the eight rooms filled with illusions of the ancient Abidan.\nSeven of the Paths were dormant, their constructs inactive. With no illusions, they were nothing but plain three-sided rooms of white stone.\nOnly one of the displays was activated: the second one from the left, with the symbol that reminded Lindon strangely of the Wandering Titan. It displayed a pure, shining blue wall, and it radiated authority that suggested an unbreakable shield.\nDespite the feeling of protection and security it generated, Lindon still couldn’t regard the display directly for long. Even this replica was too far beyond him. Staring at the real thing long enough to make it had almost made him pass out.\n“I’ll reach the peak of Archlord soon,” Ziel said, without turning around. “Thought I’d prepare myself early.”\nHis worn gray cloak spread out over his shoulders and onto the ground behind him, displaying the symbol that resembled spread wings. The emblem of the Dawnwing Sect.\n“You’re close,” Lindon said.\n“I’m on the edge of something, but I still need one last step. Like stepping off a cliff.”\nLindon remembered his own first contact with an Icon and nodded. It had taken him new insight into himself to touch the Void Icon, but from everything he had come to learn, it wasn’t about understanding alone.\n“It takes action to trigger,” Lindon said. “What Icon is it?”\nZiel deactivated the Paths of Heaven display, and both Lindon and Dross let out a relieved breath. He stood, brushing himself off without looking at Lindon directly. “I’d rather not say.”\nDross stared at him with one wide eye. 
[What?]\n“That’s his decision,” Lindon said to Dross, but he was disappointed too. Did Ziel still not trust them?\nZiel shifted uncomfortably. He glanced to Dross and then back up at the sky. “It’s…embarrassing,” he muttered at last.\n[Oh, then you can tell us quietly.]\nThere was a collection of memories embedded into the labyrinth, and many of them were from Sages. Some, like Malice and Northstrider, had gone on to become Monarchs. Lindon understood something of the general knowledge about Icons. Some were more common, but other Icons had shown up only a few times in history.\nSome were considered unique, like Eithan’s Broom Icon. He had even mentioned a Joy Icon, which Lindon had never heard from anyone else.\n[If Ziel taps into the Joy Icon, I will give up forever, because the world no longer makes sense.]\nDross didn’t send that message to Ziel, but Lindon still considered what he knew of the other man.\nThe Hammer Icon was manifested by Soulsmiths as often as people who used hammers in combat, but it tended to have different powers depending on whether it represented creation or battle. That led to great debate over whether there were two different Hammer Icons or whether hammers had greater depth of meaning.\nThere was no such thing as a Script Icon; Lindon was fairly certain of that. Scripts themselves were made up of many runes that each represented a fragment of meaning, but now that he thought of it, there had to be some Icon that scriptors could manifest.\nMaybe the Scribe Icon? Scholars had manifested that throughout history, in the form of a quill or brush or pen over a page.\nZiel could clearly see the thoughts moving behind Lindon’s eyes, because he grumbled under his breath. “If I can’t reach it on my own, I’ll tell you. 
But I don’t know how I’ll reach it here.”\n“You probably won’t,” Lindon agreed.\nHis understanding of the exact mechanics of Sage advancement was vague—in fact, as far as he’d learned, no one could predict exactly how Icons behaved—but Ziel had to take action to trigger the advancement, and actions he took while locked away in Ghostwind Hall wouldn’t touch the larger world.\nLindon thought of advancing to Sage here as something like trying to reach the ocean while trapped in a fishbowl.\n“I’ll need a little longer to reach peak Archlord,” Ziel said. “But since that’s all I can do in here, I’ll figure out—”\nLindon opened his void key and called out three dream tablets.\nThe first one slapped into Ziel’s palm as Lindon explained what it was. “All the memories about the Rune Queen Emala from the labyrinth, both from her and from her rivals or peers.” A second one flew at Ziel, and he plucked it from the air. “Dross’ analysis of her scripting patterns and our suggestions on how to operate the Grand Oath Array with your Path.”\n[We had to speculate wildly,] Dross said. [I’d say probably forty, forty-five percent is us making things up.]\nZiel caught the third tablet.\n“That one’s from Northstrider,” Lindon said. “Dross took it from his oracle codex. It contains research on Emala’s powers and insights into the manipulation of time.”\nZiel looked down to the tablets and back up to Lindon. “If you’re teaching me how to use it, does that mean...”\nLindon had been waiting for that.\nFrom his soulspace, he released a Divine Treasure. It resembled a silver moon orbited by rings of intricate silver script.\n“I finished it last night. It’s not precisely the same as Emala’s original, but no two Divine Treasures are exactly alike. The core construct is made from Northstrider’s prototype Abidan artifact, which was designed to lock time in stasis. 
The rest came from a handful of Remnants with minor time aspects and the samples of the Rune Queen’s madra you brought back from Shatterspine Castle.”\nReverently, Ziel took the Grand Oath Array. “You said you could do it, but I still thought…How did you learn to do this?”\n“Compressing the time of this pocket world was good practice,” Lindon said. “And, of course, I had Dross’ help. But mostly…"""
]

num_queries = len(queries)
embeddings = model.encode(queries + passages)
scores = util.cos_sim(embeddings[:num_queries], embeddings[num_queries:]) * 100
print(scores.tolist())


[[41.59849548339844], [35.92758560180664], [61.07059860229492]]
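
The printed result has one row per query and one column per passage. There is a single passage here, so each row holds one score. Below is a minimal pure-Python sketch of what `util.cos_sim(...) * 100` computes. The helper `cos_sim_matrix` and the 2-D vectors are hypothetical stand-ins for illustration, not real embeddings from the model.

```python
import math

def cos_sim_matrix(queries, passages):
    """Return a len(queries) x len(passages) matrix of cosine similarities."""
    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    def cosine(a, b):
        return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

    return [[cosine(q, p) for p in passages] for q in queries]

# Toy 2-D vectors standing in for real embeddings (made-up values).
query_vecs = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
passage_vecs = [[2.0, 0.0]]

# Scale by 100, as the notebook does, to get percentage-like scores.
toy_scores = [[100 * s for s in row] for row in cos_sim_matrix(query_vecs, passage_vecs)]
toy_shape = (len(toy_scores), len(toy_scores[0]))
print(toy_shape)  # (3, 1)
```

Cosine similarity ignores vector length, so the first toy query scores 100 against the passage even though the two vectors differ in magnitude.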
