In [1]:
%load_ext autoreload
%autoreload 2

# Full Document

In [7]:
import breakdown_sections as bs

In [3]:
def count_words(text: str) -> int:
    """
    Counts the number of words in a given text string.
    
    Parameters:
    text (str): The text string from which to count words.
    
    Returns:
    int: The number of words in the text.
    
    Raises:
    TypeError: If the input is not a string.
    """
    if not isinstance(text, str):
        raise TypeError("Input must be a string.")
    # str.split() with no arguments already collapses runs of whitespace.
    return len(text.split())


def calculate_summary_length(text_length: int) -> int:
    """
    Determine the appropriate summary length based on the length of the input text.
    
    Parameters:
    text_length (int): The length of the input text in terms of number of words.
    
    Returns:
    int: The recommended number of words for the summary.
    
    Raises:
    ValueError: If input is not a non-negative integer representing the word count of the text.
    """
    if text_length < 0:
        raise ValueError("Input must be a non-negative integer representing the word count of the text.")
    if text_length == 0:
        return 0  # No words to summarize if the text length is 0.
    summary_length = text_length  # Default to full length if no specific rules are applied.
    if text_length > 50:
        if text_length < 300:
            summary_length = int(0.5 * text_length)  # 50% of original for short texts.
        elif text_length < 1000:
            summary_length = max(int(0.3 * text_length), 150)  # Scale down to 30% or 150 words, whichever is more.
        else:
            summary_length = max(int(0.15 * text_length), 600)  # Scale down to 15% or 600 words, whichever is more.
    return summary_length


def hydrate_summary_prompt(text: str, sum_length: int) -> str:
    """
    Builds the summarization prompt for a given text and target summary length.

    Parameters:
    - text (str): The text to be summarized.
    - sum_length (int): Target length of the summary in words.

    Returns:
    - str: The prompt, with the target length and the text interpolated.
    """
    return f"""
        You are an expert summarizer capable of understanding the content and summarizing aptly, keeping most valid information intact.
        Develop a summarizer that efficiently condenses the text into a concise summary. 
        The summaries should capture essential information and convey the main points clearly and accurately. 
        The summarizer must be able to produce the summary in the same language as the original text. 
        It should prioritize key facts, arguments, and conclusions while maintaining the integrity and tone of the original text. 
        Aim for a summary with at least {sum_length} words in length. 
        Focus on clarity, brevity, and relevance to ensure the summary is both informative and readable with bullet points. 
        Output only the summary, no need for an introduction.
        You may add information when appropriate.
        The text is as follows: {text}
        """
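
As a quick check of the tiers in `calculate_summary_length`, the sketch below restates them compactly and prints the result at each boundary. The name `summary_length_tiers` is hypothetical and exists only so the block runs on its own; it is meant to mirror the function above, not replace it.

```python
def summary_length_tiers(word_count: int) -> int:
    """Compact restatement of the tiers used by calculate_summary_length (hypothetical helper)."""
    if word_count < 0:
        raise ValueError("word_count must be a non-negative integer.")
    if word_count <= 50:
        return word_count                        # very short texts keep their full length
    if word_count < 300:
        return int(0.5 * word_count)             # 50% of the original
    if word_count < 1000:
        return max(int(0.3 * word_count), 150)   # 30%, but never below 150 words
    return max(int(0.15 * word_count), 600)      # 15%, but never below 600 words


# Print the target length on both sides of each tier boundary.
for n in (0, 50, 51, 299, 300, 999, 1000, 10000):
    print(n, summary_length_tiers(n))
```

One thing the boundaries make visible: a 50-word text keeps all 50 words, while a 51-word text is given a 25-word target, so the target is not monotonic around the 50-word threshold.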

In [4]:
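from typing import Iterator, Union

import ollama

def summarize_markdown_document(md_text: str, model: str = 'llama3.1:70b', stream: bool = False) -> Union[str, Iterator[str]]:
    """
    Summarizes markdown text with a local Ollama model.

    Returns the full summary as a string, or an iterator of text chunks if stream=True.
    """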
    summary_length = calculate_summary_length(text_length=count_words(md_text))
    
    response = ollama.generate(
        model=model,
        prompt=hydrate_summary_prompt(text=md_text, sum_length=summary_length),
        stream=stream,
    )

    if stream:
        return (chunk['response'] for chunk in response)
    else:
        return response['response']
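
Because `summarize_markdown_document` returns either a string or a generator of chunks depending on `stream`, callers have to handle both shapes. A minimal sketch of a normalizing helper follows; `collect_summary` is a hypothetical name, not part of the notebook or of the `ollama` package.

```python
from typing import Iterable, Union


def collect_summary(result: Union[str, Iterable[str]]) -> str:
    """Return the summary as one string, whether or not it was streamed."""
    if isinstance(result, str):
        return result            # non-streaming call: already the full text
    return "".join(result)       # streaming call: concatenate the chunks in order


# Both call styles end up as the same string.
print(collect_summary("full summary"))
print(collect_summary(chunk for chunk in ["full ", "summary"]))
```

This keeps downstream code (saving to disk, counting words in the summary) independent of whether streaming was used for display.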

In [5]:
with open('../outputs/digital-thermometer-pdfplumber.md', 'r', encoding='utf-8') as file:
    text = file.read()

for token in summarize_markdown_document(md_text=text, stream=True):
    print(token, end='', flush=True)

The text appears to be a technical document describing the operation and communication protocol of the DS18B20 temperature sensor, which uses a 1-Wire interface.

Here is a summary of the main points:

**Initialization Procedure**

* The bus master initiates communication with a reset pulse.
* The DS18B20 responds with a presence pulse.

**ROM Commands**

* Five ROM commands are used to operate on the unique 64-bit ROM codes of each slave device:
	+ Search Rom [F0h]: Identifies all slave devices on the bus.
	+ Read Rom [33h]: Reads the ROM code of a single slave device.
	+ Match Rom [55H]: Addresses a specific slave device.
	+ Skip Rom [CCh]: Addresses all devices on the bus simultaneously.
	+ Alarm Search [ECh]: Identifies devices with an alarm condition.

**DS18B20 Function Commands**

* Six function commands are used to interact with the DS18B20:
	+ Convert T [44h]: Initiates a temperature conversion.
	+ Write Scratchpad [4Eh]: Writes data to the scratchpad memory.
	+ Read Scratchpa

# Sections
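
The cell in this section relies on `bs.markdown_sections_to_dict`, whose source is not shown in this notebook. As a rough illustration of what such a helper might do, the sketch below splits markdown on ATX headings (`#` through `######`) into a title-to-content dictionary. The function name `split_markdown_sections`, the `"preamble"` key, and the merging of repeated titles are all assumptions made for this example; the real `breakdown_sections` module may behave differently.

```python
import re
from typing import Dict, List

# Matches ATX headings such as "# Title" or "### Title".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_markdown_sections(md_text: str) -> Dict[str, str]:
    """Map each heading title to the text that follows it (hypothetical sketch)."""
    sections: Dict[str, List[str]] = {}
    title = "preamble"  # assumed key for text that appears before the first heading
    for line in md_text.splitlines():
        match = HEADING.match(line)
        if match:
            title = match.group(2)
            sections.setdefault(title, [])   # repeated titles share one entry
        else:
            sections.setdefault(title, []).append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}


demo = "intro text\n# A\nalpha\n## B\nbeta\n"
print(split_markdown_sections(demo))
```

This sketch does not skip `#` lines inside fenced code blocks, so a comment line in embedded code would be treated as a heading.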

In [8]:
with open('../outputs/digital-thermometer-pdfplumber.md', 'r', encoding='utf-8') as file:
    text = file.read()
    
sections = bs.markdown_sections_to_dict(md_text=text)

for section_content in sections.values():
    print(summarize_markdown_document(md_text=section_content, model='llama3.1'))
    print("-" * 100)


**Summary:**

The DS18B20 digital thermometer offers 9- to 12-bit Celsius temperature measurements with an alarm function for user-programmable upper and lower trigger points. It communicates over a single 1-Wire bus line (with ground) and can derive power from the data line, eliminating the need for an external power supply. Each DS18B20 has a unique 64-bit serial code, allowing multiple devices to share the same bus. This feature is ideal for applications such as:

• HVAC environmental controls
• Temperature monitoring systems in buildings, equipment, or machinery
• Process monitoring and control systems

This flexibility makes the DS18B20 suitable for various temperature measurement needs across a wide range of industries and applications.
----------------------------------------------------------------------------------------------------
Here's a concise summary of the provided text in bullet points:

• **Key Areas:** The text covers four primary areas: thermostatic controls, indus

# Pages
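
The cell in this section uses `bs.page_range_iterator` to feed the model a few pages at a time; its implementation is not shown here. The sketch below illustrates one way such batching could work, assuming pages are separated by a known delimiter. The name `iter_page_groups` and the form-feed default for `page_delimiter` are assumptions for illustration only; the actual page marker in the pdfplumber output may differ.

```python
from typing import Iterator


def iter_page_groups(md_text: str, pages_per_iteration: int = 5,
                     page_delimiter: str = "\f") -> Iterator[str]:
    """Yield the text in groups of consecutive pages (hypothetical sketch)."""
    if pages_per_iteration < 1:
        raise ValueError("pages_per_iteration must be at least 1.")
    pages = md_text.split(page_delimiter)
    for start in range(0, len(pages), pages_per_iteration):
        # Re-join each group so the page boundaries inside it are preserved.
        yield page_delimiter.join(pages[start:start + pages_per_iteration])


# Three pages in groups of two: the last group holds the leftover page.
for group in iter_page_groups("page 1\fpage 2\fpage 3", pages_per_iteration=2):
    print(repr(group))
```

Grouping by pages keeps each prompt well below the size of the full document, at the cost of summaries that cannot refer to content outside their own group.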

In [9]:
with open('../outputs/digital-thermometer-pdfplumber-overwrite-tables.md', 'r', encoding='utf-8') as file:
    text = file.read()
    
for page_text in bs.page_range_iterator(md_text=text, pages_per_iteration=5):
    for token in summarize_markdown_document(md_text=page_text, stream=True, model='llama3.1:70b'):
        print(token, end='', flush=True)
    print("=" * 100)

**DS18B20 Programmable Resolution 1-Wire Digital Thermometer Summary**

The DS18B20 is a programmable resolution 1-wire digital thermometer that provides high accuracy temperature measurements with a user-configurable resolution of 9 to 12 bits. The device features a unique 64-bit serial code, allowing multiple devices to be addressed on the same bus.

**Key Features and Benefits**

* Unique 1-Wire interface requires only one port pin for communication
* Reduces component count with integrated temperature sensor and EEPROM
* Measures temperatures from -55°C to +125°C with ±0.5°C accuracy from -10°C to +85°C
* Programmable resolution from 9 bits to 12 bits
* No external components required
* Parasitic power mode requires only two pins for operation (DQ and GND)
* Simplifies distributed temperature-sensing applications with multidrop capability

**Applications**

* Thermostatic controls
* Industrial systems
* Consumer products
* Thermometers
* Thermally sensitive systems

**Pin Configura