In [2]:

import pandas as pd
from pathlib import Path
from langchain_core.documents import Document


doc = Document(page_content="""
LLM is an abbreviation that stands for **Large Language Model**.

It is a type of **Artificial Intelligence (AI)** program that is trained on a **massive amount of text data** (hence the word "large"). This extensive training allows the model to:

* **Understand** human language (like text you input).
* **Generate** human-like text (like answers, articles, summaries, and code).

### 🔑 Key Characteristics

* **Scale and Capacity:** LLMs are characterized by their immense size, often containing billions or even trillions of **parameters**. This size allows them to capture intricate patterns and nuances in language.
* **Transformer Architecture:** Most modern LLMs are built on a neural network architecture called a **Transformer**, which is highly effective at tracking relationships and context across long sequences of text.
* **Core Function:** At a fundamental level, an LLM works by being very good at **predicting the next word** in a sequence, which allows it to generate coherent and contextually relevant text.

### 💡 Common Applications

LLMs power many of the advanced AI tools you hear about today. They are used for:

* **Conversational AI/Chatbots:** Like the one you are interacting with now.
* **Content Generation:** Drafting emails, writing articles, creating marketing copy.
* **Summarization and Translation:** Quickly condensing large documents or translating languages.
* **Code Generation:** Writing or debugging computer code based on natural language instructions.

**Examples of LLMs** include models like OpenAI's GPT series (which powers ChatGPT), Google's Gemini and PaLM, Anthropic's Claude, and Meta's Llama.

               """, metadata={"source": "gemini",
                              "created_date": "2025-11-15",
                              "author": "AI Knowledge Base",
                              "tags": ["LLM", "AI", "Language Model"]
                              })



In [3]:
from pathlib import Path

# Map each output filename to a case variant of the document text.
data_segment = {
    "lower_gemini_llm_info.txt": doc.page_content.lower(),
    "upper_gemini_llm_info.txt": doc.page_content.upper(),
    "original_gemini_llm_info.txt": doc.page_content,
}

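# Create the output directory (and any missing parents), then write each variant.
basePath = Path("data/documents")
basePath.mkdir(parents=True, exist_ok=True)
for filename, text in data_segment.items():
    # Build the target path without reassigning the loop variable.
    target = basePath / filename
    with open(target, "w", encoding="utf-8") as f:
        f.write(text)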
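
To check that the files written above round-trip cleanly, the same pattern can be exercised against a temporary directory. This is a minimal sketch using only the standard library: the helper name `write_variants`, the `*_sample.txt` filenames, and the sample sentence are hypothetical stand-ins for this illustration, not part of the notebook.

```python
import tempfile
from pathlib import Path


def write_variants(text, out_dir):
    """Write lower-, upper-, and original-case copies of `text` into `out_dir`.

    Hypothetical helper that mirrors the cell above; returns {filename: Path}.
    """
    variants = {
        "lower_sample.txt": text.lower(),
        "upper_sample.txt": text.upper(),
        "original_sample.txt": text,
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for name, content in variants.items():
        path = out_dir / name
        path.write_text(content, encoding="utf-8")
        paths[name] = path
    return paths


# Assumed sample text; the notebook would pass doc.page_content instead.
sample = "LLM stands for Large Language Model.\n"

with tempfile.TemporaryDirectory() as tmp:
    paths = write_variants(sample, Path(tmp) / "documents")
    # Read every file back with the same encoding used for writing.
    read_back = {
        name: path.read_text(encoding="utf-8") for name, path in paths.items()
    }
```

Reading with the same `encoding="utf-8"` used for writing matters here because the source text contains non-ASCII characters (the emoji in the headings), which a platform-default encoding may not decode correctly.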
    