In [None]:
!pip install pymilvus ollama llama-index-llms-ollama llama-index-vector-stores-milvus

In [None]:
!pip install llama-index-embeddings-jinaai llama-index-readers-file

In [None]:
!pip install sentence-transformers llama-index-embeddings-huggingface

In [3]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# jina-embeddings-v2 ships custom model code on the Hugging Face Hub. Without
# trust_remote_code=True it is loaded as a plain BertModel and part of the
# weights are newly initialized instead of being read from the checkpoint.
embed_model = HuggingFaceEmbedding(
    model_name="jinaai/jina-embeddings-v2-base-de", trust_remote_code=True
)
len(embed_model.get_text_embedding('This is a test'))

768

In [19]:
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.milvus import MilvusVectorStore

from llama_index.core import StorageContext, ServiceContext

llm = Ollama(model="llama3", request_timeout=120.0)

service_context_jina = ServiceContext.from_defaults(llm=llm, embed_model=embed_model, chunk_size=350)

vector_store_jina = MilvusVectorStore(
    uri="milvus_local_rag.db",  # local Milvus Lite database file
    collection_name="jina_embeddings",
    dim=768,  # must match the embedding model's output dimension
    overwrite=True,  # drop the collection if it already exists, then recreate it
)
storage_context_jina = StorageContext.from_defaults(vector_store=vector_store_jina)
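
`chunk_size=350` caps each chunk at roughly 350 tokens before it is embedded. As a rough illustration of fixed-size chunking, the sketch below splits on whitespace with an optional overlap. The `chunk_words` helper and its `overlap` parameter are hypothetical and not part of LlamaIndex, whose splitter counts tokenizer tokens and tries to respect sentence boundaries, so treat this as a simplification rather than its implementation.

```python
def chunk_words(text: str, chunk_size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of at most `chunk_size` whitespace-separated words.

    Consecutive chunks share `overlap` words. Hypothetical helper for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the window already reached the end of the text
    return chunks


# Toy input: ten "words", windows of four words, one word of overlap.
sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(sample, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

Smaller chunks give the retriever more precise hits but less surrounding context per hit, which is the trade-off behind picking a value such as 350.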


In [20]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

In [21]:
docs = SimpleDirectoryReader(input_files=["data/pdfs/S19-14497.pdf"]).load_data()

In [22]:
vector_index_jina = VectorStoreIndex.from_documents(docs, storage_context=storage_context_jina, service_context=service_context_jina)

In [23]:
from llama_index.core.tools import RetrieverTool, ToolMetadata

milvus_tool_openai = RetrieverTool(
    retriever=vector_index_jina.as_retriever(similarity_top_k=3),  # return the 3 most similar chunks
    metadata=ToolMetadata(
        name="CustomRetriever",
        description='Retrieve relevant information from provided documents.'
    ),
)
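
`similarity_top_k=3` asks the retriever for the three stored chunks whose embeddings are closest to the query embedding. The sketch below illustrates that ranking step in plain Python with made-up 3-dimensional vectors. `cosine_similarity`, `top_k`, and the toy corpus are hypothetical names for illustration; Milvus runs this search with its own index and configured metric rather than a brute-force loop.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, corpus, k):
    """Return the k (chunk_id, score) pairs most similar to `query`, best first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings standing in for stored chunk vectors (assumed data).
corpus = {
    "chunk_a": [1.0, 0.0, 0.0],
    "chunk_b": [0.9, 0.1, 0.0],
    "chunk_c": [0.0, 1.0, 0.0],
    "chunk_d": [0.0, 0.0, 1.0],
}
query = [1.0, 0.2, 0.0]

results = top_k(query, corpus, k=3)
for chunk_id, score in results:
    print(f"{chunk_id}: {score:.3f}")
```

Only the top-ranked chunks reach the LLM as context, so anything the ranking misses cannot appear in the answer.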

In [24]:
query_engine = vector_index_jina.as_query_engine()
response = query_engine.query("How many services use a Fax Machines in Berlin?")
print(response)

According to the provided context, several institutions and authorities in Berlin use fax machines for communication purposes. These include:

* The Kammergericht (KG Dez. VI Aus- und Fort) for sending Krankmeldungen von Anwärter/in,
* The Frauenvertreterin's office located in the Dienstgebäude Ringstraße,
* Gerichte, and
* Senatsverwaltung für Wirtschaft, Energie und Betriebe.

Based on this information, it can be inferred that at least these few services use fax machines in Berlin.


In [25]:
print(query_engine.query('How many administrative services processes involve a fax requirement? Please Translate the text'))

Based on the provided context, it can be seen that the following administrative services processes involve a fax requirement:

* Zuwendungsanträge Schul- und Sportamt (school and sports funding applications)
* Bearbeitung von Anträgen auf Schulwegbeförderung / Schulwegbegleitung (processing of school route transportation requests)
* Vergabe einer Dienstleistung zur Sicherstellung der Schulwegbeförderung, hier Beförderung von schulpflichtigen Kindern mit sonderpädagogischem Förderbedarf vom Wohnsitz zur Schule und zurück, sowie Abrechnung der Leistung (provision of services for school route transportation)
* Anträge auf Übernahme von Aufwendungen für Gebärdendolmetscher/innen und Kommunikationshelfer/innen nach der Schul-kommunikationsverordnung (requests for reimbursement of expenses for sign language interpreters and communication assistants)

In total, four administrative services processes involve a fax requirement.


## Semantic Chunking

In [26]:
from llama_index.core.node_parser import SemanticSplitterNodeParser

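splitter = SemanticSplitterNodeParser(
    buffer_size=1,  # neighbouring sentences grouped with each sentence before embedding
    breakpoint_percentile_threshold=95,  # split where the distance between neighbours exceeds the 95th percentile
    embed_model=embed_model,
)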

nodes = splitter.get_nodes_from_documents(docs)
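
A minimal sketch of the idea behind percentile-based semantic splitting: measure the cosine distance between neighbouring sentence embeddings and start a new chunk wherever that distance exceeds the chosen percentile of all distances. The function names and the two-dimensional toy embeddings are made up for illustration. The real parser embeds sentences with `embed_model` and also groups neighbours according to `buffer_size`, which this sketch omits, so it is not the library's implementation.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def percentile(values, pct):
    """Percentile with linear interpolation between ranks (pct in [0, 100])."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * pct / 100.0
    lower, upper = math.floor(rank), math.ceil(rank)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def semantic_chunks(sentences, embeddings, pct_threshold):
    """Group consecutive sentences, breaking where neighbours are unusually far apart."""
    if len(sentences) < 2:
        return [" ".join(sentences)] if sentences else []

    distances = [
        cosine_distance(embeddings[i], embeddings[i + 1])
        for i in range(len(embeddings) - 1)
    ]
    threshold = percentile(distances, pct_threshold)

    chunks, current = [], [sentences[0]]
    for sentence, dist in zip(sentences[1:], distances):
        if dist > threshold:  # large jump in meaning: close the current chunk
            chunks.append(" ".join(current))
            current = []
        current.append(sentence)
    chunks.append(" ".join(current))
    return chunks


# Toy data: two sentences about courts, two about school transport (assumed).
sentences = [
    "Courts use fax for urgent rulings.",
    "Urgent rulings are sent to lawyers by fax.",
    "School transport requests are filed on paper.",
    "The school office handles transport funding.",
]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.1, 0.9], [0.0, 1.0]]

chunks = semantic_chunks(sentences, embeddings, pct_threshold=95)
for chunk in chunks:
    print(chunk)
```

Because the threshold is a percentile of the document's own distances, a high value such as 95 yields few, long chunks, while a lower value splits more often.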



In [27]:
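# Note: storage_context_jina is reused, so these nodes are added to the same
# "jina_embeddings" collection that already holds the fixed-size chunks.
vector_index_semantic = VectorStoreIndex(nodes, storage_context=storage_context_jina, service_context=service_context_jina)
query_engine_semantic = vector_index_semantic.as_query_engine()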


In [28]:
print(query_engine_semantic.query('How many services use a Fax Machines in Berlin?'))

Based on the provided context information, I can infer that several institutions use fax machines for communication purposes. Specifically:

* Justizvollzugsanstalt Heidering uses fax machines to communicate with external Dienstleistern (e.g., Apotheke, Labor Berlin) and other organizations.
* Justizvollzugsanstalt Moabit uses fax machines to communicate with Staatsanwaltschaften, Gerichte, Rechtsanwälten, and Anwälten.

While I don't have direct information on the number of services that use fax machines in Berlin, it is reasonable to assume that multiple institutions in the justice system and related fields utilize this technology for official communication.


In [29]:
print(query_engine_semantic.query('How many services require the use of a Fax Machine in Berlin?'))

According to the provided context, there are some instances where fax machines are used for official purposes. However, it is essential to note that these instances are limited and mostly related to exceptional circumstances or when personal delivery is not feasible.

In the Kammergericht (KG) Dez. VI Aus- und Fort, Krankmeldungen from apprentices who are voluntarily insured are transmitted via telefax to the Central Office for Remuneration and Remuneration.

In rare cases, fax machines are used to send very urgent and paper-based documents to the Representative of Women's Affairs' office.

Furthermore, some rechtsbehelfsverfahren (Widerspruch u. Einspruch) in Verwaltungs- und Ordnungswidrigkeiten-verfahren require the use of a fax machine due to specific regulations.

It is difficult to provide an exact number of services that require the use of a Fax Machine in Berlin, as this information is not explicitly provided in the given context. However, it can be inferred that fax machines a

In [30]:
print(query_engine_semantic.query('Wie viele Bearbeitungen von Verwaltungsdienstleistungen beinhalten ein Faxerfordernis?'))

Based on the provided context, I can see that there are several instances where fax requests are mentioned as a means of communication with other Berlin-based authorities. Specifically, it appears that fax is used for Grundlage der Kommunikation mit anderen Berliner Verwaltungen per Fax.

As for your query, Wie viele Bearbeitungen von Verwaltungsdienstleistungen beinhalten ein Faxerfordernis?, I would say that there are at least 5-6 instances where fax requests are mentioned as a means of communication with other authorities.


In [31]:
print(query_engine_semantic.query('How many administrative services processes involve a fax requirement? Please Translate the text'))

According to the provided context, three administrative service processes involve a fax requirement:

1. Renten-, Krankenversicherung (Pension and Health Insurance)
2. Senatsverwaltung für Wirtschaft, Energie und Betriebe (Berlin Senate Administration for Economy, Energy, and Enterprises)

Translated text:

"Foundation for communication with other Berlin administrations via fax. Additionally: Reasons for communication with other Berlin administrations via fax.

Renten-, Krankenversicherung (Pension and Health Insurance): § 4 ff.
Senatsverwaltung für Wirtschaft, Energie und Betriebe (Berlin Senate Administration for Economy, Energy, and Enterprises)

Foundation for communication with other Berlin administrations via fax. Additionally: Reasons for communication with other Berlin administrations via fax.

Courts Urgent Deliveries as Expert Witnesses
Krankenhaus des Maßregelvollzugs Berlin (KMV) (Berlin Correctional Facility Hospital)
Prompt Delivery of Court Decisions to Patients Ensuring