# Performance with the Mistral model

## Retrieve Embeddings

In [1]:
redis_url = ""
index_name = "projects"
inference_server_url=""
schema_name = "projects_schema.yaml"
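
The empty strings above are placeholders for the Redis and inference server endpoints. One way to supply them without hard-coding is to read them from the environment. This is a minimal sketch, not part of the original notebook: the helper `load_settings` and the environment variable names are assumptions, and the Redis URL in the usage line is only an example value.

```python
import os

# Hypothetical environment variable names; the notebook does not define them.
def load_settings(env=os.environ):
    """Collect connection settings, falling back to the notebook's values."""
    return {
        "redis_url": env.get("REDIS_URL", ""),
        "index_name": env.get("REDIS_INDEX_NAME", "projects"),
        "inference_server_url": env.get("INFERENCE_SERVER_URL", ""),
        "schema_name": env.get("REDIS_SCHEMA_FILE", "projects_schema.yaml"),
    }

# Usage with an explicit mapping (example value, not a real endpoint).
settings = load_settings({"REDIS_URL": "redis://localhost:6379"})
print(settings["redis_url"], settings["index_name"])
```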

In [18]:
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.vectorstores.redis import Redis
from langchain.schema import Document
from langchain.prompts import PromptTemplate
from langchain.chains.llm import LLMChain
from langchain.llms import HuggingFaceTextGenInference
from langchain.chains.summarize import load_summarize_chain
from langchain.chains import AnalyzeDocumentChain
from langchain.chains import RetrievalQA

In [3]:
embeddings = HuggingFaceEmbeddings()
vectorstore = Redis.from_existing_index(
    embeddings,
    redis_url=redis_url,
    index_name=index_name,
    schema=schema_name
)



In [4]:
def combine_metadata(doc) -> str:
    metadata = doc.metadata
    return (
        "Project Number: " + metadata["number"] + ". " +
        "Project Name: " + metadata["name"] + ". " +
        "Project Detail: " + metadata["detail"] + "." +
        "Project Author: " + metadata["author"] + "." +
        "Project Technology: " + metadata["technology"] + "."
    )


In [9]:
def get_relevant_documents(query):
    docs = []
    for doc in vectorstore.similarity_search(query):
        content = combine_metadata(doc)
        docs.append(Document(
            page_content=content,
            metadata=doc.metadata
        ))
    return docs

In [10]:
adocs = get_relevant_documents("Andromeda")

In [11]:
adocs[0].page_content

'Project Number: 1. Project Name: Andromeda   . Project Detail:   The Andromeda galaxy, also known as M31, is a spiral galaxy located approximately 2.5 million light-years away from Earth in the constellation Andromeda. It is the closest major galaxy to our own Milky Way galaxy and is a member of the Local Group of galaxies, which also includes the Milky Way, the Triangulum galaxy (M33), and several smaller galaxies and dwarf galaxies.  The Andromeda galaxy has a diameter of about 100,000 light-years and is estimated to contain about 2 trillion stars, making it one of the largest galaxies in the observable universe. It is also home to a variety of other celestial objects, including black holes, supernovae, and planetary nebulae.  The Andromeda galaxy is classified as a barred spiral galaxy, meaning it has a central bar-shaped structure composed of stars and gas. It also has several spiral arms that wind outward from the center, containing many stars, gas, and dust. The galaxy is believ

## Extract the page content

In [12]:
def getNameDetailAndAuthor(text, start, end):
    # Return the text between the first `start` marker and the last `end` marker.
    detailAndAuthor = text[text.find(start)+len(start):text.rfind(end)]
    return detailAndAuthor

In [13]:
def getTechnology(text, start):
    # Return everything after the first `start` marker.
    technology = text[text.find(start)+len(start):]
    return technology

In [14]:
aname = getNameDetailAndAuthor(adocs[0].page_content, 'Project Name:','Project Detail:')
aname

' Andromeda   . '

In [15]:
adetail = getNameDetailAndAuthor(adocs[0].page_content,'Project Detail:', 'Project Author:')
adetail

'   The Andromeda galaxy, also known as M31, is a spiral galaxy located approximately 2.5 million light-years away from Earth in the constellation Andromeda. It is the closest major galaxy to our own Milky Way galaxy and is a member of the Local Group of galaxies, which also includes the Milky Way, the Triangulum galaxy (M33), and several smaller galaxies and dwarf galaxies.  The Andromeda galaxy has a diameter of about 100,000 light-years and is estimated to contain about 2 trillion stars, making it one of the largest galaxies in the observable universe. It is also home to a variety of other celestial objects, including black holes, supernovae, and planetary nebulae.  The Andromeda galaxy is classified as a barred spiral galaxy, meaning it has a central bar-shaped structure composed of stars and gas. It also has several spiral arms that wind outward from the center, containing many stars, gas, and dust. The galaxy is believed to be rotating rapidly, with stars and gas moving at speeds

In [16]:
aauthor = getNameDetailAndAuthor(adocs[0].page_content,'Project Author:','Project Technology:')
aauthor

'   Ivan Yefremov   .'

In [17]:
atechnology = getTechnology(adocs[0].page_content,'Project Technology:')
atechnology

'    Java  .'
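
The extracted values above still carry padding and the trailing separator period (for example `'   Ivan Yefremov   .'`). A possible alternative is to split the combined string on its labels in one pass and clean each value. This is a sketch only: `parse_fields` and `FIELD_LABELS` are hypothetical names, the sample string is shortened, and the approach assumes the label text never occurs inside a field value.

```python
import re

# Hypothetical helper names; the labels mirror those used in combine_metadata.
FIELD_LABELS = [
    "Project Number",
    "Project Name",
    "Project Detail",
    "Project Author",
    "Project Technology",
]

def parse_fields(page_content: str) -> dict:
    """Split the combined string back into a {label: value} dict."""
    label_pattern = "|".join(re.escape(label) for label in FIELD_LABELS)
    # The capturing group makes re.split keep the labels in the result list.
    parts = re.split(rf"({label_pattern}):", page_content)
    fields = {}
    # parts[0] is the text before the first label; label/value pairs follow.
    for label, value in zip(parts[1::2], parts[2::2]):
        # Remove padding and trailing periods (this also drops the final
        # period of a sentence at the end of the detail text).
        fields[label] = value.strip().rstrip(".").strip()
    return fields

sample = (
    "Project Number: 1. Project Name: Andromeda   . "
    "Project Detail:   A spiral galaxy.Project Author:   Ivan Yefremov   ."
    "Project Technology:    Java  ."
)
fields = parse_fields(sample)
print(fields["Project Author"])
```

Passing cleaned values such as `fields["Project Author"]` to later steps avoids sending stray whitespace and punctuation to the similarity search and the prompt.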

## Set up the LLM

In [19]:
llm = HuggingFaceTextGenInference(
    inference_server_url=inference_server_url,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.1,
    repetition_penalty=1.175
)




In [20]:
def summarize(topic):
    prompt_template = """Write a concise summary of the following:


    {text}


    CONCISE SUMMARY IN GERMAN:"""
    PROMPT = PromptTemplate(template=prompt_template, input_variables=["text"])
    summary_chain = load_summarize_chain(llm, chain_type="map_reduce", map_prompt=PROMPT, combine_prompt=PROMPT)
    summarize_document_chain = AnalyzeDocumentChain(combine_docs_chain=summary_chain)
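    # Note: the topic is first answered by retrieval QA; the summary is then
    # written from that answer, not from the topic text itself.
    db_retriever = vectorstore.as_retriever()
    langchain_qa = RetrievalQA.from_chain_type(llm=llm,
                                               chain_type="stuff",
                                               retriever=db_retriever)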
    relevant_data = langchain_qa(topic)
    summary = summarize_document_chain.run(relevant_data['result'])
    return summary
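
The `map_reduce` chain type summarizes each chunk of the input on its own and then combines the partial summaries into one. The following library-free sketch illustrates that flow. It is not LangChain's implementation: `split_text`, `map_reduce_summarize`, and the stand-in `first_sentence` (used here in place of an LLM call) are hypothetical, and the fixed-size character split is an assumption.

```python
from typing import Callable, List, Optional

def split_text(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def map_reduce_summarize(
    text: str,
    map_fn: Callable[[str], str],
    reduce_fn: Optional[Callable[[str], str]] = None,
    chunk_size: int = 1000,
) -> str:
    """Summarize each chunk (map), then summarize the joined results (reduce)."""
    # The notebook uses the same prompt for both steps, so default to map_fn.
    reduce_fn = reduce_fn or map_fn
    partial_summaries = [map_fn(chunk) for chunk in split_text(text, chunk_size)]
    return reduce_fn("\n".join(partial_summaries))

def first_sentence(text: str) -> str:
    """Stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."

part_one = "First point. filler. "
part_two = "Second point. filler."
result = map_reduce_summarize(
    part_one + part_two,
    map_fn=first_sentence,
    reduce_fn=lambda joined: joined.replace("\n", " "),
    chunk_size=len(part_one),
)
print(result)  # First point. Second point.
```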

## Summarize the project detail

In [21]:
summary = summarize(adetail)
summary

score_threshold is deprecated. Use distance_threshold instead.score_threshold should only be used in similarity_search_with_relevance_scores.score_threshold will be removed in a future release.


'\n\n    Die folgende Passage beschreibt die Entwicklung von einer neuen Technologie, welche eine neue Form der Energieverwendung ermöglicht. Diese Technologie wird als "Ja" bezeichnet und ist in der Lage, aus dem Wasserstoff im Luft zu gewinnen. Der Autor stellt fest, dass diese Technologie sehr viel Potenzial hat, da sie nicht nur die Möglichkeit bietet, die Energieversorgung zu verbessern, sondern auch die Umwelt zu schützen. Er erklärt jedoch, dass dieses Potenzial bisher noch nicht vollständig genutzt werden kann, weil es fehlender Forschungsbedarf gibt.'

In [22]:
def generate(topic):
    prompt_template = """Use the context below to write a 400 word blog post about the topic below:
    Context: {context}
    Topic: {topic}
    Blog post:"""

    PROMPT = PromptTemplate(
        template=prompt_template, input_variables=["context", 
                                                   "topic"]
    )

    chain = LLMChain(llm=llm, prompt=PROMPT)
    docs = vectorstore.similarity_search(topic, k=1)
    inputs = [{"context": doc.page_content, 
               "topic": topic} for doc in docs]
    gen = chain.apply(inputs)
    return gen
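
To see what the model actually receives, the template can be rendered by hand. This is a minimal sketch, assuming plain `str.format` substitution stands in for `PromptTemplate` (for simple `{name}` placeholders the two should produce the same string). `render_prompt` is a hypothetical helper and the context value is a shortened example.

```python
PROMPT_TEMPLATE = """Use the context below to write a 400 word blog post about the topic below:
    Context: {context}
    Topic: {topic}
    Blog post:"""

def render_prompt(context: str, topic: str) -> str:
    """Fill the placeholders exactly as given, without any clean-up."""
    return PROMPT_TEMPLATE.format(context=context, topic=topic)

# The topic is passed with the padding seen in the extracted field.
prompt = render_prompt("Project Name: Andromeda", "    Java  .")
print(prompt)
```

Rendering the prompt this way makes it visible that whitespace and the trailing period in the topic reach the model unchanged, and that the context is whatever single document the similarity search returned.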

## Generate the content for the author

In [23]:
output = generate(aauthor)
output[0]['text']

' The Cosmic Vision of Ivan Yefremov\n\nIvan Yefremov was a Russian scientist and writer, who is best known for his work in astrophysics. He was born on July 16, 1905, in St. Petersburg, Russia, and died on April 27, 1983, in Moscow. Yefremov\'s most famous book, "Hail Mary," was published in 1961 and quickly became one of the most popular science fiction novels in the Soviet Union.\n\nYefremov\'s vision of space was heavily influenced by his scientific background. As an astrophysicist, he had a deep understanding of the universe and its workings. This knowledge allowed him to create vivid and believable descriptions of alien worlds and civilizations. In "Hail Mary," for example, Yefremov describes a planet that is covered in ice and snow, with towering mountains that reach up into the sky. He also depicts a society that has evolved to live underground, where they use advanced technology to survive in the harsh environment above ground.\n\nIn addition to his scientific expertise, Yefre

## Generate the content for the technology

In [24]:
outputTech = generate(atechnology)
outputTech[0]['text']

' Exploring the relationship between Java and the Milky Way.\n\nExploring the Relationship Between Java and the Milky Way\n\nJava is one of the most popular programming languages in use today, with millions of developers worldwide using it to create everything from simple desktop applications to complex enterprise systems. But what many people may not realize is that there is actually a deep connection between Java and something much larger than our planet - the Milky Way galaxy.\n\nAt its core, Java is an object-oriented language that allows developers to create reusable code by defining classes and objects that can be manipulated and combined in various ways. This makes it a powerful tool for building large, complex systems that are capable of handling vast amounts of data and processing power. And this is exactly what makes Java so well-suited for exploring the mysteries of the universe, including the Milky Way.\n\nOne of the key areas where Java has been used to study the Milky Way

## Doc generation

In [25]:
print(f'''
{summary}


{output[0]['text']}


{outputTech[0]['text']}


''')




    Die folgende Passage beschreibt die Entwicklung von einer neuen Technologie, welche eine neue Form der Energieverwendung ermöglicht. Diese Technologie wird als "Ja" bezeichnet und ist in der Lage, aus dem Wasserstoff im Luft zu gewinnen. Der Autor stellt fest, dass diese Technologie sehr viel Potenzial hat, da sie nicht nur die Möglichkeit bietet, die Energieversorgung zu verbessern, sondern auch die Umwelt zu schützen. Er erklärt jedoch, dass dieses Potenzial bisher noch nicht vollständig genutzt werden kann, weil es fehlender Forschungsbedarf gibt.


 The Cosmic Vision of Ivan Yefremov

Ivan Yefremov was a Russian scientist and writer, who is best known for his work in astrophysics. He was born on July 16, 1905, in St. Petersburg, Russia, and died on April 27, 1983, in Moscow. Yefremov's most famous book, "Hail Mary," was published in 1961 and quickly became one of the most popular science fiction novels in the Soviet Union.

Yefremov's vision of space was heavily influenced b