# Class 5 (Local Inference)

**Author: Abraham R.**

The following notebook is an example of using a local API in this case Ollama.

### Experimental integration

Ollama's OpenAI API format is Experimental, for more info check this [readme](https://github.com/ollama/ollama/blob/main/docs/openai.md)



# Example using OpenAI's client

In [1]:
pip install openai


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [1]:
import openai
from settings import Config

## Reading config

In [3]:
config = Config.from_yaml(filepath="../settings.yml")

In [4]:
client = openai.OpenAI(
  base_url=config.ollama.url,
  api_key= "apiKey"
)

# Conversation history caching with Redis

In [5]:
pip install redis


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [6]:
from cache import RedisClient
cache = RedisClient(config.redis.host, config.redis.port)

## Extracting data from website


In [7]:
pip install bs4


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [8]:
from data_fetch import extract_text_from_url

The University of Buenos Aires (Spanish: Universidad de Buenos Aires, UBA) is a public research university in Buenos Aires, Argentina. It was established in 1821. It has educated 17 Argentine presidents, produced four of the country's five Nobel Prize laureates, and is responsible for approximately 40% of the country's research output.[12][13][14]

The university's academic strength and regional leadership make it attractive to many international students, especially at the postgraduate level.[15][16] Just over 4 percent of undergraduates are foreigners, while 15 percent of postgraduate students come from abroad.[17] The Faculty of Economic Sciences has the highest rate of international postgraduate students at 30 percent, in line with its reputation as a "top business school with significant international influence."[18][19]

The University of Buenos Aires enrolls more than 328,000 students and is organized into 13 independent faculties.[20] It administers 6 hospitals, 16 museums, 13 

In [9]:
extracted_text = extract_text_from_url("https://lse.posgrados.fi.uba.ar/consultas/inscripciones")

In [10]:
extracted_text

'A continuación se presenta la información detallada de las Especialidades, Diplomaturas y Maestrías de cada una de las temáticas presentadas por el Laboratorio.\nPara consultas e inscripción escribir un correo a inscripcion.lse@fi.uba.ar\xa0\nDiplomatura de Extensión en Inteligencia Artificial Aplicada: \xa0-Duración: 6 meses,en base 6 horas de clase por semana.-Dirigida a: Personas que no cuenten con título universitario y quieran iniciar en las temáticas de la IA.\nDiplomatura Universitaria Superior en Inteligencia Artificial Aplicada: -Duración: 6 meses, en base 6 horas de clase por semana.-Dirigido a: Personas con título universitario o título técnico superior\nCarrera de Especialización en Inteligencia Artificial: -Duración: 1 año, en base de 9 horas semanales de clases y 9 horas de dedicación por fuera de clases.-Dirigido a: Personas con Título Universitario con una duración mayor o igual a 4 años. \nMaestría en Inteligencia Artificial:-Se obtiene al completar la Carrera de Espe

## Calling Mistral-7B-Instruct
from [preloaded](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3) huggingface model

In [11]:
# Function to handle chat completions
def chat_completion(user_input, injected_text = None):

    if injected_text is not None:
        cache.add_to_conversation("user", injected_text)

    cache.add_to_conversation("user", user_input)
    # Add the user's message to the conversation history
    conversation_history = cache.get_conversation_history()

    # Generate the assistant's response in stream mode
    completion = client.chat.completions.create(
        stream=True,
        model="mistral",
        messages=conversation_history,
        max_tokens=256,
        temperature=0.8,
    )

    # Collect and accumulate the response from the generator
    full_response = ""
    for chunk in completion:
        if chunk.choices[0].delta.content is not None:
            full_response += chunk.choices[0].delta.content

    # Add the complete assistant's response to conversation history
    #conversation_history.append({"role": "assistant", "content": full_response})
    cache.add_to_conversation("assistant", full_response)
    # Return the assistant's response for display
    return full_response

# Function to format and display the conversation history
def format_conversation(conversation_history):
    return "\n".join(
        f"{msg['role'].capitalize()}: {msg['content']}"
        for msg in conversation_history if 'content' in msg
    )

def process_input(user_input, url=None):
    # Extract content from URL if provided, otherwise proceed with the conversation
    injected_text = extract_text_from_url(url) if url else None
    
    # Generate a response with optional injected text
    return chat_completion(user_input, injected_text)

## Gradio

In [12]:
pip install gradio


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.2[0m[39;49m -> [0m[32;49m24.3.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [13]:
import gradio as gr

def on_submit(user_message):
    # This function calls the chat_completion function and updates the output
    return chat_completion(user_message)

with gr.Blocks() as demo:
    gr.Markdown("### Chat with Mistral AI")
    user_input = gr.Textbox(label="Ask a question", placeholder="Type your message here...")
    conversation_output = gr.Textbox(label="Conversation History", lines=10, interactive=False)

    # Link inputs and outputs
    user_input.submit(process_input, [
        user_input,
        gr.Textbox(label="URL to inject text (optional)")
                                      ], conversation_output)

# Launch the app
demo.launch()

  from .autonotebook import tqdm as notebook_tqdm


* Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.


