# Building the FastAPI Backend using LangServe

Let's first review what we have accomplished so far in deploying our Smart Bot:

1) **Notebook 12**: Instructions on deploying a Backend API using the Azure Bot Service.
2) **Notebook 13**: Guidelines on interfacing with the Bot Service programmatically using POST requests.

Here are the pros and cons of using the Bot Service:

**Pros**:
- Easily connects to multiple channels, including O365 emails, MS Teams, web chat plugins, etc.
- The Bot Framework Python SDKs provide numerous utilities like typing indicators, proactive messages, cards, file uploads, etc.
- Includes built-in authentication and logging mechanisms, requiring minimal effort from us.
- Offers SDKs for Python, JavaScript, and .NET.
- Enables easy integration with the Application Insights Service for application monitoring.
- Like other Microsoft services, it is backed by the Microsoft product and support teams.

**Cons**:
- Does not yet support streaming.
- Lacks support for private endpoints.
- As a service, it cannot be containerized or run on Kubernetes, container apps, etc.
- Has a steeper learning curve if you want to use all of its capabilities.

As an alternative, in this notebook, we will build another Backend API using FastAPI with LangServe. <br>This API is self-contained, allowing it to be packaged in a Docker container and deployed anywhere. 

We will then zip the code and upload it to a new slot in the same Azure Web App service that hosts the Bot Service API.


From the [LangServe documentation](https://python.langchain.com/docs/langserve):

    LangServe helps developers deploy LangChain runnables and chains as a REST API.

    This library is integrated with FastAPI and uses pydantic for data validation.

    In addition, it provides a client that can be used to call into runnables deployed on a server. A JavaScript client is available in LangChain.js.

## The main file: `server.py`

Just as the main code of the Bot Service API resides in `bot.py`, the main code of this FastAPI backend resides in `apps/backend/langserve/app/server.py`.

**Take a look at it!**

In `server.py` you will see that we created four endpoints:

- `/docs/`
  - This endpoint shows the OpenAPI (Swagger) definition of the API.
- `/chatgpt/`
  - This endpoint uses a plain LLM with no system prompt.
- `/joke/`
  - This endpoint uses a chain made of a prompt, an LLM, and a custom structured JSON output (it adds the server timestamp).
- `/agent/`
  - This is the endpoint for our Smart GPT Bot brain agent.

Each of the three chain endpoints (`/chatgpt/`, `/joke/`, and `/agent/`) exposes these routes: `/invoke/`, `/batch/`, `/stream/`, and `/stream_events/`.
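
To make the URL layout concrete, here is a minimal sketch that enumerates the route URLs for the three chain endpoints. The helper name `route_urls` and the local base URL are illustrative assumptions, not part of `server.py`; `/docs/` is left out because it only serves the Swagger page.

```python
from itertools import product

# Endpoint and route names as listed in this notebook.
ENDPOINTS = ["chatgpt", "joke", "agent"]
ROUTES = ["invoke", "batch", "stream", "stream_events"]


def route_urls(base_url: str) -> list[str]:
    """Return every endpoint/route combination under base_url (hypothetical helper)."""
    root = base_url.rstrip("/")
    return [f"{root}/{endpoint}/{route}" for endpoint, route in product(ENDPOINTS, ROUTES)]


for url in route_urls("http://localhost:8000"):
    print(url)
```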

## Deploy to Azure App Service

In `apps/backend/langserve/README.md` you will find all the instructions on how to zip the code and upload it to the Azure Web App. We will use the same Azure Web App Service created for the Bot Service API.

=> **GO AHEAD NOW AND FOLLOW THE INSTRUCTIONS in `apps/backend/langserve/README.md`**

## (optional) Deploy the server locally

1) Go to the file `apps/backend/langserve/app/server.py` and uncomment the following code to test locally:
```python
    ### uncomment this section to run server in local host #########

    # from pathlib import Path
    # from dotenv import load_dotenv
    # # Calculate the path three directories above the current script
    # library_path = Path(__file__).resolve().parents[4]
    # sys.path.append(str(library_path))
    # load_dotenv(str(library_path) + "/credentials.env")
    # os.environ["AZURE_OPENAI_MODEL_NAME"] = os.environ["GPT35_DEPLOYMENT_NAME"]

    ###################################
```
2) Open a terminal, activate the right conda environment, then go to this folder `apps/backend/langserve/app` and run this command:
    
```bash
python server.py
```

Alternatively, you can go to this folder `apps/backend/langserve/` and run this command:
```bash
langchain serve
```

This runs the backend API on localhost, port 8000.

3) If you are working on an Azure ML compute instance, you can access the OpenAPI (Swagger) definition at this address:

    https://\<your_compute_name\>-8000.\<your_region\>.instances.azureml.ms/
    
    For example:
    https://pabmar1-8000.australiaeast.instances.azureml.ms/
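
The address pattern above can be expressed as a small helper. This is only a sketch of the pattern shown in this notebook; `azureml_port_url` is a hypothetical name, and the default port assumes the server runs on 8000 as described.

```python
def azureml_port_url(compute_name: str, region: str, port: int = 8000) -> str:
    """Build the Azure ML compute-instance address for a local port (hypothetical helper)."""
    return f"https://{compute_name}-{port}.{region}.instances.azureml.ms/"


# Reproduces the example address given above.
print(azureml_port_url("pabmar1", "australiaeast"))
```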

## Talk to the API using POST requests

In [1]:
import requests
import json
import sys
import time
import random

### Functions to post requests and read responses from the API (streaming supported)

In [2]:
def process_line(line):
    """Process a single line from the stream."""
    # print("line:",line)
    if line.startswith('data: '):
        # Extract JSON data following 'data: '
        json_data = line[len('data: '):]
        try:
            data = json.loads(json_data)
            if "event" in data:
                handle_event(data)
            elif "content" in data:
                # If there is immediate content to print
                print(data["content"], end="", flush=True)
            elif "steps" in data:
                print(data["steps"])
            elif "output" in data:
                print(data["output"])
        except json.JSONDecodeError as e:
            print(f"JSON decoding error: {e}")
    elif line.startswith('event: '):
        pass
    elif ": ping" in line:
        pass
    else:
        print(line)

def handle_event(event):
    """Handles specific events, adjusting output based on event type."""
    kind = event["event"]
    if kind == "on_chain_start" and event["name"] == "AgentExecutor":
        print(f"Starting agent: {event['name']}")
    elif kind == "on_chain_end" and event["name"] == "AgentExecutor":
        print("\n--")
        print(f"Done agent: {event['name']}")
    elif kind == "on_chat_model_stream":
        content = event["data"]["chunk"]["content"]
        if content:  # Ensure content is not None or empty
            print(content, end="", flush=True)
    elif kind == "on_tool_start":
        # event['data'].get('input') is expected to be a dictionary
        tool_inputs = event['data'].get('input')
        if isinstance(tool_inputs, dict):
            # Join the dictionary values into a quoted, comma-separated string
            inputs_str = ", ".join(f"'{v}'" for v in tool_inputs.values())
        else:
            # Fallback if it's not a dictionary or in an unexpected format
            inputs_str = str(tool_inputs)
        print(f"Starting tool: {event['name']} with input: {inputs_str}")
    elif kind == "on_tool_end":
        print(f"Done tool: {event['name']}\n--")

    
def consume_api(url, payload):
    """Use requests.post to talk to the FastAPI backend; supports streaming."""
    
    headers = {'Content-Type': 'application/json'}
    
    with requests.post(url, json=payload, headers=headers, stream=True) as response:
        try:
            response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
            
            for line in response.iter_lines():
                if line:  # Check if the line is not empty
                    decoded_line = line.decode('utf-8')
                    process_line(decoded_line)
                    
                    
        except requests.exceptions.HTTPError as err:
            print(f"HTTP Error: {err}")
        except Exception as e:
            print(f"An error occurred: {e}")
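
To see what `process_line` receives without calling the server, the sketch below parses a few hand-written Server-Sent Events lines. The sample payloads are illustrative assumptions modeled on the `data: ` and `event: ` prefixes handled above, not captured server output, and `parse_sse_lines` is a hypothetical helper.

```python
import json

# Hand-written sample lines (assumed shape, not real server output).
SAMPLE_LINES = [
    "event: data",
    'data: {"content": "Long "}',
    ": ping",
    "event: data",
    'data: {"content": "COVID"}',
    "event: end",
]


def parse_sse_lines(lines):
    """Yield the decoded JSON payload of every 'data: ' line, skipping everything else."""
    prefix = "data: "
    for line in lines:
        if not line.startswith(prefix):
            continue  # event names and pings carry no payload
        try:
            yield json.loads(line[len(prefix):])
        except json.JSONDecodeError:
            continue  # ignore malformed payloads in this sketch


text = "".join(chunk.get("content", "") for chunk in parse_sse_lines(SAMPLE_LINES))
print(text)
```

Separating parsing from printing in this way makes the stream handling easy to check offline before pointing it at a live endpoint.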


### Base URL

In [3]:
base_url = "https://<YOUR_BACKEND_WEBAPP_NAME>-staging.azurewebsites.net"  # Note that "-staging" is the Azure App Service slot where the LangServe API was deployed
# base_url = "http://localhost:8000" # If you deployed locally

### `/chatgpt/` endpoint

In [4]:
payload = {'input': 'explain long covid in just 2 short sentences'}  # Your POST request payload

In [5]:
# URL of the FastAPI Invoke endpoint
url = base_url + '/chatgpt/invoke'
consume_api(url, payload)

{"output":{"content":"Long COVID refers to a range of symptoms that persist for weeks or months after the initial infection with COVID-19. These symptoms can include fatigue, shortness of breath, and cognitive difficulties.","additional_kwargs":{},"response_metadata":{"token_usage":{"completion_tokens":38,"prompt_tokens":16,"total_tokens":54},"model_name":"gpt-35-turbo","system_fingerprint":"fp_2f57f81c11","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}}],"finish_reason":"stop","logprobs":null,"content_filter_results":{"hate":{"filtered":false,"severity":"safe"},"self_harm":{"filtered":false,"severity":"safe"},"sexual":{"filtered":false,"severity":"safe"},"violence":{"filtered":false,"severity":"safe"}}},"type":"ai","name":null,"id":"run-c6276c52-6d60-4c94-82bb-c0cc2c9eaf0b-0","ex

In [6]:
# URL of the FastAPI streaming endpoint
url = base_url + '/chatgpt/stream'
consume_api(url, payload)

Long COVID is a condition where people continue to experience symptoms of COVID-19 long after the initial infection has cleared. These symptoms can include fatigue, shortness of breath, and brain fog.
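
The `/batch/` route is listed earlier but not exercised in this notebook. As a rough sketch, assuming it accepts a JSON body with an `inputs` list (the plural counterpart of the `input` key used for `/invoke`; check the `/docs/` page of your deployment to confirm), a payload could be built like this. `build_batch_payload` is a hypothetical helper, and the second question is made up for illustration.

```python
import json


def build_batch_payload(questions):
    """Wrap several inputs in the 'inputs' list assumed for the /batch route."""
    return {"inputs": list(questions)}


payload_example = build_batch_payload([
    "explain long covid in just 2 short sentences",
    "explain herd immunity in just 2 short sentences",
])
print(json.dumps(payload_example, indent=2))
```

Passing `payload_example` to `consume_api(base_url + '/chatgpt/batch', payload_example)` would then print the raw JSON response, since a batch reply is not streamed.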

### `/joke` endpoint : chain with custom output

In [7]:
payload = {'input': {"topic": "highschool", "language":"english"}}

url = base_url + '/joke/invoke'

consume_api(url, payload)

{"output":{"content":"Why don't you ever hear a pterodactyl using the bathroom in high school? Because they're extinct!","info":{"timestamp":"2024-04-19T22:21:24.182157"}},"metadata":{"run_id":"da46e2af-5363-4f65-b91c-80db8c82cea5","feedback_tokens":[]}}


In [8]:
# URL of the FastAPI streaming endpoint
url = base_url + '/joke/stream_events'

consume_api(url, payload)

Why did the math book go to high school?

It wanted to become well-rounded!

### `/agent` endpoint : our complex smart bot

In [9]:
random_session_id = "session" + str(random.randint(1, 1000))
random_user_id = "user" + str(random.randint(1, 1000))

config = {"configurable": {"session_id": random_session_id, "user_id": random_user_id}}
print(random_session_id, random_user_id)

session462 user68
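
With only 1,000 possible values, `random.randint(1, 1000)` can hand two conversations the same ID. A sketch of an alternative, assuming the backend accepts any string for `session_id` and `user_id` (not confirmed here), uses `uuid4`; `new_session_config` is a hypothetical helper.

```python
import uuid


def new_session_config(user_id=None):
    """Return a config dict with collision-resistant IDs (hypothetical helper)."""
    return {
        "configurable": {
            "session_id": f"session-{uuid.uuid4().hex}",
            # Reuse a known user ID when given, otherwise generate one.
            "user_id": user_id or f"user-{uuid.uuid4().hex}",
        }
    }


example_config = new_session_config()
print(example_config["configurable"]["session_id"])
```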


In [10]:
payload = {'input': {"question": "Hi, I am Pablo, what is your name?"}, 'config': config}
 
url = base_url + '/agent/invoke'

consume_api(url, payload)

{"output":{"output":"I'm here to assist you with any questions or tasks you have. How can I help you today?"},"metadata":{"run_id":"fae317ea-2efd-4f85-b145-65ca4878df0a","feedback_tokens":[]}}
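
Because `/invoke` returns one JSON document rather than a stream, `consume_api` prints it raw. A minimal sketch for pulling out just the answer text, based on the response shapes shown in the outputs above, might look like this. `extract_answer` is a hypothetical helper, and the sample string is shortened for illustration.

```python
import json


def extract_answer(raw_line: str) -> str:
    """Pull the answer text out of an /invoke response body (hypothetical helper)."""
    output = json.loads(raw_line)["output"]
    if isinstance(output, dict):
        # /chatgpt and /joke responses use "content"; /agent nests a second "output".
        return output.get("content") or output.get("output") or json.dumps(output)
    return str(output)


agent_response = '{"output": {"output": "How can I help you today?"}, "metadata": {"run_id": "example"}}'
print(extract_answer(agent_response))
```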


In [11]:
payload = {'input': {"question": "docsearch, what is CLP?"}, 'config': config}
 
url = base_url + '/agent/stream_events'

consume_api(url, payload)

Starting agent: AgentExecutor
Starting tool: docsearch with input: 'CLP'
Done tool: docsearch
--
I have found multiple meanings and applications of the term "CLP" in different contexts. Here are the different meanings and their respective sources:

1. **Constraint Logic Programming (CLP):**
   - **Definition:** Constraint Logic Programming (CLP) is a powerful extension of conventional logic programming that incorporates constraint languages and constraint solving methods into logic programming languages.
   - **Key Concepts:** CLP involves the parametrization of a logic programming language with respect to a constraint language and a domain of computation, yielding soundness and completeness results for an operational semantics relying on a constraint solver for the employed constraint language.
   - **Source:** [arXiv:cs/0008036v1](https://datasetsgptsmartsearch.blob.core.windows.net/arxivcs/pdf/0008/0008036v1.pdf?sv=2022-11-02&ss=b&srt=sco&sp=rl&se=2026-01-03T02:11:44Z&st=2024-01-02T

In [26]:
payload = {'input': {"question": "bing, give me the current salary of a dental hygenist in texas"}, 'config': config}
 
url = base_url + '/agent/stream_events'

consume_api(url, payload)

Starting agent: AgentExecutor
Starting tool: bing with input: 'current salary of a dental hygienist in Texas'
Done tool: bing
--
I have found various estimates for the average salary of a dental hygienist in Texas from different sources. Here are the estimates:

1. **Indeed:** The average salary for a dental hygienist is $46.88 per hour in Texas, based on 2.8k reported salaries. [Source](https://www.indeed.com/career/dental-hygienist/salaries/TX)

2. **Salary.com:** The average salary for a dental hygienist in Texas is $81,346 as of March 26, 2024, with a range typically falling between $71,983 and $91,085. [Source](https://www.salary.com/research/salary/benchmark/dental-hygienist-salary/tx)

3. **Glassdoor:** The highest reported salary for a dental hygienist in Texas is $130,309 per year, based on anonymous submissions. [Source](https://www.glassdoor.com/Salaries/texas-dental-hygienist-salary-SRCH_IL.0,5_IS1347_KO6,22_IP4.htm)

4. **ZipRecruiter:** The average hourly pay for a dental

In [27]:
payload = {'input': {"question": "docsearch, How Covid affects obese people? and elderly"}, 'config': config}
 
url = base_url + '/agent/stream_events'

consume_api(url, payload)

Starting agent: AgentExecutor
Starting tool: docsearch with input: 'How does COVID-19 affect obese people?'
Starting tool: docsearch with input: 'How does COVID-19 affect elderly people?'
Done tool: docsearch
--
Done tool: docsearch
--
### How COVID-19 Affects Obese People

Obesity has been identified as a significant risk factor for the severity of COVID-19, leading to more serious symptoms and negative prognoses for infected individuals. Here are some key impacts of COVID-19 on obese individuals:

1. **Increased Risk of Severe Disease**: Studies have shown that obese patients with COVID-19 have increased odds of progressing to severe disease. They are more likely to exhibit symptoms such as cough and fever compared to non-obese patients. Additionally, men who are obese have increased odds of developing severe COVID-19 compared to those with normal weight.

2. **Higher Likelihood of Serious Complications**: The World Health Organization (WHO) considers obesity as a major risk factor f

In [18]:
payload = {'input': {"question": "sqlsearch, how many people were hospitalized in CA?"}, 'config': config}
 
url = base_url + '/agent/stream_events'

consume_api(url, payload)

Starting agent: AgentExecutor
Starting tool: sqlsearch with input: 'number of people hospitalized in California'
Done tool: sqlsearch
--
The total number of people currently hospitalized in California is 2,653,612.

This information was obtained by querying the `covidtracking` table for the sum of the `hospitalizedCurrently` column where the state is 'CA'. The SQL query used for this purpose is:

```sql
SELECT SUM(hospitalizedCurrently) AS total_hospitalized FROM covidtracking WHERE state = 'CA'
```

If you need further information or assistance, feel free to ask!
--
Done agent: AgentExecutor


In [19]:
payload = {'input': {"question": "thank you!"}, 'config': config}
 
url = base_url + '/agent/stream_events'

consume_api(url, payload)

Starting agent: AgentExecutor
You're welcome! If you have any more questions or if there's anything else I can help you with, feel free to ask.
--
Done agent: AgentExecutor


## Now let's try all endpoints and routes using LangChain's `RemoteRunnable` client

All of these are also available in TypeScript; see the LangChain.js documentation.

In [28]:
from langchain.schema import SystemMessage, HumanMessage
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnableMap
from langserve import RemoteRunnable

chatgpt_chain = RemoteRunnable(base_url + "/chatgpt/")
joke_chain = RemoteRunnable(base_url + "/joke/")
agent_chain = RemoteRunnable(base_url + "/agent/")


In [29]:
joke_chain.invoke({"topic": "cars", "language":"english"})

{'content': 'Why did the car break up with the motorcycle?\nBecause it was tired of being co-dependent!',
 'info': {'timestamp': '2024-04-19T22:25:56.305034'}}

In [30]:
# or async
await joke_chain.ainvoke({"topic": "parrots", "language":"spanish"})

{'content': '¿Por qué los loros no saben contar chistes? Porque siempre repiten los mismos.',
 'info': {'timestamp': '2024-04-19T22:26:00.392746'}}

In [32]:
prompt = [
    SystemMessage(content='you are a helpful assistant that responds to the user question.'),
    HumanMessage(content='explain long covid')
]

# Supports astream
async for msg in chatgpt_chain.astream(prompt):
    print(msg.content, end="", flush=True)

Long COVID, also known as post-acute sequelae of SARS-CoV-2 infection (PASC), refers to a range of symptoms that persist for weeks or months after the acute phase of a COVID-19 infection has resolved. These symptoms can include fatigue, shortness of breath, chest pain, joint pain, and brain fog, among others. The exact cause of long COVID is not yet fully understood, but it is believed to involve a combination of factors, including lingering viral effects, immune system dysregulation, and potential damage to organs or tissues. Long COVID can significantly impact a person's quality of life and may require ongoing medical care and support.

In [36]:
async for event in agent_chain.astream_events({"question": " booksearch, what is the story about the stolen kidney, and what book is it in?"}, config=config, version="v1"):
    kind = event["event"]
    if kind == "on_chain_start":
        if (event["name"] == "AgentExecutor"):  
            print(f"Starting agent: {event['name']}")
    elif kind == "on_chain_end":
        if (event["name"] == "AgentExecutor"):
            print()
            print("--")
            print(f"Done agent: {event['name']}")
    elif kind == "on_chat_model_stream":
        content = event["data"]["chunk"].content
        if content:
            print(content, end="", flush=True)
    elif kind == "on_tool_start":
        print("--")
        print(f"Starting tool: {event['name']} with inputs: {event['data'].get('input')}")
    elif kind == "on_tool_end":
        print(f"Done tool: {event['name']}")
        # print(f"Tool output was: {event['data'].get('output')}")
        print("--")

Starting agent: AgentExecutor
--
Starting tool: booksearch with inputs: {'query': 'stolen kidney'}
Done tool: booksearch
--
The concept of a "stolen kidney" is often associated with an urban legend known as the "Kidney Heist tale." This urban legend typically involves a scenario where an individual is drugged, wakes up in an ice-filled bathtub, and discovers that one of their kidneys has been surgically removed. The story is often used as a cautionary tale about accepting drinks from strangers and has circulated in various versions over the years, each sharing the core elements of the drugged drink, the ice-filled bathtub, and the kidney-theft punch line.

The Kidney Heist tale is an example of a story that sticks in people's minds. It is memorable, understandable, and effective in changing thought or behavior. This urban legend shares many traits with other successful ideas, such as unexpected outcomes, concrete details, and emotional impact. The story's ability to stick in people's m