Skip to content

Azure OpenAI, OSS LLM 🌊1. Vector storage and 🦙langchain 🔎2. Azure Search ChatGpt demo 3. Microsoft ♾️Semantic-Kernel with 🌌 Cosmos DB, etc.

Notifications You must be signed in to change notification settings

sujitrulz/azure-openai-elastic-vector-langchain

 
 

Repository files navigation

updated: 07/05/2023

Azure OpenAI + LLM (Large language model)

This repository contains references to open-source models similar to ChatGPT, as well as Langchain and prompt engineering libraries. It also includes related samples and research on Langchain, Vector Search (including feasibility checks on Elasticsearch, Azure Cognitive Search, Azure Cosmos DB), and more.

Not being able to keep up with and test every recent update, sometimes I simply copied them into this repository for later review. some code might be outdated.

Rule: Brief each item on one or a few lines as much as possible.

What's the difference between Azure OpenAI and OpenAI?

  1. OpenAI is a better option if you want to use the latest features like function calling, plug-ins, and access to the latest models.
  2. Azure OpenAI is recommended if you require a reliable, secure, and compliant environment.
  3. Azure OpenAI provides seamless integration with other Azure services..
  4. Azure OpenAI offers private networking and role-based authentication, and responsible AI content filtering.
  5. Azure OpenAI provides a Service Level Agreement (SLA) that guarantees a certain level of uptime and support for the service.
  6. Azure OpenAI does not use user input as training data for other customers. Data, privacy, and security for Azure OpenAI

Table of contents

Section 1 : llama-index and Vector Storage (Database)

This section has been created for testing and feasibility checks using elastic search as a vector database and integration with llama-index. llama-index is specialized in integration layers to external data sources.

Opensearch/Elasticsearch setup

  • docker : Opensearch Docker-compose
  • docker-elasticsearch : Not working for ES v8, requiring security plug-in with mandatory
  • docker-elk : Elasticsearch Docker-compose, Optimized Docker configurations with solving security plug-in issues.
  • es-open-search-set-analyzer.py : Put Language analyzer into Open search
  • es-open-search.py : Open search sample index creation
  • es-search-set-analyzer.py : Put Language analyzer into Elastic search
  • es-search.py : Usage of Elastic search python client
  • files : The Sample file for consuming

llama-index

  • index.json : Vector data local backup created by llama-index

  • index_vector_in_opensearch.json : Vector data stored in Open search (Source: files\all_h1.pdf)

  • llama-index-azure-elk-create.py: llama-index ElasticsearchVectorClient (Unofficial file to manipulate vector search, Created by me, Not Fully Tested)

  • llama-index-lang-chain.py : Lang chain memory and agent usage with llama-index

  • llama-index-opensearch-create.py : Vector index creation to Open search

  • llama-index-opensearch-query-chatgpt.py : Test module to access Azure Open AI Embedding API.

  • llama-index-opensearch-query.py : Vector index query with questions to Open search

  • llama-index-opensearch-read.py : llama-index ElasticsearchVectorClient (Unofficial file to manipulate vector search, Created by me, Not Fully Tested)

  • env.template : The properties. Change its name to .env once your values settings is done.

    OPENAI_API_TYPE=azure
    OPENAI_API_BASE=https://????.openai.azure.com/
    OPENAI_API_VERSION=2022-12-01
    OPENAI_API_KEY=<your value in azure>
    OPENAI_DEPLOYMENT_NAME_A=<your value in azure>
    OPENAI_DEPLOYMENT_NAME_B=<your value in azure>
    OPENAI_DEPLOYMENT_NAME_C=<your value in azure>
    OPENAI_DOCUMENT_MODEL_NAME=<your value in azure>
    OPENAI_QUERY_MODEL_NAME=<your value in azure>
    
    INDEX_NAME=gpt-index-demo
    INDEX_TEXT_FIELD=content
    INDEX_EMBEDDING_FIELD=embedding
    ELASTIC_SEARCH_ID=elastic
    ELASTIC_SEARCH_PASSWORD=elastic
    OPEN_SEARCH_ID=admin
    OPEN_SEARCH_PASSWORD=admin

llama-index example

  • llama-index-es-handson\callback-debug-handler.py: callback debug handler
  • llama-index-es-handson\chat-engine-flare-query.py: FLARE
  • llama-index-es-handson\chat-engine-react.py: ReAct
  • llama-index-es-handson\milvus-create-query.py: Milvus Vector storage

Vector Storage Comparison

Vector Storage Options for Azure

Milvus Embedded

  • pip install milvus

  • Docker compose: https://milvus.io/docs/install_offline-docker.md

  • Milvus Embedded through python console only works in Linux and Mac OS.

  • In Windows, Use this link, https://github.com/matrixji/milvus/releases.

    # Step 1. Start Milvus
    
    1. Unzip the package
    Unzip the package, and you will find a milvus directory, which contains all the files required.
    
    2. Start a MinIO service
    Double-click the run_minio.bat file to start a MinIO service with default configurations. Data will be stored in the subdirectory s3data.
    
    3. Start an etcd service
    Double-click the run_etcd.bat file to start an etcd service with default configurations.
    
    4. Start Milvus service
    Double-click the run_milvus.bat file to start the Milvus service.
    
    # Step 2. Run hello_milvus.py
    
    After starting the Milvus service, you can test by running hello_milvus.py. See Hello Milvus for more information.
    

Conclusion

  • Azure Open AI Embedding API, text-embedding-ada-002, supports 1536 dimensions. Elastic search, Lucene based engine, supports 1024 dimensions as a max. Open search can insert 16,000 dimensions as a vector storage. Open search is available to use as a vector database with Azure Open AI Embedding API.

  • @citation: open ai documents: text-embedding-ada-002: Smaller embedding size. The new embeddings have only 1536 dimensions, one-eighth the size of davinci-001 embeddings, making the new embeddings more cost effective in working with vector databases. https://openai.com/blog/new-and-improved-embedding-model

  • @citation: open search documents: However, one exception to this is that the maximum dimension count for the Lucene engine is 1,024, compared with 16,000 for the other engines. https://opensearch.org/docs/latest/search-plugins/knn/approximate-knn/

  • @llama-index ElasticsearchReader class: The name of the class in llama-index is ElasticsearchReader. However, actually, it can only work with open search.

llama-index Deep dive

Section 2 : ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search

The files in this directory, extra_steps, have been created for managing extra configurations and steps for launching the demo repository.

https://github.com/Azure-Samples/azure-search-openai-demo : Python, ReactJs, Typescript

sk

Configuration

  1. (optional) Check Azure module installation in Powershell by running ms_internal_az_init.ps1 script
  2. (optional) Set your Azure subscription Id to default

Start the following commands in ./azure-search-openai-demo directory

  1. (deploy azure resources) Simply Run azd up

The azd stores relevant values in the .env file which is stored at ${project_folder}\.azure\az-search-openai-tg\.env.

AZURE_ENV_NAME=<your_value_in_azure>
AZURE_LOCATION=<your_value_in_azure>
AZURE_OPENAI_SERVICE=<your_value_in_azure>
AZURE_PRINCIPAL_ID=<your_value_in_azure>
AZURE_SEARCH_INDEX=<your_value_in_azure>
AZURE_SEARCH_SERVICE=<your_value_in_azure>
AZURE_STORAGE_ACCOUNT=<your_value_in_azure>
AZURE_STORAGE_CONTAINER=<your_value_in_azure>
AZURE_SUBSCRIPTION_ID=<your_value_in_azure>
BACKEND_URI=<your_value_in_azure>
  1. Move to app by cd app command
  2. (sample data loading) Move to scripts then Change into Powershell by Powershell command, Run prepdocs.ps1
  • console output (excerpt)

            Uploading blob for page 29 -> role_library-29.pdf
            Uploading blob for page 30 -> role_library-30.pdf
    Indexing sections from 'role_library.pdf' into search index 'gptkbindex'
    Splitting './data\role_library.pdf' into sections
            Indexed 60 sections, 60 succeeded
    
  1. Move to app by cd .. and cd app command
  2. (locally running) Run start.cmd
  • console output (excerpt)

    Building frontend
    
    
    > frontend@0.0.0 build \azure-search-openai-demo\app\frontend
    > tsc && vite build
    
    vite v4.1.1 building for production...
    ✓ 1250 modules transformed.
    ../backend/static/index.html                    0.49 kB
    ../backend/static/assets/github-fab00c2d.svg    0.96 kB
    ../backend/static/assets/index-184dcdbd.css     7.33 kB │ gzip:   2.17 kB
    ../backend/static/assets/index-41d57639.js    625.76 kB │ gzip: 204.86 kB │ map: 5,057.29 kB
    
    Starting backend
    
    * Serving Flask app 'app'
    * Debug mode: off
    WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
    * Running on http://127.0.0.1:5000
    Press CTRL+C to quit
    ...
    

Running from second times

  1. Move to app by cd .. and cd app command
  2. (locally running) Run start.cmd

(optional)

  • fix_from_origin : The modified files, setup related
  • ms_internal_az_init.ps1 : Powershell script for Azure module installation
  • ms_internal_troubleshootingt.ps1 : Set Specific Subscription Id as default

Introducing Azure OpenAI Service On Your Data in Public Preview

  • Azure OpenAI Service On Your Data in Public Preview Link

Azure OpenAI samples

  • Azure OpenAI samples: Link

  • A simple ChatGPT Plugin: Link

  • The repository for all Azure OpenAI Samples complementing the OpenAI cookbook.: Link

Another Reference Architectures

Azure OpenAI Embeddings QnA Azure Cosmos DB + OpenAI ChatGPT C# blazor and Azure Custom Template
embeddin_azure_csharp gpt-cosmos
C# Implementation ChatGPT + Enterprise data with Azure OpenAI and Cognitive Search Simple ChatGPT UI application Typescript, ReactJs and Flask
embeddin_azure_csharp gpt-cosmos
Azure Video Indexer demo Azure Video Indexer + OpenAI -
demo-videoindexer - -

Azure Open AI work with Cognitive Search act as a Long-term memory

Azure Cognitive Search : Vector Search

Options: 1. Vector similarity search, 2. Pure Vector Search, 3. Hybrid Search, 4. Semantic Hybrid Search

  • azure-search-vector-sample\azure-search-vector-python-sample.ipynb: Azure Cognitive Search - Vector and Hybrid Search

Section 3 : Microsoft Semantic Kernel with Azure Cosmos DB

  • Microsoft Langchain Library supports C# and Python and offers several features, some of which are still in development and may be unclear on how to implement. However, it is simple, stable, and faster than Python-based open-source software. The features listed on the link include: Semantic Kernel Feature Matrix

    sk
  • This section includes how to utilize Azure Cosmos DB for vector storage and vector search by leveraging the Semantic-Kernel.

Semantic-Kernel

  • appsettings.template.json : Environment value configuration file.
  • ComoseDBVectorSearch.cs : Vector Search using Azure Cosmos DB
  • CosmosDBKernelBuild.cs : Kernel Build code (test)
  • CosmosDBVectorStore.cs : Embedding Text and store it to Azure Cosmos DB
  • LoadDocumentPage.cs : PDF splitter class. Split the text to unit of section. (C# version of azure-search-openai-demo/scripts/prepdocs.py)
  • LoadDocumentPageOutput : LoadDocumentPage class generated output
  • MemoryContextAndPlanner.cs : Test code of context and planner
  • MemoryConversationHistory.cs : Test code of conversation history
  • Program.cs : Run a demo. Program Entry point
  • SemanticFunction.cs : Test code of conversation history
  • semanticKernelCosmos.csproj : C# Project file
  • Settings.cs : Environment value class
  • SkillBingSearch.cs : Bing Search Skill
  • SkillDALLEImgGen.cs : DALLE Skill (Only OpenAI, Azure Open AI not supports yet)

Environment variable

{
  "Type": "azure",
  "Model": "<model_deployment_name>",
  "EndPoint": "https://<your-endpoint-value>.openai.azure.com/",
  "AOAIApiKey": "<your-key>",
  "OAIApiKey": "",
  "OrdId": "-", //The value needs only when using Open AI.
  "BingSearchAPIKey": "<your-key>",
  "aoaiDomainName": "<your-endpoint-value>",
  "CosmosConnectionString": "<cosmos-connection-string>"
}
  • Semantic Kernel has recently introduced support for Azure Cognitive Search as a memory. However, it currently only supports Azure Cognitive Search with a Semantic Search interface, lacking any features to store vectors to ACS.

  • According to the comments, this suggests that the strategy of the plan could be divided into two parts. One part focuses on Semantic Search, while the other involves generating embeddings using OpenAI.

Azure Cognitive Search automatically indexes your data semantically, so you don't need to worry about embedding generation. samples/dotnet/kernel-syntax-examples/Example14_SemanticMemory.cs.

// TODO: use vectors
// @Microsoft Semactic Kernel
var options = new SearchOptions
{
        QueryType = SearchQueryType.Semantic,
        SemanticConfigurationName = "default",
        QueryLanguage = "en-us",
        Size = limit,
};
  • SemanticKernel Implementation sample to overcome Token limits of Open AI model. Semantic Kernel でトークンの限界を超えるような長い文章を分割してスキルに渡して結果を結合したい (zenn.dev) Semantic Kernel でトークンの限界を超える

Bing search Web UI and Semantic Kernel sample code

Semantic Kernel sample code to integrate with Bing Search (ReAct??)

\ms-semactic-bing-notebook

  • gs_chatgpt.ipynb: Azure Open AI ChatGPT sample to use Bing Search
  • gs_davinci.ipynb: Azure Open AI Davinci sample to use Bing Search

Bing Search UI for demo

\bing-search-webui: (Utility, to see the search results from Bing Search API)

bingwebui

Section 4 : Langchain

Langchain Cheetsheet

Langchain Impressive Features

Langchain Quick Start: How to Use and Useful Utilities

  • Langchain_1_(믹스의_인공지능).ipynb : Langchain Get started

  • langchain_1_(믹스의_인공지능).py : -

  • Langchain_2_(믹스의_인공지능).ipynb : Langchain Utilities

  • langchain_2_(믹스의_인공지능).py : -

    from langchain.chains.summarize import load_summarize_chain
    chain = load_summarize_chain(chat, chain_type="map_reduce", verbose=True)
    chain.run(docs[:3])

    @citation: @practical-ai

Langchain chain type: Summarizer

  • stuff: Sends everything at once in LLM. If it's too long, an error will occur.
  • map_reduce: Summarizes by dividing and then summarizing the entire summary.
  • refine: (Summary + Next document) => Summary
  • map_rerank: Ranks by score and summarizes to important points.

langflow

  • langflow: LangFlow is a UI for LangChain, designed with react-flow.

Langchain vs llama-index

  • Basically llmaindex is a smart storage mechanism, while Langchain is a tool to bring multiple tools together. @citation

  • LangChain offers many features and focuses on using chains and agents to connect with external APIs. In contrast, LlamaIndex is more specialized and excels at indexing data and retrieving documents.

Section 5: Prompt Engineering, and Langchain vs Semantic Kernel

Prompt Engineering

  1. Zero-shot

  2. Few-shot Learning

  3. Chain of Thought (CoT): ReAct and Self Consistency also inherit the CoT concept.

  4. Recursively Criticizes and Improves (RCI)

  5. ReAct: Grounding with external sources. (Reasoning and Act)

  6. Chain-of-Thought Prompting

  7. Tree of Thought (github)

    • tree-of-thought\forest_of_thought.py: Forest of thought Decorator sample
    • tree-of-thought\tree_of_thought.py: Tree of thought Decorator sample
    • tree-of-thought\react-prompt.py: ReAct sample without Langchain
  • Prompt Concept

    1. Question-Answering
    2. Roll-play: Act as a [ROLE] perform [TASK] in [FORMAT]
    3. Reasoning
    4. Prompt-Chain
    5. Program Aided Language Model
    6. Recursive Summarization: Long Text -> Chunks -> Summarize pieces -> Concatenate -> Summarize
  • 🤩Prompt Engineering : ⭐⭐⭐⭐⭐

  • Prompt Engineering Guide: Copyright © 2023 DAIR.AI

Azure OpenAI Prompt Guide

OpenAI Prompt Guide

DeepLearning.ai Prompt Engineering COURSE and others

Awesome ChatGPT Prompts

ChatGPT : “user”, “assistant”, and “system” messages.

To be specific, the ChatGPT API allows for differentiation between “user”, “assistant”, and “system” messages.

  1. always obey "system" messages.
  2. all end user input in the “user” messages.
  3. "assistant" messages as previous chat responses from the assistant.

Presumably, the model is trained to treat the user messages as human messages, system messages as some system level configuration, and assistant messages as previous chat responses from the assistant. (@https://blog.langchain.dev/using-chatgpt-api-to-evaluate-chatgpt/)

Finetuning

PEFT: Parameter-Efficient Fine-Tuning (Youtube)

Sparsification

  • @citation: Binghchat

    Sparsification is a technique used to reduce the size of large language models (LLMs) by removing redundant parameters without significantly affecting their performance. It is one of the methods used to compress LLMs. LLMs are neural networks that are trained on massive amounts of data and can generate human-like text. The term “sparsification” refers to the process of removing redundant parameters from these models.

Small size with Textbooks: High quality synthetic dataset

  • ph-1: Despite being small in size, phi-1 attained 50.6% on HumanEval and 55.5% on MBPP.
  • Orca: Orca learns from rich signals from GPT 4 including explanation traces; step-by-step thought processes; and other complex instructions, guided by teacher assistance from ChatGPT.

Large Transformer Model Inference Optimization

Langchain vs Semantic Kernel

Langchain Semantic Kernel
Memory Memory
Tookit Skill
Tool LLM prompts (semantic functions) or native C# or Python code (native function)
Agent Planner
Chain Steps, Pipeline
Tool Connector

Semantic Kernel : Semantic Function

expressed in natural language in a text file "skprompt.txt" using SK's Prompt Template language. Each semantic function is defined by a unique prompt template file, developed using modern

Semantic Kernel : Prompt Template language Key takeaways

  1. Variables : use the {{$variableName}} syntax : Hello {{$name}}, welcome to Semantic Kernel!

  2. Function calls: use the {{namespace.functionName}} syntax : The weather today is {{weather.getForecast}}.

  3. Function parameters: {{namespace.functionName $varName}} and {{namespace.functionName "value"}} syntax : The weather today in {{$city}} is {{weather.getForecast $city}}.

  4. Prompts needing double curly braces : {{ "{{" }} and {{ "}}" }} are special SK sequences.

  5. Values that include quotes, and escaping :

For instance:

... {{ 'no need to \"escape" ' }} ... is equivalent to:

... {{ 'no need to "escape" ' }} ...

Langchain Agent

  1. If you're using a text LLM, first try zero-shot-react-description.

  2. If you're using a Chat Model, try chat-zero-shot-react-description.

  3. If you're using a Chat Model and want to use memory, try conversational-react-description.

  4. self-ask-with-search: self ask with search paper

  5. react-docstore: ReAct paper

Sementic Kernel Glossary

  • Glossary in Git

  • Glossary in MS Doc

    sk
    Journey Short Description
    ASK A user's goal is sent to SK as an ASK
    Kernel The kernel orchestrates a user's ASK
    Planner The planner breaks it down into steps based upon resources that are available
    Resources Planning involves leveraging available skills, memories, and connectors
    Steps A plan is a series of steps for the kernel to execute
    Pipeline Executing the steps results in fulfilling the user's ASK
    GET And the user gets what they asked for ...

Langchain vs Sementic Kernel vs Azure Machine Learning Prompt flow

  • What's the difference between LangChain and Semantic Kernel?

    LangChain has many agents, tools, plugins etc. out of the box. More over, LangChain has 10x more popularity, so has about 10x more developer activity to improve it. On other hand, Semantic Kernel architecture and quality is better, that's quite promising for Semantic Kernel. Link

  • What's the difference between Azure Machine Laering PromptFlow and Semantic Kernel?

    1. Low/No Code vs C#, Python, Java
    2. Focused on Prompt orchestrating vs Integrate LLM into their existing app.

guidance

guidance: Simple, intuitive syntax, based on Handlebars templating. Domain Specific Language (DSL) for handling model interaction.

Section 6 : Improvement

Introducing 100K Context Windows

Math problem-solving skill

Table Extraction

OpenAI's plans according to Sam Altman

Token counting & Token-limits

Avoid AI hallucination

  • NeMo Guardrails: Building Trustworthy, Safe and Secure LLM Conversational Systems

Gorilla: An API store for LLMs

  • Gorilla: An API store for LLMs: Gorilla: Large Language Model Connected with Massive APIs

    1. Used GPT-4 to generate a dataset of instruction-api pairs for fine-tuning Gorilla.

    2. Used the abstract syntax tree (AST) of the generated code to match with APIs in the database and test set for evaluation purposes.

    3. @citation Link

    Another user asked how Gorilla compared to LangChain; Patil replied: Langchain is a terrific project that tries to teach agents how to use tools using prompting. Our take on this is that prompting is not scalable if you want to pick between 1000s of APIs. So Gorilla is a LLM that can pick and write the semantically and syntactically correct API for you to call! A drop in replacement into Langchain!

  • Meta: Toolformer: Language Models That Can Use Tools, by MetaAI

Memory Optimization

  • PagedAttention : vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention, 24x Faster LLM Inference Link

Open AI Plugin and function calling

  • ChatGPT Plugin

  • ChatGPT Function calling

    Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model's context limit and are billed as input tokens. If running into context limits, we suggest limiting the number of functions or the length of documentation you provide for function parameters.

Section 7 : Generative AI Landscape / List of OSS LLM

Generative AI Revolution: Exploring the Current Landscape

List of OSS LLM

Huggingface Open LLM Learboard

Hugging face Transformer

Hugging face StarCoder

Section 8 : References

picoGPT

  • An unnecessarily tiny implementation of GPT-2 in NumPy. picoGPT: Transformer Decoder

RLHF(Reinforcement Learning from Human Feedback)

  • Machine learning technique that trains a "reward model" directly from human feedback and uses the model as a reward function to optimize an agent's policy using reinforcement learning

  • Libraries: TRL, trlX, Argilla

Langchain and Prompt engineering library

AutoGPT / Communicative Agents

Democratizing the magic of ChatGPT with open models

  • The LLMs mentioned here are just small parts of the current advancements in the field. Most OSS LLM models have been built on the facebookresearch/llama. For a comprehensive list and the latest updates, please refer to the "Generative AI Landscape / List of OSS LLM" section.

  • facebookresearch/llama: Not licensed for commercial use

  • Falcon LLM Apache 2.0 license

  • LLM

Large Language and Vision Assistant

MLLM (multimodal large language model)

  • Facebook: ImageBind / SAM

    1. facebookresearch/ImageBind: ImageBind One Embedding Space to Bind Them All (github.com)
    2. facebookresearch/segment-anything(SAM): The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model. (github.com)
  • Microsoft: Kosmos-1 / Kosmos-2

    1. Language Is Not All You Need: Aligning Perception with Language Models 2302.14045
    2. Kosmos-2: Grounding Multimodal Large Language Models to the World
  • TaskMatrix.AI

    1. TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs

Application incl. UI/UX

Edge and Chrome Extension & Plugin

Awesome demo

日本語(Japanese Materials)

Section 9 : Relavant solutions and links

Section 10 : AI Tools

@citation: The best AI Chatbots in 2023.: twitter.com/slow_developer

Acknowledgements

  • @TODO

About

Azure OpenAI, OSS LLM 🌊1. Vector storage and 🦙langchain 🔎2. Azure Search ChatGpt demo 3. Microsoft ♾️Semantic-Kernel with 🌌 Cosmos DB, etc.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Jupyter Notebook 60.7%
  • Python 19.7%
  • TypeScript 5.8%
  • C# 5.1%
  • JavaScript 1.9%
  • Bicep 1.9%
  • Other 4.9%