## Cookbook for using Azure OpenAI with Embedchain

### Step-1: Install embedchain package

In [22]:
#!pip install embedchain[dataloaders]

### Step-2: Set Azure OpenAI related environment variables

You can find these env variables on your Azure OpenAI dashboard.

In [23]:
import os
from embedchain import Pipeline as App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://openai-sailvan-eastus2-proxy.valsun.cn/openai"
os.environ["OPENAI_API_KEY"] = "b0fc062d729a4b66af7edd12cab4636f"
os.environ["OPENAI_API_VERSION"] = "2023-05-15"

### Step-3: Define your llm and embedding model config

In [24]:
config = """
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo-0613
    deployment_name: gpt-35-turbo-0613
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: text-embedding-ada-002

vectordb:
  provider: elasticsearch
  config:
    collection_name: 'sailvan_vector_db'
    es_url: http://10.199.1.77:9200
    basic_auth:
      - elastic
      - SOLxtbk=mNnpFTaj4SOV
    verify_certs: false
"""

# Write the multi-line string to a YAML file
with open('azure_openai.yaml', 'w') as file:
    file.write(config)

### Step-4 Create embedchain app based on the config

In [25]:
app = App.from_config(config_path="azure_openai.yaml")



Creating index sailvan_vector_db_1536 {'mappings': {'properties': {'text': {'type': 'text'}, 'embeddings': {'type': 'dense_vector', 'index': False, 'dims': 1536}}}}


### Step-5: Add data sources to your app

In [33]:
#app.add("https://baike.baidu.com/item/%E8%B5%9B%E7%BB%B4%E6%97%B6%E4%BB%A3%E7%A7%91%E6%8A%80%E8%82%A1%E4%BB%BD%E6%9C%89%E9%99%90%E5%85%AC%E5%8F%B8/51194114?fr=ge_ala", metadata={"catalog": "company", "dataset": "sailvan"})

Inserting batches in elasticsearch: 100%|██████████| 1/1 [00:00<00:00, 45.73batch/s]

Successfully saved https://baike.baidu.com/item/%E8%B5%9B%E7%BB%B4%E6%97%B6%E4%BB%A3%E7%A7%91%E6%8A%80%E8%82%A1%E4%BB%BD%E6%9C%89%E9%99%90%E5%85%AC%E5%8F%B8/51194114?fr=ge_ala (DataType.WEB_PAGE). New chunks count: 1





'2e8011e1acc0778f7e07faccf4f90425'

### Step-6: All set. Now start asking questions related to your data

In [38]:
# print(app.query("What is the name of the company Elon Musk founded?"))
print(app.query("赛维时代有哪些高管？", where={"metadata.catalog": "company", "metadata.dataset": "sailvan1"}))

# while(True):
#     question = input("Enter question: ")

#     if question in ['q', 'exit', 'quit']:
#         break
#     answer = app.query(question)
#     print(answer)



赛维时代有以下高管：
- 陈文平：董事长、董事、总经理
- 陈文辉：董事
- 陈晓兰：董事
- 王绪成：董事、副总经理
- 王志伟：董事
- 吴亚宏：董事
- 张贞智：董事
- 戴建宏：独立董事
- 郭东：独立董事
- 江百灵：独立董事
- 吴星宇：独立董事
- 艾帆：董事会秘书
- 潘旭东：监事会主席、监事
- 陈永峰：监事
- 帅勇：副总经理
- 林文佳：财务总监
- 蔡丽宏：职工监事
