# Experiments

Embedchain is an open-source RAG (retrieval-augmented generation) framework that makes it easy to create and deploy AI apps. At its core, Embedchain follows the design principle of being "Conventional but Configurable" to serve both software engineers and machine learning engineers.


Here is a simple demo of how it works.

Check us out: https://github.com/embedchain/embedchain

First, install the dependencies:

In [8]:
```python
# !pip install --upgrade embedchain
# Python version: 3.9
# python -m venv .venv
# !pip install --upgrade -r requirements.txt
```

Next, import the dependencies:

In [9]:
```python
import os

from embedchain import App
```


Next, we configure and instantiate the Embedchain bot. Remember to supply your own OpenAI or Google API key.

In [10]:
```python
from enum import Enum


class ModelProvider(Enum):
    OpenAI = "OpenAI"
    Google = "Google"
```
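The enum lets the rest of the notebook branch on a fixed set of providers. As a minimal sketch, assuming you want to choose the provider at runtime, a hypothetical `provider_from_name` helper can map a string (for example from a hypothetical `MODEL_PROVIDER` environment variable) to an enum member. The enum is repeated so the snippet runs on its own:

```python
import os
from enum import Enum


class ModelProvider(Enum):
    OpenAI = "OpenAI"
    Google = "Google"


def provider_from_name(name: str) -> ModelProvider:
    """Hypothetical helper: map "google", " OpenAI ", etc. to a ModelProvider member."""
    for provider in ModelProvider:
        if provider.value.lower() == name.strip().lower():
            return provider
    valid = ", ".join(p.value for p in ModelProvider)
    raise ValueError(f"Unknown model provider {name!r}; expected one of: {valid}")


# MODEL_PROVIDER is a hypothetical variable name; default to Google as this notebook does.
provider = provider_from_name(os.getenv("MODEL_PROVIDER", "Google"))
print(provider)  # ModelProvider.Google unless MODEL_PROVIDER is set
```

Matching on the enum's value keeps the accepted strings in one place, so adding a provider only means adding a member.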

In [11]:
```python
def create_model_config(model_provider: ModelProvider, api_key: str) -> dict:
    """
    Creates a configuration dictionary for the Embedchain application based on the model provider.

    Args:
        model_provider: The model provider to use (OpenAI or Google).
        api_key: The API key for the selected model provider.

    Returns:
        A dictionary containing the configuration for the Embedchain application.
    """

    base_config = {
        # Local Chroma vector store; allow_reset lets the collection be wiped and rebuilt.
        "vectordb": {
            "provider": "chroma",
            "config": {
                "collection_name": "chat-pdf",
                "allow_reset": True,
            },
        },
        # Split documents into chunks of up to 2000 characters (measured with len), no overlap.
        "chunker": {
            "chunk_size": 2000,
            "chunk_overlap": 0,
            "length_function": "len",
        },
    }

    if model_provider == ModelProvider.OpenAI:
        base_config["llm"] = {
            "provider": "openai",
            "config": {
                "model": "gpt-3.5-turbo-1106",
                "temperature": 0.5,
                "max_tokens": 1000,
                "top_p": 1,
                "stream": True,
                "api_key": api_key,  # pass the real key, not the literal string "{api_key}"
            },
        }
        base_config["embedder"] = {
            "provider": "openai",
            "config": {"api_key": api_key},
        }

        os.environ["OPENAI_API_KEY"] = api_key
    else:
        base_config["llm"] = {
            "provider": "google",
            "config": {
                "model": "gemini-1.5-pro-latest",  # alternative: gemini-pro
                "temperature": 0.1,
                "max_tokens": 2048,
                "top_p": 1,
                "stream": True,
                # No api_key entry: the key is supplied via GOOGLE_API_KEY below.
            },
        }
        base_config["embedder"] = {
            "provider": "google",
            "config": {
                "model": "models/text-embedding-004",  # alternative: embedding-001
                "task_type": "retrieval_document",
                "title": "Embeddings for SatoshiMentor",
                # No api_key entry: the key is supplied via GOOGLE_API_KEY below.
            },
        }

        os.environ["GOOGLE_API_KEY"] = api_key

    return base_config
```
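The `chunker` settings decide how each document is cut up before embedding. The following is a simplified, hypothetical `chunk_text` function that illustrates what `chunk_size`, `chunk_overlap`, and `"length_function": "len"` mean using a plain fixed-size character window. It is not Embedchain's actual chunker, which may prefer to break at natural boundaries such as paragraphs.

```python
def chunk_text(text: str, chunk_size: int = 2000, chunk_overlap: int = 0) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters. Length is measured
    with len(), mirroring "length_function": "len" in the config.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end of the text
            break
    return chunks


sample = "x" * 5000
print([len(c) for c in chunk_text(sample)])  # [2000, 2000, 1000]
print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij']
```

With `chunk_overlap` set to 0 as in this notebook, consecutive chunks share no text; a positive overlap repeats the tail of each chunk at the start of the next, which can help when an answer straddles a chunk boundary.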


In [12]:
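```python
# For OpenAI instead: config = create_model_config(ModelProvider.OpenAI, "sk-xxx")
# Read the key from the environment rather than hard-coding it in the notebook.
api_key = os.environ["GOOGLE_API_KEY"]
config = create_model_config(ModelProvider.Google, api_key)
app = App.from_config(config=config)
```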
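A key pasted into a notebook cell is easy to leak when the notebook is shared or committed. A minimal sketch, using a hypothetical `load_api_key` helper, that reads the key from an environment variable and fails with a clear message when it is missing:

```python
import os


def load_api_key(env_var: str) -> str:
    """Hypothetical helper: return the API key stored in env_var, or raise if it is unset/empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; export it in your shell before starting Jupyter"
        )
    return key


# Usage in the cell above (assumes the key was exported beforehand):
# api_key = load_api_key("GOOGLE_API_KEY")
# config = create_model_config(ModelProvider.Google, api_key)
```

Failing early with a named variable makes a missing key obvious, instead of surfacing later as an authentication error from the provider.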

Now, add data sources using Embedchain's `.add()` method:

In [13]:
```python
# app.add("https://en.wikipedia.org/wiki/Elon_Musk")
# app.add("https://www.forbes.com/profile/elon-musk")

app.add("./TrainingData/block.md")
```

```
Inserting batches in chromadb: 100%|██████████| 1/1 [00:02<00:00,  2.12s/it]

'ef2349f9e65b65c1483dfb9a6827e811'
```
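When `.query()` is called, a RAG app conceptually embeds the question, retrieves the stored chunks whose vectors are most similar to it (from Chroma in this notebook), and passes them to the LLM as context. The sketch below illustrates only that retrieval step, using cosine similarity over hand-made 3-dimensional vectors that stand in for real embeddings; the helper names `top_k` and `build_prompt` and the prompt wording are hypothetical, not Embedchain's internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k stored chunks most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, contexts: list[str]) -> str:
    """Combine retrieved chunks and the question into a single LLM prompt."""
    context_block = "\n\n".join(contexts)
    return f"Use the following context to answer the question.\n\n{context_block}\n\nQuestion: {question}"


# Toy (text, vector) pairs standing in for embedded chunks in the vector store.
store = [
    ("A block header is 80 bytes.", [0.9, 0.1, 0.0]),
    ("The magic number identifies the network.", [0.7, 0.3, 0.1]),
    ("Chroma is a vector database.", [0.0, 0.2, 0.9]),
]
question_vec = [1.0, 0.2, 0.0]  # stand-in for the embedded question
contexts = top_k(question_vec, store, k=2)
print(build_prompt("What is a block in Bitcoin?", contexts))
```

The two Bitcoin-related chunks score highest and end up in the prompt, while the unrelated chunk is left out; this is why the answer can only be as good as the data passed to `.add()`.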

Your bot is now ready. Ask it questions using the `.query()` method:

In [14]:
```python
print(app.query("What is a block in Bitcoin with diagram?"))
```

```
+--------------------+
|---Magic-Number-----| *--4--bytes
+--------------------+
|-----Blocksize------| *--4--bytes
+--------------------+
|----Block-Header----| *--80--bytes
+--------------------+
|----Txn-counter-----| *--VarInt(1-9-bytes)
+--------------------+
|---Transactions-----| *---transaction--list
+--------------------+

```

* **Magic Number:** This serves as the network identifier.
* **Block Size:** This is the size of the block in bytes.
* **Block Header:** This is used to identify a particular block on the Bitcoin network.
* **Transaction Counter:** This is the number of Bitcoin transactions included in a block.
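To make the layout in the diagram concrete, here is a minimal sketch that splits a serialized record into those fields. It assumes the record format used by Bitcoin Core's `blk*.dat` files (a 4-byte magic number and 4-byte size, both little-endian, followed by the 80-byte header and the transaction count as a CompactSize VarInt); the helper names are hypothetical, the header is a dummy all-zero placeholder rather than a real block, and the transaction list itself is left unparsed.

```python
import struct

MAINNET_MAGIC = 0xD9B4BEF9  # serialized little-endian as bytes f9 be b4 d9


def read_compact_size(data: bytes, offset: int) -> tuple[int, int]:
    """Decode a Bitcoin VarInt (CompactSize) at offset; return (value, next_offset)."""
    prefix = data[offset]
    if prefix < 0xFD:  # values below 0xFD fit in the prefix byte itself
        return prefix, offset + 1
    if prefix == 0xFD:  # next 2 bytes, little-endian uint16
        return struct.unpack_from("<H", data, offset + 1)[0], offset + 3
    if prefix == 0xFE:  # next 4 bytes, little-endian uint32
        return struct.unpack_from("<I", data, offset + 1)[0], offset + 5
    return struct.unpack_from("<Q", data, offset + 1)[0], offset + 9  # 0xFF: uint64


def parse_block_prefix(data: bytes) -> dict:
    """Split a record into magic number, block size, header, and transaction count."""
    magic, block_size = struct.unpack_from("<II", data, 0)  # 4 + 4 bytes
    header = data[8:88]  # 80-byte block header
    tx_count, tx_start = read_compact_size(data, 88)
    return {
        "magic": magic,
        "block_size": block_size,
        "header": header,
        "tx_count": tx_count,
        "tx_start": tx_start,  # offset where the transaction list begins
    }


# Synthetic record: dummy header plus a CompactSize of 300 (0xFD, then 0x012C little-endian).
body = bytes(80) + bytes([0xFD, 0x2C, 0x01])
record = struct.pack("<II", MAINNET_MAGIC, len(body)) + body
fields = parse_block_prefix(record)
print(hex(fields["magic"]), fields["block_size"], fields["tx_count"])  # 0xd9b4bef9 83 300
```

The VarInt encoding is why the diagram lists the transaction counter as 1 to 9 bytes: small counts cost a single byte, and larger ones add a prefix byte plus 2, 4, or 8 bytes.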



## Resources

* https://github.com/embedchain/embedchain
* https://medium.com/@WamiqRaza/how-to-create-virtual-environment-jupyter-kernel-python-6836b50f4bf4
* https://ai.google.dev/gemini-api/docs/models/gemini
* https://github.com/dzstudio/similar-text/tree/master
* https://firebase.google.com/docs/functions/get-started?gen=2nd#python
* https://medium.com/msackiit/what-is-text-similarity-and-how-to-implement-it-c74c8b641883
* https://stackoverflow.com/questions/57882417/is-it-possible-to-use-google-bert-to-calculate-similarity-between-two-textual-do