# Exploring Embedchain, a powerful LangChain wrapper for building chatbots over any dataset
### Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings, and storing them in a vector database (see the manual at https://docs.embedchain.ai for more details)
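To make the four stages concrete, here is a toy, dependency-free sketch of load, chunk, embed, and store. Everything in it is an assumption for illustration: the `load`, `chunk`, `embed`, and `InMemoryVectorStore` names are hypothetical, the "embedding" is a bag-of-words counter rather than a dense vector from an embedding model, and the store is a plain list standing in for ChromaDB. It is not Embedchain's implementation.

```python
from collections import Counter


def load(raw: str) -> str:
    # Stand-in loader: normalize whitespace. Real loaders extract text from PDFs, web pages, transcripts.
    return " ".join(raw.split())


def chunk(text: str, size: int = 8) -> list[str]:
    # Fixed-size, non-overlapping windows of `size` words.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    # Toy "embedding": term frequencies. Real embedders return dense float vectors.
    return Counter(word.lower().strip(".,?!") for word in text.split())


class InMemoryVectorStore:
    """Minimal stand-in for a vector database such as ChromaDB."""

    def __init__(self) -> None:
        self.records: list[tuple[str, Counter]] = []

    def add(self, text: str, vector: Counter) -> None:
        self.records.append((text, vector))

    def count(self) -> int:
        return len(self.records)


def ingest(raw: str, store: InMemoryVectorStore) -> int:
    """Load -> chunk -> embed -> store; returns the number of new chunks."""
    chunks = chunk(load(raw))
    for c in chunks:
        store.add(c, embed(c))
    return len(chunks)


store = InMemoryVectorStore()
doc = ("WizardLM is a language model fine-tuned on instructions generated by Evol-Instruct. "
       "It is compared against Alpaca and Vicuna.")
print(ingest(doc, store), store.count())  # 3 3
```

The `count()` method plays the same role as `wizard_bot.db.count()` in the cells below: it reports how many chunks are stored, not how many documents.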

## Default configuration
* LLM: OpenAI (gpt-3.5-turbo)
* Embedder: OpenAI (text-embedding-ada-002)
* Database: ChromaDB

In [1]:
from embedchain import App

wizard_bot = App()

In [2]:
wizard_bot.db.count()

173

In [5]:
wizard_bot.query("What is WizardLM?")

"I'm sorry, but I don't have enough information to answer your query."

In [6]:
wizard_bot.add('https://arxiv.org/pdf/2304.12244.pdf')

Successfully saved https://arxiv.org/pdf/2304.12244.pdf (DataType.PDF_FILE). New chunks count: 133


'e37d9bcb93126c400b1b3035d7442856'
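The log reports `New chunks count: 133` for the paper. How many chunks a document yields depends on the splitter's window size and overlap, neither of which is shown in this notebook. As a hedged illustration, the sketch below uses a hypothetical sliding-window splitter that counts words (real splitters usually count characters or tokens) with made-up parameters, and checks its output against the closed-form count ceil((n - overlap) / (size - overlap)).

```python
import math


def split_with_overlap(words: list[str], size: int, overlap: int) -> list[list[str]]:
    """Sliding-window splitter: windows of `size` words; neighbours share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(words[start:start + size])
        if start + size >= len(words):  # this window already reaches the end of the text
            return chunks
        start += step


def expected_chunk_count(n_words: int, size: int, overlap: int) -> int:
    """Closed form for the splitter above: ceil((n - overlap) / (size - overlap)), minimum 1."""
    return max(1, math.ceil((n_words - overlap) / (size - overlap)))


words = [f"w{i}" for i in range(1000)]
print(len(split_with_overlap(words, 100, 20)), expected_chunk_count(1000, 100, 20))  # 13 13
print(len(split_with_overlap(words, 100, 0)), expected_chunk_count(1000, 100, 0))    # 10 10
```

Adding overlap raises the chunk count (13 instead of 10 here) but helps keep sentences that straddle a boundary retrievable from at least one chunk.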

In [7]:
wizard_bot.db.count()

133

In [8]:
wizard_bot.query("What is WizardLM?")

'WizardLM is a language model that has been evaluated and compared to other models such as Alpaca and Vicuna. It has been found to achieve better response quality than these models on the automatic evaluation of GPT-4. Additionally, labelers prefer the outputs of WizardLM over outputs from ChatGPT under complex test instructions. While WizardLM performs worse than ChatGPT on the OnEvol-Instruct test set, it outperforms ChatGPT in the high-difficulty section of the Evol-Instruct test set. This suggests that WizardLM has the ability to handle complex instructions effectively.'

In [9]:
wizard_bot.query("How to use WizardLM?")

'The given context does not provide information on how to use WizardLM.'
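Answers like this one reflect retrieval-augmented generation: the LLM is asked to answer only from the chunks retrieved from the vector database, and to say so when they do not contain the answer. A minimal sketch of how such a prompt might be assembled follows; the `build_prompt` helper and the template wording are hypothetical, not Embedchain's actual template.

```python
def build_prompt(question: str, contexts: list[str]) -> str:
    """Hypothetical RAG prompt: retrieved chunks, the question, and an instruction to admit gaps."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return (
        "Use the following pieces of context to answer the query at the end.\n"
        "If the answer is not in the context, say that the context does not provide it.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Query: {question}\n"
        "Helpful Answer:"
    )


prompt = build_prompt(
    "How to use WizardLM?",
    ["WizardLM achieves better response quality than Alpaca and Vicuna on GPT-4 evaluation."],
)
print(prompt)
```

Because the only context here is about evaluation results, a model following this instruction would reply that the context does not explain usage, which matches the behaviour seen above.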

In [11]:
wizard_bot.add('https://www.youtube.com/watch?v=I6sER-qivYk')

Doc content has not changed. Skipping creating chunks and embeddings


'd785b93aa2aa5ed5f491cfb44ff625c6'
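The "Doc content has not changed" message and the returned identifiers suggest that `add()` derives a deterministic ID and skips re-embedding content it has already stored. The IDs are 32 hexadecimal characters, the length of an MD5 digest; whether Embedchain hashes the URL, the content, or both is an assumption here. The hypothetical `DedupingIngestor` below sketches the idea by hashing source and content together.

```python
import hashlib


class DedupingIngestor:
    """Hypothetical ingestor that skips chunking/embedding for content it has already seen."""

    def __init__(self) -> None:
        self.known_ids: set[str] = set()

    def add(self, source: str, content: str) -> tuple[str, bool]:
        # Deterministic ID from source + content (MD5 -> 32 hex characters).
        doc_id = hashlib.md5((source + content).encode("utf-8")).hexdigest()
        if doc_id in self.known_ids:
            return doc_id, False  # unchanged: skip chunking and embedding
        self.known_ids.add(doc_id)
        # ... chunking, embedding, and storing would happen here ...
        return doc_id, True


ingestor = DedupingIngestor()
first = ingestor.add("https://example.com/video", "transcript v1")
again = ingestor.add("https://example.com/video", "transcript v1")
edited = ingestor.add("https://example.com/video", "transcript v2")
print(first[1], again[1], edited[1])  # True False True
```

With this scheme, re-running an `add()` cell for an unchanged source is cheap and does not duplicate chunks, while an edited source gets a new ID.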

In [12]:
wizard_bot.db.count()

140

In [13]:
wizard_bot.query("What is WizardLM?")

"WizardLM is a new project that aims to enhance large language models, such as OpenAI's chat GPT, by improving their ability to follow complex instructions. It uses a method called evolve instruct to generate instruction data and fine-tune the language model. The goal of WizardLM is to overcome the challenge of creating large amounts of instruction data with varying levels of complexity, which can be time-consuming and labor-intensive for humans."

In [14]:
wizard_bot.query("How to use WizardLM?")

'The video does not provide specific instructions on how to use WizardLM.'

In [15]:
wizard_bot.query("What are the specific advantages of WizardLM mentioned in its YouTube video?")

"Specific advantages of WizardLM mentioned in its YouTube video are:\n\n1. Enhancing large language models: WizardLM aims to enhance large language models by improving their ability to follow complex instructions.\n\n2. Generating instruction data: WizardLM uses LMs themselves to generate instruction data, which helps in fine-tuning the LM and improving its performance.\n\n3. Overcoming time-consuming and labor-intensive process: Creating large amounts of instruction data with varying levels of complexity can be time-consuming and labor-intensive for humans. WizardLM's approach of using LMs to generate instruction data helps overcome this challenge.\n\n4. Contextual generation: WizardLM allows for contextual generation, which means it can generate language that is relevant and appropriate to the given context.\n\nThese are some of the specific advantages mentioned in the YouTube video about WizardLM."

In [17]:
wizard_bot.query('How many parameters have been used for training? What versions are available for download?')

'The given context does not provide any information about the number of parameters used for training or the available versions for download.'

In [18]:
wizard_bot.chat("Please prepare a 200-word summary of the video transcript")

'The video transcript provides information about a research project focused on fine-tuning large language models. The project is not currently available for commercial use, but the team is continuously working towards achieving that goal in the future. The speaker encourages viewers to keep an eye on this revolutionary project. The video aims to inform and educate viewers about the potential impact of fine-tuning different large language models.\n\nIn the 200-word summary, the speaker emphasizes that the information shared is for research purposes only and cannot be used commercially at the moment. However, the project is expected to bring significant advancements in fine-tuning large language models. The speaker encourages viewers to stay updated on the progress of the project, as it has the potential to revolutionize the field. The video concludes with a reminder to subscribe and turn on notifications for future content. Overall, the video provides an informative overview of the rese
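This cell switches from `query` to `chat`. Judging from the later follow-up questions that build on earlier answers, `chat` carries previous turns into each new request, while `query` answers each question in isolation. The hypothetical `ToyBot` below sketches that difference with a fake LLM that only reports how many user turns it was shown; the class, the `max_turns` window, and the prompt layout are assumptions, not Embedchain's memory implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ToyBot:
    """Hypothetical bot: `query` is stateless, `chat` prepends recent turns to each prompt."""
    llm: Callable[[str], str]
    history: list[tuple[str, str]] = field(default_factory=list)
    max_turns: int = 5

    def query(self, question: str) -> str:
        return self.llm(f"Query: {question}")

    def chat(self, message: str) -> str:
        recent = self.history[-self.max_turns:] if self.max_turns > 0 else []
        past = "\n".join(f"User: {q}\nBot: {a}" for q, a in recent)
        answer = self.llm(f"{past}\nUser: {message}".lstrip())
        self.history.append((message, answer))
        return answer


def echo_llm(prompt: str) -> str:
    # Stand-in for an LLM call: report how many user turns the prompt contains.
    return f"saw {prompt.count('User:')} user turn(s)"


bot = ToyBot(llm=echo_llm)
print(bot.chat("How to reset a Brother printer counter?"))  # saw 1 user turn(s)
print(bot.chat("Explain it shorter"))                       # saw 2 user turn(s)
print(bot.query("Explain it shorter"))                      # saw 0 user turn(s)
```

The flip side of memory is visible at the end of this notebook: once earlier turns are in the prompt, the model can lean on them (or on the user's own claims) as well as on retrieved chunks.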

In [20]:
wizard_bot.add('https://www.youtube.com/watch?v=SaJ8wyKMBds')

Successfully saved https://www.youtube.com/watch?v=SaJ8wyKMBds (DataType.YOUTUBE_VIDEO). New chunks count: 8


'ac856417ec1defdbb059d4f3b73461d8'

In [21]:
wizard_bot.db.count()

148

In [24]:
wizard_bot.chat("Please prepare a 200-word summary about WizardLM. Please do not repeat yourself.")

"The video transcript provides an in-depth look at Wizard LM, a new enemy model in the local NM Arena. Wizard LM is touted as one of the best, if not the best, local NNA model currently available. The model is a small 7 billion parameters model that was trained using a technique called evil instruct. This technique utilizes LM's to automatically generate complex instructor options, improving the model's performance. According to the video, Wizard LM has the potential to outperform Chan GPT in certain scenarios. The speaker highlights that Wizard LM is a powerful model capable of following complex instructions, as reported by many online users and the speaker's own off-camera testing.\n\nThe video aims to compare the results of Wizard LM with Vicuna, another 7 billion parameters model, to determine which one is currently the best. The comparison is expected to yield interesting results, given the fair nature of the comparison. The speaker encourages viewers to try out the Wizard LM mode

In [29]:
wizard_bot.query('What was mentioned about the 7-billion-parameter model variant? Are any other variants available?')

'In the video transcript, it is mentioned that Wizard LM is a small 7 billion parameters model. The speaker compares Wizard LM with another 7 billion parameters model called Vicuna to determine which one is currently the best. However, it is not mentioned if there are any other variants of 7 billion parameters models available.'

In [32]:
wizard_bot.chat("Can I use it on a CPU only, or is a GPU essential?")

'The provided context does not mention whether the WizardLM project can be used on CPU only or if GPU use is essential.'

In [33]:
wizard_bot.add('https://github.com/nlpxucan/WizardLM')

Successfully saved https://github.com/nlpxucan/WizardLM (DataType.WEB_PAGE). New chunks count: 25


'a96e81c62557447f876c7d86bb1b4e4b'

In [34]:
wizard_bot.db.count()

173
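The `db.count()` totals recorded after each `add()` can be checked against the "New chunks count" values in the logs. The small tally below copies those numbers from the outputs above; the data structure is just for illustration.

```python
# Totals returned by wizard_bot.db.count(), in notebook order (copied from the outputs above).
totals = [
    ("after the WizardLM paper", 133),
    ("after the first YouTube video", 140),
    ("after the second YouTube video", 148),
    ("after the GitHub page", 173),
]
# "New chunks count" printed by add(); None where the log reported a skip instead.
reported = [133, None, 8, 25]

deltas = [totals[0][1]] + [cur - prev for (_, prev), (_, cur) in zip(totals, totals[1:])]
for (label, total), delta, rep in zip(totals, deltas, reported):
    print(f"{label}: total {total}, delta {delta}, reported {rep}")
```

The deltas match the reported counts except for the first video: its `add()` call logged a skip, yet the total rose by 7. The missing execution count `In [10]` suggests that an earlier, unshown run of the same cell stored those 7 chunks, and the re-run then found the content unchanged. That is an inference from the cell numbering, not something the notebook shows directly.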

In [35]:
wizard_bot.chat("Can I use it on a CPU only, or is a GPU essential?")

'The provided context does not mention whether the WizardLM project can be used on CPU only or if GPU use is essential.'

In [37]:
wizard_bot.query('What was mentioned about the 7B model variant? Are any other variants available on the web page?')

'The provided context does not mention any specific details about the 7B model variant. It only mentions the release of the WizardLM-70B-V1.0 model. There is no information provided about any other variants of the model on the web page.'

In [43]:
wizard_bot.chat("When was WizardLM-70B-V1.0 released? What does the comment next to the model name say?")

'The provided context does not mention a specific release date for the WizardLM-70B-V1.0 model. Additionally, there is no comment mentioned next to the model name in the given context.'
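Even after the GitHub page was added, the bot could not find details that may well be on it. One plausible explanation (not confirmed by this notebook) is that the LLM only sees a small number of top-ranked chunks, so a chunk that scores low for the question never reaches the prompt. The sketch below shows top-k retrieval by cosine similarity over toy bag-of-words vectors; the three chunks are made up, and the `embed`, `cosine`, and `top_k` helpers are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; real embedders (e.g. text-embedding-ada-002) return dense vectors.
    return Counter(w.lower().strip(".,?!:()") for w in text.split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_k(question: str, chunks: list[str], k: int) -> list[str]:
    # Rank every stored chunk by similarity to the question and keep only the best k.
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


chunks = [
    "news: the model was released",     # hypothetical announcement text
    "table: v1.0 | 2023 | license",     # hypothetical table row holding the date
    "install the requirements",         # hypothetical setup instructions
]
question = "When was the model released?"
print(top_k(question, chunks, 2))
# ['news: the model was released', 'install the requirements']
```

The table row, which is the chunk that actually holds the date in this toy example, shares no terms with the question and is never retrieved. Dense embeddings match meaning better than word overlap, but the same cut-off effect applies.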

In [6]:
wizard_bot.add("/home/drphyl/Work_python/GPT-local/working-with-LLMs/Input/brother_printer_reset-counter.pdf", "pdf_file")

Successfully saved /home/drphyl/Work_python/GPT-local/working-with-LLMs/Input/brother_printer_reset-counter.pdf (DataType.PDF_FILE). New chunks count: 3


'0960f85c1fac7c1138db10c400c9cbe9'
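Here the data type is passed explicitly (`"pdf_file"`), whereas the URL sources earlier were recognized on their own, as the `DataType.PDF_FILE`, `DataType.YOUTUBE_VIDEO`, and `DataType.WEB_PAGE` values in the logs show. The hypothetical heuristic below sketches how a source string might be mapped to a loader type; the enum mirrors the names printed in the logs, but its string values other than `"pdf_file"` and the detection rules are assumptions, not Embedchain's detection code.

```python
from enum import Enum
from urllib.parse import urlparse


class DataType(Enum):
    # Names mirror the DataType values printed in this notebook's logs (values are assumptions).
    PDF_FILE = "pdf_file"
    YOUTUBE_VIDEO = "youtube_video"
    WEB_PAGE = "web_page"


def detect_data_type(source: str) -> DataType:
    """Hypothetical heuristic: guess the loader type from a URL or file path."""
    parsed = urlparse(source)
    host = parsed.netloc.lower()
    if host in ("youtube.com", "youtu.be") or host.endswith(".youtube.com"):
        return DataType.YOUTUBE_VIDEO
    if parsed.path.lower().endswith(".pdf"):
        return DataType.PDF_FILE  # works for URLs and for local paths like /tmp/manual.pdf
    if parsed.scheme in ("http", "https"):
        return DataType.WEB_PAGE
    raise ValueError(f"Cannot infer data type for {source!r}; pass it explicitly")


for src in ["https://arxiv.org/pdf/2304.12244.pdf",
            "https://www.youtube.com/watch?v=SaJ8wyKMBds",
            "https://github.com/nlpxucan/WizardLM"]:
    print(src, detect_data_type(src).name)
```

Passing the type explicitly, as the cell above does, sidesteps any guessing for sources that a heuristic like this would reject or misclassify.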

In [7]:
wizard_bot.chat("How to reset a Brother printer counter?")

'To reset the counter on a Brother printer, you need to put the printer in maintenance mode. The specific method for entering maintenance mode may vary depending on the model of your printer. However, for most modern Brother printers, you can try the following steps:\n\n1. Ensure that the printer is turned on and ready to print.\n2. Remove the power cable from the printer or turn off the printer if it has a fixed power source.\n3. Press and hold the "Menu" or "Set" button on the printer.\n4. While holding the "Menu" or "Set" button, reconnect the power cable or turn on the printer.\n5. Continue holding the "Menu" or "Set" button until the maintenance mode message appears on the printer\'s display.\n6. Once in maintenance mode, use the numerical pad or arrow keys to navigate through the menu options.\n7. Look for an option related to resetting the purge counter. The specific wording may vary, but it should be something like "Reset Purge," "Purge Count Reset," or "Clear Counter."\n8. Sel

In [11]:
wizard_bot.chat('Explain it more briefly, in 50 words. What code should I enter?')

'To reset a Brother printer counter, enter maintenance mode and navigate through the menu options to find the code for resetting the counter. The specific code may vary depending on the printer model.'

In [12]:
wizard_bot.chat('I believe you must enter 4 digits to reset the PURGE counter to zero.')

'Yes, that is correct. In step 2 of the process, you need to enter the code "2783" using the numerical keypad to zero the numbers after "PURGE".'

In [13]:
wizard_bot.chat('How do I enter maintenance mode?')

'To enter maintenance mode on a Brother printer, hold down the MENU/SET/START button and quickly press * 2 8 6 4.'