## Introduction
In this tutorial, you will learn what the OpenAI Assistants API is and how to extend an assistant's capabilities with LangChain.

## Table of contents
>- What is an OpenAI Assistant?
>- Leveraging LangChain to utilize OpenAI Assistant capabilities
>- Combining an OpenAI Assistant with LangChain tools to extend its capabilities

## What is an OpenAI Assistant?
The Assistants API lets you build AI assistants directly into your applications. An assistant follows its instructions and can use models, tools, and knowledge (such as uploaded files) to respond to user queries. At the time of writing, the Assistants API supports three types of tools:
1. Code Interpreter
2. Retrieval
3. Function Calling

For more details, see the [LangChain documentation](https://python.langchain.com/docs/modules/agents/agent_types/openai_assistants) and the [OpenAI Assistants overview](https://platform.openai.com/docs/assistants/overview).
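Each tool is declared as a small dictionary in the assistant's `tools` list. The sketch below builds such a list in plain Python. It assumes the v1 (beta) Assistants API tool types `code_interpreter`, `retrieval`, and `function` (later API versions replaced `retrieval` with `file_search`); the `search_papers` function and its parameters are hypothetical, made up for illustration.

```python
import json


def build_assistant_tools(functions):
    """Return a tools list for an assistant: the two built-in tools plus
    one "function" entry per definition (name, description, JSON Schema)."""
    tools = [
        {"type": "code_interpreter"},  # lets the model run Python in a sandbox
        {"type": "retrieval"},         # lets the model search attached files (v1 API name)
    ]
    for fn in functions:
        tools.append({"type": "function", "function": fn})
    return tools


# Hypothetical function definition; the parameters are described with JSON Schema
search_papers = {
    "name": "search_papers",
    "description": "Search the web for papers similar to a given topic.",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"},
        },
        "required": ["query"],
    },
}

tools = build_assistant_tools([search_papers])
print(json.dumps([t["type"] for t in tools]))
# prints ["code_interpreter", "retrieval", "function"]
```

With Function Calling, the model never executes `search_papers` itself: it returns the function name and JSON-encoded arguments, and your code runs the function and sends the result back.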

## Leveraging LangChain to utilize OpenAI Assistant capabilities
Let's craft our own AI assistant!🚀

### Project info
Let's create a research assistant! `ScholarlySphere` is a Document Analysis and Search assistant.
Users can upload PDF or text files containing documents or research papers. `ScholarlySphere` can then extract key information, such as keywords, topics, or main points, and use search engines to find relevant information to answer user questions based on the content of the uploaded files.


```python
# First, import the packages we need
import os
from dotenv import load_dotenv
from langchain.agents.openai_assistant import OpenAIAssistantRunnable  # LangChain wrapper around an OpenAI Assistant
from langchain.agents import AgentExecutor

# In newer LangChain releases these two live in langchain_community.utilities and langchain_community.tools
from langchain.utilities import DuckDuckGoSearchAPIWrapper
from langchain.tools import DuckDuckGoSearchResults
from openai import OpenAI
```

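```python
# We need an OpenAI API key: load_dotenv() copies OPENAI_API_KEY from a local .env file into os.environ
load_dotenv()
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
```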

```python
# The instructions define how the assistant and its model should behave and respond.
instructions = """
    You are a scholarly expert, you have the skill to extract crucial details from academic papers provided 
    for you and utilize online resources to find similar papers. Your expertise enables you
    to generate comprehensive outputs based on user desires.

"""

# Prompts describing the outputs we want.
# file_prompt: a summary of the uploaded paper
file_prompt = """
    Write a detailed summary of the input paper, synthesizing its key findings, methodologies, and 
    implications, providing overview for reference and comprehension purposes."""


# online_prompt: a JSON list of similar papers found online
online_prompt = """
    Give structured JSON file for similar papers that you found contains: paper_title, authors_names, 
    publication_date, link_of_paper and the conference_or_journal_name where it was published.
"""
```


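```python
# First, instantiate the OpenAI client (it reads OPENAI_API_KEY from the environment)
client = OpenAI()

# Upload the paper with the "assistants" purpose so assistants can use it
with open("attention_is_all_you_need.pdf", "rb") as pdf:
    file = client.files.create(
        file=pdf,
        purpose="assistants"
    )

# Create the assistant, enable the retrieval tool, and attach the uploaded file.
# Note: this uses the v1 (beta) Assistants API; later versions replaced "retrieval" with "file_search".
assistant = client.beta.assistants.create(
    name="Document Analysis and Search assistant",
    instructions=instructions,
    model="gpt-4-1106-preview",
    tools=[{"type": "retrieval"}],  # retrieval lets the assistant search the attached file
    file_ids=[file.id]
)

# Create a thread (a conversation session) to hold the messages
thread = client.beta.threads.create()

# Add the summary request to the thread as a user message that references the file
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content=file_prompt,
    file_ids=[file.id]
)
```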


```python
import time

# Start a run: the assistant processes the thread (and the attached file) and replies
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Runs execute asynchronously, so poll until the run leaves the "queued"/"in_progress" states
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

# List the thread's messages (newest first) and print the raw page
responses = client.beta.threads.messages.list(
    thread_id=thread.id
)

print(responses)
```

```text
SyncCursorPage[ThreadMessage](data=[ThreadMessage(id='msg_Ntd8pntQfbchrtwBxf5YubmV', assistant_id='asst_ilzVq2Zpu4cY9h4x0zA039BP', content=[MessageContentText(text=Text(annotations=[TextAnnotationFileCitation(end_index=1163, file_citation=TextAnnotationFileCitationFileCitation(file_id='file-xztVmJZF96ktWcJQDNaD2pX2', quote='Abstract\n\n\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture the Transformer\nbased solely on attention mechanisms dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe superior in quality while being more parallelizable and requiring significantly\nless time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-\nto-German translation task improving
```

```python
# .data holds the list of ThreadMessage objects on this page
print(responses.data)
```

```text
[ThreadMessage(id='msg_Ntd8pntQfbchrtwBxf5YubmV', assistant_id='asst_ilzVq2Zpu4cY9h4x0zA039BP', content=[MessageContentText(text=Text(annotations=[TextAnnotationFileCitation(end_index=1163, file_citation=TextAnnotationFileCitationFileCitation(file_id='file-xztVmJZF96ktWcJQDNaD2pX2', quote='Abstract\n\n\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture the Transformer\nbased solely on attention mechanisms dispensing with recurrence and convolutions\nentirely. Experiments on two machine translation tasks show these models to\nbe superior in quality while being more parallelizable and requiring significantly\nless time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-\nto-German translation task improving over the existing best results inc
```
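The raw page is hard to read. The sketch below pulls out only the assistant's text and the passages it cited. It assumes the v1 message shape partly visible in the output above: each message has `role` and `content`, text parts have `type == "text"`, a `text.value` string, and `text.annotations`, where file citations carry `file_citation.quote`. To stay runnable offline it uses `SimpleNamespace` stand-ins with illustrative values instead of real API objects; with the real client you would call `assistant_texts(responses.data)`.

```python
from types import SimpleNamespace as NS


def assistant_texts(messages):
    """Collect (text, cited_quotes) pairs from assistant messages.

    Expects objects shaped like Assistants API v1 thread messages:
    message.role, message.content[i].type, .text.value, .text.annotations.
    """
    results = []
    for message in messages:
        if message.role != "assistant":
            continue  # skip the user's own prompts
        for part in message.content:
            if part.type != "text":  # skip non-text parts such as image files
                continue
            quotes = [
                a.file_citation.quote
                for a in part.text.annotations
                if a.type == "file_citation"
            ]
            results.append((part.text.value, quotes))
    return results


# Offline stand-ins mimicking the API objects (illustrative values only)
citation = NS(type="file_citation", file_citation=NS(quote="The dominant sequence transduction models ..."))
reply = NS(
    role="assistant",
    content=[NS(type="text", text=NS(value="Summary of the paper ... [1]", annotations=[citation]))],
)
question = NS(role="user", content=[NS(type="text", text=NS(value="Summarize", annotations=[]))])

for text, quotes in assistant_texts([reply, question]):
    print(text)
    print("cited:", quotes)
```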

```python
# To connect the agent to online resources, we give it a search engine as a tool.
# Create the search tool, restricted to recent results (time="m" means the past month)
wrapper = DuckDuckGoSearchAPIWrapper(time="m")
tools = [DuckDuckGoSearchResults(api_wrapper=wrapper, source="scholar")]

# Wrap the assistant created above as a LangChain agent
agent = OpenAIAssistantRunnable(
    tools=tools,
    assistant_id=assistant.id,
    model="gpt-4-1106-preview",
    as_agent=True
)
```


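```python
# AgentExecutor drives the agent loop: it runs any tool the assistant requests and feeds the result back
agent_executor = AgentExecutor(agent=agent, tools=tools, return_intermediate_steps=True)
output = agent_executor.invoke({"content": online_prompt})
```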
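When the assistant decides to call a function tool, the run pauses with status `requires_action` and lists the requested calls, each a function name plus JSON-encoded arguments. `AgentExecutor` runs the matching LangChain tool, and the assistant runnable submits the output back to the run. The stdlib sketch below shows only that dispatch step; `fake_search` is a hypothetical stand-in for the DuckDuckGo tool, and the registry key `duckduckgo_results_json` is assumed to match the tool's name. Keep in mind that the model can only request functions registered on the assistant or passed for the run; since the assistant above was created with just the retrieval tool, it is worth inspecting `output["intermediate_steps"]` to confirm the search tool was actually called.

```python
import json


def fake_search(query: str) -> str:
    """Hypothetical stand-in for the DuckDuckGo tool; returns canned JSON."""
    return json.dumps([{"title": f"Result for {query}", "link": "https://example.com"}])


# Map tool names (as the model sees them) to Python callables; the name is an assumption
TOOL_REGISTRY = {"duckduckgo_results_json": fake_search}


def dispatch_tool_calls(tool_calls):
    """Run each requested function and build the tool outputs to submit.

    Each call is a dict shaped like
    {"id": ..., "function": {"name": ..., "arguments": "<json string>"}}.
    """
    outputs = []
    for call in tool_calls:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        func = TOOL_REGISTRY.get(name)
        if func is None:
            result = f"Error: unknown tool {name!r}"  # report instead of crashing the run
        else:
            result = func(**args)
        outputs.append({"tool_call_id": call["id"], "output": result})
    return outputs


requested = [{
    "id": "call_1",
    "function": {"name": "duckduckgo_results_json", "arguments": '{"query": "attention transformer"}'},
}]
print(dispatch_tool_calls(requested))
```

With the raw OpenAI client, a list shaped like this is what you would pass as `tool_outputs` to `client.beta.threads.runs.submit_tool_outputs(thread_id=..., run_id=..., tool_outputs=...)`.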

```python
print(output)
```

```text
{'content': '\n    Give structured JSON file for similar papers that you found contains: paper_title, authors_names, \n    publication_date, link_of_paper and the conference_or_journal_name where it was published.\n', 'output': 'Here is the requested information in JSON format for the similar papers:\n\n```json\n[\n    {\n        "paper_title": "End-to-end memory networks",\n        "authors_names": ["Sainbayar Sukhbaatar", "Arthur Szlam", "Jason Weston", "Rob Fergus"],\n        "publication_date": "2015",\n        "link_of_paper": "https://papers.nips.cc/paper/5846-end-to-end-memory-networks",\n        "conference_or_journal_name": "Advances in Neural Information Processing Systems 28 (NIPS 2015)"\n    },\n    {\n        "paper_title": "Sequence to sequence learning with neural networks",\n        "authors_names": ["Ilya Sutskever", "Oriol Vinyals", "Quoc VV Le"],\n        "publication_date": "2014",\n        "link_of_paper": "https://papers.nips.cc/paper/5346-sequence-to-sequence-lea
```
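The agent's answer in `output["output"]` is prose wrapped around a fenced `json` block, so the records have to be extracted before they can be used. A minimal sketch, assuming the reply contains one such block holding a list of objects with the five fields requested in `online_prompt`; the sample reply is illustrative, not real model output.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence, built here to avoid a literal fence in this snippet

REQUIRED_FIELDS = {
    "paper_title",
    "authors_names",
    "publication_date",
    "link_of_paper",
    "conference_or_journal_name",
}

# Captures the body of the first fenced json block in a reply
JSON_BLOCK = re.compile(FENCE + r"json\s*(.*?)\s*" + FENCE, re.DOTALL)


def extract_papers(reply: str) -> list:
    """Parse the JSON list from the reply and check each record's fields."""
    match = JSON_BLOCK.search(reply)
    if match is None:
        raise ValueError("no fenced json block found in the reply")
    papers = json.loads(match.group(1))
    for paper in papers:
        missing = REQUIRED_FIELDS - set(paper)
        if missing:
            raise ValueError(f"record is missing fields: {sorted(missing)}")
    return papers


# Illustrative reply with the same layout as the agent's answer
sample = (
    "Here is the requested information in JSON format:\n\n"
    + FENCE + "json\n"
    '[{"paper_title": "Example", "authors_names": ["A. Author"], '
    '"publication_date": "2015", "link_of_paper": "https://example.com", '
    '"conference_or_journal_name": "Example Conference"}]\n'
    + FENCE
)

for paper in extract_papers(sample):
    print(paper["paper_title"], paper["publication_date"])
```

Usage with the real result would be `extract_papers(output["output"])`. Validating the field names catches replies where the model renamed or dropped a requested key.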