<a href="https://colab.research.google.com/github/oluwafemidiakhoa/Mindserach/blob/master/AI_Researcher.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%%capture

%pip install langchain langchain-community
%pip install langchainhub
%pip install langchain-chroma
%pip install langchain-groq
%pip install langchain-huggingface
%pip install gradio

In [None]:
import os
import requests
import xml.etree.ElementTree as ET

from groq import Groq
from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_groq import ChatGroq  # Groq-hosted chat models exposed as a LangChain chat model
import gradio as gr




In [None]:
from google.colab import userdata

groq_api_key = userdata.get('GROQ_API_KEY')

# Direct Groq SDK client (optional: the RAG chain below reaches Groq through ChatGroq instead)
client = Groq(api_key=groq_api_key)

In [None]:
import time

# Create a directory to store the abstracts
os.makedirs('ai_papers', exist_ok=True)

# List of arXiv paper IDs to download abstracts for
paper_ids = [
    '2301.10045',
    '2301.10046',
    '2301.10047',
    '2301.10048',
    '2301.10049',
    # Add more paper IDs as needed
]

ATOM_NS = '{http://www.w3.org/2005/Atom}'

# Fetch each abstract from the arXiv API and save it as ai_papers/<id>.txt
def download_abstracts(paper_ids):
    for paper_id in paper_ids:
        response = requests.get(
            'https://export.arxiv.org/api/query',
            params={'id_list': paper_id},
            timeout=30,
        )
        if response.status_code == 200:
            # Extract the abstract from the Atom XML feed
            root = ET.fromstring(response.content)
            summary = root.find(f'.//{ATOM_NS}summary')
            if summary is not None and summary.text:
                abstract = summary.text.strip()
                # Save the abstract to a UTF-8 text file
                with open(f'ai_papers/{paper_id}.txt', 'w', encoding='utf-8') as f:
                    f.write(abstract)
                print(f'Downloaded abstract for paper {paper_id}')
            else:
                print(f'Abstract not found for paper {paper_id}')
        else:
            print(f'Failed to download paper {paper_id} (HTTP {response.status_code})')
        # arXiv's API guidelines ask clients to wait about 3 seconds between requests
        time.sleep(3)

download_abstracts(paper_ids)


Downloaded abstract for paper 2301.10045
Downloaded abstract for paper 2301.10046
Downloaded abstract for paper 2301.10047
Downloaded abstract for paper 2301.10048
Downloaded abstract for paper 2301.10049
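The cell above sends one request per paper. The arXiv API also accepts a comma-separated `id_list`, so the abstracts could be fetched in a single call. The sketch below builds such a URL and parses a multi-entry Atom feed using only the standard library. The helpers `build_query_url` and `parse_abstracts` are hypothetical names, and `sample_feed` is a hand-written stand-in rather than real arXiv output.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"
API_URL = "https://export.arxiv.org/api/query"


def build_query_url(paper_ids):
    """Build a single arXiv API URL that requests every ID at once (hypothetical helper)."""
    params = {
        "id_list": ",".join(paper_ids),
        # Results are paged (10 per page by default), so request one slot per ID.
        "max_results": len(paper_ids),
    }
    return f"{API_URL}?{urlencode(params)}"


def parse_abstracts(feed_xml):
    """Map each bare arXiv ID (version suffix removed) to its abstract text."""
    root = ET.fromstring(feed_xml)
    abstracts = {}
    for entry in root.findall(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id", default="")
        summary = entry.findtext(f"{ATOM}summary", default="")
        if "/abs/" not in entry_id or not summary.strip():
            continue  # skip entries that are not paper records
        # e.g. http://arxiv.org/abs/2301.10045v1 -> 2301.10045
        bare_id = re.sub(r"v\d+$", "", entry_id.rsplit("/abs/", 1)[1])
        # Collapse the line breaks and indentation left inside abstracts.
        abstracts[bare_id] = " ".join(summary.split())
    return abstracts


# Hand-written sample feed (not real arXiv output) with two entries.
sample_feed = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/2301.10045v1</id>
    <summary>  First abstract
      text.  </summary>
  </entry>
  <entry>
    <id>http://arxiv.org/abs/2301.10046v2</id>
    <summary>Second abstract.</summary>
  </entry>
</feed>"""

print(build_query_url(["2301.10045", "2301.10046"]))
print(parse_abstracts(sample_feed))
```

In the notebook, `requests.get(build_query_url(paper_ids), timeout=30)` followed by `parse_abstracts(response.content)` could stand in for the per-ID loop; `ET.fromstring` accepts the raw bytes, and writing each abstract to `ai_papers/<id>.txt` would stay the same.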


In [None]:
loader = DirectoryLoader('ai_papers', glob='*.txt', loader_cls=TextLoader)
data = loader.load()
print(f'Loaded {len(data)} documents.')


Loaded 5 documents.
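`DirectoryLoader` with `glob='*.txt'` and `loader_cls=TextLoader` turns every matching file into one `Document`, with the file path stored under `metadata['source']`. The standard-library sketch below mirrors that data shape with plain dicts so it is visible without LangChain; `load_text_files` is a hypothetical helper, not part of LangChain's API, and the files are sorted here only to make the order deterministic.

```python
import tempfile
from pathlib import Path


def load_text_files(directory, pattern="*.txt", encoding="utf-8"):
    """Return one {'page_content', 'metadata'} record per matching file (hypothetical helper)."""
    docs = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():
            docs.append({
                "page_content": path.read_text(encoding=encoding),
                "metadata": {"source": str(path)},
            })
    return docs


# Usage: write a few small files to a temporary directory and load them back.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("first abstract", encoding="utf-8")
    Path(tmp, "b.txt").write_text("second abstract", encoding="utf-8")
    Path(tmp, "notes.md").write_text("ignored: does not match *.txt", encoding="utf-8")
    docs = load_text_files(tmp)

print(f"Loaded {len(docs)} documents.")  # Loaded 2 documents.
```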


In [None]:
embed_model = HuggingFaceEmbeddings(model_name='sentence-transformers/all-MiniLM-L6-v2')




In [None]:
vectorstore = Chroma.from_documents(
    documents=data,
    embedding=embed_model,
    persist_directory='/content/ai_papers_vectorstore',
)
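Because the store is persisted to `/content/ai_papers_vectorstore`, re-running this cell can add the same abstracts again under fresh random IDs, which then show up as duplicate retrieval hits. One common remedy, sketched here as an assumption rather than the notebook's method, is to derive a stable ID for each document and pass the list through the `ids=` argument of `Chroma.from_documents`. The helper `stable_ids` is hypothetical; it hashes the source path plus the document's position among documents from the same file, so chunks of one file still get distinct IDs.

```python
import uuid
from collections import defaultdict


def stable_ids(sources):
    """Return one deterministic UUID string per document (hypothetical helper).

    The ID depends on the source path and on how many earlier documents
    shared that path, so identical inputs always give identical IDs.
    """
    seen = defaultdict(int)
    ids = []
    for source in sources:
        key = f"{source}#{seen[source]}"
        seen[source] += 1
        ids.append(str(uuid.uuid5(uuid.NAMESPACE_URL, key)))
    return ids


sources = [
    "ai_papers/2301.10045.txt",
    "ai_papers/2301.10046.txt",
    "ai_papers/2301.10046.txt",  # e.g. a second chunk of the same file
]
first_run = stable_ids(sources)
second_run = stable_ids(sources)
print(first_run == second_run)  # True: IDs are reproducible across runs
print(len(set(first_run)))      # 3: repeated sources still get distinct IDs
```

In the notebook this would look like `ids=stable_ids([d.metadata["source"] for d in data])` inside the `Chroma.from_documents(...)` call. Whether a repeated ID overwrites the stored entry depends on the installed `langchain-chroma`/`chromadb` versions, so that behaviour is worth confirming before relying on it.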



In [None]:
retriever = vectorstore.as_retriever()
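Called without arguments, `as_retriever()` runs a plain similarity search and returns the 4 nearest documents by default. With only five abstracts that is most of the corpus, so something like `vectorstore.as_retriever(search_kwargs={"k": 2})` makes retrieval more selective. The toy sketch below shows the idea behind nearest-neighbour retrieval using hand-made 3-dimensional vectors and cosine similarity. It is illustrative only: Chroma uses an approximate HNSW index with L2 distance by default, and real all-MiniLM-L6-v2 embeddings have 384 dimensions. The document names and vectors are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, doc_vecs, k=4):
    """Return (doc_id, score) pairs for the k documents most similar to the query."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for real sentence embeddings.
doc_vecs = {
    "video-inpainting": [0.9, 0.1, 0.0],
    "graph-networks": [0.1, 0.9, 0.1],
    "speech-synthesis": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

for doc_id, score in top_k(query_vec, doc_vecs, k=2):
    print(f"{doc_id}: {score:.3f}")
```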



In [None]:
template = """You are an AI research assistant.
Use the provided context to answer the question.
If you don't know the answer, say so. Provide a detailed explanation.
Do not mention the context in your answer.

Context: {context}

Question: {question}

Answer:"""

rag_prompt = PromptTemplate.from_template(template)
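`PromptTemplate.from_template` treats the string as an f-string-style template and infers its input variables from the `{...}` placeholders, here `context` and `question`. The standard-library sketch below reproduces that inference with `string.Formatter` and fills the template with `str.format`, which is roughly what rendering the prompt amounts to. It is a simplification, not LangChain's implementation; `template_variables` is a hypothetical helper and the sample context and question are made up.

```python
from string import Formatter

template = """You are an AI research assistant.
Use the provided context to answer the question.
If you don't know the answer, say so. Provide a detailed explanation.
Do not mention the context in your answer.

Context: {context}

Question: {question}

Answer:"""


def template_variables(tmpl):
    """Return the sorted, de-duplicated placeholder names in a format string."""
    return sorted({field for _, field, _, _ in Formatter().parse(tmpl) if field})


print(template_variables(template))  # ['context', 'question']

prompt_text = template.format(
    context="FGT++ adds a lightweight flow completion network.",
    question="What does FGT++ add?",
)
print(prompt_text)
```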


In [None]:
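# Groq retires model IDs from time to time; if this one is rejected, substitute a current model from Groq's model list.
llm = ChatGroq(model="llama-3.1-70b-versatile", api_key=groq_api_key)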


In [None]:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)
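In this chain the dict at the head sends the same input string to the retriever and to `RunnablePassthrough`, so `{context}` receives the retriever's list of `Document` objects. Interpolated directly, that list is rendered through its Python repr, field names and metadata included, which spends tokens on noise. A widely used pattern is to pipe the retriever into a small formatting function, e.g. `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. The sketch below imitates the data flow with stand-ins: `FakeDocument`, `fake_retriever`, and `fake_llm` are hypothetical placeholders for `Document`, the Chroma retriever, and `llm | StrOutputParser()`; only `format_docs` would carry over to the notebook as-is.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Hypothetical stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join retrieved documents into plain text separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


def fake_retriever(question):
    """Hypothetical retriever: always returns the same two abstracts."""
    return [
        FakeDocument("Abstract about video inpainting.", {"source": "ai_papers/a.txt"}),
        FakeDocument("Abstract about graph networks.", {"source": "ai_papers/b.txt"}),
    ]


def fake_llm(prompt):
    """Hypothetical model plus output parser: returns a string about the prompt it saw."""
    return f"(model saw {len(prompt)} characters)"


TEMPLATE = "Context: {context}\n\nQuestion: {question}\n\nAnswer:"


def run_chain(question):
    # The dict step: the same question feeds both the retriever and the passthrough.
    inputs = {"context": format_docs(fake_retriever(question)), "question": question}
    # The prompt step fills the template; the model step returns a plain string.
    prompt = TEMPLATE.format(**inputs)
    return prompt, fake_llm(prompt)


# Without format_docs, the prompt would contain the list's repr instead:
print(str(fake_retriever("q"))[:60])

prompt, answer = run_chain("What is new in transformers?")
print(prompt)
print(answer)
```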



In [None]:
response = rag_chain.invoke("What are the recent advancements in transformer architectures?")
print(response)



There are recent advancements in transformer architectures specifically in the context of video inpainting. One such advancement is the flow-guided transformer (FGT) architecture, which aims to address the issue of query degradation in the multi-head self-attention (MHSA) mechanism. 

The FGT architecture has been further improved to create FGT++, which includes several key advancements. Firstly, a lightweight flow completion network is designed using local aggregation and edge loss. 

Secondly, a flow guidance feature integration module is proposed, which uses motion discrepancy to enhance features, along with a flow-guided feature propagation module that warps features according to the flows. 

Lastly, the transformer is decoupled along the temporal and spatial dimensions, where flows are used to select tokens through a temporally deformable MHSA mechanism, and global tokens are combined with inner-window local tokens through a dual perspective MHSA mechanism. 

These advancements ha

In [None]:
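def rag_memory_stream(text):
    """Stream the RAG chain's answer, yielding the accumulated text so far.

    Despite the name, no conversation history is kept between questions.
    """
    partial_text = ""
    for new_text in rag_chain.stream(text):
        partial_text += new_text
        yield partial_text

title = "AI Research Assistant with Groq API and LangChain"

demo = gr.Interface(
    fn=rag_memory_stream,
    inputs="text",
    outputs="text",
    title=title,
    description="Ask questions about recent AI research papers.",
    allow_flagging="never",  # newer Gradio releases rename this parameter to flagging_mode
)

demo.launch(debug=True)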


Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://35875db43ff8a9334f.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
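When a Gradio `Interface` function is a generator, each yielded value replaces what the output textbox shows, which is why `rag_memory_stream` yields the running total rather than each new chunk. The standard-library sketch below reproduces that accumulation; `fake_stream` is a hypothetical stand-in for `rag_chain.stream`, and `accumulate` is the same loop pulled out of the Gradio callback.

```python
def fake_stream(text):
    """Hypothetical stand-in for rag_chain.stream: yields an answer in chunks."""
    for chunk in ["Flow-guided ", "transformers ", "improve inpainting."]:
        yield chunk


def accumulate(stream):
    """Yield the cumulative text after each chunk, as the Gradio callback does."""
    partial_text = ""
    for new_text in stream:
        partial_text += new_text
        yield partial_text


for snapshot in accumulate(fake_stream("any question")):
    print(repr(snapshot))
```

Yielding only `new_text` instead would make the textbox show just the latest fragment, since each update overwrites the previous one.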
