# Project: Build a Document Retriever Search Engine on Wikipedia Data

## Install OpenAI and LangChain dependencies

In [None]:
!pip install langchain==0.3.10
!pip install langchain-openai==0.2.12
!pip install langchain-community==0.3.11
!pip install langchain-huggingface==0.1.2
!pip install jq==1.8.0
!pip install pymupdf==1.25.1


## Install Chroma Vector DB and LangChain wrapper

In [None]:
!pip install langchain-chroma==0.1.4


## Enter OpenAI API Key

In [None]:
from getpass import getpass

OPENAI_KEY = getpass('Enter Open AI API Key: ')

Enter Open AI API Key: ··········


## Set Up Environment Variables

In [None]:
import os

os.environ['OPENAI_API_KEY'] = OPENAI_KEY

### OpenAI Embedding Models

LangChain gives us access to OpenAI's embedding models, including the `text-embedding-3` models: the smaller and highly efficient `text-embedding-3-small`, and the larger and more powerful `text-embedding-3-large`.

In [None]:
from langchain_openai import OpenAIEmbeddings

# details here: https://openai.com/blog/new-embedding-models-and-api-updates
openai_embed_model = OpenAIEmbeddings(model='text-embedding-3-small')
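Embedding models such as `openai_embed_model` map a piece of text to a vector of floats (LangChain's `embed_query` returns one such vector as a list; `text-embedding-3-small` produces 1536 dimensions by default), and retrieval then compares those vectors, commonly with cosine similarity. A minimal sketch of that comparison, assuming toy 3-dimensional vectors whose values are invented for illustration:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors standing in for real embeddings
query_vec = [0.9, 0.1, 0.0]         # e.g. "what is machine learning?"
ml_doc_vec = [0.8, 0.2, 0.1]        # a machine learning article
baseball_doc_vec = [0.0, 0.3, 0.95] # an unrelated baseball article

print(cosine_similarity(query_vec, ml_doc_vec))
print(cosine_similarity(query_vec, baseball_doc_vec))
```

The related article points in nearly the same direction as the query, so it scores much higher than the unrelated one.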

## Loading and Processing the Data

### Get the dataset

In [None]:
# If the download below fails, download the file manually from
# https://drive.google.com/file/d/1aZxZejfteVuofISodUrY2CDoyuPLYDGZ
# and upload it to Colab
!gdown 1aZxZejfteVuofISodUrY2CDoyuPLYDGZ

Downloading...
From: https://drive.google.com/uc?id=1aZxZejfteVuofISodUrY2CDoyuPLYDGZ
To: /content/rag_docs.zip
  100% 5.92M/5.92M [00:00<00:00, 134MB/s]


In [None]:
!unzip rag_docs.zip

Archive:  rag_docs.zip
   creating: rag_docs/
  inflating: rag_docs/attention_paper.pdf  
  inflating: rag_docs/cnn_paper.pdf  
  inflating: rag_docs/resnet_paper.pdf  
  inflating: rag_docs/vision_transformer.pdf  
  inflating: rag_docs/wikidata_rag_demo.jsonl  


### Load JSON Documents from Wikipedia Dump

In [None]:
from langchain_community.document_loaders import JSONLoader

loader = JSONLoader(file_path='./rag_docs/wikidata_rag_demo.jsonl',
                    jq_schema='.',
                    text_content=False,
                    json_lines=True)
wiki_docs = loader.load()

In [None]:
len(wiki_docs)

1801

In [None]:
wiki_docs[1500]

Document(metadata={'source': '/content/rag_docs/wikidata_rag_demo.jsonl', 'seq_num': 1501}, page_content='{"id": "460169", "title": "Janne Persson", "paragraphs": ["Jan Persson (\\"Janne Lucas\\"), born 3 October 1947 in Gothenburg\'s Gamlestad Parish in Gothenburg, Sweden is a Swedish pianist and singer, scoring several chart successes in Sweden during the 1970s and 1980s. Janne Lucas participated at Melodifestivalen 1980 with the song \\"V\\u00e4xeln hall\\u00e5\\", winning the contest. The upcoming year he participated with the song \\"Rocky Mountain\\" ending up third.", "For many years, Janne Lucas also acted as pianist for \\"Vi i femman\\"", "Janne also accompanied the vocal group \\"Noviserna\\" for a while, where Anna-Lisa Cederquist participated."]}')
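The `\u00e4` sequences in the output above are not corruption. With `jq_schema='.'` and `text_content=False`, each JSON Lines record is stored as a JSON string in `page_content`, and the escaping is consistent with serialization by `json.dumps`, which escapes non-ASCII characters by default. A stdlib-only sketch of that round trip, using a shortened, hypothetical version of the record:

```python
import json

# One JSON Lines record shaped like the Wikipedia dump (content shortened, illustrative)
line = '{"id": "460169", "title": "Janne Persson", "paragraphs": ["Växeln hallå", "Rocky Mountain"]}'

record = json.loads(line)
# ensure_ascii=True is the default, so "ä" becomes the escape sequence \u00e4
serialized = json.dumps(record)
print(serialized)

# Parsing the string again (as the processing loop below does) restores the characters
restored = json.loads(serialized)
print(" ".join(restored["paragraphs"]))
```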

In [None]:
import json
from langchain_core.documents import Document

wiki_docs_processed = []

for doc in wiki_docs:
    # each page_content holds one JSON record serialized as a string
    record = json.loads(doc.page_content)
    metadata = {
        "title": record['title'],
        "id": record['id'],
        "source": "Wikipedia"
    }
    # join all paragraphs of the article into a single text
    data = ' '.join(record['paragraphs'])
    wiki_docs_processed.append(Document(page_content=data, metadata=metadata))

In [None]:
wiki_docs_processed[1500]

Document(metadata={'title': 'Janne Persson', 'id': '460169', 'source': 'Wikipedia'}, page_content='Jan Persson ("Janne Lucas"), born 3 October 1947 in Gothenburg\'s Gamlestad Parish in Gothenburg, Sweden is a Swedish pianist and singer, scoring several chart successes in Sweden during the 1970s and 1980s. Janne Lucas participated at Melodifestivalen 1980 with the song "Växeln hallå", winning the contest. The upcoming year he participated with the song "Rocky Mountain" ending up third. For many years, Janne Lucas also acted as pianist for "Vi i femman" Janne also accompanied the vocal group "Noviserna" for a while, where Anna-Lisa Cederquist participated.')

### Create a function to generate contextual summaries for chunks

Here we draw inspiration from Anthropic's [contextual retrieval](https://www.anthropic.com/news/contextual-retrieval) strategy, which involves creating a short contextual summary for each chunk and prepending it to the chunk before storing it in the vector database.

![](https://i.imgur.com/cjnB831.png)

In [None]:
# load a PDF file (one Document per page) with LangChain
from langchain_community.document_loaders import PyMuPDFLoader

loader = PyMuPDFLoader("./rag_docs/attention_paper.pdf")
doc_pages = loader.load()

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=3500,
                                          chunk_overlap=0)
doc_chunks = splitter.split_documents(doc_pages)
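`RecursiveCharacterTextSplitter` measures `chunk_size` in characters by default and tries its separators in order (by default paragraph breaks `\n\n`, then newlines, then spaces, then individual characters). The function below is a simplified, hypothetical re-implementation meant only for intuition: it packs pieces greedily and recurses on oversized pieces, but it ignores `chunk_overlap` and other details of LangChain's actual splitter.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ", "")):
    """Simplified recursive character splitter without overlap (illustrative only)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the first separator that occurs in the text; "" means hard character cuts
    sep, rest = "", ()
    for i, s in enumerate(separators):
        if s == "" or s in text:
            sep, rest = s, separators[i + 1:]
            break
    if sep == "":
        return [text[j:j + chunk_size] for j in range(0, len(text), chunk_size)]

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep packing pieces into the current chunk
            continue
        if current:
            chunks.append(current)  # current chunk is full
        if len(piece) <= chunk_size:
            current = piece
        else:
            # this piece alone is too long: split it with the finer separators
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "para one is here.\n\npara two is a bit longer than the first.\n\nthree"
for chunk in recursive_split(sample, chunk_size=40):
    print(repr(chunk))
```

With `chunk_size=3500` and no overlap, as in the cell above, most paragraphs of a research paper fit whole, so splits tend to fall on paragraph or line boundaries.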

In [None]:
len(doc_chunks)

16

In [None]:
# reassemble the full research paper text from its chunks
big_doc = '\n'.join([doc.page_content for doc in doc_chunks])

In [None]:
len(big_doc.split(' '))

5050

In [None]:
from langchain_openai import ChatOpenAI

chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)

In [None]:
# create a chat prompt
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


def generate_chunk_context(document, chunk):

    chunk_process_prompt = """You are an AI assistant specializing in research paper analysis.
                            Your task is to provide brief, relevant context for a chunk of text
                            based on the following research paper.

                            Here is the research paper:
                            <paper>
                            {paper}
                            </paper>

                            Here is the chunk we want to situate within the whole document:
                            <chunk>
                            {chunk}
                            </chunk>

                            Provide a concise context (3-4 sentences max) for this chunk,
                            considering the following guidelines:

                            - Give a short succinct context to situate this chunk within the overall document
                            for the purposes of improving search retrieval of the chunk.
                            - Answer only with the succinct context and nothing else.
                            - Context should be mentioned like 'Focuses on ....'
                            do not mention 'this chunk or section focuses on...'

                            Context:
                        """

    prompt_template = ChatPromptTemplate.from_template(chunk_process_prompt)

    # LCEL pipeline: fill the prompt -> call the chat model -> return plain text
    agentic_chunk_chain = prompt_template | chatgpt | StrOutputParser()

    context = agentic_chunk_chain.invoke({'paper': document, 'chunk': chunk})

    return context
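The expression `prompt_template | chatgpt | StrOutputParser()` chains three steps so that each step's output becomes the next step's input. The toy `Step` class below is a hypothetical stand-in that mimics this piping with plain Python functions; it is not LangChain's `Runnable` implementation, and `fake_llm` is an invented placeholder for the chat model, so no API call is made.

```python
class Step:
    """Toy stand-in for a pipeable step: wraps a function and supports `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Compose: run self first, then feed its output into `other`
        return Step(lambda x: other.fn(self.fn(x)))

    def invoke(self, x):
        return self.fn(x)


TEMPLATE = "Paper: {paper}\nChunk: {chunk}\nContext:"


def fake_llm(prompt_text):
    # Hypothetical model: echoes the chunk line back as a "context" message
    chunk_line = prompt_text.splitlines()[1]
    return {"content": "Focuses on " + chunk_line.removeprefix("Chunk: ") + "  "}


prompt = Step(lambda variables: TEMPLATE.format(**variables))  # fill placeholders
llm = Step(fake_llm)                                           # "call" the model
parser = Step(lambda message: message["content"].strip())      # keep plain text

chain = prompt | llm | parser
print(chain.invoke({"paper": "Attention Is All You Need", "chunk": "multi-head attention"}))
```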

In [None]:
print(doc_chunks[5].page_content)

output values. These are concatenated and once again projected, resulting in the final values, as
depicted in Figure 2.
Multi-head attention allows the model to jointly attend to information from different representation
subspaces at different positions. With a single attention head, averaging inhibits this.
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^O$$
$$\text{where } \mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$$
Where the projections are parameter matrices $W_i^Q \in \mathbb{R}^{d_{\text{model}} \times d_k}$, $W_i^K \in \mathbb{R}^{d_{\text{model}} \times d_k}$, $W_i^V \in \mathbb{R}^{d_{\text{model}} \times d_v}$
and $W^O \in \mathbb{R}^{h d_v \times d_{\text{model}}}$.
In this work we employ $h = 8$ parallel attention layers, or heads. For each of these we use
$d_k = d_v = d_{\text{model}}/h = 64$. Due to the reduced dimension of each head, the total computational cost
is similar to that of single-head attention with full dimensionality.
3.2.3 Applications of Attention in our Model
The Transformer uses multi-head attention in three different ways:
• In "encoder-decoder attention" layers, the queries come from the previous decoder layer,
and

In [None]:
generate_chunk_context(big_doc, doc_chunks[5].page_content)

'Describes the implementation of multi-head attention in the Transformer model, detailing how multiple attention heads allow the model to capture diverse information from different representation subspaces. It explains the mathematical formulation of multi-head attention and its application in both encoder-decoder and self-attention layers, emphasizing the importance of maintaining the auto-regressive property in the decoder. Additionally, it introduces the position-wise feed-forward networks that complement the attention mechanisms.'

### Load and Process PDF Documents

In [None]:
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

def create_contextual_chunks(file_path):

    print('Loading pages:', file_path)
    loader = PyMuPDFLoader(file_path)
    doc_pages = loader.load()

    print('Chunking pages:', file_path)
    splitter = RecursiveCharacterTextSplitter(chunk_size=3500,
                                              chunk_overlap=0)
    doc_chunks = splitter.split_documents(doc_pages)

    print('Generating contextual chunks:', file_path)
    original_doc = '\n'.join([doc.page_content for doc in doc_chunks])
    contextual_chunks = []
    for chunk in doc_chunks:
        context = generate_chunk_context(original_doc, chunk.page_content)
        contextual_chunks.append(Document(page_content=context+'\n'+chunk.page_content,
                                          metadata=chunk.metadata))
    print('Finished processing:', file_path)
    print()
    return contextual_chunks
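Because `create_contextual_chunks` sends the whole paper along with every chunk, the amount of paper text in the prompts grows with paper length times number of chunks. A back-of-the-envelope sketch using the figures measured above for the attention paper (16 chunks, about 5,050 whitespace-separated words); words are only a rough proxy for the tokens the API actually bills:

```python
def paper_words_sent(paper_words, n_chunks):
    """Total words of full-paper context across all per-chunk LLM calls."""
    # generate_chunk_context receives the entire paper once per chunk
    return paper_words * n_chunks


# Figures printed earlier in this notebook for attention_paper.pdf
print(paper_words_sent(paper_words=5050, n_chunks=16))  # 80800
```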

In [None]:
from glob import glob

pdf_files = glob('./rag_docs/*.pdf')
pdf_files

['./rag_docs/cnn_paper.pdf',
 './rag_docs/attention_paper.pdf',
 './rag_docs/resnet_paper.pdf',
 './rag_docs/vision_transformer.pdf']

In [None]:
paper_docs = []
for fp in pdf_files:
    paper_docs.extend(create_contextual_chunks(fp))

Loading pages: ./rag_docs/cnn_paper.pdf
Chunking pages: ./rag_docs/cnn_paper.pdf
Generating contextual chunks: ./rag_docs/cnn_paper.pdf
Finished processing: ./rag_docs/cnn_paper.pdf

Loading pages: ./rag_docs/attention_paper.pdf
Chunking pages: ./rag_docs/attention_paper.pdf
Generating contextual chunks: ./rag_docs/attention_paper.pdf
Finished processing: ./rag_docs/attention_paper.pdf

Loading pages: ./rag_docs/resnet_paper.pdf
Chunking pages: ./rag_docs/resnet_paper.pdf
Generating contextual chunks: ./rag_docs/resnet_paper.pdf
Finished processing: ./rag_docs/resnet_paper.pdf

Loading pages: ./rag_docs/vision_transformer.pdf
Chunking pages: ./rag_docs/vision_transformer.pdf
Generating contextual chunks: ./rag_docs/vision_transformer.pdf
Finished processing: ./rag_docs/vision_transformer.pdf



In [None]:
len(paper_docs)

79

In [None]:
paper_docs[0]

Document(metadata={'source': './rag_docs/cnn_paper.pdf', 'file_path': './rag_docs/cnn_paper.pdf', 'page': 0, 'total_pages': 11, 'format': 'PDF 1.5', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'creator': 'LaTeX with hyperref package', 'producer': 'pdfTeX-1.40.12', 'creationDate': 'D:20151203014807Z', 'modDate': 'D:20151203014807Z', 'trapped': ''}, page_content='Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. It outlines the foundational concepts of ANNs, their architecture, and the evolution of machine learning techniques, setting the stage for a deeper exploration of CNNs and their applications.\nAn Introduction to Convolutional Neural Networks\nKeiron O’Shea1 and Ryan Nash2\n1 Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB\nkeo7@aber.ac.uk\n2 School of Computing and Communications, Lan

In [None]:
len(wiki_docs_processed)

1801

In [None]:
total_docs = wiki_docs_processed + paper_docs
len(total_docs)

1880

## Vector Databases

One of the most common ways to store and search over unstructured data is to embed it and store the resulting embedding vectors, and then at query time to embed the unstructured query and retrieve the embedding vectors that are 'most similar' to the embedded query. A vector database takes care of storing embedded data and performing vector search for you.
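The embed, store, and search cycle described above can be sketched in a few lines of standard-library Python. This is a toy, hypothetical in-memory store, not how Chroma works internally: `toy_embed` counts words from a small fixed vocabulary instead of calling a real embedding model, and search is an exhaustive cosine-similarity scan rather than an approximate nearest-neighbor index such as the HNSW index Chroma uses.

```python
import math
import re
from collections import Counter


def toy_embed(text, vocab):
    """Hypothetical embedding: word counts over a fixed vocabulary."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[word]) for word in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Keeps (vector, text) pairs and ranks them by cosine similarity to a query."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.items = []

    def add_texts(self, texts):
        # embed once at insertion time and store the vector alongside the text
        for text in texts:
            self.items.append((toy_embed(text, self.vocab), text))

    def search(self, query, k=2):
        # embed the query with the same function, then return the top-k texts
        query_vec = toy_embed(query, self.vocab)
        scored = [(cosine(query_vec, vec), text) for vec, text in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


vocab = ["machine", "learning", "baseball", "league", "neural", "network"]
store = ToyVectorStore(vocab)
store.add_texts([
    "Machine learning builds models from data",
    "Major League Baseball is a baseball league",
    "A neural network is used in deep learning",
])
print(store.search("what is machine learning?", k=2))
```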

### Chroma Vector DB

[Chroma](https://docs.trychroma.com/getting-started) is an AI-native open-source vector database focused on developer productivity and happiness. Chroma is licensed under Apache 2.0.

### Create a Vector DB and persist on disk

Here we create a Chroma collection from our documents, embedding them with `openai_embed_model`. Because we also want to save the collection to disk, we pass `persist_directory`, the directory where Chroma should store the data.

In [None]:
from langchain_chroma import Chroma

# create vector DB of docs and embeddings - takes < 30s on Colab
chroma_db = Chroma.from_documents(documents=total_docs,
                                  collection_name='my_db',
                                  embedding=openai_embed_model,
                                  # Chroma defaults to squared L2 (Euclidean) distance, so set cosine explicitly
                                  # see https://docs.trychroma.com/guides#changing-the-distance-function
                                  collection_metadata={"hnsw:space": "cosine"},
                                  persist_directory="./my_db")
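Chroma's `hnsw:space` setting selects the distance function: with `"cosine"` the reported distance is 1 minus the cosine similarity, while the default `"l2"` is squared Euclidean distance. For unit-length vectors the two are tied by $\lVert a-b \rVert^2 = 2 - 2\cos(a,b)$, so they rank results identically; OpenAI's documentation describes its embeddings as normalized to length 1, so the cosine setting mainly makes the scores easier to interpret. A quick numerical check on toy vectors (values invented for illustration):

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine_distance(a, b):
    # cosine distance = 1 - cosine similarity
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (norm_a * norm_b)


def squared_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


a = normalize([0.9, 0.1, 0.0])
b = normalize([0.8, 0.2, 0.1])
# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b)) == 2 * cosine_distance(a, b)
print(squared_l2(a, b), 2 * cosine_distance(a, b))
```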

### Load Vector DB from disk

Once the vector database has been persisted to disk, you can reconnect to it at any time by pointing Chroma at the same directory and collection name; the stored documents do not need to be embedded again.

In [None]:
# load from disk
chroma_db = Chroma(persist_directory="./my_db",
                   collection_name='my_db',
                   embedding_function=openai_embed_model)

In [None]:
chroma_db

<langchain_chroma.vectorstores.Chroma at 0x7a31d82555d0>

## Experiment with Vector Database Retrievers

Here we will explore the following retrieval strategies on our Vector Database:

- Similarity or Ranking based Retrieval
- Multi Query Retrieval
- Contextual Compression Retrieval
- Chained Retrieval Pipeline
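In Multi Query Retrieval (as in LangChain's `MultiQueryRetriever`), an LLM rewrites the user's question into several variants, documents are retrieved for each variant, and the unique union of the results is returned. The sketch below covers only the merge step: the query variants and the per-query results are hard-coded, hypothetical stand-ins for LLM output and a real retriever, with titles borrowed from the similarity-search outputs in this notebook.

```python
def multi_query_retrieve(queries, retrieve, k=5):
    """Union of top-k results over several query phrasings, de-duplicated in first-seen order.

    `retrieve` is any function mapping a query string to a ranked list of document ids.
    """
    seen = set()
    merged = []
    for query in queries:
        for doc_id in retrieve(query)[:k]:
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical retriever results, keyed by query (invented for illustration)
fake_index = {
    "what is ML?": ["Standard ML", "Machine learning", "Major League Baseball"],
    "what is machine learning?": ["Machine learning", "Supervised learning", "Deep learning"],
}
variants = ["what is ML?", "what is machine learning?"]
print(multi_query_retrieve(variants, fake_index.get, k=3))
```

Rephrasing an ambiguous query widens recall; a later step (for example contextual compression) can then filter out off-topic hits.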

### Similarity or Ranking based Retrieval

We use cosine similarity (the distance function configured when the collection was created) and retrieve the 5 documents most similar to the user's query.

In [None]:
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})

In [None]:
from IPython.display import display, Markdown

def display_docs(docs):
    for doc in docs:
        print('Metadata:', doc.metadata)
        print('Content Brief:')
        display(Markdown(doc.page_content[:1000]))
        print()

In [None]:
query = "what is machine learning?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.


Metadata: {'id': '359370', 'source': 'Wikipedia', 'title': 'Supervised learning'}
Content Brief:


In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as as recognizing and understanding speech, images or handwriting, is easy to do for humans. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic


Metadata: {'id': '6360', 'source': 'Wikipedia', 'title': 'Artificial intelligence'}
Content Brief:


Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn. It is also a field of study which tries to make computers "smart". They work on their own without being encoded with commands. John McCarthy came up with the name "Artificial Intelligence" in 1955. In general use, the term "artificial intelligence" means a programme which mimics human cognition. At least some of the things we associate with other minds, such as learning and problem solving can be done by computers, though not in the same way as we do. Andreas Kaplan and Michael Haenlein define AI as a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation. An ideal (perfect) intelligent machine is a flexible agent which perceives its environment and takes actions to maximize its chance of success at some goal or objective. As machines become increasingly capable, mental facu


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.




In [None]:
query = "what is ML?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '312307', 'source': 'Wikipedia', 'title': 'Standard ML'}
Content Brief:


Standard ML is a functional programming language which is a dialect of ML (programming language). It is sometimes used for writing compilers and in theorem provers. Here is an example of a factorial function written in a simple, non-tail recursive, style.


Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.


Metadata: {'id': '15798', 'source': 'Wikipedia', 'title': 'Major League Baseball'}
Content Brief:


Major League Baseball (MLB) is a professional baseball league in North America. It is often considered to be the highest level of professional baseball in the world. There are two leagues that make up the MLB: the American League, also called AL, and National League, also called NL. There are currently 30 teams in the MLB, 29 from the United States and one from Canada, the Toronto Blue Jays. The official website of MLB is known as "MLB.com" (www.mlb.com). The 30 teams in MLB are divided into two leagues: American and National. Each league is divided into three divisions: East, Central, West. Since the 2013 season, each division has had five teams. The most recent change took place after the 2012 season, when the Houston Astros moved from the NL Central to the AL West.


Metadata: {'id': '196959', 'source': 'Wikipedia', 'title': 'Mathematical Reviews'}
Content Brief:


Mathematical Reviews is a journal and online database published by the American Mathematical Society that contains many articles in mathematics, statistics, and related topics.


Metadata: {'id': '757418', 'source': 'Wikipedia', 'title': 'VRML'}
Content Brief:


VRML (Virtual Reality Modeling Language, pronounced "vermal", or by its initials, known before 1995 as Virtual Reality Markup Language) is a standard 3-dimensional (3D) interactive vector graphics file format designed for the World Wide Web. It has been succeeded by X3D. VRML uses text files. The vertices, edges, surface colors, UV-mapped textures, shininess, transparency and more of a 3D polygon can be specified. Graphical components can be made to fetch web pages or other VRML files from the Internet from URLs when the user clicks on the graphical component. Animations, sounds, lighting, and other things about the virtual world can interact with the user or can happen when external events say so, such as timers. A special Script Node allows program code (such as program code in Java or ECMAScript) to be added to a VRML file. VRML files are commonly called "worlds" and have the .wrl extension (for example, a VRML file can be called island.wrl). VRML files are in plain text and usually




In [None]:
query = "what is the difference between transformers and vision transformers?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 7, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on a controlled scaling study of various models, including Vision Transformers and ResNets, evaluating their transfer performance from the JFT-300M dataset. It highlights the performance versus pre-training cost, revealing that Vision Transformers generally outperform ResNets in terms of efficiency and scalability, while also discussing the implications for future model scaling efforts.
Published as a conference paper at ICLR 2021
4.4 SCALING STUDY
We perform a controlled scaling study of different models by evaluating transfer performance from
JFT-300M. In this setting data size does not bottleneck the models’ performances, and we assess
performance versus pre-training cost of each model. The model set includes: 7 ResNets, R50x1,
R50x2 R101x1, R152x1, R152x2, pre-trained for 7 epochs, plus R152x2 and R200x3 pre-trained
for 14 epochs; 6 Vision Transformers, ViT-B/32, B/16, L/32, L/16, pre-trained for 7 epochs, plus
L/16 and H/14 pre-trained for 14 epochs; and 5 hybrids, R50+ViT


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 0, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the introduction of the Vision Transformer (ViT) model, which applies a standard Transformer architecture directly to image classification tasks by treating image patches as tokens. It highlights the limitations of traditional convolutional networks in computer vision and presents evidence that a pure Transformer can achieve competitive performance on various image recognition benchmarks when pre-trained on large datasets.
AN IMAGE IS WORTH 16X16 WORDS:
TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
Alexey Dosovitskiy∗,†, Lucas Beyer∗, Alexander Kolesnikov∗, Dirk Weissenborn∗,
Xiaohua Zhai∗, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer,
Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby∗,†
∗equal technical contribution, †equal advising
Google Research, Brain Team
{adosovitskiy, neilhoulsby}@google.com
ABSTRACT
While the Transformer architecture has become the de-facto standard for natural
language processing tasks


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 2, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the architecture and methodology of the Vision Transformer (ViT), detailing how images are processed by splitting them into patches, embedding them, and utilizing a standard Transformer encoder for image classification tasks. It describes the model's design principles, including the use of position embeddings and the integration of a classification token, while referencing foundational work in Transformer architecture.
[Figure 1 diagram: Vision Transformer (ViT). Linear Projection of Flattened Patches → Patch + Position Embedding (with an extra learnable [class] embedding) → Transformer Encoder (L x: Norm, Multi-Head Attention, Norm, MLP) → MLP Head → Class (Bird, Ball, Car, ...)]
Figure 1: Model overview. We split an image into fixed-size patches, linearly embed each of them,
add position embeddings, and feed the resulting sequence of vectors to a standard Transformer
encoder. In order to perform classifi


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 7, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the behavior of attention mechanisms in the Vision Transformer (ViT), highlighting how attention distances vary across layers and the implications of localized attention in hybrid models that incorporate convolutional networks. It also discusses the relationship between attention distance and network depth, emphasizing the model's ability to attend to semantically relevant image regions for classification tasks.
have consistently small attention distances in the low layers. This highly localized attention is
less pronounced in hybrid models that apply a ResNet before the Transformer (Figure 7, right),
suggesting that it may serve a similar function as early convolutional layers in CNNs. Further, the
attention distance increases with network depth. Globally, we find that the model attends to image
regions that are semantically relevant for classification (Figure 6).
4.6 SELF-SUPERVISION
Transformers show impressive performance on NLP tasks. However, much of their success stems



Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 1, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the performance of the Vision Transformer (ViT) in comparison to convolutional neural networks (CNNs), highlighting how large-scale training on extensive datasets enhances its generalization capabilities. It discusses the results achieved by ViT when pre-trained on datasets like ImageNet-21k and JFT-300M, demonstrating its competitive accuracy on various image recognition benchmarks. Additionally, it references related work that explores the integration of self-attention mechanisms in image processing.
inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well
when trained on insufficient amounts of data.
However, the picture changes if the models are trained on larger datasets (14M-300M images). We
find that large scale training trumps inductive bias. Our Vision Transformer (ViT) attains excellent
results when pre-trained at sufficient scale and transferred to tasks with fewer datapoints. W




In [None]:
query = "what is a cnn?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '3615', 'source': 'Wikipedia', 'title': 'CNN'}
Content Brief:


The Cable News Network (CNN) is an American cable news television channel. It was founded in 1980 by Ted Turner. The Cable News Network first aired on television on June 1, 1980. The Cable News Network's first newscast was anchored (hosted) by David Walker and his wife Lois Hart. In its first year CNN hired many political analysts, including Rowland Evans and Robert Novak. On January 1, 1982 CNN launched a 24-hour sister newscast channel with no talk shows or commentary shows called CNN2. CNN broadcasts programs from its headquarters at the CNN Center in Atlanta, or from the Time Warner Center in New York City, or from studios in Washington, D.C., and Los Angeles. CNN is owned by Time Warner, and the U.S. news channel is a part of the Turner Broadcasting System. The hosts of its opinion shows are Don Lemon, Chris Cuomo, Fredricka Whitfield, Erin Burnett, Brianna Keiler and Brooke Baldwin. CNN has been criticized by the right-wing Media Research Center for having a left-wing bias. Accor


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 8, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architectural design of Convolutional Neural Networks (CNNs), specifically the practice of stacking multiple convolutional layers before pooling layers to enhance feature extraction. It discusses the benefits of using smaller convolutional layers to manage computational complexity and memory allocation, while also addressing the importance of input dimensionality and zero-padding in CNN configurations.
Introduction to Convolutional Neural Networks
9
Another common CNN architecture is to stack two convolutional layers before
each pooling layer, as illustrated in Figure 5. This is strongly recommended as
stacking multiple convolutional layers allows for more complex features of the
input vector to be selected.
Fig. 5: A common form of CNN architecture in which convolutional layers are
stacked between ReLus continuously before bei


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 3, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the structural differences between Convolutional Neural Networks (CNNs) and traditional Artificial Neural Networks (ANNs), emphasizing the three-dimensional organization of neurons in CNNs. It outlines the significance of spatial dimensions and depth in the input volume, and introduces the three main types of layers that constitute a CNN architecture, including convolutional, pooling, and fully-connected layers.
4
Keiron O’Shea et al.
One of the key differences is that the neurons that the layers within the CNN
are comprised of neurons organised into three dimensions, the spatial dimen-
sionality of the input (height and the width) and the depth. The depth does not
refer to the total number of layers within the ANN, but the third dimension of a
activation volume. Unlike standard ANNS, the neurons within any given layer
will only connect to a small region of the layer preceding it.
In practice this would mean that for the example given earlier, the input ’vol-
ume’ will have 


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. It outlines the foundational concepts of ANNs, their architecture, and the evolution of machine learning techniques, setting the stage for a deeper exploration of CNNs and their applications.
An Introduction to Convolutional Neural Networks
Keiron O’Shea1 and Ryan Nash2
1 Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB
keo7@aber.ac.uk
2 School of Computing and Communications, Lancaster University, Lancashire, LA1
4YW
nashrd@live.lancs.ac.uk
Abstract. The ﬁeld of machine learning has taken a dramatic twist in re-
cent times, with the rise of the Artiﬁcial Neural Network (ANN). These
biologically inspired computational models are able to far exceed the per-
formance of previous forms of artiﬁcial intelligence in common machine
learning tasks. One of the mos


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the foundational concepts of artificial neural networks (ANNs), including their structure and learning paradigms, specifically supervised and unsupervised learning. It highlights the similarities between convolutional neural networks (CNNs) and traditional ANNs, emphasizing the unique application of CNNs in image pattern recognition and the encoding of image-specific features within their architecture.
2
Keiron O’Shea et al.
Fig. 1: A simple three layered feedforward neural network (FNN), comprised
of a input layer, a hidden layer and an output layer. This structure is the basis
of a number of common ANN architectures, included but not limited to Feed-
forward Neural Networks (FNN), Restricted Boltzmann Machines (RBMs) and
Recurrent Neural Networks (RNNs).
The two key learning paradigms in image processing tasks are supervised and
unsupervised learning. Supervised learning is learning through pre-la




In [None]:
query = "what is deep learning?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as as recognizing and understanding speech, images or handwriting, is easy to do for humans. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic


Metadata: {'author': '', 'creationDate': 'D:20151211011345Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/resnet_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151211011345Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/resnet_paper.pdf', 'subject': '', 'title': '', 'total_pages': 12, 'trapped': ''}
Content Brief:


Focuses on the introduction of deep residual learning as a framework to facilitate the training of significantly deeper neural networks, addressing challenges such as vanishing gradients and degradation of accuracy. It highlights the empirical success of residual networks on the ImageNet dataset and their application in various visual recognition tasks, including object detection and segmentation, leading to top performances in major competitions.
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
Microsoft Research
{kahe, v-xiangz, v-shren, jiansun}@microsoft.com
Abstract
Deeper neural networks are more difﬁcult to train. We
present a residual learning framework to ease the training
of networks that are substantially deeper than those used
previously. We explicitly reformulate the layers as learn-
ing residual functions with reference to the layer inputs, in-
stead of learning unreferenced functions. We provide com-
prehensive empirical evidenc


Metadata: {'author': '', 'creationDate': 'D:20151211011345Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/resnet_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151211011345Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/resnet_paper.pdf', 'subject': '', 'title': '', 'total_pages': 12, 'trapped': ''}
Content Brief:


Focuses on the introduction of the deep residual learning framework to address the degradation problem in training deep neural networks. It explains the concept of residual mapping, the use of shortcut connections, and presents empirical evidence demonstrating the effectiveness of residual networks in achieving higher accuracy and easier optimization compared to traditional plain networks. Additionally, it highlights the success of these methods in various image recognition tasks, including the ImageNet competition.
Figure 2. Residual learning: a building block.
are comparably good or better than the constructed solution
(or unable to do so in feasible time).
In this paper, we address the degradation problem by
introducing a deep residual learning framework.
In-
stead of hoping each few stacked layers directly ﬁt a
desired underlying mapping, we explicitly let these lay-
ers ﬁt a residual mapping. Formally, denoting the des


Metadata: {'author': '', 'creationDate': 'D:20151211011345Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/resnet_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151211011345Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/resnet_paper.pdf', 'subject': '', 'title': '', 'total_pages': 12, 'trapped': ''}
Content Brief:


Focuses on the challenges associated with training deeper neural networks, specifically addressing the degradation problem where increasing the number of layers leads to higher training errors. It discusses the theoretical expectation that deeper models should not perform worse than shallower ones if constructed correctly, yet empirical evidence shows that current optimization methods struggle to achieve this.
more layers to a suitably deep model leads to higher train-
ing error, as reported in [11, 42] and thoroughly veriﬁed by
our experiments. Fig. 1 shows a typical example.
The degradation (of training accuracy) indicates that not
all systems are similarly easy to optimize. Let us consider a
shallower architecture and its deeper counterpart that adds
more layers onto it. There exists a solution by construction
to the deeper model: the added layers are identity mapping,
and the other layers are copied from the learned shallower
model. The existence of this constructed solution indica


Metadata: {'author': '', 'creationDate': 'D:20151211011345Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/resnet_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151211011345Z', 'page': 8, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/resnet_paper.pdf', 'subject': '', 'title': '', 'total_pages': 12, 'trapped': ''}
Content Brief:


Provides a list of references that support various concepts and methodologies discussed throughout the paper, including advancements in deep learning techniques, object detection frameworks, and the performance of convolutional neural networks. These references are integral to understanding the theoretical foundations and empirical results presented in the context of deep residual learning and its applications in image recognition tasks.
[28] G. Mont´ufar, R. Pascanu, K. Cho, and Y. Bengio. On the number of
linear regions of deep neural networks. In NIPS, 2014.
[29] V. Nair and G. E. Hinton. Rectiﬁed linear units improve restricted
boltzmann machines. In ICML, 2010.
[30] F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for
image categorization. In CVPR, 2007.
[31] T. Raiko, H. Valpola, and Y. LeCun. Deep learning made easier by
linear transformations in perceptrons. In AISTATS, 2012.
[32] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards
real-time object det




In [None]:
query = "what is nlp?"
top_docs = similarity_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '335464', 'source': 'Wikipedia', 'title': 'Neurolinguistic programming'}
Content Brief:


Neurolinguistic programming is a way of communicating, created in the 1970s. It is often shortened to "NLP". The discipline assumes there is a link between neurological processes, language and behavior. According to NLP, it is possible to achieve certain goals in life by changing one's behaviour. Certain neuroscientists psychologists and linguists, believe that NLP is unsupported by current scientific evidence and that it uses incorrect and misleading terms and concepts. NLP was invented by Richard Bandler and John Grinder. According to these people, NLP can help solve problems such as phobias, depression, habit disorder, psychosomatic illnesses, and learning disorders.


Metadata: {'id': '40613', 'source': 'Wikipedia', 'title': 'Natural language processing'}
Content Brief:


Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.


Metadata: {'id': '669662', 'source': 'Wikipedia', 'title': 'Loop AI Labs'}
Content Brief:


Loop AI Labs is an AI and cognitive computing company that focuses on language understanding technology. The company was founded in San Francisco in 2012 by Italian entrepreneur Gianmauro Calafiore, who sold his company Gsmbox to in 2004 and then relocated from Italy to San Francisco. Wanting to start an artificial intelligence company, he recruited two veterans of the project, the largest government-funded AI project in history, who had worked on the project at and Stanford University's . The original company name, "Soshoma", was changed to Loop AI Labs in 2015 after the company decided to change its focus from consumer-oriented to enterprise. Loop AI Labs is headquartered in San Francisco, California, with offices in New York, Milan, and Singapore. The company is privately funded. On May 4, 2017, Loop AI Labs entered into a deal with , a leading European provider of mobile messaging and solutions, to bring their cognitive computing technology to LINK's business clients, which cover 2


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as as recognizing and understanding speech, images or handwriting, is easy to do for humans. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic




### Multi Query Retrieval

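Distance-based vector retrieval can return different results when the query wording changes only slightly, or when the embeddings do not capture the semantics of the data well. The outputs above show this: for "what is a cnn?" the top result is the Wikipedia article on the Cable News Network, and for "what is nlp?" it is Neurolinguistic programming rather than Natural language processing. Prompt engineering or tuning is sometimes used to address these problems manually, but it can be tedious.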
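A toy illustration of the wording problem, using plain bag-of-words cosine similarity as a stand-in for the embedding model. This is an assumption for illustration only: the notebook's Chroma index uses learned embeddings, which behave differently but can fail in similar ways. The two-document corpus, the titles, and the stop-word list are invented for the example. Asking about "cnn" matches only the news-channel text, while spelling out "convolutional neural network" matches only the neural-network text.

```python
import math
import re
from collections import Counter
from typing import Dict

# Invented two-document corpus in which "CNN" is ambiguous.
CORPUS: Dict[str, str] = {
    "CNN (news channel)": "CNN is an American cable news television channel",
    "Convolutional neural network": "A convolutional neural network is a neural network for image recognition",
}

# Tiny stop-word list so that only content words are compared.
STOP_WORDS = {"what", "is", "a", "an", "for", "the"}


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts: lower-cased, letters only, stop words removed."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_hit(query: str) -> str:
    """Title of the corpus document most similar to the query."""
    q = vectorize(query)
    return max(CORPUS, key=lambda title: cosine(q, vectorize(CORPUS[title])))


for query in ["what is a cnn?", "what is a convolutional neural network?"]:
    print(f"{query!r} -> {top_hit(query)}")
# 'what is a cnn?' -> CNN (news channel)
# 'what is a convolutional neural network?' -> Convolutional neural network
```

The same intent yields a different top document depending on whether the abbreviation or the full term is used, which is the gap that generating several rephrasings is meant to cover.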

The [`MultiQueryRetriever`](https://api.python.langchain.com/en/latest/retrievers/langchain.retrievers.multi_query.MultiQueryRetriever.html) automates this kind of prompt tuning: it uses an LLM to generate several queries that rephrase the user's question from different perspectives, retrieves a set of relevant documents for each query, and takes the deduplicated union across all queries to obtain a larger set of potentially relevant documents.
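The three steps described above (generate rephrasings, retrieve per query, take the unique union) can be sketched in plain Python. This is a minimal illustration, not LangChain's implementation: `fake_llm`, `toy_retrieve`, and `TOY_INDEX` are hypothetical stand-ins for the chat model and the Chroma retriever, and the document labels are invented. LangChain's default prompt asks the model for three alternative questions separated by newlines; this sketch splits on newlines as well and additionally trims each line, since the logged queries in the output below still carry trailing spaces.

```python
from typing import Callable, List


# Hypothetical stand-in for the chat model: alternative phrasings, one per line
# (a blank line and trailing spaces are included on purpose).
def fake_llm(question: str) -> str:
    return (
        "What does CNN stand for?  \n"
        "What is a convolutional neural network?  \n"
        "\n"
        "How are CNNs used in machine learning?"
    )


# Hypothetical stand-in for the vector-store retriever: query -> ranked document labels.
TOY_INDEX = {
    "What does CNN stand for?": ["CNN (news channel)", "CNN paper p.0"],
    "What is a convolutional neural network?": ["CNN paper p.0", "CNN paper p.3"],
    "How are CNNs used in machine learning?": ["CNN paper p.3", "CNN paper p.8"],
}


def toy_retrieve(query: str) -> List[str]:
    return TOY_INDEX.get(query, [])


def parse_lines(text: str) -> List[str]:
    """Split the LLM output into one query per non-empty, trimmed line."""
    return [line.strip() for line in text.strip().split("\n") if line.strip()]


def unique_union(result_lists: List[List[str]]) -> List[str]:
    """Concatenate per-query results, keeping only the first occurrence of each document."""
    seen = set()
    merged: List[str] = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def multi_query_retrieve(
    question: str,
    generate: Callable[[str], str],
    retrieve: Callable[[str], List[str]],
) -> List[str]:
    queries = parse_lines(generate(question))  # 1. LLM rewrites the question
    per_query = [retrieve(q) for q in queries]  # 2. retrieve for each rewrite
    return unique_union(per_query)  # 3. deduplicated, order-preserving union


print(multi_query_retrieve("what is a cnn?", fake_llm, toy_retrieve))
# ['CNN (news channel)', 'CNN paper p.0', 'CNN paper p.3', 'CNN paper p.8']
```

Keeping first occurrences preserves the order in which documents were first retrieved. The `seen` set assumes hashable document identifiers; real `Document` objects would need a hashable key, for example built from their text and metadata.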

In [None]:
from langchain_openai import ChatOpenAI

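# LLM used to rewrite the user query into alternative phrasings;
# temperature=0 keeps the rewrites as repeatable as possible
chatgpt = ChatOpenAI(model_name="gpt-4o-mini", temperature=0)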

In [None]:
import logging

from langchain.retrievers.multi_query import MultiQueryRetriever

# Base retriever: plain similarity search returning the top 5 chunks per query
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})

# Wrap the base retriever: the LLM rewrites the user query into several variants,
# each variant is sent to similarity_retriever, and the results are de-duplicated
mq_retriever = MultiQueryRetriever.from_llm(
    retriever=similarity_retriever, llm=chatgpt
)

# Log the queries generated by the LLM so we can inspect them
logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [None]:
query = "what is a cnn?"
top_docs = mq_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does CNN stand for and what are its main functions?  ', 'Can you explain the concept and applications of a convolutional neural network?  ', 'What are the key features and uses of CNNs in machine learning?']


Metadata: {'id': '3615', 'source': 'Wikipedia', 'title': 'CNN'}
Content Brief:


The Cable News Network (CNN) is an American cable news television channel. It was founded in 1980 by Ted Turner. The Cable News Network first aired on television on June 1, 1980. The Cable News Network's first newscast was anchored (hosted) by David Walker and his wife Lois Hart. In its first year CNN hired many political analysts, including Rowland Evans and Robert Novak. On January 1, 1982 CNN launched a 24-hour sister newscast channel with no talk shows or commentary shows called CNN2. CNN broadcasts programs from its headquarters at the CNN Center in Atlanta, or from the Time Warner Center in New York City, or from studios in Washington, D.C., and Los Angeles. CNN is owned by Time Warner, and the U.S. news channel is a part of the Turner Broadcasting System. The hosts of its opinion shows are Don Lemon, Chris Cuomo, Fredricka Whitfield, Erin Burnett, Brianna Keiler and Brooke Baldwin. CNN has been criticized by the right-wing Media Research Center for having a left-wing bias. Accor


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 3, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the structural differences between Convolutional Neural Networks (CNNs) and traditional Artificial Neural Networks (ANNs), emphasizing the three-dimensional organization of neurons in CNNs. It outlines the significance of spatial dimensions and depth in the input volume, and introduces the three main types of layers that constitute a CNN architecture, including convolutional, pooling, and fully-connected layers.
4
Keiron O’Shea et al.
One of the key differences is that the neurons that the layers within the CNN
are comprised of neurons organised into three dimensions, the spatial dimen-
sionality of the input (height and the width) and the depth. The depth does not
refer to the total number of layers within the ANN, but the third dimension of a
activation volume. Unlike standard ANNS, the neurons within any given layer
will only connect to a small region of the layer preceding it.
In practice this would mean that for the example given earlier, the input ’vol-
ume’ will have 


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 8, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architectural design of Convolutional Neural Networks (CNNs), specifically the practice of stacking multiple convolutional layers before pooling layers to enhance feature extraction. It discusses the benefits of using smaller convolutional layers to manage computational complexity and memory allocation, while also addressing the importance of input dimensionality and zero-padding in CNN configurations.
Introduction to Convolutional Neural Networks
9
Another common CNN architecture is to stack two convolutional layers before
each pooling layer, as illustrated in Figure 5. This is strongly recommended as
stacking multiple convolutional layers allows for more complex features of the
input vector to be selected.
Fig. 5: A common form of CNN architecture in which convolutional layers are
stacked between ReLus continuously before bei


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 4, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architecture and functionality of Convolutional Neural Networks (CNNs), detailing the roles of activation functions, pooling layers, and fully-connected layers in transforming input data into class scores for classification and regression tasks. It emphasizes the importance of understanding individual layers and their hyperparameters for effective model creation and optimization.
Introduction to Convolutional Neural Networks
5
an ’elementwise’ activation function such as sigmoid to the output of the
activation produced by the previous layer.
3. The pooling layer will then simply perform downsampling along the spa-
tial dimensionality of the given input, further reducing the number of pa-
rameters within that activation.
4. The fully-connected layers will then perform the same duties found in
standard ANNs and attempt to produce class scores from the activations,
to be used for classiﬁcation. It is also suggested that ReLu may be used
between these layers, as to improve p


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. It outlines the foundational concepts of ANNs, their architecture, and the evolution of machine learning techniques, setting the stage for a deeper exploration of CNNs and their applications.
An Introduction to Convolutional Neural Networks
Keiron O’Shea1 and Ryan Nash2
1 Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB
keo7@aber.ac.uk
2 School of Computing and Communications, Lancaster University, Lancashire, LA1
4YW
nashrd@live.lancs.ac.uk
Abstract. The ﬁeld of machine learning has taken a dramatic twist in re-
cent times, with the rise of the Artiﬁcial Neural Network (ANN). These
biologically inspired computational models are able to far exceed the per-
formance of previous forms of artiﬁcial intelligence in common machine
learning tasks. One of the mos


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 5, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the function and significance of kernels in convolutional layers of Convolutional Neural Networks (CNNs), detailing how these kernels operate on input data to produce activation maps. It explains the concept of receptive fields and the reduction of model complexity through hyperparameters such as depth, stride, and zero-padding.
6
Keiron O’Shea et al.
These kernels are usually small in spatial dimensionality, but spreads along the
entirety of the depth of the input. When the data hits a convolutional layer,
the layer convolves each ﬁlter across the spatial dimensionality of the input to
produce a 2D activation map. These activation maps can be visualised, as seen
in Figure 3.
As we glide through the input, the scalar product is calculated for each value in
that kernel. (Figure 4) From this the network will learn kernels that ’ﬁre’ when
they see a speciﬁc feature at a given spatial position of the input. These are
commonly known as activations.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 10, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, highlighting significant contributions to the field of convolutional neural networks (CNNs) and their applications in image processing, character recognition, and object detection. These references provide foundational knowledge and advancements that support the development and understanding of CNN architectures discussed throughout the document.
Introduction to Convolutional Neural Networks
11
4. Cires¸an, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Convolutional neural
network committees for handwritten character classiﬁcation. In: Document Analysis
and Recognition (ICDAR), 2011 International Conference on. pp. 1135–1139. IEEE
(2011)
5. Egmont-Petersen, M., de Ridder, D., Handels, H.: Image processing with neural net-
worksa review. Pattern recognition 35(10), 2279–2301 (2002)
6. Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., Culurciello, E.: Hardware
accelerated convolutional neural networks for synthetic


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the foundational concepts of artificial neural networks (ANNs), including their structure and learning paradigms, specifically supervised and unsupervised learning. It highlights the similarities between convolutional neural networks (CNNs) and traditional ANNs, emphasizing the unique application of CNNs in image pattern recognition and the encoding of image-specific features within their architecture.
2
Keiron O’Shea et al.
Fig. 1: A simple three layered feedforward neural network (FNN), comprised
of a input layer, a hidden layer and an output layer. This structure is the basis
of a number of common ANN architectures, included but not limited to Feed-
forward Neural Networks (FNN), Restricted Boltzmann Machines (RBMs) and
Recurrent Neural Networks (RNNs).
The two key learning paradigms in image processing tasks are supervised and
unsupervised learning. Supervised learning is learning through pre-la


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 2, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the limitations of traditional artificial neural networks (ANNs) in handling image data, particularly regarding computational complexity and the risk of overfitting. It discusses the implications of increasing input dimensionality on the number of weights and the overall architecture of neural networks, emphasizing the need for reduced complexity to enhance predictive performance. Additionally, it introduces the architecture of convolutional neural networks (CNNs) as a solution tailored for image processing tasks.
Introduction to Convolutional Neural Networks
3
more suited for image-focused tasks - whilst further reducing the parameters
required to set up the model.
One of the largest limitations of traditional forms of ANN is that they tend to
struggle with the computational complexity required to compute image data.
Common machine learning benchmarking datasets such as the MNIST database
of handwritten digits are suitable for most forms of ANN, due to its relatively
small 




In [None]:
query = "what is nlp?"
top_docs = mq_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does NLP stand for and what are its main applications?  ', 'Can you explain the concept of natural language processing and its significance?  ', 'What are the key techniques and technologies used in NLP?']


Metadata: {'id': '40613', 'source': 'Wikipedia', 'title': 'Natural language processing'}
Content Brief:


Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.


Metadata: {'id': '335464', 'source': 'Wikipedia', 'title': 'Neurolinguistic programming'}
Content Brief:


Neurolinguistic programming is a way of communicating, created in the 1970s. It is often shortened to "NLP". The discipline assumes there is a link between neurological processes, language and behavior. According to NLP, it is possible to achieve certain goals in life by changing one's behaviour. Certain neuroscientists, psychologists and linguists believe that NLP is unsupported by current scientific evidence and that it uses incorrect and misleading terms and concepts. NLP was invented by Richard Bandler and John Grinder. According to these people, NLP can help solve problems such as phobias, depression, habit disorder, psychosomatic illnesses, and learning disorders.


Metadata: {'id': '669662', 'source': 'Wikipedia', 'title': 'Loop AI Labs'}
Content Brief:


Loop AI Labs is an AI and cognitive computing company that focuses on language understanding technology. The company was founded in San Francisco in 2012 by Italian entrepreneur Gianmauro Calafiore, who sold his company Gsmbox to in 2004 and then relocated from Italy to San Francisco. Wanting to start an artificial intelligence company, he recruited two veterans of the project, the largest government-funded AI project in history, who had worked on the project at and Stanford University's . The original company name, "Soshoma", was changed to Loop AI Labs in 2015 after the company decided to change its focus from consumer-oriented to enterprise. Loop AI Labs is headquartered in San Francisco, California, with offices in New York, Milan, and Singapore. The company is privately funded. On May 4, 2017, Loop AI Labs entered into a deal with , a leading European provider of mobile messaging and solutions, to bring their cognitive computing technology to LINK's business clients, which cover 2


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 11, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, which include foundational works in computational linguistics, parsing, attention mechanisms, and neural machine translation. These references support the development and evaluation of the Transformer model, highlighting its contributions to various natural language processing tasks.
[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated
corpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993.
[26] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In
Proceedings of the Human Language Technology Conference of the NAACL, Main Conference,
pages 152–159. ACL, June 2006.
[27] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention
model. In Empirical Methods in Natural Language Processing, 2016.
[28] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive
summ


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.


Metadata: {'id': '52820', 'source': 'Wikipedia', 'title': 'Signal processing'}
Content Brief:


Signal processing is the analysis, interpretation and manipulation of signals. Signals of interest include sound, images, biological signals such as ECG, radar signals, and many others. Processing of such signals includes storage and reconstruction, separation of information from noise (e.g., aircraft identification by radar), compression (e.g., image compression), and feature extraction (e.g., converting text to speech). For analog signals, signal processing may involve the amplification and filtering of audio signals for audio equipment or the modulation and demodulation of signals for telecommunication. For digital signals, signal processing may involve the compression, error checking and error detection of digital signals.


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 1, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the limitations of recurrent neural networks and the advantages of attention mechanisms in sequence modeling and transduction tasks. It introduces the Transformer architecture, which eliminates recurrence and leverages attention to enhance parallelization and improve translation quality. The discussion highlights the challenges of sequential computation and the need for more efficient models in natural language processing.
1
Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 10, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, which include foundational works on recurrent neural networks, attention mechanisms, and various neural architectures relevant to sequence transduction and machine translation. These references support the development and evaluation of the Transformer model, highlighting its innovations and comparisons to existing methodologies in the field.
[5] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Fethi Bougares, Holger Schwenk,
and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical
machine translation. CoRR, abs/1406.1078, 2014.
[6] Francois Chollet. Xception: Deep learning with depthwise separable convolutions. arXiv
preprint arXiv:1610.02357, 2016.
[7] Junyoung Chung, Çaglar Gülçehre, Kyunghyun Cho, and Yoshua Bengio. Empirical evaluation
of gated recurrent neural networks on sequence modeling. CoRR, abs/1412.3555, 2014.
[8] Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smit


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as recognizing and understanding speech, images or handwriting, are easy for humans to do. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic
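The generated queries in the log above show how the multi-query retriever widens recall: each variant is searched separately and the per-query results are merged, which may be why "NLP" also pulls in articles such as *Neurolinguistic programming*. LangChain's `MultiQueryRetriever` describes this merge as the unique union of the per-query results. Below is a minimal pure-Python sketch of that merge step, assuming documents are plain dicts, deduplication is keyed on `page_content`, and `results_per_query` is a hypothetical stand-in for both the LLM query generator and the Chroma retriever.

```python
from typing import Callable, Dict, List

# Hypothetical document shape for this sketch: text plus a title.
Doc = Dict[str, str]


def unique_union(result_lists: List[List[Doc]]) -> List[Doc]:
    """Flatten per-query results, keeping only the first occurrence of each document."""
    seen = set()
    merged: List[Doc] = []
    for results in result_lists:
        for doc in results:
            key = doc["page_content"]  # assumption: dedupe on text only
            if key not in seen:
                seen.add(key)
                merged.append(doc)
    return merged


def multi_query_retrieve(
    query: str,
    generate_variants: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Doc]],
) -> List[Doc]:
    """Retrieve for every generated query variant, then merge the results."""
    variants = generate_variants(query)
    return unique_union([retrieve(v) for v in variants])


nlp = {"page_content": "NLP is a field in Artificial Intelligence.", "title": "Natural language processing"}
neuro = {"page_content": "Neurolinguistic programming is often shortened to NLP.", "title": "Neurolinguistic programming"}
signal = {"page_content": "Signal processing is the analysis of signals.", "title": "Signal processing"}

# Hypothetical retrieval results for three generated variants.
results_per_query = {
    "What does NLP stand for?": [nlp, neuro],
    "Explain natural language processing.": [nlp, signal],
    "Key techniques used in NLP?": [neuro, nlp],
}
merged = multi_query_retrieve(
    "what is nlp?",
    generate_variants=lambda q: list(results_per_query),
    retrieve=lambda v: results_per_query[v],
)
print([d["title"] for d in merged])
# ['Natural language processing', 'Neurolinguistic programming', 'Signal processing']
```

Merging keeps the first-seen order, so a document returned by several variants appears only once.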




In [None]:
query = "what is ML?"
top_docs = mq_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does machine learning (ML) refer to?  ', 'Can you explain the concept of machine learning?  ', 'What are the key principles and applications of ML?']


Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.


Metadata: {'id': '359370', 'source': 'Wikipedia', 'title': 'Supervised learning'}
Content Brief:


In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as recognizing and understanding speech, images or handwriting, are easy for humans to do. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic


Metadata: {'id': '6360', 'source': 'Wikipedia', 'title': 'Artificial intelligence'}
Content Brief:


Artificial intelligence (AI) is the ability of a computer program or a machine to think and learn. It is also a field of study which tries to make computers "smart". They work on their own without being encoded with commands. John McCarthy came up with the name "Artificial Intelligence" in 1955. In general use, the term "artificial intelligence" means a programme which mimics human cognition. At least some of the things we associate with other minds, such as learning and problem solving can be done by computers, though not in the same way as we do. Andreas Kaplan and Michael Haenlein define AI as a system’s ability to correctly interpret external data, to learn from such data, and to use those learnings to achieve specific goals and tasks through flexible adaptation. An ideal (perfect) intelligent machine is a flexible agent which perceives its environment and takes actions to maximize its chance of success at some goal or objective. As machines become increasingly capable, mental facu


Metadata: {'id': '312307', 'source': 'Wikipedia', 'title': 'Standard ML'}
Content Brief:


Standard ML is a functional programming language which is a dialect of ML (programming language). It is sometimes used for writing compilers and in theorem provers. Here is an example of a factorial function written in a simple, non-tail recursive, style.


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.


Metadata: {'author': '', 'creationDate': 'D:20151211011345Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/resnet_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151211011345Z', 'page': 8, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/resnet_paper.pdf', 'subject': '', 'title': '', 'total_pages': 12, 'trapped': ''}
Content Brief:


Provides a list of references that support various concepts and methodologies discussed throughout the paper, including advancements in deep learning techniques, object detection frameworks, and the performance of convolutional neural networks. These references are integral to understanding the theoretical foundations and empirical results presented in the context of deep residual learning and its applications in image recognition tasks.
[28] G. Montúfar, R. Pascanu, K. Cho, and Y. Bengio. On the number of
linear regions of deep neural networks. In NIPS, 2014.
[29] V. Nair and G. E. Hinton. Rectified linear units improve restricted
boltzmann machines. In ICML, 2010.
[30] F. Perronnin and C. Dance. Fisher kernels on visual vocabularies for
image categorization. In CVPR, 2007.
[31] T. Raiko, H. Valpola, and Y. LeCun. Deep learning made easier by
linear transformations in perceptrons. In AISTATS, 2012.
[32] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards
real-time object det


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 10, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, which include foundational works in deep learning, optimization methods, and previous studies related to image classification and representation learning. These references support the development and validation of the Vision Transformer (ViT) model presented in the paper.
Published as a conference paper at ICLR 2021
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by
reducing internal covariate shift. 2015.
Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly,
and Neil Houlsby. Big transfer (BiT): General visual representation learning. In ECCV, 2020.
Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, 2009.
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. Imagenet classification with deep convolutional neural ne




### Contextual Compression Retrieval

The information most relevant to a query may be buried in a document with a lot of irrelevant text. Passing that full document through your application can lead to more expensive LLM calls and poorer responses.

Contextual compression is meant to fix this. The idea is simple: instead of immediately returning retrieved documents as-is, you can compress them using the context of the given query, so that only the relevant information is returned.

Compression can take two forms:

- Extraction: remove the parts of each retrieved document that are not relevant to the query, keeping only the relevant passages.

- Filtering: drop documents that are not relevant to the query entirely, without altering the content of the documents that remain.
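The two forms can be illustrated without an LLM. The toy sketch below assumes keyword overlap (minus a small, hypothetical `STOPWORDS` set) as a stand-in for the relevance judgement an LLM or embedding model would make: `extract_relevant` corresponds to extraction and `filter_relevant` to filtering. LangChain itself ships LLM-based compressors for both forms (`LLMChainExtractor` and `LLMChainFilter`); this is not their implementation.

```python
import re
from typing import List, Set

# Hypothetical stopword list; real compressors judge relevance with an LLM or embeddings.
STOPWORDS = {"what", "is", "a", "an", "the"}


def terms(text: str) -> Set[str]:
    """Lower-cased alphabetic tokens of `text`."""
    return set(re.findall(r"[a-z]+", text.lower()))


def extract_relevant(doc: str, query: str) -> str:
    """Extraction: keep only the sentences that share a content word with the query."""
    query_terms = terms(query) - STOPWORDS
    sentences = re.split(r"(?<=[.!?])\s+", doc.strip())
    return " ".join(s for s in sentences if terms(s) & query_terms)


def filter_relevant(docs: List[str], query: str) -> List[str]:
    """Filtering: drop documents with no overlap; survivors are returned unchanged."""
    query_terms = terms(query) - STOPWORDS
    return [d for d in docs if terms(d) & query_terms]


docs = [
    "Clustering or cluster analysis is a type of data analysis. This is a common task in data mining.",
    "Standard ML is a functional programming language.",
]
query = "what is clustering?"
print(filter_relevant(docs, query))      # only the first document survives, unchanged
print(extract_relevant(docs[0], query))  # Clustering or cluster analysis is a type of data analysis.
```

Extraction shortens documents; filtering only decides which documents survive. The two can be combined.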

Here we wrap our multi-query retriever in a `ContextualCompressionRetriever` and give it an `LLMChainExtractor` as the compressor. The extractor iterates over the initially retrieved documents and extracts from each only the content that is relevant to the query.

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor


# extracts from each document only the content that is relevant to the query
compressor = LLMChainExtractor.from_llm(llm=chatgpt)

# retrieves the documents similar to query and then applies the compressor
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=mq_retriever
)
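Conceptually, `ContextualCompressionRetriever` first calls its base retriever and then hands the retrieved documents, together with the original query, to the compressor. As far as we understand LangChain's defaults, `LLMChainExtractor` makes one LLM call per retrieved document and its prompt asks the model to answer `NO_OUTPUT` when nothing is relevant, in which case that document is dropped. That would explain why the compressed results are both shorter and fewer than the uncompressed ones, and it also means latency and cost grow with the number of retrieved documents. The pure-Python sketch below mirrors that flow; `toy_extract` is a hypothetical stand-in for the LLM call.

```python
from typing import Callable, List

# Sentinel the extractor prompt (as we understand it) asks the LLM to return when nothing is relevant.
NO_OUTPUT = "NO_OUTPUT"


def compress_with_extractor(
    docs: List[str], query: str, extract: Callable[[str, str], str]
) -> List[str]:
    """Call the extractor once per document and drop documents with no relevant content."""
    compressed = []
    for doc in docs:
        output = extract(query, doc).strip()
        if not output or output == NO_OUTPUT:
            continue  # nothing relevant: the whole document is dropped
        compressed.append(output)
    return compressed


def compression_retrieve(
    query: str,
    retrieve: Callable[[str], List[str]],
    extract: Callable[[str, str], str],
) -> List[str]:
    """Step 1: run the base retriever. Step 2: compress its results against the query."""
    docs = retrieve(query)
    return compress_with_extractor(docs, query, extract) if docs else []


def toy_extract(query: str, doc: str) -> str:
    """Hypothetical LLM stand-in: keep the first sentence if it mentions 'learning'."""
    first_sentence = doc.split(". ")[0]
    return first_sentence if "learning" in first_sentence.lower() else NO_OUTPUT


docs = [
    "Machine learning gives computers the ability to learn. Examples include spam filtering.",
    "Signal processing is the analysis of signals.",
]
result = compression_retrieve("what is ML?", lambda q: docs, toy_extract)
print(result)  # ['Machine learning gives computers the ability to learn']
```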

In [None]:
query = "what is ML?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does machine learning (ML) refer to, and how does it work?  ', 'Can you explain the concept of machine learning and its applications?  ', 'What are the key principles and techniques involved in machine learning?']


Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done.


Metadata: {'id': '359370', 'source': 'Wikipedia', 'title': 'Supervised learning'}
Content Brief:


In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.


Metadata: {'id': '312307', 'source': 'Wikipedia', 'title': 'Standard ML'}
Content Brief:


Standard ML is a functional programming language which is a dialect of ML (programming language).


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


"Neural networks are an example of machine learning, where a program can change as it learns to solve a problem."




In [None]:
query = "what is nlp?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does NLP stand for and what are its main applications?  ', 'Can you explain the concept of natural language processing and its significance?  ', 'What are the key techniques and technologies involved in NLP?']


Metadata: {'id': '40613', 'source': 'Wikipedia', 'title': 'Natural language processing'}
Content Brief:


Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.


Metadata: {'id': '335464', 'source': 'Wikipedia', 'title': 'Neurolinguistic programming'}
Content Brief:


Neurolinguistic programming is a way of communicating, created in the 1970s. It is often shortened to "NLP". The discipline assumes there is a link between neurological processes, language and behavior. According to NLP, it is possible to achieve certain goals in life by changing one's behaviour. NLP was invented by Richard Bandler and John Grinder.




In [None]:
query = "what is a cnn?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does CNN stand for and what are its main functions?  ', 'Can you explain the concept and applications of a convolutional neural network?  ', 'What are the key features and uses of CNNs in machine learning?']


Metadata: {'id': '3615', 'source': 'Wikipedia', 'title': 'CNN'}
Content Brief:


The Cable News Network (CNN) is an American cable news television channel. It was founded in 1980 by Ted Turner. The Cable News Network first aired on television on June 1, 1980. CNN broadcasts programs from its headquarters at the CNN Center in Atlanta, or from the Time Warner Center in New York City, or from studios in Washington, D.C., and Los Angeles. CNN is owned by Time Warner, and the U.S. news channel is a part of the Turner Broadcasting System.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 3, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the structural differences between Convolutional Neural Networks (CNNs) and traditional Artificial Neural Networks (ANNs), emphasizing the three-dimensional organization of neurons in CNNs. It outlines the significance of spatial dimensions and depth in the input volume, and introduces the three main types of layers that constitute a CNN architecture, including convolutional, pooling, and fully-connected layers.

One of the key differences is that the layers within the CNN are comprised of neurons organised into three dimensions: the spatial dimensionality of the input (height and width) and the depth. The depth does not refer to the total number of layers within the ANN, but to the third dimension of an activation volume. Unlike in standard ANNs, the neurons within any given layer will only connect to a small region of the layer preceding it.

Overall architecture
CNNs are comprised of three types of layers. These are convolutional layers, pooling layers and f


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 8, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architectural design of Convolutional Neural Networks (CNNs), specifically the practice of stacking multiple convolutional layers before pooling layers to enhance feature extraction. It discusses the benefits of using smaller convolutional layers to manage computational complexity and memory allocation, while also addressing the importance of input dimensionality and zero-padding in CNN configurations. Introduction to Convolutional Neural Networks CNNs are extremely powerful machine learning algorithms, however they can be horrendously resource-heavy.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 4, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architecture and functionality of Convolutional Neural Networks (CNNs), detailing the roles of activation functions, pooling layers, and fully-connected layers in transforming input data into class scores for classification and regression tasks. It emphasizes the importance of understanding individual layers and their hyperparameters for effective model creation and optimization. 

Through this simple method of transformation, CNNs are able to transform the original input layer by layer using convolutional and downsampling techniques to produce class scores for classification and regression purposes. 

As the name implies, the convolutional layer plays a vital role in how CNNs operate. The layer's parameters focus around the use of learnable kernels.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. One of the most impressive forms of ANN architecture is that of the Convolutional Neural Network (CNN). CNNs are primarily used to solve difficult image-driven pattern recognition tasks and, with their precise yet simple architecture, offer a simplified method of getting started with ANNs. This document provides a brief introduction to CNNs, discussing recently published papers and newly formed techniques in developing these brilliantly fantastic image recognition models.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 5, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the function and significance of kernels in convolutional layers of Convolutional Neural Networks (CNNs), detailing how these kernels operate on input data to produce activation maps. It explains the concept of receptive fields and the reduction of model complexity through hyperparameters such as depth, stride, and zero-padding. Convolutional layers are also able to significantly reduce the complexity of the model through the optimisation of its output. These are optimised through three hyperparameters, the depth, the stride and setting zero-padding.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 10, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, highlighting significant contributions to the field of convolutional neural networks (CNNs) and their applications in image processing, character recognition, and object detection. These references provide foundational knowledge and advancements that support the development and understanding of CNN architectures discussed throughout the document. 

Introduction to Convolutional Neural Networks


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Convolutional Neural Networks (CNNs) are analogous to traditional ANNs in that they are comprised of neurons that self-optimise through learning. Each neuron will still receive an input and perform an operation (such as a scalar product followed by a non-linear function) - the basis of countless ANNs. The only notable difference between CNNs and traditional ANNs is that CNNs are primarily used in the field of pattern recognition within images. This allows us to encode image-specific features into the architecture, making the network.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 2, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


As noted earlier, CNNs primarily focus on the basis that the input will be comprised of images. This focuses the architecture to be set up in a way to best suit the need for dealing with the specific type of data. Additionally, it introduces the architecture of convolutional neural networks (CNNs) as a solution tailored for image processing tasks.
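In the output above, the *CNN* article about the Cable News Network survived compression, because "what is a cnn?" is genuinely ambiguous and the extractor found text that answers it literally. LangChain also offers an `EmbeddingsFilter` compressor (in `langchain.retrievers.document_compressors`) that implements the filtering form: it keeps a document unchanged only if its embedding similarity to the query reaches a `similarity_threshold`. The pure-Python sketch below illustrates that idea with hypothetical 3-dimensional vectors; real embeddings would come from an embedding model, and a threshold alone will not necessarily resolve acronym ambiguity.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embeddings_filter(
    query_vec: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    threshold: float,
) -> List[str]:
    """Keep (unchanged) the documents whose similarity to the query meets the threshold."""
    return [text for text, vec in docs if cosine(query_vec, vec) >= threshold]


# Hypothetical embeddings chosen for illustration only.
query_vec = [1.0, 0.0, 0.0]
docs = [
    ("CNN paper chunk", [0.9, 0.1, 0.0]),
    ("Cable News Network article", [0.0, 1.0, 0.0]),
]
print(embeddings_filter(query_vec, docs, threshold=0.75))  # ['CNN paper chunk']
```

Unlike the extractor, this filter needs no LLM call per document, only one embedding per document and query.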




In [None]:
query = "what is clustering?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What are the key concepts and techniques involved in clustering?  ', 'Can you explain the different types of clustering methods and their applications?  ', 'How does clustering work, and what are its main purposes in data analysis?']


Metadata: {'id': '593732', 'source': 'Wikipedia', 'title': 'Cluster analysis'}
Content Brief:


Clustering or cluster analysis is a type of data analysis. The analyst groups objects so that objects in the same group (called a cluster) are more similar to each other than to objects in other groups (clusters) in some way. This is a common task in data mining.




In [None]:
query = "what is a neural network?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What are the key components and functions of a neural network?  ', 'Can you explain the concept and purpose of neural networks in machine learning?  ', 'How do neural networks work and what are their applications?']


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the foundational concepts of artificial neural networks (ANNs), including their structure and learning paradigms, specifically supervised and unsupervised learning. It highlights the similarities between convolutional neural networks (CNNs) and traditional ANNs, emphasizing the unique application of CNNs in image pattern recognition and the encoding of image-specific features within their architecture.

A simple three layered feedforward neural network (FNN), comprised of a input layer, a hidden layer and an output layer. This structure is the basis of a number of common ANN architectures, included but not limited to Feed-forward Neural Networks (FNN), Restricted Boltzmann Machines (RBMs) and Recurrent Neural Networks (RNNs).

Convolutional Neural Networks (CNNs) are analogous to traditional ANNs in that they are comprised of neurons that self-optimise through learning. Each neuron will still receive an input and perform a operation (such as a scalar product followed by a no


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Artificial Neural Networks (ANNs) are computational processing systems of which are heavily inspired by way biological nervous systems (such as the human brain) operate. ANNs are mainly comprised of a high number of interconnected computational nodes (referred to as neurons), of which work entwine in a distributed fashion to collectively learn from the input in order to optimise its final output. The basic structure of a ANN can be modelled as shown in Figure 1. We would load the input, usually in the form of a multidimensional vector to the input layer of which will distribute it to the hidden layers. The hidden layers will then make decisions from the previous layer and weigh up how a stochastic change within itself detriments or improves the final output, and this is referred to as the process of learning. Having multiple hidden layers stacked upon each-other is commonly called deep learning.


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, which make them incompatible with neuroscience evidences.




The `LLMChainFilter` is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents.
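Conceptually, the filter asks the LLM for one relevance verdict per retrieved document (by default, roughly a YES/NO answer to "is this context relevant to the question?") and keeps only the documents judged relevant, passing them through unchanged. The sketch below illustrates that idea without LangChain; the `Doc` class, `llm_filter` and the keyword-based `keyword_judge` are hypothetical stand-ins, and a real filter would prompt the LLM given to `LLMChainFilter.from_llm` instead of matching keywords.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def llm_filter(query: str, docs: List[Doc], judge: Callable[[str, str], str]) -> List[Doc]:
    """Keep each doc whose YES/NO verdict is YES; document contents are never modified."""
    kept = []
    for doc in docs:
        verdict = judge(query, doc.page_content)  # a real filter would prompt an LLM here
        if verdict.strip().upper().startswith("YES"):
            kept.append(doc)  # the original object is returned as-is
    return kept


def keyword_judge(query: str, context: str) -> str:
    """Hypothetical judge: answers YES if any query word of 3+ characters appears in the context."""
    words = [w.strip("?").lower() for w in query.split() if len(w.strip("?")) >= 3]
    return "YES" if any(w in context.lower() for w in words) else "NO"


docs = [
    Doc("Clustering groups similar objects together.", {"title": "Cluster analysis"}),
    Doc("The Cable News Network is a cable news channel.", {"title": "CNN"}),
]
kept = llm_filter("what is clustering?", docs, keyword_judge)
print([d.metadata["title"] for d in kept])
```

Because documents are only kept or dropped, the output is always a subset of the input, but this style of filter still costs one LLM judgment per retrieved document.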

In [None]:
from langchain.retrievers.document_compressors import LLMChainFilter
from langchain.retrievers import ContextualCompressionRetriever

# LLM-based filter: decides which of the initially retrieved documents to drop and which to return
_filter = LLMChainFilter.from_llm(llm=chatgpt)

# retrieve documents similar to the query (via the multi-query retriever), then apply the filter
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=mq_retriever
)

In [None]:
query = "what is ML?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does machine learning (ML) refer to?  ', 'Can you explain the concept of machine learning?  ', 'What are the key principles and applications of ML?']


Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.


Metadata: {'id': '359370', 'source': 'Wikipedia', 'title': 'Supervised learning'}
Content Brief:


In machine learning, supervised learning is the task of inferring a function from labelled training data. The results of the training are known beforehand, the system simply learns how to get to these results correctly. Usually, such systems work with vectors. They get the training data and the result of the training as two vectors and produce a "classifier". Usually, the system uses inductive reasoning to generalize the training data.


Metadata: {'id': '663523', 'source': 'Wikipedia', 'title': 'Deep learning'}
Content Brief:


Deep learning (also called deep structured learning or hierarchical learning) is a kind of machine learning, which is mostly used with certain kinds of neural networks. As with other kinds of machine-learning, learning sessions can be unsupervised, semi-supervised, or supervised. In many cases, structures are organised so that there is at least one intermediate layer (or hidden layer), between the input layer and the output layer. Certain tasks, such as as recognizing and understanding speech, images or handwriting, is easy to do for humans. However, for a computer, these tasks are very difficult to do. In a multi-layer neural network (having more than two layers), the information processed will become more abstract with each added layer. Deep learning models are inspired by information processing and communication patterns in biological nervous systems; they are different from the structural and functional properties of biological brains (especially the human brain) in many ways, whic


Metadata: {'id': '312307', 'source': 'Wikipedia', 'title': 'Standard ML'}
Content Brief:


Standard ML is a functional programming language which is a dialect of ML (programming language). It is sometimes used for writing compilers and in theorem provers. Here is an example of a factorial function written in a simple, non-tail recursive, style.


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.




In [None]:
query = "what is NLP?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does NLP stand for and what are its main applications?  ', 'Can you explain the concept of Natural Language Processing and its significance?  ', 'What are the key techniques and technologies used in NLP?']


Metadata: {'id': '40613', 'source': 'Wikipedia', 'title': 'Natural language processing'}
Content Brief:


Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 11, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, which include foundational works in computational linguistics, parsing, attention mechanisms, and neural machine translation. These references support the development and evaluation of the Transformer model, highlighting its contributions to various natural language processing tasks.
[25] Mitchell P Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. Building a large annotated corpus of english: The penn treebank. Computational linguistics, 19(2):313–330, 1993.
[26] David McClosky, Eugene Charniak, and Mark Johnson. Effective self-training for parsing. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 152–159. ACL, June 2006.
[27] Ankur Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention model. In Empirical Methods in Natural Language Processing, 2016.
[28] Romain Paulus, Caiming Xiong, and Richard Socher. A deep reinforced model for abstractive summ


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 1, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the limitations of recurrent neural networks and the advantages of attention mechanisms in sequence modeling and transduction tasks. It introduces the Transformer architecture, which eliminates recurrence and leverages attention to enhance parallelization and improve translation quality. The discussion highlights the challenges of sequential computation and the need for more efficient models in natural language processing.
1 Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks in particular, have been firmly established as state of the art approaches in sequence modeling and transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous efforts have since continued to push the boundaries of recurrent language models and encoder-decoder architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output sequences. Aligning the positions




In [None]:
query = "what is a neural network?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What are the key components and functions of a neural network?  ', 'Can you explain the concept and workings of neural networks in simple terms?  ', 'What are the different types of neural networks and their applications?']


Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the foundational concepts of artificial neural networks (ANNs), including their structure and learning paradigms, specifically supervised and unsupervised learning. It highlights the similarities between convolutional neural networks (CNNs) and traditional ANNs, emphasizing the unique application of CNNs in image pattern recognition and the encoding of image-specific features within their architecture.
2
Keiron O’Shea et al.
[Figure 1 labels: Input 1-4, Input Layer, Hidden Layer, Output Layer, Output]
Fig. 1: A simple three layered feedforward neural network (FNN), comprised of a input layer, a hidden layer and an output layer. This structure is the basis of a number of common ANN architectures, included but not limited to Feed-forward Neural Networks (FNN), Restricted Boltzmann Machines (RBMs) and Recurrent Neural Networks (RNNs).
The two key learning paradigms in image processing tasks are supervised and unsupervised learning. Supervised learning is learning through pre-la


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. It outlines the foundational concepts of ANNs, their architecture, and the evolution of machine learning techniques, setting the stage for a deeper exploration of CNNs and their applications.
An Introduction to Convolutional Neural Networks
Keiron O’Shea¹ and Ryan Nash²
¹ Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB, keo7@aber.ac.uk
² School of Computing and Communications, Lancaster University, Lancashire, LA1 4YW, nashrd@live.lancs.ac.uk
Abstract. The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the mos


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 3, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the structural differences between Convolutional Neural Networks (CNNs) and traditional Artificial Neural Networks (ANNs), emphasizing the three-dimensional organization of neurons in CNNs. It outlines the significance of spatial dimensions and depth in the input volume, and introduces the three main types of layers that constitute a CNN architecture, including convolutional, pooling, and fully-connected layers.
One of the key differences is that the neurons that the layers within the CNN are comprised of neurons organised into three dimensions, the spatial dimensionality of the input (height and the width) and the depth. The depth does not refer to the total number of layers within the ANN, but the third dimension of a activation volume. Unlike standard ANNS, the neurons within any given layer will only connect to a small region of the layer preceding it.
In practice this would mean that for the example given earlier, the input ’volume’ will have 


Metadata: {'id': '779314', 'source': 'Wikipedia', 'title': 'Binary Neural Network'}
Content Brief:


Binary neural network is an artificial neural network, where commonly used floating-point weights are replaced with binary ones. It largely saving the storage and computation, serves as a technique for deploying deep models on resource-limited devices. Usage of binary values can bring up to 58 times speedup, while accuracy and information capacity of binary neural network can be manually controlled. Binary neural networks do not achieve the same accuracy as their full-precision counterparts, but improvements are being made to close this gap.




In [None]:
query = "what is a cnn?"
top_docs = compression_retriever.invoke(query)
display_docs(top_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What does CNN stand for and what are its main functions?  ', 'Can you explain the concept and applications of a convolutional neural network?  ', 'What are the key features and uses of CNNs in machine learning?']


Metadata: {'id': '3615', 'source': 'Wikipedia', 'title': 'CNN'}
Content Brief:


The Cable News Network (CNN) is an American cable news television channel. It was founded in 1980 by Ted Turner. The Cable News Network first aired on television on June 1, 1980. The Cable News Network's first newscast was anchored (hosted) by David Walker and his wife Lois Hart. In its first year CNN hired many political analysts, including Rowland Evans and Robert Novak. On January 1, 1982 CNN launched a 24-hour sister newscast channel with no talk shows or commentary shows called CNN2. CNN broadcasts programs from its headquarters at the CNN Center in Atlanta, or from the Time Warner Center in New York City, or from studios in Washington, D.C., and Los Angeles. CNN is owned by Time Warner, and the U.S. news channel is a part of the Turner Broadcasting System. The hosts of its opinion shows are Don Lemon, Chris Cuomo, Fredricka Whitfield, Erin Burnett, Brianna Keiler and Brooke Baldwin. CNN has been criticized by the right-wing Media Research Center for having a left-wing bias. Accor


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 3, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the structural differences between Convolutional Neural Networks (CNNs) and traditional Artificial Neural Networks (ANNs), emphasizing the three-dimensional organization of neurons in CNNs. It outlines the significance of spatial dimensions and depth in the input volume, and introduces the three main types of layers that constitute a CNN architecture, including convolutional, pooling, and fully-connected layers.
One of the key differences is that the neurons that the layers within the CNN are comprised of neurons organised into three dimensions, the spatial dimensionality of the input (height and the width) and the depth. The depth does not refer to the total number of layers within the ANN, but the third dimension of a activation volume. Unlike standard ANNS, the neurons within any given layer will only connect to a small region of the layer preceding it.
In practice this would mean that for the example given earlier, the input ’volume’ will have 


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 8, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architectural design of Convolutional Neural Networks (CNNs), specifically the practice of stacking multiple convolutional layers before pooling layers to enhance feature extraction. It discusses the benefits of using smaller convolutional layers to manage computational complexity and memory allocation, while also addressing the importance of input dimensionality and zero-padding in CNN configurations.
Another common CNN architecture is to stack two convolutional layers before each pooling layer, as illustrated in Figure 5. This is strongly recommended as stacking multiple convolutional layers allows for more complex features of the input vector to be selected.
[Figure 5 labels: input, convolution w/ ReLu, pooling, fully-connected w/ ReLu, fully-connected, output (0 ... 9)]
Fig. 5: A common form of CNN architecture in which convolutional layers are stacked between ReLus continuously before bei


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 4, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the architecture and functionality of Convolutional Neural Networks (CNNs), detailing the roles of activation functions, pooling layers, and fully-connected layers in transforming input data into class scores for classification and regression tasks. It emphasizes the importance of understanding individual layers and their hyperparameters for effective model creation and optimization.
an ’elementwise’ activation function such as sigmoid to the output of the activation produced by the previous layer.
3. The pooling layer will then simply perform downsampling along the spatial dimensionality of the given input, further reducing the number of parameters within that activation.
4. The fully-connected layers will then perform the same duties found in standard ANNs and attempt to produce class scores from the activations, to be used for classification. It is also suggested that ReLu may be used between these layers, as to improve p


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. It outlines the foundational concepts of ANNs, their architecture, and the evolution of machine learning techniques, setting the stage for a deeper exploration of CNNs and their applications.
An Introduction to Convolutional Neural Networks
Keiron O’Shea¹ and Ryan Nash²
¹ Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB, keo7@aber.ac.uk
² School of Computing and Communications, Lancaster University, Lancashire, LA1 4YW, nashrd@live.lancs.ac.uk
Abstract. The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the mos


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 5, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the function and significance of kernels in convolutional layers of Convolutional Neural Networks (CNNs), detailing how these kernels operate on input data to produce activation maps. It explains the concept of receptive fields and the reduction of model complexity through hyperparameters such as depth, stride, and zero-padding.
These kernels are usually small in spatial dimensionality, but spreads along the entirety of the depth of the input. When the data hits a convolutional layer, the layer convolves each filter across the spatial dimensionality of the input to produce a 2D activation map. These activation maps can be visualised, as seen in Figure 3.
As we glide through the input, the scalar product is calculated for each value in that kernel. (Figure 4) From this the network will learn kernels that ’fire’ when they see a specific feature at a given spatial position of the input. These are commonly known as activations.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 10, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the references cited in the research paper, highlighting significant contributions to the field of convolutional neural networks (CNNs) and their applications in image processing, character recognition, and object detection. These references provide foundational knowledge and advancements that support the development and understanding of CNN architectures discussed throughout the document.
4. Cireşan, D.C., Meier, U., Gambardella, L.M., Schmidhuber, J.: Convolutional neural network committees for handwritten character classification. In: Document Analysis and Recognition (ICDAR), 2011 International Conference on. pp. 1135–1139. IEEE (2011)
5. Egmont-Petersen, M., de Ridder, D., Handels, H.: Image processing with neural networks - a review. Pattern recognition 35(10), 2279–2301 (2002)
6. Farabet, C., Martini, B., Akselrod, P., Talay, S., LeCun, Y., Culurciello, E.: Hardware accelerated convolutional neural networks for synthetic


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 1, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the foundational concepts of artificial neural networks (ANNs), including their structure and learning paradigms, specifically supervised and unsupervised learning. It highlights the similarities between convolutional neural networks (CNNs) and traditional ANNs, emphasizing the unique application of CNNs in image pattern recognition and the encoding of image-specific features within their architecture.
[Figure 1 labels: Input 1-4, Input Layer, Hidden Layer, Output Layer, Output]
Fig. 1: A simple three layered feedforward neural network (FNN), comprised of a input layer, a hidden layer and an output layer. This structure is the basis of a number of common ANN architectures, included but not limited to Feed-forward Neural Networks (FNN), Restricted Boltzmann Machines (RBMs) and Recurrent Neural Networks (RNNs).
The two key learning paradigms in image processing tasks are supervised and unsupervised learning. Supervised learning is learning through pre-la


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 2, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the limitations of traditional artificial neural networks (ANNs) in handling image data, particularly regarding computational complexity and the risk of overfitting. It discusses the implications of increasing input dimensionality on the number of weights and the overall architecture of neural networks, emphasizing the need for reduced complexity to enhance predictive performance. Additionally, it introduces the architecture of convolutional neural networks (CNNs) as a solution tailored for image processing tasks.
Introduction to Convolutional Neural Networks
more suited for image-focused tasks - whilst further reducing the parameters
required to set up the model.
One of the largest limitations of traditional forms of ANN is that they tend to
struggle with the computational complexity required to compute image data.
Common machine learning benchmarking datasets such as the MNIST database
of handwritten digits are suitable for most forms of ANN, due to its relatively
small 




### Chained Retrieval Pipeline

This strategy chains multiple retrievers in sequence, with each stage narrowing down the output of the previous one to reach the most relevant documents. The flow is:

Similarity Retrieval → Compression Filter → Reranker Model Retrieval

![](http://i.imgur.com/77pXxLu.gif)
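To make the data flow concrete, the three stages can be sketched in plain Python. This is a toy illustration under stated assumptions, not the LangChain implementation: the six-document corpus, its 3-dimensional embeddings, the `keep` judge (standing in for the LLM filter) and the `rerank_scores` table (standing in for the cross-encoder) are all made up for illustration.

```python
import math

# Hypothetical toy corpus: title -> made-up 3-d embedding (a real pipeline
# would get these vectors from an embedding model and store them in Chroma).
CORPUS = {
    "Machine learning": [0.9, 0.1, 0.0],
    "Standard ML": [0.6, 0.0, 0.8],
    "Neural network": [0.8, 0.3, 0.1],
    "Cluster analysis": [0.5, 0.5, 0.2],
    "Statistics": [0.2, 0.9, 0.1],
    "Cooking": [0.0, 0.1, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_stage(query_vec, k):
    """Stage 1: keep the k titles whose embeddings are most similar to the query."""
    ranked = sorted(CORPUS, key=lambda t: cosine(query_vec, CORPUS[t]), reverse=True)
    return ranked[:k]


def filter_stage(docs, is_relevant):
    """Stage 2: drop documents that a judge (an LLM in the notebook) rejects."""
    return [d for d in docs if is_relevant(d)]


def rerank_stage(docs, score, top_n):
    """Stage 3: re-score the survivors with a stronger scorer and keep top_n."""
    return sorted(docs, key=score, reverse=True)[:top_n]


def chained_retrieve(query_vec, is_relevant, score, k=5, top_n=3):
    """Run the three stages in sequence, each narrowing the previous output."""
    docs = similarity_stage(query_vec, k)
    docs = filter_stage(docs, is_relevant)
    return rerank_stage(docs, score, top_n)


# Toy usage with a query vector pointing roughly in the "machine learning" direction.
query_vec = [1.0, 0.2, 0.0]
rerank_scores = {"Machine learning": 0.95, "Standard ML": 0.70,
                 "Neural network": 0.40, "Cluster analysis": 0.10}


def keep(title):
    # Stand-in for the LLM relevance judge (hypothetical rule).
    return title != "Statistics"


def score(title):
    # Stand-in for the cross-encoder relevance score of (query, title).
    return rerank_scores.get(title, 0.0)


print(chained_retrieve(query_vec, keep, score))
# ['Machine learning', 'Standard ML', 'Neural network']
```

Because the filter can drop documents before reranking, a pipeline like this may return fewer than `top_n` results, which is consistent with some of the queries in this section returning only one or two documents.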

In [None]:
from langchain_community.cross_encoders import HuggingFaceCrossEncoder
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain.retrievers.document_compressors import LLMChainFilter
from langchain.retrievers import ContextualCompressionRetriever

# Retriever 1 - simple vector-similarity retriever over the Chroma DB (top 5 chunks)
similarity_retriever = chroma_db.as_retriever(search_type="similarity",
                                              search_kwargs={"k": 5})

# LLM-based filter: asks the LLM whether each retrieved document is relevant
# to the query, keeps the ones it judges relevant and drops the rest
_filter = LLMChainFilter.from_llm(llm=chatgpt)
# Retriever 2 - retrieves documents similar to the query, then applies the filter
compressor_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=similarity_retriever
)

# Download an open-source cross-encoder reranker model - BAAI/bge-reranker-large
reranker = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-large")
# Score each (query, document) pair and keep the 3 highest-scoring documents
reranker_compressor = CrossEncoderReranker(model=reranker, top_n=3)
# Retriever 3 - uses the reranker model to rerank the results of Retriever 2
final_retriever = ContextualCompressionRetriever(
    base_compressor=reranker_compressor, base_retriever=compressor_retriever
)

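The nesting in the cell above is what makes the stages run in order: calling `final_retriever.invoke(query)` first invokes its base retriever (`compressor_retriever`), which in turn invokes `similarity_retriever`, and each wrapper then applies its own compressor to the documents it received. The sketch below mimics that wrapper pattern with hypothetical classes (`ListRetriever`, `WrapperRetriever`) and plain functions as compressors; it illustrates the idea and is not LangChain's code.

```python
class ListRetriever:
    """Hypothetical base retriever that always returns the same documents."""

    def __init__(self, docs):
        self.docs = docs

    def invoke(self, query):
        return list(self.docs)


class WrapperRetriever:
    """Hypothetical wrapper: fetch from the base retriever, then compress the result."""

    def __init__(self, base_compressor, base_retriever, name, trace):
        self.base_compressor = base_compressor  # callable(docs, query) -> docs
        self.base_retriever = base_retriever
        self.name = name
        self.trace = trace  # shared list recording the order in which wrappers compress

    def invoke(self, query):
        docs = self.base_retriever.invoke(query)  # the inner stage runs first
        self.trace.append(self.name)
        return self.base_compressor(docs, query)


def drop_irrelevant(docs, query):
    # Stand-in for the LLM filter: pretend "d2" was judged irrelevant.
    return [d for d in docs if d != "d2"]


def keep_top3(docs, query):
    # Stand-in for the reranker: keep the first 3 documents (no real scoring).
    return docs[:3]


trace = []
base = ListRetriever(["d1", "d2", "d3", "d4", "d5"])
filtered = WrapperRetriever(drop_irrelevant, base, "filter", trace)
final = WrapperRetriever(keep_top3, filtered, "rerank", trace)

result = final.invoke("what is ML?")
print(result)  # ['d1', 'd3', 'd4']
print(trace)   # ['filter', 'rerank']
```

Even though `invoke` is only called on the outermost wrapper, the trace shows the filter being applied before the reranking stand-in, mirroring the Similarity → Filter → Reranker order of the real pipeline.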

In [None]:
query = "what is ML?"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '312307', 'source': 'Wikipedia', 'title': 'Standard ML'}
Content Brief:


Standard ML is a functional programming language which is a dialect of ML (programming language). It is sometimes used for writing compilers and in theorem provers. Here is an example of a factorial function written in a simple, non-tail recursive, style.


Metadata: {'id': '564928', 'source': 'Wikipedia', 'title': 'Machine learning'}
Content Brief:


Machine learning gives computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959). It is a subfield of computer science. The idea came from work in artificial intelligence. Machine learning explores the study and construction of algorithms which can learn and make predictions on data. Such algorithms follow programmed instructions, but can also make predictions or decisions based on data. They build a model from sample inputs. Machine learning is done where designing and programming explicit algorithms cannot be done. Examples include spam filtering, detection of network intruders or malicious insiders working towards a data breach, optical character recognition (OCR), search engines and computer vision.




In [None]:
query = "what is a neural network?"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '44742', 'source': 'Wikipedia', 'title': 'Artificial neural network'}
Content Brief:


A neural network (also called an ANN or an artificial neural network) is a sort of computer software, inspired by biological neurons. Biological brains are capable of solving difficult problems, but each neuron is only responsible for solving a very small part of the problem. Similarly, a neural network is made up of cells that work together to produce a desired result, although each individual cell is only responsible for solving a small part of the problem. This is one method for creating artificially intelligent programs. Neural networks are an example of machine learning, where a program can change as it learns to solve a problem. A neural network can be trained and improved with each example, but the larger the neural network, the more examples it needs to perform well—often needing millions or billions of examples in the case of deep learning. There are two ways to think of a neural network. First is like a human brain. Second is like a mathematical equation.


Metadata: {'id': '779314', 'source': 'Wikipedia', 'title': 'Binary Neural Network'}
Content Brief:


Binary neural network is an artificial neural network, where commonly used floating-point weights are replaced with binary ones. It largely saves storage and computation, and serves as a technique for deploying deep models on resource-limited devices. Usage of binary values can bring up to 58 times speedup, while accuracy and information capacity of binary neural network can be manually controlled. Binary neural networks do not achieve the same accuracy as their full-precision counterparts, but improvements are being made to close this gap.


Metadata: {'author': '', 'creationDate': 'D:20151203014807Z', 'creator': 'LaTeX with hyperref package', 'file_path': './rag_docs/cnn_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20151203014807Z', 'page': 0, 'producer': 'pdfTeX-1.40.12', 'source': './rag_docs/cnn_paper.pdf', 'subject': '', 'title': '', 'total_pages': 11, 'trapped': ''}
Content Brief:


Focuses on the introduction of Convolutional Neural Networks (CNNs) within the broader field of Artificial Neural Networks (ANNs), highlighting their significance in image-driven pattern recognition tasks. It outlines the foundational concepts of ANNs, their architecture, and the evolution of machine learning techniques, setting the stage for a deeper exploration of CNNs and their applications.
An Introduction to Convolutional Neural Networks
Keiron O’Shea¹ and Ryan Nash²
¹ Department of Computer Science, Aberystwyth University, Ceredigion, SY23 3DB
keo7@aber.ac.uk
² School of Computing and Communications, Lancaster University, Lancashire, LA1 4YW
nashrd@live.lancs.ac.uk
Abstract. The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the mos




In [None]:
query = "what is a transformer model?"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 1, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the limitations of recurrent neural networks and the advantages of attention mechanisms in sequence modeling and transduction tasks. It introduces the Transformer architecture, which eliminates recurrence and leverages attention to enhance parallelization and improve translation quality. The discussion highlights the challenges of sequential computation and the need for more efficient models in natural language processing.
1 Introduction
Recurrent neural networks, long short-term memory [13] and gated recurrent [7] neural networks
in particular, have been firmly established as state of the art approaches in sequence modeling and
transduction problems such as language modeling and machine translation [35, 2, 5]. Numerous
efforts have since continued to push the boundaries of recurrent language models and encoder-decoder
architectures [38, 24, 15].
Recurrent models typically factor computation along the symbol positions of the input and output
sequences. Aligning the positions


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 1, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Focuses on the Transformer model architecture, which relies entirely on self-attention mechanisms for computing input and output representations, eliminating the need for recurrent neural networks (RNNs) or convolutions. It outlines the encoder-decoder structure commonly used in competitive neural sequence transduction models, detailing how the encoder processes input sequences into continuous representations and how the decoder generates output sequences in an auto-regressive manner.
entirely on self-attention to compute representations of its input and output without using sequence-aligned
RNNs or convolution. In the following sections, we will describe the Transformer, motivate
self-attention and discuss its advantages over models such as [17, 18] and [9].
3 Model Architecture
Most competitive neural sequence transduction models have an encoder-decoder structure [5, 2, 35].
Here, the encoder maps an input sequence of symbol representations (x₁, ..., xₙ) to a sequence
of continuous


Metadata: {'author': '', 'creationDate': 'D:20230803000729Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/attention_paper.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20230803000729Z', 'page': 2, 'producer': 'pdfTeX-1.40.25', 'source': './rag_docs/attention_paper.pdf', 'subject': '', 'title': '', 'total_pages': 15, 'trapped': ''}
Content Brief:


Describes the architecture of the Transformer model, detailing the structure of the encoder and decoder stacks, which consist of multiple layers incorporating self-attention mechanisms and feed-forward networks. It explains the use of residual connections and layer normalization to enhance model performance. Additionally, it introduces the attention function, which is fundamental to the model's operation, allowing for effective mapping of queries to key-value pairs.
Figure 1: The Transformer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1 Encoder and Decoder Stacks
Encoder:
The encoder is composed of a stack of N = 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-wise
fully connected feed-forward network. W




In [None]:
query = "what is nlp?"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '40613', 'source': 'Wikipedia', 'title': 'Natural language processing'}
Content Brief:


Natural Language Processing (NLP) is a field in Artificial Intelligence, and is also related to linguistics. On a high level, the goal of NLP is to program computers to automatically understand human languages, and also to automatically write/speak in human languages. We say "Natural Language" to mean human language, and to indicate that we are not talking about computer (programming) languages.




In [None]:
query = "what is clustering?"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '593732', 'source': 'Wikipedia', 'title': 'Cluster analysis'}
Content Brief:


Clustering or cluster analysis is a type of data analysis. The analyst groups objects so that objects in the same group (called a cluster) are more similar to each other than to objects in other groups (clusters) in some way. This is a common task in data mining.




In [None]:
query = "what is a vision transformer"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 0, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the introduction of the Vision Transformer (ViT) model, which applies a standard Transformer architecture directly to image classification tasks by treating image patches as tokens. It highlights the limitations of traditional convolutional networks in computer vision and presents evidence that a pure Transformer can achieve competitive performance on various image recognition benchmarks when pre-trained on large datasets.
Published as a conference paper at ICLR 2021
AN IMAGE IS WORTH 16X16 WORDS:
TRANSFORMERS FOR IMAGE RECOGNITION AT SCALE
Alexey Dosovitskiy∗,†, Lucas Beyer∗, Alexander Kolesnikov∗, Dirk Weissenborn∗,
Xiaohua Zhai∗, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer,
Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby∗,†
∗equal technical contribution, †equal advising
Google Research, Brain Team
{adosovitskiy, neilhoulsby}@google.com
ABSTRACT
While the Transformer architecture has become the de-facto standard for natural
language processing tasks


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 2, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the architecture and methodology of the Vision Transformer (ViT), detailing how images are processed by splitting them into patches, embedding them, and utilizing a standard Transformer encoder for image classification tasks. It describes the model's design principles, including the use of position embeddings and the integration of a classification token, while referencing foundational work in Transformer architecture.
[Figure 1 diagram: Vision Transformer (ViT): image patches → Linear Projection of Flattened Patches → Patch + Position Embedding (with an extra learnable [class] embedding) → Transformer Encoder (L x: Norm, Multi-Head Attention, +, Norm, MLP, +) → MLP Head → Class (Bird, Ball, Car, ...)]
Figure 1: Model overview. We split an image into fixed-size patches, linearly embed each of them,
add position embeddings, and feed the resulting sequence of vectors to a standard Transformer
encoder. In order to perform classifi


Metadata: {'author': '', 'creationDate': 'D:20210604001958Z', 'creator': 'LaTeX with hyperref', 'file_path': './rag_docs/vision_transformer.pdf', 'format': 'PDF 1.5', 'keywords': '', 'modDate': 'D:20210604001958Z', 'page': 1, 'producer': 'pdfTeX-1.40.21', 'source': './rag_docs/vision_transformer.pdf', 'subject': '', 'title': '', 'total_pages': 22, 'trapped': ''}
Content Brief:


Focuses on the performance of the Vision Transformer (ViT) in comparison to convolutional neural networks (CNNs), highlighting how large-scale training on extensive datasets enhances its generalization capabilities. It discusses the results achieved by ViT when pre-trained on datasets like ImageNet-21k and JFT-300M, demonstrating its competitive accuracy on various image recognition benchmarks. Additionally, it references related work that explores the integration of self-attention mechanisms in image processing.
inherent to CNNs, such as translation equivariance and locality, and therefore do not generalize well
when trained on insufficient amounts of data.
However, the picture changes if the models are trained on larger datasets (14M-300M images). We
find that large scale training trumps inductive bias. Our Vision Transformer (ViT) attains excellent
results when pre-trained at sufficient scale and transferred to tasks with fewer datapoints. W




In [None]:
query = "what is statistics"
top_docs = final_retriever.invoke(query)
display_docs(top_docs)

Metadata: {'id': '789', 'source': 'Wikipedia', 'title': 'Statistics'}
Content Brief:


Statistics is a branch of applied mathematics dealing with data collection, organization, analysis, interpretation and presentation. Descriptive statistics summarize data. Inferential statistics make predictions. Statistics helps in the study of many other fields, such as science, medicine, economics, psychology, politics and marketing. Someone who works in statistics is called a statistician. In addition to being the name of a field of study, the word "statistics" also refers to numbers that are used to describe data or relationships. The first known statistics are census data. The Babylonians did a census around 3500 BC, the Egyptians around 2500 BC, and the Ancient Chinese around 1000 BC. Starting in the 16th century mathematicians such as Gerolamo Cardano developed probability theory, which made statistics a science. Since then, people have collected and studied statistics on many things. Trees, starfish, stars, rocks, words, almost anything that can be counted has been a subject o


Metadata: {'id': '208393', 'source': 'Wikipedia', 'title': 'Descriptive statistics'}
Content Brief:


Descriptive statistics is a branch of statistics. Its aim is to summarize a set of statistical data. The data are usually taken by sampling a population. To picture the way the data are distributed, a histogram may be drawn. The data may be summarized by computing some characterising values, like the "center" of the data, and the "spread". In some cases the different items of data will be grouped and the groups will be described, in some way.


Metadata: {'id': '470230', 'source': 'Wikipedia', 'title': 'Statistical significance'}
Content Brief:


Statistics uses variables to describe a measurement. Such a variable is called statistically significant if under a certain status quo assumption, the probability of obtaining its outcome (or a more extreme outcome) is less than a given value. Statistical significance is hence a way of determining the "unlikeliness" of an experimental result—when a certain status quo assumption is assumed to be true. Statistical hypothesis tests are used to check significance. The concept of statistical significance was originated by Ronald Fisher in his 1925 publication, "Statistical Methods for Research Workers", when he developed statistical hypothesis testing (which he described as "tests of significance"). Fisher suggested a probability of one in twenty (0.05 or 5%)—as a convenient cutoff level to reject the null hypothesis. In their 1933 paper, Jerzy Neyman and Egon Pearson recommended that the significance level (for example 0.05), which they called α, be set before any data collection. Despite 


