In [1]:
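# Project helpers; named explicitly so it is clear where each one comes from.
from utils import answer_question_chatgpt, constants, get_results, print_papers, rerank
import pandas as pd
from tqdm import tqdm
from revChatGPT.V1 import Chatbot

# Register DataFrame.progress_apply so long-running applies show a progress bar.
tqdm.pandas()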

In [2]:
import torch
from transformers import PegasusForConditionalGeneration, PegasusTokenizer

model_name = "tuner007/pegasus_summarizer"
torch_device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer_summarizer = PegasusTokenizer.from_pretrained(model_name)
model_summarizer = PegasusForConditionalGeneration.from_pretrained(model_name).to(torch_device)

In [3]:
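def get_response(input_text):
    """Summarize one abstract with the Pegasus model and return the summary text."""
    # Inputs longer than 1024 tokens are truncated before generation.
    batch = tokenizer_summarizer(
        [input_text],
        truncation=True,
        padding="longest",
        max_length=1024,
        return_tensors="pt",
    ).to(torch_device)
    # Beam search without sampling; `temperature` is dropped because it only
    # takes effect when do_sample=True.
    gen_out = model_summarizer.generate(
        **batch, max_length=256, num_beams=5, num_return_sequences=1
    )
    output_text = tokenizer_summarizer.batch_decode(gen_out, skip_special_tokens=True)
    return output_text[0]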

In [4]:
query1 = "What is the optimal architecture for deep neural networks, and how can we develop more efficient training algorithms to scale these models to larger datasets and applications?"
# search and re-rank for the query
df = get_results(query1)
df["tldr"] = df["abstract"].progress_apply(get_response)
df, query1 = rerank(df, query1, 'tldr')

100%|██████████| 20/20 [01:29<00:00,  4.48s/it]
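
The `rerank` helper lives in `utils` and its implementation is not shown in this notebook. As a rough illustration of what re-ranking on the `tldr` column could look like, here is a minimal sketch that scores each summary against the query with bag-of-words cosine similarity. Everything in it is an assumption: the name `rerank_sketch`, the use of a list of dicts instead of a DataFrame, the `similarity` key, and the scoring method itself (the real helper may well use sentence embeddings, and it also returns the query alongside the frame).

```python
import math
import re
from collections import Counter


def _bag_of_words(text):
    """Lowercase the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank_sketch(rows, query, column):
    """Return rows sorted by similarity of row[column] to the query.

    Hypothetical stand-in for utils.rerank: rows is a list of dicts
    rather than a DataFrame, and similarity is bag-of-words cosine.
    """
    query_vec = _bag_of_words(query)
    scored = [
        dict(row, similarity=_cosine(query_vec, _bag_of_words(row[column])))
        for row in rows
    ]
    return sorted(scored, key=lambda row: row["similarity"], reverse=True)


# Toy data, not taken from the search results above.
papers = [
    {"title": "A", "tldr": "Spiking neurons on neuromorphic hardware."},
    {"title": "B", "tldr": "Efficient training of deep neural networks."},
]
ranked = rerank_sketch(papers, "efficient training for deep neural networks", "tldr")
print([paper["title"] for paper in ranked])
```

Ranking on the short summary rather than the full abstract keeps the comparison focused on each paper's main claim, which is presumably why the notebook passes `'tldr'` as the column.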


In [5]:
prompt = answer_question_chatgpt(
    df,
    query1,
    k=8,
    instructions="Instructions: Using the provided web search results, write a comprehensive reply to the given query. If you find a result relevant make sure to cite the result using [[number](URL)] notation after the reference. End your answer with a summary.\nQuery:",
)
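
`answer_question_chatgpt` is also defined in `utils`, so the exact prompt layout is not visible here. The sketch below shows one plausible way to assemble such a prompt: number the top `k` results, list each summary with its URL, then append the instructions and the query. The function name `build_prompt_sketch`, the `tldr` and `url` keys, and the layout are all assumptions made for illustration.

```python
def build_prompt_sketch(results, query, k, instructions):
    """Assemble a numbered-sources prompt from the top-k results.

    Hypothetical stand-in for utils.answer_question_chatgpt; the layout
    below is an assumption, not the notebook's actual template.
    """
    lines = []
    for number, row in enumerate(results[:k], start=1):
        lines.append(f"[{number}] {row['tldr']}")
        lines.append(f"URL: {row['url']}")
        lines.append("")  # blank line between sources
    sources = "\n".join(lines)
    return f"{sources}\n{instructions} {query}"


# Toy data with placeholder URLs.
example_results = [
    {"tldr": "First summary.", "url": "https://example.org/1"},
    {"tldr": "Second summary.", "url": "https://example.org/2"},
]
print(
    build_prompt_sketch(
        example_results,
        "What is X?",
        k=2,
        instructions="Instructions: cite sources.\nQuery:",
    )
)
```

Numbering the sources in the prompt is what lets the model answer with `[[number](URL)]` citations that map back to specific results.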

In [6]:
from IPython.display import Markdown, display

chatbot = Chatbot(config=constants.CHATGPT_CREDS)

# chatbot.ask yields successive updates; keep the message from the last one.
response = ""
for data in chatbot.ask(prompt):
    response = data["message"]

# Render the question, the answer, and the top-k papers as Markdown.
display(Markdown("## Question\n\n"))
display(Markdown(f"### {query1}\n\n"))
display(Markdown(f"## ChatGPT Response: \n\n {response}"))
print_papers(df, k=8)

## Question



### What is the optimal architecture for deep neural networks, and how can we develop more efficient training algorithms to scale these models to larger datasets and applications?



## ChatGPT Response: 

 Deep Neural Networks (DNNs) have proven to be effective in various applications, but their computational cost can be high, and they can suffer from scaling issues. Therefore, researchers have proposed various techniques to optimize DNN architectures and training algorithms for more efficient computation and scaling.

One approach to improving the computation cost of DNNs is to reduce computation precision and eliminate inessential computation and memory demands [[1](https://www.semanticscholar.org/paper/b39af7b35d0cb1a42573fa70c5a6799a1d88b8f7)]. Another strategy is to explore the design space of deep neural network accelerators in the early design stages to select optimal dataflow and guide new hardware architecture design [[2](https://www.semanticscholar.org/paper/d98125b0acec88b6fb26a77b33b86941d1fa20d0)].

To overcome scaling issues in DNNs, one proposal is to integrate tensor networks (TN) into NNs, in combination with variational DMRG-like optimization, resulting in a scalable neural network architecture that can be efficiently trained for a large number of neurons and layers [[3](https://www.semanticscholar.org/paper/fd364ecc2896424107ec4fc57096942e373581d3)]. Another effective architecture for calibrating deep neural networks during training is Annealing Double-Head, which constructs an additional calibration head to map logits to the aligned confidence [[4](https://www.semanticscholar.org/paper/cf0d478b04b8bb91bbe4a383a052ab16456fa64b)].

Bayes without Bayesian Learning (BwoBL) is an algorithm that builds Bayesian Neural Networks (BNNs) using learned parameters of pre-trained models and focuses on Bayesian inference without costing, resulting in better accuracy compared to no adversarial training [[5](https://www.semanticscholar.org/paper/a83a370ceb4971b6ac03d3d4244aa553a50f52eb)]. Additionally, a neuron normalization technique can be used to adjust the neural selectivity and develop a direct learning algorithm for deep SNNs, which enable energy-efficient implementation on emerging neuromorphic hardware [[6](https://www.semanticscholar.org/paper/67b12aa7cde36523780c6be84afb9b95ddbd078a)].

Finally, GPipe is a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers, resulting in almost linear speedup when a model is partitioned across multiple accelerators [[7](https://www.semanticscholar.org/paper/c18663fea10c8a303d045fd2c1f33cacf9b73ca3)]. Moreover, neuromorphic platforms have been developed to execute SNNs, which provide a power-efficient and brain-inspired computing paradigm for machine learning applications [[8](https://www.semanticscholar.org/paper/12c74e5e4439a2e05109172ef3fbc17f51d93860)].

In summary, optimizing DNN architectures and developing more efficient training algorithms can be achieved through techniques such as computation precision reduction, design space exploration, integration of tensor networks, Annealing Double-Head, BwoBL, neuron normalization, pipeline parallelism, and neuromorphic platforms. These techniques have shown promising results in reducing the computation cost of DNNs and scaling these models to larger datasets and applications.

#### [1] [Algorithm-Hardware Optimization of Deep Neural Networks for Edge Applications](https://www.semanticscholar.org/paper/b39af7b35d0cb1a42573fa70c5a6799a1d88b8f7) - 2020

#### [2] [TRIM: A Design Space Exploration Model for Deep Neural Networks Inference and Training Accelerators](https://www.semanticscholar.org/paper/d98125b0acec88b6fb26a77b33b86941d1fa20d0) - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021

#### [3] [Variational Tensor Neural Networks for Deep Learning](https://www.semanticscholar.org/paper/fd364ecc2896424107ec4fc57096942e373581d3) - 2022

#### [4] [Annealing Double-Head: An Architecture for Online Calibration of Deep Neural Networks](https://www.semanticscholar.org/paper/cf0d478b04b8bb91bbe4a383a052ab16456fa64b) - ArXiv, 2022

#### [5] [Flexible Bayesian Inference by Weight Transfer for Robust Deep Neural Networks](https://www.semanticscholar.org/paper/a83a370ceb4971b6ac03d3d4244aa553a50f52eb) - IEICE Trans. Inf. Syst., 2021

#### [6] [Direct Training for Spiking Neural Networks: Faster, Larger, Better](https://www.semanticscholar.org/paper/67b12aa7cde36523780c6be84afb9b95ddbd078a) - AAAI Conference on Artificial Intelligence, 2018

#### [7] [GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism](https://www.semanticscholar.org/paper/c18663fea10c8a303d045fd2c1f33cacf9b73ca3) - Neural Information Processing Systems, 2018

#### [8] [Advancements in Algorithms and Neuromorphic Hardware for Spiking Neural Networks](https://www.semanticscholar.org/paper/12c74e5e4439a2e05109172ef3fbc17f51d93860) - Neural Computation, 2022
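
Because the answer is generated, it can be worth checking that every `[[number](URL)]` citation in the response points at the paper with that number in the list. The following is a minimal sketch of such a check, not part of the original notebook: the function names and the `papers` mapping (citation number to URL) are hypothetical, and the regular expression assumes citations follow exactly the notation requested in the prompt.

```python
import re

# Matches the [[n](URL)] notation requested in the prompt instructions.
CITATION = re.compile(r"\[\[(\d+)\]\((https?://[^)\s]+)\)\]")


def extract_citations(response):
    """Return (number, url) pairs for every [[n](URL)] marker, in order."""
    return [(int(n), url) for n, url in CITATION.findall(response)]


def check_citations(response, papers):
    """List citations whose number or URL does not match the paper list.

    papers maps citation number -> URL; this mapping is a hypothetical
    structure chosen for the sketch.
    """
    return [
        (number, url)
        for number, url in extract_citations(response)
        if papers.get(number) != url
    ]


# Toy data with placeholder URLs.
papers = {1: "https://example.org/a", 2: "https://example.org/b"}
text = "Claim one [[1](https://example.org/a)]. Claim two [[2](https://example.org/x)]."
print(check_citations(text, papers))
```

An empty result means every citation resolved to the expected paper; any returned pair is a citation to inspect by hand.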