In [1]:
from scrapeai import Folder, AnthropicClient

In [None]:
client = AnthropicClient(api_key='zzz')  # 'zzz' is a placeholder; substitute your own Anthropic API key

# Answering questions about random text files

In [3]:
folder = Folder(path='./example_collection', files_endwith=['.txt'], client=client)

In [4]:
folder.files

['c.txt', 'e.txt', 'd.txt']

In [5]:
for f in folder.get_files():
    print(f.path.name)
    print(f.content())

c.txt
Title: A treatise on the uses of native shrubbery and the children there into.
By: Thomas Babington the Third
Well by this point, I would say that's about all there is to say on the subject.

e.txt
Okay.

d.txt
Another Exciting Story
By Anon

Once upon anon anon.



In [6]:
# Use `async_ask` instead of `ask` because a Jupyter notebook already has an event loop running.
answers = await folder.async_ask("What is the author's name?")
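
The comment above turns on whether an event loop is already running. The sketch below illustrates that distinction in a plain Python script, where no loop is running and a coroutine has to be driven with `asyncio.run`. It does not call ScrapeAI: `async_ask_stub` is a hypothetical stand-in for `folder.async_ask`, and `ask_from_script` and `loop_is_running` are illustrative helper names, not part of the library.

```python
import asyncio


async def async_ask_stub(question: str) -> str:
    """Hypothetical stand-in for `folder.async_ask`; not part of scrapeai."""
    await asyncio.sleep(0)  # yield to the event loop once, as a real API call would
    return f"(answer to: {question})"


def loop_is_running() -> bool:
    """Return True when called from a thread that already has a running event loop."""
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False


def ask_from_script(question: str) -> str:
    """Drive the coroutine from synchronous code, e.g. a command-line script.

    asyncio.run creates a new event loop and raises RuntimeError if one is
    already running in this thread, which is the situation inside Jupyter.
    """
    return asyncio.run(async_ask_stub(question))
```

In a script, `ask_from_script("What is the author's name?")` returns the stub's string. In a notebook cell, `loop_is_running()` would typically return True, which is why the cell above uses `await` directly instead.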

In [7]:
print(answers)

0. [c.txt] Title: A treatise on the uses of native shrubbery and the children there into, Answer: The author's name is Thomas Babington the Third.
1. [e.txt] Title: None, Answer: 
There isn't enough information to answer the question "What is the author's name?" The document does not contain any relevant information about the author or the document's title.

2. [d.txt] Title: Another Exciting Story, Answer: The author's name is given as "Anon" in the document. This is likely short for "Anonymous," indicating that the true identity of the author is unknown or intentionally withheld.


In [8]:
print(answers.raw)

[('c.txt', 'Title: A treatise on the uses of native shrubbery and the children there into', "Answer: The author's name is Thomas Babington the Third."), ('e.txt', 'Title: None', 'Answer: \nThere isn\'t enough information to answer the question "What is the author\'s name?" The document does not contain any relevant information about the author or the document\'s title.\n'), ('d.txt', 'Title: Another Exciting Story', 'Answer: The author\'s name is given as "Anon" in the document. This is likely short for "Anonymous," indicating that the true identity of the author is unknown or intentionally withheld.')]
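
Judging by the output above, `answers.raw` is a list of `(filename, title, answer)` tuples whose last two fields still carry their `Title:` and `Answer:` labels. The following is a minimal sketch, under that assumption, of turning those tuples into cleaner records. `ParsedAnswer`, `strip_label`, and `parse_raw` are hypothetical names introduced here, not part of ScrapeAI, and `sample_raw` is copied from the output above with the second answer shortened.

```python
from typing import List, NamedTuple, Optional, Tuple


class ParsedAnswer(NamedTuple):
    filename: str
    title: Optional[str]  # None when the model reported "Title: None"
    answer: str


def strip_label(text: str, label: str) -> str:
    """Remove a leading 'Label:' prefix if present, then trim whitespace."""
    prefix = f"{label}:"
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


def parse_raw(raw: List[Tuple[str, str, str]]) -> List[ParsedAnswer]:
    """Convert (filename, 'Title: ...', 'Answer: ...') tuples into ParsedAnswer records."""
    parsed = []
    for filename, title_field, answer_field in raw:
        title = strip_label(title_field, "Title")
        parsed.append(
            ParsedAnswer(
                filename=filename,
                title=None if title == "None" else title,
                answer=strip_label(answer_field, "Answer"),
            )
        )
    return parsed


# Sample data copied from the notebook output (second answer shortened).
sample_raw = [
    ("c.txt",
     "Title: A treatise on the uses of native shrubbery and the children there into",
     "Answer: The author's name is Thomas Babington the Third."),
    ("e.txt",
     "Title: None",
     "Answer: \nThere isn't enough information to answer the question \"What is the author's name?\"\n"),
]

for record in parse_raw(sample_raw):
    print(record.filename, "|", record.title, "|", record.answer)
```

Keeping the title as `None` when the model finds none makes it straightforward to filter out files that could not be answered.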


# Answering questions about ScrapeAI's source code

In [9]:
folder = Folder(path='./src/scrapeai', files_endwith=['.py'], client=client)

In [10]:
folder.files

['folder.py', 'client.py', '__init__.py', 'file.py']

In [11]:
answers = await folder.async_ask('What is the purpose of this code?')

In [15]:
answers

0. [folder.py] Title: folder.py, Answer: 
The purpose of this code is to define a Folder class that represents a directory containing specific types of files. Its main functionalities include:

1. Initializing a Folder object with a path, allowed file suffixes, and an API client.
2. Retrieving files from the specified folder that match the allowed suffixes.
3. Providing a list of filenames in the folder.
4. Offering methods to ask questions about the files in the folder using the provided API client, both synchronously and asynchronously.

The class enforces specific file types (PDF, TXT, TSV, CSV, MD, PY) and requires an API client for processing questions about the files. It serves as a wrapper for file management and interaction with an external API for file-related queries.

1. [client.py] Title: client.py, Answer: 
The purpose of this code is to create an AnthropicClient class that interacts with the Anthropic API to process and analyze documents. The class has methods to:

1. Gen

# Answering questions about research papers

In [17]:
folder = Folder(path='./papers', files_endwith=['.pdf'], client=client)

In [18]:
folder.files

['vgg.pdf',
 'efficiently_scaling_transformer_inference.pdf',
 'attention_is_all_you_need.pdf',
 'alexnet.pdf']

In [22]:
answers = await folder.async_ask('What background information would be good to have before I read this paper?')

In [23]:
answers

0. [vgg.pdf] Title: Very Deep Convolutional Networks for Large-Scale Image Recognition, Answer: 
- Basic understanding of convolutional neural networks (ConvNets) and their use in image classification tasks
- Familiarity with the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) dataset and competition
- Knowledge of previous state-of-the-art ConvNet architectures like AlexNet 
- Understanding of key ConvNet concepts like convolutional layers, pooling layers, fully connected layers, and network depth
- Familiarity with common techniques used in training deep neural networks like data augmentation, dropout, etc.
- Basic knowledge of object detection and localization tasks in computer vision

Having this background would help provide context for the paper's exploration of very deep ConvNet architectures and their performance on image classification and localization tasks.

1. [efficiently_scaling_transformer_inference.pdf] Title: Efficiently Scaling Transformer Inference, Answer

In [26]:
answers = await folder.async_ask('What are the key discoveries of this paper? Keep it very succinct.')

In [27]:
answers

0. [vgg.pdf] Title: Very Deep Convolutional Networks for Large-Scale Image Recognition, Answer: 
The key discoveries of this paper are:

1. Increasing the depth of convolutional neural networks to 16-19 layers significantly improves accuracy on image classification tasks.

2. Using very small (3x3) convolutional filters throughout the network is effective and allows for increasing depth while controlling the number of parameters.

3. The deep networks generalize well to other datasets, outperforming previous state-of-the-art methods on tasks like object classification, localization, and action recognition when used as feature extractors.

4. Multi-scale training and evaluation improves performance.

5. The best performing models (16 and 19 layers deep) achieved state-of-the-art results on ImageNet classification and localization tasks.

1. [efficiently_scaling_transformer_inference.pdf] Title: Efficiently Scaling Transformer Inference, Answer: 
The key discoveries of this paper are:

1