This repo has been migrated to AutoLLM/ArxivDigest, and new features will be added there!
This repo aims to provide a better daily digest for newly published arxiv papers based on your own research interests and descriptions.
Staying up to date on arxiv papers can take a considerable amount of time, with hundreds of new papers to filter through each day. There is an official daily digest service, but large subtopics like cs.AI still have 50-100 papers a day. Determining whether these papers are relevant and important to you means reading through every title and abstract.
This repository provides a way to have this daily digest sorted by relevance via large language models:
- You modify the configuration file `config.yaml` with an arxiv topic, some set of subtopics, and a natural language statement about the type of papers you are interested in.
- The code pulls all the abstracts for papers in those subtopics and ranks how relevant they are to your interest on a scale of 1-10 using gpt-3.5-turbo.
- The code then emits an HTML digest listing all the relevant papers, and optionally emails it to you using SendGrid. You will need a SendGrid account with an API key for the email functionality to work.
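The last two steps can be sketched in a few lines of Python. This is only an illustration, not the repository's actual code: the function name `build_digest`, the dictionary keys, the threshold of 7, and the placeholder URLs are all assumptions, and the scores are hard-coded here where the real pipeline would get them from the language model.

```python
from html import escape


def build_digest(papers, threshold=7):
    """Keep papers scoring at or above `threshold` and render them as an HTML list.

    `papers` is a list of dicts with hypothetical keys "title", "url" and "score"
    (the 1-10 relevance score). The most relevant papers come first.
    """
    relevant = sorted(
        (p for p in papers if p["score"] >= threshold),
        key=lambda p: p["score"],
        reverse=True,
    )
    items = "\n".join(
        f'<li><a href="{escape(p["url"])}">{escape(p["title"])}</a> '
        f'(relevance {p["score"]}/10)</li>'
        for p in relevant
    )
    return f"<ul>\n{items}\n</ul>"


# Placeholder data; in the real pipeline the scores come from the model.
sample_papers = [
    {"title": "Paper B", "url": "https://example.org/b", "score": 7},
    {"title": "Paper C", "url": "https://example.org/c", "score": 3},
    {"title": "Paper A", "url": "https://example.org/a", "score": 9},
]

digest_html = build_digest(sample_papers)
print(digest_html)
```

Raising or lowering `threshold` trades a shorter digest against the risk of missing a borderline paper.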
The recommended way to get started using this repository is to:
- Fork the repository
- Modify `config.yaml` and merge the changes into your main branch. If you want a different schedule than Sunday through Thursday at 1:25PM UTC, then also modify the file `.github/workflows/daily_pipeline.yaml`.
- Create or fetch your API key for OpenAI. Note: you will need an OpenAI account.
- Create or fetch your API key for SendGrid (optional, if you want the action to email you).
- Set the following secrets:
  - `OPENAI_API_KEY`
  - `SENDGRID_API_KEY` (only if using SendGrid)
  - `FROM_EMAIL` (only if using SendGrid and if you don't have it set in `config.yaml`)
  - `TO_EMAIL` (only if using SendGrid and if you don't have it set in `config.yaml`)
- Manually trigger the action or wait until the scheduled action takes place.
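If you do change the schedule, it helps to be sure what the default one means. GitHub Actions schedules use cron syntax evaluated in UTC, so "Sunday through Thursday at 1:25PM UTC" would correspond to an expression like the one below. The exact string in the workflow file is not shown in this README, so treat `DEFAULT_CRON` as an assumption derived from the stated schedule; `runs_at` is a hypothetical helper that only exists for this illustration.

```python
from datetime import datetime, timezone

# Assumed cron form of the default schedule described above:
# minute 25, hour 13, any day of month, any month, days 0-4 (Sunday=0 .. Thursday=4).
DEFAULT_CRON = "25 13 * * 0-4"


def runs_at(moment: datetime) -> bool:
    """Return True if a timezone-aware `moment` matches the default schedule."""
    moment = moment.astimezone(timezone.utc)
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_day = (moment.weekday() + 1) % 7
    return moment.hour == 13 and moment.minute == 25 and cron_day <= 4


sunday = datetime(2024, 1, 7, 13, 25, tzinfo=timezone.utc)
friday = datetime(2024, 1, 5, 13, 25, tzinfo=timezone.utc)
print(runs_at(sunday), runs_at(friday))
```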
This repository uses SendGrid to send emails. If you do not wish to use SendGrid, then simply do not create and add a SendGrid API key. Instead, the digest will be uploaded as part of the GitHub action, and you can access it as part of the GitHub action artifact.
If you do not wish to fork this repository, and would prefer to clone and run it locally instead:
- Install the requirements in `src/requirements.txt`.
- Modify the configuration file `config.yaml`.
- Create or fetch your API key for OpenAI. Note: you will need an OpenAI account.
- Create or fetch your API key for SendGrid (optional, if you want the script to email you).
- Set the following secrets:
  - `OPENAI_API_KEY`
  - `SENDGRID_API_KEY` (only if using SendGrid)
  - `FROM_EMAIL` (only if using SendGrid and if you don't have it set in `config.yaml`)
  - `TO_EMAIL` (only if using SendGrid and if you don't have it set in `config.yaml`)
- Run `python action.py`.
- If you are not using SendGrid, the HTML of the digest will be written to `digest.html`. You can then use your favorite web browser to view it.
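Before running the script locally, a quick check of your settings can save a wasted run. The sketch below assumes the secrets listed above are supplied as environment variables; `check_settings` is a hypothetical helper, not part of this repository.

```python
import os


def check_settings(env):
    """Return (errors, notes) describing which settings are missing from `env`.

    `env` is any mapping of names to values, for example `os.environ`.
    """
    errors, notes = [], []
    if not env.get("OPENAI_API_KEY"):
        errors.append("OPENAI_API_KEY is required")
    if env.get("SENDGRID_API_KEY"):
        # The email addresses may instead be set in config.yaml.
        for name in ("FROM_EMAIL", "TO_EMAIL"):
            if not env.get(name):
                notes.append(f"{name} is not set; it must then come from config.yaml")
    else:
        notes.append("SENDGRID_API_KEY is not set; the digest will be written to digest.html")
    return errors, notes


errors, notes = check_settings(os.environ)
for message in errors + notes:
    print(message)
```

An empty `errors` list means the required key is present; the notes are reminders, not failures.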
You may want to use something like crontab to schedule the digest.
- Install the requirements in `src/requirements.txt` as well as `gradio`.
- Set the environment variables `OPENAI_API_KEY`, `FROM_EMAIL` and `SENDGRID_API_KEY`.
- Run `python src/app.py` and go to the local URL. From there you will be able to preview the papers from today, as well as the generated digests.
You may (and are encouraged to) modify the code in this repository to suit your personal needs. If you think your modifications would be in any way useful to others, please submit a pull request.
These types of modifications include things like changes to the prompt, different language models, or additional ways for the digest to be delivered to you.