<a href="https://colab.research.google.com/github/priyahari868/my_skills_and_linkedin/blob/main/genai-arxiv-summary.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# arxiv-genai-summary

🚀 This project fetches the latest AI/ML research papers from arXiv and summarizes their abstracts using Hugging Face Transformers.

## 📌 Features
- Searches arXiv for the latest papers on AI or Machine Learning
- Uses the `facebook/bart-large-cnn` summarization model
- Outputs the original abstract and a concise summary
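
The search step can be made more explicit with a small query builder. This is a minimal sketch rather than the notebook's own code: `build_arxiv_query` is a hypothetical helper, and it assumes the arXiv API's `all:` field prefix, double-quoted phrases, and the `OR` operator so that multi-word topics are matched as whole phrases.

```python
def build_arxiv_query(phrases, field="all"):
    """Join search phrases into an arXiv API query string (hypothetical helper)."""
    # Collapse stray whitespace and drop empty entries.
    cleaned = [" ".join(p.split()) for p in phrases if p.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty phrase is required")
    # Quote each phrase so multi-word topics are matched as a unit.
    return " OR ".join(f'{field}:"{p}"' for p in cleaned)


query = build_arxiv_query(["artificial intelligence", "machine learning"])
print(query)  # all:"artificial intelligence" OR all:"machine learning"
```

The resulting string could be passed as `query=` to `arxiv.Search` in place of a hand-written query.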

## 📁 Files
- `arxiv_summarizer.ipynb`: Main Jupyter/Colab notebook
- `requirements.txt`: Python packages needed
- `README.md`: Project overview and instructions

## 🧠 Tech Stack
- Python, Pandas
- `arxiv` Python package (wrapper for the arXiv API)
- Hugging Face Transformers

## 📎 How to Run
1. Clone this repo or open the notebook in Google Colab
2. Install dependencies (if not in Colab):  
   `pip install -r requirements.txt`
3. Run the notebook to fetch papers and generate summaries

## 📫 Contact
Created by **Vempadapu Hari Priya**  
Email: priyahari868@gmail.com  
Available for hire | Open to collaboration


## 🧪 Requirements

To run this notebook, you need the following packages:

- pandas  
- arxiv  
- transformers  




In [1]:
# Step 1: Install required libraries (if running in Colab)
!pip install arxiv pandas transformers

# Step 2: Import libraries
import arxiv
import pandas as pd
from transformers import pipeline

# Step 3: Define query and fetch papers
query = 'Artificial intelligence OR Machine learning'
search = arxiv.Search(
    query=query,
    max_results=10,
    sort_by=arxiv.SortCriterion.SubmittedDate
)

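# Fetch through an explicit Client; Search.results() is deprecated in arxiv 2.x
client = arxiv.Client()
papers = []
for result in client.results(search):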
    papers.append({
        'published': result.published,
        'title': result.title,
        'abstract': result.summary,
        'categories': result.categories
    })

# Step 4: Create DataFrame
df = pd.DataFrame(papers)
pd.set_option('display.max_colwidth', None)
df.head(3)  # Display top 3 results




Unnamed: 0,published,title,abstract,categories
0,2025-07-14 17:59:59+00:00,Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder,"Camera traps are revolutionising wildlife monitoring by capturing vast\namounts of visual data; however, the manual identification of individual\nanimals remains a significant bottleneck. This study introduces a fully\nself-supervised approach to learning robust chimpanzee face embeddings from\nunlabeled camera-trap footage. Leveraging the DINOv2 framework, we train Vision\nTransformers on automatically mined face crops, eliminating the need for\nidentity labels. Our method demonstrates strong open-set re-identification\nperformance, surpassing supervised baselines on challenging benchmarks such as\nBossou, despite utilising no labelled data during training. This work\nunderscores the potential of self-supervised learning in biodiversity\nmonitoring and paves the way for scalable, non-invasive population studies.","[cs.CV, cs.AI, cs.LG]"
1,2025-07-14 17:59:46+00:00,EmbRACE-3K: Embodied Reasoning and Action in Complex Environments,"Recent advanced vision-language models(VLMs) have demonstrated strong\nperformance on passive, offline image and video understanding tasks. However,\ntheir effectiveness in embodied settings, which require online interaction and\nactive scene understanding remains limited. In such scenarios, an agent\nperceives the environment from a first-person perspective, with each action\ndynamically shaping subsequent observations. Even state-of-the-art models such\nas GPT-4o, Claude 3.5 Sonnet, and Gemini 2.5 Pro struggle in open-environment\ninteractions, exhibiting clear limitations in spatial reasoning and\nlong-horizon planning. To address this gap, we introduce EmRACE-3K, a dataset\nof over 3,000 language-guided tasks situated in diverse, photorealistic\nenvironments constructed using Unreal Engine and the UnrealCV-Zoo framework.\nThe tasks encompass a wide range of embodied challenges, including navigation,\nobject manipulation, and multi-stage goal execution. Each task unfolds as a\nmulti-step trajectory, pairing first-person visual observations with high-level\ninstructions, grounded actions, and natural language rationales that express\nthe agent's intent at every step. Using EmRACE-3K, we establish a benchmark to\nevaluate the embodied reasoning capabilities of VLMs across three key\ndimensions: Exploration, Dynamic Spatial-Semantic Reasoning, and Multi-stage\nGoal Execution. In zero-shot settings, all models achieve success rates below\n20%, underscoring the challenge posed by our benchmark and the current\nlimitations of VLMs in interactive environments. To demonstrate the utility of\nEmRACE-3K, we further fine-tune Qwen2.5-VL-7B using supervised learning\nfollowed by reinforcement learning. This approach yields substantial\nimprovements across all three challenge categories, highlighting the dataset's\neffectiveness in enabling the development of embodied reasoning capabilities.","[cs.CV, cs.AI, cs.CL]"
2,2025-07-14 17:59:41+00:00,Quantize-then-Rectify: Efficient VQ-VAE Training,"Visual tokenizers are pivotal in multimodal large models, acting as bridges\nbetween continuous inputs and discrete tokens. Nevertheless, training\nhigh-compression-rate VQ-VAEs remains computationally demanding, often\nnecessitating thousands of GPU hours. This work demonstrates that a pre-trained\nVAE can be efficiently transformed into a VQ-VAE by controlling quantization\nnoise within the VAE's tolerance threshold. We present\n\textbf{Quantize-then-Rectify (ReVQ)}, a framework leveraging pre-trained VAEs\nto enable rapid VQ-VAE training with minimal computational overhead. By\nintegrating \textbf{channel multi-group quantization} to enlarge codebook\ncapacity and a \textbf{post rectifier} to mitigate quantization errors, ReVQ\ncompresses ImageNet images into at most 512 tokens while sustaining competitive\nreconstruction quality (rFID = 1.06). Significantly, ReVQ reduces training\ncosts by over two orders of magnitude relative to state-of-the-art approaches:\nReVQ finishes full training on a single NVIDIA 4090 in approximately 22 hours,\nwhereas comparable methods require 4.5 days on 32 A100 GPUs. Experimental\nresults show that ReVQ achieves superior efficiency-reconstruction trade-offs.","[cs.CV, cs.LG]"
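
The abstracts returned by arXiv are hard-wrapped, which is why they contain `\n` in the table above. Collapsing whitespace before printing or summarizing keeps the text tidy. This is an optional sketch; `normalize_abstract` is a hypothetical helper that the notebook does not define.

```python
def normalize_abstract(text):
    """Collapse hard line breaks and repeated spaces into single spaces."""
    # str.split() with no arguments splits on any run of whitespace,
    # including newlines, so joining with one space removes the hard wraps.
    return " ".join(text.split())


wrapped = (
    "Camera traps are revolutionising wildlife monitoring by capturing vast\n"
    "amounts of visual data"
)
print(normalize_abstract(wrapped))
```

If the notebook's DataFrame is in use, the same function could be applied with `df['abstract'].map(normalize_abstract)`.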


In [2]:
# Choose the first abstract for summarization
abstract = df['abstract'][0]

# Load summarizer model from Hugging Face
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Summarize the abstract
summary = summarizer(abstract, max_length=100, min_length=30, do_sample=False)

# Display the summary
print("Original Abstract:\n", abstract)
print("\n🔍 Summarized:\n", summary[0]['summary_text'])



Device set to use cpu


Original Abstract:
 Camera traps are revolutionising wildlife monitoring by capturing vast
amounts of visual data; however, the manual identification of individual
animals remains a significant bottleneck. This study introduces a fully
self-supervised approach to learning robust chimpanzee face embeddings from
unlabeled camera-trap footage. Leveraging the DINOv2 framework, we train Vision
Transformers on automatically mined face crops, eliminating the need for
identity labels. Our method demonstrates strong open-set re-identification
performance, surpassing supervised baselines on challenging benchmarks such as
Bossou, despite utilising no labelled data during training. This work
underscores the potential of self-supervised learning in biodiversity
monitoring and paves the way for scalable, non-invasive population studies.

🔍 Summarized:
 Self-supervised approach to learning robust chimpanzee face embeddings from unlabeled camera-trap footage. Our method demonstrates strong open-set re
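
The cell above summarizes only the first abstract. One way to extend it to every fetched paper is sketched below. It assumes the `papers` list of dicts built in the first cell; `summarize_papers` and `make_bart_summarizer` are hypothetical helper names, and the summarizer is passed in as a plain callable so the loop can be tried without downloading the model.

```python
def summarize_papers(papers, summarize_fn):
    """Return copies of the paper dicts with an added 'summary' key.

    `summarize_fn` is any callable that maps an abstract string to a
    summary string; injecting it keeps this loop independent of the model.
    """
    return [{**paper, "summary": summarize_fn(paper["abstract"])} for paper in papers]


def make_bart_summarizer(max_length=100, min_length=30):
    """Wrap the Hugging Face pipeline used in the notebook as a plain callable."""
    from transformers import pipeline  # imported lazily: loading the model is slow

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarize(text):
        result = summarizer(
            text, max_length=max_length, min_length=min_length, do_sample=False
        )
        return result[0]["summary_text"]

    return summarize


# Quick check with a stand-in summarizer (keeps the first sentence only),
# so no model download is needed to exercise the loop.
demo = summarize_papers(
    [{"title": "Example", "abstract": "First sentence. Second sentence."}],
    lambda text: text.split(". ")[0] + ".",
)
print(demo[0]["summary"])  # First sentence.
```

With the notebook's data, `summarize_papers(papers, make_bart_summarizer())` would produce a summary for each fetched abstract; on a CPU runtime this is likely to take noticeably longer than the single-abstract example.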