
Local Birder LLM/SLM

Overview

This project performs image classification and NLP on data about birds local to Arizona using locally run LLMs. Species information and images scraped from eBird will be used by models running entirely on the host machine to classify both images and text queries. Because the scope and the amount of data are narrow, small language models (SLMs) will also be explored. The narrow use case and the goal of running everything locally also make this a good test case for Retrieval-Augmented Generation (RAG).

Using the Selenium web driver, ~1,000 images for each of roughly 111 bird species native to Arizona will be scraped from the Cornell Lab's Macaulay Library, yielding a training dataset of over 100,000 images. The scraped images have varying resolutions and will be padded to a standard 480x480 square. A vision transformer (ViT) will be used to classify the 111 classes. While convolutional neural networks (CNNs) are the usual choice for image classification, this serves as an exploration of the broad use cases of transformer models. Vision transformers are known to be both data-hungry and computationally expensive, so fine-tuning will be needed to raise accuracy, and this approach may still not yield the most accurate classification; it is primarily a proof of concept for a small, locally run case. The modeling approach is subject to change depending on the accuracy of the model chosen. While the ViT will be explored first, other models such as CLIP (Contrastive Language-Image Pre-training), ResNet (Residual Networks), and EfficientNet will also be tried.
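As a rough illustration of the preprocessing and classifier setup described above, the sketch below letterbox-pads an image to a 480x480 square and builds a pretrained ViT head for 111 classes. The timm model name, the `img_size` override, and the sample file path are illustrative assumptions, not settings confirmed in this repo.

```python
# Minimal sketch, assuming Pillow, timm, and torchvision are available;
# the model name and image path are illustrative, not project settings.
from PIL import Image, ImageOps
import torch
import timm
from torchvision import transforms

NUM_CLASSES = 111   # one class per scraped Arizona species
TARGET_SIZE = 480   # images are standardized to a 480x480 square

def pad_to_square(img: Image.Image, size: int = TARGET_SIZE) -> Image.Image:
    """Resize to fit within size x size, then pad to a square without distortion."""
    return ImageOps.pad(img, (size, size), color=(0, 0, 0))

# Pretrained ViT; img_size=480 resamples the positional embeddings so the
# backbone accepts 480x480 inputs instead of the default 224x224.
model = timm.create_model(
    "vit_base_patch16_224", pretrained=True,
    num_classes=NUM_CLASSES, img_size=TARGET_SIZE,
)
model.eval()

# Hypothetical example image path.
img = pad_to_square(Image.open("gila_woodpecker.jpg").convert("RGB"))
with torch.no_grad():
    logits = model(transforms.ToTensor()(img).unsqueeze(0))  # shape: (1, 111)
```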

Along with images, taxonomic information and species descriptions will be scraped for all 111 classes. This information will serve as the basis for a small dataset (knowledge base) that can be used by a RAG architecture. Additional knowledge libraries will be added using scraped species descriptions from Wikipedia and/or the detailed description pages on eBird. The operation should work as follows: a prompt is entered, e.g. "What bird is light brown with stripes on its back and a red patch on its head?", and the RAG LLM, drawing on the stored data, should return "Gila Woodpecker".
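A minimal sketch of how that query flow might look, assuming the descriptions have already been scraped into strings and an Ollama server is running locally; the embedding model, the Ollama model name, and the toy descriptions are illustrative assumptions, not the project's actual knowledge base.

```python
# Minimal RAG sketch: embed descriptions, retrieve the closest one, then ask a
# local Ollama model to answer from that context. All names below are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer
import ollama

# Toy knowledge base: one description string per species (scraped text in practice).
docs = {
    "Gila Woodpecker": "Light brown body, black-and-white barred back, red cap on the male's head.",
    "Cactus Wren": "Large wren with a bold white eyebrow, spotted breast, and barred wings and tail.",
}

embedder = SentenceTransformer("all-MiniLM-L6-v2")
names = list(docs)
doc_vecs = embedder.encode([docs[n] for n in names], normalize_embeddings=True)

def answer(question: str, model: str = "llama3.2") -> str:
    # Retrieve the most similar description by cosine similarity.
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    best = names[int(np.argmax(doc_vecs @ q_vec))]
    context = f"{best}: {docs[best]}"
    # Ask the locally running model to answer using only the retrieved context.
    reply = ollama.chat(
        model=model,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return reply["message"]["content"]

print(answer("What bird is light brown with stripes on its back and a red patch on its head?"))
```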

The overall goal is an exploration of locally obtained data run through local LLMs/SLMs for a specific use case, without the need for external APIs. Datasets such as Caltech-UCSD Birds-200-2011 (CUB-200-2011) exist, but they have fewer images per class and aren't specific to Arizona, though there is some overlap. High accuracy has been achieved on the smaller Caltech dataset (11,788 images) using few-shot learning; since this project doesn't have millions of training images either, that approach may be explored later with a truncated dataset.

Roadmap

  • Use Selenium to scrape species images and species descriptions (see the scraping sketch after this list)
  • Build vision transformer in PyTorch
  • Build other models (CLIP, ResNet, & EfficientNet) in PyTorch
  • Train on image dataset and compare the performance of each
  • Select the best model and fine-tune it
  • Build RAG Architecture
    • Create a small vector database from species descriptions
    • Add additional knowledge libraries
      • Use Selenium to scrape Wikipedia and eBird (again)
    • Deploy local LLM (Ollama most likely)
    • Train model on database
    • Retrain on SLM
  • Test with queries and fine-tune model
  • LangChain to make it all work together?
  • TBD
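
A rough sketch of the Selenium scraping step referenced above, assuming a Chrome driver on PATH; the search URL, the fixed wait, and the plain `img` CSS selector are placeholders, since the real Macaulay Library markup and pagination will differ.

```python
# Rough sketch, assuming chromedriver is on PATH; the URL and CSS selector
# passed in are placeholders, not the real Macaulay Library markup.
import os
import time
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

def scrape_image_urls(search_url: str, max_images: int = 1000) -> list[str]:
    """Collect image URLs from a JS-rendered search-results page."""
    driver = webdriver.Chrome()
    try:
        driver.get(search_url)
        time.sleep(5)  # crude wait for results to render; explicit waits are better
        imgs = driver.find_elements(By.CSS_SELECTOR, "img")
        urls = [el.get_attribute("src") for el in imgs if el.get_attribute("src")]
        return urls[:max_images]
    finally:
        driver.quit()

def download_images(urls: list[str], out_dir: str) -> None:
    """Save each image URL to out_dir as a numbered JPEG."""
    os.makedirs(out_dir, exist_ok=True)
    for i, url in enumerate(urls):
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        with open(os.path.join(out_dir, f"{i:05d}.jpg"), "wb") as f:
            f.write(resp.content)
```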
