
Artificial Intelligence (AI) - LLAMA

Our LLaMa was created using a pre-processed model called Llama-2-7b-Chat-GGUF, which converts Llama 2 into the GGUF standard (GPT-Generated Unified Format).

We used the 7-billion-parameter variant: the enhanced 7B model, optimized for dialogue use cases and converted to the Hugging Face Transformers format.

Our model has four main parts:

EMBEDDINGS: We used hkunlp/instructor-large.

Embeddings are representations of values or objects like text, images, and audio that are designed to be consumed by machine learning models and semantic search algorithms. They translate objects like these into a mathematical form according to the factors or traits each one may or may not have, and the categories they belong to.

Essentially, embeddings enable machine learning models to find similar objects. Given a photo or a document, a machine learning model that uses embeddings could find a similar photo or document. Since embeddings make it possible for computers to understand the relationships between words and other objects, they are foundational for artificial intelligence (AI).
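As an illustration, the sketch below loads hkunlp/instructor-large through LangChain's HuggingFaceInstructEmbeddings wrapper (the same wrapper used by localGPT, which this project is based on); the device setting and example sentence are assumptions, not values taken from this repository.

```python
# Minimal sketch: embedding text with hkunlp/instructor-large via LangChain.
# Requires: pip install langchain sentence-transformers InstructorEmbedding
from langchain.embeddings import HuggingFaceInstructEmbeddings

embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "cpu"},  # assumption: CPU; use "cuda" if a GPU is available
)

# Turn a piece of text into a dense vector that can be stored and searched.
vector = embeddings.embed_query("How does deforestation affect river basins?")
print(len(vector))  # dimensionality of the embedding vector
```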

DB - The database object responsible for storing the training data in memory so that it can be consumed later by the model itself.
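A minimal sketch of how such a database could be built and persisted, assuming the Chroma vector store that the upstream localGPT project uses; the example document and the "DB" directory name are hypothetical.

```python
# Minimal sketch: persisting embedded documents in a Chroma vector store.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.schema import Document
from langchain.vectorstores import Chroma

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")

# Hypothetical documents; in practice these come from the ingested datasets.
docs = [Document(page_content="The Amazon basin drains roughly 40% of South America.")]

# Build the store and write it to disk so it can be reloaded later.
db = Chroma.from_documents(docs, embeddings, persist_directory="DB")  # "DB" is an assumed path
db.persist()
```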

RETRIEVER - Implements Retrieval-Augmented Generation (RAG), an AI framework for retrieving facts from an external knowledge base in order to ground large language models (LLMs) on the most accurate, up-to-date information and to give users insight into the LLMs' generative process.
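The sketch below shows how a retriever can be opened over the persisted store and queried for the most relevant chunks; the value of k and the example question are assumptions.

```python
# Minimal sketch: pulling the most relevant stored chunks for a question.
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)  # reload the persisted store

# The retriever embeds the question and returns the k closest document chunks.
retriever = db.as_retriever(search_kwargs={"k": 4})  # k=4 is an assumed value
docs = retriever.get_relevant_documents("Which regions show the fastest decline in water quality?")
for doc in docs:
    print(doc.page_content[:120])
```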

LLM - Large Language Models are a core component of LangChain. LangChain does not serve its own LLMs, but rather provides a standard interface for interacting with many different LLMs.
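A minimal sketch of loading the GGUF chat model through LangChain's LlamaCpp wrapper and combining it with the retriever above into a question-answering chain; the model file name, context size, and sampling settings are assumptions, not values taken from this repository.

```python
# Minimal sketch: serving Llama-2-7b-Chat-GGUF through LangChain and answering
# questions grounded in the retrieved documents.
from langchain.chains import RetrievalQA
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import LlamaCpp
from langchain.vectorstores import Chroma

llm = LlamaCpp(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # assumed path and quantization
    n_ctx=4096,        # Llama 2 context window
    max_tokens=512,
    temperature=0.2,
)

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="DB", embedding_function=embeddings)

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",            # concatenate retrieved chunks into the prompt
    retriever=db.as_retriever(),
    return_source_documents=True,  # expose the facts behind each answer
)
answer = qa("How has Amazon deforestation changed over the last decade?")
print(answer["result"])
```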

Datasets

Our AI was trained with data provided by NASA. The training process consisted of a few steps: Data Screening, Data Capture, Data Processing, and finally processing of the data by our LLaMa version 2.
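As an illustration of the Data Processing step, the sketch below loads a document, splits it into chunks, and embeds it into the store; the file name, loader, and chunk sizes are assumptions (localGPT-style ingestion), not the project's actual pipeline.

```python
# Minimal sketch: ingesting a dataset file into the vector store.
from langchain.document_loaders import TextLoader
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma

# Hypothetical source file; in practice this would be one of the datasets listed below.
documents = TextLoader("data/earth_observatory.txt").load()

# Split long documents into overlapping chunks so each one fits the model's context.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)  # assumed sizes
chunks = splitter.split_documents(documents)

embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
Chroma.from_documents(chunks, embeddings, persist_directory="DB").persist()
```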

The data was taken from the following sources:

Internal Data

LLaMa 2: dataset from Facebook's own model Llama-2-7b-Chat-GGUF, a model that uses the GPT-Generated Unified Format.

External data used for fine-tuning

Earth Observatory
Environmental Performance Index (EPI)
Wikipedia
Climatekids - NASA
Climate - NASA
Center for Science Education
Earth Data - NASA
HydroSheds
Food and Agriculture Organization of the United Nations - FAO

Acknowledgements

This project is based on the localGPT repository.
