RAGCAR: Retrieval-Augmented Generative Companion for Advanced Research

RAGCAR🚛 is built on the architecture of PORORO, Kakao Brain's natural language processing library. It adds API support for the large language models (LLMs) OpenAI GPT and NAVER HyperCLOVA X, and makes the tools needed for RAG (Retrieval-Augmented Generation) easy to use.

Installation

RAGCAR works in python>=3.8 environments.
You can install the package with the following command.

pip install ragcar

Alternatively, you can install it from a local clone as follows.

git clone https://github.com/leewaay/ragcar.git
cd ragcar
pip install -e .

Usage

You can use Ragcar with the following commands.

First, import Ragcar:

>>> from ragcar import Ragcar

After importing, you can check which tasks Ragcar currently supports with the following command.

>>> from ragcar import Ragcar
>>> Ragcar.available_tools()
"Available tools are ['tokenization', 'sentence_embedding', 'sentence_similarity', 'semantic_search', 'text_generation', 'text_segmentation']"

To check which models are supported for each task, proceed as follows.

>>> Ragcar.available_models("text_generation")
'Available models for text_generation are ([src]: openai, [model]: gpt-4-turbo-preview, gpt-4, gpt-3.5-turbo, MODELS_SUPPORTED(https://platform.openai.com/docs/models)), ([src]: clova, [model]: YOUR_MODEL(https://www.ncloud.com/product/aiService/clovaStudio))'

To perform a specific task, pass one of the tool names listed above to the tool argument and the model type to the src argument.

>>> from ragcar.utils import PromptTemplate
>>> prompt_template = PromptTemplate("사용자: {input} 수도는?\nAI:")

>>> generator = Ragcar(tool="text_generation", src="openai", prompt_template=prompt_template, formatting=True)

Once the object is created, you can use it by passing input values as shown below. For detailed usage, see the example for each task in examples.

>>> generator(input="대한민국")
{
    'id': 'openai-dad4969f-6f0d-4413-a748-26d05cc0e73d', 
    'model': 'gpt-4-turbo-preview', 
    'content': '대한민국의 수도는 서울입니다.', 
    'finish_reason': 'stop', 
    'input_tokens': 23, 
    'output_tokens': 15, 
    'total_tokens': 38, 
    'predicted_cost': 0.0015899999999999998, 
    'response_time': 1.0608701705932617
}
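The `{input}` placeholder in the template is filled from the keyword argument passed to the generator. How `PromptTemplate` does this internally is not shown here; as a rough illustration only, the sketch below reproduces the substitution with Python's built-in `str.format`. The `fill_prompt` helper is hypothetical and not part of ragcar.

```python
# Hypothetical stand-in for illustration; this is not ragcar's PromptTemplate.
TEMPLATE = "사용자: {input} 수도는?\nAI:"


def fill_prompt(template: str, **kwargs: str) -> str:
    """Substitute named placeholders such as {input} with keyword arguments."""
    return template.format(**kwargs)


prompt = fill_prompt(TEMPLATE, input="대한민국")
print(prompt)
```

Any placeholder name used in the template would need a matching keyword argument; a missing one raises `KeyError` with `str.format`.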

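⚠️ How to set environment variables

Some src values require environment variables (e.g. an API key) that must be kept secure and maintained. You can set them in one of the following three ways:

.env file: Create a .env file in the top-level root of the project and enter the required environment variable values.

export: Declare the required environment variables directly in the terminal.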

export OPENAI_API_KEY='sk-...'

model argument: Pass the required environment variables directly as the value of the model argument. (The same applies when you need a model beyond those provided by default.)

>>> Ragcar.available_customizable_src("text_generation")
"Available customizable src for text_generation are ['clova', 'openai']"

>>> Ragcar.available_model_fields("clova")
'Available fields for clova are ([field]: model_n, [type]: str), ([field]: api_key, [type]: str), ([field]: app_key, [type]: str)'

>>> generator = Ragcar(
    tool="text_generation", 
    src="clova", 
    model={
        "model_n": "YOUR_API_URL", 
        "api_key": "YOUR_APIGW-API-KEY",
        "app_key": "YOUR_CLOVASTUDIO-API-KEY"
    }, 
    prompt_template=prompt_template, 
    formatting=True
)
>>> generator(input="대한민국")
{
    'id': 'clova-3c241fa1-f01e-4738-b208-5bcb35daad42',
    'model': 'HCX-003',
    'content': '대한민국 수도는 서울입니다.',
    'finish_reason': 'stop_before',
    'input_tokens': 12,
    'output_tokens': 8,
    'total_tokens': 20,
    'predicted_cost': 0.6,
    'response_time': 0.7090704441070557,
    'ai_filter': []
}

For more detailed usage, see examples!
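To make the .env and export options more concrete, here is a minimal standard-library sketch of what loading a .env file into the process environment can look like. It is an assumption for illustration: `load_env_file` and the `DEMO_RAGCAR_KEY` variable are hypothetical, and how ragcar itself reads the file is not described here.

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines and copy them into os.environ (hypothetical helper)."""
    loaded = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")
        # Keep any value already set in the shell (e.g. via export).
        os.environ.setdefault(key, value)
        loaded[key] = os.environ[key]
    return loaded


# Demo with a throwaway file and a placeholder key.
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# demo\nDEMO_RAGCAR_KEY='sk-placeholder'\n", encoding="utf-8")
    loaded = load_env_file(env_path)

print(loaded)
```

In this sketch a variable that was already exported in the terminal takes precedence over the file, which is one common convention; a real project might choose the opposite.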

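⚠️ Notes on using the text_generation `Tool`

1. Notes on `predicted_cost`

When you use the text_generation tool, predicted_cost is calculated differently depending on the API used. For OpenAI, predicted_cost is calculated in US dollars (USD); for CLOVA, it is calculated in Korean won (KRW). This is because the two services have different billing schemes. The specific pricing currently applied for each model can be found in base.py.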
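As a rough sketch of how a token-based cost estimate of this kind is typically computed, the function below multiplies token counts by per-1K-token rates. The function name, its signature, and the rates are assumptions for illustration; the rates ragcar actually applies are the ones in base.py.

```python
def predicted_cost(input_tokens: int, output_tokens: int,
                   input_rate_per_1k: float, output_rate_per_1k: float) -> float:
    """Token counts times per-1K-token rates; the currency is whatever the rates use."""
    return (input_tokens * input_rate_per_1k + output_tokens * output_rate_per_1k) / 1000


# Assumed USD rates per 1K tokens, chosen for illustration only.
cost_usd = predicted_cost(23, 15, input_rate_per_1k=0.03, output_rate_per_1k=0.06)
print(f"{cost_usd:.5f} USD")
```

With these assumed rates, the token counts from the OpenAI sample output in Usage (23 input, 15 output) come to about 0.00159. The same function would return KRW if given won-denominated rates, which is why the two services' values are not directly comparable.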

2. Notes on using NAVER HyperCLOVA

When using the text_generation tool with the clova src, note the following changes relative to the official parameters:

Renamed parameters:
- Use presence_penalty instead of top_k.
- Use frequency_penalty instead of repeat_penalty.
Parameter value ranges:
- 0.0 < temperature < 1.0
- 0.0 < top_p < 1.0
- 0 < presence_penalty < 128
- 0.0 < frequency_penalty < 10.0
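The renames and ranges above can be checked before a request is sent. The following is a minimal sketch of such a check; `normalize_clova_params` is a hypothetical helper, not a ragcar function, and it treats the ranges as open intervals exactly as listed.

```python
# Hypothetical helper; names and behavior are illustrative, not ragcar's API.
RENAMES = {"top_k": "presence_penalty", "repeat_penalty": "frequency_penalty"}

# Open intervals (low, high), as listed in the notes above.
RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (0, 128),
    "frequency_penalty": (0.0, 10.0),
}


def normalize_clova_params(params: dict) -> dict:
    """Map official CLOVA names to the renamed ones and check the listed ranges."""
    normalized = {}
    for name, value in params.items():
        name = RENAMES.get(name, name)
        if name in RANGES:
            low, high = RANGES[name]
            if not (low < value < high):
                raise ValueError(f"{name}={value} is outside ({low}, {high})")
        normalized[name] = value
    return normalized


params = normalize_clova_params({"temperature": 0.5, "top_k": 10, "repeat_penalty": 5.0})
print(params)
```

Parameters not covered by the list pass through unchanged, so the helper only rejects values that the notes explicitly rule out.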

⚠️ How to upload a model to Google Drive

See the sentence_embedding example.

Documentation

If you have any questions or feedback, please open an issue.

Contributors

이원석

Acknowledgements

pororo

@misc{pororo,
  author       = {Heo, Hoon and Ko, Hyunwoong and Kim, Soohwan and
                  Han, Gunsoo and Park, Jiwoo and Park, Kyubyong},
  title        = {PORORO: Platform Of neuRal mOdels for natuRal language prOcessing},
  howpublished = {\url{https://github.com/kakaobrain/pororo}},
  year         = {2021},
}

sentence-transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
