### 8.1 HuggingFaceEndpoints

In [1]:
from dotenv import load_dotenv
import os

load_dotenv()

HUGGINGFACEHUB_API_KEY = os.getenv("HUGGINGFACEHUB_API_KEY")

os.environ["KMP_DUPLICATE_LIB_OK"] = "True"  # work around conflicts between libraries (duplicate OpenMP runtimes)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_KEY
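
Note on the cell above: if `HUGGINGFACEHUB_API_KEY` is missing from `.env`, `os.getenv` returns `None`, and assigning `None` into `os.environ` raises a `TypeError` that does not name the missing variable. A minimal sketch of a guard, assuming a hypothetical helper called `require_env` (it is not part of the notebook or of any library):

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        # Covers both "variable not defined" and "defined but empty".
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in the shell."
        )
    return value


# Usage (hypothetical; uncomment once the variable is defined):
# os.environ["HUGGINGFACEHUB_API_TOKEN"] = require_env("HUGGINGFACEHUB_API_KEY")
```

Failing early here keeps the error close to its cause instead of surfacing later as an authentication failure.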

In [None]:
from langchain.llms.huggingface_endpoint import HuggingFaceEndpoint
from langchain.prompts import PromptTemplate

# Select the Hugging Face model
llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",  # ID of the selected model
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACEHUB_API_KEY,
)

prompt = PromptTemplate.from_template("What is the meaning of {word}?")

chain = prompt | llm
result = chain.invoke({"word": "potato"})
print(result)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to C:\Users\USER\.cache\huggingface\token
Login successful


A potato is a starchy, tuberous crop from the Solanum tuberosum species. The plant is native to the Andean region of South America, and was the first vegetable to be cultivated for food in the New World.

Potatoes are grown worldwide and are a major source of food for many people. They come in a variety of shapes, sizes, and colors, and can be cooked in many different ways, such as boiling, baking, frying, and mashing.

Potatoes are a good source of carbohydrates, fiber, and various vitamins and minerals, including vitamin C, potassium, and vitamin B6. They are also low in fat and calories, making them a popular choice for people on weight-loss diets.

In addition to being a food source, potatoes have also been used for a variet

### 8.2 HuggingFacePipeline

In [2]:
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(torch.__version__)
print(device)
print(torch.cuda.device_count())

2.5.1+cu124
cuda
1


In [6]:
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain.prompts import PromptTemplate

# Download the Hugging Face model
llm = HuggingFacePipeline.from_model_id(
    model_id="openai-community/gpt2",  # ID of the model to use
    task="text-generation",  # task to perform
    device=0,  # 0: first GPU, -1: CPU (default)

    # Extra arguments passed to the pipeline
    pipeline_kwargs={
        "max_new_tokens": 50
    }
)

prompt = PromptTemplate.from_template("A {word} is a")

chain = prompt | llm
result = chain.invoke({"word": "banana"})
print(result)

Using pad_token, but it is not set yet.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A banana is a special type of fruit, typically about 20 percent larger than an ordinary banana and about 4 centimeters (5 meters) long. Because of this, bananas are usually cooked in large pots for a day or two or occasionally taken into the hospital after a surgery.
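
The `device=0` argument in the cell above assumes a CUDA GPU is present. A minimal sketch of choosing the value at run time instead, assuming a hypothetical helper named `pick_pipeline_device` (not part of LangChain or PyTorch) and following the convention from the cell's comment that `-1` selects the CPU:

```python
def pick_pipeline_device(cuda_available: bool, gpu_index: int = 0) -> int:
    """Map CUDA availability to the integer `device` argument: a GPU index, or -1 for CPU."""
    return gpu_index if cuda_available else -1


try:
    import torch

    cuda_available = torch.cuda.is_available()
except ImportError:
    # torch is not installed, so fall back to the CPU setting.
    cuda_available = False

device = pick_pipeline_device(cuda_available)
print(device)
```

The resulting `device` value could then be passed to `HuggingFacePipeline.from_model_id(..., device=device)` so the same notebook runs on machines with and without a GPU.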


### 8.3 GPT4All

In [4]:
from gpt4all import GPT4All

model = GPT4All(
    model_name="Meta-Llama-3-8B-Instruct.Q4_0.gguf",
    model_path="./files/gpt4all",
) # downloads / loads a 4.66GB LLM

with model.chat_session():
    print(model.generate("How can I run LLMs efficiently on my laptop?", max_tokens=1024))

Downloading: 100%|██████████| 4.66G/4.66G [01:58<00:00, 39.5MiB/s]
Verifying: 100%|██████████| 4.66G/4.66G [00:58<00:00, 79.7MiB/s]


Large Language Models (LLMs) are powerful AI models that require significant computational resources to train and run. However, with some optimization techniques and hardware considerations, you can still run LLMs efficiently on your laptop:

1. **Choose the right model**: Select a smaller or more efficient language model variant, such as BERT-base instead of BERT-large.
2. **Use a GPU (if available)**: If your laptop has a dedicated graphics card, consider using it to accelerate computations. Many LLMs are designed to run on GPUs, which can significantly speed up processing times.
3. **Optimize model inputs**: Reduce the input sequence length or use attention-based models that process shorter sequences more efficiently.
4. **Use batch processing**: Process multiple examples simultaneously (batch size) instead of running single instances one by one. This can reduce overall computation time and memory usage.
5. **Take advantage of parallelization**: Utilize multi-core processors to run 
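
The first run of the GPT4All cell above downloads a 4.66 GB file into `model_path`. A minimal sketch for checking beforehand whether that file is already on disk, assuming a hypothetical helper named `model_is_cached` (not part of the `gpt4all` package). It only checks that a non-empty file with the expected name exists and does not verify its integrity:

```python
from pathlib import Path


def model_is_cached(model_dir: str, model_name: str) -> bool:
    """Return True if a non-empty file named `model_name` already exists in `model_dir`."""
    candidate = Path(model_dir) / model_name
    return candidate.is_file() and candidate.stat().st_size > 0


# Same directory and file name as in the cell above.
cached = model_is_cached("./files/gpt4all", "Meta-Llama-3-8B-Instruct.Q4_0.gguf")
print("already downloaded" if cached else "will be downloaded on first use")
```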

### 8.4 Ollama

### 8.5 Conclusions