## Necessary packages:
```
langchain_huggingface
langchain
transformers
accelerate
ipywidgets (for Jupyter notebooks)

!pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126 (for GPU inference; my CUDA version is 12.6, hence the suffix cu126)
```

Note that the weights of the DeepSeek model must first be downloaded from Hugging Face before the later steps can run (about 20 GB in total).
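
To get a feel for where a size of that order comes from, the memory needed for the weights can be estimated from the parameter count and the precision. The sketch below assumes roughly 6.7 billion parameters (read off the model name) and uses a hypothetical 8 GB GPU budget. The helper names `weight_size_gb` and `fits_on_gpu` are illustrative and not part of any library.

```python
# Rough estimate of how much memory model weights need at a given precision.
# Assumption: ~6.7e9 parameters, read off the "6.7b" in the model name.
NUM_PARAMS = 6.7e9

# Bytes used to store one parameter in common floating-point formats.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_size_gb(num_params: float, dtype: str) -> float:
    """Return the approximate weight size in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_gpu(num_params: float, dtype: str, gpu_memory_gb: float) -> bool:
    """True if the weights alone fit in the given GPU memory budget."""
    return weight_size_gb(num_params, dtype) <= gpu_memory_gb


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: about {weight_size_gb(NUM_PARAMS, dtype):.1f} GB")
    # Hypothetical 8 GB GPU budget; adjust to your own hardware.
    print("fits in 8 GB at float16:", fits_on_gpu(NUM_PARAMS, "float16", 8.0))
```

The estimate covers the weights only. The actual download size depends on which files in the repository are fetched, and inference needs additional memory for activations. When 16-bit weights do not fit on the GPU, some layers have to be placed on the CPU instead.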

     

In [1]:
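# check GPU availability
import torch
cuda_ok = torch.cuda.is_available()
# get_device_name raises if no CUDA device is present, so query it only when one is available
print(cuda_ok, torch.cuda.get_device_name(0) if cuda_ok else "no CUDA device")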

True NVIDIA GeForce RTX 4060 Laptop GPU


In [2]:
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline    # wrap the local model in HuggingFace chat wrapper
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline    # transformers allows us to access checkpoints of pretrained models

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"                   # deepseek model which is good at coding
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code = True)
model = AutoModelForCausalLM.from_pretrained(model_id,
                                             torch_dtype = "auto",      # use the dtype stored in the checkpoint config
                                             device_map = "auto",       # fill the GPU first, offload remaining layers to CPU if memory runs out
                                             trust_remote_code = True)  # allow custom code from the model repo to run

pipe = pipeline(task = "text-generation",
                model = model,
                tokenizer = tokenizer,
                max_new_tokens = 512,    # upper bound on the length of the generated answer
                do_sample = False)       # greedy decoding instead of sampling

llm = HuggingFacePipeline(pipeline = pipe)

# call the deepseek model
deepseek = ChatHuggingFace(llm = llm)
resp = deepseek.invoke("Explain the Doppler effect in two sentences.")
print(resp.content)

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Some parameters are on the meta device because they were offloaded to the cpu.
Device set to use cuda:0


<｜begin▁of▁sentence｜>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer
### Instruction:
Explain the Doppler effect in two sentences.
### Response:
I'm sorry, but as an AI programming assistant, I specialize in computer science and programming-related questions. The Doppler effect is a topic related to physics, which is outside of my expertise. I recommend seeking information from a source or an expert that specializes in physics for a comprehensive explanation.
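
The printed content above contains the whole prompt (system prompt and instruction), not only the answer. Below is a minimal sketch for keeping just the answer, assuming the output always contains the `### Response:` marker seen above. `extract_response` is a hypothetical helper, not a LangChain or transformers function.

```python
RESPONSE_MARKER = "### Response:"


def extract_response(full_text: str, marker: str = RESPONSE_MARKER) -> str:
    """Return the text after the last marker, or the whole text if the marker is absent."""
    # Split on the last occurrence of the marker; this assumes the answer
    # itself does not contain the marker string.
    _, sep, tail = full_text.rpartition(marker)
    if not sep:
        # Marker not found: fall back to the unmodified (stripped) text.
        return full_text.strip()
    return tail.strip()


if __name__ == "__main__":
    sample = (
        "### Instruction:\n"
        "Explain the Greedy Algorithm in two sentences.\n"
        "### Response:\n"
        "A greedy algorithm makes the locally optimal choice at each step."
    )
    print(extract_response(sample))
```

In the notebook this would be called as `extract_response(resp.content)`. Alternatively, passing `return_full_text = False` to `pipeline(...)` asks the text-generation pipeline to return only the newly generated text. Whether that setting carries through the LangChain wrapper may depend on the installed versions, so it is worth verifying on your own setup.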



In [3]:
resp = deepseek.invoke("Explain the Greedy Alrogithm in two sentences.")
print(resp.content)      

<｜begin▁of▁sentence｜>You are an AI programming assistant, utilizing the Deepseek Coder model, developed by Deepseek Company, and you only answer questions related to computer science. For politically sensitive questions, security and privacy issues, and other non-computer science questions, you will refuse to answer
### Instruction:
Explain the Greedy Alrogithm in two sentences.
### Response:
A Greedy Algorithm is an algorithmic paradigm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope that these local optimums will lead to a global optimum.

