## Running Llama 3.1 on Mac, Windows or Linux
This notebook goes over how you can set up and run Llama 3.1 locally on a Mac, Windows or Linux using [Ollama](https://ollama.com/).

### Steps at a glance:
1. Download and install Ollama.
2. Download and test run Llama 3.1.
3. Use local Llama 3.1 via Python.
4. Use local Llama 3.1 via LangChain.


#### 1. Download and install Ollama

On Mac or Windows, go to the [Ollama download page](https://ollama.com/download), select your platform to download the installer, then double-click the downloaded file to install Ollama.

On Linux, run `curl -fsSL https://ollama.com/install.sh | sh` in a terminal to download and install Ollama.
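
After installing, a quick way to confirm that the Ollama server is up is to request its root URL. The sketch below assumes the default address `http://localhost:11434` (the one used by the curl example in step 2) and assumes that a running server answers its root URL with HTTP 200. The helper name `ollama_is_running` is hypothetical. If it prints `False`, starting the server manually with `ollama serve` is usually the next thing to try.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if something at base_url answers an HTTP GET with status 200.

    Hypothetical helper; assumes a running Ollama server replies 200 on its root URL.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError (connection refused, DNS failure, HTTP error) and timeouts
        # are all OSError subclasses: treat them as "server not reachable".
        return False


print(ollama_is_running())
```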

#### 2. Download and test run Llama 3.1

In a terminal or console, run `ollama pull llama3.1` to download the Llama 3.1 8b chat model in 4-bit quantized format (about 4.7 GB).

Run `ollama pull llama3.1:70b` to download the Llama 3.1 70b chat model, also in 4-bit quantized format (about 39 GB).
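
As a rough sanity check, these download sizes follow from parameter count times bits per weight. The sketch below assumes about 4.5 bits per weight, which is what a 4-bit format costs if it stores one 16-bit scale per block of 32 weights (32 × 4 + 16 = 144 bits per 32 weights). It also assumes approximate parameter counts of 8.03 billion and 70.6 billion. Ollama's actual quantization type may differ, and some layers can be stored at higher precision, so the estimate only lands in the same ballpark as the reported sizes. The helper `estimate_size_gb` is hypothetical.

```python
def estimate_size_gb(num_params: float, bits_per_weight: float = 4.5) -> float:
    """Rough size in GB (10**9 bytes) of a model quantized to bits_per_weight."""
    total_bits = num_params * bits_per_weight
    return total_bits / 8 / 1e9  # bits -> bytes -> GB


# Approximate parameter counts (assumed) for the two Llama 3.1 sizes above.
for name, params in [("llama3.1 (8b)", 8.03e9), ("llama3.1:70b", 70.6e9)]:
    print(f"{name}: ~{estimate_size_gb(params):.1f} GB")
# llama3.1 (8b): ~4.5 GB
# llama3.1:70b: ~39.7 GB
```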

Then run `ollama run llama3.1` and ask Llama 3.1 questions such as "who wrote the book godfather?" or "who wrote the book godfather? answer in one sentence." You can also try `ollama run llama3.1:70b`, but inference will most likely be too slow. For example, on an Apple M1 Pro with 32GB of RAM, Llama 3.1 70b chat takes over 10 seconds to generate a single token, while Llama 3.1 8b chat generates over 10 tokens per second.
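
To measure generation speed on your own machine, you can use the timing fields in the API response. According to the Ollama API doc, a non-streaming response also includes `eval_count` (number of generated tokens) and `eval_duration` (time spent generating them, in nanoseconds). The sketch below turns those two fields into tokens per second. The helper name `tokens_per_second` is hypothetical, and the sample numbers are illustrative only, not a real measurement.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed computed from a non-streaming Ollama API response."""
    # eval_duration is reported in nanoseconds, so scale by 1e9 to get seconds.
    return response["eval_count"] * 1e9 / response["eval_duration"]


# Illustrative values only: 120 tokens generated in 10 seconds.
sample = {"eval_count": 120, "eval_duration": 10_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens/s")  # 12.0 tokens/s
```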

You can also test Llama 3.1 8b chat through Ollama's REST API, which listens on `localhost:11434` by default:
```
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.1",
  "messages": [
    {
      "role": "user",
      "content": "who wrote the book godfather?"
    }
  ],
  "stream": false
}'
```

The complete Ollama API documentation is available [here](https://github.com/ollama/ollama/blob/main/docs/api.md).
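
Besides `model`, `messages` and `stream`, the API doc describes messages with a `system` role and an optional `options` object for model parameters such as `temperature` and `seed`. The sketch below builds such a request body in Python. The function `build_chat_payload` is a hypothetical helper, not part of Ollama. Its JSON output has the same shape as the body passed to curl with `-d`.

```python
import json


def build_chat_payload(prompt, model="llama3.1", system=None, temperature=None, seed=None):
    """Build a non-streaming /api/chat request body (hypothetical helper)."""
    messages = []
    if system is not None:
        # An optional system message steers the style of every answer.
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": prompt})

    payload = {"model": model, "messages": messages, "stream": False}

    # Only send "options" when at least one model parameter was given.
    options = {}
    if temperature is not None:
        options["temperature"] = temperature
    if seed is not None:
        options["seed"] = seed
    if options:
        payload["options"] = options
    return payload


print(json.dumps(
    build_chat_payload("who wrote the book godfather?",
                       system="Answer in one sentence.", temperature=0, seed=42),
    indent=2,
))
```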

#### 3. Use local Llama 3.1 via Python

The Python code below is a port of the curl command above, using the `requests` package.

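```python
import requests

url = "http://localhost:11434/api/chat"

def llama3(prompt):
    """Send a single-turn chat request to the local Ollama server and return the reply text."""
    data = {
        "model": "llama3.1",
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ],
        "stream": False  # return one JSON object instead of a stream of chunks
    }

    # json= serializes the payload and sets the Content-Type: application/json header
    response = requests.post(url, json=data)
    # Raise on HTTP errors instead of failing later on a missing "message" key
    response.raise_for_status()

    return response.json()["message"]["content"]
```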
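
The function above sends one message at a time, so Llama 3.1 does not see earlier turns. Because `/api/chat` takes a list of `messages` with `user` and `assistant` roles, a follow-up question can be answered in context by resending the whole history. The sketch below uses only the standard library (`urllib`) and assumes the default server URL. `post_chat` and `chat_turn` are hypothetical helpers; `send` is injectable so the history logic can be tried without a running server.

```python
import json
import urllib.request

URL = "http://localhost:11434/api/chat"  # default local Ollama chat endpoint


def post_chat(payload: dict) -> dict:
    """POST a JSON payload to the local Ollama chat endpoint and decode the reply."""
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),  # passing data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def chat_turn(history: list, prompt: str, send=post_chat, model: str = "llama3.1") -> str:
    """Append the user prompt, send the full history, then record the assistant reply."""
    history.append({"role": "user", "content": prompt})
    reply = send({"model": model, "messages": history, "stream": False})
    message = reply["message"]  # e.g. {"role": "assistant", "content": "..."}
    history.append(message)
    return message["content"]


# Usage with a running server:
# history = []
# print(chat_turn(history, "who wrote the book godfather?"))
# print(chat_turn(history, "what other books did he write?"))
```

Resending the full history on every turn keeps the client stateless on the server side, at the cost of a longer prompt as the conversation grows.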

```python
response = llama3("who wrote the book godfather?")
print(response)
```
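
With `"stream": true` (the API's default), `/api/chat` instead returns a sequence of JSON objects, one per line, each carrying a piece of the reply in `message.content`. The last object has `"done": true`. The sketch below assumes that format and joins the pieces. `iter_stream_content` is a hypothetical helper; it is demonstrated on hard-coded sample lines so it runs without a server.

```python
import json
from typing import Iterable, Iterator


def iter_stream_content(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text chunks from a streamed (newline-delimited JSON) /api/chat response."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip empty lines between JSON objects
        chunk = json.loads(line)
        content = chunk.get("message", {}).get("content", "")
        if content:
            yield content
        if chunk.get("done"):
            break  # the final object carries statistics, not more text


# Hard-coded sample lines standing in for a real streamed response.
sample_lines = [
    b'{"message": {"role": "assistant", "content": "Mario"}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": " Puzo"}, "done": false}\n',
    b'{"message": {"role": "assistant", "content": ""}, "done": true}\n',
]
print("".join(iter_stream_content(sample_lines)))  # Mario Puzo

# With a running server and requests installed:
# with requests.post(url, json={"model": "llama3.1", "messages": [...], "stream": True},
#                    stream=True) as r:
#     for piece in iter_stream_content(r.iter_lines()):
#         print(piece, end="", flush=True)
```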

#### 4. Use local Llama 3.1 via LangChain

The code below uses LangChain with Ollama to query Llama 3.1 running locally. For a more advanced example of using local Llama 3 with LangChain and agent-powered RAG, see [this notebook](https://github.com/langchain-ai/langgraph/blob/main/examples/rag/langgraph_rag_agent_llama3_local.ipynb).

```
# ChatOllama is provided by the langchain-community package
!pip install langchain langchain-community
```

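```python
# Newer LangChain releases also offer `from langchain_ollama import ChatOllama`
# (pip install langchain-ollama) as the recommended import.
from langchain_community.chat_models import ChatOllama

# temperature=0 makes answers (near) deterministic for the same prompt
llm = ChatOllama(model="llama3.1", temperature=0)
response = llm.invoke("who wrote the book godfather?")
print(response.content)
```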
