
First, [install Ollama](https://ollama.com/download) on your local machine. Then run:

`ollama run phi3`

The first run takes a while because Ollama has to download the model weights. You'll also need a machine with a fair amount of RAM: we tested on a 64GB MacBook Pro, but it can probably run on smaller machines.
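
Before moving on, it can help to confirm that Ollama is actually serving phi3. The sketch below is one possible check, not part of the original demo. It assumes Ollama is listening on its default address (`http://localhost:11434`) and uses the `/api/tags` endpoint, which lists locally available models. The helper names (`fetch_tags`, `model_names`, `has_model`) are made up for this example.

```python
import json
import urllib.error
import urllib.request
from typing import List, Optional

# Assumption: Ollama's default address. Adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434"


def model_names(tags_payload: dict) -> List[str]:
    """Extract model names from the JSON returned by GET /api/tags."""
    return [m.get("name", "") for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str = "phi3") -> bool:
    """True if any local model matches `wanted`, ignoring the ':tag' suffix."""
    return any(name.split(":")[0] == wanted for name in model_names(tags_payload))


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> Optional[dict]:
    """Return the parsed /api/tags response, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return None


if __name__ == "__main__":
    tags = fetch_tags()
    if tags is None:
        print("Ollama does not appear to be running.")
    elif has_model(tags):
        print("phi3 is available locally.")
    else:
        print("Ollama is running, but phi3 has not been pulled yet.")
```

If the last message appears, running `ollama run phi3` once should download the weights.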

Next, install LlamaIndex and the dependencies we'll use for this demo:

In [None]:
%pip install llama-index-core
%pip install llama-index-llms-ollama
%pip install llama-index-embeddings-huggingface
%pip install wikipedia
%pip install llama-index-readers-wikipedia

Note: you may need to restart the kernel to use updated packages.
Collecting wikipedia
  Using cached wikipedia-1.4.0-py3-none-any.whl
Collecting beautifulsoup4 (from wikipedia)
  Using cached beautifulsoup4-4.12.3-py3-none-any.whl.metadata (3.8 kB)
Collecting soupsieve>1.2 (from beautifulsoup4->wikipedia)
  Using cached soupsieve-2.5-py3-none-any.whl.metadata (4.7 kB)
Using cached beautifulsoup4-4.12.3-py3-none-any.whl (147 kB)
Using cached soupsieve-2.5-py3-none-any.whl (36 kB)
Installing collected packages: soupsieve, beautifulsoup4, wikipedia
Successfully installed beautifulsoup4-4.12.3 soupsieve-2.5 wikipedia-1.4.0


Import your dependencies in Python:

In [None]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.readers.wikipedia import WikipediaReader
from llama_index.core import VectorStoreIndex

Initialize phi3 as your LLM, and use a local Hugging Face embedding model so that nothing calls OpenAI's embeddings, which LlamaIndex uses by default.

In [None]:
phi3 = Ollama(model="phi3", request_timeout=30.0)

In [None]:
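Settings.llm = phi3  # use phi3 for answer generation
Settings.chunk_size = 512  # maximum tokens per chunk
Settings.chunk_overlap = 75  # tokens shared between neighbouring chunks
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"  # small English embedding model, run locally
)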
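
To get a feel for what `chunk_size` and `chunk_overlap` do, here is a deliberately simplified sketch of sliding-window chunking over a list of tokens. It is not LlamaIndex's actual splitter, which also tries to respect sentence boundaries. The function name `chunk_with_overlap` is hypothetical, and the integer "tokens" are stand-ins for real tokenizer output.

```python
def chunk_with_overlap(tokens, chunk_size=512, chunk_overlap=75):
    """Split `tokens` into windows of at most `chunk_size` items,
    where each window repeats the last `chunk_overlap` items of the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


if __name__ == "__main__":
    fake_tokens = list(range(1000))  # stand-in for 1000 real tokens
    for i, chunk in enumerate(chunk_with_overlap(fake_tokens)):
        print(i, chunk[0], chunk[-1], len(chunk))
```

The overlap means a sentence that straddles a chunk boundary is still likely to appear whole in at least one chunk, at the cost of embedding some text twice.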

Load some text from Wikipedia and build a vector index over it:

In [None]:
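# Download the article text from Wikipedia
documents = WikipediaReader().load_data(["History of Trinidad and Tobago"])
# Chunk and embed the text, then build an in-memory vector index
vector_index = VectorStoreIndex.from_documents(documents)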

Create a query engine that retrieves the three most similar chunks for each question:

In [None]:
query_engine = vector_index.as_query_engine(similarity_top_k=3)
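
Conceptually, `similarity_top_k=3` means "embed the question, score every chunk against it, and keep the three best". The sketch below illustrates that idea with plain cosine similarity over toy two-dimensional vectors. It is an illustration of the general technique rather than LlamaIndex's internal code, and `cosine_similarity` and `top_k` are names invented for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunk_vecs, k=3):
    """Return (chunk_index, score) pairs for the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest similarity first
    return [(i, score) for score, i in scored[:k]]


if __name__ == "__main__":
    query = [1.0, 0.0]  # toy "question" embedding
    chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]  # toy chunk embeddings
    for index, score in top_k(query, chunks, k=3):
        print(index, round(score, 3))
```

A larger `k` gives the LLM more context to draw on but makes the prompt longer, which matters for a small local model.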

And ask a question about the data:

In [None]:
response = query_engine.query("When was oil first discovered in Trinidad")
print(response)

The American civil engineer, Walter Darwent, along with his neighbour, Mr Lee Lum, drilled a successful well near Darwent's original one in 1893. However, after the death of Walter Darwent from yellow fever, major drilling operations began early in 1907 and by 1910, Trinidad and Tobago was producing oil at an annual rate of about 47,000 barrels (7,500 m³).
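
To see which chunks the answer was built from, you can look at `response.source_nodes`, where each entry carries a similarity `score` and a `node` holding the chunk text. The helper below is a small sketch for printing them. `summarize_sources` is a hypothetical name, and the function only relies on those two attributes plus `node.get_content()`.

```python
def summarize_sources(source_nodes, width=80):
    """Return one line per retrieved chunk: rank, similarity score, and a text preview."""
    lines = []
    for rank, item in enumerate(source_nodes, start=1):
        # Collapse newlines and repeated spaces so the preview fits on one line.
        text = " ".join(item.node.get_content().split())
        score = "n/a" if item.score is None else f"{item.score:.3f}"
        lines.append(f"{rank}. score={score} {text[:width]}")
    return lines


# In the notebook, after running the query above:
# for line in summarize_sources(response.source_nodes):
#     print(line)
```

If the previews look unrelated to the question, the retrieval step is the likely culprit rather than phi3 itself.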


Tada! Totally local RAG with phi3.