## Install dependencies

In [None]:
! pip install "nucliadb-sdk<=2.42.1"
! pip install -U sentence-transformers
! pip install datasets
! pip install InstructorEmbedding

## Set up NucliaDB

- Run the **NucliaDB** Docker image:
```bash
docker run -it \
       -e LOG=INFO \
       -p 8080:8080 \
       -p 8060:8060 \
       -p 8040:8040 \
       -v nucliadb-standalone:/data \
       nuclia/nucliadb:latest
```
- Or install with pip and run:

```bash
pip install nucliadb
nucliadb
```

## Check everything's up and running

In [1]:
import requests

# NucliaDB's HTTP API is published on port 8080 (see the docker command above).
response = requests.get("http://localhost:8080")

assert response.status_code == 200, "Oops, it seems something is not properly installed"
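If NucliaDB was started only moments ago, the first request can fail before the server is listening. A minimal sketch of a polling helper follows; `http_ok` and `wait_until_ready` are hypothetical names introduced here for illustration, not part of the NucliaDB SDK, and the retry counts are arbitrary defaults.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET request to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx answer: not ready yet.
        return False


def wait_until_ready(check, attempts=10, delay=1.0):
    """Call `check()` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

It could be used as `wait_until_ready(lambda: http_ok(url))`, where `url` is the same address queried in the check above.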



## Load our data

Load and explore the dataset: Stack Exchange posts from the outdoors site, each paired with an upvoted and a downvoted answer.

In [2]:
from datasets import load_dataset


dataset = load_dataset("flax-sentence-embeddings/stackexchange_titlebody_best_and_down_voted_answer_jsonl", "outdoors")



In [3]:
dataset

DatasetDict({
    train: Dataset({
        features: ['title_body', 'upvoted_answer', 'downvoted_answer'],
        num_rows: 221
    })
})

## Load the models to generate embeddings

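In this case we use two models: Instructor and an MS MARCO model (`msmarco-MiniLM-L6-cos-v5`).

Instructor is an instruction-tuned embedding model: alongside each text we pass a short instruction that describes the kind of embedding we want. Here we use one instruction for queries and another for posts.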
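As the cells below show, Instructor is given each input as an `[instruction, text]` pair. A minimal sketch of building those pairs for several texts at once; `build_instructor_inputs` is a hypothetical helper and the example texts are made up for illustration.

```python
def build_instructor_inputs(instruction, texts):
    """Pair one instruction with every text: [[instruction, text], ...]."""
    return [[instruction, text] for text in texts]


# Hypothetical texts, for illustration only.
pairs = build_instructor_inputs(
    "Represent the outdoors related post for retrieval:",
    ["Pack an extra layer.", "Filter stream water before drinking."],
)
print(pairs[0])
```

The resulting list has the same shape as the argument passed to `model_instructor.encode` later in this notebook.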

In [4]:
from sentence_transformers import SentenceTransformer
model_marco = SentenceTransformer('sentence-transformers/msmarco-MiniLM-L6-cos-v5')


In [5]:
from InstructorEmbedding import INSTRUCTOR
model_instructor = INSTRUCTOR('hkunlp/instructor-base')
instruction_query = "Represent the question for retrieving relevant outdoors related posts:"
instruction_posts = "Represent the outdoors related post for retrieval:"

load INSTRUCTOR_Transformer
max_seq_length  512


## Upload our data to NucliaDB


In [6]:
from nucliadb_sdk import KnowledgeBox, get_or_create


In [7]:
my_kb = get_or_create("my_outdoors_kb")


In [8]:
for row in dataset["train"]:
    answer = row["upvoted_answer"]
    # Store each answer once, with one vector per model under its own vectorset name.
    my_kb.upload(
        text=answer,
        vectors={
            "ms-marco-vectors": model_marco.encode([answer])[0],
            "instructor-vectors": model_instructor.encode([[instruction_posts, answer]])[0],
        },
    )

Vectorset is not created, we will create it for you
Vectorset is not created, we will create it for you
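The loop above encodes one answer per call. Since both `encode` methods take a list, larger datasets could be encoded in batches first and uploaded afterwards. A minimal sketch under that assumption; `batched` and `encode_in_batches` are hypothetical helpers, and `encode` is assumed to return one vector per input, in order.

```python
def batched(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def encode_in_batches(encode, texts, size=32):
    """Apply `encode` (list of texts -> sequence of vectors) batch by batch."""
    vectors = []
    for chunk in batched(texts, size):
        # Each call returns one vector per text in the chunk.
        vectors.extend(encode(chunk))
    return vectors
```

For example, `encode_in_batches(model_marco.encode, list(dataset["train"]["upvoted_answer"]))` would produce the MS MARCO vectors for all answers; the batch size of 32 is an arbitrary choice.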


## Let's compare our results


In [9]:
def print_results(model_name, results):
    """Print the score and the first 450 characters of the top three results."""
    print(f"---{model_name.upper()} RESULTS---")
    for count, result in enumerate(results):
        if count >= 3:
            break
        print(f"----- RESULT {count+1} -----")
        print("Similarity score:", '%.2f' % result.score)
        print("Result:", '%.450s' % result.text, "...\n")
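The similarity score printed here is presumably a cosine similarity between the query vector and each stored vector. That is an assumption about how the knowledge box scores vectors, not something this notebook confirms. Under that assumption, the score could be reproduced as follows:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Scores produced by different embedding models are generally not on a comparable scale, so it is safer to compare the ranking each model produces than the raw numbers.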
    

In [10]:
query = "how to deal with cold weather?"

ms_marco_vectors = model_marco.encode([query])[0]
results_msmarco = my_kb.search(vector=ms_marco_vectors, vectorset="ms-marco-vectors")
print_results("MS_MARCO", results_msmarco)

---MS_MARCO RESULTS---
----- RESULT 1 -----
Similarity score: 0.57
Result: In the outdoors, the way to deal with cold is not to heat the environment, but to insulate yourself.  You say "snow line", so it appears you aren't asking about anything particularly cold.

Just get a proper sleeping bag rated for the temperature.  Since you are car camping, you can bring some extra supplies like blankets.  Get a sleeping bag rated for the normal or a bit above normal temperature, then add a blanket for the unusually cold nights. ...

----- RESULT 2 -----
Similarity score: 0.41
Result: If you are in a place that has streams, placing the beer in the water every time you take a break will cool them.
Place in a cold/cool body of water about 30 minutes before drinking will also help. 
If you don't have a body of water, wrap the individual cans/bottles in a wet towel in the shade, preferable where it is windy. Evaporation will cool the beers.

Last option will be to use some form of insulating contai

In [11]:
instructor_vectors = model_instructor.encode([[instruction_query, query]])[0]
results_instructor = my_kb.search(vector=instructor_vectors, vectorset="instructor-vectors", min_score=0)
print_results("INSTRUCTOR", results_instructor)

---INSTRUCTOR RESULTS---
----- RESULT 1 -----
Similarity score: 0.89
Result: In the outdoors, the way to deal with cold is not to heat the environment, but to insulate yourself.  You say "snow line", so it appears you aren't asking about anything particularly cold.

Just get a proper sleeping bag rated for the temperature.  Since you are car camping, you can bring some extra supplies like blankets.  Get a sleeping bag rated for the normal or a bit above normal temperature, then add a blanket for the unusually cold nights. ...

----- RESULT 2 -----
Similarity score: 0.88
Result: Buy an Overbag.  


Use the Overbag when it's too warm for your down bag
Use the down bag at and around -15C
Use the Overbag and Down Bag together when it's colder than -15C


AND if you want to get real fancy get a Vapour Barrier Liner and use all three together for expeditions and temps below -30C.  You now have all your bases covered! 

This is much more cost effective than to buy a summer bag, your -15 bag, 

In [12]:
query = "What to do if you get lost?"

ms_marco_vectors = model_marco.encode([query])[0]
results_msmarco = my_kb.search(vector=ms_marco_vectors, vectorset="ms-marco-vectors")
print_results("MS_MARCO", results_msmarco)


---MS_MARCO RESULTS---
----- RESULT 1 -----
Similarity score: 0.50
Result: With fog, the only thing you're losing is extended visibility. This shouldn't throw off your plan too much, unless you were navigating by watching far away landmarks.

If you were on a trail, stay on it. There's no need to wander around. If you can't see anything and traveling is becoming dangerous or you're not sure where you're going, then stop and wait for the fog to lift.

If it's getting dark, you might have to setup camp. Hope you have an e ...

----- RESULT 2 -----
Similarity score: 0.45
Result: Leave no Trace

The basic guideline is do not leave your feces anywhere that it can be discovered or uncovered in the future.

As far as upsetting the ecosystem equilibrium, good luck with that, there are much bigger things than you in the woods are that are indiscriminately defecating on the ground and in watercourses. It's less of a sanitary hazard to the environment than it is to other human beings.   

When di

In [13]:
instructor_vectors = model_instructor.encode([[instruction_query, query]])[0]
results_instructor = my_kb.search(vector=instructor_vectors, vectorset="instructor-vectors", min_score=0)
print_results("INSTRUCTOR", results_instructor)

---INSTRUCTOR RESULTS---
----- RESULT 1 -----
Similarity score: 0.90
Result: The VERY FIRST thing you need to do is to not panic.  Sit down for a minute or two and let your mind catch up to the fact you are lost.  Now, take out your map, compass, gps, or whatever and try to find your way back to where you DID know where you were.

If you can't figure out where the trail should be and you need to bushwhack, find a bit of a clearing, take bearings to nearby landmarks, and draw on your map (or use sticks if they are straight ...

----- RESULT 2 -----
Similarity score: 0.88
Result: With fog, the only thing you're losing is extended visibility. This shouldn't throw off your plan too much, unless you were navigating by watching far away landmarks.

If you were on a trail, stay on it. There's no need to wander around. If you can't see anything and traveling is becoming dangerous or you're not sure where you're going, then stop and wait for the fog to lift.

If it's getting dark, you might hav

In [14]:
query = "What to do if you run into animals on the trail?"

ms_marco_vectors = model_marco.encode([query])[0]
results_msmarco = my_kb.search(vector=ms_marco_vectors, vectorset="ms-marco-vectors")
print_results("MS_MARCO", results_msmarco)


---MS_MARCO RESULTS---
----- RESULT 1 -----
Similarity score: 0.51
Result: The VERY FIRST thing you need to do is to not panic.  Sit down for a minute or two and let your mind catch up to the fact you are lost.  Now, take out your map, compass, gps, or whatever and try to find your way back to where you DID know where you were.

If you can't figure out where the trail should be and you need to bushwhack, find a bit of a clearing, take bearings to nearby landmarks, and draw on your map (or use sticks if they are straight ...

----- RESULT 2 -----
Similarity score: 0.50
Result: Bears don't generally like people, and the ones who do are usually going to be more interested in dumpsters and campgrounds than a random boat on the river. The likelihood of ever getting into a situation where you have to fend off a bear attack on the water is absurdly small. Bears are usually either crossing water to get somewhere else and want nothing to do with you, or else they're fishing... and want nothing 

In [15]:
instructor_vectors = model_instructor.encode([[instruction_query, query]])[0]
results_instructor = my_kb.search(vector=instructor_vectors, vectorset="instructor-vectors", min_score=0)
print_results("INSTRUCTOR", results_instructor)

---INSTRUCTOR RESULTS---
----- RESULT 1 -----
Similarity score: 0.90
Result: Hyenas aren't closely related to wild dogs (1), but a predator is a predator is a predator, and running from a predator says "I am prey."  

How worried should you be if you come upon hyenas?  The first thing to note is: are they striped or spotted?  If they are striped, they may be more interested in your oranges (2) than you.  If they are spotted, be worried.  

"The striped hyena is primarily a scavenger, though it will occasionally attack and ...

----- RESULT 2 -----
Similarity score: 0.90
Result: Wild dogs can indeed be dangerous, and packs can be extremely dangerous.  You do not want to take on a pack of dogs if you can at all avoid it, and running is often a particularly bad idea.

I suggest starting with the basics: try to keep the dog calm and don't try to intimidate it.  This means:


Don't make direct eye contact, and remember that sunglasses look like large unblinking eyes.
Don't smile (it bares y