# How to use MongoDB with LinkML-Store

LinkML-Store provides a uniform interface across different backends. It lets you write database-neutral code
where that makes sense, and use database-specific code where you need it.

The best-supported backend is DuckDB. The next best is MongoDB.

This tutorial walks through using MongoDB via the Python interface. We recommend working through the main
tutorial first.



## Creating a client and attaching to a database

First we will create a client as normal:

In [1]:
from linkml_store import Client

client = Client()

Next we'll attach to a MongoDB instance. This assumes you already have one running.

You can connect in two ways:
1. Using a connection string
2. Using an existing MongoClient instance

### Connection using a string

In [ ]:
db = client.attach_database("mongodb://localhost:27017", "test")

In [3]:
db.handle

'mongodb://localhost:27017'
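
The handle above is an ordinary MongoDB connection URI. As a rough illustration of what it encodes, the sketch below splits a single-host URI into its parts using only the standard library. The helper name `describe_handle` is hypothetical and not part of linkml-store, and real MongoDB URIs can carry multiple hosts and options that this simple parse does not handle.

```python
from urllib.parse import urlsplit


def describe_handle(handle):
    """Split a simple single-host MongoDB URI into scheme, host and port.

    Hypothetical helper for illustration only; linkml-store and pymongo
    do their own (more complete) URI parsing.
    """
    parts = urlsplit(handle)
    return {"scheme": parts.scheme, "host": parts.hostname, "port": parts.port}


info = describe_handle("mongodb://localhost:27017")
print(info)
```

The scheme is what identifies the string as a MongoDB handle; the host and port identify the running instance.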

### Connection using an existing MongoClient

If you already have a MongoClient instance (for example, from an existing application), you can use it directly without having to create a new connection:

In [ ]:
# Now let's connect using an existing MongoClient
from pymongo import MongoClient

# Create a MongoClient instance directly
mongo_client = MongoClient("mongodb://localhost:27017/")

# Use the existing client with linkml-store
db2 = client.attach_mongodb_client(
    mongo_client=mongo_client,
    db_name="test2",
    alias="direct_mongo"
)

print(f"DB Handle: {db2.handle}")
print(f"DB Alias: {db2.alias}")
print(f"DB Name: {db2._db_name}")

This approach is particularly useful when:
1. You already have a MongoClient configured in your application
2. You need special authentication or connection settings
3. You want to reuse an existing connection pool

For the rest of this tutorial, we'll continue using the first database (`db`), but all operations work the same on both connection types.

In [ ]:
db.metadata.model_dump_json()

## Creating a collection

We'll create a simple test collection. The concept of a collection in linkml-store maps directly to MongoDB collections.

In [5]:
collection = db.create_collection("test", recreate_if_exists=True)

## Preparing data to load

Next we'll parse an (incomplete) list of countries in JSON-Lines format:

In [6]:
COUNTRIES = "../../tests/input/countries/countries.jsonl"

In [7]:
from linkml_store.utils.format_utils import load_objects

objects = load_objects(COUNTRIES)
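
For readers unfamiliar with JSON Lines, the format is simply one JSON object per line. The following is a minimal standard-library sketch of that idea, not the implementation of `load_objects`; the function name `parse_jsonl` and the two inline sample records (copied from the country data shown in this tutorial) are used purely for illustration.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text: one JSON object per non-blank line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


# Two sample records in JSON Lines form (illustrative subset of the data)
sample = "\n".join(
    [
        '{"name": "Canada", "code": "CA", "capital": "Ottawa", '
        '"continent": "North America", "languages": ["English", "French"]}',
        '{"name": "France", "code": "FR", "capital": "Paris", '
        '"continent": "Europe", "languages": ["French"]}',
    ]
)

rows = parse_jsonl(sample)
print(len(rows), rows[0]["languages"])
```

Each parsed row is a plain Python dict, which is the shape of object that gets inserted into a collection later in this tutorial.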

Let's check with pandas just to make sure it looks as expected:

In [8]:
import pandas as pd
pd.DataFrame(objects)

|   | name | code | capital | continent | languages |
|---|------|------|---------|-----------|-----------|
| 0 | United States | US | Washington, D.C. | North America | [English] |
| 1 | Canada | CA | Ottawa | North America | [English, French] |
| 2 | Mexico | MX | Mexico City | North America | [Spanish] |
| 3 | Brazil | BR | Brasília | South America | [Portuguese] |
| 4 | Argentina | AR | Buenos Aires | South America | [Spanish] |
| 5 | United Kingdom | GB | London | Europe | [English] |
| 6 | France | FR | Paris | Europe | [French] |
| 7 | Germany | DE | Berlin | Europe | [German] |
| 8 | Italy | IT | Rome | Europe | [Italian] |
| 9 | Spain | ES | Madrid | Europe | [Spanish] |


## Inserting objects

We will call `insert` on the collection to add the objects. Note that we haven't specified a schema; one will be induced from the data.

In [9]:
collection.insert(objects)

Let's check this worked by querying:

In [10]:
qr = collection.find()

In [11]:
qr.rows_dataframe

|   | name | code | capital | continent | languages |
|---|------|------|---------|-----------|-----------|
| 0 | United States | US | Washington, D.C. | North America | [English] |
| 1 | Canada | CA | Ottawa | North America | [English, French] |
| 2 | Mexico | MX | Mexico City | North America | [Spanish] |
| 3 | Brazil | BR | Brasília | South America | [Portuguese] |
| 4 | Argentina | AR | Buenos Aires | South America | [Spanish] |
| 5 | United Kingdom | GB | London | Europe | [English] |
| 6 | France | FR | Paris | Europe | [French] |
| 7 | Germany | DE | Berlin | Europe | [German] |
| 8 | Italy | IT | Rome | Europe | [Italian] |
| 9 | Spain | ES | Madrid | Europe | [Spanish] |


## Queries

We can specify key-value constraints:

In [12]:
qr = collection.find({"continent": "Europe"})

In [13]:
qr.rows_dataframe

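|   | name | code | capital | continent | languages |
|---|------|------|---------|-----------|-----------|
| 0 | United Kingdom | GB | London | Europe | [English] |
| 1 | France | FR | Paris | Europe | [French] |
| 2 | Germany | DE | Berlin | Europe | [German] |
| 3 | Italy | IT | Rome | Europe | [Italian] |
| 4 | Spain | ES | Madrid | Europe | [Spanish] |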
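
To make the semantics of a key-value constraint concrete, here is a small pure-Python sketch of simple equality matching: an object is kept only if every key in the constraint has the given value. This is a simplification for illustration, not how linkml-store or MongoDB evaluate queries (MongoDB, for instance, has additional rules for array fields and operators). The `matches` helper and the inline rows are hypothetical.

```python
def matches(obj, where):
    """Return True if every key in `where` equals the value stored in `obj`.

    Simplified equality-only matching; a missing key is treated as None.
    """
    return all(obj.get(key) == value for key, value in where.items())


# A few rows taken from the country data shown earlier
rows = [
    {"name": "Canada", "code": "CA", "continent": "North America"},
    {"name": "France", "code": "FR", "continent": "Europe"},
    {"name": "Brazil", "code": "BR", "continent": "South America"},
    {"name": "Spain", "code": "ES", "continent": "Europe"},
]

european = [row["name"] for row in rows if matches(row, {"continent": "Europe"})]
print(european)
```

Adding a second key to the constraint narrows the result further, since all pairs must match.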


## Facet counts

We will now do a query fetching facet counts for all fields.

Unlike Solr, MongoDB doesn't facet natively, but linkml-store implements the necessary logic under the hood.

In [14]:
fc = collection.query_facets()

In [15]:
fc["continent"]

[('Europe', 5),
 ('Asia', 5),
 ('Africa', 3),
 ('North America', 3),
 ('Oceania', 2),
 ('South America', 2)]
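
Conceptually, a facet count is a tally of how often each distinct value of a field occurs. The sketch below reproduces that idea with `collections.Counter` over the ten rows displayed earlier in this tutorial (not the full dataset, so the numbers differ from the output above). It is an illustration of the concept only; the `facet_counts` helper is hypothetical and does not reflect how linkml-store computes facets against MongoDB.

```python
from collections import Counter


def facet_counts(rows, field):
    """Count occurrences of each value of `field`, most frequent first."""
    return Counter(row[field] for row in rows if field in row).most_common()


# Continent values for the ten rows displayed earlier in this tutorial
rows = [
    {"name": "United States", "continent": "North America"},
    {"name": "Canada", "continent": "North America"},
    {"name": "Mexico", "continent": "North America"},
    {"name": "Brazil", "continent": "South America"},
    {"name": "Argentina", "continent": "South America"},
    {"name": "United Kingdom", "continent": "Europe"},
    {"name": "France", "continent": "Europe"},
    {"name": "Germany", "continent": "Europe"},
    {"name": "Italy", "continent": "Europe"},
    {"name": "Spain", "continent": "Europe"},
]

facets = facet_counts(rows, "continent")
print(facets)
```

The result has the same shape as the list of `(value, count)` pairs returned for `fc["continent"]`.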

## Creating an LLM embedding index

We will now attach an indexer. By default, the `llm` indexer uses OpenAI, so you will need an API key:

In [16]:
collection.attach_indexer("llm")

We can now query using the index. Note that search terms need only be *semantically* related; they don't need to contain the same lexical elements.

In [17]:
qr = collection.search("countries with a King or Queen")
qr.rows_dataframe

|   | score | name | code | capital | continent | languages |
|---|-------|------|------|---------|-----------|-----------|
| 0 | 0.770891 | United Kingdom | GB | London | Europe | [English] |
| 1 | 0.758388 | Australia | AU | Canberra | Oceania | [English] |
| 2 | 0.754203 | South Korea | KR | Seoul | Asia | [Korean] |
| 3 | 0.750652 | New Zealand | NZ | Wellington | Oceania | [English, Māori] |
| 4 | 0.750419 | United States | US | Washington, D.C. | North America | [English] |
| 5 | 0.748973 | South Africa | ZA | Pretoria | Africa | [Zulu, Xhosa, Afrikaans, English, Northern Sot... |
| 6 | 0.748322 | Canada | CA | Ottawa | North America | [English, French] |
| 7 | 0.746444 | France | FR | Paris | Europe | [French] |
| 8 | 0.745408 | Germany | DE | Berlin | Europe | [German] |
| 9 | 0.743449 | Spain | ES | Madrid | Europe | [Spanish] |


The precise ranking could be debated, but in terms of rough semantic distance the top result is in the right ballpark (at the time of writing).
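
The scores in the result table are similarity values between the query embedding and each object's embedding. As a rough sketch of how such a ranking can be produced, the code below ranks toy two-dimensional vectors by cosine similarity. It assumes a cosine-style measure, which may differ from what the `llm` indexer actually uses, and the vectors and `doc_*` labels are invented for illustration; real embeddings come from the embedding model and have many more dimensions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, indexed):
    """Return (score, label) pairs sorted from most to least similar."""
    scored = [(cosine(query_vec, vec), label) for label, vec in indexed.items()]
    return sorted(scored, reverse=True)


# Toy embeddings (hypothetical): labels mapped to 2-d vectors
indexed = {
    "doc_a": [1.0, 0.0],
    "doc_b": [1.0, 1.0],
    "doc_c": [0.0, 1.0],
}

ranked = rank([1.0, 0.0], indexed)
for score, label in ranked:
    print(f"{score:.3f} {label}")
```

As with `qr.ranked_rows`, the output pairs each score with the item it belongs to, highest score first.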

In [18]:
qr.num_rows

20

In [19]:
qr.ranked_rows

[(0.7708908770614274,
  {'name': 'United Kingdom',
   'code': 'GB',
   'capital': 'London',
   'continent': 'Europe',
   'languages': ['English']}),
 (0.7583880255490492,
  {'name': 'Australia',
   'code': 'AU',
   'capital': 'Canberra',
   'continent': 'Oceania',
   'languages': ['English']}),
 (0.754202745445488,
  {'name': 'South Korea',
   'code': 'KR',
   'capital': 'Seoul',
   'continent': 'Asia',
   'languages': ['Korean']}),
 (0.7506523769140084,
  {'name': 'New Zealand',
   'code': 'NZ',
   'capital': 'Wellington',
   'continent': 'Oceania',
   'languages': ['English', 'Māori']}),
 (0.7504190890778679,
  {'name': 'United States',
   'code': 'US',
   'capital': 'Washington, D.C.',
   'continent': 'North America',
   'languages': ['English']}),
 (0.7489726600700292,
  {'name': 'South Africa',
   'code': 'ZA',
   'capital': 'Pretoria',
   'continent': 'Africa',
   'languages': ['Zulu',
    'Xhosa',
    'Afrikaans',
    'English',
    'Northern Sotho',
    'Tswana',
    'Southern 