Using XTDB over HTTP from Python (via requests).

In [1]:
import py_xtdb as xt
import pandas as pd

We can check whether a server is running with `xt.status()`.

In [2]:
xt.status(host="http://localhost:3001")

{'version': '1.21.0',
 'revision': 'db66ed6d3aa4e814ba34e988a5d898381dec6c81',
 'kvStore': 'xtdb.rocksdb.RocksKv',
 'estimateNumKeys': 1,
 'size': 29592,
 'indexVersion': 20,
 'consumerState': None}

(You'll probably have no keys if this is your first run.)

We'll create some fake data to transact using Python's Faker library.

In [3]:
from faker import Faker
import random 
fake = Faker()

def fake_doc():
    return {"name"    : fake.name()    ,
            "city"    : fake.city()    ,
            "state"   : fake.state()   ,
            "address" : fake.address() ,
            #"xt/id"   : random.randint(1, 10),
            "xt/id"      : str(fake.uuid4()),
            "observation-date": fake.date_time_between(start_date='-15yr', end_date='now').isoformat(),
            }

docs = [fake_doc() for _ in range(20)] 

In [5]:
pd.DataFrame(docs).head()

Unnamed: 0,name,city,state,address,xt/id,observation-date
0,Courtney Swanson,North Heatherhaven,Florida,"3650 Huynh Tunnel\nLake Aliciafort, DC 49657",ce613502-0926-4957-975d-a067a36ccf5d,2018-08-26T10:06:47
1,Stephanie Colon,Makaylaburgh,North Carolina,"07405 Davis Trafficway Apt. 328\nSonyaside, SD...",9306eed4-f518-4afa-821a-4d84eafda027,2012-04-01T11:28:24
2,Melanie Boone,Brendachester,Connecticut,"097 Scott Route Apt. 250\nWest Reginaberg, NC ...",e984bce4-8218-4e0a-ad2d-109dae04c972,2012-01-30T10:40:06
3,Samantha Cameron,North John,Arizona,"2446 Alvarez Highway Suite 844\nNew Daniel, WA...",e2e35743-3c51-4136-90bd-7b55be258489,2022-02-16T00:03:40
4,Renee Salinas,Shelbyberg,Rhode Island,067 Wilson Villages Suite 251\nSouth Katrinavi...,3146f6c1-dfaa-479d-9480-ab21c8601c2d,2013-04-14T23:06:49


We can transact those documents into xtdb with the following: 

In [6]:
xt.submit_tx(host="http://localhost:3001", docs=docs)

{'txId': 0, 'txTime': '2022-05-17T05:56:25Z'}
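
For a rough idea of what a call like this sends over the wire, here is a minimal sketch of a transaction payload. It assumes the XTDB 1.x JSON HTTP API shape (a `tx-ops` list of `put` operations, posted to `/_xtdb/submit-tx`); `build_put_payload` is a hypothetical helper and not part of `py_xtdb`, whose internals may well differ.

```python
import json

def build_put_payload(docs):
    """Wrap each document in a `put` operation.

    Assumption: this mirrors the XTDB 1.x JSON body for /_xtdb/submit-tx.
    Hypothetical helper, not part of py_xtdb.
    """
    for doc in docs:
        # Every XTDB document needs an id, so fail early if one is missing.
        if "xt/id" not in doc:
            raise ValueError("every XTDB document needs an 'xt/id' key")
    return {"tx-ops": [["put", doc] for doc in docs]}

example_docs = [
    {"xt/id": "doc-1", "name": "Ada"},
    {"xt/id": "doc-2", "name": "Grace"},
]
payload = build_put_payload(example_docs)
print(json.dumps(payload, indent=2))
```

Checking for `xt/id` on the client side gives a clearer error than waiting for the server to reject the transaction.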

And now we should see some attribute stats:

In [7]:
xt.attribute_stats()

{'address': 20,
 'name': 20,
 'city': 20,
 'observation-date': 20,
 'state': 20,
 'xt/id': 20}

To query them, something like this should work:

In [8]:
xt.query_edn(host="http://localhost:3001", 
             data="""
          {:query {:find [ (pull ?id [*])]         
         :where [[?id :xt/id]
                 [?id :name ?name]
                 [?id :address ?address]]
         :limit 2}}             
             """)  
    

[[{'address': 'Unit 4441 Box 0757\nDPO AP 77979',
   'name': 'Nicholas Morgan',
   'city': 'Port Jennifer',
   'observation-date': '2019-09-20T11:43:53',
   'state': 'Indiana',
   'xt/id': '0dd21fef-a88d-4237-9bcc-a143d5530cd7'}],
 [{'address': '047 Lauren Trace Apt. 159\nPort Nicholas, RI 65034',
   'name': 'Laura Liu',
   'city': 'South Rachelview',
   'observation-date': '2008-03-06T06:59:13',
   'state': 'Illinois',
   'xt/id': '247cb3c6-0f01-4b10-aee5-201a1a5497d1'}]]

If you pass `:keys` in an XTDB query, the result can be read directly into a DataFrame:

In [9]:

pd.DataFrame(
xt.query_edn(host="http://localhost:3001", 
             data="""
          {
         :query {:find [?id ?name ?city]         
         :keys [id name city] 
         :where [[?id :city ?city]
                 [?id :name ?name]
                 ]
         :limit 5}}             
             """))

    

Unnamed: 0,id,name,city
0,0dd21fef-a88d-4237-9bcc-a143d5530cd7,Nicholas Morgan,Port Jennifer
1,247cb3c6-0f01-4b10-aee5-201a1a5497d1,Laura Liu,South Rachelview
2,2b48606e-a092-49a9-aac3-d125ab4ef4d1,Justin Roman,Lewisbury
3,3146f6c1-dfaa-479d-9480-ab21c8601c2d,Renee Salinas,Shelbyberg
4,46d907e9-20b9-43c2-b1b5-b9da9997b3de,Tim Howell,Davisland
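
Without `:keys` the query results come back as a list of lists (as in the earlier `pull` query), so the column names are lost. A minimal sketch of doing the same labelling on the client side, in plain Python; `rows_to_records` is a hypothetical helper and the sample rows are copied from the output above.

```python
def rows_to_records(rows, keys):
    """Turn positional result rows into dicts, roughly what `:keys` gives you.

    Hypothetical helper for illustration, not part of py_xtdb.
    """
    records = []
    for row in rows:
        # Guard against a row whose width does not match the key list.
        if len(row) != len(keys):
            raise ValueError(f"expected {len(keys)} values, got {len(row)}")
        records.append(dict(zip(keys, row)))
    return records

rows = [
    ["0dd21fef-a88d-4237-9bcc-a143d5530cd7", "Nicholas Morgan", "Port Jennifer"],
    ["247cb3c6-0f01-4b10-aee5-201a1a5497d1", "Laura Liu", "South Rachelview"],
]
records = rows_to_records(rows, ["id", "name", "city"])
print(records[0]["name"])
```

A list of dicts like `records` is one of the shapes `pd.DataFrame` accepts directly, which is why the `:keys` version needs no extra work.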


You can grab a specific entity with `xt.entity_json`:

In [11]:
our_doc = xt.entity_json(params={"eid":"0dd21fef-a88d-4237-9bcc-a143d5530cd7"})
our_doc

{'address': 'Unit 4441 Box 0757\nDPO AP 77979',
 'name': 'Nicholas Morgan',
 'city': 'Port Jennifer',
 'observation-date': '2019-09-20T11:43:53',
 'state': 'Indiana',
 'xt/id': '0dd21fef-a88d-4237-9bcc-a143d5530cd7'}

We can add some history for this ID. Let's add a favorite number:

In [12]:
our_doc['fav-number'] = 9


In [13]:
xt.submit_tx(docs=[our_doc,]) 

{'txId': 1, 'txTime': '2022-05-17T05:57:02Z'}

Now we can see some doc history:

In [17]:
pd.DataFrame(xt.entity_json(params={"eid":"0dd21fef-a88d-4237-9bcc-a143d5530cd7",  "with-docs": "true", "history": "true", "sort-order": "desc"}))

Unnamed: 0,txTime,txId,validTime,contentHash,doc
0,2022-05-17T05:57:02Z,1,2022-05-17T05:57:02Z,f4625c1ef512833097beb9674705fcd70a229ae4,"{'address': 'Unit 4441 Box 0757 DPO AP 77979',..."
1,2022-05-17T05:56:25Z,0,2022-05-17T05:56:25Z,85be985975996b645fc3ce0a74df7837f1e9e948,"{'address': 'Unit 4441 Box 0757 DPO AP 77979',..."
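
To see what actually changed between two versions, the `doc` values can be compared directly. A minimal sketch, assuming the history comes back as a list of dicts with `txId` and `doc` keys, newest first (as in the output above); `diff_docs` is a hypothetical helper and the documents are abbreviated.

```python
def diff_docs(old, new):
    """Report keys added, removed, or changed between two document versions.

    Hypothetical helper for illustration, not part of py_xtdb.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"added": added, "removed": removed, "changed": changed}

# Abbreviated stand-in for the history response, sorted newest first.
history = [
    {"txId": 1, "doc": {"xt/id": "0dd21fef-a88d-4237-9bcc-a143d5530cd7",
                        "name": "Nicholas Morgan", "fav-number": 9}},
    {"txId": 0, "doc": {"xt/id": "0dd21fef-a88d-4237-9bcc-a143d5530cd7",
                        "name": "Nicholas Morgan"}},
]
diff = diff_docs(old=history[1]["doc"], new=history[0]["doc"])
print(diff)
```

Here the only difference is the `fav-number` key added in the second transaction.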


----
Let's add some additional fake docs and try one of the Lucene queries:

In [18]:
from toolz import partition_all 

In [19]:
docs = [fake_doc() for _ in range(5000)] 

We might want to batch up transactions, and toolz's `partition_all` can do that; here I'm batching docs into partitions of at most 300:

In [20]:
[xt.submit_tx(host="http://localhost:3001", docs=docs_batch) for docs_batch in partition_all(300, docs)] 

[{'txId': 2, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 3, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 4, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 5, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 6, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 7, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 8, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 9, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 10, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 11, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 12, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 13, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 14, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 15, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 16, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 17, 'txTime': '2022-05-17T05:57:47Z'},
 {'txId': 18, 'txTime': '2022-05-17T05:57:47Z'}]
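
If you'd rather not depend on toolz, the same batching can be sketched with the standard library. `batched` here is a hypothetical stand-in for `partition_all` (it yields lists rather than tuples), and it shows why 5000 docs in batches of 300 produce the 17 transactions above.

```python
from itertools import islice

def batched(items, size):
    """Yield successive lists of at most `size` items from `items`.

    Hypothetical stand-in for toolz.partition_all, using only the stdlib.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

batches = list(batched(range(5000), 300))
# 16 full batches of 300 plus one final batch of 200.
print(len(batches), len(batches[0]), len(batches[-1]))  # prints: 17 300 200
```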

In [21]:
xt.attribute_stats()

{'address': 5021,
 'fav-number': 1,
 'name': 5021,
 'city': 5021,
 'observation-date': 5021,
 'state': 5021,
 'xt/id': 5021}

TODO: We probably need to think about the equivalent of SQL injection when building queries like this with string manipulation. It would probably be better to use XT's `:in` parameters.
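
A rough sketch of what the `:in` approach might look like: the search term is passed as an argument instead of being spliced into the `:where` clause, and it is escaped as an EDN string first. Several things here are assumptions rather than confirmed behaviour: that the HTTP query body accepts an `:in-args` vector next to `:query`, that `wildcard-text-search` accepts a bound variable such as `?term`, and that `xt.query_edn` passes the body through unchanged. `edn_string` and `build_search_query` are hypothetical helpers.

```python
def edn_string(value):
    """Render a Python str as an EDN string literal, escaping special characters."""
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'

def build_search_query(term, limit=10):
    """Build an EDN query body that passes `term` via :in / :in-args.

    Assumption: the XTDB HTTP API reads :in-args from the request body.
    Hypothetical helper, not part of py_xtdb.
    """
    return (
        "{:query {:find [?score ?attribute ?value (pull ?id [*])]\n"
        "         :keys [matched-score matched-attribute matched-value results]\n"
        "         :in [?term]\n"
        "         :where [[(wildcard-text-search ?term) [[?id ?value ?attribute ?score]]]\n"
        "                 [?id :xt/id]]\n"
        "         :order-by [[?score :desc]]\n"
        f"         :limit {int(limit)}}}\n"  # int() rejects non-numeric limits
        f" :in-args [{edn_string(term)}]}}"
    )

print(build_search_query("sabrina~"))
```

The point of the design is that user input only ever appears inside one escaped string literal, so a stray `"` or `]` in the term cannot change the structure of the query.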

In [26]:
edn_query = """
{:query 
{:find [ ?score
         ?attribute
         ?value
         (pull ?id [*])]
  :keys [matched-score matched-attribute matched-value results]               
  :where [[(wildcard-text-search "sabrina~") [[?id ?value ?attribute ?score]]]
          [?id :xt/id]                 
          ]
   :order-by [[?score :desc]]
  :limit 10}  }""" 

results = xt.query_edn(host="http://localhost:3001", 
             data=edn_query)

display(pd.DataFrame(results))
display(results) 

Unnamed: 0,matched-score,matched-attribute,matched-value,results
0,4.210673,name,Sabrina Smith,{'address': '4803 Stokes Cliffs Suite 912 Bass...
1,4.210673,name,Sabrina Velazquez,{'address': '61841 Kristi Squares Suite 379 Sa...
2,4.210673,name,Sabrina Flores,"{'address': 'PSC 7590, Box 1898 APO AA 57686',..."
3,4.210673,name,Sabrina Cox,{'address': '6637 Anthony Track Apt. 203 Lake ...
4,4.210673,name,Sabrina Henderson,"{'address': '21849 Gomez Green Maxborough, NC ..."
5,4.210673,name,Sabrina Li,{'address': '84866 Logan Village Suite 222 Ric...
6,4.210673,name,Sabrina Gomez,"{'address': '345 Cameron Mission South Travis,..."
7,3.007623,city,North Katrina,{'address': '1246 Chaney Loaf Suite 545 Matash...
8,3.007623,city,South Katrina,"{'address': '143 Vance Run Mitchellport, MI 65..."
9,3.007623,name,Katrina Camacho,"{'address': '44773 Eric Brooks New Sarahview, ..."


[{'matched-score': 4.210672855377197,
  'matched-attribute': 'name',
  'matched-value': 'Sabrina Smith',
  'results': {'address': '4803 Stokes Cliffs Suite 912\nBasschester, MT 75858',
   'name': 'Sabrina Smith',
   'city': 'East Ethan',
   'observation-date': '2007-07-16T16:25:01',
   'state': 'Michigan',
   'xt/id': '275c2df7-0e52-49dc-bb9d-8bdc937ea0a6'}},
 {'matched-score': 4.210672855377197,
  'matched-attribute': 'name',
  'matched-value': 'Sabrina Velazquez',
  'results': {'address': '61841 Kristi Squares Suite 379\nSarahchester, OK 69894',
   'name': 'Sabrina Velazquez',
   'city': 'Samanthaberg',
   'observation-date': '2021-01-14T00:26:17',
   'state': 'Idaho',
   'xt/id': '7dc5be92-ee84-4612-8aec-cb855c0466f7'}},
 {'matched-score': 4.210672855377197,
  'matched-attribute': 'name',
  'matched-value': 'Sabrina Flores',
  'results': {'address': 'PSC 7590, Box 1898\nAPO AA 57686',
   'name': 'Sabrina Flores',
   'city': 'Brookeport',
   'observation-date': '2012-01-01T01:31:13',