Using XTDB from Python over its HTTP API.

In [1]:
import py_xtdb as xt
import pandas as pd

We can check whether a server is running with `xt.status()`.

In [2]:
xt.status()

{'version': '1.21.0',
 'revision': 'db66ed6d3aa4e814ba34e988a5d898381dec6c81',
 'kvStore': 'xtdb.rocksdb.RocksKv',
 'estimateNumKeys': 1,
 'size': 29592,
 'indexVersion': 20,
 'consumerState': None}

(You'll probably have no keys if this is your first go)
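
For reference, the same check can be made without the wrapper. This is only a sketch: it assumes the node exposes the XTDB 1.x `/_xtdb/status` endpoint on `localhost:3001`, and `xtdb_status` with its `opener` parameter is a hypothetical helper, not part of `py_xtdb`. It uses the standard library's `urllib` so it runs without extra installs; `requests` would work just as well.

```python
import json
from urllib.request import Request, urlopen


def xtdb_status(host="http://localhost:3001", opener=urlopen):
    """Fetch the node status as a dict (hypothetical helper, not py_xtdb's API).

    `opener` is injectable so the function can be exercised without a live node.
    """
    # Assumption: XTDB 1.x serves its status at /_xtdb/status.
    url = f"{host.rstrip('/')}/_xtdb/status"
    request = Request(url, headers={"Accept": "application/json"})
    with opener(request, timeout=5) as response:
        return json.load(response)
```

Calling `xtdb_status()` against a running node should return a dict shaped like the output above.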

We'll create some fake data to transact using Python's Faker library.

In [3]:
from faker import Faker
import random 
fake = Faker()

def fake_doc():
    return {"name"    : fake.name()    ,
            "city"    : fake.city()    ,
            "state"   : fake.state()   ,
            "address" : fake.address() ,
            "xt/id"   : random.randint(1, 10),
            #"xt/id"      : fake.uuid4(),
            "observation-date": fake.date_time_between(start_date='-15yr', end_date='now').isoformat(),
            }

docs = [fake_doc() for _ in range(20)] 
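
One thing to keep in mind: `random.randint(1, 10)` can only produce ten distinct ids, so the 20 documents necessarily share some. As far as I understand XTDB, documents put with the same `xt/id` become versions of a single entity, so queries later on can return at most ten entities. A small sketch of the collision count, using a fixed seed and plain integers as stand-ins for the generated ids:

```python
import random
from collections import Counter

random.seed(0)  # fixed seed only so the sketch is repeatable

# Stand-in for the "xt/id" values produced by fake_doc() above.
ids = [random.randint(1, 10) for _ in range(20)]

counts = Counter(ids)
print(len(counts), "distinct ids among", len(ids), "documents")
```

Switching to the commented-out `fake.uuid4()` id would give every document its own entity.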

In [4]:
pd.DataFrame(docs)

Unnamed: 0,name,city,state,address,xt/id,observation-date
0,Matthew Snyder,Port Christopher,Rhode Island,98912 Lauren Alley Apt. 104\nNorth Davidcheste...,2,2013-04-05T01:04:05
1,Monique Kaufman,Strongfort,Massachusetts,"255 Robert Lights Apt. 802\nPort Lisa, RI 99031",7,2019-07-25T10:56:13
2,Brian Anderson,Deleonberg,Maryland,"269 Tucker Heights Suite 011\nMartinville, OR ...",3,2015-10-26T01:26:58
3,Robert Johnson,New Kimberly,New Hampshire,"71617 Huang Ramp\nNorth Nathanielshire, RI 14862",4,2008-11-30T11:14:10
4,David Colon,East Richard,Texas,Unit 8047 Box 1890\nDPO AA 32252,7,2019-12-04T03:23:16
5,Kyle Watkins,Patrickberg,Maine,"1520 Lori Plaza\nLake Kelly, VT 77251",4,2010-05-07T00:54:12
6,Diana Fischer,West Jonathanport,South Carolina,"003 Rosario Ramp\nRayton, MS 93366",2,2021-06-26T16:33:13
7,Bradley Brown,Floresview,Florida,"93212 Carrillo Junctions\nCardenashaven, CA 50600",6,2011-06-03T09:31:14
8,Stephanie Leonard,Aaronville,Ohio,"668 Short Knolls\nMargaretfort, ME 97472",4,2012-11-22T12:08:29
9,Michael Terry,East Nicholas,California,"52143 Leah Run\nDaltonshire, NM 74899",3,2008-09-07T09:40:11


We can transact those documents into XTDB with the following:

In [5]:
xt.submit_tx(host="http://localhost:3001", recs=docs)

{'txId': 0, 'txTime': '2022-05-17T03:59:31Z'}
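
To make the request less of a black box: XTDB 1.x's JSON `submit-tx` endpoint takes a list of transaction operations, each document wrapped in a `put`. The sketch below shows that body shape under that assumption. `build_put_payload` is a hypothetical helper, and `py_xtdb` may well build its request differently.

```python
import json


def build_put_payload(docs):
    """Wrap each document in a put operation (hypothetical helper).

    Assumption: the body is POSTed as JSON to <host>/_xtdb/submit-tx.
    """
    return {"tx-ops": [["put", doc] for doc in docs]}


payload = build_put_payload([{"xt/id": 1, "name": "Joseph Dudley"}])
print(json.dumps(payload))
```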

Now the attribute stats should show a count for each key:

In [6]:
xt.attribute_stats()

{'address': 20,
 'name': 20,
 'city': 20,
 'observation-date': 20,
 'state': 20,
 'xt/id': 20}

To query them, something like this should work:

In [7]:
xt.query_edn(host="http://localhost:3001",
             data="""
             {:query {:find [(pull ?id [*])]
                      :where [[?id :xt/id]
                              [?id :name ?name]
                              [?id :address ?address]]
                      :limit 2}}
             """)

[[{'address': '3608 Wendy Land Suite 886\nShanefort, WI 33849',
   'name': 'Joseph Dudley',
   'city': 'Christinefort',
   'observation-date': '2019-10-12T03:19:55',
   'state': 'Kentucky',
   'xt/id': 1}],
 [{'address': '54196 Patrick Garden Suite 682\nSouth Barryview, MT 56518',
   'name': 'Alexandria Hester',
   'city': 'East David',
   'observation-date': '2014-12-14T04:01:07',
   'state': 'Minnesota',
   'xt/id': 2}]]

If you pass `:keys` in an XTDB query, the result can be read directly into a DataFrame:

In [8]:

pd.DataFrame(
    xt.query_edn(host="http://localhost:3001",
                 data="""
                 {:query {:find [?name ?city]
                          :keys [name city]
                          :where [[?id :city ?city]
                                  [?id :name ?name]]
                          :limit 200}}
                 """))

    

Unnamed: 0,name,city
0,Joseph Dudley,Christinefort
1,Alexandria Hester,East David
2,Michael Terry,East Nicholas
3,Stephanie Leonard,Aaronville
4,Bradley Brown,Floresview
5,William Brown,Brooksview
6,Lori Meadows,South Melissastad
7,Donald Kennedy,Robertton
8,Ashley Rodriguez,Andersonberg
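
Without `:keys`, each result row comes back as a plain list (as in the earlier `pull` query), so column names have to be attached on the client. A minimal sketch of that pairing, with a hypothetical helper `rows_to_records` and two rows copied from the output above:

```python
def rows_to_records(keys, rows):
    """Pair each positional result row with column names (hypothetical helper)."""
    return [dict(zip(keys, row)) for row in rows]


rows = [["Joseph Dudley", "Christinefort"],
        ["Alexandria Hester", "East David"]]

records = rows_to_records(["name", "city"], rows)
print(records)
```

A list of dicts like `records` is a shape `pd.DataFrame` accepts directly, which is why `:keys` makes the round trip so short.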


You can grab a specific entity with `xt.entity_json`:

In [9]:
our_doc = xt.entity_json(params={"eid-json":"1"})
our_doc

{'address': '3608 Wendy Land Suite 886\nShanefort, WI 33849',
 'name': 'Joseph Dudley',
 'city': 'Christinefort',
 'observation-date': '2019-10-12T03:19:55',
 'state': 'Kentucky',
 'xt/id': 1}

We can add some history for this id by adding a favourite number to the document:

In [10]:
our_doc['fav-number'] = 9


In [11]:
xt.submit_tx(recs=[our_doc,]) 

{'txId': 1, 'txTime': '2022-05-17T03:59:42Z'}

Now we can see some doc history:

In [12]:
xt.entity_json(params={"eid-json":"1",  "with-docs": "true", "history": "true", "sort-order": "desc"})

[{'txTime': '2022-05-17T03:59:42Z',
  'txId': 1,
  'validTime': '2022-05-17T03:59:42Z',
  'contentHash': '55f7d009bb175cddc9056f37f62db9d24471beef',
  'doc': {'address': '3608 Wendy Land Suite 886\nShanefort, WI 33849',
   'fav-number': 9,
   'name': 'Joseph Dudley',
   'city': 'Christinefort',
   'observation-date': '2019-10-12T03:19:55',
   'state': 'Kentucky',
   'xt/id': 1}},
 {'txTime': '2022-05-17T03:59:31Z',
  'txId': 0,
  'validTime': '2022-05-17T03:59:31Z',
  'contentHash': 'a1011eb66184023a7b1134349dc6065038914db1',
  'doc': {'address': '3608 Wendy Land Suite 886\nShanefort, WI 33849',
   'name': 'Joseph Dudley',
   'city': 'Christinefort',
   'observation-date': '2019-10-12T03:19:55',
   'state': 'Kentucky',
   'xt/id': 1}}]
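
With the history in hand, it is easy to see what changed between two versions. The sketch below uses a hypothetical helper `changed_keys` on trimmed copies of the two entries above; it is plain Python and does not call XTDB.

```python
def changed_keys(older, newer):
    """Map each key whose value differs to an (old, new) pair.

    None marks a key that is missing on one side.
    """
    keys = set(older) | set(newer)
    return {k: (older.get(k), newer.get(k))
            for k in keys
            if older.get(k) != newer.get(k)}


# Trimmed copies of the two history entries shown above (newest first).
history = [
    {"txId": 1, "doc": {"name": "Joseph Dudley", "fav-number": 9, "xt/id": 1}},
    {"txId": 0, "doc": {"name": "Joseph Dudley", "xt/id": 1}},
]

newest, previous = history[0]["doc"], history[1]["doc"]
print(changed_keys(previous, newest))  # {'fav-number': (None, 9)}
```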

----
Let's add some more fake docs and try one of the Lucene queries:

In [15]:
from toolz import partition_all 

In [13]:
docs = [fake_doc() for _ in range(5000)] 

We might want to batch up the transactions, and toolz's `partition_all` is handy for that. Here I'm splitting the docs into batches of at most 300:

In [17]:
[xt.submit_tx(host="http://localhost:3001", recs=docs_batch) for docs_batch in partition_all(300, docs)] 

[{'txId': 2, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 3, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 4, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 5, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 6, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 7, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 8, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 9, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 10, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 11, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 12, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 13, 'txTime': '2022-05-17T04:02:12Z'},
 {'txId': 14, 'txTime': '2022-05-17T04:02:13Z'},
 {'txId': 15, 'txTime': '2022-05-17T04:02:13Z'},
 {'txId': 16, 'txTime': '2022-05-17T04:02:13Z'},
 {'txId': 17, 'txTime': '2022-05-17T04:02:13Z'},
 {'txId': 18, 'txTime': '2022-05-17T04:02:13Z'}]
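
If you'd rather not depend on toolz, the same batching can be sketched with the standard library. `batches` is a hypothetical stand-in for `partition_all`, not a drop-in replacement for every case toolz handles.

```python
from itertools import islice


def batches(items, size):
    """Yield successive tuples of at most `size` items; the last may be shorter."""
    iterator = iter(items)
    while True:
        batch = tuple(islice(iterator, size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in batches(range(5000), 300)]
print(len(sizes), sizes[-1])  # 17 200
```

5000 docs in batches of 300 gives 16 full batches plus one of 200, which matches the 17 transactions (`txId` 2 to 18) above.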

In [18]:
xt.attribute_stats()

{'address': 5021,
 'fav-number': 1,
 'name': 5021,
 'city': 5021,
 'observation-date': 5021,
 'state': 5021,
 'xt/id': 5021}

TODO: Think about the equivalent of SQL injection if you build queries with string manipulation like this. It would probably be better to use XT's `:in` parameters.

In [37]:
search_term = "ford~"

edn_query = """
{:query 
{:find [?attribute
         ?score
         (pull ?id [*])]
  :keys [matched-attribute matched-score results]               
  :where [[(wildcard-text-search "%s") [[?id ?value ?attribute ?score]]]
          [?id :xt/id]                 
          ]
  :limit 30}  }""" % search_term 

results = xt.query_edn(host="http://localhost:3001", 
             data=edn_query)
display(results) 
pd.DataFrame(results)

[{'matched-attribute': 'address',
  'matched-score': 0.9839632511138916,
  'results': {'address': '1017 Jerry Road Apt. 381\nPort Sandramouth, VA 61208',
   'name': 'Michael Davis',
   'city': 'Josephside',
   'observation-date': '2020-12-08T00:18:55',
   'state': 'North Carolina',
   'xt/id': 2}},
 {'matched-attribute': 'name',
  'matched-score': 0.9678977131843567,
  'results': {'address': '890 Charles Cove\nSamanthashire, VA 60393',
   'name': 'Robert York',
   'city': 'Mariachester',
   'observation-date': '2010-07-12T02:11:07',
   'state': 'Virginia',
   'xt/id': 3}},
 {'matched-attribute': 'address',
  'matched-score': 0.7379724979400635,
  'results': {'address': '50177 Sanchez Fork Apt. 193\nWest Peterberg, MA 21099',
   'name': 'Jacqueline Khan',
   'city': 'East Lindseyton',
   'observation-date': '2010-02-13T07:12:13',
   'state': 'Nebraska',
   'xt/id': 8}}]

Unnamed: 0,matched-attribute,matched-score,results
0,address,0.983963,{'address': '1017 Jerry Road Apt. 381 Port San...
1,name,0.967898,"{'address': '890 Charles Cove Samanthashire, V..."
2,address,0.737972,{'address': '50177 Sanchez Fork Apt. 193 West ...
