Using XTDB over HTTP from Python with requests.

In [1]:
import py_xtdb as xt
import pandas as pd

We can check whether a server is running with `xt.status()`.

In [30]:
xt.status(host="http://localhost:3001")

{'version': '1.21.0',
 'revision': 'db66ed6d3aa4e814ba34e988a5d898381dec6c81',
 'kvStore': 'xtdb.rocksdb.RocksKv',
 'estimateNumKeys': 167017,
 'size': 12383093,
 'indexVersion': 20,
 'consumerState': None}

(You'll probably have no keys if this is your first run.)

We'll use Python's Faker library to create some fake data to transact.

In [3]:
from faker import Faker
import random 
fake = Faker()

def fake_doc():
    """Return one fake document as a dict ready to transact."""
    return {
        "name": fake.name(),
        "city": fake.city(),
        "state": fake.state(),
        "address": fake.address(),
        # Small id range, so ids repeat across docs;
        # swap in fake.uuid4() for unique ids.
        "xt/id": random.randint(1, 10),
        # "xt/id": fake.uuid4(),
        "observation-date": fake.date_time_between(
            start_date="-15y", end_date="now").isoformat(),
    }

docs = [fake_doc() for _ in range(20)] 
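Since `xt/id` is drawn from 1 to 10 and we generate 20 docs, some ids have to repeat. A minimal sketch of that, using only the standard library (the seed and variable names are just for illustration):

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is repeatable

# Same id scheme as fake_doc(): 20 ids drawn from 1..10.
ids = [random.randint(1, 10) for _ in range(20)]

counts = Counter(ids)
repeated = {doc_id: n for doc_id, n in counts.items() if n > 1}

print(f"{len(counts)} distinct ids across {len(ids)} docs")
print("ids used more than once:", sorted(repeated))
```

Repeated ids are presumably why a single entity ends up with several history entries later on.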

In [4]:
pd.DataFrame(docs)

Unnamed: 0,name,city,state,address,xt/id,observation-date
0,Matthew Snyder,Port Christopher,Rhode Island,98912 Lauren Alley Apt. 104\nNorth Davidcheste...,2,2013-04-05T01:04:05
1,Monique Kaufman,Strongfort,Massachusetts,"255 Robert Lights Apt. 802\nPort Lisa, RI 99031",7,2019-07-25T10:56:13
2,Brian Anderson,Deleonberg,Maryland,"269 Tucker Heights Suite 011\nMartinville, OR ...",3,2015-10-26T01:26:58
3,Robert Johnson,New Kimberly,New Hampshire,"71617 Huang Ramp\nNorth Nathanielshire, RI 14862",4,2008-11-30T11:14:10
4,David Colon,East Richard,Texas,Unit 8047 Box 1890\nDPO AA 32252,7,2019-12-04T03:23:16
5,Kyle Watkins,Patrickberg,Maine,"1520 Lori Plaza\nLake Kelly, VT 77251",4,2010-05-07T00:54:12
6,Diana Fischer,West Jonathanport,South Carolina,"003 Rosario Ramp\nRayton, MS 93366",2,2021-06-26T16:33:13
7,Bradley Brown,Floresview,Florida,"93212 Carrillo Junctions\nCardenashaven, CA 50600",6,2011-06-03T09:31:14
8,Stephanie Leonard,Aaronville,Ohio,"668 Short Knolls\nMargaretfort, ME 97472",4,2012-11-22T12:08:29
9,Michael Terry,East Nicholas,California,"52143 Leah Run\nDaltonshire, NM 74899",3,2008-09-07T09:40:11


We can transact those documents into xtdb with the following: 

In [4]:
xt.submit_tx(host="http://localhost:3001", docs=docs)

{'txId': 21, 'txTime': '2022-05-17T04:53:50Z'}
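For a rough idea of what goes over the wire: as far as I understand the XTDB 1.x HTTP API, a transaction is a JSON body with a `tx-ops` list of `["put", doc]` pairs, POSTed to `/_xtdb/submit-tx`. The sketch below only builds that body; `build_put_payload` is a hypothetical helper, not part of `py_xtdb`, and the payload shape is an assumption to check against the XTDB docs.

```python
import json

def build_put_payload(docs):
    """Wrap each document in a "put" operation (assumed XTDB 1.x JSON shape)."""
    return {"tx-ops": [["put", doc] for doc in docs]}

example_docs = [
    {"xt/id": 1, "name": "Katie Calhoun", "city": "Matthewside"},
    {"xt/id": 2, "name": "Sara Marsh", "city": "South Victor"},
]

payload = build_put_payload(example_docs)
body = json.dumps(payload)  # this string would be the POST body
print(body)
```

All the docs in one call share a single transaction, which matches getting one `txId` back per `submit_tx` call.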

And now we should see some key stats: 

In [5]:
xt.attribute_stats()

{'address': 5261,
 'fav-number': 1,
 'name': 5261,
 'city': 5261,
 'observation-date': 5261,
 'state': 5261,
 'observation-date2': 220,
 'xt/id': 5261}

To query them, something like this should work:

In [6]:
xt.query_edn(host="http://localhost:3001",
             data="""
             {:query {:find [(pull ?id [*])]
                      :where [[?id :xt/id]
                              [?id :name ?name]
                              [?id :address ?address]]
                      :limit 2}}
             """)
    

[[{'address': '26590 Victoria Trail\nPort Bryce, AR 37777',
   'name': 'Katie Calhoun',
   'city': 'Matthewside',
   'observation-date': '2008-12-18T07:41:59',
   'state': 'Michigan',
   'xt/id': 1}],
 [{'address': '249 Freeman Roads\nEast Donna, CO 06556',
   'name': 'Sara Marsh',
   'city': 'South Victor',
   'observation-date': '2012-06-15T10:10:37',
   'state': 'Ohio',
   'xt/id': 2}]]
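Note the shape of that result: each row is a list with one element per `:find` column, so a single `(pull ...)` gives one-element lists. A small sketch of unwrapping them, using two abridged rows copied from the output above:

```python
# Two rows abridged from the output above; each row is a one-element list.
rows = [
    [{"name": "Katie Calhoun", "city": "Matthewside", "xt/id": 1}],
    [{"name": "Sara Marsh", "city": "South Victor", "xt/id": 2}],
]

# Unwrap the single (pull ...) column so each row becomes a plain dict.
records = [row[0] for row in rows]

for record in records:
    print(record["xt/id"], record["name"], record["city"])
```

A list of dicts like `records` can be handed straight to `pd.DataFrame`, which gives one column per attribute.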

If you pass `:keys` into an XTDB query, the result can be read directly into a DataFrame:

In [7]:

pd.DataFrame(
    xt.query_edn(host="http://localhost:3001",
                 data="""
                 {:query {:find [?name ?city]
                          :keys [name city]
                          :where [[?id :city ?city]
                                  [?id :name ?name]]
                          :limit 200}}
                 """))

    

Unnamed: 0,name,city
0,Katie Calhoun,Matthewside
1,Sara Marsh,South Victor
2,Jessica Bowers,South Latoyamouth
3,Amber Harvey,West Steve
4,Alan Gutierrez,Clarkhaven
5,Eric Walker,Janetview
6,Derek Rice,Mckaymouth
7,Diane Coleman,Christopherfort
8,Bailey Tucker,Lake Kathrynberg
9,David Cortez,Gallegosmouth


You can grab a specific entity with `xt.entity_json`:

In [8]:
our_doc = xt.entity_json(params={"eid-json":"1"})
our_doc

{'address': '26590 Victoria Trail\nPort Bryce, AR 37777',
 'name': 'Katie Calhoun',
 'city': 'Matthewside',
 'observation-date': '2008-12-18T07:41:59',
 'state': 'Michigan',
 'xt/id': 1}

We can add some history for this id. Let's add a favorite number:

In [9]:
our_doc['fav-number'] = 9


In [11]:
xt.submit_tx(docs=[our_doc,]) 

{'txId': 22, 'txTime': '2022-05-17T04:54:09Z'}

Now we can see some doc history:

In [14]:
pd.DataFrame(xt.entity_json(params={"eid-json":"1",  "with-docs": "false", "history": "true", "sort-order": "desc"}))

Unnamed: 0,txTime,txId,validTime,contentHash
0,2022-05-17T04:54:09Z,22,2022-05-17T04:54:09Z,db25cbb334f43fb942933bc1b219d02cc24e819b
1,2022-05-17T04:53:50Z,21,2022-05-17T04:53:50Z,40b3f649e0e79830d404d44c3511938990bd7b17
2,2022-05-17T04:53:07Z,19,2022-05-17T04:53:07Z,0121fac411c78136594cbd922b5b8f21571c288f
3,2022-05-17T04:02:13Z,18,2022-05-17T04:02:13Z,99217824097a03d9338f5467ebfc5a517818d977
4,2022-05-17T04:02:13Z,17,2022-05-17T04:02:13Z,77e2df34ac57fe086e2e6c8a354fa6ea23eaa4aa
5,2022-05-17T04:02:13Z,16,2022-05-17T04:02:13Z,0f0c72854f3eff41d92408f146ad822f548b90af
6,2022-05-17T04:02:13Z,15,2022-05-17T04:02:13Z,2e38c6c1b944c635e205f6d1e20368766b171ff0
7,2022-05-17T04:02:13Z,14,2022-05-17T04:02:13Z,c549b2ecd9dc8f7400f3dfadfd18ff304ead072b
8,2022-05-17T04:02:12Z,13,2022-05-17T04:02:12Z,e1f094ae276a67140e9598df9d25825fe0a1a9cd
9,2022-05-17T04:02:12Z,12,2022-05-17T04:02:12Z,5dc65d26b30ccb5a32c8b2bc13f3b76eae7c788c
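The `params` dict presumably ends up as a query string on the entity endpoint. Here is a sketch of that, assuming the XTDB 1.x path `/_xtdb/entity` and that `eid-json` carries the id encoded as JSON; `entity_url` is a hypothetical helper, not something `py_xtdb` exposes.

```python
import json
from urllib.parse import urlencode

def entity_url(host, eid, **options):
    """Build a URL for the assumed XTDB 1.x entity endpoint."""
    # The id is JSON-encoded, so 1 stays 1 and "abc" becomes "\"abc\"".
    params = {"eid-json": json.dumps(eid)}
    params.update(options)
    return f"{host}/_xtdb/entity?{urlencode(params)}"

url = entity_url(
    "http://localhost:3001",
    1,
    **{"history": "true", "sort-order": "desc", "with-docs": "false"},
)
print(url)
```

If that assumption holds, a string id would need its quotes inside the parameter value, which is easy to miss when the ids are integers as they are here.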


----
Let's add some more fake docs and try one of the Lucene queries:

In [15]:
from toolz import partition_all 

In [16]:
docs = [fake_doc() for _ in range(5000)] 

We might want to batch up the transactions, and toolz's `partition_all` can do that; here I'm batching docs into partitions at most 300 long:

In [18]:
[xt.submit_tx(host="http://localhost:3001", docs=docs_batch) for docs_batch in partition_all(300, docs)] 

[{'txId': 23, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 24, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 25, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 26, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 27, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 28, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 29, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 30, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 31, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 32, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 33, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 34, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 35, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 36, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 37, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 38, 'txTime': '2022-05-17T04:54:58Z'},
 {'txId': 39, 'txTime': '2022-05-17T04:54:58Z'}]
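The 17 transactions above follow from the arithmetic: 5000 docs in batches of 300 is 16 full batches plus one of 200. If you'd rather not depend on toolz, a standard-library sketch of the same batching could look like this (`batches` is a made-up name, and it yields lists where `partition_all` yields tuples):

```python
from itertools import islice

def batches(items, size):
    """Yield successive lists of at most `size` items (like partition_all)."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

sizes = [len(batch) for batch in batches(range(5000), 300)]
print(len(sizes), sizes[0], sizes[-1])  # 17 300 200
```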

In [19]:
xt.attribute_stats()

{'address': 10262,
 'fav-number': 2,
 'name': 10262,
 'city': 10262,
 'observation-date': 10262,
 'state': 10262,
 'observation-date2': 220,
 'xt/id': 10262}

TODO: Think about the equivalent of SQL injection if you build queries with string manipulation like this. It would probably be better to use XT's `:in` parameters.
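A rough sketch of what the `:in` approach might look like. The search term is passed as data in `:in-args` rather than spliced into the `:where` clause. Several things here are assumptions to verify against the XTDB docs: that the HTTP query body accepts `:in-args` next to `:query`, that `wildcard-text-search` accepts a logic variable as its argument, and that `xt.query_edn` sends the string through unchanged. `edn_string` and `build_search_query` are hypothetical helpers.

```python
def edn_string(value):
    """Render a Python str as an EDN string literal."""
    # Escape backslashes first, then double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

# Assumed body shape: the term is bound to ?term through :in / :in-args.
QUERY = """
{:query {:find [?score ?attribute ?value (pull ?id [*])]
         :keys [matched-score matched-attribute matched-value results]
         :in [?term]
         :where [[(wildcard-text-search ?term) [[?id ?value ?attribute ?score]]]
                 [?id :xt/id]]
         :order-by [[?score :desc]]
         :limit 30}
 :in-args [%s]}"""

def build_search_query(term):
    """Return the EDN body with the term as an escaped :in-args value."""
    return QUERY % edn_string(term)

print(build_search_query("mouth~"))
```

The term still has to be escaped as an EDN string, but it can no longer change the structure of the `:where` clause; the worst a hostile term can do is be an odd search string.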

In [29]:
search_term = "mouth~"

edn_query = """
{:query
 {:find [?score
         ?attribute
         ?value
         (pull ?id [*])]
  :keys [matched-score matched-attribute matched-value results]
  :where [[(wildcard-text-search "%s") [[?id ?value ?attribute ?score]]]
          [?id :xt/id]]
  :order-by [[?score :desc]]
  :limit 30}}""" % search_term

results = xt.query_edn(host="http://localhost:3001", 
             data=edn_query)

display(pd.DataFrame(results))
display(results) 

Unnamed: 0,matched-score,matched-attribute,matched-value,results
0,1.528647,city,South Steven,{'address': '84081 Alexis Mission Suite 868 An...
1,1.146486,city,North Zachary,"{'address': '3339 Troy Street West Colin, IL 4..."
2,1.146486,city,North Katelynport,"{'address': '1972 Vickie Common Frankmouth, VA..."
3,0.923634,address,"28507 Jason Rest\nSouth Cesar, CA 97825","{'address': '28507 Jason Rest South Cesar, CA ..."


[{'matched-score': 1.5286474227905273,
  'matched-attribute': 'city',
  'matched-value': 'South Steven',
  'results': {'address': '84081 Alexis Mission Suite 868\nAnitaport, TN 83881',
   'name': 'Jason Johnston',
   'city': 'South Steven',
   'observation-date': '2010-03-31T09:34:12',
   'state': 'New Hampshire',
   'xt/id': 4}},
 {'matched-score': 1.1464855670928955,
  'matched-attribute': 'city',
  'matched-value': 'North Zachary',
  'results': {'address': '3339 Troy Street\nWest Colin, IL 43076',
   'name': 'Kevin Cooke',
   'city': 'North Zachary',
   'observation-date': '2020-03-29T01:05:02',
   'state': 'Louisiana',
   'xt/id': 6}},
 {'matched-score': 1.1464855670928955,
  'matched-attribute': 'city',
  'matched-value': 'North Katelynport',
  'results': {'address': '1972 Vickie Common\nFrankmouth, VA 71681',
   'name': 'Rebecca Dunn',
   'city': 'North Katelynport',
   'observation-date': '2017-01-21T06:34:26',
   'state': 'Arkansas',
   'xt/id': 9}},
 {'matched-score': 0.923634