# Saving and Reading a Kodexa document as a file

### Setting up Imports

For these examples, we only need the Document and JsonDocumentStore classes from the kodexa package.

In [1]:
import os

from kodexa import Document, JsonDocumentStore

## Persisting as a file

Kodexa documents can be saved to disk by calling the Document's "to_kdxa(path)" method.

In [2]:

KDXA_TMP_PATH = '/tmp/kdxa_output/'

# It is recommended that you use the .kdxa file extension on your files.
test_file_location = os.path.join(KDXA_TMP_PATH, 'test_doc1.kdxa')

# Create the output directory if it does not already exist
os.makedirs(KDXA_TMP_PATH, exist_ok=True)

# Create a basic document from text
new_doc = Document.from_text("This is my new document")

# Save this document to disk.
# You must provide the full path of the file, including the file name.
new_doc.to_kdxa(test_file_location)


# The document is now persisted on disk at the path held in test_file_location
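
One way to confirm that a save actually produced a file is to check the path with the standard library. The sketch below is only an illustration: `describe_saved_file` is a hypothetical helper, not part of kodexa, and the demonstration writes a stand-in file so that it runs without kodexa installed.

```python
import tempfile
from pathlib import Path


def describe_saved_file(path):
    """Return (exists, size_in_bytes) for a path; size is 0 if the file is missing."""
    file_path = Path(path)
    if not file_path.is_file():
        return (False, 0)
    return (True, file_path.stat().st_size)


# Demonstration with a stand-in file rather than a real Kodexa document.
with tempfile.TemporaryDirectory() as tmp_dir:
    stand_in = Path(tmp_dir) / 'example.kdxa'
    stand_in.write_bytes(b'placeholder')
    print(describe_saved_file(stand_in))
    print(describe_saved_file(Path(tmp_dir) / 'missing.kdxa'))
```

In the notebook itself, the same helper could be called with test_file_location after the save step.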

## Reading from file

We'll read the document that was just saved into a new Kodexa Document instance.

In [3]:
# Load the saved file into a new Document instance

read_doc = Document.from_kdxa(test_file_location)

print(f"The restored document's content is:\n{read_doc.get_root().get_all_content()}")

The restored document's content is:
This is my new document


## Saving in JsonDocumentStores

You can also create JsonDocumentStores and add documents to them outside of pipeline steps. A JsonDocumentStore is saved to disk as a set of files.

In [4]:

JSON_STORE_PATH = '/tmp/json_store_doc_ex/'

doc_1 = Document.from_text('The sun is very bright today.')
doc_2 = Document.from_text('Fluffy clouds float through the sky.')

# Create an empty store. Setting force_initialize to True deletes any store that already exists at this path.
json_doc_store = JsonDocumentStore(JSON_STORE_PATH, force_initialize=True)

print(f'Upon initialization, there are {json_doc_store.count()} documents in the store.')

json_doc_store.add(doc_1)
json_doc_store.add(doc_2)
      
print(f'After adding the documents, there are now {json_doc_store.count()} documents in the store.')

Upon initialization, there are 0 documents in the store.
After adding the documents, there are now 2 documents in the store.


## JsonDocumentStores are already saved as files

You can browse the path that was passed to the JsonDocumentStore constructor to view the persisted files.

In [5]:
from os import listdir

for f in listdir(JSON_STORE_PATH):
    print(f)


index.idx
7592b89f-5fb0-4cfc-998e-ffb3855759f0.json
c14db2b8-8b59-49fc-be23-d14458da1e51.json
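
To make the layout of a store directory easier to inspect, the listing can be split into index files and document files. This is a minimal sketch that assumes, based only on the listing above, that each document is stored as `<uuid>.json` and that everything else (such as `index.idx`) is index data. `split_store_listing` is a hypothetical helper, not a kodexa function, and the demonstration uses placeholder files.

```python
import tempfile
from pathlib import Path


def split_store_listing(store_path):
    """Return (index_files, document_ids) for the files in a store directory."""
    index_files, document_ids = [], []
    for entry in sorted(Path(store_path).iterdir()):
        if entry.suffix == '.json':
            # Assumption: the file name without '.json' is the document's UUID.
            document_ids.append(entry.stem)
        else:
            index_files.append(entry.name)
    return index_files, document_ids


# Demonstration with placeholder files that mimic the listing above.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('index.idx',
                 '7592b89f-5fb0-4cfc-998e-ffb3855759f0.json',
                 'c14db2b8-8b59-49fc-be23-d14458da1e51.json'):
        (Path(tmp_dir) / name).write_text('{}')
    print(split_store_listing(tmp_dir))
```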


## Reading/loading Files from JsonDocumentStores

If you've created a JsonDocumentStore and want to read the Kodexa documents within it, create a JsonDocumentStore instance and provide the location of the folder containing the index file and the document .json files. You can then read documents by index or by UUID.

In [6]:

# Create a new JsonDocumentStore instance using the JSON_STORE_PATH from the previous step.
# We are not forcing initialization because we don't want to delete the contents.
new_json_store = JsonDocumentStore(JSON_STORE_PATH)

print(f'There are {new_json_store.count()} documents in the store\n')

# We can get documents from the store by their position in the index (a list)
doc_by_index = new_json_store.get(0)

print(f'Getting document by index.\n\tThe uuid of the document at index 0 is: {doc_by_index.uuid}\n')

# We can also get documents from the store using their UUID
doc_by_uuid = new_json_store.load(doc_by_index.uuid)

print(f'Getting document by uuid.\n\tThe contents of the document with uuid {doc_by_index.uuid} is: {doc_by_uuid.get_root().get_all_content()}\n')


There are 2 documents in the store

Getting document by index.
	The uuid of the document at index 0 is: 7592b89f-5fb0-4cfc-998e-ffb3855759f0

Getting document by uuid.
	The contents of the document with uuid 7592b89f-5fb0-4cfc-998e-ffb3855759f0 is: The sun is very bright today.
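
Reading by index extends naturally to walking the whole store. The sketch below relies only on the `count()` and `get(index)` calls used above. `iter_store_documents` is a hypothetical helper, and `FakeStore` is a stand-in class (not part of kodexa) so that the example runs without kodexa installed.

```python
def iter_store_documents(store):
    """Yield every document in a store, in index order."""
    for position in range(store.count()):
        yield store.get(position)


class FakeStore:
    """Hypothetical stand-in that mimics only count() and get(index)."""

    def __init__(self, documents):
        self._documents = list(documents)

    def count(self):
        return len(self._documents)

    def get(self, index):
        return self._documents[index]


# Demonstration with plain strings in place of Kodexa documents.
for doc in iter_store_documents(FakeStore(['first', 'second'])):
    print(doc)
```

With a real store, passing new_json_store to the helper should visit each stored document in turn, assuming `get` accepts every index below `count()`.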

