# Data Sinks and Stores

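In this notebook we parse text files with a Kodexa pipeline and save the resulting Kodexa Documents to a document store.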


## Setup our imports
1. We'll be building pipelines to process our documents, so we'll import Kodexa's Pipeline class.
2. Import all the sink and store types that we plan to try out.
3. Some of our parsing will need to happen in the cloud, so we'll import the KodexaPlatform and KodexaAction classes.
4. Every file that has been processed or parsed in Kodexa becomes a Kodexa Document, so we'll import the Document class as well.

We're also setting the CLOUD_URL value to the platform environment on which we want to perform our processing.

```python
from kodexa import (Document, Pipeline, KodexaPlatform, KodexaAction,
                    InMemoryDocumentSink, JsonDocumentStore, TableDataStore, DictDataStore)

CLOUD_URL = 'https://platform.kodexa.com'
```

## Set Platform Environment and Access Token Credential

In the next cell, you'll be prompted to enter the access token you created in the environment specified by CLOUD_URL.
If you haven't created a token already, follow the steps in our [Getting Started](https://developer.kodexa.com/org-management/manage-access-token) guide.

* Note: The text you enter in the prompt field will be masked. Once you've entered the access token, press Enter to complete the cell.

```python
import getpass

ACCESS_TOKEN = getpass.getpass("Enter access token:")

KodexaPlatform.set_url(CLOUD_URL)
KodexaPlatform.set_access_token(ACCESS_TOKEN)
```

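If you rerun the notebook often, you may prefer not to paste the token every time. Below is a minimal sketch of a hypothetical helper, not part of the Kodexa library, that reads the token from an environment variable and only falls back to the masked getpass prompt when the variable is unset or empty. The variable name KODEXA_ACCESS_TOKEN is our own assumption; Kodexa does not read it by itself.

```python
import getpass
import os


def get_access_token(env_var="KODEXA_ACCESS_TOKEN", prompt=getpass.getpass):
    """Return an access token from an environment variable, or prompt for it.

    KODEXA_ACCESS_TOKEN is a hypothetical name chosen for this sketch; the
    Kodexa library does not look it up on its own.
    """
    token = os.environ.get(env_var)
    if token:
        return token
    # Fall back to a masked interactive prompt, like the getpass cell above
    return prompt("Enter access token:")
```

With this helper, the token could be passed as KodexaPlatform.set_access_token(get_access_token()) instead of prompting unconditionally.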


```python
import os

# Setting up location of data folders and files
DATA_FOLDER = '_data'
TEXT_FOLDER = 'texts'
JSON_STORE_FOLDER = 'json_doc_stores'
TEXT_DATA_FILE = 'tongue_twister.txt'

TEXT_FOLDER_PATH = os.path.join(os.getcwd(), '..', DATA_FOLDER, TEXT_FOLDER)
FULL_PATH = os.path.join(TEXT_FOLDER_PATH, TEXT_DATA_FILE)
JSON_STORE_PATH = os.path.join(os.getcwd(), '..', DATA_FOLDER, JSON_STORE_FOLDER, 'text_json_store.json')
```
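The same locations can be expressed with pathlib, which also makes it easy to confirm that the input files exist before any pipeline runs. This is a sketch of an alternative rather than part of the notebook's setup; missing_inputs is a hypothetical helper name. One difference: Path.cwd().parent resolves the parent directory instead of keeping a literal '..' in the path, and you may need str(...) when handing these paths to Kodexa classes.

```python
from pathlib import Path

# Same layout as the cell above: <parent of cwd>/_data/texts and _data/json_doc_stores
DATA_DIR = Path.cwd().parent / "_data"
TEXT_FOLDER_PATH = DATA_DIR / "texts"
FULL_PATH = TEXT_FOLDER_PATH / "tongue_twister.txt"
JSON_STORE_PATH = DATA_DIR / "json_doc_stores" / "text_json_store.json"


def missing_inputs(*paths):
    """Return the subset of the given paths that do not exist on disk."""
    return [p for p in paths if not p.exists()]


# Warn early if either text file used later in the notebook is absent
missing = missing_inputs(FULL_PATH, TEXT_FOLDER_PATH / "peter_piper.txt")
if missing:
    print("Missing input files:", *missing, sep="\n  ")
```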

## Using a JsonDocumentStore


```python
from kodexa import FolderConnector

# instantiate the store, using JSON_STORE_PATH as its storage location
json_doc_store = JsonDocumentStore(store_path=JSON_STORE_PATH)
```



```python
# Parse the first document and get it from the output
pipeline = Pipeline(FolderConnector(path=TEXT_FOLDER_PATH, file_filter=TEXT_DATA_FILE))
pipeline.add_step(KodexaAction(slug='kodexa/text-parser', options={}, attach_source=True))
pipeline.run()
kodexa_doc = pipeline.context.output_document
```


```python
kodexa_doc.get_root().get_all_content()
```

```
'A flea and a fly got stuck in a flue.\nSaid the flea to the fly, "What shall we do?"\nSaid the fly, "Let us flee!"\nSaid the flea, "Let us fly!"\nSo they flew through a flaw in the flue.'
```
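The output above suggests that get_all_content() on the root node returns the parsed text as a single Python string, with the file's line breaks kept as newline characters. Assuming that holds, ordinary string operations apply; this small sketch copies that output and splits it back into numbered lines.

```python
# The string below is copied from the get_all_content() output above
content = ('A flea and a fly got stuck in a flue.\n'
           'Said the flea to the fly, "What shall we do?"\n'
           'Said the fly, "Let us flee!"\n'
           'Said the flea, "Let us fly!"\n'
           'So they flew through a flaw in the flue.')

# splitlines() drops the newline characters and returns one string per line
lines = content.splitlines()
for number, line in enumerate(lines, start=1):
    print(f"{number}: {line}")
```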

```python
# add the first document
json_doc_store.add(kodexa_doc)
```

```
1
```

```python
# Parse the second document and get it from the output
pipeline = Pipeline(FolderConnector(path=TEXT_FOLDER_PATH, file_filter='peter_piper.txt'))
pipeline.add_step(KodexaAction(slug='kodexa/text-parser', options={}, attach_source=True))
pipeline.run()
kodexa_doc = pipeline.context.output_document
```


```python
# add the second document
json_doc_store.add(kodexa_doc)
```

```
2
```

```python
# let's see how many documents we have
print(f'There are {json_doc_store.count()} documents in the store')
```

```
There are 2 documents in the store
```
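To make the add() and count() behaviour concrete, here is a hypothetical, stdlib-only toy store. It is not Kodexa's JsonDocumentStore and does not hold Kodexa Documents; it keeps plain dictionaries in a single JSON file. The values displayed after each add() call above (1, then 2) suggest that add() returns the new document count, so the toy mirrors that assumption.

```python
import json
from pathlib import Path


class ToyJsonStore:
    """Hypothetical, minimal JSON-file document store (not Kodexa's class).

    Mimics the add()/count() behaviour seen in the notebook, where add()
    appeared to return the running total of stored documents.
    """

    def __init__(self, store_path):
        self.store_path = Path(store_path)
        # Reload previously stored documents if the backing file already exists
        if self.store_path.exists():
            self._docs = json.loads(self.store_path.read_text())
        else:
            self._docs = []

    def add(self, doc):
        """Append a JSON-serialisable document, persist it, and return the new count."""
        self._docs.append(doc)
        self.store_path.write_text(json.dumps(self._docs))
        return len(self._docs)

    def count(self):
        """Return the number of documents currently in the store."""
        return len(self._docs)
```

Because every add() rewrites the whole file, reopening the store at the same path sees all earlier documents; that keeps the sketch short but would be slow for large stores.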
