# Using the SQE API

You can get nearly all the information in the SQE database directly via the HTTP API.

We will be adding a new API soon, but the current one should remain available until I can update this notebook. I present here some basics for downloading transcriptions using the current HTTP API.

## First pull in the dependencies

In [None]:
import sys, pprint, json

try:
    import requests
except ImportError:
    !conda install --yes --prefix {sys.prefix} requests
    import requests
    
try:
    from genson import SchemaBuilder
except ImportError:
    !conda install --yes --prefix {sys.prefix} genson
    from genson import SchemaBuilder

pp = pprint.PrettyPrinter(indent=2)
api = "https://qumranica.org/Scrollery/resources/cgi-bin/scrollery-cgi.pl"

## Next get the login credentials

All requests to the SQE API require valid credentials.  You can get them like this:

In [None]:
r = requests.post(api, json={"transaction": "validateSession", "PASSWORD":"asdf", "USER_NAME":"test"})
session = r.json()['SESSION_ID']
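
If the login is rejected, indexing the response directly may fail with an unhelpful `KeyError`. Here is a minimal sketch of a friendlier check. It assumes (I have not confirmed this against the API) that a failed login returns a JSON body without a `SESSION_ID` key; `session_from_response` is a hypothetical helper name, not part of the SQE API.

```python
def session_from_response(body):
    """Return the session ID from a decoded validateSession response.

    Assumption: a failed login yields a JSON body with no 'SESSION_ID' key.
    """
    session_id = body.get("SESSION_ID")
    if not session_id:
        # Show the whole body so the caller can see what the server said
        raise ValueError(f"login failed, response was: {body!r}")
    return session_id

# Usage with the request above:
#   session = session_from_response(r.json())
```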

## Making requests

Every call to the SQE API names a `transaction` in the POST request payload, accompanied by whatever data that transaction needs.
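
Since every payload has the same shape, you might find it handy to build it in one place. The following is only a sketch: `build_payload` is a hypothetical helper of my own naming, and it simply mirrors the keys used in the requests in this notebook (`transaction`, `SESSION_ID`, plus transaction-specific fields).

```python
def build_payload(transaction, session_id=None, **params):
    """Assemble the JSON body for one API call."""
    payload = {"transaction": transaction}
    # validateSession is sent before a session exists, so the ID is optional
    if session_id is not None:
        payload["SESSION_ID"] = session_id
    # Transaction-specific fields, e.g. scroll_version_id or col_id
    payload.update(params)
    return payload

# Usage once `requests`, `api` and `session` are defined as above:
#   r = requests.post(api, json=build_payload("getCombs", session))
```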

### Finding all available scrolls

Try, for instance, downloading a list of scrolls with the `transaction` `getCombs`. You can also use the small Python function `scrollIdByName` here to look up a `scroll_version_id` in the API response by its scroll name.

In [None]:
r = requests.post(api, json={"transaction": "getCombs", "SESSION_ID":session})
scrolls = r.json()['results']

def scrollIdByName(name):
    sid = None
    for scroll in scrolls:
        if name == scroll['name']:
            sid = scroll['scroll_version_id']
            break
    return sid

selectedScroll = scrollIdByName('4Q51')
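
If you plan to look up many scrolls, scanning the list each time gets repetitive. One alternative, sketched here, is to build a name-to-ID dictionary once. The helper name `index_scrolls_by_name` is hypothetical, and the sample records below are made up for illustration (only the `name` and `scroll_version_id` keys are taken from the code above).

```python
def index_scrolls_by_name(scroll_list):
    """Map each scroll name to its scroll_version_id."""
    index = {}
    for scroll in scroll_list:
        # setdefault keeps the first match, like the `break` in scrollIdByName
        index.setdefault(scroll["name"], scroll["scroll_version_id"])
    return index

# Made-up sample data in the shape the code above expects
sample_scrolls = [
    {"name": "1QS", "scroll_version_id": 1},
    {"name": "4Q51", "scroll_version_id": 2},
    {"name": "4Q51", "scroll_version_id": 3},
]

scroll_ids = index_scrolls_by_name(sample_scrolls)
print(scroll_ids.get("4Q51"))     # ID of the first matching entry
print(scroll_ids.get("missing"))  # None when the name is absent
```

With real data you would call `index_scrolls_by_name(scrolls)` on the list downloaded above.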

### Finding available cols/frags

The API transaction `getColOfComb` will send you all columns and fragments of a scroll in their canonical order. You must supply the desired `scroll_version_id`.

In [None]:
r = requests.post(api, json={"transaction": "getColOfComb", "scroll_version_id": selectedScroll, "SESSION_ID":session})
cols = r.json()['results']
print(json.dumps(cols, indent=2, sort_keys=True))
col2 = cols[1]

### Transcriptions

There are several different ways to work with transcribed text. After downloading it with the `transaction` `getSignStreamOfFrag`, you will want to serialize it into something more human friendly. The transcriptions in the database are stored as a directed acyclic graph (DAG), but these initial API calls serialize it into an ordered array for you (we do have functionality to download the graph itself, but I will add broader support for that later).

The schema of this output looks as follows:

In [None]:
r = requests.post(api, json={"transaction": "getSignStreamOfFrag", "scroll_version_id": selectedScroll, "col_id": col2['col_id'], "SESSION_ID":session})
text = r.json()['text']

builder = SchemaBuilder()
builder.add_object(text)
print(json.dumps(builder.to_schema(), indent=2, sort_keys=False))

The actual data looks like this:

In [None]:
print(json.dumps(r.json(), indent=2, sort_keys=False))

Since the data already comes in order, you could simply iterate over the lists to quickly see the text (note the helper functions at the beginning of the cell):

In [None]:
#The following helpers serialize each element to a list, since they could be either a scalar or list
def serializeChars(sign):
    if isinstance(sign['chars'], list):
        return sign['chars']
    else:
        return [sign['chars']]
def serializeCharLetters(char):
    if isinstance(char['sign_char'], list):
        return char['sign_char']
    else:
        return [char['sign_char']]  
def serializeCharAttributes(char):
    try:
        if isinstance(char['attributes'], list):
            return char['attributes']
        else:
            return [char['attributes']]
    except KeyError:
        return [] 
def serializeAttrValues(attr):
    if isinstance(attr['values'], list):
        #These are ordered so we can easily open and close HTML tags
        sortorder={
            "SCROLL_START":0, 
            "COLUMN_START":1, 
            "LINE_START":2, 
            "LINE_END":3, 
            "COLUMN_END":4, 
            "SCROLL_END":5
        }
        #Values not listed in sortorder sort last instead of raising a KeyError
        return sorted(attr['values'], key=lambda k: sortorder.get(k['attribute_value'], len(sortorder)))
    else:
        return [attr['values']]

#This function formats the output
def outputAllText():
    #Begin printing the output
    print(r.json()['text'][0]['scroll_name'])
    # Cycle through the cols/fragments
    for fragment in r.json()['text'][0]['fragments']:
        print(fragment['fragment_name'], end='')
        #Cycle through the lines
        for line in fragment['lines']:
            print('\n', line['line_name'], '\t', end='')
            #Cycle through the signs
            for sign in line['signs']:
                #If more than one reading of the sign is possible, print the first
                char = serializeChars(sign)[0]
                letter = serializeCharLetters(char)[0]
                print(letter, end='')
                #Check the attributes (if there are any) to see if we have a space
                attrs = serializeCharAttributes(char)
                if len(attrs) > 0:
                    for attr in attrs:
                        values = serializeAttrValues(attr)
                        for value in values:
                            if value['attribute_value'] == 'SPACE':
                                print(' ', end='')
outputAllText()

The previous method does not do any advanced checking to see if signs are damaged or reconstructed.  It just prints the entirety of the transcribed text.

We could do a minimal output that only prints those transcribed characters which are fully visible (this information is transmitted in the `attribute_id` and `attribute_value` fields).

In [None]:
def outputMinimalText():
    #Begin printing the output
    print(r.json()['text'][0]['scroll_name'])
    # Cycle through the cols/fragments
    for fragment in r.json()['text'][0]['fragments']:
        print(fragment['fragment_name'], end='')
        #Cycle through the lines
        for line in fragment['lines']:
            print('\n', line['line_name'], '\t', end='')
            #Cycle through the signs
            for sign in line['signs']:
                #If more than one reading of the sign is possible, use the first
                char = serializeChars(sign)[0]
                letter = serializeCharLetters(char)[0]
                #Check the attributes for damage and to see if we have a space
                attrs = serializeCharAttributes(char)
                damaged = False
                space = False
                if len(attrs) > 0:
                    for attr in attrs:
                        values = serializeAttrValues(attr)
                        for value in values:
                            if value['attribute_value'] == 'SPACE':
                                space = True
                            if (value['attribute_value'] == 'INCOMPLETE_BUT_CLEAR' 
                                or value['attribute_value'] == 'INCOMPLETE_AND_NOT_CLEAR') or (
                                attr['attribute_id'] == 6 and value['attribute_value'] == 'TRUE'):
                                damaged = True
                if not damaged:
                    print(letter, end='')
                    if space:
                        print(' ', end='')
                            
outputMinimalText()
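
The damage check above is easier to reuse if it is pulled out into a small predicate and the text is returned as a string rather than printed. The following is a sketch of that idea, not a replacement for the cell above. The helper names are my own, the sample signs are invented (including the placeholder Latin letters and every `attribute_id` other than 6), and the data shape is assumed from the code in this notebook.

```python
def as_list(item):
    """The API may send a scalar or a list; always hand back a list."""
    return item if isinstance(item, list) else [item]

def attribute_pairs(char):
    """Yield (attribute_id, attribute_value) for every value of a char."""
    for attr in as_list(char.get("attributes", [])):
        for value in as_list(attr["values"]):
            yield attr.get("attribute_id"), value["attribute_value"]

def is_damaged(attribute_id, attribute_value):
    """Same condition as in outputMinimalText: incomplete or reconstructed."""
    return (attribute_value in ("INCOMPLETE_BUT_CLEAR", "INCOMPLETE_AND_NOT_CLEAR")
            or (attribute_id == 6 and attribute_value == "TRUE"))

def visible_text(signs):
    """Join the fully visible letters of one line into a string."""
    out = []
    for sign in signs:
        char = as_list(sign["chars"])[0]        # first reading only
        letter = as_list(char["sign_char"])[0]
        pairs = list(attribute_pairs(char))
        if not any(is_damaged(a, v) for a, v in pairs):
            out.append(letter)
            if any(v == "SPACE" for _, v in pairs):
                out.append(" ")
    return "".join(out)

# Invented sample line: "a", reconstructed "b", a space, damaged "c", "d"
sample_signs = [
    {"chars": {"sign_char": "a"}},
    {"chars": {"sign_char": "b",
               "attributes": {"attribute_id": 6, "values": {"attribute_value": "TRUE"}}}},
    {"chars": {"sign_char": "",
               "attributes": {"attribute_id": 2, "values": {"attribute_value": "SPACE"}}}},
    {"chars": [{"sign_char": "c",
                "attributes": [{"attribute_id": 5,
                                "values": [{"attribute_value": "INCOMPLETE_BUT_CLEAR"}]}]}]},
    {"chars": {"sign_char": "d"}},
]

print(repr(visible_text(sample_signs)))
```

With a real response you would call `visible_text(line['signs'])` for each line of each fragment.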

You could also serialize this to HTML by reading all of the attribute tags more closely and adding some nice CSS.

In [None]:
def outputHTMLText():
    print('<!DOCTYPE html>')
    print('<html>')
    print('<head>')
    print('\t<meta charset="UTF-8">')
    print('\t<title>SQE Transcription Output</title>')
    print("""
        <style>
            span.non-rcnst + span.reconstructed:before {
                content: '[';
            }
            span.reconstructed + span.non-rcnst:before {
                content: ']';
            }
            span.reconstructed:first-child:before {
                content: '[';
            }
            span.reconstructed:last-child:after {
                content: ']';
            }
        </style>
    """)
    print('</head>')
    print('\n<body>')
    #Begin printing the output
    print('\t<h1>', r.json()['text'][0]['scroll_name'], '</h1>')
    # Cycle through the cols/fragments
    for fragment in r.json()['text'][0]['fragments']:
        #Cycle through the lines
        for line in fragment['lines']:
            #Cycle through the signs
            for sign in line['signs']:
                #If more than one reading of the sign is possible, use the first
                char = serializeChars(sign)[0]
                letter = serializeCharLetters(char)[0]
                #Check the attributes for damage and to see if we have a space
                attrs = serializeCharAttributes(char)
                damaged = False
                space = False
                if len(attrs) > 0:
                    for attr in attrs:
                        values = serializeAttrValues(attr)
                        for value in values:
                            if value['attribute_value'] == 'COLUMN_START':
                                print('\t<div dir="rtl">')
                                print('\t\t<h2>', fragment['fragment_name'], '</h2>')
                                print('\t\t<p>')
                            if value['attribute_value'] == 'COLUMN_END':
                                print('\t\t</p>')
                                print('\t</div>')
                            if value['attribute_value'] == 'LINE_START':
                                print('\t\t\t<div>')
                                print('\t\t\t\t<span class="line-name non-rcnst">', line['line_name'], '</span>')
                                print('\t\t\t\t<span>', end='')
                            if value['attribute_value'] == 'LINE_END':
                                print('</span>')
                                print('\t\t\t</div>')
                            if (value['attribute_value'] == 'INCOMPLETE_BUT_CLEAR' 
                                or value['attribute_value'] == 'INCOMPLETE_AND_NOT_CLEAR') or (
                                attr['attribute_id'] == 6 and value['attribute_value'] == 'TRUE'):
                                damaged = True
                            if value['attribute_value'] == 'SPACE':
                                print(' ', end='')
                            else:
                                if value['attribute_value'] == 'INCOMPLETE_BUT_CLEAR':
                                    print(f'<span class="incomplete-but-clear non-rcnst">{letter}ׄ</span>', end='')
                                elif value['attribute_value'] == 'INCOMPLETE_AND_NOT_CLEAR':
                                    print(f'<span class="incomplete-and-not-clear non-rcnst">{letter}֯</span>', end='')
                                elif attr['attribute_id'] == 6 and value['attribute_value'] == 'TRUE':
                                    print(f'<span class="reconstructed">{letter}</span>', end='')
                                elif value['attribute_value'] == 'ABOVE_LINE':
                                    print(f'<span class="non-rcnst"><sup>{letter}</sup></span>', end='')
                                elif value['attribute_value'] == 'BELOW_LINE':
                                    print(f'<span class="non-rcnst"><sub>{letter}</sub></span>', end='')
                else: print(f'<span class="non-rcnst">{letter}</span>', end='')
    print('</body>')
    print('</html>')
                            
outputHTMLText()
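
Because the function above prints to standard output, you need an extra step if you want the HTML in a file. One way, sketched here with the standard library's `contextlib.redirect_stdout`, is to capture whatever a function prints. `capture_output` and `demo_render` are hypothetical names, and the file name in the usage comment is only an example.

```python
import contextlib
import io

def capture_output(render):
    """Run a printing function and return everything it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        render()
    return buffer.getvalue()

def demo_render():
    # Stand-in for outputHTMLText so this cell runs without the API
    print("<p>demo</p>")

html = capture_output(demo_render)

# In this notebook you could capture and save the real output:
#   html = capture_output(outputHTMLText)
#   with open("transcription.html", "w", encoding="utf-8") as f:
#       f.write(html)
```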