<a href="https://colab.research.google.com/github/ryderwishart/biblical-machine-learning/blob/main/semantic_domains_overview.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring Semantic Domains in Greek with a Simple TSV
This notebook loads and explores the MACULA Greek semantic domains.

## Setup
Import the necessary libraries.

In [1]:
import pandas as pd
import os

## Download and Load Data
Download the data files with the `!wget` command, then load the TSV with pandas.

Two files are needed: the MACULA Greek TSV data and a JSON dictionary of semantic domain labels, since the TSV encodes semantic domains as numeric codes.

In [3]:
# Download each file only if it is not already in the working directory
if not os.path.exists('macula-greek.tsv'):
    !wget -q 'https://raw.githubusercontent.com/Clear-Bible/macula-greek/main/Nestle1904/TSV/macula-greek.tsv'
if not os.path.exists('marble-domain-label-mapping.json'):
    !wget -q 'https://raw.githubusercontent.com/Clear-Bible/macula-greek/main/sources/MARBLE/SDBG/marble-domain-label-mapping.json'
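
The `!wget` lines only work inside a notebook shell. As a rough sketch, the same download-if-missing step can be written in plain Python with the standard library. The helper name `download_if_missing` is hypothetical and not part of the original notebook.

```python
import os
import urllib.request


def download_if_missing(url, filename):
    """Download url to filename unless the file already exists.

    Returns True if a download happened, False if the file was already present.
    """
    if os.path.exists(filename):
        return False
    urllib.request.urlretrieve(url, filename)
    return True


# Example calls (commented out so that nothing is fetched on import):
# base = 'https://raw.githubusercontent.com/Clear-Bible/macula-greek/main/'
# download_if_missing(base + 'Nestle1904/TSV/macula-greek.tsv', 'macula-greek.tsv')
# download_if_missing(base + 'sources/MARBLE/SDBG/marble-domain-label-mapping.json',
#                     'marble-domain-label-mapping.json')
```

Returning a boolean makes it easy to log whether a cached copy was reused.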

Load the TSV file into a pandas DataFrame.

In [4]:
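# Read domain codes as strings so leading zeros (e.g. '001002') are preserved
data = pd.read_csv('macula-greek.tsv', sep="\t", dtype={'domain': str})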
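
Domain codes are zero-padded strings such as `001002`, so it can matter how pandas parses that column. The sketch below uses a two-row in-memory sample, with rows and codes copied from this notebook's outputs rather than read from the real file, to show the difference between default type inference and an explicit string dtype. Whether the real file is affected depends on its contents, which this sketch does not check.

```python
import io

import pandas as pd

# Two illustrative rows; the real TSV has many more columns and rows
sample_tsv = (
    "text\tgloss\tdomain\tref\n"
    "οὐρανοί\theavens\t001002\tMAT 3:16!14\n"
    "γῆν\tearth\t001006\tMAT 5:5!8\n"
)

# Default inference parses an all-digit column as integers, dropping the zeros
inferred = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

# Forcing a string dtype keeps the codes usable as dictionary keys
as_text = pd.read_csv(io.StringIO(sample_tsv), sep="\t", dtype={'domain': str})

print(inferred['domain'].tolist())
print(as_text['domain'].tolist())
```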

## Create Semantic Domain Lookup Dictionary

In [5]:
# Import domain-label mapping
import json

# Open the JSON file
with open('marble-domain-label-mapping.json', 'r') as f:

    # Load the contents of the file as a dictionary
    domain_labels = json.load(f)

domain_labels['missing'] = 'no domain'

# Display the first seven entries of the resulting dictionary
for d, l in list(domain_labels.items())[:7]:
    print(d, l)

001 Geographical Objects and Features
001001 Universe, Creation
001002 Regions Above the Earth
001003 Regions Below the Surface of the Earth
001004 Heavenly Bodies
001005 Atmospheric Objects
001006 The Earth's Surface
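
The printed sample suggests that the keys are hierarchical: `001` is a top-level domain and `001001`, `001002`, and so on are its subdomains. Assuming the first three digits of a six-digit code always identify the parent domain (an inference from the sample above, not something the source states), a parent lookup could be sketched as follows. The helper names `parent_label` and `full_label` are hypothetical.

```python
# A few entries copied from the printed dictionary sample
sample_labels = {
    '001': 'Geographical Objects and Features',
    '001001': 'Universe, Creation',
    '001002': 'Regions Above the Earth',
}


def parent_label(code, labels):
    """Return the label of the three-digit domain that contains code, or None."""
    return labels.get(code[:3])


def full_label(code, labels):
    """Return 'parent > subdomain' for six-digit codes, or the plain label otherwise."""
    own = labels[code]
    parent = parent_label(code, labels)
    if len(code) > 3 and parent is not None:
        return f'{parent} > {own}'
    return own


print(full_label('001002', sample_labels))
print(full_label('001', sample_labels))
```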


## Search Domains by English Word

Define a function that searches the semantic domain labels and returns a few example words for each matching domain.

In [57]:
def search_domains(data, domain_labels, label_substring, top_n):
    """
    Searches the values in domain_labels for a matching substring and returns the first top_n rows in data for each matching domain label.

    Parameters:
    data (pandas.DataFrame): The DataFrame to search
    domain_labels (dict): A dictionary where the keys are numeric strings and the values are human readable labels
    label_substring (str): The substring to search for in the domain labels
    top_n (int): The number of matching rows to return for each domain label

    Returns:
    pandas.DataFrame: The top_n rows in data that match each unique domain label
    """

    # Normalize the query the same way as the labels: lowercase, letters only
    label_substring_clean = ''.join([c for c in label_substring.lower() if c.isalpha()])
    # Find all the matching domain labels
    matching_domains = []
    for label in domain_labels.values():
        label_clean = ''.join([c for c in label.lower() if c.isalpha()])
        if label_substring_clean in label_clean:
            matching_domains.append(label)

    # Collect the first top_n matching rows for each matching domain label
    columns = ['text', 'gloss', 'domain', 'ref']
    frames = []
    for domain in matching_domains:
        # Domain codes whose label equals this domain label
        codes = [k for k, v in domain_labels.items() if v == domain]
        matching_rows = data[data['domain'].isin(codes)].copy()

        # Replace the 'domain' column with its corresponding label from domain_labels
        matching_rows['domain'] = matching_rows['domain'].map(domain_labels)

        # Keep only the desired columns and the first top_n rows
        frames.append(matching_rows[columns].head(top_n))

    # Return an empty DataFrame with the expected columns when nothing matches
    if not frames:
        return pd.DataFrame(columns=columns)
    return pd.concat(frames, ignore_index=True)
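
The label matching in `search_domains` lowercases each label and strips everything except letters before the substring test. A minimal sketch of that normalization, using the hypothetical helper names `normalize` and `label_matches`, shows what it does to labels from the sample output. This sketch applies the same normalization to the query.

```python
def normalize(text):
    """Lowercase text and keep only alphabetic characters."""
    return ''.join(c for c in text.lower() if c.isalpha())


def label_matches(query, label):
    """Return True if the normalized query occurs inside the normalized label."""
    return normalize(query) in normalize(label)


print(normalize("The Earth's Surface"))                       # theearthssurface
print(label_matches('earth', 'Earth, Mud, Sand, Rock'))       # True
print(label_matches('the earth', 'Regions Above the Earth'))  # True
print(label_matches('art', "The Earth's Surface"))            # True: 'art' occurs inside 'earth'
```

Because spaces and punctuation are removed, multi-word queries still match, but short queries can also match inside longer words, as the last line shows.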


In [59]:
# Search for an English word (e.g., 'earth', 'save')

search_string = 'earth' # Change this string value to search for a different string

search_domains(data, domain_labels, search_string, 3)

| | text | gloss | domain | ref |
|---|---|---|---|---|
| 0 | οὐρανοί | heavens | Regions Above the Earth | MAT 3:16!14 |
| 1 | οὐρανῶν | heavens | Regions Above the Earth | MAT 3:17!6 |
| 2 | οὐρανοῖς | heavens | Regions Above the Earth | MAT 5:12!11 |
| 3 | γέενναν | hell | Regions Below the Surface of the Earth | MAT 5:22!37 |
| 4 | γέενναν | hell | Regions Below the Surface of the Earth | MAT 5:29!33 |
| 5 | γέενναν | hell | Regions Below the Surface of the Earth | MAT 5:30!31 |
| 6 | κόσμου | world | The Earth's Surface | MAT 4:8!17 |
| 7 | γῆν | earth | The Earth's Surface | MAT 5:5!8 |
| 8 | γῇ | earth | The Earth's Surface | MAT 5:35!4 |
| 9 | λίθων | stones | Earth, Mud, Sand, Rock | MAT 3:9!20 |
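
As an alternative to looping over labels, the same kind of search can be sketched with vectorized pandas operations. This is not the notebook's implementation: the function name `search_domains_vectorized` is hypothetical, the match is a plain case-insensitive substring test without punctuation stripping, and groups come back in order of first appearance in the data. The sample rows and codes are copied from the outputs above.

```python
import pandas as pd

# Tiny stand-in for the MACULA data
rows = pd.DataFrame({
    'text': ['οὐρανοί', 'οὐρανῶν', 'γέενναν', 'γῆν'],
    'gloss': ['heavens', 'heavens', 'hell', 'earth'],
    'domain': ['001002', '001002', '001003', '001006'],
    'ref': ['MAT 3:16!14', 'MAT 3:17!6', 'MAT 5:22!37', 'MAT 5:5!8'],
})
labels = {
    '001002': 'Regions Above the Earth',
    '001003': 'Regions Below the Surface of the Earth',
    '001006': "The Earth's Surface",
}


def search_domains_vectorized(data, domain_labels, label_substring, top_n):
    """Return the first top_n rows per domain whose label contains label_substring."""
    # Translate codes to labels; codes without a label become NaN
    labelled = data.assign(domain=data['domain'].map(domain_labels))
    # Case-insensitive literal substring match; NaN labels never match
    mask = labelled['domain'].str.contains(
        label_substring, case=False, regex=False, na=False
    )
    hits = labelled.loc[mask, ['text', 'gloss', 'domain', 'ref']]
    # Keep the first top_n rows of each label group, preserving row order
    return hits.groupby('domain', sort=False).head(top_n).reset_index(drop=True)


result = search_domains_vectorized(rows, labels, 'earth', 1)
print(result)
```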
