# DIALITE: Discover, Align and Integrate open Data Tables

Import the necessary libraries

In [5]:
import pandas as pd
import dialite_server as dialite
import requests
import json
import time
import glob

## Step 1: Discover
The first step of DIALITE is to search open data repositories for tables related to a query table. DIALITE offers state-of-the-art table search techniques for finding joinable, unionable, or otherwise related tables.

In [2]:
# Upload the query table
#todo: use tkinter to upload file using GUI
print("Select Query table")
filelocation = "data/query/stadiums_0.csv"
file_name=filelocation.split("/")[-1]
print("Query table name:", file_name)
query_table = pd.read_csv(filelocation, encoding="latin-1", on_bad_lines="skip")
query_table.head(5)

Select Query table
Query table name: stadiums_0.csv


Unnamed: 0,Player,Position,Team
0,Kyler Murray,QB,Cardinals
1,Tom Brad,QB,Buccaneer
2,Joel Bitonio,G,Browns
3,CeeDee Lamb,WR,Cowboys
4,Jason Kelce,C,Eagles


The next step is to select the technique for table discovery. In this demo, we will use JOSIE for joinable table search and SANTOS for unionable table search. However, the user can easily add new table discovery systems to DIALITE.
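
To make the idea of joinable search concrete, the sketch below ranks candidate columns by how many distinct values they share with a query column, which is the intuition behind overlap-based joinable search. It is a toy illustration only, not JOSIE's algorithm or part of DIALITE's API; the function name `top_k_by_overlap` and the sample values are hypothetical.

```python
import pandas as pd


def top_k_by_overlap(query_values, candidate_columns, k):
    """Rank candidate columns by the number of distinct values shared with the query."""
    query_set = set(query_values)
    scored = [
        (name, len(query_set & set(values)))
        for name, values in candidate_columns.items()
    ]
    # Sort by overlap (largest first), breaking ties by name for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy data; not taken from the DIALITE repository.
query = pd.Series(["Cardinals", "Browns", "Cowboys", "Eagles"])
candidates = {
    "teams_a.Team": pd.Series(["Cardinals", "Browns", "Eagles", "Jets"]),
    "teams_b.Club": pd.Series(["Cowboys", "Giants"]),
    "cities.City": pd.Series(["Phoenix", "Dallas"]),
}
ranking = top_k_by_overlap(query, candidates, k=2)
print(ranking)
```

A real system avoids scoring every candidate column exhaustively; this brute-force version only shows what "top-k by overlap" means.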

In [3]:
print("Select table discovery algorithm.")
print("Enter 1 for SANTOS (unionable table search), 2 for JOSIE (joinable table search) and 3 for both available algorithms.")
available_algorithms = ['SANTOS', 'JOSIE']
selected_algorithms = set()
algorithm = int(input())
print("Selected algorithm:")
if algorithm > len(available_algorithms):
    # Any number above the list length (3 in the prompt) selects every algorithm.
    print(available_algorithms)
    selected_algorithms.update(available_algorithms)
elif algorithm >= 1:
    print(available_algorithms[algorithm-1])
    selected_algorithms.add(available_algorithms[algorithm-1])
else:
    # 0 or a negative number would otherwise wrap around via negative indexing.
    print("Invalid choice:", algorithm)

Select table discovery algorithm.
Enter 1 for SANTOS (unionable table search), 2 for JOSIE (joinable table search) and 3 for both available algorithms.
Selected algorithm:
['SANTOS', 'JOSIE']


For both algorithms, the user provides a value of k to retrieve the top-k tables. In this demo, we use k = 1 for SANTOS and k = 2 for JOSIE.
SANTOS also needs the user to specify an intent column, and JOSIE needs the user to specify a query column. We use Player as the intent column and Stadium as the query column.
The search results are stored in /dialite/data/integration-set. Note that the query table is also included in the integration set.
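
Because the prompts in the next cell ask for a column index rather than a column name, it can help to look the index up first. A minimal sketch using pandas' `Index.get_loc`, on a hypothetical stand-in table; that DIALITE expects zero-based positions is an assumption here, and the indexes in the real query table depend on its actual columns.

```python
import pandas as pd

# Hypothetical stand-in for the query table loaded earlier.
table = pd.DataFrame(
    {"Player": ["Kyler Murray", "Joel Bitonio"], "Team": ["Cardinals", "Browns"]}
)

# Index.get_loc returns the zero-based position of a uniquely named column.
intent_index = table.columns.get_loc("Player")
team_index = table.columns.get_loc("Team")
print(intent_index, team_index)
```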

In [4]:
#Apply Table discovery algorithms.
search_results = set()
if "SANTOS" in selected_algorithms:
    print("Enter the value of k for SANTOS:")
    k = int(input())
    print("Enter index of intent column:")
    intent_column = int(input())
    dialite.QuerySANTOS(query_table, intent_column, k)

if "JOSIE" in selected_algorithms:
    print("Enter the value of k for JOSIE:")
    k = int(input())
    print("Enter index of query column:")
    query_column = int(input())
    dialite.QueryJOSIE(query_table, query_column, k)

Enter the value of k for SANTOS:
Enter index of intent column:
SANTOS top-1 added to the integration set.
Enter the value of k for JOSIE:
Enter index of query column:
JOSIE top-1 added to the integration set.


In [10]:
print("The integration set contains the following tables:")
integration_set = glob.glob("data/integration-set/*")
for each_table in integration_set:
    print(each_table)

The integration set contains the following tables:
data/integration-set/stadiums_0.csv
data/integration-set/stadiums_1.csv
data/integration-set/stadiums_2.csv
data/integration-set/stadiums_3.csv
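
To inspect the integration set before integrating it, the tables can be loaded into DataFrames. The sketch below is one way to do so; `load_tables` is a hypothetical helper, not part of DIALITE, and it reuses the same `read_csv` options as the query-table cell.

```python
import glob
import os

import pandas as pd


def load_tables(directory):
    """Read every CSV file in `directory` into a dict keyed by file name."""
    tables = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.csv"))):
        tables[os.path.basename(path)] = pd.read_csv(
            path, encoding="latin-1", on_bad_lines="skip"
        )
    return tables


# Only runs when the demo's directory is present in the working directory.
if os.path.isdir("data/integration-set"):
    for name, table in load_tables("data/integration-set").items():
        print(name, table.shape)
```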


## Step 2: Align and Integrate
In this step, DIALITE uses ALITE, a new table integration algorithm, to integrate the discovered tables. The input is the set of tables to be integrated (the integration set), stored in /dialite/data/integration-set. The output is the integration result, stored in /dialite/data/integration-result/alite_fd_*.csv, where * is replaced by the name of the integration set, derived from the query table name.

In [14]:
dialite.FindIntegrationIDs()
dialite.ApplyALITEIntegration()

#For comparison, we also integrate the tables using outer join.
dialite.ApplyOuterJoinIntegration()

Alignment task completed.
Integrated table using ALITE.
Integrated table using Outer join.
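
For intuition about the outer join comparison, a plain outer join over a list of tables can be sketched with pandas as follows. This is not the implementation behind `dialite.ApplyOuterJoinIntegration()`, and it assumes the columns are already aligned so that matching columns share a name; the toy tables are hypothetical.

```python
from functools import reduce

import pandas as pd

# Hypothetical toy tables; not the demo's integration set.
players = pd.DataFrame(
    {"Player": ["Kyler Murray", "Jason Kelce"], "Team": ["Cardinals", "Eagles"]}
)
teams = pd.DataFrame(
    {"Team": ["Cardinals", "Browns"], "Conference": ["NFC", "AFC"]}
)

# With no `on` argument, pandas joins on the columns the two frames share.
outer = reduce(
    lambda left, right: pd.merge(left, right, how="outer"), [players, teams]
)
print(outer)
```

Rows without a partner in the other table are kept and padded with missing values, so the result has one row per distinct team in either input.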
