In [1]:
from main import init_system
api = init_system("/Users/ra-mit/development/discovery_proto/test/network.pickle")

Loading: */Users/ra-mit/development/discovery_proto/test/network.pickle*

### Help Menu

You can use the system through an **API** object. API objects are returned by the *init_system* function, so you can get one by doing:

***your_api_object = init_system('path_to_stored_model')***

Once you have an API object, a few concepts are useful for working with the API. **content** refers to the actual values of a given field. For example, if a table has an attribute called __Name__ with the values *Olu, Mike, Sam*, the content is those values: Olu, Mike, Sam.

**schema** refers to the name of a given field. In the previous example, schema refers to the word __Name__, as that is what the field is called.

Finally, **entity** refers to the *semantic type* of the content. This feature is experimental. For the previous example it would return *'person'*, as that is what those names refer to.

Certain functions require a *field* as input. In general, a field is specified by the source name (e.g. table name) and the field name (e.g. attribute name). For example, to find content similar to that of the attribute *year* in the table *Employee*, we can provide the field in the following way:

field = ('Employee', 'year')  # field = (<source_name>, <field_name>)
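
To make the terms above concrete, here is a minimal sketch on a toy in-memory table. The `tables` dictionary and the `field_content` helper are hypothetical stand-ins invented for illustration; they are not part of the API and say nothing about how the system stores data.

```python
# Toy stand-in for a data source (hypothetical, not the system's storage format).
tables = {
    "Employee": {
        "Name": ["Olu", "Mike", "Sam"],
        "year": [2012, 2015, 2015],  # made-up values
    }
}

def field_content(field):
    """Return the values (content) stored under a (source_name, field_name) pair."""
    source_name, field_name = field
    return tables[source_name][field_name]

field = ("Employee", "Name")      # (<source_name>, <field_name>)
schema = field[1]                 # schema: the field's name
content = field_content(field)    # content: the field's actual values
print(schema, content)
```

Here `schema` is the string `Name` and `content` is the list of names; the entity (*person*) would be a label attached to that content.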


# Discovery Primitives

### Keyword, schema and entity search

I have a question: **"How many subjects does CSAIL offer to students?"**

**"I will start searching for files that contain the word *CSAIL*"**

The following command searches the thousands of tables spread across the campus's data storage systems and reports every source and field whose content contains the word *CSAIL*.

In [4]:
res = api.keyword_search("CSAIL")
api.output(res)

source: Fac_rooms.csv					 field: Organization Name
source: Fclt_rooms.csv					 field: Organization Name
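
Conceptually, a keyword search over content can be pictured as the scan below. This is only a sketch under stated assumptions: `keyword_search_sketch` and `toy_tables` are hypothetical names with made-up data, and the real system may well use an index rather than a linear scan.

```python
def keyword_search_sketch(tables, keyword):
    """Return (source, field) pairs whose values contain `keyword`, ignoring case."""
    needle = keyword.lower()
    hits = []
    for source, columns in tables.items():
        for field_name, values in columns.items():
            # A field matches if at least one of its values contains the keyword.
            if any(needle in str(value).lower() for value in values):
                hits.append((source, field_name))
    return hits

# Hypothetical toy data, not taken from the real tables.
toy_tables = {
    "rooms.csv": {"Organization Name": ["CSAIL", "Physics"], "Room": ["R1", "R2"]},
    "other.csv": {"Title": ["Intro to Biology"]},
}
hits = keyword_search_sketch(toy_tables, "csail")
print(hits)
```

The result has the same shape as the output above: one (source, field) pair per matching field.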


**"Maybe some people use *computer science* instead of CSAIL. I will also search for files that contain the words *computer science*."**

In [5]:
res = api.keyword_search("computer science")
api.output(res)

source: short_cis_course_catalog.csv					 field: Department Name
source: Hr_faculty_roster.csv					 field: Hr Org Unit Title
source: Mit_student_directory.csv					 field: Department Name


**"I just realized that *computer science* can refer to a department, but may also refer to a subject. I want to retrieve only the data that refers to a department."**

Instead of searching the content, I now want to search the schema, that is, the names of the fields (attributes) in each source. The following command does just that.

In [7]:
res = api.schema_search("department", max_results=10)
api.output(res)

source: Mit_student_directory.csv					 field: Department
source: Student_degree_program.csv					 field: Department
source: Library_course_instructor.csv					 field: Department
source: Sis_course_description.csv					 field: Department
source: short_cis_course_catalog.csv					 field: Department Code
source: short_cis_course_catalog.csv					 field: Department Name
source: subject_grouping_slice.csv					 field: Department Code
source: Sis_department.csv					 field: Department Name
source: Sis_course_description.csv					 field: Department Name
source: short_subject_summary.csv					 field: Department Code
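
The difference from keyword search is that only field names are matched, never the values. A minimal sketch, assuming a simple case-insensitive substring match (the actual matching and ranking used by `schema_search` are not documented here; `schema_search_sketch` and the toy data are hypothetical):

```python
def schema_search_sketch(tables, keyword, max_results=10):
    """Return up to `max_results` (source, field) pairs whose field NAME contains `keyword`."""
    needle = keyword.lower()
    hits = [
        (source, field_name)
        for source, columns in tables.items()
        for field_name in columns
        if needle in field_name.lower()
    ]
    return hits[:max_results]

# Hypothetical toy data: "computer science" appears as a value in both tables,
# but only the department-named fields should match a search for "department".
toy_tables = {
    "catalog.csv": {
        "Department Name": ["computer science"],
        "Subject Title": ["computer science fundamentals"],
    },
    "directory.csv": {"Department": ["computer science"], "Full Name": ["Olu"]},
}
dept_fields = schema_search_sketch(toy_tables, "department")
print(dept_fields)
```

The *Subject Title* field is skipped even though its content mentions computer science, which is the behaviour wanted in the scenario above.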


**"I have a bunch of files that refer to the department. Now I need data about subjects, so I will search for fields whose name contains *Subject*."**

In [15]:
res = api.schema_search("Subject", max_results=10)
api.output(res)

source: short_cis_course_catalog.csv					 field: Subject Code
source: short_drupal_course_catalog.csv					 field: Subject Id
source: short_drupal_course_catalog.csv					 field: Subject Code
source: short_subject_enrollable.csv					 field: Subject Id
source: Library_reserve_matrl_detail.csv					 field: Subject Id
source: short_course_catalog_subject_offered.csv					 field: Meets With Subjects
source: short_subjects_offered.csv					 field: Subject Title
source: short_cis_course_catalog.csv					 field: Subject Id
source: short_cis_course_catalog.csv					 field: Subject Description
source: short_drupal_course_catalog.csv					 field: Subject Number


In [None]:
res = api.entity_search("organization")
api.output(res)

### Content, schema, entity similarities

In [None]:
field = ("Mit_student_directory.csv", "Full Name")
res = api.similar_content_fields(field)
api.output(res)

In [None]:
field = ("Mit_student_directory.csv", "Full Name")
res = api.similar_schema_fields(field)
api.output(res)

In [None]:
field = ("Se_person.csv", "Full Name")
res = api.similar_entities_fields(field)
api.output(res)

# Combining Primitives

### AND

In [None]:
r1 = api.schema_search("department", max_results=50)
r2 = api.entity_search("organization", max_results=50)
res = api.and_conjunctive(r1, r2)
api.output(res)

### OR

In [None]:
r1 = api.keyword_search("Madden", max_results=50)
r2 = api.keyword_search("Stonebraker", max_results=50)
res = api.or_conjunctive(r1, r2)
api.output(res)
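
One way to picture the two combinators is as set intersection and set union over result lists. The sketch below assumes results can be compared as (source, field) pairs; whether `and_conjunctive` and `or_conjunctive` compare at the field level or the source level is an assumption here, and `and_sketch` and `or_sketch` are hypothetical names.

```python
# Hypothetical result lists, each a list of (source, field) pairs.
r1 = [("a.csv", "Department"), ("b.csv", "Department Name")]
r2 = [("b.csv", "Department Name"), ("c.csv", "Organization")]

def and_sketch(left, right):
    """Keep only hits present in both lists, preserving the order of `left`."""
    in_right = set(right)
    return [hit for hit in left if hit in in_right]

def or_sketch(left, right):
    """Keep hits present in either list, dropping duplicates, first-seen order."""
    return list(dict.fromkeys(left + right))

both = and_sketch(r1, r2)
either = or_sketch(r1, r2)
print(both)
print(either)
```

AND narrows a search (fields that satisfy both criteria), while OR widens it (fields that satisfy at least one).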

# Discovery Functions

### Join path

In [None]:
field1 = ("Fclt_rooms.csv", "Building Room")
field2 = ("Fac_rooms.csv", "Room")
res = api.join_path(field1, field2)
api.output(res)
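
A join path can be thought of as a chain of fields in which each consecutive pair is joinable. The sketch below assumes the relationships are held as a graph whose nodes are fields and whose edges link joinable fields, and it finds the shortest chain with breadth-first search. The graph, the table names, and `join_path_sketch` are all hypothetical; this is not necessarily how `join_path` is implemented.

```python
from collections import deque

def join_path_sketch(graph, start, goal):
    """Breadth-first search for the shortest chain of fields linking start to goal."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in graph.get(path[-1], []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no chain of joinable fields connects the two

# Hypothetical graph: an edge means the two fields are joinable.
graph = {
    ("A.csv", "Room"): [("B.csv", "Room Id")],
    ("B.csv", "Room Id"): [("A.csv", "Room"), ("C.csv", "Building Room")],
    ("C.csv", "Building Room"): [("B.csv", "Room Id")],
}
path = join_path_sketch(graph, ("A.csv", "Room"), ("C.csv", "Building Room"))
print(path)
```

In this toy graph the two end fields are not directly joinable, so the path goes through the intermediate field in `B.csv`.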

### Find matching schema

In [None]:
sn = "Fclt_organization.csv"
res = api.schema_complement(sn)
api.output(res)

### Find tables matching schema

In [None]:
res = api.find_tables_matching_schema("name, department", 10)
api.output_raw(res)

In [None]:
res = api.find_tables_matching_schema("course, department", 30)
api.output_raw(res)

In [None]:
res = api.find_tables_matching_schema("Table, type", 40)
api.output_raw(res)
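
As a rough illustration of the idea, the sketch below ranks tables by how many of the requested attribute names they cover. It assumes the first argument is a comma-separated list of attribute names and the second is the number of results to return; both readings are inferred from the calls above rather than confirmed. The function name, the scoring rule, and the toy data are hypothetical.

```python
def find_tables_matching_schema_sketch(tables, attribute_list, topk):
    """Rank tables by how many requested attribute names appear among their field names."""
    wanted = [name.strip().lower() for name in attribute_list.split(",")]
    scored = []
    for source, field_names in tables.items():
        lowered = [field_name.lower() for field_name in field_names]
        # Count requested attributes that match at least one field name.
        covered = sum(any(w in field_name for field_name in lowered) for w in wanted)
        if covered:
            scored.append((source, covered))
    # Stable sort: ties keep their original order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topk]

# Hypothetical toy schemas: table name -> list of field names.
toy_schemas = {
    "directory.csv": ["Full Name", "Department"],
    "rooms.csv": ["Room", "Organization Name"],
    "catalog.csv": ["Subject Title", "Department Code"],
}
ranked = find_tables_matching_schema_sketch(toy_schemas, "name, department", 2)
print(ranked)
```

Tables that cover every requested attribute rank first, followed by partial matches.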