# First Tutorial Notebook
This notebook covers forming a schema, cleaning the data, creating a table, and saving it to disk as a table file.

Import the libraries.  If they are not preinstalled on your system, install them with
% pip install --extra-index-url https://pypi.engagelively.com galyleo
GalyleoTable is the table data structure.
GALYLEO_NUMBER, GALYLEO_BOOLEAN, GALYLEO_STRING, GALYLEO_DATE, GALYLEO_DATETIME, GALYLEO_TIME_OF_DAY are string constants denoting the column types.

In [None]:
from galyleo.galyleo_table import GalyleoTable
from galyleo.galyleo_constants import GALYLEO_NUMBER, GALYLEO_BOOLEAN, GALYLEO_STRING, GALYLEO_DATE, GALYLEO_DATETIME, GALYLEO_TIME_OF_DAY

Read the file in, and extract the column names from the header

In [None]:
import csv
# newline='' lets the csv module handle line endings itself
ufo_file = open('../ufos.csv', 'r', newline='')
reader = csv.reader(ufo_file)
header = next(reader)

Take a look at the header...

In [None]:
header

Specify the type of each column, in the same order as the header.

In [None]:
column_types = [GALYLEO_NUMBER, GALYLEO_NUMBER, GALYLEO_NUMBER, GALYLEO_STRING, GALYLEO_STRING, GALYLEO_STRING, GALYLEO_STRING, GALYLEO_NUMBER]

Form the schema.  It pairs the name of each column with the corresponding type from the list above.

In [None]:
schema = [{"name": header[i].strip(), "type": column_types[i]} for i in range(len(header))]
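
Because the schema pairs names and types by position, the two lists need to be the same length. The following is a minimal sketch of a guard for that, not part of Galyleo: `build_schema` is a hypothetical helper, the column names are invented, and the type strings are placeholders standing in for the GALYLEO_* constants (whose actual values may differ).

```python
# Placeholder type names standing in for the GALYLEO_* constants.
# Assumption: the real constants are strings; their values may differ.
NUMBER = "number"
STRING = "string"

def build_schema(header, column_types):
    """Pair each column name with its type, refusing mismatched lengths."""
    if len(header) != len(column_types):
        raise ValueError(
            f"{len(header)} column names but {len(column_types)} column types"
        )
    return [
        {"name": name.strip(), "type": column_type}
        for name, column_type in zip(header, column_types)
    ]

# Invented header, for illustration only
example_schema = build_schema([" year", "shape "], [NUMBER, STRING])
print(example_schema)
```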

Build the table data.  Start with a routine that strips whitespace from the values and converts the numeric strings to numbers.

In [None]:
def clean_row(row):
    values = [entry.strip() for entry in row]  # strip surrounding whitespace
    int_indices = {0, 1, 2}  # columns parsed as integers; column 7 is parsed as a float
    return [
        float(value) if i == 7 else int(value) if i in int_indices else value
        for i, value in enumerate(values)
    ]
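
To see what clean_row does, it can be applied to a single row. The sketch below repeats an equivalent definition so that it runs on its own; the sample row is invented for illustration and is not taken from ufos.csv.

```python
def clean_row(row):
    values = [entry.strip() for entry in row]
    int_indices = {0, 1, 2}
    return [
        float(value) if i == 7 else int(value) if i in int_indices else value
        for i, value in enumerate(values)
    ]

# Invented row: three integer fields, four strings, one float
sample_row = [" 1949", "10 ", " 10", "san marcos ", " tx", "us", "cylinder", " 2700.0 "]
cleaned = clean_row(sample_row)
print(cleaned)

# A non-numeric value in a numeric column raises ValueError
try:
    clean_row(["n/a", "10", "10", "", "", "", "", "1.0"])
except ValueError as error:
    print("rejected:", error)
```

The ValueError raised in the second call is the kind of failure that the next step collects.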

Go over the data, adding each cleaned row.  Keep track of rows that fail to convert (there should be none).

In [None]:
data = []
bad_rows = []
for row in reader:
    try:
        data.append(clean_row(row))
    except Exception:
        bad_rows.append(row)
    

Print out any bad rows.  This should show an empty list.

In [None]:
bad_rows
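
If some rows do fail, it helps to know where and why. The following is a sketch of one way to record that, using the line_num attribute of the csv reader. The sample data and the clean_pair helper are invented for illustration and stand in for ufos.csv and clean_row.

```python
import csv
import io

# Invented two-column sample; the second data row has a non-numeric count
sample = io.StringIO("count,shape\n 3 ,disk\nmany,light\n")
reader = csv.reader(sample)
header = next(reader)

def clean_pair(row):
    """Hypothetical cleaner for the sample: integer count, stripped shape."""
    return [int(row[0].strip()), row[1].strip()]

data = []
bad_rows = []
for row in reader:
    try:
        data.append(clean_pair(row))
    except (ValueError, IndexError) as error:
        # reader.line_num is the number of source lines read so far
        bad_rows.append({"line": reader.line_num, "row": row, "error": str(error)})

print(data)
print(bad_rows)
```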

Form the table, giving it a name. 

In [None]:
table = GalyleoTable("ufos")

Load the schema and data.  This is one of the three load methods for tables.  It takes a dictionary with two fields: columns (the schema) and rows (the data).

In [None]:
table.load_from_dictionary({"columns": schema, "rows": data})
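
The dictionary holds one schema entry per column and one list per row, so every row should have as many entries as there are columns. The following is a minimal sketch of a check that could be run before loading. check_table_dictionary is a hypothetical helper, not a Galyleo function, and the column names, type strings, and rows are invented.

```python
def check_table_dictionary(table_dict):
    """Return the indices of rows whose length differs from the column count."""
    n_columns = len(table_dict["columns"])
    return [
        i for i, row in enumerate(table_dict["rows"])
        if len(row) != n_columns
    ]

# Invented example; the type strings are placeholders for the GALYLEO_* constants
example = {
    "columns": [
        {"name": "year", "type": "number"},
        {"name": "shape", "type": "string"},
    ],
    "rows": [[1949, "cylinder"], [1956, "circle"], [1960]],
}
short_rows = check_table_dictionary(example)
print(short_rows)  # the third row (index 2) is missing a value
```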

Write it out to disk as a JSON file, first closing the CSV file.  By convention, table files have the suffix .gt.json.

In [None]:
ufo_file.close()
table_as_json = table.to_json()
# The with block closes the output file even if the write fails
with open("ufos.gt.json", "w") as json_file:
    json_file.write(table_as_json)

Read it with your favorite JSON viewer.
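
The file can also be checked from Python. The following is a small sketch using the standard json module; read_table_file is a hypothetical helper, and the layout of the parsed object depends on what to_json emits, which this notebook does not describe.

```python
import json

def read_table_file(path):
    """Parse a table file as ordinary JSON and return the resulting object."""
    with open(path, "r") as table_file:
        return json.load(table_file)
```

For example, read_table_file("ufos.gt.json") should return without raising json.JSONDecodeError if the file written above is well-formed JSON.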