# Trane - A Quick Demo

Trane is a software package that automatically generates prediction problems and the corresponding labels for supervised learning. This tutorial walks through the Trane workflow from raw data to labels.

### Get Example Dataset
[Download a synthetic taxi dataset here](https://s3.amazonaws.com/hdi-demos/trane-demo/taxi_data.zip). Unzip the file to get the folder `taxi_data`, which contains the raw data `synthetic_taxi_data.csv` and the table metadata `taxi_meta.json`. Put the `taxi_data` folder in the Trane directory, or set the correct path in the cell below.
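Before running the cells below, it can help to confirm that both files are where the notebook expects them. This is a minimal standard-library sketch; the helper name `missing_files` is hypothetical and not part of Trane.

```python
from pathlib import Path

# File names taken from the download described above.
EXPECTED_FILES = ("synthetic_taxi_data.csv", "taxi_meta.json")


def missing_files(folder="taxi_data"):
    """Return the expected files that are not present in `folder` (hypothetical helper)."""
    base = Path(folder)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    if missing:
        print("Missing from taxi_data/:", ", ".join(missing))
    else:
        print("taxi_data/ looks complete.")
```

If anything is reported missing, either move the folder or adjust the paths in the next cell.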

### Generate Prediction Problems
We first import `trane` and `json`, then set the data paths and the column parameters.

In [4]:
import trane
import json

multiple_csv = ["taxi_data/synthetic_taxi_data.csv"]  # list of paths to the CSV tables
table_meta_json = "taxi_data/taxi_meta.json"          # path to the table metadata

entity_id_column = 'taxi_id'        # Trane generates one label for each entity in entity_id_column.
label_generating_column = 'fare'    # Trane uses the values in label_generating_column to compute labels.
time_column = 'trip_id'             # time_column is the column the cutoff time is applied to.
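A typo in any of the three column names only surfaces later, so a quick header check can save a round trip. This sketch uses only the `csv` module; `check_columns` is a hypothetical helper, and the in-memory header below lists the named columns shown later in this demo.

```python
import csv
import io


def check_columns(csv_file, *columns):
    """Return the requested column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file))
    return [col for col in columns if col not in header]


# In-memory header with the named columns of the taxi data shown later in this demo.
sample = io.StringIO("vendor_id,taxi_id,trip_id,distance,duration,fare,num_passengers\n")
print(check_columns(sample, "taxi_id", "fare", "trip_id"))  # → []
```

Against the real file, open `multiple_csv[0]` with `newline=""` and pass `entity_id_column`, `label_generating_column`, and `time_column`; an empty list means all three were found.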

We load the table metadata and then create a `PredictionProblemGenerator`.

In [5]:
with open(table_meta_json) as f:  # close the metadata file once it has been parsed
    table_meta = trane.TableMeta(json.load(f))
generator = trane.PredictionProblemGenerator(
    table_meta, entity_id_column, label_generating_column, time_column)


TypeError: list indices must be integers or slices, not str

We use the generator to produce the first three prediction problems.

In [3]:
probs = []
for idx, prob in enumerate(generator.generate()):
    probs.append(prob)
    if idx + 1 == 3:
        break
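The loop above simply takes the first three items from `generator.generate()`. Assuming `generate()` returns an ordinary Python iterable (the `enumerate` call above suggests it does), `itertools.islice` expresses the same thing more compactly. The stand-in generator below is hypothetical and exists only so the sketch runs without Trane.

```python
import itertools


def fake_generate():
    """Hypothetical stand-in for generator.generate(): an endless stream of problems."""
    n = 0
    while True:
        yield f"problem-{n}"
        n += 1


# Equivalent to the enumerate/break loop: stop after the first three items.
probs = list(itertools.islice(fake_generate(), 3))
print(probs)  # → ['problem-0', 'problem-1', 'problem-2']
```

In the notebook this would read `probs = list(itertools.islice(generator.generate(), 3))`.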

We save the prediction problems to `prediction_problems.json` and generate a natural-language description of each one.

In [4]:
prediction_problems_json = trane.prediction_problems_to_json_file(
    probs, table_meta, entity_id_column, label_generating_column, time_column, "prediction_problems.json")

trane.generate_nl_description(
    probs, table_meta, entity_id_column, label_generating_column, time_column, trane.ConstantIntegerCutoffTimes(0))

# with open("prediction_problems.json", "w") as f:
#     json.dump(json.loads(prediction_problems_json), f, indent=4, separators=(',', ': '))

['For each taxi_id, predict the first fare, after trip_id 0.',
 'For each taxi_id, predict the first fare, after trip_id 0.',
 'For each taxi_id, predict the first fare, after trip_id 0.']
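Each printed sentence combines the entity column, the aggregation, the label column, the time column, and the cutoff. The hypothetical template below reproduces that pattern; it is inferred from the output above and is not how `generate_nl_description` is implemented.

```python
def describe(entity_col, aggregation, label_col, time_col, cutoff):
    """Hypothetical template mimicking the sentences printed above (not Trane's code)."""
    return f"For each {entity_col}, predict the {aggregation} {label_col}, after {time_col} {cutoff}."


sentence = describe("taxi_id", "first", "fare", "trip_id", 0)
print(sentence)  # → For each taxi_id, predict the first fare, after trip_id 0.
```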

### Check Prediction Problems and Tune Hyperparameters
Next, inspect the saved prediction problems and, for operations that take parameters, set their thresholds in the `param_values` field.

Here is a truncated view of the saved file:
```
{
    "entity_id_column": "taxi_id",
    "time_column": "trip_id",
    "table_meta": ...,
    "prediction_problems": [
        {
            "operations": [
                {
                    "SubopType": "AllFilterOp",
                    "OpType": "FilterOpBase",
                    "param_values": {},
                    "column_name": "duration",
                    "iotype": [
                        "value",
                        "value"
                    ]
                },
                {
                    "SubopType": "IdentityRowOp",
                    "OpType": "RowOpBase",
                    "param_values": {},
                    "column_name": "fare",
                    "iotype": [
                        "value",
                        "value"
                    ]
                },
                {
                    "SubopType": "IdentityTransformationOp",
                    "OpType": "TransformationOpBase",
                    "param_values": {},
                    "column_name": "fare",
                    "iotype": [
                        "value",
                        "value"
                    ]
                },
                {
                    "SubopType": "FirstAggregationOp",
                    "OpType": "AggregationOpBase",
                    "param_values": {},
                    "column_name": "fare",
                    "iotype": [
                        "value",
                        "value"
                    ]
                }
            ]
        }, ...
    ],
    "label_generating_column": "fare"
}

```
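Editing `param_values` by hand gets tedious as the number of problems grows. The sketch below works on a trimmed in-memory copy of the structure shown above and uses only the keys visible there (`prediction_problems`, `operations`, `SubopType`, `column_name`, `param_values`). The helper names are hypothetical, and `"threshold"` is a made-up key used only to show the mechanics; check which parameters an operation actually accepts before editing the file.

```python
# A trimmed copy of the structure shown above (one problem, two of its operations).
problems_doc = {
    "entity_id_column": "taxi_id",
    "time_column": "trip_id",
    "label_generating_column": "fare",
    "prediction_problems": [
        {
            "operations": [
                {"SubopType": "AllFilterOp", "OpType": "FilterOpBase",
                 "param_values": {}, "column_name": "duration",
                 "iotype": ["value", "value"]},
                {"SubopType": "FirstAggregationOp", "OpType": "AggregationOpBase",
                 "param_values": {}, "column_name": "fare",
                 "iotype": ["value", "value"]},
            ]
        }
    ],
}


def list_operations(doc):
    """Yield (problem index, SubopType, column_name, param_values) for every operation."""
    for i, problem in enumerate(doc["prediction_problems"]):
        for op in problem["operations"]:
            yield i, op["SubopType"], op["column_name"], op["param_values"]


def set_params(doc, subop_type, params):
    """Merge `params` into param_values of every operation with the given SubopType.

    Returns the number of operations that were updated.
    """
    count = 0
    for problem in doc["prediction_problems"]:
        for op in problem["operations"]:
            if op["SubopType"] == subop_type:
                op["param_values"].update(params)
                count += 1
    return count


for row in list_operations(problems_doc):
    print(row)

# "threshold" is a hypothetical key, used only to demonstrate the update.
updated = set_params(problems_doc, "AllFilterOp", {"threshold": 10.0})
print(updated)  # → 1
```

For the real file, read it with `json.load`, apply `set_params`, and write it back with `json.dump(doc, f, indent=4)` before running the labeler.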

### Load Problems and Generate Labels
We load the CSV files and denormalize them into a single pandas DataFrame, then group the rows by entity ID.
The cell below shows the first five records for taxi 0.
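Indexing `entity_to_data_dict[0]` works because grouping produces a mapping from each entity ID to that entity's rows. The standard-library sketch below illustrates that idea; it is an assumption about the shape of the result, not Trane's implementation (which works on a pandas DataFrame), and the rows are toy values, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def group_by_entity(rows, entity_col):
    """Map each entity ID to the list of rows that belong to it, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[entity_col]].append(row)
    return dict(groups)


# Toy rows for illustration only.
toy_csv = io.StringIO(
    "taxi_id,trip_id,fare\n"
    "0,0,10.0\n"
    "1,0,12.5\n"
    "0,1,11.0\n"
)
rows = list(csv.DictReader(toy_csv))
groups = group_by_entity(rows, "taxi_id")
# csv yields strings, so the keys are '0' and '1'; pandas would parse them as integers.
print(sorted(groups))    # → ['0', '1']
print(len(groups["0"]))  # → 2
```

With pandas, the same mapping can be written as `{eid: df for eid, df in denormalized_dataframe.groupby(entity_id_column)}`.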

In [5]:
denormalized_dataframe = trane.csv_to_df(multiple_csv)
entity_to_data_dict = trane.df_group_by_entity_id(denormalized_dataframe, entity_id_column)
entity_to_data_dict[0].head(5)

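| Unnamed: 0 | vendor_id | taxi_id | trip_id | distance | duration | fare | num_passengers |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 4.97 | 16.53 | 46.8 | 3 |
| 1 | 0 | 0 | 1 | 6.0 | 16.82 | 49.6 | 4 |
| 2 | 0 | 0 | 2 | 0.68 | 11.7 | 27.87 | 1 |
| 3 | 0 | 0 | 3 | 7.75 | 11.69 | 43.12 | 1 |
| 4 | 0 | 0 | 4 | 6.05 | 13.32 | 42.71 | 4 |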


Next, we apply a cutoff strategy. Here we simply use a fixed cutoff time: the cutoff time for every entity is 0.
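A constant cutoff strategy pairs every entity's data with the same cutoff value. The sketch below assumes the result maps each entity ID to a `(data, cutoff)` tuple; that shape is an assumption about what `generate_cutoffs` returns, and `constant_cutoffs` is a hypothetical name.

```python
def constant_cutoffs(entity_to_data, cutoff=0):
    """Pair each entity's data with the same fixed cutoff (hypothetical helper)."""
    return {entity: (data, cutoff) for entity, data in entity_to_data.items()}


# Toy per-entity data for illustration only.
entity_to_data = {"a": [1, 2, 3], "b": [4, 5]}
with_cutoffs = constant_cutoffs(entity_to_data, 0)
print(with_cutoffs["a"])  # → ([1, 2, 3], 0)
```

A per-entity strategy would differ only in computing `cutoff` from each entity's own data instead of using one constant.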

In [6]:
entity_to_data_and_cutoff_dict = trane.ConstantIntegerCutoffTimes(0).generate_cutoffs(entity_to_data_dict)

Finally, we create a `Labeler` and generate the labels.
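Each problem in `prediction_problems.json` chains four operations: a filter, a row operation, a transformation, and an aggregation (see the `OpType` fields above). The sketch below shows, under simplifying assumptions of our own, how such a chain could turn one entity's rows after the cutoff into a label. The stage functions are hypothetical stand-ins for `AllFilterOp`, `IdentityRowOp`, `IdentityTransformationOp`, and `FirstAggregationOp`, not Trane's classes, and reading "after the cutoff" as strictly greater than is an assumption. It returns a `(labels, cutoff)` tuple with one label per problem.

```python
def all_filter(rows):
    """Stand-in for AllFilterOp: keep every row."""
    return list(rows)


def identity(values):
    """Stand-in for IdentityRowOp / IdentityTransformationOp: pass values through."""
    return values


def first(values):
    """Stand-in for FirstAggregationOp: the first value, or None if there are none."""
    return values[0] if values else None


def label_entity(rows, cutoff, problems, time_col, label_col):
    """Apply each problem's four stages to the rows strictly after `cutoff`.

    Returns (labels, cutoff) with one label per problem.
    """
    after = sorted((r for r in rows if r[time_col] > cutoff), key=lambda r: r[time_col])
    labels = []
    for filter_op, row_op, transform_op, aggregate_op in problems:
        kept = filter_op(after)
        values = row_op([r[label_col] for r in kept])
        labels.append(aggregate_op(transform_op(values)))
    return labels, cutoff


# Taxi 0's first five trips from the table above, and three identical problems as in this demo.
taxi_0 = [
    {"trip_id": 0, "fare": 46.8},
    {"trip_id": 1, "fare": 49.6},
    {"trip_id": 2, "fare": 27.87},
    {"trip_id": 3, "fare": 43.12},
    {"trip_id": 4, "fare": 42.71},
]
problems = [(all_filter, identity, identity, first)] * 3
print(label_entity(taxi_0, 0, problems, "trip_id", "fare"))  # → ([49.6, 49.6, 49.6], 0)
```

On these five rows the sketch agrees with the labeler's entry for taxi 0 in the output that follows.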

In [7]:
labeler = trane.Labeler()
output = labeler.execute(entity_to_data_and_cutoff_dict, "prediction_problems.json")
output

{0: ([49.6, 49.6, 49.6], 0),
 1: ([20.45, 20.45, 20.45], 0),
 2: ([61.6, 61.6, 61.6], 0),
 3: ([58.52, 58.52, 58.52], 0),
 4: ([42.1, 42.1, 42.1], 0),
 5: ([58.66, 58.66, 58.66], 0),
 6: ([34.5, 34.5, 34.5], 0),
 7: ([34.05, 34.05, 34.05], 0),
 8: ([54.3, 54.3, 54.3], 0),
 9: ([44.55, 44.55, 44.55], 0),
 10: ([62.88, 62.88, 62.88], 0),
 11: ([30.83, 30.83, 30.83], 0),
 12: ([29.7, 29.7, 29.7], 0),
 13: ([50.56, 50.56, 50.56], 0),
 14: ([43.73, 43.73, 43.73], 0),
 15: ([29.93, 29.93, 29.93], 0),
 16: ([43.91, 43.91, 43.91], 0),
 17: ([34.35, 34.35, 34.35], 0),
 18: ([63.63, 63.63, 63.63], 0),
 19: ([51.86, 51.86, 51.86], 0),
 20: ([55.33, 55.33, 55.33], 0),
 21: ([59.03, 59.03, 59.03], 0),
 22: ([53.36, 53.36, 53.36], 0),
 23: ([38.05, 38.05, 38.05], 0),
 24: ([42.9, 42.9, 42.9], 0),
 25: ([27.53, 27.53, 27.53], 0),
 26: ([42.63, 42.63, 42.63], 0),
 27: ([62.42, 62.42, 62.42], 0),
 28: ([38.45, 38.45, 38.45], 0),
 29: ([60.36, 60.36, 60.36], 0),
 30: ([50.59, 50.59, 50.59], 0),
 31: ([4