# Getting Started
Read in the travel diary, filter to "complete" days, and produce some basic summaries.

## Setup

In [1]:
import sys, os
import numpy as np
import pandas as pd

Define the path to `rmove_utils` and add it to `sys.path`

In [2]:
PYTHONLIB = r'<root directory for rmove_utils>'
RMOVE_UTIL_DIR = os.path.join(PYTHONLIB,r'rmove_utils')
sys.path.insert(0, RMOVE_UTIL_DIR)
from rmove_utils.survey import Survey

Create a `config` object from the codebook

In [3]:
import rmove_utils.config as ruc
ruc.Config.from_excel(path=r'<Path to the codebook>',
                      config_file=r'<Codebook filename>')

Define the location of the survey, and the location to write any outputs

In [4]:
INDIR = r'<Path to rmove data>'
OUTDIR = r'<Output path>'

## Import the survey
Import the survey. This checks each field in the codebook against the fields in the data set, and checks the coded values against the valid codes identified in the codebook. Currently, values are validated only for categorical variables, not for continuous variables.

In [5]:
survey = Survey(root=INDIR,
                household_file=r'household.tsv',
                person_file=r'person.tsv',
                trip_file=r'trip.tsv',
                day_file=r'day.tsv',
                vehicle_file=r'vehicle.tsv',
                location_file=r'location.tsv',
               )

found unexpected column wkdy_hh_weight_sp_owners in <class 'rmove_utils.households.Households'>.
did not find expected column wkdy_hh_weight_sp_owners  in <class 'rmove_utils.households.Households'>.
found unexpected column wkdy_person_weight_sp_owners in <class 'rmove_utils.persons.Persons'>.
did not find expected column person_exp_weight in <class 'rmove_utils.persons.Persons'>.
did not find expected column person_weight_day in <class 'rmove_utils.persons.Persons'>.
did not find expected column wkdy_person_weight_sp_owners  in <class 'rmove_utils.persons.Persons'>.
found unexpected column wkdy_trip_weight_sp_owners in <class 'rmove_utils.trips.Trips'>.
did not find expected column wkdy_trip_weight_sp_owners  in <class 'rmove_utils.trips.Trips'>.
found 49 unexpected value(s) A;B for column trip_quality_flag in <class 'rmove_utils.trips.Trips'>.
found 17 unexpected value(s) A;C for column trip_quality_flag in <class 'rmove_utils.trips.Trips'>.
found 17 unexpected value(s) A;D for colum
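
To make the messages above easier to interpret, here is a minimal sketch of the kind of checks described: compare the columns in a table to the fields expected by the codebook, then compare coded values to the valid codes. This is not the `rmove_utils` implementation. The function `check_table`, its arguments, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def check_table(data, expected_columns, valid_codes):
    """Hypothetical helper: return validation messages for one table.

    data             -- DataFrame holding the imported table
    expected_columns -- field names listed in the codebook
    valid_codes      -- dict of column name -> valid codes (categorical only)
    """
    messages = []
    expected = set(expected_columns)

    # Columns present in the data but not listed in the codebook
    for col in data.columns:
        if col not in expected:
            messages.append(f"found unexpected column {col}")

    # Columns listed in the codebook but missing from the data
    for col in expected_columns:
        if col not in data.columns:
            messages.append(f"did not find expected column {col}")

    # Coded values that are not valid codes for their column
    for col, codes in valid_codes.items():
        if col not in data.columns:
            continue
        bad = data.loc[~data[col].isin(list(codes)), col]
        for value, count in bad.value_counts().items():
            messages.append(
                f"found {count} unexpected value(s) {value} for column {col}"
            )
    return messages


# Toy table: one column missing from the codebook, one invalid code
trips = pd.DataFrame({
    "mode_type": [1, 3, 3, 99],
    "extra_field": [0, 0, 0, 0],
})
messages = check_table(
    trips,
    expected_columns=["mode_type", "trip_weight"],
    valid_codes={"mode_type": {1, 2, 3}},
)
for message in messages:
    print(message)
```

One thing worth checking in the real log: some names appear in both a "found unexpected" and a "did not find expected" message, and the second message has an extra space after the name. That pattern would be consistent with trailing whitespace in the codebook entry, although the log alone does not confirm it.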

The survey contains an object for each file type, for example `survey.households`. The imported data is stored in a pandas DataFrame, `survey.households.data`. Each object also contains a human-readable version of its table, for example `survey.households.human_readable`.
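
As a rough illustration of how a coded table relates to a human-readable one, the sketch below replaces codes with labels using plain pandas. It is only an assumption about the general idea, not how `rmove_utils` builds `human_readable`. The helper `to_human_readable` and the toy table are hypothetical.

```python
import pandas as pd

# Toy lookup: column name -> {code: label}
value_lookup = {"mode_type": {1: "Walk", 2: "Bike", 3: "Car"}}


def to_human_readable(data, value_lookup):
    """Hypothetical helper: swap coded values for their labels."""
    readable = data.copy()
    for col, lookup in value_lookup.items():
        if col in readable.columns:
            # Codes without a label are left as they are
            readable[col] = readable[col].map(lookup).fillna(readable[col])
    return readable


coded = pd.DataFrame({"trip_id": [10, 11, 12], "mode_type": [3, 1, 2]})
readable = to_human_readable(coded, value_lookup)
print(readable)
```

The coded table stays untouched, so it can still be used for joins and numeric filters while the labelled copy is used for display.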

## Do some basic data maintenance
Keep only the days that are complete

In [6]:
survey.filter_complete_days()
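
What `filter_complete_days()` does internally is not shown here. As a sketch of the general idea only, the example below keeps the day records flagged as complete and then keeps the trips that fall on those days. The column names `person_id`, `day_num`, and `is_complete` are hypothetical and may not match the actual survey fields.

```python
import pandas as pd

# Toy day table with a hypothetical completeness flag
days = pd.DataFrame({
    "person_id": [1, 1, 2],
    "day_num": [1, 2, 1],
    "is_complete": [1, 0, 1],
})

# Toy trip table keyed by the same hypothetical person/day fields
trips = pd.DataFrame({
    "person_id": [1, 1, 1, 2],
    "day_num": [1, 1, 2, 1],
    "trip_id": [100, 101, 102, 103],
})

# Keep only the complete days
complete_days = days[days["is_complete"] == 1]

# Keep only the trips made on a complete day
complete_trips = trips.merge(
    complete_days[["person_id", "day_num"]],
    on=["person_id", "day_num"],
    how="inner",
)
print(complete_trips)
```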

Create some basic attribute summaries

In [7]:
survey.summarize(household_weights='wkdy_hh_weight_all_adults', 
                 person_weights='wkdy_person_weight_all_adults',
                 day_weights='wkdy_day_weight_all_adults',
                 trip_weights='wkdy_trip_weight_all_adults')
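
The summary tables further down report, for each value of a field, a record count (`size`) and a weight column. A minimal pandas sketch of that kind of table is below, assuming the weight column is a sum of weights over the records in each group. The helper `summarize_column` and the toy data are hypothetical, and this is not the `summarize` implementation.

```python
import pandas as pd


def summarize_column(data, column, weight):
    """Hypothetical helper: record count and weight sum per value."""
    grouped = data.groupby(column)[weight]
    return pd.DataFrame({"size": grouped.size(), weight: grouped.sum()})


# Toy trip table with a made-up weight column
trips = pd.DataFrame({
    "mode_type": [1, 1, 3, 3, 3],
    "trip_weight": [10.0, 20.0, 5.0, 5.0, 15.0],
})
summary = summarize_column(trips, "mode_type", "trip_weight")
print(summary)
```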

## Explore the data a bit
Check out the data as imported.

Note: the outputs are not shown because they may contain PII. Uncomment the lines below to explore your data internally.

In [8]:
#survey.households.data.head()

In [9]:
#survey.persons.data.head()

In [10]:
#survey.days.data.head()

In [11]:
#survey.trips.data.head()

In [12]:
#survey.vehicles.data.head()

Now check out the human-readable versions

In [13]:
#survey.households.human_readable.head()

In [14]:
#survey.persons.human_readable.head()

In [15]:
#survey.days.human_readable.head()

In [16]:
#survey.trips.human_readable.head()

In [17]:
#survey.vehicles.human_readable.head()

In [18]:
#survey.locations.human_readable.head()

## Explore the `data_dictionary` and `value_lookup`
What is `mode_type`, and what are its valid values?

In [19]:
print(survey.data_dictionary['mode_type'])

field name: mode_type

description: Mode category

    1: Walk
    2: Bike
    3: Car
    4: Taxi
    5: Transit
    6: Schoolbus
    7: Other
    8: Shuttle/vanpool
    9: TNC
   10: Carshare
   11: Bikeshare
   12: Scooter share
   13: Long-distance passenger mode
-9998: Missing: Non-response
  995: Missing: Skip logic



What about just a dictionary mapping valid values to descriptions?

In [20]:
survey.trips.value_lookup['mode_type']

{1: 'Walk',
 2: 'Bike',
 3: 'Car',
 4: 'Taxi',
 5: 'Transit',
 6: 'Schoolbus',
 7: 'Other',
 8: 'Shuttle/vanpool',
 9: 'TNC',
 10: 'Carshare',
 11: 'Bikeshare',
 12: 'Scooter share',
 13: 'Long-distance passenger mode',
 -9998: 'Missing: Non-response',
 995: 'Missing: Skip logic'}

How many trips are there for each `mode_type`?

In [21]:
survey.trips.summary['mode_type']

| mode_type | name | size | wkdy_trip_weight_all_adults |
|---|---|---|---|
| -9998 | Missing: Non-response | 11913 | 0.0 |
| 1 | Walk | 51088 | 4798116.0 |
| 2 | Bike | 5151 | 457437.4 |
| 3 | Car | 76258 | 22693110.0 |
| 4 | Taxi | 231 | 49618.32 |
| 5 | Transit | 15563 | 1751710.0 |
| 6 | Schoolbus | 2 | 2902.281 |
| 7 | Other | 1239 | 245606.9 |
| 8 | Shuttle/vanpool | 1823 | 285599.9 |
| 9 | TNC | 5135 | 362419.1 |
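
A common next step is to turn the weighted counts into mode shares. The sketch below retypes the rows shown above into a DataFrame, drops the missing-response code, and divides each weight by the total. The shares cover only the rows displayed here, so any modes not shown are left out. The short column name `weight` is an assumption for brevity.

```python
import pandas as pd

# Rows copied from the mode_type summary shown above
summary = pd.DataFrame(
    {
        "name": ["Missing: Non-response", "Walk", "Bike", "Car", "Taxi",
                 "Transit", "Schoolbus", "Other", "Shuttle/vanpool", "TNC"],
        "size": [11913, 51088, 5151, 76258, 231, 15563, 2, 1239, 1823, 5135],
        "weight": [0.0, 4798116.0, 457437.4, 22693110.0, 49618.32,
                   1751710.0, 2902.281, 245606.9, 285599.9, 362419.1],
    },
    index=pd.Index([-9998, 1, 2, 3, 4, 5, 6, 7, 8, 9], name="mode_type"),
)

# Drop the missing-response code (negative) before computing shares
valid = summary[summary.index > 0]

# Weighted share of each displayed mode
share = valid["weight"] / valid["weight"].sum()
print(pd.DataFrame({"name": valid["name"], "share": share.round(3)}))
```

Note how far the unweighted `size` and the weighted totals can diverge: walk trips are about two thirds as numerous as car trips in the raw records, but carry far less weight.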


How many person-days have a given number of trips?

In [22]:
survey.days.summary['num_trips_day']

| num_trips_day | name | size | wkdy_day_weight_all_adults |
|---|---|---|---|
| 0 | | 2764 | 677466.90922 |
| 1 | | 615 | 142111.8523 |
| 2 | | 4234 | 892062.55331 |
| 3 | | 2947 | 541744.36261 |
| 4 | | 4054 | 812971.65633 |
| 5 | | 3297 | 615923.19812 |
| 6 | | 3145 | 645434.39703 |
| 7 | | 2478 | 543908.86882 |
| 8 | | 1970 | 371903.77303 |
| 9 | | 1325 | 187004.21801 |
