# Feature Profiling with pyrasgo

This notebook explains how to use `pyrasgo` to create feature profiles of a `pandas` dataframe.

### Packages

This tutorial uses:
* [pandas](https://pandas.pydata.org/docs/)
* [statsmodels](https://www.statsmodels.org/stable/index.html)
    * [statsmodels.api](https://www.statsmodels.org/stable/api.html#statsmodels-api)
* [numpy](https://numpy.org/doc/stable/)
* [pyrasgo](https://app.gitbook.com/@rasgo/s/rasgo-docs/pyrasgo-0.1/dataframe-prep)

```python
import statsmodels.api as sm
import pandas as pd
import numpy as np

import pyrasgo
```

## Connect to Rasgo

Note: this cell cannot be run yet, because this functionality has not been built.

```python
api_key = pyrasgo.register(email='<your email>')
rasgo = pyrasgo.connect(api_key)
```

## Reading the data

The data is the `flights` table from the `nycflights13` R package, downloaded from the Rdatasets collection with the `statsmodels` function `get_rdataset`.

```python
# Download the `flights` table from Rdatasets (requires an internet connection)
df = sm.datasets.get_rdataset('flights', 'nycflights13').data
```
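The feature-engineering step below drops every row that contains a missing value, so it can be worth checking how much data that removes. The sketch uses a small made-up frame (the values are illustrative, not taken from `flights`; only the column names mirror it) to show the pandas calls. On the real data you would run the same two checks on `df`.

```python
import numpy as np
import pandas as pd

# Illustrative frame with made-up values; only the column names mirror `flights`
toy = pd.DataFrame({
    'dep_time': [517.0, np.nan, 554.0],
    'arr_time': [830.0, np.nan, 912.0],
    'tailnum': ['N100', 'N200', None],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna() removes any row with at least one missing value
rows_dropped = len(toy) - len(toy.dropna())
print(f'{rows_dropped} of {len(toy)} rows would be dropped')
```

Here two of the three rows are removed even though each column has only one missing value, because `dropna()` drops a row as soon as any of its columns is missing.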

## Feature Engineering

### Convert the times from HHMM numbers to hours and minutes

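The departure and arrival times are stored as single numbers in HHMM form (for example, `517` means 05:17). To make it easier to understand when flights depart and arrive, rows with missing values are dropped first, and each time is then split into separate hour and minute columns. The existing `hour` and `minute` columns are renamed to `dep_hour` and `dep_minute`. Finally, the original time columns, which are now redundant, are dropped along with `time_hour` and `dep_delay`.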

```python
# Drop rows with any missing values (for example, flights with no recorded arrival time)
df.dropna(inplace=True)

# Times are HHMM numbers (e.g. 517.0 means 05:17):
# integer division by 100 gives the hour, the remainder gives the minute
for col, prefix in [('arr_time', 'arr'),
                    ('sched_arr_time', 'sched_arr'),
                    ('sched_dep_time', 'sched_dep')]:
    df[f'{prefix}_hour'] = (df[col] // 100).astype(int)
    df[f'{prefix}_minute'] = (df[col] % 100).astype(int)

# In nycflights13, `hour` and `minute` hold the scheduled departure time
df.rename(columns={'hour': 'dep_hour', 'minute': 'dep_minute'}, inplace=True)

# Drop the original columns that are now redundant, plus `dep_delay`
df.drop(columns=['time_hour', 'dep_time', 'sched_dep_time', 'arr_time',
                 'sched_arr_time', 'dep_delay'], inplace=True)
```
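To check the HHMM arithmetic on a few values, the helper below applies the same integer division and remainder to a small Series. `split_hhmm` is a hypothetical name used only for illustration; it is not part of the notebook, `pandas`, or `pyrasgo`, and the sample values are made up.

```python
import pandas as pd

def split_hhmm(times: pd.Series) -> pd.DataFrame:
    """Split HHMM-encoded times (e.g. 517.0 for 05:17) into hour and minute columns."""
    return pd.DataFrame({
        'hour': (times // 100).astype(int),   # drop the last two digits
        'minute': (times % 100).astype(int),  # keep the last two digits
    })

sample = pd.Series([517.0, 1405.0, 2359.0, 2400.0])
print(split_hhmm(sample))
```

One consequence of this arithmetic: if a time is encoded as `2400` for midnight, it becomes hour 24 rather than hour 0, so the hour columns can range from 0 to 24.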

## Profile Features

```python
# Generate a feature profile of `df` with Rasgo and display the result
response = rasgo.df.evaluate.generate_profile(df)
response
```

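The output lists the profiled columns (`year`, `month`, `day`, `arr_delay`, `carrier`, `flight`, `tailnum`, `origin`, `dest`, `air_time`, `distance`, `dep_hour`, `dep_minute`, `arr_hour`, `arr_minute`, `sched_arr_hour`, `sched_arr_minute`, `sched_dep_hour`, `sched_dep_minute`) followed by a link to the profile:

```
'Profile available at: https://app.rasgoml.com/dataframes/642742021051069569404/features'
```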
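For a quick local summary to compare against the Rasgo profile, pandas alone can produce a rough per-column overview. This is a minimal sketch with a hypothetical `basic_profile` helper; it is not how `generate_profile` works, and the statistics Rasgo reports may differ. The example frame uses made-up values.

```python
import numpy as np
import pandas as pd

def basic_profile(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with simple summary statistics."""
    rows = []
    for col in frame.columns:
        s = frame[col]
        is_numeric = pd.api.types.is_numeric_dtype(s)
        rows.append({
            'column': col,
            'dtype': str(s.dtype),
            'n_missing': int(s.isna().sum()),
            'n_unique': int(s.nunique()),
            # Mean and standard deviation are only computed for numeric columns
            'mean': s.mean() if is_numeric else np.nan,
            'std': s.std() if is_numeric else np.nan,
        })
    return pd.DataFrame(rows).set_index('column')

# Illustrative frame with made-up values
toy = pd.DataFrame({
    'arr_delay': [11.0, 20.0, 33.0, -18.0],
    'carrier': ['UA', 'UA', 'AA', 'B6'],
})
print(basic_profile(toy))
```

On the notebook's data you would call `basic_profile(df)` after the feature-engineering step.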