# Uraniborg with preprocessed data
![](https://ventrafiken.se/wp-content/uploads/2020/02/ura-20130125-655x311-1.png.webp)

I have added a kedro pipeline that preprocesses the data:
* **Join Files Node**: joins the data from all files.
* **Preprocess Node**
* **Preprocess Trip Columns Node**

Some trip statistics are also calculated.
You will need to run this pipeline to get the new datasets.

* Some new dependencies are needed. To install them, type ```kedro install``` in the project folder.
* Then type ```kedro run```. This should do the preprocessing and store the preprocessed data. How to access the new datasets is shown below.

![](pipeline.PNG)

In [None]:
%load_ext autoreload
%autoreload 2

from d2e2f.visualization.visualize import plot_map, plot_trips
import pandas as pd

## Loading the data

In [None]:
df_raw = catalog.load('uraniborg.preprocessed_data')
df_raw['aw'].isnull().sum()  # number of missing values in 'aw' (count() would return the number of rows)
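
As a side note on counting missing values, here is a minimal sketch on a toy Series (the values are made up for illustration and are not taken from the Uraniborg data). It shows that `isnull().sum()` gives the number of missing entries, while `isnull().count()` gives the total number of rows:

```python
import numpy as np
import pandas as pd

# Toy data, not taken from the Uraniborg dataset
s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

mask = s.isnull()                 # boolean Series, True where a value is missing
n_missing = int(mask.sum())       # True counts as 1, so this is the number of NaNs
n_rows = int(mask.count())        # the mask itself has no NaNs, so this is len(s)
fraction_missing = mask.mean()    # share of missing entries

print(n_missing, n_rows, fraction_missing)
```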

In [None]:
df_raw.plot(y='aw')

In [None]:
%reload_kedro
#loaders = catalog.load(f'uraniborg.raw_data')

df = catalog.load('uraniborg.data_with_trip_columns')
df = df.iloc[-200000:-100000].copy()  # Taking just a few samples for demo

In [None]:
df.plot(y='aw')

In [None]:
df.head()

In [None]:
df.columns

In [None]:
df['Engine load ME1 (%)']*1397

In [None]:
df.longitude.quantile(0.10)

In [None]:
df.longitude.quantile(0.90)

And here is a nice map of the data. You may have to install "folium" to get this to work:

```pip install folium```

In [None]:
plot_trips(df=df, zoom_start=13)

## One trip

In [None]:
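trips = df.groupby('trip_no')
trip_no = list(trips.groups.keys())[-199]  # pick one trip for the demo
trip = trips.get_group(trip_no).copy()  # copy so that new columns can be added without a SettingWithCopyWarning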

trip.head()

In [None]:
plot_map(trip)

## Statistics

In [None]:
df_statistics = catalog.load("uraniborg.trip_statistics_clean")

In [None]:
df_statistics.describe()

In [None]:
df_statistics.hist(column='sog', bins=100)

In [None]:
df_statistics.hist(column='E', bins=100)

In [None]:
df_statistics.hist(column='aw', bins=100)

## Apparent wind

In [None]:
import numpy as np
aw = trip['aw']
awa = np.deg2rad(trip['awa'])
sog = trip['sog']

In [None]:
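def apparent_wind_to_true(sog, aw, awa):
    """True wind speed from ship speed (sog), apparent wind speed (aw) and
    apparent wind angle (awa) [rad]. sog and aw must be in the same unit."""
    return np.sqrt(aw**2 + sog**2 - 2*aw*sog*np.cos(awa))

def apparent_wind_angle_to_true(sog, aw, awa):
    """True wind angle [rad] in the range [0, pi]; undefined when the true wind speed is zero."""
    w = apparent_wind_to_true(sog=sog, aw=aw, awa=awa)
    return np.arccos((aw*np.cos(awa) - sog)/w)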
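
`np.arccos` only returns angles between 0 and 180°, so the side the wind comes from is lost, and the expression is undefined when the true wind speed is zero. Below is a minimal sketch of a component-based alternative using `np.arctan2`. It assumes that `awa` is in radians and measured from the bow, that drift is neglected so `sog` acts along the heading, and that `sog` and `aw` share the same unit. The function name `apparent_to_true_components` is hypothetical and not part of `d2e2f`:

```python
import numpy as np

def apparent_to_true_components(sog, aw, awa):
    """Return (true wind speed, signed true wind angle in radians).

    Assumes awa is in radians from the bow and that the ship moves along its heading.
    """
    # Apparent wind decomposed along and across the ship's heading
    along = aw * np.cos(awa) - sog   # subtract the head wind created by the ship's own speed
    across = aw * np.sin(awa)
    w = np.hypot(along, across)      # same value as the law-of-cosines expression
    wa = np.arctan2(across, along)   # signed angle; no division by w is needed
    return w, wa
```

Calling it with `sog=0`, `aw=1` and `awa=np.deg2rad(90)` should return the apparent wind unchanged, since the ship is at rest.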


In [None]:
trip['w'] = apparent_wind_to_true(sog=sog,aw=aw, awa=awa)
trip['wa'] = np.rad2deg(apparent_wind_angle_to_true(sog=sog,aw=aw, awa=awa))
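
For reference, the two functions used here correspond to the law of cosines applied to the wind triangle. The symbols are introduced only for this note and are assumptions: $V$ is `sog`, $a_w$ is `aw`, $\alpha_a$ is `awa`, $w$ is the true wind speed and $\alpha_t$ is the true wind angle. The derivation assumes the ship moves along its heading and that $V$ and $a_w$ are given in the same unit:

$$
w = \sqrt{a_w^2 + V^2 - 2\,a_w V \cos\alpha_a}
$$

$$
\alpha_t = \arccos\left(\frac{a_w \cos\alpha_a - V}{w}\right)
$$

The numerator $a_w \cos\alpha_a - V$ is the apparent wind component along the heading with the ship's own head wind removed, so $\alpha_t$ lies in $[0, \pi]$ and is undefined for $w = 0$.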

In [None]:
trip.plot(y=['w','aw', 'sog'])

In [None]:
trip.plot(y=['wa','awa','heading'])

In [None]:
df_statistics.loc[trip_no:trip_no+5][['trip_direction','heading','cog','awa','aw']]

In [None]:
# Sanity check: with the ship at rest (sog = 0) the true wind should equal the apparent wind
sog = 0
aw = 1
awa = np.deg2rad(90)

print(apparent_wind_to_true(sog=sog,aw=aw, awa=awa))
print(np.mod(np.rad2deg(apparent_wind_angle_to_true(sog=sog,aw=aw, awa=awa)),360))

In [None]:
trip['trip_direction']