## Data preprocessing

In [1]:
import movekit as mkit
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#### Read data input

In [2]:
# Enter path to CSV file
path = "./datasets/fish-5.csv"

# Alternative: enter path to Excel file
# path = "./datasets/fish-5.xlsx"

In [3]:
# Read in the file using movekit
data = mkit.read_data(path)
data.head()

|   | time | animal_id | x | y |
|---|---|---|---|---|
| 0 | 1 | 312 | 405.29 | 417.76 |
| 1 | 1 | 511 | 369.99 | 428.78 |
| 2 | 1 | 607 | 390.33 | 405.89 |
| 3 | 1 | 811 | 445.15 | 411.94 |
| 4 | 1 | 905 | 366.06 | 451.76 |
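The cell above lets `mkit.read_data` take either a CSV or an Excel path. One way such a loader could pick a pandas reader from the file extension is sketched below; `read_tracking_file` is a hypothetical helper for illustration, not part of `movekit`, and the library's own function may work differently.

```python
import io
import os

import pandas as pd


def read_tracking_file(source, ext=None):
    """Load a tracking table, choosing the pandas reader by file extension.

    Hypothetical helper for illustration; not movekit's read_data.
    """
    if ext is None:
        ext = os.path.splitext(str(source))[1]
    ext = ext.lower()
    if ext == ".csv":
        return pd.read_csv(source)
    if ext in (".xls", ".xlsx"):
        # Needs an Excel engine, e.g. openpyxl for .xlsx files
        return pd.read_excel(source)
    raise ValueError(f"Unsupported file type: {ext!r}")


# In-memory CSV so the example runs without the dataset on disk
csv_text = "time,animal_id,x,y\n1,312,405.29,417.76\n1,511,369.99,428.78\n"
df = read_tracking_file(io.StringIO(csv_text), ext=".csv")
print(df)
```

With a file on disk, `read_tracking_file(path)` infers the extension from `path` itself.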


In [4]:
# Simple call of the preprocessing method
preprocessed_data = mkit.preprocess(data)

Total number of missing values =  0
time         0
animal_id    0
x            0
y            0
dtype: int64
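The report above counts missing values per column. Assuming a cleaning step consists of reporting NaNs, dropping rows that lack a `time` or `animal_id`, and removing exact duplicate rows, a minimal pandas sketch could look like this; `report_and_clean` is a hypothetical name and this is not `movekit`'s source.

```python
import numpy as np
import pandas as pd


def report_and_clean(data):
    """Report missing values, then drop incomplete and duplicate records.

    Hypothetical sketch of a preprocessing step; not movekit's source.
    """
    missing = data.isnull().sum()
    print("Total number of missing values = ", int(missing.sum()))
    print(missing)
    # A row without a timestamp or an animal ID cannot be placed on a trajectory
    cleaned = data.dropna(subset=["time", "animal_id"])
    # Exact duplicate records add no information
    return cleaned.drop_duplicates().reset_index(drop=True)


raw = pd.DataFrame({
    "time": [1, 1, 2, np.nan, 2],
    "animal_id": [312, 511, 312, 511, 312],
    "x": [405.29, 369.99, 406.10, 370.50, 406.10],
    "y": [417.76, 428.78, 418.02, 429.01, 418.02],
})
cleaned = report_and_clean(raw)
```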


In [5]:
# OPTIONAL: more parameters to control the preprocessing of data

# preprocessed_data = mkit.preprocess(data, dropna=True, interpolation=False, limit=1, limit_direction="forward", inplace=False, method="linear")

# Parameters
#  data: DataFrame to perform preprocessing on
#  dropna: Optional parameter to drop rows with missing values in 'time' or 'animal_id'
#  interpolation: Optional parameter to fill missing values by interpolation (technique set by `method`)
#  limit: Maximum number of consecutive NaNs to fill
#  limit_direction: If limit is specified, consecutive NaNs will be filled in this direction
#  inplace: Whether to modify the DataFrame in place
#  method: Interpolation technique to use. Default is "linear".
#  order: Order of the polynomial, used only for polynomial interpolation
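The options `method`, `limit`, `limit_direction` and `order` share their names with the arguments of pandas' `interpolate`. The sketch below assumes that gaps should be filled separately for each animal, so that one fish's positions never leak into another's track, and shows how `limit` and `limit_direction` behave. `interpolate_per_animal` is a hypothetical helper; `movekit` may implement interpolation differently. Polynomial interpolation (`method="polynomial"` with `order`) would additionally need SciPy.

```python
import numpy as np
import pandas as pd


def interpolate_per_animal(data, method="linear", limit=1, limit_direction="forward"):
    """Fill gaps in x/y independently for each animal.

    Hypothetical sketch built on pandas' interpolate; not movekit's code.
    """
    data = data.sort_values(["animal_id", "time"]).copy()
    coords = ["x", "y"]
    data[coords] = data.groupby("animal_id")[coords].transform(
        lambda s: s.interpolate(method=method, limit=limit,
                                limit_direction=limit_direction)
    )
    return data


tracks = pd.DataFrame({
    "time":      [1, 2, 3, 4, 1, 2, 3],
    "animal_id": [1, 1, 1, 1, 2, 2, 2],
    "x": [0.0, np.nan, np.nan, 3.0, 10.0, np.nan, 30.0],
    "y": [0.0, 1.0, 2.0, 3.0, 5.0, 5.0, 5.0],
})
filled = interpolate_per_animal(tracks)
# limit=1: only the first NaN of animal 1's two-step gap is filled (x = 1.0),
# animal 2's single gap becomes 20.0
print(filled)
```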

In [6]:
# OPTIONAL: convert positional data to a user-defined scale
# preprocessed_data = mkit.convert_measueres(preprocessed_data, x_min = 0, x_max = 100, y_min = 0, y_max = 100)
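`convert_measueres` maps positions onto the ranges given by `x_min`, `x_max`, `y_min` and `y_max`. Assuming this is a linear min-max mapping, x' = x_min + (x - min(x)) / (max(x) - min(x)) * (x_max - x_min) and likewise for y, a sketch looks like this; `rescale_coordinates` is hypothetical and the library's exact formula may differ.

```python
import pandas as pd


def rescale_coordinates(data, x_min=0, x_max=100, y_min=0, y_max=100):
    """Linearly map x and y onto [x_min, x_max] and [y_min, y_max].

    Assumes a min-max rescaling; movekit's convert_measueres may differ.
    """
    out = data.copy()
    for col, lo, hi in (("x", x_min, x_max), ("y", y_min, y_max)):
        col_min = out[col].min()
        span = out[col].max() - col_min
        if span == 0:
            raise ValueError(f"column {col!r} is constant and cannot be rescaled")
        out[col] = lo + (out[col] - col_min) / span * (hi - lo)
    return out


positions = pd.DataFrame({"time": [1, 1, 1], "animal_id": [312, 511, 811],
                          "x": [405.29, 369.99, 445.15],
                          "y": [417.76, 428.78, 411.94]})
scaled = rescale_coordinates(positions)
print(scaled)
```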

In [7]:
# Save the cleaned data to a CSV file
preprocessed_data.to_csv("datasets/fish-5-cleaned.csv", index=False)
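Passing `index=False` matters here: by default `to_csv` also writes the DataFrame's row index as an unnamed first column, and reading such a file back with `pd.read_csv` produces an extra `Unnamed: 0` column. The example below demonstrates this with an in-memory buffer.

```python
import io

import pandas as pd

df = pd.DataFrame({"time": [1, 1], "animal_id": [312, 511],
                   "x": [405.29, 369.99], "y": [417.76, 428.78]})

# Default: the row index is written as an extra, unnamed first column
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
cols_with_index = pd.read_csv(buffer).columns.tolist()
print(cols_with_index)  # ['Unnamed: 0', 'time', 'animal_id', 'x', 'y']

# index=False: only the data columns are written
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
cols_without_index = pd.read_csv(buffer).columns.tolist()
print(cols_without_index)  # ['time', 'animal_id', 'x', 'y']
```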

#### Support for 3D datasets

`movekit` also supports movement in three dimensions. All function calls stay the same for the user, because `movekit` detects the presence of a third dimension in the data automatically.

Below we show an example of a 3D dataset that can be given to `movekit`.

In [8]:
# create a synthetic 3D dataset by appending a third dimension to the 2D dataset from above
z = np.random.normal(loc=0.0, scale=1.0, size=len(preprocessed_data))
preprocessed_data['z'] = z
preprocessed_data

|   | time | animal_id | x | y | z |
|---|---|---|---|---|---|
| 0 | 1 | 312 | 405.29 | 417.76 | 0.038791 |
| 1 | 1 | 511 | 369.99 | 428.78 | 1.680744 |
| 2 | 1 | 607 | 390.33 | 405.89 | 0.266656 |
| 3 | 1 | 811 | 445.15 | 411.94 | -0.721698 |
| 4 | 1 | 905 | 366.06 | 451.76 | -0.462322 |
| ... | ... | ... | ... | ... | ... |
| 4995 | 1000 | 312 | 720.96 | 244.60 | 0.639186 |
| 4996 | 1000 | 511 | 662.56 | 225.29 | 0.001201 |
| 4997 | 1000 | 607 | 722.75 | 296.34 | -0.058461 |
| 4998 | 1000 | 811 | 762.44 | 307.61 | -1.158683 |
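Code that selects its coordinate columns by name can handle 2D and 3D data with the same call. The sketch below is a hypothetical `step_lengths` helper, not a `movekit` function: it computes the Euclidean distance each animal travels between consecutive time steps from `x` and `y`, and includes `z` only when that column exists.

```python
import numpy as np
import pandas as pd


def step_lengths(data):
    """Euclidean distance each animal moves between consecutive time steps.

    Hypothetical helper: uses x and y, plus z when that column is present.
    """
    coords = [c for c in ("x", "y", "z") if c in data.columns]
    ordered = data.sort_values(["animal_id", "time"])
    # Per-animal differences; the first row of each animal becomes NaN
    deltas = ordered.groupby("animal_id")[coords].diff()
    # min_count keeps those first rows as NaN instead of summing to 0
    return np.sqrt((deltas ** 2).sum(axis=1, min_count=len(coords)))


tracks_2d = pd.DataFrame({"time": [1, 2, 1, 2], "animal_id": [1, 1, 2, 2],
                          "x": [0.0, 3.0, 0.0, 0.0], "y": [0.0, 4.0, 0.0, 0.0]})
tracks_3d = tracks_2d.assign(z=[0.0, 0.0, 0.0, 2.0])
d2 = step_lengths(tracks_2d)
d3 = step_lengths(tracks_3d)
print(d2.tolist())  # [nan, 5.0, nan, 0.0]
print(d3.tolist())  # [nan, 5.0, nan, 2.0]
```

Applied to `preprocessed_data` after the cell above, the synthetic `z` values enter the distances; without the `z` column the result is the planar step length.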


#### Support for geographic coordinates

`movekit` can project GPS coordinates given as latitude and longitude onto a Cartesian coordinate system.
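The projected values in the output of `convert_latlon` further down look like UTM eastings and northings, although that is an inference from the numbers, not something this notebook states. For small study areas a much simpler approximation is the local equirectangular projection, x = R (λ - λ0) cos φ0 and y = R (φ - φ0), with latitude φ and longitude λ in radians and R the mean Earth radius. The sketch below implements that substitute technique; `equirectangular` is a hypothetical helper, not what `convert_latlon` does, and its error grows with distance from the reference point (φ0, λ0), so it is unsuitable for points hundreds of kilometres apart, such as animals 1 and 2 in `geo.csv`.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def equirectangular(geo, lat0=None, lon0=None):
    """Project latitude/longitude (degrees) to local x/y in metres.

    Hypothetical helper using an equirectangular approximation, not movekit's
    convert_latlon. The reference point defaults to the mean position.
    """
    lat = np.radians(geo["latitude"].to_numpy(dtype=float))
    lon = np.radians(geo["longitude"].to_numpy(dtype=float))
    lat0 = lat.mean() if lat0 is None else np.radians(lat0)
    lon0 = lon.mean() if lon0 is None else np.radians(lon0)
    projected = geo.drop(columns=["latitude", "longitude"]).copy()
    projected["x"] = EARTH_RADIUS_M * (lon - lon0) * np.cos(lat0)
    projected["y"] = EARTH_RADIUS_M * (lat - lat0)
    return projected


# Two points one degree of longitude apart on the equator
demo = pd.DataFrame({"time": [1, 1], "animal_id": [1, 2],
                     "latitude": [0.0, 0.0], "longitude": [0.0, 1.0]})
proj = equirectangular(demo, lat0=0.0, lon0=0.0)
print(proj)  # x of the second point is roughly 111.2 km
```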

In [9]:
path = "./datasets/geo.csv"

# Read in the semicolon-separated file with pandas
geo_data = pd.read_csv(path, sep=';')
geo_data.head()

|   | time | animal_id | latitude | longitude |
|---|---|---|---|---|
| 0 | 1 | 1 | 47.691358 | 9.176731 |
| 1 | 1 | 2 | 52.472161 | 13.402034 |
| 2 | 1 | 3 | 47.692101 | 9.055353 |


In [10]:
# convert and store in a new DataFrame
projected_data = mkit.convert_latlon(geo_data)
projected_data.head()

|   | time | animal_id | x | y |
|---|---|---|---|---|
| 0 | 1 | 1 | 513261.777038 | 5282012.0 |
| 1 | 1 | 2 | 391460.27695 | 5814756.0 |
| 2 | 1 | 3 | 504153.593963 | 5282081.0 |


It is often helpful to normalize the data, e.g. for plotting, since the projected x and y values run into the hundreds of thousands and millions.

In [11]:
projected_data = mkit.normalize(projected_data)
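`matplotlib.pyplot` is imported at the top of this notebook but not used in this section. As an illustration of the plotting use case, the hypothetical helper below draws one line per animal in time order; it makes no assumption about what `mkit.normalize` does internally and can be called as `plot_trajectories(projected_data)`.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_trajectories(data, ax=None):
    """Draw one x/y polyline per animal, ordered by time (hypothetical helper)."""
    if ax is None:
        _, ax = plt.subplots(figsize=(6, 6))
    for animal_id, track in data.sort_values("time").groupby("animal_id"):
        ax.plot(track["x"], track["y"], label=str(animal_id))
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_aspect("equal")
    ax.legend(title="animal_id")
    return ax


demo = pd.DataFrame({
    "time": [1, 2, 1, 2],
    "animal_id": [1, 1, 2, 2],
    "x": [0.0, 0.5, 1.0, 0.8],
    "y": [0.0, 0.4, 1.0, 0.9],
})
ax = plot_trajectories(demo)
```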