# Example usage for "movekit"

In [1]:
import movekit.io 
import movekit.preprocess
import movekit.feature_extraction

### Read in CSV file:

In [2]:
# Enter the path to the CSV file (absolute, or relative to the working directory)-
path_to_file = "datasets/fish-5.csv"

In [3]:
# Read in CSV file using 'path_to_file' variable-
data = movekit.io.parse_csv(path_to_file)
print(data)

      time  animal_id       x       y
0        1        312  405.29  417.76
1        1        511  369.99  428.78
2        1        607  390.33  405.89
3        1        811  445.15  411.94
4        1        905  366.06  451.76
9        2        905  365.86  451.76
7        2        607  390.25  405.89
8        2        811  445.48  412.26
5        2        312  405.31  417.37
6        2        511  370.01  428.82
10       3        312  405.31  417.07
11       3        511  370.01  428.85
12       3        607  390.17  405.88
13       3        811  445.77  412.61
14       3        905  365.70  451.76
19       4        905  365.57  451.76
17       4        607  390.07  405.88
18       4        811  446.03  413.00
15       4        312  405.30  416.86
16       4        511  370.01  428.86
20       5        312  405.29  416.71
21       5        511  369.99  428.86
22       5        607  389.98  405.87
23       5        811  446.24  413.42
24       5        905  365.47  451.76
27       6  
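To check a file before handing it to movekit, the same column layout can be inspected with pandas directly. This is a minimal sketch and not movekit's parser: the inline CSV text is only an illustration, built from four of the rows printed above, and the names `csv_text`, `raw`, and `required` are made up for this example.

```python
import io
import pandas as pd

# Illustrative CSV text using four rows from the output above.
csv_text = """time,animal_id,x,y
1,312,405.29,417.76
1,511,369.99,428.78
2,312,405.31,417.37
2,511,370.01,428.82
"""

raw = pd.read_csv(io.StringIO(csv_text))

# Confirm that the columns used throughout this notebook are present.
required = {"time", "animal_id", "x", "y"}
missing_columns = required - set(raw.columns)
print("Missing columns:", sorted(missing_columns))
print(raw.dtypes)
```

For a file on disk, `io.StringIO(csv_text)` would be replaced by the file path.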

### Preprocess CSV file:
- "movekit.preprocess.clean()" takes as input the data read in using "movekit.io.parse_csv()".
- The function returns the preprocessed data as a Pandas DataFrame. It also prints statistics about the preprocessing it performs: the number of missing values per column and the duplicate rows removed based on the 'animal_id' and 'time' columns.

In [4]:
# To perform data preprocessing-
preprocessed_data = movekit.preprocess.clean(data)

Missing values:
 y            0
x            0
animal_id    0
time         0
dtype: int64
Removed duplicate rows based on the columns 'animal_id' and 'time' column are:
Empty DataFrame
Columns: [time, animal_id, x, y]
Index: []
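The two checks reported in the output above (missing values per column and duplicates on 'animal_id' and 'time') can be reproduced with plain pandas. The sketch below is an assumption about what such checks look like, not movekit's own implementation. The toy DataFrame contains one deliberately missing value and one deliberately duplicated row.

```python
import numpy as np
import pandas as pd

# Toy data: one missing 'x' value and one duplicated (animal_id, time) pair.
toy = pd.DataFrame({
    "time": [1, 1, 2, 2, 2],
    "animal_id": [312, 511, 312, 511, 511],
    "x": [405.29, 369.99, np.nan, 370.01, 370.01],
    "y": [417.76, 428.78, 417.37, 428.82, 428.82],
})

# Count missing values per column.
missing_per_column = toy.isnull().sum()
print("Missing values:\n", missing_per_column)

# Flag every repeat of an (animal_id, time) pair after its first occurrence.
dup_mask = toy.duplicated(subset=["animal_id", "time"], keep="first")
duplicates = toy[dup_mask]
deduplicated = toy[~dup_mask]
print("Duplicate rows:\n", duplicates)
```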


### Impute missing values:
- To impute missing values in the 'x' and 'y' columns, linear interpolation is used.
- 'linear_interpolation()' takes the preprocessed data returned by 'clean()' as its first argument and 'threshold' as its second argument. 'threshold' is the number of consecutive rows with missing values at which data is deleted instead of being kept; shorter runs are NOT deleted.

Example: If threshold = 20, any run of 20 or more consecutive rows with missing values is deleted, while a run of fewer than 20 rows is kept.

In [5]:
# Perform linear interpolation-
linear_interpolated_data = movekit.preprocess.linear_interpolation(preprocessed_data, 20)


Number of missing values in 'x' attribute = 0
Number of missing values in 'y' attribute = 0
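For intuition, linear interpolation of 'x' and 'y' per animal can be sketched with pandas alone. This is not movekit's code: the helper name `interpolate_xy` is hypothetical, and the 'threshold' handling is intentionally left out because its exact behaviour is not shown in this notebook.

```python
import numpy as np
import pandas as pd

def interpolate_xy(df):
    """Linearly interpolate missing 'x' and 'y' values within each animal's track."""
    out = df.sort_values(["animal_id", "time"]).copy()
    for col in ("x", "y"):
        # Interpolate inside each animal's series so values never leak across animals.
        out[col] = out.groupby("animal_id")[col].transform(
            lambda s: s.interpolate(method="linear")
        )
    return out

# Toy data with gaps in the middle of each track.
toy = pd.DataFrame({
    "time": [1, 2, 3, 1, 2, 3],
    "animal_id": [312, 312, 312, 511, 511, 511],
    "x": [0.0, np.nan, 2.0, 10.0, 11.0, 12.0],
    "y": [0.0, np.nan, 4.0, 5.0, np.nan, 7.0],
})

filled = interpolate_xy(toy)
print(filled)
```

Grouping by 'animal_id' before interpolating matters: without it, a gap at the end of one animal's track would be filled using the next animal's coordinates.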



### Grouping data according to 'animal_id' attribute
- 'grouping_data()' groups all rows by 'animal_id'.
- The input parameter is the preprocessed Pandas DataFrame.
- The function returns a dictionary in which each key is an 'animal_id' and each value is the Pandas DataFrame for that 'animal_id'.

In [6]:
# To group data according to 'animal_id' attribute-
data_grouped = movekit.preprocess.grouping_data(preprocessed_data)

In [7]:
# Iterate through the keys of dictionary (which are animal_ids) and get the shape/dimension of each Pandas DataFrame-
for aid in data_grouped.keys():
    print("\nAnimal ID: {0} has the dimension/shape: {1}".format(aid, data_grouped[aid].shape))


Animal ID: 312 has the dimension/shape: (1000, 8)

Animal ID: 511 has the dimension/shape: (1000, 8)

Animal ID: 607 has the dimension/shape: (1000, 8)

Animal ID: 811 has the dimension/shape: (1000, 8)

Animal ID: 905 has the dimension/shape: (1000, 8)
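A dictionary of this shape (one DataFrame per 'animal_id') can also be built with a pandas `groupby`. The sketch below only illustrates the data structure on a toy DataFrame; it is not how 'grouping_data()' is implemented, and it does not add any of the extra columns that movekit's grouped DataFrames contain.

```python
import pandas as pd

# Toy data using four rows from the CSV output shown earlier.
toy = pd.DataFrame({
    "time": [1, 1, 2, 2],
    "animal_id": [312, 511, 312, 511],
    "x": [405.29, 369.99, 405.31, 370.01],
    "y": [417.76, 428.78, 417.37, 428.82],
})

# Map each animal_id to the sub-DataFrame holding only that animal's rows.
grouped = {
    aid: frame.reset_index(drop=True)
    for aid, frame in toy.groupby("animal_id")
}

for aid, frame in grouped.items():
    print("Animal ID: {0} has the dimension/shape: {1}".format(aid, frame.shape))
```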


### Calculate absolute features: metric distance, direction, avg_speed, avg_acceleration
- Calculate the metric distance and direction between two consecutive time frames/time stamps for each moving entity (animal).
- 'compute_average_speed()' computes the average speed of an animal based on the fps (frames per second) parameter.
- Formula used: Average Speed = Total Distance Travelled / Total Time Taken
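The quantities listed above can be sketched for a single track as follows. This is an illustration under stated assumptions, not movekit's implementation, and it does not attempt to reproduce movekit's numbers: the helper names are hypothetical, distance is assumed to be Euclidean, time is measured in frames (the fps conversion is not modelled), and direction uses `math.atan2` rather than the `atan((y2 - y1) / (x2 - x1))` expression echoed in the output below, because `atan2` also handles steps where `x2 - x1` is zero.

```python
import math
import pandas as pd

def add_distance_and_direction(track):
    """Add per-step Euclidean distance and direction (degrees) to one animal's track."""
    track = track.sort_values("time").reset_index(drop=True)
    # Differences to the previous frame; the first frame has no predecessor, so use 0.
    dx = track["x"].diff().fillna(0.0)
    dy = track["y"].diff().fillna(0.0)
    track["distance"] = (dx ** 2 + dy ** 2) ** 0.5
    track["direction"] = [
        math.degrees(math.atan2(step_y, step_x)) for step_x, step_y in zip(dx, dy)
    ]
    return track

def average_speed(track):
    """Average speed = total distance travelled / total time taken (in frames here)."""
    elapsed = track["time"].iloc[-1] - track["time"].iloc[0]
    if elapsed == 0:
        return 0.0
    return float(track["distance"].sum() / elapsed)

# Toy track: a 3-4-5 step followed by a purely vertical step of length 6.
toy_track = pd.DataFrame({
    "time": [1, 2, 3],
    "x": [0.0, 3.0, 3.0],
    "y": [0.0, 4.0, 10.0],
})

features = add_distance_and_direction(toy_track)
speed = average_speed(features)
print(features)
print("Average speed:", speed)
```

In the toy track the second step is vertical, which is exactly the case where dividing by `x2 - x1` would fail.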

In [8]:
data_features = movekit.feature_extraction.compute_absolute_features(data_grouped)
print(data_features)


Computing Distance & Direction for Animal ID = 312



  direction = math.degrees(math.atan((y2 - y1) / (x2 - x1)))
  direction = math.degrees(math.atan((y2 - y1) / (x2 - x1)))



Computing Distance & Direction for Animal ID = 511


Computing Distance & Direction for Animal ID = 607


Computing Distance & Direction for Animal ID = 811


Computing Distance & Direction for Animal ID = 905


Computing Average Speed for Animal ID = 312


Computing Average Speed for Animal ID = 511


Computing Average Speed for Animal ID = 607


Computing Average Speed for Animal ID = 811


Computing Average Speed for Animal ID = 905


Computing Average Speed for Animal ID = 312


Computing Average Speed for Animal ID = 511


Computing Average Speed for Animal ID = 607


Computing Average Speed for Animal ID = 811


Computing Average Speed for Animal ID = 905


Number of movers stopped according to threshold speed = 0.5 is 1985
Number of movers moving according to threshold speed = 0.5 is 3015

      time  animal_id       x       y  Distance  Average_Speed  \
0        1        312  405.29  417.76  0.000000       0.000000   
1        2        312  405.31  417.37  0.300000       0.150

### Using "tsfresh" Python library:

In [9]:
# For extracting all time series related features, do-
extracted_features = movekit.feature_extraction.time_series_analyis(data_features)

Feature Extraction: 100%|██████████| 10/10 [00:39<00:00,  3.91s/it]
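The extracted feature names follow tsfresh's `<column>__<feature>` pattern. Two of the features shown in the output below, `abs_energy` and `absolute_sum_of_changes`, are simple enough to compute by hand. The sketch below restates their tsfresh definitions in plain Python on a made-up series; it is meant for intuition only and does not call tsfresh or movekit.

```python
def abs_energy(values):
    """Sum of squared values of the series."""
    return sum(v * v for v in values)

def absolute_sum_of_changes(values):
    """Sum of absolute differences between consecutive values."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

# Made-up acceleration-like series, for illustration only.
series = [0.0, 0.5, -0.5, 1.0]

energy = abs_energy(series)
changes = absolute_sum_of_changes(series)
print("abs_energy:", energy)
print("absolute_sum_of_changes:", changes)
```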


In [10]:
# Print the extracted features; uncomment the last line to also save them to disk
print(extracted_features)
#extracted_features.to_json("extracted_features_fish.json")

variable  Average_Acceleration__abs_energy  \
id                                           
312                               0.135873   
511                               0.178501   
607                               0.085367   
811                               0.049559   
905                               0.169263   

variable  Average_Acceleration__absolute_sum_of_changes  \
id                                                        
312                                            1.758851   
511                                            1.884188   
607                                            1.694866   
811                                            1.519192   
905                                            2.181508   

variable  Average_Acceleration__agg_autocorrelation__f_agg_"mean"__maxlag_40  \
id                                                                             
312                                                0.011256                    
511                    