# Statistics Meets Logistics
This notebook holds the DataFrames and analysis for the project.

The requirements for the project environment are listed in https://github.com/luiul/statistics-meets-logistics/blob/main/requirements.txt. The project has not been tested in any other environment.

## 📋 Outline of Project
- Import
- Apply Pandas knowledge to DataFrame
    - Conditional filtering
    - Useful Methods
    - Check for missing data

## 📚 Import Libraries

In [7]:
import numpy as np
import pandas as pd

# calling np.version.version should return 1.18.1
# calling pd.__version__ should return 1.1.2

# uncomment one of the following commands to print floats with 2 decimals
# pd.options.display.float_format = "{:,.2f}".format
# pd.set_option('float_format', '{:.2f}'.format)
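
The effect of the float format option can be checked on a small frame before applying it globally. The sketch below uses made-up values and the hypothetical name `demo`; `pd.option_context` is used here only so that the setting is limited to the `with` block and the notebook's global display options stay unchanged.

```python
import pandas as pd

# Toy data for illustration only; not read from the project files.
demo = pd.DataFrame({"throughput": [6.83763, 19.8758], "rtt": [41, 35]})

# option_context applies the setting only inside the with-block.
with pd.option_context("display.float_format", "{:,.2f}".format):
    formatted = demo.to_string()

# Floats are rendered with 2 decimals; integer columns are not affected.
print(formatted)
```

Setting the option with one of the two commented commands above has the same effect on every later output in the session.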

## 🖼 Prepare DataFrames & Describe Features

In [8]:
dl = pd.read_csv('raw_data_dl.csv')
ul = pd.read_csv('raw_data_ul.csv')

# dl is the DataFrame for the download raw data
# ul is the DataFrame for the upload raw data

In [9]:
dl

Unnamed: 0,timestamp,rawTimesamp,distance,lat,lon,alt,speed,acc,dir,connected,...,ss,ta,ci,pci,id,payload,throughput,rtt,measurement,location
0,10.33,1544432937,99.42,51.490553,7.413966,157.63,11.83,0.00,79.35,1,...,50,7,26385408,95,0,0.1,6.83763,41,1544432927,campus
1,21.87,1544432949,237.43,51.490715,7.416002,152.41,10.76,-0.52,89.45,1,...,52,4,29391105,167,1,2.0,9.71463,58,1544432927,campus
2,32.46,1544432959,325.26,51.490668,7.417176,154.64,6.19,-0.62,27.05,1,...,54,4,29391105,167,2,2.0,7.30594,57,1544432927,campus
3,46.40,1544432973,448.27,51.491839,7.416804,155.87,9.77,-0.98,342.45,1,...,45,4,29391105,167,3,3.0,3.94997,163,1544432927,campus
4,54.95,1544432982,540.48,51.492531,7.416222,154.41,12.33,0.01,336.68,1,...,45,4,29391105,167,4,5.0,8.54884,59,1544432927,campus
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9349,421.52,1547803330,2908.77,51.510228,7.461691,159.23,5.05,0.00,137.18,1,...,49,1,26378755,441,41,0.5,4.61894,48,1547802908,urban
9350,434.63,1547803343,3010.99,51.509847,7.463047,155.79,6.32,0.00,101.37,1,...,53,5,27299332,326,42,10.0,19.87580,35,1547802908,urban
9351,441.33,1547803350,3044.81,51.509787,7.463589,150.10,4.93,0.00,109.23,1,...,55,5,27299332,326,43,0.5,5.87372,43,1547802908,urban
9352,453.15,1547803361,3146.50,51.509798,7.465158,151.15,11.74,0.00,95.13,1,...,46,5,27299332,326,44,5.0,15.72950,69,1547802908,urban
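
The `rawTimesamp` values look like Unix epoch seconds, which is an assumption and not something the notebook states. Under that assumption, a minimal sketch of turning them into readable timestamps could look like this; the column name `datetime` and the frame `sample` are hypothetical.

```python
import pandas as pd

# Two values copied from the rawTimesamp column in the output above.
sample = pd.DataFrame({"rawTimesamp": [1544432937, 1547803361]})

# Assumption: the integers are Unix epoch seconds (UTC).
sample["datetime"] = pd.to_datetime(sample["rawTimesamp"], unit="s")

print(sample)
```

If the assumption holds, the first and last rows of the preview fall in December 2018 and January 2019.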


In [27]:
ul

Unnamed: 0,timestamp,rawTimesamp,distance,lat,lon,alt,speed,acc,dir,connected,...,ta,ci,pci,id,payload,throughput,rtt,txPower,measurement,location
0,11.53,1544432938,113.75,51.49,7.41,156.39,12.21,0.50,77.59,1,...,7,26385408,95,0,4.00,24.52,35,12.30,1544432927,campus
1,21.33,1544432948,231.40,51.49,7.42,152.53,11.17,0.00,87.24,1,...,4,29391105,167,1,2.00,14.86,51,10.02,1544432927,campus
2,32.22,1544432959,323.73,51.49,7.42,154.23,6.44,0.00,48.55,1,...,4,29391105,167,2,4.00,16.27,57,4.34,1544432927,campus
3,45.99,1544432973,444.10,51.49,7.42,155.89,10.11,0.06,344.34,1,...,4,29391105,167,3,9.00,12.68,54,17.11,1544432927,campus
4,54.69,1544432982,537.34,51.49,7.42,154.41,12.33,0.02,336.68,1,...,4,29391105,167,4,8.00,14.59,60,17.31,1544432927,campus
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9353,422.52,1547803331,2914.37,51.51,7.46,159.96,6.28,0.00,135.25,1,...,1,26378755,441,41,5.00,21.37,48,17.83,1547802908,urban
9354,434.11,1547803342,3007.72,51.51,7.46,155.79,6.32,0.00,101.37,1,...,5,27299332,326,42,8.00,18.77,46,17.12,1547802908,urban
9355,442.91,1547803351,3053.18,51.51,7.46,147.95,6.77,1.34,118.12,1,...,5,27299332,326,43,9.00,31.82,42,11.08,1547802908,urban
9356,451.53,1547803360,3128.44,51.51,7.46,150.03,10.24,0.00,93.22,1,...,5,27299332,326,44,2.00,17.45,36,17.28,1547802908,urban


In [6]:
dl.info()
# 9,354 entries, unlabeled RangeIndex, 24 columns,
# no missing values in the columns listed in the output

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9354 entries, 0 to 9353
Data columns (total 24 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   timestamp    9354 non-null   float64
 1   rawTimesamp  9354 non-null   int64  
 2   distance     9354 non-null   float64
 3   lat          9354 non-null   float64
 4   lon          9354 non-null   float64
 5   alt          9354 non-null   float64
 6   speed        9354 non-null   float64
 7   acc          9354 non-null   float64
 8   dir          9354 non-null   float64
 9   connected    9354 non-null   int64  
 10  rsrp         9354 non-null   int64  
 11  rsrq         9354 non-null   int64  
 12  sinr         9354 non-null   int64  
 13  cqi          9354 non-null   int64  
 14  ss           9354 non-null   int64  
 15  ta           9354 non-null   int64  
 16  ci           9354 non-null   int64  
 17  pci          9354 non-null   int64  
 18  id           9354 non-null   int64  
 19  payloa

In [7]:
ul.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9358 entries, 0 to 9357
Data columns (total 25 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   timestamp    9358 non-null   float64
 1   rawTimesamp  9358 non-null   int64  
 2   distance     9358 non-null   float64
 3   lat          9358 non-null   float64
 4   lon          9358 non-null   float64
 5   alt          9358 non-null   float64
 6   speed        9358 non-null   float64
 7   acc          9358 non-null   float64
 8   dir          9358 non-null   float64
 9   connected    9358 non-null   int64  
 10  rsrp         9358 non-null   int64  
 11  rsrq         9358 non-null   int64  
 12  sinr         9358 non-null   int64  
 13  cqi          9358 non-null   int64  
 14  ss           9358 non-null   int64  
 15  ta           9358 non-null   int64  
 16  ci           9358 non-null   int64  
 17  pci          9358 non-null   int64  
 18  id           9358 non-null   int64  
 19  payloa
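
The two `info()` summaries report 24 columns for `dl` and 25 for `ul`. One way to see which columns differ is to compare the column indexes directly. The sketch below uses empty toy frames with a hypothetical subset of the column names; with the real data, `dl.columns` and `ul.columns` would take their place.

```python
import pandas as pd

# Toy frames with a subset of the column names; illustration only.
dl_demo = pd.DataFrame(columns=["timestamp", "rsrp", "throughput", "rtt"])
ul_demo = pd.DataFrame(columns=["timestamp", "rsrp", "throughput", "rtt", "txPower"])

# Index.difference returns the labels present in one index but not the other.
only_in_ul = ul_demo.columns.difference(dl_demo.columns)
only_in_dl = dl_demo.columns.difference(ul_demo.columns)

print(list(only_in_ul))
print(list(only_in_dl))
```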

In [11]:
# select the rows of dl that contain at least one missing value
dl[dl.isnull().any(axis=1)]
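
Checking for missing data usually comes down to two questions: how many values are missing per column, and which rows are affected. A minimal sketch on a toy frame follows; the frame `demo` and its values are made up for illustration and are not project data.

```python
import numpy as np
import pandas as pd

# Toy frame with one deliberately missing value.
demo = pd.DataFrame(
    {"rtt": [41.0, np.nan, 57.0], "location": ["campus", "campus", "urban"]}
)

# Number of missing values per column.
missing_per_column = demo.isnull().sum()

# Rows that contain at least one missing value.
rows_with_missing = demo[demo.isnull().any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Reducing the boolean mask with `any(axis=1)` gives one True/False value per row, which is what row selection needs. Indexing with the full two-dimensional mask would instead keep every row and blank out the non-missing cells.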