The purpose of this notebook is to inspect the May pickle file and find opportunities for data cleanup. Ideally, we want:

1. No null values in columns where we need data.
2. No time periods where a scooter's charge level was at 0%, so that a dead battery is ruled out as the reason for non-use.
3. Time periods condensed to a frame of days.
4. A way to find out which scooters were stationary for a long period of time.

Then, we will use that data to plot these points on our promise zone map.

In [2]:
import pandas as pd
import matplotlib.pyplot as plt

In [13]:
may = pd.read_pickle('../data/may.pkl')
may.head()

Unnamed: 0,pubdatetime,latitude,longitude,sumdid,chargelevel,companyname
0,2019-05-01 00:01:41.247,36.136822,-86.799877,PoweredLIRL1,93.0,0
1,2019-05-01 00:01:41.247,36.191252,-86.772945,PoweredXWRWC,35.0,0
2,2019-05-01 00:01:41.247,36.144752,-86.806293,PoweredMEJEH,90.0,0
3,2019-05-01 00:01:41.247,36.162056,-86.774688,Powered1A7TC,88.0,0
4,2019-05-01 00:01:41.247,36.150973,-86.783109,Powered2TYEF,98.0,0


In [14]:
may.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20283582 entries, 0 to 20292502
Data columns (total 6 columns):
 #   Column       Dtype         
---  ------       -----         
 0   pubdatetime  datetime64[ns]
 1   latitude     float64       
 2   longitude    float64       
 3   sumdid       object        
 4   chargelevel  float64       
 5   companyname  int64         
dtypes: datetime64[ns](1), float64(3), int64(1), object(1)
memory usage: 1.1+ GB


Find null values.

In [9]:
may.isnull().sum()

pubdatetime      0
latitude         0
longitude        0
sumdid           0
chargelevel    283
companyname      0
dtype: int64

Remove the null chargelevel rows.

In [15]:
# chargelevel is the only column with nulls, so restrict the drop to it
may = may.dropna(subset=['chargelevel'])
may.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 20283299 entries, 0 to 20292502
Data columns (total 6 columns):
 #   Column       Dtype         
---  ------       -----         
 0   pubdatetime  datetime64[ns]
 1   latitude     float64       
 2   longitude    float64       
 3   sumdid       object        
 4   chargelevel  float64       
 5   companyname  int64         
dtypes: datetime64[ns](1), float64(3), int64(1), object(1)
memory usage: 1.1+ GB
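In this data only chargelevel has nulls, so dropping with or without a column subset removes the same 283 rows (20283582 - 283 = 20283299). The difference matters once another column picks up nulls. Below is a minimal sketch on a made-up toy frame (the rows and the names `toy`, `dropped_any`, and `dropped_charge` are assumptions for illustration, not part of the May data):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the May data; all values are made up for illustration.
toy = pd.DataFrame({
    "sumdid": ["PoweredA", "PoweredB", "PoweredC"],
    "latitude": [36.13, np.nan, 36.15],
    "chargelevel": [93.0, 35.0, np.nan],
})

# A bare dropna() removes a row if ANY column is null.
dropped_any = toy.dropna()

# With subset=, only nulls in chargelevel cause a row to be removed.
dropped_charge = toy.dropna(subset=["chargelevel"])

print(len(dropped_any), len(dropped_charge))
```

Naming the column keeps the cleanup step tied to the null count it is meant to fix.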


Count the rows where chargelevel is 0% and remove them.

In [17]:
(may['chargelevel'] == 0).sum()

1025190

In [19]:
may = may[may['chargelevel'] > 0]
(may['chargelevel'] == 0).sum()

0
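One way to check this step is to build the mask once and confirm that the row counts reconcile. If no charge levels are negative, removing 1025190 rows from 20283299 should leave 19258109. The sketch below uses a made-up toy frame; the names `toy`, `is_dead`, and `charged` are assumptions for illustration.

```python
import pandas as pd

# Toy charge levels; values are made up for illustration.
toy = pd.DataFrame({"chargelevel": [93.0, 0.0, 35.0, 0.0, 100.0]})

# Build the boolean mask once and reuse it for counting and filtering.
is_dead = toy["chargelevel"] == 0
n_dead = int(is_dead.sum())

# .copy() gives an independent frame, so adding columns later
# does not trigger a SettingWithCopyWarning.
charged = toy[~is_dead].copy()

# Rows before = rows after + rows removed.
assert len(charged) == len(toy) - n_dead
print(n_dead, len(charged))
```

Note that filtering on `> 0` would also drop any negative readings, while `~is_dead` drops only exact zeros; the two agree only when no negative values are present.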

Condense time periods to a frame of days.

In [None]:
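may['day'] = may['pubdatetime'].dt.day  # pubdatetime is already datetime64[ns]

# Keep days with at least one sumdid reading (count, since sumdid is a string column).
# Note that drop_duplicates here keeps one row per timestamp, not one per scooter.
may_by_day = (may.loc[may.groupby('day')['sumdid'].transform('count') >= 1]
              .sort_values('day')
              .drop_duplicates(subset=['pubdatetime']))

print(may_by_day.head(50))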
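If the goal is one row per scooter per day, a possible alternative is to de-duplicate on the scooter ID and the day together rather than on the timestamp. This is a minimal sketch under that assumption, using a made-up toy frame; `toy` and `first_per_day` are hypothetical names, and keeping the first reading of each day is one choice among several.

```python
import pandas as pd

# Toy readings: PoweredA is seen twice on May 1 and once on May 2,
# PoweredB once on May 1. Values are made up for illustration.
toy = pd.DataFrame({
    "pubdatetime": pd.to_datetime([
        "2019-05-01 00:01:41", "2019-05-01 17:05:36",
        "2019-05-02 08:00:00", "2019-05-01 00:01:41",
    ]),
    "sumdid": ["PoweredA", "PoweredA", "PoweredA", "PoweredB"],
    "latitude": [36.1368, 36.1504, 36.1504, 36.1912],
    "longitude": [-86.7998, -86.7843, -86.7843, -86.7729],
})

# Day of month is enough to identify a day within a single-month file.
toy["day"] = toy["pubdatetime"].dt.day

# Sort by time, then keep the first reading per (scooter, day) pair.
first_per_day = (
    toy.sort_values("pubdatetime")
       .drop_duplicates(subset=["sumdid", "day"], keep="first")
       .reset_index(drop=True)
)
print(first_per_day)
```

De-duplicating on `pubdatetime` alone would instead discard every other scooter that reported at the same timestamp.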

In [32]:
may_test = may_by_day[may_by_day.sumdid == 'PoweredLIRL1']
print(may_test)

                     pubdatetime   latitude  longitude        sumdid  \
0        2019-05-01 00:01:41.247  36.136822 -86.799877  PoweredLIRL1   
379776   2019-05-01 17:05:36.127  36.150405 -86.784339  PoweredLIRL1   
379706   2019-05-01 17:05:36.123  36.151842 -86.790645  PoweredLIRL1   
379913   2019-05-01 17:05:36.130  36.143596 -86.814203  PoweredLIRL1   
379947   2019-05-01 17:05:36.133  36.138810 -86.806151  PoweredLIRL1   
...                          ...        ...        ...           ...   
19627905 2019-05-31 08:41:15.153  36.157046 -86.770016  PoweredLIRL1   
19627884 2019-05-31 08:41:12.897  36.151746 -86.796175  PoweredLIRL1   
19628285 2019-05-31 08:41:38.920  36.127278 -86.789125  PoweredLIRL1   
19628309 2019-05-31 08:41:38.923  36.121319 -86.770356  PoweredLIRL1   
19628356 2019-05-31 08:42:26.423  36.011010 -86.684170  PoweredLIRL1   

          chargelevel  companyname  day  
0                93.0            0    1  
379776           38.0            3    1  
379706   

In [None]:
may[may.sumdid == 'PoweredLIRL1']
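For the fourth goal, finding scooters that were stationary for a long period, one possible approach is to round the coordinates so that GPS jitter does not count as movement, then flag scooters that report a single rounded position over a long enough span. The sketch below is only an illustration on made-up data: the 4-decimal rounding (roughly 11 m of latitude), the 3-day threshold `MIN_SPAN`, and the names `toy`, `summary`, and `stationary` are all assumptions.

```python
import pandas as pd

# Toy readings: PoweredA barely moves over 5 days, PoweredB travels.
toy = pd.DataFrame({
    "pubdatetime": pd.to_datetime([
        "2019-05-01 00:00", "2019-05-03 00:00", "2019-05-06 00:00",
        "2019-05-01 00:00", "2019-05-02 00:00", "2019-05-06 00:00",
    ]),
    "sumdid": ["PoweredA"] * 3 + ["PoweredB"] * 3,
    "latitude": [36.15000, 36.15001, 36.15000, 36.13, 36.16, 36.19],
    "longitude": [-86.78000, -86.78000, -86.78001, -86.80, -86.77, -86.75],
})

# Round coordinates so tiny GPS differences collapse to one position.
toy["lat_r"] = toy["latitude"].round(4)
toy["lon_r"] = toy["longitude"].round(4)

# First and last time each scooter was seen.
summary = toy.groupby("sumdid").agg(
    first_seen=("pubdatetime", "min"),
    last_seen=("pubdatetime", "max"),
)
summary["span"] = summary["last_seen"] - summary["first_seen"]

# Number of distinct rounded positions per scooter.
summary["n_positions"] = (
    toy.drop_duplicates(["sumdid", "lat_r", "lon_r"]).groupby("sumdid").size()
)

MIN_SPAN = pd.Timedelta(days=3)  # hypothetical threshold for "a long period"
stationary = summary[(summary["n_positions"] == 1) & (summary["span"] >= MIN_SPAN)]
print(stationary.index.tolist())
```

This only flags scooters that never moved during the whole window. Detecting a long idle stretch inside an otherwise active month would need a comparison of consecutive readings per scooter.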