### Data Cleaning

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. The time spent cleaning is vital: analyzing dirty data, or training a machine learning model on it, can lead you to draw inaccurate conclusions. In this course, you will learn how to identify, diagnose, and treat a variety of data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!

### 1. Common data problems 

- Inconsistent column names
- Missing data
- Outliers
- Duplicate rows
- Untidiness
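
As a rough illustration of how the first four problems can be detected with pandas, here is a minimal sketch on a tiny made-up DataFrame. The column names, the values, and the 1.5 × IQR outlier rule are assumptions chosen for the example; they are not taken from the ride sharing data, and untidiness is left out because it depends on the layout of each dataset.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame that exhibits several problems from the list above
df = pd.DataFrame({
    "Ride ID ": [1, 2, 2, 3, 4, 5],                       # capitals and a trailing space
    "duration_min": [12.0, 8.0, 8.0, np.nan, 11.0, 600.0],  # one missing value, one outlier
})

# Inconsistent column names: normalise to lower snake_case
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Missing data: count nulls per column
missing_per_column = df.isna().sum()

# Duplicate rows: count rows that repeat an earlier row exactly
duplicate_count = int(df.duplicated().sum())

# Outliers: flag values more than 1.5 * IQR outside the quartiles
q1, q3 = df["duration_min"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["duration_min"] < q1 - 1.5 * iqr) | (df["duration_min"] > q3 + 1.5 * iqr)]

print(missing_per_column)
print(duplicate_count)
print(outliers)
```

Each check only reports a problem; deciding whether to drop, fill, or correct the affected rows is a separate step.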

In [1]:
#Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random
from random import randint

import extra # Just a file containing useful lists

In [2]:
ride_sharing = pd.read_csv('../Datasets/ride_sharing_new.csv')
ride_sharing.head()

Unnamed: 0.1,Unnamed: 0,duration,station_A_id,station_A_name,station_B_id,station_B_name,bike_id,user_type,user_birth_year,user_gender
0,0,12 minutes,81,Berry St at 4th St,323,Broadway at Kearny,5480,2,1959,Male
1,1,24 minutes,3,Powell St BART Station (Market St at 4th St),118,Eureka Valley Recreation Center,5193,2,1965,Male
2,2,8 minutes,67,San Francisco Caltrain Station 2 (Townsend St...,23,The Embarcadero at Steuart St,3652,3,1993,Male
3,3,4 minutes,16,Steuart St at Market St,28,The Embarcadero at Bryant St,1883,1,1979,Male
4,4,11 minutes,22,Howard St at Beale St,350,8th St at Brannan St,4626,2,1994,Male


### Numeric data or ... ?
You'll be working with a DataFrame called ride_sharing that holds bicycle ride sharing data from San Francisco. It contains information on the start and end stations, the trip duration, and some user information for a bike sharing service.

The user_type column records whether a user is riding for free or paying, and takes on the following values:

- 1 for free riders.
- 2 for pay per ride.
- 3 for monthly subscribers.

In this exercise, you will print the information of ride_sharing using .info() and see firsthand how an incorrect data type can distort your analysis of the dataset. The pandas package is imported as pd.

In [3]:
# Print the information of ride_sharing
print(ride_sharing.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25760 entries, 0 to 25759
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Unnamed: 0       25760 non-null  int64 
 1   duration         25760 non-null  object
 2   station_A_id     25760 non-null  int64 
 3   station_A_name   25760 non-null  object
 4   station_B_id     25760 non-null  int64 
 5   station_B_name   25760 non-null  object
 6   bike_id          25760 non-null  int64 
 7   user_type        25760 non-null  int64 
 8   user_birth_year  25760 non-null  int64 
 9   user_gender      25760 non-null  object
dtypes: int64(6), object(4)
memory usage: 2.0+ MB
None


In [4]:
# Print summary statistics of user_type column
print(ride_sharing['user_type'].describe())

count    25760.000000
mean         2.008385
std          0.704541
min          1.000000
25%          2.000000
50%          2.000000
75%          3.000000
max          3.000000
Name: user_type, dtype: float64


In [5]:
# Convert user_type from integer to category
ride_sharing['user_type_cat'] = ride_sharing['user_type'].astype('category')

In [6]:
# Write an assert statement confirming the change
assert ride_sharing['user_type_cat'].dtype == 'category'


In [7]:
# Print new summary statistics 
print(ride_sharing['user_type_cat'].describe())

count     25760
unique        3
top           2
freq      12972
Name: user_type_cat, dtype: int64
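
Once the column is categorical, the numeric codes can also be swapped for readable labels. The sketch below uses a small made-up Series as a stand-in for ride_sharing['user_type']; the label wording is an assumption derived from the value descriptions given earlier.

```python
import pandas as pd

# Small stand-in for ride_sharing['user_type']; the values are made up
user_type = pd.Series([2, 2, 3, 1, 2], name="user_type")

# Hypothetical labels based on the value descriptions in the text
labels = {1: "free rider", 2: "pay per ride", 3: "monthly subscriber"}

# Convert to category, then replace the numeric codes with the labels
user_type_cat = user_type.astype("category").cat.rename_categories(labels)

# A frequency table is the natural summary for a categorical column
print(user_type_cat.value_counts())
```

With labels in place, a mean or standard deviation of user_type is no longer even computable, which makes the intended interpretation of the column explicit.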


### Summing strings and concatenating numbers
In the previous exercise, you were able to identify that category is the correct data type for user_type and convert it in order to extract relevant statistical summaries that shed light on the distribution of user_type.

Another common data type problem is importing what should be numerical values as strings. Mathematical operations such as summing and multiplication then produce string concatenation or repetition instead of numerical results.
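
A small sketch of this behaviour, using made-up values rather than the ride_sharing data (the object dtype is set explicitly so the values are handled as plain Python strings):

```python
import pandas as pd

# Made-up durations stored as strings, as they might arrive from a CSV
durations_as_text = pd.Series(["12", "24", "8"], dtype=object)

# Summing strings concatenates them
text_total = durations_as_text.sum()

# Multiplying a string repeats it
text_doubled = durations_as_text * 2

# After conversion to int, the same sum adds the values
numeric_total = durations_as_text.astype("int").sum()

print(repr(text_total))       # '12248'
print(text_doubled.tolist())  # ['1212', '2424', '88']
print(numeric_total)          # 44
```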

In this exercise, you'll convert the string column duration to the type int. First, however, you need to strip "minutes" from the column so that pandas can read the values as numbers.

In [8]:
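# Strip duration of minutes. Note that str.strip('minutes') removes the characters
# m, i, n, u, t, e, s from both ends, so the number is left with a trailing space
ride_sharing['duration_trim'] = ride_sharing['duration'].str.strip('minutes')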

# Convert duration to integer
ride_sharing['duration_time'] = ride_sharing['duration_trim'].astype('int')

# Write an assert statement making sure of conversion
assert ride_sharing['duration_time'].dtype == 'int'

# Print the new columns and calculate the average ride duration
print(ride_sharing[['duration','duration_trim','duration_time']])
print('Average ride sharing duration time is {:.2f}'.format(ride_sharing['duration_time'].mean()))

         duration duration_trim  duration_time
0      12 minutes           12              12
1      24 minutes           24              24
2       8 minutes            8               8
3       4 minutes            4               4
4      11 minutes           11              11
...           ...           ...            ...
25755  11 minutes           11              11
25756  10 minutes           10              10
25757  14 minutes           14              14
25758  14 minutes           14              14
25759  29 minutes           29              29

[25760 rows x 3 columns]
Average ride sharing duration time is 11.39
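
The approach above works because str.strip treats its argument as a set of characters, none of which is a digit, and because the int conversion tolerates the space that is left behind. As an alternative sketch, assuming every value starts with the number, the digits can be captured explicitly with a regular expression. The sample values below are made up.

```python
import pandas as pd

# Made-up sample in the same "<number> minutes" shape as the duration column
duration = pd.Series(["12 minutes", "24 minutes", "8 minutes"])

# strip() removes any of the characters m, i, n, u, t, e, s from both ends,
# so the space after the number survives
trimmed = duration.str.strip("minutes")

# Alternative: capture the leading digits, then convert to int
minutes = duration.str.extract(r"^(\d+)", expand=False).astype("int")

print(trimmed.tolist())
print(minutes.tolist())
```

The regular expression states the intent directly (take the leading number), which can make unexpected formats easier to spot than a character-set strip.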


In [9]:
# Create a random tire size (26 to 29 inclusive) for each ride in the dataset
tire_sizes = []
for s in range(len(ride_sharing)):
    n = random.randint(26, 29)
    tire_sizes.append(n)

# Create a tire_sizes column in the dataset
ride_sharing['tire_sizes'] = tire_sizes

In [10]:
ride_sharing.head()

Unnamed: 0.1,Unnamed: 0,duration,station_A_id,station_A_name,station_B_id,station_B_name,bike_id,user_type,user_birth_year,user_gender,user_type_cat,duration_trim,duration_time,tire_sizes
0,0,12 minutes,81,Berry St at 4th St,323,Broadway at Kearny,5480,2,1959,Male,2,12,12,28
1,1,24 minutes,3,Powell St BART Station (Market St at 4th St),118,Eureka Valley Recreation Center,5193,2,1965,Male,2,24,24,27
2,2,8 minutes,67,San Francisco Caltrain Station 2 (Townsend St...,23,The Embarcadero at Steuart St,3652,3,1993,Male,3,8,8,27
3,3,4 minutes,16,Steuart St at Market St,28,The Embarcadero at Bryant St,1883,1,1979,Male,1,4,4,28
4,4,11 minutes,22,Howard St at Beale St,350,8th St at Brannan St,4626,2,1994,Male,2,11,11,28


In [11]:
ride_sharing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25760 entries, 0 to 25759
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   Unnamed: 0       25760 non-null  int64   
 1   duration         25760 non-null  object  
 2   station_A_id     25760 non-null  int64   
 3   station_A_name   25760 non-null  object  
 4   station_B_id     25760 non-null  int64   
 5   station_B_name   25760 non-null  object  
 6   bike_id          25760 non-null  int64   
 7   user_type        25760 non-null  int64   
 8   user_birth_year  25760 non-null  int64   
 9   user_gender      25760 non-null  object  
 10  user_type_cat    25760 non-null  category
 11  duration_trim    25760 non-null  object  
 12  duration_time    25760 non-null  int64   
 13  tire_sizes       25760 non-null  int64   
dtypes: category(1), int64(8), object(5)
memory usage: 2.6+ MB


In [12]:
# Change the data type of tire_sizes from integer to category
ride_sharing['tire_sizes'] = ride_sharing['tire_sizes'].astype('category')

In [13]:
#Checking if the data type change really worked
assert ride_sharing['tire_sizes'].dtype == 'category'

### Tire size constraints
In this lesson, you're going to build on top of the work you've been doing with the ride_sharing DataFrame. You'll be working with the tire_sizes column which contains data on each bike's tire size.

Bicycle tire sizes could be either 26″, 27″ or 29″ (the randomly generated column in this notebook also contains 28″) and are here correctly stored as a categorical value. In an effort to cut maintenance costs, the ride sharing provider decided to set the maximum tire size to 27″.

In this exercise, you will make sure the tire_sizes column has the correct range by first converting it to an integer, then setting and testing the new upper limit of 27″ for tire sizes.

In [14]:
# Convert tire_sizes to integer
ride_sharing['tire_sizes'] = ride_sharing['tire_sizes'].astype('int')

# Set all values above 27 to 27
ride_sharing.loc[ride_sharing['tire_sizes'] > 27, 'tire_sizes'] = 27

# Test the new upper limit: no value should exceed 27
assert ride_sharing['tire_sizes'].max() <= 27

# Reconvert tire_sizes back to categorical
ride_sharing['tire_sizes'] = ride_sharing['tire_sizes'].astype('category')

# Print tire size description
print(ride_sharing['tire_sizes'].describe())

count     25760
unique        2
top          27
freq      19293
Name: tire_sizes, dtype: int64
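
The same upper limit can also be applied with Series.clip, which caps values in a single call. This is an alternative sketch on made-up data, not the method used in the cell above.

```python
import pandas as pd

# Made-up tire sizes standing in for ride_sharing['tire_sizes']
tire_sizes = pd.Series([26, 27, 28, 29, 27], dtype="category")

# An unordered categorical cannot be compared with >, so convert to int first
as_int = tire_sizes.astype("int")

# clip(upper=27) caps every larger value at 27
capped = as_int.clip(upper=27)

# Verify the constraint, then restore the categorical dtype
assert capped.max() <= 27
tire_sizes = capped.astype("category")

print(tire_sizes.cat.categories.tolist())
```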


In [15]:
ride_sharing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25760 entries, 0 to 25759
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   Unnamed: 0       25760 non-null  int64   
 1   duration         25760 non-null  object  
 2   station_A_id     25760 non-null  int64   
 3   station_A_name   25760 non-null  object  
 4   station_B_id     25760 non-null  int64   
 5   station_B_name   25760 non-null  object  
 6   bike_id          25760 non-null  int64   
 7   user_type        25760 non-null  int64   
 8   user_birth_year  25760 non-null  int64   
 9   user_gender      25760 non-null  object  
 10  user_type_cat    25760 non-null  category
 11  duration_trim    25760 non-null  object  
 12  duration_time    25760 non-null  int64   
 13  tire_sizes       25760 non-null  category
dtypes: category(2), int64(7), object(5)
memory usage: 2.4+ MB


In [16]:
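# Add a date column to the DataFrame; it is needed for the next exercise
import random
from datetime import datetime, timedelta

min_year = 2017
max_year = datetime.now().year

start = datetime(min_year, 1, 1)
# The + 2 stretches the range to about the end of next year, so some dates lie in the future
years = max_year - min_year + 2
end = start + timedelta(days=365 * years)

# Draw one random timestamp per row, printing each one as it is generated
random_dates = []
for i in range(len(ride_sharing)):
    random_date = start + (end - start) * random.random()
    random_dates.append(random_date)
    print(random_date)

# Store the dates; the column name ride_date is an assumption, the original cell only printed them
ride_sharing['ride_date'] = random_dates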

2021-05-07 23:24:00.083218
2022-04-16 05:00:00.201996
2017-11-23 04:48:44.005188
2019-01-21 19:31:21.302395
2019-03-24 19:07:13.663301
2021-11-04 19:04:07.779551
2022-12-19 16:41:38.935018
2019-08-21 19:09:29.812937
2018-12-02 19:43:13.309578
2018-05-31 22:16:28.137907
2019-11-20 06:47:08.941628
2020-09-29 01:34:58.568963
2020-05-03 18:46:10.533783
2017-09-17 03:21:31.109947
2020-10-11 17:45:37.301410
2022-05-11 03:25:16.854224
2020-11-07 15:04:51.104505
2021-05-27 04:47:04.134832
2022-11-02 04:17:31.379734
2020-01-31 21:38:00.899443
2017-01-10 02:15:15.422004
2021-08-12 12:03:53.127342
2021-09-17 15:08:38.578344
2017-08-09 05:09:59.428245
2022-03-22 09:36:12.589454
2018-06-11 10:31:57.148270
2018-05-04 14:44:25.928051
2018-11-17 05:48:13.489546
2020-04-08 00:04:53.914168
2018-10-29 02:39:53.494788
2017-03-23 07:13:30.585336
2019-11-05 14:22:36.633195
2018-02-02 23:41:05.928656
2019-04-16 18:33:33.848872
2017-05-11 08:04:02.253753
2020-08-24 17:27:47.610474
2017-07-18 08:10:02.816911
2

2019-12-07 07:30:11.478242
2018-08-06 11:32:47.222450
2021-09-03 18:33:01.138422
2022-08-19 09:20:12.341421
2018-05-25 01:39:27.114212
2019-08-13 09:04:27.730480
2019-02-20 22:21:46.184158
2018-10-20 20:08:40.373233
2019-06-19 22:25:56.358838
2021-02-01 07:33:00.373370
2020-01-21 14:43:17.815598
2017-04-03 16:58:02.676771
2022-12-09 21:08:04.317353
2017-03-13 20:40:11.011422
2022-10-18 09:50:36.182678
2017-12-02 09:58:14.399497
2022-03-06 21:41:22.228261
2019-09-23 04:00:50.072359
2020-04-02 08:21:33.889276
2017-05-25 20:19:51.455307
2022-03-13 05:25:36.021489
2022-10-26 07:19:46.786992
2021-07-02 14:42:06.658892
2020-12-19 19:57:00.559101
2019-03-08 05:46:46.812020
2021-01-06 21:17:41.217734
2021-04-27 05:26:46.610960
2018-07-20 23:48:00.520624
2021-03-05 21:32:33.922380
2021-03-08 22:09:41.568220
2018-02-08 21:08:13.413098
2018-08-10 20:23:47.068279
2021-10-09 17:35:28.587235
2019-07-25 19:38:56.075684
2017-01-23 18:12:01.198177
2021-06-29 22:39:53.276155
2022-11-02 15:25:28.884217
2

2021-02-12 05:03:30.017819
2022-07-11 14:10:00.228227
2019-03-15 12:01:48.565047
2022-01-27 00:52:29.211832
2017-07-29 18:38:32.155206
2019-04-16 23:29:06.688453
2021-09-20 07:45:41.609397
2020-08-01 18:39:22.697326
2017-03-24 20:19:29.781997
2019-07-01 04:58:34.329476
2020-04-14 04:37:51.904312
2020-03-03 13:16:25.076785
2021-07-07 19:04:42.133742
2021-12-06 20:36:08.286402
2022-10-03 00:47:22.592403
2021-07-27 06:06:00.773728
2018-11-10 21:53:33.242451
2020-02-16 23:20:16.798646
2021-12-20 07:10:23.657243
2019-09-05 05:17:43.236208
2018-03-14 01:41:14.309632
2018-11-05 05:48:11.386585
2018-06-10 12:07:08.745207
2020-02-03 19:02:44.456087
2021-08-20 16:17:18.369276
2021-09-12 15:14:16.124556
2018-09-07 02:51:26.461955
2021-09-24 14:03:01.587489
2017-08-13 14:33:00.512665
2020-11-18 05:39:16.481769
2020-04-09 02:55:24.720946
2018-10-16 01:07:53.306121
2020-07-30 18:05:44.599086
2021-03-19 04:18:13.442255
2022-05-28 09:58:30.754561
2017-10-25 12:31:59.175091
2020-03-10 15:28:36.601548
2

2019-11-06 13:04:42.353176
2017-01-22 13:29:21.995042
2017-09-04 01:30:13.288019
2020-09-10 16:51:15.587025
2019-08-17 17:43:09.775271
2020-09-29 09:23:45.095227
2021-10-14 01:23:21.621657
2019-08-08 15:02:49.216335
2022-09-02 10:57:15.568686
2021-02-11 07:22:03.656960
2018-09-03 00:10:40.012351
2018-04-25 03:48:39.196364
2019-10-08 02:06:19.431557
2018-12-24 06:45:13.140914
2017-05-12 15:55:46.223170
2021-02-10 21:15:09.234864
2020-07-18 12:24:45.706005
2017-09-14 17:22:36.559878
2019-01-31 20:44:34.385530
2017-02-15 06:42:39.249095
2021-08-08 09:07:58.550045
2017-09-18 04:37:49.935044
2022-09-30 00:01:49.209281
2019-03-23 17:27:04.896832
2022-09-25 22:12:29.709712
2022-02-04 14:19:56.042885
2022-01-05 05:17:17.475264
2021-12-24 11:45:47.753734
2022-09-21 04:07:54.176219
2017-05-20 21:36:16.820958
2020-03-21 00:34:22.073154
2022-07-16 11:22:23.089420
2019-02-15 15:28:12.869067
2018-02-11 19:27:27.874084
2022-03-16 12:23:20.962170
2017-10-31 06:24:45.053942
2017-02-01 11:20:33.603581
2

2020-06-26 01:59:40.041568
2020-08-19 04:04:36.505305
2021-10-31 16:58:40.925428
2019-11-02 12:18:26.645598
2018-02-24 07:50:01.104552
2022-05-17 11:33:47.674202
2022-12-27 20:06:02.799083
2020-12-16 20:23:38.754765
2020-10-02 02:46:49.372598
2017-09-08 00:57:06.742421
2017-06-18 05:19:10.061600
2018-12-18 01:25:44.279052
2020-03-11 19:49:13.188899
2022-12-21 07:30:00.038711
2019-04-04 15:04:50.569440
2021-01-07 23:32:31.966137
2018-09-20 10:45:21.472440
2022-02-10 05:08:40.665926
2018-06-03 18:36:13.040932
2020-09-20 12:25:05.619706
2019-12-30 14:31:42.498712
2018-06-26 11:40:34.151360
2022-12-21 06:21:24.825485
2018-03-07 18:37:08.297322
2018-03-10 19:19:03.195480
2022-01-14 03:57:26.139834
2020-10-20 06:11:07.977541
2017-01-14 19:03:28.670648
2019-02-10 05:08:55.401933
2018-05-12 11:50:38.711473
2018-04-22 15:07:56.920503
2019-07-11 15:04:28.433245
2017-07-29 13:49:08.682685
2018-04-18 21:34:40.011664
2017-08-24 05:03:05.701539
2022-04-19 13:15:38.728814
2020-09-13 07:05:03.514826
2

2018-12-14 16:11:38.205659
2019-02-24 06:17:32.972769
2020-09-22 00:19:43.333375
2020-11-22 15:24:30.411982
2019-03-08 22:44:23.508492
2018-11-10 17:47:32.054264
2019-06-15 04:34:20.554103
2019-07-04 18:16:38.052606
2022-09-29 11:40:23.162083
2017-08-04 00:04:22.432263
2020-03-06 01:50:58.578079
2018-04-23 05:58:17.311746
2020-05-05 15:51:38.858347
2021-04-25 20:42:31.623997
2021-10-08 11:43:22.279469
2020-05-03 17:08:56.387615
2019-06-01 20:10:10.606431
2018-12-31 15:09:54.966381
2017-09-07 06:44:47.306412
2019-08-22 05:54:22.407198
2018-09-30 03:34:58.121887
2022-03-16 21:19:41.672120
2018-08-31 12:23:06.088993
2018-02-28 01:56:07.119320
2022-06-14 12:43:50.429414
2020-11-15 09:05:21.213861
2020-10-15 02:19:32.280823
2020-06-27 18:37:32.449256
2019-01-24 07:03:52.358411
2021-05-16 17:32:23.059091
2019-09-26 04:36:24.473135
2019-04-20 21:53:14.415117
2022-03-11 02:40:43.213192
2021-09-27 17:23:35.366382
2020-06-16 14:44:04.401931
2021-10-25 14:58:13.562865
2017-11-11 15:46:50.002718
2

2022-09-04 03:07:01.800977
2021-12-10 10:23:03.102420
2021-04-04 06:36:01.669337
2022-04-06 12:28:18.180477
2021-05-02 14:52:06.535280
2022-09-08 11:59:05.721785
2019-05-23 06:27:13.109697
2019-10-02 05:08:34.739276
2020-10-09 18:01:47.013326
2019-04-25 07:17:17.163782
2017-02-13 14:57:45.093188
2021-06-29 09:15:54.452141
2017-10-25 09:24:47.409319
2018-05-14 20:51:15.499418
2020-07-14 13:56:25.066009
2022-01-26 21:37:10.288376
2020-08-09 22:42:50.292288
2021-05-05 06:18:38.211367
2021-08-20 01:27:46.146277
2019-01-10 05:27:41.128184
2019-09-04 22:40:19.301521
2020-07-26 19:45:59.105272
2019-10-13 04:12:18.037776
2021-06-05 14:28:18.086129
2020-10-23 16:11:41.260606
2020-01-05 20:39:09.898007
2022-05-01 12:23:39.657501
2018-07-14 15:14:26.013273
2018-06-02 11:33:26.843949
2022-10-26 08:31:58.322333
2020-05-30 06:26:36.888427
2017-02-27 20:34:01.846447
2017-07-28 18:55:31.200301
2017-09-04 22:33:20.905044
2020-10-08 18:31:28.876396
2020-03-21 16:23:19.781471
2020-05-27 23:07:47.130252
2

2018-08-31 22:44:42.708227
2018-01-17 09:28:13.034094
2019-02-14 23:53:53.120186
2018-12-30 02:41:41.830363
2021-11-27 13:29:27.595773
2021-11-12 14:19:15.904508
2019-03-29 20:04:07.767253
2020-12-27 22:34:30.635995
2018-04-30 13:43:49.379094
2022-02-03 02:16:06.182075
2017-11-18 19:04:03.315840
2017-11-11 15:20:17.287766
2022-07-25 10:28:28.354354
2018-11-13 09:08:31.193416
2022-08-30 11:01:08.087934
2018-04-22 16:46:07.913894
2017-01-07 07:08:14.763854
2019-11-27 12:53:48.139318
2022-09-06 08:39:29.765787
2022-03-24 08:04:18.160528
2020-02-20 21:52:51.625167
2019-06-18 03:04:40.141979
2018-02-25 00:14:02.109755
2020-10-23 10:09:58.351827
2022-04-01 06:21:17.454811
2022-05-30 15:30:22.666179
2019-05-11 15:22:37.589797
2017-12-31 16:16:31.828698
2018-05-30 10:10:40.770618
2018-03-12 11:53:44.817958
2020-02-25 01:00:53.435118
2019-07-21 11:20:30.300487
2017-03-23 19:40:08.508275
2021-09-26 15:25:15.573353
2022-04-03 06:35:29.501450
2020-01-11 19:37:54.745510
2021-09-11 06:52:03.424433
2

2022-03-16 04:01:04.780781
2020-07-29 18:45:36.973300
2022-10-18 16:30:43.561284
2018-02-16 14:50:19.669742
2020-12-03 10:32:55.813339
2019-11-19 17:44:05.414212
2017-08-08 02:15:07.072572
2017-06-10 17:08:27.347320
2022-08-11 20:06:07.570510
2018-11-18 05:48:33.807507
2021-01-17 13:20:40.932871
2020-08-09 23:18:28.020981
2017-07-03 11:33:07.633891
2017-10-13 15:48:39.001428
2017-01-19 11:46:06.151489
2018-06-22 11:00:31.317104
2021-05-11 06:10:13.390460
2017-08-28 20:15:10.562610
2019-03-08 21:02:02.259549
2022-01-14 08:25:24.320613
2020-11-17 15:55:45.818278
2017-02-11 12:15:06.340296
2022-10-10 13:43:49.446568
2022-01-05 22:00:34.980513
2021-11-15 11:17:45.470631
2019-12-05 13:02:10.449418
2020-04-05 11:45:46.943125
2019-02-02 09:13:26.419073
2022-03-19 11:06:50.926181
2022-09-18 18:41:38.564506
2018-09-25 12:48:42.928283
2020-06-03 22:10:54.940608
2018-01-25 20:12:30.881336
2020-10-05 08:53:56.042397
2021-07-26 01:36:39.239102
2022-09-18 22:01:27.673010
2019-12-17 15:25:02.461004
2

2020-09-02 17:51:18.010723
2022-03-14 20:20:26.754874
2021-12-24 22:34:04.753845
2022-12-12 21:10:12.920879
2018-10-18 05:41:05.646834
2020-09-05 05:32:13.646813
2020-11-17 14:14:01.268820
2020-03-20 01:51:31.666662
2020-04-27 14:21:43.922702
2018-09-10 03:43:06.518005
2020-05-23 08:15:23.241663
2022-03-18 11:07:38.896012
2020-09-30 16:36:48.777333
2017-03-17 12:39:19.986238
2022-05-07 13:25:20.503256
2019-04-07 13:12:20.328627
2017-03-01 22:37:18.269565
2017-07-24 04:50:13.254201
2020-07-07 17:14:48.080203
2017-05-12 12:33:52.099022
2018-05-20 03:28:09.047815
2021-08-23 20:08:08.886475
2017-08-08 18:18:35.749751
2022-01-28 21:13:34.424870
2017-03-25 11:21:02.709290
2019-10-16 22:23:38.004149
2022-09-07 00:51:44.869067
2017-04-22 07:59:33.317785
2019-10-08 03:19:03.290967
2019-01-28 02:27:38.677435
2019-05-21 04:08:32.646675
2022-03-29 12:16:00.316504
2017-12-27 20:31:58.791803
2022-08-18 08:25:27.825050
2017-05-11 12:36:54.957083
2021-03-13 09:05:55.178270
2018-06-02 09:34:28.679545
2

2018-01-02 05:13:27.233811
2022-09-03 19:26:32.714930
2021-10-07 23:21:34.110955
2018-07-28 00:45:51.155117
2018-11-13 02:35:40.989463
2021-03-31 06:13:37.026870
2022-09-22 03:44:56.834088
2018-03-01 18:02:27.590356
2022-12-14 03:23:35.601092
2018-01-16 02:11:45.907270
2020-06-19 19:32:54.350295
2019-05-21 06:39:16.207701
2017-08-24 00:17:28.119154
2020-04-13 07:52:53.168103
2021-11-23 23:56:19.108415
2019-04-29 12:37:00.545047
2020-07-17 02:11:05.019830
2022-12-27 04:39:33.768783
2019-10-10 15:09:44.029060
2019-01-12 15:28:29.119262
2017-09-16 20:48:51.154115
2019-09-15 09:19:03.459011
2017-07-18 15:20:36.539730
2017-09-09 17:29:07.174224
2022-01-11 19:32:08.463938
2018-12-24 02:27:38.465290
2021-09-02 19:10:29.038357
2019-10-15 00:04:59.930786
2019-06-23 18:33:06.637731
2018-01-13 02:11:56.872272
2019-07-12 16:24:46.472870
2018-10-23 12:51:24.522682
2022-01-14 11:18:49.061243
2020-05-26 09:49:05.753446
2017-01-08 13:04:49.025542
2017-06-27 13:14:46.062867
2018-08-31 17:04:06.705395
2

2018-08-31 18:19:42.940219
2022-04-10 13:12:35.954431
2018-07-23 23:27:40.969333
2018-07-18 00:25:06.615615
2019-06-07 04:23:49.647298
2017-10-10 05:47:25.228485
2020-02-04 03:52:40.672718
2021-01-30 22:36:31.056595
2017-05-30 06:16:11.791010
2019-01-19 22:11:58.686506
2021-01-22 02:08:47.187881
2021-09-15 15:34:38.935323
2018-02-16 04:22:21.336937
2021-08-06 03:53:44.303662
2018-11-19 19:12:39.929258
2021-03-20 06:11:06.109513
2021-11-14 00:20:28.867036
2018-07-11 18:54:07.255846
2019-01-14 12:40:00.652422
2018-12-23 04:58:15.683713
2022-02-10 09:39:12.876678
2017-08-07 04:46:08.720388
2020-09-25 13:27:07.500168
2021-06-23 04:12:21.416831
2019-02-18 07:05:27.202190
2020-07-04 14:03:29.707744
2020-07-01 16:52:18.514441
2017-07-24 23:25:33.602397
2018-07-02 04:46:10.127362
2022-06-29 19:43:47.467085
2019-10-05 00:10:08.024229
2021-07-23 05:40:26.082477
2017-04-16 01:24:09.620355
2017-01-17 05:38:30.089729
2020-07-30 01:09:13.398489
2018-09-01 08:52:00.940208
2018-11-29 08:39:46.956919
2

2022-10-05 22:12:11.827476
2018-08-23 00:28:31.732560
2020-08-27 11:12:46.661747
2017-02-08 16:52:55.675535
2017-10-07 21:02:23.961005
2022-08-25 11:02:03.034979
2017-02-13 09:37:58.239410
2021-09-19 15:44:32.924549
2017-01-25 11:32:24.265543
2018-10-13 06:45:34.460140
2018-07-16 11:22:16.011361
2017-02-11 00:14:43.053136
2022-12-10 07:04:11.175868
2021-07-06 19:39:29.363903
2018-04-15 12:06:26.561983
2019-05-15 21:30:19.846958
2022-02-13 05:49:05.505834
2020-12-20 02:14:33.244344
2022-02-13 01:48:38.588494
2017-09-20 15:37:09.268667
2021-11-27 15:43:27.065218
2022-03-03 15:47:56.429467
2018-08-07 19:07:55.780818
2022-03-15 13:47:13.643091
2018-02-15 12:30:41.680689
2018-01-14 19:03:00.742400
2017-03-17 09:44:25.462385
2017-05-15 19:31:02.614269
2021-02-21 11:38:08.500144
2022-11-04 19:36:54.692214
2017-01-11 02:01:42.806599
2021-02-17 04:32:53.952383
2019-10-04 09:56:02.942652
2017-05-16 02:57:24.300384
2017-05-06 08:10:35.314532
2018-01-18 15:49:26.472897
2018-06-16 09:25:52.089904
2

2019-06-17 09:01:07.916582
2020-05-30 17:38:57.474728
2019-10-15 07:13:12.930188
2017-10-12 11:20:38.686230
2020-12-21 08:33:43.151470
2022-05-29 18:55:41.317197
2022-09-22 00:35:05.198645
2020-11-12 14:57:25.236374
2022-04-30 19:22:34.490854
2021-08-14 12:28:37.356209
2021-06-02 15:23:34.826929
2021-02-26 22:03:46.685783
2019-04-14 23:29:30.304227
2021-01-12 12:13:19.915223
2022-06-13 08:13:11.074891
2019-02-02 02:27:50.790641
2021-02-24 22:46:42.280053
2017-06-19 05:38:18.355763
2021-10-03 18:33:03.094930
2018-06-29 20:59:49.052312
2021-11-11 09:13:02.016493
2019-07-09 00:14:01.233381
2022-06-28 21:02:11.143266
2018-03-20 19:43:23.766174
2020-10-01 16:15:29.863027
2018-12-23 14:46:28.102685
2022-05-16 10:32:54.161241
2018-04-05 12:07:08.485366
2021-10-04 23:06:22.507183
2018-10-21 20:21:35.505301
2020-01-14 19:11:08.290540
2019-03-25 08:23:27.424324
2020-08-21 10:26:39.410141
2019-09-29 20:39:52.627773
2021-11-20 17:25:48.717502
2022-03-30 08:00:07.567016
2021-05-03 10:39:46.668641
2

2019-03-01 09:31:28.611328
2021-01-21 13:44:05.617703
2017-04-05 04:53:17.788595
2018-01-30 02:55:10.373458
2019-06-11 06:39:57.957546
2020-06-27 23:38:13.336916
2018-06-10 20:47:37.687993
2017-06-15 01:26:45.930260
2019-08-12 08:25:24.834037
2020-06-23 10:02:40.954867
2017-11-18 18:24:13.879570
2020-06-05 02:04:07.599512
2022-08-04 17:38:11.384421
2017-08-05 22:02:09.648512
2022-10-18 13:41:16.100970
2022-04-08 14:19:59.754703
2019-03-01 12:02:11.476327
2020-12-22 16:26:51.062798
2020-06-11 08:03:50.220216
2017-10-15 04:15:50.556469
2021-07-01 13:38:34.259723
2018-09-12 16:06:12.590445
2018-08-13 13:41:28.743896
2018-01-09 15:46:09.208134
2017-11-06 05:39:49.402965
2017-10-08 17:32:33.152487
2020-08-15 16:35:21.199194
2019-10-04 12:59:27.062539
2019-10-04 19:52:47.086126
2020-05-22 03:35:04.801117
2020-09-22 08:34:25.211013
2019-03-06 00:57:37.563632
2019-09-15 14:50:34.104874
2020-02-25 04:09:59.671107
2020-04-26 19:39:06.953551
2022-11-16 01:19:55.745085
2020-05-05 01:43:40.905613
2

2017-03-06 11:14:52.110187
2017-09-07 06:21:57.258020
2018-11-05 01:31:53.051340
2017-11-23 11:50:27.345748
2018-03-03 10:40:28.687258
2019-06-07 13:23:11.015011
2019-10-25 06:15:21.637151
2017-12-18 16:59:44.974908
2017-11-03 21:28:28.102581
2019-09-06 02:45:21.841406
2017-10-15 12:59:44.378362
2022-12-29 06:58:43.012994
2022-05-29 13:20:05.406254
2017-01-19 11:32:55.860382
2019-07-27 13:07:21.279586
2017-09-30 18:56:44.944922
2018-12-13 07:12:26.749206
2022-11-03 22:09:31.524237
2021-05-02 06:10:43.170747
2019-12-31 22:03:55.011741
2020-07-01 02:43:53.477418
2021-07-14 19:17:29.043081
2022-11-24 03:55:09.276509
2021-04-10 00:56:29.093281
2017-02-19 21:52:10.071705
2017-09-25 13:40:39.426654
2020-05-31 02:08:35.201867
2020-08-28 07:30:45.572744
2019-12-15 01:11:51.591012
2020-04-29 19:15:29.890871
2021-10-01 15:50:13.786238
2018-02-15 16:25:53.896501
2019-12-16 22:38:46.514070
2020-02-09 18:16:40.531744
2021-01-26 18:52:25.591627
2022-03-04 10:41:45.054352
2019-08-08 15:08:06.411562
2

2021-06-21 22:44:51.631214
2021-08-26 08:21:48.817953
2022-02-19 13:28:21.763497
2021-06-01 03:40:43.176257
2018-11-07 11:02:26.659731
2019-07-12 06:50:12.158114
2017-06-17 17:32:41.740084
2020-02-29 01:11:26.496979
2021-04-10 22:46:25.442583
2021-01-01 16:15:40.855983
2021-01-13 14:42:49.114757
2017-06-26 13:16:25.362122
2019-03-05 16:50:06.556801
2021-01-17 10:54:04.084680
2019-10-31 23:09:14.386244
2019-03-10 00:47:10.116855
2022-11-17 00:10:19.532143
2017-09-17 22:18:18.746299
2020-05-23 16:46:53.597702
2022-07-12 11:24:36.974712
2017-02-21 23:12:40.695461
2019-09-29 04:18:24.792864
2018-03-05 13:35:47.092185
2021-10-18 11:45:37.986952
2018-09-28 05:51:33.929889
2021-01-04 14:35:03.655060
2017-12-05 01:58:27.919017
2020-08-12 02:49:01.414699
2019-08-26 13:43:49.543948
2019-11-23 12:27:00.295411
2020-05-20 05:09:21.962713
2021-12-17 19:05:05.229843
2022-05-02 14:47:32.134419
2019-08-06 17:35:08.499228
2017-10-03 08:05:16.233868
2022-05-27 04:42:49.542598
2019-01-16 02:10:35.701137
2

2020-05-20 10:45:07.979974
2020-09-12 18:46:42.528009
2022-11-21 18:19:54.114872
2019-01-02 21:37:07.446989
2020-10-15 09:32:56.295317
2019-05-23 16:50:23.969077
2017-04-15 04:43:46.556501
2022-06-22 19:12:31.224451
2019-09-08 01:26:49.055840
2018-08-24 04:24:51.745204
2017-01-05 15:30:45.872902
2018-10-18 12:07:25.586310
2022-11-28 05:57:39.807012
2022-01-10 02:35:35.907400
2021-02-25 06:28:08.518976
2020-04-21 14:45:49.429051
2018-06-07 22:39:43.708034
2017-07-30 02:37:56.778708
2018-01-12 19:41:13.469742
2021-07-27 12:21:01.088363
2019-01-01 02:24:01.190500
2021-08-13 21:23:38.109358
2021-11-12 08:26:30.706964
2018-09-03 14:42:08.159424
2020-12-20 11:51:16.256555
2020-07-29 14:24:36.704864
2022-02-26 06:44:08.597055
2017-01-19 19:33:59.661378
2020-08-16 05:36:25.790277
2020-12-07 12:12:02.487680
2018-05-16 13:58:42.817933
2021-06-20 10:58:42.119697
2019-05-11 06:07:24.168819
2019-05-10 07:45:16.004966
2020-11-23 13:14:04.351489
2020-09-01 23:45:13.680728
2018-11-12 09:18:34.534164
2

2021-05-07 04:41:51.022900
2017-06-08 06:18:25.025687
2022-12-30 12:17:51.860598
2018-12-23 11:58:26.471597
2017-04-18 17:17:26.340622
2021-08-09 13:13:22.277816
2022-03-22 10:11:08.291514
2017-09-12 10:02:42.628682
2017-03-22 16:15:28.456702
2018-04-29 21:37:50.055032
2021-09-09 17:14:01.621989
2020-07-12 06:58:26.603592
2017-06-11 06:20:56.851431
2017-04-17 17:25:21.780081
2021-01-09 19:53:29.895268
2020-01-22 19:02:15.780559
2021-01-21 17:25:52.140239
2018-10-30 03:13:16.479175
2022-12-19 14:14:22.497655
2019-11-25 15:02:46.943680
2018-10-11 15:31:02.645738
2017-05-19 16:08:49.239605
2019-11-05 05:27:44.350646
2021-02-02 04:28:37.583383
2018-03-13 00:05:00.787773
2017-02-02 19:58:48.785565
2017-02-09 21:30:24.088014
2022-10-23 16:17:11.238750
2018-05-06 21:48:53.373353
2020-12-29 07:17:22.904836
2019-06-23 07:31:33.236495
2021-08-22 12:05:00.969520
2018-12-23 18:10:35.216664
2020-03-18 17:28:43.848127
2021-03-25 00:49:19.487374
2017-12-31 13:36:25.584368
2019-03-17 00:50:47.427860
2

2017-09-19 16:09:34.096013
2022-10-27 07:23:03.805114
2021-05-26 06:15:57.898922
2019-09-28 00:09:46.071573
2022-02-28 17:43:45.310493
2021-12-26 13:11:10.666062
2018-12-13 07:51:59.231769
2021-01-06 17:38:43.695058
2020-05-11 08:29:38.121320
2019-08-12 05:46:37.105018
2017-06-04 08:31:29.442070
2021-10-19 13:31:04.272174
2017-04-05 02:53:30.616908
2017-05-29 20:19:40.124229
2018-08-19 18:17:26.109263
2021-07-07 05:43:36.823359
2018-03-31 22:20:48.863430
2021-10-03 11:22:12.584156
2019-12-18 19:47:20.660752
2021-07-15 07:17:26.895820
2021-02-06 21:22:56.235542
2019-09-29 00:02:58.847901
2019-02-19 05:24:30.113484
2022-12-01 20:14:39.883159
2021-09-28 04:20:53.691431
2022-11-11 04:42:53.111162
2022-10-03 09:57:13.220463
2017-04-18 15:46:17.854015
2021-03-10 18:32:44.205961
2018-10-27 07:57:41.715431
2019-03-01 00:21:59.676299
2017-09-20 10:05:57.413047
2017-05-22 08:37:12.377814
2017-07-12 15:59:05.236188
2022-04-08 22:04:34.226457
2018-10-14 09:23:37.790643
2019-12-20 16:12:40.397293
2

2017-01-28 17:27:38.508958
2021-12-11 16:47:50.527991
2020-01-08 08:49:26.517883
2018-04-15 20:03:05.311048
2022-11-22 03:55:46.119559
2020-03-04 11:00:34.408722
2019-04-05 07:33:34.909458
2022-07-09 13:13:50.012135
2021-07-22 00:23:40.730284
2022-06-27 14:06:45.662389
2021-01-24 04:22:03.601902
2019-07-02 14:52:38.557896
2022-01-22 14:25:48.648728
2018-01-04 18:26:41.215487
2020-05-21 02:17:19.265436
2020-03-14 06:58:01.617051
2022-12-13 14:41:25.785196
2018-07-05 21:52:22.728940
2022-12-23 11:22:14.110324
2022-01-23 13:03:11.463928
2017-02-23 22:09:43.134980
2022-03-25 21:12:22.711871
2020-08-27 15:29:31.150812
2021-09-27 05:09:26.303725
2022-05-14 06:28:50.861488
2020-04-10 01:25:14.947303
2017-05-17 04:28:29.169094
2020-05-19 06:14:55.831054
2018-07-19 16:35:43.534385
2017-06-15 00:59:26.584105
2021-08-11 04:43:45.813467
2017-02-21 22:04:42.570101
2018-02-25 13:03:32.878786
2019-07-30 12:57:41.290165
2018-04-03 14:29:50.258179
2021-01-23 12:46:25.074893
2021-09-24 12:13:34.153405
2

2020-10-24 21:22:12.123591
2022-05-22 21:30:14.713901
2022-10-27 15:46:22.653814
2018-09-27 16:40:00.016076
2020-05-21 14:02:34.732267
2018-01-08 05:31:41.138937
2018-11-20 14:46:25.495722
2020-08-07 05:34:23.476534
2021-12-31 16:42:42.651113
2022-07-17 04:06:28.392171
2018-07-08 17:44:22.777132
2017-03-29 15:21:28.118957
2017-02-19 04:21:28.092068
2018-10-11 15:50:54.252530
2018-10-05 06:30:28.191863
2017-12-27 16:39:55.199950
2018-09-05 09:32:34.274075
2021-10-24 23:30:47.810856
2022-11-05 13:24:53.073832
2017-03-25 02:23:53.771374
2018-02-19 07:40:45.019231
2017-12-14 08:04:52.953975
2017-04-01 18:32:56.772167
2021-10-17 05:59:03.745645
2020-06-12 20:44:54.965324
2019-12-22 02:04:43.625510
2022-09-04 23:05:23.445969
2017-05-20 00:24:59.806612
2018-06-05 10:36:12.694595
2018-09-14 05:41:25.934364
2018-04-16 08:39:12.827918
2022-10-12 00:41:51.171734
2020-10-30 12:54:48.304404
2021-07-26 12:27:08.669497
2017-01-12 18:00:55.698338
2020-01-23 12:55:21.144859
2022-06-11 16:37:04.157167
2022-04-16 21:57:30.503556
2020-10-21 13:55:39.468916
2020-10-29 14:33:06.173804
2017-05-03 05:51:51.621584
2020-09-29 07:46:46.059678
2021-01-23 22:41:44.917119
2019-09-11 15:36:36.783342
2017-04-18 16:44:12.264423
2021-10-23 21:53:14.674010
2022-02-20 23:48:35.782946
2020-08-06 20:55:12.319426
2022-08-05 02:34:33.734054
2022-10-10 17:33:38.245813
2019-06-07 14:25:50.425824
2019-09-05 23:40:02.497483
2021-11-07 16:41:07.617892
2021-06-15 00:47:18.413741
2019-12-21 13:44:22.515206
2018-01-30 23:03:04.277872
2021-03-08 08:32:02.709896
2022-01-29 18:13:50.894241
2018-04-02 22:01:38.821778
2022-02-19 22:45:32.329360
2020-04-12 00:31:19.885056
2019-11-16 21:14:02.531619
2019-04-27 14:28:47.445523
2018-09-18 14:51:48.743750
2022-01-14 02:23:29.728273
2019-10-23 23:47:30.098482
2018-07-04 18:13:48.847239
2022-03-08 15:59:41.471523
2022-02-19 04:36:55.687702
2017-12-17 23:00:58.896935
2022-08-07 09:05:14.233831
2020-09-06 02:25:25.900805
2022-09-10 16:09:47.200751
2022-02-23 09:34:33.493017
2020-12-26 21:40:45.153620
2017-12-01 00:17:41.714752
2021-04-10 15:04:31.332763
2022-10-28 06:24:48.884642
2022-09-23 22:01:22.509041
2022-11-15 02:07:58.175533
2017-01-24 14:19:22.839457
2022-11-13 19:36:12.975940
2020-01-14 07:16:18.858664
2019-10-27 18:13:21.796969
2019-12-02 18:51:57.348656
2020-01-10 07:29:52.310402
2017-03-24 12:31:48.969338
2021-07-17 17:33:17.556139
2020-06-06 12:55:15.195753
2021-10-04 00:48:10.149413
2022-10-19 01:43:23.909471
2022-04-03 07:31:04.200330
2022-09-03 16:46:31.758020
2019-09-23 00:24:52.975640
2019-04-02 11:09:55.575153
2020-09-28 09:42:59.392037
2022-10-02 20:37:23.329668
2021-06-15 17:27:57.328659
2017-12-01 23:12:19.455146
2020-08-31 15:30:48.199408
2022-03-08 13:31:44.094019
2018-01-18 03:32:02.078018
2017-03-01 19:28:44.121163
2017-10-02 12:49:06.699115
2021-03-22 14:15:57.690428
2019-10-13 13:30:09.217950
2019-02-23 13:19:30.375451
2019-02-10 11:09:42.131803
2020-02-13 18:54:37.384927
2021-09-28 02:33:45.709294
2017-11-11 07:03:31.724951
2017-11-03 03:09:36.154777
2020-09-11 12:14:22.490362
2022-01-25 03:03:34.960367
2020-02-21 18:52:06.815278
2017-11-21 17:18:08.907509
2020-04-30 04:13:29.634543
2021-10-10 06:09:06.557310
2020-05-01 19:41:48.557158
2017-02-06 18:48:30.381064
2022-11-28 08:18:11.642425
2017-08-27 11:06:32.542045
2020-05-18 12:07:33.809644
2017-10-07 23:13:12.603870
2018-06-03 13:42:46.888098
2020-11-13 06:12:24.407638
2019-11-08 09:49:46.934158
2022-07-15 08:33:22.721484
2017-01-17 05:03:01.611235
2022-01-01 16:42:29.070335
2018-12-16 12:57:24.244124
2022-08-07 15:42:51.880537
2022-05-09 18:31:22.893270
2018-03-10 04:34:38.831426
2020-12-18 17:19:56.116949
2017-10-26 12:48:21.176280
2020-05-06 23:42:08.942876
2020-02-25 03:28:32.720078
2021-04-19 19:42:10.469266
2021-10-28 12:24:08.595547
2020-05-29 11:35:47.011615
2020-01-29 09:32:13.103585
2021-05-06 14:09:32.301538
2017-02-07 19:57:23.049689
2020-08-24 02:07:31.737492
2022-02-14 05:17:49.500701
2017-09-29 20:51:51.849322
2018-08-14 12:08:38.620361
2022-08-19 22:55:30.154030
2022-07-13 18:04:53.399975
2021-10-07 12:11:51.696305
2021-09-20 08:41:09.515236
2017-07-01 18:47:12.442716
2020-06-02 17:31:30.541802
2019-12-16 18:15:20.980608
2017-02-23 18:20:32.570588
2022-05-27 22:06:55.904611
2019-11-15 07:16:53.818332
2019-06-12 23:36:25.285136
2021-05-10 09:52:39.452265
2018-01-10 19:12:39.450276
2019-02-12 03:40:02.653662
2017-03-20 20:18:55.301434
2020-11-01 16:10:34.489521
2021-12-07 19:24:51.142334
2017-06-01 18:45:03.651689
2017-07-05 09:19:53.079458
2018-03-09 04:46:50.765086
2020-02-28 17:19:00.515209
2018-07-17 10:25:51.303060
2017-11-26 04:14:41.772303
2021-04-27 20:39:09.252213
2020-04-18 03:38:39.534861
2019-08-03 15:21:03.071177
2022-07-17 19:38:23.681488
2021-02-03 11:48:16.990312
2020-03-26 04:39:24.304506
2019-09-10 15:40:37.291806
2021-07-16 00:13:46.280576
2018-01-22 00:42:30.761126
2018-12-15 14:36:40.009929
2017-06-14 06:44:49.016340
2018-04-10 23:44:37.683597
2021-07-03 22:28:54.476919
2019-12-12 02:49:47.285388
2018-06-23 14:16:04.885555
2018-11-14 14:13:23.777745
2017-01-23 06:00:30.040450
2021-06-19 20:41:51.728821
2022-10-29 13:14:50.095942
2019-09-26 09:10:28.936033
2021-10-04 20:36:58.372226
2020-03-29 07:39:45.688732
2018-04-01 23:57:28.830416
2018-01-07 15:33:45.967497
2021-02-22 22:26:32.569966
2017-09-16 19:55:16.142996
2022-04-14 20:33:39.450614
2021-10-26 12:39:34.500039
2017-04-08 08:22:36.750892
2020-10-21 16:20:37.864409
2017-10-03 08:04:26.330250
2020-09-20 17:08:07.020544
2018-07-13 06:10:44.797666
2018-03-18 06:32:26.157506
2018-09-04 00:33:42.037091
2021-08-28 05:52:35.215601
2021-02-05 23:56:13.457731
2017-07-19 13:51:47.918725
2021-11-05 12:16:41.743777
2019-08-22 07:43:20.631557
2021-10-24 07:09:38.729486
2018-08-05 09:37:59.565337
2018-07-30 03:03:41.442691
2018-09-20 01:13:09.831591
2022-02-15 23:20:28.151306
2022-09-16 01:20:22.487389
2018-02-11 22:17:56.196827
2018-03-28 11:33:26.885261
2018-06-01 17:10:13.491145
2018-03-03 01:45:34.258841
2018-08-08 15:12:06.222652
2017-09-16 05:24:41.116599
2020-06-26 17:39:57.941956
2020-02-12 12:35:41.026297
2022-11-17 07:17:53.875029
2019-07-18 14:10:34.938453
2022-10-12 11:18:20.149783
2021-02-02 12:10:27.387474
2019-08-25 13:21:01.843001
2018-03-04 04:54:27.350378
2017-12-06 03:51:12.680345
2022-04-16 17:55:36.586189
2019-04-12 17:45:10.972689
2020-08-13 13:08:35.462851
2021-03-29 19:40:19.780165
2019-03-30 00:16:40.644754
2017-05-19 13:38:47.827078
2019-10-27 14:04:09.639803
2019-05-07 20:35:58.006085
2019-01-22 05:32:11.991721
2022-10-11 20:15:27.476738
2017-04-24 22:46:32.137645
2021-03-01 01:47:51.527606
2022-07-13 10:12:12.279557
2020-04-17 04:55:44.477665
2020-12-10 11:04:39.293843
2018-05-20 07:36:06.022130
2019-01-29 16:38:57.731982
2022-01-01 08:34:46.557172
2019-05-17 12:26:02.214092
2019-03-16 00:14:57.445796
2022-09-19 14:03:45.676465
2021-11-07 13:16:40.020330
2020-05-22 10:30:32.886652
2017-10-26 14:01:10.700190
2022-05-22 08:02:54.508190
2021-10-06 02:18:38.207155
2019-09-03 07:20:26.360566
2019-04-05 15:34:29.209951
2020-02-27 02:44:46.007927
2021-08-21 07:29:46.707061
2017-07-05 09:48:36.377520
2022-11-29 15:10:50.720068
2018-03-16 18:53:45.801657
2020-08-05 19:26:36.328101
2021-03-08 01:19:29.528952
2019-07-05 22:57:24.543338
2021-08-14 19:23:21.268644
2018-07-25 00:29:37.877790
2022-05-16 16:58:07.276207
2021-09-14 21:48:40.634351
2019-01-03 06:19:19.177276
2018-06-04 06:07:36.856284
2021-11-16 20:39:24.085941
2018-04-26 22:02:45.910468
2021-12-31 17:07:20.893734
2017-07-03 05:46:16.114515
2021-09-16 09:59:57.001731
2018-01-23 18:07:30.285839
2022-11-06 17:59:17.299869
2017-09-28 11:21:39.082741
2022-11-12 14:30:33.085504
2017-12-05 22:44:16.353606
2018-01-12 23:25:12.039540
2020-07-10 01:58:34.990403
2021-08-26 10:30:11.414143
2020-07-04 10:48:57.571997
2018-02-26 03:45:41.537634
2018-05-26 10:05:28.758440
2022-02-17 19:04:53.707762
2021-11-09 14:19:01.985233
2021-05-08 06:28:48.071484
2017-01-14 23:25:19.159759
2017-12-27 16:27:26.308999
2017-03-28 11:09:26.979248
2019-10-20 23:11:49.457298
2017-08-06 18:48:12.973590
2017-04-02 18:47:52.383580
2017-12-26 21:00:46.638644
2017-04-03 07:44:05.162986
2018-09-09 07:55:48.553615
2019-11-03 08:38:04.347440
2019-12-28 02:56:32.251242
2022-08-13 05:52:25.861520
2021-04-18 03:12:10.858107
2017-12-16 23:26:19.148454
2020-07-16 11:25:47.443615
2022-01-20 12:18:18.779500
2019-12-08 20:24:40.823446
2021-02-08 17:54:09.369532
2020-03-18 21:52:45.810030
2017-08-25 10:56:05.028638
2017-05-09 17:09:17.499113
2021-04-15 01:44:51.851918
2018-12-14 08:51:45.875194
2017-01-15 20:11:12.991247
2022-08-11 00:09:07.098420
2019-10-27 14:26:51.154845
2022-04-27 22:26:55.150204
2022-01-15 18:51:41.809817
2017-12-28 21:51:14.649241
2018-07-31 19:03:31.194505
2018-03-25 21:55:23.298833
2019-06-07 07:40:01.623196
2021-09-05 07:31:41.412104
2019-03-06 23:53:57.338306
2020-05-29 20:51:48.185341
2022-02-21 17:50:14.976594
2020-06-08 11:25:29.898813
2017-07-09 19:46:25.594226
2019-11-26 14:39:58.394248
2021-08-07 06:44:47.028890
2020-05-24 12:01:11.373383
2019-07-08 09:29:02.567357
2021-06-10 12:09:08.934360
2021-10-26 23:56:33.784951
2017-02-17 00:48:52.594029
2018-04-25 16:48:18.851237
2017-10-30 04:44:46.603713
2022-09-29 22:11:38.934816
2019-09-01 11:07:38.600942
2018-07-30 19:40:29.657747
2017-04-13 03:47:56.100774
2018-06-12 17:46:35.255299
2021-07-01 04:42:59.946996
2021-07-23 06:15:22.861413
2020-07-25 13:48:27.760105
2019-04-14 03:30:14.589722
2019-10-25 20:46:00.129011
2020-11-16 16:50:46.318016
2022-03-21 11:04:32.494819
2022-03-23 12:07:54.853504
2019-10-03 07:36:29.040059
2022-06-18 18:40:32.557331
2017-05-27 22:16:57.436803
2018-09-20 02:29:53.950152
2019-12-16 20:05:52.860445
2022-01-02 15:52:38.355516
2019-01-05 15:08:04.849575
2017-10-13 06:18:02.700827
2017-09-01 08:38:45.293156
2018-12-04 23:00:32.032886
2017-01-27 00:38:48.772288
2020-01-11 05:41:45.918077
2017-09-21 16:43:52.030408
2020-04-08 22:29:17.960213
2017-11-04 17:25:06.078215
2021-06-04 20:01:30.872604
2018-03-29 01:10:07.629084
2018-11-02 17:51:14.033227
2020-08-05 05:01:15.841778
2020-12-19 01:22:58.140291
2018-07-24 15:15:00.802030
2017-11-03 07:35:47.764502
2021-12-17 10:24:48.859554
2021-04-27 09:31:28.218787
2017-06-15 23:49:07.235338
2021-12-23 18:02:26.087812
2021-10-20 17:25:14.525272
2020-01-29 21:05:48.917881
2018-08-23 14:06:57.836618
2020-10-28 15:15:42.301377
2019-04-17 01:02:19.960326
2017-10-26 17:07:01.115405
2022-04-24 08:25:49.413405
2019-06-17 04:38:59.915951
2017-07-10 02:45:43.312582
2018-09-06 15:29:43.861978
2022-03-14 12:03:52.596421
2018-02-28 04:49:52.211162
2018-01-23 17:03:00.163192
2021-01-26 19:09:40.815037
2019-10-21 15:59:57.573390
2019-08-25 20:08:11.753232
2022-12-01 13:29:14.293295
2020-05-15 07:51:46.410142
2019-06-07 14:38:32.362351
2018-07-28 00:43:40.675863
2021-01-29 19:13:29.945692
2018-02-14 00:13:18.473327
2019-01-07 10:02:23.384495
2021-02-11 21:39:47.656084
2022-10-23 17:43:43.930823
2018-01-08 18:18:52.575901
2019-07-04 15:37:24.488369
2022-07-27 08:40:48.676931
2021-01-21 03:46:53.736786
2017-12-05 18:32:03.029497
2020-11-24 14:30:27.247754
2019-08-24 20:06:24.047408
2017-11-30 23:52:54.405464
2021-12-05 11:52:59.135293
2017-11-17 07:09:38.360006
2017-01-27 07:17:32.907584
2021-05-05 09:06:21.003232
2017-01-16 20:33:22.577940
2018-04-27 08:26:46.846796
2019-11-16 21:03:49.771998
2022-10-06 02:00:22.220683
2017-08-20 18:50:10.975921
2018-06-20 18:37:25.632067
2017-11-22 18:22:34.534812
2022-02-05 14:43:46.397616
2021-04-16 08:29:27.745037
2022-07-25 14:47:52.225322
2021-05-27 12:20:11.087286
2020-04-27 19:28:56.738861
2021-07-28 06:26:18.324207
2021-07-06 03:17:56.670470
2021-08-21 08:49:50.106363
2021-08-12 05:55:42.337294
2021-07-08 07:44:11.776804
2018-10-14 09:18:15.498627
2018-10-09 10:47:06.397688
2021-08-29 06:54:13.189123
2018-09-07 04:31:28.145515
2019-09-05 02:08:18.998089
2021-04-19 20:16:48.726490
2022-03-09 15:37:24.275517
2019-06-24 20:46:36.425330
2022-06-05 00:27:05.553717
2020-02-11 05:58:36.700140
2019-05-19 06:40:32.873561
2020-08-02 12:10:56.685665
2022-05-15 05:41:42.325952
2020-09-17 04:28:50.233698
2018-09-07 02:28:16.436429
2021-11-14 11:07:58.360894
2019-05-19 06:16:34.537163
2022-12-23 04:48:05.338769
2021-02-18 03:25:57.216762
2021-05-19 17:17:36.035726
2017-02-12 10:47:53.936363
2019-04-01 00:36:27.485611
2017-08-10 21:01:00.450286
2019-03-26 17:49:41.265334
2017-02-17 05:21:37.178219
2020-03-01 03:36:58.522678
2020-03-14 08:19:40.235543
2021-12-09 12:08:21.693997
2017-02-17 17:21:39.064096
2022-01-29 00:38:40.881817
2017-05-19 03:50:29.597679
2019-05-30 05:04:53.376295
2019-09-11 10:47:48.256125
2020-03-30 04:09:03.775669
2022-10-25 01:22:00.798759
2021-09-02 11:09:00.242390
2020-08-06 07:19:24.016253
2021-08-25 19:40:42.143943
2019-02-13 21:11:56.752539
2020-10-16 21:58:38.672090
2022-12-23 14:44:10.787985
2017-01-04 07:38:33.810964
2017-03-01 08:29:19.452090
2020-05-08 12:11:51.054358
2021-06-09 18:25:30.450059
2022-10-21 19:48:50.330741
2019-11-14 13:15:10.380717
2021-05-05 00:32:37.134286
2022-08-15 18:09:58.445070
2017-12-18 02:09:39.149571
2019-05-25 14:06:04.517119
2017-12-16 10:12:25.618076
2021-07-09 09:37:49.783677
2021-02-28 17:45:49.731662
2021-11-22 04:14:30.252067
2021-05-02 06:10:56.970594
2017-10-14 05:04:20.899593
2020-04-06 03:51:51.262194
2020-02-11 21:43:00.521755
2022-04-16 02:29:03.428623
2021-12-26 17:18:58.847133
2021-10-01 11:49:44.872985
2022-07-25 05:16:52.848477
2020-09-19 23:39:17.284022
2020-03-19 04:27:17.483104
2022-04-30 13:56:22.350815
2020-10-15 08:00:58.023558
2019-01-22 06:50:12.547835
2019-08-28 13:09:35.996745
2022-03-10 23:38:07.919668
2022-07-19 18:22:40.906028
2022-12-26 17:29:20.453457
2022-12-21 23:56:30.416125
2019-08-06 13:47:17.324891
2020-08-30 15:34:36.102177
2018-04-23 23:22:06.314412
2021-03-01 02:58:57.100275
2017-08-18 16:26:26.913125
2022-10-11 19:03:09.286043
2020-08-19 03:52:59.125071
2020-05-26 20:15:16.557802
2020-05-16 11:53:30.246557
2019-02-07 11:40:56.651453
2021-04-18 22:55:43.405743
2017-10-26 11:03:36.108927
2021-11-15 21:00:49.484911
2019-07-17 10:21:24.351989
2020-12-25 22:31:22.104649
2021-05-26 03:07:47.587245
2019-01-19 21:39:26.624907
2017-05-03 03:02:56.551827
2021-02-26 12:03:28.174091
2017-03-10 04:19:06.149154
2021-09-14 07:46:04.384270
2022-01-14 23:20:13.009330
2020-11-15 05:37:21.278703
2019-12-10 14:45:46.120596
2018-08-18 20:10:27.209641
2017-02-04 17:40:46.883420
2019-03-02 16:21:03.123017
2019-06-01 11:20:14.222773
2022-01-23 05:39:43.151639
2019-12-07 00:33:45.425672
2018-09-04 19:02:28.588126
2018-10-19 03:52:27.390016
2017-09-20 05:19:26.395457
2019-07-18 19:24:07.121676
2020-03-26 19:17:57.817957
2019-11-14 16:37:16.149651
2020-12-07 05:48:58.185558
2019-06-22 03:12:45.891168
2022-03-30 22:10:24.703541
2022-10-21 01:36:57.242008
2017-12-09 17:21:11.822232
2021-05-19 08:05:31.300668
2021-06-24 04:39:19.415868
2017-08-22 11:19:58.271675
2021-10-08 12:40:24.796879
2017-04-22 08:38:15.611897
2021-05-18 16:50:50.812474
2022-08-24 23:36:25.779878
2022-08-03 11:51:38.134141
2017-08-17 23:07:42.917588
2021-09-22 08:00:06.142765
2019-03-01 16:02:44.940860
2022-10-07 12:46:57.270790
2022-03-07 11:19:39.672035
2020-12-31 22:32:27.051572
2022-07-11 08:06:47.013546
2017-03-01 05:17:10.916297
2021-09-12 01:18:25.807034
2017-01-31 08:10:14.175480
2020-06-20 02:12:11.895807
2022-06-11 18:14:23.174574
2022-12-30 22:12:49.262657
2022-04-29 05:13:07.643211
2018-05-30 20:30:31.917685
2017-02-01 13:38:30.778311
2018-07-09 02:34:33.239469
2022-02-25 13:56:54.241370
2022-08-26 17:54:06.743146
2022-01-17 09:20:00.709026
2020-01-14 07:55:16.939204
2017-01-14 04:20:34.903564
2019-12-14 23:21:32.770720
2022-03-14 12:21:44.860029
2020-08-20 09:51:49.419247
2022-11-27 18:14:32.537053
2021-10-31 23:47:52.305301
2021-10-23 15:35:34.839861
2020-02-24 05:14:53.464015
2018-09-19 04:41:25.605922
2020-12-13 00:37:47.730779
2020-12-24 05:53:54.308093
2020-10-25 05:08:21.656943
2021-01-28 12:43:52.415082
2021-05-28 09:37:54.991586
2020-11-10 20:34:24.252196
2021-08-22 08:09:57.892334
2020-06-06 17:30:46.480183
2017-08-15 13:06:48.028389
2018-09-16 15:34:56.026158
2019-01-08 04:01:12.330771
2018-04-26 05:03:16.358318
2022-10-08 22:29:23.388972
2021-12-31 06:03:35.534129
2018-03-03 07:28:53.398011
2022-07-07 14:10:28.546840
2017-08-16 18:21:15.291114
2019-04-13 03:32:23.555032
2020-05-27 05:45:59.181387
2020-04-02 07:10:40.043215
2018-11-16 06:10:57.720628
2021-02-24 22:42:20.392356
2021-03-17 04:20:27.933936
2021-07-28 17:52:58.191984
2018-09-02 18:22:56.290834
2021-08-15 12:57:44.053853
2019-04-11 05:25:17.703046
2020-04-28 23:59:00.401140
2021-11-01 17:44:31.150301
2020-12-19 11:09:26.161746
2018-08-06 03:38:28.107189
2018-09-23 16:42:55.668409
2018-01-13 01:41:07.586210
2022-07-12 16:49:38.274992
2022-09-26 04:31:26.314635
2019-08-18 18:23:36.603945
2019-09-19 02:32:21.906931
2022-12-13 01:20:25.808422
2018-11-10 20:33:27.651306
2021-02-04 07:35:25.839176
2017-03-13 06:45:41.543103
2021-08-15 11:18:41.077365
2017-05-23 07:25:46.353204
2021-09-12 19:28:39.399124
2020-09-02 04:14:35.956436
2019-02-09 01:40:59.741202
2022-08-30 00:30:12.207155
2019-01-08 13:19:06.515303
2021-07-26 17:41:38.447895
2019-03-14 17:20:36.748363
2020-07-21 19:33:16.547065
2017-06-04 07:54:11.036278
2019-07-18 09:40:58.989574
2022-02-17 23:48:32.014243
2019-11-06 17:34:05.730760
2019-07-24 11:47:27.396501
2018-10-09 10:08:00.210318
2018-10-23 02:23:12.744058
2022-02-25 05:05:47.297074
2018-12-20 08:40:47.280785
2019-10-27 07:05:26.402815
2022-07-31 20:52:41.208947
2017-09-04 23:07:16.260081
2021-10-29 01:19:18.860659
2022-01-05 17:29:20.825476
2022-05-09 04:56:57.871730
2020-03-24 00:06:51.151765
2021-06-13 23:40:05.834195
2017-07-01 04:12:32.033330
2020-12-03 08:14:44.772970
2018-07-26 23:11:31.086826
2020-06-03 12:37:48.844465
2022-10-13 01:33:54.075841
2021-09-25 11:33:20.287656
2018-04-10 22:46:14.722025
2018-07-17 17:21:28.712141
2021-01-07 07:00:09.751810
2021-06-20 03:26:36.900496
2017-05-09 13:49:23.702917
2020-09-17 04:34:11.269969
2021-05-01 07:32:54.670091

In [17]:
#Creating a ride date column
ride_sharing['ride_date'] = random_date

### Back to the future
The data pipeline feeding into the ride_sharing DataFrame has been updated to register each ride's date. This information is stored in the ride_date column, whose dtype is object, which is how pandas represents strings.

A bug was discovered that recorded rides taken today as taken next year. To fix this, you will find every value in ride_date that lies in the future and cap it at today's date, so that today becomes the maximum possible value of the column. Before doing so, you need to convert ride_date to a datetime column, because date comparisons do not work reliably on strings.

The datetime package has been imported as dt, alongside all the packages you've been using till now.

In [18]:
import datetime as dt
# Convert ride_date to datetime
ride_sharing['ride_dt'] = pd.to_datetime(ride_sharing['ride_date'])

# Save today's date
today = pd.Timestamp('today')

# Set all in the future to today's date
ride_sharing.loc[ride_sharing['ride_dt'] > today, 'ride_dt'] = today

# Print maximum of ride_dt column
print(ride_sharing['ride_dt'].max())

2020-10-01 19:17:36.966529


In [19]:
#Creating a subset of the dataset (copy so later column assignments don't touch ride_sharing)
ride_sharing_sub = ride_sharing.loc[0:77, :].copy()
ride_sharing_sub.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Unnamed: 0       78 non-null     int64         
 1   duration         78 non-null     object        
 2   station_A_id     78 non-null     int64         
 3   station_A_name   78 non-null     object        
 4   station_B_id     78 non-null     int64         
 5   station_B_name   78 non-null     object        
 6   bike_id          78 non-null     int64         
 7   user_type        78 non-null     int64         
 8   user_birth_year  78 non-null     int64         
 9   user_gender      78 non-null     object        
 10  user_type_cat    78 non-null     category      
 11  duration_trim    78 non-null     object        
 12  duration_time    78 non-null     int64         
 13  tire_sizes       78 non-null     category      
 14  ride_date        78 non-null     datetime64[

In [20]:
ride_sharing_sub.columns

Index(['Unnamed: 0', 'duration', 'station_A_id', 'station_A_name',
       'station_B_id', 'station_B_name', 'bike_id', 'user_type',
       'user_birth_year', 'user_gender', 'user_type_cat', 'duration_trim',
       'duration_time', 'tire_sizes', 'ride_date', 'ride_dt'],
      dtype='object')

In [21]:
#Dropping unnecessary columns
cols_to_go = ['Unnamed: 0', 'user_type_cat', 'duration_trim', 'duration_time', 'ride_dt']
ride_sharing_sub = ride_sharing_sub.drop(cols_to_go, axis = 1)

In [22]:
#Creating an id for each row (named ride_ids to avoid shadowing the built-in id)
ride_ids = extra.id
ride_sharing_sub.insert(loc = 0, column = 'ride_id', value = ride_ids)

In [23]:
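#Removing the substring 'minutes' from the duration column
#(str.strip('minutes') would strip a set of characters, not the word, and leave a trailing space)
ride_sharing_sub['duration'] = ride_sharing_sub['duration'].str.replace('minutes', '', regex = False).str.strip()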

In [24]:
#Creating new duration entries. Just wanted to do so, no reason
duration = extra.duration
ride_sharing_sub['duration'] = duration

In [25]:
#Creating new user_birth_year entries. Just wanted to do so, no reason.
user_birth_year = extra.user_birth_year
ride_sharing_sub['user_birth_year'] = user_birth_year

In [26]:
ride_sharing_sub.head()

Unnamed: 0,ride_id,duration,station_A_id,station_A_name,station_B_id,station_B_name,bike_id,user_type,user_birth_year,user_gender,tire_sizes,ride_date
0,0,11,81,Berry St at 4th St,323,Broadway at Kearny,5480,2,1988,Male,27,2020-10-01 19:17:36.966529
1,1,8,3,Powell St BART Station (Market St at 4th St),118,Eureka Valley Recreation Center,5193,2,1988,Male,27,2020-10-01 19:17:36.966529
2,2,11,67,San Francisco Caltrain Station 2 (Townsend St...,23,The Embarcadero at Steuart St,3652,3,1988,Male,27,2020-10-01 19:17:36.966529
3,3,7,16,Steuart St at Market St,28,The Embarcadero at Bryant St,1883,1,1969,Male,27,2020-10-01 19:17:36.966529
4,4,11,22,Howard St at Beale St,350,8th St at Brannan St,4626,2,1986,Male,27,2020-10-01 19:17:36.966529


### Finding duplicates
A new update to the data pipeline feeding into ride_sharing has added the ride_id column, which represents a unique identifier for each ride.

The update, however, coincided with radically shorter average ride durations and irregular user birth years set in the future. Most importantly, the number of rides taken has increased by 20% overnight, leading you to think there might be both complete duplicates (rows identical in every column) and incomplete duplicates (rows sharing a ride_id but differing in other columns) in the ride_sharing DataFrame.

In this exercise, you will confirm this suspicion by finding those duplicates. A sample of ride_sharing is in your environment, as well as all the packages you've been working with thus far.
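
To make the two kinds of duplicates concrete, here is a small sketch on invented data (the `toy_rides` frame and its values are assumptions for illustration only). It uses `DataFrame.duplicated`, once across all columns and once restricted to `ride_id`:

```python
import pandas as pd

# Hypothetical sample: ride 2 is duplicated completely, ride 3 only partially
toy_rides = pd.DataFrame({
    "ride_id": [1, 2, 2, 3, 3],
    "duration": [10, 7, 7, 12, 9],
    "user_birth_year": [1990, 1985, 1985, 1979, 1979],
})

# Complete duplicates: every column matches an earlier row
complete = toy_rides.duplicated()

# Rows sharing a ride_id; keep=False also flags the first occurrence
same_id = toy_rides.duplicated(subset="ride_id", keep=False)

print(toy_rides[complete])
print(toy_rides[same_id].sort_values("ride_id"))
```

With `keep=False`, all rows of a duplicated `ride_id` are returned, which makes it easier to compare them side by side.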

In [27]:
# Find duplicates
duplicates = ride_sharing_sub.duplicated(subset = 'ride_id', keep = False)
print(duplicates)

0     False
1     False
2     False
3     False
4     False
      ...  
73    False
74     True
75     True
76     True
77     True
Length: 78, dtype: bool


In [28]:
# Sort your duplicated rides
duplicated_rides = ride_sharing_sub[duplicates].sort_values(by = 'ride_id')
print(duplicated_rides.head())

    ride_id  duration  station_A_id  \
22       33        10             5   
39       33         2            30   
53       55         9            21   
65       55         9            16   
74       71        11            67   

                                       station_A_name  station_B_id  \
22       Powell St BART Station (Market St at 5th St)           356   
39     San Francisco Caltrain (Townsend St at 4th St)           130   
53   Montgomery St BART Station (Market St at 2nd St)            78   
65                            Steuart St at Market St            93   
74  San Francisco Caltrain Station 2  (Townsend St...            90   

                  station_B_name  bike_id  user_type  user_birth_year  \
22   Valencia St at Clinton Park     2165          2             1979   
39      22nd St Caltrain Station     5213          1             1979   
53           Folsom St at 9th St     1502          2             1985   
65  4th St at Mission Bay Blvd S     5392     

In [29]:
# Print relevant columns of duplicated_rides
print(duplicated_rides[['ride_id','duration','user_birth_year']])

    ride_id  duration  user_birth_year
22       33        10             1979
39       33         2             1979
53       55         9             1985
65       55         9             1985
74       71        11             1997
75       71        11             1997
76       89         9             1986
77       89         9             2060


In [30]:
# Drop complete duplicates from ride_sharing
ride_dup = ride_sharing_sub.drop_duplicates()
ride_dup[ride_dup.duplicated(subset = 'ride_id', keep = False)]

Unnamed: 0,ride_id,duration,station_A_id,station_A_name,station_B_id,station_B_name,bike_id,user_type,user_birth_year,user_gender,tire_sizes,ride_date
22,33,10,5,Powell St BART Station (Market St at 5th St),356,Valencia St at Clinton Park,2165,2,1979,Male,27,2020-10-01 19:17:36.966529
39,33,2,30,San Francisco Caltrain (Townsend St at 4th St),130,22nd St Caltrain Station,5213,1,1979,Male,27,2020-10-01 19:17:36.966529
53,55,9,21,Montgomery St BART Station (Market St at 2nd St),78,Folsom St at 9th St,1502,2,1985,Female,26,2020-10-01 19:17:36.966529
65,55,9,16,Steuart St at Market St,93,4th St at Mission Bay Blvd S,5392,2,1985,Male,27,2020-10-01 19:17:36.966529
74,71,11,67,San Francisco Caltrain Station 2 (Townsend St...,90,Townsend St at 7th St,1920,2,1997,Male,27,2020-10-01 19:17:36.966529
75,71,11,21,Montgomery St BART Station (Market St at 2nd St),58,Market St at 10th St,316,2,1997,Female,26,2020-10-01 19:17:36.966529
76,89,9,22,Howard St at Beale St,72,Page St at Scott St,5162,2,1986,Female,26,2020-10-01 19:17:36.966529
77,89,9,21,Montgomery St BART Station (Market St at 2nd St),64,5th St at Brannan St,1299,2,2060,Male,27,2020-10-01 19:17:36.966529


In [31]:
# Create statistics dictionary for aggregation function
statistics = {'user_birth_year': 'min', 'duration': 'mean'}

In [32]:
# Group by ride_id and compute new statistics
ride_unique = ride_dup.groupby('ride_id').agg(statistics).reset_index()
ride_unique

Unnamed: 0,ride_id,user_birth_year,duration
0,0,1988,11
1,1,1988,8
2,2,1988,11
3,3,1969,7
4,4,1986,11
...,...,...,...
69,94,1993,25
70,95,1959,11
71,96,1991,7
72,98,1989,21


In [33]:
# Find duplicated values again
duplicates = ride_unique.duplicated(subset = 'ride_id', keep = False)
duplicated_rides = ride_unique[duplicates]

# Assert duplicates are processed
assert duplicated_rides.shape[0] == 0
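
The two-step treatment used above (drop complete duplicates, then aggregate what remains per `ride_id`) can be sketched on invented data as follows. The frame `toy_rides` and the choice of `min` and `mean` mirror the cells above but are otherwise assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample: ride 2 is a complete duplicate, ride 3 an incomplete one
toy_rides = pd.DataFrame({
    "ride_id": [1, 2, 2, 3, 3],
    "duration": [10, 7, 7, 12, 8],
    "user_birth_year": [1990, 1985, 1985, 1979, 2060],
})

# Step 1: drop rows that are identical in every column
deduped = toy_rides.drop_duplicates()

# Step 2: collapse the remaining rows of each ride_id with one rule per column
rules = {"user_birth_year": "min", "duration": "mean"}
unique_rides = deduped.groupby("ride_id").agg(rules).reset_index()

# Every ride_id now appears exactly once
assert not unique_rides["ride_id"].duplicated().any()
print(unique_rides)
```

Taking the minimum birth year discards the implausible future value, while averaging the durations is only one possible way to reconcile conflicting entries.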

### 2. Text & Categorical Data Problems
Categorical and text data are often some of the messiest parts of a dataset because of their unstructured nature. We will fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.
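
As a rough sketch of these three fixes on an invented column (the labels and the mapping are assumptions for illustration), the pandas string accessor and `replace` are enough:

```python
import pandas as pd

# Hypothetical survey answers with messy labels
answers = pd.Series([" Clean", "clean ", "CLEAN", "Dirty", "filthy"])

# Remove surrounding whitespace and normalize capitalization
tidy = answers.str.strip().str.lower()

# Collapse near-synonyms into a single category (illustrative mapping)
tidy = tidy.replace({"filthy": "dirty"})

print(tidy.unique())
```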

#### Finding consistency
In this exercise, we'll be working with the airlines DataFrame, which contains survey responses about the San Francisco Airport from airline customers.

The DataFrame contains flight metadata such as the airline, the destination, and waiting times, as well as answers to key questions regarding cleanliness, safety, and satisfaction. Another DataFrame named categories was created, containing all correct possible values for the survey columns.

In this exercise, we will use both of these DataFrames to find survey answers with inconsistent values and drop them, effectively performing an anti join (to find the inconsistent rows) and an inner join (to keep the consistent ones) on these DataFrames.
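
One way to sketch this check without an explicit merge is a set difference followed by `isin`. The frames `survey` and `allowed` below are invented stand-ins for airlines and categories, and their values are assumptions for illustration:

```python
import pandas as pd

# Hypothetical stand-ins for the categories and airlines DataFrames
allowed = pd.DataFrame({"cleanliness": ["Clean", "Average", "Dirty"]})
survey = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "cleanliness": ["Clean", "Unacceptable", "Average", "Dirty"],
})

# Values that occur in the survey but not in the reference list
inconsistent = set(survey["cleanliness"]).difference(allowed["cleanliness"])

# Boolean mask marking rows that hold an inconsistent value
bad_rows = survey["cleanliness"].isin(inconsistent)

# Rows to inspect (anti-join-like) and rows to keep (inner-join-like)
print(survey[bad_rows])
clean_survey = survey[~bad_rows]
```

The same mask serves both purposes: selecting with it shows the offending rows, and negating it keeps only the valid ones.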

In [35]:
airlines = pd.read_csv('../Datasets/airlines_final.csv')
airlines.head()

Unnamed: 0.1,Unnamed: 0,id,day,airline,destination,dest_region,dest_size,boarding_area,dept_time,wait_min,cleanliness,safety,satisfaction
0,0,1351,Tuesday,UNITED INTL,KANSAI,Asia,Hub,Gates 91-102,2018-12-31,115.0,Clean,Neutral,Very satisfied
1,1,373,Friday,ALASKA,SAN JOSE DEL CABO,Canada/Mexico,Small,Gates 50-59,2018-12-31,135.0,Clean,Very safe,Very satisfied
2,2,2820,Thursday,DELTA,LOS ANGELES,West US,Hub,Gates 40-48,2018-12-31,70.0,Average,Somewhat safe,Neutral
3,3,1157,Tuesday,SOUTHWEST,LOS ANGELES,West US,Hub,Gates 20-39,2018-12-31,190.0,Clean,Very safe,Somewhat satsified
4,4,2992,Wednesday,AMERICAN,MIAMI,East US,Hub,Gates 50-59,2018-12-31,559.0,Somewhat clean,Very safe,Somewhat satsified


In [None]:
#Creating the categories dataframe
data = {'cleanliness' : ['Clean', 'Average', 'Somewhat clean', 'Somewhat dirty', 'Dirty'],
        'safety': ['Neutral', 'Very safe', 'Somewhat safe', 'Very unsafe', 'Somewhat unsafe'],
        'satisfaction': ['Very satisfied', 'Neutral', 'Somewhat satisfied', 'Somewhat unsatisfied', 'Very unsatisfied']
       }

categories = pd.DataFrame(data)
# Print categories DataFrame
print(categories)

# Print unique values of survey columns in airlines
print('Cleanliness: ', airlines['cleanliness'].unique(), "\n")
print('Safety: ', airlines['safety'].unique(), "\n")
print('Satisfaction: ', airlines['satisfaction'].unique(), "\n")