### Data Cleaning

It's commonly said that data scientists spend 80% of their time cleaning and manipulating data and only 20% of their time analyzing it. That cleaning time is well spent: analyzing dirty data can lead you to draw inaccurate conclusions, and a machine learning model trained on it can produce unreliable results. In this course, you will learn how to identify, diagnose, and treat a variety of data cleaning problems in Python, ranging from simple to advanced. You will deal with improper data types, check that your data is in the correct range, handle missing data, perform record linkage, and more!

### 1. Common data problems 

- Inconsistent column names
- Missing data
- Outliers
- Duplicate rows
- Untidiness
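As a first pass, most of these problems can be surfaced with a line or two of pandas. The sketch below runs on a small made-up DataFrame (the column names and values are hypothetical) and flags messy column names, missing values, duplicate rows, and outliers. The 1.5 * IQR outlier rule used here is one common convention among several; untidiness usually needs reshaping (for example with `melt`) and is left out.

```python
import numpy as np
import pandas as pd

# Small hypothetical DataFrame exhibiting several of the problems above
df = pd.DataFrame({
    " Duration ": [12, 24, 8, 8, 400],
    "User Type": [2, 2, 3, 3, np.nan],
})

# Inconsistent column names: strip whitespace, lowercase, snake_case
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)

# Missing data: count NaNs per column
missing = df.isna().sum()

# Duplicate rows: count rows identical to an earlier row
n_duplicates = df.duplicated().sum()

# Outliers: flag durations outside 1.5 * IQR beyond the quartiles
q1, q3 = df["duration"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["duration"] < q1 - 1.5 * iqr) | (df["duration"] > q3 + 1.5 * iqr)]

print(list(df.columns))
print(missing.to_dict())
print(n_duplicates)
print(outliers)
```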

In [1]:
# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import random
from random import randint

import extra  # Local helper file containing useful lists

In [2]:
ride_sharing = pd.read_csv('../Datasets/ride_sharing_new.csv')
ride_sharing.head()

|   | Unnamed: 0 | duration | station_A_id | station_A_name | station_B_id | station_B_name | bike_id | user_type | user_birth_year | user_gender |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 12 minutes | 81 | Berry St at 4th St | 323 | Broadway at Kearny | 5480 | 2 | 1959 | Male |
| 1 | 1 | 24 minutes | 3 | Powell St BART Station (Market St at 4th St) | 118 | Eureka Valley Recreation Center | 5193 | 2 | 1965 | Male |
| 2 | 2 | 8 minutes | 67 | San Francisco Caltrain Station 2 (Townsend St... | 23 | The Embarcadero at Steuart St | 3652 | 3 | 1993 | Male |
| 3 | 3 | 4 minutes | 16 | Steuart St at Market St | 28 | The Embarcadero at Bryant St | 1883 | 1 | 1979 | Male |
| 4 | 4 | 11 minutes | 22 | Howard St at Beale St | 350 | 8th St at Brannan St | 4626 | 2 | 1994 | Male |
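The `Unnamed: 0` column looks like a row index that was written out when the CSV was saved (an assumption; the file's history isn't shown). If so, passing `index_col=0` to `pd.read_csv` turns it back into the index instead of keeping it as a data column. A self-contained sketch using a small in-memory CSV with hypothetical contents:

```python
import io

import pandas as pd

# Hypothetical CSV text whose first, unnamed column is a saved index
csv_text = ",duration,user_type\n0,12 minutes,2\n1,24 minutes,2\n"

# Default read: the saved index shows up as an extra 'Unnamed: 0' column
with_extra = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the index instead
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(with_extra.columns.tolist())
print(clean.columns.tolist())
```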


### Numeric data or ... ?
You'll be working with a DataFrame called ride_sharing that holds bicycle ride sharing data from San Francisco. It contains information on the start and end stations, the trip duration, and some user information for a bike sharing service.

The user_type column indicates whether a user is taking a free ride and takes the following values:

- 1 for free riders
- 2 for pay per ride
- 3 for monthly subscribers

In this exercise, you will print the information of ride_sharing using .info() and see firsthand how an incorrect data type can distort your analysis of the dataset. The pandas package is imported as pd.

In [3]:
# Print the information of ride_sharing
# (.info() prints on its own and returns None, so wrapping it in print() only adds a stray "None")
ride_sharing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25760 entries, 0 to 25759
Data columns (total 10 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   Unnamed: 0       25760 non-null  int64 
 1   duration         25760 non-null  object
 2   station_A_id     25760 non-null  int64 
 3   station_A_name   25760 non-null  object
 4   station_B_id     25760 non-null  int64 
 5   station_B_name   25760 non-null  object
 6   bike_id          25760 non-null  int64 
 7   user_type        25760 non-null  int64 
 8   user_birth_year  25760 non-null  int64 
 9   user_gender      25760 non-null  object
dtypes: int64(6), object(4)
memory usage: 2.0+ MB


In [4]:
# Print summary statistics of user_type column
print(ride_sharing['user_type'].describe())

count    25760.000000
mean         2.008385
std          0.704541
min          1.000000
25%          2.000000
50%          2.000000
75%          3.000000
max          3.000000
Name: user_type, dtype: float64


In [5]:
# Convert user_type from integer to category
ride_sharing['user_type_cat'] = ride_sharing['user_type'].astype('category')

In [6]:
# Write an assert statement confirming the change
assert ride_sharing['user_type_cat'].dtype == 'category'
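Comparing a dtype to a string such as `'category'` works, but pandas also offers explicit checks: `isinstance(..., pd.CategoricalDtype)` for categoricals and `pandas.api.types.is_integer_dtype`, which accepts every integer width. A minimal sketch on a toy Series of user_type codes (the values are made up):

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Toy user_type codes (1 = free, 2 = pay per ride, 3 = monthly)
codes = pd.Series([2, 2, 3, 1, 2])
as_category = codes.astype("category")

# CategoricalDtype instances identify categorical columns
assert isinstance(as_category.dtype, pd.CategoricalDtype)

# is_integer_dtype accepts int8, int16, int32 and int64 alike
assert is_integer_dtype(codes)

print(as_category.cat.categories.tolist())
```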


In [7]:
# Print new summary statistics 
print(ride_sharing['user_type_cat'].describe())

count     25760
unique        3
top           2
freq      12972
Name: user_type_cat, dtype: int64
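The categorical summary is still expressed in the raw codes 1, 2 and 3. One option, sketched here and not part of the original exercise, is to map the codes to the meanings listed earlier before converting, so that `value_counts()` and `describe()` read directly. The label strings are assumptions paraphrased from that list, and the data is a toy Series:

```python
import pandas as pd

# Toy user_type codes; the real column has 25760 rows
user_type = pd.Series([2, 2, 3, 1, 2, 3])

# Hypothetical labels taken from the code descriptions
labels = {1: "free rider", 2: "pay per ride", 3: "monthly subscriber"}

# map() swaps each code for its label; astype('category') keeps it compact
user_type_label = user_type.map(labels).astype("category")

counts = user_type_label.value_counts()
print(counts)
```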


### Summing strings and concatenating numbers
In the previous exercise, you were able to identify that category is the correct data type for user_type and convert it in order to extract relevant statistical summaries that shed light on the distribution of user_type.

Another common data type problem is importing what should be numerical values as strings. Mathematical operations on such columns then misbehave: summing strings concatenates them and multiplying a string by an integer repeats it, instead of producing numerical outputs.
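A tiny illustration of the problem, using two made-up durations stored as text (`dtype=object` pins the classic string-holding object column):

```python
import pandas as pd

# Durations read in as text rather than numbers
durations = pd.Series(["12", "24"], dtype=object)

total_as_text = durations.sum()   # string concatenation, not addition
doubled_as_text = durations * 2   # each string is repeated twice

# After conversion, arithmetic behaves numerically
numeric = durations.astype(int)
total = numeric.sum()

print(total_as_text, doubled_as_text.tolist(), total)
```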

In this exercise, you'll convert the string column duration to the type int. Before that, however, you need to strip "minutes" from each value so that pandas can read what remains as a number.
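Note that `.str.strip('minutes')` does not remove the word "minutes": it removes any of the characters m, i, n, u, t, e and s from both ends of each string, stopping at the first character outside that set. On values like `'12 minutes'` this happens to work but leaves a trailing space, which `astype('int')` tolerates. A sketch comparing it with a pattern-based alternative, `.str.extract`, on made-up values:

```python
import pandas as pd

durations = pd.Series(["12 minutes", "8 minutes"])

# Character-set strip: leaves the space that preceded "minutes"
stripped = durations.str.strip("minutes")

# Pull out the leading run of digits instead; expand=False returns a Series
extracted = durations.str.extract(r"(\d+)", expand=False).astype(int)

print(stripped.tolist())
print(extracted.tolist())
```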

In [8]:
# Strip duration of minutes
ride_sharing['duration_trim'] = ride_sharing['duration'].str.strip('minutes')

# Convert duration to integer
ride_sharing['duration_time'] = ride_sharing['duration_trim'].astype('int')

# Write an assert statement making sure of conversion
assert ride_sharing['duration_time'].dtype == 'int'

# Print formed columns and calculate average ride duration 
print(ride_sharing[['duration','duration_trim','duration_time']])
print('Average ride sharing duration time is {:.2f}'.format(ride_sharing['duration_time'].mean()))

         duration duration_trim  duration_time
0      12 minutes           12              12
1      24 minutes           24              24
2       8 minutes            8               8
3       4 minutes            4               4
4      11 minutes           11              11
...           ...           ...            ...
25755  11 minutes           11              11
25756  10 minutes           10              10
25757  14 minutes           14              14
25758  14 minutes           14              14
25759  29 minutes           29              29

[25760 rows x 3 columns]
Average ride sharing duration time is 11.39


In [9]:
# Create a random tire size (26 to 29 inches, both inclusive) for each ride in the dataset
tire_sizes = [random.randint(26, 29) for _ in range(len(ride_sharing))]

# Create a tire_sizes column in the dataset
ride_sharing['tire_sizes'] = tire_sizes
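The cell above draws one size per row with `random.randint`, whose upper bound is inclusive. A vectorized alternative, sketched here and not what the notebook ran, uses NumPy's Generator API: `integers()` excludes the upper bound by default (hence 30), and seeding the generator makes the synthetic column reproducible. The seed value 42 is arbitrary:

```python
import numpy as np

n_rides = 25760  # row count of ride_sharing

# Seeded generator so the synthetic column is identical on every run
rng = np.random.default_rng(42)

# integers(low, high) excludes high, so 30 yields sizes 26 through 29
tire_sizes = rng.integers(26, 30, size=n_rides)

print(np.unique(tire_sizes))
```

The array can then be assigned with `ride_sharing['tire_sizes'] = tire_sizes`.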

In [10]:
ride_sharing.head()

|   | Unnamed: 0 | duration | station_A_id | station_A_name | station_B_id | station_B_name | bike_id | user_type | user_birth_year | user_gender | user_type_cat | duration_trim | duration_time | tire_sizes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 12 minutes | 81 | Berry St at 4th St | 323 | Broadway at Kearny | 5480 | 2 | 1959 | Male | 2 | 12 | 12 | 28 |
| 1 | 1 | 24 minutes | 3 | Powell St BART Station (Market St at 4th St) | 118 | Eureka Valley Recreation Center | 5193 | 2 | 1965 | Male | 2 | 24 | 24 | 28 |
| 2 | 2 | 8 minutes | 67 | San Francisco Caltrain Station 2 (Townsend St... | 23 | The Embarcadero at Steuart St | 3652 | 3 | 1993 | Male | 3 | 8 | 8 | 27 |
| 3 | 3 | 4 minutes | 16 | Steuart St at Market St | 28 | The Embarcadero at Bryant St | 1883 | 1 | 1979 | Male | 1 | 4 | 4 | 29 |
| 4 | 4 | 11 minutes | 22 | Howard St at Beale St | 350 | 8th St at Brannan St | 4626 | 2 | 1994 | Male | 2 | 11 | 11 | 27 |


In [11]:
ride_sharing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25760 entries, 0 to 25759
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   Unnamed: 0       25760 non-null  int64   
 1   duration         25760 non-null  object  
 2   station_A_id     25760 non-null  int64   
 3   station_A_name   25760 non-null  object  
 4   station_B_id     25760 non-null  int64   
 5   station_B_name   25760 non-null  object  
 6   bike_id          25760 non-null  int64   
 7   user_type        25760 non-null  int64   
 8   user_birth_year  25760 non-null  int64   
 9   user_gender      25760 non-null  object  
 10  user_type_cat    25760 non-null  category
 11  duration_trim    25760 non-null  object  
 12  duration_time    25760 non-null  int64   
 13  tire_sizes       25760 non-null  int64   
dtypes: category(1), int64(8), object(5)
memory usage: 2.6+ MB


In [12]:
# Change the dtype of tire_sizes from integer to category
ride_sharing['tire_sizes'] = ride_sharing['tire_sizes'].astype('category')
assert ride_sharing['tire_sizes'].dtype == 'category'


### Tire size constraints
In this lesson, you're going to build on top of the work you've been doing with the ride_sharing DataFrame. You'll be working with the tire_sizes column which contains data on each bike's tire size.

Bicycle tire sizes here are 26″, 27″, 28″ or 29″ (the column was generated with random.randint(26, 29), which includes both endpoints) and are correctly stored as a categorical value. In an effort to cut maintenance costs, the ride sharing provider decided to set the maximum tire size to 27″.

In this exercise, you will make sure the tire_sizes column has the correct range by first converting it to an integer, then setting and testing the new upper limit of 27″ for tire sizes.
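Capping with `.loc` is one way to enforce such a limit. `Series.clip(upper=27)` does the same in a single call, and dropping out-of-range rows is the other common option when capped values would be misleading. A sketch of both on a toy column (the values are made up):

```python
import pandas as pd

tire_sizes = pd.Series([26, 27, 28, 29, 27])

# Option 1: cap every value above 27 at 27
capped = tire_sizes.clip(upper=27)

# Option 2: drop rows that violate the constraint instead
kept = tire_sizes[tire_sizes <= 27]

# Test the new upper limit
assert capped.max() <= 27
print(capped.tolist(), kept.tolist())
```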

In [14]:
# Convert tire_sizes to integer
ride_sharing['tire_sizes'] = ride_sharing['tire_sizes'].astype('int')

# Set all values above 27 to 27, then confirm nothing exceeds the new limit
ride_sharing.loc[ride_sharing['tire_sizes'] > 27, 'tire_sizes'] = 27
assert ride_sharing['tire_sizes'].max() <= 27

# Reconvert tire_sizes back to categorical
ride_sharing['tire_sizes'] = ride_sharing['tire_sizes'].astype('category')

# Print tire size description
print(ride_sharing['tire_sizes'].describe())

count     25760
unique        2
top          27
freq      19355
Name: tire_sizes, dtype: int64


In [15]:
ride_sharing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 25760 entries, 0 to 25759
Data columns (total 14 columns):
 #   Column           Non-Null Count  Dtype   
---  ------           --------------  -----   
 0   Unnamed: 0       25760 non-null  int64   
 1   duration         25760 non-null  object  
 2   station_A_id     25760 non-null  int64   
 3   station_A_name   25760 non-null  object  
 4   station_B_id     25760 non-null  int64   
 5   station_B_name   25760 non-null  object  
 6   bike_id          25760 non-null  int64   
 7   user_type        25760 non-null  int64   
 8   user_birth_year  25760 non-null  int64   
 9   user_gender      25760 non-null  object  
 10  user_type_cat    25760 non-null  category
 11  duration_trim    25760 non-null  object  
 12  duration_time    25760 non-null  int64   
 13  tire_sizes       25760 non-null  category
dtypes: category(2), int64(7), object(5)
memory usage: 2.4+ MB


In [16]:
# I want to add a date column to the dataframe. will do that since its necessary for the next exercise
import random
from datetime import datetime, timedelta

min_year = 2017
max_year = datetime.now().year

start = datetime(min_year, 1, 1, 00, 00, 00)
years = max_year - min_year + 2
end = start + timedelta(days=365 * years)

for i in range(25760):
    random_date = start + (end - start) * random.random()
    print(random_date)

2018-07-07 05:49:27.004181
2022-03-08 21:50:41.215939
2019-05-17 02:55:09.284087
2018-01-21 17:36:42.626465
2020-01-04 21:49:09.074927
2022-06-22 06:59:50.642621
2021-01-14 12:27:54.608932
2021-01-07 02:57:31.274097
2019-11-12 09:05:15.586537
2019-12-15 15:30:46.287957
2022-02-08 14:49:23.745106
2019-05-21 10:22:57.638180
2017-09-02 14:05:17.150506
2021-03-07 22:20:18.782353
2021-03-22 23:20:21.507543
2022-09-07 07:30:13.034658
2022-10-04 17:45:40.520841
2022-06-17 14:08:43.145918
2017-07-02 19:13:05.287756
2021-04-08 07:58:50.659951
2020-11-22 10:56:03.204599
2019-11-30 18:21:16.486389
2021-02-12 12:50:41.281159
2017-07-30 14:35:08.237533
2022-10-30 23:10:54.555854
2018-10-14 15:41:06.158298
2018-04-12 20:03:02.502051
2022-03-30 00:42:06.796205
2018-01-16 06:20:59.741608
2018-03-24 09:56:59.728919
2018-09-19 14:25:19.155376
2020-05-09 19:08:48.401129
2017-04-11 01:09:32.979493
2018-06-01 15:30:26.339783
2018-04-23 06:46:50.383217
2019-03-17 13:14:53.695852
2020-06-14 07:00:27.573862
2

2021-02-23 09:00:26.604479
2021-11-03 03:38:00.505964
2018-04-26 14:20:46.453609
2019-06-07 18:31:21.665162
2020-03-19 03:55:26.729986
2017-03-13 19:24:27.633544
2018-08-22 09:54:58.924180
2021-10-11 15:23:37.930322
2020-01-07 08:04:18.901068
2017-05-17 10:37:21.850212
2017-03-23 01:45:55.810499
2017-01-14 16:30:43.302160
2021-07-13 07:00:43.008643
2018-03-04 03:56:44.498638
2017-10-26 01:15:32.141587
2022-12-21 15:33:27.889023
2021-08-05 14:18:47.737069
2017-04-18 21:49:05.155501
2018-10-07 23:00:41.528295
2018-03-09 13:50:00.564054
2018-07-10 20:08:42.368278
2018-01-10 00:28:00.340686
2022-01-22 17:38:25.420160
2017-09-17 18:56:05.164546
2017-10-17 20:53:36.631219
2020-08-14 22:42:06.785111
2018-02-05 03:59:53.391969
2020-02-25 09:46:09.035007
2021-12-05 22:12:28.575620
2018-11-08 15:28:54.319492
2018-09-11 19:49:45.599316
2019-07-02 15:55:51.658269
2020-12-08 10:20:31.252315
2022-05-17 17:38:10.567406
2019-05-13 20:58:04.964548
2017-05-22 08:39:18.141791
2021-05-27 20:27:02.808672
2

2022-05-25 18:22:11.505771
2021-09-30 08:07:52.622358
2021-12-18 16:43:03.504391
2022-07-20 01:33:10.905819
2017-01-25 10:26:47.287157
2017-04-15 17:20:55.416313
2018-05-25 22:54:45.670212
2021-10-18 06:13:24.402988
2019-01-14 18:50:39.292063
2018-09-14 00:30:29.252643
2017-03-18 04:29:32.477751
2022-08-08 22:46:44.734575
2022-09-05 06:25:09.922015
2017-11-29 01:02:04.734215
2021-10-24 06:18:52.480473
2020-01-15 19:52:01.626003
2019-08-17 03:49:39.851244
2018-12-16 21:33:11.503358
2019-01-08 10:24:39.762235
2019-09-16 03:39:32.397886
2018-09-24 12:25:52.478909
2021-05-10 04:49:48.993901
2020-06-17 22:17:36.796217
2018-05-22 08:20:25.187238
2018-02-25 04:50:34.139077
2017-11-02 10:01:48.203538
2022-04-05 23:07:12.817043
2017-12-20 02:19:00.332115
2017-08-13 11:55:27.465951
2021-02-05 11:15:48.643058
2018-07-05 06:04:25.793100
2022-04-28 03:21:19.917297
2017-10-06 21:24:48.890966
2021-01-29 23:54:39.816064
2019-08-14 14:00:36.091242
2022-04-02 10:14:22.792602
2018-04-22 21:12:56.146539
2

2020-01-31 05:31:55.405144
2018-09-04 00:14:14.291606
2022-09-25 14:29:46.827787
2022-10-05 04:02:09.545613
2017-01-25 20:06:14.670196
2019-11-11 02:30:38.033498
2018-05-16 18:54:20.031924
2017-07-27 06:57:48.985378
2021-05-28 23:12:21.772618
2022-01-11 13:27:13.651037
2021-04-15 07:25:51.592719
2020-12-12 09:32:01.337963
2019-09-16 07:42:48.415880
2020-11-22 03:55:35.439933
2021-11-21 02:53:47.218855
2017-09-25 10:41:02.812325
2017-07-22 21:47:02.677362
2022-05-14 20:10:06.800819
2022-01-10 06:39:10.311416
2017-11-10 14:48:36.616163
2022-03-09 00:53:23.204817
2020-11-25 23:23:43.070777
2021-07-09 18:36:28.982226
2022-05-19 04:09:25.059926
2019-09-19 10:49:02.015435
2019-09-29 07:19:56.117671
2017-11-18 14:26:00.116067
2018-10-01 10:45:09.578213
2018-09-10 02:23:50.294835
2018-06-10 05:25:04.327804
2021-12-23 01:01:37.952342
2022-08-29 12:09:55.239876
2018-07-07 16:45:02.940698
2019-11-03 17:12:45.692938
2020-11-18 19:27:01.495339
2019-05-23 04:08:06.591822
2018-04-26 19:17:03.270588
2

2022-11-27 17:45:23.174566
2017-06-26 23:47:10.953584
2017-04-30 22:10:53.204714
2021-04-06 16:00:15.080289
2020-01-31 03:11:20.703817
2019-07-06 05:11:20.500286
2019-02-22 21:01:19.739558
2018-09-30 05:02:22.316853
2017-10-22 18:37:11.666017
2019-12-29 12:18:44.740312
2021-06-12 09:04:36.759022
2019-03-26 00:49:50.306722
2021-08-09 02:02:15.516153
2021-06-13 21:16:25.536076
2018-02-06 07:59:54.701002
2022-12-23 02:57:23.216899
2022-01-14 03:06:32.870357
2019-01-10 06:55:15.246543
2022-10-19 17:49:17.094938
2017-04-02 07:13:13.260137
2021-03-14 12:46:54.759728
2020-09-02 20:30:42.925839
2019-08-25 19:53:59.856316
2019-01-24 01:14:43.943764
2018-12-04 15:08:22.822340
2018-01-04 19:18:14.765812
2019-01-19 00:14:09.955886
2019-09-03 13:24:33.855596
2017-01-02 20:34:27.167375
2020-04-29 20:56:31.587768
2020-10-15 18:21:03.265545
2019-11-08 16:44:13.339479
2017-04-08 11:14:06.795590
2020-06-28 04:48:43.925796
2021-12-15 03:46:16.958368
2018-12-11 03:45:39.290193
2022-04-26 20:51:46.033797
2

2019-02-06 02:52:37.936057
2019-10-05 03:30:14.471908
2021-03-30 20:46:34.474396
2022-04-21 07:22:26.656294
2021-01-31 13:01:16.280563
2020-05-03 07:27:27.238133
2020-04-29 05:06:55.056963
2021-11-21 22:20:00.028656
2020-10-13 11:39:38.327711
2018-05-20 03:09:43.641531
2018-05-04 00:45:13.496001
2019-12-02 03:21:33.701239
2019-03-12 03:04:41.287267
2017-09-01 12:09:57.886135
2017-10-10 18:08:21.880417
2017-06-23 19:32:20.353104
2018-05-12 12:46:36.587291
2018-03-08 17:03:09.351119
2021-01-31 04:12:53.769480
2021-10-16 03:47:49.663277
2020-04-03 00:23:18.047686
2020-06-28 22:41:44.941597
2017-01-07 03:01:35.030749
2020-10-27 00:30:46.752812
2018-03-21 12:44:04.623622
2020-02-11 20:00:14.782360
2020-09-05 05:26:26.365536
2021-10-13 23:52:58.518748
2019-05-23 12:06:27.565375
2017-04-05 15:15:05.756839
2018-08-07 05:44:41.929320
2021-03-28 15:41:20.903479
2021-07-04 13:54:28.134684
2021-09-04 12:22:41.520548
2022-09-08 06:28:48.481633
2019-06-23 05:57:08.138581
2017-10-13 06:16:12.627163
2

2021-06-04 15:16:05.578112
2017-10-30 11:49:48.535259
2019-09-15 03:44:55.960041
2017-10-27 10:31:59.306434
2018-09-02 18:45:53.107271
2020-04-23 09:31:56.156347
2018-12-06 17:22:14.873312
2022-08-18 05:43:44.709924
2020-08-28 22:04:44.975657
2019-03-24 17:29:25.424389
2019-04-26 02:46:10.724289
2022-05-11 15:27:54.789764
2019-12-28 02:56:37.585391
2022-07-08 00:01:47.818937
2021-06-11 07:18:10.958376
2021-02-15 12:52:05.499606
2020-07-14 01:40:22.596145
2020-12-03 08:01:39.411507
2018-04-03 08:29:22.176875
2021-07-08 12:01:45.235555
2021-01-21 06:12:30.682186
2019-10-10 12:58:17.013788
2017-12-04 23:31:42.429525
2021-01-14 15:50:50.002511
2019-11-02 07:56:55.354757
2019-02-09 09:14:33.702848
2017-09-09 16:26:56.137132
2022-02-09 10:53:32.642607
2018-11-14 03:31:58.686740
2022-11-18 11:15:20.158287
2021-12-05 22:13:14.319493
2020-07-05 18:32:52.888627
2021-03-02 03:40:54.433895
2020-06-09 12:14:55.747761
2022-05-09 15:12:40.681462
2021-12-21 13:45:12.584454
2019-07-13 14:24:41.304714
2

2021-11-19 00:22:49.733560
2022-05-15 00:53:57.736820
2018-02-17 16:19:25.999311
2018-11-19 20:57:35.868893
2022-04-19 21:02:21.898913
2022-06-02 14:24:17.548185
2017-09-04 20:29:43.585335
2022-04-09 22:59:59.776002
2022-07-20 19:05:06.776149
2017-11-27 13:40:38.451305
2019-04-04 18:50:13.914604
2018-12-20 13:25:28.894448
2018-09-27 18:12:01.599336
2017-02-07 07:30:00.381575
2022-12-25 13:33:10.219451
2021-10-30 06:11:41.620711
2019-12-24 08:06:30.274048
2017-07-10 06:42:05.835950
2022-11-26 18:02:05.986083
2019-08-19 05:24:25.826340
2019-10-17 16:29:46.803262
2017-06-14 19:35:24.004737
2021-08-05 21:47:31.047865
2022-12-01 13:10:27.090857
2017-07-04 22:05:18.637126
2019-05-01 07:09:36.693994
2022-01-30 03:22:29.733881
2022-03-11 09:14:09.995414
2018-02-12 05:50:12.706117
2019-08-09 02:57:01.340508
2019-02-10 22:44:03.019528
2021-07-12 00:36:55.466153
2017-06-20 21:47:36.032802
2020-12-11 02:04:10.699190
2022-09-07 02:20:52.727266
2017-10-17 13:50:26.614489
2022-10-06 11:47:47.459481
2

2019-01-20 16:52:56.948416
2017-05-25 05:53:11.132091
2021-08-19 18:30:41.343311
2022-08-23 01:34:55.275809
2021-12-21 02:54:22.666452
2019-09-21 14:04:21.842030
2020-05-07 06:27:48.685128
2022-11-05 02:56:39.138254
2019-03-12 01:22:57.591831
2017-12-17 23:53:05.141580
2017-08-06 11:09:53.628366
2017-04-04 07:48:15.443056
2020-03-07 21:34:16.972835
2019-07-09 09:19:22.505403
2019-04-26 20:06:23.765288
2018-11-04 07:06:02.689228
2017-05-09 21:33:12.692300
2022-08-18 11:54:01.278465
2020-11-02 20:38:33.172299
2021-06-05 07:07:34.623792
2020-07-11 07:44:25.895169
2020-06-13 20:25:05.382854
2018-01-14 21:37:18.490666
2022-06-19 01:54:14.465015
2019-05-13 22:23:48.968838
2019-06-09 11:38:48.660409
2021-07-16 20:56:19.436111
2018-06-03 01:24:13.009903
2019-09-12 17:54:28.768085
2018-02-09 16:13:25.863171
2018-10-11 13:50:50.286599
2020-07-07 17:18:09.844504
2019-11-18 22:26:18.963417
2019-08-05 16:28:36.158588
2018-10-09 23:41:21.547939
2019-02-26 00:07:33.460748
2021-03-21 09:06:11.599325
2

2018-03-24 00:29:08.400062
2021-05-14 12:30:09.147423
2018-11-11 21:01:12.839232
2018-04-28 22:21:11.220895
2020-08-25 13:05:20.487185
2020-02-10 18:07:53.957384
2021-08-20 05:41:52.994516
2021-12-23 05:37:52.902329
2020-10-30 20:59:46.854111
2021-07-20 20:17:17.989736
2017-07-29 03:10:42.533908
2021-12-15 06:56:17.457265
2021-05-09 12:43:44.459788
2020-03-16 04:56:23.298928
2017-04-27 16:16:32.035336
2022-03-30 13:17:32.512245
2022-08-18 03:59:04.674040
2017-04-24 03:07:22.694782
2021-03-26 19:36:59.085486
2020-05-17 10:29:55.427346
2019-07-08 07:37:55.681233
2022-05-10 18:29:47.481710
2021-12-28 07:52:35.089200
2018-08-11 21:54:18.498919
2017-06-12 17:25:40.114331
2021-05-28 00:30:38.282192
2019-10-08 19:59:46.307925
2020-10-19 03:14:06.989021
2018-07-15 18:55:12.807524
2020-03-14 11:33:29.549265
2021-09-29 23:55:37.485555
2020-11-13 08:06:50.162229
2019-02-09 23:40:20.505792
2018-02-11 23:48:50.264266
2022-11-11 02:26:51.739150
2020-03-10 16:23:59.647760
2020-11-11 02:42:01.029968
2

2019-03-19 17:57:53.432534
2017-08-27 14:26:05.939446
2022-10-21 01:27:17.930791
2021-08-21 03:39:43.909215
2018-04-01 07:35:18.925369
2017-04-05 00:08:10.955987
2021-07-18 05:51:13.964927
2020-05-04 07:04:21.756819
2018-10-30 10:26:45.900508
2019-04-01 04:26:35.997525
2018-12-15 03:20:47.820486
2018-08-14 09:20:47.440446
2022-02-09 06:16:05.848286
2018-12-23 16:37:12.043699
2017-04-27 11:33:09.219785
2021-03-25 09:58:53.364367
2022-04-30 15:45:05.830324
2018-06-16 09:44:57.238077
2018-02-27 22:38:24.366022
2020-12-23 20:49:12.314671
2017-12-24 02:25:25.231069
2017-03-01 06:06:37.689378
2018-11-18 22:08:48.190580
2020-11-05 23:29:44.282941
2022-01-21 05:54:28.966783
2020-05-31 23:59:54.564354
2017-12-04 22:02:58.975140
2021-07-12 20:43:11.589644
2019-02-03 18:46:53.204254
2018-12-13 08:16:48.917120
2018-10-11 09:54:33.094411
2019-07-28 23:23:49.344632
2017-05-15 00:43:33.211842
2019-01-14 01:27:07.001697
2019-01-02 20:09:20.815024
2021-12-29 22:10:26.781825
2021-07-19 11:17:46.826832
2

2022-06-05 08:36:34.022032
2022-05-19 01:29:28.424257
2020-01-05 01:47:23.563524
2017-03-26 19:30:32.161294
2018-07-02 12:22:36.125857
2017-04-08 16:01:21.677575
2018-07-23 13:08:31.395423
2021-07-04 22:15:19.635768
2021-04-19 19:54:00.187395
2022-10-30 04:45:04.211794
2022-12-23 00:44:21.238449
2020-11-09 20:36:14.192518
2018-09-07 03:18:27.235173
2017-03-22 10:25:37.715501
2020-04-19 04:21:46.935195
2019-02-03 19:00:53.448672
2018-02-08 21:05:17.719768
2020-12-31 15:15:59.943375
2019-07-16 08:52:41.577131
2021-05-26 10:33:00.665701
2020-09-23 07:07:13.068961
2019-09-10 16:34:15.874630
2019-06-05 02:52:51.200319
2020-04-02 07:33:32.223655
2018-07-20 18:37:34.730087
2019-06-06 10:12:33.900592
2019-04-13 13:38:26.850076
2021-05-08 08:51:39.987422
2022-09-24 00:05:45.684502
2020-04-24 03:46:44.943896
2018-07-04 20:50:53.375368
2022-05-26 03:14:40.938439
2020-10-10 13:31:52.239572
2022-08-27 11:04:48.971590
2022-06-04 17:14:43.222546
2022-10-04 21:27:39.133873
2018-09-23 18:42:26.844693
2

2017-05-31 11:12:18.307267
2019-10-16 11:47:40.616956
2017-08-10 07:20:49.754202
2019-07-07 13:48:29.719524
2021-03-02 06:29:15.359448
2020-03-28 20:46:34.344575
2019-07-26 10:33:18.996645
2020-06-19 23:17:22.227989
2018-10-08 08:09:56.993124
2021-03-05 13:23:30.770044
2017-01-23 21:11:06.431789
2018-01-30 04:58:17.833694
2017-08-14 03:15:54.968129
2020-11-11 08:19:04.311711
2017-12-04 07:21:01.244287
2017-06-28 22:12:14.494509
2022-06-29 02:20:57.380101
2022-05-29 15:56:08.660627
2021-03-03 03:16:08.160507
2020-04-01 14:51:20.270651
2022-04-13 10:42:41.451298
2020-04-05 03:33:29.407769
2018-09-26 09:32:15.469894
2017-05-17 11:29:11.893221
2021-06-17 18:41:44.483583
2018-06-11 15:54:11.156686
2019-06-13 10:31:54.741410
2021-03-09 16:09:23.260762
2021-02-15 21:41:42.147894
2020-02-24 00:58:25.271750
2021-03-24 20:11:13.815845
2021-03-29 13:12:52.520099
2019-01-05 23:56:05.434569
2022-07-07 01:15:46.773851
2020-04-21 18:17:50.372296
2017-11-28 11:07:26.707714
2018-04-07 10:22:36.834165
2

2018-03-09 17:45:08.214393
2019-10-27 05:18:05.859690
2020-07-05 04:10:12.689048
2020-05-21 04:22:47.811521
2020-11-01 14:46:20.813455
2017-11-05 08:37:49.969307
2022-12-09 05:24:56.969612
2020-03-12 03:15:17.784424
2018-05-12 13:36:27.225899
2022-12-03 04:48:10.634358
2020-04-30 17:56:36.443596
2018-09-20 15:50:02.210574
2019-09-12 06:49:58.635970
2017-04-30 04:07:49.670858
2021-01-07 07:18:42.333835
2022-04-30 12:58:53.650484
2020-05-14 00:47:11.643408
2019-01-16 20:01:30.129593
2021-02-05 13:58:50.284426
2020-03-30 01:41:58.303627
2020-08-14 07:38:02.264910
2021-05-11 18:28:51.981542
2019-02-19 02:53:45.463168
2018-04-03 16:01:49.273231
2021-03-10 08:56:20.038520
2020-01-29 10:24:47.836444
2019-04-29 18:51:46.390922
2021-01-26 18:59:34.208026
2020-09-23 16:21:09.545626
2022-02-16 05:36:20.266285
2021-02-17 10:44:53.407272
2019-05-18 10:32:04.794553
2019-08-11 14:14:06.666562
2021-10-19 06:22:13.375014
2019-09-06 05:38:04.380751
2020-10-01 07:14:28.148808
2022-05-19 11:08:23.456320
2

2017-12-17 08:46:40.797145
2021-10-01 00:36:41.424410
2017-01-24 10:34:25.751245
2018-09-17 15:16:30.129797
2018-05-09 07:11:11.653308
2022-09-05 20:20:54.000046
2021-06-22 13:17:17.902931
2018-07-10 05:33:48.112101
2020-01-18 03:22:51.363097
2019-11-19 15:16:00.908220
2022-07-03 21:08:05.220161
2020-03-18 15:39:24.948821
2017-03-03 20:36:47.893194
2022-11-11 10:20:39.369945
2020-06-04 00:35:30.627323
2021-07-24 20:08:53.145168
2019-04-06 16:12:48.266848
2021-05-01 00:09:24.637428
2019-06-20 06:17:24.630185
2022-03-08 01:14:04.807840
2020-03-19 21:41:57.586481
2021-02-27 06:08:22.890971
2021-03-29 00:28:39.124678
2019-04-14 15:28:09.728809
2022-07-13 16:47:16.189271
2018-04-20 23:09:04.176205
2021-04-30 04:30:32.499988
2022-01-30 17:57:43.485438
2020-05-29 12:43:20.891220
2018-08-07 00:08:01.376685
2021-07-11 15:47:32.451772
2022-01-08 00:41:16.736227
2021-08-10 12:09:29.777185
2020-12-18 15:55:28.859671
2022-08-20 13:44:56.001054
2020-11-01 14:15:29.983273
2021-09-30 18:02:52.056325
2

2019-12-13 19:21:24.098689
2019-09-12 21:02:28.961930
2022-09-10 20:52:27.473550
2021-01-03 08:29:27.908890
2022-08-11 05:34:33.412154
2018-02-22 15:17:59.659894
2017-08-11 14:00:38.766242
2017-09-02 13:41:24.587470
2020-09-06 05:51:40.685281
2017-01-13 11:48:35.745911
2019-01-06 04:33:09.205156
2017-11-14 18:32:01.732202
2021-03-22 23:43:06.777635
2020-01-05 09:32:10.453778
2017-06-28 01:00:36.146320
2020-10-26 04:48:26.985520
2020-05-31 04:19:57.249907
2021-06-19 14:46:14.840986
2018-11-13 22:29:53.460686
2019-08-19 05:54:38.334342
2021-01-04 09:55:51.652193
2020-12-15 00:12:52.264054
2021-01-17 04:06:38.458943
2020-03-11 11:48:18.209273
2020-01-25 16:35:25.828077
2019-08-22 18:31:54.855445
2021-05-06 08:23:25.474711
2018-05-29 04:25:53.460397
2018-07-06 22:37:37.405480
2017-07-31 14:26:37.659389
2019-12-08 16:36:09.412047
2021-01-25 15:12:07.266816
2018-08-22 17:32:03.948631
2018-08-29 12:24:02.563054
2017-08-03 17:56:57.379521
2017-02-25 06:47:46.856021
2021-07-01 10:39:02.156520
2

2022-07-09 10:33:07.115825
2021-06-29 09:49:35.700666
2021-10-06 12:03:57.808226
2022-12-21 04:46:54.288002
2020-08-03 21:40:07.663686
2018-12-13 06:08:18.328131
2021-05-25 22:38:57.982519
2018-12-29 03:46:42.087795
2019-06-29 22:42:37.576277
2020-03-26 23:50:31.635810
2017-10-24 02:28:02.348151
2018-01-08 07:59:13.436818
2022-12-11 23:23:38.928835
2020-07-16 02:14:44.066196
2017-05-13 14:51:06.310288
2017-03-15 16:15:59.154075
2022-07-27 10:35:54.423213
2021-10-14 17:04:38.758593
2021-02-12 07:25:27.441934
2022-05-21 12:10:25.223037
2020-05-13 21:49:52.769440
2022-05-04 22:19:25.052457
2020-04-18 20:28:41.684565
2018-01-19 20:06:26.321914
2021-07-09 23:42:55.557569
2017-05-13 06:24:05.157130
2021-07-10 10:34:40.278166
2022-05-07 01:05:06.535736
2020-02-20 23:12:03.994743
2020-09-10 06:29:37.124014
2017-11-03 19:57:01.723778
2019-09-07 00:18:18.801545
2017-11-12 15:25:28.805343
2020-05-28 20:58:11.154222
2021-02-14 03:35:38.849193
2021-12-10 07:24:34.143535
2021-10-20 14:50:46.079808
2

2018-12-12 10:08:57.069076
2022-11-27 05:39:06.787535
2022-02-11 03:45:52.040684
2019-10-07 20:00:58.832098
2020-09-12 16:49:06.945277
2018-07-02 04:09:46.428136
2017-03-01 14:06:48.234641
2022-03-30 06:15:10.215542
2017-07-10 16:42:05.426306
2020-10-10 00:57:43.174535
2021-06-21 09:27:31.287517
2017-12-28 00:25:50.638272
2022-10-24 11:51:36.335289
2019-11-14 18:49:56.409675
2017-01-05 14:31:14.704153
2019-08-12 14:18:26.955886
2017-07-29 02:58:19.063584
2021-06-06 12:56:24.197391
2020-04-26 11:34:10.307040
2022-01-30 17:01:09.748780
2020-05-23 14:47:39.547970
2017-01-11 06:55:38.079260
2017-05-14 06:08:13.962478
2021-12-17 10:04:24.426838
2020-01-26 10:22:12.208604
2018-10-17 07:22:58.099300
2022-11-09 18:05:22.330431
2018-07-25 17:34:44.278648
2018-07-04 02:30:14.823651
2019-03-28 12:48:33.386877
2022-03-27 19:11:12.165975
2022-01-10 01:00:50.044171
2022-04-20 19:49:26.711769
2021-10-17 16:38:11.302984
2018-02-05 01:38:22.893478
2020-01-13 04:14:45.775724
2022-10-26 00:09:42.864904
2

2018-06-29 20:10:08.256624
2022-11-29 00:49:09.960473
2022-04-22 20:51:29.831295
2022-06-01 19:18:47.870595
2020-12-24 08:56:48.834890
2020-03-18 11:58:01.862437
2021-08-24 21:59:37.378415
2020-12-07 22:42:13.828079
2019-12-16 19:51:31.520359
2022-10-10 18:15:27.710481
2022-02-14 19:53:43.333464
2022-04-30 14:40:07.167475
2017-07-08 09:27:04.794281
2022-06-04 07:43:29.023307
2022-07-28 01:18:04.918943
2020-05-16 07:17:53.302767
2019-07-10 04:31:42.634539
2020-08-20 17:01:01.271770
2018-11-07 07:45:12.822457
2022-09-13 23:17:25.661098
2020-10-09 00:48:17.721795
2020-12-04 23:03:24.330624
2021-03-24 01:17:05.850754
2019-12-28 23:01:57.958807
2020-03-07 02:27:16.564372
2020-11-30 22:54:03.590668
2022-10-13 02:34:45.046595
2019-04-22 09:53:46.492193
2018-12-15 11:23:46.856277
2017-05-16 16:41:13.992933
2019-05-07 16:12:06.753388
2017-03-07 14:38:07.284287
2020-02-18 11:21:04.831322
2017-11-01 06:38:41.604876
2021-02-19 02:23:54.555588
2019-11-29 10:56:49.004479
2020-07-08 21:10:34.740428
2

2021-12-16 02:45:12.685326
2019-10-26 12:52:49.165686
2019-12-03 07:35:14.509978
2021-04-23 05:50:55.940882
2019-02-20 01:18:26.481659
2022-09-10 10:48:41.481648
2018-07-15 09:50:04.269279
2019-08-04 02:05:50.682439
2017-03-31 15:05:17.608999
2017-12-04 05:40:14.890192
2020-08-18 20:14:03.310005
2022-09-22 01:25:41.598701
2021-11-30 23:45:00.700585
2020-10-07 00:29:21.614065
2022-05-07 09:42:24.493718
2019-07-23 03:38:10.307559
2017-09-16 17:37:36.486466
2018-08-20 07:35:30.850434
2021-04-04 22:03:04.577241
2017-10-07 13:35:18.232844
2022-07-13 22:06:45.910411
2022-01-26 03:29:27.157246
2020-07-25 04:26:40.775252
2017-02-12 15:42:11.139255
2022-10-21 19:56:47.710312
2020-01-21 06:44:36.731462
2017-11-23 00:43:14.086550
2021-05-16 04:34:06.527152
2021-02-04 04:14:20.676299
2018-07-09 15:30:41.499664
2021-01-10 14:24:17.699873
2019-12-26 19:05:11.582984
2022-11-29 21:09:28.167852
2020-02-24 13:25:44.929433
2021-11-30 16:18:25.070332
2017-11-03 08:42:41.121170
2019-09-10 21:30:55.764142
2

2019-05-04 21:37:26.824996
2021-02-10 00:43:52.177412
2017-12-13 01:31:07.453788
2022-11-05 12:57:12.281423
2019-02-14 22:59:06.707622
2021-03-29 06:50:59.217697
2017-06-21 15:17:29.764295
2021-01-18 13:26:41.622355
2022-05-25 05:18:55.976863
2018-08-13 03:44:03.705786
2018-10-15 10:42:06.361393
2017-07-24 13:58:03.005899
2019-08-18 07:45:11.211716
2018-03-15 16:47:46.386199
2019-06-28 11:38:37.690187
2018-07-18 19:06:34.168861
2022-06-04 22:57:03.823485
2018-10-25 11:27:09.190335
2017-11-19 01:40:21.348548
2019-11-15 16:19:08.029745
2020-08-13 22:50:35.567973
2018-08-06 02:57:45.562716
2018-06-07 01:53:09.457810
2017-03-01 20:32:57.830343
2021-01-25 01:41:36.571824
2021-01-08 05:44:44.957937
2018-02-13 08:18:22.940490
2022-04-15 07:23:14.671131
2022-07-15 06:29:02.922840
2019-02-26 11:07:14.292725
2020-11-28 02:44:20.728743
2018-08-16 10:26:45.205570
2017-11-03 12:46:29.922092
2022-04-20 14:24:19.444323
2019-04-09 01:07:38.908752
2017-02-24 00:15:46.869671
2018-06-18 03:08:28.891853
2

2019-08-21 00:31:06.969073
2019-04-14 23:18:57.340316
2018-02-06 18:06:06.176501
2017-09-20 16:29:43.684445
2018-10-27 20:06:44.149525
2021-10-12 04:57:19.493748
2019-07-10 11:47:33.961357
2018-11-08 00:14:04.492029
2020-01-24 08:44:29.628703
2018-05-23 12:59:47.261563
2022-05-01 06:35:09.324383
2019-12-04 07:48:50.932510
2018-05-07 16:14:42.974251
2020-10-18 07:48:50.548586
2018-09-13 16:11:09.620394
2022-05-18 08:08:27.855694
2017-02-09 04:26:04.223613
2020-09-08 23:43:57.096115
2017-03-18 13:16:03.496832
2018-06-05 08:11:14.674413
2018-08-08 19:49:36.835251
2022-05-25 16:14:48.005807
2022-04-04 14:44:22.517419
2017-05-26 19:33:35.007829
2020-05-30 09:21:29.896977
2022-07-11 13:27:51.450876
2021-11-28 13:35:09.175521
2018-02-09 14:55:32.506131
2018-07-22 21:17:41.569987
2022-10-18 23:21:01.765273
2022-09-30 00:17:20.660972
2022-12-28 14:39:25.188838
2021-07-22 18:46:27.301077
2021-03-25 10:24:15.005328
2018-06-22 22:52:46.583473
2018-01-21 01:19:39.083513
2022-04-16 16:26:55.377401
2019-08-22 14:01:58.226792
2019-09-24 03:36:47.576835
2021-05-11 18:07:47.520775
2017-07-14 00:50:36.307245
2018-12-23 19:37:54.698073
2021-09-21 16:57:51.004952
2021-10-12 07:34:21.052209
2020-01-30 18:19:33.193767
2020-11-30 01:09:48.999300
2021-06-04 13:30:53.852909
2018-02-01 03:15:26.211277
2017-02-18 17:55:21.082794
2022-02-21 19:43:56.774262
2018-01-30 23:26:31.597782
2021-07-09 19:47:19.351442
2020-09-03 11:18:37.450468
2019-03-29 10:33:39.805616
2017-11-06 14:40:34.286243
2019-12-30 09:45:17.948710
2017-12-12 02:48:09.192486
2017-03-28 07:57:33.308522
2019-11-12 02:22:01.920410
2017-05-16 00:31:57.792795
2022-07-11 16:36:20.939799
2022-01-06 19:35:08.999810
2018-01-24 16:03:33.483135
2018-09-05 20:08:08.198057
2017-02-24 17:41:01.570538
2021-11-20 04:52:55.620883
2018-08-26 10:45:40.329590
2017-09-20 02:06:13.010991
2017-12-25 09:33:39.583164
2020-08-08 12:08:21.356298
2017-01-24 20:16:51.225954
2020-03-10 03:35:51.133461
2017-11-11 01:57:46.084377
2017-11-29 09:37:46.254074
2022-07-16 18:11:50.599528
2019-06-24 04:57:23.206422
2021-10-22 11:48:19.388227
2017-09-08 12:38:27.891331
2022-02-12 11:52:44.921959
2021-02-25 22:47:16.549239
2018-08-24 16:43:36.750882
2019-10-29 10:45:53.404651
2018-08-19 18:55:36.037942
2021-10-12 20:28:55.167713
2019-06-24 06:55:39.200696
2020-03-06 22:33:27.199818
2019-03-18 13:13:09.314832
2022-12-02 12:14:35.810006
2017-04-13 13:13:38.525448
2022-05-07 22:36:32.647617
2021-05-07 13:02:01.405208
2022-08-04 13:08:44.082633
2018-12-31 05:19:53.775419
2017-07-29 19:08:10.847256
2021-09-03 12:03:18.656058
2022-09-26 03:43:17.410498
2022-03-28 11:44:42.832007
2017-02-09 22:12:22.142565
2020-09-07 14:20:59.605713
2022-11-21 19:57:40.071505
2021-03-17 21:01:20.284284
2019-08-04 15:28:45.941856
2017-03-05 02:03:20.190413
2020-08-26 22:39:59.762973
2021-12-09 03:21:19.196957
2022-07-25 23:30:01.409645
2019-08-13 07:50:43.547583
2019-09-26 12:44:17.283431
2019-09-27 19:41:11.560572
2019-09-05 23:21:58.081328
2022-09-13 07:32:54.046219
2021-01-07 14:11:40.145665
2018-01-29 05:22:58.328951
2021-01-23 14:08:42.540860
2017-01-05 17:25:49.892953
2022-07-11 09:02:57.460268
2018-12-11 19:59:35.167878
2020-09-22 20:24:11.943263
2021-08-19 03:11:21.933240
2022-09-06 07:57:38.001791
2019-08-28 06:50:04.127538
2021-08-25 08:23:48.017364
2019-12-06 15:51:23.746140
2018-09-14 05:12:30.046402
2022-10-08 04:29:51.289504
2022-12-12 14:46:33.584073
2021-08-23 01:51:58.663508
2021-02-07 15:51:41.977328
2021-06-07 22:29:57.109595
2021-11-08 10:54:36.758142
2020-01-03 09:58:24.861947
2020-10-10 00:16:27.018242
2018-12-10 06:52:07.690561
2021-01-23 08:41:28.667145
2019-03-16 01:00:01.539366
2018-11-20 17:47:14.374893
2022-12-09 17:32:52.367848
2019-05-07 02:00:16.320971
2021-12-04 09:22:11.748926
2021-07-19 11:42:54.957177
2022-04-14 12:13:54.360947
2019-09-15 19:06:17.572376
2021-09-17 16:01:29.869262
2020-02-01 22:44:45.353927
2020-01-10 01:02:58.369096
2017-07-20 05:58:45.810126
2019-11-12 11:35:46.831151
2019-03-24 23:45:36.553694
2018-10-18 13:39:17.244022
2019-08-03 05:47:20.082281
2017-10-25 06:26:01.577679
2018-12-31 03:23:12.699095
2021-11-19 16:00:47.170563
2022-08-27 07:10:05.171570
2021-10-01 21:31:22.395495
2019-06-23 05:54:56.604623
2019-07-15 18:48:35.511859
2020-11-28 14:11:36.313976
2019-12-15 04:04:34.434242
2020-02-11 13:02:39.237040
2021-05-11 06:07:37.545824
2017-09-16 18:14:27.320653
2020-02-27 10:36:37.878257
2017-11-22 17:20:10.299657
2019-02-13 04:55:27.913103
2021-09-11 06:46:23.520177
2021-11-30 20:51:03.342726
2021-08-28 04:05:37.911364
2019-05-14 09:33:07.302127
2020-12-29 16:34:28.798587
2021-09-19 04:12:07.638102
2022-06-04 03:54:25.672223
2021-08-04 16:40:16.163388
2019-03-18 20:20:41.872904
2018-06-12 06:35:31.471252
2017-05-03 18:28:31.842669
2018-08-19 15:59:12.391901
2020-01-12 05:49:05.072374
2020-12-12 14:06:28.948100
2017-09-16 05:18:25.352696
2020-12-30 20:27:29.699590
2021-06-01 02:50:12.579653
2020-07-10 06:21:08.590309
2019-09-22 04:38:10.303707
2018-08-06 19:02:46.362337
2017-07-02 16:46:07.579124
2020-09-02 19:30:19.568773
2021-07-29 23:58:52.884288
2019-12-23 09:48:34.816168
2021-08-21 11:22:36.498931
2019-03-16 01:28:33.815750
2022-03-20 17:00:51.719192
2017-12-29 10:07:40.501861
2022-08-22 11:56:39.982693
2020-10-02 21:17:51.833641
2017-02-09 11:46:27.166602
2020-02-16 03:23:32.088911
2022-11-29 04:55:00.205036
2020-11-24 07:22:18.457900
2018-03-25 17:46:28.220976
2021-07-17 00:38:47.919925
2022-03-27 02:06:12.850816
2022-12-25 08:52:49.023720
2022-02-07 20:51:10.347682
2022-05-23 07:19:52.731792
2018-11-06 09:16:25.518892
2021-02-09 03:44:33.868012
2017-08-25 11:52:06.872934
2021-08-17 13:33:42.844334
2019-12-20 15:46:02.533749
2018-05-02 23:53:53.598205
2017-03-30 11:39:59.027822
2018-02-25 06:06:32.768781
2018-02-27 05:35:09.675125
2021-07-28 10:10:02.351397
2018-11-18 06:35:56.131236
2021-07-01 07:34:58.205606
2021-07-29 16:00:23.943340
2022-04-16 15:45:31.909193
2018-12-19 07:08:59.031158
2022-04-29 19:13:48.962676
2021-10-21 16:56:54.242818
2017-04-28 03:50:44.501941
2017-05-06 13:07:00.826633
2022-08-24 11:38:59.082083
2021-12-11 05:38:14.671269
2020-02-23 10:20:25.053605
2018-05-02 04:07:49.484705
2017-12-09 06:10:09.385970
2020-12-17 23:40:02.781135
2017-02-10 02:34:41.133773
2022-03-10 17:09:31.006928
2019-08-13 18:48:26.837020
2020-12-26 04:42:03.008052
2021-10-28 20:06:41.598291
2017-09-08 05:08:15.015747
2021-07-26 12:05:05.670797
2019-04-02 22:32:27.859232
2018-11-24 00:49:23.948078
2019-03-15 17:52:13.161967
2018-04-22 04:07:24.756427
2018-03-24 23:52:46.686893
2019-01-10 23:38:51.735045
2020-04-21 16:44:39.128740
2019-04-23 10:24:02.342846
2017-02-08 08:03:00.884127
2019-02-19 07:30:43.989894
2021-01-07 05:06:15.237181
2019-02-11 07:37:13.556388
2017-09-18 10:10:33.265428
2022-08-27 10:47:44.624753
2020-07-03 21:30:29.227373
2021-01-26 20:03:53.206605
2021-06-02 09:28:10.991491
2020-02-29 18:45:12.403105
2018-05-14 18:28:14.850821
2022-08-28 19:53:05.663092
2019-12-18 21:08:00.873209
2019-12-27 11:33:30.237328
2017-05-03 10:52:41.247957
2017-04-15 04:40:52.775143
2019-05-29 10:36:55.342285
2020-04-24 22:21:39.859546
2018-12-04 02:45:57.979178
2022-09-14 11:44:53.346713
2018-03-08 17:06:19.363558
2020-12-13 12:02:13.575919
2021-12-17 11:27:26.345752
2021-11-05 10:29:13.248341
2018-09-17 05:51:18.680192
2021-08-07 01:20:18.402297
2020-12-23 22:56:48.178953
2020-10-26 19:56:05.367436
2022-06-24 21:35:34.964152
2019-05-27 04:19:20.643765
2022-04-19 07:13:03.338912
2020-06-23 07:24:48.697541
2022-09-01 14:59:26.792908
2021-04-19 02:23:21.327948
2020-08-05 02:08:33.908788
2018-07-16 00:57:54.853944
2018-01-20 23:10:25.013007
2021-07-20 15:37:31.428708
2019-08-10 17:59:39.402554
2019-12-02 07:29:22.790180
2017-03-28 01:05:30.068495
2021-03-24 04:38:49.394926
2017-10-17 07:27:51.824112
2022-08-06 21:19:58.469458
2017-06-12 07:44:01.099821
2022-12-22 19:28:16.534957
2020-03-16 09:05:29.573218
2018-11-17 13:48:34.817192
2021-11-11 05:05:26.434274
2017-01-07 19:32:56.901818
2020-06-08 05:08:16.913597
2022-09-04 06:25:53.706080
2021-07-04 07:46:54.671504
2018-07-06 00:51:31.059480
2017-12-21 04:49:31.128689
2020-11-28 16:36:35.505121
2019-10-11 05:57:19.435412
2019-09-21 02:43:44.640722
2020-12-14 14:27:35.698320
2021-05-08 20:51:11.984068
2021-01-13 04:12:54.013710
2022-04-15 07:18:30.245418
2021-10-20 22:21:09.520211
2022-02-12 15:06:26.861511
2022-09-14 22:48:53.933613
2020-08-22 21:49:06.588233
2022-03-17 03:20:15.494802
2021-11-17 06:54:18.902453
2019-03-29 14:48:08.870643
2021-01-08 19:29:07.348249
2019-01-02 20:45:37.733646
2020-08-30 10:40:34.279056
2020-07-06 23:36:02.527983
2021-10-26 20:19:03.664173
2022-01-15 01:12:39.178892
2018-10-20 08:53:23.318262
2021-07-17 10:26:42.141999
2020-06-04 18:27:04.670419
2021-05-05 04:04:44.053295
2017-02-04 10:47:05.958693
2020-01-14 16:35:23.364280
2021-10-20 03:21:36.143279
2017-11-19 08:59:34.370198
2022-04-13 20:20:35.157892
2021-12-05 04:28:47.592794
2021-05-02 23:31:55.000749
2017-11-05 19:55:19.399048
2022-09-04 17:15:29.101639

In [17]:
#Creating a ride date column
ride_sharing['ride_date'] = random_date

### Back to the future
The data pipeline feeding into the ride_sharing DataFrame has been updated to register each ride's date. This information is stored in the ride_date column as type object, which is how pandas represents strings.

A bug was discovered that recorded rides taken today as having been taken next year. To fix this, you will find all values in the ride_date column that lie in the future and replace them with today's date, so that the column's maximum possible value is today. Before doing so, you need to convert ride_date to a datetime object.

The datetime package has been imported as dt, alongside all the packages you've used so far.

In [18]:
import datetime as dt
# Convert ride_date to datetime
ride_sharing['ride_dt'] = pd.to_datetime(ride_sharing['ride_date'])

# Save today's date
today = pd.Timestamp('today')

# Set all in the future to today's date
ride_sharing.loc[ride_sharing['ride_dt'] > today, 'ride_dt'] = today

# Print maximum of ride_dt column
print(ride_sharing['ride_dt'].max())

2021-02-12 16:39:53.224570
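As a self-contained sketch of the same fix, the snippet below caps future dates on a small toy DataFrame. The dates and the fixed "today" value are hypothetical, chosen so the result is reproducible. It also shows `Series.clip(upper=...)` as an alternative to the boolean `.loc` assignment used in the cell above.

```python
import pandas as pd

# Hypothetical toy data: ride dates stored as strings, one of them in the future
rides = pd.DataFrame({
    "ride_date": ["2019-05-04 21:37:26", "2020-02-24 13:25:44", "2099-01-01 00:00:00"]
})

# Convert the strings to datetime64 values so they can be compared
rides["ride_dt"] = pd.to_datetime(rides["ride_date"])

# Assumption: a fixed "today" keeps the example reproducible
# (the notebook itself uses pd.Timestamp('today'))
today = pd.Timestamp("2021-02-12")

# Option 1: boolean mask with .loc, replacing future values with today
capped_loc = rides["ride_dt"].copy()
capped_loc.loc[capped_loc > today] = today

# Option 2: Series.clip caps every value above `upper` in a single call
capped_clip = rides["ride_dt"].clip(upper=today)

print(capped_clip.max())  # 2021-02-12 00:00:00
```

Both options give the same result here; `clip` avoids building the mask by hand, while `.loc` keeps the condition explicit.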


In [19]:
#Creating a subset of the dataset 
ride_sharing_sub = ride_sharing.loc[0:77, :]
ride_sharing_sub.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 78 entries, 0 to 77
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype         
---  ------           --------------  -----         
 0   Unnamed: 0       78 non-null     int64         
 1   duration         78 non-null     object        
 2   station_A_id     78 non-null     int64         
 3   station_A_name   78 non-null     object        
 4   station_B_id     78 non-null     int64         
 5   station_B_name   78 non-null     object        
 6   bike_id          78 non-null     int64         
 7   user_type        78 non-null     int64         
 8   user_birth_year  78 non-null     int64         
 9   user_gender      78 non-null     object        
 10  user_type_cat    78 non-null     category      
 11  duration_trim    78 non-null     object        
 12  duration_time    78 non-null     int64         
 13  tire_sizes       78 non-null     category      
 14  ride_date        78 non-null     datetime64[

In [20]:
ride_sharing_sub.columns

Index(['Unnamed: 0', 'duration', 'station_A_id', 'station_A_name',
       'station_B_id', 'station_B_name', 'bike_id', 'user_type',
       'user_birth_year', 'user_gender', 'user_type_cat', 'duration_trim',
       'duration_time', 'tire_sizes', 'ride_date', 'ride_dt'],
      dtype='object')

In [21]:
#Dropping unnecessary columns
cols_to_go = ['Unnamed: 0', 'user_type_cat', 'duration_trim', 'duration_time', 'ride_dt']
ride_sharing_sub = ride_sharing_sub.drop(cols_to_go, axis = 1)

In [22]:
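# Create an id for each row from the precomputed list in extra
# (named ride_ids to avoid shadowing Python's built-in id function)
ride_ids = extra.id
ride_sharing_sub.insert(loc = 0, column = 'ride_id', value = ride_ids)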

In [23]:
# Remove the text 'minutes' from the duration column, then trim the leftover space
ride_sharing_sub['duration'] = ride_sharing_sub['duration'].str.replace('minutes', '', regex=False).str.strip()
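A common pitfall when removing a unit such as "minutes": `str.strip` treats its argument as a set of characters to trim from both ends, not as a suffix. The sketch below uses a few hypothetical values in the same format as `duration` to show the leftover trailing space, and one way to get clean integers instead.

```python
import pandas as pd

# Hypothetical sample values in the same format as the duration column
duration = pd.Series(["12 minutes", "24 minutes", "8 minutes"])

# str.strip removes any of the characters m, i, n, u, t, e, s from both ends,
# but stops at the space, so a trailing blank survives
stripped = duration.str.strip("minutes")
print(stripped.tolist())  # ['12 ', '24 ', '8 ']

# Remove the literal word, then the surrounding whitespace
cleaned = duration.str.replace("minutes", "", regex=False).str.strip()

# Convert to integers so the column can be used in arithmetic
duration_int = cleaned.astype(int)
print(duration_int.tolist())  # [12, 24, 8]
```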

In [24]:
# Replace duration with new entries from extra.duration (an arbitrary choice, for illustration only)
duration = extra.duration
ride_sharing_sub['duration'] = duration

In [25]:
# Replace user_birth_year with new entries from extra.user_birth_year (an arbitrary choice, for illustration only)
user_birth_year = extra.user_birth_year
ride_sharing_sub['user_birth_year'] = user_birth_year

In [26]:
ride_sharing_sub.head()

Unnamed: 0,ride_id,duration,station_A_id,station_A_name,station_B_id,station_B_name,bike_id,user_type,user_birth_year,user_gender,tire_sizes,ride_date
0,0,11,81,Berry St at 4th St,323,Broadway at Kearny,5480,2,1988,Male,27,2022-08-02 07:46:05.371717
1,1,8,3,Powell St BART Station (Market St at 4th St),118,Eureka Valley Recreation Center,5193,2,1988,Male,27,2022-08-02 07:46:05.371717
2,2,11,67,San Francisco Caltrain Station 2 (Townsend St...,23,The Embarcadero at Steuart St,3652,3,1988,Male,27,2022-08-02 07:46:05.371717
3,3,7,16,Steuart St at Market St,28,The Embarcadero at Bryant St,1883,1,1969,Male,27,2022-08-02 07:46:05.371717
4,4,11,22,Howard St at Beale St,350,8th St at Brannan St,4626,2,1986,Male,27,2022-08-02 07:46:05.371717


### Finding duplicates
A new update to the data pipeline feeding into ride_sharing has added the ride_id column, which represents a unique identifier for each ride.

The update, however, coincided with radically shorter average ride durations and irregular user birth years set in the future. Most importantly, the number of rides taken increased by 20% overnight, leading you to suspect that the ride_sharing DataFrame contains both complete duplicates (rows identical in every column) and incomplete duplicates (rows sharing a ride_id but differing in other columns).

In this exercise, you will confirm this suspicion by finding those duplicates. A sample of ride_sharing (ride_sharing_sub) is used below, along with all the packages imported so far.
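To make the distinction concrete, here is a minimal sketch on a hypothetical five-row DataFrame: `duplicated(subset=..., keep=False)` flags every row sharing a `ride_id`, while `duplicated()` without a subset flags only rows that match in every column. Combining the two isolates the incomplete duplicates.

```python
import pandas as pd

# Hypothetical mini version of ride_sharing:
# ride 33 is an incomplete duplicate (same id, different duration),
# ride 71 is a complete duplicate (identical rows)
rides = pd.DataFrame({
    "ride_id":         [33, 33, 71, 71, 89],
    "duration":        [10, 2, 11, 11, 9],
    "user_birth_year": [1979, 1979, 1997, 1997, 1986],
})

# keep=False marks EVERY occurrence of a repeated ride_id, not just later ones
by_id = rides.duplicated(subset="ride_id", keep=False)

# Without a subset, only rows identical in every column are flagged
complete = rides.duplicated(keep=False)

# Incomplete duplicates: same ride_id but different in at least one other column
incomplete = by_id & ~complete

print(rides[incomplete])
```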

In [27]:
# Find duplicates
duplicates = ride_sharing_sub.duplicated(subset = 'ride_id', keep = False)
print(duplicates)

0     False
1     False
2     False
3     False
4     False
      ...  
73    False
74     True
75     True
76     True
77     True
Length: 78, dtype: bool


In [28]:
# Sort your duplicated rides
duplicated_rides = ride_sharing_sub[duplicates].sort_values(by = 'ride_id')
print(duplicated_rides.head())

    ride_id  duration  station_A_id  \
22       33        10             5   
39       33         2            30   
53       55         9            21   
65       55         9            16   
74       71        11            67   

                                       station_A_name  station_B_id  \
22       Powell St BART Station (Market St at 5th St)           356   
39     San Francisco Caltrain (Townsend St at 4th St)           130   
53   Montgomery St BART Station (Market St at 2nd St)            78   
65                            Steuart St at Market St            93   
74  San Francisco Caltrain Station 2  (Townsend St...            90   

                  station_B_name  bike_id  user_type  user_birth_year  \
22   Valencia St at Clinton Park     2165          2             1979   
39      22nd St Caltrain Station     5213          1             1979   
53           Folsom St at 9th St     1502          2             1985   
65  4th St at Mission Bay Blvd S     5392     

In [29]:
# Print relevant columns of duplicated_rides
print(duplicated_rides[['ride_id','duration','user_birth_year']])

    ride_id  duration  user_birth_year
22       33        10             1979
39       33         2             1979
53       55         9             1985
65       55         9             1985
74       71        11             1997
75       71        11             1997
76       89         9             1986
77       89         9             2060


In [30]:
# Drop complete duplicates from ride_sharing_sub
ride_dup = ride_sharing_sub.drop_duplicates()
# Show the incomplete duplicates that remain (same ride_id, different values)
ride_dup[ride_dup.duplicated(subset = 'ride_id', keep = False)]

Unnamed: 0,ride_id,duration,station_A_id,station_A_name,station_B_id,station_B_name,bike_id,user_type,user_birth_year,user_gender,tire_sizes,ride_date
22,33,10,5,Powell St BART Station (Market St at 5th St),356,Valencia St at Clinton Park,2165,2,1979,Male,26,2022-08-02 07:46:05.371717
39,33,2,30,San Francisco Caltrain (Townsend St at 4th St),130,22nd St Caltrain Station,5213,1,1979,Male,27,2022-08-02 07:46:05.371717
53,55,9,21,Montgomery St BART Station (Market St at 2nd St),78,Folsom St at 9th St,1502,2,1985,Female,26,2022-08-02 07:46:05.371717
65,55,9,16,Steuart St at Market St,93,4th St at Mission Bay Blvd S,5392,2,1985,Male,27,2022-08-02 07:46:05.371717
74,71,11,67,San Francisco Caltrain Station 2 (Townsend St...,90,Townsend St at 7th St,1920,2,1997,Male,27,2022-08-02 07:46:05.371717
75,71,11,21,Montgomery St BART Station (Market St at 2nd St),58,Market St at 10th St,316,2,1997,Female,27,2022-08-02 07:46:05.371717
76,89,9,22,Howard St at Beale St,72,Page St at Scott St,5162,2,1986,Female,27,2022-08-02 07:46:05.371717
77,89,9,21,Montgomery St BART Station (Market St at 2nd St),64,5th St at Brannan St,1299,2,2060,Male,26,2022-08-02 07:46:05.371717


In [31]:
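# Create statistics dictionary for aggregation function:
# 'min' keeps the earliest birth year, so an implausible future value such as 2060
# is dropped when a plausible one exists; 'mean' averages the duplicated durations
statistics = {'user_birth_year': 'min', 'duration': 'mean'}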

In [32]:
# Group by ride_id and compute new statistics
ride_unique = ride_dup.groupby('ride_id').agg(statistics).reset_index()
ride_unique

Unnamed: 0,ride_id,user_birth_year,duration
0,0,1988,11
1,1,1988,8
2,2,1988,11
3,3,1969,7
4,4,1986,11
...,...,...,...
69,94,1993,25
70,95,1959,11
71,96,1991,7
72,98,1989,21


In [33]:
# Find duplicated values again
duplicates = ride_unique.duplicated(subset = 'ride_id', keep = False)
duplicated_rides = ride_unique[duplicates]

# Assert duplicates are processed
assert duplicated_rides.shape[0] == 0
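The full treatment (drop complete duplicates, then aggregate the incomplete ones) can be reproduced on a small hypothetical DataFrame. The aggregation choices mirror the `statistics` dictionary used above; other rules (for example `max` or `first`) may suit other columns.

```python
import pandas as pd

# Hypothetical rides: 71 is a complete duplicate, 89 an incomplete one
# whose second record carries an implausible future birth year (2060)
rides = pd.DataFrame({
    "ride_id":         [71, 71, 89, 89, 94],
    "duration":        [11, 11, 9, 12, 25],
    "user_birth_year": [1997, 1997, 1986, 2060, 1993],
})

# Step 1: remove rows that are identical in every column
rides = rides.drop_duplicates()

# Step 2: collapse the remaining rows to one per ride_id
statistics = {"user_birth_year": "min", "duration": "mean"}
rides_unique = rides.groupby("ride_id").agg(statistics).reset_index()

# Step 3: verify that every ride_id now appears exactly once
assert not rides_unique.duplicated(subset="ride_id").any()
print(rides_unique)
```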

### 2. Text & Categorical Data Problems
Categorical and text data can often be some of the messiest parts of a dataset due to their unstructured nature. We will fix whitespace and capitalization inconsistencies in category labels, collapse multiple categories into one, and reformat strings for consistency.

#### Finding consistency
In this exercise, we'll work with the airlines DataFrame, which contains survey responses from airline customers at San Francisco Airport.

The DataFrame contains flight metadata such as the airline, the destination, and waiting times, as well as answers to key questions regarding cleanliness, safety, and satisfaction. Another DataFrame, named categories, was created containing all the correct possible values for the survey columns.

In this exercise, we will use both DataFrames to find survey answers with inconsistent values and drop them, effectively performing an anti join (answers that are not in categories) and an inner join (answers that are in categories).
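Finding the answers that are not in categories amounts to an anti join, and keeping the rest amounts to an inner join. A minimal sketch with hypothetical data, using a set difference plus `isin`, and alternatively `merge(..., indicator=True)`:

```python
import pandas as pd

# Hypothetical reference table listing every allowed safety answer
categories = pd.DataFrame({
    "safety": ["Neutral", "Very safe", "Somewhat safe", "Very unsafe", "Somewhat unsafe"]
})

# Hypothetical survey answers; 'Extremely safe' is not an allowed value
survey = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "safety": ["Neutral", "Extremely safe", "Somewhat safe", "Very safe"],
})

# Anti join: answers that appear in the survey but not in categories
bad_values = set(survey["safety"]).difference(categories["safety"])
bad_rows = survey["safety"].isin(bad_values)

# Inner join: keep only the rows whose answer is an allowed value
clean_survey = survey[~bad_rows]

# Same split with merge: indicator=True adds a '_merge' column,
# where 'left_only' marks survey rows with no match in categories
merged = survey.merge(categories, on="safety", how="left", indicator=True)
anti = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(clean_survey)
print(anti)
```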

In [34]:
airlines = pd.read_csv('../Datasets/airlines_final.csv')
airlines.head()

Unnamed: 0.1,Unnamed: 0,id,day,airline,destination,dest_region,dest_size,boarding_area,dept_time,wait_min,cleanliness,safety,satisfaction
0,0,1351,Tuesday,UNITED INTL,KANSAI,Asia,Hub,Gates 91-102,2018-12-31,115.0,Clean,Neutral,Very satisfied
1,1,373,Friday,ALASKA,SAN JOSE DEL CABO,Canada/Mexico,Small,Gates 50-59,2018-12-31,135.0,Clean,Very safe,Very satisfied
2,2,2820,Thursday,DELTA,LOS ANGELES,West US,Hub,Gates 40-48,2018-12-31,70.0,Average,Somewhat safe,Neutral
3,3,1157,Tuesday,SOUTHWEST,LOS ANGELES,West US,Hub,Gates 20-39,2018-12-31,190.0,Clean,Very safe,Somewhat satsified
4,4,2992,Wednesday,AMERICAN,MIAMI,East US,Hub,Gates 50-59,2018-12-31,559.0,Somewhat clean,Very safe,Somewhat satsified


In [35]:
#Creating the categories dataframe
# Note: 'Very Safe' and 'neutral' differ in capitalization from 'Very safe' and
# 'Neutral' in airlines, so those survey answers show up as inconsistencies below
data = {'cleanliness' : ['Clean', 'Average', 'Somewhat clean', 'Somewhat dirty', 'Dirty'],
        'safety': ['Neutral', 'Very Safe', 'Somewhat safe', 'Very unsafe', 'Somewhat unsafe'],
        'satisfaction': ['Very satisfied', 'neutral', 'Somewhat satisfied', 'Somewhat unsatisfied', 'Very unsatisfied']
       }

categories = pd.DataFrame(data)
# Print categories DataFrame
print(categories)

# Print unique values of survey columns in airlines
print('Cleanliness: ', airlines['cleanliness'].unique(), "\n")
print('Safety: ', airlines['safety'].unique(), "\n")
print('Satisfaction: ', airlines['satisfaction'].unique(), "\n")

      cleanliness           safety          satisfaction
0           Clean          Neutral        Very satisfied
1         Average        Very Safe               neutral
2  Somewhat clean    Somewhat safe    Somewhat satisfied
3  Somewhat dirty      Very unsafe  Somewhat unsatisfied
4           Dirty  Somewhat unsafe      Very unsatisfied
Cleanliness:  ['Clean' 'Average' 'Somewhat clean' 'Somewhat dirty' 'Dirty'] 

Safety:  ['Neutral' 'Very safe' 'Somewhat safe' 'Very unsafe' 'Somewhat unsafe'] 

Satisfaction:  ['Very satisfied' 'Neutral' 'Somewhat satsified' 'Somewhat unsatisfied'
 'Very unsatisfied'] 



In [36]:
airlines.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2477 entries, 0 to 2476
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Unnamed: 0     2477 non-null   int64  
 1   id             2477 non-null   int64  
 2   day            2477 non-null   object 
 3   airline        2477 non-null   object 
 4   destination    2477 non-null   object 
 5   dest_region    2477 non-null   object 
 6   dest_size      2477 non-null   object 
 7   boarding_area  2477 non-null   object 
 8   dept_time      2477 non-null   object 
 9   wait_min       2477 non-null   float64
 10  cleanliness    2477 non-null   object 
 11  safety         2477 non-null   object 
 12  satisfaction   2477 non-null   object 
dtypes: float64(1), int64(2), object(10)
memory usage: 251.7+ KB


In [37]:
# Find values in the cleanliness, safety, and satisfaction columns that are not in categories
inconsistent_categories = set(airlines['cleanliness']).difference(categories['cleanliness'])
print('Inconsistencies in the cleanliness column:', inconsistent_categories)

inconsistent_categories1 = set(airlines['safety']).difference(categories['safety'])
print('Inconsistencies in the safety column:', inconsistent_categories1)

inconsistent_categories2 = set(airlines['satisfaction']).difference(categories['satisfaction'])
print('Inconsistencies in the satisfaction column:', inconsistent_categories2)

Inconsistencies in the cleanliness column: set()
Inconsistencies in the safety column: {'Very safe'}
Inconsistencies in the satisfaction column: {'Neutral', 'Somewhat satsified'}


In [38]:
# Find the cleanliness category in airlines not in categories
cat_clean = set(airlines['cleanliness']).difference(categories['cleanliness'])
print(cat_clean)

# Find rows with that category
cat_clean_rows = airlines['cleanliness'].isin(cat_clean)

# Print rows with inconsistent category
print(airlines[cat_clean_rows])

# Print rows with consistent categories only
print(airlines[~cat_clean_rows])


set()
Empty DataFrame
Columns: [Unnamed: 0, id, day, airline, destination, dest_region, dest_size, boarding_area, dept_time, wait_min, cleanliness, safety, satisfaction]
Index: []


In [43]:
#Find the safety categories in airlines not in categories
safety_clean = set(airlines['safety']).difference(categories['safety'])

#Find rows with that category
safety_clean_rows = airlines['safety'].isin(safety_clean)

#Print rows with inconsistent category
print(airlines[safety_clean_rows])

#Print rows with consistent categories only
print(airlines[~safety_clean_rows])

      Unnamed: 0    id        day        airline        destination  \
1              1   373     Friday         ALASKA  SAN JOSE DEL CABO   
3              3  1157    Tuesday      SOUTHWEST        LOS ANGELES   
4              4  2992  Wednesday       AMERICAN              MIAMI   
5              5   634   Thursday         ALASKA             NEWARK   
6              6  2578   Saturday        JETBLUE         LONG BEACH   
...          ...   ...        ...            ...                ...   
2466        2798  3099     Sunday         ALASKA             NEWARK   
2470        2802   394     Friday         ALASKA        LOS ANGELES   
2473        2805  2222   Thursday      SOUTHWEST            PHOENIX   
2474        2806  2684     Friday         UNITED            ORLANDO   
2476        2808  2162   Saturday  CHINA EASTERN            QINGDAO   

        dest_region dest_size boarding_area   dept_time  wait_min  \
1     Canada/Mexico     Small   Gates 50-59  2018-12-31     135.0   
3        

### Inconsistent categories
We'll be revisiting the airlines DataFrame from the previous lesson.

As a reminder, the DataFrame contains flight metadata such as the airline, the destination, and waiting times, as well as answers to key questions regarding cleanliness, safety, and satisfaction at San Francisco Airport.

In this exercise, you will examine two categorical columns from this DataFrame, dest_region and dest_size, assess how to address their inconsistencies, and make sure that they are cleaned and ready for analysis.

In [45]:
# Print unique values of both columns
print(airlines['dest_region'].unique(), '\n')
print(airlines['dest_size'].unique())

['Asia' 'Canada/Mexico' 'West US' 'East US' 'Midwest US' 'EAST US'
 'Middle East' 'Europe' 'eur' 'Central/South America'
 'Australia/New Zealand' 'middle east'] 

['Hub' 'Small' '    Hub' 'Medium' 'Large' 'Hub     ' '    Small'
 'Medium     ' '    Medium' 'Small     ' '    Large' 'Large     ']
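The output above shows two kinds of inconsistency: capitalization in dest_region ('EAST US' vs 'East US', 'middle east' vs 'Middle East') and stray leading or trailing whitespace in dest_size. One possible cleanup is sketched below on a small hypothetical sample of those values; mapping 'eur' to 'europe' is our assumption about what the abbreviation stands for.

```python
import pandas as pd

# Hypothetical sample of the raw values printed above (not the full column)
airlines = pd.DataFrame({
    "dest_region": ["Asia", "EAST US", "East US", "eur", "Europe", "middle east"],
    "dest_size":   ["Hub", "    Hub", "Small     ", "Medium", "    Large", "Large     "],
})

# Capitalization: lower-case everything so 'EAST US' and 'East US' match
airlines["dest_region"] = airlines["dest_region"].str.lower()

# Collapse the abbreviation 'eur' into 'europe' (assumed meaning)
airlines["dest_region"] = airlines["dest_region"].replace({"eur": "europe"})

# Whitespace: strip leading and trailing blanks from the size labels
airlines["dest_size"] = airlines["dest_size"].str.strip()

print(sorted(airlines["dest_region"].unique()))
print(sorted(airlines["dest_size"].unique()))
```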
