# `clean_date()`: Clean and validate date strings

## Introduction

The function `clean_date()` cleans a column of date strings and standardizes them in a desired format. The function `validate_date()` validates either a single date string or a column of date strings, returning "cleaned" if the value is valid at the first stage and "unknown" otherwise. Here, "first stage" means that the format of the input is correct. An out-of-range component, such as a minute value of 70, is not detected at the first stage; it is only recognized while `clean_date()` runs, and `clean_date()` does not repair such values.
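
The difference between a format check and a range check can be illustrated with a small standalone sketch. It is not the library's implementation: the helper names `looks_like_time` and `is_real_time` and the regular expression are assumptions made for illustration, and only the Python standard library is used.

```python
import re
from datetime import datetime

# Assumed "first stage": only the shape of the string is inspected.
TIME_SHAPE = re.compile(r"^\d{1,2}:\d{2}(:\d{2})?$")


def looks_like_time(text: str) -> bool:
    """Return True when the text has the shape of a clock time."""
    return TIME_SHAPE.match(text) is not None


def is_real_time(text: str) -> bool:
    """Return True only when every component is also within range."""
    fmt = "%H:%M:%S" if text.count(":") == 2 else "%H:%M"
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False


print(looks_like_time("10:70"), is_real_time("10:70"))  # True False
print(looks_like_time("10:36"), is_real_time("10:36"))  # True True
```

A value such as `10:70` passes the shape check but fails the range check, which mirrors why a value can be reported as valid at the first stage and still be rejected later.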

Currently, many flexible date formats are supported as valid input, for example:

* `1996.07.10 AD at 15:08:56 PDT`
* `Tuesday, April 12, 1952 AD 3:30:42pm PST`
* `2003 Sep 25`
* `12:00am`
* `Thu Sep 25 10:36:28 2003`

Various delimiters between the digits are also allowed: 
`[" ", ".", ",", ";", "-", "/", "'", "st", "nd", "rd", "th", "at", "on", "and", "ad", "AD", "of"]`

Dates can be converted to a user-specified format via the `target_format` parameter. Many flexible target formats are supported, for example:

* `YYYY-MM-DD`
* `yyyy.MM.dd AD at HH:mm:ss Z`
* `EEE, d MMM yyyy HH:mm:ss Z`

Users can also specify `origin_timezone` and `target_timezone`, such as `PDT` or `GMT`. When the date is formatted, it is converted from the origin timezone to the target timezone.

Missing date or time components are handled with the `fix_empty` parameter:

* `auto_minimum` (default):
    * For hours, minutes and seconds, fill them with zeros
    * For years, months and days, fill them with the minimum value
* `empty`: leave the missing component as it is
* `auto_nearest`:
    * For hours, minutes and seconds, fill them with the nearest value
    * For years, months and days, fill them with the nearest value

After cleaning, a **report** is printed that provides the following information:

* How many values were cleaned (the value had to be transformed)
* How many values could not be cleaned
* A summary of the data: how many values are in the correct format, and how many values are null
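
The percentages in the report are plain ratios over the number of rows. The sketch below reproduces the figures printed for the 39-row example dataset used later in this document; the helper name `percentage` is hypothetical and is not part of the library.

```python
def percentage(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals."""
    return f"{count / total:.2%}"


total_rows = 39      # rows in the example column
cleaned = 35         # values that were transformed
unparsed = 2         # values that could not be parsed
null_values = 4      # null values in the result

print(f"{cleaned} values cleaned ({percentage(cleaned, total_rows)})")
print(f"{unparsed} values unable to be parsed ({percentage(unparsed, total_rows)})")
print(f"{null_values} null values ({percentage(null_values, total_rows)})")
# 35 values cleaned (89.74%)
# 2 values unable to be parsed (5.13%)
# 4 null values (10.26%)
```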

The following sections demonstrate the functionality of `clean_date()` and `validate_date()`. 

### An example dirty dataset

In [14]:
import pandas as pd
import numpy as np
df = pd.DataFrame({"date":
                   ['1996.07.10 AD at 15:08:56 PDT',
                    'Thu Sep 25 10:36:28 2003',
                    'Thu Sep 25 10:36:28 BRST 2003',
                    '2003 10:36:28 BRST 25 Sep Thu',
                    'Thu Sep 25 10:36:28 2003',
                    'Thu 10:36:28',
                    'Thu 10:36',
                    '10:36',
                    'Thu Sep 25 2003',
                    'Sep 25 2003',
                    'Sep 2003',
                    'Sep',
                    '2003',
                    '2003-09-25',
                    '2003-Sep-25',
                    '25-Sep-2003',
                    'Sep-25-2003',
                    '09-25-2003',
                    '25-09-2003',
                    '10-09-2003',
                    '10-09-03',
                    '2003.Sep.25',
                    '2003/09/25',
                    '2003 Sep 25',
                    '2003 09 25',
                    '10pm',
                    '12:00am',
                    'Sep 03',
                    'Sep of 03',
                    'Wed, July 10, 96',
                    '1996.07.10 AD at 15:08:56 PDT',
                    'Tuesday, April 12, 1952 AD 3:30:42pm PST',
                    'November 5, 1994, 8:15:30 am EST',
                    '3rd of May 2001',
                    '5:50 AM on June 13, 1990', 
                    'NULL',
                    'nan',
                    'I\'m a little cat',
                    'This is Sep.']})
df

Unnamed: 0,date
0,1996.07.10 AD at 15:08:56 PDT
1,Thu Sep 25 10:36:28 2003
2,Thu Sep 25 10:36:28 BRST 2003
3,2003 10:36:28 BRST 25 Sep Thu
4,Thu Sep 25 10:36:28 2003
5,Thu 10:36:28
6,Thu 10:36
7,10:36
8,Thu Sep 25 2003
9,Sep 25 2003


## 1. Default `clean_date()`

By default, the `target_format` parameter is set to "YYYY-MM-DD hh:mm:ss", the `origin_timezone` parameter is set to "UTC", the `fix_empty` parameter is set to "auto_minimum" and the `show_report` parameter is set to `True`. The `target_timezone` parameter is not specified.

In [15]:
from dataprep.clean import clean_date
clean_date(df, 'date')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10 15:08:56
1,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25 10:36:28
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25 10:36:28
4,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
5,Thu 10:36:28,2000-01-01 10:36:28
6,Thu 10:36,2000-01-01 10:36:00
7,10:36,2000-01-01 10:36:00
8,Thu Sep 25 2003,2003-09-25 00:00:00
9,Sep 25 2003,2003-09-25 00:00:00


## 2. `target_format` parameter
This section demonstrates some valid target formats. The function supports very flexible target formats, such as `YYYY-MM-DD` and `yyyy.MM.dd AD at HH:mm:ss z`. Users only need to specify tokens standing for year, month, day, hour, minute and second, joined by valid separators.

The tokens we support are listed in the following table.

|  Component | Token |
|  ----      | ----  |
|  Year      | `"yyyy", "yy", "YYYY", "YY", "Y", "y"` |
|  Month     | `"MM", "M", "MMM", "MMMMM"` |
|  Day       | `"dd", "d", "DD", "D"` |
|  Hour      | `"hh", "h", "HH", "H"` |
|  Minute    | `"mm", "m"` |
|  Second    | `"ss", "s", "SS", "S"` |
|  Weekday   | `"eee", "EEE", "eeeee", "EEEEE"` |
|  Timezone  | `"Z", "z"` |

The separators we support are listed here: `[" ", ".", ",", ";", "-", "/", "'", "st", "nd", "rd", "th", "at", "on", "and", "ad", "AD", "of"]`
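
To make the idea of a token-based format concrete, the following sketch translates a small subset of the tokens into Python `strftime` directives. It is an illustration under stated assumptions, not how `clean_date()` is implemented: the mapping `TOKEN_TO_STRFTIME` and the helper `to_strftime` are hypothetical, `hh` is treated as a 24-hour value because the sample outputs in this document show it that way, and month-name, weekday and timezone tokens are left out.

```python
import re
from datetime import datetime

# Hypothetical mapping for a subset of the tokens in the table above.
TOKEN_TO_STRFTIME = {
    "yyyy": "%Y", "YYYY": "%Y",
    "MM": "%m",
    "dd": "%d", "DD": "%d",
    "HH": "%H", "hh": "%H",
    "mm": "%M",
    "ss": "%S",
}

# Try longer tokens first so that "yyyy" is matched as a whole.
_TOKEN_RE = re.compile(
    "|".join(sorted(TOKEN_TO_STRFTIME, key=len, reverse=True))
)


def to_strftime(target_format: str) -> str:
    """Replace each known token with its strftime directive."""
    return _TOKEN_RE.sub(lambda m: TOKEN_TO_STRFTIME[m.group(0)], target_format)


moment = datetime(2003, 9, 25, 10, 36, 28)
print(moment.strftime(to_strftime("YYYY-MM-DD")))                 # 2003-09-25
print(moment.strftime(to_strftime("yyyy.MM.dd AD at HH:mm:ss")))  # 2003.09.25 AD at 10:36:28
```

Separators and literal words such as `AD` and `at` pass through unchanged because they contain none of the mapped tokens.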

### Example format: `YYYY-MM-DD`

In [16]:
clean_date(df, 'date', target_format='YYYY-MM-DD')

Date Cleaning Report:
	34 values cleaned (87.18%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10
1,Thu Sep 25 10:36:28 2003,2003-09-25
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25
4,Thu Sep 25 10:36:28 2003,2003-09-25
5,Thu 10:36:28,2000-01-01
6,Thu 10:36,2000-01-01
7,10:36,2000-01-01
8,Thu Sep 25 2003,2003-09-25
9,Sep 25 2003,2003-09-25


### Example format: `yyyy.MM.dd AD at HH:mm:ss Z`

In [17]:
clean_date(df, 'date', target_format='yyyy.MM.dd AD at HH:mm:ss Z')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996.07.10 AD at 15:08:56 UTC+00:00
1,Thu Sep 25 10:36:28 2003,2003.09.25 AD at 10:36:28 UTC+00:00
2,Thu Sep 25 10:36:28 BRST 2003,2003.09.25 AD at 10:36:28 UTC+00:00
3,2003 10:36:28 BRST 25 Sep Thu,2003.09.25 AD at 10:36:28 UTC+00:00
4,Thu Sep 25 10:36:28 2003,2003.09.25 AD at 10:36:28 UTC+00:00
5,Thu 10:36:28,2000.01.01 AD at 10:36:28 UTC+00:00
6,Thu 10:36,2000.01.01 AD at 10:36:00 UTC+00:00
7,10:36,2000.01.01 AD at 10:36:00 UTC+00:00
8,Thu Sep 25 2003,2003.09.25 AD at 00:00:00 UTC+00:00
9,Sep 25 2003,2003.09.25 AD at 00:00:00 UTC+00:00


### Example format: `yyyy.MM.dd AD at HH:mm:ss z`

In [18]:
clean_date(df, 'date', target_format='yyyy.MM.dd AD at HH:mm:ss z')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996.07.10 AD at 15:08:56 UTC
1,Thu Sep 25 10:36:28 2003,2003.09.25 AD at 10:36:28 UTC
2,Thu Sep 25 10:36:28 BRST 2003,2003.09.25 AD at 10:36:28 UTC
3,2003 10:36:28 BRST 25 Sep Thu,2003.09.25 AD at 10:36:28 UTC
4,Thu Sep 25 10:36:28 2003,2003.09.25 AD at 10:36:28 UTC
5,Thu 10:36:28,2000.01.01 AD at 10:36:28 UTC
6,Thu 10:36,2000.01.01 AD at 10:36:00 UTC
7,10:36,2000.01.01 AD at 10:36:00 UTC
8,Thu Sep 25 2003,2003.09.25 AD at 00:00:00 UTC
9,Sep 25 2003,2003.09.25 AD at 00:00:00 UTC


### Example format: `EEE, d MMM yyyy HH:mm:ss Z`

In [19]:
clean_date(df, 'date', target_format='EEE, d MMM yyyy HH:mm:ss Z')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,"Wed, 10 Jul 1996 15:08:56 UTC+00:00"
1,Thu Sep 25 10:36:28 2003,"Thu, 25 Sep 2003 10:36:28 UTC+00:00"
2,Thu Sep 25 10:36:28 BRST 2003,"Thu, 25 Sep 2003 10:36:28 UTC+00:00"
3,2003 10:36:28 BRST 25 Sep Thu,"Thu, 25 Sep 2003 10:36:28 UTC+00:00"
4,Thu Sep 25 10:36:28 2003,"Thu, 25 Sep 2003 10:36:28 UTC+00:00"
5,Thu 10:36:28,"Thu, 1 Jan 2000 10:36:28 UTC+00:00"
6,Thu 10:36,"Thu, 1 Jan 2000 10:36:00 UTC+00:00"
7,10:36,"Sat, 1 Jan 2000 10:36:00 UTC+00:00"
8,Thu Sep 25 2003,"Thu, 25 Sep 2003 00:00:00 UTC+00:00"
9,Sep 25 2003,"Thu, 25 Sep 2003 00:00:00 UTC+00:00"


## 3. `origin_timezone` and `target_timezone` parameter
This section demonstrates valid origin and target timezones. `origin_timezone` is the user-specified timezone of the input data. `target_timezone` is the user-specified timezone of the output data.

The accepted values of `origin_timezone` and `target_timezone` fall into two groups:
* All timezones in `pytz.all_timezones`
* Abbreviations for commonly used timezones, listed in the following table

|  Timezone Name | UTC offset (hours) |
|  ----      | ----  |
|UTC             | 0          |
|ACT| -5|
|ADT|-3|
|AEDT|11|
|AEST|10|
|AKDT|-8|
|AKST|-9|
|AMST|-3|
|AMT|-4|
|ART|-3|
|ArabiaST|3|
|AtlanticST|-4|
|AWST|8|
|AZOST|0|
|AZOT|0|
|BOT|-4|
|BRST|-2|
|BRT|-3|
|BST|1|
|BTT|6|
|CAT|2|
|CDT|-5|
|CEST|2|
|CET|1|
|CHOST|9|
|CHOT|8|
|CHUT|10|
|CKT|-10|
|CLST|-3|
|CLT|-4|
|CentralST|-6|
|ChinaST|8|
|CubaST|-5|
|ChST|10|
|EASST|-5|
|EAST|-6|
|EAT|3|
|ECT|-5|
|EDT|-4|
|EEST|3|
|EET|2|
|EST|-5|
|FKST|-3|
|GFT|-3|
|GILT|12|
|GMT|0|
|GST|4|
|HKT|8|
|HST|-10|
|ICT|7|
|IDT|3|
|IrishST|1|
|IsraelST|2|
|JST|9|
|KOST|11|
|LINT|4|
|MDT|-6|
|MHT|12|
|MSK|3|
|MST|-7|
|MYT|8|
|NUT|-11|
|NZDT|13|
|NZST|12|
|PDT|-7|
|PET|-5|
|PGT|10|
|PHT|8|
|PONT|11|
|PST|-8|
|SAST|2|
|SBT|11|
|SGT|8|
|SRT|-3|
|SST|-11|
|TAHT|-10|
|TLT|9|
|TVT|12|
|ULAST|9|
|ULAT|8|
|UYST|-2|
|UYT|-3|
|VET|-4|
|WAST|2|
|WAT|1|
|WEST|1|
|WET|0|
|WIB|7|
|WIT|9|
|WITA|8|
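
The conversion between two abbreviations amounts to shifting by the difference of their UTC offsets. The sketch below shows this with the standard library only; the dictionary `UTC_OFFSETS` copies a few rows from the table above, and the helper `convert` is a hypothetical name, not a function of the library.

```python
from datetime import datetime, timedelta, timezone

# A few rows of the abbreviation table, as whole-hour UTC offsets.
UTC_OFFSETS = {"UTC": 0, "GMT": 0, "PDT": -7, "PST": -8, "EST": -5, "ChinaST": 8}


def convert(moment: datetime, origin: str, target: str) -> datetime:
    """Interpret a naive datetime in `origin` and express it in `target`."""
    origin_tz = timezone(timedelta(hours=UTC_OFFSETS[origin]))
    target_tz = timezone(timedelta(hours=UTC_OFFSETS[target]))
    return moment.replace(tzinfo=origin_tz).astimezone(target_tz)


moment = datetime(1996, 7, 10, 15, 8, 56)
fmt = "%Y.%m.%d AD at %H:%M:%S"
print(convert(moment, "PDT", "ChinaST").strftime(fmt))  # 1996.07.11 AD at 06:08:56
print(convert(moment, "PST", "GMT").strftime(fmt))      # 1996.07.10 AD at 23:08:56
```

Fixed offsets ignore daylight-saving transitions, so this approach only suits abbreviations that denote a single offset.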

### Example: `PDT` to `ChinaST`
`origin_timezone`: `PDT`

`target_timezone`: `ChinaST`

`target_format`: `yyyy.MM.dd AD at HH:mm:ss Z`

In [20]:
clean_date(df, 'date', origin_timezone='PDT', target_timezone='ChinaST',target_format='yyyy.MM.dd AD at HH:mm:ss Z')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996.07.11 AD at 06:08:56 UTC+08:00
1,Thu Sep 25 10:36:28 2003,2003.09.26 AD at 01:36:28 UTC+08:00
2,Thu Sep 25 10:36:28 BRST 2003,2003.09.26 AD at 01:36:28 UTC+08:00
3,2003 10:36:28 BRST 25 Sep Thu,2003.09.26 AD at 01:36:28 UTC+08:00
4,Thu Sep 25 10:36:28 2003,2003.09.26 AD at 01:36:28 UTC+08:00
5,Thu 10:36:28,2000.01.02 AD at 01:36:28 UTC+08:00
6,Thu 10:36,2000.01.02 AD at 01:36:00 UTC+08:00
7,10:36,2000.01.02 AD at 01:36:00 UTC+08:00
8,Thu Sep 25 2003,2003.09.25 AD at 15:00:00 UTC+08:00
9,Sep 25 2003,2003.09.25 AD at 15:00:00 UTC+08:00


### Example: `EST` to `PDT`
`origin_timezone`: `EST`

`target_timezone`: `PDT`

`target_format`: `yyyy.MM.dd AD at HH:mm:ss Z`

In [21]:
clean_date(df, 'date', origin_timezone='EST', target_timezone='PDT',target_format='yyyy.MM.dd AD at HH:mm:ss Z')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996.07.12 AD at 03:08:56 UTC-07:00
1,Thu Sep 25 10:36:28 2003,2003.09.26 AD at 22:36:28 UTC-07:00
2,Thu Sep 25 10:36:28 BRST 2003,2003.09.26 AD at 22:36:28 UTC-07:00
3,2003 10:36:28 BRST 25 Sep Thu,2003.09.26 AD at 22:36:28 UTC-07:00
4,Thu Sep 25 10:36:28 2003,2003.09.26 AD at 22:36:28 UTC-07:00
5,Thu 10:36:28,2000.01.02 AD at 22:36:28 UTC-07:00
6,Thu 10:36,2000.01.02 AD at 22:36:00 UTC-07:00
7,10:36,2000.01.02 AD at 22:36:00 UTC-07:00
8,Thu Sep 25 2003,2003.09.26 AD at 12:00:00 UTC-07:00
9,Sep 25 2003,2003.09.26 AD at 12:00:00 UTC-07:00


### Example: `PST` to `GMT`
`origin_timezone`: `PST`

`target_timezone`: `GMT`

`target_format`: `yyyy.MM.dd AD at HH:mm:ss Z`

In [22]:
clean_date(df, 'date', origin_timezone='PST', target_timezone='GMT',target_format='yyyy.MM.dd AD at HH:mm:ss Z')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996.07.10 AD at 23:08:56 UTC+00:00
1,Thu Sep 25 10:36:28 2003,2003.09.25 AD at 18:36:28 UTC+00:00
2,Thu Sep 25 10:36:28 BRST 2003,2003.09.25 AD at 18:36:28 UTC+00:00
3,2003 10:36:28 BRST 25 Sep Thu,2003.09.25 AD at 18:36:28 UTC+00:00
4,Thu Sep 25 10:36:28 2003,2003.09.25 AD at 18:36:28 UTC+00:00
5,Thu 10:36:28,2000.01.01 AD at 18:36:28 UTC+00:00
6,Thu 10:36,2000.01.01 AD at 18:36:00 UTC+00:00
7,10:36,2000.01.01 AD at 18:36:00 UTC+00:00
8,Thu Sep 25 2003,2003.09.25 AD at 08:00:00 UTC+00:00
9,Sep 25 2003,2003.09.25 AD at 08:00:00 UTC+00:00


## 4. `fix_empty` parameter
This section demonstrates the valid options of the `fix_empty` parameter. The user can choose how missing components are filled from the set {'empty', 'auto_nearest', 'auto_minimum'}. The **default `fix_empty`** is `'auto_minimum'`.

### auto_minimum
* For hours, minutes and seconds, fill them with zeros
* For years, months and days, fill them with the minimum value
    * Min value of year: 2000
    * Min value of month: 1
    * Min value of day: 1
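
The rule above can be sketched in a few lines. This is a standalone illustration, assuming that parsing has already produced the components that were present; the names `MINIMUMS` and `fill_minimum` are hypothetical and not part of the library.

```python
from datetime import datetime
from typing import Optional

# Minimum values listed above; time components default to zero.
MINIMUMS = {"year": 2000, "month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0}


def fill_minimum(**parts: Optional[int]) -> datetime:
    """Build a datetime, replacing missing (absent or None) parts with minimums."""
    filled = {
        name: parts[name] if parts.get(name) is not None else default
        for name, default in MINIMUMS.items()
    }
    return datetime(**filled)


print(fill_minimum(hour=10, minute=36))          # 2000-01-01 10:36:00
print(fill_minimum(year=2003, month=9, day=25))  # 2003-09-25 00:00:00
```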

In [23]:
clean_date(df, 'date', fix_empty='auto_minimum')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10 15:08:56
1,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25 10:36:28
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25 10:36:28
4,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
5,Thu 10:36:28,2000-01-01 10:36:28
6,Thu 10:36,2000-01-01 10:36:00
7,10:36,2000-01-01 10:36:00
8,Thu Sep 25 2003,2003-09-25 00:00:00
9,Sep 25 2003,2003-09-25 00:00:00


### empty
Leave the missing component as it is.
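
In the sample output for this option, a missing component appears as a run of dashes of the same width as the field. The sketch below reproduces that layout for the default target format; the helper `render_with_gaps` is a hypothetical name, and the dash convention is inferred from the sample output rather than from the library's source.

```python
from typing import Optional


def render_with_gaps(
    year: Optional[int] = None,
    month: Optional[int] = None,
    day: Optional[int] = None,
    hour: Optional[int] = None,
    minute: Optional[int] = None,
    second: Optional[int] = None,
) -> str:
    """Render YYYY-MM-DD hh:mm:ss, using dashes for missing components."""

    def part(value: Optional[int], width: int) -> str:
        # A missing value becomes dashes; a present one is zero-padded.
        return "-" * width if value is None else f"{value:0{width}d}"

    date = "-".join([part(year, 4), part(month, 2), part(day, 2)])
    time = ":".join([part(hour, 2), part(minute, 2), part(second, 2)])
    return f"{date} {time}"


print(render_with_gaps(hour=10, minute=36))          # ---------- 10:36:--
print(render_with_gaps(year=2003, month=9, day=25))  # 2003-09-25 --:--:--
```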

In [24]:
clean_date(df, 'date', fix_empty='empty')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10 15:08:56
1,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25 10:36:28
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25 10:36:28
4,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
5,Thu 10:36:28,---------- 10:36:28
6,Thu 10:36,---------- 10:36:--
7,10:36,---------- 10:36:--
8,Thu Sep 25 2003,2003-09-25 --:--:--
9,Sep 25 2003,2003-09-25 --:--:--


### auto_nearest
* For hours, minutes and seconds, fill them with the nearest time value
* For years, months and days, fill them with the nearest date
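
The sample output for this option suggests that missing components are taken from the moment at which the cell was run. The sketch below works under that assumption and takes the reference moment as an explicit argument so that the result is reproducible; `fill_from_reference` is a hypothetical helper, not a library function.

```python
from datetime import datetime

FIELDS = ("year", "month", "day", "hour", "minute", "second")


def fill_from_reference(reference: datetime, **parts) -> datetime:
    """Fill missing (absent or None) components from a reference datetime."""
    filled = {
        name: parts[name] if parts.get(name) is not None else getattr(reference, name)
        for name in FIELDS
    }
    return datetime(**filled)


# Reference moment chosen to match the sample output in this section.
reference = datetime(2021, 2, 2, 22, 49, 41)
print(fill_from_reference(reference, hour=10, minute=36))          # 2021-02-02 10:36:41
print(fill_from_reference(reference, year=2003, month=9, day=25))  # 2003-09-25 22:49:41
```

Mixing parsed and reference components can produce an impossible date, for example day 31 combined with a 30-day month, in which case `datetime` raises `ValueError`.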

In [25]:
clean_date(df, 'date', fix_empty='auto_nearest')

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10 15:08:56
1,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25 10:36:28
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25 10:36:28
4,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
5,Thu 10:36:28,2021-02-02 10:36:28
6,Thu 10:36,2021-02-02 10:36:41
7,10:36,2021-02-02 10:36:41
8,Thu Sep 25 2003,2003-09-25 22:49:41
9,Sep 25 2003,2003-09-25 22:49:41


## 5. `show_report` parameter
If `show_report = True`, a report is generated that contains:

* How many values were cleaned
* How many values could not be cleaned (due to their invalid format)
* How many values are in the correct format
* How many null values there are

If `show_report = False`, no report is generated.

In [26]:
clean_date(df, 'date', show_report=True)

Date Cleaning Report:
	35 values cleaned (89.74%)
	2 values unable to be parsed (5.13%), set to NaN
Result contains 35 (89.74%) values in the correct format and 4 null values (10.26%)


Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10 15:08:56
1,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25 10:36:28
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25 10:36:28
4,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
5,Thu 10:36:28,2000-01-01 10:36:28
6,Thu 10:36,2000-01-01 10:36:00
7,10:36,2000-01-01 10:36:00
8,Thu Sep 25 2003,2003-09-25 00:00:00
9,Sep 25 2003,2003-09-25 00:00:00


In [10]:
clean_date(df, 'date', show_report=False)

Unnamed: 0,date,date_clean
0,1996.07.10 AD at 15:08:56 PDT,1996-07-10 15:08:56
1,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
2,Thu Sep 25 10:36:28 BRST 2003,2003-09-25 10:36:28
3,2003 10:36:28 BRST 25 Sep Thu,2003-09-25 10:36:28
4,Thu Sep 25 10:36:28 2003,2003-09-25 10:36:28
5,Thu 10:36:28,2000-01-01 10:36:28
6,Thu 10:36,2000-01-01 10:36:00
7,10:36,2000-01-01 10:36:00
8,Thu Sep 25 2003,2003-09-25 00:00:00
9,Sep 25 2003,2003-09-25 00:00:00


## 6. `validate_date()`

`validate_date()` returns `cleaned` when the input has a valid date format, meaning that the format is correct at the first stage. If the input is a null value, the function returns `null`. Otherwise, the function returns `unknown`.
Valid input types are the same as for `clean_date()`.

In [8]:
from dataprep.clean import validate_date
print(validate_date("Novvvvvvvvember 5, 1994, 8:15:30 am EST hahaha"))
print(validate_date("1994, 8:15:30"))
print(validate_date("Hello."))

unknown
cleaned
unknown


In [13]:
df = pd.DataFrame({"messy_date":
                   ["T, Ap 12, 1952 AD 3:30:42p", "5:50 AM on June 13, 1990", "3rd of May 2001", "55/23/2014",
                    "10pm", "10p", "2003-Sep-25", 
                    "Sepppppp", "23 4 1962", "2003 10:36:28 BRST 25 Sep Thu", 
                    "hello", np.nan, "NULL"]
                  })
df["valid"] = validate_date(df["messy_date"])
df

Unnamed: 0,messy_date,valid
0,"T, Ap 12, 1952 AD 3:30:42p",unknown
1,"5:50 AM on June 13, 1990",cleaned
2,3rd of May 2001,cleaned
3,55/23/2014,cleaned
4,10pm,cleaned
5,10p,cleaned
6,2003-Sep-25,cleaned
7,Sepppppp,unknown
8,23 4 1962,cleaned
9,2003 10:36:28 BRST 25 Sep Thu,cleaned
