Source : https://www.youtube.com/watch?v=igWjq3jtLYI&list=PLeo1K3hjS3uvMADnFjV1yg6E5nVU4kOob&index=4
        
A common problem in data analysis is the lack of a uniform format in the input data.
Example : 5th Jan 2019 can be written as
    2019-01-05
    Jan 5, 2019
    01/05/2019
    2019.01.05
    2019/01/05
    etc.
    
We can use the pandas to_datetime function to convert all of these to the same format.

In [1]:
import pandas as pd

In [7]:
dates = ['2019-01-05', '2019-05-01', 'Jan 5, 2019', '01/05/2019', '2019.01.05', '2019/01/05', '20190105', '20190501']
dates

['2019-01-05',
 '2019-05-01',
 'Jan 5, 2019',
 '01/05/2019',
 '2019.01.05',
 '2019/01/05',
 '20190105',
 '20190501']

In [8]:
pd.to_datetime(dates)

DatetimeIndex(['2019-01-05', '2019-05-01', '2019-01-05', '2019-01-05',
               '2019-01-05', '2019-01-05', '2019-01-05', '2019-05-01'],
              dtype='datetime64[ns]', freq=None)
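The output above reflects the pandas version used when this notebook was run. As a hedged note, pandas 2.0 and later infer one format from the first element and may reject a list that mixes formats unless format='mixed' is passed. A minimal sketch that does not depend on that behaviour, assuming it is acceptable to parse each string on its own (slower on large data):

```python
import pandas as pd

dates = ['2019-01-05', '2019-05-01', 'Jan 5, 2019', '01/05/2019']

# Parse each string separately so every element gets its own format inference.
# On pandas >= 2.0, pd.to_datetime(dates, format='mixed') is an alternative.
parsed = pd.DatetimeIndex([pd.to_datetime(d) for d in dates])
print(parsed)
```

The shorter list here is only for illustration; the same pattern applies to the full list used in the cell above.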

In [9]:
dates = ['2019-01-05 2:30:00 PM', '2019-05-01', 'Jan 5, 2019 14:30:00', '01/05/2019', '2019.01.05', '2019/01/05', '20190105', '20190501']
pd.to_datetime(dates)

DatetimeIndex(['2019-01-05 14:30:00', '2019-05-01 00:00:00',
               '2019-01-05 14:30:00', '2019-01-05 00:00:00',
               '2019-01-05 00:00:00', '2019-01-05 00:00:00',
               '2019-01-05 00:00:00', '2019-05-01 00:00:00'],
              dtype='datetime64[ns]', freq=None)

# In Europe the date format is dd/mm/yyyy, whereas in the US it is mm/dd/yyyy.
How do we handle this?

In [10]:
# Let's say a date is written in EU format: '5/1/2017' means 5 January 2017.
# dayfirst defaults to False, i.e. the US format is used. Set it to True to use the EU format.

pd.to_datetime('5/1/2017', dayfirst = True)

Timestamp('2017-01-05 00:00:00')
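To make the effect of dayfirst easier to see, here is a small sketch that parses the same string three ways. The variable names are only illustrative, and the explicit format string is an assumption about how the data is laid out.

```python
import pandas as pd

raw = '5/1/2017'

us_style = pd.to_datetime(raw)                     # month first (default): 1 May 2017
eu_style = pd.to_datetime(raw, dayfirst=True)      # day first: 5 January 2017
explicit = pd.to_datetime(raw, format='%d/%m/%Y')  # explicit format, no guessing
print(us_style, eu_style, explicit)
```

Note that dayfirst is a preference rather than a strict rule, so when the layout is known in advance an explicit format is usually the safer choice.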

In [13]:
dates = ['2019-01-05 2:30:00 PM', '2019-05-01', 'Jan 5, 2019 14:30:00', '01/05/2019', '2019.01.05', '2019/01/05', '20190105', '20190501']
print('US Dates : ', pd.to_datetime(dates))
print('\n')
print('EU Dates : ', pd.to_datetime(dates, dayfirst = True))

US Dates :  DatetimeIndex(['2019-01-05 14:30:00', '2019-05-01 00:00:00',
               '2019-01-05 14:30:00', '2019-01-05 00:00:00',
               '2019-01-05 00:00:00', '2019-01-05 00:00:00',
               '2019-01-05 00:00:00', '2019-05-01 00:00:00'],
              dtype='datetime64[ns]', freq=None)


EU Dates :  DatetimeIndex(['2019-05-01 14:30:00', '2019-05-01 00:00:00',
               '2019-01-05 14:30:00', '2019-05-01 00:00:00',
               '2019-01-05 00:00:00', '2019-01-05 00:00:00',
               '2019-01-05 00:00:00', '2019-05-01 00:00:00'],
              dtype='datetime64[ns]', freq=None)


In [14]:
# Custom format
pd.to_datetime('22$10$2019', format = '%d$%m$%Y')

Timestamp('2019-10-22 00:00:00')
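When a format is given, strings that do not match it are rejected rather than guessed at. A minimal sketch of that behaviour, reusing the format from the cell above (the flag name mismatch_raised is only illustrative):

```python
import pandas as pd

fmt = '%d$%m$%Y'

# A string that matches the custom format parses as expected.
ok = pd.to_datetime('22$10$2019', format=fmt)

# A string in a different layout raises ValueError instead of being guessed.
try:
    pd.to_datetime('2019-10-22', format=fmt)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(ok, mismatch_raised)
```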

In [17]:
# Handle non-date values in the data set
dates = ['abc', '2019-01-05 2:30:00 PM', '2019-05-01', 'Jan 5, 2019 14:30:00', '01/05/2019', '2019.01.05', '2019/01/05', '20190105', '20190501']
print(pd.to_datetime(dates, errors = 'ignore')) # 'ignore': if any value fails to parse, return the input unchanged (deprecated since pandas 2.2).
print('\n')
print(pd.to_datetime(dates, errors = 'coerce')) # 'coerce': parse the good values and set 'NaT' for the bad ones.

['abc' '2019-01-05 2:30:00 PM' '2019-05-01' 'Jan 5, 2019 14:30:00'
 '01/05/2019' '2019.01.05' '2019/01/05' '20190105' '20190501']


DatetimeIndex([                'NaT', '2019-01-05 14:30:00',
               '2019-05-01 00:00:00', '2019-01-05 14:30:00',
               '2019-01-05 00:00:00', '2019-01-05 00:00:00',
               '2019-01-05 00:00:00', '2019-01-05 00:00:00',
               '2019-05-01 00:00:00'],
              dtype='datetime64[ns]', freq=None)
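After coercing, it is often useful to find out which inputs failed. A minimal sketch, assuming a small list in a single format so the result does not depend on how a given pandas version handles mixed formats:

```python
import pandas as pd

raw = ['2019-01-05', 'abc', '2019-05-01']

# Unparseable entries become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors='coerce')

# Boolean mask marking the entries that could not be parsed.
bad_mask = parsed.isna()
bad_values = [value for value, bad in zip(raw, bad_mask) if bad]

# Keep only the successfully parsed dates.
clean = parsed.dropna()
print(bad_values, clean)
```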


In [18]:
# Epoch (Unix time) is the number of seconds that have passed since 1 Jan 1970 00:00:00 UTC.
# See https://www.epochconverter.com/ (times there are in GMT).
epochTime = 1571788962
pd.to_datetime(epochTime, unit = 's')
# unit defaults to 'ns' (nanoseconds), whereas this epoch time is in seconds.

Timestamp('2019-10-23 00:02:42')
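The unit argument has to match the resolution of the stored number. As a small sketch, the same instant expressed in milliseconds needs unit='ms', and utc=True returns a timezone-aware result (the variable names are only illustrative):

```python
import pandas as pd

epoch_seconds = 1571788962
epoch_millis = epoch_seconds * 1000

from_s = pd.to_datetime(epoch_seconds, unit='s')
from_ms = pd.to_datetime(epoch_millis, unit='ms')  # same instant, different unit

# utc=True attaches the UTC timezone instead of returning a naive Timestamp.
aware = pd.to_datetime(epoch_seconds, unit='s', utc=True)
print(from_s, from_ms, aware)
```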

In [19]:
pd.to_datetime([epochTime], unit = 's') # passing a list returns a DatetimeIndex instead of a single Timestamp.


DatetimeIndex(['2019-10-23 00:02:42'], dtype='datetime64[ns]', freq=None)

In [20]:
dt = pd.to_datetime([epochTime], unit = 's')
dt

DatetimeIndex(['2019-10-23 00:02:42'], dtype='datetime64[ns]', freq=None)

In [21]:
dt.view('int64') # Converting the datetime back to epoch time. Note the result is in nanoseconds, not seconds.

array([1571788962000000000], dtype=int64)
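The value above is in nanoseconds. To get back to epoch seconds, one option is to subtract the epoch and divide by a one-second Timedelta. This is a sketch of that approach, not the only way to do it:

```python
import pandas as pd

epoch_time = 1571788962
dt = pd.to_datetime([epoch_time], unit='s')

# Subtracting the epoch gives a TimedeltaIndex; floor-dividing by one second
# yields whole seconds since 1 Jan 1970.
epoch_seconds = (dt - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
print(list(epoch_seconds))
```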