## Python standard library

First, some environment variables must be set so that the locale and the local time zone are configured:

In [2]:
from os import environ

environ['LC_ALL'] = 'fr_FR.UTF-8'
environ['TZ'] = 'Europe/Paris'

## `time` module

In [3]:
import time

This module provides various time-related functions. It is implemented in C, and most of its functions call the platform C library functions of the same name (`<time.h>`). Important terminology and conventions from the [python documentation](https://docs.python.org/3/library/time.html):

* The epoch is the point where the time starts, and is platform dependent.  

* UTC is Coordinated Universal Time (formerly known as Greenwich Mean Time, or GMT). The acronym UTC is not a mistake but a compromise between English and French.

* DST is Daylight Saving Time, an adjustment of the timezone during part of the year.

Specifics of the `time` module:

* To find out what the epoch is on a given platform, look at `time.gmtime(0)`. For Unix, the epoch is January 1, 1970, 00:00:00 (UTC). 

* DST rules are determined by local law and can change from year to year. The C library has a table containing those and is the only source of True Wisdom in this respect.

* The functions in this module may not handle dates and times before the epoch or far in the future. The cut-off point in the future is determined by the C library; for 32-bit systems, it is typically in 2038.

* `strptime()` can parse 2-digit years when given the `%y` format code. When 2-digit years are parsed, they are converted according to the POSIX and ISO C standards: values 69–99 are mapped to 1969–1999, and values 0–68 are mapped to 2000–2068.

* The precision of the various real-time functions may be less than suggested by the units in which their value or argument is expressed. E.g. on most Unix systems, the clock “ticks” only 50 or 100 times a second.

* On the other hand, the precision of `time()` and `sleep()` is better than their Unix equivalents: times are expressed as floating point numbers, time() returns the most accurate time available (using Unix gettimeofday() where available), and sleep() will accept a time with a nonzero fraction (Unix select() is used to implement this, where available).

* The time value as returned by `gmtime()`, `localtime()`, and `strptime()`, and accepted by `asctime()`, `mktime()` and `strftime()`, is a sequence of 9 integers. The return values of `gmtime()`, `localtime()`, and `strptime()` also offer attribute names for individual fields.
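A short check of the last two points, using only the standard library: a `struct_time` field can be read by index or by attribute name, and `%y` follows the 69/68 pivot described above.

```python
import time

# time.gmtime(0) returns a struct_time, a named tuple with 9 positional fields.
epoch = time.gmtime(0)
print(epoch[0], epoch.tm_year)  # 1970 1970 (same field, by index and by name)
print(len(epoch))               # 9

# Two-digit years: 69-99 -> 1969-1999, 0-68 -> 2000-2068
print(time.strptime('68', '%y').tm_year)  # 2068
print(time.strptime('69', '%y').tm_year)  # 1969
```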

Use the following functions to convert between time representations:

| From | To | Use |
|---|---|---|
| seconds since the epoch | `struct_time` in UTC | `gmtime()` |
| seconds since the epoch | `struct_time` in local time | `localtime()` |
| `struct_time` in UTC | seconds since the epoch | `calendar.timegm()` |
| `struct_time` in local time | seconds since the epoch | `mktime()` |
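The four conversions in the table are inverses two by two. The sketch below checks the round trips on an arbitrary timestamp; the printed offset depends on the `TZ` setting of the machine running it.

```python
import calendar
import time

ts = 1582104223  # an arbitrary instant, in seconds since the epoch

utc = time.gmtime(ts)       # seconds -> struct_time in UTC
local = time.localtime(ts)  # seconds -> struct_time in local time (depends on TZ)

# Each inverse recovers the original number of seconds.
assert calendar.timegm(utc) == ts     # struct_time in UTC -> seconds
assert int(time.mktime(local)) == ts  # struct_time in local time -> seconds (float)

# Common mistake: mktime() on a UTC struct_time treats it as local time,
# so the result is shifted by the local UTC offset.
offset = time.mktime(utc) - ts
print(offset)  # 0.0 only when the local UTC offset is zero at that instant
```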

In [4]:
import sys

In [5]:
time.gmtime(0)

time.struct_time(tm_year=1970, tm_mon=1, tm_mday=1, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=1, tm_isdst=0)

### Naive/Aware time zone

Python/Pandas timestamp types without an associated time zone are referred to as “Time Zone Naive”; those with an associated time zone are referred to as “Time Zone Aware”.
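A minimal illustration with the standard library: a naive and an aware `datetime` can be built from the same wall-clock values, but they cannot be mixed in arithmetic or ordering comparisons until a zone is attached explicitly.

```python
from datetime import datetime, timezone

naive = datetime(2019, 1, 1, 12, 0)                       # no tzinfo
aware = datetime(2019, 1, 1, 12, 0, tzinfo=timezone.utc)  # UTC attached

print(naive.tzinfo, aware.tzinfo)  # None UTC

# Subtracting (or ordering) naive and aware values raises TypeError.
try:
    aware - naive
except TypeError as exc:
    print('TypeError:', exc)

# Attaching a zone explicitly makes the intent unambiguous.
print(aware - naive.replace(tzinfo=timezone.utc))  # 0:00:00
```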

### locale

Locales are usually set at the operating system level; in containerized systems they often have to be set manually.

See [Understanding locale environment variables](https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/globalization/understand_locale_environ_var.html) and the [environment variable precedence example](https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/globalization/environ_var_precedence_exmp.html).

Example:

<table summary="" class="defaultstyle ibm-grid"><colgroup><col style="width:33.22147651006711%"><col style="width:29.86577181208054%"><col style="width:36.91275167785235%"></colgroup><thead style="text-align:left;">
<tr style="vertical-align:bottom;">
<th id="d14428e62" class="thbot">Environment Variable and Category Names</th>

<th id="d14428e65" class="thleft thbot">Value of Environment Variables</th>

<th id="d14428e68" class="thleft thbot">Value of Category after Call to
setlocale (LC_ALL,"")</th>

</tr>

</thead>
<tbody>
<tr>
<td headers="d14428e62 "><strong>LC_COLLATE</strong></td>

<td headers="d14428e65 " class="tdleft">de_DE</td>

<td headers="d14428e68 " class="tdleft">de_DE</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LC_CTYPE</strong></td>

<td headers="d14428e65 " class="tdleft">de_DE</td>

<td headers="d14428e68 " class="tdleft">de_DE</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LC_MONETARY</strong></td>

<td headers="d14428e65 ">en_US</td>

<td headers="d14428e68 ">en_US</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LC_NUMERIC</strong></td>

<td headers="d14428e65 ">(unset)</td>

<td headers="d14428e68 ">da_DK</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LC_TIME</strong></td>

<td headers="d14428e65 ">(unset)</td>

<td headers="d14428e68 ">da_DK</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LC_MESSAGES</strong></td>

<td headers="d14428e65 ">(unset)</td>

<td headers="d14428e68 ">da_DK</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LC_ALL</strong></td>

<td headers="d14428e65 ">(unset)</td>

<td headers="d14428e68 ">(not applicable)</td>

</tr>

<tr>
<td headers="d14428e62 "><strong>LANG</strong></td>

<td headers="d14428e65 ">da_DK</td>

<td headers="d14428e68 ">(not applicable)</td>

</tr>

</tbody>
</table>
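The precedence shown in the table (`LC_ALL` first, then the category variable, then `LANG`) can be modeled by a small pure function. `effective_locale` is a hypothetical helper written for illustration, a simplified model of what the C library does itself during `setlocale(LC_ALL, "")`.

```python
from typing import Mapping, Optional

CATEGORIES = ('LC_COLLATE', 'LC_CTYPE', 'LC_MONETARY',
              'LC_NUMERIC', 'LC_TIME', 'LC_MESSAGES')


def effective_locale(category: str, env: Mapping[str, str]) -> Optional[str]:
    """Simplified POSIX precedence: LC_ALL > LC_<category> > LANG.

    Hypothetical helper; returns None when nothing is set, in which case
    the C library would fall back to the "C"/"POSIX" locale.
    """
    for name in ('LC_ALL', category, 'LANG'):
        value = env.get(name)
        if value:  # unset and empty variables are both skipped
            return value
    return None


# The environment from the table above (LC_NUMERIC, LC_TIME, ... unset).
env = {'LC_COLLATE': 'de_DE', 'LC_CTYPE': 'de_DE',
       'LC_MONETARY': 'en_US', 'LANG': 'da_DK'}

for cat in CATEGORIES:
    print(cat, effective_locale(cat, env))
```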

In [6]:
import locale

In [7]:
locale.getdefaultlocale()

('fr_FR', 'UTF-8')

In [8]:
locale.getlocale()

(None, None)

In [9]:
locale.setlocale(locale.LC_ALL, 'fr_FR.UTF-8')

'fr_FR.UTF-8'
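Because `setlocale()` changes process-wide state, it can help to switch a single category only for the duration of a block. `temporary_locale` below is a hypothetical helper; it uses the always-available `'C'` locale so that it runs anywhere, and `'fr_FR.UTF-8'` can be swapped in to get French day and month names, assuming that locale is installed on the system.

```python
import contextlib
import locale
import time


@contextlib.contextmanager
def temporary_locale(category, name):
    """Switch one locale category, then restore the previous setting.

    setlocale() is process-wide and not thread-safe, so keep the block short.
    """
    previous = locale.setlocale(category)  # query the current value without changing it
    try:
        yield locale.setlocale(category, name)
    finally:
        locale.setlocale(category, previous)


# 19 February 2020 was a Wednesday; strptime() fills in tm_wday.
day = time.strptime('2020-02-19', '%Y-%m-%d')
with temporary_locale(locale.LC_TIME, 'C'):
    print(time.strftime('%A %d %B %Y', day))  # Wednesday 19 February 2020
```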

`time.tzset()` resets the time conversion rules according to the `TZ` environment variable (Unix only):

In [10]:
time.tzset()

In [11]:
time.localtime()

time.struct_time(tm_year=2020, tm_mon=2, tm_mday=19, tm_hour=10, tm_min=23, tm_sec=43, tm_wday=2, tm_yday=50, tm_isdst=0)

In [12]:
time.localtime().tm_zone

'CET'

In [13]:
time.time_ns()

1582104223312507367

In [14]:
time.ctime()

'Wed Feb 19 10:23:43 2020'
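To see what `tzset()` changes, the sketch below evaluates the same instants under a given `TZ` and then restores the previous value (Unix only). It assumes a POSIX TZ rule string for Central European time rather than the `Europe/Paris` database name, so it does not depend on the tz database being installed; `local_view` is a hypothetical helper.

```python
import calendar
import os
import time


def local_view(ts, tz):
    """Return localtime(ts) as seen with TZ=tz, then restore the old TZ (Unix only)."""
    old = os.environ.get('TZ')
    os.environ['TZ'] = tz
    time.tzset()
    try:
        return time.localtime(ts)
    finally:
        if old is None:
            del os.environ['TZ']
        else:
            os.environ['TZ'] = old
        time.tzset()


# POSIX TZ rule: UTC+1 ("CET"), summer time "CEST" from the last Sunday of
# March to the last Sunday of October (ending at 03:00 local time).
PARIS_RULE = 'CET-1CEST,M3.5.0,M10.5.0/3'

winter = calendar.timegm((2020, 2, 19, 9, 23, 43, 0, 0, 0))  # 09:23:43 UTC
summer = calendar.timegm((2020, 7, 1, 12, 0, 0, 0, 0, 0))    # 12:00:00 UTC

for ts in (winter, summer):
    t = local_view(ts, PARIS_RULE)
    print(t.tm_hour, t.tm_zone, t.tm_isdst)  # 10 CET 0, then 14 CEST 1
```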

## `datetime` module

The `datetime` module supplies classes for manipulating dates and times. CPython implements it in C (`_datetime`), with a pure-Python fallback in `datetime.py`.

* [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601)
* Python official [documentation](https://docs.python.org/fr/3/library/datetime.html)

In [15]:
import datetime as dt
from datetime import datetime

### Subclasses

Subclass relationships:

<pre>
object
    timedelta
    tzinfo
        timezone
    time
    date
        datetime 
</pre>

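- All these types are immutable.
- `date` objects are always *naive*.
- `time` and `datetime` objects can be either *naive* or *aware*: a `datetime` object `d` is aware if `d.tzinfo is not None` and `d.tzinfo.utcoffset(d) is not None`; a `time` object `t` is aware if `t.tzinfo is not None` and `t.tzinfo.utcoffset(None) is not None`.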
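The awareness rules above translate directly into a predicate. `is_aware` and `NoOffset` are hypothetical names used only for this sketch; `NoOffset` shows that a `tzinfo` whose `utcoffset()` returns `None` still leaves the object naive.

```python
import datetime as dt


def is_aware(value):
    """Apply the awareness rules from the datetime documentation.

    datetime d: aware iff d.tzinfo is not None and d.tzinfo.utcoffset(d) is not None
    time t:     aware iff t.tzinfo is not None and t.tzinfo.utcoffset(None) is not None
    date:       always naive (it has no tzinfo attribute)
    """
    tz = getattr(value, 'tzinfo', None)
    if tz is None:
        return False
    probe = value if isinstance(value, dt.datetime) else None
    return tz.utcoffset(probe) is not None


class NoOffset(dt.tzinfo):
    """Hypothetical tzinfo that cannot tell its offset."""

    def utcoffset(self, d):
        return None


print(is_aware(dt.datetime(2019, 1, 1)))                                  # False
print(is_aware(dt.datetime(2019, 1, 1, tzinfo=dt.timezone.utc)))          # True
print(is_aware(dt.time(12, 0, tzinfo=dt.timezone(dt.timedelta(hours=-8)))))  # True
print(is_aware(dt.datetime(2019, 1, 1, tzinfo=NoOffset())))               # False
```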

In [16]:
dt.timedelta?

[0;31mInit signature:[0m [0mdt[0m[0;34m.[0m[0mtimedelta[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m     
Difference between two datetime values.

timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0)

All arguments are optional and default to 0.
Arguments may be integers or floats, and may be positive or negative.
[0;31mFile:[0m           /opt/conda/lib/python3.7/datetime.py
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


In [17]:
dt.tzinfo?

[0;31mInit signature:[0m [0mdt[0m[0;34m.[0m[0mtzinfo[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m      Abstract base class for time zone info objects.
[0;31mFile:[0m           /opt/conda/lib/python3.7/datetime.py
[0;31mType:[0m           type
[0;31mSubclasses:[0m     timezone, _tzinfo, tzutc, tzoffset


In [18]:
dt.timezone?

[0;31mInit signature:[0m [0mdt[0m[0;34m.[0m[0mtimezone[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m      Fixed offset from UTC implementation of tzinfo.
[0;31mFile:[0m           /opt/conda/lib/python3.7/datetime.py
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


In [19]:
dt.date?

[0;31mInit signature:[0m [0mdt[0m[0;34m.[0m[0mdate[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m      date(year, month, day) --> date object
[0;31mFile:[0m           /opt/conda/lib/python3.7/datetime.py
[0;31mType:[0m           type
[0;31mSubclasses:[0m     datetime


In [20]:
dt.datetime?

[0;31mInit signature:[0m [0mdt[0m[0;34m.[0m[0mdatetime[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m     
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an
instance of a tzinfo subclass. The remaining arguments may be ints.
[0;31mFile:[0m           /opt/conda/lib/python3.7/datetime.py
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


In [21]:
dt.time?

[0;31mInit signature:[0m [0mdt[0m[0;34m.[0m[0mtime[0m[0;34m([0m[0mself[0m[0;34m,[0m [0;34m/[0m[0;34m,[0m [0;34m*[0m[0margs[0m[0;34m,[0m [0;34m**[0m[0mkwargs[0m[0;34m)[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m     
time([hour[, minute[, second[, microsecond[, tzinfo]]]]]) --> a time object

All arguments are optional. tzinfo may be None, or an instance of
a tzinfo subclass. The remaining arguments may be ints.
[0;31mFile:[0m           /opt/conda/lib/python3.7/datetime.py
[0;31mType:[0m           type
[0;31mSubclasses:[0m     


### `datetime` constructors

Constructors are:
- `datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0)`
- `datetime.today()`
- `datetime.now(tz=None)`
- `datetime.utcnow()` (returns a naive value; deprecated since Python 3.12)
- `datetime.fromtimestamp(timestamp, tz=None)`
- `datetime.fromordinal(ordinal)`
- `datetime.combine(date, time, tzinfo=time.tzinfo)`
- `datetime.fromisoformat(date_string)`
- `datetime.strptime(date_string, format)`

`datetime` objects have:
- class attributes `min`, `max` and `resolution`
- instance attributes `year`, `month`, `day`, `hour`, `minute`, `second`, `microsecond`, `tzinfo`, `fold`


See the documentation for usage.
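As a quick sketch of the constructors listed above, each call below builds midnight on 1 January 2019; only the `fromtimestamp()` call with `tz` produces an aware value (`fromisoformat()` needs Python 3.7 or later).

```python
import datetime as dt

# Five ways to build the same naive value: 2019-01-01 00:00.
a = dt.datetime(2019, 1, 1)
b = dt.datetime.fromisoformat('2019-01-01T00:00:00')
c = dt.datetime.strptime('01.01.2019', '%d.%m.%Y')
d = dt.datetime.combine(dt.date(2019, 1, 1), dt.time(0, 0))
e = dt.datetime.fromordinal(dt.date(2019, 1, 1).toordinal())
assert a == b == c == d == e

# fromtimestamp() returns local time by default; pass tz for an aware UTC value.
f = dt.datetime.fromtimestamp(1546300800, tz=dt.timezone.utc)
print(f)  # 2019-01-01 00:00:00+00:00
```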

In [22]:
t1 = dt.datetime(2019, 1, 1, 0, 0, 0, 0)
t1

datetime.datetime(2019, 1, 1, 0, 0)

In [23]:
now = dt.datetime.now()
now

datetime.datetime(2020, 2, 19, 10, 23, 43, 452652)

In [24]:
now - t1

datetime.timedelta(days=414, seconds=37423, microseconds=452652)
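The result of the subtraction is a `timedelta`, which internally stores only days, seconds (0 to 86399) and microseconds. The sketch below reproduces the difference above without the microseconds and shows the normalization of negative durations.

```python
import datetime as dt

delta = dt.datetime(2020, 2, 19, 10, 23, 43) - dt.datetime(2019, 1, 1)

# Only days, seconds and microseconds are stored; 10:23:43 becomes 37423 s.
print(delta.days, delta.seconds, delta.microseconds)  # 414 37423 0
print(delta.total_seconds())                          # 35807023.0

# Negative durations are normalized so that only `days` is negative.
print(dt.timedelta(seconds=-1))  # -1 day, 23:59:59
```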

In [25]:
today = dt.datetime.today()
today

datetime.datetime(2020, 2, 19, 10, 23, 43, 469193)

In [26]:
today.strftime('%d.%m.%Y')

'19.02.2020'

## Numpy datetime64

The data type is called “datetime64”, so named because “datetime” is already taken by the datetime library included in Python.

* [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601)
* [Datetimes and Timedeltas](https://docs.scipy.org/doc/numpy/reference/arrays.datetime.html?highlight=numpy%20datetime64)
* [Datetime Support Functions](https://docs.scipy.org/doc/numpy/reference/routines.datetime.html)

In [27]:
import numpy as np
from numpy import datetime64

In [28]:
d0 = datetime64('2002-10-27T04:30', '10s')

In [29]:
d0 + 1

numpy.datetime64('2002-10-27T04:30:10','10s')

In [30]:
np.datetime_data(d0)

('s', 10)
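The unit is part of the `datetime64` dtype, and it drives casts and arithmetic. A minimal sketch: casting to a coarser unit drops the finer fields (it floors, it does not round), and mixing units promotes to the finer one.

```python
import numpy as np

t = np.datetime64('2002-10-27T04:30:59', 's')

# Casting to minutes drops the 59 seconds instead of rounding up.
print(t.astype('datetime64[m]'))  # 2002-10-27T04:30

# datetime_data reports the (unit, multiplier) pair of the dtype.
print(np.datetime_data(t.dtype))  # ('s', 1)

# Adding a millisecond timedelta promotes the result to milliseconds.
print((t + np.timedelta64(1, 'ms')).dtype)  # datetime64[ms]
```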

In [31]:
a = np.array(['1979-03-22T12'], dtype='M8[h]')
b = np.array([3*60], dtype='m8[m]')
a + b

array(['1979-03-22T15:00'], dtype='datetime64[m]')

In [32]:
a = np.timedelta64(1, 'Y')

In [33]:
a

numpy.timedelta64(1,'Y')

In [34]:
np.datetime_data(a)

('Y', 1)

In [35]:
d0 + b

array(['2002-10-27T07:30:00'], dtype='datetime64[10s]')

In [36]:
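from warnings import warn

# Adding a year-based timedelta64 to a datetime64 in units of 10 s fails:
# years have no fixed length in seconds, so NumPy cannot find a common
# unit and raises TypeError.
try:
    d0 + a
except TypeError as e:
    warn(str(e), UserWarning)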


In [37]:
c = np.timedelta64(1, 's')

In [38]:
d0 + c

numpy.datetime64('2002-10-27T04:30:01')

In [39]:
d0.astype(int)

103569300

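In [40]:
# Pitfall: d0 is stored in units of 10 s, so this integer is not a
# number of seconds and the resulting date is wrong.
datetime.utcfromtimestamp(d0.astype(int))

datetime.datetime(1973, 4, 13, 17, 15)

In [41]:
# Convert to seconds first, then to an integer.
datetime.utcfromtimestamp(d0.astype('datetime64[s]').astype(int))

datetime.datetime(2002, 10, 27, 4, 30)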
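A more general way to avoid the unit pitfall is to cast to microseconds, the resolution of Python's `datetime`, and let `.item()` build the object. `to_pydatetime` is a hypothetical helper for this sketch; anything finer than a microsecond is dropped.

```python
import numpy as np


def to_pydatetime(value):
    """Convert a numpy.datetime64 of any unit to a naive datetime.datetime.

    Hypothetical helper. Going through microseconds avoids two pitfalls:
    astype(int) counts in the value's own unit (here 10 s), and .item()
    returns a plain int for nanosecond units.
    """
    return value.astype('datetime64[us]').item()


d0 = np.datetime64('2002-10-27T04:30', '10s')
print(to_pydatetime(d0))  # 2002-10-27 04:30:00
```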

In [42]:
d = np.arange('2002-10-27T04:30', 10, 1, dtype='M8[ns]')

In [43]:
def date2int(elt):
    # ISO 8601 string -> number of seconds since the epoch
    return np.datetime64(elt, 's').astype(np.int64)

randomDates = np.random.RandomState(42).randint(date2int('2002-10-27T04:30'),
                                                date2int('2003-10-27T04:30'),
                                                10,
                                                dtype='i8')
dates = np.sort(randomDates).astype('datetime64[s]')

In [44]:
dates

array(['2002-11-22T01:11:29', '2003-03-30T07:08:12',
       '2003-04-14T00:13:06', '2003-05-01T11:11:18',
       '2003-06-28T04:33:08', '2003-07-22T17:06:44',
       '2003-07-24T04:27:30', '2003-08-27T14:34:58',
       '2003-09-01T15:07:10', '2003-09-03T01:12:47'],
      dtype='datetime64[s]')

In [45]:
d[1].astype(int).astype('M8[ns]')

numpy.datetime64('2002-10-27T04:30:00.000000001')

In [46]:
# equals d.size * d.itemsize
d.nbytes

80

## Pandas 

Pandas [time series documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html)

In [47]:
import pandas as pd
from pandas import (
    date_range,
    DataFrame,
    Timestamp
)
from datetime import (
    timezone,
    timedelta
)

In [48]:
dti = date_range('2018-01-01', periods=3, freq='H')

In [49]:
date_range('1/1/2012', freq='0.1ms', periods=1000)

DatetimeIndex([       '2012-01-01 00:00:00', '2012-01-01 00:00:00.000100',
               '2012-01-01 00:00:00.000200', '2012-01-01 00:00:00.000300',
               '2012-01-01 00:00:00.000400', '2012-01-01 00:00:00.000500',
               '2012-01-01 00:00:00.000600', '2012-01-01 00:00:00.000700',
               '2012-01-01 00:00:00.000800', '2012-01-01 00:00:00.000900',
               ...
               '2012-01-01 00:00:00.099000', '2012-01-01 00:00:00.099100',
               '2012-01-01 00:00:00.099200', '2012-01-01 00:00:00.099300',
               '2012-01-01 00:00:00.099400', '2012-01-01 00:00:00.099500',
               '2012-01-01 00:00:00.099600', '2012-01-01 00:00:00.099700',
               '2012-01-01 00:00:00.099800', '2012-01-01 00:00:00.099900'],
              dtype='datetime64[ns]', length=1000, freq='100U')

In [50]:
pdf = DataFrame({'naive': [datetime(2019, 1, 1, 0)],
                 'aware': [Timestamp(year=2019, month=1, day=1,
                           nanosecond=500, tz=dt.timezone(dt.timedelta(hours=-8)))]})
pdf

|   | naive | aware |
|---|---|---|
| 0 | 2019-01-01 | 2019-01-01 00:00:00.000000500-08:00 |
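In pandas, a naive value becomes aware with `tz_localize`, which keeps the wall clock, and an aware value is moved to another zone with `tz_convert`, which keeps the instant. A minimal sketch reusing the fixed UTC-8 offset of the `aware` column above:

```python
import datetime as dt

import pandas as pd

naive = pd.Timestamp('2019-01-01 00:00')
print(naive.tz)  # None

# tz_localize attaches a zone without moving the wall clock...
utc = naive.tz_localize('UTC')

# ...while tz_convert keeps the instant and changes the wall clock.
pacific = utc.tz_convert(dt.timezone(dt.timedelta(hours=-8)))
print(pacific)  # 2018-12-31 16:00:00-08:00

assert utc == pacific  # same instant, different wall clocks
```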


## Pyarrow

* https://arrow.apache.org/docs/python/timestamps.html
* https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#datetime-types

## Spark

Spark stores timestamps as 64-bit integers representing microseconds since the UNIX epoch. It does not store any metadata about time zones with its timestamps.

Spark interprets timestamps in the session local time zone (`spark.sql.session.timeZone`). If that time zone is undefined, Spark falls back to the default system time zone.
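The storage model described above can be sketched in plain Python, without Spark: an aware timestamp becomes an integer count of microseconds since the epoch, and the session time zone only matters when that integer is rendered back. `to_spark_micros` and `render` are hypothetical helpers, not PySpark API; in PySpark the session zone itself is changed with `spark.conf.set("spark.sql.session.timeZone", "UTC")`.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)
ONE_MICROSECOND = dt.timedelta(microseconds=1)


def to_spark_micros(value: dt.datetime) -> int:
    """Aware datetime -> microseconds since the UNIX epoch (no zone kept).

    A naive value raises TypeError here, since it cannot be placed on the timeline.
    """
    return (value - EPOCH) // ONE_MICROSECOND


def render(micros: int, session_tz: dt.tzinfo) -> dt.datetime:
    """Interpret stored microseconds in a given session time zone."""
    return (EPOCH + dt.timedelta(microseconds=micros)).astimezone(session_tz)


ts = dt.datetime(2019, 1, 1, 0, 0, 0, 500,
                 tzinfo=dt.timezone(dt.timedelta(hours=-8)))
micros = to_spark_micros(ts)
print(micros)                           # 1546329600000500
print(render(micros, dt.timezone.utc))  # 2019-01-01 08:00:00.000500+00:00
```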

In [51]:
from pyspark.sql import SparkSession

In [52]:
spark = ( 
    SparkSession.builder
        .master("local")
        .appName("time")
        .getOrCreate()
)

In [53]:
sc = spark.sparkContext

In [54]:
coll = sc.parallelize(np.arange(100_000))

In [55]:
coll.count()

100000

In [56]:
coll.stats()

(count: 100000, mean: 49999.5, stdev: 28867.513458037913, max: 99999.0, min: 0.0)

In [57]:
coll2 = coll.flatMap(lambda x: (x * 2, 1))

In [58]:
coll2.stats()

(count: 200000, mean: 50000.0, stdev: 64548.94784193651, max: 199998.0, min: 0.0)