# Serializing and deserializing datetimes

### Absolute vs. civil times

One very important concept when serializing and deserializing datetimes is the distinction between *absolute times* (timestamps) and *civil times* (datetimes or wall times). These two concepts are often conflated in common parlance, but they have very different properties with respect to serialization and deserialization.

**Absolute times** represent a specific point in time, where the time on the wall does not matter. For events in the past, such as timestamps on logs, you *usually* care about the absolute time (either for purposes of ordering or for purposes of calculating elapsed time).

*Examples of when you may want an absolute time*:

- Transactions and logs
- Timing a biological process (e.g. feeding a fish)
- Timing a chemical or astronomical process

**Civil times** represent a specific time *as defined by the clock on the wall*. If, for example, you have a meeting at noon every Friday but your time zone shifts by 1 hour, you would not start holding the meeting at 11 AM just because that is 168 hours after the previous meeting. For events in the future you *usually* care about civil times.

*Examples of when you may want to use civil times*:

- Meetings
- Work hours
- Recurring events
- When to feed a Gremlin
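
The difference between the two kinds of arithmetic can be made concrete with a short sketch. It assumes a weekly New York meeting on either side of the 2020-11-01 switch from daylight to standard time, and it uses fixed UTC offsets (the hypothetical `EDT` and `EST` constants) as stand-ins for a real time zone database:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for a real time zone database so that
# the sketch has no dependencies beyond the standard library.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Friday, 30 October 2020, noon on the wall clock
meeting = datetime(2020, 10, 30, 12, 0, tzinfo=EDT)

# Absolute arithmetic: exactly 168 elapsed hours later, shown in the new offset
absolute_next = (meeting + timedelta(hours=168)).astimezone(EST)

# Civil arithmetic: the same wall-clock time one calendar week later
civil_next = datetime(2020, 11, 6, 12, 0, tzinfo=EST)

print(absolute_next.isoformat())  # 2020-11-06T11:00:00-05:00
print(civil_next.isoformat())     # 2020-11-06T12:00:00-05:00
print(civil_next - meeting)       # 7 days, 1:00:00
```

The civil "one week later" is 169 elapsed hours away, which is why recurring events are usually better stored as wall times plus a time zone than as absolute offsets.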

## Serializing and deserializing timestamps as numeric offsets

While you, as a human, are probably used to consuming your datetimes as some sort of string, like "October 21, 2015" or "2015-10-21", when dealing with *absolute* datetimes, the most natural representation is probably as an offset from some reference time, e.g. 17 hours after this process started or 250,000 seconds after the Unix epoch.

Assuming you know the reference time and units of the offset (and this may be a big if), this may be the most compact and unambiguous representation of your absolute times.

The most common numeric representation of time is [Unix time](https://en.wikipedia.org/wiki/Unix_time): the number of seconds since the Unix epoch, 01 January 1970 00:00:00 UTC.
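
As a minimal sketch of the round trip between an aware `datetime` and Unix time, using only the standard library (the example instant is chosen to match the message example used in the exercise that follows):

```python
from datetime import datetime, timezone

# An aware datetime pins down an absolute time unambiguously
dt = datetime(2000, 1, 1, 10, 15, 30, tzinfo=timezone.utc)

# Serialize: seconds since 1970-01-01T00:00:00 UTC, as a float
ts = dt.timestamp()
print(ts)  # 946721730.0

# Deserialize: always pass tz explicitly, otherwise the result is a naive
# datetime in the system's local time zone
restored = datetime.fromtimestamp(ts, tz=timezone.utc)
print(restored.isoformat())  # 2000-01-01T10:15:30+00:00
```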

### Exercise: Write a function to store a message with metadata in JSON

Assume you are writing a chat application that sends timestamped messages in JSON. You would like to write a function that takes a "to" user, a "from" user and a message, and outputs a JSON-encoded string with the to, from, timestamp and message information, like so:

```
{
   "user_to": "cool_beans1973",
   "user_from": "xXx_the_matrix_xXx",
   "sent_epoch": 946721730.0,
   "message": "Looks like we survived Y2K! Should we go see Deuce Bigalow on Friday?"
}
```
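
One possible approach is sketched below. The name `encode_message_sketch` is hypothetical, and this is not necessarily how the `sd_answers` module used in this notebook implements it; the sketch simply takes the current time as an aware UTC datetime and stores its Unix timestamp:

```python
import json
from datetime import datetime, timezone


def encode_message_sketch(user_to: str, user_from: str, message: str) -> str:
    """Encode a message as JSON, stamped with the current Unix time."""
    # An aware "now" avoids any dependence on the system's local time zone
    sent = datetime.now(tz=timezone.utc)

    return json.dumps({
        "user_to": user_to,
        "user_from": user_from,
        "sent_epoch": sent.timestamp(),  # float seconds since the epoch
        "message": message,
    })


print(encode_message_sketch("cool_beans1973", "xXx_the_matrix_xXx", "Hello!"))
```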

In [1]:
import sd_answers

In [2]:
def encode_message(user_to: str, user_from: str, message: str) -> str:
    """Encode a message to be sent in JSON"""
    return sd_answers.encode_message(user_to, user_from, message)

In [3]:
### Tests
from freezegun import freeze_time
import json

user_to = "cool_beans1973"
user_from = "xXx_the_matrix_xXx"
message = "Test messageé"

with freeze_time("2000-01-01T05:15:30.214333-05:00"):
    json_str = encode_message(user_to, user_from, message)
    
    decoded = json.loads(json_str)
    assert decoded["user_to"] == user_to
    assert decoded["user_from"] == user_from
    assert decoded["sent_epoch"] == 946721730.214333
    assert decoded["message"] == message

print("Passed!")

Passed!


### Exercise: Write a function to retrieve and display the message

Now we want to *deserialize* the JSON message and display it to our user in a human-readable way. Assuming the recipient is in `America/New_York`, the example from above would be displayed as:

```
(2000-01-01 05:15:30) xXx_the_matrix_xXx
Looks like we survived Y2K! Should we go see Deuce Bigalow on Friday?
```
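
A minimal sketch of the deserializing side follows. The function name `display_message_sketch` and its `tz` parameter are hypothetical, and a fixed `-05:00` offset is used as a stand-in for `America/New_York` in January; a real application would pass a proper time zone object instead:

```python
import json
from datetime import datetime, timedelta, timezone, tzinfo


def display_message_sketch(json_str: str, tz: tzinfo) -> str:
    """Build a display string, showing the timestamp in the recipient's zone."""
    decoded = json.loads(json_str)

    # Convert the absolute time to the recipient's wall time for display
    sent = datetime.fromtimestamp(decoded["sent_epoch"], tz=tz)

    # Header format follows the notebook's test: "(timestamp) sender",
    # then the message on its own line
    return f"({sent:%Y-%m-%d %H:%M:%S}) {decoded['user_from']}\n{decoded['message']}"


# Assumption: New York was at UTC-05:00 on 2000-01-01
new_york_winter = timezone(timedelta(hours=-5))
example = json.dumps({
    "user_to": "cool_beans1973",
    "user_from": "xXx_the_matrix_xXx",
    "sent_epoch": 946721730.0,
    "message": "Looks like we survived Y2K!",
})
print(display_message_sketch(example, new_york_winter))
```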


In [4]:
def display_message(json_str: str) -> str:
    """Generate a display string for a JSON-encoded message"""
    return sd_answers.display_message(json_str)

In [5]:
### Tests
user_to = "cool_beans1973"
user_from = "xXx_the_matrix_xXx"
message = "Test messageé"

with freeze_time("2000-01-01T05:15:30.214333-05:00"):
    json_str = encode_message(user_to, user_from, message)
    
    display_str = display_message(json_str)
    expected = f"(2000-01-01 05:15:30) {user_from}\n{message}"
    
    assert display_str == expected

print("Passed!")

Passed!


## Serializing and deserializing timestamps as strings

While serializing datetimes as epoch timestamps is a good choice for data *exclusively* interpreted by machines, or when a very compact representation is required, such timestamps are very difficult for people to read and grasp intuitively. Consider the timestamp `1389552400.0`: how long ago was that?

How much later is `1401595200.0`? Are these two events days apart? Weeks? Months? Years?

How about `1357102800.0` and `1357707600.0`?

What about `44632400` and `444632400`? How easy would it be to spot that you've mistyped one for the other?

In [6]:
from datetime import datetime, timezone, timedelta

def print_timestamp_utc(ts):
    print(f"{ts:010d}: {datetime.fromtimestamp(ts, tz=timezone.utc)}")

In [7]:
print_timestamp_utc(1389552400)
print_timestamp_utc(1401595200)

1389552400: 2014-01-12 18:46:40+00:00
1401595200: 2014-06-01 04:00:00+00:00


In [8]:
print_timestamp_utc(1357102800)
print_timestamp_utc(1357707600)

1357102800: 2013-01-02 05:00:00+00:00
1357707600: 2013-01-09 05:00:00+00:00


In [9]:
print_timestamp_utc(44632400)
print_timestamp_utc(444632400)

0044632400: 1971-06-01 13:53:20+00:00
0444632400: 1984-02-03 05:00:00+00:00


### `isoformat`

When serializing your timestamps, you want a string format that is:

1. Unambiguous
2. Easy to parse
3. Ideally compact

Because we are looking at *timestamps*, the best thing to do is to use a strict subset of ISO 8601, which is what is emitted when you call `isoformat()`.

```python
def datetime.isoformat(sep='T', timespec='auto') -> str:
    ...
```

The method `isoformat` generates a (mostly) ISO 8601 compatible datetime string, configurable with the `sep` parameter (which takes a single character) and the `timespec` parameter, which allows you to specify the degree of truncation. The following diagram, which may not be terribly useful but which I had fun drawing, illustrates the formats generated by this method:

```
YYYY─MM─DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]]
────┬───── ┬ ┬  ┬   ┬    ┬  ┬       ──────┬────────────
    │      │ │  │   │    │  │         auto─truncating, only present for aware datetimes
    │      │ │  │   │    │  └─ 'microseconds'
    │      │ │  │   │    │
  always   │ │  │   │    └─ 'milliseconds'
           │ │  │   │
     sep  ─┘ │  │   └─ 'seconds'
             │  │
    'hours' ─┘  └─ 'minutes'
```

`'auto'`: `'seconds'` if `microseconds` is 0 else `'microseconds'`

**Examples**

In [10]:
datetime(2020, 9, 7, 14, 27, 2, 123456).isoformat()

'2020-09-07T14:27:02.123456'

In [11]:
# Auto-truncates at seconds if no microseconds
datetime(2020, 9, 7, 14, 27, 2).isoformat()

'2020-09-07T14:27:02'

In [12]:
# Specify less truncation than default
datetime(2020, 9, 7, 14, 27, 2).isoformat(timespec='microseconds')

'2020-09-07T14:27:02.000000'

In [13]:
# Specify more truncation than default
datetime(2020, 9, 7, 14, 27, 2, 123456).isoformat(timespec='hours')

'2020-09-07T14'

In [14]:
# Change the time separator
datetime(2020, 9, 7, 14, 27, 2).isoformat(sep=' ')

'2020-09-07 14:27:02'

In [15]:
# An aware datetime
datetime(2020, 9, 7, 14, 27, 2, tzinfo=timezone.utc).isoformat()

'2020-09-07T14:27:02+00:00'

In [16]:
datetime(2020, 9, 7, 14, 27, 2, tzinfo=timezone(timedelta(hours=-5, minutes=-30))).isoformat()

'2020-09-07T14:27:02-05:30'

In [17]:
datetime(2020, 9, 7, 14, 27, 2,
         tzinfo=timezone(timedelta(hours=-5, minutes=-30, seconds=-12))).isoformat()

'2020-09-07T14:27:02-05:30:12'
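
One detail the examples above do not show is that `timespec` truncates rather than rounds. A small sketch, using a value just below the next minute to make the difference visible:

```python
from datetime import datetime

# One microsecond short of 14:28:00
dt = datetime(2020, 9, 7, 14, 27, 59, 999999)

# Excluded components are dropped, never rounded up
millis = dt.isoformat(timespec="milliseconds")
minutes = dt.isoformat(timespec="minutes")

print(millis)   # 2020-09-07T14:27:59.999
print(minutes)  # 2020-09-07T14:27
```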

### `fromisoformat`
Added in Python 3.7, `fromisoformat()` is a function that will create a datetime from *any* format that `datetime.isoformat` emits. It is guaranteed that:

```python
dt == datetime.fromisoformat(dt.isoformat(*args, **kwargs))
```

for all valid `dt`, `args` and `kwargs`, provided that `timespec` does not truncate any nonzero fields (though note that it may not attach the same `tzinfo` object; the `datetime`s will merely represent the same *time*).

If you are using a version of Python older than Python 3.7, `dateutil.parser.isoparse` can be used instead (though note that it parses ISO 8601 datetimes in general, not just the ones output by `isoformat`).
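
A short sketch of the round trip, assuming Python 3.7 or later. The named fixed offset `edt` is a hypothetical stand-in for a richer time zone object; it shows that the restored datetime compares equal even though the attached `tzinfo` is a different object with a different name:

```python
from datetime import datetime, timedelta, timezone

# A fixed offset with a custom name, standing in for a "real" time zone
edt = timezone(timedelta(hours=-4), "EDT")
dt = datetime(2019, 4, 18, 18, 46, 37, 211352, tzinfo=edt)

serialized = dt.isoformat()
restored = datetime.fromisoformat(serialized)

print(serialized)                  # 2019-04-18T18:46:37.211352-04:00
print(restored == dt)              # True
print(restored.tzinfo is edt)      # False
print(dt.tzname(), restored.tzname())  # EDT UTC-04:00
```

Only the UTC offset survives serialization, so anything beyond the offset (a zone name, daylight saving rules) has to be stored separately if it is needed later.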

### Exercise: Write a function to parse log messages

Assuming your logger is configured to emit logs with the following format:

```
"<datetime_isoformat> : <level> : <name> : <log message>"
```

Parse the log message into a structured dictionary format with the fields `datetime`, `level`, `name` and `message`.

**Examples**:

```
2019-04-18T18:46:37.211352-04:00 : DEBUG : __main__iso : This is a message
2019-04-18T18:46:37.213751-04:00 : WARNING : __main__iso : This is a warning
```

In [18]:
def parse_log_line(line: str):
    # maxsplit=3 yields exactly four fields, even if the message contains ' : '
    dt_str, level_str, name, message = line.split(' : ', 3)
    dt = datetime.fromisoformat(dt_str)
    
    return {
        'datetime': dt,
        'level': level_str,
        'name': name,
        'message': message,
    }

In [19]:
parse_log_line('2019-04-18T18:46:37.211352-04:00 : DEBUG : __main__iso : This is a message')

{'datetime': datetime.datetime(2019, 4, 18, 18, 46, 37, 211352, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
 'level': 'DEBUG',
 'name': '__main__iso',
 'message': 'This is a message'}
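
Because the parsed values are aware datetimes, they can be subtracted directly to get elapsed time. A minimal sketch using the two example lines above (the helper `line_timestamp` is hypothetical and extracts only the leading timestamp):

```python
from datetime import datetime

lines = [
    "2019-04-18T18:46:37.211352-04:00 : DEBUG : __main__iso : This is a message",
    "2019-04-18T18:46:37.213751-04:00 : WARNING : __main__iso : This is a warning",
]


def line_timestamp(line: str) -> datetime:
    """Parse just the ISO 8601 timestamp at the start of a log line."""
    return datetime.fromisoformat(line.split(" : ", 1)[0])


first, second = (line_timestamp(line) for line in lines)
elapsed = second - first

print(elapsed)                  # 0:00:00.002399
print(elapsed.total_seconds())  # 0.002399
```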

### Exercise: Configure the logger to output timestamps in an ISO 8601 format

**Bonus exercise**: Can you figure out how to set up the `logging` module to emit the format from the previous exercise? It's somewhat easy if you do not support microseconds, and stupidly difficult if you do!

In [20]:
import logging
logger = logging.getLogger(__name__)
ch = logging.StreamHandler()
logger.addHandler(ch)
logger.setLevel(logging.DEBUG)

In [21]:
formatter = logging.Formatter("{asctime} : {levelname} : {name} : {message}",
                              datefmt="%Y-%m-%dT%H:%M:%S%z", style="{")
ch.setFormatter(formatter)

In [22]:
from dateutil.tz import tzlocal

class IsoFormatter(logging.Formatter):
    def __init__(self, fmt=None, tzinfo=tzlocal(), style="%"):
        super().__init__(fmt=fmt, datefmt=None, style=style)
        self._tzinfo = tzinfo

    def formatTime(self, record, *args, **kwargs):
        dt = datetime.fromtimestamp(record.created, tz=self._tzinfo)
        
        return dt.isoformat()

In [23]:
formatter = IsoFormatter("{asctime} : {levelname} : {name} : {message}", style="{")
ch.setFormatter(formatter)

logger.debug("This is a message")
logger.warning("This is a warning")

2019-04-18T18:54:24.768324-04:00 : DEBUG : __main__ : This is a message
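
To check that such a formatter and the parsing approach from the previous exercise agree with each other, a round trip can be sketched as follows. This is a simplified, standard-library-only variant: the class `UtcIsoFormatter` and the logger name `iso_round_trip_demo` are hypothetical, and the formatter is pinned to UTC rather than the local zone so the result does not depend on the machine it runs on:

```python
import io
import logging
from datetime import datetime, timezone


class UtcIsoFormatter(logging.Formatter):
    """Hypothetical variant of the formatter above, pinned to UTC."""

    def formatTime(self, record, datefmt=None):
        # record.created is a Unix timestamp, i.e. an absolute time
        return datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat()


# Capture log output in memory instead of writing to stderr
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(
    UtcIsoFormatter("{asctime} : {levelname} : {name} : {message}", style="{")
)

demo_logger = logging.getLogger("iso_round_trip_demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.propagate = False  # keep the demo output out of other handlers
demo_logger.addHandler(handler)
demo_logger.warning("This is a warning")

# Parse the emitted line back into its fields
line = stream.getvalue().rstrip("\n")
dt_str, level, name, message = line.split(" : ", 3)
parsed = datetime.fromisoformat(dt_str)

print(parsed.tzinfo, level, name, message)
```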
