# Serializing and deserializing datetimes

## Serializing and deserializing timestamps as strings

It is fairly common to need or want to represent datetimes as strings, for logging, storage or messaging, and to want to get those strings back out as datetimes. This workbook will discuss one way to do this.

In [1]:
import sd_tests
from datetime import datetime, timezone, timedelta

### Deserializing vs. Parsing strings

*Deserializing* is the word I am using here to describe the act of parsing strings of a known format. It is a specific subset of all parsing tasks, and carries different recommendations from other parsing tasks.

In this workbook, we will be discussing the case where you are writing both the serializing code and the deserialization code and thus know the exact format to be used. In this case, you can use a very strict parser that validates the data (by throwing an error for out-of-spec datetimes) and can be optimized given that it knows the format of the string.
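As a rough sketch of what "strict" means here, the following pairs a serializer with a deserializer that share one fixed layout. The names `WIRE_FORMAT`, `serialize` and `deserialize` are hypothetical, and the sketch assumes aware datetimes only. Note that `strptime` is somewhat lenient about zero padding, so this only approximates a fully strict parser.

```python
from datetime import datetime, timezone

# Hypothetical fixed layout shared by the writer and the reader.
WIRE_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def serialize(dt: datetime) -> str:
    # This sketch only supports aware datetimes, because %z renders
    # as an empty string for naive ones.
    if dt.tzinfo is None:
        raise ValueError("Expected an aware datetime")
    return dt.strftime(WIRE_FORMAT)


def deserialize(s: str) -> datetime:
    # strptime raises ValueError when the string does not match the layout.
    return datetime.strptime(s, WIRE_FORMAT)


dt = datetime(2020, 9, 7, 14, 27, 2, 123456, tzinfo=timezone.utc)
assert deserialize(serialize(dt)) == dt

try:
    deserialize("2020-09-07 14:27:02")  # wrong separator, no offset
except ValueError as e:
    print(f"Rejected: {e}")
```

Rejecting out-of-spec input early like this turns a silent data problem into an immediate, visible error.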

When parsing user input or datetimes generated by another process in possibly-unpredictable formats, you may use `dateutil.parser.parse` or other liberal parsers, but note that in all cases, it is best to use the strictest parser that still accomplishes your task.

### `isoformat`

When serializing your timestamps, you want a string format that is:

1. Unambiguous
2. Easy to parse
3. Ideally compact

Because we are looking at *timestamps*, the best thing to do is to use a strict subset of ISO 8601, which is what is emitted when you call `isoformat()`.

```python
# Signature of the datetime.isoformat method
def isoformat(self, sep='T', timespec='auto') -> str:
    ...
```

The method `isoformat` generates a (mostly) ISO 8601 compatible datetime string, configurable with the `sep` parameter (which takes a single character) and the `timespec` parameter, which allows you to specify the degree of truncation. A diagram that may not be terribly useful, but which I had fun drawing, illustrates the formats generated by this method:

```
YYYY─MM─DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]]
────┬───── ┬ ┬  ┬   ┬    ┬  ┬       ──────┬────────────
    │      │ │  │   │    │  │         auto─truncating, only present for aware datetimes
    │      │ │  │   │    │  └─ 'microseconds'
    │      │ │  │   │    │
  always   │ │  │   │    └─ 'milliseconds'
           │ │  │   │
     sep  ─┘ │  │   └─ 'seconds'
             │  │
    'hours' ─┘  └─ 'minutes'
```

`'auto'`: the same as `'seconds'` if `microsecond` is 0, otherwise the same as `'microseconds'`

**Examples**

In [2]:
datetime(2020, 9, 7, 14, 27, 2, 123456).isoformat()

'2020-09-07T14:27:02.123456'

In [3]:
# Auto-truncates at seconds if no microseconds
datetime(2020, 9, 7, 14, 27, 2).isoformat()

'2020-09-07T14:27:02'

In [4]:
# Specify less truncation than default
datetime(2020, 9, 7, 14, 27, 2).isoformat(timespec='microseconds')

'2020-09-07T14:27:02.000000'

In [5]:
# Specify more truncation than default
datetime(2020, 9, 7, 14, 27, 2, 123456).isoformat(timespec='hours')

'2020-09-07T14'

In [6]:
# Change the time separator
datetime(2020, 9, 7, 14, 27, 2).isoformat(sep=' ')

'2020-09-07 14:27:02'

In [7]:
# An aware datetime
datetime(2020, 9, 7, 14, 27, 2, tzinfo=timezone.utc).isoformat()

'2020-09-07T14:27:02+00:00'

In [8]:
datetime(2020, 9, 7, 14, 27, 2, tzinfo=timezone(timedelta(hours=-5, minutes=-30))).isoformat()

'2020-09-07T14:27:02-05:30'

In [9]:
datetime(2020, 9, 7, 14, 27, 2,
         tzinfo=timezone(timedelta(hours=-5, minutes=-30, seconds=-12))).isoformat()

'2020-09-07T14:27:02-05:30:12'
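
To see every `timespec` value side by side, including `'minutes'` and `'milliseconds'`, a small loop over one aware datetime can help. This is only an illustrative sketch; the names `TIMESPECS` and `results` are made up for this example.

```python
from datetime import datetime, timezone

dt = datetime(2020, 9, 7, 14, 27, 2, 123456, tzinfo=timezone.utc)

# Every value accepted by the timespec parameter
TIMESPECS = ("auto", "hours", "minutes", "seconds",
             "milliseconds", "microseconds")

results = {ts: dt.isoformat(timespec=ts) for ts in TIMESPECS}
for ts, s in results.items():
    print(f"{ts:>12}: {s}")

# Sub-second digits are truncated, not rounded:
# 123456 microseconds becomes .123 with timespec='milliseconds'
assert results["milliseconds"] == "2020-09-07T14:27:02.123+00:00"
```

The UTC offset is appended after whatever time component the string was truncated to.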

### `fromisoformat`

Added in Python 3.7, `datetime.fromisoformat()` is a class method that creates a datetime from *any* format that `datetime.isoformat` emits. It is guaranteed that:

```python
dt == datetime.fromisoformat(dt.isoformat(*args, **kwargs))
```

for all valid `dt`, `args` and `kwargs`, provided the chosen `timespec` does not truncate away nonzero components of `dt`. Note that it may not attach the same `tzinfo` object; the `datetime`s will merely represent the same *time*.
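
A short sketch of the round trip, under the assumption of a fixed-offset zone named `"EDT"` chosen purely for illustration. It shows that the round-tripped value compares equal while carrying a different `tzinfo` object, and that a truncating `timespec` only round-trips to the truncated value.

```python
from datetime import datetime, timezone, timedelta

# A fixed-offset zone with a custom name (illustrative only)
edt = timezone(timedelta(hours=-4), "EDT")
dt = datetime(2020, 9, 7, 14, 27, 2, 123456, tzinfo=edt)

rt = datetime.fromisoformat(dt.isoformat())

# Same point in time and same offset...
assert rt == dt
assert rt.utcoffset() == dt.utcoffset()

# ...but the zone name is not part of the string, so it is not restored
assert dt.tzname() == "EDT"
assert rt.tzname() != "EDT"

# A truncating timespec drops information that cannot be recovered
truncated = datetime.fromisoformat(dt.isoformat(timespec="minutes"))
assert truncated == dt.replace(second=0, microsecond=0)
assert truncated != dt
```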

If you are using a version of Python older than Python 3.7, `dateutil.parser.isoparse` can be used instead (though note that it will parse *any* valid ISO 8601 datetime, not just the ones output by `isoformat`).

### Exercise: Write a function to parse log messages

Assuming your logger is configured to emit logs with the following format:

```
"<datetime_isoformat> : <level> : <name> : <log message>"
```

Parse the log message into a structured dictionary format with the fields `datetime`, `level`, `name` and `message`.

**Examples**:

```
2019-04-18T18:46:37.211352-04:00 : DEBUG : __main__iso : This is a message
2019-04-18T18:46:37.213751-04:00 : WARNING : __main__iso : This is a warning
```

In [16]:
def parse_log_line(line: str) -> dict:
    try:
        # maxsplit=3 keeps any " : " inside the log message itself intact
        dtiso, lvl, name, msg = line.split(" : ", 3)
    except ValueError:
        raise ValueError('Invalid line format in log')

    return {
        'datetime': datetime.fromisoformat(dtiso),
        'level': lvl,
        'name': name,
        'message': msg
    }

parse_log_line('2019-04-18T18:46:37.211352-04:00 : DEBUG : __main__iso : This is a message')

{'datetime': datetime.datetime(2019, 4, 18, 18, 46, 37, 211352, tzinfo=datetime.timezone(datetime.timedelta(days=-1, seconds=72000))),
 'level': 'DEBUG',
 'name': '__main__iso',
 'message': 'This is a message'}

In [17]:
# Test your implementation
sd_tests.test_parse_log_line(parse_log_line)

Passed!


### Bonus Exercise: Configure the logger to output timestamps in an ISO 8601 format

Can you figure out how to set up the `logging` module to emit the format from the previous exercise? It's somewhat easy if you do not support microseconds, and stupidly difficult if you do!

In [None]:
from sd_answers import get_iso_logger

logger = get_iso_logger(__name__)
logger.debug("This is a message")
logger.warning("This is a warning")
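
One possible approach, sketched below, is to subclass `logging.Formatter` and override `formatTime` so that it builds the timestamp with `isoformat()`. This is not necessarily how `get_iso_logger` in `sd_answers` works; the names `IsoFormatter` and `make_iso_logger` are hypothetical, and the sketch assumes the system's local UTC offset is the one you want in the output.

```python
import io
import logging
from datetime import datetime, timezone


class IsoFormatter(logging.Formatter):
    """Hypothetical formatter that renders record times with isoformat()."""

    def formatTime(self, record, datefmt=None):
        # record.created is a POSIX timestamp with sub-second precision.
        # Build an aware UTC datetime, then convert to the local offset.
        dt = datetime.fromtimestamp(record.created, timezone.utc).astimezone()
        return dt.isoformat(timespec="microseconds")


def make_iso_logger(name, stream=None):
    # stream=None makes StreamHandler write to sys.stderr
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        IsoFormatter("%(asctime)s : %(levelname)s : %(name)s : %(message)s")
    )
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger


# Capture the output in memory so the emitted line can be inspected
buf = io.StringIO()
demo_logger = make_iso_logger("iso_demo", buf)
demo_logger.warning("This is a warning")
print(buf.getvalue(), end="")
```

Overriding `formatTime` sidesteps the `datefmt` string entirely, which is the part that makes microseconds awkward to support.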