Add basic documentation and examples
leonoverweel committed Aug 27, 2023
1 parent 609e95b commit db7c085
Showing 8 changed files with 120 additions and 1 deletion.
1 change: 1 addition & 0 deletions .gitattributes
@@ -0,0 +1 @@
*.csv filter=lfs diff=lfs merge=lfs -text
58 changes: 58 additions & 0 deletions README.md
@@ -0,0 +1,58 @@
# `tz-canary` - Time Zone Canary

In a perfect world, all time series data is time-zone-aware and stored in UTC.
Sadly, we do not live in a perfect world.
Time series data often lacks a time zone identifier, or worse, does not actually adhere to the time zone it claims to be in.

`tz-canary` inspects the Daylight Saving Time (DST) switches in a time series to determine which time zones the data could plausibly be in.
You can use it to **infer** the full set of plausible time zones, or to **validate** whether a given time zone is plausible for the data.
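
To illustrate the idea, the sketch below shows how a DST switch leaves a visible trace in timestamps whose time zone has been stripped. It uses only the standard library and is not `tz-canary`'s implementation; the helper names `naive_local_series` and `find_irregular_steps` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def naive_local_series(tz_name, start_utc, periods, step):
    """Build wall-clock timestamps in `tz_name`, then drop the tzinfo."""
    tz = ZoneInfo(tz_name)
    return [
        (start_utc + i * step).astimezone(tz).replace(tzinfo=None)
        for i in range(periods)
    ]


def find_irregular_steps(timestamps, step):
    """Return (previous, current) pairs whose spacing differs from `step`."""
    return [
        (prev, cur)
        for prev, cur in zip(timestamps, timestamps[1:])
        if cur - prev != step
    ]


step = timedelta(minutes=15)
# Six hours of 15-minute data spanning the spring DST switch in Amsterdam.
start = datetime(2023, 3, 25, 22, 0, tzinfo=timezone.utc)
series = naive_local_series("Europe/Amsterdam", start, periods=24, step=step)

jumps = find_irregular_steps(series, step)
print(jumps)
# The wall clock skips from 01:45 straight to 03:00 on 2023-03-26.
```

Only time zones that switch to DST at that local moment can explain such a jump, which is what narrows down the set of plausible zones.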

## Installation

TODO - after pushing v0.1.0 to PyPI

## Usage

The simplest way to use `tz-canary` is to validate a given time zone for a time series:

```python
import pandas as pd
from tz_canary import validate_time_zone

df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)

validate_time_zone(df.index, "Europe/Amsterdam") # will pass
validate_time_zone(df.index, "America/New_York") # will raise ImplausibleTimeZoneError
validate_time_zone(df.index, "UTC") # will raise ImplausibleTimeZoneError
```

You can also get the set of all plausible time zones for a time series:

```python
from pprint import pprint

import pandas as pd
from tz_canary import infer_time_zone

df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)

plausible_time_zones = infer_time_zone(df.index)
pprint(plausible_time_zones)

# Output:
# {zoneinfo.ZoneInfo(key='Africa/Ceuta'),
# zoneinfo.ZoneInfo(key='Arctic/Longyearbyen'),
# zoneinfo.ZoneInfo(key='Europe/Amsterdam'),
# ...
# zoneinfo.ZoneInfo(key='Europe/Zurich')}
```
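
The inferred set tends to be large because many zones follow identical DST rules over the period the data covers, so the data alone cannot tell them apart. A rough way to see this with the standard library (a sketch that does not use `tz-canary`; `utc_offsets` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def utc_offsets(tz_name, year):
    """Sample the UTC offset of `tz_name` once per hour throughout `year`."""
    tz = ZoneInfo(tz_name)
    t = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    offsets = []
    while t < end:
        offsets.append(t.astimezone(tz).utcoffset())
        t += timedelta(hours=1)
    return offsets


amsterdam = utc_offsets("Europe/Amsterdam", 2023)

# These zones share Amsterdam's offsets for every sampled hour of 2023.
print(utc_offsets("Europe/Zurich", 2023) == amsterdam)  # True
print(utc_offsets("Africa/Ceuta", 2023) == amsterdam)   # True

# New York switches on different dates and has a different base offset.
print(utc_offsets("America/New_York", 2023) == amsterdam)  # False
```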

TODO - add example for building a `TransitionsData` object and using that to validate/infer many time series.

## Development

TODO - add overview of setup (git LFS, poetry, pre-commit, pytest, etc.)

## Contributing

TODO - add contributing guidelines
3 changes: 3 additions & 0 deletions docs/data/example_data.csv
Git LFS file not shown
24 changes: 24 additions & 0 deletions docs/data/generate_example_data.py
@@ -0,0 +1,24 @@
import pandas as pd


def generate_example_data():
    df_example = pd.DataFrame(
        index=pd.date_range(
            start="2023-01-01",
            end="2023-12-31",
            freq="15T",
            tz="Europe/Amsterdam",
            name="datetime",
        ),
        data={"best_color": "orange"},
    )

    # We strip the time zone information from the index to simulate a file that does not
    # specify time zone information.
    df_example.index = df_example.index.tz_localize(None)

    df_example.to_csv("example_data.csv")


if __name__ == "__main__":
    generate_example_data()
17 changes: 17 additions & 0 deletions docs/examples/example_infer.py
@@ -0,0 +1,17 @@
from pprint import pprint

import pandas as pd

from tz_canary import infer_time_zone

df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)

plausible_time_zones = infer_time_zone(df.index)
pprint(plausible_time_zones)

# Output:
# {zoneinfo.ZoneInfo(key='Africa/Ceuta'),
# zoneinfo.ZoneInfo(key='Arctic/Longyearbyen'),
# zoneinfo.ZoneInfo(key='Europe/Amsterdam'),
# ...
# zoneinfo.ZoneInfo(key='Europe/Zurich')}
9 changes: 9 additions & 0 deletions docs/examples/example_validate.py
@@ -0,0 +1,9 @@
import pandas as pd

from tz_canary import validate_time_zone

df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)

validate_time_zone(df.index, "Europe/Amsterdam") # will pass
validate_time_zone(df.index, "America/New_York") # will raise ImplausibleTimeZoneError
validate_time_zone(df.index, "UTC") # will raise ImplausibleTimeZoneError
4 changes: 4 additions & 0 deletions tz_canary/__init__.py
@@ -0,0 +1,4 @@
from tz_canary.exceptions import ImplausibleTimeZoneError # noqa: F401
from tz_canary.infer import infer_time_zone # noqa: F401
from tz_canary.transitions_data import TransitionsData # noqa: F401
from tz_canary.validate import validate_time_zone # noqa: F401
5 changes: 4 additions & 1 deletion tz_canary/validate.py
@@ -46,10 +46,13 @@ def validate_time_zone(
         )

     given_time_zone = time_zone or dt_index.tz  # TODO - normalize to ZoneInfo
+    if isinstance(given_time_zone, str):
+        given_time_zone = ZoneInfo(given_time_zone)

     plausible_time_zones = infer_time_zone(dt_index, transition_data)

     if given_time_zone not in plausible_time_zones:
         raise ImplausibleTimeZoneError(
             f"The given time zone `{given_time_zone}` is not plausible for `dt_index`. "
-            f"Plausible time zones are: {plausible_time_zones}"
+            f"It may be one of: `{sorted([tz.key for tz in plausible_time_zones])}`."
         )
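
The string-to-`ZoneInfo` conversion in this hunk matters because the membership test compares against a set of `ZoneInfo` objects. A minimal standard-library sketch of that behaviour, using a hand-built stand-in for the inferred set (the two zones here are illustrative, not actual `infer_time_zone` output):

```python
from zoneinfo import ZoneInfo

# Hypothetical stand-in for the set returned by infer_time_zone.
plausible_time_zones = {ZoneInfo("Europe/Amsterdam"), ZoneInfo("Europe/Zurich")}

given = "Europe/Amsterdam"

# A plain string never compares equal to a ZoneInfo, so this check fails.
print(given in plausible_time_zones)  # False

# ZoneInfo(key) returns the cached instance for that key, so this succeeds.
print(ZoneInfo(given) in plausible_time_zones)  # True

# ZoneInfo objects define no ordering, so the error message sorts their keys.
keys = sorted(tz.key for tz in plausible_time_zones)
print(keys)  # ['Europe/Amsterdam', 'Europe/Zurich']
```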
