
ENH: Support writing timestamps with timezones with to_sql #22654

Merged Nov 8, 2018 (36 commits; changes shown from 18 commits)

Commits
776240b  ENH: Write timezone columns to SQL (Sep 9, 2018)
befd200  add tests and change type to Timestamp (Sep 10, 2018)
e9f122f  Lint error and comment our skipif (Sep 10, 2018)
969d2da  Handle DatetimeTZ block (Sep 10, 2018)
cc79b90  Ensure the datetimetz data is 2D first (Sep 11, 2018)
24dbaa5  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Sep 11, 2018)
6e86d58  Reading timezones returns timezones in UTC (Sep 11, 2018)
c7c4a7a  Add whatsnew and some touchups (Sep 12, 2018)
6aa4878  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Sep 14, 2018)
513bbc8  Test other dbs (Sep 14, 2018)
58772e1  timestamps are actually returned as naive local for myself, sqlite (Sep 14, 2018)
1a29148  localize -> tz_localize (Sep 14, 2018)
96e9188  sqlite doesnt support date types (Sep 15, 2018)
ded5584  type (Sep 15, 2018)
d575089  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Sep 15, 2018)
a7d1b3e  retest (Sep 15, 2018)
305759c  read_table vs read_query sqlite difference (Sep 16, 2018)
7a79531  Add note in the to_sql docs (Sep 19, 2018)
24823f8  Modify whatsnew (Sep 19, 2018)
7db4eaa  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Sep 19, 2018)
76e46dc  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Sep 21, 2018)
978a0d3  Address review (Sep 21, 2018)
8025248  Fix sqlalchemy ref (Sep 21, 2018)
0e89370  clarify documentation and whatsnew (Sep 26, 2018)
bab5cfb  Add an api breaking entry change as well (Sep 27, 2018)
de62788  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Oct 10, 2018)
e940279  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Oct 24, 2018)
8c754b5  Add new section in whatsnew (Oct 25, 2018)
e85842f  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Oct 26, 2018)
5af83f7  Fix whatsnew to reflect prior bug (Oct 26, 2018)
6b3a3f1  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Nov 6, 2018)
c4304ec  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Nov 7, 2018)
1054fdb  handle case when column is datetimeindex (Nov 7, 2018)
f21c755  Add new whatsnew entry (Nov 7, 2018)
f872ff7  Merge remote-tracking branch 'upstream/master' into writing_timezone_sql (Nov 7, 2018)
ef3b20f  don't check name (Nov 7, 2018)

Files changed
doc/source/whatsnew/v0.24.0.txt (1 addition, 0 deletions)

@@ -179,6 +179,7 @@ Other Enhancements
- :class:`IntervalIndex` has gained the :meth:`~IntervalIndex.set_closed` method to change the existing ``closed`` value (:issue:`21670`)
- :func:`~DataFrame.to_csv`, :func:`~Series.to_csv`, :func:`~DataFrame.to_json`, and :func:`~Series.to_json` now support ``compression='infer'`` to infer compression based on filename extension (:issue:`15008`).
The default compression for ``to_csv``, ``to_json``, and ``to_pickle`` methods has been updated to ``'infer'`` (:issue:`22004`).
- :func:`to_sql` now supports writing ``TIMESTAMP WITH TIME ZONE`` columns (:issue:`9086`)

Review comment (Contributor):
    Should be :meth:`DataFrame.to_sql`

- :func:`to_timedelta` now supports iso-formated timedelta strings (:issue:`21877`)
- :class:`Series` and :class:`DataFrame` now support :class:`Iterable` in constructor (:issue:`2193`)
- :class:`DatetimeIndex` gained :attr:`DatetimeIndex.timetz` attribute. Returns local time with timezone information. (:issue:`21358`)
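For context, a minimal usage sketch of the new behaviour (the connection
string and table name here are hypothetical; the UTC round trip shown is the
PostgreSQL behaviour exercised in the tests below):

    import pandas as pd
    from sqlalchemy import create_engine

    # Hypothetical connection string; any SQLAlchemy-supported database
    # works, but only some (e.g. PostgreSQL) have a TIMESTAMP WITH TIME
    # ZONE type.
    engine = create_engine('postgresql://user:password@localhost:5432/mydb')

    df = pd.DataFrame({
        'created': pd.date_range('2018-01-01 09:00', periods=3,
                                 tz='US/Pacific'),
    })

    # With this PR, the tz-aware column maps to TIMESTAMP(timezone=True)
    # instead of a naive DATETIME.
    df.to_sql('events', engine, index=False)

    # PostgreSQL normalizes stored offsets, so the values come back
    # tz-aware in UTC rather than in US/Pacific.
    result = pd.read_sql_table('events', engine)
    print(result['created'].dt.tz)  # UTC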
pandas/core/generic.py (7 additions, 0 deletions)

@@ -2306,6 +2306,13 @@ def to_sql(self, name, con, schema=None, if_exists='fail', index=True,
--------
pandas.read_sql : read a DataFrame from a table

Notes
-----
Timezone aware datetime columns will be written as
``Timestamp with timezone`` type with SQLAlchemy if supported by the
database. Otherwise, the datetimes will be stored as local, naive
timestamps.

Review comment (Member):
    I think "local, naive" timestamps can be a bit confusing. Is it local
    to the system running Python, or local to the timezone of the datetime
    (so basically what you get with ``.tz_localize(None)``)? I suppose the
    latter?

References
----------
.. [1] http://docs.sqlalchemy.org
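To make the review question above concrete, a small sketch (pure pandas, no
database involved) of the two possible readings of "local, naive"; per the
discussion, the docstring means the ``.tz_localize(None)`` reading:

    import pandas as pd

    ts = pd.Series(pd.date_range('2018-01-01 09:00', periods=2,
                                 tz='US/Pacific'))

    # Reading 1: keep the wall-clock time of the original timezone and
    # drop the tzinfo -- this is what .tz_localize(None) does.
    print(ts.dt.tz_localize(None))  # 09:00 wall times, naive

    # Reading 2: convert to another zone (e.g. UTC) first, then drop the
    # tzinfo; the wall-clock times shift (09:00 PST becomes 17:00 UTC).
    print(ts.dt.tz_convert('UTC').dt.tz_localize(None))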
pandas/io/sql.py (23 additions, 16 deletions)

@@ -592,12 +597,17 @@ def insert_data(self):
         data_list = [None] * ncols
         blocks = temp._data.blocks

-        for i in range(len(blocks)):
-            b = blocks[i]
+        for b in blocks:
             if b.is_datetime:
-                # convert to microsecond resolution so this yields
-                # datetime.datetime
-                d = b.values.astype('M8[us]').astype(object)
+                # return datetime.datetime objects
+                if b.is_datetimetz:
+                    # GH 9086: Ensure we return datetimes with timezone info
+                    # Need to return 2-D data; DatetimeIndex is 1D
+                    d = b.values.to_pydatetime()
+                    d = np.expand_dims(d, axis=0)
+                else:
+                    # convert to microsecond resolution for datetime.datetime
+                    d = b.values.astype('M8[us]').astype(object)
             else:
                 d = np.array(b.get_values(), dtype=object)
@@ -612,7 +617,7 @@

         return column_names, data_list

     def _execute_insert(self, conn, keys, data_iter):
-        data = [{k: v for k, v in zip(keys, row)} for row in data_iter]
+        data = [dict(zip(keys, row)) for row in data_iter]
         conn.execute(self.insert_statement(), data)

     def insert(self, chunksize=None):
@@ -741,8 +746,9 @@ def _get_column_names_and_types(self, dtype_mapper):

     def _create_table_setup(self):
         from sqlalchemy import Table, Column, PrimaryKeyConstraint

-        column_names_and_types = \
-            self._get_column_names_and_types(self._sqlalchemy_type)
+        column_names_and_types = self._get_column_names_and_types(
+            self._sqlalchemy_type
+        )

         columns = [Column(name, typ, index=is_index)
                    for name, typ, is_index in column_names_and_types]
@@ -841,14 +847,14 @@ def _sqlalchemy_type(self, col):

         from sqlalchemy.types import (BigInteger, Integer, Float,
                                       Text, Boolean,
-                                      DateTime, Date, Time)
+                                      DateTime, Date, Time, TIMESTAMP)

         if col_type == 'datetime64' or col_type == 'datetime':
-            try:
-                tz = col.tzinfo  # noqa
-                return DateTime(timezone=True)
-            except:
-                return DateTime
+            # GH 9086: TIMESTAMP is the suggested type if the column contains
+            # timezone information
+            if col.dt.tz is not None:
+                return TIMESTAMP(timezone=True)
+            return DateTime
         if col_type == 'timedelta64':
             warnings.warn("the 'timedelta' type is not supported, and will be "
                           "written as integer values (ns frequency) to the "
Expand Down Expand Up @@ -1275,8 +1281,9 @@ def _create_table_setup(self):
structure of a DataFrame. The first entry will be a CREATE TABLE
statement while the rest will be CREATE INDEX statements.
"""
column_names_and_types = \
self._get_column_names_and_types(self._sql_type_name)
column_names_and_types = self._get_column_names_and_types(
self._sql_type_name
)

pat = re.compile(r'\s+')
column_names = [col_name for col_name, _, _ in column_names_and_types]
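The datetimetz branch of ``insert_data`` above rests on two conversions; a
standalone sketch of what they produce, using a plain ``DatetimeIndex`` to
stand in for the internal block's ``.values``:

    import numpy as np
    import pandas as pd

    idx = pd.date_range('2013-01-01 09:00', periods=3, tz='US/Pacific')

    # to_pydatetime returns a 1-D object ndarray of tz-aware
    # datetime.datetime values, so the timezone survives the trip to the
    # database driver.
    d = idx.to_pydatetime()
    print(d.shape)          # (3,)
    print(d[0].tzinfo)      # US/Pacific, not None

    # insert_data works with 2-D (one row per column) block values, so
    # the 1-D result is expanded to shape (1, n).
    d = np.expand_dims(d, axis=0)
    print(d.shape)          # (1, 3)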
pandas/tests/io/test_sql.py (35 additions, 1 deletion)

@@ -962,7 +962,8 @@ def test_sqlalchemy_type_mapping(self):
                                             utc=True)})
         db = sql.SQLDatabase(self.conn)
         table = sql.SQLTable("test_type", db, frame=df)
-        assert isinstance(table.table.c['time'].type, sqltypes.DateTime)
+        # GH 9086: TIMESTAMP is the suggested type for datetimes with timezones
+        assert isinstance(table.table.c['time'].type, sqltypes.TIMESTAMP)

     def test_database_uri_string(self):

@@ -1362,9 +1363,42 @@ def check(col):
         df = sql.read_sql_table("types_test_data", self.conn)
         check(df.DateColWithTz)

+    def test_datetime_with_timezone_roundtrip(self):
+        # GH 9086
+        # Write datetimetz data to a db and read it back
+        # For dbs that support timestamps with timezones, should get back UTC
+        # otherwise naive data should be returned
+        expected = DataFrame({'A': date_range(
+            '2013-01-01 09:00:00', periods=3, tz='US/Pacific'
+        )})
+        expected.to_sql('test_datetime_tz', self.conn)
+
+        if self.flavor == 'postgresql':
+            # SQLalchemy "timezones" (i.e. offsets) are coerced to UTC
+            expected['A'] = expected['A'].dt.tz_convert('UTC')
+        else:
+            # Otherwise, timestamps are returned as local, naive
+            expected['A'] = expected['A'].dt.tz_localize(None)
+
+        result = sql.read_sql_table('test_datetime_tz', self.conn)
+        result = result.drop('index', axis=1)

Review comment (Member):
    You can put ``index=False`` in the ``to_sql`` call; then you don't need
    to drop the index here, I think.
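A sketch of this suggestion as a standalone round trip, using an in-memory
SQLite database for illustration (so the naive branch of the test applies):

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine('sqlite://')  # in-memory, for illustration
    expected = pd.DataFrame({'A': pd.date_range(
        '2013-01-01 09:00:00', periods=3, tz='US/Pacific'
    )})

    # index=False: no 'index' column is written, so there is nothing to
    # drop after reading back.
    expected.to_sql('test_datetime_tz', engine, index=False)

    # SQLite has no tz-aware storage; naive local wall-clock times return.
    expected['A'] = expected['A'].dt.tz_localize(None)
    result = pd.read_sql_table('test_datetime_tz', engine)
    pd.testing.assert_frame_equal(result, expected)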

+        tm.assert_frame_equal(result, expected)
+
+        result = sql.read_sql_query(
+            'SELECT * FROM test_datetime_tz', self.conn
+        )
+        result = result.drop('index', axis=1)
+        if self.flavor == 'sqlite':
+            # read_sql_query does not return datetime type like read_sql_table
+            assert isinstance(result.loc[0, 'A'], string_types)
+            result['A'] = to_datetime(result['A'])
+        tm.assert_frame_equal(result, expected)
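As an aside, the manual ``to_datetime`` call in the sqlite branch above can
also be expressed with ``read_sql_query``'s ``parse_dates`` argument; a
sketch, assuming (as the test indicates) that SQLite hands the column back
as naive wall-clock strings:

    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine('sqlite://')
    df = pd.DataFrame({'A': pd.date_range(
        '2013-01-01 09:00:00', periods=3, tz='US/Pacific'
    )})
    df.to_sql('test_datetime_tz', engine, index=False)

    # Without parse_dates the column comes back as strings; with it,
    # pandas runs to_datetime on read, so no manual conversion is needed.
    result = pd.read_sql_query('SELECT * FROM test_datetime_tz', engine,
                               parse_dates=['A'])
    print(result['A'].dtype)  # datetime64[ns]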

     def test_date_parsing(self):
         # No Parsing
         df = sql.read_sql_table("types_test_data", self.conn)

Review comment (Member):
    Shouldn't we rather test that it is not parsed, instead of removing it?
    (But I agree this currently looks like it is not doing much.)

Review comment (Member Author):
    Oh, I see. At a cursory glance I thought it was a duplicate of the line
    below; you're right, I will try to add an assert to this result as well.

+        expected_type = object if self.flavor == 'sqlite' else np.datetime64
+        assert issubclass(df.DateCol.dtype.type, expected_type)

         df = sql.read_sql_table("types_test_data", self.conn,
                                 parse_dates=['DateCol'])