removes date datatype support from the tap code #95

Merged
merged 6 commits into from Mar 12, 2024
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,8 @@
# Changelog

+## 3.0.0
+* Remove support for date datatype [#95](https://github.com/singer-io/tap-google-sheets/pull/95)
+
## 2.1.0
* Updates to run on python 3.11.7 [#94](https://github.com/singer-io/tap-google-sheets/pull/94)

8 changes: 4 additions & 4 deletions README.md
@@ -48,21 +48,21 @@ This tap:
- Invalid types: formulaValue, errorValue
- Then check:
- [effectiveFormat.numberFormat.type](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType)
-- Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE, TIME, DATE_TIME, SCIENTIFIC
+- Valid types: UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, TIME, DATE_TIME, SCIENTIFIC
- Determine JSON schema column data type based on the value and the above cell metadata settings.
-- If DATE, DATE_TIME, or TIME, set JSON schema format accordingly
+- If DATE_TIME, or TIME, set JSON schema format accordingly

[**values (GET)**](https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets.values/get)
- Endpoint: https://sheets.googleapis.com/v4/spreadsheets/${spreadsheet_id}/values/'${sheet_name}'!${row_range}?dateTimeRenderOption=SERIAL_NUMBER&valueRenderOption=UNFORMATTED_VALUE&majorDimension=ROWS
-- This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), dates and datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption)
+- This endpoint loops through sheets and row ranges to get the [unformatted values](https://developers.google.com/sheets/api/reference/rest/v4/ValueRenderOption) (effective values only), datetimes as [serial numbers](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption)
- Primary keys: _sdc_row
- Replication strategy: Full (GET file audit data for spreadsheet_id in config)
- Process/Transformations:
- Loop through sheets (compared to catalog selection)
- Send metadata for sheet
- Loop through ALL columns for columns having a column header
- Loop through ranges of rows for ALL rows in sheet available area max row (from sheet metadata)
-- Transform values, if necessary (dates, date-times, times, boolean).
+- Transform values, if necessary (date-times, times, boolean).
- Date/time serial numbers converted to date, date-time, and time strings. Google Sheets uses Lotus 1-2-3 [Serial Number](https://developers.google.com/sheets/api/reference/rest/v4/DateTimeRenderOption) format for date/times. These are converted to normal UTC date-time strings.
- Process/send records to target

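The serial-number conversion described in the README text above can be sketched as follows. This is a minimal illustration, not the tap's own `excel_to_dttm_str`: the helper name `serial_to_utc_str` is hypothetical, and the sketch assumes the usual spreadsheet epoch of 1899-12-30 and the date-time layout used in the test file's `Datetime` format. With DATE support removed, a date-only cell (a whole-number serial) would come out as a midnight date-time string.

```python
from datetime import datetime, timedelta, timezone

# Assumption: serial numbers count days since 1899-12-30 (Lotus 1-2-3 convention).
SHEETS_EPOCH = datetime(1899, 12, 30, tzinfo=timezone.utc)


def serial_to_utc_str(serial):
    """Hypothetical helper: convert a serial number to a UTC date-time string."""
    # Round to whole seconds so float noise in the fractional day is dropped.
    seconds = round(serial * 86400)
    dttm = SHEETS_EPOCH + timedelta(seconds=seconds)
    return dttm.strftime("%Y-%m-%dT%H:%M:%S.%fZ")


# A whole-number serial (date-only cell) and the same day at noon.
midnight = serial_to_utc_str(45363)
noon = serial_to_utc_str(45363.5)
print(midnight)
print(noon)
```

The integer part of the serial selects the day and the fractional part selects the time of day, which is why a date-formatted cell maps cleanly onto a date-time at 00:00:00.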
2 changes: 1 addition & 1 deletion setup.py
@@ -3,7 +3,7 @@
from setuptools import setup, find_packages

setup(name='tap-google-sheets',
-version='2.1.0',
+version='3.0.0',
description='Singer.io tap for extracting data from the Google Sheets v4 API',
author='jeff.huth@bytecode.io',
classifiers=['Programming Language :: Python :: 3 :: Only'],
14 changes: 4 additions & 10 deletions tap_google_sheets/schema.py
@@ -123,7 +123,7 @@ def get_sheet_schema_columns(sheet):
# INVALID: errorType, formulaType
# https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/other#ExtendedValue
#
-# column_number_format_type = UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY, DATE,
+# column_number_format_type = UNEPECIFIED, TEXT, NUMBER, PERCENT, CURRENCY,
# TIME, DATE_TIME, SCIENTIFIC
# https://developers.google.com/sheets/api/reference/rest/v4/spreadsheets/cells#NumberFormatType
#
@@ -136,18 +136,12 @@ def get_sheet_schema_columns(sheet):
col_properties = {'type': ['null', 'boolean', 'string']}
column_gs_type = 'boolValue'
elif column_effective_value_type == 'numberValue':
-if column_number_format_type == 'DATE_TIME':
+if column_number_format_type in ['DATE_TIME', 'DATE']:
col_properties = {
'type': ['null', 'string'],
'format': 'date-time'
}
column_gs_type = 'numberType.DATE_TIME'
-elif column_number_format_type == 'DATE':
-col_properties = {
-'type': ['null', 'string'],
-'format': 'date'
-}
-column_gs_type = 'numberType.DATE'
elif column_number_format_type == 'TIME':
col_properties = {
'type': ['null', 'string'],
@@ -215,11 +209,11 @@ def get_sheet_schema_columns(sheet):
}
columns.append(column)

-if column_gs_type in {'numberType.DATE_TIME', 'numberType.DATE', 'numberType.TIME', 'numberType'}:
+if column_gs_type in {'numberType.DATE_TIME', 'numberType.TIME', 'numberType'}:
col_properties = {
'anyOf': [
col_properties,
-{'type': ['null', 'string']} # all the date, time has string types in schema
+{'type': ['null', 'string']} # all the time has string types in schema
]
}
# add the column properties in the `properties` in json schema for the respective column name
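A condensed sketch of how the `numberValue` branch reads after these hunks. The function name `number_column_properties` is hypothetical, and the `'time'` format string and the plain-number fallback are assumptions, since those lines fall outside the visible hunks; the real `get_sheet_schema_columns` handles more cases.

```python
def number_column_properties(number_format_type):
    """Hypothetical condensed view of the numberValue branch after this change."""
    if number_format_type in ('DATE_TIME', 'DATE'):
        # DATE no longer has its own 'date' format; it shares the date-time path.
        col_properties = {'type': ['null', 'string'], 'format': 'date-time'}
        column_gs_type = 'numberType.DATE_TIME'
    elif number_format_type == 'TIME':
        # Assumption: the format value is not visible in the diff.
        col_properties = {'type': ['null', 'string'], 'format': 'time'}
        column_gs_type = 'numberType.TIME'
    else:
        # Assumption: simplified fallback for plain numbers.
        col_properties = {'type': ['null', 'number']}
        column_gs_type = 'numberType'

    # Mirror the anyOf wrapping: the value may also be emitted as a plain string.
    wrapped = {'anyOf': [col_properties, {'type': ['null', 'string']}]}
    return column_gs_type, wrapped


gs_type, props = number_column_properties('DATE')
print(gs_type)
print(props)
```

Under this reading, a column formatted as DATE in the sheet is cataloged exactly like a DATE_TIME column, which is the breaking change behind the major version bump.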
17 changes: 0 additions & 17 deletions tap_google_sheets/transform.py
@@ -80,19 +80,6 @@ def transform_sheet_datetime_data(value, unformatted_value, sheet_title, col_nam
sheet_title, col_name, col_letter, row_num, col_type))
return str(value)

-# transform date values in the sheet
-def transform_sheet_date_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type):
-if isinstance(unformatted_value, (int, float)):
-# passing both the formatted as well as the unformatted value, so we can use the string value in
-# case of any errors while date transform
-date_str, is_error = excel_to_dttm_str(value, unformatted_value)
-return_str = date_str if is_error else date_str[:10]
-return return_str
-else:
-LOGGER.info('WARNING: POSSIBLE DATA TYPE ERROR; SHEET: {}, COL: {}, CELL: {}{}, TYPE: {}'.format(
-sheet_title, col_name, col_letter, row_num, col_type))
-return str(value)
-
# transform time values in the sheet
def transform_sheet_time_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type):
if isinstance(unformatted_value, (int, float)):
@@ -231,10 +218,6 @@ def get_column_value(value, unformatted_value, sheet_title, col_name, col_letter
elif col_type == 'numberType.DATE_TIME':
return transform_sheet_datetime_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type)

-# DATE
-elif col_type == 'numberType.DATE':
-return transform_sheet_date_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type)
-
# TIME ONLY (NO DATE)
elif col_type == 'numberType.TIME':
return transform_sheet_time_data(value, unformatted_value, sheet_title, col_name, col_letter, row_num, col_type)
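The practical effect of removing `transform_sheet_date_data` can be sketched as below. The sample value is illustrative rather than taken from the PR, and the sketch assumes the date-time string layout used in the test file's `Datetime` format. The helper `recover_date` is hypothetical: it shows one way a downstream consumer might get the date-only value back.

```python
from datetime import datetime

DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"  # layout taken from the test's "Datetime" entry


def truncate_to_date(dttm_str):
    """What the removed function did on success: keep the 'YYYY-MM-DD' prefix."""
    return dttm_str[:10]


def recover_date(dttm_str):
    """Hypothetical downstream helper: parse the full string and keep only the date."""
    return datetime.strptime(dttm_str, DATETIME_FORMAT).date().isoformat()


dttm_str = "2024-03-12T00:00:00.000000Z"  # illustrative value for a date-formatted cell
before = truncate_to_date(dttm_str)  # emitted before this PR
after = dttm_str                     # emitted after this PR, via the DATE_TIME path
print(before, after, recover_date(after))
```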
4 changes: 1 addition & 3 deletions tests/test_google_sheets_datatypes.py
@@ -164,8 +164,7 @@ def test_run(self):
}
string_column_formats = {
"Datetime": "%Y-%m-%dT%H:%M:%S.%fZ",
-"Time": "%H:%M:%S",
-"Date": "%Y-%m-%d",
+"Time": "%H:%M:%S"
}

for record in record_data:
@@ -207,7 +206,6 @@ def test_run(self):
"Currency": "stringValue",
"Datetime": "numberType.DATE_TIME",
"Time": "numberType.TIME",
-"Date": "numberType.DATE",
"String": "stringValue",
"Number": "numberType",
"Boolean": "boolValue",
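A rough sketch of how the `string_column_formats` mapping in this test file could be used to check record values. The helper `matches_format` and the sample record are hypothetical; the actual test body is not visible in these hunks.

```python
from datetime import datetime

# Mapping as it stands after this PR (the "Date" entry is gone).
string_column_formats = {
    "Datetime": "%Y-%m-%dT%H:%M:%S.%fZ",
    "Time": "%H:%M:%S",
}


def matches_format(value, fmt):
    """Return True if value parses with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False


# Hypothetical record; column names follow the test's mapping.
record = {"Datetime": "2024-03-12T00:00:00.000000Z", "Time": "13:45:00"}
results = {
    column: matches_format(record[column], fmt)
    for column, fmt in string_column_formats.items()
}
print(results)
```

A bare `YYYY-MM-DD` string no longer satisfies any entry in the mapping, which matches the removal of the `"Date"` format.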
2 changes: 1 addition & 1 deletion tests/unittests/test_null_cell_format.py
@@ -74,7 +74,7 @@ def test_null_date_effectiveFormat(self):
"null",
"string"
],
-"format": "date"
+"format": "date-time"
}

sheet_json_schema, columns = schema.get_sheet_schema_columns(sheet)