Skip to content

Enhance Tap-sftp to Support Multiple Encoding Formats#44

Merged
sgandhi1311 merged 17 commits into
masterfrom
add-encoding-format
Oct 24, 2024
Merged

Enhance Tap-sftp to Support Multiple Encoding Formats#44
sgandhi1311 merged 17 commits into
masterfrom
add-encoding-format

Conversation

@sgandhi1311
Copy link
Copy Markdown
Member

@sgandhi1311 sgandhi1311 commented Oct 16, 2023

Description of change

Currently, Tap-sftp exclusively supports the utf-8 encoding format. In order to enhance its versatility and compatibility, we are planning to expand support to encompass all encoding formats specified by Python 3.9.

Manual QA steps

  • Done the manual testing by passing different encoding formats.
  • Kept the encoding format blank in the configuration, to test it sets the default to utf-8

Risks

  • Low

Rollback steps

  • revert this branch

@RushiT0122 RushiT0122 self-requested a review October 17, 2023 14:17
Copy link
Copy Markdown

@RushiT0122 RushiT0122 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add unit tests.

@sgandhi1311
Copy link
Copy Markdown
Member Author

Please add unit tests.

done

Comment thread tap_sftp/helper.py
Comment on lines +15 to +19
def write_record(stream_name, record, stream_alias=None, time_extracted=None, ensure_ascii=True):
"""Write a single record for the given stream.

"""
write_message(RecordMessage(stream=(stream_alias or stream_name),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def write_record(stream_name, record, stream_alias=None, time_extracted=None, ensure_ascii=True):
"""Write a single record for the given stream.
"""
write_message(RecordMessage(stream=(stream_alias or stream_name),
def write_record(stream_name, record, stream_alias=None, time_extracted=None, ensure_ascii=True):
"""
Write a single record for the given stream.
"""
write_message(RecordMessage(stream=(stream_alias or stream_name),

Comment thread tap_sftp/sync.py Outdated

singer.write_record(stream.tap_stream_id, to_write)
write_record(stream.tap_stream_id, to_write, ensure_ascii=False)
# singer.write_record(stream.tap_stream_id, to_write)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment.

Comment thread tests/unittests/test_encoding_format.py Outdated
Comment on lines +40 to +49
try:
with patch("sys.stdout", new_callable=StringIO) as mock_stdout:
do_discover(config)
output = mock_stdout.getvalue().strip()
expected_output = json.dumps(
{"streams": ["stream1", "stream2"]}, indent=2
)
self.assertEqual(output, expected_output)
except Exception as e:
self.fail(f"Exception occurred: {str(e)}")
Copy link
Copy Markdown

@RushiT0122 RushiT0122 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some queries:

  • Are we expecting any exception to occur in this test?
  • At what point we are expecting exception to occur?
  • Shouldn't we put try-catch block around it and use appropriate assertions to validate the exception?

conn = client.SFTPConnection("10.0.0.1", "username", port="22")

rows_synced = sync.sync_file(conn, {"filepath": "/root_dir/file.csv.gz", "last_modified": "2020-01-01"}, None, {"key_properties": ["id"], "delimiter": ","})
rows_synced = sync.sync_file(conn, {"filepath": "/root_dir/file.csv.gz", "last_modified": "2020-01-01"}, None, {"key_properties": ["id"], "delimiter": ","}, encoding_format=DEFAULT_ENCODING_FORMAT)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is too long with addition of encoding, please refactor it here and at other location as well.

Comment thread tests/test_sftp_encoding.py Outdated
Comment on lines +210 to +213
# Verify that the full table was syncd
for tap_stream_id in self.expected_first_sync_streams():
self.assertEqual(self.expected_first_sync_row_counts()[tap_stream_id],
record_count_by_stream[tap_stream_id])
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should validate the extracted records with actual source records.

Copy link
Copy Markdown

@RushiT0122 RushiT0122 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requested changes in-line.

Comment thread tests/unittests/test_encoding_format.py Outdated
Comment on lines +48 to +54
if encoding_format == "utf-8":
# Bypassing encoding check for `utf-8` as it is widely used
mock_is_valid_encoding.assert_not_called()
else:
mock_is_valid_encoding.assert_called_with("latin_1")
mock_discover_streams.assert_called_with(config, encoding_format)
self.assertEqual(captured_output, sys.stdout) # Ensure sys.stdout is restored
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should not use if-else for assertion specially in parameterised tests. We should separate these tests.

Comment thread tap_sftp/__init__.py Outdated
Comment on lines +21 to +23
if encoding_format != "utf-8":
if not is_valid_encoding(encoding_format):
raise Exception("Unknown Encoding - {}. Enter the valid encoding format".format(encoding_format))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_valid_encoding() returns True for utf-8, these if statements can be combined.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combined the if statements.

@sgandhi1311 sgandhi1311 merged commit f82d8e0 into master Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants