Enhance Tap-sftp to Support Multiple Encoding Formats #44
Merged
Conversation
RushiT0122
approved these changes
Oct 17, 2023
Member
Author
done |
RushiT0122
reviewed
Nov 28, 2023
Comment on lines +15 to +19
def write_record(stream_name, record, stream_alias=None, time_extracted=None, ensure_ascii=True):
    """Write a single record for the given stream.
    """
    write_message(RecordMessage(stream=(stream_alias or stream_name),
Suggested change
- def write_record(stream_name, record, stream_alias=None, time_extracted=None, ensure_ascii=True):
-     """Write a single record for the given stream.
-     """
-     write_message(RecordMessage(stream=(stream_alias or stream_name),
+ def write_record(stream_name, record, stream_alias=None, time_extracted=None, ensure_ascii=True):
+     """
+     Write a single record for the given stream.
+     """
+     write_message(RecordMessage(stream=(stream_alias or stream_name),
RushiT0122
reviewed
Nov 28, 2023
singer.write_record(stream.tap_stream_id, to_write)
write_record(stream.tap_stream_id, to_write, ensure_ascii=False)
# singer.write_record(stream.tap_stream_id, to_write)
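For context, the `ensure_ascii` flag here presumably mirrors the parameter of the same name in Python's `json.dumps`. That is an assumption, since the body of `write_record` is not shown in this thread. A minimal sketch of what the flag changes, using only the standard library:

```python
import json

record = {"id": 1, "name": "José"}

# ensure_ascii=True (the json.dumps default) escapes non-ASCII characters.
escaped = json.dumps(record, ensure_ascii=True)

# ensure_ascii=False writes the characters through unchanged.
readable = json.dumps(record, ensure_ascii=False)

print(escaped)   # {"id": 1, "name": "Jos\u00e9"}
print(readable)  # {"id": 1, "name": "José"}
```

Both strings parse back to the same record; the flag affects only how non-ASCII characters appear in the serialized output.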
RushiT0122
reviewed
Nov 28, 2023
Comment on lines +40 to +49
try:
    with patch("sys.stdout", new_callable=StringIO) as mock_stdout:
        do_discover(config)
        output = mock_stdout.getvalue().strip()
        expected_output = json.dumps(
            {"streams": ["stream1", "stream2"]}, indent=2
        )
        self.assertEqual(output, expected_output)
except Exception as e:
    self.fail(f"Exception occurred: {str(e)}")
Some questions:
- Do we expect any exception to occur in this test?
- If so, at what point do we expect it to occur?
- Shouldn't the try/except wrap only that point, with appropriate assertions to validate the exception?
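One way to address these questions, sketched under assumptions: drop the blanket `try`/`except`, because `unittest` already reports an unexpected exception as an error with its traceback, and use `assertRaises` only where an exception is the expected outcome. The `do_discover` function below is a hypothetical stand-in, not the tap's real implementation, and the missing-host failure is invented for illustration.

```python
import io
import json
import unittest
from unittest.mock import patch


def do_discover(config):
    """Hypothetical stand-in for the tap's discovery entry point."""
    if "host" not in config:
        raise KeyError("host")
    print(json.dumps({"streams": ["stream1", "stream2"]}, indent=2))


class TestDoDiscover(unittest.TestCase):
    def test_prints_catalog(self):
        # No try/except: an unexpected exception already fails the test
        # and unittest reports the traceback.
        with patch("sys.stdout", new_callable=io.StringIO) as mock_stdout:
            do_discover({"host": "10.0.0.1"})
        expected = json.dumps({"streams": ["stream1", "stream2"]}, indent=2)
        self.assertEqual(mock_stdout.getvalue().strip(), expected)

    def test_missing_host_raises(self):
        # When an exception is the expected outcome, assert it explicitly.
        with self.assertRaises(KeyError):
            do_discover({})
```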
RushiT0122
reviewed
Nov 28, 2023
  conn = client.SFTPConnection("10.0.0.1", "username", port="22")
- rows_synced = sync.sync_file(conn, {"filepath": "/root_dir/file.csv.gz", "last_modified": "2020-01-01"}, None, {"key_properties": ["id"], "delimiter": ","})
+ rows_synced = sync.sync_file(conn, {"filepath": "/root_dir/file.csv.gz", "last_modified": "2020-01-01"}, None, {"key_properties": ["id"], "delimiter": ","}, encoding_format=DEFAULT_ENCODING_FORMAT)
This line is too long with the addition of the encoding argument. Please refactor it here and at the other locations as well.
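A possible shape for that refactor, as a sketch: name the dictionary literals first and pass one argument per line. The `sync_file` stub below is hypothetical and only mirrors the argument order of the call in the diff, and the value `"utf-8"` for `DEFAULT_ENCODING_FORMAT` is an assumption.

```python
DEFAULT_ENCODING_FORMAT = "utf-8"  # assumed value of the constant in the diff


def sync_file(conn, file_info, stream, table_spec,
              encoding_format=DEFAULT_ENCODING_FORMAT):
    """Hypothetical stub with the same argument order as the call under review."""
    return (file_info["filepath"], table_spec["delimiter"], encoding_format)


# Name the literals first so the call itself stays short.
file_info = {"filepath": "/root_dir/file.csv.gz", "last_modified": "2020-01-01"}
table_spec = {"key_properties": ["id"], "delimiter": ","}

rows_synced = sync_file(
    None,  # the real test passes the SFTPConnection here
    file_info,
    None,
    table_spec,
    encoding_format=DEFAULT_ENCODING_FORMAT,
)
```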
RushiT0122
reviewed
Nov 29, 2023
Comment on lines +210 to +213
# Verify that the full table was syncd
for tap_stream_id in self.expected_first_sync_streams():
    self.assertEqual(self.expected_first_sync_row_counts()[tap_stream_id],
                     record_count_by_stream[tap_stream_id])
We should also validate the extracted records against the actual source records, not only the row counts.
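A rough sketch of such a content check, assuming the source is a delimited file in a known encoding. The helper `read_source_records` and the sample data are hypothetical, and `extracted_records` stands in for whatever the test collects from the tap's output.

```python
import csv
import io


def read_source_records(raw_bytes, encoding_format, delimiter=","):
    """Decode the source bytes and parse them into a list of row dicts."""
    text = raw_bytes.decode(encoding_format)
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


# Sample source file content, encoded the way the source system would store it.
source_bytes = "id,name\n1,José\n2,Zoë\n".encode("latin_1")
expected_records = read_source_records(source_bytes, "latin_1")

# Stand-in for the records collected from the tap's output.
extracted_records = [
    {"id": "1", "name": "José"},
    {"id": "2", "name": "Zoë"},
]

# The row count alone would also pass for corrupted text; compare content too.
assert len(extracted_records) == len(expected_records)
assert extracted_records == expected_records
```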
RushiT0122
reviewed
Dec 13, 2023
Comment on lines +48 to +54
if encoding_format == "utf-8":
    # Bypassing encoding check for `utf-8` as it is widely used
    mock_is_valid_encoding.assert_not_called()
else:
    mock_is_valid_encoding.assert_called_with("latin_1")
mock_discover_streams.assert_called_with(config, encoding_format)
self.assertEqual(captured_output, sys.stdout)  # Ensure sys.stdout is restored
We should not use if-else to choose assertions, especially in parameterised tests. These cases should be split into separate tests.
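A sketch of the split, under assumptions: `check_encoding` is a hypothetical stand-in that mirrors the logic as it stood at the time of this review (utf-8 skips validation), and the validator is passed in as an argument to keep the example self-contained, whereas the real tests patch `is_valid_encoding`.

```python
import unittest
from unittest.mock import Mock


def check_encoding(encoding_format, is_valid_encoding):
    """Hypothetical stand-in: utf-8 skips validation, other encodings are checked."""
    if encoding_format != "utf-8":
        if not is_valid_encoding(encoding_format):
            raise Exception("Unknown Encoding - {}".format(encoding_format))


class TestCheckEncoding(unittest.TestCase):
    def test_utf8_bypasses_validation(self):
        validator = Mock(return_value=True)
        check_encoding("utf-8", validator)
        validator.assert_not_called()

    def test_other_encodings_are_validated(self):
        # Each parameter takes the same path, so no branching is needed.
        for encoding_format in ("latin_1", "utf-16"):
            with self.subTest(encoding_format=encoding_format):
                validator = Mock(return_value=True)
                check_encoding(encoding_format, validator)
                validator.assert_called_once_with(encoding_format)
```

Each test now has a single expected outcome, so a failure points directly at the case that broke.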
RushiT0122
approved these changes
Dec 13, 2023
cosimon
reviewed
Oct 22, 2024
Comment on lines +21 to +23
if encoding_format != "utf-8":
    if not is_valid_encoding(encoding_format):
        raise Exception("Unknown Encoding - {}. Enter the valid encoding format".format(encoding_format))
Contributor
is_valid_encoding() returns True for utf-8, so these if statements can be combined.
Member
Author
Combined the if statements.
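For illustration, the combined check might look like the sketch below. The body of `is_valid_encoding` is not shown in this thread, so the `codecs.lookup` version here is an assumption, and `validate_encoding` is a hypothetical wrapper name.

```python
import codecs


def is_valid_encoding(encoding_format):
    """Assumed implementation: True if Python has a codec registered under this name."""
    try:
        codecs.lookup(encoding_format)
        return True
    except LookupError:
        return False


def validate_encoding(encoding_format):
    # A single check suffices because "utf-8" is itself a valid codec name.
    if not is_valid_encoding(encoding_format):
        raise Exception(
            "Unknown Encoding - {}. Enter the valid encoding format".format(encoding_format)
        )


validate_encoding("utf-8")    # passes
validate_encoding("latin_1")  # passes
```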
cosimon
approved these changes
Oct 22, 2024
Description of change
Currently, Tap-sftp supports only the utf-8 encoding format. To improve its versatility and compatibility, this change expands support to all encoding formats that Python 3.9 provides.
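To illustrate why a fixed utf-8 assumption is limiting, here is a small standard-library sketch (not code from the tap): bytes written in `latin_1` fail to decode as utf-8 but decode cleanly once the correct encoding is supplied.

```python
# "café" stored by a system that writes latin_1: the é becomes the single byte 0xE9.
raw = "café".encode("latin_1")

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte sequence in utf-8, but no continuation bytes follow.
    utf8_ok = False

# Decoding with the encoding the file was actually written in recovers the text.
decoded = raw.decode("latin_1")
```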
Manual QA steps
Risks
Rollback steps