Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can tokenize using custom delimiters from UNA segment #3

Closed
wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented May 12, 2019

For now we only support 4 out of 6 different delimiters.
We are currently missing the Decimal mark delimiter (.). The other delimiter is the Repetition separator but it is not supported in some versions of edifact.

@ghost ghost requested a review from orhantoy May 12, 2019 17:40
Copy link
Owner

@orhantoy orhantoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work Chaymae 💪

I have a few suggestions, see the inline comments.

It would also be good to have a simple, high-level test case that tests actually parsing a message including a UNA segment - the tests you've added just test the tokenizer.
I'm saying this because we might need to strip/ignore the UNA segment in the parser but that would become easier to tell if you add this high-level test.

We are currently missing the Decimal mark delimiter (.).

This is intentional because it is up to the user to determine if a certain value is considered a numeric.
We could provide a helper method for this that uses the decimal mark delimiter to convert a value to a number. But for now let's just ignore it.

The other delimiter is the Repetition separator but it is not supported in some versions of edifact.

Not sure what the effect of that separator would be? Maybe something to consider in a later PR?

@@ -7,6 +7,7 @@
/spec/reports/
/tmp/
/Gemfile.lock
*.swp
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a better approach is to configure this on your local machine: https://help.github.com/en/articles/ignoring-files#create-a-global-gitignore

# TODO: Should check if the message starts with `UNA`, and then extract the different separator/terminator settings to be used for initializing the tokenizer.
new
def for_message(edifact_message)
if edifact_message =~ /\AUNA/
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if edifact_message =~ /\AUNA/
if edifact_message[0..2] == 'UNA'

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the second one is better than the first ?? 🤔

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex seems like overkill for something like this.

def for_message(edifact_message)
if edifact_message =~ /\AUNA/
# Example: UNA:+.? '
new(release_character: edifact_message[6], segment_terminator: edifact_message[8], data_element_separator: edifact_message[4], component_data_element_separator: edifact_message[3])
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be more clear which characters after UNA are what. I'm thinking something like this:

component_data_element_separator, data_element_separator, _decimal_mark, release_character, _reserved, segment_terminator = edifact_message.slice(3, 6).split('')

It becomes a super long line so if it could be broken up that would be nice.

@@ -1,4 +1,36 @@
RSpec.describe Edifunct::Tokenizer do
describe ".for_message" do

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️

end

context "when UNA header is missing" do

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✂️

@ghost ghost mentioned this pull request Jun 9, 2019
@orhantoy
Copy link
Owner

orhantoy commented Mar 4, 2020

I made some modifications and pushed up 3d9f3f9, which is basically the changes you had here but with a few changes. Thanks for the contribution ✌️

@orhantoy orhantoy closed this Mar 4, 2020
@orhantoy orhantoy deleted the feature/dynamic-delimiters branch March 4, 2020 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants