Can tokenize using custom delimiters from UNA segment #3

ghost · 2019-05-12T17:40:38Z

For now, only 4 of the 6 UNA delimiters are supported.
The decimal mark delimiter (.) is currently missing. The other unsupported delimiter is the repetition separator, which does not exist in some versions of EDIFACT.

orhantoy

Nice work Chaymae 💪

I have a few suggestions, see the inline comments.

It would also be good to have a simple, high-level test case that actually parses a message including a UNA segment; the tests you've added only exercise the tokenizer.
I'm saying this because we might need to strip or ignore the UNA segment in the parser, and a high-level test would make that easier to tell.

> We are currently missing the Decimal mark delimiter (.).

This is intentional because it is up to the user to decide whether a given value should be treated as numeric.
We could provide a helper method that uses the decimal mark delimiter to convert a value to a number. But for now let's just ignore it.
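
For illustration only, such a helper might look like the sketch below. The name `edifact_number` and its `decimal_mark:` keyword are hypothetical and not part of Edifunct; the sketch assumes the caller already knows the decimal mark from the UNA segment and only wants plain decimal values converted.

```ruby
# Hypothetical helper, not part of Edifunct: converts an EDIFACT numeric
# string to a Float using the decimal mark announced in the UNA segment.
def edifact_number(value, decimal_mark: ".")
  # Replace the message-specific decimal mark with "." so Kernel#Float
  # can parse it. String#sub with a String pattern matches literally.
  normalized = value.sub(decimal_mark, ".")

  # Kernel#Float raises ArgumentError for input that is not a valid number.
  Float(normalized)
end

edifact_number("12,5", decimal_mark: ",") # message declared a decimal comma
edifact_number("12.5")                    # default decimal mark
```

A real helper would probably return something exact such as `BigDecimal` for monetary amounts; `Float` is used here only to keep the sketch dependency-free.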

> The other delimiter is the Repetition separator but it is not supported in some versions of edifact.

Not sure what the effect of that separator would be? Maybe something to consider in a later PR?

orhantoy · 2019-05-13T23:21:08Z

.gitignore

@@ -7,6 +7,7 @@
 /spec/reports/
 /tmp/
 /Gemfile.lock
+*.swp


I think a better approach is to configure this on your local machine: https://help.github.com/en/articles/ignoring-files#create-a-global-gitignore

orhantoy · 2019-05-13T23:24:27Z

lib/edifunct/tokenizer.rb

-        # TODO: Should check if the message starts with `UNA`, and then extract the different separator/terminator settings to be used for initializing the tokenizer.
-        new
+      def for_message(edifact_message)
+        if edifact_message =~ /\AUNA/


Suggested change

-        if edifact_message =~ /\AUNA/
+        if edifact_message[0..2] == 'UNA'

ghost · May 25, 2019

Why is the second one better than the first? 🤔

orhantoy · May 26, 2019

Regex seems like overkill for something like this.
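
To make the comparison concrete, here is a small sketch showing that the two checks from the thread agree, alongside `String#start_with?`, a core Ruby method that was not raised in this review but expresses the same intent. The method name `una_checks` and the sample messages are made up for the example.

```ruby
# Hypothetical comparison helper: runs three equivalent prefix checks.
def una_checks(message)
  {
    regex: !!(message =~ /\AUNA/),        # original version in the PR
    slice: message[0..2] == "UNA",        # reviewer's suggestion
    prefix: message.start_with?("UNA")    # core String method, same intent
  }
end

una_checks("UNA:+.? 'UNB+UNOC:3+SENDER+RECEIVER'")
una_checks("UNB+UNOC:3+SENDER+RECEIVER'")
```

All three return the same answer for these inputs; the difference is readability, not behaviour.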

orhantoy · 2019-05-13T23:36:27Z

lib/edifunct/tokenizer.rb

+      def for_message(edifact_message)
+        if edifact_message =~ /\AUNA/
+          # Example: UNA:+.? '
+          new(release_character: edifact_message[6], segment_terminator: edifact_message[8], data_element_separator: edifact_message[4], component_data_element_separator: edifact_message[3])


I think it could be clearer which character after UNA is which. I'm thinking of something like this:

component_data_element_separator, data_element_separator, _decimal_mark, release_character, _reserved, segment_terminator = edifact_message.slice(3, 6).split('')

It becomes a very long line, so it would be nice if it could be broken up.
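
One way the destructuring could be broken across lines is sketched below. `TokenizerSettings` and `settings_for_message` are stand-ins invented so the example runs on its own; they are not the actual `Edifunct::Tokenizer` API, and the fallback assumes the standard default service characters `UNA:+.? '`.

```ruby
# Stand-in for the tokenizer's configuration, defined only for this sketch.
TokenizerSettings = Struct.new(
  :component_data_element_separator,
  :data_element_separator,
  :release_character,
  :segment_terminator,
  keyword_init: true
)

# Assumed defaults, matching the standard service string UNA:+.? '
DEFAULT_SETTINGS = {
  component_data_element_separator: ":",
  data_element_separator: "+",
  release_character: "?",
  segment_terminator: "'"
}.freeze

def settings_for_message(edifact_message)
  return TokenizerSettings.new(**DEFAULT_SETTINGS) unless edifact_message[0..2] == "UNA"

  # The six service characters directly follow "UNA", e.g. UNA:+.? '
  # Names with a leading underscore are read but deliberately unused.
  component_data_element_separator,
    data_element_separator,
    _decimal_mark,
    release_character,
    _reserved,
    segment_terminator = edifact_message.slice(3, 6).chars

  TokenizerSettings.new(
    component_data_element_separator: component_data_element_separator,
    data_element_separator: data_element_separator,
    release_character: release_character,
    segment_terminator: segment_terminator
  )
end

settings_for_message("UNA|#.! ~UNB#UNOC|3~")
```

Putting one name per line keeps the order of the UNA characters visible at a glance, which is the point of the suggestion above.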

orhantoy · 2019-05-13T23:36:48Z

spec/edifunct/tokenizer_spec.rb

@@ -1,4 +1,36 @@
 RSpec.describe Edifunct::Tokenizer do
+  describe ".for_message" do
+


orhantoy · 2019-05-13T23:36:59Z

spec/edifunct/tokenizer_spec.rb

+    end
+
+    context "when UNA header is missing" do
+


orhantoy · 2020-03-04T22:10:51Z

I pushed up 3d9f3f9, which is basically the changes you had here with a few modifications. Thanks for the contribution ✌️

chaymaeBZ added 2 commits May 12, 2019 19:36

can tokenize using custom delimiters

1714931

remove todo comment

5791ebd

ghost requested a review from orhantoy May 12, 2019 17:40

orhantoy requested changes May 13, 2019

ghost mentioned this pull request Jun 9, 2019

Wrong UNA segment tokenization #4

Closed

orhantoy closed this Mar 4, 2020

orhantoy deleted the feature/dynamic-delimiters branch March 4, 2020 22:10
