Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse unit starting with dot #402

Closed
eimerej opened this issue Sep 17, 2020 · 8 comments
Closed

Parse unit starting with dot #402

eimerej opened this issue Sep 17, 2020 · 8 comments

Comments

@eimerej
Copy link

eimerej commented Sep 17, 2020

I have Las with unit starting with dot:
TDEP ..1IN :0.1-ft Frame Depth {F13.4}

In this case, the first dot is part of the mnemonic instead of the unit.
This is a special behavior implemented in lasio to support '..' caused by mnemonic abbreviation.
From the function configure_metadata_patterns(line, section_name)

if ".." in line and section_name == "Curves":
   # ...
    name_re = name_with_dots_re

According to the standard (2.0 and 3.0) mnemonic should not contains dot but unit could.

I see two way of solving this issue:

  1. follow the standard and do not use the name_with_dots_re regexp
  2. if there is some space between the mnemonic and the first dot assume that the dot belong to the unit and not to the mnemonic:
# replace
if ".." in line and section_name == "Curves":
#  by something like that
if re.search('[^ ]\.\.', line) and section_name == "Curves":
@kinverarity1
Copy link
Owner

Hi @eimerej thank you for raising this issue! Excellent points, and thank you for the PR also!

@dcslagel
Copy link
Collaborator

dcslagel commented Dec 24, 2020

Hi @kinverarity1, @eimerej,

Thinking through the [name][.][unit] parsing scenarios. While Lasio doesn't currently parse all of these in the following (expected?) ways, 1 - 4 are do-able with just the information in a given metadata line. Number 5 shows situations where parsing is ambiguous. Draft branch at #413, demonstrates test cases for these examples so you can see the current state of parsing behavior.

Could you check that these scenarios demonstrate the desired parsing behavior?

  1. TDEP ..1IN :0.1-ft Frame Depth {F13.4}
  • This could parse clearly as Name = 'TDEP', Unit = .1IN
  • Pattern [space][dot-delimiter][unit][space]
  1. TDEP.. 1IN :0.1-ft Frame Depth {F13.4}
  • This could parse clearly as Name = 'TDEP.', Unit = "", Value = "1IN"
  • Pattern [abbreviation-dot][dot-delimiter][unit][space]
  1. TDEP. . .1IN :0.1-ft Frame Depth {F13.4}
  • This could parse clearly as Name = 'TDEP.', Unit = "", Value = ".1IN"
  • Pattern [abbreviation-dot][space][dot-delimiter][unit][space]
  1. TDEP...1IN :0.1-ft Frame Depth {F13.4}
  • This could parse clearly as Name = 'TDEP.', Unit = .1IN
  • Pattern [abbreviation-dot][dot-delimiter][unit][space]
  1. TDEP..1IN :0.1-ft Frame Depth {F13.4}
    and
    TDEP. .1IN :0.1-ft Frame Depth {F13.4}
  • These cases are ambiguous as to how to parse Name and Unit
  • A sensible default approach is probably to add a configuration option that enables the user to specify which way to parse, set the config-option to parse in one way and log a warning message that informs the user of the ambiguity and configuration option to use adjust the parsing.

DC

@kinverarity1
Copy link
Owner

Sorry about the delay on getting to this issue, it's a tricky one! @dcslagel I think your points 1 through 5 above are correct. I'm not fussed about how to parse example 5 - whatever the parsing spits out seems reasonable.

@dcslagel How would you like me to proceed with #413?

@dcslagel
Copy link
Collaborator

dcslagel commented Jan 19, 2021

Hi @kinverarity1,

Okay, this is good feedback! Thanks. Hold off on proceeding with #413. The tests will fail for now. I've been mulling over how to implement handling these in code. I hope to work on it in the next couple of weeks.

DC

@dcslagel
Copy link
Collaborator

dcslagel commented Feb 5, 2021

Hi @eimerej,

The initial example in this issue TDEP ..1IN :0.1-ft Frame Depth {F13.4}, is it missing a value field? Is your expectation that this line should parse into fields like so:
name == "TDEP"
unit == ".1IN"
value == ""
descr == "0.1-ft Frame Depth {F13.4}"

Thanks,
DC

@eimerej
Copy link
Author

eimerej commented Feb 5, 2021

Hi @dcslagel,
Yes, there is no value in this case.

@dcslagel
Copy link
Collaborator

dcslagel commented Feb 17, 2021

Just a quick update, I'm still working on this, just delayed with other activities..

.. Making progress...
DC

@dcslagel
Copy link
Collaborator

dcslagel commented Mar 3, 2021

Hi @eimerej , @kinverarity1 ,

After spending a fair amount of time looking that this, it @eimerej 's solution in pull-request #403 is the right one for now. It takes care of the cases .1 and .2 that I listed above and the cases .3 and .4 are highly unlikely and can be solved if there is ever a real-world case reported. Case .5, @kinverarity1 , mentioned it is okay to let it resolve with the current logic in Laisio.

So all that said, I encourage merging pull request #403. Thank you for allowing me to dig in to this problem for extended period of time..

DC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants