
Error when LAS data contains text with spaces enclosed in double quotes #271

Closed · ghost opened this issue Mar 6, 2019 · 8 comments

@ghost commented Mar 6, 2019

I am encountering an error trying to load a LAS file that has a parameter which sometimes contains spaces. This seems to occur for many LAS files that contain picks: the pick name is a text field which may contain spaces.

Is it possible to replace these spaces with underscores, or something similar?

Here's an example LAS file:

~Version
 VERS              .         2.0                           : CWLS LOG ASCII STANDARD - VERSION 2.0
 WRAP              .         NO                            : ONE LINE PER DEPTH STEP
 DLM               .         SPACE                         : DELIMITING CHARACTER(SPACE TAB OR COMMA)
~Well Information
#_______________________________________________________________________________
#
#PARAMETER_NAME    .UNIT     VALUE                         : DESCRIPTION
#_______________________________________________________________________________
STRT               .m        321.16                        : First reference value
STOP               .m        3188.59                       : Last reference value
STEP               .m        0                             : Step increment
NULL               .         -9999                         : Missing value
WELL               .         xxx                           : Well name
~Curve Information
#_______________________________________________________________________________
#
#LOGNAME           .UNIT     LOG_ID                        : DESCRIPTION
#_______________________________________________________________________________
MD                 .m                                      :  
ZONE               .unitless                               :  
~Ascii
321.16     pick_alpha
1753.2     pick_beta    
1953.5     "pick gamma"      
2141.05    "pick delta"    
2185.34    pick_epsilon

Here is what I get from lasio:

# Try to load data as pandas table
las = lasio.read('data/test_file.las')
las.df()

The result is badly parsed:

            ZONE
MD
321.16      pick_alpha
1753.2      pick_beta
1953.5      "pick
gamma"      2141.05
"pick       delta"
2185.34     pick_epsilon

PS thanks for your work on the library so far, it is proving tremendously useful!

@ghost (Author) commented Mar 6, 2019

I've found a workaround which seems to be okay, passing a couple of regex substitutions with the null_policy keyword:

LAS_CLEANERS = [
    # matches two words within double quotes
    (r'"([a-zA-Z]*) ([a-zA-Z]*)"', r'\1_\2'),

    # matches three words within double quotes
    (r'"([a-zA-Z]*) ([a-zA-Z]*) ([a-zA-Z]*)"', r'\1_\2_\3')
]

las = lasio.read('data/test_file.las', null_policy=LAS_CLEANERS)

I'm not sure how to generalise this to an arbitrary number of words, but it works for now :)
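The two fixed patterns above can be generalised with a single regex that collapses any double-quoted span, however many words it contains. This is a sketch that pre-processes the file text before parsing, rather than going through lasio's `null_policy`; the helper name `unquote_and_join` is mine, not part of lasio:

```python
import re

def unquote_and_join(text):
    """Collapse each double-quoted span into one underscore-joined
    token, so whitespace-delimited parsers see a single field."""
    return re.sub(
        r'"([^"]*)"',
        lambda m: m.group(1).replace(" ", "_"),
        text,
    )

line = '2141.05    "pick delta gamma"'
print(unquote_and_join(line))  # 2141.05    pick_delta_gamma
```

The cleaned text could then be handed to lasio via a file-like object, e.g. `lasio.read(io.StringIO(unquote_and_join(raw_text)))`, assuming your lasio version accepts file-like input.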

@kinverarity1 (Owner) commented
Thanks for the report! That's a neat work-around.

Unfortunately lasio is still only written to support numerical data sections (i.e. LAS v2), so I wouldn't expect that to work yet. I'd like to support text data sections but haven't had time to work on it.

@ghost (Author) commented Mar 6, 2019

OK, fair enough. I'll close the issue for now then, as we have a simple workaround.

@ghost ghost closed this as completed Mar 6, 2019
@kinverarity1 (Owner) commented

No worries - it's certainly something I'd like to have implemented. Thanks for the example!

@ghost (Author) commented Feb 18, 2021

Hi @kinverarity1 ,

I've realised my solution above has a major disadvantage: passing custom substitutions replaces the default NULL_POLICY, so regular null values are no longer substituted!

Is there a way to use some custom regex substitutions, in combination with a NULL_POLICY from lasio.defaults?

@ghost ghost reopened this Feb 18, 2021
@Anjum48 commented Feb 19, 2021

At the moment, if one or more of the columns are non-numeric, the NULL_POLICY fails to replace missing values with np.nan.

This is because, when the array below is created,

lasio/lasio/reader.py

Lines 456 to 458 in 817fb82

array = np.array(
    [i for i in items(file_obj, start_line_no=line_nos[0], end_line_no=line_nos[1])]
)

the array will be of string type (e.g. <U32), and the missing-value markers in lasio.defaults.NULL_SUBS won't be matched, since a string such as "-999" is being compared against a number such as -999.
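The dtype promotion described above can be seen with a minimal NumPy sketch (the row data here is illustrative, not from a real LAS file):

```python
import numpy as np

# When any column is non-numeric, np.array() stores *every* value
# as a fixed-width Unicode string, so numeric NULL markers stop
# matching their numeric counterparts.
rows = [
    ["321.16", "pick_alpha"],
    ["-9999", "pick_beta"],  # -9999 is meant to be a null marker
]
array = np.array(rows)

print(array.dtype)       # a Unicode string dtype such as <U10
print("-9999" == -9999)  # False: the string never equals the number
```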

@kinverarity1 (Owner) commented

Thanks @Anjum48 - I have opened a new issue for that.

@kinverarity1 (Owner) commented

@Connossor I have changed the data section code to split into items while respecting quoted strings. Hopefully that fixes the original issue you raised, although obviously other NULL values are still being ignored per @Anjum48's comment. I've opened #422 to deal with that.
