Allow different data types per curve in data section reader #461

kinverarity1 · 2021-04-25T03:39:03Z

This commit changes the data section reader function (reader.py:read_data_section_iterative) to return a generator that yields a 1D data ndarray for each curve in turn. Previously, the function returned a single 2D ndarray.
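
To illustrate the change in return type, here is a minimal sketch of the idea. It is not the code in this PR: the function name `iter_curve_columns` is hypothetical, and plain Python lists stand in for numpy arrays so the example has no dependencies. Only the `n_columns` argument name is taken from the PR.

```python
def iter_curve_columns(flat_values, n_columns):
    """Yield one 1D sequence per curve from a flat, row-major list of values.

    Hypothetical stand-in for the reshaping done inside the data section
    reader; plain lists are used here instead of numpy arrays.
    """
    if n_columns <= 0 or len(flat_values) % n_columns != 0:
        raise ValueError("number of values is not a multiple of n_columns")
    for curve_idx in range(n_columns):
        # Every n_columns-th value, starting at curve_idx, belongs to one curve.
        yield flat_values[curve_idx::n_columns]


# Two rows of three columns, flattened in file (row-major) order.
flat = [100.0, 1.1, "a", 100.5, 1.2, "b"]
columns = list(iter_curve_columns(flat, n_columns=3))
```

Because each curve arrives separately, the caller can convert each one to its own data type instead of forcing a single type on a 2D array.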

It also:

- moves reshaping of the data array to 2D so that it now occurs solely inside the data section reader function, which has a new argument "n_columns" to allow this
- adds a dtypes kwarg to las.py:LASFile.read
- adds the reader.py:identify_dtypes_from_data function to implement dtypes="auto" (the default value) by using the first row of data to automatically identify column data types
- changes las.py:LASFile.read to cater for the above
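
The first-row detection described above can be sketched roughly as follows. This is an assumption about the general approach, not a copy of reader.py:identify_dtypes_from_data; the name `guess_dtypes_from_first_row` and the sample row are made up for the example.

```python
def guess_dtypes_from_first_row(first_row):
    """Return float or str for each column, judged from the first data row only."""
    dtypes = []
    for token in first_row:
        try:
            float(token)
        except ValueError:
            # Not parseable as a number, so keep this column as text.
            dtypes.append(str)
        else:
            dtypes.append(float)
    return dtypes


# Hypothetical first data row with three numeric columns and one text column.
first_row = "1670.000 123.450 2550.000 SANDSTONE".split()
dtypes = guess_dtypes_from_first_row(first_row)
```

In this sketch only the first row is inspected, so a column whose first value is numeric but which contains text further down would be guessed as float.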

Consequences:

- This PR will fundamentally resolve #439 (Data section containing non-numeric chars is parsed entirely as str)
- Provides a basis for fixing e.g. #416 (Time-indexed LAS file)
- Possibly partially resolves (and provides a way to fix) #227 (NULL value replacing valid value in DEPT)

@dcslagel Eventually we will need to merge this into PR #452 (Add a numpy engine for reading using numpy.genfromtxt()) or rebase - I'm happy to help out with this.
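
As a rough illustration of why per-curve data types address the symptom named in the title of #439, the sketch below contrasts a single shared conversion with a per-curve one. Both functions are hypothetical and simplified; they are not lasio's code, and plain lists of strings stand in for the parsed data section.

```python
def convert_shared_dtype(columns):
    """One data type for the whole data section (the symptom in #439)."""
    try:
        return [[float(v) for v in col] for col in columns]
    except ValueError:
        # A single non-numeric value forces every column to stay as str.
        return [[str(v) for v in col] for col in columns]


def convert_per_curve(columns):
    """Each curve is converted on its own, so numeric curves stay numeric."""
    converted = []
    for col in columns:
        try:
            converted.append([float(v) for v in col])
        except ValueError:
            converted.append([str(v) for v in col])
    return converted


# Hypothetical data: one numeric curve and one date-like text curve.
raw = [["100.0", "100.5"], ["2021-04-25", "2021-04-26"]]
shared = convert_shared_dtype(raw)
per_curve = convert_per_curve(raw)
```

With the shared conversion both curves remain strings, whereas the per-curve conversion returns floats for the first curve and strings for the second.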

Details:

- Coverage increased from 85% to 86%
- Benchmark test time decreased from 1.4 s to 1.0 s (?)

---------- coverage: platform linux, python 3.6.13-final-0 -----------
Name                       Stmts   Miss  Cover
----------------------------------------------
lasio/__init__.py             13      2    85%
lasio/convert_version.py      20     20     0%
lasio/defaults.py             11      0   100%
lasio/examples.py             42     10    76%
lasio/excel.py                88     34    61%
lasio/exceptions.py            6      0   100%
lasio/las.py                 419     60    86%
lasio/las_items.py           190     29    85%
lasio/las_version.py          50     15    70%
lasio/reader.py              435     27    94%
lasio/writer.py              171      9    95%
----------------------------------------------
TOTAL                       1445    206    86%
Coverage XML written to file coverage.xml


----------------------------------------------- benchmark: 1 tests ----------------------------------------------
Name (time in s)                Min     Max    Mean  StdDev  Median     IQR  Outliers     OPS  Rounds  Iterations
-----------------------------------------------------------------------------------------------------------------
test_read_v12_sample_big     1.0652  1.0758  1.0700  0.0041  1.0694  0.0062       2;0  0.9346       5           1
-----------------------------------------------------------------------------------------------------------------

lasio/las.py

dcslagel

This change looks good. I had one small change request.

kinverarity1 added 2 commits April 25, 2021 12:57

Add tests for dtypes kwarg to LASFile.read

290b24f

kinverarity1 added the data-section-parser label (a bug or enhancement relating to the data section parser) Apr 25, 2021

kinverarity1 added 2 commits April 25, 2021 13:20

Add test for dtypes=False

270c6b2

Implement dtypes=False

06acbaf

kinverarity1 requested a review from dcslagel April 25, 2021 04:18

kinverarity1 mentioned this pull request Apr 25, 2021

Add a numpy engine for reading using numpy.genfromtxt() #452

Merged

dcslagel requested changes Apr 25, 2021

lasio/las.py (outdated, resolved)

dcslagel reviewed Apr 25, 2021

kinverarity1 added 2 commits April 26, 2021 09:42

Rename 'n' to 'curve_idx'

7fa577d

Use enumerate() instead of range(len())

fbd631a

This was referenced Apr 26, 2021

Support reading and writing all LAS 3.0 features #5

Open

Read data section as dataframe #424

Closed

kinverarity1 merged commit 5eb1854 into master Apr 26, 2021

kinverarity1 deleted the reshape-in-data-reader branch April 26, 2021 00:45
