Error while extracting curves, "Data must be 1-dimensional" #405

Closed
AchyuthB opened this issue Dec 15, 2021 · 6 comments

Comments

@AchyuthB

Hi,
For a few of the LIS files, extracting curve data fails with the error "Data must be 1-dimensional".

code:
curves = lis.curves(logical_file, format_spec, sample_rate=sample_rate, strict=False)

Is there a way to get around this and fetch the curve data successfully?

Thanks in advance.

-Achyuth

@ErlendHaa
Contributor

Is this perhaps a Pandas error?

@AchyuthB
Author

Hi @ErlendHaa,

Yes, you are right, it is a Pandas error. I am trying to transpose the data returned by lis.curves(). Sometimes the transpose fails because the data returned by lis.curves() is not in the expected format. I was wondering whether there is a way to format the data returned by lis.curves() so that it can be transposed.

Below is a screenshot of working and failing data. Both belong to the same LIS file.
[image]

Code snippet

import pandas as pd
from dlisio import lis

# g_sample_rates and log_debug are defined elsewhere in the script.
def extract_curve_data(logical_file_index, format_spec_index, logical_file, format_spec):
    try:
        for sample_rate in g_sample_rates:
            curves = lis.curves(logical_file, format_spec, sample_rate=sample_rate, strict=False)
            if len(curves) == 0:
                log_debug(" - Script Info: No curve data available for format_spec with index " + str(format_spec_index) + " and sample_rate " + str(sample_rate))
                continue
            # This is the line that raises "Data must be 1-dimensional" for some files.
            curves_data = pd.DataFrame(curves).T
    except Exception as e:
        log_debug(" - Script Warn: While processing curve data - " + str(e))

Output

- Script Info: Processing physical file :: D:\FeedFiles\LIS\/205_27-_2_dwl__1631147.lis

- Script Info: Extracting data for logical file :: FBSTB .005
- Script Info: No curve data available for format_spec with index 0 and sample_rate 1
- Script Info: No curve data available for format_spec with index 0 and sample_rate 60
- Script Info: No curve data available for format_spec with index 0 and sample_rate 4
- Script Warn: While processing curve data - Data must be 1-dimensional

@ErlendHaa
Contributor

ErlendHaa commented Dec 16, 2021

There is nothing wrong with the data, nor the formatting of it. The returned array from lis.curves is as expected. The thing is that one or more of the curves in the file contains non-scalar sample values. I.e. each depth sample is an array of data - instead of a single value. A practical example would be an image-curve. numpy handles this completely fine, but pandas is made for strictly tabular data and nested structures aren't allowed (hence the "not 1 dimensional" error).
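
To make the distinction concrete, here is a minimal sketch using a synthetic structured array in place of the output of lis.curves. The mnemonics DEPT, GR and IMG and all the values are made up for illustration; a real file will have its own names and shapes.

```python
import numpy as np

# Synthetic stand-in for the structured array returned by lis.curves.
# DEPT and GR hold one value per depth sample; IMG holds four.
dtype = np.dtype([("DEPT", "f4"), ("GR", "f4"), ("IMG", "f4", (4,))])
curves = np.zeros(3, dtype=dtype)
curves["DEPT"] = [100.0, 100.5, 101.0]
curves["IMG"] = np.arange(12, dtype="f4").reshape(3, 4)

# One row per depth sample, regardless of how wide each curve is.
print(curves.shape)         # (3,)

# A scalar curve is a plain 1-D column ...
print(curves["GR"].shape)   # (3,)

# ... while a non-scalar curve carries an array per depth sample.
print(curves["IMG"].shape)  # (3, 4)
print(curves["IMG"][1])     # the four values recorded at the second depth
```

The IMG column is the kind of field that a strictly tabular container cannot hold as a single column.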

If you are forced to work with Pandas and not numpy you could:

[1] drop the non-scalar curves before creating the dataframe (check the numpy.dtype of each curve)
[2] expand each sample-array into one column per sample. I know people do this, but honestly it creates a mess and I would advise against it.
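
Both options can be sketched with numpy alone. The helper names scalar_columns and expanded_columns are hypothetical, as are the mnemonics in the synthetic array; the only assumption about the input is that it is a numpy structured array, as lis.curves returns.

```python
import numpy as np

def scalar_columns(curves):
    """Option [1]: keep only the curves with one value per depth sample."""
    return {
        name: curves[name]
        for name in curves.dtype.names
        if curves.dtype[name].shape == ()  # () means the field is scalar
    }

def expanded_columns(curves):
    """Option [2]: one column per element of each sample array."""
    columns = {}
    for name in curves.dtype.names:
        data = curves[name]
        if data.ndim == 1:
            columns[name] = data
            continue
        # Flatten any trailing dimensions, then split column by column.
        width = int(np.prod(data.shape[1:]))
        flat = data.reshape(data.shape[0], width)
        for i in range(width):
            columns[f"{name}[{i}]"] = flat[:, i]
    return columns

# Synthetic stand-in for the output of lis.curves (made-up mnemonics).
dtype = np.dtype([("DEPT", "f4"), ("GR", "f4"), ("IMG", "f4", (4,))])
curves = np.zeros(3, dtype=dtype)

print(list(scalar_columns(curves)))
print(list(expanded_columns(curves)))
```

Every value in either dict is a 1-D array of the same length, which is the shape of input that pd.DataFrame accepts. The second helper also shows why option [2] gets messy: a single image curve with many values per sample turns into that many columns.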

If you want to work with these non-scalar curves I strongly suggest sticking to numpy, because it handles the complexity of the data structure just fine without any weird transformations.
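
As a rough illustration of what staying in numpy can look like, the sketch below filters by depth and reduces an image curve without reshaping anything. The array is synthetic and the mnemonics DEPT, GR and IMG are assumptions, not names from any particular file.

```python
import numpy as np

# Synthetic stand-in for the output of lis.curves (made-up mnemonics and values).
dtype = np.dtype([("DEPT", "f4"), ("GR", "f4"), ("IMG", "f4", (4,))])
curves = np.zeros(3, dtype=dtype)
curves["DEPT"] = [100.0, 100.5, 101.0]
curves["IMG"] = np.arange(12, dtype="f4").reshape(3, 4)

# Select a depth interval with a boolean mask; each row keeps all its curves.
interval = curves[curves["DEPT"] >= 100.5]

# Reduce the image curve to one value per depth sample.
img_mean = interval["IMG"].mean(axis=1)
print(interval["DEPT"], img_mean)
```

Scalar and non-scalar curves stay aligned on the same rows throughout, so no bookkeeping of column indices is needed.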

@AchyuthB
Author

Hi @ErlendHaa,
Thank you for explaining this in detail.
The non-scalar curves look like the multi-dimensional ones in DLIS.

Regards,
Achyuth

@ErlendHaa
Contributor

Yes, they are the same thing. At least conceptually.

@AchyuthB
Author

AchyuthB commented Jan 6, 2022

Hi,
I managed to separate and extract the single- and multi-dimensional curve data from the LIS files.
I know this is not the perfect solution. I will work on improving the code. :-)

import numpy as np

def seperate_curve_data(curves):
    # curves = lis.curves(logical_file, format_spec, sample_rate=sample_rate, strict=False)

    multi_dim_curves  = []
    single_dim_curves = []
    curves_meta = list(curves.dtype.names)

    # Initialised up front so that an empty `curves` array does not raise a
    # NameError when the index lists are read after the loop.
    multi_dim_index  = []
    single_dim_index = []

    for row_values in curves:
        multi_dim_curve  = []
        single_dim_curve = []
        # Rebuilt for every row; all rows share one dtype, so the result is the same each time.
        multi_dim_index  = []
        single_dim_index = []

        for i, curve in enumerate(row_values):
            # A sample holding more than one value belongs to a multi-dimensional curve.
            if len(np.shape(curve)) > 0 and len(curve) > 1:
                multi_dim_curve.append(tuple(curve))
                multi_dim_index.append(i)
            else:
                single_dim_curve.append(curve)
                single_dim_index.append(i)

        multi_dim_curves.append(multi_dim_curve)
        single_dim_curves.append(tuple(single_dim_curve))

    single_curves = np.asarray(single_dim_curves)

    # Map the column indices back to the curve mnemonics.
    multi_spec_blocks  = [curves_meta[i] for i in multi_dim_index]
    single_spec_blocks = [curves_meta[i] for i in single_dim_index]

    return single_curves, single_spec_blocks, multi_dim_curves, multi_spec_blocks
