Column headers from first dataset are used for all subsequent datasets #128
According to the spec, the first four columns must always be Qz, R, [sR, [sQz]]. If you have a column that is not one of those, then you need to add a fifth column. See https://www.reflectometry.org/file_format/specification#column-description
Thanks @andyfaff, the example script is quite artificial just to demonstrate the behaviour. Our use case is that we're including the first four columns specified by the specification in all datasets, but in some datasets we want to include a further 4 columns. We find that only the column headers that were set on the first dataset are the ones that are used throughout the file.
Can you try this?

import numpy as np
from orsopy.fileio.data_source import DataSource, Person, Experiment, Sample, Measurement
from orsopy.fileio import Reduction, Software
from orsopy.fileio.orso import Orso, OrsoDataset, save_orso
from orsopy.fileio.base import Column, ErrorColumn
FILEPATH = "test_header_output.ort"
###################
# Set up the header data
###################
owner = Person(name=None, affiliation=None)
experiment = Experiment(
    title=None,
    instrument="Test Instrument",
    start_date=None,
    probe="neutron",
)
sample = Sample(name="Sample Name")
measurement = Measurement(instrument_settings=None, data_files=[])
creator = Person(name="Creator Name", affiliation="Affiliation")
software = Software(name="Software Name", version="v1")
data_source = DataSource(owner=owner, experiment=experiment, sample=sample, measurement=measurement)
reduction = Reduction(software=software, creator=creator)
###############################
# Create the first OrsoDataset for the file
###############################
columns_1 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
]
header_1 = Orso(data_source, reduction, columns_1, "dataset_1")
data_1 = np.array([
    np.full(5, 2),
    np.full(5, 3),
]).T
dataset_1 = OrsoDataset(info=header_1, data=data_1)
###################################################
# Create the second OrsoDataset for the file with three more columns
###################################################
columns_2 = [
    Column(name="Qz", unit="1/angstrom", physical_quantity="normal_wavevector_transfer"),
    Column(name="R", unit=None, physical_quantity="reflectivity"),
    ErrorColumn(error_of="R"),
    ErrorColumn(error_of="Q"),
    Column(name="Theta", unit=None, physical_quantity="incident_angle"),
]
header_2 = Orso(data_source, reduction, columns_2, "dataset_2")
data_2 = np.array([
    np.full(5, 2),
    np.full(5, 3),
    np.full(5, 0.1),
    np.full(5, 0.2),
    np.full(5, 2.3),
]).T
dataset_2 = OrsoDataset(info=header_2, data=data_2)
####################
# Save the datasets to file
####################
save_orso(datasets=[dataset_1, dataset_2], fname=FILEPATH)

Note that you can also output a NeXus file by appending these two lines to the code:

from orsopy.fileio.orso import save_nexus
save_nexus(datasets=[dataset_1, dataset_2], fname=FILEPATH.replace(".ort", ".orb"))
Hi @bmaranville, thanks for the re-worked example script. When I run that I think it demonstrates what I'm referring to - both datasets have only two column headers, Qz and R. This leaves the second, 5 column dataset with three columns that don't have headers:
Thanks for the information about the Nexus output, that's really useful to hear. And it looks like the column headers print out correctly there, so it's just the |
Ah, I think I understand the issue now. I was looking at the YAML headers, which look correct, but you're referring to the text column headers on one line before the data starts, and I see what you mean about only having 2 labels there.
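For anyone who wants to check an existing file for this mismatch, here is a minimal sketch that compares the number of labels on each text column-header line with the number of values in the first data row below it. It does not use orsopy; the function name `header_vs_data_columns` is hypothetical, and the parsing assumes that labels contain no whitespace and that units appear in parentheses, as in the output quoted in this thread.

```python
import re


def header_vs_data_columns(ort_text):
    """Return one (labels, n_data_columns) pair per dataset block.

    Assumption: the text column header is the last '#' line before the
    first data row of each block, e.g. '# # Qz (1/angstrom) R'.
    """
    results = []
    last_comment = None
    in_data = False
    for line in ort_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            last_comment = stripped
            in_data = False
        elif not in_data:
            # First data row of a block: inspect the comment line above it.
            header = last_comment.lstrip("# ") if last_comment else ""
            # Drop parenthesised units so each label is one token.
            labels = re.sub(r"\([^)]*\)", "", header).split()
            results.append((labels, len(stripped.split())))
            in_data = True
    return results


# Shortened stand-in for a file showing the behaviour reported here.
sample = """\
# # ORSO reflectivity data file | 1.1 standard | YAML encoding | https://www.reflectometry.org/
# data_set: dataset_1
# # Qz (1/angstrom) R
2.0 3.0
2.0 3.0
# data_set: dataset_2
# # Qz (1/angstrom) R
2.0 3.0 0.1 0.2 2.3
"""

for labels, n_cols in header_vs_data_columns(sample):
    status = "OK" if len(labels) == n_cols else "MISMATCH"
    print(labels, n_cols, status)
```

To run it against a real file, pass the contents of the saved `.ort` file (for example `open(FILEPATH).read()`) instead of `sample`.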
Fixed by eea1bf3

# # ORSO reflectivity data file | 1.1 standard | YAML encoding | https://www.reflectometry.org/
# data_source:
#   owner:
#     name: null
#     affiliation: null
#   experiment:
#     title: null
#     instrument: Test Instrument
#     start_date: null
#     probe: neutron
#   sample:
#     name: Sample Name
#   measurement:
#     instrument_settings: null
#     data_files: []
# reduction:
#   software: {name: Software Name, version: v1}
#   creator:
#     name: Creator Name
#     affiliation: Affiliation
# data_set: dataset_1
# columns:
# - {name: Qz, unit: 1/angstrom, physical_quantity: normal_wavevector_transfer}
# - {name: R, physical_quantity: reflectivity}
# # Qz (1/angstrom) R
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
2.0000000000000000e+00 3.0000000000000000e+00
# data_set: dataset_2
# columns:
# - name: Qz
#   unit: 1/angstrom
#   physical_quantity: normal_wavevector_transfer
# - name: R
#   physical_quantity: reflectivity
# - error_of: R
# - error_of: Q
# - name: Theta
#   physical_quantity: incident_angle
# # Qz (1/angstrom) R sR sQ Theta
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
2.0000000000000000e+00 3.0000000000000000e+00 1.0000000000000001e-01 2.0000000000000001e-01 2.2999999999999998e+00
Brilliant, thanks very much @bmaranville. I'll keep an eye out for the next version of orsopy so we can pull in this fix.
@bmaranville do we want to make a release of orsopy so that @rbauststfc can take advantage of this?
Sounds good to me... but I won't be able to help much until the middle of next week.
There's no significant rush from our side, whenever you next have the opportunity would be great.
We're always looking for new contributors to the project, so PRs and help advancing the cause are always great.
As this is a straightforward bug fix, I don't see a reason not to push a release. I can take care of that next week at the latest.
Fixed in 1.2.1, closing.
The library gives you the flexibility to put multiple datasets in a single file. Each of these datasets can be set up with different numbers of columns, and the column names can also be different for each dataset. However, the saved file continues to use the column headers provided for the first dataset.
This script can be used to demonstrate:
The saved ORSO file gives the following column information for each dataset:
The R column header is present for all three datasets, but it is only relevant for the first. The final dataset should have Theta as the column header, but it remains as Qz.
For information, we have a use case for wanting multiple datasets with different numbers of columns in the same file. However, we don't currently have a use case for wanting multiple datasets with different column names; I just thought it would be worth pointing out the behaviour.
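To make the expected behaviour concrete, here is a small sketch of a text header line built from each dataset's own column list rather than reusing the first dataset's. It is only an illustration, not orsopy's implementation: `Col`, `ErrCol` and `text_header` are hypothetical stand-ins, and the label format (`name (unit)` for a column, `s` plus `error_of` for an error column, single-space separated) is inferred from the file output quoted in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Col:
    """Hypothetical stand-in for a named data column."""
    name: str
    unit: Optional[str] = None


@dataclass
class ErrCol:
    """Hypothetical stand-in for an error column."""
    error_of: str


def text_header(columns):
    """Build one text header line from a single dataset's columns."""
    labels = []
    for col in columns:
        if isinstance(col, ErrCol):
            labels.append("s" + col.error_of)
        elif col.unit:
            labels.append(f"{col.name} ({col.unit})")
        else:
            labels.append(col.name)
    return "# # " + " ".join(labels)


columns_1 = [Col("Qz", "1/angstrom"), Col("R")]
columns_2 = [
    Col("Qz", "1/angstrom"),
    Col("R"),
    ErrCol("R"),
    ErrCol("Q"),
    Col("Theta"),
]

# Each dataset gets a header derived from its own column list.
headers = [text_header(cols) for cols in (columns_1, columns_2)]
for line in headers:
    print(line)
```

The point is that the header is computed per dataset, so a dataset with five columns gets five labels regardless of what the first dataset in the file contained.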