# Glass fiber-reinforced polyamide 66 3D X-ray computed tomography dataset for deep learning segmentation

This notebook is a demo of how to open `.raw` files.

First extract the contents of `crack.zip` and `pa66.zip` into the same directory as this notebook.

Each `.info` file stores the shape (`x`, `y`, and `z` sizes) of the `.raw` file with the same name, so you don't need to type the dimensions by hand.
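To make the `.info` layout concrete, here is a minimal sketch of a parser for the `KEY = value` lines shown in the outputs further down. The helper names `parse_info_text` and `read_info_shape` are hypothetical and are not part of `read_raw.py`, which has its own parsing.

```python
from pathlib import Path


def parse_info_text(text):
    """Parse the text of a `.info` file into a dict of its `KEY = value` entries."""
    values = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip the header line and blank lines
        key, value = (part.strip() for part in line.split("=", 1))
        # NUM_X, NUM_Y, NUM_Z are integers; other entries (e.g. DATA_TYPE) stay strings
        values[key] = int(value) if value.isdigit() else value
    return values


def read_info_shape(info_path):
    """Return (NUM_X, NUM_Y, NUM_Z) read from a `.info` file."""
    info = parse_info_text(Path(info_path).read_text())
    return info["NUM_X"], info["NUM_Y"], info["NUM_Z"]


# text copied from the `pa66.test.segmentation.raw.info` output in this notebook
example = """! PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1300
NUM_Y = 1040
NUM_Z =  300
DATA_TYPE = uint8
"""
print(parse_info_text(example))
```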

The module `read_raw.py` provides the function `HST_read`, which is used below to load each file's contents into a `numpy.ndarray`.
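If you prefer not to load a whole volume into memory, a headerless raw file can also be memory-mapped with NumPy. The sketch below is an alternative to `HST_read`, not its implementation. It assumes the voxels are stored with `x` varying fastest, then `y`, then `z`, and that each voxel is one `uint8`. The function name `open_raw_memmap` is hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np


def open_raw_memmap(raw_path, shape_xyz, dtype=np.uint8):
    """Memory-map a headerless raw volume as an array of shape (x, y, z).

    Assumption: x varies fastest on disk, then y, then z, which corresponds
    to Fortran order for an (x, y, z) array.
    """
    return np.memmap(str(raw_path), dtype=dtype, mode="r", shape=tuple(shape_xyz), order="F")


# tiny round-trip demo on a temporary file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.raw"
    demo = np.arange(2 * 3 * 4, dtype=np.uint8).reshape((2, 3, 4))
    demo.ravel(order="F").tofile(str(demo_path))  # write with x fastest

    volume = open_raw_memmap(demo_path, (2, 3, 4))
    first_slice = np.array(volume[:, :, 0])  # copy one z-slice into memory
    del volume  # release the mapping before the directory is removed
```

Only the slices you index are read from disk, which is convenient for volumes of several gigabytes.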

Author: [`joaopcbertoldo`](https://joaopcbertoldo.github.io)

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Dataset" property="dct:title" rel="dct:type">Glass fiber-reinforced polyamide 66 3D X-ray computed tomography dataset for deep learning segmentation</span> by <span xmlns:cc="http://creativecommons.org/ns#" property="cc:attributionName"><a rel="author" href="https://orcid.org/0000-0002-9512-772X">Joao P C Bertoldo</a>, <a rel="author" href="https://orcid.org/0000-0002-1349-8042">Etienne Decencière</a>, <a rel="author" href="https://orcid.org/0000-0003-3268-4892">David Ryckelynck</a>, and <a rel="author" href="https://orcid.org/0000-0002-4075-5577">Henry Proudhon</a></span> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.

In [1]:
import os
from pathlib import Path
import numpy as np

from read_raw import HST_read

In [2]:
data_dir = Path(".").absolute()
sorted(os.listdir(data_dir))

['.ipynb_checkpoints',
 '__pycache__',
 'crack.raw',
 'crack.raw.info',
 'crack.segmentation.raw',
 'crack.segmentation.raw.info',
 'crack.zip',
 'open_raw.ipynb',
 'pa66.ground_truth.raw',
 'pa66.ground_truth.raw.info',
 'pa66.raw',
 'pa66.raw.info',
 'pa66.test.error_volume.raw',
 'pa66.test.error_volume.raw.info',
 'pa66.test.segmentation.raw',
 'pa66.test.segmentation.raw.info',
 'pa66.zip',
 'pack.py',
 'read_raw.py',
 'readme.md']

You should see the files:

- `crack.raw`
- `crack.raw.info`
- `crack.segmentation.raw`
- `crack.segmentation.raw.info`
- `pa66.ground_truth.raw`
- `pa66.ground_truth.raw.info`
- `pa66.raw`
- `pa66.raw.info`
- `pa66.test.error_volume.raw`
- `pa66.test.error_volume.raw.info`
- `pa66.test.segmentation.raw`
- `pa66.test.segmentation.raw.info`

In [3]:
pa66 = dict(
    data="pa66.raw",
    ground_truth="pa66.ground_truth.raw",
    error="pa66.test.error_volume.raw",
    prediction="pa66.test.segmentation.raw",
)

crack = dict(
    data="crack.raw",
    prediction="crack.segmentation.raw",
)

# convert the filenames into Path objects (it's handy...)
pa66 = {
    key: data_dir / fname 
    for key, fname in pa66.items()
}

crack = {
    key: data_dir / fname 
    for key, fname in crack.items()
}

File sizes

In [4]:
# pa66
for key, file in pa66.items():
    print(f"{key=} {file.stat().st_size / 1024**3:.2f} GB")

key='data' 2.39 GB
key='ground_truth' 2.39 GB
key='error' 0.38 GB
key='prediction' 0.38 GB


In [5]:
# crack
for key, file in crack.items():
    print(f"{key=} {file.stat().st_size / 1024**3:.2f} GB")

key='data' 5.43 GB
key='prediction' 5.43 GB


Shapes

In [6]:
for key, file in pa66.items():
    
    info_file = file.with_suffix(".raw.info")
    print(f"{info_file.name}")
    
    with info_file.open("r") as f:
        print(f.read(), "\n")

pa66.raw.info
! PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1300
NUM_Y = 1040
NUM_Z = 1900
 

pa66.ground_truth.raw.info
! PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1300
NUM_Y = 1040
NUM_Z = 1900
 

pa66.test.error_volume.raw.info
! PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1300
NUM_Y = 1040
NUM_Z =  300
DATA_TYPE = uint8
 

pa66.test.segmentation.raw.info
! PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1300
NUM_Y = 1040
NUM_Z =  300
DATA_TYPE = uint8
 



Notice that two of the files have 1900 z-slices (the full dataset) and two have 300 z-slices (only the test set).
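The shapes can be cross-checked against the file sizes printed earlier: with one byte per voxel (`uint8`), a file should occupy `NUM_X * NUM_Y * NUM_Z` bytes. The sketch below assumes no header bytes in the `.raw` files, and `expected_size_bytes` is a hypothetical helper.

```python
def expected_size_bytes(num_x, num_y, num_z, bytes_per_voxel=1):
    """Size of a headerless raw volume, assuming a fixed number of bytes per voxel."""
    return num_x * num_y * num_z * bytes_per_voxel


full = expected_size_bytes(1300, 1040, 1900)      # full pa66 volume
test_only = expected_size_bytes(1300, 1040, 300)  # test-set volumes

# same unit as the file-size cells above (bytes / 1024**3)
print(f"{full / 1024**3:.2f}")       # 2.39
print(f"{test_only / 1024**3:.2f}")  # 0.38
```

Both values agree with the sizes reported for the `pa66` files.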

In [7]:
for key, file in crack.items():
    
    info_file = file.with_suffix(".raw.info")
    print(f"{info_file.name}")
    
    with info_file.open("r") as f:
        print(f.read(), "\n")

crack.raw.info
: PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1579
NUM_Y = 1845
NUM_Z = 2002
 

crack.segmentation.raw.info
! PyHST_SLAVE VOLUME INFO FILE
NUM_X = 1579
NUM_Y = 1845
NUM_Z = 2002
 



Load the files

In [8]:
pa66_arrays = {
    key: HST_read(
        str(file),
        autoparse_filename=False,
        data_type=np.uint8,
        zrange=range(10),  # todo: removeme
    )
    for key, file in pa66.items()
}

In [9]:
crack_arrays = {
    key: HST_read(
        str(file),
        autoparse_filename=False,
        data_type=np.uint8,
        zrange=range(10),  # todo: removeme
    )
    for key, file in crack.items()
}

SystemExit: The file does not seem to be a PyHST info file

  warn("To exit: use 'exit', 'quit', or Ctrl-D.", stacklevel=1)
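One visible difference in the outputs above is that `crack.raw.info` starts with `: PyHST_SLAVE ...` while every other info file starts with `! PyHST_SLAVE ...`. Whether `read_raw.py` rejects the file for this reason is an assumption. The sketch below only makes the difference easy to spot; `header_differs` and `EXPECTED_HEADER` are hypothetical names.

```python
# header line printed for the info files that load without error
EXPECTED_HEADER = "! PyHST_SLAVE VOLUME INFO FILE"


def header_differs(info_text, expected=EXPECTED_HEADER):
    """Return True when the first line of an info file is not the expected header."""
    lines = info_text.splitlines()
    first = lines[0].strip() if lines else ""
    return first != expected


# contents copied from the outputs above
crack_info = ": PyHST_SLAVE VOLUME INFO FILE\nNUM_X = 1579\nNUM_Y = 1845\nNUM_Z = 2002\n"
segmentation_info = "! PyHST_SLAVE VOLUME INFO FILE\nNUM_X = 1579\nNUM_Y = 1845\nNUM_Z = 2002\n"

print(header_differs(crack_info))         # True
print(header_differs(segmentation_info))  # False
```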


In [None]:
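# the cells below work on a copy of the dict of pa66 volumes loaded above
arrays = dict(pa66_arrays)
arrays["prediction"] = arrays["prediction"][:, :, :3]
arrays["error"] = arrays["error"][:, :, :3]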

In [None]:
for key, arr in arrays.items():
    print(f"{key=} \n{arr.dtype=} \n{arr.shape=}\n")

In [None]:
import tables  # PyTables, provides the HDF5 compression filters

# the compression algorithm
ft = tables.Filters(
    complevel=9, 
    complib='zlib', 
    shuffle=True, 
    bitshuffle=False,
)

In [None]:
ft

#### sample

In [None]:
bibtex_citation = "TO BE UPDATED"

sample_description=""\
"Glass fiber-reinforced Poly Amide 66 3D X-ray "\
"tomography dataset for deep learning segmentation.\n"\
f"If you use this volume, please cite us: {bibtex_citation}"

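# assumption: SampleData is pymicro's class (this import is not in the original cell)
from pymicro.core.samples import SampleData

sample = SampleData(
    filename="pa66",  # for "PolyAmide66"
    sample_name="pa66", 
    sample_description=sample_description,
    filters=ft,
    replace=True,
)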

#### all sets image (train, val, test)

##### data

In [None]:
sample.add_image(
    imagename="pa66",
    replace=True,
)

sample.set_description(
    sample_description, node="/pa66"
)

In [None]:
sample.add_data_array(
    location="/pa66",
    name="data",
    array=arrays["data"],
    filters=ft,
    replace=True,
)

sample.set_description(
    "Data (gray level image stack)", 
    node="/pa66/data",
)

##### ground truth

In [None]:
sample.add_data_array(
    location="/pa66",
    name="ground_truth",
    array=arrays["ground_truth"],
    filters=ft,
    replace=True,
)

sample.set_description(
    "Ground truth segmentation."\
    " Values are 0, 1, and 2, which respectively correspond "
    "to the phases matrix, fiber, and porosity", 
    node="/pa66/ground_truth",
)
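Since the ground truth stores one label per voxel, the volume fraction of each phase follows from counting label values. This is a minimal sketch using the label meaning given in the description above (0 matrix, 1 fiber, 2 porosity). `phase_fractions` is a hypothetical helper, and the toy array stands in for the real volume.

```python
import numpy as np

LABEL_NAMES = {0: "matrix", 1: "fiber", 2: "porosity"}


def phase_fractions(labels):
    """Return the fraction of voxels carrying each label."""
    counts = np.bincount(np.asarray(labels).ravel(), minlength=len(LABEL_NAMES))
    total = counts.sum()
    return {name: counts[value] / total for value, name in LABEL_NAMES.items()}


# toy label image: three matrix voxels, two fiber voxels, one porosity voxel
toy = np.array([[0, 0, 1], [2, 0, 1]], dtype=np.uint8)
fractions = phase_fractions(toy)
print(fractions)
```

On the real data you would pass `arrays["ground_truth"]` instead of `toy`.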

#### test set image

In [None]:
?sample.set_origin

In [None]:
sample.add_image(
    imagename="pa66_test_set",
    replace=True,
)

In [None]:
sample.set_origin(
    "/pa66_test_set",
    (0, 0, 7),  # todo changeme to 300 later
)

In [None]:
sample.set_description(
    "Segmentation of the test set (last 300 layers) generated by a 2D U-Net model.", 
    node="/pa66_test_set"
)

##### prediction and errors

In [None]:
sample.add_data_array(
    location="/pa66_test_set",
    name="prediction",
    array=arrays["prediction"],
    filters=ft,
    replace=True, 
)

sample.set_description(
    "Segmentation generated by a 2D U-Net model.", 
    node="/pa66_test_set/prediction",
)

In [None]:
sample.add_data_array(
    location="/pa66_test_set",
    name="error",
    array=arrays["error"],
    filters=ft,
    replace=True, 
)

sample.set_description(
    "Disagreement between the ground truth and "
    "the model's prediction on the test set: "
    "1 means 'incorrect', 0 means 'correct'.", 
    node="/pa66_test_set/error",
)
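The error volume described above can be reproduced from the ground truth and the prediction with an element-wise comparison. The sketch below is not the code that generated `pa66.test.error_volume.raw`. It assumes both arrays cover the same voxels, so the ground truth would first have to be cropped to the test z-range. `error_volume` is a hypothetical helper.

```python
import numpy as np


def error_volume(ground_truth, prediction):
    """Return a uint8 array with 1 where the labels disagree and 0 where they agree."""
    ground_truth = np.asarray(ground_truth)
    prediction = np.asarray(prediction)
    if ground_truth.shape != prediction.shape:
        raise ValueError("ground truth and prediction must have the same shape")
    return (ground_truth != prediction).astype(np.uint8)


# toy example: one voxel out of four is mislabeled
gt = np.array([0, 1, 2, 1], dtype=np.uint8)
pred = np.array([0, 1, 1, 1], dtype=np.uint8)

errors = error_volume(gt, pred)
accuracy = 1.0 - errors.mean()  # fraction of correctly labeled voxels
print(errors, accuracy)
```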

# Things TODO

- run it on the full-size volumes (remove the `zrange` limits)

- fix the origin of the test volumes (currently `(0, 0, 7)`)

- set the voxel size

- add this metadata:

```
labels:
  - 0
  - 1
  - 2
labels_names:
  0: "matrix"
  1: "fiber"
  2: "porosity"
set_partitions:
  train:
    x_range: [0, 1300]
    y_range: [0, 1040]
    z_range: [0, 1300]
    alias: "train"
  val:
    x_range: [0, 1300]
    y_range: [0, 1040]
    z_range: [1078, 1206]
    alias: "val"
  test:
    x_range: [0, 1300]
    y_range: [0, 1040]
    z_range: [1300, 1600]
    alias: "test"
```
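As an illustration of how the `set_partitions` entries could be used, the sketch below slices a volume along `z` with the ranges listed above. It assumes the ranges are half-open (`[start, stop)`) and that `z` is the last array axis, as in the slicing done earlier in this notebook. `Z_RANGES` and `split_along_z` are hypothetical names.

```python
import numpy as np

# z ranges copied from the metadata listing above; treated as half-open [start, stop)
Z_RANGES = {"train": (0, 1300), "val": (1078, 1206), "test": (1300, 1600)}


def split_along_z(volume, z_ranges=Z_RANGES):
    """Return one view of the volume per partition, sliced along the last (z) axis."""
    return {name: volume[:, :, start:stop] for name, (start, stop) in z_ranges.items()}


# small stand-in volume with the real number of z-slices (x and y are shrunk to 2)
toy = np.zeros((2, 2, 1900), dtype=np.uint8)
parts = split_along_z(toy)
shapes = {name: part.shape for name, part in parts.items()}
print(shapes)
```

With half-open ranges the test partition has 300 slices, which matches `NUM_Z = 300` in the test-set info files.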