# EDA Preparation
Metadata, including study and series information, SOP class, and number of files, are stored at the case level in the file `metadata.csv`. This notebook adds image-level metadata read from the DICOM files, such as image height, width, aspect ratio, bit depth, and smallest and largest pixel values, to this metadata.
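Conceptually, moving from case-level to image-level metadata means emitting one record per DICOM file and attaching that file's attributes. The pandas sketch below shows one way to do this, assuming each series row lists its file paths. The column names (`series_uid`, `filepath`, `width`, `height`) and all values are hypothetical. This is not necessarily how `MetaPrep` works internally.

```python
import pandas as pd

# Hypothetical series-level table: one row per series, listing the DICOM files
# in that series. Column names and values are illustrative, not from metadata.csv.
series = pd.DataFrame(
    {
        "series_uid": ["1.2.3", "1.2.4"],
        "filepath": [["a/1-1.dcm"], ["b/1-1.dcm", "b/1-2.dcm"]],
    }
)

# explode() emits one row per list element and copies the series-level columns,
# turning series-level records into image-level records.
images = series.explode("filepath", ignore_index=True)

# Hypothetical image-level attributes keyed by file path, e.g. as read from
# each DICOM header.
image_meta = pd.DataFrame(
    {
        "filepath": ["a/1-1.dcm", "b/1-1.dcm", "b/1-2.dcm"],
        "width": [3000, 3000, 400],
        "height": [4000, 4000, 500],
    }
)

# A left join keeps every image record even if no attributes were found for it;
# validate= raises if a file path appears more than once on either side.
images = images.merge(image_meta, on="filepath", how="left", validate="one_to_one")
print(len(series), "->", len(images))  # 2 -> 3
```

The left join is a design choice: an unreadable header shows up as missing values rather than a silently dropped image.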


```python
import pandas as pd
from bcd.data_prep.meta import MetaPrep
```

```python
# Input: case-level metadata; output: image-level metadata.
fpin = "/home/john/projects/bcd/data/interim/input/meta/metadata.csv"
fpout = "/home/john/projects/bcd/data/interim/input/meta/image_metadata.csv"
suspect = "data/raw/CBIS-DDSM/Mass-Training_P_01382_LEFT_MLO/07-20-2016-DDSM-93921/1.000000-full mammogram images-05891/1-1.dcm"
```

## Metadata
The case-level metadata are summarized below.

```python
df = pd.read_csv(fpin)
df.info()
df.head()
```

For each series, image-level metadata will be added, increasing the number of records from 6,775 to 10,239 (one record per image). The additional attributes, read from the DICOM file headers, are:
```{table} Image Level Metadata
:name: image_level_metadata_ref
| variable     | source    | DICOM attribute or derivation |
|--------------|-----------|-------------------------------|
| width        | DICOM     | Columns                       |
| height       | DICOM     | Rows                          |
| aspect ratio | derived   | width / height                |
| bits         | DICOM     | BitsStored                    |
| sip value    | DICOM     | SmallestImagePixelValue       |
| lip value    | DICOM     | LargestImagePixelValue        |
| pixel range  | derived   | lip value - sip value         |
```
These data will be added for the full image, the cropped image, and the ROI mask image.
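The table can be read as a small extraction routine. The sketch below assumes the header is parsed with pydicom (for example `pydicom.dcmread(path, stop_before_pixels=True)`), which exposes DICOM keywords as attributes. The function name `image_metadata`, the output keys, and the sample header values are hypothetical. `SmallestImagePixelValue` and `LargestImagePixelValue` are optional (Type 3) attributes, so the sketch records `None` when they are absent rather than failing.

```python
from types import SimpleNamespace
from typing import Any


def image_metadata(ds: Any) -> dict:
    """Collect the attributes listed in the table from one parsed DICOM header.

    `ds` is assumed to expose DICOM keywords as attributes, as a pydicom
    Dataset does; any object with the same attributes works.
    """
    width = int(ds.Columns)
    height = int(ds.Rows)
    # Optional (Type 3) attributes: may be missing from a given file.
    sip = getattr(ds, "SmallestImagePixelValue", None)
    lip = getattr(ds, "LargestImagePixelValue", None)
    return {
        "width": width,
        "height": height,
        "aspect_ratio": width / height,  # derived, not stored in DICOM
        "bits": int(ds.BitsStored),
        "sip_value": sip,
        "lip_value": lip,
        # Derived range; undefined if either bound is missing.
        "pixel_range": None if sip is None or lip is None else lip - sip,
    }


# Made-up stand-in for a parsed header so the sketch runs without a DICOM file.
header = SimpleNamespace(
    Rows=4000,
    Columns=3000,
    BitsStored=16,
    SmallestImagePixelValue=0,
    LargestImagePixelValue=65535,
)
print(image_metadata(header))
```

If the pixel value attributes are missing, one alternative would be to load the pixel data and take its minimum and maximum, at the cost of reading each full image.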

## Create Metadata

```python
# Add image-level metadata and write the result to fpout.
mp = MetaPrep(infilepath=fpin, outfilepath=fpout)
mp.prep_images()
```

DEBUG:bcd.data_prep.meta:Processed 100 rows and 153 images in 6.22 seconds at 16.07 rows per second / 24.58 images per second.
DEBUG:bcd.data_prep.meta:Processed 200 rows and 299 images in 14.73 seconds at 13.58 rows per second / 20.3 images per second.
DEBUG:bcd.data_prep.meta:Processed 300 rows and 462 images in 20.53 seconds at 14.61 rows per second / 22.51 images per second.
DEBUG:bcd.data_prep.meta:Processed 400 rows and 615 images in 26.62 seconds at 15.03 rows per second / 23.1 images per second.
DEBUG:bcd.data_prep.meta:Processed 500 rows and 760 images in 33.07 seconds at 15.12 rows per second / 22.98 images per second.
DEBUG:bcd.data_prep.meta:Processed 600 rows and 904 images in 39.78 seconds at 15.08 rows per second / 22.73 images per second.
DEBUG:bcd.data_prep.meta:Processed 700 rows and 1047 images in 46.36 seconds at 15.1 rows per second / 22.58 images per second.
DEBUG:bcd.data_prep.meta:Processed 800 rows and 1193 images in 53.16 seconds at 15.05 rows per second / 22.