# Mix of Inversion files

The aim of this notebook is to mix indoor and outdoor inversion data to create a full synthetic dataset.

The sizes of the indoor and outdoor datasets are as follows:

- 229 indoor images, each inverted with 0° and 180° phase, which gives 458 inverted holographic images.
- 208 outdoor images, each inverted with 0° and 180° phase, which gives 416 inverted holographic images.

Our goal here is to mix all possible combinations of these images.
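
The number of combinations follows directly from the counts above, since every indoor inversion can be paired with every outdoor inversion. A quick check (variable names are illustrative only):

```python
# Counts taken from the list above; the variable names are illustrative.
n_indoor_images = 229
n_outdoor_images = 208
n_phases = 2  # 0° and 180°

n_indoor_inversions = n_indoor_images * n_phases    # 458
n_outdoor_inversions = n_outdoor_images * n_phases  # 416

# Every indoor inversion is paired with every outdoor inversion.
n_mixed = n_indoor_inversions * n_outdoor_inversions
print(n_mixed)  # 190528
```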

The parameters we consider when mixing the datasets (and which are kept in the metadata files) are:

- indoor
  - object number (0 - 14)
  - orientation (A, B, C, D)
  - distance from source (4, 8)
  - inclination (0, 20)
- outdoor
  - orientation (A, B, C, D)

Regarding the inversion, we also have an additional parameter for both indoor and outdoor:

- indoor
  - inversion phase (0, 180)
- outdoor
  - inversion phase (0, 180)


## Indoor and Outdoor metadata

The idea is to have `indoor` and `outdoor` metadata as `CSV` files.

These files will eventually be used for mixing information.

```
  ├── interim/               - inversion folder
  │   ├── indoor/               - indoor inversion
  │   │    ├── 1-A-4-20-0.npy
  │   │    ├── 1-A-4-20-180.npy
  │   │    └── ...
  │   └── outdoor/              - outdoor inversion
  │        ├── 1-25-4-0.npy
  │        ├── 1-25-4-180.npy
  │        └──...
  │
  ├── processed/
  │   └── meta/                 - metadata folder
  │        ├── indoor.csv           - annotation for indoor dataset
  │        ├── outdoor.csv          - annotation for outdoor dataset
  │        └── mix.csv              - annotation for mixed dataset (indoor and outdoor) all possible combinations

```
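
As a rough sketch of how the inversion files in this layout could be enumerated, the helper below lists the `.npy` stems of one location folder. The function name `list_inversions` and the example path are hypothetical; the notebook itself uses `os.listdir` with an `endswith('.npy')` filter.

```python
from pathlib import Path


def list_inversions(interim_dir, location):
    """Return the sorted stems of the .npy files in interim_dir/location."""
    folder = Path(interim_dir) / location
    # Only .npy files are inversions; anything else in the folder is ignored.
    return sorted(p.stem for p in folder.glob("*.npy"))
```

For example, `list_inversions("data/interim", "indoor")` would return the indoor file names without their extension, assuming the data lives under `data/`.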


## Dataset structure

The main folder will be `processed`.

Each `.npy` file (e.g. `abcdefg.npy`) is a `numpy` tensor of shape `[62, 62, 40]` (or `[40, 62, 62]`), while the CSV metadata files are organized as follows:

---

```csv
file_name, id, location, category, name, orientation, distance_from_source, inclination, inversion_phase, shape
in_low_01_20_A_6_180, 01, "indoor", "mine", "pmn-4", "A", 8, 20, 180, "zig-zag"
in_low_01_20_A_6_0, 01, "indoor", "mine", "pmn-4", "A", 8, 20, 0, "zig-zag"
...
```

---

```csv
file_name, id, location, category, orientation, inversion_phase, shape, additional
43_25_6_180, 43, "outdoor", "ground-smarta", 25, 180, "zig-zag", null
43_25_6_0, 43, "outdoor", "ground-smarta", 25, 0, "zig-zag", null
...
```

---

We also have mixed inversions, organized as follows:

---

```csv
mix_name, in_name, in_id, in_category, in_orientation, in_distance_from_source, in_inclination, in_inversion_phase, in_shape, out_name, out_id, out_category, out_orientation, out_inversion_phase, out_shape, out_additional

in_low_01_20_A_6_180-43_25_6_180, in_low_01_20_A_6_180, 01, "pmn-4", "A", 8, 20, 180, "zig-zag", 43_25_6_180, 43, "ground-smarta", 25, 180, "zig-zag", null
in_low_01_20_A_6_0-43_25_6_0, in_low_01_20_A_6_0, 01, "pmn-4", "A", 8, 20, 0, "zig-zag", 43_25_6_0, 43, "ground-smarta", 25, 0, "zig-zag", null
...
```

---
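
The mixed table is the Cartesian product of the indoor and outdoor rows. A minimal sketch with `itertools.product`, using two rows per side reduced to a couple of columns from the examples above (the `in_`/`out_` prefixed keys and the `-` separator are assumptions taken from the mixed header and example rows):

```python
from itertools import product

# Illustrative rows, reduced to two columns each.
indoor_rows = [
    {"in_file_name": "in_low_01_20_A_6_180", "in_inversion_phase": 180},
    {"in_file_name": "in_low_01_20_A_6_0", "in_inversion_phase": 0},
]
outdoor_rows = [
    {"out_file_name": "43_25_6_180", "out_inversion_phase": 180},
    {"out_file_name": "43_25_6_0", "out_inversion_phase": 0},
]

# One mixed row per (indoor, outdoor) pair; the prefixes keep keys from clashing.
mixed_rows = [
    {"mix_name": f'{i["in_file_name"]}-{o["out_file_name"]}', **i, **o}
    for i, o in product(indoor_rows, outdoor_rows)
]
print(len(mixed_rows))  # 4
```

Note that the product contains every phase pairing, not only the same-phase pairs shown in the example rows.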


## Extracting metadata


In [1]:
import pandas as pd
import os
import numpy as np
from pathlib import Path
from src.utils.const import *  # presumably provides datapath and metapath
import matplotlib.pyplot as plt

locations = ['indoor', 'outdoor']
prefixes = {
    'indoor': ['in'],
    'outdoor': ['']
}
positions = {
    'indoor': ['bas', 'low', 'pad'],
    'outdoor': ['']
}

in_categories = []


df = pd.read_csv(datapath / Path('raw/indoor_objects.csv'))
interimpath = datapath / Path('interim')

In [2]:
metapath

PosixPath('/Users/emanuelevivoli/asmara/data/processed/meta')

In [3]:
list(df['id'])


[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

In [4]:
df_dict = {k: (v1, v2) for k, (v1, v2) in zip(
    df['id'], zip(df['name'], df['classification']))}


In [5]:
df_dict


{1: ('pmn4', 'mine'),
 2: ('pmn1', 'mine'),
 3: ('vs50', 'mine'),
 4: ('dm11', 'mine'),
 5: ('M14', 'mine'),
 6: ('pma2', 'mine'),
 7: ('wood-cylinder', 'clutter'),
 8: ('wrapped-can', 'clutter'),
 9: ('stone', 'clutter'),
 10: ('coin', 'archeology'),
 11: ('clay-holes', 'archeology'),
 12: ('clay-full', 'archeology'),
 13: ('clay-big', 'archeology'),
 14: ('knife', 'archeology')}

### Indoor


In [6]:
indexes = list(range(0, 6))
# name, id, category, orientation, distance_from_source, inclination, inversion_phase, shape
prefix = 'in'
keys = ["location", "distance_from_source", "id", "orientation",
        "shape", "inversion_phase"]  # inclination, name, category

############
#! INDOOR
############

location = 'indoor'
in_metadata = []
# in_inversions = []

names = os.listdir(os.path.join(interimpath, location))
names = [name for name in names if name.endswith('.npy')]

for name in names:

    name = name.split('.')[0]
    name_list = name.split('_')
    if len(name_list) > 6:
        custom_indexes = np.ones(len(indexes), dtype=int)
        custom_indexes[0] = 0
        custom_indexes[1] = 0
        custom_indexes[2] = 0
        custom_indexes += indexes
        inclination = 20
    else:
        inclination = 0
        custom_indexes = indexes

    obj = {}
    for c_index, c_key in zip(custom_indexes, keys):
        obj[f'{prefix}_{c_key}'] = name_list[c_index]

    # obj['location'] = 'indoor' if obj['location'] == 'in' else 'outdoor'
    obj[f'{prefix}_location'] = location
    # map the position token to a distance from the source; anything else becomes None
    obj[f'{prefix}_distance_from_source'] = {'low': 8, 'bas': 4}.get(
        obj[f'{prefix}_distance_from_source'])
    obj[f'{prefix}_inclination'] = inclination
    obj[f'{prefix}_file_name'] = name
    # df_dict maps id -> (name, classification), so unpack in that order
    name_, category_ = df_dict.get(int(obj[f'{prefix}_id']), (None, None))
    # e.g. category_ = "mine"
    obj[f'{prefix}_category'] = category_
    # e.g. name_ = "pmn4"
    obj[f'{prefix}_name'] = name_

    in_metadata.append(obj)

    # inv_file = np.load(os.path.join(interimpath, location, f'{name}.npy'))
    # in_inversions.append(inv_file)

# assert len(in_inversions) == len(in_metadata)
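
The index-shifting logic of the loop above can also be written as a small standalone parser, which is easier to test on a single file name. This is only a sketch: `parse_indoor_name` and `DISTANCE_BY_POSITION` are hypothetical names, and unlike the loop (which hardcodes an inclination of 20) it reads the inclination from the extra token.

```python
# Hypothetical helper mirroring the indoor loop above.
DISTANCE_BY_POSITION = {"low": 8, "bas": 4}  # any other position maps to None


def parse_indoor_name(file_name):
    """Split a name such as 'in_low_01_20_A_6_180' into metadata fields."""
    stem = file_name.split(".")[0]
    parts = stem.split("_")
    if len(parts) > 6:
        # An extra token at index 3 carries the inclination.
        inclination = int(parts[3])
        parts = parts[:3] + parts[4:]
    else:
        inclination = 0
    _, position, obj_id, orientation, shape, phase = parts[:6]
    return {
        "in_file_name": stem,
        "in_id": obj_id,
        "in_orientation": orientation,
        "in_distance_from_source": DISTANCE_BY_POSITION.get(position),
        "in_inclination": inclination,
        "in_inversion_phase": phase,
        "in_shape": shape,
        "in_location": "indoor",
    }
```

As in the loop, `id`, `shape` and `inversion_phase` are kept as strings; converting them to integers would be a separate design choice.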


### Save CSV


In [7]:
in_df = pd.DataFrame.from_dict(in_metadata)
in_columns = ['file_name', 'id', 'category', 'name', 'orientation', 'distance_from_source', 'inclination', 'inversion_phase', 'shape', 'location']
in_columns = [f'{prefix}_{column}' for column in in_columns]
in_df.to_csv(metapath / Path('indoor.csv'), index=False, header=True,
             columns=in_columns)


### Outdoor


In [8]:
indexes = list(range(0, 4))
prefix = 'out'
keys = ["id", "orientation", "shape", "inversion_phase"]

############
#! OUTDOOR
############

location = 'outdoor'
out_metadata = []
# out_inversions = []

names = os.listdir(os.path.join(interimpath, location))
names = [name for name in names if name.endswith('.npy')]

for name in names:

    name = name.split('.')[0]
    name_list = name.split('_')
    if len(name_list) > 4:
        custom_indexes = np.zeros(len(indexes), dtype=int)
        custom_indexes[3] = 1
        custom_indexes += indexes
        additional = name_list[3]
    else:
        additional = None
        custom_indexes = indexes

    obj = {}
    for c_index, c_key in zip(custom_indexes, keys):
        obj[f'{prefix}_{c_key}'] = name_list[c_index]

    obj[f'{prefix}_additional'] = additional
    obj[f'{prefix}_file_name'] = name
    obj[f'{prefix}_location'] = location

    # all outdoor scans share a single category
    category_ = 'ground-smarta'
    obj[f'{prefix}_category'] = category_

    out_metadata.append(obj)

    # inv_file = np.load(os.path.join(interimpath, location, f'{name}.npy'))
    # out_inversions.append(inv_file)

# assert len(out_inversions) == len(out_metadata)
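
The outdoor names follow a simpler pattern, with an optional extra token before the inversion phase. A standalone sketch of the same parsing (the name `parse_outdoor_name` is hypothetical):

```python
def parse_outdoor_name(file_name):
    """Split a name such as '43_25_6_180' into metadata fields."""
    stem = file_name.split(".")[0]
    parts = stem.split("_")
    if len(parts) > 4:
        # An extra token at index 3 is stored as free-form 'additional' info.
        additional = parts[3]
        parts = parts[:3] + parts[4:]
    else:
        additional = None
    scan_id, orientation, shape, phase = parts[:4]
    return {
        "out_file_name": stem,
        "out_id": scan_id,
        "out_category": "ground-smarta",  # single category, as in the loop above
        "out_orientation": orientation,
        "out_inversion_phase": phase,
        "out_shape": shape,
        "out_location": "outdoor",
        "out_additional": additional,
    }
```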


In [9]:
out_df = pd.DataFrame.from_dict(out_metadata)
out_columns=['file_name', 'id', 'category', 'orientation', 'inversion_phase', 'shape', 'location', 'additional']
out_columns = [f'{prefix}_{column}' for column in out_columns]
out_df.to_csv(metapath / Path('outdoor.csv'), index=False, header=True,
              columns=out_columns)


## Mixing In with Out


In [10]:
import json
from tqdm import tqdm

# mix_inversion = []

mix_metadata = []

# make sure the metadata folder exists (the earlier to_csv calls already rely on it)
os.makedirs(metapath, exist_ok=True)

# if not os.path.exists(os.path.join(DATA_PROC, 'inversions')):
#     os.makedirs(os.path.join(DATA_PROC, 'inversions'))

for i in tqdm(range(len(in_metadata))):
    in_meta = in_metadata[i]
    # in_inv = in_inversions[i]

    for j in range(len(out_metadata)):
        out_meta = out_metadata[j]
        # out_inv = out_inversions[j]

        # indexes
        # print(f'{i}/{len(in_inversions)} - {j}/{len(out_inversions)}')

        # mixing metadata
        meta = {}
        meta['mix_name'] = f'{in_meta["in_file_name"]}__out_{out_meta["out_file_name"]}'
        # meta['indoor'] = in_meta
        # meta['outdoor'] = out_meta
        meta = {**meta, **in_meta, **out_meta}
        mix_metadata.append(meta)

        # with open(f'{metapath}/{meta["name"]}.json', 'w') as f:
        #     json.dump(meta, f)

        # ? we don't need to create the mixing dataset ...
        # ? we actually mix them online :)
        # mixing tensors
        # mixed_inv = in_inv + out_inv
        # mix_inversion.append(mixed_inv)
        # np.save(file=f'{DATA_PROC}/inversions/{meta["name"]}.npy', arr=mixed_inv)

# assert len(mix_inversion) == len(mix_metadata)


100%|██████████| 458/458 [00:01<00:00, 337.40it/s]


In [11]:
mix_df = pd.DataFrame.from_dict(mix_metadata)

mix_columns=['mix_name'] + in_columns + out_columns
mix_df.to_csv(metapath / Path('mix.csv'), index=False, header=True,
              columns=mix_columns)
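
Since the mixed tensors are not stored and the mixing happens online, each row of `mix.csv` only needs to be resolved into two inversions that are then summed, as in the commented-out `mixed_inv = in_inv + out_inv` line above. A minimal sketch, assuming both inversions share the same shape; `mix_inversions` is a hypothetical helper and random arrays stand in for the loaded `.npy` files:

```python
import numpy as np


def mix_inversions(in_inv, out_inv):
    """Sum an indoor and an outdoor inversion of identical shape."""
    if in_inv.shape != out_inv.shape:
        raise ValueError(f"shape mismatch: {in_inv.shape} vs {out_inv.shape}")
    return in_inv + out_inv


# Random stand-ins for np.load(...) of one indoor and one outdoor inversion.
rng = np.random.default_rng(0)
in_inv = rng.normal(size=(62, 62, 40))
out_inv = rng.normal(size=(62, 62, 40))

mixed = mix_inversions(in_inv, out_inv)
print(mixed.shape)  # (62, 62, 40)
```

Checking the shapes explicitly avoids silent broadcasting if one inversion were stored as `[40, 62, 62]` and the other as `[62, 62, 40]`.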