## **Creating a Python type and integrating it with pyarrow to write Parquet**


**Objective**: convert a dict structure into a Python type and save it as Parquet with pyarrow.

In [8]:
import pyarrow as pa

from pixano.core.arrow_types.all_pixano_types import createPaType, PixanoType

camera_setting = {
    "cam_K": [
        1758.377685546875,
        0.0,
        360.0000000121072,
        0.0,
        1781.137258093513,
        269.9999999622624,
        0.0,
        0.0,
        1.0,
    ],
    "cam_R_w2c": [
        -0.8058909773826599,
        -0.5643280148506165,
        -0.17909124493598938,
        -0.5611616969108582,
        0.8244928121566772,
        -0.0728636085987091,
        0.18877841532230377,
        0.04177902266383171,
        -0.9811305999755859,
    ],
    "cam_t_w2c": [-10.521206855773926, 40.88941192626953, 1092.1990966796875],
    "depth_scale": 0.1,
}


### **Step 1 : create your type**

- Go to pixano.core.arrow_types and create a file.py containing your type.

- Use **PixanoType** as the base class

- Define your attributes and methods

- **Override the to_struct method**: the **field names** must match the **attributes that will be exported** to pyarrow!

PixanoType provides generic **from_dict** and **to_dict** methods based on the to_struct field names.


In [9]:
class Camera(PixanoType):
    def __init__(self, cam_K, cam_R_w2c, cam_t_w2c, depth_scale):
        self.cam_K = cam_K
        self.cam_R_w2c = cam_R_w2c
        self.cam_t_w2c = cam_t_w2c
        self.depth_scale = depth_scale
    
    @staticmethod
    def to_struct():
        return pa.struct([
            pa.field('cam_K', pa.list_(pa.float64())),
            pa.field('cam_R_w2c', pa.list_(pa.float64())),
            pa.field('cam_t_w2c', pa.list_(pa.float64())),
            pa.field('depth_scale', pa.float64())
        ])


### **Step 2 : integration with pyarrow**

- Initialise the type by calling **createPaType** and store it in a variable

You can now access the Array class through that variable and create pyarrow arrays.





In [10]:
CameraType = createPaType(Camera.to_struct(), 'Camera', Camera)

cam1 = Camera.from_dict(camera_setting)
cam_arr = CameraType.Array.from_list([cam1])
cam_arr

<CameraArray object at 0x7f1b757df9a0>
-- is_valid: all not null
-- child 0 type: list<item: double>
  [
    [
      1758.377685546875,
      0,
      360.0000000121072,
      0,
      1781.137258093513,
      269.9999999622624,
      0,
      0,
      1
    ]
  ]
-- child 1 type: list<item: double>
  [
    [
      -0.8058909773826599,
      -0.5643280148506165,
      -0.17909124493598938,
      -0.5611616969108582,
      0.8244928121566772,
      -0.0728636085987091,
      0.18877841532230377,
      0.04177902266383171,
      -0.9811305999755859
    ]
  ]
-- child 2 type: list<item: double>
  [
    [
      -10.521206855773926,
      40.88941192626953,
      1092.1990966796875
    ]
  ]
-- child 3 type: double
  [
    0.1
  ]

### **Step 3 : create Table**

- Import the needed types

- Create all the necessary **arrays**

- Define the **schema**




In [11]:
from pixano.core.arrow_types.bbox import BBox, BBoxType

bbox_arr = BBoxType.Array.from_list([BBox.from_xywh([1, 2, 3, 4])])

schema = pa.schema(
    [
        pa.field("Camera", CameraType),
        pa.field("Bbox", BBoxType),
    ]
)



- Create the table and save it as Parquet

In [12]:

import tempfile

import pyarrow.parquet as pq

table = pa.Table.from_arrays([cam_arr, bbox_arr], schema=schema)

# Write the table to a temporary Parquet file, then read it back
with tempfile.NamedTemporaryFile(suffix=".parquet") as temp_file:
    temp_file_path = temp_file.name
    pq.write_table(table, temp_file_path, store_schema=True)
    re_table = pq.read_table(temp_file_path)

You can now read the table and convert it back to Python types.

In [13]:
re_table.to_pylist()

[{'Camera': <__main__.Camera at 0x7f1b75679590>,
  'Bbox': <pixano.core.arrow_types.bbox.BBox at 0x7f1b7567a190>}]

In [19]:
Bbox0 = re_table.to_pylist()[0]['Bbox']
Bbox0.to_xyxy()

[1.0, 2.0, 4.0, 6.0]
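
The result above is consistent with converting a box from `[x, y, width, height]` to `[x_min, y_min, x_max, y_max]`. A minimal sketch of that arithmetic follows. `xywh_to_xyxy` is a hypothetical helper, and the assumption is that BBox.to_xyxy does something equivalent.

```python
def xywh_to_xyxy(xywh):
    """Convert [x, y, width, height] to [x_min, y_min, x_max, y_max]."""
    x, y, w, h = xywh
    # The opposite corner is the origin shifted by the box size.
    return [float(x), float(y), float(x + w), float(y + h)]


corners = xywh_to_xyxy([1, 2, 3, 4])
```

Applied to the `[1, 2, 3, 4]` box created in Step 3, this gives the same four values as the notebook output.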