# binda Example Usage
This notebook provides a walkthrough example of analysing and editing a binary file using binda.

The example will load a jpeg file, analyse its Exif metadata and change the make and model of the camera used to take the photo.

The specification of the jpeg Exif metadata is provided here: https://www.media.mit.edu/pia/Research/deepview/exif.html



## Import binda

In [1]:
import binda as bd

## Load photo
Load as binary data

In [3]:
data = None
with open('PHOTO.JPG', 'rb') as file:
  data = file.read()

## Read JPEG Header

We need to determine the byte order for the Exif data. This differs between camera manufacturers. To get this, create a structure defining the first 14 bytes of the jpeg file. The last 2 of these contain the byte order as a 2 byte string.

## Define the Structure of the header
All tags will be defined as a big endian int, and converted to a hex string by applying hex to the dataframe column. This will make it easier to reconcile the tags against the spec document.
* 2 bytes for the jpeg marker containing 0xff, 0xd8
* 2 bytes for the exif marker containing 0xff, 0xe1
* 2 byte int for the size of the exif section
* 4 byte string containing 'Exif'
* 2 bytes containing 0x00, 0x00
* 2 bytes to indicate ByteOrder. 'II' for little endian and 'MM' for big endian
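
The same 14 byte layout can be cross-checked without binda using Python's standard `struct` module. This is only a minimal sketch: `parse_jpeg_header` is a hypothetical helper and `sample` is a hand-built buffer, not the photo used in this walkthrough.

```python
import struct

def parse_jpeg_header(data: bytes) -> dict:
    """Unpack the first 14 bytes of a JPEG file that starts with an Exif segment."""
    # '>' = big endian, H = 2 byte unsigned int, 4s / 2s = fixed-size byte strings
    jpeg_tag, exif_tag, exif_size, label, zeros, order = struct.unpack(
        ">HHH4sH2s", data[:14]
    )
    return {
        "jpeg_tag": hex(jpeg_tag),
        "exif_tag": hex(exif_tag),
        "exif_size": exif_size,
        "exif_label": label.decode("ascii"),
        "zeros": zeros,
        "byteorder": order.decode("ascii"),
    }

# Hand-built sample bytes (an assumption for illustration, not real photo data)
sample = bytes.fromhex("ffd8ffe1fe2f") + b"Exif" + b"\x00\x00" + b"MM"
header = parse_jpeg_header(sample)
print(header)
```

The markers and the segment size are unpacked as big endian here; only the data after the byte order marker follows the `II` / `MM` setting.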

## Create a data handler and read it
* Create the data handler
* Read and display the header

In [4]:
# Define the header structure
jpeg_header_sructure = bd.Structure(0, [
      bd.Variable(name='jpeg_tag', size=2, datatype=int, byteorder=bd.ByteOrder.BIG),
      bd.Variable(name='exif_tag', size=2, datatype=int, byteorder=bd.ByteOrder.BIG),
      bd.Variable(name='exif_size', size=2, datatype=int),
      bd.Variable(name='exif_label', size=4, datatype=str),
      bd.Variable(name='zeros', size=2, datatype=int),
      bd.Variable(name='byteorder', size=2, datatype=str)
    ])

# Create a data handler and read the Exif header. Exif strings are in ascii format
handler = bd.DataHandler(data, structures={'jpeg_header': jpeg_header_sructure}, str_encode='ascii')
jpeg_header = handler.read_structure('jpeg_header')

# Convert the tags to a hex string.
jpeg_header['jpeg_tag'] = jpeg_header['jpeg_tag'].apply(hex)
jpeg_header['exif_tag'] = jpeg_header['exif_tag'].apply(hex)

# Get the byte order
byteorder = bd.ByteOrder.LITTLE if jpeg_header.loc[0, 'byteorder'] == 'II' else bd.ByteOrder.BIG
print(f"EXIF tags use a {byteorder.value} endian byte order.")
jpeg_header

EXIF tags use a big endian byte order.


Unnamed: 0,jpeg_tag,exif_tag,exif_size,exif_label,zeros,byteorder
0,0xffd8,0xffe1,65071,Exif,0,MM


## Read the Exif Image File Directory (IFD) Header
Now that we know the byte order, we can read the first IFD header (IFD0). This contains the tags for the main image contained in the jpeg file. Reading the header will give us the number of IFD tags contained in the jpeg file.
* Define the structure and add it to the data handler
* Read and display the directory header
* Get the number of tags available in the IFD
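
For comparison, the steps above can be sketched with the standard library alone. `parse_ifd_header` is a hypothetical helper, and `sample` is a hand-built buffer that begins at the byte order marker. In this sketch the marker is decoded with the detected byte order, so it prints as `0x2a`.

```python
import struct

def parse_ifd_header(tiff: bytes) -> dict:
    """Read the IFD marker, the offset of IFD0 and the number of IFD0 entries.

    `tiff` must begin at the byte order marker ('II' or 'MM').
    """
    prefix = "<" if tiff[:2] == b"II" else ">"
    # 2 byte marker followed by a 4 byte offset to the first IFD
    marker, ifd_start = struct.unpack(prefix + "HI", tiff[2:8])
    # The IFD itself begins with a 2 byte entry count
    (num_tags,) = struct.unpack(prefix + "H", tiff[ifd_start:ifd_start + 2])
    return {"marker": hex(marker), "ifd_start": ifd_start, "num_tags": num_tags}

# Hand-built sample (an assumption for illustration): big endian, IFD0 at offset 8, 11 entries
sample = b"MM" + bytes.fromhex("002a" "00000008" "000b")
info = parse_ifd_header(sample)
print(info)
```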

In [5]:
# Define the structure for the IFD header.
# 2 bytes for the IFD marker (2A00)
# 4 bytes pointing to where the IFD directory starts. Should be 8 
# 2 byte int for the number of entries / tags
ifd_header_structure = bd.Structure(len(jpeg_header_sructure), [
      bd.Variable(name='ifd_tag', size=2, datatype=int),  # 2A00
      bd.Variable(name='ifd_start', size=4, datatype=int, byteorder=byteorder),
      bd.Variable(name='num_ifd0_tags', size=2, datatype=int, byteorder=byteorder),
    ])

handler.add_structure('ifd_header', ifd_header_structure)

# Read the directory header, convert the ifd0 tag, get the number of tags and display
ifd_header = handler.read_structure('ifd_header')
ifd_header['ifd_tag'] = ifd_header['ifd_tag'].apply(hex)
num_tags = ifd_header.loc[0, 'num_ifd0_tags']
print(f"There are {num_tags} EXIF tags.")
ifd_header

There are 11 EXIF tags.


Unnamed: 0,ifd_tag,ifd_start,num_ifd0_tags
0,0x2a00,8,11


## Read the Directory Tags
We now know that the Exif data is stored in big endian byte order and that there are 11 tags. We still need to find out which tags are available and their data types. To do this we can perform an initial read of the Exif tags, which will give us the tag ID and the data type. Each Exif tag consists of a:
* 2 byte tag;
* 2 byte format;
* 4 byte component count; and
* 4 byte value, or offset to the value.

The 2 byte format is one of the following:
|Format Code|Format Desc|Size|
|---|---|---|
|1|Unsigned Byte|1|
|2|ASCII String|1|
|3|Unsigned Short|2|
|4|Unsigned Long|4|
|5|Unsigned Rational|8|
|6|Signed Byte|1|
|7|Undefined|1|
|8|Signed Short|2|
|9|Signed Long|4|
|10|Signed Rational|8|
|11|Single|4|
|12|Double|8|
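
The Size column is the number of bytes per component, so the total data length of a tag is size multiplied by count. When that total is 4 bytes or less, the data sits directly in the 4 byte value field; otherwise the field holds an offset to the data. A minimal sketch of that rule, using hypothetical names (`FORMAT_SIZES`, `data_length`, `value_is_inline`) and the format codes 1 to 12 from the Exif specification:

```python
# Bytes per component for each Exif format code
FORMAT_SIZES = {
    1: 1,   # Unsigned Byte
    2: 1,   # ASCII String
    3: 2,   # Unsigned Short
    4: 4,   # Unsigned Long
    5: 8,   # Unsigned Rational
    6: 1,   # Signed Byte
    7: 1,   # Undefined
    8: 2,   # Signed Short
    9: 4,   # Signed Long
    10: 8,  # Signed Rational
    11: 4,  # Single
    12: 8,  # Double
}

def data_length(format_code: int, count: int) -> int:
    """Total number of bytes occupied by a tag's data."""
    return FORMAT_SIZES[format_code] * count

def value_is_inline(format_code: int, count: int) -> bool:
    """True when the data fits inside the 4 byte value field of the tag."""
    return data_length(format_code, count) <= 4

print(value_is_inline(3, 1))  # one unsigned short: 2 bytes
print(value_is_inline(2, 6))  # six character string: 6 bytes
```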

In [6]:
# Define the structure for the directory tags. This is a repeating structure of [num_tags] rows
tags_structure = bd.Structure(ifd_header_structure.start + len(ifd_header_structure), 
                              rows=num_tags, 
                              variables=[
                                  bd.Variable(name='id', size=2, datatype=int, byteorder=byteorder),
                                  bd.Variable(name='format', size=2, datatype=int, byteorder=byteorder),
                                  bd.Variable(name='count', size=4, datatype=int, byteorder=byteorder),
                                  bd.Variable(name='value', size=4, datatype=bytes, byteorder=byteorder),
    ])

handler.add_structure('tags', tags_structure)

# Read and display the structure
tags_data = handler.read_structure('tags')
tags_data['id'] = tags_data['id'].apply(hex)
tags_data

Unnamed: 0,id,format,count,value
0,0x10f,2,6,b'\x00\x00\x00\x92'
1,0x110,2,10,b'\x00\x00\x00\x98'
2,0x112,3,1,b'\x00\x01\x00\x00'
3,0x11a,5,1,b'\x00\x00\x00\xa2'
4,0x11b,5,1,b'\x00\x00\x00\xaa'
5,0x128,3,1,b'\x00\x02\x00\x00'
6,0x131,2,6,b'\x00\x00\x00\xb2'
7,0x132,2,20,b'\x00\x00\x00\xb8'
8,0x213,3,1,b'\x00\x01\x00\x00'
9,0x8769,4,1,b'\x00\x00\x00\xcc'


## Get Make and Model
Get the make and model of the camera used to take the photo.

Make is stored in tag 0x010f and model is stored in tag 0x0110. We can tell from the above output that both of these have a format of 2, which corresponds to a string.

Both of these have a count > 4, so the string does not fit in the 4 byte value field. This means the value is the offset to where the data is stored, not the data itself. We will need these offsets as an int.

Create and read a new structure for these 2 tags, defining the value as an int. Luckily they are the first 2 tags, so we already know the start position. If we were editing other tags, we would need to calculate the start position by adding (row num * num bytes in row) to the start location.
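
As a worked example of that calculation: each tag row is 12 bytes (2 + 2 + 4 + 4), and in this walkthrough the first tag begins after the 14 byte jpeg header and the 8 byte IFD header, at byte 22. The names `IFD_ENTRY_SIZE` and `tag_start` below are hypothetical, introduced only for this sketch.

```python
IFD_ENTRY_SIZE = 12  # 2 (id) + 2 (format) + 4 (count) + 4 (value)

def tag_start(first_tag_start: int, row: int) -> int:
    """File position of the tag in the given row (row 0 is the first tag)."""
    return first_tag_start + row * IFD_ENTRY_SIZE

first = 14 + 8  # jpeg header length + IFD header length
print(tag_start(first, 0))  # first tag (make)
print(tag_start(first, 6))  # tag in row 6 of the table above
```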

In [7]:
# Define a structure covering only the first 2 directory tags (make and model), reading the value as an int offset
make_and_model_tags_location = bd.Structure(ifd_header_structure.start + len(ifd_header_structure), 
                              rows=2, 
                              variables=[
                                  bd.Variable(name='id', size=2, datatype=int, byteorder=byteorder),
                                  bd.Variable(name='format', size=2, datatype=int, byteorder=byteorder),
                                  bd.Variable(name='count', size=4, datatype=int, byteorder=byteorder),
                                  bd.Variable(name='value', size=4, datatype=int, byteorder=byteorder),
    ])

handler.add_structure('mm_tags', make_and_model_tags_location)

# Read and display the structure
tags_data = handler.read_structure('mm_tags')
tags_data['id'] = tags_data['id'].apply(hex)
tags_data

Unnamed: 0,id,format,count,value
0,0x10f,2,6,146
1,0x110,2,10,152


### Read the make and model
The values are offsets to the corresponding data, measured from the byte order marker, which is at byte 12 of the file.
* Create a variable for make with an offset 12 + 146, a type of str and a size of 6.
* Create a variable for model with an offset of 12 + 152, a type of str and a size of 10.
* Read them.
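
The offset arithmetic can be illustrated with plain byte slicing. This is a sketch under stated assumptions: `TIFF_START` and `read_ascii` are hypothetical names, and `sample` is a hand-built buffer rather than the real photo. Exif strings end with a NUL byte that is included in the count, so the sketch strips it.

```python
TIFF_START = 12  # file position of the byte order marker in this walkthrough

def read_ascii(data: bytes, offset: int, count: int) -> str:
    """Read `count` bytes at `offset` (relative to TIFF_START) as an ASCII string."""
    start = TIFF_START + offset
    raw = data[start:start + count]
    # Drop the trailing NUL terminator that Exif includes in the count
    return raw.decode("ascii").rstrip("\x00")

# Hand-built buffer (an assumption): 158 filler bytes, then a 6 byte make field
sample = bytes(158) + b"Apple\x00"
make_text = read_ascii(sample, 146, 6)
print(make_text)
```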

In [8]:
make_var = bd.Variable(name='make', offset=146+12, size=6, datatype=str)
model_var = bd.Variable(name='model', offset=152+12, size=10, datatype=str)
make = handler.read_variable(make_var)
model = handler.read_variable(model_var)
print(f"The photo was taken with a {make} {model}")

The photo was taken with a Apple  iPhone 6s 


## Change the Make and Model tags
We can edit the tags using write_variable, passing in the new data. This will edit the data in the datahandler, which can then be saved to a new file.
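
Because the tag count and every offset stay the same, the new text must occupy exactly the same number of bytes as the old field. The sketch below shows that idea with a plain `bytearray`; it is not how binda implements `write_variable`, and `write_fixed_ascii` is a hypothetical helper that pads shorter text with NUL bytes.

```python
def write_fixed_ascii(buffer: bytearray, start: int, size: int, text: str) -> None:
    """Overwrite `size` bytes at `start` with `text`, NUL padded to the field width."""
    encoded = text.encode("ascii")
    if len(encoded) > size:
        raise ValueError(f"{text!r} needs {len(encoded)} bytes, field holds {size}")
    # Same-length slice assignment keeps the buffer length and all offsets unchanged
    buffer[start:start + size] = encoded.ljust(size, b"\x00")

# Hand-built buffer (an assumption): 158 filler bytes, then a 6 byte make field
buf = bytearray(bytes(158) + b"Apple\x00")
write_fixed_ascii(buf, 158, 6, "Nikon")
print(bytes(buf[158:164]))
```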

In [9]:
handler.write_variable('Giroux', make_var)
handler.write_variable('Daguerreo', model_var)

with open('PHOTO_EDITED.JPG', 'wb') as file:
  file.write(handler.data)

## Confirm that it has worked
We can confirm that our tags have been written by checking the properties of the new photo PHOTO_EDITED.JPG
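
A programmatic check is also possible: compare the original and edited bytes and confirm that only the expected positions differ. The helper `changed_ranges` below is a hypothetical sketch, demonstrated on hand-built buffers rather than the actual files.

```python
def changed_ranges(before: bytes, after: bytes) -> list[tuple[int, int]]:
    """Return (start, end) spans, end exclusive, where two equal-length buffers differ."""
    if len(before) != len(after):
        raise ValueError("buffers must have the same length")
    spans = []
    start = None
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b and start is None:
            start = i                  # a differing run begins
        elif a == b and start is not None:
            spans.append((start, i))   # the run ended at the previous byte
            start = None
    if start is not None:
        spans.append((start, len(before)))
    return spans

# Hand-built buffers (an assumption) mimicking the make and model fields
before = bytes(158) + b"Apple\x00" + b"iPhone 6s\x00"
after = bytes(158) + b"Giroux" + b"Daguerreo\x00"
spans = changed_ranges(before, after)
print(spans)
```

With the real files, `before` and `after` would be the contents of PHOTO.JPG and PHOTO_EDITED.JPG read in `'rb'` mode.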

Changes can also be made by editing a dataframe and saving the changes using save_structure.