# JPEG Format Analysis

## Basic structure of a JPEG file

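JPEG stands for Joint Photographic Experts Group, the group that defined the compression standard; the usual file format for storing JPEG images is the JPEG File Interchange Format (JFIF).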

A JPEG file is made up of segments, each identified by a marker.
There are three kinds of segments:
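1. Segments with no overall length (the data extends to the next marker)
    | {Marker} (2 bytes) | {Data} |
    |--------------------|--------|
    | SOS (0xFFDA) | scan header (with its own 2-byte length), then entropy-coded data that ends at the next marker other than RST0-RST7; a 0xFF data byte is stored as FF 00 |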
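2. Fixed-length segments
    | {Marker} (2 bytes) | {Data} |
    |--------------------|--------|
    | SOI (0xFFD8) | (0 bytes) |
    | EOI (0xFFD9) | (0 bytes) |
    | RST0-RST7 (0xFFD0-0xFFD7) | (0 bytes) |
    | DRI (0xFFDD) | 4 bytes: length 0x0004, then the 2-byte restart interval |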
3. Variable-length segments
    | {Marker} (2 bytes) | {Size} (2 bytes, big endian, counts its own 2 bytes) | {Data} ({Size} - 2 bytes) |
    |--------------------|------------------------------|-----------------------|
    | DHT (0xFFC4) | 418 (0x01A2) | ... (416 bytes) |
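The three kinds can be told apart from the marker byte alone. A minimal sketch, assuming the whole file is already in memory as `bytes`; `next_segment` and `skip_scan_data` are hypothetical helpers, not part of the script below. Standalone markers (SOI, EOI, RST0-RST7) have no length; every other marker is followed by a big-endian length that counts its own 2 bytes; after the SOS header, the entropy-coded data runs until the next marker that is neither a stuffed `FF 00` nor a restart marker.

```python
# Markers with no length field: SOI, EOI and RST0-RST7
STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))


def skip_scan_data(data: bytes, pos: int) -> int:
    """Return the offset of the first marker after entropy-coded data starting at pos."""
    while True:
        pos = data.find(b'\xff', pos)
        if pos == -1 or pos + 1 >= len(data):
            return len(data)              # no further marker in the buffer
        nxt = data[pos + 1]
        if nxt == 0x00 or 0xD0 <= nxt <= 0xD7:
            pos += 2                      # stuffed 0xFF byte or RSTn: still inside the scan
        elif nxt == 0xFF:
            pos += 1                      # fill byte; the real marker comes after it
        else:
            return pos


def next_segment(data: bytes, pos: int):
    """Return (marker, payload, next_pos) for the segment starting at pos."""
    if data[pos] != 0xFF:
        raise ValueError(f'no marker at offset {pos:#x}')
    marker = data[pos + 1]
    if marker in STANDALONE:
        return marker, b'', pos + 2
    size = int.from_bytes(data[pos + 2:pos + 4], 'big')  # counts the 2 length bytes too
    payload = data[pos + 4:pos + 2 + size]                # size - 2 bytes of data
    next_pos = pos + 2 + size
    if marker == 0xDA:                                    # SOS: scan data has no length
        next_pos = skip_scan_data(data, next_pos)
    return marker, payload, next_pos


# Synthetic examples (not taken from a real file)
dri = b'\xff\xdd\x00\x04\x00\x10'                         # DRI, restart interval 16
print(next_segment(dri, 0))                               # (221, b'\x00\x10', 6)

sos = (b'\xff\xda\x00\x08\x01\x01\x00\x00\x3f\x00'        # SOS header, 1 component
       b'\x12\xff\x00\x34\xff\xd0\x56'                    # scan data with FF 00 and RST0
       b'\xff\xd9')                                       # EOI
marker, _, nxt = next_segment(sos, 0)
print(hex(marker), nxt)                                   # 0xda 17
```

The DRI example shows why the next marker sits at `pos + 2 + size`: the length value 4 already includes the two length bytes.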


A complete JPEG stream starts with SOI (0xFFD8), ends with EOI (0xFFD9), and holds the other segments in between:

![image.png](attachment:image.png)


- JFIF file structure (`s1 s2` is the 2-byte big-endian segment length)

|Segment|Code|Description|
|:-------|:----|:-----------|
|SOI      |FF D8|Start of Image|
|JFIF-APP0|FF E0 s1 s2 4A 46 49 46 00 ...|Required by the JFIF spec immediately after SOI (Exif files usually have an APP1 segment there instead). It provides information missing from the JPEG stream: version number, X and Y pixel density (dots per inch or dots per cm), pixel aspect ratio (derived from the X and Y pixel density), and a thumbnail.|
|JFXX-APP0|FF E0 s1 s2 4A 46 58 58 00 ...|Optional. A JFIF extension APP0 segment may immediately follow the JFIF APP0 segment; it is only allowed for JFIF 1.02 and later and lets a thumbnail be embedded in one of 3 formats.|
|EXIF-APP1|FF E1 s1 s2 45 78 69 66 00 00 ...|Optional. Stores Exif metadata.|
|...      |...  |Additional marker segments (for example SOF, DHT, COM)|
|SOS      |FF DA|Start of Scan|
|         |     |compressed image data|
|EOI      |FF D9|End of Image|
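The JFIF APP0 fields listed in the table sit at fixed offsets, so `struct` can unpack them in one call. A minimal sketch based on the JFIF 1.02 layout (identifier `JFIF\0`, version major/minor, density units, X/Y density, thumbnail width/height, then 3 * width * height bytes of RGB thumbnail); `parse_jfif_app0` and `JfifApp0` are hypothetical names, and `payload` is assumed to be the segment data after the 2-byte length.

```python
import struct
from typing import NamedTuple


class JfifApp0(NamedTuple):
    version: str
    units: int         # 0: no units (aspect ratio only), 1: dots per inch, 2: dots per cm
    x_density: int
    y_density: int
    thumb_width: int   # thumbnail is thumb_width * thumb_height RGB pixels (0 = none)
    thumb_height: int


def parse_jfif_app0(payload: bytes) -> JfifApp0:
    """Parse the fixed 14-byte header of a JFIF APP0 segment's data."""
    ident, major, minor, units, xd, yd, tw, th = struct.unpack('>5sBBBHHBB', payload[:14])
    if ident != b'JFIF\x00':
        raise ValueError('not a JFIF APP0 segment')
    return JfifApp0(f'{major}.{minor:02d}', units, xd, yd, tw, th)


# Synthetic payload: JFIF 1.02, 72 x 72 dpi, no thumbnail
payload = b'JFIF\x00' + bytes([1, 2, 1]) + (72).to_bytes(2, 'big') * 2 + bytes([0, 0])
print(parse_jfif_app0(payload))
# JfifApp0(version='1.02', units=1, x_density=72, y_density=72, thumb_width=0, thumb_height=0)
```

The `>` prefix selects big-endian byte order with no padding, matching how JPEG stores multi-byte fields.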


**JPEG is extended through APPn (0xFFEn) segments, which hold extra application-specific information, for example:**

- JFIF-APP0(0xFFE0)
- JFXX-APP0(0xFFE0)
- EXIF-APP1(0xFFE1)
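The APPn marker number alone does not say which extension a segment carries (JFIF and JFXX share APP0); the identifier string at the start of the segment data does. A minimal sketch, assuming `payload` is the segment data after the 2-byte length; `classify_app_segment` is a hypothetical helper, and any other APPn payload (XMP in APP1, for instance) falls back to the generic `APPn` name.

```python
def classify_app_segment(marker: int, payload: bytes) -> str:
    """Name an APPn segment from its marker byte (0xE0-0xEF) and identifier string."""
    if marker == 0xE0 and payload.startswith(b'JFIF\x00'):
        return 'JFIF-APP0'
    if marker == 0xE0 and payload.startswith(b'JFXX\x00'):
        return 'JFXX-APP0'
    if marker == 0xE1 and payload.startswith(b'Exif\x00\x00'):
        return 'EXIF-APP1'
    if 0xE0 <= marker <= 0xEF:
        return f'APP{marker & 0x0F}'   # unrecognized identifier: generic name
    return ''                          # not an APPn marker at all


print(classify_app_segment(0xE1, b'Exif\x00\x00MM'))   # EXIF-APP1
print(classify_app_segment(0xE1, b'http://ns.adobe.com/xap/1.0/\x00'))   # APP1
```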

### Segments required in a baseline JPEG
|Segment|Code|Description|
|:-------|:----|:-----------|
|SOI |FF D8|  Start Of Image|
|DQT |FF DB|  Define Quantization Table(s)|
|SOF0|FF C0|  Start Of Frame (baseline DCT)|
|DHT |FF C4|  Define Huffman Table(s)|
|SOS |FF DA|  Start Of Scan|
|EOI |FF D9|  End Of Image|
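A quick way to apply the table to a marker listing like the one the script prints. `check_baseline_segments` is a hypothetical helper; it only checks presence and the SOI/EOI positions, not the full ordering rules of the standard, and treats SOF0 as required exactly as the table does.

```python
# Required segments as listed in the table above (baseline DCT only)
REQUIRED_BASELINE = ('SOI', 'DQT', 'SOF0', 'DHT', 'SOS', 'EOI')


def check_baseline_segments(names):
    """Return a list of problems found in a sequence of segment names (empty if none)."""
    problems = [f'missing {req}' for req in REQUIRED_BASELINE if req not in names]
    if names and names[0] != 'SOI':
        problems.append('does not start with SOI')
    if names and names[-1] != 'EOI':
        problems.append('does not end with EOI')
    return problems


# Segment sequence of image 1 from the script output further down
print(check_baseline_segments(['SOI', 'DQT', 'SOF0', 'DHT', 'SOS', 'EOI']))  # []
# A progressive file uses SOF2 instead of SOF0, so this baseline check flags it
print(check_baseline_segments(['SOI', 'DQT', 'SOF2', 'DHT', 'SOS', 'EOI']))  # ['missing SOF0']
```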

## Script for listing the segments of a JPEG file



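```python
# Marker bytes of the segments this script names explicitly;
# RSTn and APPn names are derived in get_tag_name()
jpg_tag_id_name_map = {
    b'\xff\xd8': 'SOI',
    b'\xff\xc0': 'SOF0',
    b'\xff\xc2': 'SOF2',
    b'\xff\xc4': 'DHT',
    b'\xff\xdb': 'DQT',
    b'\xff\xdd': 'DRI',
    b'\xff\xfe': 'COM',
    b'\xff\xda': 'SOS',
    b'\xff\xd9': 'EOI',
}

jpg_tag_name_id_map = {v: k for k, v in jpg_tag_id_name_map.items()}


def get_tag_name(tag_id: bytes) -> str:
    """Map a 2-byte marker to its name; RSTn and APPn come from the low nibble."""
    if tag_id in jpg_tag_id_name_map:
        return jpg_tag_id_name_map[tag_id]

    if len(tag_id) != 2 or tag_id[0] != 0xff:
        return ''

    if 0xd0 <= tag_id[1] <= 0xd7:
        return 'RST' + str(tag_id[1] & 0x0f)
    if 0xe0 <= tag_id[1] <= 0xef:
        return 'APP' + str(tag_id[1] & 0x0f)

    return 'Unknown: ' + tag_id.hex().upper()


def get_tag_id(tag_name: str) -> bytes:
    """Inverse of get_tag_name(); returns b'' for names it does not know."""
    tag_name = tag_name.upper()
    if tag_name in jpg_tag_name_id_map:
        return jpg_tag_name_id_map[tag_name]

    prefix, index = tag_name[:3], tag_name[3:]
    if not index.isdigit():
        return b''
    n = int(index)
    if prefix == 'RST' and n <= 7:
        return bytes([0xff, 0xd0 | n])
    if prefix == 'APP' and n <= 15:
        return bytes([0xff, 0xe0 | n])
    return b''


def skip_scan_data(data, pos):
    """Return the offset of the first marker after the entropy-coded data at pos.

    Inside a scan a literal 0xFF byte is stored as FF 00 and RST0-RST7 may
    appear; neither ends the scan. FF FF is a fill byte before a marker.
    """
    while True:
        pos = data.find(b'\xff', pos)
        if pos == -1 or pos + 1 >= len(data):
            return len(data)
        nxt = data[pos + 1]
        if nxt == 0x00 or 0xd0 <= nxt <= 0xd7:
            pos += 2
        elif nxt == 0xff:
            pos += 1
        else:
            return pos


def print_marker_from_soi(data, soi_pos):
    """Walk the segments of one image starting at its SOI and print each marker."""
    tag_pos = soi_pos
    while tag_pos + 2 <= len(data):
        tag = data[tag_pos : tag_pos + 2]
        print(f'{hex(tag_pos): <10} {tag.hex().upper(): <7} {get_tag_name(tag)}')

        if tag[0] != 0xff or tag == get_tag_id('EOI'):
            break  # end of image, or lost sync (no marker here)

        if tag == get_tag_id('SOI'):
            tag_pos += 2
            continue

        # The big-endian length counts its own 2 bytes, so the next marker
        # starts at tag_pos + 2 (marker) + payload_len
        payload_len = int.from_bytes(data[tag_pos + 2 : tag_pos + 4], byteorder='big')
        if payload_len < 2:
            break  # corrupt length; stop instead of looping forever
        tag_pos = tag_pos + 2 + payload_len

        # SOS is followed by entropy-coded data that has no length field
        if tag == get_tag_id('SOS'):
            tag_pos = skip_scan_data(data, tag_pos)


img_path = 'data/test.JPG'

with open(img_path, 'rb') as f:
    data = f.read()

# Every FF D8 in the file is treated as the start of an image,
# including JPEGs embedded in other segments (e.g. thumbnails)
jpeg_count = 0
soi_pos = -1
while True:
    soi_pos = data.find(get_tag_id('SOI'), soi_pos + 1)
    if soi_pos == -1:
        break

    print(f'\n------------------------ {jpeg_count} ------------------------')
    print_marker_from_soi(data, soi_pos)
    jpeg_count += 1
```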


```
------------------------ 0 ------------------------
0x0        FFD8    SOI
0x2        FFE1    APP1
0x67c0     FFE1    APP1
0x71c2     FFDB    DQT
0x7248     FFC0    SOF0
0x725b     FFC4    DHT
0x73ff     FFDA    SOS
0x5ecad1   FFD9    EOI

------------------------ 1 ------------------------
0x22e0     FFD8    SOI
0x22e2     FFDB    DQT
0x2368     FFC0    SOF0
0x237b     FFC4    DHT
0x251f     FFDA    SOS
0x67bc     FFD9    EOI
```
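In this output, image 1 starts at 0x22e0 and ends at 0x67bc, both between the APP1 segment at 0x2 and the next APP1 at 0x67c0. It is therefore a JPEG nested inside the first APP1 segment (most likely the Exif thumbnail), which the plain `data.find` loop reports alongside the main image. A minimal sketch for telling nested hits apart, using the printed offsets as sample data; `enclosing_segment` is a hypothetical helper that assumes each top-level segment ends where the next one begins.

```python
from bisect import bisect_right


def enclosing_segment(offset, segments):
    """Return the (name, start) of the top-level segment containing offset, or None.

    segments: (name, start_offset) pairs in file order, as printed by the marker
    walk; each segment is assumed to end where the next one starts.
    """
    starts = [start for _, start in segments]
    i = bisect_right(starts, offset) - 1
    return segments[i] if i >= 0 else None


# Top-level segments of image 0, copied from the output above
image0 = [('SOI', 0x0), ('APP1', 0x2), ('APP1', 0x67c0), ('DQT', 0x71c2),
          ('SOF0', 0x7248), ('DHT', 0x725b), ('SOS', 0x73ff), ('EOI', 0x5ecad1)]

print(enclosing_segment(0x22e0, image0))  # ('APP1', 2)
```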


## References
- https://www.media.mit.edu/pia/Research/deepview/exif.html
- https://en.wikipedia.org/wiki/JPEG_File_Interchange_Format
- https://www.w3.org/Graphics/JPEG/jfif3.pdf