Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVC Encoder Architecture rework #33

Merged
merged 9 commits into from
Mar 2, 2017

Conversation

QuPengfei
Copy link

Encoder Architecture Changes (Primarily AVC)

Encoder architecture restructuring for H.264 (with some impact to HEVC now) on HSW+

  • Improvements to the shaders
  • Improvements to the B frame efficiency
  • Improvements to the low bit rate mode
  • Improved features in two stage VME/PAK pipeline

v1:
Reduce the patch number and re org for VME and MFX related patches.
Patch re org for VME pipeline
Patch re org for MFX pipeline
keep assert for internal logic and replace assert for input validation function.
Remove unnecessary comments and enum value.
Use the 64bit version OUT_BCS_RELOC64.
Move kernel binary into header file.
use misc parameter from encoder_context structure.

v2:
add APL support
For each patch, new file is added in the Makefile.am
Fix zero divide issue under cqp
Correct the license header for the new file.
Fix the Batch buffer logic when conditional end enabled.
Remove the execution bit for the files.
misc changes.

v3:
rebase to lastest master branch and use i965->intel->media_mocs to configure the memory MOCS.
remove unused enum value

v4:
use i965->intel->media_mocs to configure the memory MOCS in PAK pipeline

Reviewed-by: Sean V Kelleyseanvk@posteo.de
Reviewed-by: yakuizhao yakui.zhao@intel.com

};
/*
the definition for encoder status
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is mentioned in previous review cycle, the following two structures vary on the different platforms/codec.
It doesn't make sense that it is added as the generic structure.
Otherwise we will have to be changed frequently.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a plan to refine these two structures for all VP8/VP9/AVC/HEVC. it will common for them

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the number of used registers vary on the different codecs. When some status needed to be added for one specific codec, it will cause that the layout of status_buffer are also affected for other codecs.(For example: the status buffer layout includes some holes).

So it looks better that every codec defines its own layout of status buffer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with yakui even if having a plan to refine the structures later. If these structure can be used for all codecs, we can move these structures to the common file later.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i can move these two structure into gen9_avc_encoder.h And refine it later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this goes along with the concern about the constants in the common file too. If we aren't doing it now, let's keep it specific until we are ready to make the change in a later patch series.


/*
kernel operation related defines
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following two definitions still include a lot of definitions that is not used.
For example: MBENC_WIDI, VDENC_ME, RESET_VLINE_STRIDE and so on.,
Please double check it and remove the unused/unmeaningful definition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIDI is in the next plan.
Other definitions will be removed.


/*
kernel operation related defines
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The following two definitions still include a lot of definitions that is not used.
For example: MBENC_WIDI, VDENC_ME, RESET_VLINE_STRIDE and so on.,
Please double check it and remove the unused definition.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WIDI is in the next plan.
Other definitions will be removed.

INTEL_AVC_EXTENDED_PROFILE = 88,
INTEL_AVC_HIGH_PROFILE = 100,
INTEL_AVC_HIGH10_PROFILE = 110,
INTEL_AVC_HIGH422_PROFILE = 122,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the misleading definition(for example: HIGH422, HIGH444 and so on).
Otherwise maybe someone think that it will be supported in the driver.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HIGH422, HIGH444 will be removed.

pstate->dw5.frame_size_over_flag = 1;
pstate->dw5.frame_size_under_flag = 1;
pstate->dw5.intra_mb_ipcm_flag = 1;
pstate->dw5.mb_rate_ctrl_flag = 0; /* Always 0 in VDEnc mode */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still keep the VDENC comment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be removed


pdata = i965_map_gpe_resource(gpe_resource);

gen9_avc_init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is pointed in previous review cycle, the incorrect ref_count is used for avc_priv_surface->dmv_top/dmv_bottom.
Please double check it again.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed


pdata = i965_map_gpe_resource(gpe_resource);

gen9_avc_init
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same issue of reference count.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed

{
generic_state->frames_per_100s = encoder_context->brc.framerate[0].num/encoder_context->brc.framerate[0].den * 100;
generic_state->frame_rate = encoder_context->brc.framerate[0].num/encoder_context->brc.framerate[0].den ;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is pointed in previous review cycle, the above code has the precision loss.
Had better be changed to:
num * 100 / den.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed

@@ -4493,14 +4500,14 @@ gen9_avc_set_curbe_sfd(VADriverContextP ctx,
memset(cmd,0,sizeof(gen9_avc_sfd_curbe_data));

cmd->dw0.enable_intra_cost_scaling_for_static_frame = 1 ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The VDENC comment had better be removed in previous patch.
Otherwise it is changed back and forth.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be removed

ret = VA_STATUS_ERROR_INVALID_PARAMETER;
goto out;
switch (profile) {
case VAProfileH264ConstrainedBaseline:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is common place for quality level and not suitable to put h.264 special codes here.

ret = VA_STATUS_ERROR_INVALID_PARAMETER;
goto out;
switch (profile) {
case VAProfileH264ConstrainedBaseline:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the common place for quality level and not suitable to put h.264 special codes here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move quality level check in the brc_prepare function

i965_object_surface_to_2d_gpe_resource_with_align(struct i965_gpe_resource *res,
struct object_surface *obj_surface)
{
unsigned int swizzle;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For those special surface, it will be better directly change its orig_width and orig_height to 16 aligned value. So this aligned function is not required and i965_object_surface_to_2d_gpe_resource() is enough for your requirement

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i965_object_surface_to_2d_gpe_resource() also will be used by other function, such as VPP. it will cause the issue to change the value in the function i965_object_surface_to_2d_gpe_resource(). that is the reason why add the version with aligned

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep i965_object_surface_to_2d_gpe_resource() and don't modify it. But update obj_surface->orig_width and obj_surface->orig_width to new aligned value

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directly change the surface value(orig_width ,orig_height) looks not good. Here GPE resource is the birdge to attend the final operation. it is better to do alignment with gpe resource. The issue maybe be there when the post process on this surface.
BTW, the best align operation should be done when obj surface is allocated . But this is implemented in the middle ware. For driver, it only work around.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directly change the surface value(orig_width ,orig_height) looks not good. Here GPE resource is the birdge to attend the final operation. it is better to do alignment with gpe resource. The issue maybe be there when the post process on this surface.
BTW, the best align operation should be done when obj surface is allocated . But this is implemented in the middle ware. For driver, it only work around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is resonable and ok for me

Copy link
Contributor

@xhaihao xhaihao Feb 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i965_object_surface_to_2d_gpe_resource() and i965_object_surface_to_2d_gpe_resource_with_align are almost the same. I suggest take the alignment as an input parameter to i965_object_surface_to_2d_gpe_resource_with_align(), define i965_object_surface_to_2d_gpe_resource as below
void i965_object_surface_to_2d_gpe_resource(struct i965_gpe_context *gpe_context,
struct i965_gpe_surface *gpe_surface)
{
i965_object_surface_to_2d_gpe_resource_with_align(gpe_context, gpe_surface, 1);
}

We will only modify one function if there is any error in i965_object_surface_to_2d_gpe_resource(),i965_object_surface_to_2d_gpe_resource_with_align(). In addition, we can set other alignments if needed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be update accordingly

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, other alignments can be set as needed.

attrib_list[i].value = ENCODER_QUALITY_RANGE;
profile == VAProfileH264High ){
attrib_list[i].value = ENCODER_QUALITY_RANGE;
if(IS_GEN9(i965->intel.device_info))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

KBL is also recognized as GEN9 but the comment only says SKL and APL.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is correct. it will be fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you are including KBL compatible kernels in the patch?

@@ -1394,7 +1398,12 @@ intel_enc_hw_context_init(VADriverContextP ctx,
encoder_context->codec = CODEC_H264;

if (obj_config->entrypoint == VAEntrypointEncSliceLP)
encoder_context->quality_range = ENCODER_LP_QUALITY_RANGE;
encoder_context->quality_range = ENCODER_QUALITY_RANGE_AVC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this patch do some changes to VDEnc? If not, it will break VDEnc with quality level setting. The current VDEnc only supports 0 and 1, like as the legacy avc encoder.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes for vdenc. i will revert this change for VDENC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

switch (obj_config->profile){
case VAProfileH264ConstrainedBaseline:
case VAProfileH264Main:
case VAProfileH264High:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miss VAProfileH264MultiviewHigh and VAProfileH264StereoHigh, and you can use encoder_context->codec as conditional.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it into the function gen9_mfc_context_init and gen9_vme_context_init

return intel_enc_hw_context_init(ctx, obj_config, gen9_vme_context_init, gen9_mfc_context_init);
default:
return intel_enc_hw_context_init(ctx, obj_config, gen9_vme_context_init, gen9_mfc_context_init);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to avoid too many platform specific code (IS_xxx() etc) in a generic file, you could move these code to a gen9 related file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it into the function gen9_mfc_context_init and gen9_vme_context_init

@@ -69,7 +69,9 @@
#define DEFAULT_SATURATION 50

#define ENCODER_QUALITY_RANGE 2
#define ENCODER_QUALITY_RANGE_AVC 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the new encoder really support quality level from 1 to 8? I think HIGH(1)/Medium(2, default)/Low(3) is enough.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AVC kernel support quality level 1-7 by now. The test result show 1-3 is closer and 4-6 is closer.

MediaSDK support 1-7.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these kernels do support 1-7 and we need to support them for customer use cases.

OUT_BCS_BATCH(batch, MFX_PIPE_BUF_ADDR_STATE | (65 - 2));

/* the DW1-3 is for pre_deblocking */
OUT_BUFFER_3DW(batch, avc_ctx->res_pre_deblocking_output.bo, 1, 0, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre_deblocking and other graphics addresses have MOCS setting too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, MOCS setting is used in default for OUT_BUFFER_3DW.


/* the DW34-36 is the MV for the current reference */
OUT_BCS_RELOC64(batch, avc_ctx->res_direct_mv_buffersr[NUM_MFC_AVC_DMV_BUFFERS - 2].bo,
I915_GEM_DOMAIN_INSTRUCTION, 0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please set write domain for the current frame.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. We should make sure we have all the right domains set. It's not an uncommon problem.


OUT_BCS_BATCH(batch, i965->intel.mocs_state);

/* the DW34-36 is the MV for the current reference */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current reference? I think it is the current frame/surface.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is correct. it will be fixed.

VAEncSequenceParameterBufferH264 *seq_param = avc_state->seq_param;
VAEncPictureParameterBufferH264 *pic_param = avc_state->pic_param;

/* TODO: add support for non flat matrix */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems you have added support for non flat matrix, so please remove this comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes.the comments will be removed.

}

slice_hor_pos = slice_param->macroblock_address % generic_state->frame_width_in_mbs;
slice_ver_pos = slice_param->macroblock_address / generic_state->frame_height_in_mbs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slice_ver_pos should be equal to slice_param->macroblock_address / generic_state->frame_width_in_mbs, otherwise it will be wrong for multiple slices.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed


if (next_slice_param) {
next_slice_hor_pos = next_slice_param->macroblock_address % generic_state->frame_width_in_mbs;
next_slice_ver_pos = next_slice_param->macroblock_address / generic_state->frame_height_in_mbs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

next_slice_ver_pos should be equal to next_slice_param->macroblock_address / generic_state->frame_width_in_mbs, otherwise it will be wrong for multiple slices

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed

/* Setup current reconstruct frame */
obj_surface = encode_state->reconstructed_object;
va_status = i965_check_alloc_surface_bo(ctx, obj_surface, 1, VA_FOURCC_NV12, SUBSAMPLE_YUV420);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

encode_state->reconstructed_object has been checked in intel_encoder_check_avc_parameter().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here i965_check_alloc_surface_bo will be removed.

}

if (avc_ctx->pres_slice_batch_buffer_2nd_level)
intel_batchbuffer_free(avc_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Miss VAProfileH264MultiviewHigh and VAProfileH264StereoHigh,

}

if (avc_ctx->pres_slice_batch_buffer_2nd_level)
intel_batchbuffer_free(avc_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If not, please remove the code below, otherwise please update the comment

mi_store_reg_mem_param.bo = status_buffer->bo;
mi_store_reg_mem_param.offset = status_buffer->image_status_mask_offset;
mi_store_reg_mem_param.mmio_offset = status_buffer->image_status_mask_reg_offset;
gen8_gpe_mi_store_register_mem(ctx, batch, &mi_store_reg_mem_param);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only the first info in status_buffer is used later, I suggest to read status_buffer->bs_byte_count_frame_offset only and delete unused fields.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these info also help debug. i suggest keep them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @QuPengfei, although they are not used, they are informative and saves time from having to jump to the PRM for the DW definitions. No harm is done and makes the code easier to maintain.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually reading/writing too many unused registers and memories in the pipeline will impact the performance a little bit.

0xF1EFFF0C, 0xF01104F1, 0x10FF0A50, 0x000FF1C0, 0x00000000, 0x00000000, 0x00000000, 0x00000000
}
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file only adds a constant table and actually i965_encoder.c has been used as a generic level for encoding. I suggest rename this file name, otherwise it is easy to mislead the reader.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i965_encoder.c is mainly for generic functions implementation and i965_encoder_common.c is for common table defination. We can put all the encoder generic tables here. And it make more clear and easy to read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to use a specific file name for constant data, like as the file used for AVC constant (gen9_avc_const_def.c)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's favor more specific titles for the file. Saying common implies we are going to move more tables here, which we're not currently doing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, rename the file as i965_encoder_const_def.c

};
/*
the definition for encoder status
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with yakui even if having a plan to refine the structures later. If these structure can be used for all codecs, we can move these structures to the common file later.

int i;

/* brc */
generic_state->max_bit_rate = ALIGN(encoder_context->brc.bits_per_second[0], 1000) / 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1000 is not power of 2, so the ALIGN macro can not be used here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xhaihao You know this issue has caught people more than once. In future we may need to explicitly define ALIGN to be something ALIGNPWR2 or something.

{
generic_state->frames_per_100s = encoder_context->brc.framerate[0].num * 100/encoder_context->brc.framerate[0].den ;
generic_state->frame_rate = encoder_context->brc.framerate[0].num/encoder_context->brc.framerate[0].den ;
generic_state->window_size = (int)(encoder_context->brc.window_size * generic_state->frame_rate /1000);// brc.windows size in ms as the unit
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to rename generic_state->window_size to generic->frames_per_window_size in case the read is confused.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the variable name will be change to frames_per_window_size

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree


if (obj_surface->private_data &&
obj_surface->free_private_data != gen9_free_surfaces_avc) {
obj_surface->free_private_data(&obj_surface->private_data);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Releasing the private data which is used by another context might cause critical issue , hence I don't think releasing private data here is a good idea. We need a rework for private data usage.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to mail discussion, it is a common issue for all encoder.it should file a bug and fix it in seperate patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we should fix private data usage in another patch. What I mean is not to release private data here because releasing private data will result in other issues. BTW
fhvwy has created #42 to track this issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion on VP8, we should at a minimum make a change in this patch series such that private_data is released if the private_data is not expected. We can later remove it once we have a common solution for private data usage. @yakuizhao

mi_store_data_imm.dw0 = media_function;
gen8_gpe_mi_store_data_imm(ctx, batch, &mi_store_data_imm);

intel_batchbuffer_emit_mi_flush(batch);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to call intel_batchbuffer_emit_mi_flush() before gen8_gpe_mi_store_data_imm(), like as gen9_avc_run_kernel_media_object_walker().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, it will be fixed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please do make sure elsewhere in the code you are flushing before you do the store, not just at this location.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. more locations should be checked.

struct i965_gpe_resource *gpe_resource)
{
struct encoder_vme_mfc_context * pak_context = (struct encoder_vme_mfc_context *)encoder_context->vme_context;
struct generic_enc_codec_state * generic_state = (struct generic_enc_codec_state * )pak_context-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free() can accept NULL pointer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. the free() will be call directly.

struct i965_gpe_resource *gpe_resource)
{
struct encoder_vme_mfc_context * pak_context = (struct encoder_vme_mfc_context *)encoder_context->vme_context;
struct generic_enc_codec_state * generic_state = (struct generic_enc_codec_state * )pak_context-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

free() can accept NULL pointer

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. the free() will be call directly.

seanvk
seanvk previously requested changes Feb 14, 2017
Copy link
Contributor

@seanvk seanvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good progress, still more changes needed see comments from myself and others. BTW, please create an issue in Github for AVC quality improvement for purposes of adding "Fixes #n" to the commit headers. Or break up into multiple issues.

0xF1EFFF0C, 0xF01104F1, 0x10FF0A50, 0x000FF1C0, 0x00000000, 0x00000000, 0x00000000, 0x00000000
}
}
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, let's favor more specific titles for the file. Saying common implies we are going to move more tables here, which we're not currently doing.

};
/*
the definition for encoder status
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this goes along with the concern about the constants in the common file too. If we aren't doing it now, let's keep it specific until we are ready to make the change in a later patch series.

src/Makefile.am Outdated
@@ -161,6 +161,7 @@ source_h = \
i965_encoder_common.h \
i965_encoder_api.h \
gen9_avc_const_def.h \
gen9_avc_encoder_kernels.h \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernels should be using the APACHE 2 license and NOT the MIT license. Please see VP8 patch series comments and subsequent changes by @xhaihao . Same rule applies for AVC kernels.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

license header will be changed to APACHE2 accordingly for kernel file.

src/Makefile.am Outdated
@@ -104,6 +104,7 @@ source_c = \
intel_common_vpp_internal.c \
i965_encoder_common.c \
gen9_avc_const_def.c \
i965_avc_encoder_common.c \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to AVC in name

src/Makefile.am Outdated
@@ -162,6 +163,7 @@ source_h = \
i965_encoder_api.h \
gen9_avc_const_def.h \
gen9_avc_encoder_kernels.h \
i965_avc_encoder_common.h \
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to AVC in name

max_qp_p = 0; /* TODO: update it */
grow = 0; /* TODO: update it */
shrink = 0; /* TODO: update it */

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not to see "TODO" in the code. Rather, you should create a Github issue and assign it to yourself detailing the additional changes needed. @QuPengfei

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the comments will removed. by now no addational changes needed.

mi_store_reg_mem_param.bo = status_buffer->bo;
mi_store_reg_mem_param.offset = status_buffer->image_status_mask_offset;
mi_store_reg_mem_param.mmio_offset = status_buffer->image_status_mask_reg_offset;
gen8_gpe_mi_store_register_mem(ctx, batch, &mi_store_reg_mem_param);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @QuPengfei, although they are not used, they are informative and saves time from having to jump to the PRM for the DW definitions. No harm is done and makes the code easier to maintain.

attrib_list[i].value = ENCODER_QUALITY_RANGE;
profile == VAProfileH264High ){
attrib_list[i].value = ENCODER_QUALITY_RANGE;
if(IS_GEN9(i965->intel.device_info))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming you are including KBL compatible kernels in the patch?

@@ -69,7 +69,9 @@
#define DEFAULT_SATURATION 50

#define ENCODER_QUALITY_RANGE 2
#define ENCODER_QUALITY_RANGE_AVC 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, these kernels do support 1-7 and we need to support them for customer use cases.

@@ -1394,7 +1398,12 @@ intel_enc_hw_context_init(VADriverContextP ctx,
encoder_context->codec = CODEC_H264;

if (obj_config->entrypoint == VAEntrypointEncSliceLP)
encoder_context->quality_range = ENCODER_LP_QUALITY_RANGE;
encoder_context->quality_range = ENCODER_QUALITY_RANGE_AVC;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good catch.

@QuPengfei
Copy link
Author

update the patch. Please help to review.


if (next_slice_param) {
next_slice_hor_pos = next_slice_param->macroblock_address % generic_state->frame_width_in_mbs;
next_slice_ver_pos = next_slice_param->macroblock_address / generic_state->frame_height_in_mbs;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrong calculation for ver_pos.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is already fixed in this series of patches.

};
/*
the definition for encoder status
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the number of used registers vary on the different codecs. When some status needed to be added for one specific codec, it will cause that the layout of status_buffer are also affected for other codecs.(For example: the status buffer layout includes some holes).

So it looks better that every codec defines its own layout of status buffer.

/*
the definition for encoder codec state
*/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the subsequent patches, it seems that the below structure is obtained from intel_encoder_context step by step.
But in fact the current intel_encoder_context includes a lot of duplicated definitions
>rate_control_mde
>frame_width_in_pixel
>frame_height_in_pixel
And so on. It doesn't make sense if there exist a lot of dup.
Why not uses the definitions in intel_encoder_context instead?

At the same time the structure of generic_enc_codec_state is declared as generic. But in fact some definitions are only specific to some codec. For example: frame_width_in_mbs is only specific to H264. When it is for HEVC/VP9, it should be frame_width_in_ctb/cb/cu/lcu. It doesn't make sense that the specific definition is misleading for other codecs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for fame_width_in_pixel and frame_height_in_pixel and etc, these variabls are used almost in each function. it is easy to acces and read to put them together in one structure. I think it is OK.

for frame_width_in_mbs, it is common for AVC/VP8, and even if HEVC/VP9, it is also used on some caculation when MB alignment.
frame_width_in_ctb/cb/cu/lcu is specified for HEVC only,VP should use SB related. that is the reason there are no definition in this structure generic_enc_codec_state here.

extern const unsigned int gen9_avc_inter_rounding_b[PRESET_NUM];
// This applies only for progressive pictures. For interlaced, CAF is currently not disabled.
extern const unsigned int gen9_avc_disable_all_fractional_check_for_high_res[PRESET_NUM];
extern unsigned char gen9_avc_adaptive_inter_rounding_p[AVC_QP_MAX];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the const to configure the const-table as read-only.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the "const" to configure the const-table as read-only.
It seems that this is not updated in this patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the "const" to configure the const-table as read-only.
It seems that this is not updated in this patch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, it will be fixed in the next version

0x10001000, 0x10001000, 0x10001000, 0x10001000
};

static unsigned int slice_type_kernel[3] = {1,2,0};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the const prefix for slice_type_kernel.

please add the static prefix for the *curbe_init_data and so on.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it will be fixed in the next version

OUT_BUFFER_2DW(batch, avc_ctx->list_reference_res[i].bo, 1, 0);
}

/* DW 51, reference picture attributes */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use mocs_state for reference_picture attribute

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, this will be fixed in the next version

allocate_flag = i965_allocate_gpe_resource(i965->intel.bufmgr,
&avc_ctx->res_intra_row_store_scratch_buffer,
size,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please free the slice_batch_buffer_2nd_level before sumbitting the batch buffer.
Otherwise the buffer domain of slice_batch_buffer is incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it will be fixed.

allocate_flag = i965_allocate_gpe_resource(i965->intel.bufmgr,
&avc_ctx->res_intra_row_store_scratch_buffer,
size,

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please free the pres_slice_batch_buffer_2nd_level before submitting the batch buffer.
Otherwise the buffer domain of pres_slice_batch_buffer_2nd_level is incorrect.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is confused. how about the first level batch buffer? Does it need the same free operation before the batch buffer submitting? or Only second level batch buffer need this operation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it is clear for me. it will be fixed.

res->width = obj_surface->orig_width;
res->height = obj_surface->orig_height;
res->width = ALIGN(obj_surface->orig_width,alignment);
res->height = ALIGN(obj_surface->orig_height,alignment);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it a mistake?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, this function is re defined with input parameter alignment

struct gen9_mfx_avc_img_state
{
union {
struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it the common structure or special for gen9?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is special for gen9. i will move it to gen9_avc_encoder.h

@seanvk
Copy link
Contributor

seanvk commented Feb 18, 2017

@QuPengfei When will we see an updated PR from you? I'll like to see this soon. I know you have a lot of feedback.

@seanvk
Copy link
Contributor

seanvk commented Feb 18, 2017

@QuPengfei your commit headers are missing:

Fixes #n

I'd like to see you add those so the patches are actually linked to the issues for QA.

Thanks,

*
* The above copyright notice and this permission notice (including the
* next paragraph) shall be included in all copies or substantial portions
* of the Software.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also change it to apache license?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no,apache2 only for media kernel binary

common structure and define
*/
struct gen9_avc_encoder_context {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This context definition is special for GEN9. why not move it to gen9_avc_encoder.h?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about supporting AVC on more platform, such as KBL/BDW etc and move this structure into the common. Yes, the structure name is gen9, i will change it to i965_avc_encoder_context to be common for other platforms.


#define MAX_AVC_SLICE_NUM 256
struct avc_enc_state {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This context definition is special for GEN9. why not move it to gen9_avc_encoder.h?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is common for AVC on GenX. I am thinking about supporting AVC on more platform, such as KBL/BDW etc and move this structure into the common.


if ((slice_type == SLICE_TYPE_P) ||
(slice_type == SLICE_TYPE_B)) {
for (i = 0; i < MAX(avc_state->num_refs[0],4); i++) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that the min(avc_state->num_refs, 4) should be used instead of MAX.
Please fix it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, i will fix it.

downscaled_width_32x = generic_state->frame_width_32x;
downscaled_height_32x = generic_state->frame_height_32x;
i965_CreateSurfaces(ctx,
downscaled_width_32x,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need this 32x surface if 32x hme is disable for some small pictures.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. allocate the 32x surface according to the HME flag


/* scaling related surface */
if(avc_state->mb_status_supported)
{
Copy link
Contributor

@pengche1 pengche1 Feb 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor suggestion. keep the identical coding style as others. like this
if () {

}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. keep the same style

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. keep the same style as the following
if()
{

}

memset(&kernel_walker_param, 0, sizeof(kernel_walker_param));
if(surface_param.use_32x_scaling)
{
kernel_walker_param.resolution_x = downscaled_width_in_mb ;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete the blank space before ;

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree.



/* maybe it is not needed by now. it is used in crypt mode*/
i965_free_gpe_resource(&avc_ctx->res_brc_mbenc_curbe_write_buffer);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to delete this comment or remove this buffer temporarily

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. remove it.

@QuPengfei
Copy link
Author

update the patch. Please review.

Note:

  1. allocate the 32x surface according to the HME flag
  2. remove the surface res_brc_mbenc_curbe_write_buffer
  3. fix ref num checking
  4. change the structure name gen9_avc_encoder_context to i965_avc_encoder_context
  5. misc changes

@QuPengfei QuPengfei force-pushed the avc-enc-improvement-01 branch 2 times, most recently from e8d5908 to b645d21 Compare February 28, 2017 06:11
@QuPengfei
Copy link
Author

rebase to master.

note:

  1. fix the surface read/write domain for read only operation

Copy link
Contributor

@yakuizhao yakuizhao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@xhaihao
Copy link
Contributor

xhaihao commented Mar 2, 2017

I see a strange flag of <U+FEFF> in some new files in your commits, could you double check each commit and try to remove this flag?

For example,
git log -p ef9cc76

diff --git a/src/gen9_avc_encoder.h b/src/gen9_avc_encoder.h
new file mode 100644
index 0000000..a00b058
--- /dev/null
+++ b/src/gen9_avc_encoder.h
@@ -0,0 +1,2345 @@
+<U+FEFF>/*

    • Copyright @ 2016 Intel Corporation

@xhaihao
Copy link
Contributor

xhaihao commented Mar 2, 2017

Please change the year of 2016 to 2017 in the copyright statement in a new file.

Pengfei Qu added 3 commits March 2, 2017 10:35
v1:
add align version for obj surface conversion to gpe surface
remove comments and enum value

v2:
use intel->media_mocs to configure the memory property

v3:
add maro definition for AVC status/ctl register

Fixes intel#43

Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Reviewed-by: Sean V Kelley<seanvk@posteo.de>
v1:
add context init function for AVC encoder

v2:
add file in the Makefile.am

Fixes intel#43

Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Reviewed-by: Sean V Kelley<seanvk@posteo.de>
v1:
add fies in the Makefile.am

Fixes intel#43

Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Reviewed-by: Sean V Kelley<seanvk@posteo.de>
@QuPengfei
Copy link
Author

update the patch.

Note:

  1. change the license header in the new file "2016" to "2017.
  2. remove the strange symbal in new file

Pengfei Qu added 6 commits March 2, 2017 13:38
v1:move kernal array into the c file
v2:add file into Makefile.am
v3:use the Apache 2 license header for media kernel binary

Fixes intel#43

Reviewed-by: Sean V Kelley<seanvk@posteo.de>
Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Fixes intel#43

Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Reviewed-by: Sean V Kelley<seanvk@posteo.de>
v1:add kernel pointer for different platform

Fixes intel#43

Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Reviewed-by: Sean V Kelley<seanvk@posteo.de>
VME pipeline:
add resource and surface allocation and free function
add init table for frame mbbrc update
add scaling kernel for AVC encoder
add BRC init reset kernel for AVC RC logic
add BRC frame update-kernel for AVC RC logic
add BRC MB level update kernel for AVC RC logic
add REF frame QA caculation and MB level const data
add MBENC kernel for AVC encoder
add ME kernel for AVC encoder
add WP/SFD kernel for AVC encoder
add kernel init/destroy function for AVC encoder
add kernel related parameter check function for AVC
add VME pipeline init prepare/run function for AVC encoder

v1:
allocate the 32x surface accordlingly.

Fixes intel#43

Reviewed-by: Sean V Kelley<seanvk@posteo.de>
Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
MFX pipeline:
add MFX command for AVC encoder
add MFX Picture slice level command init for AVC
add MFX pipeline init prepare run for AVC encode
add VME/MFX context init for AVC encoder

v1:
set memory MOCS property

v2:
add Macro definition of the AVC quality level range

v3:
add check for the reference frame num.

Fixes intel#43

Reviewed-by: Sean V Kelley<seanvk@posteo.de>
Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
… on SKL/APL

v1:
add AVC encoder support on APL

Fixes intel#43

Signed-off-by: Pengfei Qu <Pengfei.Qu@intel.com>
Reviewed-by: Sean V Kelley<seanvk@posteo.de>
@xhaihao xhaihao dismissed seanvk’s stale review March 2, 2017 05:43

Pengfei's provide the new patch series,

Copy link
Contributor

@seanvk seanvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good discussion and progress, lgtm.

@xhaihao xhaihao dismissed pengche1’s stale review March 2, 2017 06:37

Pengfei provided a new patch series

@xhaihao xhaihao merged commit 481ec15 into intel:master Mar 2, 2017
@seanvk seanvk removed the in progress label Mar 2, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants