Vulkan revision 2 #4933

Merged
merged 22 commits on Dec 24, 2017
Commits
26f4d2c
vo_gpu: vulkan: refactor vk_cmdpool
haasn Sep 27, 2017
aa2a992
vo_gpu: vulkan: reorganize vk_cmd slightly
haasn Sep 28, 2017
24bde8f
vo_gpu: vulkan: refactor command submission
haasn Sep 28, 2017
3782c30
vo_gpu: vulkan: add a vk_signal abstraction
haasn Sep 28, 2017
85ee3d9
vo_gpu: vulkan: properly track image dependencies
haasn Sep 29, 2017
52bdb3f
vo_gpu: allow invalidating FBO in renderpass_run
haasn Aug 18, 2017
3bb783c
vo_gpu: invalidate fbotex before drawing
haasn Aug 18, 2017
7dbb3e8
vo_gpu: vulkan: support split command pools
haasn Sep 24, 2017
f44c21e
vo_gpu: aggressively prefer async compute
haasn Sep 24, 2017
bc20f0f
vo_gpu: vulkan: make the swapchain more robust
haasn Sep 29, 2017
4df7c51
vo_gpu: vulkan: use correct access flag for present
haasn Sep 29, 2017
967ad8e
vo_gpu: vulkan: properly depend on the swapchain acquire semaphore
haasn Sep 29, 2017
9789bac
vo_gpu: attempt re-using the FBO format for p->output_tex
haasn Sep 29, 2017
baddd7b
vo_gpu: vulkan: prefer vkCmdCopyImage over vkCmdBlitImage
haasn Sep 29, 2017
a4ca400
vo_gpu: vulkan: refine queue family selection algorithm
haasn Sep 30, 2017
42cfd3a
vo_gpu: vulkan: allow disabling async tf/comp
haasn Oct 7, 2017
37ceb38
vo_gpu: vulkan: fix the rgb565a1 names -> rgb5a1
haasn Oct 10, 2017
235619a
vo_gpu: vulkan: fix dummyPass creation
haasn Oct 12, 2017
dd01d9f
vo_gpu: vulkan: fix sharing mode on malloc'd buffers
haasn Oct 16, 2017
ac1ccf7
vo_gpu: vulkan: omit needless #define
haasn Oct 17, 2017
2ed563f
vo_gpu: vulkan: fix some image barrier oddities
haasn Oct 25, 2017
a7ee7d9
vo_gpu: vulkan: fix segfault due to index mismatch
haasn Oct 28, 2017
24 changes: 19 additions & 5 deletions DOCS/man/options.rst
@@ -4267,11 +4267,25 @@ The following video options are currently all specific to ``--vo=gpu`` and
     Controls the number of VkQueues used for rendering (limited by how many
     your device supports). In theory, using more queues could enable some
     parallelism between frames (when using a ``--swapchain-depth`` higher than
-    1). (Default: 1)
-
-    NOTE: Setting this to a value higher than 1 may cause graphical corruption,
-    as mpv's vulkan implementation currently does not try and protect textures
-    against concurrent access.
+    1), but it can also slow things down on hardware where there's no true
+    parallelism between queues. (Default: 1)
+
+``--vulkan-async-transfer``
+    Enables the use of async transfer queues on supported vulkan devices. Using
+    them allows transfer operations like texture uploads and blits to happen
+    concurrently with the actual rendering, thus improving overall throughput
+    and power consumption. Enabled by default, and should be relatively safe.
+
+``--vulkan-async-compute``
+    Enables the use of async compute queues on supported vulkan devices. Using
+    this, in theory, allows out-of-order scheduling of compute shaders with
+    graphics shaders, thus enabling the hardware to do more effective work while
+    waiting for pipeline bubbles and memory operations. Not beneficial on all
+    GPUs. It's worth noting that if async compute is enabled, and the device
+    supports more compute queues than graphics queues (bound by the restrictions
+    set by ``--vulkan-queue-count``), mpv will internally try and prefer the
+    use of compute shaders over fragment shaders wherever possible. Not enabled
+    by default, since it seems to cause issues with some drivers.
 
 ``--d3d11-warp=<yes|no|auto>``
     Use WARP (Windows Advanced Rasterization Platform) with the D3D11 GPU
7 changes: 7 additions & 0 deletions ta/ta_talloc.h
@@ -124,6 +124,13 @@ char *ta_talloc_asprintf_append_buffer(char *s, const char *fmt, ...) TA_PRF(2,
         (idxvar)--;                         \
     } while (0)
 
+// Returns whether or not there was any element to pop.
+#define MP_TARRAY_POP(p, idxvar, out)       \
+    ((idxvar) > 0                           \
+     ? (*(out) = (p)[--(idxvar)], true)     \
+     : false                                \
+    )
+
 #define talloc_struct(ctx, type, ...) \
     talloc_memdup(ctx, &(type) TA_EXPAND_ARGS(__VA_ARGS__), sizeof(type))
 
2 changes: 1 addition & 1 deletion video/out/gpu/osd.c
@@ -314,7 +314,7 @@ void mpgl_osd_draw_finish(struct mpgl_osd *ctx, int index,
     const int *factors = &blend_factors[part->format][0];
     gl_sc_blend(sc, factors[0], factors[1], factors[2], factors[3]);
 
-    gl_sc_dispatch_draw(sc, fbo.tex, vertex_vao, MP_ARRAY_SIZE(vertex_vao),
+    gl_sc_dispatch_draw(sc, fbo.tex, false, vertex_vao, MP_ARRAY_SIZE(vertex_vao),
                         sizeof(struct vertex), part->vertices, part->num_vertices);
 }

6 changes: 6 additions & 0 deletions video/out/gpu/ra.h
@@ -53,6 +53,7 @@ enum {
     RA_CAP_GLOBAL_UNIFORM = 1 << 8, // supports using "naked" uniforms (not UBO)
     RA_CAP_GATHER = 1 << 9, // supports textureGather in GLSL
     RA_CAP_FRAGCOORD = 1 << 10, // supports reading from gl_FragCoord
+    RA_CAP_PARALLEL_COMPUTE = 1 << 11, // supports parallel compute shaders
 };
 
 enum ra_ctype {
@@ -84,6 +85,8 @@ struct ra_format {
                         // only applies to 2-component textures
     bool linear_filter;  // linear filtering available from shader
     bool renderable;     // can be used for render targets
+    bool dummy_format;   // is not a real ra_format but a fake one (e.g. FBO).
+                         // dummy formats cannot be used to create textures
 
     // If not 0, the format represents some sort of packed fringe format, whose
     // shader representation is given by the special_imgfmt_desc pointer.
@@ -285,6 +288,9 @@ struct ra_renderpass_params {
     enum ra_blend blend_src_alpha;
     enum ra_blend blend_dst_alpha;
 
+    // If true, the contents of `target` not written to will become undefined
+    bool invalidate_target;
+
     // --- type==RA_RENDERPASS_TYPE_COMPUTE only
 
     // Shader text, like vertex_shader/frag_shader.
8 changes: 2 additions & 6 deletions video/out/gpu/shader_cache.c
@@ -786,11 +786,6 @@ static void gl_sc_generate(struct gl_shader_cache *sc,
         ADD(header, "#define texture texture2D\n");
     }
 
-    if (sc->ra->glsl_vulkan && type == RA_RENDERPASS_TYPE_COMPUTE) {
-        ADD(header, "#define gl_GlobalInvocationIndex "
-                    "(gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID)\n");
-    }
-
     // Additional helpers.
     ADD(header, "#define LUT_POS(x, lut_size)"
                 " mix(0.5 / (lut_size), 1.0 - 0.5 / (lut_size), (x))\n");
@@ -974,13 +969,14 @@ static void gl_sc_generate(struct gl_shader_cache *sc,
 }
 
 struct mp_pass_perf gl_sc_dispatch_draw(struct gl_shader_cache *sc,
-                                        struct ra_tex *target,
+                                        struct ra_tex *target, bool discard,
                                         const struct ra_renderpass_input *vao,
                                         int vao_len, size_t vertex_stride,
                                         void *vertices, size_t num_vertices)
 {
     struct timer_pool *timer = NULL;
 
+    sc->params.invalidate_target = discard;
     gl_sc_generate(sc, RA_RENDERPASS_TYPE_RASTER, target->params.format,
                    vao, vao_len, vertex_stride);
     if (!sc->current_shader)
2 changes: 1 addition & 1 deletion video/out/gpu/shader_cache.h
@@ -50,7 +50,7 @@ void gl_sc_blend(struct gl_shader_cache *sc,
                  enum ra_blend blend_dst_alpha);
 void gl_sc_enable_extension(struct gl_shader_cache *sc, char *name);
 struct mp_pass_perf gl_sc_dispatch_draw(struct gl_shader_cache *sc,
-                                        struct ra_tex *target,
+                                        struct ra_tex *target, bool discard,
                                         const struct ra_renderpass_input *vao,
                                         int vao_len, size_t vertex_stride,
                                         void *ptr, size_t num);
27 changes: 19 additions & 8 deletions video/out/gpu/video.c
@@ -1134,7 +1134,7 @@ static void dispatch_compute(struct gl_video *p, int w, int h,
 }
 
 static struct mp_pass_perf render_pass_quad(struct gl_video *p,
-                                            struct ra_fbo fbo,
+                                            struct ra_fbo fbo, bool discard,
                                             const struct mp_rect *dst)
 {
     // The first element is reserved for `vec2 position`
@@ -1192,15 +1192,15 @@ static struct mp_pass_perf render_pass_quad(struct gl_video *p,
                        &p->tmp_vertex[num_vertex_attribs * 1],
                        vertex_stride);
 
-    return gl_sc_dispatch_draw(p->sc, fbo.tex, p->vao, num_vertex_attribs,
+    return gl_sc_dispatch_draw(p->sc, fbo.tex, discard, p->vao, num_vertex_attribs,
                                vertex_stride, p->tmp_vertex, num_vertices);
 }
 
 static void finish_pass_fbo(struct gl_video *p, struct ra_fbo fbo,
-                            const struct mp_rect *dst)
+                            bool discard, const struct mp_rect *dst)
 {
     pass_prepare_src_tex(p);
-    pass_record(p, render_pass_quad(p, fbo, dst));
+    pass_record(p, render_pass_quad(p, fbo, discard, dst));
     debug_check_gl(p, "after rendering");
     cleanup_binds(p);
 }
@@ -1218,6 +1218,11 @@ static void finish_pass_tex(struct gl_video *p, struct ra_tex **dst_tex,
         return;
     }
 
+    // If RA_CAP_PARALLEL_COMPUTE is set, try to prefer compute shaders
+    // over fragment shaders wherever possible.
+    if (!p->pass_compute.active && (p->ra->caps & RA_CAP_PARALLEL_COMPUTE))
+        pass_is_compute(p, 16, 16);
+
     if (p->pass_compute.active) {
         gl_sc_uniform_image2D_wo(p->sc, "out_image", *dst_tex);
         if (!p->pass_compute.directly_writes)
@@ -1229,7 +1234,7 @@
         debug_check_gl(p, "after dispatching compute shader");
     } else {
         struct ra_fbo fbo = { .tex = *dst_tex, };
-        finish_pass_fbo(p, fbo, &(struct mp_rect){0, 0, w, h});
+        finish_pass_fbo(p, fbo, true, &(struct mp_rect){0, 0, w, h});
     }
 }

@@ -2788,7 +2793,7 @@ static void pass_draw_to_screen(struct gl_video *p, struct ra_fbo fbo)
 
     pass_dither(p);
     pass_describe(p, "output to screen");
-    finish_pass_fbo(p, fbo, &p->dst_rect);
+    finish_pass_fbo(p, fbo, false, &p->dst_rect);
 }
 
 static bool update_surface(struct gl_video *p, struct mp_image *mpi,
@@ -3053,9 +3058,15 @@ void gl_video_render_frame(struct gl_video *p, struct vo_frame *frame,
     if (frame->num_vsyncs > 1 && frame->display_synced &&
         !p->dumb_mode && (p->ra->caps & RA_CAP_BLIT))
     {
+        // Attempt to use the same format as the destination FBO
+        // if possible. Some RAs use a wrapped dummy format here,
+        // so fall back to the fbo_format in that case.
+        const struct ra_format *fmt = fbo.tex->params.format;
+        if (fmt->dummy_format)
+            fmt = p->fbo_format;
         bool r = ra_tex_resize(p->ra, p->log, &p->output_tex,
                                fbo.tex->params.w, fbo.tex->params.h,
-                               p->fbo_format);
+                               fmt);
         if (r) {
             dest_fbo = (struct ra_fbo) { p->output_tex };
             p->output_tex_valid = true;
@@ -3194,7 +3205,7 @@ static void reinterleave_vdpau(struct gl_video *p,
     const struct ra_format *fmt = ra_find_unorm_format(p->ra, 1, comps);
     ra_tex_resize(p->ra, p->log, tex, w, h * 2, fmt);
     struct ra_fbo fbo = { *tex };
-    finish_pass_fbo(p, fbo, &(struct mp_rect){0, 0, w, h * 2});
+    finish_pass_fbo(p, fbo, true, &(struct mp_rect){0, 0, w, h * 2});
 
     output[n] = *tex;
 }
7 changes: 7 additions & 0 deletions video/out/opengl/ra_gl.c
@@ -283,6 +283,8 @@ static struct ra_tex *gl_tex_create(struct ra *ra,
                                     const struct ra_tex_params *params)
 {
     GL *gl = ra_gl_get(ra);
+    assert(!params->format->dummy_format);
+
     struct ra_tex *tex = gl_tex_create_blank(ra, params);
     if (!tex)
         return NULL;
@@ -382,6 +384,7 @@ static const struct ra_format fbo_dummy_format = {
         .flags = F_CR,
     },
     .renderable = true,
+    .dummy_format = true,
 };
 
 // Create a ra_tex that merely wraps an existing framebuffer. gl_fbo can be 0
@@ -996,6 +999,10 @@ static void gl_renderpass_run(struct ra *ra,
     assert(params->target->params.render_dst);
     assert(params->target->params.format == pass->params.target_format);
     gl->BindFramebuffer(GL_FRAMEBUFFER, target_gl->fbo);
+    if (pass->params.invalidate_target && gl->InvalidateFramebuffer) {
+        GLenum fb = target_gl->fbo ? GL_COLOR_ATTACHMENT0 : GL_COLOR;
+        gl->InvalidateFramebuffer(GL_FRAMEBUFFER, 1, &fb);
+    }
     gl->Viewport(params->viewport.x0, params->viewport.y0,
                  mp_rect_w(params->viewport),
                  mp_rect_h(params->viewport));
23 changes: 20 additions & 3 deletions video/out/vulkan/common.h
@@ -48,10 +48,27 @@ struct mpvk_ctx {
     VkSurfaceKHR surf;
     VkSurfaceFormatKHR surf_format; // picked at surface initialization time
 
-    struct vk_malloc *alloc;      // memory allocator for this device
-    struct vk_cmdpool *pool;      // primary command pool for this device
-    struct vk_cmd *last_cmd;      // most recently submitted command
+    struct vk_malloc *alloc;        // memory allocator for this device
+    struct spirv_compiler *spirv;   // GLSL -> SPIR-V compiler
+    struct vk_cmdpool **pools;      // command pools (one per queue family)
+    int num_pools;
+    struct vk_cmd *last_cmd;        // most recently submitted command
+
+    // Queued/pending commands. These are shared for the entire mpvk_ctx to
+    // ensure submission and callbacks are FIFO
+    struct vk_cmd **cmds_queued;    // recorded but not yet submitted
+    struct vk_cmd **cmds_pending;   // submitted but not completed
+    int num_cmds_queued;
+    int num_cmds_pending;
+
+    // Pointers into *pools
+    struct vk_cmdpool *pool_graphics; // required
+    struct vk_cmdpool *pool_compute;  // optional
+    struct vk_cmdpool *pool_transfer; // optional
+
+    // Common pool of signals, to avoid having to re-create these objects often
+    struct vk_signal **signals;
+    int num_signals;
 
     // Cached capabilities
     VkPhysicalDeviceLimits limits;