This repository has been archived by the owner on Sep 30, 2024. It is now read-only.
notro edited this page Sep 10, 2013 · 10 revisions

Frame rate is most often expressed in frames per second (FPS).
In the motion picture industry, where traditional film stock is used, the industry standard filming and projection formats are 24 frames per second. Historically, 25fps was used in some European countries.

FBTFT uses the term fps in two places:

  • As an argument to fbtft_device
    My earlier understanding was that this is the maximum rate at which the driver updates the display.
    Default is fps=20.
  • In debug output (DEBUG_TIME_FIRST_UPDATE and DEBUG_TIME_EACH_UPDATE)
    This generates a message in the kernel log like this:
    hy28afb spi0.0: Elapsed time for display update: 43.530004 ms (fps: 22, lines=240)
    FBTFT measures how long a display update takes and reports the reciprocal of that time as the fps value (1000 ms / 43.53 ms = 22.97, truncated to 22).
    It is really only fps if the whole display is updated, and not only a portion of it.

FBTFT is based on st7735fb.c by Matt Porter. Both use Framebuffer Deferred IO.
This means that when an application mmaps /dev/fb1 and starts writing to video memory, a page fault is generated and a display update is scheduled to happen after a preset delay. During this delay, the application continues writing to memory. When the display update happens, all pages in video memory that have changed are written to the display.

This scheme works well for slow displays that can't keep up with every change.

There is, however, a problem. The first time the application writes to video memory, a display update is scheduled to happen 1/20 second later. When that time arrives, the list of changed pages is locked for the duration of the display update, so new page faults have to wait. If the update takes 1/20 second, the next update happens 1/20 second after it has finished.

Thus we get: app's first write -> wait 1/20 s -> update display 1/20 s -> pending page faults come in -> wait 1/20 s -> update display ...

When showing a video, we will in reality get half the fps I was expecting, because the Deferred IO system is locked during the display update.

One solution to this problem could be to use a fixed display update schedule: every 1/20 second the driver checks for changes and writes them if necessary.

I have to ponder this some more.

Another issue is that above fps=20 there are only 4 distinct update rates: 25, 33, 50 and 100 fps (explained further down).

Verified: adding some timing code to FBTFT confirmed the above assumptions.

Some measurements

In the following tests, I have run a full screen movie with different fps values. The resolution of the movie matches the display, so no scaling is done.
I'm testing with the HY28A display:

sudo modprobe fbtft_device name=hy28afb rotate=3 fps=XX
sudo modprobe hy28afb

This is the info that mplayer displays:

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
MPlayer svn r34540 (Debian), built with gcc-4.6 (C) 2000-2012 MPlayer Team

Playing /home/pi/test.mpg.
libavformat version 53.21.1 (external)
Mismatching header version 53.19.0
MPEG-ES file format detected.
VIDEO:  MPEG1  320x240  (aspect 1)  25.000 fps  6553.2 kbps (819.1 kbyte/s)
Load subtitles in /home/pi/
Opening video filter: [scale w=320 h=-3]
==========================================================================
Opening video decoder: [ffmpeg] FFmpeg's libavcodec codec family
libavcodec version 53.35.0 (external)
Mismatching header version 53.32.2
Selected video codec: [ffmpeg1] vfm: ffmpeg (FFmpeg MPEG-1)
==========================================================================
Audio: no sound
Starting playback...
Unsupported PixelFormat 61
Unsupported PixelFormat 53
Unsupported PixelFormat 81
Movie-Aspect is 1.33:1 - prescaling to correct movie aspect.
[swscaler @ 0xb64f1640]No accelerated colorspace conversion found from yuv420p to rgb565le.
[swscaler @ 0xb64f1640]using unscaled yuv420p -> rgb565le special converter
VO: [fbdev2] 320x240 => 320x240 BGR 16-bit

I use time to find the point where mplayer can't keep up anymore: the real (wall-clock) time increases once playback falls behind.
I'm using top in a different SSH session to get a feel for the CPU usage. See the %Cpu id (idle) value.

fps=20

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
V:  xx.x xxxx/xxxx 25% 10%  0.0% 0 0
real    0m49.312s
user    0m20.400s
sys     0m9.140s

$ top
%Cpu(s): 50.5 us, 28.8 sy,  0.0 ni, 19.7 id,  0.0 wa,  0.0 hi,  1.0 si,  0.0 st

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 9905 pi        20   0 57912  14m 6360 S  60.0  3.3   0:05.25 mplayer
 9880 root      20   0     0    0    0 S   4.1  0.0   0:11.60 kworker/u:2
 9903 pi        20   0  4676 1412 1024 R   3.7  0.3   0:00.40 top
 9854 root      20   0     0    0    0 S   2.8  0.0   0:16.70 kworker/0:2
 9894 root      20   0     0    0    0 S   1.9  0.0   0:04.93 kworker/0:1
 8160 pi        20   0  9804 1712 1064 S   1.2  0.4   0:07.99 sshd
 1581 root      20   0  1744  504  420 S   0.6  0.1   1:39.72 ifplugd
 7902 pi        20   0  9804 1728 1068 S   0.6  0.4   0:06.69 sshd

fps=25

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
V:  xx.x xxxx/xxxx 42% 4%  0.0% 0 0
real    0m49.317s
user    0m21.930s
sys     0m10.890s

$ top
%Cpu(s): 50.5 us, 34.6 sy,  0.0 ni, 14.3 id,  0.0 wa,  0.0 hi,  0.7 si,  0.0 st
  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 9918 pi        20   0 57912  14m 6360 R  64.8  3.3   0:04.05 mplayer
 9919 root      20   0     0    0    0 S   2.5  0.0   0:00.08 kworker/0:0
 9810 root      20   0     0    0    0 S   2.2  0.0   0:19.38 kworker/u:1
 9917 pi        20   0  4676 1412 1024 R   2.2  0.3   0:00.25 top
 9854 root      20   0     0    0    0 S   1.6  0.0   0:17.89 kworker/0:2
   37 root      20   0     0    0    0 S   1.3  0.0   0:25.15 mmcqd/0
 8160 pi        20   0  9804 1712 1064 S   1.3  0.4   0:08.62 sshd
 9880 root      20   0     0    0    0 S   1.3  0.0   0:12.50 kworker/u:2
 9920 root      20   0     0    0    0 S   1.3  0.0   0:00.04 kworker/u:0

fps=30

mplayer still keeps up.

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
V:  xx.x xxxx/xxxx 30% 8%  0.0% 0 0
real    0m49.346s
user    0m24.060s
sys     0m14.130s

$ top
%Cpu(s): 65.0 us, 31.6 sy,  0.0 ni,  2.0 id,  0.0 wa,  0.0 hi,  1.3 si,  0.0 st

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 9928 pi        20   0 57912  14m 6360 R  78.1  3.3   0:09.46 mplayer
 9920 root      20   0     0    0    0 S   3.7  0.0   0:01.44 kworker/u:0
 9921 pi        20   0  4676 1412 1024 R   3.1  0.3   0:00.77 top
 9919 root      20   0     0    0    0 S   2.5  0.0   0:01.61 kworker/0:0
 9854 root      20   0     0    0    0 S   2.2  0.0   0:19.02 kworker/0:2
 9810 root      20   0     0    0    0 S   1.5  0.0   0:20.73 kworker/u:1
 8160 pi        20   0  9804 1712 1064 S   1.2  0.4   0:09.41 sshd
 7902 pi        20   0  9804 1728 1068 S   0.9  0.4   0:07.04 sshd

fps=50

mplayer starts to lag behind.

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
V:  xx.x xxxx/xxxx 80% 6%  0.0% 0 0
real    0m50.323s
user    0m33.110s
sys     0m8.440s

$ top
%Cpu(s): 73.3 us, 25.5 sy,  0.0 ni,  0.6 id,  0.0 wa,  0.0 hi,  0.6 si,  0.0 st

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 9943 pi        20   0 57912  14m 6360 R  82.3  3.3   0:08.37 mplayer
 9920 root      20   0     0    0    0 S   5.6  0.0   0:02.63 kworker/u:0
 9942 pi        20   0  4676 1412 1024 R   3.4  0.3   0:00.39 top
 9854 root      20   0     0    0    0 S   3.1  0.0   0:20.47 kworker/0:2
 9919 root      20   0     0    0    0 S   2.2  0.0   0:02.71 kworker/0:0
 8160 pi        20   0  9804 1712 1064 S   1.3  0.4   0:10.22 sshd
 1581 root      20   0  1744  504  420 S   0.9  0.1   1:40.81 ifplugd
 9810 root      20   0     0    0    0 S   0.6  0.0   0:22.24 kworker/u:1

fps=100

mplayer falls far behind.

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
V:  xx.x xxxx/xxxx 120% 14%  0.0% 0 0
real    1m20.483s
user    0m45.020s
sys     0m17.550s

$ top
%Cpu(s): 63.7 us, 35.2 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi,  0.9 si,  0.0 st

 9897 pi        20   0 57912  14m 6360 R  79.6  3.3   0:04.38 mplayer
 9896 pi        20   0  4676 1412 1024 R   5.8  0.3   0:01.09 top
 9894 root      20   0     0    0    0 S   4.4  0.0   0:02.60 kworker/0:1
 9810 root      20   0     0    0    0 S   4.1  0.0   0:15.94 kworker/u:1
 9880 root      20   0     0    0    0 S   2.9  0.0   0:08.94 kworker/u:2
 9854 root      20   0     0    0    0 S   2.0  0.0   0:14.51 kworker/0:2

fps=50 txbuflen=-1

txbuflen=-1 allocates a transmit buffer the same size as video memory (the default is 4096 bytes). This sends all the pixel data in a single SPI transfer instead of ~40 small ones.

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
real    0m49.333s
user    0m13.010s
sys     0m2.240s

$ top
%Cpu(s): 59.7 us, 36.4 sy,  0.0 ni,  0.6 id,  0.0 wa,  0.0 hi,  3.4 si,  0.0 st

fps=100 txbuflen=-1

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
real    1m10.753s
user    0m22.180s
sys     0m6.860s

$ top
%Cpu(s): 49.6 us, 46.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  3.5 si,  0.0 st

Conclusion

 fps     real        idle   txbuflen
-------------------------------------
  20     49.312s    19.7     4096
  25     49.317s    14.3     4096
  30     49.346s     2.0     4096
  50     50.323s     0.6     4096
 100   1m20.483s     0.3     4096
  50     49.333s     0.6       -1
 100   1m10.753s     0.0       -1

At fps=50 and fps=100, mplayer doesn't get the CPU time it needs to play the movie at the correct speed: the kernel spends too much CPU time pushing the changed video memory to the display.

Currently I don't know whether the culprit is the FBTFT driver, which uses too much CPU keeping track of changed pages and copying and byte-swapping the changed pixels for transfer, and/or the SPI controller driver, which has to handle an interrupt every 12 bytes to refill the transmit FIFO.

Update: Increasing the transmit buffer size gives a small performance boost (at fps=100 the real time drops from 1m20.483s to 1m10.753s).

Testing with DMA support

These results were measured after adding experimental DMA support to FBTFT and switching to a DMA-capable SPI master driver.

fps=30

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
real    0m49.311s
user    0m12.440s
sys     0m0.250s

$ top
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
%Cpu(s): 27.2 us,  7.8 sy,  0.0 ni, 64.6 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st
KiB Mem:    448672 total,   144736 used,   303936 free,    17044 buffers
KiB Swap:   102396 total,        0 used,   102396 free,    89444 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2756 pi        20   0 57912  14m 6360 R  26.6  3.3   0:03.65 mplayer
 2628 root      20   0     0    0    0 S   3.6  0.0   0:00.28 kworker/0:1
 2403 root      20   0     0    0    0 S   2.9  0.0   0:00.91 spi0

fps=50

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
real    0m49.297s
user    0m12.730s
sys     0m0.250s

$ top
Tasks:  70 total,   2 running,  68 sleeping,   0 stopped,   0 zombie
%Cpu(s): 26.8 us, 10.2 sy,  0.0 ni, 63.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    448672 total,   145824 used,   302848 free,    17092 buffers
KiB Swap:   102396 total,        0 used,   102396 free,    90436 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2763 pi        20   0 57912  14m 6360 R  26.3  3.3   0:02.85 mplayer
 2628 root      20   0     0    0    0 S   4.9  0.0   0:01.42 kworker/0:1
 2403 root      20   0     0    0    0 S   3.6  0.0   0:02.23 spi0

fps=100

$ time mplayer -nolirc -vo fbdev2:/dev/fb1 -vf scale=320:-3 /home/pi/test.mpg
real    1m1.249s
user    0m13.290s
sys     0m0.300s

$ top
Tasks:  70 total,   1 running,  69 sleeping,   0 stopped,   0 zombie
%Cpu(s): 21.7 us, 12.7 sy,  0.0 ni, 65.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:    448672 total,   145856 used,   302816 free,    17116 buffers
KiB Swap:   102396 total,        0 used,   102396 free,    90444 cached

  PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
 2770 pi        20   0 57912  14m 6360 D  20.7  3.3   0:02.38 mplayer
 2771 root      20   0     0    0    0 D   6.1  0.0   0:00.37 kworker/0:0
 2403 root      20   0     0    0    0 D   4.2  0.0   0:03.89 spi0

DMA Conclusion

 fps     real        idle   txbuflen
-------------------------------------
  30     49.311s      64     4096
  50     49.297s      63     4096
 100    1m1.249s      65     4096

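CPU usage is much lower (about 64% idle), but at fps=100 playback still falls behind (1m1.249s), so there is still a bottleneck somewhere.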

To be continued...

Detailed walkthrough

This is a detailed look at the update_display codepath in FBTFT.

jiffies

The kernel keeps track of the flow of time by means of timer interrupts. Every time a timer interrupt occurs, the value of an internal kernel counter, jiffies, is incremented.

The macro HZ defines the number of jiffies in one second:

File: /arch/arm/include/asm/param.h

# define HZ		CONFIG_HZ	/* Internal kernel timer frequency */
File: .config
# default in Raspbian
CONFIG_HZ=100

HZ defines the resolution of fps:

	fbdefio->delay =           HZ/fps;

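Thus above fps=20 there are only 4 distinct delay values (integer division): fps=21..25 gives 100/fps=4, fps=26..33 gives 3, fps=34..50 gives 2 and fps=51..100 gives 1 jiffy. That is an update at most every 40, 30, 20 or 10 ms (25, 33, 50 or 100 fps).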
FBTFT writes the "real" fps to the kernel log when the framebuffer is registered.

Data structures

File: include/linux/fb.h

struct fb_info {
...
#ifdef CONFIG_FB_DEFERRED_IO
	struct delayed_work deferred_work;
	struct fb_deferred_io *fbdefio;
#endif
...
};

#ifdef CONFIG_FB_DEFERRED_IO
struct fb_deferred_io {
	/* delay between mkwrite and deferred handler */
	unsigned long delay;
	struct mutex lock; /* mutex that protects the page list */
	struct list_head pagelist; /* list of touched pages */
	/* callback */
	void (*first_io)(struct fb_info *info);
	void (*deferred_io)(struct fb_info *info, struct list_head *pagelist);
};
#endif
File: include/linux/workqueue.h

struct delayed_work {
        struct work_struct work;
        struct timer_list timer;

        /* target workqueue and CPU ->timer uses to queue ->work */
        struct workqueue_struct *wq;
        int cpu;
};

Initialization codepath

Driver calls fbtft_framebuffer_alloc()

File: adafruit22fb.c
static int adafruit22fb_probe(struct spi_device *spi)
...
	info = fbtft_framebuffer_alloc(&adafruit22_display, &spi->dev);
File: fbtft-core.c
struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display, struct device *dev)
...
	unsigned fps = display->fps;
...
	/* defaults */
	if (!fps)
		fps = 20;
...
	/* platform_data override ? */
	if (pdata) {
		if (pdata->fps)
			fps = pdata->fps;
...
	fbdefio = kzalloc(sizeof(struct fb_deferred_io), GFP_KERNEL);
...
	info->fbdefio = fbdefio;
...
	fbdefio->delay =           HZ/fps;
	fbdefio->deferred_io =     fbtft_deferred_io;
	fb_deferred_io_init(info);
File: drivers/video/fb_defio.c
void fb_deferred_io_init(struct fb_info *info)
{
	struct fb_deferred_io *fbdefio = info->fbdefio;

	BUG_ON(!fbdefio);
	mutex_init(&fbdefio->lock);
	info->fbops->fb_mmap = fb_deferred_io_mmap;
	INIT_DELAYED_WORK(&info->deferred_work, fb_deferred_io_work);
	INIT_LIST_HEAD(&fbdefio->pagelist);
	if (fbdefio->delay == 0) /* set a default of 1 s */
		fbdefio->delay = HZ;
}
File: include/linux/workqueue.h
// INIT_DELAYED_WORK(&info->deferred_work, fb_deferred_io_work)
#define INIT_DELAYED_WORK(_work, _func)					\
	__INIT_DELAYED_WORK(_work, _func, 0)
// __INIT_DELAYED_WORK(&info->deferred_work, fb_deferred_io_work, 0)
#define __INIT_DELAYED_WORK(_work, _func, _tflags)			\
	do {								\
		INIT_WORK(&(_work)->work, (_func));			\
		__setup_timer(&(_work)->timer, delayed_work_timer_fn,	\
			      (unsigned long)(_work),			\
			      (_tflags) | TIMER_IRQSAFE);		\
	} while (0)
// INIT_WORK(&info->deferred_work.work, fb_deferred_io_work);
#define INIT_WORK(_work, _func)						\
	do {								\
		__INIT_WORK((_work), (_func), 0);			\
	} while (0)
// __INIT_WORK(&info->deferred_work.work, fb_deferred_io_work, 0);
#define __INIT_WORK(_work, _func, _onstack)				\
	do {								\
		__init_work((_work), _onstack);				\
		(_work)->data = (atomic_long_t) WORK_DATA_INIT();	\
		INIT_LIST_HEAD(&(_work)->entry);			\
		PREPARE_WORK((_work), (_func));				\
	} while (0)
File: kernel/workqueue.c

// __init_work(&info->deferred_work.work, 0);
void __init_work(struct work_struct *work, int onstack)
{
	if (onstack)
		debug_object_init_on_stack(work, &work_debug_descr);
	else
		debug_object_init(work, &work_debug_descr);
}
// PREPARE_WORK(&info->deferred_work.work, fb_deferred_io_work);
/*
 * initialize a work item's function pointer
 */
#define PREPARE_WORK(_work, _func)					\
	do {								\
		(_work)->func = (_func);				\
	} while (0)
File: include/linux/timer.h

// __setup_timer(&info->deferred_work.timer, delayed_work_timer_fn, (unsigned long)(&info->deferred_work), TIMER_IRQSAFE);
#define __setup_timer(_timer, _fn, _data, _flags)			\
	do {								\
		__init_timer((_timer), (_flags));			\
		(_timer)->function = (_fn);				\
		(_timer)->data = (_data);				\
	} while (0)

Display update codepath

Action: Application mmap's /dev/fb1

File: drivers/video/fb_defio.c

static const struct vm_operations_struct fb_deferred_io_vm_ops = {
	.fault		= fb_deferred_io_fault,
	.page_mkwrite	= fb_deferred_io_mkwrite,
};

static int fb_deferred_io_mmap(struct fb_info *info, struct vm_area_struct *vma)
{
	vma->vm_ops = &fb_deferred_io_vm_ops;
	vma->vm_flags |= VM_DONTEXPAND | VM_DONTDUMP;
	if (!(info->flags & FBINFO_VIRTFB))
		vma->vm_flags |= VM_IO;
	vma->vm_private_data = info;
	return 0;
}

Action: Application writes to video memory

/* vm_ops->page_mkwrite handler */
static int fb_deferred_io_mkwrite(struct vm_area_struct *vma,
				  struct vm_fault *vmf)
{
	struct page *page = vmf->page;
	struct fb_info *info = vma->vm_private_data;
	struct fb_deferred_io *fbdefio = info->fbdefio;
	struct page *cur;

	/* this is a callback we get when userspace first tries to
	write to the page. we schedule a workqueue. that workqueue
	will eventually mkclean the touched pages and execute the
	deferred framebuffer IO. then if userspace touches a page
	again, we repeat the same scheme */

	file_update_time(vma->vm_file);

	/* protect against the workqueue changing the page list */
	mutex_lock(&fbdefio->lock);

	/* first write in this cycle, notify the driver */
	if (fbdefio->first_io && list_empty(&fbdefio->pagelist))
		fbdefio->first_io(info);

	/*
	 * We want the page to remain locked from ->page_mkwrite until
	 * the PTE is marked dirty to avoid page_mkclean() being called
	 * before the PTE is updated, which would leave the page ignored
	 * by defio.
	 * Do this by locking the page here and informing the caller
	 * about it with VM_FAULT_LOCKED.
	 */
	lock_page(page);

	/* we loop through the pagelist before adding in order
	to keep the pagelist sorted */
	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
		/* this check is to catch the case where a new
		process could start writing to the same page
		through a new pte. this new access can cause the
		mkwrite even when the original ps's pte is marked
		writable */
		if (unlikely(cur == page))
			goto page_already_added;
		else if (cur->index > page->index)
			break;
	}

	list_add_tail(&page->lru, &cur->lru);

page_already_added:
	mutex_unlock(&fbdefio->lock);

	/* come back after delay to process the deferred IO */
	schedule_delayed_work(&info->deferred_work, fbdefio->delay);
	return VM_FAULT_LOCKED;
}
File: include/linux/workqueue.h

// schedule_delayed_work(&info->deferred_work, fbdefio->delay);
/**
 * schedule_delayed_work - put work task in global workqueue after delay
 * @dwork: job to be done
 * @delay: number of jiffies to wait or 0 for immediate execution
 *
 * After waiting for a given time this puts a job in the kernel-global
 * workqueue.
 */
static inline bool schedule_delayed_work(struct delayed_work *dwork,
					 unsigned long delay)
{
	return queue_delayed_work(system_wq, dwork, delay);
}
// queue_delayed_work(system_wq, &info->deferred_work, fbdefio->delay);
/**
 * queue_delayed_work - queue work on a workqueue after delay
 * @wq: workqueue to use
 * @dwork: delayable work to queue
 * @delay: number of jiffies to wait before queueing
 *
 * Equivalent to queue_delayed_work_on() but tries to use the local CPU.
 */
static inline bool queue_delayed_work(struct workqueue_struct *wq,
				      struct delayed_work *dwork,
				      unsigned long delay)
{
	return queue_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork, delay);
}
File: kernel/workqueue.c

// queue_delayed_work_on(WORK_CPU_UNBOUND, system_wq, &info->deferred_work, fbdefio->delay);
/**
 * queue_delayed_work_on - queue work on specific CPU after delay
 * @cpu: CPU number to execute work on
 * @wq: workqueue to use
 * @dwork: work to queue
 * @delay: number of jiffies to wait before queueing
 *
 * Returns %false if @work was already on a queue, %true otherwise.  If
 * @delay is zero and @dwork is idle, it will be scheduled for immediate
 * execution.
 */
bool queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
			   struct delayed_work *dwork, unsigned long delay)
{
	struct work_struct *work = &dwork->work;
	bool ret = false;
	unsigned long flags;

	/* read the comment in __queue_work() */
	local_irq_save(flags);

	if (!test_and_set_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))) {
		__queue_delayed_work(cpu, wq, dwork, delay);
		ret = true;
	}

	local_irq_restore(flags);
	return ret;
}
// __queue_delayed_work(WORK_CPU_UNBOUND, system_wq, &info->deferred_work, fbdefio->delay);
static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
				struct delayed_work *dwork, unsigned long delay)
{
	struct timer_list *timer = &dwork->timer;
	struct work_struct *work = &dwork->work;

	WARN_ON_ONCE(timer->function != delayed_work_timer_fn ||
		     timer->data != (unsigned long)dwork);
	WARN_ON_ONCE(timer_pending(timer));
	WARN_ON_ONCE(!list_empty(&work->entry));

	/*
	 * If @delay is 0, queue @dwork->work immediately.  This is for
	 * both optimization and correctness.  The earliest @timer can
	 * expire is on the closest next tick and delayed_work users depend
	 * on that there's no such delay when @delay is 0.
	 */
	if (!delay) {
		__queue_work(cpu, wq, &dwork->work);
		return;
	}

	timer_stats_timer_set_start_info(&dwork->timer);

	dwork->wq = wq;
	dwork->cpu = cpu;
	timer->expires = jiffies + delay;

	if (unlikely(cpu != WORK_CPU_UNBOUND))
		add_timer_on(timer, cpu);
	else
		add_timer(timer);
}
File: kernel/timer.c

// add_timer(&info->deferred_work.timer);
/**
 * add_timer - start a timer
 * @timer: the timer to be added
 *
 * The kernel will do a ->function(->data) callback from the
 * timer interrupt at the ->expires point in the future. The
 * current time is 'jiffies'.
 *
 * The timer's ->expires, ->function (and if the handler uses it, ->data)
 * fields must be set prior calling this function.
 *
 * Timers with an ->expires field in the past will be executed in the next
 * timer tick.
 */
void add_timer(struct timer_list *timer)
{
	BUG_ON(timer_pending(timer));
	mod_timer(timer, timer->expires);
}

When the timer fires

File: kernel/workqueue.c

void delayed_work_timer_fn(unsigned long __data)
{
        struct delayed_work *dwork = (struct delayed_work *)__data;

        /* should have been called from irqsafe timer with irq already off */
        __queue_work(dwork->cpu, dwork->wq, &dwork->work);
}
File: drivers/video/fb_defio.c

/* workqueue callback */
static void fb_deferred_io_work(struct work_struct *work)
{
	struct fb_info *info = container_of(work, struct fb_info,
						deferred_work.work);
	struct list_head *node, *next;
	struct page *cur;
	struct fb_deferred_io *fbdefio = info->fbdefio;

	/* here we mkclean the pages, then do all deferred IO */
	mutex_lock(&fbdefio->lock);
	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
		lock_page(cur);
		page_mkclean(cur);
		unlock_page(cur);
	}

	/* driver's callback with pagelist */
	fbdefio->deferred_io(info, &fbdefio->pagelist);

	/* clear the list */
	list_for_each_safe(node, next, &fbdefio->pagelist) {
		list_del(node);
	}
	mutex_unlock(&fbdefio->lock);
}
File: fbtft-core.c

void fbtft_deferred_io(struct fb_info *info, struct list_head *pagelist)
{
	struct fbtft_par *par = info->par;
	struct page *page;
	unsigned long index;
	unsigned y_low=0, y_high=0;
	int count = 0;

	/* debug can be changed via sysfs */
	fbtft_debug_sync_value(par);

	/* Mark display lines as dirty */
	list_for_each_entry(page, pagelist, lru) {
		count++;
		index = page->index << PAGE_SHIFT;
		y_low = index / info->fix.line_length;
		y_high = (index + PAGE_SIZE - 1) / info->fix.line_length;
		fbtft_dev_dbg(DEBUG_DEFERRED_IO, par, info->device, "page->index=%lu y_low=%d y_high=%d\n", page->index, y_low, y_high);
		if (y_high > info->var.yres - 1)
			y_high = info->var.yres - 1;
		if (y_low < par->dirty_lines_start)
			par->dirty_lines_start = y_low;
		if (y_high > par->dirty_lines_end)
			par->dirty_lines_end = y_high;
	}

	par->fbtftops.update_display(info->par);
}
