u-boot: eMMC transfer speed significantly slower than stock u-boot on Tegra186 #42

Closed
derekstraka opened this issue Aug 16, 2017 · 11 comments

Comments

@derekstraka
Contributor

When loading a kernel from the eMMC device, the stock u-boot is able to copy data at a rate of ~36.2 MiB/s while the yocto u-boot only transfers at ~314.5 KiB/s.

Stock u-boot:

Tegra186 (P2771-0000-500) # version
U-Boot 2016.07-g0ce7ca2 (Jul 20 2017 - 00:45:25 -0700)
aarch64-unknown-linux-gnu-gcc (GCC) 4.8.5
GNU ld (GNU Binutils) 2.24

Tegra186 (P2771-0000-500) # mmc info
Device: Tegra SD/MMC
Manufacturer ID: 11
OEM: 100
Name: 032G3 
Tran Speed: 52000000
Rd Block Len: 512
MMC version 5.1
High Capacity: Yes
Capacity: 29.1 GiB
Bus Width: 8-bit
Erase Group Size: 512 KiB
HC WP Group Size: 4 MiB
User Capacity: 29.1 GiB WRREL
Boot Capacity: 4 MiB ENH
RPMB Capacity: 4 MiB ENH

Retrieving file: /boot/Image
20280368 bytes read in 534 ms (36.2 MiB/s)

Built u-boot:

Tegra186 (P2771-0000-500) # version
U-Boot 2016.07 (Aug 16 2017 - 13:55:00 +0000)
aarch64-poky-linux-gcc (GCC) 6.3.0
GNU ld (GNU Binutils) 2.28.0.20170307

Retrieving file: /boot/Image-4.4.38-l4t-r28.1+gebda89f
10563776 bytes read in 32753 ms (314.5 KiB/s)

Tegra186 (P2771-0000-500) # mmc info
Device: Tegra SD/MMC
Manufacturer ID: 11
OEM: 100
Name: 032G3 
Tran Speed: 52000000
Rd Block Len: 512
MMC version 5.1
High Capacity: Yes
Capacity: 29.1 GiB
Bus Width: 8-bit
Erase Group Size: 512 KiB
HC WP Group Size: 4 MiB
User Capacity: 29.1 GiB WRREL
Boot Capacity: 4 MiB ENH
RPMB Capacity: 4 MiB ENH
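
The two "bytes read in ... ms" lines above can be checked by hand. The following is a minimal sketch, assuming the rate is derived by integer-dividing bytes by milliseconds and then scaling to seconds; that rounding order is an assumption, chosen because it reproduces both logged figures. The function names are illustrative and are not taken from U-Boot.

```python
def rate_bytes_per_second(bytes_read: int, elapsed_ms: int) -> int:
    """Integer-divide first, then scale to seconds (assumed rounding order)."""
    return (bytes_read // elapsed_ms) * 1000


def human(rate: int) -> str:
    """Format a byte rate with binary units and one decimal place."""
    for unit, scale in (("MiB/s", 1 << 20), ("KiB/s", 1 << 10)):
        if rate >= scale:
            return f"{rate / scale:.1f} {unit}"
    return f"{rate} B/s"


# Figures copied from the two console logs above.
stock = rate_bytes_per_second(20280368, 534)
yocto = rate_bytes_per_second(10563776, 32753)

print("stock:", human(stock))
print("yocto:", human(yocto))
print("slowdown factor: about", round(stock / yocto))
```

Note that the two kernel images differ in size, so the ratio compares rates, not total load times.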
@derekstraka
Contributor Author

I'm guessing the clock rate isn't being set correctly on the yocto version of the bootloader.

@derekstraka
Contributor Author

Debug logging in the tegra mmc driver indicates the clock is being adjusted prior to the kernel load. I also verified that the data bus width is set to 8.

MMC: no card present
 mmc_core_init called
mmc_set_power: power = 15
mmc_set_power: pwr = E
host version = 804
 mmc_change_clock called 48000000
div = 1
mmc_change_clock: clkcon = 00000005
 mmc_change_clock called 375000
div = 9
mmc_change_clock: clkcon = 00000405
timeout: 00018001 cmd 8
timeout: 00018001 cmd 55
 mmc_change_clock called 375000
div = 9
mmc_change_clock: clkcon = 00000405
 mmc_change_clock called 48000000
div = 1
mmc_change_clock: clkcon = 00000005
switch to partitions #0, OK
mmc0(part 0) is current device
Scanning mmc 0:1...
Found /boot/extlinux/extlinux.conf
Retrieving file: /boot/extlinux/extlinux.conf
291 bytes read in 116 ms (2 KiB/s)
Boot Options
1:	primary Image-4.4.38-l4t-r28.1+gebda89f
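
The clkcon values in the log above can be decoded as a sanity check. This is a rough sketch that assumes the register follows the standard SD Host Controller Clock Control layout (bit 0 internal clock enable, bit 1 internal clock stable, bit 2 SD clock enable, bits 15:8 SDCLK frequency select). The relation between the printed div and the frequency-select field (div shifted right by one) is inferred from the two samples in the log only, not from the driver source.

```python
def decode_clkcon(clkcon: int) -> dict:
    """Split a clock control value into fields (standard SDHCI layout assumed)."""
    return {
        "internal_clock_enable": bool(clkcon & 0x1),
        "internal_clock_stable": bool(clkcon & 0x2),
        "sd_clock_enable": bool(clkcon & 0x4),
        "sdclk_freq_select": (clkcon >> 8) & 0xFF,
    }


# (requested Hz, printed div, printed clkcon) copied from the debug log.
samples = [(48_000_000, 1, 0x00000005), (375_000, 9, 0x00000405)]

for requested_hz, div, clkcon in samples:
    fields = decode_clkcon(clkcon)
    # Inferred from the log only: the field equals div >> 1 in both samples.
    assert fields["sdclk_freq_select"] == div >> 1
    print(requested_hz, hex(clkcon), fields)
```

Under these assumptions, the last clock change before the file load leaves the SD clock enabled with a frequency-select value of 0, which is consistent with the clock not being stuck at the 375 kHz identification rate.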

@madisongh
Member

I've noticed this with R28.1-based builds, too. I just haven't had a chance to track it down yet.

@derekstraka
Contributor Author

Oddly, this seems to be related to the root filesystem layout rather than to u-boot itself. I was able to swap in the u-boot built with yocto and still get the high-speed load of the kernel using the stock Ubuntu rootfs. Conversely, I get the slow transfer when using the R28.1 bootloader with a yocto rootfs.

@derekstraka
Contributor Author

Stock R28.1

nvidia@tegra-ubuntu:~$ sudo tune2fs -l /dev/mmcblk0p1
tune2fs 1.42.13 (17-May-2015)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          8c3ddc87-e1a5-4679-9fbb-645970b2ca50
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              1835008
Block count:              7340032
Reserved block count:     367001
Free blocks:              6401151
Free inodes:              1656322
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      1022
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         8192
Inode blocks per group:   512
Flex block group size:    16
Filesystem created:       Thu Aug 17 20:30:26 2017
Last mount time:          Thu Aug 17 21:25:08 2017
Last write time:          Thu Aug 17 21:25:08 2017
Mount count:              3
Maximum mount count:      -1
Last checked:             Thu Aug 17 20:30:26 2017
Check interval:           0 (<none>)
Lifetime writes:          3534 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     28
Desired extra isize:      28
Journal inode:            8
First orphan inode:       1441948
Default directory hash:   half_md4
Directory Hash Seed:      217dab92-d115-45f9-9841-8d33d04eab50
Journal backup:           inode blocks

Yocto

/var/volatile/tmp # ./tune2fs -l /dev/mmcblk0p1
tune2fs 1.42.13 (17-May-2015)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          64c2f61f-eac9-4bcb-93fe-b52c7a42337b
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              16384
Block count:              65536
Reserved block count:     3276
Free blocks:              31051
Free inodes:              15741
First block:              1
Block size:               1024
Fragment size:            1024
Reserved GDT blocks:      255
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         2048
Inode blocks per group:   256
Flex block group size:    16
Filesystem created:       Thu Aug 17 14:56:13 2017
Last mount time:          Thu Aug 17 21:31:27 2017
Last write time:          Thu Aug 17 21:31:27 2017
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Aug 17 14:56:14 2017
Check interval:           0 (<none>)
Lifetime writes:          27 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          128
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      5828484e-e4ec-4e07-ba46-81f5f98fe350
Journal backup:           inode blocks
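
Comparing two tune2fs dumps by eye is error-prone, so here is a small sketch of a field-by-field diff. The helper names are illustrative, and the two sample strings contain only a handful of fields copied from the dumps above; in practice the full `tune2fs -l` output would be pasted in.

```python
def parse_tune2fs(text: str) -> dict:
    """Turn 'Key:   value' lines into a dict; lines without a colon are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def diff_fields(a: dict, b: dict) -> dict:
    """Return {key: (a_value, b_value)} for keys present in both with different values."""
    return {k: (a[k], b[k]) for k in a if k in b and a[k] != b[k]}


STOCK = """\
tune2fs 1.42.13 (17-May-2015)
First block:              0
Block size:               4096
Blocks per group:         32768
Inode size:               256
Default directory hash:   half_md4
"""

YOCTO = """\
tune2fs 1.42.13 (17-May-2015)
First block:              1
Block size:               1024
Blocks per group:         8192
Inode size:               128
Default directory hash:   half_md4
"""

for key, (stock_value, yocto_value) in diff_fields(
    parse_tune2fs(STOCK), parse_tune2fs(YOCTO)
).items():
    print(f"{key}: stock={stock_value} yocto={yocto_value}")
```

On the excerpt above this highlights the block size (4096 vs 1024) and inode size (256 vs 128) as the layout differences, while identical fields such as the directory hash are dropped.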

@derekstraka
Contributor Author

Adding EXTRA_IMAGECMD_ext4 += "-b 4096 -I 256" to classes/image_types_tegra.bbclass helped somewhat: the read speed increased from 314.5 KiB/s to 1.2 MiB/s. The resulting filesystem parameters were as follows:

/var/volatile/tmp # ./tune2fs -l /dev/mmcblk0p1
tune2fs 1.42.13 (17-May-2015)
Filesystem volume name:   <none>
Last mounted on:          /
Filesystem UUID:          57ea9db7-f841-4cb2-9318-2e1c1af3bab1
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
Filesystem flags:         signed_directory_hash 
Default mount options:    user_xattr acl
Filesystem state:         clean
Errors behavior:          Continue
Filesystem OS type:       Linux
Inode count:              16384
Block count:              16384
Reserved block count:     819
Free blocks:              7250
Free inodes:              15741
First block:              0
Block size:               4096
Fragment size:            4096
Reserved GDT blocks:      3
Blocks per group:         32768
Fragments per group:      32768
Inodes per group:         16384
Inode blocks per group:   1024
Flex block group size:    16
Filesystem created:       Thu Aug 17 21:52:35 2017
Last mount time:          Thu Aug 17 21:58:12 2017
Last write time:          Thu Aug 17 21:58:12 2017
Mount count:              1
Maximum mount count:      -1
Last checked:             Thu Aug 17 21:52:36 2017
Check interval:           0 (<none>)
Lifetime writes:          29 MB
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:	          256
Required extra isize:     32
Desired extra isize:      32
Journal inode:            8
Default directory hash:   half_md4
Directory Hash Seed:      45cccbfb-c070-4c5b-b390-36ce45b6b919
Journal backup:           inode blocks

@madisongh
Member

If I change image_types_tegra.bbclass to use ext3 instead of ext4, the kernel loads at 16.6MiB/sec.

@derekstraka
Contributor Author

Confirmed that ext3 loads fast. I'm good with using ext3 for my work, so I'm going to make a PR that allows me to override the filesystem type used by image_types_tegra.bbclass. By default, I'll leave it at ext4.

madisongh pushed a commit that referenced this issue Aug 18, 2017
The usage of ext4 with a yocto image can cause slow loading of the kernel image from
u-boot (See #42).  As a workaround, an ext3 filesystem can be used, so allow
integrators to override the default ext4 filesystem.

(PR #44)

Signed-off-by: Derek Straka <derek@asterius.io>
Signed-off-by: Matt Madison <matt@madison.systems>
madisongh added a commit that referenced this issue Aug 19, 2017
instead of bundling the initramfs with the kernel,
with a new variable, TEGRA_INITRAMFS_INITRD.

(Related to #42)

Signed-off-by: Matt Madison <matt@madison.systems>
madisongh added a commit that referenced this issue Aug 19, 2017
to reduce the size of the kernel image.

(Related to #42)

Signed-off-by: Matt Madison <matt@madison.systems>
madisongh added a commit that referenced this issue Aug 19, 2017
for some ext4-related patches.

(for #42)

Signed-off-by: Matt Madison <matt@madison.systems>
@madisongh
Member

The root cause of this issue is that U-Boot's ext4fs support does not handle extents very well. When a file gets large enough, extent index blocks will get created for it, and that leads to exercising a very slow code path.
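
To make "large enough" concrete, here is a back-of-the-envelope sketch based on the ext4 on-disk extent format: a 12-byte header plus 12-byte entries, with 60 bytes available inside the inode, so at most four extents fit inline before a separate tree block is needed. The fragment counts passed to `needs_external_tree` are hypothetical, since the thread does not report how many extents the kernel image actually had.

```python
EXTENT_HEADER_BYTES = 12   # ext4 extent header size
EXTENT_ENTRY_BYTES = 12    # size of one extent or index entry
INODE_I_BLOCK_BYTES = 60   # space inside the inode for the tree root


def inline_extent_slots() -> int:
    """Entries that fit in the inode itself, after the header."""
    return (INODE_I_BLOCK_BYTES - EXTENT_HEADER_BYTES) // EXTENT_ENTRY_BYTES


def entries_per_tree_block(block_size: int) -> int:
    """Entries that fit in one on-disk extent tree block."""
    return (block_size - EXTENT_HEADER_BYTES) // EXTENT_ENTRY_BYTES


def blocks_needed(file_bytes: int, block_size: int) -> int:
    """Filesystem blocks occupied by the file data (rounded up)."""
    return -(-file_bytes // block_size)


def needs_external_tree(extent_count: int) -> bool:
    """True once the extents no longer fit inside the inode."""
    return extent_count > inline_extent_slots()


KERNEL_BYTES = 10563776  # Image size from the slow log in this issue

for block_size in (1024, 4096):
    print(
        block_size,
        "data blocks:", blocks_needed(KERNEL_BYTES, block_size),
        "entries per tree block:", entries_per_tree_block(block_size),
    )

# Hypothetical fragment counts, for illustration only.
print(needs_external_tree(4), needs_external_tree(5))
```

A larger block size means fewer data blocks for the same file and more entries per tree block, which may be part of why the `-b 4096` change helped without removing the slow path.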

I added some changes to the kernel recipe to make it easier to split the initramfs out of the kernel and into a separate initrd file, to help reduce the size of the Image file. I also changed the default ext4 options to increase the block size. While these would help in some cases (depending on kernel configuration and such) they were really workarounds.

So I've patched U-Boot to cache extent index blocks while reading a file. It's not the most elegant way to deal with the issue, but it's minimally intrusive.

With the patch in place I'm seeing 30MiB/sec+ transfers for the kernel, instead of 1MiB/sec.
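
The effect of caching extent tree blocks can be illustrated with a toy model. This is not the actual U-Boot patch, just a sketch under stated assumptions: a depth-1 tree, a made-up fragmented file of 3 leaf blocks with 84 extents each, and a reader that re-fetches the leaf block from the device for every logical block unless a cache is supplied. All class and function names are hypothetical.

```python
from bisect import bisect_right


class CountingDevice:
    """Stands in for the eMMC: counts how many metadata blocks are fetched."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.reads = 0

    def read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]


def build_file(num_leaves, extents_per_leaf, blocks_per_extent):
    """Build a toy depth-1 extent tree; all sizes are made-up inputs."""
    leaf_starts, leaf_blocks, leaves = [], [], {}
    logical, physical = 0, 10_000
    for leaf_no in range(num_leaves):
        leaf_starts.append(logical)
        leaf_blocks.append(leaf_no)
        extents = []
        for _ in range(extents_per_leaf):
            extents.append((logical, blocks_per_extent, physical))
            logical += blocks_per_extent
            physical += blocks_per_extent + 1  # one-block gap: fragmented file
        leaves[leaf_no] = extents
    return leaf_starts, leaf_blocks, leaves, logical


def map_block(leaf_starts, leaf_blocks, dev, logical, cache=None):
    """Translate a logical block to a physical one, optionally caching leaves."""
    leaf_no = leaf_blocks[bisect_right(leaf_starts, logical) - 1]
    if cache is not None and leaf_no in cache:
        extents = cache[leaf_no]
    else:
        extents = dev.read(leaf_no)
        if cache is not None:
            cache[leaf_no] = extents
    for first, length, phys in extents:
        if first <= logical < first + length:
            return phys + (logical - first)
    raise ValueError(f"logical block {logical} is not mapped")


def read_whole_file(use_cache):
    """Map every logical block of the toy file and count metadata reads."""
    starts, blocks, leaves, total = build_file(3, 84, 4)
    dev = CountingDevice(leaves)
    cache = {} if use_cache else None
    mapping = [map_block(starts, blocks, dev, n, cache) for n in range(total)]
    return mapping, dev.reads


uncached_map, uncached_reads = read_whole_file(use_cache=False)
cached_map, cached_reads = read_whole_file(use_cache=True)
print("metadata reads without cache:", uncached_reads)
print("metadata reads with cache:", cached_reads)
print("same mapping:", uncached_map == cached_map)
```

In this model the uncached reader fetches a metadata block once per data block, while the cached reader fetches each leaf only once. The real code path and its costs may differ, but the shape of the saving is the same.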

madisongh pushed a commit that referenced this issue Aug 27, 2017
The usage of ext4 with a yocto image can cause slow loading of the kernel image from
u-boot (See #42).  As a workaround, an ext3 filesystem can be used, so allow
integrators to override the default ext4 filesystem.

(PR #44)

Signed-off-by: Derek Straka <derek@asterius.io>
Signed-off-by: Matt Madison <matt@madison.systems>
madisongh added a commit that referenced this issue Aug 27, 2017
instead of bundling the initramfs with the kernel,
with a new variable, TEGRA_INITRAMFS_INITRD.

(Related to #42)

Signed-off-by: Matt Madison <matt@madison.systems>
madisongh added a commit that referenced this issue Aug 27, 2017
to reduce the size of the kernel image.

(Related to #42)

Signed-off-by: Matt Madison <matt@madison.systems>
madisongh added a commit that referenced this issue Aug 27, 2017
for some ext4-related patches.

(for #42)

Signed-off-by: Matt Madison <matt@madison.systems>
@drewmoseley

@madisongh is there any interest in submitting your commit 75663dd9bf063f82cd7b578e7c43f9c5d3b2a51b upstream? I'm seeing a similar issue on a non-tegra system.

@madisongh
Member

@drewmoseley I always thought that patch was a bit of a hack, so I never thought it would be worth upstreaming. Looks like one of the NVIDIA engineers has already posted a similar patch, although I don't see that it's been pulled in.
