Status of mainline kernel support #75

Open
sumpfralle opened this issue Mar 18, 2018 · 135 comments

@sumpfralle sumpfralle commented Mar 18, 2018

I think it is vitally important for the longevity of this project (and the hardware) to bring support for the peripherals of GnuBee devices into the upstream kernel development tree. Otherwise we will end up with an outdated kernel (and thus more electronic landfill) at some point ...

Since this is not a short-term task, I propose to use this ticket for collecting the status of the ongoing mainlining efforts.

Could someone please start by summarizing the current state?
Thank you!

@neilbrown neilbrown commented Mar 19, 2018

https://github.com/neilbrown/linux/commits/gnubee/v4.15 mostly works. I just today noticed that only the first SATA controller works. Might fix that tomorrow.
I've posted some patches for inclusion in drivers/staging: https://lkml.org/lkml/2018/3/14/1035 Will repost with some improvements tomorrow.

@sumpfralle sumpfralle commented Mar 28, 2018

Thank you for your quick response, for all the work you have put into mainline support and for the entertaining LWN article you wrote.
Your progress is far better than I expected - this revives my enthusiasm for the GnuBee platform. Thank you!

@Adirelle Adirelle commented Mar 29, 2018

@neilbrown Have you hit this bug with the Ethernet driver?

Right now, I have a job that checks the network connectivity every minute and reboots the GB if it fails...

I am very happy that a kernel hacker has taken an interest in this project. Thanks for your time and your work.

@neilbrown neilbrown commented Mar 29, 2018

I've seen something like bug #54 when the Gnubee was sitting at the u-boot prompt, and once when sitting at a shell prompt in the initramfs, but never with a mainline kernel fully booted and the network configured.
This message: https://groups.google.com/d/msg/gnubee/cJuFmwCu4XI/F1KwJSIfAgAJ describes the problem the way I see it - the whole switch attached to the gnubee dies.
My guess would be that there is some inter-switch protocol (possibly the Spanning Tree Protocol) which the embedded switch in the gnubee is messing up.
In mainline Linux the embedded switch is configured as a boring transparent switch with no smarts, and maybe that is why it doesn't confuse other switches as much.
It would be interesting to see what happens if the gnubee is directly connected to a PC instead of to a switch.

@alethiophile alethiophile commented Mar 30, 2018

My experience is also that when it's booted into the Linux kernel, the network crash problem goes away. I see the problem when the board is in any of the three states uboot prompt/initramfs prompt/halted, but powered on.

@Adirelle Adirelle commented Mar 30, 2018

Weird. It always happens on a booted kernel (the librecmc-based one, provided by GB) and does not take the switch down. The GB still has ICMP (e.g. ping works) but TCP and UDP become unusable. That is how the job detects a failure: if it can ping a fixed IP but cannot connect to its echo service, it causes the system to reboot.
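
A minimal sketch of the kind of check described above, assuming `ping` and `nc` are available on the box. The host, the port and all function names are placeholders for illustration, not the actual job. The decision logic sits in its own function so it can be exercised without touching the network.

```sh
#!/bin/sh
# Hypothetical watchdog sketch. HOST and PORT are placeholders.
HOST="${HOST:-192.0.2.1}"   # fixed IP to probe (assumed value)
PORT="${PORT:-7}"           # TCP echo service on that host (assumed)

# Map the two probe results (0 = success) to a verdict.
decide() {
    ping_rc=$1
    tcp_rc=$2
    if [ "$ping_rc" -ne 0 ]; then
        echo unknown    # no ICMP either: peer or link may simply be down
    elif [ "$tcp_rc" -ne 0 ]; then
        echo reboot     # ICMP works but TCP does not: the lockup signature
    else
        echo ok
    fi
}

probe_ping() { ping -c 1 -W 2 "$HOST" >/dev/null 2>&1; }
probe_tcp()  { nc -z -w 2 "$HOST" "$PORT" >/dev/null 2>&1; }

# Run both probes once and print the verdict.
check_once() {
    probe_ping; p=$?
    probe_tcp;  t=$?
    decide "$p" "$t"
}
```

A cron entry could then run something like `[ "$(check_once)" = reboot ] && reboot` once a minute. Note that `nc -z` only tests that a TCP connection opens; it does not exchange data with the echo service.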

@neilbrown neilbrown commented Apr 1, 2018

Maybe there are two different bugs here - one that kills the switch and one that kills TCP but leaves ICMP working....

@xvybihal xvybihal commented Apr 12, 2018

@Adirelle I also hit bug #54 with this mainline kernel. GnuBee PC1 is sitting here on my desk, working "fine" (quotes because of the ping behaviour shown later). But when I come back some hours or day(s) later, I cannot connect to it - the time when it becomes unavailable via the network is random, as far as I can tell. I have to power it off and on again; nothing else worked for me (such as unplugging and replugging the cable).

gnubee ~ # uname -a
Linux gnubee.jvi.cz 4.15.12+ #3 SMP Wed Apr 4 15:29:16 CEST 2018 mips GNU/Linux

gnubee ~ # cat /etc/os-release 
PRETTY_NAME="Debian GNU/Linux 9 (stretch)"
NAME="Debian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"

gnubee ~ # cat /etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
	address 172.16.202.254
	netmask 255.255.0.0
	gateway 172.16.whatever

The strange thing is that when I run ping long enough, I see that every so often the gnubee takes quite a long time to respond (100+ ms). I can definitely rule out a faulty switch or cable.

https://gist.githubusercontent.com/xvybihal/b403c3c2677e423f3aea73feb8255a91/raw/d137ab85de894e2d5e3cb5bb9e2ffae16d023231/Gnubee%2520ping

Unfortunately I do not have a UART cable to check what is going on when the GnuBee is not available via the network.

This "thing" is nothing new; it has been acting this way from the beginning, with every kernel provided, and with Debian installed. It makes the GnuBee pretty unusable for me if I have to restart it several times a day.

@neilbrown neilbrown commented Apr 12, 2018

Thanks for the detailed bug report. I love details.
What is happening on the gnubee before it starts failing?
Is it just sitting there idle? Is it initiating network traffic itself? Is it responding to requests?
Are there lots of requests, or occasional? or few? What protocol or protocols?
I've started a job fetching a 10K file over http every 2 seconds. I'll see if it is still working in the morning.

@Adirelle Adirelle commented Apr 12, 2018

It does not seem to be linked to network activity. Stopping the transmission daemon to reduce the bandwidth and number of connections does not prevent it from happening (or even postpone it). And I have downloaded an Ubuntu DVD torrent over a 1Gb link without any issue.

@neilbrown neilbrown commented Apr 12, 2018

So what is your gnubee doing when the problem happens? How do you first notice it? How long after boot does it typically fail? <1hr? <12hrs?
If I can reproduce the problem it is very likely that I can fix it
If not ... there are some kernel messages in that other bug which might be enough of a hint. Maybe I'll try copying a bigger file.

@xvybihal xvybihal commented Apr 13, 2018

In my case, the GnuBee is doing nothing when it happens. What do I mean by nothing? The system installed is clean; no additional network daemons or services are installed. No network transfers are happening (except for time sync and common stuff in the base system). I do not use it; it just sits here doing nothing. The time when it happens is just random. Sometimes it's hours, sometimes a day or two.
Next time it happens, I will try to take out the SD card and copy some logs here. That's probably the only thing I can do without a UART cable (which I am going to equip myself with soon).

@xvybihal xvybihal commented Apr 13, 2018

Looking at the log/messages, there might be something helpful. Uploading the whole file - it looks pretty similar to what @Adirelle posted: messages.tar.gz

@Adirelle Adirelle commented Apr 13, 2018

> So what is your gnubee doing when the problem happens? How do you first notice it? How long after boot does it typically fail? <1hr? <12hrs?

Well, it seems totally random. The gnubee could be idle or I could be using it. Right now, it is running the following services: kernel NFS server, dropbear, dms (a UPnP media server), ntpd, mysqld, transmission-daemon, ypbind, and there is a job running rdiff-backup once a day.

Sometimes it can be OK for a whole week, sometimes only a few hours. I have the impression that the uptime rarely gets over 48 hours.

> If I can reproduce the problem it is very likely that I can fix it

I understand that -- I am a developer myself -- but unfortunately I could not find something to trigger the bug. I have only got the kernel error message.

I have a script that reboots it within one minute after it happens. It could dump the result of some commands in a file before rebooting, if you had some suggestions.

@neilbrown neilbrown commented Apr 13, 2018

Thanks for posting the whole log file (did I say that I love details?).
The problem appears to be that interrupts get disabled on one of the CPUs - presumably CPU0, as that gets all device interrupts on my gnubee. This is reported by RCU at timestamp 2179, which should be about 21 seconds after it happened. Then at timestamp 6317 (about 1 hour later) the network watchdog complains.
With interrupts disabled on one CPU, the network card will become unreliable. As polling is sometimes used you can still get some traffic through, but it won't be fast. A 'ping' probably works because it sends a packet every second, and only wants one in reply. If the driver checks for incoming packets whenever it transmits a packet (quite likely), ping will appear to work normally.
The next question is: why do interrupts get disabled? Two things might be useful.
1/ @Adirelle if you could get your script to "echo t > /proc/sysrq-trigger" when the problem is detected, that might help. It should write stack traces for all processes to the kernel log. Seeing those would be most helpful.
2/ @xvybihal If you could rebuild your kernel with lockdep enabled, that might produce more useful info (I hope). You need to enable CONFIG_PROVE_LOCKING and CONFIG_DEBUG_LOCKDEP and you may as well add CONFIG_PROVE_RCU. Then if it happens again, collect the logs the same way that you did before.
Thanks.
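
Before rebuilding, it can help to confirm that the options named above really ended up in the generated config. A small sketch, assuming a standard kernel `.config` format; the function name is made up for illustration.

```sh
# Print every named option that is NOT set to "y" in the given config file.
# Prints nothing when all of them are enabled.
missing_options() {
    cfg=$1
    shift
    for opt in "$@"; do
        grep -q "^${opt}=y\$" "$cfg" || echo "$opt"
    done
}
```

For example, `missing_options O/.config CONFIG_PROVE_LOCKING CONFIG_DEBUG_LOCKDEP CONFIG_PROVE_RCU` should print nothing once all three are enabled (the `O/.config` path assumes an out-of-tree build directory named `O`).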

@Adirelle Adirelle commented Apr 13, 2018

@neilbrown I have set up a script that will log the output of the following commands when it happens. I will post the next one.

date
uptime
vmstat 1 1
netstat -ieW
netstat -aopenW
lsmod
ps -ef
echo t > /proc/sysrq-trigger
dmesg
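
One way such a command list could be wrapped so that everything lands in a single file before the reboot. This is only a sketch: the function name and the log path in the usage note are hypothetical, not the script actually used.

```sh
# Run each command line given after the first argument and append its
# output, preceded by a header naming the command, to the log file.
dump_diagnostics() {
    log=$1
    shift
    for cmd in "$@"; do
        printf '===== %s =====\n' "$cmd" >> "$log"
        sh -c "$cmd" >> "$log" 2>&1
    done
}
```

It could be called as `dump_diagnostics /var/log/lockup.log date uptime 'vmstat 1 1' 'netstat -ieW' 'netstat -aopenW' lsmod 'ps -ef' 'echo t > /proc/sysrq-trigger' dmesg`, followed by `sync` so the file reaches the disk before rebooting.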

@neilbrown neilbrown commented Apr 14, 2018

I have found a smoking gun.... I don't know if it is the smoking gun.
I went looking for places in the code which disable interrupts without clearly re-enabling them. I didn't expect to find any - that would be too easy. I found two!
One is almost certainly of no consequence - it always happens very early during initialization, so something must re-enable them.
The other is in the network code. I don't understand the code enough to know when it happens, but it looks like it is in response to some event like the cable being pulled (though I tried that and it doesn't trigger anything). The same bug is in the 4.4 kernel code.
I've pushed out a fix to my gnubee/v4.15 kernel branch. Please test if you can.

@neheb neheb commented Apr 14, 2018

@neilbrown Have you tried patching the stock Mediatek MMC driver to add support for mt7621? I tried and failed. Something about missing pinctrl. I think I needed to edit the dts file...

This was my basis: jonpry/openwrt_mt7688@a85e6d9

@neilbrown neilbrown commented Apr 14, 2018

> @neilbrown Have you tried patching the stock Mediatek MMC driver to add support for mt7621?

No I haven't. As MMC is currently working, that driver is not a priority for me. My current priorities are roughly:

  • reboot - kernel currently leaves NOR flash in an inconsistent state
  • SATA access seems really slow - 10MB/sec!
  • network switch - enable VLAN support so different ports can be on different subnets
  • 2nd network interface: the SOC has 2 interfaces to the switch, only one works at present
  • crypto engine
  • PCI driver has horrible hacks to select correct interrupt

So I won't be focusing on MMC for quite a while. I'd be very happy for you or anyone else to dig into it and ask questions. If you have specific focused questions, I'd be happy to share any expertise I might have. I'd suggest opening a separate issue for each driver.

@neheb neheb commented Apr 14, 2018

I just tested dd if=/dev/zero of=test count=1M with a speed result of 7.5MB/s on a btrfs array. No wonder transmission is slow...

I will dig through the MMC driver in a few days to see if I can get it working. Apparently the commit following the one I linked has updated DTS entries.

@Adirelle Adirelle commented Apr 15, 2018

@neilbrown

Compiled your v4.15 and it boots (which is already something since I was not sure about what I was doing) but:

  • the initramfs does not find my root partition, since it lies on an md RAID1 array.
  • it messes with the attached switch, probably because its own switch is not configured.

Your kernel branch might or might not fix some bugs, but as long as it works, I prefer having a recent version that is quite easy to build over the ones from libreCMC/LEDE.

PS: I will try to find what I need to add to the initramfs to support a rootfs on an MD array.
PS2: I cross-compiled the kernel on an Alpine-Linux-based VM, I hope this will not cause any issue.

@Adirelle Adirelle commented Apr 15, 2018

Got it working; I was also missing some other modules. I will let you know if the network lockup happens again.

@neilbrown neilbrown commented Apr 15, 2018

Another convert - hurray :-)
I've merged your patch - thanks.

@Adirelle Adirelle commented Apr 16, 2018

OK. So, on to the bad news:

  • clock skew is back again (like #49 (comment)).
  • performance seems lower than with the provided kernel, but I am not sure why.

I will try to compare kernel settings with the ones from LibreCMC.

@Adirelle Adirelle commented Apr 16, 2018

@dgazineu Is there a way to flash the firmware from a running Linux ? (to shorten the whole "put your image on a USB stick, plug it on the GB, reboot, let uboot flash the firmware, remove the USB stick, reboot" cycle).

@neilbrown neilbrown commented Apr 16, 2018

> clock skew is back again (like #49 (comment)).

You need the cpuclock clock-frequency in arch/mips/boot/dts/ralink/gbpc1.dts to match the value set by the u-boot that you have installed. I have a u-boot that configures 900MHz so I set clock-frequency to 900000000.
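
The device-tree value is simply the CPU clock expressed in Hz. A tiny sketch of the conversion, with a hypothetical helper name; the 900 MHz input matches the example above, and any other frequency would need to be read from your own u-boot.

```sh
# Convert a CPU clock given in MHz to the Hz value expected by the
# clock-frequency property.
mhz_to_hz() {
    echo $(( $1 * 1000000 ))
}
```

Running `grep -n clock-frequency arch/mips/boot/dts/ralink/gbpc1.dts` shows the value currently in the tree for comparison with `mhz_to_hz 900`.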

> performance seems lower than with the provided kernel, but I am not sure why.

I just discovered that large kernel modules hurt performance. I rebuilt with CONFIG_XFS_FS=y (instead of =m, in O/.config) and filesystem throughput is a lot faster. I've pushed an update for the defconfig file.

> Is there a way to flash the firmware from a running Linux ?

Probably, using flash_erase and flashcp from mtd-utils. I haven't played much with them - be careful.
I set up my linux desktop as a tftp server and test kernels using tftpboot. It is fairly painless.

@Adirelle Adirelle commented Apr 16, 2018

> I have a u-boot that configures 900MHz so I set clock-frequency to 900000000.

Ah, I think mine is configured at 880 MHz. It seems there was some discrepancy between the shipped u-boot and the kernel updates that were provided later.

> I set up my linux desktop as a tftp server and test kernels using tftpboot. It is fairly painless.

To be sure I understand you right: once the new image is available through TFTP, do you use the provided u-boot menu to download it and flash it, or do you run the kernel without flashing it (which would be ideal)?

@neilbrown neilbrown commented Apr 16, 2018

flash_erase works as expected. flashcp doesn't. Something is wrong with the spi driver...

@neilbrown neilbrown commented Apr 16, 2018

I don't need to flash the kernel to test it.
The script I use to build the kernel finishes with

cp O/arch/mips/boot/uImage.bin /srv/tftpboot/GB-PCx_uboot.bin
echo 'tftpboot;bootm 80200000'

I reboot (or power-cycle) the gnubee and press '4' repeatedly during the early messages. Once is probably enough, but more doesn't hurt.
When I get the prompt I cut/paste that last message printed by the script.
For this to work you at least need to set serverip in your u-boot environment
e.g.

setenv serverip 192.168.1.4
saveenv

You might also want to set, or at least check (printenv), bootfile and ipaddr.

@Adirelle Adirelle commented Jan 6, 2019

> in the 4.15 code

1/ Should it not be in the 4.20 code, since it is not even checked in the 4.15 ?
2/ Does it only affect a check or should I expect some data corruption or other side effects with 4.15 ?

@Adirelle Adirelle commented Jan 6, 2019

It fixes the error on v4.20. I ran a read-only fsck of the underlying filesystem and it seems OK. Should I test with v4.15 too?

BTW, is there a way to build the initramfs and u-boot image in the path indicated by the "O" variable instead of the source tree ?

@neilbrown neilbrown commented Jan 6, 2019

1/ Yes, 4.20. Sorry.
2/ No data corruption. You might get an error if you try to reshape an array or add a journal.

Thanks for testing and reporting - I'll send the patch upstream.
I doubt you would notice any difference if you made the same change in 4.15.

If you set GNUBEE_INITRAMFS_TREE in gnubee-tools/config to some other directory, the initramfs should be built there. The u-boot image is always built in the O directory ($GNUBEE_KERNEL_OBJECTS) and then copied to $GNUBEE_BUILD_DIR.
Everything should be configurable in the 'config' file.

@Adirelle Adirelle commented Jan 6, 2019

Thank you.

Here are more error messages (which do not seem to prevent anything from working, though):

[   11.599861] mt7621_gpio 1e000600.gpio: registering 32 gpios
[   11.615781] gpio gpiochip1: (1e000600.gpio-bank1): detected irqchip that is shared with multiple gpiochips: please fix the driver.
[   11.639095] mt7621_gpio 1e000600.gpio: registering 32 gpios
[   11.652240] gpio gpiochip2: (1e000600.gpio-bank2): detected irqchip that is shared with multiple gpiochips: please fix the driver.
[   11.675488] mt7621_gpio 1e000600.gpio: registering 32 gpios
[   11.775701] cacheinfo: Failed to find cpu0 device node
[   11.786392] cacheinfo: Unable to detect cache hierarchy for CPU 0
[   12.324852] ------------[ cut here ]------------
[   12.334100] WARNING: CPU: 2 PID: 1 at /home/user/gnubee/linux/drivers/mtd/spi-nor/spi-nor.c:3659 spi_nor_init+0x134/0x1d8
[   12.369010] enabling reset hack; may not recover from unexpected reboots
[   12.382345] Modules linked in:
[   12.388419] CPU: 2 PID: 1 Comm: swapper/0 Not tainted 4.20.0+ #11
[   12.400529] Stack : 00000123 00000000 00000000 80068b8c 00000001 811e3814 80640000 0000000b
[   12.417147]         807054b8 9bc81944 00000000 811e0000 80720000 00000001 9bc818d8 53260ce1
[   12.433766]         00000000 00000000 81220000 00000007 00000000 6465746e 00000123 00000000
[   12.450382]         00000000 00000122 81210000 79616d20 80720000 00000000 80740000 806db044
[   12.450402]         00000009 00000e4b 9bc81b88 00000000 00000000
[   12.450436]  802fca50 00000008 811e0008
[   12.450442]         ...
[   12.450448] Call Trace:
[   12.450472] [<8000c244>] show_stack+0x8c/0x130
[   12.450500] [<805c63a4>] dump_stack+0x94/0xd0
[   12.450514] [<80027d00>] __warn+0x10c/0x114
[   12.450524] [<80027d48>] warn_slowpath_fmt+0x40/0x64
[   12.450535] [<80375bf8>] spi_nor_init+0x134/0x1d8
[   12.450546] [<80378b88>] spi_nor_scan+0x8f8/0xa60
[   12.450562] [<80366d60>] m25p_probe+0x178/0x218
[   12.450573] [<8030b768>] really_probe+0x2cc/0x430
[   12.450595] [<803094a4>] bus_for_each_drv+0xac/0xcc
[   12.450604] [<8030b9ec>] __device_attach+0xbc/0x130
[   12.450616] [<8030a214>] bus_probe_device+0x3c/0xb0
[   12.450626] [<80307ef0>] device_add+0x494/0x5b0
[   12.450648] [<80390f08>] spi_add_device+0x148/0x1b0
[   12.450659] [<803919e8>] spi_register_controller+0x7a4/0x940
[   12.450679] [<8030d708>] platform_drv_probe+0x40/0x7c
[   12.450688] [<8030b768>] really_probe+0x2cc/0x430
[   12.450697] [<8030be98>] __driver_attach+0xb4/0x138
[   12.450707] [<803093a0>] bus_for_each_dev+0x6c/0xb0
[   12.450719] [<8030a5d8>] bus_add_driver+0x204/0x24c
[   12.450728] [<8030c7d8>] driver_register+0xd0/0x118
[   12.450738] [<80001638>] do_one_initcall+0x84/0x19c
[   12.450758] [<80753f2c>] kernel_init_freeable+0x248/0x250
[   12.450775] [<805e1fdc>] kernel_init+0x14/0x110
[   12.450784] [<80006838>] ret_from_kernel_thread+0x14/0x1c
[   12.450832] ---[ end trace b800848cea8dadd4 ]---

By the way, it seems the SATA drives are configured for UDMA/133 despite a link at 6.0 Gbps:

[   13.045238] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[   13.141553] ata2.00: ATA-10: ST1000LX015-1U7172, SDM1, max UDMA/133
[   13.154447] ata2.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[   13.262441] ata2.00: configured for UDMA/133

The same goes for ata1. The other slots are unused.

For reference, I used this config file. I have tried to disable stuff my GB1 does not need, to build in always-used drivers and to enable as modules what I may need (like USB mass storage). I hope I have not disabled essential things.

@neilbrown neilbrown commented Jan 6, 2019

The warning in spi-nor is annoying but harmless.
Some background can be read here: https://patchwork.ozlabs.org/patch/950299/
The hardware is not "broken" - but I didn't think it was worth fighting too hard.

I don't know much about ATA speeds so I cannot comment on that issue. I doubt it is related to your config-file choices.

@Adirelle Adirelle commented Jan 7, 2019

What about "detected irqchip that is shared with multiple gpiochips: please fix the driver."? Not that I have any use for the GPIO, but could it cause a bug? IIRC, there is only one GPIO available to the user on the GB.

> I don't know much about ATA speeds so I cannot comment on that issue.

Well, no HDD or SSD reaches 6 Gb/s (aka ATA-600, 600 to be compared to the 133 of UDMA-133) but I would like to be sure the disk rate is not limited by the link. However, it seems the ST1000LX015-1U7172 hardly reaches 93 MB/s so that should not be an issue.
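
A rough cross-check of those figures, as a sketch. It assumes the 8b/10b line coding used by SATA, under which only 8 of every 10 bits on the wire carry data, so a 6 Gb/s link tops out at about 600 MB/s. Both function names are made up for illustration.

```sh
# Payload ceiling in MB/s for a SATA link with the given line rate in Gb/s,
# assuming 8b/10b coding (8 data bits per 10 line bits, 8 bits per byte).
sata_payload_mbs() {
    echo $(( $1 * 1000 * 8 / 10 / 8 ))
}

# Succeed if a disk with the given sustained rate (MB/s, first argument)
# stays below the ceiling of a link of the given rate (Gb/s, second).
link_not_limiting() {
    [ "$1" -lt "$(sata_payload_mbs "$2")" ]
}
```

`sata_payload_mbs 6` prints 600, so `link_not_limiting 93 6` succeeds: the 93 MB/s quoted for the drive is far below the link ceiling.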

> I doubt it is related to your config-file choices.

Ok. While reviewing the drivers to enable, I was wondering about the hardware that is found or not on the GB-PC1. E.g., what is SPI used for and does the GB-PC1 use it ? Is it actually needed ? Same questions for I2C, ...

@neheb

Fun little project: https://github.com/vschagen/mtk-eip93

I am not familiar with hardware crypto engines. Could userland software (e.g. openss[lh] and the like) use the hardware crypto ?

@neilbrown neilbrown commented Jan 7, 2019

SPI is used to access the Flash storage.
GPIO is used to drive the LEDs and to sense the push-button.
I don't think I2C is used.
Thanks for mentioning the irqchip thing - I'll look into it.
I believe most crypto libraries will use kernel-supported hardware when available. I'm fairly sure that includes openssl, and I suspect that includes openssh.

@neheb neheb commented Jan 7, 2019

The hardware only supports AES in CBC mode. SSH does not do CBC. dm-crypt defaults to XTS but can be configured to use CBC.
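
Whether a hardware implementation is registered with the kernel crypto API can be checked in /proc/crypto, where each algorithm block lists a name, a driver and a priority; the kernel prefers the highest-priority driver for a given name. A sketch of a helper that extracts this (the function name is hypothetical, and the driver names you see depend entirely on which modules are loaded):

```sh
# List "driver priority" for every implementation of the named algorithm
# in a /proc/crypto-style file (second argument, default /proc/crypto).
crypto_drivers() {
    awk -v alg="$1" '
        $1 == "name"     { name = $3 }
        $1 == "driver"   { driver = $3 }
        $1 == "priority" && name == alg { print driver, $3 }
    ' "${2:-/proc/crypto}"
}
```

For instance, `crypto_drivers 'cbc(aes)'` lists the CBC-AES providers; a hardware engine would show up there with its own driver name next to the generic software one.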

@Adirelle Adirelle commented Jan 12, 2019

It seems the network bug (ping is working but not TCP nor UDP) is back. I have not identified what causes it and I have no suspicious message in dmesg or logs. It happens with both the 4.15.18 & 4.20 kernels.
Trying to ifdown+ifup eth0 does not work but a reboot does.

@cmm cmm commented Jan 13, 2019

@Adirelle do you use NFS (or Samba with sendfile() on)?

@Adirelle Adirelle commented Jan 13, 2019

@cmm the kernel NFS server, yes. I do not know if sendfile is enabled. I removed samba a few weeks ago.

@cmm cmm commented Jan 13, 2019

@Adirelle I started getting the network bug a lot once my Gnubee started serving mostly 1080p media (instead of mostly 720p, where I was having those lockups maybe once a month); the problem disappeared completely once I stopped using NFS and moved to Samba-sans-sendfile().

it is my not-entirely-informed guess that the VFS/network interplay in the kernel is screwy, probably due to how the switch code does locking. in fact, any in-kernel code that serves data streams through the switch is probably dangerous (I don't know if there is any apart from sendfile() & NFS, though). the actual problem here is that most SoC vendors just don't test such configurations -- most small NAS systems on the market don't do NFS at all, and the SoC in Gnubee is made for routers...

@Adirelle Adirelle commented Mar 10, 2019

@neilbrown the initramfs init script played a trick on me recently: as I was recompiling the same kernel version, it refused to mount /lib/modules from the initramfs and used the existing /lib/modules instead, which was an issue since I had changed the module settings. Is there something in the modules folder that could be used to know whether the kernel builds are different?

Edit: I was thinking about adding a file containing a hash of the .config file and using it to tell.
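
A minimal sketch of that idea, assuming `sha256sum` is available in both the build environment and the initramfs. The function names and the stamp-file location are hypothetical.

```sh
# Write the SHA-256 of a kernel .config (first argument) to a stamp file
# (second argument), e.g. one stored next to the installed modules.
write_stamp() {
    sha256sum < "$1" | cut -d' ' -f1 > "$2"
}

# Succeed only if the stamp file exists and matches the .config hash.
same_build() {
    [ -f "$2" ] && [ "$(sha256sum < "$1" | cut -d' ' -f1)" = "$(cat "$2")" ]
}
```

The build step would call `write_stamp` once per module tree, and the init script would keep the on-disk modules only when `same_build` succeeds against the stamp shipped in the initramfs.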

@neilbrown neilbrown commented Mar 10, 2019

> Is there something in the modules folder that could be used to know if the kernel builds are different ?

No. For my gnubee-tools package, I create a 'stamp' file with the date when the modules were copied in, and compare that.
You could set CONFIG_LOCALVERSION_AUTO=y. This adds part of the git hash of the top commit to the version so there is no risk of using old libraries with a newer kernel.
I've updated the gnubee1_defconfig in v4.15 to have this changed.
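
With the git hash in the release string, an init script can simply test for a module directory named after the running kernel. A sketch, assuming the usual lib/modules/<release> layout; the function name and the root-directory argument are hypothetical and not taken from gnubee-tools.

```sh
# Succeed if ROOT/lib/modules/RELEASE exists, i.e. the module tree under
# ROOT (second argument, default "/") was installed for exactly this
# kernel release string (first argument).
modules_match() {
    release=$1
    root=${2:-}
    [ -d "$root/lib/modules/$release" ]
}
```

For example, `modules_match "$(uname -r)" /mnt/root || echo "falling back to initramfs modules"`, where `/mnt/root` stands in for wherever the root filesystem is mounted.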

@neilbrown neilbrown commented Apr 1, 2019

The mediatek network driver in mainline now supports the MT7621, so we don't need the drivers/staging driver.
It works with DSA support for the integrated switch. I now have it working on both network interfaces on the PC1 and all three on the PC2.
So I'm now using mainline (5.1-rc2) on my main gnubee and will be building firmware with mainline kernels from time to time. See the announcement in the google group.
I now consider the gnubee to be "fully supported" in mainline. Though there is still a bit of work to do, it is mostly cleaning up the code and getting it moved out of drivers/staging (spi is almost there, and mmc probably isn't far away).

@neheb neheb commented Apr 1, 2019

The MMC driver situation is the same as the Ethernet. The mainline mtk-sd can work with it with some modifications.

@smurfix smurfix commented Apr 1, 2019

Nice. Thanks for the work. Does mainline have a usable kconfig file? If not (or not yet …), where can I find one?

@vgiralt vgiralt commented Apr 1, 2019

@neilbrown

> I now consider the gnubee to be "fully supported" in mainline. Though there is still a bit of work to do, it is mostly cleaning up the code and getting it moved out of drivers/staging (spi is almost there, and mmc probably isn't far away).

This definitely deserves congratulations, and thank you!

@neilbrown neilbrown commented Apr 1, 2019

Look in https://github.com/neilbrown/linux.git branch gnubee/v5.1
This is v5.1-rc2 plus some staging patches plus a few little things from me. A couple of changes to the DTS files are needed and there are defconfig files in there. There are also (almost) identical config files in github.com/neilbrown/gnubee-tools.git

@neilbrown neilbrown commented Apr 2, 2019

> The MMC driver situation is the same as the Ethernet. The mainline mtk-sd can work with it with some modifications.

Now might be the time to put that hypothesis to the test - GregKH has just removed the mt7621-mmc driver due to licensing concerns. If you have any specific information (patches??), could you share it please?

https://lkml.org/lkml/2019/4/2/311

@neheb neheb commented Apr 2, 2019

Sure. Here are some.

jonpry/openwrt_mt7688@a85e6d9
jonpry/openwrt_mt7688@2487846

A few notes: The MMC driver there is basically the 4.9 mtk-sd one with all the patches from maybe 4.17 or 4.18 backported.

edit: Note that I haven't exactly gotten it working. Probably needs extra DTS entries.

@neheb neheb commented Apr 2, 2019

Here's the diff that I was able to generate:

https://gist.github.com/neheb/3d9e4cbf966f8487114df19b49f28214

@neilbrown neilbrown commented Apr 7, 2019

> Here's the diff that I was able to generate:
>
> https://gist.github.com/neheb/3d9e4cbf966f8487114df19b49f28214

Very helpful, thanks. I just booted my PC2 off the SD card using the mainline driver - with these changes and some others. There is still polishing to do but the hard work is done.

Thanks!

@smurfix smurfix commented Jul 7, 2019

Is the "boring polishing" stuff pushed to mainline by now? If not, what's left to be done?

@neilbrown neilbrown commented Jul 7, 2019

The changes needed to drivers/mmc/host/mtk-sd.c landed in 5.2-rc1.
The changes needed to the mt7621.dtsi device tree file are in staging-next and should land in 5.3-rc1.

PCI is the main outstanding driver that needs work. The clean-up work in staging has introduced a bug (occasional hang on boot) that no-one has found yet.

@Qwertie- Qwertie- commented Aug 4, 2019

Will mainline support mean I can download the ARM Debian or Fedora image from their websites and it will just work like it does for a desktop/laptop?

@neheb neheb commented Aug 4, 2019

The GnuBee uses MIPS not ARM.

@Qwertie- Qwertie- commented Aug 5, 2019

So Debian MIPS would work?

@vgiralt vgiralt commented Aug 5, 2019
