Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Major cleanup - c_whatever is finally history. #2838

Merged
merged 3 commits into from Jul 21, 2019

Conversation

jmattsson
Copy link
Member

This was a royal pain to clean up, but I believe it's now all done.
If Espressif had deigned to ship the corresponding libc headers this
wouldn't have been needed in the first place, but oh well.

Essentially, the libc.a provided by the SDK uses non-standard memory
allocation, so anything that would use the regular malloc()/free()
ends up looking for things that do not exist. Why they didn't provide
wrappers for the whole lot is beyond me. Even worse is the libc.a they
ship includes functions which rely on the missing alloc routines! So yeah,
missing headers and missing functions. Good job, Espressif. I can see why
you don't want to maintain the non-OS SDK...

Anyway, app/libc/ has now been cut down to the bare minimum overrides to
shadow the corresponding (broken) functions in the SDK's libc. The old c_xyz.h
header files have been nuked in favour of the standard <xyz.h> headers,
with a few exceptions over in sdk-overrides. Again, shipping a libc.a
without headers is a terrible thing to do. We're still living on a
prayer that libc was configured the same was as a default-configured
xtensa gcc toolchain assumes it is. That part I cannot do anything about,
unfortunately, but it's no worse than it has been before.

The linker ordering in app/Makefile has been updated to ensure we
link in our overrides first, without needing to rely on the crutch
of prefixing everything with c_ (yuck!).

From here on out it's now possible (and needed) to include the standard
header files, and use the typical malloc()/calloc()/realloc()/free(),
the strwhatever()s and memwhatever()s. These end up, through macro
and linker magic, mapped to the appropriate SDK or ROM functions.

I'm still impressed the NodeMCU founders managed to get things going,
considering the state of the SDK...


Terry, this one is for you, to help bridge the gap between the 8266 and the 32 branches. It should cut down the number of edits required between the two branches by a lot. I've promised you this long enough that I finally felt bad enough to sit down and slog through it tonight. I don't know if it makes more sense to merge this to dev, or for you to merge into the next round of big changes in your repo and get it in that way. Have a chat with @marcelstoer about what makes sense. And of course feel free to tweak if needed.

I have built all the modules successfully, but I obviously have not been able to test them all. Most of the changes are plain replacements that should be hard to have screwed up.

  • This PR is for the dev branch rather than for master.
  • This PR is compliant with the other contributing guidelines as well (if not, please describe why).
  • I have thoroughly tested my contribution.
  • The code changes are reflected in the documentation at docs/*.

This was a royal pain to clean up, but I believe it's now all done.
If Espressif had deigned to ship the corresponding libc headers this
wouldn't have been needed in the first place, but oh well.

Essentially, the libc.a provided by the SDK uses non-standard memory
allocation, so anything that would use the regular malloc()/free()
ends up looking for things that do not exist. Why they didn't provide
wrappers for the whole lot is beyond me. Even worse is the libc.a they
ship includes functions which rely on the missing alloc routines!

Anyway, app/libc has now been cut down to the bare minimum overrides to
shadow the corresponding functions in the SDK's libc. The old c_xyz.h
headerfiles have been nuked in favour of the standard <xyz.h> headers,
with a few exceptions over in sdk-overrides. Again, shipping a libc.a
without headers is a terrible thing to do. We're still living on a
prayer that libc was configured the same was as a default-configured
xtensa gcc toolchain assumes it is. That part I cannot do anything about,
unfortunately, but it's no worse than it has been before.

The linker ordering in app/Makefile has been updated to ensure we
link in our overrides first, without needing to rely on the crutch
of prefixing everything with c_ (yuck!).

From here on out it's now possible (and needed) to include the standard
header files, and use the typical malloc()/calloc()/realloc()/free(),
the strwhatever()s and memwhatever()s. These end up, through macro
and linker magic, mapped to the appropriate SDK or ROM functions.

I'm still impressed the NodeMCU founders managed to get things going,
considering the state of the SDK...
app/libc/stdio.c Show resolved Hide resolved
@TerryE
Copy link
Collaborator

TerryE commented Jul 19, 2019

Thanks Johny, This was on my TODO roadmap pretty much next anyway. I'll go through this lot and post review comments separately.

One challenge that we have is that the git squash and merge process doesn't deal with parallel changes to the same files. We also need to think of the impact of the periodic dev -> master release. I was thinking of raising this as a team discussion, but having it here works as well. I hit this quite a lot when I am working on my cleanup PRs -- they invariably need some fixing, so what I now do is always to fork my branch from the current dev and then just before pushing, I delete and re-fetch nodemcu/dev and then rebaseline against this before doing the PR.

It's going to be quite a journey to merge the two codebases and we will need to take some big steps. To be honest, I really think that we can't afford to dance around other minor contributions if we want to do this within the next six months. For example, we currently have 12 dev and 12 esp32 PRs outstanding on top of this and #2836. (We can ignore the esp32 ones for this dev branch discussion.)

IMO, we need to agree which of the 12 other outstanding dev PRs get merged and labelled as next release; we then need to merge these and do the release, and then have a moratorium (2-4 weeks) on non-cleanup PRs whilst we review (, rebaseline if needed) and merge in:

Of course any contributors can continue to prepare and submit their PRs in parallel so long as they realise that they will need to rebaseline their work against the restructured directories before any final merge.

The reason that I've added the other committers into this discussion as reviewers is that I will implement a timeout-on-response policy to maintain some form of minimum momentum. So all, please engage and speak up or accept that we are using our best endeavours to stabilise the project's future and that we can fix up any minor issues in consequence downstream.

// the mbedtls platform.c from their SDK. Really, this should go
// somewhere else. Note that the prototype here for vPortFree differs (!)
// from the one in sdk-overrides.h. That's above my pay grade.
// --nwf; 2018 Feb 18
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, past-me would be so happy to see this disappearing. :)

@@ -0,0 +1,3 @@
#include <stdlib.h>
void *mbedtls_calloc_wrap(size_t n, size_t sz) { return calloc(n, sz); }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything preventing us from using calloc and free directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the mbedtls lib expects those two to be actual functions with standard prototypes. On the non-OS SDK they are macros which expand to vPortXyz() things, which makes it impossible for the mbedtls lib to do stuff like void (*my_free)(void *) = MBEDTLS_WHATEVER_MACRO_FREE_IS;. Hence the need for wrapper.

Copy link
Collaborator

@TerryE TerryE Jul 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both RTOS and the SDK use the non-standard pvPortXXXX allocators; these are separate from the ROM-based mem_XXXX allocators and include extra parameters to enable some crude allocator diagnostics.

I think that I've discussed this elsewhere, but if you look as the disassembly of these routines, they are largely wrappers around the ROM-based mem_XXXX allocators for most execution paths.

In this case we would want to minimise unnecessary source changes to something like mdedtls and the compiler at -O2 will inline this into a direct calloc call anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jmattsson: I see. If there are other places in the tree where we need "actual malloc and free functions", maybe move this to platform.c? But keeping them where they are is fine.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it, but I didn't want to make everyone else suffer an extra call level just because of mbedtls. Sure, it'd be a marginal expense compared to what the functions do, but when in 99% of the cases it can all be handled with a macro, I went with penalising mbedtls only. But yes, we could put actual malloc()/free() functions into our app/libc/ if so wanted. We could also experiment and see how the SDK's libc would handle it if we provided the missing _malloc_r()/_free_r() functions (obviously without the reentrancy support) as that might allow us to not have to shadow functions it already has, but that's a whole 'nother barrel o' fish.

@jmattsson
Copy link
Member Author

Okay, I have no idea why github decided to request reviews from everyone - I only tagged Terry. And I also have no idea why Travis is unhappy with the linux build, seeing as my branch was cleared by Travis before I PR'd. It's past 2am here, fixing this will have to wait.

@@ -420,14 +419,14 @@ void vfs_clearerr( const char *name )

#ifdef BUILD_SPIFFS
if (fs_fns = myspiffs_realm( normname, &outname, FALSE )) {
fs_fns->clearerr( );
//fs_fns->clearerr( );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it might be a functional change, in contrast to much of the rest of this PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes! That one snuck in by mistake! Thanks for picking it up! Can you also enlighten me why the compiler throws a

vfs.c: In function 'vfs_clearerr':
vfs.c:422:13: error: expected identifier before '(' token
     fs_fns->clearerr( );
             ^
make[2]: *** [../../Makefile:416: .output/eagle/debug/obj/vfs.o] Error 1

at me when it's not commented out?
Changing it to:

   void (*wtf)(void) = fs_fns->clearerr;
    wtf();

does not cause an error.
I'm too tired to see what's up with this...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, it's gonna be a macro thing, isn't it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go to BED. I'll take a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Workaround pushed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, I'm gonna take Terry's suggestion and hit the sack now. A more elegant workaround would certainly not go astray.

@nwf
Copy link
Member

nwf commented Jul 19, 2019

Looks good to me, some minor nit questions notwithstanding. Thanks for doing what surely must have been tedious work.

@TerryE
Copy link
Collaborator

TerryE commented Jul 19, 2019

@jmattsson, I've gone through the diff at a scan level and it looks pretty similar to anything similar that I would have done. I think that we need some sort of validation invariant as a check. What I mean here is that these changes should only be simplifying the sources and not changing the generated binaries, so perhaps we need to diff the app/mapfile versions for a Travis-style builds before and after and make sure that we are only seeing explainable differences. Ideally, the two mapfiles should be identical. We might need to drill down if any object sizes are different to understand why and what is causing this.

On balance, I think that we should merge in this #2838 before #2836, which means that I am going to have to rebasline #2836 against dev post this merge -- grrrhhhh. Given the amount of work that has gone into this and that any other merges will mean that we need to rebaseline this, I am strongly tempted (if the valiation is OK) just to JFDI and merge this one in next, but leave #2836 which actually modifies NodeMCU behaviour until after the next master drop.

@marcelstoer, where are you old chap?

@marcelstoer
Copy link
Member

@marcelstoer, where are you old chap?

Just watching in awe from the sideline while the big boys are playing 😄

IMO, we need to agree which of the 12 other outstanding dev PRs get merged and labelled as next release

👍 but we should be doing it in a team discussion as doing it here would take focus away from the actual changes

@nwf
Copy link
Member

nwf commented Jul 19, 2019

I think I agree with @TerryE that this PR should land. It's going to conflict with a lot of other work, but this one is a non-functional change that puts everything on better footing going forward.

@jmattsson
Copy link
Member Author

I've gone through the diff at a scan level and it looks pretty similar to anything similar that I would have done. I think that we need some sort of validation invariant as a check. What I mean here is that these changes should only be simplifying the sources and not changing the generated binaries, so perhaps we need to diff the app/mapfile versions for a Travis-style builds before and after and make sure that we are only seeing explainable differences.

That would indeed be a good idea, I like it.

Ideally, the two mapfiles should be identical. We might need to drill down if any object sizes are different to understand why and what is causing this.

There will be a couple of differences I expect. Some around the mbedtls malloc/allocs, and some around the [vs]printf()s and also strdup. Those are the ones I can think of off the top of my head. If you throw a readable mapfile diff at me, I'll try to explain/justify the changes! :)

On balance, I think that we should merge in this #2838 before #2836, which means that I am going to have to rebasline #2836 against dev post this merge -- grrrhhhh.

I'm perfectly happy for you to slurp this in on the side so it becomes part of one of your PRs Terry. Just because I raised the PR here doesn't mean it's the only way to handle it. And it may very well be easier to merge this on top of yours, since any conflicts at that point should just be the slight tedium of stripping some more c_ prefixes. The value (or time cost/investment) in this PR is really around working out the what and how of shadowing functions in the SDK's libc. Then again, treating it as a separate non-functional change also has merit.

Given the amount of work that has gone into this and that any other merges will mean that we need to rebaseline this, I am strongly tempted (if the valiation is OK) just to JFDI and merge this one in next, but leave #2836 which actually modifies NodeMCU behaviour until after the next master drop.

I'm on board with whichever way you guys want to go here. It's no question that it's quite the disrupting sort of change.

@nwf Thanks, yeah, this took longer that I had expected (like everything else I've done the last few weeks, grr). A few of the hours were spent trying to build with -nostdinc though, as that was my initially preferred approach, as that would've severed our dependencies on the toolchain's headers. I was making okay progress with it until I hit the esp-gdbstub component, at which point it became obvious that the hardware headers were also covered by -nostdinc, and there was no clean way of -isystem $(shell $(CC) -print-sysroot)ing those in without also pulling in the toolchain C library headers. So, I had to restart and refocus on getting us as compatible as possible with the newlib C headers, despite the crippled SDK libc. Having done the work now, I am a lot more sympathetic to the original introduction of the c_prefixes - it did work around a lot of pain for them (even though it's caused ongoing aches).

@marcelstoer Don't sell yourself short - all the work you're doing with our CI and the cloud builder is hugely valuable too!

@TerryE TerryE mentioned this pull request Jul 20, 2019
4 tasks
@TerryE
Copy link
Collaborator

TerryE commented Jul 20, 2019

Johny, as I said in #2836, this is a multi-step journey. We shouldn't get too hung up on whether any one step is perfect; it just needs to be good and achieve material progress towards the end goal. We will be doing more steps later. In this case what we want for the modules is for them to compile to both targets successfully using the same source, so that we can unify the codebase downstream.

IMO, we should just take the hit and merge this sooner than later. I'll do that mapfile comparison later today and some quick sanity checks with the built image. If all looks OK, then I'll do the merge.

@@ -4,8 +4,4 @@
#include "c_types.h"
#include "mem.h"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work! When I was testing build with newer xtensa toolchain those were a real pain. Just few things:
Maybe it would be a good idea to just remove this header file and it's occurrence in Makefile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should certainly be possible to adjust the sjson module to not depend on the SDK specific things; at the time I went with minimum possible change just to get through the bulk of it.

@@ -31,7 +31,7 @@
#define SQLITE_OMIT_PROGRESS_CALLBACK 1

#include <c_stdlib.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just looked and there is still that one c_ header - can it be swapped to stdlib.h? AFAIK SQLite is turned off completely so it shouldn't matter anyway - just a nit pick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

D'oh, I must've missed that one.

@TerryE
Copy link
Collaborator

TerryE commented Jul 20, 2019

OK, the good news is that this branch compiles fine. The bad news is that the two mapfiles are very hard to compare: there have been subtle changes in some of the include search lists and for some reason make does not compile wildcard lists within a directory in a collated order so editing files can change their compile order and this then effects their link order and hence their order in the binary -- bizarre.

Even so doing an objdump gives the following segment sizes:

Segment dev no-c_
.text 0000903b 0000903f
.data 00000bf8 00000be8
.rodata 00000030 00000030
.bss 00007d94 00007984

So there are small differences. I will drill down and see what they are.

@TerryE
Copy link
Collaborator

TerryE commented Jul 21, 2019

This does my head in. The differences are typically the odd 4 bytes. When I check the linker output the "size before relaxing" are the same, but the actual sizes are different. The relaxation process occurs when the linker makes cross-module storage optimisations -- for example two adjacent modules share the same constants or strings and they are PC-relative accessible to the second, then the copy constant can be dumped. On a few random samples, doing an elf-objdump -d on both gives the same O/P so the .o are essentially the same. So relaxation depends on the module order, and as I said: the module order differs on the two makes. If anyone is interested here is the Q&D perl script that I used to mine the mapfiles for the relative lines to sort and diff. As I said Q&D:

use strict;
my $last = '';
while (<>) {
  $_ =  $last . $_ if ($last && / {16}0x[0-9af]{16} /);
  $last = '';
  $last = $1 if /^( \S+)$/;
    my ($sect, $start, $len, $file) = / (\S+)\s+(\S+)\s+(\S+)\s+(\S+)/;
    next unless $start =~ /^0x[0-9a-f]{16}$/;
    next if $start eq '0x0000000000000000';
    $start = '0xXXXXXXXX';
    next if $sect =~ /\.(xtensa|comment|debug)/;
    $file =~ s!^.*?/nodemcu-firmware/sdk/esp_iot_sdk_v3.0-e4434aa/lib!SDKLIB!;
    $file =~ s!^.*?/xtensa-lx106-elf/lib!XTENSALIB!;
    next if $sect eq '*fill*'; 
    print "$sect $start $len $file\n";
}

Anyway, I propose to merge this tomorrow morning (GMT) unless any of the committers raises an objection.

@TerryE
Copy link
Collaborator

TerryE commented Jul 21, 2019

Anyway, I propose to merge this tomorrow morning (GMT) unless any of the committers raises an objection.

I am delaying this another day to give the committers time to complete their team discussion.

Copy link
Collaborator

@TerryE TerryE left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already given feedback inline

@devsaurus
Copy link
Member

While browsing through the stale issues, I did some experiments with snprintf for #1501 and gave it a try on top of this PR.
Exchanged a random instance in wifi.c like sprintf(temp,MACSTR, ... with snprintf(temp sizeof(temp),MACSTR, ... and got

/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfprintf.o):(.literal+0x10): undefined reference to `_malloc_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfprintf.o): In function `_svfprintf_r':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:733: undefined reference to `_malloc_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o):(.literal+0x0): undefined reference to `_realloc_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o):(.literal+0x4): undefined reference to `_free_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o): In function `__ssprint_r':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:234: undefined reference to `_malloc_r'
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:245: undefined reference to `_realloc_r'
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:249: undefined reference to `_free_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o): In function `_svfiprintf_r':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:733: undefined reference to `_malloc_r'
collect2: error: ld returned 1 exit status

This is probably out of scope for this PR since snprintf wasn't supported up to now. But is SDK's newlib broken or is this PR missing a tweak or two that we'll need in the future?

Copy link
Member

@devsaurus devsaurus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good and feels right :)

Some changes will conflict with #2836 (e.g. vfs.h).

@@ -4,7 +4,7 @@
#include "platform.h"
#include "user_interface.h"
#include "c_types.h"
#include "c_string.h"
#include <string.h>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this is a redundant include anyway. There are more instances of this type around.
I assume that the pattern based change doesn't detect that aspect - should we use this PR to clean up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to find a smart way of weeding out redundant includes, but the best that I've come up with (including are searching SO etc.) is a tight interactive try deleting one and re-make. If the compile is clean it stays deleted; if it errors then restore it.

As I mention elsewhere we don't do ourselves any favours because the platform abstraction layer doesn't do it's job, IMO. Your modules still end up including the SDK headers which rather defeats the purpose of abstraction. Still one step at a time in the right direction is a good thing, I think.

@TerryE TerryE merged commit 526d21d into nodemcu:dev Jul 21, 2019
@jmattsson
Copy link
Member Author

@devsaurus

While browsing through the stale issues, I did some experiments with snprintf for #1501 and gave it a try on top of this PR.
Exchanged a random instance in wifi.c like sprintf(temp,MACSTR, ... with snprintf(temp sizeof(temp),MACSTR, ... and got

/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfprintf.o):(.literal+0x10): undefined reference to `_malloc_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfprintf.o): In function `_svfprintf_r':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:733: undefined reference to `_malloc_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o):(.literal+0x0): undefined reference to `_realloc_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o):(.literal+0x4): undefined reference to `_free_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o): In function `__ssprint_r':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:234: undefined reference to `_malloc_r'
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:245: undefined reference to `_realloc_r'
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:249: undefined reference to `_free_r'
/home/arnim/nodemcu-firmware_devsaurus/sdk/esp_iot_sdk_v3.0-e4434aa/lib/libc.a(lib_a-svfiprintf.o): In function `_svfiprintf_r':
/home/wjg/Repo/esp-open-sdk-20170622/crosstool-NG/.build/src/newlib-2.0.0/newlib/libc/stdio/vfprintf.c:733: undefined reference to `_malloc_r'
collect2: error: ld returned 1 exit status

Welcome to the hell I had while getting this PR done.

This is probably out of scope for this PR since snprintf wasn't supported up to now. But is SDK's newlib broken or is this PR missing a tweak or two that we'll need in the future?

Yes, what you're seeing here is the broken libc in SDKs. I think what Espressif did was they built their newlib libc.a as one normally would, with its regular malloc()/free(). Then they simply ripped out the object files containing the allocator functions from the .a, and built the SDK parts against the PortAllocator functions instead using the same macros we have. However, the PortAllocator stuff does not provide the reentrant versions of malloc()/free()/etc, so any code in their libc.a which does memory allocation is broken, since that apparently uses the reentrant versions.

Therefore, we have to "shadow" any and all libc functions that are thus broken with our own copies so the linker doesn't find the broken version in the SDK's libc. And we're currently then missing snprintf() at least.

As I mentioned somewhere above, it would be really interesting to try with writing wrappers for _malloc_r, _free_r(), etc which just chain through to the non-reentrant PortAllocator functions instead (or even directly to the ROM'd allocator), and see whether the result is stable. If so, we should be able to ditch our entire app/libc directory.

@galjonsfigur
Copy link
Member

I just had a chance to test it and I missed at least one thing - new I2C implementation that uses c_malloc instead of malloc here breaks compilation if
#define I2C_MASTER_OLD_VERSION is commented out.

@nwf
Copy link
Member

nwf commented Jul 23, 2019

@jmattsson: Any chance we could make our lives a little easier -- at least in terms of the error reports we get -- by ripping out anything that won't link from libc.a?

@TerryE
Copy link
Collaborator

TerryE commented Jul 23, 2019

I just had a chance to test it and I missed at least one thing - new I2C implementation

Actually two, but thanks v much @galjonsfigur. See #2845 fixes this

vsky279 pushed a commit to vsky279/nodemcu-firmware that referenced this pull request Jul 24, 2019
The PR removed the bulk of non-newlib headers from the NodeMCU source base.  
app/libc has now been cut down to the bare minimum overrides to shadow the 
corresponding functions in the SDK's libc. The old c_xyz.h headerfiles have been 
nuked in favour of the standard <xyz.h> headers, with a few exceptions over in 
sdk-overrides. Again, shipping a libc.a without headers is a terrible thing to do. We're 
still living on a prayer that libc was configured the same was as a default-configured
xtensa gcc toolchain assumes it is. That part I cannot do anything about, unfortunately, 
but it's no worse than it has been before.

This enables our source files to compile successfully using the standard header files, 
and use the typical malloc()/calloc()/realloc()/free(), the strwhatever()s and 
memwhatever()s. These end up, through macro and linker magic, mapped to the 
appropriate SDK or ROM functions.
@marcelstoer marcelstoer mentioned this pull request Jul 25, 2019
4 tasks
@devsaurus devsaurus mentioned this pull request Jul 25, 2019
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants