[libc] Pass 2: Add system calls to OpenWatcom C library for ELKS #1912

Merged (14 commits) on Jun 4, 2024


ghaerr
Owner

@ghaerr ghaerr commented Jun 3, 2024

Fixes #include <stddef.h> problem discussed in #1911.

Removes C library dependencies on external <stddef.h>, <stdint.h> and <sys/types.h> headers, which ended up dragging in unwanted stuff from OpenWatcom C.

Locally defines size_t, ssize_t, int64_t, intptr_t and uintptr_t instead of using <stddef.h> or <stdint.h> for small, medium, compact and large models for ia16-elf-gcc and OpenWatcom.

Adds first round of ELKS system calls to ELKS OpenWatcom C library: open, close, read, write, lseek, brk, sbrk and ioctl which allow stdio functions to work.

Ported malloc/free functions using internal __near data model for large/compact model compilations.
Temporarily redeclared char **environ to be char __near * __far *environ to allow direct access to the environment array in the data segment without rewriting it. main's use of char **argv must also be declared similarly for the time being, to allow access to the argv array without rewriting it in large/compact models.

Added make image and make images upper level Makefile targets to allow rebuilding single or all images without rebuilding C library or applications; useful for quickly adding files to image root directory for testing.

Fixed up intended error return when size > 32K bytes passed to malloc or setvbuf.

Add fix to possibly require application code segment fixups to os2toelks.c when converting relocation entries. Maybe not needed but seeing some problems when linking large programs like basic.
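
As a rough illustration of the locally defined types mentioned in this description, the sketch below picks pointer-sized integer types per compiler and memory model without including <stddef.h> or <stdint.h>. The demo_ names are hypothetical and this is not the actual ELKS header; it assumes a 16-bit target, OpenWatcom's predefined __COMPACT__ and __LARGE__ macros for the far-data models, and GCC's builtin __SIZE_TYPE__, __INTPTR_TYPE__ and __UINTPTR_TYPE__ macros.

```c
/* Sketch only: the demo_ names are hypothetical, not ELKS libc identifiers. */
#if defined(__WATCOMC__) && (defined(__COMPACT__) || defined(__LARGE__))
/* Watcom far-data models: data pointers are 32-bit segment:offset pairs. */
typedef unsigned int    demo_size_t;     /* a single object still fits in 64K */
typedef long            demo_intptr_t;
typedef unsigned long   demo_uintptr_t;
#elif defined(__WATCOMC__)
/* Watcom small/medium models: data pointers are 16-bit near offsets. */
typedef unsigned int    demo_size_t;
typedef int             demo_intptr_t;
typedef unsigned int    demo_uintptr_t;
#else
/* GCC-family compilers (assumed to include ia16-elf-gcc) provide builtins. */
typedef __SIZE_TYPE__    demo_size_t;
typedef __INTPTR_TYPE__  demo_intptr_t;
typedef __UINTPTR_TYPE__ demo_uintptr_t;
#endif

/* Compile-time check: the array size is -1 if a data pointer does not fit. */
typedef char demo_intptr_holds_pointer[
    sizeof(demo_intptr_t) >= sizeof(void *) ? 1 : -1];
```

The point of the compile-time check is that a wrong typedef for a given model fails the build instead of silently truncating pointers.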

@toncho11
Contributor

toncho11 commented Jun 3, 2024

So we are heading towards a complete ELKS distro based on the Watcom compiler, but this time with more features?

Build bigger programs? I remember the 64 KB limit was quite a restriction for porting even simple (in terms of functionality) programs such as editors.

@ghaerr
Owner Author

ghaerr commented Jun 3, 2024

It's probably not worth trying to get all the existing ELKS applications recompiled and distributed with the Watcom compiler, as that would then require two major toolchain build and installs in order to build ELKS.

Also, the kernel is being built with ia16-elf-gcc and has many associated ASM source (both internal to .c files as well as external .S) files that would need translating from 8086 AT&T syntax to Intel syntax. You gotta love two major incompatible formats for the x86 instruction set!

Yes, the goal is to introduce the enhanced feature set of the Watcom compiler to ELKS application programmers. This would include large code and data models which should allow for much larger programs to be used on ELKS. The current work is just getting "regular"-sized (small programs compiled using large model) programs runnable on ELKS using Watcom. In order to support programs larger than 128K code or 64K data, the ELKS a.out executable format and kernel loader will have to be enhanced. I'm thinking about how that might be accomplished now.

Eventually, the Watcom C compiler itself, along with the C library and linker tools, could possibly run directly on ELKS, which would be a major accomplishment, as then ELKS could conceivably become self-supporting with its own compiler, C library and kernel.

@tyama501
Contributor

tyama501 commented Jun 3, 2024

Hello @ghaerr ,

Thank you for the great works.
Just for confirmation but
In the OpenWatcom License Agreement,
it says the original codes of SYBASE is limited to internal research and development and/or Personal Use, but our understanding is that we are just including it without modification, so it is OK for it to be here in this repository with GPL, right? (Sorry if my English is poor and I am not understanding correctly.)

2.1 You may use, reproduce, display, perform, modify and distribute Original
Code, with or without Modifications, solely for Your internal research and
development and/or Personal Use, provided that in each instance:

@tyama501
Contributor

tyama501 commented Jun 3, 2024

At least we already have a comment in our license file,

Software included beside the ELKS kernel (i.e. elkscmd) may be under different licensing terms

(FreeDOS is also compiling with OpenWatcom so it would be ok just using it, but ...)

@Vutshi

Vutshi commented Jun 3, 2024

Hi @ghaerr,
These are very exciting developments. ELKS is really leveling up, getting more and more like a modern Linux platform. We’ve even got our very own variant of the gcc/clang dichotomy :)

I am curious whether the new compiler comes with a performance profiling tool. Or did I miss it, and we already had one with ia16-gcc?

Best

@ghaerr
Owner Author

ghaerr commented Jun 3, 2024

Hello @tyama501,

it says the original codes of SYBASE is limited to internal research and development and/or Personal Use

The OpenWatcom source license.txt agreement seems to show that "Personal Use" is excepted from the Deployment grant Section 1.4 of the license agreement:

1.4 "Deploy" means to use, sublicense or distribute Covered Code other than
for Your internal research and development (R&D) and/or Personal Use, and
includes without limitation, any and all internal use or distribution of
Covered Code within Your business or organization except for R&D use and/or
Personal Use, as well as direct or indirect sublicensing or distribution of
Covered Code by You to any third party in any form or manner.

This section basically says that Personal Deployment is excepted from the further requirements of any commercial (or non-Personal) use/deployment of the software.

Since ELKS users may or may not be using OpenWatcom and ELKS for Personal Use, I have followed the requirements for commercial deployment which is in Section 2.1:

2.1 You may use, reproduce, display, perform, modify and distribute Original
Code, with or without Modifications, solely for Your internal research and
development and/or Personal Use, provided that in each instance:

(a) You must retain and reproduce in all copies of Original Code the copyright
and other proprietary notices and disclaimers of Sybase as they appear in the
Original Code, and keep intact all notices in the Original Code that refer to
this License; and

(b) You must retain and reproduce a copy of this License with every copy of
Source Code of Covered Code and documentation You distribute, and You may not
offer or impose any terms on such Source Code that alter or restrict this
License or the recipients' rights hereunder, except as permitted under
Section 6.
...

This seems to be saying that commercial use of software built from OpenWatcom source distributions must retain the original copyright notice(s) in the source code, and also reproduce a copy of the license with every copy of the source code distributed.

We are currently meeting Section 2.1(a) (keeping license notices in any original Watcom source files), but aren't reproducing an original copy of the OpenWatcom license.txt in ELKS. I will copy it over to libc/watcom/license.txt to meet this obligation, and thus should be meeting the license obligations.

At least we already have a comment in our license file,

The ELKS applications have a statement to that effect in elkscmd/LICENSE. Is there another file that should be modified to show that the ELKS C Library may include code that would be licensed under OpenWatcom license if used?

@ghaerr
Owner Author

ghaerr commented Jun 3, 2024

Hello @Vutshi,

We’ve even got our very own variant of the gcc/clang dichotomy :)

I hadn't thought of it quite like that, but after all your messing around with Intel vs AT&T syntax, and my having to switch to Intel syntax for the OpenWatcom wasm assembler, I think you're right!! The continuing battle between syntaxes goes on!

I am curious whether the new compiler comes with a performance profiling tool. Or did I miss it, and we already had one with ia16-gcc?

I think that it does, as there are some prof directories and stuff around somewhere in the massive source base which is OpenWatcom. I haven't even had any time to jump into that yet. However, I am extremely impressed with the code generation style change that OpenWatcom brings to the table. For starters, it defaults to a register-passing calling convention (which ia16-elf-gcc also has), which produces some very neat code sequences: instead of manipulating the stack all the time, lots of much faster (and smaller) register moves are used. One can also direct the assignment of registers directly to function arguments, which greatly simplifies the problem of executing system calls, which I happen to be working on now.

Another cool thing about the code OpenWatcom produces is that ES and DS REGISTERS ARE SCRATCH between function calls. That means that, especially in large/compact models, the compiler emits code to use DS and ES for quick and easy far pointer manipulation, and then uses SS: overrides for quick access to the application data segment. A bit hard to describe without showing lots of code, but quite a bit different than ia16-elf-gcc output. It seems that ia16-elf-gcc code generation was designed originally around small data model, while OpenWatcom has a much more complex register-allocation engine internally. Frankly, I hadn't ever considered writing ASM in the style of some of the code that is being output by OWC.

I'm interested in profiling too. If you can learn more about it, please post what you find.

@tyama501
Contributor

tyama501 commented Jun 3, 2024

Hello @ghaerr ,

Thank you for the clarification!

@Vutshi

Vutshi commented Jun 3, 2024

I'm interested in profiling too. If you can learn more about it, please post what you find.

I will if I eventually do something in that direction. Currently, I am obsessed with statically recompiling an old DOS game for ELKS and macOS )

@ghaerr
Owner Author

ghaerr commented Jun 4, 2024

Initial work for compilation and user testing of ELKS programs is now completed.

This PR adds two scripts to more easily compile and link programs with OpenWatcom C for testing under ELKS. These are ewcc and ewlink - very simple scripts to compile a single .c file for each file in a project, and then link them into a final ELKS executable.

To test, set the WATCOM= and WATDIR= environment variables in a file, say wcenv.sh:

#!/usr/bin/env bash

# Set up Watcom build environment

export WATDIR=/Users/greg/net/open-watcom-v2
case "$(uname -s)" in
    Darwin) export WATCOM=$WATDIR/rel/bino64 ;;   # for macOS
    *)      export WATCOM=$WATDIR/rel/binl ;;     # for Linux
esac

add_path () {
	if [[ ":$PATH:" != *":$1:"* ]]; then
		export PATH="$1:$PATH"
	fi
}

add_path "$WATCOM"

echo PATH set to $PATH

I have got the ELKS BASIC interpreter running (in compact model only for now). To compile and test yourself:

. ./wcenv.sh
cd $TOPDIR/libc
make -f watcom.mk clean
make -f watcom.mk    # produces libc/libc.lib, required for linking
cd $TOPDIR/elkscmd/basic
ewcc basic.c
ewcc host.c
ewcc host-stubs.c
ewlink basic.obj host.obj host-stubs.obj
cp basic $TOPDIR/elkscmd/rootfs_template/root
... then, in ELKS:
# ./basic

@toncho11
Contributor

toncho11 commented Jun 4, 2024

That's quite a feat!

Maybe explaining the:

are suitable for a wikipage.

@tyama501
Contributor

tyama501 commented Jun 4, 2024

Hello @ghaerr ,

Wow.
Maybe the sjis to utf8 converter is good to test for me,
since the code space is less than 64KB, it does not use asm nor graphic,
and it uses far memory.

https://github.com/tyama501/sjis-to-utf8_elks/blob/main/sjis_to_utf8.c

By using the compact model,
can we replace the fmemalloc with regular malloc and
delete the segment incrementation like the following?
data = (uint8_t __far *)((uint32_t)data + 0x10000000); // increment segment
data = (uint8_t __far *)((uint32_t)data & 0xFFFF0000); // clear offset

I know the kernel loader need to be enhanced to use data segment > 64KB,
but these also need to wait the enhancement?

Thank you!

@ghaerr
Owner Author

ghaerr commented Jun 4, 2024

Hello @tyama501,

Maybe the sjis to utf8 converter is good to test for me,
since the code space is less than 64KB, it does not use asm nor graphic,
and it uses far memory.

Yes, having little code and lots of data is a good fit for compact model. However, far data (i.e. using compact or large model) brings about its own complexities. One of them is in malloc. See below.

By using the compact model,can we replace the fmemalloc with regular malloc

Unfortunately, at the moment, no. We're still under heavy development, so this may change. But for now, the ELKS C library malloc only allocates from the application heap data segment, not far memory. So fmemalloc must continue to be used to allocate memory outside the process boundary. The difference is that that memory can be accessed directly using char * pointers since all C pointers are far.

In the future, I hope to rethink this a bit, but it's complicated. Applications will still want the ability to allocate small amounts of memory from the heap, while also allocating memory outside the process. The malloc API only takes a single parameter (a 16-bit int, actually limited to 32K-1 bytes) and so isn't really up to the job.

Thus, compiling a 16-bit program in large model isn't just like flat 32-bit address space where one doesn't have to worry about memory.
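
To make the 32K-1 limit concrete, here is a minimal sketch of a guard that fails an oversized request with ENOMEM instead of passing it on. demo_checked_malloc and DEMO_MAX_ALLOC are hypothetical names for illustration, and the host malloc stands in for the near-heap allocator; this is not the ELKS libc code.

```c
#include <errno.h>
#include <stdlib.h>

/* Assumed cap taken from the discussion: requests must stay below 32K. */
#define DEMO_MAX_ALLOC 32767UL

/* Hypothetical wrapper: reject oversized requests before they reach malloc. */
void *demo_checked_malloc(unsigned long size)
{
    if (size > DEMO_MAX_ALLOC) {
        errno = ENOMEM;     /* fail loudly rather than allocate a wrong size */
        return NULL;
    }
    return malloc((size_t)size);
}
```

Taking the size as an unsigned long lets the caller see the failure even for requests that would not survive conversion to a 16-bit parameter.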

data = (uint8_t __far *)((uint32_t)data + 0x10000000); // increment segment

BTW, OpenWatcom supports a "huge" memory model which will do that pointer arithmetic past 64K for you. I haven't even looked at it yet. Someone would need to read up on it in the Watcom User's Manual.
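
The segment arithmetic in the quoted line can be modeled on any host by treating a far pointer as a 32-bit value with the segment in the high word and the offset in the low word. The sketch below is only such a model (the demo_ helpers are hypothetical); it shows that adding 0x10000000 advances the segment by 0x1000 paragraphs, which is exactly 64K of linear address space.

```c
#include <stdint.h>

/* Pack a real-mode segment:offset pair as a 32-bit far pointer value. */
uint32_t demo_far_make(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 16) | off;
}

/* Linear address of a real-mode far pointer: segment * 16 + offset. */
uint32_t demo_far_linear(uint32_t fp)
{
    return ((fp >> 16) << 4) + (fp & 0xFFFFUL);
}

/* Same two steps as the quoted code: bump the segment by 0x1000
 * paragraphs (64K), then clear the offset. */
uint32_t demo_far_next_64k(uint32_t fp)
{
    fp += 0x10000000UL;     /* increment segment */
    fp &= 0xFFFF0000UL;     /* clear offset */
    return fp;
}
```

For instance, starting from 2000:1234 the result is 3000:0000, which lies 0x10000 bytes above 2000:0000.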

Having said all this, it could still be worthwhile to try compiling your program with OWC. Expect issues. For instance, I have ported only about 8 of the 35 system calls required; fmemalloc has not been ported. The good news is they are straightforward and you can add them yourself (or I will) in an easy PR.

I know the kernel loader need to be enhanced to use data segment > 64KB,
but these also need to wait the enhancement?

Right now, both large and compact model programs should run with OWC as long as their code and data segments are < 64K, without waiting for the kernel enhancement. I am hoping to see you and others try out the compiler and we work through the issues before a kernel enhancement, so that we know more of what is needed.

I plan to commit this PR today so you can try playing with the compiler.

Thank you!

@ghaerr
Owner Author

ghaerr commented Jun 4, 2024

Hello @toncho11,

Maybe explaining the: ...
are suitable for a wikipage.

Good idea. We're still in heavy development mode and the scripts will likely change. Agreed, more information on how to build OpenWatcom from source is needed ASAP. I posted that information somewhere but will have to look it up. Perhaps you might try pulling down the repo and building it; I've only done this on macOS.

Thank you!

@ghaerr
Owner Author

ghaerr commented Jun 4, 2024

Hello @tyama501,

(Cross-posting here from #1505 (reply in thread).)

Actually, it uses code space > 64KB, so it needs at least medium model.
(It still uses about 100KB code space after deleting features)
I will focus on debugging with current medium model until we get the large model support.

Large model should be quite easy to make happen, right now. The problem is that I haven't yet added support for both compact and large models simultaneously. Since the Watcom libc.lib is built in whole using make -f watcom.mk and the scripts ewcc and ewlink are used, there are only three files that need to be changed to support large model:

In libc/watcom.inc:

CARCH =\
    -bos2                           \
    -mcmodel=c                      \ <---- change to -mcmodel=l

In elks/tools/objtools/ewcc:

CCFLAGS="\
    -bos2                           \
    -mcmodel=c                      \ <--- same change here to -mcmodel=l

In elks/tools/objtools/ewlink make the same change as above.

I suppose it might be better to change my scripts to large model immediately so that we have the most benefit. I had started with compact because it lessened the issues involved in the os2toelks executable format conversion but I think they are solved now.

@tyama501
Contributor

tyama501 commented Jun 4, 2024

Thank you.
Great to hear we can compile with the large model! But ELKSmoria uses > 64KB, so I will wait for the kernel loader since it cannot load yet :)
I think the OpenWatcom compiler installer for Linux has an option for supporting the huge model, and the default is no.
What is the huge model? (I will ask Copilot later...)

And can we still use __far prefix?
uint8_t __far * memory = (unsigned char __far *)fmemalloc(size);

Or does this mean we don't need to use it?

The difference is that that memory can be accessed directly using char * pointers since all C pointers are far.

At least we still need to do something like this unless we use the huge model, right?

data = (uint8_t __far *)((uint32_t)data + 0x10000000); // increment segment

@tyama501 tyama501 mentioned this pull request Jun 7, 2024
@tyama501
Contributor

tyama501 commented Jun 7, 2024

Hello @ghaerr ,

Before modifying for the fmemalloc,
I run ewcc sjis_to_utf8.c.

image

It seems that the compiler is complaining about
int main(int argc, char **argv).

Can char **argv not be used with Watcom?

@ghaerr
Owner Author

ghaerr commented Jun 7, 2024

It seems that the compiler is complaining about
int main(int argc, char **argv).

Long story short, declare main as follows:

int main(int argc, char __wcnear * __wcfar *argv);

@tyama501
Contributor

tyama501 commented Jun 7, 2024

Thank you. Now I could get the result expected.
(fmemalloc is undefined)
image

@ghaerr
Owner Author

ghaerr commented Jun 7, 2024

Looking good. Now add the _fmalloc system call as previously discussed and... good luck! (Use wdis fmalloc.obj to check that system call code is what is wanted).

@tyama501
Contributor

tyama501 commented Jun 8, 2024

Hello @ghaerr ,

fmemalloc looks good.
fmemalloc_

I could compile and link the sjis_to_utf8.
It got slightly bigger than gcc-ia16.
60KB --> 63.7KB (TEXT increased 3KB. Overhead for the large model?)
sjis_to_utf8_wcc

I executed on ELKS and it got file memory too large.
It seems that HEAP 0.
sjisutf8_memory

Previously HEAP was 65535 with gcc-ia16 and medium model.
How should I configure the HEAP?

Thank you.

@ghaerr
Owner Author

ghaerr commented Jun 8, 2024

Hello @tyama501,

fmemalloc looks good.

Nice! System calls are pretty tightly coded with OWC and register calling sequences, aren't they :)

TEXT increased 3KB. Overhead for the large model?

Yes, probably, but hard to tell at the moment. You might try re-compiling everything in compact model to see whether that makes much difference. (If EMoria already requires medium model then the ELKS kernel loader probably won't yet work with OWC-compiled programs because I haven't written any special code to convert OS/2 binaries with multiple code segments to ELKS a.out headers which have the 2nd code segment in .ftext (far text). So far, only single-code and single-data segments will be loadable by ELKS). The other reason I haven't written the conversion for multiple code segment programs is I haven't actually seen a large model OS/2 binary that is > 64k code, so not sure yet what will have to be done.

I executed on ELKS and it got file memory too large.
It seems that HEAP 0.

Yes, heap (and stack) are currently always being set to 0 (meaning the default 4K heap and default 4K stack) by the OS/2 binary converter os2toelks. We need an option to set the heap or stack from the command line for it.

Previously HEAP was 65535 with gcc-ia16 and medium model.
How should I configure the HEAP?

For now, you'll have to use chmem -h 0xffff during ELKS runtime in order to change the heap or stack size. It seems that with heap=0 (meaning 4K) you're already out of heap space, so obviously things are quite tight. You can also try reducing the stack size a bit (using -s) to see if you can get it to load, with the restrictions mentioned above.

You're welcome to submit a PR for fmemalloc.c :)

Thank you!

@tyama501
Contributor

tyama501 commented Jun 9, 2024

Hello @ghaerr ,

Nice! System calls are pretty tightly coded with OWC and register calling sequences, aren't they :)

Yes, it seems that it uses ss and bp to get the pseg of fmemalloc.

Hmmm.
It seems that if I set HEAP to 65535 it gets sys_brk fail, and
if I reduce STACK to 2047 then it hangs.

image

I will try compact model.

Thank you.

@tyama501
Contributor

tyama501 commented Jun 9, 2024

It seems that compact model does not change size.
image

Let me try small model. Previously it was small model, not medium model.

@tyama501
Contributor

tyama501 commented Jun 9, 2024

Still bigger than gcc-ia16 but it reduced to 61998Bytes from 63530Bytes with the small model.

image

@tyama501
Contributor

tyama501 commented Jun 9, 2024

The small model also seems not to be working properly.
image

I also tried stack setting 0x1FFF and 0x7FF but it gets error or wrong exit.
(I assume OpenWatcom uses more stack)

Well, anyway I will make PR for the fmemalloc.
Let me know if something is wrong.

Thank you.

@ghaerr
Owner Author

ghaerr commented Jun 9, 2024

Hello @tyama501,

I'm a bit confused about which versions of your program were originally small or medium model. But looking at the screenshots it seems that the program is only a small amount of code, but needs large data. Thus, the original ia16-elf-gcc program should probably be small model, and the OWC program should probably be compact model (near code, far data). This keeps our debugging to a minimum.

With regards to stack and heap usage, both should be set to the previous values used in the ia16 version. However, it seems that what you're saying is that the ia16 vs OWC versions have different data segment sizes (use ia16-elf-size to show, not ls). This should generally not be the case, as data sizes should not change between compilers and generally we are running the same C library code, compiled by either compiler.

That said, it is not a good idea to arbitrarily change the stack size of a program without manually/visually verifying the stack requirements by inspecting the auto (stack) variables declared within its various functions. Stack overwrite problems on an 8086 can produce hard-to-find bugs.

With regards to the heap: in both small and compact/large models, as mentioned before, the malloc function allocates only from the near heap (DS-addressable data segment). Since your program is using fmemalloc to get far memory, we should consider not calling malloc at all. This then leaves malloc called only by the C library, which occurs usually using stdio functions like fopen (not shell-directed file I/O), which allocate a struct FILE and a 1K buffer.

Thus, the heap can be considerably lower than 65535 which just suggests max heap. Perhaps set the heap to 2-3k in your original ia16 program to ensure it works, then set the same in OWC.

The idea here is to get to a program that has a small enough static data segment so that it can load and operate in ia16 and OWC. If the program data has many pointers, then those will move from 2 to 4 bytes and could easily become too large. In those cases, the program data segment needs to be made smaller, even in large model.
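
The budget being described is simple arithmetic: static data plus heap plus stack has to stay within one 64K segment. A minimal sketch, with hypothetical demo_ helper names and ignoring the extra space taken by the environment and argv array:

```c
/* One 8086 segment addresses at most 64K bytes. */
#define DEMO_SEG_LIMIT 65536UL

/* Total bytes the data segment must hold (simplified model). */
unsigned long demo_data_seg_total(unsigned long static_bytes,
                                  unsigned long heap, unsigned long stack)
{
    return static_bytes + heap + stack;
}

/* Returns 1 if the program's data segment fits in 64K, else 0. */
int demo_data_seg_fits(unsigned long static_bytes,
                       unsigned long heap, unsigned long stack)
{
    return demo_data_seg_total(static_bytes, heap, stack) <= DEMO_SEG_LIMIT;
}

/* Extra bytes needed when n stored data pointers grow from 2 to 4 bytes. */
unsigned long demo_pointer_growth(unsigned long n)
{
    return n * 2UL;
}
```

For example, 56000 bytes of static data with a 4K heap and 4K stack totals 64192 and fits, while 58000 bytes with the same settings totals 66192 and does not.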

Finally, I have identified a likely bug in _fmemalloc which I will address in #1917.

Thank you!

@tyama501
Contributor

tyama501 commented Jun 9, 2024

Thank you @ghaerr ,

I will read your explanation carefully later,
but the original code does not use malloc,
it only uses fmemalloc and it has been compiled with the small model.

It uses a lot of static memory because it has all sjis utf8 table inside.

I guess it didn't work with the OWC because of the fmemalloc bug.

@ghaerr
Owner Author

ghaerr commented Jun 9, 2024

the original code does not use malloc,
it only uses fmemalloc and it has been compiled with the small model.

Ok, that's good to know. If it uses fopen and the like then we may still need heap, otherwise not. You might test this with ia16 to determine whether a small heap value (1-2K) works.

It uses a lot of static memory because it has all sjis utf8 table inside.

Yes, that's what I thought. If there are few pointers, then the data segment size shown by ia16-elf-size should not increase much at all with OWC.

I guess it didn't work with the OWC because of the fmemalloc bug.

That could be, but there are also many other possible reasons, as there has been very little testing and lots of code written for the OWC compilation. So we will have to be careful to understand exactly what might be going on.

Another thought would be to test fmemalloc by calling it several times in a sample program and just printf the result, then sleep(10) while you look on another console running meminfo to check that the memory segment is actually being returned correctly; things like that. The fewer system and library calls, the better for initial testing of new programs or system calls.

Now that I think more, perhaps use small model with OWC at the start as well, since this was also used by ia16 and worked. That will eliminate many other issues while we get things sorted out.

@tyama501
Contributor

tyama501 commented Jun 9, 2024

〉You might test this with ia16 to determine whether a small heap value (1-2K) works.

One question.
Does this mean we don't use heap for the far memory?

It uses fopen.
I think it needed some heap but I will try later.

@ghaerr
Owner Author

ghaerr commented Jun 9, 2024

Does this mean we don't use heap for the far memory?

No - the fmemalloc system call allocates memory from main memory, outside your process code and data segments. It returns the 16-bit segment portion of the memory allocated, which is then used by a __far pointer to directly access up to 64k of it at a time.
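
Because the call hands back a segment value, allocations of this kind are naturally sized in 16-byte paragraphs. A minimal sketch of the rounding, assuming a byte count has to be converted to paragraphs before the request is made (demo_bytes_to_paragraphs is a hypothetical helper, not part of the ELKS libc):

```c
/* One 8086 paragraph is 16 bytes; a segment value counts paragraphs. */
#define DEMO_PARAGRAPH 16UL

/* Round a byte count up to a whole number of paragraphs. */
unsigned long demo_bytes_to_paragraphs(unsigned long bytes)
{
    return (bytes + DEMO_PARAGRAPH - 1) / DEMO_PARAGRAPH;
}
```

A full 64K window, the most a single segment value can address, is 4096 paragraphs.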

It uses fopen.
I think it needed some heap

You will see that libc/stdio/__fopen.c calls malloc to allocate memory for the FILE structure and in turn for the buffered input/output using FILE->bufstart. All that uses near heap, even in large model, as malloc always allocates from near heap as previously discussed.

@ghaerr
Owner Author

ghaerr commented Jun 17, 2024

Hello @tyama501,

I did more testing using the ELKS C stdio library compiled with Watcom for large model by compiling and linking elkscmd/misc_utils/hd.c and it seems to work well (large model). So I am not sure what the issue(s) might be with your SJIS to UTF-8 converter, other than perhaps we have a problem with large code segments, since hd.c is pretty small. But stdio seems to be working OK in large model from my testing. I am using the standard 4K stack and 4K heap.

Thank you.

@tyama501
Contributor

Hello @ghaerr ,

Thank you.
I need a little more time to investigate,
but it seems that
it needs more heap than 1024 since it uses files for input and output,
and more stack than 4096,
so that TOTDATA exceeds 65536 in the large model.
image

@tyama501
Contributor

(I don't know why it needs so much STACK)

@ghaerr
Owner Author

ghaerr commented Jun 18, 2024

I would think 2K heap will be needed at a minimum, certainly enough to remove the sys_brk failed messages. For stack, when you use 2K heap and, say 1-2K stack, will the program run? What errors are produced?

@ghaerr
Owner Author

ghaerr commented Jun 18, 2024

BTW, #1920 will require that you rewrite main using the standard int main(int argc, char **argv) declaration. This was initially required as the argv array passed by the kernel used near pointers which won't work with large model's far pointer default size. The string array is now rewritten by the C startup code so a non-standard declaration is no longer necessary.
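
Conceptually, the rewrite widens each 16-bit near pointer into a segment:offset pair that uses the program's data segment. The sketch below models that on a host with plain integers; demo_widen_argv is a hypothetical name and this is not the actual ELKS startup code.

```c
#include <stdint.h>

/* Model of widening a near argv array into far pointers.
 * near_argv: argc 16-bit offsets into the data segment
 * data_seg:  the program's data segment value
 * far_argv:  room for argc + 1 packed segment:offset values */
void demo_widen_argv(const uint16_t *near_argv, int argc,
                     uint16_t data_seg, uint32_t *far_argv)
{
    int i;

    for (i = 0; i < argc; i++)
        far_argv[i] = ((uint32_t)data_seg << 16) | near_argv[i];
    far_argv[argc] = 0;     /* argv[argc] stays a null pointer */
}
```

The strings themselves do not move in this model; only the pointer array grows from 2 to 4 bytes per entry.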

@tyama501
Contributor

Well, it says sys_brk fail with heap 2k and stack 2k.

image

@tyama501
Contributor

tyama501 commented Jun 19, 2024

It seems that the file is created correctly. It might be some problem near the end.
image

@ghaerr
Owner Author

ghaerr commented Jun 19, 2024

It seems that the file is created correctly. It might be some problem near the end.

Interesting, although the sys_brk fail message could be coming at any time. Perhaps you should try running without output redirected so that we can more accurately determine when this is failing, e.g. ./sjisutfl sjis.txt.

Another idea is to insert write statements for debugging like write(2, "1", 1).
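
A marker helper along these lines keeps the debugging output independent of stdio buffering and of the heap, which matters when the heap itself is the suspect. demo_mark is a hypothetical name; write and file descriptor 2 (stderr) are standard POSIX.

```c
#include <unistd.h>

/* Emit one marker byte on stderr (fd 2) without touching stdio or malloc.
 * Returns the number of bytes written, or -1 on error. */
long demo_mark(char c)
{
    return (long)write(2, &c, 1);
}
```

Calling demo_mark('1'), demo_mark('2') and so on at successive points shows how far the program got before the failure message appears.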

I would have to see the source code to determine how much heap or stack are likely required. Can you post it? This isn't a large program, only a large data segment, right? Can approx 4-8k of data be removed from the program, just for testing purposes, to allow using a 4K heap/4K stack?

I will write a test OWC program using almost all of the data segment (by allocating static buffers) that also uses stdio and see if I can get a failure...

@ghaerr
Owner Author

ghaerr commented Jun 19, 2024

Hello @tyama501,

I tested stdio using a large model program whose data segment was almost completely full (134 bytes short of 65K including 4K stack and 4K heap, as space for environment and argv array are required). I could not get it to fail, so perhaps there is some other problem in your SJIS conversion program, I am not sure yet what it might be. Perhaps you can post source for your program or simplify it such that the brk_fail messages stop?

Source for data.c:

#include <stdio.h>
#include <string.h>

char b[32000];          // < 32k won't add additional segment
char bb[21900];

int main(int argc, char **argv)
{
    FILE *fp;
    int c;

    memset(bb, 0, sizeof(bb));
    fp = fopen(argv[1], "r");
    if (!fp) {
        perror(argv[1]);
        return 1;       /* don't read from a NULL stream */
    }
    while ((c = getc(fp)) != EOF)
        putchar(c);
    fflush(stdout);
    fclose(fp);
    return 0;
}

compiled with:

ewcc data.c
ewlink data.obj

Screenshot:
watacom large

Thank you!

@tyama501
Contributor

tyama501 commented Jun 20, 2024

Hello @ghaerr ,

Thank you for the testing.
I am still using the code in the following repository, with **argv modified to char __wcnear * __wcfar *argv.
I haven't pulled the latest updates of #1920 yet.
sjis.txt is also in the repository.

https://github.com/tyama501/sjis-to-utf8_elks

@toncho11
Contributor

So after compilation usage is:

sjisutf8 sjis.txt

and it will output UTF8 by converting the SJIS format contained in sjis.txt. Output will be to stdout.

@tyama501
Contributor

Right.
