Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add file operation #1068

Merged
merged 1 commit into from
Mar 21, 2022
Merged

add file operation #1068

merged 1 commit into from
Mar 21, 2022

Conversation

anjiahao1
Copy link
Contributor

@anjiahao1 anjiahao1 commented Feb 17, 2022

operate lz4 compressed files as a general files

Signed-off-by: anjiahao anjiahao@xiaomi.com

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 2, 2022

OK, this is pretty good.

I will leave a few comments in the code review, but there is a real chance this gets merged.

lib/lz4file.h Outdated
#include <stdio.h>
#include "lz4frame.h"

typedef void LZ4F_file;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Prefer an incomplete type.
It remains opaque, but it can't be confused with any other pointer type, allowing the compiler to participate in code validation. This is more important than it sounds, as many typical programming errors can be detected this way, saving precious debug time.

In this case, you already have LZ4_readFile_t and LZ4_writeFile_t defined in lz4file.c.
Keep them there, just rename them struct LZ4_XxxFile_s {... };,
and here, in the header : typedef struct LZ4_readFile_s LZ4_readFile_t;.

The downside is that you have now 2 different types, for read and write operations.
I don't see that as a problem, I actually like this distinction.
But if your goal is to mimic <stdio.h>, where the same FILE* object is used for all I/O operations, you may want to refactor your type. A simpler adaptation could be to use a union, and there are many variations around this theme.
I would, personally, keep them separate, because it matches the API's capabilities, which are solely read and write. You even have separated LZ4F_readClose() and LZ4F_writeClose(), so not even this operation is shared (while <stdio.h> has a single fclose()), and there are no additional capability planned. So really no gain in increasing confusion with a common pointer object between read and write operations.


#ifndef LZ4FILE_H
#define LZ4FILE_H

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API which are not yet considered "stable", as in "guaranteed to be present with exact same ABI in future versions of the library", are guarded by the macro LZ4F_STATIC_LINKING_ONLY.

I would recommend doing the same here, because it's a completely new API, and it still has to receive feedback from users. Besides, increasing the surface of the stable public API by anything is a major operation, requiring the update of version number (1.9 => 1.10).

In the future, when this new API feels "ready", the guard will be removed, the version increased, and it will become part of the public API.

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 2, 2022

I would recommend adding a test, somewhere, that uses this new API.
Typically, a new examples/, with comments to help new users, which would be both compiled and run when invoking make test.

lib/lz4file.h Outdated
* `lz4f` will set a lz4file handle.
* `fp` must be the return value of the lz4 file opened by fopen.
*/
LZ4FLIB_API LZ4F_errorCode_t LZ4F_readOpen(LZ4F_file** lz4f, FILE* fp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When guard LZ4_STATIC_LINKING_ONLY is added, the declaration prefix becomes LZ4FLIB_STATIC_API.

@t-mat
Copy link
Contributor

t-mat commented Mar 2, 2022

How can you invoke LZ4F_writeClose() to close LZ4F_file context safely?

Without error, LZ4F_writeClose() may work fine.
But if there're some errors, it won't free() any internal buffers. And I think this behavior may cause problem.

For example, fwrite() may fail. But API users are not able to distinct they must call LZ4F_writeClose() again or not.

lz4/lib/lz4file.c

Lines 288 to 310 in 68f9c6e

LZ4F_errorCode_t LZ4F_writeClose(LZ4F_file* lz4f)
{
LZ4_writeFile_t* lz4fWrite = (LZ4_writeFile_t*)lz4f;
size_t ret;
if (lz4fWrite == NULL)
return -LZ4F_ERROR_GENERIC;
ret = LZ4F_compressEnd(lz4fWrite->cctxPtr,
lz4fWrite->dstBuf, lz4fWrite->dstBufMaxSize,
NULL);
if (LZ4F_isError(ret))
return ret;
if (ret != fwrite(lz4fWrite->dstBuf, 1, ret, lz4fWrite->fp)) {
return -LZ4F_ERROR_GENERIC;
}
LZ4F_freeCompressionContext(lz4fWrite->cctxPtr);
free(lz4fWrite->dstBuf);
free(lz4fWrite);
return LZ4F_OK_NoError;
}

Since intention of new API set is simplification, perhaps LZ4F_writeClose() should always free all contexts and memories even it encounters any error.

I mean

  • It may try to call LZ4F_compressEnd().
  • It may try to call fwrite().
  • It always call LZ4F_freeCompressionContext().
  • It always call free().
  • It returns LZ4F_ERROR_* when something wrong.
  • It returns LZ4F_OK_NoError if everything is okay.

@anjiahao1
Copy link
Contributor Author

Thank you for your suggestions, I will modify the code later.

@anjiahao1
Copy link
Contributor Author

some ci hava problem

@t-mat
Copy link
Contributor

t-mat commented Mar 3, 2022

Since all CI errors are at setup or apt-get, it seems change set itself is fine.

@Cyan4973 could you restart GH-Actions again?
https://docs.github.com/en/actions/managing-workflow-runs/re-running-workflows-and-jobs

If Cyann is busy, @anjiahao1 may be able to try empty push to invoke CI.
But it'll add empty commit history to the repo.
https://stackoverflow.com/a/56684569

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 3, 2022

Tests are restarted

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 3, 2022

What about adding an example ? And use it as a form of small test ?

@anjiahao1
Copy link
Contributor Author

What about adding an example ? And use it as a form of small test ?

ok i'll add tests and examples this weekend 👌

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 5, 2022

Note: please avoid adding the produced binary examples/fileCompress to the repository.

Otherwise, the rest of the PR looks in pretty good shape.

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 5, 2022

OK, I believe the code is in good shape.

The last issue I can think of is a bit complex, and I probably will have to look into it myself.
Namely, adding lz4file to the scope of liblz4 adds a new dependency: <stdio.h>.
And yes, there are many situations where this is not a problem.
The question is more : are there situations where this could be a problem ?
And even if there are, it's not necessarily a mortal blow. There might be acceptable mitigations.

Anyway, I believe the code itself is ready to merge, the question is more :
should it be merged into dev directly, or via a feature branch, where the last details could be harnessed before joining dev.

@anjiahao1
Copy link
Contributor Author

anjiahao1 commented Mar 6, 2022

ok,merge into dev better,Thank you 👍

@t-mat
Copy link
Contributor

t-mat commented Mar 6, 2022

As for <stdio.h>, I think embedded platforms or boot loaders are tend to avoid it. Since there's no OS, file system, etc.
But I also believe they should use lz4.h and lz4.c as a source code form. Because they don't use dynamic library.

So, I think it's okay to state that liblz4 is relatively higher level library which requrires standatd C compatible library.

@Cyan4973
Copy link
Member

Cyan4973 commented Mar 6, 2022

As for <stdio.h>, I think embedded platforms or boot loaders are tend to avoid it. Since there's no OS, file system, etc. But I also believe they should use lz4.h and lz4.c as a source code form. Because they don't use dynamic library.

So, I think it's okay to state that liblz4 is relatively higher level library which requrires standatd C compatible library.

I fully agree.
Integration at source code level can easily avoid integrating lz4file.
And integration at dynamic library level presumes a platform with standard C runtime.

I'm slightly more concerned by integration at static library level.
Say, our default recipe bundles all of lib into liblz4.a.
Therefore, it includes dependency on <stdio.h>.
But if the user then never invokes any function from lz4file,
is he impacted by the <stdio.h> dependency ?
Or no, because none of the symbols he uses depends on <stdio.h> ?

Btw, I'm wondering how I could check that.
Pb is, all my test platforms have a C runtime, so none is missing <stdio.h>.

@t-mat
Copy link
Contributor

t-mat commented Mar 7, 2022

Introducing new compile-time feature macro LZ4_NO_LZ4FILE or LZ4_FREESTANDING would be possible solution. Obviously it's not perfect though.

But if the user then never invokes any function from lz4file,
is he impacted by the <stdio.h> dependency ?

It depends on capability of compiler/linker.

But they may be impacted if all of their code and dependent libraries don't rely on any standard C library.
And it also means they're targeting freestanding implementation or platform specific one (*nix system call, WIN32, etc).

For freestanding target, I think we should strongly recommend them to use LZ4 as a library in source code form.
Because they often specify special compile switches (alignment, packing, etc), and generic static library easily hit these incompatibility.

Pb is, all my test platforms have a C runtime, so none is missing <stdio.h>.

Maybe we can implement freestanding environment test with the following compile switches.

  • -nostdlib (for GCC, Clang, etc)
  • /Zl (MSVC)

operate lz4 compressed files as a general files

Signed-off-by: anjiahao <anjiahao@xiaomi.com>
@Cyan4973
Copy link
Member

For freestanding target, I think we should strongly recommend them to use LZ4 as a library in source code form.

I agree.

Maybe we can implement freestanding environment test with the following compile switches.

  • -nostdlib (for GCC, Clang, etc)

That's a good idea !

Introducing new compile-time feature macro LZ4_NO_LZ4FILE or LZ4_FREESTANDING would be possible solution

I like LZ4_FREESTANDING .
We currently have LZ4_USER_MEMORY_FUNCTIONS, but it only covers malloc, calloc and free, removing <stdlib.h>. It doesn't cover <string.h> which is still included for memset and memcpy.
It could be a good idea to add those too, though in this case, there are some rather big performance implications that the user should be aware of.

@Cyan4973
Copy link
Member

@anjiahao1 : I believe the remaining efforts are essentially documentation related, there might also be some follow up, with new tests and new build capabilities, but these are all actions we'll take care on our side.

I noticed you recently updated your PR. Is there anything worthwhile mentioning ? Do you still intend to work a bit on this PR ?

@anjiahao1
Copy link
Contributor Author

@anjiahao1 : I believe the remaining efforts are essentially documentation related, there might also be some follow up, with new tests and new build capabilities, but these are all actions we'll take care on our side.

I noticed you recently updated your PR. Is there anything worthwhile mentioning ? Do you still intend to work a bit on this PR ?

oh,this a error in LZ4F_write,it use dtsMaxSzie to contrl chunk,is error.need use maxWriteSize to contrl chunk

@Cyan4973 Cyan4973 merged commit f14be20 into lz4:dev Mar 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants