Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problem with certain filenames. #56

Closed
Morritz opened this issue Sep 19, 2021 · 12 comments
Closed

Problem with certain filenames. #56

Morritz opened this issue Sep 19, 2021 · 12 comments
Labels
Milestone

Comments

@Morritz
Copy link

Morritz commented Sep 19, 2021

2 example filenames:

  • index.1234.css
  • index.4321.css

For both of this files there will be produced conflicting function definition "getIndexFile()", so it will not be able to compile. I am afraid that it also applies to the extensions.

@HackerDaGreat57
Copy link

My suggestion is just to use underscores instead of dots in file names.

@Morritz
Copy link
Author

Morritz commented Sep 19, 2021

My suggestion is just to use underscores instead of dots in file names.

the files is generated by another piece of software, even if it could be possible there is still case of two files having same extension which is not good

@HackerDaGreat57
Copy link

HackerDaGreat57 commented Sep 19, 2021

Interesting. Is there a way to mass-rename the dots to something else? And then just use the extension-less files as if they were .css files (or whatever they were in the first place). This could result in index_1234_css being used as if the file was .css.

FYI I'm not a bin2cpp developer. I'm just trying to help.

@Morritz
Copy link
Author

Morritz commented Sep 19, 2021

Interesting. Is there a way to mass-rename the dots to something else? And then just use the extension-less files as if they were .css files (or whatever they were in the first place). This could result in index_1234_css being used as if the file was .css.

FYI I'm not a bin2cpp developer. I'm just trying to help.

https://github.com/MKlimenko/embed
For now I found 'embed' as better solution. It doesn't have this issue because it names file headers with unique ids.

@end2endzone
Copy link
Owner

end2endzone commented Sep 20, 2021

Hi thank you for showing interest in bin2cpp and providing feedback. Much appreciated.
I was already aware of this problem. It was already partially discussed partially in #51.
I am assuming that you are using --dir option. As a temporary workaround, you can switch to single bin2cpp calls for each of these file. For example:

bin2cpp --file=index.1234.css --output=.\outdir --headerfile=index1234.h --identifier=index_1234_css
bin2cpp --file=index.4321.css --output=.\outdir --headerfile=index4321.h --identifier=index_4321_css

which will result with unique identifiers for both files.

Let me clarify my point. What I suggest is to still use the --dir option and convert to c++ most of the input files in your scenario. For files that creates duplicate identifiers, run a single pass of bin2cpp and manually specify the identifiers so that no conflict is generated when compiling.
I understand this is only possible if you are dealing with a fixed set of file names (files with known file names).

I do plan on updating bin2cpp this week and implement multiple fixes, including this one.

Edit: removed unfinished sentence. Fixed duplicate headerfile option.

@end2endzone
Copy link
Owner

The getFunctionIdentifierFromPath() function which generates a valid c++ identifier based on the path of a given file should be updated to be more robust. This option is mainly used with the --dir command. It is not uncommon that two files generates the same identifier. Without a proper unique identifier generation, the --dir option have a low interest.

More specifically:

Fix existing bugs:

  1. There is a bug where a file named index.1234.css generates the identifier getIndexFile(). In other words, the .1234 part of the filename is lost. This is incorrect and the expected identifier should be getIndex_1234File() or getIndex1234File().
  2. If there is another file called index.4321.css, this bug results in the same identifier for both file and prevent compilation of the generated files. Upon fixing the first problem, this won't be an issue anymore. However, the problem will rise again with two files that have the same name but different extension. For example, index.4321.css and index.4321.js. Both files will generates the identifier getIndex4321File().

Increase robustness

A global dictionary should be created to remember previous identifiers. If two files generates the same identifier, a counter value (or something similar) should be automatically appended to new identifiers.
This is also required for implementing #51 since two files with the same name but stored in different folders will generates the same identifier.

@end2endzone
Copy link
Owner

I have fixed the first issue (identifier conflicts for index.1234.css and index.4321.css) in branch feature_issue56 with commit 0b09348. If 2 file path resolves to the same identifier, "_[counter]" is appended to the second identifier. Look at the unit tests for examples.

This fix now provides unique identifiers which is pretty convenient.

For example, you can now encode directories with the following structure:

───www
   │   index.html
   │
   ├───blog
   │   │   index.html
   │   │
   │   ├───how-to-create-a-web-site
   │   │       index.html
   │   │
   │   └───using-bin2cpp
   │           index.html
   │
   ├───contact
   │       index.html
   │
   └───home
           index.html

If 6 files are named index.html, it creates function identifier such as getIndexhtml_5File().

This is my first question. I think these are not the prettiest identifiers. Which of the following strategies should I choose ?

  1. Keep using getIndexhtml_5File() (appending an underline and then a counter).
  2. Do you think that removing the underline is better ? Do you think getIndexhtml5File() would be more appropriate ?
  3. Another option would be to create identifiers that includes the directory path. For examples, getWwwIndexhtmlFile(), getWwwBlogIndexhtmlFile(), getWwwBlogHowtocreateawebsiteIndexhtmlFile(), getWwwContactIndexhtmlFile(), getWwwHomeIndexhtmlFile(). But if I go this way, the identifier names might get too long or harder to understand when there are no conflict. I think an identifier can get up to 2048 characters in length.

Which strategy is more appropriate in your opinion ?

Another problem is the generated output file names. All 6 generated files are named index.h. They all override each other and only the last one remains. For robustness, I think a dictionary-like solution is required to prevent file name conflicts. Again some strategies could be implemented:

  1. Keep outputting all files in the same directory (specified with --output). Use a dictionary to prevent file name conflicts to get file names such as index_1.h, index_2.h, index_3.h... With this option, the function identifiers and the output file names would be matching. For example, file index_56.h would have the identifier getIndex_56File().
  2. Create otuput files that includes the directory path. For example WwwIndex.h, WwwBlogIndex.h, etc...
  3. Keep the input directory structure in the output directory. For example:
[--output]
  └───www
      │   index.h
      │
      ├───blog
      │   │   index.h
      │   │
      │   ├───how-to-create-a-web-site
      │   │       index.h
      │   │
      │   └───using-bin2cpp
      │           index.h
      │
      ├───contact
      │       index.h
      │
      └───home
              index.h

(note those are *.h files)
With this option, the project owner (the user which uses bin2cpp) must know the path of all generated index.h files. This might be an annoying limitation.

I would really like to get your option. I think it would help make sure I take the right decision. Changing the implementation of identifiers and the output file names is a breaking changes. I need to make sure that which ever strategy I choose, it will be the one that fits most use case.

end2endzone added a commit that referenced this issue Sep 29, 2021
…ure of the input directory in the output directory. This feature is required in preparation for fixing #51 and #56.
end2endzone added a commit that referenced this issue Oct 1, 2021
* feature_issue56:
  Added double quotes in generate_test_files.sh.in to prevent 'Permission Denied' error.
  Fixed invalid script commands in `generate_test_files.sh.in`. Modified cmakelists.txt to create target `build_test_files` which makes sure that all required unit tests files are created before building bin2cpp_unittest target.
  * Fixed issue #56: Problem with certain filenames
  Implemented getUniqueFilePath() and unit tests.
  Implemented `pathSplit()` and `pathJoin()` with unit tests.
  Implemented --keepdirs which allows to keep the same directory structure of the input directory in the output directory. This feature is required in preparation for fixing #51 and #56.
  Updated product to version 3.0.0 since the api is now broken. Moved `filter()` and `getFunctionIdentifierFromPath()` from `main.cpp` to `common.h` so that it can be reused and property tested. Updated `getFunctionIdentifierFromPath()` and `getUniqueFunctionIdentifierFromPath()` to properly generated identifiers from a file path. Created unit tests as well.
@end2endzone
Copy link
Owner

end2endzone commented Oct 1, 2021

Hi. I created a new build which should fix this issue. Would you like to try it out? I think this fix actually solves the problem you mentioned. I would prefer to get your feedback before closing this issue. You can find the latest binaries with this fix at the following: Windows, Linux and macOS.

EDIT: Also note the new command line argument --keepdirs which "enables the output files to have the same directory structure as the input files. Valid only when --dir is used."

@HackerDaGreat57
Copy link

I think it's time for a new minor release/patch. My large file issue (#55) and this one got fixed, plus the new --keepdirs arg.

(2.5.0 or 2.4.1)

@end2endzone end2endzone added this to the 0.7.0 milestone Oct 1, 2021
@end2endzone
Copy link
Owner

Respectfully, I disagree. Also note the software is at version 3.0.0 since I created a breaking change with the new way of creating identifiers. Since that would be a major update, I see no reason why to rush things out.

Please understand that I have very few hours of free time in a week and creating a release is hours I do not put on fixing issues. I would prefer not to go trough the release process multiple times in the next week or two.

I do plan to include other fixes in the next release. Before releasing a new version, I need to make sure the new features are stable (that I did not forget about something). That is the reason I asked for feedback. I also have to update documentation files and create new examples to show how to use these new features. To be clear, I created a milestone to identify which issues I do plan to fix before the next release.

@end2endzone
Copy link
Owner

end2endzone commented Oct 20, 2021

@HackerDaGreat57 @Morritz Please note that I released version 3.0.0. Thank you to both of you for your help and feedback.

@Morritz
Copy link
Author

Morritz commented Oct 27, 2021

@HackerDaGreat57 @Morritz Please note that I released version 3.0.0. Thank you to both of you for your help and feedback.

thanks, it will for sure be really convienent to use with such filenames i wrote, for embedding html/js stuff to binaries,
std::embed is still unavailable as i know

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants