Ignoring PB project files is wrong #3

Closed
deseven opened this issue Jan 10, 2020 · 5 comments

deseven commented Jan 10, 2020

PB project files (*.pbp) contain settings crucial to many projects, such as:

  • build targets;
  • compiler options (like a threadsafe flag - EXTREMELY important);
  • tools;
  • constants.

If you want to protect someone from leaking their personal info (I know that project files leave user, hostname and some local paths intact), it's better to do that with a pre-commit git hook, which will strip sensitive data from pbp files or at least warn the user that they are about to commit that data. A pretty simple regexp such as <lastopen.*> can be used to find it.
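A minimal sketch of the "warn the user" variant of such a hook, assuming user-specific data sits on lines containing `<lastopen` (the pattern suggested above). The function names are hypothetical, and for simplicity the check reads the working-tree copy of each file rather than the staged content.

```shell
#!/bin/sh
# Sketch of a pre-commit check for PureBasic project files.
# Assumption: user-specific data lives on lines containing "<lastopen".

# Succeeds (exit status 0) if the given file contains a <lastopen ...> entry.
has_user_data() {
  grep -q -e '<lastopen' -- "$1"
}

# Reads file names from stdin, one per line, and warns about every file
# that still carries user data. Returns 1 if at least one such file is found.
check_projects() {
  status=0
  while IFS= read -r file; do
    [ -f "$file" ] || continue
    if has_user_data "$file"; then
      echo "warning: $file contains user-specific <lastopen> data" >&2
      status=1
    fi
  done
  return "$status"
}
```

In a `.git/hooks/pre-commit` script, the functions could be fed with `git diff --cached --name-only --diff-filter=ACM -- '*.pbp' | check_projects || exit 1`, since a non-zero exit status from the hook makes Git abort the commit.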

Even if we manage to alter the IDE behavior so it won't leak anything, there are legacy projects designed to be compiled with older versions of PB (such as the latest available ANSI release).

Collaborator

tajmone commented Jan 11, 2020

First of all, it's worth pointing out that these points have been covered in the discussion at fantaisie-software/purebasic#15, where they are being actively evaluated and discussed — so it might be worth following the developments therein.

Stating (as the title suggests) that "Ignoring PB project files is wrong" is a rather strong statement, especially in a boilerplate project. As the documentation of the project points out, this boilerplate provides optimal settings, none of which are mandatory, so the end user can easily delete that one-liner in the .gitignore file if he/she prefers to.

The important thing is to document why some settings were inserted in the boilerplate, and provide examples of the pros and cons. In this respect, there is still a lot of work to be done in the Wiki in terms of documenting the boilerplate and its possible uses.

Project Files vs Compiler Directives

I would argue that relying on PureBasic IDE project files for correct compilation is a bad choice because end users might work with other editors and IDEs.

The proper way to ensure correct compiler settings is via compiler directives — i.e. relying on the language instead of the IDE. Quoting from fantaisie-software/purebasic#15:

@tajmone: ... the best way to enforce compiler settings is via compiler directives in the main source file that check whether the compiler is set to threadsafe or not, console app or not, and so on with all required settings (XP, user mode, DPI awareness, etc.) and raise a CompilerError in case the settings are not as required.

This would be the strongest assurance that every project is compiled as it was intended, and the *.sample.pbp file would then be just a convenience to associate all source files to the projects

Since the PureBasic IDE conflicts by design with version control, it's a much safer assumption to presume that end users of a Git project will be using a modern IDE that supports PureBasic syntax as well as Git integration. Also, we should assume that good projects do make use of compiler directives to ensure that source files are compiled as expected.

Using Git Filters/Hooks

@deseven: it's better to do that with a pre-commit git hook, which will strip sensitive data from pbp files or at least warn user that they are going to commit that data.

The use of Git filters is part of the original proposals at fantaisie-software/purebasic#15:

@tajmone: Other possible solutions that came to my mind focused on creating a tool to strip away from *.pbp files all user-specific data. The problem is that for this to work users would have to close the PB IDE, run the tool and then commit, because the IDE updates the project files both at save and exit time.

Again, a Git filter could also be configured, so that user-specific entries are filtered at indexing time, but a full XML parser would definitely introduce overhead to the commit stage.

Using a Git filter/hook is a possibility, but we'd have to consider the above-mentioned problem of the IDE saving project data again at exit time. The risk is that the end user commits the properly fixed project file, the IDE re-saves it on closing, and Git then sees the file as modified. The whole idea was to prevent this ping-pong effect where Git and the PB IDE compete over modifications to tracked files.
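For reference, a Git "clean" filter for this case could be sketched as follows. It assumes that only `<lastopen ...>` lines need filtering, which may not hold for project files edited on different OSs; the filter name `pbpclean` and the script path used in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch of a Git "clean" filter for *.pbp files.
# Git pipes the working-tree file to the filter on stdin and stores
# whatever the filter writes to stdout.
#
# One-time setup per clone (filter name and script path are placeholders):
#   git config filter.pbpclean.clean 'sh tools/pbp-clean.sh'
#   echo '*.pbp filter=pbpclean' >> .gitattributes

# Drops every line containing a <lastopen ...> entry and passes
# all other lines through unchanged.
pbp_clean() {
  sed '/<lastopen/d'
}
```

Saved as a script that ends with a call to `pbp_clean`, this would run whenever a project file is staged. Git also applies the clean filter when comparing the working tree with the index, so a file that differs only on filtered lines should not be reported as modified, but that expectation still needs to be verified against the IDE's save-on-exit behavior.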

@deseven: Pretty simple regexp such as <lastopen.*> can be used to find it.

A viable cross-platform solution would have to rely on Bash tools like SED and GREP, and since these work mostly on a per-line basis I doubt that there's an easy way to handle nested XML entries cleanly. Unless you only want to wipe out the key contents and leave them empty, but this wouldn't prevent the project file from being seen as modified by Git, because different OSs might add or remove some entries. The main issue is not preventing personal data leakage (which, of course, we'd also like to avoid) but preventing Git from detecting changes in the project files, which would interfere with many Git operations (branch checkouts, etc.), and keeping these non-meaningful changes from being committed (transferring machine-specific settings from one user to the others, back and forth).

Other aspects to consider are:

  • Some Git GUI front-ends might not support similar Git functionalities and might simply ignore them.
  • Any commit-related filters would add some overhead to many Git operations (including behind-the-scenes operations that involve manipulating the repository history).
  • If Git changes the .pbp project file and the PB IDE is still open, the IDE will detect the change and ask to reload the project — which then would bring the whole situation back to square one, with the IDE indicating that the project needs to be saved again, and doing so automatically when exiting the IDE.

The problem is that the PureBasic IDE didn't take version control into account in its design, and trying to work on Git repositories using the PB IDE is going to introduce lots of nasty headaches — unless you simply ignore the project files and other machine specific settings files.

Personally, I'd be surprised to discover that users are actually using the PB IDE on Git repositories, especially since there's a wide range of free and open source editors which provide powerful modern features like multi-cursor selection, fuzzy matching functionality in all search controls, and many other cool features that a modern programmer shouldn't really be willing to give up, for the sake of productivity.

But, having said that, I agree with you that usage of Git filters/hooks is a path worth exploring, and I suggest that this Issue might focus on further considerations in that direction: evaluating the pros and cons, and doing some actual tests to see how it works out and whether it's a viable solution for a general purpose boilerplate.

Ideally, GitHub templates should offer the possibility to choose from different branches of the boilerplate, so that end users could choose variations of the base template according to need. This would have made it possible to offer a template with the Git hooks alongside the default boilerplate, but right now this feature is not available for repository templates (although it has been proposed).

IDE Future Changes

@deseven: Even if we'll manage to alter the IDE behavior so it won't leak anything, there are legacy projects designed to be compiled with older versions of PB (such as the latest available ANSI release).

I'm not sure that I follow you here. This is a boilerplate intended for new repositories (regardless of whether their code is old or new), so it doesn't apply to already existing repositories (except that their maintainers can manually copy over the settings and re-adapt the repository).

As for handling legacy code in new projects, I don't see how this could change the situation.

You can use the latest IDE version to compile any PureBasic 5.x project by simply adding other PBCompilers to the list of available compilers, just like when you compile both x64 and x86 binaries from the same IDE in one go. There is no need to use the IDE that ships with a specific PB version, because newer versions of the IDE are always backward compatible and still support obsolete options (Ascii binaries, etc.). PB users should always use the latest IDE to handle all projects (old and new) so they can benefit from any IDE bug fixes and improvements.

Since this repository is a boilerplate targeting PB projects, we should assume that its end users are using the language and its tools correctly. But then, again, we shouldn't take for granted that all PB projects are built around the PB IDE either, and we should expect good projects to offer IDE-agnostic code that is self-contained and doesn't rely on any specific IDE.

In any case, altering the IDE to introduce an alternative way to save project info is a separate issue from that of safeguarding a repository from the IDE introducing machine-specific data in commits. Indeed, considerations on the latter led to proposing the former, but right now we have to find a way to protect the average repository from these leaks.

Whether or not the PB IDE will offer alternative project files for version-controlled projects (as has been proposed) is something that lies in the future and is beyond the current needs of this boilerplate.

Author

deseven commented Jan 11, 2020

While I totally agree that relying on IDE-specific stuff is wrong (and pbp files are indeed IDE-specific), this would just further alienate most PB users who don't know other options and who will be forced to rely on external scripts or build systems.

Another thing I should add is that you don't have to use any external tools if you have bash:

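# Strip <lastopen ...> lines from every project file except *.sample.pbp templates.
GLOBIGNORE="*.sample.pbp"
IFS=$'\n'

for project in *.pbp; do
  [[ -f $project ]] || continue             # the glob matched no project files
  : > "$project.tmp"                        # start from an empty temp file
  while read -r line || [[ -n $line ]]; do  # also keep a last line without a newline
    if [[ $line =~ \<lastopen.*\> ]]; then
      echo "removing $line from $project"
    else
      printf '%s\n' "$line" >> "$project.tmp"
    fi
  done < "$project"
  mv -f "$project.tmp" "$project"
done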

Still, I had no idea that this problem was being discussed elsewhere, so I'm closing this.

deseven closed this as completed Jan 11, 2020
Collaborator

tajmone commented Jan 11, 2020

Another thing I should add is that you don't have to use any external tools if you have bash

Yes, this is what I meant when I said we'd have to rely on GREP or SED, for these are among the few shell tools that we're sure will be available across all OSs with Git (i.e., that Windows users will have them via Git's Bash).

I'm just not sure that these tools are enough to handle XML files as required.

Among the proposals in the other issue, I suggested providing a .sample.pbp template, which users should copy and rename without the .sample suffix, with all *.pbp files being ignored except for *.sample.pbp matches. This is a fairly good solution, provided that the sample project is stripped of personal info. The main problem is: if we need to update the sample project file, how would users detect that?
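A rough sketch of how the sample-project approach might work in practice, assuming the ignore pattern is meant to cover `*.pbp` files. The function name `init_project` is hypothetical, and comparing modification times is only a heuristic for spotting an updated sample, not a tested answer to the detection problem.

```shell
#!/bin/sh
# .gitignore fragment assumed by this sketch:
#   *.pbp
#   !*.sample.pbp

# Copies NAME.sample.pbp to NAME.pbp when no local project file exists yet.
# When one exists and the sample is newer, prints a notice instead of
# overwriting the user's local copy.
init_project() {
  sample=$1
  target=${sample%.sample.pbp}.pbp
  if [ ! -e "$target" ]; then
    cp -- "$sample" "$target"
    echo "created $target"
  elif [ "$sample" -nt "$target" ]; then
    echo "notice: $sample is newer than $target"
  fi
}
```

For example, `init_project MyApp.sample.pbp` would create `MyApp.pbp` on the first run and leave it untouched afterwards.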

Author

deseven commented Jan 11, 2020

The code above doesn't need grep, sed or any other external tool besides bash itself.

If you're going to use sed, grep or even awk, note that they are not a part of bash or git and their versions differ between various OSes and distros. While the GNU versions are the most common, *BSD, macOS and some other unix derivatives use BSD versions, and their syntax can be incompatible.
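One well-known instance of this incompatibility is in-place editing with sed. The sketch below shows a way around it that avoids the `-i` flag altogether; the function name `strip_lastopen` is hypothetical.

```shell
#!/bin/sh
# In-place editing is one place where GNU and BSD sed disagree:
#   GNU sed:  sed -i 's/a/b/' file
#   BSD sed:  sed -i '' 's/a/b/' file
# Writing to a temporary file and renaming it sidesteps the -i flag.

# Removes every line containing a <lastopen ...> entry from the given file.
strip_lastopen() {
  file=$1
  tmp_file=$file.tmp
  sed '/<lastopen/d' "$file" > "$tmp_file" && mv -f "$tmp_file" "$file"
}
```

The `/pattern/d` form used here is plain POSIX sed, so it should behave the same with both implementations.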

Collaborator

tajmone commented Jan 11, 2020

I'm aware of the differences, but for cross-platform projects (because of Windows) we need to assume that all scripts run under Bash and rely on the tools that ship with Git for Windows.

As I said above, I think that it's worth exploring the usage of Git filters to clean up project files. It's just that I haven't had the time to look into this specific approach, and it needs to be evaluated and tested.

I had started looking into PB project files, and they can get quite entangled depending on the settings of the project. This is why I think that the optimal solution is to adopt compiler directives instead — but of course, it's up to the end users to choose how to do it.

When providing a boilerplate template, it's always difficult to decide where to draw the line regarding which features to include and which to leave open. Sometimes "less is more", but other times it's the contrary (e.g. providing something that the end user can easily remove is better than leaving it out, because deletion is faster than having to type the missing feature).

So far, I haven't seen a huge PB presence on GitHub, GitLab, Bitbucket and other similar services, and in most cases I've noticed that projects contain machine-specific settings (which is an obstacle to collaborative editing). The idea of a template is to provide some ready-made good settings that make it possible to start a project quickly, but it's not possible to cover all use cases.
