Skip to content
This repository has been archived by the owner on May 31, 2018. It is now read-only.

Improve source code readability #443

Open
ismaelgv opened this issue Mar 12, 2016 · 45 comments
Open

Improve source code readability #443

ismaelgv opened this issue Mar 12, 2016 · 45 comments

Comments

@ismaelgv
Copy link
Contributor

First of all, I really appretiate your work in the development of this great piece of software. I've moved from yaourt to pacaur recently.

I've been digging in the script source code moved by curiosity. I got some experience in bash scripting but I found difficult to follow the application workflow. I'd like to contribute improving readability of the code if you are interested, this could ease maintenance and future contributions to the script functionality.

Basically, I'd focus on these points:

  • Add more verbose comments to the code to clarify each step
  • Use some comment banners to divide each section clearly
  • Break bash lines to improve readability (limit to 80 char each line)
  • Unroll some conditionals to improve readability
  • Use some bashisms

I've found a short and interesting bash style guide if you want to take a look.

More references:

@rmarquis
Copy link
Owner

Hi, thanks for your interest!

You are right, there are several readability issues with the current code. It lacks comments, some parts need to be refactored or put in separate functions, and in the current state pacaur can only be maintained efficiently by myself. I would however not adhere to the 80 char per line rule, although I do agree that long lines are probably an indication that the code could be better decoupled in sub-functions.

There are also others issues I'd like to solve in a neat manner:

I've done some initial cleaning in the pacaur50 branch, and have some work in progress to get rid of the internal json parsing code and replace it by cower calls instead. Also, the idea to move the entire code to python came to my mind, but I'm not sure if I'd like to spend much time in a complete rewrite.

Feel free to fork and improve upon the actual code, or ask any questions about the code. An external pair of eyes is always welcome.

@rmarquis rmarquis added this to the 5.x - new features milestone Mar 12, 2016
@rmarquis rmarquis added the todo label Mar 12, 2016
@ismaelgv
Copy link
Contributor Author

Thanks for you fast answer!

First, I'm going to start reading your code to add some commentaries. I'd rather understand well the code before refactor any piece of it. I'll do small pull request if you don't mind in order to keep everything tidy and manageable.

I'm messing around with Python these days in my job (matplotlib, numpy, pyqt5, ...) but I am not sure if it will bring any advantage to this application. I think it is really fast and reliable right now.

Additionally, you could break script into some subscripts, then you could source them from the main script (pacaur executable).

Edit: About 80 char lines it is only a suggestion, you can always use 120 char or another random number of characters. I think it is a good practise to limit lines, it could help you to notice when to create a new function or think in another solution.

@rumpelsepp
Copy link
Contributor

I'm messing around with Python these days in my job (matplotlib, numpy, pyqt5, ...) but I am not sure if it will bring any advantage to this application. I think it is really fast and reliable right now.

I do very much Python at work and I am also not sure it is worth the effort. pacaur is "just" a nice wrapper for pacman which creates a unified and clean CLI for pacman, makepkg, cower and friends. I'm afraid rewriting pacaur in Python gains us nothing. :/

@rmarquis
Copy link
Owner

Thanks to both of you for your feedback. Yes, python might not substantially add gains, but there would be several issues it could tremendously help with:

  • easier tracking of dependency structure for error reporting, where I hit a wall with the current bash design (improve dependency error message output #173)
  • ability to use a better json parsing library and pyalpm directly, instead of C libraries (expac, cower) that break at each pacman major release.
  • testing suite, to avoid regressions,
  • beautiful code, because python.

For the record, I initially hesitated between bash and python when starting pacaur a few years ago, and went with bash because you couldn't be reliable without a bash parser at the time. Since .SRCINFO have now been implemented, the AUR RPC allows to directly query dependencies instead of parsing PKGBUILD so python wouldn't be an issue anymore.

But right, moving to python would probably not be worth the time. It would still be a preferred choice if someone would start a similar wrapper today.

Lastly, keeping an eye on aurutils by @AladW is worth it. It's basically a modular helper that use a different approach (it uses a local repository instead of doing the build as you would do it manually, like pacaur does) with a bunch of nice ideas. Still experimental but overall much cleaner.

@rmarquis
Copy link
Owner

rmarquis commented Mar 17, 2016

The more I look at the current code, the more I want to ditch it completely. Reading the bash style links above also give a few reasons why bash isn't the proper language for a robust and readable code. Data structure would be very welcome here, as pacaur currently relies on a different arrays to mimic it. Debugging would also be easier, and long term maintenance by someone else than myself would be less of an issue.

If the code should be reworked to make it easier to read and maintain, why not doing it in python?

Looking at the existing python code, I see that there are 3 different AUR rpc interface library available:

  • python3-aur, which is up-to-date but somewhat bloated (using a SQLite db, really?). It's also developed behind closed doors which I dislike.
  • pkgbuilder aur lib, which was developed specifically for the helper of the same name. It's very tied in to its code, and not usable as is.
  • python-aur, which is not up-to-date (still using RPCv1) but seems to give returned data a nice structure, easy to use.

However, before any possible rewrite the question of the real need of a new pacaur version arises. I might as well ditch it for one of the competitors on the longer term - in some way the cost of maintaining a popular helper is really not worth the hassle.

Among the existing helpers, only 3 or 4 are actually reliable:

  • pacaur (bash), which handles complex situations pretty well, and has all the downside already mentioned in this issue.
  • bauerbill (python), which basically outputs bash scripts that will execute makepkg commands. It does use a fork of makepkg, which is something I simply dislike - although I understand the reasons behind that decision. The development is again here done behind closed door which I dislike.
  • pkgbuilder (python), which has a "traditional" design somewhat similar to pacaur. It has a few downside regarding split packages but is otherwise quite good imho,
  • aurutils (bash), which is still experimental but tries to use a different design, with a local repository. It's also modular, but it also becomes complex quite fast as the development progress.

In a nutshell, I'm still on the fence here. At the moment I'll still maintain and fix bugs in pacaur, but I'm not sure I really want to continue to work on it in the current state - especially if a better, cleaner design emerges in the near future (aurutils? a new pkgbuilder version?).

@AladW
Copy link
Contributor

AladW commented Mar 18, 2016

Reading the bash style links above also give a few reasons why bash isn't the proper language for a robust and readable code. Data structure would be very welcome here, as pacaur currently relies on a different arrays to mimic it.

It really depends on how you use the language - bash isn't very good with arrays, so focusing on that isn't good for effectiveness. Bash is however good at the stdin/stdout idea: piping programs and functions together, and using the filesystem to process data.

pkgbuilder (python), which has a "traditional" design somewhat similar to pacaur. It has a few downside regarding split packages but is otherwise quite well imho,

I agree, though my gripe with it is that it has no interactive inspection of packages.

aurutils (bash), which is still experimental but tries to use a different design, with a local repository. It's also modular, but it also becomes complex quite fast as the development progress.

Sometimes I get a bit too excited, but I've cut out experimental code and split up/refactored things further. Thanks for the reminder on keeping things simple. ;)

@rmarquis
Copy link
Owner

Yes, you are right. Pacaur has never been designed with the pipe idea in mind. This is another reason why aurutils is interesting. Also, keeping things simple on the long term is always difficult... so keep up the good work here!

The lack of interactive inspection of pkgbuilder is a design decision from the maintainer. I don't understand the rational behind that missing feature, but pkgbuilder is still one of the most interesting design out there. I guess I prefer it to the more "brute force" design of bauerbill that writes bash scripts each and every time.

@rumpelsepp
Copy link
Contributor

Since a few days ago I am wondering if it would make sense to redesign pacaur as some wrapper to aur-utils? That would create the script more modular and readable and keep the well known CLI of pacman/pacaur.

@rmarquis
Copy link
Owner

rmarquis commented Apr 2, 2016

Yes, this might be a possibility. The drawbacks I see are that

  • aurutils is slightly more complex to use
  • it currently lacks maturity
  • I have no idea how easy or how difficult the "fast workflow" concept can be applied to it, and
  • I personally have no motivation to do that myself. But if someone wants to do it reusing some of pacaur code, feel free to be my guest here.

@AladW
Copy link
Contributor

AladW commented Apr 9, 2016

I hope the maturity is somewhat accelerated by the absurd test-cases people have since thrown at aurutils, including packages which violate PKGBUILD(5) (depends with spaces in them, breaking tsort) and the AUR guidelines (pkgname which equals an official pkgbase, breaking the split packages code). 😬

Regarding the fast workflow, pacaur precalculates provides and presents the various choices at the beginning of the build. I don't know how difficult this would be to implement with a modular helper. Or you could keep using pacaur in parallel by setting the PKGDEST and PACMAN variables.

On the flip-side, you get all build files in one window (using PAGER, or vifm when installed), rather than prompting for each PKGBUILD/.install file as with pacaur.

Either way, if someone wanted to make an aurutils wrapper, perhaps it would be best to create a separate project for it and discuss it there.

@ismaelgv
Copy link
Contributor Author

What about splitting the script into subscripts in order to improve readability? Nothing complicated, just divide the script into several files and source them from the main script. For example:

  • Variable configuration
  • Functions (can be divided into several files by its functionality)
  • Main script

@rmarquis
Copy link
Owner

Indeed, part of the file could be split in /usr/share/pacaur/ like makepkg does. But as I feel it, there is no reason to do that unless a serious overhaul of the code takes place too.

@AladW To be honest, I consider the precalculation of provides and conflicts the ugliest part of pacaur. It's slow, terribly inefficient, hard to maintain. It's the part that required the most work to mature too. I wish I could reuse some pacman code easily instead of duplicating my own soup here.

@ismaelgv
Copy link
Contributor Author

I think that dividing pacaur into several subscripts is the proper way to start improving source code readability. Later, a deep overhaul of the code would be easier by people who wants to contribute. I just see this as the obvious starting point to improve the script in the future.

@ismaelgv
Copy link
Contributor Author

I think makepkg is a nice example as you said. Code structure seems coherent and clean.

Here a nice code fragment of makepkg/util/pkgbuild.sh:

get_pkgbuild_attribute() {
    # $1: package name
    # $2: attribute name
    # $3: multivalued
    # $4: name of output var

    local pkgname=$1 attrname=$2 isarray=$3 outputvar=$4

    printf -v "$outputvar" %s ''

    if [[ $pkgname ]]; then
        extract_global_variable "$attrname" "$isarray" "$outputvar"
        extract_function_variable "package_$pkgname" "$attrname" "$isarray" "$outputvar"
    else
        extract_global_variable "$attrname" "$isarray" "$outputvar"
    fi
}

##
#  usage : get_full_version()
# return : full version spec, including epoch (if necessary), pkgver, pkgrel
##
get_full_version() {
    if (( epoch > 0 )); then
        printf "%s\n" "$epoch:$pkgver-$pkgrel"
    else
        printf "%s\n" "$pkgver-$pkgrel"
    fi
}

Check commenting here. I think that specifying input parameters and the output of each function is a really nice idea.

@rmarquis
Copy link
Owner

I think that specifying input parameters and the output of each function is a really nice idea.

Definitely. And you're right, better start to split it the code right away. Looking at makepkg code, I'd suggest to start the following way:

/usr/bin/pacaur:

  • gettext initialization
  • variables initialization
  • subroutines sourcing (pacaur specific functions + reusing makepkg functions when possible)
  • getopt parser
  • usage()
  • version()
  • signal trap

/usr/share/pacaur/*.sh

  • main functions
  • utils functions
  • dependency functions
  • VSC handling functions
  • cache handling functions
  • etc.

This would occur in a new branch with the sole purpose of cleaning/splitting the code, while the current master branch is used as usual for maintenance.

Adjusting code style to makepkg style could be a good idea too.

@rmarquis
Copy link
Owner

Note: I have no time for this. If someone thinks he can kickstart this rewrite/code cleaning in a branch or fork, please do.

@rmarquis rmarquis added community and removed todo labels May 19, 2016
@rmarquis rmarquis modified the milestones: 5.0.x - later, 4.6.x - maintenance May 19, 2016
@rmarquis rmarquis modified the milestones: 4.7.x - new features, 5.0.x - later Oct 1, 2016
@ismaelgv
Copy link
Contributor Author

ismaelgv commented Oct 3, 2016

Just to note it, I had some difficulties to properly rebase my branches when the master code changes. I think that the code clean up and division into different files would be beneficial in this aspect to people who wants to contribute.

I want to start working on this issue soon.

@AladW
Copy link
Contributor

AladW commented Jan 29, 2017

Those .in files are files preprocessed by make, which makes little sense here considering pacaur has no Makefile or generally macros to expand. Otherwise you should probably first ensure pacaur functions themselves are modular, i.e. work as filters, rather than simply split them to separate files to be all sourced by a main script which is a mostly superficial change (compare yaourt which has a similar layout, yet is the most unreadable thing since the invention of scripture). libmakepkg didn't always get this right either, e.g. lint_package/build_references which hardcodes variables only available in makepkg.

Another good way to increase readability is remove "home brew" code such as the color functions or more controversially the json parser; much of those are available in libmakepkg or other general-purpose tools. That might in fact be the best starting point for this whole operation.

@ismaelgv
Copy link
Contributor Author

@AladW I see this step as the most reasonable before really changing the actual code workflow. Making aggressive change will surely introduce bugs. This will allow us to work in the code with notably less conflicting commits since the files are separated. I agree with you that the next step is ensuring modularity of the functions.

@AladW
Copy link
Contributor

AladW commented Jan 29, 2017

Sure it will break things. Isn't that why it's put on a separate branch? ;)

@ismaelgv
Copy link
Contributor Author

ismaelgv commented Jan 29, 2017

I'm more comfortable working with small changes and testing for bugs than making very big changes try to figure out where the bugs are introduced. Anyways, you are right this is a testing branch, let's have some fun. 😄

@rmarquis rmarquis mentioned this issue Jan 29, 2017
@rmarquis
Copy link
Owner

rmarquis commented Jan 29, 2017

Another good way to increase readability is remove "home brew" code such as the color functions or more controversially the json parser; much of those are available in libmakepkg or other general-purpose tools.

There is a ticket for the color functions and such (#422), so that's definitely something to go for. If the json parser get replaced by jshon, we might as well use others external tools to replace internal working (this includes pacutils, aurutils and maybe others). I've been skeptic about adding new dependencies (jshon), but if we go all-in for it and leverage tools that have been created the past few months/few years... why not.

@ismaelgv
Copy link
Contributor Author

I'd like to start documenting the functions using makepkg style as mentioned before in this issue. This will help me to better understand the program workflow and maybe give us some insight of what can be replaced with external tools.

@rmarquis
Copy link
Owner

Sure. You'll probably need (lot of) my help here, so just ask when something is badly undocumented.

@ismaelgv
Copy link
Contributor Author

ismaelgv commented Feb 9, 2017

Should I wait for a code freeze before keep working on this? I'd have to manually merge every commit from master since files are different.

@rmarquis
Copy link
Owner

rmarquis commented Feb 9, 2017

It is unlikely code will be completely frozen. All I can say is that there will not be major changes in the 4.7.x branch, only bug fixes.

@rmarquis
Copy link
Owner

rmarquis commented Feb 9, 2017

@ChuckDaniels87 If this is too inconvenient, you might consider not backporting any changes and just focusing on getting the structure right and documenting/clarifying the current code. When the time is right, we could duplicate your work on master again.

Not ideal, but changing the whole structure of an on-going project isn't easy :/

@ismaelgv
Copy link
Contributor Author

ismaelgv commented Feb 9, 2017

@rmarquis If there will be only bug fixes I think I can handle backporting the changes. I'm busy these days but I'll try to keep new branch up to date.

@rmarquis
Copy link
Owner

rmarquis commented Feb 9, 2017

@ChuckDaniels87 If that is of any help, I hope the current regressions introduced with 4.7.0 will be sorted out quickly, so we'll go back to the usual bi-weekly/monthly minor bugfixes only.

@ismaelgv
Copy link
Contributor Author

I've taken a look to the last commits and I think half of them can be cherry-picked. I'd like to update the branch this week if I find some time.

@ismaelgv
Copy link
Contributor Author

ismaelgv commented Jul 6, 2017

Since you are planning a major overhaul of the code for pacaur 5 and later, I think I should focus on documenting new code instead of old code on the pacaur5 branch. What do you think about it?

I will port new commits to pacaur5 this weekend to keep branches on sync. I have been very busy lately.

@rmarquis
Copy link
Owner

rmarquis commented Jul 7, 2017

Yes, definitely. The unexpected arrival of auracle kinda make me change plans here, sorry about that.

To give you a rough idea of what's coming, here is how I think we'll handle it:

  1. master branch : 4.7.x bugfixes as usual
  2. 4.8.x branch: simple transition to auracle to replace cower (info and search, color code adjustment). The current architecture of cower produces an immense amount of log data in the AUR server, and while pacaur itself isn't responsible directly, its popularity contributes to it by having cower as dependency. Also need to prepare deprecation of the current AUR only commands set (see deprecate AUR specific short command options #712).
  3. 4.9.x branch: make full use of auracle features (solver replacement), and do all required adjusments.
  4. 5.x.x branch: as planned in complete overhaul and long term plan #708.

Auracle still needs to mature before step 1 is possible, but I don't expect it to reach stable release before 2 or 3 months as falconindy is quite busy - though it might arrives sooner. Step 2 and 3 might be merged in a single step, or done gradually through the 5.x. series.

I agree about documenting the revisited code, but if you'd like to have an early start, here are the functions that are the least susceptible to change (though I'd like to make them functional rather than linear):

IgnoreChecks()
IgnoreDepsChecks()
ProviderChecks()
ConflictChecks()
ReinstallChecks()
OutofdateChecks()
OrphanChecks()
Prompt()
EditPkgs()
MakePkgs()
CleanCache()
GetBuiltPkg()
Proceed()

MakePkgs() will be completely revisited if we go the local repo route, but it's not likely to change much on the foreseeable future. ProviderChecks() and ConflictChecks() are terribly ugly and hard to understand - I don't even want to look at them myself.

@ismaelgv
Copy link
Contributor Author

ismaelgv commented Jul 7, 2017

I fully agree with the schedule you proposed. If auracle is fully compatible with the cower functionality that pacaur uses, we should start working on the 4.8.x branch to test the changes on a controlled environment.

Auracle still needs to mature before step 1 is possible, but I don't expect it to reach stable release before 2 or 3 months as falconindy is quite busy - though it might arrives sooner. Step 2 and 3 might be merged in a single step, or done gradually through the 5.x. series.

I prefer the idea of merging these steps. Since step 2 will introduce many code changes, we could start working directly on the restructured code (pacaur5 branch) and at the same time clean up many parts of it. I don't know if you will like this idea: you could release a pacaur5-beta after these changes to test the new pacaur on the wild. We are introducing many changes and this probably will introduce some bugs that won't be easily discovered just by two of us testing the code. I am sure that there will be some people willing to test the beta if we advertise it on the forums and other places such as reddit.

I agree about documenting the revisited code, but if you'd like to have an early start, here are the functions that are the least susceptible to change (though I'd like to make them functional rather than linear):

So, do you want to split/extract some functions into small and more generic functions? If that is the idea, we can create a little toolbox in a separate file or in utils.sh.

Besides, I will take a look on these functions and try to comment what it is not clear. That could also help on the future code overhaul.

@rmarquis
Copy link
Owner

rmarquis commented Jul 7, 2017

we should start working on the 4.8.x branch to test the changes on a controlled environment.

Yes, but auracle isn't ready yet. It is currently missing custom output formatting that pacaur uses for its --info output. Sorting is also absent - that is used for both search and info.

I prefer the idea of merging these steps.

I see the above 4.9.x as very low risk too. Replacing the solver is a matter of removing the existing solver code and adjusting the old code pacaur used when it was relaying on cower internal solver. I'm more concerned about the deprecation of short commands transition, but that should be done before the 5.x release in any case.

Actually, it would even be possible to merge steps one and two in a single step. I've done the distinction because I'm concerned that I might not have the required time. I'm quite busy at the moment, and I might be offline soon (travelling abroad for some time), so step 1 is more an approach to get rid of cower quickly with the minimum amount of work.

I don't know if you will like this idea: you could release a pacaur5-beta after these changes to test the new pacaur on the wild.

In the past, I've used the pacaur-git package as test release, with new tags pushed only the the git branch (4.0.0, 4.0.1, 4.0.2, ..), the stable pacaur package being stuck to 3.x. In some way, it makes sense as quite a number of people are already using pacaur-git and are de-facto beta testers - if I were to trust GH traffic data, I'd say that branch has already more than 160 people using it. We could surely increase that number easily with a bit of advertising.

I'm also fine with fixing discovered bugs in the develoment code only. Backporting critical fixes in the stable version is also always possible.

So, do you want to split/extract some functions into small and more generic functions?

The idea would make them rely on a passed parameter and adding checks, instead of working with global variables that are assummed to be always available - that would help for easing debugging. I haven't really looked at it yet, but there are quite some duplicate code here and there that could be refactored in more generic subfunctions.

Besides, I will take a look on these functions and try to comment what it is not clear. That could also help on the future code overhaul.

Yes. I really need an external pair of eyeballs here, since what is obvious to me isn't necessary obvious to contributors.

@rmarquis
Copy link
Owner

Another aspect in favor of releasing 4.8/4.9 early: I'd like to reuse the parseopts.sh subroutine of libmakepkg, as using getops gives me headaches. This routine is much more flexible and will result in cleaner code than the current implementation, but it is only available starting from pacman 5.1.

This also means having a separate pacaur5 beta package is necessary, as the pacman-git dependency will be a requirement. No ETA for pacman 5.1 release yet, but since we had a minor pacman update recently it will likely not happen before a few months.

@ismaelgv
Copy link
Contributor Author

Porting changes to pacaur5 branch is sometimes tedious since you cannot cherry pick many commits after the code reorganization. However, you are right here if these changes are relatively easy to do.

I will try to keep that branch updated and port the changes as soon as you make them. It would be nice if you can take a look to these commits since the process could be a little error-prone.

@rmarquis
Copy link
Owner

Once 4.8/4.9 are released, I think it will be easier to split the code on master in a similar way you did on the pacaur50 branch and port the doc from there, than to cherry-pick the changes individually from master. With the solver replacement, part of main.sh, much of deps.sh and all of json.sh code will be gone. I'd consider focusing on doc exclusively, as well as the main code structure.

On a sidenote, I'd move CheckUpdates() from checks.sh to aur.sh, as this is a main operation in contrast with other check subrountines used for the dependency tree preparation.

@ismaelgv
Copy link
Contributor Author

Once 4.8/4.9 are released, I think it will be easier to split the code on master in a similar way you did on the pacaur50 branch and port the doc from there, than to cherry-pick the changes individually from master. With the solver replacement, part of main.sh, much of deps.sh and all of json.sh code will be gone. I'd consider focusing on doc exclusively, as well as the main code structure.

I will try to keep pacaur5 up to date if the changes are not so drastic. I think is better to port them to the current branch instead of splitting the code again. Cleaning up removed code is the easiest part.

On a sidenote, I'd move CheckUpdates() from checks.sh to aur.sh, as this is a main operation in contrast with other check subrountines used for the dependency tree preparation.

Done.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants