Good job on your bc implementation. You should add it to wikipedia #59

Open
GeneGeneGitHub opened this issue Dec 4, 2022 · 5 comments

@GeneGeneGitHub

https://en.wikipedia.org/wiki/Bc_(programming_language)

Under history they state there are 3 implementations. Yours would be the 4th, correct?

@gavinhoward
Owner

That is correct, and thank you for noticing!

Unfortunately, Wikipedia has a policy of not accepting changes from "primary sources," and by their definition, I'm the primary source for my bc since I wrote it. This creates a funny situation where I can't add my own bc to Wikipedia and have to wait for someone else to do it.

It is mentioned in the footnote though.

And my dc could be added to https://en.m.wikipedia.org/wiki/Dc_(computer_program) as well. But again, someone else has to do it.

@GeneGeneGitHub
Author

I didn't see anything in their policies about a primary source being restricted. It looks to me like your programs definitely deserve equal mention alongside the others.

Tell me what to write and I'll add them to wiki if you want.

I happened across your bc by chance (was looking to calculate a large product - result is 57 digits). Saw your very detailed 'Hit by a bus' article. It's good stuff. How much of the code did you write? (it says 'Gavin Howard and contributors'). Did you start fresh from the POSIX specs?

@gavinhoward
Owner

gavinhoward commented Dec 5, 2022

> I didn't see anything in their policies about a primary source being restricted.

Whoops. I mixed up primary and non-independent sources.

You are correct; there are no real restrictions on primary sources. The real restrictions are on non-independent sources, not primary sources. It's hard to find the policies, but here is probably the rule that would be used to reject my edits. More information is here, but Wikipedia's policies here are convoluted and hard to follow.

However, the second link has a "this page in a nutshell" which says,

> Do not edit Wikipedia in your own interests, nor in the interests of your external relationships.

I figure that editing Wikipedia to add my bc and dc would be editing Wikipedia in my own interests.

> It looks to me like your programs definitely deserve equal mention alongside the others.

Thank you very much!

> Tell me what to write and I'll add them to wiki if you want.

You should only add them if you would like to since you are the independent party. I also think that you should write what you want to write.

But some things that could be useful would be that my bc and dc are now the default in FreeBSD (since the Wikipedia page believes that the OpenBSD ones still are), as well as any differences between mine and others. This might be helpful there, and if you need a list of extensions that my bc and dc have that no other ones have, here it is:

  • An extended math library. (See here for more information.)
  • A command-line prompt.
  • Turning on and off digit clamping. (Digit clamping is about how to treat "invalid" digits for a particular base. GNU bc uses it, and the BSD bc does not. Mine does both.)
  • A pseudo-random number generator. This includes the ability to set the seed and get reproducible streams of random numbers.
  • The ability to use stacks for the globals scale, ibase, and obase instead of needing to restore them in every function.
  • The ability to not use non-standard keywords. For example, abs is a keyword (a built-in function), but if some script actually defines a function called that, it's possible to tell my bc to not treat it as a keyword, which will make the script parse correctly.
  • The ability to turn on and off printing leading zeroes on numbers greater than -1 and less than 1.
  • Outputting in scientific and engineering notation.
  • Accepting input in scientific and engineering notation.
  • Passing strings and arrays to the length() built-in function. (In dc, the Y command will do this for arrays, and the Z command will do this for both numbers and strings.)
  • The abs() built-in function. (This is the b command in dc.)
  • The is_number() and is_string() built-in functions. (These tell whether a variable is holding a string or a number, for runtime type checking. The commands are u and t in dc.)
  • For bc only, the divmod() built-in function for computing a quotient and remainder at the same time.
  • The $ truncation operator. (It's the same in bc and dc.)
  • The @ "set scale" operator. (It's the same in bc and dc.)
  • The decimal shift operators. (<< and >> in bc, H and h in dc.)
  • Built-in functions or commands to get the max of scale, ibase, and obase.
  • The ability to put strings into variables in bc. (This always existed in dc.)
  • The ' command in dc for the depth of the execution stack.
  • The y command in dc for the depth of register stacks.
  • Built-in functions or commands to get the value of certain environment variables that might affect execution.
  • The stream keyword to do the same thing as the P command in dc.
  • Defined order of evaluation.
  • Defined exit statuses.
  • All environment variables other than POSIXLY_CORRECT, BC_ENV_ARGS, and BC_LINE_LENGTH.
  • The ability for users to define their own defaults for various options during build. (See here for more information.)

This may not be a comprehensive list, but it should help. I've also added this list to the README if you need a better link. (All of the links I am giving you are permalinks, so they should be appropriate for Wikipedia.)

> I happened across your bc by chance (was looking to calculate a large product - result is 57 digits).

Oh, really? That's HUGE! (The Voyager spacecraft only needed pi to 15 digits, and bc defaults to 20.) If I may ask, what were you doing? After doing bc, I'm exceedingly interested in math that needs big numbers.

> Saw your very detailed 'Hit by a bus' article. It's good stuff.

Thank you. :) I'm proud of that article. I wrote it for the sake of the FreeBSD people, since they depend on it now, and I liked it so much that it's now a tradition for me to do on all of my software before they hit production use.

> How much of the code did you write? (it says 'Gavin Howard and contributors').

I haven't crunched the numbers before, but here they are. (All of these were done on the current commit, 2893dd2.)

Using git shortlog -s -n, then combining some authors that are referred to in multiple ways for some reason, I get:

  5437  Gavin Howard
   185  Stefan Esser
     7  Zach van Rijn
     7  depler
     3  Brian Callahan
     3  Eugene Gladchenko
     3  Michael Forney
     2  Firas Khalil Khana
     2  Piotr P. Stefaniak
     2  pac
     2  rofl0r
     1  Brooks Davis
     1  Charlie Root
     1  Ethan Sommer
     1  John Regan
     1  Josuah Demangeon
     1  Laurent Bercot
     1  Warner Losh
     1  Xi Ruoyao
     1  bugcrazy

Adding all of that up, there are 5662 commits, and I have 5437 of them. So I made just over 96% of all commits.
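The percentage above can be reproduced mechanically. The following is a minimal sketch, assuming input in the format printed by git shortlog -s -n (a count, whitespace, then the author name). The helper name commit_share is hypothetical and is not part of git or of this repo.

```shell
#!/bin/sh
# Print one author's share of all commits, given `git shortlog -s -n`-style
# input on stdin. commit_share is a hypothetical helper name.
#
# Possible use inside a repo:
#     git shortlog -s -n | commit_share "Gavin Howard"
commit_share() {
	awk -v who="$1" '
		{
			count = $1
			# Strip the leading count so that the rest of the line is the name.
			sub(/^[ \t]*[0-9]+[ \t]+/, "")
			total += count
			if ($0 == who) mine += count
		}
		END {
			if (total > 0) printf "%d of %d commits (%.1f%%)\n", mine, total, 100 * mine / total
		}
	'
}
```

With the counts listed above (5437 of 5662), this should report a share of about 96.0%.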

In pure lines of code still existing, which I calculated by using the first comment at this link and combining authors again, I get:

 234180 Gavin Howard
   1002 depler
    395 Stefan Esser
    149 bugcrazy
     93 Josuah Demangeon
     92 Laurent Bercot
     24 Piotr P. Stefaniak
     15 rofl0r
     10 Zach van Rijn
     10 Firas Khalil Khana
      9 Brian Callahan
      6 Warner Losh
      5 Eugene Gladchenko
      2 Michael Forney
      2 John Regan
      1 Xi Ruoyao
      1 pac
      1 Brooks Davis

This gives a total of 235,997, of which I have 234,180, which means I have 99.2% of all existing lines of code in the repo.

However, this is not fair for two reasons:

  1. This includes generated lines in manpages.
  2. I usually rewrite contributions after accepting them.

First, let's take away the manpages. I used:

git ls-files  | rg -v "manuals/bc/*" | rg -v "manuals/dc/*" | rg -v "manuals/bcl.3" | \
    xargs -n1 git blame --line-porcelain | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr

And got (after combining):

 168114 Gavin Howard
   1002 depler
    395 Stefan Esser
    149 bugcrazy
     93 Josuah Demangeon
     92 Laurent Bercot
     24 Piotr P. Stefaniak
     15 rofl0r
     10 Zach van Rijn
     10 Firas Khalil Khana
      9 Brian Callahan
      6 Warner Losh
      5 Eugene Gladchenko
      2 Michael Forney
      2 John Regan
      1 Xi Ruoyao
      1 pac
      1 Brooks Davis

Which gives a total of 169,931, of which I have 168,114, or 98.9%.
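The "after combining" step is not spelled out in the thread, so here is one way it could be done. This is only a sketch: the helper name combine_authors and the alias=canonical argument format are assumptions made for illustration, and the thread does not say which names were actually merged.

```shell
#!/bin/sh
# Merge per-author counts that belong to the same person.
# Input on stdin: "<count> <name>" lines, as printed by `uniq -c`.
# Arguments: zero or more "alias=canonical" pairs (a hypothetical format).
combine_authors() {
	awk '
		BEGIN {
			# Read the alias pairs from the arguments, then remove them from
			# ARGV so that awk reads its data from stdin.
			for (i = 1; i < ARGC; i++) {
				eq = index(ARGV[i], "=")
				if (eq > 0) alias[substr(ARGV[i], 1, eq - 1)] = substr(ARGV[i], eq + 1)
				delete ARGV[i]
			}
		}
		{
			count = $1
			sub(/^[ \t]*[0-9]+[ \t]+/, "")
			name = ($0 in alias) ? alias[$0] : $0
			sum[name] += count
		}
		END {
			for (name in sum) printf "%d %s\n", sum[name], name
		}
	' "$@" | LC_ALL=C sort -k1,1nr -k2
}
```

It would be appended to the end of a counting pipeline, with one argument per duplicate spelling. Git's own .mailmap file is an alternative that canonicalizes names before they are counted.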

Now let's include all lines, including deleted ones, since I rewrite almost everything.

I wrote the following using this link:

#! /bin/bash

authors=$(git log --format='%aN' | sort -u)

IFS=$'\n'

for a in $authors; do

	printf 'For author: %s\n' "$a"

	git log --shortstat --author="$a" | grep -E "fil(e|es) changed" | \
		awk '{files+=$1; inserted+=$4; deleted+=$6} END {print "lines inserted: ", inserted, "\nlines deleted: ", deleted }'

done

And got (after combining):

For author: Brian Callahan
lines inserted:  11
lines deleted:  12
For author: Brooks Davis
lines inserted:  2
lines deleted:  2
For author: bugcrazy
lines inserted:  218
lines deleted:  0
For author: Charlie Root
lines inserted:  2
lines deleted:  4
For author: depler
lines inserted:  1842
lines deleted:  27
For author: Ethan Sommer
lines inserted:  6
lines deleted:  6
For author: Eugene Gladchenko
lines inserted:  7
lines deleted:  2
For author: Firas Khalil Khana
lines inserted:  17
lines deleted:  8
For author: Gavin Howard
lines inserted:  590048
lines deleted:  403713
For author: John Regan
lines inserted:  3
lines deleted:  5
For author: Josuah Demangeon
lines inserted:  108
lines deleted:  0
For author: Laurent Bercot
lines inserted:  108
lines deleted:  0
For author: Michael Forney
lines inserted:  9
lines deleted:  7
For author: pac
lines inserted:  2
lines deleted:  2
For author: Piotr P. Stefaniak
lines inserted:  30
lines deleted:  2
For author: rofl0r
lines inserted:  99
lines deleted:  77
For author: Stefan Esser
lines inserted:  3373
lines deleted:  2070
For author: Warner Losh
lines inserted:  11
lines deleted:  4
For author: Xi Ruoyao
lines inserted:  3
lines deleted:  3
For author: Zach van Rijn
lines inserted:  94
lines deleted:  87

Adding up all of the additions and deletions, we get a total of 1,002,024, of which I have 993,761, which is 99.2%. But that still includes the generated manpages. If I change the git log lines in my script to:

	git log --shortstat --author="$a" -- . ':!manuals/bc' ':!manuals/dc' ':!manuals/bcl.3' | grep -E "fil(e|es) changed" | \
		awk '{files+=$1; inserted+=$4; deleted+=$6} END {print "lines inserted: ", inserted, "\nlines deleted: ", deleted }'

And rerun, I get the following (after combining):

For author: Brian Callahan
lines inserted:  11
lines deleted:  12
For author: Brooks Davis
lines inserted:  2
lines deleted:  2
For author: bugcrazy
lines inserted:  218
lines deleted:  0
For author: Charlie Root
lines inserted:  2
lines deleted:  4
For author: depler
lines inserted:  1842
lines deleted:  27
For author: Ethan Sommer
lines inserted:  6
lines deleted:  6
For author: Eugene Gladchenko
lines inserted:  7
lines deleted:  2
For author: Firas Khalil Khana
lines inserted:  17
lines deleted:  8
For author: Gavin Howard
lines inserted:  289628
lines deleted:  136409
For author: John Regan
lines inserted:  3
lines deleted:  5
For author: Josuah Demangeon
lines inserted:  108
lines deleted:  0
For author: Laurent Bercot
lines inserted:  108
lines deleted:  0
For author: Michael Forney
lines inserted:  9
lines deleted:  7
For author: pac
lines inserted:  2
lines deleted:  2
For author: Piotr P. Stefaniak
lines inserted:  30
lines deleted:  2
For author: rofl0r
lines inserted:  99
lines deleted:  77
For author: Stefan Esser
lines inserted:  3288
lines deleted:  2060
For author: Warner Losh
lines inserted:  11
lines deleted:  4
For author: Xi Ruoyao
lines inserted:  3
lines deleted:  3
For author: Zach van Rijn
lines inserted:  94
lines deleted:  87

The sum is 434,205, of which I have 426,037, which is 98.1%. This seems more realistic to me.

Since I have 96% of the commits and 98% of the changes, my contributions should count somewhere in that range.
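One caveat about the script above: it reads the fourth and sixth fields of each summary line, but git leaves out the "insertions" or "deletions" part when that count is zero. A commit with only deletions would therefore have its count filed under "lines inserted". The combined totals used here should be unaffected, since both numbers are added together anyway. A minimal sketch that matches by keyword instead of by position, with sum_shortstat as a hypothetical helper name:

```shell
#!/bin/sh
# Sum insertions and deletions from `git log --shortstat` summary lines read
# on stdin. Fields are matched by keyword instead of by position, because git
# omits the "insertions" or "deletions" part of a summary line when it is zero.
# sum_shortstat is a hypothetical helper name.
#
# Possible use inside a repo:
#     git log --shortstat --author="$a" | sum_shortstat
sum_shortstat() {
	awk '
		/files? changed/ {
			n = split($0, part, ",")
			for (i = 1; i <= n; i++) {
				if (part[i] ~ /insertion/) inserted += part[i] + 0
				else if (part[i] ~ /deletion/) deleted += part[i] + 0
			}
		}
		END {
			printf "lines inserted: %d\nlines deleted: %d\ntotal changed: %d\n", inserted, deleted, inserted + deleted
		}
	'
}
```

Note also that --author matches by pattern rather than by exact name, so a short name such as "pac" could in principle pick up commits from any author whose name or email contains it.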

> Did you start fresh from the POSIX specs?

Yes, but not all at once.

I started fresh from the POSIX specs for the parsing of bc at first. Someone else was making a math library and since I had experience in parsers, they asked me to write the parser for bc for them. I did so, and then I got bored because they weren't making progress on their math library despite working on it six months longer than I had with my parsing.

So I started fresh again from the POSIX specs, for the math this time, and wrote my own math in about two weeks. Needless to say, I took off on my own.

By the way, my git repo says the first commit was January 3, 2018. So these programs are just short of five (not six) years old. (Can't do math when I just wake up.)

PS: Sorry for the long comment. I hope it helps.

@GeneGeneGitHub
Author

Give me some days and I'll look into putting your bc on that wiki page under 'Implementations', and removing the entry under 'External Links'.

The 57-digit number came from a silly online challenge. Someone claimed the prime factors of that number couldn't be calculated on Windows in any reasonable time. I found a public domain library for factoring large integers, and it found the 4 primes in 0.057s. Then I used GNU bc to confirm the product of those 4 primes was the 57-digit number. I came across your bc via a Google search for 'bc Linux calculator'. I also confirmed the product of the primes using your bc.

Then I saw how thorough your readme was, how serious you are about development, and that your code was well-commented. In particular I liked this from bc_parse.c:

// Before you embark on trying to understand this code, have you read the
// Development manual (manuals/development.md) and the comment in include/bc.h
// yet? No? Do that first. I'm serious.
//
// The reason is because this file holds the most sensitive and finicky code in
// the entire codebase. Even getting history to work on Windows was nothing
// compared to this. This is where dreams go to die, where dragons live, and
// from which Ken Thompson himself would flee.

ha!

I saw somewhere you said 'So and so insulted my programming skills'. Well, you've written an incredibly difficult program from scratch in C, so I'd say your skills are now top-notch.

@gavinhoward
Owner

Take your time! I would simply be grateful to be in the article at all. :)

I love it when arrogance gets proven wrong, especially when it comes to factoring. I like that story, and I'm glad my bc could help.

Er, yeah, that comment is slightly embarrassing now. I was tired from trying to finish the bus factor document and commenting the actual code. But I'm glad you found it funny.

And thank you so much for your compliment. :) I'm glad you think so.
