Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit the growth of string length when calling Mudder repeatedly but one-at-a-time #7

Closed
fasiha opened this issue Aug 2, 2019 · 12 comments

Comments

@fasiha
Copy link
Owner

fasiha commented Aug 2, 2019

Via email:

Imagine you have an initial list with 100 items. You then want to add items 1 by 1 to the list. You add 10,000 items 1 by 1. If for each item, you use the method for finding the one after it. The string length grows extremely quickly and winds up in the form:

  'zzzzzU',
  'zzzzzj',
  'zzzzzr',
  'zzzzzv',
  'zzzzzx',
  'zzzzzy',
  'zzzzzyV',
  'zzzzzyk',
  'zzzzzys',
  'zzzzzyw',
  'zzzzzyy',
  'zzzzzyz',
  'zzzzzyzV',.....
@fasiha fasiha closed this as completed in f4be889 Aug 3, 2019
@fasiha
Copy link
Owner Author

fasiha commented Aug 3, 2019

Per my email describing the fix:

function demo(numDivisions) {
  var arr = ['yo'];
  for (var i = 0; i < 100; i++) {
    arr.push(mudder.base62.mudder(arr[arr.length - 1], '', 1, undefined, numDivisions)[0]);
  };
  return arr.join(' ');
}

console.log(demo(undefined));

yo z zU zj zr zv zx zy zyz zz zzU zzj zzr zzv zzx zzy zzyz zzz zzzU zzzj zzzr zzzv zzzx zzzy zzzyz zzzz zzzzU zzzzj zzzzr zzzzv zzzzx zzzzy zzzzyz zzzzz zzzzzU zzzzzj zzzzzr zzzzzv zzzzzx zzzzzy zzzzzyV zzzzzyk zzzzzys zzzzzyw zzzzzyy zzzzzyz zzzzzyzV zzzzzyzk zzzzzyzs zzzzzyzw zzzzzyzy zzzzzyzz zzzzzyzzV zzzzzyzzk zzzzzyzzs zzzzzyzzw zzzzzyzzy zzzzzyzzz zzzzzyzzzV zzzzzyzzzk zzzzzyzzzs zzzzzyzzzw zzzzzyzzzy zzzzzyzzzz zzzzzyzzzzV zzzzzyzzzzk zzzzzyzzzzs zzzzzyzzzzw zzzzzyzzzzy zzzzzyzzzzz zzzzzyzzzzzV zzzzzyzzzzzk zzzzzyzzzzzs zzzzzyzzzzzw zzzzzyzzzzzy zzzzzyzzzzzz zzzzzyzzzzzzV zzzzzyzzzzzzk zzzzzyzzzzzzs zzzzzyzzzzzzw zzzzzyzzzzzzy zzzzzyzzzzzzz zzzzzyzzzzzzzV zzzzzyzzzzzzzk zzzzzyzzzzzzzs zzzzzyzzzzzzzw zzzzzyzzzzzzzy zzzzzyzzzzzzzz zzzzzyzzzzzzzzV zzzzzyzzzzzzzzk zzzzzyzzzzzzzzs zzzzzyzzzzzzzzw zzzzzyzzzzzzzzy zzzzzyzzzzzzzzz zzzzzyzzzzzzzzzV zzzzzyzzzzzzzzzk zzzzzyzzzzzzzzzs zzzzzyzzzzzzzzzw zzzzzyzzzzzzzzzy zzzzzyzzzzzzzzzz zzzzzyzzzzzzzzzzV

with the following:

console.log(demo(10));

yo yv z z6 zB zG zK zO zR zU zX zZ zb zd zf zh zi zj zk zl zm zn zo zp zq zqz zr zrt zs zsn zt zth zu zub zv zvU zvv zw zwO zwk zx zxI zxY zxn zy zyC zyN zyX zyg zyo zyv zz zz6 zzB zzG zzK zzO zzR zzU zzX zzZ zzb zzd zzf zzh zzi zzj zzk zzl zzm zzn zzo zzp zzq zzqz zzr zzrt zzs zzsn zzt zzth zzu zzub zzv zzvU zzvv zzw zzwO zzwk zzx zzxI zzxY zzxn zzy zzyC zzyN zzyX zzyg zzyo zzyv zzz

In the first example, each time you ask mudder for a string, it halves the remaining space at the end of the lexical space, leading to the awful wasteful result.

In the second example, you still ask mudder for a single string, but now you ask it to subdivide the remaining space into ten pieces. So each successive string is, instead of being 50% to the end of the lexical space, only 10%.

In your situation, where you might want to insert a million entries:

console.log(demo(1e6));

// yo yo00H yo00Y yo00p yo01 yo01H yo01Y yo01p yo02 yo02H yo02Y yo02p yo03 yo03H yo03Y yo03p yo04 yo04H yo04Y yo04p yo05 yo05H yo05Y yo05p yo06 yo06H yo06Y yo06p yo07 yo07H yo07Y yo07p yo08 yo08H yo08Y yo08p yo09 yo09H yo09Y yo09p yo0A yo0AH yo0AY yo0Ap yo0B yo0BH yo0BY yo0Bp yo0C yo0CH yo0CY yo0Cp yo0D yo0DH yo0DY yo0Dp yo0E yo0EH yo0EY yo0Ep yo0F yo0FH yo0FY yo0Fp yo0G yo0GH yo0GY yo0Gp yo0H yo0HH yo0HY yo0Hp yo0I yo0IH yo0IY yo0Ip yo0J yo0JH yo0JY yo0Jp yo0K yo0KH yo0KY yo0Kp yo0L yo0LH yo0LY yo0Lp yo0M yo0MH yo0MY yo0Mp yo0N yo0NH yo0NY yo0Np yo0O yo0OH yo0OY yo0Op yo0P

These are the first hundred strings you'd get while subdividing a million-fold at each string.

@joaonice
Copy link

Hi @fasiha. Imagine the opposite situation.

Imagine you have an initial list with 100 items. You then want to add items 1 by 1 to the beginning of the list. You add 10,000 items 1 by 1. If for each item, you use the method for finding the first element. The string length grows extremely quickly and winds up in the form:

With the demo function like this:

function demo(numDivisions) {
  var arr = ['yo'];
  for (var i = 0; i < 100; i++) {
    arr.unshift(mudder.base62.mudder('', arr[0], 1, undefined, numDivisions)[0]);
  };
  return arr.join(' ');
}

Output for demo(undefined):

00000000000000000001 00000000000000000003 00000000000000000007 0000000000000000000F 0000000000000000000V 0000000000000000001 0000000000000000003 0000000000000000007 000000000000000000F 000000000000000000V 000000000000000001 000000000000000003 000000000000000007 00000000000000000F 00000000000000000V 00000000000000001 00000000000000003 00000000000000007 0000000000000000F 0000000000000000V 0000000000000001 0000000000000003 0000000000000007 000000000000000F 000000000000000V 000000000000001 000000000000003 000000000000007 00000000000000F 00000000000000V 00000000000001 00000000000003 00000000000007 0000000000000F 0000000000000V 0000000000001 0000000000003 0000000000007 000000000000F 000000000000V 000000000001 000000000003 000000000007 00000000000F 00000000000V 00000000001 00000000003 00000000007 0000000000F 0000000000V 0000000001 0000000003 0000000007 000000000F 000000000V 000000001 000000003 000000007 00000000F 00000000V 00000001 00000003 00000007 0000000F 0000000V 0000001 0000003 0000007 000000F 000000V 000001 000003 000007 00000F 00000V 00001 00003 00007 0000F 0000V 0001 0003 0007 000F 000V 001 003 007 00F 00V 01 03 07 0F 0V 1 3 7 F U yo

Output for demo(100):

000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000N 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000E 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000N 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000E 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000N 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000002 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000004 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000008 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000E 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000N 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000c 0000000000000000000000000000000000000000000000000000000000000000000000000000000000000001 000000000000000000000000000000000000000000000000000000000000000000000000000000000000002 00000000000000000000000000000000000000000000000000000000000000000000000000000000000004 0000000000000000000000000000000000000000000000000000000000000000000000000000000000008 00000000000000000000000000000000000000000000000000000000000000000000000000000000000E 0000000000000000000000000000000000000000000000000000000000000000000000000000000000N 000000000000000000000000000000000000000000000000000000000000000000000000000000000c 00000000000000000000000000000000000000000000000000000000000000000000000000000001 0000000000000000000000000000000000000000000000000000000000000000000000000000002 000000000000000000000000000000000000000000000000000000000000000000000000000004 00000000000000000000000000000000000000000000000000000000000000000000000000008 000000000000000000000000000000000000000000000000000000000000000000000000000E 00000000000000000000000000000000000000000000000000000000000000000000000000N 0000000000000000000000000000000000000000000000000000000000000000000000000c 000000000000000000000000000000000000000000000000000000000000000000000001 00000000000000000000000000000000000000000000000000000000000000000000002 0000000000000000000000000000000000000000000000000000000000000000000004 000000000000000000000000000000000000000000000000000000000000000000008 0000000000000000000000000000000000000000000000000000000000000000000E 000000000000000000000000000000000000000000000000000000000000000000N 00000000000000000000000000000000000000000000000000000000000000000c 0000000000000000000000000000000000000000000000000000000000000001 000000000000000000000000000000000000000000000000000000000000002 00000000000000000000000000000000000000000000000000000000000004 0000000000000000000000000000000000000000000000000000000000008 00000000000000000000000000000000000000000000000000000000000E 0000000000000000000000000000000000000000000000000000000000N 000000000000000000000000000000000000000000000000000000000c 00000000000000000000000000000000000000000000000000000001 0000000000000000000000000000000000000000000000000000002 000000000000000000000000000000000000000000000000000004 00000000000000000000000000000000000000000000000000008 000000000000000000000000000000000000000000000000000E 00000000000000000000000000000000000000000000000000N 0000000000000000000000000000000000000000000000000c 000000000000000000000000000000000000000000000001 00000000000000000000000000000000000000000000002 0000000000000000000000000000000000000000000004 000000000000000000000000000000000000000000008 0000000000000000000000000000000000000000000E 000000000000000000000000000000000000000000N 00000000000000000000000000000000000000000c 0000000000000000000000000000000000000001 000000000000000000000000000000000000002 00000000000000000000000000000000000004 0000000000000000000000000000000000008 00000000000000000000000000000000000E 0000000000000000000000000000000000N 000000000000000000000000000000000c 00000000000000000000000000000001 0000000000000000000000000000002 000000000000000000000000000004 00000000000000000000000000008 000000000000000000000000000E 00000000000000000000000000N 0000000000000000000000000c 000000000000000000000001 00000000000000000000002 0000000000000000000004 000000000000000000008 0000000000000000000E 000000000000000000N 00000000000000000c 0000000000000001 000000000000002 00000000000004 0000000000008 00000000000E 0000000000N 000000000c 00000001 0000002 000004 00008 000D 00M 0b yo

The problem is, I need to create new items and insert them at the top of the list. What's your suggestion? Thank you

@fasiha fasiha reopened this May 19, 2020
@fasiha
Copy link
Owner Author

fasiha commented May 19, 2020

@joaonice thanks for reaching out with this knotty problem. Consider this:

function demo(numDivisions) {
  var arr = ['yo'];
  for (var i = 0; i < 50; i++) {
    arr.unshift(mudder.base62.mudder(arr[0], '0', 1, undefined, numDivisions)[0]);
  };
  return arr.join(' ');
}

console.log(demo())
// 0000000001 0000000003 0000000007 000000000F 000000000V 000000001 000000003 000000007 00000000F 00000000V 00000001 00000003 00000007 0000000F 0000000V 0000001 0000003 0000007 000000F 000000V 000001 000003 000007 00000F 00000V 00001 00003 00007 0000F 0000V 0001 0003 0007 000F 000V 001 003 007 00F 00V 01 03 07 0F 0V 1 3 7 F U yo

console.log(demo(100))
// B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y yo

The difference between this demo and yours is

  • you have mudder('', arr[0], ...), whereas
  • I have mudder(arr[0], '0', ...).

With this change, you get the expected behavior. This change is important because Mudder treats the numDivisions in a really dumb way: it assumes you want to split the space between start and end into numDivisions divisions and then take a step from start towards end.

So you were asking Mudder: start='' (which for the base62 symbol table implies start='0'), end='yo', split the space between them into 100 divisions, and give me the first step from start to `end. So you take a tiny step from '0', and that eventually leads to the extremely long strings you see: 🙅‍♂️.

Meanwhile, I'm asking Mudder: start='yo', end='0', and the rest is the same: split the space between them into 100 divisions and give me the first step from start to end. This asks Mudder to take a tiny step from 'yo', which leads to what you expect to see.

Hopefully this solves your problem?


I do agree that this is 😑… I will update the documentation to stress this directionality in the behavior of numDivisions but it's frustrating that, to take advantage of numDivisions, you have to know the "leftmost" value in your symbol table ('0' in base62's case above), whereas otherwise you could just leave it empty and Mudder would figure out the default argument for you.

It would be nice if you could ask Mudder for start='', end='yo', numDivisions=100, and some other argument to tell it whether to go from startend or vice versa to get the behavior you want… I'm not super-excited to add yet another argument to mudder though.

How annoying is it to have to specifically ask to go from yo'0' and lose the inference of the default value? I'll leave this issue open for more comments.

fasiha added a commit that referenced this issue May 19, 2020
@fasiha
Copy link
Owner Author

fasiha commented May 19, 2020

@joaonice what do you think of the extra documentation per 8682db1 ?

As I said above, I'll leave this issue open for further comments!

@fasiha fasiha closed this as completed May 28, 2020
@pie6k
Copy link

pie6k commented May 20, 2022

Is it possible to compute only first element from muddler if I only need first one?

For example:

const numDivisions = 10000;
mudder.base62.mudder(arr[0], '0', 1, undefined, numDivisions)[0]

In above example, I only need 1st element, but muddler will still compute 10000 even tho almost all of them are never used.

Use case:

I have sorted list, where first possible index would be aaaaaaa and last zzzzzzz. Let's say I have already sorted items there with their sort index somewhere in the middle.

I now want to put new item on top of the list. So I'd do "take first index we know and 'reduce' it by say 10000 possible slots and use it.

@fasiha
Copy link
Owner Author

fasiha commented May 20, 2022

In above example, I only need 1st element, but muddler will still compute 10000 even tho almost all of them are never used.

@pie6k I may be misunderstanding but I don't think the statement above is correct? For mudder.base62.mudder(arr[0], '0', 1, undefined, numDivisions), Mudder will compute just one string (the third argument), not numDivisions=10000. In longLinspace you see the index of the for loop runs from 1 to N (please forgive the me from several years ago with my weakness for mathematics-inspired short variable names, and the lack of VS Code's "rename symbol" 🙄), and N is the third argument to mudder, i.e., 1.

What the numDivisions=10000 does is, it changes the lexicographic spacing between the start and the end. The default numDivisions=2 means the "distance" between start and end is divided by two to get your new string, so you get the mean a + (b - a) / 2 = (a + b) / 2. So numDivisions=10000 means you get a + (b - a) / 10000.

If I'm understanding the question, Mudder should already be doing what you want. Does that make sense or am I misunderstanding you?

@pie6k
Copy link

pie6k commented May 20, 2022

Thanks for response.

I did put huge numDivisions because I want to 'use' my free namespace space below current lowest index slowly.

Let's say aaaaaaa is the lowest possible index. Let's say currently first item is nnnnnnn and I want to insert some new item before it.

So I look for string between aaaaaaa and nnnnnnn. But I don't really want a string in the middle of them, I want string just a bit before nnnnnnn, say mmmnnnn, not eeeeeeee.

If I use string in the middle (eeeeeeeee) and then after a while I'll want to insert another item before it, I'll then need to find an item between eeeeeeee and aaaaaaaa > which would be like cccccccc. Means only after 2 iterations I moved very close to minimal index already

I also don't want 'new lowest' index to be directly next to previous one. Say previous lowest was nnnnnnnn and new one is nnnnnnnm > as now I dont have a lot of 'free spots' between them if I want to use them, thus I picked 10000 as division point as some arbitral number that will give allow me to lower the lowest index 10000 times before I'll reach lowest allowed index.

@fasiha
Copy link
Owner Author

fasiha commented May 21, 2022

@pie6k yeahhhh I think I understand! I think what you're pointing out is the following problem:

mudder.base62.mudder('n', 'a', 1, undefined, 100) // [ 'm' ]
mudder.base62.mudder('n', 'a', 1, undefined, 1000) // [ 'm' ]
mudder.base62.mudder('n', 'a', 1, undefined, 10_000) // [ 'm' ]

As you increase numDivisions from 100 to 10_000, the result stays the same—and that is very annoying because you were hoping to choose more or less "space" between n and the new value and Mudder isn't doing that.

Am I finally understanding you?

If so, I just added a feature that helps with that (I haven't released it yet, it's in PR #21):

mudder.base62.mudder('n', 'a', 1, undefined, 10, 4) // [ 'lh' ]
mudder.base62.mudder('n', 'a', 1, undefined, 100, 4) // [ 'mrw' ]
mudder.base62.mudder('n', 'a', 1, undefined, 1000, 4) // [ 'mzC' ]
mudder.base62.mudder('n', 'a', 1, undefined, 10000, 4) // [ 'mzv0' ]

Notice that there's another parameter at the end of these calls. In the code that's called placesToKeep, and this allows Mudder to skip truncating strings aggressively. With this placesToKeep=4, as you increase numDivisions, you can now tightly control how much space you want between n, the result, and a.

Let me know if that helps, or if I'm still misunderstanding you and this won't help you.

(I'll probably still release this as v2.1 anyway since it's a nice feature. It only happens when you're going "backwards" like from n to a. You don't need placesToKeep if you were going from n to z: in that case, as numDivisions increases, the return value gets longer because the first "digit" is the same as the start's first digit, and the second or third or fourth digits are "zero". That was a surprising difference between increasing and decreasing strings that I didn't recognize!)

@Inviz
Copy link

Inviz commented Jun 3, 2022

Love the library and this new feature. However it is a little hard to understand from the readme alone how things work exactly. I just decided to blindly use it and hope for the best. However I think the library itself has a lot of thought put to it, it would be great if it was possible to understand the API and implications of all the different parameters easier. Thanks!

@fasiha
Copy link
Owner Author

fasiha commented Jun 5, 2022

it would be great if it was possible to understand the API and implications of all the different parameters easier

@Inviz I definitely see what you mean, thanks so much for raising this as an issue!! The library started out with a small API and it has steadily grown as our understanding of the various ways it can be used has evolved and so the terse API docs are no longer sufficient to explain all the arguments and when you might use them! I'm planning on adding a big "Examples" section to the README to address this. I'll tag you in the PR when it lands just in case you're available for a review. (I'm not sure when that'll be, I feel so stupid about how busy I am lately.)

But I will say that, for the vast majority of use cases, the first two or three arguments are all you need (e.g., base62.mudder(old, new) and maybe base62.mudder(old, new, n)); the new args numDivisions and placesToKeep are useful when you're thinking about long-term behavior under edge conditions.

@Inviz
Copy link

Inviz commented Jun 6, 2022

Yes definitely, the base case seems very easy. Although your lib offers very distinctive good features, and I'd love to understand the correct way to use them. So if I may make a suggestion, what i'd love in readme is typical use cases and options for them. Something like

  1. Reordering items occasionally in a small dataset with priority on smaller string
    => mudder(...), because ...
  2. Reodering items often, small dataset, min-string size 3 char
  3. Reodering items occasionally, big dataset, dont care about min size.

Would be superhelpful for people to find their use case

@fasiha fasiha mentioned this issue Jul 19, 2022
@fasiha
Copy link
Owner Author

fasiha commented Jul 19, 2022

@Inviz sorry it took a while, can you maybe take a look at #22 and tell me what you think? It doesn't explicitly address your bullets (1) and (2) because I'm not sure what to say about the case where you have frequent inserts vs infrequent inserts—the behavior of the growth of strings depends on whether the inserts are always on the same side of an interval (for example,

var start = 'n';
var end='l';
for (let i=0; i < 10; i++) {end = mudder.base62.mudder(start, end)[0];}
console.log(end)

that is, constantly splitting the space between start and end towards end), or if they're random. If they're random, then string length will grow slowly and calmly. If they're not random but always on one side, then you can get sudden growth of string length and that's kind of complicated to think about—how frequently does this happen, how many subdivisions on the same side happens, etc.

But I hope #22 is useful by answering some more straightforward questions about how to use the library? Totally open to feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants