Datamon bundle list should include labels #235

Open
galvare2 opened this issue Jul 18, 2019 · 5 comments
Labels: CLI (issues around CLI for datamon); feature-request (a feature is a set of net new related use cases); P1 (this is next and will move to P0 once planned); usability (issues to improve use experience)

Comments

@galvare2

galvare2 commented Jul 18, 2019

Right now datamon bundle list returns something like this:

datamon bundle list --repo flood-postgres
Using config file: /home/developer/.datamon/datamon.yaml
1N9KQjinEjRGtKovxksVG6biS3c , 2019-06-25 18:37:37.934833707 +0000 UTC , First version of flood pg data
1N9jaJkPxbnZyPlz5HL16N3C5xg , 2019-06-25 22:04:28.307925983 +0000 UTC , Greatly prune flood PG data to reduce backup time
1NCMOUOqdxqkrXMCIkqmZXbjIIZ , 2019-06-26 20:23:12.978399405 +0000 UTC , Backup from deployed data
1OCQ2W2lcIoRWQnBvk8r5D20J6B , 2019-07-18 19:41:31.151859934 +0000 UTC , Update flood alert thresholds
1OCf7RUcyfgecxrMUcwuko8Loex , 2019-07-18 21:45:31.68074457 +0000 UTC , Update flood alert thresholds
1OCg1ZgwBgdx0Oz3K1gjdGAqRuq , 2019-07-18 21:52:57.59154874 +0000 UTC , Update flood alert thresholds

I would like to see it return something like this:

datamon bundle list --repo flood-postgres
Using config file: /home/developer/.datamon/datamon.yaml
1N9KQjinEjRGtKovxksVG6biS3c , v1.0.0, 2019-06-25 18:37:37.934833707 +0000 UTC , First version of flood pg data
1N9jaJkPxbnZyPlz5HL16N3C5xg , v1.0.1, 2019-06-25 22:04:28.307925983 +0000 UTC , Greatly prune flood PG data to reduce backup time
1NCMOUOqdxqkrXMCIkqmZXbjIIZ , <no label>, 2019-06-26 20:23:12.978399405 +0000 UTC , Backup from deployed data
1OCQ2W2lcIoRWQnBvk8r5D20J6B , <no label>, 2019-07-18 19:41:31.151859934 +0000 UTC , Update flood alert thresholds
1OCf7RUcyfgecxrMUcwuko8Loex , <no label>, 2019-07-18 21:45:31.68074457 +0000 UTC , Update flood alert thresholds
1OCg1ZgwBgdx0Oz3K1gjdGAqRuq , v1.0.2, 2019-07-18 21:52:57.59154874 +0000 UTC , Update flood alert thresholds
@galvare2 galvare2 added CLI issues around CLI for datamon usability Issues to improve use experience. labels Jul 18, 2019
@ransomw1c ransomw1c self-assigned this Jul 19, 2019
@ransomw1c ransomw1c added the feature-request A feature is a set of net new related use cases label Jul 19, 2019
@ransomw1c
Contributor

ransomw1c commented Jul 19, 2019

skimmed over some of git to understand what needs to be done here, although i'm erring toward a semi-naive implementation:

listing with labels will get label data and bundle data, put the labels in a map of slices keyed by bundle id, then the listing of bundles will look up each bundle's label(s) in the map.

sounds like a plan, @kerneltime ?
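
The plan above could be sketched roughly as follows. This is only an illustration: the `Bundle` and `Label` types and the `groupLabelsByBundle` helper are hypothetical stand-ins, not datamon's actual types or API.

```go
package main

import "fmt"

// Bundle and Label are hypothetical stand-ins for datamon's metadata types.
type Bundle struct {
	ID      string
	Message string
}

type Label struct {
	Name     string
	BundleID string
}

// groupLabelsByBundle builds a map from bundle ID to the names of all
// labels pointing at that bundle, preserving the input order of labels.
func groupLabelsByBundle(labels []Label) map[string][]string {
	byBundle := make(map[string][]string)
	for _, l := range labels {
		byBundle[l.BundleID] = append(byBundle[l.BundleID], l.Name)
	}
	return byBundle
}

func main() {
	bundles := []Bundle{
		{ID: "1N9KQjinEjRGtKovxksVG6biS3c", Message: "First version of flood pg data"},
		{ID: "1NCMOUOqdxqkrXMCIkqmZXbjIIZ", Message: "Backup from deployed data"},
	}
	labels := []Label{{Name: "v1.0.0", BundleID: "1N9KQjinEjRGtKovxksVG6biS3c"}}

	byBundle := groupLabelsByBundle(labels)
	for _, b := range bundles {
		// A missing key yields a nil slice, i.e. a bundle with no labels.
		fmt.Printf("%s , %v , %s\n", b.ID, byBundle[b.ID], b.Message)
	}
}
```

The map is built in one pass over the labels, so the bundle listing itself only does a lookup per bundle.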

@ransomw1c
Contributor

@galvare2 what to do in the case of multiple labels referring to the same bundle?

although we're not using the exact same format, git

commit 72c11afbbbb0ea7ec8edaf8601d203977e5ea7f6 (tag: 0.5, d20190625-storageputparamtype--wip)
Merge: b6ee99e 90cb6e6
Author: ransomw1c <47995478+ransomw1c@users.noreply.github.com>
Date:   Mon Jun 24 12:51:11 2019 -0700

suggests parentheses, so

1NCMOUOqdxqkrXMCIkqmZXbjIIZ , <no label>, 2019-06-26 20:23:12.978399405 +0000 UTC , Backup from deployed data
1N9KQjinEjRGtKovxksVG6biS3c , (v1.0.0), 2019-06-25 18:37:37.934833707 +0000 UTC , First version of flood pg data
1OCg1ZgwBgdx0Oz3K1gjdGAqRuq , (v1.0.2; latest), 2019-07-18 21:52:57.59154874 +0000 UTC , Update flood alert thresholds

is one sketch that accounts for zero, one, or two labels. note that the parenthesized list uses a different delimiter than the outer list...

... except we indeed don't want to roll our own serialization format. so perhaps something more like

1NCMOUOqdxqkrXMCIkqmZXbjIIZ , <no label> , 2019-06-26 20:23:12.978399405 +0000 UTC , Backup from deployed data
1N9KQjinEjRGtKovxksVG6biS3c , v1.0.0 , 2019-06-25 18:37:37.934833707 +0000 UTC , First version of flood pg data
1OCg1ZgwBgdx0Oz3K1gjdGAqRuq , v1.0.2;latest , 2019-07-18 21:52:57.59154874 +0000 UTC , Update flood alert thresholds

where we continue to use CSV and have a separate delimiter for labels?


note to self: the above will require validation (and coercion for backward-compatibility) of labels to ensure that the labels themselves don't contain the delim char.
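
A minimal sketch of the label column and the validation it would need, assuming `;` between labels and `,` between columns as in the listing above. The names `validateLabel` and `formatLabels` are made up for illustration, and the backward-compatibility coercion of existing labels is not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	labelDelim = ";" // separates labels inside the label column
	fieldDelim = "," // separates the columns of the listing
	noLabel    = "<no label>"
)

// validateLabel rejects label names that would make the listing ambiguous.
func validateLabel(name string) error {
	if name == "" {
		return fmt.Errorf("label name is empty")
	}
	if strings.Contains(name, labelDelim) || strings.Contains(name, fieldDelim) {
		return fmt.Errorf("label %q contains a reserved delimiter", name)
	}
	return nil
}

// formatLabels renders the label column for one bundle.
func formatLabels(names []string) string {
	if len(names) == 0 {
		return noLabel
	}
	return strings.Join(names, labelDelim)
}

func main() {
	fmt.Println(formatLabels(nil))
	fmt.Println(formatLabels([]string{"v1.0.2", "latest"}))
	fmt.Println(validateLabel("bad;label"))
}
```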

@kerneltime
Contributor

kerneltime commented Jul 20, 2019

The choice to include labels when listing bundles should be optional. There is a performance implication for it.
The current model in place allows us to have confidence that a bundle (json) once written is never updated except for the labels. Any performance improvements at scale will need that to change or a significant engineering spend.
@galvare2 is it fair to say that when listing bundles you only care about the ones that have a label? An alternative is to only list the bundles that have labels in the format laid out above by you and @ransomw1c.
I would like to better understand why you want it this way and if there is a way to meet that need without introducing code that will work slower than the time it takes to list bundles (no "join").
There are some other features I have been thinking of that will allow queries and richer enumerations to work at scale but I do not think that is an urgent need. Let's talk more next week.
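
One way to keep the label lookup optional is sketched below, as an assumption only: `ListOptions`, `WithLabels`, and `listBundles` are hypothetical names, not existing datamon flags or functions. The label fetch is passed in as a function so the default path never pays for it.

```go
package main

import (
	"fmt"
	"strings"
)

// ListOptions is a hypothetical options struct; WithLabels is off by default.
type ListOptions struct {
	WithLabels bool
}

// listBundles returns one line per bundle ID. fetchLabels is called only
// when opts.WithLabels is set, so a plain listing does no label lookup.
func listBundles(ids []string, fetchLabels func() map[string][]string, opts ListOptions) []string {
	lines := make([]string, 0, len(ids))
	if !opts.WithLabels {
		return append(lines, ids...)
	}
	byBundle := fetchLabels()
	for _, id := range ids {
		col := "<no label>"
		if names := byBundle[id]; len(names) > 0 {
			col = strings.Join(names, ";")
		}
		lines = append(lines, id+" , "+col)
	}
	return lines
}

func main() {
	fetch := func() map[string][]string {
		fmt.Println("fetching labels")
		return map[string][]string{"1N9KQjinEjRGtKovxksVG6biS3c": {"v1.0.0"}}
	}
	ids := []string{"1N9KQjinEjRGtKovxksVG6biS3c", "1NCMOUOqdxqkrXMCIkqmZXbjIIZ"}
	for _, line := range listBundles(ids, fetch, ListOptions{WithLabels: true}) {
		fmt.Println(line)
	}
}
```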

@ransomw1c
Contributor

> The choice to include labels when listing bundles should be optional. There is a performance implication for it.

i agree

> other features I have been thinking of that will allow queries

to describe, iirc: the plan here is to maintain local (and remote?) indices via a db like badger to support more performant join-like (and otherwise) queries on metadata.

i've perused some of git regarding tags, the analog of datamon labels, and it relies on a "reflog," where refs are the internal unifying abstraction for branch tips and tags. haven't fully grokked all implementation details, yet the reflog is distinct from the full-on indexing as far as i can tell so far.

> why you want it this way

i arrived at this suggestion as a way to get the functionality implemented immediately without making this issue dependent on the indexing decision-making. moreover, reading git suggests that, when the reflog isn't used, the less performant lookup is what git does, and git is probably the most significant prior art for datamon as it exists currently (mildly foggy on this latter claim).

i agree that there are workarounds. here's a stopgap Zsh script that does the non-performant lookup:

#!/bin/zsh
# Stopgap: join `datamon bundle list` with `datamon label list` by bundle ID.
# Build the binary first: cd $DATAMON_REPO && make build-datamon-mac

BIN=out/datamon.mac

repo_name=

# Only -r <repo> is supported.
while getopts r: opt; do
    case $opt in
        (r)
            repo_name="$OPTARG"
            ;;
        (\?)
            print "Bad option, aborting."
            exit 1
            ;;
    esac
done
(( OPTIND > 1 )) && shift $(( OPTIND - 1 ))

if [[ -z $repo_name ]]; then
    repo_name='ransom-datamon-test-repo'
fi

# zsh runs the last stage of a pipeline in the current shell, so the
# arrays filled inside the while loops below remain visible afterwards.
typeset -a bundleIDs
typeset -A bundleIDsToListLines
"$BIN" bundle list --repo "$repo_name" 2>&1 | \
    grep -v '^Using config file' | \
    while read -r bundle_list_line; do
        # The bundle ID is the first comma-separated field.
        bundleID=$(print -r -- "$bundle_list_line" | cut -d',' -f 1 | tr -d ' ')
        # Prepending reverses the order in which bundle list printed the lines.
        bundleIDs=("$bundleID" $bundleIDs)
        bundleIDsToListLines[$bundleID]=$bundle_list_line
    done

typeset -A bundleIDsToLabels
"$BIN" label list --repo "$repo_name" 2>&1 | \
    grep -v '^Using config file' | \
    while read -r label_list_line; do
        # Field 1 is the label name, field 2 the bundle ID it points at.
        label=$(print -r -- "$label_list_line" | cut -d',' -f 1 | tr -d ' ')
        bundleID=$(print -r -- "$label_list_line" | cut -d',' -f 2 | tr -d ' ')
        if [[ -z ${bundleIDsToLabels[$bundleID]} ]]; then
            bundleIDsToLabels[$bundleID]="$label"
        else
            # Several labels on one bundle are joined with ';'.
            existingLabelList=${bundleIDsToLabels[$bundleID]}
            bundleIDsToLabels[$bundleID]="$label;$existingLabelList"
        fi
    done

# Print each bundle line with its label list appended as a last column
# (empty when the bundle has no label).
for bundleID in $bundleIDs; do
    bundle_list_line=${bundleIDsToListLines[$bundleID]}
    labelList=${bundleIDsToLabels[$bundleID]}
    print -r -- "$bundle_list_line , $labelList"
done

@galvare2
Author

@kerneltime this request is more of a "nice to have" than an actual need, as I can get the information from the other CLI commands. I guess the use case was just to be able to see all information about bundles and labels for a repo in one place, including bundles with no labels, for example in order to verify that uploads I ran went through correctly. But this should be considered a low priority because label list does almost everything that is needed for this anyway.

@ransomw1c ransomw1c added the P1 This is next and will move to P0 once planned. label Jul 26, 2019
fredbi added a commit that referenced this issue Dec 20, 2019
* fixes #235

Signed-off-by: Frederic BIDON <frederic@oneconcern.com>
fredbi added a commit that referenced this issue Dec 31, 2019
* fixes #235

Signed-off-by: Frederic BIDON <frederic@oneconcern.com>
fredbi added a commit that referenced this issue Jan 3, 2020
* fixes #235

Signed-off-by: Frederic BIDON <frederic@oneconcern.com>
fredbi added a commit that referenced this issue Jan 10, 2020
* fixes #235

Signed-off-by: Frederic BIDON <frederic@oneconcern.com>
fredbi added a commit that referenced this issue Jan 24, 2020
* fixes #235

Signed-off-by: Frederic BIDON <frederic@oneconcern.com>
fredbi added a commit to fredbi/datamon that referenced this issue Dec 1, 2022
* fixes oneconcern#235

Signed-off-by: Frederic BIDON <frederic@oneconcern.com>
3 participants