Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

logcli: Add parallel flags #8518

Merged
merged 15 commits into from
Feb 16, 2023
Merged

logcli: Add parallel flags #8518

merged 15 commits into from
Feb 16, 2023

Conversation

angaz
Copy link
Contributor

@angaz angaz commented Feb 13, 2023

What this PR does / why we need it:
Requesting a large range, a day, for example, takes a very long time to download. You can download in parallel by starting many processes, but this just made my job so much easier.

Continuation of #7543

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 13, 2023
@angaz angaz marked this pull request as ready for review February 13, 2023 15:11
@angaz angaz requested review from JStickler and a team as code owners February 13, 2023 15:11
@angaz
Copy link
Contributor Author

angaz commented Feb 13, 2023

@jeschkies

Hi, I have updated the code and made a new PR.

Basically what is new here is that there's two new flags:

  • merge-parts
  • keep-parts

What this does is it will read the parts in order and write the output to stdout. And it will do this in order. So it will act similar to the non-parallel usage, and then after each file has been output, it will delete the file.

keep-parts overrides this behaviour, keeping the part files instead of deleting them in case the user wants to keep the files.

cmd/logcli/main.go Outdated Show resolved Hide resolved
cmd/logcli/main.go Outdated Show resolved Hide resolved
cmd/logcli/main.go Show resolved Hide resolved
cmd/logcli/main.go Outdated Show resolved Hide resolved
pkg/logcli/query/part_file.go Outdated Show resolved Hide resolved
pkg/logcli/query/query.go Show resolved Hide resolved
pkg/logcli/query/query.go Show resolved Hide resolved
pkg/logcli/query/query.go Show resolved Hide resolved
pkg/logcli/query/query.go Outdated Show resolved Hide resolved
pkg/logcli/query/query.go Show resolved Hide resolved
Copy link
Contributor

@JStickler JStickler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Docs squad] Lots of missing punctuation and some rewording suggestions.

Notice that when using --from and --to then ensure to use RFC3339Nano time
format, but without timezone at the end. The local timezone will be added
automatically or if using --timezone flag.
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will
Note that when using --from and --to you must use the RFC3339Nano time format, but without timezone at the end. The local timezone will

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JStickler Hi, what you are commenting on here is generated, it's not changes that I made. It is output of the help text of the application. A few are from new stuff that I added and I will change that, do you want me to change all of these things? Maybe it would be better to make a PR yourself to change it? I feel it's a bit out of the scope of my contribution here. IDK what do you think?

format, but without timezone at the end. The local timezone will be added
automatically or if using --timezone flag.
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will
be added automatically or if using --timezone flag.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
be added automatically or if using --timezone flag.
be added automatically or when using the --timezone flag.

--merge-parts
--keep-parts

Refer to the help of these specific flags to understand what each of them do.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Refer to the help of these specific flags to understand what each of them do.
Refer to the help for each flag for details about what each of them do.

--merge-parts
'my-query'

This will start 10 workers, and they will each start downloading 15 minute slices of the specified time range.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
This will start 10 workers, and they will each start downloading 15 minute slices of the specified time range.
This example will start 10 workers, and they will each start downloading 15 minute slices of the specified time range.


If you do not specify the --merge-parts flag, the part files will be downloaded, and logcli will exit, and you can process the files as you
wish. With the flag specified, the part files will be read in order, and the output printed to the terminal. The lines will be printed as
soon as the next part is complete, you don't have to wait for all the parts to download before getting output. --merge-parts will remove
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
soon as the next part is complete, you don't have to wait for all the parts to download before getting output. --merge-parts will remove
soon as the next part is complete, you don't have to wait for all the parts to download before getting output. the --merge-parts flag will remove

Exclude labels given the provided key during output.
--include-label=INCLUDE-LABEL ...
Include labels given the provided key during output.
--labels-length=0 Set a fixed padding to labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--labels-length=0 Set a fixed padding to labels
--labels-length=0 Set a fixed padding to labels.

--store-config="" Execute the current query using a configured storage from a given Loki configuration file.
--remote-schema Execute the current query using a remote schema retrieved using the configured storage in the given Loki
configuration file.
--colored-output Show output with colored labels
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--colored-output Show output with colored labels
--colored-output Show output with colored labels.

--remote-schema Execute the current query using a remote schema retrieved using the configured storage in the given Loki
configuration file.
--colored-output Show output with colored labels
-t, --tail Tail the logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-t, --tail Tail the logs
-t, --tail Tail the logs.

configuration file.
--colored-output Show output with colored labels
-t, --tail Tail the logs
-f, --follow Alias for --tail
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-f, --follow Alias for --tail
-f, --follow Alias for --tail.

--colored-output Show output with colored labels
-t, --tail Tail the logs
-f, --follow Alias for --tail
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering.

Copy link
Contributor Author

@angaz angaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeschkies I have made a few comments on some of your comments and will hopefully work on some of your suggestions tomorrow.

pkg/logcli/query/part_file.go Outdated Show resolved Hide resolved
pkg/logcli/query/query.go Show resolved Hide resolved
pkg/logcli/query/query.go Show resolved Hide resolved
pkg/logcli/query/query.go Outdated Show resolved Hide resolved
Notice that when using --from and --to then ensure to use RFC3339Nano time
format, but without timezone at the end. The local timezone will be added
automatically or if using --timezone flag.
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JStickler Hi, what you are commenting on here is generated, it's not changes that I made. It is output of the help text of the application. A few are from new stuff that I added and I will change that, do you want me to change all of these things? Maybe it would be better to make a PR yourself to change it? I feel it's a bit out of the scope of my contribution here. IDK what do you think?

@JStickler
Copy link
Contributor

what you are commenting on here is generated, it's not changes that I made. ... I feel it's a bit out of the scope of my contribution here.
@SN9NV I'll leave that up to you. I can make a PR later if that's easier for you.

@angaz
Copy link
Contributor Author

angaz commented Feb 15, 2023

@JStickler I have added some more docs, can you please check the language again. I believe I have solved all your previous comments, but it's possible I missed one because there were so many. cmd/logcli/main.go would be the right file to check for the documentation comments. Thank you.

@jeschkies I have pushed a few commits, I believe everything should be good now.

I reworked the part file to use some better names, I didn't like Complete, it didn't really feel like a command to me, Finalize feels much better.

@jeschkies
Copy link
Contributor

@SN9NV thanks for your effort. We are close and just need to make a decision for #8518 (comment).

Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work. Please address my last comment about documenting the limit and we are good to go from my side.

I'll follow up with a "proper" unlimited option as this will require more discussion.

pkg/logcli/query/query.go Show resolved Hide resolved
cmd/logcli/main.go Outdated Show resolved Hide resolved
cmd/logcli/main.go Outdated Show resolved Hide resolved
cmd/logcli/main.go Outdated Show resolved Hide resolved
@angaz
Copy link
Contributor Author

angaz commented Feb 16, 2023

Cool, I think everything should be done.

Copy link
Contributor

@jeschkies jeschkies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

@jeschkies jeschkies merged commit 42522d7 into grafana:main Feb 16, 2023
@angaz angaz deleted the logcli_parallel branch February 16, 2023 14:15
@angaz
Copy link
Contributor Author

angaz commented Feb 16, 2023

Thanks for all your reviews and comments, I hope this will be useful to people.

@jeschkies
Copy link
Contributor

jeschkies commented Feb 16, 2023

@SN9NV I've just ran it on my machine and found the parallel execution to be slower than the normal execution. I'll take a closer look tomorrow.

EDIT: Ok, I think there might be some lock contention or deadlock as my CPU is pretty much idle.

@angaz
Copy link
Contributor Author

angaz commented Feb 16, 2023

That is strange. For me it was definitely faster, a download which took many hours, like over night, took just tens of minutes with the parallelism. And I could see the time it took to download a single part is the time it took to download the same period sequentially. I did have autoscaling in Kubernetes. Maybe it is a problem if you just have one instance. IDK. It would add one or two more pods, but for sure, it was not scaling up to the same degree as the speedup if that makes sense.

@jeschkies
Copy link
Contributor

I'm running it against Grafana Clou so I might be rate limited. However, even with just two workers there's an issue. I can see the parts being done. Hm.

@angaz
Copy link
Contributor Author

angaz commented Feb 16, 2023

Hmm, I checked the last few rebase/commits and it seemed like it was still working as expected. I was starting 4 workers with 5-minute jobs in my tests, but I also downloaded about 25GB of logs on Tuesday and it worked well with 24 workers, 30-minute jobs, so I think it was working for me at least with my setup. I can test master again tomorrow.


func (j *parallelJob) run(c client.Client, out output.LogOutput, statistics bool) {
j.q.DoQuery(c, out, statistics)
j.done <- struct{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SN9NV this is the deadlock. An unbuffered channel will write into the target when read. That means it blocks here until the channel is read. However, we don't want to wait for that. See my fix #8553.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just came here to report the same thing, but you beat me to it. 😂

I was testing with very large number of threads, so this was kind of less noticeable. Smaller numbers makes the problem much more visible. So yeah, good catch.

@dannykopping
Copy link
Contributor

@SN9NV @jeschkies this feature seems to only be implemented for range queries in logcli, and as a consequence has broken instant queries:

$ go run ./cmd/logcli/main.go --addr="http://localhost:3100" --org-id="docker" instant-query 'sum by (method) (count_over_time({job="generated-logs"} | json[1m]))'
main: error: parallel-max-workers must be greater than 0, try --help
exit status 1

Is there any reason why this should not apply to instant queries?

@angaz
Copy link
Contributor Author

angaz commented Feb 27, 2023

@dannykopping thanks for the report.

That indeed is not ideal that it's showing an error message for an unrelated command.

Why it was not created for instant queries, I only understand range query, but if it's useful, then we can have a look at adding the feature for this command as well.

I will have a fix soon. I think setting to 1 will be a better idea instead of an error stopping execution.

@dannykopping
Copy link
Contributor

@SN9NV thanks for the quick response.

As far as I understand, this feature mainly deals with downloading large volumes of logs returned by a query. Thinking about it more, if that is the case, we don't need to handle instant queries I guess because it's very rare that one might have a large volume of logs for the same instant.

Apart from the instant query question above:
The functionality as implemented seems to break down when range aggregations are performed - with the responses not being returned in a coherent manner. The results are ostensibly not recombined in a way that mimicks the existing behaviour.

I was also a little disappointed to see that this was merged without tests which validate that the responses returned are identical with and without parallelization, which would be a reasonable expectation for a user to have; parallelization should not change the output IMHO since it's just a performance optimization.

dannykopping pushed a commit that referenced this pull request Feb 27, 2023
…tion (#8641)

**What this PR does / why we need it**:
Fixes regression reported here:
#8518 (comment)

@dannykopping can you please verify that it fixes the issue for you?

**Checklist**
- [x] Reviewed the
[`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md)
guide (**required**)
- [ ] Documentation added
- [ ] Tests updated
- [ ] `CHANGELOG.md` updated
- [ ] Changes that require user attention or interaction to upgrade are
documented in `docs/sources/upgrading/_index.md`
@angaz
Copy link
Contributor Author

angaz commented Feb 27, 2023

@dannykopping can you please provide a query which is not returning correct results? I tested quite a few queries for logs which I have had to download in the past and the parallel and standard methods produced exactly the same files in the output.

I didn't test any aggregation queries because this is not something I've done before, just simple filtering/mapping functions. I guess I can see that if there's an aggregation which spans many of the parallel-duration's that it would not be able to return the correct result. I'm not sure such a case would be possible to compensate for.

@dannykopping
Copy link
Contributor

@dannykopping can you please provide a query which is not returning correct results? I tested quite a few queries for logs which I have had to download in the past and the parallel and standard methods produced exactly the same files in the output.

Just run any query that has an aggregation, like count_over_time. When a query produces metrics this feature should be disabled IMHO.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size/XL type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants