New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collect information on Accept-Language header values #3687

Merged
merged 4 commits into from Oct 12, 2018

Conversation

5 participants
@daniel-beck
Member

daniel-beck commented Oct 8, 2018

RFC: Would this be useful?

I imagine it might help understand the need for certain translations. At the very least, it would show how many users use Jenkins with a non-English language.

Data submitted looks like this:

{"en-US,en;q=0.5":9,"en-US,en;q=0.7,de;q=0.3":89}

Proposed changelog entries

  • Add telemetry trial about user languages

Submitter checklist

  • [n/a] JIRA issue is well described
  • Changelog entry appropriate for the audience affected by the change (users or developer, depending on the change). Examples
    * Use the Internal: prefix if the change has no user-visible impact (API, test frameworks, etc.)
  • [n/a] Appropriate autotests or explanation to why this change has no tests
  • [n/a] For dependency updates: links to external changelogs and, if possible, full diffs
@rtyler

This comment has been minimized.

Member

rtyler commented Oct 8, 2018

I would be very interested to know if groups like the @jenkinsci/chinese-localization-sig would find this information useful.

@Restricted(NoExternalUse.class)
public class UserLanguages extends Telemetry {
private static Map<String, MutableLong> requestsByLanguage = new TreeMap<>();

This comment has been minimized.

@rtyler

rtyler Oct 8, 2018

Member

Am I correct in assuming that this will only keep track of what Accept-Language headers have been sent for the duration of the running JVM? In effect, data will be flushed when the instance is restarted, correct?

On what period is this information going to be flushed and sent to Uplink?

This comment has been minimized.

@daniel-beck

daniel-beck Oct 8, 2018

Member

Flushed on restart (in memory only), and cumulative content is submitted every 24 hours.

This means we cannot actually aggregate multiple instances' submissions in the current form. Need to flush after submission so we can just add up numbers.

This comment has been minimized.

@daniel-beck

daniel-beck Oct 9, 2018

Member

This has been addressed. On submission, previously recorded content is now reset.

}
@Override
public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {

This comment has been minimized.

@rtyler

rtyler Oct 8, 2018

Member

Would there be any way make this unique based on cookie or other browser/session-specific identifier? Otherwise, if I am understanding this code correctly, a single power-user would skew the statistics quite a bit.

This comment has been minimized.

@daniel-beck

daniel-beck Oct 8, 2018

Member

Multiple sessions over a period make this essentially impossible, and I'm not comfortable with further browser fingerprinting. Note that we know # of requests by instance ID, which might allow us to disregard spikes.

This comment has been minimized.

@rtyler

rtyler Oct 9, 2018

Member

Okie doke

@LinuxSuRen

This comment has been minimized.

Member

LinuxSuRen commented Oct 9, 2018

I wonder if the data is public for everyone? I'm interested to know how many users use Simplified Chinese.

@daniel-beck

This comment has been minimized.

Member

daniel-beck commented Oct 9, 2018

@LinuxSuRen Not public for everyone (by design), but we can grant selective access, and/or aggregate it during and after the trial.

The question here is whether this information is useful. We could get the number of instances with users having zh/zh-CN in their request headers at all, as the preferred (first) language, as well as relative numbers compared to all instances reporting data. Additionally, we can provide the % of requests for each language arriving at each of those instances to count exclusively/predominantly/partially (e.g. Chinese) Jenkins instances.

@LinuxSuRen

This comment has been minimized.

Member

LinuxSuRen commented Oct 9, 2018

@daniel-beck I think it's always important to know our user's information. We could improve user experience according to the data. Here is a discussion about Jenkins Chinese website. If there are large number of people's first language is Chinese. So we'll know it's really important to do localization work.

@daniel-beck

This comment has been minimized.

Member

daniel-beck commented Oct 9, 2018

@LinuxSuRen That motivation seems good enough for me. Note that this isn't a permanent collection, but scheduled to end on January 1. Should still give us a good idea of what the current use of Jenkins looks like.

@Wadeck

Wadeck approved these changes Oct 9, 2018

String language = httpServletRequest.getHeader("Accept-Language");
if (language != null) {
if (!requestsByLanguage.containsKey(language)) {
requestsByLanguage.put(language, new MutableLong(0));

This comment has been minimized.

@Wadeck

Wadeck Oct 9, 2018

Contributor

WDYT about using java.util.concurrent.atomic.AtomicLong instead to have for free the sync?

@daniel-beck daniel-beck removed the needs-fix label Oct 9, 2018

@rtyler

rtyler approved these changes Oct 9, 2018

The duration of the trial is going to be interesting, I imagine we're going to see a lot of data from this one 🙂

@daniel-beck daniel-beck merged commit 5e9e498 into jenkinsci:master Oct 12, 2018

1 check was pending

continuous-integration/jenkins/pr-merge This commit is being built
Details
@guykisel

This comment has been minimized.

guykisel commented Oct 17, 2018

Hey @daniel-beck, just a heads up, it looks like this PR is using TreeMap in a way that's not threadsafe, causing multiple threads to get stuck and eat up a lot of CPU time on a Jenkins deployment I work on.

First few lines of example thread stack trace consuming hours of CPU time:

Handling GET /job/********
java.util.TreeMap.getEntry(TreeMap.java:359)
java.util.TreeMap.containsKey(TreeMap.java:232)
jenkins.telemetry.impl.UserLanguages$AcceptLanguageFilter.doFilter(UserLanguages.java:118)

image

telemetry collection thread
net.sf.json.JSONObject.element(JSONObject.java:1716)
net.sf.json.JSONObject.put(JSONObject.java:2328)
jenkins.telemetry.impl.UserLanguages.createContent(UserLanguages.java:88)
jenkins.telemetry.Telemetry$TelemetryReporter.lambda$execute$0(Telemetry.java:167)
jenkins.telemetry.Telemetry$TelemetryReporter$$Lambda$282/1283834220.accept(Unknown Source)
java.lang.Iterable.forEach(Iterable.java:75)
jenkins.telemetry.Telemetry$TelemetryReporter.execute(Telemetry.java:155)
hudson.model.AsyncPeriodicWork$1.run(AsyncPeriodicWork.java:101)
java.lang.Thread.run(Thread.java:748)
@daniel-beck

This comment has been minimized.

Member

daniel-beck commented Oct 17, 2018

@guykisel Sorry about that, I'll look into that at the earliest opportunity. As a workaround, if you disable usage statistics either via UI, system property on invocation, or static field in UsageStatistics using Groovy, all of this will be close to a no-op.

@guykisel

This comment has been minimized.

guykisel commented Oct 17, 2018

Yup we already disabled them :) Thanks for the tip and quick response!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment