Promtail to remote Loki - context deadline exceeded #3867

Closed
Lithimlin opened this issue Jun 18, 2021 · 11 comments
Labels
component/promtail, needs triage (Issue requires investigation)

Comments

@Lithimlin

Lithimlin commented Jun 18, 2021

Describe the bug
I'm trying to set up Promtail on multiple machines, all of which send their respective logs to a single central Loki instance. Sending logs from the machine hosting Loki to that Loki instance works fine. However, when I try to do the same from another machine, Promtail is unable to send the logs to Loki. Instead, I get the following error in the logs:

promtail[78741]: level=warn ts=2021-08-16T09:13:48.848403815Z caller=client.go:323 component=client host=<loki-hostname>:9200 msg="error sending batch, will retry" status=-1 error="Post \"http://<loki-hostname>:9200/loki/api/v1/push\": context deadline exceeded"
promtail[78741]: level=warn ts=2021-08-16T09:14:22.337181145Z caller=client.go:323 component=client host=<loki-hostname>:9200 msg="error sending batch, will retry" status=-1 error="Post \"http://<loki-hostname>:9200/loki/api/v1/push\": dial tcp [<loki-host-ipv6>]:9200: i/o timeout"

Since most existing issues about this error concern Docker and its DNS settings, I've tried inserting the Loki-host's IP directly instead, though with no difference in behavior. I've also tried looking at the code at the line mentioned in the message, but I can't really see anything there.
According to #2453, this error message should also be more verbose if I understood correctly.
Nevertheless, I'm not sure if the data processing time is the issue here. Maybe someone else can judge that better.

Environment:

  • Infrastructure: Two NixOS machines in the same subnet; no Docker or the like

To Reproduce
Steps to reproduce the behavior:

  1. Configured Loki and Promtail (both grafana-loki-2.2.1) with the config files found below on Loki-host
  2. Built Nix system (Loki-host) (everything working so far)
  3. Configured Promtail (also grafana-loki-2.2.1) with the config file found below on remote host
  4. Built Nix system (remote host)
  5. Looked at the logs on the remote host (journalctl -efu promtail)

I'm not sure if there would be a big difference between doing this on a NixOS system versus any other Linux system, though I don't expect one unless a different version of Loki is involved.

Expected behavior
Promtail sends the logs to the Loki instance on the Loki-host without issues.

Screenshots, Promtail config, or terminal output
Loki config on the Loki-host:

auth_enabled: false                                                             
                                                                                   
server:                                                                            
  http_listen_port: 9200                                                        
                                                                                
ingester:                                                                       
  lifecycler:                                                                   
    address: 127.0.0.1                                                          
    ring:                                                                       
      kvstore:                                                                  
        store: inmemory                                                         
      replication_factor: 1                                                     
    final_sleep: 0s                                                             
  chunk_idle_period: 1h
  max_chunk_age: 1h
  chunk_target_size: 1048576
  chunk_retain_period: 30s
  max_transfer_retries: 0                
                                                                                
schema_config:                                                                  
  configs:
    - from: 2021-01-01
      store: boltdb-shipper                                                     
      object_store: filesystem                                                  
      schema: v11                                                               
      index:                                                                    
        prefix: index_                                                          
        period: 24h                                                             

compactor:                                                                      
  working_directory: /var/lib/loki/compactor                                    
  shared_store: filesystem
                                                                                
storage_config:                                                                 
  boltdb_shipper:                                                               
    active_index_directory: /var/lib/loki/boltdb-shipper-active                 
    cache_location: /var/lib/loki/boltdb-shipper-cache                          
    cache_ttl: 24h
    shared_store: filesystem                                                    
                                                                                
  filesystem:                                                                   
    directory: /var/lib/loki/chunks                                             
                                                                                
limits_config:                                                                  
  reject_old_samples: true                                                      
  reject_old_samples_max_age: 168h                                              
                                                                                
chunk_store_config:                                                             
  max_look_back_period: 0s                                                      
                                                                                
table_manager:                                                                  
  retention_deletes_enabled: false                                              
  retention_period: 0s

promtail.nix file on both Loki-host and remote host:

{pkgs, ...}:
{
  services.promtail = { 
    enable = true;
    configuration = { 
      server = { 
        http_listen_port = 9080;
        grpc_listen_port = 0;
      };  
      positions = { 
        filename = "/tmp/positions.yaml";
      };  
      clients = [ { url = "http://<loki-hostname>:9200/loki/api/v1/push"; } ];
      scrape_configs = [ { 
        job_name = "journal";
        journal = { 
          max_age = "12h";
          labels = { 
            job = "systemd-journal";
          };
        };
        relabel_configs = [ 
          {
            source_labels = ["__journal__systemd_unit"];
            target_label = "unit";
          }
          {
            source_labels = ["__journal__hostname"];
            target_label = "host";
          }
          {
            source_labels = ["__journal_syslog_identifier"];
            target_label = "syslog_identifier";
          }
        ];
      } ];
    };  
  };  
}

EDIT: Updated configs to reflect changes in NixOS 21.05
EDIT: Added i/o timeout error

@dannykopping added the component/promtail and needs triage (Issue requires investigation) labels on Jun 18, 2021
@dannykopping
Contributor

Hey @Lithimlin

Just to confirm: are you able to communicate with that host/port combination manually from the same host that you're running Promtail from?

I want to rule out a networking issue before we dive deeper.
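For example, something along these lines from the remote machine would confirm basic reachability (a rough sketch, assuming the hostname and port from your configs; /ready is Loki's readiness endpoint):

# From the remote host: check that the Loki HTTP port answers at all.
curl -v http://<loki-hostname>:9200/ready

# Optionally exercise the push endpoint directly with a minimal payload:
curl -v -X POST "http://<loki-hostname>:9200/loki/api/v1/push" \
  -H "Content-Type: application/json" \
  --data-raw '{"streams":[{"stream":{"job":"curl-test"},"values":[["'"$(date +%s%N)"'","connectivity test"]]}]}'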

@Lithimlin
Author

Yes, indeed. Starting netcat on the remote host and sending messages to port 9200 of the Loki-host causes them to show up in tcpdump. Additionally, I can see that messages from the promtail service on the remote host arrive at the Loki-host.
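Concretely, the check looked roughly like this (a sketch of what I ran; interface name and port are from the setup above):

# On the Loki-host: watch for traffic arriving on the Loki port.
tcpdump -ni eth0 tcp port 9200

# On the remote host: open a TCP connection to the Loki-host and type a few lines.
nc <loki-hostname> 9200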

@Retrospector

I am seeing the exact same error message (context deadline exceeded) while testing Promtail on Ubuntu 18.04, currently running the binary manually; no Docker or systemd involved.

@Lithimlin
Author

I upgraded NixOS today and am now using grafana-loki-2.2.1, and the problem still occurs.

@Lithimlin
Author

Could this have to do with the address of the lifecycler being set to localhost?

@Lithimlin
Author

It doesn't look like that was the issue.

@lmancilla

lmancilla commented Jun 30, 2021

+1

Same issue here! Currently testing Loki on Amazon Linux 2 and sending logs from another node via Promtail, using the same configuration as @Lithimlin (Loki binary + Promtail DaemonSet).

@Lithimlin
Author

Lithimlin commented Jul 6, 2021

While inspecting some logs on the Loki-host (ironically, with Loki) I realized that the Loki-host refuses the incoming connection from the remote host:

refused connection: IN=eth0 OUT= MAC=XXX SRC=<remote host ip> DST=<loki host ip> LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=26089 DF PROTO=TCP SPT=51172 DPT=9200 WINDOW=64240 RES=0x00 SYN URGP=0

Maybe this helps?

PS: Pings do still go through without any issues, so this is not a firewall problem.

@Lithimlin
Author

I updated the original post to reflect the changes in NixOS 21.05.
The original problem has not changed, but I have since also found an i/o timeout error.

@Lithimlin
Author

Well, this is a bit embarrassing.
While the gateway firewall was configured correctly and forwarding the packets to the Loki-host, the local firewall blocked them from progressing further.
So it turns out that, as @dannykopping was saying, this was indeed a networking issue.
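For reference, opening the Loki port in the local firewall on the Loki-host looks roughly like the following (a minimal sketch, assuming the default NixOS firewall module is enabled; adjust to your own setup):

{ ... }:
{
  # Allow incoming TCP connections on Loki's http_listen_port (9200 above).
  networking.firewall.allowedTCPPorts = [ 9200 ];
}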

@dannykopping
Contributor

No worries @Lithimlin 🙂 glad you found the source of the issue.
