[infra] Resolve connection failures with k8s #114

Open
fernandascovino opened this issue Jul 20, 2022 · 2 comments
Labels
bug Something isn't working rj-smtr

Comments

@fernandascovino
Collaborator

fernandascovino commented Jul 20, 2022

Describe the problem

Pipelines fail to start because they cannot establish a connection with the pod (error below):

HTTPSConnectionPool(host='10.188.0.1', port=443): Max retries exceeded with url: /apis/batch/v1/namespaces/prefect/jobs (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7f591fa53710>: Failed to establish a new connection: [Errno 111] Connection refused'))

Even after scaling, we still observe failures (in these cases the run is not renamed, because we only rename it once the pipeline has started). However, this does not appear to be related to any cluster instability (see the GCP alerts link).

image
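One common mitigation for transient "Connection refused" errors is to retry the failing call with exponential backoff. The sketch below is only an illustration under stated assumptions: `call_with_backoff` and its parameters are hypothetical names, not part of Prefect or the Kubernetes client, and by default it only catches Python's built-in `ConnectionError` family. The `requests`/`urllib3` exceptions shown above would have to be passed explicitly through `retry_on`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with exponential backoff.

    Hypothetical helper: the names and defaults are assumptions for
    illustration, not an existing Prefect or Kubernetes API.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == max_attempts:
                raise
            # Delay doubles on each attempt (1s, 2s, 4s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add up to 50% jitter so parallel runs do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the helper can be exercised in tests without actually waiting.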

Expected/desired behavior

  • Better understand the cluster instability -> hypothesis: wait_for_flow_run needs to be added
  • Add the missed minute to the logs table after the fact, as a generic error (at recapture)
  • Receive alerts for these failures (consolidated on Discord at recapture time)
  • Recapture the data at materialization
  • Add the error description to registros_logs (Prefect API?)
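The consolidated Discord alert from the list above could be assembled roughly as follows. This is a sketch under assumptions: `build_alert_payload` and the shape of the `failures` input are hypothetical, and the code only builds the message body. Discord webhooks accept a JSON object with a `content` field limited to 2000 characters, so the text is truncated to that length. Sending it to a webhook URL is left out.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Discord rejects webhook messages whose "content" exceeds 2000 characters.
DISCORD_CONTENT_LIMIT = 2000


def build_alert_payload(failures: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Summarize failures into a single Discord webhook payload.

    failures: hypothetical (timestamp, error description) pairs collected
    at recapture time. Only the descriptions are counted here.
    """
    failures = list(failures)
    counts = Counter(description for _, description in failures)
    lines = [f"{len(failures)} flow run(s) failed to start:"]
    # Most frequent error first, one line per distinct description.
    for description, count in counts.most_common():
        lines.append(f"- {description}: {count}")
    content = "\n".join(lines)
    return {"content": content[:DISCORD_CONTENT_LIMIT]}
```

Grouping by description keeps the alert to one message per recapture instead of one message per failed minute.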

Useful links

@fernandascovino
Collaborator Author

Other pod-related errors:

prefect.exceptions.ClientError: [{'message': 'State update failed for task run ID 32f6ef35-850d-48c2-8758-390027bdc293: provided a running state but associated flow run 6dc58b5d-cd8c-44aa-9401-6cb6db9a6028 is not in a running state.', 'locations': [{'line': 2, 'column': 5}], 'path': ['set_task_run_states'], 'extensions': {'code': 'INTERNAL_SERVER_ERROR', 'exception': {'message': 'State update failed for task run ID 32f6ef35-850d-48c2-8758-390027bdc293: provided a running state but associated flow run 6dc58b5d-cd8c-44aa-9401-6cb6db9a6028 is not in a running state.'}}}]

urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
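To store an error description in registros_logs, the raw messages quoted in this issue could be mapped to short labels. The following is a minimal sketch, assuming simple substring matching is enough. `ERROR_PATTERNS`, `describe_error`, and the label texts are hypothetical and would need to be adapted to the real table schema.

```python
from typing import List, Tuple

# Hypothetical (substring, description) pairs, taken from the error
# messages quoted in this issue. Order matters: first match wins.
ERROR_PATTERNS: List[Tuple[str, str]] = [
    ("Connection refused", "k8s API connection refused"),
    ("Connection reset by peer", "connection reset by peer"),
    ("is not in a running state", "state update rejected: flow run not running"),
]


def describe_error(message: str, default: str = "unclassified error") -> str:
    """Return a short description for a raw error message."""
    for needle, description in ERROR_PATTERNS:
        if needle in message:
            return description
    # Anything not listed falls back to a generic label.
    return default
```

Unknown messages fall back to a generic label, which matches the idea of logging a generic error when no better description is available.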

@fernandascovino
Collaborator Author

fernandascovino commented Jul 26, 2022

Possible solution from @gmartinsoc: https://stackoverflow.com/questions/55742540/kubernetes-python-client-connection-issue

@gabriel-milan this may also be happening on your side in the Python connection to k8s

@fernandascovino fernandascovino changed the title from "[bug] Resolve connection failures with k8s" to "[fix] Resolve connection failures with k8s" Jul 29, 2022
@fernandascovino fernandascovino changed the title from "[fix] Resolve connection failures with k8s" to "[infra][fix] Resolve connection failures with k8s" Jul 29, 2022
@fernandascovino fernandascovino changed the title from "[infra][fix] Resolve connection failures with k8s" to "[infra] Resolve connection failures with k8s" Jul 29, 2022