projectDir is ignored due to a race condition #345

Closed
AndrewDryga opened this issue Jun 20, 2023 · 12 comments

Comments

@AndrewDryga

AndrewDryga commented Jun 20, 2023

Environment

  • Elixir & Erlang versions (elixir --version): Erlang/OTP 25 [erts-13.0] [source] [64-bit] [smp:10:10] [ds:10:10:10] [async-threads:1] [jit] Elixir 1.14.3 (compiled with Erlang/OTP 25)
  • VSCode ElixirLS version: 0.14.7
  • Operating System Version: darwin 21.6.0

Issue Description

The VSCode plugin or ElixirLS itself ignores the elixirLS.projectDir setting at random (probably due to a race condition). When this happens, ElixirLS starts creating deps and .elixir_ls folders for the apps in the umbrella and logs:

[Warn  - 09:48:39] No mixfile found in project. To use a subdirectory, set `elixirLS.projectDir` in your settings. Looked for mixfile at "/Users/andrew/Projects/os/firezone/mix.exs"

while project dir is set to:

./elixir/

The same warning is logged when the "Elixir: Restart Language Server" command is executed, so the only fix is a full restart of the editor, after which it works as expected for a while:

[Info  - 09:53:40] Started ElixirLS v0.14.5
[Info  - 09:53:40] Running in /Users/andrew/Projects/os/firezone
[Info  - 09:53:40] ElixirLS built with elixir "1.14.3" on OTP "25"
[Info  - 09:53:40] Running on elixir "1.14.4 (compiled with Erlang/OTP 25)" on OTP "25"
[Info  - 09:53:40] Elixir sources not found (checking in /home/build/elixir). Code navigation to Elixir modules disabled.
dets: file "/Users/andrew/Projects/os/firezone/elixir/.elixir_ls/calls.dets" not properly closed, repairing ...
[Info  - 09:53:41] Loaded DETS databases in 318ms
[Info  - 09:53:41] Starting build with MIX_ENV: test MIX_TARGET: host
All deps are up to date
.... compilation
[Info  - 09:53:42] Compile took 622 milliseconds
[Info  - 09:53:42] [ElixirLS Dialyzer] Checking for stale beam files
[Info  - 09:53:42] [ElixirLS WorkspaceSymbols] Indexing...
[Info  - 09:53:43] [ElixirLS Dialyzer] Found 1 changed files in 118 milliseconds
[Info  - 09:53:43] [ElixirLS WorkspaceSymbols] Module discovery complete
[Info  - 09:53:43] [ElixirLS Dialyzer] Analyzing 0 modules: []
[Info  - 09:53:43] [ElixirLS Dialyzer] Analysis finished in 210 milliseconds
[Info  - 09:53:43] [ElixirLS WorkspaceSymbols] 30 callbacks added to index
[Info  - 09:53:43] [ElixirLS WorkspaceSymbols] 252 modules added to index
[Info  - 09:53:44] Dialyzer analysis is up to date
[Info  - 09:53:44] [ElixirLS Dialyzer] Writing manifest...
[Info  - 09:53:45] Experimental server is disabled.
[Info  - 09:53:45] [ElixirLS Dialyzer] Done writing manifest in 1205 milliseconds.
[Info  - 09:53:45] [ElixirLS WorkspaceSymbols] 451 types added to index
[Info  - 09:53:47] [ElixirLS WorkspaceSymbols] 5080 functions added to index

All of this happens ~every hour for me.
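
To make the symptom concrete, here is a minimal TypeScript sketch of how a mixfile path could be derived from the workspace root and the elixirLS.projectDir setting. The function mixfilePath is hypothetical and is not ElixirLS's code; it only illustrates why a missing setting would yield the path shown in the warning.

```typescript
// Hypothetical helper, not ElixirLS's actual implementation.
// Joins a workspace root and an optional project subdirectory, then appends mix.exs.
function mixfilePath(workspaceRoot: string, projectDir?: string): string {
  // Drop a leading "./" and surrounding slashes so "./elixir/" and "elixir" behave the same.
  const cleaned = (projectDir ?? "").replace(/^\.\//, "").replace(/^\/+|\/+$/g, "");
  const root = workspaceRoot.replace(/\/+$/, "");
  return cleaned === "" ? `${root}/mix.exs` : `${root}/${cleaned}/mix.exs`;
}

const exampleRoot = "/Users/andrew/Projects/os/firezone";
console.log(mixfilePath(exampleRoot, "./elixir/")); // setting applied: .../firezone/elixir/mix.exs
console.log(mixfilePath(exampleRoot));              // setting missing: .../firezone/mix.exs
```

If the setting never reaches the server, the second call is the only path it can compute, which matches the "Looked for mixfile at" message.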

@lukaszsamson
Collaborator

lukaszsamson commented Jun 20, 2023

A couple questions.

  1. Do you have a repo that reproduces it?
  2. Do you use a multi-root workspace? If so, is elixirLS.projectDir correctly set for each workspace folder (this is a scoped setting)?
  3. Can you try on v0.15.0? Some race conditions have been fixed and elixirLS.projectDir is now handled in more places.
  4. Does setting elixirLS.projectDir to elixir make any difference? There were reports of problems when it was set to . and we explicitly disallow that.

@AndrewDryga
Author

@lukaszsamson

  1. It reproduces for me on https://github.com/firezone/firezone/tree/cloud but the repo is pretty big and it happens when I actively change the codebase, so making a test repo would be really hard.
  2. It is a single root workspace.
  3. Sure, I'll upgrade Elixir/Erlang version to the latest non 26-based versions and give it a run.
  4. If upgrading to 0.15.0 does not fix the issue, I will change the path, so that we test one thing at a time and can find the root cause.

@AndrewDryga
Author

AndrewDryga commented Jun 20, 2023

Neither setting projectDir to elixir nor upgrading to 0.15.0 helped, unfortunately:

[Info  - 10:49:06] Started ElixirLS v0.15.0
[Info  - 10:49:06] Running in /Users/andrew/Projects/os/firezone
[Info  - 10:49:06] ElixirLS built with elixir "1.15.0" on OTP "25"
[Info  - 10:49:06] Running on elixir "1.15.0 (compiled with Erlang/OTP 25)" on OTP "25"
[Info  - 10:49:06] Protocols are not consolidated
[Info  - 10:49:06] Elixir sources not found (checking in /home/runner/work/elixir/elixir). Code navigation to Elixir modules disabled.
[Info  - 10:49:11] Experimental server is disabled.
[Warn  - 10:49:11] Did not receive workspace/didChangeConfiguration notification after 5 seconds. Using default settings.
[Info  - 10:49:11] Starting build with MIX_ENV: test MIX_TARGET: host
[Warn  - 10:49:11] No mixfile found in project. To use a subdirectory, set `elixirLS.projectDir` in your settings. Looked for mixfile at "/Users/andrew/Projects/os/firezone/mix.exs"

Full restart works as before:

Running /Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/elixir-ls-release/launch.sh
Looking for ASDF install in /opt/homebrew/opt/asdf/libexec/asdf.sh
ASDF found, relaunching in bash shell
Looking for ASDF install in /opt/homebrew/opt/asdf/libexec/asdf.sh
Sourcing ASDF
Installing ElixirLS release v0.15.0
Running in /Users/andrew/Projects/os/firezone
Install complete
[Info  - 10:50:39] Started ElixirLS v0.15.0
[Info  - 10:50:39] Running in /Users/andrew/Projects/os/firezone
[Info  - 10:50:39] ElixirLS built with elixir "1.15.0" on OTP "25"
[Info  - 10:50:39] Running on elixir "1.15.0 (compiled with Erlang/OTP 25)" on OTP "25"
[Info  - 10:50:39] Protocols are not consolidated
[Info  - 10:50:40] Elixir sources not found (checking in /home/runner/work/elixir/elixir). Code navigation to Elixir modules disabled.
[Info  - 10:50:40] Loaded DETS databases in 228ms
[Info  - 10:50:40] Starting build with MIX_ENV: test MIX_TARGET: host
All deps are up to date
==> domain
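
The failing log contains "Did not receive workspace/didChangeConfiguration notification after 5 seconds. Using default settings." The sketch below models that kind of wait-then-fall-back logic as a small state holder. It is an illustration under assumptions: SettingsGate, its methods, and the Settings shape are hypothetical names, not ElixirLS's implementation.

```typescript
// Hypothetical model of "wait for client settings, fall back to defaults on timeout".
interface Settings {
  projectDir: string;
}

const DEFAULT_SETTINGS: Settings = { projectDir: "" };

class SettingsGate {
  private current: Settings | null = null;
  private origin: "client" | "default" | "none" = "none";

  // Called when the client's configuration notification arrives.
  onConfiguration(settings: Settings): void {
    // Client settings win, even if defaults were already applied.
    this.current = settings;
    this.origin = "client";
  }

  // Called when the wait timer expires (5 seconds in the log).
  onTimeout(): void {
    if (this.current === null) {
      this.current = DEFAULT_SETTINGS;
      this.origin = "default";
    }
  }

  settings(): Settings | null {
    return this.current;
  }

  source(): string {
    return this.origin;
  }
}

// Notification lost or late: the timer fires first, so the defaults are used.
const lateGate = new SettingsGate();
lateGate.onTimeout();
console.log(lateGate.source(), JSON.stringify(lateGate.settings()));
```

In this model an empty projectDir is what remains when the notification never arrives, which would be consistent with the server looking for mix.exs in the workspace root.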

@lukaszsamson
Collaborator

I’ve seen similar behavior when something was reading LSP messages from stdin before the server could. Are there any clues in the VSCode developer tools? We log workspace events and server start params.
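
As background for the stdin theory: LSP messages are framed with a Content-Length header followed by a blank line and the JSON body. The sketch below is a simplified reader for that framing and shows how a message can be lost if another process consumes the first bytes of the stream. The names readMessages and frameMessage are made up for this example, and the payload is assumed to be ASCII so that string length equals byte length.

```typescript
// Minimal reader for the LSP base protocol framing:
//   Content-Length: <n>\r\n\r\n<n bytes of JSON>
// Simplification: ASCII-only payloads, so string length stands in for byte length.
function readMessages(stream: string): { messages: string[]; rest: string } {
  const messages: string[] = [];
  let rest = stream;
  for (;;) {
    const headerEnd = rest.indexOf("\r\n\r\n");
    if (headerEnd === -1) break; // header not complete yet
    const match = /Content-Length: (\d+)/.exec(rest.slice(0, headerEnd));
    if (match === null) break; // no usable header: the reader is stuck
    const bodyLength = Number(match[1]);
    const bodyStart = headerEnd + 4;
    if (rest.length < bodyStart + bodyLength) break; // body not fully received yet
    messages.push(rest.slice(bodyStart, bodyStart + bodyLength));
    rest = rest.slice(bodyStart + bodyLength);
  }
  return { messages, rest };
}

// Wraps a JSON string in the header described above.
function frameMessage(json: string): string {
  return `Content-Length: ${json.length}\r\n\r\n${json}`;
}

const configMsg = '{"jsonrpc":"2.0","method":"workspace/didChangeConfiguration","params":{}}';
const wire = frameMessage(configMsg);

console.log(readMessages(wire).messages.length);           // intact stream: one message
console.log(readMessages(wire.slice(10)).messages.length); // first 10 bytes taken elsewhere: none
```

With the start of the header gone, this reader never recovers the configuration message, which is one way a server could end up waiting for settings that were in fact sent.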

@AndrewDryga
Author

Here are some things I've found, but they look more like a result of the bug than its root cause:

  ERR Error: Unable to fetch formatter options
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)
  ERR an exception was raised:
    ** (CaseClauseError) no case clause matching: nil
        (elixir_sense 2.0.0) lib/elixir_sense/providers/docs.ex:58: ElixirSense.Providers.Docs.mod_fun_docs/9
        (elixir_sense 2.0.0) lib/elixir_sense.ex:74: ElixirSense.docs/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:24: ElixirLS.LanguageServer.Providers.Hover.hover/4
        (language_server 0.15.0) lib/language_server/server.ex:828: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_request_async/2: Error: an exception was raised:
    ** (CaseClauseError) no case clause matching: nil
        (elixir_sense 2.0.0) lib/elixir_sense/providers/docs.ex:58: ElixirSense.Providers.Docs.mod_fun_docs/9
        (elixir_sense 2.0.0) lib/elixir_sense.ex:74: ElixirSense.docs/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:24: ElixirLS.LanguageServer.Providers.Hover.hover/4
        (language_server 0.15.0) lib/language_server/server.ex:828: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_request_async/2
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)

This one looks fine and precedes the two logs above:

mainThreadExtensionService.ts:81 [JakeBecker.elixir-ls]%CompileError{file: "/Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs", line: 0, description: "cannot compile module Web.AuthControllerTest (errors have been logged)"}
$onExtensionRuntimeError @ mainThreadExtensionService.ts:81
mainThreadExtensionService.ts:82 Error: %CompileError{file: "/Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs", line: 0, description: "cannot compile module Web.AuthControllerTest (errors have been logged)"}
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)
$onExtensionRuntimeError @ mainThreadExtensionService.ts:82
2console.ts:137 [Extension Host] ElixirLS: Finding tests in  file:///Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs
log.ts:441   ERR [DocumentSymbols] Compilation error while parsing source file: Error: [DocumentSymbols] Compilation error while parsing source file
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)

@AndrewDryga
Author

More logs:

[Error - 13:35:53] GenServer ElixirLS.LanguageServer.Server terminating
** (FunctionClauseError) no function clause matching in IO.chardata_to_string/1
    (elixir 1.15.0) lib/io.ex:671: IO.chardata_to_string(nil)
    (elixir 1.15.0) lib/path.ex:595: Path.split/1
    (elixir 1.15.0) lib/path.ex:328: Path.relative_to/2
    (language_server 0.15.0) lib/language_server/server.ex:432: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_notification/2
    (elixir 1.15.0) lib/enum.ex:4190: Enum.predicate_list/3
    (language_server 0.15.0) lib/language_server/server.ex:429: ElixirLS.LanguageServer.Server.handle_notification/2
    (language_server 0.15.0) lib/language_server/server.ex:204: ElixirLS.LanguageServer.Server.handle_cast/2
    (stdlib 4.3.1.1) gen_server.erl:1123: :gen_server.try_dispatch/4
Last message: {:"$gen_cast", {:receive_packet, %{"jsonrpc" => "2.0", "method" => "workspace/didChangeWatchedFiles", "params" => %{"changes" => [%{"type" => 2, "uri" => "file:///Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs"}]}}}}
State: %ElixirLS.LanguageServer.Server{server_instance_id: "L6IUvJsFo4JFBGhwj-OzM1oqApSTq3RI", build_ref: nil, dialyzer_sup: nil, client_capabilities: %{"general" => %{"markdown" => %{"parser" => "marked", "version" => "1.1.0"}, "positionEncodings" => ["utf-16"], "regularExpressions" => %{"engine" => "ECMAScript", "version" => "ES2020"}, "staleRequestSupport" => %{"cancel" => true, "retryOnContentModified" => ["textDocument/semanticTokens/full", "textDocument/semanticTokens/range", "textDocument/semanticTokens/full/delta"]}}, "notebookDocument" => %{"synchronization" => %{"dynamicRegistration" => true, "executionSummarySupport" => true}}, "textDocument" => %{"callHierarchy" => %{"dynamicRegistration" => true}, "codeAction" => %{"codeActionLiteralSupport" => %{"codeActionKind" => %{"valueSet" => ["", "quickfix", "refactor", "refactor.extract", "refactor.inline", "refactor.rewrite", "source", "source.organizeImports"]}}, "dataSupport" => true, "disabledSupport" => true, "dynamicRegistration" => true, "honorsChangeAnnotations" => false, "isPreferredSupport" => true, "resolveSupport" => %{"properties" => ["edit"]}}, "codeLens" => %{"dynamicRegistration" => true}, "colorProvider" => %{"dynamicRegistration" => true}, "completion" => %{"completionItem" => %{"commitCharactersSupport" => true, "deprecatedSupport" => true, "documentationFormat" => ["markdown", "plaintext"], "insertReplaceSupport" => true, "insertTextModeSupport" => %{"valueSet" => [1, 2]}, "labelDetailsSupport" => true, "preselectSupport" => true, "resolveSupport" => %{"properties" => ["documentation", "detail", "additionalTextEdits"]}, "snippetSupport" => true, "tagSupport" => %{"valueSet" => [1]}}, "completionItemKind" => %{"valueSet" => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]}, "completionList" => %{"itemDefaults" => ["commitCharacters", "editRange", "insertTextFormat", "insertTextMode"]}, "contextSupport" => true, "dynamicRegistration" => true, 
"insertTextMode" => 2}, "declaration" => %{"dynamicRegistration" => true, "linkSupport" => true}, "definition" => %{"dynamicRegistration" => true, "linkSupport" => true}, "diagnostic" => %{"dynamicRegistration" => true, "relatedDocumentSupport" => false}, "documentHighlight" => %{"dynamicRegistration" => true}, "documentLink" => %{"dynamicRegistration" => true, "tooltipSupport" => true}, "documentSymbol" => %{"dynamicRegistration" => true, "hierarchicalDocumentSymbolSupport" => true, "labelSupport" => true, "symbolKind" => %{"valueSet" => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26]}, "tagSupport" => %{"valueSet" => [1]}}, "foldingRange" => %{"dynamicRegistration" => true, "foldingRange" => %{"collapsedText" => false}, "foldingRangeKind" => %{"valueSet" => ["comment", "imports", "region"]}, "lineFoldingOnly" => true, "rangeLimit" => 5000}, "formatting" => %{"dynamicRegistration" => true}, "hover" => %{"contentFormat" => ["markdown", "plaintext"], "dynamicRegistration" => true}, "implementation" => %{"dynamicRegistration" => true, "linkSupport" => true}, "inlayHint" => %{"dynamicRegistration" => true, "resolveSupport" => %{"properties" => ["tooltip", "textEdits", "label.tooltip", "label.location", "label.command"]}}, "inlineValue" => %{"dynamicRegistration" => true}, "linkedEditingRange" => %{"dynamicRegistration" => true}, "onTypeFormatting" => %{"dynamicRegistration" => true}, "publishDiagnostics" => %{"codeDescriptionSupport" => true, "dataSupport" => true, "relatedInformation" => true, "tagSupport" => %{"valueSet" => [1, 2]}, "versionSupport" => false}, "rangeFormatting" => %{"dynamicRegistration" => true}, "references" => %{"dynamicRegistration" => true}, "rename" => %{"dynamicRegistration" => true, "honorsChangeAnnotations" => true, "prepareSupport" => true, "prepareSupportDefaultBehavior" => 1}, "selectionRange" => %{"dynamicRegistration" => true}, "semanticTokens" => %{"augmentsSyntaxTokens" => true, 
"dynamicRegistration" => true, "formats" => ["relative"], "multilineTokenSupport" => false, "overlappingTokenSupport" => false, "requests" => %{"full" => %{"delta" => true}, "range" => true}, "serverCancelSupport" => true, "tokenModifiers" => ["declaration", "definition", "readonly", "static", "deprecated", "abstract", "async", "modification", "documentation", "defaultLibrary"], "tokenTypes" => ["namespace", "type", "class", "enum", "interface", "struct", "typeParameter", "parameter", "variable", ...]}, "signatureHelp" => %{"contextSupport" => true, "dynamicRegistration" => true, "signatureInformation" => %{"activeParameterSupport" => true, "documentationFormat" => ["markdown", "plaintext"], "parameterInformation" => %{"labelOffsetSupport" => true}}}, "synchronization" => %{"didSave" => true, "dynamicRegistration" => true, "willSave" => true, "willSaveWaitUntil" => true}, "typeDefinition" => %{"dynamicRegistration" => true, "linkSupport" => true}, "typeHierarchy" => %{"dynamicRegistration" => true}}, "window" => %{"showDocument" => %{"support" => true}, "showMessage" => %{"messageActionItem" => %{"additionalPropertiesSupport" => true}}, "workDoneProgress" => true}, "workspace" => %{"applyEdit" => true, "codeLens" => %{"refreshSupport" => true}, "configuration" => true, "diagnostics" => %{"refreshSupport" => true}, "didChangeConfiguration" => %{"dynamicRegistration" => true}, "didChangeWatchedFiles" => %{"dynamicRegistration" => true, "relativePatternSupport" => true}, "executeCommand" => %{"dynamicRegistration" => true}, "fileOperations" => %{"didCreate" => true, "didDelete" => true, "didRename" => true, "dynamicRegistration" => true, "willCreate" => true, "willDelete" => true, "willRename" => true}, "inlayHint" => %{"refreshSupport" => true}, "inlineValue" => %{"refreshSupport" => true}, "semanticTokens" => %{"refreshSupport" => true}, "symbol" => %{"dynamicRegistration" => true, "resolveSupport" => %{"properties" => ["location.range"]}, "symbolKind" => 
%{"valueSet" => [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, ...]}, "tagSupport" => %{"valueSet" => [1]}}, "workspaceEdit" => %{"changeAnnotationSupport" => %{"groupsOnLabel" => true}, "documentChanges" => true, "failureHandling" => "textOnlyTransactional", "normalizesLineEndings" => true, "resourceOperations" => ["create", "rename", "delete"]}}}, root_uri: "file:///Users/andrew/Projects/os/firezone", project_dir: nil, settings: nil, build_diagnostics: [], dialyzer_diagnostics: [], needs_build?: false, build_running?: false, analysis_ready?: false, received_shutdown?: false, requests: %{}, source_files: %{"file:///Users/andrew/Projects/os/firezone/elixir/apps/domain/lib/domain/auth.ex" => %ElixirLS.LanguageServer.SourceFile{text: "defmodule Domain.Auth do\n  use Supervisor\n  alias Domain.{Repo, Config, Validator}\n  alias Domain.{Accounts, Actors}\n  alias Domain.Auth.{Authorizer, Subject, Context, Permission, Roles, Role, Identity}\n  alias Domain.Auth.{Adapters, Provider}\n\n  @default_se (truncated)
[Info  - 13:35:53] Connection to server got closed. Server will restart.
true
Running /Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/elixir-ls-release/launch.sh
Looking for ASDF install in /opt/homebrew/opt/asdf/libexec/asdf.sh
ASDF found, relaunching in bash shell
Looking for ASDF install in /opt/homebrew/opt/asdf/libexec/asdf.sh
Sourcing ASDF
Installing ElixirLS release v0.15.0
Running in /Users/andrew/Projects/os/firezone
Install complete
[Info  - 13:35:55] Started ElixirLS v0.15.0
[Info  - 13:35:55] Running in /Users/andrew/Projects/os/firezone
[Info  - 13:35:55] ElixirLS built with elixir "1.15.0" on OTP "25"
[Info  - 13:35:55] Running on elixir "1.15.0 (compiled with Erlang/OTP 25)" on OTP "25"
[Info  - 13:35:55] Protocols are not consolidated
[Info  - 13:35:55] Elixir sources not found (checking in /home/runner/work/elixir/elixir). Code navigation to Elixir modules disabled.
[Warn  - 13:35:59] error: module Web is not loaded and could not be found
  nofile: Web.Router (module)

[Warn  - 13:35:59] ** (ErlangError) Erlang error: "CompileError during metadata build pre:\nnofile: cannot compile module Web.Router (errors have been logged)\nast node: {:use, [line: 2, column: 3], [{:__aliases__, [line: 2, column: 7], [:Web]}, :router]}"
    (elixir 1.15.0) src/elixir_expand.erl:92: :elixir_expand.expand/3
    (elixir 1.15.0) src/elixir_expand.erl:538: :elixir_expand.expand_block/5
    (elixir 1.15.0) src/elixir_expand.erl:46: :elixir_expand.expand/3
    (elixir 1.15.0) src/elixir.erl:441: :elixir.quoted_to_erl/4
    (elixir 1.15.0) src/elixir.erl:342: :elixir.eval_forms/4
    (elixir 1.15.0) lib/module/parallel_checker.ex:112: Module.ParallelChecker.verify/1
    (elixir 1.15.0) lib/code.ex:543: Code.validated_eval_string/3

[Warn  - 13:35:59] error: module Web is not loaded and could not be found
  nofile: Web.Router (module)
 ERR an exception was raised:
    ** (CaseClauseError) no case clause matching: {:list, nil}
        (elixir_sense 2.0.0) lib/elixir_sense/providers/docs.ex:58: ElixirSense.Providers.Docs.mod_fun_docs/9
        (elixir_sense 2.0.0) lib/elixir_sense.ex:74: ElixirSense.docs/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:24: ElixirLS.LanguageServer.Providers.Hover.hover/4
        (language_server 0.15.0) lib/language_server/server.ex:828: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_request_async/2: Error: an exception was raised:
    ** (CaseClauseError) no case clause matching: {:list, nil}
        (elixir_sense 2.0.0) lib/elixir_sense/providers/docs.ex:58: ElixirSense.Providers.Docs.mod_fun_docs/9
        (elixir_sense 2.0.0) lib/elixir_sense.ex:74: ElixirSense.docs/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:24: ElixirLS.LanguageServer.Providers.Hover.hover/4
        (language_server 0.15.0) lib/language_server/server.ex:828: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_request_async/2
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)
ERR an exception was raised:
    ** (FunctionClauseError) no function clause matching in String.split/3
        (elixir 1.15.0) lib/string.ex:478: String.split(nil, "\n\n", [parts: 2])
        (language_server 0.15.0) lib/language_server/providers/hover.ex:59: ElixirLS.LanguageServer.Providers.Hover.add_hexdocs_link/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:54: ElixirLS.LanguageServer.Providers.Hover.contents/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:32: ElixirLS.LanguageServer.Providers.Hover.hover/4
        (language_server 0.15.0) lib/language_server/server.ex:828: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_request_async/2: Error: an exception was raised:
    ** (FunctionClauseError) no function clause matching in String.split/3
        (elixir 1.15.0) lib/string.ex:478: String.split(nil, "\n\n", [parts: 2])
        (language_server 0.15.0) lib/language_server/providers/hover.ex:59: ElixirLS.LanguageServer.Providers.Hover.add_hexdocs_link/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:54: ElixirLS.LanguageServer.Providers.Hover.contents/3
        (language_server 0.15.0) lib/language_server/providers/hover.ex:32: ElixirLS.LanguageServer.Providers.Hover.hover/4
        (language_server 0.15.0) lib/language_server/server.ex:828: anonymous fn/3 in ElixirLS.LanguageServer.Server.handle_request_async/2
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)
ElixirLS has crashed. See Output panel.
c @ notificationsAlerts.ts:42
console.ts:137 [Extension Host] rejected promise not handled within 1 second: Error: exited in: GenServer.call(ElixirLS.LanguageServer.ExUnitTestTracer, {:get_tests, "/Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs"}, :infinity)
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function Web.Router.__verify_route__/1 is undefined (module Web.Router is not available)
            (web 0.0.0+git.0.deadbeef) Web.Router.__verify_route__([])
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:686: Phoenix.VerifiedRoutes.match_route?/2
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:166: anonymous fn/1 in Phoenix.VerifiedRoutes.__verify__/1
            (elixir 1.15.0) lib/enum.ex:984: Enum."-each/2-lists^foreach/1-0-"/2
            (elixir 1.15.0) lib/module/parallel_checker.ex:271: Module.ParallelChecker.check_module/3
            (elixir 1.15.0) lib/module/parallel_checker.ex:82: anonymous fn/6 in Module.ParallelChecker.spawn/4
y @ console.ts:137
console.ts:137 [Extension Host] stack trace: Error: exited in: GenServer.call(ElixirLS.LanguageServer.ExUnitTestTracer, {:get_tests, "/Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs"}, :infinity)
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function Web.Router.__verify_route__/1 is undefined (module Web.Router is not available)
            (web 0.0.0+git.0.deadbeef) Web.Router.__verify_route__([])
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:686: Phoenix.VerifiedRoutes.match_route?/2
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:166: anonymous fn/1 in Phoenix.VerifiedRoutes.__verify__/1
            (elixir 1.15.0) lib/enum.ex:984: Enum."-each/2-lists^foreach/1-0-"/2
            (elixir 1.15.0) lib/module/parallel_checker.ex:271: Module.ParallelChecker.check_module/3
            (elixir 1.15.0) lib/module/parallel_checker.ex:82: anonymous fn/6 in Module.ParallelChecker.spawn/4
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)
y @ console.ts:137
mainThreadExtensionService.ts:81 [JakeBecker.elixir-ls]exited in: GenServer.call(ElixirLS.LanguageServer.ExUnitTestTracer, {:get_tests, "/Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs"}, :infinity)
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function Web.Router.__verify_route__/1 is undefined (module Web.Router is not available)
            (web 0.0.0+git.0.deadbeef) Web.Router.__verify_route__([])
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:686: Phoenix.VerifiedRoutes.match_route?/2
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:166: anonymous fn/1 in Phoenix.VerifiedRoutes.__verify__/1
            (elixir 1.15.0) lib/enum.ex:984: Enum."-each/2-lists^foreach/1-0-"/2
            (elixir 1.15.0) lib/module/parallel_checker.ex:271: Module.ParallelChecker.check_module/3
            (elixir 1.15.0) lib/module/parallel_checker.ex:82: anonymous fn/6 in Module.ParallelChecker.spawn/4
$onExtensionRuntimeError @ mainThreadExtensionService.ts:81
mainThreadExtensionService.ts:82 Error: exited in: GenServer.call(ElixirLS.LanguageServer.ExUnitTestTracer, {:get_tests, "/Users/andrew/Projects/os/firezone/elixir/apps/web/test/web/controllers/auth_controller_test.exs"}, :infinity)
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function Web.Router.__verify_route__/1 is undefined (module Web.Router is not available)
            (web 0.0.0+git.0.deadbeef) Web.Router.__verify_route__([])
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:686: Phoenix.VerifiedRoutes.match_route?/2
            (phoenix 1.7.5) lib/phoenix/verified_routes.ex:166: anonymous fn/1 in Phoenix.VerifiedRoutes.__verify__/1
            (elixir 1.15.0) lib/enum.ex:984: Enum."-each/2-lists^foreach/1-0-"/2
            (elixir 1.15.0) lib/module/parallel_checker.ex:271: Module.ParallelChecker.check_module/3
            (elixir 1.15.0) lib/module/parallel_checker.ex:82: anonymous fn/6 in Module.ParallelChecker.spawn/4
    at handleResponse (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:565:48)
    at handleMessage (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:345:13)
    at processMessageQueue (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:362:17)
    at Immediate.<anonymous> (/Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/node_modules/vscode-jsonrpc/lib/common/connection.js:334:13)
    at processImmediate (node:internal/timers:466:21)

@AndrewDryga
Author

If I restart VSCode, wait until elixir-ls is running, and then just run the "Restart language server" command, here is what I get:

Outputs:

[Info  - 13:43:59] ElixirLS will restart
[Info  - 13:44:00] Connection to server got closed. Server will restart.
true
Running /Users/andrew/.vscode/extensions/jakebecker.elixir-ls-0.15.0/elixir-ls-release/launch.sh
Looking for ASDF install in /opt/homebrew/opt/asdf/libexec/asdf.sh
ASDF found, relaunching in bash shell
Looking for ASDF install in /opt/homebrew/opt/asdf/libexec/asdf.sh
Sourcing ASDF
Installing ElixirLS release v0.15.0
Running in /Users/andrew/Projects/os/firezone
Install complete
[Info  - 13:44:02] Started ElixirLS v0.15.0
[Info  - 13:44:02] Running in /Users/andrew/Projects/os/firezone
[Info  - 13:44:02] ElixirLS built with elixir "1.15.0" on OTP "25"
[Info  - 13:44:02] Running on elixir "1.15.0 (compiled with Erlang/OTP 25)" on OTP "25"
[Info  - 13:44:02] Protocols are not consolidated
[Info  - 13:44:02] Elixir sources not found (checking in /home/runner/work/elixir/elixir). Code navigation to Elixir modules disabled.
[Info  - 13:44:07] Experimental server is disabled.
[Warn  - 13:44:07] Did not receive workspace/didChangeConfiguration notification after 5 seconds. Using default settings.
[Info  - 13:44:07] Starting build with MIX_ENV: test MIX_TARGET: host
[Warn  - 13:44:07] No mixfile found in project. To use a subdirectory, set `elixirLS.projectDir` in your settings. Looked for mixfile at "/Users/andrew/Projects/os/firezone/mix.exs"
[Info  - 13:44:07] Compile took 2 milliseconds
[Info  - 13:44:07] [ElixirLS WorkspaceSymbols] Indexing...
[Info  - 13:44:08] [ElixirLS WorkspaceSymbols] Module discovery complete
[Info  - 13:44:08] [ElixirLS WorkspaceSymbols] 24 callbacks added to index
[Info  - 13:44:08] [ElixirLS WorkspaceSymbols] 242 modules added to index
[Info  - 13:44:08] [ElixirLS WorkspaceSymbols] 417 types added to index
[Info  - 13:44:09] [ElixirLS WorkspaceSymbols] 4804 functions added to index

Developer Console:

notificationsAlerts.ts:42 ElixirLS has crashed. See Output panel.
c @ notificationsAlerts.ts:42
(anonymous) @ notificationsAlerts.ts:28
invoke @ event.ts:862
deliver @ event.ts:1075
fire @ event.ts:1031
addNotification @ notifications.ts:204
notify @ notificationService.ts:175
(anonymous) @ mainThreadMessageService.ts:77
d @ mainThreadMessageService.ts:42
$showMessage @ mainThreadMessageService.ts:36
N @ rpcProtocol.ts:455
M @ rpcProtocol.ts:440
H @ rpcProtocol.ts:370
G @ rpcProtocol.ts:296
(anonymous) @ rpcProtocol.ts:161
invoke @ event.ts:862
deliver @ event.ts:1075
fire @ event.ts:1031
fire @ ipc.net.ts:671
ee.onmessage @ localProcessExtensionHost.ts:583

@lukaszsamson
Collaborator

Those two errors may be the reason

[Error - 13:35:53] GenServer ElixirLS.LanguageServer.Server terminating
** (FunctionClauseError) no function clause matching in IO.chardata_to_string/1
    (elixir 1.15.0) lib/io.ex:671: IO.chardata_to_string(nil)
    ** (EXIT) an exception was raised:
        ** (UndefinedFunctionError) function Web.Router.__verify_route__/1 is undefined (module Web.Router is not available)
            (web 0.0.0+git.0.deadbeef) Web.Router.__verify_route__([])

The remaining ones are harmless; they are just the result of parse failures during editing.

@AndrewDryga
Author

I think the second one is just a result of a typo in a router which prevented it from compiling. The first one was out of nowhere.

@lukaszsamson
Collaborator

The first crash is caused by handling workspace/didChangeWatchedFiles when project_dir is not yet set. While that can be fixed with a simple nil check, it exposed a bigger problem. After the custom restart command is executed, VSCode restarts the server and sends initialize and initialized, but then does not send workspace/didChangeConfiguration with elixirLS.projectDir. The server waits 5 seconds and logs "Did not receive workspace/didChangeConfiguration notification after 5 seconds. Using default settings." The default settings assume that elixirLS.projectDir is "", which means that the workspace root is used. Some architecture changes are needed to fix that (using a workspace/configuration reverse request or caching the configuration on the server side).
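To make the sequence above concrete, here is a minimal sketch of the fallback behaviour in TypeScript. It is not ElixirLS code (the server is written in Elixir); `Settings`, `waitForSettings` and `mixfilePath` are hypothetical names, and only the empty-string default and the "use the workspace root" behaviour are taken from the description above.

```typescript
// Hypothetical names throughout; this only illustrates the behaviour described above.
interface Settings {
  projectDir: string; // "" means "use the workspace root"
}

const DEFAULT_SETTINGS: Settings = { projectDir: "" };

// Resolves with the settings pushed by the client, or with the defaults
// if no notification arrives within `timeoutMs`.
function waitForSettings(
  pushed: Promise<Settings>,
  timeoutMs: number
): Promise<{ settings: Settings; timedOut: boolean }> {
  return new Promise((resolve) => {
    const timer = setTimeout(
      () => resolve({ settings: DEFAULT_SETTINGS, timedOut: true }),
      timeoutMs
    );
    pushed.then((settings) => {
      clearTimeout(timer);
      // A late notification is ignored here: the promise is already settled.
      resolve({ settings, timedOut: false });
    });
  });
}

// Joins the workspace root with the configured sub-directory, if any.
function mixfilePath(workspaceRoot: string, settings: Settings): string {
  const dir = settings.projectDir.replace(/^\.\//, "").replace(/\/+$/, "");
  return dir === "" ? `${workspaceRoot}/mix.exs` : `${workspaceRoot}/${dir}/mix.exs`;
}
```

If the notification never arrives, the sketch falls back to the defaults and `mixfilePath` points at the workspace root, which matches the "Looked for mixfile at" warning in the log.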

In the meantime, instead of using projectDir, I suggest a multi-root workspace with workspace folders firezone and firezone/elixir.
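As a rough sketch of that workaround, a VSCode workspace file saved in the repository root could look like the following. The file name (for example `firezone.code-workspace`) is an assumption; the two folders are the ones suggested above, written relative to the workspace file.

```json
{
  "folders": [
    { "path": "." },
    { "path": "elixir" }
  ],
  "settings": {}
}
```

With this layout the `elixir` folder is itself a workspace root, so presumably `elixirLS.projectDir` can be left unset.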

The second one is a crash in the Elixir compiler, likely caused by a race condition between the compiler and the test tracer. It seems we need to put the test tracer under a build lock.

AndrewDryga added a commit to firezone/firezone that referenced this issue Jun 28, 2023
My editor failed here due to a bug: elixir-lsp/vscode-elixir-ls#345
jamilbk pushed a commit to firezone/firezone that referenced this issue Jul 12, 2023
Make all tests pass

I removed some of the VPN/Wall settings (they are irrelevant once we move out the gateway) along with port-based rule conditions (since we are moving to userspace wg).

Make sure that container can be built and run in PR CI step

Remove omnibus install scripts

Bring ecto.* helpers back to life

Fix priv/repo path

Add skeleton of API app

Add client, gateway, relay boilerplate code

Drop REST API boilerplate for now

Add primitive tests and more structure for API app

Control channels for Clients, Relays and Gateways (#1551)

Replace web app with a new one based on Tailwind and esbuild (#1568)

Re-enable SQL sandboxing for Phoenix apps

Bring back browser/config.xml

Remove unused import

Remove unused docker-compose file

Add minimal scaffolding for relay

Install necessary components for toolchain

Avoid concurrent jobs

Move everything to a workspace

Move gitignore and lockfile to workspace root

Move rust-toolchain to workspace root

Add caching to CI

Update .github/workflows/rust.yml

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>

Implement basic STUN server (#1603)

This is an alternative to #1602
that implements the server using a library I've found called
`stun_codec`.

It already has support for parsing a variety of attributes.

The following is a nice website to test some of the functionality:
https://icetest.info/

The server is still listening on:
`ec2-3-89-112-240.compute-1.amazonaws.com:3478`.

Install Rust before computing cache keys (#1606)

Enforce no warnings in docs (#1605)

relay: Parse and respond to allocation requests (#1604)

With this patch, the relay can parse and respond to allocation requests. I
ran some basic tests against https://icetest.info/ and implemented a
regression test as a result of the logged data.

In writing this, I also had to slightly change the design of `Server`
(as expected). Event handlers for incoming data now do not return a
message directly. Instead, the caller is responsible for draining `Command`s
from it.

When creating an allocation, we need to start listening on a new port.
This needs to happen outside the `Server` as I am going for a sans-IO
style. We emit a `Command` that instructs the main event loop to listen
on a new port. Any incoming data on that port will be forwarded to the
`Server`.
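The command-draining design described above can be sketched as follows. This is an illustration in TypeScript, not the relay's Rust code; the `Command` variants, `handleAllocateRequest`, `nextCommand` and `drain` are hypothetical names chosen for the example.

```typescript
// Illustrative only: all names here are hypothetical.
type Command =
  | { kind: "listenOnPort"; port: number }
  | { kind: "sendMessage"; recipient: string; payload: Uint8Array };

class Server {
  private pending: Command[] = [];
  private nextPort: number;

  constructor(firstPort: number) {
    this.nextPort = firstPort;
  }

  // Handles an allocation request without touching any socket:
  // it only records what the event loop should do next.
  handleAllocateRequest(sender: string): void {
    const port = this.nextPort++;
    this.pending.push({ kind: "listenOnPort", port });
    this.pending.push({ kind: "sendMessage", recipient: sender, payload: new Uint8Array() });
  }

  // Returns the next queued command, or undefined once the queue is empty.
  nextCommand(): Command | undefined {
    return this.pending.shift();
  }
}

// The caller (the main event loop) drains the commands and performs the I/O.
function drain(server: Server): Command[] {
  const drained: Command[] = [];
  let command = server.nextCommand();
  while (command !== undefined) {
    drained.push(command);
    command = server.nextCommand();
  }
  return drained;
}
```

Keeping sockets out of `Server` means it can be tested by feeding it requests and inspecting the drained commands.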

At the moment, this incoming data is just dropped. This is actually
standards-compliant because we cannot handle binding requests yet which
would allow this data to be forwarded to the client.

In some areas, the code is still a bit rough but I expect to iron those
things out as we go along.

relay: add basic README (#1611)

relay: refresh allocations (#1610)

relay: don't repeat magic numbers throughout the code (#1612)

A small refactoring to keep magic numbers only in one place.

relay: remember allocations by port (#1613)

Instead of remembering the used ports separately, we store a reference
to each allocation by port.

ci: remove broken workflows (#1614)

These workflows are all red which is expected as far as I understand.
I'd suggest we remove them to reduce the noise when reviewing PRs.

In case we ever wanted to bring parts of it back, Git is our best
friend.

Feel free to close if you think differently.

Update workflows for cloud chaos (#1615)

Updating workflows to skip on PR and run on merges to `cloud`.

IAM context (#1577)

Things I've left for later to IAM:
1. Subject session expiration (to prevent session extension attacks);
2. UserPass adapter;
3. Token adapter and removal of APITokens in favor of `api_client` actor
with a Token provider;
4. Cleanup of Configurations schema and table
5. SCIM
6. Groups and Actor Profile (name, email) Sync
7. Email delivery once Web app is done with the templates
8. We might also want to persist sessions to database, to then show list
of active sessions to the user and allow to terminate some of them from
UI
9. SAML?
10. Rename `unprivileged` role name to `end_user`
11. Add `first_` and `last_name`, and sync/edit blocking logic around
it.
12. Rename Clients to Devices?

Fix PR-labeler config (#1623)

Fix PR labeler config 🤞

fix(relay): use correct variable (#1617)

We had a semantic conflict here that resulted in a broken build. This PR
fixes that.

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

1.0 views (part 1) (#1599)

- [x] Users
- [x] Groups
- [x] Devices
- [x] Gateways

relay: create channel bindings and relay data (#1618)

Here is a short demo:

[Relay](https://github.com/firezone/firezone/assets/5486389/c0199294-70ca-47b4-90ae-2c96428bdb56)

You can run this locally using the `./run_smoke_test.sh` shell-script.
It is not reliable enough yet to be used in CI but I used one of its
outputs to make a regression test.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Implementing channels logic (#1619)

Fix minor bugs and tidy up existing work on new views (#1628)

Just fixing some bugs and inconsistencies I found while going through
the new views.

Fix some of TODOs left from IAM PR (#1627)

Move elixir code to a subfolder (#1631)

refactor(relay): introduce type-safe `Server` APIs (#1630)

We introduce dedicated types for each message that the `Server` can
handle. This allows us to make the functions public because the
type-system now guarantees that those are either parsed from bytes or
constructed with the correct data.

The latter will be useful to write tests against a richer API.

Deployment for the cloud version (#1638)

TODO:
- [x] Cluster formation for all API and web nodes
- [x] Ingest Docker logs to Stackdriver
- [x] Fix assets building for prod

To finish later:
- [ ] Structured logging:
https://issuetracker.google.com/issues/285950891
- [ ] Better networking policy (eg. use public postmark ranges and deny
all unwanted egress)
- [ ] OpenTelemetry collector for Google Stackdriver
- [ ] LoggerJSON.Plug integration

---------

Signed-off-by: Andrew Dryga <andrew@dryga.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Set correct outbound email in local env

Try to fix CI step

relay: implement authentication (#1641)

Remove Elixir checks from pre-commit hook and rename CI step that runs it

Always run Elixir CI checks when code in main branch changed

Fix typos

Run pre-commit CI step on all PRs

Add newlines in the end of files

Add resource type and expose it in WS API along with name (#1649)

Additionally:
1. Fixed ipv6 formatting for stun/turn addresses
2. Fixed a test that checks for race conditions concurrently

Normalize CIDR resource addresses

Remove outdated TODO

feat(rust): bump to new stable release 1.70.0 (#1648)

Continuous delivery to staging (#1655)

Add terraform code owners

Leave a note on workflow_run feature and fix checkout feature

Experiment with condition

Workflow is not picked up by GitHub for some reason

Try a different CI setup

Add missing on_workflow call

Remove copy-pasted required inputs

Fix races for concurrency control

Inherit secrets to child workflows

Fix path to versions file

Rename pre-commit step

Bump checkout action vsn in rust workflow

Try pushing update using GH API

Fix github branch name

Do not attempt to persist tag versions back to the repo

Add missing env for terraform workflow

Try to wrap tf vars in backticks

Add double quotes to the var itself

Fix assets pipeline, add Elixir deps audit, add Android applink manifest (#1659)

feat(relay): implement nonces for authentication (#1654)

To complete the authentication scheme for the relay, we need to prompt
the client with a nonce when they send an unauthenticated request. The
semantic meaning of a nonce is opaque to the client. As a starting
point, we implement a count-based scheme. Each nonce is valid for 10
requests. After that, a request will be rejected with a 401 and the
client has to authenticate with a new nonce.

This scheme provides a basic form of replay-protection.
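A count-based nonce scheme like the one described can be sketched as below. This is a TypeScript illustration, not the relay's Rust implementation; the `Nonces` class and its methods are hypothetical, and only the limit of 10 requests per nonce comes from the text above.

```typescript
// Illustrative only: `Nonces`, `add` and `use` are hypothetical names.
const MAX_USES_PER_NONCE = 10; // each nonce is valid for 10 requests

class Nonces {
  private remaining = new Map<string, number>();

  // Registers a freshly issued nonce.
  add(nonce: string): void {
    this.remaining.set(nonce, MAX_USES_PER_NONCE);
  }

  // Returns true if the request may proceed. A false result corresponds to
  // rejecting the request with a 401 so the client authenticates with a new nonce.
  use(nonce: string): boolean {
    const left = this.remaining.get(nonce);
    if (left === undefined || left === 0) {
      this.remaining.delete(nonce);
      return false;
    }
    this.remaining.set(nonce, left - 1);
    return true;
  }
}
```

Because a nonce stops being accepted after a fixed number of uses, a captured request can only be replayed while its nonce still has uses left.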

feat(relay): provide a commandline interface using clap (#1658)

This saves us several lines of code and allows usage of the relay via
commandline arguments in addition to env variables. Note that because of
`#[arg(env)]`, all of these can still be configured via environment
variables too.

feat(relay): add Dockerfile (#1661)

This adds a basic Dockerfile for the relay so users and devs can easily
start it.

fix(relay): treat `stamp_secret` as string (#1660)

Previously, the relay would treat the `stamp_secret` internally as bytes and share it with the outside world as hex-string. The portal however treats it as an opaque string and uses the UTF-8 bytes to create username and password.

This patch aligns the relay's functionality with the portal and stores the `stamp_secret` internally as a string.

ci: specify workspace directory for cache action correctly (#1663)

ci: install musl target via `rust-toolchain.toml` file (#1664)

Targets specified in the `rust-toolchain.toml` file are automatically installed by `rustup`. This avoids setup steps for other devs and also simplifies the CI setup.

To be able to compile native code to musl, we do need `musl-gcc` which comes with the `musl-tools` package on ubuntu.

feat(relay): connect to portal on startup (#1643)

With this PR, the relay can be configured with a WebSocket URL on startup. If given, it will attempt to connect to it and join the `relay` room with its `stamp_secret`. Once the `init` message is received, regular relay operation will begin.

jamilbk%feat/stub website in cloud (#1675)

* Remove `www/`
* Stub empty `website/` to silence Vercel. This shouldn't cause
conflicts when we merge `cloud` to `master`. Perhaps we want to start
working off `master` soon, and move the current tip of master to
`legacy`?

Use pnpm over yarn (#1678)

Did some research when picking a package manager for the website and
settled on `pnpm` for the following reasons:

- CLI-compatible with `npm`
- Typically faster than even `yarn` especially on Apple silicon
- Security: Pnpm uses a different dependency resolution algorithm and
different folder structure of node_modules that prevents illegal access
to packages by other packages.

I think I caught all the places, but I may be missing something, so if
this isn't a good idea we can revert back.

This PR also cleans up the actions workflows to remove dead code.

Use pnpm for asset setup too (#1681)

Add pnpm to runners (#1683)

Found another place where pnpm needs to be added.

Hotfix seeds and references (#1689)

connlib: moves it to the main firezone library

 This brings connlib from its own separate repo into firezone's monorepo.

 On top of bringing connlib we also add and unify the Dockerfile for all
 rust binaries and add a docker-compose that can run a headless client, a
 relay and a gateway which eventually will test the whole flow between a
 client and a resource. For this to work we also incorporated some elixir
 scripts to generate portal tokens for those components.

Do not expire encoded Gateway/Relay tokens

Fix API error rendering

Render error when public key is reused

Fix stub module name

Remove outdated env files

rust: fix dockerfile for building multiple images in parallel (#1699)

When using `docker compose build` or any other way of building docker
images in parallel the way the cache was working with the rust's
Dockerfile made the caches between images overlap and corrupt each
other. We add a `locked` which prevents multiple writers to the same
cache to fix this behaviour.

Return changeset on name suffix constraint error

docker: fix building for macos (#1700)

There are problems building the Docker images on macOS using musl due to
issues with ring, so we started using slim-debian with glibc for
development.

Authentication for the live app (#1674)

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

portal: Policies CRUD views (#1692)

@AndrewDryga ~~Was still hitting some redirect issues so I'll wait for
those to be resolved before continuing on building more views.~~ Edit:
After some sleep and coffee, I figured it out. Nice work on the sign in
form!

I went ahead and scoped existing dashboard links with `@account` and
fixed a dark mode issue -- you may want to cherry-pick those commits.
I'll add these to authenticated routes and integrate into what you have
so far.

As I was going through last night exploring your route approach I
thought of some edge cases; can discuss next week. I think the main one
that came to mind was that we probably want to differentiate between
login flows initiated directly in the browser (this is an admin logging
into the dashboard) vs login flows initiated from a client app (these
will terminate with a final redirect to respective `dest` whitelisted
URL). Maybe it makes sense to segregate these flows?

If a regular user tries login directly from the browser maybe we want to
show them something like "Please login from your Firezone application
instead" as they should only be able to initiate logins from a client
application. Or maybe there's simply no possibility to end up at the
final Android App Link or `firezone://` URI with a login initiated
directly from the browser?

portal: Status indicator badge (#1703)

Did some research on status page providers to manage incidents.
statuspage.io seems to be easy to use and cost-effective, fairly popular
and provides a good amount of flexibility to customize emails,
notifications, etc.

Super easy to set up and use but am not married to it if anyone feels
strongly about using another incident management service.

https://firezone.statuspage.io

<img width="235" alt="Screenshot 2023-06-27 at 8 07 29 AM"
src="https://github.com/firezone/firezone/assets/167144/8ad12b9b-7345-4a5d-bf43-c8af798d85f9">

Fix compilation warnings that are not fixed in merged PRs

Do not render ipv6 relay address if it's nil

CONTRIBUTING.md updates (#1704)

**Update CONTRIBUTING.md**

Why:

* The CONTRIBUTING.md doc seems to have fallen slightly out of date with
  how Firezone now works. This commit updates the doc to provide a
  quick start guide for getting all of the various Firezone components
  up and running as quickly as possible. The doc then links to the more
  specific `Elixir` and `Rust` README.md files in the respective
  directories to help developers who would like to contribute.

**Update docker-compose vault health check**

Why:

* The current Vault health check listed in the docker-compose file does
  not seem to be working when using `localhost` in the `wget` command.
  Updating the URL to use `127.0.0.1` seems to have fixed it.

---------

Signed-off-by: bmanifold <bmanifold@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Fix formatting issue

My editor failed here due to a bug: elixir-lsp/vscode-elixir-ls#345

connlib: Improve FFI bridges for Apple and Android (#1691)

This makes it possible to build the Apple/Android FFI bridges and
integrate them with their respective client apps.

---------

Signed-off-by: Francesca Lovebloom <franlovebloom@gmail.com>
Co-authored-by: Roopesh Chander <roop@roopc.net>

Fix/docker compose up (#1705)

This PR fixes `docker compose up`. The test client -> resource flow is
not working yet, but this prevents anything from erroring at startup.

This fixes:
* tokens (use the correct token for the client user agent we are using)
* randomize `name_suffix` at start up for connlib (we will eventually
allow options to set it manually)
* remove port ranges for relay (see firezone/corp#613)

fix(relay): ensure smoke test script fails on error (#1711)

Due to a silly bash mistake (I hate bash), the error from the gateway
binary wasn't actually propagated to the script. Thus, we did not notice
that it had been broken for a while.

Attempting to fix it turned up that we were double-hexing the relay
secret and using invalid passwords for the clients.

fix(connlib): format with `cargo fmt` (#1709)

Runs `cargo fmt` on the entire `rust/` directory. This somehow doesn't
seem to be enforced, I think that is because we changed the previous CI
to now only run for the `relay` crate.

I'd like to merge this first to avoid the diff and in a 2nd PR, we can
work on unifying CI again.

fix(relay): remove smoke test CI script (#1717)

Unfortunately, this doesn't seem to be stable. I don't really understand
why. Judging from the logs, the problem is not in the relay but somehow
the final UDP packet doesn't arrive at the `gateway` binary.

To not unnecessarily block other PRs, I am removing the check for now.

Add more websocat examples for connecting to a resource

Wait for client and gateway containers for api to become ready

Add docs section to see if everything is connected to the panel

Explicitly subscribe to id channels

Looks like for some reason the id/1 callback doesn't subscribe the channel process any more (only the socket itself), so we are doing that explicitly now.

Stub out client app directories in monorepo structure (#1716)

Stubs out the client app dirs and basic CI workflow for the client apps
in preparation to move them into this repository.

After this is merged @roop @pratikvelani you should be able to add the
client repos here.

chore: unify and optimize Rust CI (#1710)

- Instead of having two, very similar jobs, we run our fmt, clippy and
tests steps across all crates and operating systems.
- We remove the dependency of the android and apple builds on the tests
and thus get faster feedback.
- We force clippy to fail on any warning. This one is super important
IMO. Warnings in Rust are very useful and ignoring them can lead to bugs
(think "unused Result" etc).

Resolves #1714.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Francesca Lovebloom <franlovebloom@gmail.com>

connlib: Connection mock (#1721)

Resolves firezone/corp#607

Setting the env var `CONNLIB_MOCK` when building through either
`build-rust.sh` or `gradle` will activate the `mock` feature.

Attempt to enable merge queue (#1713)

https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#merge_group

Feat/connlib full flow (#1722)

With this PR the full control-plane message flow is working.

Meaning that if you do:

```
docker compose up -d
docker compose exec -it client "ping 172.20.0.2" # will fix this IP later
```

Messages start flowing to the gateway. The gateway still does not
forward the messages to the resource correctly, since masquerading is
still not working, although I suspect there might be an additional
problem. I will fix this in my next PR, along with a README on how to
test this whole flow.

This PR also fixes how we send the stamp secret from the relay to the
gateway, but I still see some warnings in webrtc that I'm sure are due
to a mismatch between how webrtc-rs and the relay handle messages (the
most important being `bind() failed: unexpected response type`). I will
take a look at that and at a way to test that the flow works when:
1. hole-punching is available
2. through relay when it's not

Right now the flow works without hole-punching or a relay, since the
gateway is in the same network in the docker compose.

Bump Elixir/OTP versions (#1730)

Bump versions in Dockerfile

Fix flaky tests

docs(relay): bring README.md up to date (#1718)

Drop invalid cache restore keys

Fix ubuntu 20.04 CI (#1734)

add a prefix key with host os to rust test job to prevent caching issues

CI: add a flow that test client to resource ping (#1729)

This PR fixes a bunch of small things to allow a new flow to test
clients pinging a resource within docker compose.

Masquerade/Forwarding is enabled directly in the container for now, this
might change in the future.

Also added a README to be able to run this locally.

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

feat(relay): default portal URL (#1719)

Instead of having portal URL and token optional, we default the portal
URL and decide based on the presence of the token, whether we should
connect to the portal on startup. This allows the relay to be
used/tested standalone and keeps the number of config options and error
cases small.

We require the user to configure the full path of the websocket and thus
avoid the need for duplicating the connlib function. Given that most
users will never need to override this option, this seems like a good
trade-off.

Resolves firezone/corp#614.

Feat/connlib handle error messages (#1735)

With this PR, the client handles error messages caused by the
gateway/relay, although rate limiting is still needed.

Waiting for #1729 to be merged.

portal: Stub out Settings views (#1702)

Adds Settings UI views based on the Balsamiq Wireframes. This should be
merged **after** #1679
<img width="1469" alt="Screenshot 2023-06-26 at 4 48 55 PM"
src="https://github.com/firezone/firezone/assets/167144/0994b12b-5d8d-48a6-bc8d-c9ba07d2403c">

<img width="1469" alt="Screenshot 2023-06-26 at 4 49 01 PM"
src="https://github.com/firezone/firezone/assets/167144/1d69a54d-2740-4ab0-819b-75a50a976285">
<img width="1616" alt="Screenshot 2023-06-29 at 12 29 26 AM"
src="https://github.com/firezone/firezone/assets/167144/94a8913f-93be-4502-b30e-c70f147dbe62">

<img width="1616" alt="Screenshot 2023-06-29 at 12 29 14 AM"
src="https://github.com/firezone/firezone/assets/167144/16dfc709-65b9-44fd-adad-c412dc1d44e6">

<img width="1616" alt="Screenshot 2023-06-29 at 2 36 43 PM"
src="https://github.com/firezone/firezone/assets/167144/3cddc4b3-7494-4710-953e-4d60108b9aa8">
<img width="1616" alt="Screenshot 2023-06-29 at 2 36 56 PM"
src="https://github.com/firezone/firezone/assets/167144/1f433239-1023-471d-916c-76c43f47835e">
<img width="1616" alt="Screenshot 2023-06-29 at 2 37 05 PM"
src="https://github.com/firezone/firezone/assets/167144/9cd4be23-02eb-4adf-902b-00c02cecd744">

Add android client to the repo (#1738)

- Add android client to the repo

---------

Signed-off-by: Pratik Velani <pratikvelani@gmail.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Bring in apple client into monorepo (#1737)

This PR brings in the apple client into the monorepo.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

feat(relay): use structured logging (#1741)

With this patch, the relay exposes a `--json` flag and a `JSON_LOG` env
variable that will activate logs in JSON format the way it is expected
by Google Cloud:
https://cloud.google.com/logging/docs/structured-logging

In addition, we make use of spans to record contextual information as
first-class variables that are available in the context of every
message. An example output here is:

```
{"time":"2023-07-06T19:54:42.643694430Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/main.rs","line":"156"},"severity":"INFO","message":"Seeding RNG from '0'"}
{"time":"2023-07-06T19:54:42.644408014Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/main.rs","line":"130"},"severity":"INFO","message":"Listening for incoming traffic on UDP port 3478"}
{"time":"2023-07-06T19:54:42.843247996Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"417"},"span":{"lifetime":"600","name":"allocate"},"spans":[{"sender":"127.0.0.1:46406","transaction_id":"0531a911a24d1e5297b94cb2","name":"client"},{"lifetime":"600","name":"allocate"}],"severity":"INFO","ip4RelayAddress":"127.0.0.1:65460","message":"Created new allocation"}
{"time":"2023-07-06T19:54:42.851623041Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"569"},"span":{"allocation":"AID-1","peer_address":"127.0.0.1:42314","requested_channel":"16384","name":"channel_bind"},"spans":[{"sender":"127.0.0.1:46406","transaction_id":"e99e07e482789cdc30bd2b50","name":"client"},{"allocation":"AID-1","peer_address":"127.0.0.1:42314","requested_channel":"16384","name":"channel_bind"}],"severity":"INFO","message":"Successfully bound channel"}
{"time":"2023-07-06T19:54:42.852889208Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"288"},"span":{"allocation_id":"AID-1","channel":16384,"recipient":"127.0.0.1:46406","sender":"127.0.0.1:42314","name":"peer"},"spans":[{"allocation_id":"AID-1","channel":16384,"recipient":"127.0.0.1:46406","sender":"127.0.0.1:42314","name":"peer"}],"severity":"DEBUG","message":"Relaying 32 bytes"}
{"time":"2023-07-06T19:54:42.854625857Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"619"},"span":{"channel":"16384","recipient":"127.0.0.1:42314","name":"channel_data"},"spans":[{"sender":"127.0.0.1:46406","name":"client"},{"channel":"16384","recipient":"127.0.0.1:42314","name":"channel_data"}],"severity":"DEBUG","message":"Relaying 32 bytes"}
```

For some reason, the current `span` is always duplicated but I don't
think that is a big issue. When run using the regular log formatter, it
looks like this:

```
2023-07-06T20:02:33.939273Z  INFO relay: Seeding RNG from '0'
2023-07-06T20:02:33.940153Z  INFO relay: Listening for incoming traffic on UDP port 3478
2023-07-06T20:02:34.135801Z  INFO client{sender=127.0.0.1:33919 transaction_id="7092a2363377709cd18b9d98"}:allocate{lifetime=600}: relay: Created new allocation ip4_relay_address=127.0.0.1:65460
2023-07-06T20:02:34.144833Z  INFO client{sender=127.0.0.1:33919 transaction_id="4e1a18e58953242c92a075a3"}:channel_bind{requested_channel=16384 peer_address=127.0.0.1:47859 allocation="AID-1"}: relay: Successfully bound channel
2023-07-06T20:02:34.145501Z DEBUG peer{sender=127.0.0.1:47859 allocation_id=AID-1 recipient=127.0.0.1:33919 channel=16384}: relay: Relaying 32 bytes
2023-07-06T20:02:34.146863Z DEBUG client{sender=127.0.0.1:33919}:channel_data{channel=16384 recipient=127.0.0.1:47859}: relay: Relaying 32 bytes
```

This provides lots of contextual information in a DRY and easily
parse-able way.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Pass all required checks that weren't triggered in the PR (#1748)

Fixes #1747
Fixes #1746

Pass-checks workflow per subdir (#1749)

Fix cache for Docker buildx (#1750)

~~This is an attempt to fix the CI bug
[here](https://github.com/firezone/firezone/actions/runs/5491388141/jobs/10007864417#step:4:1638)
possibly introduced in
[d9eb2d1](d9eb2d18#diff-88bd94db0d5cfd5f0617b7c4ed48c0212597378ed7e28714c5d86c95999b4c7dR29)
and uncovered / exacerbated in Elixir 1.15~~

Edit: looks like this ended up being a couple cache issues with GitHub
actions:
1. The `elixir_api-container-build` cache would always overwrite the
`elixir_web-container-build` on subsequent builds of the same
`github.ref_name` (cache is scoped to branch name by default), leading
to the consistent error `Elixir.Web.Mailer.NoopAdapter does not exist`
whenever a branch was pushed to more than once.
2. The same thing happens with the `integration_test-basic-flow` job
because the `api` service gets built after the `web` service in
docker-compose.yml, overwriting its cache

For some reason it seems the `APPLICATION_NAME` ARG is not busting the
Docker cache properly on GitHub actions for elixir container builds, so
the fix here was to [use
`scope=`](https://docs.docker.com/build/cache/backends/gha/#scope) to
segregate the cache layers between builds of the same branch.

Move NoopAdapter to Domain app (#1756)

Workaround for this:

elixir-lang/elixir#12777

Feat/expire peers (#1739)

This PR takes care of expiring connections with peers from the gateway
side.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

fix(relay): reuse `delete_allocation` function (#1743)

Previously, we would access the state around allocations from different
places. This actually led to a minor memory leak where we wouldn't clean
up the `allocations_by_port` table. We refactor the code slightly to
avoid this.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

connlib: Use latest `swift-bridge` release (#1753)

A new version of `swift-bridge` released today, so we don't need it to
be a git dependency anymore.

headless & gateway: impl callbacks (#1757)

After rebasing over #1744, CI should pass

connlib: Hook up callbacks (#1744)

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Add slack notification for failed deployments

Fix flaky test

Fix health checks path
jamilbk pushed a commit to firezone/firezone that referenced this issue Jul 12, 2023
Make all tests pass

I removed some of the VPN/Wall settings (they are irrelevant once we move out the gateway) along with port-based rule conditions (since we are moving to userspace wg).

Make sure that container can be built and run in PR CI step

Remove omnibus install scripts

Bring ecto.* helpers back to life

Fix priv/repo path

Add skeleton of API app

Add client, gateway, relay boilerplate code

Drop REST API boilerplate for now

Add primitive tests and more structure for API app

Control channels for Clients, Relays and Gateways (#1551)

Replace web app with a new one based on Tailwind and esbuild (#1568)

Re-enable SQL sandboxing for Phoenix apps

Bring back browser/config.xml

Remove unused import

Remove unused docker-compose file

Add minimal scaffolding for relay

Install necessary components for toolchain

Avoid concurrent jobs

Move everything to a workspace

Move gitignore and lockfile to workspace root

Move rust-toolchain to workspace root

Add caching to CI

Update .github/workflows/rust.yml

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>

Implement basic STUN server (#1603)

This is an alternative to #1602
that implements the server using a library I've found called
`stun_codec`.

It already has support for parsing a variety of attributes.

The following is a nice website to test some of the functionality:
https://icetest.info/

The server is still listening on:
`ec2-3-89-112-240.compute-1.amazonaws.com:3478`.

Install Rust before computing cache keys (#1606)

Enforce no warnings in docs (#1605)

relay: Parse and respond to allocation requests (#1604)

With this patch, the relay can parse and respond to allocation requests. I
ran some basic tests against https://icetest.info/ and implemented a
regression test as a result of the logged data.

In writing this, I also had to slightly change the design of `Server`
(as expected). Event handlers for incoming data no longer return a
message directly. Instead, the caller is responsible for draining
`Command`s from it.

When creating an allocation, we need to start listening on a new port.
This needs to happen outside the `Server` as I am going for a sans-IO
style. We emit a `Command` that instructs the main event loop to listen
on a new port. Any incoming data on that port will be forwarded to the
`Server`.
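
A minimal sketch of the command-draining idea described above, not the relay's actual code. The `Command` variants, `handle_allocate_request` and `next_command` are hypothetical names chosen for illustration; only `Server` and `Command` appear in the description.

```rust
use std::collections::VecDeque;

/// Side effects the event loop carries out on behalf of the server.
/// (Hypothetical variants, for illustration only.)
#[derive(Debug, PartialEq)]
enum Command {
    /// Start listening on a newly allocated UDP port.
    ListenOnPort(u16),
    /// Send these bytes back to the client.
    SendMessage(Vec<u8>),
}

/// Sans-IO state machine: it never touches a socket itself.
struct Server {
    pending: VecDeque<Command>,
    next_port: u16,
}

impl Server {
    fn new(first_port: u16) -> Self {
        Server {
            pending: VecDeque::new(),
            next_port: first_port,
        }
    }

    /// Handles an (already parsed) allocation request by queueing commands
    /// instead of returning a response directly.
    fn handle_allocate_request(&mut self) {
        let port = self.next_port;
        self.next_port += 1;
        self.pending.push_back(Command::ListenOnPort(port));
        self.pending.push_back(Command::SendMessage(b"allocated".to_vec()));
    }

    /// The event loop calls this until it returns `None`.
    fn next_command(&mut self) -> Option<Command> {
        self.pending.pop_front()
    }
}
```

The event loop would call `next_command` after every handler invocation and perform the actual socket operations itself, which keeps `Server` testable without any network.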

At the moment, this incoming data is just dropped. This is actually
standards-compliant because we cannot handle binding requests yet which
would allow this data to be forwarded to the client.

In some areas, the code is still a bit rough but I expect to iron those
things out as we go along.

relay: add basic README (#1611)

relay: refresh allocations (#1610)

relay: don't repeat magic numbers throughout the code (#1612)

A small refactoring to keep magic numbers only in one place.

relay: remember allocations by port (#1613)

Instead of remembering the used ports separately, we store a reference
to each allocation by port.

ci: remove broken workflows (#1614)

These workflows are all red which is expected as far as I understand.
I'd suggest we remove them to reduce the noise when reviewing PRs.

In case we ever wanted to bring parts of it back, Git is our best
friend.

Feel free to close if you think differently.

Update workflows for cloud chaos (#1615)

Updating workflows to skip on PR and run on merges to `cloud`.

IAM context (#1577)

Things I've left for later in IAM:
1. Subject session expiration (to prevent session extension attacks);
2. UserPass adapter;
3. Token adapter and removal of APITokens in favor of `api_client` actor
with a Token provider;
4. Cleanup of Configurations schema and table
5. SCIM
6. Groups and Actor Profile (name, email) Sync
7. Email delivery once Web app is done with the templates
8. We might also want to persist sessions to database, to then show list
of active sessions to the user and allow to terminate some of them from
UI
9. SAML?
10. Rename `unprivileged` role name to `end_user`
11. Add `first_` and `last_name`, and sync/edit blocking logic around
it.
12. Rename Clients to Devices?

Fix PR-labeler config (#1623)

Fix PR labeler config 🤞

fix(relay): use correct variable (#1617)

We had a semantic conflict here that resulted in a broken build. This PR
fixes that.

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

1.0 views (part 1) (#1599)

- [x] Users
- [x] Groups
- [x] Devices
- [x] Gateways

relay: create channel bindings and relay data (#1618)

Here is a short demo:

[Relay](https://github.com/firezone/firezone/assets/5486389/c0199294-70ca-47b4-90ae-2c96428bdb56)

You can run this locally using the `./run_smoke_test.sh` shell-script.
It is not reliable enough yet to be used in CI but I used one of its
outputs to make a regression test.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Implementing channels logic (#1619)

Fix minor bugs and tidy up existing work on new views (#1628)

Just fixing some bugs and inconsistencies I found while going through
the new views.

Fix some of TODOs left from IAM PR (#1627)

Move elixir code to a subfolder (#1631)

refactor(relay): introduce type-safe `Server` APIs (#1630)

We introduce dedicated types for each message that the `Server` can
handle. This allows us to make the functions public because the
type-system now guarantees that those are either parsed from bytes or
constructed with the correct data.

The latter will be useful to write tests against a richer API.

Deployment for the cloud version (#1638)

TODO:
- [x] Cluster formation for all API and web nodes
- [x] Ingest Docker logs to Stackdriver
- [x] Fix assets building for prod

To finish later:
- [ ] Structured logging:
https://issuetracker.google.com/issues/285950891
- [ ] Better networking policy (eg. use public postmark ranges and deny
all unwanted egress)
- [ ] OpenTelemetry collector for Google Stackdriver
- [ ] LoggerJSON.Plug integration

---------

Signed-off-by: Andrew Dryga <andrew@dryga.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Set correct outbound email in local env

Try to fix CI step

relay: implement authentication (#1641)

Remove Elixir checks from pre-commit hook and rename CI step that runs it

Always run Elixir CI checks when code in main branch changed

Fix typos

Run pre-commit CI step on all PRs

Add newlines in the end of files

Add resource type and expose it in WS API along with name (#1649)

Additionally:
1. Fixed ipv6 formatting for stun/turn addresses
2. Fixed tests that check for race conditions concurrently

Normalize CIDR resource addresses

Remove outdated TODO

feat(rust): bump to new stable release 1.70.0 (#1648)

Continuous delivery to staging (#1655)

Add terraform code owners

Leave a note on workflow_run feature and fix checkout feature

Experiment with condition

Workflow is not picked up by GitHub for some reason

Try a different CI setup

Add missing on_workflow call

Remove copy-pasted required inputs

Fix races for concurrency control

Inherit secrets to child workflows

Fix path to versions file

Rename pre-commit step

Bump checkout action vsn in rust workflow

Try pushing update using GH API

Fix github branch name

Do not attempt to persist tag versions back to the repo

Add missing env for terraform workflow

Try to wrap tf vars in backticks

Add double quotes to the var itself

Fix assets pipeline, add Elixir deps audit, add Android applink manifest (#1659)

feat(relay): implement nonces for authentication (#1654)

To complete the authentication scheme for the relay, we need to prompt
the client with a nonce when they send an unauthenticated request. The
semantic meaning of a nonce is opaque to the client. As a starting
point, we implement a count-based scheme. Each nonce is valid for 10
requests. After that, a request will be rejected with a 401 and the
client has to authenticate with a new nonce.

This scheme provides a basic form of replay-protection.
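
A rough sketch of a count-based nonce scheme like the one described above, under assumptions: the type and method names (`Nonces`, `register`, `use_once`, `NonceError`) are hypothetical, and both error cases are assumed to map to a 401 response. Only the limit of 10 requests per nonce comes from the description.

```rust
use std::collections::HashMap;

/// Each nonce is valid for this many requests (taken from the description).
const REQUESTS_PER_NONCE: u32 = 10;

/// Both cases are assumed to be answered with a 401 by the caller.
#[derive(Debug, PartialEq)]
enum NonceError {
    Unknown,
    Exhausted,
}

/// Tracks how many requests each issued nonce may still authenticate.
#[derive(Default)]
struct Nonces {
    remaining: HashMap<String, u32>,
}

impl Nonces {
    /// Records a freshly issued nonce with its full request budget.
    fn register(&mut self, nonce: &str) {
        self.remaining.insert(nonce.to_owned(), REQUESTS_PER_NONCE);
    }

    /// Consumes one use of the nonce, or reports why it is not valid.
    fn use_once(&mut self, nonce: &str) -> Result<(), NonceError> {
        let left = self
            .remaining
            .get_mut(nonce)
            .ok_or(NonceError::Unknown)?;
        if *left == 0 {
            return Err(NonceError::Exhausted);
        }
        *left -= 1;
        Ok(())
    }
}
```

Because a captured request can only be replayed while its nonce still has budget left, the window for replays is bounded by the request count rather than by time.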

feat(relay): provide a commandline interface using clap (#1658)

This saves us several lines of code and allows usage of the relay via
commandline arguments in addition to env variables. Note that because of
`#[arg(env)]`, all of these can still be configured via environment
variables too.

feat(relay): add Dockerfile (#1661)

This adds a basic Dockerfile for the relay so users and devs can easily
start it.

fix(relay): treat `stamp_secret` as string (#1660)

Previously, the relay would treat the `stamp_secret` internally as bytes and share it with the outside world as hex-string. The portal however treats it as an opaque string and uses the UTF-8 bytes to create username and password.

This patch aligns the relay's functionality with the portal and stores the `stamp_secret` internally as a string.
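
To illustrate the mismatch, here is a small sketch with made-up secret bytes; `to_hex` is a hypothetical helper, not a function from the relay. The raw bytes and the UTF-8 bytes of their hex representation are different byte sequences, so credentials derived from one will not match credentials derived from the other.

```rust
/// Renders bytes as a lowercase hex string (hypothetical helper).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Returns the bytes a party would hash if it treats the shared
/// hex string as an opaque string, as the portal is described to do.
fn opaque_string_bytes(shared: &str) -> Vec<u8> {
    shared.as_bytes().to_vec()
}
```

For example, the raw bytes `[0xde, 0xad]` are shared as the string `"dead"`, whose UTF-8 bytes are `[0x64, 0x65, 0x61, 0x64]`.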

ci: specify workspace directory for cache action correctly (#1663)

ci: install musl target via `rust-toolchain.toml` file (#1664)

Targets specified in the `rust-toolchain.toml` file are automatically installed by `rustup`. This avoids setup steps for other devs and also simplifies the CI setup.

To be able to compile native code to musl, we do need `musl-gcc`, which comes with the `musl-tools` package on Ubuntu.

feat(relay): connect to portal on startup (#1643)

With this PR, the relay can be configured with a WebSocket URL on startup. If given, it will attempt to connect to it and join the `relay` room with its `stamp_secret`. Once the `init` message is received, regular relay operation will begin.

jamilbk%feat/stub website in cloud (#1675)

* Remove `www/`
* Stub empty `website/` to silence Vercel. This shouldn't cause
conflicts when we merge `cloud` to `master`. Perhaps we want to start
working off `master` soon, and move the current tip of master to
`legacy`?

Use pnpm over yarn (#1678)

Did some research when picking a package manager for the website and
settled on `pnpm` for the following reasons:

- CLI-compatible with `npm`
- Typically faster than even `yarn` especially on Apple silicon
- Security: Pnpm uses a different dependency resolution algorithm and
different folder structure of node_modules that prevents illegal access
to packages by other packages.

I think I caught all the places, but I may be missing something, so if
this isn't a good idea we can revert back.

This PR also cleans up the actions workflows to remove dead code.

Use pnpm for asset setup too (#1681)

Add pnpm to runners (#1683)

Found another place where pnpm needs to be added.

Hotfix seeds and references (#1689)

connlib: moves it to the main firezone library

This brings connlib from its own separate repo into firezone's monorepo.

On top of bringing in connlib, we also add and unify the Dockerfile for all
Rust binaries and add a docker-compose that can run a headless client, a
relay and a gateway, which eventually will test the whole flow between a
client and a resource. For this to work we also incorporated some Elixir
scripts to generate portal tokens for those components.

Do not expire encoded Gateway/Relay tokens

Fix API error rendering

Render error when public key is reused

Fix stub module name

Remove outdated env files

rust: fix dockerfile for building multiple images in parallel (#1699)

When using `docker compose build`, or any other way of building Docker
images in parallel, the way the cache worked in the Rust Dockerfile made
the caches of different images overlap and corrupt each other. To fix
this behaviour, we add `locked`, which prevents multiple concurrent
writers to the same cache.

Return changeset on name suffix constraint error

docker: fix building for macos (#1700)

There are problems building the Docker images on macOS using musl due to
issues with ring, so we started using slim-debian with glibc for
development.

Authentication for the live app (#1674)

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

portal: Policies CRUD views (#1692)

@AndrewDryga ~~Was still hitting some redirect issues so I'll wait for
those to be resolved before continuing on building more views.~~ Edit:
After some sleep and coffee, I figured it out. Nice work on the sign in
form!

I went ahead and scoped existing dashboard links with `@account` and
fixed a dark mode issue -- you may want to cherry-pick those commits.
I'll add these to authenticated routes and integrate into what you have
so far.

As I was going through last night exploring your route approach I
thought of some edge cases; can discuss next week. I think the main one
that came to mind was that we probably want to differentiate between
login flows initiated directly in the browser (this is an admin logging
into the dashboard) vs login flows initiated from a client app (these
will terminate with a final redirect to respective `dest` whitelisted
URL). Maybe it makes sense to segregate these flows?

If a regular user tries login directly from the browser maybe we want to
show them something like "Please login from your Firezone application
instead" as they should only be able to initiate logins from a client
application. Or maybe there's simply no possibility to end up at the
final Android App Link or `firezone://` URI with a login initiated
directly from the browser?

portal: Status indicator badge (#1703)

Did some research on status page providers to manage incidents.
statuspage.io seems to be easy to use and cost-effective, fairly popular
and provides a good amount of flexibility to customize emails,
notifications, etc.

Super easy to set up and use but am not married to it if anyone feels
strongly about using another incident management service.

https://firezone.statuspage.io

<img width="235" alt="Screenshot 2023-06-27 at 8 07 29 AM"
src="https://github.com/firezone/firezone/assets/167144/8ad12b9b-7345-4a5d-bf43-c8af798d85f9">

Fix compilation warnings that are not fixed in merged PRs

Do not render ipv6 relay address if it's nil

CONTRIBUTING.md updates (#1704)

**Update CONTRIBUTING.md**

Why:

* The CONTRIBUTING.md doc seems to have fallen slightly out of date with
  how Firezone now works. This commit updates the doc to provide a
  quick start guide for getting all of the various Firezone components
  up and running as quickly as possible. The doc then links to the more
  specific `Elixir` and `Rust` README.md files in the respective
  directories to help developers who would like to contribute.

**Update docker-compose vault health check**

Why:

* The current Vault health check listed in the docker-compose file does
  not seem to be working when using `localhost` in the `wget` command.
  Updating the URL to use `127.0.0.1` seems to have fixed it.

---------

Signed-off-by: bmanifold <bmanifold@users.noreply.github.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Fix formatting issue

My editor failed here due to a bug: elixir-lsp/vscode-elixir-ls#345

connlib: Improve FFI bridges for Apple and Android (#1691)

This makes it possible to build the Apple/Android FFI bridges and
integrate them with their respective client apps.

---------

Signed-off-by: Francesca Lovebloom <franlovebloom@gmail.com>
Co-authored-by: Roopesh Chander <roop@roopc.net>

Fix/docker compose up (#1705)

This PR fixes `docker compose up`. The test client -> resource flow is
not working yet, but nothing errors at startup anymore.

This fixes:
* tokens (use the correct token for the client user agent we are using)
* randomize `name_suffix` at start up for connlib (we will eventually
allow options to set it manually)
* remove port ranges for relay (see firezone/corp#613)

fix(relay): ensure smoke test script fails on error (#1711)

Due to a silly bash mistake (I hate bash), the error from the gateway
binary wasn't actually propagated to the script. Thus, we did not notice
that it had been broken for a while.

Attempting to fix it turned up that we were double-hexing the relay
secret and using invalid passwords for the clients.

fix(connlib): format with `cargo fmt` (#1709)

Runs `cargo fmt` on the entire `rust/` directory. This somehow doesn't
seem to be enforced, I think that is because we changed the previous CI
to now only run for the `relay` crate.

I'd like to merge this first to avoid the diff and in a 2nd PR, we can
work on unifying CI again.

fix(relay): remove smoke test CI script (#1717)

Unfortunately, this doesn't seem to be stable. I don't really understand
why. Judging from the logs, the problem is not in the relay but somehow
the final UDP packet doesn't arrive at the `gateway` binary.

To not unnecessarily block other PRs, I am removing the check for now.

Add more websocat examples for connecting to a resource

Wait for client and gateway containers for api to become ready

Add docs section to see if everything is connected to the panel

Explicitly subscribe to id channels

Looks like for some reason the id/1 callback doesn't subscribe the channel process any more (only the socket itself), so we are doing that explicitly now.

Stub out client app directories in monorepo structure (#1716)

Stubs out the client app dirs and basic CI workflow for the client apps
in preparation to move them into this repository.

After this is merged @roop @pratikvelani you should be able to add the
client repos here.

chore: unify and optimize Rust CI (#1710)

- Instead of having two, very similar jobs, we run our fmt, clippy and
tests steps across all crates and operating systems.
- We remove the dependency of the android and apple builds on the tests
and thus get faster feedback.
- We force clippy to fail on any warning. This one is super important
IMO. Warnings in Rust are very useful and ignoring them can lead to bugs
(think "unused Result" etc).

Resolves #1714.

---------

Signed-off-by: Thomas Eizinger <thomas@eizinger.io>
Co-authored-by: Francesca Lovebloom <franlovebloom@gmail.com>

connlib: Connection mock (#1721)

Resolves firezone/corp#607

Setting the env var `CONNLIB_MOCK` when building through either
`build-rust.sh` or `gradle` will activate the `mock` feature.

Attempt to enable merge queue (#1713)

https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#merge_group

Feat/connlib full flow (#1722)

With this PR the full control-plane message flow is working.

Meaning that if you do:

```
docker compose up -d
docker compose exec -it client "ping 172.20.0.2" # will fix this IP later
```

Messages start flowing to the gateway. The gateway still does not
forward the messages correctly to the resource since masquerading is
still not working, although I suspect there might be an additional
problem. Will fix this in my next PR along with a README on how to test
this whole flow.

This PR also fixes how we send the stamp secret to the gateway from the
relay, but I still see some warnings in webrtc that I'm sure are due to
a mismatch between how webrtc-rs and the relay handle messages (the most
important being `bind() failed: unexpected response type`). I will take
a look at that and at a way to test that the flow works when:
1. hole-punching is available
2. it is not, and traffic goes through the relay

Right now the flow works without hole-punching or a relay, since the
gateway is in the same network in the docker compose.

Bump Elixir/OTP versions (#1730)

Bump versions in Dockerfile

Fix flaky tests

docs(relay): bring README.md up to date (#1718)

Drop invalid cache restore keys

Fix ubuntu 20.04 CI (#1734)

add a prefix key with host os to rust test job to prevent caching issues

CI: add a flow that tests client to resource ping (#1729)

This PR fixes a bunch of small things to allow a new flow to test
clients pinging a resource within docker compose.

Masquerade/Forwarding is enabled directly in the container for now, this
might change in the future.

Also added a README to be able to run this locally.

---------

Signed-off-by: Gabi <gabrielalejandro7@gmail.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

feat(relay): default portal URL (#1719)

Instead of having portal URL and token optional, we default the portal
URL and decide based on the presence of the token, whether we should
connect to the portal on startup. This allows the relay to be
used/tested standalone and keeps the number of config options and error
cases small.

We require the user to configure the full path of the websocket and thus
avoid the need for duplicating the connlib function. Given that most
users will never need to override this option, this seems like a good
trade-off.
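
The decision described above could look roughly like the following sketch. It is an illustration under assumptions: `DEFAULT_PORTAL_URL` holds a placeholder value, and `Mode` and `startup_mode` are hypothetical names, not the relay's actual types.

```rust
/// Placeholder default; the real default URL is not stated in this PR text.
const DEFAULT_PORTAL_URL: &str = "wss://portal.example.invalid/relay/websocket";

/// What the relay does on startup (hypothetical type).
#[derive(Debug, PartialEq)]
enum Mode {
    /// No token given: run without connecting to the portal.
    Standalone,
    /// Token given: connect to the (possibly overridden) portal URL.
    ConnectTo { url: String, token: String },
}

/// The URL always has a value; only the presence of the token
/// decides whether the relay connects to the portal.
fn startup_mode(portal_url: Option<String>, portal_token: Option<String>) -> Mode {
    let url = portal_url.unwrap_or_else(|| DEFAULT_PORTAL_URL.to_owned());
    match portal_token {
        Some(token) => Mode::ConnectTo { url, token },
        None => Mode::Standalone,
    }
}
```

With a single optional value driving the decision, the "URL set but no token" and "token set but no URL" error cases disappear.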

Resolves firezone/corp#614.

Feat/connlib handle error messages (#1735)

With this PR, the client handles error messages caused by the
gateway/relay, although rate limiting is still needed.

Waiting for #1729 to be merged.

portal: Stub out Settings views (#1702)

Adds Setting UI views based on the Balsamiq Wireframes. This should be
merged **after** #1679
<img width="1469" alt="Screenshot 2023-06-26 at 4 48 55 PM"
src="https://github.com/firezone/firezone/assets/167144/0994b12b-5d8d-48a6-bc8d-c9ba07d2403c">

<img width="1469" alt="Screenshot 2023-06-26 at 4 49 01 PM"
src="https://github.com/firezone/firezone/assets/167144/1d69a54d-2740-4ab0-819b-75a50a976285">
<img width="1616" alt="Screenshot 2023-06-29 at 12 29 26 AM"
src="https://github.com/firezone/firezone/assets/167144/94a8913f-93be-4502-b30e-c70f147dbe62">

<img width="1616" alt="Screenshot 2023-06-29 at 12 29 14 AM"
src="https://github.com/firezone/firezone/assets/167144/16dfc709-65b9-44fd-adad-c412dc1d44e6">

<img width="1616" alt="Screenshot 2023-06-29 at 2 36 43 PM"
src="https://github.com/firezone/firezone/assets/167144/3cddc4b3-7494-4710-953e-4d60108b9aa8">
<img width="1616" alt="Screenshot 2023-06-29 at 2 36 56 PM"
src="https://github.com/firezone/firezone/assets/167144/1f433239-1023-471d-916c-76c43f47835e">
<img width="1616" alt="Screenshot 2023-06-29 at 2 37 05 PM"
src="https://github.com/firezone/firezone/assets/167144/9cd4be23-02eb-4adf-902b-00c02cecd744">

Add android client to the repo (#1738)

- Add android client to the repo

---------

Signed-off-by: Pratik Velani <pratikvelani@gmail.com>
Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Bring in apple client into monorepo (#1737)

This PR brings in the apple client into the monorepo.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

feat(relay): use structured logging (#1741)

With this patch, the relay exposes a `--json` flag and a `JSON_LOG` env
variable that activate logs in JSON format, the way it is expected
by Google Cloud:
https://cloud.google.com/logging/docs/structured-logging

In addition, we make use of spans to record contextual information as
first-class variables that are available in the context of every
message. An example output here is:

```
{"time":"2023-07-06T19:54:42.643694430Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/main.rs","line":"156"},"severity":"INFO","message":"Seeding RNG from '0'"}
{"time":"2023-07-06T19:54:42.644408014Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/main.rs","line":"130"},"severity":"INFO","message":"Listening for incoming traffic on UDP port 3478"}
{"time":"2023-07-06T19:54:42.843247996Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"417"},"span":{"lifetime":"600","name":"allocate"},"spans":[{"sender":"127.0.0.1:46406","transaction_id":"0531a911a24d1e5297b94cb2","name":"client"},{"lifetime":"600","name":"allocate"}],"severity":"INFO","ip4RelayAddress":"127.0.0.1:65460","message":"Created new allocation"}
{"time":"2023-07-06T19:54:42.851623041Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"569"},"span":{"allocation":"AID-1","peer_address":"127.0.0.1:42314","requested_channel":"16384","name":"channel_bind"},"spans":[{"sender":"127.0.0.1:46406","transaction_id":"e99e07e482789cdc30bd2b50","name":"client"},{"allocation":"AID-1","peer_address":"127.0.0.1:42314","requested_channel":"16384","name":"channel_bind"}],"severity":"INFO","message":"Successfully bound channel"}
{"time":"2023-07-06T19:54:42.852889208Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"288"},"span":{"allocation_id":"AID-1","channel":16384,"recipient":"127.0.0.1:46406","sender":"127.0.0.1:42314","name":"peer"},"spans":[{"allocation_id":"AID-1","channel":16384,"recipient":"127.0.0.1:46406","sender":"127.0.0.1:42314","name":"peer"}],"severity":"DEBUG","message":"Relaying 32 bytes"}
{"time":"2023-07-06T19:54:42.854625857Z","target":"relay","logging.googleapis.com/sourceLocation":{"file":"relay/src/server.rs","line":"619"},"span":{"channel":"16384","recipient":"127.0.0.1:42314","name":"channel_data"},"spans":[{"sender":"127.0.0.1:46406","name":"client"},{"channel":"16384","recipient":"127.0.0.1:42314","name":"channel_data"}],"severity":"DEBUG","message":"Relaying 32 bytes"}
```

For some reason, the current `span` is always duplicated but I don't
think that is a big issue. When run using the regular log formatter, it
looks like this:

```
2023-07-06T20:02:33.939273Z  INFO relay: Seeding RNG from '0'
2023-07-06T20:02:33.940153Z  INFO relay: Listening for incoming traffic on UDP port 3478
2023-07-06T20:02:34.135801Z  INFO client{sender=127.0.0.1:33919 transaction_id="7092a2363377709cd18b9d98"}:allocate{lifetime=600}: relay: Created new allocation ip4_relay_address=127.0.0.1:65460
2023-07-06T20:02:34.144833Z  INFO client{sender=127.0.0.1:33919 transaction_id="4e1a18e58953242c92a075a3"}:channel_bind{requested_channel=16384 peer_address=127.0.0.1:47859 allocation="AID-1"}: relay: Successfully bound channel
2023-07-06T20:02:34.145501Z DEBUG peer{sender=127.0.0.1:47859 allocation_id=AID-1 recipient=127.0.0.1:33919 channel=16384}: relay: Relaying 32 bytes
2023-07-06T20:02:34.146863Z DEBUG client{sender=127.0.0.1:33919}:channel_data{channel=16384 recipient=127.0.0.1:47859}: relay: Relaying 32 bytes
```

This provides lots of contextual information in a DRY and easily
parse-able way.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Pass all required checks that weren't triggered in the PR (#1748)

Fixes #1747
Fixes #1746

Pass-checks workflow per subdir (#1749)

Fix cache for Docker buildx (#1750)

~~This is an attempt to fix the CI bug
[here](https://github.com/firezone/firezone/actions/runs/5491388141/jobs/10007864417#step:4:1638)
possibly introduced in
[d9eb2d1](d9eb2d18#diff-88bd94db0d5cfd5f0617b7c4ed48c0212597378ed7e28714c5d86c95999b4c7dR29)
and uncovered / exacerbated in Elixir 1.15~~

Edit: looks like this ended up being a couple of cache issues with GitHub
Actions:
1. The `elixir_api-container-build` cache would always overwrite the
`elixir_web-container-build` on subsequent builds of the same
`github.ref_name` (cache is scoped to branch name by default), leading
to the consistent error `Elixir.Web.Mailer.NoopAdapter does not exist`
whenever a branch was pushed to more than once.
2. The same thing happens with the `integration_test-basic-flow` job
because the `api` service gets built after the `web` service in
docker-compose.yml, overwriting its cache.

For some reason it seems the `APPLICATION_NAME` ARG is not busting the
Docker cache properly on GitHub Actions for Elixir container builds, so
the fix here was to [use
`scope=`](https://docs.docker.com/build/cache/backends/gha/#scope) to
segregate the cache layers between builds of the same branch.

Move NoopAdapter to Domain app (#1756)

Workaround for this:

elixir-lang/elixir#12777

Feat/expire peers (#1739)

This PR takes care of expiring peer connections from the gateway
side.

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

fix(relay): reuse `delete_allocation` function (#1743)

Previously, we would access the state around allocations from different
places. This actually led to a minor memory leak where we wouldn't clean
up the `allocations_by_port` table. We refactor the code slightly to
avoid this.
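
A simplified sketch of the idea, assuming two lookup tables that must stay in sync. Only `delete_allocation` and `allocations_by_port` are names from the description; the other types and fields are hypothetical.

```rust
use std::collections::HashMap;

type AllocationId = u64;

/// Hypothetical, stripped-down allocation record.
struct Allocation {
    port: u16,
}

#[derive(Default)]
struct Server {
    allocations: HashMap<AllocationId, Allocation>,
    allocations_by_port: HashMap<u16, AllocationId>,
}

impl Server {
    /// Registers an allocation in both tables.
    fn create_allocation(&mut self, id: AllocationId, port: u16) {
        self.allocations.insert(id, Allocation { port });
        self.allocations_by_port.insert(port, id);
    }

    /// The single place that removes an allocation, so the port index
    /// cannot be forgotten. Returns `false` if the id was unknown.
    fn delete_allocation(&mut self, id: AllocationId) -> bool {
        match self.allocations.remove(&id) {
            Some(allocation) => {
                self.allocations_by_port.remove(&allocation.port);
                true
            }
            None => false,
        }
    }
}
```

Routing every removal (expiry, explicit delete, refresh with zero lifetime) through one function is what prevents a stale entry from lingering in the secondary table.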

---------

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

connlib: Use latest `swift-bridge` release (#1753)

A new version of `swift-bridge` released today, so we don't need it to
be a git dependency anymore.

headless & gateway: impl callbacks (#1757)

After rebasing over #1744, CI should pass.

connlib: Hook up callbacks (#1744)

Co-authored-by: Jamil <jamilbk@users.noreply.github.com>

Add slack notification for failed deployments

Fix flaky test

Fix health checks path
@lukaszsamson
Collaborator

The original issue is fixed, and now on restart the server will call `workspace/configuration` to get the configuration. There is still a bug in VSCode: after a restart, the server no longer receives notifications about configuration changes (#368).

@lukaszsamson
Collaborator

BTW this is a recent breaking change in VSCode and it affects other language servers (e.g. haskell/vscode-haskell#920)
