Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable server use as a library #11

Merged
merged 1 commit into from
Mar 27, 2022

Conversation

joonas-fi
Copy link
Contributor

@joonas-fi joonas-fi commented Mar 1, 2022

Meta

Issue: #12

Rebase of older PR: #3 (opened new PR because I wanted to use a branch in my fork instead of main, and I don't think you can change branch in an existing PR)

Example user application

This works:

package main

import (
	"context"
	"log"
	"net"
	"os/signal"
	"syscall"

	"github.com/gokrazy/rsync/rsyncd"
)

func main() {
	ctx, cancel := signal.NotifyContext(context.Background(), syscall.SIGINT, syscall.SIGTERM)
	defer cancel()

	if err := logic(ctx); err != nil {
		log.Fatal(err)
	}
}

func logic(ctx context.Context) error {
	listener, err := net.Listen("tcp", "0.0.0.0:666")
	if err != nil {
		return err
	}

	rsyncServer, err := rsyncd.NewServer(rsyncd.Module{
		Name: "default",
		Path: "/tmp/rsyncd",
	})
	if err != nil {
		return err
	}

	if err := rsyncServer.Serve(ctx, listener); err != nil {
		return err
	}

	log.Println("gracefully exiting")

	return nil
}

Notes on this PR

  • The PR might be easiest to be reviewed by just reading one commit at a time
  • I changed a few log messages to be more user friendly, but those commits can be easily removed if you don't agree with them
  • Tests pass, at least on my machine 😆
  • I did a blanket internal -> pkg, which might not be a good idea (I wanted an easy change to get myself started). But the "user code" needs at least config + rsync packages. I can add a commit that makes not-necessary-to-be-public packages back to internal.
  • The most questionable change I did was change the module map concept to a list. It brought quite a few changes. But I think if the rsyncd internally wants a map, it could take a simple list of modules and internally transform it into a map.
    • Despite this bringing many changes, I think end result is simpler
    • Configuration doesn't have to compute the map from list of modules
  • rsyncd package forcefully logs to stderr. I would like to have Logger *log.Logger field in rsyncd.Server which NewServer() maybe sets to log.Default() by default (preserving the current behaviour) or has to be given explicitly (have not thought of this yet), so user can give discarding logger to make the library quiet. But that could be a new PR, there is quite a bunch of stuff here already.

General notes

With my fresh eyes on this codebase, a few notes:

  • "maincmd" as a concept was confusing to me. I knew there is "client" and "server". I was thinking "which one main refers to?".
  • There is name "receiver" which currently means the "client" portion. But that name might not be long-lived if both the client and the server both gain send/receive capabilities. Would name change to client/server remove these ambiquities? Something like:
    • maincmd = servermaincmd
    • receivermaincmd = clientmaincmd
    • rsyncd could also be renamed to rsyncserver, but that's probably not important, the "d" already communicates the server portion..

@@ -20,21 +20,24 @@ type Module struct {
ACL []string `toml:"acl"`
}

func NewModule(name string, path string, acl []string) Module {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is needed for the client code that uses this as a library.

Since this is somewhat core concept of the serving part, I was left wondering if the module type should belong to root level package in this repo, or actually in rsyncd package, but I don't know. I wrote my question here mainly to flag it for you if you have any thoughts. Here is probably fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generic name of this package brought a problem the very first time I imported into my concrete project. I had to rename the import:

rsyncdconfig "github.com/gokrazy/rsync/pkg/config"

because my package already had declared config identifier. Should the package be renamed to rsyncdconfig, or do you think having the user rename the import is the better way?

cfg.ModuleMap = make(map[string]Module)
for _, m := range cfg.Modules {
cfg.ModuleMap[m.Name] = m
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One nice place of improvement is here: no need to compute the module map. One less place where data can be out of sync (think somebody later changes the Modules fields but does not realize ModuleMap is actually the authoritative one.

Same for ModuleMap key (being the name). If someone changes the Name in the Module structure, they've to remember to change the map key too.

pkg/maincmd/maincmd.go Outdated Show resolved Hide resolved
pkg/maincmd/maincmd.go Outdated Show resolved Hide resolved
@@ -50,7 +50,7 @@ jobs:

- name: run tests (macos)
if: matrix.os == 'macos-latest'
run: go test -v ./internal/... && go test -c && sudo ./rsync.test -test.v
run: go test -v ./pkg/... && go test -c && sudo ./rsync.test -test.v
Copy link
Contributor Author

@joonas-fi joonas-fi Mar 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub actions workflow now pass in my fork

Whether this is an appropriate fix depends on whether the blanket internal -> pkg rename is ok for now

@stapelberg
Copy link
Contributor

Thanks for sending this!

The PR might be easiest to be reviewed by just reading one commit at a time

Actually, it might be easiest to file a new issue and then send small individual PRs for each logical change, with each PR referencing the issue so that we have a good paper trail :)

That way, we can merge things piece by piece instead of having to get everything ready at once.

I changed a few log messages to be more user friendly, but those commits can be easily removed if you don't agree with them

Thanks, the log message changes are fine. These would be good to send as a separate PR, for example.

I did a blanket internal -> pkg, which might not be a good idea (I wanted an easy change to get myself started). But the "user code" needs at least config + rsync packages. I can add a commit that makes not-necessary-to-be-public packages back to internal.

I don’t like having pkg as part of the import path at all, so we’d instead need to move packages from github.com/gokrazy/rsync/internal to github.com/gokrazy/rsync (no pkg subdirectory).

But, as you already wrote, this blanket change is probably going too far. I think we should carefully consider each type we export, even if it’s more work.

The most questionable change I did was change the module map concept to a list. It brought quite a few changes. But I think if the rsyncd internally wants a map, it could take a simple list of modules and internally transform it into a map.
Despite this bringing many changes, I think end result is simpler

Sounds good! Can you send this as a separate PR, too, please?

rsyncd package forcefully logs to stderr. I would like to have Logger *log.Logger field in rsyncd.Server which NewServer() maybe sets to log.Default() by default (preserving the current behaviour) or has to be given explicitly (have not thought of this yet), so user can give discarding logger to make the library quiet. But that could be a new PR, there is quite a bunch of stuff here already.

Sounds good as well!

"maincmd" as a concept was confusing to me. I knew there is "client" and "server". I was thinking "which one main refers to?".
There is name "receiver" which currently means the "client" portion. But that name might not be long-lived if both the client and the server both gain send/receive capabilities. Would name change to client/server remove these ambiquities?

Ah, this is quite a confusing situation when talking about rsync.

The rsync protocol can run either in “sender” or in “receiver” mode.

However, both client and server can fulfill either role! It depends on the flags with which they were started.

For example, you could have a client connect to a server and then act as a sender to upload a backup of your PC (client) to your network storage (server). Or, you could have a client connect to a server and then act as a receiver to download a Linux distribution’s package repository.

I agree that currently, the terminology used in the repository is not entirely clear. But, the desired state is that the terms sender, receiver, client and server are all used, and refer to different, orthogonal concepts (rsync protocol role vs. network connection role).

The maincmd package fulfills the same role as the upstream rsync implementation: one binary which can run in all the different modes.

@joonas-fi
Copy link
Contributor Author

Thanks for the great feedback!

Thanks for explaining the rsync sender/receiver/client/server difficulty. Yeah that was new info to me, I stll need to digest it a bit lol 😄
But yeah maincmd might not be misleading as a name then, just a natural rsync all-in-one concept then, if I understood properly. 🙂

I've extracted different logical changes to own PRs and created an issue to serve as the paper trail for the high-level work.

Still TODO here: internal -> pkg rename.

In my project's I've been following the pkg/ approach. I however appreciate the fact that this is not my project. :)

How should we continue with the packages that need to be public? rsyncd & config have been identified. Would this work:

  • move <root>/internal/rsyncd -> <root>/rsyncd
  • move <root>/internal/config -> <root>/config

?

One thing I'm hesitant about the naming, since config currently seems to mean "rsyncd config", will there ever be "rsync client config"? Or should they all live in the same package, so "config" suffices for them both?

@stapelberg
Copy link
Contributor

Thanks for sending the PRs!

Thanks for explaining the rsync sender/receiver/client/server difficulty. Yeah that was new info to me, I stll need to digest it a bit lol. But yeah maincmd might not be misleading as a name then, just a natural rsync all-in-one concept then, if I understood properly.

You’re welcome! Maybe you could add a package-level comment that explains this while you’re at it? Otherwise, I can add one later.

Still TODO here: internal -> pkg rename.

In my project's I've been following the pkg/ approach. I however appreciate the fact that this is not my project. :)

I don’t have strong opinions on this, but found the goal of keeping your domain types in the top-level attractive. E.g. rsync.SumHead reads very nicely, and the import path github.com/gokrazy/rsync is short and descriptive. When using this approach, it seems natural to me to use cmd for binaries and internal for internal code. Everything else is public.

I found golang-standards/project-layout#10 to be a good read in case you haven’t seen that yet.

How should we continue with the packages that need to be public? rsyncd & config have been identified. Would this work:

  • move <root>/internal/rsyncd -> <root>/rsyncd

Yes

  • move <root>/internal/config -> <root>/config
    ?

One thing I'm hesitant about the naming, since config currently seems to mean "rsyncd config", will there ever be "rsync client config"? Or should they all live in the same package, so "config" suffices for them both?

Yeah, let’s rename it to rsyncdconfig, as that’s what it is. There might well be a different config at a later point, which should probably be similarly named (not config, which is too generic).

@joonas-fi
Copy link
Contributor Author

On pkg/

I found golang-standards/project-layout#10 to be a good read in case you haven’t seen that yet.

I haven't read that yet, thank you! 👍 Many good points for and against. My biggest "cmd/* internal/* pkg/* are good" observation is that it very clearly separates Go code from "other" files, such as documentation, "misc files" (be it project logo etc.) but most importantly, separate subsystems written in other languages. I have repos where I have TypeScript-based frontend for a Go backend, so I very much like keeping the things in separate hierarchies. :) I appreciate that Go-only projects don't need this, but I prefer consistency (Go-only projects don't look different to Go + frontend language projects).

Of course one could have different repositories for these subsystems, but monorepo vs. multi-repo is a different topic and not without obvious caveats: added complexity, atomic commits not possible (or as easy) as in a monorepo etc.. But I'm going off-topic now..

It's good to get different perspective! I need to continue evaluating my stance on pkg/ :)

Open questions

I now moved internal/rsyncd -> rsyncd and internal/config -> rsyncdconfig.

Now looking at it, I'm not happy with it. Thoughts:

  • Should rsyncdconfig actually be under rsyncd/rsyncdconfig, as it's conceptually purely a concept under rsyncd?
  • Also related to that question, another question is if it actually needs to be a separate package. I.e. what if config.go was just part of the rsyncd package itself? If you look at the use example config.NewModule(), the rsyncd.Server is not ever usable without first importing the config package.
  • Also there's now the problem of rsyncdconfig config file having listeners declared, but the listener concepts HTTPMonitoring & AnonSSH are not handled by the package rsyncd, even though the new package name rsyncdconfig implies so.
    • Should we rename rsyncdconfig -> maincmdconfig to signify that that is what it configures?
    • The above idea has a caveat as well: in library use it looks weird to use rsyncd.NewServer() and give it modules from maincmdconfig package
    • Is it just best to keep calling it rsyncdconfig and accept the fact there are concepts the rsyncd package doesn't actually use

internal/maincmd/maincmd.go Outdated Show resolved Hide resolved
@joonas-fi
Copy link
Contributor Author

I clearly have no idea what I'm doing with Git. I got merge conflicts where I didn't think there should be, and had to do an additional merge. I can clean up the history later when this looks ready for merging. Perhaps best to just look at overall changes, not the noisy commits for now..

rsyncdconfig/config.go Outdated Show resolved Hide resolved
internal/maincmd/maincmd.go Outdated Show resolved Hide resolved
internal/maincmd/maincmd.go Outdated Show resolved Hide resolved
@stapelberg
Copy link
Contributor

I haven't read that yet, thank you! +1 Many good points for and against. My biggest "cmd/* internal/* pkg/* are good" observation is that it very clearly separates Go code from "other" files, such as documentation, "misc files" (be it project logo etc.) but most importantly, separate subsystems written in other languages.

For this, I have come to prefer directories whose name starts with an _, e.g. in distri I’m using a _build subdirectory. The Go toolchain entirely ignores these :)

Also related to that question, another question is if it actually needs to be a separate package. I.e. what if config.go was just part of the rsyncd package itself? If you look at the use example config.NewModule(), the rsyncd.Server is not ever usable without first importing the config package.

Yeah, I think making it part of the rsyncd package is a good idea indeed.

I clearly have no idea what I'm doing with Git. I got merge conflicts where I didn't think there should be, and had to do an additional merge. I can clean up the history later when this looks ready for merging. Perhaps best to just look at overall changes, not the noisy commits for now..

Have you tried rebasing your changes? Alternatively just copy the files, throw away history and commit fresh :)

Listeners []Listener `toml:"listener"`
Modules []Module `toml:"module"`
Listeners []Listener `toml:"listener"`
Modules []rsyncd.Module `toml:"module"`
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per our discussion, no need to expose this config package if Module is defined by rsyncd package itself.

Also related: we discussed earlier renaming config -> rsyncdconfig. I reverted the rename and left it out from this PR, since the package is not needed as public.

If you want, I can send another PR renaming internal/config -> internal/rsyncdconfig. But it wasn't necessary here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have time for the internal/configinternal/rsyncdconfig change as well, I’ll gladly take that as a separate pull request later :)

rsyncd/rsyncd.go Outdated
ACL []string `toml:"acl"`
}

func NewServer(modules ...Module) (*Server, error) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can now fail due to added validation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I commented about this in earlier comment (you wanted to remove NewModule() for extensibility, I think that makes validation more important)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I'm not super sold on ...Module instead of []Module, as that rules out future option-arguments.

The original rationale was that some rsyncd library callers may have hardcoded modules so it just looks nicer when you don't need a slice.

(Maybe this is non-issue if you don't have issues breaking backwards compat before v1 release.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Let’s use []Module here, also because a zero-module server is not generally useful, so having rsyncd.NewServer() fail at compile-time is actually desired.

@joonas-fi
Copy link
Contributor Author

Sorry it took me so long to get back to this. This has taken more time than I had allocated (not your fault!), so I had to find some more free time :)

@joonas-fi
Copy link
Contributor Author

(The PR message's example user code is up-to-date now)

Copy link
Contributor

@stapelberg stapelberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! Added a few more comments but I think we can merge this soon :)

rsyncd/rsyncd.go Show resolved Hide resolved
rsyncd/rsyncd.go Outdated
return errors.New("module has no name")
}
if mod.Path == "" {
return fmt.Errorf("module '%s' has empty path", mod.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use %q instead of '%s' please

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol you just taught me what %q means, never used or knew of it before 😆

rsyncd/rsyncd.go Outdated
ACL []string `toml:"acl"`
}

func NewServer(modules ...Module) (*Server, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. Let’s use []Module here, also because a zero-module server is not generally useful, so having rsyncd.NewServer() fail at compile-time is actually desired.

Listeners []Listener `toml:"listener"`
Modules []Module `toml:"module"`
Listeners []Listener `toml:"listener"`
Modules []rsyncd.Module `toml:"module"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have time for the internal/configinternal/rsyncdconfig change as well, I’ll gladly take that as a separate pull request later :)

@joonas-fi
Copy link
Contributor Author

I submitted the fixes!

For some reason I can't respond to all of your comments, so here goes:

If you have time for the internal/config → internal/rsyncdconfig change as well, I’ll gladly take that as a separate pull request later :)

Yeah I will send it after this has been merged!

Fair point. Let’s use []Module here, also because a zero-module server is not generally useful, so having rsyncd.NewServer() fail at compile-time is actually desired.

Sidenote: you can also use func NewServer(module0 Module, modules ...Modules) to enforce "at least 1" at compile time, but that's pretty ugly 😆 And that complicates the case when you simply have a slice, you've to give it as NewServer(modules[0], modules[1:]...)

rsyncd/rsyncd.go Show resolved Hide resolved
@stapelberg stapelberg merged commit c271006 into gokrazy:main Mar 27, 2022
@stapelberg
Copy link
Contributor

Thanks a lot for contributing this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants