cli2: add sketch of plugin loading #1810

Merged: 5 commits, May 29, 2020
12 changes: 12 additions & 0 deletions src/cli2/bundledPlugins.js
@@ -0,0 +1,12 @@
// @flow

import type {CliPlugin} from "./cliPlugin";
import {GithubCliPlugin} from "../plugins/github/cliPlugin";

/**
 * Returns an object mapping plugin IDs to CLI plugins; keys are
 * owner/name pairs like `sourcecred/github`.
 */
export function bundledPlugins(): {[pluginId: string]: CliPlugin} {
  return {"sourcecred/github": new GithubCliPlugin()};
}
17 changes: 17 additions & 0 deletions src/cli2/cliPlugin.js
@@ -0,0 +1,17 @@
// @flow

import type {PluginDeclaration} from "../analysis/pluginDeclaration";
import type {WeightedGraph} from "../core/weightedGraph";
import type {ReferenceDetector} from "../core/references/referenceDetector";

export interface CliPlugin {
  declaration(): PluginDeclaration;
  load(PluginDirectoryContext): Promise<void>;
  graph(PluginDirectoryContext, ReferenceDetector): Promise<WeightedGraph>;
  referenceDetector(PluginDirectoryContext): Promise<ReferenceDetector>;
Comment on lines +9 to +11
Contributor:
Let's provide a TaskReporter to each of these methods.
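
A sketch of that suggestion (assuming the existing TaskReporter interface from src/util/taskReporter.js; this shows the proposed change, not the code in this diff):

import type {TaskReporter} from "../util/taskReporter";

export interface CliPlugin {
  declaration(): PluginDeclaration;
  load(PluginDirectoryContext, TaskReporter): Promise<void>;
  graph(
    PluginDirectoryContext,
    ReferenceDetector,
    TaskReporter
  ): Promise<WeightedGraph>;
  referenceDetector(
    PluginDirectoryContext,
    TaskReporter
  ): Promise<ReferenceDetector>;
}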

}

export interface PluginDirectoryContext {
  configDirectory(): string;
  cacheDirectory(): string;
}
61 changes: 61 additions & 0 deletions src/cli2/common.js
@@ -0,0 +1,61 @@
// @flow

import {join as pathJoin} from "path";
import fs from "fs-extra";

import type {PluginDirectoryContext} from "./cliPlugin";
import {parse as parseConfig, type InstanceConfig} from "./instanceConfig";

export async function loadInstanceConfig(
  baseDir: string
): Promise<InstanceConfig> {
  const projectFilePath = pathJoin(baseDir, "sourcecred.json");
  const contents = await fs.readFile(projectFilePath);
  return Promise.resolve(parseConfig(JSON.parse(contents)));
Contributor:
Nit: in an async function Promise.resolve is redundant.

Member Author:
Yes, but only because of a JavaScript design flaw that has really
aged like milk with the prevalence of TypeScript and Flow. I prefer to
write the code in the sensible, principled style, even though it’s not
strictly required by the language.

Contributor:
Certainly won't argue with you that Promises have a bunch of design flaws. I've built several client projects with https://github.com/fluture-js/Fluture instead of Promises, and there are a lot of great properties there that I sorely miss now that I've gone back to Promises.

On the other hand, the language integration definitely yields productivity benefits over going against the unfortunate standard, and it reduces the learning curve.

In this case, return Promise.resolve() is a pattern that relies on one of those Promise design choices: silently unwrapping Promises at every opportunity.

In other words:

Promise.resolve(Promise.resolve(123))
> Promise { <state>: "fulfilled", <value>: 123 }

We get a Promise<number> not a Promise<Promise<number>>.
A surprising and undesirable property if you ask me.

My taste preference here is to avoid that surprise, so I generally encourage others not to return Promise.resolve().

Anyhow, if you feel like there's a benefit to going the other way, curious to hear about it. But for this PR, I'm fine with either :]

Contributor:
I prefer the non-redundant style, although I haven't invested the effort to really understand why William avoids it. Since we don't have a lint rule to automatically enforce either, I say it's author's pick on which to use.

Member Author:
That resolve(resolve(123)) is equivalent to resolve(123) is
precisely the design flaw that I’m trying to avoid. If the standard
didn’t have that flaw, then you would have to return resolve(123)
here, so returning 123 is surfacing the flaw, not hiding it. When
I read async function foo() { return 123; }, which shouldn’t compile,
I have to remind myself that it works because of this flaw.

I’m happy to discuss in more detail, and I can provide a specific
example of code that this design flaw makes it impossible to write
cleanly (forcing ugly contortions in the form of boxing) as well as
point to some of the complexity that the flaw forces in real code. I’ll
keep it as is for now, and am fine with it being author’s pick, just as
so many other things are.
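
To illustrate the boxing contortion alluded to here (a hypothetical example, not code from this PR): because resolve flattens, a promise can never resolve directly to another promise, so the inner promise must be wrapped in a non-thenable box:

type Box<T> = {|+value: T|};

// Without the box, the outer promise would silently adopt the state of
// `inner`, and we could never observe `inner` itself as a value.
function resolveToPromise<T>(inner: Promise<T>): Promise<Box<Promise<T>>> {
  return Promise.resolve({value: inner});
}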

}

// Make a directory, if it doesn't exist.
function mkdirx(path: string) {
  try {
    fs.mkdirSync(path);
  } catch (e) {
    if (e.code !== "EEXIST") {
      throw e;
    }
  }
}
Comment on lines +17 to +26
Contributor:
Seems like fs-extra's mkdirp() would be easier to use here.

Member Author:
That does something else. We don’t want to make all the transitive
parent directories.
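
Concretely, the difference (an illustrative sketch, not part of the diff):

// Given an existing, empty baseDir:
mkdirx(pathJoin(baseDir, "cache")); // creates exactly one level
mkdirx(pathJoin(baseDir, "cache")); // no-op: EEXIST is swallowed
mkdirx(pathJoin(baseDir, "no-such-dir", "cache")); // throws ENOENT
// fs.mkdirpSync would silently create "no-such-dir" as well, masking a
// mistyped or misconfigured base directory.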


export function makePluginDir(
  baseDir: string,
  prefix: $ReadOnlyArray<string>,
  pluginId: string
): string {
  const idParts = pluginId.split("/");
  if (idParts.length !== 2) {
    throw new Error(`Bad plugin name: ${pluginId}`);
  }
  const [pluginOwner, pluginName] = idParts;
  const pathComponents = [...prefix, pluginOwner, pluginName];
  let path = baseDir;
  for (const pc of pathComponents) {
    path = pathJoin(path, pc);
    mkdirx(path);
  }
Comment on lines +38 to +43
Contributor:
Seems like fs-extra's mkdirp() would be easier to use here.

Member Author:
As above.

  return path;
}

export function pluginDirectoryContext(
  baseDir: string,
  pluginName: string
): PluginDirectoryContext {
  const cacheDir = makePluginDir(baseDir, ["cache"], pluginName);
  const configDir = makePluginDir(baseDir, ["config"], pluginName);
  return {
    configDirectory() {
      return configDir;
    },
    cacheDirectory() {
      return cacheDir;
    },
  };
}
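
For example, given the code above, the resulting layout for the sourcecred/github plugin would be (paths illustrative):

const ctx = pluginDirectoryContext("/path/to/instance", "sourcecred/github");
ctx.configDirectory(); // "/path/to/instance/config/sourcecred/github"
ctx.cacheDirectory(); // "/path/to/instance/cache/sourcecred/github"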
55 changes: 55 additions & 0 deletions src/cli2/instanceConfig.js
@@ -0,0 +1,55 @@
// @flow

import type {CliPlugin} from "./cliPlugin";
import {bundledPlugins as getAllBundledPlugins} from "./bundledPlugins";

type PluginName = string;

export type InstanceConfig = {|
  +bundledPlugins: Map<PluginName, CliPlugin>,
|};

export type RawInstanceConfig = {|
  +bundledPlugins: $ReadOnlyArray<BundledPluginSpec>,
|};

// Plugin identifier, like `sourcecred/identity`. Version number is
// implicit from the SourceCred version. This is a stopgap until we have
// a plugin system that admits external, dynamically loaded
// dependencies.
export type BundledPluginSpec = string;

type JsonObject =
  | string
  | number
  | boolean
  | null
  | JsonObject[]
  | {[string]: JsonObject};
Comment on lines +22 to +28
Contributor:
Having read your linked: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

This is pretty nice. I've previously been irked by the `any` that parsing JSON returns. I have tested https://gajus.github.io/flow-runtime for expected server response models and found some issues doing that.

Not sure if I like this library too much, due to fairly opaque babel magic. On the other hand, not having to declare the same type twice (once as a static Flow type, again as a parser) is really handy.

Have you tried any generators? Thoughts on them?

Member Author:
We (@decentralion and I) chatted about this a bit. We chose to write the
correct-style parser here (perhaps because I was driving ;-) ) because
it really does give us much stronger guarantees than the any-downcast,
though we definitely aren’t big fans of the verbosity. I’m not super
inclined to reflect Flow types into parsers, but I’m up for abstracting
this with an attoparsec-like or optics-based approach if we think that
that would be helpful. But later, not now; let’s find out how we
actually use these first.


export function parse(raw: JsonObject): InstanceConfig {
  if (raw == null || typeof raw !== "object" || Array.isArray(raw)) {
    throw new Error("bad config: " + JSON.stringify(raw));
  }
  const {bundledPlugins: rawBundledPlugins} = raw;
  if (!Array.isArray(rawBundledPlugins)) {
    console.warn(JSON.stringify(raw));
    throw new Error(
      "bad bundled plugins: " + JSON.stringify(rawBundledPlugins)
    );
  }
  const allBundledPlugins = getAllBundledPlugins();
  const bundledPlugins = new Map();
  for (const name of rawBundledPlugins) {
    if (typeof name !== "string") {
      throw new Error("bad bundled plugin: " + JSON.stringify(name));
    }
    const plugin = allBundledPlugins[name];
    if (plugin == null) {
      throw new Error("bad bundled plugin: " + JSON.stringify(name));
    }
    bundledPlugins.set(name, plugin);
  }
  const result = {bundledPlugins};
  return result;
}
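
Given this parser, a minimal sourcecred.json that loadInstanceConfig accepts looks like this (the plugin ID names the bundled GitHub plugin):

{
  "bundledPlugins": ["sourcecred/github"]
}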
24 changes: 24 additions & 0 deletions src/cli2/load.js
@@ -0,0 +1,24 @@
// @flow

import type {Command} from "./command";
import {loadInstanceConfig, pluginDirectoryContext} from "./common";

function die(std, message) {
  std.err("fatal: " + message);
  return 1;
}

const loadCommand: Command = async (args, std) => {
  if (args.length !== 0) {
    return die(std, "usage: sourcecred load");
  }
  const baseDir = process.cwd();
  const config = await loadInstanceConfig(baseDir);
  for (const [name, plugin] of config.bundledPlugins) {
    const dirContext = pluginDirectoryContext(baseDir, name);
    await plugin.load(dirContext);
Contributor:
Let's have the task reporter report that load started for the plugin--that way we have consistent measurements for all the plugins, and the plugin can use the provided task reporter to give more detailed information (e.g. timing on sub-portions)
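
A sketch of that wiring (assuming a LoggingTaskReporter like the one in src/util/taskReporter.js, plus the TaskReporter-accepting load signature suggested earlier; this is the proposal, not the diff):

const taskReporter = new LoggingTaskReporter();
for (const [name, plugin] of config.bundledPlugins) {
  const task = `loading ${name}`;
  taskReporter.start(task);
  const dirContext = pluginDirectoryContext(baseDir, name);
  await plugin.load(dirContext, taskReporter);
  taskReporter.finish(task);
}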

  }
  return 0;
};
Comment on lines +11 to +22
Contributor:
Want to suggest that there's some complecting going on here.

One part is the transport implementation we just-so-happen to be using (CLI) and the other is a use-case.

Since we're taking a greenfield approach to the CLI now, it's a distinction I'd love to see at some point. Doing so would allow exposing a first-class JavaScript API, similar to the likes of Babel and webpack, while keeping the CLI transport implementation simple.

In this case the distinction would look like:

const loadCommand: Command = async (args, std) => {
  // Positional arguments, transport detail.
  if (args.length !== 0) {
    return die(std, "usage: sourcecred load");
  }

  // Environment data extraction, transport detail.
  const baseDir = process.cwd();

  // Call the use-case.
  await loadUseCase(baseDir);

  // Exit codes, transport detail.
  return 0;
};

// Suitable to expose as a JavaScript API.
async function loadUseCase(baseDir: string): Promise<void> {
  const config = await loadInstanceConfig(baseDir);
  for (const [name, plugin] of config.bundledPlugins) {
    const dirContext = pluginDirectoryContext(baseDir, name);
    await plugin.load(dirContext);
  }
}

Contributor:
Based on the dev call today: if we do want to run a local daemon that we can send commands to over HTTP, I think this structure will help out.

IPFS and Docker are examples that invested in this structure, as they expose their commands through multiple transports (library, unix sockets, and HTTP).

Contributor:
I agree with Beanow that we'll want to expose a functional style for implementing this logic that isn't hard-wired into the CLI. I don't need that to happen in this commit--I think it will be easy to factor out in the future when we need it. Though it could also be nice to have. I approve either way.

Member Author:
Yes, I of course also agree that it’d be great to have a JS API that
backs the CLI and is also exposed. This is basically what I describe at
the end of #945. But I don’t think that defining such an API should
block developing this CLI.


export default loadCommand;
15 changes: 13 additions & 2 deletions src/cli2/sourcecred.js
@@ -2,9 +2,20 @@

import type {Command} from "./command";

+import load from "./load";
+
const sourcecred: Command = async (args, std) => {
-  std.err("SourceCred CLI v2 not yet implemented");
-  return 1;
+  if (args.length === 0) {
+    std.err("fatal: specify a command");
+    return 1;
+  }
+  switch (args[0]) {
+    case "load":
+      return load(args.slice(1), std);
+    default:
+      std.err("fatal: unknown command: " + JSON.stringify(args[0]));
+      return 1;
+  }
};

export default sourcecred;
108 changes: 108 additions & 0 deletions src/plugins/github/cliPlugin.js
@@ -0,0 +1,108 @@
// @flow

import Database from "better-sqlite3";
import fs from "fs-extra";
import {join as pathJoin} from "path";

import fetchGithubRepo, {fetchGithubRepoFromCache} from "./fetchGithubRepo";
import type {CacheProvider} from "../../backend/cache";
import type {CliPlugin, PluginDirectoryContext} from "../../cli2/cliPlugin";
import type {PluginDeclaration} from "../../analysis/pluginDeclaration";
import type {ReferenceDetector} from "../../core/references/referenceDetector";
import type {WeightedGraph} from "../../core/weightedGraph";
import {Graph} from "../../core/graph";
import {RelationalView} from "./relationalView";
import {createGraph} from "./createGraph";
import {declaration} from "./declaration";
import {fromRelationalViews as referenceDetectorFromRelationalViews} from "./referenceDetector";
import {parse as parseConfig, type GithubConfig} from "./config";
import {validateToken, type GithubToken} from "./token";
import {weightsForDeclaration} from "../../analysis/pluginDeclaration";

const TOKEN_ENV_VAR_NAME = "SOURCECRED_GITHUB_TOKEN";

async function loadConfig(
  dirContext: PluginDirectoryContext
): Promise<GithubConfig> {
  const dirname = dirContext.configDirectory();
  const path = pathJoin(dirname, "config.json");
  const contents = await fs.readFile(path);
  return Promise.resolve(parseConfig(JSON.parse(contents)));
}

// Shim to interface with `fetchGithubRepo`; TODO: refactor that to just
// take a directory.
class CacheProviderImpl implements CacheProvider {
Comment on lines +33 to +35
Contributor:
Actually, I'm a huge fan of using these interfaces. It decouples the rest of the code from the implementation details of using filesystems. I'd recommend applying the same idea to the configuration and tokens too, with something like a ConfigProvider and a SecretsProvider.

On the flipside, I would rather not see fs.readFile or process.env in a use-case.

To give a common example of where this practice pays off: many applications use ENV for secrets. However, when using Docker Swarm, ENV is not suitable, as it will be passed around in plaintext. Instead, one should use "Docker Secrets", which most applications support by accepting a *_FILE suffix on a secret's name and reading the secret from that file path.

With a central SecretsProvider implementation, adding this would be easy and something the core can implement once. With process.env scattered across the plugins, every plugin maintainer has to figure this out.
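
A hypothetical sketch of such a SecretsProvider (names and shapes are illustrative, not from this PR):

import fs from "fs-extra";

export interface SecretsProvider {
  getSecret(name: string): Promise<string>;
}

// A core-provided implementation could honor the Docker Secrets
// convention by preferring `${name}_FILE` (a path to read) over the
// plain environment variable.
export class EnvSecretsProvider implements SecretsProvider {
  async getSecret(name: string): Promise<string> {
    const filePath = process.env[`${name}_FILE`];
    if (filePath != null) {
      return (await fs.readFile(filePath, "utf8")).trim();
    }
    const value = process.env[name];
    if (value == null) {
      throw new Error(`Missing secret: ${name}`);
    }
    return value;
  }
}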

Contributor:
> On the flipside, I would rather not see fs.readFile or process.env in a use-case.

I think we're in agreement that stuff like fs.readFile and process.env should be kept at the outermost layer possible, and decoupled from implementing the logic.

IMO, losing the CacheProvider is not a big deal because we aren't actually supporting other filesystems, so as developers we aren't getting feedback on whether we're (for example) using CacheProvider consistently across the codebase. Without that feedback, there's no particular reason to believe that we will actually be using it consistently when we need that flexibility, or even that this is giving us the right flexibility compared to what we'll actually need when we need it. So we don't benefit much from it, and it does add extra indirection and cognitive overhead.

Member Author:
I’m not really familiar with the way in which you’re using the term
use-case. Perhaps you could clarify?

Currently, fetchGithubRepo takes a simple token: GithubToken. It’s
up to the caller to fetch that, directly from an environment variable or
from a file or from a cast5-decryption or whatever. It’s beyond the
scope of fetchGithubRepo to care where that token came from. I don’t
see how introducing a secretProvider: () => GithubToken (presumably?)
makes that any simpler.

  _dirContext: PluginDirectoryContext;
  constructor(dirContext: PluginDirectoryContext) {
    this._dirContext = dirContext;
  }
  database(id: string): Promise<Database> {
    const path = pathJoin(this._dirContext.cacheDirectory(), `${id}.db`);
    return Promise.resolve(new Database(path));
  }
}

function getTokenFromEnv(): GithubToken {
  const rawToken = process.env[TOKEN_ENV_VAR_NAME];
  if (rawToken == null) {
    throw new Error(`No GitHub token provided: set ${TOKEN_ENV_VAR_NAME}`);
  }
  return validateToken(rawToken);
}

export class GithubCliPlugin implements CliPlugin {
  declaration(): PluginDeclaration {
    return declaration;
  }

  async load(ctx: PluginDirectoryContext): Promise<void> {
    const cache = new CacheProviderImpl(ctx);
    const token = getTokenFromEnv();
    const config = await loadConfig(ctx);
    for (const repoId of config.repoIds) {
      await fetchGithubRepo(repoId, {token, cache});
    }
  }

  async graph(
    ctx: PluginDirectoryContext,
    rd: ReferenceDetector
  ): Promise<WeightedGraph> {
    const _ = rd; // TODO(#1808): not yet used
    const cache = new CacheProviderImpl(ctx);
    const token = getTokenFromEnv();
    const config = await loadConfig(ctx);

    const repositories = [];
    for (const repoId of config.repoIds) {
      repositories.push(
        await fetchGithubRepoFromCache(repoId, {token, cache})
      );
    }
    const graph = Graph.merge(
      repositories.map((r) => {
        const rv = new RelationalView();
        rv.addRepository(r);
        return createGraph(rv);
      })
    );
    const weights = weightsForDeclaration(declaration);
    return {graph, weights};
  }

  async referenceDetector(
    ctx: PluginDirectoryContext
  ): Promise<ReferenceDetector> {
    const cache = new CacheProviderImpl(ctx);
    const token = getTokenFromEnv();
    const config = await loadConfig(ctx);

    const rvs = [];
    for (const repoId of config.repoIds) {
      const repo = await fetchGithubRepoFromCache(repoId, {token, cache});
      const rv = new RelationalView();
      rv.addRepository(repo);
      rvs.push(rv);
    }
    return referenceDetectorFromRelationalViews(rvs);
  }
}
38 changes: 38 additions & 0 deletions src/plugins/github/config.js
@@ -0,0 +1,38 @@
// @flow

import {type RepoId, stringToRepoId} from "./repoId";

export type GithubConfig = {|
  +repoIds: $ReadOnlyArray<RepoId>,
|};

// eslint-disable-next-line no-unused-vars
type SerializedGithubConfig = {|
  +repositories: $ReadOnlyArray<string>,
|};
// (^ for documentation purposes)

type JsonObject =
  | string
  | number
  | boolean
  | null
  | JsonObject[]
  | {[string]: JsonObject};

export function parse(raw: JsonObject): GithubConfig {
  if (raw == null || typeof raw !== "object" || Array.isArray(raw)) {
    throw new Error("bad config: " + JSON.stringify(raw));
  }
  const {repositories} = raw;
  if (!Array.isArray(repositories)) {
    throw new Error("bad repositories: " + JSON.stringify(repositories));
  }
  const repoIds = repositories.map((x) => {
    if (typeof x !== "string") {
      throw new Error("bad repository: " + JSON.stringify(x));
    }
    return stringToRepoId(x);
  });
  return {repoIds};
}
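
For reference, a config.json that this parser accepts (repository names are illustrative):

{
  "repositories": ["sourcecred/sourcecred", "octocat/hello-world"]
}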