Merge branch 'elastic'
korfuri committed Jul 10, 2017
2 parents 15abcb9 + c28c859 commit 7a98a91
Showing 17 changed files with 436 additions and 43 deletions.
4 changes: 1 addition & 3 deletions .travis.yml
@@ -1,8 +1,6 @@
language: go
go:
- 1.x
- 1.6
- 1.7.x
- 1.8.x
- tip
before_install:
- go get golang.org/x/tools/cmd/cover
75 changes: 75 additions & 0 deletions README.md
@@ -15,3 +15,78 @@ answer questions such as:
* Where are all the references to this identifier?
* What types implement this interface?
* What interfaces are implemented by this type?

Goref can be used as a library; see
its [godoc](http://godoc.org/github.com/korfuri/goref) for usage
information.

Goref can also be used to index code into ElasticSearch. This is
currently a work in progress. The binary for this is at
elasticsearch/main/main, for lack of a better name so far. Usage is:

./main
--version 42 # version of the code being indexed
--include_tests # or --include_tests=false to avoid indexing [XTests](https://godoc.org/golang.org/x/tools/go/loader#hdr-CONCEPTS_AND_TERMINOLOGY)
--elastic_url http://localhost:9200
--elastic_user user
--elastic_password hunter2
github.com/korfuri/goref
your/awesome/package

This always imports dependencies recursively.

## Code versioning

When code is indexed, the concept of "version" is critical. Since code
is a living thing, indexes of code must be versioned. Since the code
we index lives in many repositories, we can't use the repositories'
history as a versioning tool.

So versions have to be provided to the index externally. It's
recommended to keep a global, monotonically increasing counter for
this. Every time you `go get` one or more packages, that counter
should be incremented. The counter is an int64, so an elegant way to
do this is to use the Unix timestamp (for example `time.Now().Unix()`)
at which you last synced your entire Go tree.

TODO(korfuri): versions should be per-package, not per-graph. There
should be a way to avoid duplicating all packages if only one package
was updated. Probably a callback passed by the user code that returns
the version for a given package, so that callback could look up the
latest mtime of all files in that package. Need to find a convenient
API.

If you only operate on a completely immutable tree of
packages (typically, your PackageGraph remains in memory and is never
serialized to disk, and you don't `go get` or `git pull` while loading
packages), you can set the version to a fixed number and ignore
versioning altogether.

Goref is (obviously) not safe to use if you concurrently update the
code while it's analyzing it.

## Vendoring and goref

Vendored packages are treated as separate packages in goref. There is
no support for deduplicating `github.com/foo/bar` and
`github.com/baz/qux/vendor/github.com/foo/bar`. This follows the `go`
tool's
[philosophy on that question](https://docs.google.com/document/d/1Bz5-UB7g2uPBdOx-rw5t9MxJwkfpx90cqG9AFL0JAYo/edit).
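
Goref itself does not do this, but a caller who wants to group a vendored copy with its upstream package could strip the vendor prefix on their side. The sketch below assumes the standard `vendor/` directory layout; `stripVendor` is a hypothetical helper name. Keep in mind that a vendored copy may be a different revision than the upstream package, so grouping them is a judgment call.

```go
package main

import (
	"fmt"
	"strings"
)

// stripVendor is a hypothetical helper (goref does not do this): it
// returns the import path a vendored package would have outside of any
// vendor directory. Paths without a vendor component are returned as-is.
func stripVendor(importPath string) string {
	const marker = "/vendor/"
	if i := strings.LastIndex(importPath, marker); i >= 0 {
		return importPath[i+len(marker):]
	}
	return strings.TrimPrefix(importPath, "vendor/")
}

func main() {
	fmt.Println(stripVendor("github.com/baz/qux/vendor/github.com/foo/bar"))
	fmt.Println(stripVendor("github.com/foo/bar"))
}
```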

## Types of references

Currently, goref provides the following kinds of references, defined
in `reftype.go`:

* `Import` represents an import of a package by a file. Since a
package doesn't exist in a single position, there will be a Ref
from the importing file to each file in the imported package.
* `Call` represents a call of a function by another function.
* `Instantiation` is generated for composite literals.
* `Implementation` represents a reference from a type implementing an
interface to that interface.
* `Extension` represents a reference from interface A to interface B if
interface A is a superset of interface B.
* `Reference` is the default enum value, used if goref can't figure
out what kind of reference is used but detects that a package
depends on an identifier in another package.
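
To illustrate what "default enum value" means in Go, here is a sketch of how such an enum could be laid out. This is not a copy of `reftype.go`: the type name `RefType`, the ordering of the constants and the `String` method are assumptions made for the example.

```go
package main

import "fmt"

// RefType is an illustrative enum; the real definitions live in
// reftype.go and may differ in order and naming.
type RefType int

const (
	Reference RefType = iota // zero value, so it is the default
	Import
	Call
	Instantiation
	Implementation
	Extension
)

// String returns the name of the reference kind.
func (r RefType) String() string {
	names := [...]string{
		"Reference", "Import", "Call",
		"Instantiation", "Implementation", "Extension",
	}
	if r < 0 || int(r) >= len(names) {
		return fmt.Sprintf("RefType(%d)", int(r))
	}
	return names[r]
}

func main() {
	var r RefType // never assigned, so it holds the default kind
	fmt.Println(r)
	fmt.Println(Call)
}
```

Putting the catch-all kind at zero means an uninitialized value is reported as a plain `Reference` rather than as a more specific kind.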
7 changes: 3 additions & 4 deletions dotimports_test.go
@@ -10,12 +10,11 @@ import (

func TestDotImports(t *testing.T) {
const (
-pkgpath = "github.com/korfuri/goref/testprograms/dotimports"
-filepath = "testprograms/dotimports/main.go"
+pkgpath = "github.com/korfuri/goref/testprograms/dotimports"
)

-pg := goref.NewPackageGraph()
-pg.LoadProgram(pkgpath, []string{filepath})
+pg := goref.NewPackageGraph(0)
+pg.LoadPrograms([]string{pkgpath}, true)
assert.Contains(t, pg.Packages, pkgpath)
assert.Contains(t, pg.Packages, pkgpath+"/lib")
pkg := pg.Packages[pkgpath]
89 changes: 89 additions & 0 deletions elasticsearch/elasticsearch.go
@@ -0,0 +1,89 @@
package elasticsearch

import (
"context"
"errors"
"fmt"

"github.com/korfuri/goref"
log "github.com/sirupsen/logrus"
elastic "gopkg.in/olivere/elastic.v5"
)

const (
// Max number of errors reported in one call to
// LoadGraphToElastic
maxErrorsReported = 20
)

// PackageExists returns whether the provided loadpath + version tuple
// exists in this index.
func PackageExists(loadpath string, version int64, client *elastic.Client) bool {
	ctx := context.Background()
	docID := fmt.Sprintf("v1@%d@%s", version, loadpath)
	pkgDoc, err := client.Get().
		Index("goref").
		Type("package").
		Id(docID).
		Do(ctx)
	if err != nil {
		// A 404 just means the package is not indexed yet. Any other
		// error is logged and treated as "not found", so the caller
		// will try to index the package again.
		if !elastic.IsNotFound(err) {
			log.Warnf("Looking up package %s failed: %s", docID, err)
		}
		return false
	}
	return pkgDoc != nil && pkgDoc.Found
}

// LoadGraphToElastic loads all Packages and Refs from a PackageGraph
// to the provided ES index.
func LoadGraphToElastic(pg goref.PackageGraph, client *elastic.Client) ([]*goref.Ref, error) {
ctx := context.Background()
missedRefs := make([]*goref.Ref, 0)
errs := make([]error, 0)

for _, p := range pg.Packages {
log.Infof("Processing package %s", p.Path)

if PackageExists(p.Path, p.Version, client) {
log.Infof("Package %s already exists in this index.", p)
continue
}

log.Infof("Creating Package %s in the index", p)
if _, err := client.Index().
Index("goref").
Type("package").
Id(p.DocumentID()).
BodyJson(p).
Do(ctx); err != nil {
log.Errorf("Creating Package %s failed: %s", p.Path, err)
return nil, err
}

for _, r := range p.OutRefs {
log.Infof("Creating Ref document [%s] in the index", r)
refDoc, err := client.Index().
Index("goref").
Type("ref").
BodyJson(r).
Do(ctx)
if err != nil {
missedRefs = append(missedRefs, r)
errs = append(errs, err)
log.Infof("Create Ref document failed with err:[%s] for Ref:[%s]", err, r)
} else {
log.Infof("Created Ref document with docID:[%s] for Ref:[%s]", refDoc.Id, r)
}
}
}
if len(missedRefs) > 0 {
errStr := fmt.Sprintf("%d refs couldn't be imported. Errors were:\n", len(missedRefs))
c := 0
for _, e := range errs {
errStr = errStr + e.Error() + "\n"
c = c + 1
if c >= maxErrorsReported {
break
}
}
return missedRefs, errors.New(errStr)
}
return nil, nil
}
79 changes: 79 additions & 0 deletions elasticsearch/main/main.go
@@ -0,0 +1,79 @@
package main

import (
"flag"

"github.com/korfuri/goref"
"github.com/korfuri/goref/elasticsearch"
log "github.com/sirupsen/logrus"
elastic "gopkg.in/olivere/elastic.v5"
)

const (
Usage = `elastic_goref -version 42 -include_tests=<true|false> \
-elastic_url http://localhost:9200/ -elastic_user elastic -elastic_password changeme \
github.com/korfuri/goref github.com/korfuri/goref/elasticsearch/main`
)

var (
version = flag.Int64("version", -1,
"Version of the code being examined. Should increase monotonically when the code is updated.")
includeTests = flag.Bool("include_tests", true,
"Whether XTest packages should be included in the index.")
elasticUrl = flag.String("elastic_url", "http://localhost:9200",
"URL of the ElasticSearch cluster.")
elasticUsername = flag.String("elastic_user", "elastic",
"Username to authenticate with ElasticSearch.")
elasticPassword = flag.String("elastic_password", "changeme",
"Password to authenticate with ElasticSearch.")
)

func usage() {
log.Fatal(Usage)
}

func main() {
flag.Parse()
args := flag.Args()

if *version == -1 || len(args) == 0 {
usage()
}

// Create a client
client, err := elastic.NewClient(
elastic.SetURL(*elasticUrl),
elastic.SetBasicAuth(*elasticUsername, *elasticPassword))
if err != nil {
log.Fatal(err)
}

// Filter out packages that already exist at this version in
// the index.
packages := make([]string, 0)
for _, a := range args {
if !elasticsearch.PackageExists(a, *version, client) {
packages = append(packages, a)
}
}

// Index the requested packages
log.Infof("Indexing packages: %v", packages)
if *includeTests {
log.Info("This index will include XTests.")
}
pg := goref.NewPackageGraph(0)
pg.LoadPrograms(packages, *includeTests)
log.Info("Computing the interface-implementation matrix.")
pg.ComputeInterfaceImplementationMatrix()

log.Infof("%d packages in the graph.", len(pg.Packages))
log.Infof("%d files in the graph.", len(pg.Files))

// Load the indexed references into ElasticSearch
log.Info("Inserting references into ElasticSearch.")
if missed, err := elasticsearch.LoadGraphToElastic(*pg, client); err != nil {
log.Fatalf("Couldn't load %d references. Error: %s", len(missed), err)
}
log.Info("Done, bye.")
}
6 changes: 3 additions & 3 deletions empty_test.go
@@ -9,11 +9,11 @@ import (

func TestCanImportEmptyPackage(t *testing.T) {
const (
-emptypkgpath = "github.com/korfuri/goref/testprograms/empty/main"
+emptypkgpath = "github.com/korfuri/goref/testprograms/empty"
)

-pg := goref.NewPackageGraph()
-pg.LoadProgram(emptypkgpath, []string{"testprograms/empty/main.go"})
+pg := goref.NewPackageGraph(0)
+pg.LoadPrograms([]string{emptypkgpath}, false)
assert.Len(t, pg.Packages, 1)
assert.Len(t, pg.Files, 1)
assert.Empty(t, pg.Packages[emptypkgpath].InRefs)
7 changes: 3 additions & 4 deletions interfaces_test.go
@@ -10,12 +10,11 @@ import (

func TestInterfaceImplMatrix(t *testing.T) {
const (
-pkgpath = "github.com/korfuri/goref/testprograms/interfaces/main"
-filepath = "testprograms/interfaces/main.go"
+pkgpath = "github.com/korfuri/goref/testprograms/interfaces"
)

-pg := goref.NewPackageGraph()
-pg.LoadProgram(pkgpath, []string{filepath})
+pg := goref.NewPackageGraph(0)
+pg.LoadPrograms([]string{pkgpath}, false)
assert.Contains(t, pg.Packages, pkgpath)
pg.ComputeInterfaceImplementationMatrix()

20 changes: 20 additions & 0 deletions json/json.go
@@ -0,0 +1,20 @@
package json

import (
"encoding/json"

"github.com/korfuri/goref"
)

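// GraphAsJSON marshals every incoming Ref of every Package in the graph
// to JSON and sends the result on outch. Marshaling errors are sent on
// errch. A value is sent on done once all Refs have been processed.
func GraphAsJSON(pg goref.PackageGraph, outch chan<- []byte, errch chan<- error, done chan<- struct{}) {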
for _, p := range pg.Packages {
for _, r := range p.InRefs {
if j, err := json.Marshal(r); err == nil {
outch <- j
} else {
errch <- err
}
}
}
done <- struct{}{}
}
24 changes: 20 additions & 4 deletions main/main.go
@@ -3,6 +3,7 @@ package main
import (
"github.com/dustin/go-humanize"
"github.com/korfuri/goref"
+"github.com/korfuri/goref/json"

"log"
"os"
@@ -26,8 +27,8 @@ func main() {

start := time.Now()

-m := goref.NewPackageGraph()
-m.LoadProgram("github.com/korfuri/goref/main", []string{"main.go"})
+m := goref.NewPackageGraph(0)
+m.LoadPrograms([]string{"github.com/korfuri/goref/main/main"}, true)

log.Printf("Loading took %s\n", time.Since(start))
reportMemory()
@@ -43,12 +44,12 @@ func main() {
log.Printf("%d files in the graph\n", len(m.Files))

log.Printf("Package `goref` has these files:\n")
-for d, _ := range m.Packages["github.com/korfuri/goref"].Files {
+for d := range m.Packages["github.com/korfuri/goref"].Files {
log.Printf(" - %s\n", d)
}

log.Printf("Package `fmt` has these files:\n")
-for d, _ := range m.Packages["fmt"].Files {
+for d := range m.Packages["fmt"].Files {
log.Printf(" - %s\n", d)
}

@@ -79,6 +80,21 @@ func main() {
}

log.Printf("Displaying took %s (total runtime: %s)\n", time.Since(computeMatrixDone), time.Since(start))
+
+jsonch := make(chan []byte)
+errch := make(chan error)
+done := make(chan struct{})
+go json.GraphAsJSON(*m, jsonch, errch, done)
+for {
+select {
+case j := <-jsonch:
+log.Printf("%s\n", string(j))
+case err := <-errch:
+log.Fatal(err)
+case <-done:
+return
+}
+}
}

func unused() interface{} {