chore: v14
ndabAP committed May 26, 2023
1 parent 9d81cd1 commit 4e3ef3e
Showing 16 changed files with 169 additions and 112 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -11,7 +11,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.18
go-version: 1.19

- name: Build
run: go build -v ./...
3 changes: 3 additions & 0 deletions Makefile
@@ -11,6 +11,9 @@ all: test clean build
test:
go test ./... -short

test_full:
go test ./...

build: windows linux darwin
@echo version: $(VERSION)

22 changes: 14 additions & 8 deletions README.md
@@ -1,10 +1,12 @@
# assocentity

[![Go Report Card](https://goreportcard.com/badge/github.com/ndabAP/assocentity/v13)](https://goreportcard.com/report/github.com/ndabAP/assocentity/v13)
[![Go Report Card](https://goreportcard.com/badge/github.com/ndabAP/assocentity/v14)](https://goreportcard.com/report/github.com/ndabAP/assocentity/v14)

Package assocentity is a social science tool to analyze the relative distance
from tokens to entities. The motivation is to make conclusions based on the
distance from interesting tokens to a certain entity and its synonyms.
distance from interesting tokens to a certain entity and its synonyms. Visit
[this](https://ndabap.github.io/entityscrape/index.html) website to see a
usage example.

## Features

@@ -16,7 +18,7 @@ distance from interesting tokens to a certain entity and its synonyms.
## Installation

```bash
$ go get github.com/ndabAP/assocentity/v13
$ go get github.com/ndabAP/assocentity/v14
```

## Prerequisites
@@ -71,18 +73,22 @@ if err != nil {
mean := assocentity.Mean(dists)
```

The `NLPTokenizer` has a built-in retryer with a strategy that went well with
the Google Language API limitations. It can't be disabled or configured.

### Tokenization

If you provide your own tokenizer you must implement the interface with the
method `Tokenize` and the following signature:
A `Tokenizer` is something that produces tokens with a given text. While a
`Token` is the smallest possible unit of a text. The interface with the
method `Tokenize` has the following signature:

```go
type Tokenizer interface {
Tokenize(ctx context.Context, text string) ([]Token, error)
}
```

`Token` is of type:
A `Token` has the following properties:

```go
type Token struct {
@@ -100,7 +106,7 @@ For example, given the text:
text := "Punchinello was burning to get me"
```

The result from `Tokenize` would be:
The result from `Tokenize` would be a slice of tokens:

```go
[]Token{
@@ -147,7 +153,7 @@ The application expects the text from "stdin" and accepts the following flags:
Example:

```bash
echo "Relax, Max. You're a nice guy." | ./bin/assocentity_linux_amd64_v13.0.0-0-g948274a-dirty -gog-svc-loc=/home/max/.config/assocentity/google-service.json -entities="Max Payne,Payne,Max"
echo "Relax, Max. You're a nice guy." | ./bin/assocentity_linux_amd64_v14.0.0-0-g948274a-dirty -gog-svc-loc=/home/max/.config/assocentity/google-service.json -entities="Max Payne,Payne,Max"
```

The output is written to "stdout" in appropriate formats.
12 changes: 4 additions & 8 deletions assocentity.go
@@ -4,10 +4,10 @@ import (
"context"
"math"

"github.com/ndabAP/assocentity/v13/internal/comp"
"github.com/ndabAP/assocentity/v13/internal/iterator"
"github.com/ndabAP/assocentity/v13/internal/pos"
"github.com/ndabAP/assocentity/v13/tokenize"
"github.com/ndabAP/assocentity/v14/internal/comp"
"github.com/ndabAP/assocentity/v14/internal/iterator"
"github.com/ndabAP/assocentity/v14/internal/pos"
"github.com/ndabAP/assocentity/v14/tokenize"
)

// source wraps entities and texts
@@ -116,8 +116,6 @@ func distances(
// Finds/counts entities in positive direction
posDirIter.SetPos(currDetermTokensPos)
for posDirIter.Next() {
// [I, was, (with), Max, Payne, here] -> true, Max Payne
// [I, was, with, Max, Payne, (here)] -> false, ""
isEntity, entity := comp.TextWithEntities(
posDirIter,
entityTokensIter,
@@ -133,8 +131,6 @@ func distances(
// Finds/counts entities in negative direction
negDirIter.SetPos(currDetermTokensPos)
for negDirIter.Prev() {
// [I, was, (with), Max, Payne, here] -> false, ""
// [I, was, with, Max, Payne, (here)] -> true, Max Payne
isEntity, entity := comp.TextWithEntities(
negDirIter,
entityTokensIter,
6 changes: 3 additions & 3 deletions assocentity_test.go
@@ -6,7 +6,7 @@ import (
"strings"
"testing"

"github.com/ndabAP/assocentity/v13/tokenize"
"github.com/ndabAP/assocentity/v14/tokenize"
)

// whiteSpaceTokenizer tokenizes a text by empty space and assigns unknown
@@ -175,7 +175,7 @@ func Test_distances(t *testing.T) {
}

func TestNormalize(t *testing.T) {
t.Run("HumandReadableNormalizer", func(t *testing.T) {
t.Run("HumanReadableNormalizer", func(t *testing.T) {
got := map[tokenize.Token][]float64{
{
PoS: tokenize.UNKN,
@@ -208,7 +208,7 @@ func TestNormalize(t *testing.T) {
Text: "and",
}: {},
}
Normalize(got, HumandReadableNormalizer)
Normalize(got, HumanReadableNormalizer)

if !reflect.DeepEqual(got, want) {
t.Errorf("Normalize() = %v, want %v", got, want)
16 changes: 8 additions & 8 deletions cli/main.go
@@ -11,9 +11,9 @@ import (
"os"
"strings"

"github.com/ndabAP/assocentity/v13"
"github.com/ndabAP/assocentity/v13/nlp"
"github.com/ndabAP/assocentity/v13/tokenize"
"github.com/ndabAP/assocentity/v14"
"github.com/ndabAP/assocentity/v14/nlp"
"github.com/ndabAP/assocentity/v14/tokenize"
)

var logger = log.Default()
@@ -28,22 +28,22 @@ var (
entitiesF = flag.String(
"entities",
"",
"Define entities to be searched within input, example: -entities=\"Max Payne,Payne\"",
"List of comma separated entities, example: -entities=\"Max Payne,Payne\"",
)
gogSvcLocF = flag.String(
"gog-svc-loc",
"google-svc-acc-key",
"",
"Google Clouds NLP JSON service account file, example: -gog-svc-loc=\"~/gog-svc-loc.json\"",
"Google Clouds NLP JSON service account file, example: -google-svc-acc-key=\"~/google-svc-acc-key.json\"",
)
opF = flag.String(
"op",
"mean",
"Operation to execute",
"Operation to execute, default is \"mean\"",
)
posF = flag.String(
"pos",
"any",
"Defines part of speeches to be included, example: -pos=noun,verb,pron",
"List of comma separated part of speeches, example: -pos=noun,verb,pron",
)
)

44 changes: 22 additions & 22 deletions go.mod
@@ -1,33 +1,33 @@
module github.com/ndabAP/assocentity/v13
module github.com/ndabAP/assocentity/v14

go 1.18
go 1.19

require (
cloud.google.com/go v0.34.0
cloud.google.com/go/language v1.9.0
github.com/joho/godotenv v1.3.0
google.golang.org/api v0.102.0
google.golang.org/genproto v0.0.0-20221024183307-1bc688fe9f3e
google.golang.org/api v0.123.0
google.golang.org/genproto v0.0.0-20230410155749-daa745c078e1
)

require (
cloud.google.com/go/compute v1.19.0 // indirect
cloud.google.com/go/compute/metadata v0.2.3 // indirect
github.com/google/s2a-go v0.1.3 // indirect
golang.org/x/crypto v0.1.0 // indirect
)

require (
github.com/BurntSushi/toml v0.3.1 // indirect
github.com/golang/groupcache v0.0.0-20200121045136-8c9f03a8e57e // indirect
github.com/golang/protobuf v1.5.2 // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/go-cmp v0.5.9 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.2.0 // indirect
github.com/googleapis/gax-go v1.0.3 // indirect
github.com/googleapis/gax-go/v2 v2.7.0
go.opencensus.io v0.23.0 // indirect
golang.org/x/exp v0.0.0-20221026153819-32f3d567a233 // indirect
golang.org/x/lint v0.0.0-20190313153728-d0100b6bd8b3 // indirect
golang.org/x/mod v0.6.0 // indirect
golang.org/x/net v0.7.0 // indirect
golang.org/x/oauth2 v0.0.0-20221014153046-6fdb5e3db783 // indirect
golang.org/x/sys v0.5.0 // indirect
golang.org/x/text v0.7.0 // indirect
golang.org/x/tools v0.2.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.2.3 // indirect
github.com/googleapis/gax-go/v2 v2.9.1
go.opencensus.io v0.24.0 // indirect
golang.org/x/net v0.9.0 // indirect
golang.org/x/oauth2 v0.7.0 // indirect
golang.org/x/sys v0.7.0 // indirect
golang.org/x/text v0.9.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/grpc v1.50.1 // indirect
google.golang.org/protobuf v1.28.1 // indirect
honnef.co/go/tools v0.0.0-20190523083050-ea95bdfd59fc // indirect
google.golang.org/grpc v1.55.0 // indirect
google.golang.org/protobuf v1.30.0 // indirect
)
