Serverless Stanford Named Entity Recognizer

This project enables you to deploy the Stanford Named Entity Recognizer (NER) to a "serverless" environment based on AWS Lambda and API Gateway.

Why?

The general advantages of serverless computing include cost, scalability and productivity. Specifically, these translate to:

- The ability to analyse text in virtually any environment, most notably from the browser
- Processing a large number of texts concurrently, potentially thousands
- Ease and speed of iteration: deploy with one command after making changes to your models or label interpretation logic

How?

Getting started

Make sure you have the following installed on your machine:
- Docker
Or
- Node >= 8
- JDK >= 8
- Maven
Sign up for an AWS account
Configure your AWS credentials for deployment with the Serverless framework. If you are working with Docker, make sure they are set as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
Install dependencies:
- With Docker:
```
docker build -t sner .
```
  Or
- With Node/JDK/Maven: install the Serverless dependencies by running the following command in the project root directory:
```
npm install
```

Deploying to AWS

With Docker:

docker run --rm -it -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY sner npm run deploy -- --stage=dev

Or

With Node/JDK/Maven:

npm run deploy -- --stage=dev

After a successful deployment you should see your POST and GET endpoints displayed, e.g.

...
endpoints:
  POST - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
  GET - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
...

Trying it out

You can try the GET endpoint by appending the query parameter "text" to it, set to the text you wish to analyse. Browsers encode the spaces for you; other clients may need the text URL-encoded first. For example:

https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities?text=Stanford University is located in Silicon Valley and was founded in November 1885

Response:

{
  "ORGANIZATION": [
    {
      "name": "Stanford University",
      "count": 1
    }
  ],
  "LOCATION": [
    {
      "name": "Silicon Valley",
      "count": 1
    }
  ],
  "DATE": [
    {
      "name": "November 1885",
      "count": 1
    }
  ]
}

Example payload for the POST endpoint:

{
  "text": "Stanford University is located in Silicon Valley and was founded in November 1885"
}
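
Both requests can also be built programmatically. The following is a minimal sketch, not part of this project, that assumes Java 11 or later (for java.net.http) and an endpoint that accepts unauthenticated requests. The class name EntitiesClient and its helper methods are hypothetical, and the endpoint is the same placeholder used above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class EntitiesClient {
    // Placeholder: substitute the endpoint printed by your own deployment.
    static final String ENDPOINT =
            "https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities";

    // Builds the GET request with the text percent-encoded in the query string.
    static HttpRequest buildGet(String endpoint, String text) {
        // URLEncoder produces form encoding ("+" for spaces); switch to "%20".
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(endpoint + "?text=" + encoded)).GET().build();
    }

    // Builds the POST request carrying the JSON payload {"text": "..."}.
    static HttpRequest buildPost(String endpoint, String text) {
        String json = "{\"text\": \"" + escapeJson(text) + "\"}";
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // Minimal JSON string escaping; a JSON library is preferable in real code.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String text = "Stanford University is located in Silicon Valley";
        if (args.length == 0) {
            // No endpoint supplied: only show the request that would be sent.
            System.out.println(buildGet(ENDPOINT, text).uri());
            return;
        }
        // First argument: the real endpoint URL from your deployment output.
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildPost(args[0], text), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```

Run without arguments, the sketch only prints the encoded GET URL. Pass your deployed endpoint as the first argument to send the POST request and print the response body.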

What?

Label interpretation logic

The "business logic" lives in the EntityExtractor class and processes text in the following way:

- Finds labels associated with each word in a string using the CoreNLP library
- Filters the labels to leave only those corresponding to named entities
- Extracts the names, types and number of times each entity occurs in the text from the remaining labels
- Groups the entity names and counts by their types
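
The filtering and grouping steps can be illustrated without CoreNLP. The sketch below is not the project's EntityExtractor; the class EntityGroupingSketch and its group method are hypothetical. It assumes the tokens have already been labelled, that "O" marks words outside any entity (the usual Stanford NER background label), and that consecutive words sharing a label form one entity name. It returns a nested map rather than the JSON shape shown in the response above, and uses List.of, so it needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityGroupingSketch {
    // Label assumed to mark tokens that are not part of a named entity.
    static final String BACKGROUND = "O";

    /**
     * Takes {word, label} pairs in text order and returns
     * entity type -> (entity name -> number of occurrences).
     * Note: two adjacent entities of the same type are merged into one name.
     */
    static Map<String, Map<String, Integer>> group(List<String[]> labelledTokens) {
        Map<String, Map<String, Integer>> byType = new LinkedHashMap<>();
        StringBuilder name = new StringBuilder();
        String currentLabel = BACKGROUND;

        // A trailing background token flushes an entity that ends the text.
        List<String[]> tokens = new ArrayList<>(labelledTokens);
        tokens.add(new String[] {"", BACKGROUND});

        for (String[] token : tokens) {
            String word = token[0];
            String label = token[1];
            if (!label.equals(currentLabel)) {
                // The label changed: record the entity collected so far, if any.
                if (!currentLabel.equals(BACKGROUND)) {
                    byType.computeIfAbsent(currentLabel, k -> new LinkedHashMap<>())
                          .merge(name.toString(), 1, Integer::sum);
                }
                name.setLength(0);
                currentLabel = label;
            }
            if (!label.equals(BACKGROUND)) {
                // Extend the current entity name with this word.
                if (name.length() > 0) name.append(' ');
                name.append(word);
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        List<String[]> tokens = List.of(
                new String[] {"Stanford", "ORGANIZATION"},
                new String[] {"University", "ORGANIZATION"},
                new String[] {"is", "O"},
                new String[] {"located", "O"},
                new String[] {"in", "O"},
                new String[] {"Silicon", "LOCATION"},
                new String[] {"Valley", "LOCATION"});
        System.out.println(group(tokens));
    }
}
```

In the real class the {word, label} pairs would come from the CoreNLP classifier. Here they are hard-coded so the grouping logic can be read and run on its own.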

Configuration

The pom.xml and serverless.yml files contain most of the important settings in this project.

Select the models you wish to use in the pom.xml <properties> and <build> sections:

<project>
<!--...-->
<properties>
    <!--...-->
    <ner.model1>english.all.3class.distsim</ner.model1>
    <ner.model2>english.conll.4class.distsim</ner.model2>
    <ner.model3>english.muc.7class.distsim</ner.model3>
    <!--...-->
</properties>
<!--...-->
  <build>
    <plugins>
      <!--...-->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <!--...-->
        <configuration>
          <!--...-->
          <filters>
            <filter>
              <!-- This minimises the output jar file size to remain within the [Lambda limits](https://docs.aws.amazon.com/lambda/latest/dg/limits.html) by only including your selected models -->
              <includes>
                <include>${ner.prefix}${ner.model1}.*</include>
                <include>${ner.prefix}${ner.model2}.*</include>
                <include>${ner.prefix}${ner.model3}.*</include>
              </includes>
            </filter>
          </filters>
        </configuration>
        <!--...-->
      </plugin>
    <!--...-->
    </plugins>
  </build>
  <!--...-->
</project>

Update the CoreNLP library version in the pom.xml <properties> section:

<properties>
    <nlp.version>3.9.1</nlp.version>
    <!--...-->
</properties>

Change the AWS Lambda name, memory and region in the serverless.yml file.
Configure your endpoints in the serverless.yml file.
