jabrythehutt/sner

Serverless Stanford Named Entity Recognizer

This project enables you to deploy the Stanford Named Entity Recognizer (NER) to a "serverless" environment based on AWS Lambda and API Gateway.

Why?

The general advantages of serverless computing include cost, scalability and productivity. Specifically, these translate to:

  • The ability to analyse text in virtually any environment - most notably from the browser
  • Processing a large number of texts concurrently - potentially thousands
  • Ease and speed of iteration - just deploy with one command after making changes to your models or label interpretation logic

How?

Getting started

  1. Make sure you have the following installed on your machine:

    • Docker

    Or

    • Node.js, a JDK and Maven

  2. Sign up for an AWS account

  3. Configure your AWS credentials for deployment with the Serverless framework. If you are working with Docker, make sure they are set as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

  4. Install dependencies:

    • With docker:

      docker build -t sner .
      

      Or

    • With Node/JDK/Maven: Install the Serverless dependencies by running the following command in the project root directory:

       npm install
      

Deploying to AWS

With docker:

docker run --rm -it -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY sner npm run deploy -- --stage=dev

Or

With Node/JDK/Maven:

npm run deploy -- --stage=dev

After a successful deployment, your POST and GET endpoints should be displayed, e.g.

...
endpoints:
  POST - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
  GET - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
...

Trying it out

To try the GET endpoint, append the query parameter "text" to it, set to the text you wish to analyse (a browser will URL-encode the spaces for you), e.g.

https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities?text=Stanford University is located in Silicon Valley and was founded in November 1885

Response:

{
  "ORGANIZATION": [
    {
      "name": "Stanford University",
      "count": 1
    }
  ],
  "LOCATION": [
    {
      "name": "Silicon Valley",
      "count": 1
    }
  ],
  "DATE": [
    {
      "name": "November 1885",
      "count": 1
    }
  ]
}
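
When calling the GET endpoint from code rather than a browser, the text has to be percent-encoded before it is appended. The following is a minimal sketch of that step, not part of this project: the class and method names are hypothetical, and the endpoint is the same placeholder shown in the deployment output.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EntitiesUrlSketch {

    /** Appends the percent-encoded text as the "text" query parameter. */
    public static String buildEntitiesUrl(String endpoint, String text) {
        // URLEncoder produces form encoding, where a space becomes "+".
        // Swapping in "%20" gives plain percent-encoding. A literal "+" in
        // the input has already become "%2B", so the replacement is safe.
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return endpoint + "?text=" + encoded;
    }

    public static void main(String[] args) {
        // Placeholder endpoint: substitute the GET URL printed by your deployment.
        String endpoint = "https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities";
        System.out.println(buildEntitiesUrl(endpoint, "Stanford University is located in Silicon Valley"));
    }
}
```

This needs Java 10 or later for the `URLEncoder.encode(String, Charset)` overload.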

Example payload for the POST endpoint:

{
  "text": "Stanford University is located in Silicon Valley and was founded in November 1885"
}
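
A request carrying this payload could be built along the following lines with the JDK's built-in HTTP client (Java 11 or later). This is a sketch under assumptions rather than a client shipped with the project: the class and method names are hypothetical, the hand-rolled escaping only covers backslashes, double quotes and newlines, and whether the endpoint requires the Content-Type header is not confirmed by this README.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class PostRequestSketch {

    /** Wraps the text in the {"text": ...} payload shown above. */
    public static String toJsonBody(String text) {
        // Minimal escaping only; use a JSON library for arbitrary input.
        String escaped = text
                .replace("\\", "\\\\")
                .replace("\"", "\\\"")
                .replace("\n", "\\n");
        return "{\"text\": \"" + escaped + "\"}";
    }

    /** Builds (but does not send) a POST request for the given endpoint. */
    public static HttpRequest buildRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                // Assumption: the endpoint accepts a JSON content type.
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJsonBody(text)))
                .build();
    }

    public static void main(String[] args) {
        // Placeholder endpoint: substitute the POST URL printed by your deployment.
        HttpRequest request = buildRequest(
                "https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities",
                "Stanford University is located in Silicon Valley");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.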

What?

Label interpretation logic

The "business logic" lives in the EntityExtractor class, which processes text as follows:

  1. Finds labels associated with each word in a string using the CoreNLP library
  2. Filters the labels to leave only those corresponding to named entities
  3. Extracts the names, types and number of times each entity occurs in the text from the remaining labels
  4. Groups the entity names and counts by their types
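
Steps 2 to 4 can be illustrated without CoreNLP by starting from words that already carry labels. The sketch below is not the actual EntityExtractor code: the class and method names are hypothetical, it assumes non-entity words carry the background label "O" (the Stanford NER convention), and it assumes consecutive words with the same label form one entity name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityGroupingSketch {

    // Assumption: label given to words that are not part of any entity.
    static final String BACKGROUND = "O";

    /**
     * Takes (word, label) pairs and returns type -> (entity name -> count).
     * Note: two distinct, directly adjacent entities of the same type
     * would be merged into one name by this simple approach.
     */
    public static Map<String, Map<String, Integer>> groupEntities(
            List<Map.Entry<String, String>> labeledWords) {
        Map<String, Map<String, Integer>> grouped = new LinkedHashMap<>();
        StringBuilder name = new StringBuilder();
        String currentLabel = BACKGROUND;
        for (Map.Entry<String, String> entry : labeledWords) {
            String label = entry.getValue();
            if (!label.equals(currentLabel)) {
                // The label changed, so the entity collected so far is complete.
                record(grouped, currentLabel, name);
                currentLabel = label;
            }
            if (!BACKGROUND.equals(label)) {
                if (name.length() > 0) {
                    name.append(' ');
                }
                name.append(entry.getKey());
            }
        }
        record(grouped, currentLabel, name); // entity ending at the last word
        return grouped;
    }

    /** Counts the collected name under its type, then resets the buffer. */
    private static void record(Map<String, Map<String, Integer>> grouped,
                               String label, StringBuilder name) {
        if (!BACKGROUND.equals(label) && name.length() > 0) {
            grouped.computeIfAbsent(label, k -> new LinkedHashMap<>())
                   .merge(name.toString(), 1, Integer::sum);
        }
        name.setLength(0);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> labeled = List.of(
                Map.entry("Stanford", "ORGANIZATION"),
                Map.entry("University", "ORGANIZATION"),
                Map.entry("is", "O"),
                Map.entry("in", "O"),
                Map.entry("Silicon", "LOCATION"),
                Map.entry("Valley", "LOCATION"));
        System.out.println(groupEntities(labeled));
    }
}
```

The nested map holds the same information as the "name" and "count" fields in the response shown earlier, just keyed by name instead of listed as objects.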

Configuration

The pom.xml and serverless.yml files contain most of the important settings in this project.

  • Select the models you wish to use in the pom.xml <properties> and <build> sections:
<project>
  <!--...-->
  <properties>
    <!--...-->
    <ner.model1>english.all.3class.distsim</ner.model1>
    <ner.model2>english.conll.4class.distsim</ner.model2>
    <ner.model3>english.muc.7class.distsim</ner.model3>
    <!--...-->
  </properties>
  <!--...-->
  <build>
    <plugins>
      <!--...-->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <!--...-->
        <configuration>
          <!--...-->
          <filters>
            <filter>
              <!-- This minimises the output jar file size to remain within the [Lambda limits](https://docs.aws.amazon.com/lambda/latest/dg/limits.html) by only including your selected models -->
              <includes>
                <include>${ner.prefix}${ner.model1}.*</include>
                <include>${ner.prefix}${ner.model2}.*</include>
                <include>${ner.prefix}${ner.model3}.*</include>
              </includes>
            </filter>
          </filters>
        </configuration>
        <!--...-->
      </plugin>
    <!--...-->
    </plugins>
  </build>
  <!--...-->
</project>

  • Update the CoreNLP library version in the pom.xml <properties> section:
<properties>
    <nlp.version>3.9.1</nlp.version>
    <!--...-->
</properties>
