This project enables you to deploy the Stanford Named Entity Recognizer (NER) to a "serverless" environment based on AWS Lambda and API Gateway.
The general advantages of serverless computing are lower cost, scalability and productivity. For this project, these translate to:
- The ability to analyse text in virtually any environment - most notably from the browser
- Processing a large number of texts concurrently - potentially thousands
- Ease and speed of iteration - just deploy with one command after making changes to your models or label interpretation logic
- Make sure you have one of the following toolchains installed on your machine: docker, or Node, a JDK and Maven
- Sign up for an AWS account
- Configure your AWS credentials for deployment with the Serverless framework. If you are working with docker, make sure they are set as the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
- Install dependencies:
  - With docker:
    docker build -t sner .
  Or
  - With Node/JDK/Maven: install the Serverless dependencies by running the following command in the project root directory:
    npm install
Deploy the service:
- With docker:
  docker run --rm -it -e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID -e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY sner npm run deploy -- --stage=dev
Or
- With Node/JDK/Maven:
  npm run deploy -- --stage=dev
After a successful deployment, your POST and GET endpoints are displayed, e.g.:
...
endpoints:
POST - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
GET - https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities
...
To try the GET endpoint, append the query parameter "text" to it, set to the text you wish to analyse, e.g.:
https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities?text=Stanford University is located in Silicon Valley and was founded in November 1885
Response:
{
  "ORGANIZATION": [
    {
      "name": "Stanford University",
      "count": 1
    }
  ],
  "LOCATION": [
    {
      "name": "Silicon Valley",
      "count": 1
    }
  ],
  "DATE": [
    {
      "name": "November 1885",
      "count": 1
    }
  ]
}
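The example URL above contains raw spaces. A browser encodes these for you, but many programmatic HTTP clients do not. As a minimal sketch, assuming Java 11 or later, the query string could be built as follows. The class and method names are made up for illustration and are not part of this project, and the endpoint is the same placeholder as above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative helper only; not part of this project. */
final class EntitiesGetUrlSketch {

    /** Appends the "text" query parameter to the GET endpoint, percent-encoded. */
    static URI buildGetUri(String endpoint, String text) {
        // URLEncoder produces form encoding, where a space becomes "+".
        // Swapping in "%20" gives plain percent-encoding for the query string,
        // e.g. "Silicon Valley" becomes "Silicon%20Valley".
        String encoded = URLEncoder.encode(text, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(endpoint + "?text=" + encoded);
    }

    public static void main(String[] args) {
        // Placeholder endpoint copied from the deployment output above.
        String endpoint = "https://xxxxxx.execute-api.xx-xxxx-x.amazonaws.com/dev/entities";
        String text = "Stanford University is located in Silicon Valley and was founded in November 1885";
        System.out.println(buildGetUri(endpoint, text));
    }
}
```

The resulting URI can be passed to any HTTP client to issue the GET request.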
Example payload for the POST endpoint:
{
  "text": "Stanford University is located in Silicon Valley and was founded in November 1885"
}
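A hedged sketch of sending this payload with the HTTP client built into Java 11 and later is shown below. The class and method names are hypothetical, the hand-rolled JSON escaping is a stand-in for a proper JSON library, and the Content-Type header is an assumption about what the endpoint expects rather than something this project documents.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Illustrative client only; not part of this project. */
final class EntitiesPostSketch {

    /** Builds {"text": "..."} with minimal escaping of quotes, backslashes and control characters. */
    static String toJsonPayload(String text) {
        StringBuilder sb = new StringBuilder("{\"text\": \"");
        for (char c : text.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append("\"}").toString();
    }

    /** Builds the POST request without sending it. */
    static HttpRequest buildPostRequest(String endpoint, String text) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json") // assumed, see note above
                .POST(HttpRequest.BodyPublishers.ofString(toJsonPayload(text)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String text = "Stanford University is located in Silicon Valley and was founded in November 1885";
        System.out.println(toJsonPayload(text));
        // Only call the API when an endpoint URL is passed as the first argument.
        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(buildPostRequest(args[0], text), HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }
}
```

Run it with your deployed POST endpoint as the only argument to send the request; without an argument it just prints the payload.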
The "business logic" lives in the EntityExtractor class and processes text in the following way:
- Finds labels associated with each word in a string using the CoreNLP library
- Filters the labels to leave only those corresponding to named entities
- Extracts the names, types and number of times each entity occurs in the text from the remaining labels
- Groups the entity names and counts by their types
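The steps above can be sketched roughly as follows. This is not the project's EntityExtractor: it assumes the labelling step has already produced one label per word, that "O" marks words outside any entity (the default background label of Stanford NER's classifiers), and that consecutive words sharing a label form one entity. The class name, method name and output shape (a map from type to name to count) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch only; not the project's EntityExtractor. */
final class EntityGroupingSketch {

    /** Label assumed to mark words that are not part of any entity. */
    static final String BACKGROUND = "O";

    /**
     * Takes parallel lists of words and labels and returns
     * type -> (entity name -> count), in first-seen order.
     */
    static Map<String, Map<String, Integer>> group(List<String> words, List<String> labels) {
        Map<String, Map<String, Integer>> byType = new LinkedHashMap<>();
        StringBuilder name = new StringBuilder();
        String currentType = BACKGROUND;

        for (int i = 0; i <= words.size(); i++) {
            // A sentinel background label after the last word flushes the final entity.
            String label = i < words.size() ? labels.get(i) : BACKGROUND;
            if (!label.equals(currentType)) {
                if (!currentType.equals(BACKGROUND)) {
                    // The run of same-labelled words has ended: count the entity under its type.
                    byType.computeIfAbsent(currentType, t -> new LinkedHashMap<>())
                          .merge(name.toString(), 1, Integer::sum);
                }
                name.setLength(0);
                currentType = label;
            }
            // Filter: only words with a non-background label contribute to a name.
            if (i < words.size() && !label.equals(BACKGROUND)) {
                if (name.length() > 0) {
                    name.append(' ');
                }
                name.append(words.get(i));
            }
        }
        return byType;
    }

    public static void main(String[] args) {
        List<String> words = List.of("Stanford", "University", "is", "located", "in",
                "Silicon", "Valley");
        List<String> labels = List.of("ORGANIZATION", "ORGANIZATION", "O", "O", "O",
                "LOCATION", "LOCATION");
        System.out.println(group(words, labels));
        // prints {ORGANIZATION={Stanford University=1}, LOCATION={Silicon Valley=1}}
    }
}
```

One limitation of this heuristic is that two distinct entities of the same type that sit directly next to each other are merged into a single name.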
The pom.xml and serverless.yml files contain most of the important settings in this project.
<project>
  <!--...-->
  <properties>
    <!--...-->
    <ner.model1>english.all.3class.distsim</ner.model1>
    <ner.model2>english.conll.4class.distsim</ner.model2>
    <ner.model3>english.muc.7class.distsim</ner.model3>
    <!--...-->
  </properties>
  <!--...-->
  <build>
    <plugins>
      <!--...-->
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <!--...-->
        <configuration>
          <!--...-->
          <filters>
            <filter>
              <!-- This minimises the output jar file size to remain within the [Lambda limits](https://docs.aws.amazon.com/lambda/latest/dg/limits.html) by only including your selected models -->
              <includes>
                <include>${ner.prefix}${ner.model1}.*</include>
                <include>${ner.prefix}${ner.model2}.*</include>
                <include>${ner.prefix}${ner.model3}.*</include>
              </includes>
            </filter>
          </filters>
        </configuration>
        <!--...-->
      </plugin>
      <!--...-->
    </plugins>
  </build>
  <!--...-->
</project>
The CoreNLP version is set by the nlp.version property in pom.xml:
<properties>
  <nlp.version>3.9.1</nlp.version>
  <!--...-->
</properties>
- Change the AWS Lambda name, memory and region in the serverless.yml file
- Configure your endpoints in the serverless.yml file