Moira Huang, Sarah Lincoln, Thomas Tay, Alice Wu
Technology connects people from across the globe. As the world becomes increasingly globalized, building intuitive, accessible, and useful software is becoming more and more important.
Each member of our team has a different skill set and experience level. In the beginning, we brainstormed different projects that we could work on. We considered web and mobile apps before deciding to collaborate on a web app. Because HTML and Python are more approachable for beginners, we thought a web app would be a project that everyone could work on.
As we were brainstorming, we received mentoring from an IBM representative. She suggested that we look at IBM Code Patterns, which are interactive project templates that users can customize. After considering a few, we decided to work on a web app that generates captions for user-uploaded photos.
We started by working on the app interface, coding in HTML and JavaScript. After modifying the UI, we needed to find a source for the photos. To do this, we searched for APIs provided by Google, Microsoft, Facebook, and IBM. We first tried integrating Facebook login, but had to abandon that idea because of access permission problems. We then decided to use the Google Photos API to access pictures in Google Drive.
The project's server is written in Python, and we added features to it to process the data retrieved by the client.
We decided on an interactive web app that allows users to select photos from their Google Drive that they want to caption.
This program utilizes machine learning to analyze photos and create relevant captions that describe the contents of the photos.
Programs such as ours work toward promoting inclusive online spaces by encouraging people to share their lives, regardless of their English language ability.
In this Code Pattern we will use one of the models from the Model Asset Exchange (MAX), an exchange where developers can find and experiment with open source deep learning models. Specifically, we will be using the Image Caption Generator to create a web application that captions images and allows the user to filter through images based on image content. The web application provides an interactive user interface backed by a lightweight Python server using Tornado. The server takes in images via the UI, sends them to a REST endpoint for the model, and displays the generated captions on the UI. The model's REST endpoint is set up using the Docker image provided on MAX. The Web UI displays the generated captions for each image as well as an interactive word cloud to filter images based on their caption.
When the reader has completed this Code Pattern, they will understand how to:
- Build a Docker image of the Image Caption Generator MAX Model
- Deploy a deep learning model with a REST endpoint
- Generate captions for an image using the MAX Model's REST API
- Run a web application that uses the model's REST API
- Server sends default images to Model API and receives caption data.
- User interacts with Web UI containing default content and uploads image(s).
- Web UI requests caption data for image(s) from Server and updates content when data is returned.
- Server sends image(s) to Model API and receives caption data to return to Web UI (see the sketch below).
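The last two steps boil down to an HTTP round trip: the Tornado server forwards the uploaded image bytes to the model's REST endpoint and relays the returned captions back to the Web UI. The handler below is only an illustrative sketch of that pattern, not code from this repository; the route, port, and endpoint URL are placeholder assumptions.

import requests
from tornado import ioloop, web

ML_ENDPOINT = "http://localhost:5000/model/predict"  # assumed model API location

class UploadHandler(web.RequestHandler):
    def post(self):
        # Forward the uploaded file to the model API (a blocking call is fine for a sketch)
        # and relay the caption JSON back to the Web UI.
        upload = self.request.files["image"][0]
        response = requests.post(
            ML_ENDPOINT, files={"image": (upload.filename, upload.body)})
        self.write(response.json())  # {"status": ..., "predictions": [...]}

if __name__ == "__main__":
    app = web.Application([(r"/upload", UploadHandler)])  # hypothetical route
    app.listen(8088)
    ioloop.IOLoop.current().start()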
- IBM Model Asset Exchange: A place for developers to find and use free and open source deep learning models.
- Docker: Docker is a tool designed to make it easier to create, deploy, and run applications by using containers.
- Python: Python is a programming language that lets you work more quickly and integrate your systems more effectively.
- jQuery: jQuery is a cross-platform JavaScript library designed to simplify the client-side scripting of HTML.
- Bootstrap 3: Bootstrap is a free and open-source front-end library for designing websites and web applications.
- Pexels: Pexels provides high quality and completely free stock photos licensed under the Creative Commons Zero (CC0) license.
A talk about MAX at Spark+AI Summit 2018 includes a short demo of the web app.
Ways to run the code pattern:
Follow the Deploy the Model Doc to deploy the Image Caption Generator model to IBM Cloud. If you already have a model API endpoint available, you can skip this process.
Note: Deploying the model can take time; to get going faster, you can try running locally.
- Press the Deploy to IBM Cloud button. If you do not have an IBM Cloud account yet, you will need to create one.
- Click Delivery Pipeline and click the Create + button in the form to generate an IBM Cloud API Key for the web app.
- Once the API key is generated, the Region, Organization, and Space form sections will populate. Fill in the Image Caption Generator Model API Endpoint section with the endpoint deployed above, then click on Create. The format for this entry should be http://170.0.0.1:5000
- In Toolchains, click on Delivery Pipeline to watch while the app is deployed. Once deployed, the app can be viewed by clicking View app.
You can also deploy the model and web app on Kubernetes using the latest docker images on Docker Hub.
On your Kubernetes cluster, run the following commands:
kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Image-Caption-Generator/master/max-image-caption-generator.yaml
kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Image-Caption-Generator-Web-App/master/max-image-caption-generator-web-app.yaml
The web app will be available at port 8088 of your cluster.
The model will only be available internally, but can be accessed externally through the NodePort.
Note: For deploying the web app on IBM Cloud it is recommended to follow the Deploy to IBM Cloud instructions above rather than deploying with IBM Cloud Kubernetes Service.
NOTE: These steps are only needed when running locally instead of using the Deploy to IBM Cloud button.
- Check out the code
- Installing dependencies
- Running the server
- Configuring ports (Optional)
- Instructions for Docker (Optional)
NOTE: The set of instructions in this section is a modified version of the one found on the Image Caption Generator Project Page.
To run the docker image, which automatically starts the model serving API, run:
docker run -it -p 5000:5000 codait/max-image-caption-generator
This will pull a pre-built image from Docker Hub (or use an existing image if already cached locally) and run it. If you'd rather build the model locally you can follow the steps in the model README.
Note that currently this docker image is CPU only (we will add support for GPU images later).
The API server automatically generates an interactive Swagger documentation page.
Go to http://localhost:5000 to load it. From there you can explore the API and also create test requests.
Use the model/predict endpoint to load a test file and get captions for the image from the API.
The model assets folder contains a few images you can use to test out the API, or you can use your own.
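The same request can be scripted in Python with the requests library; the snippet below is a minimal sketch, with the image path standing in for one of the assets images or your own file.

import requests

# Send an image to the model/predict endpoint and print the returned captions.
with open("path/to/image.jpg", "rb") as image_file:
    response = requests.post("http://localhost:5000/model/predict",
                             files={"image": image_file})

for prediction in response.json()["predictions"]:
    print(prediction["caption"], prediction["probability"])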
You can also test it on the command line, for example:
curl -F "image=@path/to/image.jpg" -X POST http://localhost:5000/model/predict
{
  "status": "ok",
  "predictions": [
    {
      "index": "0",
      "caption": "a man riding a wave on top of a surfboard .",
      "probability": 0.038827644239537
    },
    {
      "index": "1",
      "caption": "a person riding a surf board on a wave",
      "probability": 0.017933410519265
    },
    {
      "index": "2",
      "caption": "a man riding a wave on a surfboard in the ocean .",
      "probability": 0.0056628732021868
    }
  ]
}

Clone the Image Caption Generator Web App repository locally by running the following command:
git clone https://github.com/IBM/MAX-Image-Caption-Generator-Web-App.git
Note: You may need to cd .. out of the MAX-Image-Caption-Generator directory first.
Then change directory into the local repository:
cd MAX-Image-Caption-Generator-Web-App
Before running this web app you must install its dependencies:
pip install -r requirements.txt
You then start the web app by running:
python app.py
Once it has finished processing the default images (< 1 minute), you can access the web app at:
http://localhost:8088
The Image Caption Generator endpoint must be available at http://localhost:5000 for the web app to successfully start.
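A quick way to confirm the endpoint is reachable before starting the web app is to request the Swagger page at the root URL; the snippet below is only a convenience sketch, not part of the web app.

import requests

try:
    # The model API serves its Swagger documentation page at the root URL.
    requests.get("http://localhost:5000", timeout=5).raise_for_status()
    print("Model endpoint is reachable")
except requests.RequestException as error:
    print("Model endpoint is not reachable:", error)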
If you want to use a different port or are running the ML endpoint at a different location you can change them with command-line options:
python app.py --port=[new port] --ml-endpoint=[endpoint url including protocol and port]
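app.py's actual flag handling is not reproduced here, but Tornado's built-in option parser supports this pattern; the snippet below is a generic sketch of how such flags can be defined, with option names and defaults mirroring the values used in this README.

from tornado.options import define, options, parse_command_line

define("port", default=8088, type=int, help="port the web app listens on")
define("ml_endpoint", default="http://localhost:5000",
       help="model API endpoint, including protocol and port")

parse_command_line()  # accepts e.g. --port=9000 --ml-endpoint=http://host:5000
print(options.port, options.ml_endpoint)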
To run the web app with Docker, the containers running the web server and the REST endpoint need to share the same network stack. This is done in the following steps:
Modify the command that runs the Image Caption Generator REST endpoint to map an additional port in the container to a port on the host machine. In the example below it is mapped to port 8088 on the host, but other ports can also be used.
docker run -it -p 5000:5000 -p 8088:8088 --name max-image-caption-generator codait/max-image-caption-generator
Build the web app image by running:
docker build -t max-image-caption-generator-web-app .
Run the web app container using:
docker run --net='container:max-image-caption-generator' -it max-image-caption-generator-web-app
You can also deploy the web app with the latest docker image available on DockerHub by running:
docker run --net='container:max-image-caption-generator' -it codait/max-image-caption-generator-web-app
This will use the model docker container run above and can be run without cloning the web app repo locally.
A long-running instance of the web app can accumulate a large number of user uploaded images.
When running the web app at http://localhost:8088, an admin page is available at http://localhost:8088/cleanup that allows the user to delete all user uploaded files from the server.
Note: This deletes all user uploaded images.
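Such a cleanup route could be implemented as a small handler that empties the upload directory; the sketch below is illustrative only, and the directory path is an assumption rather than the web app's actual layout.

import os
from tornado import web

UPLOAD_DIR = "static/img/user"  # hypothetical upload directory

class CleanupHandler(web.RequestHandler):
    def get(self):
        # Delete every user uploaded file from the server.
        for name in os.listdir(UPLOAD_DIR):
            os.remove(os.path.join(UPLOAD_DIR, name))
        self.write("All user uploaded images have been deleted")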
- Model Asset eXchange (MAX)
- Center for Open-Source Data & AI Technologies (CODAIT)
- MAX Announcement Blog
- D3.js: D3.js is a JavaScript library for manipulating documents based on data.
- d3-cloud: A Wordle-inspired word cloud layout written in JavaScript.
- Featherlight: Featherlight is a very lightweight jQuery lightbox plugin.
- Glyphicons: GLYPHICONS is a library of precisely prepared monochromatic icons and symbols, created with an emphasis on simplicity and easy orientation.
- Image Picker: Image Picker is a simple jQuery plugin that transforms a select element into a more user-friendly graphical interface.
- Cookie Consent: Cookie Consent is a JavaScript plugin for alerting users about the use of cookies on a website.
- Artificial Intelligence Code Patterns: Enjoyed this Code Pattern? Check out our other Artificial Intelligence Code Patterns
- AI and Data Code Pattern Playlist: Bookmark our playlist with all of our Code Pattern videos
- Watson Studio: Master the art of data science with IBM's Watson Studio
- Deep Learning with Watson Studio: Design and deploy deep learning models using neural networks, and easily scale to hundreds of training runs. Learn more at Deep Learning with Watson Studio.