A demo Flask app where you can take a picture in a mobile browser and send it to Google's ML Vision API for label and text detection.

ml-camera

Contents

What is ml-camera?
How does this app work?
Requirements

What is ml-camera?

This is a demo Flask app where you can take a picture in a mobile browser and send it to Google's ML Vision API for label detection or text extraction.


How does this app work?

At a high level, this app does the following:

  1. Asks the user for access to their phone's camera (front or back).
  2. Once the user grants access, starts streaming video from the camera.
  3. Lets the user choose, by clicking the appropriate link, whether to label what the image shows or extract text from it.
  4. Sends the captured image to the server, along with the image service to use (label detection or text extraction).
  5. Forwards the image to the Vision API, which returns either a label or the extracted text; the server then sends this result back to the client.
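
Steps 4 and 5 can be sketched roughly as follows. This is a minimal illustration, not the code in app.py or vision.py: it assumes the browser sends the frame as a base64 data URL together with a service name, and the names `decode_data_url`, `handle_image`, `SERVICES`, and the two stub functions are hypothetical. In the real app, the stubs would be replaced by calls to the Vision API.

```python
import base64


def decode_data_url(data_url: str) -> bytes:
    """Strip the 'data:<mime>;base64,' prefix and return the raw image bytes."""
    header, _, encoded = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    return base64.b64decode(encoded)


# Hypothetical stand-ins for the two Vision API services.
def label_stub(image_bytes: bytes) -> str:
    return f"label stub: {len(image_bytes)} bytes"


def text_stub(image_bytes: bytes) -> str:
    return f"text stub: {len(image_bytes)} bytes"


# Map the service name chosen on the client to the function that handles it.
SERVICES = {"label": label_stub, "text": text_stub}


def handle_image(payload: dict, services: dict = SERVICES) -> dict:
    """Decode the image and dispatch it to the requested service."""
    service = payload.get("service")
    if service not in services:
        return {"error": f"unknown service: {service}"}
    image_bytes = decode_data_url(payload["image"])
    return {"service": service, "result": services[service](image_bytes)}
```

Keeping the dispatch in a dictionary means a third service could be added without touching the handler itself.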

What does this app look like?

Example of the Google ML Vision API successfully labeling a laptop. In addition to the label, a confidence score is also returned. In the case below, Google is 95% confident that the image is a laptop.
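
As a rough sketch of how a label and its confidence score might be turned into display text, assume each label arrives as a dictionary with a `description` and a `score` between 0 and 1. That shape, and the helper names `best_label` and `format_label`, are assumptions for illustration and are not taken from the app's code.

```python
def best_label(labels: list) -> dict:
    """Return the label with the highest confidence score."""
    return max(labels, key=lambda label: label["score"])


def format_label(label: dict) -> str:
    """Render a label as 'Description (NN%)'."""
    return f"{label['description']} ({label['score']:.0%})"


# Hypothetical labels for a single image.
labels = [
    {"description": "Laptop", "score": 0.95},
    {"description": "Electronic device", "score": 0.90},
]
summary = format_label(best_label(labels))
```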



Example of the Google ML Vision API extracting text off of a keyboard.

Requirements

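This app uses the following Python modules. The third-party ones (numpy, google.cloud.vision, flask, flask_socketio, and requests) need to be installed; io and datetime are part of the Python standard library.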

  • numpy
  • google.cloud.vision
  • io
  • datetime
  • flask
  • flask_socketio
  • requests

On the client side:

  • jQuery

Main files to review how it works

The main functionality of the app is contained in the following files:

  1. app.py - the main script.
  2. vision.py - a helper script that sends the image to either the label detection or the text extraction service.
  3. /static/js/camera.js - the main JavaScript file, which sends the image data to the Python server and renders the results from the Vision API.

Useful Reference Links

Here are some links that I found very useful with code examples.
