# Project Design Writeup and Approval Template

Follow this as a guide to completing the project design writeup. The questions for each section are merely there to
suggest what the baseline should cover; be sure to use detail as it will make the project much easier to approach as
the class moves on.

## Image/Photography Classification Model

### Problem:
Take an assorted, unlabelled, disorganized collection of digital photographs and programmatically sort them into the following categories based on subject matter:

    • Humans
    • Animals
    • Cityscapes
    • Nature scenes
    
Based on the nature of the problem, I think a logistic regression / classification model would be the best method to tackle it.

I'm choosing a logistic regression (mutually exclusive) model specifically in this case, despite being aware that the proposed categories can easily overlap, because I intend to simply take the category with the largest predicted percentage and assign the image to it, both for the sake of simplicity and because of my current programming level.
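
A minimal sketch of the "take the largest predicted percentage" rule, assuming the model produces one raw score per category. The `softmax` and `top_category` helpers and the example scores are hypothetical illustrations, not part of any library or of the final model.

```python
import math

CATEGORIES = ["Humans", "Animals", "Cityscapes", "Nature scenes"]

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)
    # Subtract the max before exponentiating for numerical stability.
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_category(scores):
    """Return (category, probability) for the largest predicted probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CATEGORIES[best], probs[best]

# Made-up scores for a single image, in the same order as CATEGORIES.
category, probability = top_category([0.1, 2.0, 0.3, 0.5])
print(category, round(probability, 3))
```

Because the probabilities sum to 1, an image that genuinely belongs to two categories splits its probability between them, which is the overlap trade-off described above.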

### Datasets:
The initial training and test set will come from http://image-net.org/, selectively searching and scraping across various synonyms for the categories:
    
    • humans, people, person, man, woman, child, crowd, etc
    • animals, dogs, cats, birds, etc
    • cityscapes, cities, buildings, construction, bridges, streets, etc
    • nature, trees, forest, park, grass, autumn, etc

The collection of images to run the completed model against will consist of an assortment of personal photographs I have taken over the years.

### Domain knowledge:
Having worked as an image curator, specifically finding/locating/scraping images and videos for the sake of training machine learning models, I've built up a (very general) intuition on how robust a dataset needs to be for these sorts of things. More importantly, I know how to poke around and find gaps in coverage, plus where to go to find more data to fill those gaps.

Ideally, this knowledge would help me quickly source appropriate data, letting me focus on the actual model building process.

Other research efforts:
    • Step-by-step guide for a model https://elitedatascience.com/keras-tutorial-deep-learning-in-python
    • Coworkers' brains

### Project Concerns:
Steps/questions:

    • Need to set up Keras
    • Need to set up PIL

Assumptions:

    • Pictures not in the four categories receive low predictions for every category and can be dealt with using thresholding.
    • The personal image collection mostly consists of images in the four categories, and they are real-life photographs (not graphics).
    
Risks/Costs:

    • Costs: Photos would either not be categorized or be categorized incorrectly. Small loss.
    • Benefit: Quickly sort photos into smaller, slightly more useful categories.
    • Training/testing data will most likely have incorrect labels or consist of overlapping categories.

### Outcomes:
The output workflow would ideally look like this:

    1. Feed images through trained model.
    2. Get predictions and select the largest one, then compare it against a threshold of n%.
    3a. If the highest prediction doesn't reach the n% threshold:
        • Move image into an 'Uncategorized' category. (mkdir if needed)
    3b. If the highest prediction (for category x) reaches the n% threshold:
        • Move image into x category. (mkdir if needed)
        
As the script is running, the images in the target folder should slowly move into one of five new folders:

    • Humans
    • Animals
    • Cityscapes
    • Nature scenes
    • Uncategorized
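
The workflow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the final script: `predict` is a hypothetical stand-in for the trained model (any callable that takes an image path and returns four probabilities in category order), the 0.6 default is an arbitrary placeholder for n%, and a prediction exactly equal to the threshold is treated as passing.

```python
import shutil
from pathlib import Path

CATEGORIES = ["Humans", "Animals", "Cityscapes", "Nature scenes"]
UNCATEGORIZED = "Uncategorized"

def choose_folder(probs, threshold):
    """Pick the destination folder name from one image's predictions."""
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] >= threshold:
        return CATEGORIES[best]
    return UNCATEGORIZED

def sort_images(target_dir, predict, threshold=0.6):
    """Move each file in target_dir into a category subfolder.

    `predict` is a hypothetical callable: image path -> list of four
    probabilities. The threshold default is a placeholder, not a tuned value.
    """
    target = Path(target_dir)
    moved = {}
    # sorted() snapshots the listing before any new folders are created.
    for image_path in sorted(target.iterdir()):
        if not image_path.is_file():
            continue
        folder = target / choose_folder(predict(image_path), threshold)
        folder.mkdir(exist_ok=True)  # "mkdir if needed"
        shutil.move(str(image_path), str(folder / image_path.name))
        moved[image_path.name] = folder.name
    return moved
```

Keeping `choose_folder` separate from the file moves means the threshold logic can be checked on plain lists of numbers before any photos are touched.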
    
### Success metric:
If the project correctly categorizes a majority of the images, I will consider this a success. If only some of the images are correctly categorized, but most of them are marked as 'Uncategorized', then I would consider it a minor success that could be further iterated on.
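
One way to check this, as a rough sketch: hand-label a sample of the sorted photos and compare. The `evaluate` and `verdict` helpers are hypothetical, "majority" and "most" are both assumed to mean more than 50%, and the "needs more work" label is my own placeholder for everything else.

```python
def evaluate(predicted, actual):
    """Summarize results for a hand-labelled sample.

    Both arguments map image name -> folder name.
    """
    total = len(actual)
    correct = sum(1 for name, label in actual.items()
                  if predicted.get(name) == label)
    uncategorized = sum(1 for name in actual
                        if predicted.get(name) == "Uncategorized")
    return {"correct": correct / total, "uncategorized": uncategorized / total}

def verdict(summary):
    """Apply the success criteria, assuming 'majority' means over 50%."""
    if summary["correct"] > 0.5:
        return "success"
    if summary["correct"] > 0 and summary["uncategorized"] > 0.5:
        return "minor success"
    return "needs more work"
```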

If the project fails, then hopefully it's merely a matter of needing more training data.