# Inspect Photos

This notebook gives a quick overview of the key attributes of a single photo in the Unsplash dataset. It aggregates selected features from all five tables in the dataset (`photos`, `keywords`, `collections`, `conversions` and `colors`).

## Load all data

In [1]:
import numpy as np
import pandas as pd
import glob

# This code is adapted from the examples provided with the dataset
path = './unsplash-dataset/lite/'
documents = ['photos', 'keywords', 'collections', 'conversions', 'colors']
datasets = {}

for doc in documents:
    files = glob.glob(path + doc + ".tsv*")

    subsets = []
    for filename in files:
        df = pd.read_csv(filename, sep='\t', header=0)
        subsets.append(df)

    datasets[doc] = pd.concat(subsets, axis=0, ignore_index=True)

photos = datasets['photos']
keywords = datasets['keywords']
collections = datasets['collections']
conversions = datasets['conversions']
colors = datasets['colors']
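
The loop above assumes that each table may be split across several files matching `<name>.tsv*`, which are read one by one and concatenated. As a minimal sketch of that pattern, the snippet below wraps it in a helper and runs it on two tiny temporary files. The helper name `load_table`, the toy file names and the toy rows are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
import glob
import os
import tempfile

import pandas as pd


def load_table(path, name):
    """Read every file matching <path>/<name>.tsv* and stack them into one DataFrame."""
    files = sorted(glob.glob(os.path.join(path, name + ".tsv*")))
    if not files:
        # Fail early with a readable message instead of letting pd.concat raise
        raise FileNotFoundError(f"No files found for table '{name}' in {path}")
    subsets = [pd.read_csv(f, sep="\t", header=0) for f in files]
    return pd.concat(subsets, axis=0, ignore_index=True)


# Toy demonstration: one table split across two chunk files (hypothetical names and rows)
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "photos.tsv000"), "w") as fh:
        fh.write("photo_id\tstats_views\na1\t10\n")
    with open(os.path.join(tmp, "photos.tsv001"), "w") as fh:
        fh.write("photo_id\tstats_views\nb2\t20\n")
    demo = load_table(tmp, "photos")

print(demo)
```

Sorting the file list keeps the row order reproducible between runs, since `glob.glob` does not guarantee any particular order.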

## Define which attributes to show

The `show_photo_data` function defines which attributes should be shown and in what format.

In [185]:
# display and HTML are imported from IPython.display; importing them from IPython.core.display is deprecated
from IPython.display import Image, display, HTML
from collections import Counter

# Displays the most important information for a given photo ID
def show_photo_data(photo_id, keyword_confidence=70):
    # Find the relevant photo in the photos table
    photo = photos[photos['photo_id'] == photo_id].iloc[0]

    # Show the photo and its link
    display(Image(url=photo["photo_image_url"], width=300, retina=True, embed=False))
    display(HTML(f'<a href="{photo["photo_url"]}">{photo["photo_url"]}</a>'))
    print()

    # Show the descriptions from the user and the AI
    print(f'User: {photo["photo_description"]}')
    print(f'AI: {photo["ai_description"]}')
    print()

    # Show the downloads and views stats and their ratio
    print(f'Downloads: {photo["stats_downloads"]}')
    print(f'Views: {photo["stats_views"]}')
    print(f'Ratio: {(100 * photo["stats_downloads"] / photo["stats_views"]):.2f}%')
    print()

    # Display the keywords associated with the photo (both from the user and the AI).
    # AI keywords are shown only if at least one AI service exceeds the confidence threshold.
    photo_keywords = keywords[keywords['photo_id'] == photo_id]
    keywords_user = photo_keywords[photo_keywords["suggested_by_user"] == "t"]
    keywords_ai = photo_keywords[(photo_keywords["suggested_by_user"] == "f") & 
                                 ((photo_keywords["ai_service_1_confidence"] > keyword_confidence) | 
                                  (photo_keywords["ai_service_2_confidence"] > keyword_confidence))]
    print(f'Keywords User: {", ".join(list(keywords_user["keyword"]))}')
    print(f'Keywords AI: {", ".join(list(keywords_ai["keyword"]))}')
    print()

    # Display the search terms associated with each photo (conversions) and their count
    photo_conversions = Counter(list(conversions[conversions['photo_id'] == photo_id]["keyword"]))
    print('Conversions:')
    # most_common() yields (search term, count) pairs sorted by descending count
    for item, count in photo_conversions.most_common():
        print(f'{count:4d}: {item}')
    print()

    # Display the collections the photo has been added to
    photo_collections = collections[collections['photo_id'] == photo_id]
    print(f'Collections: {", ".join(list(photo_collections["collection_title"]))}')
    print()
    
    # Display the colors associated with the photo
    photo_colors = colors[colors['photo_id'] == photo_id]
    print(f'Colors: {", ".join(list(photo_colors["keyword"]))}')

## Inspect a photo

In [190]:
show_photo_data('JOFKIzygu70')


User: nan
AI: tipi tent on snowfield near trees during night

Downloads: 10122
Views: 111369
Ratio: 9.09%

Keywords User: 
Keywords AI: night, outdoors, cloud, nature, tent, aurora, atmosphere, sky, tree, natural landscape

Conversions:
  52: aurora
   8: night tent
   5: aurora boreal
   4: tromsø
   3: aurora borealis
   3: tent at night
   3: tent night
   2: norway night
   2: norway camping
   2: norway aurora
   2: norway
   1: aurora burealis
   1: camping night
   1: putting up a tent
   1: camping
   1: camping tent night
   1: aurora camping
   1: the aurora
   1: aurora borealis bright
   1: aurora blue
   1: monschau
   1: aurora nordic
   1: norway tent
   1: norway mountant
   1: auroras boreales
   1: blue aurora
   1: glamping
   1: aurora norway
   1: camping aurora
   1: boreal aurora

Collections: Night Sky, AWESOME PLACES, Snow, aurorean sky, 클로버게임 < 010-6847-8990 > 카톡 : 2400hun / 배터리게임, My collection, Northern Lights, The Night Sky, Cultural, Spacey Wacey, whereve