# Create Instagram post list

This notebook creates a CSV with information on X Flickr images per place in our database. It does so in 3 steps:

1. For each destination, query the top X most interesting images from Flickr.
2. For each author found, query the Flickr people info to learn more about the author.
3. For each place, query its Wikivoyage page URL for a quick link to more info.

All (intermediate) query results are saved so that we don't need to query again.
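Saving intermediate results only pays off if reruns actually read them back. A minimal sketch of that pattern, assuming each step's output is a DataFrame stored as CSV; `cached_csv` is a hypothetical helper, not something the notebook or the `stairway` package defines:

```python
import os

import pandas as pd


def cached_csv(path, compute):
    """Return the DataFrame stored at `path`, computing and saving it first if needed.

    `compute` is a zero-argument callable that performs the (slow) API queries.
    Hypothetical helper for illustration only.
    """
    if os.path.exists(path):
        # Reuse the saved result instead of hitting the API again
        return pd.read_csv(path)
    df = compute()
    df.to_csv(path, index=False)
    return df
```

For example, wrapping the step 1 computation in a `lambda` and passing it together with the output path would skip the Flickr queries on a rerun. Reading back from CSV does not always preserve dtypes, so ID-like columns may need an explicit `dtype=` when re-read.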

In this first run, we query the top 20 images per place. It is better to have the data now; we can always restrict the image generation to the top 5 later.
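Because the full top-20 list is stored, narrowing it down later is just a filter on each image's rank within its place. A minimal sketch on hypothetical toy data, assuming rows are saved in the order the API returned them; the notebook itself uses the `index` column built in step 1 for this, and `cumcount` is only a stand-in:

```python
import pandas as pd

# Hypothetical toy data: images per place, stored in API order (rank 0 first)
images = pd.DataFrame({
    'stairway_id': [1, 1, 1, 2, 2],
    'title': ['a', 'b', 'c', 'd', 'e'],
})

# Rank of each image within its place, following the stored row order
images['rank'] = images.groupby('stairway_id').cumcount()

# Keep only the best TOP_N images per place (2 here, 5 in the notebook)
TOP_N = 2
top_images = images[images['rank'] < TOP_N]
```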

In [None]:
TOP_X_IMAGES = 20

output_dir = '../../data/flickr/'

### Init

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
from dotenv import load_dotenv
load_dotenv()

import os
import json
import requests
import pandas as pd

from stairway.apis.flickr.photos import get_flickr_images, create_image_url, create_attribution_url
from stairway.apis.flickr.people import get_flickr_people_info, parse_flickr_people_info
from stairway.apis.wikivoyage.page import get_wikivoyage_page_info

In [None]:
FLICKR_KEY = os.getenv('FLICKR_KEY')

### Read data

In [None]:
data_dir = '../../data/wikivoyage/enriched/'

file_name = 'wikivoyage_destinations.csv'

In [None]:
df = (
    pd.read_csv(data_dir + file_name)
    .rename(columns={'id': 'stairway_id'})
    .set_index("stairway_id", drop=False)
    [['stairway_id', 'name', 'country', 'nr_tokens', 'wiki_id']]
)
df.shape

## 1. Query the API and explode the DataFrame

In [None]:
def find_flickr_images(df, nr_images=TOP_X_IMAGES):
    """Take a single-row df (one place) and explode it into one row per Flickr image found.

    Returns None when the search yields no images.
    """
    flickr_json = get_flickr_images(df["search_string"].iloc[0],
                                    api_key=FLICKR_KEY,
                                    images_per_page=nr_images)['photo']

    if len(flickr_json) > 0:
        # One row per image, plus the image URL (url_b) and the attribution (photo page) URL
        flickr_df = (
            pd.DataFrame(flickr_json)
            .assign(url_b = lambda df: create_image_url(df))
            .assign(image_url = lambda df: create_attribution_url(df))
            # [['id', 'owner', 'title', 'image_url', 'ownername', 'url_b', 'url_o', 'height_o', 'width_o']]
        )

        # Repeat the place row once per image; 'index' holds the image's rank in the API response
        repeated_df = (
            pd.concat([df]*len(flickr_json), ignore_index=True)
            .reset_index()
            [['stairway_id', 'index', 'name', 'country', 'nr_tokens', 'wiki_id']]
        )

        # Both frames have a 0..n-1 index, so they line up row by row
        df_out = pd.concat([repeated_df, flickr_df], axis=1)
    else:
        df_out = None
    return df_out
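`find_flickr_images` repeats the single place row once per returned image and glues the two frames together side by side. For illustration only, the same "explode" can be written as a cross join (`how='cross'`, available since pandas 1.2). The place and photo records below are hypothetical stand-ins for a groupby group and the Flickr search result:

```python
import pandas as pd

# One place, as a single-row frame (hypothetical values)
place = pd.DataFrame({'stairway_id': [7], 'name': ['Utrecht'], 'country': ['Netherlands']})

# Hypothetical stand-in for the list of photo dicts returned by the Flickr search
photos = pd.DataFrame([
    {'id': '101', 'owner': 'u1', 'title': 'Canal'},
    {'id': '102', 'owner': 'u2', 'title': 'Tower'},
])

# Keep each photo's position in the response as its rank, like the 'index' column above
photos = photos.reset_index()

# Cross join: the place row is paired with every photo row
exploded = place.merge(photos, how='cross')
```

Unlike the positional `pd.concat(..., axis=1)`, the cross join does not rely on both frames sharing a matching 0..n-1 index.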


In [None]:
df_images = (
    df
    .reset_index(drop=True)
    .assign(search_string = lambda df: df['name'] + ' ' + df['country'])
    .groupby('stairway_id')
    .apply(find_flickr_images)
    .reset_index(drop=True)
)
df_images.shape
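The helpers `create_image_url` and `create_attribution_url` come from the `stairway` package and are not shown here. A minimal sketch of what such helpers might compute, assuming the photo records carry Flickr's `id`, `server`, `secret` and `owner` fields: Flickr documents static image URLs of the form `https://live.staticflickr.com/{server}/{id}_{secret}_{size}.jpg` (size suffix `b` is the 1024 px version) and photo pages at `https://www.flickr.com/photos/{user-id}/{photo-id}`. The function names and sample values are hypothetical.

```python
import pandas as pd


def image_url_sketch(df, size='b'):
    """Build static image URLs from the id/server/secret columns (hypothetical helper)."""
    return ('https://live.staticflickr.com/' + df['server'].astype(str) + '/'
            + df['id'].astype(str) + '_' + df['secret'].astype(str) + '_' + size + '.jpg')


def attribution_url_sketch(df):
    """Build the photo page URL, which attribution should link to (hypothetical helper)."""
    return 'https://www.flickr.com/photos/' + df['owner'].astype(str) + '/' + df['id'].astype(str)


# Hypothetical photo record
photos = pd.DataFrame({'id': ['101'], 'server': ['65535'], 'secret': ['abc123'], 'owner': ['00000000@N00']})
photos = photos.assign(url_b=image_url_sketch, image_url=attribution_url_sketch)
```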

In [None]:
# Note: despite the file name, this file holds the top TOP_X_IMAGES images per place
df_images.to_csv(output_dir + 'flickr_top5_images_per_place.csv', index=False)

## 2. Get Flickr people info

This uses the Flickr API method `flickr.people.getInfo`: https://www.flickr.com/services/api/flickr.people.getInfo.html

To avoid querying the same author multiple times, first deduplicate the authors in the image list, then retrieve their info and join it back.
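The dedupe-then-join pattern can be checked on toy data. A minimal sketch in which `fetch_author_info` is a hypothetical stand-in for the Flickr people query that records every call it receives:

```python
import pandas as pd

# Hypothetical image list: author u1 owns three of the four images
images = pd.DataFrame({'image_id': [1, 2, 3, 4], 'owner': ['u1', 'u2', 'u1', 'u1']})

queried = []


def fetch_author_info(owner):
    """Hypothetical stand-in for the Flickr people query; records each call."""
    queried.append(owner)
    return {'owner': owner, 'realname': owner.upper()}


# Query each distinct author once...
people = pd.DataFrame([fetch_author_info(o) for o in images['owner'].drop_duplicates()])

# ...then join the results back onto every image by that author
images_with_people = images.merge(people, on='owner', how='left')
```

Here two API calls serve four images; with many images per prolific author the savings grow accordingly.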

In [None]:
# user = '12962905@N05'  #kevinpoh
# user = '61713368@N07'  #tiket2

# output = get_flickr_people_info(user, api_key=FLICKR_KEY)
# output = parse_flickr_people_info(output)

# output
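`parse_flickr_people_info` comes from the `stairway` package and its code is not shown here. As a hedged illustration: the JSON response of `flickr.people.getInfo` (requested with `format=json&nojsoncallback=1`) puts the author under a `person` key and wraps most text fields as `{"_content": ...}`, so a parser mainly needs to unwrap those. The function name and the abbreviated sample response are hypothetical.

```python
def parse_people_info_sketch(response):
    """Flatten a flickr.people.getInfo JSON response into a flat dict (illustrative only)."""
    person = response.get('person', {})
    flat = {}
    for key, value in person.items():
        if isinstance(value, dict) and set(value) == {'_content'}:
            # Text fields arrive as {"_content": "..."}; unwrap them
            flat[key] = value['_content']
        elif not isinstance(value, dict):
            # Plain scalars (e.g. nsid, path_alias) are kept as-is
            flat[key] = value
        # Other nested structures (e.g. photo statistics) are skipped in this sketch
    return flat


# Hypothetical, abbreviated response
sample = {
    'person': {
        'nsid': '00000000@N00',
        'path_alias': 'someuser',
        'realname': {'_content': 'Jane Doe'},
        'location': {'_content': 'Utrecht'},
        'profileurl': {'_content': 'https://www.flickr.com/people/someuser/'},
        'photos': {'count': {'_content': 100}},
    },
    'stat': 'ok',
}
info = parse_people_info_sketch(sample)
```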

In [None]:
# Deduplicate the authors first, so each one is queried only once
owners = df_images[['owner']].drop_duplicates(ignore_index=True)

df_people = pd.DataFrame([
    parse_flickr_people_info(get_flickr_people_info(author, api_key=FLICKR_KEY))
    for author in owners['owner']
])

# Both frames share a 0..n-1 index in the same order, so the join aligns each author with their info
df_people = owners.join(df_people)

df_people.shape

In [None]:
df_people.to_csv(output_dir + 'flickr_top5_images_per_place_owners.csv', index=False)

Now join the people table back onto the image table.

In [None]:
df_all = (
    df_images
    .merge(df_people, on='owner')
)
df_all.shape
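Because `df_people` holds one row per author, this join should be many-to-one. As a suggestion that goes beyond the original notebook: pandas' `merge` can assert that with `validate='many_to_one'` (raising `MergeError` otherwise), and `indicator=True` reveals images whose author is missing, which the default inner merge above would drop silently. The toy data is hypothetical.

```python
import pandas as pd

# Hypothetical image and people tables
images = pd.DataFrame({'id': ['101', '102', '103'], 'owner': ['u1', 'u2', 'u1']})
people = pd.DataFrame({'owner': ['u1', 'u2'], 'realname': ['A', 'B']})

# Fails loudly if an owner appears more than once in `people`
checked = images.merge(people, on='owner', how='left', validate='many_to_one', indicator=True)

# Rows marked 'left_only' are images whose author info is missing
missing_authors = checked.loc[checked['_merge'] == 'left_only', 'owner'].tolist()
```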

## 3. Add a link to Wikivoyage for ease of use

Look up each place's Wikivoyage page URL using its `wiki_id`.
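`get_wikivoyage_page_info` comes from the `stairway` package and is not shown here. Assuming it wraps the MediaWiki Action API of English Wikivoyage, a page's URL can be requested with `action=query&prop=info&inprop=url&pageids=<wiki_id>`, which returns a `fullurl` for each page. The sketch only builds the request parameters and parses a response, so it runs offline; the function names and the abbreviated sample response are hypothetical.

```python
# Assumed endpoint: English Wikivoyage's MediaWiki Action API
WIKIVOYAGE_API = 'https://en.wikivoyage.org/w/api.php'


def page_url_params(wiki_id):
    """Query parameters asking the MediaWiki API for a page's info, including its URLs."""
    return {
        'action': 'query',
        'prop': 'info',
        'inprop': 'url',
        'pageids': wiki_id,
        'format': 'json',
    }


def parse_fullurl(response, wiki_id):
    """Pick the `fullurl` of one page id out of an action=query JSON response."""
    # With format=json, pages are keyed by the page id as a string
    return response['query']['pages'][str(wiki_id)]['fullurl']


# Hypothetical, abbreviated response
sample = {'query': {'pages': {'33': {'pageid': 33, 'title': 'Example',
                                     'fullurl': 'https://en.wikivoyage.org/wiki/Example'}}}}
url = parse_fullurl(sample, 33)
```

Against the live API, the response would come from something like `requests.get(WIKIVOYAGE_API, params=page_url_params(wiki_id)).json()`.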

In [None]:
# data = get_wikivoyage_page_info(33)

# data['fullurl']

In [None]:
def get_wikivoyage_fullurl(wiki_id):
    """Return the full Wikivoyage page URL for the given page id."""
    data = get_wikivoyage_page_info(wiki_id)
    return data['fullurl']

In [None]:
df_wikiurls = df['wiki_id'].apply(get_wikivoyage_fullurl).to_frame(name='wiki_url').reset_index()

df_wikiurls.shape

In [None]:
df_wikiurls

In [None]:
df_wikiurls.to_csv(output_dir + 'wikivoyage_place_urls.csv', index=False)

Now join the wiki links with the image table.

In [None]:
df_all = (
    df_all
    .merge(df_wikiurls, on='stairway_id')
)
df_all.shape

## Dump the final list to file

In [None]:
df_all.to_csv(output_dir + 'instagram_post_list_full.csv', index=False)

The last step is to select a subset of the columns, put them in the right order for the overview, and keep only the top 5 images per place.

In [None]:
column_order = ['stairway_id', 'index', 'name', 'country', 'nr_tokens', 'title', 'ownername', 
                'realname', 'path_alias', 'location', 'profileurl', 'image_url', 'wiki_url']
column_rename = {'title': 'image_title', 'path_alias': 'owner_tag', 'location': 'owner_location'}

(
    df_all
    .loc[lambda df: df['index'] < 5]
    [column_order]
    .rename(columns=column_rename)
    .to_csv(output_dir + 'instagram_post_list.csv', index=False)
)

Done.