# Create Instagram post list

This notebook creates a CSV with information on the top X Flickr images for each place in our database. It does so using scripts for the following 3 steps:

1. For each destination, query the top X most interesting images from Flickr.
2. For each author found, query the Flickr people info to learn more about the author.
3. For each place, query the Wikivoyage page URL for a quick link to more info.

All (intermediate) query results are saved so that we don't need to query again.

In this first run, we query the top 20 images per place. It is better to have the data up front; the image generation can then be limited to the top 5 via `TOP_X_IMAGES`.
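
The save-and-reuse behaviour described above could look roughly like the sketch below. It is only an illustration: `load_or_query` and its arguments are hypothetical names, not functions from the `stairway` package, and the real scripts may cache their results differently.

```python
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_query(cache_file: Path, query_fn: Callable[[], pd.DataFrame]) -> pd.DataFrame:
    """Return the cached CSV if it exists; otherwise run the query and cache it.

    Hypothetical helper for illustration only.
    """
    if cache_file.exists():
        # A previous run already saved the result, so skip the API call.
        return pd.read_csv(cache_file)
    df = query_fn()
    # Create the target folder if needed, then persist the result for next time.
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(cache_file, index=False)
    return df
```

With such a helper, a second call for the same file reads the CSV instead of invoking `query_fn` again.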

In [None]:
%load_ext autoreload
%autoreload 2

In [None]:
data_dir = '../../data/'

TOP_X_IMAGES = 5

### Init

In [None]:
import pandas as pd

from stairway.apis.flickr.people import get_flickr_people_info, parse_flickr_people_info
from stairway.apis.wikivoyage.page_info import get_wikivoyage_page_info
from stairway.instagram.description import pick_generic_hastags, clean_text, create_draft_description

### Read place data

In [None]:
# places_file = data_dir + 'wikivoyage/enriched/wikivoyage_destinations.csv'

# df = (
#     pd.read_csv(places_file)
#     .rename(columns={'id': 'stairway_id'})
#     .set_index("stairway_id", drop=False)
#     [['stairway_id', 'name', 'country', 'nr_tokens', 'wiki_id']]
# )
# df.shape

## 1. Query the Flickr API and explode the DataFrame

Implementation moved to `scripts/flickr_image_list.py`.

In [None]:
df_images = pd.read_csv(data_dir + 'flickr/flickr_image_list.csv')
df_images.shape

## 2. Get Flickr people info

Use https://www.flickr.com/services/api/flickr.people.getInfo.html

First deduplicate the authors from the image list, then retrieve info and join back to avoid querying a single author multiple times. 
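
A minimal sketch of this deduplicate, query, join-back pattern, using toy data. `fetch_people_info` is a hypothetical stand-in for the real API call, and the owner ids are made up; the actual logic lives in the script mentioned below and may differ.

```python
import pandas as pd


def fetch_people_info(nsid: str) -> dict:
    """Hypothetical stand-in for a single people-info API call."""
    return {"nsid": nsid, "realname": f"name-of-{nsid}"}


# Toy image list in which one owner appears twice (ids are made up).
df_images_demo = pd.DataFrame({
    "image_url": ["a.jpg", "b.jpg", "c.jpg"],
    "owner": ["owner_a", "owner_b", "owner_a"],
})

# 1. Deduplicate, so that every author is queried exactly once.
unique_owners = df_images_demo["owner"].drop_duplicates()

# 2. Retrieve the info per unique author.
df_people_demo = pd.DataFrame([fetch_people_info(nsid) for nsid in unique_owners])

# 3. Join back, so that every image row carries its author's info.
df_joined = df_images_demo.merge(
    df_people_demo, left_on="owner", right_on="nsid", how="left"
)
```

Here three images lead to only two lookups, while the joined table still has one row per image.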

In [None]:
# user = '12962905@N05'  #kevinpoh
# user = '61713368@N07'  #tiket2
# user = '143466180@N07'  # removed user

# output = get_flickr_people_info(user, api_key=FLICKR_KEY)
# output = parse_flickr_people_info(output)

# output

Implementation moved to `scripts/flickr_people_list.py`.

In [None]:
df_people = pd.read_csv(data_dir + "flickr/flickr_people_list.csv").drop_duplicates()
df_people.shape

## 3. Add link to Wikivoyage for ease of use

Use `wiki_id` and query up to 50 page ids per request.

Implementation moved to `scripts/wikivoyage_page_info.py`.
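
Batching the ids in groups of 50 could be sketched as follows. `batched` and `build_query_params` are hypothetical helpers, and the parameters assume a standard MediaWiki `action=query` request with `prop=info` and `inprop=url`, which returns a `fullurl` per page; the actual script may build its requests differently.

```python
from typing import Iterator, List


def batched(ids: List[int], size: int = 50) -> Iterator[List[int]]:
    """Yield consecutive chunks of at most `size` ids."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def build_query_params(page_ids: List[int]) -> dict:
    """Build MediaWiki query parameters for one batch (assumed request shape)."""
    return {
        "action": "query",
        "prop": "info",
        "inprop": "url",  # asks for URL fields such as 'fullurl'
        "pageids": "|".join(str(i) for i in page_ids),  # ids are pipe-separated
        "format": "json",
    }


# 120 made-up ids result in batches of 50, 50 and 20.
wiki_ids = list(range(1, 121))
param_sets = [build_query_params(batch) for batch in batched(wiki_ids)]
```

Each entry in `param_sets` would then be sent as one request, which keeps the number of API calls low.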

In [None]:
# data = get_wikivoyage_page_info([10, 33, 36])

# [v['fullurl'] for k, v in data.items()]

In [None]:
df_wiki_info = pd.read_csv(data_dir + "wikivoyage/clean/wikivoyage_page_info.csv").drop_duplicates()
df_wiki_info.shape

## Join all tables together

Now join the people table with the image table.

In [None]:
df_all = (
    df_images
    .merge(df_people, left_on='owner', right_on="nsid")
)
df_all.shape

Now join the wiki links with the image table.

In [None]:
df_all = (
    df_all
    # there is a conflicting 'title' in the image dataset
    .merge(df_wiki_info.drop(columns=['title']), left_on='wiki_id', right_on='pageid')
)
df_all.shape
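
Both merges above use the pandas default, an inner join, so image rows without a match in the other table are silently dropped (for example, an image whose owner is missing from the people list). If that matters, a left join with `indicator=True` can show which rows would be lost. The sketch below uses toy data and is not part of the original pipeline.

```python
import pandas as pd

# Toy tables: owner "c" has no entry in the people table.
left = pd.DataFrame({"owner": ["a", "b", "c"],
                     "image_url": ["1.jpg", "2.jpg", "3.jpg"]})
right = pd.DataFrame({"nsid": ["a", "b"], "realname": ["Ann", "Bob"]})

# A left join keeps every image row; '_merge' records where each row came from.
checked = left.merge(right, left_on="owner", right_on="nsid",
                     how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "owner"].tolist()

# The default inner join drops the unmatched row.
inner = left.merge(right, left_on="owner", right_on="nsid")
```

Comparing `len(inner)` with `len(left)`, or inspecting `unmatched`, makes the number of dropped rows explicit.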

## 4. Add a draft description to get started

In [None]:
# axis=1 passes one row at a time, so the lambda argument is a row, not a DataFrame
df_all['draft_text'] = df_all.apply(
    lambda row: create_draft_description(row['name'], row['country'], row['path_alias']),
    axis=1,
)
df_all.shape

## Dump the final list to file

The last step is to select a subset of the columns and put them in the right order for the overview.

In [None]:
column_rename = {'index': 'image_nr',
                 'title': 'image_title', 
                 'path_alias': 'owner_tag', 
                 'location': 'owner_location', 
                 'fullurl': 'wiki_url'}
column_order = ['stairway_id', 'index', 'name', 'country', 'article_length', 'title', 'ownername', 
                'realname', 'path_alias', 'location', 'profileurl', 'image_url', 'fullurl', 'draft_text']

df_insta = (
    df_all
    .assign(article_length=lambda df: df['length'].astype(int))
    .loc[lambda df: df['index'] < TOP_X_IMAGES]
    [column_order]
    .rename(columns=column_rename)
)
df_insta.shape

In [None]:
df_insta.to_csv(data_dir + 'instagram/instagram_post_list.csv', index=False)

Next steps:
1. Import the CSV into Google Sheets.
2. Process the images in this list and upload them to Google Drive.

Done.