This notebook is dedicated to scraping Instagram data for a given hashtag. It is loosely based on [this tutorial](https://medium.com/@kseniatikhomirova/scrap-instagram-locations-with-python-d48ba6e56ebc).

I developed it with the hashtag "vanlife", since my wife and I are interested in the geography of this phenomenon.

In [2]:
# import prerequisites

import pandas as pd
import sys

### we make many requests during scraping: some with the requests package, some with urllib
import requests
from urllib.request import urlopen 
from urllib.parse import quote  
from bs4 import BeautifulSoup

# to avoid errors, we sometimes call time.sleep(N) before retrying a request
import time
# the input data typically have a JSON structure
import json
import ast

import datetime as dt
import googlemaps


from concurrent.futures import ThreadPoolExecutor

import sddk
import gspread
from gspread_dataframe import get_as_dataframe, set_with_dataframe

# Googlemaps API & ScienceData

Feel free to skip this section for now and return to it later, once you have your own Google Service Account and know how to load its key into your Python environment.

If you want to use the Google Maps API to retrieve geolocation coordinates for individual posts, you need to supply your code with a Google API key; accessing Google Sheets through gspread additionally requires a Google Service Account key. There are many tutorials on how to obtain these keys and load them into your Python environment. The code below reads my keys from my personal folder on sciencedata.dk, so I have to log in there first.
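If you do not use sciencedata.dk, a common alternative is to keep the API key in an environment variable or a local text file. The sketch below assumes a variable named `GOOGLE_MAPS_API_KEY`; that name and the optional fallback file are hypothetical choices, not something the googlemaps package requires.

```python
import os
from pathlib import Path

def load_api_key(env_var="GOOGLE_MAPS_API_KEY", fallback_file=None):
    """Return an API key from an environment variable, else from a local text file.

    Both the variable name and the file path are hypothetical defaults;
    adapt them to wherever you keep your key.
    """
    key = os.environ.get(env_var)
    if key:
        return key.strip()  # drop stray whitespace/newlines
    if fallback_file is not None and Path(fallback_file).is_file():
        return Path(fallback_file).read_text(encoding="utf-8").strip()
    raise RuntimeError(f"No API key found in ${env_var} or {fallback_file!r}")
```

Usage would then be `gmaps = googlemaps.Client(key=load_api_key())`. Keeping the key out of the notebook itself also means it does not end up in shared copies of the notebook.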

In [3]:
# feel free to skip this if you don't have a sciencedata.dk account
conf = sddk.configure()

sciencedata.dk username (format '123456@au.dk'): 648597@au.dk
sciencedata.dk password: ········
endpoint variable has been configured to: https://sciencedata.dk/files/


In [53]:
# to use the Google Maps API for geocoding individual posts, you need an API key;
# I read mine from a text file in my sciencedata.dk folder
key = sddk.read_file("Google_API_key.txt", "str", conf).strip()
gmaps = googlemaps.Client(key=key)

In [None]:
# further, to access Google Sheets, you need a Google Service Account key (JSON file)
# I have mine located in my personal space on sciencedata.dk, so I read it from there:
from google.oauth2 import service_account

# (1) read the file and parse its content
file_data = conf[0].get(conf[1] + "ServiceAccountsKey.json").json()
# (2) transform the content into a credentials object
credentials = service_account.Credentials.from_service_account_info(file_data)
# (3) specify your usage of the credentials
scoped_credentials = credentials.with_scopes(['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive'])
# (4) use the constrained credentials for authentication of the gspread package
gc = gspread.Client(auth=scoped_credentials)
# (5) establish connection with spreadsheets specified by their url
#PIA_data = gc.open_by_url()

# Simple test

In [15]:
# request the first page of posts for a hashtag (the response also contains the total post count)
hashtag = "datascience"
url = "https://www.instagram.com/explore/tags/" + hashtag + "/?__a=1"
resp = requests.get(url)

In [21]:
# look at response headers
resp.headers

{'Content-Type': 'application/json; charset=utf-8', 'x-robots-tag': 'noindex', 'Vary': 'Accept-Language, Cookie, Accept-Encoding', 'Content-Language': 'en', 'Date': 'Mon, 26 Oct 2020 09:40:38 GMT', 'Content-Encoding': 'gzip', 'Strict-Transport-Security': 'max-age=31536000', 'Cache-Control': 'private, no-cache, no-store, must-revalidate', 'Pragma': 'no-cache', 'Expires': 'Sat, 01 Jan 2000 00:00:00 GMT', 'X-Frame-Options': 'SAMEORIGIN', 'content-security-policy': "report-uri https://www.instagram.com/security/csp_report/; default-src 'self' https://www.instagram.com; img-src data: blob: https://*.fbcdn.net https://*.instagram.com https://*.cdninstagram.com https://*.facebook.com https://*.giphy.com; font-src data: https://*.fbcdn.net https://*.instagram.com https://*.cdninstagram.com; media-src 'self' blob: https://www.instagram.com https://*.cdninstagram.com https://*.fbcdn.net; manifest-src 'self' https://www.instagram.com; script-src 'self' https://instagram.com https://www.instagram.

In [20]:
# look at textual content of the response
resp.text



In [22]:
# parse the json structured textual content into python dictionary object
data = json.loads(resp.text)
data

{'graphql': {'hashtag': {'id': '17843356603020274',
   'name': 'datascience',
   'allow_following': False,
   'is_following': False,
   'is_top_media_only': False,
   'profile_pic_url': 'https://scontent-arn2-1.cdninstagram.com/v/t51.2885-15/e35/c0.0.787.787a/s150x150/122105529_205920140919222_1829786485327439492_n.jpg?_nc_ht=scontent-arn2-1.cdninstagram.com&_nc_cat=106&_nc_ohc=mzcgn7BSuHAAX-onlyM&tp=16&oh=7cb84e3543f62cccb3bb002141b07df9&oe=5FC2152F',
   'edge_hashtag_to_media': {'count': 652007,
    'page_info': {'has_next_page': True,
     'end_cursor': 'QVFBUTJxNlBsbUUyWERDMVRfZnprcHBXZlNwUmJEMTFOczlYUi1RamU5b3ptMjNnU3RGZF9Sd3QwN2hDTDJjbkJneU1Zb3VxNURPUzFCWFlVWmtRMEFUWg=='},
    'edges': [{'node': {'comments_disabled': False,
       '__typename': 'GraphImage',
       'id': '2428377382718918212',
       'edge_media_to_caption': {'edges': [{'node': {'text': 'ETL vs ELT ¿Cuáles son sus diferencias?\n\nExploramos las diferencias y similitudes entre ETL y ELT como métodos para integrar 

In [24]:
data["graphql"]["hashtag"].keys()

dict_keys(['id', 'name', 'allow_following', 'is_following', 'is_top_media_only', 'profile_pic_url', 'edge_hashtag_to_media', 'edge_hashtag_to_top_posts', 'edge_hashtag_to_content_advisory', 'edge_hashtag_to_related_tags', 'edge_hashtag_to_null_state'])

In [30]:
# actual posts
data['graphql']['hashtag']['edge_hashtag_to_media']["edges"]

[{'node': {'comments_disabled': False,
   '__typename': 'GraphImage',
   'id': '2428377382718918212',
   'edge_media_to_caption': {'edges': [{'node': {'text': 'ETL vs ELT ¿Cuáles son sus diferencias?\n\nExploramos las diferencias y similitudes entre ETL y ELT como métodos para integrar y mover grandes cantidades de datos para BIG DATA.\n\naprenderbigdata.com'}}]},
   'shortcode': 'CGzVF0Njc5E',
   'edge_media_to_comment': {'count': 1},
   'taken_at_timestamp': 1603705164,
   'dimensions': {'height': 608, 'width': 1080},
   'display_url': 'https://scontent-arn2-1.cdninstagram.com/v/t51.2885-15/e35/122940099_170492501375211_6528399052552387040_n.jpg?_nc_ht=scontent-arn2-1.cdninstagram.com&_nc_cat=103&_nc_ohc=FriaH3Aa5IEAX-0m-fy&_nc_tp=18&oh=46a2eac4e0aa786a62a0ad69ae51ced7&oe=5FBF1EA6',
   'edge_liked_by': {'count': 2},
   'edge_media_preview_like': {'count': 2},
   'owner': {'id': '7636157711'},
   'thumbnail_src': 'https://scontent-arn2-1.cdninstagram.com/v/t51.2885-15/e35/c236.0.608.6

In [9]:
len(data['graphql']['hashtag']['edge_hashtag_to_media']["edges"])

62
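The parsed response nests everything used later under `data["graphql"]["hashtag"]["edge_hashtag_to_media"]`: the total `count`, the `page_info` with the cursor for the next page, and the list of posts in `edges`. A minimal sketch of a helper (hypothetical name `summarize_hashtag_page`) that pulls out just these fields; the sample dict is a trimmed copy of the output above, with the cursor shortened.

```python
def summarize_hashtag_page(data):
    """Extract total count, pagination info and post shortcodes from a parsed
    '?__a=1' hashtag response (structure as printed above)."""
    media = data["graphql"]["hashtag"]["edge_hashtag_to_media"]
    page_info = media["page_info"]
    return {
        "count": media["count"],
        "has_next_page": page_info["has_next_page"],
        "end_cursor": page_info["end_cursor"],
        "shortcodes": [edge["node"]["shortcode"] for edge in media["edges"]],
    }

# trimmed copy of the response shown above (only the keys the helper reads; cursor shortened)
sample = {"graphql": {"hashtag": {"edge_hashtag_to_media": {
    "count": 652007,
    "page_info": {"has_next_page": True, "end_cursor": "QVFBUTJx..."},
    "edges": [{"node": {"shortcode": "CGzVF0Njc5E"}}],
}}}}

summary = summarize_hashtag_page(sample)
print(summary["count"], summary["shortcodes"])  # 652007 ['CGzVF0Njc5E']
```

Each shortcode can be turned into a post URL of the form `https://www.instagram.com/p/<shortcode>/`, which is what the scraping functions below do.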

# Collecting end_cursors

In [28]:
def request_for_next_page(url, retries=4, pause=1):
    """Return the end_cursor of the hashtag page at `url`, or "not-found" if it cannot be obtained."""
    for attempt in range(retries + 1):
        if attempt > 0:
            time.sleep(pause)  # wait a bit before retrying
        try:
            data = requests.get(url).json()
            page_info = data['graphql']['hashtag']['edge_hashtag_to_media']['page_info']
        except (requests.RequestException, ValueError, KeyError, TypeError):
            if attempt == 0:
                print("problem")
            continue
        # on the last page end_cursor is empty/None; report it as "not-found" so callers stop
        return page_info['end_cursor'] or "not-found"
    return "not-found"

In [29]:
%%time
end_cursors = []

n_of_pages = 100
hashtag = "datascience"
raw_url =  "https://www.instagram.com/explore/tags/" + hashtag + "/?__a=1"

end_cursor = ""

for n_page in range(n_of_pages):
    if len(end_cursors) > 0:
        url = raw_url + "&max_id=" + end_cursors[-1] # use the last end cursor
    else:
        url = raw_url
    actual_end_cursor = request_for_next_page(url)
    if actual_end_cursor !="not-found":
        end_cursors.append(actual_end_cursor)# value for the next page
    else:
        break

CPU times: user 2.49 s, sys: 91.5 ms, total: 2.58 s
Wall time: 2min 54s
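The loop above follows Instagram's cursor-based pagination: each page returns an `end_cursor`, which is passed back as `max_id` to request the next page, until no further cursor comes back. A minimal sketch of the same idea, assuming a hypothetical `fetch_page(url)` callable (for example a wrapper around `requests.get(url).json()`) so the logic can be exercised offline against fake pages. Unlike the notebook, it URL-encodes the cursor and also stops when `has_next_page` is false.

```python
from urllib.parse import urlencode

def collect_end_cursors(fetch_page, hashtag, max_pages=100):
    """Follow the end_cursor -> max_id chain until the last page or max_pages.

    fetch_page(url) is a hypothetical callable that returns the parsed JSON
    dict for a URL, or None on failure.
    """
    base = "https://www.instagram.com/explore/tags/" + hashtag + "/?__a=1"
    cursors = []
    for _ in range(max_pages):
        # the first page has no max_id; later pages pass the previous cursor
        url = base + ("&" + urlencode({"max_id": cursors[-1]}) if cursors else "")
        data = fetch_page(url)
        if data is None:
            break
        page_info = data["graphql"]["hashtag"]["edge_hashtag_to_media"]["page_info"]
        if not page_info.get("end_cursor"):
            break
        cursors.append(page_info["end_cursor"])
        if not page_info.get("has_next_page", True):
            break
    return cursors

# offline usage with three fake pages
BASE = "https://www.instagram.com/explore/tags/vanlife/?__a=1"
FAKE_PAGES = {
    BASE: ("c1", True),
    BASE + "&max_id=c1": ("c2", True),
    BASE + "&max_id=c2": ("c3", False),
}

def fake_fetch(url):
    if url not in FAKE_PAGES:
        return None
    cursor, more = FAKE_PAGES[url]
    page_info = {"end_cursor": cursor, "has_next_page": more}
    return {"graphql": {"hashtag": {"edge_hashtag_to_media": {"page_info": page_info}}}}

print(collect_end_cursors(fake_fetch, "vanlife"))  # ['c1', 'c2', 'c3']
```

Because each cursor depends on the previous page, this step is inherently sequential, which is consistent with the long wall time above; only the per-post requests can be parallelised.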


In [None]:
# when saving, always increase the number in the file name so earlier files are not overwritten
#sddk.write_file("instagram_webscraping/end_cursors_3.json", end_cursors, conf)

# Define crucial functions

In [13]:
# simple test of the Google Maps API
# you need billing set up to use it
# DANGEROUS - might become quite costly!!!

gplace = gmaps.geocode("chotikov")[0]
coordinates = gplace["geometry"]["location"]
g_loc_type = gplace["types"]
coordinates

NameError: name 'gmaps' is not defined
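Since every geocoding request is billed, it helps that many posts share the same `location_slug` (for example `stockholm-sweden` appears in several rows of the tables below). Caching the result per slug avoids paying twice. The sample output further below also shows `lake-district` (country code GB) geocoded to a point in the United States; the googlemaps client's `geocode` accepts a `components` filter such as `{"country": "GB"}` that restricts results to one country. A minimal sketch combining both ideas, with a hypothetical wrapper `make_cached_geocoder`; the geocode function is passed in so the logic can be tested with a fake.

```python
from functools import lru_cache

def make_cached_geocoder(geocode):
    """Wrap a geocode function (e.g. gmaps.geocode) so that each distinct
    (location_slug, country_code) pair is requested only once."""
    @lru_cache(maxsize=None)
    def cached(location_slug, country_code=None):
        # restrict results to the country Instagram reported, if we know it
        components = {"country": country_code} if country_code else None
        results = geocode(location_slug, components=components)
        if not results:
            return None
        best = results[0]
        return {"coordinates": best["geometry"]["location"], "g_loc_type": best["types"]}
    return cached

# usage (requires a configured client):
# geocode_place = make_cached_geocoder(gmaps.geocode)
# geocode_place("lake-district", "GB")
```

Note that `lru_cache` treats `f("x", "GB")` and `f("x", country_code="GB")` as different keys, so call it consistently. The cache lives only as long as the Python session; results you want to keep across runs have to be stored separately.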

In [62]:
def mine_the_post(actual_url):
    """Return location (geocoded if possible) and timestamp of a single post (?__a=1 JSON)."""
    post_data = requests.get(actual_url).json()
    media = post_data.get('graphql', {}).get('shortcode_media') or {}
    post_mined = {}
    try:
        location = media['location']
        post_mined["location_slug"] = location['slug']
        # address_json is itself a JSON-encoded string; json.loads handles true/false/null directly
        address_json = json.loads(location['address_json'])
        post_mined["country_code"] = address_json["country_code"]
        # record the most specific level at which Instagram reports an exact match
        for loc_type in ["city", "region", "country"]:
            if address_json['exact_' + loc_type + "_match"]:
                post_mined["i_loc_type"] = loc_type
                break
        try:  # optional: geocode the location via the Google Maps API (billed per request!)
            gplace = gmaps.geocode(post_mined["location_slug"])[0]
            post_mined["coordinates"] = gplace["geometry"]["location"]
            post_mined["g_loc_type"] = gplace["types"]
        except Exception:
            pass
    except (KeyError, TypeError, ValueError):
        pass  # the post has no (parseable) location
    try:
        timestamp = media['taken_at_timestamp']
        # converted to the local time of the machine running the notebook
        post_mined["timestamp"] = dt.datetime.fromtimestamp(int(timestamp)).strftime('%Y-%m-%d %H:%M:%S')
    except (KeyError, TypeError, ValueError):
        pass
    return post_mined

def deEmojify(inputString):  # from here: https://stackoverflow.com/questions/33404752/removing-emojis-from-a-string-in-python
    # note: this drops *all* non-ASCII characters (accented letters too), not only emoji
    return inputString.encode('ascii', 'ignore').decode('ascii')

def get_post_info(item):
    """Collect basic data on one post (an edge of a hashtag page) and enrich it with mine_the_post."""
    post = {}
    item_node_shortcode = item["node"]['shortcode']
    post_url = "https://www.instagram.com/p/" + item_node_shortcode + "/?__a=1"
    post["end_cursor"] = end_cursor  # global: the cursor of the page currently being processed
    post["url"] = post_url.partition("?__a=1")[0]
    try:
        text = item['node']['edge_media_to_caption']['edges'][0]['node']['text'].replace("\n", " ")
        post["text"] = deEmojify(text)
    except (KeyError, IndexError, TypeError, AttributeError):
        text = ""
    # hashtags are extracted from the original text, before non-ASCII characters are removed
    post["hashtags"] = [word.partition("#")[2] for word in text.split() if word.startswith("#")]
    try:  # produce a list of objects potentially shown in the picture
        caption = item['node']['accessibility_caption'].partition("contain: ")[2].split(", ")
        post["caption"] = caption[:-1] + caption[-1].split(" and ")
    except (KeyError, AttributeError):
        pass
    post["likes"] = item['node']['edge_liked_by']["count"]
    # fetch location and timestamp from the post's own page; retry once after a short pause
    try:
        post.update(mine_the_post(post_url))
    except Exception:
        time.sleep(2)
        try:
            post.update(mine_the_post(post_url))
        except Exception:
            pass
    return post

def get_edges(url_address, retries=1, pause=3):
    """Return the list of posts (edges) on a hashtag page, or "no edges" if all attempts fail."""
    for attempt in range(retries + 1):
        if attempt > 0:
            time.sleep(pause)  # wait before retrying
        try:
            data = requests.get(url_address).json()
            return data['graphql']['hashtag']['edge_hashtag_to_media']['edges']  # list with posts
        except (requests.RequestException, ValueError, KeyError, TypeError):
            pass
    return "no edges"
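The `address_json` field of a post's location is itself a JSON-encoded string. Decoding it with `json.loads` handles `true`, `false` and `null` directly, whereas `ast.literal_eval` needs the `true`/`false` replacements and still fails on `null`. A sketch of the location parsing in isolation, with a hypothetical helper `parse_address`; the sample string is made up and contains only the fields the notebook reads plus one `null`.

```python
import json

def parse_address(address_json):
    """Return country_code and the most specific exact-match level
    ("city", "region" or "country") from Instagram's address_json string."""
    address = json.loads(address_json)  # handles true/false/null natively
    parsed = {"country_code": address.get("country_code")}
    for loc_type in ["city", "region", "country"]:
        if address.get("exact_" + loc_type + "_match"):
            parsed["i_loc_type"] = loc_type
            break
    return parsed

# hypothetical example string (field values made up for illustration)
sample = ('{"street_address": null, "city_name": "Stockholm", "country_code": "SE", '
          '"exact_city_match": true, "exact_region_match": false, "exact_country_match": false}')
print(parse_address(sample))  # {'country_code': 'SE', 'i_loc_type': 'city'}
```

Using `dict.get` means a missing flag simply counts as "no exact match" instead of raising, so a post keeps whatever location fields could be parsed.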

In [63]:
actual_url = "https://www.instagram.com/p/CETwJWGpuv9/?__a=1"
post_data = json.loads(requests.get(actual_url).content)
post_data['graphql']['shortcode_media']

{'__typename': 'GraphImage',
 'id': '2383460376250870781',
 'shortcode': 'CETwJWGpuv9',
 'dimensions': {'height': 937, 'width': 750},
 'gating_info': None,
 'fact_check_overall_rating': None,
 'fact_check_information': None,
 'sensitivity_friction_info': None,
 'media_overlay_info': None,
 'media_preview': 'ACEq0yuRxUMbloy/dev1FVo2kLj5uO45zx74xV8L8jLxyPw5/KsrmlrCCRD13D8v8aVQMZ9Saoz3WVKbkyeBgHsR/OiNpNvJPI49PwobsFrl6iqG9vU0UuYfKE+4N8mSc/w+nsOegNWlICsjnPGDng/5xUGSCAvUck/0/wAaZncxzye9N6ai3JZI7eP+HcPdjx/Ooo5DJGP7ozj6HtS+agAzyrHHbmovLcSY/hOcY4+mfQCnuuwbeYvlP6j86Kt/Zn/vL+Z/woo17C07jxGucjjNSbUxioh3px6/lVkibVBwKaVXvn0601uv4UN0qRhhff8AOikooA//2Q==',
 'display_url': 'https://scontent-arn2-1.cdninstagram.com/v/t51.2885-15/e35/118166396_518776635599619_7755558974607652392_n.jpg?_nc_ht=scontent-arn2-1.cdninstagram.com&_nc_cat=103&_nc_ohc=CrAmHJyYrxoAX_AMbwk&oh=1b02502b105bdcf60e190318d986843c&oe=5F70E2AB',
 'display_resources': [{'src': 'https://scontent-arn2-1.cdninstagram.com/v/t51.2885-15/sh0.08/
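`get_post_info` collects hashtags by splitting the caption on whitespace and keeping words that start with `#`. That misses tags written without spaces between them (`#camper,#vwt`) and keeps trailing punctuation. A regular expression is a common alternative; this is a swapped-in technique, not the notebook's method, and `extract_hashtags` is a hypothetical helper.

```python
import re

# \w is Unicode-aware for str patterns, so tags such as "#diasentremontañas" survive
HASHTAG_RE = re.compile(r"#(\w+)")

def extract_hashtags(text):
    """Return hashtags (without '#') in order of appearance."""
    return HASHTAG_RE.findall(text or "")

print(extract_hashtags("#vanlife #camper,#vwt! Sunset at the lake #camping"))
# ['vanlife', 'camper', 'vwt', 'camping']
```

`\w` also matches digits and underscores, so tags like `#t4` or `#van_life` are kept whole, while an emoji ends the tag.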

# Scrape end_cursors and post data at the same time

In [111]:
sciencedata_folder = "instagram_webscraping" # location on sciencedata where we already have some or plan to have our scraped data

In [112]:
# try to load the latest data to begin with the latest end_cursor we already scraped
hashtag = "vanlife"
# define raw_url here; otherwise it still holds the "datascience" URL from the test above
raw_url = "https://www.instagram.com/explore/tags/" + hashtag + "/?__a=1"

try:
    # if we already have some scraped posts, we continue from the last end_cursor stored in them
    highest_file_number = max([int(fn.rpartition("_")[2].partition(".")[0]) for fn in sddk.list_filenames(sciencedata_folder, "", conf) if "posts_raw" in fn])
    latest_filename = "posts_raw_" + str(highest_file_number) + ".json"
    print("latest file: " + latest_filename)
    last_posts_raw = sddk.read_file(sciencedata_folder + "/" + latest_filename, "df", conf)
    last_end_cursor = last_posts_raw.tail(1)["end_cursor"].values[0]
    last_url = raw_url + "&max_id=" + last_end_cursor
    # request the next end_cursor to start with
    end_cursor = request_for_next_page(last_url)
    n = highest_file_number
except Exception:  # nothing scraped yet: start from scratch
    end_cursor = ""
    n = 0

latest file: posts_raw_2750.json
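The cell above finds the newest checkpoint by parsing the number out of file names like `posts_raw_2750.json`. A slightly stricter sketch of the same step as a pure function (hypothetical name `latest_checkpoint`), which ignores names that do not end in `posts_raw_<number>.json` and can be tested without sciencedata.dk. One caveat applies to both versions: a dated name such as `posts_raw_20200826.json` would always win the maximum.

```python
import re

# matches e.g. "posts_raw_2750.json", also when preceded by a folder name
CHECKPOINT_RE = re.compile(r"posts_raw_(\d+)\.json$")

def latest_checkpoint(filenames):
    """Return (number, filename) for the highest-numbered posts_raw_N.json, or None."""
    numbered = []
    for fn in filenames:
        match = CHECKPOINT_RE.search(fn)
        if match:
            numbered.append((int(match.group(1)), fn))
    return max(numbered) if numbered else None

print(latest_checkpoint(["posts_raw_50.json", "posts_raw_2750.json", "end_cursors_3.json"]))
# (2750, 'posts_raw_2750.json')
```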


In [115]:
last_posts_raw.tail(3)

Unnamed: 0,end_cursor,url,text,hashtags,caption,likes,location_slug,country_code,i_loc_type,coordinates,g_loc_type,timestamp
3495,QVFDRFJZa0x4Z0ttaDc1MThJRXFIMzZHRHROWWJiYVVOME...,https://www.instagram.com/p/CDOdUbAhNVu/,"Slowly the EU started to re-open, and we decid...","[vanlife, vwt4hightop, vwt4camper, husbil, van...","[1 person, tree, car, house, outdoor.]",74,stockholm-sweden,SE,city,"{'lat': 59.3293235, 'lng': 18.0685808}","[locality, political]",2020-07-29 12:25:13
3496,QVFDRFJZa0x4Z0ttaDc1MThJRXFIMzZHRHROWWJiYVVOME...,https://www.instagram.com/p/CC_cSGohiXb/,After buying her in the afternoon we packed so...,"[vanlife, vwt4hightop, vwt4camper, husbil, van...","[sky, twilight, ocean, outdoor, nature, water.]",33,stockholm-sweden,SE,city,"{'lat': 59.3293235, 'lng': 18.0685808}","[locality, political]",2020-07-23 16:27:33
3497,QVFDRFJZa0x4Z0ttaDc1MThJRXFIMzZHRHROWWJiYVVOME...,https://www.instagram.com/p/CC_K3VGB42X/,This is when we first bought Sally the Swaussi...,"[vanlife, vwt4hightop, vwt4camper, husbil, van...","[one or more people, outdoor.]",35,,,,,,2020-07-23 13:55:21


In [113]:
end_cursor

'QVFDQnhfZFh1T2Z3STl4Qm9hajhGRVF3bVNsWXZMUnFUaTB6dy1mcWZud0gxbVVWMk1PRnhJM0R0bV9zVnVUY21TZkl6WEVlNmV5aFlmTHpSQ2hCVEJtTA=='

In [None]:
%%time

n_of_pages = 50000
num_checks = [num for num in range(0, n_of_pages, 50)]

hashtag = "vanlife"
raw_url =  "https://www.instagram.com/explore/tags/" + hashtag + "/?__a=1"

actual_data = []
end_cursors = []

for n_page in range(n + 1, n_of_pages):
    if len(end_cursor) > 0:  # continue from an end_cursor (see the previous cell)
        url = raw_url + "&max_id=" + end_cursor
    else:  # start from scratch
        url = raw_url
    edges = get_edges(url)
    if edges != "no edges":
        with ThreadPoolExecutor(max_workers=75) as pool:
            current_parsed_edges = list(pool.map(get_post_info, edges))
        actual_data.extend(current_parsed_edges)
    n = n + 1
    if n in num_checks:  # every 50 pages, save a checkpoint file and clear the buffer
        print(n)
        actual_data_df = pd.DataFrame(actual_data)
        sddk.write_file(sciencedata_folder + "/posts_raw_" + str(n) + ".json", actual_data_df, conf)
        actual_data = []

    end_cursor = request_for_next_page(url)
    if end_cursor != "not-found":
        end_cursors.append(end_cursor)  # value for the next page
    else:
        break

# save the posts collected since the last checkpoint as well
if actual_data:
    sddk.write_file(sciencedata_folder + "/posts_raw_" + str(n) + ".json", pd.DataFrame(actual_data), conf)
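Two details of the parallel parsing step are worth knowing. `ThreadPoolExecutor.map` yields results in input order, but if one call raises, the exception is re-raised while `list()` collects the results, so it propagates out of the loop and stops the scrape. And `max_workers=75` allows up to 75 concurrent requests to Instagram per page, which may well provoke rate limiting. A minimal sketch of a more forgiving variant, with hypothetical helpers `safe` and `parse_page` and an assumed default of 8 workers:

```python
from concurrent.futures import ThreadPoolExecutor

def safe(fn):
    """Wrap fn so that an exception for one item yields None instead of aborting the batch."""
    def wrapper(item):
        try:
            return fn(item)
        except Exception as exc:  # log and continue with the other posts
            print(f"skipping item: {exc!r}")
            return None
    return wrapper

def parse_page(edges, parse_item, max_workers=8):
    """Parse all edges of one page in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(safe(parse_item), edges)
        return [r for r in results if r is not None]
```

In the loop, `current_parsed_edges = parse_page(edges, get_post_info)` would replace the `with ThreadPoolExecutor(...)` block. Note that a `parse_item` that legitimately returns `None` would be filtered out as well.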

# Start from previously collected end_cursor data

In [1]:
%%time
num_checks = [num for num in range(0, len(end_cursors), 50)]
actual_data = []
n = 1
tag = 'vanlife'  # your tag
for end_cursor in end_cursors:
    url = "https://www.instagram.com/explore/tags/{0}/?__a=1&max_id={1}".format(tag, end_cursor)
    edges = get_edges(url)
    if edges != "no edges":
        with ThreadPoolExecutor(max_workers=75) as pool:
            current_parsed_edges = list(pool.map(get_post_info, edges))
        actual_data.extend(current_parsed_edges)
    n = n + 1
    if n in num_checks:  # every 50 pages, save a checkpoint file and clear the buffer
        print(n)
        print("latest end_cursor: " + end_cursor)
        actual_data_df = pd.DataFrame(actual_data)
        sddk.write_file("instagram_webscraping/posts_raw_" + str(n) + ".json", actual_data_df, conf)
        actual_data = []

### export the remaining data as well (skip if empty, so a checkpoint file is not overwritten)
if actual_data:
    actual_data_df = pd.DataFrame(actual_data)
    sddk.write_file("instagram_webscraping/posts_raw_" + str(n) + ".json", actual_data_df, conf)

NameError: name 'end_cursors' is not defined
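The `NameError` above shows that this cell expects an `end_cursors` list in memory, for example from the "Collecting end_cursors" step. One way to make it resumable is to save that list to a JSON file and reload it before running the cell. A minimal sketch using a local file and the standard `json` module instead of sciencedata.dk; the helper names and the default path `end_cursors.json` are hypothetical.

```python
import json
from pathlib import Path

def save_end_cursors(end_cursors, path="end_cursors.json"):
    """Write the list of cursors to a local JSON file (hypothetical default path)."""
    Path(path).write_text(json.dumps(end_cursors), encoding="utf-8")

def load_end_cursors(path="end_cursors.json"):
    """Read the list back; returns [] if the file does not exist yet."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else []
```

Running `end_cursors = load_end_cursors()` before the cell above would avoid the error; an empty list simply means the loop does nothing.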

In [144]:
actual_data_df

Unnamed: 0,end_cursor,url,text,hashtags,caption,likes,timestamp,location_slug,country_code,coordinates,g_loc_type,i_loc_type
0,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwOAgILJK/,Lugares de descanso y pernocta que marcan la d...,"[diasentremontañas, anayruben, subetealpaisaje...","[mountain, sky, outdoor, nature.]",39,2020-08-25 10:17:58,,,,,
1,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwMNxJWG6/,Pippa's first holibob...and our first camping ...,"[lakedistrict, t5, camping, campervan, hoilday...","[mountain, sky, outdoor, nature.]",31,2020-08-25 10:17:43,lake-district,GB,"{'lat': 35.225172, 'lng': -89.7312158}","[establishment, point_of_interest, shopping_mall]",
2,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwJWGpuv9/,#vanlife #camper #vwt #camping #homeiswhereyou...,"[vanlife, camper, vwt, camping, homeiswhereyou...",[indoor.],65,2020-08-25 10:17:19,los-angeles-california,US,"{'lat': 34.0522342, 'lng': -118.2436849}","[locality, political]",city
3,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwJONA_9T/,Back to work today and missing the sea views b...,"[vanlife, vanlifediaries, vanlifeuk, t4, vwt4,...","[one or more people, ocean, sky, cloud, outdoo...",56,2020-08-25 10:17:18,wales,GB,"{'lat': 52.1306607, 'lng': -3.7837117}","[administrative_area_level_1, political]",region
4,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwIInpCsg/,Being back in Norway meant shitty weather and ...,"[norway, hike, waterfall, lake, travel, view, ...","[mountain, sky, outdoor, nature.]",19,2020-08-25 10:17:09,rago-national-park,,"{'lat': 67.4385104, 'lng': 16.0056756}","[establishment, park, point_of_interest, touri...",
...,...,...,...,...,...,...,...,...,...,...,...,...
350,QVFCVEFIVU4tWDluVjZuVXN4RHhjd1ZpREh1VWpJQ0ZhZ0...,https://www.instagram.com/p/CETqvCjJyDh/,First #roadtrip with @2beealives sister. The r...,"[roadtrip, vanlife, vanlifegermany, vanlifedia...","[sky, tree, plant, grass, outdoor, nature.]",35,2020-08-25 09:30:02,pirna-germany,DE,"{'lat': 50.9625175, 'lng': 13.9419168}","[locality, political]",city
351,QVFCVEFIVU4tWDluVjZuVXN4RHhjd1ZpREh1VWpJQ0ZhZ0...,https://www.instagram.com/p/CETqpYygRM2/,Kings Park Festival Australian Events www.roa...,"[roadtrip, australia, roadiii, vanlife, touris...","[flower, nature.]",13,2020-08-25 09:29:16,,,,,
352,QVFCVEFIVU4tWDluVjZuVXN4RHhjd1ZpREh1VWpJQ0ZhZ0...,https://www.instagram.com/p/CETqVgbAPJY/,Kings Park Festival Australian Events www.ro...,"[roadtrip, australia, roadiii, vanlife, touris...","[plant, flower, nature, outdoor.]",11,2020-08-25 09:26:33,,,,,
353,QVFCVEFIVU4tWDluVjZuVXN4RHhjd1ZpREh1VWpJQ0ZhZ0...,https://www.instagram.com/p/CERNpqgipCD/,Parthenay,[],"[outdoor, nature.]",39,2020-08-24 10:37:25,parthenay,FR,"{'lat': 46.648825, 'lng': -0.251441}","[locality, political]",city
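For mapping the posts, the `coordinates` dictionaries are easier to work with as two numeric columns. A minimal sketch with a hypothetical helper `add_lat_lng`, assuming the column holds either dicts (as displayed above) or their string representation, which may be what you get back after a round trip through some file formats; anything else becomes NaN.

```python
import ast
import pandas as pd

def add_lat_lng(df, column="coordinates"):
    """Return a copy of df with numeric 'lat' and 'lng' columns taken from
    dicts such as {'lat': 59.33, 'lng': 18.07}; missing values become NaN."""
    def to_dict(value):
        if isinstance(value, dict):
            return value
        if isinstance(value, str) and value.startswith("{"):
            return ast.literal_eval(value)  # dict stored as its Python repr
        return {}
    coords = df[column].map(to_dict)
    out = df.copy()
    out["lat"] = pd.to_numeric(coords.map(lambda c: c.get("lat")))
    out["lng"] = pd.to_numeric(coords.map(lambda c: c.get("lng")))
    return out

# values taken from the table above
demo = pd.DataFrame({
    "location_slug": ["parthenay", "", "wales"],
    "coordinates": [{"lat": 46.648825, "lng": -0.251441}, None,
                    "{'lat': 52.1306607, 'lng': -3.7837117}"],
})
print(add_lat_lng(demo)[["location_slug", "lat", "lng"]])
```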


# Start from already parsed data

In [83]:
data_parsed_df = sddk.read_file("instagram_webscraping/posts_raw_1.json", "df", conf)
data_parsed_df.head(30)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10
0,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwOAgILJK/,Lugares de descanso y pernocta que marcan la d...,"[diasentremontaas, anayruben, subetealpaisaje,...","[mountain, sky, outdoor, nature.]",30,2020-08-25 10:17:58,,,,"[, ]"
1,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwMNxJWG6/,Pippa's first holibob...and our first camping ...,"[lakedistrict, t5, camping, campervan, hoilday...","[mountain, sky, outdoor, nature.]",23,2020-08-25 10:17:43,GB,"{'id': '113392233373458', 'has_public_page': T...",,"[, ]"
2,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwJWGpuv9/,#vanlife #camper #vwt #camping #homeiswhereyou...,"[vanlife, camper, vwt, camping, homeiswhereyou...",[indoor.],55,2020-08-25 10:17:19,US,"{'id': '212999109', 'has_public_page': True, '...",,"[, ]"
3,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwJONA_9T/,Back to work today and missing the sea views b...,"[vanlife, vanlifediaries, vanlifeuk, t4, vwt4,...","[one or more people, ocean, sky, cloud, outdoo...",36,2020-08-25 10:17:18,GB,"{'id': '258199373', 'has_public_page': True, '...",,"[, ]"
4,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwIInpCsg/,Being back in Norway meant shitty weather and ...,"[norway, hike, waterfall, lake, travel, view, ...","[mountain, sky, outdoor, nature.]",14,2020-08-25 10:17:09,,"{'id': '362528469', 'has_public_page': True, '...",,"[, ]"
5,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwHu_jlIo/,Visite de Rocamadour au top !! #rocamadour #d...,"[rocamadour, decouverte, campingcar, campingca...","[cloud, sky, outdoor.]",13,2020-08-25 10:17:06,FR,"{'id': '250378211', 'has_public_page': True, '...",,"[, ]"
6,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwGeMjO7O/,Golden hour is our happy hour This great cam...,"[InsiderTip, jaycoaustralia, camping, roadtrip...","[sky, outdoor, nature.]",18,2020-08-25 10:16:56,AU,"{'id': '236913728', 'has_public_page': True, '...",,"[, ]"
7,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwGUxJKif/,Deutschland-Tour Tag 5-7: Am Abend des fnften...,"[mecklenburgvorpommern, mecklenburgischeseenpl...","[food, indoor.]",60,2020-08-25 10:16:55,,,,"[, ]"
8,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwEV4pa-a/,A small sign of life from us - were fine . #c...,"[campershower, outdoorshower, campervan, campe...","[tree, outdoor, nature.]",111,2020-08-25 10:16:38,,,,"[, ]"
9,QVFCT1dLZGtyQksyekRPUlJ3LVlaTFA0WFdUQ010OE9yNl...,https://www.instagram.com/p/CETwCCdoOLr/,"()Wir sind nicht die Norm, wir sind die die an...","[frmehrrealittaufinstagram, formorerealityonin...","[sky, cloud, ocean, outdoor, water, nature.]",49,2020-08-25 10:16:19,NL,"{'id': '248392736', 'has_public_page': True, '...",,"[, ]"
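`deEmojify` removes every non-ASCII character, not only emoji: in the table above the hashtag `diasentremontañas` has become `diasentremontaas`, and German umlauts appear to have been dropped as well. A sketch of a less destructive filter (hypothetical name `remove_symbols`) using the standard `unicodedata` module. It drops characters in the Unicode category "So" (Symbol, other), which covers most emoji but also signs such as © and °, plus zero-width joiners, variation selectors and skin-tone modifiers. It is an approximation, not a complete emoji detector.

```python
import unicodedata

# characters that are not "So" but commonly glue emoji together
_EMOJI_EXTRAS = {"\u200d", "\ufe0e", "\ufe0f"}  # zero-width joiner, variation selectors

def remove_symbols(text):
    """Drop emoji and other pictographic symbols (Unicode category 'So'),
    skin-tone modifiers and emoji joiners, but keep accented letters."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "So"
        and ch not in _EMOJI_EXTRAS
        and not 0x1F3FB <= ord(ch) <= 0x1F3FF  # skin-tone modifiers
    )

print(remove_symbols("Caf\u00e9 \U0001F690\u2600\ufe0f am See"))
# 'Café  am See' (the spaces around the removed emoji remain)
```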
