# Extracting and transforming data

## Goal

> Check which videos from Vimeo's staff picks are featured on the blog Motionographer.com.


## Libs

Notes
* Generate an app
* Vimeo's API: https://developer.vimeo.com/api/guides/start
* Token: https://developer.vimeo.com/apps/168643#personal_access_tokens
* API wrapper: `pip install PyVimeo`

In [36]:
import numpy as np
import pandas as pd

from bs4 import BeautifulSoup
from sqlalchemy import create_engine
# json_normalize is called as pd.json_normalize in the cells that follow, so no separate import is needed
from tqdm.notebook import tqdm

import requests, json, multiprocessing, glob, datetime, re, math

pd.options.display.max_rows = 500
pd.options.display.max_columns = 500

In [37]:
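import os

def get_page(current_page):
    '''Page number --> dict. Requests one page (100 videos) of the Staff Picks channel.'''

    # The token is read from an environment variable instead of being hard-coded;
    # the variable name VIMEO_TOKEN is an assumption, use whichever name you export.
    headers = {"Authorization": f"Bearer {os.environ['VIMEO_TOKEN']}"}
    endpoint = f'https://api.vimeo.com/channels/staffpicks/videos?page={current_page}&per_page=100'
    vimeo_page = requests.get(endpoint, headers=headers)
    page_content = vimeo_page.json()
    return page_content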

def return_data(num_page):
    '''Page number --> csv. Downloads one page and saves it in ./downloaded_pages/.'''
    response = get_page(num_page)
    page_data = pd.json_normalize(response['data'])
    page_data.to_csv(f'./downloaded_pages/v_page_{num_page:0>4}.csv')
    print(f'Page {num_page:0>4} saved.')

In [38]:
first_page_response = get_page(1)
total_pages = math.ceil(first_page_response['total'] / first_page_response['per_page'])
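
The page count is a ceiling division of the API's `total` by `per_page`. A minimal sketch of that arithmetic follows, using made-up totals rather than real API values; the helper name `count_pages` is hypothetical.

```python
import math

def count_pages(total: int, per_page: int) -> int:
    """Number of pages needed to hold `total` items at `per_page` items each."""
    return math.ceil(total / per_page)

# Illustrative numbers only, not taken from the Vimeo API.
print(count_pages(250, 100))  # 3: two full pages plus one partial page
print(count_pages(200, 100))  # 2: an exact multiple needs no extra page
```

Rounding up matters because a plain integer division would silently drop the last, partially filled page.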

## First function written (before using map for multiprocessing) 

### Deprecated

In [39]:
def all_pages():
    '''
    Downloads every page sequentially and saves each one as a .csv. Returns nothing.
    Each API response is a dict shaped like:
    {
        'paging': {
            'next': next page's uri or none
        }, 
        'data': [
            {},
            ...
        ]
    }
    '''

    response = get_page(1)  # only needed to read the totals
    total_pages = math.ceil(response['total'] / response['per_page'])

    for i in tqdm(range(1, total_pages + 1)):
        response = get_page(i)
        page_data = pd.json_normalize(response['data'])
        page_data.to_csv(f'./downloaded_pages/v_page_{i:0>4}.csv')
        print(f'Page {i:0>4} saved to disk.')


## Reading first page

In [40]:
# columns to keep
cols = [
    'name', 'link', 'duration', 'release_time', 'content_rating', 'tags',
    'categories', 'stats.plays', 'user.name', 'user.link', 'user.gender',
    'user.websites', 'user.account',
    'user.location_details.formatted_address', 'user.short_bio', 'user.skills',
    'user.available_for_hire', 'user.location_details.latitude',
    'user.location_details.longitude', 'user.location_details.city',
    'user.location_details.state', 'user.location_details.neighborhood',
    'user.location_details.sub_locality', 'user.location_details.state_iso_code',
    'user.location_details.country', 'user.location_details.country_iso_code',
    'width', 'height',
]


In [None]:
first_page = pd.read_csv('./downloaded_pages/v_page_0001.csv', usecols=cols)

# 'tags', 'categories', 'user.websites' and 'user.skills' are stored as stringified
# Python lists (single quotes), so json.loads cannot parse them; ast.literal_eval can.
import ast
first_page['tags'].apply(ast.literal_eval)
# first_page.dtypes
# first_page.sample(15)

> Each .csv has 100 rows, corresponding to 100 videos, and 175 columns
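
List-like columns such as `tags` and `categories` are written to the .csv as Python-literal strings with single-quoted keys, which `json.loads` rejects. A minimal sketch of one way to parse them back, assuming every cell holds a valid Python literal; the cells are shortened, made-up values and `parse_cell` is a hypothetical helper.

```python
import ast
import json

def parse_cell(cell: str) -> list:
    """Turn a stringified Python list (as stored in the .csv) back into a list."""
    return ast.literal_eval(cell)

# Made-up cells in the same shape as the 'tags' column.
cells = ["[]", "[{'uri': '/tags/comedy', 'name': 'Comedy'}]"]
parsed = [parse_cell(c) for c in cells]
print(parsed[1][0]["name"])  # Comedy

# json.loads fails on the same text because JSON requires double quotes.
try:
    json.loads(cells[1])
except json.JSONDecodeError as err:
    print("json.loads failed:", err.msg)
```

On a DataFrame the same idea would be `first_page['tags'].apply(ast.literal_eval)`, which only works if the column has no missing values.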

In [17]:
list(first_page[cols])

['name',
 'link',
 'duration',
 'release_time',
 'content_rating',
 'tags',
 'categories',
 'stats.plays',
 'user.name',
 'user.link',
 'user.gender',
 'user.websites',
 'user.account',
 'user.websites',
 'user.location_details.formatted_address',
 'user.short_bio',
 'user.skills',
 'user.available_for_hire',
 'user.location_details.latitude',
 'user.location_details.longitude',
 'upload.size',
 'user.location_details.city',
 'user.location_details.state',
 'user.location_details.neighborhood',
 'user.location_details.sub_locality',
 'user.location_details.state_iso_code',
 'user.location_details.country',
 'user.location_details.country_iso_code',
 'width',
 'height']

In [34]:
first_page.sample(10)

Unnamed: 0,name,link,duration,release_time,content_rating,tags,categories,stats.plays,user.name,user.link,user.gender,user.short_bio,user.websites,user.location_details.formatted_address,user.location_details.latitude,user.location_details.longitude,user.skills,user.available_for_hire,user.account
30,VERT,https://vimeo.com/398274283,733,2020-03-17T16:20:07+00:00,['safe'],[],"[{'uri': '/categories/narrative', 'name': 'Nar...",158701.0,Kate Cox,https://vimeo.com/katecox,n,Kate&#039;s aim as a director is to unearth fe...,[],"London, UK",51.507351,-0.127758,[],False,pro
20,FLUT by Malte Stein,https://vimeo.com/399313424,595,2020-03-20T22:21:38+00:00,['language'],[],"[{'uri': '/categories/animation', 'name': 'Ani...",70025.0,maltestein,https://vimeo.com/user17714648,,,[],,,,[],False,basic
13,Magnetic Fields,https://vimeo.com/400100317,106,2020-03-24T02:05:38+00:00,['safe'],[],"[{'uri': '/categories/experimental', 'name': '...",6794.0,Benjamin Bardou,https://vimeo.com/benjaminbardou,n,► benjaminbardou.com ► benjaminbardou@gmail.co...,"[{'name': 'website', 'link': 'http://benjaminb...","Paris, France",48.856613,2.352222,"[{'uri': '/marketplace/skills/59', 'name': 'Fi...",True,plus
74,The Last Video Store,https://vimeo.com/arthurcauty/thelastvideostore,478,2020-02-22T17:04:04+00:00,['language'],"[{'uri': '/tags/documentary', 'name': 'documen...",[],34635.0,Arthur Cauty | Filmmaker,https://vimeo.com/arthurcauty,m,Multi award-winning filmmaker | inquiries: ac@...,"[{'name': ""Arthur's Official Website"", 'link':...","Bristol, UK",51.454514,-2.58791,"[{'uri': '/marketplace/skills/17', 'name': 'Di...",True,plus
96,S+C+A+R+R - The Rest Of My Days,https://vimeo.com/391501121,234,2020-02-14T14:36:00+00:00,['safe'],[],"[{'uri': '/categories/music', 'name': 'Music',...",18918.0,Passion Paris,https://vimeo.com/passionparis,n,Soci&eacute;t&eacute; ind&eacute;pendante de p...,"[{'name': 'Site Passion Paris', 'link': 'http:...",Paris,,,[],False,pro
51,GIRLFRIENDS,https://vimeo.com/395282487,1168,2020-03-03T20:25:31+00:00,['safe'],[],[],126542.0,Travelling distribution,https://vimeo.com/travellingdistribution,n,"For more than 10 years, Travelling has been re...","[{'name': None, 'link': 'www.travellingdistrib...","Trois-Rivières, Québec, Canada",,,[],False,pro
68,Zoe and Hanh,https://vimeo.com/393553415,536,2020-02-24T22:56:00+00:00,['safe'],"[{'uri': '/tags/comedy', 'name': 'Comedy', 'ta...","[{'uri': '/categories/narrative', 'name': 'Nar...",11582.0,Kim Tran,https://vimeo.com/kimtrantexas,,"Kim Tran is a writer, filmmaker and middle chi...","[{'name': 'Instagram', 'link': 'https://www.in...","Austin, TX, USA",30.267153,-97.743057,[],False,basic
33,JUTLAND II | Breath of the Seasons,https://vimeo.com/397912933,212,2020-03-16T07:59:44+00:00,['safe'],"[{'uri': '/tags/timelapse', 'name': 'timelapse...","[{'uri': '/categories/travel', 'name': 'Travel...",20523.0,Jonas Høholt,https://vimeo.com/jonashoholt,m,I bend and warp time and motion,"[{'name': 'Instagram', 'link': 'http://www.ins...","Aarhus, Danmark",56.162937,10.203921,"[{'uri': '/marketplace/skills/93', 'name': 'Ti...",True,plus
7,Jesse Jams,https://vimeo.com/400592143,951,2020-03-25T13:45:36+00:00,['safe'],[],"[{'uri': '/categories/documentary', 'name': 'D...",10483.0,Trevor Anderson,https://vimeo.com/trevoranderson,m,"Sundance Film Festival, Drumheller Prison, pla...","[{'name': 'Trevor Anderson Films', 'link': 'ht...",,,,[],False,plus
19,Quilt Fever,https://vimeo.com/399322718,945,2020-03-20T23:03:54+00:00,['safe'],"[{'uri': '/tags/quilt', 'name': 'quilt', 'tag'...","[{'uri': '/categories/documentary', 'name': 'D...",67259.0,Olivia Loomis Merrion,https://vimeo.com/oliviamerrion,f,"Filmmaker based in Oakland, CA","[{'name': 'oliviamerrion.com', 'link': 'http:/...","Oakland, CA, USA",,,"[{'uri': '/marketplace/skills/17', 'name': 'Di...",True,plus


# Sending Parallel Requests to Vimeo

> Saves each page as a .csv file

In [None]:
%%time
pool = multiprocessing.Pool()
result = pool.map(return_data, range(1, total_pages + 1))
pool.terminate()
pool.join()

> The challenge: I began by requesting 25 videos per page, without multiprocessing. Downloading all the pages took a whole night, and either the kernel crashed or an error interrupted the run midway. Waiting for data was the most time-consuming task in the project.

## Filtering columns and merging all pages

In [None]:
path = './downloaded_pages/'
# Match only the per-page files so merged exports saved in this folder are skipped
all_files = sorted(glob.glob(path + "v_page_*.csv"))
each_csv = (pd.read_csv(f)[cols] for f in all_files)
sp_df = pd.concat(each_csv, ignore_index=True)

In [340]:
# Exporting filtered .csv
date_time = datetime.datetime.now().strftime("%d%b%Y").replace('/', '').lower() 
sp_df.to_csv(f'./downloaded_pages/staffpicks_{date_time}.csv')
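
The export name carries a date stamp built with `strftime("%d%b%Y")`. A small sketch of what that format produces for a fixed date; `date_stamp` is a hypothetical helper, and the month abbreviation assumes the default English locale. Since the format contains no slashes, the `.replace('/', '')` call has nothing to remove.

```python
import datetime

def date_stamp(moment: datetime.datetime) -> str:
    """Format a datetime as day, abbreviated month and year, lower-cased."""
    return moment.strftime("%d%b%Y").lower()

print(date_stamp(datetime.datetime(2020, 4, 2)))  # 02apr2020 under the default locale
```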


## Vimeo's Dataset

In [66]:
to_export = sp_df.sort_values(by='release_time', ascending=False)
to_export.to_csv(f'./downloaded_pages/sp_{date_time}_tableau.csv')

> When I wrote the function to get all pages, I forgot to add '.csv' to the file names. I tried every method I could find to concatenate the files and got several errors. In short, I was trying to concatenate .txt files, so when I imported the merged file it rendered very strangely (the content wasn't separated by commas, it was plain text!).
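
A quick way to catch this kind of naming mistake is to list what a glob pattern actually matches before concatenating. The sketch below creates empty stand-in files in a temporary folder; the file names are illustrative and only mimic the ones used in this notebook.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Two page files, one merged export and one file saved without an extension.
    for name in ["v_page_0001.csv", "v_page_0002.csv",
                 "staffpicks_02apr2020.csv", "v_page_0003"]:
        open(os.path.join(folder, name), "w").close()

    every_csv = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(folder, "*.csv")))
    page_csvs = sorted(os.path.basename(p)
                       for p in glob.glob(os.path.join(folder, "v_page_*.csv")))

print(every_csv)  # the merged export is matched along with the pages
print(page_csvs)  # only per-page files that carry the .csv extension
```

The extension-less file is matched by neither pattern, so comparing the number of matched files with `total_pages` would reveal the problem early.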

# Web Scraping Motionographer

> Motionographer - curated motion design content: http://motionographer.com/ | https://motionographer.com/wp-json/wp/v2/posts

In [42]:
# first page
first_page = 1
first_page_link = f'http://motionographer.com/articles/page/{first_page}'
m_soup = BeautifulSoup(requests.get(first_page_link).content)

# the second-to-last pagination link holds the last page number
pagination_links = m_soup.select('body div nav li a')
last_page_num = int(pagination_links[-2].text)
last_page_link = pagination_links[-2]['href']

In [43]:
all_pages_nums = [*range(first_page, last_page_num + 1)]

In [44]:
all_pages_url = [f"http://motionographer.com/articles/page/{item}" for item in range(first_page, last_page_num + 1)]
all_pages_url

['http://motionographer.com/articles/page/1',
 'http://motionographer.com/articles/page/2',
 'http://motionographer.com/articles/page/3',
 'http://motionographer.com/articles/page/4',
 'http://motionographer.com/articles/page/5',
 'http://motionographer.com/articles/page/6',
 'http://motionographer.com/articles/page/7',
 'http://motionographer.com/articles/page/8',
 'http://motionographer.com/articles/page/9',
 'http://motionographer.com/articles/page/10',
 'http://motionographer.com/articles/page/11',
 'http://motionographer.com/articles/page/12',
 'http://motionographer.com/articles/page/13',
 'http://motionographer.com/articles/page/14',
 'http://motionographer.com/articles/page/15',
 'http://motionographer.com/articles/page/16',
 'http://motionographer.com/articles/page/17',
 'http://motionographer.com/articles/page/18',
 'http://motionographer.com/articles/page/19',
 'http://motionographer.com/articles/page/20',
 'http://motionographer.com/articles/page/21',
 'http://motionographe

In [45]:
def download_page(page_url):
    '''Page url --> html. From page url, downloads html page.'''
    
    downloaded_page = BeautifulSoup(requests.get(page_url).content)
    naming = re.findall('[0-9]+', str(page_url))
    
    with open(f"./motionographer/m_{naming[0]}.html", "w") as file:
        file.write(str(downloaded_page))
        
#     print('page downloaded!')

In [78]:
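def download_post(post_url):
    '''Post url --> csv. Downloads a post, extracts its fields with crawl_posts and saves them as a .csv.'''
    try:
        downloaded_post = BeautifulSoup(requests.get(post_url).content)
        naming = (re.findall('/([^/]+)/$', str(post_url)))[0]
        post_df = crawl_posts(downloaded_post)
        post_df.to_csv(f'./posts/{naming}.csv')
    except IndexError:
        # Raised when a selector in crawl_posts finds nothing (e.g. no video iframe)
        # or when the url has no trailing slash to build the file name from
        print(f"Couldn't download {post_url}")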
        
#     print('post downloaded!')
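
`download_post` names each file after the last path segment of the post URL. A minimal sketch of that extraction on its own, returning `None` instead of raising when the URL has no trailing slash; `post_slug` is a hypothetical helper and the URL is one of the posts listed in the log output.

```python
import re

def post_slug(post_url: str):
    """Return the last path segment of a URL that ends with '/', or None."""
    match = re.search(r"/([^/]+)/$", post_url)
    return match.group(1) if match else None

print(post_slug("https://motionographer.com/2019/07/17/legwork-is-dead/"))  # legwork-is-dead
print(post_slug("https://motionographer.com/2019/07/17/legwork-is-dead"))   # None
```

With `re.findall(...)[0]`, the second case raises `IndexError`, which `download_post` reports as "Couldn't download" even though the request itself may have succeeded.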

In [47]:
def get_post_url(page_content):
    '''Page content --> posts url. From soup, gets posts url.'''
    
    page_posts = BeautifulSoup(page_content).select('article.post > div.article-header > a')
    url_list = [link['href'] for link in page_posts]
#     print('page downloaded.')
    return url_list

In [48]:
def makes_soup(url):
    '''Url --> soup. Makes soup from url.'''
    
    request = requests.get(url).content
    soup = BeautifulSoup(request)
    return soup

In [49]:
def crawl_posts(post_soup):
    '''Post soup --> Pandas DataFrame. Returns a pandas dataframe from soup.'''
    
    title = post_soup.select('body article h1')[0].text # post title
    iframe = post_soup.select('div.video > iframe')[0]['src']
    video_url = 'https://vimeo.com/' + ''.join(re.findall(r'/(\d+)', iframe)) # vimeo links
    date = post_soup.select('body article p time')[0]['datetime'] # date / time
    author = post_soup.select('body article p a')[0].text # author
    author_url = post_soup.select('body article p a')[0]['href'] # author link
    content = post_soup.select('body div article')[0] # article content
    
    post_page = {'Title': [title],'URL': [video_url], 'Date': [date], 'Author': [author], 
                 'Author_URL': [author_url], 'Content': [content]}
    post_df = pd.DataFrame(post_page, index=[0])

    return post_df
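
The `video_url` line in `crawl_posts` joins every run of digits that follows a slash in the iframe `src`, so a non-Vimeo embed with no such digits would yield a bare `https://vimeo.com/`. A stricter sketch, assuming the embeds use the `player.vimeo.com/video/<id>` form; `vimeo_link` and the sample `src` values are illustrative.

```python
import re

# Assumed embed format: .../player.vimeo.com/video/<numeric id>?...
VIMEO_EMBED = re.compile(r"player\.vimeo\.com/video/(\d+)")

def vimeo_link(iframe_src: str):
    """Return the vimeo.com link for a Vimeo player URL, else None."""
    match = VIMEO_EMBED.search(iframe_src)
    return f"https://vimeo.com/{match.group(1)}" if match else None

print(vimeo_link("https://player.vimeo.com/video/398274283?title=0&byline=0"))
print(vimeo_link("https://www.youtube.com/embed/some-id"))  # None
```

Returning `None` for other hosts makes it possible to separate "post has no Vimeo video" from genuine download errors.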

In [51]:
def read_html(page_num):
    '''Page num --> String. Given a page number, opens and returns the page as a string.'''
    
    with open(f'./motionographer/m_{page_num}.html', 'r') as f:
        html_string = f.read()
    return html_string

In [52]:
def get_posts_url(page_list):
    '''Page --> posts in page. Gets list of pages and returns, for each page, a list with all posts.''' 
    
    all_posts_url = []
    
    for page in page_list:
        string = read_html(page)
        posts_from_page = get_post_url(string)
        all_posts_url.append(posts_from_page)
        
    return all_posts_url

In [53]:
list_of_posts_list = get_posts_url(all_pages_nums)
flat_posts_list = [i for item in list_of_posts_list for i in item]

In [None]:
downloaded_posts = [download_post(x) for x in tqdm(flat_posts_list)]

> Progress bar output: 5,847 posts queued for download.

Couldn't download https://motionographer.com/2020/03/12/launch-party-postponed-motionographer-continues-on/
Couldn't download https://motionographer.com/2020/03/09/rare-volume-beyond-the-airdate/
Couldn't download https://motionographer.com/2020/01/24/greetings/
Couldn't download https://motionographer.com/2019/12/09/%e2%9c%8c%ef%b8%8ffarewell-motionographer%e2%9c%8c%ef%b8%8f/
Couldn't download https://motionographer.com/2019/12/02/doug-alberts-and-the-next-generation/
Couldn't download https://motionographer.com/2019/11/22/sharon-harris-and-what-its-like-working-in-tech/
Couldn't download https://motionographer.com/2019/11/12/two-needles-in-the-haystack/
Couldn't download https://motionographer.com/2019/08/21/london-sao-paulo-and-cookie-studio/
Couldn't download https://motionographer.com/2019/07/17/legwork-is-dead/
Couldn't download https://motionographer.com/2019/05/31/motionographer-x-promax-meetup/
Couldn't download https://motionographer.com/2019/05/22/give-everything-you-know/
C

Couldn't download https://motionographer.com/2016/08/08/beyond-title-design-filipe-carvalho-on-moonlighting-and-collaboration/
Couldn't download https://motionographer.com/2016/08/03/catching-up-with-vincent-tsui/
Couldn't download https://motionographer.com/2016/07/29/40-more-instagram-accounts-you-should-be-following/
Couldn't download https://motionographer.com/2016/07/12/vr-without-goggles-field-trip-to-mars-sends-a-souped-up-school-bus-to-space/
Couldn't download https://motionographer.com/2016/07/07/a-love-story-for-chipotle/
Couldn't download https://motionographer.com/2016/07/01/about-the-motionographer-facelift/
Couldn't download https://motionographer.com/2016/06/28/motionographer-scholarship-school-of-motion-design-bootcamp/
Couldn't download https://motionographer.com/2016/06/24/an-insulting-pitch-email-from-mk12-co-founder-ben-radatz/
Couldn't download https://motionographer.com/2016/06/21/review-division05s-snapdragon/
Couldn't download https://motionographer.com/2016/06/

Couldn't download https://motionographer.com/2015/05/29/the-mill-updates/
Couldn't download https://motionographer.com/2015/05/29/logan-ny-golden-touch-music-video/
Couldn't download https://motionographer.com/2015/05/28/see-no-evil-june-4/
Couldn't download https://motionographer.com/2015/05/28/how-firstborn-became-an-agency-without-losing-its-soul/
Couldn't download https://motionographer.com/2015/05/21/nyc-may-mograph-meetup/
Couldn't download https://motionographer.com/2015/05/14/not-to-scale-relaunch/
Couldn't download https://motionographer.com/2015/05/13/survey-creative-parenting/
Couldn't download https://motionographer.com/2015/05/12/superestudio-relaunches/
Couldn't download https://motionographer.com/2015/05/12/entertainment-lawyer-jeffrey-rose-on-the-collective/
Couldn't download https://motionographer.com/2015/05/06/nyc-an-evening-with-david-oreilly-at-the-moma/
Couldn't download https://motionographer.com/2015/05/04/primetime-emmys-adds-motion-design-category/
Couldn't do

Couldn't download https://motionographer.com/2014/06/25/golden-wolf-for-wawa/
Couldn't download https://motionographer.com/2014/06/23/daniel-savage-new-site-new-work/
Couldn't download https://motionographer.com/2014/06/16/psyop-la-tackles-a-series-of-cg-shorts-for-samsung-galaxy/
Couldn't download https://motionographer.com/2014/06/13/gobelins-annecy-2014/
Couldn't download https://motionographer.com/2014/06/12/1stavemachine-relaunches/
Couldn't download https://motionographer.com/2014/06/09/nike-football-the-last-game/
Couldn't download https://motionographer.com/2014/06/06/brikk-livet-i-bokstavslandet-radiotjanst/
Couldn't download https://motionographer.com/2014/06/05/gergely-penny-dreadful-portraits-dracula/
Couldn't download https://motionographer.com/2014/06/04/griff-stephen-king-mr-mercedes/
Couldn't download https://motionographer.com/2014/06/03/edmond-was-a-donkey/
Couldn't download https://motionographer.com/2014/06/03/see-no-evil-june-3-2/
Couldn't download https://motionog

Couldn't download https://motionographer.com/2014/03/06/the-griswolds-red-tuxedo-directed-by-kris-mercado/
Couldn't download https://motionographer.com/2014/03/06/golden-wolf-primary-dog-blood-chella-ride/
Couldn't download https://motionographer.com/2014/03/06/clever-cute-iphone-5c-animations/
Couldn't download https://motionographer.com/2014/03/03/the-86th-academy-awards-nominee-posters/
Couldn't download https://motionographer.com/2014/03/03/rip-alain-resnais/
Couldn't download https://motionographer.com/2014/02/26/black-gold-by-pes/
Couldn't download https://motionographer.com/2014/02/26/smith-foulkes-inner-beauty-for-honda/
Couldn't download https://motionographer.com/2014/02/24/pocull-loves-balls/
Couldn't download https://motionographer.com/2014/02/24/nyc-mograph-february-meetup/
Couldn't download https://motionographer.com/2014/02/19/enders-game-motion-graphics-reel/
Couldn't download https://motionographer.com/2014/02/19/2veintes-latest-reel-brims-with-bold-typography/
Couldn'

Couldn't download https://motionographer.com/2013/10/10/london-londons-hero/
Couldn't download https://motionographer.com/2013/10/09/jonathan-kim-art-director-designer-portfolio-2013/
Couldn't download https://motionographer.com/2013/10/03/jeff-le-bars-carn/
Couldn't download https://motionographer.com/2013/10/02/leftchannel-motion-2013-opener/
Couldn't download https://motionographer.com/2013/10/01/bot-dolly-box-interview-and-behind-the-scenes/
Couldn't download https://motionographer.com/2013/09/28/lucas-zanotto-geile-weine/
Couldn't download https://motionographer.com/2013/09/27/crcr-c2c-delta/
Couldn't download https://motionographer.com/2013/09/25/mainframe-relaunches/
Couldn't download https://motionographer.com/2013/09/25/school-of-motion/
Couldn't download https://motionographer.com/2013/09/24/bot-dolly-box/
Couldn't download https://motionographer.com/2013/09/23/nyc-mograph-september-meetup/
Couldn't download https://motionographer.com/2013/09/21/exit-73-studios-coin/
Couldn't

Couldn't download https://motionographer.com/2013/05/23/podcast-the-collective/
Couldn't download https://motionographer.com/2013/05/21/buck-antfood-for-childline/
Couldn't download https://motionographer.com/2013/05/15/see-no-evil-june-2/
Couldn't download https://motionographer.com/2013/05/13/promotive-tv/
Couldn't download https://motionographer.com/2013/05/12/art-com-moving-in/
Couldn't download https://motionographer.com/2013/05/10/making-glass-cows/
Couldn't download https://motionographer.com/2013/05/10/renaud-hallee-the-clockmakers/
Couldn't download https://motionographer.com/2013/05/10/chris-randall-pilsner-urquell-book-of-legends/
Couldn't download https://motionographer.com/2013/05/09/jon-saunders-updates-3/
Couldn't download https://motionographer.com/2013/05/09/rest-in-peace-ray-harryhausen/
Couldn't download https://motionographer.com/2013/05/09/ronda-nick-ids/
Couldn't download https://motionographer.com/2013/05/08/holbrooks-red-cross-parcel/
Couldn't download https://m

Couldn't download https://motionographer.com/2013/01/25/adam-powell-sivu-better-man-than-he/
Couldn't download https://motionographer.com/2013/01/24/injaus-for-i-sat/
Couldn't download https://motionographer.com/2013/01/23/lnwc-ghost-stories-trailer/
Couldn't download https://motionographer.com/2013/01/22/see-no-evil-february-2/
Couldn't download https://motionographer.com/2013/01/22/starcraft-ii-heart-of-the-swarm-opening-cinematic/
Couldn't download https://motionographer.com/2013/01/21/best-wishes-2013/
Couldn't download https://motionographer.com/2013/01/19/sam-mason-and-cd-pete-candeland-mazda-incredible-world/
Couldn't download https://motionographer.com/2013/01/18/tobias-larson-guilt/
Couldn't download https://motionographer.com/2013/01/18/alexis-beaumont-remi-godin-stuck-in-the-sound-lets-go/
Couldn't download https://motionographer.com/2013/01/17/the-academy-and-colin-hesterly-hammer-hand/
Couldn't download https://motionographer.com/2013/01/15/chris-weller-updates-pac-man-in-

Couldn't download https://motionographer.com/2012/10/05/alma-mater/
Couldn't download https://motionographer.com/2012/10/04/call-for-entries-bassawards/
Couldn't download https://motionographer.com/2012/10/03/julia-pott-the-event/
Couldn't download https://motionographer.com/2012/10/03/killer-mike-reagan/
Couldn't download https://motionographer.com/2012/10/01/cartoon-network-20th-birthday-music-video-by-i-love-dust/
Couldn't download https://motionographer.com/2012/10/01/graphic-design-now-in-production/
Couldn't download https://motionographer.com/2012/10/01/blender-tears-of-steel/
Couldn't download https://motionographer.com/2012/09/28/la-trip-the-light-fantastic-a-tribute-to-robert-abel-associates/
Couldn't download https://motionographer.com/2012/09/28/wolf-crow-launches/
Couldn't download https://motionographer.com/2012/09/28/talktalk-model-britain/
Couldn't download https://motionographer.com/2012/09/26/sagrada-familia-comes-to-life-with-projection-mapped-imagery/
Couldn't downl

Couldn't download https://motionographer.com/2012/07/11/christopher-hewitt-for-tribord/
Couldn't download https://motionographer.com/2012/07/11/nike-game-on-world/
Couldn't download https://motionographer.com/2012/07/11/royale-for-nike/
Couldn't download https://motionographer.com/2012/07/11/digital-domaindavid-fincher-adidas-mechanical-legs/
Couldn't download https://motionographer.com/2012/07/11/robert-hodgin/
Couldn't download https://motionographer.com/2012/07/10/the-aikiu-pieces-of-gold/
Couldn't download https://motionographer.com/2012/07/10/seattle-aeseattle-meeting-with-maxon/
Couldn't download https://motionographer.com/2012/07/10/fresh-paint-brookfield/
Couldn't download https://motionographer.com/2012/07/10/mk12-relaunches/
Couldn't download https://motionographer.com/2012/07/10/glossyrey-be-a-vegetarian/
Couldn't download https://motionographer.com/2012/07/10/london-screen-social-music-vs-film-2/
Couldn't download https://motionographer.com/2012/07/09/labour-for-littlebits/

Couldn't download https://motionographer.com/2012/05/26/onesize-anno-2012/
Couldn't download https://motionographer.com/2012/05/24/alphabetical-order-france-5-rebrand/
Couldn't download https://motionographer.com/2012/05/24/a52-relaunched/
Couldn't download https://motionographer.com/2012/05/24/joey-123jaera-steadyo/
Couldn't download https://motionographer.com/2012/05/23/ostersoen-by-lorenzo-papace/
Couldn't download https://motionographer.com/2012/05/23/sebas-clim-sparkle/
Couldn't download https://motionographer.com/2012/05/22/mikey-please-making-the-eagleman-stag/
Couldn't download https://motionographer.com/2012/05/22/maximin-spotti-poutshi-creativite/
Couldn't download https://motionographer.com/2012/05/19/neil-gaimans-commencement-address-2012/
Couldn't download https://motionographer.com/2012/05/17/fresh-paint-studios-for-epix/
Couldn't download https://motionographer.com/2012/05/17/mara-smalley-updates-2/
Couldn't download https://motionographer.com/2012/05/17/studio-type-proj

In [None]:
def concat_dataframe():
    '''Merges all downloaded page .csv files, keeps the columns in cols and exports the result.'''
    path = './downloaded_pages/'
    all_files = glob.glob(path + "v_page_*.csv")  # per-page files only, not merged exports
    each_csv = (pd.read_csv(f)[cols] for f in all_files)
    sp_df = pd.concat(each_csv, ignore_index=True)
    
    # Exporting filtered .csv
    date_time = datetime.datetime.now().strftime("%d%b%Y").lower()
    sp_df.to_csv(f'./downloaded_pages/staffpicks_{date_time}.csv')
    return sp_df

## Parallel download

In [382]:
%%time
pool = multiprocessing.Pool()
result = pool.map(download_page, all_pages_url)
pool.terminate()
pool.join()

page downloaded!

# Storing data in a database

## Imports final dataset

In [7]:
sp_df = pd.read_csv('./downloaded_pages/sp_02apr2020_tableau.csv')

## Creates engines

In [12]:
vimeo_engine = create_engine('postgresql+psycopg2://postgres:123@localhost')
# motionographer_engine = create_engine('postgresql+psycopg2://postgres:123@localhost/motionographer')
engines = vimeo_engine
conn = engines.connect()

## Runs engines

In [13]:
sp_df.to_sql('staff_picks', conn, index=False, if_exists='append')
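
With `if_exists='append'`, every run of the cell adds the rows again, so re-running it duplicates the table's content. The sketch below illustrates that behaviour with the standard-library `sqlite3` module and an in-memory database, swapped in for PostgreSQL and SQLAlchemy so it runs without a server. The two rows come from the sample output earlier in the notebook; `append_rows` is a hypothetical helper.

```python
import sqlite3

rows = [
    ("VERT", "https://vimeo.com/398274283", 733),
    ("Magnetic Fields", "https://vimeo.com/400100317", 106),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS staff_picks (name TEXT, link TEXT, duration INTEGER)"
)

def append_rows(connection, records):
    """Insert records without checking for duplicates (append semantics)."""
    connection.executemany("INSERT INTO staff_picks VALUES (?, ?, ?)", records)
    connection.commit()

append_rows(conn, rows)
append_rows(conn, rows)  # a second run duplicates every row

total = conn.execute("SELECT COUNT(*) FROM staff_picks").fetchone()[0]
distinct = conn.execute("SELECT COUNT(DISTINCT link) FROM staff_picks").fetchone()[0]
print(total, distinct)  # 4 2
```

If the table should mirror the latest export, `if_exists='replace'` in `to_sql` is the usual alternative; which one fits depends on whether older snapshots need to be kept.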

# Next Steps

1. Crawl all Motionographer pages
2. Create dataset from it
3. Filter useful information from Motionographer's posts
4. Consolidate Pipeline
5. Save Vimeo and Motionographer's data in a SQL database
6. Update remote repo
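
Once both datasets exist, the comparison stated in the goal could be sketched as a set intersection on normalised video links. This is only an illustration: `normalise` is a hypothetical helper, the Motionographer list is made up, and vanity links such as `https://vimeo.com/arthurcauty/thelastvideostore` would need to be resolved to their numeric form before they can match.

```python
def normalise(link: str) -> str:
    """Lower-case a link and drop the scheme, 'www.' and any trailing slash."""
    link = link.strip().lower().rstrip("/")
    for prefix in ("https://", "http://"):
        if link.startswith(prefix):
            link = link[len(prefix):]
    if link.startswith("www."):
        link = link[len("www."):]
    return link

# Two staff-pick links from the sample output and a made-up list of post videos.
staff_picks = ["https://vimeo.com/398274283", "https://vimeo.com/400100317"]
motionographer_videos = ["http://vimeo.com/400100317/", "https://vimeo.com/123456789"]

featured = {normalise(u) for u in staff_picks} & {normalise(u) for u in motionographer_videos}
print(sorted(featured))  # ['vimeo.com/400100317']
```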

Extra:
* Clean data
* Share on Kaggle
* *Write content from it, with data visualization*
* Share on LinkedIn with the community of designers/filmmakers
* Have 100% of functions with proper docstring description
