## Cool Stuff

Below is a collection of random stuff that we didn't get to cover that you may find useful in the future. 


### Parsing Webpages
Sometimes you want to pull information from websites in some ordered way. This is possible because websites are written in HTML, a structured, hierarchically organized format. A popular package for this is BeautifulSoup. We import it from the package `bs4`, which is BeautifulSoup version 4. We use the package `requests` to download the webpage so we can load its HTML into BeautifulSoup. We can then search for particular elements of the page using the `find_all` method. This requires you to know a bit of HTML.
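Here is a minimal sketch of `find_all`. It parses a small HTML string written by hand, not a downloaded page, so it runs offline. In real use the string would come from `requests.get(url).text`. The tag names and classes are invented for the example.

```python
from bs4 import BeautifulSoup

# a tiny hand-written page standing in for r.text from requests.get(...)
html = """
<html><body>
  <div class="post"><span class="author">ana</span><p class="text">First post</p></div>
  <div class="post"><span class="author">ben</span><p class="text">Second post</p></div>
  <p class="footer">Not a post</p>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# find_all(tag_name, {attribute: value}) returns every matching element, in document order
texts = soup.find_all('p', {'class': 'text'})        # the footer <p> is skipped: wrong class
authors = soup.find_all('span', {'class': 'author'})

for author, text in zip(authors, texts):
    print(author.text + ": " + text.text)
```

This prints `ana: First post` and `ben: Second post`. The tweet functions below follow the same pattern. They use real class names from Twitter's pages instead of these made-up ones.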

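Below is a function that pulls the tweets from a named Twitter user's profile page and prints them (just the first few that appear on the page). It relies on the HTML class names Twitter used when this notebook was written, so it breaks whenever the site changes its markup.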


In [None]:
from bs4 import BeautifulSoup, NavigableString
import requests

def print_tweets(user):
    # download the profile page and parse its HTML
    r = requests.get("https://twitter.com/%s" % user)
    html = r.text

    soup = BeautifulSoup(html, 'html.parser')

    # these class names match the markup Twitter's pages used when this was written;
    # search once, outside the loop
    tweets = soup.find_all('strong', {'class': 'fullname js-action-profile-name show-popup-with-id'})
    action_tags = soup.find_all('span', {'class': 'username js-action-profile-name'})
    twit_texts = soup.find_all('p', {'class': 'js-tweet-text'})

    for i in range(len(tweets)):
        # the @handle text, nested inside the username span
        show_name = action_tags[i].contents[1].contents[0]

        # rebuild the tweet text from plain strings and tags (links, hashtags)
        message = ""
        for nib in twit_texts[i]:
            if isinstance(nib, NavigableString):
                message += nib
            else:
                message += nib.text

        print("@" + show_name + " " + message)
        print("")
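The loop at the end of `print_tweets` puts each tweet's text back together. A tweet's `<p>` holds a mix of plain strings (`NavigableString`s) and tags such as links. Here is a small sketch on a made-up snippet of HTML (not real Twitter markup) that shows what the loop produces. It also shows that BeautifulSoup's `get_text()` gives the same result for this snippet.

```python
from bs4 import BeautifulSoup, NavigableString

# made-up tweet markup: text interleaved with a link and a bold tag
html = '<p class="js-tweet-text">Hello <a href="#">@friend</a>, see <b>this</b>!</p>'
p = BeautifulSoup(html, 'html.parser').find('p')

message = ""
for nib in p:                                  # iterating a tag walks its direct children
    if isinstance(nib, NavigableString):
        message += nib                         # plain text between tags
    else:
        message += nib.text                    # all text inside a child tag

print(message)                                 # Hello @friend, see this!
print(message == p.get_text())                 # True
```

In new code, `get_text()` is the shorter way to write the same thing.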
        

In [None]:
print_tweets('whitehouse')

In [None]:
print_tweets('justinbieber')

Here's a function that works the same way, except that it displays all the images from a Twitter feed.

In [None]:
from bs4 import BeautifulSoup
import requests
import re
from IPython.display import Image, display


def show_tweet_images(user):
    r = requests.get("https://twitter.com/%s" % user)
    html = r.text

    soup = BeautifulSoup(html, 'html.parser')

    # keep <img> tags whose src looks like a media URL ending in "g" (.jpg, .jpeg, .png)
    allimg = [img['src'] for img in soup.find_all('img', src=re.compile('http.*media.*.g$'))]

    for img in allimg:
        print(img)
        display(Image(url=img))
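The `src=re.compile(...)` argument tells `find_all` to keep only the `<img>` tags whose `src` attribute matches the regular expression. BeautifulSoup checks regex attribute filters with `search`. This sketch applies the same pattern the same way to a few made-up URLs (hypothetical, not real Twitter images) so you can see which ones it accepts.

```python
import re

# the same pattern show_tweet_images passes to find_all(src=...)
pattern = re.compile('http.*media.*.g$')

# hypothetical image URLs, made up for illustration
urls = [
    "https://pbs.twimg.com/media/abc123.jpg",
    "https://pbs.twimg.com/media/abc123.png",
    "https://pbs.twimg.com/profile_images/xyz/photo.jpg",   # no "media" in it
    "https://pbs.twimg.com/media/abc123.gif",               # ends in "f", not "g"
]

matches = [u for u in urls if pattern.search(u)]
print(matches)
```

Only the first two URLs match. The `.` before the final `g` is not escaped, so it stands for any character. That is why `.jpg`, `.jpeg` and `.png` all match, while `.gif` does not.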


In [None]:
show_tweet_images('justinbieber')

Here's a function that just does a Google Image search, then grabs a random result and displays it. We can give an optional argument specifying the number of images we want. 
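`requests` builds the URL's query string from the `params` dictionary by URL-encoding it, the same encoding the standard library's `urllib.parse.urlencode` performs. This sketch calls `urlencode` directly, so no network is needed, to show roughly what the search URL looks like. The `start` value here is arbitrary.

```python
from urllib.parse import urlencode

params = {'q': 'pug costume', 'start': 7, 'source': 'lnms', 'tbm': 'isch'}
print('https://www.google.com/search?' + urlencode(params))
# → https://www.google.com/search?q=pug+costume&start=7&source=lnms&tbm=isch

# a '+' typed into the query is itself escaped
print(urlencode({'q': 'pug+costume'}))
# → q=pug%2Bcostume
```

Spaces are encoded as `+` automatically, so the query can be passed with ordinary spaces. Joining the words with `+` by hand makes the `+` signs themselves part of the search.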

In [None]:
from bs4 import BeautifulSoup
import requests
import random
import re
from IPython.display import Image, display


def display_google_image(query, num_results=1):

    for i in range(num_results):

        # 'start' is the offset of the first result; a random offset (1-50)
        # varies which images come back on each call
        start = random.randint(1, 50)

        # pretend to be a browser; Google may serve a different page to the default requests User-Agent
        header = {'User-Agent': 'Mozilla/5.0'}

        # pull the info from the webpage; requests URL-encodes the query (spaces become '+')
        r = requests.get('https://www.google.com/search',
                         params={'q': query.lower(), 'start': start, 'source': 'lnms', 'tbm': 'isch'},
                         headers=header)

        # parse it and find all thumbnail image URLs (hosted on gstatic.com)
        soup = BeautifulSoup(r.text, 'html.parser')
        images = [img['src'] for img in soup.find_all("img", {"src": re.compile("gstatic.com")})]

        # skip this round if the page had no matching images
        if not images:
            print("No images found at offset %d" % start)
            continue

        randimg = random.choice(images)

        # print("Search URL: " + r.url)
        print("Image URL: " + randimg)

        display(Image(url=randimg, width=250))
    
    


Now we search for our favorite query, "pug costume". The only downside is that this loads only the thumbnail images, not the full-sized ones.

In [None]:
display_google_image("pug costume",5)

### Better Plotting

`matplotlib` is great, but clunky. The plots are also not very interactive. There are a couple of packages out there that make it easy to make nice, interactive plots that can be run in the web browser. 

#### Bokeh
The older one is called `bokeh`. It can create interactive plots with some basic controls. One thing it can do is produce maps with coordinates overlaid on them. Let's make an interactive version of one of our runs. Watch how you can pan and zoom on the plot! Note that Google Maps requires an API key, which you pass to `GMapPlot` as `api_key`.

In [None]:
import pandas as pd

# read the run data (DataFrame.from_csv was removed in pandas 1.0; read_csv replaces it)
df = pd.read_csv('./datasets/20runs.csv')

# keep only one runner's rows
subset = df[df.user == 'gypsydude']
subset.head()
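The line `df[df.user == 'gypsydude']` is boolean indexing. The comparison gives a True/False value for each row, and indexing with it keeps only the True rows. The map in the next cell is then centered on the median latitude and longitude of those rows. Here is a sketch on a tiny made-up DataFrame. The coordinates are invented, not taken from `20runs.csv`.

```python
import pandas as pd

# hypothetical GPS points, made up for illustration
df = pd.DataFrame({
    'user': ['gypsydude', 'gypsydude', 'someoneelse', 'gypsydude'],
    'latitude': [40.10, 40.12, 35.00, 40.11],
    'longitude': [-88.20, -88.22, -80.00, -88.24],
})

mask = df.user == 'gypsydude'   # boolean Series: True where the row belongs to that user
subset = df[mask]               # keep only the True rows

# the median is robust to a few stray GPS points, so it makes a good map center
center = (subset.latitude.median(), subset.longitude.median())
print(len(subset), center)
```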

In [None]:
from bokeh.io import output_file, show
from bokeh.models import GMapPlot, GMapOptions, ColumnDataSource, Line, PanTool, WheelZoomTool, DataRange1d

# center the map on the middle of the run
map_options = GMapOptions(lat=subset.latitude.median(), lng=subset.longitude.median(), map_type="satellite", zoom=14)

plot = GMapPlot(
    x_range=DataRange1d(), y_range=DataRange1d(), map_options=map_options,
    api_key="YOUR_GOOGLE_MAPS_API_KEY",   # placeholder: Google Maps needs a key of your own
    title="GypsyDude's Runs", plot_width=800, plot_height=800
)

# the data the glyph draws from: one row per GPS point
source = ColumnDataSource(
    data=dict(
        lat=subset.latitude,
        lon=subset.longitude,
    )
)

# draw the route as a semi-transparent red line
track = Line(x="lon", y="lat", line_width=2.5, line_color="red", line_alpha=0.5)
plot.add_glyph(source, track)

plot.add_tools(PanTool(), WheelZoomTool())
output_file("gmap_plot.html")
show(plot)

#### Plotly

Plotly is newer and fancier. The catch is that its online mode requires you to set up an account, because it's made for creating and sharing plots on the web. The account is free, though, and I have already created one that you can try from here. Plotly can create all kinds of plots, but here's an example of something you can't do well with matplotlib: interactive 3D surface plots. Cool!


In [None]:
import plotly.plotly as py   # plotly >= 4 moved this module: use `import chart_studio.plotly as py`
import plotly.graph_objs as go

import pandas as pd
from plotly.offline import init_notebook_mode
py.sign_in('psycomputing', 'qoka645e05')


init_notebook_mode() # run at the start of every ipython notebook to use plotly.offline
                     # this injects the plotly.js source files into the notebook


# Read data from a csv
z_data = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/api_docs/mt_bruno_elevation.csv')

data = [
    go.Surface(
        z=z_data.values   # 2D NumPy array; DataFrame.as_matrix() was removed in pandas 1.0
    )
]
layout = go.Layout(
    title='Mt Bruno Elevation',
    autosize=False,
    width=600,
    height=600,
    margin=dict(
        l=65,
        r=50,
        b=65,
        t=90
    )
)
fig = go.Figure(data=data, layout=layout)
py.iplot(fig, filename='elevations-3d-surface')

### Recursion

This is an advanced topic that takes a while to really understand. A recursive function is one that calls itself, solving a big problem by reducing it to a smaller version of the same problem. Remember our image homework, where we created mirror images of Ernie? The function below produces the same thing, but it has a `level` argument. When `level` is 1, it makes the 4-paneled image. When `level` is 2, it makes that 4-paneled image, then mirrors *that* image the same way, giving 16 Ernies in total. This can repeat for many levels, with the count multiplying by 4 each time.

Notice that the function has no loops in it. It only contains the code for level 1; if `level` is higher, *it calls itself* with `level` minus 1. Recursion is the closest thing to magic a programmer can produce. Go ahead and open the "pattern.jpg" file from the images folder: that is what you get with level 9 (4^9 = 262,144 Ernies).
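The Ernie count follows the same recursive pattern as the image function. Level 1 gives 4 copies, and each extra level mirrors the previous result into 4, so the count at one level is 4 times the count one level down. Here is a small sketch. The name `count_images` is hypothetical and not part of the homework code.

```python
def count_images(level):
    """Number of copies of the original image after `level` rounds of mirroring."""
    if level == 1:
        return 4                          # base case: the 4-paneled image
    return 4 * count_images(level - 1)    # recursive case: mirror the previous result 4 ways

print(count_images(2))   # 16
print(count_images(9))   # 262144
```

Like `mirror_images`, it has no loop. The base case stops the recursion, and every other call hands a smaller `level` to itself.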


In [None]:
#load the image first: 

from PIL import Image

im = Image.open('./datasets/ernie.jpg')


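# crop box is (left, upper, right, lower) in pixels
box = (550,450,900,875)

ernie = im.crop(box)
# shrink in place to fit within 100x100, keeping the aspect ratio
ernie.thumbnail((100,100))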
ernie

In [None]:
#here is the function
def mirror_images(img, level=1):

    if level == 1:
        # base case: a blank image big enough to hold 4 copies of the original
        blank = Image.new('RGB', (img.width*2, img.height*2))
        # paste all 4, flipping them so they're symmetrical horizontally and vertically
        blank.paste(img, (0, 0))
        blank.paste(img.transpose(Image.FLIP_LEFT_RIGHT), (img.width, 0))
        blank.paste(img.transpose(Image.FLIP_TOP_BOTTOM), (0, img.height))
        blank.paste(img.transpose(Image.FLIP_LEFT_RIGHT).transpose(Image.FLIP_TOP_BOTTOM), (img.width, img.height))
    else:
        # recursive case: do one level, then mirror that result (level - 1) more times
        img2 = mirror_images(img, 1)
        blank = mirror_images(img2, level - 1)
    return blank
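To see the same recursion without Pillow, here is a sketch that mirrors a grid of characters instead of pixels. `mirror_grid` is a hypothetical helper for illustration. Each string is one row. `row[::-1]` is the left-right flip, and reversing the list of rows is the top-bottom flip.

```python
def mirror_grid(grid, level=1):
    """Tile a list of equal-length strings into a 2x2 mirrored arrangement, `level` times."""
    if level == 1:
        top = [row + row[::-1] for row in grid]   # original | left-right flip
        bottom = top[::-1]                        # top-bottom flip of the whole top half
        return top + bottom
    # recursive case: do one level, then mirror that result (level - 1) more times
    return mirror_grid(mirror_grid(grid, 1), level - 1)

for row in mirror_grid(["ab", "cd"], 1):
    print(row)
# abba
# cddc
# cddc
# abba
```

The bottom-right quadrant ends up flipped both ways, just like the fourth Ernie. At level 2, the 2x2 input grows to 8x8, which is 16 copies.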

In [None]:
#level 1
mirror_images(ernie,1)

In [None]:
#level 2

mirror_images(ernie,2)

In [None]:
#level 5
mirror_images(ernie,5)