#Mining the Social Web, 2nd Edition

##Chapter 5: . Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More

This IPython Notebook provides an interactive way to follow along with and explore the numbered examples from [_Mining the Social Web (2nd Edition)_](http://bit.ly/135dHfs). The intent behind this notebook is to reinforce the concepts from the sample code in a fun, convenient, and effective way. This notebook assumes that you are reading along with the book and have the context of the discussion as you work through these exercises.

In the somewhat unlikely event that you've somehow stumbled across this notebook outside of its context on GitHub, [you can find the full source code repository here](http://bit.ly/16kGNyb).

## Copyright and Licensing

You are free to use or adapt this notebook for any purpose you'd like. However, please respect the [Simplified BSD License](https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/blob/master/LICENSE.txt) that governs its use.

Note: If you find yourself wanting to copy output files from this notebook back to your host environment, see the bottom of this notebook for one possible way to do it.

## Example 1. Using boilerpipe to extract the text from a web page

In [1]:
from boilerpipe.extract import Extractor

URL='http://radar.oreilly.com/2010/07/louvre-industrial-age-henry-ford.html'

extractor = Extractor(extractor='ArticleExtractor', url=URL)

print(extractor.getText())

Listen
The Louvre of the Industrial Age
The Henry Ford is one of the world's great museums, and the world it chronicles is our own.
by Tim O'Reilly | @timoreilly | +Tim O'Reilly | July 30, 2010
This morning I had the chance to get a tour of The Henry Ford Museum in Dearborn, MI, along with Dale Dougherty, creator of Make: and Makerfaire, and Marc Greuther, the chief curator of the museum.  I had expected a museum dedicated to the auto industry, but it’s so much more than that.  As I wrote in my first stunned tweet, “it’s the Louvre of the Industrial Age.”
When we first entered, Marc took us to what he said may be his favorite artifact in the museum, a block of concrete that contains Luther Burbank’s shovel, and Thomas Edison’s signature and footprints.  Luther Burbank was, of course, the great agricultural inventor who created such treasures as the nectarine and the Santa Rosa plum. Ford was a farm boy who became an industrialist; Thomas Edison was his friend and mentor. The museum, op

## Example 2. Using feedparser to extract the text (and other fields) from an RSS or Atom feed

In [3]:
import feedparser

FEED_URL='http://feeds.feedburner.com/oreilly/radar/atom'

fp = feedparser.parse(FEED_URL)

for e in fp.entries:
    print(e.title)
    print(e.links[0].href)
    print(e.content[0].value)

Returning to our senses
http://feedproxy.google.com/~r/oreilly/radar/atom/~3/YToqMoCVWTM/returning-to-our-senses
<p><img src='https://d3tdunqjn7n0wj.cloudfront.net/600x450/5704563713_dd5a8c0de1_o_crop-81a1c2986a030549ca065f0a8ff8a7b9.jpg'/></p><p><em>An introduction to how human senses can be incorporated into design principles.</em></p>


<h2>If a Tree Falls in the Forest…</h2>
<p><a data-type="indexterm" data-primary="fast thinking (System" id="id-nxC1SLFrFM"></a><a data-type="indexterm" data-primary="Google (company)" data-secondary="Google Earth Engine" id="id-NnC7FnF2FQ"></a><a data-type="indexterm" data-primary="rainforest conservation" id="id-8nCVInF8Fa"></a><a data-type="indexterm" data-primary="satellite imaging" id="id-mwC9tyFBFb"></a><a data-type="indexterm" data-primary="senses and sensing" data-secondary="rainforest conservation and" id="id-dkCDhyFqFD"></a><a data-type="indexterm" data-primary="System" id="id-BnCyTjFoFr"></a>BRAZIL BEGAN USING SATELLITE imaging to moni

## Example 3. Pseudocode for a breadth-first search

In [None]:
Create an empty graph
Create an empty queue to keep track of nodes that need to be processed

Add the starting point to the graph as the root node
Add the root node to a queue for processing

Repeat until some maximum depth is reached or the queue is empty:
  Remove a node from the queue 
  For each of the node's neighbors: 
    If the neighbor hasn't already been processed: 
      Add it to the queue 
      Add it to the graph 
      Create an edge in the graph that connects the node and its neighbor

**Naive sentence detection based on periods**

In [4]:
txt = "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow."
print(txt.split("."))

['Mr', ' Green killed Colonel Mustard in the study with the candlestick', ' Mr', ' Green is not a very nice fellow', '']


**More sophisticated sentence detection**

In [5]:
import nltk

# Downloading nltk packages used in this example
nltk.download('punkt')

sentences = nltk.tokenize.sent_tokenize(txt)
print(sentences)

[nltk_data] Downloading package punkt to /Users/temp/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
['Mr. Green killed Colonel Mustard in the study with the candlestick.', 'Mr. Green is not a very nice fellow.']


**Tokenization of sentences**

In [6]:
tokens = [nltk.tokenize.word_tokenize(s) for s in sentences]
print(tokens)

[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.'], ['Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.']]


**Part of speech tagging for tokens**

In [7]:
# Downloading nltk packages used in this example
nltk.download('maxent_treebank_pos_tagger')

pos_tagged_tokens = [nltk.pos_tag(t) for t in tokens]
print(pos_tagged_tokens)

[nltk_data] Downloading package maxent_treebank_pos_tagger to
[nltk_data]     /Users/temp/nltk_data...
[nltk_data]   Package maxent_treebank_pos_tagger is already up-to-
[nltk_data]       date!
[[('Mr.', 'NNP'), ('Green', 'NNP'), ('killed', 'VBD'), ('Colonel', 'NNP'), ('Mustard', 'NNP'), ('in', 'IN'), ('the', 'DT'), ('study', 'NN'), ('with', 'IN'), ('the', 'DT'), ('candlestick', 'NN'), ('.', '.')], [('Mr.', 'NNP'), ('Green', 'NNP'), ('is', 'VBZ'), ('not', 'RB'), ('a', 'DT'), ('very', 'RB'), ('nice', 'JJ'), ('fellow', 'NN'), ('.', '.')]]


**Named entity extraction/chunking for tokens**

In [11]:
# Downloading nltk packages used in this example
nltk.download('maxent_ne_chunker')
nltk.download('words')

ne_chunks = nltk.ne_chunk_sents(pos_tagged_tokens)
print(ne_chunks)
#print(ne_chunks[0].pprint()) # You can prettyprint each chunk in the tree

for chunk in ne_chunks:
    print(chunk)

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /Users/temp/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /Users/temp/nltk_data...
[nltk_data]   Package words is already up-to-date!
<generator object ParserI.parse_sents.<locals>.<genexpr> at 0x128d9e6d0>
(S
  (PERSON Mr./NNP)
  (PERSON Green/NNP)
  killed/VBD
  (ORGANIZATION Colonel/NNP Mustard/NNP)
  in/IN
  the/DT
  study/NN
  with/IN
  the/DT
  candlestick/NN
  ./.)
(S
  (PERSON Mr./NNP)
  (ORGANIZATION Green/NNP)
  is/VBZ
  not/RB
  a/DT
  very/RB
  nice/JJ
  fellow/NN
  ./.)


## Example 4. Harvesting blog data by parsing feeds

In [21]:
import os
import sys
import json
import feedparser
from bs4 import BeautifulSoup
#from BeautifulSoup import BeautifulStoneSoup
from nltk import clean_html

FEED_URL = 'http://feeds.feedburner.com/oreilly/radar/atom'

# clean_html() is dropped by NLTK, the exact same job can be 
# done better using Beautifulsoup - get_text(). So, following function is commented out

# def cleanHtml(html):
#     return BeautifulSoup(clean_html(html),
#                 convertEntities=BeautifulSoup.HTML_ENTITIES).contents[0]

fp = feedparser.parse(FEED_URL)

print("Fetched %s entries from '%s'" % (len(fp.entries[0].title), fp.feed.title))

blog_posts = []
for e in fp.entries:
    blog_posts.append({'title': e.title, 'content'
                      : BeautifulSoup(e.content[0].value).get_text(), 'link': e.links[0].href})

out_file = os.path.join('resources', 'ch05-webpages', 'feed.json')
f = open(out_file, 'w')
f.write(json.dumps(blog_posts, indent=1))
f.close()

print('Wrote output file to %s' % (f.name, ))

Fetched 23 entries from 'All - O'Reilly Media'




 BeautifulSoup(YOUR_MARKUP})

to this:

 BeautifulSoup(YOUR_MARKUP, "lxml")

  markup_type=markup_type))


Wrote output file to resources/ch05-webpages/feed.json


## Example 5. Using NLTK’s NLP tools to process human language in blog data

In [25]:
import json
import nltk

# Download nltk packages used in this example
nltk.download('stopwords')

BLOG_DATA = "resources/ch05-webpages/feed.json"

blog_data = json.loads(open(BLOG_DATA).read())

# Customize your list of stopwords as needed. Here, we add common
# punctuation and contraction artifacts.

stop_words = nltk.corpus.stopwords.words('english') + [
    '.',
    ',',
    '--',
    '\'s',
    '?',
    ')',
    '(',
    ':',
    '\'',
    '\'re',
    '"',
    '-',
    '}',
    '{',
    u'—',
    ]

for post in blog_data:
    sentences = nltk.tokenize.sent_tokenize(post['content'])

    words = [w.lower() for sentence in sentences for w in
             nltk.tokenize.word_tokenize(sentence)]

    fdist = nltk.FreqDist(words)

    # Basic stats

    num_words = sum([i[1] for i in fdist.items()])
    num_unique_words = len(fdist.keys())

    # Hapaxes are words that appear only once

    num_hapaxes = len(fdist.hapaxes())

    top_10_words_sans_stop_words = [w for w in fdist.items() if w[0]
                                    not in stop_words][:10]

    print(post['title'])
    print('\tNum Sentences:'.ljust(25), len(sentences))
    print('\tNum Words:'.ljust(25), num_words)
    print('\tNum Unique Words:'.ljust(25), num_unique_words)
    print('\tNum Hapaxes:'.ljust(25), num_hapaxes)
    print('\tTop 10 Most Frequent Words (sans stop words):\n\t\t', \
            '\n\t\t'.join(['%s (%s)'
            % (w[0], w[1]) for w in top_10_words_sans_stop_words]) +'\n')

[nltk_data] Downloading package stopwords to /Users/temp/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
Returning to our senses
	Num Sentences:           12
	Num Words:               250
	Num Unique Words:        149
	Num Hapaxes:             112
	Top 10 Most Frequent Words (sans stop words):
		 introduction (1)
		human (4)
		senses (2)
		incorporated (1)
		design (1)
		principles (1)
		tree (1)
		falls (1)
		forest… (1)
		brazil (1)

Techniques for designing to reduce risk
	Num Sentences:           1
	Num Words:               31
	Num Unique Words:        26
	Num Hapaxes:             23
	Top 10 Most Frequent Words (sans stop words):
		 need (1)
		plan (1)
		ca (1)
		n't (1)
		see (1)
		design (1)
		wide (1)
		range (1)
		user (1)
		scenarios (1)

Wrapping an RxJS observable stream into an Angular service
	Num Sentences:           29
	Num Words:               845
	Num Unique Words:        277
	Num Hapaxes:             154
	Top 10 Most Frequent Words (sans stop words

	Num Words:               605
	Num Unique Words:        279
	Num Hapaxes:             191
	Top 10 Most Frequent Words (sans stop words):
		 enhance (1)
		overall (1)
		code (7)
		quality (7)
		blend (1)
		interpersonal (2)
		communication (1)
		tool-based (5)
		analysis.software (1)
		takes (1)

Four short links: 10 April 2018
	Num Sentences:           11
	Num Words:               313
	Num Unique Words:        174
	Num Hapaxes:             121
	Top 10 Most Frequent Words (sans stop words):
		 deep (2)
		learning (4)
		learnings (1)
		reverse (2)
		engineering (2)
		whatsapp (4)
		database (3)
		client (3)
		social (3)
		science (2)

Four short links: 9 April 2018
	Num Sentences:           8
	Num Words:               194
	Num Unique Words:        116
	Num Hapaxes:             86
	Top 10 Most Frequent Words (sans stop words):
		 monads (3)
		gdpr (3)
		blockchain (4)
		search (2)
		talk (3)
		paper (1)
		monad (2)
		tutorial (1)
		tell (1)
		instead (1)

Four short links: 6 April 2018
	N

## Example 6. A document summarization algorithm based principally upon sentence detection and frequency analysis within sentences

In [28]:
import json
import nltk
import numpy

BLOG_DATA = "resources/ch05-webpages/feed.json"

N = 100  # Number of words to consider
CLUSTER_THRESHOLD = 5  # Distance between words to consider
TOP_SENTENCES = 5  # Number of sentences to return for a "top n" summary

# Approach taken from "The Automatic Creation of Literature Abstracts" by H.P. Luhn

def _score_sentences(sentences, important_words):
    scores = []
    sentence_idx = -1

    for s in [nltk.tokenize.word_tokenize(s) for s in sentences]:

        sentence_idx += 1
        word_idx = []

        # For each word in the word list...
        for w in important_words:
            try:
                # Compute an index for where any important words occur in the sentence.

                word_idx.append(s.index(w))
            except ValueError: # w not in this particular sentence
                pass

        word_idx.sort()

        # It is possible that some sentences may not contain any important words at all.
        if len(word_idx)== 0: continue

        # Using the word index, compute clusters by using a max distance threshold
        # for any two consecutive words.

        clusters = []
        cluster = [word_idx[0]]
        i = 1
        while i < len(word_idx):
            if word_idx[i] - word_idx[i - 1] < CLUSTER_THRESHOLD:
                cluster.append(word_idx[i])
            else:
                clusters.append(cluster[:])
                cluster = [word_idx[i]]
            i += 1
        clusters.append(cluster)

        # Score each cluster. The max score for any given cluster is the score 
        # for the sentence.

        max_cluster_score = 0
        for c in clusters:
            significant_words_in_cluster = len(c)
            total_words_in_cluster = c[-1] - c[0] + 1
            score = 1.0 * significant_words_in_cluster \
                * significant_words_in_cluster / total_words_in_cluster

            if score > max_cluster_score:
                max_cluster_score = score

        scores.append((sentence_idx, score))

    return scores

def summarize(txt):
    sentences = [s for s in nltk.tokenize.sent_tokenize(txt)]
    normalized_sentences = [s.lower() for s in sentences]

    words = [w.lower() for sentence in normalized_sentences for w in
             nltk.tokenize.word_tokenize(sentence)]

    fdist = nltk.FreqDist(words)

    top_n_words = [w[0] for w in fdist.items() 
            if w[0] not in nltk.corpus.stopwords.words('english')][:N]

    scored_sentences = _score_sentences(normalized_sentences, top_n_words)

    # Summarization Approach 1:
    # Filter out nonsignificant sentences by using the average score plus a
    # fraction of the std dev as a filter

    avg = numpy.mean([s[1] for s in scored_sentences])
    std = numpy.std([s[1] for s in scored_sentences])
    mean_scored = [(sent_idx, score) for (sent_idx, score) in scored_sentences
                   if score > avg + 0.5 * std]

    # Summarization Approach 2:
    # Another approach would be to return only the top N ranked sentences

    top_n_scored = sorted(scored_sentences, key=lambda s: s[1])[-TOP_SENTENCES:]
    top_n_scored = sorted(top_n_scored, key=lambda s: s[0])

    # Decorate the post object with summaries

    return dict(top_n_summary=[sentences[idx] for (idx, score) in top_n_scored],
                mean_scored_summary=[sentences[idx] for (idx, score) in mean_scored])

blog_data = json.loads(open(BLOG_DATA).read())

for post in blog_data:
       
    post.update(summarize(post['content']))

    print(post['title'])
    print('=' * len(post['title'])+'\n')
#     print
    print('Top N Summary'+ '\n')
    print('-------------')
    print(' '.join(post['top_n_summary'])+ '\n')
#     print
    print('Mean Scored Summary')
    print('-------------------')
    print(' '.join(post['mean_scored_summary'])+ '\n')
#     print

Returning to our senses

Top N Summary

-------------
If a Tree Falls in the Forest…
BRAZIL BEGAN USING SATELLITE imaging to monitor deforestation during the 1980s. This was the first large-scale, coordinated response to loggers and ranchers who had been illegally clearing the rainforests, and it worked, for a time. To avoid being spotted, loggers and ranchers began working more discreetly in smaller areas that were harder to detect (see Figure 1-1).1 This required a new approach to monitoring the forests. A view powered by Google Earth Engine showing global deforestation

Google Earth Engine and the Brazilian NGO, Imazon, worked together to create more powerful environmental monitoring capabilities. With new analysis techniques for satellite imagery, they were able to classify forest topologies within the rainforest.

Mean Scored Summary
-------------------
To avoid being spotted, loggers and ranchers began working more discreetly in smaller areas that were harder to detect (see Figur

Building tools for the AI applications of tomorrow

Top N Summary

-------------
We’re currently laying the foundation for future generations of AI applications, but we aren’t there yet.For the last few years, AI has been almost synonymous with deep learning (DL). We’ve seen AlphaGo touted as an example of deep learning. We’ve seen deep learning used for naming paint colors (not very successfully), imitating Rembrandt and other great painters, and many other applications. Deep learning’s apparent simplicity--the small number of basic techniques you need to know--makes it much easier to “democratize” AI, to build a core of AI developers that don’t have Ph.D.s in applied math or computer science. As Ali Rahimi has argued, we can often get deep learning to work, but we aren’t close to understanding how, when, or why it works: “we’re equipping [new AI developers] with little more than folklore and pre-trained deep nets, then asking them to innovate.

Mean Scored Summary
-------------------

-------------------
IoT, Migrations, Prisoner's Dilemma, and Security

IoT Inspector -- The Princeton University research team is digging into the traffic that IoT devices do, to identify malicious or otherwise dodgy behaviour. They'll release their packet capture and analysis tool as open source. (via BoingBoing)

Migrations (Will Larson) -- very good explanation of how to manage migrations which are usually the only available avenue to make meaningful progress on technical debt. (via Simon Willison)

Beating the Prisoner's Dilemma -- In 2013 as the semester ended in December, students in Fröhlich’s "Intermediate Programming," "Computer System Fundamentals," and "Introduction to Programming for Scientists and Engineers" classes decided to test the limits of the policy, and collectively planned to boycott the final. Because they all did, a zero was the highest score in each of the three classes, which, by the rules of Fröhlich’s curve, meant every student received an A.

The Intertwing

How to run a custom version of Spark on hosted Kubernetes

Top N Summary

-------------
Learn how Spark 2.3.0+ integrates with K8s clusters on Google Cloud and Azure.Do you want to try out a new version of Apache Spark without waiting around for the entire release process? Does running alpha-quality software sound like fun? Does setting up a test cluster sound like work? This post will help you try out new (2.3.0+) and custom versions of Spark on Google/Azure with Kubernetes. Just don't run this in production without a backup and a very fancy support contract for when things go sideways.

Mean Scored Summary
-------------------
Learn how Spark 2.3.0+ integrates with K8s clusters on Google Cloud and Azure.Do you want to try out a new version of Apache Spark without waiting around for the entire release process? Does running alpha-quality software sound like fun? Does setting up a test cluster sound like work? This post will help you try out new (2.3.0+) and custom versions of Spark on G

Stephen Gates on the growing risks posed by malicious bots

Top N Summary

-------------
Gates joined the Oracle Dyn Global Business Unit from Zenedge, the web application security company recently acquired by Oracle. It works by using a list of default usernames and passwords (from previous data breaches) to take control of IoT devices. One key differentiator with Mirai is that it’s self-propagating—each infected device has the ability to scan the internet to find similar devices and subsequently infect them. Another key factor driving malicious bot growth is the increase in malware that focuses on exploiting vulnerabilities (versus relying on usernames and passwords). The malware automates the process of scanning and infecting IoT devices for known vulnerabilities.

Mean Scored Summary
-------------------
Gates joined the Oracle Dyn Global Business Unit from Zenedge, the web application security company recently acquired by Oracle. Gates and I discussed how growing malicious bot acti

Data engineers vs. data scientists

Top N Summary

-------------
The two positions are not interchangeable—and misperceptions of their roles can hurt teams and compromise productivity.It’s important to understand the differences between a data engineer and a data scientist. Venn diagrams like Figure 1 oversimplify the complex positions and how they’re different. When I work with organizations on their team structures, I don’t use a Venn diagram to illustrate the relationship between a data engineer and a data scientist. Data scientists’ skills
At their core, data scientists have a math and statistics background (sometimes physics). On the extreme end of this applied math, they’re creating machine learning models and artificial intelligence.

Mean Scored Summary
-------------------
The two positions are not interchangeable—and misperceptions of their roles can hurt teams and compromise productivity.It’s important to understand the differences between a data engineer and a data scientist

Four short links: 5 April 2018

Top N Summary

-------------
Interactive Notebooks, Molecule-making AI, Interpersonal Dynamics, and Javascript Motion Library

MyBinder -- Turn a GitHub repo into a collection of interactive notebooks. (via Julia Evans)

Molecule-Making AI (Nature) -- The new AI tool, developed by Marwin Segler, an organic chemist and artificial intelligence researcher at the University of Münster in Germany, and his colleagues, uses deep learning neural networks to imbibe essentially all known single-step organic-chemistry reactions—about 12.4 million of them. The tool repeatedly applies these neural networks in planning a multi-step synthesis, deconstructing the desired molecule until it ends up with the available starting reagents. (via Slashdot)

Interpersonal Dynamics -- The list of common corrosive dynamics rang true: bone-deep competition; fear of being found out; my reality is not the reality; it's no fun being the squeaky wheel; feedback stays at the surface; de

It's time to rebuild the web

Top N Summary

-------------
And we have to ask ourselves what would happen if we brought back those technologies: would we have a web that's more humane and better suited to the future we want to build? I've written several times (and will no doubt write more) about rebuilding the internet, but I've generally assumed the rebuild will need peer-to-peer technologies. Those technologies are inherently much more complex than anything Dash proposes. In contrast, Dash's "missing building blocks" are fundamentally simple. They can easily be used by people who don't have a unicorn's worth of experience as web developers and security administrators.

Mean Scored Summary
-------------------
The web was never supposed to be a few walled gardens of concentrated content owned by a few major publishers; it was supposed to be a cacophony of different sites and voices.Anil Dash's "The Missing Building Blocks of the Web" is an excellent article about the web as it was sup

6 creative ways to solve problems with Linux containers and Docker

Top N Summary

-------------
An outside-the-box exploration of how containers can be used to provide novel solutions.Most people are introduced to Docker and Linux containers as a way to approach solving a very specific problem they are experiencing in their organization. The problem they want to solve often revolves around either making the dev/test cycle faster and more reliable while simultaneously shortening the related feedback loops, or improving the packaging and deploying of applications into production in a very similar fashion. Today, there are a lot of tools in the ecosystem that can significantly decrease the time it takes to accomplish these tasks while also vastly improving the ability of individuals, teams, and organizations to reliably perform repetitive tasks successfully. That being said, tools have become such a big focus in the ecosystem that there are many people who haven’t really spent much time 

## Example 7. Visualizing document summarization results with HTML output

In [32]:
import os
import json
import nltk
import numpy
from IPython.display import IFrame
from IPython.core.display import display

BLOG_DATA = "resources/ch05-webpages/feed.json"

HTML_TEMPLATE = """<html>
    <head>
        <title>%s</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    </head>
    <body>%s</body>
</html>"""

blog_data = json.loads(open(BLOG_DATA).read())

for post in blog_data:
   
    # Uses previously defined summarize function.
    post.update(summarize(post['content']))

    # You could also store a version of the full post with key sentences marked up
    # for analysis with simple string replacement...

    for summary_type in ['top_n_summary', 'mean_scored_summary']:
        post[summary_type + '_marked_up'] = '<p>%s</p>' % (post['content'], )
        for s in post[summary_type]:
            post[summary_type + '_marked_up'] = \
            post[summary_type + '_marked_up'].replace(s, '<strong>%s</strong>' % (s, ))

        filename = post['title'].replace("?", "") + '.summary.' + summary_type + '.html'
        f = open(os.path.join('resources', 'ch05-webpages', filename), 'w')
        html = HTML_TEMPLATE % (post['title'] + \
          ' Summary', post[summary_type + '_marked_up'],)
        
        f.write(html)
        f.close()

        print("Data written to", f.name)

# Display any of these files with an inline frame. This displays the
# last file processed by using the last value of f.name...

print("Displaying %s:" % f.name)
display(IFrame('files/%s' % f.name, '100%', '600px'))

Data written to resources/ch05-webpages/Returning to our senses.summary.top_n_summary.html
Data written to resources/ch05-webpages/Returning to our senses.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/Techniques for designing to reduce risk.summary.top_n_summary.html
Data written to resources/ch05-webpages/Techniques for designing to reduce risk.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/Wrapping an RxJS observable stream into an Angular service.summary.top_n_summary.html
Data written to resources/ch05-webpages/Wrapping an RxJS observable stream into an Angular service.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/Four short links: 27 April 2018.summary.top_n_summary.html
Data written to resources/ch05-webpages/Four short links: 27 April 2018.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/How to customize an Istio service mesh.summary.top_n_summary.html
Data written to resource

Data written to resources/ch05-webpages/Strong feedback loops make strong software teams.summary.top_n_summary.html
Data written to resources/ch05-webpages/Strong feedback loops make strong software teams.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/Four short links: 10 April 2018.summary.top_n_summary.html
Data written to resources/ch05-webpages/Four short links: 10 April 2018.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/Four short links: 9 April 2018.summary.top_n_summary.html
Data written to resources/ch05-webpages/Four short links: 9 April 2018.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/Four short links: 6 April 2018.summary.top_n_summary.html
Data written to resources/ch05-webpages/Four short links: 6 April 2018.summary.mean_scored_summary.html
Data written to resources/ch05-webpages/It's time to usher in a new era of UX curation.summary.top_n_summary.html
Data written to resources/ch05-webpages/

## Example 8. Extracting entities from a text with NLTK

In [33]:
import nltk
import json

BLOG_DATA = "resources/ch05-webpages/feed.json"

blog_data = json.loads(open(BLOG_DATA).read())

for post in blog_data:

    sentences = nltk.tokenize.sent_tokenize(post['content'])
    tokens = [nltk.tokenize.word_tokenize(s) for s in sentences]
    pos_tagged_tokens = [nltk.pos_tag(t) for t in tokens]

    # Flatten the list since we're not using sentence structure
    # and sentences are guaranteed to be separated by a special
    # POS tuple such as ('.', '.')

    pos_tagged_tokens = [token for sent in pos_tagged_tokens for token in sent]

    all_entity_chunks = []
    previous_pos = None
    current_entity_chunk = []
    for (token, pos) in pos_tagged_tokens:

        if pos == previous_pos and pos.startswith('NN'):
            current_entity_chunk.append(token)
        elif pos.startswith('NN'):
            if current_entity_chunk != []:

                # Note that current_entity_chunk could be a duplicate when appended,
                # so frequency analysis again becomes a consideration

                all_entity_chunks.append((' '.join(current_entity_chunk), pos))
            current_entity_chunk = [token]

        previous_pos = pos

    # Store the chunks as an index for the document
    # and account for frequency while we're at it...

    post['entities'] = {}
    for c in all_entity_chunks:
        post['entities'][c] = post['entities'].get(c, 0) + 1

    # For example, we could display just the title-cased entities

    print(post['title'])
    print('-' * len(post['title']))
    proper_nouns = []
    for (entity, pos) in post['entities']:
        if entity.istitle():
            print('\t%s (%s)' % (entity, post['entities'][(entity, pos)]))
    print('\n')

Returning to our senses
-----------------------
	Tree Falls (1)
	Figure (2)
	Google Earth Engine (1)
	Google Earth Engine (1)
	Imazon (1)
	Amazon (1)


Techniques for designing to reduce risk
---------------------------------------
	Techniques (1)


Wrapping an RxJS observable stream into an Angular service
----------------------------------------------------------
	Date > (1)
	Observable (1)
	Date (1)
	Return (1)
	Create (1)
	Emit (1)
	Observer (1)
	Date (2)
	Component (1)
	> Custom (1)
	Inject (1)
	Subscribe (1)
	Stackblitz (1)
	Don ’ (1)
	Observable (1)
	Got (1)
	Node (1)
	Continue (1)


Four short links: 27 April 2018
-------------------------------
	Commerce (1)
	Faster Training (1)
	Formal Methods Death (1)
	A (1)
	World (1)
	Neuro-Evolution (1)
	Open Source Way (1)
	Dropbox (1)
	Santa (1)
	Great Theorem-Prover Showdown (1)
	Twitter (1)
	Leftpad (1)
	Note (1)
	Four Short Links (1)
	Monday (1)
	April (1)
	New (1)
	Four Short Links (2)
	Please (1)
	Continue (1)


How to customize a

Four short links: 19 April 2018
-------------------------------
	Multics (1)
	Community Relevance (1)
	Speech Synthesis (1)
	Multics (2)
	Unix (1)
	Art (1)
	Relevance (1)
	Text (1)
	Voice (1)
	Fitting (1)
	Synthesis (1)
	Phonological Loop (1)
	Code (1)
	Boundaries (1)
	Facebook Data (1)
	Today (1)
	Facebook ” (1)
	Facebook “ (1)
	Emerging (1)
	Strata Data Conference (1)
	London (1)
	May (1)
	Continue (1)


Four short links: 18 April 2018
-------------------------------
	Secure Devices Zulip (1)
	Mailtrain (1)
	Mailgun (1)
	Barack Obama (1)
	Obama (1)
	Buzzfeed (1)
	Seven Properties (1)
	Highly Secure (1)
	Devices (1)
	Microsoft (1)
	Impact (1)
	Business (1)
	Society (1)
	New York (1)
	April (1)
	Continue (1)


From USENET to Facebook: The second time as farce
-------------------------------------------------
	Hegel (1)
	Marx (2)
	Facebook ’ (1)
	Bad Month (1)
	Zuckerberg ’ (1)
	Apology Tour (1)
	Zeynep Tufecki (1)
	Facebook ’ (4)
	Tufekci (1)
	Zuckerberg (2)
	” Apology (1)
	Facebook ’ 

	Google Drive (1)
	Azure (1)
	Dropbox (1)
	Continue (1)


It's time to usher in a new era of UX curation
----------------------------------------------


Kyle Simpson and Tammy Everts on the challenges of the modern web
-----------------------------------------------------------------
	O ’ Reilly Programming Podcast (1)
	Fluent (1)
	O ’ Reilly Programming Podcast (1)
	O ’ Reilly Fluent Conference (1)
	July (1)
	San Jose (1)
	Kyle Simpson (2)
	Tammy Everts (1)
	Simpson (1)
	Cookbook (1)
	Don ’ (1)
	Everts (1)
	Time (1)
	Money (1)
	Business Value (1)
	Web Performance.Continue (1)
	Tammy Everts (1)


Four short links: 5 April 2018
------------------------------
	Notebooks (1)
	Interpersonal Dynamics (1)
	Julia Evans (1)
	Nature (1)
	Marwin Segler (1)
	University (1)
	Münster (1)
	Germany (1)
	Slashdot (1)
	Interpersonal Dynamics (1)
	Popmotion (1)
	Continue (1)


5 tips for architecting fast data applications
----------------------------------------------
	Considerations (1)
	Web (1)
	Goo

Four short links: 30 March 2018
-------------------------------
	Data Literacy (1)
	Data Science Readings (1)
	Bloated Data Architectures (1)
	Readings (1)
	Applied Data Science (1)
	Hadley Wickham (1)
	Stanford (1)
	Configuration (1)
	Single Thread (1)
	Alternative Musical Scales (1)
	Mark J. Nelson (1)
	Continue (1)


What machine learning engineers need to know
--------------------------------------------
	O ’ Reilly Data Show Podcast (1)
	Jesse Anderson (1)
	Paco Nathan (1)
	Apache Pulsar.In (1)
	Data Show (1)
	Jesse Anderson (1)
	Big Data Institute (1)
	Paco Nathan (1)
	Jupytercon (1)
	U.S (1)
	Eric Colson (1)
	Strata Data San Jose (1)


UX challenges in the Internet of Things
---------------------------------------
	Internet (1)


Four short links: 29 March 2018
-------------------------------
	Facebook Container (1)
	Publishing Future (1)
	Social Media Ethics (1)
	Online Virality Facebook Container (1)
	Firefox (1)
	Facebook (4)
	Facebook (2)
	Online Publishing (1)
	Doc Searls (

## Example 9. Discovering interactions between entities

In [34]:
import nltk
import json

BLOG_DATA = "resources/ch05-webpages/feed.json"

def extract_interactions(txt):
    sentences = nltk.tokenize.sent_tokenize(txt)
    tokens = [nltk.tokenize.word_tokenize(s) for s in sentences]
    pos_tagged_tokens = [nltk.pos_tag(t) for t in tokens]

    entity_interactions = []
    for sentence in pos_tagged_tokens:

        all_entity_chunks = []
        previous_pos = None
        current_entity_chunk = []

        for (token, pos) in sentence:

            if pos == previous_pos and pos.startswith('NN'):
                current_entity_chunk.append(token)
            elif pos.startswith('NN'):
                if current_entity_chunk != []:
                    all_entity_chunks.append((' '.join(current_entity_chunk),
                            pos))
                current_entity_chunk = [token]

            previous_pos = pos

        if len(all_entity_chunks) > 1:
            entity_interactions.append(all_entity_chunks)
        else:
            entity_interactions.append([])

    assert len(entity_interactions) == len(sentences)

    return dict(entity_interactions=entity_interactions,
                sentences=sentences)

blog_data = json.loads(open(BLOG_DATA).read())

# Display selected interactions on a per-sentence basis

for post in blog_data:

    post.update(extract_interactions(post['content']))

    print(post['title'])
    print('-' * len(post['title']))
    for interactions in post['entity_interactions']:
        print('; '.join([i[0] for i in interactions]))
    print('\n')

Returning to our senses
-----------------------
introduction; senses; design
Tree Falls; Forest… BRAZIL BEGAN USING SATELLITE
response; loggers; ranchers; rainforests
loggers; ranchers; areas; Figure; approach

view; Google Earth Engine; deforestation; Google Earth Engine; NGO; Imazon; monitoring
collaboration; wider range; deforestation
analysis; techniques; imagery; topologies
accuracy; assessments; contribution; ecosystem; vulnerability
emergence; roads
risks; agriculture; logging; areas
scenarios; management strategy; land use; preservation.Continue


Techniques for designing to reduce risk
---------------------------------------
range; scenarios; reading; Techniques


Wrapping an RxJS observable stream into an Angular service
----------------------------------------------------------
service; stream; values; UI; component; stream; values; ’ s dependency injection mechanism; business logic; services; UI
app; stream; values
show; service; stream; values; UI; component; stream; value

Toward the Jet Age of machine learning
--------------------------------------
challenges; efficiency; automation; safety; cooperation; researchers; engineers; academia; industry.Machine learning today; dawn
flights; Wright; brothers; Pioneer Age; aviation; decade; belief; flight; transportation
Machine learning; ML
breakthroughs; problems; image recognition; speech translation; language processing; technology; companies; billions; dollars
conviction; ML; key; society; ’

Wright; brothers; airplane; feet; flight; December


enthusiasm; Wright; brothers; century; aviation
Pioneer Age; aviation; sport
Jet Age; series; innovations; engineering—monoplane; wings; aluminum; designs; turbine; engines; stress testing

Decades; advances; engineering; Jet Age; behavior; challenges—e.g.

Simply; engineering
kind; engineering; ML
software development; software development; applications domains; e.g.; vision; speech; language; behaviors; operations; e.g.; networks; data sets
organizations; expertise

Meetings; unit

lot; ways
notes; founders
Xeno.graphics; weird
Strata Business Summit; sessions; Strata Data Conference; London
Continue; links


Traits you’ll find in good managers
-----------------------------------
Work; manager; Traits; find


Four short links: 20 April 2018
-------------------------------
Functional Programming; High-Dimensional Data; Games; Datavis; Container Management Interview; Simon Peyton-Jones; changes; type system; things; functions; algebraic; data types; GADTs; rank polymorphism; data types; Ph.D.; students; search; topic
fact; people; companies
people; companies; software; years
SPJ; creator; Haskell; thinkers
HyperTools; Python; toolbox

=; lot; columns
Data Visualization; exploration; space; structure
Titus; Netflix
companies; scale; problems; Amazon; Netflix; Google
Architecture; sessions; OSCON; July
Hurry—best; price; ends
Continue; links


Thinking beyond bots: How AI can drive social impact
----------------------------------------------------
way

t; apologies; Zuck
forerunners; USENET

USENET; system; Unix; users; posts; hundreds
tragedy; whimper
Wild West; sort; network; USENET; everything

groups; spam; problem
spam; pornography; software


answers; questions; users
bots; technology isn; ’; t
divide; USENET
Posts; newsgroups; moderator; rest
groups; prone
t immune; groups; places; discussion

*; newsgroups; anyone; reason; anything
thing; USENET; importance
moderation; Facebook
pieces; content; day; moderators; posts; seconds

USENET ’; decline; research; users; newbies; helpers; leaders; trolls; flamers; communications; help
basis; moderation; assistants; posts; moderators
Whether; moderators; posts
fun; troll

technology; Jeopardy; decade
logic; news ”; centers
systems; bots; Google ’; s; Gmail



Facebook; pages; Facebook; people
problem; Facebook ’; road
someone; hit

people; ’; s; Tufekci
USENET; metrics; means; users

network; ’; t optimize; engagement; hate; groups
Neo-Nazis; like
platform didn; ’; lead; ”; claim; sort

Simon Moss on using artificial intelligence to fight financial crimes
---------------------------------------------------------------------
Innovations; detection; response; attacks; systems.In; episode; O ’ Reilly Podcast; Simon Moss; vice president; industry consulting; solutions; Americas
machine learning; learning; techniques; crimes; credit card fraud; identity theft; health care fraud
Discussion; points; Moss; AI; techniques; set; weapons; ”; perpetrators
tide; crime. ”; AI; methods; identity fraud; health care fraud; money laundering; issues; needle
needle; stack
individuals; activity; normality; activity.; Moss; use; machine; techniques; rules; engine; rules; engine; behaviors; machine learning
scenarios; time; problem; angles.; ” Machine; learning; efficiency; detection process; “; data sources; data; case; seconds; decision; ’; ’; business activity; ’; something; investigation; criminality
links; Analytics; division; Teradata Moss ’; LinkedIn; article; fight; money laundering

Data engineers vs. data scientists
----------------------------------
positions; misperceptions; roles; teams; compromise productivity.It; ’; s; differences; data; engineer
differences; teams
misunderstanding; strengths; weaknesses
misconceptions; diagrams; data scientists

venn diagram; data scientists

Venn; diagrams; Figure


position; value; data pipelines
difference; base; skills

organizations; team; structures; t use; Venn; diagram; relationship; data; engineer


Diagram; core; competencies; data scientists; data engineers; overlapping
Illustration; Jesse Anderson
Data; scientists; skills; core; data; scientists; math; statistics; background

end; math; machine learning; models
software engineering; counterparts; data scientists

Data scientists; data; business; level; business
results; business
ability; results; observations; way
sentence definition; data; scientist; data; scientist; someone; math; statistics; data
data; scientist trait; ve; necessity

order; analysis; otherwis

Database Flow; use; server
Code; Data; Social; Sciences; handbook; insights; experts; code; data; terms
Data Science; Machine Learning; sessions; Strata Data Conference; London
Continue; links


Four short links: 9 April 2018
------------------------------
Monads; GDPR; Blockchain; Search; Monads; paper

computer; scientists; programmers
Publishers; GDPR; explanation; GDPR; companies; Facebook; Google; content
Blockchain; Technology; Bad Vision; Future; person; existence; problem; blockchain solution; way
Typesense; source typo; search engine; delivers; results
Law; ethics; governance; sessions; Strata Data Conference; London
Continue; links


Four short links: 6 April 2018
------------------------------
Management; Flame Graphs; Silent Speech Interface; Cloud Backup Thou Shalt; Depend; Me; ACM; %; websites; library; libraries; ways; room; improvement; handling
FlameScope; Netflix; source visualization tool; time
Netflix Tech Blog; AlterEgo; Personalized; Silent Speech Interface; resul

It's time to rebuild the web
----------------------------
web; gardens; content; publishers; cacophony; sites; Dash; Missing Building Blocks; Web; article; web
technologies; web; possibility
technologies; web; humane
times; internet; rebuild
technologies; anything
technologies; web; blockchains; onion routing; revolution; interface design; chance; playground
contrast; Dash; building
people; unicorn; worth; experience; web; developers; security
Dash; demise; View Source; browser feature; HTML

web; part; people; background; source; pages; code
Today; View Source; browsers; complexity; web
bits; megabytes; JavaScript


piece; draft; HTML

Dash; points; Netscape Gold; version; Netscape; day; editors
formatting; layout
Ask; designer; simplicity; wins
View Source; useless
simple; sites; sites; viewers; code
developer; Facebook; source; site; CSS
web; gardens; content; Facebook; YouTube; Twitter
cacophony; sites



problem; megasites
Facebook; content; ocean; random; sites
relatives; friends

------------------------------
Internet; Battle Things; Program Fuzzing; Data Sheets; Data Sets; Retro Port Challenges; Characteristics; Intelligent Autonomy; Internet; Battle Things; Highly Adversarial Environments; Numerous; things; battlefield; future; collaboration; warfighters; teams
paper; characteristics; capabilities; intelligence; network; things; humans—Internet; Battle Things
challenges; generation; AI
Slashdot; T-Fuzz; Fuzzing; Program Transformation
coverage; approaches; imprecise; heuristics; input mutation; techniques; e.g.; execution; taint analysis; sanity
method; tackles; coverage; angle; sanity; checks
T-Fuzz; fuzzer
fuzzer; code; paths; lightweight; technique; input; checks

program; code; checks
Data; Sheets; Data Sets; way; data set; characteristics; motivations
issue; concept; data sheet; data sets; document; data sets; APIs
Prince; Persia; BBC Master; author; game; Jordan Mechner; source code; Apple II

creativity; people
web stuff; days
Continue; links


Four s

## Example 10. Visualizing interactions between entities with HTML output

In [36]:
import os
import json
import nltk
from IPython.display import IFrame
from IPython.core.display import display

BLOG_DATA = "resources/ch05-webpages/feed.json"

HTML_TEMPLATE = """<html>
    <head>
        <title>%s</title>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
    </head>
    <body>%s</body>
</html>"""

blog_data = json.loads(open(BLOG_DATA).read())

for post in blog_data:

    post.update(extract_interactions(post['content']))

    # Display output as markup with entities presented in bold text

    post['markup'] = []

    for sentence_idx in range(len(post['sentences'])):

        s = post['sentences'][sentence_idx]
        for (term, _) in post['entity_interactions'][sentence_idx]:
            s = s.replace(term, '<strong>%s</strong>' % (term, ))

        post['markup'] += [s] 
            
    filename = post['title'].replace("?", "") + '.entity_interactions.html'
    f = open(os.path.join('resources', 'ch05-webpages', filename), 'w')
    html = HTML_TEMPLATE % (post['title'] + ' Interactions', 
                            ' '.join(post['markup']),)
    f.write(html)
    f.close()

    print("Data written to", f.name)
    
    # Display any of these files with an inline frame. This displays the
    # last file processed by using the last value of f.name...
    
    print("Displaying %s:" % f.name)
    display(IFrame('files/%s' % f.name, '100%', '600px'))

Data written to resources/ch05-webpages/Returning to our senses.entity_interactions.html
Displaying resources/ch05-webpages/Returning to our senses.entity_interactions.html:


Data written to resources/ch05-webpages/Techniques for designing to reduce risk.entity_interactions.html
Displaying resources/ch05-webpages/Techniques for designing to reduce risk.entity_interactions.html:


Data written to resources/ch05-webpages/Wrapping an RxJS observable stream into an Angular service.entity_interactions.html
Displaying resources/ch05-webpages/Wrapping an RxJS observable stream into an Angular service.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 27 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 27 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/How to customize an Istio service mesh.entity_interactions.html
Displaying resources/ch05-webpages/How to customize an Istio service mesh.entity_interactions.html:


Data written to resources/ch05-webpages/Teaching and implementing data science and AI in the enterprise.entity_interactions.html
Displaying resources/ch05-webpages/Teaching and implementing data science and AI in the enterprise.entity_interactions.html:


Data written to resources/ch05-webpages/Building tools for the AI applications of tomorrow.entity_interactions.html
Displaying resources/ch05-webpages/Building tools for the AI applications of tomorrow.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 26 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 26 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 25 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 25 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Toward the Jet Age of machine learning.entity_interactions.html
Displaying resources/ch05-webpages/Toward the Jet Age of machine learning.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 24 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 24 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/The Intertwingularity is near: When humans transcend print media.entity_interactions.html
Displaying resources/ch05-webpages/The Intertwingularity is near: When humans transcend print media.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 23 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 23 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Traits you’ll find in good managers.entity_interactions.html
Displaying resources/ch05-webpages/Traits you’ll find in good managers.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 20 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 20 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Thinking beyond bots: How AI can drive social impact.entity_interactions.html
Displaying resources/ch05-webpages/Thinking beyond bots: How AI can drive social impact.entity_interactions.html:


Data written to resources/ch05-webpages/5 best practices for delivering design critiques.entity_interactions.html
Displaying resources/ch05-webpages/5 best practices for delivering design critiques.entity_interactions.html:


Data written to resources/ch05-webpages/How to run a custom version of Spark on hosted Kubernetes.entity_interactions.html
Displaying resources/ch05-webpages/How to run a custom version of Spark on hosted Kubernetes.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 19 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 19 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 18 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 18 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/From USENET to Facebook: The second time as farce.entity_interactions.html
Displaying resources/ch05-webpages/From USENET to Facebook: The second time as farce.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 17 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 17 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Relato: Turking the business graph.entity_interactions.html
Displaying resources/ch05-webpages/Relato: Turking the business graph.entity_interactions.html:


Data written to resources/ch05-webpages/The eight rules of good documentation.entity_interactions.html
Displaying resources/ch05-webpages/The eight rules of good documentation.entity_interactions.html:


Data written to resources/ch05-webpages/Stephen Gates on the growing risks posed by malicious bots.entity_interactions.html
Displaying resources/ch05-webpages/Stephen Gates on the growing risks posed by malicious bots.entity_interactions.html:


Data written to resources/ch05-webpages/Simon Moss on using artificial intelligence to fight financial crimes.entity_interactions.html
Displaying resources/ch05-webpages/Simon Moss on using artificial intelligence to fight financial crimes.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 16 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 16 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 13 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 13 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Jupyter is where humans and data science intersect.entity_interactions.html
Displaying resources/ch05-webpages/Jupyter is where humans and data science intersect.entity_interactions.html:


Data written to resources/ch05-webpages/The importance of transparency and user control in machine learning.entity_interactions.html
Displaying resources/ch05-webpages/The importance of transparency and user control in machine learning.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 12 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 12 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Using qualitative and quantitative data to design better user experiences.entity_interactions.html
Displaying resources/ch05-webpages/Using qualitative and quantitative data to design better user experiences.entity_interactions.html:


Data written to resources/ch05-webpages/Probing the pill box: Repurposing drugs for new treatments.entity_interactions.html
Displaying resources/ch05-webpages/Probing the pill box: Repurposing drugs for new treatments.entity_interactions.html:


Data written to resources/ch05-webpages/Data engineers vs. data scientists.entity_interactions.html
Displaying resources/ch05-webpages/Data engineers vs. data scientists.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 11 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 11 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/4 things business leaders should know as they explore AI and deep learning.entity_interactions.html
Displaying resources/ch05-webpages/4 things business leaders should know as they explore AI and deep learning.entity_interactions.html:


Data written to resources/ch05-webpages/Strong feedback loops make strong software teams.entity_interactions.html
Displaying resources/ch05-webpages/Strong feedback loops make strong software teams.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 10 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 10 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 9 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 9 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 6 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 6 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/It's time to usher in a new era of UX curation.entity_interactions.html
Displaying resources/ch05-webpages/It's time to usher in a new era of UX curation.entity_interactions.html:


Data written to resources/ch05-webpages/Kyle Simpson and Tammy Everts on the challenges of the modern web.entity_interactions.html
Displaying resources/ch05-webpages/Kyle Simpson and Tammy Everts on the challenges of the modern web.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 5 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 5 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/5 tips for architecting fast data applications.entity_interactions.html
Displaying resources/ch05-webpages/5 tips for architecting fast data applications.entity_interactions.html:


Data written to resources/ch05-webpages/What becomes of the broken hearted Blueprint of a donor-free world using custom heart technologies.entity_interactions.html
Displaying resources/ch05-webpages/What becomes of the broken hearted Blueprint of a donor-free world using custom heart technologies.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 4 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 4 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/100+ new live online trainings just launched on O'Reilly's learning platform.entity_interactions.html
Displaying resources/ch05-webpages/100+ new live online trainings just launched on O'Reilly's learning platform.entity_interactions.html:


Data written to resources/ch05-webpages/It's time to rebuild the web.entity_interactions.html
Displaying resources/ch05-webpages/It's time to rebuild the web.entity_interactions.html:


Data written to resources/ch05-webpages/It’s time for data ethics conversations at your dinner table.entity_interactions.html
Displaying resources/ch05-webpages/It’s time for data ethics conversations at your dinner table.entity_interactions.html:


Data written to resources/ch05-webpages/How companies around the world apply machine learning.entity_interactions.html
Displaying resources/ch05-webpages/How companies around the world apply machine learning.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 3 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 3 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 2 April 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 2 April 2018.entity_interactions.html:


Data written to resources/ch05-webpages/6 creative ways to solve problems with Linux containers and Docker.entity_interactions.html
Displaying resources/ch05-webpages/6 creative ways to solve problems with Linux containers and Docker.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 30 March 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 30 March 2018.entity_interactions.html:


Data written to resources/ch05-webpages/What machine learning engineers need to know.entity_interactions.html
Displaying resources/ch05-webpages/What machine learning engineers need to know.entity_interactions.html:


Data written to resources/ch05-webpages/UX challenges in the Internet of Things.entity_interactions.html
Displaying resources/ch05-webpages/UX challenges in the Internet of Things.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 29 March 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 29 March 2018.entity_interactions.html:


Data written to resources/ch05-webpages/A graphical user interface to build apps on top of microservices.entity_interactions.html
Displaying resources/ch05-webpages/A graphical user interface to build apps on top of microservices.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 28 March 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 28 March 2018.entity_interactions.html:


Data written to resources/ch05-webpages/Four short links: 27 March 2018.entity_interactions.html
Displaying resources/ch05-webpages/Four short links: 27 March 2018.entity_interactions.html:


In [38]:
from platform import python_version
print("Code updated to Python 3, developed on ipynb in Python version:"+ python_version())

Code updated to Python 3, developed on ipynb in Python version:3.6.1
