In [15]:
url = 'https://stratechery.com/2019/the-cost-of-apple-news/?utm_source=pocket&utm_medium=email&utm_campaign=pockethits'
url = 'https://www.nytimes.com/2019/01/15/world/europe/brexit-vote-theresa-may.html'
url = 'https://hbr.org/2019/02/why-you-should-work-less-and-spend-more-time-on-hobbies?utm_source=pocket&utm_medium=email&utm_campaign=pockethits'

### Canonization
Convert the link to a canonical format, e.g. remove utm references, remove unnecessary suffixes, etc.

In [16]:
import w3lib.url
import re

def canonization(url):
    url = w3lib.url.canonicalize_url(url)
    if "utm_" in url:
        matches = re.findall(r'(.+\?)([^#]*)(.*)', url)
        if len(matches) > 0:
            match = matches[0]
            query = match[1]
            sanitized_query = '&'.join([p for p in query.split('&') if not p.startswith('utm_')])
            url = match[0] + sanitized_query + match[2]
            if url.endswith('?'):
                return url[:-1]
    return url

url = canonization(url)
print(url)

https://hbr.org/2019/02/why-you-should-work-less-and-spend-more-time-on-hobbies
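
For comparison, the utm stripping can also be sketched with the standard library alone. This is a minimal Python 3 sketch, not a replacement for `w3lib.url.canonicalize_url`: the helper name `strip_tracking_params` and the `example.com` URL are made up for illustration, and it skips the percent-encoding normalization that w3lib performs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

def strip_tracking_params(url, prefixes=("utm_",)):
    """Drop query parameters whose names start with any of `prefixes`."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if not k.startswith(prefixes)]
    # Sorting gives a stable parameter order, so equivalent links compare equal.
    query = urlencode(sorted(kept))
    # The fragment is dropped too; urlunsplit omits '?' when the query is empty.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

example = "https://example.com/post?utm_source=pocket&id=7&utm_medium=email#top"
print(strip_tracking_params(example))
```

Parsing the query into key/value pairs avoids the regular expression and the special case for a trailing `?`.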


### Link to id

Convert the link to an article id (possibly using a hash) so that we can check whether the same link is already present for other users. We can reuse the common properties of an existing article if one is present.

In [17]:
import hashlib

def link_to_id(url):
    url = url.encode('utf-8')
    m = hashlib.sha256()
    m.update(url)
    return m.hexdigest()

article_id = link_to_id(url)
print(article_id)

bbc769bb4eb547c68d26f4151cab90164edead651ef84924468482cb354ba18d
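
The lookup described above could work roughly as follows. This is a sketch under assumptions: the in-memory `articles` dict, the record layout, and `get_or_create_article` are hypothetical stand-ins for whatever storage is eventually used. Only the SHA-256 hashing mirrors the cell above (Python 3 syntax).

```python
import hashlib

def link_to_id(url):
    """Same scheme as above: SHA-256 hex digest of the UTF-8 encoded URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()

# Hypothetical in-memory store: article id -> shared article properties.
articles = {}

def get_or_create_article(url, user):
    """Reuse the existing record for a link, or create one on first sight."""
    article_id = link_to_id(url)
    record = articles.get(article_id)
    if record is None:
        record = {"url": url, "users": set()}
        articles[article_id] = record
    record["users"].add(user)
    return article_id, record

id_a, _ = get_or_create_article("https://example.com/post", "alice")
id_b, rec = get_or_create_article("https://example.com/post", "bob")
print(id_a == id_b, sorted(rec["users"]))
```

Because the id is derived from the canonicalized link, two users saving the same article map to one record, provided the link was canonicalized before hashing.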


### Parse the article

Parse the article with the newspaper library. The goal is to extract the article in both Markdown and plain-text form. It is not yet clear whether the other extracted fields (summary, authors, etc.) are usable.

In [27]:
import newspaper
import pypandoc

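def clean_markdown(markdown):
    # Strip pandoc attribute blocks such as {.class} or {#id .class}.
    # [^{}]* stops at the first closing brace (and may span a line break),
    # so text between two attribute blocks on the same line is kept.
    markdown = re.sub(r'\{[.#][^{}]*\}', '', markdown)
    return markdown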

def is_failed(text):
    text = text.lower()
    if 'make sure your browser supports javascript' in text:
        return True
    return False

def extract_article(url):
    article = newspaper.Article(url=url, keep_article_html=True)
    try:
        article.download()
        article.parse()
        article.nlp()

        if is_failed(article.summary):
            print('Parsing failed')
            # Same shape as the success case: (html, markdown, text, keywords)
            return ('', '', '', [])
    except Exception as e:
        print('Error occurred', e)
        return ('', '', '', [])

    article_html = article.article_html.strip()
    if article_html.startswith('<div>'):
        article_html = article_html[len('<div>'):-len('</div>')]

    article_md = pypandoc.convert_text(article_html, 'md', format='html')
    article_md = clean_markdown(article_md)
    
    # 'plain' is pandoc's plain-text writer; 'asciidoc' would keep markup.
    article_txt = pypandoc.convert_text(article_html, 'plain', format='html')

    def lx(l): return set(l) if l else set([])

    keywords = lx(article.keywords).union(lx(article.tags))
    keywords = [k.lower() for k in keywords]
    
    return (article_html, article_md, article_txt, keywords)

r = extract_article(url)
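
Pandoc's markdown writer can emit attribute blocks such as `{.class}` or `{#id}` after headings and spans. The following is a small stand-alone sketch of stripping them (Python 3). The name `strip_pandoc_attributes` and the sample string are illustrative, not taken from the notebook. The character class `[^{}]*` stops at the first closing brace, whereas a greedy `.*` would run to the last `}` on the line.

```python
import re

# Matches {.class ...} or {#id ...} plus the whitespace in front of it.
ATTR_BLOCK = re.compile(r"\s*\{[.#][^{}]*\}")

def strip_pandoc_attributes(markdown):
    """Remove pandoc attribute blocks, leaving the surrounding text intact."""
    return ATTR_BLOCK.sub("", markdown)

sample = "#### Executive Summary {#summary .text-gray}\n\nSome *text* {.lead} stays here."
cleaned = strip_pandoc_attributes(sample)
print(cleaned)
```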

In [29]:
print r[1]

As professionals around the world feel increasingly pressed for time,
they’re giving up on things that matter to them. A recent HBR article
noted that in surveys, most people “could name several activities, such
as pursuing a hobby, that they’d like to have time for.” This is more
significant than it may sound, because it isn’t just individuals who are
missing out. When people don’t have time for hobbies, businesses pay a
price. Hobbies can make employees substantially better at their jobs for
three reasons: they reawaken your creativity, give you a fresh
perspective, and bolster your confidence.

![](/resources/images/article_assets/2019/02/Feb19_07_682304925.jpg) Tara Moore/Getty
Images

As professionals around the world feel increasingly pressed for time,
they’re giving up on things that matter to them. A recent HBR
[article](/cover-story/2019/01/time-for-happiness) noted that in
surveys, most people “could name several activities, such as pursuing a
hobby, that they’d like to have 

In [119]:
txt = r[1]

In [134]:
print r[3]

[u'parliament', u'process', u'crushing', u'votes', u'minister', u'vote', u'prime', u'brexit', u'uncertain', u'withdrawing', u'face', u'future', u'withdraw', u'defeat', u'weeks', u'european']


In [22]:
import textacy

txt1 = u"""
It's Monday, the dreadful countdown has started. You're already thinking about the end of the week, and it barely started. As the days go by, you are fixated on Friday 5pm. By Friday afternoon, your mind is so overfilled with the prospect of the two-day break that you can barely get anything done anymore. Some of your co-workers are not even at their desks anymore; they left early. When it's your turn to leave, you breathe a sigh of relief. It's the moment you've been waiting for. The start of the weekend.

However, what's so odd is that it's already Monday again. The weekend was a blur. Everything that didn't fit the workday was squeezed into the weekend. Groceries, laundry, chores, medical appointments and so on. By the time you've finished all that it's Sunday evening. Just like work, the weekend made you tired. You want to idle, but tomorrow's Monday and you've already begun thinking about work. You don't have time to do anything anymore because you need to sleep early to wake up for work on time.
"""

In [23]:
txt = textacy.preprocess_text(txt1, lowercase=True, no_punct=True, no_urls=True, no_emails=True, no_phone_numbers=True, no_numbers=True, no_contractions=True)

In [24]:
print txt

it s monday the dreadful countdown has started you are already thinking about the end of the week and it barely started as the days go by you are fixated on friday 5pm by friday afternoon your mind is so overfilled with the prospect of the two day break that you can barely get anything done anymore some of your co workers are not even at their desks anymore they left early when it s your turn to leave you breathe a sigh of relief it s the moment you have been waiting for the start of the weekend however what s so odd is that it s already monday again the weekend was a blur everything that did not fit the workday was squeezed into the weekend groceries laundry chores medical appointments and so on by the time you have finished all that it s sunday evening just like work the weekend made you tired you want to idle but tomorrow s monday and you have already begun thinking about work you do not have time to do anything anymore because you need to sleep early to wake up for work on time


In [26]:
doc = textacy.Doc(txt, lang=u'en')

In [27]:
list(textacy.extract.named_entities(doc))

[monday,
 end of the week,
 friday 5pm,
 friday,
 afternoon,
 two day,
 start of the weekend,
 monday,
 weekend,
 weekend,
 sunday,
 evening,
 weekend]

In [43]:
textacy.keyterms.textrank(doc, normalize='lemma', n_keyterms=10)

[(u'weekend', 0.08097662938118627),
 (u'monday', 0.05164902855670848),
 (u'day', 0.04818167258919268),
 (u'friday', 0.03879800635707472),
 (u'time', 0.03506380177753061),
 (u'desk', 0.027897214786210996),
 (u'turn', 0.027887739263186346),
 (u'worker', 0.027664081063982205),
 (u'sigh', 0.027641401070913483),
 (u'co', 0.027114361874702838)]

In [42]:
textacy.keyterms.sgrank(doc, ngrams=(1, 2, 3, 4), normalize='lower', n_keyterms=0.1)

[(u'laundry chores medical appointments', 0.1699350676097159),
 (u'monday', 0.12129439765330248),
 (u'friday afternoon your mind', 0.11455718853607658),
 (u'groceries laundry chores medical', 0.09482761805609785),
 (u'weekend groceries laundry chores', 0.06607233607247258),
 (u'dreadful countdown', 0.05516403134445235),
 (u'co workers', 0.04478560298415586),
 (u'day break', 0.041029524939625),
 (u'days', 0.02139083074833643),
 (u'pm', 0.02052379973416223),
 (u'prospect', 0.019230783488795745),
 (u'sunday evening', 0.018948361375412625),
 (u'time', 0.018523952527400175),
 (u'weekend', 0.0180801644145477),
 (u'moment', 0.016982582819967573),
 (u'relief', 0.016860373322372023),
 (u'odd', 0.01682895300579131),
 (u'blur', 0.01682000176057459),
 (u'start', 0.01658340687760984),
 (u'desks', 0.01639140781837948)]

In [30]:
ts = textacy.TextStats(doc)

In [31]:
ts.n_words

195

In [32]:
ts.flesch_kincaid_grade_level

14.68769230769231

In [33]:
ts.readability_stats

{u'automated_readability_index': 17.465538461538465,
 u'coleman_liau_index': 7.65347268205128,
 u'flesch_kincaid_grade_level': 14.68769230769231,
 u'flesch_reading_ease': 59.22230769230771,
 u'gulpease_index': 55.51282051282051,
 u'gunning_fog_index': 17.035897435897436,
 u'lix': 57.97435897435898,
 u'smog_index': 9.888512548439397,
 u'wiener_sachtextformel': 6.3195435897435885}
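
To make two of these scores less opaque, here is a sketch of the standard Flesch formulas. Only `n_words = 195` comes from the output above. The sentence and syllable counts are assumptions, back-solved so that both formulas reproduce the printed scores; they are not values reported by textacy.

```python
def flesch_kincaid_grade(n_words, n_sents, n_syllables):
    """Flesch-Kincaid grade level from raw counts."""
    return 0.39 * n_words / n_sents + 11.8 * n_syllables / n_words - 15.59

def flesch_reading_ease(n_words, n_sents, n_syllables):
    """Flesch reading ease from raw counts."""
    return 206.835 - 1.015 * n_words / n_sents - 84.6 * n_syllables / n_words

n_words = 195      # from ts.n_words above
n_sents = 5        # assumption: inferred from the printed scores
n_syllables = 249  # assumption: inferred from the printed scores

grade = flesch_kincaid_grade(n_words, n_sents, n_syllables)
ease = flesch_reading_ease(n_words, n_sents, n_syllables)
print(round(grade, 4), round(ease, 4))
```

Both scores depend heavily on words per sentence, so stripping punctuation before building the `Doc` can change sentence splitting and therefore the scores.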

In [51]:
statements = textacy.extract.semistructured_statements(doc, 'monday')
for s in statements:
    print s

In [52]:
ents = textacy.extract.named_entities(doc)
for ent in ents:
    print ent

monday
end of the week
friday 5pm
friday
afternoon
two day
start of the weekend
monday
weekend
weekend
sunday
evening
weekend


In [55]:
textrank = textacy.keyterms.textrank(doc)

In [57]:
print textrank

[(u'weekend', 0.08097662938118627), (u'monday', 0.05164902855670848), (u'day', 0.04818167258919268), (u'friday', 0.03879800635707472), (u'time', 0.03506380177753061), (u'desk', 0.027897214786210996), (u'turn', 0.027887739263186346), (u'worker', 0.027664081063982205), (u'sigh', 0.027641401070913483), (u'co', 0.027114361874702838)]


In [61]:
ns = textacy.keyterms.key_terms_from_semantic_network(doc, join_key_words=True)
for n in ns:
    print n

(u'weekend', 0.08097662938118627)
(u'monday', 0.05164902855670848)
(u'day', 0.04818167258919268)
(u'friday', 0.03879800635707472)
(u'time', 0.03506380177753061)
(u'desk', 0.027897214786210996)
(u'turn', 0.027887739263186346)
(u'worker', 0.027664081063982205)


In [36]:
import textacy.lexicon_methods

r = textacy.lexicon_methods.emotional_valence(doc.tokens)

In [39]:
for k, v in r.iteritems():
    print k, v

AFRAID 0.1093850104
ANGRY 0.11387720775
ANNOYED 0.120148705723
SAD 0.126134382228
DONT_CARE 0.129405291891
AMUSED 0.131473752158
INSPIRED 0.147904499604
HAPPY 0.123881667119


In [131]:
textacy.lexicon_methods.emotional_valence(doc.tokens)

defaultdict(float,
            {'AFRAID': 0.12476944952500002,
             'AMUSED': 0.12697590541249992,
             'ANGRY': 0.12105484752499998,
             'ANNOYED': 0.13344697778749998,
             'DONT_CARE': 0.12488479982499998,
             'HAPPY': 0.12623712858749997,
             'INSPIRED': 0.11847041011249995,
             'SAD': 0.12416048125000001})

In [13]:
txt1 = u"""
At least 157 miners died and more than 200 remain trapped underground after an explosion and fire in a coal mine in the western Turkish province of Manisa on Tuesday, May 13, officials said. Rescue workers are trying desperately to reach the scores of trapped miners and have managed to evacuate around 50. An explosion at the mine was believed to have been triggered by a faulty electrical transformer. Fire officials pumped clean air into the mine shaft for the survivors, many of whom are stuck in an area 4 kilometers from the entrance.
"""

In [17]:
txt1 = textacy.preprocess.replace_urls(txt1)
txt1 = textacy.preprocess.replace_emails(txt1)
txt1 = textacy.preprocess_text(txt1, lowercase=True, no_punct=True)

ImportError: cannot import name viewkeys

In [114]:
doc1 = textacy.Doc(txt1, lang=u'en')

In [115]:
textacy.lexicon_methods.emotional_valence(doc1.tokens)

defaultdict(float,
            {'AFRAID': 0.18222059750943403,
             'AMUSED': 0.0991074739433962,
             'ANGRY': 0.12722538890566035,
             'ANNOYED': 0.0979675382692308,
             'DONT_CARE': 0.09134906949056605,
             'HAPPY': 0.10248973754716985,
             'INSPIRED': 0.10514038488679246,
             'SAD': 0.19634825343396226})

In [4]:
html = u"""
<html><body><div><div class="article-executive-summary is-hidden">\n\t\t\t\t\t\t<h4 class="text-gray-light mbn mt-large2">Executive Summary</h4>\n\t\t\t\t\t\t<p>As professionals around the world feel increasingly pressed for time, they’re giving up on things that matter to them. A recent HBR article noted that in surveys, most people “could name several activities, such as pursuing a hobby, that they’d like to have time for.” This is more significant than it may sound, because it isn’t just individuals who are missing out. When people don’t have time for hobbies, businesses pay a price. Hobbies can make employees substantially better at their jobs for three reasons: they reawaken your creativity, give you a fresh perspective, and bolster your confidence.</p>\n\n\t\t\t\t\t</div>\n\t\t\t\t<div class="article article-first-row">\n\t\t\t\t\t\t<div class="mbn pbn"> \n <figure>\n  <img class="alignnone size-full wp-image-223624" src="/resources/images/article_assets/2019/02/Feb19_07_682304925.jpg" srcset="/resources/images/article_assets/2019/02/Feb19_07_682304925.jpg 1200w, /resources/images/article_assets/2019/02/Feb19_07_682304925-300x169.jpg 300w, /resources/images/article_assets/2019/02/Feb19_07_682304925-768x432.jpg 768w, /resources/images/article_assets/2019/02/Feb19_07_682304925-1024x576.jpg 1024w, /resources/images/article_assets/2019/02/Feb19_07_682304925-500x281.jpg 500w, /resources/images/article_assets/2019/02/Feb19_07_682304925-383x215.jpg 383w, /resources/images/article_assets/2019/02/Feb19_07_682304925-700x394.jpg 700w, /resources/images/article_assets/2019/02/Feb19_07_682304925-850x478.jpg 850w" alt="" sizes="(min-width: 48em) 55.7291667vw, 97.3924381vw"/>\n  <figcaption class="credit ptn mtn">\n   Tara Moore/Getty Images\n  </figcaption>\n </figure> \n</div>\n<p>As professionals around the world feel increasingly pressed for time, they’re giving up on things that matter to them. 
A recent HBR <a href="/cover-story/2019/01/time-for-happiness">article</a> noted that in surveys, most people “could name several activities, such as pursuing a hobby, that they’d like to have time for.”</p>\n<p>This is more significant than it may sound, because it isn’t just individuals who are missing out. When people don’t have time for hobbies, businesses pay a price. Hobbies can make workers substantially better at their jobs. I know this from personal experience. I’ve always <a href="https://officialgaetano.com/">loved</a> playing the guitar and composing. But just like workers everywhere, I can fall into the trap of feeling that I have no time to engage in it. As head of demand generation for Nextiva, I have enough on my plate to keep me busy around the clock. I can easily fall into the trap of the “<a href="/2013/09/welcome-to-the-72-hour-work-we">72-hour workweek</a>,” which takes into account time people spend connected to work on our phones outside of official work hours.</p>\n<p>When I crash, there’s always the temptation to do something sedentary and mindless. It’s little surprise that watching TV is by far the most popular use of leisure time <a href="https://www.bls.gov/charts/american-time-use/activity-leisure-by-sex.htm">in the U.S.</a> and tops the list elsewhere as well, including <a href="https://www.dw.com/en/tv-tops-germanys-most-popular-leisure-activities/a-45371410">Germany</a> and <a href="https://www.statista.com/statistics/553674/free-time-activities-adult-participation-uk-england/">England</a>.</p>\n<p>But by spending time on music, I boost some of my most important workplace skills.</p>\n<p><strong>Creativity.</strong> To stand out and compete in today’s crowded and constantly changing business environment, organizations need new, innovative ideas that will rise above the noise. I’m tasked with constantly looking for new ways to attract attention from potential buyers. 
But coming up with a fully original idea can be difficult when your mind is filled with targets, metrics, and deadlines.</p>\n<p>A creative hobby pulls you out of all that. Whether you’re a musician, artist, writer, or cook, you often start with a blank canvas in your mind. You simply think: What will I create that will evoke the emotion I’m going for?</p>\n<p>It’s no surprise that by giving yourself this mental space, and focusing on <em>feelings</em>, you can reawaken your creativity. Neuroscientists have <a href="/2015/12/how-to-free-your-innate-creativity">found</a> that rational thought and emotions involve different parts of the brain. For the floodgates of creativity to open, both must be in play.</p>\n<p><strong>Perspective.</strong> One of the trickiest tasks in the creative process is thinking through how someone else would experience your idea. But in doing creative hobbies, people think that way all the time. A potter imagines how the recipient of a vase would respond to it. A mystery novelist considers whether an unsuspecting reader will be surprised by a plot twist.</p>\n<p>When I take a break from work to go make music, I reconnect with that perspective. I keep thinking about how someone hearing my song for the first time might respond. I do all I can to see (or hear) the world <a href="/2012/11/train-your-people-to-take-others-perspectives">through someone else’s eyes</a> (or ears). Then, when I resume the work project, I take that mentality with me.</p>\n<p><strong>Confidence. </strong>When I face a tough challenge at work and feel stymied, I can start to question whether I’ll ever figure out a successful solution. It’s easy to lose <a href="https://ideas.ted.com/david-kelley-on-the-need-for-creative-confidence/">creative confidence</a>. But after an hour of shredding on the guitar, hitting notes perfectly, I’m feeling good. I can tell that my brain was craving that kind of satisfaction. 
And when I face that work project again, I bring the confidence with me.</p>\n<p>It turns out people like me have been studied. In one study, researchers <a href="https://onlinelibrary.wiley.com/doi/full/10.1111/joop.12064">found</a> that “creative activity was positively associated with recovery experiences (i.e., mastery, control, and relaxation) and performance‐related outcomes (i.e., job creativity and extra‐role behaviors).” In fact, they wrote, “Creative activity while away from work may be a leisure activity that provides employees essential resources to perform at a high level.”</p>\n<p>So to my fellow professionals, I highly recommend taking some time to keep up your creative hobby. It doesn’t have to be long. A <a href="https://newsblog.drexel.edu/2016/09/16/study-just-45-minutes-of-art-making-improves-self-confidence/">study</a> found that spending 45 minutes making art helps boost someone’s confidence and ability to complete tasks.</p>\n<p>I also suggest you encourage your business to celebrate employees’ hobbies. Zappos <a href="https://www.skyword.com/contentstandard/creativity/4-ways-zappos-organizational-culture-inspires-creativity/">puts employee artwork</a> up on its walls and encourages people to decorate their desks in whatever ways they wish. Some businesses\xa0<a href="https://smallbusiness.chron.com/funny-corporate-talent-show-ideas-18482.html">hold talent shows</a>. Even employees who may not have these kinds of talents should be encouraged to do something that <em>feels</em> creative and fun. Some CEOs spend time <a href="/2018/10/why-ceos-devote-so-much-time-to-their-hobbies">on their own hobbies</a>, setting the right example.</p>\n<p>And when you find a little time for a creative hobby break, make it guilt free. After all, when you do this, everyone stands to gain.</p>\n\t\t\t\t\t\n\t\t\t\t</div>\n\t\t\t</div></body></html>
"""

In [23]:
import pypandoc
md = pypandoc.convert_text(html, 'md', format='html')
md = clean_markdown(md)

In [10]:
import html2text

ImportError: No module named html2text

In [1]:
import requests, re, htmlentitydefs

def markdown(text):
  for nuketagblock in ['title', 'head']:
    text = noTagBlock(text, nuketagblock)
  text = justBody(text)
  text = noComments(text)
  for nuketagblock in ['script', 'style', 'noscript', 'form',
    'object', 'embed', 'select']:
    text = noTagBlock(text, nuketagblock)
  text = stripParams(text)
  text = lowercaseTags(text)
  text = listNuker(text)
  for nuketag in ['div', 'span', 'img', 'a', 'b', 'i', 'param', 'table',
    'td', 'tr', 'font', 'title', 'head', 'meta', 'strong', 'em', 'iframe']:
    text = noTag(text, nuketag)
  text = singleizer(text)
  text = convert_html_entities(text)
  text = addmarkdown(text)
  text = just2LR(text)
  
  return text

def getHTML(url):
  try:
    response = requests.get(url)
  except requests.RequestException:
    # Network or HTTP-layer failure: fall back to an empty document.
    return ''
  return response.text

def justBody(text):
  pattern = r"<\s*body\s*.*?>(?P<capture>.*)<\s*/body\s*>"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  bodymat = pat.search(text)
  if bodymat:
    return bodymat.group('capture')
  else:
    return ''

def addmarkdown(text):
  text = text.replace('<p>', "\n")
  text = text.replace('</p>', "")
  text = text.replace('<hr>', "\n---\n")
  text = text.replace('<blockquote>', "\n> ")
  text = text.replace('</blockquote>', "")
  text = text.replace('<h1>', "\n# ")
  text = text.replace('<h2>', "\n## ")
  text = text.replace('<h3>', "\n### ")
  text = text.replace('<h4>', "\n#### ")
  text = text.replace('</h1>', "")
  text = text.replace('</h2>', "")
  text = text.replace('</h3>', "")
  text = text.replace('</h4>', "")
  text = text.strip()
  return text

def just2LR(text):
  pattern = r"\n{2,}"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  less = pat.sub(r'\n\n', text)
  pattern = " {2,}"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  less = pat.sub(' ', less)
  return less

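def lowercaseTags(text):
  pattern = r"<(/?[a-zA-Z0-9]+)>"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  # Lowercase the matched tag itself; calling .lower() on the
  # replacement template r'<\1>' would leave the tag unchanged.
  lowered = pat.sub(lambda m: m.group(0).lower(), text)
  return lowered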

def stripParams(text):
  pattern = r"(<\s*[a-zA-Z0-9]+).*?(?:>)"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  nopram = pat.sub(r'\1>', text)
  return nopram

def listNuker(text):
  pattern = r"<\s*(ol|ul)\s*.*?>.*?<\s*/(ol|ul)\s*>"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  listless = pat.sub('', text)
  pattern = r"<\s*li\s*.*?>.*?<\s*/li\s*>"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  listless = pat.sub('', listless)
  pattern = r"(<\s*(li|ol|ul)\s*.*?>)|(<\s*/(li|ol|ul)\s*>)"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  listless = pat.sub('', listless)
  return listless

def noTag(text, tag):
  pattern = r"(<\s*%s\s*.*?>)|(<\s*/%s\s*>)" % (tag, tag)
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  tagless = pat.sub('', text)
  return tagless

def noTagBlock(text, tag):
  pattern = r"<\s*%s\s*.*?>.*?<\s*/%s\s*>" % (tag, tag)
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  byeblock = pat.sub('', text)
  return byeblock

def noComments(text):
  pattern = r"<!--.*?-->"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  byeblock = pat.sub('', text)
  return byeblock

def singleizer(text):
  pattern = r"\t|\r"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  singled = pat.sub('', text)
  pattern = r"^.{,30}$"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL | re.MULTILINE)
  singled = pat.sub('', singled)
  pattern = r"\n{2,}"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  singled = pat.sub(r'\n', singled)
  pattern = r"\n.{,10}\n"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  singled = pat.sub(r'\n', singled)
  pattern = r"^\n|\n$"
  pat = re.compile(pattern, re.IGNORECASE | re.DOTALL)
  singled = pat.sub('', singled)
  return singled

def convert_html_entities(s):
  matches = re.findall("&#\d+;", s)
  if len(matches) > 0:
    hits = set(matches)
    for hit in hits:
      name = hit[2:-1]
      try:
        entnum = int(name)
        s = s.replace(hit, unichr(entnum))
      except ValueError:
        pass

  matches = re.findall("&#[xX][0-9a-fA-F]+;", s)
  if len(matches) > 0:
    hits = set(matches)
    for hit in hits:
      hex = hit[3:-1]
      try:
        entnum = int(hex, 16)
        s = s.replace(hit, unichr(entnum))
      except ValueError:
        pass

  # Named entities such as &eacute;. "&amp;" is replaced last so that
  # text like "&amp;lt;" is not decoded twice.
  matches = re.findall(r"&\w+;", s)
  hits = set(matches)
  amp = "&amp;"
  if amp in hits:
    hits.remove(amp)
  for hit in hits:
    name = hit[1:-1]
    if name in htmlentitydefs.name2codepoint:
      s = s.replace(hit, unichr(htmlentitydefs.name2codepoint[name]))
  s = s.replace(amp, "&")
  return s
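
On Python 3, the entity handling in `convert_html_entities` is covered by the standard library. A minimal sketch, with sample strings made up for illustration:

```python
import html

# Decimal, hexadecimal, and named entities are all handled by html.unescape.
samples = ["Tom &amp; Jerry", "caf&eacute;", "&#8220;quoted&#8221;", "&#x20AC;5"]
decoded = [html.unescape(s) for s in samples]
for text in decoded:
    print(text)
```

`html.unescape` is available from Python 3.4 onward. The Python 2 code above would still need its own implementation or the `HTMLParser` module.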

In [11]:
md = markdown(html)

In [12]:
print(md)

As professionals around the world feel increasingly pressed for time, they’re giving up on things that matter to them. A recent HBR article noted that in surveys, most people “could name several activities, such as pursuing a hobby, that they’d like to have time for.” This is more significant than it may sound, because it isn’t just individuals who are missing out. When people don’t have time for hobbies, businesses pay a price. Hobbies can make employees substantially better at their jobs for three reasons: they reawaken your creativity, give you a fresh perspective, and bolster your confidence.

As professionals around the world feel increasingly pressed for time, they’re giving up on things that matter to them. A recent HBR article noted that in surveys, most people “could name several activities, such as pursuing a hobby, that they’d like to have time for.”

This is more significant than it may sound, because it isn’t just individuals who are missing out. When people don’t have tim