# All material ©2019, Alex Siegman


-----

## Hello! And welcome to Python 101, Part 2

<br>

The goal of these notebooks is simple: to help you secure a foothold in what is otherwise the very daunting climb towards learning a new programming language (and, of course, to have fun!)

## In Part 1 we learned about: 

1. The different ways we can work with (and manipulate) words and numbers in Python.
2. Lists in Python. 
3. 'If' and 'While' statements in Python. 

## In Part 2 we will learn about: 

1. Libraries in Python. 
2. Functions in Python. 

### With that, let's get started!

---

### Today we are going to be mimicking this brilliant and simple bot, @TwoHeadlines: <br> https://twitter.com/twoheadlines?lang=en

<br>

The concept is simple. It takes two different headlines from two different outlets via their RSS feeds (which we'll go over in a moment) and combines them to produce often comical and almost always nonsensical news headlines.

<br>

The first thing we must do to create our own TwoHeadlines bot is import a few libraries. 

### Libraries in Python are collections of functions and methods that allow you to perform various actions without writing the underlying code yourself.

<br>

For instance, in our Two Headlines bot we are going to use: 

#### Feedparser: a library that will allow us to read various RSS feeds (again, we'll get to RSS in a moment)<br>
https://pythonhosted.org/feedparser/introduction.html

#### Random: a library that will allow us to generate random numbers <br> 
https://docs.python.org/2/library/random.html

#### Time: a library that will allow us to work around traditionally tricky time functions <br>
https://docs.python.org/2/library/time.html

<br>

Thus, your first lines of code will look as follows: <br>

In [1]:
import feedparser
import random
import time
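To get a feel for what `random` and `time` can do, here is a small sketch. The list `outlets` is made up purely for illustration; it is not part of the bot itself.

```python
import random
import time

outlets = ['NYT', 'ESPN', 'CNN']  # hypothetical list, just for illustration

pick = random.choice(outlets)     # choose one item from the list at random
roll = random.randint(1, 6)       # random whole number from 1 to 6, inclusive

start = time.time()               # the current time, in seconds
time.sleep(0.1)                   # pause the program for a tenth of a second
elapsed = time.time() - start     # roughly how long the pause took

print(pick, roll, round(elapsed, 1))
```

Because the choices are random, your output will probably differ each time you run the cell.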

<br> Great! Now, we want to begin by defining our function. <br>

#### Remember, functions come in handy when you want to repeat the same task many times using the same _type_ of input. <br>

In [15]:
# For example

def printSentence(sentence):
    print(sentence + " Plus a new sentence.")

In [16]:
printSentence("This is the sentence I want to print.")

This is the sentence I want to print. Plus a new sentence.
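Functions can also hand a result back with `return`, so you can store it in a variable and reuse it. A minimal sketch, where the function name `add_ending` and the sentences are invented for this example:

```python
def add_ending(sentence, ending):
    # build the new sentence and hand it back to whoever called the function
    return sentence + " " + ending

# the same function, reused on two different inputs of the same type (strings)
first = add_ending("I like pasta.", "A lot.")
second = add_ending("I like pizza.", "Even more.")

print(first)   # I like pasta. A lot.
print(second)  # I like pizza. Even more.
```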


#### In this case, we will call our function 'TwoHeadlines' 


In [17]:
def TwoHeadlines(): # we are leaving the input blank for now, and you'll see why in a moment
    pass # this 'pass' is here just to avoid an error as we work on our function. To see what happens without it, 
         # try removing the 'pass' line and see the error you receive.

## Now, for some RSS feeds! 

RSS (Rich Site Summary, or Really Simple Syndication) allows us to access a website's content in a standardized, machine-readable format.

To best understand RSS, take a look at the following examples: 

http://www.wsj.com/public/page/rss_news_and_feeds.html <br>
https://archive.nytimes.com/www.nytimes.com/services/xml/rss/index.html <br>
http://rss.cnn.com/rss/cnn_topstories.rss <br>

In [27]:
# to see how you can actually pull these RSS feeds using Python, we're going to rely on feedparser.
# as an example, let's pull two feeds.

# note that we first set a variable equal to the desired url for the desired RSS feed. 
# then, we use feedparser to store that information into a new variable.

nyt_rss_url = 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml' # find the desired rss feed
espn_rss_url = 'https://www.espn.com/espn/rss/news' # find a second desired rss feed

nyt_feed = feedparser.parse(nyt_rss_url) # use feedparser to, well, parse the feed
espn_feed = feedparser.parse(espn_rss_url) # use feedparser to, well, parse the feed

## Next, we need to get a bit creative:

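First and foremost, we don't want that entire RSS feed; we just want the headline for the latest article! If you print the whole feed, you get far more than you need (the output below is cut off), so we then loop over the first ten entries and print only their titles: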

In [29]:
print(nyt_feed) # print the full RSS feed

{'feed': {'title': 'NYT > Top Stories', 'title_detail': {'type': 'text/plain', 'language': None, 'base': 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml', 'value': 'NYT > Top Stories'}, 'links': [{'rel': 'alternate', 'type': 'text/html', 'href': 'https://www.nytimes.com?emc=rss&partner=rss'}, {'href': 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml', 'rel': 'self', 'type': 'application/rss+xml'}], 'link': 'https://www.nytimes.com?emc=rss&partner=rss', 'subtitle': '', 'subtitle_detail': {'type': 'text/html', 'language': None, 'base': 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml', 'value': ''}, 'language': 'en-us', 'rights': 'Copyright 2019 The New York Times Company', 'rights_detail': {'type': 'text/plain', 'language': None, 'base': 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml', 'value': 'Copyright 2019 The New York Times Company'}, 'updated': 'Mon, 01 Jul 2019 19:53:51 +0000', 'updated_parsed': time.struct_time(tm_year=2019, tm_mon=7, t

In [33]:
for i in range(0,10): # for the first ten entries in the RSS feed (the ten most recent stories)
    print(nyt_feed['entries'][i]['title']) # print the title of said article

Angry Core of Hong Kong Protesters Storms Legislature, Dividing the Movement
Hong Kong Protest Live Updates: Police Disperse Protesters Outside Legislative Building
On Hong Kong Handover Anniversary, Many Fear Loss of Freedoms
Iran Breaches Critical Limit on Nuclear Fuel Under 2015 Deal
In New Talks, U.S. May Settle for a Nuclear Freeze by North Korea
Ivanka Trump Tests Her Diplomatic Chops and Riles a Legion of Critics
Pete Buttigieg Raised $24.8 Million in Second Quarter, His Campaign Says
Consumers Are Spending. Businesses Aren’t. Who’s Right About the Future?
S&P and Dow Follow Global Markets Higher, as Investors Take Heart in Trade Thaw
Inside the Migrant Detention Center in Clint, Tex.


## But how did we know to use "['entries'][i]['title']"?

### To understand, we need to briefly delve into the world of dictionaries 

In [47]:
dictionary = {'favorite_food':'pasta'} # create a new dictionary 

# if it helps, think of 'favorite_food' as the word and 'pasta' as the definition

In [48]:
print(dictionary['favorite_food'])

# we then call 'favorite_food' and get the "definition"
# in reality, this is known as a Key:Value pair, with "Key" being the word, and "Value" being the definition

pasta
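A dictionary can hold many key:value pairs at once. Here is a short sketch with made-up values (the name `favorites` and its contents are only for illustration):

```python
favorites = {'favorite_food': 'pasta', 'favorite_sport': 'tennis'}  # made-up values

favorites['favorite_color'] = 'green'   # add a new key:value pair
print(favorites['favorite_sport'])      # tennis
print(list(favorites.keys()))           # every key (every "word") in the dictionary
print(favorites.get('favorite_band'))   # None, because that key doesn't exist
```

Asking for a missing key with square brackets raises an error, which is why `.get()` is handy when you aren't sure a key is there.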


#### As you may be able to see, our RSS is actually formatted quite cleverly. It is a dictionary (a set of key-value pairs) that includes lists.

#### For example, look at the very top of the feed. It starts 

#### {'feed': {'title': 'NYT > Top Stories',

#### The best way to read this is: the first key in the dictionary is 'feed', and the value paired with that key is itself another dictionary (you can tell because it begins with a '{').

#### The first key inside that inner dictionary is 'title', and its value is the text 'NYT > Top Stories'.

#### If we keep searching, we'll see that each headline lives under 'entries' (a list with one dictionary per story) and is paired with the key 'title'

#### I know this is all exceptionally confusing, but just bear with me. The more you practice parsing information from RSS feeds (or XML and HTML in general) the easier it will become, I promise!

#### So, if we want that headline, and that headline only, we are going to: 

1. Navigate to the entire RSS feed
2. Navigate to the 'entries' section
3. Navigate to the first item in 'entries' (each story has its own entry, and we want the first headline)
4. Navigate to the 'title' of that entry
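The steps above can be sketched on a tiny, hand-made dictionary. Note that `fake_feed` is an invented stand-in that only imitates the shape of a parsed feed; it is not real feedparser output.

```python
# a tiny, hand-made stand-in for a parsed feed (not real feedparser output)
fake_feed = {
    'feed': {'title': 'My Fake Outlet'},
    'entries': [
        {'title': 'First Headline', 'link': 'https://example.com/1'},
        {'title': 'Second Headline', 'link': 'https://example.com/2'},
    ],
}

entries = fake_feed['entries']     # step 2: the list of stories
first_entry = entries[0]           # step 3: the first story (a dictionary)
headline = first_entry['title']    # step 4: that story's title

print(headline)                            # First Headline
print(fake_feed['entries'][0]['title'])    # the same thing, all in one line
```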

<br>

## Now, back to replicating 'TwoHeadlinesBot'

In [49]:
my_list = [] # create a new, empty list called 'my_list'

for i in range(0,10): 
    my_list.append(nyt_feed['entries'][i]['title']) # append the first ten titles to this list

In [50]:
my_list[3] # select index 3 of that list (the fourth item, since counting starts at 0)

'Iran Breaches Critical Limit on Nuclear Fuel Under 2015 Deal'

In [51]:
Article4 = my_list[3]

In [52]:
Article4[:25] # get the first 25 characters of the title of the 3rd index (fourth article) in our list

'Iran Breaches Critical Li'

In [53]:
len(Article4) # how many characters long is our title? 

60

In [54]:
len(Article4)/2 # figure out the half-way point of the title 

30.0

In [55]:
Article4[0:30] # get the first half of our article title

'Iran Breaches Critical Limit o'

In [69]:
Article5 = my_list[4] # let's see what the next title is in our list
print(Article5)

In New Talks, U.S. May Settle for a Nuclear Freeze by North Korea


## So, how do we want to mash our headlines together?

In [71]:
nyt_first_story = nyt_feed['entries'][0]['title'] #Recall that '0' is actually the first instance
print(nyt_first_story)

Angry Core of Hong Kong Protesters Storms Legislature, Dividing the Movement


In [72]:
words = nyt_first_story.split(' ') # remember, I can split that single sentence into a list of individual words 
print(words) 

['Angry', 'Core', 'of', 'Hong', 'Kong', 'Protesters', 'Storms', 'Legislature,', 'Dividing', 'the', 'Movement']
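Splitting has a natural opposite: `' '.join(...)` glues a list of words back into one sentence. A quick sketch using a shortened headline (trimmed here just to keep the example small):

```python
headline = "Iran Breaches Critical Limit"   # a shortened headline, for illustration
words = headline.split(' ')                 # string -> list of words
print(words)                                # ['Iran', 'Breaches', 'Critical', 'Limit']

rebuilt = ' '.join(words)                   # list of words -> string again
print(rebuilt == headline)                  # True
```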


In [77]:
for i in range(0,10): 

    nyt_first_story = nyt_feed['entries'][i]['title'] # pull the title of the ith story in the first RSS feed
    espn_first_story = espn_feed['entries'][i]['title'] # pull the title of the ith story in the second RSS feed

    nyt_words = nyt_first_story.split(' ') # split the title by spaces (aka, make every word in the title its own item)
    espn_words = espn_first_story.split(' ') # split the title by spaces (aka, make every word in the title its own item)
    
# these prints sit outside the loop, so we only see the last (tenth) pair of titles
print(nyt_words) 
print(" --- ") # print a line for formatting purposes
print(espn_words)

['Inside', 'the', 'Migrant', 'Detention', 'Center', 'in', 'Clint,', 'Tex.']
 --- 
['The', 'legend', 'of', 'The', 'Martian,', 'the', "Yankees'", '$5', 'million,', '16-year-old', 'international', 'coup']


## Let's keep going. Remember, we want to take half of one headline and half of a different headline and mash them together. So, how do we get just the first or second half of a list of words?  <br>

In [80]:
for i in range(0,10): 

    nyt_first_story = nyt_feed['entries'][i]['title'] # pull the title of the ith story in the first RSS feed
    espn_first_story = espn_feed['entries'][i]['title'] # pull the title of the ith story in the second RSS feed

    nyt_words = nyt_first_story.split(' ') # split the title by spaces (aka, make every word in the title its own item)
    espn_words = espn_first_story.split(' ') # split the title by spaces (aka, make every word in the title its own item)

    nyt_words = nyt_words[:int(len(nyt_words)/2)] 
    espn_words = espn_words[int(len(espn_words)/2):]
    
print(nyt_words)
print(" --- ")
print(espn_words)

['Inside', 'the', 'Migrant', 'Detention']
 --- 
["Yankees'", '$5', 'million,', '16-year-old', 'international', 'coup']


### OK, so let's walk through that code: 

#### 1) First, the [: 
   

In [81]:
# a ':' before a number in square brackets means 'everything leading up to this position'. For instance: 

letters = ['a','b','c','d','e'] # named 'letters' so we don't overwrite Python's built-in 'list'
letters = letters[:3]
print(letters)

['a', 'b', 'c']


### In other words, we take everything leading up to (but not including!) index 3 of our list, which gives us the first three items.
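Moving the ':' to the other side of the number gives you the rest of the list instead. A small sketch of both directions:

```python
letters = ['a', 'b', 'c', 'd', 'e']

print(letters[:3])   # ['a', 'b', 'c']  -> indexes 0, 1 and 2
print(letters[3:])   # ['d', 'e']       -> index 3 through the end

# the two pieces fit back together into the original list
print(letters[:3] + letters[3:] == letters)   # True
```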

#### 2)  Next, the int converts the result of the division (a float, like 2.0) into an integer (like 2), because the position we use to cut a list must be a whole number.


In [89]:
len(nyt_words)/2 # the result is a float, which we don't want

2.0

In [92]:
int(len(nyt_words)/2) # this gives us an integer

2
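To see why the conversion matters, here is a sketch on a five-item list (the list `words` is made up). It also shows `//`, Python's floor-division operator, which is an alternative way to get a whole number; the notebook's own code sticks with `int(...)`.

```python
words = ['a', 'b', 'c', 'd', 'e']

half_float = len(words) / 2       # 2.5, a float
half_int = int(len(words) / 2)    # 2, the decimal part is dropped
half_floor = len(words) // 2      # 2, '//' divides and rounds down in one step

try:
    words[:half_float]            # cutting a list with a float is not allowed
except TypeError as error:
    print("TypeError:", error)

print(words[:half_int])           # ['a', 'b']
```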

#### 3)  len is a function that gives you the number of items in a list. For instance: 

In [83]:
letters = ['a','b','c','d','e'] # named 'letters' so we don't overwrite Python's built-in 'list'
len(letters)

5

#### 4) Finally, we are taking the total number of words in the headline and dividing by two

### In total, we are saying: "Take the headline, find out how many words are in the headline and divide by two. Then, take the first half of that headline and store it as the new headline." 

#### Note that while for the first headline we take the first half (by putting the ':' before the number inside the square brackets), for the second headline we take the second half (by putting the ':' after the number).
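One way to make the two halves easier to reuse is to wrap each slice in a small function. This is only a sketch: the names `first_half` and `second_half` are invented here and do not appear elsewhere in the notebook.

```python
def first_half(words):
    # everything before the midpoint
    return words[:int(len(words) / 2)]

def second_half(words):
    # everything from the midpoint onward
    return words[int(len(words) / 2):]

title_words = "Inside the Migrant Detention Center in Clint, Tex.".split(' ')
print(first_half(title_words))    # ['Inside', 'the', 'Migrant', 'Detention']
print(second_half(title_words))   # ['Center', 'in', 'Clint,', 'Tex.']

# with an odd number of items, the extra one lands in the second half
print(first_half(['a', 'b', 'c', 'd', 'e']))    # ['a', 'b']
print(second_half(['a', 'b', 'c', 'd', 'e']))   # ['c', 'd', 'e']
```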

<br> 

## Finally, we want to join the two halves of our headline and store it as the variable 'new_headline' 

In [93]:
for i in range(0,10): 

    nyt_first_story = nyt_feed['entries'][i]['title'] # pull the title of the ith story in the first RSS feed
    espn_first_story = espn_feed['entries'][i]['title'] # pull the title of the ith story in the second RSS feed

    nyt_words = nyt_first_story.split(' ') # split the title by spaces (aka, make every word in the title its own item)
    espn_words = espn_first_story.split(' ') # split the title by spaces (aka, make every word in the title its own item)

    nyt_words = nyt_words[:int(len(nyt_words)/2)] 
    espn_words = espn_words[int(len(espn_words)/2):]
    
    new_headline = nyt_words + espn_words # Take the first half of the first title and add the second half of the second title (still a list of words)
    new_headline = ' '.join(new_headline) # Join the words in that combined list into a single string, separated by spaces

    print(new_headline) # Print your newly created headline

Angry Core of Hong Kong changed the NBA
Hong Kong Protest Live Updates: this off, and what's next
On Hong Kong Handover Anniversary, 1st round at Wimbledon
Iran Breaches Critical Limit on acquire Whiteside from Heat
In New Talks, U.S. May Settle will remain with Warriors
Ivanka Trump Tests Her Diplomatic Chops Aho via rare offer sheet
Pete Buttigieg Raised $24.8 Million meet over Vegas issue
Consumers Are Spending. Businesses Aren’t. finalizing Russell acquisition
S&P and Dow Follow Global Markets Higher, Latest buzz, news and reports
Inside the Migrant Detention Yankees' $5 million, 16-year-old international coup
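Since this part is about functions, here is one possible sketch of how the steps above could be wrapped into a function. The name `mash_headlines` and its two parameters are assumptions made for this example, and it takes plain lists of titles rather than the live feeds so it runs without an internet connection.

```python
import random

def mash_headlines(first_titles, second_titles):
    # pick one title at random from each list and split it into words
    first = random.choice(first_titles).split(' ')
    second = random.choice(second_titles).split(' ')

    # first half of one title, second half of the other
    first_half = first[:int(len(first) / 2)]
    second_half = second[int(len(second) / 2):]

    # glue the words back together into one headline
    return ' '.join(first_half + second_half)

# with only one title in each list, the "random" pick is always the same
nyt_titles = ['Inside the Migrant Detention Center in Clint, Tex.']
espn_titles = ["The legend of The Martian, the Yankees' $5 million, 16-year-old international coup"]

print(mash_headlines(nyt_titles, espn_titles))
# Inside the Migrant Detention Yankees' $5 million, 16-year-old international coup
```

With longer lists of titles, each call would pair up a different, randomly chosen headline from each outlet.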
