### So you've spent a lot of time writing useful code... Now what?

**BEFORE:**

In [8]:
from bs4 import BeautifulSoup
import requests

artist = 'ed-sheeran'
url = "https://www.metrolyrics.com/" + artist + "-lyrics.html"

response = requests.get(url)

soup = BeautifulSoup(markup=response.text)

links = []
for td in soup.find_all('td'):
    if td.a is not None:
        links.append(td.a.get('href'))

lyrics = []
for li in links[:2]:
    response = requests.get(li)
    soup = BeautifulSoup(markup=response.text)
    lyrics_section = soup.find(attrs={'id':'lyrics-body-text'})
    lyrics_chunk = []
    for verse in lyrics_section.find_all('p', class_='verse'):
        lyrics_chunk.append(verse.text)
        
    lyrics.append((' '.join(lyrics_chunk), 'ed sheeran'))

---
### Now what?
- Make sure your notebook can run from top to bottom.
- Move imports to the top.
- Start wrapping your code into **FUNCTIONS**!!!
    - make the code generalizable. 
    - make your code more dynamic / less hard-coded.
- Make each function do a single thing, e.g.:
    - getting links
    - extracting some text from links
    - cleaning some piece of text
    - saving something to disk
- For each function, do the following:
    - Give it a descriptive name that starts with a verb, e.g. `get_links()`, `clean_text()`.
    - Decide which arguments the function should accept (i.e. what goes inside the `()`).
    - Decide what the function should return.
    - Add a docstring.
    - Add type annotations (optional, but encouraged).
    - GENERALIZE!! Do not hardcode anything. Values such as the artist name or the number of songs should be passed as arguments, so users of the function can control its behavior.
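
As a minimal sketch of the checklist above, here is one small single-purpose function with a verb-first name, a docstring, type annotations, and nothing hardcoded. `clean_text` and its `lowercase` argument are hypothetical names chosen for illustration; they are not part of the scraper in this notebook.

```python
import re


def clean_text(text: str, lowercase: bool = True) -> str:
    """Collapse runs of whitespace in `text` and optionally lowercase it."""
    # Hypothetical helper for illustration only.
    cleaned = re.sub(r"\s+", " ", text).strip()
    return cleaned.lower() if lowercase else cleaned


print(clean_text("  Hello\n\nWorld  "))  # hello world
```

Because `lowercase` is an argument rather than a fixed behavior, the caller decides whether casing is preserved.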
    

In [32]:
from typing import List, Tuple

import requests
from bs4 import BeautifulSoup

In [44]:
def collect_song_links(artist: str, site: str = 'metrolyrics') -> Tuple[List[str], str]:
    """
    Given an artist name and a site name, collect the links to all of
    the artist's songs and return them with the URL-style artist name.
    """

    links = []
    if site == 'metrolyrics':
        artist = artist.replace(' ', '-')
        url = "https://www.metrolyrics.com/" + artist + "-lyrics.html"
        response = requests.get(url)
        if response.status_code != 200:
            # Stop here instead of parsing an error page.
            print('Sorry try again.')
            return links, artist

        # Name the parser explicitly so the result does not depend on what is installed.
        soup = BeautifulSoup(markup=response.text, features='html.parser')
        for td in soup.find_all('td'):
            if td.a is not None:
                links.append(td.a.get('href'))

    elif site == 'lyrics':
        print('Feature not available.')

    return links, artist

In [69]:
def get_songs_lyrics(links: List[str], artist_name: str, num: int) -> List[Tuple[str, str]]:
    """Given a list of song URLs, return (lyrics, artist_name) tuples for the first `num` songs."""

    lyrics = []
    for li in links[:num]:
        response = requests.get(li)
        soup = BeautifulSoup(markup=response.text, features='html.parser')
        lyrics_section = soup.find(attrs={'id': 'lyrics-body-text'})
        if lyrics_section is None:
            # Skip pages that have no lyrics section instead of crashing.
            continue
        lyrics_chunk = []
        for verse in lyrics_section.find_all('p', class_='verse'):
            lyrics_chunk.append(verse.text)

        lyrics.append((' '.join(lyrics_chunk), artist_name))

    return lyrics

### As a Bonus, wrap all your functions into a single main function at the end:

In [58]:
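def main(artist: str, num: int) -> List[Tuple[str, str]]:
    """Collect an artist's song links, then return the lyrics of the first `num` songs."""

    links, artist = collect_song_links(artist)
    results = get_songs_lyrics(links, artist, num)

    return results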

**AFTER**
- One single function:
    - composed of the 2 custom functions we wrote, which talk to each other.
    - The second function accepts as arguments what the first one returns.

In [71]:
# main('ed sheeran', 2)

---
---

### Other funky function stuff:

In [102]:
def sum_nums(a, b=1):
    """Default args: b falls back to 1 when the caller omits it."""
    return a + b

In [103]:
sum_nums(5)

6

##### Args and Kwargs!

In [104]:
def sum_nums(*args):
    """Useful for when you don't know how many arguments a user might supply"""

    return sum(args)

In [105]:
sum_nums(1, 2, 3, 4, 5, 6, 7, 8, 9)

45

In [90]:
from sklearn.feature_extraction.text import CountVectorizer

In [106]:
def vectorize_text(some_text, **kwargs):
    """kwargs is useful for when you don't know which of many potential
       keyword arguments a user might supply. Here they are forwarded
       unchanged to CountVectorizer; some_text is a list of strings."""

    cv = CountVectorizer(**kwargs)
    vec = cv.fit_transform(some_text)

    return vec

In [108]:
vectorize_text(['chicken chicken funky duck goose'], stop_words='english')

<1x4 sparse matrix of type '<class 'numpy.int64'>'
	with 4 stored elements in Compressed Sparse Row format>
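
To make the two cells above less magical, here is a small sketch of what a function actually receives: `*args` arrives as a tuple of the positional arguments and `**kwargs` as a dict of the keyword arguments. `describe_call` is a hypothetical helper written only for this illustration.

```python
def describe_call(*args, **kwargs):
    """Return the positional and keyword arguments exactly as Python packs them."""
    # Hypothetical helper: args is a tuple, kwargs is a dict.
    return args, kwargs


positional, keyword = describe_call(1, 2, stop_words="english")
print(positional)  # (1, 2)
print(keyword)     # {'stop_words': 'english'}
```

This is why `CountVectorizer(**kwargs)` works in `vectorize_text`: the dict is unpacked back into keyword arguments when it is passed along.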

##### Super-short functions:
- A lambda (anonymous function) can be a better fit, especially when the function is only used once.


In [96]:
def square(x):
    return x ** 2

In [98]:
[square(x) for x in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

In [101]:
[(lambda x: x**2)(x) for x in range(10)]

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
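
A lambda tends to pay off most when it is handed straight to another function. The sketch below is an illustration with made-up data, not part of the original notebook: it passes a lambda as the `key` argument of the built-in `sorted`.

```python
words = ["duck", "chicken", "goose"]

# Sort by each word's last letter; the lambda is used once and never needs a name.
by_last_letter = sorted(words, key=lambda word: word[-1])
print(by_last_letter)  # ['goose', 'duck', 'chicken']
```

If the same logic is needed in more than one place, a regular `def` with a name and a docstring is usually the clearer choice.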