# Functions & Downloads

Today we'll learn how to write more efficient code using ```functions``` and then apply them toward scraping/downloading documents from the Web.

## Functions: Making Your Code Work Smarter

A **function** is a reusable block of code that performs a specific task. Instead of writing the same code over and over, you write it once in a function and call it whenever you need it.

Functions help you:
* **Stay DRY** (Don't Repeat Yourself) – write code once, use it many times
* **Organize your code** – break complex problems into smaller, manageable pieces
* **Make debugging easier** – fix issues in one place instead of hunting through repeated code
* **Build modular programs** – combine simple functions to accomplish complex tasks

A function runs only when something "invokes" or "calls" it, giving you control over when and how your code executes.


In [3]:
## You've already been using functions
print("Hello World!")

Hello World!


In [4]:
## define a function that says "Hello World"
def sayHello():
    print("Hello World!")

In [5]:
## invoke" or "call" the function
sayHello()

Hello World!


In [6]:
## Let's create one that prints a name
def sayHello():
    print("Hello Sandeep, hope you're having a good day!")

In [7]:
## what if we want a different name?
def sayHello2():
    print("Hello Frank, hope you're having a good day!")

In [8]:
sayHello2()

Hello Frank, hope you're having a good day!


In [9]:
## instead of writing a different function for different names, we add a parameter:
## rewrite so that can automatically input a name
def sayHello(name):
    print(f"Hello {name}, hope you are having a great day!")

In [10]:
## Call updated function
sayHello("Sami")

Hello Sami, hope you are having a great day!


In [11]:
## Build a function called addNumbers that adds any 2 numbers together and prints:
## "The total is (whatever the number is)"!
## built it here
def addNumbers(number1, number2):
    total=number1 + number2
    print(f"The sum of {number1} and {number2} is {total}")

In [12]:
## Call myFunction using 4 and 5 as the arguments
addNumbers(4,5)

The sum of 4 and 5 is 9


In [13]:
## Call myFunction using 2 strings instead of numbers.
addNumbers("Sandeep", "Lucy")

The sum of Sandeep and Lucy is SandeepLucy


In [14]:
## function that shows relationship between parameters and arguments
## printing people's name and age

def aging(name, age):
    print(f"{name} is {age}-years-old")

In [15]:
## invoke function
aging("Sandeep", 59)

Sandeep is 59-years-old


In [16]:
## note that if we put in a different order
aging(59, "Sandeep")

59 is Sandeep-years-old


In [17]:
## So far we have only printed out values processed by a function.  But we really want to retain the values created by a function.
## A function that adds two numbers together and prints the value:
addNumbers(10, 20)

The sum of 10 and 20 is 30


In [18]:
## let's try to save it in  a variable
my_result = addNumbers(10, 20)

The sum of 10 and 20 is 30


In [19]:
## Print or call your variable. What does it hold?
print(my_result)

None


In [20]:
## type of data calculated


In [21]:
## Tweak out function by adding return statement
## instead of printing a value we want to return a value(or values)
def addNumbers(number1, number2):
    print(f"The sum of {number1} and {number2} is {number1+number2}")
    return number1 + number2



In [22]:
## call the function add_numbers_ret
## and store in variable
my_result = addNumbers(10,30)

The sum of 10 and 30 is 40


In [23]:
## print
print(my_result)

40


In [24]:
## What type?
type(my_result)

int

In [25]:
def pctChg(old_value, new_value):
    return(new_value - old_value)/old_value * 100
    

In [26]:
pctChg(50,100)

100.0

In [27]:
pctChg(100,50)

-50.0

In [28]:
## Docstring is a way to know what function does
## brief summary of what the function does
## a description of each parameter and their expected data types
## 

def pctChg(old_value, new_value):
    '''
    This function takes two numbers and returns the percent change
    para1 = is the older number
    para2 = is the newer number
    '''
    return (new_value - old_value)/old_value * 100


In [29]:
## run function
pctChg(75,100)

33.33333333333333

In [30]:
## if you hit shift+tab after the function so for example pctChg then you will see the docstring

### Build a credit check function

Recall that we built a conditional expression to evaluate credit rating based on these values:

- 300-579: Poor
- 580-669: Fair
- 670-739: Good
- 740-799: Very good
- 800-850: Excellent

Now build a function that evaluates a score to return a rating, but also prints "Your credit rating is **whatever**!"

## here is the expression we previously built:

```python
if credit <= 579:
  print(f"Your credit of {credit} is poor")
elif 579 < credit <= 669:
  print(f"Your credit of {credit} is fair")
elif 670 < credit <= 739:
  print(f"Your credit of {credit} is good")
elif 740 <= credit <= 799:
  print(f"Your credit of {credit} is very good")
else:
  print(f"Your credit of {credit} is excellent")

```

Can you think of a way to make it more efficient and DRY.

In [74]:
## Code your function here

def eval_credit_score(score):
    ## Do a docstring to explain what the function does
    ## Not sure why docstring didn't work when I did it with six apostrophes
    '''
    This function evaluates the rating of a credit score.
    Prints the score too based on the below ratings.

    Ratings:
    300-579: Poor
    580-669: Fair
    670-739: Good
    740-799: Very good
    800-850: Excellent
    '''
    ## Evaluate the rating
    if 300 <= score <= 579:
      rating = "poor"
    elif 579 < score <= 669:
      rating = "fair"
    elif 670 < score <= 739:
      rating = "good"
    elif 740 <= score <= 799:
      rating = "very good"
    else:
      rating = "excellent"

    ## State the message
    print(f"Your credit score of {score} is {rating}!")
    



In [78]:
eval_credit_score(602)

Your credit score of 602 is fair!


## Why functions are awesome

For functions I use regularly, I can simply import them and reduce time spent copy-n-pasting, or rewriting same functions.

#### (The next few steps are demo only)

In [35]:
pip install git+https://github.com/sandeepmj/my_functions.git

Collecting git+https://github.com/sandeepmj/my_functions.git
  Cloning https://github.com/sandeepmj/my_functions.git to /private/var/folders/rh/0500xmc965b68qdvl6txhdjc0000gn/T/pip-req-build-tfxlvq2i
  Running command git clone --filter=blob:none --quiet https://github.com/sandeepmj/my_functions.git /private/var/folders/rh/0500xmc965b68qdvl6txhdjc0000gn/T/pip-req-build-tfxlvq2i
  Resolved https://github.com/sandeepmj/my_functions.git to commit d74bfc7c105c92892e7a5e0d5e98b711ab6f014b
  Preparing metadata (setup.py) ... [?25ldone
Note: you may need to restart the kernel to use updated packages.


In [36]:
## import my_fuctions
##from_my_functions import timer


In [37]:
## writr a for loop that counts to 5 and print the number
## add a timer
## time(10,20)

In [38]:
## call timer
## for number in range(1,5):
    ## print(


### Your Challenge

Write a function that makes a request to a website and returns the content as soup.

In [40]:
## write code here

## dependencies
import requests
from bs4 import BeautifulSoup

## function
def makeSoup(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'} 
    response = requests.get(url, headers = headers)
    if response.status_code == 200:
        return BeautifulSoup(response.text, "html.parser")
    else:
        print(f"Your request returned {response.status_code}")



In [41]:
## place url into a variable
## typed it separately rather than putting into parameter so it is reusable
url = "https://sandeepmj.github.io/scrape-example-page/demo-text.html"

In [42]:
## invoke your function here
makeSoup(url)

<!DOCTYPE html>

<html lang="en">
<head>
<title>title tag</title>
<style>
body {padding: 20px; max-width: 700px; margin: 0 auto;}
</style>
</head>
<body>
<h1 class="title"><b>The title headline is Demo for BeautifulSoup</b></h1>
<p>Learning to scrape using BeautifulSoup.</p>
<div class="content article">
<section>
<p>Here's some pretty useless info:</p>
</section>
<section class="main" id="all_plants">
<h2 class="subhead" id="vegitation">Plants</h2>
<p class="article">Three plants that thrive in deep shade:</p>
<ol>
<li><a class="plants life" href="http://example.com/plant1" id="plant1">Plant 1</a>: <span class="cost">$10</span></li>
<li><a class="plants life" href="http://example.com/plant2" id="plant2">Plant 2</a>: <span class="cost">$20</span></li>
<li><a class="plants life" href="http://example.com/plant3" id="plant3">Plant 3</a> <span class="cost">$30</span></li>
</ol>
</section>
<section class="main" id="all_animals">
<h2 class="subhead" id="creatures">Animals</h2>
<p class="arti

In [43]:
##base_url = "https://bestsellingalbums.org/decade/2010-"

##for url_number in range(2,4):
    ##url = f"{base_url}{url_number}"
    ##print(makeSoup(url))
    ##timer(5,10)
    ##print("\n\n\n\n\n\n\n\n\n\n\n")

## Scraping/Downloading Web Documents

You want to create a dataset that tracks how many companies the <a href="https://www.sec.gov/litigation/suspensions.shtml">SEC suspended</a> between 2024 and 2004 (and for what reasons).

We want to write a scraper that aggregates:

* Date of suspension
* Company name
* Order
* Release (the PDFs in the XX-YYYYY format

The challenge? All that info is held in the PDFs.

We will need to download all the PDFs before we can analyze the info.

## Practice Site

We'll practice the required techniques <a href="https://sandeepmj.github.io/scrape-example-page/pages.html">on this demo site</a> by:

1. downloading all ```txt``` files.
2. downloading all ```pdf``` files.
3. Perhaps all files at one time.

In [45]:
import time
from random import uniform

In [46]:
## command, shift, u to inpsect what is on the website?

In [47]:
## make soup of the site
url = "https://sandeepmj.github.io/scrape-example-page/pages.html"
soup = makeSoup(url)
soup

<html lang="en">
<head>
<!-- Makes the page responsive and scaled to be read easily -->
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<!-- Links to stylesheet -->
<link href="style.css" rel="stylesheet" type="text/css"/>
<!-- Remember to update page title -->
<title>List of Documents</title>
</head>
<body>
<!-- All content goes here -->
<div class="container">
<h1>Documents to Download</h1>
<li>Junk Li <a href="">tag 1</a></li>
<li>Junk Li <a href="">tag 2</a></li>
<ul class="txts downloadable">
<p class="pages">Download this first set of text documents</p>
<li>Text Document <a href="files/text_doc_01.txt">1</a> </li>
<li>Text Document <a href="files/text_doc_02.txt">2</a></li>
<li>Text Document <a href="files/text_doc_03.txt">3</a></li>
<li>Text Document <a href="files/text_doc_04.txt">4</a></li>
<li>Text Document <a href="files/text_doc_05.txt">5</a></li>
<li>Text Document <a href="files/text_doc_06.txt">6</a></li>
<li>Text Document <a href="files/text_doc_07.

In [48]:
## make soup of the page
soup

<html lang="en">
<head>
<!-- Makes the page responsive and scaled to be read easily -->
<meta content="width=device-width, initial-scale=1" name="viewport"/>
<!-- Links to stylesheet -->
<link href="style.css" rel="stylesheet" type="text/css"/>
<!-- Remember to update page title -->
<title>List of Documents</title>
</head>
<body>
<!-- All content goes here -->
<div class="container">
<h1>Documents to Download</h1>
<li>Junk Li <a href="">tag 1</a></li>
<li>Junk Li <a href="">tag 2</a></li>
<ul class="txts downloadable">
<p class="pages">Download this first set of text documents</p>
<li>Text Document <a href="files/text_doc_01.txt">1</a> </li>
<li>Text Document <a href="files/text_doc_02.txt">2</a></li>
<li>Text Document <a href="files/text_doc_03.txt">3</a></li>
<li>Text Document <a href="files/text_doc_04.txt">4</a></li>
<li>Text Document <a href="files/text_doc_05.txt">5</a></li>
<li>Text Document <a href="files/text_doc_06.txt">6</a></li>
<li>Text Document <a href="files/text_doc_07.

In [49]:
## narrow to class
targets = soup.find_all("ul", class_="txts")
targets

[<ul class="txts downloadable">
 <p class="pages">Download this first set of text documents</p>
 <li>Text Document <a href="files/text_doc_01.txt">1</a> </li>
 <li>Text Document <a href="files/text_doc_02.txt">2</a></li>
 <li>Text Document <a href="files/text_doc_03.txt">3</a></li>
 <li>Text Document <a href="files/text_doc_04.txt">4</a></li>
 <li>Text Document <a href="files/text_doc_05.txt">5</a></li>
 <li>Text Document <a href="files/text_doc_06.txt">6</a></li>
 <li>Text Document <a href="files/text_doc_07.txt">7</a></li>
 <li>Text Document <a href="files/text_doc_08.txt">8</a></li>
 <li>Text Document <a href="files/text_doc_09.txt">9</a></li>
 <li>Text Document <a href="files/text_doc_10.txt">10</a></li>
 </ul>,
 <ul class="txts downloadable">
 <p class="pages">Download this second set of text documents</p>
 <li>Text Document <a href="files/text_doc_A.txt">1</a> </li>
 <li>Text Document <a href="files/text_doc_B.txt">2</a></li>
 <li>Text Document <a href="files/text_doc_C.txt">3</a

In [50]:
## narrow to links
atags = [atag.find_all("a") for atag in targets]
atags


[[<a href="files/text_doc_01.txt">1</a>,
  <a href="files/text_doc_02.txt">2</a>,
  <a href="files/text_doc_03.txt">3</a>,
  <a href="files/text_doc_04.txt">4</a>,
  <a href="files/text_doc_05.txt">5</a>,
  <a href="files/text_doc_06.txt">6</a>,
  <a href="files/text_doc_07.txt">7</a>,
  <a href="files/text_doc_08.txt">8</a>,
  <a href="files/text_doc_09.txt">9</a>,
  <a href="files/text_doc_10.txt">10</a>],
 [<a href="files/text_doc_A.txt">1</a>,
  <a href="files/text_doc_B.txt">2</a>,
  <a href="files/text_doc_C.txt">3</a>,
  <a href="files/text_doc_D.txt">4</a>,
  <a href="files/text_doc_E.txt">5</a>,
  <a href="files/text_doc_F.txt">6</a>,
  <a href="files/text_doc_G.txt">7</a>,
  <a href="files/text_doc_H.txt">8</a>,
  <a href="files/text_doc_I.txt">9</a>,
  <a href="files/text_doc_J.txt">10</a>]]

In [51]:
## flatter our nested list
## we can itertools to flatten a list of list
##flattened_list = list(chain.from_iterable(list_of_lists))

In [52]:
## import itertools
from itertools import chain

In [53]:
## flatten using itertools
flat_targets = list(chain.from_iterable(atags))
flat_targets

[<a href="files/text_doc_01.txt">1</a>,
 <a href="files/text_doc_02.txt">2</a>,
 <a href="files/text_doc_03.txt">3</a>,
 <a href="files/text_doc_04.txt">4</a>,
 <a href="files/text_doc_05.txt">5</a>,
 <a href="files/text_doc_06.txt">6</a>,
 <a href="files/text_doc_07.txt">7</a>,
 <a href="files/text_doc_08.txt">8</a>,
 <a href="files/text_doc_09.txt">9</a>,
 <a href="files/text_doc_10.txt">10</a>,
 <a href="files/text_doc_A.txt">1</a>,
 <a href="files/text_doc_B.txt">2</a>,
 <a href="files/text_doc_C.txt">3</a>,
 <a href="files/text_doc_D.txt">4</a>,
 <a href="files/text_doc_E.txt">5</a>,
 <a href="files/text_doc_F.txt">6</a>,
 <a href="files/text_doc_G.txt">7</a>,
 <a href="files/text_doc_H.txt">8</a>,
 <a href="files/text_doc_I.txt">9</a>,
 <a href="files/text_doc_J.txt">10</a>]

In [54]:
## capture links, add base urls
base_url = "https://sandeepmj.github.io/scrape-example-page/"

urls = [base_url + item.get("href") for item in flat_targets]
urls

['https://sandeepmj.github.io/scrape-example-page/files/text_doc_01.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_02.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_03.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_04.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_05.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_06.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_07.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_08.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_09.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_10.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_A.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_B.txt',
 'https://sandeepmj.github.io/scrape-example-page/files/text_doc_C.txt',
 'https://sandeepmj.github.io/scrape-exam

In [55]:
pip install wget

Note: you may need to restart the kernel to use updated packages.


In [56]:
## import wget
import wget

In [57]:
## download all
for i, link in enumerate(urls, start  = 1):
    print(f"Downloading link {i} of {len(urls)}")
    wget.download(link)
    timer(10,20)

Downloading link 1 of 20
100% [................................................................] 76 / 76

NameError: name 'timer' is not defined