![image](https://www.python.org/static/img/python-logo.png)
# AUP111-Fundamentals of Programming


# Week 17 - Advanced Programming: Basic Packages for Web Crawling
* URL tools (urllib)
* Databases (sqlite3)
* Web crawling
* Asynchronous execution (asyncio)





## Topic 1 - urllib
https://docs.python.org/3/howto/urllib2.html

urllib.request is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, the urlopen function, which can fetch URLs using many different protocols. It also offers a somewhat more complex interface for handling common situations such as basic authentication, cookies and proxies; these are handled by objects called handlers and openers.

In most cases urlopen is very easy to use, but when you run into errors or more complicated situations, you need some understanding of the HyperText Transfer Protocol (HTTP). The most comprehensive and authoritative reference is RFC 2616 (since superseded by RFC 9110 and its companion RFCs).

### Step 1: Fetch a resource from a URL

In [None]:
import urllib.request
with urllib.request.urlopen('http://www.asia.edu.tw/') as response:
    html = response.read()  # the body as raw bytes
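The `read()` call above returns raw bytes, not text. A minimal sketch of decoding them, assuming the server declares its encoding in the Content-Type header and falling back to UTF-8 when it does not; `fetch_text` is a hypothetical helper name, and the `data:` URL in the usage line is used only so the example runs without network access:

```python
from urllib.request import Request, urlopen

def fetch_text(url, fallback="utf-8", timeout=10):
    """Fetch url and decode the body with the charset the server declares."""
    # Some sites reject urllib's default User-Agent; this value is illustrative
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as response:
        # get_content_charset() reads e.g. "utf-8" from "text/html; charset=utf-8"
        charset = response.headers.get_content_charset() or fallback
        return response.read().decode(charset)

# A data: URL carries its own content, so this line needs no network access
print(fetch_text("data:text/plain;charset=utf-8,%E4%BA%9E%E6%B4%B2"))  # 亞洲
```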

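### Step 2: Exception handling
* URLError: raised when urlopen cannot get a response at all, for example because there is no network connection or the server does not exist.
* HTTPError: a subclass of URLError, raised when the server does respond but with an HTTP error status (for example 404 or 500).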



In [None]:
# Plan 1: catch HTTPError first, because it is a subclass of URLError
from urllib.request import Request, urlopen
from urllib.error import URLError, HTTPError

someurl = 'http://www.asia.edu.tw/'  # the URL to fetch
req = Request(someurl)
try:
    response = urlopen(req)
except HTTPError as e:
    print('The server couldn\'t fulfill the request.')
    print('Error code: ', e.code)
except URLError as e:
    print('We failed to reach a server.')
    print('Reason: ', e.reason)
else:
    # everything is fine
    html = response.read()
    response.close()

In [None]:
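# Plan 2: catch only URLError, then inspect it
from urllib.request import Request, urlopen
from urllib.error import URLError

someurl = 'http://www.asia.edu.tw/'  # the URL to fetch
req = Request(someurl)
try:
    response = urlopen(req)
except URLError as e:
    # Check 'code' first: in Python 3 an HTTPError has both 'code' and 'reason'
    if hasattr(e, 'code'):
        print('The server couldn\'t fulfill the request.')
        print('Error code: ', e.code)
    elif hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason: ', e.reason)
else:
    # everything is fine
    html = response.read()
    response.close()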
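Both plans can be folded into one reusable function. A minimal sketch, where `try_fetch` is a hypothetical name: it catches `HTTPError` before `URLError` (the subclass must come first), sets a timeout so a slow or unresponsive server does not block the program indefinitely, and returns an error message instead of printing it. The usage lines use a `data:` URL and a made-up `foo://` scheme so they run offline:

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def try_fetch(url, timeout=10):
    """Return (body_bytes, None) on success or (None, message) on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.read(), None
    except HTTPError as e:   # the server answered, but with an error status
        return None, f"HTTP error {e.code}: {e.reason}"
    except URLError as e:    # no usable answer (DNS failure, refused, unknown scheme...)
        return None, f"failed to reach a server: {e.reason}"

print(try_fetch("data:,hello"))    # (b'hello', None)
print(try_fetch("foo://example"))  # unknown URL scheme, reported as a URLError
```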

## Topic 2 - SQLite database: the DB-API 2.0 interface (sqlite3)
https://docs.python.org/3/library/sqlite3.html


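### Step 3: Create a Connection object
To use SQLite, first create a Connection object that represents the database. If the file (here example.db) does not exist, it is created; the special name ':memory:' creates a temporary database in RAM instead.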

In [None]:
import sqlite3
con = sqlite3.connect('example.db')
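A connection can also be used as a context manager. A short sketch using an in-memory database: leaving the `with` block normally commits the transaction, an exception inside it rolls the transaction back, and in both cases the connection stays open, so `close()` is still needed:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # in-memory database: nothing is written to disk
con.execute("CREATE TABLE t (x INTEGER)")

# Leaving the with-block normally commits the INSERT
with con:
    con.execute("INSERT INTO t VALUES (1)")

# An exception inside the with-block rolls the INSERT back
try:
    with con:
        con.execute("INSERT INTO t VALUES (2)")
        raise ValueError("something went wrong")
except ValueError:
    pass

rows = con.execute("SELECT x FROM t").fetchall()
print(rows)  # [(1,)]
con.close()  # the with-block does not close the connection
```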

### Step 4: Create a Cursor object
Once you have a Connection object, you can create a Cursor object and call its execute() method to run SQL statements:

In [None]:
import sqlite3
con = sqlite3.connect('example.db')

cur = con.cursor()

# Create table (IF NOT EXISTS lets this cell run again without an error)
cur.execute('''CREATE TABLE IF NOT EXISTS stocks
               (date text, trans text, symbol text, qty real, price real)''')

# Insert a row of data
cur.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

# Save (commit) the changes
con.commit()

# We can also close the connection if we are done with it.
# Just be sure any changes have been committed or they will be lost.
con.close()

### Step 5: Retrieve data after a SELECT query

To retrieve data after executing a SELECT statement, you can either iterate over the cursor, call the cursor's fetchone() method to get the next matching row, or call fetchall() to get a list of all remaining matching rows.

In [None]:
import sqlite3
con = sqlite3.connect('example.db')
cur = con.cursor()
for row in cur.execute('SELECT * FROM stocks ORDER BY price'):
    print(row)
con.close()
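A sketch of `fetchone()` and `fetchall()` on an in-memory copy of the `stocks` table; the two extra rows are made-up sample data. Both methods consume rows from the same result set, so `fetchall()` returns only the rows that `fetchone()` has not already taken:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)")
cur.executemany(
    "INSERT INTO stocks VALUES (?, ?, ?, ?, ?)",
    [("2006-01-05", "BUY", "RHAT", 100, 35.14),   # row from the example above
     ("2006-03-28", "BUY", "IBM", 1000, 45.0),    # made-up sample rows
     ("2006-04-06", "SELL", "IBM", 500, 53.0)],
)

cur.execute("SELECT symbol, price FROM stocks ORDER BY price")
cheapest = cur.fetchone()  # the first row of the result set
rest = cur.fetchall()      # every remaining row, as a list of tuples
print(cheapest)  # ('RHAT', 35.14)
print(rest)      # [('IBM', 45.0), ('IBM', 53.0)]
con.close()
```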

### Step 6: Placeholders (qmark and named styles)
Instead of building SQL statements with Python string operations, which makes your program vulnerable to SQL injection attacks, use the DB-API's parameter substitution. Put a placeholder wherever you want to use a value, then pass the values as the second argument of the cursor's execute() method. An SQL statement may use one of two kinds of placeholders: question marks (qmark style) or named placeholders (named style). For the qmark style, the parameters must be a sequence whose length matches the number of placeholders, otherwise a ProgrammingError is raised. For the named style, the parameters should be a dict (a sequence is also accepted, but this is deprecated since Python 3.12); the dict must contain a key for every named parameter, and any extra items are ignored. The following example shows both styles:

In [None]:
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table lang (name, first_appeared)")

# This is the qmark style:
cur.execute("insert into lang values (?, ?)", ("C", 1972))

# The qmark style used with executemany():
lang_list = [
    ("Fortran", 1957),
    ("Python", 1991),
    ("Go", 2009),
]
cur.executemany("insert into lang values (?, ?)", lang_list)

# And this is the named style:
cur.execute("select * from lang where first_appeared=:year", {"year": 1972})
print(cur.fetchall())

con.close()
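To see why placeholders matter, here is a small sketch with a made-up headline that contains an apostrophe. Pasting it into the SQL text with `str.format()` ends the string literal early and produces invalid SQL, while the `?` placeholder passes the value separately so it is stored unchanged:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE news (title text)")
title = "Asia University's AI Center"  # made-up headline containing an apostrophe

# Pasting the value into the SQL text: the apostrophe closes the string literal
formatted_failed = False
try:
    con.execute("INSERT INTO news VALUES ('{}')".format(title))
except sqlite3.OperationalError as e:
    formatted_failed = True
    print("formatted SQL failed:", e)

# With a ? placeholder the value is sent separately and stored as-is
con.execute("INSERT INTO news VALUES (?)", (title,))
rows = con.execute("SELECT title FROM news").fetchall()
print(rows)  # [("Asia University's AI Center",)]
con.close()
```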

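### Step 7: Handling dates and times in sqlite3
The sqlite3 module has two default adapters for Python's built-in datetime.date and datetime.datetime types (deprecated since Python 3.12). The example below replaces the datetime.datetime adapter with a custom one that stores values as Unix timestamps.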

In [None]:
import sqlite3
import datetime
import time

def adapt_datetime(ts):
    return time.mktime(ts.timetuple())

sqlite3.register_adapter(datetime.datetime, adapt_datetime)

con = sqlite3.connect(":memory:")
cur = con.cursor()

now = datetime.datetime.now()
cur.execute("select ?", (now,))
print(cur.fetchone()[0])

con.close()
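The cell above only adapts values going into SQLite. A sketch of the full round trip, assuming dates should be stored as ISO 8601 text: an adapter turns `datetime.date` into a string, and a converter registered for the declared column type `date` turns it back when the connection is opened with `detect_types=sqlite3.PARSE_DECLTYPES`. Both registrations are process-wide, and the table and values are made up for illustration:

```python
import datetime
import sqlite3

# Adapter: Python date -> ISO 8601 text stored in SQLite
sqlite3.register_adapter(datetime.date, lambda d: d.isoformat())
# Converter: bytes read from a column declared as "date" -> Python date
sqlite3.register_converter("date", lambda b: datetime.date.fromisoformat(b.decode()))

# PARSE_DECLTYPES applies converters according to each column's declared type
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE events (name text, day date)")
con.execute("INSERT INTO events VALUES (?, ?)", ("final exam", datetime.date(2022, 1, 10)))
name, day = con.execute("SELECT name, day FROM events").fetchone()
print(name, day, type(day).__name__)  # final exam 2022-01-10 date
con.close()
```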

## Topic 3 - Collecting school news data


### Step 8: Read school news headlines

In [None]:
from urllib import request
with request.urlopen('http://www.asia.edu.tw/news1.php') as response:
    html = response.read().decode('utf-8')
    print(html)

### Step 9: Collect the news headlines for each year

In [None]:
import re
from urllib import request

years = [str(y) for y in range(2008, 2022)]  # "2008" ... "2021"
# HTML that precedes each headline on the news page (matched literally)
pattern = '<font color="#446666" face="新細明體" style="font-weight: 700;" size="2">'
titles = list()
for year in years:
    with request.urlopen('http://www.asia.edu.tw/news1.php?y=' + year) as response:
        html = response.read().decode('utf-8')
        for pos in re.finditer(re.escape(pattern), html):
            # The headline runs from the end of the tag up to the next </font>
            pos2 = html.find('</font>', pos.end())
            titles.append(html[pos.end():pos2])
print(len(titles))  # number of headlines collected
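The extraction logic can be tested without downloading anything. A sketch that wraps it in a hypothetical `extract_titles()` function and runs it on a made-up snippet imitating the page markup; unlike the loop above, it also strips surrounding whitespace and stops if a tag is never closed:

```python
import re

# Opening tag that precedes each headline (the same pattern as in Step 9)
PATTERN = '<font color="#446666" face="新細明體" style="font-weight: 700;" size="2">'

def extract_titles(html):
    """Return the text between each PATTERN match and the next </font>."""
    titles = []
    for m in re.finditer(re.escape(PATTERN), html):
        end = html.find('</font>', m.end())
        if end == -1:  # unclosed tag: stop instead of grabbing the rest of the page
            break
        titles.append(html[m.end():end].strip())
    return titles

# Made-up snippet imitating the news page markup
row = '<tr><td>{}{}</font></td></tr>'
sample = row.format(PATTERN, 'Headline one') + row.format(PATTERN, ' Headline two ')
print(extract_titles(sample))  # ['Headline one', 'Headline two']
```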

### Step 10: Build a database of news headlines



In [None]:
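import sqlite3
conn = sqlite3.connect('news.db')
c = conn.cursor()

# Create table (IF NOT EXISTS lets this cell run again without an error)
c.execute("CREATE TABLE IF NOT EXISTS news (title text)")

# Insert one row per headline; the ? placeholder lets sqlite3 handle quoting,
# so titles that contain an apostrophe (') do not break the SQL
c.executemany("INSERT INTO news VALUES (?)", [(t,) for t in titles])

# Save (commit) the changes
conn.commit()
conn.close()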

### Step 11: Query the school news headlines that mention AI

In [None]:
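import sqlite3
conn = sqlite3.connect('news.db')
c = conn.cursor()
# GLOB is case-sensitive, so '*AI*' matches "AI" but not words such as "Taiwan";
# LIKE '%AI%' would match those too, because LIKE ignores ASCII case by default
for row in c.execute("SELECT * FROM news WHERE title GLOB '*AI*'"):
    print(row)
conn.close()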
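A small sketch of why the choice of operator matters, using an in-memory table with made-up titles. In SQLite, `LIKE` ignores case for ASCII letters by default, so `'%AI%'` also matches "Taiwan", whereas `GLOB` is case-sensitive and uses `*` as its wildcard. The `search` helper is hypothetical; note that `*`, `?` and `[` inside the keyword would be treated as GLOB wildcards:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE news (title text)")
con.executemany("INSERT INTO news VALUES (?)",
                [("AI Center opens",), ("Taiwan sports day",), ("Library hours",)])

def search(con, keyword, case_sensitive=True):
    """Return the titles that contain keyword (hypothetical helper)."""
    if case_sensitive:
        # GLOB: case-sensitive, * matches any run of characters
        sql, pattern = "SELECT title FROM news WHERE title GLOB ?", f"*{keyword}*"
    else:
        # LIKE: % matches any run of characters, ASCII letters ignore case
        sql, pattern = "SELECT title FROM news WHERE title LIKE ?", f"%{keyword}%"
    return [row[0] for row in con.execute(sql, (pattern,))]

exact = search(con, "AI")
loose = search(con, "AI", case_sensitive=False)
print(exact)  # ['AI Center opens']
print(loose)  # ['AI Center opens', 'Taiwan sports day']
con.close()
```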

## Topic 4 - asyncio (since Python 3.4): asynchronous programming

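### Step 12: Synchronous web requests
The following example is the usual, synchronous way to write this: a for loop calls send_req() 10 times, and each call sends an HTTP GET request to https://www.asia.edu.tw/ and prints when the request was sent and when the response arrived. Because each call waits for its response before returning, the requests run strictly one after another: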

![](https://myapollo.com.tw/images/begin-to-asyncio/seq.png)




In [None]:
import requests
import time

url = 'https://www.asia.edu.tw/'

start_time = time.time()

def send_req(url):

    t = time.time()
    print("Send a request at",t-start_time,"seconds.")

    res = requests.get(url)

    t = time.time()
    print("Receive a response at",t-start_time,"seconds.")

for i in range(10):
    send_req(url)

### Step 13: Asynchronous HTTP requests

In [None]:
!pip install aiohttp requests

In [None]:
import asyncio
import time

import requests

url = 'https://www.asia.edu.tw/'

start_time = time.time()

async def send_req(url):
    t = time.time()
    print("Send a request at", t - start_time, "seconds.")
    # requests.get() blocks, so hand it to a worker thread; meanwhile the
    # event loop can start the other requests
    loop = asyncio.get_running_loop()
    res = await loop.run_in_executor(None, requests.get, url)
    t = time.time()
    print("Receive a response at", t - start_time, "seconds.")

async def main():
    # Start all 10 requests concurrently and wait until every one finishes
    await asyncio.gather(*(send_req(url) for _ in range(10)))

# Jupyter/Colab already runs an event loop, so await main() directly;
# in a standalone script, call asyncio.run(main()) instead
await main()
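The speed-up comes from the event loop switching to another task whenever one is waiting. A self-contained sketch that replaces the network call with `asyncio.sleep()` (a stand-in, not a real request), so ten 0.2-second waits overlap and finish in roughly 0.2 seconds rather than 2. It calls `asyncio.run()`, which works in a standalone script; in Colab or Jupyter, where a loop is already running, use `results, elapsed = await main()` instead:

```python
import asyncio
import time

async def fake_request(i, delay):
    # Stand-in for a network call: awaiting asyncio.sleep() hands control back
    # to the event loop, just as awaiting a non-blocking HTTP client would
    await asyncio.sleep(delay)
    return i

async def main(n=10, delay=0.2):
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_request(i, delay) for i in range(n)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)            # gather() returns results in the order the tasks were given
print(f"{elapsed:.2f} s")  # roughly 0.2 s, not 10 x 0.2 = 2 s
```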

## Topic 5 - Concurrency

Multithreading

### Step 14: Run the two functions one after the other in a single thread

In [None]:
import time
def sleep_A():
    for i in range(2):
        print(i, end="_")
        time.sleep(1)
    return
def sleep_B():
    for i in range(3):
        print(i, end="=")
        time.sleep(1)
    return
start_time = time.time()
sleep_A()
sleep_B()
end_time = time.time()
print(f'It costs {end_time - start_time} seconds')

### Step 15: Run the two functions at the same time in separate threads

In [None]:
import threading
import time

def sleep_A():
    for i in range(2):
        print(i, end="_")
        time.sleep(1)
    return

def sleep_B():
    for i in range(3):
        print(i, end="=")
        time.sleep(1)
    return
  
start_time = time.time()

thread_1 = threading.Thread(target=sleep_A)  # Create a thread that will run sleep_A
thread_2 = threading.Thread(target=sleep_B)  # Create a thread that will run sleep_B
thread_1.start()  # Start running thread_1
thread_2.start()  # Start running thread_2
thread_1.join()  # Block until thread_1 finishes; without join(), the timing below would run immediately
thread_2.join()  # Block until thread_2 finishes

end_time = time.time()
print(f'It costs {end_time - start_time} seconds')
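The standard library's `concurrent.futures.ThreadPoolExecutor` offers a higher-level way to do the same thing: it starts and joins the threads for you and collects each function's return value, which plain `threading.Thread` does not. A sketch with a hypothetical `sleepy` function standing in for I/O-bound work such as a network request:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def sleepy(n):
    # Stand-in for I/O-bound work: the thread simply waits
    time.sleep(0.2)
    return n * n

start = time.perf_counter()
# Leaving the with-block waits for every submitted task to finish
with ThreadPoolExecutor(max_workers=5) as pool:
    squares = list(pool.map(sleepy, range(5)))  # results come back in input order
elapsed = time.perf_counter() - start
print(squares)              # [0, 1, 4, 9, 16]
print(f"{elapsed:.2f} s")   # about 0.2 s instead of 5 x 0.2 = 1.0 s
```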

## Topic 6 - Review of built-in functions and standard library modules

### Step 16: Sorting strings

In [None]:
# ascending sort 
coms = ['Microsoft', 'Google', 'Amazon', 'Facebook', 'Apple']
print(sorted(coms))

In [None]:
# descending sort 
coms = ['Microsoft', 'Google', 'Amazon', 'Facebook', 'Apple']
print(sorted(coms, reverse=True))
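`sorted()` compares strings by Unicode code point, so every uppercase letter sorts before every lowercase one. The `key` argument changes what is compared for each item; a short sketch with a small made-up list alongside the `coms` list from above:

```python
names = ["apple", "Banana", "cherry"]
coms = ['Microsoft', 'Google', 'Amazon', 'Facebook', 'Apple']

plain = sorted(names)                 # code-point order: 'B' (66) < 'a' (97)
folded = sorted(names, key=str.lower)  # case-insensitive order
by_len = sorted(coms, key=len)         # shortest name first

print(plain)   # ['Banana', 'apple', 'cherry']
print(folded)  # ['apple', 'Banana', 'cherry']
print(by_len)  # ['Apple', 'Google', 'Amazon', 'Facebook', 'Microsoft']
```

Python's sort is stable, so items that compare equal keep their original order: Google and Amazon both have six letters, so Google stays in front.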

### Step 17: Weekday of the first day of the month and number of days in the month

In [None]:
# calendar.monthrange(year, month) returns the weekday of the month's first day and the number of days in the month
import calendar
from datetime import datetime, timezone, timedelta
# Use the UTC+8 time zone
tz = timezone(timedelta(hours=+8))
dt=datetime.now(tz)
wday, mdays = calendar.monthrange(dt.year,dt.month)
print(f"For {dt.year}/{dt.month}, weekday is {wday}; number of days is {mdays}") #weekday (0-6 ~ Mon-Sun)
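Building on `calendar.monthrange()`, a sketch of a hypothetical `month_bounds()` helper that returns the first and last dates of a month together with the weekday of the first day. February 2024 is used as the example because it is a leap-year February whose first day is a Thursday (weekday 3):

```python
import calendar
import datetime

def month_bounds(year, month):
    """Return (first_date, last_date, weekday_of_first_day); weekday 0 = Monday."""
    first_weekday, ndays = calendar.monthrange(year, month)
    first = datetime.date(year, month, 1)
    last = datetime.date(year, month, ndays)
    return first, last, first_weekday

first, last, wday = month_bounds(2024, 2)
print(first, last, wday)  # 2024-02-01 2024-02-29 3
```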