# Python4ev1: access web data

## Table of contents
1. [Regular expression](#regex)
2. [Class](#class) :
    - a. [Class/instance attributes](#cls_attr)
    - b. [Class/static methods](#cls_mtd)
    - c. [Subclass/inheritance](#cls_inh)
    - d. [Customize using special methods](#cls_sp)
    - e. [Property/Setter](#cls_pro)
3. [Network and sockets](#socket)
4. [Programs for web surfing](#surfing) : [\[Urllib\]](#urllib) - [\[BeautifulSoup\]](#bsoup)
5. [Web service and XML](#xml) : [\[Parsing\]](#xml_parse)
6. [JSON and REST architecture](#json) : [\[Geocode-JSON](#geojson)[|-XML\]](#geoxml) - [\[Twitter-API\]](#twitter)

## 1. Regular expression <a name= 'regex'></a>

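- `^`   start of a string; as the first character inside square brackets it negates the set (e.g. `[^0-9]`)
- `$`   end of a string
- `.`   any character except a newline (inside square brackets it is just the literal dot `.`)
- `\s`  any whitespace character
- `\S`  any NON-whitespace character
- `\w`  any letter, digit or underscore (`[A-Za-z0-9_]` for ASCII text)
- `\W`  any character that is not a letter, digit or underscore (`[^A-Za-z0-9_]`)
- `\d`  any digit `[0-9]`
- `\D`  any non-digit `[^0-9]`  
    Be careful: ranges follow character-code order (`0-9` < `A-Z` < `a-z`), the same order Python uses to compare strings, so `[A-z]` also matches the punctuation that sits between `Z` and `a`. Write `[A-Za-z]` instead.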
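
A minimal sketch of how these character classes behave in Python's `re` module. The sample strings are made up for illustration only.

```python
import re

# \w matches letters, digits and the underscore; the hyphen is not a word character
word_chars = re.findall(r'\w', 'a_1-')
print(word_chars)

# \d and \D split digits from everything else
digits = re.findall(r'\d', 'Room 42')
non_digits = re.findall(r'\D', 'R2D2')
print(digits, non_digits)

# [A-z] spans character codes 65..122, which also covers [ \ ] ^ _ `
sloppy = re.findall(r'[A-z]', 'a_Z^1')
strict = re.findall(r'[A-Za-z]', 'a_Z^1')
print(sloppy, strict)

# a range written from a higher to a lower character code is rejected
try:
    re.compile(r'[a-Z]')
    reversed_range_ok = True
except re.error:
    reversed_range_ok = False
print(reversed_range_ok)
```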

Greedy: e.g. the regex `".*"` (including the quotation marks) applied to `"Hi," he says, "how are you?"` matches the entire sentence, from the first quotation mark to the last, because a greedy quantifier takes the longest match it can.

- `*`   repeat the preceding item {0,infinity} times (greedy: matches as long a sequence as it can)
- `*?`  repeat the preceding item {0,infinity} times (non-greedy: stops at the shortest sequence that still matches)
- `+`   repeat the preceding item {1,infinity} times (greedy: matches as long a sequence as it can)
- `+?`  repeat the preceding item {1,infinity} times (non-greedy: stops at the shortest sequence that still matches)
- `?`   repeat the preceding item {0,1} times (i.e. the item is optional)
- `{n}` repeat the preceding item exactly n times
- `{m,n}`  repeat the preceding item between m and n times (given m <= n)

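- `[abc]`   match one character in the set
- `[^XYZ]`  match one character NOT in the set
- `[a-zA-Z0-9]`   match one character in the letter and digit set  
    Note: a range must run from the lower to the higher character code (`0-9` < `A-Z` < `a-z`), so Python's `re` rejects `[a-Z]` as a bad character range
- `()`      a capturing group: marks the part of the match to extract (when the pattern contains a group, `re.findall` returns only the group contents)
- => `(at|on)`  match 'at' or 'on'. Python's `re` supports this alternation, but because the parentheses also capture, use `(?:at|on)` when you want to group without capturing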
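
The quantifier and grouping rules above can be sketched as follows. This is an illustrative example only: the quoted sentence is the one from the text, and the `From` line is a sample in the style of `mbox.txt`.

```python
import re

sentence = '"Hi," he says, "how are you?"'

# greedy: .* runs on to the LAST quotation mark
greedy = re.findall(r'".*"', sentence)
# non-greedy: .*? stops at the NEXT quotation mark
lazy = re.findall(r'".*?"', sentence)
print(greedy)
print(lazy)

# a capturing group makes findall return only the group contents
line = 'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008'
domain = re.findall(r'^From \S+@(\S+)', line)
print(domain)

# alternation; (?:...) groups without capturing, \b keeps 'noon' and 'Monday' out
words = re.findall(r'\b(?:at|on)\b', 'meet me at noon on Monday')
print(words)
```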

In [1]:
import re  # regex library
fh = open('mbox.txt', 'r')

count = 0
# re.search() is similar to str.find() but takes a regex pattern
for i in fh:
    if count >= 10:
        break
    if re.search(r'^X-\S+:', i):
        # print every line that matches the regex pattern
        print(i, end='')
        count += 1

# re.findall() extracts the matched sequences and returns them as a list
# (fh continues from the line where the loop above stopped reading)
temp = list()
for i in fh:
    # match the 'From <email address>' pattern but only extract the domain part after '@'
    y = re.findall(r'^From \S+@(\S+)', i)
    # findall returns a list (empty when the line does not match)
    # note: y is a list while temp holds strings, so this test is always True and duplicates are kept
    if y not in temp:
        temp.extend(y)
        # extend appends each item of the iterable to the list one by one

print(temp[:10])


X-Sieve: CMU Sieve 2.3
X-Content-Type-Outer-Envelope: text/plain; charset=UTF-8
X-Content-Type-Message-Body: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan  5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000
X-Sieve: CMU Sieve 2.3
['umich.edu', 'iupui.edu', 'umich.edu', 'iupui.edu', 'iupui.edu', 'iupui.edu', 'umich.edu', 'umich.edu', 'umich.edu', 'umich.edu']


## 2. Class <a name='class'></a>

### 2a. Class attributes <a name='cls_attr'></a>

Inside `__init__` - instance/object attributes;  
outside of `__init__` - class attributes:  
Use `__dict__` to display instance/class attributes and methods, and use `help()` to display the inheritance chain.

In [2]:
class Example:
    classAttr = 0
    def __init__(self, instanceAttr):
        self.instanceAttr = instanceAttr

# create an instance of the Example class
a = Example(1)
# the __dict__ attribute holds the attributes of the object
print(a.__dict__)
print(Example.__dict__)

# help() shows the methods and attributes of an object's class
help(a)

{'instanceAttr': 1}
{'__module__': '__main__', 'classAttr': 0, '__init__': <function Example.__init__ at 0x000001F04768C840>, '__dict__': <attribute '__dict__' of 'Example' objects>, '__weakref__': <attribute '__weakref__' of 'Example' objects>, '__doc__': None}
Help on Example in module __main__ object:

class Example(builtins.object)
 |  Example(instanceAttr)
 |  
 |  Methods defined here:
 |  
 |  __init__(self, instanceAttr)
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  classAttr = 0



If not maintained carefully (explicitly), a class attribute can be shadowed by an instance attribute of the same name, causing confusion. By convention, private attributes are prefixed with a single underscore, e.g. `_privateAttr`.

In [3]:
b = Example(2)
print(b.__dict__)
print(Example.__dict__)

# if we specifically assign classAttr on instance b
b.classAttr = 653
print(b.__dict__)
# an instance attribute 'classAttr' is now added to b, while the class attribute 'classAttr' remains unchanged
Example.__dict__

{'instanceAttr': 2}
{'__module__': '__main__', 'classAttr': 0, '__init__': <function Example.__init__ at 0x000001F04768C840>, '__dict__': <attribute '__dict__' of 'Example' objects>, '__weakref__': <attribute '__weakref__' of 'Example' objects>, '__doc__': None}
{'instanceAttr': 2, 'classAttr': 653}


mappingproxy({'__module__': '__main__',
              'classAttr': 0,
              '__init__': <function __main__.Example.__init__(self, instanceAttr)>,
              '__dict__': <attribute '__dict__' of 'Example' objects>,
              '__weakref__': <attribute '__weakref__' of 'Example' objects>,
              '__doc__': None})

### 2b. Class/static methods <a name='cls_mtd'></a>

Use the `@classmethod` and `@staticmethod` decorators to define class methods and static methods.

In [19]:
class Employer:
    _count = 0 # intended as a private class-level counter
    raiseRate = 1.03
    
    def __init__(self, fn, ln): # initializing
        self.first = fn.strip(' ').capitalize()
        self.last = ln.strip(' ').capitalize()
        # note: this reads the class attribute but assigns an instance attribute,
        # so every instance ends up with _count == 1 and Employer._count stays 0
        self._count += 1
    
    # Regular (instance) methods
    def fullname(self):
        return self.first+' '+self.last
    
    def email(self):
        return self.first.lower()+'.'+self.last.lower()+'@company.com'
    
    # Class methods
    @classmethod # receives cls instead of self
    def setRaiseRate(cls, rate): 
        cls.raiseRate = rate # usually works with class variables
    
    @classmethod # as an alternative constructor
    def from_fullname(cls, fulln):
        fn, ln = fulln.strip(' ').split(' ')
        return cls(fn, ln) # this will call cls.__init__
    
    # Static methods
    @staticmethod # neither cls nor self is passed in
    def alterEmail(alias):
        return alias.strip(' ').lower()+'@company.com'

In [16]:
emp1 = Employer('john', 'smith') # test1
print(emp1.fullname())
print(emp1.email())

emp2 = Employer('TOM   ', '  haNks') # test2 with badly formatted inputs
print(emp2.fullname())
print(emp2.email())

emp1._count # instance-level count (see the note in __init__)

John Smith
john.smith@company.com
Tom Hanks
tom.hanks@company.com


1

Class methods are commonly used as alternative constructors, i.e. another way to build an instance besides calling the class directly.

In [20]:
emp3 = Employer.from_fullname("  Julia jONes")
print(emp3.fullname())
print(emp3._count)


Employer.alterEmail('Jully')

Julia Jones
1


'jully@company.com'
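
In the `Employer` class above, `self._count += 1` reads the class attribute but stores the result on the instance, which is why every instance reports `1`. A minimal sketch of a counter that really is shared across instances, using a hypothetical `Counted` class that is not part of the original notebook:

```python
class Counted:
    count = 0  # class attribute shared by all instances

    def __init__(self):
        # assign through the class, not through self,
        # so no instance attribute is created
        Counted.count += 1

    @classmethod
    def how_many(cls):
        return cls.count

a = Counted()
b = Counted()
print(Counted.how_many())
print(a.__dict__)  # empty: the counter lives on the class only
```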

### 2c. Subclass and inheritance <a name='cls_inh'></a>

If an attribute or method is not defined in the current class, Python searches the parent classes in turn (the method resolution order) for anything with the same name. A name defined in the subclass therefore takes priority over the parent's (override).

You can use `super()` to access the parent level.

In [32]:
# Make a subclass
class Programmer(Employer):
    raiseRate = 1.06
    
    def __init__(self, fn, ln, lang): # initializing
        super().__init__(fn, ln) # call the parent method (no need to pass self)
        self.lang = lang.strip(' ').capitalize()
 
# create a new programmer
pro1 = Programmer('alan', 'west', 'Java')
print("{} uses {} with a raise rate of {}.".format(pro1.fullname(), pro1.lang, pro1.raiseRate))

# Checking inheritance
print(issubclass(Programmer, Employer))
isinstance(pro1, Employer)
# help(Programmer)
# would show the inheritance chain

Alan West uses Java with a raise rate of 1.06.
True


True
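
The lookup order described in this section can be inspected through the `__mro__` attribute. A minimal sketch with hypothetical `Base`, `Child` and `Plain` classes (names invented for this illustration):

```python
class Base:
    rate = 1.03

    def describe(self):
        return 'rate is {}'.format(self.rate)

class Child(Base):
    rate = 1.06  # overrides Base.rate; describe() is inherited

class Plain(Base):
    pass  # defines nothing, so every lookup falls through to Base

# the order in which Python searches for a name
mro_names = [cls.__name__ for cls in Child.__mro__]
print(mro_names)
print(Child().describe())
print(Plain().describe())
```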

### 2d. Special method <a name='cls_sp'></a>

[Python dunder method](https://docs.python.org/3/reference/datamodel.html#special-method-names)

- `__init__` : called when you create a class instance.
- `__repr__` : returns an unambiguous representation, ideally something that can be used to recreate the object
- `__str__` : returns the readable text used by `print()` and `str()` (takes precedence over `__repr__` there; `__repr__` is the fallback when `__str__` is not defined)
- `__add__` : called when you use the `'+'` operator
- `__call__` : called when you call an instance like a function
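
The notebook cell in this section only exercises `__repr__` and `__str__`. A minimal sketch of `__add__` and `__call__`, using a hypothetical `Money` class invented for this illustration:

```python
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __repr__(self):
        return 'Money({!r})'.format(self.amount)

    def __str__(self):
        return '${:.2f}'.format(self.amount)

    def __add__(self, other):
        # called for: self + other
        return Money(self.amount + other.amount)

    def __call__(self, rate):
        # called for: self(rate)
        return Money(self.amount * rate)

total = Money(10) + Money(5)   # uses __add__
raised = total(2)              # uses __call__
print(repr(total), str(total))
print(repr(raised))
```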

In [38]:
# Create another subclass
class Manager(Employer):
    raiseRate = 1.07
    
    def __init__(self, fn, ln, dept): # initializing
        super().__init__(fn, ln) # call the parent method (no need to pass self)
        self.dept = dept.strip(' ').capitalize()
    
    def __repr__(self): # representation
        return "Manager({!r}, {!r}, {!r})".format(self.first, self.last, self.dept)
    
    def __str__(self): # print output
        # fullname and email are methods here, so they must be called
        return "{} - {} : {}".format(self.fullname(), self.dept, self.email())
    
# create a new manager
mng1 = Manager('Adam', 'wang', 'market')
print(mng1)
mng1 # Jupyter displays the repr when the last expression is a bare object

Adam Wang - Market : adam.wang@company.com


Manager('Adam', 'Wang', 'Market')

### 2e. Properties and Setter/Deleter <a name='cls_pro'></a>

Use `@property` to turn a method into a property (the method must take no argument other than `self`).
Use `@PROPERTYNAME.setter` and `@PROPERTYNAME.deleter` to handle assignment (`=`) and `del` on the property.

In [49]:
# Modify the Manager subclass
class Manager(Employer):
    raiseRate = 1.07
    
    def __init__(self, fn, ln, dept): # initializing
        super().__init__(fn, ln) # call the parent method (no need to pass self)
        self.dept = dept.strip(' ').capitalize()
    
    def __repr__(self): # representation
        return "Manager({!r}, {!r}, {!r})".format(self.first, self.last, self.dept)
    
    def __str__(self): # print output
        # fullname is a property here, but email is still a method and must be called
        return "{} - {} : {}".format(self.fullname, self.dept, self.email())
    
    # Override the Employer fullname method
    @property 
    def fullname(self): # now it's a property
        return self.first+' '+self.last
    
    @fullname.setter # setter changes the variables behind the property
    def fullname(self, name):
        fn, ln = name.strip(' ').split(' ')
        self.first = fn
        self.last = ln
    
    @fullname.deleter # deleter handles deletion
    def fullname(self):
        print("Name deleted")
        self.first = None
        self.last = None
    
# create a new manager
mng2 = Manager('Tan', 'Sue', 'HR')
del mng2.fullname # invoke the deleter
print(mng2.first)
mng2.fullname = "Amy hubert" # invoke the setter
#print(Employer._count)
print(mng2)

mng2.fullname # now it's a property and is accessed without the call parentheses '()'

Name deleted
None
Amy hubert - Hr : amy.hubert@company.com


'Amy hubert'

## 3. Network and sockets <a name='socket'></a>

- TCP: Transmission Control Protocol, a transport protocol built on top of IP (Internet Protocol)
- A socket is an endpoint of a bi-directional inter-process communication flow across the internet
- A port is an application- or process-specific software communication endpoint (like a phone extension number):
    - 25: incoming email (SMTP)
    - 23: login (Telnet)
    - 80 (HTTP) / 443 (HTTPS) : web server (e.g. purdue.edu:80)
    - 109/110 : personal mail box (POP)

Python has built-in socket support. Making a socket connection is like dialing a phone.

In [None]:
import socket
mysoc = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # address family (IPv4) and socket type (TCP stream)
mysoc.connect(('data.pr4e.org', 80)) # (host, port); the host must be a string

- HTTP: Hypertext Transfer Protocol, used to retrieve web pages (HTML, images, etc.).  
    Basic concept: make connection - request document - retrieve document - close connection.  
    RSS is built on HTTP to retrieve data in addition to documents.
- URL : uniform resource locator (format: `PROTOCOL://HOST/DOCUMENT`)  
    Each hyperlink contains an `href=value` attribute; following it issues a `GET` request to the server  
    `:80` - port 80  
    `?guess=12` - a parameter appended to the URL (sets 'guess' to 12), known as a query string parameter
- RFCs: Requests for Comments (the documents in which the IETF, the Internet Engineering Task Force, publishes protocol specifications)
- Metadata : after a `GET DOCUMENT PROTOCOL/VERSION` request the server sends headers describing the document along with the document itself (hence metadata)
- Status code : 200 - OK; 404 - not found; 302 - redirect
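
The URL pieces listed above can be pulled apart with the standard library. A minimal sketch, assuming a made-up URL that combines the host used in this notebook with the `:80` and `?guess=12` fragments from the list:

```python
from urllib.parse import urlsplit, parse_qs

# hypothetical URL, assembled only for this illustration
url = 'http://data.pr4e.org:80/page?guess=12'
parts = urlsplit(url)

print(parts.scheme)    # protocol
print(parts.hostname)  # host
print(parts.port)      # port number as an int
print(parts.path)      # document

# query string parameters come back as a dict of lists
params = parse_qs(parts.query)
print(params)
```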

In [52]:
import socket
mysoc = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # make a doorway (IPv4, TCP stream)
mysoc.connect(('data.pr4e.org', 80)) # (host, port) - connect

# send request
cmd = 'GET http://data.pr4e.org/romeo.txt HTTP/1.0\r\n\r\n'.encode() # encode the request string to bytes
mysoc.send(cmd) 

# receive data
while True:
    data = mysoc.recv(512) # receive data in chunks of up to 512 bytes
    if len(data) < 1:
        break # end of transmission
    print(data.decode()) # decode each chunk and print it

mysoc.close() # close the connection

HTTP/1.1 200 OK
Date: Fri, 24 Jan 2020 23:02:34 GMT
Server: Apache/2.4.18 (Ubuntu)
Last-Modified: Sat, 13 May 2017 11:22:22 GMT
ETag: "a7-54f6609245537"
Accept-Ranges: bytes
Content-Length: 167
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already s
ick and pale with grief
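
The output above is printed one 512-byte chunk at a time, which is why a line is broken in mid-word. A minimal offline sketch of the alternative: join the chunks first, then split the metadata (headers) from the document. The `split_response` helper and the shortened sample chunks are assumptions made for this illustration, not part of the original notebook.

```python
def split_response(raw):
    """Split raw HTTP response bytes into (status_code, headers, body)."""
    # a blank line (\r\n\r\n) separates the headers from the body
    head, _, body = raw.partition(b'\r\n\r\n')
    lines = head.decode().split('\r\n')
    # the status line looks like: HTTP/1.1 200 OK
    status_code = int(lines[0].split(' ')[1])
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body.decode()

# chunks as they might arrive from repeated recv() calls (shortened sample)
chunks = [b'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConne',
          b'ction: close\r\n\r\nBut soft what light']
status, headers, body = split_response(b''.join(chunks))
print(status)
print(headers)
print(body)
```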



## 4. Programs for web surfing <a name='surfing'></a>

- ASCII : American Standard Code for Information Interchange (single-byte characters)
- Unicode : multi-byte encodings - UTF-16 (2 or 4 bytes), UTF-32 (4 bytes), __UTF-8__ (1-4 bytes, compatible with ASCII)

In Python 3, every string (`str`) is Unicode.

In [2]:
# return the ord value of a character
print(ord('0'))
print(ord('A'))
print(ord('\n'))
# for ASCII, number < upper < lower case

48
65
10


In [3]:
x1 = b'abc' # byte string (a bytes literal holds ASCII characters)
x2 = 'abc'
x3 = '啊'

print(type(x1)) # bytes, not str
print(type(x2))
print(type(x3))

<class 'bytes'>
<class 'str'>
<class 'str'>


The `encode()` and `decode()` calls in the `socket` example handle the string <-> byte data conversion (default encoding: UTF-8).
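
A minimal sketch of that conversion, reusing the characters from the cell above; it shows that one non-ASCII character takes several bytes in UTF-8.

```python
text = 'abc啊'

data = text.encode()          # str -> bytes, UTF-8 by default
print(data)
print(len(text), len(data))   # number of characters vs number of bytes

round_trip = data.decode()    # bytes -> str, UTF-8 by default
print(round_trip == text)

# ASCII cannot represent characters outside its range
try:
    text.encode('ascii')
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)
```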

### Using `urllib` module <a name='urllib'></a>
Instead of using `socket` to build the connection and the request yourself, the `urllib` module handles it for you automatically.

In [2]:
import urllib.request, urllib.parse, urllib.error
# the urllib module

fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt') # works much like opening a file
for line in fhand:
    print(line.decode().strip()) # only the body is iterated; urllib handles the headers
    
fhand.geturl() # the URL actually retrieved
fhand.getcode() # status code
fhand.info() # header info (only this last expression is displayed by Jupyter)

But soft what light through yonder window breaks
It is the east and Juliet is the sun
Arise fair sun and kill the envious moon
Who is already sick and pale with grief


<http.client.HTTPMessage at 0x186afec5240>

### Web scraping using BeautifulSoup <a name='bsoup'></a>

Parsing the HTML file that comes back. (__Caution__: scraping some websites may raise legal issues.)

In [1]:
# Test HTML
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

In [2]:
from bs4 import BeautifulSoup # test bs
soup = BeautifulSoup(html_doc, 'html.parser')

# print(soup.prettify()) # print the whole document, nicely indented

tags = soup('a') # retrieve all the anchor tags (hyperlinks)
for tag in tags:
    print(tag.get('href', None))

http://example.com/elsie
http://example.com/lacie
http://example.com/tillie


Put everything together: urllib + BeautifulSoup. An `ssl` context is passed in so that certificate errors are ignored.

In [1]:
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup as bs
import ssl # for building the SSL context

# ignore SSL certificate errors (this switches certificate verification off)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
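
For comparison, a minimal offline sketch of what the context looks like before and after those two assignments. The variable names `default_ctx` and `relaxed_ctx` are invented for this illustration.

```python
import ssl

# the default context verifies both the certificate and the host name
default_ctx = ssl.create_default_context()
print(default_ctx.check_hostname)
print(default_ctx.verify_mode == ssl.CERT_REQUIRED)

# the relaxed context used in this notebook accepts any certificate
relaxed_ctx = ssl.create_default_context()
relaxed_ctx.check_hostname = False       # has to be switched off first
relaxed_ctx.verify_mode = ssl.CERT_NONE  # then verification can be disabled
print(relaxed_ctx.verify_mode == ssl.CERT_NONE)
```

The order of the two assignments matters: Python refuses to set `verify_mode` to `CERT_NONE` while `check_hostname` is still enabled.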

In [8]:
# url = input('Enter - ')

# Test url (some websites have certificates that need special handling)
url1 = 'http://data.pr4e.org/page1.htm'
url2 = 'http://data.pr4e.org/page2.htm'

# pass the context to ignore SSL certificate errors
html = urllib.request.urlopen(url2, context=ctx).read() # read the entire file at once (not a for loop)
soup2 = bs(html, 'html.parser') # a BeautifulSoup object (a parse tree of the page)

# retrieve all the anchor tags
tags = soup2('a')
for tag in tags:
    print(tag.get('href', None))
    # print all hyperlinks

page1.htm


## 5. Web service and XML  <a name='xml'></a>

A web service relies on an agreed wire protocol to send data across the wire (network). Sockets and `urllib` are low-level tools that don't suit the modern landscape well on their own (insecure), so more specialized APIs are used to access the web.

### XML format
XML is an HTML-style format for data transfer (standard form - `<tag attr="value"> text data </tag>`, or self-closing form - `<tag attr="value"/>` - no text data). Surrounding whitespace generally doesn't matter in XML. Converting data from the internal structure on a local computer into the agreed (language-independent) web service format is called __serializing__, and converting it back is called __de-serializing__.

Complex XML can be drawn as a tree with nested tags as branches, and attributes and text as leaf (end) nodes.
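
A minimal sketch of serializing, de-serializing and walking such a tree with the standard library. The element names mirror the `person` example used later in this section; the rest is an assumption made for illustration.

```python
import xml.etree.ElementTree as et

# serialize: build the tree in Python and turn it into XML text
person = et.Element('person')
name = et.SubElement(person, 'name')
name.text = 'Chuck'
et.SubElement(person, 'email', {'hide': 'yes'})
xml_text = et.tostring(person, encoding='unicode')
print(xml_text)

# de-serialize: parse the text back into a tree
root = et.fromstring(xml_text)

# walk the tree: iter() visits the root first, then its descendants in document order
visited = [(node.tag, node.attrib, node.text) for node in root.iter()]
for row in visited:
    print(row)
```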

### XML schema
XML schema is the contract (existing outside of the program) that standardizes and constrains the serialization and de-serialization of XML data between systems. If an XML document meets the specification of a schema (XML contract), it is said to __'validate'__.

An XML contract is a special file that specifies the format of the XML data. There are different schema languages to specify the schema: 
- __XSD__ - XML Schema from W3C
    - Structure - xs:element, xs:sequence, xs:complexType
    - Attr - name, type, minOccurs, maxOccurs
    - Type - xs:string, xs:decimal, xs:date, xs:dateTime (in UTC/GMT, ISO 8601)
- ISO 8879:1986 SGML - Standard Generalized Markup Language
- DTD - Document Type Definition
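
Since an XSD contract is itself XML, it can be read with the same parser. The sketch below only lists the declarations found in a small hypothetical schema (written for this illustration); it does not validate a document, which the standard library `xml.etree.ElementTree` module does not offer.

```python
import xml.etree.ElementTree as et

XS = 'http://www.w3.org/2001/XMLSchema'

# hypothetical schema, written only for this illustration
xsd_text = '''
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="age" type="xs:integer" minOccurs="0"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>'''

schema = et.fromstring(xsd_text)
declared = []
# ElementTree spells namespaced tags as {namespace}localname
for el in schema.iter('{' + XS + '}element'):
    # minOccurs defaults to 1 when it is not given
    declared.append((el.get('name'), el.get('type'), el.get('minOccurs', '1')))

for row in declared:
    print(row)
```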

### Parsing XML in Python <a name='xml_parse'></a>

In [9]:
import xml.etree.ElementTree as et
# three ' to include multiple lines
data = '''
<person>
  <name>Chuck</name>
  <phone type="intl">
    +1 734 303 4456
  </phone>
  <email hide="yes" />
</person>'''

# making a tree from xml
tree = et.fromstring(data) # raises ParseError if the xml contains a syntax error
# fromstring returns the root element (person), thus no need to specify 'person' in the paths
print('Name:', tree.find('name').text) # return the text in that tag
print('Attr:', tree.find('email').get('hide')) # get the value of the attr named hide

Name: Chuck
Attr: yes


The same approach works when complex types / sequences are involved.

In [10]:
import xml.etree.ElementTree as et

data = '''
<stuff>
  <users>
    <user x="2">
      <id>001</id>
      <name>Chuck</name>
    </user>
    <user x="7">
      <id>009</id>
      <name>Brent</name>
    </user>
  </users>
</stuff>'''

stuff = et.fromstring(data) # named data rather than input, to avoid shadowing the built-in input()
# use findall to retrieve all matching elements
lst = stuff.findall('users/user') # use `/` to navigate down the tag tree

# len return the no. of items
print('User count:', len(lst))

# for loop to print all
for item in lst:
    print('Name', item.find('name').text)
    print('ID', item.find('id').text)
    print('Attr', item.get('x'))

User count: 2
Name Chuck
ID 001
Attr 2
Name Brent
ID 009
Attr 7


## 6. JSON and REST architecture <a name='json'></a>

JSON (JavaScript Object Notation) is native to JavaScript, which is built into all browsers nowadays. It is a nested structure of lists and key-value collections (object in JS, dictionary in Python). Unlike XML, JSON doesn't distinguish text from attributes (everything is a key-value pair). It is handled using the `json` library.

In [6]:
import json

# test json data
data = '''
{
  "name" : "Chuck",
  "phone" : {
    "type" : "intl",
    "number" : "+1 734 303 4456"
   },
   "email" : {
     "hide" : "yes"
   }
}''' # JS object (nested)

info = json.loads(data) # load string : return a python dictionary
print('Name', info['name']) # access values using the bracket syntax
print('Hide', info['email']['hide']) # nested lookup, like a dict of dicts

Name Chuck
Hide yes


Looping through JSON data.

In [16]:
import json

data = '''
[
  { "id" : "001",
    "x" : "2",
    "name" : "Chuck"
  } ,
  { "id" : "009",
    "x" : "7",
    "name" : "Brent"
  }
]'''

info = json.loads(data) # a JSON array is returned as a python list
print('User count:', len(info))

# like the xml example
for item in info:
    print("Name", item['name'])
    print("ID", item['id'])
    print("Attr", item['x'])

# this is a list of dicts
info

User count: 2
Name Chuck
ID 001
Attr 2
Name Brent
ID 009
Attr 7


[{'id': '001', 'x': '2', 'name': 'Chuck'},
 {'id': '009', 'x': '7', 'name': 'Brent'}]
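
The cells above only de-serialize (`json.loads`). A minimal sketch of the opposite direction with `json.dumps`, using a made-up record to show how Python values map onto JSON:

```python
import json

# hypothetical record, for illustration only
record = {'name': 'Chuck', 'active': True, 'phone': None, 'ids': [1, 9]}

text = json.dumps(record)   # serialize: python dict -> JSON string
print(text)                 # True becomes true, None becomes null

restored = json.loads(text) # de-serialize: JSON string -> python dict
print(restored == record)
```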

### Service oriented approach (APIs) <a name='json_api'></a>

__API__ - application programming interface, e.g. the __Google Geocode__ API. In the URL-encoded query string, `+` is a space, `%2C` is a comma (,), etc.
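
A minimal sketch of that encoding with `urllib.parse`, using the same address and placeholder key that appear in the geocoding cells of this notebook:

```python
from urllib.parse import urlencode, parse_qs

parms = {'address': 'Ann Arbor, MI', 'key': 42}

# spaces become '+', the comma becomes '%2C', pairs are joined with '&'
query = urlencode(parms)
print(query)

# parse_qs reverses the encoding (values come back as lists of strings)
decoded = parse_qs(query)
print(decoded)
```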

##### Geocode JSON  <a name='geojson'></a>

In [1]:
import urllib.request, urllib.parse, urllib.error
import json
import ssl

api_key = False
# If you have a Google Places API key, enter it here
# api_key = 'AIzaSy___IDByT70'
# https://developers.google.com/maps/documentation/geocoding/intro

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/json?'
else :
    serviceurl = 'https://maps.googleapis.com/maps/api/geocode/json?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# my test data
address = 'Ann Arbor, MI'
count = 0

while True:
#     address = input('Enter location: ')
#     if len(address) < 1: break

    # customized loop breaker
    if count > 0:
        break
    
    # encoding search url (pass in the address and user key)
    parms = dict()
    parms['address'] = address
    if api_key is not False: parms['key'] = api_key
    url = serviceurl + urllib.parse.urlencode(parms) # encode the address for search engine to read
    
    # send request and retrieve results
    print('Retrieving', url)
    uh = urllib.request.urlopen(url, context=ctx)
    data = uh.read().decode() # also needs decoding
    print('Retrieved', len(data), 'characters')
    
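    # parsing the json results
    try:
        js = json.loads(data)
    except json.JSONDecodeError:
        js = None
    
    # if not found
    if not js or 'status' not in js or js['status'] != 'OK':
        print('==== Failure To Retrieve ====')
        print(data)
        break # count is not incremented on failure, so `continue` would retry the same address forever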

    # print the result
    print(json.dumps(js, indent=4))

    lat = js['results'][0]['geometry']['location']['lat']
    lng = js['results'][0]['geometry']['location']['lng']
    print('lat', lat, 'lng', lng)
    location = js['results'][0]['formatted_address']
    print(location)
    
    # break the loop
    count += 1

Retrieving http://py4e-data.dr-chuck.net/json?address=Ann+Arbor%2C+MI&key=42
Retrieved 1736 characters
{
    "results": [
        {
            "address_components": [
                {
                    "long_name": "Ann Arbor",
                    "short_name": "Ann Arbor",
                    "types": [
                        "locality",
                        "political"
                    ]
                },
                {
                    "long_name": "Washtenaw County",
                    "short_name": "Washtenaw County",
                    "types": [
                        "administrative_area_level_2",
                        "political"
                    ]
                },
                {
                    "long_name": "Michigan",
                    "short_name": "MI",
                    "types": [
                        "administrative_area_level_1",
                        "political"
                    ]
                },
                {
     

##### Geocode XML  <a name='geoxml'></a>

In [2]:
import urllib.request, urllib.parse, urllib.error
import xml.etree.ElementTree as ET
import ssl

api_key = False
# If you have a Google Places API key, enter it here
# api_key = 'AIzaSy___IDByT70'
# https://developers.google.com/maps/documentation/geocoding/intro

if api_key is False:
    api_key = 42
    serviceurl = 'http://py4e-data.dr-chuck.net/xml?'
else :
    serviceurl = 'https://maps.googleapis.com/maps/api/geocode/xml?'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

# my test data
address = 'Ann Arbor, MI'
count = 0

while True:
#     address = input('Enter location: ')
#     if len(address) < 1: break

    # customized loop breaker
    if count > 0:
        break

    parms = dict()
    parms['address'] = address
    if api_key is not False: parms['key'] = api_key
    url = serviceurl + urllib.parse.urlencode(parms)
    print('Retrieving', url)
    uh = urllib.request.urlopen(url, context=ctx)

    data = uh.read()
    print('Retrieved', len(data), 'characters')
    print(data.decode()) # print the XML results
    
    # parsing the XML data
    tree = ET.fromstring(data)

    results = tree.findall('result')
    lat = results[0].find('geometry').find('location').find('lat').text
    lng = results[0].find('geometry').find('location').find('lng').text
    location = results[0].find('formatted_address').text

    print('lat', lat, 'lng', lng)
    print(location)
    
    # break the loop
    count += 1

Retrieving http://py4e-data.dr-chuck.net/xml?address=Ann+Arbor%2C+MI&key=42
Retrieved 1559 characters
<?xml version="1.0" encoding="UTF-8"?>
<GeocodeResponse>
 <status>OK</status>
 <result>
  <type>locality</type>
  <type>political</type>
  <formatted_address>Ann Arbor, MI, USA</formatted_address>
  <address_component>
   <long_name>Ann Arbor</long_name>
   <short_name>Ann Arbor</short_name>
   <type>locality</type>
   <type>political</type>
  </address_component>
  <address_component>
   <long_name>Washtenaw County</long_name>
   <short_name>Washtenaw County</short_name>
   <type>administrative_area_level_2</type>
   <type>political</type>
  </address_component>
  <address_component>
   <long_name>Michigan</long_name>
   <short_name>MI</short_name>
   <type>administrative_area_level_1</type>
   <type>political</type>
  </address_component>
  <address_component>
   <long_name>United States</long_name>
   <short_name>US</short_name>
   <type>country</type>
   <type>political</type>
  </

##### Twitter API  <a name='twitter'></a>

__Not finished__. This only shows how it is done; tokens and keys are needed to use the API.

In [6]:
# twitter2.py
import urllib.request, urllib.parse, urllib.error
import twurl # put your authorization here
import json
import ssl

# https://apps.twitter.com/
# Create App and get the four strings, put them in hidden.py

TWITTER_URL = 'https://api.twitter.com/1.1/friends/list.json'

# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

while True:
    print('')
    acct = input('Enter Twitter Account:')
    if (len(acct) < 1): break
    url = twurl.augment(TWITTER_URL,
                        {'screen_name': acct, 'count': '5'})
    print('Retrieving', url)
    connection = urllib.request.urlopen(url, context=ctx)
    data = connection.read().decode()

    js = json.loads(data)
    print(json.dumps(js, indent=2))

    headers = dict(connection.getheaders())
    print('Remaining', headers['x-rate-limit-remaining'])

    for u in js['users']:
        print(u['screen_name'])
        if 'status' not in u:
            print('   * No status found')
            continue
        s = u['status']['text']
        print('  ', s[:50])


Enter Twitter Account:

