## Extracting data from the Internet
> JSON, XML Tree and HTML Parsing

### Understanding OOP (Object Oriented Programming) and dot notation


Every value in Python is an object. Objects have attributes (data stored on the object) and methods (functions that belong to the object). We access an object's attributes and methods by writing a dot (.) between the object and the attribute or method name; this is called dot notation.

Below we use the `class` keyword to define a new type of object called Shark. Inside the class we define a method called swim, which belongs to every Shark object. The `self` parameter refers to the particular object the method is called on.
```
class Shark:
    def swim(self):
        print("The shark is swimming")
```
We then create a shark named Buddy, which is an instance of the Shark class.

```
Buddy = Shark()
```
Finally, we call Buddy's swim method using dot notation.
```
Buddy.swim()
```

In [None]:
# Define the Shark class
class Shark:
    # a method: a function that belongs to every Shark object
    def swim(self):
        print("The shark is swimming")

# Create a new instance of the Shark class
Buddy = Shark()
# Call the swim method using dot notation
Buddy.swim()
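The Shark class above only has a method, while the description also mentions properties (attributes). A minimal sketch that adds a `name` attribute, set in the `__init__` method, shows dot notation reaching both. The `name` attribute and the choice to return the message instead of printing it are illustrative additions, not part of the original example.

```python
class Shark:
    """A shark with a name attribute and a swim method (illustrative example)."""

    def __init__(self, name):
        # self.name is an attribute: data stored on this particular object
        self.name = name

    def swim(self):
        # methods can read the object's attributes through self
        return f"{self.name} is swimming"


buddy = Shark("Buddy")
print(buddy.name)    # attribute access with dot notation -> Buddy
print(buddy.swim())  # method call with dot notation -> Buddy is swimming
```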

### Review and Context


**We need the ability to:**
- create and send HTTP requests
- receive and process HTTP responses
- convert data in JSON/XML/HTML format into Python objects

**Use Python libraries for getting web data:**
- requests
- urllib.request

**Web Data Formats:**
- JSON - JavaScript Object Notation - json library  
- ***XML - Extensible Markup Language - lxml library***
- HTML - HyperText Markup Language - BeautifulSoup, Selenium


### Reviewing Requests


Import the library
```
import requests
```
Construct the URL
```
url = "http://www.epicurious.com/search/Peanut+Sauce"
```
Send the request and get a response
```
response = requests.get(url)
```
Check whether the request was successful (one way to do this):
```
if response.status_code == 200:
    print("Success")
else:
    print("Failure")
```
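For comparison, the same steps can be sketched with the standard-library `urllib.request` module listed above. This is a minimal sketch under stated assumptions: `build_url`, `is_success` and `fetch_status` are hypothetical helper names, and the User-Agent string is an arbitrary placeholder. The example URL is the domainsdb search endpoint used later in this notebook. (With `requests`, `response.ok` is True for any status code below 400, and `response.raise_for_status()` raises an exception for 4xx/5xx codes.)

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_url(base, params):
    """Append URL-encoded query parameters to a base URL (hypothetical helper)."""
    return f"{base}?{urlencode(params)}"


def is_success(status_code):
    """Treat any 2xx status code as success (hypothetical helper)."""
    return 200 <= status_code < 300


def fetch_status(url, timeout=10):
    """Send a GET request and return the HTTP status code (needs network access)."""
    # The User-Agent value is a placeholder assumption
    request = Request(url, headers={"User-Agent": "python-tutorial-example"})
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # urllib raises 4xx/5xx responses as HTTPError, which still carries the code
        return err.code


url = build_url("https://api.domainsdb.info/v1/domains/search", {"domain": "census"})
print(url)  # https://api.domainsdb.info/v1/domains/search?domain=census
# fetch_status(url) would contact the server, so it is not called here.
```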

### XML Extensible Markup Language


XML looks similar to HTML, but it serves a different purpose: HTML is designed to display data, while XML is designed to organize and store data so that it is portable between systems. For example, web services often use XML to send requests and responses back and forth.

**XML Structure:**
- Tree Structure - "element tree"
- Tagged elements (nested) - more detail than JSON
- Attributes associated with elements
- Text (the leaves of the tree), which is where the data lives. Getting data out involves traversing the tree
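To see the parts listed above (tags, attributes, text) on a small example, here is a sketch using Python's built-in `xml.etree.ElementTree` module rather than lxml, assumed here so it runs without extra installs. lxml elements expose the same `tag`, `attrib`, `get()`, `find()` and `text` names, so the idea carries over. The snippet is a trimmed copy of one Book element from the data below.

```python
import xml.etree.ElementTree as ET

snippet = (
    '<Book ISBN="ISBN-13:978-1599620787" Price="15.23">'
    "<Title>New York Deco</Title>"
    "</Book>"
)

book = ET.fromstring(snippet)   # parse the string; returns the root element
print(book.tag)                 # the element's tag name -> Book
print(book.attrib)              # all attributes as a dict
print(book.get("Price"))        # one attribute value, always a string -> 15.23
title = book.find("Title")      # a child element
print(title.text)               # the text leaf holding the data -> New York Deco
```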

In [4]:
data_string = """
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""

#### lxml Python library
The tool we need is the etree (element tree) module from the lxml library. Assume we have some XML data from a file or a website stored in a string called "data_string". We pass data_string as the argument to the etree.XML() function, which parses the text and returns the root element of the tree. We assign that element to a variable called "root".

In [5]:
from lxml import etree  # the element tree module from lxml
root = etree.XML(data_string)  # parse the XML string and return the root element
print(type(root))  # an lxml _Element object
print(root.tag) # prints the root tag

<class 'lxml.etree._Element'>
Bookstore


If we want to look at the entire tree, we can serialize it back to text. The etree.tostring() function converts the element we called "root", and everything beneath it, into bytes; pretty_print=True adds indentation and line breaks. Calling .decode("utf-8") then turns those bytes into a regular Python string (str) so that it prints as readable text.

In [6]:
result=etree.tostring(root, pretty_print=True).decode("utf-8")
print(type(result))
print(result)

<class 'str'>
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
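The bytes-versus-string distinction can be checked directly. A minimal sketch with the standard-library `xml.etree.ElementTree` (assumed here instead of lxml so it runs without extra installs): `tostring()` returns bytes by default, and either decoding those bytes or passing `encoding="unicode"` produces a `str`. lxml's `etree.tostring()` also accepts `encoding="unicode"`.

```python
import xml.etree.ElementTree as ET

demo_root = ET.fromstring("<Bookstore><Book Price='15.23'/></Bookstore>")

raw = ET.tostring(demo_root)                       # default result is bytes
text = raw.decode("utf-8")                         # decode the bytes into a str
same = ET.tostring(demo_root, encoding="unicode")  # or request a str directly

print(type(raw), type(text), type(same))
print(text == same)  # True
```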



#### Accessing XML elements
Next we might want to visit all of the elements in the tree. For this we will use an iterator, a Python object that produces items one at a time, which we will use from time to time. The tree's iterator works depth-first: it goes all the way down one branch, then comes back up one level and traverses the next branch.

#### Iterating over an XML tree
- Call `.iter()` on an element to get an iterator
- The iterator yields every element in the subtree rooted at that element, starting with the element itself
[More about element.iter()](https://docs.python.org/2/library/xml.etree.elementtree.html)

In [7]:
for element in root:
    print(element)

<Element Book at 0x108674100>
<Element Book at 0x108674d40>


In [8]:
for element in root.iter():
    print(element)

<Element Bookstore at 0x1081df580>
<Element Book at 0x10868d5c0>
<Element Title at 0x10868db00>
<Element Authors at 0x10868df00>
<Element Author at 0x10868d5c0>
<Element First_Name at 0x10868db00>
<Element Last_Name at 0x10868df00>
<Element Book at 0x10868d5c0>
<Element Remark at 0x10868db00>
<Element Title at 0x10868df00>
<Element Authors at 0x10868d5c0>
<Element Author at 0x10868db00>
<Element First_Name at 0x10868df00>
<Element Last_Name at 0x10868d5c0>
<Element Author at 0x10868db00>
<Element First_Name at 0x10868df00>
<Element Last_Name at 0x10868d5c0>
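The two loops above differ in depth: looping over an element visits only its direct children, while `iter()` visits every descendant, and `iter(tag)` filters by tag name. A small sketch with the standard-library `xml.etree.ElementTree` (assumed instead of lxml; lxml behaves the same way here) makes the difference visible on a tiny made-up tree.

```python
import xml.etree.ElementTree as ET

demo = ET.fromstring(
    "<Bookstore>"
    "<Book><Title>A</Title></Book>"
    "<Book><Title>B</Title></Book>"
    "</Bookstore>"
)

children = [child.tag for child in demo]         # direct children only
everything = [el.tag for el in demo.iter()]      # depth-first, includes demo itself
titles = [el.text for el in demo.iter("Title")]  # iter() filtered by tag

print(children)    # ['Book', 'Book']
print(everything)  # ['Bookstore', 'Book', 'Title', 'Book', 'Title']
print(titles)      # ['A', 'B']
```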


#### Looping over the direct children of an element

In [9]:
# we can look at the elements one level under the root <Bookstore> element
for i in root:
    print(i)

<Element Book at 0x1086748c0>
<Element Book at 0x108675000>


#### Accessing the tag

In [10]:
# If we only want to see the name of each element, we can read its .tag attribute
for item in root:
    print(item.tag)

Book
Book


#### Using the iterator to get specific tags
- In the example below, only the Author elements are visited
- For each Author element, the .find() method retrieves its First_Name and Last_Name children
- Given a plain tag name, .find() only looks at direct children, not deeper descendants, so be careful! It also returns only the first match, or None if nothing matches
- The .text attribute holds the text of a leaf node

In [11]:
for element in root.iter("Author"):
    print(element.find("First_Name").text, element.find("Last_Name").text)

Richard Berenholtz
Bill Harris
Jorg Brockmann
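The warning about `.find()` is easy to trip over: with a plain tag name it returns `None` for anything that is not a direct child, and calling `.text` on `None` raises an `AttributeError`. A sketch with the standard-library `xml.etree.ElementTree` (assumed instead of lxml, which supports the same path syntax) shows the `.//` prefix for searching all descendants and `findtext()` as a safer shortcut.

```python
import xml.etree.ElementTree as ET

demo = ET.fromstring(
    "<Book><Title>New York Deco</Title>"
    "<Authors><Author><First_Name>Richard</First_Name></Author></Authors>"
    "</Book>"
)

print(demo.find("First_Name"))                # None: not a direct child of <Book>
print(demo.find(".//First_Name").text)        # './/' searches all descendants -> Richard
print(demo.findtext("Title"))                 # shortcut for find(...).text -> New York Deco
print(repr(demo.findtext("Missing", default="")))  # no match: returns the default -> ''
```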


### Practice

In [None]:
example_data="""
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>
"""

1. Import etree from the lxml library  
Parse the XML data and assign the root element to the variable root_ex  
Print the tag of the root element  

In [None]:
from lxml import etree  # the element tree module from lxml
root_ex = etree.XML(example_data)  # parse the XML string and return the root element
print(type(root_ex))  # an lxml _Element object
print(root_ex.tag) # print the tag associated with the XML structure

2. Use the tostring() function to convert the element called "root_ex" into bytes, then call .decode("utf-8") to turn those bytes into a regular Python string.

In [None]:
result_ex=etree.tostring(root_ex).decode("utf-8")
print(type(result_ex))
print(result_ex)

3. Use an iterator to visit every element in the tree and print its tag

In [None]:
for element in root_ex.iter():
    print(element.tag)

4. Access only the 'country' elements. For each one, use the .find() method to get its rank and year children. With a plain tag name, .find() only looks at direct children, not deeper descendants, so be careful! Use the .text attribute to get the text of a leaf node.

In [None]:
for element in root_ex.iter("country"):
    print(element.find("rank").text, element.find("year").text)

#### Access the title of the book with ISBN-13:978-1599620787
Say we want to find the title of a book that has a specific attribute value, for example its ISBN. Below we use the path Book[@ISBN="ISBN-13:978-1599620787"]/Title. The part in square brackets is a predicate that reads like "the Book whose ISBN equals this value", and /Title then selects that book's Title child.

In [None]:
print(root.find('Book[@ISBN="ISBN-13:978-1599620787"]/Title').text)
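The attribute predicate can be tried on its own with a trimmed copy of the bookstore data. This sketch uses the standard-library `xml.etree.ElementTree` (assumed instead of lxml; both accept `[@attr="value"]` in `find()` paths). Note that attribute values are compared as exact strings, so a numerically equal but differently written value does not match.

```python
import xml.etree.ElementTree as ET

store = ET.fromstring(
    "<Bookstore>"
    '<Book ISBN="ISBN-13:978-1599620787" Price="15.23">'
    "<Title>New York Deco</Title></Book>"
    '<Book ISBN="ISBN-13:978-1579128562" Price="15.80">'
    "<Title>Five Hundred Buildings of New York</Title></Book>"
    "</Bookstore>"
)

# Book[@ISBN="..."] keeps only <Book> children whose ISBN attribute equals the value
title = store.find('Book[@ISBN="ISBN-13:978-1579128562"]/Title')
print(title.text)  # Five Hundred Buildings of New York

# Values are compared as strings: "15.8" does not match Price="15.80"
print(store.find('Book[@Price="15.8"]'))  # None
```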

#### Access the last name of the author who wrote the book that weighs 1.5 units

In [None]:
print(root.find('Book[@Weight="1.5"]/Authors/Author/Last_Name').text)

#### Access the last name of the author who resides in Beijing

In [None]:
print(root.find('Book/Authors/Author[@Residence="Beijing"]/Last_Name').text)

#### Access the first name of the author who wrote the book that costs 15.80

In [None]:
# your code
print(root.find('Book[@Price="15.80"]/Authors/Author/First_Name').text)

#### Access the last name of the author whose residence is New York City

In [None]:
# your code
print(root.find('Book/Authors/Author[@Residence="New York City"]/Last_Name').text)

#### Access the first name and last name of the author whose residence is New York City

In [None]:
author = root.find('Book/Authors/Author[@Residence="New York City"]')  # find the Author once
print(author.find("First_Name").text, author.find("Last_Name").text)
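In the bookstore data two authors live in New York City (Berenholtz and Brockmann), but `find()` returns only the first match in document order. A sketch with `findall()` collects all of them; it uses a trimmed copy of the data and the standard-library `xml.etree.ElementTree` (assumed instead of lxml, whose elements also provide `findall()`).

```python
import xml.etree.ElementTree as ET

store = ET.fromstring(
    "<Bookstore>"
    "<Book><Authors>"
    '<Author Residence="New York City"><Last_Name>Berenholtz</Last_Name></Author>'
    "</Authors></Book>"
    "<Book><Authors>"
    '<Author Residence="Beijing"><Last_Name>Harris</Last_Name></Author>'
    '<Author Residence="New York City"><Last_Name>Brockmann</Last_Name></Author>'
    "</Authors></Book>"
    "</Bookstore>"
)

path = 'Book/Authors/Author[@Residence="New York City"]/Last_Name'
print(store.find(path).text)                    # first match only -> Berenholtz
print([el.text for el in store.findall(path)])  # every match -> ['Berenholtz', 'Brockmann']
```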

## Exercises

### 1. List Operations

In [None]:
def nested_list(lst):
    # start by printing the initial state of the list
    print(lst)

    # a. Add the values 'k', 'l', 'm' to the list using one command
    lst.extend(['k','l','m'])
    print(lst)
    # should return ['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm']

    # b. Add the list ['nn','oo'] to the list using one command
    lst.append(['nn','oo'])
    print(lst)
    # should return ['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm', ['nn', 'oo']]

    # c. Delete the last item from the list
    lst.pop()
    print(lst)
    # should return ['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm']

    # d. Show how to access the value 'ddd' and print this output
    print(lst[1][1][1])

    # e. Delete the value 'ee' from the list
    lst[1].remove('ee')
    print(lst)
    # should return ['a', ['bb', ['ccc', 'ddd'], 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm']


In [None]:
data = ['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j']
nested_list(data)

['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j']
['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm']
['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm', ['nn', 'oo']]
['a', ['bb', ['ccc', 'ddd'], 'ee', 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm']
ddd
['a', ['bb', ['ccc', 'ddd'], 'ff'], 'g', ['hh', 'ii'], 'j', 'k', 'l', 'm']
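Steps a and b differ in an important way: `extend()` adds each item of its argument separately, while `append()` adds its argument as a single nested item. A short sketch on a made-up list shows this, along with removing by position (`pop()`, `del`) versus by value (`remove()`).

```python
letters = ["a", "b"]
letters.extend(["c", "d"])   # adds each item separately
print(letters)               # ['a', 'b', 'c', 'd']

letters.append(["e", "f"])   # adds the whole list as ONE nested item
print(letters)               # ['a', 'b', 'c', 'd', ['e', 'f']]
print(len(letters))          # 5

removed = letters.pop()      # removes and returns the last item
print(removed)               # ['e', 'f']

letters.remove("b")          # removes the first item equal to "b" (by value)
del letters[0]               # removes the item at index 0 (by position)
print(letters)               # ['c', 'd']
```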


### 2. Dictionary Operations


In [None]:
def dict_operations(dct):
    # print the original dictionary
    print(dct)

    # a. Add the key value pair "stylus": 16
    dct['stylus'] = 16
    print(dct)

    # b. Update the ipad inventory to 19
    dct['ipad'] = 19
    print(dct)

    # c. Remove the key value pair, "monitor": 5
    del dct['monitor']
    print(dct)

    # d. Extract each of the values in the dictionary and add them to the list called counts
    counts = list(dct.values())
    print(counts)

In [None]:
inventory = {
    "iphone": 40,
    "ipad": 17,
    "macbook": 12,
    "monitor": 5
}

dict_operations(inventory)

{'iphone': 40, 'ipad': 17, 'macbook': 12, 'monitor': 5}
{'iphone': 40, 'ipad': 17, 'macbook': 12, 'monitor': 5, 'stylus': 16}
{'iphone': 40, 'ipad': 19, 'macbook': 12, 'monitor': 5, 'stylus': 16}
{'iphone': 40, 'ipad': 19, 'macbook': 12, 'stylus': 16}
[40, 19, 12, 16]


### 3. Extracting Semi-Structured Data

In [None]:
def nested_data(records):
    # records is a list of dictionaries; collect each one's lastSalePrice
    prices = []
    for record in records:
        prices.append(record['lastSalePrice'])
    print(prices)

In [None]:
stocks=[
    {
        "symbol": "ARAY",
        "sector": "healthtechnology",
        "lastSalePrice": 4.865
    },
    {
        "symbol": "CPS",
        "sector": "producermanufacturing",
        "lastSalePrice": 50.31
    },
    {
        "symbol": "BOND",
        "sector": "miscellaneous",
        "lastSalePrice": 104.8
    }
]

nested_data(stocks)

[4.865, 50.31, 104.8]
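The same extraction can be written as a list comprehension. A sketch on a trimmed copy of the stock data, plus one hypothetical record with no price, shows how `.get()` avoids a `KeyError` when a field is missing from semi-structured data. The name `sample_stocks` and the `XYZ` record are illustrative.

```python
sample_stocks = [
    {"symbol": "ARAY", "lastSalePrice": 4.865},
    {"symbol": "CPS", "lastSalePrice": 50.31},
    {"symbol": "XYZ"},  # hypothetical record with no price field
]

# One field from each dictionary; .get() returns None instead of raising KeyError
prices = [stock.get("lastSalePrice") for stock in sample_stocks]
print(prices)  # [4.865, 50.31, None]

# A dict comprehension pairing symbols with prices, skipping records without a price
by_symbol = {
    stock["symbol"]: stock["lastSalePrice"]
    for stock in sample_stocks
    if "lastSalePrice" in stock
}
print(by_symbol)  # {'ARAY': 4.865, 'CPS': 50.31}
```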


### 4.  Converting JSON Data

In [None]:
def convert_to_python(data):
    # Load the json library
    import json

    # a. Convert to a python data object and print the result
    python_data=json.loads(data)
    print(python_data)

    # Check the data types
    print(python_data, type(python_data))
    print(python_data[0], type(python_data[0]))
    print(python_data[0]["A"], type(python_data[0]["A"]))

In [None]:
json_data='[{"A": [1,2]}, {"B": [3,4]}]'

convert_to_python(json_data)

[{'A': [1, 2]}, {'B': [3, 4]}]
[{'A': [1, 2]}, {'B': [3, 4]}] <class 'list'>
{'A': [1, 2]} <class 'dict'>
[1, 2] <class 'list'>
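`json.loads()` maps each JSON type to a Python type: objects become `dict`, arrays `list`, strings `str`, whole numbers `int`, decimals `float`, `true`/`false` become `True`/`False`, and `null` becomes `None`. A sketch with illustrative values borrowed from the domain data below makes this visible. It also shows why `"isDead": "False"` in that data is a string, not a boolean: a non-empty string such as `"False"` is truthy in Python.

```python
import json

# A small JSON document using each basic JSON type (values are illustrative)
text = (
    '{"domain": "census-ghana.net", "priority": 0, "price": 4.865, '
    '"active": true, "country": null, "A": ["192.243.99.137"], "isDead": "False"}'
)

record = json.loads(text)

for key, value in record.items():
    print(key, "->", type(value).__name__)
# domain -> str, priority -> int, price -> float, active -> bool,
# country -> NoneType, A -> list, isDead -> str

print(bool(record["isDead"]))  # True: the string "False" is not the boolean False
```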


In [None]:
def convert_to_json(data):
    # Load the json library
    import json

    # b. Convert the following to json data
    json_stocks=json.dumps(data)

    # Print the data and check the data type of the converted data object
    print(json_stocks)
    print(type(json_stocks))

In [None]:
stocks=[
    {
        "symbol": "ARAY",
        "sector": "healthtechnology",
        "lastSalePrice": 4.865
    },
    {
        "symbol": "CPS",
        "sector": "producermanufacturing",
        "lastSalePrice": 50.31
    },
    {
        "symbol": "BOND",
        "sector": "miscellaneous",
        "lastSalePrice": 104.8
    }
]

convert_to_json(stocks)

[{"symbol": "ARAY", "sector": "healthtechnology", "lastSalePrice": 4.865}, {"symbol": "CPS", "sector": "producermanufacturing", "lastSalePrice": 50.31}, {"symbol": "BOND", "sector": "miscellaneous", "lastSalePrice": 104.8}]
<class 'str'>
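`json.dumps()` and `json.loads()` are inverses for these basic types, and the `indent` parameter of `dumps()` produces a human-readable layout. A minimal sketch on one trimmed stock record:

```python
import json

records = [{"symbol": "ARAY", "lastSalePrice": 4.865}]

compact = json.dumps(records)            # single-line JSON string
pretty = json.dumps(records, indent=2)   # same data, indented for reading
print(pretty)

# Round trip: loads() turns either string back into equal Python objects
print(json.loads(compact) == records)    # True
print(json.loads(pretty) == records)     # True
```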


### 5.  Extracting JSON data from an API
The website https://domainsdb.info/ allows users to search for domain names. Its API can be examined here: https://api.domainsdb.info/v1/domains/search?domain=census. Here is the structure of the data returned by the API (values differ due to frequent updates):
![domains](https://drive.google.com/uc?id=1yqnxQcfexP7FF09gEJP2qppavpoj2U1w)           

In [None]:
def extract_domains(data):
    # Load the json library
    import json
    #a. convert the json data into a python object
    response=json.loads(data)
    print(type(response), '\n')

    # b. Print all of the keys
    for key in response:
        print(key)
    print('\n')

    # c. Access the first domain name in the data object.
    # census-rasp.ru
    print(response['domains'][0]['domain'],'\n')

    # d. Collect all of the domain names in a list.
    # ['census-rasp.ru', 'census-ghana.net', 'census-india.com', 'census-r7-pj-dev.com', 'us-census-bureau.com']
    dmn=[]
    for i in response['domains']:
        dmn.append(i['domain'])
    print(dmn[:5])
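The list-building loop in step d can also be written as a list comprehension. The sketch below uses a two-entry excerpt of the simulated data and adds `.get()` with a default so that a response missing the `"domains"` key yields an empty list instead of a `KeyError`; that defensive choice is an assumption about how the API might fail, not documented API behavior. It also filters on a second field.

```python
import json

sample = """
{
  "domains": [
    {"domain": "census-rasp.ru", "country": null},
    {"domain": "census-ghana.net", "country": "US"}
  ]
}
"""

data = json.loads(sample)

# .get() with a default avoids a KeyError if "domains" is absent
names = [entry["domain"] for entry in data.get("domains", [])]
print(names)  # ['census-rasp.ru', 'census-ghana.net']

# Keep only domains whose country is known (null becomes None, which is falsy)
with_country = [e["domain"] for e in data.get("domains", []) if e.get("country")]
print(with_country)  # ['census-ghana.net']
```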

Please run the code block below to define the simulated JSON data, called `domain_api`, that could be returned by the domainsdb API for the search term "census".

In [None]:
domain_api = """
{
  "domains": [
    {
      "domain": "census-rasp.ru",
      "create_date": "2023-02-18T16:00:07.927224",
      "update_date": "2023-02-18T16:00:07.927227",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": null,
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-ghana.net",
      "create_date": "2023-02-17T07:11:45.132922",
      "update_date": "2023-02-17T07:11:45.132925",
      "country": "US",
      "isDead": "False",
      "A": [
        "192.243.99.137"
      ],
      "NS": [
        "ns2.trhdns.com",
        "ns1.trhdns.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "census-ghana.net",
          "priority": 0
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-india.com",
      "create_date": "2023-02-17T07:10:00.616301",
      "update_date": "2023-02-17T07:10:00.616304",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": null,
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-r7-pj-dev.com",
      "create_date": "2023-01-15T18:27:37.091139",
      "update_date": "2023-01-15T18:27:37.091143",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": null,
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "us-census-bureau.com",
      "create_date": "2022-12-29T15:35:08.919483",
      "update_date": "2022-12-29T15:35:08.919485",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns81.worldnic.com",
        "ns82.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "us-census-careers.com",
      "create_date": "2022-12-29T15:35:08.919696",
      "update_date": "2022-12-29T15:35:08.919699",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns83.worldnic.com",
        "ns84.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "us-census-jobs.com",
      "create_date": "2022-12-29T15:35:08.919900",
      "update_date": "2022-12-29T15:35:08.919902",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns1.worldnic.com",
        "ns2.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "start-census-online.com",
      "create_date": "2022-12-29T14:23:12.034842",
      "update_date": "2022-12-29T14:23:12.034844",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns5.worldnic.com",
        "ns6.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "respond-to-census-online.com",
      "create_date": "2022-12-29T13:19:28.761344",
      "update_date": "2022-12-29T13:19:28.761346",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns47.worldnic.com",
        "ns48.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "respond-census-online.com",
      "create_date": "2022-12-29T13:19:28.755245",
      "update_date": "2022-12-29T13:19:28.755247",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns91.worldnic.com",
        "ns92.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "online-census-form.com",
      "create_date": "2022-12-29T12:17:34.813694",
      "update_date": "2022-12-29T12:17:34.813696",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns51.worldnic.com",
        "ns52.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "online-census-survey.com",
      "create_date": "2022-12-29T12:17:34.813908",
      "update_date": "2022-12-29T12:17:34.813910",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns6.worldnic.com",
        "ns5.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-and-sensibility.com",
      "create_date": "2022-10-22T16:13:55.271374",
      "update_date": "2022-12-09T05:45:29.360541",
      "country": "US",
      "isDead": "False",
      "A": [
        "192.0.78.24",
        "192.0.78.25"
      ],
      "NS": [
        "ns1.wordpress.com",
        "ns3.wordpress.com",
        "ns2.wordpress.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-cameroon.com",
      "create_date": "2022-10-22T16:13:55.273204",
      "update_date": "2022-12-09T05:45:29.357367",
      "country": "FR",
      "isDead": "False",
      "A": [
        "91.234.195.40"
      ],
      "NS": [
        "ns2.dnshostservices.com",
        "ns1.dnshostservices.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "census-cameroon.com",
          "priority": 0
        }
      ],
      "TXT": [
        "v=spf1 +a +mx +ip4:91.234.195.40 +ip4:91.234.195.41 ~all"
      ]
    },
    {
      "domain": "census-docs.com",
      "create_date": "2022-10-22T16:13:55.274478",
      "update_date": "2022-12-09T05:45:29.357402",
      "country": "US",
      "isDead": "False",
      "A": [
        "34.204.131.44",
        "34.148.79.160"
      ],
      "NS": [
        "dns1.p01.nsone.net",
        "dns2.p01.nsone.net",
        "dns3.p01.nsone.net",
        "dns4.p01.nsone.net"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-mapping.com",
      "create_date": "2022-10-22T16:13:55.277080",
      "update_date": "2022-12-09T05:45:29.357426",
      "country": "VG",
      "isDead": "False",
      "A": [
        "208.91.197.39"
      ],
      "NS": [
        "dns101.register.com",
        "dns102.register.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-form.com",
      "create_date": "2022-11-20T12:44:23.406326",
      "update_date": "2022-11-20T12:44:23.406328",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": null,
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-online.site",
      "create_date": "2022-11-13T01:08:13.518473",
      "update_date": "2022-11-14T06:39:06.320700",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": [
        "albert.ns.cloudflare.com",
        "elma.ns.cloudflare.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": [
        "v=spf1 -all"
      ]
    },
    {
      "domain": "better-census-api.com",
      "create_date": "2022-10-20T07:19:45.184769",
      "update_date": "2022-11-03T20:33:19.364097",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": [
        "ns1.digitalocean.com",
        "ns2.digitalocean.com",
        "ns3.digitalocean.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-fill.com",
      "create_date": "2022-10-22T16:13:55.274928",
      "update_date": "2022-11-01T07:23:15.744073",
      "country": "US",
      "isDead": "False",
      "A": [
        "173.239.8.164",
        "173.239.5.6",
        "74.206.228.78"
      ],
      "NS": [
        "ns1.expiereddnsmanager.com",
        "ns2.expiereddnsmanager.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mx7.census-fill.com",
          "priority": 1
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-mail.com",
      "create_date": "2022-10-22T16:13:55.276705",
      "update_date": "2022-11-01T07:23:15.744113",
      "country": "PT",
      "isDead": "False",
      "A": [
        "94.46.176.163"
      ],
      "NS": [
        "ns8.mydnspt.net",
        "ns1.mydnspt.net",
        "ns2.mydnspt.net",
        "ns7.mydnspt.net"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mail.census-mail.com",
          "priority": 0
        }
      ],
      "TXT": [
        "v=spf1 +a +mx +ip4:94.46.176.202 +ip4:94.46.176.163 +include:_spf.cleanmx.pt ~all"
      ]
    },
    {
      "domain": "census-resource.com",
      "create_date": "2022-10-22T16:13:55.278049",
      "update_date": "2022-11-01T07:23:15.744154",
      "country": "US",
      "isDead": "False",
      "A": [
        "34.102.136.180"
      ],
      "NS": [
        "ns73.domaincontrol.com",
        "ns74.domaincontrol.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-search.com",
      "create_date": "2022-10-22T16:13:55.278247",
      "update_date": "2022-11-01T07:23:15.744184",
      "country": "US",
      "isDead": "False",
      "A": [
        "18.191.220.148"
      ],
      "NS": [
        "ns-cloud-e1.googledomains.com",
        "ns-cloud-e2.googledomains.com",
        "ns-cloud-e4.googledomains.com",
        "ns-cloud-e3.googledomains.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "fill-out-census-online.com",
      "create_date": "2022-10-22T18:33:10.862651",
      "update_date": "2022-10-22T18:33:10.862653",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns53.worldnic.com",
        "ns54.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "fill-in-census-online.com",
      "create_date": "2022-10-22T18:33:10.841105",
      "update_date": "2022-10-22T18:33:10.841110",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns14.worldnic.com",
        "ns13.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "complete-census-online.com",
      "create_date": "2022-10-22T16:44:28.230114",
      "update_date": "2022-10-22T16:44:28.230116",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns64.worldnic.com",
        "ns63.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-test.com",
      "create_date": "2022-10-22T16:13:55.279081",
      "update_date": "2022-10-22T16:13:55.279083",
      "country": "GB",
      "isDead": "False",
      "A": [
        "212.53.89.138"
      ],
      "NS": [
        "ns1.netnames.net",
        "ns2.netnames.net"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "relay2.netnames.net",
          "priority": 100
        },
        {
          "exchange": "relay1.netnames.net",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-work.com",
      "create_date": "2022-10-22T16:13:55.279429",
      "update_date": "2022-10-22T16:13:55.279431",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns10.worldnic.com",
        "ns9.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-treuhand.com",
      "create_date": "2022-10-22T16:13:55.279255",
      "update_date": "2022-10-22T16:13:55.279257",
      "country": "DE",
      "isDead": "False",
      "A": [
        "176.28.24.21"
      ],
      "NS": [
        "ns2.c4b1.net",
        "ns1.c4b1.net"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "cm4allbusiness.de",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-sensibility.com",
      "create_date": "2022-10-22T16:13:55.278672",
      "update_date": "2022-10-22T16:13:55.278676",
      "country": "GB",
      "isDead": "False",
      "A": [
        "82.145.42.127"
      ],
      "NS": [
        "ns2.cloudabove.com",
        "ns1.cloudabove.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "census-sensibility.com",
          "priority": 0
        }
      ],
      "TXT": [
        "v=spf1 +a +mx +ip4:82.145.42.127 ~all"
      ]
    },
    {
      "domain": "census-steuerberatung.com",
      "create_date": "2022-10-22T16:13:55.278905",
      "update_date": "2022-10-22T16:13:55.278908",
      "country": "DE",
      "isDead": "False",
      "A": [
        "81.169.145.158"
      ],
      "NS": [
        "shades17.rzone.de",
        "docks14.rzone.de"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "smtpin.rzone.de",
          "priority": 5
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-sense.com",
      "create_date": "2022-10-22T16:13:55.278454",
      "update_date": "2022-10-22T16:13:55.278456",
      "country": "VG",
      "isDead": "False",
      "A": [
        "208.91.196.74"
      ],
      "NS": [
        "sk.s5.ans1.ns117.ztomy.com",
        "sk.s5.ans2.ns117.ztomy.com"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": [
        "v=spf1 a -all"
      ]
    },
    {
      "domain": "census-online.com",
      "create_date": "2022-10-22T16:13:55.277252",
      "update_date": "2022-10-22T16:13:55.277254",
      "country": "US",
      "isDead": "False",
      "A": [
        "173.244.217.232"
      ],
      "NS": [
        "ns64.worldnic.com",
        "ns63.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mail.census-online.com",
          "priority": 10
        }
      ],
      "TXT": [
        "google-site-verification=NZWbFlP_xyqYoVq-K7D8QX7awCiN0TX7aTBMhjJ00Xk"
      ]
    },
    {
      "domain": "census-records.com",
      "create_date": "2022-10-22T16:13:55.277840",
      "update_date": "2022-10-22T16:13:55.277842",
      "country": "CA",
      "isDead": "False",
      "A": [
        "216.138.250.172"
      ],
      "NS": [
        "ns22.domaincontrol.com",
        "ns21.domaincontrol.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mailstore1.secureserver.net",
          "priority": 10
        },
        {
          "exchange": "smtp.secureserver.net",
          "priority": 0
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-ot.com",
      "create_date": "2022-10-22T16:13:55.277424",
      "update_date": "2022-10-22T16:13:55.277426",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": null,
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-q.com",
      "create_date": "2022-10-22T16:13:55.277597",
      "update_date": "2022-10-22T16:13:55.277599",
      "country": "DE",
      "isDead": "False",
      "A": [
        "91.195.240.135"
      ],
      "NS": [
        "ns1.meganameservers.eu",
        "ns3.meganameservers.eu",
        "ns2.meganameservers.eu"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mx1c51.megamailservers.eu",
          "priority": 10
        },
        {
          "exchange": "mx3c51.megamailservers.eu",
          "priority": 110
        },
        {
          "exchange": "mx2c51.megamailservers.eu",
          "priority": 100
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-info.com",
      "create_date": "2022-10-22T16:13:55.276012",
      "update_date": "2022-10-22T16:13:55.276014",
      "country": "US",
      "isDead": "False",
      "A": [
        "206.189.254.20",
        "138.197.224.229"
      ],
      "NS": [
        "ns-1057.awsdns-04.org",
        "ns-1908.awsdns-46.co.uk",
        "ns-201.awsdns-25.com",
        "ns-997.awsdns-60.net"
      ],
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-jobs.com",
      "create_date": "2022-10-22T16:13:55.276358",
      "update_date": "2022-10-22T16:13:55.276361",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns39.worldnic.com",
        "ns40.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-map.com",
      "create_date": "2022-10-22T16:13:55.276905",
      "update_date": "2022-10-22T16:13:55.276907",
      "country": "CA",
      "isDead": "False",
      "A": [
        "198.50.252.64"
      ],
      "NS": [
        "ns1.onlydomains.com",
        "ns3.onlydomains.com",
        "ns2.onlydomains.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mail.census-map.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-jiniucloud.com",
      "create_date": "2022-10-22T16:13:55.276185",
      "update_date": "2022-10-22T16:13:55.276187",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": null,
      "CNAME": null,
      "MX": null,
      "TXT": null
    },
    {
      "domain": "census-labs.com",
      "create_date": "2022-10-22T16:13:55.276531",
      "update_date": "2022-10-22T16:13:55.276534",
      "country": "US",
      "isDead": "False",
      "A": [
        "207.38.94.44"
      ],
      "NS": [
        "ns1.webfaction.com",
        "ns2.webfaction.com",
        "ns4.webfaction.com",
        "ns3.webfaction.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mx9.webfaction.com",
          "priority": 10
        },
        {
          "exchange": "mx8.webfaction.com",
          "priority": 10
        },
        {
          "exchange": "mx7.webfaction.com",
          "priority": 10
        }
      ],
      "TXT": [
        "v=spf1 a:smtp.webfaction.com ~all"
      ]
    },
    {
      "domain": "census-immobilien.com",
      "create_date": "2022-10-22T16:13:55.275833",
      "update_date": "2022-10-22T16:13:55.275835",
      "country": "DE",
      "isDead": "False",
      "A": [
        "89.31.143.1"
      ],
      "NS": [
        "ns.udag.de",
        "ns.udag.net",
        "ns.udag.org"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mx01.udag.de",
          "priority": 20
        },
        {
          "exchange": "mx00.udag.de",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-gov.com",
      "create_date": "2022-10-22T16:13:55.275363",
      "update_date": "2022-10-22T16:13:55.275365",
      "country": null,
      "isDead": "False",
      "A": null,
      "NS": [
        "ns09.domaincontrol.com",
        "ns10.domaincontrol.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mx3.zoho.com",
          "priority": 50
        },
        {
          "exchange": "mx.zoho.com",
          "priority": 10
        },
        {
          "exchange": "mx2.zoho.com",
          "priority": 20
        }
      ],
      "TXT": [
        "v=spf1 include:zoho.com ~all"
      ]
    },
    {
      "domain": "census-gmbh.com",
      "create_date": "2022-10-22T16:13:55.275185",
      "update_date": "2022-10-22T16:13:55.275187",
      "country": "DE",
      "isDead": "False",
      "A": [
        "217.160.233.146"
      ],
      "NS": [
        "ns1108.ui-dns.org",
        "ns1108.ui-dns.com",
        "ns1108.ui-dns.biz",
        "ns1108.ui-dns.de"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "secmail02.cloud4partner.net",
          "priority": 20
        },
        {
          "exchange": "secmail01.cloud4partner.net",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-charts.com",
      "create_date": "2022-10-22T16:13:55.274132",
      "update_date": "2022-10-22T16:13:55.274134",
      "country": "US",
      "isDead": "False",
      "A": [
        "104.237.150.58"
      ],
      "NS": [
        "ns2.linode.com",
        "ns1.linode.com",
        "ns5.linode.com",
        "ns3.linode.com",
        "ns4.linode.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mail.census-charts.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-facts.com",
      "create_date": "2022-10-22T16:13:55.274683",
      "update_date": "2022-10-22T16:13:55.274685",
      "country": "NL",
      "isDead": "False",
      "A": [
        "37.97.254.27"
      ],
      "NS": [
        "ns0.transip.net",
        "ns2.transip.eu",
        "ns1.transip.nl"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "census-facts.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-capital.com",
      "create_date": "2022-10-22T16:13:55.273551",
      "update_date": "2022-10-22T16:13:55.273553",
      "country": "DE",
      "isDead": "False",
      "A": [
        "89.31.143.1"
      ],
      "NS": [
        "ns.udag.de",
        "ns.udag.org",
        "ns.udag.net"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "mx00.udag.de",
          "priority": 10
        },
        {
          "exchange": "mx01.udag.de",
          "priority": 20
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-center-1950.com",
      "create_date": "2022-10-22T16:13:55.273957",
      "update_date": "2022-10-22T16:13:55.273960",
      "country": "US",
      "isDead": "False",
      "A": [
        "192.64.119.71"
      ],
      "NS": [
        "dns1.registrar-servers.com",
        "dns2.registrar-servers.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "eforward4.registrar-servers.com",
          "priority": 15
        },
        {
          "exchange": "eforward1.registrar-servers.com",
          "priority": 10
        },
        {
          "exchange": "eforward2.registrar-servers.com",
          "priority": 10
        },
        {
          "exchange": "eforward5.registrar-servers.com",
          "priority": 20
        },
        {
          "exchange": "eforward3.registrar-servers.com",
          "priority": 10
        }
      ],
      "TXT": [
        "v=spf1 include:spf.efwd.registrar-servers.com ~all"
      ]
    },
    {
      "domain": "census-bureau.com",
      "create_date": "2022-10-22T16:13:55.273026",
      "update_date": "2022-10-22T16:13:55.273028",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns65.worldnic.com",
        "ns66.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    },
    {
      "domain": "census-careers.com",
      "create_date": "2022-10-22T16:13:55.273724",
      "update_date": "2022-10-22T16:13:55.273726",
      "country": "US",
      "isDead": "False",
      "A": [
        "208.91.197.27"
      ],
      "NS": [
        "ns65.worldnic.com",
        "ns66.worldnic.com"
      ],
      "CNAME": null,
      "MX": [
        {
          "exchange": "p.webcom.ctmail.com",
          "priority": 10
        }
      ],
      "TXT": null
    }
  ],
  "total": 82,
  "time": "1806",
  "next_page": null
}
"""

In [None]:
# Run the function with the domain_api as an input
extract_domains(domain_api)

<class 'dict'> 

domains
total
time
next_page


census-rasp.ru 

['census-rasp.ru', 'census-ghana.net', 'census-india.com', 'census-r7-pj-dev.com', 'us-census-bureau.com']
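`extract_domains` is defined earlier in the notebook and is not reproduced here. The sketch below is a separate illustration of the same kind of JSON traversal. It uses a trimmed sample built from two records in `domain_api`. It shows how `json.loads` maps JSON types to Python ones (object to `dict`, `null` to `None`). It also covers two details of this payload worth guarding against: `"isDead"` holds the *string* `"False"` rather than a boolean, and optional fields such as `"MX"` can be `null`. The names `sample_api`, `alive`, and `mail_hosts` are illustrative only.

```python
import json

# Trimmed sample in the same shape as domain_api above (two records copied from it)
sample_api = """
{
  "domains": [
    {"domain": "census-sense.com", "country": "VG", "isDead": "False",
     "MX": null},
    {"domain": "census-online.com", "country": "US", "isDead": "False",
     "MX": [{"exchange": "mail.census-online.com", "priority": 10}]}
  ],
  "next_page": null
}
"""

data = json.loads(sample_api)   # JSON object -> Python dict
print(type(data))               # <class 'dict'>

# Pull the "domain" value out of every record in the "domains" array
names = [record["domain"] for record in data["domains"]]
print(names)                    # ['census-sense.com', 'census-online.com']

# "isDead" is the string "False", which is truthy in Python,
# so compare against the string instead of testing it directly
alive = [r["domain"] for r in data["domains"] if r["isDead"] == "False"]

# JSON null becomes None; substitute an empty list before iterating "MX"
mail_hosts = [mx["exchange"]
              for r in data["domains"]
              for mx in (r["MX"] or [])]
print(mail_hosts)               # ['mail.census-online.com']
```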


### 6. Pulling Data from XML
We have learned about two common ways of storing web data. Many organizations collect and transport data in XML format. The goal of this exercise is to write a function that extracts specific data values from an XML tree. The function is tested on the XML data below.
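The solution below uses the third-party `lxml.etree`. The standard library's `xml.etree.ElementTree` exposes a very similar Element API (`find`, `iter`, `get`, `.text`, `.attrib`). As a minimal sketch without the extra dependency, here is how the tag, attributes, and child text are read from one element. The data is a one-book excerpt of the Bookstore XML used in this exercise.

```python
import xml.etree.ElementTree as ET

# One-book excerpt of the Bookstore data used in this exercise
xml_text = """<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
   </Book>
</Bookstore>"""

root = ET.fromstring(xml_text)    # parse the string and return the root Element

book = root.find("Book")          # first child element with the tag "Book"
print(book.tag)                   # Book
print(book.get("Price"))          # 15.23  (attribute values are strings)
print(book.attrib)                # all attributes as a dict
print(book.findtext("Title"))     # New York Deco

# Convert attribute strings before doing arithmetic with them
price = float(book.get("Price"))
```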


In [None]:
from lxml import etree

def xml_data_extract(xml_data):
    # Parse the XML string and get the root element (<Bookstore>)
    root = etree.XML(xml_data)

    # a. print all the element tags in the tree, using the iter() method
    for element in root.iter():
        print(element.tag)
    print('\n')

    # b. print all the authors' last names
    for author in root.iter('Author'):
        print(author.find('Last_Name').text)
    print('\n')

    # c. print the last name of the author of the book whose Weight is 1.5
    print(root.find('Book[@Weight="1.5"]/Authors/Author/Last_Name').text)
    # ...and the last name of the author whose residence is Beijing
    print(root.find('Book/Authors/Author[@Residence="Beijing"]/Last_Name').text)

In [None]:
data_string = """
<Bookstore>
   <Book ISBN="ISBN-13:978-1599620787" Price="15.23" Weight="1.5">
      <Title>New York Deco</Title>
      <Authors>
         <Author Residence="New York City">
            <First_Name>Richard</First_Name>
            <Last_Name>Berenholtz</Last_Name>
         </Author>
      </Authors>
   </Book>
   <Book ISBN="ISBN-13:978-1579128562" Price="15.80">
      <Remark>
      Five Hundred Buildings of New York and over one million other books are available for Amazon Kindle.
      </Remark>
      <Title>Five Hundred Buildings of New York</Title>
      <Authors>
         <Author Residence="Beijing">
            <First_Name>Bill</First_Name>
            <Last_Name>Harris</Last_Name>
         </Author>
         <Author Residence="New York City">
            <First_Name>Jorg</First_Name>
            <Last_Name>Brockmann</Last_Name>
         </Author>
      </Authors>
   </Book>
</Bookstore>
"""

xml_data_extract(data_string)


Bookstore
Book
Title
Authors
Author
First_Name
Last_Name
Book
Remark
Title
Authors
Author
First_Name
Last_Name
Author
First_Name
Last_Name


Berenholtz
Harris
Brockmann


Berenholtz
Harris
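In part c, `find()` returns only the *first* element that matches the path. That works for Beijing, which has a single author. For a residence shared by several authors, though, the other matches would be silently dropped. The sketch below uses the standard library's `xml.etree.ElementTree` and a trimmed copy of the Bookstore data. ElementTree's limited XPath subset supports the same `[@attr='value']` predicate, and lxml's `find`/`findall` behave the same way for paths like these. The example contrasts `find()` with `findall()` and shows that `find()` returns `None` when nothing matches.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Bookstore data (only the parts this path touches)
xml_text = """<Bookstore>
  <Book><Authors>
    <Author Residence="New York City"><Last_Name>Berenholtz</Last_Name></Author>
  </Authors></Book>
  <Book><Authors>
    <Author Residence="Beijing"><Last_Name>Harris</Last_Name></Author>
    <Author Residence="New York City"><Last_Name>Brockmann</Last_Name></Author>
  </Authors></Book>
</Bookstore>"""

root = ET.fromstring(xml_text)
path = "Book/Authors/Author[@Residence='New York City']/Last_Name"

# find() stops at the first match in document order
first = root.find(path).text
# findall() returns every matching element as a list
everyone = [el.text for el in root.findall(path)]

print(first)      # Berenholtz
print(everyone)   # ['Berenholtz', 'Brockmann']

# find() returns None when nothing matches, so check before reading .text
missing = root.find("Book/Authors/Author[@Residence='Paris']/Last_Name")
print(missing is None)  # True
```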


### Exercise: Filter out Strings
Create a function that takes a list of non-negative integers and strings and returns a new list without the strings. Call the function filter_list().

**Requirements**
* Zero is a non-negative integer.
* The given list only has integers and strings.
* The original order must be maintained.

In [21]:
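def filter_list(lst):
    nums = []
    for item in lst:
        # keep integers only (0 included); strings are skipped
        if type(item) == int:
            nums.append(item)
    # items were appended in their original order
    return nums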

In [25]:
## Tests
f1=filter_list([1, 2, "a", "b"]) # [1, 2]
print(f1)

f2=filter_list([1, "a", "b", 0, 15]) # [1, 0, 15]
print(f2)

f3=filter_list([1, 2, "aasf", "1", "123", 123]) # [1, 2, 123]
print(f3)

[1, 2]
[1, 0, 15]
[1, 2, 123]
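The same filter can be written as a single list comprehension, which also keeps the original order. The function name `filter_list_compact` below is hypothetical and chosen only to avoid clashing with `filter_list`. The sketch also shows why an exact type check is used instead of `isinstance()`: `bool` is a subclass of `int`, so `isinstance(True, int)` is `True`. The exercise guarantees only integers and strings, so this matters only if booleans ever appear in the input.

```python
def filter_list_compact(lst):
    # Keep only items whose exact type is int; comprehension preserves order
    return [item for item in lst if type(item) is int]

print(filter_list_compact([1, "a", "b", 0, 15]))   # [1, 0, 15]

# bool is a subclass of int, so isinstance() would let True/False through
print(isinstance(True, int))                        # True
print(filter_list_compact([True, 2, "x"]))          # [2]
```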


### Exercise: Palindrome
Write a function called palindrome() that takes a string and returns True if the string is a palindrome and False otherwise. (A palindrome is a string that reads the same forwards and backwards.)

hint: https://www.w3schools.com/python/ref_func_reversed.asp

In [26]:
#CODE
def palindrome(string):
    return string==string[::-1]

# def palindrome(word):
#   rword = reversed(word)
#   rword = "".join(rword)
#   if word == rword:
#     return True
#   else:
#     return False

In [32]:
# Tests

test1='dog' #output: False
print(palindrome(test1))

test2='dad' #output: True
print(palindrome(test2))

test3='saippuakivikauppias' #output: True
print(palindrome(test3))

test4="it's pretty much my favorite animal" #output: False
print(palindrome(test4))

False
True
True
False
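The solution above compares every character exactly, so it is case-sensitive and counts spaces and punctuation. For example, `palindrome('Dad')` returns False. If palindromes should instead be checked while ignoring case and non-alphanumeric characters, one sketch is shown below. That behavior is an assumption and not part of the original requirements. The function name `palindrome_normalized` is hypothetical.

```python
def palindrome_normalized(text):
    # Keep only letters and digits, lower-cased, so spaces, punctuation
    # and capitalisation do not affect the comparison
    cleaned = "".join(ch.lower() for ch in text if ch.isalnum())
    return cleaned == cleaned[::-1]

print(palindrome_normalized("Dad"))                             # True
print(palindrome_normalized("A man, a plan, a canal: Panama"))  # True
print(palindrome_normalized("dog"))                             # False
```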
