## Task: Read a txt file in Python

There are many ways to read and write files. Instead of importing a library (e.g. Pandas), we will use Python's built-in ***open*** function to get a file object.

The syntax to open a file object in Python is: <br>
`file_object = open("filename", "mode")` <br>
***file_object*** is the variable that holds the returned file object, and ***mode*** tells Python how the file will be used:

- `'r'` – Read mode (the default; the file must already exist)
- `'w'` – Write mode (creates the file, or erases its existing contents before writing)
- `'a'` – Append mode (adds new data to the end of the file)
- `'r+'` – Read and write mode (reads and writes an existing file without erasing it first)
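
To make the difference between the modes concrete, here is a minimal sketch that writes, appends to, and then reads a throwaway file. The file name `demo_sites.txt` and the temporary directory are assumptions for illustration only; they are not part of this notebook's data.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_sites.txt")  # hypothetical file name

    # 'w' creates the file, or empties it first if it already exists.
    with open(path, "w") as f:
        f.write("abc.jpl.nasa.gov\n")

    # 'a' keeps the existing content and adds to the end.
    with open(path, "a") as f:
        f.write("abclab.jpl.nasa.gov\n")

    # 'r' is read-only; the with block closes the file when it ends.
    with open(path, "r") as f:
        contents = f.read()

print(repr(contents))
```

Opening the file inside a `with` block means it is closed automatically, even if an error occurs partway through.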

In [28]:
external_sites = open('../data/external_sites.txt', 'r')

In [29]:
pwd

'/home/jovyan/work/4.cybersec_attacks/ipynb'

In [30]:
external_sites

<_io.TextIOWrapper name='../data/external_sites.txt' mode='r' encoding='UTF-8'>

In [31]:
external_sites.read()

'abc.jpl.nasa.gov\nabclab.jpl.nasa.gov\nacce-ops.jpl.nasa.gov\nacce.jpl.nasa.gov\naccellion.jpl.nasa.gov\nacquisition.jpl.nasa.gov\nacquisitions.jpl.nasa.gov\naeolus.jpl.nasa.gov\naggregate.jpl.nasa.gov\nai.jpl.nasa.gov\nairbornescience.jpl.nasa.gov\nairflip.jpl.nasa.gov\nairsar-t.jpl.nasa.gov\nairsar.jpl.nasa.gov\nairsnrt.jpl.nasa.gov\nairsteam.jpl.nasa.gov\nairswot.jpl.nasa.gov\naitest.jpl.nasa.gov\naiweb.jpl.nasa.gov\nakana-ext.jpl.nasa.gov\nakana-intext.jpl.nasa.gov\nalhat.jpl.nasa.gov\namfs1-webdav.jpl.nasa.gov\namfs1.jpl.nasa.gov\nammos.jpl.nasa.gov\nanalogs.jpl.nasa.gov\nao.jpl.nasa.gov\nappell.jpl.nasa.gov\napps-ldt.jpl.nasa.gov\napps-sishub.jpl.nasa.gov\narcadia.jpl.nasa.gov\nargo.jpl.nasa.gov\naria-csk-dav.jpl.nasa.gov\naria-dap.jpl.nasa.gov\naria-dav-pub.jpl.nasa.gov\naria-dav.jpl.nasa.gov\naria-puccini.jpl.nasa.gov\naria-qa.jpl.nasa.gov\naria-search-gamma.jpl.nasa.gov\naria-search.jpl.nasa.gov\naria-share.jpl.nasa.gov\naria-timeseries.jpl.nasa.gov\naria-wiki.jpl.nasa.gov\na

In [34]:
external_sites = open('../data/external_sites.txt', 'r')

In [35]:
external_sites_list = external_sites.read().splitlines()

In [36]:
external_sites_list

['abc.jpl.nasa.gov',
 'abclab.jpl.nasa.gov',
 'acce-ops.jpl.nasa.gov',
 'acce.jpl.nasa.gov',
 'accellion.jpl.nasa.gov',
 'acquisition.jpl.nasa.gov',
 'acquisitions.jpl.nasa.gov',
 'aeolus.jpl.nasa.gov',
 'aggregate.jpl.nasa.gov',
 'ai.jpl.nasa.gov',
 'airbornescience.jpl.nasa.gov',
 'airflip.jpl.nasa.gov',
 'airsar-t.jpl.nasa.gov',
 'airsar.jpl.nasa.gov',
 'airsnrt.jpl.nasa.gov',
 'airsteam.jpl.nasa.gov',
 'airswot.jpl.nasa.gov',
 'aitest.jpl.nasa.gov',
 'aiweb.jpl.nasa.gov',
 'akana-ext.jpl.nasa.gov',
 'akana-intext.jpl.nasa.gov',
 'alhat.jpl.nasa.gov',
 'amfs1-webdav.jpl.nasa.gov',
 'amfs1.jpl.nasa.gov',
 'ammos.jpl.nasa.gov',
 'analogs.jpl.nasa.gov',
 'ao.jpl.nasa.gov',
 'appell.jpl.nasa.gov',
 'apps-ldt.jpl.nasa.gov',
 'apps-sishub.jpl.nasa.gov',
 'arcadia.jpl.nasa.gov',
 'argo.jpl.nasa.gov',
 'aria-csk-dav.jpl.nasa.gov',
 'aria-dap.jpl.nasa.gov',
 'aria-dav-pub.jpl.nasa.gov',
 'aria-dav.jpl.nasa.gov',
 'aria-puccini.jpl.nasa.gov',
 'aria-qa.jpl.nasa.gov',
 'aria-search-gamma.jpl.n

In [37]:
external_sites_list[1]

'abclab.jpl.nasa.gov'
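
The read-and-split steps above can also be wrapped in one small function that closes the file for us. This is only a sketch: `read_hostnames` is a hypothetical helper name, and the sample file is a stand-in created on the fly rather than the real `external_sites.txt`.

```python
import os
import tempfile


def read_hostnames(path):
    """Return the non-empty lines of a text file, without newline characters."""
    with open(path, "r", encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Usage with a small stand-in file (two hostnames and a trailing blank line).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample_sites.txt")
    with open(sample_path, "w", encoding="utf-8") as f:
        f.write("abc.jpl.nasa.gov\nabclab.jpl.nasa.gov\n\n")
    sample_sites = read_hostnames(sample_path)

print(sample_sites[1])
```

Skipping blank lines here avoids ending up with empty strings in the list, which would later turn into invalid URLs.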

## Task: Use `requests.get()` to call on all URLs to get the response code

First, let's try `requests.get()` on a single URL.

In [39]:
import requests

In [40]:
r = requests.get('abclab.jpl.nasa.gov')

MissingSchema: Invalid URL 'abclab.jpl.nasa.gov': No schema supplied. Perhaps you meant http://abclab.jpl.nasa.gov?

In [None]:
url_list = []
for row in external_sites_list:
    url_list.append("https://" + row)
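
The loop above assumes every row is a bare hostname. As a slightly more defensive sketch, the helper below (the name `with_scheme` is made up for this example) leaves entries alone if they already contain a scheme, and a list comprehension builds the list in one line.

```python
def with_scheme(host, scheme="https"):
    """Prefix a bare hostname with a scheme; leave full URLs unchanged."""
    if "://" in host:
        return host
    return f"{scheme}://{host}"


# One bare hostname and one entry that already has a scheme.
hosts = ["abc.jpl.nasa.gov", "https://abclab.jpl.nasa.gov"]
urls = [with_scheme(h) for h in hosts]
print(urls)
```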

In [93]:
url_list

['https://abc.jpl.nasa.gov',
 'https://abclab.jpl.nasa.gov',
 'https://acce-ops.jpl.nasa.gov',
 'https://acce.jpl.nasa.gov',
 'https://accellion.jpl.nasa.gov',
 'https://acquisition.jpl.nasa.gov',
 'https://acquisitions.jpl.nasa.gov',
 'https://aeolus.jpl.nasa.gov',
 'https://aggregate.jpl.nasa.gov',
 'https://ai.jpl.nasa.gov',
 'https://airbornescience.jpl.nasa.gov',
 'https://airflip.jpl.nasa.gov',
 'https://airsar-t.jpl.nasa.gov',
 'https://airsar.jpl.nasa.gov',
 'https://airsnrt.jpl.nasa.gov',
 'https://airsteam.jpl.nasa.gov',
 'https://airswot.jpl.nasa.gov',
 'https://aitest.jpl.nasa.gov',
 'https://aiweb.jpl.nasa.gov',
 'https://akana-ext.jpl.nasa.gov',
 'https://akana-intext.jpl.nasa.gov',
 'https://alhat.jpl.nasa.gov',
 'https://amfs1-webdav.jpl.nasa.gov',
 'https://amfs1.jpl.nasa.gov',
 'https://ammos.jpl.nasa.gov',
 'https://analogs.jpl.nasa.gov',
 'https://ao.jpl.nasa.gov',
 'https://appell.jpl.nasa.gov',
 'https://apps-ldt.jpl.nasa.gov',
 'https://apps-sishub.jpl.nasa.gov',

In [94]:
r = requests.get('https://abclab.jpl.nasa.gov')

In [95]:
r.status_code

200

Awesome! Now let's put it all together using a for-loop. We want to have ***one*** chunk of code that does all the work for us.

In [97]:
url_response_dict = {}
for row in url_list:
    r = requests.get(row)
    url_response_dict[row] = r

ConnectionError: HTTPSConnectionPool(host='ai.jpl.nasa.gov', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f63fd17be80>: Failed to establish a new connection: [Errno 110] Connection timed out',))

In [98]:
url_response_dict

{'https://abc.jpl.nasa.gov': <Response [200]>,
 'https://abclab.jpl.nasa.gov': <Response [200]>,
 'https://acce-ops.jpl.nasa.gov': <Response [200]>,
 'https://acce.jpl.nasa.gov': <Response [200]>,
 'https://accellion.jpl.nasa.gov': <Response [200]>,
 'https://acquisition.jpl.nasa.gov': <Response [200]>,
 'https://acquisitions.jpl.nasa.gov': <Response [200]>,
 'https://aeolus.jpl.nasa.gov': <Response [200]>,
 'https://aggregate.jpl.nasa.gov': <Response [200]>}

In [99]:
url_response_dict = {}
for row in url_list:
    r = requests.get(row)
    url_response_dict[row] = r
    

ConnectionError: HTTPSConnectionPool(host='ai.jpl.nasa.gov', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f63fd15b588>: Failed to establish a new connection: [Errno 110] Connection timed out',))

In [104]:
import time

In [105]:
url_response_dict = {}
for row in url_list:
    r = requests.get(row)
    url_response_dict[row] = r
    time.sleep(1)

ConnectionError: HTTPSConnectionPool(host='ai.jpl.nasa.gov', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f63fd047390>: Failed to establish a new connection: [Errno 110] Connection timed out',))

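**Connection timed out:** The client tried to open a TCP connection, but the remote host did not answer within the time limit, so the HTTPS request was never actually sent. Common causes are a firewall silently dropping the packets, a host that is offline or not reachable from the current network, network congestion, or heavy load on the remote (or even local) side. Pausing between requests does not help here, because the problem is with one specific host rather than with how fast we send requests.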


**Troubleshooting:**
* Test connection: `telnet ai.jpl.nasa.gov 443`
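
If `telnet` is not installed, roughly the same check can be sketched in Python with the standard `socket` module. The helper name `can_connect` and the 5-second default are assumptions for illustration, not something the notebook itself uses.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False


# Example call (needs network access, so it is left commented out):
# can_connect("ai.jpl.nasa.gov", 443)
```

A `False` result only says that the TCP connection could not be opened from this machine; it does not say why.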


In [107]:
r = requests.get('https://ai.jpl.nasa.gov')

ConnectionError: HTTPSConnectionPool(host='ai.jpl.nasa.gov', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f63fd9dcac8>: Failed to establish a new connection: [Errno 110] Connection timed out',))

In [109]:
url_list.remove('https://ai.jpl.nasa.gov')

In [110]:
url_response_dict = {}
for row in url_list:
    r = requests.get(row)
    url_response_dict[row] = r

ConnectionError: HTTPSConnectionPool(host='aitest.jpl.nasa.gov', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f63fbbbb390>: Failed to establish a new connection: [Errno 110] Connection timed out',))

Looks like there are going to be a lot of these errors. We'll need ***exception handling*** to deal with them. An exception is an error raised while a program is running. If nothing handles it, the script stops with a traceback; if we catch it with `try`/`except`, the loop can note the failure and move on to the next URL.
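
As a minimal sketch of the idea, the function below catches the errors that `requests` raises and returns `None` instead of crashing. The name `fetch_status`, the 5-second `timeout`, and the `get` parameter (included only so the function can be tried without a network) are assumptions for this example.

```python
import requests


def fetch_status(url, get=requests.get, timeout=5):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        response = get(url, timeout=timeout)
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class for the errors requests raises,
        # including ConnectionError and Timeout.
        print(f"Error for {url}: {type(exc).__name__}")
        return None
    return response.status_code
```

Passing `timeout` makes an unreachable host fail after a few seconds instead of waiting for the operating system's much longer default. Note that this sketch returns the status code rather than the whole response object.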

In [None]:
url_list.append('https://ai.jpl.nasa.gov')

In [117]:
url_response_dict = {}
for row in url_list:
    try:
        r = requests.get(row)
        url_response_dict[row] = r
        print(row, url_response_dict[row])
    except requests.exceptions.RequestException:
        print("Error for " + row)

https://abc.jpl.nasa.gov <Response [200]>
https://abclab.jpl.nasa.gov <Response [200]>
https://acce-ops.jpl.nasa.gov <Response [200]>
https://acce.jpl.nasa.gov <Response [200]>
https://accellion.jpl.nasa.gov <Response [200]>
https://acquisition.jpl.nasa.gov <Response [200]>
https://acquisitions.jpl.nasa.gov <Response [200]>
https://aeolus.jpl.nasa.gov <Response [200]>
https://aggregate.jpl.nasa.gov <Response [200]>
https://airbornescience.jpl.nasa.gov <Response [200]>
https://airflip.jpl.nasa.gov <Response [200]>
https://airsar-t.jpl.nasa.gov <Response [500]>
https://airsar.jpl.nasa.gov <Response [200]>
https://airsnrt.jpl.nasa.gov <Response [200]>
https://airsteam.jpl.nasa.gov <Response [200]>
https://airswot.jpl.nasa.gov <Response [403]>
Error for https://aiweb.jpl.nasa.gov
https://akana-ext.jpl.nasa.gov <Response [403]>
https://akana-intext.jpl.nasa.gov <Response [403]>
https://alhat.jpl.nasa.gov <Response [200]>
https://amfs1-webdav.jpl.nasa.gov <Response [401]>
https://amfs1.jpl.n



https://arcadia.jpl.nasa.gov <Response [200]>
https://argo.jpl.nasa.gov <Response [200]>
https://aria-csk-dav.jpl.nasa.gov <Response [200]>
https://aria-dap.jpl.nasa.gov <Response [200]>
https://aria-dav-pub.jpl.nasa.gov <Response [200]>
https://aria-dav.jpl.nasa.gov <Response [200]>
https://aria-puccini.jpl.nasa.gov <Response [503]>
https://aria-qa.jpl.nasa.gov <Response [200]>
https://aria-search-gamma.jpl.nasa.gov <Response [200]>
https://aria-search.jpl.nasa.gov <Response [503]>
https://aria-share.jpl.nasa.gov <Response [200]>
https://aria-timeseries.jpl.nasa.gov <Response [200]>
https://aria-wiki.jpl.nasa.gov <Response [503]>
https://aria.jpl.nasa.gov <Response [200]>
https://aria1-dav.jpl.nasa.gov <Response [401]>
https://aria1.jpl.nasa.gov <Response [200]>
https://aria2-dav.jpl.nasa.gov <Response [401]>
Error for https://asc.jpl.nasa.gov
Error for https://ase.jpl.nasa.gov
https://aso-log.jpl.nasa.gov <Response [503]>
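
With this many results, a quick tally by status code is easier to read than the raw list. The sketch below uses a handful of URL and code pairs copied by hand from the output above as sample data; with the real dictionary, the codes would come from each response's `status_code` attribute.

```python
from collections import Counter
from http import HTTPStatus

# A few URL -> status code pairs copied by hand from the output above.
sample_codes = {
    "https://abc.jpl.nasa.gov": 200,
    "https://airsar-t.jpl.nasa.gov": 500,
    "https://airswot.jpl.nasa.gov": 403,
    "https://akana-ext.jpl.nasa.gov": 403,
    "https://amfs1-webdav.jpl.nasa.gov": 401,
    "https://aria-puccini.jpl.nasa.gov": 503,
}

# Count how many URLs returned each code, then print code, meaning, and count.
counts = Counter(sample_codes.values())
for code, n in sorted(counts.items()):
    print(code, HTTPStatus(code).phrase, n)
```

`HTTPStatus` comes from the standard library and maps a numeric code to its standard phrase, for example 403 to "Forbidden".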


## Task: Export dictionary into txt file

In [119]:
with open('../data/url_response.txt', 'w') as url_response:
    url_response.write('dict = ' + repr(url_response_dict) + '\n')
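
Writing `repr()` of the dictionary saves text such as `<Response [200]>`, which cannot be loaded back into Python later. One alternative, sketched here under the assumption that only the status codes are needed, is to store a URL to status code mapping as JSON. The helper names, the sample data, and the `url_response.json` file name are all made up for this example.

```python
import json
import os
import tempfile


def save_status_codes(status_by_url, path):
    """Write a URL -> status code mapping to path as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(status_by_url, f, indent=2)


def load_status_codes(path):
    """Read the mapping back from a JSON file."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Round trip with sample data in a throwaway directory.
sample = {"https://abc.jpl.nasa.gov": 200, "https://airswot.jpl.nasa.gov": 403}
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "url_response.json")
    save_status_codes(sample, out_path)
    reloaded = load_status_codes(out_path)

print(reloaded == sample)
```

With the real results, the mapping could be built first with something like `{url: r.status_code for url, r in url_response_dict.items()}`.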