## Prepare URLs
- Load URLs from a `.txt` file
- Assign a unique ID to each URL
- Save the result to a `urls.json` file

### Load URLs from a `.txt` file

We load the URLs from a `.txt` file, assuming that the file contains one URL per line. The URLs are read into a `list`, like this:

```python
['https://www.example.com/page1',
 'https://www.example.com/page2',
 ...]
```

In [None]:
URL_PATH = "urls.txt"

# READ THE URLS
# strip surrounding whitespace (including the newline) and skip empty lines
with open(URL_PATH, "r", encoding="utf-8") as f:
    url_list = [line.strip() for line in f if line.strip()]

# PRINT THE URLS
print(f"Read {len(url_list)} URLs from {URL_PATH}")
print("First 5 URLs:")
for url in url_list[:5]:
    print(url)

### Assign a unique ID to each URL

URLs can be long and follow no fixed format, so we assign a unique ID to each one. This lets us refer to the URLs more concisely later on.

We use a `list` of dictionaries to store the URLs and their IDs. A dictionary is a collection of key-value pairs: a data structure that lets us store and retrieve values by name. Here, each dictionary has two keys, `id` and `url`, so both fields are stored in one object. It looks like this:

```python
[{'id': "abcdef", 'url': 'https://www.example.com/page1'},
 {'id': "abceef", 'url': 'https://www.example.com/page2'},
 ...]
```
If you are familiar with JSON, you can think of a dictionary as a JSON object. To learn more about Python dictionaries, see the [official tutorial](https://docs.python.org/3/tutorial/datastructures.html#dictionaries).
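
To make the key-value idea concrete, here is a minimal sketch of one entry in the format described above. The `entry` values and the `title` key are made up for illustration and are not part of the notebook's data.

```python
import json

# Hypothetical entry, shaped like the ones built in this notebook
entry = {"id": "abcdef", "url": "https://www.example.com/page1"}

# Retrieve a value by its key
print(entry["url"])

# .get() returns a default instead of raising KeyError for a missing key
print(entry.get("title", "no title"))

# Serialized with json.dumps, the dictionary becomes a JSON object
entry_json = json.dumps(entry)
print(entry_json)
```

Accessing a missing key with square brackets raises a `KeyError`, so `.get()` is the safer choice when a field may be absent.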

In [None]:
# MD5 is a hash function; its hex digest is a 32-character string
# for the same input, the output is always the same
import hashlib

#######################
# INITIALIZE THE NEW LIST
#######################

url_with_id_list = []

#######################
# ASSIGN IDS (MD5 HASHES) TO URLS
#######################

for url in url_list:
    # generate an MD5 hash from the url
    md5_hash = hashlib.md5()
    md5_hash.update(url.encode())
    id_str = md5_hash.hexdigest()

    # create the dictionary entry
    url_with_id = {"id": id_str, "url": url}

    # append the dictionary to the list
    url_with_id_list.append(url_with_id)


# PRINT THE URLS
print(f"Assigned IDs to {len(url_with_id_list)} URLs")
print("First 5 URLs:")
for url_with_id in url_with_id_list[:5]:
    print(url_with_id)
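
The property that makes a hash useful as an ID is that it is deterministic. The sketch below checks this on two example URLs; `url_to_id` is a hypothetical helper introduced only for this illustration, wrapping the same `hashlib.md5` call used in the cell above.

```python
import hashlib


def url_to_id(url: str) -> str:
    """Return the MD5 hex digest of a URL (hypothetical helper)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


first = url_to_id("https://www.example.com/page1")
second = url_to_id("https://www.example.com/page1")
other = url_to_id("https://www.example.com/page2")

print(len(first))       # 32: an MD5 hex digest is always 32 characters
print(first == second)  # True: same input, same ID
print(first == other)   # False: these two URLs get different IDs
```

One consequence worth keeping in mind: if the same URL appears twice in the input file, both entries receive the same ID, so the IDs are unique only if the URLs are.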

### Save the result to a `urls.json` file

In [19]:
#######################
# WRITE THE NEW LIST TO A FILE
#######################

import json

# WRITE THE LIST OF URL DICTIONARIES TO A FILE
URL_DICT_PATH = "urls.json"
with open(URL_DICT_PATH, "w") as f:
    json.dump(url_with_id_list, f, indent=2)
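
A later step will likely need to read the file back. The sketch below shows one way to do that round trip with `json.load`. It uses made-up sample data and a temporary directory so that the real `urls.json` is not touched; the names `sample`, `loaded`, and `url_by_id` are assumptions for this example only.

```python
import json
import os
import tempfile

# Hypothetical sample data in the same shape as url_with_id_list
sample = [
    {"id": "abcdef", "url": "https://www.example.com/page1"},
    {"id": "abceef", "url": "https://www.example.com/page2"},
]

# Write to a temporary directory so the real urls.json is left untouched
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "urls.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sample, f, indent=2)

    # Read the file back and rebuild the list of dictionaries
    with open(path, "r", encoding="utf-8") as f:
        loaded = json.load(f)

# The loaded data should match what was written
print(loaded == sample)

# Build a lookup table from ID to URL
url_by_id = {entry["id"]: entry["url"] for entry in loaded}
print(url_by_id["abcdef"])
```

A dictionary keyed by ID, as in `url_by_id`, makes it cheap to look up a URL from its ID later on.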