# 🌐 Domain Name Extractor

This tool extracts the domain names from URLs delimited by new lines and semicolons.  The output keeps the layout of the input: one output line per input line, with semicolons separating the entries on each line.

It is designed to work well with lines of URLs copied from a spreadsheet.  Multiple URLs in the same cell (or on the same line) should be separated by semicolons.  The domain names in the output boxes appear in the same order as the URLs in the input box.
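
The layout rule described above can be sketched in a few lines. This is only an illustration, not the tool's own code: `hostname_of` and `map_cells` are hypothetical helper names, and the standard library's `urllib.parse` stands in for the real extractor so that the example runs without extra packages.

```python
from urllib.parse import urlsplit

def hostname_of(url):
    """Stand-in extractor (assumption): return the host part of a URL, or "" if none is found."""
    url = url.strip()
    if not url:
        return ""
    # urlsplit only recognises the host when a scheme or "//" is present
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""

def map_cells(text, extractor):
    """Apply extractor to every URL while keeping the line/semicolon layout."""
    return "\n".join(
        ";".join(extractor(url) for url in line.split(";"))
        for line in text.split("\n")
    )

sample = "https://www.example.com/a;http://blog.example.org/b\nhttps://shop.example.net/c"
print(map_cells(sample, hostname_of))
```

Because each line and each semicolon-separated entry is mapped one to one, the result can be pasted back into the spreadsheet next to the original column.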

Permanent link: [https://dm-gadgets.jasontc.net/domain_name_extractor/](https://dm-gadgets.jasontc.net/domain_name_extractor/)

## Input
The box on the left is for the URLs. 

## Output
The box in the middle shows the top-level (registered) domain name of each URL, such as `example.co.uk`.  The box on the right shows the full domain name, including any sub-domains, such as `www.example.co.uk`.
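
To make the difference between the two outputs concrete, here is a toy sketch. It is not how the tool works internally (the code cell relies on `tldextract`); `KNOWN_SUFFIXES` and `split_host` are hypothetical names, and the hard-coded suffix set is an assumption standing in for the much larger Public Suffix List.

```python
# Hypothetical, hand-picked suffixes for illustration only;
# the real Public Suffix List is far larger.
KNOWN_SUFFIXES = {"com", "org", "co.uk"}

def split_host(host):
    """Return (registered_domain, full_domain), or ("", "") if no known suffix matches."""
    labels = host.lower().strip(".").split(".")
    # Try the longest candidate suffix first so "co.uk" is matched as a whole
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in KNOWN_SUFFIXES:
            # The registered domain is the suffix plus the one label before it
            registered = ".".join(labels[i - 1:])
            return registered, ".".join(labels)
    return "", ""

for host in ("www.example.com", "forums.example.co.uk", "localhost"):
    print(host, "->", split_host(host))
```

A host without a recognised suffix, such as `localhost`, yields empty strings in both positions, which mirrors the empty entries the tool leaves for input it cannot resolve.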

## Usage
Copy the cells (or the whole column) containing the URLs and paste them into the input box.  Then press the "extract 🡒" button to the right of the input box.


#@title Run the extractor
#@markdown **Important:** Press the **▶** button once to initialise the tool before using it.

!pip3 install tldextract

import tldextract
import ipywidgets as widgets
from IPython.display import display, clear_output

clear_output()

# Extract the registered and full domain names from the input text,
# keeping the same line and semicolon layout as the input
def extract_domain_names(input_urls):
  lines = input_urls.split("\n")
  new_tldn_lines = []
  new_fdn_lines = []

  for line in lines:
    urls = line.split(";")
    toplevel_domains = []
    full_domains = []
    for url in urls:
      extract = tldextract.extract(url.strip())
      toplevel_domains.append(extract.registered_domain)
      if extract.registered_domain:
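        # Join the named fields explicitly rather than iterating over the result object
        parts = (extract.subdomain, extract.domain, extract.suffix)
        full_domains.append(".".join(part for part in parts if part))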
      else:
        full_domains.append("")
    new_tldn_lines.append(";".join(toplevel_domains))
    new_fdn_lines.append(";".join(full_domains))

  return {
      "toplevel_domains": "\n".join(new_tldn_lines),
      "full_domains": "\n".join(new_fdn_lines),
  }

textarea_input_urls = widgets.Textarea(
    value='',
    placeholder='Input',
    description='URLs',
    disabled=False,
    layout={"height":"400px"}
    )

button_extract = widgets.Button(
    description="extract 🡒",
    # "vertical-align" is not a Layout property; align_self centres the button vertically in the HBox
    layout={"align_self": "center"}
    )

textarea_output_toplevel = widgets.Textarea(
    value='',
    placeholder='Output',
    description='Top level',
    disabled=False,
    layout={"height":"400px"}
    )

textarea_output_full = widgets.Textarea(
    value='',
    placeholder='Output',
    description='Full',
    disabled=False,
    layout={"height":"400px"}
    )

def on_extract_button_clicked(b):
  extracted = extract_domain_names(textarea_input_urls.value)
  textarea_output_toplevel.value = extracted["toplevel_domains"]
  textarea_output_full.value = extracted["full_domains"]

button_extract.on_click(on_extract_button_clicked)

widgets.HBox([textarea_input_urls, button_extract, textarea_output_toplevel, textarea_output_full])


---

### ⚠️ Warranties and liabilities

This tool is provided without warranty of any kind. The developer is not liable for any loss or damage arising from your use of it.