# URLs

In this lesson, you will learn to craft regular expressions to match URLs in Python.

URL is short for *Uniform Resource Locator*. A URL is simply a web address: a string that identifies the location of a resource on the Web.

## Pattern

A URL is made up of several parts. Take the following URL as an example:

```
https://www.example.com/path
```

Here,

* `https` is the **scheme/protocol**
* `www.example.com` is the **domain**
* `path` is the **path**

Some URLs contain additional components, but the ones above will be our focus in this lesson. I encourage you to research these extra parts and modify your regex code to accommodate them.
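
To get a feel for which extra components a URL can carry, one option is to split a URL with `urlsplit` from Python's standard `urllib.parse` module. This is not the regex approach we build in this lesson, only a quick way to inspect the parts. The URL below, with its port, query string, and fragment, is a made-up example.

```python
from urllib.parse import urlsplit

# Hypothetical URL that adds a port, a query string, and a fragment.
parts = urlsplit("https://www.example.com:8080/path?q=regex#top")

print(parts.scheme)    # https
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /path
print(parts.query)     # q=regex
print(parts.fragment)  # top
```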

Let's create a regular expression that matches these three parts: the scheme, the domain, and the path.

```
(https://|http://)?
([a-zA-Z0-9]+\.)
([a-zA-Z0-9]+\.?)+
(/[\w.-]*)*
```
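
To see what each line of the pattern contributes, here is a minimal sketch that compiles the four pieces separately and tries each one on a small input. The variable names (`scheme`, `first_label`, `more_labels`, `path`) are my own labels for illustration.

```python
import re

# Each line of the pattern, compiled on its own (names are illustrative).
scheme = re.compile(r"(https://|http://)?")      # optional scheme
first_label = re.compile(r"([a-zA-Z0-9]+\.)")    # one label plus a dot
more_labels = re.compile(r"([a-zA-Z0-9]+\.?)+")  # one or more further labels
path = re.compile(r"(/[\w.-]*)*")                # zero or more path segments

print(scheme.fullmatch("https://") is not None)          # True
print(scheme.fullmatch("") is not None)                  # True, the scheme is optional
print(first_label.fullmatch("www.") is not None)         # True
print(more_labels.fullmatch("example.com") is not None)  # True
print(path.fullmatch("/library/re.html") is not None)    # True
print(path.fullmatch("") is not None)                    # True, the path is optional
```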

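Converting this into Python code, we compile the pattern with the `re.VERBOSE` flag, which makes the regex engine ignore the whitespace and line breaks inside the pattern: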

In [1]:
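import re

# A raw string keeps backslashes such as \. and \w intact for the regex engine.
pattern = re.compile(r"""
(https://|http://)?   # optional scheme
([a-zA-Z0-9]+\.)      # first domain label and its dot
([a-zA-Z0-9]+\.?)+    # remaining labels, each with an optional dot
(/[\w.-]*)*           # zero or more path segments
""", flags=re.VERBOSE)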

Now let's try matching this pattern against various valid and invalid URLs. We use `fullmatch` so that a URL counts as a match only if the entire string fits the pattern.

In [2]:
# Valid URLs: each call should print a match object.
print(pattern.fullmatch("https://www.example.com/path"))
print(pattern.fullmatch("google.com"))
print(pattern.fullmatch("https://docs.python.org/library/re.html"))
print(pattern.fullmatch("https://example.com/"))

print()
# Invalid URLs: each call should print None.
print(pattern.fullmatch("https://www.example com"))
print(pattern.fullmatch("example."))
print(pattern.fullmatch("http://.example.com"))
print(pattern.fullmatch("https://example..com"))

<re.Match object; span=(0, 28), match='https://www.example.com/path'>
<re.Match object; span=(0, 10), match='google.com'>
<re.Match object; span=(0, 39), match='https://docs.python.org/library/re.html'>
<re.Match object; span=(0, 20), match='https://example.com/'>

None
None
None
None
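
The match objects above tell us that a URL matched, but not which part was which. As a sketch, the pattern can be rewritten with named groups so that each part can be read back by name. The group names `scheme`, `domain`, and `path` and the variable `named_pattern` are my own choices, and the two domain lines are merged into a single group, so treat this as a variant of our pattern rather than the pattern itself.

```python
import re

# Variant of the lesson's pattern with named groups (names are illustrative).
named_pattern = re.compile(r"""
(?P<scheme>https://|http://)?                   # optional scheme
(?P<domain>[a-zA-Z0-9]+\.(?:[a-zA-Z0-9]+\.?)+)  # whole domain in one group
(?P<path>(?:/[\w.-]*)*)                         # whole path in one group
""", flags=re.VERBOSE)

match = named_pattern.fullmatch("https://www.example.com/path")
print(match.group("scheme"))  # https://
print(match.group("domain"))  # www.example.com
print(match.group("path"))    # /path
```

The inner groups use `(?:...)` so that they repeat without capturing. A capturing group that repeats keeps only its last repetition, which is why the domain and path are each wrapped in one outer group here.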


## Summary

That's it for today's lesson on matching URLs with regular expressions.

The pattern we created today matches many common URLs, but I intentionally left out certain URL components, such as ports, query strings, and fragments, as well as edge cases like hyphens in domain names. I strongly encourage you to take some time to extend our pattern so that it fits more unusual yet valid URLs.
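
As a starting point, here is one possible extension that adds an optional port, query string, and fragment. The three extra lines and the name `extended` are my own assumptions about how such an extension might look. They are deliberately simple and do not cover everything a real URL may contain.

```python
import re

# Sketch of an extended pattern; the lines marked "added" are assumptions.
extended = re.compile(r"""
(https://|http://)?   # optional scheme
([a-zA-Z0-9]+\.)      # first domain label and its dot
([a-zA-Z0-9]+\.?)+    # remaining labels
(:[0-9]+)?            # added: optional port, e.g. :8080
(/[\w.-]*)*           # zero or more path segments
(\?[\w=&.%-]*)?       # added: optional query string
(\#[\w.-]*)?          # added: optional fragment (# is escaped in verbose mode)
""", flags=re.VERBOSE)

url = "https://example.com:8080/search?q=regex&page=2#results"
print(extended.fullmatch(url) is not None)           # True
print(extended.fullmatch("google.com") is not None)  # True
print(extended.fullmatch("example.") is not None)    # False
```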

As you may have realized by now, regular expressions are very versatile, but the first step in writing any pattern is understanding what you want to match. Once you know the structure, you can write a regular expression for it.