# Examples of usage

In this section we'll show several examples of usage of the `byteparsing` package.

For now, we'll import all the parsers.

In [1]:
from byteparsing.parsers import *

## Simple email address parser

An email address typically contains two pieces of information:

- Username
- Host

This information is easy to parse with the naked eye:

```text
[username]@[host]
```

A parser, of course, has no eyes. 
Nor common sense. 
So we'll need to use some explicit instructions.
What about the following?

0. Keep in mind that not all chars are valid for an email.
1. The first email-valid chars constitute the `user` field. It should contain at least one char.
2. After the `user` field we expect an "@". We check that it is there, and we ignore it.
3. The next email-valid chars after the "@" correspond to the `host` field. It should contain at least one char.

In the example below, you can see the implementation of this algorithm.

In [2]:
# First, we define which characters are acceptable in an email (email-valid chars)
email_char = choice(ascii_alpha_num, ascii_underscore, text_literal("."), text_literal("-"))

# We abstract the information contained in an email as:
# [username]@[host]
email = named_sequence(
            user=some_char(email_char), # Step 1
            _1=text_literal("@"), # Step 2
            host=some_char(email_char) # Step 3
        )

# Notice that we ignore the "@" by assigning it to the field "_1".
# Why not use just "_"? Because we need these fields to be unique.
# If there is more than one ignored value, we recommend using
# _1, _2, and so on for the ignored fields.

Let's apply it to a made-up email address and see if it works:

In [3]:
parsed = parse_bytes(email, b'p.rodriguez-sanchez@esciencecenter.nl')

print(parsed)

{'user': b'p.rodriguez-sanchez', 'host': b'esciencecenter.nl'}


Notice that we used the `parse_bytes` function to actually apply the parser.
We'll use this function very often, so it is worth pausing for a moment to look at its structure.
Typically, `parse_bytes` takes two arguments:

1. A parser, indicating the kind of data we expect.
2. The data itself.

The output will be the parsed data.
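
To make the relation between parser, data and result more concrete, here is the same four-step recipe written by hand in plain Python. This is only a sketch for comparison and not how `byteparsing` works internally. The names `EMAIL_CHARS`, `take_email_chars` and `parse_email_by_hand` are hypothetical, and rejecting leftover bytes at the end is a choice made for this sketch.

```python
import string

# Assumption: the same character set as `email_char` above
# (ASCII letters, digits, underscore, dot and hyphen), written as bytes.
EMAIL_CHARS = (string.ascii_letters + string.digits + "_.-").encode()


def take_email_chars(data: bytes, pos: int) -> int:
    """Return the index just past the run of email-valid bytes starting at pos."""
    end = pos
    while end < len(data) and data[end] in EMAIL_CHARS:
        end += 1
    if end == pos:
        raise ValueError(f"expected at least one email character at offset {pos}")
    return end


def parse_email_by_hand(data: bytes) -> dict:
    """Hypothetical helper: follow steps 1 to 3 on a bytes object."""
    user_end = take_email_chars(data, 0)            # Step 1
    if data[user_end:user_end + 1] != b"@":         # Step 2
        raise ValueError(f"expected '@' at offset {user_end}")
    host_start = user_end + 1
    host_end = take_email_chars(data, host_start)   # Step 3
    if host_end != len(data):                       # Sketch-only choice: no leftovers
        raise ValueError(f"unexpected byte at offset {host_end}")
    return {"user": data[:user_end], "host": data[host_start:host_end]}


result = parse_email_by_hand(b"p.rodriguez-sanchez@esciencecenter.nl")
print(result)
```

The hand-written version has to track offsets and error cases itself, which is the bookkeeping that combining small parsers is meant to hide.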

## More advanced email address parsers

### More detailed fields

The information contained in an email address can be further dissected.
For instance, the `host` information can be split into a `server` name and a `country` code.
That is:

[username]@[server].[country]

We can create a more detailed parser that splits strings wherever it finds a dot.

To do this, we first redefine the set of acceptable email chars so that it no longer includes the dot.

In [4]:
email_char = choice(ascii_alpha_num, ascii_underscore, text_literal("-"))

Now, we can use the parser below to dissect email components.

In [5]:
email_component = sep_by(some_char(email_char, lambda b: b.decode()), text_literal("."))
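
If the idea behind `sep_by` is unfamiliar, the toy version below may help. It is a sketch of the general "item, then any number of (separator, item) pairs" pattern, not the `byteparsing` implementation. Everything in it (`word`, `literal`, `toy_sep_by`, and the convention that a parser is a function from data and position to a value and a new position) is an assumption made for illustration, including the requirement of at least one item.

```python
from typing import Callable, List, Tuple

# A toy parser is a function: (data, position) -> (value, new position).
# It raises ValueError when it cannot match.
Parser = Callable[[bytes, int], Tuple[object, int]]

WORD_CHARS = b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-"


def word(data: bytes, pos: int) -> Tuple[str, int]:
    """Match one or more word characters and decode them to str."""
    end = pos
    while end < len(data) and data[end] in WORD_CHARS:
        end += 1
    if end == pos:
        raise ValueError(f"expected a word at offset {pos}")
    return data[pos:end].decode(), end


def literal(text: bytes) -> Parser:
    """Build a parser that matches exactly `text`."""
    def parse(data: bytes, pos: int) -> Tuple[bytes, int]:
        if not data.startswith(text, pos):
            raise ValueError(f"expected {text!r} at offset {pos}")
        return text, pos + len(text)
    return parse


def toy_sep_by(item: Parser, sep: Parser) -> Parser:
    """Match `item (sep item)*` and collect the items, dropping the separators."""
    def parse(data: bytes, pos: int) -> Tuple[List[object], int]:
        value, pos = item(data, pos)
        values = [value]
        while True:
            try:
                _, after_sep = sep(data, pos)
                value, after_item = item(data, after_sep)
            except ValueError:
                # No further (separator, item) pair: stop before the separator
                return values, pos
            values.append(value)
            pos = after_item
    return parse


component = toy_sep_by(word, literal(b"."))
print(component(b"esciencecenter.nl", 0))
```

The returned position tells the caller where to continue, which is what lets such a parser be followed by another one, for example the "@" literal.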

Let's build the improved parser.

In [6]:
better_email = named_sequence(
                user=email_component,
                _1=text_literal("@"),
                host=email_component
                )

And try it:

In [7]:
my_email = parse_bytes(better_email,
            b"pablo.rodriguez-sanchez@esciencecenter.nl")

The output is a dictionary containing the dissected parts of the email.

In [8]:
print(my_email)

{'user': ['pablo', 'rodriguez-sanchez'], 'host': ['esciencecenter', 'nl']}


#### Pro tip: construct a data class

We can use the dictionary to create an instance of a data class.
As we will see, this lets us attach properties and methods to the parsed result.

First, we create a data class representing an email address.

In [9]:
from dataclasses import dataclass
from typing import List

@dataclass
class Email:
    user: List[str]
    host: List[str]
        
    @property
    def country(self):
        """Return the country code"""
        return self.host[-1]

    def __str__(self):
        """Return the email in a human-readable form"""
        return ".".join(self.user) + "@" + ".".join(self.host)

The `construct` function pipes the parser's output directly into the class constructor.

In [10]:
even_better_email = named_sequence(
                        user=email_component,
                        _1=text_literal("@"),
                        host=email_component
                    ) >> construct(Email)

Let's try it:

In [11]:
my_email = parse_bytes(even_better_email,
            b"pablo.rodriguez-sanchez@esciencecenter.nl")

str(my_email)

'pablo.rodriguez-sanchez@esciencecenter.nl'

The output is an instance of the class `Email`.

In [12]:
my_email

Email(user=['pablo', 'rodriguez-sanchez'], host=['esciencecenter', 'nl'])
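
Judging by this output, the effect is comparable to unpacking the parsed dictionary into the constructor as keyword arguments. The sketch below shows that plain-Python equivalent. It is an assumption about the observable behaviour, not a description of how `construct` is implemented, and the `fields` dictionary is simply copied from the earlier output.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Email:
    user: List[str]
    host: List[str]


# The dictionary printed earlier by the dictionary-returning parser
fields = {"user": ["pablo", "rodriguez-sanchez"], "host": ["esciencecenter", "nl"]}

# Unpack the dictionary into keyword arguments of the constructor
my_email = Email(**fields)
print(my_email)
```

This also suggests why the "@" must not show up in the result: a key such as `_1` would not match any field of `Email`, and the constructor call would fail.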

We can of course use the class's methods and properties:

In [13]:
str(my_email)

'pablo.rodriguez-sanchez@esciencecenter.nl'

In [14]:
my_email.country

'nl'

### Parse a list of emails

Imagine now we want to parse several email addresses from a file containing the information below.
Notice that each email address is separated by an end-of-line char.

In [15]:
data = b"j.hidding@esciencecenter.nl\np.rodriguez-sanchez@esciencencenter.nl"

The following parser will be helpful for dealing with end-of-line chars, because they are encoded differently depending on the OS.

In [16]:
eol = choice(text_literal("\n"), text_literal("\r\n"))
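
As a reminder of what the two conventions look like: Unix-like systems end a line with `\n`, while Windows ends it with `\r\n`. The short standard-library sketch below (no `byteparsing` involved, variable names are made up) shows why a parser that only knows about `\n` is not enough.

```python
unix_text = b"first\nsecond"
windows_text = b"first\r\nsecond"

# bytes.splitlines() recognises both conventions
unix_lines = unix_text.splitlines()
windows_lines = windows_text.splitlines()
print(unix_lines == windows_lines)

# A naive split on "\n" leaves a stray "\r" on Windows-style input
naive_lines = windows_text.split(b"\n")
print(naive_lines)
```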

We can create a parser for a list of emails by combining `sep_by` with the email parser and the end-of-line parser:

In [17]:
list_of_emails = sep_by(even_better_email, eol)

Let's try it:

In [18]:
our_emails = parse_bytes(list_of_emails, data)

It returns a list of instances of the class `Email`.

In [19]:
our_emails

[Email(user=['j', 'hidding'], host=['esciencecenter', 'nl']),
 Email(user=['p', 'rodriguez-sanchez'], host=['esciencencenter', 'nl'])]

And once again, we can access the class' methods:

In [20]:
for email in our_emails:
    print(email)
    print(email.country)

j.hidding@esciencecenter.nl
nl
p.rodriguez-sanchez@esciencencenter.nl
nl
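
As a cross-check, the same result can be reproduced with the standard library alone. The sketch below is only meant for comparison: `email_from_line` is a hypothetical helper, it performs no validation of the characters, and it assumes exactly one "@" per line.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Email:
    user: List[str]
    host: List[str]

    @property
    def country(self) -> str:
        """Return the country code"""
        return self.host[-1]


def email_from_line(line: bytes) -> Email:
    """Hypothetical helper: split one address at "@" and then at each dot."""
    user, host = line.decode().split("@")
    return Email(user=user.split("."), host=host.split("."))


data = b"j.hidding@esciencecenter.nl\np.rodriguez-sanchez@esciencencenter.nl"

# Accept both "\n" and "\r\n" as line separators
emails = [email_from_line(line) for line in re.split(rb"\r?\n", data)]
print([email.country for email in emails])
```

Unlike the parser-based version, this shortcut silently accepts invalid characters, so it is better suited to checking results than to replacing the parser.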
