## 4.1 Importing packages
If you are working with a notebook, it is good practice to import all the packages that you need at the top of the notebook. This will directly show you which packages you need to have installed, and you will avoid random package imports throughout the notebook.

1. Import the three packages you have used so far with their correct renaming conventions.
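
A possible import cell is sketched below. It assumes that the three packages are NumPy, pandas and Matplotlib, which matches the `np`, `pd` and `plt` prefixes used in the rest of this notebook.

```python
# Assumed packages, imported with their conventional aliases
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
```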

## 4.2 Loading a file into a data frame
In practice, you should always try to use an existing function to load your data. If your data has a fixed format, you can load a file directly into a data frame. There are many functions available, such as `pd.read_csv()`, `pd.read_excel()` and `pd.read_hdf()`. For a list of all supported formats, please see the [user guide on IO tools](https://pandas.pydata.org/docs/user_guide/io.html).

Since the passwords are all in separate lines in the file, the most suitable function is `pd.read_table(name)` which just reads data from a generic table in a file called `name`. Since there is no header specified in the file, you have to call the function with the keyword argument `header=None`. If you want to name the column(s) manually, you can pass a list to the keyword argument `names` with the column name(s). Conveniently, the function `pd.read_table()` will automatically remove the newline characters and it will skip empty lines. You therefore do not have to do any preprocessing or filtering (for now).

When you load a lot of data from a file, errors can always happen. If a line has a bad format, an invalid character or any other issue, the line should just be skipped to allow the file to be loaded regardless. You can then check the line manually or use another tool to resolve the error. The corresponding keyword argument to skip bad lines is called `on_bad_lines`, and if you pass the value `"warn"` you will receive a warning for each line that was skipped.
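
The keyword arguments described above can be combined as in the following minimal sketch. To keep it self-contained, it reads from an in-memory text buffer instead of a real file. The sample text and the column name `"password"` are assumptions for illustration, and `dtype=str` is an optional extra that keeps passwords consisting only of digits as text.

```python
import io
import pandas as pd

# Stand-in for a password file: one password per line, including an empty line
raw_text = "123456\npassword\n\nqwerty\n123456\n"

df = pd.read_table(
    io.StringIO(raw_text),  # replace with a file name such as "passwords.txt"
    header=None,            # the data has no header row
    names=["password"],     # hypothetical column name
    dtype=str,              # keep all-digit passwords as text
    on_bad_lines="warn",    # warn about skipped lines instead of failing
)
print(df)
```

The empty line in the sample text does not show up as a row in the resulting data frame.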

The file `passwords.txt` in today's directory contains ~1 million passwords. If you would rather continue working with a smaller sample size, please use the file `day3/passwords.txt`.

1. Use the function `pd.read_table(name, header=None)` to load your passwords into a data frame. If there is an error when loading the file, also use the argument `on_bad_lines="warn"`. What is the name of the password column in the data frame?
2. Use the keyword argument `names` to change the name of the column when loading the file into a data frame.

<!-- Load the passwords again, and prepare the data frame with all the columns from yesterday. The index should contain the **unique** passwords and you should have the counts, the lengths, the digit sums, the number of alphabetic characters and the number of numeric characters in the data frame. -->

## 4.3 Preparing the data frame again
Since you have loaded the raw passwords into a data frame, you can directly use pandas to get the unique passwords and their counts. Single columns of a data frame are called series (here `s`), and they have a method `s.value_counts()` that returns a new series with the unique values as the index and the corresponding counts as the values. In recent pandas versions (2.0 and later) this series is named `"count"`. This replaces the function `np.unique(passwords, return_counts=True)` that you used in the previous notebook.
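
As a small illustration of `value_counts()`, the following sketch uses a hand-written data frame. The sample passwords and the variable name `df_raw` are made up for this example.

```python
import pandas as pd

# Hypothetical raw data with repeated passwords
df_raw = pd.DataFrame(
    {"password": ["123456", "qwerty", "123456", "letmein", "123456"]}
)

passwords = df_raw["password"]                # a single column is a Series
unique_passwords = passwords.value_counts()   # index: unique values, values: counts
df = unique_passwords.to_frame()              # back to a data frame

print(type(passwords))
print(unique_passwords)
```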

1. Get the password column from the data frame and assign it to a variable called `passwords`. Use the function `type()` to make sure that the type of the variable is `pandas.core.series.Series`.
2. Call the method `passwords.value_counts()` to get the unique passwords and their counts and assign the new series to the variable `unique_passwords`. How is this new series sorted?
3. Create a data frame from the new series again by calling the function `pd.DataFrame(unique_passwords)` or by calling the method `unique_passwords.to_frame()`. Assign this data frame to a new variable which you will work with for the rest of the notebook.
4. Calculate the lengths, the digit sums, and the counts of alphabetic, numeric (and special) characters again and assign the results to new columns in the data frame.
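
One way to derive the additional columns from the index is sketched below. The column names `"length"` and `"n_numeric"` are assumptions (only `"digit_sum"` and `"n_alphabetic"` appear later in this notebook), and only the ASCII digits 0 to 9 are treated as numeric characters.

```python
import pandas as pd

DIGITS = "0123456789"  # assumption: only ASCII digits count as numeric

# Hypothetical sample data
passwords = pd.Series(["abc123", "password", "abc123", "2024!"], name="password")
df = passwords.value_counts().to_frame()

# The unique passwords are stored in the index of the data frame
df["length"] = df.index.str.len()
df["digit_sum"] = [sum(int(c) for c in p if c in DIGITS) for p in df.index]
df["n_alphabetic"] = [sum(c.isalpha() for c in p) for p in df.index]
df["n_numeric"] = [sum(c in DIGITS for c in p) for p in df.index]

print(df)
```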

## 4.4 Creating multiple plots in one figure
So far you have only used functions such as `plt.plot()` and `plt.bar()`, which automatically create a figure with a single plot (called an axes in matplotlib). The Jupyter environment automatically calls the function `plt.show()` for you at the end of the cell to actually display the figure/plot.

If you want to have multiple plots/axes in one figure, you have to define the shape of the figure beforehand. The corresponding function in matplotlib is `plt.subplots()` where you can specify how many rows and columns of axes you want to have in the figure. As an example, if you want to create three axes in a row, you can use the following code snippet:
```python
fig, axs = plt.subplots(ncols=3)
```
The function returns a figure and an array of axes, which are displayed automatically when you execute the cell. Instead of calling `plt.plot()`, you can now plot into a specific axes by calling `axs[i].plot()`. The arguments are the same: you can pass the x-values, the y-values and some styling just like in `plt.plot()`. See the following code snippet that will create two axes showing different linear functions:
```python
fig, axs = plt.subplots(ncols=2)
axs[0].plot(np.arange(10))
axs[1].plot(-np.arange(40))
```

If you want to plot something directly from a data frame, you have to tell the data frame method which axes you want to use. For example, if you want to plot the digit sum (in the column `"digit_sum"`) as a function of the number of alphabetic characters (in the column `"n_alphabetic"`) in the second axes of the figure, you can use the following code snippet:
```python
fig, axs = plt.subplots(ncols=3)
df.plot("n_alphabetic", "digit_sum", kind="scatter", ax=axs[1])
```

If you want to change the overall size of the figure, you can pass a list with the width and the height to the parameter `figsize` of the function `plt.subplots()`. As an example, see the following code snippet that will return a very wide figure with two axes:
```python
fig, axs = plt.subplots(ncols=2, figsize=[20, 4])
```

To save your figure at the end of the cell, you can use the method `fig.savefig(name)` with the `name` of the figure. See the following code snippet to create a new figure with three empty axes, and to save it to the file `"plot.png"` at the end of the cell:
```python
fig, axs = plt.subplots(ncols=3)
fig.savefig("plot.png")
```

1. Create a new figure with three empty axes in one column. If you do not like the aspect ratio, use the parameter `figsize` to change the size of the figure.
2. Create a new figure with two axes in one row. Show the digit sum as a function of the password length in the first axes, and show the digit sum as a function of the number of numeric characters in the second axes. In both axes, display the maximum possible digit sum. Assign labels to all the data and add a legend.
3. Create a new figure with two axes in one row. Display different columns from the data frame in the two axes. Save the figure and open the file on your own device to check that everything looks as expected.
4. EXTRA: Create a figure with 4 axes in 2 rows. How do you have to index `axs` to select the individual plots? Fill the axes with some data from the data frame. Call `fig.tight_layout()` if the axes are overlapping with the labels (`fig.set_tight_layout(True)` also works, but is deprecated in recent matplotlib versions).
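
For a grid with more than one row and more than one column, `plt.subplots()` returns a two-dimensional array of axes. The following sketch fills a 2 x 2 grid with arbitrary example curves (not password data) to show the indexing:

```python
import numpy as np
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=[8, 6])
x = np.arange(10)

# axs[row, column] selects a single axes
axs[0, 0].plot(x, x)
axs[0, 1].plot(x, -x)
axs[1, 0].plot(x, x**2)
axs[1, 1].plot(x, np.sqrt(x))

# axs.flat iterates over all axes, row by row
for ax in axs.flat:
    ax.set_xlabel("x")

fig.tight_layout()  # avoid overlapping labels
```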

## EXTRA: Computing the palindrome depth
A palindrome is a word/string that reads the same backwards as forwards, for example "radar" or "level". Instead of just checking whether each password is a palindrome or not, the exercise is to count the number of matching characters up to the middle when reading the password backwards and forwards. If the number of characters is odd, you also have to include the middle character in the count. See the following examples to clarify how the palindrome depth is defined:
```
radar -> 3
12abcd21 -> 2
levels -> 0
```

1. Write a function that computes the palindrome depth for a single password. Try the examples and two other passwords to check that your function works correctly.
2. Use the function to calculate the palindrome depth for the passwords in your data frame and add the values to the data frame as a new column.
3. Identify the true palindromes (e.g. `"radar"` or `"level"`, but not `"abca"`) using the palindrome depths and the lengths of the passwords. The result should be a boolean array that you can add to the data frame as a new column.
4. Visualize your results and see if you can find anything interesting.
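
A minimal sketch of the palindrome depth is shown below. It assumes that counting stops at the first mismatch, which is consistent with the three examples above. The function names are hypothetical.

```python
def palindrome_depth(password):
    """Count matching characters from the outside in, up to the middle."""
    half = (len(password) + 1) // 2  # includes the middle character if odd
    depth = 0
    for i in range(half):
        if password[i] != password[-1 - i]:
            break  # assumption: stop counting at the first mismatch
        depth += 1
    return depth


def is_palindrome(password):
    """A true palindrome matches all the way to the middle."""
    return palindrome_depth(password) == (len(password) + 1) // 2


for word in ["radar", "12abcd21", "levels", "abca"]:
    print(word, palindrome_depth(word), is_palindrome(word))
```

For a whole data frame, the function could be applied with a list comprehension over the index, for example `[palindrome_depth(p) for p in df.index]`.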

## EXTRA: Computing the password strength
The [password strength](https://en.wikipedia.org/wiki/Password_strength) estimates how many trials you need on average to crack a password using a brute-force approach. The equation for the information entropy is
$$
H = L \frac{\log{N}}{\log{2}}
$$
where $L$ is the length of the password and $N$ is the number of possible symbols, which depends on the set of symbols. If a password consists only of numeric characters, there will only be 10 possible symbols. On the other hand, if a password contains numeric characters and alphabetic characters, the number of possible symbols will be higher. See the table on [Entropy per symbol](https://en.wikipedia.org/wiki/Password_strength#Random_passwords) for the relevant symbol sets.

1. Write the function that computes the password strength of a single password based on the symbol set and the password length. You should differentiate between at least the following three symbol sets:
   - Arabic numerals (0–9)
   - Case insensitive Latin alphabet (a–z or A–Z)
   - All ASCII printable characters (a-z, A-Z, 0-9 and special characters)
   - Use more symbol sets if you want to
2. Compute the password strength for each password in the data frame and add the result as a new column.
3. Visualize your results and see if you can find anything interesting.
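
The entropy formula can be sketched in code as follows. The classification into symbol sets is a simplifying assumption: only the three sets listed above are distinguished, with $N = 10$, $N = 26$ and $N = 95$ (all ASCII printable characters including the space), and every password that does not fit the first two sets falls into the last one. The function name is hypothetical.

```python
import math

DIGITS = "0123456789"


def password_strength(password):
    """Entropy H = L * log2(N) in bits for a simplified choice of N."""
    if not password:
        return 0.0
    if all(c in DIGITS for c in password):
        n_symbols = 10  # Arabic numerals
    elif (password.isascii() and password.isalpha()
          and (password.islower() or password.isupper())):
        n_symbols = 26  # case insensitive Latin alphabet
    else:
        n_symbols = 95  # assumption: everything else is "all ASCII printable"
    return len(password) * math.log2(n_symbols)


for word in ["1234", "abcd", "aB1!"]:
    print(word, round(password_strength(word), 2))
```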

## EXTRA: Computing numeric character frequencies
To analyze which digits are used most frequently in the passwords, you have to count the individual numeric characters for each password. If a numeric character does not occur in a password, the count should be zero. See the following example as an illustration:
```python
password = "1234123459"
get_numeric_character_counts(password)  # -> {"0": 0, "1": 2, "2": 2, "3": 2, "4": 2, "5": 1, "6": 0, "7": 0, "8": 0, "9": 1}
```

1. Write a function that counts the individual numeric characters in a single password.
2. Get the counts from each password in the data frame and assign the data to the data frame. You need ten additional columns in the data frame, one for each numeric character.
3. Visualize your results and see if you can find anything interesting.
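
A possible implementation of the counting function, together with one way to attach the ten columns to a data frame, is sketched below. The sample index and the use of `join()` are assumptions for illustration, and the new columns are named after the digits `"0"` to `"9"`.

```python
import pandas as pd

DIGITS = "0123456789"


def get_numeric_character_counts(password):
    """Return a dictionary with the count of each digit, zero if absent."""
    return {digit: password.count(digit) for digit in DIGITS}


# Hypothetical data frame with the unique passwords in the index
df = pd.DataFrame(index=["1234123459", "abc", "2024!"])

# One row of counts per password, aligned with the index of df
digit_counts = pd.DataFrame(
    [get_numeric_character_counts(p) for p in df.index], index=df.index
)
df = df.join(digit_counts)

print(df)
```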