
Commit

Merge pull request #213 from school-brainhack/update_python_scripts_issue#183

update python scripts module
htwangtw committed Apr 28, 2023
2 parents f6cd195 + c28a07a commit c099339
Showing 4 changed files with 104 additions and 149 deletions.
52 changes: 35 additions & 17 deletions content/en/modules/python_scripts/correction/cypher_script.py
@@ -1,22 +1,40 @@
#!/usr/bin/env python
import argparse
import os
from useful_functions import encrypt_letter, decrypt_letter, process_message
from useful_functions import process_message


def main(input_path, key, mode, output_path):
    with open(input_path, "r") as message_file:
        message = message_file.read()
    encrypt = mode == "encryption"
    processed_message = process_message(message, key, encrypt)
    with open(output_path, "w") as out_file:
        out_file.write(processed_message)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--message", type=str, help="Path to the text file containing the message \
to be encrypted or decrypted.")
    parser.add_argument("--key", type=str, help="Key to use to encrypt or decrypt the message.")
    parser.add_argument("--mode", type=str, choices=["enc", "dec"], help="whether to encrypt ('enc') \
or decrypt ('dec') the message.")
    parser.add_argument(
        "-i",
        dest="input_path",
        type=str,
        required=True,
        help="Path to the input file.",
    )
    parser.add_argument(
        "-o",
        dest="output_path",
        type=str,
        required=True,
        help="Path to the output file.",
    )
    parser.add_argument("-k", dest="key", type=str, required=True, help="Key.")
    parser.add_argument(
        "-m",
        dest="mode",
        type=str,
        required=True,
        choices=["encryption", "decryption"],
        help="Whether to encrypt or decrypt the message.",
    )
    args = parser.parse_args()

    with open(args.message, 'r') as f:
        message = f.read()
    encrypt = args.mode == "enc"
    processed_message = process_message(message, args.key, encrypt)
    suffix = "_encrypted" if encrypt else "_decrypted"
    save_path = os.path.splitext(args.message)[0] + suffix + ".txt"
    with open(save_path, 'w') as f:
        f.write(processed_message)
    main(args.input_path, args.key, args.mode, args.output_path)
main(args.input_path, args.key, args.mode, args.output_path)
53 changes: 40 additions & 13 deletions content/en/modules/python_scripts/correction/useful_functions.py
@@ -1,20 +1,47 @@
def encrypt_letter(msg, key):
    enc_id = (ord(msg) + ord(key)) % 1114112
    return chr(enc_id)
def encrypt_letter(letter, key):
    letter_index = ord(letter)
    key_index = ord(key)
    new_index = (letter_index + key_index) % 1114112
    return chr(new_index)


def decrypt_letter(msg, key):
    dec_id = (1114112 + ord(msg) - ord(key)) % 1114112
    return chr(dec_id)
def decrypt_letter(letter, key):
    letter_index = ord(letter)
    key_index = ord(key)
    new_index = (letter_index - key_index) % 1114112
    return chr(new_index)


def process_message(message, key, encrypt):
    returned_message = ""

    processed_message = ""
    process_letter = encrypt_letter if encrypt else decrypt_letter
    for i, letter in enumerate(message):
        if encrypt:
            returned_message += encrypt_letter(letter, key[i%len(key)])
        else:
            returned_message += decrypt_letter(letter, key[i%len(key)])
        key_letter = key[i % len(key)]
        processed_key = process_letter(letter, key_letter)
        processed_message += processed_key

    return processed_message


if __name__ == "__main__":
    # Test 1
    test_letter = "l"
    test_key = "h"
    decrypted_letter = decrypt_letter(encrypt_letter(test_letter, test_key), test_key)

    if test_letter == decrypted_letter:
        print("first test passed")
    else:
        print("first test failed")

    # Test 2
    message = "word"
    key = "key"

    encrypted_msg = process_message(message, key, encrypt=True)
    decrypted_msg = process_message(encrypted_msg, key, encrypt=False)

    return returned_message
    if decrypted_msg == message:
        print("second test passed")
    else:
        print("second test failed")
147 changes: 28 additions & 119 deletions content/en/modules/python_scripts/index.md
@@ -34,15 +34,23 @@ The prerequisites to take this module are:
If you have any questions regarding the module content, please ask them in the relevant module channel on the school Discord server. If you do not have access
to the server and would like to join, please send us an email at school [dot] brainhack [at] gmail [dot] com.

Contact your local TA if you have questions on this module, or if you want to check that you successfully completed all the exercises.

## Resources
This module was presented by [Greg Kiar](https://github.com/gkiar) during the QLSC 612 course in 2020, with [slides](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1) from Joel Grus' talk at JupyterCon 2018.
This module is based on [Greg Kiar](https://github.com/gkiar)'s [QLSC 612 course](https://youtu.be/zpOQENxs1G4) in 2020, with [slides](https://docs.google.com/presentation/d/1n2RlMdmv1p25Xy5thJUhkKGvjtV-dkAIsUXP-AL4ffI/edit#slide=id.g362da58057_0_1) from Joel Grus' talk at JupyterCon 2018.

The video is available below:
<iframe width="560" height="315" src="https://www.youtube.com/embed/zpOQENxs1G4" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
<iframe width="560" height="315" src="https://www.youtube.com/embed/karKf2CCpPA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" allowfullscreen></iframe>


## Exercise

* Watch the video and follow along with the hands-on part to do the exercise. If you prefer to do the exercise on your own, the instructions are also written below.

<details>

<summary> <h4> Click to show the exercise instructions &#11015; </h4></summary>

In this exercise we will program a key-based encryption and decryption system. We will implement a version of the [Vigenere cipher](https://en.wikipedia.org/wiki/Vigen%C3%A8re_cipher), but instead of using only the 26 letters of the alphabet, we will use all Unicode characters.

The Vigenere cipher works by shifting each letter of the message to encrypt by the index of the corresponding letter in the key. For example, encrypting the letter B with the key D gives the letter at new_index = index(B) + index(D) = 2 + 4 = 6, i.e. the 6th letter, F.
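The 26-letter version of this rule can be sketched in Python as follows. It is only an illustration: it assumes 1-based indices (A = 1, ..., Z = 26) as in the example above and wraps back to A after Z, and the helper names `alphabet_index` and `shift_letter` are hypothetical, not part of the exercise.

```python
import string

ALPHABET = string.ascii_uppercase  # "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def alphabet_index(letter):
    """Return the 1-based position of an uppercase letter (A -> 1, Z -> 26)."""
    return ALPHABET.index(letter) + 1


def shift_letter(letter, key_letter):
    """Encrypt one uppercase letter with one key letter.

    Adds the two 1-based indices and wraps around after Z so the result
    stays in 1..26 (the wrap-around rule is an assumption of this sketch).
    """
    new_index = (alphabet_index(letter) + alphabet_index(key_letter) - 1) % 26 + 1
    return ALPHABET[new_index - 1]


print(shift_letter("B", "D"))  # F, since 2 + 4 = 6
```

The exercise replaces this 26-letter alphabet with every Unicode code point, which is why it works with `ord()` and `chr()` instead of a fixed string.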
@@ -65,7 +73,7 @@ For the indices of the letter, we will not use the number of the letter in t

### Step 1: Create relevant functions in `useful_functions.py`

You'll implement the following functions:
In that file, implement the following functions:
* `encrypt_letter(letter, key)`: return `letter` encrypted with `key`, e.g. `encrypt_letter("l", "h")` should return `'Ô'`.
* `decrypt_letter(letter, key)`: return `letter` decrypted with `key`, e.g. `decrypt_letter("Ô", "h")` should return `'l'`.
* `process_message(message, key, encrypt)`: return the message encrypted with the letters of `key` if `encrypt` is `True`, and decrypted if `encrypt` is `False`. For example:
@@ -80,139 +88,40 @@ process_message('ÆÛÚÉÒá', 'clef', False)
After creating these functions, try calling them in a Python terminal or in a Jupyter Notebook to try things out.
Are the functions performing as you expected?

If so, let's make sure they are in a file named `useful_functions.py` and conclude the first part of the exercise.
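For reference, here is one possible sketch of the three Step 1 functions. It uses 1114112 (`0x110000`), the number of Unicode code points, as the modulus, which keeps every shifted index inside the range that `chr()` accepts. Writing `process_message` with `itertools.cycle` and `str.join` is our own stylistic choice; a plain `for` loop with `key[i % len(key)]` works just as well.

```python
from itertools import cycle

N_CODE_POINTS = 0x110000  # 1114112: chr() accepts integers in range(0x110000)


def encrypt_letter(letter, key):
    """Shift the code point of `letter` forward by the code point of `key`."""
    return chr((ord(letter) + ord(key)) % N_CODE_POINTS)


def decrypt_letter(letter, key):
    """Shift the code point of `letter` back by the code point of `key`.

    Python's % returns a non-negative result for a positive modulus,
    so a negative difference wraps around without extra handling.
    """
    return chr((ord(letter) - ord(key)) % N_CODE_POINTS)


def process_message(message, key, encrypt):
    """Encrypt (encrypt=True) or decrypt (encrypt=False) `message`, repeating `key`."""
    process_letter = encrypt_letter if encrypt else decrypt_letter
    # zip stops at the end of `message`; cycle(key) repeats the key indefinitely.
    return "".join(process_letter(letter, k) for letter, k in zip(message, cycle(key)))


print(encrypt_letter("l", "h"))                  # Ô
print(decrypt_letter("Ô", "h"))                  # l
print(process_message("ÆÛÚÉÒá", "clef", False))  # coucou
```

One behavioral difference to be aware of: with an empty key, `cycle("")` yields nothing, so this version silently returns an empty string, whereas the `key[i % len(key)]` version raises `ZeroDivisionError`.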

### Step 2: Create a file `cypher_script.py`:
To reliably check that the `process_message` function works correctly, let's add a test at the end of the `useful_functions.py` file.
* Define a `message` variable containing a word (e.g. `message = "word"`), then a `key` variable containing another word (e.g. `key = "key"`).
* Use `process_message` to encrypt `message` with `key`, and store the result in an `encrypted_msg` variable.
* Use `process_message` again to decrypt `encrypted_msg` (still with the same `key`), and store the result in a `decrypted_msg` variable.
* Verify that `message == decrypted_msg` by printing "Test passed" if it is true and "Test failed" if it is false.

* Use the `argparse` library introduced in the video so that a user can call the script with three arguments: `--message`, `--key` and `--mode`. `--message` will contain the path to a text file containing the message. `--key` will be a string directly containing the key. `--mode` will be a string that can take the value `"enc"` or `"dec"` to tell the script whether to encrypt or decrypt the message.
* The script should import the functions from `useful_functions.py` and use them in its main function to encrypt or decrypt the text in the message file, using the `--key` string as the key, and save the result in a file that has the same name as the message file but with an `_encrypted` or `_decrypted` suffix depending on the mode. So calling `python cypher_script.py --message msg_file.txt --key my_key --mode enc` should create a `msg_file_encrypted.txt` file.
* Don't forget to write the code under `if __name__ == "__main__"`. Even though in this example it won’t make a difference, it is never too early to get used to good practices. Read the section below to understand the usefulness of `if __name__ == "__main__"`.
Now we have a proper test of our `process_message` function, and we can run it by executing the `useful_functions.py` script. However, we don't want to run the test when we only import the functions from the file, so we need to use the `if __name__ == "__main__":` statement.
* Put the test in an `if __name__ == "__main__":` block.
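A sketch of this test, guarded by `if __name__ == "__main__":`, is shown below. To keep it self-contained it repeats a compact `process_message`; in the exercise you would keep only the test at the bottom of `useful_functions.py`. The helper `round_trip_ok` and the extra `(message, key)` pairs (a non-ASCII word, a key longer than the message) are our own additions to the single `"word"`/`"key"` check described above.

```python
def process_message(message, key, encrypt):
    """Compact Vigenere-style shift over all Unicode code points."""
    sign = 1 if encrypt else -1
    return "".join(
        chr((ord(letter) + sign * ord(key[i % len(key)])) % 0x110000)
        for i, letter in enumerate(message)
    )


def round_trip_ok(message, key):
    """Return True if decrypting the encrypted message gives back the original."""
    encrypted_msg = process_message(message, key, encrypt=True)
    decrypted_msg = process_message(encrypted_msg, key, encrypt=False)
    return decrypted_msg == message


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    cases = [("word", "key"), ("coucou", "clef"), ("été", "k"), ("hi", "a much longer key")]
    for message, key in cases:
        print("Test passed" if round_trip_ok(message, key) else "Test failed")
```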

<br/>
Now that we have our functions and a test to validate them, we can conclude the first part of the exercise.

<details>

<summary> <h4> On the usefulness of "if __name__ == '__main__':" (click to show &#11015;) </h4></summary>

It is not obvious why you should put the `if __name__ == "__main__":` line in your script. Indeed, in many cases, including it or not won't change how your code runs. But with multiple scripts importing from each other, leaving it out can quickly lead to a nightmare.
To give you an idea of how and why it is useful, here is an example (if you prefer not to read, or want complementary explanations, here is [a nice YouTube video](https://www.youtube.com/watch?v=g_wlZ9IhbTs) about it).

Suppose you have a script to fit a Ridge model on provided data, judiciously named `fit_Ridge.py`, which looks like this:
```
#!/usr/bin/env python
import argparse
import pickle  # pickle is a library to save and load Python objects.
import numpy as np
from sklearn.linear_model import Ridge

def fit_Ridge_model(X, Y):
    model = Ridge()
    model.fit(X, Y)
    return model


parser = argparse.ArgumentParser()
parser.add_argument("--X_data_path", type=str)
parser.add_argument("--Y_data_path", type=str)
parser.add_argument("--output_path", type=str)
args = parser.parse_args()
X = np.load(args.X_data_path)
Y = np.load(args.Y_data_path)
model = fit_Ridge_model(X, Y)
pickle.dump(model, open(args.output_path, 'wb'))
```
This script lets the user provide the paths to two NumPy files as data to fit a Ridge model, and save the model to a given path, with a command like:
```
python fit_Ridge.py --X_data_path data_folder/X.npy --Y_data_path data_folder/Y.npy --output_path models/Ridge.pk
```
There is no `if __name__ == "__main__":` in sight, but used on its own, the script works fine.

But later, you write another script, `compare_to_Lasso.py`, that compares Ridge and Lasso models on the same data, so you need to fit a Ridge model again. Eager to apply good programming practices, you judiciously decide not to duplicate the code for fitting a Ridge model, but to import the `fit_Ridge_model` function from `fit_Ridge.py`. Your second script therefore looks like this:
```
#!/usr/bin/env python
import numpy as np
import argparse
from sklearn.linear_model import Lasso
from fit_Ridge import fit_Ridge_model

parser = argparse.ArgumentParser()
parser.add_argument("--X_data_path", type=str)
parser.add_argument("--Y_data_path", type=str)
args = parser.parse_args()
X = np.load(args.X_data_path)
Y = np.load(args.Y_data_path)
ridge_model = fit_Ridge_model(X, Y)
lasso_model = Lasso()
lasso_model.fit(X, Y)
ridge_score = ridge_model.score(X, Y)
lasso_score = lasso_model.score(X, Y)
if ridge_score > lasso_score:
    print("Ridge model is better.")
else:
    print("Lasso model is better.")
```

It seems fine, but when you try to run
```
python compare_to_Lasso.py --X_data_path data_folder/X.npy --Y_data_path data_folder/Y.npy
```
you get an error:
```
Traceback (most recent call last):
  File "compare_to_Lasso.py", line 5, in <module>
    from fit_Ridge import fit_Ridge_model
  File "/Users/francois/scratch/fit_Ridge.py", line 21, in <module>
    pickle.dump(model, open(args.output_path, 'wb'))
TypeError: expected str, bytes or os.PathLike object, not NoneType
```
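The last two lines of this traceback can be reproduced in isolation. The sketch below builds a parser with the same three options as `fit_Ridge.py` and feeds it only the arguments that `compare_to_Lasso.py` was called with, passing an explicit list to `parse_args()` instead of letting it read `sys.argv`. The missing `--output_path` comes back as `None`, which `open()` rejects with a `TypeError`.

```python
import argparse

# Same options as fit_Ridge.py.
parser = argparse.ArgumentParser()
parser.add_argument("--X_data_path", type=str)
parser.add_argument("--Y_data_path", type=str)
parser.add_argument("--output_path", type=str)

# The command line that compare_to_Lasso.py received: there is no --output_path.
args = parser.parse_args(
    ["--X_data_path", "data_folder/X.npy", "--Y_data_path", "data_folder/Y.npy"]
)
print(args.output_path)  # None: an optional argument that is not given defaults to None

caught = None
try:
    open(args.output_path, "wb")  # open() does not accept None as a path
except TypeError as err:
    caught = err
    print("TypeError:", err)
```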
### Step 2: Create a file `cypher_script.py`:

The error shows that the script tried to save a model to the path `args.output_path`, which was never given on the command line, so argparse set it to `None` and `open()` raised a TypeError. But our `compare_to_Lasso.py` script never tries to save a model! Looking at the other lines of the error message, we see that the error comes from the import. What happens is that when we import the `fit_Ridge_model` function from `fit_Ridge.py`, Python executes the entire file from top to bottom. Its parser reads the same command-line arguments as `compare_to_Lasso.py` (it finds `--X_data_path` and `--Y_data_path`, but no `--output_path`), then it fits a Ridge model and tries to save it. We don't want Python to execute everything; we only want the definition of the `fit_Ridge_model` function. That is why here we absolutely need `if __name__ == "__main__":`, so we modify the `fit_Ridge.py` script like this:
```
#!/usr/bin/env python
import argparse
import pickle  # pickle is a library to save and load Python objects.
import numpy as np
from sklearn.linear_model import Ridge

def fit_Ridge_model(X, Y):
    model = Ridge()
    model.fit(X, Y)
    return model


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--X_data_path", type=str)
    parser.add_argument("--Y_data_path", type=str)
    parser.add_argument("--output_path", type=str)
    args = parser.parse_args()
    X = np.load(args.X_data_path)
    Y = np.load(args.Y_data_path)
    model = fit_Ridge_model(X, Y)
    pickle.dump(model, open(args.output_path, 'wb'))
```
Now, when importing from this script, Python still runs the definition of the function, but it skips the rest, since during the import the variable `__name__` is set to `"fit_Ridge"` (the module name) instead of `"__main__"`.
* Use the `argparse` library introduced in the video so that a user can call the script with four arguments: `-i`, `-o`, `-k` and `-m`. `-i` will contain the path to the input text file containing the message. `-o` will contain the path to the output file where the processed message will be written. `-k` will be a string directly containing the key. `-m` will be the mode: a string that can take the value `"encryption"` or `"decryption"` to tell the script whether to encrypt or decrypt the input message.
* The script should import the functions from `useful_functions.py` and use them in its main function to encrypt or decrypt the text in the input file, using the `-k` string as the key, and save the result in the output file. So calling `python cypher_script.py -i msg_file.txt -o msg_encrypted.txt -k my_key -m encryption` should create a `msg_encrypted.txt` file.
* Don't forget to write the code under `if __name__ == "__main__"`. Even though in this file it won’t make a difference, it is never too early to get used to good practices.
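One way to check the parser for this step without going through the command line is to pass an explicit list of strings to `parse_args()`, which otherwise reads `sys.argv`. The sketch below mirrors the four options described above; the function name `build_parser` is hypothetical. Because every option uses `required=True`, argparse itself rejects a call with a missing argument (it prints a usage message and exits with status 2) instead of silently setting it to `None`.

```python
import argparse


def build_parser():
    """Build a parser with the -i, -o, -k and -m options described above."""
    parser = argparse.ArgumentParser(description="Encrypt or decrypt a text file.")
    parser.add_argument("-i", dest="input_path", type=str, required=True,
                        help="Path to the input file.")
    parser.add_argument("-o", dest="output_path", type=str, required=True,
                        help="Path to the output file.")
    parser.add_argument("-k", dest="key", type=str, required=True, help="Key.")
    parser.add_argument("-m", dest="mode", type=str, required=True,
                        choices=["encryption", "decryption"],
                        help="Whether to encrypt or decrypt the message.")
    return parser


parser = build_parser()
args = parser.parse_args(
    ["-i", "msg_file.txt", "-o", "msg_encrypted.txt", "-k", "my_key", "-m", "encryption"]
)
print(args.input_path, args.output_path, args.key, args.mode)

# A missing required option makes argparse print an error and raise SystemExit.
exit_code = None
try:
    parser.parse_args(["-i", "msg_file.txt"])
except SystemExit as exc:
    exit_code = exc.code
    print("argparse exited with status", exit_code)  # 2
```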

In the end, using `if __name__ == "__main__":` is what makes it safe to import functions from our script, and since you never know for sure that you won't need to import something from a script in the future, putting it in all of your scripts by default is a good idea.
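The mechanism can be checked directly. The sketch below writes a tiny module to a temporary directory (the module name `guarded_demo` and its contents are made up for the illustration), imports it with `importlib`, then runs the same file as a script with `runpy.run_path(..., run_name="__main__")`, which is roughly what `python guarded_demo.py` does. The guarded line only runs in the second case.

```python
import importlib.util
import pathlib
import runpy
import tempfile

MODULE_SOURCE = '''
def helper():
    return "helper called"

ran_main_block = False
if __name__ == "__main__":
    ran_main_block = True
'''

with tempfile.TemporaryDirectory() as tmp_dir:
    path = pathlib.Path(tmp_dir) / "guarded_demo.py"
    path.write_text(MODULE_SOURCE)

    # 1) Import it: __name__ is the module name, so the guarded block is skipped.
    spec = importlib.util.spec_from_file_location("guarded_demo", str(path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    print(module.__name__, module.ran_main_block)  # guarded_demo False

    # 2) Run it as a script: __name__ is "__main__", so the guarded block runs.
    script_globals = runpy.run_path(str(path), run_name="__main__")
    print(script_globals["__name__"], script_globals["ran_main_block"])  # __main__ True
```

In both cases the function definition is executed, so `helper` is available either way; only the code under the guard depends on how the file is loaded.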
<br/>

</details>
<br>

<br/>

### Step 3: verify your results
### Last step: verify your implementation

Finally, decrypt the file that you can download with:
```
wget https://raw.githubusercontent.com/BrainhackMTL/psy6983_2021/master/content/en/modules/python_scripts/message_encrypted.txt
wget https://raw.githubusercontent.com/school-brainhack/school-brainhack.github.io/main/content/en/modules/python_scripts/message_encrypted.txt
```
using the following key:
```
my_super_secret_key
```
Can you see something cool in the decrypted file?
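If you prefer to decrypt the downloaded file from Python rather than through `cypher_script.py`, a minimal sketch follows. It assumes the file is UTF-8 encoded and passes `encoding="utf-8"` explicitly, so that reading and writing non-ASCII characters does not depend on the platform's default encoding, and `newline=""` to turn off newline translation so the characters read are exactly the ones that were written. The `decrypt_file` name is hypothetical, and the demo round-trips a temporary file rather than the real `message_encrypted.txt`.

```python
import os
import tempfile


def process_message(message, key, encrypt):
    """Shift every character by the matching key character, modulo 0x110000."""
    sign = 1 if encrypt else -1
    return "".join(
        chr((ord(letter) + sign * ord(key[i % len(key)])) % 0x110000)
        for i, letter in enumerate(message)
    )


def decrypt_file(input_path, output_path, key):
    """Read an encrypted text file, decrypt it with `key`, and write the result."""
    with open(input_path, "r", encoding="utf-8", newline="") as in_file:
        encrypted = in_file.read()
    with open(output_path, "w", encoding="utf-8", newline="") as out_file:
        out_file.write(process_message(encrypted, key, encrypt=False))


# Demo on a temporary file (not the real message_encrypted.txt).
key = "my_super_secret_key"
with tempfile.TemporaryDirectory() as tmp_dir:
    encrypted_path = os.path.join(tmp_dir, "message_encrypted.txt")
    decrypted_path = os.path.join(tmp_dir, "message_decrypted.txt")
    with open(encrypted_path, "w", encoding="utf-8", newline="") as f:
        f.write(process_message("Hello\nworld", key, encrypt=True))
    decrypt_file(encrypted_path, decrypted_path, key)
    with open(decrypted_path, "r", encoding="utf-8", newline="") as f:
        decrypted_text = f.read()

print(decrypted_text)
```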

* Follow up with your local TA(s) to validate you completed the exercises correctly.
* :tada: :tada: :tada: you completed this training module! :tada: :tada: :tada:
1 change: 1 addition & 0 deletions content/en/modules/python_scripts/message_encrypted.txt
@@ -0,0 +1 @@
™“•…’“…ƒ’ᔋ…™™™“•…’“ám’…”‹…™™“•á’†“…’…”‹… ™ï…’“…ƒ’…”‹…õ™“•ì…’š…ƒî…”Ûu…™§Œ £Ï…э¡“ “Óʓ¦š§Ÿ•Œ’Û}…ƒš…”‹‹…ٍ™“•…’¿“…’…‹…§õi“•ŒŸŽ“…ƒ’…”‹…™™“ѝŒ’’á~‹…™õ“䐅’ŽÏ…ƒ’Ô”煙™“•ªá|“…ƒ’Á”‹…™ÌՎҕ…’¢…’…›‹…õw™“•”™Ò…ƒ’ÃÒ‹…Ø›´¾ÒԐ…’“…ß|…”šÅ™™ӗ’‡”•Åƒ’…”‹ÅÕª™“ñz…¡¿“…ƒ’…”‹…™™“•…’“…’¯…”ço´™“•…’“…ƒ’Œ¡Œ—’¦š¦Œš²…’ïoߒ…”‹…™™“•…’“Å¿’…ð‹…™›™Û}ѐ…’“…ƒ’…”‹…™™“•ÌÄѾ“Ÿƒ’…ðišŒ§™“•…’“…ƒ’…”‹…™™¿Ï²…îiÏĒҒ¡™“§›§¾“•…’“…ƒ’…”Ž¨…™éƒ“•…’“…ƒ’…𿘓™™“•…¡œ“Ÿƒîo”‹…™™“•…î­…ß“Ó¾‹”ٍ§“•ìo’“…ƒ’…”‹…™éãÆ敞…’Ӆߒ…”’Ⴭ™“•…’“…ƒî…”‹…³Ù“•ž…î“ám³×èÍޙ·èÀᕷ“’²çÆÕÝo
