Merge branch 'main' of github.com:nogibjj/rust-data-engineering
noahgift committed Jul 30, 2023
2 parents 16507f4 + 1cc3566 commit be3f30e
Showing 30 changed files with 941 additions and 8 deletions.
22 changes: 22 additions & 0 deletions README.md
@@ -45,6 +45,28 @@ Website for projects here: [https://nogibjj.github.io/rust-data-engineering/](ht
* cli customize fruit salad: `cd cli-customize-fruit-salad && cargo run -- fruits.csv` or `cargo run -- --fruits "apple, pear"`
* data race example: `cd data-race && cargo run` (will not compile because of data race)

#### Ciphers vs Encryption

The main differences between classical ciphers and modern encryption algorithms:

* Classical ciphers operate directly on the letters of the plaintext, substituting or transposing them. Modern encryption algorithms operate on the binary representation of the plaintext.

* Classical ciphers typically have a small key space based on simple operations like letter mappings or transposition rules. A Caesar cipher, for example, has only 25 useful shifts. Modern encryption algorithms use complex math and very large keys (AES keys are 128, 192, or 256 bits).

* Classical ciphers rely on keeping the letter mapping secret, and some (such as homophonic substitution) also try to flatten letter frequencies, but they remain vulnerable to cryptanalysis. Modern encryption algorithms rely on computational hardness assumptions.

* Classical ciphers are designed for text over a fixed alphabet. Modern encryption algorithms handle arbitrary binary data such as images and video.

In summary:

* Classical ciphers like homophonic substitution operate directly on textual plaintext with simple operations and comparatively small key spaces.

* Encryption algorithms like AES operate on any binary data with complex math and very large key sizes.

* Classical ciphers are considered obsolete for serious use today because they are easily broken by cryptanalysis.

* Modern encryption bases its security on mathematical problems that are assumed, not proven, to be computationally infeasible to solve.
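
The small key space of a simple cipher can be made concrete with a short sketch. The helper names `shift_letters` and `brute_force` are hypothetical and are not part of this repository; the example only assumes a Caesar-style shift over ASCII letters and shows that trying every key takes 26 iterations.

```rust
// Illustrative sketch only: `shift_letters` and `brute_force` are hypothetical
// helper names, not functions from the crates in this repository.

/// Shift every ASCII letter forward by `shift` positions, wrapping within the alphabet.
fn shift_letters(text: &str, shift: u8) -> String {
    text.chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
                // `shift % 26` keeps the sum well inside the u8 range.
                (base + (c as u8 - base + shift % 26) % 26) as char
            } else {
                c
            }
        })
        .collect()
}

/// Return every candidate plaintext: the whole key space is just 26 shifts.
fn brute_force(ciphertext: &str) -> Vec<(u8, String)> {
    (0..26u8)
        .map(|shift| (shift, shift_letters(ciphertext, shift)))
        .collect()
}

fn main() {
    let ciphertext = shift_letters("attack at dawn", 3);
    // One of the 26 printed lines is the original message.
    for (shift, candidate) in brute_force(&ciphertext) {
        println!("{shift:2}: {candidate}");
    }
}
```

By contrast, a 128-bit AES key has 2^128 possible values, so the same exhaustive loop is not practical.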


#### Suggested Exercises

9 changes: 9 additions & 0 deletions caesar-cipher-cli/Cargo.toml
@@ -0,0 +1,9 @@
[package]
name = "caeser-cipher-cli"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
clap = { version = "4.3.17", features = ["derive"] }
13 changes: 13 additions & 0 deletions caesar-cipher-cli/Makefile
@@ -0,0 +1,13 @@
format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run

all: format lint test run
24 changes: 24 additions & 0 deletions caesar-cipher-cli/src/lib.rs
@@ -0,0 +1,24 @@
/*
This code defines two functions: encrypt and decrypt.
The encrypt function takes a plaintext string and a shift value, and returns the ciphertext string. The decrypt function takes a ciphertext string and a shift value,
and returns the plaintext string.
*/

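pub fn encrypt(text: &str, shift: u8) -> String {
    // Reduce the shift first so that `offset + shift` cannot overflow a u8.
    let shift = shift % 26;
    let mut result = String::new();
    for c in text.chars() {
        if c.is_ascii_alphabetic() {
            let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
            let offset = (c as u8 - base + shift) % 26;
            result.push((base + offset) as char);
        } else {
            // Non-letters pass through unchanged.
            result.push(c);
        }
    }
    result
}

pub fn decrypt(text: &str, shift: u8) -> String {
    // Shifting forward by the complement undoes the encryption shift.
    encrypt(text, 26 - shift % 26)
}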
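
A round-trip check is one way to exercise `encrypt` and `decrypt` together. The sketch below uses local copies of the two functions so it is self-contained; reducing the shift modulo 26 before use is an assumption of this sketch, made so that every `u8` shift value is safe to test.

```rust
// Sketch only: local copies of the two functions so the example stands alone.
// Reducing the shift modulo 26 up front is an assumption of this sketch.

fn encrypt(text: &str, shift: u8) -> String {
    let shift = shift % 26;
    text.chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
                (base + (c as u8 - base + shift) % 26) as char
            } else {
                c
            }
        })
        .collect()
}

fn decrypt(text: &str, shift: u8) -> String {
    // Shifting forward by the complement undoes the encryption shift.
    encrypt(text, 26 - shift % 26)
}

fn main() {
    let message = "Off to the bunker. Every person for themselves";

    // Decrypting an encrypted message must return the original for every shift.
    for shift in 0..=u8::MAX {
        let round_trip = decrypt(&encrypt(message, shift), shift);
        assert_eq!(round_trip, message, "round trip failed for shift {shift}");
    }

    println!("{}", encrypt(message, 10));
}
```

The same loop could be moved into a `#[cfg(test)]` module so that `cargo test` runs it.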
49 changes: 49 additions & 0 deletions caesar-cipher-cli/src/main.rs
@@ -0,0 +1,49 @@
/*
To run:
cargo run -- --message "Off to the bunker. Every person for themselves" --encrypt --shift 10
To decrypt:
cargo run -- --message "Ypp dy dro lexuob. Ofobi zobcyx pyb drowcovfoc" --decrypt --shift 10
*/


use caeser_cipher_cli::{decrypt, encrypt};
use clap::Parser;

/// CLI tool to encrypt and decrypt messages using the Caesar cipher
#[derive(Parser, Debug)]
#[command(author, version, about, long_about = None)]
struct Args {
    /// Encrypt the message
    #[arg(short, long)]
    encrypt: bool,

    /// Decrypt the message
    #[arg(short, long)]
    decrypt: bool,

    /// The message to encrypt or decrypt
    #[arg(short, long)]
    message: String,

    /// The shift to use for the cipher
    /// Must be between 1 and 25, the default is 3
    #[arg(short, long, default_value = "3")]
    shift: u8,
}

// Parse the arguments and run the requested operation
fn main() {
    let args = Args::parse();
    if args.encrypt {
        println!("{}", encrypt(&args.message, args.shift));
    } else if args.decrypt {
        println!("{}", decrypt(&args.message, args.shift));
    } else {
        println!("Please specify either --encrypt or --decrypt");
    }
}
8 changes: 8 additions & 0 deletions caesar-cipher/Cargo.toml
@@ -0,0 +1,8 @@
[package]
name = "caesar-cipher"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
13 changes: 13 additions & 0 deletions caesar-cipher/Makefile
@@ -0,0 +1,13 @@
format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run

all: format lint test run
24 changes: 24 additions & 0 deletions caesar-cipher/src/lib.rs
@@ -0,0 +1,24 @@
/*
This code defines two functions: encrypt and decrypt.
The encrypt function takes a plaintext string and a shift value, and returns the ciphertext string. The decrypt function takes a ciphertext string and a shift value,
and returns the plaintext string.
*/

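pub fn encrypt(text: &str, shift: u8) -> String {
    // Reduce the shift first so that `offset + shift` cannot overflow a u8.
    let shift = shift % 26;
    let mut result = String::new();
    for c in text.chars() {
        if c.is_ascii_alphabetic() {
            let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
            let offset = (c as u8 - base + shift) % 26;
            result.push((base + offset) as char);
        } else {
            // Non-letters pass through unchanged.
            result.push(c);
        }
    }
    result
}

pub fn decrypt(text: &str, shift: u8) -> String {
    // Shifting forward by the complement undoes the encryption shift.
    encrypt(text, 26 - shift % 26)
}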
21 changes: 21 additions & 0 deletions caesar-cipher/src/main.rs
@@ -0,0 +1,21 @@
/*
This program demonstrates the encrypt and decrypt functions
from the caesar_cipher library.
It encrypts a sample plaintext with a fixed shift,
decrypts the resulting ciphertext with the same shift,
and prints all three strings.
*/

use caesar_cipher::decrypt;
use caesar_cipher::encrypt;

fn main() {
    let plaintext = "the quick brown fox jumps over the lazy dog";
    let shift = 3;
    let ciphertext = encrypt(plaintext, shift);
    let decrypted_text = decrypt(&ciphertext, shift);
    println!("Plaintext: {}", plaintext);
    println!("Ciphertext: {}", ciphertext);
    println!("Decrypted text: {}", decrypted_text);
}
7 changes: 7 additions & 0 deletions cli-customize-fruit-salad/src/lib.rs
@@ -1,3 +1,10 @@
/*
This code defines a function called create_fruit_salad
that takes a mutable vector of strings as input and returns
a new vector of strings that contains the same elements as the input vector,
but in a random order.
*/

use rand::seq::SliceRandom;
use rand::thread_rng;

28 changes: 28 additions & 0 deletions data-race/src/main.rs
@@ -1,3 +1,31 @@
/*
// A Mutex protects the data vector and an Arc shares it across threads.
// We spawn three threads that each acquire the lock and modify one element.
use std::sync::{Arc, Mutex};
use std::thread;
fn main() {
    let data = Arc::new(Mutex::new(vec![1, 2, 3]));
    let handles: Vec<_> = (0..3).map(|i| {
        // Mutex is not Clone; clone the Arc to give each thread its own handle.
        let data = Arc::clone(&data);
        thread::spawn(move || {
            let mut data = data.lock().unwrap();
            data[i] += 1;
        })
    }).collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("{:?}", data.lock().unwrap());
}
*/

use std::thread;

fn main() {
9 changes: 9 additions & 0 deletions decoder-ring/Cargo.toml
@@ -0,0 +1,9 @@
[package]
name = "decoder-ring"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
clap = { version = "4.3.17", features = ["derive"] }
13 changes: 13 additions & 0 deletions decoder-ring/Makefile
@@ -0,0 +1,13 @@
format:
	cargo fmt --quiet

lint:
	cargo clippy --quiet

test:
	cargo test --quiet

run:
	cargo run

all: format lint test run
115 changes: 115 additions & 0 deletions decoder-ring/src/lib.rs
@@ -0,0 +1,115 @@
use std::collections::HashMap;

fn gen_counts() -> HashMap<char, f32> {
    // Reference letter frequencies in English, in percent
    let mut eng_freq: HashMap<char, f32> = HashMap::new();

    // The ten most common letters, which together account for about 74% of all letters in English
    eng_freq.insert('e', 12.7);
    eng_freq.insert('t', 9.1);
    eng_freq.insert('a', 8.2);
    eng_freq.insert('o', 7.5);
    eng_freq.insert('i', 7.0);
    eng_freq.insert('n', 6.7);
    eng_freq.insert('s', 6.3);
    eng_freq.insert('h', 6.1);
    eng_freq.insert('r', 6.0);
    eng_freq.insert('d', 4.3);

    eng_freq
}

fn stats_analysis(text: &str) -> Vec<(char, u32, f32, Option<f32>, f32)> {
    let mut counts: HashMap<char, u32> = HashMap::new();

    for c in text.chars() {
        *counts.entry(c).or_insert(0) += 1;
    }

    let total: u32 = counts.values().sum();

    let eng_freq_map = gen_counts();

    let mut results = Vec::new();

    for (letter, count) in &counts {
        // Observed frequency of this character, as a percentage of the text
        let freq = (*count as f32 / total as f32) * 100.0;
        let eng_freq = eng_freq_map.get(&letter.to_ascii_lowercase()).cloned();

        // Absolute gap to the English reference; 0.0 when there is no reference
        let eng_freq_diff = eng_freq.map_or(0.0, |f| (freq - f).abs());

        results.push((*letter, *count, freq, eng_freq, eng_freq_diff));
    }
    results
}

pub fn print_stats_analysis(text: &str) {
    let stats = stats_analysis(text);
    for (letter, count, freq, eng_freq, eng_freq_diff) in stats {
        // The last value is the absolute difference from the English frequency
        println!(
            "{}: {} ({}%), English Freq: {}%, Diff: {}",
            letter,
            count,
            freq,
            eng_freq.unwrap_or(0.0),
            eng_freq_diff
        );
    }
}

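pub fn decrypt(text: &str, shift: u8) -> String {
    // Letters are shifted forward, so a message encrypted with shift k
    // is recovered with shift 26 - k.
    // Reduce the shift first so that `offset + shift` cannot overflow a u8.
    let shift = shift % 26;
    let mut result = String::new();

    for c in text.chars() {
        if c.is_ascii_alphabetic() {
            let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
            let offset = (c as u8 - base + shift) % 26;
            result.push((base + offset) as char);
        } else {
            result.push(c);
        }
    }

    result
}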

/*
Guess Shift:
First, uses statistical analysis to determine the most likely shift.
Then, uses the most likely shift to decrypt the message.
Accepts:
 * text: the message to decrypt
 * depth: the number of shifts to try
Returns:
 * depth: the number of shifts tried
 * shift: the most likely shift
 * decrypted: the decrypted message
 * max_score: the score of the most likely shift
*/

pub fn guess_shift(text: &str, depth: u8) -> (u8, u8, String, f32) {
    let mut max_score = 0.0;
    let mut best_shift = 0;
    let mut decrypted = String::new();

    for shift in 0..depth {
        let decrypted_text = decrypt(text, shift);
        let stats = stats_analysis(&decrypted_text);

        let mut score = 0.0;
        for (_, _, freq, eng_freq, eng_freq_diff) in stats {
            if let Some(eng_freq) = eng_freq {
                score += (1.0 - eng_freq_diff / eng_freq) * freq;
            }
        }
        println!("Shift: {}, Score: {}", shift, score);
        if score > max_score {
            max_score = score;
            best_shift = shift;
            decrypted = decrypted_text;
        }
    }

    (depth, best_shift, decrypted, max_score)
}
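
For comparison, here is a minimal sketch of a different scoring approach: a chi-squared test against a full 26-letter frequency table, where the lowest score wins. This is not the scoring used in `guess_shift` above. The names `ENGLISH_FREQ`, `shift_back`, `chi_squared`, and `guess_key` are hypothetical, and the table holds approximate, commonly cited English letter frequencies in percent. Unlike `guess_shift`, the key returned here is the encryption shift itself.

```rust
// Sketch only: every name in this block is hypothetical, and the frequency
// table holds approximate, commonly cited values (percent, a through z).
const ENGLISH_FREQ: [f64; 26] = [
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966, 0.153, 0.772, 4.025, 2.406,
    6.749, 7.507, 1.929, 0.095, 5.987, 6.327, 9.056, 2.758, 0.978, 2.360, 0.150, 1.974, 0.074,
];

/// Undo a Caesar encryption with the given key by shifting letters backward.
fn shift_back(text: &str, key: u8) -> String {
    let forward = (26 - key % 26) % 26;
    text.chars()
        .map(|c| {
            if c.is_ascii_alphabetic() {
                let base = if c.is_ascii_lowercase() { b'a' } else { b'A' };
                (base + (c as u8 - base + forward) % 26) as char
            } else {
                c
            }
        })
        .collect()
}

/// Chi-squared distance between the text's letter counts and English; lower is better.
fn chi_squared(text: &str) -> f64 {
    let mut counts = [0usize; 26];
    let mut total = 0usize;
    for c in text.chars().filter(|c| c.is_ascii_alphabetic()) {
        counts[(c.to_ascii_lowercase() as u8 - b'a') as usize] += 1;
        total += 1;
    }
    if total == 0 {
        return f64::INFINITY;
    }
    counts
        .iter()
        .zip(ENGLISH_FREQ.iter())
        .map(|(&observed, &percent)| {
            let expected = total as f64 * percent / 100.0;
            (observed as f64 - expected).powi(2) / expected
        })
        .sum()
}

/// Try all 26 keys and return the key and plaintext with the lowest score.
fn guess_key(ciphertext: &str) -> (u8, String) {
    (0..26u8)
        .map(|key| (key, shift_back(ciphertext, key)))
        .min_by(|a, b| chi_squared(&a.1).total_cmp(&chi_squared(&b.1)))
        .expect("the range 0..26 is never empty")
}

fn main() {
    let (key, plaintext) = guess_key("Ypp dy dro lexuob. Ofobi zobcyx pyb drowcovfoc");
    println!("key {key}: {plaintext}");
}
```

Because all 26 letters contribute to the score, letters that are rare in English (such as `q` or `z`) heavily penalize wrong shifts, which helps on short messages.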