# Introduction to Zero Trust
- Update: 2024
- Duration: 75 minutes

<hr>

## Why Zero Trust?


### Traditional Security Approaches

Traditional security measures like firewalls and user ID/password systems were once considered sufficient to protect corporate networks. However, these systems have proven vulnerable to cybercriminals who exploit stolen credentials to gain unauthorized access, leading to data theft, manipulation, or ransomware attacks. The shift towards remote work, personal devices, and cloud services has further complicated the security landscape, making the corporate network perimeter less relevant. Instead, securing individual identities has become paramount.
<center><img src="https://github.com/moaldeen/trustzone/blob/main/traditional.png?raw=true" alt="ml-examples.png" width="55%"></center>

### Security Breach Examples

- Colonial Pipeline:

  Attackers took advantage of the fact that a VPN connection to the Colonial Pipeline network could be established with a plain-text password, without any multi-factor authentication in place.

- Kudankulam Nuclear Power Plant:

  Malware was discovered on an Indian nuclear power plant employee’s computer that was connected to the administrative network’s internet servers. Once the attackers gained access, they were able to roam within the network because of the “trust” that comes with being inside it.

<div style="display: flex; justify-content: center;">
  <img src="https://github.com/moaldeen/trustzone/blob/main/password.png?raw=true" alt="ml-examples.png" width="30%">
  <img src="https://github.com/moaldeen/trustzone/blob/main/nuclear.png?raw=true" alt="ml-examples.png" width="25%">
</div>

## What Is Zero Trust?



The Zero Trust security model is a cybersecurity framework that operates on the principle of "never trust, always verify." It assumes that threats exist both inside and outside the network, so every access request must be authenticated, authorized, and continuously validated regardless of its origin. In practice, this involves strong identity verification, validating device compliance before granting access, and least-privilege access to resources.
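The per-request checks described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `AccessRequest` fields, the `GRANTS` table, and the `decide` function are hypothetical names invented for this example, and a real deployment would obtain these signals from an identity provider and a device-management service.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AccessRequest:
    user: str
    mfa_verified: bool      # strong identity verification succeeded
    device_compliant: bool  # device passed a compliance check
    resource: str
    action: str


# Hypothetical least-privilege grants: user -> set of (resource, action) pairs.
GRANTS = {
    "alice": {("payroll-db", "read")},
    "bob": {("wiki", "read"), ("wiki", "write")},
}


def decide(request: AccessRequest) -> Tuple[bool, str]:
    """Evaluate a single request; nothing is trusted by default."""
    if not request.mfa_verified:
        return False, "identity not verified"
    if not request.device_compliant:
        return False, "device not compliant"
    allowed = GRANTS.get(request.user, set())
    if (request.resource, request.action) not in allowed:
        return False, "not permitted by least-privilege policy"
    return True, "allowed"


if __name__ == "__main__":
    print(decide(AccessRequest("alice", True, True, "payroll-db", "read")))
    print(decide(AccessRequest("alice", True, True, "payroll-db", "write")))
```

Note that the function never looks at where the request came from: the decision depends only on the verified identity, the device state, and the explicit grant.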

The **US National Institute of Standards and Technology** (**NIST**) SP 800-207 Zero Trust Architecture standard states the following:

*“Zero trust assumes there is no implicit trust granted to assets or user accounts based solely on their physical or network location (that is, local area networks versus the internet) or based on asset ownership (enterprise or personally owned).”*

A zero trust network is built upon five fundamental assertions:
- The network is always assumed to be hostile.
- External and internal threats exist on the network at all times.
- Network locality alone is not sufficient for deciding trust in a network.
- Every device, user, and network flow is authenticated and authorized.
- Policies must be dynamic and calculated from as many sources of data as possible.
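One way to picture the last assertion is a trust score recomputed for every request from several independent signals. The sketch below is illustrative only: the signal names, the integer weights, and the 0.7 threshold are assumptions made up for this example, not values taken from NIST SP 800-207 or from any product.

```python
from typing import Dict

# Hypothetical signal weights; a larger weight means the signal matters more.
WEIGHTS: Dict[str, int] = {
    "mfa": 4,
    "device_patched": 3,
    "usual_location": 2,
    "usual_hours": 1,
}


def trust_score(signals: Dict[str, float], weights: Dict[str, int]) -> float:
    """Weighted average of signal values in [0, 1]; missing signals count as 0."""
    total = sum(weights.values())
    weighted = sum(w * signals.get(name, 0.0) for name, w in weights.items())
    return weighted / total


def allow(signals: Dict[str, float], threshold: float = 0.7) -> bool:
    """Grant access only when the combined score reaches the threshold."""
    return trust_score(signals, WEIGHTS) >= threshold


if __name__ == "__main__":
    # A familiar location on its own is not enough to be trusted.
    print(allow({"usual_location": 1.0}))
    # Several independent signals together can clear the threshold.
    print(allow({"mfa": 1.0, "device_patched": 1.0, "usual_location": 1.0}))
```

With these weights, network locality alone contributes at most 0.2, so it can never satisfy the policy by itself, which mirrors the third assertion in the list.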

Traditional network security architecture breaks different networks (or pieces of a single network) into zones, contained by one or more firewalls. Each zone is granted some level of trust, which determines the network resources it is permitted to reach. This model provides very strong defense-in-depth.
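For contrast with zero trust, the zone model can be sketched as a lookup table in which access depends only on the source zone. The zone names and reachability rules below are hypothetical and chosen purely for illustration.

```python
from typing import Dict, Set

# Hypothetical firewall rules: source zone -> zones it is permitted to reach.
ZONE_REACH: Dict[str, Set[str]] = {
    "internet": {"dmz"},
    "dmz": {"dmz", "app"},
    "app": {"app", "database"},
    "database": {"database"},
}


def zone_allows(src_zone: str, dst_zone: str) -> bool:
    """The decision depends only on zones, not on the user or the device."""
    return dst_zone in ZONE_REACH.get(src_zone, set())


def reachable_by_pivoting(start_zone: str) -> Set[str]:
    """Zones an attacker could reach by compromising one host per hop."""
    seen: Set[str] = set()
    frontier = [start_zone]
    while frontier:
        zone = frontier.pop()
        for nxt in ZONE_REACH.get(zone, set()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    print(zone_allows("internet", "database"))
    print(sorted(reachable_by_pivoting("internet")))
```

A direct connection from the internet to the database zone is blocked, yet once a host inside a zone is compromised, that zone's trust is inherited. This is the lateral movement seen in the breach examples earlier.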


<center><img src="https://github.com/moaldeen/trustzone/blob/main/zero-trust-vs-trust-based-network-shadow.png?raw=true" alt="ml-examples.png" width="50%"></center>

### Key Principles of Zero Trust

- Always Verify.
- Use Least-Privilege Access.
- Assume Breach.

<center><img src="https://github.com/moaldeen/trustzone/blob/main/concepts.jpg?raw=true" alt="sup-learning.png" width="40%"></center>

### Core tenets of Zero Trust Architecture

<center><img src="https://image.slidesharecdn.com/zerotrustmodel-220117062537/75/Zero-Trust-Model-10-2048.jpg" alt="sup-learning.png" width="60%"></center>

### Zero Trust policies according to the NIST SP 800-207 standard

<center><img src="https://github.com/moaldeen/trustzone/blob/main/policies.jpg?raw=true" alt="sup-learning.png" width="50%"></center>



### Zero Trust Policies in Practice

<center><img src="https://www.omnibud.com/nist%20logical%20component.png" alt="sup-learning.png" width="60%"></center>




#### Input features $X$ and target $y$


Download the SMS Spam Collection dataset by running the following code:


In [None]:
cmle.download("data/spam.csv")

Load the dataset and show some samples:

In [None]:
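import pandas as pd
from sklearn.model_selection import train_test_split

# Load the raw file, drop the empty trailing columns, and give the rest readable names
sms_df = pd.read_csv("data/spam.csv", encoding="latin-1")
sms_df = sms_df.drop(columns=["Unnamed: 2", "Unnamed: 3", "Unnamed: 4"])
sms_df = sms_df.rename(columns={"v1": "target", "v2": "sms"})
# Hold out 10% of the messages as unseen test data
train_df, test_df = train_test_split(sms_df, test_size=0.10, random_state=42)
train_df.head().style.set_properties(**{"text-align": "left"})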

| | target | sms |
|---|---|---|
| 3130 | spam | LookAtMe!: Thanks for your purchase of a video clip from LookAtMe!, you've been charged 35p. Think you can do better? Why not send a video in a MMSto 32323. |
| 106 | ham | Aight, I'll hit you up when I get some cash |
| 4697 | ham | Don no da:)whats you plan? |
| 856 | ham | Going to take your babe out ? |
| 3454 | ham | No need lar. Jus testing e phone card. Dunno network not gd i thk. Me waiting 4 my sis 2 finish bathing so i can bathe. Dun disturb u liao u cleaning ur room. |


#### Training a supervised machine learning model with $X$ and $y$

In [None]:
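from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

X_train, y_train = train_df["sms"], train_df["target"]
X_test, y_test = test_df["sms"], test_df["target"]

# Bag-of-words features followed by a logistic regression classifier
clf = Pipeline([
    ("vect", CountVectorizer(max_features=5000)),
    ("clf", LogisticRegression(max_iter=5000)),
])
clf.fit(X_train, y_train)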

#### Predicting labels for unseen data using the trained model

In [None]:
pd.DataFrame(X_test.iloc[0:4]).style.set_properties(**{"text-align": "left"})

| | sms |
|---|---|
| 3245 | Funny fact Nobody teaches volcanoes 2 erupt, tsunamis 2 arise, hurricanes 2 sway aroundn no 1 teaches hw 2 choose a wife Natural disasters just happens |
| 944 | I sent my scores to sophas and i had to do secondary application for a few schools. I think if you are thinking of applying, do a research on cost also. Contact joke ogunrinde, her school is one me the less expensive ones |
| 1044 | We know someone who you know that fancies you. Call 09058097218 to find out who. POBox 6, LS15HB 150p |
| 2484 | Only if you promise your getting out as SOON as you can. And you'll text me in the morning to let me know you made it in ok. |


In [None]:
pred_dict = {
    "sms": X_test.iloc[0:4],
    "spam": y_test.iloc[0:4],  # actual spam
    "spam_predictions": clf.predict(X_test.iloc[0:4]),
}
pred_df = pd.DataFrame(pred_dict)
pred_df.style.set_properties(**{"text-align": "left"})

| | sms | spam | spam_predictions |
|---|---|---|---|
| 3245 | Funny fact Nobody teaches volcanoes 2 erupt, tsunamis 2 arise, hurricanes 2 sway aroundn no 1 teaches hw 2 choose a wife Natural disasters just happens | ham | ham |
| 944 | I sent my scores to sophas and i had to do secondary application for a few schools. I think if you are thinking of applying, do a research on cost also. Contact joke ogunrinde, her school is one me the less expensive ones | ham | ham |
| 1044 | We know someone who you know that fancies you. Call 09058097218 to find out who. POBox 6, LS15HB 150p | spam | spam |
| 2484 | Only if you promise your getting out as SOON as you can. And you'll text me in the morning to let me know you made it in ok. | ham | ham |


**We have accurately predicted labels for the unseen text messages above!**

### (Supervised) Machine learning: a popular definition
<blockquote>
A field of study that gives computers the ability to learn without being explicitly programmed. <br> -- Arthur Samuel (1959)
</blockquote>

ML is a different way to think about problem-solving.

<!-- ![traditional-programming-vs-ml.png](img/traditional-programming-vs-ml.png) -->
<center><img src="https://yongkaw.people.clemson.edu/ece4420/img/traditional-programming-vs-ml.png" alt="traditional-programming-vs-ml.png" width="70%"></center>

### Examples

Let's look at some concrete examples of supervised machine learning.

#### Example 1: Predicting whether a patient has a liver disease or not

##### Input data

Suppose we are interested in predicting whether a patient has the disease or not. We are given some tabular data with inputs and outputs of liver patients, as shown below. The data contains a number of input features and a special column called "Target" which is the output we are interested in predicting.

Download the data from [here](https://www.kaggle.com/uciml/indian-liver-patient-records).

In [None]:
cmle.download("data/indian_liver_patient.csv")

In [None]:
df = pd.read_csv("data/indian_liver_patient.csv")
df = df.drop(columns=["Gender"])
df["Dataset"] = df["Dataset"].replace(1, "Disease")
df["Dataset"] = df["Dataset"].replace(2, "No Disease")
df.rename(columns={"Dataset": "Target"}, inplace=True)
train_df, test_df = train_test_split(df, test_size=4, random_state=42)
train_df.head()

| | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Target |
|---|---|---|---|---|---|---|---|---|---|---|
| 268 | 40 | 14.5 | 6.4 | 358 | 50 | 75 | 5.7 | 2.1 | 0.5 | Disease |
| 356 | 33 | 0.7 | 0.2 | 256 | 21 | 30 | 8.5 | 3.9 | 0.8 | Disease |
| 110 | 24 | 0.7 | 0.2 | 188 | 11 | 10 | 5.5 | 2.3 | 0.71 | No Disease |
| 488 | 60 | 0.7 | 0.2 | 171 | 31 | 26 | 7.0 | 3.5 | 1.0 | No Disease |
| 132 | 18 | 0.8 | 0.2 | 199 | 34 | 31 | 6.5 | 3.5 | 1.16 | No Disease |


##### Building a supervised machine learning model

Let's train a supervised machine learning model with the input and output above.

In [None]:
from lightgbm import LGBMClassifier

# Separate the input features from the column we want to predict
X_train = train_df.drop(columns=["Target"])
y_train = train_df["Target"]
X_test = test_df.drop(columns=["Target"])
y_test = test_df["Target"]
model = LGBMClassifier(random_state=123)
model.fit(X_train, y_train)

##### Model predictions on unseen data

- Given the features of new patients, we'll use this model to predict whether these patients have liver disease or not.

In [None]:
pred_df = pd.DataFrame({"Predicted_target": model.predict(X_test).tolist()})

df_concat = pd.concat([X_test.reset_index(drop=True), pred_df], axis=1)
df_concat

| | Age | Total_Bilirubin | Direct_Bilirubin | Alkaline_Phosphotase | Alamine_Aminotransferase | Aspartate_Aminotransferase | Total_Protiens | Albumin | Albumin_and_Globulin_Ratio | Predicted_target |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 19 | 1.4 | 0.8 | 178 | 13 | 26 | 8.0 | 4.6 | 1.3 | No Disease |
| 1 | 12 | 1.0 | 0.2 | 719 | 157 | 108 | 7.2 | 3.7 | 1.0 | Disease |
| 2 | 60 | 5.7 | 2.8 | 214 | 412 | 850 | 7.3 | 3.2 | 0.78 | Disease |
| 3 | 42 | 0.5 | 0.1 | 162 | 155 | 108 | 8.1 | 4.0 | 0.9 | Disease |


#### Example 2: Predicting the label of a given image

Suppose you want to predict the label of a given image using supervised machine learning. We are using a pre-trained model here to predict labels of new unseen images.
<!-- ![monkey.jpg](img/monkey.jpg) -->
<center><img src="https://yongkaw.people.clemson.edu/ece4420/img/monkey.jpg" alt="monkey.jpg" width="25%"></center>

<!-- ![cat.jpg](img/cat.jpg) -->
<center><img src="https://yongkaw.people.clemson.edu/ece4420/img/cat.jpg" alt="cat.jpg" width="45%"></center>

Demo: [Image Recognize](https://imagerecognize.com/#uploaded-img)

- If you know deep learning and a framework such as PyTorch or TensorFlow, it is easy to build a similar recognition system.
- If you are interested in deep learning, take **ECE 8550** offered every spring.

#### Example 3: Predicting sentiment expressed in a movie review

Suppose you are interested in predicting whether a given movie review is positive or negative. You can do it using supervised machine learning.

Download the data by running the following cell.

In [None]:
cmle.download("data/imdb_master.csv")
imdb_df = pd.read_csv("data/imdb_master.csv", encoding="ISO-8859-1")
imdb_df = imdb_df[imdb_df["label"].str.startswith(("pos", "neg"))]
imdb_df = imdb_df.drop(columns=["Unnamed: 0", "type", "file"])
imdb_df.rename(columns={"label": "target"}, inplace=True)
train_df, test_df = train_test_split(imdb_df, test_size=0.10, random_state=123)
train_df.head(1).style.set_properties(**{"text-align": "left"})

Unnamed: 0,review,target
17812,"It may have been inevitable that with the onslaught of ""slasher"" movies in the early 1980's, that a few good ones might slip through the cracks. This is a great ""rare"" film from Jeff Lieberman, who insured his cult status with his memorable 1970's films ""Squirm"" and ""Blue Sunshine"". Five young people head into the Oregon mountains (this movie was actually shot on location) to do some camping and check out the deed to some land that one of them has acquired. Before long, they will predictably be terrorized by a bulky killer with an incredibly creepy wheezing laugh. ""Just Before Dawn"" is noticeably more ambitious, ""arty"", and intelligent than some slasher films. Lieberman actually fleshes out the characters - well, two of them, anyway - as much as a 90-minute-long film will allow him. The film has genuine moments of suspense and tension, and actually refrains from graphic gore, save for one killing right at the beginning. There is an above-average cast here, including Oscar winner George Kennedy, as a forest ranger who's understandably gone a little flaky from having been alone in the wilderness for too long. Jack Lemmon's son Chris, future Brian De Palma regular Gregg Henry, blonde lead Deborah Benson (it's too bad she hasn't become a more well-known performer, judging by her work here), Ralph Seymour (""Ghoulies""), Mike Kellin (""Sleepaway Camp""), and Jamie Rose (""Chopper Chicks in Zombietown"") round out the cast. Some of the shots are interesting, and the early music score by Brad Feidel (now best known for his ""Terminator"" theme) is haunting and atmospheric. This is worth catching for the important plot twist at about the one hour mark, although a moment at about 75 minutes in involving the heroine and a tree and the killer is almost comical; it may actually remind a viewer of a cartoon! One of the most clever touches is the final dispatching of the killer, which I'd never seen before in a horror film and probably won't see again. 
I didn't give it 10 out of 10 because I can't honestly that I was that frightened. Still, it's an interesting slasher that is worthy of re-discovery. ""That deed don't mean nothing, son. Those mountains can't read."" 9/10",pos


In [None]:
# Build an ML model
X_train, y_train = train_df["review"], train_df["target"]
X_test, y_test = test_df["review"], test_df["target"]

clf = Pipeline([
    ("vect", CountVectorizer(max_features=1000)),
    ("clf", LogisticRegression(max_iter=1000)),
])
clf.fit(X_train, y_train)

In [None]:
# Predict on unseen data using the built model
pred_dict = {
    "reviews": X_test.iloc[0:3],
    "sentiment_predictions": clf.predict(X_test.iloc[0:3]),
}
pred_df = pd.DataFrame(pred_dict)
pred_df.style.set_properties(**{"text-align": "left"})

Unnamed: 0,reviews,sentiment_predictions
11872,"You'll feel like you've experienced a vacation in Hell after you have sat down and watched this horrible TV movie. This movie is an exercise in over-acting (very bad over-acting) to situations that made out to be more than what they are. I won't give away the plot, but once you realize why the people in this film are running from the native man in the film you will demand the two wasted hours of your life back. The only plus is seeing Marcia Brady running around in a bikini!",neg
40828,"Bela Lugosi gets to play one of his rare good guy roles in a serial based upon the long running radio hit (which was also the source of a feature film where Lugosi played the villain.) Lugosi cuts a fine dashing figure and its sad that he didn't get more roles where he could be the guy in command in a good way. Here Chandu returns from the East in order to help the Princess Nadji who is being hunted by the leaders of the cult of Ubasti who need her to bring back from the dead the high priestess of their cult. This is a good looking globe trotting serial that is a great deal of fun. To be certain the pacing is a bit slack, more akin to one of Principals (the producing studios) features then a rip roaring adventure, but it's still enjoyable. This plays better than the two feature films that were cut from it because it allows for things to happen at their own pace instead of feeling rushed or having a sense that ""hey I missed something"". One of the trilogy of three good serials Lugosi made, the others being SOS Coast Guard and Phantom Creeps",pos
36400,"When you wish for the dragon to eat every cast member, you know you're in for a bad ride. I went in with very, very low expectations, having read some of the other comments, and was not let down. Unlike some other cheap and failed movies, however, this one doesn't really remain hilariously (and unintentionally) funny throughout. -SPOILERS FOLLOW- First of all, plot it very inconsistent. Looking past the ""small"" mistakes, such as the dragon growing up in 3 hours, the whole idea it's based on is messed up. See, the movie wants us to believe that dragons came from outer space in the form of meteorites which really were dragon eggs. After explaining this, they show some peasant poking at one with his pitchfork and the dragon pops out. Later, the obligatory ""crazy scientist"" guy babbles on about how dragons outlived the dinosaurs. So apparently humans were around when dinosaurs were, or we just have a fine little plot hole here. The other major thing is that the lab is blown up with a force ""half as strong"" as what was used for Hiroshima. Then two guys later walk in to check everything out, and it's almost unscathed! There's even another dragon, which grew out of who knows what. All in all it's very predictable. As soon as the guy mentioned cloning, I guessed they'd clone a dragon. That means that our Mr. Smarty-pants security guy isn't so intuitive and smart as the movie would have you believe, if you ignore that I knew this film would be about, you know, dragons. Putting that aside, the second worst thing is the ""special effects."" Others have mentioned the fake rocks falling during the beginning, the CG helicopter, and the dragon. It looks a bit better than a blob, but it ruined whatever it had going for it when it trudged down the hall in the same manner time after time. To their credit, the flying dragons in the beginning looked OK from far away (although the one in the cave is probably the worst one in the whole movie.) 
These things are funny to watch, however. The scenes where a million different shots of the same person facing different ways are shown are not. Nor are the ""introduction"" screens with the vital stats. Coming to the actors, they weren't the greatest, but I guess at least they tried? They seemed more enthusiastic about what they were doing than many of the actors participating in the recent ""BloodRayne,"" for example, and you've got to give them points for that. One thing I noticed though was that the woman who plays Meredith often had her face covered in make-up that was many tones lighter than the rest of her. She looked like she had a bad run-in with some white-face. The script is bad and cheesy. You don't really notice the music, but it's actually not too bad for the most part. The bottom line is don't watch it unless you want to see it because you hear it's bad (like I did), although the only funny things are the bad CG effects. Other than that, don't waste your time and money.",neg


#### Example 4: Predicting housing prices

Suppose we want to predict housing prices given a number of attributes associated with houses.

Download the data from [here](https://www.kaggle.com/harlfoxem/housesalesprediction) or run the following cell.

In [None]:
cmle.download("data/kc_house_data.csv")
df = pd.read_csv("data/kc_house_data.csv")
df = df.drop(columns=["id", "date"])
df.rename(columns={"price": "target"}, inplace=True)
train_df, test_df = train_test_split(df, test_size=0.2, random_state=4)
train_df.head()

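| | target | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 8583 | 509000.0 | 2 | 1.5 | 1930 | 3521 | 2.0 | 0 | 0 | 3 | 8 | 1930 | 0 | 1989 | 0 | 98007 | 47.6092 | -122.146 | 1840 | 3576 |
| 19257 | 675000.0 | 5 | 2.75 | 2570 | 12906 | 2.0 | 0 | 0 | 3 | 8 | 2570 | 0 | 1987 | 0 | 98075 | 47.5814 | -122.05 | 2580 | 12927 |
| 1295 | 420000.0 | 3 | 1.0 | 1150 | 5120 | 1.0 | 0 | 0 | 4 | 6 | 800 | 350 | 1946 | 0 | 98116 | 47.5588 | -122.392 | 1220 | 5120 |
| 15670 | 680000.0 | 8 | 2.75 | 2530 | 4800 | 2.0 | 0 | 0 | 4 | 7 | 1390 | 1140 | 1901 | 0 | 98112 | 47.6241 | -122.305 | 1540 | 4800 |
| 3913 | 357823.0 | 3 | 1.5 | 1240 | 9196 | 1.0 | 0 | 0 | 3 | 8 | 1240 | 0 | 1968 | 0 | 98072 | 47.7562 | -122.094 | 1690 | 10800 |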


In [None]:
# Build a regression model

X_train, y_train = train_df.drop(columns=["target"]), train_df["target"]
X_test, y_test = test_df.drop(columns=["target"]), test_df["target"]

from xgboost import XGBRegressor

model = XGBRegressor()
model.fit(X_train, y_train)

In [None]:
# Predict on unseen examples using the built model
pred_df = pd.DataFrame(
    # {"Predicted target": model.predict(X_test[0:4]).tolist(), "Actual price": y_test[0:4].tolist()}
    {"Predicted_target": model.predict(X_test[0:4]).tolist()})
df_concat = pd.concat([pred_df, X_test[0:4].reset_index(drop=True)], axis=1)
df_concat

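| | Predicted_target | bedrooms | bathrooms | sqft_living | sqft_lot | floors | waterfront | view | condition | grade | sqft_above | sqft_basement | yr_built | yr_renovated | zipcode | lat | long | sqft_living15 | sqft_lot15 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 333981.625 | 4 | 2.25 | 2130 | 8078 | 1.0 | 0 | 0 | 4 | 7 | 1380 | 750 | 1977 | 0 | 98055 | 47.4482 | -122.209 | 2300 | 8112 |
| 1 | 615222.4375 | 3 | 2.5 | 2210 | 7620 | 2.0 | 0 | 0 | 3 | 8 | 2210 | 0 | 1994 | 0 | 98052 | 47.6938 | -122.13 | 1920 | 7440 |
| 2 | 329770.0625 | 4 | 1.5 | 1800 | 9576 | 1.0 | 0 | 0 | 4 | 7 | 1800 | 0 | 1977 | 0 | 98045 | 47.4664 | -121.747 | 1370 | 9576 |
| 3 | 565091.625 | 3 | 2.5 | 1580 | 1321 | 2.0 | 0 | 2 | 3 | 8 | 1080 | 500 | 2014 | 0 | 98107 | 47.6688 | -122.402 | 1530 | 1357 |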


To summarize, supervised machine learning can be used on a variety of problems and different kinds of data.

### Machine learning workflow

Here is a typical workflow of a supervised machine learning system.

We will build machine learning pipelines in this course, focusing on some of the steps below.

<!-- ![ml-workflow.png](img/ml-workflow.png) -->
<center><img src="https://yongkaw.people.clemson.edu/ece4420/img/ml-workflow.png" alt="ml-workflow.png" width="65%"></center>

## Summary

- Machine learning is a different paradigm for problem-solving.
- Very often it reduces the time you spend programming and helps you customize and scale your products.
- In supervised learning, we are given a set of observations ($X$) and their corresponding targets ($y$) and we wish to find a model function $f$ that relates $X$ to $y$.
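
As a small worked illustration of finding a function $f$ that relates $X$ to $y$, the sketch below fits a straight line to four made-up points using ordinary least squares in plain Python. The data and the helper name `fit_line` are invented for this example; the models used earlier in this section are far more flexible, but the idea of learning $f$ from $(X, y)$ pairs is the same.

```python
from typing import Callable, Sequence


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Callable[[float], float]:
    """Return f(x) = slope * x + intercept fitted by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Toy observations that follow y = 2x + 1 exactly.
X = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]

f = fit_line(X, y)

if __name__ == "__main__":
    # Predict the target for an input that was not in the training data.
    print(f(5.0))
```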


## Recommended reading materials

- [Machine Learning (Wikipedia)](https://en.wikipedia.org/wiki/Machine_learning)
- [A Gentle Introduction to Machine Learning](https://youtu.be/Gv9_4yMHFhI)
- [What is Machine Learning (YouTube)](https://www.youtube.com/watch?v=HcqpanDadyQ)
- [The 7 steps of machine learning (YouTube)](https://www.youtube.com/watch?v=nKW8Ndu7Mjw)

<hr>