## Lesson Objectives
By the end of this lesson, you will be able to:

* Understand the fundamentals of the __perceptron__ algorithm and basic __neural network__ architecture
* Apply __PyTorch__ for practical deep learning tasks
* Apply __Hugging Face__ for practical deep learning tasks
* Use __transfer learning__ to leverage pre-trained models for a variety of machine learning tasks


__Technical Terms Explained:__

`Perceptron:` A basic computational model in machine learning that makes decisions by weighing input data. It's like a mini-decision maker that labels data as one thing or another.

`Binary Classifier:` A type of system that categorizes data into one of two groups. Picture a light switch that can be flipped to either on or off.

`Vector of Numbers:` A sequence of numbers arranged in order, which together represent one piece of data.

`Activation Function:` A mathematical function applied to the perceptron's weighted sum of inputs. It decides whether that sum is large enough to trigger a positive or a negative output.
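
The perceptron, vector-of-numbers, and activation-function entries above can be tied together in a few lines. This is a minimal sketch, not a library implementation: the step activation, the function names, and the hand-picked weights (chosen so the perceptron behaves like a logical AND) are all assumptions made for illustration.

```python
def step(weighted_sum):
    """Step activation: output 1 when the weighted sum is non-negative, else 0."""
    return 1 if weighted_sum >= 0 else 0


def perceptron(inputs, weights, bias):
    """Weigh each input, add the bias, and pass the sum through the activation."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return step(weighted_sum)


# Hypothetical weights that make this binary classifier act like an AND gate.
and_weights, and_bias = [1.0, 1.0], -1.5

for pair in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(pair, "->", perceptron(pair, and_weights, and_bias))
```

Only the input `(1, 1)` pushes the weighted sum above zero, so it is the only one labeled 1.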

`Multi-Layer Perceptron (MLP):` A type of artificial neural network that has multiple layers of nodes, each layer learning to recognize increasingly complex features of the input data.

`Input Layer:` The first layer in an MLP where the raw data is initially received.

`Output Layer:` The last layer in an MLP that produces the final result or prediction of the network.

`Hidden Layers:` Layers between the input and output that perform complex data transformations.

`Labeled Dataset:` This is a collection of data where each piece of information comes with a correct answer or label. It's like a quiz with the questions and answers already provided.

`Gradient Descent:` This method helps find the best settings for a neural network by slowly tweaking them to reduce errors, similar to finding the lowest point in a valley.

`Cost Function:` Imagine it as a score that tells you how wrong your network's predictions are. The goal is to make this score as low as possible.

`Learning Rate:` This hyperparameter specifies how big the steps are when adjusting the neural network's settings during training. Too big, and you might skip over the best setting; too small, and it'll take a very long time to get there.
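
To make gradient descent, the cost function, and the learning rate concrete, here is a small sketch on a one-parameter problem. The cost `(w - 3)^2`, the starting point, and the three learning rates are assumptions picked for illustration; real networks have many parameters and a far more complicated cost surface.

```python
def cost(w):
    """Toy cost function whose lowest point is at w = 3."""
    return (w - 3.0) ** 2


def cost_gradient(w):
    """Derivative of the toy cost with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(learning_rate, steps=50, w=0.0):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w)
    return w


w_good = gradient_descent(learning_rate=0.1)    # settles very close to 3
w_slow = gradient_descent(learning_rate=0.001)  # still far from 3 after 50 steps
w_big = gradient_descent(learning_rate=1.1)     # overshoots more on every step

print(w_good, w_slow, w_big)
```

The same number of steps gives three very different outcomes, which is why the learning rate is usually one of the first hyperparameters to tune.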

`Backpropagation:` Short for backward propagation of errors. This is like a feedback system that tells each part of the neural network how much it contributed to any mistakes, so it can learn and do better next time.
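
The feedback idea behind backpropagation can be sketched on a network with just two weights. Everything here is an illustrative assumption: the tiny architecture (one hidden unit with a ReLU activation), the squared-error loss, and the input values. The backward pass applies the chain rule from the loss back to each weight, and a finite-difference estimate is used as a sanity check.

```python
def forward(x, w1, w2, target):
    """Forward pass: input -> hidden unit -> ReLU -> prediction -> loss."""
    hidden = w1 * x
    activated = max(0.0, hidden)
    prediction = w2 * activated
    loss = (prediction - target) ** 2
    return hidden, activated, prediction, loss


def backward(x, w1, w2, target):
    """Backward pass: push the error back through each step via the chain rule."""
    hidden, activated, prediction, _ = forward(x, w1, w2, target)
    d_prediction = 2.0 * (prediction - target)             # dLoss/dPrediction
    d_w2 = d_prediction * activated                        # blame assigned to w2
    d_activated = d_prediction * w2
    d_hidden = d_activated * (1.0 if hidden > 0 else 0.0)  # ReLU passes or blocks
    d_w1 = d_hidden * x                                    # blame assigned to w1
    return d_w1, d_w2


x, w1, w2, target = 2.0, 0.5, -1.5, 1.0
grad_w1, grad_w2 = backward(x, w1, w2, target)

# Sanity check: estimate dLoss/dw1 numerically with a small nudge to w1.
eps = 1e-6
numeric_w1 = (forward(x, w1 + eps, w2, target)[3]
              - forward(x, w1 - eps, w2, target)[3]) / (2 * eps)

print(grad_w1, grad_w2, numeric_w1)
```

Frameworks such as PyTorch compute these gradients automatically, so this by-hand version is only meant to show what is happening underneath.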

`Loss functions:` They measure how well a model is performing by calculating the difference between the model's predictions and the actual results.

`Cross entropy loss:` This is a measure used when a model needs to choose between categories (like whether an image shows a cat or a dog), and it shows how well the model's predictions align with the actual categories.

`Mean squared error:` This shows the average of the squares of the differences between predicted numbers (like a predicted price) and the actual numbers. It's often used for predicting continuous values rather than categories.
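
The two loss functions above can be written out directly. This is a simplified sketch: the function names are made up for this example, and the cross-entropy version assumes the model already outputs a probability for each category and that there is a single correct category.

```python
import math


def mean_squared_error(predictions, targets):
    """Average of the squared differences between predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def cross_entropy(predicted_probs, true_index):
    """Negative log of the probability the model gave to the correct category."""
    return -math.log(predicted_probs[true_index])


# Regression-style example: predicted prices vs. actual prices.
print(mean_squared_error([2.0, 4.0], [1.0, 6.0]))

# Classification-style example: index 0 is "cat", index 1 is "dog"; truth is "cat".
confident = cross_entropy([0.9, 0.1], true_index=0)
unsure = cross_entropy([0.6, 0.4], true_index=0)
print(confident, unsure)
```

A confident correct prediction yields a lower cross-entropy loss than an unsure one, which is exactly the behavior training tries to encourage.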

`Gradients:` The direction and rate of steepest increase of a function with respect to its parameters. Moving the parameters in the direction opposite to the gradient of the loss function reduces the loss.

`Momentum:` A technique that helps accelerate the optimizer in the right direction and dampens oscillations.
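
One common way to write momentum is to keep a running "velocity" that accumulates past gradients. The sketch below assumes that formulation, a toy cost of `0.5 * w^2` (whose gradient is simply `w`), and hand-picked values for the learning rate and momentum coefficient; libraries differ in the exact details.

```python
def gradient(w):
    """Gradient of the toy cost 0.5 * w**2."""
    return w


def run_sgd(momentum, learning_rate=0.01, steps=100, w=1.0):
    """Gradient descent with an optional velocity term (momentum=0 disables it)."""
    velocity = 0.0
    for _ in range(steps):
        velocity = momentum * velocity + gradient(w)  # blend in past gradients
        w -= learning_rate * velocity
    return w


w_plain = run_sgd(momentum=0.0)
w_momentum = run_sgd(momentum=0.9)

print(abs(w_plain), abs(w_momentum))
```

With the same learning rate and the same number of steps, the run with momentum ends much closer to the minimum at `w = 0`.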

`PyTorch Dataset class:` This is like a recipe that tells your computer how to get the data it needs to learn from, including where to find it and how to parse it, if necessary.

`PyTorch Data Loader:` Think of this as a delivery truck that brings the data to your AI in small, manageable loads called batches; this makes it easier for the AI to process and learn from the data.

`Batches:` Batches are small, evenly divided parts of data that the AI looks at and learns from each step of the way.

`Shuffle:` It means mixing up the data so that it's not in the same order every time, which helps the AI learn better.
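
The roles of the dataset, the data loader, batches, and shuffling can be imitated in plain Python. Note that this is a library-free stand-in, not PyTorch code: `ToyDataset` and `iterate_batches` are hypothetical names, playing the parts that `torch.utils.data.Dataset` and `torch.utils.data.DataLoader` play in PyTorch.

```python
import random


class ToyDataset:
    """Stand-in for a dataset: knows how many items it has and how to fetch one."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, index):
        return self.features[index], self.labels[index]


def iterate_batches(dataset, batch_size, shuffle=False, seed=None):
    """Stand-in for a data loader: yield the dataset in batches, optionally shuffled."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


dataset = ToyDataset(
    features=[[i, i * 2] for i in range(10)],
    labels=[i % 2 for i in range(10)],
)
batches = list(iterate_batches(dataset, batch_size=4, shuffle=True, seed=0))
print([len(batch) for batch in batches])
```

With 10 items and a batch size of 4, the last batch holds only the 2 leftover items, so batches are not always perfectly even.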

`Training Loop:` The cycle that a neural network goes through many times to learn from the data by making predictions, checking errors, and improving itself.

`Epochs:` A complete pass through the entire training dataset. The more epochs, the more the computer goes over the material to learn.
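
The training loop and epochs can be sketched without any framework. The example below is an assumption-laden toy: it fits a straight line to five made-up points generated from `y = 2x + 1`, uses a batch size of one, and applies the gradient of the squared error by hand. A PyTorch loop follows the same rhythm of predict, measure the error, and update.

```python
# Made-up training data generated from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

weight, bias = 0.0, 0.0
learning_rate = 0.05
epoch_losses = []

for epoch in range(100):              # one epoch = one full pass over the data
    total_loss = 0.0
    for x, target in data:            # one example per step (batch size of 1)
        prediction = weight * x + bias        # 1. make a prediction
        error = prediction - target           # 2. check the error
        total_loss += error ** 2
        weight -= learning_rate * 2 * error * x   # 3. improve the parameters
        bias -= learning_rate * 2 * error
    epoch_losses.append(total_loss / len(data))

print(weight, bias)
```

After enough epochs the learned `weight` and `bias` approach 2 and 1, and the average loss per epoch ends far below where it started.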

`Optimizer:` The algorithm that decides how to change the network's parameters at each training step, using the gradients of the loss, so that the network gets better at its job.

`Hugging Face:` A company that builds widely used tools for understanding and using human language in computers. Hugging Face offers tokenizers, which help computers make sense of text, a large variety of ready-to-use language models, and a large collection of datasets suited for language tasks.

`Tokenizers:` These work like a translator, converting the words we use into smaller parts and creating a secret code that computers can understand and work with.

`Models:` These are like the brain for computers, allowing them to learn and make decisions based on information they've been fed.

`Datasets:` Think of datasets as textbooks for computer models. They are collections of information that models study to learn and improve.

`Trainers:` Trainers are the coaches for computer models. They help these models get better at their tasks by practicing and providing guidance. HuggingFace Trainers implement the PyTorch training loop for you, so you can focus instead on other aspects of working on the model.

`Tokenization:` It's like cutting a sentence into individual pieces, such as words or characters, to make it easier to analyze.

`Tokens:` These are the pieces you get after cutting up text during tokenization, kind of like individual Lego blocks that can be words, parts of words, or even single letters. These tokens are converted to numerical values for models to understand.
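
A toy version of tokenization shows how text becomes numbers. This sketch is deliberately simplistic and is not how Hugging Face tokenizers work internally: it splits on whitespace, lowercases everything, and uses a hypothetical vocabulary with `[PAD]` and `[UNK]` entries, whereas real tokenizers typically split words into smaller subword pieces.

```python
def tokenize(text):
    """Cut text into tokens; here, simply lowercased whitespace-separated words."""
    return text.lower().split()


def build_vocab(sentences):
    """Assign each distinct token a numerical ID, reserving 0 and 1 for specials."""
    vocab = {"[PAD]": 0, "[UNK]": 1}
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text, vocab):
    """Convert text to token IDs, mapping unseen tokens to the [UNK] ID."""
    return [vocab.get(token, vocab["[UNK]"]) for token in tokenize(text)]


vocab = build_vocab(["The cat sat", "the dog sat"])
ids = encode("The bird sat", vocab)
print(tokenize("The bird sat"), ids)  # → ['the', 'bird', 'sat'] [2, 1, 4]
```

The word "bird" never appeared when the vocabulary was built, so it maps to the unknown-token ID.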

`Pre-trained Model:` This is a ready-made model that has been previously taught with a lot of data.

`Uncased:` This means that the model treats uppercase and lowercase letters as the same.

`Truncating:` This refers to shortening longer pieces of text to fit a certain size limit.

`Padding:` Adding extra data to shorter texts to reach a uniform length for processing.
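
Truncating and padding usually happen together so that every sequence in a batch has the same length. The helper below is a hypothetical sketch: the function name, the padding ID of 0, and the accompanying mask (1 for real tokens, 0 for padding) are assumptions chosen for illustration.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Cut sequences that are too long and pad those that are too short."""
    truncated = token_ids[:max_length]
    padding_needed = max_length - len(truncated)
    padded = truncated + [pad_id] * padding_needed
    mask = [1] * len(truncated) + [0] * padding_needed  # 1 = real token, 0 = padding
    return padded, mask


short_ids, short_mask = pad_or_truncate([7, 8, 9], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5)

print(short_ids, short_mask)  # → [7, 8, 9, 0, 0] [1, 1, 1, 0, 0]
print(long_ids, long_mask)    # → [1, 2, 3, 4, 5] [1, 1, 1, 1, 1]
```

The mask lets a model ignore the padding positions so they do not influence its predictions.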

`Batch Size:` The number of data samples that the machine considers in one go during training.

`Dataset Splits:` Dividing the dataset into parts for different uses, such as training the model and testing how well it works.
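
A dataset split can be as simple as shuffling the examples and cutting the list in two. The sketch below assumes an 80/20 train/test split and a fixed random seed for reproducibility; the function name and the fraction are illustrative choices, not a fixed rule.

```python
import random


def split_dataset(examples, train_fraction=0.8, seed=0):
    """Shuffle a copy of the examples, then cut it into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


examples = list(range(10))
train_set, test_set = split_dataset(examples)
print(len(train_set), len(test_set))  # → 8 2
```

Keeping the test examples out of training is what makes the test score a fair estimate of how well the model handles unseen data.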

`Transfer learning:` The process where knowledge from a pre-trained model is applied to a new, but related task.

`Foundation Model:` A large AI model trained on a wide variety of data, which can do many tasks without much extra training.

`Adapted:` Modified or adjusted to suit new conditions or a new purpose. In the context of foundation models, this means adjusting a general-purpose model so that it works well for a specific task.

`Generalize:` The ability of a model to apply what it has learned from its training data to new, unseen data.

Foundation Models and Traditional Models are two distinct approaches in the field of artificial intelligence with different strengths. Foundation Models, which are built on large, diverse datasets, have the incredible ability to adapt and perform well on many different tasks. In contrast, Traditional Models specialize in specific tasks by learning from smaller, focused datasets, making them more straightforward and efficient for targeted applications.

![image.png](attachment:a5a3bc80-2ea2-4659-879e-4978a8e88852.png)

`Sequential data:` Information that is arranged in a specific order, such as words in a sentence or events in time.

`Self-attention mechanism:` The self-attention mechanism in a transformer is a process where each element in a sequence computes its representation by attending to and weighing the importance of all elements in the sequence, allowing the model to capture complex relationships and dependencies.
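
The weighing described above can be sketched with scaled dot-product attention on a handful of small vectors. This is a stripped-down illustration under a loud assumption: each element's vector is used directly as its query, key, and value, whereas a real transformer first passes the inputs through learned projection matrices and uses several attention heads.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Each element attends to every element and takes a weighted average of them."""
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Similarity of this element to every element, scaled by sqrt(dimension).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # New representation: importance-weighted mix of all elements.
        output = [
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-dimensional embeddings
outputs, attention_weights = self_attention(sequence)
for row in attention_weights:
    print([round(weight, 3) for weight in row])
```

Each row of weights shows how much one element of the sequence draws on every element, including itself, when building its new representation.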

`Semantic Equivalence:` When different phrases or sentences convey the same meaning or idea.

`Textual Entailment:` The relationship between text fragments where one fragment follows logically from the other.

The GLUE (General Language Understanding Evaluation) benchmark is a widely used tool for assessing an AI model's grasp of human language. It covers diverse tasks, from grammar checking to complex sentence relationship analysis. By putting AI models through these varied linguistic challenges, we can gauge their readiness for real-world tasks and uncover potential weaknesses.



![image.png](attachment:59a5247a-b0fa-4263-b618-8df18796a50b.png) ![image.png](attachment:d93234dd-9102-4244-966c-f8cc808d0639.png)

`Preprocessing:` This is the process of preparing and cleaning data before it is used to train a machine learning model. It might involve removing errors, irrelevant information, or formatting the data in a way that the model can easily learn from it.

`Fine-tuning:` After a model has been pre-trained on a large dataset, fine-tuning is an additional training step where the model is further refined with specific data to improve its performance on a particular type of task.

![image.png](attachment:6e7d111f-98f0-48e3-ae18-309533c32895.png)

`Gigabytes/Terabytes:` Units of digital information storage. One gigabyte (GB) is about 1 billion bytes, and one terabyte (TB) is about 1,000 gigabytes. In terms of text, a single gigabyte can hold roughly 1,000 books.

`Common Crawl:` An open repository of web crawl data. Essentially, it is a large collection of content from the internet that is gathered by automatically scraping the web.

Biases in training data deeply influence the outcomes of AI models, reflecting societal issues that require attention. Ways to approach this challenge include promoting diversity in development teams, seeking diverse data sources, and ensuring continued vigilance through bias detection and model monitoring.


`Selection Bias:` When the data used to train an AI model does not accurately represent the whole population or situation because of how it was selected. For example, those choosing the data will tend to choose datasets they are already aware of.

`Historical Bias:` Prejudices and societal inequalities of the past that are reflected in the data, influencing the AI in a way that perpetuates these outdated beliefs.

`Confirmation Bias:` The tendency to favor information that confirms pre-existing beliefs, which can affect what data is selected for AI training.

`Discriminatory Outcomes:` Unfair results produced by AI that disadvantage certain groups, often due to biases in the training data or malicious actors.

`Echo Chambers:` Situations where biased AI reinforces and amplifies existing biases, leading to a narrow and distorted sphere of information.

`Bias Detection and Correction:` Processes and algorithms designed to identify and remove biases from data before it's used to train AI models.
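
One very simple form of bias detection is to compare how often each group receives a positive label in the training data. The sketch below is only an illustration of that idea: the records, the group names `"a"` and `"b"`, and the function name are all made up, and a gap between groups is a signal to investigate, not proof of unfairness on its own.

```python
from collections import defaultdict


def positive_rate_by_group(records):
    """Return the share of positive labels (label == 1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for group, label in records:
        counts[group][0] += label
        counts[group][1] += 1
    return {group: positives / total for group, (positives, total) in counts.items()}


# Made-up (group, label) pairs standing in for a labeled dataset.
records = [
    ("a", 1), ("a", 1), ("a", 0), ("a", 1),
    ("b", 0), ("b", 1), ("b", 0), ("b", 0),
]
rates = positive_rate_by_group(records)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)  # → {'a': 0.75, 'b': 0.25} 0.5
```

A large gap like this one would prompt a closer look at how the data was collected before training a model on it.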

`Transparency and Accountability:` Openness about how AI models are trained and the nature of their data, ensuring that developers are answerable for their AI's performance and impact.