sams52s/mastering-large-language-models

(AWS) Mastering Large Language Models

An official JetBrains project · License: MIT

Want to know more?

If you have questions about the course or its tasks, or if you find any errors, feel free to ask and join the discussion in the repository's issues.

About

This course gives you a complete, practical foundation in building and working with Large Language Models. You’ll start with core NLP techniques, from tokenization and embeddings to classical classification methods, and understand how these fundamentals connect to modern transformer-based systems.

You’ll learn to fine-tune pre-trained models, integrate retrieval systems for knowledge-grounded generation, and apply parameter-efficient methods like LoRA and PEFT. The course also covers vector databases and RAG systems.

By the end, you’ll have a deep, hands-on understanding of modern LLMs – from how they work internally to how they can be adapted, optimized, and used in real projects.

Content

NLP basics

  • Data exploration and quality assessment
  • Tokenization strategies and implementation
  • Word embeddings (GloVe) and semantic relationships
  • Embedding visualization techniques
  • K-nearest neighbors classification
  • Bag-of-words feature extraction
  • Naive Bayes classifier
  • Logistic Regression with word counts
  • Logistic Regression with embeddings
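
To give a flavor of the classical pipeline, here is a minimal bag-of-words Naive Bayes classifier in plain Python. The toy corpus and whitespace-free token lists are illustrative, not the course's dataset:

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Count class frequencies and per-class word frequencies."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    vocab = set()
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab

def predict_nb(tokens, class_counts, word_counts, vocab):
    """Score each class with log P(c) + sum log P(w|c), add-one smoothed."""
    total_docs = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for c, n_c in class_counts.items():
        total_words = sum(word_counts[c].values())
        score = math.log(n_c / total_docs)
        for w in tokens:
            score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

docs = [["great", "movie"], ["terrible", "plot"], ["great", "acting"], ["boring", "terrible"]]
labels = ["pos", "neg", "pos", "neg"]
model = train_nb(docs, labels)
print(predict_nb(["great", "plot"], *model))  # pos
```

The same bag-of-words counts feed directly into the logistic-regression exercises; swapping counts for averaged GloVe vectors gives the embedding variant.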

Language modeling

  • N-gram language models and probability estimation
  • Text generation algorithms
  • Perplexity evaluation metrics
  • Laplace smoothing techniques
  • Advanced text processing tools
  • Recurrent neural networks (RNNs) for language modeling
  • Loss functions and optimization
  • Training procedures and best practices
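
The n-gram topics above fit in a few lines of Python. This sketch trains a Laplace-smoothed bigram model and scores sentences by perplexity (the two-sentence corpus is purely illustrative):

```python
import math
from collections import Counter

def train_bigram(corpus):
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    vocab = set()
    for sent in corpus:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        unigrams.update(tokens[:-1])          # context counts
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams, vocab

def perplexity(sent, unigrams, bigrams, vocab):
    """Perplexity under add-one (Laplace) smoothed bigram probabilities."""
    tokens = ["<s>"] + sent + ["</s>"]
    log_prob, n = 0.0, 0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
print(perplexity(["the", "cat", "sat"], *model))
```

A sentence built from seen bigrams gets lower perplexity than a scrambled one, which is exactly what the evaluation exercises measure; the RNN material replaces the count table with a learned next-token distribution.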

MinLlama

  • Transformer architecture fundamentals
  • Evaluation of language models for sentiment analysis
  • Rotary Position Embedding (RoPE) implementation
  • Implementing core components of the LLaMA architecture
  • Validating your complete LLaMA model implementation
  • Implementing the AdamW Optimizer for LLaMA training
  • Building a classification head using pre-trained LLaMA representations
  • Deploying LLaMA for text generation, zero-shot classification, and fine-tuning
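
The RoPE idea is compact enough to sketch without tensors: pair j of a query/key vector is rotated by an angle pos · base^(−2j/d), so relative offsets between positions become rotation differences. Note this uses the interleaved-pair convention; Llama implementations differ in how dimensions are paired (interleaved vs. split halves), so treat this as a sketch rather than the course's exact implementation:

```python
import math

def rope(x, pos, base=10000.0):
    """Apply Rotary Position Embedding to one vector x at position pos.

    Consecutive pairs (x[i], x[i+1]) for even i are rotated by
    pos * base ** (-i / d), where d = len(x).
    """
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out.append(x[i] * cos_t - x[i + 1] * sin_t)   # rotated first component
        out.append(x[i] * sin_t + x[i + 1] * cos_t)   # rotated second component
    return out

q = [1.0, 0.0, 1.0, 0.0]
print(rope(q, 0))   # position 0: identity rotation
print(rope(q, 5))   # position 5: each pair rotated by its own angle
```

Because rotations preserve vector norms, RoPE changes only the phase information that dot-product attention sees, not the magnitudes.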

Fine-tuning

  • Introduction to fine-tuning concepts
  • Prompt engineering and optimization
  • Dataset preparation and formatting
  • Training utilities and helper functions
  • Grid search and hyperparameter optimization
  • Parameter-efficient fine-tuning (PEFT) techniques
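
The grid-search step can be sketched independently of any model: enumerate every hyperparameter combination and keep the best validation score. The `fake_train` function below is a stand-in with made-up numbers, not a real fine-tuning run:

```python
from itertools import product

def grid_search(train_fn, param_grid):
    """Evaluate every combination in param_grid and return the best one."""
    keys = list(param_grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(param_grid[k] for k in keys)):
        cfg = dict(zip(keys, values))
        score = train_fn(**cfg)          # validation score for this config
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Stand-in for a real fine-tuning run: pretend validation quality peaks
# at lr=3e-4 with LoRA rank r=8 (purely illustrative numbers).
def fake_train(lr, r):
    return -abs(lr - 3e-4) * 1000 - abs(r - 8) * 0.01

grid = {"lr": [1e-4, 3e-4, 1e-3], "r": [4, 8, 16]}
best_cfg, _ = grid_search(fake_train, grid)
print(best_cfg)  # {'lr': 0.0003, 'r': 8}
```

In practice `train_fn` would launch a (possibly parameter-efficient) fine-tuning run; the search logic itself stays this simple.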

Retrieval-Augmented Generation (RAG)

  • Dataset for Retrieval-Augmented Generation (RAG)
  • Understanding the RAG pipeline
  • Building triplet datasets for contrastive retrieval
  • Fine-tuning a retrieval bi-encoder with LoRA (Low-Rank Adaptation)
  • Building an in-memory cosine vector store
  • RAG inference
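
The in-memory vector store at the heart of this pipeline needs only brute-force cosine similarity. A minimal dependency-free sketch (the class name and toy three-dimensional embeddings are illustrative; real stores hold encoder outputs):

```python
import math

class CosineVectorStore:
    """Minimal in-memory vector store ranked by cosine similarity."""

    def __init__(self):
        self.items = []  # (text, embedding) pairs

    def add(self, text, embedding):
        self.items.append((text, embedding))

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def search(self, query_embedding, k=1):
        """Return the top-k stored texts most similar to the query."""
        scored = [(self._cosine(query_embedding, emb), text)
                  for text, emb in self.items]
        scored.sort(reverse=True)
        return [text for _, text in scored[:k]]

store = CosineVectorStore()
store.add("doc about cats", [1.0, 0.2, 0.0])
store.add("doc about finance", [0.0, 0.1, 1.0])
print(store.search([0.9, 0.1, 0.1], k=1))  # ['doc about cats']
```

RAG inference then amounts to embedding the user query, calling `search`, and prepending the retrieved passages to the generator's prompt.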

GenAnkiCards

  • Designing a flashcard generator
  • Generating content for cards (text, images, audio)
  • Building an Anki integration system
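
A full generator presumably couples an LLM for card content with a library such as genanki for packaged decks; independent of those pieces, the card structure and export can be sketched with the standard library alone. Tab-separated front/back pairs are a plain-text format Anki's built-in importer accepts (`Flashcard` and `export_tsv` are illustrative names, not the course's API):

```python
import csv
import io
from dataclasses import dataclass

@dataclass
class Flashcard:
    front: str   # question / prompt shown first
    back: str    # answer revealed on flip

def export_tsv(cards):
    """Serialize cards as tab-separated front/back lines for Anki import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for card in cards:
        writer.writerow([card.front, card.back])
    return buf.getvalue()

cards = [
    Flashcard("What does RAG stand for?", "Retrieval-Augmented Generation"),
    Flashcard("What does LoRA stand for?", "Low-Rank Adaptation"),
]
print(export_tsv(cards))
```

Generated images and audio would be referenced from the card fields as media filenames rather than embedded in the TSV itself.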

Contribution

Please review the contributing guidelines to learn how you can help the project.

