<h2 align="center"> Building Frugal <code>OpenSource LLM</code>  Applications <br><br>using <code>Serverless Cloud</code> </h2>
<h5 align="center">Useful for PoCs and Batch Processing Jobs</h5>

<h2 align="center"> Motivation</h2>

- Want to build LLM applications? 
- Wondering what the most cost-effective way is to learn and build them in the cloud?

> Think OpenSource LLM. <br>
> Think Serverless

<h2 align="center"> Debates that we are reserving for a better day!</h2>


Or perhaps for the end of the session:
> OpenSource LLMs vs Paid LLMs <br>
> Own Cloud hosted LLM vs Serverless Pay-as-you-go LLM APIs <br>

Note: 
- These are 2 different debates. 
- You can pay for the serverless Bedrock API and still use an open-source LLM like `Mistral AI Instruct`.

<h2 align="center"> Purpose of this Presentation</h2>

Let us see how combining 2 concepts, Serverless and Open Source LLMs, helps you build demo-able PoC LLM applications at minimal cost.


```
#LLMOps
#MLOps
#AWSLambda
#LLMonServerless
#OpenSourceLLMs
```

<h2 align="center"> What are we going to build?</h2>

- 1) A Lambda to run inference on a purpose-built ML Model
     - A Lambda to **Anonymize Text** using a Huggingface BERT Transformer-based Language Model for PII De-identification 
- 2) A Lambda to run a **Small Language Model** like Microsoft's Phi3
- 3) A Lambda to run a **RAG** Implementation on a Small Language Model like Phi3 
- 4) A Lambda to invoke **an LLM like Mistral 7B Instruct**
    -  the LLM runs on a SageMaker endpoint

<h2 align="center"> 1. Lambda to Anonymize Text </h2>


- A Lambda to run inference on a purpose-built ML Model
     - This lambda can **Anonymize Text** 
     - using a Huggingface BERT Transformer-based Fine-tuned Model

![](../container_lambda_anonymize_text/container_lambda_with_api_gateway.png)

![](../container_lambda_anonymize_text/output_in_pic.png)

<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="50" />

<h5 align="center"><a href="https://senthilkumarm1901.github.io/aws_serverless_recipes/container_lambda_anonymize_text/">https://senthilkumarm1901.github.io/aws_serverless_recipes/container_lambda_anonymize_text/</a></h5>
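
The masking step inside such a Lambda can be sketched as below. This is a minimal illustration, not the code from the linked recipe: it assumes the NER model's output has already been grouped into entities with `entity_group`, `start` and `end` keys (the shape produced by a Hugging Face token-classification pipeline with an aggregation strategy). The function name `mask_entities`, the labels `FIRSTNAME` and `CITY`, and the sample entities are hypothetical stand-ins for real model output.

```python
from typing import Dict, List


def mask_entities(text: str, entities: List[Dict]) -> str:
    """Replace each detected PII span with a bracketed label such as [FIRSTNAME]."""
    masked = text
    # Work from the end of the string so that earlier character offsets stay valid.
    for ent in sorted(entities, key=lambda e: e["start"], reverse=True):
        label = ent["entity_group"].upper()
        masked = masked[: ent["start"]] + f"[{label}]" + masked[ent["end"]:]
    return masked


# Hand-written entities standing in for real model output (labels are hypothetical).
sample_text = "My name is Asha and I live in Chennai."
sample_entities = [
    {"entity_group": "FIRSTNAME", "start": 11, "end": 15, "word": "Asha", "score": 0.99},
    {"entity_group": "CITY", "start": 30, "end": 37, "word": "Chennai", "score": 0.98},
]
print(mask_entities(sample_text, sample_entities))
```

Replacing spans from right to left avoids recomputing offsets after each substitution.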


<h2 align="center"> 2. Small Language Model </h2>

- A Lambda to run a **Small Language Model** like Microsoft's Phi3

![](../container_lambda_to_run_slm/container_lambda_with_api_gateway_diag2.png)

![](../container_lambda_to_run_slm/output_in_pic.png)

<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="50" />

<h5 align="center"><a href="https://senthilkumarm1901.github.io/aws_serverless_recipes/container_lambda_to_run_slm/">https://senthilkumarm1901.github.io/aws_serverless_recipes/container_lambda_to_run_slm/</a></h5>


<h2 align="center"> 3. Small Language Model with RAG </h2>

- A Lambda to run a RAG implementation on a Small Language Model like Phi3, so that the model answers with better context

![](../container_lambda_to_run_rag_slm/slm_with_rag.png)

- The URL we are testing on is an article by my favorite DL/NLP researcher.
    - https://magazine.sebastianraschka.com/p/understanding-large-language-models
    
    
![](../container_lambda_to_run_rag_slm/article_we_are_using_as_context.png)

![](../container_lambda_to_run_rag_slm/output_in_pic.png)

<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="50" />

<h5 align="center"><a href="https://senthilkumarm1901.github.io/aws_serverless_recipes/container_lambda_to_run_rag_slm/">https://senthilkumarm1901.github.io/aws_serverless_recipes/container_lambda_to_run_rag_slm/</a></h5>
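
The retrieve-then-prompt idea behind RAG can be sketched with the standard library alone. This is a toy under stated assumptions, not the linked implementation: word-count cosine similarity stands in for a real embedding model and vector store, and the helper names (`tokenize`, `cosine`, `retrieve`, `build_prompt`), the prompt wording and the sample chunks are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, chunks: List[str], top_k: int = 2) -> List[str]:
    """Return the top_k chunks most similar to the question."""
    q = tokenize(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: List[str]) -> str:
    """Put the retrieved chunks in front of the question for the language model."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical chunks standing in for the scraped article text.
chunks = [
    "Transformers use self-attention to weigh tokens against each other.",
    "AWS Lambda bills per millisecond of execution time.",
    "Fine-tuning adapts a pretrained language model to a specific task.",
]
question = "How does self-attention work in transformers?"
top = retrieve(question, chunks, top_k=1)
print(build_prompt(question, top))
```

In a real deployment the resulting prompt would be passed to the small language model, and only the retrieved chunks (not the whole article) consume its context window.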


<h2 align="center"> 4. Large Language Model  (Partially Serverless)</h2>

- A Lambda to invoke **an LLM like Mistral 7B Instruct**
    -  that is running on a SageMaker endpoint

![](../lambda_to_invoke_a_sagemaker_endpoint/lambda_to_invoke_sagemaker_endpoint.png)

![](../lambda_to_invoke_a_sagemaker_endpoint/output_in_pic.png)

<img src="https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png" width="50" />

<h5 align="center"><a href="https://senthilkumarm1901.github.io/aws_serverless_recipes/lambda_to_invoke_a_sagemaker_endpoint/">https://senthilkumarm1901.github.io/aws_serverless_recipes/lambda_to_invoke_a_sagemaker_endpoint/</a></h5>
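
A Lambda handler that forwards a prompt to a SageMaker endpoint might look roughly like the sketch below. It uses the `invoke_endpoint` call of the boto3 `sagemaker-runtime` client. Several things are assumptions rather than details from the linked recipe: the endpoint name `my-mistral-endpoint`, the request shape (`inputs` plus `parameters`) and the response shape (a list with `generated_text`), which depend on the serving container actually deployed. `FakeRuntime` is a hypothetical stub so the example runs without AWS credentials.

```python
import io
import json


def invoke_llm(runtime_client, endpoint_name: str, prompt: str, max_new_tokens: int = 128) -> str:
    """Send a prompt to the endpoint and return the generated text."""
    # Assumed payload format; adjust it to whatever your serving container expects.
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps(payload),
    )
    result = json.loads(response["Body"].read())
    return result[0]["generated_text"]


def lambda_handler(event, context, runtime_client=None):
    """Lambda entry point; runtime_client is injectable for local testing."""
    if runtime_client is None:
        import boto3  # imported lazily so the sketch runs locally without boto3
        runtime_client = boto3.client("sagemaker-runtime")
    body = json.loads(event.get("body") or "{}")
    # "my-mistral-endpoint" is a hypothetical endpoint name.
    answer = invoke_llm(runtime_client, "my-mistral-endpoint", body.get("prompt", ""))
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}


class FakeRuntime:
    """Stand-in for the boto3 client that echoes the prompt back."""

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        prompt = json.loads(Body)["inputs"]
        reply = json.dumps([{"generated_text": f"echo: {prompt}"}]).encode()
        return {"Body": io.BytesIO(reply)}


demo = lambda_handler({"body": json.dumps({"prompt": "Hello"})}, None, FakeRuntime())
print(demo)
```

Only the Lambda is serverless here; the SageMaker endpoint keeps running (and billing) until it is deleted, which is why this setup is only partially serverless.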


<h2 align="center"> Key Challenges Faced</h2>

- Serverless can mean ending up on a low-end CPU architecture. Hence, latency is high for RAG LLM implementations
- RAG context can be arbitrarily large, but converting that context into a vector store takes time. Hence the size of the context needs to be kept small for "AWS Lambda" implementations
- API Gateway times out after about 30 seconds. Hence it could not be used in the RAG LLM implementation
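
One way to keep the context small enough for Lambda is to cap the number of chunks that get embedded. The sketch below shows that idea with plain character-based chunking. The function name `chunk_with_budget` and the sizes used are hypothetical, and a real pipeline would more likely split on tokens or sentences.

```python
from typing import List


def chunk_with_budget(text: str, chunk_size: int = 500, overlap: int = 50,
                      max_chunks: int = 20) -> List[str]:
    """Split text into overlapping chunks, stopping once max_chunks is reached."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        if len(chunks) == max_chunks:
            break  # budget exhausted: the rest of the text is not embedded
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reached the end of the text
    return chunks


article = "x" * 2000  # placeholder for the scraped article text
chunks = chunk_with_budget(article, chunk_size=500, overlap=50, max_chunks=3)
print(len(chunks), [len(c) for c in chunks])
```

Capping the chunk count bounds the time spent building the vector store, at the cost of ignoring text beyond the budget.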

<h2 align="center"> What knowledge do you gain from this way of practicing?</h2>


**MLOps Concepts**:
- Dockerizing ML Applications. What works on your machine works everywhere. More than 70% of the time spent building these LLM apps goes into perfecting the Dockerfile.
- The art of storing ML models in AWS Lambda containers. Use `cache_dir` well. Otherwise, models get downloaded every time a Docker container is created


```python
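import os

os.environ['HF_HOME'] = '/tmp/model'  # the only writable dir in AWS Lambda is `/tmp`

# Import transformers only after HF_HOME is set, so the library picks it up
from transformers import AutoTokenizer, AutoModelForTokenClassification
...
...
your_model = "ab-ai/pii_model"
tokenizer = AutoTokenizer.from_pretrained(your_model, cache_dir='/tmp/model')
ner_model = AutoModelForTokenClassification.from_pretrained(your_model, cache_dir='/tmp/model')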
```
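
A related habit is to load the model once per container rather than once per request, since Lambda reuses an execution environment across warm invocations and module-level objects survive between them. The sketch below illustrates the pattern only. `get_model`, `_MODEL_CACHE` and `fake_loader` are hypothetical names, and the loader stands in for the real `from_pretrained` calls.

```python
from typing import Callable, Dict, List

# Module-level cache: it persists for as long as the Lambda container stays warm.
_MODEL_CACHE: Dict[str, object] = {}


def get_model(name: str, loader: Callable[[str], object]) -> object:
    """Load the model on first use and reuse it on later invocations."""
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = loader(name)
    return _MODEL_CACHE[name]


load_calls: List[str] = []


def fake_loader(name: str) -> dict:
    """Stand-in for an expensive model load; records each call."""
    load_calls.append(name)
    return {"name": name}


first = get_model("ab-ai/pii_model", fake_loader)
second = get_model("ab-ai/pii_model", fake_loader)
print(len(load_calls), first is second)
```

Only a cold start pays the loading cost; warm invocations reuse the object already in memory.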


**AWS Concepts**:
- `aws cli` is your friend for shortening deployments, especially for Serverless
- API Gateway is a frustratingly beautiful service. But a combination of `aws cli` and `OpenAPI` spec makes it replicable

Finally, the **LLM Concepts**:
- Frameworks: llama.cpp, LangChain, Huggingface (there are so many more that I have not used)
- SLMs work well for reasoning tasks but are too slow or inaccurate for general knowledge questions

> Well, it is difficult to keep up with these frameworks. I borrow code. Models are like wines and these frameworks are like bottles. Getting used to how the wines are stored in the bottles helps.

**Next Steps for the reader**:
- Replicate the instructions in the given GitHub links
    - Get familiar with Dockerizing ML applications
    - Provision AWS resources like AWS Lambda and API Gateway using tools like `aws cli` and `OpenAPI`
- Explore other avenues of using LLMs (especially the paid ones). Paid APIs are a cakewalk compared to this, but they won't give you the same depth in implementation

<h2 align="center"> Thank You</h2>

