In [None]:
!pip install -U transformers langchain chromadb accelerate bitsandbytes

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting transformers
  Using cached transformers-4.29.1-py3-none-any.whl (7.1 MB)
Collecting langchain
  Using cached langchain-0.0.166-py3-none-any.whl (803 kB)
Collecting chromadb
  Using cached chromadb-0.3.22-py3-none-any.whl (69 kB)
Collecting accelerate
  Using cached accelerate-0.19.0-py3-none-any.whl (219 kB)
Collecting bitsandbytes
  Using cached bitsandbytes-0.38.1-py3-none-any.whl (104.3 MB)
Collecting huggingface-hub<1.0,>=0.14.1 (from transformers)
  Using cached huggingface_hub-0.14.1-py3-none-any.whl (224 kB)
Collecting tokenizers!=0.11.3,<0.14,>=0.11.1 (from transformers)
  Using cached tokenizers-0.13.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (7.8 MB)
Collecting aiohttp<4.0.0,>=3.8.3 (from langchain)
  Using cached aiohttp-3.8.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.0 MB)
Collecting async-timeout<5.0.0,>=4.0.0 (from langchain)

# Models - Dolly 2.0

In [None]:
import torch
from transformers import pipeline

generate_text = pipeline(model="databricks/dolly-v2-3b", torch_dtype=torch.bfloat16,
                         trust_remote_code=True, device_map="auto", return_full_text=True)


Downloading (…)lve/main/config.json:   0%|          | 0.00/819 [00:00<?, ?B/s]

Downloading (…)instruct_pipeline.py:   0%|          | 0.00/9.16k [00:00<?, ?B/s]

A new version of the following files was downloaded from https://huggingface.co/databricks/dolly-v2-3b:
- instruct_pipeline.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.


Downloading pytorch_model.bin:   0%|          | 0.00/5.68G [00:00<?, ?B/s]

Downloading (…)okenizer_config.json:   0%|          | 0.00/450 [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/2.11M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/228 [00:00<?, ?B/s]

# Prompt Templates

Prompt templates are a LangChain feature that takes a piece of text and injects user input into it. We can then format the prompt with the user input and feed it to the language model.
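
To make the idea concrete, here is a minimal sketch of what a prompt template does, written with the standard library only so it runs without loading a model. `SimpleTemplate` and `prompt_sketch` are hypothetical names introduced for illustration; this is not how LangChain implements `PromptTemplate`, only the same placeholder-filling idea.

```python
# Conceptual stand-in for a prompt template (standard library only).
# `SimpleTemplate` is a hypothetical name, not a LangChain class.
from string import Formatter


class SimpleTemplate:
    def __init__(self, template: str):
        self.template = template
        # Collect the placeholder names written as {name} in the text.
        self.input_variables = [
            field for _, field, _, _ in Formatter().parse(template) if field
        ]

    def format(self, **kwargs: str) -> str:
        # Refuse to build a prompt if a placeholder has no value.
        missing = set(self.input_variables) - set(kwargs)
        if missing:
            raise KeyError(f"missing input variables: {sorted(missing)}")
        return self.template.format(**kwargs)


prompt_sketch = SimpleTemplate("{instruction}\n\nInput:\n{context}")
filled = prompt_sketch.format(
    instruction="Summarize the text.",
    context="Regression predicts continuous values.",
)
print(filled)
```

The resulting string is what would be sent to the language model; the template itself never changes between calls, only the injected values do.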

In [None]:
from langchain import PromptTemplate, LLMChain
from langchain.llms import HuggingFacePipeline

# template for an instruction with no input
prompt = PromptTemplate(
    input_variables=["instruction"],
    template="{instruction}")

# template for an instruction with input
prompt_with_context = PromptTemplate(
    input_variables=["instruction", "context"],
    template="{instruction}\n\nInput:\n{context}")

hf_pipeline = HuggingFacePipeline(pipeline=generate_text)

llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)
llm_context_chain = LLMChain(llm=hf_pipeline, prompt=prompt_with_context)


In [None]:
print(llm_chain.predict(instruction="Explain supervised learning").lstrip())

Supervised learning is an approach to machine learning where the learner is provided with examples or instances (also known as data) that are either "labeled" (i.e., the data points are either class 0 or class 1), or "unlabeled" (i.e., the data points are considered to have unknown labels). The goal is to construct a model (function) that is "optimized" to perform well on future data points, without the need for "de novo" training on the entire dataset. This makes it possible to find good model configuration or hyperparameters quickly on a smaller labeled dataset, which is desirable in many applications. An example of supervised learning is the "support vector machine" (SVM) or "decision tree".


In [None]:
template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.
 
Instruction: 
You are a Machine Learning Scientist and your job is to help providing the best machine learning answer. 
Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.

{context}

Question: {question}

Response:
"""
prompt = PromptTemplate(input_variables=['context', 'question'], template=template)
llm_chain = LLMChain(llm=hf_pipeline, prompt=prompt)

context="""Supervised learning algorithms can be divided into two main categories: classification and regression. 
            In classification, the goal is to predict a categorical output label for each input data point. 
            Examples of classification tasks include image classification, sentiment analysis, and spam filtering. 
            In regression, the goal is to predict a continuous output value for each input data point. 
            Examples of regression tasks include predicting house prices, stock prices, or sales figures."""
print(llm_chain.predict(question="to predict house price, which category of Supervised learning algorithms should be used ",context=context).lstrip())

supervised learning algorithms can be divided into two main categories: classification and regression. 

In classification, the goal is to predict a categorical output label for each input data point. Examples of classification tasks include image classification, sentiment analysis, and spam filtering. 

In regression, the goal is to predict a continuous output value for each input data point. Examples of regression tasks include predicting house prices, stock prices, or sales figures.


# Chains

A chain is a composite structure that combines a language model with a prompt template to create an interface. This interface processes user input and generates a response from the language model. Essentially, it works like a composite function, with the prompt template as the inner function and the language model as the outer function.

It is also possible to build sequential chains, in which the output generated by the first chain is used as the input to the second chain, giving a multi-stage processing system.
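
The composite-function view can be sketched in plain Python. In the sketch below, `make_chain`, `run_sequential`, and `fake_llm` are hypothetical helpers introduced for illustration, and `fake_llm` merely wraps its prompt in brackets instead of calling a real model. It shows the composition pattern only, not LangChain's internals.

```python
from typing import Callable, List


def make_chain(template: str, llm: Callable[[str], str]) -> Callable[[str], str]:
    """Compose llm(template(x)): the template is the inner function."""
    def run(user_input: str) -> str:
        return llm(template.format(input=user_input))
    return run


def run_sequential(chains: List[Callable[[str], str]], user_input: str) -> str:
    """Feed each chain's output to the next chain as its input."""
    result = user_input
    for chain_fn in chains:
        result = chain_fn(result)
    return result


def fake_llm(prompt_text: str) -> str:
    # Stand-in for a language model: echoes the prompt it received.
    return f"<answer to: {prompt_text}>"


first = make_chain("Explain {input} briefly", fake_llm)
second = make_chain("Expand on: {input}", fake_llm)

final = run_sequential([first, second], "Autoencoder")
print(final)
```

Because each chain takes one string and returns one string, any number of them can be stacked; this is the same single-input, single-output constraint that makes a simple sequential chain possible.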

In [None]:
from langchain import PromptTemplate

template = """
You are an expert data scientist with an expertise in building deep learning models. 
Explain the concept of {concept} in 3 lines
"""
prompt = PromptTemplate(
    input_variables = ["concept"],
    template=template
)

In [None]:
from langchain.chains import LLMChain 
chain = LLMChain(llm=hf_pipeline, prompt=prompt)

# Run the chain, specifying only the input variable
print(chain.run("Autoencoder"))


An Autoencoder is a neural network which tries to reconstruct the input in a lower dimensional space. It does so by mapping the input to a low-dimensional subspace which is close to being able to reconstruct the input from the middle layer to the output.


In [None]:
second_prompt = PromptTemplate(
    input_variables = ["ml_concept"],
    template="Turn the concept description of {ml_concept} and explain it to me in 300 words"
)


chain_two = LLMChain(llm=hf_pipeline, prompt=second_prompt)

In [None]:
from langchain.chains import SimpleSequentialChain
overall_chain = SimpleSequentialChain(chains=[chain, chain_two], verbose=True)

# Run the chain, specifying only the input variable for the first chain
explanation = overall_chain.run("Autoencoder")
print(explanation)



> Entering new SimpleSequentialChain chain...
Autoencoder is an unsupervised neural network architecture that can be used to reduce the dimension of a high dimensional feature space. Autoencoder has one input and one hidden layer and the dimension of the hidden layer is much lower than the original input dimension. 

One important property of autoencoder is that the reconstruction error (the difference between the input and the output) is minimized. 
The equation of the reconstruction error is 

where is the dimension of the output and is the dimension of the input.

Autoencoder reduces the dimension of input using the minimisation of reconstruction error. The reconstruction error is calculated using the equation mentioned above. The model has one input and one hidden layer. The dimension of the hidden layer is much lower than the original input dimension. To calculate the reconstruction error, is dimension of the output and is the dimension of 