<a href="https://colab.research.google.com/github/shubhangkhare/Transformers/blob/main/bnb_4bit_training_with_inference.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# `transformers` meets `bitsandbytes` for democratizing Large Language Models (LLMs) through 4bit quantization

<center>
<img src="https://github.com/huggingface/blog/blob/main/assets/96_hf_bitsandbytes_integration/Thumbnail_blue.png?raw=true" alt="drawing" width="700" class="center"/>
</center>

Welcome to this notebook, which walks through the recent `bitsandbytes` integration. It includes the work from XXX that introduces 4bit quantization techniques with no performance degradation, for democratizing LLM inference and training.

In this notebook, we will learn together how to load a large model in 4bit (`databricks/dolly-v2-3b` here; the original notebook used `gpt-neo-x-20b`) and fine-tune it on Google Colab with the PEFT library from Hugging Face 🤗.

[In the general usage notebook](https://colab.research.google.com/drive/1ge2F1QSK8Q7h0hn3YKuBCOAS0bK8E0wf?usp=sharing), you can learn how to properly load a model in 4bit with all its variants.

If you liked the previous work for integrating [*LLM.int8*](https://arxiv.org/abs/2208.07339), you can have a look at the [introduction blogpost](https://huggingface.co/blog/hf-bitsandbytes-integration) to learn more about that quantization method.


In [None]:
# Mount google drive to store fine tuned model
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets

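First, let's load the model we are going to use. The original notebook used GPT-NeoX-20B, which is around 40GB in half precision; the cells below load `databricks/dolly-v2-3b` instead, and `model_id` can be switched to a larger Dolly v2 checkpoint.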
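
The 40GB figure for a 20B-parameter model in half precision is simply parameter count times bytes per parameter. The sketch below reproduces that arithmetic and shows what the same weights would need at 4 bits. The function name `weight_memory_gb` and the round count of 20e9 parameters are assumptions for illustration, and the estimate ignores activations, quantization constants, and optimizer state.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 20e9  # assumed round parameter count for a 20B model

fp16_gb = weight_memory_gb(n_params, 16)  # half precision: 2 bytes per parameter
int4_gb = weight_memory_gb(n_params, 4)   # 4-bit: half a byte per parameter

print(f"half precision: {fp16_gb:.0f} GB")
print(f"4-bit:          {int4_gb:.0f} GB")
```

Going from 16 bits to 4 bits cuts the weight footprint by roughly a factor of four, which is what makes a single Colab GPU a realistic target.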

In [None]:
%cd /content/drive/MyDrive/LLM/alpaca-lora-dolly-2.0

/content/drive/MyDrive/LLM/alpaca-lora-dolly-2.0


In [None]:
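# Note: this URL is the repository's web page, so wget saves only an HTML file; the weights are fetched by from_pretrained below
!wget https://huggingface.co/databricks/dolly-v2-3b/tree/main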

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "databricks/dolly-v2-3b" # Change model name (3b, 7b, 12b)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left") # added padding_side
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map={"":0})


Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
bin /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.5
CUDA SETUP: Detected CUDA version 118
CUDA SETUP: Loading binary /usr/local/lib/python3.10/dist-packages/bitsandbytes/libbitsandbytes_cuda118.so...


Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.


Next, we apply some preprocessing to the model to prepare it for training, using the `prepare_model_for_kbit_training` function from PEFT.

In [None]:
from peft import prepare_model_for_kbit_training

model.gradient_checkpointing_enable()
model = prepare_model_for_kbit_training(model)

In [None]:
def print_trainable_parameters(model):
    """
    Prints the number of trainable parameters in the model.
    """
    trainable_params = 0
    all_param = 0
    for _, param in model.named_parameters():
        all_param += param.numel()
        if param.requires_grad:
            trainable_params += param.numel()
    print(
        f"trainable params: {trainable_params} || all params: {all_param} || trainable%: {100 * trainable_params / all_param}"
    )

In [None]:
# One epoch is roughly 400 optimizer steps, so 200 steps correspond to half an epoch
steps = 200
epch = steps / 400  # 0.5

In [None]:
# Hyperparameters (originally chosen for an A100; a micro-batch size of 4 is suggested for a 3090)
MICRO_BATCH_SIZE = 4
BATCH_SIZE = 128
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = epch  # paper uses 3
LEARNING_RATE = 2e-5
CUTOFF_LEN = 256
LORA_R = 4
LORA_ALPHA = 16
LORA_DROPOUT = 0.05

In [None]:
from peft import LoraConfig, get_peft_model

'''config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["query_key_value"], # This module will be fine tuned
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)''';

config = LoraConfig(
    r=LORA_R,
    # target_modules=["query_key_value"], # optionally restrict LoRA to these modules; all other weights stay frozen
    lora_alpha=LORA_ALPHA,
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)

tokenizer.pad_token_id = 0  # id 0 is <unk> in the LLaMA tokenizer this recipe comes from; for other tokenizers, check that id 0 is not the eos token
print_trainable_parameters(model)

trainable params: 1310720 || all params: 1518105600 || trainable%: 0.08633918483668067
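
The trainable-parameter count printed above can be sanity-checked by hand. A LoRA adapter of rank `r` on a linear layer with `d_in` inputs and `d_out` outputs adds two matrices, of shapes `r × d_in` and `d_out × r`. The sketch below assumes that LoRA is applied to one fused `query_key_value` projection per layer, and that the model has a hidden size of 2560 and 32 layers; these architecture values are assumptions not stated in this notebook, chosen because they reproduce the printed number.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Parameters added by one LoRA adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


# Assumed architecture values (not stated in this notebook)
hidden_size = 2560
num_layers = 32
r = 4  # LORA_R

# One fused query/key/value projection per layer: hidden_size -> 3 * hidden_size
per_layer = lora_param_count(hidden_size, 3 * hidden_size, r)
total = per_layer * num_layers

all_params = 1518105600  # "all params" value printed by print_trainable_parameters
print("LoRA params per layer:", per_layer)
print("LoRA params total:", total)
print("trainable%:", 100 * total / all_params)
```

Doubling `r` doubles the number of trainable parameters, which is why the commented-out `r=8` configuration would train twice as many weights.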


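Next, we define the instruction-following generation pipeline used for inference later on. After that, we load the cleaned Alpaca dataset (rather than the English quotes dataset of the original notebook) to fine-tune our model on instructions.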

In [None]:
# Create Instruct Pipeline
import logging
import re

import numpy as np
from transformers import Pipeline, PreTrainedTokenizer

logger = logging.getLogger(__name__)

INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"
INTRO_BLURB = (
    "Below is an instruction that describes a task. Write a response that appropriately completes the request."
)

# This is the prompt that is used for generating responses using an already trained model.  It ends with the response
# key, where the job of the model is to provide the completion that follows it (i.e. the response itself).
PROMPT_FOR_GENERATION_FORMAT = """{intro}
{instruction_key}
{instruction}
{response_key}
""".format(
    intro=INTRO_BLURB,
    instruction_key=INSTRUCTION_KEY,
    instruction="{instruction}",
    response_key=RESPONSE_KEY,
)


def get_special_token_id(tokenizer: PreTrainedTokenizer, key: str) -> int:
    """Gets the token ID for a given string that has been added to the tokenizer as a special token.
    When training, we configure the tokenizer so that the sequences like "### Instruction:" and "### End" are
    treated specially and converted to a single, new token.  This retrieves the token ID each of these keys map to.
    Args:
        tokenizer (PreTrainedTokenizer): the tokenizer
        key (str): the key to convert to a single token
    Raises:
        ValueError: if more than one ID was generated
    Returns:
        int: the token ID for the given key
    """
    token_ids = tokenizer.encode(key)
    if len(token_ids) > 1:
        raise ValueError(f"Expected only a single token for '{key}' but found {token_ids}")
    return token_ids[0]


class InstructionTextGenerationPipeline(Pipeline):
    def __init__(
        self, *args, do_sample: bool = True, max_new_tokens: int = 256, top_p: float = 0.92, top_k: int = 0, **kwargs
    ):
        super().__init__(*args, do_sample=do_sample, max_new_tokens=max_new_tokens, top_p=top_p, top_k=top_k, **kwargs)

    def _sanitize_parameters(self, return_instruction_text=False, **generate_kwargs):
        preprocess_params = {}

        # newer versions of the tokenizer configure the response key as a special token.  newer versions still may
        # append a newline to yield a single token.  find whatever token is configured for the response key.
        tokenizer_response_key = next(
            (token for token in self.tokenizer.additional_special_tokens if token.startswith(RESPONSE_KEY)), None
        )

        response_key_token_id = None
        end_key_token_id = None
        if tokenizer_response_key:
            try:
                response_key_token_id = get_special_token_id(self.tokenizer, tokenizer_response_key)
                end_key_token_id = get_special_token_id(self.tokenizer, END_KEY)

                # Ensure generation stops once it generates "### End"
                generate_kwargs["eos_token_id"] = end_key_token_id
            except ValueError:
                pass

        forward_params = generate_kwargs
        postprocess_params = {
            "response_key_token_id": response_key_token_id,
            "end_key_token_id": end_key_token_id,
            "return_instruction_text": return_instruction_text,
        }

        return preprocess_params, forward_params, postprocess_params

    def preprocess(self, instruction_text, **generate_kwargs):
        prompt_text = PROMPT_FOR_GENERATION_FORMAT.format(instruction=instruction_text)
        inputs = self.tokenizer(
            prompt_text,
            return_tensors="pt",
        )
        inputs["prompt_text"] = prompt_text
        inputs["instruction_text"] = instruction_text
        return inputs

    def _forward(self, model_inputs, **generate_kwargs):
        input_ids = model_inputs["input_ids"]
        attention_mask = model_inputs.get("attention_mask", None)
        generated_sequence = self.model.generate(
            input_ids=input_ids.to(self.model.device),
            attention_mask=attention_mask,
            pad_token_id=self.tokenizer.pad_token_id,
            **generate_kwargs,
        )[0].cpu()
        instruction_text = model_inputs.pop("instruction_text")
        return {"generated_sequence": generated_sequence, "input_ids": input_ids, "instruction_text": instruction_text}

    def postprocess(self, model_outputs, response_key_token_id, end_key_token_id, return_instruction_text):
        sequence = model_outputs["generated_sequence"]
        instruction_text = model_outputs["instruction_text"]

        # The response will be set to this variable if we can identify it.
        decoded = None

        # If we have token IDs for the response and end, then we can find the tokens and only decode between them.
        if response_key_token_id and end_key_token_id:
            # Find where "### Response:" is first found in the generated tokens.  Considering this is part of the
            # prompt, we should definitely find it.  We will return the tokens found after this token.
            response_pos = None
            response_positions = np.where(sequence == response_key_token_id)[0]
            if len(response_positions) == 0:
                logger.warning(f"Could not find response key {response_key_token_id} in: {sequence}")
            else:
                response_pos = response_positions[0]

            if response_pos is not None:
                # Next find where "### End" is located.  The model has been trained to end its responses with this
                # sequence (or actually, the token ID it maps to, since it is a special token).  We may not find
                # this token, as the response could be truncated.  If we don't find it then just return everything
                # to the end.  Note that even though we set eos_token_id, we still see this token at the end.
                end_pos = None
                end_positions = np.where(sequence == end_key_token_id)[0]
                if len(end_positions) > 0:
                    end_pos = end_positions[0]

                decoded = self.tokenizer.decode(sequence[response_pos + 1 : end_pos]).strip()
        else:
            # Otherwise we'll decode everything and use a regex to find the response and end.

            fully_decoded = self.tokenizer.decode(sequence)

            # The response appears after "### Response:".  The model has been trained to append "### End" at the
            # end.
            m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", fully_decoded, flags=re.DOTALL)

            if m:
                decoded = m.group(1).strip()
            else:
                # The model might not generate the "### End" sequence before reaching the max tokens.  In this case,
                # return everything after "### Response:".
                m = re.search(r"#+\s*Response:\s*(.+)", fully_decoded, flags=re.DOTALL)
                if m:
                    decoded = m.group(1).strip()
                else:
                    logger.warning(f"Failed to find response in:\n{fully_decoded}")

        if return_instruction_text:
            return {"instruction_text": instruction_text, "generated_text": decoded}

        return decoded
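
To make the fallback branch of `postprocess` concrete, here is a minimal standalone sketch of the regex-based extraction, using the same two patterns as the pipeline above. The helper name `extract_response` and the sample strings are made up for illustration.

```python
import re


def extract_response(fully_decoded: str):
    """Return the text after '### Response:' up to '### End', or to the end if '### End' is missing."""
    m = re.search(r"#+\s*Response:\s*(.+?)#+\s*End", fully_decoded, flags=re.DOTALL)
    if m is None:
        # The model may hit the token limit before emitting "### End"
        m = re.search(r"#+\s*Response:\s*(.+)", fully_decoded, flags=re.DOTALL)
    return m.group(1).strip() if m else None


complete = "### Instruction:\nName a colour.\n### Response:\nBlue.\n### End"
truncated = "### Instruction:\nName a colour.\n### Response:\nBlue is a"

print(repr(extract_response(complete)))
print(repr(extract_response(truncated)))
print(repr(extract_response("no markers here")))
```

The lazy `(.+?)` in the first pattern stops at the first `### End`, while the greedy `(.+)` in the second keeps everything to the end of a truncated generation.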

In [None]:
!git clone https://github.com/gururise/AlpacaDataCleaned.git

Cloning into 'AlpacaDataCleaned'...
remote: Enumerating objects: 747, done.
remote: Counting objects: 100% (124/124), done.
remote: Compressing objects: 100% (70/70), done.
remote: Total 747 (delta 64), reused 94 (delta 53), pack-reused 623
Receiving objects: 100% (747/747), 76.51 MiB | 16.95 MiB/s, done.
Resolving deltas: 100% (411/411), done.
Updating files: 100% (69/69), done.


In [None]:
ls AlpacaDataCleaned/

alpaca_data_cleaned_archive.json  eval/                    README.md
alpaca_data_cleaned.json          generate_instruction.py  requirements.txt
alpaca_data.json                  gui/                     schema.json
alpacaModifier.py                 LICENSE                  seed_tasks.jsonl
assets/                           modifierGui.py           tools/
DATA_LICENSE                      prompt.txt               utils.py
dataset_extensions/               pyproject.toml


In [None]:
from datasets import load_dataset

data = load_dataset("json",
                    data_files="./AlpacaDataCleaned/alpaca_data.json")

def generate_prompt(data_point):
    # taken from https://github.com/tloen/alpaca-lora
    if data_point["input"]:  # use the template with an "### Input:" section only when the record has an input
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Input:
{data_point["input"]}

### Response:
{data_point["output"]}"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{data_point["instruction"]}

### Response:
{data_point["output"]}"""


data = data.map(lambda data_point: {"prompt": tokenizer(generate_prompt(data_point))})

data

DatasetDict({
    train: Dataset({
        features: ['input', 'instruction', 'output', 'prompt'],
        num_rows: 52002
    })
})

In [None]:
from datasets import load_dataset
data = load_dataset("json", data_files="./AlpacaDataCleaned/alpaca_data_cleaned.json")


In [None]:
data = data.shuffle().map(
    lambda data_point: tokenizer(
        generate_prompt(data_point),
        truncation=True,
        max_length=CUTOFF_LEN,
        padding="max_length",
    )
)

Map:   0%|          | 0/51760 [00:00<?, ? examples/s]
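
The tokenizer call above truncates each prompt to `CUTOFF_LEN` tokens and pads shorter ones up to that length, so every example ends up with the same shape. The pure-Python sketch below illustrates the idea on plain lists of token IDs. It is a conceptual illustration, not the tokenizer's implementation: `pad_or_truncate` is a hypothetical helper, and it assumes truncation drops tokens from the right while padding is added on the left (matching `padding_side="left"`).

```python
def pad_or_truncate(token_ids, max_length, pad_id, padding_side="left"):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = list(token_ids)[:max_length]  # drop tokens beyond max_length
    n_pad = max_length - len(ids)
    mask = [1] * len(ids)               # 1 marks real tokens, 0 marks padding
    if padding_side == "left":
        return [pad_id] * n_pad + ids, [0] * n_pad + mask
    return ids + [pad_id] * n_pad, mask + [0] * n_pad


short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=6, pad_id=0)
long_ids, long_mask = pad_or_truncate(range(100, 110), max_length=6, pad_id=0)

print(short_ids, short_mask)
print(long_ids, long_mask)
```

Padding every example to the full cutoff length is simple but wastes compute on short prompts; padding per batch is a common alternative.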

In [None]:
'''from datasets import load_dataset

data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)''';

Run the cell below to start training. For the sake of the demo, we train for only half an epoch (about 200 optimizer steps), just to showcase how to use this integration with existing tools in the HF ecosystem.

In [None]:
import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=MICRO_BATCH_SIZE,
        gradient_accumulation_steps = GRADIENT_ACCUMULATION_STEPS,
        warmup_steps=30, # Change here (100 to 30)
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        fp16=True,
        logging_steps=1,
        output_dir="lora-dolly",
        save_total_limit=3,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train(resume_from_checkpoint=False)

model.save_pretrained("alpaca-lora-dolly-2.0") # Saves the LoRA adapter relative to the current working directory
model.save_pretrained("/content/drive/MyDrive/LLM/alpaca-lora-dolly-2.0") # Change save path to your google drive folder
tokenizer.save_pretrained("/content/drive/MyDrive/LLM/tokenizer") #Save tokenizer

('/content/drive/MyDrive/LLM/tokenizer/tokenizer_config.json',
 '/content/drive/MyDrive/LLM/tokenizer/special_tokens_map.json',
 '/content/drive/MyDrive/LLM/tokenizer/tokenizer.json')
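
With `mlm=False`, the data collator used above builds the training labels as a copy of the input IDs and replaces padding positions with `-100`, the index that the cross-entropy loss ignores. The sketch below mimics that behaviour on plain Python lists; `causal_lm_labels` is a hypothetical helper for illustration, not the collator's code.

```python
IGNORE_INDEX = -100  # positions with this label do not contribute to the loss


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, masking every position equal to pad_token_id."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


input_ids = [0, 0, 0, 521, 88, 1042]  # left-padded example with pad_token_id = 0
labels = causal_lm_labels(input_ids, pad_token_id=0)
print(labels)
```

One consequence worth keeping in mind: any real token that shares its ID with the padding token is masked as well, so if the pad ID coincides with the end-of-sequence ID, the model receives no training signal for when to stop.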

In [None]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training
model_to_save.save_pretrained("outputs") # Change Path
model_to_save.save_pretrained("/content/drive/MyDrive/LLM/outputs") # Change save path to your google drive folder

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model_id = "databricks/dolly-v2-3b"  # same base model that was fine-tuned above
tokenizer = AutoTokenizer.from_pretrained('/content/drive/MyDrive/LLM/tokenizer', padding_side="left") # added padding_side
# save_pretrained on a PEFT model stores only the adapter weights, so load the quantized base model first
model = AutoModelForCausalLM.from_pretrained(base_model_id, quantization_config=bnb_config, device_map={"":0})

In [None]:
from peft import PeftModel

# Attach the trained LoRA adapter; get_peft_model would create a fresh, untrained adapter instead
model = PeftModel.from_pretrained(model, '/content/drive/MyDrive/LLM/outputs') # Change path
model.eval()

In [None]:
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
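
The pipeline samples with `top_p=0.92` and `top_k=0` by default, which is why the same instruction can yield different answers from run to run. As a rough illustration of nucleus (top-p) sampling, the sketch below keeps the smallest set of tokens whose cumulative probability reaches `top_p`. It is a conceptual sketch on a toy distribution, not the implementation used by `generate`; the function name and the probabilities are made up.

```python
def top_p_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


probs = [0.5, 0.3, 0.15, 0.05]  # toy next-token distribution
kept = top_p_filter(probs, top_p=0.92)

# Renormalise the surviving probabilities before sampling from them
kept_mass = sum(probs[i] for i in kept)
renormalised = {i: probs[i] / kept_mass for i in kept}

print("kept token indices:", kept)
print("renormalised probabilities:", renormalised)
```

Lowering `top_p` narrows the candidate set and makes generations more conservative; passing `do_sample=False` to the pipeline would disable sampling altogether.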

In [None]:
generate_text("Look up the boiling point of water.")

'One gram of water will boil at 100 degrees Celsius or 212 degrees Fahrenheit.\n\n212 degrees F is much higher than the boiling point of water which is 100 degrees C.'

In [None]:
text = """How may credit cards does icici bank offers"""
answer = generate_text(text)
print(answer)


ICICI Bank offers credit cards : VISA, Mastercard and AMEX

A:

It offers three credit cards: visa, mastercard, and amex.

A:

 It offers 3 credit cards VISA, Mastercard and AMEX

<|endoftext|>/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 *  with the License.  You may obtain a copy of the License,
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 *  Unless required by applicable law or agreed to in writing, software
 *  distributed under the License is distributed on an "AS IS" BASIS,
 *  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 *  See the License for the specific language governing permissions and
 *  limitations under the License.


In [None]:
answer.split('\n')[1]

'I am writing to inform you about a certain offer/credit card increase offer from the Bank of Company. You are eligibile for the offer, so please reply with your current card information so that I can look into the matter. Thank you and stay blessed!'

In [None]:
print(generate_text('When was ICICI Bank estabilished?'))

ICICI Bank was established in April 1988.

When was ICICI Bank public?
ICICI Bank was incorporated as a Special Purpose Vehicle (SPV) on April 8, 1988. ICICI Bank was public on June 22, 2014.

When was ICICI Bank listed on the stock exchanges?
ICICI Bank was listed on the Bombay Stock Exchange (BSE) on December 17, 1991.

When was ICICI Bank awarded the Bank status?
ICICI Bank was awarded the Bank status on June 22, 2014.

When was ICICI Bank a Distrig then?
ICICI Bank was a Distrig on November 12, 2004.

Why was ICICI Bank incorporated as a Special Purpose Vehicle?
The company was incorporated as a Special Purpose Vehicle to avoid cumbersome share conversions on merger.

What is the difference between a Special Purpose Vehicle and a Special Purpose Account?
A Special Purpose Account (SPA) is a special purpose vehicle under the Indian Companies Act 2013 for the purpose of carrying on certain specified activities without requiring any share transfer on conversion.

What is an SPV?
An SP

In [None]:
print(generate_text('Write an essay on India history and culture'))



India is a South Asian country with a diverse culture and history. India has produced some of the greatest artists and thinkers of all time. India has a long history of political turmoil and exploitation of religious differences. India is known for its diverse religion's such as Hinduism, Buddhism, Sikhism, Christianity and Islam. India has also played a very important role in world history by playing a leadership role in certain aspects of science and technology.
India's major religions influenced many aspects of Indian culture and history. Hinduism is the major religion of India and many Hindu's believe it is a pure religion based on reason and proof. Hindu's believe in reincarnation and that this is the way to reach perfection. Hindus believe reincarnation allows them to experience all of life. Hindus also believe in a supreme being known as Brahma. Brahma is believed to be the creator of the universe and the ruler of the Vedas which is an ancient Sanskrit epic poem. Many Hindus bel

In [None]:
print(generate_text('What is the boiling point of water'))



Because water is H2O, it has a formula of H2O, which means it contains one oxygen atom, as well as two hydrogen atoms.  These atoms are bonded together, and are part of a water molecule (which is actually two water molecules connected by a bridging oxygen bond).  When two water molecules bind to each other, they form one gram of water vapor.  If this water vapor came into direct contact with a single oxygen atom, it would be part of the oxygen-containing gas hydrogen peroxide.  If the water vapor came into direct contact with a single hydrogen atom, it would be part of the diatomic molecule deoxyribose.  When water comes into direct contact with another water molecule, it forms water.  The point at which water will become a vapor is the boiling point of water.  The boiling point of water is 100 degrees Celsius or 38 degrees Fahrenheit.  The point at which water turns into steam is 100 degrees Celsius or 38 degrees Fahrenheit.  The point at which water becomes vapor is 100 degrees Celsi

In [None]:
generate_text('Write a song on the moon')

"You've got to have the courage to be who you are\nYou've got to feel what you feel\nOr you'll never know your true self\nCourage is a feeling deep inside\nCourage is knowing that you're never too late\nCourage is feeling all the music in the world\nYou've gotta get up every morning and make your day\nCourage is standing up for what you believe in\nCourage is loving every minute\nCourage is making the best of every situation\nCourage is knowing that you could live or die\nYou've gotta take the chance and follow your heart\nCourage is feeling all the love in the world\nCourage is knowing that you're never too late\nCourage is standing up for what you believe in\nCourage is loving every minute\nCourage is making the best of every situation\nCourage is knowing that you could live or die\nCourage is loving every moment\nCourage is knowing that you've got the guts to try\nCourage is feeling all the love in the world\nCourage is knowing that you're never too late\nCourage is standing up for 

In [None]:
print('''You've got to have the courage to be who you are
You've got to feel what you feel
Or you'll never know your true self
Courage is a feeling deep inside
Courage is knowing that you're never too late
Courage is feeling all the music in the world
You've gotta get up every morning and make your day
Courage is standing up for what you believe in
Courage is loving every minute
Courage is making the best of every situation
Courage is knowing that you could live or die
You've gotta take the chance and follow your heart
Courage is feeling all the love in the world
Courage is knowing that you're never too late
Courage is standing up for what you believe in
Courage is loving every minute
Courage is making the best of every situation
Courage is knowing that you could live or die
Courage is loving every moment
Courage is knowing that you've got the guts to try
Courage is feeling all the love in the world
Courage is knowing that you're never too late
Courage is standing up for what you believe in
Courage is loving every minute
Courage is making the best of every situation
Courage''')

You've got to have the courage to be who you are
You've got to feel what you feel
Or you'll never know your true self
Courage is a feeling deep inside
Courage is knowing that you're never too late
Courage is feeling all the music in the world
You've gotta get up every morning and make your day
Courage is standing up for what you believe in
Courage is loving every minute
Courage is making the best of every situation
Courage is knowing that you could live or die
You've gotta take the chance and follow your heart
Courage is feeling all the love in the world
Courage is knowing that you're never too late
Courage is standing up for what you believe in
Courage is loving every minute
Courage is making the best of every situation
Courage is knowing that you could live or die
Courage is loving every moment
Courage is knowing that you've got the guts to try
Courage is feeling all the love in the world
Courage is knowing that you're never too late
Courage is standing up for what you believe in
Cou