# Maral 7B Inference Notebook

<p align="center">
 <img src="https://huggingface.co/MaralGPT/Maral-7B-alpha-1/resolve/main/maral-7b-announce.png" width=256 height=256/>
</p>

## About Maral

Maral is a new large language model specializing in the Persian language. It is based on Mistral and was trained on a Persian Alpaca dataset. Maral is one of the few efforts in the Persian-speaking community to bring our language to new life in the era of AI.

Also, since Maral is based on Mistral, it's capable of producing English answers as well.

## Our Team

* Muhammadreza Haghiri ([Website](https://haghiri75.com/en) - [Github](https://github.com/prp-e) - [LinkedIn](https://www.linkedin.com/in/muhammadreza-haghiri-1761325b))
* Mahi Mohrechi ([Website](https://mohrechi-portfolio.vercel.app/) - [Github](https://github.com/f-mohrechi) - [LinkedIn](https://www.linkedin.com/in/faeze-mohrechi/))

## Needed libraries

The notebook loads the model with 8-bit quantization so that it fits on a free Colab GPU, which requires `bitsandbytes`. If you have a GPU with more memory, you can skip quantization and load the model in 16-bit (half) precision instead.
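
A quick back-of-the-envelope calculation shows why 8-bit loading matters on a free Colab GPU. The sketch below assumes roughly 7.24 billion parameters (the size of a Mistral-7B model) and counts only the weights. Activations, the KV cache and framework overhead come on top of this, and bitsandbytes typically keeps some layers (such as the output head) in 16-bit, so the real 8-bit footprint is somewhat larger.

```python
# Rough estimate of the GPU memory needed just to hold the weights.
# Assumption: ~7.24 billion parameters (the size of a Mistral-7B model).
NUM_PARAMS = 7.24e9

BYTES_PER_PARAM = {
    "float16 / bfloat16 (16-bit)": 2,
    "int8 (8-bit, bitsandbytes)": 1,
}

def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Return the size of the weights in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3

for name, nbytes in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weight_memory_gib(NUM_PARAMS, nbytes):.1f} GiB")
```

This prints about 13.5 GiB for 16-bit weights and about 6.7 GiB for 8-bit weights, before any of the overheads mentioned above.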

```python
# Notebook shell command: installs the libraries used below (quietly).
!pip install transformers accelerate bitsandbytes -q
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
import torch
```

## Loading Model

```python
model_name_or_id = "MaralGPT/Maral-7B-alpha-1"
```

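```python
from transformers import BitsAndBytesConfig

# Load weights in 8-bit via bitsandbytes (passing load_in_8bit directly is deprecated).
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.float16, device_map="auto", low_cpu_mem_usage=True, quantization_config=BitsAndBytesConfig(load_in_8bit=True))
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
```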

## Model Structure

```python
model
```

```
MistralForCausalLM(
  (model): MistralModel(
    (embed_tokens): Embedding(32000, 4096)
    (layers): ModuleList(
      (0-31): 32 x MistralDecoderLayer(
        (self_attn): MistralAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (v_proj): Linear(in_features=4096, out_features=1024, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): MistralRotaryEmbedding()
        )
        (mlp): MistralMLP(
          (gate_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (up_proj): Linear(in_features=4096, out_features=14336, bias=False)
          (down_proj): Linear(in_features=14336, out_features=4096, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): MistralRMSNorm()
        (post_attention_layernorm): MistralRMSNorm()
      )
    )
    (norm): MistralRM
```
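
The shapes printed above are enough to recount the model's size. The sketch below multiplies out the `Linear` shapes of one decoder layer and scales by the 32 layers. Some values are assumptions because they are not visible in the truncated printout: the output head `lm_head` is taken to be an untied `Linear(4096 -> 32000)` as in Mistral-7B, each `MistralRMSNorm` is taken to hold one weight vector of size 4096, and the head size of 128 (4096 split over 32 query heads) is likewise assumed.

```python
# Count parameters from the layer shapes printed above.
# Assumptions (not visible in the truncated printout): lm_head is an untied
# Linear(4096 -> 32000) as in Mistral-7B, each RMSNorm has 4096 weights,
# and the attention head size is 128.
HIDDEN = 4096
VOCAB = 32000
INTERMEDIATE = 14336
KV_DIM = 1024        # out_features of k_proj / v_proj
NUM_LAYERS = 32
HEAD_DIM = 128       # assumption: 4096 / 32 query heads

def linear(in_f: int, out_f: int) -> int:
    # bias=False everywhere, so a Linear layer is just a weight matrix
    return in_f * out_f

attn = (linear(HIDDEN, HIDDEN)          # q_proj
        + 2 * linear(HIDDEN, KV_DIM)    # k_proj, v_proj
        + linear(HIDDEN, HIDDEN))       # o_proj
mlp = 2 * linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN)  # gate, up, down
norms = 2 * HIDDEN                      # input + post-attention RMSNorm
per_layer = attn + mlp + norms

total = (VOCAB * HIDDEN                 # embed_tokens
         + NUM_LAYERS * per_layer
         + HIDDEN                       # final norm
         + linear(HIDDEN, VOCAB))       # lm_head (assumed untied)

print(f"per decoder layer: {per_layer:,}")
print(f"total:             {total:,}")
print(f"query heads: {HIDDEN // HEAD_DIM}, key/value heads: {KV_DIM // HEAD_DIM}")
```

Under these assumptions it prints 218,112,000 parameters per decoder layer and 7,241,732,096 in total, which matches the ~7.24B size of Mistral-7B. The 1024-wide `k_proj`/`v_proj` reflect grouped-query attention: 32 query heads share 8 key/value heads, which makes the key/value cache four times smaller than with full multi-head attention.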

## Prompt Format

This model uses the _Guanaco_ prompt format:

```
### Human: <prompt>
### Assistant: <answer>
```

In the cell below, you can change the question without having to touch the format.

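```python
# "Who was the president of the United States in 1996?"
prompt = "در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"
# Wrap the question in the Guanaco template; the model continues after "### Assistant:".
prompt = f"### Human:{prompt}\n### Assistant:"
```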
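
If you plan to ask several questions, it can help to wrap the template in a small function. `build_prompt` below is a hypothetical helper, not part of Maral or transformers. It reproduces the exact string built in the prompt cell (no space after `### Human:`, nothing after `### Assistant:`) so that the model's continuation is the answer.

```python
def build_prompt(question: str) -> str:
    """Wrap a user question in the Guanaco-style template used in this notebook.

    The string ends right after "### Assistant:" so that the model's
    continuation is the answer.
    """
    question = question.strip()  # drop stray leading/trailing whitespace
    return f"### Human:{question}\n### Assistant:"

# "Who was the president of the United States in 1996?"
print(build_prompt("در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟"))
# Maral can also be prompted in English with the same template.
print(build_prompt("Who was the president of the United States in 1996?"))
```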

```python
# Tokenize the prompt and move it to the same device as the model.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
```

### Generation Config

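The cell below gathers the text-generation settings in one place so they are easy to tweak. Note that `top_k=1` keeps only the single most likely token at each step, so even with `do_sample=True` the output is effectively greedy and `temperature` has no visible effect; increase `top_k` if you want more varied answers.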

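```python
generation_config = GenerationConfig(
    do_sample=True,                      # sample instead of always taking the argmax...
    top_k=1,                             # ...but only from the single best token (effectively greedy)
    temperature=0.5,                     # sharpens the distribution; no effect while top_k=1
    max_new_tokens=100,                  # upper bound on the number of generated tokens
    pad_token_id=tokenizer.eos_token_id  # reuse EOS for padding (avoids a missing-pad-token warning)
)
```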
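
To see why `top_k=1` makes sampling deterministic, here is a simplified, pure-Python sketch of temperature plus top-k sampling over a toy list of scores. It illustrates the technique only and is not the transformers implementation; the function name `sample_top_k` and the toy logits are made up for the example. Dividing by the temperature never changes which token scores highest, so once top-k keeps a single candidate, every draw returns that token.

```python
import math
import random

def sample_top_k(logits: list[float], top_k: int, temperature: float, rng: random.Random) -> int:
    """Pick a token id: scale logits by 1/temperature, keep the top_k, then sample.

    A simplified sketch of temperature + top-k sampling, not the transformers code.
    """
    scaled = [x / temperature for x in logits]
    # indices of the top_k largest scaled logits
    kept = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # softmax over the kept logits only (subtract the max for numerical stability)
    m = max(scaled[i] for i in kept)
    weights = [math.exp(scaled[i] - m) for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

logits = [2.0, 1.5, 0.3, -1.0]  # toy scores for a 4-token vocabulary
rng = random.Random(0)

greedy_like = {sample_top_k(logits, top_k=1, temperature=0.5, rng=rng) for _ in range(1000)}
varied = {sample_top_k(logits, top_k=3, temperature=1.0, rng=rng) for _ in range(1000)}
print(greedy_like)  # {0}: only the highest-scoring token can ever be picked
print(varied)       # a subset of {0, 1, 2}; token 3 is never among the top 3
```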

```python
outputs = model.generate(**inputs, generation_config=generation_config)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

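```
### Human:در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟
### Assistant: در سال 1996 بیل کلنتن رییس جمهور آمریکا بود.
### Assistant: بیل کلنتن در سال 1992 به دومین رییس جمهور آمریکا انت
```

In English, the question is "Who was the president of the United States in 1996?" and the first answer is "In 1996, Bill Clinton was the president of the United States." The model then starts a second, unrequested `### Assistant:` turn and is cut off mid-word once `max_new_tokens` is reached.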
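
As the output shows, the model does not stop after its first answer: it keeps adding `### Assistant:` turns until `max_new_tokens` runs out. A simple workaround is to post-process the decoded text. `extract_answer` below is a hypothetical helper that assumes the Guanaco layout used in this notebook: it keeps the text after the first `### Assistant:` and cuts it at the next line starting with `### `. The sample string is the decoded output shown above.

```python
def extract_answer(decoded: str) -> str:
    """Return only the first assistant answer from a decoded generation.

    Assumes the Guanaco-style layout: the text after the first "### Assistant:"
    is the answer, and any later line starting with "### " (a new "### Human:"
    or a repeated "### Assistant:") begins extra, unwanted turns.
    """
    marker = "### Assistant:"
    start = decoded.find(marker)
    if start == -1:
        return decoded.strip()  # no marker: return the text as-is, trimmed
    answer = decoded[start + len(marker):]
    end = answer.find("\n### ")
    if end != -1:
        answer = answer[:end]
    return answer.strip()

sample = (
    "### Human:در سال ۱۹۹۶ چه کسی رییس جمهور آمریکا بود؟\n"
    "### Assistant: در سال 1996 بیل کلنتن رییس جمهور آمریکا بود.\n"
    "### Assistant: بیل کلنتن در سال 1992 به دومین رییس جمهور آمریکا انت"
)
# "In 1996, Bill Clinton was the president of the United States."
print(extract_answer(sample))
```

In the notebook you would pass it the decoded string, for example `extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=True))`.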
