# Apply Phi3 model with HuggingFace Causal LM

![HuggingFace Logo](https://huggingface.co/front/assets/huggingface_logo-noborder.svg)

**HuggingFace** is a popular open-source platform that develops tools for building applications with machine learning. It is widely known for its Transformers library, which contains open-source implementations of transformer models for text, image, and audio tasks.

[**Phi 3**](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/) is a family of AI models developed by Microsoft, designed to redefine what is possible with small language models (SLMs). Microsoft describes Phi-3 models as the most capable and cost-effective SLMs available, [outperforming models of the same size and even larger ones in language](https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/?msockid=26355e446adb6dfa06484f956b686c27), reasoning, coding, and math benchmarks.

<img src="https://pub-66c8c8c5ae474e9a9161c92b21de2f08.r2.dev/2024/04/The-Phi-3-small-language-models-with-big-potential-1.jpg" alt="Phi 3 model performance" width="600">

To make it easier to scale up causal language model prediction on a large dataset, we have integrated [HuggingFace Causal LM](https://huggingface.co/docs/transformers/tasks/language_modeling) with SynapseML. This integration lets you use the Apache Spark distributed computing framework to run text generation tasks over large datasets.

This tutorial shows how to apply the [phi3 model](https://huggingface.co/collections/microsoft/phi-3-6626e15e9585a200d2d761e3) at scale with no extra setup.


In [None]:
# %pip install --upgrade transformers==4.48.0 -q

In [3]:
chats = [
    (1, "fix grammar: helol mi friend"),
    (2, "What is HuggingFace"),
    (3, "translate to Spanish: hello"),
]

chat_df = spark.createDataFrame(chats, ["row_index", "content"])
chat_df.show()


+---------+--------------------+
|row_index|             content|
+---------+--------------------+
|        1|fix grammar: helo...|
|        2| What is HuggingFace|
|        3|translate to Span...|
+---------+--------------------+



## Define and Apply Phi3 model

The following example demonstrates how to load the remote Phi 3 model from HuggingFace and apply it to chats.

In [4]:
from synapse.ml.llm.HuggingFaceCausallmTransform import HuggingFaceCausalLM

phi3_transformer = (
    HuggingFaceCausalLM()
    .setModelName("microsoft/Phi-3-mini-4k-instruct")
    .setInputCol("content")
    .setOutputCol("result")
    .setModelParam(max_new_tokens=1000)
    .setModelConfig(local_files_only=False, trust_remote_code=True)
)
result_df = phi3_transformer.transform(chat_df).collect()
display(result_df)


## Use local cache

By caching the model, you can reduce initialization time. On Fabric, store the model in a Lakehouse and use `setCachePath` to load it.

In [5]:
# %%sh
# azcopy copy "https://mmlspark.blob.core.windows.net/huggingface/microsoft/Phi-3-mini-4k-instruct" "/lakehouse/default/Files/microsoft/" --recursive=true


In [6]:
# phi3_transformer = (
#     HuggingFaceCausalLM()
#     .setCachePath("/lakehouse/default/Files/microsoft/Phi-3-mini-4k-instruct")
#     .setInputCol("content")
#     .setOutputCol("result")
#     .setModelParam(max_new_tokens=1000)
# )
# result_df = phi3_transformer.transform(chat_df).collect()
# display(result_df)
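
Before pointing `setCachePath` at a directory, it can help to confirm that the copy actually landed there. The following is a minimal sketch, not part of SynapseML: `has_cached_model` is a hypothetical helper, and it assumes the cached directory follows the usual HuggingFace layout with a `config.json` file at its root.

```python
import os


def has_cached_model(cache_path: str) -> bool:
    """Return True if cache_path looks like a downloaded HuggingFace model.

    Hypothetical helper, not a SynapseML API. It assumes the model directory
    contains a config.json file, as HuggingFace model repositories usually do.
    """
    return os.path.isfile(os.path.join(cache_path, "config.json"))


# Path used by the cached example in this section.
cache_path = "/lakehouse/default/Files/microsoft/Phi-3-mini-4k-instruct"

if has_cached_model(cache_path):
    print(f"Found cached model at {cache_path}; pass it to setCachePath.")
else:
    print("No cached model found; fall back to setModelName to download it.")
```

If the check fails, the remote-loading configuration from the previous section is a reasonable fallback.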


## Utilize GPU

To use a GPU, pass `device_map="cuda"` and `torch_dtype="auto"` to `setModelConfig`.

In [7]:
# phi3_transformer = (
#     HuggingFaceCausalLM()
#     .setModelName("microsoft/Phi-3-mini-4k-instruct")
#     .setInputCol("content")
#     .setOutputCol("result")
#     .setModelParam(max_new_tokens=1000)
#     .setModelConfig(device_map="cuda", torch_dtype="auto", local_files_only=False, trust_remote_code=True)
# )
# result_df = phi3_transformer.transform(chat_df).collect()
# display(result_df)
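
To keep one notebook working on both CPU and GPU pools, the model configuration can be built conditionally. This is a minimal sketch under stated assumptions: `build_model_config` and `gpu_available` are hypothetical helpers, not SynapseML APIs, and the CUDA check runs on the driver, which may not reflect the hardware on the Spark executors.

```python
def build_model_config(use_gpu: bool) -> dict:
    """Build keyword arguments for setModelConfig (hypothetical helper)."""
    config = {"local_files_only": False, "trust_remote_code": True}
    if use_gpu:
        # Same GPU options as in the GPU cell of this section.
        config.update(device_map="cuda", torch_dtype="auto")
    return config


def gpu_available() -> bool:
    """Return True if PyTorch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


model_config = build_model_config(gpu_available())
print(model_config)

# Usage, assuming HuggingFaceCausalLM is imported as shown earlier:
# HuggingFaceCausalLM().setModelConfig(**model_config)
```

On a cluster where the driver and executors have different hardware, passing `use_gpu` explicitly is the safer choice.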
