# Hugging Face Inference Endpoints

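This notebook demonstrates how to query a model served through a Hugging Face Inference Endpoint. The model in this example is a small text generation model, `meta-llama/Llama-3.2-3B-Instruct`, taken from the Hugging Face Hub. The endpoint exposes an OpenAI-compatible chat completions API, so it can be called with the `openai` library by pointing the client's `base_url` at the endpoint and passing a Hugging Face access token as the API key.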

In [5]:
from openai import OpenAI
from decouple import config

HF_ACCESS_TOKEN = config("HF_TOKEN")

# The endpoint serves `meta-llama/Llama-3.2-3B-Instruct` (https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct)
client = OpenAI(
	base_url="https://o4d7qjr8aeavpeiz.us-east-1.aws.endpoints.huggingface.cloud/v1/", 
	api_key=HF_ACCESS_TOKEN
)

chat_completion = client.chat.completions.create(
	model="tgi",
	messages=[
		{
			"role": "user",
			"content": "What is deep learning?"
		}
	],
	top_p=None,
	temperature=None,
	max_tokens=150,
	stream=True,
	seed=None,
	stop=None,
	frequency_penalty=None,
	presence_penalty=None
)

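# Print tokens as they arrive; a chunk's delta may carry no content (None), so fall back to "".
for message in chat_completion:
	print(message.choices[0].delta.content or "", end="")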

Deep learning is a subset of machine learning that involves the use of artificial neural networks (ANNs) to analyze and interpret data. ANNs are modeled after the human brain, with a network of interconnected nodes (neurons) that process inputs and produce outputs.

In traditional machine learning, algorithms are used to learn patterns from data through a process called supervised or unsupervised learning. However, these algorithms often require extensive manual tuning and feature engineering to produce accurate results.

Deep learning addresses these limitations by using a multilayered architecture, with each layer learning a different level of abstraction. The inputs are fed into the network as a raw dataset, and the output is produced after passing the data through multiple layers of complex representations.

The key characteristics of deep
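The response above ends mid-sentence, which is consistent with the `max_tokens=150` limit set in the request. As a minimal sketch, the streaming loop can be factored into a helper that collects the text and also reports why generation stopped. The names `collect_stream` and `fake_chunk` are hypothetical and not part of the original notebook; the fake chunks only mimic the `choices[0].delta.content` and `choices[0].finish_reason` fields of the streamed objects so that the example runs without a live endpoint. The `"length"` value is the finish reason the OpenAI chat completions format uses when the token limit is reached; it is assumed here that the endpoint reports it the same way.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Join the text deltas of a streamed chat completion.

    Returns a (text, finish_reason) tuple, where finish_reason is the
    last non-None value seen on any chunk, or None if none was reported.
    """
    parts = []
    finish_reason = None
    for chunk in chunks:
        if not chunk.choices:
            # Some chunks may carry no choices at all; skip them.
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content, finish_reason=None):
    """Hypothetical stand-in for one streamed chunk, for offline testing."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# Simulated stream: two text deltas, then a final chunk with no content.
demo_stream = [
    fake_chunk("Deep "),
    fake_chunk("learning"),
    fake_chunk(None, finish_reason="length"),
]
text, reason = collect_stream(demo_stream)
print(text)
print(reason)
```

With a live endpoint, the same helper could be called as `collect_stream(chat_completion)` in place of the printing loop. If the returned reason is `"length"`, raising `max_tokens` should let the model finish its answer.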