# Transformations - SentenceSplitter
- After the data is loaded, you need to process and transform it before putting it into a storage system.
- These transformations include chunking, extracting metadata, and embedding each chunk. They ensure that the data can be retrieved and used optimally by the LLM.

- Transformation inputs and outputs are Node objects (a Document is a subclass of Node). Transformations can also be stacked and reordered.
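To make the Node-in, Node-out contract concrete, here is a minimal plain-Python sketch. `Node`, `split_paragraphs`, `add_length_metadata` and `run_pipeline` are hypothetical stand-ins, not LlamaIndex classes; in LlamaIndex the corresponding roles are played by node parsers, metadata extractors and embedding models listed in a `transformations` list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-in for a LlamaIndex Node: text plus metadata.
@dataclass
class Node:
    text: str
    metadata: Dict[str, object] = field(default_factory=dict)


# A transformation takes a list of nodes and returns a new list of nodes.
Transformation = Callable[[List[Node]], List[Node]]


def split_paragraphs(nodes: List[Node]) -> List[Node]:
    """Hypothetical chunker: one output node per non-empty paragraph."""
    out: List[Node] = []
    for node in nodes:
        for para in node.text.split("\n\n"):
            if para.strip():
                # Each chunk inherits a copy of its parent's metadata.
                out.append(Node(para.strip(), dict(node.metadata)))
    return out


def add_length_metadata(nodes: List[Node]) -> List[Node]:
    """Hypothetical metadata extractor: record each node's character count."""
    return [Node(n.text, {**n.metadata, "num_chars": len(n.text)}) for n in nodes]


def run_pipeline(nodes: List[Node], transformations: List[Transformation]) -> List[Node]:
    # Every step is Node-in, Node-out, so steps can be stacked in any order.
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


doc = Node("LlamaIndex loads data.\n\nThen it transforms it.", {"file_name": "llamaindex.txt"})
nodes = run_pipeline([doc], [split_paragraphs, add_length_metadata])
for n in nodes:
    print(n.metadata["num_chars"], n.text)
```

Swapping the order to `[add_length_metadata, split_paragraphs]` still runs, but both chunks then inherit the whole document's length (46 characters), which shows why reordering stacked transformations can change the result.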



In [1]:
import os
from getpass import getpass
from huggingface_hub import login

In [2]:
HF_TOKEN = getpass()



In [3]:
login(token=HF_TOKEN)

Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to C:\Users\amirs\.cache\huggingface\token
Login successful


In [4]:
# create llm model
from llama_index.llms.huggingface import HuggingFaceInferenceAPI
llm = HuggingFaceInferenceAPI(model_name="mistralai/Mixtral-8x7B-Instruct-v0.1", token=HF_TOKEN)
llm

HuggingFaceInferenceAPI(callback_manager=<llama_index.core.callbacks.base.CallbackManager object at 0x000002BF10106C10>, system_prompt=None, messages_to_prompt=<function messages_to_prompt at 0x000002BF1B8B2D40>, completion_to_prompt=<function default_completion_to_prompt at 0x000002BF1B945440>, output_parser=None, pydantic_program_mode=<PydanticProgramMode.DEFAULT: 'default'>, query_wrapper_prompt=None, model_name='mistralai/Mixtral-8x7B-Instruct-v0.1', token='<redacted>', timeout=None, headers=None, cookies=None, task=None, context_window=3900, num_output=256, is_chat_model=False, is_function_calling_model=False)

In [5]:
# load text file
from llama_index.core import SimpleDirectoryReader
reader = SimpleDirectoryReader(input_files=["llamaindex.txt"])
documents = reader.load_data()

In [6]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings

In [7]:
text_splitter = SentenceSplitter(chunk_size=1000, chunk_overlap=10)
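SentenceSplitter tries to keep whole sentences together while packing text into chunks of up to `chunk_size` tokens, repeating `chunk_overlap` tokens between neighbouring chunks. The sketch below is a simplified, hypothetical version of that idea, not the library's algorithm: it counts whitespace-separated words instead of tokens, splits sentences with a naive regex, and keeps an over-long sentence whole instead of splitting it further.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    # Naive split after ., ! or ? followed by whitespace (an assumption;
    # the real splitter uses its own sentence tokenization).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Greedily pack whole sentences into chunks of roughly chunk_size words,
    seeding each new chunk with the last chunk_overlap words of the previous one.
    A single sentence longer than chunk_size is kept whole in this sketch."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        words = sentence.split()
        # Close the current chunk if this sentence would push it past chunk_size.
        if current and len(current) + len(words) > chunk_size:
            chunks.append(" ".join(current))
            # Carry the tail of the closed chunk forward as overlap.
            current = current[-chunk_overlap:] if chunk_overlap > 0 else []
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "One two three. Four five six. Seven eight nine. Ten eleven twelve."
for chunk in sentence_chunks(text, chunk_size=6, chunk_overlap=1):
    print(chunk)
# One two three. Four five six.
# six. Seven eight nine.
# nine. Ten eleven twelve.
```

With the notebook's `chunk_size=1000, chunk_overlap=10`, chunks are large and share only a small tail, so most chunk boundaries fall between sentences.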

In [8]:
# global: default splitter for every index built afterwards
Settings.text_splitter = text_splitter

In [9]:
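# per-index: transformations passed here apply to this index only
# embed_model='local' runs a HuggingFace embedding model locally (requires llama-index-embeddings-huggingface)
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents, transformations=[text_splitter], embed_model='local')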
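Setting `Settings.text_splitter` changes the default used by every index, while the `transformations` argument to `from_documents` applies only to that one index. As far as we can tell from `llama_index.core`, `from_documents` uses the `transformations` argument when one is given and otherwise falls back to `Settings.transformations` (which by default wraps the configured text splitter). The sketch below mimics only that fallback rule with hypothetical names; it does not call LlamaIndex.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HypotheticalSettings:
    # Hypothetical stand-in for the global llama_index.core.Settings object.
    transformations: List[str] = field(default_factory=lambda: ["global_splitter"])


def resolve_transformations(
    settings: HypotheticalSettings, transformations: Optional[List[str]] = None
) -> List[str]:
    # A per-index list wins when provided; otherwise use the global default.
    # Because this uses `or`, an empty list also falls back to the global default.
    return transformations or settings.transformations


settings = HypotheticalSettings()
print(resolve_transformations(settings))                      # ['global_splitter']
print(resolve_transformations(settings, ["index_splitter"]))  # ['index_splitter']
```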

In [10]:
query_engine = index.as_query_engine(llm=llm)

In [11]:
response = query_engine.query("What is llamaindex?")
print(response)



LlamaIndex is an advanced and versatile toolkit written in Python for building large-scale knowledge graphs powered by artificial intelligence models. It facilitates rapid development and deployment of intelligent applications, offering features like scalable data ingestion, modular architecture, integration with leading AI models, high customizability, and extensive plugin support.
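Behind `query_engine.query`, the index embeds the question, retrieves the most similar chunks, and passes them to the LLM as context. The toy sketch below illustrates that retrieve-then-prompt flow in plain Python, assuming bag-of-words vectors in place of a real embedding model and a hypothetical prompt template in `build_prompt`. `k=2` mirrors what we believe is LlamaIndex's default `similarity_top_k`, but the library's actual scoring and prompts differ.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    # Toy "embedding": lowercase word counts with punctuation stripped.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    # Score every chunk against the query and keep the k best.
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    # Hypothetical template: retrieved chunks become the LLM's context.
    context = "\n".join(c for _, c in top_k(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


chunks = [
    "LlamaIndex is a data framework for LLM applications.",
    "Bananas are yellow.",
    "LlamaIndex connects data sources to LLMs.",
]
print(build_prompt("What is LlamaIndex?", chunks))
```

Because the LLM only sees the retrieved chunks, the chunking choices made earlier (chunk size and overlap) directly shape what context is available when answering.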
