In [1]:
!pip install -qU \
    transformers \
    accelerate \
    bitsandbytes \
    langchain==0.0.354 \
    chromadb \
    voyageai 

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
cuml 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
dask-cudf 23.8.0 requires cupy-cuda11x>=12.0.0, which is not installed.
apache-beam 2.46.0 requires dill<0.3.2,>=0.3.1.1, but you have dill 0.3.7 which is incompatible.
apache-beam 2.46.0 requires pyarrow<10.0.0,>=3.0.0, but you have pyarrow 11.0.0 which is incompatible.
cudf 23.8.0 requires pandas<1.6.0dev0,>=1.3, but you have pandas 2.0.3 which is incompatible.
cudf 23.8.0 requires protobuf<5,>=4.21, but you have protobuf 3.20.3 which is incompatible.
cuml 23.8.0 requires dask==2023.7.1, but you have dask 2023.12.1 which is incompatible.
cuml 23.8.0 requires distributed==2023.7.1, but you have distributed 2023.12.1 which is incompatible.
dask-cuda 23.8.0 requires dask==2023.7

In [2]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, pipeline
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
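
For a rough sense of why 4-bit loading matters on a single Kaggle GPU, the sketch below estimates the memory taken by the weights at different precisions. The parameter count (about 7.24 billion) is an assumption used for illustration, and `weight_gib` is a hypothetical helper. Real usage will be higher, since some layers stay in higher precision and activations and the KV cache need memory too.

```python
# Back-of-the-envelope weight memory for a ~7B-parameter model.
# ASSUMPTION: 7.24e9 parameters is an approximate figure, not a measured one.
# The estimate ignores activations, the KV cache and quantization constants.
N_PARAMS = 7.24e9


def weight_gib(n_params: float, bits_per_param: float) -> float:
    """Return the size of the weights in GiB at the given precision."""
    return n_params * bits_per_param / 8 / 2**30


for label, bits in [("bfloat16", 16), ("int8", 8), ("nf4 (4-bit)", 4)]:
    print(f"{label:>12}: {weight_gib(N_PARAMS, bits):5.1f} GiB")
```

Under these assumptions the 4-bit weights take a quarter of the bfloat16 footprint, which is what lets the model fit alongside the retrieval stack.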



In [3]:
#load Mistral 7B Instruct from the Kaggle input directory and initialize the tokenizer
model_name = "/kaggle/input/mistral/pytorch/7b-instruct-v0.1-hf/1"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
        model_name,
        quantization_config=bnb_config,
        torch_dtype=torch.bfloat16,
        device_map="auto",
        trust_remote_code=True,
    )




In [4]:
from langchain.llms import HuggingFacePipeline
import transformers

text_generation_pipeline = transformers.pipeline(
    task="text-generation",
    model=model,
    tokenizer=tokenizer,
    device_map="auto",
    temperature=0.1,
    do_sample=True,  # must be a real bool; temperature only takes effect when sampling
    torch_dtype=torch.bfloat16,
    repetition_penalty=1.2,
    return_full_text=True,
    max_new_tokens=300,
)
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
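
The generation settings interact: `temperature` only has an effect when sampling is switched on, and `do_sample` needs a real boolean, because any non-empty string (even one that spells False) is truthy in Python. The sketch below shows both points with a toy softmax. `softmax_with_temperature` and the logits are hypothetical and only illustrate the idea; this is not how the library implements sampling internally.

```python
import math

# A non-empty string is always truthy, whatever it spells.
print(bool("False"))  # True
print(bool(False))    # False


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(toy_logits, 0.1)  # low temperature
flat = softmax_with_temperature(toy_logits, 1.0)   # neutral temperature
print([round(p, 3) for p in sharp])
print([round(p, 3) for p in flat])
```

A low temperature such as 0.1 concentrates almost all probability on the top token, so sampled output stays close to greedy decoding while still being technically random.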

-------------------------

In [5]:
#import dataset
from langchain.document_loaders.csv_loader import CSVLoader
file_path = '/kaggle/input/pokemon-corpus-2/corpus_df_2.csv'
loader = CSVLoader(file_path=file_path)

data = loader.load()
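
To see what the loaded documents look like, here is a minimal sketch of turning CSV rows into text, one document per row with a `column: value` line per field. This approximates what the loader produces rather than reproducing its code; `rows_to_documents` and the toy CSV are hypothetical. An unnamed first column (for example a saved DataFrame index) shows up as a bare `: value` line.

```python
import csv
import io

# Toy CSV with an unnamed index column, similar in shape to a saved DataFrame.
toy_csv = io.StringIO(
    ",pokemon_info\n"
    "0,Bulbasaur is a Grass/Poison-type Pokémon.\n"
    "1,Espeon is a Psychic-type Pokémon.\n"
)


def rows_to_documents(handle):
    """Render each CSV row as 'column: value' lines, one string per row."""
    documents = []
    for row in csv.DictReader(handle):
        lines = [f"{key.strip()}: {value.strip()}" for key, value in row.items()]
        documents.append("\n".join(lines))
    return documents


toy_documents = rows_to_documents(toy_csv)
print(toy_documents[1])
```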

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1500,
    chunk_overlap = 150
)

#Create a split of the document using the text splitter
splits = text_splitter.split_documents(data)
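
The effect of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window over characters. This is a simplification: the recursive splitter used above first tries to break on separators such as paragraphs and newlines, so its chunks are usually shorter and cleaner. `sliding_chunks` is a hypothetical helper, not part of LangChain.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


toy_text = "".join(str(i % 10) for i in range(4000))  # 4000 characters
chunks = sliding_chunks(toy_text, chunk_size=1500, chunk_overlap=150)
print([len(c) for c in chunks])             # [1500, 1500, 1300]
print(chunks[0][-150:] == chunks[1][:150])  # True
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.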

--------------------

In [7]:
#embedding model
#voyage2
from langchain_community.embeddings import VoyageEmbeddings

embeddings = VoyageEmbeddings(model="voyage-2",
                              voyage_api_key="YOUR_API_KEY",
                              show_progress_bar=True)


In [8]:
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(splits, embeddings)


In [14]:
query = "How can I evolve Eevee to Espeon?"

docs = vectorstore.similarity_search(query)

print(docs[0].page_content)


: 242
pokemon_info: Espeon (Japanese: エーフィ Eifie) is a Psychic-type Pokémon introduced in Generation II.
It evolves from Eevee when leveled up with high friendship during the day.
(Specifics may differ in past games. Refer to Game data→Evolution data for these details.)
It is one of Eevee's final forms, the others being Vaporeon, Jolteon, Flareon, Umbreon, Leafeon, Glaceon, and Sylveon.
Espeon is the starter Pokémon of Pokémon Colosseum alongside Umbreon.
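
Conceptually, a similarity search embeds the query and ranks stored chunks by how close their vectors are to it. The sketch below uses cosine similarity on tiny hand-made vectors. The vectors, `cosine_similarity` and `top_k` are all hypothetical, and the metric actually used depends on how the Chroma collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the names of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), name)
              for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical 3-dimensional "embeddings"; real ones have far more dimensions.
toy_index = {
    "Espeon": [0.9, 0.1, 0.0],
    "Umbreon": [0.7, 0.6, 0.1],
    "Garchomp": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
print(top_k(toy_query, toy_index, k=2))  # ['Espeon', 'Umbreon']
```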


-----------------------------

In [81]:
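#conversational retrieval chain with chat history
from langchain.chains import ConversationalRetrievalChain
from langchain.prompts import PromptTemplate

#retrieve the 5 most similar chunks for each question
retriever = vectorstore.as_retriever(search_type='similarity', search_kwargs = {"k" : 5})

# This controls how the standalone question is generated.
# Should take `chat_history` and `question` as input variables.
# It only sees the chat history and the question; the retrieved context is
# added later by the chain's default question-answering prompt.
# Mistral-instruct expects one [INST] ... [/INST] block, and the tokenizer
# adds the <s> token itself, so <s> and </s> are not written in the template.
template = (
    """
    [INST]
    You are an assistant for question-answering tasks.
    Use the following pieces of retrieved context to
    answer the question. Combine the chat history and
    follow up question into a standalone question, if
    the chat history is relevant to the question. If
    you don't know the answer, say you don't know.
    If the chat history is not relevant to the question,
    do not focus on it.

    Chat History: {chat_history}
    Follow up question: {question}
    [/INST]
    """
)
# PromptTemplate keeps the text as written; a ChatPromptTemplate would be
# rendered with a "Human: " prefix when passed to a plain text-generation LLM.
prompt = PromptTemplate.from_template(template)
llm = mistral_llm
# Pass the prompt by keyword so it is clear which step it controls.
chain = ConversationalRetrievalChain.from_llm(
    llm,
    retriever,
    condense_question_prompt=prompt,
    return_source_documents=True,
)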
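
How the chain uses `chat_history` can be sketched in plain Python. As far as I understand the chain's behaviour, it flattens the `(question, answer)` tuples into a transcript and asks the LLM to rewrite the follow-up as a standalone question, and it skips that rewrite when the history is empty. The helpers and the shortened template below are hypothetical stand-ins, not LangChain code.

```python
def format_chat_history(chat_history):
    """Flatten (question, answer) tuples into one transcript string."""
    lines = []
    for human, ai in chat_history:
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {ai}")
    return "\n".join(lines)


# Shortened, hypothetical version of a question-rewriting prompt.
CONDENSE_TEMPLATE = (
    "[INST] Rewrite the follow up question as a standalone question.\n"
    "Chat History: {chat_history}\n"
    "Follow up question: {question} [/INST]"
)


def build_condense_prompt(question, chat_history):
    """Return the prompt for the rewrite step, or None when there is no history."""
    if not chat_history:
        return None  # first turn: the question goes to retrieval unchanged
    return CONDENSE_TEMPLATE.format(
        chat_history=format_chat_history(chat_history),
        question=question,
    )


history = [("Is earthquake good to use against Pidgeotto?",
            "No, Earthquake is not good to use against Pidgeotto.")]
print(build_condense_prompt("What would be a good move against it?", history))
print(build_condense_prompt("How can I evolve Eevee to Espeon?", []))
```

This is why a follow-up such as "What would be a good move against it?" can still retrieve Pidgeotto-related chunks: the rewritten question, not the raw follow-up, is what gets embedded and searched.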


In [82]:
chat_history = []

question = "Is earthquake good to use against Pidgeotto?"

result = chain({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(result['answer'])


Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 No, Earthquake is not good to use against Pidgeotto.


In [83]:
question = "What would be a good move against it?"

result = chain({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(result['answer'])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 
I think the best move against Pidgeotto would be Stone Edge.


In [84]:
question = "Is Clefable or Garchomp better for competitive battling?"

result = chain({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(result['answer'])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 
Garchomp is generally considered better for competitive battling due to its versatility and strong offensive capabilities.


In [85]:
question = "Which one is better for a defensive build?"

result = chain({"question": question, "chat_history": chat_history})
chat_history.append((question, result["answer"]))
print(result['answer'])

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.



Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


 Both Clefable and Garchomp are excellent choices for defensive builds in competitive battling, each bringing unique strengths and weaknesses to the table. Clefable's decent bulk, reliable recovery, and fantastic defensive typing allow it to act as a solid check to Pokemon like Garchomp, Mega Medicham, Mega Lopunny, and Hawlucha, making it a valuable defensive Pokemon in the tier. On the other hand, Garchomp's powerful offensive capabilities and strong defensive typing make it a formidable opponent for any defensive build. Ultimately, the choice between these two will depend on your specific playstyle and preferences.
