# Pokémon competition

In this notebook you have to provide the best pipeline that you have found to predict Pokémon battles.

At the end you will have to generate a set of predictions over the unlabeled data `data.hidden` and `data_inverse.hidden`. These unlabeled datasets contain all the Pokémon battles that will take place in a *fictional* Pokémon competition, so we do not know the outcome of these battles yet!

Remember to use all the tools that we have seen in class to evaluate and fine-tune your pipeline.

*Gotta Predict 'Em All!*

Paste here your pipeline:

In [1]:
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.model_selection import train_test_split
import pandas
from sklearn.impute import KNNImputer
from pathlib import Path
from sklearn.naive_bayes import GaussianNB

__wd__ = Path("__file__").resolve().parent
datasets_path = __wd__ / "datasets"

In [2]:
def get_Xy(dataset):
    return dataset.drop("Wins", axis=1), dataset["Wins"]

In [None]:
processing_df = pandas.read_csv(datasets_path / "data.train", index_col=0)

processing_df['Wins'] = processing_df['Wins'].astype('Int64')
processing_df = processing_df.drop(columns=['Legendary', 'Legendary__other','Generation','Generation__other','Type 2','Type 2__other'])

cat = ['Name','Name__other','Type 1','Type 1__other']
num = ['HP', 'Attack', 'Defense', 'Sp. Atk', 'Sp. Def', 'Speed','HP__other', 'Attack__other', 'Defense__other', 'Sp. Atk__other', 'Sp. Def__other', 'Speed__other']

pipeline = Pipeline(
    steps=[
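        # OneHotEncoder's `sparse` argument was renamed to `sparse_output` in
        # scikit-learn 1.2 and removed in 1.4, so it is not passed here.
        # sparse_threshold=0 makes the ColumnTransformer return a dense array,
        # which KNNImputer and GaussianNB require.
        ("encode", ColumnTransformer(transformers=[
            ("cat_name", OneHotEncoder(handle_unknown='ignore'), cat)
        ], remainder='passthrough', sparse_threshold=0)),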
        ("imputer", KNNImputer(n_neighbors=1)),
        ("classifier", GaussianNB())
])

X, y = get_Xy(processing_df) # X = features , y = target (Win/Loss)
y = y.astype('int')

pipeline.fit(X,y)
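
The cell above fits the pipeline on the full training set, so it gives no estimate of how well the pipeline predicts. A minimal sketch of scoring a pipeline of the same shape with 5-fold cross-validation follows. The data is a synthetic stand-in (made-up values, a reduced set of columns, and a toy rule where the faster Pokémon always wins), and the names `toy`, `toy_pipeline` and `scores` are illustrative, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in for data.train: values are made up for illustration.
rng = np.random.default_rng(0)
n = 60
wins = np.tile([0, 1], n // 2)  # balanced toy labels
speed_other = rng.integers(30, 100, size=n)
gap = rng.integers(1, 30, size=n)
# Toy rule (an assumption): the first Pokémon wins exactly when it is faster.
speed = np.where(wins == 1, speed_other + gap, speed_other - gap)

toy = pd.DataFrame({
    "Type 1": rng.choice(["Fire", "Water", "Grass"], size=n),
    "Type 1__other": rng.choice(["Fire", "Water", "Grass"], size=n),
    "Speed": speed.astype(float),
    "Speed__other": speed_other.astype(float),
    "Wins": wins,
})
X_toy, y_toy = toy.drop("Wins", axis=1), toy["Wins"]

# Same three steps as the notebook pipeline, on the reduced column set.
toy_pipeline = Pipeline(steps=[
    ("encode", ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"),
             ["Type 1", "Type 1__other"]),
        ],
        remainder="passthrough",
        sparse_threshold=0,  # always return a dense array
    )),
    ("imputer", KNNImputer(n_neighbors=1)),
    ("classifier", GaussianNB()),
])

# Each fold refits the whole pipeline, so the encoder and imputer
# never see the rows they are scored on.
scores = cross_val_score(toy_pipeline, X_toy, y_toy, cv=5, scoring="accuracy")
print("accuracy per fold:", scores)
print("mean accuracy:", scores.mean())
```

Running the same call on the real `pipeline`, `X` and `y` should give a comparable per-fold accuracy estimate for the actual data, which can then be used to compare alternative classifiers or hyperparameters.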

**We changed the code below because it was the only way we found to use our pipeline without errors.** We could not find a correct way to reduce the dataset dimensionality (dropping the unused columns) inside the pipeline.
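
One possible way to drop the unused columns inside the pipeline is to list the wanted columns explicitly in the `ColumnTransformer` and set `remainder='drop'`, so every column that is not listed is discarded at both fit and predict time. The sketch below uses a tiny made-up frame that only mirrors some of the notebook's column names; `toy`, `drop_pipeline` and the reduced column lists are illustrative assumptions, not the notebook's real data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for data.train (values are made up for illustration).
toy = pd.DataFrame({
    "Name": ["Pikachu", "Onix", "Eevee", "Abra"],
    "Type 1": ["Electric", "Rock", "Normal", "Psychic"],
    "HP": [35.0, 35.0, np.nan, 25.0],
    "Speed": [90.0, 70.0, 55.0, 90.0],
    "Legendary": [False, False, False, False],  # column we do not want
    "Generation": [1, 1, 1, 1],                 # column we do not want
    "Wins": [1, 0, 1, 0],
})
X_toy, y_toy = toy.drop(columns="Wins"), toy["Wins"]

cat_cols = ["Name", "Type 1"]
num_cols = ["HP", "Speed"]

drop_pipeline = Pipeline(steps=[
    ("select", ColumnTransformer(
        transformers=[
            ("cat", OneHotEncoder(handle_unknown="ignore"), cat_cols),
            ("num", "passthrough", num_cols),
        ],
        remainder="drop",    # unlisted columns (Legendary, Generation) are discarded
        sparse_threshold=0,  # always return a dense array
    )),
    ("imputer", KNNImputer(n_neighbors=1)),
    ("classifier", GaussianNB()),
])

# The raw frame still contains the unwanted columns; the pipeline removes them.
drop_pipeline.fit(X_toy, y_toy)
preds = drop_pipeline.predict(X_toy)

# 4 one-hot names + 4 one-hot types + 2 numeric columns remain after selection.
encoded = drop_pipeline.named_steps["select"].transform(X_toy)
print(encoded.shape)
```

With this layout the hidden datasets could be passed to `predict` as loaded, without calling `drop` on them first, provided they contain every column named in `cat_cols` and `num_cols`.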

In [4]:
# !!!!!!!!!!!!!!!!!!!!!!!!
# Do not change this code
# !!!!!!!!!!!!!!!!!!!!!!!!
import pandas
from pathlib import Path

__wd__ = Path("__file__").resolve().parent
datasets_path = __wd__ / "datasets"

tournament = pandas.read_csv(datasets_path / "data.hidden",index_col=0)
tournament.drop(columns=['Legendary', 'Legendary__other','Generation','Generation__other','Type 2','Type 2__other'],inplace=True)

tournament_inverse = pandas.read_csv(datasets_path / "data_inverse.hidden",index_col=0)
tournament_inverse.drop(columns=['Legendary', 'Legendary__other','Generation','Generation__other','Type 2','Type 2__other'],inplace=True)

y_predicted = pipeline.predict(tournament)
y_inverse_predicted = pipeline.predict(tournament_inverse)

y_predicted.tofile("predicted.csv", sep=",")
y_inverse_predicted.tofile("predicted_inverse.csv", sep=",")
# !!!!!!!!!!!!!!!!!!!!!!!!
# Do not change this code
# !!!!!!!!!!!!!!!!!!!!!!!!