# Introduction to the League of Legends Champion Dataset

In this notebook, we'll introduce the dataset containing attributes of various champions from the popular online game, League of Legends. Understanding the dataset is the first step in any data analysis or machine learning project.

## Loading the Dataset

Let's start by loading the dataset and taking a look at the first few rows.

First, load the data from the `inforchampion2.csv` file.

In [None]:
# Your code goes here
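
As a minimal sketch of this step, assuming pandas is available: in the notebook the call would be `pd.read_csv('inforchampion2.csv')`, but the example below reads a small in-memory CSV so that it runs without the file. The sample rows, the column names, and the `champion_df` variable name are hypothetical placeholders, not values taken from the real dataset.

```python
import io

import pandas as pd

# Hypothetical sample rows standing in for inforchampion2.csv;
# the real file's columns and values may differ.
sample_csv = io.StringIO(
    "name,Role,Health,Attack,Defense,Attack Speed\n"
    "ChampA,Fighter,650,60,38,0.651\n"
    "ChampB,Mage,590,53,21,0.668\n"
    "ChampC,Tank,685,62,47,0.625\n"
)

# In the notebook this would be: champion_df = pd.read_csv('inforchampion2.csv')
champion_df = pd.read_csv(sample_csv)

# Show the first few rows, then the column dtypes and non-null counts
print(champion_df.head())
champion_df.info()
```

Checking `info()` right after loading is a quick way to spot columns that were parsed as text instead of numbers.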

## Exploratory Data Analysis (EDA)

Exploratory Data Analysis (EDA) is an essential step in the data analysis process. It involves visualizing and understanding the patterns, relationships, anomalies, and other characteristics of the data. Through EDA, we can gain valuable insights that can guide our subsequent analyses and modeling efforts.

In this section, we'll:
1. Visualize the distribution of attributes.
2. Understand the distribution of attributes by role.
3. Calculate basic statistics for the attributes.
4. Explore attribute correlations.

### 1. Visualize the Distribution of Attributes

We'll start by visualizing the distribution of numerical attributes such as Health, Attack, Defense, and Attack Speed. Histograms are a great way to understand the distribution of continuous data. Let's plot histograms for these attributes.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

# Setting the style for seaborn plots
sns.set_style('whitegrid')

# Plotting histograms for the numerical attributes using matplotlib
# Declare a list called `attributes` containing 'Health', 'Attack', 'Defense', and 'Attack Speed'
# Plot the histograms by indexing the DataFrame with that list
# For example: df[attributes].hist()
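
One way the exercise above might look, as a sketch only: the DataFrame below is a made-up stand-in for the real champion data, and the `bins` and `figsize` values are arbitrary choices rather than settings from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for the champion DataFrame; values are made up
df = pd.DataFrame({
    "Health": [650, 590, 685, 610, 630, 600],
    "Attack": [60, 53, 62, 58, 64, 55],
    "Defense": [38, 21, 47, 30, 33, 26],
    "Attack Speed": [0.651, 0.668, 0.625, 0.640, 0.658, 0.644],
})

attributes = ["Health", "Attack", "Defense", "Attack Speed"]

# DataFrame.hist draws one histogram per selected column
# and returns an array of matplotlib axes
axes = df[attributes].hist(bins=5, figsize=(10, 8))
plt.tight_layout()
# In a notebook the figure renders when the cell finishes;
# in a plain script, call plt.show() here.
```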

The histograms provide insights into the distribution of the champion attributes:

- **Health**: Most champions have health values clustered around 600-650. There's a slight skew towards higher health values, indicating that there are a few champions with exceptionally high health.
- **Attack**: The attack values are fairly normally distributed, with most champions having attack values between 55 and 65.
- **Defense**: The defense values show a slight skew towards higher values, with a peak around 30-35.
- **Attack Speed**: The attack speed values are clustered around 0.625 to 0.675, with a few outliers.

### 2. Understand the Distribution of Attributes by Role

Next, we'll explore how these attributes vary based on the role of the champion. This will help us understand if certain roles tend to have higher or lower values for specific attributes.

We'll use box plots to visualize the distribution of attributes for each role. Box plots provide a summary of the central tendency, variability, and distribution shape of a dataset. They also help identify outliers in the data. Let's plot box plots for the attributes grouped by the champion's role.

In [None]:
plt.figure(figsize=(15, 10))

for index, attribute in enumerate(attributes, 1):
    plt.subplot(2, 2, index)
    sns.boxplot(x='Role', y=attribute, data=champions_df, palette='pastel')
    plt.title(f'Distribution of {attribute} by Role')
    plt.xticks(rotation=45)

plt.tight_layout()
plt.show()

The box plots provide insights into the distribution of champion attributes based on their roles:

- **Health**:
  - Tanks have the highest median health, which is expected as they are designed to absorb damage.
  - Supports and Marksmen have the lowest median health, indicating they are more fragile.
- **Attack**:
  - Assassins and Fighters have higher median attack values, indicating they are more damage-oriented.
  - Supports have the lowest median attack, as their primary role is not to deal damage.
- **Defense**:
  - Tanks have the highest median defense, reinforcing their role as damage absorbers.
  - Mages and Marksmen have lower defense values, making them more susceptible to damage.
- **Attack Speed**:
  - Marksmen have a higher attack speed, which is consistent with their role as ranged damage dealers who rely on rapid attacks.
  - Mages and Supports have lower attack speeds, as they often rely on abilities rather than basic attacks.

These insights align with the general understanding of the roles in League of Legends. Understanding these patterns can be crucial when selecting champions for a team or when predicting the outcome of battles.
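
The medians read off the box plots can also be checked numerically. The following is a minimal sketch, assuming a DataFrame with a `Role` column and the four attribute columns; the rows are made-up placeholders, so the printed numbers say nothing about the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for champions_df; values are made up
champions_df = pd.DataFrame({
    "Role": ["Tank", "Tank", "Mage", "Mage", "Marksman", "Marksman"],
    "Health": [680, 700, 590, 600, 610, 620],
    "Attack": [62, 64, 53, 55, 58, 60],
    "Defense": [45, 47, 21, 23, 26, 28],
    "Attack Speed": [0.625, 0.635, 0.640, 0.650, 0.660, 0.670],
})
attributes = ["Health", "Attack", "Defense", "Attack Speed"]

# Median of each attribute within each role (the centre line of each box)
median_by_role = champions_df.groupby("Role")[attributes].median()
print(median_by_role)

# Role with the highest median for each attribute
print(median_by_role.idxmax())
```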

Next, we'll calculate basic statistics for the attributes to get a numerical understanding of their distributions.

### 3. Calculate Basic Statistics for the Attributes
We'll compute the mean, median, standard deviation, minimum, and maximum values for each attribute. This will give us a summary of the central tendency and spread of the data.

In [None]:
# Calculate basic statistics for the columns listed in `attributes`
# Declare a variable `statistics_df` and store the basic statistics in it
# For example: dataframe[attributes].describe().transpose()
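
A small sketch of what this cell might produce, using a made-up two-column DataFrame in place of the real data. Note that `describe()` reports the median under the label `50%`.

```python
import pandas as pd

# Hypothetical stand-in for the champion DataFrame; values are made up
dataframe = pd.DataFrame({
    "Health": [600, 650, 700],
    "Attack": [50, 60, 70],
})
attributes = ["Health", "Attack"]

# describe() returns count, mean, std, min, quartiles, and max per column;
# transpose() puts one attribute per row, which is easier to read
statistics_df = dataframe[attributes].describe().transpose()

# Keep only the statistics discussed in the text ('50%' is the median)
print(statistics_df[["mean", "50%", "std", "min", "max"]])
```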

### 4. Explore Attribute Correlations
Correlation measures the strength and direction of a linear relationship between two variables. It can range from -1 to 1, where:

- A value of 1 indicates a perfect positive linear relationship.
- A value of -1 indicates a perfect negative linear relationship.
- A value of 0 indicates no linear relationship.

Let's calculate and visualize the correlations between the champion attributes.

In [None]:
# Calculate the correlation matrix
# Declare a variable `correlation_matrix` and store the correlation matrix in it
# For example: dataframe[attributes].corr()

# Plot the correlations as a heatmap using seaborn's heatmap function (sns.heatmap)
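
A possible sketch of this cell, on made-up data constructed so that one pair of columns is perfectly positively correlated and another perfectly negatively correlated. The color map, figure size, and title are arbitrary choices, not settings from the notebook.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data: Attack rises with Health, Attack Speed falls with it
dataframe = pd.DataFrame({
    "Health": [600, 620, 640, 660, 680],
    "Attack": [50, 52, 54, 56, 58],
    "Attack Speed": [0.70, 0.68, 0.66, 0.64, 0.62],
})
attributes = ["Health", "Attack", "Attack Speed"]

# DataFrame.corr computes pairwise Pearson correlations by default
correlation_matrix = dataframe[attributes].corr()
print(correlation_matrix)

# annot=True writes each coefficient into its cell;
# vmin/vmax pin the color scale to the full [-1, 1] range
plt.figure(figsize=(6, 5))
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation between attributes")
```

Pinning the color scale to [-1, 1] keeps heatmaps comparable across datasets, since the same color always means the same coefficient.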

## Data Preparation for Machine Learning

Before we can train a machine learning model, we need to prepare the data. This involves several steps to ensure that the data is in the right format and is suitable for modeling. In this section, we'll cover the following steps:

1. **Feature Engineering**: Creating new features or modifying existing ones to improve model performance.
2. **Encoding Categorical Variables**: Converting categorical variables into a format that can be provided to machine learning algorithms.
3. **Splitting the Data**: Dividing the data into training and testing sets to evaluate the model's performance.
4. **Feature Scaling**: Normalizing or standardizing the features so they have a similar scale.
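
Steps 2 and 4 can be sketched as follows, assuming pandas and scikit-learn are available. The DataFrame, its column names, and the choice of one-hot encoding with `StandardScaler` are illustrative assumptions, not the notebook's prescribed method.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical champion rows; values are made up
champions = pd.DataFrame({
    "Role": ["Tank", "Mage", "Marksman", "Mage"],
    "Health": [680.0, 590.0, 610.0, 600.0],
    "Attack": [62.0, 53.0, 58.0, 55.0],
})

# Step 2: one-hot encode the categorical column into Role_* indicator columns
encoded = pd.get_dummies(champions, columns=["Role"])

# Step 4: standardize the numeric columns to zero mean and unit variance
numeric_cols = ["Health", "Attack"]
scaler = StandardScaler()
encoded[numeric_cols] = scaler.fit_transform(encoded[numeric_cols])
print(encoded)
```

In a real pipeline the scaler would normally be fitted on the training split only and then applied to the test split. Tree-based models such as the decision tree used later are insensitive to feature scale, so scaling matters mainly for other model families.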

Let's start with feature engineering.

In [None]:
champion_outcome_list = []

# FOR each row i in champion_df:
#     FOR each row j in champion_df starting from i+1:
#         SET champ1 TO row i of champion_df
#         SET champ2 TO row j of champion_df

#         CALCULATE attack_diff AS difference between attack_power of champ1 and champ2
#         CALCULATE defense_diff AS difference between defense_power of champ1 and champ2
#         CALCULATE health_diff AS difference between health of champ1 and champ2

#         IF attack_diff is positive AND health_diff is positive THEN:
#             SET outcome TO 1
#         ELSE:
#             SET outcome TO 0

#         ADD [attack_diff, defense_diff, health_diff, outcome] TO champion_outcome_list

# CREATE match_df FROM champion_outcome_list with columns 'attack_diff', 'defense_diff', 'health_diff', 'outcome'
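
The pseudocode above could be translated roughly as below. This is a sketch under assumptions: the three-row `champion_df` is made up, and the column names `attack_power`, `defense_power`, and `health` are taken from the pseudocode rather than confirmed against the CSV.

```python
from itertools import combinations

import pandas as pd

# Hypothetical stand-in for champion_df; values are made up
champion_df = pd.DataFrame({
    "name": ["ChampA", "ChampB", "ChampC"],
    "attack_power": [60, 53, 62],
    "defense_power": [38, 21, 47],
    "health": [650, 590, 685],
})

champion_outcome_list = []

# combinations(..., 2) yields every unordered pair of rows (i < j) exactly once
for champ1, champ2 in combinations(champion_df.itertuples(index=False), 2):
    attack_diff = champ1.attack_power - champ2.attack_power
    defense_diff = champ1.defense_power - champ2.defense_power
    health_diff = champ1.health - champ2.health

    # Label rule from the pseudocode: champ1 wins only if it leads
    # on both attack and health
    outcome = 1 if attack_diff > 0 and health_diff > 0 else 0

    champion_outcome_list.append([attack_diff, defense_diff, health_diff, outcome])

match_df = pd.DataFrame(
    champion_outcome_list,
    columns=["attack_diff", "defense_diff", "health_diff", "outcome"],
)
print(match_df)
```

With n champions this produces n(n-1)/2 rows. Because the label is computed directly from two of the features, a classifier trained on `match_df` is learning a rule that is already known, which is worth keeping in mind when reading the accuracy score.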




In [None]:
# SET X TO all columns of match_df EXCEPT 'outcome'
# SET y TO 'outcome' column of match_df

# SPLIT X and y INTO X_train, X_test, y_train, y_test with 20% data as test and a fixed random seed
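
A sketch of the split described above, using scikit-learn's `train_test_split`. The ten-row `match_df` is made up (its labels follow the rule from the previous cell), and `random_state=42` is an arbitrary seed chosen only to make the split reproducible.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical match_df; outcome is 1 when attack_diff and health_diff are both positive
match_df = pd.DataFrame({
    "attack_diff": [7, -2, -9, 5, 3, -4, 8, -1, 6, -6],
    "defense_diff": [17, -9, -26, 4, -3, 2, 10, -7, 1, -12],
    "health_diff": [60, -35, -95, -10, 20, 15, 40, -5, 30, -50],
    "outcome": [1, 0, 0, 0, 1, 0, 1, 0, 1, 0],
})

# Features are every column except the label
X = match_df.drop(columns=["outcome"])
y = match_df["outcome"]

# Hold out 20% of the rows for testing; the fixed seed makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```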

In [None]:
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier()  # default hyperparameters
model.fit(X_train, y_train)

In [None]:
from sklearn.metrics import accuracy_score

# Evaluate the trained model on the held-out test set
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

In [None]:
def predict_outcome(champ1_name, champ2_name):
    champ1 = df[df.name == champ1_name].iloc[0]
    champ2 = df[df.name == champ2_name].iloc[0]

    attack_diff = champ1.attack_power - champ2.attack_power
    defense_diff = champ1.defense_power - champ2.defense_power
    health_diff = champ1.health - champ2.health

    # predict() returns an array with one label per input row
    prediction = model.predict([[attack_diff, defense_diff, health_diff]])

    if prediction[0] == 1:
        return champ1_name + " wins!"
    else:
        return champ2_name + " wins!"

print(predict_outcome('Aatrox', 'Ahri'))
