<a href="https://www.kaggle.com/code/mikedelong/scatter-plots-of-pokemon?scriptVersionId=145009681" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

In [None]:
import pandas as pd
df = pd.read_csv(filepath_or_buffer='/kaggle/input/data-of-1010-pokemons/pokemons.csv', index_col=['id'])
df.head()

In [None]:
df.nunique()

In [None]:
from plotly.express import bar
bar(data_frame=df['rank'].value_counts().to_frame().reset_index(), x='rank', y='count')

In [None]:
from plotly.express import scatter
scatter(data_frame=df, x='weight', y='height', color='type1', size='total', hover_name='name', opacity=0.5, log_x=True, log_y=True)

On linear axes most of our data is clustered near the origin, so we use a log-log plot to spread the points out.
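
To illustrate why the log scale helps, here is a minimal sketch on synthetic right-skewed data. The lognormal sample and the helper `share_in_lowest_tenth` are assumptions made for illustration; they are not taken from the Pokemon dataset. The sketch measures how many points land in the lowest tenth of the axis range before and after a log transform.

```python
import numpy as np

# Synthetic stand-in for a right-skewed measurement such as weight;
# these values are NOT taken from the Pokemon dataset.
rng = np.random.default_rng(2023)
values = rng.lognormal(mean=3.0, sigma=1.5, size=1000)


def share_in_lowest_tenth(x):
    """Fraction of points that fall in the lowest 10% of the axis range."""
    lo, hi = x.min(), x.max()
    return float(np.mean(x <= lo + 0.1 * (hi - lo)))


linear_share = share_in_lowest_tenth(values)
log_share = share_in_lowest_tenth(np.log10(values))
print(f"linear axis: {linear_share:.0%} of points in the lowest tenth of the range")
print(f"log axis:    {log_share:.0%} of points in the lowest tenth of the range")
```

On skewed data like this, the linear axis packs most points into a small corner of the plot, while the log axis distributes them far more evenly.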

In [None]:
from plotly.express import imshow
imshow(img=df[['hp', 'atk', 'def', 'spatk', 'spdef', 'speed', 'total', 'height', 'weight']].corr())

Since total presumably summarizes the individual stats, it probably makes more sense to look at the correlations without it.
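
As a rough sketch of why an aggregate column can distort a correlation matrix, the example below builds three independent synthetic columns and adds their sum. It assumes that `total` behaves like a sum of the other stats, which the notebook does not state; the column names `a`, `b`, `c` are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic, independent "stats"; not the Pokemon data.
rng = np.random.default_rng(2023)
demo = pd.DataFrame(rng.integers(1, 200, size=(500, 3)), columns=['a', 'b', 'c'])

# Assumption for illustration: the extra column is the sum of the others.
demo['total'] = demo[['a', 'b', 'c']].sum(axis=1)

corr = demo.corr()
print(corr.round(2))
```

Even though `a`, `b`, and `c` are unrelated to each other, each one correlates noticeably with `total` simply because it is part of it. On the real data, the assumption could be checked by comparing `df['total']` against the row-wise sum of the six stat columns.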

In [None]:
imshow(img=df[['hp', 'atk', 'def', 'spatk', 'spdef', 'speed', 'height', 'weight']].corr())

All of the numerical features are positively correlated, and height and weight are the most strongly correlated pair.
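
Reading the strongest pair off a heatmap can be error-prone, so here is one possible way to extract it programmatically. The helper `strongest_pairs` is hypothetical and the data is synthetic; on the notebook's data one would pass the result of `.corr()` on the eight numeric columns instead.

```python
import numpy as np
import pandas as pd


def strongest_pairs(corr: pd.DataFrame, n: int = 3) -> pd.Series:
    """Return the n off-diagonal pairs with the largest correlation."""
    # Keep only the strict upper triangle so each pair appears once
    # and the diagonal of 1.0 values is excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False).head(n)


# Synthetic stand-in data; not the Pokemon dataset.
rng = np.random.default_rng(2023)
height = rng.normal(size=300)
demo = pd.DataFrame({
    'height': height,
    'weight': height + rng.normal(scale=0.3, size=300),
    'speed': rng.normal(size=300),
})
top = strongest_pairs(demo.corr())
print(top)
```

The result is a Series indexed by column pairs, sorted from the strongest correlation downward.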

In [None]:
df[['hp', 'atk', 'def', 'spatk', 'spdef', 'speed', 'height', 'weight']].mean()

In [None]:
from plotly.express import histogram
histogram(data_frame=df, x=['hp', 'atk', 'def', 'spatk', 'spdef', 'speed'], barmode='group', nbins=50)

These attributes have similar distributions.

In [None]:
histogram(data_frame=df, x=['height', 'weight'], barmode='overlay', histnorm='density', log_y=True)

Not surprisingly, height and weight have very different distributions.
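
One way to put numbers on such a difference is to compare a few summary statistics side by side. The sketch below uses synthetic lognormal columns as stand-ins (the parameters are arbitrary assumptions, not fitted to the Pokemon data); on the real data the same `agg` call could be applied to `df[['height', 'weight']]`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins with different scales and skew; not the Pokemon data.
rng = np.random.default_rng(2023)
demo = pd.DataFrame({
    'height': rng.lognormal(mean=0.0, sigma=0.5, size=1000),
    'weight': rng.lognormal(mean=3.0, sigma=1.5, size=1000),
})

# One row per statistic, one column per variable.
summary = demo.agg(['mean', 'median', 'std', 'skew'])
print(summary.round(2))
```

A mean well above the median, together with a large skew, indicates a long right tail.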

In [None]:
from sklearn.manifold import TSNE
tsne_df = df[['name', 'type1', 'total', 'hp']].copy()
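# Note: newer scikit-learn releases rename n_iter to max_iter; adjust the keyword if this call fails.
tsne = TSNE(n_components=2, verbose=1, init='pca', random_state=2023, n_iter=500)
tsne_df[['x', 'y']] = tsne.fit_transform(X=df[['hp', 'atk', 'def', 'spatk', 'spdef', 'speed', 'height', 'weight']])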
scatter(data_frame=tsne_df, x='x', y='y', hover_name='name', color='total', hover_data=['type1', 'total'])

Because all of our features are positively correlated, the color gradient would look roughly the same whichever one we chose to color by.

In [None]:
from sklearn.decomposition import PCA
pca_df = df[['name', 'type1', 'total', 'hp']].copy()
pca = PCA(n_components=3, random_state=2023)
pca_df[['pca1', 'pca2', 'pca3']] = pca.fit_transform(X=df[['hp', 'atk', 'def', 'spatk', 'spdef', 'speed', 'height', 'weight']])
# PCA scores are mean-centered, so they include negative values that a log axis cannot display; use linear axes.
scatter(data_frame=pca_df, x='pca1', y='pca2', hover_name='name', color='total', hover_data=['type1', 'total']).show()
scatter(data_frame=pca_df, x='pca2', y='pca3', hover_name='name', color='total', hover_data=['type1', 'total']).show()

In [None]:
from plotly.express import scatter_3d
# Linear axes: PCA scores include negative values, which a log axis cannot display.
scatter_3d(data_frame=pca_df, x='pca1', y='pca2', z='pca3', hover_name='name', color='total', hover_data=['type1', 'total']).show()


In [None]:
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Use the plt namespace so matplotlib's imshow does not shadow the plotly.express imshow imported earlier.
plt.subplots(figsize=(12, 12))
plt.imshow(X=WordCloud(random_state=2023, height=1200, width=1200).generate(text=' '.join(df['desc'].values.tolist())))
plt.axis('off')

In [None]:
plt.subplots(figsize=(12, 12))
plt.imshow(X=WordCloud(random_state=2023, height=1200, width=1200).generate(text=' '.join(df['abilities'].values.tolist())))
plt.axis('off')